Sabbatical Data Project – Part 1
Part of my sabbatical is to do a “mini” project at GCC as a proof of concept or example of how faculty can use data to answer questions about the courses they teach. “This part (of my sabbatical) will also involve working with GCC’s SPA (IE department) to create a data project that supports my department and other departments that might have an interest in using data.” My hope was to expand upon the ideas of a Learning Grant my department had last year: the Analytics for English Faculty Learning Community. The goal of that project was to help reading, writing, and English humanities faculty who teach online and/or hybrid courses at Glendale Community College become more knowledgeable about the types of student data available to them and about ethically and pedagogically sound ways to use that data.
So for this mini project, I plan to build on the ideas behind that grant. One question I’ve always had is, “Is there a way to predict how well students will do in an online or hybrid class before they even begin?” If we can predict which students might be at risk, we can design targeted interventions to help those students. I’m not talking about the standard best practices we should already be doing for all students; I’m talking about more invasive interventions that might be disruptive to the course if an instructor tried to do them with all students. One example might be requiring students to meet with the instructor within the first week to discuss goals for the course. That could prove helpful, but it would probably be impossible for an instructor with multiple online and hybrid courses. A rough sketch of what such a prediction might look like follows the data list below.
The data we would like to see are:
- previous classes attempted, including online and hybrid course history
- number of drops and withdrawals
- past grades in prerequisite courses
- GPA
- placement test scores
- demographic data
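To make the question concrete for myself, here is a minimal sketch of the kind of model that could eventually come out of data like this. Everything in it is invented: the student records are randomly generated, the variable names are placeholders, and a real project would use actual district records plus far more care with validation and ethics. It only shows the shape of a “predict who’s at risk” analysis in R.

```r
# A minimal sketch, NOT the project's actual model: all data below are fabricated.
set.seed(42)
students <- data.frame(
  gpa          = round(runif(200, 1.5, 4.0), 2),  # cumulative GPA
  prior_online = rpois(200, 2),                   # online/hybrid courses attempted
  withdrawals  = rpois(200, 1),                   # drops and withdrawals
  placement    = round(rnorm(200, 70, 10))        # placement test score
)

# Fabricate an outcome that loosely depends on the predictors
students$passed <- rbinom(200, 1,
  plogis(-4 + 1.2 * students$gpa + 0.2 * students$prior_online -
          0.6 * students$withdrawals + 0.02 * students$placement))

# Logistic regression: which factors are associated with passing?
model <- glm(passed ~ gpa + prior_online + withdrawals + placement,
             data = students, family = binomial)
summary(model)

# Flag students whose predicted probability of passing falls below 0.5
students$at_risk <- predict(model, type = "response") < 0.5
table(students$at_risk)
```

The flagged group is exactly the set of students the targeted interventions above would be aimed at.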
I’ll post more on this “mini” project after we get the data. In the meantime, I’ll be researching which interventions we might add to our suggested list to support students identified as at risk, and looking into colleges that have similar programs in place and what their success has been.
I’m also starting a new MOOC this week: Analytics in Course Design: Leveraging Canvas Data. The primary goal of this course is to explore Canvas data and visualization techniques that faculty and instructional designers can use to make informed decisions about Canvas course design. Finally, a course that makes sense to me. The course has four modules that each apply different aspects of Canvas data to course design. The modules cover Student Engagement, Course Design, Assignment Submissions, and Discussion Interactions.
Sabbatical 2018-19: Time to Get Back to Work (Spring 2019)
I’m going to Harvard.
I’m starting a new course that I was supposed to complete in the fall. It’s an edX course from Harvard: HarvardX: PH125.1x Data Science: R Basics.
In this course I hope to learn how to read, extract, and create datasets in R, how to perform a variety of operations and analyses on datasets using R, and how to write my own functions/sub-routines in R. I’m not sure what all that means yet, but I have a good idea.
“R is a language and environment for statistical computing and graphics. It’s an integrated suite of software facilities for data manipulation, calculation and graphical display” (R website).
The R language is widely used among statisticians and data miners for developing statistical software and for data analysis, which is why it’s on my list of skills to learn during my sabbatical. The goal is not to become an expert in R, but to get an idea of what is involved in using this tool for data analysis.
The first section of this course is an introduction to R basics, functions, and data types. So I’ll start with learning to appreciate the rationale for data analysis using R, how to define objects and perform basic arithmetic and logical operations, how to use pre-defined functions to perform operations on objects, and how to distinguish between the various data types.
Basically I’m typing variables and formulas into the program to get answers. Example: What is the sum of the first 20 positive integers? I’m not sure why I’d want to know this, but here’s how: type n <- 20 to define the variable, then use the formula n*(n+1)/2 to get your answer.
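Here’s what that exercise looks like typed into the R console, with a built-in function added as a sanity check:

```r
# Sum of the first 20 positive integers, two ways
n <- 20           # define the variable
n * (n + 1) / 2   # the formula: returns 210
sum(1:n)          # built-in check: also 210
```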
Keeping my fingers crossed that my love/hate relationship with math doesn’t show through. However, I can see how someone who is good at math would be a natural fit for coding. Wish me luck. I’m going to need it.
Sabbatical 2018 Week 14: Starting the Capstone Project
I successfully completed the first five courses of the Big Data Specialization through Coursera, and I’ll have to say it was crazy. I never knew there was so much to learn about data analysis. My mind is still spinning. Now I’m expected to put it all together and actually do a project. Whew! Wish me luck. I’m going to need it.
The Capstone is a five-week project. First I’ll do some data exploration, aggregation, and filtering using Splunk. The following week, I’ll perform classification on the fictional game data using a decision tree in KNIME. Next, I’ll learn how to use Spark MLlib to do cluster analysis on the simulated game data. In the fourth week, I’ll explore a somewhat different dataset, simulated chat data, and perform some graph analytics on it using Neo4j. Then I’ll gather the results together and prepare a presentation and report, which I’ll submit in the final week.
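As a warm-up for the clustering week, here’s a tiny sketch of the same idea in base R rather than Spark MLlib, run on made-up game-session data. It is not the capstone’s actual workflow or dataset, just the concept of k-means clustering in miniature.

```r
# k-means clustering on invented game-session data (not the capstone dataset)
set.seed(1)
sessions <- data.frame(
  minutes_played = c(rnorm(50, 20, 5), rnorm(50, 90, 15)),  # casual vs. heavy play
  purchases      = c(rpois(50, 0.3), rpois(50, 3))          # in-game purchases
)

# Standardize the columns so both features count equally, then cluster into 2 groups
clusters <- kmeans(scale(sessions), centers = 2)

table(clusters$cluster)  # how many sessions landed in each cluster
clusters$centers         # cluster centers on the standardized scale
```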
At this point, I am not confident I will be able to complete this, but I said that about my doctorate dissertation too, and I completed that. It’s amazing what you can accomplish if you just try. So try I will.
I will say, however, that before you sign up for this specialization, they make it sound like anyone with no prior experience can learn how to be a data analyst just by completing it. These people are high. You really need a background in coding to be a data analyst. Learning how to use the many programs was a challenge, and it was impossible to memorize all the code needed to run them. It was fun, however, copying and pasting the code and watching it do stuff.
The specialization did give me insight into all that goes into data analysis and trust me, there is a lot. The capstone course doesn’t officially begin until February, but I was able to enroll and get started. I’ll probably stop and take a break for the holiday and finish this in February.
Sabbatical 2018 Week 12: Canvas Data Portal
I was finally able to get access to Maricopa’s instance of Canvas Data Portal, so this week I’ll share a little about what it is and how we might use it. I connected with Randy Anderson at the district, our Canvas administrator, and he was very helpful in getting me and Lisa set up for our accounts and permissions. I guess he figured we couldn’t do too much damage. I’ll explain that in a bit.
Canvas Data is a service from Canvas that provides admins with optimized access to their data for reporting and queries. “Canvas Data Admins can download flat files or view files hosted in an Amazon Redshift data warehouse. The data will be an extracted and transformed version of a school’s Canvas activity and can be accessed using any open database connectivity (ODBC) analytics tool to generate custom data visualization and reports” (Canvas). I’ve been learning about some analytics tools in my Big Data Specialization courses. Unfortunately for me, none of the 30+ tools mentioned so far are ODBC analytics tools. They were mostly big data management systems (BDMS).
The most common ODBC analytics tools include Excel (using Amazon Redshift), Tableau, R, and SQL Workbench/J. I’m scheduled to learn both Tableau and R in the spring in either the Johns Hopkins Data Science Specialization on Coursera or the Data Visualization with Tableau Specialization on Coursera. I haven’t decided which specialization I’ll officially do, but I’ll be able to access both.
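Just to make “ODBC analytics tool” concrete for myself, here’s roughly what pointing R at a hosted warehouse would look like using the DBI and odbc packages. Every connection detail below is a placeholder, and the query assumes a table and column from the Canvas Data schema docs; nothing like this is actually set up for us yet.

```r
library(DBI)

# All connection details are placeholders, not a real Maricopa endpoint
con <- dbConnect(
  odbc::odbc(),
  Driver   = "Amazon Redshift (x64)",                  # driver name varies by install
  Server   = "example-cluster.redshift.amazonaws.com", # placeholder host
  Database = "canvas_data",
  UID      = "analyst",
  PWD      = Sys.getenv("REDSHIFT_PWD"),
  Port     = 5439
)

# Example query; table/column names are assumed from the Canvas Data schema docs
dbGetQuery(con, "SELECT workflow_state, COUNT(*) AS n
                 FROM course_dim
                 GROUP BY workflow_state")

dbDisconnect(con)
```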
Apparently, the district office checked into the cost of hosting in an Amazon Redshift data warehouse, and it was cost prohibitive. This is the method many other institutions choose, while others do in-house database management. Either way, this decision is beyond me, and I’ll just have to wait to see how it pans out in Maricopa, if it does at all. In the meantime, I’m hoping to be able to play with smaller sets of data from the Canvas Data Portal using the tab-delimited (.txt) flat files. “Canvas Data parses and aggregates the over 280 million rows of Canvas usage data generated daily and exports them” (Canvas). That’s a lot of data. And I’m guessing that without a specialized database or warehouse, we’ll have trouble utilizing these files.
The portal includes the Canvas Data schema, with documentation that explains all the table data exported from Canvas. We could use this data to answer a multitude of questions about our students, instructors, and courses in Canvas. For instance, Canvas suggests we could answer questions like, “What makes a successful department/course/instructor?” “How can our institution improve student retention?” and even “How are students doing in the course (current and historical)?” There’s much information to be gained from the data.
I get a little overwhelmed just thinking about all there is left to learn. The Canvas Data FAQ is a good place to start. From there I’ve already learned how to open the flat files and how to add headers to the columns. I’ve also bookmarked the R FAQ and a page for 7-Zip, a free file archiver with a high compression ratio; it’s the tool needed to open the Canvas Data .gz files. In the spring I’ll also get to visit a couple of colleges that already have all this set up and running. A good example of what would be really cool is the Unizin Data Warehouse at Indiana University, which gives faculty direct access to Canvas data for their courses. I would love to have that set up in Maricopa. Someday, maybe.
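Since I’ll eventually want to do more with these files than open them in a text editor, here’s a rough sketch of reading one flat file into R. The file name and column names are placeholders (the real ones come from the schema documentation), and as an aside, base R can read the .gz files directly, so this particular route doesn’t even need 7-Zip.

```r
# Hypothetical Canvas Data flat file: tab-delimited, gzipped, no header row
canvas_table <- read.delim(
  gzfile("canvas_flat_file.gz"),  # placeholder file name
  header = FALSE,
  sep = "\t",
  na.strings = "\\N",             # assuming \N marks nulls; worth checking the docs
  stringsAsFactors = FALSE
)

# Add the column headers by hand from the schema docs;
# this illustration assumes the file has exactly these five columns
names(canvas_table) <- c("id", "canvas_id", "name", "code", "workflow_state")
head(canvas_table)
```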
Sabbatical 2018 Week 10: What Happened to Weeks 7-9?
Boy, time sure flies when you’re busy. It’s already week 10. I had to go back to August to count the weeks because I barely know what day it is, let alone how many weeks have passed in the semester. This of course is all good. While on sabbatical, I’ve also been renovating a vacation home we purchased up in Happy Jack, so my life has been consumed with data and renovations for the last three months. Thankfully one of those projects is almost complete. And that would not be the data project. On we roll.
Big Data is still my world at the moment. I’m currently in course 5 of the Big Data Specialization on Coursera: Graph Analytics for Big Data. I’m learning how real-world data science problems can be modeled as graphs, along with various tools and techniques for working with them. The biggest thing I’ve learned so far is that most people don’t know what graphs are. Most people think graphs are these pretty pie charts.
These are not graphs, apparently; these are pie charts. I knew that. I love pie charts. We are not learning how to make pie charts in the Graph Analytics for Big Data course. We’re learning how to make something like the image below: a graph with nodes and edges.
I should have known this was not going to be simple. Graph theory is tied to math: graphs are “mathematical structures used to model pairwise relations between objects,” and they “can be used to model many types of relations and processes in physical, biological, social and information systems” (Wikipedia).
A good example of how graphs can be used is fraud detection. Graph databases are uniquely positioned to spot connections across large data sets and identify patterns, a useful trait when it comes to spotting complex, modern fraud techniques. An even better example is the product recommendations you get on Amazon and other online retail sites: Amazon can pull together product, customer, inventory, supplier, and social sentiment data into a graph database to spot patterns and make smarter recommendations to you.
I’m still wrapping my head around how graphs can be useful in education. For one assignment I designed a graph around a peer-review activity for students. It’s pretty basic, but in my mind this kind of data might be useful for finding patterns that help students improve their work.
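Here’s a toy version of that peer-review graph, sketched with the igraph package in R. The students and review pairs are invented; the structure is the point: each arrow means “this student reviewed that student’s draft.”

```r
library(igraph)

# Each row is an edge: reviewer -> author of the reviewed draft (invented names)
reviews <- data.frame(
  from = c("Ana", "Ana", "Ben", "Cruz", "Dee", "Dee"),
  to   = c("Ben", "Cruz", "Dee", "Ana", "Ben", "Cruz")
)
g <- graph_from_data_frame(reviews, directed = TRUE)

degree(g, mode = "in")   # how many reviews each student received
degree(g, mode = "out")  # how many reviews each student gave
plot(g, vertex.size = 30, edge.arrow.size = 0.5)
```

Even at this size, the in-degree and out-degree counts surface things like a student who reviews everyone but never gets reviewed.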
Later in this course we’ll be learning how to use Neo4j, a graph database management system, and GraphX, Apache Spark’s API for graphs and graph-parallel computation. So I imagine my graphs will be much better in another week.
In my next post I’ll share some information about Canvas Data Portal, since I now have access to Maricopa’s instance. It’s exciting, even though I don’t really know how to “look” at the data yet; I can see all the flat files. I just need a database to magically appear with a data scientist attached to help. 🙂
Sabbatical 2018 Week 6: The Big Data Landscape is Ridiculously Huge
Last week I completed course two in the Big Data Specialization: Big Data Modeling and Management Systems. This was another very technical course. We gained an in-depth understanding of why big data modeling and management are essential in preparing to gain insights from your data, along with real-world big data modeling and management use cases in areas such as energy and gaming. We also learned about the different kinds of data models, how to describe streaming data and the different challenges it presents, and the differences between a DBMS and a BDMS.
We did a lot of playing in the Cloudera VM again. I type in the code given and things magically happen. It’s kind of cool, but there’s no way I’m going to remember how to replicate any of it. For example, we learned how to import and query text documents with Lucene and perform weighted queries to see how rankings change. We learned how to perform statistical operations and layout algorithms on graph data in Gephi (I believe we actually installed and ran that program on our own computers instead of in Cloudera). Then, back in Cloudera, we learned how to view semi-structured data streaming in real time from a weather station and create plots of the streaming weather station data.
If your head is spinning from just the few programs I’ve mentioned already, it’s going to explode when you hear we were also introduced to Redis, Aerospike, AsterixDB, Solr, and Vertica. I thought I might pass out. The Big Data landscape is ridiculously huge. How anyone knows all of these programs is beyond me.
Also this week I reached out to district IT to schedule a meeting with the Canvas administrators to discuss Canvas Data Portal. It sounds like they have already started doing some exploring on their own. In fact, I was told to contact another individual who had already done some initial investigation into the use of Amazon Redshift. And a few developers have already explored it as part of a Transformation data project. It also looks like I’ll be able to get access to our Data Portal soon so I can start exploring. This is great news, as I thought this one step would be the thing to derail my sabbatical proposal. Things are moving forward. I’m a little behind on my reading and annotated bib, but besides that I’m right on track. Yay, me!
Sabbatical 2018 Week 5: Not All Work and No Play
If you’ve never taken an extended sabbatical from your job, you’re really missing out. It’s a great experience that I’m grateful to have taken advantage of twice in my 20 years in Maricopa. I really think I’ve worked hard enough to deserve it, and you probably have too. According to the MCLI website,
“A sabbatical leave is an opportunity to broaden or deepen educational interests, to explore new areas, or examine instructional methods to enhance the mission of the college. A sabbatical leave gives faculty a respite from their normal duties in order to provide them an opportunity to grow professionally. The goal of a sabbatical leave project is to engage faculty in the areas of study, research, travel, work experience, or other creative activity, and to contribute to the institution as a whole upon his/her return to the college.”
If you’re into learning new things, then a sabbatical in Maricopa is for you. In the more general sense, the word sabbatical, which can be a noun or an adjective, comes from the Greek word sabbatikos, meaning “of the Sabbath,” the day of rest that happens every seventh day. Many teaching jobs come with the promise of a sabbatical: a year of not having to teach, though you still get paid. It’s also interesting to know that only 5% of US companies offer paid sabbaticals. So I’m not complaining that I still have to work during my sabbatical. At least it’s something I’m interested in learning, and it doesn’t involve grading hundreds of essays. It’s definitely a respite from the norm.
The challenging part for me is getting used to doing less. Many faculty do more than just teach a 15-hour load, and Maricopa is good about providing opportunities and compensating those of us who do more. For the past four years, I’ve been wrapped up in the world of professional development, online learning, and OER. I’ve taught very little but worked more than in previous years, collaborating, coordinating, and strategizing with our Instructional Designer, CTLE staff, eCourses faculty lead, and faculty developers. My involvement also included working district-wide with other CTL directors and with elearning and OER leaders. It’s hard to just go cold turkey and not talk to or work with any of those people anymore. My only saving grace is that many of them are personal friends, and we still chat when I sneak onto campus to visit or attend a planned happy hour. Shout out to Meghan, my better half for the last four years.
One major plus is that my other partner in crime for the past 4 years, Dr. Lisa Young, is also on sabbatical this year, and her sabbatical proposal is similar to mine – Big Data. And as the Faculty Director of SCC’s CTL and Co-Tri-Chair of the Maricopa Millions project, she’s been involved in all the same things I have. So she can relate. Part of our sabbatical plan is to hike every other week to discuss our projects and other stuff. It’s comforting to know she’s learning the same things and good to have someone to bounce ideas off of. And it doesn’t hurt to get some exercise in on a regular basis. Below is evidence of our endeavors.
The best part of a sabbatical is you get to determine your schedule, so there’s a lot of flexibility in there for doing the things you never seem to have time for. The reality is that many of the people you’d like to do those things with are still working hard and stressed out. Ha! (Sorry Beth! Thanks for visiting me yesterday)
And one more for the road. So far we’ve hiked Holbert and Mormon Trails on South Mountain, Cholla Trail on Camelback, and Trail 100 in Dreamy Draw followed by breakfast at Dick’s Hideaway, Scramble, and First Watch. Breakfast is an added bonus. What is up with my hair?! Anyway, I’m looking forward to it cooling off so we don’t have to hike so early. Then sabbatical life will be truly perfect. Well, if they can figure out how to pay me correctly then it will be truly perfect.
Sabbatical 2018 Week 4: Where’s My Money?
It has become painfully clear that I will never be a data analyst. That’s not necessarily a bad thing, considering I already have a job as an educator at a great community college. Thank goodness for that, because I’m in a little over my head with the Big Data Specialization from the University of California San Diego. Somehow I’m learning just enough to get by, but don’t ask me anything specific. You really have to be a programmer to use this stuff.
Course 2, Big Data Modeling and Management Systems, was very technical. It was all about big data technologies, and frankly I’m happy to leave that part to the IT experts. Systems and tools discussed included AsterixDB, HP Vertica, Impala, Neo4j, Redis, and SparkSQL <eyes glaze over>. We gained an in-depth understanding of why big data modeling and management are essential in preparing to gain insights from your data, along with real-world use cases in areas such as energy and gaming. We also learned about different kinds of data models, how to describe streaming data and the different challenges it presents, and the differences between a DBMS and a BDMS.
I somehow managed to complete the final assignment for this course, which was to design a data model for a fictitious game: “Catch the Pink Flamingo.” The strangest thing about this whole Coursera setup is that the assignments are peer reviewed. I’m awaiting my fate as I type. I wasn’t really clear whether what I was doing was correct, but I did my best and submitted the assignment. Then I had to go in and review my classmates’ work. Yeah, right? It looked good. Nothing like mine, but hey, who’s right? I guess we’ll see once my assignment is peer reviewed.
Two courses down; four to go. Then on to the Johns Hopkins Data Science Specialization. In the meantime, I’ve reached out to our district IT person in charge of Canvas. I’m hoping to meet with her soon to discuss Canvas Data Portal. ITS has a proposal process for when its resources are needed for more than 20 hours, so I have to go to the PMO site, where a business case can be initiated to start the process. Additionally, the IITGC prioritizes business cases/projects for ITS, so I’ll have to cross my fingers and hope my case gets prioritized.
Okay, back to figuring out how to get paid correctly. Hey, Maricopa, where’s my money?
Sabbatical 2018 Week 3: What the Hadoop?
So I finished my first Coursera course: Introduction to Big Data. It was the first and shortest of the six courses in the Big Data Specialization, at only three weeks. I added my course completion certificate to my LinkedIn profile, which needs to be updated. (hint hint)
I really like the reporting system in Coursera. I posted a screenshot that shows progress. It really helps students know exactly where they are in the course, what needs to be done, and when it’s due. If there is something to be done, it’s listed first with a Start button to get to that part of the course quickly, as you can see in the image. Makes me wish I had something like this for my students in my Canvas courses.
The last part of this course had some programming. We got a short introduction to Hadoop and how to run the WordCount program. Surprisingly, this time I found playing in the Cloudera VirtualBox VM fun. Amazing how that is when you don’t run into errors and the programs work as expected, or, more accurately, when there aren’t any user errors. I actually felt like I knew what I was doing. Maybe a little overconfident, but eh, who cares.
I can’t imagine I’ll remember the command to run the program (hadoop jar /usr/jars/hadoop-examples.jar wordcount) in the future, but I do have good notes for reference. And I’m still a little fuzzy about MapReduce; initially I couldn’t see a good use for it in my work. Our last discussion in this class stumped me a bit: What are some examples in your work or daily life where applying the MapReduce algorithm could speed things up? Dang, that’s a good question. Ha! I guess I’m still trying to figure that one out, beyond basic things like sorting students by demographic data or past grades.
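To convince myself I at least understand what the map and reduce steps are doing, here’s the WordCount idea reduced to a few lines of R. Obviously this isn’t Hadoop and wouldn’t scale to big data, which is the whole reason MapReduce exists.

```r
text <- c("big data is big", "data about data")

# "Map" step: split every line into individual words
words <- unlist(strsplit(text, " "))

# "Reduce" step: sum the counts for each distinct word
word_counts <- table(words)
word_counts
#> words
#> about   big  data    is
#>     1     2     3     1
```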
I’m also finishing week 4 of the second course, Big Data Modeling and Management Systems. Who knew there was so much to learn about data modeling? Data models deal with many different types of data formats. Streaming data is becoming ubiquitous, and working with it requires a different approach from working with static data. So this week we’re getting practical, hands-on experience working with different forms of streaming data.
Big Data & Analytics Annotated Bibliography
As part of my sabbatical, I need to gain a basic understanding of statistics and data structure and get an overall sense of what educational data analytics entails, so I did some research and created a short reading list of published articles and books to read. Last summer I read Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy by Cathy O’Neil as part of our learning analytics professional learning community (PLC) at GCC. I also started reading a few of the articles I found including Academic analytics and data mining in higher education and Educational Data Analytics Technologies For Data-Driven Decision Making In Schools.
I plan to add to this list as I go, so if you have any suggested articles or books you think I should read, send them my way. Over the course of this semester I will be reading and adding to my Big Data & Analytics Annotated Bibliography. I’ve created this post to share my work. I’ve also included my sabbatical reference/reading list (Appendix C) below.
Big Data & Analytics Annotated Bibliography
Baepler, P., & Murdoch, C. (2010). Academic analytics and data mining in higher education. International Journal for the Scholarship of Teaching and Learning, 4(2). doi:10.20429/ijsotl.2010.040217
This essay links the concepts of academic analytics, data mining in higher education, and course management system audits, and suggests how these techniques and the data they produce might be useful to those who practice the scholarship of teaching and learning. Academic analytics, educational data mining, and CMS audits, although in their incipient stages, can begin to sift through the noise and provide SoTL researchers with a new set of tools to understand and act on a growing stream of useful data.
Appendix C
Sabbatical Reading List
Baepler, P., & Murdoch, C. (2010). Academic analytics and data mining in higher education. International Journal for the Scholarship of Teaching and Learning, 4(2). doi:10.20429/ijsotl.2010.040217
Delaware County Community College. (n.d.). Big data, algorithms, and predictive analytics – Learning analytics – LibGuides at Delaware County Community College. Retrieved July 13, 2017, from http://libguides.dccc.edu/learning_analytics/big_data
Herold, B. (2016, January 11). The future of big data and analytics in K-12 education – Education Week. Retrieved from http://www.edweek.org/ew/articles/2016/01/13/the-future-of-big-data-and-analytis.html
Lawson, J. (2015). Data science in higher education: A step-by-step introduction to machine learning for institutional researchers. Chico, CA.
Picciano, A. G. (2012). The evolution of big data and learning analytics in American higher education. Online Learning, 16(3). doi:10.24059/olj.v16i3.267
Reinitz, B. (2017, August 10). 2017 Trends and Technologies: Analytics. Retrieved from https://library.educause.edu/resources/2017/8/2017-trends-and-technologies-analytics
Sampson, D. G. (2016, October 22). Learning analytics: Analyze your lesson to discover more about your students – eLearning Industry. Retrieved from https://elearningindustry.com/learning-analytics-analyze-lesson
Sampson, D. G. (2016, October 20). Educational data analytics technologies for data-driven decision making in schools – eLearning Industry. Retrieved from https://elearningindustry.com/educational-data-analytics-technologies