
Archive for November, 2018

29 Nov

Sabbatical 2018 Week 14: Starting the Capstone Project

I successfully completed the first five courses of the Big Data Specialization through Coursera, and I have to say, it was crazy. I never knew there was so much to learn about data analysis. My mind is still spinning. Now I’m expected to put it all together and actually do a project. Whew! Wish me luck. I’m going to need it.

The Capstone is a five-week project. In the first week I’ll be doing some data exploration, aggregation, and filtering on fictional game data using Splunk. In the second week I’ll perform classification on that game data using a decision tree in KNIME. In the third week I’ll learn how to use Spark MLlib to do cluster analysis on the simulated game data. The fourth week switches to a somewhat different dataset, simulated chat data, for some graph analytics using Neo4j. In the final week I’ll gather the results together, prepare a presentation and final report, and complete the project by submitting them.
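To get my head around the Spark MLlib week ahead of time, here’s a minimal sketch of what a cluster analysis step could look like in PySpark. The file name, the feature columns, and the choice of k are placeholders I made up, not the actual capstone dataset or workflow.

from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.clustering import KMeans

spark = SparkSession.builder.appName("capstone-clustering").getOrCreate()

# Load the simulated game data (file name and columns are my own placeholders)
clicks = spark.read.csv("game-clicks.csv", header=True, inferSchema=True)

# Combine a few hypothetical numeric columns into a single feature vector
assembler = VectorAssembler(
    inputCols=["clicks_per_session", "avg_session_minutes", "purchases"],
    outputCol="features",
)
features = assembler.transform(clicks)

# Fit k-means with an arbitrary k; the real analysis would compare several values
kmeans = KMeans(k=3, seed=1, featuresCol="features")
model = kmeans.fit(features)

# Print the cluster centers to get a rough profile of each player segment
for center in model.clusterCenters():
    print(center)

spark.stop()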

At this point, I am not confident I will be able to complete this, but I said that about my doctorate dissertation too, and I completed that. It’s amazing what you can accomplish if you just try. So try I will.

I will have to say, however, that the way this specialization is pitched before you sign up makes it sound like anyone with no prior experience could become a data analyst just by completing it. These people are high. You really need a background in coding to be a data analyst. Learning how to use all the many programs was a challenge, and it was impossible to memorize the code needed to run them. It was fun, though, copying and pasting the code and watching it do stuff.

The specialization did give me insight into all that goes into data analysis, and trust me, there is a lot. The capstone course doesn’t officially begin until February, but I was able to enroll and get started. I’ll probably stop and take a break for the holidays and finish this in February.

4 Nov

Sabbatical 2018 Week 12: Canvas Data Portal

I was finally able to get access to Maricopa’s instance of the Canvas Data Portal, so this week I’ll share a little about what it is and how we might use it. I connected with Randy Anderson at the district, our Canvas administrator, and he was very helpful in getting Lisa and me set up with accounts and permissions. I guess he figured we couldn’t do too much damage. I’ll explain that in a bit.

Canvas Data is a service from Canvas that provides admins with optimized access to their data for reporting and queries. “Canvas Data Admins can download flat files or view files hosted in an Amazon Redshift data warehouse. The data will be an extracted and transformed version of a school’s Canvas activity and can be accessed using any open database connectivity (ODBC) analytics tool to generate custom data visualization and reports” (Canvas). I’ve been learning about some analytics tools in my Big Data Specialization courses. Unfortunately for me, none of the 30+ tools mentioned so far is an ODBC analytics tool; they have mostly been big data management systems (BDMS).

Example course data dashboard created in Tableau

The most common ODBC analytics tools include Excel (using Amazon Redshift), Tableau, R, and SQL Workbench/J. I’m scheduled to learn both Tableau and R in the spring, in either the Johns Hopkins Data Science Specialization or the Data Visualization with Tableau Specialization, both on Coursera. I haven’t decided which specialization I’ll officially do, but I’ll be able to access both.
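For the curious, here’s a rough sketch of what querying a Redshift-hosted Canvas Data warehouse through an ODBC connection might look like from Python using pyodbc. The DSN, credentials, and query are assumptions for illustration; course_dim is a table in the published Canvas Data schema, but the columns I reference may differ in other schema versions.

import pyodbc

# Connect through an ODBC data source configured for the Amazon Redshift driver.
# The DSN name, credentials, and the query below are assumptions for illustration,
# not Maricopa's actual setup.
conn = pyodbc.connect("DSN=CanvasData;UID=analyst;PWD=secret")
cursor = conn.cursor()

# course_dim is one of the tables in the published Canvas Data schema;
# exact column names can vary by schema version.
cursor.execute("""
    SELECT enrollment_term_id, COUNT(*) AS course_count
    FROM course_dim
    WHERE workflow_state = 'available'
    GROUP BY enrollment_term_id
    ORDER BY course_count DESC
""")

for term_id, course_count in cursor.fetchall():
    print(term_id, course_count)

conn.close()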

Apparently, the district office checked into the cost of hosting in an Amazon Redshift data warehouse, and it was cost prohibitive. Hosting in Redshift is the route many other institutions choose, while others do in-house database management. Either way, this decision is beyond me, and I’ll just have to wait to see how it pans out in Maricopa, if it does at all. In the meantime, I’m hoping to be able to play with smaller sets of data from the Canvas Data Portal using the tab-delimited (.txt) flat files. “Canvas Data parses and aggregates the over 280 million rows of Canvas usage data generated daily and exports them” (Canvas). That’s a lot of data. And I’m guessing that without a specialized database or warehouse, we’ll have trouble making use of these files.

The portal includes the Canvas Data schema, with documentation that explains all the table data exported from Canvas. We could use this data to answer a multitude of questions about our students, instructors, and the courses in Canvas. For instance, Canvas suggests we could answer questions like, “What makes a successful department/course/instructor?” “How can our institution improve student retention?” and even “How are students doing in the course (current and historical)?” There’s much information to be gained from the data.

I get a little overwhelmed just thinking about all there is left to learn. The Canvas Data FAQ is a good place to start. From there I’ve already learned how to open the flat files and how to add headers to the columns. I’ve also bookmarked the R FAQ and a page for 7-Zip, a free file archiver with a high compression ratio; it’s the tool needed to open the Canvas Data .gz files. In the spring I’ll also get to visit a couple of colleges that already have all this set up and running. A good example of what would be really cool is the Unizin Data Warehouse at Indiana University. It gives faculty direct access to Canvas data for their courses. I would love to have that set up in Maricopa. Someday maybe.
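To show what the flat-file route looks like, here’s a small sketch of opening one of the gzipped, tab-delimited exports with pandas and supplying the column headers. The file name and header list are placeholders; the real column names come from the Canvas Data schema documentation for each table.

import pandas as pd

# File name and header list are placeholders; the real column order comes from
# the Canvas Data schema documentation for each table.
columns = ["id", "canvas_id", "name", "code", "workflow_state"]

# pandas can read the gzipped, tab-delimited export directly; the flat files
# ship without a header row, so the names have to be supplied.
courses = pd.read_csv(
    "course_dim-00000.gz",
    sep="\t",
    header=None,
    names=columns,
    compression="gzip",
)

print(courses.head())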