
Recent Articles

29
Nov

Sabbatical 2018 Week 14: Starting the Capstone Project

I successfully completed the first five courses of the Big Data Specialization through Coursera, and I have to say it was crazy. I never knew there was so much to learn about data analysis. My mind is still spinning. Now I’m expected to put it all together and actually do a project. Whew! Wish me luck. I’m going to need it.

The Capstone project is a 5-week project where I’ll be doing some data exploration, aggregation, and filtering using Splunk. Then in the following week, I will perform classification on the fictional game data using a decision tree in KNIME. Next, I will learn how to use Spark MLlib to do cluster analysis on the simulated game data. This is then followed by exploring a somewhat different dataset, simulated chat data, and performing some graph analytics using Neo4j in the 4th week. Then I will be gathering results together and preparing a presentation and report. I will complete the project by submitting my presentation and final report in the final week.

At this point, I am not confident I will be able to complete this, but I said that about my doctoral dissertation too, and I completed that. It’s amazing what you can accomplish if you just try. So try I will.

I will say, however, that before you sign up for this specialization, they make it sound like anyone with no prior experience could learn how to be a data analyst just by completing it. These people are high. You really need a background in coding to be a data analyst. Learning how to use all the many programs was a challenge, and it was impossible to memorize the code needed to run them. It was fun, however, copying and pasting the code and watching it do stuff.

The specialization did give me insight into all that goes into data analysis and trust me, there is a lot. The capstone course doesn’t officially begin until February, but I was able to enroll and get started. I’ll probably stop and take a break for the holiday and finish this in February.

4
Nov

Sabbatical 2018 Week 12: Canvas Data Portal

I was finally able to get access to Maricopa’s instance of Canvas Data Portal, so this week I’ll share a little about what it is and how we might use it. I connected with Randy Anderson at the district, our Canvas administrator, and he was very helpful in getting me and Lisa set up for our accounts and permissions. I guess he figured we couldn’t do too much damage. I’ll explain that in a bit.

Canvas Data is a service from Canvas that provides admins with optimized access to their data for reporting and queries. “Canvas Data Admins can download flat files or view files hosted in an Amazon Redshift data warehouse. The data will be an extracted and transformed version of a school’s Canvas activity and can be accessed using any open database connectivity (ODBC) analytics tool to generate custom data visualization and reports” (Canvas). I’ve been learning about some analytics tools in my Big Data Specialization courses. Unfortunately for me, none of the 30+ tools mentioned so far are ODBC analytics tools. They were mostly big data management systems (BDMS).

Example course data dashboard created in Tableau


The most common ODBC analytics tools include Excel (using Amazon Redshift), Tableau, R, and SQL Workbench/J. I’m scheduled to learn both Tableau and R in the spring in either the Johns Hopkins Data Science Specialization on Coursera or the Data Visualization with Tableau Specialization on Coursera. I haven’t decided which specialization I’ll officially do, but I’ll be able to access both.

Apparently, the district office checked into the cost of hosting in an Amazon Redshift data warehouse, and it was cost-prohibitive. This is the method that many other institutions choose, while others do in-house database management. Either way, this decision is beyond me, and I’ll just have to wait to see how it pans out in Maricopa if it does at all. In the meantime, I’m hoping to be able to play with smaller sets of data from Canvas Data Portal using the tab-delimited (.txt) flat files. “Canvas Data parses and aggregates the over 280 million rows of Canvas usage data generated daily and exports them” (Canvas). That’s a lot of data. And I’m guessing without a specialized database or warehouse, we’ll have trouble utilizing these files.

The portal includes a Canvas Data schema which includes documentation that explains all the table data that is exported from Canvas. We could use this data to answer a multitude of questions about our students, instructors, and the courses in Canvas. For instance, Canvas suggests we could answer questions like, “What makes a successful department/course/instructor?” “How can our institution improve student retention?” and even “How are students doing in the course (current and historical)?” There’s much information to be gained from the data.

I get a little overwhelmed just thinking about all there is left to learn. The Canvas Data FAQ is a good place to start. From there I’ve already learned how to open the flat files and how to add headers to the columns. I’ve also bookmarked the R FAQ and a page for 7-Zip, a free file archiver with a high compression ratio. It’s the tool needed to open the Canvas Data .gz files. In the spring I’ll also get to visit a couple of colleges that already have all this set up and running. A good example of what would be really cool is the Unizin Data Warehouse at Indiana University. It gives faculty direct access to Canvas data for their courses. I would love to have that set up in Maricopa. Someday maybe.
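For anyone curious what "open the flat files and add headers" could look like in code, here is a minimal Python sketch. It assumes a gzipped, tab-delimited file with no header row, as described above. The column names and the sample rows are made up for illustration; the real column names would come from the Canvas Data schema documentation.

```python
import csv
import gzip
import os
import tempfile

# Hypothetical column names, for illustration only. The real ones come
# from the Canvas Data schema documentation for each table.
HEADERS = ["id", "name", "enrollment_count"]


def read_flat_file(path, headers):
    """Read a gzipped, tab-delimited file with no header row into dicts."""
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        # Pair each value with its column name, which "adds the headers".
        return [dict(zip(headers, row)) for row in reader]


def write_sample(path):
    """Create a tiny stand-in file so the sketch runs without real data."""
    with gzip.open(path, mode="wt", encoding="utf-8", newline="") as handle:
        handle.write("101\tENG101\t24\n102\tENG102\t19\n")


sample_path = os.path.join(tempfile.mkdtemp(), "course_sample.gz")
write_sample(sample_path)
rows = read_flat_file(sample_path, HEADERS)
for row in rows:
    print(row)
```

Every value comes back as text, so numeric columns would still need converting before any math. For the full-size exports a database or warehouse is probably still the better home, since this sketch loads the whole file into memory.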

 

31
Oct

Sabbatical 2018 Week 10: What Happened to Weeks 7-9?

Boy, time sure flies when you’re busy. It’s already week 10. I had to go back to August to count the weeks because I barely know what day it is, let alone how many weeks have passed in the semester. This of course is all good. While on sabbatical, I’ve also been renovating a vacation home we purchased up in Happy Jack, so my life has been consumed with data and renovations for the last three months. Thankfully one of those projects is almost complete. And that would not be the data project. On we roll.

Big Data is still my world at the moment. I’m currently in course 5 of the Big Data Specialization on Coursera. Course 5 is Graph Analytics for Big Data. I’m learning about how real world data science problems can be modeled as graphs along with various tools and techniques. The biggest thing I’ve learned so far is that most people don’t know what graphs are. Most people think graphs are these pretty pie charts.

These are not graphs apparently. These are pie charts. I knew that. I love pie charts. We are not learning how to make pie charts in the Graph Analytics for Big Data course. We’re learning how to make this below. This is a graph with nodes and edges.

I should have known this was not going to be simple. Graph theory is tied to math, so graphs are “mathematical structures used to model pairwise relations between objects.” “Graphs can be used to model many types of relations and processes in physical, biological, social and information systems” (Wikipedia).

A good example of how graphs can be used is with fraud detection. Graph databases are uniquely positioned to spot the connections between large data sets and identify patterns, a useful trait when it comes to spotting complex, modern fraud techniques. A better example is the product recommendations you get on Amazon and other online retail sites. Amazon can pull together product, customer, inventory, supplier and social sentiment data into a graph database to spot patterns and make smarter recommendations to you.

I’m still wrapping my head around how graphs can be useful in education. For an assignment I designed a graph around a peer review assignment for students. It’s pretty basic, but in my mind this might be useful data to find patterns to help students improve their work.
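To make the nodes-and-edges idea concrete, here is a small Python sketch of how a peer review graph might be stored and queried. It is not the graph from my assignment: the student names and review pairs are invented, and a plain dictionary stands in for a real graph database.

```python
from collections import defaultdict

# Hypothetical peer review data: (reviewer, author) pairs. Each pair is a
# directed edge from the student who wrote the review to the student
# whose work was reviewed.
edges = [
    ("Ana", "Ben"),
    ("Ana", "Cruz"),
    ("Ben", "Cruz"),
    ("Cruz", "Ana"),
]


def build_graph(edge_list):
    """Return an adjacency list: each node maps to the nodes it points to."""
    graph = defaultdict(list)
    for reviewer, author in edge_list:
        graph[reviewer].append(author)
        graph[author]  # make sure authors appear as nodes even with no outgoing edges
    return dict(graph)


def reviews_received(edge_list):
    """Count incoming edges (reviews received) for each student."""
    counts = defaultdict(int)
    for _, author in edge_list:
        counts[author] += 1
    return dict(counts)


graph = build_graph(edges)
received = reviews_received(edges)
print(graph)
print(received)
```

Counting incoming edges is about the simplest pattern you can pull out of a graph. Here it would show which students got the most feedback.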

Later in this course we will be learning how to use Neo4j, a graph database management system, and GraphX, Apache Spark’s API for graphs and graph-parallel computation. So I imagine my graphs in another week will be much better.

Next post I’ll share some information about Canvas Data Portal, as I now have access to Maricopa’s instance. It’s so exciting even though I don’t really know how to “look” at the data yet, but I can see all the flat files. I just need a database to magically appear with a data scientist attached to help. 🙂

26
Sep

Sabbatical 2018 Week 6: The Big Data Landscape is Ridiculously Huge

Last week I completed course two in the Big Data Specialization: Big Data Modeling and Management Systems. This was another very technical course. We gained in-depth knowledge of why big data modeling and management is essential in preparing to gain insights from your data. We gained knowledge of real-world big data modeling and management use cases in areas such as energy and gaming. We also learned about different kinds of data models, how to describe streaming data and the challenges it presents, and the differences between a DBMS and a BDMS.

We did a lot of playing in the Cloudera VM again. I type in the codes given and things magically happen. It’s kind of cool, but no way I’m going to remember how to replicate any of this. For example, we learned how to import and query text documents with Lucene and perform weighted queries to see how rankings change. We learned how to perform statistical operations and layout algorithms on graph data in Gephi. I believe we actually installed and ran that program on our computers instead of in Cloudera. Then back in Cloudera we learned how to view semi-structured data streaming in real-time from a weather station and create plots of streaming weather station data.

If your head is spinning from just the few programs I mentioned already, it’s going to explode when you hear we also were introduced to Redis, Aerospike, AsterixDB, Solr, and Vertica. I thought I might pass out. The Big Data landscape is ridiculously huge. How anyone knows all of these programs is beyond me.

Also this week I reached out to district IT to schedule a meeting with the Canvas administrators to discuss Canvas Data Portal. It sounds like they have already started doing some exploring on their own. In fact, I was told to contact another individual who had already done some initial investigation in the use of Amazon Redshift. And a few developers have already explored it as part of a Transformation data project. It also looks like I’ll be able to get access to our Data Portal soon as well so I can start exploring. This is great news, as I thought this one step would be the one thing to derail my sabbatical proposal. Things are moving forward. I’m a little behind on my reading and annotated bib, but besides that I’m right on track. Yay, me!

 

 

 

25
Sep

Sabbatical 2018 Week 5: Not All Work and No Play

If you’ve never taken an extended sabbatical from your job, you’re really missing out. It’s a great experience that I’m grateful to have taken advantage of twice in my 20 years in Maricopa. I really think I’ve worked hard enough to deserve it, and you probably have too. According to the MCLI website,

“A sabbatical leave is an opportunity to broaden or deepen educational interests, to explore new areas, or examine instructional methods to enhance the mission of the college. A sabbatical leave gives faculty a respite from their normal duties in order to provide them an opportunity to grow professionally. The goal of a sabbatical leave project is to engage faculty in the areas of study, research, travel, work experience, or other creative activity, and to contribute to the institution as a whole upon his/her return to the college.”

If you’re into learning new things then a sabbatical in Maricopa is for you. However, in the more generic sense the word sabbatical, which can be a noun or an adjective, comes from the Greek word sabbatikos, which means “of the Sabbath,” the day of rest that happens every seventh day. Most teaching jobs come with the promise of a sabbatical, which is a year of not having to teach, though you still get paid. It’s also interesting to know that only 5% of US companies offer paid sabbaticals. So I’m not complaining that I still have to work during my sabbatical. At least it’s something I’m interested in learning and doesn’t involve grading hundreds of essays. It’s definitely a respite from the norm.

The challenging part for me is getting used to doing less. Many faculty do more than just teach a 15-hour schedule, and Maricopa is good about providing opportunities and compensating those of us who do more. For the past 4 years, I’ve been wrapped up in the world of professional development, online learning and OER. I’ve taught very little, but worked more than I have in previous years collaborating, coordinating, and strategizing with our Instructional Designer, CTLE Staff, eCourses faculty lead and faculty developers. My involvement also included working district wide with other CTL directors, elearning and OER leaders. It’s hard to just go cold turkey and not talk to or work with any of those people anymore. My only saving grace is that many of those people are personal friends and we still chat when I sneak on campus to visit or attend a planned happy hour. Shout out to Meghan, my better half for the last 4 years.

One major plus is that my other partner in crime for the past 4 years, Dr. Lisa Young, is also on sabbatical this year, and her sabbatical proposal is similar to mine – Big Data. And as the Faculty Director of SCC’s CTL and Co-Tri-Chair of the Maricopa Millions project, she’s been involved in all the same things I have. So she can relate. Part of our sabbatical plan is to hike every other week to discuss our projects and other stuff. It’s comforting to know she’s learning the same things and good to have someone to bounce ideas off of. And it doesn’t hurt to get some exercise in on a regular basis. Below is evidence of our endeavors.

The best part of a sabbatical is you get to determine your schedule, so there’s a lot of flexibility in there for doing the things you never seem to have time for. The reality is that many of the people you’d like to do those things with are still working hard and stressed out. Ha! (Sorry Beth! Thanks for visiting me yesterday)

And one more for the road. So far we’ve hiked Holbert and Mormon Trails on South Mountain, Cholla Trail on Camelback, and Trail 100 in Dreamy Draw followed by breakfast at Dick’s Hideaway, Scramble, and First Watch. Breakfast is an added bonus. What is up with my hair?! Anyway, I’m looking forward to it cooling off so we don’t have to hike so early. Then sabbatical life will be truly perfect. Well, if they can figure out how to pay me correctly then it will be truly perfect.

12
Sep

Sabbatical 2018 Week 4: Where’s My Money?

It has become painfully clear that I will never be a data analyst. That’s not necessarily a bad thing considering I already have a job as an educator at a great community college. Thank goodness for that because I’m a little over my head here in my Big Data Specialization from the University of California San Diego. Somehow I’m learning just enough to get by, but don’t ask me anything specific. You really have to be a programmer to use this stuff.

Course 2 was Big Data Modeling and Management Systems and it was very technical. It was all about Big Data technologies, and frankly I’m happy to leave that part to the IT experts. Systems and tools discussed included: AsterixDB, HP Vertica, Impala, Neo4j, Redis, SparkSQL <eyes glass over>. We gained in-depth knowledge of why big data modeling and management is essential in preparing to gain insights from your data, along with knowledge of real-world big data modeling and management use cases in areas such as energy and gaming. We also learned about different kinds of data models, how to describe streaming data and the challenges it presents, and the differences between a DBMS and a BDMS.

I somehow managed to complete the final assignment for this course, which was to design a data model for a fictitious game: “Catch the Pink Flamingo.” The strangest thing about this whole Coursera setup is the assignments are peer reviewed. I’m awaiting my fate as I type. I wasn’t really clear if what I was doing was correct, but I did my best and submitted the assignment. Then I had to go in and review my classmates’ work. Yeah, right? It looked good. Nothing like mine, but hey, who’s right? I guess we’ll see once my assignment is peer reviewed.

Two courses down; four to go. Then on to the Johns Hopkins Data Science Specialization. In the meantime, I’ve reached out to our district IT person in charge of Canvas. I’m hoping to meet with her soon to discuss Canvas Data Portal. ITS has a proposal process when our resources are needed for more than 20 hours, so I have to go to the PMO site, which is where a business case can be initiated to start the process. Additionally, the IITGC provides prioritization of business cases/projects for ITS, so I’ll have to cross my fingers and hope my case gets prioritized.

Okay, back to figuring out how to get paid correctly. Hey, Maricopa, where’s my money?

 

31
Aug

Sabbatical 2018 Week 3: What the Hadoop?

So I finished my first Coursera course: Introduction to Big Data. It was the first and shortest of the 6 Big Data specialization courses. It was only a 3-week course. I added my course completion certificate to my LinkedIn profile, which needs to be updated. (hint hint)

I really like the reporting system in Coursera. I posted a screenshot that shows progress. It really helps the student know exactly where they are in the course and what needs to be done and when it needs to be done. If there is something to be done, it will be listed first with a Start button to quickly get to that part of the course, as you can see in the image. Makes me wish I had something like this for my students in my courses in Canvas.

The last part of this course had some programming. We got a short introduction to Hadoop and how to run the Wordcount program. Surprisingly this time I found playing in the Cloudera VirtualBox fun. Amazing how that is when you don’t run into errors and the programs work as expected. Or more accurately when there aren’t any user errors. I actually felt like I knew what I was doing. Maybe a little overconfident, but eh, who cares.

I can’t imagine that I would remember the code to run the program: hadoop jar /usr/jars/hadoop-examples.jar wordcount in the future, but I do have good notes for future reference. And I’m still a little fuzzy about MapReduce, as initially I couldn’t see a good use for it in my work. Our last discussion in this class stumped me a bit: What are some examples in your work or daily life where applying the map reduce algorithm can speed up the process of the situation? Dang, that’s a good question. Ha! I guess I’m still trying to figure that one out beyond the basic sorting students by demographic data or past grades.
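Since MapReduce is still fuzzy for me, here is a rough Python sketch of the idea behind the word count example. This is not Hadoop's code and it runs on a single machine; it only imitates the three steps (map, shuffle, reduce) that Hadoop would spread across many machines. The function names and sample sentences are my own inventions.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one line of text."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all the emitted values by their key (the word)."""
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_phase(word, counts):
    """Reduce step: collapse the list of 1s for a word into a single total."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle, and reduce in order over a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


totals = word_count(["the cat sat", "The hat sat"])
print(totals)
```

The speedup in real MapReduce comes from the fact that each line can be mapped independently and each word can be reduced independently, so the work can be split up. The same shape would fit counting students per demographic group or per grade band.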

I’m also finishing week 4 of the second course, Big Data Modeling and Management Systems, this week. Who knew there was so much to learn about data modeling? Data models deal with many different types of data formats. Streaming data is becoming ubiquitous, and working with streaming data requires a different approach from working with static data. So this week we are getting practical hands-on experience working with different forms of streaming data.

30
Aug

Big Data & Analytics Annotated Bibliography

As part of my sabbatical, I need to gain a basic understanding of statistics and data structure and get an overall sense of what educational data analytics entails, so I did some research and created a short reading list of published articles and books to read. Last summer I read Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy by Cathy O’Neil as part of our learning analytics professional learning community (PLC) at GCC. I also started reading a few of the articles I found including Academic analytics and data mining in higher education and Educational Data Analytics Technologies For Data-Driven Decision Making In Schools.

I plan to add to this list as I go, so if you have any suggested articles or books you think I should read, send them my way. Over the course of this semester I will be reading and adding to my Big Data & Analytics Annotated Bibliography. I’ve created this post to share my work. I’ve also included my Appendix D: Reference/Reading list for Sabbatical below.

Big Data & Analytics Annotated Bibliography

Baepler, P., & Murdoch, C. (2010). Academic analytics and data mining in higher education.
International Journal for the Scholarship of Teaching and Learning, 4(2).
doi:10.20429/ijsotl.2010.040217

This essay links the concepts of academic analytics, data mining in higher education, and
course management system audits and suggests how these techniques and the data they produce
might be useful to those who practice the scholarship of teaching and learning. Academic
analytics, educational data mining, and CMS audits, although in their incipient stages, can
begin to sift through the noise and provide SoTL researchers with a new set of tools to
understand and act on a growing stream of useful data.

Appendix C

Sabbatical Reading List

Baepler, P., & Murdoch, C. (2010). Academic analytics and data mining in higher education. International Journal for the Scholarship of Teaching and Learning, 4(2). doi:10.20429/ijsotl.2010.040217

Delaware County Community College. (n.d.). Big data, algorithms, and predictive analytics – Learning analytics – LibGuides at Delaware County Community College. Retrieved July 13, 2017, from http://libguides.dccc.edu/learning_analytics/big_data

Herold, B. (2016, January 11). The future of big data and analytics in K-12 education – Education Week. Retrieved from http://www.edweek.org/ew/articles/2016/01/13/the-future-of-big-data-and-analytis.html

Lawson, J. (2015). Data science in higher education: A step-by-step introduction to machine learning for institutional researchers. Chico, CA.

Picciano, A. G. (2012). The evolution of big data and learning analytics in American higher education. Online Learning, 16(3). doi:10.24059/olj.v16i3.267

Reinitz, B. (2017, August 10). 2017 Trends and Technologies: Analytics. Retrieved from https://library.educause.edu/resources/2017/8/2017-trends-and-technologies-analytics

Sampson, D. G. (2016, October 22). Learning analytics: Analyze your lesson to discover more about your students – eLearning Industry. Retrieved from https://elearningindustry.com/learning-analytics-analyze-lesson

Sampson, D. G. (2016, October 20). Educational data analytics technologies for data-driven decision making in schools – eLearning Industry. Retrieved from https://elearningindustry.com/educational-data-analytics-technologies

27
Aug

Pre-Sabbatical InstructureCarn – Summer 2018

I was checking in on my timeline I presented in my sabbatical proposal and remembered that my FPG travel in July was part of my sabbatical. My plan for Summer 2018 included attending the annual Canvas conference, InstructureCarn, which was held in late July in Colorado. I used FPG funds for this conference travel. At the conference I made some connections with more schools that are using Canvas Data Portal that I can hopefully connect with later during my sabbatical. 

The conference had a carnival theme and a ton of sessions on Canvas Data, so I had a nice lineup to choose from. Most of the session presenters were actual data scientists, so a lot of what they talked about was over my head – very technical. It will be nice to go back and watch a few of the sessions again once Instructure posts the recordings online and I know a little more about the technical side. For instance, the first session I sat in was Concept-Based Data Analysis: A New Method for Organizing and Visualizing Data Using Course Design Principles. Fascinating stuff, but I had no idea how to get to where they were. The presentation explained that by combining sound pedagogical principles with new methods of data collection from Canvas, there’s a method for visualizing classroom data to evaluate the effectiveness of course material, highlight concepts that call for improvement, and present this data to students, faculty, and administrators in a holistic format. Yes, please!

The next session I attended made a lot more sense to me, a novice, and was geared more to what I imagine I could possibly persuade our campus to set up. The presentation, Determining Student Activity in Canvas Data, showed how you can efficiently clean and use the data in Canvas Data to build a database and determine student activity and grades from just a few tables. The one thing I’m learning about all these great data projects is that it takes a team to develop them. They get buy-in from admin, IT, Student Services, faculty and data scientists before they create anything. That could end up being a challenge for me.

Overall, I attended 10 sessions that had something to do with Canvas Data or Analytics. Luckily for me Instructure had a lot of planned fun carnival activities built into the day and evening because my brain hurt after some of those sessions. But it was nice to unwind in the evening with colleagues and friends. We actually attended a carnival with all kinds of different street food, rides and games. I mean, who could pass up a table full of candied apples? We couldn’t!

I think Beth may have had too much sugar.

And yes, I did eat the whole thing. We even got little panda bears and all kinds of other swag.

All in all it was time well spent, both in the conference sessions and all the fun in between. I will say my biggest disappointment was that a session I was looking forward to disappeared from the program. It was the perfect session for me: A Non-Programmers Guide to Using the Canvas Data Portal. Yes! Sign me up. Nope. Gone. 🙁 They enticed me with: “The Canvas Data Portal is a great tool, but can be intimidating for non-technical or non-programming professionals. In this session, I will go through my personal journey learning and utilizing the Canvas Data Portal as well as provide tutorials, tips, and strategies for non-technical or non-programming individuals so they can fully utilize the Canvas Data Portal in their Canvas Instance.” But then they didn’t show up. No “personal journey.” No “tutorials, tips, and strategies.” I should track them down.

 

26
Aug

Sabbatical 2018 Week 2: Big Data Modeling

I survived week 2 of my sabbatical. I spent a good portion of time learning about big data modeling. I learned a few things including how to identify the major components in semi-structured data from a weather station and how to create plots of weather station data. I’m not confident I really learned how to do this; however, I was able to follow directions and type in the correct commands to get the desired results.

The challenge is that we’re using this Oracle VM VirtualBox, and I’m not certain why. For instance, one of the first steps was to open a spreadsheet application in the terminal shell. All was fine until I got an error message when running the command “oocalc”. No spreadsheet application for me. I checked the discussion forum and found others have had this same error, but all the suggested fixes didn’t work for me. I posted my problem and have not yet received any help. Now I understand why so few people complete MOOCs. You’re on your own.

Oh well. Screw the terminal. I just downloaded a LibreOffice spreadsheet application to my computer and loaded up the CSV file and everything worked fine. I did try to use Microsoft Excel at first, but the instructions didn’t match up.

Later in the week I had to go back to the dreaded VirtualBox to learn how to display the nested structure of a JSON file and to extract data from a JSON file. This time we were playing around with some Twitter data and everything was fine. My confidence was boosted, although temporarily. I had some challenges in the terminal shell in the next lesson trying to view the dimensions and pixel values in an image. It didn’t work at all for me. So I rolled my eyes and sent a silent prayer that that knowledge would never be necessary. I’m starting to get a feel for how some of my students might feel when learning new concepts in Comp I and II. They’re probably praying that I never ask them to demonstrate certain skills ever. I feel your pain.
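For reference, the JSON part of that lesson (showing the nested structure, then pulling out a value) might look something like this in Python. This is not the command we ran in the course. The sample record is invented and only loosely shaped like a tweet, and the field names are assumptions for illustration.

```python
import json

# Invented sample record, loosely shaped like a tweet. The field names
# are assumptions for illustration, not the real Twitter data we used.
SAMPLE = '{"text": "hello", "user": {"name": "Pat", "followers": 10}, "tags": ["a", "b"]}'


def describe(value, depth=0):
    """Return indented lines naming each key and the type of its value."""
    lines = []
    indent = "  " * depth
    if isinstance(value, dict):
        for key, child in value.items():
            lines.append(f"{indent}{key}: {type(child).__name__}")
            lines.extend(describe(child, depth + 1))
    elif isinstance(value, list):
        # Lists have no keys of their own, so only look inside their items.
        for child in value:
            lines.extend(describe(child, depth))
    return lines


record = json.loads(SAMPLE)
structure = describe(record)
print("\n".join(structure))

# Extracting data is then a matter of following the keys down the nesting.
user_name = record["user"]["name"]
print(user_name)
```

The indentation in the printed outline shows which fields are nested inside which, and that outline tells you the path of keys to follow to reach a value.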

I ended the week with a few more mishaps in the VirtualBox. I’m really hoping the tool is not a standard tool for data analysis and is just something related to how Coursera works. I’m getting a little tired of watching a video of the tool working great, but when I try it – FAIL! It’s really not good for my ego or my confidence. But I will persist.

Up next I’ll be finishing up the first course: Intro to Big Data and moving on to Week 3 of 6 in the Big Data Modeling and Management Systems course. Can’t wait to use the VirtualBox!

I also need to set up a meeting with district Canvas support to discuss the Canvas Data Portal. They’re going to turn that right on once I ask.