
Posts tagged ‘VirtualBox’


Sabbatical 2018 Week 3: What the Hadoop?

So I finished my first Coursera course: Introduction to Big Data. It was the first and shortest of the six courses in the Big Data specialization, only three weeks long. I added my course completion certificate to my LinkedIn profile, which needs to be updated. (hint hint)

I really like the reporting system in Coursera. I posted a screenshot that shows my progress. It shows the student exactly where they are in the course, what needs to be done, and when it is due. If there is something to be done, it is listed first with a Start button that jumps straight to that part of the course, as you can see in the image. It makes me wish I had something like this for the students in my Canvas courses.

The last part of this course had some programming. We got a short introduction to Hadoop and how to run the Wordcount program. Surprisingly, this time I found playing in the Cloudera VM in VirtualBox fun. Amazing how that is when you don't run into errors and the programs work as expected. Or, more accurately, when there aren't any user errors. I actually felt like I knew what I was doing. Maybe a little overconfident, but eh, who cares.
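To make the Wordcount idea concrete for myself, here is a minimal sketch of the map, shuffle, and reduce steps in plain Python. This is not Hadoop's API or the course's Java code; it just imitates on one machine what Hadoop spreads across a cluster, and the sample text is made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one line of input."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by their key (the word)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: add up the 1s for one word to get its total count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # Each line is mapped independently, which is what lets Hadoop
    # hand different chunks of a big file to different machines.
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    print(word_count(["the quick brown fox", "the lazy dog", "The end"]))
```

The map step never needs to see more than one line at a time, and the reduce step only needs the values for one word, so both can run in parallel on separate machines.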

I can't imagine I'll remember the command to run the program, hadoop jar /usr/jars/hadoop-examples.jar wordcount, but I do have good notes for future reference. I'm also still a little fuzzy about MapReduce, since at first I couldn't see a good use for it in my work. Our last discussion in this class stumped me a bit: What are some examples in your work or daily life where applying the map reduce algorithm can speed up the process of the situation? Dang, that's a good question. Ha! I guess I'm still trying to figure that one out beyond the basics, like sorting students by demographic data or past grades.
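Here is a rough sketch of what "sorting students by demographic data or past grades" might look like as a MapReduce job: map each record to a (group, grade) pair, collect the grades per group, then reduce each group to an average. The record layout, group labels, and grades are all hypothetical, and with a handful of students there is no speedup at all; the pattern only pays off when the records are spread across many machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout: (student_id, demographic_group, final_grade).
Record = Tuple[str, str, float]


def map_record(record: Record) -> Tuple[str, float]:
    """Map step: key each record by its group and keep only the grade."""
    _student_id, group, grade = record
    return group, grade


def reduce_group(group: str, grades: List[float]) -> Tuple[str, float]:
    """Reduce step: average all grades that share the same group key."""
    return group, sum(grades) / len(grades)


def average_by_group(records: Iterable[Record]) -> Dict[str, float]:
    grouped: Dict[str, List[float]] = defaultdict(list)
    for group, grade in map(map_record, records):
        grouped[group].append(grade)  # shuffle: collect values per key
    return dict(reduce_group(g, grades) for g, grades in grouped.items())


if __name__ == "__main__":
    sample = [
        ("s1", "first-generation", 88.0),
        ("s2", "continuing", 92.0),
        ("s3", "first-generation", 76.0),
    ]
    print(average_by_group(sample))
```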

I'm also finishing week 4 of the second course, Big Data Modeling and Management Systems, this week. Who knew there was so much to learn about data modeling? Data models deal with many different types of data formats. Streaming data is becoming ubiquitous, and working with it requires a different approach than working with static data. So this week the course is giving us hands-on practice with different forms of streaming data.
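One way I can picture the difference between static and streaming data: with static data you have the whole list and can compute over it at once, while with a stream you update a running answer as each value arrives and never hold the whole thing. This is a minimal sketch using an average of made-up sensor readings; it is my own illustration, not an example from the course.

```python
from typing import Iterable, Iterator, List


def batch_mean(readings: List[float]) -> float:
    """Static data: the whole dataset is available, so compute over all of it."""
    return sum(readings) / len(readings)


def streaming_mean(readings: Iterable[float]) -> Iterator[float]:
    """Streaming data: update the mean as each reading arrives,
    keeping only a running count and total instead of every value."""
    count = 0
    total = 0.0
    for value in readings:
        count += 1
        total += value
        yield total / count


if __name__ == "__main__":
    for current in streaming_mean([10, 20, 30]):
        print(current)
```

After the last reading, the streaming answer matches the batch answer, but the streaming version could keep going forever on an endless feed.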


Sabbatical 2018 Week 2: Big Data Modeling

I survived week 2 of my sabbatical. I spent a good portion of it learning about big data modeling. I learned a few things, including how to identify the major components of semi-structured data from a weather station and how to create plots of weather station data. I'm not confident I really learned how to do this; however, I was able to follow directions and type in the correct commands to get the desired results.

The challenge is that we're using this Oracle VM VirtualBox, and I'm not certain why. For instance, one of the first steps was to open a spreadsheet application from the terminal shell. All was fine until I ran the command "oocalc" and got an error message. No spreadsheet application for me. I checked the discussion forum and found that others had the same error, but none of the suggested fixes worked for me. I posted my problem and have not yet received any help. Now I understand why so few people complete MOOCs. You're on your own.

Oh well. Screw the terminal. I just downloaded LibreOffice's spreadsheet application to my computer, loaded the CSV file, and everything worked fine. I did try Microsoft Excel at first, but the instructions didn't match up.
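Another way around the spreadsheet problem would have been a few lines of Python. This is a minimal sketch assuming a CSV with a header row; the column names and values below are hypothetical stand-ins, since the real weather station file may be laid out differently.

```python
import csv
import io
from typing import Dict, List, Tuple

# Hypothetical excerpt of weather station data; real column names may differ.
SAMPLE_CSV = """timestamp,air_temp,relative_humidity
2018-01-01 00:00,52.1,60
2018-01-01 00:01,51.8,62
2018-01-01 00:02,52.4,59
"""


def load_rows(text: str) -> List[Dict[str, str]]:
    """Parse CSV text into one dict per row, keyed by the header line."""
    return list(csv.DictReader(io.StringIO(text)))


def column_range(rows: List[Dict[str, str]], column: str) -> Tuple[float, float]:
    """Return the (min, max) of a numeric column across all rows."""
    values = [float(row[column]) for row in rows]
    return min(values), max(values)


if __name__ == "__main__":
    rows = load_rows(SAMPLE_CSV)
    print(len(rows), "rows; air_temp range:", column_range(rows, "air_temp"))
```

For a real file on disk, `open("weather.csv", newline="")` (a hypothetical filename) could replace the `io.StringIO` wrapper.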

Later in the week I had to go back to the dreaded VirtualBox to learn how to display the nested structure of a JSON file and extract data from it. This time we were playing around with some Twitter data, and everything was fine. My confidence was boosted, if only temporarily. In the next lesson I had some challenges in the terminal shell trying to view the dimensions and pixel values of an image. It didn't work at all for me. So I rolled my eyes and sent up a silent prayer that I would never need that knowledge. I'm starting to get a feel for how some of my students might feel when learning new concepts in Comp I and II. They're probably praying that I never ask them to demonstrate certain skills. I feel your pain.
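For future reference, here is a small sketch of the same two JSON tasks (showing the nested structure and pulling out specific values) using Python's built-in `json` module instead of the course's terminal tools. The tweet-like record is made up and much smaller than real Twitter data.

```python
import json
from typing import Any, Iterator, Tuple

# Hypothetical tweet-like record; real Twitter data has many more fields.
RAW = """
{"id": 1, "text": "Learning Hadoop",
 "user": {"screen_name": "teacher", "followers_count": 42},
 "entities": {"hashtags": [{"text": "BigData"}, {"text": "MOOC"}]}}
"""


def walk(node: Any, path: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (path, value) for every leaf in a nested JSON value,
    using dots for object keys and [i] for list positions."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from walk(value, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from walk(value, f"{path}[{index}]")
    else:
        yield path, node


tweet = json.loads(RAW)

# Display the nested structure as a flat list of paths.
for path, value in walk(tweet):
    print(path, "=", value)

# Extract specific nested values by indexing through the structure.
screen_name = tweet["user"]["screen_name"]
hashtags = [tag["text"] for tag in tweet["entities"]["hashtags"]]
print(screen_name, hashtags)
```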

I ended the week with a few more mishaps in the VirtualBox. I'm really hoping it isn't a standard tool for data analysis and is instead just part of how Coursera runs this course. I'm getting a little tired of watching a video of the tool working great, and then when I try it: FAIL! It's really not good for my ego or my confidence. But I will persist.

Up next I'll be finishing up the first course, Intro to Big Data, and moving on to Week 3 of 6 in the Big Data Modeling and Management Systems course. Can't wait to use the VirtualBox!

I also need to set up a meeting with district Canvas support to discuss the Canvas Data Portal. They’re going to turn that right on once I ask.