
Posts from the ‘Big Data’ Category

26
Aug

Sabbatical 2018 Week 2: Big Data Modeling

I survived week 2 of my sabbatical. I spent a good portion of my time learning about big data modeling. I learned a few things, including how to identify the major components in semi-structured data from a weather station and how to create plots of weather station data. I'm not confident I really learned how to do this; however, I was able to follow directions and type in the correct commands to get the desired results.

The challenge is that we're using Oracle VM VirtualBox, and I'm not certain why. For instance, one of the first steps was to open a spreadsheet application from the terminal shell. All was fine until I got an error message when running the command "oocalc". No spreadsheet application for me. I checked the discussion forum and found that others have had the same error, but none of the suggested fixes worked for me. I posted my problem and have not yet received any help. Now I understand why so few people complete MOOCs. You're on your own.

Oh well. Screw the terminal. I just downloaded a LibreOffice spreadsheet application to my computer and loaded up the CSV file and everything worked fine. I did try to use Microsoft Excel at first, but the instructions didn’t match up.

Later in the week I had to go back to the dreaded VirtualBox to learn how to display the nested structure of a JSON file and how to extract data from it. This time we were playing around with some Twitter data, and everything was fine. My confidence got a boost, although only temporarily. In the next lesson I had trouble in the terminal shell trying to view the dimensions and pixel values of an image. It didn't work at all for me. So I rolled my eyes and sent up a silent prayer that that knowledge would never be necessary. I'm starting to get a feel for how some of my students might feel when learning new concepts in Comp I and II. They're probably praying that I never ask them to demonstrate certain skills. I feel your pain.
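For anyone curious what "displaying the nested structure" and "extracting data" from JSON amount to, here is a minimal sketch in Python using only the standard `json` module. It is not the course's method (the course ran its own commands in the VirtualBox terminal), and the tweet-like record and the helper name `structure_lines` are made up for illustration.

```python
import json

def structure_lines(value, indent=0):
    """Return the keys of a nested JSON value as indented lines (hypothetical helper)."""
    lines = []
    pad = "  " * indent
    if isinstance(value, dict):
        for key, child in value.items():
            lines.append(pad + key)
            lines.extend(structure_lines(child, indent + 1))
    elif isinstance(value, list) and value:
        # Treat the first element as representative of the whole list.
        lines.append(f"{pad}[list of {len(value)}]")
        lines.extend(structure_lines(value[0], indent + 1))
    return lines

# A tiny, invented tweet-like record; not the course's Twitter dataset.
tweet = json.loads(
    '{"id": 1, "text": "hello",'
    ' "user": {"screen_name": "teacher", "followers_count": 42},'
    ' "entities": {"hashtags": [{"text": "bigdata"}]}}'
)

# Display the nested structure, one indent step per level.
print("\n".join(structure_lines(tweet)))

# Extract specific values by walking down the nested keys.
screen_name = tweet["user"]["screen_name"]
hashtags = [h["text"] for h in tweet["entities"]["hashtags"]]
print(screen_name, hashtags)
```

The indentation makes it easy to see that `screen_name` lives inside `user`, which is exactly the path you follow when extracting it.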

I ended the week with a few more mishaps in the VirtualBox. I'm really hoping it isn't a standard tool for data analysis and is instead just part of how Coursera set up this course. I'm getting a little tired of watching a video of the tool working perfectly, and then when I try it: FAIL! It's really not good for my ego or my confidence. But I will persist.

Up next I'll be finishing the first course, Intro to Big Data, and moving on to Week 3 of 6 in the Big Data Modeling and Management Systems course. Can't wait to use the VirtualBox!

I also need to set up a meeting with district Canvas support to discuss the Canvas Data Portal. They’re going to turn that right on once I ask.

13
Aug

Sabbatical 2018 Week 1: Getting Started with Big Data

Coursera: Big Data Specialization


Happy Sabbatical to me and Lisa Young. Today begins my journey into the world of Big Data. I’m starting by taking two Coursera Specializations on big data. A Coursera Specialization is a series of courses that helps you master a skill. I’m beginning with the Big Data Specialization by UC San Diego. This specialization includes 6 courses. Description: “Do you need to understand big data and how it will impact your business? This Specialization is for you. You will gain an understanding of what insights big data can provide through hands-on experience with the tools and systems used by big data scientists and engineers. Previous programming experience is not required! You will be guided through the basics of using Hadoop with MapReduce, Spark, Pig and Hive. By following along with provided code, you will experience how one can perform predictive modeling and leverage graph analytics to model problems. This specialization will prepare you to ask the right questions about data, communicate effectively with data scientists, and do basic exploration of large, complex datasets. In the final Capstone Project, developed in partnership with data software company Splunk, you’ll apply the skills you learned to do basic analyses of big data.”

I was glad to discover this specialization on Coursera because it's exactly what I need for my sabbatical, and the best part is that it only costs $50 a month. I'm anticipating I can finish in 3-4 months. The series is designed to be a part-time endeavor; however, I have lots of time to devote to the courses. UC San Diego is an academic powerhouse, recognized as one of the top 10 public universities by U.S. News and World Report, so I'm pleased to be learning from this elite group of instructors. The San Diego Supercomputer Center (SDSC) at UC San Diego is a leader in data-intensive computing and cyberinfrastructure.

The second specialization I plan to take is the Data Science Specialization by Johns Hopkins University, which includes 10 courses. Description: "Ask the right questions, manipulate data sets, and create visualizations to communicate results. This Specialization covers the concepts and tools you'll need throughout the entire data science pipeline, from asking the right kinds of questions to making inferences and publishing results." I'm a bit apprehensive about this series, as they recommend some programming experience (in any language). They also suggest "a working knowledge of mathematics up to algebra." Ugh! I'm not sure I have a working knowledge of mathematics. I guess we'll see. I somehow managed four college degrees (AA, BA, MA, EdD) and only remember taking one math class (college algebra), which I took way back in 1984. Lucky for me, Coursera offers a course for people like me: Data Science Math Skills by Duke. It's a 4-week course designed to teach the basic math you need to succeed in almost any data science math course, and it was created for learners who have basic math skills but may not have taken algebra or pre-calculus. We'll see how this goes. Wish me luck.