
Posts tagged ‘Coursera’

29
Nov

Sabbatical 2018 Week 14: Starting the Capstone Project

I successfully completed the first five courses of the Big Data Specialization through Coursera, and I’ll have to say it was crazy. I never knew there was so much to learn about data analysis. My mind is still spinning. Now I’m expected to put it all together and actually do a project. Whew! Wish me luck. I’m going to need it.

The Capstone project is a 5-week project where I’ll be doing some data exploration, aggregation, and filtering using Splunk. Then in the following week, I will perform classification on the fictional game data using a decision tree in KNIME. Next, I will learn how to use Spark MLlib to do cluster analysis on the simulated game data. This is then followed by exploring a somewhat different dataset, simulated chat data, and performing some graph analytics using Neo4j in the 4th week. Then I will be gathering results together and preparing a presentation and report. I will complete the project by submitting my presentation and final report in the final week.

At this point, I am not confident I will be able to complete this, but I said that about my doctoral dissertation too, and I completed that. It’s amazing what you can accomplish if you just try. So try I will.

I will have to say, however, that before you sign up for this specialization, they make it sound like anyone with no prior experience could learn how to be a data analyst just by completing it. These people are high. You really need a background in coding to be a data analyst. Learning how to use all the many programs was a challenge, and it was impossible to memorize the code needed to run them. It was fun, however, copying and pasting the code and watching it do stuff.

The specialization did give me insight into all that goes into data analysis and trust me, there is a lot. The capstone course doesn’t officially begin until February, but I was able to enroll and get started. I’ll probably stop and take a break for the holiday and finish this in February.

12
Sep

Sabbatical 2018 Week 4: Where’s My Money?

It has become painfully clear that I will never be a data analyst. That’s not necessarily a bad thing considering I already have a job as an educator at a great community college. Thank goodness for that because I’m in a little over my head here in my Big Data Specialization from the University of California San Diego. Somehow I’m learning just enough to get by, but don’t ask me anything specific. You really have to be a programmer to use this stuff.

Course 2 was Big Data Modeling and Management Systems and it was very technical. It was all about Big Data technologies, and frankly I’m happy to leave that part to the IT experts. Systems and tools discussed included: AsterixDB, HP Vertica, Impala, Neo4j, Redis, SparkSQL <eyes glass over>. We learned why big data modeling and management are essential in preparing to gain insights from your data, and we looked at real-world big data modeling and management use cases in areas such as energy and gaming. We also learned about different kinds of data models, how to describe streaming data and the different challenges it presents, and the differences between a DBMS and a BDMS.

I somehow managed to complete the final assignment for this course, which was to design a data model for a fictitious game: “Catch the Pink Flamingo.” The strangest thing about this whole Coursera setup is that the assignments are peer reviewed. I’m awaiting my fate as I type. I wasn’t really clear if what I was doing was correct, but I did my best and submitted the assignment. Then I had to go in and review my classmates’ work. Yeah, right? It looked good. Nothing like mine, but hey, who’s right? I guess we’ll see once my assignment is peer reviewed.

Two courses down; four to go. Then on to the Johns Hopkins Data Science Specialization. In the meantime, I’ve reached out to our district IT person in charge of Canvas. I’m hoping to meet with her soon to discuss Canvas Data Portal. ITS has a proposal process when our resources are needed for more than 20 hours, so I have to go to the PMO site, which is where a business case can be initiated to start the process. Additionally, the IITGC provides prioritization of business cases/projects for ITS, so I’ll have to cross my fingers and hope my case gets prioritized.

Okay, back to figuring out how to get paid correctly. Hey, Maricopa, where’s my money?

 

31
Aug

Sabbatical 2018 Week 3: What the Hadoop?

So I finished my first Coursera course: Introduction to Big Data. It was the first and shortest of the 6 Big Data specialization courses. It was only a 3-week course. I added my course completion certificate to my LinkedIn profile, which needs to be updated. (hint hint)

I really like the reporting system in Coursera. I posted a screenshot that shows progress. It really helps the student know exactly where they are in the course and what needs to be done and when it needs to be done. If there is something to be done, it will be listed first with a Start button to quickly get to that part of the course, as you can see in the image. Makes me wish I had something like this for my students in my courses in Canvas.

The last part of this course had some programming. We got a short introduction to Hadoop and how to run the Wordcount program. Surprisingly, this time I found playing in the Cloudera VirtualBox fun. Amazing how that is when you don’t run into errors and the programs work as expected. Or more accurately, when there aren’t any user errors. I actually felt like I knew what I was doing. Maybe a little overconfident, but eh, who cares.

I can’t imagine that I would remember the code to run the program: hadoop jar /usr/jars/hadoop-examples.jar wordcount in the future, but I do have good notes for future reference. And I’m still a little fuzzy about MapReduce, as initially I couldn’t see a good use for it in my work. Our last discussion in this class stumped me a bit: What are some examples in your work or daily life where applying the map reduce algorithm can speed up the process of the situation? Dang, that’s a good question. Ha! I guess I’m still trying to figure that one out beyond the basics, like sorting students by demographic data or past grades.
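For future reference, here is a minimal sketch of the idea behind a word count done the MapReduce way, written in plain Python rather than Hadoop. It is only an illustration: the function names (map_phase, shuffle, reduce_phase, word_count) and the sample lines are made up for this example and are not part of Hadoop or the course materials.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one line of text."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values under their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical sample input, standing in for a real text file.
    sample = ["the pink flamingo", "catch the flamingo"]
    print(word_count(sample))
```

The same three steps would cover the student example: emit a (demographic group, 1) pair per student in the map step, and the reduce step gives a head count per group. On a real cluster the map and reduce steps run on many machines at once, which is where the speedup comes from; this sketch runs them one after another on a single machine.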

I’m also finishing week 4 of the second course: Big Data Modeling and Management Systems this week. Who knew there was so much to learn about data modeling? Data models deal with many different types of data formats. Streaming data is becoming ubiquitous, and working with streaming data requires a different approach from working with static data. So this week we are getting practical, hands-on experience working with different forms of streaming data in this course.

26
Aug

Sabbatical 2018 Week 2: Big Data Modeling

I survived week 2 of my sabbatical. I spent a good portion of time learning about big data modeling. I learned a few things including how to identify the major components in semi-structured data from a weather station and how to create plots of weather station data. I’m not confident I really learned how to do this; however, I was able to follow directions and type in the correct commands to get the desired results.

The challenge is that we’re using this Oracle VM VirtualBox, and I’m not certain why. For instance, one of the first steps was to open a spreadsheet application in the terminal shell. All was fine until I got an error message when running the command “oocalc”. No spreadsheet application for me. I checked the discussion forum and found others have had this same error, but none of the suggested fixes worked for me. I posted my problem and have not yet received any help. Now I understand why so few people complete MOOCs. You’re on your own.

Oh well. Screw the terminal. I just downloaded a LibreOffice spreadsheet application to my computer and loaded up the CSV file and everything worked fine. I did try to use Microsoft Excel at first, but the instructions didn’t match up.

Later in the week I had to go back to the dreaded VirtualBox to learn how to display the nested structure of a JSON file and to extract data from a JSON file. This time we were playing around with some Twitter data and everything was fine. My confidence was boosted, albeit temporarily. I had some challenges in the terminal shell in the next lesson trying to view the dimensions and pixel values in an image. It didn’t work at all for me. So I rolled my eyes and sent a silent prayer that that knowledge would never be necessary. I’m starting to get a feel for how some of my students might feel when learning new concepts in Comp I and II. They’re probably praying that I never ask them to demonstrate certain skills ever. I feel your pain.
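For my notes, here is a rough sketch of what “display the nested structure of a JSON file and extract data” can look like in plain Python. It is not the course’s script. The record below is a made-up stand-in for a tweet, and its field names (user, name, followers, text, tags) are assumptions for illustration, not the real Twitter format.

```python
import json


def show_structure(node, indent=0):
    """Return one line per key, indented by depth, with each value's type."""
    lines = []
    pad = "  " * indent
    if isinstance(node, dict):
        for key, value in node.items():
            lines.append(f"{pad}{key}: {type(value).__name__}")
            lines.extend(show_structure(value, indent + 1))
    elif isinstance(node, list) and node:
        # Describe a list by looking at its first element only.
        lines.extend(show_structure(node[0], indent))
    return lines


# Hypothetical record; a real file would be read with json.load(open(path)).
raw = '{"user": {"name": "example", "followers": 10}, "text": "hello", "tags": ["a", "b"]}'
tweet = json.loads(raw)

if __name__ == "__main__":
    print("\n".join(show_structure(tweet)))
    # Extracting data means walking down the keys one level at a time.
    print(tweet["user"]["name"])
```

The point is that nested JSON is just dictionaries inside dictionaries, so extracting a value is a matter of following the keys down from the top.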

I ended the week with a few more mishaps in the VirtualBox. I’m really hoping the tool is not a standard tool for data analysis and is instead just something related to how Coursera works. I’m getting a little tired of watching a video of the tool working great, but when I try it – FAIL! It’s really not good for my ego or my confidence. But I will persist.

Up next I’ll be finishing up the first course: Intro to Big Data and moving on to Week 3 of 6 in the Big Data Modeling and Management Systems course. Can’t wait to use the VirtualBox!

I also need to set up a meeting with district Canvas support to discuss the Canvas Data Portal. They’re going to turn that right on once I ask.

13
Aug

Sabbatical 2018 Week 1: Getting Started with Big Data

Coursera: Big Data Specialization

Happy Sabbatical to me and Lisa Young. Today begins my journey into the world of Big Data. I’m starting by taking two Coursera Specializations on big data. A Coursera Specialization is a series of courses that helps you master a skill. I’m beginning with the Big Data Specialization by UC San Diego. This specialization includes 6 courses. Description: “Do you need to understand big data and how it will impact your business? This Specialization is for you. You will gain an understanding of what insights big data can provide through hands-on experience with the tools and systems used by big data scientists and engineers. Previous programming experience is not required! You will be guided through the basics of using Hadoop with MapReduce, Spark, Pig and Hive. By following along with provided code, you will experience how one can perform predictive modeling and leverage graph analytics to model problems. This specialization will prepare you to ask the right questions about data, communicate effectively with data scientists, and do basic exploration of large, complex datasets. In the final Capstone Project, developed in partnership with data software company Splunk, you’ll apply the skills you learned to do basic analyses of big data.”

I was glad to discover this specialization on Coursera because it’s exactly what I need for my sabbatical, and the best part is it only costs $50 a month. I’m anticipating I can finish in 3-4 months. The series is designed to be a part-time endeavor; however, I have lots of time to devote to the courses. UC San Diego is an academic powerhouse, recognized as one of the top 10 public universities by U.S. News and World Report, so I’m pleased to be learning from this elite group of instructors. The San Diego Supercomputer Center (SDSC) at UC San Diego is a leader in data-intensive computing and cyberinfrastructure.

The second specialization I plan to take is the Data Scientist Specialization by Johns Hopkins University, which includes 10 courses. Description: “Ask the right questions, manipulate data sets, and create visualizations to communicate results. This Specialization covers the concepts and tools you’ll need throughout the entire data science pipeline, from asking the right kinds of questions to making inferences and publishing results.” I’m a bit apprehensive about this series, as they do recommend some programming experience (in any language). And they also suggest “a working knowledge of mathematics up to algebra.” Ugh! I’m not sure I have a working knowledge of mathematics. I guess we’ll see. I somehow managed four college degrees (AA, BA, MA, EDD) and only remember taking one math class (college Algebra), which I took way back in 1984. Lucky for me, Coursera offers a course for people like me: Data Science Math Skills by Duke. It’s a 4-week course designed to teach learners the basic math they will need in order to be successful in almost any data science math course, and it was created for learners who have basic math skills but may not have taken algebra or pre-calculus. We’ll see how this goes. Wish me luck.