Archive for September 26th, 2018

Sabbatical 2018 Week 6: The Big Data Landscape is Ridiculously Huge

Last week I completed course two in the Big Data Specialization: Big Data Modeling and Management Systems. This was another very technical course. We gained an in-depth understanding of why big data modeling and management are essential in preparing to gain insights from your data, and we looked at real-world use cases in areas such as energy and gaming. We also learned to distinguish different kinds of data models, to describe streaming data and the challenges it presents, and to tell a database management system (DBMS) from a big data management system (BDMS).

We did a lot of playing in the Cloudera VM again. I typed in the code we were given and things magically happened. It’s kind of cool, but there is no way I’m going to remember how to replicate any of this. For example, we learned how to import and query text documents with Lucene and run weighted queries to see how the rankings change. We learned how to perform statistical operations and run layout algorithms on graph data in Gephi. I believe we actually installed and ran that program on our own computers instead of in the Cloudera VM. Then, back in Cloudera, we learned how to view semi-structured data streaming in real time from a weather station and create plots of that streaming data.

If your head is spinning from just the few programs I have mentioned already, it’s going to explode when you hear we were also introduced to Redis, Aerospike, AsterixDB, Solr, and Vertica. I thought I might pass out. The Big Data landscape is ridiculously huge. How anyone keeps track of all of these programs is beyond me.

Also this week, I reached out to district IT to schedule a meeting with the Canvas administrators to discuss the Canvas Data Portal. It sounds like they have already started doing some exploring on their own. In fact, I was told to contact another individual who had already done some initial investigation into the use of Amazon Redshift. A few developers have also explored it as part of a Transformation data project. It looks like I’ll be able to get access to our Data Portal soon, too, so I can start exploring. This is great news, as I thought this one step would be the thing to derail my sabbatical proposal. Things are moving forward. I’m a little behind on my reading and annotated bib, but besides that I’m right on track. Yay, me!