
Posts tagged ‘Big Data’


Sabbatical 2018 Week 6: The Big Data Landscape is Ridiculously Huge

Last week I completed course two in the Big Data Specialization: Big Data Modeling and Management Systems. This was another very technical course. We gained in-depth knowledge of why big data modeling and management is essential in preparing to gain insights from data, and we looked at real-world big data modeling and management use cases in areas such as energy and gaming. We also learned about different kinds of data models, how to describe streaming data and the challenges it presents, and the differences between a DBMS and a BDMS.

We did a lot of playing in the Cloudera VM again. I type in the code we’re given and things magically happen. It’s kind of cool, but there’s no way I’m going to remember how to replicate any of this. For example, we learned how to import and query text documents with Lucene and perform weighted queries to see how rankings change. We learned how to perform statistical operations and run layout algorithms on graph data in Gephi. I believe we actually installed and ran that program on our own computers instead of in Cloudera. Then, back in Cloudera, we learned how to view semi-structured data streaming in real time from a weather station and create plots of the streaming weather station data.

If your head is spinning from just the few programs I mentioned already, it’s going to explode when you hear we also were introduced to Redis, Aerospike, AsterixDB, Solr, and Vertica. I thought I might pass out. The Big Data landscape is ridiculously huge. How anyone knows all of these programs is beyond me.

Also this week I reached out to district IT to schedule a meeting with the Canvas administrators to discuss the Canvas Data Portal. It sounds like they have already started doing some exploring on their own. In fact, I was told to contact another individual who had already done some initial investigation into the use of Amazon Redshift. And a few developers have already explored it as part of a Transformation data project. It also looks like I’ll be able to get access to our Data Portal soon so I can start exploring. This is great news, as I thought this one step would be the one thing to derail my sabbatical proposal. Things are moving forward. I’m a little behind on my reading and annotated bib, but besides that I’m right on track. Yay, me!





Big Data & Analytics Annotated Bibliography

As part of my sabbatical, I need to gain a basic understanding of statistics and data structure and get an overall sense of what educational data analytics entails, so I did some research and created a short reading list of published articles and books to read. Last summer I read Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy by Cathy O’Neil as part of our learning analytics professional learning community (PLC) at GCC. I also started reading a few of the articles I found including Academic analytics and data mining in higher education and Educational Data Analytics Technologies For Data-Driven Decision Making In Schools.

I plan to add to this list as I go, so if you have any suggested articles or books you think I should read, send them my way. Over the course of this semester I will be reading and adding to my Big Data & Analytics Annotated Bibliography. I’ve created this post to share my work. I’ve also included my Appendix D: Reference/Reading list for Sabbatical below.

Big Data & Analytics Annotated Bibliography

Baepler, P., & Murdoch, C. (2010). Academic analytics and data mining in higher education. International Journal for the Scholarship of Teaching and Learning, 4(2). doi:10.20429/ijsotl.2010.040217

This essay links the concepts of academic analytics, data mining in higher education, and course management system audits and suggests how these techniques and the data they produce might be useful to those who practice the scholarship of teaching and learning. Academic analytics, educational data mining, and CMS audits, although in their incipient stages, can begin to sift through the noise and provide SoTL researchers with a new set of tools to understand and act on a growing stream of useful data.

Appendix C

Sabbatical Reading List

Baepler, P., & Murdoch, C. (2010). Academic analytics and data mining in higher education. International Journal for the Scholarship of Teaching and Learning, 4(2). doi:10.20429/ijsotl.2010.040217

Delaware County Community College. (n.d.). Big data, algorithms, and predictive analytics – Learning analytics – LibGuides at Delaware County Community College. Retrieved July 13, 2017, from

Herold, B. (2016, January 11). The future of big data and analytics in K-12 education – Education Week. Retrieved from

Lawson, J. (2015). Data science in higher education: A step-by-step introduction to machine learning for institutional researchers. Chico, CA.

Picciano, A. G. (2012). The evolution of big data and learning analytics in American higher education. Online Learning, 16(3). doi:10.24059/olj.v16i3.267

Reinitz, B. (2017, August 10). 2017 Trends and Technologies: Analytics. Retrieved from

Sampson, D. G. (2016, October 22). Learning analytics: Analyze your lesson to discover more about your students – eLearning Industry. Retrieved from

Sampson, D. G. (2016, October 20). Educational data analytics technologies for data-driven decision making in schools – eLearning Industry. Retrieved from


Sabbatical 2018 Week 1: Getting Started with Big Data

Coursera: Big Data Specialization


Happy Sabbatical to me and Lisa Young. Today begins my journey into the world of Big Data. I’m starting by taking two Coursera Specializations on big data. A Coursera Specialization is a series of courses that helps you master a skill. I’m beginning with the Big Data Specialization by UC San Diego. This specialization includes 6 courses. Description: “Do you need to understand big data and how it will impact your business? This Specialization is for you. You will gain an understanding of what insights big data can provide through hands-on experience with the tools and systems used by big data scientists and engineers. Previous programming experience is not required! You will be guided through the basics of using Hadoop with MapReduce, Spark, Pig and Hive. By following along with provided code, you will experience how one can perform predictive modeling and leverage graph analytics to model problems. This specialization will prepare you to ask the right questions about data, communicate effectively with data scientists, and do basic exploration of large, complex datasets. In the final Capstone Project, developed in partnership with data software company Splunk, you’ll apply the skills you learned to do basic analyses of big data.”

I was glad to discover this specialization on Coursera because it’s exactly what I need for my sabbatical, and the best part is it only costs $50 a month. I’m anticipating I can finish in 3-4 months. The series is designed to be a part-time endeavor; however, I have lots of time to devote to the courses. UC San Diego is an academic powerhouse, recognized as one of the top 10 public universities by U.S. News and World Report, so I’m pleased to be learning from this elite group of instructors. The San Diego Supercomputer Center (SDSC) at UC San Diego is a leader in data-intensive computing and cyberinfrastructure.

The second specialization I plan to take is the Data Science Specialization by Johns Hopkins University, which includes 10 courses. Description: “Ask the right questions, manipulate data sets, and create visualizations to communicate results. This Specialization covers the concepts and tools you’ll need throughout the entire data science pipeline, from asking the right kinds of questions to making inferences and publishing results.” I’m a bit apprehensive about this series, as they do recommend some programming experience (in any language). And they also suggest “a working knowledge of mathematics up to algebra.” Ugh! I’m not sure I have a working knowledge of mathematics. I guess we’ll see. I somehow managed four college degrees (AA, BA, MA, EdD) and only remember taking one math class (College Algebra), which I took way back in 1984. Lucky for me, Coursera offers a course for people like me: Data Science Math Skills by Duke. It’s a 4-week course designed to teach learners the basic math they will need to be successful in almost any data science math course, and it was created for learners who have basic math skills but may not have taken algebra or pre-calculus. We’ll see how this goes. Wish me luck.