This course teaches you how to use Scalding, a domain-specific language built on Scala and Cascading, to build distributed applications on Hadoop. It also covers the data science side, using Algebird, an abstract algebra library for Scala, to solve real-world sketching and streaming problems on distributed systems. You will learn how to reason about a variety of problems, how to build and test locally, and how to deploy on Hadoop. You will also learn the algorithms used to solve problems at scale, where performance, compute and memory resources, and the window of time available to process streaming data are all constraints, and how Scalding and Algebird help you work within them. The course also covers some Scala basics to get you up to speed and shows how to monitor, visualize, and troubleshoot your application's workflow and performance problems. Watch this course if you are considering, or already use, Pig, Hive, or another DSL for Hadoop and want not only more power over your workflows but also a DSL that is actively being developed to support up-and-coming execution frameworks like Apache Tez and Apache Spark, with all the flexibility that a full functional programming language like Scala has to offer. If you're serious about learning data science, Lambda architectures, and how to build enterprise-grade applications on Hadoop, then this course is for you.
From developer to analyst, this course tackles a few big questions about big data: Why does this technology exist, and why do I need it? How can I get the best out of it using something familiar like SQL, and how does it all fit together in an ever-evolving ecosystem? The course introduces the concepts of distributed computing, Hadoop, and MapReduce, and then goes into great detail on Apache Hive, a data warehouse system with an SQL-like query language (HiveQL) that can be used with Hadoop and NoSQL databases like HBase and Cassandra. The course presents some challenges you might face when solving real production problems and shows how Hive makes them easier to tackle.
Analyzing terabytes of data can be daunting, so what do you do with petabytes? The era of Big Data is upon us, and it's time to sharpen the toolbox for this new class of systems and technologies. In this course, Ben explains the evolution of Big Data systems as well as the various architectures and popular vendors in this space. After covering the fundamentals of Big Data systems, Ben shows how to access these systems using Tableau Software, and how to work with your Big Data and visualize it in ways that will leave your boss singing your praises.
In this course, ZDNet’s Big Data correspondent Andrew Brust teaches you all about Big Data. This course will get you up and running with the definitions and technologies you need to know, and the vendors you need to know about. By the end of the course, you’ll know what Big Data is, how it can integrate with conventional database and Business Intelligence (BI) technologies, and how to devise a strategy for adopting Big Data in your organization. No prior Big Data or NoSQL knowledge is required, though you will come away with plenty of both. This course is aimed at executives and business decision makers, and is actionable for technologists as well.
Cloud platforms have gone mainstream, and Microsoft’s Windows Azure is among the most important options in this area. This course provides an introduction to Windows Azure, walking through each of its components. The goal is to provide a big-picture overview, explaining what the platform includes and when you would use each of its technologies.