Cloudera Developer Training for Apache Spark

Overview

Cloudera University’s three-day training course for Apache Spark enables participants to build complete, unified Big Data applications combining batch, streaming, and interactive analytics on all their data.

Objectives

Skills learned in this course include:
  • Using the Spark shell for interactive data analysis
  • The features of Spark’s Resilient Distributed Datasets
  • How Spark runs on a cluster
  • Parallel programming with Spark
  • Writing Spark applications
  • Processing streaming data with Spark

Audience

This course is best suited to developers and engineers.

Syllabus

Why Spark?

  • Problems with Traditional Large-Scale Systems
  • Introducing Spark

Spark Basics

  • What is Apache Spark?
  • Using the Spark Shell
  • Resilient Distributed Datasets (RDDs)
  • Functional Programming with Spark

Working with RDDs

  • RDD Operations
  • Key-Value Pair RDDs
  • MapReduce and Pair RDD Operations

The Hadoop Distributed File System

  • Why HDFS?
  • HDFS Architecture
  • Using HDFS

Running Spark on a Cluster

  • Overview
  • A Spark Standalone Cluster
  • The Spark Standalone Web UI

Parallel Programming with Spark

  • RDD Partitions and HDFS Data Locality
  • Working With Partitions
  • Executing Parallel Operations

Caching and Persistence

  • RDD Lineage
  • Caching Overview
  • Distributed Persistence

Writing Spark Applications

  • Spark Applications vs. Spark Shell
  • Creating the SparkContext
  • Configuring Spark Properties
  • Building and Running a Spark Application
  • Logging

Spark, Hadoop, & the Enterprise Data Center

  • Overview
  • Spark and the Hadoop Ecosystem
  • Spark and MapReduce

Spark Streaming

  • Spark Streaming Overview
  • Example: Streaming Word Count
  • Other Streaming Operations
  • Sliding Window Operations
  • Developing Spark Streaming Applications

Common Spark Algorithms

  • Iterative Algorithms
  • Graph Analysis
  • Machine Learning

Improving Spark Performance

  • Shared Variables: Broadcast Variables
  • Shared Variables: Accumulators
  • Common Performance Issues

Training provider

Teaching mode:
  • Classroom - Instructor Led
  • Online - Instructor Led
Duration: 3 days
Gooroo has partnered with the global leaders in IT training to give you access to quality training, personalised to you, targeted at increasing your job opportunities and salary.

Top skills covered in this course

  • Apache Spark (Worldwide): average salary of US$104,009; mentioned in 0.72% of job ads
  • Analysis (Worldwide): average salary of US$79,143; mentioned in 20.42% of job ads
  • Apache Hadoop (Worldwide): average salary of US$112,744; mentioned in 0.54% of job ads
  • Data center (Worldwide): average salary of US$97,842; mentioned in 0.65% of job ads