HDP Developer: Apache Spark Using Scala



This course is designed for developers who need to create applications to analyze Big Data stored in Apache Hadoop using Spark. Topics include: Hadoop, YARN, and HDFS; using Spark for interactive data exploration; building and deploying Spark applications; application optimization; creating Spark pipelines with multiple libraries; working with different file types; building data frames; exploring the Spark SQL API; using Spark Streaming; and an introduction to Spark MLlib.


Software engineers who are looking to develop time-sensitive applications for Hadoop.


Please note: Hortonworks courses are delivered using electronic courseware. Delegates attending remotely (virtual classes or Attend from Anywhere) must have either dual monitors or a single monitor plus a tablet device. Dual monitors are required so that you can view labs and lab instructions on separate screens.

Technical pre-requisites

Students should be familiar with programming principles and have previous software development experience. SQL knowledge is helpful. No prior Hadoop experience is required, but it is very helpful.


  • Describe Hadoop, HDFS, YARN, and use cases for Hadoop
  • Describe Spark and Spark-specific use cases
  • Understand the HDFS architecture
  • Use the HDFS commands to insert and retrieve data
  • Explain the differences between Spark and MapReduce
  • Explore data interactively through the Spark shell utility
  • Explain the RDD concept
  • Understand concepts of functional programming
  • Use the Python or Scala Spark APIs
  • Create all types of RDDs: Pair, Double, and Generic
  • Use RDD type-specific functions
  • Explain interaction of components of a Spark Application
  • Explain the creation of the DAG schedule
  • Build and package Spark applications
  • Use application configuration items
  • Deploy applications to the cluster using YARN
  • Use data caching to increase performance of applications
  • Implement advanced features of Spark
  • Learn general application optimization guidelines/tips
  • Create applications using the Spark SQL library
  • Create/transform data using DataFrames
  • Read, use, and save to different Hadoop file formats
  • Understand the concepts of Spark Streaming
  • Create a streaming application
  • Use Spark MLlib to gain insights from data
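As a taste of several of these objectives (creating a pair RDD from a generic RDD, using a pair-specific function, and caching), here is a minimal sketch of the kind of code explored interactively in the Spark shell. The input path is illustrative only, and `sc` is the SparkContext that the shell provides automatically:

```scala
// In the Spark shell, `sc` (a SparkContext) is already defined.
// The HDFS path below is a placeholder, not part of the course materials.
val lines = sc.textFile("hdfs:///data/sample.txt")

val wordCounts = lines
  .flatMap(_.split("\\s+"))   // split each line into words
  .map(word => (word, 1))     // generic RDD -> Pair RDD of (word, 1)
  .reduceByKey(_ + _)         // a pair-RDD-specific aggregation

// Cache the result so repeated actions avoid recomputing the lineage.
wordCounts.cache()
wordCounts.take(5).foreach(println)
```

This is a sketch of the general pattern, not the course's exact lab code; the labs walk through each of these steps in more depth.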

Hands-On Labs

  • Create a Spark 'Hello World' word count application
  • Use HDFS commands to add and remove files and folders
  • Use advanced RDD programming to perform sort, join, pattern matching and regex tasks
  • Explore partitioning and the Spark UI
  • Increase performance using data caching
  • Checkpoint iterative applications
  • Build/package a Spark application using Maven
  • Use a broadcast variable to efficiently join a small dataset to a massive dataset
  • Use an accumulator for reporting data quality issues
  • Create a data frame and perform analysis
  • Load/transform/store data using Spark with Hive tables
  • Create a point-in-time Spark Streaming application
  • Create a Spark Streaming application using window functions
  • Create a Spark MLlib application using K-Means
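The broadcast-variable lab above can be sketched as follows. The dataset contents and names are illustrative assumptions, and `sc` is again the Spark shell's SparkContext:

```scala
// A small lookup table, shipped once to every executor (illustrative data).
val countryNames = Map("GB" -> "Great Britain", "FR" -> "France")
val lookup = sc.broadcast(countryNames)

// A large Pair RDD keyed by country code (illustrative stand-in).
val events = sc.parallelize(Seq(("GB", 12), ("FR", 7), ("GB", 3)))

// Map-side join against the broadcast value: the small table is never
// shuffled across the cluster, which is the point of the technique.
val joined = events.map { case (code, n) =>
  (lookup.value.getOrElse(code, "unknown"), n)
}
joined.collect().foreach(println)
```

Because the lookup map travels with the broadcast variable rather than through a shuffle, this pattern scales to massive keyed datasets joined against small reference tables.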

Training provider

Teaching mode: Classroom - Instructor Led
Duration: 3 days
Gooroo has partnered with the global leaders in IT training to give you access to quality training, personalised to you, targeted at increasing your job opportunities and salary.

Our pricing

We do not display pricing as Gooroo members qualify for special discounts not available elsewhere. You must enquire through Gooroo to get this benefit.

New courses are happening all the time

Our partner's expert training consultant will provide you with the times and all the details you need. Enquire today.