Overview

This three-day Cassandra course is a hybrid course for developers and administrators. The class is 60% lecture and 40% labs.

This is a fast-paced, vendor-agnostic, technical overview of the Cassandra database. In each sub-topic, the instructor will provide links and resource recommendations (such as YouTube videos, books and blog posts) for students who want to explore that area further. Delegates will be given a PDF slide deck, which can be used as reference material after the course. PDFs will also be provided for the five labs in the course.

Objectives

At the end of this course you will be able to:

  • Identify the correct use cases for Cassandra
  • Appreciate the core concepts of the operations side of the Cassandra database
  • Dive into the critical architecture paths of Cassandra: Bloom filters, Block Indexes, SSTables
  • Access a 3-node Cassandra cluster in Rackspace to perform hands-on labs
  • Understand the fundamentals of how to write Java or Python code to interact with Cassandra
  • Gain links to the best books, blog posts and videos to learn more about Cassandra on your own

Audience

  • This course is targeted at both technical and non-technical people who want to understand the emerging world of Big Data, with a specific focus on Cassandra.

  • Software Engineers, Data Scientists, Network Engineers or Technologists, ideally with experience in relational/SQL databases and Java programming or a similar modern programming language.

Prerequisites

  • No prior knowledge of databases or programming is assumed, although having some basic experience with relational/SQL databases and Java will help.

Syllabus

Session 1: Intro to Cassandra

  • How to pick a NoSQL database
  • Brief use case discussion of: Key/Value, Key/Document, Column Family, Graph, Real-time
  • Structured vs. Unstructured data
  • Cassandra Origins: Amazon Dynamo, Google BigTable and Facebook
  • So, what's Cassandra good for? Use Cases.
  • Hardware recommendations (spinning disks vs. SSD, CPU/RAM/network requirements, etc.)
  • Cassandra versions
  • Cassandra distributions
  • Book, YouTube & Blog recommendations for learning more about Cassandra
  • Lab 1: Install Cassandra 2.0 on a single node in the cloud

Session 2: Cassandra Architecture Fundamentals and Intro to CQL

  • Peer to peer design
  • Logical Data Model: Keyspace, Column Family/Table, Rows, Columns
  • Traditional Ring design vs. VNodes
  • Partitioners: Murmur3, Random (MD5) and ByteOrdered
  • Gossip communications
  • Coordinator node
  • Seed nodes
  • Write/Read consistency levels: Any, One, Two, Three, Quorum
  • Snitches: Dynamic Snitching, Simple Snitch, Rack Inferring Snitch, Property File Snitch, Gossiping Property File Snitch
  • Routing Client requests
  • How a table is flushed from Memtable onto disk into SSTable files
  • Compaction fundamentals to reduce SSTable data files
  • Nodetool commands: gossipinfo, cfstats, describering
  • YAML file fundamentals
  • Operations management web GUI
  • Stress testing Cassandra
  • CQL command fundamentals
  • Lab 2: Run Cassandra commands and explore operations management concepts (create a new Keyspace and table, write data to the table, flush the table to SSTable on disk, learn how to run compaction, run nodetool commands, explore the web GUI, benchmark the one node by inserting and reading 100,000 rows)
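
As an illustration of the write/read consistency levels listed in this session, the following is a minimal Python sketch and not part of the course material. It assumes the standard quorum rule (half the replication factor, rounded down, plus one); the function names `quorum`, `required_acks` and `overlaps` are hypothetical. The Any level is left out because it applies to writes only and can be satisfied by a stored hint.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond at Quorum: floor(RF / 2) + 1."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


def required_acks(level: str, replication_factor: int) -> int:
    """Replica responses a coordinator waits for at a given level.

    Only One, Two, Three and Quorum are modelled in this sketch.
    """
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    name = level.upper()
    if name == "QUORUM":
        return quorum(replication_factor)
    if name in fixed:
        if fixed[name] > replication_factor:
            raise ValueError("level needs more replicas than exist")
        return fixed[name]
    raise ValueError(f"level not modelled in this sketch: {level}")


def overlaps(write_level: str, read_level: str, replication_factor: int) -> bool:
    """True when reads are guaranteed to touch a replica that saw the write.

    This is the usual R + W > RF condition.
    """
    w = required_acks(write_level, replication_factor)
    r = required_acks(read_level, replication_factor)
    return r + w > replication_factor


if __name__ == "__main__":
    for rf in (1, 3, 5):
        print(f"RF={rf}: quorum needs {quorum(rf)} replica(s)")
    print("Quorum/Quorum at RF=3 overlaps:", overlaps("Quorum", "Quorum", 3))
    print("One/One at RF=3 overlaps:", overlaps("One", "One", 3))
```

With a replication factor of 3, writing and reading at Quorum touches at least one common replica, while writing and reading at One does not guarantee this.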

Session 3: Scaling Cassandra, Advanced CQL and Advanced YAML file

  • Best practices for scaling a Cassandra cluster
  • Managing a Cassandra cluster across data centers (new write/read consistency levels: Local Quorum, Each Quorum, All, Serial)
  • Deeper dive into the YAML file settings
  • Advanced CQL concepts
  • Lab 3: Grow the cluster size to 3 nodes (install Cassandra on 2 additional nodes in Rackspace and edit the YAML files to configure the 3-node cluster)

Session 4: Database Internals

  • Deep dive into the Write path
  • In-memory structures for each SSTable: partition index, partition summary, bloom filter
  • Fsync settings for the commit log
  • How inserts, updates and deletes are treated by Cassandra
  • Hinted Handoffs
  • Deletes and Tombstone fundamentals
  • Advanced Compaction concepts
  • Deep dive into the Read path: Row cache, partition key cache, partition summary, bloom filters, etc
  • Off-heap components in Cassandra
  • Compression concepts
  • Lightweight Transactions
  • Snapshots
  • Lab 4: Advanced Cassandra commands (query the system table, take a snapshot, decommission a node, rejoin the same node back into the cluster)
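
To give a feel for why a bloom filter sits in front of each SSTable on the read path, here is a toy Python sketch; it is not Cassandra's implementation. The classes `BloomFilter` and `ToySSTable`, the function `read_key`, the filter sizes and the use of SHA-256 as the hash are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Toy bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key: str):
        # Derive num_hashes positions by salting the key with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: str) -> bool:
        return all(self.bits[pos] for pos in self._positions(key))


class ToySSTable:
    """Immutable key/value file stand-in with its own bloom filter."""

    def __init__(self, rows: dict):
        self.rows = dict(rows)
        self.bloom = BloomFilter()
        for key in self.rows:
            self.bloom.add(key)


def read_key(sstables, key: str):
    """Check each table's filter first; only 'go to disk' on a possible hit."""
    for table in sstables:
        if not table.bloom.might_contain(key):
            continue  # definitely absent: skip the expensive lookup
        if key in table.rows:
            return table.rows[key]
    return None


if __name__ == "__main__":
    tables = [ToySSTable({"user:1": "alice"}), ToySSTable({"user:2": "bob"})]
    print(read_key(tables, "user:2"))
    print(read_key(tables, "user:99"))
```

A negative answer from the filter is always correct, so the table can be skipped safely; a positive answer only means the table must be checked.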

Session 5: Java or Python API

  • Different ways to programmatically query Cassandra: Thrift, Hector, Astyanax, Java, Python, C#, ODBC, plus others
  • Writing your first client application
  • Connecting to the Cassandra cluster programmatically
  • Using a session to execute CQL commands
  • Asynchronous I/O to Cassandra cluster
  • Node discovery
  • Automatic failover
  • Modifying cluster configuration programmatically
  • Lab 5: Java or Python API lab (learn how to programmatically insert and read data from a Cassandra cluster using the Java or Python API)
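
The connect, session and asynchronous-query steps in this session might look roughly like the Python sketch below. It assumes the third-party DataStax driver (`pip install cassandra-driver`) and a reachable cluster; the keyspace `demo`, the table `users`, the sample row and the function `run_demo` are hypothetical and are not taken from the course labs.

```python
# CQL statements used by the sketch. The keyspace and table are hypothetical.
CREATE_KEYSPACE = (
    "CREATE KEYSPACE IF NOT EXISTS demo "
    "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3}"
)
CREATE_TABLE = (
    "CREATE TABLE IF NOT EXISTS demo.users (user_id int PRIMARY KEY, name text)"
)
INSERT_USER = "INSERT INTO demo.users (user_id, name) VALUES (%s, %s)"
SELECT_USER = "SELECT user_id, name FROM demo.users WHERE user_id = %s"


def run_demo(contact_points):
    """Connect, write one row, then read it back asynchronously.

    contact_points is a list of node addresses; the driver discovers the
    remaining nodes of the cluster from these.
    """
    # Imported here so the module loads even without the driver installed.
    from cassandra.cluster import Cluster

    cluster = Cluster(contact_points)
    session = cluster.connect()
    try:
        session.execute(CREATE_KEYSPACE)
        session.execute(CREATE_TABLE)
        session.execute(INSERT_USER, (1, "alice"))

        # execute_async returns a future; result() blocks until rows arrive.
        future = session.execute_async(SELECT_USER, (1,))
        return [(row.user_id, row.name) for row in future.result()]
    finally:
        cluster.shutdown()


if __name__ == "__main__":
    # Replace with the address of a node in your own cluster.
    print(run_demo(["127.0.0.1"]))
```

The replication factor of 3 assumes a cluster of at least three nodes, such as the one built in Lab 3; on a single node, a factor of 1 would be the usual choice.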

Session 6: Advanced Concepts

  • JVM performance tuning fundamentals
  • JConsole vs jmxterm
  • Tools to monitor/test Cassandra clusters: disk i/o, memory analysis, visualisation
  • Logging in Cassandra (log4j)
  • Security: SSL encryption for client-to-node and node-to-node
  • Security: Authentication and Authorisation fundamentals
  • Security: Firewall ports
  • Using Hadoop with Cassandra
  • Using Solr with Cassandra

Training provider

Teaching mode: Classroom - Instructor Led
Duration: 3 days
Gooroo has partnered with the global leaders in IT training to give you access to quality training, personalised to you, targeted at increasing your job opportunities and salary.

Our pricing

We do not display pricing as Gooroo members qualify for special discounts not available elsewhere. You must enquire through Gooroo to get this benefit.

New courses are happening all the time

Our partner's expert training consultant will provide you with the times and all the details you need. Enquire today.

Top skills covered in this course

  • Analysis (Worldwide): average salary of US$75,897; mentioned in 14.99% of job ads.
  • Apache Hadoop (Worldwide): average salary of US$113,310; mentioned in 0.60% of job ads.
  • Java (Worldwide): average salary of US$99,235; mentioned in 5.40% of job ads.
  • Database (Worldwide): average salary of US$74,746; mentioned in 7.46% of job ads.