DB 105 - Apache Spark™ Programming


This 3-day course provides a thorough review of the Apache Spark framework, including Spark fundamentals, through lecture and hands-on labs, with specific emphasis on skills development and the unique needs of a Data Engineering team.


This course is combined with DB 100 - Apache Spark Overview to provide a comprehensive overview of the Apache Spark framework for Data Engineers.

After working through the Apache Spark fundamentals on the first day, the following days continue with more advanced APIs and techniques, such as a review of specific Readers & Writers, broadcast table joins, and additional SQL functions, along with more hands-on labs. Additionally, the Structured Streaming demos from day 1 are replaced with broader, streaming-specific lectures and labs.

Throughout the three-day course, participants are also introduced to more of the Apache Spark architecture. Topics include, but are not limited to, the DAG execution model, an introduction to the Catalyst Optimizer, and Spark partitioning.


Duration: 3 Days


Upon completion, participants should be able to:

  • Describe how Apache Spark's distributed design allows for the processing of gigabytes to terabytes of data
  • Understand and troubleshoot a broader set of performance problems that new developers often encounter
  • Understand the breadth and depth of Apache Spark's capabilities
  • Use the DataFrame APIs to ingest, alter, and write data, as well as employ some of the more advanced transformations
  • Create Structured Streaming jobs
  • Understand how the machine learning pipeline works
  • Analyze Spark jobs from Databricks and the Spark UI


  • This course is ideal for Data Engineers who are new to Apache Spark or who have been using Apache Spark for less than one year
  • This course is suitable for SQL Analysts seeking to grow beyond simple SQL queries and into the use of the DataFrame APIs
  • This course is suitable for Data Analysts, Data Scientists, and ML Practitioners who have a stronger engineering background and would like to benefit from a deeper understanding of the architecture and APIs


Prerequisite Knowledge:

  • Knowledge of SQL is helpful
  • Experience with either Python or Scala is required
  • Some familiarity with Apache Spark or other big-data processing frameworks is helpful but not required

Prerequisite Courses:

Software & Hardware Requirements

  • Web Browser: Chrome
  • An Internet Connection
  • GoToTraining (for remote classes only)
    Please see the GoToMeeting System Check
  • A computer, laptop, or tablet with a keyboard

Additional Notes

  • The appropriate, web-based programming environment will be provided to participants
  • Note: This class can be taught concurrently in Python and Scala


  • About Databricks, Spark
  • A high-level overview of the Spark Architecture
  • Spark Entry Points, Simple Data Ingestion & overview of API docs
  • Hands-on practice with different data ingestion options
  • Hands-on practice with the DataFrames APIs
  • Introduction to Spark's execution model
  • Hands-on practice with performance optimization
  • Introduction to Structured Streaming
  • Introduction to Machine Learning Pipelines

Upcoming Classes

  • Aug 10 - 14, 9:00 AM - 1:00 PM Pacific Daylight Time, Online - Virtual - US Pacific, $2000.00 USD
  • Aug 26 - 28, 9:00 AM - 5:00 PM Eastern Daylight Time, Online - Virtual - US Eastern, $2000.00 USD
  • Sep 30 - Oct 2, 9:00 AM - 5:00 PM Eastern Daylight Time, Online - Virtual - US Eastern, $2000.00 USD
  • Oct 14 - 16, 9:30 AM - 5:30 PM, Online - France Virtual - GMT+2, $2000.00 USD
  • Nov 4 - 6, 9:00 AM - 5:00 PM Pacific Standard Time, Online - Virtual - US Pacific, $2000.00 USD
  • Dec 9 - 11, 9:00 AM - 5:00 PM Eastern Standard Time, Online - Virtual - US Eastern, $2000.00 USD

Classes marked with Full are full, and no additional registrations are accepted. If you cannot find another class that suits your schedule, feel free to request a class and we will do our best to accommodate your needs.
