DB 105 - Apache Spark™ Programming

Summary

This 3-day course provides a thorough review of the Apache Spark framework, including the Spark fundamentals, through lecture and hands-on labs. It places specific emphasis on skills development and the unique needs of a Data Engineering team.

Description

This course is combined with DB 100 - Apache Spark Overview to provide a comprehensive overview of the Apache Spark framework for Data Engineers.

After the first day covers the Apache Spark fundamentals, the following days continue with more advanced APIs and techniques, such as a review of specific Readers & Writers, broadcast table joins, additional SQL functions, and more hands-on labs. Additionally, the Structured Streaming demos from day 1 are replaced with broader, streaming-specific lectures and labs.

Throughout the three-day course, participants are also introduced to more of the Apache Spark architecture. Topics include, but are not limited to, the DAG execution model, an introduction to the Catalyst Optimizer, and Spark partitioning.

Duration

3 Days

Objectives

Upon completion, participants should be able to:

  • Describe how Apache Spark's distributed design allows for the processing of gigabytes to terabytes of data
  • Understand and troubleshoot a broader set of performance problems that new developers often encounter
  • Understand the breadth and depth of Apache Spark's capabilities
  • Use the DataFrame APIs to ingest, alter, and write data, as well as employ some of the more advanced transformations
  • Create Structured Streaming jobs
  • Understand how the machine learning pipeline works
  • Analyze Spark jobs from Databricks and the Spark UI

Audience

  • This course is ideal for Data Engineers who are new to Apache Spark or who have been using Apache Spark for less than one year
  • This course is suitable for SQL Analysts seeking to grow beyond simple SQL queries and into the use of the DataFrame APIs
  • This course is suitable for Data Analysts, Data Scientists, and ML Practitioners who have a stronger engineering background and would like to benefit from a deeper understanding of the architecture and APIs

Prerequisites

Prerequisite Knowledge:

  • Knowledge of SQL is helpful
  • Experience with either Python or Scala is required
  • Some familiarity with Apache Spark or other big-data processing frameworks is helpful but not required

Prerequisite Courses:

Software & Hardware Requirements

  • Web Browser: Chrome
  • An Internet Connection
  • GoToTraining (for remote classes only)
    Please see the GoToMeeting System Check
  • A computer, laptop, or tablet with a keyboard

Additional Notes

  • The appropriate, web-based programming environment will be provided to participants
  • Note: This class can be taught concurrently in Python and Scala

Outline

  • About Databricks, Spark
  • A high-level overview of the Spark Architecture
  • Spark Entry Points, Simple Data Ingestion & overview of API docs
  • Hands-on practice with different data ingestion options
  • Hands-on practice with the DataFrames APIs
  • Introduction to Spark's execution model
  • Hands-on practice with performance optimization
  • Introduction to Structured Streaming
  • Introduction to Machine Learning Pipelines

Upcoming Classes

Date            Time                                     Location                       Price
Apr 21 - 23     9:00 AM - 5:00 PM Eastern Daylight Time  Online - Virtual - US Eastern  $2500.00 USD
May 26 - 28     9:00 AM - 5:00 PM Pacific Daylight Time  San Francisco, United States   $2500.00 USD
May 26 - 28     9:00 AM - 5:00 PM Pacific Daylight Time  Online - Virtual - US Pacific  $2500.00 USD
Jun 29 - Jul 1  9:00 AM - 5:00 PM Eastern Daylight Time  McLean, United States          $2500.00 USD
Jun 29 - Jul 1  9:00 AM - 5:00 PM Eastern Daylight Time  Online - Virtual - US Eastern  $2500.00 USD
Jul 22 - 24     9:00 AM - 5:00 PM Pacific Daylight Time  San Francisco, United States   $2500.00 USD
Jul 22 - 24     9:00 AM - 5:00 PM Pacific Daylight Time  Online - Virtual - US Pacific  $2500.00 USD
Aug 26 - 28     9:00 AM - 5:00 PM Eastern Daylight Time  McLean, United States          $2500.00 USD
Aug 26 - 28     9:00 AM - 5:00 PM Eastern Daylight Time  Online - Virtual - US Eastern  $2500.00 USD
Sep 30 - Oct 2  9:00 AM - 5:00 PM Eastern Daylight Time  McLean, United States          $2500.00 USD
Sep 30 - Oct 2  9:00 AM - 5:00 PM Eastern Daylight Time  Online - Virtual - US Eastern  $2500.00 USD
Nov 4 - 6       9:00 AM - 5:00 PM Pacific Standard Time  San Francisco, United States   $2500.00 USD
Nov 4 - 6       9:00 AM - 5:00 PM Pacific Standard Time  Online - Virtual - US Pacific  $2500.00 USD
Dec 9 - 11      9:00 AM - 5:00 PM Eastern Standard Time  McLean, United States          $2500.00 USD
Dec 9 - 11      9:00 AM - 5:00 PM Eastern Standard Time  Online - Virtual - US Eastern  $2500.00 USD
