Apache Spark™ Programming with Databricks

Summary
This course uses a case-study-driven approach to explore the fundamentals of Spark Programming with Databricks, including Spark architecture, the DataFrame API, query optimization, Structured Streaming, and Delta Lake.
Description
This course uses a case-study-driven approach to explore the fundamentals of Spark Programming with Databricks, including Spark architecture, the DataFrame API, query optimization, and Structured Streaming. First, you will become familiar with Databricks and Spark, learn to recognize their major components, and explore the case-study datasets in the Databricks environment. After ingesting data from various file formats, you will process and analyze the datasets by applying a variety of DataFrame transformations, Column expressions, and built-in functions. Finally, you will execute streaming queries to process streaming data and explore the advantages of using Delta Lake.
Objectives
Upon completion of the course, students should be able to meet the following objectives:
- Define the major components of Spark architecture and execution hierarchy
- Describe how DataFrames are built, transformed, and evaluated in Spark
- Apply the DataFrame API to explore, preprocess, join, and ingest data in Spark
- Apply the Structured Streaming API to perform analytics on streaming data
- Navigate the Spark UI and describe how the Catalyst optimizer, partitioning, and caching affect Spark's execution performance
Prerequisites
- Familiarity with basic SQL concepts (select, filter, group by, join, etc.)
- Beginner programming experience with Python or Scala (syntax, conditions, loops, functions)
Additional Notes
All participants will need:
Outline
Day 1: DataFrames
Day 2: DataFrames and Transformations
Day 3: Transformations and Spark Internals
Day 4: Structured Streaming and Delta