Self-Paced Courses

ETL Part 1: Data Extraction (with capstone) – Course
In this course, data engineers access data where it lives and then apply data extraction best practices, including schemas, corrupt record handling, and parallelized code. By the end of this course, you will extract data from multiple sources, use schema inference and apply user-defined schemas, and navigate the Azure Databricks and Apache Spark™ documentation to find solutions.
NOTE: This course is specific to the Databricks Unified Analytics Platform (based on Apache Spark™). While you might find it helpful for learning how to use Apache Spark in other environments, it does not teach you how to use Apache Spark in those environments.

ETL Part 2: Data Transformation and Loads (with capstone) – Course
In this course, data engineers apply best practices for transforming and writing data, such as user-defined functions, join optimizations, and parallel database writes. By the end of this course, you will transform complex data with custom functions, load it into a target database, and navigate the Databricks and Spark documentation to find solutions.
NOTE: This course is specific to the Databricks Unified Analytics Platform (based on Apache Spark™). While you might find it helpful for learning how to use Apache Spark in other environments, it does not teach you how to use Apache Spark in those environments.

ETL Part 3: Production (with capstone) – Course
In this course, data engineers optimize and automate Extract, Transform, Load (ETL) workloads using stream processing, job recovery strategies, and automation techniques such as REST API integration. By the end of this course, you will schedule highly optimized and robust ETL jobs and debug problems along the way.
NOTE: This course is specific to the Databricks Unified Analytics Platform (based on Apache Spark™). While you might find it helpful for learning how to use Apache Spark in other environments, it does not teach you how to use Apache Spark in those environments.

Structured Streaming (with capstone) – Course
This hands-on, self-paced training course targets data engineers who want to process big data using Apache Spark™ Structured Streaming. The course ends with a capstone project that builds a complete data streaming pipeline using Structured Streaming.
NOTE: This course is specific to the Databricks Unified Analytics Platform (based on Apache Spark™). While you might find it helpful for learning how to use Apache Spark in other environments, it does not teach you how to use Apache Spark in those environments.
Learning Pathways

Free Customer/Partner Introductory Learning – Learning Pathway
The courses included in this learning bundle are listed in alphabetical order. If you need help deciding which courses are right for you, please refer to the learning pathways below.
All Learning Paths
Business Leaders Learning Path
SQL Analyst Learning Path
Platform Administrator Learning Path
Data Scientist Learning Path
Data Engineer Learning Path