SP821: ETL Part 2: Transformations and Loads (AWS Databricks)
In this course, data engineers apply data transformation and loading best practices, such as user-defined functions, join optimizations, and parallel database writes. By the end of this course, you will be able to transform complex data with custom functions, load it into a target database, and navigate Databricks and Spark documentation to source solutions.
NOTE: This course is specific to the Databricks Unified Analytics Platform (based on Apache Spark™). While you might find it helpful for learning how to use Apache Spark in other environments, it does not teach you how to use Apache Spark in those environments.
3-6 hours, 75% hands-on
The course is a series of seven self-paced lessons available in both Scala and Python. A final capstone project involves writing custom, generalizable transformation logic to populate data warehouse summary tables and efficiently writing those tables to a database. Each lesson includes hands-on exercises.
This version of the course is intended to be run on AWS Databricks.
During this course, learners:
- Apply built-in functions to manipulate data
- Write UDFs that take a single DataFrame column as input
- Apply UDFs that take multiple DataFrame columns as input and return complex types
- Employ table join best practices relevant to big data environments
- Repartition DataFrames to optimize table inserts
- Write to managed and unmanaged tables
Lessons:
- Course Overview and Setup
- Common Transformations
- User Defined Functions
- Advanced UDFs
- Joins and Lookup Tables
- Database Writes
- Table Management
- Capstone Project: Custom Transformations, Aggregating and Loading
- Primary Audience: Data Engineers
- ETL Part 1 strongly encouraged
- Please be sure to use a supported browser.
This self-paced training course may be used by 1 user for 12 months from the date of purchase. It may not be transferred or shared with any other user.