In this course, data engineers apply best practices for transforming and writing data, such as user-defined functions, join optimizations, and parallel database writes. By the end of this course, you will be able to transform complex data with custom functions, load it into a target database, and navigate the Databricks and Spark documentation to find solutions.
2-4 hours, 75% hands-on
The course is a series of seven self-paced lessons available in both Scala and Python. A final capstone project involves writing custom, generalizable transformation logic to populate data warehouse summary tables and efficiently writing the tables to a database. Each lesson includes hands-on exercises.
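As a rough illustration of the kind of work the lessons and capstone describe, the following Python sketch wraps a plain function as a UDF, builds a small summary table, and repartitions it before a JDBC write. It is not taken from the course materials: the names `normalize_city` and `build_summary`, the `city` column, and the default of 8 partitions are all hypothetical, and the JDBC URL and table name are left for the caller to supply.

```python
def normalize_city(name):
    """Plain Python logic: collapse whitespace and title-case a city name."""
    if name is None:
        return None  # UDFs should handle nulls explicitly
    return " ".join(name.split()).title()


def build_summary(df, jdbc_url, table, num_partitions=8):
    """Hypothetical helper: clean a 'city' column, count rows per city, write via JDBC."""
    # Imported here so normalize_city can be used and tested without Spark installed.
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import StringType

    # Wrap the plain function as a UDF that takes a single column as input.
    normalize_city_udf = udf(normalize_city, StringType())
    cleaned = df.withColumn("city", normalize_city_udf(col("city")))
    summary = cleaned.groupBy("city").count()

    # Each partition is written over its own connection, so the partition
    # count (an assumed value here) controls how parallel the insert is.
    summary.repartition(num_partitions).write.jdbc(
        url=jdbc_url, table=table, mode="overwrite"
    )
    return summary
```

Keeping the transformation as an ordinary function makes it easy to check on its own, for example `normalize_city("  new   york ")`, before running it on a cluster.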
Supported platforms include Databricks Community Edition, Azure Databricks and Amazon.
During this course, you will:
- Apply built-in functions to manipulate data
- Write UDFs with a single DataFrame column input
- Apply UDFs that take multiple DataFrame column inputs and return complex types
- Employ table join best practices relevant to big data environments
- Repartition DataFrames to optimize table inserts
- Write to managed and unmanaged tables
- Prerequisite: ETL Part 1 self-paced course.
- Course Overview and Setup
- Common Transformations
- User Defined Functions
- Advanced UDFs
- Joins and Lookup Tables
- Database Writes
- Table Management
- Capstone Project: Custom Transformations, Aggregating and Loading
- Please be sure to use a supported browser.
This self-paced training course may be used by 1 user for 12 months from the date of purchase. It may not be transferred or shared with any other user.