SP821: ETL Part 2: Transformations and Loads (AWS Databricks)

The training is priced from $75.00 USD per participant.

Summary

In this course, data engineers apply best practices for transforming and writing data, such as user-defined functions, join optimizations, and parallel database writes. By the end of this course, you will be able to transform complex data with custom functions, load it into a target database, and navigate the Databricks and Spark documentation to source solutions.

NOTE: This course is specific to the Databricks Unified Analytics Platform (based on Apache Spark™). While you might find it helpful for learning how to use Apache Spark in other environments, it does not teach you how to use Apache Spark in those environments.

Description

Length

3-6 hours, 75% hands-on

Format: Self-paced

The course is a series of seven self-paced lessons available in both Scala and Python. A final capstone project involves writing custom, generalizable transformation logic to populate data warehouse summary tables and efficiently writing the tables to a database. Each lesson includes hands-on exercises.

Platform

This version of the course is intended to be run on AWS Databricks.

Learning Objectives

During this course, learners will:

  • Apply built-in functions to manipulate data
  • Write UDFs with a single DataFrame column input
  • Apply UDFs that take multiple DataFrame column inputs and return complex types
  • Employ table join best practices relevant to big data environments
  • Repartition DataFrames to optimize table inserts
  • Write to managed and unmanaged tables

Lessons

  1. Course Overview and Setup
  2. Common Transformations
  3. User Defined Functions
  4. Advanced UDFs
  5. Joins and Lookup Tables
  6. Database Writes
  7. Table Management
  8. Capstone Project: Custom Transformations, Aggregating and Loading

Target Audience

  • Primary Audience: Data Engineers

Prerequisites

  • ETL Part 1 strongly encouraged

Lab Requirements

License Limitations

This self-paced training course may be used by 1 user for 12 months from the date of purchase. It may not be transferred or shared with any other user.

Terms

The use of the self-paced training course is subject to the Terms of Service and the Databricks Privacy Policy.

Duration

6 hours