ETL Part 1 - Data Extraction — 1 user / 1 year

Price: $99.00

In this course, data engineers access data where it lives and then apply data extraction best practices, including schemas, corrupt record handling, and parallelized code. By the end of this course, you will be able to extract data from multiple sources, use schema inference, apply user-defined schemas, and navigate the Databricks and Apache Spark™ documentation to find solutions.

What's New

Version 1.2.4: Bug fixes to the Optional/01-WhySpark module.

Length

2-4 hours, 75% hands-on

Format

The course is a series of seven self-paced lessons available in both Scala and Python. A final capstone project involves writing an end-to-end ETL job that loads semi-structured JSON data into a relational model. Each lesson includes hands-on exercises.

Platforms

Supported platforms include Azure Databricks, Databricks Community Edition, and non-Azure Databricks.

  • If you're planning to use the course on Azure Databricks, select the "Azure Databricks" Platform option.
  • If you're planning to use the course on Databricks Community Edition or on a non-Azure version of Databricks, select the "Other Databricks" Platform option.

Learning Objectives

During this course, you will:

  • Write a basic ETL pipeline using the Spark design pattern
  • Ingest data from Azure Blob Storage and S3 using DBFS mounts
  • Ingest data using serial and parallel JDBC reads
  • Define and apply a user-defined schema to semi-structured JSON data
  • Handle corrupt records
  • Productionize an ETL pipeline
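
To make the schema and corrupt-record objectives above more concrete, here is a minimal sketch in plain Python (standard library only). It is not course material and does not use Spark. The `SCHEMA` fields, the `parse_line` and `extract` helpers, and the sample rows are all hypothetical. The `_corrupt_record` name echoes Spark's default corrupt-record column, but the validation logic is a simplified assumption rather than Spark's implementation.

```python
import json

# Hypothetical user-defined schema: field name -> expected Python type.
SCHEMA = {"id": int, "name": str, "score": float}


def parse_line(line, schema=SCHEMA):
    """Parse one JSON line against the schema.

    Returns a dict with one key per schema field plus "_corrupt_record".
    Well-formed rows have "_corrupt_record" set to None. Malformed rows
    keep the raw text there and have None in every schema field.
    """
    try:
        obj = json.loads(line)  # raises a ValueError subclass on bad JSON
        if not isinstance(obj, dict):
            raise ValueError("expected a JSON object")
        row = {}
        for field, expected in schema.items():
            value = obj.get(field)  # missing fields become None
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) and expected is not bool:
                raise ValueError(f"bad type for {field}")
            # Accept whole numbers where a float is expected.
            if expected is float and isinstance(value, int):
                value = float(value)
            if value is not None and not isinstance(value, expected):
                raise ValueError(f"bad type for {field}")
            row[field] = value
        row["_corrupt_record"] = None
        return row
    except ValueError:
        # Keep the bad input instead of failing the whole extraction.
        row = {field: None for field in schema}
        row["_corrupt_record"] = line
        return row


def extract(lines, schema=SCHEMA):
    """Split parsed rows into well-formed rows and raw corrupt lines."""
    rows = [parse_line(line, schema) for line in lines if line.strip()]
    good = [r for r in rows if r["_corrupt_record"] is None]
    bad = [r["_corrupt_record"] for r in rows if r["_corrupt_record"] is not None]
    return good, bad


sample = [
    '{"id": 1, "name": "Ada", "score": 9.5}',
    '{"id": 2, "name": "Grace"}',                    # missing field -> None
    '{"id": "three", "name": "Linus", "score": 7}',  # wrong type -> corrupt
    '{"id": 4, "name": "Ken"',                       # truncated JSON -> corrupt
]
good, bad = extract(sample)
print(len(good), "good rows;", len(bad), "corrupt rows")
```

Keeping the raw text of each bad record, rather than dropping it or aborting, lets the corrupt rows be inspected or reprocessed later while the well-formed rows continue through the pipeline.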

Lessons

  1. Course Overview and Setup
  2. ETL Process Overview
  3. Connecting to Azure Blob Storage and S3
  4. Connecting to JDBC
  5. Applying Schemas to JSON Data
  6. Corrupt Record Handling
  7. Loading Data and Productionalizing
  8. Capstone Project: Parsing Nested Data

Lab Requirements

License Limitations

This self-paced training course may be used by 1 user for 12 months from the date of purchase. It may not be transferred or shared with any other user.

Terms

Use of this self-paced training course is subject to the Terms of Service and the Databricks Privacy Policy.