SP822-Az: ETL Part 3: Production (Azure Databricks)

Enroll

To register for this course, please click "Register" below. If you are registering for someone else, please check "This is for someone else".

The training is priced from $75.00 USD per participant.



Summary

In this course, data engineers optimize and automate Extract, Transform, Load (ETL) workloads using stream processing, job recovery strategies, and automation techniques such as REST API integration. By the end of the course, you will be able to schedule highly optimized and robust ETL jobs, debugging problems along the way.
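To give a sense of the kind of workload involved, here is a minimal Python sketch of a streaming ETL job of the sort this course builds toward: reading a stream of JSON files, applying a transformation, and writing the result out with a checkpoint so the job can recover after a failure. The paths and schema below are illustrative assumptions, not course code.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StructType, StructField, StringType, TimestampType

    spark = SparkSession.builder.appName("streaming-etl-sketch").getOrCreate()

    # Hypothetical schema for incoming JSON events
    schema = StructType([
        StructField("user_id", StringType()),
        StructField("action", StringType()),
        StructField("event_time", TimestampType()),
    ])

    # Read the raw stream from a hypothetical landing path
    raw = (spark.readStream
             .schema(schema)
             .json("/mnt/raw/events/"))

    # A simple transformation step
    cleaned = raw.withColumn("event_date", F.to_date("event_time"))

    # Write out with a checkpoint so the stream can restart where it left off
    query = (cleaned.writeStream
               .format("parquet")
               .option("checkpointLocation", "/mnt/checkpoints/events/")
               .outputMode("append")
               .start("/mnt/curated/events/"))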

NOTE: This course is specific to the Databricks Unified Analytics Platform (based on Apache Spark™). While you might find it helpful for learning how to use Apache Spark in other environments, it does not teach you how to use Apache Spark in those environments.

Description

Length

3-6 hours, 75% hands-on

Format

Self-paced

The course is a series of six self-paced lessons available in both Scala and Python. A final capstone project involves refactoring a batch ETL job to a streaming pipeline. In the process, students run the workload as a job and monitor it. Each lesson includes hands-on exercises.

Platform

This version of the course is intended to be run on Azure Databricks.

Note: Access to a Databricks workspace is not included in the course purchase price. You are responsible for obtaining access to Databricks. See the FAQ for instructions on how to get access to a Databricks workspace.

Note: This course will not run on Databricks Community Edition.

Learning Objectives

During this course, learners:

  • Perform an ETL job on a streaming data source
  • Parameterize a code base and manage task dependencies
  • Submit and monitor jobs using the REST API or Command Line Interface
  • Design and implement a job failure recovery strategy using the principle of idempotence (see the sketch after this list)
  • Optimize ETL queries using compression and caching best practices with optimal hardware choices
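
As a rough illustration of the idempotence objective above, the following minimal Python sketch rewrites a single date partition on every run, so retrying a failed job does not duplicate data. The paths, parameter, and column names are assumptions for illustration, not course code.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("idempotent-etl-sketch").getOrCreate()

    # The run date would normally arrive as a job parameter (e.g. a notebook widget)
    run_date = "2024-01-01"

    # Re-read only the slice of data this run is responsible for (hypothetical path)
    df = (spark.read.json("/mnt/raw/events/")
            .filter(F.col("event_date") == run_date))

    # Overwrite the output for this run date rather than appending,
    # so re-running the job after a failure produces the same result
    (df.write
       .mode("overwrite")
       .parquet(f"/mnt/curated/events/event_date={run_date}/"))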

Lessons

  1. Course Overview and Setup
  2. Streaming ETL
  3. Runnable Notebooks
  4. Scheduling Jobs
  5. Job Failure
  6. ETL Optimizations
  7. Capstone Project

Target Audience

  • Primary Audience: Data Engineers

Prerequisites

  • ETL Part 1 (optional, but strongly encouraged)
  • ETL Part 2 (optional, but strongly encouraged)

Lab Requirements

License Limitations

This self-paced training course may be used by 1 user for 12 months from the date of purchase. It may not be transferred or shared with any other user.

Terms

The use of the self-paced training course is subject to the Terms of Service and the Databricks Privacy Policy.

Duration

8 hours