Managed Delta Lake

Summary

This hands-on self-paced training course targets data engineers, data scientists and data analysts who want to use Managed Delta Lake for ETL processing on data lakes.

Description

Length

3-6 hours, 75% of which is hands-on practice

Format: Self-paced eLearning

The course is a series of seven self-paced lessons. Each lesson includes hands-on exercises.

The course includes Databricks notebooks for both Azure Databricks and Databricks on AWS, so you can run it on either platform.

Learning Objectives

During this course, learners will:

  • Use the interactive Databricks notebook environment.
  • Create tables in, append data to, and upsert data into a data lake.
  • Use Managed Delta Lake to manage a data lake and extract actionable insights from it.
  • Use Databricks advanced optimization features to speed up queries.
  • Ingest both streaming and historical data.
  • Implement a data pipeline using Managed Delta Lake.
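The create, append, and upsert objectives differ mainly in how incoming rows interact with rows already in a table. As a conceptual illustration only, the sketch below uses plain Python lists of dicts as a stand-in for a table; it is not the Delta Lake API. An append always adds the incoming rows, while an upsert updates rows whose key already exists and inserts the rest. In Delta Lake itself, upserts are expressed with the `MERGE INTO` SQL statement. The function names `append` and `upsert`, the `id` key, and the sample data are hypothetical.

```python
from typing import Dict, Iterable, List

# A "row" is a dict of column name -> value; a "table" is a list of rows.
Row = Dict[str, object]


def append(target: List[Row], new_rows: Iterable[Row]) -> List[Row]:
    """Append: every incoming row is added, even if its key already exists."""
    return [dict(r) for r in target] + [dict(r) for r in new_rows]


def upsert(target: List[Row], updates: Iterable[Row], key: str) -> List[Row]:
    """Upsert: replace rows whose key matches, insert rows with new keys.

    Assumes keys are unique in `target` and each key appears at most once
    in `updates` (Delta's MERGE rejects multiple source rows matching the
    same target row).
    """
    result = [dict(r) for r in target]  # copy so the input is not mutated
    index = {row[key]: i for i, row in enumerate(result)}
    for row in updates:
        k = row[key]
        if k in index:
            result[index[k]] = dict(row)  # analogous to WHEN MATCHED THEN UPDATE
        else:
            index[k] = len(result)
            result.append(dict(row))  # analogous to WHEN NOT MATCHED THEN INSERT
    return result


# Usage with hypothetical sample data.
customers = [{"id": 1, "city": "Austin"}, {"id": 2, "city": "Boston"}]
changes = [{"id": 2, "city": "Chicago"}, {"id": 3, "city": "Denver"}]

print(append(customers, changes))
# -> 4 rows; id 2 now appears twice (Boston and Chicago)
print(upsert(customers, changes, "id"))
# -> [{'id': 1, 'city': 'Austin'}, {'id': 2, 'city': 'Chicago'}, {'id': 3, 'city': 'Denver'}]
```

The duplicate `id` 2 produced by the append is the kind of problem an upsert avoids when a batch contains changes to existing records as well as new ones.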

Lessons

  1. Introducing Delta Lake
  2. Create
  3. Append
  4. Upsert
  5. Streaming
  6. Architecture

Target Audience

  • Primary Audience: Data Engineers

Prerequisites

  • Getting Started with Apache Spark SQL
  • ETL Part 1: Data Extraction
  • ETL Part 2: Transformations and Loads

Lab Requirements

Duration

8 hours