Lakehouse with Delta Lake Deep Dive

Summary

This course provides a practical, hands-on introduction to building a Lakehouse with Delta Lake.

Description

This course begins with an overview of the Lakehouse architecture and an in-depth look at the key Delta Lake features and functionality that make a Lakehouse possible. Participants will learn to apply software engineering principles with Databricks as they build end-to-end OLAP data pipelines using Delta Lake for batch and streaming data. The course also covers serving data to end users through aggregate tables and Databricks SQL Analytics. Throughout the course, emphasis is placed on data engineering best practices with Databricks.
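
For a flavor of the pipelines involved, the sketch below shows a bronze-to-silver batch and streaming flow with Delta Lake. It is a minimal illustration, not course material: it assumes a Spark session with Delta Lake available (as on a Databricks cluster), and every path and column name is a hypothetical example.

    from pyspark.sql import SparkSession
    import pyspark.sql.functions as F

    spark = SparkSession.builder.getOrCreate()

    # Batch: land raw JSON records in a bronze Delta table.
    # (Paths are hypothetical examples.)
    raw = spark.read.json("/mnt/example/raw/orders")
    raw.write.format("delta").mode("append").save("/mnt/example/bronze/orders")

    # Streaming: incrementally refine bronze records into a silver table.
    (spark.readStream.format("delta")
        .load("/mnt/example/bronze/orders")
        .withColumn("ingested_at", F.current_timestamp())
        .writeStream
        .format("delta")
        .option("checkpointLocation", "/mnt/example/_checkpoints/silver_orders")
        .start("/mnt/example/silver/orders"))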

Learning objectives

  • Identify the core components of Delta Lake that make a Lakehouse possible.

  • Define commonly used optimizations available in Delta Engine (see the sketch after this list).

  • Build end-to-end batch and streaming OLAP data pipelines using Delta Lake.

  • Make data available for consumption by downstream stakeholders using specified design patterns.

  • Document data at the table level to promote data discovery and cross-team communication.

  • Apply Databricks’ recommended best practices in engineering a single-source-of-truth Delta architecture.
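
To make the optimization and table-documentation objectives above concrete, here is a minimal sketch, assuming Delta Lake on Databricks; the table name, column, and comment text are hypothetical.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Delta Engine optimization: compact small files and co-locate rows by a
    # frequently filtered column to reduce the data scanned per query.
    # (Table and column names are hypothetical examples.)
    spark.sql("OPTIMIZE example_db.orders ZORDER BY (customer_id)")

    # Table-level documentation to promote data discovery across teams.
    spark.sql(
        "COMMENT ON TABLE example_db.orders IS 'Cleansed orders, appended hourly'"
    )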

Prerequisites

  • Intermediate to advanced SQL skills

  • Intermediate to advanced Python skills

  • Beginner-level experience with the Spark DataFrames API

  • Beginner-level knowledge of general data engineering concepts

  • Beginner-level knowledge of the core features and use cases of Delta Lake

Learning path

  • This course is part of both the Data Engineering and the Data Scientist/ML learning paths.

Proof of completion

  • Upon completing 80% of this course, you will receive proof of completion.