Data Engineering with Databricks

Summary

A comprehensive course on data engineering with Databricks that shows participants how to set up a multi-hop Delta architecture.

Description

This course begins with an overview of data architecture concepts, an introduction to the Lakehouse paradigm, and an in-depth look at Delta Lake features and functionality. Participants will apply software engineering principles in Databricks as they build end-to-end OLAP data pipelines that use Delta Lake for batch and streaming data. The course explores considerations around normalization, change data capture, slowly changing dimensions, and regulatory compliance, and covers serving data to end users through aggregate tables and Redash. Throughout the course, the emphasis is on data engineering best practices with Databricks.
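
To make the multi-hop idea concrete, the sketch below outlines a bronze/silver/gold Delta pipeline of the kind the course builds. It is an illustrative sketch only: the paths, schema, and column names are assumptions rather than course materials, and spark is the SparkSession that Databricks provides in every notebook.

    from pyspark.sql import functions as F
    from pyspark.sql.types import StructType, StructField, StringType

    # Hypothetical event schema; every path below is illustrative.
    schema = StructType([
        StructField("event_id", StringType()),
        StructField("event_type", StringType()),
        StructField("timestamp", StringType()),
    ])

    # Bronze: land the raw JSON stream in a Delta table, unmodified.
    (spark.readStream.schema(schema).json("/mnt/raw/events")
        .writeStream.format("delta")
        .option("checkpointLocation", "/mnt/checkpoints/bronze")
        .start("/mnt/delta/bronze/events"))

    # Silver: parse timestamps and enforce a basic quality rule.
    (spark.readStream.format("delta").load("/mnt/delta/bronze/events")
        .withColumn("event_time", F.to_timestamp("timestamp"))
        .filter(F.col("event_id").isNotNull())
        .writeStream.format("delta")
        .option("checkpointLocation", "/mnt/checkpoints/silver")
        .start("/mnt/delta/silver/events"))

    # Gold: hourly aggregates for downstream consumers (e.g., Redash).
    (spark.readStream.format("delta").load("/mnt/delta/silver/events")
        .withWatermark("event_time", "1 hour")
        .groupBy(F.window("event_time", "1 hour"), "event_type")
        .count()
        .writeStream.format("delta").outputMode("append")
        .option("checkpointLocation", "/mnt/checkpoints/gold")
        .start("/mnt/delta/gold/hourly_event_counts"))

Change data capture and slowly changing dimensions are commonly handled in Delta Lake with MERGE. Below is a minimal Type 1 upsert sketch under the same caveat: the table paths, the customer_id key, and the incoming updates feed are assumed for illustration.

    from delta.tables import DeltaTable

    # Hypothetical feed of changed customer rows from an upstream source.
    updates_df = spark.read.format("delta").load("/mnt/delta/bronze/customer_updates")

    # Type 1 SCD: update matched rows in place, insert new ones (no history kept).
    customers = DeltaTable.forPath(spark, "/mnt/delta/silver/customers")
    (customers.alias("t")
        .merge(updates_df.alias("s"), "t.customer_id = s.customer_id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())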

Learning objectives

  • Build an end-to-end batch and streaming OLAP data pipeline using the Databricks Workspace.

  • Make data available for consumption by downstream stakeholders using specified design patterns.

  • Apply Databricks’ recommended best practices in engineering a single-source-of-truth Delta architecture.

Prerequisites

  • Intermediate to advanced programming skills in Python or Scala

  • Intermediate to advanced SQL skills

  • Basic experience using the Spark DataFrame API

  • Basic knowledge of general data engineering concepts

  • Basic knowledge of the core features and use cases of Delta Lake

Learning path

  • This course is part of the data engineer learning path.

Proof of completion

  • Upon 80% completion of this course, you will receive proof of completion.

Duration

2 Days