Data Engineering with Databricks

Summary

This 2-day course teaches best practices for building data pipelines with Databricks through lectures and hands-on labs. By the end of the course, you will have the knowledge and skills a data engineer needs to build an end-to-end Delta Lake pipeline for streaming and batch data, from raw data ingestion to consumption by end users.

Description

This course begins with a review of programming with the Spark APIs and an introduction to key terms and definitions for Databricks data engineering tools, followed by an overview of DB Connect, the Spark UI, and writing testable code. Participants will learn the data architecture concepts behind the cloud data platform and will build an end-to-end OLAP data pipeline using Delta Lake with batch and streaming data, applying best practices throughout. Participants who want to dive deeper into tuning and optimization can follow up with the Advanced Data Engineering with Databricks course.

Duration

2 Days

Objectives

Upon completion of this course, students should be able to:

  • Build an end-to-end batch and streaming OLAP data pipeline
  • Make data available for consumption by downstream stakeholders using specified design patterns
  • Apply Databricks' recommended best practices in engineering a single-source-of-truth Delta architecture

Audience

Data Engineers and Machine Learning Engineers

Prerequisites

  • Intermediate to advanced programming skills in Python or Scala
  • Intermediate to advanced SQL skills
  • Beginning experience using the Spark DataFrames API
  • Beginning knowledge of general data engineering concepts
  • Beginning knowledge of the core features and use cases of Delta Lake
