Data Engineering with Databricks
This two-day course teaches best practices for building data pipelines on Databricks through lectures and hands-on labs. By the end of the course, you will have the knowledge and skills a data engineer needs to build an end-to-end Delta Lake pipeline for streaming and batch data, from raw data ingestion to consumption by end users.
The course begins with a review of programming with the Spark APIs and an introduction to key terms and definitions for Databricks data engineering tools, followed by an overview of Databricks Connect, the Spark UI, and writing testable code. Participants will learn data architecture concepts for the cloud data platform and will build an end-to-end OLAP data pipeline using Delta Lake with batch and streaming data, applying best practices throughout.
Participants who wish to dive deeper into tuning and optimization can take the Advanced Data Engineering with Databricks course.
Upon completion of this course, students should be able to:
- Build an end-to-end batch and streaming OLAP data pipeline
- Make data available for consumption by downstream stakeholders using specified design patterns
- Apply Databricks' recommended best practices in engineering a single source of truth Delta architecture
This course is intended for Data Engineers and Machine Learning Engineers.