Delta Lake Rapid Start with Python

Summary

Learn and use the primary methods for working with Delta Lake using Python.

Description

Apache Spark™ is the dominant processing framework for big data. Delta Lake is a robust storage layer designed specifically to work with Apache Spark™. It adds reliability to Spark so your analytics and machine learning initiatives have ready access to quality data. Delta Lake makes data lakes easier to work with and more robust by addressing many of the problems commonly found in them. This course covers the basics of working with Delta Lake on Databricks, specifically with Python.

Learning objectives

  • Explain the big picture of data engineering with Apache Spark and Delta Lake on Databricks.

  • Create a new Delta table and convert an existing Parquet-based data lake table.

  • Differentiate between a batch append and an upsert to a Delta table.

  • View different versions of a Delta table using Delta Lake Time Travel.

  • Execute a MERGE command to upsert data into a Delta table.
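Two of the objectives above hinge on the distinction between a batch append and an upsert. The core semantics can be sketched in plain Python (illustrative only: real Delta Lake upserts use the MERGE command on Spark DataFrames, and every name below is hypothetical):

```python
# Illustrative sketch of append vs. upsert semantics, with a table modeled
# as a list of dicts. No Spark or Delta Lake APIs are used here.

def append(table, new_rows):
    """Batch append: add every incoming row unconditionally (duplicates allowed)."""
    return table + new_rows

def upsert(table, new_rows, key="id"):
    """Upsert: update rows whose key matches an incoming row, insert the rest."""
    merged = {row[key]: row for row in table}
    for row in new_rows:
        merged[row[key]] = row  # key matched -> update; otherwise -> insert
    return list(merged.values())

customers = [{"id": 1, "city": "Oslo"}, {"id": 2, "city": "Lima"}]
updates   = [{"id": 2, "city": "Cusco"}, {"id": 3, "city": "Pune"}]

print(len(append(customers, updates)))  # 4 rows: two rows now share id 2
print(len(upsert(customers, updates)))  # 3 rows: id 2 updated in place
```

In Delta Lake the upsert branch corresponds to MERGE's "when matched then update" clause and the insert branch to "when not matched then insert"; the course covers the actual MERGE syntax.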

Prerequisites

  • Intermediate experience with Python or Scala programming

  • Intermediate experience with SQL

  • Beginning experience with data engineering concepts

Learning path

  • This course is part of the Data Engineer and Data Scientist learning paths.

Proof of completion

  • Upon 80% completion of this course, you will receive a proof of completion.