Delta Lake Rapid Start with Spark SQL

Summary

Delta Lake is a robust storage solution designed specifically to work with Apache Spark™. In this course, you learn and use the primary methods for working with Delta Lake using Spark SQL.

Description

Upon completion of this course, you will be able to:

  • Use Delta Lake to create a new Delta table and to convert an existing Parquet-based data lake table
  • Differentiate between a batch append and an upsert to a Delta table
  • Use Delta Lake Time Travel to view different versions of a Delta table
  • Execute a MERGE command to upsert data into a Delta table
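The objectives above can be sketched in Spark SQL. The table names, column names, and paths below are illustrative placeholders, not taken from the course materials:

```sql
-- Create a new Delta table from existing Parquet files (CTAS)
CREATE TABLE events
USING DELTA
AS SELECT * FROM parquet.`/data/events/`;

-- Convert an existing Parquet-based table to Delta in place
CONVERT TO DELTA parquet.`/data/raw_events/`;

-- Batch append: add new rows without modifying existing ones
INSERT INTO events SELECT * FROM events_update;

-- Upsert: update matching rows, insert the rest
MERGE INTO events AS t
USING events_update AS u
ON t.event_id = u.event_id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;

-- Time Travel: query an earlier version of the table
SELECT * FROM events VERSION AS OF 3;
```

The difference between the INSERT and the MERGE illustrates the second objective: an append only adds rows, while an upsert reconciles incoming rows against existing ones by a key.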

Who should take this course?

  • Data Analysts
  • Data Scientists
  • Data Engineers interested in data pipelines built using Spark SQL alone

Prerequisite knowledge required

  • How to upload data into a Databricks workspace
  • How to visualize data using Databricks
  • Intermediate-level Spark SQL usage, including the CTAS pattern; Spark SQL functions such as from_unixtime, lag, and lead; and partitioning
  • The fundamentals of Delta Lake
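As a gauge of the expected Spark SQL level, a query combining the CTAS pattern with the named functions might look like the following sketch (table and column names are hypothetical):

```sql
-- CTAS with from_unixtime and a lag window function
CREATE TABLE click_sessions AS
SELECT user_id,
       from_unixtime(ts) AS event_time,
       lag(ts) OVER (PARTITION BY user_id ORDER BY ts) AS prev_ts
FROM raw_clicks;
```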