Apache Spark Programming with Databricks

Summary

Explore the fundamentals of Spark Programming with Databricks.

Description

This course uses a case-study-driven approach to explore the fundamentals of Spark Programming with Databricks, including Spark architecture, the DataFrame API, query optimization, and Structured Streaming. First, you will become familiar with Databricks and Spark, recognize their major components, and explore the case-study datasets in the Databricks environment. After ingesting data from various file formats, you will process and analyze datasets by applying a variety of DataFrame transformations, Column expressions, and built-in functions. Lastly, you will execute streaming queries to process streaming data and examine the advantages of using Delta Lake.
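
To give a flavor of the batch-side material, the snippet below is a minimal PySpark sketch of the workflow described above: ingesting a file into a DataFrame, then applying transformations, Column expressions, and built-in functions. The file path and column names ("/data/events.csv", "event_time", "country", "revenue") are hypothetical placeholders, not the course's actual case-study data.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("course-sketch").getOrCreate()

    # Ingest a CSV file into a DataFrame (Spark also reads JSON, Parquet, Delta, ...)
    # NOTE: "/data/events.csv" is a hypothetical path used for illustration only.
    events = (spark.read
              .option("header", "true")
              .option("inferSchema", "true")
              .csv("/data/events.csv"))

    # Transformations built from Column expressions and built-in functions
    daily_revenue = (events
                     .withColumn("event_date", F.to_date("event_time"))
                     .filter(F.col("country") == "US")
                     .groupBy("event_date")
                     .agg(F.sum("revenue").alias("total_revenue"))
                     .orderBy("event_date"))

    # Transformations are lazy; an action such as show() triggers execution
    daily_revenue.show(5)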

Learning objectives

  • Identify core features of Spark and Databricks.

  • Describe how DataFrames are created and evaluated in Spark.

  • Apply DataFrame transformations to process and analyze data.

  • Apply Structured Streaming to process streaming data.

  • Explain fundamental Delta Lake concepts (see the streaming sketch after this list).
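
As a companion to the last two objectives, here is a minimal Structured Streaming sketch that reads a stream, applies the same DataFrame API used in batch, and writes to a Delta table. The source directory, schema, checkpoint, and output paths are hypothetical, and the "delta" format assumes the Delta Lake support bundled with the Databricks runtime; treat this as an illustrative pattern rather than course material.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import (StructType, StructField, StringType,
                                   DoubleType, TimestampType)

    spark = SparkSession.builder.appName("streaming-sketch").getOrCreate()

    # Hypothetical schema for the incoming JSON records
    schema = StructType([
        StructField("user_id", StringType()),
        StructField("revenue", DoubleType()),
        StructField("event_time", TimestampType()),
    ])

    # Read a directory of JSON files as an unbounded (streaming) DataFrame
    stream_df = (spark.readStream
                 .schema(schema)
                 .json("/data/stream/"))

    # The same DataFrame transformations apply to streaming DataFrames
    purchases = stream_df.filter(F.col("revenue") > 0)

    # Write the streaming query to a Delta table; the checkpoint tracks progress.
    # availableNow=True (Spark 3.3+) processes available data and then stops.
    query = (purchases.writeStream
             .format("delta")
             .outputMode("append")
             .option("checkpointLocation", "/checkpoints/purchases")
             .trigger(availableNow=True)
             .start("/delta/purchases"))

    query.awaitTermination()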

Prerequisites

  • Familiarity with basic SQL concepts (select, filter, group by, join, etc.)

  • Beginner-level programming experience with Python or Scala (syntax, conditionals, loops, functions)

Learning path

  • This course is part of the data engineer and data scientist learning paths.

Proof of completion

  • Upon completing 80% of this course, you will receive a proof of completion.

Duration

2 Days