SP850-Az: Structured Streaming (Azure Databricks)
This hands-on, self-paced training course is aimed at Data Engineers who want to process big data using Apache Spark™ Structured Streaming. The course ends with a capstone project in which you build a complete streaming data pipeline using Structured Streaming.
NOTE: This course is specific to the Databricks Unified Analytics Platform (based on Apache Spark™). While you might find it helpful for learning how to use Apache Spark in other environments, it does not teach you how to use Apache Spark in those environments.
3-6 hours, 75% hands-on
The course is a series of five self-paced lessons plus a final capstone project building a complete data pipeline using Structured Streaming. Each lesson includes hands-on exercises.
This version of the course is intended to be run on Azure Databricks.
Note: Access to a Databricks workspace is not part of your course purchase price. You are responsible for getting access to Databricks. See the FAQ for instructions on how to get access to a Databricks workspace.
During this course, learners will:
- Use the interactive Databricks notebook environment
- Ingest streaming log file data
- Aggregate small batches of data with time windows
- Stream data from a Kafka connection
- Use Structured Streaming in conjunction with Databricks Delta
- Visualize streaming live data
- Use Structured Streaming to analyze streaming Twitter data
Lessons:
- Structured Streaming Concepts
- Time Windows
- Using Kafka
- Capstone Project
- Primary Audience: Data Engineers
- Prerequisite: Getting Started with Apache Spark™ DataFrames (optional, but strongly encouraged)
- Please be sure to use a supported browser.
This self-paced training course may be used by 1 user for 12 months from the date of purchase. It may not be transferred or shared with any other user.