SP850: Structured Streaming (AWS Databricks)
This hands-on, self-paced training course is for Data Engineers who want to process big data using Apache Spark™ Structured Streaming. The course ends with a capstone project in which you build a complete data streaming pipeline using Structured Streaming.
NOTE: This course is specific to the Databricks Unified Analytics Platform (based on Apache Spark™). While you might find it helpful for learning how to use Apache Spark in other environments, it does not teach you how to use Apache Spark in those environments.
3-6 hours, 75% hands-on
The course is a series of five self-paced lessons plus a final capstone project building a complete data pipeline using Structured Streaming. Each lesson includes hands-on exercises.
This version of the course is intended to be run on AWS Databricks.
During this course, learners will:
- Use the interactive Databricks notebook environment
- Ingest streaming log file data
- Aggregate small batches of data with time windows
- Stream data from a Kafka connection
- Use Structured Streaming in conjunction with Databricks Delta
- Visualize streaming live data
- Use Structured Streaming to analyze streaming Twitter data
Lessons include:
- Structured Streaming Concepts
- Time Windows
- Using Kafka
- Capstone Project
- Primary Audience: Data Engineers
- Prerequisite: Getting Started with Apache Spark™ DataFrames (optional, but strongly encouraged)
- Please be sure to use a supported browser.
This self-paced training course may be used by 1 user for 12 months from the date of purchase. It may not be transferred or shared with any other user.
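As a rough illustration of the "aggregate small batches of data with time windows" objective listed above, the following plain-Python sketch groups timestamped events into fixed-width (tumbling) windows and counts them. It is not course material and does not use the Spark API; the event data, the 60-second window length, and the function names are all hypothetical. In Spark itself, comparable grouping is typically expressed with the `window` function from `pyspark.sql.functions`.

```python
from collections import Counter
from datetime import datetime, timezone

WINDOW_SECONDS = 60  # hypothetical window length, chosen for illustration


def window_start(ts, width=WINDOW_SECONDS):
    """Return the start of the tumbling window that contains ts."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % width, tz=timezone.utc)


def count_by_window(events, width=WINDOW_SECONDS):
    """Count events per (window start, action) pair."""
    counts = Counter()
    for ts, action in events:
        counts[(window_start(ts, width), action)] += 1
    return counts


# Hypothetical log events: (event time, action)
events = [
    (datetime(2019, 1, 1, 12, 0, 5, tzinfo=timezone.utc), "open"),
    (datetime(2019, 1, 1, 12, 0, 40, tzinfo=timezone.utc), "open"),
    (datetime(2019, 1, 1, 12, 1, 10, tzinfo=timezone.utc), "close"),
]

counts = count_by_window(events)
for (start, action), n in sorted(counts.items()):
    print(start.isoformat(), action, n)
```

Here the two "open" events fall into the window starting at 12:00 and the "close" event into the window starting at 12:01. A streaming engine applies the same grouping incrementally as new data arrives, rather than over a complete list.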