Structured Streaming

Summary

This hands-on, self-paced training course is for data engineers who want to process big data using Apache Spark™ Structured Streaming.

Description

Length

3 hours, 100% hands-on

Format: Self-paced

The course is a series of four self-paced lessons. Each lesson includes hands-on exercises.

The course contains Databricks notebooks for both Azure Databricks and AWS Databricks; you can run the course on either platform.

Learning Objectives

During this course, learners will:

  • Use the interactive Databricks notebook environment
  • Ingest streaming log file data
  • Aggregate small batches of data with time windows
  • Use Databricks Auto Loader
  • Use Structured Streaming in conjunction with Delta Lake
  • Visualize live streaming data

Lessons

  1. Structured Streaming Concepts
  2. Time Windows
  3. Ingest Data with Auto Loader
  4. Streaming Multi-Hop Tables in the Lakehouse

Target Audience

  • Primary Audience: Data Engineers

Prerequisites

  • Getting Started with Apache Spark™ SQL (optional, but strongly encouraged)

Lab Requirements

Duration

8 hours