Structured Streaming

Summary

This hands-on, self-paced training course targets data engineers who want to process big data with Apache Spark™ Structured Streaming.

Description

Length

3-6 hours, 75% hands-on

Format

Self-paced

The course is a series of four self-paced lessons. Each lesson includes hands-on exercises.

The course contains Databricks notebooks for both Azure Databricks and AWS Databricks; you can run the course on either platform.

Learning Objectives

During this course, learners will:

  • Use the interactive Databricks notebook environment
  • Ingest streaming log file data
  • Aggregate small batches of data with time windows
  • Stream data from a Kafka connection
  • Use Structured Streaming in conjunction with Databricks Delta
  • Visualize live streaming data
  • Use Structured Streaming to analyze streaming Twitter data

Lessons

  1. Introduction
  2. Structured Streaming Concepts
  3. Time Windows
  4. Using Kafka

Target Audience

  • Primary Audience: Data Engineers

Prerequisites

  • Getting Started with Apache Spark™ SQL (optional, but strongly encouraged)

Lab Requirements

Duration

8 hours