Structured Streaming (with capstone)

The training is priced from $75.00 USD per participant.



Summary

This hands-on, self-paced training course targets Data Engineers who want to process big data using Apache Spark™ Structured Streaming. The course ends with a capstone project in which you build a complete data streaming pipeline using Structured Streaming.

NOTE: This course is specific to the Databricks Unified Analytics Platform (based on Apache Spark™). While you might find it helpful for learning how to use Apache Spark in other environments, it does not teach you how to use Apache Spark in those environments.

Description

WARNING

You will see the following warning:

WARNING: This notebook was tested on DBR 6.2 but we found DBR 7.0. Using an untested DBR may yield unexpected results and/or various errors Please update your cluster configuration and/or download a newer version of this course before proceeding.

A newer version of the courseware is not currently available, but we encourage you to continue the course with the newer runtime.

Length

3-6 hours, 75% hands-on

Format: Self-paced

The course is a series of five self-paced lessons plus a final capstone project building a complete data pipeline using Structured Streaming. Each lesson includes hands-on exercises.

Platform

This course is intended to be run in a Databricks workspace. The course contains Databricks notebooks for both Azure Databricks and AWS Databricks; you can run the course on either platform.

Note: Access to a Databricks workspace is not included in the course purchase price. You are responsible for getting access to Databricks. See the FAQ for instructions on how to get access to a Databricks workspace.

Learning Objectives

During this course, learners will:

  • Use the interactive Databricks notebook environment
  • Ingest streaming log file data
  • Aggregate small batches of data with time windows
  • Stream data from a Kafka connection
  • Use Structured Streaming in conjunction with Databricks Delta
  • Visualize streaming live data
  • Use Structured Streaming to analyze streaming Twitter data
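To give a sense of what "aggregate small batches of data with time windows" means, here is a minimal sketch in plain Python, not taken from the course materials. It groups timestamped events into fixed one-minute (tumbling) windows and counts them per action. The event data and the helper names `window_start` and `count_by_window` are hypothetical and exist only for illustration; in Structured Streaming itself, this kind of grouping is typically expressed with the built-in `window` function on a streaming DataFrame rather than with hand-written code.

```python
from collections import Counter
from datetime import datetime, timedelta


def window_start(ts: datetime, width: timedelta) -> datetime:
    """Return the start of the tumbling window that contains ts."""
    epoch = datetime(1970, 1, 1)
    whole_windows = (ts - epoch) // width  # number of full windows since the epoch
    return epoch + whole_windows * width


def count_by_window(events, width):
    """Count (timestamp, action) events per (window start, action) pair."""
    counts = Counter()
    for ts, action in events:
        counts[(window_start(ts, width), action)] += 1
    return counts


# Hypothetical log events, made up for this illustration.
events = [
    (datetime(2020, 1, 1, 12, 0, 5), "Open"),
    (datetime(2020, 1, 1, 12, 0, 40), "Open"),
    (datetime(2020, 1, 1, 12, 1, 10), "Close"),
    (datetime(2020, 1, 1, 12, 1, 30), "Open"),
]

counts = count_by_window(events, timedelta(minutes=1))
for (start, action), n in sorted(counts.items()):
    print(start.isoformat(), action, n)
```

A streaming engine applies the same idea incrementally: as each small batch arrives, the counts for the affected windows are updated instead of being recomputed from scratch.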

Lessons

  1. Introduction
  2. Structured Streaming Concepts
  3. Time Windows
  4. Using Kafka
  5. Capstone Project

Target Audience

  • Primary Audience: Data Engineers

Prerequisites 

There are no prerequisites for this course.

Lab Requirements

Duration

6 hours