Apache Spark™ Programming with Databricks

Summary

This course uses a case-study-driven approach to explore the fundamentals of Spark programming with Databricks, including Spark architecture, the DataFrame API, query optimization, Structured Streaming, and Delta Lake.

Description

This course uses a case-study-driven approach to explore the fundamentals of Spark programming with Databricks, including Spark architecture, the DataFrame API, query optimization, and Structured Streaming. First, you will become familiar with Databricks and Spark, recognize their major components, and explore datasets for the case study using the Databricks environment. After ingesting data from various file formats, you will process and analyze datasets by applying a variety of DataFrame transformations, Column expressions, and built-in functions. Lastly, you will execute streaming queries to process streaming data and see the advantages of using Delta Lake.

Duration

2 Days

Objectives

Upon completion of the course, students should be able to meet the following objectives:

  • Define the major components of Spark architecture and execution hierarchy
  • Describe how DataFrames are built, transformed, and evaluated in Spark
  • Apply the DataFrame API to explore, preprocess, join, and ingest data in Spark
  • Apply the Structured Streaming API to perform analytics on streaming data
  • Navigate the Spark UI and describe how the Catalyst optimizer, partitioning, and caching affect Spark's execution performance

Audience

  • Data engineer
  • Data scientist
  • Machine learning engineer
  • Data architect

Prerequisites

  • Familiarity with basic SQL concepts (select, filter, groupby, join, etc.)
  • Beginner programming experience with Python or Scala (syntax, conditions, loops, functions)

Additional Notes

All participants will need:

  • an internet connection
  • a device that is compliant with the following supported internet browsers

NOTE: GoToTraining is our chosen online platform through which the class will be delivered. Prior to attendance, each registrant will receive GoToTraining log-in instructions.

Outline

Day 1: DataFrames

  • Introduction: Databricks Ecosystem, Spark Overview, Case Study
  • Databricks Platform: Databricks Concepts, Databricks Platform, Lab
  • Spark SQL: Spark SQL, DataFrames, SparkSession, Lab
  • Reader and Writer: Data Sources, DataFrameReader/Writer, Lab

Day 2: DataFrames and Transformations

  • DataFrame and Column: Columns and Expressions, Transformations, Actions, Rows, Lab
  • Aggregation: Groupby, Grouped Data Methods, Aggregate Functions, Math Functions, Lab
  • Datetimes: Dates and Timestamps, Datetime Patterns, Date Functions, Lab
  • Complex types: String Functions, Collection Functions
  • Additional Functions: Non-aggregate Functions, Na Functions, Lab

Day 3: Transformations and Spark Internals

  • UDFs: UDFs, Vectorized UDFs, Performance, Lab
  • Spark Architecture: Spark Cluster, Spark Execution, Shuffling
  • Query Optimization: Query Optimization, Catalyst Optimizer, Adaptive Query Execution
  • Partitioning: Partitions vs. Cores, Default Shuffle Partitions, Repartition, Lab
  • Review: Review of lab

Day 4: Structured Streaming and Delta

  • Streaming Query: Streaming Concepts, Streaming Query, Transformations, Monitoring, Lab
  • Processing Streams: Lab
  • Delta Lake: Delta Lake Concepts, Batch and Streaming

Upcoming Classes

  Date         Time               Time Zone                                    Location                           Price
  Oct 25 - 26  9:00 AM - 6:00 PM  Eastern Daylight Time                        Online - Virtual - US Eastern      $1500.00 USD
  Nov 15 - 16  9:00 AM - 6:00 PM  Eastern Standard Time                        Online - Virtual - US Eastern      $1500.00 USD
  Nov 22 - 23  8:00 AM - 5:00 PM  Australian Eastern Daylight Time (Victoria)  Online - Virtual - Australia       $1500.00 USD
  Nov 22 - 23  9:00 AM - 5:00 PM  Central European Time                        Online - Virtual - EMEA            $1500.00 USD
  Dec 6 - 7    9:00 AM - 6:00 PM  Eastern Standard Time                        Online - Virtual - US Eastern      $1500.00 USD
  Dec 15 - 16  9:00 AM - 5:00 PM  Central European Time                        Online - Virtual - EMEA            $1500.00 USD
  Jan 10 - 11  9:00 AM - 5:00 PM  Central European Time                        Online - Virtual - EMEA            $1500.00 USD
  Jan 20 - 21  9:00 AM - 6:00 PM  Eastern Standard Time                        Online - Virtual - US Eastern      $1500.00 USD
  Feb 7 - 8    8:00 AM - 5:00 PM  Australian Eastern Daylight Time (Victoria)  Online - Virtual - Australia       $1500.00 USD
  Feb 10 - 11  9:00 AM - 6:00 PM  Pacific Standard Time                        Online - Virtual - US Pacific      $1500.00 USD
  Feb 23 - 24  9:00 AM - 5:00 PM  Central European Time                        Online - Virtual - EMEA            $1500.00 USD
  Feb 24 - 25  9:00 AM - 6:00 PM  Eastern Standard Time                        Online - Virtual - US Eastern      $1500.00 USD
  Mar 14 - 15  9:00 AM - 6:00 PM  Eastern Daylight Time                        Online - Virtual - US Eastern      $1500.00 USD
  Mar 28 - 29  9:00 AM - 6:00 PM  Pacific Daylight Time                        Online - Virtual - US Pacific      $1500.00 USD
  Apr 11 - 12  9:00 AM - 6:00 PM  Eastern Daylight Time                        Online - Virtual - US Eastern      $1500.00 USD
  Apr 18 - 19  9:00 AM - 5:00 PM  Central European Summer Time                 Online - Virtual - EMEA            $1500.00 USD
  Apr 28 - 29  9:00 AM - 6:00 PM  Pacific Daylight Time                        Online - Virtual - US Pacific      $1500.00 USD
  May 16 - 17  9:00 AM - 6:00 PM  Eastern Daylight Time                        Online - Virtual - US Eastern      $1500.00 USD
  May 23 - 24  9:00 AM - 5:00 PM  Central European Summer Time                 Online - Virtual - EMEA            $1500.00 USD
  Jun 2 - 3    9:00 AM - 6:00 PM  Eastern Daylight Time                        Online - Virtual - US Eastern      $1500.00 USD
  Jun 13 - 14  9:00 AM - 6:00 PM  Pacific Daylight Time                        Online - Virtual - US Pacific      $1500.00 USD
  Jun 20 - 21  8:00 AM - 5:00 PM  Australian Eastern Standard Time (Victoria)  Online - Virtual - Australia AEST  $1500.00 USD
  Jun 27 - 28  9:00 AM - 6:00 PM  Eastern Daylight Time                        Online - Virtual - US Eastern      $1500.00 USD
  Jun 27 - 28  9:00 AM - 5:00 PM  Central European Summer Time                 Online - Virtual - EMEA            $1500.00 USD
