Apache Spark™ Programming with Databricks

Summary

This course uses a case-study-driven approach to explore the fundamentals of Spark programming with Databricks, including Spark architecture, the DataFrame API, query optimization, Structured Streaming, and Delta Lake.

Description

This course uses a case-study-driven approach to explore the fundamentals of Spark programming with Databricks, including Spark architecture, the DataFrame API, query optimization, and Structured Streaming. First, you will become familiar with Databricks and Spark, recognize their major components, and explore the case study datasets in the Databricks environment. After ingesting data from various file formats, you will process and analyze datasets by applying a variety of DataFrame transformations, Column expressions, and built-in functions. Lastly, you will execute streaming queries to process streaming data and see the advantages of using Delta Lake.

Duration

2 Days

Objectives

Upon completion of the course, students should be able to:

  • Define the major components of Spark architecture and execution hierarchy
  • Describe how DataFrames are built, transformed, and evaluated in Spark
  • Apply the DataFrame API to explore, preprocess, join, and ingest data in Spark
  • Apply the Structured Streaming API to perform analytics on streaming data
  • Navigate the Spark UI and describe how the Catalyst optimizer, partitioning, and caching affect Spark's execution performance

Audience

  • Data engineer
  • Data scientist
  • Machine learning engineer
  • Data architect

Prerequisites

  • Familiarity with basic SQL concepts (select, filter, group by, join, etc.)
  • Beginner programming experience with Python or Scala (syntax, conditions, loops, functions)

Additional Notes

All participants will need:

  • an internet connection
  • a device running a supported internet browser

NOTE: GoToTraining is our chosen online platform through which the class will be delivered. Prior to attendance, each registrant will receive GoToTraining log-in instructions.

Outline

Day 1: DataFrames

  • Introduction: Databricks Ecosystem, Spark Overview, Case Study
  • Databricks Platform: Databricks Concepts, Databricks Platform, Lab
  • Spark SQL: Spark SQL, DataFrames, SparkSession, Lab
  • Reader and Writer: Data Sources, DataFrameReader/Writer, Lab

Day 2: DataFrames and Transformations

  • DataFrame and Column: Columns and Expressions, Transformations, Actions, Rows, Lab
  • Aggregation: Groupby, Grouped Data Methods, Aggregate Functions, Math Functions, Lab
  • Datetimes: Dates and Timestamps, Datetime Patterns, Date Functions, Lab
  • Complex Types: String Functions, Collection Functions
  • Additional Functions: Non-aggregate Functions, Na Functions, Lab

Day 3: Transformations and Spark Internals

  • UDFs: UDFs, Vectorized UDFs, Performance, Lab
  • Spark Architecture: Spark Cluster, Spark Execution, Shuffling
  • Query Optimization: Query Optimization, Catalyst Optimizer, Adaptive Query Execution
  • Partitioning: Partitions vs. Cores, Default Shuffle Partitions, Repartition, Lab
  • Review: Review of lab

Day 4: Structured Streaming and Delta

  • Streaming Query: Streaming Concepts, Streaming Query, Transformations, Monitoring, Lab
  • Processing Streams: Lab
  • Delta Lake: Delta Lake Concepts, Batch and Streaming

Upcoming Classes

    Date         Time               Time Zone                                    Location                           Price
    Aug 16 - 17  8:00 AM - 5:00 PM  Australian Eastern Standard Time (Victoria)  Online - Virtual - Australia AEST  $1500.00 USD
    Aug 16 - 17  9:00 AM - 5:00 PM  Central European Summer Time                 Online - Virtual - EMEA            $1500.00 USD
    Aug 23 - 24  9:00 AM - 6:00 PM  Eastern Daylight Time                        Online - Virtual - US Eastern      $1500.00 USD
    Sep 20 - 21  9:00 AM - 6:00 PM  Eastern Daylight Time                        Online - Virtual - US Eastern      $1500.00 USD
    Sep 20 - 21  9:00 AM - 5:00 PM  Central European Summer Time                 Online - Virtual - EMEA            $1500.00 USD
    Oct 7 - 8    9:00 AM - 6:00 PM  Pacific Daylight Time                        Online - Virtual - US Pacific      $1500.00 USD
    Oct 18 - 19  9:00 AM - 5:00 PM  Central European Summer Time                 Online - Virtual - EMEA            $1500.00 USD
    Oct 25 - 26  9:00 AM - 6:00 PM  Eastern Daylight Time                        Online - Virtual - US Eastern      $1500.00 USD
    Nov 15 - 16  9:00 AM - 6:00 PM  Eastern Standard Time                        Online - Virtual - US Eastern      $1500.00 USD
    Nov 22 - 23  9:00 AM - 5:00 PM  Central European Time                        Online - Virtual - EMEA            $1500.00 USD
    Nov 29 - 30  8:00 AM - 5:00 PM  India Standard Time                          Online - Virtual - India           $1500.00 USD
    Dec 6 - 7    9:00 AM - 6:00 PM  Pacific Standard Time                        Online - Virtual - US Pacific      $1500.00 USD
    Dec 15 - 16  9:00 AM - 5:00 PM  Central European Time                        Online - Virtual - EMEA            $1500.00 USD
    Dec 27 - 28  9:00 AM - 6:00 PM  Eastern Standard Time                        Online - Virtual - US Eastern      $1500.00 USD
