DB 100 - Apache Spark™ Overview

Summary

The course provides an introduction to the Spark architecture, some of the core APIs for using Spark, SQL and other high-level data access tools, as well as Spark’s streaming capabilities and machine learning APIs.

Description

After taking this class, students will be able to:
  • Use a subset of the core Spark APIs to operate on data
  • Articulate and implement simple use cases for Spark
  • Build data pipelines and query large data sets using Spark SQL and DataFrames
  • Create Structured Streaming jobs
  • Understand how a Machine Learning pipeline works
  • Understand the basics of Spark’s internals

Duration

9 hours

Audience

Data analysts who want a quick introduction to using Apache Spark to streamline their big data processing, build production Spark jobs, and understand and debug running Spark applications.

Prerequisites

  • Some familiarity with Apache Spark is helpful but not required.
  • Knowledge of SQL is helpful.
  • Basic programming experience in an object-oriented or functional language is highly recommended but not required. The class can be taught concurrently in Python and Scala.

Additional Notes

All participants will need:

  • A laptop with an updated version of Chrome or Firefox (Internet Explorer and Safari are not supported).
  • An internet connection that can support the use of GoToTraining.

GoToTraining is the online platform through which the class will be delivered. Prior to attendance, each registrant will receive GoToTraining log-in instructions. For more information and to confirm your computer can run GoToTraining, please check here: Validation

    Upcoming Classes

    • Jul 22, 9:00 AM - 5:00 PM British Summer Time, United Kingdom, $1500.00 USD
    • Sep 10, 9:00 AM - 5:00 PM Pacific Daylight Time, Online, $1500.00 USD
    • Sep 24, 9:00 AM - 5:00 PM Pacific Daylight Time, Online, $1500.00 USD
    • Oct 8, 8:00 AM - 4:00 PM Greenwich Mean Time, Online, $1500.00 USD
    • Oct 29, 9:00 AM - 5:00 PM Pacific Daylight Time, Online, $1500.00 USD
    • Nov 5, 9:00 AM - 5:00 PM Pacific Standard Time, Online, $1500.00 USD
    • Nov 19, 8:00 AM - 4:00 PM Greenwich Mean Time, Online, $1500.00 USD
    • Dec 10, 9:00 AM - 5:00 PM Pacific Standard Time, Online, $1500.00 USD
