DB 301 - Apache Spark™ for Machine Learning and Data Science

Summary

This course provides a thorough, hands-on overview of Apache Spark and its applications to Machine Learning.

Description

The course covers the fundamentals of Apache Spark, including Spark’s architecture and internals, the core APIs for using Spark, SQL and other high-level data access tools, and Spark’s streaming capabilities, with a heavy focus on Spark’s machine learning APIs. It is delivered as a mixture of lecture and hands-on labs.

Duration

3 Days

Objectives

After completion of this class, students will be able to:

  • Understand when and where to use Spark
  • Articulate the difference between an RDD, DataFrame, and Dataset
  • Explain supervised vs unsupervised machine learning, and typical applications of both
  • Build a Machine Learning Pipeline using a combination of Transformers and Estimators
  • Save/Restore Models
  • Apply models to streaming data
  • Perform hyperparameter tuning with cross-validation
  • Analyze Spark query performance using the Spark UI
  • Train models with third-party libraries such as XGBoost
  • Perform hyperparameter search in parallel using single-node libraries such as scikit-learn
  • Gain familiarity with Decision Trees, Random Forests, Gradient Boosted Trees, Linear Regression, Collaborative Filtering, and K-Means
  • Explain options for putting models into production

Audience

Primarily intended for data scientists, the course provides a thorough, hands-on overview of Apache Spark and its applications to Machine Learning.

Prerequisites

  • Intermediate to advanced programming experience in Python or Scala is required
  • Practical experience developing Apache Spark applications is recommended
  • Proficiency with Apache Spark's DataFrames API is desired but not essential

Additional Notes

All participants will need:

  • an internet connection

  • a device that is compliant with the supported internet browsers

  • to confirm your device can run GoToTraining: Validate

  • NOTE: GoToTraining is our chosen online platform through which the class will be delivered. Prior to attendance, each registrant will receive GoToTraining log-in instructions.

Upcoming Classes

    Date             Time                                     Location                                  Price
    Oct 29 - Oct 31  9:00 AM - 5:00 PM Eastern Daylight Time  Online - Virtual Class - US Eastern Time  $2500.00 USD
    Oct 29 - Oct 31  9:00 AM - 5:00 PM Eastern Daylight Time  McLean, United States                     $2500.00 USD
    Dec 3 - Dec 5    9:00 AM - 5:00 PM Pacific Standard Time  Online - Virtual Class - US Pacific Time  $2500.00 USD
    Dec 3 - Dec 5    9:00 AM - 5:00 PM Pacific Standard Time  San Francisco, United States              $2500.00 USD
    Jan 14 - Jan 16  9:00 AM - 5:00 PM Pacific Standard Time  San Francisco, United States              $2500.00 USD
    Jan 14 - Jan 16  9:00 AM - 5:00 PM Pacific Standard Time  Online - Virtual Class - US Pacific Time  $2500.00 USD
    Mar 3 - Mar 5    9:00 AM - 5:00 PM Eastern Standard Time  McLean, United States                     $2500.00 USD
    Mar 3 - Mar 5    9:00 AM - 5:00 PM Eastern Standard Time  Online - Virtual Class - US Eastern Time  $2500.00 USD
    Apr 21 - Apr 23  9:00 AM - 5:00 PM Pacific Daylight Time  San Francisco, United States              $2500.00 USD
    Apr 21 - Apr 23  9:00 AM - 5:00 PM Pacific Daylight Time  Online - Virtual Class - US Pacific Time  $2500.00 USD
    Jun 9 - Jun 11   9:00 AM - 5:00 PM Eastern Daylight Time  Edison, United States                     $2500.00 USD
    Jun 9 - Jun 11   9:00 AM - 5:00 PM Eastern Daylight Time  Online - Virtual Class - US Eastern Time  $2500.00 USD
