DB 301 - Apache Spark™ for Machine Learning and Data Science

Summary

This course provides a thorough, hands-on overview of Apache Spark and its applications to Machine Learning.

Description

The course covers the fundamentals of Apache Spark, including Spark's architecture and internals, the core APIs for using Spark, SQL and other high-level data access tools, and Spark's streaming capabilities, with a heavy focus on Spark's machine learning APIs. It is delivered as a mixture of lectures and hands-on labs.

Duration

3 Days

Objectives

After completion of this class, students will be able to:

  • Understand when and where to use Spark
  • Articulate the difference between an RDD, a DataFrame, and a Dataset
  • Explain supervised vs. unsupervised machine learning, and typical applications of each
  • Build a machine learning Pipeline using a combination of Transformers and Estimators
  • Save and restore models
  • Apply models to streaming data
  • Perform hyperparameter tuning with cross-validation
  • Analyze Spark query performance using the Spark UI
  • Train models with third-party libraries such as XGBoost
  • Run hyperparameter searches in parallel for single-node algorithms, such as those in scikit-learn
  • Gain familiarity with decision trees, random forests, gradient-boosted trees, linear regression, collaborative filtering, and k-means
  • Explain options for putting models into production

Audience

Primarily intended for data scientists, the course provides a thorough, hands-on overview of Apache Spark and its applications to Machine Learning.

Prerequisites

  • Intermediate to advanced programming experience in Python or Scala is required
  • Practical experience of developing Apache Spark applications is recommended
  • Proficiency with Apache Spark's DataFrames API is desired but not essential

Additional Notes

All participants will need:

  • a laptop with an updated version of Chrome or Firefox (Internet Explorer and Safari are not supported)
  • an internet connection that can support GoToTraining

GoToTraining is the online platform through which the class will be delivered. Before the class, each registrant will receive GoToTraining log-in instructions. For more information, and to confirm that your computer can run GoToTraining, please check here: Validation

Upcoming Classes

  • Jul 22, 9:00 AM - 5:00 PM British Summer Time, United Kingdom, $2,500.00 USD
  • Jul 23, 9:00 AM - 5:00 PM Pacific Daylight Time, Online, $2,500.00 USD
  • Jul 31, 8:00 AM - 4:00 PM Eastern Daylight Time, United States, $2,500.00 USD
  • Aug 19, 9:00 AM - 1:00 PM Greenwich Mean Time, Online, $2,500.00 USD
  • Sep 10, 9:00 AM - 5:00 PM Pacific Daylight Time, Online, $2,500.00 USD
  • Oct 2, 2:00 PM - 6:00 PM Mountain Daylight Time, United States, $2,500.00 USD
  • Oct 29, 9:00 AM - 5:00 PM Pacific Daylight Time, Online, $2,500.00 USD
  • Nov 19, 8:00 AM - 4:00 PM Greenwich Mean Time, Online, $2,500.00 USD
  • Dec 10, 9:00 AM - 5:00 PM Pacific Standard Time, Online, $2,500.00 USD

Public Training

  • London
  • Virtual Class - US Eastern Time
  • Reston, VA
  • Virtual Class - GMT
  • Virtual Class - US Pacific Time
  • Centennial, CO
