DB 301 - Apache Spark™ for Machine Learning and Data Science

Summary

This course provides a thorough, hands-on overview of Apache Spark and its applications to Machine Learning.

Description

The course covers the fundamentals of Apache Spark, including Spark’s architecture and internals, the core APIs for using Spark, SQL and other high-level data access tools, and Spark’s streaming capabilities, with a heavy focus on Spark’s machine learning APIs. It is delivered as a mixture of lecture and hands-on labs.

Duration

3 Days

Objectives

After completion of this class, students will be able to:

  • Understand when and where to use Spark
  • Articulate the difference between an RDD, DataFrame, and Dataset
  • Explain supervised vs unsupervised machine learning, and typical applications of both
  • Build a Machine Learning Pipeline using a combination of Transformers and Estimators
  • Save and restore models
  • Apply models to streaming data
  • Perform hyperparameter tuning with cross-validation
  • Analyze Spark query performance using the Spark UI
  • Train models with third-party libraries such as XGBoost
  • Perform hyperparameter search in parallel using single-node libraries such as scikit-learn
  • Gain familiarity with Decision Trees, Random Forests, Gradient Boosted Trees, Linear Regression, Collaborative Filtering, and K-Means
  • Explain options for putting models into production

Audience

Primarily intended for data scientists, the course provides a thorough, hands-on overview of Apache Spark and its applications to Machine Learning.

Prerequisites

  • Intermediate to advanced programming experience in Python or Scala is required
  • Practical experience developing Apache Spark applications is recommended
  • Proficiency with Apache Spark's DataFrames API is desired but not essential

Additional Notes

All participants will need:

  • an internet connection

  • a device that is compliant with the supported internet browsers

  • to confirm that their device can run GoToTraining

NOTE: GoToTraining is our chosen online platform through which the class will be delivered. Prior to attendance, each registrant will receive GoToTraining log-in instructions.

Upcoming Classes

Date             Time                                      Location                                  Price
Aug 19 - Aug 23  9:00 AM - 1:00 PM British Summer Time     Online - Virtual Class - BST Time         $2500.00 USD
Sep 10 - Sep 12  9:00 AM - 5:00 PM Pacific Daylight Time   Online - Virtual Class - US Pacific Time  $2500.00 USD
Sep 10 - Sep 12  9:00 AM - 5:00 PM Pacific Daylight Time   San Francisco, United States              $2500.00 USD
Oct 29 - Oct 31  9:00 AM - 5:00 PM Eastern Daylight Time   Online - Virtual Class - US Eastern Time  $2500.00 USD
Oct 29 - Oct 31  9:00 AM - 5:00 PM Eastern Daylight Time   McLean, United States                     $2500.00 USD
Dec 10 - Dec 12  9:00 AM - 5:00 PM Pacific Standard Time   Online - Virtual Class - US Pacific Time  $2500.00 USD
Dec 10 - Dec 12  9:00 AM - 5:00 PM Pacific Standard Time   San Francisco, United States              $2500.00 USD ($2125.00 USD before Sep 11, 2019 10:00 AM PDT)
