Scalable Machine Learning with Apache Spark™



In this course, data analysts and data scientists practice the full data science workflow: exploring data, engineering features, building models, tuning hyperparameters, and tracking parameters and managing models with MLflow. By the end of the course, you will have built end-to-end machine learning models that are ready for production.


This course guides students through building machine learning solutions with Spark. You will build and tune ML models with SparkML using transformers, estimators, and pipelines. The course highlights key differences between SparkML and single-node libraries such as scikit-learn. Furthermore, you will use MLflow to reproduce your experiments and version your models.


You will also integrate third-party libraries, such as XGBoost, into Spark workloads. In addition, you will use Spark to scale inference of single-node models and to parallelize hyperparameter tuning. The course includes hands-on labs and concludes with a collaborative capstone project. All notebooks are available in Python, and some are also available in Scala.


Duration: 2 days


Upon completion of the course, students should be able to:


  • Create data processing pipelines with Spark
  • Build and tune machine learning models with SparkML
  • Track, version, and deploy models with MLflow
  • Perform distributed hyperparameter tuning with Hyperopt
  • Use Spark to scale the inference of single-node models 


Audience:

  • Data scientists
  • Machine learning engineers


Prerequisites:

  • Intermediate experience with Python
  • Beginning experience with the PySpark DataFrame API (or completion of the Apache Spark Programming with Databricks class)
  • Working knowledge of machine learning and data science

Upcoming Classes

  • Nov 10 - 13, 8:00 AM - 12:00 PM Australian Eastern Daylight Time (New South Wales); Online - Virtual - APJ (half-day schedule); $1,500.00 USD
  • Nov 10 - 13, 9:00 AM - 1:00 PM Pacific Standard Time; Online - Virtual - Americas (half-day schedule); $1,500.00 USD
  • Dec 2 - 3, 9:00 AM - 5:00 PM Central European Time; Online - Virtual - EMEA; $1,500.00 USD
  • Dec 15 - 18, 9:00 AM - 1:00 PM Pacific Standard Time; Online - Virtual - Americas (half-day schedule); $1,500.00 USD