Scalable Deep Learning with TensorFlow and Apache Spark™

Summary

This course covers the fundamentals of neural networks with TensorFlow and how to scale your deep learning models with Spark.

Description

This course starts with the basics of the tf.keras API, including defining model architectures, choosing optimizers, and saving and loading models. You then learn advanced concepts such as callbacks, regularization, TensorBoard, and activation functions. After training your models, you integrate them with the MLflow tracking API to reproduce and version your experiments. You apply model interpretability libraries such as LIME and SHAP to understand how the network generates predictions, and you gain familiarity with convolutional neural networks (CNNs) and with using transfer learning to reduce model training time.

Substantial class time is spent on scaling your deep learning applications, from distributed inference with pandas UDFs to distributed hyperparameter search with Hyperopt to distributed model training with Horovod. This course is taught fully in Python.

Duration

2 Days

Objectives

Upon completion of the course, students should be able to:

  • Build deep learning models using Keras/TensorFlow

  • Tune hyperparameters at scale with Hyperopt

  • Track experiments using MLflow

  • Apply models at scale using pandas UDFs

  • Scale & train distributed models using Horovod

  • Apply model interpretability libraries to understand & visualize model predictions

  • Use CNNs (convolutional neural networks) and perform transfer learning to reduce model training time

  • Implement Generative Adversarial Networks

Audience

  • Data scientist

  • Machine learning engineer

Prerequisites

  • Intermediate experience with Python/pandas

  • Familiarity with machine learning concepts

  • Experience with Spark is helpful, but not required

Additional Notes

  • A web-based programming environment will be provided to students
  • This class is taught in Python only

Outline

Day #1 AM
Duration | Module | Description
30 min | Introductions & Setup | Registration, courseware & Q&A
30 min | Spark Review | Create a Spark DataFrame; analyze the Spark UI; cache data; move between pandas and Spark DataFrames
10 min | Break |
35 min | Linear Regression | Build a linear regression model using scikit-learn and reimplement it in Keras; modify the number of epochs; visualize the loss; move between pandas and Spark DataFrames
30 min | Keras | Modify these parameters to improve model performance: activation functions, loss functions, optimizer, batch size; save and load models
10 min | Break |
30 min | Keras Lab | Build and evaluate your first Keras model! (Students use the Boston Housing dataset; the instructor uses California Housing.)
175 min | Total |
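The Linear Regression module trains a model for a chosen number of epochs and visualizes the loss. As a minimal sketch of that idea, assuming plain Python instead of Keras and a hypothetical noise-free toy dataset, the function below fits y = w*x + b with full-batch gradient descent and records the mean squared error after every epoch. `train_linear_regression` and its `learning_rate` default are illustrative names, not course code.

```python
# Minimal plain-Python sketch (hypothetical, not course material): fit
# y = w*x + b with full-batch gradient descent and record the mean squared
# error (MSE) after every epoch, the quantity a loss curve would plot.

def train_linear_regression(xs, ys, epochs, learning_rate=0.05):
    """Return (w, b, loss_history) after `epochs` passes over the data."""
    w, b = 0.0, 0.0
    n = len(xs)
    loss_history = []
    for _ in range(epochs):
        # Gradients of MSE = mean((w*x + b - y)^2) with respect to w and b
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
        # Loss after this epoch's parameter update
        mse = sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / n
        loss_history.append(mse)
    return w, b, loss_history


if __name__ == "__main__":
    # Hypothetical toy data generated from y = 3x + 1 (no noise)
    xs = [0.0, 0.5, 1.0, 1.5, 2.0]
    ys = [3 * x + 1 for x in xs]
    for epochs in (10, 1000):
        w, b, history = train_linear_regression(xs, ys, epochs)
        print(f"epochs={epochs}: w={w:.3f}, b={b:.3f}, final MSE={history[-1]:.5f}")
```

Running it shows that the final MSE after 10 epochs is much larger than after 1,000, which is the effect that changing the number of epochs makes visible in a loss curve.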
Day #1 PM
Duration | Module | Description
35 min | Advanced Keras | Perform data standardization for better model convergence; create custom metrics; add validation data; generate model checkpoints with callbacks; use TensorBoard; apply dropout regularization
30 min | Advanced Keras Lab | Perform data standardization; generate separate training and validation datasets; create early stopping and checkpointing callbacks; load and apply your saved model
10 min | Break |
30 min | MLflow | Log experiments with MLflow; view the MLflow UI; generate a UDF with MLflow and apply it to a Spark DataFrame
20 min | MLflow Lab | Add MLflow to your experiments on the Boston Housing dataset! Create a LambdaCallback that logs MLflow metrics while the model is training (after each epoch); create a UDF that you can invoke in SQL
10 min | Break |
40 min | Hyperopt & Lab | Use Hyperopt with SparkTrials to perform distributed hyperparameter search
10 min | Break |
35 min | Horovod | Use Horovod to train a distributed neural network; distributed deep learning best practices
220 min | Total |
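The Advanced Keras modules use callbacks that watch validation data during training. Keras ships `tf.keras.callbacks.EarlyStopping` (with `monitor`, `patience`, and `min_delta` arguments) and `tf.keras.callbacks.ModelCheckpoint` for this purpose. The plain-Python sketch below reproduces only the patience logic on a hypothetical list of per-epoch validation losses; `early_stopping_epoch` is an illustrative helper, not part of Keras, and it is not how Keras implements the callback.

```python
# Minimal plain-Python sketch of patience-based early stopping, the idea
# behind Keras's EarlyStopping callback (not its actual implementation).

def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for per-epoch validation losses.

    Training stops once `patience` consecutive epochs fail to improve on the
    best loss seen so far by more than `min_delta`. Epochs are 0-indexed.
    If the stop never triggers, stop_epoch is the last epoch.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # Improvement: a checkpointing callback would save weights here
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


if __name__ == "__main__":
    # Hypothetical validation losses: steady improvement, then overfitting
    losses = [0.90, 0.62, 0.48, 0.45, 0.47, 0.46, 0.49, 0.51, 0.55]
    stop, best = early_stopping_epoch(losses, patience=3)
    print(f"stopped after epoch {stop}; best weights from epoch {best}")
```

With `patience=3`, the sample losses stop improving after epoch 3, so training halts at epoch 6 and the best weights come from epoch 3, which is the model a checkpoint saved with the best validation loss would let you reload.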
Day #2 AM
Duration | Module | Description
20 min | Review | Review of Day 1
30 min | Horovod Petastorm | Use Horovod to train a distributed neural network using Parquet files and Petastorm
10 min | Break |
45 min | Horovod Lab | Prepare your data for use with Horovod; distribute the training of your model using HorovodRunner; use Parquet files as input data for your distributed deep learning model with Petastorm and Horovod
10 min | Break |
35 min | Model Interpretability | Use LIME and SHAP to understand which features are most important to the model's prediction for a given data point
150 min | Total |
Day #2 PM
Duration | Module | Description
45 min | CNNs | Analyze popular CNN architectures; apply pre-trained CNNs to images using a Scalar Iterator pandas UDF (introduced in Spark 3.0)
20 min | LIME for CNNs Lab | Use LIME to visualize how the CNN makes predictions
10 min | Break |
30 min | Transfer Learning | Perform transfer learning to create a cat vs. dog classifier
30 min | Transfer Learning Lab | Use transfer learning to build a model that predicts, with nearly perfect accuracy, whether a patient has pneumonia
10 min | Break |
30 min | Generative Adversarial Networks | Understand generative and discriminative models; build GANs
10 min | Break |
25 min | Best Practices | Discuss deep learning best practices, the state of the art, and new research areas
210 min | Total |

Upcoming Classes

Date | Time | Location | Price
Apr 26 - 29 | 9:00 AM - 1:00 PM Pacific Daylight Time | Online - Virtual - Americas (half-day schedule) | $1500.00 USD
Jun 21 - 24 | 9:00 AM - 1:00 PM Pacific Daylight Time | Online - Virtual - Americas (half-day schedule) | $1500.00 USD
Aug 2 - 5 | 9:00 AM - 1:00 PM Pacific Daylight Time | Online - Virtual - Americas (half-day schedule) | $1500.00 USD

Onsite Training: Request a quote.

Public Training: Virtual - Americas (half-day schedule)

Classes marked "Full" are full, and no additional registrations are accepted. If you cannot find another class that suits your schedule, feel free to request a class and we will do our best to accommodate your needs.


Don't see a date that works for you? Request a class.