MLflow: Managing the Machine Learning Lifecycle (with capstone)

Summary

In this course, data scientists and data engineers learn best practices for managing experiments, projects, and models using MLflow. By the end of the course, you will have built a pipeline that logs machine learning models and deploys them using the same environment in which they were trained.

NOTE: This course is specific to the Databricks Unified Analytics Platform (based on Apache Spark™). While you might find it helpful for learning how to use Apache Spark in other environments, it does not teach you how to use Apache Spark in those environments.

Update: This course has been updated to cover the new Model Registry features.

Note: This course will not run on Databricks Community Edition.

Description

WARNING

When you run the course notebooks on a newer Databricks Runtime (DBR), you will see a warning like the following:

WARNING: This notebook was tested on DBR 6.2 but we found DBR 7.0. Using an untested DBR may yield unexpected results and/or various errors Please update your cluster configuration and/or download a newer version of this course before proceeding.

A new version of the courseware is not currently available, but we encourage you to continue the course using a newer runtime.

Length

3-6 hours, 75% hands-on

Format: Self-paced

The course is a series of six self-paced lessons available in Python. A final capstone project involves packaging an MLflow-based workflow that includes pre-processing logic, the optimal ML algorithm and hyperparameters, and post-processing logic. Each lesson includes hands-on exercises.

Platform

This course is intended to be run in a Databricks workspace. The course contains Databricks notebooks for both Azure Databricks and AWS Databricks; you can run the course on either platform.

Note: Access to a Databricks workspace is not included in the course purchase price. You are responsible for obtaining access to Databricks. See the FAQ for instructions on how to get access to a Databricks workspace.

Learning Objectives

During this course, you will:

  • Track machine learning experiments to organize the machine learning life cycle
  • Create, organize, and package machine learning projects with a focus on reproducibility and collaborating with a team
  • Manage the complexity of multistep machine learning projects using multistep workflows
  • Develop a generalizable way of handling machine learning models created in and deployed to a variety of environments
  • Apply what you have learned in a capstone project in which you create a workflow that includes pre-processing logic, the optimal ML algorithm and hyperparameters, and post-processing logic

Lessons

  1. Course Overview and Setup
  2. Experiment Tracking
  3. Packaging ML Projects
  4. Multistep Workflows
  5. Model Management
  6. Model Registry
  7. Capstone Project

Target Audience

Primary Audience: Data Scientists and Data Engineers

Prerequisites

  • Python (pandas, scikit-learn, NumPy)
  • Background in machine learning and data science

Lab Requirements

Please be sure to use a supported browser.

Duration

6 hours