Data Science on Databricks: The Bias-Variance Tradeoff

Summary

This course shows you how to select a family of machine learning models for deployment.

Description

In this course, we’ll show you how to use scikit-learn on Databricks, together with core statistical and data science principles, to select a family of machine learning models for deployment.

This course is the first in a series of three courses that show you how to use Databricks to take a single data set from experimentation to production-scale machine learning model deployment. The other courses in this series are:

  • Tracking Experiments with MLflow

  • Deploying a Machine Learning Project with MLflow Projects

Learning objectives

  • Create and explore an aggregate sample from user event data.

  • Design an MLflow experiment to estimate model bias and variance.

  • Use exploratory data analysis and estimated model bias and variance to select a family of models for model development.
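
As a rough illustration of what estimating model bias and variance involves, the sketch below refits a model on many freshly drawn training sets, then measures how far its average prediction at fixed points lies from the true signal (squared bias) and how much its predictions scatter (variance). Everything in it is an assumption made for illustration and is not taken from the course: the data is synthetic, the helper names true_f, knn_predict, and estimate_bias_variance are hypothetical, and a hand-written nearest-neighbor regressor in plain Python stands in for the scikit-learn models and MLflow tracking that the course uses.

```python
import math
import random
import statistics


def true_f(x):
    """Known signal; only available because the data is synthetic."""
    return math.sin(x)


def knn_predict(train, x, k):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def estimate_bias_variance(k, n_trials=200, n_train=40, noise=0.3, seed=0):
    """Refit on many fresh training sets; return (mean bias^2, mean variance)."""
    rng = random.Random(seed)
    test_xs = [0.5 * i for i in range(1, 12)]  # fixed evaluation points in (0, 6)
    preds = {x: [] for x in test_xs}
    for _ in range(n_trials):
        # Draw a new noisy training set from the same underlying signal.
        xs = [rng.uniform(0.0, 6.0) for _ in range(n_train)]
        train = [(x, true_f(x) + rng.gauss(0.0, noise)) for x in xs]
        for x in test_xs:
            preds[x].append(knn_predict(train, x, k))
    # Squared bias: gap between the average prediction and the true signal.
    bias_sq = statistics.mean(
        (statistics.mean(p) - true_f(x)) ** 2 for x, p in preds.items()
    )
    # Variance: spread of the predictions across training sets.
    variance = statistics.mean(statistics.pvariance(p) for p in preds.values())
    return bias_sq, variance


flexible = estimate_bias_variance(k=1)   # very flexible model
rigid = estimate_bias_variance(k=30)     # heavily smoothed model
print("k=1  bias^2=%.3f variance=%.3f" % flexible)
print("k=30 bias^2=%.3f variance=%.3f" % rigid)
```

With these settings, the k=1 model should show lower squared bias but higher variance than the k=30 model, which is the tradeoff the course title refers to. On real data the true signal is unknown, so both quantities have to be estimated indirectly, for example by resampling.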

Prerequisites

  • Beginner-level experience running data science workflows in the Databricks Workspace

  • Beginner-level experience with Apache Spark

  • Intermediate-level experience with the SciPy numerical stack