SP860-Az: Introduction to Data Science and Machine Learning (Azure Databricks)



The training is priced from $75.00 USD per participant.


In this course, data analysts and data scientists practice the full data science workflow: exploring data, building features, training regression and classification models, and tuning and selecting the best model. By the end of this course, you will have built end-to-end machine learning models ready to be launched into production.

NOTE: This course is specific to the Databricks Unified Analytics Platform (based on Apache Spark™). While you might find it helpful for learning how to use Apache Spark in other environments, it does not teach you how to use Apache Spark in those environments.



3-6 hours, 75% hands-on

Format: Self-paced

The course is a series of eight self-paced lessons available in both Scala and Python. A final capstone project involves writing an end-to-end machine learning pipeline including exploratory analysis, featurizing the data, training a machine learning model, and tuning model hyperparameters using grid search and cross-validation.


This version of the course is intended to be run on Azure Databricks.

Note: Access to a Databricks workspace is not part of your course purchase price. You are responsible for getting access to Databricks. See the FAQ for instructions on how to get access to a Databricks workspace.

Learning Objectives

During this course, learners:

  • Contextualize the role of machine learning in the broader technology and business landscape
  • Learn the main topics of supervised machine learning and build a machine learning pipeline in Spark
  • Train and evaluate models in a distributed environment
  • Perform and interpret exploratory data analysis including statistics and plotting
  • Featurize a dataset
  • Train linear regression models
  • Train logistic regression models
  • Tune hyperparameters using grid search and cross-validation


Course Outline

  1. Course Overview and Setup
  2. What is ML?
  3. ML Workflows
  4. Exploratory Analysis
  5. Featurization
  6. Regression Modeling
  7. Classification
  8. Model Selection
  9. Capstone Project

Target Audience

  • Primary Audience: Data Engineers


Prerequisites

  • Getting Started with Apache Spark™ DataFrames (optional, but strongly encouraged)

Lab Requirements

License Limitations

This self-paced training course may be used by 1 user for 12 months from the date of purchase. It may not be transferred or shared with any other user.


The use of the self-paced training course is subject to the Terms of Service and the Databricks Privacy Policy.


6 hours