Introduction to Feature Engineering and Selection

Summary

Engineer and select features to improve supervised machine learning solutions.

Description

As data practitioners work on supervised machine learning solutions, they often need to manipulate data so that it is compatible with the requirements of machine learning algorithms and so that the model meets its objective. This process is known as feature engineering, and its goal is to improve the output of machine learning solutions. Once features are engineered, data practitioners also commonly need to select the best features to use in their machine learning projects. In this course, you’ll learn how to perform both of these tasks. The course is divided into two modules: the first covers feature engineering, and the second covers feature selection. Both modules start with an introduction to the topic: what it is and why it’s used. Then, you’ll review techniques that help data practitioners perform these tasks. Finally, you’ll complete two hands-on lab activities: one where you engineer features and another where you select features for a fictional machine learning scenario.
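To make the two tasks concrete, here is a minimal sketch in plain Python. It is not taken from the course materials. The toy records, the column names, and the helper functions `one_hot` and `drop_constant_features` are all hypothetical, and the two techniques shown (one-hot encoding and a variance threshold) are just common examples of feature engineering and feature selection, not necessarily the ones the course covers.

```python
from statistics import pvariance

# Hypothetical toy records; the column names are illustrative only.
rows = [
    {"color": "red", "size_cm": 10.0, "unit": 1.0},
    {"color": "blue", "size_cm": 12.5, "unit": 1.0},
    {"color": "red", "size_cm": 9.0, "unit": 1.0},
]


def one_hot(rows, column):
    """Feature engineering: replace a categorical column with 0/1 indicator columns."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Copy every column except the categorical one being encoded.
        new_row = {k: v for k, v in row.items() if k != column}
        for category in categories:
            new_row[f"{column}_{category}"] = 1.0 if row[column] == category else 0.0
        encoded.append(new_row)
    return encoded


def drop_constant_features(rows, threshold=0.0):
    """Feature selection: keep only columns whose variance exceeds the threshold."""
    keep = [
        name
        for name in rows[0]
        if pvariance([row[name] for row in rows]) > threshold
    ]
    return [{name: row[name] for name in keep} for row in rows]


engineered = one_hot(rows, "color")            # numeric columns an algorithm can consume
selected = drop_constant_features(engineered)  # the constant "unit" column is dropped
print(sorted(selected[0]))
```

In this sketch the text column becomes numeric indicator columns that most algorithms can consume, and the selection step then discards a column that carries no information because its value never changes.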

Learning objectives

  • Explain popular feature engineering techniques used to improve supervised machine learning solutions.

  • Explain popular feature selection techniques used to improve supervised machine learning solutions.

  • Engineer meaningful features for use in a supervised machine learning project using the Databricks Data Science Workspace.

  • Select meaningful features for use in a supervised machine learning project using the Databricks Data Science Workspace.

Prerequisites

  • Intermediate experience with machine learning (experience using machine learning and data science libraries like scikit-learn and pandas, knowledge of linear models)

  • Intermediate experience using the Databricks Workspace to perform data analysis (using Spark DataFrames, Databricks notebooks, etc.)

  • Beginning experience with statistical concepts commonly used in data science

Learning path

  • This course is part of the Data Scientist learning path.

Proof of completion

  • After completing 80% of this course, you will receive a proof of completion.