Data Pipelines Capstone: Manufacturing

Summary

This capstone is a guided project focused on establishing a data pipeline to transform data from source through the bronze, silver, and gold layers for a manufacturing organization.

Description

This capstone is a guided project in establishing a data pipeline to transform source data through the bronze, silver, and gold layers for a manufacturing organization. It provides students with a series of Spark programming challenges that replicate the construction of a real-world data pipeline. Students will use and improve Spark programming skills such as importing and exporting data, data manipulation, streaming, and visualization.

Duration

8 hours

Objectives

Upon completion, students should be able to:

  • Build complete data pipelines using Apache Spark and Python or Scala
  • Clearly articulate and apply the Bronze-Silver-Gold data pipeline philosophy
  • Read source data from a variety of formats and save as Delta tables
  • Clean, transform, and manipulate DataFrames
  • Apply logical operations to DataFrames to answer organizational questions
  • Build real-time streaming analyses with Structured Streaming

Audience

  • This course is ideal for data engineers
  • This course is suitable for machine learning engineers and data scientists

Prerequisites

Prerequisite Knowledge:

  • Familiarity with Python or Scala is required
  • Familiarity with Spark is required
  • Familiarity with data engineering best practices is recommended

Prerequisite Courses:

  • DB 105 - Apache Spark Programming™

Software & Hardware Requirements

  • Web Browser: Chrome
  • An Internet Connection
  • GoToTraining (for remote classes only)
    Please see the GoToMeeting System Check
  • A computer, laptop, or tablet with a keyboard

Additional Notes

  • The appropriate, web-based programming environment will be provided to students
  • This class is taught in Python and Scala

Outline

  • Introduction to the capstone project and the Bronze-Silver-Gold approach
  • Transform source data from a variety of source formats to bronze Delta tables
  • Clean and manipulate bronze Delta tables to create silver Delta tables for further organizational use
  • Apply logical operations to silver Delta tables to create gold Delta tables for answering organizational questions
  • Build real-time streaming analyses with Structured Streaming
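
To give a rough feel for the bronze, silver, and gold stages in the outline above, here is a minimal sketch in plain Python. It is not course material and deliberately uses neither Spark nor Delta: the sample data, the column names, and the functions to_bronze, to_silver, and to_gold are all hypothetical, chosen only to show what each layer is responsible for.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw export from a plant-floor system (illustrative, not course data).
RAW_CSV = """machine_id,reading_ts,temperature_c
M-01,2024-01-01T08:00:00,71.5
M-01,2024-01-01T08:05:00,
M-02,2024-01-01T08:00:00,64.0
M-02,2024-01-01T08:00:00,64.0
M-01,2024-01-01T08:10:00,73.5
"""


def to_bronze(raw_text):
    """Bronze: land the source records as-is, every field kept as a string."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def to_silver(bronze_rows):
    """Silver: drop incomplete rows, remove duplicate readings, cast types."""
    seen = set()
    silver = []
    for row in bronze_rows:
        if not row["temperature_c"]:
            continue  # missing measurement
        key = (row["machine_id"], row["reading_ts"])
        if key in seen:
            continue  # same machine and timestamp already kept
        seen.add(key)
        silver.append(
            {
                "machine_id": row["machine_id"],
                "reading_ts": row["reading_ts"],
                "temperature_c": float(row["temperature_c"]),
            }
        )
    return silver


def to_gold(silver_rows):
    """Gold: aggregate to answer a question (average temperature per machine)."""
    readings = defaultdict(list)
    for row in silver_rows:
        readings[row["machine_id"]].append(row["temperature_c"])
    return {m: sum(v) / len(v) for m, v in sorted(readings.items())}


if __name__ == "__main__":
    bronze = to_bronze(RAW_CSV)
    silver = to_silver(bronze)
    print(to_gold(silver))
```

In the capstone itself, each stage would presumably be a Spark DataFrame saved as a Delta table rather than a Python list, but the division of work between the layers is the same idea.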
