Data Pipelines Capstone: Manufacturing

Summary
This capstone is a guided project focused on establishing a data pipeline to transform data from source through the bronze, silver, and gold layers for a manufacturing organization.
Description
This capstone is a guided project in establishing a data pipeline to transform source data through the bronze, silver, and gold layers for a manufacturing organization. It provides students with a series of Spark programming challenges that replicate the construction of a real-world data pipeline. Students will use and strengthen Spark programming skills such as importing and exporting data, data manipulation, streaming, and visualization.
Objectives
Upon completion, students should be able to:
- Build complete data pipelines using Apache Spark and Python or Scala
- Clearly articulate and apply the Bronze-Silver-Gold data pipeline philosophy
- Read source data from a variety of formats and save as Delta tables
- Clean, transform, and manipulate DataFrames
- Apply logical operations to DataFrames to answer organizational questions
- Build real-time streaming analyses with Structured Streaming
Audience
- This course is ideal for data engineers
- This course is suitable for machine learning engineers and data scientists
Prerequisites
Prerequisite Knowledge:
- Familiarity with Python or Scala is required
- Familiarity with Spark is required
- Familiarity with data engineering best practices is recommended
Prerequisite Courses:
- DB 105 - Apache Spark Programming™
Software & Hardware Requirements
- Web Browser: Chrome
- An Internet Connection
- GoToTraining (for remote classes only); please see the GoToMeeting System Check
- A computer, laptop, or tablet with a keyboard
Additional Notes
- The appropriate, web-based programming environment will be provided to students
- This class is taught in Python and Scala
Outline
- Introduction to the capstone project and the Bronze-Silver-Gold approach
- Transform source data from a variety of source formats to bronze Delta tables
- Clean and manipulate bronze Delta tables to create silver Delta tables for further organizational use
- Apply logical operations to silver Delta tables to create gold Delta tables for answering organizational questions
- Build real-time streaming analyses with Structured Streaming
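
As a rough illustration of the bronze, silver, and gold stages listed in the outline, the flow can be sketched as below. This is a minimal sketch, not course material: it uses plain Python from the standard library as a stand-in for Spark DataFrames and Delta tables, and the sample data, the column names (`device_id`, `temperature_c`, `recorded_at`), and the function names are all hypothetical.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical raw export from a plant sensor system (assumed for illustration).
RAW_CSV = """device_id,temperature_c,recorded_at
press-01,71.5,2024-01-01T08:00:00
press-01,,2024-01-01T08:05:00
press-02,68.0,2024-01-01T08:00:00
press-02,not_a_number,2024-01-01T08:05:00
press-01,73.5,2024-01-01T08:10:00
"""


def to_bronze(raw_text):
    """Bronze: ingest source records as-is, keeping every field as a string."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def to_silver(bronze_rows):
    """Silver: drop unparseable readings and cast the rest to proper types."""
    silver = []
    for row in bronze_rows:
        try:
            temperature = float(row["temperature_c"])
        except ValueError:
            continue  # skip empty or malformed readings
        silver.append(
            {
                "device_id": row["device_id"],
                "temperature_c": temperature,
                "recorded_at": row["recorded_at"],
            }
        )
    return silver


def to_gold(silver_rows):
    """Gold: aggregate to answer a question (mean temperature per device)."""
    by_device = defaultdict(list)
    for row in silver_rows:
        by_device[row["device_id"]].append(row["temperature_c"])
    return {device: mean(temps) for device, temps in by_device.items()}


bronze = to_bronze(RAW_CSV)
silver = to_silver(bronze)
gold = to_gold(silver)
```

Each stage reads only from the one before it, so the raw bronze records stay available for reprocessing if the cleaning or aggregation rules change. In the course itself, each stage would presumably be a Delta table written with Spark rather than an in-memory list.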