SP800: Getting Started with Apache Spark DataFrames (AWS Databricks)

The training is priced from $75.00 USD per participant.



Summary

This hands-on, self-paced training course is for Analysts and Data Scientists who are getting started with Databricks on AWS to analyze big data with Apache Spark™ DataFrames. The course ends with a capstone project demonstrating Exploratory Data Analysis with Spark DataFrames on Databricks.

NOTE: This course is specific to the Databricks Unified Analytics Platform (based on Apache Spark™). While you might find it helpful for learning how to use Apache Spark in other environments, it does not teach you how to use Apache Spark in those environments.

Description

Length

3-6 hours, 75% hands-on

Format: Self-paced

The course is a series of six self-paced lessons plus a final capstone project performing Exploratory Data Analysis using Spark DataFrames on Databricks. Each lesson includes hands-on exercises.

Platform

This version of the course is intended to be run on AWS Databricks.

Learning Objectives

During this course, learners will:

  • Use the interactive Databricks notebook environment.
  • Examine external data sets.
  • Query existing data sets using Spark DataFrames.
  • Visualize query results and data using the built-in Databricks visualization features.
  • Perform exploratory data analysis using Spark DataFrames.
  • Translate SQL statements to DataFrame syntax.

Lessons

  1. Getting Started and Accessing the Course
  2. Querying Files with DataFrames
  3. Aggregations and JOINs
  4. Uploading and Accessing Data
  5. Querying JSON & Hierarchical Data with DataFrames
  6. Querying Data Lakes with DataFrames
  7. Capstone Project: Exploratory Data Analysis

Target Audience

  • Primary Audience: Data Scientists and Engineers
  • Secondary Audience: Data Analysts

Prerequisites

  • Programming experience in Scala or Python is required.

Lab Requirements

License Limitations

This self-paced training course may be used by one user for 12 months from the date of purchase. It may not be transferred to or shared with any other user.

Terms

The use of the self-paced training course is subject to the Terms of Service and the Databricks Privacy Policy.

Duration

6 hours