Getting Started with Apache Spark SQL (with capstone)

The training is priced from $75.00 USD per participant.



Summary

This hands-on, self-paced training course is intended for analysts and data scientists who are getting started with Databricks to analyze big data with Apache Spark™ SQL. The course ends with a capstone project demonstrating Exploratory Data Analysis with Spark SQL on Databricks.

NOTE: This course is specific to the Databricks Unified Analytics Platform (based on Apache Spark™). While you might find it helpful for learning how to use Apache Spark in other environments, it does not teach you how to use Apache Spark in those environments.

Description

Length

3-6 hours, 75% hands-on

Format: Self-paced

The course is a series of six self-paced lessons plus a final capstone project performing Exploratory Data Analysis using Spark SQL on Databricks. Each lesson includes hands-on exercises.

Platform

This course is intended to be run in a Databricks workspace. The course contains Databricks notebooks for both Azure Databricks and AWS Databricks; you can run the course on either platform.

Note: Access to a Databricks workspace is not included in the course purchase price. You are responsible for getting access to Databricks. See the FAQ for instructions on how to get access to a Databricks workspace.

Learning Objectives

During this course, learners will:

  • Use the interactive Databricks notebook environment.
  • Examine external data sets.
  • Query existing data sets using Spark SQL.
  • Visualize query results and data using the built-in Databricks visualization features.
  • Perform exploratory data analysis using Spark SQL.

Lessons

  1. Getting Started and Accessing the Course
  2. Querying Files with SQL
  3. Aggregations, JOINs and Nested Queries
  4. Uploading and Accessing Data
  5. Querying JSON & Hierarchical Data with SQL
  6. Querying Data Lakes with SQL
  7. Capstone Project: Exploratory Data Analysis

Target Audience

  • Primary Audience: Data Analysts

Prerequisites

  • Knowledge of SQL is required.

Lab Requirements

Duration

6 hours