SP805: Getting Started with Apache Spark SQL (AWS Databricks)
This hands-on, self-paced training course is aimed at analysts and data scientists who are getting started with Databricks to analyze big data using Apache Spark™ SQL. The course ends with a capstone project demonstrating Exploratory Data Analysis with Spark SQL on Databricks.
NOTE: This course is specific to the Databricks Unified Analytics Platform (based on Apache Spark™). While you might find it helpful for learning how to use Apache Spark in other environments, it does not teach you how to use Apache Spark in those environments.
3-6 hours, 75% hands-on
The course is a series of six self-paced lessons plus a final capstone project performing Exploratory Data Analysis using Spark SQL on Databricks. Each lesson includes hands-on exercises.
This version of the course is intended to be run on AWS Databricks.
During this course, learners will:
- Use the interactive Databricks notebook environment.
- Examine external data sets.
- Query existing data sets using Spark SQL.
- Visualize query results and data using the built-in Databricks visualization features.
- Perform exploratory data analysis using Spark SQL.
The course consists of the following lessons:
- Getting Started and Accessing the Course
- Querying Files with SQL
- Aggregations, JOINs and Nested Queries
- Uploading and Accessing Data
- Querying JSON & Hierarchical Data with SQL
- Querying Data Lakes with SQL
- Capstone Project: Exploratory Data Analysis
Audience and requirements:
- Primary Audience: Data Analysts
- Knowledge of SQL required.
- Please be sure to use a supported browser.
This self-paced training course may be used by one user for 12 months from the date of purchase. It may not be transferred to or shared with any other user.