Course provided by Udemy

Study type: Online

Starts: Anytime

Price: See latest price on Udemy

Overview

Learn AWS EMR and Spark 2 using Scala as the programming language

Spark is an in-memory distributed computing framework in the Big Data ecosystem, and Scala is the programming language used here to work with it. Spark is one of the hottest technologies in Big Data today.

Spark 2 has changed drastically from Spark 1. Spark SQL and DataFrames have become the core module on which other modules, such as Structured Streaming and Machine Learning Pipelines, are built.

As part of this course, there will be a lot of emphasis on the lower-level APIs of Spark, called transformations and actions, along with the core module, Spark SQL and DataFrames.
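
To give a feel for the difference between transformations and actions, here is a minimal sketch in Scala. It is not taken from the course material: the object name, the sample order data, and the `local[*]` master are assumptions made for illustration (on EMR the master would normally be supplied by `spark-submit`). It computes the same total twice, once with RDD transformations and an action, and once with the DataFrame API.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.sum

// Hypothetical example object; names and data are illustrative only.
object TransformationsAndActions {

  // Lower-level API: filter and map are lazy transformations,
  // reduce is an action that triggers the job and returns a value to the driver.
  def rddTotal(spark: SparkSession, orders: Seq[(String, Double)]): Double =
    spark.sparkContext
      .parallelize(orders)
      .filter { case (status, _) => status == "COMPLETE" }
      .map { case (_, amount) => amount }
      .reduce(_ + _)

  // DataFrame API: filter and agg are lazy as well,
  // first() is the action that runs the query.
  def dataFrameTotal(spark: SparkSession, orders: Seq[(String, Double)]): Double = {
    import spark.implicits._
    orders.toDF("status", "amount")
      .filter($"status" === "COMPLETE")
      .agg(sum($"amount"))
      .first()
      .getDouble(0)
  }

  def main(args: Array[String]): Unit = {
    // local[*] is an assumption for running on a single machine.
    val spark = SparkSession.builder()
      .appName("TransformationsAndActions")
      .master("local[*]")
      .getOrCreate()

    // Made-up sample data: (order status, order amount)
    val orders = Seq(("COMPLETE", 100.0), ("PENDING", 50.0), ("COMPLETE", 25.5))

    println(s"RDD total: ${rddTotal(spark, orders)}")
    println(s"DataFrame total: ${dataFrameTotal(spark, orders)}")

    spark.stop()
  }
}
```

Nothing is computed when `filter`, `map`, or `agg` is called; Spark only builds a plan, and the work happens when an action such as `reduce` or `first` asks for a result.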

EMR brings cloud capabilities to Big Data. EMR offers a choice of services to run on the cluster, and Spark 2 is one of them.

As part of this course, you will learn

  • Basics of Amazon Web Services
  • Security using IAM
  • Setting up EMR clusters
  • Basics of programming using Scala
  • Spark 2 – Core Transformations and Actions
  • Spark 2 – Spark SQL and DataFrames
  • Spark 2 – Streaming and Structured Streaming
  • Different File Formats and Compression algorithms
  • Submitting Spark 2 jobs on EMR using step execution
  • and many more

We will start with understanding the basics of AWS and setting up an EMR cluster with Spark, and then jump into Spark 2 using Scala as the programming language.

Expected Outcomes

  1. Ability to use AWS from an enterprise perspective
  2. Create EMR clusters and run Spark jobs
  3. Set up a development environment to develop Spark 2 applications using Scala
  4. Understand basics of programming using Scala
  5. Ability to use Spark 2 Transformations and Actions to process the data
  6. Deep dive into Spark SQL, DataFrames and Datasets from Spark 2
  7. Applying the Spark 2 skills on real world problems to process data at scale
  8. Using IntelliJ as the IDE to develop Spark 2 applications using Scala as the programming language