Josh Innovations

Apache Spark

By: Professor

Apache Spark is a unified framework for big data analytics that gives developers, data scientists and analysts a single integrated API for a variety of tasks. It supports a range of popular languages, including Python, R, SQL, Java and Scala. This course aims to provide hands-on experience in building real-time data stream analysis and large-scale learning solutions for data scientists, data analysts and software developers. Spark is an open-source, lightning-fast cluster computing framework maintained by the Apache Software Foundation. It extends the MapReduce model to cover more types of computation, such as interactive queries and stream processing, and it offers a computing solution that is scalable, flexible and cost-effective. Spark was designed to speed up Hadoop's computational processing. Because Spark has its own cluster management, it does not depend on Hadoop for computation; when the two are used together, Hadoop typically serves as the storage layer.

Course Content

Apache Spark

  • Introduction to Apache Spark
  • Why Spark
  • Batch Vs. Real Time Big Data Analytics
  • Batch Analytics – Hadoop Ecosystem Overview
  • Real Time Analytics Options
  • Streaming Data – Storm
  • In Memory Data – Spark
  • What is Spark?
  • Spark benefits to Professionals
  • Limitations of MapReduce in Hadoop
  • Components of Spark
  • Spark Execution Architecture
  • Benefits of Apache Spark
  • Hadoop vs Spark
  • Features of Scala
  • Basic Data Types of Scala
  • Val vs Var
  • Type Inference
  • REPL
  • Objects & Classes in Scala
  • Functions as Objects in Scala
  • Anonymous Functions in Scala
  • Higher Order Functions
  • Lists in Scala
  • Maps
  • Pattern Matching
  • Traits in Scala
  • Collections in Scala
  • Spark & Distributed Systems
  • Spark for Scalable Systems
  • Spark Execution Context
  • What is RDD
  • RDD Deep Dive
  • RDD Dependencies
  • RDD Lineage
  • Spark Application In Depth
  • Spark Deployment
  • Parallelism in Spark
  • Caching in Spark
  • Spark Transformations
  • Spark Actions
  • Spark Cluster
  • Spark SQL Introduction
  • Spark Data Frames
  • Spark SQL with CSV
  • Spark SQL with JSON
  • Spark SQL with Database
  • Features of Spark Streaming
  • Micro Batch
  • DStreams
  • Transformations on DStreams
  • Spark Streaming Use Case 1
  • Spark Streaming Use Case 2
  • Spark Streaming Use Case 3
  • Introduction to Graph Parallel Systems
  • Introduction to GraphX
  • Features of GraphX
  • GraphX Deep Dive
  • Graph Builder
  • Using MLlib for Movie Recommendations
  • Analyzing Recommendation Results using Spark


Copyright © Josh Innovations 2021. All rights reserved. Created by Starsite