
spark's Introduction

What is Spark?

Spark is a fast, scalable, general-purpose engine for large-scale data processing.

  • Written in Scala, a functional programming language that runs on the JVM.

Spark comes in multiple flavours:

  • Spark Shell (Python or Scala): interactive data processing and exploration.
  • Spark Applications: for large-scale data processing needs.

Why Spark?

Spark Context

  • Main entry point to the Spark API.
  • The Spark shell provides a preconfigured Spark context called 'sc'.

Spark RDD (Resilient Distributed Dataset)

RDDs are the fundamental unit of data in Spark, and most processing in Spark is done on them. RDDs are immutable, which allows consistency, concurrency, and easy, deterministic recreation.

  • Resilient: if data in memory is lost, it can be recreated.
  • Distributed: processed across the cluster.
  • Dataset: holds data that may come from heterogeneous sources (such as files or databases) or be created programmatically.
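
The immutability and "recreate on loss" ideas above can be sketched without Spark at all. The block below is a plain-Python toy model, not the Spark API: the class ToyRDD and its fields are hypothetical names introduced only for illustration. Each transformation returns a new dataset that remembers its parent, so the result can be rebuilt at any time by replaying that chain from the source.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class ToyRDD:
    """Plain-Python stand-in for an RDD (hypothetical, not part of Spark)."""
    source: Optional[Callable[[], Iterable]] = None  # how to (re)load base data
    parent: Optional["ToyRDD"] = None                # dataset this one derives from
    transform: Optional[Callable[[Iterable], Iterable]] = None

    def map(self, fn):
        # A transformation never changes self; it returns a new dataset.
        return ToyRDD(parent=self, transform=lambda rows: (fn(r) for r in rows))

    def filter(self, pred):
        return ToyRDD(parent=self, transform=lambda rows: (r for r in rows if pred(r)))

    def collect(self):
        # Nothing is cached here: every call recomputes from the source
        # by replaying the chain of parents and transformations.
        if self.parent is None:
            return list(self.source())
        return list(self.transform(self.parent.collect()))


numbers = ToyRDD(source=lambda: range(1, 6))
squares_of_evens = numbers.filter(lambda n: n % 2 == 0).map(lambda n: n * n)

first = squares_of_evens.collect()   # [4, 16]
second = squares_of_evens.collect()  # recomputed from scratch, same result
```

Because the original dataset is never modified, numbers.collect() still returns 1 through 5 after the derived dataset has been built and evaluated.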

Spark MLlib

  • What is Spark MLlib?

  • Why should you use Spark MLlib?

  • How?

Spark Streaming

  • What is Spark Streaming?

    • An extension of core Spark.

    • Provides capability for real-time processing of streaming data.

    • Use cases: continuous ETL, website monitoring, fraud detection, ad monetization, social media analysis, financial market trends

  • Why should you use Spark Streaming?

    • Integrates batch and real-time processing

    • Easy to develop : uses Spark's high level API

    • Exactly-once ("once and only once") processing

    • Second-scale latencies

    • Scalability and efficient fault tolerance

  • How?

    • Divide the data stream into batches of n seconds

      • The resulting sequence of batches is called a DStream (Discretized Stream)

    • Process each batch in Spark as an RDD

    • Return the results of the RDD operations in batches
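
The batching steps above can be illustrated with a small plain-Python sketch. It does not use Spark Streaming: the function discretize and the sample events are hypothetical, and each batch is an ordinary list standing in for the RDD that Spark would create. Events are grouped into fixed windows of n seconds, each batch is processed on its own, and the results come back one batch at a time.

```python
from collections import Counter, defaultdict


def discretize(events, batch_seconds):
    """Group (timestamp, value) events into consecutive fixed-size batches.

    Hypothetical helper for illustration only. Windows that receive no
    events are simply skipped in this sketch.
    """
    batches = defaultdict(list)
    for timestamp, value in events:
        # Events in the same n-second window share a batch index.
        batches[int(timestamp // batch_seconds)].append(value)
    return [batches[index] for index in sorted(batches)]


# Sample stream: (arrival time in seconds, word)
events = [(0.5, "a"), (1.2, "b"), (2.1, "a"), (3.9, "c"), (4.0, "a")]

# Step 1: divide the stream into batches of 2 seconds.
batches = discretize(events, 2)

# Steps 2 and 3: process each batch independently and return
# one result per batch (here, a word count).
counts = [Counter(batch) for batch in batches]
```

With a 2-second batch size, the five sample events fall into three batches: ["a", "b"], ["a", "c"], and ["a"].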

Spark GraphX

  • What is Spark GraphX?

  • Why should you use Spark GraphX?

  • How?

For further details along with code snippets (PySpark), follow the topics listed below:
