spark-demo

Apache Spark is a cluster computing framework that runs atop distributed storage systems such as HDFS (among others) and offers a substantial performance improvement over Hadoop MapReduce, largely by keeping working data in memory. To use Spark with HDFS, first ensure Hadoop is installed on your system. A demo for configuring Hadoop on OS X is given at https://github.com/jcboyd/hadoop-demo. Spark can be installed with

$ brew install apache-spark

This places it in /usr/local/Cellar/apache-spark. Programming in Spark centers on data structures known as Resilient Distributed Datasets (RDDs), which are partitioned across the cluster and can be rebuilt from their lineage if a partition is lost. Spark is written primarily in Scala and Java and comes with shell interfaces for Scala, Python, and R. For example, launch the Scala shell with

$ /usr/local/Cellar/apache-spark/1.6.0/bin/spark-shell
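Once the shell is running, a few lines are enough to see how an RDD behaves. The following is a minimal sketch, assuming it is typed into spark-shell, where the SparkContext is already available as sc. The names numbers and squares and the choice of 4 partitions are illustrative only.

```scala
// Run inside spark-shell, where `sc` (the SparkContext) is predefined.

// Distribute a local range over 4 partitions (the partition count is an arbitrary choice).
val numbers = sc.parallelize(1 to 10, 4)
println(numbers.partitions.length)

// Transformations such as map are lazy: nothing is computed yet.
val squares = numbers.map(n => n * n)

// Actions such as reduce trigger the actual computation.
println(squares.reduce(_ + _))
```

Because transformations only record how to derive one RDD from another, Spark can recompute a lost partition from this recorded lineage instead of keeping extra copies of the data.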

As a functional language, Scala lends itself well to Spark. A MapReduce-style word count can be written concisely as

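// Read the README as an RDD of lines (sc is the SparkContext provided by spark-shell)
val textFile = sc.textFile("file:///usr/local/Cellar/apache-spark/1.6.0/README.md")
// Split each line into words, pair each word with 1, then sum the counts per word
val wordCounts = textFile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey((a, b) => a + b)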
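To make the three stages concrete, the same pipeline can be traced on an ordinary Scala collection, with no cluster involved. This is only an illustrative sketch: WordCountSketch and its sample input are hypothetical, and groupBy followed by a per-key reduce stands in for Spark's reduceByKey, which does not exist on plain collections.

```scala
object WordCountSketch {
  // Mirrors flatMap -> map -> reduceByKey on a local Seq instead of an RDD.
  def wordCounts(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(line => line.split(" ").toSeq)   // one element per word
      .map(word => (word, 1))                   // pair each word with a count of 1
      .groupBy { case (word, _) => word }       // gather the pairs for each word
      .map { case (word, pairs) =>
        // Sum the counts for this word, as reduceByKey((a, b) => a + b) would.
        (word, pairs.map(_._2).reduce((a, b) => a + b))
      }

  def main(args: Array[String]): Unit = {
    val sample = Seq("to be or", "not to be")
    wordCounts(sample).toSeq.sortBy(_._1).foreach(println)
  }
}
```

On an RDD the grouping step is where data moves between machines. reduceByKey combines counts within each partition before that shuffle, which is one reason it is preferred over grouping first and reducing afterwards.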

A number of data processing libraries are provided with the distribution, notably MLlib, a machine learning library. A linear regression demo using this library is given in LinearRegression.scala. This can be compiled with the Scala build tool sbt:

$ brew install sbt
$ sbt package
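sbt package expects a build definition in the project root. The repository's own build.sbt is not reproduced here, so the following is a sketch under assumptions: the project name, version, and Scala 2.10 are inferred from the jar name spark-demo_2.10-1.0.jar, and the Spark version from the 1.6.0 install path. The exact Scala patch version and the "provided" scope are guesses.

```scala
// build.sbt (hypothetical; settings inferred from the jar name and install path)
name := "spark-demo"

version := "1.0"

// Assumed patch version; any 2.10.x release should yield a scala-2.10 target directory.
scalaVersion := "2.10.5"

// "provided" keeps Spark out of the packaged jar, since spark-submit supplies it at runtime.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % "1.6.0" % "provided",
  "org.apache.spark" %% "spark-mllib" % "1.6.0" % "provided"
)
```

With these settings, sbt package writes the jar to target/scala-2.10/spark-demo_2.10-1.0.jar.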

The model can then be run with the command

$ /usr/local/Cellar/apache-spark/1.6.0/bin/spark-submit --class "LinearRegression" --master "local[4]" target/scala-2.10/spark-demo_2.10-1.0.jar
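LinearRegression.scala itself is not reproduced in this README. For orientation, here is a minimal sketch of what a program of this kind might look like with the RDD-based MLlib API in Spark 1.6. It is not the repository's code: the object name LinearRegressionSketch, the helper fit, the toy data (y = 2x), and the iteration count and step size are all hypothetical choices.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.{LabeledPoint, LinearRegressionModel, LinearRegressionWithSGD}

object LinearRegressionSketch {
  // Fit a one-feature linear model on hypothetical toy data where y = 2x.
  def fit(sc: SparkContext): LinearRegressionModel = {
    val data = sc.parallelize(Seq(1.0, 2.0, 3.0, 4.0)).map { x =>
      LabeledPoint(2.0 * x, Vectors.dense(x))   // (label, features)
    }.cache()

    val numIterations = 100   // assumed value
    val stepSize = 0.1        // assumed value; SGD is sensitive to this choice
    // Note: this train method fits weights only, with no intercept term.
    LinearRegressionWithSGD.train(data, numIterations, stepSize)
  }

  def main(args: Array[String]): Unit = {
    // The master URL is left unset so that spark-submit's --master flag supplies it.
    val sc = new SparkContext(new SparkConf().setAppName("LinearRegressionSketch"))
    val model = fit(sc)
    println(s"weights = ${model.weights}, intercept = ${model.intercept}")
    sc.stop()
  }
}
```

A program like this would be submitted the same way as above, with --class set to the object's name.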
