
spork's Introduction

This Pig branch adds a Spark execution mode (Spork!).

Getting Started

Dependencies

  1. Spark version 0.9.0
  2. Hadoop 1.0.4
  3. Java 7
  4. Git client
  5. ant

Building spork

Download the code and build spork using ant:

$ git clone https://github.com/sigmoidanalytics/spork.git -b spork-0.9
$ cd spork
$ ant jar-all

Configuring spork

Export the following variables in your shell, or add them to your bash profile:

export SPARK_HOME=/path/to/spark
export HADOOP_HOME=/path/to/hadoop
export HADOOP_CONF_DIR=/path/to/hadoop/conf
export BROADCAST_MASTER_IP="set to the same value as SPARK_MASTER_IP"      # e.g. localhost
export BROADCAST_PORT=6000
export SPARK_MASTER="set spark master here"     # e.g. local or spark://localhost:7077
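
Before starting Pig, it can help to confirm that every variable above is actually set in the current shell. The function below is a minimal sketch of such a check; `check_spork_env` is a hypothetical helper name and is not part of spork. It only tests that each variable is non-empty, not that the paths or addresses are valid.

```shell
# Hypothetical helper (not part of spork): report which of the variables
# listed above are unset or empty in the current shell.
check_spork_env() {
  missing=0
  for name in SPARK_HOME HADOOP_HOME HADOOP_CONF_DIR \
              BROADCAST_MASTER_IP BROADCAST_PORT SPARK_MASTER; do
    # Indirectly read the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name"
      missing=1
    fi
  done
  # Returns 0 when everything is set, 1 otherwise.
  return "$missing"
}
```

Running `check_spork_env` after sourcing your profile prints one `missing: NAME` line per unset variable and returns a non-zero status if any are missing.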

Run sample script

Put the sample data into HDFS:

$ hadoop fs -mkdir /pig-test/input/
$ hadoop fs -put ./tutorial/data/excite-small.log /pig-test/input/

Start Pig and paste the script at the prompt:

$ ./pig-spark
raw = LOAD '/pig-test/input/excite-small.log' USING PigStorage('\t') AS (user: chararray, time:chararray, query:chararray);
queries = FOREACH raw GENERATE query;
distinct_queries = DISTINCT queries;
STORE distinct_queries INTO '/pig-test/output/';
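
The script keeps the third tab-separated field of each record (`query`) and removes duplicates. For a quick sanity check on a small input, roughly the same result can be computed locally with standard Unix tools. This is a sketch, not part of spork: `distinct_queries` is a hypothetical helper, it assumes well-formed tab-separated lines, and it prints results in sorted order, whereas Pig's DISTINCT does not promise any particular order.

```shell
# Hypothetical local check (not part of spork): print the distinct values
# of the third tab-separated column, mirroring the Pig script above.
distinct_queries() {
  # cut splits on tabs by default; sort -u removes duplicate lines.
  cut -f3 "$1" | sort -u
}
```

For example, `distinct_queries ./tutorial/data/excite-small.log | wc -l` gives a count that can be compared against the number of lines stored under /pig-test/output/.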

TODO

  1. Migrate to Spark 1.0
  2. Create a Spark planner instead of using the MapReduce planner
  3. Get e2e tests to work with Spork and create a benchmark report

Please feel free to file issues on our GitHub repo (https://github.com/sigmoidanalytics/spork) or mail us at: [email protected].

