Git Product home page Git Product logo

gearpump's Introduction

GearPump Build Status codecov.io

download

GearPump is a lightweight real-time big data streaming engine. It is inspired by recent advances in the Akka framework and a desire to improve on existing streaming frameworks.

The name GearPump is a reference to the engineering term β€œgear pump,” which is a super simple pump that consists of only two gears, but is very powerful at streaming water.

We model streaming within the Akka actor hierarchy.

Per initial benchmarks we are able to process 11 million messages/second (100 bytes per message) with a 17ms latency on a 4-node cluster.

For steps to reproduce the performance test, please check Performance benchmark

Technical papers

There is a 20 pages technical paper on typesafe blog, with technical highlights https://typesafe.com/blog/gearpump-real-time-streaming-engine-using-akka

Getting Started

The latest release binary can be found at: https://github.com/intel-hadoop/gearpump/releases

You can skip step 1 and step 2 if you are using pre-build binaries.

The latest released version can be found at: https://github.com/intel-hadoop/gearpump/releases

  1. Clone the GearPump repository
git clone https://github.com/intel-hadoop/gearpump.git
cd gearpump
  1. Build package

Build a package

## Please use scala 2.11
## The target package path: target/gearpump-$VERSION.tar.gz
sbt clean assembly packArchive ## Or use: sbt clean assembly pack-archive
  1. Configure

Distribute the package to all nodes. Modify conf/gear.conf on all nodes. You MUST configure akka.remote.netty.tcp.hostname to make it point to your hostname(or ip), and gearpump.cluster.masters to represent a list of master nodes.

### Put Akka configuration here
base {

  ##############################
  ### Required to change!!
  ### You need to set the ip address or hostname of this machine
  ###
  akka.remote.netty.tcp.hostname = "127.0.0.1"
}

#########################################
### This is the default configuration for gearpump
### To use the application, you at least need to change gearpump.cluster to point to right master
#########################################
gearpump {

  ##############################
  ### Required to change!!
  ### You need to set the master cluster address here
  ###
  ###
  ### For example, you may start three master
  ### on node1: bin/master -ip node1 -port 3000
  ### on node2: bin/master -ip node2 -port 3000
  ### on node3: bin/master -ip node3 -port 3000
  ###
  ### Then you need to set the cluster.masters = ["node1:3000","node2:3000","node3:3000"]
  cluster {
    masters = ["127.0.0.1:3000"]
  }
}
  1. Start Master nodes

Start the master daemon on all nodes you have configured in gearpump.cluster.masters. If you have configured gearpump.cluster.masters to:

gearpump{
   cluster {
    masters = ["node1:3000", "node2:3000"]
  }
}

Then start master daemon on node1 and node2.

## on node1
cd gearpump-$VERSION
bin/master -ip node1 -port 3000

## on node2
cd gearpump-$VERSION
bin/master -ip node1 -port 3000

We support Master HA and allow master to start on multiple nodes.

  1. Start worker

Start multiple workers on one or more nodes.

bin/worker
  1. Submit application jar and run

You can submit your application to cluster by providing an application jar. For example, for built-in examples, the jar is located at examples/gearpump-examples-assembly-$VERSION.jar

## To run WordCount example
bin/gear app -jar examples/gearpump-examples-assembly-$VERSION.jar org.apache.gearpump.streaming.examples.wordcount.WordCount -master node1:3000

Check the wiki pages for more on build and [running examples] in local modes](https://github.com/intel-hadoop/gearpump/wiki#how-to-run-gearpump).

How to write a GearPump Application

This is what a GearPump WordCount looks like.

object WordCount extends App with ArgumentsParser {

  def application(config: ParseResult) : AppDescription = {
    val partitioner = new HashPartitioner()
    val split = TaskDescription(classOf[Split].getName, splitNum)
    val sum = TaskDescription(classOf[Sum].getName, sumNum)
    
    // Here we define the dag
    val dag = Graph(split ~ partitioner ~> sum)
    
    val app = AppDescription("wordCount", classOf[AppMaster].getName, appConfig, dag)
    app
  }
  
  val config = parse(args)
  val context = ClientContext(config.getString("master"))
  context.submit(application(config))
}

For detailed description on writing a GearPump application, please check Write GearPump Applications on the wiki.

Maven dependencies

Please check here

Further information

For more documentation and implementation details, please visit GearPump Wiki.

We'll have QnA and discussions at GearPump User List.

Issues should be reported to GearPump GitHub issue tracker and contributions are welcomed via pull requests

Contributors (time order)

Contacts:

Please use the google user list if possible. For things that are not OK to be shared in maillist, please contact [email protected]

License

Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0

Acknowledgement

The netty transport code work is based on Apache Storm. Thanks Apache Storm contributors.

gearpump's People

Contributors

bitdeli-chef avatar clockfly avatar huafengw avatar kkasravi avatar manuzhang avatar smarthi avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    πŸ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. πŸ“ŠπŸ“ˆπŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❀️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.