Flink Parameter Server

A Parameter Server implementation based on the Streaming API of Apache Flink.

Parameter Server is an abstraction for model-parallel machine learning (see the work of Li et al.). Our implementation works with the Streaming API: it takes a DataStream of data points as input and produces a DataStream of model updates. This way, we can implement both online and offline ML algorithms. Currently, only asynchronous training is supported.

Build

Use SBT. The library can be published to the local SBT cache

sbt publish-local

and then added to a project as a dependency

libraryDependencies += "hu.sztaki.ilab" %% "flink-ps" % "0.1.0"

API

We can use the Parameter Server in the following way:

[Figure: Parameter Server architecture]

Basically, we can access the Parameter Server by defining a WorkerLogic, which can pull or push parameters. We provide input data to the worker via a Flink DataStream.

We need to implement the WorkerLogic trait

trait WorkerLogic[T, Id, P, WOut] extends Serializable {
  def onRecv(data: T, ps: ParameterServerClient[Id, P, WOut]): Unit
  def onPullRecv(paramId: Id, paramValue: P, ps: ParameterServerClient[Id, P, WOut]): Unit
}

Here we handle incoming data (onRecv) and the answers to pulls (onPullRecv). Inside both callbacks we can pull parameters from the Parameter Server, push updates to it, or output results through the ParameterServerClient:

trait ParameterServerClient[Id, P, WOut] extends Serializable {
  def pull(id: Id): Unit
  def push(id: Id, deltaUpdate: P): Unit
  def output(out: WOut): Unit
}
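As an illustration of how the two callbacks fit together, here is a minimal sketch of a worker. The two traits are copied from above so the block compiles on its own; SimpleWorker, its (featureId, label) input type, and the update rule are hypothetical and chosen only for the example.

```scala
import scala.collection.mutable

// Traits copied from this README so the sketch compiles on its own.
trait ParameterServerClient[Id, P, WOut] extends Serializable {
  def pull(id: Id): Unit
  def push(id: Id, deltaUpdate: P): Unit
  def output(out: WOut): Unit
}

trait WorkerLogic[T, Id, P, WOut] extends Serializable {
  def onRecv(data: T, ps: ParameterServerClient[Id, P, WOut]): Unit
  def onPullRecv(paramId: Id, paramValue: P, ps: ParameterServerClient[Id, P, WOut]): Unit
}

// Hypothetical worker: each data point is (featureId, label). The worker
// pulls the parameter for the feature and, once the answer arrives, pushes
// a delta proportional to the error.
class SimpleWorker(learningRate: Double)
    extends WorkerLogic[(Int, Double), Int, Double, String] {

  // Labels waiting for the answer to their pull, keyed by parameter id.
  private val waiting = mutable.Map.empty[Int, mutable.Queue[Double]]

  override def onRecv(
      data: (Int, Double),
      ps: ParameterServerClient[Int, Double, String]): Unit = {
    val (featureId, label) = data
    waiting.getOrElseUpdate(featureId, mutable.Queue.empty[Double]).enqueue(label)
    ps.pull(featureId) // asynchronous: the answer arrives in onPullRecv
  }

  override def onPullRecv(
      paramId: Int,
      paramValue: Double,
      ps: ParameterServerClient[Int, Double, String]): Unit = {
    val label = waiting(paramId).dequeue()
    val error = label - paramValue
    ps.push(paramId, learningRate * error) // send a delta, not the new value
    ps.output(s"id=$paramId error=$error")
  }
}
```

Because pulls are answered asynchronously, the worker has to remember which data points are still waiting for a parameter. The sketch does this with a per-id queue, which is one possible choice rather than anything the library prescribes.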

Once we have defined our worker logic, we can wire it into a Flink job with the transform method of FlinkParameterServer.

def transform[T, Id, P, WOut](
  trainingData: DataStream[T],
  workerLogic: WorkerLogic[T, Id, P, WOut],
  paramInit: => Id => P,
  paramUpdate: => (P, P) => P,
  workerParallelism: Int,
  psParallelism: Int,
  iterationWaitTime: Long): DataStream[Either[WOut, (Id, P)]]

Besides the trainingData stream and the workerLogic, we need to define how the Parameter Server should initialize a parameter based on the parameter id (paramInit), and how to update a parameter based on a received push (paramUpdate). We must also define how many parallel instances of workers and parameter servers we should use (workerParallelism and psParallelism), and the iterationWaitTime (see Limitations).
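To make the roles of paramInit and paramUpdate concrete, the sketch below applies them to an in-memory map. InMemoryServer is a hypothetical stand-in, not the library's server (which runs as a Flink operator). It assumes that a parameter is initialized lazily on first access and that paramUpdate receives the current value first and the pushed delta second.

```scala
import scala.collection.mutable

// Hypothetical in-memory stand-in for one parameter server instance.
// It only illustrates how paramInit and paramUpdate are meant to be used.
class InMemoryServer[Id, P](paramInit: Id => P, paramUpdate: (P, P) => P) {
  private val params = mutable.Map.empty[Id, P]

  // Answer a pull: initialize the parameter on first access (assumption).
  def pull(id: Id): P = params.getOrElseUpdate(id, paramInit(id))

  // Apply a push: combine the current value with the received delta.
  // Argument order (current, delta) is an assumption of this sketch.
  def push(id: Id, delta: P): Unit =
    params.update(id, paramUpdate(pull(id), delta))
}

object ParamFunctionsDemo {
  // Every parameter starts at zero, and pushes are additive deltas.
  val paramInit: Int => Double = _ => 0.0
  val paramUpdate: (Double, Double) => Double = _ + _

  def main(args: Array[String]): Unit = {
    val server = new InMemoryServer[Int, Double](paramInit, paramUpdate)
    server.push(3, 0.5)
    server.push(3, 0.25)
    println(server.pull(3))
  }
}
```

Functions of this shape would be passed as the paramInit and paramUpdate arguments of transform. An additive update is a common choice for asynchronous training because the order in which deltas arrive does not change the result.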

There are also other ways to define a DataStream transformation with a Parameter Server that let us customize the process in more detail. See the other methods of FlinkParameterServer.

Limitations

We implement the two-way communication of workers and the parameter server with Flink Streaming iterations, which is not yet production-ready. The main issues are

  • Occasional deadlocks due to cyclic backpressure. A workaround is to limit the number of unanswered pulls per worker (e.g. by using WorkerLogic.addPullLimiter), or to manually limit the rate of data on the input stream. Even then, a deadlock is still possible.
  • Termination is not defined for finite input. As a workaround, we can set iterationWaitTime to the number of milliseconds to wait before shutting down when no messages are sent along the iteration (see the Flink Java docs: https://ci.apache.org/projects/flink/flink-docs-master/api/java/).
  • No fault tolerance.

All of these issues are being addressed in FLIP-15 and FLIP-16 and should be fixed soon. Until then, we need to use workarounds.
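The idea of limiting unanswered pulls can be sketched as a wrapper around the client. This is not the code behind WorkerLogic.addPullLimiter; LimitedPullClient and its onAnswer method are hypothetical names used only to show the technique, and the trait is copied from above so the block compiles on its own.

```scala
import scala.collection.mutable

// Trait copied from this README so the sketch compiles on its own.
trait ParameterServerClient[Id, P, WOut] extends Serializable {
  def pull(id: Id): Unit
  def push(id: Id, deltaUpdate: P): Unit
  def output(out: WOut): Unit
}

// Hypothetical wrapper: forwards at most `limit` unanswered pulls to the
// underlying client and buffers the rest until answers arrive.
class LimitedPullClient[Id, P, WOut](
    underlying: ParameterServerClient[Id, P, WOut],
    limit: Int)
    extends ParameterServerClient[Id, P, WOut] {
  require(limit > 0, "limit must be positive")

  private var inFlight = 0
  private val buffered = mutable.Queue.empty[Id]

  override def pull(id: Id): Unit =
    if (inFlight < limit) {
      inFlight += 1
      underlying.pull(id)
    } else {
      buffered.enqueue(id) // hold back until an earlier pull is answered
    }

  // Must be called by the worker once per received pull answer.
  def onAnswer(): Unit = {
    if (inFlight > 0) inFlight -= 1
    if (buffered.nonEmpty) {
      inFlight += 1
      underlying.pull(buffered.dequeue())
    }
  }

  // Pushes and outputs are passed through unchanged.
  override def push(id: Id, deltaUpdate: P): Unit = underlying.push(id, deltaUpdate)
  override def output(out: WOut): Unit = underlying.output(out)
}
```

Capping the number of in-flight pulls bounds how many messages a worker can put into the iteration loop at once, which reduces the backpressure that leads to deadlock but, as noted above, does not rule it out.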
