Beam/Dataflow no-code proof of concept

This is a proof of concept on making a no-code / low-code experience for Apache Beam and Dataflow.

tl;dr: We grab a declarative (JSON/YAML) representation of an Apache Beam pipeline, and we generate a Dockerfile with everything needed to run the pipeline.

Overview

The JSON/YAML representation (low-code) can be easily generated via a graphical user interface (no-code).

Core functions

All the core Apache Beam transforms would be supported out of the box in the JSON/YAML representation. This includes element-wise transforms, aggregation transforms, windowing, triggers, and I/O transforms. It also includes a transform to call user-defined functions, described below.
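
For illustration, a pipeline built only from core transforms would map onto ordinary Beam Python code such as the sketch below; the file paths and the exact pipeline shape are placeholders, not part of the prototype.

# Minimal sketch of the Beam (Python) transforms the declarative steps would map to.
# File paths and the pipeline shape are placeholders.
import apache_beam as beam
from apache_beam.transforms import window

with beam.Pipeline() as pipeline:
    (
        pipeline
        | 'Read' >> beam.io.ReadFromText('gs://my-bucket/input/*.txt')     # I/O transform
        | 'ToKV' >> beam.Map(lambda word: (word, 1))                       # element-wise
        | 'Window' >> beam.WindowInto(window.FixedWindows(60))             # windowing
        | 'Count' >> beam.CombinePerKey(sum)                               # aggregation
        | 'Write' >> beam.io.WriteToText('gs://my-bucket/output/counts')   # I/O transform
    )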

User-defined functions

Custom functions are supported in one or more languages.

For this prototype, we support custom functions in Python only, but any language could easily be supported by creating a local web server.

All the language servers must have a well-defined input and output format (see the sketch after this list):

  • Each function processes exactly one element.
  • Additional arguments can be optionally added.
  • Requests and responses are JSON encoded.
  • The response is either a value or an error.
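
As a rough illustration of that contract, a minimal Python function server built on the standard library's http.server might look like the following sketch. The JSON field names (element, args, value, error) and the function registry are assumptions for illustration, not the prototype's actual wire format.

# Hypothetical minimal function server: one element in, one value (or error) out.
# The JSON field names ("element", "args", "value", "error") are assumptions.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def to_upper(element, **kwargs):          # an example user-defined function
    return element.upper()

FUNCTIONS = {'to_upper': to_upper}        # registry of user-defined functions

class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        request = json.loads(self.rfile.read(int(self.headers['Content-Length'])))
        try:
            fn = FUNCTIONS[self.path.strip('/')]
            response = {'value': fn(request['element'], **request.get('args', {}))}
        except Exception as e:              # any failure becomes an error response
            response = {'error': str(e)}
        body = json.dumps(response).encode('utf-8')
        self.send_response(200)
        self.send_header('Content-Type', 'application/json')
        self.end_headers()
        self.wfile.write(body)

if __name__ == '__main__':
    HTTPServer(('localhost', 8080), Handler).serve_forever()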

When a pipeline uses a custom function, a custom DoFn calls it. Each custom function has a URL through which it is accessible (local or remote). Each element is JSON-encoded, sent to the custom function server, and the response is JSON-decoded.
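
A sketch of what such a DoFn could look like, assuming the hypothetical request/response shape used in the server sketch above; the class name, URL, and field names are illustrative only.

# Hypothetical DoFn that forwards each element to a function server over HTTP.
import json
import urllib.request
import apache_beam as beam

class CallUserFn(beam.DoFn):
    def __init__(self, url, args=None):
        self.url = url                      # local or remote function server URL
        self.args = args or {}              # optional additional arguments

    def process(self, element):
        payload = json.dumps({'element': element, 'args': self.args}).encode('utf-8')
        request = urllib.request.Request(
            self.url, data=payload, headers={'Content-Type': 'application/json'})
        with urllib.request.urlopen(request) as response:
            result = json.loads(response.read())
        if 'error' in result:
            raise RuntimeError(result['error'])
        yield result['value']

# Usage inside a pipeline (the URL is a placeholder):
#   | 'ToUpper' >> beam.ParDo(CallUserFn('http://localhost:8080/to_upper'))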

Pipeline file

The JSON/YAML pipeline file would contain all the information necessary to build the image, except for the code of the user-defined functions themselves. It would include (see the hypothetical example after this list):

  • All the steps in the pipeline.
    • User-defined function calls need the language, function name, and any additional arguments.
  • A list of requirements for each language used in user-defined functions.
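
A purely hypothetical YAML layout, only to illustrate the kind of information listed above; the actual schema and key names are not defined here.

# Hypothetical pipeline file; schema, key names, and values are placeholders.
steps:
  - name: read-events
    transform: ReadFromText
    args:
      file_pattern: gs://my-bucket/input/*.txt
  - name: to-upper
    transform: CallUserFunction        # user-defined function call
    language: python
    function: to_upper
    args:
      suffix: "!"
  - name: write-output
    transform: WriteToText
    args:
      file_path_prefix: gs://my-bucket/output/events
requirements:
  python:
    - requests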

Generating the pipeline

The user files would be:

  • The JSON/YAML pipeline file.
  • All the user-defined functions in their respective language, with one (public) function per file.

The language server files could be provided in released Beam images, one image per language.

The Dockerfile would be a multi-stage build like this (a hypothetical sketch follows the list):

  1. Pipeline builder stage
    1. Copy or install the pipeline generator and any other requirements.
    2. Copy the JSON/YAML pipeline file from the local filesystem.
    3. Run the generator, which would create the following files:
      • The main pipeline file, with all user-defined functions registered.
      • A run script, which would start all the language servers and then run the Beam worker boot file.
  2. For each language used in user-defined functions, create a builder stage
    1. These could use different base images if needed.
    2. Install any required build tools.
    3. Copy the user-defined function files from the local filesystem.
    4. Copy the language server files from the language server image.
    5. Compile any source files for languages that require it.
    6. Package the language server with all the user-defined functions.
  3. Main image
    1. Update/install any packages needed, including:
      • Tools/programs needed to run the pipeline itself.
      • Tools/programs needed to run each language server used.
    2. Copy the Beam worker boot files from the Beam image.
    3. Copy the main pipeline file(s) from the pipeline builder stage.
    4. Copy the run script from the pipeline builder stage.
    5. For each language builder stage, copy the packaged language servers.
    6. Set the entry point to the run script.
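
A hypothetical sketch of such a Dockerfile; the image names, paths, generator command, and run script are placeholders rather than the prototype's actual artifacts.

# Hypothetical multi-stage Dockerfile; image names, paths, and commands are placeholders.

# 1. Pipeline builder stage: generate the main pipeline file and the run script.
FROM python:3.11-slim AS pipeline-builder
RUN pip install --no-cache-dir apache-beam[gcp]
COPY pipeline.yaml /build/pipeline.yaml
RUN python -m pipeline_generator /build/pipeline.yaml --output /build/generated

# 2. Language builder stage (one per language used in user-defined functions).
FROM python:3.11-slim AS python-udf-builder
COPY udfs/python/ /udfs/
COPY --from=example/beam-python-language-server:latest /server/ /server/
RUN pip install --no-cache-dir -r /udfs/requirements.txt --target /server/deps

# 3. Main image: Beam worker boot files, generated pipeline, run script, language servers.
FROM apache/beam_python3.11_sdk:latest AS main
COPY --from=pipeline-builder /build/generated/ /app/
COPY --from=python-udf-builder /server/ /app/python-server/
COPY --from=python-udf-builder /udfs/ /app/python-server/udfs/
ENTRYPOINT ["/app/run.sh"]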

Building the container image

export PROJECT=$(gcloud config get-value project)

gcloud builds submit -t gcr.io/$PROJECT/dataflow-no-code build/

Running locally

docker run --rm -e ACTION=run gcr.io/$PROJECT/dataflow-no-code

Running in Dataflow

gcloud builds submit --config run-dataflow.yaml --no-source
