Pyplyn: a scalable time-series data collector

Pyplyn (meaning pipeline in Afrikaans) is an open source tool that extracts data from various sources, transforms and gives it meaning, and sends it to other systems for consumption.

Pyplyn was written to allow teams in Salesforce to power real-time health dashboards that go beyond one-to-one mapping of time-series data to visual artifacts.

One example of such a use case is ingesting time-series data stored in Argus and Refocus, processing multiple metrics together (context), and displaying the result in Refocus (as red, yellow, or green lights).

Pyplyn System Diagram

Features

  • Simple and reliable data pipeline with support for various transformations
  • No code required, JSON-based syntax
  • Flexible multi-stage source/transformation/destination logic
  • Developed with support for extension via easy-to-grasp Java code
  • Highly available and scalable (the pipeline can be partitioned across multiple nodes)
  • Configurations can be added/updated/removed without restarting the process
  • Publishes operational metrics (errors, p95, etc.) for monitoring service health

Improvements from release 9.x

  • Faster processing speed with the use of RxJava (4.3x faster, tested on our reference dataset)
  • Cleaner code, mainly after converting models to Immutables-annotated abstract classes
  • Support for mutual TLS authentication for endpoints, by specifying a Java keystore and password
  • Connect, read, and write timeouts can now be specified for each connector
  • All Jackson-based models can now be serialized (with the type specifier field)
  • AppConfig.Global.minRepeatIntervalMillis was deprecated (replaced with AppConfig.Global.runOnce)
  • Bash script for managing the service's lifecycle (start, stop, restart, logs, etc.)
  • Since 10.0.0, Pyplyn releases follow Semantic versioning guidelines.

Roadmap

We welcome ideas for improvement and bug reports, and we encourage you to submit them by opening new issues on GitHub!

Running pyplyn

Pyplyn uses Maven for its build lifecycle. At a minimum, you will need Maven and Java 8 installed on your host OS.

Consult the full prerequisites section to find out more.
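
Before building, it can help to confirm that both tools are actually on your PATH. The following is a minimal sketch and not part of Pyplyn; the helper names require_cmd and check_prereqs are hypothetical, and it only checks that the commands exist, not which versions they are.

```shell
# Hypothetical helper (not shipped with Pyplyn): report whether a command is on PATH.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1" >&2
    return 1
  fi
}

# Check the two prerequisites named above; returns non-zero if either is missing.
check_prereqs() {
  missing=0
  for tool in java mvn; do
    require_cmd "$tool" || missing=1
  done
  return "$missing"
}
```

Running check_prereqs && java -version && mvn -version then prints the installed versions, so you can verify by eye that Java 8 is the one in use.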

# Clone the Pyplyn repository
git clone https://github.com/salesforce/pyplyn /tmp/pyplyn

# Build the project with Maven
cd /tmp/pyplyn
mvn clean package

# Navigate to Pyplyn's build location
cd target/

# Create a new directory for your configurations (leave empty for now)
mkdir configurations

# Rename app-config.example.json and make the required changes
mv config/app-config.example.json config/pyplyn-config.json

# Rename connectors.example.json and make the required changes (see below)
mv config/connectors.example.json config/connectors.json

# Update the _connectors.json_ file and configure your endpoints
#

# Edit bin/pyplyn.sh and set _LOCATION_ to the absolute path of the build directory
#   LOCATION=/tmp/pyplyn/target

# Start pyplyn and check logs
bash bin/pyplyn.sh start

# Check that the program started without throwing any exceptions
bash bin/pyplyn.sh logs

A full step-by-step explanation (including how to write configurations) can be found in the Pyplyn documentation.
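
The LOCATION step above can also be scripted rather than edited by hand. This is a rough sketch under an assumption that has not been checked against the shipped script: that bin/pyplyn.sh contains a line beginning with LOCATION=. The function name set_location is hypothetical.

```shell
# Hypothetical helper: point the LOCATION= line of a script at a given directory.
# Assumes the script has a line starting with "LOCATION=" and that the
# directory path contains no "|" or "&" characters (they are special to sed here).
set_location() {
  script=$1
  dir=$2
  tmp=$(mktemp) || return 1
  if sed "s|^LOCATION=.*|LOCATION=$dir|" "$script" > "$tmp"; then
    # Write back with cat so the script keeps its original file permissions.
    cat "$tmp" > "$script"
    status=$?
  else
    status=1
  fi
  rm -f "$tmp"
  return "$status"
}
```

From the build directory, set_location bin/pyplyn.sh "$(pwd)" would set LOCATION to the current absolute path; inspect the file afterwards to confirm the line was actually changed.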

Next steps?

Consult the Pyplyn Documentation for an in-depth explanation of Pyplyn's features.

Generate Javadocs by running the following Maven phase: mvn package.

If you would like to contribute to Pyplyn, please read the contributor guide!

Thank you!
