PipelineWise

PipelineWise is a Data Pipeline Framework using the Singer.io specification to ingest and replicate data from various sources to various destinations. Documentation is available at https://transferwise.github.io/pipelinewise/

Features

  • Built with ELT in mind: PipelineWise fits into the ELT landscape and is not a traditional ETL tool. PipelineWise aims to reproduce the data from the source to an Analytics-Data-Store in as close to the original format as possible. Some minor load time transformations are supported but complex mapping and joins have to be done in the Analytics-Data-Store to extract meaning.

  • Managed Schema Changes: When source data changes, PipelineWise detects the change and alters the schema in your Analytics-Data-Store automatically

  • Load time transformations: Ideal place to obfuscate, mask or filter sensitive data that should never be replicated in the Data Warehouse

  • YAML based configuration: Data pipelines are defined as YAML files, ensuring that the entire configuration is kept under version control

  • Lightweight: No daemons or database setup are required

  • Extensible: PipelineWise uses Singer.io-compatible tap and target connectors. New connectors can be added to PipelineWise with relatively little effort
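
To make the idea of a load-time transformation concrete, here is a minimal Python sketch of masking sensitive fields in a record before it is loaded. It is an illustration only and not PipelineWise's own implementation: the function `mask_record`, its parameters `hash_fields` and `null_fields`, and the sample record are all hypothetical names chosen for this example.

```python
import hashlib
from typing import Any, Dict, Iterable


def mask_record(
    record: Dict[str, Any],
    hash_fields: Iterable[str] = (),
    null_fields: Iterable[str] = (),
) -> Dict[str, Any]:
    """Return a copy of `record` with sensitive fields obfuscated.

    Hypothetical helper: fields in `hash_fields` are replaced by their
    SHA-256 hex digest, fields in `null_fields` are set to None.
    """
    masked = dict(record)  # shallow copy, the input record is not modified
    for field in hash_fields:
        value = masked.get(field)
        if value is not None:
            digest = hashlib.sha256(str(value).encode("utf-8")).hexdigest()
            masked[field] = digest
    for field in null_fields:
        if field in masked:
            masked[field] = None
    return masked


# Made-up sample row: hash the email, drop the phone number entirely.
row = {"id": 1, "email": "jane@example.com", "phone": "555-0100"}
masked_row = mask_record(row, hash_fields=["email"], null_fields=["phone"])
print(masked_row)
```

Hashing keeps the column usable as a join key while hiding the raw value; setting a field to None removes the information altogether.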

Connectors

A tap extracts data from any source and writes it to a standard stream in a JSON-based format. A target consumes data from taps and does something with it, such as loading it into a file, API or database.
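
The tap and target contract can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not code from any real connector: `run_tap` and `run_target` are hypothetical functions, the `users` stream and its rows are made up, and an in-memory buffer stands in for the standard stream. The message types (`SCHEMA`, `RECORD`, `STATE`) follow the Singer.io specification. The bookmark layout inside the `STATE` value is only a common convention.

```python
import io
import json
from typing import Any, Dict, Iterable, List, TextIO


def run_tap(rows: Iterable[Dict[str, Any]], stream: str, out: TextIO) -> None:
    """Toy tap: write one JSON message per line to `out`."""
    schema = {
        "type": "SCHEMA",
        "stream": stream,
        "schema": {"properties": {"id": {"type": "integer"},
                                  "name": {"type": "string"}}},
        "key_properties": ["id"],
    }
    out.write(json.dumps(schema) + "\n")
    last_id = None
    for row in rows:
        out.write(json.dumps({"type": "RECORD", "stream": stream, "record": row}) + "\n")
        last_id = row["id"]
    # The STATE message lets a later run resume from the last replicated key.
    state = {"type": "STATE", "value": {"bookmarks": {stream: {"id": last_id}}}}
    out.write(json.dumps(state) + "\n")


def run_target(inp: TextIO) -> Dict[str, List[Dict[str, Any]]]:
    """Toy target: collect RECORD messages per stream instead of loading them."""
    loaded: Dict[str, List[Dict[str, Any]]] = {}
    for line in inp:
        message = json.loads(line)
        if message["type"] == "RECORD":
            loaded.setdefault(message["stream"], []).append(message["record"])
    return loaded


pipe = io.StringIO()  # stands in for the stdout-to-stdin pipe between processes
run_tap([{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}], "users", pipe)
pipe.seek(0)
loaded = run_target(pipe)
print(loaded)
```

Because both sides only agree on the line-delimited JSON format, any tap can in principle be paired with any target.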

| Type | Name | Extra | Description |
|------|------|-------|-------------|
| Tap | Postgres | | Extracts data from PostgreSQL databases. Supports Log-Based, Key-Based Incremental and Full Table replications |
| Tap | MySQL | | Extracts data from MySQL databases. Supports Log-Based, Key-Based Incremental and Full Table replications |
| Tap | Kafka | | Extracts data from Kafka topics |
| Tap | S3 CSV | | Extracts data from CSV files in S3 (currently a fork of tap-s3-csv because we wanted to use our own auth method) |
| Tap | Zendesk | | Extracts data from Zendesk using OAuth and Key-Based incremental replications |
| Tap | Snowflake | | Extracts data from Snowflake databases. Supports Key-Based Incremental and Full Table replications |
| Tap | Salesforce | | Extracts data from Salesforce database using BULK and REST extraction API with Key-Based incremental replications |
| Tap | Jira | | Extracts data from Atlassian Jira using Basic auth or OAuth credentials |
| Tap | MongoDB | | Extracts data from MongoDB databases. Supports Log-Based and Full Table replications |
| Tap | AdWords | Extra | Extracts data from the Google Ads API (formerly Google AdWords) using OAuth and supports incremental loading based on input state |
| Tap | Google Analytics | Extra | Extracts data from Google Analytics |
| Tap | Oracle | Extra | Extracts data from Oracle databases. Supports Log-Based, Key-Based Incremental and Full Table replications |
| Tap | Zuora | Extra | Extracts data from Zuora database using AQuA and REST extraction API with Key-Based incremental replications |
| Tap | GitHub | | Extracts data from GitHub API using Personal Access Token and Key-Based incremental replications |
| Tap | Shopify | Extra | Extracts data from Shopify API using Personal App API Password and date-based incremental replications |
| Tap | Slack | | Extracts data from the Slack API using Bot User Token and Key-Based incremental replications |
| Target | Postgres | | Loads data from any tap into PostgreSQL database |
| Target | Redshift | | Loads data from any tap into Amazon Redshift Data Warehouse |
| Target | Snowflake | | Loads data from any tap into Snowflake Data Warehouse |
| Target | S3 CSV | | Uploads data from any tap to S3 in CSV format |
| Transform | Field | | Transforms fields from any tap and sends the results to any target. Recommended for data masking/obfuscation |

Note: Extra connectors are experimental and written by community contributors. They are not maintained regularly and are not installed by default. To install the extra packages, use the --connectors=all option when installing PipelineWise.

Running from docker

If you have Docker installed, running PipelineWise from Docker is the recommended and easiest way to get started.

  1. Build an executable docker image that has every required dependency and is isolated from your host system.

By default, the image is built with all connectors. To keep the image size small, we strongly recommend building it with only the connectors you need by supplying the --build-arg option:

```sh
$ docker build --build-arg connectors=tap-mysql,target-snowflake -t pipelinewise:latest .
```
  2. Once the image is ready, create an alias to the docker wrapper script:

    $ alias pipelinewise="$(pwd)/bin/pipelinewise-docker"
  3. Check if the installation was successful by running the pipelinewise status command:

    $ pipelinewise status
    
    Tap ID    Tap Type      Target ID     Target Type      Enabled    Status    Last Sync    Last Sync Result
    --------  ------------  ------------  ---------------  ---------  --------  -----------  ------------------
    0 pipeline(s)

You can run any pipelinewise command at this point. Tutorials on creating and running pipelines are available at creating pipelines.
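
If the image is built from a script, the connector list for --build-arg can be assembled programmatically. The following Python sketch only constructs the command line shown in step 1 and does not run Docker. The helper name `docker_build_command` and its `tag` parameter are hypothetical and exist only for this example.

```python
import shlex
from typing import List, Sequence


def docker_build_command(connectors: Sequence[str],
                         tag: str = "pipelinewise:latest") -> List[str]:
    """Build the argument list for `docker build` with a restricted connector set.

    Hypothetical helper: it only assembles the command, it does not execute it.
    """
    if not connectors:
        raise ValueError("at least one connector is required")
    return [
        "docker", "build",
        "--build-arg", "connectors=" + ",".join(connectors),
        "-t", tag,
        ".",
    ]


command = docker_build_command(["tap-mysql", "target-snowflake"])
print(shlex.join(command))
# prints: docker build --build-arg connectors=tap-mysql,target-snowflake -t pipelinewise:latest .
```

The resulting list can be passed to `subprocess.run(command, check=True)` from the repository root when the build should actually be executed.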

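Running tests: unit tests are covered in the Building from source section, and integration and end-to-end tests in the Developing with Docker section.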

Building from source

  1. Make sure that all dependencies are installed on your system:

    • Python 3.x
    • python3-dev
    • python3-venv
    • mongo-tools
    • mbuffer
  2. Run the install script that installs the PipelineWise CLI and all supported singer connectors into separate virtual environments:

    $ ./install.sh --connectors=all

    Press Y to accept the license agreement of the required singer components. To automate the installation and accept every license agreement, run ./install.sh --acceptlicenses. Use the optional --connectors=...,... argument to install only a specific list of singer connectors.

  3. To start the CLI, activate the CLI virtual environment and set the PIPELINEWISE_HOME environment variable:

    $ source {ACTUAL_ABSOLUTE_PATH}/.virtualenvs/pipelinewise/bin/activate
    $ export PIPELINEWISE_HOME={ACTUAL_ABSOLUTE_PATH}

    (The ACTUAL_ABSOLUTE_PATH differs on every system. The install script prints the correct commands once the installation completes.)

  4. Check if the installation was successful by running the pipelinewise status command:

    $ pipelinewise status
    
    Tap ID    Tap Type      Target ID     Target Type      Enabled    Status    Last Sync    Last Sync Result
    --------  ------------  ------------  ---------------  ---------  --------  -----------  ------------------
    0 pipeline(s)

You can run any pipelinewise command at this point. Tutorials to create and run pipelines can be found here: creating pipelines.
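
When `pipelinewise status` does not behave as expected after a source install, a quick environment check can help narrow things down. The Python sketch below is a hedged illustration and not part of PipelineWise: `check_environment` is a hypothetical helper, and the assumption that mongo-tools provides a `mongodump` executable is ours, so adjust the names to match your system.

```python
import os
import shutil
from typing import Dict, Optional, Sequence


def check_environment(
    executables: Sequence[str] = ("python3", "mbuffer", "mongodump"),
) -> Dict[str, Optional[str]]:
    """Report where each prerequisite is found, or None if it is missing.

    Hypothetical helper. `mongodump` is assumed to come from mongo-tools.
    """
    # shutil.which returns the full path of an executable on PATH, or None.
    report: Dict[str, Optional[str]] = {
        name: shutil.which(name) for name in executables
    }
    # The CLI expects PIPELINEWISE_HOME to be set (see step 3 above).
    report["PIPELINEWISE_HOME"] = os.environ.get("PIPELINEWISE_HOME")
    return report


report = check_environment()
for name, value in report.items():
    print(f"{name}: {value or 'MISSING'}")
```

Any entry reported as MISSING points at a dependency from step 1 or an environment variable from step 3 that still needs attention.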

To run unit tests:

$ pytest --ignore tests/end_to_end

To run unit tests and generate code coverage:

$ coverage run -m pytest --ignore tests/end_to_end && coverage report

To generate a code coverage HTML report:

$ coverage run -m pytest --ignore tests/end_to_end && coverage html -d coverage_html

Note: The HTML report will be generated in coverage_html/index.html

To run integration and end-to-end tests:

Integration and end-to-end tests require the Docker Development Environment. It spins up a pre-configured PipelineWise project with pre-configured source and target databases in several Docker containers, which are required for the end-to-end test cases.

Developing with Docker

If you have Docker and Docker Compose installed, you can create a local development environment that includes the PipelineWise executables and a pre-configured development project with source and target databases. This makes development more convenient and lets you run integration and end-to-end tests.

For further instructions on setting up a local development environment, go to Test Project for Docker Development Environment.

Contribution

To add new taps and targets follow the instructions on

Links

License

Apache License Version 2.0

See LICENSE for the full text.

