mnityesh28 / tipoca-stream-1

This project forked from practo/tipoca-stream

Near real-time, cloud-native data pipeline in AWS (CDC+Sink). Hosts code for RedshiftSink. RDS to RedshiftSink Pipeline with masking and reloading support.

Home Page: https://towardsdatascience.com/open-sourcing-tipoca-stream-f261cdcc3a13

License: Apache License 2.0

Dockerfile 0.67% Makefile 2.36% Shell 4.31% Go 92.65%

tipoca-stream-1's Introduction

tipoca-stream

A near real-time, cloud-native data pipeline using Kafka, KafkaConnect, and RedshiftSink in AWS. RedshiftSink is a high-performance, low-overhead data loader for Redshift, open-sourced by Practo. It comes with rich data masking support, so you can provide universal data access in your organization while preserving your customers' privacy.

Release blog.

Tipoca Stream is the successor to Tipoca, an internal non-real-time data warehousing project, which takes its name from Tipoca City, home of the Clones in the Star Wars universe.

Install

The pipeline is a combination of independently deployed services. This repo holds the code for RedshiftSink only.

  • RedshiftSink: Please follow REDSHIFTSINK.md to install the RedshiftSink Kubernetes Operator. Creating a RedshiftSink resource installs Batcher and Loader pods in the cluster. These pods sink data from Kafka topics to Redshift and take care of database migration when required. RedshiftSink has rich masking support. It also supports table reloads in Redshift when masking configurations are modified in GitHub.
      kubectl get redshiftsink
  • Kafka: Install Kafka using Strimzi CRDs, or use a self-hosted or managed Kafka.
      kubectl get kafka
  • Producer: Install the Producer using Strimzi CRDs and Debezium. Creating the kafkaconnect and kafkaconnector resources creates a kafkaconnect pod in the cluster, which starts streaming data from the source (MySQL, RDS, etc.) to Kafka.
      kubectl get kafkaconnect
      kubectl get kafkaconnector
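
To make the idea of column masking concrete, here is a minimal Go sketch of one common approach: replacing sensitive column values with a salted hash before a row is loaded. This is an illustration only and is not taken from the RedshiftSink code; the function names `maskValue` and `maskRow`, the use of SHA-256, and the salt handling are all assumptions. See REDSHIFTSINK.md for the masking behaviour the operator actually supports.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// maskValue returns the hex-encoded SHA-256 digest of salt+value.
// Hypothetical helper: RedshiftSink's real masking logic may differ.
func maskValue(value, salt string) string {
	sum := sha256.Sum256([]byte(salt + value))
	return hex.EncodeToString(sum[:])
}

// maskRow returns a copy of row in which every column listed in
// maskedColumns is replaced by its salted hash. Other columns are
// copied unchanged. The input row is not modified.
func maskRow(row map[string]string, maskedColumns map[string]bool, salt string) map[string]string {
	out := make(map[string]string, len(row))
	for col, val := range row {
		if maskedColumns[col] {
			out[col] = maskValue(val, salt)
		} else {
			out[col] = val
		}
	}
	return out
}

func main() {
	// Example row and salt are made up for illustration.
	row := map[string]string{"id": "42", "email": "jane@example.com"}
	masked := maskRow(row, map[string]bool{"email": true}, "example-salt")

	// The id column is untouched; the email column is now a 64-character hex digest.
	fmt.Println(masked["id"], len(masked["email"])) // prints "42 64"
}
```

Because the hash is deterministic for a given salt, masked values can still be used for joins and equality checks across tables without exposing the original data.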

The project has pluggable libraries that can be composed to solve other data pipeline use cases.

Contribute

Please follow this to bring a change.

Thanks
