kevinjqliu / flink-iceberg-minio-trino

This project forked from pranav1699/flink-iceberg-minio-trino

This project demonstrates real-time streaming of CDC data from MySQL to Apache Iceberg using the Flink SQL Client, for faster data analytics and machine learning workloads.

Home Page: https://medium.com/dev-genius/streaming-cdc-data-from-mysql-to-apache-iceberg-with-hive-metastore-using-apache-flink-0de9738fba0d

Real-Time Streaming of CDC Data from MySQL to Apache Iceberg using Flink

Overview

This project demonstrates streaming Change Data Capture (CDC) data from MySQL to Apache Iceberg using Apache Flink. The pipeline runs through Flink's SQL Client, so changes made in MySQL become available in Iceberg tables for fast analytics and machine learning workloads.

Purpose

This repository provides a working example of a real-time streaming pipeline for CDC data synchronization. It combines Flink, the MySQL CDC connector, Iceberg, MinIO, Hive Metastore, and Trino to show how these tools handle continuously changing data.

Tools Used

  • Trino: Distributed SQL query engine for fast analytics over large datasets.
  • Apache Flink: Stream processing framework for real-time data analytics.
  • Apache Iceberg: Open-source table format for managing large analytic datasets in a data lake.
  • Hive Metastore: Metadata catalog that tracks table schemas and locations.
  • MinIO: S3-compatible object storage for data in distributed environments.

Setup Instructions

  1. Ensure that Docker and Docker Compose are installed on your system.

  2. Clone this repository:

    git clone https://github.com/pranav1699/flink-iceberg-minio-trino.git
    cd flink-iceberg-minio-trino

  3. Start the Docker containers:

    docker-compose up -d
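
Before running the steps above, it can help to confirm that the required tools are actually on your PATH. The following is a minimal sketch, not part of the repository: the helper names have_cmd and missing_tools are hypothetical, and the script only checks that each command exists, not its version.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with this repository).

# Return 0 if the named command is found on PATH, non-zero otherwise.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Print each missing tool on its own line.
# Returns 1 if at least one tool is missing, 0 if all are present.
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! have_cmd "$tool"; then
            echo "$tool"
            rc=1
        fi
    done
    return "$rc"
}
```

A possible use is `missing_tools git docker docker-compose || echo "Install the tools listed above first"`. Note that recent Docker installations may provide Compose as the `docker compose` plugin and not as a standalone `docker-compose` binary, in which case the last check reports it as missing even though Compose is available. Once the containers are started, `docker-compose ps` lists the services and their state.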

Next Steps

Read this blog post: https://medium.com/dev-genius/streaming-cdc-data-from-mysql-to-apache-iceberg-with-hive-metastore-using-apache-flink-0de9738fba0d

Contributors

pranav1699
