
dozer's Introduction


Overview

Dozer is a data platform for building, deploying and maintaining real-time data products.

It is ideal for companies with multiple databases, data warehouses and data lakes that need to combine, aggregate and transform data in real time and create customer-facing or internal data applications.

Put simply, Dozer empowers a single developer to go from data sources to ready-made APIs in just a few minutes, all with a simple configuration file.

How it works

Dozer pulls data from various sources like databases, data lakes, and data warehouses using Change Data Capture (CDC) and periodic polling mechanisms. This ensures up-to-date data ingestion in real-time or near-real-time.
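
As an illustration of the periodic polling idea only, the sketch below tracks a high-water mark and fetches rows changed since the previous poll. It is not Dozer's implementation: the `Row` type, the `updated_at` column and the `poll_changes` helper are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    id: int
    updated_at: int  # hypothetical version or timestamp column that grows on every change
    payload: str


def poll_changes(table, last_seen):
    """Return the rows changed after `last_seen` and the new watermark."""
    changed = sorted(
        (row for row in table if row.updated_at > last_seen),
        key=lambda row: row.updated_at,
    )
    # If nothing changed, keep the old watermark so the next poll starts from the same point.
    new_watermark = changed[-1].updated_at if changed else last_seen
    return changed, new_watermark


# First poll sees every row.
table = [Row(1, 10, "a"), Row(2, 20, "b")]
changes, watermark = poll_changes(table, last_seen=0)

# A row is updated in the source; the second poll returns only that row.
table[0] = Row(1, 30, "a2")
changes2, watermark2 = poll_changes(table, watermark)
```

Polling like this cannot observe deletes or intermediate versions of a row, which is one reason a log-based mechanism such as CDC is preferable when the source supports it.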

After capturing data, Dozer offers the possibility of combining, transforming and aggregating it using its own internal real-time transformation engine. It supports Streaming SQL, WebAssembly (coming soon) and TypeScript (coming soon), as well as ONNX for performing AI predictions in real-time.
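
To make the idea of real-time aggregation concrete, here is a minimal sketch of incremental maintenance for a query shaped like `SELECT city, SUM(amount) ... GROUP BY city`. It assumes change events arrive as inserts and deletes; the `RunningSum` class and the event format are hypothetical and do not describe Dozer's internal engine.

```python
from collections import defaultdict


class RunningSum:
    """Keeps a per-key SUM up to date as insert and delete events arrive."""

    def __init__(self):
        self.totals = defaultdict(int)
        self.counts = defaultdict(int)

    def apply(self, op, key, amount):
        if op == "insert":
            self.totals[key] += amount
            self.counts[key] += 1
        elif op == "delete":
            self.totals[key] -= amount
            self.counts[key] -= 1
            # Drop the group once its last contributing row is gone.
            if self.counts[key] == 0:
                del self.totals[key]
                del self.counts[key]
        else:
            raise ValueError(f"unknown operation: {op}")

    def snapshot(self):
        return dict(self.totals)


agg = RunningSum()
events = [
    ("insert", "Rome", 10),
    ("insert", "Rome", 5),
    ("insert", "Oslo", 7),
    ("delete", "Rome", 10),
]
for op, city, amount in events:
    agg.apply(op, city, amount)

result = agg.snapshot()
```

Each event updates the result in constant time, so the aggregate never has to be recomputed from the full input. An update to a row can be modelled as a delete of the old value followed by an insert of the new one.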

After processing, data is stored and indexed in a low-latency datastore (based on LMDB), queryable using REST and gRPC.
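
The sketch below shows what building a REST query against such an endpoint might look like from a client. The host and port, the `trips` endpoint name, the `/query` path and the `$limit` and `$filter` body keys are assumptions made for illustration; check the Dozer documentation for the actual request format.

```python
import json
from urllib.request import Request


def build_query_request(base_url, endpoint, limit, filter_=None):
    """Build (but do not send) a JSON POST request for a query endpoint.

    The URL layout and body keys are assumptions, not a confirmed API.
    """
    body = {"$limit": limit}
    if filter_:
        body["$filter"] = filter_
    return Request(
        url=f"{base_url.rstrip('/')}/{endpoint}/query",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical local instance and endpoint name.
req = build_query_request("http://localhost:8080", "trips", limit=3)
```

With an instance running, the request could be sent with `urllib.request.urlopen(req)` and the response body decoded with `json.loads`.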

How to use it

① Build

A Dozer application consists of a YAML file that can be run locally using the Dozer Live UI or the Dozer CLI. As the YAML is edited, changes are immediately reflected in the Dozer Live UI.


② Test

Dozer can run the entire infrastructure locally. You can inspect incoming data in real time or use the built-in API explorer to query data through REST and gRPC. The Dozer Live explorer also provides ready-made samples for integrating results into your front-end applications.


③ Deploy

Dozer applications can be self-hosted or deployed in the cloud with a single command. Dozer Cloud (coming soon) provides self-healing and monitoring capabilities, making sure your APIs are always available.

Supported Sources and Transformation Engines

Dozer currently supports a variety of source databases, data warehouses and object stores. Whenever possible, Dozer leverages Change Data Capture (CDC) to keep data always fresh. For sources that do not support CDC, periodic polling is used.

Dozer transformations can be executed using Dozer's highly customizable streaming SQL engine, which provides UDF support in WASM (coming soon), TypeScript (coming soon) and ONNX.

Here is an overview of all supported source types and transformation engines:


Why Dozer?

As teams embark on the journey of implementing real-time data products, they invariably come across a host of challenges that can make the task seem daunting:

  1. Integration with Various Systems: Integrating with various data sources can present numerous technical hurdles and interoperability issues.

  2. Managing Latency: Ensuring low-latency data access, especially for customer-facing applications, can be a significant challenge.

  3. Real-Time Data Transformation: Managing real-time data transformations, especially when dealing with complex queries or large volumes of data, can be difficult and resource-intensive.

  4. Maintaining Data Freshness: Keeping the data up-to-date in real-time, particularly when it's sourced from multiple locations like databases, data lakes, or warehouses, can be a daunting task.

  5. Scalability and High Availability: Building a data application that can efficiently handle high-volume operations and remain reliable under heavy loads requires advanced architecture design and robust infrastructure.

To address all of these issues, teams often find themselves stitching together multiple technologies with a significant amount of custom code. This can involve integrating diverse systems such as Kafka for real-time data streaming, Redis for low-latency data access and caching, and Spark or Flink for processing and analyzing streaming data.


The complexity of such a setup can become overwhelming. Ensuring that these different technologies communicate effectively, maintaining them, and handling potential failure points requires extensive effort and expertise.

This is where Dozer steps in, aiming to dramatically simplify this process. Dozer is designed as an all-in-one backend solution that integrates the capabilities of these disparate technologies into a single, streamlined tool. By doing so, Dozer offers the capacity to build an end-to-end real-time data product without the need to manage multiple technologies and extensive custom code.

Dozer's goal is to empower a single engineer or a small team of engineers to fully manage the entire lifecycle of a Data Product!

Getting Started

Follow the links below to get started with Dozer:

For a more comprehensive list of samples, check out our GitHub Samples repo.

