Datafuse

Modern Real-Time Data Processing & Analytics DBMS with Cloud-Native Architecture

Datafuse is a real-time data processing and analytics DBMS with a cloud-native architecture. It is written in Rust, inspired by ClickHouse, and powered by arrow-rs, and it is built to make it easy to power the Data Cloud.

Principles

  • Fearless
    • No data races, no unsafe code, minimal unhandled errors
  • High Performance
    • Everything is parallel
  • High Scalability
    • Everything is distributed
  • High Reliability
    • True separation of storage and compute

Architecture

[Figure: Datafuse architecture]

Performance

  • In-memory SIMD vector processing performance only
  • Dataset: 100,000,000,000 (100 billion) rows, unless the query specifies a smaller count
  • Hardware: AMD Ryzen 7 PRO 4750U, 8 CPU cores, 16 threads
  • Rust: rustc 1.49.0 (e1884a8e3 2020-12-29)
  • Built with link-time optimization (LTO) and CPU-specific instructions
  • ClickHouse server version 21.2.1 revision 54447

| Query | FuseQuery (v0.1) | ClickHouse (v21.2.1) |
|---|---|---|
| SELECT avg(number) FROM numbers_mt(100000000000) | 3.11 s | ×3.14 slower, 9.77 s; 10.24 billion rows/s, 81.92 GB/s |
| SELECT sum(number) FROM numbers_mt(100000000000) | 2.96 s | ×2.02 slower, 5.97 s; 16.75 billion rows/s, 133.97 GB/s |
| SELECT min(number) FROM numbers_mt(100000000000) | 3.57 s | ×3.90 slower, 13.93 s; 7.18 billion rows/s, 57.44 GB/s |
| SELECT max(number) FROM numbers_mt(100000000000) | 3.59 s | ×4.09 slower, 14.70 s; 6.80 billion rows/s, 54.44 GB/s |
| SELECT count(number) FROM numbers_mt(100000000000) | 1.76 s | ×2.22 slower, 3.91 s; 25.58 billion rows/s, 204.65 GB/s |
| SELECT sum(number+number+number) FROM numbers_mt(100000000000) | 23.14 s | ×5.47 slower, 126.67 s; 789.47 million rows/s, 6.32 GB/s |
| SELECT sum(number) / count(number) FROM numbers_mt(100000000000) | 3.09 s | ×1.96 slower, 6.07 s; 16.48 billion rows/s, 131.88 GB/s |
| SELECT sum(number) / count(number), max(number), min(number) FROM numbers_mt(100000000000) | 6.73 s | ×4.01 slower, 27.59 s; 3.62 billion rows/s, 28.99 GB/s |
| SELECT number FROM numbers_mt(10000000000) ORDER BY number DESC LIMIT 1000 | 6.91 s | ×1.42 slower, 9.83 s; 1.02 billion rows/s, 8.14 GB/s |
| SELECT max(number),sum(number) FROM numbers_mt(1000000000) GROUP BY number % 3, number % 4, number % 5 | 10.87 s | ×1.95 faster, 5.58 s; 179.23 million rows/s, 1.43 GB/s |
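
The throughput and ratio figures reported for ClickHouse can be cross-checked from the row counts and timings. The sketch below is illustrative only: the helper names are hypothetical, and the 8 bytes per row is an assumption (one 64-bit `number` value per row) that happens to match the reported rows/s to GB/s ratio. Results agree with the reported values only up to rounding of the timings.

```rust
/// Rows processed per second for a scan of `rows` rows taking `seconds`.
fn rows_per_sec(rows: f64, seconds: f64) -> f64 {
    rows / seconds
}

/// Bytes per second, ASSUMING each row is a single 8-byte integer.
fn bytes_per_sec(rows: f64, seconds: f64) -> f64 {
    rows_per_sec(rows, seconds) * 8.0
}

/// How many times longer `other_secs` is than `baseline_secs`.
fn slowdown(baseline_secs: f64, other_secs: f64) -> f64 {
    other_secs / baseline_secs
}

fn main() {
    let rows = 100_000_000_000.0; // numbers_mt(100000000000)
    let (fuse_secs, ch_secs) = (3.11, 9.77); // avg(number) timings

    println!("{:.2} billion rows/s", rows_per_sec(rows, ch_secs) / 1e9);
    println!("{:.2} GB/s", bytes_per_sec(rows, ch_secs) / 1e9);
    println!("x{:.2} slower", slowdown(fuse_secs, ch_secs));
}
```

The same three functions apply to every row of the comparison; for the GROUP BY query the ratio is below 1, which corresponds to the "fast" entry.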

Note:

  • ClickHouse system.numbers_mt runs with 16-way parallelism
  • FuseQuery system.numbers_mt runs with 16-way parallelism
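
To make "16-way parallelism" concrete, here is a minimal sketch of summing a number range by splitting it into 16 chunks, one per thread. It is not FuseQuery's or ClickHouse's implementation: `parallel_sum` is a hypothetical helper, and it uses `std::thread::scope`, which needs Rust 1.63 or later (newer than the rustc 1.49.0 used for the benchmark).

```rust
use std::thread;

/// Sum 0..n by splitting the range into `ways` contiguous chunks,
/// summing each chunk on its own thread, then adding the partial sums.
fn parallel_sum(n: u64, ways: u64) -> u64 {
    let ways = ways.max(1); // avoid division by zero
    let chunk = (n + ways - 1) / ways; // ceiling division
    thread::scope(|s| {
        let handles: Vec<_> = (0..ways)
            .map(|i| {
                // Clamp chunk bounds so trailing chunks may be empty.
                let start = (i * chunk).min(n);
                let end = ((i + 1) * chunk).min(n);
                s.spawn(move || (start..end).sum::<u64>())
            })
            .collect();
        // Combine the per-thread partial sums.
        handles.into_iter().map(|h| h.join().unwrap()).sum::<u64>()
    })
}

fn main() {
    let n = 1_000_000;
    let total = parallel_sum(n, 16);
    assert_eq!(total, n * (n - 1) / 2);
    println!("sum(0..{n}) = {total}");
}
```

The sketch keeps `n` small on purpose: the plain `u64` sum would overflow for the 100 billion row range used in the benchmark.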

Status

General

  • SQL Parser
  • Query Planner
  • Query Optimizer
  • Predicate Push Down
  • Limit Push Down
  • Projection Push Down
  • Type coercion
  • Parallel Query Execution
  • Distributed Query Execution
  • Hash GroupBy
  • Merge-Sort OrderBy
  • Joins (WIP)
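
As an illustration of the Hash GroupBy item, the sketch below shows the general hash-aggregation technique for a query shaped like `SELECT max(number), sum(number) ... GROUP BY number % 3, number % 4, number % 5`. It is a single-threaded toy, not Datafuse's implementation; `MaxSum` and `hash_group_by` are hypothetical names.

```rust
use std::collections::HashMap;

/// Aggregate state kept per group: running max and running sum.
#[derive(Debug, Clone, Copy, PartialEq)]
struct MaxSum {
    max: u64,
    sum: u64,
}

/// Hash aggregation over 0..n, grouped by
/// (number % 3, number % 4, number % 5).
fn hash_group_by(n: u64) -> HashMap<(u64, u64, u64), MaxSum> {
    let mut groups: HashMap<(u64, u64, u64), MaxSum> = HashMap::new();
    for number in 0..n {
        let key = (number % 3, number % 4, number % 5);
        // Look up (or create) the state for this group, then update it.
        let state = groups.entry(key).or_insert(MaxSum { max: number, sum: 0 });
        state.max = state.max.max(number);
        state.sum += number;
    }
    groups
}

fn main() {
    let groups = hash_group_by(600);
    println!("{} groups", groups.len());
    println!("{:?}", groups[&(0, 0, 0)]);
}
```

A parallel or distributed variant would typically build one such map per worker and merge the maps at the end, since both `max` and `sum` can be combined from partial states.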

SQL Support

  • Projection
  • Filter (WHERE)
  • Limit
  • Aggregate Functions
  • Scalar Functions
  • UDF Functions
  • SubQueries
  • Sorting
  • Joins (WIP)
  • Window (TODO)

Getting Started

Learn Datafuse

Try Datafuse

Contributing

Roadmap

  • 0.1 Support aggregation select (2021.02)
  • 0.2 Support distributed query (2021.03)
  • 0.3 Support group by (2021.04)
  • 0.4 Support order by (2021.04)
  • 0.5 Support join
  • 1.0 Support TPC-H benchmark

License

Datafuse is licensed under Apache 2.0.

Contributors

bohutang, dependabot-preview[bot], dependabot[bot], drmingdrmer, hulunbier, jyizheng, leiysky, smallfish, sundy-li, taiyang-li, tceason, tlightsky, wubx, zhang2014
