Git Product home page Git Product logo

naive-query-engine's Introduction

Naive Query Engine (Toy for Learning) ๐Ÿ˜„

This is a Query Engine which support SQL interface. And it is only a Toy for learn query engine only. You can check TODO to check the progress now.

Simple enough to learn (Although it is simple...but with so much work to finish.. TAT ๐Ÿ˜ญ) and Now it only has a basic architecture and most operators and planners have not implemented (will be done in the future).

This is inspired(and most ideas come) by how-query-engines-work and it is just for learning purpose. And many ideas inspired by arrow-datafusion.

Use arrow to express in-memory columnar format and use sqlparser as SQL parser.

architecture

query_engine

how to use

for now, we can use NaiveDB like below, we can use csv as table storage.

use naive_db::print_result;
use naive_db::CsvConfig;
use naive_db::NaiveDB;
use naive_db::Result;

fn main() -> Result<()> {
    let mut db = NaiveDB::default();

    db.create_csv_table("t1", "data/test_data.csv", CsvConfig::default())?;

    // select
    let ret = db.run_sql("select id, name, age + 100 from t1 where id < 9 limit 3 offset 2")?;
    print_result(&ret)?;

    // Inner Join
    db.create_csv_table("employee", "data/employee.csv", CsvConfig::default())?;
    db.create_csv_table("rank", "data/rank.csv", CsvConfig::default())?;
    db.create_csv_table("department", "data/department.csv", CsvConfig::default())?;

    let ret = db.run_sql(
        "
        select id, name, rank_name, department_name
        from employee
        join rank on 
            employee.rank = rank.id  
        join department on
            employee.department_id = department.id
    ",
    )?;
    print_result(&ret)?;

    // cross join
    let ret = db.run_sql("select * from employee join rank")?;
    print_result(&ret)?;

    // aggregate
    let ret = db.run_sql(
        "
        select count(id), sum(age), sum(score), avg(score), max(score), min(score) 
        from t1 group by id % 3",
    )?;
    print_result(&ret)?;

    Ok(())
}

output will be:

+----+-------+-----------+
| id | name  | age + 100 |
+----+-------+-----------+
| 4  | lynne | 118       |
| 5  | alice | 119       |
| 6  | bob   | 120       |
+----+-------+-----------+
+----+-------+-------------+-----------------+
| id | name  | rank_name   | department_name |
+----+-------+-------------+-----------------+
| 2  | lynne | master      | IT              |
| 1  | vee   | diamond     | IT              |
| 3  | Alex  | master      | Marketing       |
| 4  | jack  | diamond     | Marketing       |
| 5  | mike  | grandmaster | Human Resource  |
+----+-------+-------------+-----------------+
+----+-------+---------------+------+----+-------------+
| id | name  | department_id | rank | id | rank_name   |
+----+-------+---------------+------+----+-------------+
| 1  | vee   | 1             | 1    | 1  | master      |
| 2  | lynne | 1             | 0    | 2  | diamond     |
| 3  | Alex  | 2             | 0    | 3  | grandmaster |
| 4  | jack  | 2             | 1    | 4  | master      |
| 5  | mike  | 3             | 2    | 5  | diamond     |
| 1  | vee   | 1             | 1    | 1  | grandmaster |
| 2  | lynne | 1             | 0    | 2  | master      |
| 3  | Alex  | 2             | 0    | 3  | diamond     |
| 4  | jack  | 2             | 1    | 4  | grandmaster |
| 5  | mike  | 3             | 2    | 5  | master      |
| 1  | vee   | 1             | 1    | 1  | diamond     |
| 2  | lynne | 1             | 0    | 2  | grandmaster |
| 3  | Alex  | 2             | 0    | 3  | master      |
| 4  | jack  | 2             | 1    | 4  | diamond     |
| 5  | mike  | 3             | 2    | 5  | grandmaster |
+----+-------+---------------+------+----+-------------+
+-----------+----------+--------------------+-------------------+------------+------------+
| count(id) | sum(age) | sum(score)         | avg(score)        | max(score) | min(score) |
+-----------+----------+--------------------+-------------------+------------+------------+
| 3         | 61       | 255.6              | 85.2              | 90.1       | 81.1       |
| 3         | 62       | 243.29000000000002 | 81.09666666666668 | 99.99      | 60         |
| 2         | 43       | 167.7              | 83.85             | 85.5       | 82.2       |
+-----------+----------+--------------------+-------------------+------------+------------+

architecture

The NaiveDB is just simple and has clear progress just like:

impl NaiveDB {
    pub fn run_sql(&self, sql: &str) -> Result<Vec<RecordBatch>> {
        // 1. sql -> statement
        let statement = SQLParser::parse(sql)?;
        // 2. statement -> logical plan
        let sql_planner = SQLPlanner::new(&self.catalog);
        let logical_plan = sql_planner.statement_to_plan(statement)?;
        // 3. optimize
        let optimizer = Optimizer::default();
        let logical_plan = optimizer.optimize(logical_plan);
        // 4. logical plan -> physical plan
        let physical_plan = QueryPlanner::create_physical_plan(&logical_plan)?;
        // 5. execute
        physical_plan.execute()
    }
}

TODO

  • type system
  • datasource
    • mem source
    • csv as datasource
    • empty datasource
  • logical plan & expressions
  • build logical plans
    • projection
    • filter
    • aggregate
    • limit
    • join
  • physical plan & expressions
    • physical scan
    • physical projection
    • physical filter
    • physical limit
    • join
      • algorithms
        • (dumb๐Ÿ˜Š) nested loop join
        • hash join
        • sort-merge join
      • inner join
      • cross join
    • physical expression
      • column expr
      • binary operation expr(add/sub/mul/div/and/or...)
      • literal expr
      • unary expr
      • aggr expr
      • so many work to do... TAT
  • query planner
    • scan
    • limit
    • join
    • aggregate
    • ...
  • query optimization
    • more rules needed
  • sql support
    • parser
    • SQL planner: statement -> logical plan
      • scan
      • projection
      • selection
      • limit
      • join
      • aggregate
        • group by
      • scalar function

naive-query-engine's People

Contributors

ganziheng avatar hanxuanliang avatar veeupup avatar ywqzzy avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar

naive-query-engine's Issues

Limit

planner, physiacl, logical

Join !

When projection and selection are finished, we should consider writing join as the highest priority.

We can split it into some tasks:

  • join planner
  • nested loop join #24
    • support multi conditions
  • #25
  • #26
  • #28

Support `In` list and subquery

support in sql,

  • in list, like select * from t where id in (1, 2, 4)
  • in subquery like select * from t where id in (select id from t2 where age < 10)

SQL Test Framework

We need a test framework to test SQL, it will be systematic testing to show which SQL we have supported.

The framework would be that we have a SQL file and an excepted result file, and we can compare the result between the actual file and result file.

Roadmap 1.0

The naive-query-engine is designed to learn the query engine and the code or logic should keep it always simple and clear!
We want to make this project support basic SQL grammar and run it with a basic MPP executor.

We can split the total process into two steps. First, we want to have the basic SQL grammar, and the issues are below (some have finished and do not have an issue related...)
#15
#16
#17
#26
#28
#29
#32
#43
#44
#47
#48
#50
#64
#65

and the second milestone is the MPP executor
#18

Any contributions are welcome!

Implement fmt::Display for all expressions

impl fmt::Display for Operator {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        let display = match &self {
            Operator::Eq => "=",
            Operator::NotEq => "!=",
            Operator::Lt => "<",
            Operator::LtEq => "<=",
            Operator::Gt => ">",
            Operator::GtEq => ">=",
            Operator::Plus => "+",
            Operator::Minus => "-",
            Operator::Multiply => "*",
            Operator::Divide => "/",
            Operator::Modulo => "%",
            Operator::And => "AND",
            Operator::Or => "OR",
            Operator::Like => "LIKE",
            Operator::NotLike => "NOT LIKE",
            Operator::RegexMatch => "~",
            Operator::RegexIMatch => "~*",
            Operator::RegexNotMatch => "!~",
            Operator::RegexNotIMatch => "!~*",
            Operator::IsDistinctFrom => "IS DISTINCT FROM",
            Operator::IsNotDistinctFrom => "IS NOT DISTINCT FROM",
            Operator::BitwiseAnd => "&",
            Operator::BitwiseOr => "|",
        };
        write!(f, "{}", display)
    }
}

Ref to the code in datafusion (datafusion/expr/src/operator.rs)

Aggregate Func

aggregate func with group by

  • basic design: #41
  • basic planner: #45
  • group by, sum func work with different types #46
  • support more group by datatypes

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.