Git Product home page Git Product logo

husky-sql's Introduction

Husky SQL

Build Status

Dependencies

Husky SQL has the following minimal dependencies:

  • nlohmann/json (Version >= 3.2.0, it will be installed automatically)
  • All dependencies of husky

Build

Download the latest source code of Husky SQL:

git clone --recursive https://github.com/husky-team/husky-sql.git
cd husky-sql

Do an out-of-source build using CMake:

mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release # CMAKE_BUILD_TYPE: Release, Debug, RelWithDebInfo
make help                           # List all build target

make -j{N}                          # Make using N threads

Configuration & Running

For information of configuring and running Husky, please take a look at the Husky README.

Run sample query
make ExecuteQuery
Get query plans
make QueryPlan

husky-sql's People

Contributors

lmatz avatar tatianajin avatar

Stargazers

 avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

husky-sql's Issues

Implementing a Husky adapter and codegen for TableScan

As a practice for better understanding the query optimization and execution pipeline, let us start with supporting simple queries like SELECT * FROM table, where the table is a file stored in HDFS and to be read by Husky.

A rough guideline is as follows.

1. Provide schema and adapter

There are three major parts that you need:

  1. model.json: specify the schema in your database, for this practice one schema with one table is enough.
  2. Table: store the necessary information (e.g. the table url) for the table to read, and provide the column types through GetRowType. Hints: you may specify the columns in the model.json through "columns". To handle more dynamic cases, you may try using JNI or any method you think appropriate to dynamically load the column types from the file (e.g. stored in ORC, Parquet file header).
  3. TableFactory: create the table according to the given operands & row data type.

2. Optimize query and translate optimized plan into json

In the HuskyQueryPlan example we show how the optimization is carried out step by step. You can get the final plan (currently it is a plan in husky logical convention), and translate it into a json file (you can use your creativity to determine how to organize the json).

Alternatively, if you are more comfortable with using JNI to execute Husky tasks, you may use JNI instead of translating a plan into json.

You may try three cases (ordered from easiest to hardest):

  1. SELECT * from table: you will get a plan with only one RelNode: HuskyLogicalNativeTableScan. You can think about how to use the info in this RelNode to get the corresponding table (especially the url)
  2. Add project SELECT col1, col2 from table: think about what info is needed to implement the project function in Husky given the table. You may also implement a rule to push project into the table scan (the rule name is given in the Husky rulesets).
  3. Add filter SELECT col1, col2 from table where col3=val: you may start with a equality filter in the format of expression=literal. Think about what information is needed and how to get it from the corresponding RelNode.

For each group, either you use JNI or json for codegen, please reply here to specify your solution e.g. if you use json, please specify the structure of your json.

  • DY & SY
  • Alice
  • ZH

3. Provide execution logic in Husky

Provide your implementation for table scan, project, and filter. If json is used for codegen part, write a main program to read the json and execute the sql logic accordingly.

For each group, please reply here to specify your interface for the operators

  • DY & SY
  • Alice
  • ZH

Code style for java

Problem

Since it is trouble some to manually check styles (e.g. indentation, function block, import, etc.), it is better to add a check style plugin.

Proposed solution

The following is suggested. Some configurations are needed as well.

<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-checkstyle-plugin</artifactId>
<version>2.17</version>

Additional context

(ps. Guys, if you use Intellij please use the built-in formatter. This would be a good habit!!!)
Anybody can help?

Query Optimization and Convert Logical Plan to Physical Plan

  • Already Done:
  1. Read table metadata from table.json and construct the table.
  2. Basic adapter of Husky: HuskyLogicalTableScan(with project and filter).
  3. Query Optimization Rules:
    • PushProjectIntoTableScanRule:
      • matches operand(HuskyLogicalCalc.class, operand(HuskyLogicalTableScan.class, none()));
      • drop calc if the transformed program merely returns its input and doesn't exist filter.
    • PushFilterIntoTableScanRule:
      • matches operand(HuskyLogicalCalc.class, operand(HuskyLogicalTableScan.class, none()));
      • drop calc if the condition has been push down into table scan and there is no projection;
      • Note: condition should be converted into CNF for convenience of physical plan.
  4. Example:
    Suppose we have a query as following:
select TRANS_ID,ITEM_ID,PRICE,SELLER_ID,SELLER_NAME 
from Parts 
where PRICE > 2.0 and ITEM_ID = 2 or SELLER_ID = 4

Normalized Logical Plan:

LogicalProject(TRANS_ID=[$0], ITEM_ID=[$1], PRICE=[$3], SELLER_ID=[$4], SELLER_NAME=[$5])
  LogicalFilter(condition=[OR(AND(>($3, 2.0), =($1, 2)), =($4, 4))])
    LogicalTableScan(table=[[SALES, Parts]])


Apply PushProjectIntoTableScanRule and PushFilterIntoTableScanRule:

HuskyLogicalTableScan(table=[[SALES, Parts]], 
fields=[TRANS_ID=[$0], ITEM_ID=[$1], PRICE=[$3], SELLER_ID=[$4], SELLER_NAME=[$5]], 
condition=[AND(OR(>($2, 2.0), =($3, 4)), OR(=($1, 2), =($3, 4)))])

Note: In fields, FiledName=[$num] where num is the order of that field in the whole table; In condition, operand($num, constant) where num is the order of that field in the projected fieltes. For example, SELLER_ID is 4th (start with 0) field in the table while it is the 3rd field in the projected fields.

  • To Do List:

  • Add remaining optimization rules like: HuskyLogicalJoin, HuskyLogicalSort, etc.

  • Think about how to convert logical plan into physical plan

    1. How to connect each RelNode?
    2. What information should be contained in the output JSON file for each RelNode?
  • Here is a proposed example:


{
  "name": "HuskyLogicalCalc",
  "type": "Calc",
  "project": [
     {
          "index": "0",
          "name": "TRANS_ID",
          "datatype": "int"
      }, {
          "index": "1",
          "name": "ITEM_ID",
          "datatype": "int"
      },
      ...
  ],
  "condition":{
      {
          "operator": "AND",
          "left":{
               "operator": "OR",
               "left": "PRICE",
               "right": "2.0"
          },
          "right": {
               ...
          }
      }
    "input": {
          "name": "HuskyLogicalTableScan",
          "type": "TableScan",
          ...
    }
  }
}

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.