kipdata / fncksql Goto Github PK
View Code? Open in Web Editor NEWSQL as a Function for Rust
Home Page: http://www.kipdata.site/
License: Apache License 2.0
SQL as a Function for Rust
Home Page: http://www.kipdata.site/
License: Apache License 2.0
What version of KipSQL are you using?
latest
What version of Rust are you using?
rustc 1.77.0-nightly
What did you do?
explain select t.a from t1;
PLAN
----------------------------
Projection [t.a] [Project]+
Scan t1 -> [] [SeqScan]
(1 row)
What did you expect to see?
some error message
Use a certain format to simplify key generation
like: TableName + TypeNum + 0 + TypeId + 0 + DataValue
e.g. T0_Index_0_0_0_0000000000000000000 => T020000000000000000000000
Is your feature request related to a problem? Please describe:
currently, we just support like select * from t1 where a=1
we should add order operation for exp.
select a, b from t1 order by a
Describe the feature you'd like:
order by a desc/asc
e.g. select v2 from t where v1 like 'G%';
What version of KipSQL are you using?
latest
What version of Rust are you using?
nightly
What's the status of the running?
What did you do?
create table t1 (a int, b bool null);
insert into table t1 values (1, null);
like bustub we should support the part of the column is null
What did you expect to see?
insert success
and the value is exp. ' bool(null) '
What did you see instead?
the value is 'NULL'
When OrderBy .. Limit ..
exists, the TopK
operator will be used
Tips: TopK
should also use index
this issue is #97 sub-issue
support alert table:
ALTER TABLE table ADD column_1 INTERGER;
ALTER TABLE table MODIFY column_1 INTERGER;
ALTER TABLE table CHANGE COLUMN column_1 column_2 INTERGER;
ALTER TABLE table DROP COLUMN column_1;
RENAME table TO table ;
TPC-H:
CREATE TABLE lineitem (
l_orderkey INT,
l_partkey INT,
l_suppkey INT,
l_linenumber INT,
l_quantity DECIMAL(15,2),
l_extendedprice DECIMAL(15,2),
l_discount DECIMAL(15,2),
l_tax DECIMAL(15,2),
l_returnflag CHAR(1),
l_linestatus CHAR(1),
l_shipdate DATE,
l_commitdate DATE,
l_receiptdate DATE,
l_shipinstruct CHAR(25),
l_shipmode CHAR(10),
l_comment VARCHAR(44),
PRIMARY KEY (l_orderkey, l_linenumber)
);
What version of KipSQL are you using?
latest
What version of Rust are you using?
1.77.0
What's the status of the running?
What did you do?
When I run a SUBSTRING
function that length is larger than the string, the connectionn is reseted.
What did you expect to see?
=> select substring('abc', 1, 10);
---------------------------------
abc
What did you see instead?
=> select substring('abc', 1, 10);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
I think it's better to handle this by retrieving all the characters after the start index(like PostgreSQL and MySQL)
Is your feature request related to a problem? Please describe:
currently, we just support select a from t1
we should add filter operation for exp.
select * from t1 where a=1
Describe the feature you'd like:
where a>1 a<1 a =1
where c1 > c2
where c1 > c2 * 2
What version of KipSQL are you using?
main
What version of Rust are you using?
What's the status of the running?
build error
https://github.com/KipData/KipSQL/blob/main/src/storage/mod.rs#L151
KipDB new version(0.1.2-alpha.17) CacheError
has been removed, we should remove it in KipSql
What did you do?
What did you expect to see?
What did you see instead?
Function
Create Table
PRIMARY KEY (column_1, column_2, ...)
UNIQUE (column_1, column_2, ...)
CREATE TABLE CUSTOMER (C_ID INTEGER DEFAULT '0' NOT NULL)
Ref: #103
Create Index
CREATE UNIQUE INDEX u_index ON table(column_1, column_2, ...);
Alert Table
ALTER TABLE table ADD column_1 INTERGER;
ref: #104ALTER TABLE table MODIFY column_1 INTERGER;
ALTER TABLE table CHANGE COLUMN column_1 column_2 INTERGER;
ALTER TABLE table DROP COLUMN column_1;
ref: #109RENAME table TO table ;
Ref:
Desc
Desc Table
: #102It seems to project twice on temp table.
And I don't think it can handle more complex situations.
explain SELECT b FROM t1 WHERE a in (SELECT a FROM t2 where (select a from t1)); PLAN ------------------------------------------------------------------- Projection [t1.b] [Project] + LeftSemi Join On t1.a = (t2.a) as (_temp_table_2_.a) [HashJoin]+ Scan t1 -> [a, b] [SeqScan] + Projection [(t2.a) as (_temp_table_2_.a)] [Project] + Projection [t2.a] [Project] + Inner Join Where _temp_table_1_.a [HashJoin] + Scan t2 -> [a] [SeqScan] + Projection [(t2.a) as (_temp_table_1_.a)] [Project] + Projection [t2.a] [Project] + Scan t1 -> [] [SeqScan]I think every query using their own binder context may be a solution, but it may be a big work.
Extract KipDB transactions and put them under user control
The Update statement involves two transaction creations:
What version of KipSQL are you using?
0.0.1-alpha.6
What version of Rust are you using?
β― rustc -V
rustc 1.76.0-nightly (de686cbc6 2023-12-14)
What's the status of the running?
What did you do?
> πππ»ππΌππ½ππΎππΏ <
create table blog_1 (id int primary key, title varchar unique, created_at datetime);
+----------------------+
| CREATE TABLE SUCCESS |
+======================+
| blog_1 |
+----------------------+
> πππ»ππΌππ½ππΎππΏ <
insert into blog_1 (id, title) values (0, 'KipSQL', '2022-10-12 12:22:33');
thread 'main' panicked at /home/jojo/code/open-source/kipsql/src/binder/insert.rs:52:52:
index out of bounds: the len is 2 but the index is 2
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
What did you expect to see?
Insert bind fails with error
> πππ»ππΌππ½ππΎππΏ <
insert into blog_1 (id, title) values (0, 'KipSQL', '2022-10-12 12:22:33');
Oops!: bind error: column count not match, expect 2, got 3
What did you see instead?
PANIC occurs when the number of columns does not match the expression during bind Insert.
> πππ»ππΌππ½ππΎππΏ <
insert into blog_1 (id, title) values (0, 'KipSQL', '2022-10-12 12:22:33');
thread 'main' panicked at /home/jojo/code/open-source/kipsql/src/binder/insert.rs:52:52:
index out of bounds: the len is 2 but the index is 2
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
By default, kipsql provides cargo run for testing sql statements. As an embedded database, kipsql needs to perform some maintenance operations. It is more user-friendly to operate with cli instead of coding. Therefore, use the main method of this test directly. Just provide an optional path parameter
e.g. cargo run -path '/home/kould/kipsql_data'
Separate Varchar from Text, use a prefix of the same length, and combine it with the suffix to form a complete string.
Tips: The suffix of Varchar is string, and the suffix of Text is file.
As mentioned in issue: #162, StatisticsMetaLoader::load
takes a considerable amount of time.
Currently, StatisticsMetaLoader loads table statistical information at a granularity. This may happen when a table has only a few indexes that are frequently used, but indexes that are not commonly used are still loaded. We can reduce the granularity and increase the loading speed.
Tips: StatisticsMeta is based on Index as the granularity
pub fn load(&self, table_name: TableName) -> Result<&Vec<StatisticsMeta>, DatabaseError> {
let option = self.cache.get(&table_name);
if let Some(statistics_metas) = option {
Ok(statistics_metas)
} else {
let paths = self.tx.statistics_meta_paths(&table_name)?;
let mut statistics_metas = Vec::with_capacity(paths.len());
for path in paths {
statistics_metas.push(StatisticsMeta::from_file(path)?);
}
Ok(self
.cache
.get_or_insert(table_name, |_| Ok(statistics_metas))?)
}
}
In KipSQL, we use sqlparser-rs to parse SQLs.
Parser will return an AST, which is a tree that represents users' query. For example, the above SQL will produce an AST like...
create table t1 (v1 int not null, v2 int);
AST: CreateTable {
or_replace: false,
temporary: false,
external: false,
global: None,
if_not_exists: false,
transient: false,
name: ObjectName(
[
Ident {
value: "t1",
quote_style: None,
},
],
),
columns: [
ColumnDef {
name: Ident {
value: "v1",
quote_style: None,
},
data_type: Int(
None,
),
collation: None,
options: [
ColumnOptionDef {
name: None,
option: NotNull,
},
],
},
ColumnDef {
name: Ident {
value: "v2",
quote_style: None,
},
data_type: Int(
None,
),
collation: None,
options: [],
},
],
constraints: [],
hive_distribution: NONE,
hive_formats: Some(
HiveFormat {
row_format: None,
storage: None,
location: None,
},
),
table_properties: [],
with_options: [],
file_format: None,
location: None,
query: None,
without_rowid: false,
like: None,
clone: None,
engine: None,
default_charset: None,
collation: None,
on_commit: None,
on_cluster: None,
order_by: None,
}
Is your feature request related to a problem? Please describe:
Describe the feature you'd like:
Teachability, Documentation, Adoption, Migration Strategy:
we need support desc table
to query table structureοΌ
I will try impl this
Build the most basic component of the database: catalog. It provides metadata information for database tables, which is used for subsequent binder and logical planner construction.
At the same time, it is a component that runs through the entire query engine processing flow and provides table metadata information.
To simplify implementation, there is no introduction of the concepts of database and schema here.
Under RootCatalog, a HashMap directly contains all TableCatalogs.
The TableCatalog contains all ColumnCatalogs.
root->table->column
Convert sql to Logical Plan of Insert -> Projection -> Values
The new Values ββoperator is used to convert Rows to BoxedExecutor as a temporary data source
When DDL exists in multiple transactions, it may cause the data modification to be incorrect but still able to be submitted.(Write conflict detection requires the key to be the same)
In
c1 In (1, 2, 3)
=> c1 = 1 or c1 = 2 or c1 = 3
Like
c1 like 'abc%'
=> c1 > 'abc' and c1 < 'abd'
Is null
c1 is null
=> c1 == null
(logical)By converting these advanced operators into equality and range queries, CBO can provide more accurate row number estimates.
If you want to contribute code to FnckSQL and find requirements, you can refer to the following list
The corresponding test files are attached under the requirements description. The specific location is in the test directory.
e.g. E011_02
=> tests/slt/sql_2016/E011_02.slt
Tips: When you complete the requirement and submit the PR, please do not forget to delete the comment corresponding to the Case.
Select
BETWEEN
on Where
ESCAPE
on LIKE
ALL/ANY/SOME
on WHERE
EXISTS
on WHERE
WHERE
WHERE
with IN/Not IN
UNION\UNION DISTINCT
UNION ALL
EXCEPT DISTINCT
SELECT INTO
CONSTRAINT
name
VIEW
& CREATE VIEW
& DROP VIEW
EXCEPT
with VIEW
WHERE
, GROUP BY
, and HAVING
clauses supported in queries with grouped views
Using
on Join
CASE
Binder is a module in the entire database system that is very important.
Its function is to bind the AST and Schema information generated after parsing, specifically including:
In this stage, you should Implement 3 types of statement binding
BoundCreateTable
BoundSelect
BoundInsert
What version of KipSQL are you using?
What version of Rust are you using?
What's the status of the running?
What did you do?
create table t1 (a int primary key, b int, k int)
insert into t1 (a, b, k) values (-99, 1, 1), (-1, 2, 2), (5, 2, 2)
quit
cargo run
select * from t1
What did you expect to see?
(-99, 1, 1), (-1, 2, 2), (5, 2, 2)
What did you see instead?
Oops!: bind error: invalid table bind table t1
There are currently many features that depend on this
c1 >= (select 1)
When converting most numbers to the corresponding key, you can use type conversion to [u8] to convert to a string to reduce the length of the key while ensuring the bitwise comparison order.
As a lightweight embedded SQL storage database, KipSQL can enhance support for micro-frontends to expand usage scenarios.
[1] https://developer.mozilla.org/zh-CN/docs/WebAssembly/Rust_to_Wasm
e.g. SELECT * FROM employees WHERE salary > (SELECT salary FROM employees WHERE last_name = 'Abel');
TODO: When the case is select * from t1 where (c1 = 7 or c1 = 10) and c2 < 2;, the range is (-inf, (10, 2)), in fact it is better to be ((7), (7) ,2)), ((10), (10, 2))
on pr: #154 TODO part
I try to perform this case in a table with 200,000 rows: select * from t1 where c1 > 500 and c1 < 1000
, and use pprof-rs to generate a flame graph
The most obvious one is StatisticMetaLoader::load
, which will then be cached, and then I generate a flame graph after loading.
In the new flame graph, HepOptimizer::find_best
is a relatively large performance bottleneck.
Is your feature request related to a problem? Please describe:
execute drop table t
, if table t is not exist will get an error, I think need support if_exists
param
Oops!: executor error: storage_ap error: kipdb error
Describe the feature you'd like:
Describe alternatives you've considered:
Teachability, Documentation, Adoption, Migration Strategy:
let _ = kipsql.run("create table t1 (a int, b int)").await?;
let _ = kipsql.run("insert into t1 (b, a) values (1, 1), (3, 3), (5, 4)").await?;
let vec_batch_full_fields = kipsql.run("select * from t1").await?;
print_batches(&vec_batch_full_fields)?;
π
a | b |
---|---|
1 | 1 |
3 | 3 |
5 | 4 |
Tips: Fix it by directly adding Projection to Insert and Values
In order to facilitate testing and ensure the correctness of existing functions, the e2e testing framework sqllogictest-rs is introduced
https://github.com/risinglightdb/sqllogictest-rs
when inserting some null values, the DataType conversion fails.
because AST recognizes null in Insert Values ββas DataValue::Null, and the data type Null corresponding to KipSQL's DataValue::i32 is represented as i32(None)
therefore, Binder::bind_insert
should convert the DataType::Null of expr in the cols again according to the datatype of the corresponding catalog (if the DataType of ColumnCataLog is DataType::i32, then DataValue::Null -> DataValue::i32(None) )
skip startup
step
Question:
KipSQL is currently mainly used as an embedded SQL database. Its similar database may be SQLite, but SQLite does not support remote connections.
Therefore, it may be a bit redundant for KipSQL to support PG Wire?
However, if it is used to support operation and maintenance, it may be a good choice.
Tips:
This is a lower priority issue(I think)οΌBecause there is no suitable answer in the Question part.
[1] http://mysql.taobao.org/monthly/2020/03/02/
[2] https://zhuanlan.zhihu.com/p/493045524
e.g. select v2 from t where v1 is null;
The sql-query-engine-rs v0.1 milestone is to build a basic SQL query on CSV storage, like:
DDl:
CREATE TABLE student (
id INTEGER PRIMARY KEY,
name VARCHAR NOT NULL,
age INTEGER
);
DQL:
SELECT * FROM t
SELECT a, b FROM t
SELECT a, b, 1 FROM t
DML:
INSERT INTO t(a, b) VALUES (1, 10)
Therefore, we should implement the following database components:
By supporting CSV reading and output, it provides testing, backup and other applications.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
π Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. πππ
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google β€οΈ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.