erikgrinaker / toydb
Distributed SQL database in Rust, written as a learning project
License: Apache License 2.0
The client's error handling is faulty, fix this and write integration tests.
The Raft engine should be rewritten to use Tokio async. Related to #22.
Hi @erikgrinaker,
Thank you for open sourcing this great project. I would like to say this is the best resource for learning database principles. The code quality is also awesome: neat and elegant. Every piece is excellent.
I have a question about one of the optimizations, shown below:
(Expression::Field(a, a_label), Expression::Field(b, b_label)) => {
    let (left_field, right_field) = if a < left_size {
        ((a, a_label), (b - left_size, b_label))
    } else {
        ((b, b_label), (a - left_size, a_label))
    };
    Ok(Node::HashJoin { left, left_field, right, right_field, outer })
}
When it is an equality join, we switch to a hash join here, based on Field(a) and Field(b). Not sure if I'm missing anything, but what if the corresponding fields (columns) are not unique? If there are duplicate values in the fields, I feel the hash join will miss some rows.
The hash map is created by collecting the key/value pairs:
let right: HashMap<Value, Row> = rrows
    .map(|res| match res {
        Ok(row) if row.len() <= r => {
            Err(Error::Internal(format!("Right index {} out of bounds", r)))
        }
        Ok(row) => Ok((row[r].clone(), row)),
        Err(err) => Err(err),
    })
    .collect::<Result<_>>()?;
If there are duplicate values, I think only the last pair will survive in the map. Am I missing anything? I'd appreciate any feedback. Thanks!
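The behavior the question describes can be reproduced without toydb at all. A minimal standalone sketch, with string keys and integers standing in for toydb's Value and Row types: collecting pairs into a HashMap<K, V> does keep only the last pair per key, while grouping into HashMap<K, Vec<V>> keeps all matching rows, which is what a hash join over a non-unique column needs.

```rust
use std::collections::HashMap;

/// Groups (key, row) pairs so that duplicate join keys keep all rows.
fn build_side(pairs: Vec<(&str, i32)>) -> HashMap<&str, Vec<i32>> {
    let mut grouped: HashMap<&str, Vec<i32>> = HashMap::new();
    for (k, v) in pairs {
        grouped.entry(k).or_default().push(v);
    }
    grouped
}

fn main() {
    // Two rows share the key "x", as with a non-unique join column.
    let rows = vec![("x", 1), ("x", 2), ("y", 3)];

    // Collecting into HashMap<K, V> keeps only the last pair per key:
    // the row ("x", 1) is silently dropped.
    let last_only: HashMap<&str, i32> = rows.clone().into_iter().collect();
    assert_eq!(last_only.len(), 2);
    assert_eq!(last_only["x"], 2);

    // Grouping into HashMap<K, Vec<V>> preserves every matching row.
    let grouped = build_side(rows);
    assert_eq!(grouped["x"], vec![1, 2]);
}
```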
The client should support pipelining: https://docs.rs/tokio-postgres/0.5.3/tokio_postgres/index.html#pipelining
This would e.g. be useful for the bank simulation setup.
The client should automatically retry serialization failures. This is non-trivial due to async closure lifetimes. See bank simulation for implementation example.
These currently crash the node, they should be returned and the operations skipped.
Maybe with Elle: https://github.com/jepsen-io/elle
I've cloned the toydb repository and ran cargo build --release successfully. I then went into the target/release directory, tried to run ./toysql, and got the following error message:
Error: Internal("No connection could be made because the target machine actively refused it. (os error 10061)")
I ran the same command with administrator privileges and got the same error message. I'm on Windows 10 x64 Version 2004, OS build 20201.1000.
This will make it easier to write non-trivial copy/paste examples.
When I ran the following command, it failed with the error below:
cargo run --release --bin toysql
the following packages contain code that will be rejected by a future version of Rust: nom v5.1.2
note: to see what the problems were, use the option --future-incompat-report, or run cargo report future-incompatibilities --id 1
Running target/release/toysql
Error: Internal("Connection refused (os error 111)")
My operating system is Ubuntu 22.04 and the Rust version is 1.70. Please help me solve this issue. Thanks!
Try to move call routing out of the Raft node, and integrate it with a separate state machine thread (related to #22).
Plan nodes should not have unqualified or aliased names, they should all be resolved and validated during planning to simplify the optimization stage.
mvcc.rs: anomaly_read_skew function
let t1 = mvcc.begin()?;
let t2 = mvcc.begin()?;
assert_eq!(t1.get(b"a")?, Some(vec![0]));
t2.set(b"a", vec![2])?;
t2.set(b"b", vec![2])?;
t2.commit()?;
assert_eq!(t1.get(b"a")?, Some(vec![0]));
The comment explains: "Read skew is when t1 reads a and b, but t2 modifies b in between the reads. Snapshot isolation prevents this."
However, according to that explanation, the two reads should be of different keys, while in the test a is read both before and after.
https://vladmihalcea.com/a-beginners-guide-to-read-and-write-skew-phenomena/
Therefore, the second read should be of b instead, e.g. assert_eq!(t1.get(b"b")?, Some(vec![0])).
The client should automatically handle serialization failures by taking a transaction as a closure and retrying it.
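The retry-in-a-closure shape this issue asks for can be sketched without the toydb client. A minimal standalone illustration, with a hypothetical Error enum standing in for toydb's error::Error (only a serialization-conflict variant matters here); a real client would also roll the transaction back and back off between attempts:

```rust
/// Hypothetical error type standing in for toydb's error::Error;
/// only the Serialization variant matters for the retry logic.
#[derive(Debug, PartialEq)]
enum Error {
    Serialization,
    Other(String),
}

/// Runs `txn` and retries it when it fails with a serialization error,
/// up to `max_retries` additional attempts; any other outcome is returned
/// immediately.
fn with_retry<T>(
    max_retries: u32,
    mut txn: impl FnMut() -> Result<T, Error>,
) -> Result<T, Error> {
    let mut attempts = 0;
    loop {
        match txn() {
            Err(Error::Serialization) if attempts < max_retries => attempts += 1,
            result => return result,
        }
    }
}

fn main() {
    // The transaction conflicts twice, then succeeds on the third attempt.
    let mut calls = 0;
    let result = with_retry(5, || {
        calls += 1;
        if calls < 3 { Err(Error::Serialization) } else { Ok(calls) }
    });
    assert_eq!(result, Ok(3));
}
```

The non-trivial part the issue alludes to is doing this with async closures, where the borrow taken by each retry must not outlive the attempt; the synchronous FnMut version above sidesteps that.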
Submitting a call immediately after starting a cluster causes a panic:
Error: Internal("Unexpected Raft mutate response MutateState { call_id: [250, 91, 73, 148, 245, 202, 77, 21, 184, 190, 68, 192, 45, 3, 88, 140], command: [129, 0, 129, 0, 192] }")
Seems like the node returns the message we submitted, or something. Probably only applies to candidates, since it works fine once the cluster settles.
All storage keys (even individual rows and index entries) currently use full identifiers for tables and columns, they should use integer identifiers instead. They must also escape separators.
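Why separator escaping matters can be shown in isolation. A sketch under illustrative assumptions (0x00 as the separator byte and 0x00 0xff as its escape are choices made here, not necessarily toydb's encoding): without escaping, the key parts [1, 0] + [2] and [1] + [0, 2] would concatenate to the same raw bytes.

```rust
/// Escapes a separator byte (0x00 here, illustrative) inside key parts so
/// that concatenated keys still split unambiguously: each literal 0x00
/// becomes 0x00 0xff, and every part ends with a 0x00 0x00 terminator.
fn escape_part(part: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(part.len() + 2);
    for &b in part {
        out.push(b);
        if b == 0x00 {
            out.push(0xff); // escape literal separator bytes
        }
    }
    out.extend([0x00, 0x00]); // part terminator
    out
}

fn main() {
    // Without escaping, [1, 0] ++ [2] and [1] ++ [0, 2] would produce the
    // same raw key bytes; with escaping they stay distinct.
    let a = [escape_part(&[1, 0]), escape_part(&[2])].concat();
    let b = [escape_part(&[1]), escape_part(&[0, 2])].concat();
    assert_ne!(a, b);
}
```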
Hi, when I check out commit fe8f322, the following error occurs:
file not found for module raft
Hello @erikgrinaker!
Thanks for this amazing project. It actually inspired me to create my own toy database as well. I'm also following Designing Data-Intensive Applications and Database Internals. I'm not sure if you have time to answer them, but here are the questions:
Thanks in advance,
"{table_id}_{primary_id}_{col}": row[col]
. Is that correct?Maybe just Tokio TCP sockets with MessagePack. Make client sessions stateful, i.e. track transaction state in server thread.
It looks like:
Running target/debug/deps/tests-28925d9acf338694
running 1 test
test cluster::isolation::anomaly_dirty_read ...
It seems blocked within Tokio, but I am not familiar with Tokio.
Raft currently runs in a single thread, such that long-running state operations block the leader from sending heartbeats and asserting leadership.
By default, toyDB uses an on-disk Raft log for persistence and an in-memory B+tree key-value store for SQL state data. It might be interesting to build an on-disk B+tree key-value store as well.
BitCask is a very simple log-structured key/value store, used by Riak. It basically writes key/value pairs to an append-only log, and keeps a map of keys to file byte offsets in memory, with regular compaction.
This is just the right level of complexity for toyDB, and is suitable both for Raft log and state machine storage.
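The design described above fits in a few dozen lines. A minimal in-memory sketch (the "log file" here is a Cursor over a Vec<u8> purely so the example is self-contained; a real store would append to a file on disk and compact it periodically):

```rust
use std::collections::HashMap;
use std::io::{Cursor, Read, Seek, SeekFrom, Write};

/// Minimal BitCask-style store: values go to an append-only log, and an
/// in-memory map tracks each key's (offset, length) in that log.
struct BitCask {
    log: Cursor<Vec<u8>>,
    index: HashMap<Vec<u8>, (u64, usize)>,
}

impl BitCask {
    fn new() -> Self {
        Self { log: Cursor::new(Vec::new()), index: HashMap::new() }
    }

    /// Appends the value to the end of the log and records its position.
    fn set(&mut self, key: &[u8], value: &[u8]) {
        let offset = self.log.seek(SeekFrom::End(0)).unwrap();
        self.log.write_all(value).unwrap();
        self.index.insert(key.to_vec(), (offset, value.len()));
    }

    /// Looks up the key's offset in memory, then reads the value from the log.
    fn get(&mut self, key: &[u8]) -> Option<Vec<u8>> {
        let &(offset, len) = self.index.get(key)?;
        self.log.seek(SeekFrom::Start(offset)).unwrap();
        let mut buf = vec![0; len];
        self.log.read_exact(&mut buf).unwrap();
        Some(buf)
    }
}

fn main() {
    let mut store = BitCask::new();
    store.set(b"a", b"1");
    store.set(b"a", b"2"); // the old value stays in the log until compaction
    assert_eq!(store.get(b"a"), Some(b"2".to_vec()));
    assert_eq!(store.get(b"b"), None);
}
```

Overwrites only update the in-memory index, which is why a real BitCask needs periodic compaction to reclaim the dead log entries.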
Iterators are currently rather hacky and inefficient, and e.g. use O(log n) lookups per next() call (B+tree and MVCC) or buffer all results (Raft and SQL engines). Ideally, iterators should have O(1) complexity when calling next(), stream all results (with some amount of IO buffering), and don't hold read borrows to the entire data structure for the lifetime of the iterator.
Depends on #3.
Probably B-tree based, possibly LSM-trees with B-tree indexes.
Should have config options to switch storage engine, both for Raft and SQL.
Or some other convenient way to create errors.
Hey @erikgrinaker,
I read the architecture guide, but maybe I missed this somehow. I was wondering how MVCC storage behaves in terms of parallelism. I see that there is a Mutex here. How does that affect execution exactly? Is there really only one transaction accessing the engine at a time? So transactions are concurrent, but not actually parallel, as I understand it.
Sorry if I'm missing something fundamental.
pub struct MVCC<E: Engine> {
/// The underlying KV engine. It is protected by a mutex so it can be shared between txns.
engine: Arc<Mutex<E>>,
}
Thanks in advance.
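The effect the Arc<Mutex<_>> has can be seen with a toy stand-in for the engine (a Vec here, purely for illustration): many threads can share it, but every access takes the lock, so engine operations run one at a time even when the transactions themselves are concurrent.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    // Stand-in for Arc<Mutex<E>>: threads share the "engine", but each
    // access holds the lock, serializing storage operations.
    let engine = Arc::new(Mutex::new(Vec::<u32>::new()));
    let handles: Vec<_> = (0..4)
        .map(|i| {
            let engine = Arc::clone(&engine);
            thread::spawn(move || engine.lock().unwrap().push(i))
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    // All four writes landed, one at a time under the lock.
    assert_eq!(engine.lock().unwrap().len(), 4);
}
```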
Build with GitHub Actions and host on GitHub Pages, or something.
Hi Erik,
I'm new to Rust and databases, and I think this repo is an amazing project for learning both. I want to contribute some code, but I did not find a code of conduct or any CONTRIBUTING guidelines. Is there anything I should know about that?
Spin up a three-node cluster and run isolation tests for concurrent txns across multiple nodes.
Since all Result instances use error::Error, we should add error::Result, which always uses this error type.
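A minimal sketch of the alias (the Error variant and the parse_flag helper here are hypothetical, just to show the shortened signatures):

```rust
/// A crate-wide error type (hypothetical variant, for illustration).
#[derive(Debug, PartialEq)]
pub enum Error {
    Internal(String),
}

/// A Result alias that always uses the crate's error type, so signatures
/// can say Result<T> instead of Result<T, Error>.
pub type Result<T> = std::result::Result<T, Error>;

fn parse_flag(s: &str) -> Result<bool> {
    match s {
        "true" => Ok(true),
        "false" => Ok(false),
        other => Err(Error::Internal(format!("invalid flag {other}"))),
    }
}

fn main() {
    assert_eq!(parse_flag("true"), Ok(true));
    assert!(parse_flag("maybe").is_err());
}
```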
Aborts caused by Raft leadership changes or client disconnections may leave stale transactions. The SQL session and client should try to recover these, by rolling back on drop and resuming transactions on commit/rollback error.
For concurrency and performance testing.
Status calls should have more content (e.g. cluster peer status) and better formatting.
Errors in single-statement transactions currently leave the transaction open, giving serialization errors when other txns try to modify a record.
The KV SQL engine should cache schema lookups.
This needs to be implemented at the MVCC level, so that versioning works - probably by specifying prefixes that should be cached. The downside of this is that it needs to go through serialization.