Comments (17)

zh217 avatar zh217 commented on May 22, 2024 2

@gisborne I see your point. A current weak spot of CozoDB is the lack of something like ALTER TABLE. If we had stored rules (akin to views in SQL), the situation would become even nastier. Maybe it is a good idea to have them, but I would like to see ALTER TABLE implemented first, as stored rules would definitely need to play nicely with schema changes.

from cozo.

zh217 avatar zh217 commented on May 22, 2024 1

You can either store it in the application layer in memory, or use a relation keyed by the session key. In my opinion this complexity doesn't belong to the database layer.
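As a rough illustration of the first option (state kept in the application layer, in memory), here is a minimal Python sketch. The `SessionStore` class and its method names are hypothetical and are not part of Cozo; the only idea taken from the comment above is that state is keyed by the session key and lives outside the database.

```python
from __future__ import annotations

import threading
from typing import Any


class SessionStore:
    """Hypothetical in-memory store for per-session state (not a Cozo API)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._state: dict[str, dict[str, Any]] = {}

    def put(self, session_key: str, name: str, value: Any) -> None:
        # Each session key gets its own private dictionary of values.
        with self._lock:
            self._state.setdefault(session_key, {})[name] = value

    def get(self, session_key: str, name: str, default: Any = None) -> Any:
        with self._lock:
            return self._state.get(session_key, {}).get(name, default)

    def end(self, session_key: str) -> None:
        # Dropping the session discards everything stored under its key.
        with self._lock:
            self._state.pop(session_key, None)


store = SessionStore()
store.put("session-a", "selected_user", 42)
store.put("session-b", "selected_user", 7)
```

Values read from such a store would then be passed into each query as ordinary parameters. The second option, a stored relation keyed by the session key, follows the same shape with the dictionary replaced by a relation.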

For reusable business rules, again I don't think they belong in the database layer. Something like GraphQL's fragments is likely a better fit, but I'm still figuring out the best way to do it. Suggestions are welcome.

As an example of what this could be like:

$$imported_ruleset_1
$$imported_ruleset_2
?[x, z] := rule1[x, y], rule2[y, z]

Here rule1 and rule2 come from the imported rulesets, which the user passes in at query time, just like parameters. The database can verify the imported rules and guard against injection attacks.
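To make the proposal concrete, here is a small Python sketch of "checked text substitution" done on the client side. Everything in it is an assumption for illustration: the `$$name` import line is only the syntax proposed above, not an existing Cozo feature, and `expand_imports`, `check_ruleset` and the validation policy (no entry rule, no query options or system ops inside a ruleset) are hypothetical.

```python
from __future__ import annotations

import re

# Hypothetical import marker: a line consisting only of `$$name`.
IMPORT_LINE = re.compile(r"^\$\$([A-Za-z_][A-Za-z0-9_]*)$")
# Lines such as `:put ...` or `::relations` (a colon or two, then a letter).
OPTION_LINE = re.compile(r"^:{1,2}[A-Za-z]")


def check_ruleset(name: str, text: str) -> None:
    """Assumed policy: an imported ruleset may only define named rules."""
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("?"):
            raise ValueError(f"ruleset {name!r} must not define the entry rule")
        if OPTION_LINE.match(stripped):
            raise ValueError(f"ruleset {name!r} must not contain options or system ops")


def expand_imports(query: str, rulesets: dict[str, str]) -> str:
    """Replace each `$$name` line with the checked text of that ruleset."""
    out = []
    for line in query.splitlines():
        match = IMPORT_LINE.match(line.strip())
        if match is None:
            out.append(line)
            continue
        name = match.group(1)
        if name not in rulesets:
            raise KeyError(f"unknown ruleset: {name}")
        check_ruleset(name, rulesets[name])
        out.append(rulesets[name].strip())
    return "\n".join(out)


rulesets = {
    "imported_ruleset_1": "rule1[x, y] <- [[1, 2]]",
    "imported_ruleset_2": "rule2[y, z] <- [[2, 3]]",
}
query = "$$imported_ruleset_1\n$$imported_ruleset_2\n?[x, z] := rule1[x, y], rule2[y, z]"
expanded = expand_imports(query, rulesets)
```

The expanded script is the same text one would get by inlining the rules by hand, so the query that actually runs is identical to the fully inlined version.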

matthiasautrata avatar matthiasautrata commented on May 22, 2024 1

gisborne avatar gisborne commented on May 22, 2024 1

gisborne avatar gisborne commented on May 22, 2024 1

zh217 avatar zh217 commented on May 22, 2024

Every call to the database uses at least one thread. Rust uses the OS's native threads, and on a modern Linux system something like 100k threads can be spawned with no problem. However, depending on what you want to do (for example, web servers), it is best to limit the number of threads to the number of cores you have, especially if your queries contain computation-heavy parts. We don't use Rust's async, since for database workloads that would actually slow things down drastically (and complicate the code).

For complicated queries (multiple rules within the same query, or graph algorithms), a single query may use many threads to speed things up. This is mostly managed by the rayon library under the hood, so there won't be an explosion of threads due to a single query.

Basically if the workload is what a traditional single-node RDBMS like PostgreSQL can handle, I don't see any problems for CozoDB either. Currently there is no plan for distributed versions.

gisborne avatar gisborne commented on May 22, 2024

To create these multiple threads on the same data store, do I just create separate database instances (DB::Instance.new) with the same path, or do I just use one and it spawns the threads itself?

This is not clear anywhere in the docs that I can see.

zh217 avatar zh217 commented on May 22, 2024

Oh I see what you mean. Yes it should be made clear in the docs.

You cannot create two instances (two calls to Instance::new) pointing to the same database (unlike SQLite, and this holds even if you use the SQLite engine). Sharing a single DB between threads is OK and recommended, and you don't need to do any synchronization. If you must use the same DB from different processes, then you must use the client/server setup.
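The usage pattern described here (open the database once, hand the same handle to every worker thread, and size the pool to the core count as suggested earlier in the thread) might look like the following Python sketch. `FakeDb` is a stand-in written for this example and is not the Cozo API; a real handle would take its place.

```python
import os
from concurrent.futures import ThreadPoolExecutor


class FakeDb:
    """Stand-in for a database handle; NOT the Cozo API."""

    def __init__(self, path: str) -> None:
        self.path = path

    def run(self, script: str) -> str:
        # A real handle would execute the script; this one just echoes it.
        return f"{self.path}: ran {script}"


# Open the store exactly once per process...
db = FakeDb("app.db")

# ...and share that single handle across a pool bounded by the core count.
workers = os.cpu_count() or 1
scripts = [f"query {i}" for i in range(8)]
with ThreadPoolExecutor(max_workers=workers) as pool:
    results = list(pool.map(db.run, scripts))
```

Note that no lock surrounds `db.run`: per the comment above, the shared handle is meant to be used from many threads without extra synchronization.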

gisborne avatar gisborne commented on May 22, 2024

Do separate threads accessing the same instance have a way to have local state? Can they assert temporary things that only they can see?

The curly braces blocks look like maybe one way, but is there another?

FWIW I'd be happy to help write documentation or anything similar. I'm still learning Rust, so I can't contribute that way yet.

zh217 avatar zh217 commented on May 22, 2024

@gisborne Separate threads share nothing. Even for the data they read from stored relations, they are shielded from each other: conceptually, a snapshot is taken whenever a query starts, and the query reads only from that snapshot, even if another thread has committed data to the stored relation since the query started. This is also the reason why we have write locks in the SQLite engine, as it is otherwise impossible for SQLite.

Why do we need such a strong notion of consistency? Because some graph algorithms run for quite a long time, and having your graph structures change half-way is disastrous.

In case of write conflicts, only one thread will complete. Conflicts can't happen in the SQLite engine due to the lock.
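The two behaviours described above (a query keeps reading the snapshot it started with, and only one of two conflicting writers completes) can be modelled with a toy in-memory store. This Python sketch is purely conceptual: `ToyStore`, its version counter and the `ConflictError` exception are assumptions made for illustration and say nothing about how Cozo's storage engines implement snapshots or detect conflicts.

```python
from __future__ import annotations

import threading


class ConflictError(Exception):
    """Raised when a commit is based on a stale snapshot."""


class ToyStore:
    """Toy model: snapshot reads plus first-committer-wins writes."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._version = 0
        self._rows: frozenset = frozenset()

    def snapshot(self) -> tuple[int, frozenset]:
        # The returned frozenset is immutable, so later commits cannot alter it.
        with self._lock:
            return self._version, self._rows

    def commit(self, based_on: int, new_rows: frozenset) -> None:
        with self._lock:
            if based_on != self._version:
                raise ConflictError("store changed since snapshot")
            self._rows = new_rows
            self._version += 1


store = ToyStore()
store.commit(0, frozenset({("a", "b")}))

version, rows = store.snapshot()             # a long-running query starts here
store.commit(version, rows | {("b", "c")})   # another writer commits meanwhile

# The query keeps reading its own snapshot, unaffected by the later commit.
still_seen = rows

# A second writer that started from the same snapshot does not complete.
try:
    store.commit(version, rows | {("x", "y")})
    conflict = False
except ConflictError:
    conflict = True
```

In this model a long-running graph algorithm would work on `rows` for its whole duration, which is the property the comment above argues for.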

gisborne avatar gisborne commented on May 22, 2024

Is there any way to have state persist between queries, so it can be used for a sequence of things, e.g. during an interaction with a user?

Also, is there any way to make permanent, persisted rules, e.g. for business logic?

gisborne avatar gisborne commented on May 22, 2024

infogulch avatar infogulch commented on May 22, 2024

To restate gisborne's comment, how does the performance of a query like this:

$$imported_ruleset_1
$$imported_ruleset_2
?[x, z] := rule1[x, y], rule2[y, z]

... change as $$imported_ruleset_1 and $$imported_ruleset_2 become larger?

zh217 avatar zh217 commented on May 22, 2024

Caching is out of the question. Cache invalidation can easily become more complicated than the database itself.

zh217 avatar zh217 commented on May 22, 2024

@infogulch The same as the performance of the query where everything is inlined. It is just a safer version of text substitution.

infogulch avatar infogulch commented on May 22, 2024

My question is: how expensive would it be to prefix every database query with, say, 100 KB of rule definitions?

zh217 avatar zh217 commented on May 22, 2024

@infogulch It depends on your rules. If a rule is complicated, for example a recursive rule that walks several stored relations, then even one extra line can take a long time. If your rule definitions just put data into rules to be used inline, then the bottleneck is parsing, and 100 KB will probably not make any measurable difference. In fact, if you are using Python, then the Python-Rust interop probably takes more time than the parsing itself.
