The load testing was very promising, but I’m not clear about the concurrency model and

Clarify scalability and concurrency now and planned about cozo HOT 17 CLOSED

cozodb commented on May 22, 2024

Clarify scalability and concurrency now and planned

from cozo.

Comments (17)

zh217 commented on May 22, 2024 2

@gisborne I see your point. Currently a weak aspect of CozoDB is the lack of something like ALTER TABLE. If we had stored rules (akin to views in SQL), the situation would become even nastier. Maybe it is a good idea to have them, but I would like to first see ALTER TABLE implemented, as stored rules would definitely need to play nicely with schema changes.

zh217 commented on May 22, 2024 1

You can either store it in the application layer in memory, or use a relation keyed by the session key. In my opinion this complexity doesn't belong in the database layer.

For reusable business rules, again I don't think they belong in the database layer. Something like GraphQL's snippets is likely a better fit, but I'm still figuring out the best way to do it. Suggestions are welcome.

As an example of what this could be like:

$$imported_ruleset_1
$$imported_ruleset_2
?[x, z] := rule1[x, y], rule2[y, z]

and rule1 and rule2 come from the imported rulesets, passed in by the user at query time, just like parameters. The database can verify the imported rules and guard against injection attacks.
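
A minimal sketch of the text-substitution idea, written in Python on the application side. Everything here is an assumption for illustration: the `$$name` import line is the hypothetical syntax from the example above (not an existing Cozo feature), `RULESETS` and `expand_imports` are made-up names, and the rule bodies are placeholders. It only expands imports into plain query text; it does not perform the verification or injection guarding that the database would do.

```python
import re

# Hypothetical registry of named rulesets. The rule text is illustrative only.
RULESETS = {
    "imported_ruleset_1": "rule1[x, y] := *edge[x, y]",
    "imported_ruleset_2": "rule2[y, z] := *edge[y, z]",
}

# A line consisting solely of `$$` followed by an identifier is an import.
IMPORT_LINE = re.compile(r"^\$\$([A-Za-z_][A-Za-z0-9_]*)$")


def expand_imports(script, rulesets):
    """Replace each `$$name` line with the text of the named ruleset."""
    out = []
    for line in script.splitlines():
        match = IMPORT_LINE.match(line.strip())
        if match is None:
            out.append(line)  # ordinary query line, kept as-is
            continue
        name = match.group(1)
        if name not in rulesets:
            # Only known ruleset names are accepted; anything else is rejected.
            raise KeyError(f"unknown ruleset: {name}")
        out.append(rulesets[name])
    return "\n".join(out)


query = """$$imported_ruleset_1
$$imported_ruleset_2
?[x, z] := rule1[x, y], rule2[y, z]"""

expanded = expand_imports(query, RULESETS)
```

The expanded text is what would be sent to the database as a single query, so its cost is the same as if the rules had been written inline.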

matthiasautrata commented on May 22, 2024 1

Stored rules are useful because they help to create shared semantics. They should be a part of the definition of the data model, just like triggers. IMHO, it is a little bit like reasoning in OWL (or RDFox). Of course, it works if you leave it in the application layer, but it will require programmer participation. Isn’t your point of leaving it out of the database similar to saying: We don’t need triggers in the database because they could be applied consistently in code? Basically, using your example below, it comes down to the question of where you prefer to maintain rules: In code libraries that you hope every programmer will remember to import? Or in the database, from where they are automatically made available? (Silly PS: My crutch would otherwise be to store the rules as text in the database and to figure out a little helper function that loads them into the program. Feels a bit kludgy.)

gisborne commented on May 22, 2024 1

I am asking about this while trying to imagine a typical business website, and finally being able to effectively implement much of the business logic in FOL. There would be quite a bit of FOL for all of it, but on reflection, for any given state change I want to make, the number of rules relevant to that state change is probably not enormous. I would want a nice way to compose and load up sets of rules. I’d be happy to implement that in the host language, but insofar as you’re using a server and it’s a shared resource, it wouldn’t be bad to have a way to centralise it in Cozo.

gisborne commented on May 22, 2024 1

Sure. I had actually been thinking about doing something like this project myself. I had been contemplating the idea (noting that I intended to implement an append-only datalog at base) of not having relation modification at the base level. If you want your User relation to now have a boolean admin column, you create a new such relation, and aggregate the old and new relations using a rule. I wasn’t sure how to then allow insert into the relation defined by the rule. Maybe support something akin to Postgres’s query rewrite rules. The layer on top of the append-only layer, that provides updates and deletes and schema modification would then paper over all that and make it look like you could change schemas and so on.

zh217 commented on May 22, 2024

Every call to the database uses at least one thread. Rust uses the OS's native threads, and on a modern Linux system something like 100k threads can be spawned with no problem. However, depending on what you want to do (for example, web servers), it is best to limit the number of threads to the number of cores you have, especially if your queries contain computation-heavy parts. We don't use Rust's async, since for database workloads that would actually drastically slow things down (and complicate the code).

For complicated queries (multiple rules within the same query, or graph algorithms), a single query may use many threads to speed things up. This is mostly managed by the rayon library under the hood, so there won't be an explosion of threads due to a single query.

Basically if the workload is what a traditional single-node RDBMS like PostgreSQL can handle, I don't see any problems for CozoDB either. Currently there is no plan for distributed versions.

gisborne commented on May 22, 2024

To create these multiple threads on the same data store, do I just create separate database instances (DB::Instance.new) with the same path, or do I just use one and it spawns the threads itself?

This is not clear anywhere in the docs that I can see.

zh217 commented on May 22, 2024

Oh I see what you mean. Yes it should be made clear in the docs.

You cannot create two instances (two calls to Instance::new) pointing to the same database (unlike SQLite, even if you use the SQLite engine). Sharing a single DB between threads is OK and recommended, and you don't need to do any synchronization. If you must use the same DB from different processes, then you must use the client/server setup.
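
A rough sketch of that usage pattern in Python: open one handle per process and pass it to every worker thread, with the pool sized to the number of cores as suggested earlier in the thread. `StandInDb` and its `run` method are hypothetical stand-ins so the example is runnable on its own; they are not Cozo's API, and the real binding's constructor and method names may differ.

```python
import os
import threading
from concurrent.futures import ThreadPoolExecutor


class StandInDb:
    """Hypothetical stand-in for a database handle. NOT Cozo's API."""

    def __init__(self, path):
        self.path = path
        self._lock = threading.Lock()  # internal detail; callers never lock
        self._calls = 0

    def run(self, script):
        # A real handle would execute the script; this one only counts calls.
        with self._lock:
            self._calls += 1
        return f"ran: {script}"

    @property
    def calls(self):
        return self._calls


# Open the database exactly once per process, then share the handle.
db = StandInDb("example.db")

# Bound the pool by the core count instead of spawning a thread per request.
workers = os.cpu_count() or 1
scripts = [f"?[x] <- [[{i}]]" for i in range(20)]

with ThreadPoolExecutor(max_workers=workers) as pool:
    # Every worker calls the same shared handle; no caller-side synchronization.
    results = list(pool.map(db.run, scripts))
```

The point of the sketch is the shape of the program: one handle, many threads, no second instance opened on the same path.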

gisborne commented on May 22, 2024

Do separate threads accessing the same instance have a way to have local state? Can they assert temporary things that only they can see?

The curly braces blocks look like maybe one way, but is there another?

FWIW I'd be happy to help write documentation or anything similar. Still working on learning Rust, so I can't contribute that way yet.

zh217 commented on May 22, 2024

@gisborne Separate threads share nothing. Even for the data they read from stored relations, they are shielded from each other: conceptually, a snapshot is taken whenever a query starts, and the query reads only from that snapshot, even if another thread has committed data to the stored relation since the query started. This is also why we have write locks in the SQLite engine, as this is otherwise impossible with SQLite.

Why do we need such a strong notion of consistency? Because some graph algorithms run for quite a long time, and having your graph structures change half-way is disastrous.

In case of write conflicts, only one thread will complete. Conflicts can't happen in the SQLite engine due to the lock.
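
To make the snapshot and write-conflict behaviour concrete, here is a toy, single-threaded Python model. It is only an illustration of the concept under simplifying assumptions (the snapshot is a full copy, and conflicts are detected per key at commit time); `ToyStore`, `ToyTxn` and `WriteConflict` are invented names and say nothing about how Cozo's storage engines implement this.

```python
class WriteConflict(Exception):
    """Raised when a key was committed by someone else after our snapshot."""


class ToyStore:
    """Toy versioned key-value store; illustrates the idea, not Cozo internals."""

    def __init__(self):
        self._data = {}      # key -> latest committed value
        self._versions = {}  # key -> commit counter of the last write
        self._commit = 0     # global commit counter

    def begin(self):
        # The snapshot is a private copy taken when the transaction starts.
        return ToyTxn(self, dict(self._data), self._commit)


class ToyTxn:
    def __init__(self, store, snapshot, start):
        self._store = store
        self._snapshot = snapshot
        self._start = start
        self._writes = {}

    def get(self, key):
        # Reads see our own pending writes, then the snapshot, never later commits.
        if key in self._writes:
            return self._writes[key]
        return self._snapshot.get(key)

    def put(self, key, value):
        self._writes[key] = value

    def commit(self):
        store = self._store
        # Reject the commit if any written key changed after our snapshot.
        for key in self._writes:
            if store._versions.get(key, 0) > self._start:
                raise WriteConflict(key)
        store._commit += 1
        for key, value in self._writes.items():
            store._data[key] = value
            store._versions[key] = store._commit


store = ToyStore()
setup = store.begin()
setup.put("a", 1)
setup.commit()

t1 = store.begin()
t2 = store.begin()
t1.put("a", 2)
t1.commit()              # first writer wins

t2_seen = t2.get("a")    # still the value from t2's snapshot
t2.put("a", 3)
try:
    t2.commit()
    conflicted = False
except WriteConflict:
    conflicted = True    # second writer to the same key does not complete

latest = store.begin().get("a")
```

In this model `t2` keeps reading the old value for its whole lifetime, and of the two conflicting writers only `t1` completes.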

gisborne commented on May 22, 2024

Is there any way to have a state persist between queries, so it can be used for a sequence of things, eg during an interaction with a user?

Also, is there any way to make permanent, persisted rules, eg for business logic?

gisborne commented on May 22, 2024

Might it not also be the case that if some rules are “always there” that some rule consequences might still be cached, so inference could be quicker?

infogulch commented on May 22, 2024

To restate gisborne's comment, how does the performance of a query like this:

$$imported_ruleset_1
$$imported_ruleset_2
?[x, z] := rule1[x, y], rule2[y, z]

... change as $$imported_ruleset_1 and $$imported_ruleset_2 become larger?

zh217 commented on May 22, 2024

Caching is out of the question. Cache invalidation can easily become more complicated than the database itself.

zh217 commented on May 22, 2024

@infogulch the same as the performance of the query where everything is inlined. It is just a safer version of text substitution.

infogulch commented on May 22, 2024

My question is how expensive would it be to prefix every database query with, say, 100kb of rule definitions?

zh217 commented on May 22, 2024

@infogulch Depends on your rules. If a rule is complicated, for example a recursive rule that walks several stored relations, then even one extra line will take a long time. If your rule definitions are just something that puts data into rules to be used inline, then the bottleneck is parsing, and 100KB will probably not make any measurable difference. In fact, if you are using Python, then the Python-Rust interop probably takes more time than the parsing itself.
