cozodb / cozo

A transactional, relational-graph-vector database that uses Datalog for query. The hippocampus for AI!

Home Page: https://cozodb.org

License: Mozilla Public License 2.0

CMake 0.02% C 0.64% C++ 0.68% Rust 97.42% Shell 0.47% PowerShell 0.07% Java 0.03% JavaScript 0.28% Swift 0.26% Ruby 0.04% HTML 0.06% RenderScript 0.01% Python 0.03%
database graph client-server cross-platform datalog embedded-database graph-algorithms graph-database graphdb relational-database

cozo's People

Contributors

avi-d-coder, chuanqisun, creatorrr, crowdhailer, github-actions[bot], goldsteine, liangxianzhe, mateusvmv, michaelsbradleyjr, mr-dispatch, niwakadev, pegesund, redbar0n, sean-lynch, turnerdev, zh217

cozo's Issues

redb support

redb is a young but promising KV store:

  • on par performance with lmdb
  • pure rust so it is a win for portability of cozo
  • compact enough code base

Is there any chance of redb being supported in CozoDB, including time travel?

ownership issue in backup_db in the Rust API

Hey there! Thanks for a very promising database!

I'm writing an ingestor and found myself wanting to use the backup_db method on Db<Storage>. This was harder than I expected, however, because that method needs to take an owned string as a path. This means I can't use the PathBuf I already have—instead, I have to convert that to a string (which is not always safe, because paths can have invalid unicode) and allocate a new string, since the method requires ownership.

Would it be possible to change this to be a borrowed std::path::Path instead? Or, if it has to be owned, a PathBuf?
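
A minimal sketch of what the requested signature could look like, assuming a hypothetical free function `backup_to` rather than Cozo's real `backup_db` (whose body is not shown in this issue). Taking `impl AsRef<Path>` lets callers pass a `&PathBuf`, a `&Path`, or a `&str` without a lossy string conversion or an extra allocation at the call site.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for a backup routine; NOT Cozo's actual API.
/// It only writes `contents` to `path` to show the borrowed-path signature.
fn backup_to(path: impl AsRef<Path>, contents: &[u8]) -> io::Result<PathBuf> {
    let path = path.as_ref(); // borrow as &Path, whatever the caller passed
    fs::write(path, contents)?;
    Ok(path.to_path_buf())
}

fn main() -> io::Result<()> {
    let target: PathBuf = std::env::temp_dir().join("cozo_backup_sketch.db");

    // A borrowed PathBuf works directly; no to_str() or String needed.
    let written = backup_to(&target, b"snapshot")?;
    assert_eq!(written, target);

    // A plain &Path works through the same signature.
    backup_to(target.as_path(), b"snapshot again")?;

    fs::remove_file(&target)?;
    Ok(())
}
```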

Document storage strategies/architecture

I hunted around for this information briefly but couldn't find it. How are data and relations stored in cozo?

The primary thing I'm wondering is if Cozo is row-oriented or column-oriented. General-purpose databases have traditionally all stored rows contiguously, but storing similar data types contiguously opens up big opportunities for compression and analytics workflows.

:replace doesn't seem to work on ephemeral relations after explicit creation

I was incredibly confused when running transactions that created an ephemeral relation (either explicitly or via :replace itself) and then later tried to replace it, because I would always see:

× when executing against relation '_test'
╰─▶ Cannot destroy temp relation

Here's a minimal reproduction:

{?[a] <- [[1], [2], [3]]; :replace _test {a}}
{?[a] <- [[1]]; :replace _test {a}}

Hoping this is not intended behavior! It makes a lot of things significantly more convoluted to implement!

Failed running rust example from readme

Hi, great idea for a db! Looking forward towards trying it out a bit more.

I tried the Rust example from the readme and it's broken. This worked for me:

use cozo::Db;
use miette::Result;

fn main() -> Result<()> {
    let db = Db::new("_test_db")?;
    println!("{}", db.run_script_str(r#"?[] <- [['hello', 'world!']]"#, ""));
    println!("{}", db.run_script_str(r#"?[] <- [['hello', 'world', $name]]"#, r#"{"name":"Rust"}"#));
    println!("{}", db.run_script_str(r#"?[a] <- [[1, 2]]"#, ""));

    Ok(())
}

Maybe relations as values?

It would be nicely orthogonal if, rather than living in a single global namespace, relations could be field values. Then I could organise my relations however suits.

The same idea might apply to other things — theories (collections of rules, values and so on)…

Custom functions defined in Rust (/ other host language)

Hi! I’d like to embed Cozo in my project and provide some utility functions to queries. Currently I use fixed rules for this:

urls[url] <- [["https://goldstein.rs"], ["https://cozodb.org"]]
fetched[url, status, body] <~ FetchUrl(urls[url])
query[url, body, selector] := fetched[url, status, body], selector = "title"
titles[] <~ HtmlSelect(query[url, body, selector])
?[url, title] := titles[url, titles], title = first(titles)

That works, but it’s kinda inconvenient. I’d prefer to define custom functions instead for API like this:

urls[url] <- [["https://goldstein.rs"], ["https://cozodb.org"]]
?[url, title] := urls[url], body = fetch_url(url), title = first(html_select(body, "title"))

Is a feature like this planned? Maybe I’m missing something, but it seems useful for embedding Cozo.

Tutorial/doc: improve example

Suggestion for improving the recursive example from the tutorial, so it's slightly more intuitive.

Current:

parent[] <- [['joseph', 'jakob'], 
             ['jakob', 'issac'], 
             ['issac', 'abraham']]
grandparent[gcld, gp] := parent[gcld, p], parent[p, gp]
?[who] := grandparent[who, 'abraham']  # => 'jakob'

Issues:

  • The grandparent function is interpreted as: "who is the grandparent of X?" or "find the grandparent of X", but currently Jakob comes out as the grandparent of Abraham, when it's really the other way around.
  • What does gcld mean? It wasn't documented and I didn't understand it.
  • Parent child relationship is more natural to describe top-down/left-right than bottom-up/right-left. Especially since the "array" is called parent, so the focus is on the parent (i.e. top-down).
  • Typo: issac → isaac
  • Typo: "classical example recursive example" → "classical recursive example"
  • It'd be nice if the example showed one more level of recursion, i.e. great-grandparent.

Suggestion:

parent[] <- [['abraham', 'isaac'], 
             ['isaac', 'jakob'], 
             ['jakob', 'joseph']]

grandparent[A, C] := parent[A, B], parent[B, C]

great_grandparent[A, D] := parent[A, B], parent[B, C], parent[C, D]

?[who] := great_grandparent[who, 'joseph']  # => 'abraham'

which is also more in line with the example from "A Guided Tour of Relational Databases and Beyond", by Mark Levene & George Loizou and the other example I found when googling for it (since you mentioned it was a classical example).

FTS index creation error

There seems to be an issue when parsing the filters parameter. Repro via the WASM demo:

:create table {k: String => v: String?}
::fts create table:index_name {
    extractor: v,
    extract_filter: !is_null(v),
    tokenizer: Simple,
    filters: [],
}

Error:

Filters must be a list of filters

Comparison with Surrealdb

Cozo and SurrealDB (https://surrealdb.com/) are two new and innovative databases, and both can learn from each other. I don't want to use a decades-old relational database for a new project, so I am looking for a high-level or, if possible, detailed comparison between Cozo and SurrealDB.

Support Lazy Columns

Lazy Columns are not traditionally nullable, but a row can be created without a value for such a column, and it can be then set later, but only once. I expect these can be made more efficient than traditional null fields, as once rows are fully ground, null checks and flags are no longer required.

This is a natural way to express a distributed function call: https://frest.substack.com/p/distributed-relational
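
To make the set-once behaviour concrete, here is a small sketch in Rust. The `LazyColumn` type and its method names are hypothetical and exist only for illustration; this is not a Cozo feature or API.

```rust
/// Hypothetical write-once cell modelling a "lazy column" value.
#[derive(Debug)]
struct LazyColumn<T> {
    value: Option<T>,
}

impl<T> LazyColumn<T> {
    /// A row can be created without a value for the column.
    fn unset() -> Self {
        LazyColumn { value: None }
    }

    /// Setting succeeds only the first time; later attempts hand the
    /// rejected value back instead of overwriting.
    fn set(&mut self, v: T) -> Result<(), T> {
        if self.value.is_some() {
            return Err(v);
        }
        self.value = Some(v);
        Ok(())
    }

    fn get(&self) -> Option<&T> {
        self.value.as_ref()
    }

    /// Once ground, the value can no longer change.
    fn is_ground(&self) -> bool {
        self.value.is_some()
    }
}

fn main() {
    let mut result: LazyColumn<String> = LazyColumn::unset();
    assert!(!result.is_ground());

    // First write fills the column.
    assert!(result.set("done".to_string()).is_ok());

    // Second write is rejected and the original value is kept.
    assert!(result.set("again".to_string()).is_err());
    assert_eq!(result.get().map(String::as_str), Some("done"));
}
```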

Set up, or allow someone else to set up, suitable communication spaces

After this hit HN, I imagine there are quite a few interested folks. But even if the number of such folks is 10, it would be great to have somewhere to share information about this VERY exciting project.

I would be happy to set up and moderate a r/cozodb Reddit group.

I could just do this, of course, but it would be much more effective if the documentation and readme called out this space, and we had participation from the developer (even if only occasionally).

Can I not :put literal values?

:create test {name: String}

:put test {['Cozo']}
  × The query parser has encountered unexpected input / end of input at 40..40
   ╭─[2:1]
 2 │ 
 3 │ :put test {['Cozo']}

An EBNF might be part of the answer here, but I have semantic confusions also.

Share benchmark code

The benchmarks are very interesting reading.

It would be useful both for understanding the benchmarks and for seeing examples of using the Rust API.

Please share the source of the benchmarks.

Sorry if you’ve already shared them, but I looked through the benchmark piece again and I didn’t see it. If it’s there, maybe make it more obvious?

Terminal REPL

I just discovered Cozo and noticed, with disappointment, that the simplest way to use it is through a browser, which is not that simple at all (not to mention the security implications of leaving network sockets open, even on localhost).

So I made a terminal-based REPL: https://paste.debian.net/1266001/

It's basic, but it works. Type a space to enter multi-line editing.

Are you interested in having that? Should I submit it as a binary in cozo? Or a separate crate?

OS thread exhausted by large number of queries

When performing a large number of operations, I consistently receive a thread exhaustion error. It feels like the thread is not being freed (either not in time, or not at all) after each query.

Repro (node.js)

RUST_BACKTRACE=full node ./repro.js

repro.js:

const { CozoDb } = require("cozo-node");

async function main() {
  const db = new CozoDb();

  let iter = 0;

  while (iter++ < 1000000) {
    await db.run(` ?[] <- [[""]]`);

    if (iter % 100 === 0) {
      console.log(iter);
    }
  }
}

main();

Output

100
200
...
30800
30900
31000
31100
31200
31300
thread '<unnamed>' panicked at 'failed to spawn thread: Os { code: 11, kind: WouldBlock, message: "Resource temporarily unavailable" }', /rustc/2c8cc343237b8f7d5a3c3703e3a87f2eb2c54a74/library/std/src/thread/mod.rs:715:29
stack backtrace:
   0:     0x7f02855123f0 - std::backtrace_rs::backtrace::libunwind::trace::h595f06c70adcc478
                               at /rustc/2c8cc343237b8f7d5a3c3703e3a87f2eb2c54a74/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
   1:     0x7f02855123f0 - std::backtrace_rs::backtrace::trace_unsynchronized::h177a0149c76cdde9
                               at /rustc/2c8cc343237b8f7d5a3c3703e3a87f2eb2c54a74/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:     0x7f02855123f0 - std::sys_common::backtrace::_print_fmt::hc0701fd2c3530c58
                               at /rustc/2c8cc343237b8f7d5a3c3703e3a87f2eb2c54a74/library/std/src/sys_common/backtrace.rs:65:5
   3:     0x7f02855123f0 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::hd4cd115d8750fd6c
                               at /rustc/2c8cc343237b8f7d5a3c3703e3a87f2eb2c54a74/library/std/src/sys_common/backtrace.rs:44:22
   4:     0x7f02850298be - core::fmt::write::h93e2f5923c7eca08
                               at /rustc/2c8cc343237b8f7d5a3c3703e3a87f2eb2c54a74/library/core/src/fmt/mod.rs:1213:17
   5:     0x7f02854edae4 - std::io::Write::write_fmt::h8162dbb45f0b9e62
                               at /rustc/2c8cc343237b8f7d5a3c3703e3a87f2eb2c54a74/library/std/src/io/mod.rs:1682:15
   6:     0x7f0285513bff - std::sys_common::backtrace::_print::h1835ef8a8f9066da
                               at /rustc/2c8cc343237b8f7d5a3c3703e3a87f2eb2c54a74/library/std/src/sys_common/backtrace.rs:47:5
   7:     0x7f0285513bff - std::sys_common::backtrace::print::hcb5e6388b9235f41
                               at /rustc/2c8cc343237b8f7d5a3c3703e3a87f2eb2c54a74/library/std/src/sys_common/backtrace.rs:34:9
   8:     0x7f02855137ff - std::panicking::default_hook::{{closure}}::h9c084969ccf9a722
                               at /rustc/2c8cc343237b8f7d5a3c3703e3a87f2eb2c54a74/library/std/src/panicking.rs:267:22
   9:     0x7f02855148b6 - std::panicking::default_hook::h68fa2ba3c3c6c12f
                               at /rustc/2c8cc343237b8f7d5a3c3703e3a87f2eb2c54a74/library/std/src/panicking.rs:286:9
  10:     0x7f02855148b6 - std::panicking::rust_panic_with_hook::h8d5c434518ef298c
                               at /rustc/2c8cc343237b8f7d5a3c3703e3a87f2eb2c54a74/library/std/src/panicking.rs:688:13
  11:     0x7f0285514364 - std::panicking::begin_panic_handler::{{closure}}::hf33414f5dabf6faf
                               at /rustc/2c8cc343237b8f7d5a3c3703e3a87f2eb2c54a74/library/std/src/panicking.rs:579:13
  12:     0x7f02855142cc - std::sys_common::backtrace::__rust_end_short_backtrace::hc50389427413bb75
                               at /rustc/2c8cc343237b8f7d5a3c3703e3a87f2eb2c54a74/library/std/src/sys_common/backtrace.rs:137:18
  13:     0x7f02855142a1 - rust_begin_unwind
                               at /rustc/2c8cc343237b8f7d5a3c3703e3a87f2eb2c54a74/library/std/src/panicking.rs:575:5
  14:     0x7f0284f8f632 - core::panicking::panic_fmt::h2de7a7938f816de8
                               at /rustc/2c8cc343237b8f7d5a3c3703e3a87f2eb2c54a74/library/core/src/panicking.rs:64:14
  15:     0x7f0284f8fa92 - core::result::unwrap_failed::hdc73d4affce1d414
                               at /rustc/2c8cc343237b8f7d5a3c3703e3a87f2eb2c54a74/library/core/src/result.rs:1790:5
  16:     0x7f02851351be - cozo::runtime::db::Poison::set_timeout::h0e7ced27714c26cc
  17:     0x7f0285139559 - cozo::runtime::db::Db<S>::run_query::hb8ae4d482dec4fc0
  18:     0x7f0285138eb4 - cozo::runtime::db::Db<S>::execute_single_program::h2f2e1b626910ed53
  19:     0x7f028513c40a - cozo::runtime::db::Db<S>::run_script::ha39e45fccbe8422f
  20:     0x7f02853a53cb - std::sys_common::backtrace::__rust_begin_short_backtrace::h1aa3d078f52cd66e
  21:     0x7f02853a2d53 - core::ops::function::FnOnce::call_once{{vtable.shim}}::h9df442b5b8069cae
  22:     0x7f0285515425 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h1c0f3664d7ced314
                               at /rustc/2c8cc343237b8f7d5a3c3703e3a87f2eb2c54a74/library/alloc/src/boxed.rs:1988:9
  23:     0x7f0285515425 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h67647c21c6c4968a
                               at /rustc/2c8cc343237b8f7d5a3c3703e3a87f2eb2c54a74/library/alloc/src/boxed.rs:1988:9
  24:     0x7f0285515425 - std::sys::unix::thread::Thread::new::thread_start::h355d348ba593a22c
                               at /rustc/2c8cc343237b8f7d5a3c3703e3a87f2eb2c54a74/library/std/src/sys/unix/thread.rs:108:17
  25:     0x7f028cf14609 - start_thread
  26:     0x7f028ce39133 - clone
  27:                0x0 - <unknown>

Additional env details:

npm ls cozo-node
  [email protected]

node -v
  v18.4.0

uname -a:
  Linux <*****> 5.15.90.1-microsoft-standard-WSL2 #1 SMP Fri Jan 27 02:56:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

installed binary:
  native/6/cozo_node_prebuilt.node

cat /proc/sys/kernel/threads-max:
  256123
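
The backtrace above shows the panic coming from a thread spawn underneath `cozo::runtime::db::Poison::set_timeout`. As a hedged sketch only (this is not Cozo's implementation, and `Watchdog`, `start_watchdog`, and `finish` are hypothetical names), one way to keep a per-query timeout thread from outliving its query is to have it wait on a channel with `recv_timeout`, so it exits as soon as the query signals completion instead of sleeping for the whole timeout.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{self, RecvTimeoutError, Sender};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical per-query watchdog; illustrative only.
struct Watchdog {
    done: Sender<()>,
    handle: JoinHandle<()>,
    poisoned: Arc<AtomicBool>,
}

fn start_watchdog(timeout: Duration) -> Watchdog {
    let poisoned = Arc::new(AtomicBool::new(false));
    let flag = Arc::clone(&poisoned);
    let (done, rx) = mpsc::channel::<()>();
    let handle = thread::spawn(move || {
        // Wakes early when the query finishes (message or disconnect);
        // only a real timeout sets the poison flag.
        if let Err(RecvTimeoutError::Timeout) = rx.recv_timeout(timeout) {
            flag.store(true, Ordering::SeqCst);
        }
    });
    Watchdog { done, handle, poisoned }
}

impl Watchdog {
    /// Signal completion, wait for the thread to exit, report whether it timed out.
    fn finish(self) -> bool {
        let _ = self.done.send(()); // an error just means the watchdog already exited
        self.handle.join().expect("watchdog thread panicked");
        self.poisoned.load(Ordering::SeqCst)
    }
}

fn main() {
    for _ in 0..1_000 {
        let watchdog = start_watchdog(Duration::from_secs(60));
        // ... the query itself would run here ...
        let timed_out = watchdog.finish();
        assert!(!timed_out);
    }
    println!("1000 queries done; each watchdog thread was joined before the next began");
}
```

Because each watchdog is joined before the next query starts, at most one extra thread is alive at a time in this loop, regardless of the timeout length.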

Clarify scalability and concurrency now and planned

The load testing was very promising, but I’m not clear about the concurrency model, and hence about Cozo's suitability, now and as intended, as the data store for a typical small business web site.

I would like to see a discussion of this and further, the practicality of the development model in such a situation. Can I just run Rails, and have each Rails connection use a separate connection to Cozo with its own state?

How about large webs of business rules? Will we be able to persist them?

Please know: I ADORE this project. I’m just trying to determine whether this can actually free me from the hell of SQL. :-)

Convert values in list to relation

Hello, I am new to Cozo and Datalog.

How do I convert the values of a list to a relation?

Example:

input[values] <- [[1, 2, 3]]
?[value] := ???

Expected result:

value
1
2
3

Bool is not a valid ColType

I get a parser error when I try to use Bool as a column type. Looking at the code, it doesn't look like it's a supported type, however the docs indicate it is.

Bulk ingestion

Hi, I'm very excited to see this project, as it seems to fit perfectly what I will need very soon.

My initial question concerns efficient ingestion of base facts. My current use case is provenance tracking combined with analytics results in distributed environments, so there will potentially be lots of largish chunks of records. Right now the only API I can see is building a query string with a list of parameters. Is there a different option?

It's not a showstopper for me right now, but it would be nice to know your current thinking regarding a roadmap.

Thanks

Uuid Keys Break Queries

When a UUID field is part of a stored relation's keys, it breaks queries.

?[b, a] <- [[rand_uuid_v1(), "abc"]]
:replace test { a: String, b: Uuid, }

A basic query such as:

?[a, b] := *test[a, b]

will result in an error like:

thread '<unnamed>' panicked at 'internal error: entered unreachable code: [0, 0, 0, 0, 0, 0, 0, 0, 247]', /Users/zh217/.cargo/registry/src/github.com-1ecc6299db9ec823/cozo-0.1.4/src/data/memcmp.rs:272:18

Changing the UUID field to be a non-key works as expected:

?[b, a] <- [[rand_uuid_v1(), "abc"]]
:replace test { a: String => b: Uuid }
?[a, b] := *test[a, b]
a b
abc dd85b19a-5fde-11ed-a88e-1774a7698039

[Q] How do I implement a `range` table?

Hello!

Admittedly my logic programming is not very good, but I was wondering if it is possible to create an infix rule like this:

range[a, b, x]

where a and b are integers defining a closed interval in Z and x can be any integer within the interval. The most straightforward implementation is this one:

range[x] := x = 0
range[x] := range[y], x = y + 1, x <= 10
?[x] := range[x]

where a = 0 and b = 10. But as you can see, I need to hardcode a and b and I can't really have the range[a, b, x] rule as I originally stated. Is this an inherent limitation of Datalog? I'm thinking this can probably be implemented as a custom fixed rule that takes parameters a and b, but I wonder if something like this could be done only using CozoScript.
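
For reference, this is what the recursive rule computes once a and b are treated as parameters, written as a plain Rust fixpoint rather than CozoScript. It is only a model of the semantics (the function name `range` and the worklist evaluation are assumptions for illustration) and says nothing about whether CozoScript itself can express the parameterised rule.

```rust
use std::collections::BTreeSet;

/// Models `range[a, b, x]`: every integer x in the closed interval [a, b].
/// Mirrors the two rules above: a base fact, then x = y + 1 while x <= b.
fn range(a: i64, b: i64) -> BTreeSet<i64> {
    let mut facts = BTreeSet::new();
    if a > b {
        return facts; // empty interval: no base fact
    }
    facts.insert(a); // base rule: x = a
    let mut frontier = vec![a];
    while let Some(y) = frontier.pop() {
        let x = y + 1; // recursive rule: x = y + 1
        if x <= b && facts.insert(x) {
            frontier.push(x); // only newly derived facts are revisited
        }
    }
    facts
}

fn main() {
    let r = range(0, 10);
    println!("{:?}", r);
    assert_eq!(r.len(), 11);
}
```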

Thanks in advance and congrats on the project, I think it's really cool.

panic when using empty constant

Just trying things out, I noticed this panic:

thread '<unnamed>' panicked at 'index out of bounds: the len is 0 but the index is 0', src/query/stored.rs:380:25
[2022-11-08T12:48:51Z ERROR cozoserver] 2022-11-08 12:48:51.798700 Handler panicked: POST /text-query 366.062µs

when handling this query

{
?[id, name] <- [[]]
:replace product {id: Int, name: String}
}

Which is a stupid query, I know, but I struggle to create an empty stored relation without having an output relation (?), even though the docs say the contrary:

{
:replace product {id: Int, name: String}
}

says:

parser::no_entry

  × Program has no entry
  help: You need to have one rule named '?'

EDIT: It seems related to the usage of :replace instead of :create.

Document limitations / impact of large values

Cozo supports storing strings and byte arrays. These at least in theory allow you to store a very large amount of data in a single row. This naturally raises the question of how much data can/should you store in a single row in practice.

It would be useful to have the documentation provide clear answers to the following questions:

  • Is there a limit to the amount of data you can store in a single value or single row?
  • Are there performance impacts or other special considerations for working with large values such as long strings or large byte arrays?

How to query w/a validity field in a vector query?

So the way to query for the most recent version of a relation w/Validity (time travel) is

?[uid, mood] := *status{uid, mood @ 'NOW'}

But how can I do the same with a vector query? I tried

?[uid, mood] := ~status:index{uid, mood @ 'NOW' | query: q, k: 5, ef: 20}, q = vec(<vectors here>)

And that failed with the error "The query parser has encountered unexpected input / end of input at 47..47". No idea how to structure this query, hard to find more information on how the vector query is structured

Can't access the server from localhost on a different port in a browser

Hey, I was playing around with writing a client for querying Cozo in the browser.

The problem is that the /text-query endpoint requires a JSON value to be passed, but I can't set the Content-Type header because the CorsLayer in your server doesn't allow for any headers to be set. I believe this will also disallow the x-cozo-auth header from being set on cross origin requests from the browser as well.

Adding the following line to the CorsLayer instantiation fixed the issue, and will allow for the auth header as well.

    .allow_headers([header::CONTENT_TYPE, HeaderName::from_static("x-cozo-auth")])

I can make a pull request if you like.

As an aside, thanks for doing Cozo! It looks great, I'm going to use it in lieu of SQLite in my file manager project and see how it goes.

`cozo-node` install fails with `TAR_BAD_ARCHIVE: Unrecognized archive format`

When trying to do pnpm add cozo-node or npm install cozo-node, I'm getting a TAR_BAD_ARCHIVE: Unrecognized archive format error both under NodeJS v14 and v19. Full stacktrace:

.../node_modules/cozo-node install$ node-pre-gyp install
│ node-pre-gyp info it worked if it ends with ok
│ node-pre-gyp info using [email protected]
│ node-pre-gyp info using [email protected] | linux | x64
│ node-pre-gyp info check checked for "/tmp/cozo-demo/node_modules/.pnpm/[email protected]/node_modules/cozo-node/native/6/index.node" (not found)
│ node-pre-gyp http GET https://github.com/cozodb/cozo-lib-nodejs/releases/download/0.3.0/6-linux-x64.tar.gz
│ node-pre-gyp ERR! install TAR_BAD_ARCHIVE: Unrecognized archive format 
│ node-pre-gyp ERR! install error 
│ node-pre-gyp ERR! stack Error: TAR_BAD_ARCHIVE: Unrecognized archive format
│ node-pre-gyp ERR! stack     at Unpack.warn (/tmp/cozo-demo/node_modules/.pnpm/[email protected]/node_modules/tar/lib/warn-mixin.js:21:40)
│ node-pre-gyp ERR! stack     at Unpack.warn (/tmp/cozo-demo/node_modules/.pnpm/[email protected]/node_modules/tar/lib/unpack.js:229:18)
│ node-pre-gyp ERR! stack     at Unpack.<anonymous> (/tmp/cozo-demo/node_modules/.pnpm/[email protected]/node_modules/tar/lib/parse.js:83:14)
│ node-pre-gyp ERR! stack     at Unpack.emit (events.js:412:35)
│ node-pre-gyp ERR! stack     at Unpack.[emit] (/tmp/cozo-demo/node_modules/.pnpm/[email protected]/node_modules/tar/lib/parse.js:303:12)
│ node-pre-gyp ERR! stack     at Unpack.[maybeEnd] (/tmp/cozo-demo/node_modules/.pnpm/[email protected]/node_modules/tar/lib/parse.js:426:17)
│ node-pre-gyp ERR! stack     at Unpack.[consumeChunk] (/tmp/cozo-demo/node_modules/.pnpm/[email protected]/node_modules/tar/lib/parse.js:458:21)
│ node-pre-gyp ERR! stack     at Unzip.<anonymous> (/tmp/cozo-demo/node_modules/.pnpm/[email protected]/node_modules/tar/lib/parse.js:372:29)
│ node-pre-gyp ERR! stack     at Unzip.emit (events.js:412:35)
│ node-pre-gyp ERR! stack     at Unzip.[emitEnd2] (/tmp/cozo-demo/node_modules/.pnpm/[email protected]/node_modules/minipass/index.js:524:23)
│ node-pre-gyp ERR! System Linux 5.15.0-53-generic
│ node-pre-gyp ERR! command "/usr/local/bin/node" "/tmp/cozo-demo/node_modules/.pnpm/@[email protected]/node_modules/@mapbox/node-pre-gyp/bin/node-pre-gyp" "install"
│ node-pre-gyp ERR! cwd /tmp/cozo-demo/node_modules/.pnpm/[email protected]/node_modules/cozo-node
│ node-pre-gyp ERR! node -v v14.21.2
│ node-pre-gyp ERR! node-pre-gyp -v v1.0.10
│ node-pre-gyp ERR! not ok 
│ TAR_BAD_ARCHIVE: Unrecognized archive format

REPL's `%set` command should parse values using CozoScript syntax

I was trying to set some parameters in the REPL, but I could not understand why I was getting an error:

=> %set flag true

 × expected value at line 1 column 1

After reading how the REPL parses parameters, I noticed that it just uses serde_json::from_str to parse a DataValue. So the way to do what I wanted is this:

=> %set flag {"Bool": true}
=> %params
{
  "flag": {
    "Bool": true
  }
}

This does not look right. I assume the intention here was to parse the value the way it is usually done in a standard script.

Better explain Validity

The docs are generally great, but I find the discussion of validity confusing. The crucial part says:

All rows with identical key parts except the last validity part form the history for that key, interpreted in the following way: the fact represented by a row is valid if its flag is true, and the range of its validity is from its timestamp (inclusive) up until the timestamp of the next row under the same key (excluding the last validity part, and here time is interpreted to flow forward). A row with a false assertive flag does nothing other than making the previous fact invalid.

I am confused about what the Boolean does.

"the range of its validity is from its timestamp (inclusive) up until the timestamp of the next row" says the timestamp part behaves in the obvious way: a row is considered true between its timestamp and the timestamp of the next row with the same key, sorted by timestamp. Fine.

But:

" A row with a false assertive flag does nothing other than making the previous fact invalid" this sounds like the regular timestamp behaviour. When I insert a row with the same key but a later timestamp, the previous row could be called invalid as it's false after the new timestamp.

There is further explanation, but I don't understand it at all:

It is possible for two rows to have identical non-validity key parts and identical timestamps, but differ in their assertive flags. In this case when queried against the exact timestamp, the row is valid, as if the row with the false flag does not exist. The use case for this behaviour is to assert a fact only until a future time when that fact is sure to remain valid. When that time comes, a new fact can be asserted, and if the old fact remains valid there is no need to :rm the previous retraction.
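
One way to read the two quoted paragraphs is as a lookup rule: among the rows for a key with a timestamp at or before the query time, take the latest one, preferring a true flag when timestamps tie, and treat the key as having a valid fact only if that row's flag is true. The sketch below encodes that reading. The `Row` type, the integer timestamps, and the `mood` values are illustrative assumptions, and this is not Cozo's implementation.

```rust
/// One row in the history of a single key (hypothetical shape).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Row {
    ts: i64,        // validity timestamp
    asserted: bool, // the flag: true asserts a fact, false retracts
    mood: &'static str,
}

/// Returns the row that is valid at time `t`, if any, under the reading above.
fn valid_at(history: &[Row], t: i64) -> Option<&Row> {
    history
        .iter()
        .filter(|r| r.ts <= t)
        // Latest timestamp wins; on a tie, true sorts above false.
        .max_by_key(|r| (r.ts, r.asserted))
        // A retraction as the winning row means nothing is valid.
        .filter(|r| r.asserted)
}

fn main() {
    let history = [
        Row { ts: 10, asserted: true, mood: "happy" },
        Row { ts: 20, asserted: false, mood: "" }, // retraction
        Row { ts: 30, asserted: true, mood: "calm" },
        Row { ts: 30, asserted: false, mood: "" }, // same timestamp, false flag
    ];

    for t in [5, 15, 25, 30] {
        println!("t = {:>2}: {:?}", t, valid_at(&history, t).map(|r| r.mood));
    }
}
```

Under this reading, a later row with a true flag replaces the fact with a new one, whereas a row with a false flag leaves the key with no valid fact from that timestamp on. That is the difference between the flag and simply inserting a newer row.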

HNSW cannot index more than one vector per relation

Repro steps are based on the v0.6 release notes, with the dimension reduced to 1 for demo purposes:

:create product {
    id 
    => 
    name, 
    description, 
    price, 
    name_vec: <F32; 1>, 
    description_vec: <F32; 1>
}
::hnsw create product:semantic{
    fields: [name_vec, description_vec], 
    dim: 1, 
    ef: 16, 
    m: 32,
}
?[id, name, description, price, name_vec, description_vec] <- [[1, "name", "description", 100, [1], [1]]]

:put product {id => name, description, price, name_vec, description_vec}

Results

  × when executing against relation 'product'
  ╰─▶ Cannot find tuple [1]

I tried removing the description_vec and it works as expected.

:create product {
    id 
    => 
    name, 
    description, 
    price, 
    name_vec: <F32; 1>,
}
::hnsw create product:semantic{
    fields: [name_vec], 
    dim: 1, 
    ef: 16, 
    m: 32,
}
?[id, name, description, price, name_vec] <- [[1, "name", "description", 100, [1]]]

:put product {id => name, description, price, name_vec}

Performance issue (or infinite loop)

I've tried a variation of this example from the tutorial:

shortest[b, min(dist)] := *route{fr: 'LHR', to: b, dist} 
                          # Start with the airport 'LHR', retrieve a direct route from 'LHR' to b

shortest[b, min(dist)] := shortest[c, d1], # Start with an existing shortest route from 'LHR' to c
                          *route{fr: c, to: b, dist: d2},  # Retrieve a direct route from c to b
                          dist = d1 + d2 # Add the distances

?[dist] := shortest['YPO', dist] # Extract the answer for 'YPO'. 
                                 # We chose it since it is the hardest airport to get to from 'LHR'.

Changing it in this way:

shortest[a, b, min(dist)] := *route{fr: a, to: b, dist} 
shortest[a, b, min(dist)] := shortest[a, c, d1],
                          *route{fr: c, to: b, dist: d2},
                          dist = d1 + d2

?[dist] := shortest['LHR', 'YPO', dist]

Although it should be an equivalent query, I don't get any result in a reasonable time in https://cozodb.github.io/wasm-demo/

I don't think it is expected, am I missing something?
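
One possible explanation, offered as an assumption rather than a confirmed diagnosis: the first rule of the modified query no longer pins the source airport, so a bottom-up evaluation may derive shortest distances for every starting airport before the final filter on 'LHR' applies. The sketch below models that on a tiny hypothetical route table (the airports other than LHR and YPO, and all distances, are made up) and counts the tuples each variant derives. Whether Cozo's optimiser can push the 'LHR' binding into an aggregated recursive rule is not stated in this issue.

```rust
use std::collections::HashMap;

type Edge = (&'static str, &'static str, u32);

/// Bottom-up fixpoint of `shortest[a, b, min(dist)]`.
/// `only_from = Some(s)` models the tutorial version (source pinned to `s`);
/// `None` models the modified version (any source).
fn shortest(routes: &[Edge], only_from: Option<&str>) -> HashMap<(String, String), u32> {
    let mut best: HashMap<(String, String), u32> = HashMap::new();
    // Base rule: direct routes.
    for &(fr, to, d) in routes {
        if only_from.map_or(true, |s| s == fr) {
            let e = best.entry((fr.to_string(), to.to_string())).or_insert(d);
            if d < *e {
                *e = d;
            }
        }
    }
    // Recursive rule: extend a known shortest[a, c] by a route c -> b.
    loop {
        let mut changed = false;
        let snapshot: Vec<((String, String), u32)> =
            best.iter().map(|(k, v)| (k.clone(), *v)).collect();
        for ((a, c), d1) in snapshot {
            for &(fr, to, d2) in routes {
                if fr != c {
                    continue;
                }
                let cand = d1 + d2;
                let key = (a.clone(), to.to_string());
                let improves = best.get(&key).map_or(true, |&old| cand < old);
                if improves {
                    best.insert(key, cand);
                    changed = true;
                }
            }
        }
        if !changed {
            return best;
        }
    }
}

fn main() {
    // Hypothetical data, not the tutorial's route table.
    let routes: [Edge; 5] = [
        ("LHR", "CDG", 3),
        ("CDG", "YPO", 10),
        ("LHR", "YPO", 20),
        ("JFK", "LHR", 50),
        ("CDG", "JFK", 58),
    ];
    let pinned = shortest(&routes, Some("LHR"));
    let all_pairs = shortest(&routes, None);
    let key = ("LHR".to_string(), "YPO".to_string());
    println!("pinned:    {} tuples, LHR->YPO = {}", pinned.len(), pinned[&key]);
    println!("all pairs: {} tuples, LHR->YPO = {}", all_pairs.len(), all_pairs[&key]);
}
```

Both variants agree on the answer for LHR to YPO, but the unpinned one carries a tuple for every reachable pair of airports, which grows much faster on a real route table.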

Conda packages?

Hi there, brilliant project, thanks a lot! Are there any plans to build and provide conda packages for cozo? I have never packaged a Python package that wraps Rust code, so no idea if it's easily doable and I can just do it myself, or whether it requires an involved build setup...

Use SQLite WAL mode

The Cozo documentation for the SQLite engine says:

SQLite… is effectively single-threaded when write concurrency is involved

There is an option in SQLite that greatly relaxes this. It switches SQLite to use a Write-Ahead Log, in which readers and a writer no longer block each other, so multiple separate processes can open the same SQLite database and keep reading while another process writes (there is still only one writer at a time).

https://www.sqlite.org/wal.html

Suggestions for a more elaborate Hello World example

I suggest adding a slightly more advanced hello world example, so people quickly understand how to register facts and how to query them with a non-trivial query.

I don't know (yet) how to represent this in cozo, but in Prolog, I often do something like this (might make some syntax mistake, but I hope you get the idea):

So, saving this into facts.pl

parent(joseph, jakob).
parent(jakob, isaac).
parent(isaac, abraham).

grandparent(Grandchild, Grandparent) :-
    parent(Grandchild, Middleperson),
    parent(Middleperson, Grandparent).

... and then running this in a swipl shell:

?- grandparent(jakob, Who).
Who = abraham.

?- grandparent(joseph, Who).
Who = isaac.

?- 

To me, this shows:

  1. How to assert facts that are relations
  2. How to define queries based on relation facts
  3. How to query this query with different inputs

Do you think something similar would make sense as a hello world example for cozo?

How to remove an HNSW index?

I can't find anywhere in the documentation how to delete an HNSW index. I'm trying to delete one because Cozo blocks removing a table that has an index attached. ::remove doesn't work, and ::hnsw remove doesn't work either. I'm unsure how to approach this.

Make WASM persistable

Docs say WASM version only runs in memory.

Given that SQLite is available for WASM, even in the browser, it would be useful to make the wasm version persistent.

Most languages would have a good shot at running the WASM version immediately.

hnsw create index ops requires trailing comma

Without a trailing comma the op fails to parse; with a trailing comma it parses (the original screenshots are not preserved).

I assume the intention was to support an optional trailing comma, as shown throughout the documentation and as other system ops already do; for example, creating a relation works with or without one.

I looked into these lines in the pest file but can't find any problem.

index_create_hnsw = {"create" ~ compound_ident ~ ":" ~ ident ~ "{" ~ (index_opt_field ~ ",")* ~ index_opt_field? ~ "}"}
index_opt_field = {ident ~ ":" ~ expr}

Support host language-defined relations

I would like to create an app that shows the user's file system, contacts, calendar etc. all in one Datalog-queryable interface. I'm going to start as soon as I work out how to call Cozo from Dart (there is a very nice binding generator for Rust, but I'm not enough of a Rust dev to get it to work yet; if anyone reading this can make this work, I'll sell the absolute crap out of Cozo in that community).

Ideally, I could write code that would present those external data sources as relations in Cozo.

Even if the API needed to be complex to fully support inference, folks would do it.

Usage of functions is not clear

The list functions are documented, but it is not clear from the manual or the tutorial how to apply them.

For example, with the list function... other than calling list(1, 2, 3) in the right-hand side of a rule, how would it be used?

It doesn't seem to work as an aggregator.
