d-r-q / qbit
qbit is a Kotlin Multiplatform embeddable decentralized DBMS with an object-relational information model
Actual behavior: EAVs are serialized to the trx log via custom serialization.
The quality of this serialization is unclear, and supporting nested map attributes is harder with custom serialization.
Consider switching to kotlinx.serialization/CBOR.
Comparison matrix:
| | custom | kotlinx |
|---|---|---|
| Dependencies | None | On stdlib |
| Multiplatform | Yes | Yes |
| Performance | ??? | Probably more performant |
| Memory footprint | ??? | Probably more efficient |
| Support of Maps | Hard | Built-in |
| Requirements for stored data | None | Probably will require some changes in qbit data structures |
kotlinx advantages:
Make Conn.persist, Trx.(persist, commit, rollback) and Db.pull suspend functions, and make Db.query return a Channel
Study Kotlin Flow (https://kotlinlang.org/docs/reference/coroutines/flow.html) as an alternative to Channels in queries
Add a file format version at the beginning of stored files, so that future format changes can be introduced in a backward-compatible way
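A minimal sketch of what such a versioned header could look like. The magic bytes, the single-byte version field and the function names are assumptions made for illustration; they are not qbit's actual file format.

```kotlin
// Hypothetical header layout: 4 magic bytes, then 1 version byte, then the payload.
val MAGIC: ByteArray = "qbit".encodeToByteArray()
val CURRENT_FORMAT_VERSION: Int = 1

// Prepends the header to the serialized payload before it is written to storage.
fun withHeader(payload: ByteArray): ByteArray =
    MAGIC + CURRENT_FORMAT_VERSION.toByte() + payload

// Validates the header and returns (version, payload), so that a reader
// can dispatch to the decoder matching the stored version.
fun readPayload(file: ByteArray): Pair<Int, ByteArray> {
    require(file.size > MAGIC.size) { "File is too short to contain a header" }
    require(file.copyOfRange(0, MAGIC.size).contentEquals(MAGIC)) { "Missing magic bytes" }
    val version = file[MAGIC.size].toInt()
    require(version in 1..CURRENT_FORMAT_VERSION) { "Unsupported format version $version" }
    return version to file.copyOfRange(MAGIC.size + 1, file.size)
}
```

Files written by an older version keep their old version byte, so a newer reader can still pick the right decoder for them.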
Consider splitting the general persist method into several specialized methods.
Advantages of single persist:
Advantages of multiple methods:
Deletions can be implemented without this separation via queryForGids, a query that returns just Gids.
Another idea: return a specialized object from queries that has methods like delete and update(fn: (T))
Partial updates may be implemented via special Patch objects that specify only the values that should be changed.
Actually, both APIs can exist simultaneously.
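The Patch idea could look roughly like the sketch below. `User`, `Patch` and `applyPatch` are hypothetical names used only for illustration; qbit's real entity model and API may differ.

```kotlin
// Hypothetical entity type, for illustration only.
data class User(val gid: Long, val name: String, val email: String?)

// A patch names only the attributes to change; attributes absent from the map stay untouched.
// A key mapped to null means "set this attribute to null".
data class Patch(val gid: Long, val changes: Map<String, Any?>)

fun applyPatch(user: User, patch: Patch): User {
    require(user.gid == patch.gid) { "Patch targets gid ${patch.gid}, not ${user.gid}" }
    return user.copy(
        name = if ("name" in patch.changes) patch.changes["name"] as String else user.name,
        email = if ("email" in patch.changes) patch.changes["email"] as String? else user.email
    )
}
```

Distinguishing "key absent" from "key mapped to null" is what lets a patch clear a nullable attribute without touching the others.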
Actual behavior: qbit supports only the pull API for fetching entities
But a database should provide a way to execute analytical queries, i.e. queries that fetch information of custom shape using joins and aggregate functions.
There are several approaches to consider:
It's necessary to investigate the existing approaches to querying, choose the best one and implement it.
Add the ability to sync (fetch and push) remote storage with local storage.
Implement merging of conflicting transactions.
For the first version, the following conflict resolution strategies are proposed:
Actual behavior: qbit can ensure uniqueness only for the value of a single attribute
But there are situations where uniqueness should be enforced for a set of attributes.
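As an illustration of what a composite uniqueness check means, the sketch below groups entities by the tuple of values of the constrained attributes. Representing entities as plain maps and the function name `findUniquenessViolations` are assumptions for this example, not qbit's API.

```kotlin
// Entities are modelled as attribute-name -> value maps (an assumption for this sketch).
// Returns every group of entities that share the same values for all of `attrs`.
fun findUniquenessViolations(
    entities: List<Map<String, Any?>>,
    attrs: List<String>
): List<List<Map<String, Any?>>> =
    entities
        .groupBy { entity -> attrs.map { entity[it] } } // composite key: list of attribute values
        .values
        .filter { it.size > 1 }
```

For example, a constraint on (first, last) accepts two people who share a first name as long as their last names differ.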
Implement platform for js/browser target
Currently StorageTest is duplicated in all modules due to the Kotlin multiplatform multi-project build configuration.
In particular, there is no (obvious) way to declare a dependency on another project's test sources.
Find a way to get rid of this duplication
Add ability to initialize local storage using remote storage
Actual behavior: qbit keeps the full trx log forever.
The original reason to keep the trx log forever was conflict resolution:
But for some users the size of the trx log may be more important than those features.
And actually qbit can fold the trx log for a subgraph that is already contained in all branches of all instances. In this case the graph structure can still be maintained and a 3-way merge can still be performed.
But in the current implementation there is at least one problem: nodes refer to each other via the node's content hash, so all nodes that are kept in the trx log would have to be rebased onto the new folded root node. This issue can probably be solved by introducing some level of indirection or by abandoning node links via hashes.
Another feature to consider is marking some attributes to be kept forever and adding them to a separate index that keeps all seen entity states.
Actual behavior: when persisting an entity with prop == emptyList(), the attribute for that property isn't actually persisted
This bug may be fixed by #6; another way is to store a special placeholder eav for such cases
See FunTest.Test scalar list clearing, FunTest.Storage of entity with empty list should preserve list
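A rough sketch of the placeholder approach mentioned above. The `Eav` shape, the `EmptyListMarker` object and both functions are hypothetical and only show how an empty list can stay distinguishable from a missing attribute.

```kotlin
// Simplified eav triple, an assumption for this sketch.
data class Eav(val gid: Long, val attr: String, val value: Any)

// Hypothetical placeholder value meaning "this list attribute exists but is empty".
object EmptyListMarker

// An empty list is written as a single placeholder eav instead of no eavs at all.
fun listToEavs(gid: Long, attr: String, values: List<Any>): List<Eav> =
    if (values.isEmpty()) listOf(Eav(gid, attr, EmptyListMarker))
    else values.map { Eav(gid, attr, it) }

// Returns null when the attribute was never stored, and an empty list when only the placeholder is found.
fun eavsToList(eavs: List<Eav>, gid: Long, attr: String): List<Any>? {
    val matching = eavs.filter { it.gid == gid && it.attr == attr }
    if (matching.isEmpty()) return null
    return matching.map { it.value }.filter { it !== EmptyListMarker }
}
```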
It probably makes sense to store huge objects (byte arrays and strings) apart from primary entities. This will allow fetching them truly lazily and/or via a special API/query mechanism
Actual behavior: when storing an entity without attributes, or an instance of a data class with all props == null, the entity isn't actually stored and cannot be fetched later.
I see two ways to fix it:
See FunTest.Peristance of entity with all attr = null should actually persist the entity, FunTest.Test persistence of entity without attributes
Consider adding component attributes. These are attributes of composite type that do not have separate gids, so they are embedded in the containing entity and automatically deleted when the containing entity is deleted
Add a special type of attribute with the following semantics:
To implement this feature, the following questions should be considered:
I guess that most existing DBMSes have a notion of entity collection. qbit actually also has this notion, but it is hidden in hasAttr queries.
Investigate entity collections in other databases and consider making them explicit in qbit
Implement support for attribute (and type) aliases. It will enable backward-compatible renames
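One possible shape for alias resolution, sketched under the assumption that aliases are kept as a simple old-name to current-name table. `AttrAliases` is a hypothetical helper, not an existing qbit class.

```kotlin
// Hypothetical alias table: old attribute name -> name it was renamed to.
class AttrAliases(private val aliases: Map<String, String>) {

    // Follows alias chains (a -> b -> c) to the current name and fails on cycles.
    fun resolve(name: String): String {
        var current = name
        val seen = HashSet<String>()
        while (current in aliases) {
            check(seen.add(current)) { "Alias cycle detected at '$current'" }
            current = aliases.getValue(current)
        }
        return current
    }
}
```

Data written under an old name can then be read through the current name, which is what makes a rename backward compatible.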
Currently Kotlin does not have standard types for dates, but I believe it's very likely that they will appear in the future. On the other hand, implementing these types in common code is pretty difficult, and if these types are published, we would have to support them forever, even after standard types appear. So these attribute types should be removed, and instead qbit should provide platform-specific helper functions to create platform-specific date objects from longs and longs+strings.
Actual behavior: qbit adds all attributes to the ave index.
This solution requires additional storage for all attributes, even those that are blobs and/or do not appear in any query.
There are two options to implement it:
It's unclear what the expected behavior is for queries that use non-indexed attributes.
Implement ability to initialize remote storage with local storage
Implement a fuzzer for qbit. It's a test that generates a random "program" (a set of inserts, updates, deletes and selects), "executes" it and tries to evaluate whether it gets the expected results
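The general technique (model-based random testing) can be sketched as follows. The `KvDb` interface is a stand-in for the system under test and is an assumption of this example; a real fuzzer would drive qbit's own API instead.

```kotlin
import kotlin.random.Random

// Stand-in for the system under test (hypothetical, not qbit's API).
interface KvDb {
    fun put(key: Int, value: Int)
    fun delete(key: Int)
    fun get(key: Int): Int?
}

// Trivial implementation used to exercise the fuzzer itself.
class MapDb : KvDb {
    private val data = HashMap<Int, Int>()
    override fun put(key: Int, value: Int) { data[key] = value }
    override fun delete(key: Int) { data.remove(key) }
    override fun get(key: Int): Int? = data[key]
}

// Runs a random "program" against both the db and an in-memory model.
// Returns null if every read matched the model, or a description of the first mismatch.
fun fuzz(db: KvDb, seed: Int, steps: Int): String? {
    val rnd = Random(seed)
    val model = HashMap<Int, Int>()
    for (step in 0 until steps) {
        val key = rnd.nextInt(8) // small key space, so operations collide often
        when (rnd.nextInt(3)) {
            0 -> { val value = rnd.nextInt(); db.put(key, value); model[key] = value }
            1 -> { db.delete(key); model.remove(key) }
            else -> if (db.get(key) != model[key]) return "seed=$seed step=$step key=$key"
        }
    }
    return null
}
```

Reporting the seed makes a failing run reproducible, because the same seed regenerates the same program.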
Actual behavior: the trx log is stored as List[Eav].
This solution has several disadvantages:
As an alternative, a trx may be stored as List[Entity]. Consider what the disadvantages of this solution are, then choose and implement one.
It's probably worth storing a trx as Map[EntityCollection,List[Entity]] (see #51).
Implement support for CRDT semantics for attributes. It's necessary to consider the options and choose the best way to express that a given attribute should have CRDT semantics
Actual behavior: mutable entities are cached and returned to the client as is. This introduces a subtle bug in querying: queries are executed against the original state of an entity but return only ids, which are then pulled from the db using the cache. So a mutated entity may still be returned by a query that matches its original state, or vice versa, a mutated entity may not be returned by a query that matches its mutated state.
Clone cached entities before returning them to the client
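A minimal sketch of a defensive-copy cache, assuming a hypothetical mutable entity class with a `copy()` method. It only illustrates the proposed fix and does not reflect how qbit's cache is actually structured.

```kotlin
// Hypothetical mutable entity, for illustration only.
class MutableUser(var name: String) {
    fun copy(): MutableUser = MutableUser(name)
}

class EntityCache {
    private val cache = HashMap<Long, MutableUser>()

    // Stores a private copy, so later changes to the caller's object do not leak into the cache.
    fun put(gid: Long, entity: MutableUser) {
        cache[gid] = entity.copy()
    }

    // Hands out a fresh copy, so client-side mutations never change the cached state.
    fun get(gid: Long): MutableUser? = cache[gid]?.copy()
}
```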
Implement the ability to fetch entities that reference a given entity via a given attribute.
To implement this feature, a new vae index should be added that stores only eavs for reference attributes
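The idea of a vae (value, attribute, entity) index can be sketched like this. The `Eav` and `Ref` types and the `VaeIndex` class are simplified assumptions for the example, not qbit's internal structures.

```kotlin
// Hypothetical marker for reference values: a Ref points at another entity's gid.
data class Ref(val gid: Long)

// Simplified eav triple.
data class Eav(val gid: Long, val attr: String, val value: Any)

// Reverse index keyed by (referenced gid, attribute); only reference eavs are indexed.
class VaeIndex(eavs: List<Eav>) {
    private val index: Map<Pair<Long, String>, List<Long>> =
        eavs.filter { it.value is Ref }
            .groupBy({ (it.value as Ref).gid to it.attr }, { it.gid })

    // Gids of the entities that reference `target` via `attr`.
    fun referrers(target: Long, attr: String): List<Long> =
        index[target to attr] ?: emptyList()
}
```

Scalar eavs are skipped entirely, which keeps the extra index proportional to the number of references rather than to the size of the db.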
Actual behavior: the user can pass an arbitrary value for an entity gid
If this value is greater than or equal to the current value of the instance's gid generator, this will lead to collisions.
So qbit should forbid storing entities with gid >= the current value of the instance's gid generator.
Additionally, when #21 is implemented, qbit should check that the type for a given gid doesn't change
See FunTest.Qbit should forbid externally generated gids
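The proposed check could look roughly like the sketch below. `GidGenerator` and `requireKnownGid` are hypothetical names, and the generator is reduced to a simple counter for illustration.

```kotlin
// Simplified gid generator: `current()` is the next gid that will be issued.
class GidGenerator(private var nextGid: Long = 1) {
    fun next(): Long = nextGid++
    fun current(): Long = nextGid
}

// Rejects gids the generator has not issued yet, since a later next() could collide with them.
fun requireKnownGid(gid: Long, gids: GidGenerator) {
    require(gid < gids.current()) {
        "gid $gid was not issued by this instance (next gid is ${gids.current()})"
    }
}
```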
Actual behavior: the user has to manually instantiate storage and pass it to qbit
To decouple the user from the implementation, implement a qbit connection factory function that accepts a string parameter (something like "qbit:fs://tmp/qbit/mydb") and internally creates the corresponding storage instance.
The implementation should also provide a registration endpoint for Storage implementations, so that a user who has implemented a custom Storage can register it.
This feature requires some way to automatically discover service providers, and I'm not sure it can be implemented on the Native and JS platforms.
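A sketch of one way to do this with explicit registration instead of automatic service discovery, which sidesteps the Native/JS concern at the cost of a manual `register` call. The `Storage` interface here is an empty stand-in, and `StorageRegistry`, `FsStorage` and the URL grammar are assumptions for the example.

```kotlin
// Empty stand-in for qbit's Storage interface (assumption for this sketch).
interface Storage

// Hypothetical file-system storage that just remembers its root path.
class FsStorage(val root: String) : Storage

object StorageRegistry {
    private val factories = mutableMapOf<String, (String) -> Storage>()

    // Custom Storage implementations register a factory under their URL scheme.
    fun register(scheme: String, factory: (String) -> Storage) {
        factories[scheme] = factory
    }

    // Parses "qbit:<scheme>://<path>" and delegates to the factory registered for <scheme>.
    fun open(url: String): Storage {
        val match = Regex("^qbit:([a-z0-9]+)://(.*)$").matchEntire(url)
            ?: throw IllegalArgumentException("Not a qbit url: $url")
        val (scheme, path) = match.destructured
        val factory = factories[scheme]
            ?: throw IllegalArgumentException("No storage registered for scheme '$scheme'")
        return factory(path)
    }
}
```

With `StorageRegistry.register("fs") { path -> FsStorage(path) }` in place, `StorageRegistry.open("qbit:fs://tmp/qbit/mydb")` yields an `FsStorage` whose root is `tmp/qbit/mydb`.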
Implement a decorator for Storage that encrypts data on saving and decrypts it on fetching.
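The decorator shape could look like the sketch below. `BlobStorage` is a deliberately simplified, hypothetical interface (qbit's real Storage API may differ), and the cipher is passed in as a pair of functions, because choosing an actual encryption algorithm is outside the scope of this sketch.

```kotlin
// Simplified, hypothetical storage interface.
interface BlobStorage {
    fun put(key: String, value: ByteArray)
    fun get(key: String): ByteArray?
}

// In-memory implementation used as the decorated storage in examples.
class MemBlobStorage : BlobStorage {
    private val data = HashMap<String, ByteArray>()
    override fun put(key: String, value: ByteArray) { data[key] = value.copyOf() }
    override fun get(key: String): ByteArray? = data[key]?.copyOf()
}

// Decorator: encrypts on the way in, decrypts on the way out, delegates everything else.
class EncryptedStorage(
    private val delegate: BlobStorage,
    private val encrypt: (ByteArray) -> ByteArray,
    private val decrypt: (ByteArray) -> ByteArray
) : BlobStorage {
    override fun put(key: String, value: ByteArray) = delegate.put(key, encrypt(value))
    override fun get(key: String): ByteArray? = delegate.get(key)?.let(decrypt)
}
```

Because the decorator implements the same interface, it can wrap any storage backend without the rest of the code noticing.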
Looks like it's a simple task: just add it everywhere List is used and, probably, refactor out some utility functions
Implement platform for native/macos target
Implement the ability to add remote storage to qbit
It looks like information about remotes should be stored outside of the db, because it's unclear how to implement #32 if remotes are stored in the db.
Another problem is IID acquisition. To initialize an instance, qbit has to acquire an iid, but this iid must be unique, so some protocol for iid acquisition should be developed
Yet another problem is how to organize synchronization. The current proposal is as follows:
Implement platform for js/node target
Develop migration API and guidelines
Actual behavior: qbit exposes only a typed API but does not actually know the type of an entity
Because of this, it's possible to store an entity of another type for an existing gid, and qbit cannot prevent it.
To solve this issue, qbit can store type information about the schema generated from classes, add to entities an internal attribute with a ref to the type information, and check it on persistence
Related: #51
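The persistence-time check described in this issue could be sketched as follows. `TypeRegistry` is a hypothetical in-memory stand-in; in qbit the type information would live in the db as an internal attribute rather than in a map.

```kotlin
// Remembers which type was first stored under each gid and rejects later mismatches.
class TypeRegistry {
    private val typeByGid = HashMap<Long, String>()

    fun checkAndRecord(gid: Long, typeName: String) {
        // The first write records the type; subsequent writes must match it.
        val known = typeByGid.getOrPut(gid) { typeName }
        require(known == typeName) {
            "gid $gid already holds a $known, cannot store a $typeName"
        }
    }
}
```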
There was an integration with pitest when qbit was a JVM-only project, but it was lost during the migration to multiplatform.
Implement Storage using browser's local storage
Depends on #41
Implement Storage using Google Drive
Implement platform for native/linux target:
Actual behavior: qbit builds an in-memory index for the full trx log every time a connection is opened
This approach works for small dbs, but obviously it wouldn't work for large dbs.
To mitigate this issue, qbit may build and store a persistent index from time to time, and then, when opening a connection, build an in-memory index only for the tail of the trx log.
Then, while querying, qbit will merge data from the in-memory index with data from the persisted index.
Tombstones should be kept only in the in-memory index and should be filtered out when persisting the index.
There are two options for index organization to consider:
There are other structures (at least hashes) for persistent index storage
Also, it's currently unclear which behavior is expected when a persistent index is built for one branch of the trx log with concurrent modifications.
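The lookup and persisting rules described in this issue (tail index first, tombstones hide persisted values, tombstones dropped on persist) can be illustrated with the sketch below. Both indexes are reduced to plain maps from gid to a single string value, which is an assumption made to keep the example short.

```kotlin
// An entry in the in-memory (tail) index: either a live value or a tombstone.
sealed class Entry
data class Value(val value: String) : Entry()
object Tombstone : Entry()

class MergedIndex(
    private val persisted: Map<Long, String>, // index built up to some point of the trx log
    private val tail: Map<Long, Entry>        // index built for the tail of the trx log
) {
    // The tail wins over the persisted index; a tombstone hides a persisted value.
    fun get(gid: Long): String? = when (val entry = tail[gid]) {
        is Value -> entry.value
        is Tombstone -> null
        null -> persisted[gid]
    }

    // Produces the next persisted index: tail values applied, tombstones filtered out.
    fun persist(): Map<Long, String> {
        val merged = persisted.toMutableMap()
        for ((gid, entry) in tail) {
            when (entry) {
                is Value -> merged[gid] = entry.value
                is Tombstone -> merged.remove(gid)
            }
        }
        return merged
    }
}
```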
Actual behavior: when committing a trx, qbit first writes the trx to storage and then tries to update the head. In the current implementation, if the head has been updated concurrently, qbit rolls back the transaction but does not remove the trx created in the first step.
Desired behavior: qbit removes the rolled-back trx from storage.
See TrxTest.In case of transaction commit abort, transaction should clean up written data
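The desired behavior can be sketched as a write, compare-and-set, clean-up sequence. `TrxStore` and `commit` are hypothetical and single-threaded here; they only show where the clean-up step belongs.

```kotlin
// Simplified store: trx nodes by id plus a head pointer (an assumption for this sketch).
class TrxStore {
    val nodes = HashMap<String, String>()
    var head: String? = null
        private set

    // Moves the head only if nobody else moved it since `expected` was read.
    fun compareAndSetHead(expected: String?, newHead: String): Boolean {
        if (head != expected) return false
        head = newHead
        return true
    }
}

// Returns true on success; on a head conflict the already written node is removed again.
fun commit(store: TrxStore, expectedHead: String?, nodeId: String, data: String): Boolean {
    store.nodes[nodeId] = data                                   // step 1: write the trx
    val updated = store.compareAndSetHead(expectedHead, nodeId)  // step 2: try to move the head
    if (!updated) store.nodes.remove(nodeId)                     // rollback: clean up step 1
    return updated
}
```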
Implement platform for native/win target
Implement Storage using Dropbox
Consider adding the ability for qbit to perform syncs automatically. Possible events to sync on:
Develop schema design guidelines
Actual behavior: an entity is stored in full in the trx log.
This solution has at least two disadvantages:
But it also has at least one advantage:
Investigate how and why other storage systems (including git) store their transaction logs, and consider switching to storing only patches
Actual behavior: currently qbit stores trx data in memory until commit
This is not appropriate for bulk inserts, since they may not fit into memory. Consider adding an API for flushing a transaction to storage.
Implement Storage using WebDAV