d-r-q / qbit Goto Github PK

qbit is a kotlin-multiplatform embeddable decentralized DBMS with object-relational information model

Kotlin 100.00%

database p2p embeddable crdts kotlin kotlin-multiplatform

qbit's Issues

Storage format: custom vs kotlinx serialization

Actual behavior: Eavs are serialized to the trx log via custom serialization.

The quality of this serialization is unclear, and nested map attributes are harder to support with custom serialization.

Consider switching to kotlinx.serialization/CBOR.

Restore serialization test

Comparison matrix:

	custom	kotlinx
Dependencies	None	On stdlib
Multiplatform	Yes	Yes
Performance	???	Probably more performant
Memory footprint	???	Probably more efficient
Support of Maps	Hard	Built-in
Requirements for stored data	None	Probably will require some changes in qbit data structures

kotlinx advantages:

Probably will resolve #114/#132

Make api suspend

Make Conn.persist, Trx.(persist, commit, rollback) and Db.pull suspend functions, and make Db.query return a Channel.

Study Kotlin Flow (https://kotlinlang.org/docs/reference/coroutines/flow.html) as an alternative to Channels in queries.

Storage format: add format version

Add a file format version at the beginning of stored files, so that future format changes can be introduced in a backward compatible way.
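A minimal sketch of such a versioned header, assuming a hypothetical layout of four magic bytes followed by one version byte. The names `MAGIC`, `CURRENT_FORMAT_VERSION`, `writeWithVersion` and `readWithVersion` are made up for illustration and are not part of qbit.

```kotlin
// Hypothetical layout: 4 magic bytes, 1 version byte, then the payload.
val MAGIC: ByteArray = byteArrayOf(0x71, 0x62, 0x69, 0x74) // "qbit" in ASCII
val CURRENT_FORMAT_VERSION: Byte = 1

class UnsupportedFormatException(message: String) : Exception(message)

/** Prepends the magic bytes and the format version to the payload. */
fun writeWithVersion(payload: ByteArray, version: Byte = CURRENT_FORMAT_VERSION): ByteArray =
    MAGIC + byteArrayOf(version) + payload

/** Validates the header and returns the version together with the payload. */
fun readWithVersion(stored: ByteArray): Pair<Byte, ByteArray> {
    val headerSize = MAGIC.size + 1
    if (stored.size < headerSize || !stored.copyOfRange(0, MAGIC.size).contentEquals(MAGIC)) {
        throw UnsupportedFormatException("Missing or corrupted format header")
    }
    val version = stored[MAGIC.size]
    // A reader only understands versions up to the one it was built for.
    if (version < 1 || version > CURRENT_FORMAT_VERSION) {
        throw UnsupportedFormatException("Unsupported format version: $version")
    }
    return version to stored.copyOfRange(headerSize, stored.size)
}
```

With the version available before the payload is decoded, a later reader could dispatch to the decoder that matches the stored version.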

Split persist to persist, update and delete

Consider splitting the general persist method into several specialized methods.

Advantages of a single persist:

Smaller API size
Ability to persist inserts and deletions in a single trx without explicitly creating a trx

Advantages of multiple methods:

Better discoverability
Hides the tombstone concept from the user
Ability to perform partial updates and deletions by query

Deletions can be implemented without the split via queryForGids - a query that returns just Gids.
Another idea is to return a specialized object from queries that has methods like delete and update(fn: (T))

Partial updates may be implemented via special Patch objects that specify only the values that should be changed.

Both APIs can actually exist simultaneously.

Analytic queries

Actual behavior: qbit supports only the pull API for fetching entities.

But a database should provide a way to execute analytical queries, i.e. queries that fetch information of a custom shape using joins and aggregate functions.

There are several approaches to consider:

Something SQL-like - sql, jpql, cql (cassandra), gql (neo4j) etc
GraphQL
Monad comprehensions
Datalog
Something else?

It's necessary to investigate the existing approaches to querying, choose the best one and implement it.

Remotes: sync

Add ability to sync (fetch and push) remote storage with local storage.

Merges

Implement merges of conflicting transactions.

For the first version, the following conflict resolution strategies are proposed:

Deletions take precedence over updates
Otherwise, last writer wins
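The two rules above can be sketched as a small resolution function. The `Write` class and its fields are hypothetical and only model one side of a conflict on a single attribute. Comparing timestamps is one possible reading of "last writer"; qbit may order writes differently.

```kotlin
// Hypothetical model of one side of a conflict: the last write an instance made to an attribute.
data class Write(val value: String?, val timestamp: Long, val deleted: Boolean = false)

/** Resolves two concurrent writes to the same attribute of the same entity. */
fun resolve(a: Write, b: Write): Write = when {
    // Rule 1: deletions take precedence over updates, regardless of timestamps.
    a.deleted && !b.deleted -> a
    b.deleted && !a.deleted -> b
    // Rule 2: otherwise the last writer wins. Ties go to `a`; a real implementation
    // would need a stable tie-breaker so that all instances pick the same winner.
    else -> if (b.timestamp > a.timestamp) b else a
}
```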

Composite unique constraints

Actual behavior: qbit can ensure uniqueness only for the value of a single attribute.

But there are situations when uniqueness should be enforced on a set of attributes.

MPP: js/browser

Implement platform for js/browser target

Remove duplication of StorageTest

Currently StorageTest is duplicated in all modules due to the Kotlin multiplatform multiproject build configuration.

In particular, there is no (obvious) way to declare a dependency on a project's test sources.

Find a way to get rid of this duplication.

Remotes: clone

Add ability to initialize local storage using remote storage

Trx log folding

Actual behavior: qbit keeps the full trx log forever.

The original reason to keep the trx log forever was conflict resolution:

I believe that a 3-way merge, enabled by a trx log in the form of a DAG, may produce better conflict resolutions
In case of an invalid conflict resolution, an entity may be reverted to any of its previous states
It probably enables the implementation of some sort of rerere for qbit.

But for some users the size of the trx log may be more important than those features.

And qbit can actually fold the trx log for a subgraph that is already contained in all branches of all instances. In this case the graph structure may still be maintained and a 3-way merge may still be performed.

But in the current implementation there is at least one problem: nodes refer to each other via the node's content hash, so all nodes that are kept in the trx log have to be rebased onto the new folded root node. This issue may probably be solved by introducing some level of indirection or by giving up node links via hashes.

Another feature to consider is marking some attributes to be kept forever and adding them to a separate index that keeps all seen entity states.

Fix persistence of empty list

Actual behavior: when persisting an entity with prop == emptyList(), the attribute for that property isn't actually persisted.

This bug may be fixed by #6; another way is to store a special placeholder eav for such cases.

See FunTest.Test scalar list clearing, FunTest.Storage of entity with empty list should preserve list

Blobs storage

It probably makes sense to store huge objects (byte arrays and strings) apart from primary entities. This will allow fetching them truly lazily and/or via a special api/query mechanism.

Fix persistence of entities without attributes

Actual behavior: when storing an entity without attributes, or an instance of a data class with all props == null, the entity isn't actually stored and cannot be fetched later.

I see two ways to fix it:

Add a special internal placeholder attribute that is automatically added to entities without other attributes
Add some universally useful attribute, such as creation date, to all entities. But with the current storage implementation this attribute is redundant (it may be retrieved from the first transaction containing the given entity) and not optimal for persisting data classes - the attribute has to be fetched additionally and added to the attribute list of the data class instance being persisted

See FunTest.Peristance of entity with all attr = null should actually persist the entity, FunTest.Test persistence of entity without attributes

Component attributes

Consider adding component attributes. These are attributes of a composite type that do not have separate gids, so they are embedded in the containing entity and automatically deleted when the containing entity is deleted.

Identifier attributes

Add a special type of attribute with the following semantics:

They are unique
They enable upsert semantics - when persisting an entity without a gid, but with an existing value for the iden attribute, qbit should update the existing entity instead of throwing a uniqueness constraint violation.

To implement this feature, the following questions should be considered:

Should an entity be allowed to have several id attrs? Probably only a single id attribute per entity should be allowed in the first version
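The upsert semantics described above can be illustrated with a toy in-memory store. Everything here is hypothetical: `User`, `UserStore` and the choice of `email` as the single identifier attribute are illustration only, not qbit's API.

```kotlin
// Hypothetical entity; `email` plays the role of the single identifier attribute.
data class User(val gid: Long?, val email: String, val name: String)

class UserStore {
    private val byGid = mutableMapOf<Long, User>()
    private val gidByEmail = mutableMapOf<String, Long>()
    private var nextGid = 1L

    /** Persists the user; without a gid, an existing identifier value triggers an update. */
    fun persist(user: User): User {
        val existingGid = gidByEmail[user.email]
        val requestedGid = user.gid
        val gid = if (requestedGid == null) {
            // Upsert: reuse the gid of the entity that already has this identifier value.
            existingGid ?: nextGid++
        } else {
            // An explicit gid must not steal an identifier value owned by another entity.
            check(existingGid == null || existingGid == requestedGid) {
                "Uniqueness violation for email=${user.email}"
            }
            requestedGid
        }
        val stored = user.copy(gid = gid)
        byGid[gid]?.let { gidByEmail.remove(it.email) } // drop the stale identifier mapping
        byGid[gid] = stored
        gidByEmail[stored.email] = gid
        return stored
    }

    fun size(): Int = byGid.size
}
```

Persisting two `User(null, "a@example.com", ...)` values in a row leaves a single entity in the store, with the second call overwriting the first.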

Entity collections

I guess that most existing DBMSes have a notion of entity collection. qbit actually has this notion too, but it is hidden - hasAttr queries.

Investigate entity collections in other databases and consider making them explicit in qbit.

Aliases

Implement support for attribute (and type) aliases. It will enable backward compatible renames.

Remove Instant and ZonedDateTime types

Currently Kotlin does not have standard types for dates, but I believe it's very likely that they will appear in the future. On the other hand, implementing these types in common code is pretty difficult, and if these types are published, we will have to support them forever, even after standard types appear. So these attribute types should be removed, and instead qbit should provide platform specific helper functions to create platform specific date objects from longs and longs+strings.

Do not index all attributes

Actual behavior: qbit adds all attributes to the ave index.

This solution requires additional storage for all attributes, even those that are blobs and/or do not appear in any query.

There are two options to implement it:

Index all attributes by default and add the ability to disable indexing for specified attributes
Do not index anything by default and add the ability to enable indexing for specified attributes

It's unclear what the expected behavior is for queries that use non-indexed attributes.

Remotes: init

Implement ability to initialize remote storage with local storage

Fuzzer

Implement a fuzzer for qbit. It's a test that generates a random "program" (a set of inserts, updates, deletes and selects), "executes" it and checks whether it gets the expected results.
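One common way to decide what the "expected results" are is to run the same random program against a trivially correct model and compare. The sketch below assumes that approach. `ToyDb` is a hypothetical stand-in for qbit, and the operation set is reduced to a key-value shape for brevity.

```kotlin
// Hypothetical system under test: a toy key-value store standing in for qbit.
class ToyDb {
    private val data = mutableMapOf<Int, Int>()
    fun insertOrUpdate(key: Int, value: Int) { data[key] = value }
    fun delete(key: Int) { data.remove(key) }
    fun select(key: Int): Int? = data[key]
}

/**
 * Runs a random "program" against the db and against a plain map used as the model.
 * Returns the index of the first step where a select disagrees with the model, or -1.
 */
fun fuzz(seed: Long, steps: Int): Int {
    val random = kotlin.random.Random(seed) // seeded, so failures can be replayed
    val db = ToyDb()
    val model = mutableMapOf<Int, Int>()
    for (step in 0 until steps) {
        val key = random.nextInt(8) // small key space, so operations collide often
        when (random.nextInt(3)) {
            0 -> {
                val value = random.nextInt()
                db.insertOrUpdate(key, value)
                model[key] = value
            }
            1 -> {
                db.delete(key)
                model.remove(key)
            }
            else -> if (db.select(key) != model[key]) return step
        }
    }
    return -1
}
```

Returning the failing step together with the seed makes a failure reproducible, which is usually the first thing needed when a fuzzer finds a bug.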

Storage format: Entity vs List<Eav>

Actual behavior: trx log is stored as List[Eav].

This solution has several disadvantages:

Redundant storage of Gids and attribute names
~~Unclear how to implement (nested) Map attributes~~
Unclear how to support empty lists

As an alternative, a trx may be stored as List[Entity]. Consider what the disadvantages of this solution are, then choose and implement one.

It's probably worth storing a trx as Map[EntityCollection,List[Entity]] (see #51).

CRDTs

Implement support for crdt semantics for attributes. It's necessary to consider the options and choose the best way to express the fact that a given attribute should have crdt semantics.

Implement support of Map attributes

Implementation of this feature depends on #6 and #8

Implement support of Map attributes. It's better to support nested maps, if possible

Investigate caching of mutable entities

Actual behavior: mutable entities are cached and returned to the client as is. This introduces a subtle bug in querying - queries are executed against the original state of an entity, but return only ids, which are then pulled from the db using the cache. So a mutated entity may still be returned by a query that matches its original state, or vice versa, a mutated entity may not be returned by a query that matches its mutated state.

Clone cached entities before returning them to the client.
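A minimal sketch of the cloning idea, assuming a hypothetical mutable `Account` entity and a hypothetical `EntityCache`. qbit's real entity classes and cache may look different.

```kotlin
// Hypothetical mutable entity; data class `copy()` serves as the clone operation.
data class Account(val gid: Long, var balance: Int)

class EntityCache {
    private val cache = mutableMapOf<Long, Account>()

    fun put(entity: Account) {
        cache[entity.gid] = entity.copy() // keep a private copy of the stored state
    }

    /** Returns a copy, so client-side mutations never reach the cached state. */
    fun get(gid: Long): Account? = cache[gid]?.copy()
}
```

Note that `copy()` is shallow: an entity holding mutable collections or references to other mutable entities would need a deeper clone.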

Reverse fetch

Implement the ability to fetch entities that reference a given entity via a given attribute.

To implement this feature, a new vae index should be added that stores only eavs for reference attributes.
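As a rough illustration, a vae index can be thought of as a lookup from (referenced gid, attribute) to the gids of the referencing entities. The `RefEav` and `VaeIndex` names are hypothetical, and a hash map is used here purely for brevity instead of whatever ordered structure qbit's other indexes use.

```kotlin
// Hypothetical shape: a reference eav points from entity `gid` to entity `target` via `attr`.
data class RefEav(val gid: Long, val attr: String, val target: Long)

class VaeIndex(eavs: List<RefEav>) {
    // (referenced gid, attribute name) -> gids of the referencing entities
    private val index: Map<Pair<Long, String>, List<Long>> =
        eavs.groupBy({ it.target to it.attr }, { it.gid })

    /** Returns gids of entities that reference `target` via `attr`. */
    fun referencing(target: Long, attr: String): List<Long> =
        index[target to attr].orEmpty()
}
```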

Forbid manually created gids

Actual behavior: the user can pass an arbitrary value for an entity gid.

If this value is greater than the current value of the instance gid generator, this will lead to collisions.

So qbit should forbid storing entities with gid >= current value of the instance gid generator.

Additionally, when #21 is implemented, qbit should check that the type for a given gid doesn't change.

See FunTest.Qbit should forbid externally generated gids

Related: #60 #27

Connection urls

Actual behavior: the user has to manually instantiate a storage and pass it to qbit.

To decouple the user from the implementation, implement a qbit connection factory function that accepts a string parameter (something like "qbit:fs://tmp/qbit/mydb") and internally creates the corresponding storage instance.

The implementation should also provide a registration end point for Storage implementations, so that a user who has implemented a custom Storage can register it.

This feature requires some way to automatically discover service providers, and I'm not sure it can be implemented on Native and JS platforms.
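A sketch of the factory with explicit registration, which sidesteps automatic service discovery at the cost of one extra call by the user. `Storage` below is a hypothetical stand-in for qbit's Storage, `StorageRegistry` is a made-up name, and the url grammar "qbit:<scheme>://<location>" is an assumption derived from the example url.

```kotlin
// Hypothetical stand-in for qbit's Storage; only the location is modelled here.
open class Storage(val location: String)

class StorageRegistry {
    private val factories = mutableMapOf<String, (String) -> Storage>()

    /** Registration end point for custom Storage implementations. */
    fun register(scheme: String, factory: (String) -> Storage) {
        factories[scheme] = factory
    }

    /** Parses urls of the assumed form "qbit:<scheme>://<location>" and creates the storage. */
    fun open(url: String): Storage {
        require(url.startsWith("qbit:")) { "Not a qbit url: $url" }
        val rest = url.removePrefix("qbit:")
        val separator = rest.indexOf("://")
        require(separator > 0) { "Missing storage scheme in: $url" }
        val scheme = rest.substring(0, separator)
        val location = rest.substring(separator + 3)
        val factory = factories[scheme]
            ?: throw IllegalArgumentException("No storage registered for scheme '$scheme'")
        return factory(location)
    }
}
```

For example, after `registry.register("fs") { path -> Storage("/" + path) }`, opening "qbit:fs://tmp/qbit/mydb" yields a storage located at "/tmp/qbit/mydb".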

CryptoStorage

Implement decorator for Storage, that will encrypt/decrypt data on saving and fetching.

Implement support of Set attributes

Looks like a simple task - just add it everywhere List is used and, probably, factor out some utility functions.

MPP: native/macos

Implement platform for native/macos target

Remotes: add remote

Implement the ability to add a remote storage to qbit.

It looks like information about remotes should be stored outside of the db, because it's unclear how to implement #32 if remotes are stored in the db.

Another problem is IID acquisition. To initialize an instance, qbit has to acquire an iid, and this iid has to be unique, so some protocol for acquiring iids should be developed.

Yet another problem is how to organize synchronization. The current proposal is as follows:

Each instance has a branch and only that instance can update this branch
During sync an instance fetches all data from remotes and all branches. If the instance detects a conflict, it merges it, sets its branch to the merge node and pushes it to the remote
When another instance syncs, it will fetch the new value of the first instance's branch and be able to fast forward its own branch to the merge

MPP: js/node

Implement platform for js/node target

Migration

Develop migration API and guidelines

Store entity types

Actual behavior: qbit exposes only a typed API but does not actually know the type of an entity.

Because of this, it's possible to store an entity of another type under an existing gid, and qbit cannot prevent it.

To solve this issue, qbit can store type information about the schema generated from classes, add an internal attribute with a ref to the type information to entities, and check it on persistence.

Related: #51

Recover pitests

There was an integration with pitest when qbit was a jvm-only project, but it was lost during the migration to multiplatform.

JsLocalStorage

Implement Storage using browser's local storage

Depends on #41

Google Drive backend

Implement Storage using google drive

MPP: native/linux

Implement platform for native/linux target:

Make qbit writer single threaded and get rid of ConcurrentHashMap
Stub WeakHashMap with a simple HashMap, or with a linked hash map with a size limit and removal of elements in insertion order
ByteArray.asInput
currentTimeMillis
SHA-1
assert

see: https://play.kotlinlang.org/hands-on/Kotlin%20Native%20Concurrency/00_Introduction?utm_source=kotlin_twitter&utm_medium=play&utm_campaign=touchlab

Persistent index

Actual behavior: qbit builds an in-memory index for the full trx log every time a connection is opened.

This approach works for small dbs, but obviously it won't work for large dbs.

To mitigate this issue, qbit may build and store a persistent index from time to time, and then, when opening a connection, build the in-memory index only for the tail of the trx log.

Then, while querying, qbit will merge data from the in-memory index with data from the persisted index.

Tombstones should be kept only in the in-memory index and should be filtered out when persisting the index.
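The merge of the two indexes can be sketched as follows, assuming scalar attributes only (one value per gid and attribute) and a tombstone modelled as a null value. `IndexEntry` and `mergedView` are hypothetical names, and plain lists stand in for whatever index structure is eventually chosen.

```kotlin
// Hypothetical index entry: a null value marks a tombstone (deletion).
data class IndexEntry(val gid: Long, val attr: String, val value: String?)

/**
 * Combines the persisted index with the in-memory index built from the trx log tail.
 * In-memory entries shadow persisted ones, and tombstones hide the entries they shadow.
 */
fun mergedView(persisted: List<IndexEntry>, inMemory: List<IndexEntry>): List<IndexEntry> {
    val merged = linkedMapOf<Pair<Long, String>, IndexEntry>()
    persisted.forEach { merged[it.gid to it.attr] = it }
    inMemory.forEach { merged[it.gid to it.attr] = it } // newer data wins
    return merged.values.filter { it.value != null }    // tombstones are filtered out
}
```

Under these assumptions the same function covers both uses: its result is what a query sees, and it is also what would be written out as the next persisted index, since it contains no tombstones.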

There are two options for index organization to consider:

B-Tree. A B-Tree is already implemented (see cdbf347) and should be generally usable, but I failed to implement the bulk replace operation - an operation that replaces a given set of items with new ones in a single pass. This operation is important for an immutable structure, but it may not be that important and may be replaced by removeAll and addAll operations
SSTable. It's necessary to investigate this approach and compare it with the B-Tree

There are other structures (at least hashes) for persistent index storage.

Also, it's currently unclear which behavior is expected when a persistent index is built for one branch of a trx log with concurrent modifications.

Clean up on rollback

Actual behavior: when committing a trx, qbit first writes the trx to storage and then tries to update head. In the current implementation, if head is updated concurrently, qbit rolls back the transaction but does not remove the trx created in the first step.

Desired behavior: qbit removes the rolled back trx from storage.
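The desired behavior can be sketched with a toy storage. `TrxStorage`, `compareAndSetHead` and `commit` are hypothetical names, node ids and the head pointer are plain strings, and the sketch is single threaded, so it only shows the order of steps rather than a real concurrency-safe implementation.

```kotlin
// Hypothetical in-memory storage; node ids and the head pointer are plain strings.
class TrxStorage {
    val nodes = mutableMapOf<String, ByteArray>()
    var head: String = "root"

    /** Updates head only if nobody moved it since `expected` was read. */
    fun compareAndSetHead(expected: String, newHead: String): Boolean {
        if (head != expected) return false
        head = newHead
        return true
    }
}

/** Writes the trx node, then tries to move head; removes the node if the head update fails. */
fun commit(storage: TrxStorage, baseHead: String, nodeId: String, data: ByteArray): Boolean {
    storage.nodes[nodeId] = data // step 1: write the trx to storage
    val committed = storage.compareAndSetHead(baseHead, nodeId) // step 2: update head
    if (!committed) {
        storage.nodes.remove(nodeId) // rollback: clean up the data written in step 1
    }
    return committed
}
```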

See TrxTest.In case of transaction commit abort, transaction should clean up written data