qbit's Issues

Storage format: custom vs kotlinx serialization

Actual behavior: Eavs are serialized to the trx log via custom serialization.

The quality of this serialization is unclear, and supporting nested map attributes is more difficult with custom serialization.

Consider switching to kotlinx.serialization/CBOR.

  • Restore serialization test

Comparison matrix:

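|                              | custom | kotlinx                                                    |
|------------------------------|--------|------------------------------------------------------------|
| Dependencies                 | None   | On stdlib                                                  |
| Multiplatform                | Yes    | Yes                                                        |
| Performance                  | ???    | Probably more performant                                   |
| Memory footprint             | ???    | Probably more efficient                                    |
| Support of Maps              | Hard   | Built-in                                                   |
| Requirements for stored data | No     | Probably will require some changes in qbit data structures |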

kotlinx advantages:

Split persist to persist, update and delete

Consider splitting the general persist method into several specialized methods.

Advantages of single persist:

  1. Smaller API surface
  2. Ability to persist inserts and deletions in a single trx without explicitly creating a trx

Advantages of multiple methods:

  1. Better discoverability
  2. Hides tombstone concept from user
  3. Ability to perform partial updates and deletions by query

Deletions can be implemented without the split via queryForGids, a query that returns just Gids.
Another idea is to return a specialized object from queries that has methods like delete and update(fn: (T))

Partial updates may be implemented via special Patch objects that specify only the values that should be changed.

Actually, both APIs can exist simultaneously.
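
The specialized methods and the Patch idea could look roughly like the sketch below. All names here (User, UserPatch, UserStore) are hypothetical and only illustrate the shape of such an API over an in-memory map; this is not qbit's actual API.

```kotlin
// Hypothetical names throughout; none of this is qbit's actual API.
data class User(val gid: Long, val name: String, val age: Int)

// A patch carries only the values that should change; null means "leave as is".
data class UserPatch(val name: String? = null, val age: Int? = null)

class UserStore {
    private val byGid = mutableMapOf<Long, User>()

    // Insert a new entity or overwrite an existing one in full.
    fun persist(user: User) {
        byGid[user.gid] = user
    }

    // Partial update: apply only the fields present in the patch.
    fun update(gid: Long, patch: UserPatch): User? {
        val current = byGid[gid] ?: return null
        val updated = current.copy(
            name = patch.name ?: current.name,
            age = patch.age ?: current.age
        )
        byGid[gid] = updated
        return updated
    }

    // Deletion by query: the caller never deals with tombstones.
    // Returns the number of deleted entities.
    fun delete(predicate: (User) -> Boolean): Int {
        val doomed = byGid.values.filter(predicate).map { it.gid }
        doomed.forEach { byGid.remove(it) }
        return doomed.size
    }

    fun get(gid: Long): User? = byGid[gid]
}
```

With this shape, `store.update(1, UserPatch(age = 31))` changes only the age, and `store.delete { it.age < 18 }` removes matching entities without the caller fetching them first.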

Analytic queries

Actual behavior: qbit supports only the pull API for pulling entities.

But a database should provide a way to execute analytical queries, i.e. queries that fetch information of custom shape using joins and aggregate functions.

There are several approaches to consider:

  1. Something sql-like - sql, jpql, cql (cassandra), gql (neo4j) etc
  2. GraphQL
  3. Monad comprehensions
  4. Datalog
  5. Something else?

It's necessary to investigate all existing approaches for querying, choose the best one and implement it.

Remotes: sync

Add ability to sync (fetch and push) remote storage with local storage.

Merges

Implement merges of conflicting transactions.

For the first version, the following conflict resolution strategies are proposed:

  1. Deletions take precedence over updates
  2. Otherwise, last writer wins
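
A minimal sketch of these two rules, assuming each side of a conflict is represented by a single change with a timestamp. The Change, Update and Deletion types are hypothetical and are not qbit's transaction model.

```kotlin
// Hypothetical model of one side's change to an entity; not qbit's actual types.
sealed class Change {
    abstract val timestamp: Long
}

data class Update(val value: String, override val timestamp: Long) : Change()
data class Deletion(override val timestamp: Long) : Change()

// Rule 1: a deletion beats an update regardless of timestamps.
// Rule 2: otherwise the change with the later timestamp wins (ties keep `ours`).
fun resolve(ours: Change, theirs: Change): Change = when {
    ours is Deletion && theirs !is Deletion -> ours
    theirs is Deletion && ours !is Deletion -> theirs
    else -> if (theirs.timestamp > ours.timestamp) theirs else ours
}
```

The tie-breaking choice (keeping `ours` on equal timestamps) is an assumption of the sketch; the proposal above does not specify it.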

Composite unique constraints

Actual behavior: qbit can ensure uniqueness only for the value of a single attribute

But there are situations when uniqueness should be enforced over a set of attributes.
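
As an illustration, a composite constraint can be checked by grouping entities on the tuple of the constrained attribute values. The sketch assumes entities are plain attribute maps, which is a simplification and not qbit's real representation; findViolations is a hypothetical helper.

```kotlin
// Hypothetical sketch: entities are plain attribute maps, not qbit's real types.
typealias AttrMap = Map<String, Any?>

// Returns the groups of entities that share the same values for all `attrs`.
fun findViolations(entities: List<AttrMap>, attrs: List<String>): List<List<AttrMap>> =
    entities
        .groupBy { e -> attrs.map { e[it] } } // composite key: list of attribute values
        .values
        .filter { it.size > 1 }
```

For example, two entities with the same first name violate nothing on their own, but two entities with the same (first, last) pair end up in one returned group.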

MPP: js/browser

Implement platform for js/browser target

  • WeakHashMap
  • ByteArray.asInput
  • currentTimeMillis
  • SHA-1
  • assert

Remove duplication of StorageTest

Currently StorageTest is duplicated in all modules due to the Kotlin multiplatform multi-project build configuration.

In particular, there is no (obvious) way to declare a dependency on a project's test sources.

Find a way to get rid of this duplication.

Remotes: clone

Add ability to initialize local storage using remote storage

Trx log folding

Actual behavior: qbit keeps the full trx log forever.

The original reason to keep the trx log forever was conflict resolution:

  1. I believe that a 3-way merge, enabled by the trx log in the form of a DAG, may produce better conflict resolutions
  2. In case of an invalid conflict resolution, an entity may be reverted to any of its previous states
  3. Probably it enables implementation of some sort of rerere for qbit.

But for some users the size of the trx log may be more important than those features.

And actually qbit can fold the trx log for a subgraph that is already contained in all branches of all instances. In this case the graph structure may still be maintained and a 3-way merge may still be performed.

But in the current implementation there is at least one problem: nodes refer to each other via a node's content hash, so all nodes that are kept in the trx log must be rebased onto the new folded root node. Probably this issue may be solved by introducing some level of indirection or by abandoning node links via hashes.

Another feature to consider is marking some attributes to be kept forever and adding them to a separate index that keeps all seen entity states.

Fix persistence of empty list

Actual behavior: when persisting an entity with prop == emptyList(), the attribute for that property isn't actually persisted

This bug may be fixed by #6; another way is to store a special placeholder eav for such cases

See FunTest.Test scalar list clearing, FunTest.Storage of entity with empty list should preserve list
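
The placeholder idea can be sketched as follows. The Eav shape, the EmptyListMarker sentinel and both helper functions are hypothetical; qbit's real Eav type and storage layout differ.

```kotlin
// Hypothetical types and marker; qbit's real Eav and storage differ.
data class Eav(val gid: Long, val attr: String, val value: Any)

// Sentinel stored in place of the elements of an empty list.
object EmptyListMarker

fun toEavs(gid: Long, attr: String, list: List<Any>): List<Eav> =
    if (list.isEmpty()) listOf(Eav(gid, attr, EmptyListMarker))
    else list.map { Eav(gid, attr, it) }

// Returns null when the attribute was never stored, and an empty list
// when only the marker is present.
fun fromEavs(eavs: List<Eav>, gid: Long, attr: String): List<Any>? {
    val values = eavs.filter { it.gid == gid && it.attr == attr }.map { it.value }
    return when {
        values.isEmpty() -> null
        values == listOf<Any>(EmptyListMarker) -> emptyList()
        else -> values
    }
}
```

The marker makes "attribute absent" and "attribute present but empty" distinguishable on read, which is exactly the information lost in the current behavior.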

Blobs storage

It probably makes sense to store huge objects (byte arrays and strings) apart from primary entities. This will allow fetching them truly lazily and/or via a special API/query mechanism

Fix persistence of entities without attributes

Actual behavior: when storing an entity without attributes or an instance of a data class with all props == null, the entity isn't actually stored and cannot be fetched later.

I see two ways to fix it:

  1. Add a special internal placeholder attribute that is automatically added to entities without other attributes
  2. Add to all entities some universally useful attribute, such as creation date. But with the current implementation of storage this attribute is redundant (it may be retrieved from the first transaction containing the given entity) and not optimal for persisting data classes: the attribute has to be fetched additionally and added to the attribute list of the data class instance being persisted

See FunTest.Peristance of entity with all attr = null should actually persist the entity, FunTest.Test persistence of entity without attributes

Component attributes

Consider adding component attributes. These are attributes of composite type that do not have separate gids, so they are embedded in the containing entity and automatically deleted when the containing entity is deleted

Identifier attributes

Add a special type of attribute with the following semantics:

  1. They are unique
  2. They enable upsert semantics: when persisting an entity without a gid but with an existing value for the iden attribute, qbit should update the existing entity instead of throwing a uniqueness constraint violation.

To implement this feature, the following questions should be considered:

  1. Should an entity be allowed to have several id attrs? Probably for the first version only a single id attribute should be allowed per entity
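
The upsert semantics can be sketched with an in-memory store in which `email` plays the role of the single identifier attribute. Account and AccountStore are hypothetical names, and the gid allocation is deliberately naive.

```kotlin
// Hypothetical entity with `email` playing the role of the identifier attribute.
data class Account(val gid: Long?, val email: String, val name: String)

class AccountStore {
    private val byGid = mutableMapOf<Long, Account>()
    private val gidByEmail = mutableMapOf<String, Long>()
    private var nextGid = 1L

    fun persist(account: Account): Account {
        // Resolution order: explicit gid, then existing identifier value, then a fresh gid.
        val gid = account.gid ?: gidByEmail[account.email] ?: nextGid++
        val owner = gidByEmail[account.email]
        // The identifier stays unique: another entity must not already own this value.
        require(owner == null || owner == gid) { "email ${account.email} is already taken" }
        // If the entity changes its email, release the old value.
        byGid[gid]?.let { gidByEmail.remove(it.email) }
        val stored = account.copy(gid = gid)
        byGid[gid] = stored
        gidByEmail[stored.email] = gid
        return stored
    }

    fun size(): Int = byGid.size
}
```

Persisting two accounts without a gid but with the same email yields one stored entity whose fields come from the second call, rather than a uniqueness error.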

Entity collections

I guess that most existing DBMSes have a notion of entity collection. qbit actually also has this notion, but it is hidden behind hasAttr queries.

Investigate entity collections in other databases and consider making them explicit in qbit

Aliases

Implement support for attribute (and type) aliases. It will enable backward-compatible renames

Remove Instant and ZonedDateTime types

Currently Kotlin does not have standard types for dates, but I believe it's very likely that they will appear in the future. On the other hand, implementing these types in common code is pretty difficult, and if these types are published, we will have to support them forever, even after standard types appear. So these attribute types should be removed, and instead qbit should provide platform-specific helper functions to create platform-specific date objects from longs and longs+strings.
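
On the JVM such helpers could be thin wrappers over java.time, as in the sketch below. The function names instantOf and zonedDateTimeOf are hypothetical, and the sketch assumes the long is epoch milliseconds and the string is a zone id.

```kotlin
import java.time.Instant
import java.time.ZoneId
import java.time.ZonedDateTime

// Hypothetical JVM-side helpers; names and the millis/zone-id encoding are assumptions.
fun instantOf(epochMillis: Long): Instant = Instant.ofEpochMilli(epochMillis)

fun zonedDateTimeOf(epochMillis: Long, zoneId: String): ZonedDateTime =
    ZonedDateTime.ofInstant(Instant.ofEpochMilli(epochMillis), ZoneId.of(zoneId))
```

Other platforms would provide their own counterparts returning their native date objects, so common code only ever stores longs and strings.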

Do not index all attributes

Actual behavior: qbit adds all attributes to the ave index.

This solution requires additional storage for all attributes, even those that are blobs and/or do not appear in any query.

There are two options for changing this:

  1. Index all attributes by default and add the ability to disable indexing for specified attributes
  2. Do not index anything by default and add the ability to enable indexing for specified attributes

It's unclear what the expected behavior is for queries that use non-indexed attributes.

Remotes: init

Implement ability to initialize remote storage with local storage

Fuzzer

Implement a fuzzer for qbit. It's a test that generates a random "program" (a set of inserts, updates, deletes and selects), "executes" it and tries to evaluate whether it gets the expected results
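
One common way to build such a test is to run the random program against both the system under test and a trivially correct in-memory model, then compare select results. The sketch below assumes a hypothetical key-value KvStore interface standing in for qbit; the operation set and all names are illustrative only.

```kotlin
import kotlin.random.Random

// Hypothetical operation set and store interface; qbit's real API differs.
sealed class Op
data class Put(val key: Int, val value: Int) : Op() // insert or update
data class Delete(val key: Int) : Op()
data class Select(val key: Int) : Op()

interface KvStore {
    fun put(key: Int, value: Int)
    fun delete(key: Int)
    fun select(key: Int): Int?
}

// Generates a reproducible random "program" from a seed.
fun randomProgram(seed: Long, length: Int, keySpace: Int = 8): List<Op> {
    val rnd = Random(seed)
    return List(length) {
        val key = rnd.nextInt(keySpace)
        when (rnd.nextInt(3)) {
            0 -> Put(key, rnd.nextInt(100))
            1 -> Delete(key)
            else -> Select(key)
        }
    }
}

// Executes the program against the store and a trivially correct model.
// Returns the index of the first select whose result diverges, or null.
fun firstDivergence(program: List<Op>, store: KvStore): Int? {
    val model = mutableMapOf<Int, Int>()
    for ((i, op) in program.withIndex()) {
        when (op) {
            is Put -> { store.put(op.key, op.value); model[op.key] = op.value }
            is Delete -> { store.delete(op.key); model.remove(op.key) }
            is Select -> { if (store.select(op.key) != model[op.key]) return i }
        }
    }
    return null
}
```

Seeding the generator keeps failures reproducible: a failing seed and the returned index are enough to replay the exact program prefix that exposed the divergence.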

Storage format: Entity vs List<Eav>

Actual behavior: trx log is stored as List[Eav].

This solution has several disadvantages:

  1. Redundant storage of Gids and attribute names
  2. Unclear how to implement (nested) Map attributes
  3. Unclear how to support empty lists

As an alternative, a trx may be stored as List[Entity]. Consider the disadvantages of this solution, then choose and implement one.

Probably it's worth storing a trx as Map[EntityCollection,List[Entity]] (see #51).

CRDTs

Implement support for CRDT semantics for attributes. It's necessary to consider the options and choose the best way to express that a given attribute should have CRDT semantics

Investigate caching of mutable entities

Actual behavior: mutable entities are cached and returned to the client as is. This introduces a subtle bug in querying: queries are executed against the original state of an entity but return only ids, which are then pulled from the db using the cache. So a mutated entity may still be returned by a query that matches its original state or, vice versa, may not be returned by a query that matches its mutated state.

Add cloning of cached entities before returning them to the client
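
A minimal sketch of the cloning fix, assuming entities are data classes so that copy() is available. Note and NoteCache are hypothetical, and copy() is shallow, so nested mutable values would need their own copying.

```kotlin
// Hypothetical mutable entity and cache; not qbit's actual classes.
data class Note(val gid: Long, var text: String)

class NoteCache {
    private val cached = mutableMapOf<Long, Note>()

    // Stores a private copy so later mutation of the argument does not leak in.
    fun put(note: Note) {
        cached[note.gid] = note.copy()
    }

    // Hands out a copy, so client-side mutation never reaches the cached state.
    fun get(gid: Long): Note? = cached[gid]?.copy()
}
```

With this in place the cached state stays in sync with what queries were evaluated against, regardless of what the client does with the returned object.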

Reverse fetch

Implement the ability to fetch entities that reference a given entity via a given attribute.

To implement this feature, a new vae index should be added that stores only eavs for reference attributes
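
The vae index can be pictured as a map from (referenced gid, attribute) to the gids of referencing entities. The sketch below uses hypothetical RefEav and VaeIndex types and builds the index eagerly from a list of reference eavs.

```kotlin
// Hypothetical reference eav: entity `gid` points at entity `target` via `attr`.
data class RefEav(val gid: Long, val attr: String, val target: Long)

// Value-attribute-entity index: (target, attr) -> gids of referencing entities.
class VaeIndex(eavs: List<RefEav>) {
    private val index: Map<Pair<Long, String>, List<Long>> =
        eavs.groupBy({ it.target to it.attr }, { it.gid })

    fun referrers(target: Long, attr: String): List<Long> =
        index[target to attr] ?: emptyList()
}
```

For instance, `VaeIndex(eavs).referrers(1, "author")` returns the gids of all entities whose author attribute references entity 1.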

Forbid manually created gids

Actual behavior: the user can pass an arbitrary value for an entity gid

If this value is greater than the current value of the instance gid generator, this will lead to collisions.

So qbit should forbid storing entities with gid >= the current value of the instance gid generator.
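
The check itself is small. The sketch assumes gids are handed out as a strictly increasing sequence by a per-instance generator; GidGenerator and checkGid are hypothetical names.

```kotlin
// Hypothetical generator; assumes gids are handed out as an increasing sequence.
class GidGenerator(start: Long = 1L) {
    var next: Long = start
        private set

    fun generate(): Long = next++
}

// Rejects any gid the generator has not handed out yet.
fun checkGid(gid: Long, generator: GidGenerator) {
    require(gid < generator.next) {
        "gid $gid was not produced by this instance's generator"
    }
}
```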

Additionally, when #21 is implemented, qbit should ensure that the type for a given gid doesn't change

See FunTest.Qbit should forbid externally generated gids

Related: #60 #27

Connection urls

Actual behavior: the user has to manually instantiate a storage and pass it to qbit

To decouple the user from the implementation, implement a qbit connection factory function that accepts a string parameter (something like "qbit:fs://tmp/qbit/mydb") and internally creates the corresponding storage instance.

The implementation should also provide a registration endpoint for Storage implementations, so that a user who has implemented a custom Storage can register it.

This feature requires some way to automatically discover service providers, and I'm not sure it can be implemented on Native and JS platforms.
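
A sketch of the factory with explicit registration, which sidesteps automatic discovery and so should not depend on platform-specific service loading. The Storage interface here is an empty stand-in, and StorageRegistry, MemoryStorage and the "mem" scheme are hypothetical.

```kotlin
// Hypothetical Storage interface and registry; not qbit's actual API.
interface Storage

class MemoryStorage(val name: String) : Storage

object StorageRegistry {
    private val factories = mutableMapOf<String, (String) -> Storage>()

    // Registration endpoint: maps a scheme such as "mem" to a factory.
    fun register(scheme: String, factory: (String) -> Storage) {
        factories[scheme] = factory
    }

    // Parses "qbit:<scheme>://<location>" and delegates to the registered factory.
    fun connect(url: String): Storage {
        require(url.startsWith("qbit:")) { "not a qbit url: $url" }
        val rest = url.removePrefix("qbit:")
        val scheme = rest.substringBefore("://", missingDelimiterValue = "")
        require(scheme.isNotEmpty()) { "missing scheme in: $url" }
        val factory = requireNotNull(factories[scheme]) { "no storage registered for '$scheme'" }
        return factory(rest.substringAfter("://"))
    }
}
```

A user with a custom Storage would call `StorageRegistry.register("mem") { MemoryStorage(it) }` once at startup and then open connections with `StorageRegistry.connect("qbit:mem://mydb")`.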

CryptoStorage

Implement a decorator for Storage that encrypts/decrypts data on saving and fetching.

Remotes: add remote

Implement ability to add remote storage to qbit

It looks like information about remotes should be stored outside of the db, because it's unclear how to implement #32 if remotes are stored in the db.

Another problem is IID acquisition. To initialize an instance, qbit has to acquire an iid, and this iid must be unique, so some protocol for iid acquisition should be developed

Yet another problem is how to organize synchronization. The current proposal is as follows:

  1. Each instance has a branch, and only that instance can update this branch
  2. During sync, an instance fetches all data from remotes, including all branches. If the instance detects a conflict, it merges it, sets its branch to the merge node and pushes it to the remote
  3. When another instance syncs, it will fetch the new value of the first instance's branch and be able to fast-forward its own branch to the merge

MPP: js/node

Implement platform for js/node target

  • WeakHashMap
  • ByteArray.asInput
  • currentTimeMillis
  • SHA-1
  • assert

Migration

Develop migration API and guidelines

Store entity types

Actual behavior: qbit exposes only a typed API but does not actually know the type of an entity

Because of this, it's possible to store an entity of another type under an existing gid, and qbit cannot prevent it.

To solve this issue, qbit can store type information about the schema generated from classes, add to entities an internal attribute with a ref to the type information, and check it on persistence
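
The persistence-time check could work along these lines. The sketch keeps the type in a side map keyed by gid instead of an internal attribute, purely to stay short; TypedStore is a hypothetical name.

```kotlin
import kotlin.reflect.KClass

// Hypothetical store that remembers the type first persisted under each gid.
class TypedStore {
    private val typeByGid = mutableMapOf<Long, KClass<*>>()
    private val entities = mutableMapOf<Long, Any>()

    fun persist(gid: Long, entity: Any) {
        // The first write fixes the type; later writes must match it.
        val known = typeByGid.getOrPut(gid) { entity::class }
        require(known == entity::class) {
            "gid $gid holds ${known.simpleName}, got ${entity::class.simpleName}"
        }
        entities[gid] = entity
    }
}
```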

Related: #51

Recover pitests

There was an integration with pitest when qbit was a JVM-only project; it was lost during the migration to multiplatform.

Persistent index

Actual behavior: qbit builds an in-memory index for the full trx log every time a connection is opened

This approach works for small dbs, but obviously it wouldn't work for large dbs.

To mitigate this issue, qbit may build and store a persistent index from time to time and then, when opening a connection, build an in-memory index only for the tail of the trx log.

Then, while querying, qbit will merge data from the in-memory index with data from the persisted index.

Tombstones should be kept only in the in-memory index and should be filtered out when persisting the index.
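
The layering and the tombstone rule can be sketched with two maps, where a null value in the in-memory layer stands for a tombstone. LayeredIndex is a hypothetical class and ignores ordering, ranges and branches entirely.

```kotlin
// Hypothetical two-level index keyed by gid; a null value in `memory` is a tombstone.
class LayeredIndex(
    private val persisted: Map<Long, String>,
    private val memory: Map<Long, String?>
) {
    // The in-memory layer shadows the persisted one, including its tombstones.
    fun get(gid: Long): String? =
        if (gid in memory) memory[gid] else persisted[gid]

    // Produces the next persisted index: merges both layers and drops tombstones.
    fun persist(): Map<Long, String> {
        val merged = persisted.toMutableMap()
        for ((gid, value) in memory) {
            if (value == null) {
                merged.remove(gid)
            } else {
                merged[gid] = value
            }
        }
        return merged
    }
}
```

After persist() the tombstones have done their job (the deleted keys are gone from the merged map), so the new persisted index contains live values only.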

There are two options for index organization to consider:

  1. B-Tree. A B-Tree is already implemented (see cdbf347) and should be generally usable, but I failed to implement a bulk replace operation, i.e. an operation that replaces a given set of items with new ones in a single pass. This operation is important for an immutable structure, but it may be not that important and may be replaced by removeAll and addAll operations
  2. SSTable. It's necessary to investigate this approach and compare it with the B-Tree

There are other structures (at least hashes) for persistent storage of indexes

Also, it's currently unclear which behavior is expected when a persistent index is built for one branch of the trx log with concurrent modifications.

Clean up on rollback

Actual behavior: when committing a trx, qbit first writes the trx to storage and then tries to update the head. In the current implementation, if the head is updated concurrently, qbit rolls back the transaction but does not remove the trx written in the first step.

Desired behavior: qbit removes the rolled-back trx from storage.

See TrxTest.In case of transaction commit abort, transaction should clean up written data
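
The desired commit flow can be sketched as write, compare-and-set on the head, and removal of the written trx when the compare fails. TrxStorage and commit are hypothetical; the sketch is single-threaded, and a real implementation would need the compare-and-set to be atomic.

```kotlin
// Hypothetical storage with a head pointer; names are illustrative only.
class TrxStorage {
    private val nodes = mutableMapOf<String, String>() // trx hash -> serialized trx
    var head: String = "root"
        private set

    fun write(hash: String, data: String) { nodes[hash] = data }
    fun remove(hash: String) { nodes.remove(hash) }
    fun contains(hash: String): Boolean = hash in nodes

    // Advances the head only if nobody moved it since `expected` was read.
    // Not atomic here; a real implementation must make this step atomic.
    fun compareAndSetHead(expected: String, next: String): Boolean {
        if (head != expected) return false
        head = next
        return true
    }
}

// Writes the trx, then tries to advance the head from `expectedHead`.
// If the head moved concurrently, the written trx is removed again.
fun commit(storage: TrxStorage, expectedHead: String, hash: String, data: String): Boolean {
    storage.write(hash, data)
    val advanced = storage.compareAndSetHead(expectedHead, hash)
    if (!advanced) storage.remove(hash) // clean up the orphaned trx
    return advanced
}
```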

Remotes: autosync

Consider adding the ability for qbit to perform syncs automatically. Possible sync triggers:

  1. Every N seconds
  2. At the beginning of each trx
  3. On commit of each trx
  4. On each query

Storage format: full entity vs patch

Actual behavior: entity is stored fully in trx log.

This solution has at least two disadvantages:

  • It requires more space for storage
  • It requires pulling the full entity (which is potentially large) to update even a single small attribute

But it also has at least one advantage:

  • The state of an entity at a given time may be read from a single transaction

Investigate how and why other storage systems (including git) store their transaction logs and consider switching to storing only patches

Related to #6 #8

Trx flush

Actual behavior: currently qbit keeps trx data in memory until commit

This is not appropriate for bulk inserts, which may not fit into memory. Consider adding an API for flushing a transaction to storage.
