marten-de-vries / chairdb
A small CouchDB-compatible database with sync support
License: Apache License 2.0
Write a compatibility layer on top of map/reduce.
Doing it serially (as is currently the case) might block, because only a single attachment is guaranteed to be readable at any time (a restriction that makes an HTTPDatabase implementation possible). See the in-memory database for an example of how to handle this.
The first step would be to extend doc_to_couchdb_json to handle this. Using inline attachments makes sense when the total attachment size for a document is not too big (<10 KB? <100 KB? It might be worth looking up what CouchDB does); otherwise, it should send all attachments using multipart with follow=true. My suggestion would be to get _single_response working first, and then to make multi_response call _single_response internally multiple times and wrap the output in a multipart response.
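A minimal sketch of the size-based choice described above. The helper name attachment_stubs and the INLINE_LIMIT value are hypothetical (the threshold is still an open question); the stub shapes follow CouchDB's convention of a base64 "data" field for inline attachments and "follows": true for bodies sent as later multipart/related parts.

```python
import base64

# Hypothetical cut-off; the right value (10 KB? 100 KB?) is still undecided.
INLINE_LIMIT = 10 * 1024


def attachment_stubs(attachments, inline_limit=INLINE_LIMIT):
    """Build the _attachments object for one document.

    `attachments` maps name -> (content_type, data_bytes). Returns the
    _attachments JSON plus the bodies that must follow in a multipart
    message, in the same order as the stubs (empty if all are inline).
    """
    total = sum(len(data) for _, data in attachments.values())
    stubs, following = {}, []
    for name, (content_type, data) in attachments.items():
        if total <= inline_limit:
            # Small enough: embed the body as base64 in the JSON itself.
            stubs[name] = {
                "content_type": content_type,
                "data": base64.b64encode(data).decode("ascii"),
            }
        else:
            # Too big: send a stub and stream the body as a separate part.
            stubs[name] = {
                "content_type": content_type,
                "length": len(data),
                "follows": True,
            }
            following.append(data)
    return stubs, following
```

The decision is made per document (on the total size) rather than per attachment, matching the wording above; mixing inline and follows stubs in one response would also be possible.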
Bonus points if it doesn't require multipart and can also return a JSON array like CouchDB when the requested content type is application/json.
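A sketch of how multi_response might wrap several single-document results, assuming each _single_response result is available as a plain dict. It returns a JSON array for an exact application/json request and a multipart/mixed body otherwise. The signature is hypothetical, parts here carry JSON only (a part with following attachments would itself need to be multipart/related), and real Accept-header parsing is more involved than an equality check.

```python
import json
import uuid


def multi_response(single_bodies, accept):
    """Combine single-document results into one response body.

    `single_bodies` is a list of dicts (hypothetical _single_response
    output). Returns (content_type, body_bytes).
    """
    if accept == "application/json":
        # CouchDB-style alternative: one JSON array, no multipart at all.
        return "application/json", json.dumps(single_bodies).encode()

    boundary = uuid.uuid4().hex
    parts = []
    for body in single_bodies:
        parts.append(
            f"--{boundary}\r\n"
            "Content-Type: application/json\r\n\r\n"
            f"{json.dumps(body)}\r\n"
        )
    parts.append(f"--{boundary}--\r\n")  # closing delimiter
    content_type = f'multipart/mixed; boundary="{boundary}"'
    return content_type, "".join(parts).encode()
```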
By firing off a request to /doc/attachment, preferably in parallel with the 'main' retrieval.
Currently, there's just an 'assert att_names is None'.
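A minimal sketch of fetching the document and the requested attachments concurrently with asyncio.gather. fetch_doc and fetch_attachment are hypothetical async callables standing in for the GET /db/doc and GET /db/doc/attachment requests; att_names is the list of requested attachment names.

```python
import asyncio


async def get_with_attachments(fetch_doc, fetch_attachment, doc_id, att_names):
    """Fetch a document and its attachments in parallel.

    fetch_doc(doc_id) and fetch_attachment(doc_id, name) are hypothetical
    coroutine functions; gather() runs all requests concurrently.
    """
    doc_request = fetch_doc(doc_id)
    att_requests = [fetch_attachment(doc_id, name) for name in att_names]
    doc, *bodies = await asyncio.gather(doc_request, *att_requests)
    return doc, dict(zip(att_names, bodies))
```

A real implementation would probably pin the attachment requests to the revision being retrieved (CouchDB accepts a rev query parameter on attachment GETs), so that a concurrent update cannot mix revisions.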
There's basic map/reduce support, but it implements aggregation using a table scan. That doesn't scale.
It would be nice to optimize it. The 'overlay indexes' described by Pennino, Pizzonia and Papi (2019) seem like a nice fit; for example code, see https://github.com/kdbtree/kdbtree/.
A 'simple' skiplist might be easier, but it's less elegant, as it requires more round trips to the backend database.
Diego Pennino, Maurizio Pizzonia, Alessio Papi. Overlay Indexes: Efficiently Supporting Aggregate Range Queries and Authenticated Data Structures in Off-the-Shelf Databases. IEEE Access. 7:175642-175670. 2019.
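To illustrate why cached partial aggregates beat a table scan, here is an in-memory segment tree over a fixed, sorted set of view keys. This is a swapped-in, simpler technique, not the overlay index from the paper: it lives in memory, cannot insert new keys, and only shows how a range reduce can combine O(log n) precomputed values instead of visiting every row. The class name is hypothetical; the reduce function must be associative.

```python
import bisect
import operator


class RangeReduceIndex:
    """Segment tree: each internal node caches the reduce of its children."""

    def __init__(self, rows, reduce=operator.add, identity=0):
        rows = sorted(rows, key=lambda row: row[0])
        self.keys = [key for key, _ in rows]
        self.n = len(rows)
        self.reduce = reduce
        self.identity = identity
        self.tree = [identity] * (2 * self.n)
        for i, (_, value) in enumerate(rows):
            self.tree[self.n + i] = value  # leaves hold the row values
        for i in range(self.n - 1, 0, -1):
            self.tree[i] = reduce(self.tree[2 * i], self.tree[2 * i + 1])

    def update(self, key, value):
        """Change the value of an existing key; O(log n) nodes recomputed."""
        pos = bisect.bisect_left(self.keys, key)
        if pos == self.n or self.keys[pos] != key:
            raise KeyError(key)
        i = pos + self.n
        self.tree[i] = value
        while i > 1:
            i //= 2
            self.tree[i] = self.reduce(self.tree[2 * i], self.tree[2 * i + 1])

    def range_reduce(self, start_key, end_key):
        """Reduce all values with start_key <= key <= end_key."""
        lo = bisect.bisect_left(self.keys, start_key) + self.n
        hi = bisect.bisect_right(self.keys, end_key) + self.n
        left, right = self.identity, self.identity
        while lo < hi:
            if lo & 1:
                left = self.reduce(left, self.tree[lo])
                lo += 1
            if hi & 1:
                hi -= 1
                right = self.reduce(self.tree[hi], right)
            lo //= 2
            hi //= 2
        return self.reduce(left, right)
```

Keeping separate left and right accumulators preserves key order, so non-commutative (but associative) reduce functions also work.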
Follow-up to issue #8.
This would be nice to support. As a first step, it could simply not send data that the client doesn't request. It would be better still to also avoid retrieving such data from disk, though that might require a bit more work.
Generating some API docs would be nice.
It might be interesting to investigate whether the performance is similar. If so, replacing the _bulk_docs calls in HTTPDatabase with PUTs everywhere would probably simplify things, and implementing _bulk_get would no longer be necessary.
For the 'local' databases, rows are processed as they arrive as write() input, so batching is irrelevant. But for HTTP APIs, the single monster call to _bulk_docs needs to be replaced with multiple ones during a big replication, to keep the request size from growing too much. Also, we don't want bulk_docs to wait for _changes() when it's hanging just because continuous=true.
Solving the former issue is relatively simple: just break up the input stream based on some size criterion. The latter issue is more interesting. A nice approach might be to remove the continuous parameter from the changes() function and instead provide the user with an event that signals a database change. That is a little less user-friendly for users other than the replicator, but it could easily be wrapped in a higher-level API and would add flexibility.
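A sketch of the size criterion: an async generator that groups a stream of documents into batches whose JSON-encoded size stays under roughly max_bytes, so that each batch could become one _bulk_docs request. The function name and the 1 MB default are hypothetical; a single oversized document still gets a batch of its own.

```python
import json


async def batched_by_size(docs, max_bytes=1_000_000):
    """Yield lists of docs whose combined JSON size is at most ~max_bytes.

    `docs` is any async iterable of JSON-serializable documents. The limit
    is an assumption; a lone document larger than it is yielded alone.
    """
    batch, size = [], 0
    async for doc in docs:
        doc_size = len(json.dumps(doc).encode())
        if batch and size + doc_size > max_bytes:
            yield batch  # this would become one _bulk_docs call
            batch, size = [], 0
        batch.append(doc)
        size += doc_size
    if batch:
        yield batch
```

Note that this only addresses request size: if the input stream hangs (continuous _changes), a partially filled batch still waits, which is why the event-based approach is needed for the latter issue.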
Additionally, it would allow HTTPDatabase to listen to the _global_changes endpoint instead of the _changes endpoint (when available). When lots of HTTPDatabases are open, they could share a single changes listener, which should scale much better. (It's the main trick of spiegel.)
Finally, it might be good to check whether _revs_diff requires batching too, and to handle timeouts for _changes()/_global_changes().
Surprisingly easy to do with the current rev tree implementation, and probably useful. It's probably the most-requested open feature for PouchDB, so I think it's worth doing.
The current implementation does what's most convenient, but long term, this should mirror whatever CouchDB decides to do:
Make HTTPDatabase retry failed requests. Make sure the replicator bubbles up any persistent errors, but also make sure it handles forbidden, unauthorized, and similar errors correctly.
Make sure SQLDatabase, InMemoryDatabase and HTTPRemote return errors in the same style.
Convert errors to the right JSON errors in chairdb.server.
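A sketch of the retry behaviour and the error conversion listed above. HTTPError, with_retries, to_json_error and the NON_RETRYABLE set are hypothetical; which status codes count as permanent is an assumption. Transient failures are retried with exponential backoff, permanent ones (such as 401 unauthorized and 403 forbidden) are raised immediately, and the last failure is bubbled up. to_json_error renders errors in CouchDB's {"error": ..., "reason": ...} style.

```python
import asyncio


class HTTPError(Exception):
    """Hypothetical error type carrying an HTTP status and CouchDB fields."""

    def __init__(self, status, error, reason):
        super().__init__(f"{status} {error}: {reason}")
        self.status = status
        self.error = error
        self.reason = reason


# Assumption: these will not fix themselves, so retrying is pointless.
NON_RETRYABLE = {400, 401, 403, 404, 409, 412}


async def with_retries(request, attempts=4, base_delay=0.1):
    """Await `request()` (a coroutine function), retrying transient errors."""
    for attempt in range(attempts):
        try:
            return await request()
        except (HTTPError, ConnectionError) as exc:
            status = getattr(exc, "status", None)  # None for network errors
            if status in NON_RETRYABLE or attempt == attempts - 1:
                raise  # bubble up permanent or persistent errors
            await asyncio.sleep(base_delay * 2 ** attempt)  # backoff


def to_json_error(exc):
    """Return (status, body) in CouchDB's error style for chairdb.server."""
    return exc.status, {"error": exc.error, "reason": exc.reason}
```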
Finally, count_docs in the replicator currently has a weird check required for skimdb. That's worth investigating further.
Think about when to set up the database schema. Using create(), similar to remote()? Or just whenever a method is called? Or legitimize the current decision to misuse context managers?
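To make the second option concrete, here is a sketch of lazy schema creation on the first method call, using sqlite3. The class name and the docs table layout are hypothetical and much simpler than a real SQLDatabase schema; the point is only that every public method calls an idempotent _ensure_schema() first, so no create() call or context manager is needed.

```python
import sqlite3


class LazySchemaDB:
    """Hypothetical option: set up the schema on the first method call."""

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)
        self._schema_ready = False

    def _ensure_schema(self):
        if not self._schema_ready:
            # IF NOT EXISTS makes this safe for an already-initialized file.
            self._conn.execute(
                "CREATE TABLE IF NOT EXISTS docs (id TEXT PRIMARY KEY, body TEXT)"
            )
            self._schema_ready = True

    def put(self, doc_id, body):
        self._ensure_schema()
        with self._conn:  # commits the transaction on success
            self._conn.execute(
                "INSERT OR REPLACE INTO docs VALUES (?, ?)", (doc_id, body)
            )

    def get(self, doc_id):
        self._ensure_schema()
        row = self._conn.execute(
            "SELECT body FROM docs WHERE id = ?", (doc_id,)
        ).fetchone()
        return row[0] if row else None
```

The trade-off is a flag check on every call and schema errors surfacing at first use rather than at open time; an explicit create() would surface them earlier.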