chairdb's People

Contributors

marten-de-vries

chairdb's Issues

sql.py: parallelize attachment reading in _create_doc_ptr

Doing it serially (as is currently the case) might block, as only a single attachment is guaranteed to be readable at any time. (That's to make an HTTPDatabase implementation possible.) See the in-memory database for an example of how to handle this.
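A minimal sketch of the concurrent approach, assuming each attachment exposes an async-iterable stream; the names here (`_read_one`, `read_attachments_concurrently`) are placeholders, not chairdb's actual API:

```python
import asyncio


async def _read_one(attachment):
    """Drain a single attachment stream into memory (placeholder logic)."""
    chunks = []
    async for chunk in attachment:  # assumes attachments are async-iterable
        chunks.append(chunk)
    return b''.join(chunks)


async def read_attachments_concurrently(attachments):
    """Start reading every attachment of a document at once, so whichever
    stream happens to be readable can make progress instead of blocking
    the others."""
    names = list(attachments)
    data = await asyncio.gather(*(_read_one(attachments[name]) for name in names))
    return dict(zip(names, data))
```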

Support non-inline attachments in the HTTP API

The first step would be to extend doc_to_couchdb_json to handle this. Using inline attachments makes sense when the total attachment size for a document is not too big (<10 kB? <100 kB? It might be worth looking up what CouchDB does), but otherwise it should send all attachments using multipart with follows=true. My suggestion would be to make _single_response work first, and then to make multi_response call _single_response internally multiple times and wrap the output in a multipart response.

Bonus points if it doesn't require multipart and can also return a JSON array like CouchDB when the requested content type is application/json.
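A rough sketch of the size-threshold decision; the function name, signature and limit are assumptions, not doc_to_couchdb_json's real interface:

```python
import base64

INLINE_LIMIT = 64 * 1024  # assumption; check what CouchDB actually uses


def attachments_to_json(attachments, content_types):
    """Sketch only: inline small attachments as base64, otherwise emit a stub
    with 'follows': true so the body can be sent as a multipart/related part."""
    total = sum(len(data) for data in attachments.values())
    result = {}
    for name, data in attachments.items():
        stub = {'content_type': content_types[name], 'length': len(data)}
        if total <= INLINE_LIMIT:
            stub['data'] = base64.b64encode(data).decode('ascii')
        else:
            stub['follows'] = True  # actual bytes follow in a multipart part
        result[name] = stub
    return result
```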

Implement efficient range queries

There's basic map/reduce support, but it implements aggregation using a table scan. That doesn't scale.

It would be nice to optimize it. The 'overlay indexes' described by Pennino, Pizzonia and Papi (2019) seem like a nice fit. See https://github.com/kdbtree/kdbtree/ for example code.

A 'simple' skiplist might be easier, but it's less elegant, as it requires more round trips to the backend database.
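To make the gap with a table scan concrete: any prefix-aggregate structure answers range aggregates in O(log n). The in-memory Fenwick tree below is purely illustrative (it is not the overlay index from the paper, and a real implementation would live in the backend database):

```python
class FenwickTree:
    """Illustrative in-memory prefix-aggregate index: answers range sums in
    O(log n) instead of scanning every row."""

    def __init__(self, size):
        self.tree = [0] * (size + 1)

    def add(self, index, delta):
        i = index + 1
        while i < len(self.tree):
            self.tree[i] += delta
            i += i & -i

    def prefix_sum(self, index):
        i, total = index + 1, 0
        while i > 0:
            total += self.tree[i]
            i -= i & -i
        return total

    def range_sum(self, lo, hi):  # inclusive bounds
        return self.prefix_sum(hi) - (self.prefix_sum(lo - 1) if lo > 0 else 0)
```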

Diego Pennino, Maurizio Pizzonia, Alessio Papi. Overlay Indexes: Efficiently Supporting Aggregate Range Queries and Authenticated Data Structures in Off-the-Shelf Databases. IEEE Access 7:175642–175670, 2019.

Follow-up issue of #8.

HTTP API: attachment range requests

This would be nice to support. As a first step, it could simply not send data that the client didn't request. It would be nice to extend that later so such data isn't even read from disk, but that might require a bit more work.
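A minimal sketch of that first step, assuming the attachment body has already been read in full; only single-range `bytes=` requests are handled, and the server integration is left out:

```python
def parse_byte_range(range_header, length):
    """Parse a single 'bytes=start-end' Range header into (start, stop);
    returns None for anything it doesn't understand (no multi-range support)."""
    if not range_header or not range_header.startswith('bytes='):
        return None
    start_s, _, end_s = range_header[len('bytes='):].partition('-')
    if not start_s and not end_s:
        return None
    if start_s:
        start = int(start_s)
        stop = int(end_s) + 1 if end_s else length
    else:  # suffix range, e.g. 'bytes=-500'
        start, stop = max(length - int(end_s), 0), length
    return start, min(stop, length)


def slice_attachment(body, range_header):
    """First step only: the whole attachment is still read, but only the
    requested slice is sent back (with a 206 status)."""
    parsed = parse_byte_range(range_header, len(body))
    if parsed is None:
        return body, 200
    start, stop = parsed
    return body[start:stop], 206
```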

HTTP 2/3: PUT instead of _bulk_docs?

It might be interesting to investigate whether the performance is similar. If so, replacing the _bulk_docs calls in HTTPDatabase with individual PUTs would probably simplify things, and implementing _bulk_get at some point would no longer be necessary.
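A rough way to compare the two, assuming a CouchDB-compatible server at BASE and docs that already carry _id/_rev (the URL and doc shape are assumptions; httpx needs the http2 extra installed):

```python
import asyncio
import time

import httpx  # pip install httpx[http2]

BASE = 'http://localhost:5984/test'  # assumption: a CouchDB-compatible server


async def time_puts(docs):
    async with httpx.AsyncClient(http2=True) as client:
        start = time.perf_counter()
        await asyncio.gather(*(
            client.put(f"{BASE}/{doc['_id']}", json=doc,
                       params={'new_edits': 'false'})
            for doc in docs))
        return time.perf_counter() - start


async def time_bulk_docs(docs):
    async with httpx.AsyncClient(http2=True) as client:
        start = time.perf_counter()
        await client.post(f"{BASE}/_bulk_docs",
                          json={'docs': docs, 'new_edits': False})
        return time.perf_counter() - start
```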

Better batching & using _global_changes

For the 'local' databases, rows are processed as they arrive as write() input, so batching is irrelevant. But for HTTP APIs, the single monster call to _bulk_docs needs to be replaced with multiple smaller ones during a big replication, to prevent the request size from growing too much. Also, we don't want _bulk_docs to wait on _changes() when the latter is hanging just because continuous=true.

Solving the former issue is relatively simple: just break up the input stream based on some size criterion. The latter issue is more interesting. A nice approach might be to remove the continuous parameter from the changes() function, and instead provide the user with an event that notifies them of a database change. That is a little less user-friendly for users other than the replicator, but it could easily be wrapped in a higher-level API and it would add flexibility.
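A minimal sketch of the 'break up the input stream' part, assuming the docs arrive as an async iterator; the size criterion here is a plain doc count, but it could just as well be a byte limit:

```python
async def batched(docs, max_batch=500):
    """Re-chunk an async stream of docs into lists of at most max_batch items,
    so each _bulk_docs request stays a reasonable size."""
    batch = []
    async for doc in docs:
        batch.append(doc)
        if len(batch) >= max_batch:
            yield batch
            batch = []
    if batch:
        yield batch
```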

Additionally, it would allow the HTTPDatabase to listen to the _global_changes endpoint instead of the _changes endpoint (when available). When lots of HTTPDatabases are opened, they could share a single changes listener. It should scale much better. (It's the main trick of spiegel.)
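A sketch of how such sharing could look, using the per-database change event proposed above; fetch_global_changes is a placeholder for the actual _global_changes request, not an existing chairdb function:

```python
import asyncio
from collections import defaultdict


class GlobalChangesListener:
    """One background task watches _global_changes and sets a per-database
    asyncio.Event, so any number of HTTPDatabase objects for the same server
    can share a single listener."""

    def __init__(self, fetch_global_changes):
        self._fetch = fetch_global_changes  # placeholder: yields changed db names
        self._events = defaultdict(asyncio.Event)
        self._task = None

    def event_for(self, db_name):
        if self._task is None:
            self._task = asyncio.ensure_future(self._run())
        return self._events[db_name]

    async def _run(self):
        async for db_name in self._fetch():
            self._events[db_name].set()  # waiters clear it after reading changes
```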

Finally, it might be good to check whether batching is required for _revs_diff too, and to handle timeouts for _changes()/_global_changes().

Implement _purge

Surprisingly easy to do with the current rev tree implementation, and probably useful: it's probably the most requested open feature for PouchDB, so I think it's worth doing.

Better error handling

Make HTTPDatabase retry failed requests. Make sure the replicator bubbles up any persistent errors, while still handling forbidden/unauthorized and similar errors correctly.
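A sketch of that retry policy using httpx; the backoff schedule and the choice of which responses count as retryable are assumptions:

```python
import asyncio

import httpx


async def request_with_retries(client, method, url, attempts=5, **kwargs):
    """Retry transient failures (network errors, 5xx) with exponential backoff;
    return 4xx responses such as unauthorized/forbidden right away so the
    replicator can handle them instead of pointlessly retrying."""
    for attempt in range(attempts):
        try:
            response = await client.request(method, url, **kwargs)
        except httpx.TransportError:
            if attempt == attempts - 1:
                raise
        else:
            if response.status_code < 500:
                return response  # includes 401/403: retrying won't help
            if attempt == attempts - 1:
                return response
        await asyncio.sleep(2 ** attempt)
```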

Make sure SQLDatabase, InMemoryDatabase and HTTPRemote return errors in the same style.

Convert errors to the right JSON errors in chairdb.server.

Finally, count_docs in the replicator currently has a weird check required for skimdb. That's worth investigating further.

sqlite: context manager for DB creation?

Think about when to set up the database schema: in a create() method similar to remote()? Whenever a method is first called? Or legalize the current decision to (mis)use context managers?
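For comparison, a sketch of the context-manager option; the aiosqlite driver and the schema are placeholders, not what sql.py actually uses:

```python
import contextlib

import aiosqlite  # assumption: any async sqlite driver would do

SCHEMA = """
CREATE TABLE IF NOT EXISTS documents (
    id TEXT PRIMARY KEY,
    body TEXT NOT NULL
)
"""  # placeholder schema, not chairdb's


@contextlib.asynccontextmanager
async def create(path):
    """Set up the schema on entry, hand out the database, clean up on exit."""
    db = await aiosqlite.connect(path)
    try:
        await db.executescript(SCHEMA)
        yield db
    finally:
        await db.close()
```

Usage would then be `async with create('db.sqlite3') as db: ...`, which makes the moment the schema is created explicit.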
