Git Product home page Git Product logo

tantivy-object-store's Introduction

Tantivy Object Store

This repo contains an implementation of a tantivy::directory::Directory using an object_store::ObjectStore.

This implementation supports both read and write, but does not support locking or file watch. The index building process is responsible for making sure that there are no concurrent index writers.

A few notable behavior differences from tantivy's directory implementations:

Versioning

Tantivy uses a file called meta.json which is a list of all the files that make up the index, effectively keeping track of a snapshot of the index. However, vanilla tantivy doesn't support versioning, meaning every time we update the index, meta.json is overwritten. This PR allows the caller to set a read_version and write_version. These version numbers are appended to the end of the file name when caller attempts to atomic_read or atomic_write.

Copy on Write (CoW)

When creating a ObjectStoreDirectory, user may set read_version and write_version. read_version is used when user calls atomic_read. Instead of reading meta.json, we will try to read meta.json.{read_version}. Same when user calls atomic_write, we will try to write meta.json.{write_version}

NOTE: The write_version take precedence over read_version. This means, after first write, atomic_read will read from meta.json.{write_version} NOT meta.json.{read_version}. This is needed because tantivy modifies meta.json file quite a few times during indexing, the CoW impl here needs have read-after-write consistency.

Index Reloading

This implementation does not support reloading. If a watch callback is registered, the callback will never be called. User needs to handle reloading via other mechanisms for now.

It maybe possible to use something like object store's native change notification to trigger reload, but that's for future work.

Deletion

Since tantivy attempts to garbage collect and merge index files during indexing, we had to change delete operation to noop. This is because we don't want tantivy to garbage collect files from past versions, as those files maybe in use by other readers. We will implement a garbage collection processes separately.

Threading

This implementation contains a tokio::Runtime for running the IO jobs. This means, when calling functions from this implementation from inside another tokio runtime the caller should always use tokio::task::spawn_blocking so the task can be scheduled on a thread without tokio runtime. (This is needed because nesting tokio runtimes causes panic)

Concurrency Safety

This implementation is concurrency safe within a single instance, as the atomic_read|write mechanism is a lock object in the returned trait object.

Performance

TBD

tantivy-object-store's People

Contributors

albertlockett avatar chebbychefneq avatar eddyxu avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar

tantivy-object-store's Issues

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.