lucene-s3directory

⚠️ EXPERIMENTAL ⚠️

This is a Lucene Directory implementation for AWS S3. It stores indices in S3 buckets instead of the local file system. This is just a proof of concept for now and is not suitable for production use.

Motivation

The project was inspired by Shay Banon (kimchy), creator of Elasticsearch and Compass. It is a direct fork of his JdbcDirectory, which is part of Compass.

Back in 2007, Shay wrote about the idea of Lucene-to-S3 integration in his blog post:

I spent some time trying to have the ability to store Lucene index on Amazon S3 service. Amazon S3 is a really cool idea, and having the ability to store Lucene index on top of it will provide a simple way to allow storing Lucene index in a distributed environment supporting HA. It will also make a lot of sense for applications deployed on Amazon EC2, since working with S3 from EC2 is free.

But back then S3 did not support locking, so he scrapped the implementation:

It would be great if the good people at Amazon would allow for simple locking support. I understand that this is not simple to do in a distributed environment, but it must be there in some form, it will make S3 much a more attractive offer.

Since late 2018, S3 has supported locking. S3Directory places legal hold locks on write.lock files, which is why it uses the AWS Java SDK v2.0.

Getting started

Requirements:

  • Java 1.8+
  • Lucene 7.6+

To build the project:

mvn -DskipTests=true clean install

Usage:

S3Directory dir = new S3Directory("my-lucene-index");
dir.create();

// Use dir in your code wherever you would otherwise use an FSDirectory.

// Finally, close the directory and delete it.
dir.close();
dir.delete();

To run the integration tests, you'll need to have a valid AWS profile configured on your system. The tests will run against the real S3 service on AWS.

Performance

Performance is not great. Each request to AWS carries significant overhead: TLS handshake, signature calculation, and so on. I did my best to optimize the code, but I'm sure it can be optimized further. Contributions are welcome.

S3DirectoryBenchmarkITest.java:

RAMDirectory Time:  312 ms
FSDirectory Time :  136 ms
S3Directory Time : 3846 ms
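
To put the figures above in proportion, here is a minimal sketch that computes each directory's slowdown relative to FSDirectory. The class name `BenchmarkRatios` and the `slowdown` helper are hypothetical and not part of this project; only the three timings are taken from the benchmark output.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of lucene-s3directory.
public class BenchmarkRatios {

    // Returns how many times slower measuredMs is than baselineMs.
    public static double slowdown(long measuredMs, long baselineMs) {
        if (baselineMs <= 0) {
            throw new IllegalArgumentException("baseline must be positive");
        }
        return (double) measuredMs / baselineMs;
    }

    public static void main(String[] args) {
        // Timings copied from the S3DirectoryBenchmarkITest output above.
        Map<String, Long> timesMs = new LinkedHashMap<>();
        timesMs.put("RAMDirectory", 312L);
        timesMs.put("FSDirectory", 136L);
        timesMs.put("S3Directory", 3846L);

        long baseline = timesMs.get("FSDirectory");
        for (Map.Entry<String, Long> e : timesMs.entrySet()) {
            System.out.printf(Locale.ROOT, "%-12s %5d ms  %.1fx FSDirectory%n",
                    e.getKey(), e.getValue(), slowdown(e.getValue(), baseline));
        }
    }
}
```

On these numbers, S3Directory comes out roughly 28 times slower than FSDirectory and roughly 12 times slower than RAMDirectory. This is a single run, so treat the ratios as indicative only.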

License

Apache 2.0

Contributors

albogdano
