Git Product home page Git Product logo

redisearch's Introduction

RediSearch

Full-Text search over redis by RedisLabs

Overview

Redisearch impements a search engine on top of redis, but unlike other redis search libraries, it does not use internal data structures like sorted sets.

Inverted indexes are stored on top of Redis strings using binary encoding, and not mapped to existing data structures (see DESIGN.md).

This allows much faster performance, significantly less memory consumption, and more advanced features like exact phrase matching, that are not possible with traditional redis search approaches.

Primary Features:

  • Full-Text indexing of multiple fields in documents.
  • Incremental indexing without performance loss.
  • Document ranking (provided manually by the user at index time).
  • Field weights.
  • Auto-complete suggestions (with fuzzy prefix suggestions)
  • Exact Phrase Search of up to 8 words.
  • Stemming based query expansion in many languages (using Snowball).
  • Limiting searches to specific document fields (up to 8 fields supported).
  • Numeric filters and ranges.
  • Supports any utf-8 encoded text.
  • Retrieve full document content or just ids
  • Automatically index existing HASH keys as documents.

Not yet supported:

  • Geo filters.
  • NOT queries (foo -bar).
  • Spelling correction
  • Full boolean query syntax
  • Aggregations
  • Deletion and Updating (without full index rebuild)

License: AGPL

Which basically means you can freely use this for your own projects without "virality" to your code, as long as you're not modifying the module itself.

Note About Stability

RediSearch is still under development and can be considered Alpha. While we've tested it extensively with big data-sets and very high workloads, and it is very stable - the API itself may change. You are welcome to use it, but keep in mind future versions might change things.

Internal Design

See DESIGN.md for technical details about the internal design of the module.

Building and running:

git clone https://github.com/RedisLabsModules/RediSearch.git
cd RediSearch/src
make all

# Assuming you have a redis build from the unstable branch:
/path/to/redis-server --loadmodule ./module.so

Quick Guide:

  • Creating an index with fields and weights:
127.0.0.1:6379> FT.CREATE myIdx title TEXT 5.0 body TEXT 1.0 url 1.0
OK 

  • Adding documents to the index:
127.0.0.1:6379> FT.ADD myIdx doc1 1.0 fields title "hello world" body "lorem ipsum" url "http://redis.io" 
OK
  • Searching the index:
127.0.0.1:6379> FT.SEARCH myIdx "hello world" LIMIT 0 10
1) (integer) 1
2) "doc1"
3) 1) "title"
   2) "hello world"
   3) "body"
   4) "lorem ipsum"
   5) "url"
   6) "http://redis.io"

NOTE: Input is expected to be valid utf-8 or ascii. The engine cannot handle wide character unicode at the moment.

  • Dropping the index:
127.0.0.1:6379> FT.DROP myIdx
OK
  • Adding and getting Auto-complete suggestions:
127.0.0.1:6379> FT.SUGADD autocomplete "hello world" 100
OK

127.0.0.1:6379> FT.SUGGET autocomplete "he"
1) "hello world"


Command details

FT.CREATE index field1 weight1 [field2 weight2 ...]

Creates an index with the given spec. The index name will be used in all the key names so keep it short!

Parameters:

- index: the index name to create. If it exists the old spec will be overwritten

- field / weight pairs: pairs of field name and relative weight in scoring. 
The weight is a double, but does not need to be normalized.

Returns:

OK or an error


FT.ADD index docId score [LANGUAGE lang] [NOSAVE] FIELDS ....

Add a documet to the index.

Parameters:

- index: The Fulltext index name. The index must be first created with FT.CREATE

- docId: The document's id that will be returned from searches. 
  Note that the same docId cannot be added twice to the same index

- score: The document's rank based on the user's ranking. This must be between 0.0 and 1.0. 
  If you don't have a score just set it to 1

- NOSAVE: If set to true, we will not save the actual document in the index and only index it.

- FIELDS: Following the FIELDS specifier, we are looking for pairs of <field> <text> to be indexed.

  Each field will be scored based on the index spec given in FT.CREATE. 
  Passing fields that are not in the index spec will make them be stored as part of the document, or ignored if NOSAVE is set 

- LANGUAGE lang: If set, we use a stemmer for the supplied langauge during indexing. Defaults to English. 
  If an unsupported language is sent, the command returns an error. 
  The supported languages are:

  > "arabic",  "danish",    "dutch",   "english",   "finnish",    "french",
  > "german",  "hungarian", "italian", "norwegian", "portuguese", "romanian",
  > "russian", "spanish",   "swedish", "tamil",     "turkish"

Returns

OK on success, or an error if something went wrong.


FT.ADDHASH index docId score [LANGUAGE lang]

Add a documet to the index from an existing HASH key in Redis.

Parameters:

- index: The Fulltext index name. The index must be first created with FT.CREATE

- docId: The document's id. This has to be an existing HASH key in redis that will hold the fields 
  the index needs.

- score: The document's rank based on the user's ranking. This must be between 0.0 and 1.0. 
  If you don't have a score just set it to 1

- LANGUAGE lang: If set, we use a stemmer for the supplied langauge during indexing. Defaults to English. 
  If an unsupported language is sent, the command returns an error. 
  The supported languages are:

  > "arabic",  "danish",    "dutch",   "english",   "finnish",    "french",
  > "german",  "hungarian", "italian", "norwegian", "portuguese", "romanian",
  > "russian", "spanish",   "swedish", "tamil",     "turkish"

Returns

OK on success, or an error if something went wrong.


FT.SEARCH index query [NOCONTENT] [VERBATIM] [LANGUAGE lang] [LIMIT offset num] [INFIELDS num field ...] [FILTER numeric_field min max]

Seach the index with a textual query, returning either documents or just ids.

Parameters:

- index: The Fulltext index name. The index must be first created with FT.CREATE

- query: the text query to search. If it's more than a single word, put it in quotes.
Basic syntax like quotes for exact matching is supported.

- NOCONTENT: If it appears after the query, we only return the document ids and not 
the content. This is useful if rediseach is only an index on an external document collection

- LIMIT fist num: If the parameters appear after the query, we limit the results to 
the offset and number of results given. The default is 0 10

- INFIELDS num field1 field2 ...: If set, filter the results to ones appearing only in specific
fields of the document, like title or url. num is the number of specified field arguments

- FILTER numeric_field min max: If set, and numeric_field is defined as a numeric field in 
FT.CREATE, we will limit results to those having numeric values ranging between min and max.
min and max follow ZRANGE syntax, and can be -inf, +inf and use `(` for exclusive ranges.


- WITHSCORES: If set, we also return the relative internal score of each document. this can be
used to merge results from multiple instances

- VERBATIM if set, we do not try to use stemming for query expansion but search the query terms verbatim.

- LANGUAGE lang: If set, we use a stemmer for the supplied langauge during search for query expansion. 
  Defaults to English. If an unsupported language is sent, the command returns an error.
   
  The supported languages are:

  > "arabic",  "danish",    "dutch",   "english",   "finnish",    "french",
  > "german",  "hungarian", "italian", "norwegian", "portuguese", "romanian",
  > "russian", "spanish",   "swedish", "tamil",     "turkish"

Returns:

Array reply, where the first element is the total number of results, and then pairs of document id, and a nested array of field/value, unless NOCONTENT was given


FT.DROP index

Deletes all the keys associated with the index.

If no other data is on the redis instance, this is equivalent to FLUSHDB, apart from the fact that the index specification is not deleted.

Returns:

Simple String reply - OK on success.


FT.OPTIMIZE index

After the index is built (and doesn't need to be updated again withuot a complete rebuild) we can optimize memory consumption by trimming all index buffers to their actual size.

Warning 1: Do not run it if you intend to update your index afterward.

Warning 2: This blocks redis for a long time. Do not run it on production instances

Returns:

Integer Reply - the number of index entries optimized.


FT.SUGGADD key string score [INCR]

Add a suggestion string to an auto-complete suggestion dictionary. This is disconnected from the index definitions, and leaves creating and updating suggestino dictionaries to the user.

Parameters:

  • key: the suggestion dictionary key.

  • string: the suggestion string we index

  • score: a floating point number of the suggestion string's weight

  • INCR: if set, we increment the existing entry of the suggestion by the given score, instead of replacing the score. This is useful for updating the dictionary based on user queries in real time

Returns:

Integer reply: the current size of the suggestion dictionary.


FT.SUGLEN key

Get the size of an autoc-complete suggestion dictionary

Parameters:

  • key: the suggestion dictionary key.

Returns:

Integer reply: the current size of the suggestion dictionary.


FT.SUGGET key prefix [FUZZY] [MAX num]

Get completion suggestions for a prefix

Parameters:

  • key: the suggestion dictionary key.

  • prefix: the prefix to complete on

  • FUZZY: if set,we do a fuzzy prefix search, including prefixes at levenshtein distance of 1 from the prefix sent

  • MAX num: If set, we limit the results to a maximum of num. (Note: The default is 5, and the number cannot be greater than 10).

  • WITHSCORES: If set, we also return the score of each suggestion. this can be used to merge results from multiple instances

Returns:

Array reply: a list of the top suggestions matching the prefix

TODO

See TODO

Stemming Support

RediSearch supports stemming - that is adding the base form of a word to the index. This allows the query for "going" to also return results for "go" and "gone", for example.

The current stemming support is based on the Snowball stemmer library, which supports most European languages, as well as Arabic and other. We hope to include more languages soon (if you need a specicif langauge support, please open an issue).

For further details see the Snowball Stemmer website.

The following languages are supported arguments to search and indexing commands:

  • arabic
  • danish
  • dutch
  • english
  • finnish
  • french
  • german
  • hungarian
  • italian
  • norwegian
  • portuguese
  • romanian
  • russian
  • spanish
  • swedish
  • tamil
  • turkish

redisearch's People

Contributors

dvirsky avatar pfreixes avatar

Watchers

James Cloos avatar Rishabh Puri avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.