Disclaimer: We do not use Elasticsearch for Algolia's hosted full-text, numerical, and faceted search engine, but we do use it for internal analytics (faceting over billions of log lines generated by our engine, with no full-text search).

Elasticsearch Top-K Plugin

This plugin extends Elasticsearch with a fast, memory-efficient aggregation that approximates the top-k most frequent values of a field. The field can be a string, a number, or a boolean. The plugin registers a new aggregation type (topk).

This plugin is a temporary replacement for #6697.

We love pull-requests!

Prerequisites:

  • Elasticsearch 1.3.0+

Binaries

  • Compiled versions of the plugin are stored in the dist directory.

Why

The default terms aggregation implementations use an amount of memory that grows linearly with the cardinality of the value source they run on. Things get even worse with sub-aggregations, especially memory-intensive ones such as percentiles, cardinality, top_hits, or bucket aggregations. This plugin is based on the Space-Saving algorithm, which tries to detect the most frequent terms with a fixed (configurable) number of counters.
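To make the fixed-counter idea concrete, here is a minimal Python sketch of the Space-Saving algorithm. It is an illustration only, not the plugin's code: the function names and the `capacity` parameter are hypothetical, and the linear scan for the smallest counter is a simplification of what an optimized implementation would do.

```python
def space_saving(stream, capacity):
    """Approximate term frequencies using at most `capacity` counters."""
    counts = {}  # term -> estimated count (may overestimate the true count)
    errors = {}  # term -> upper bound on that overestimation
    for term in stream:
        if term in counts:
            counts[term] += 1
        elif len(counts) < capacity:
            counts[term] = 1
            errors[term] = 0
        else:
            # All counters are taken: evict the term with the smallest count.
            # The newcomer inherits that count, which is why estimates can be
            # too high but never too low.
            victim = min(counts, key=counts.get)
            floor = counts.pop(victim)
            errors.pop(victim)
            counts[term] = floor + 1
            errors[term] = floor
    return counts, errors


def top_k(counts, k):
    """Return the k terms with the highest estimated counts."""
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:k]
```

Memory use depends on `capacity` only, never on the number of distinct terms in the stream. The `errors` map gives a per-term bound on how far an estimate may be above the true count.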

Principle

This plugin uses the StreamSummary data structure provided by the Stream-lib library to compute the top-k values of a field. It retrieves the most frequent terms of a field without loading all of them (and their associated sub-aggregations) into RAM. Merging results across shards and across indices is supported but may reduce accuracy: this is the general trade-off of the algorithm.
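The accuracy trade-off of merging can be seen in a small Python sketch. It assumes, for illustration only, that each shard reports just its own k best terms and that the coordinating node sums the reported counts. This is not StreamSummary's actual merge logic, and the helper names are hypothetical.

```python
from collections import Counter


def shard_top_k(terms, k):
    """Per-shard result truncated to k entries (stand-in for a shard summary)."""
    return dict(Counter(terms).most_common(k))


def merge_top_k(shard_results, k):
    """Sum the counts reported by each shard and keep the k largest."""
    merged = Counter()
    for result in shard_results:
        merged.update(result)
    return merged.most_common(k)


# "y" is the most frequent term overall, but it is only second on each shard.
shard_a = ["x"] * 5 + ["y"] * 4 + ["z"] * 1
shard_b = ["z"] * 5 + ["y"] * 4 + ["x"] * 1

exact = Counter(shard_a + shard_b).most_common(1)
merged = merge_top_k([shard_top_k(shard_a, 1), shard_top_k(shard_b, 1)], 1)
```

Here `exact` names "y" with 8 occurrences, while `merged` never sees "y" at all because neither shard reported it. Keeping more counters per shard than the number of results requested reduces this effect, at the cost of more memory.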

Usage

To build an aggregation that keeps the top-k values of a field, use the following request body:

{
  "aggregations": {
    "<aggregation_name>": {
      "topk": {
        "field": "<field_name>",
        "size": 10
      }
    }
  }
}

For example, to keep the 100 most frequent values of your "ip" field, use:

{
  "aggregations": {
    "top_ips": {
      "topk": {
        "field": "ip",
        "size": 100
      }
    }
  }
}

The response has the following shape:

{
  "aggregations": {
    "top_ips": {
      "buckets": [
        { "key": "1.2.3.4", "doc_count": 62718 },
        { "key": "5.6.7.8", "doc_count": 54233 },
        [...]
        { "key": "1.6.3.8", "doc_count": 12123 }
      ]
    }
  }
}
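
For client code, the request body and the response shape shown above can be handled with a few lines of Python. This is a sketch using only the standard library. The helper names `build_topk_request` and `extract_buckets` are hypothetical, and sending the body to Elasticsearch is left to whichever HTTP client you use.

```python
import json


def build_topk_request(aggregation_name, field, size):
    """Build the request body for a `topk` aggregation."""
    return {
        "aggregations": {
            aggregation_name: {"topk": {"field": field, "size": size}}
        }
    }


def extract_buckets(response, aggregation_name):
    """Return (key, doc_count) pairs from a parsed response."""
    buckets = response["aggregations"][aggregation_name]["buckets"]
    return [(bucket["key"], bucket["doc_count"]) for bucket in buckets]


# Serialized body for the "top_ips" example.
body = json.dumps(build_topk_request("top_ips", "ip", 100))
```

Pass the decoded JSON response to `extract_buckets` with the same aggregation name to get the values and their counts as a list of tuples.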

Setup

Installation

./plugin --url file:///absolute/path/to/elasticsearch-topk-plugin-LATEST.zip --install topk-aggregation

Uninstallation

./plugin --remove topk-aggregation

Contributors

  • redox
  • elpicador
  • nagriar
