Comments (8)

untergeek avatar untergeek commented on August 16, 2024

It's not so much a switch as it is a "remove". Scroll has always been in there, but scan was "true" by default. I'll be removing scan as an option.

from logstash-input-elasticsearch.

djschny avatar djschny commented on August 16, 2024

It's a few things that I can see:

  • remove the scan option
  • change that size is now per overall response and now per shard
  • default to sorting the scroll by _doc which gives previous scan behavior

untergeek avatar untergeek commented on August 16, 2024

default to sorting the scroll by _doc which gives previous scan behavior

I'm not sure that's completely necessary, as Logstash users are accustomed to handling data in an asynchronous manner. If the documents are out of order, what's the problem? But can't that be added to the query body anyway, if desired? Having to auto-merge that into a query would actually complicate things, I think.

change that size is now per overall response and now per shard

The nows are confusing me. Since size certainly was per shard, did you mean that size is now per overall response and not per shard? Where is this change documented? I can't seem to find it.

untergeek avatar untergeek commented on August 16, 2024

Okay, I see this:

Scroll requests have optimizations that make them faster when the sort order is _doc. If you want to iterate over all documents regardless of the order, this is the most efficient option:

So I will add the sort by _doc to the default query, with documentation indicating that it should be added to other queries.
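A minimal sketch of what merging the sort into a user-supplied query could look like, assuming the query arrives as a JSON string. The helper name `with_doc_sort` is hypothetical and is not part of the plugin; it only illustrates adding a `_doc` sort when the query does not already define one.

```ruby
require 'json'

# Hypothetical helper (not the plugin's actual code): parse a query body
# and add a `_doc` sort unless the query already specifies its own sort.
def with_doc_sort(query_json)
  body = JSON.parse(query_json)
  body['sort'] ||= ['_doc']
  body
end

# A query without a sort gains the `_doc` sort.
puts JSON.generate(with_doc_sort('{"query": {"match_all": {}}}'))

# A query that already sorts is left unchanged.
puts JSON.generate(with_doc_sort('{"query": {"match_all": {}}, "sort": ["@timestamp"]}'))
```

Leaving an existing sort untouched keeps the user's intent intact, at the cost of the `_doc` optimization for that query.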

djschny avatar djschny commented on August 16, 2024

I'm not sure that's completely necessary, as Logstash users are accustomed to handling data in an asynchronous manner. If the documents are out of order, what's the problem? But can't that be added to the query body anyway, if desired?

It has nothing to do with ordering; it is for performance (similar to what scan did).

The nows are confusing me. Since size certainly was per shard, did you mean that size is now per overall response and not per shard?

Sorry, typo on my part: the second "now" should be "not".

Where is this change documented? I can't seem to find it.

I was never able to find it either, but that is what I have seen in my own testing.

untergeek avatar untergeek commented on August 16, 2024

I just found it: https://www.elastic.co/guide/en/elasticsearch/reference/5.0/search-uri-request.html

size: The number of hits to return. Defaults to 10.

untergeek avatar untergeek commented on August 16, 2024

Hmmm. It says that in older versions too. Perhaps "per shard" is implied in the older versions.

untergeek avatar untergeek commented on August 16, 2024

Here's the answer from an internal discussion:

SCAN used to be an exception: in all other cases (simple search, regular scrolls), search operations return ${size} documents per page - only SCAN would return ${size}x${num_shards}. So now that SCAN is gone and that we recommend to use the _search/scroll API and to sort by _doc in order to fetch all documents from an index, calls to the search API always return ${size} documents per page.

That means scan was the only case where size was per shard, so changing the documentation here shouldn't be necessary, as it currently says:

  # This allows you to set the maximum number of hits returned per scroll.
  config :size, :validate => :number, :default => 1000
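The page-size rule from the internal discussion above can be sketched as a small calculation. The helper `max_hits_per_page` and the shard count of 5 are assumptions made for illustration only; the size of 1000 is the plugin default shown above.

```ruby
# Hypothetical helper for illustration: the maximum number of hits in one
# page of results. With the removed scan search type, size applied per
# shard; with a regular scroll, size applies to the whole response.
def max_hits_per_page(size, num_shards, scan: false)
  scan ? size * num_shards : size
end

size = 1000      # plugin default for :size
num_shards = 5   # assumed shard count for this example

puts max_hits_per_page(size, num_shards, scan: true)  # old scan behavior
puts max_hits_per_page(size, num_shards)              # regular scroll
```

Under these assumptions, a configuration that previously pulled up to 5000 documents per page with scan would pull up to 1000 per page with a scroll sorted by `_doc`, so size may need raising to keep the same batch size.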
