switch from scan to scroll (logstash-input-elasticsearch issue, closed)

logstash-plugins commented on August 16, 2024

switch from scan to scroll

from logstash-input-elasticsearch.

Comments (8)

untergeek commented on August 16, 2024

It's not so much a switch as it is a "remove". Scroll has always been in there, but scan was "true" by default. I'll be removing scan as an option.

djschny commented on August 16, 2024

It's a few things that I can see:

remove the scan option
change that size is now per overall response and now per shard
default to sorting the scroll by _doc which gives previous scan behavior

untergeek commented on August 16, 2024

> default to sorting the scroll by _doc which gives previous scan behavior

I'm not sure that's completely necessary, as Logstash users are accustomed to handling data in an asynchronous manner. If the documents are out of order, what's the problem? But can't that be added to the query body anyway, if desired? Having to auto-merge that into a query would actually complicate things, I think.

> change that size is now per overall response and now per shard

The two "now"s are confusing me. Since size certainly was per shard, did you mean that size is now per overall response and not per shard? Where is this change documented? I can't seem to find it.

untergeek commented on August 16, 2024

Okay, I see this:

Scroll requests have optimizations that make them faster when the sort order is _doc. If you want to iterate over all documents regardless of the order, this is the most efficient option:

So I will add the sort by _doc to the default query, with documentation indicating that it should be added to other queries.
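
As a rough illustration of what this could look like, here is a minimal Ruby sketch. The constant names, the `sorts_by_doc?` helper, and the `status` field are hypothetical and are not taken from the plugin. The sketch only shows a match-all default query that carries a `_doc` sort, and a custom query that has to include the sort itself to get the same optimization.

```ruby
require "json"

# Hypothetical default query: match everything, sorted by _doc.
DEFAULT_QUERY = '{ "query": { "match_all": {} }, "sort": [ "_doc" ] }'

# A custom query is used as given, so it must carry the sort itself.
# The "status" field is made up for illustration.
CUSTOM_QUERY = JSON.generate(
  "query" => { "term" => { "status" => "error" } },
  "sort"  => ["_doc"]
)

# Hypothetical check: does this query body sort by _doc and nothing else?
def sorts_by_doc?(query_json)
  JSON.parse(query_json)["sort"] == ["_doc"]
end

puts sorts_by_doc?(DEFAULT_QUERY)
puts sorts_by_doc?(CUSTOM_QUERY)
```

A query body without a `sort` key would fail this check, which is the case the documentation note is meant to cover.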

djschny commented on August 16, 2024

> I'm not sure that's completely necessary, as Logstash users are accustomed to handling data in an asynchronous manner. If the documents are out of order, what's the problem? But can't that be added to the query body anyway, if desired?

It has nothing to do with ordering; it is for performance (similar to what scan did).

> The nows are confusing me. Since size certainly was per shard, did you mean that size is now per overall response and not per shard?

Sorry, typo on my part: the second "now" should be a "not".

> Where is this change documented? I can't seem to find it.

I was never able to find it either, but that is what I have seen in my own testing.

untergeek commented on August 16, 2024

I just found it: https://www.elastic.co/guide/en/elasticsearch/reference/5.0/search-uri-request.html

size: The number of hits to return. Defaults to 10.

untergeek commented on August 16, 2024

Hmmm. It says that in older versions too. Perhaps "per shard" is implied in the older versions.

untergeek commented on August 16, 2024

Here's the answer from an internal discussion:

SCAN used to be an exception: in all other cases (simple search, regular scrolls), search operations return ${size} documents per page - only SCAN would return ${size}x${num_shards}. So now that SCAN is gone and that we recommend to use the _search/scroll API and to sort by _doc in order to fetch all documents from an index, calls to the search API always return ${size} documents per page.

That means that scan is the only case where size was per shard. So, changing the documentation here shouldn't be necessary as it currently says:

  # This allows you to set the maximum number of hits returned per scroll.
  config :size, :validate => :number, :default => 1000
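
To make the arithmetic in the quoted explanation concrete, here is a small Ruby sketch of the maximum number of hits per page under each behavior. The two helper names and the shard count of 5 are assumptions chosen for illustration; only the default size of 1000 comes from the config line above.

```ruby
# Old scan search type: size was applied to each shard,
# so one page could hold up to size * num_shards hits.
def scan_max_hits_per_page(size, num_shards)
  size * num_shards
end

# Regular scroll: size applies to the whole response.
def scroll_max_hits_per_page(size)
  size
end

size = 1000      # plugin default quoted above
num_shards = 5   # assumed shard count, for illustration only

puts scan_max_hits_per_page(size, num_shards)  # 5000
puts scroll_max_hits_per_page(size)            # 1000
```

With scan gone, the same `size` setting yields smaller pages on a multi-shard index than it did before, which may be worth keeping in mind when tuning it.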
