Comments (8)
It's not so much a switch as it is a "remove". Scroll has always been in there, but `scan` was "true" by default. I'll be removing `scan` as an option.
from logstash-input-elasticsearch.
It's a few things that I can see:
- remove the `scan` option
- change that `size` is now per overall response and now per shard
- default to sorting the scroll by `_doc`, which gives previous scan behavior
from logstash-input-elasticsearch.
default to sorting the scroll by `_doc`, which gives previous scan behavior
I'm not sure that's completely necessary, as Logstash users are accustomed to handling data in an asynchronous manner. If the documents are out of order, what's the problem? But can't that be added to the query body anyway, if desired? Having to auto-merge that into a query would actually complicate things, I think.
change that `size` is now per overall response and now per shard
The `now`s are confusing me. Since `size` certainly was per shard, did you mean that `size` is now per overall response and not per shard? Where is this change documented? I can't seem to find it.
from logstash-input-elasticsearch.
Okay, I see this:
Scroll requests have optimizations that make them faster when the sort order is _doc. If you want to iterate over all documents regardless of the order, this is the most efficient option:
So I will add the sort by `_doc` to the default query, with documentation indicating that it should be added to other queries.
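As a rough illustration of that plan, the default query could carry the `_doc` sort, while a custom query would need to include it by hand. This is only a sketch: the constant names, the helper method, and the `status` field are hypothetical and are not taken from the plugin's source.

```ruby
require "json"

# Hypothetical default query body: no filter, hits sorted by _doc (index order).
DEFAULT_QUERY = '{ "sort": [ "_doc" ] }'

# Hypothetical custom query: the user adds the same sort explicitly.
# The "status" field is made up for this example.
CUSTOM_QUERY = <<~QUERY
  {
    "query": { "match": { "status": "error" } },
    "sort": [ "_doc" ]
  }
QUERY

# Hypothetical helper: true when the query body sorts by _doc only.
def sorted_by_doc?(query_json)
  JSON.parse(query_json)["sort"] == ["_doc"]
end

puts sorted_by_doc?(DEFAULT_QUERY)
puts sorted_by_doc?(CUSTOM_QUERY)
```

A query without a `sort` key would still run; it would just miss the scroll optimization described in the quoted documentation.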
from logstash-input-elasticsearch.
I'm not sure that's completely necessary, as Logstash users are accustomed to handling data in an asynchronous manner. If the documents are out of order, what's the problem? But can't that be added to the query body anyway, if desired?
It has nothing to do with that; it is for performance (similar to what `scan` did).
The nows are confusing me. Since size certainly was per shard, did you mean that size is now per overall response and not per shard?
Sorry, typo on my part: the second `now` should be a `not`.
Where is this change documented? I can't seem to find it.
I was never able to find it either, but when I've tested those have been my findings.
from logstash-input-elasticsearch.
I just found it: https://www.elastic.co/guide/en/elasticsearch/reference/5.0/search-uri-request.html
`size`: The number of hits to return. Defaults to 10.
from logstash-input-elasticsearch.
Hmmm. It says that in older versions too. Perhaps "per shard" is implied in the older versions.
from logstash-input-elasticsearch.
Here's the answer from an internal discussion:
`SCAN` used to be an exception: in all other cases (simple search, regular scrolls), search operations return ${size} documents per page - only `SCAN` would return ${size}x${num_shards}. So now that SCAN is gone and that we recommend to use the `_search/scroll` API and to sort by `_doc` in order to fetch all documents from an index, calls to the search API always return ${size} documents per page.
That means that `scan` is the only case where `size` was per shard. So, changing the documentation here shouldn't be necessary as it currently says:
# This allows you to set the maximum number of hits returned per scroll.
config :size, :validate => :number, :default => 1000
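The arithmetic behind that explanation can be sketched as follows. The method name and the five-shard figure are made up for illustration; only the `size` default of 1000 comes from the setting quoted above.

```ruby
# Hypothetical helper: upper bound on the hits returned per scroll page.
# With the old scan search type, each shard returned up to `size` hits;
# a regular scroll returns up to `size` hits in total.
def max_hits_per_page(size, num_shards, scan: false)
  scan ? size * num_shards : size
end

size = 1000      # plugin default for :size
num_shards = 5   # assumed shard count for this example

puts max_hits_per_page(size, num_shards, scan: true)  # 5000
puts max_hits_per_page(size, num_shards)              # 1000
```

Under these assumptions, a pipeline that was tuned around scan-sized pages would see pages one fifth as large after the change unless `size` is raised to compensate.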
from logstash-input-elasticsearch.