Comments (12)
@atsushi-matsui I still don't understand. Could you give me one document you would expect to match and one you wouldn't, using your most recent example (thus requiring the feature change)?
I am just trying to confirm the behavior, as it still isn't clear to me how omitting a clause is any different from making that clause a match_all.
from elasticsearch.
@atsushi-matsui for your docs, what mapping is configured? Please include any custom analyzers.
Thank you for your patience :). Excluding vs. including vs. match_none vs. match_all is tricky to reason about.
from elasticsearch.
Pinging @elastic/es-search (Team:Search)
from elasticsearch.
> Stop words are excluded by the token filter, so we expect zero hits, but all hits are returned

I don't understand this, @atsushi-matsui. Omitting a clause is the same as having that clause match all docs.
In your first example, it seems the following would work fine:
{
"query": {
"bool": {
"must": [
{
"match": {
"title": {
"query": "Quick",
"zero_terms_query": "all"
}
}
},
{
"match": {
"title": {
"query": "the",
"zero_terms_query": "all"
}
}
},
{
"match": {
"title": {
"query": "Brown",
"zero_terms_query": "all"
}
}
},
{
"match": {
"title": {
"query": "Fox",
"zero_terms_query": "all"
}
}
}
]
}
}
}
Then in your second example, omitting BOTH clauses (which is what would happen in this case) is exactly the same as a match_all query. Consider the query:
{"query": {"bool": {"must": []}}}
That is exactly the same as a match_all query.
from elasticsearch.
@benwtrent
Thanks for the reply!!!
> Then in your second example, omitting BOTH clauses (which is what would happen in this case), is the exact same as a match_all query. Consider the query:

I understand that the second example is equivalent to match_all, but there are cases where we want to omit the clause, so I'll show you another example.
When building a search system with Elasticsearch in Japan, it is common to set up both a kuromoji analyzer and a 2-gram analyzer.
Here is an example of the settings.
{
"settings": {
"analysis": {
"tokenizer": {
"kuromoji_tokenizer": {
"type": "kuromoji_tokenizer",
"mode": "search"
},
"ngram_tokenizer": {
"type": "ngram",
"min_gram": 2,
"max_gram": 2,
"token_chars": ["letter", "digit"]
}
},
"analyzer": {
"kuromoji_analyzer": {
"type": "custom",
"tokenizer": "kuromoji_tokenizer",
"filter": [
"kuromoji_baseform",
"kuromoji_part_of_speech",
"cjk_width",
"stop",
"kuromoji_stemmer",
"lowercase"
]
},
"ngram_analyzer": {
"type": "custom",
"tokenizer": "ngram_tokenizer",
"filter": ["lowercase"]
}
}
}
},
"mappings": {
"properties": {
"text_ja": {
"type": "text",
"analyzer": "kuromoji_analyzer"
},
"text_cjk": {
"type": "text",
"analyzer": "ngram_analyzer"
}
}
}
}
In Japan, it is common to search by entering words separated by spaces, so we can build a bool query that treats each space-separated word as a phrase clause.
When searching for the anime "遊☆戯☆王", users sometimes enter "遊 ☆ 戯 ☆ 王" separated by spaces.
In that case, if the query targets both text_ja and text_cjk with zero_terms_query set to all, every document matches, which is not a user-friendly result.
{
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "遊",
"fields": ["text_ja", "text_cjk"],
"type": "phrase",
"zero_terms_query": "all"
}
},
{
"multi_match": {
"query": "☆",
"fields": ["text_ja", "text_cjk"],
"type": "phrase",
"zero_terms_query": "all"
}
},
{
"multi_match": {
"query": "戯",
"fields": ["text_ja", "text_cjk"],
"type": "phrase",
"zero_terms_query": "all"
}
},
{
"multi_match": {
"query": "☆",
"fields": ["text_ja", "text_cjk"],
"type": "phrase",
"zero_terms_query": "all"
}
},
{
"multi_match": {
"query": "王",
"fields": ["text_ja", "text_cjk"],
"type": "phrase",
"zero_terms_query": "all"
}
}
]
}
}
}
If we omit the "☆" clauses from the search, we can find the work "遊☆戯☆王".
Omitting "☆" is equivalent to removing the "☆" clauses and setting zero_terms_query to none, as shown below.
{
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "遊",
"fields": ["text_ja", "text_cjk"],
"type": "phrase",
"zero_terms_query": "none"
}
},
{
"multi_match": {
"query": "戯",
"fields": ["text_ja", "text_cjk"],
"type": "phrase",
"zero_terms_query": "none"
}
},
{
"multi_match": {
"query": "王",
"fields": ["text_ja", "text_cjk"],
"type": "phrase",
"zero_terms_query": "none"
}
}
]
}
}
}
Therefore, I would like the bool query to have an option that omits such clauses.
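A client-side approximation of the requested behavior can be sketched in Python as follows. This is only an illustration under assumptions: `build_bool_query` and `produces_tokens` are hypothetical names, and the toy predicate stands in for a real check (for example, one backed by the `_analyze` API for the target fields). Returning `match_none` when every clause is dropped is a design choice of this sketch, not existing Elasticsearch behavior.

```python
from typing import Callable


def build_bool_query(
    user_input: str,
    fields: list[str],
    produces_tokens: Callable[[str], bool],
) -> dict:
    """Build a bool/must query, omitting terms that analyze to zero tokens.

    `produces_tokens` is a hypothetical callback that reports whether a term
    yields at least one token in any of the target fields.
    """
    must = []
    for term in user_input.split():
        if not produces_tokens(term):
            continue  # omit the clause instead of turning it into match_all
        must.append(
            {
                "multi_match": {
                    "query": term,
                    "fields": fields,
                    "type": "phrase",
                    "zero_terms_query": "none",
                }
            }
        )
    if not must:
        # Assumption: an input with no usable terms should return zero hits.
        return {"query": {"match_none": {}}}
    return {"query": {"bool": {"must": must}}}


def toy_produces_tokens(term: str) -> bool:
    """Toy stand-in: treat terms made only of symbols as producing no tokens."""
    return any(ch.isalnum() for ch in term)


query = build_bool_query("遊 ☆ 戯 ☆ 王", ["text_ja", "text_cjk"], toy_produces_tokens)
print([clause["multi_match"]["query"] for clause in query["query"]["bool"]["must"]])
```

With the toy predicate, the "☆" terms are skipped and the remaining clauses match the query shown above for "遊", "戯", and "王".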
from elasticsearch.
The organization I work for is actually facing this problem.
Even if my proposal is not accepted, I would appreciate it if you could let me know if there is another solution!
from elasticsearch.
@benwtrent
I'm sorry that the issue is difficult to understand.
I will try my best to convey it as accurately as possible.
Index the documents listed below.
If a user looking for "遊☆戯☆王" enters "遊 ☆", the search system should return only the document in Example 2-1.
If zero_terms_query is set to "all" as in Example 1-1, all documents are returned, which is not the desired result.
The likely cause is that text_cjk uses a 2-gram analyzer, so the query yields no tokens there and is turned into match_all.
If zero_terms_query is set to "none" as in Example 1-2, there are 0 hits, which is also not the desired result.
The likely cause is that the query yields 0 tokens in text_cjk.
In such a case, the document in Example 2-1 could be retrieved by omitting the "☆" clause, for which the analyzer produces 0 tokens.
In other words, the search would be performed only with "遊" against text_ja, the field where it is valid.
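Why a single character yields 0 tokens in text_cjk can be illustrated with a small Python sketch. It is a simplified model of a tokenizer with min_gram=2 and max_gram=2, not the actual Lucene implementation; `bigrams` is a hypothetical helper, and the sketch assumes no `token_chars` restriction.

```python
def bigrams(text: str) -> list[str]:
    """Return every 2-character window of `text` (simplified 2-gram model)."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


# A one-character input has no 2-character window, so it yields no tokens.
for query in ["遊", "☆", "遊☆戯☆王"]:
    print(query, bigrams(query))
```

Under this model, "遊" and "☆" each produce an empty token list, which is the situation where zero_terms_query decides between matching everything and matching nothing.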
# queries
### Example 1-1
{
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "遊",
"fields": ["text_ja", "text_cjk"],
"type": "phrase",
"zero_terms_query": "all"
}
},
{
"multi_match": {
"query": "☆",
"fields": ["text_ja", "text_cjk"],
"type": "phrase",
"zero_terms_query": "all"
}
}
]
}
}
}
### Example 1-2
{
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "遊",
"fields": ["text_ja", "text_cjk"],
"type": "phrase",
"zero_terms_query": "none"
}
},
{
"multi_match": {
"query": "☆",
"fields": ["text_ja", "text_cjk"],
"type": "phrase",
"zero_terms_query": "none"
}
}
]
}
}
}
# documents
### Example 2-1
{
"text_ja": "遊☆戯☆王",
"text_cjk": "遊☆戯☆王",
"release_date": "2023-01-01",
"views": 123
}
### Example 2-2
{
"text_ja": "ドラゴンボール",
"text_cjk": "ドラゴンボール",
"release_date": "2023-01-01",
"views": 123
}
### Example 2-3
{
"text_ja": "ナルト",
"text_cjk": "ナルト",
"release_date": "2023-01-01",
"views": 123
}
from elasticsearch.
If you pass "遊 ☆" to query_string as shown below, the search appears to be executed only for "遊".
Although query_string does not expose a zero_terms_query option, the source code suggests that "☆" is omitted because zero_terms_query is set to null internally.
I would like the bool query to provide a similar option.
{
"query": {
"query_string": {
"query": "遊 ☆",
"default_operator": "AND",
"fields": ["text_ja", "text_cjk"],
"type": "phrase"
}
}
}
from elasticsearch.
> for your docs, what mapping is configured? Please include any custom analyzers.

These are the settings I used to verify the behavior.
{
"settings": {
"analysis": {
"tokenizer": {
"kuromoji_tokenizer": {
"type": "kuromoji_tokenizer",
"mode": "normal"
},
"ngram_tokenizer": {
"type": "ngram",
"min_gram": 2,
"max_gram": 2
}
},
"analyzer": {
"kuromoji_analyzer": {
"type": "custom",
"tokenizer": "kuromoji_tokenizer",
"filter": [
"kuromoji_stemmer",
"lowercase"
]
},
"ngram_analyzer": {
"type": "custom",
"tokenizer": "ngram_tokenizer",
"filter": ["lowercase"]
}
}
}
},
"mappings": {
"properties": {
"text_ja": {
"type": "text",
"analyzer": "kuromoji_analyzer"
},
"text_cjk": {
"type": "text",
"analyzer": "ngram_analyzer"
}
}
}
}
from elasticsearch.
I created a verification environment, so please use it if you like.
https://github.com/atsushi-matsui/sample-elastic
from elasticsearch.
Hi, @benwtrent.
I would like to know if there is any progress.
from elasticsearch.
Pinging @elastic/es-search-relevance (Team:Search Relevance)
from elasticsearch.