Omit noMatchDocs in a bool query about elasticsearch HOT 12 OPEN

atsushi-matsui commented on August 21, 2024

Omit noMatchDocs in a bool query

from elasticsearch.

Comments (12)

benwtrent commented on August 21, 2024 1

@atsushi-matsui I still don't understand. With your most recent example, could you give me one document you would expect to match and one you would expect not to match (thus requiring the feature change)?

I am just trying to confirm the behavior, as it still isn't clear to me how omitting a clause differs from making that clause a match_all.


benwtrent commented on August 21, 2024 1

@atsushi-matsui what mapping is configured for your docs? Please include any custom analyzers.

Thank you for your patience :). Excluding vs. including vs. match_none vs. match_all is tricky to reason about.


elasticsearchmachine commented on August 21, 2024

Pinging @elastic/es-search (Team:Search)


benwtrent commented on August 21, 2024

> Stop words are excluded by the token filter, so we expect zero hits, but all hits are returned

I don't understand this, @atsushi-matsui. Omitting a clause is the same as having that clause match all docs.

In your first example, it seems the following would work fine:

{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "title": {
              "query": "Quick",
              "zero_terms_query": "all"
            }
          }
        },
        {
          "match": {
            "title": {
              "query": "the",
              "zero_terms_query": "all"
            }
          }
        },
        {
          "match": {
            "title": {
              "query": "Brown",
              "zero_terms_query": "all"
            }
          }
        },
        {
          "match": {
            "title": {
              "query": "Fox",
              "zero_terms_query": "all"
            }
          }
        }
      ]
    }
  }
}

Then in your second example, omitting BOTH clauses (which is what would happen in this case), is the exact same as a match_all query. Consider the query:

"query": {"bool": {"must": []}}

That is the exact same as a match_all query.
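
To make the equivalence concrete, here is a minimal sketch in Python. It is a toy model of how `must` combines clauses, not Elasticsearch code, and the function name `bool_must` is hypothetical. Each clause is represented only by the set of document IDs it matches.

```python
from typing import Iterable, List, Set


def bool_must(all_doc_ids: Iterable[int], clause_hits: List[Set[int]]) -> Set[int]:
    """Toy model of bool/must: intersect the hits of every clause."""
    result = set(all_doc_ids)  # start from every document in the index
    for hits in clause_hits:
        result &= hits  # each clause can only narrow the result
    return result


docs = {1, 2, 3}
print(bool_must(docs, [{1, 2}, {2, 3}]))  # only doc 2 satisfies both clauses
print(bool_must(docs, []))                # no clauses: every document remains
```

With no clauses there is nothing to narrow the result, so the empty `must` behaves like match_all in this model.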


atsushi-matsui commented on August 21, 2024

@benwtrent
Thanks for the reply!!!

> Then in your second example, omitting BOTH clauses (which is what would happen in this case), is the exact same as a match_all query. Consider the query:

I understand that the second example is equivalent to match_all, but there are cases where we want to omit the clause, so I'll show you another example.

When building a search system with Elasticsearch for Japanese text, it is common to set up both a kuromoji analyzer and a 2-gram analyzer.
Here are example settings.

{
  "settings": {
    "analysis": {
      "tokenizer": {
        "kuromoji_tokenizer": {
          "type": "kuromoji_tokenizer",
          "mode": "search"
        },
        "ngram_tokenizer": {
          "type": "ngram",
          "min_gram": 2,
          "max_gram": 2,
          "token_chars": ["letter", "digit"]
        }
      },
      "analyzer": {
        "kuromoji_analyzer": {
          "type": "custom",
          "tokenizer": "kuromoji_tokenizer",
          "filter": [
            "kuromoji_baseform",
            "kuromoji_part_of_speech",
            "cjk_width",
            "stop",
            "kuromoji_stemmer",
            "lowercase"
          ]
        },
        "ngram_analyzer": {
          "type": "custom",
          "tokenizer": "ngram_tokenizer",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "text_ja": {
        "type": "text",
        "analyzer": "kuromoji_analyzer"
      },
      "text_cjk": {
        "type": "text",
        "analyzer": "ngram_analyzer"
      }
    }
  }
}

In Japan, users commonly enter search phrases separated by spaces, so we build a bool query with one phrase clause per space-separated word.
For example, a user searching for the anime "遊☆戯☆王" may enter "遊 ☆ 戯 ☆ 王", with each character separated by a space.
If we then search both text_ja and text_cjk with zero_terms_query set to all, every document matches, which is not a useful result for the user.

{
    "query": {
      "bool": {
        "must": [
          {
            "multi_match": {
              "query": "遊",
              "fields": ["text_ja", "text_cjk"],
              "type": "phrase",
              "zero_terms_query": "all"
            }
          },
          {
            "multi_match": {
              "query": "☆",
              "fields": ["text_ja", "text_cjk"],
              "type": "phrase",
              "zero_terms_query": "all"
            }
          },
          {
            "multi_match": {
              "query": "戯",
              "fields": ["text_ja", "text_cjk"],
              "type": "phrase",
              "zero_terms_query": "all"
            }
          },
          {
            "multi_match": {
              "query": "☆",
              "fields": ["text_ja", "text_cjk"],
              "type": "phrase",
              "zero_terms_query": "all"
            }
          },
          {
            "multi_match": {
              "query": "王",
              "fields": ["text_ja", "text_cjk"],
              "type": "phrase",
              "zero_terms_query": "all"
            }
          }
        ]
      }
    }
  }
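
For readers who want to see how such a request body might be assembled on the application side, here is a minimal sketch in Python. The function name `build_bool_query` is hypothetical and not part of any Elasticsearch client; it only builds the JSON structure shown above.

```python
from typing import Any, Dict, List


def build_bool_query(user_input: str, fields: List[str],
                     zero_terms_query: str = "all") -> Dict[str, Any]:
    """Build one multi_match phrase clause per space-separated term."""
    # str.split() with no arguments splits on any Unicode whitespace,
    # including the full-width space (U+3000) common in Japanese input.
    terms = user_input.split()
    clauses = [
        {
            "multi_match": {
                "query": term,
                "fields": fields,
                "type": "phrase",
                "zero_terms_query": zero_terms_query,
            }
        }
        for term in terms
    ]
    return {"query": {"bool": {"must": clauses}}}


query = build_bool_query("遊 ☆ 戯 ☆ 王", ["text_ja", "text_cjk"])
print(len(query["query"]["bool"]["must"]))  # one clause per term, 5 in total
```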

If we omit the "☆" clauses from the search, we can find "遊☆戯☆王".
Omitting "☆" is equivalent to removing the "☆" clauses and setting zero_terms_query to none, as shown below.

{
    "query": {
      "bool": {
        "must": [
          {
            "multi_match": {
              "query": "遊",
              "fields": ["text_ja", "text_cjk"],
              "type": "phrase",
              "zero_terms_query": "none"
            }
          },
          {
            "multi_match": {
              "query": "戯",
              "fields": ["text_ja", "text_cjk"],
              "type": "phrase",
              "zero_terms_query": "none"
            }
          },
          {
            "multi_match": {
              "query": "王",
              "fields": ["text_ja", "text_cjk"],
              "type": "phrase",
              "zero_terms_query": "none"
            }
          }
        ]
      }
    }
  }

Therefore, I would like the bool query to have an option that omits such clauses.
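
Until such an option exists, one possible client-side approach is to drop the empty clauses before sending the query. The following is only a sketch under stated assumptions: `build_query_omitting_empty` and `toy_count_tokens` are hypothetical names, and `toy_count_tokens` is a crude stand-in for the real analyzers (in a real system the token count could perhaps be obtained from the `_analyze` API instead).

```python
from typing import Any, Callable, Dict, List


def build_query_omitting_empty(
    terms: List[str],
    fields: List[str],
    count_tokens: Callable[[str, str], int],
) -> Dict[str, Any]:
    """Build a bool query, skipping fields and clauses that yield no tokens."""
    must = []
    for term in terms:
        # Keep only the fields whose analyzer yields at least one token.
        live_fields = [f for f in fields if count_tokens(f, term) > 0]
        if not live_fields:
            continue  # no field produced a token: omit the whole clause
        must.append({
            "multi_match": {
                "query": term,
                "fields": live_fields,
                "type": "phrase",
            }
        })
    return {"query": {"bool": {"must": must}}}


def toy_count_tokens(field: str, text: str) -> int:
    """Assumption for illustration only; NOT kuromoji or the real ngram tokenizer."""
    if field == "text_cjk":
        return max(len(text) - 1, 0)  # number of 2-character windows
    return sum(1 for c in text if c.isalnum())  # one token per letter, symbols dropped


query = build_query_omitting_empty(["遊", "☆"], ["text_ja", "text_cjk"], toy_count_tokens)
print(query)
```

Note that if every term is dropped, the resulting `must` list is empty, which matches all documents, so a caller would probably want to handle that case separately.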


atsushi-matsui commented on August 21, 2024

The organization I work for is actually facing this problem.
Even if my proposal is not accepted, I would appreciate it if you could let me know if there is another solution!


atsushi-matsui commented on August 21, 2024

@benwtrent
I'm sorry that the issue is difficult to understand.
I will try my best to convey it as accurately as possible.

Index the following documents.
If a user searching for "遊☆戯☆王" enters "遊 ☆", the search system should return only the document in Example 2-1.
If zero_terms_query is set to "all" as in Example 1-1, all documents are returned, which is not the desired result.
The likely cause is that the 2-gram analyzer on text_cjk produces no tokens for a single character, so the clause on that field becomes a match_all.
If zero_terms_query is set to "none" as in Example 1-2, there are 0 hits, which is also not the desired result.
The likely cause is again that text_cjk produces 0 tokens.
In this case, the document in Example 2-1 could be returned by omitting the clause for "☆", for which the analyzer produces 0 tokens.
In other words, the search would be performed using only the valid "遊" clause on text_ja.
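
The zero-token behavior of a 2-gram tokenizer on single-character input can be illustrated with a small Python sketch. This is a simplified model, not the Elasticsearch ngram tokenizer itself (it ignores token_chars and filters), and `char_ngrams` is a hypothetical helper name.

```python
from typing import List


def char_ngrams(text: str, n: int = 2) -> List[str]:
    """Return every contiguous n-character window of text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


print(char_ngrams("☆"))          # a single character is too short for any 2-gram
print(char_ngrams("遊☆戯☆王"))   # the indexed title yields four 2-grams
```

Because both "遊" and "☆" are single characters, neither produces a token under this model, which is why zero_terms_query decides the outcome for text_cjk.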

# queries
### Example 1-1
{
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "query": "遊",
            "fields": ["text_ja", "text_cjk"],
            "type": "phrase",
            "zero_terms_query": "all"
          }
        },
        {
          "multi_match": {
            "query": "☆",
            "fields": ["text_ja", "text_cjk"],
            "type": "phrase",
            "zero_terms_query": "all"
          }
        }
      ]
    }
  }
}

### Example 1-2
{
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "query": "遊",
            "fields": ["text_ja", "text_cjk"],
            "type": "phrase",
            "zero_terms_query": "none"
          }
        },
        {
          "multi_match": {
            "query": "☆",
            "fields": ["text_ja", "text_cjk"],
            "type": "phrase",
            "zero_terms_query": "none"
          }
        }
      ]
    }
  }
}

# documents
### Example 2-1
{
  "text_ja": "遊☆戯☆王",
  "text_cjk": "遊☆戯☆王",
  "release_date": "2023-01-01",
  "views": 123
}

### Example 2-2
{
  "text_ja": "ドラゴンボール",
  "text_cjk": "ドラゴンボール",
  "release_date": "2023-01-01",
  "views": 123
}

### Example 2-3
{
  "text_ja": "ナルト",
  "text_cjk": "ナルト",
  "release_date": "2023-01-01",
  "views": 123
}
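
To compare the three behaviors on these documents, here is a toy simulation in Python. Everything in it is an assumption made for illustration: the `analyze` function is a crude stand-in (not kuromoji or the real ngram tokenizer), phrase matching is approximated by token containment, and the "omit" mode is the hypothetical option requested in this issue, not an existing Elasticsearch setting.

```python
from typing import List, Optional, Set

DOCS = {1: "遊☆戯☆王", 2: "ドラゴンボール", 3: "ナルト"}
FIELDS = ["text_ja", "text_cjk"]


def analyze(field: str, text: str) -> List[str]:
    """Toy analyzers (assumption): 2-grams for text_cjk, letters only for text_ja."""
    if field == "text_cjk":
        return [text[i:i + 2] for i in range(len(text) - 1)]
    return [c for c in text if c.isalnum()]  # symbols such as "☆" are dropped


def field_match(field: str, term: str, mode: str) -> Optional[Set[int]]:
    tokens = analyze(field, term)
    if not tokens:
        if mode == "all":
            return set(DOCS)   # zero tokens: match every document
        if mode == "none":
            return set()       # zero tokens: match nothing
        return None            # hypothetical "omit": skip this field
    return {d for d, text in DOCS.items()
            if all(t in analyze(field, text) for t in tokens)}


def search(terms: List[str], mode: str) -> Set[int]:
    hits = set(DOCS)
    for term in terms:
        per_field = [field_match(f, term, mode) for f in FIELDS]
        live = [m for m in per_field if m is not None]
        if not live:
            continue  # every field was omitted, so the clause is dropped
        hits &= set().union(*live)  # multi_match: a hit in any field counts
    return hits


for mode in ("all", "none", "omit"):
    print(mode, sorted(search(["遊", "☆"], mode)))
```

Under these assumptions, "all" returns all three documents, "none" returns nothing, and "omit" returns only document 1 (Example 2-1), which is the result described above as desired.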


atsushi-matsui commented on August 21, 2024

If you pass "遊 ☆" to query_string as shown below, the search appears to be executed only for "遊".
query_string does not expose a zero_terms_query option, but from the source code it appears that "☆" is omitted because zero_terms_query is set to null.
I would like the bool query to provide a similar option.

{
  "query": {
    "query_string": {
      "query": "遊 ☆",
      "default_operator": "AND",
      "fields": ["text_ja", "text_cjk"], 
      "type": "phrase"
    }
  }
}


atsushi-matsui commented on August 21, 2024

@benwtrent

> for your docs, what is the mapping configured? including any custom analyzers please.

These are the settings I used to verify the behavior.

{
  "settings": {
    "analysis": {
      "tokenizer": {
        "kuromoji_tokenizer": {
          "type": "kuromoji_tokenizer",
          "mode": "normal"
        },
        "ngram_tokenizer": {
          "type": "ngram",
          "min_gram": 2,
          "max_gram": 2
        }
      },
      "analyzer": {
        "kuromoji_analyzer": {
          "type": "custom",
          "tokenizer": "kuromoji_tokenizer",
          "filter": [
            "kuromoji_stemmer",
            "lowercase"
          ]
        },
        "ngram_analyzer": {
          "type": "custom",
          "tokenizer": "ngram_tokenizer",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "text_ja": {
        "type": "text",
        "analyzer": "kuromoji_analyzer"
      },
      "text_cjk": {
        "type": "text",
        "analyzer": "ngram_analyzer"
      }
    }
  }
}


atsushi-matsui commented on August 21, 2024

I created a verification environment, so please use it if you like.
https://github.com/atsushi-matsui/sample-elastic


atsushi-matsui commented on August 21, 2024

Hi, @benwtrent.
I would like to know if there is any progress.


elasticsearchmachine commented on August 21, 2024

Pinging @elastic/es-search-relevance (Team:Search Relevance)
