Comments (2)
We introduced a different way to do faceting in v26 which might have caused a regression here. Will investigate and update.
from typesense.
Typesense supports two strategies for efficient faceting, and has some built-in heuristics to pick the right strategy for you by default. In the 27.0.rc11 RC build, we have introduced a new query parameter called facet_strategy that lets you configure the faceting behavior.
To fix the issue you are facing, please send this additional search parameter:
"facet_strategy": "exhaustive"
This will force Typesense to compute facets in an exhaustive manner, which allows total_values to be exact.
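As a minimal sketch of where the parameter goes, the snippet below builds a search request body in Python. The helper name build_search_params and the field names title and brand are hypothetical and only for illustration; q, query_by and facet_by are the usual Typesense search parameters, and facet_strategy is the new one described above.

```python
def build_search_params(query, query_by, facet_by, facet_strategy="exhaustive"):
    """Build a dict of search parameters that forces a faceting strategy.

    Hypothetical helper for illustration; it only assembles the parameters
    and does not talk to a Typesense server.
    """
    return {
        "q": query,
        "query_by": query_by,
        "facet_by": facet_by,
        # Force exhaustive faceting so that total_values is exact.
        "facet_strategy": facet_strategy,
    }


# "title" and "brand" are assumed field names, not taken from the issue.
params = build_search_params("*", "title", "brand")
```

The resulting dict can be sent as the search parameters of a regular search request.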
Additional details
If you find that faceting is slow for your query patterns and the shape of your dataset, you can use the facet_strategy parameter to override the default strategy and fine-tune performance.
The valid values for this parameter are exhaustive, top_values and automatic.
exhaustive - in this strategy, once we have the list of matching documents, we’ll simply iterate through each document’s facet_by fields, and sum up the number of documents for each unique facet value. This is effective when the number of matching documents is small (fewer than a few tens of thousands of docs) and/or when the number of facet values requested (as defined by max_facet_values) is large.
top_values - in this strategy, once we have the list of matching documents, we’ll look up each facet field’s value in a reverse index that stores a mapping of {facet_field_value => [list of all documents that have this value]}. We’ll then find the intersection of these two lists of documents (the list of matching documents and the list of all documents that have this facet field value), and the length of the intersected list will give us the facet count. This strategy is efficient if we have a large number of hits, since we only have to do intersections on the top facet values (the values that have the largest number of documents in the reverse index). However, if the number of facet values to fetch (as configured by max_facet_values) is sufficiently large and the number of hits is small, then this strategy becomes less efficient compared to the exhaustive strategy. Another downside of this approach is that it will not return an exact count for total_values in the facet stats, because we only consider a limited number of facet values for the facet count intersections.
automatic - Typesense will pick an ideal strategy based on the heuristics described above. This is the default value for this parameter.
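To make the difference between the two counting approaches concrete, here is a toy Python model of both. It is only an illustration of the ideas described above, not Typesense's actual implementation; the data, the function names and the candidates cutoff are all made up for the example.

```python
from collections import Counter

# Hypothetical toy data: document id -> value of a single facet field.
docs = {1: "red", 2: "red", 3: "blue", 4: "green", 5: "blue", 6: "red"}
matching = {2, 3, 4}  # ids of the documents that matched the query


def exhaustive_counts(docs, matching):
    """Walk every matching document and tally its facet value."""
    return Counter(docs[doc_id] for doc_id in matching)


def top_values_counts(docs, matching, candidates):
    """Intersect the hits with the posting lists of the largest facet values."""
    # Reverse index: facet value -> set of all documents that have this value.
    reverse_index = {}
    for doc_id, value in docs.items():
        reverse_index.setdefault(value, set()).add(doc_id)
    # Only the `candidates` values with the longest posting lists are checked.
    top = sorted(reverse_index, key=lambda v: len(reverse_index[v]), reverse=True)
    counts = {v: len(reverse_index[v] & matching) for v in top[:candidates]}
    return {v: c for v, c in counts.items() if c > 0}


exact = exhaustive_counts(docs, matching)
approx = top_values_counts(docs, matching, candidates=2)
```

In this toy run the exhaustive pass sees all three facet values among the hits, while the intersection pass only considers the two values with the longest posting lists and never looks at green. That is the same reason total_values can be inexact with top_values.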
You can specify a strategy for all facet fields in the query via:
"facet_strategy": "exhaustive"
or you can specify a different strategy for each field by using a comma-separated list of strategies that matches the order of the field names in facet_by. For example, if you have facet_by: field1, field2, field3 and facet_strategy: automatic, exhaustive, top_values, then field1 will use the automatic mode, field2 will use the exhaustive mode and field3 will use the top_values mode.
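The positional pairing can be sketched in a few lines of Python. The helper strategies_by_field is hypothetical and only mirrors the rule described above; how Typesense itself reacts to a list whose length does not match facet_by is not stated here, so raising an error in that case is an assumption of this sketch.

```python
def strategies_by_field(facet_by, facet_strategy):
    """Pair each facet_by field with the strategy at the same position."""
    fields = [f.strip() for f in facet_by.split(",")]
    strategies = [s.strip() for s in facet_strategy.split(",")]
    # A single strategy applies to all facet fields in the query.
    if len(strategies) == 1:
        strategies = strategies * len(fields)
    # Assumption of this sketch: a length mismatch is treated as an error.
    if len(strategies) != len(fields):
        raise ValueError("facet_strategy must list one strategy per facet_by field")
    return dict(zip(fields, strategies))


per_field = strategies_by_field(
    "field1, field2, field3", "automatic, exhaustive, top_values"
)
same_for_all = strategies_by_field("field1, field2", "exhaustive")
```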
from typesense.