

kishorenc commented on May 24, 2024

We introduced a different way to do faceting in v26 which might have caused a regression here. Will investigate and update.


kishorenc commented on May 24, 2024

Typesense supports two strategies for efficient faceting and has built-in heuristics that pick the right one for you by default. In the 27.0.rc11 RC build, we introduced a new query parameter called facet_strategy that lets you configure the faceting behavior.

To fix the issue you are facing, please send this additional search parameter:

"facet_strategy": "exhaustive"

This forces Typesense to compute facets exhaustively, which makes total_values exact.
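
As a rough illustration, the parameter sits alongside the other search parameters. The sketch below only builds the request URL for the documents search endpoint and sends nothing. The host, the collection name `products`, and the `q`, `query_by` and `facet_by` values are made-up placeholders; only `facet_strategy` comes from the comment above.

```python
from urllib.parse import urlencode

# Hypothetical search parameters. Only facet_strategy is taken from the thread;
# the other values are placeholders for illustration.
search_parameters = {
    "q": "*",
    "query_by": "title",
    "facet_by": "brand",
    "facet_strategy": "exhaustive",  # ask for exhaustive (exact) facet counting
}


def build_search_url(base_url: str, collection: str, params: dict) -> str:
    """Return a GET URL for the documents search endpoint of one collection."""
    query_string = urlencode(params)
    return f"{base_url}/collections/{collection}/documents/search?{query_string}"


# The host and collection name are assumptions, not values from the thread.
url = build_search_url("http://localhost:8108", "products", search_parameters)
print(url)
```

If you use an official client library instead, the same key/value pair would go into whatever search parameters object that client accepts.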

Additional details

If faceting is slow for your query patterns and the shape of your dataset, you can use the facet_strategy parameter to override the default strategy and fine-tune performance.

The valid values for this parameter are exhaustive, top_values and automatic.

exhaustive - in this strategy, once we have the list of matching documents, we’ll simply iterate through each document’s facet_by fields and sum up the number of documents for each unique facet value. This is effective when the number of documents is small (fewer than a few tens of thousands of docs) and/or when the number of facet values requested (as defined by max_facet_values) is large.

top_values - in this strategy, once we have the list of matching documents, we’ll look up each facet field’s value in a reverse index that stores a mapping of {facet_field_value => [list of all documents that have this value]}. We’ll then find the intersection of these two lists of documents (the list of matching documents and the list of all documents that have this facet field value), and the length of the intersected list will give us the facet count. This strategy is efficient if we have a large number of hits, since we only have to do intersections on the top facet values (the values that have the largest number of documents in the reverse index). However, if the number of facet values to fetch (as configured by max_facet_values) is sufficiently large and the number of hits is small, then this strategy becomes less efficient than the exhaustive strategy. Another downside of this approach is that it will not return an exact count for total_values in the facet stats, because we only consider a limited number of facet values for the facet count intersections.

automatic - Typesense will pick a strategy based on the heuristics described above. This is the default value for this parameter.
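
To make the difference between the two counting strategies concrete, here is a toy Python model of both. It is a simplified sketch of the descriptions above, not Typesense's actual implementation: the data, the function names, and the way the top values are chosen are all assumptions made for illustration.

```python
from collections import Counter

# Toy data (hypothetical): document id -> value of a single facet field.
docs = {1: "red", 2: "blue", 3: "red", 4: "green", 5: "blue", 6: "red"}
matching = [1, 2, 3, 4]  # ids of the documents that matched the query


def exhaustive_counts(matching_ids, doc_values):
    """Visit every matching document and tally its facet value."""
    return Counter(doc_values[doc_id] for doc_id in matching_ids)


def top_values_counts(matching_ids, reverse_index, max_facet_values):
    """Intersect the hits with the posting lists of the largest facet values only."""
    hits = set(matching_ids)
    # Rank facet values by how many documents hold them in the whole index.
    top = sorted(reverse_index, key=lambda v: len(reverse_index[v]), reverse=True)
    top = top[:max_facet_values]
    counts = {value: len(hits & reverse_index[value]) for value in top}
    return {value: count for value, count in counts.items() if count > 0}


# Reverse index: facet value -> set of all document ids that have this value.
reverse_index = {}
for doc_id, value in docs.items():
    reverse_index.setdefault(value, set()).add(doc_id)

exact = exhaustive_counts(matching, docs)
approx = top_values_counts(matching, reverse_index, max_facet_values=2)

print(dict(exact))   # counts for red, blue and green
print(approx)        # counts for red and blue only
```

In this toy run the exhaustive pass sees three distinct values, while the intersection pass never examines `green` because it is not among the two largest values in the reverse index. That is the same reason total_values is not exact under top_values.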

You can specify a strategy for all facet fields in the query via:

"facet_strategy": "exhaustive"

or you can specify a different strategy for each field by using a comma-separated list of strategies that matches the order of the field names in facet_by. For example, if you have facet_by: field1, field2, field3 and facet_strategy: automatic, exhaustive, top_values, then field1 will use the automatic mode, field2 will use the exhaustive mode and field3 will use the top_values mode.
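
A small helper can keep the two lists aligned when building the parameters. This is only a sketch: the helper name is hypothetical, and the check that both lists have the same length is an assumption added for safety, not documented server behavior.

```python
VALID_STRATEGIES = {"exhaustive", "top_values", "automatic"}


def per_field_facet_params(fields, strategies):
    """Pair each facet field with a strategy, preserving the order of facet_by."""
    # Assumption: one strategy per field; the thread does not say what happens otherwise.
    if len(fields) != len(strategies):
        raise ValueError("expected exactly one strategy per facet field")
    unknown = set(strategies) - VALID_STRATEGIES
    if unknown:
        raise ValueError(f"unknown facet strategies: {sorted(unknown)}")
    return {
        "facet_by": ",".join(fields),
        "facet_strategy": ",".join(strategies),
    }


params = per_field_facet_params(
    ["field1", "field2", "field3"],
    ["automatic", "exhaustive", "top_values"],
)
print(params)
```

The resulting dictionary can be merged into the rest of the search parameters for the query.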

