Comments (11)

ebernhardson commented on May 26, 2024

Indeed, one common difficulty in ML systems is a disparity between the feature values used in production and those used when training the system. Logging the feature values generated in production is a good way to ensure that everything is consistent end to end.
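
A minimal sketch of the kind of consistency check that logged production features make possible. The function name, the `(query_id, doc_id)` keying, and the tolerance are assumptions for illustration; nothing here is part of the plugin.

```python
import math


def find_feature_drift(training, production, rel_tol=1e-6):
    """Compare feature vectors used for training with those logged in production.

    Both arguments map a (query_id, doc_id) key to an ordered list of floats.
    Returns a list of (key, feature_number, training_value, production_value)
    for every feature whose two values differ beyond the tolerance.
    Keys that were never logged in production are skipped.
    """
    mismatches = []
    for key, train_vec in training.items():
        prod_vec = production.get(key)
        if prod_vec is None:
            continue
        # Feature numbers are 1-based, matching common LTR file formats.
        for number, (t, p) in enumerate(zip(train_vec, prod_vec), start=1):
            if not math.isclose(t, p, rel_tol=rel_tol, abs_tol=1e-9):
                mismatches.append((key, number, t, p))
    return mismatches


training = {("q1", "doc_7"): [12.5, 0.0, 4.25]}
production = {("q1", "doc_7"): [12.5, 1.0, 4.25]}
print(find_feature_drift(training, production))
```

Any non-empty result points at a feature whose production computation has diverged from the one used to build the training set.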

I'm not sure, though, how to get this into Elasticsearch. I poked around quite a bit while developing an LTR plugin for Elasticsearch 2 and was not able to find a way to get the values into the response format. The difficulty is that the response from the data nodes to the original query nodes needs to be updated to include the extra data, so that it can be returned in the JSON response. It would be good to find a way to integrate this.

from elasticsearch-learning-to-rank.

peterdm commented on May 26, 2024

Wanting to expose the computed feature values to downstream consumers is likely to be the common case. Doing so would reduce the execution cost of this sort of query, since the same values would not have to be calculated twice.

softwaredoug commented on May 26, 2024

Good idea

softwaredoug commented on May 26, 2024

At the very least, we can create debug output that could be used. We can keep an eye on making this relatively consistent and machine readable.

softwaredoug commented on May 26, 2024

One todo I have is to make a query explain that, at a minimum, shows each feature's values for a doc. Debug output is more costly in production, but I'm not sure you want to log everyone's features anyway...

nomoa commented on May 26, 2024

Would it be possible to do this with a custom highlighter? The user would simply have to duplicate their LTR query as a highlight query. The highlighter should then be able to output a text blob, and this blob could even be very close to the RankLib line format. The highlighter would be tightly coupled to the LTR query, allowing it to disable the RankLib model and inspect only the individual feature scorers.
I haven't thought through all the details, and I doubt that it's a viable solution for production, but it could help to avoid mistakes such as the disparities described by Erik.
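
For reference, a small sketch of what a RankLib-style line looks like: a relevance label, a `qid:` field, 1-based `feature:value` pairs, and an optional trailing comment. The helper name and sample values are hypothetical, and this is not the output of any highlighter in the plugin.

```python
def to_ranklib_line(label, qid, features, doc_id=None):
    """Format one judged document as a RankLib/LETOR-style training line.

    `features` is an ordered sequence of numbers; feature ids start at 1.
    `doc_id`, when given, is appended as a trailing comment.
    """
    pairs = " ".join(
        f"{number}:{value}" for number, value in enumerate(features, start=1)
    )
    line = f"{label} qid:{qid} {pairs}"
    if doc_id is not None:
        line += f" # {doc_id}"
    return line


print(to_ranklib_line(3, 1, [12.5, 0.0, 4.25], doc_id="doc_7"))
# prints: 3 qid:1 1:12.5 2:0.0 3:4.25 # doc_7
```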

softwaredoug commented on May 26, 2024

Debug output was added in 0.1.0.

Another option is to issue bulk queries using _msearch against production for logging features.
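
A rough sketch of how such a bulk request body could be assembled, assuming one search per feature, restricted to the documents of interest so that each hit's `_score` can be read as that feature's value. The index name, feature names, and queries are made up for illustration. The only fixed part is the `_msearch` body layout: newline-delimited JSON with a header line followed by a search body line, terminated by a final newline.

```python
import json


def build_msearch_body(index, feature_queries, doc_ids):
    """Build an NDJSON body for the _msearch endpoint.

    `feature_queries` maps a (hypothetical) feature name to a query DSL dict.
    Each feature becomes one search: the `ids` filter limits results to
    `doc_ids` without affecting the score, and the feature query sits in
    `should`, so it is the only clause contributing to the score.
    """
    lines = []
    for _name, query in feature_queries.items():
        lines.append(json.dumps({"index": index}))  # header line
        lines.append(json.dumps({                   # search body line
            "_source": False,
            "size": len(doc_ids),
            "query": {
                "bool": {
                    "filter": [{"ids": {"values": doc_ids}}],
                    "should": [query],
                }
            },
        }))
    # The multi search body must end with a newline.
    return "\n".join(lines) + "\n"


body = build_msearch_body(
    "tmdb",
    {"title_match": {"match": {"title": "rambo"}}},
    ["1", "2"],
)
print(body)
```

The responses come back in the same order as the searches, so the feature name for each response can be recovered from the iteration order of `feature_queries`.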

ebernhardson commented on May 26, 2024

_msearch is a possibility, certainly. I'm a bit wary of having too many different ways to provide the feature queries, though. The more times something is duplicated, the more opportunity there is for divergence between what builds the data sets and what ends up running in production. I think the Solr plugin had the right approach here: the queries used for generating features are stored inside the model, and the resulting vectors can be returned as part of the search result. This helps to reduce the surface area for errors.

softwaredoug commented on May 26, 2024

Going to investigate how feasible this would be. The complication is that while it's relatively straightforward to build a plugin that creates a Lucene query, I suspect it's more complex to customize the response. I would also like to keep the plugin footprint small, both for maintenance and because ES plugins are relatively sandboxed, with lots of nice strict enforcement of expected behaviors. So perhaps there's a combination of plugin work and existing features that can help get this.

epugh commented on May 26, 2024

@peterdm, have you looked at what @nomoa added in PR #54? Does that go far enough to call this "done" for our 1.0 release?

softwaredoug commented on May 26, 2024

This works in 1.0. Closing.
