
Would be helpful to output all feature-values with matching documents in the response (elasticsearch-learning-to-rank, closed)

o19s commented on May 26, 2024

Would be helpful to output all feature-values with matching documents in the response.

from elasticsearch-learning-to-rank.

Comments (11)

ebernhardson commented on May 26, 2024

Indeed one common difficulty in ML systems is a disparity between the feature values used in production, and those used when training the system. Logging production feature generation is a good way to ensure that everything is consistent end to end.

I'm not sure though how to get this into elasticsearch. I poked around quite a bit developing an LTR plugin for elasticsearch 2 and was not able to find a way to get the values into the response format. The difficulty is that the response from the data nodes to the original query nodes needs to be updated to include the extra data, such that it can be returned in the json response. Would be good to find a way to integrate this.
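The consistency check described in this comment can be sketched as a comparison between the feature values logged in production and those used for training. This is only an illustration: the function name, the dict-of-floats representation, and the tolerance are assumptions, not part of the plugin.

```python
import math


def find_feature_mismatches(logged, training, rel_tol=1e-6):
    """Return (name, logged_value, training_value) for every feature that differs.

    Both arguments are hypothetical dicts mapping a feature name to its value
    for the same (query, document) pair. A feature present on only one side
    is reported with None for the missing value.
    """
    mismatches = []
    for name in sorted(set(logged) | set(training)):
        a = logged.get(name)
        b = training.get(name)
        if a is None or b is None or not math.isclose(a, b, rel_tol=rel_tol):
            mismatches.append((name, a, b))
    return mismatches
```

An empty result means the two vectors agree within the tolerance; anything else points at a feature whose production computation has drifted from the training data.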


peterdm commented on May 26, 2024

It's likely to be the common case to want to expose the computed feature-values to downstream consumers. This would reduce the execution weight of this sort of query, by not having to calculate the same values twice.


softwaredoug commented on May 26, 2024

Good idea


softwaredoug commented on May 26, 2024

At the very least, we can create debug output that could be used. We can keep an eye on making it relatively consistent and machine readable.


softwaredoug commented on May 26, 2024

One todo I have is to make a query explain that at a minimum shows each feature's values for a doc. Now, debug output is more costly in production, but I'm not sure you'd want to log everyone's features anyway...


nomoa commented on May 26, 2024

Would it be possible to do this with a custom highlighter? The user would simply have to duplicate their LTR query as a highlight query. The highlighter should then be able to output a text blob, and this blob could even be very close to the ranklib line format. The highlighter would be tightly coupled to the LTR query, allowing it to disable the ranklib model and inspect only the individual feature scorers.
I haven't thought about all the details and I doubt that it's a viable solution for production, but it could help to avoid mistakes such as the disparities described by Erik.
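For reference, the RankLib line format mentioned above is `<label> qid:<id> 1:<value> 2:<value> ... # <comment>`. A minimal sketch of producing such a line from a list of feature values follows; the function name and arguments are hypothetical, and this is plain formatting code, not the highlighter being proposed.

```python
def to_ranklib_line(label, qid, features, doc_id=None):
    """Format one document's feature values as a RankLib-style line.

    `features` is an ordered list of values; feature ids are assigned
    1-based in list order. `doc_id`, if given, goes into the trailing comment.
    """
    parts = [str(label), f"qid:{qid}"]
    parts += [f"{i}:{value}" for i, value in enumerate(features, start=1)]
    line = " ".join(parts)
    if doc_id is not None:
        line += f" # {doc_id}"
    return line
```

Calling `to_ranklib_line(3, 1, [0.5, 12.0], "doc42")` returns `3 qid:1 1:0.5 2:12.0 # doc42`.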


softwaredoug commented on May 26, 2024

Debug output was added in 0.1.0

Another option is to issue bulk queries using _msearch against production for logging features.
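A rough sketch of the `_msearch` approach, assuming one search per feature query so that each hit's `_score` can be read as that feature's value. The `_msearch` body is newline-delimited JSON: a header line followed by a search body line for each search, with a trailing newline. The function name, index name, and queries here are hypothetical.

```python
import json


def build_msearch_body(index, feature_queries, size=10):
    """Build a newline-delimited _msearch body, one search per feature query.

    `feature_queries` is a list of Elasticsearch query DSL dicts
    (hypothetical examples; supply your own feature definitions).
    """
    lines = []
    for query in feature_queries:
        lines.append(json.dumps({"index": index}))                # header line
        lines.append(json.dumps({"query": query, "size": size}))  # search body
    # The request body must end with a newline.
    return "\n".join(lines) + "\n"
```

For example, `build_msearch_body("products", [{"match": {"title": "rambo"}}, {"match": {"body": "rambo"}}])` yields four JSON lines that can be sent to `_msearch` with the `application/x-ndjson` content type. Keeping these feature queries in sync with the ones used by the model remains the caller's responsibility.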


ebernhardson commented on May 26, 2024

_msearch is a possibility, certainly. I'm a bit wary of having too many different ways to provide the feature queries, though. The more times something is duplicated, the more opportunity there is for divergence between what builds the data sets and what ends up running in production. I think the solr plugin had the right approach here: the queries used for generating features are stored inside the model, and the resulting vectors can be returned as part of the search result. This helps to reduce the surface area for errors.


softwaredoug commented on May 26, 2024

Going to investigate how feasible this would be. The complication is that while it's relatively straightforward to build a plugin that creates a Lucene query, I suspect it's more complex to customize the response. I would also like to keep the plugin footprint small, both for maintenance and because ES plugins are relatively sandboxed, with lots of nice strict enforcement of expected behaviors. So perhaps there's a combination of plugin work and existing features that can help get this.


epugh commented on May 26, 2024

@peterdm, have you looked at what @nomoa added in PR #54? Does that go far enough to call this "done" for our 1.0 release?


softwaredoug commented on May 26, 2024

This works in 1.0. Closing

