Comments (11)
Indeed, one common difficulty in ML systems is a disparity between the feature values used in production and those used when training the system. Logging the features generated in production is a good way to ensure everything is consistent end to end.
I'm not sure, though, how to get this into Elasticsearch. I poked around quite a bit while developing an LTR plugin for Elasticsearch 2 and was not able to find a way to get the values into the response format. The difficulty is that the response from the data nodes back to the coordinating query node needs to be updated to include the extra data so that it can be returned in the JSON response. It would be good to find a way to integrate this.
from elasticsearch-learning-to-rank.
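One low-tech guard against this disparity is to diff the feature vectors logged in production against vectors regenerated by the training pipeline for the same query/document pairs. A minimal sketch — the feature names, keys, and tolerance are all hypothetical, not anything from the plugin:

```python
# Compare feature vectors logged in production against vectors
# regenerated offline for the same (query, doc) pairs.
# Feature names and the tolerance are illustrative, not plugin API.

def find_feature_skew(logged, regenerated, tol=1e-6):
    """Return (key, feature, prod_value, train_value) tuples that disagree."""
    mismatches = []
    for key, prod_vec in logged.items():
        train_vec = regenerated.get(key)
        if train_vec is None:
            continue  # pair missing offline; worth reporting separately
        for feature, prod_val in prod_vec.items():
            train_val = train_vec.get(feature, 0.0)
            if abs(prod_val - train_val) > tol:
                mismatches.append((key, feature, prod_val, train_val))
    return mismatches

logged = {("q1", "doc1"): {"title_match": 2.0, "recency": 0.9}}
regenerated = {("q1", "doc1"): {"title_match": 2.0, "recency": 0.7}}
skew = find_feature_skew(logged, regenerated)  # the recency values disagree
```

Run periodically over a sample of production traffic, a check like this surfaces train/serve skew before it silently degrades the model.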
It's likely to be the common case to want to expose the computed feature values to downstream consumers. Doing so would also reduce the execution cost of this sort of query, since the same values wouldn't have to be calculated twice.
Good idea
At the very least, we can create debug output that could be used. We can keep an eye toward making this relatively consistent and machine-readable.
One TODO I have is to make a query explain that, at a minimum, shows each feature's value for a doc. Debug output is more costly in production, but you probably don't want to log everyone's features anyway...
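As an interim measure, Elasticsearch's standard `explain` flag already returns a per-document score breakdown in the `_explanation` tree of each hit. A sketch of such a request body; the index, query, and field names are illustrative, and how informative the breakdown is depends on each scorer implementing a useful explanation:

```python
import json

# A standard Elasticsearch search body with explain enabled; the
# response's _explanation tree then breaks the score down per
# sub-query. Query and field names here are illustrative.
search_body = {
    "query": {"match": {"title": "rambo"}},
    "explain": True,
    "size": 10,
}
payload = json.dumps(search_body)  # POST this to /<index>/_search
```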
Would it be possible to do this with a custom highlighter? The user would simply have to duplicate the LTR query as a highlight query. It should then be able to output a text blob, and this blob could even be very close to the RankLib line format. The highlighter would be tightly coupled to the LTR query, allowing it to disable the RankLib model and inspect only the individual feature scorers.
I haven't thought through all the details, and I doubt it's a viable solution for production, but it could help avoid mistakes such as the disparities Erik describes.
Debug output was added in 0.1.0.
Another option is to issue bulk queries using _msearch against production for logging features.
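The `_msearch` approach boils down to concatenating newline-delimited header/body pairs, one pair per training query whose feature values you want to log. A sketch of building such a payload; the index name and feature queries are illustrative:

```python
import json

def build_msearch_body(index, queries, size=50):
    """Build an NDJSON _msearch payload: an alternating header line and
    body line per search, with a trailing newline (required by _msearch)."""
    lines = []
    for query in queries:
        lines.append(json.dumps({"index": index}))                # header line
        lines.append(json.dumps({"query": query, "size": size}))  # body line
    return "\n".join(lines) + "\n"

# One search per training query whose feature values we want to log.
feature_queries = [
    {"match": {"title": "rambo"}},
    {"match": {"title": "rocky"}},
]
body = build_msearch_body("tmdb", feature_queries)  # POST to /_msearch
```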
_msearch is a possibility, certainly. I'm a bit wary of having too many different ways to provide the feature queries, though. The more times something is duplicated, the more opportunity there is for divergence between how data sets are built and what ends up running in production. I think the Solr plugin had the right approach here: the queries used for generating features are stored inside the model, and the resulting vectors can be returned as part of the search result. This helps reduce the surface area for errors.
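For reference, the logging support the plugin later shipped (the 0.1.0 debug output mentioned above) takes roughly this shape: a named sltr query scores the stored feature set, and an ltr_log section under ext asks the plugin to attach each feature's value to the hits. A rough sketch — the feature-set and param names are illustrative, and the exact shape should be checked against the plugin docs:

```python
import json

# Sketch of a feature-logging search body: the named sltr query scores
# the stored feature set, and the ltr_log extension asks the plugin to
# attach each feature's value to the returned hits. Names illustrative.
log_request = {
    "query": {
        "sltr": {
            "_name": "logged_featureset",
            "featureset": "movie_features",
            "params": {"keywords": "rambo"},
        }
    },
    "ext": {
        "ltr_log": {
            "log_specs": {
                "name": "log_entry1",
                "named_query": "logged_featureset",  # must match _name above
            }
        }
    },
}
payload = json.dumps(log_request)  # POST this to /<index>/_search
```

Because the feature queries live in the stored feature set rather than in the request, the same definitions are used for building training data and for scoring in production.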
Going to investigate how feasible this would be. The complication is that while it's relatively straightforward to build a plugin that creates a Lucene query, I suspect it's more complex to customize the response. I'd also like to keep the plugin footprint small for maintainability, especially since ES plugins are relatively sandboxed, with strict enforcement of expected behaviors. So perhaps a combination of plugin work and existing features can get us there.
@peterdm have you looked at what @nomoa added in PR #54? Does that go far enough to call this "done" for our 1.0 release?
This works in 1.0. Closing.