o19s / elasticsearch-learning-to-rank

Plugin to integrate Learning to Rank (aka machine learning for better relevance) with Elasticsearch

Home Page: http://opensourceconnections.com/blog/2017/02/14/elasticsearch-learning-to-rank/

License: Apache License 2.0

Languages: Java 97.24%, Python 2.76%
Topics: elasticsearch, relevant-search, machine-learning, search-relevance, elasticsearch-plugin, elasticsearch-plugins

elasticsearch-learning-to-rank's People

Contributors

aprudhomme, ebernhardson, epugh, henrywallace, hronom, jackpf, jettro, jzonthemtn, mzaian, nathancday, ndkmath1, nomoa, pakio, philippus, richardknox, rtancman, saluev, schedutron, shibe, softwaredoug, sstults, styrmis, swonvip, taoyyu, thepanz, tmanabe, umeshdangat, worleydl, wrigleydan, xiaomeng-faire


elasticsearch-learning-to-rank's Issues

To avoid costly errors, validate features when they're added

You can have valid JSON but a semantically invalid query. For example, it's easy to accidentally include a top-level "query" key, as below, when the plugin expects the template to start at the "match" level:

 {
      "name": "user_rating",
      "type": "feature",
      "feature": {
         "name": "user_rating",
         "params": [],
         "template_language": "mustache",
         "template": {
            "query": {
               "match": {
                  "title": "{{keywords}}"
               }
            }
         }
      }
   }

Say this is feature "1" and you've built up a large feature set of 100 queries. Then you go to execute the feature set and get an error that the sltr query could not be executed.

As feature sets are append-only, your whole feature set is now broken and you have to start over. Is there any way to validate that this is a valid query (i.e. attempt to parse it) before letting the feature be created?
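
One possible shape for such a check: let the feature-creation request carry a validation block with a test index and template params, and have the plugin render and parse (or even execute) the query before storing the feature. The request below is purely illustrative; the validation block and its fields are not an existing API:

PUT _ltr/_feature/user_rating
{
   "validation": {
      "index": "tmdb",
      "params": {
         "keywords": "rambo"
      }
   },
   "feature": {
      "name": "user_rating",
      "params": ["keywords"],
      "template_language": "mustache",
      "template": {
         "match": {
            "title": "{{keywords}}"
         }
      }
   }
}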

Minor HTTP status code nitpicks in 1.0

I noted when rebuilding the demo that

  • Trying to GET a missing feature store (to check whether it exists) returns a 400 rather than a 404
  • PUT'ing a new feature store returns 200 rather than 201 Created (quick repro below)
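
Both are easy to reproduce against the default store (endpoints shown as I understand them; exact paths may differ by version):

# checking for a feature store that does not exist yet – currently 400, arguably should be 404
GET _ltr

# creating the default feature store – currently 200, arguably should be 201 Created
PUT _ltr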

Add full example for simple testing

It would be good to have a working example data set and a set of click data to serve as a simple smoke test for devs to run.

This is also mentioned in the current README; I just figured opening an issue would help track it.

Script evaluated differently in LTR?

I have a sample painless script which compares two static dates. If I run it inside script_fields I see the value I expect. However, if I run it as a script feature input to LTR, I get 1.

Baffled!

Test Setup:

POST _scripts/ranklib/dummy
{
  "script": "## LambdaMART\n## No. of trees = 1\n## No. of leaves = 10\n## No. of threshold candidates = 256\n## Learning rate = 0.1\n## Stop early = 100\n\n<ensemble>\n <tree id=\"1\" weight=\"0.1\">\n  <split>\n   <feature> 1 </feature>\n   <threshold> 0.45867884 </threshold>\n   <split pos=\"left\">\n    <feature> 1 </feature>\n    <threshold> 0.0 </threshold>\n    <split pos=\"left\">\n     <output> -2.0 </output>\n    </split>\n    <split pos=\"right\">\n     <output> -1.3413081169128418 </output>\n    </split>\n   </split>\n   <split pos=\"right\">\n    <feature> 1 </feature>\n    <threshold> 0.6115718 </threshold>\n    <split pos=\"left\">\n     <output> 0.3089442849159241 </output>\n    </split>\n    <split pos=\"right\">\n     <output> 2.0 </output>\n    </split>\n   </split>\n  </split>\n </tree>\n</ensemble>"
}  

POST /test/empty/
{}
    
GET /test/empty/_search
{
  "query": {
    "match_all": {}
  }
}

Query to Reproduce Error:

GET /test/empty/_search?explain=true
{
  "query": {
      "match_all": {}
  },
  "script_fields": {
      "days_between": {
          "script": {
              "params": {
                  "search_timestamp": "2017-03-23T00:00:00.000Z",
                  "compare_to": "2017-03-18T04:34:15.606Z"
              },
              "lang": "painless",
              "inline": "return ChronoUnit.DAYS.between(Instant.parse(params.compare_to), Instant.parse(params.search_timestamp))"
          }
      }
  },
  "rescore": {
      "query": {
          "rescore_query": {
              "ltr": {
                  "model": {
                      "stored": "dummy"
                  },
                  "features": [
                      {
                          "script": {
                              "_name": "days_between",
                              "script": {
                                "params": {
                                    "search_timestamp": "2017-03-23T00:00:00.000Z",
                                    "compare_to": "2017-03-18T04:34:15.606Z"
                                },
                                "lang": "painless",
                                "inline": "return ChronoUnit.DAYS.between(Instant.parse(params.compare_to), Instant.parse(params.search_timestamp))"
                            }
                          }
                      }
                  ]
              }
          }
      }
  }
}

For me (ES 5.3.0, plugin 0.1.0) the result comes out:

fields.days_between = [ 4 ]
_explanation.details[1].details[0].details[0].details[0].value = 1

Consider changing the error message/code when attempting to update model

Models are intended to be immutable. However, if you do try to update a model you get a fairly cryptic error.

Performing:

POST http://localhost:9200/_ltr/_featureset/movie_features/_createmodel

Returns 409 with

{"error":{"root_cause":[{"type":"version_conflict_engine_exception","reason":"[store][model-test_9]: version conflict, document already exists (current version [1])","index_uuid":"WnhIFFEMTuyTwLXu-DVmxw","shard":"0","index":".ltrstore"}],"type":"version_conflict_engine_exception","reason":"[store][model-test_9]: version conflict, document already exists (current version [1])","index_uuid":"WnhIFFEMTuyTwLXu-DVmxw","shard":"0","index":".ltrstore"},"status":409}

I would propose a 405 with a helpful message such as "Models cannot be updated, please create a new model", or something similar.
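
Something along these lines (response body purely illustrative) would be far less cryptic:

{
   "error": {
      "type": "model_immutable_exception",
      "reason": "Models cannot be updated; delete the model and create a new one instead"
   },
   "status": 405
}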

Demonstrate xgboost usage

Apparently xgboost is the shiznit for training boosted trees. We should demonstrate how xgboost can be used with this plugin via the existing Ranklib XML method of specifying a LambdaMART model.

I would expect

  • scripts directory showing off xgboost
  • or another git repo demonstrating xgboost + ES w/ this plugin
  • or a helpful blog post on xgboost + ES

Can't log boosted sltr query

To address #69 (so I can log features for a set of ids), I added an sltr query with a boost of 0. Specifically:

GET tmdb/_search
{
    "explain": true, 
    "query": {
        "bool": {
            "should": [                
                {"sltr": {
                    "boost": 0,
                    "_name": "logged_featureset",
                    "featureset": "movie_features",
                    "params": {
                        "keywords": "rambo"
                    }
                }},
                {"match": {
                   "overview": 
                   {
                       "query": "rambo"
                   }
                }}
                ]
            }
    },
    "ext": {
        "ltr_log": {
            "log_specs": {
                "name": "log_entry1",
                "named_query": "logged_featureset"
            }
        }
    }
}

This gives the following error:

{
   "error": {
      "root_cause": [
         {
            "type": "illegal_argument_exception",
            "reason": "Query named [logged_featureset] must be a [sltr] query [BoostQuery] found"
         }
      ],
      "type": "search_phase_execution_exception",
      "reason": "all shards failed",
      "phase": "query",
      "grouped": true,
      "failed_shards": [
         {
            "shard": 0,
            "index": "tmdb",
            "node": "A1yBd5opScyEeVTitmkyCA",
            "reason": {
               "type": "illegal_argument_exception",
               "reason": "Query named [logged_featureset] must be a [sltr] query [BoostQuery] found"
            }
         }
      ]
   },
   "status": 400
}
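
One possible workaround (a sketch; it assumes the sltr query matches all documents when used for logging, so a filter clause would not restrict results, and it avoids the BoostQuery wrapper entirely) is to put the sltr query in a filter context rather than boosting it to 0:

GET tmdb/_search
{
    "query": {
        "bool": {
            "filter": [
                {"sltr": {
                    "_name": "logged_featureset",
                    "featureset": "movie_features",
                    "params": {
                        "keywords": "rambo"
                    }
                }}
            ],
            "should": [
                {"match": {
                    "overview": {
                        "query": "rambo"
                    }
                }}
            ]
        }
    },
    "ext": {
        "ltr_log": {
            "log_specs": {
                "name": "log_entry1",
                "named_query": "logged_featureset"
            }
        }
    }
}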

Change logging so that it doesn't refer to an sltr query

Consider "offline" logging use cases where a user batches a set of identifiers and simply wants the scores for each feature for a set of document identifiers (this is what happens currently in the demo). The current logging API expects to find an sltr query in the body.

At first blush, I would prefer a logging interface closer to:

GET tmdb/_search
{
    "query": {
        "terms": {
             "id": ["1234", "5678"]
        }
    },
    "ext": {
        "ltr_log": {
            "log_specs": {
                "featureset": "my_feature_set"
            }
        }
    }
}

This would log "my_feature_set" for the returned documents.

This seems more flexible than the current logging interface in the 1.0 branch, as it would support several logging use cases.

If you forget _ before a feature or feature set, you can create insidious/confusing bugs

I was about to create an issue with the list of problems below. Notice the subtle problem: I accidentally created a feature store named "feature" when I had intended to create a feature in the default feature store, then was confused about why I couldn't find my feature. (Note: some of the items below may still be bugs; I'm going to create separate issues.)

May I suggest that the following be blacklisted as feature store names?

  • feature
  • featureSet
  • feature_set
  • featureset
  • feature*
  • (others?)


==================
Original bug I was about to file

Attempting to use the CRUD API to create/list features, I encounter a number of bugs (using the latest 1_0 branch):

Features don't appear to be listed

PUT _ltr/feature
{
  "name": "foo",
  "params": ["query_string"],
  "template_language": "mustache",
  "template" : {
    "match": {
      "field": "{{query_string}}"
    }
  }
}

GET _ltr/_feature?prefix=foo

The latter GET returns 0 results for me. Additionally, the simple features "1" and "2" that I create for my demo when creating the feature set are not returned either.

Start not recognized when retrieving features or feature sets

Running the example from the docs

GET /_ltr/_featureset?prefix=set&start=20&size=30

Gives an error about start:

{
   "error": {
      "root_cause": [
         {
            "type": "illegal_argument_exception",
            "reason": "request [/_ltr/_featureset] contains unrecognized parameter: [start]"
         }
      ],
      "type": "illegal_argument_exception",
      "reason": "request [/_ltr/_featureset] contains unrecognized parameter: [start]"
   },
   "status": 400
}

Error about "store"

After running the code above, I attempt to retrieve "foo" and get the following error:

GET _ltr/feature/foo
{
   "error": {
      "root_cause": [
         {
            "type": "illegal_argument_exception",
            "reason": "request [/_ltr/feature/foo] contains unrecognized parameter: [store]"
         }
      ],
      "type": "illegal_argument_exception",
      "reason": "request [/_ltr/feature/foo] contains unrecognized parameter: [store]"
   },
   "status": 400
}

The same error happens if I attempt …

Cannot append to existing feature set

After creating foo above, I attempt to append to my existing feature set

POST /_ltr/_featureset/movie_features/_addfeatures/foo

Returns

{
   "error": {
      "root_cause": [
         {
            "type": "illegal_argument_exception",
            "reason": "The feature query [foo] returned no features"
         }
      ],
      "type": "illegal_argument_exception",
      "reason": "The feature query [foo] returned no features"
   },
   "status": 400
}

Query validation

Can we use the existing validation endpoint?
Should or can we make sure that nothing is deprecated?

Scripted feature queries returning a value > 1 are passed to the LTR reranker as 1.0

To reproduce, create the following index:

PUT /rando

PUT /rando/_mapping/fortune 
{
  "properties": {
    "msg": {
      "type": "text"
    },
    "lucky_number": {
      "type": "float"
    }
  }
}

PUT /rando/fortune/1
{
  "msg": "Be patient: in time, even an egg will walk.",
  "lucky_number": 0.9
}

PUT /rando/fortune/2
{
  "msg": "Let the deeds speak.",
  "lucky_number": 2.2
}

PUT /rando/fortune/3
{
  "msg": "Digital circuits are made from analog parts.",
  "lucky_number": 3.3
}

GET /rando/_search
{
  "query": {
    "match_all": {}
  }
}

Load the following model (with a lucky_number split threshold of 0.99):

POST _scripts/ranklib/testmodel
{
  "script": "## LambdaMART\n## No. of trees = 1\n## No. of leaves = 2\n## No. of threshold candidates = 1\n## Learning rate = 0.1\n## Stop early = 100\n\n<ensemble><tree id=\"1\" weight=\"0.1\"><split><feature> 1 </feature><threshold> 0.99 </threshold><split pos=\"left\"><output>5</output></split><split pos=\"right\"><output>10</output></split></split></tree></ensemble>"
}

And run the scoring query

GET /rando/_search
{
    "query": {
        "ltr": {
            "model": {
                "stored": "testmodel"
            },
            "features": [{
                "script": {
                  "script": {
                    "lang": "expression",
                    "inline": "doc['lucky_number']"
                  }
                }
            }]
        }
    },
    "script_fields": {
      "1": {
        "script" : {
          "lang": "expression",
          "inline" : "doc['lucky_number']"
        }
      }
    },
    "_source":true
}

As you'd expect, fortune-1 takes the left split, and fortune-2 and fortune-3 take the right split.

Now reload the same model but modify the lucky_number split threshold to be 2.5

POST _scripts/ranklib/testmodel
{
  "script": "## LambdaMART\n## No. of trees = 1\n## No. of leaves = 2\n## No. of threshold candidates = 1\n## Learning rate = 0.1\n## Stop early = 100\n\n<ensemble><tree id=\"1\" weight=\"0.1\"><split><feature> 1 </feature><threshold> 2.5 </threshold><split pos=\"left\"><output>5</output></split><split pos=\"right\"><output>10</output></split></split></tree></ensemble>"
}

And re-run the scoring query above.

(Expectation: fortune-1 and fortune-2 take the left split, while fortune-3 takes the right split.)

However, all three end up taking the left split, despite the fact that 3.3 > 2.5.

(This behavior seems to indicate that RankLib is receiving min(script_computed_value, 1.0) ... as opposed to the explicit script_computed_value)

Note: I validated this standalone with RankLib directly and the test as structured above was successful.
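
Until the truncation itself is fixed, one possible workaround (a sketch, assuming the clipping happens where the feature score is handed to the ranker) is to keep scripted features within [0, 1] by saturating the raw value, and to train the model against the same transformed values:

GET /rando/_search
{
    "query": {
        "ltr": {
            "model": {
                "stored": "testmodel"
            },
            "features": [{
                "script": {
                  "script": {
                    "lang": "expression",
                    "inline": "doc['lucky_number'] / (doc['lucky_number'] + 1)"
                  }
                }
            }]
        }
    }
}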

RankLib install

The documentation about installing the RankLib jar talks about using Maven, but the Maven command won't work without a pom. More instructions on installing the jar would be helpful.
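
For what it's worth, mvn install:install-file can install a standalone jar into the local repository without a project pom, which may be all the docs intend. The coordinates below are placeholders rather than the ones the build actually expects:

mvn install:install-file \
  -Dfile=RankLib-2.7.jar \
  -DgroupId=ciir.umass.edu \
  -DartifactId=RankLib \
  -Dversion=2.7 \
  -Dpackaging=jar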

Support Fewer Models (slim down to handful of Ranklib models)

Erik Bernhardson makes a good point that there are really only a few types of models you probably care about. Ranklib comes with all sorts of intermediate/weird models that most people probably don't care about.

Maximum Interpretability

  • Models, like the linear model, that are basically optimized boosts. Advantage: it's easy to interpret, debug, and understand these models

Maximum Flexibility / Prediction Power

  • LambdaMART/Boosted Tree model that does a great job of predicting the weird nooks and crannies of search relevance

Relatedly, we probably don't need to rely on Ranklib at all when we only care about these 2 or 3 models, and only the evaluation parts of those models at that.

Document v1.0 features

Update README and any other documentation to reflect v1.0 features. What other documentation do we need?

It would be helpful to output all feature values alongside matching documents in the response.

This would make the feature-value inputs available to be logged (as future training data) without having to re-evaluate each feature query as a script_field.

(Those weights could simply be output in the response)

E.g. Instead of having to do

GET /rando/_search
{
    "query": {
        "ltr": {
            "model": {
                "stored": "testmodel"
            },
            "features": [{
                "script": {
                  "script": {        
                    "lang": "expression",
                    "inline": "doc['lucky_number']"
                  }
                }
            }]
        }
    },
    "script_fields": {
      "1": {
        "script" : {
          "lang": "expression",
          "inline" : "doc['lucky_number']"
        }
      }
    },
    "_source":true
}

to get:

{
  "took": 10,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0.5,
    "hits": [
      {
        "_index": "rando",
        "_type": "fortune",
        "_id": "2",
        "_score": 0.5,
        "_source": {
          "msg": "Let the deeds speak.",
          "lucky_number": 2.2
        },
        "fields": {
          "1": [
            2.2
          ]
        }
      },
      {
        "_index": "rando",
        "_type": "fortune",
        "_id": "1",
        "_score": 0.5,
        "_source": {
          "msg": "Be patient: in time, even an egg will walk.",
          "lucky_number": 0.9
        },
        "fields": {
          "1": [
            0.9
          ]
        }
      },
      {
        "_index": "rando",
        "_type": "fortune",
        "_id": "3",
        "_score": 0.5,
        "_source": {
          "msg": "Digital circuits are made from analog parts.",
          "lucky_number": 3.3
        },
        "fields": {
          "1": [
            3.3
          ]
        }
      }
    ]
  }
}

Could we generate the same output by running this (perhaps even incorporating query _names for clarity)?

GET /rando/_search
{
    "query": {
        "ltr": {
            "output_feature_values": true,
            "model": {
                "stored": "testmodel"
            },
            "features": [{
                "script": {
                  "_name": "lucky",
                  "script": {        
                    "lang": "expression",
                    "inline": "doc['lucky_number']"
                  }
                }
            }]
        }
    },
    "_source": true
}

to produce something like this?

{
  "took": 10,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0.5,
    "hits": [
      {
        "_index": "rando",
        "_type": "fortune",
        "_id": "2",
        "_score": 0.5,
        "_source": {
          "msg": "Let the deeds speak.",
          "lucky_number": 2.2
        },
        "feature_values": [{
          "lucky": 2.2
        }]
      },
      {
        "_index": "rando",
        "_type": "fortune",
        "_id": "1",
        "_score": 0.5,
        "_source": {
          "msg": "Be patient: in time, even an egg will walk.",
          "lucky_number": 0.9
        },
        "feature_values": [{
          "lucky": 0.9
        }]
      },
      {
        "_index": "rando",
        "_type": "fortune",
        "_id": "3",
        "_score": 0.5,
        "_source": {
          "msg": "Digital circuits are made from analog parts.",
          "lucky_number": 3.3
        },
        "feature_values": [{
          "lucky": 3.3
        }]
      }
    ]
  }
}

No support for empty feature sets

I tend to see the functionality here as a "workbook" approach to developing features. I can see a case where someone creates a feature set and wants to add more features later. So I was surprised to see this command:

PUT _ltr/_featureset/more_movie_features
{
  "name": "more_movie_features",
  "features": []
}

fail with:

{
   "error": {
      "root_cause": [
         {
            "type": "parsing_exception",
            "reason": "At least one feature must be defined in [features]",
            "line": 4,
            "col": 1
         }
      ],
      "type": "parsing_exception",
      "reason": "At least one feature must be defined in [features]",
      "line": 4,
      "col": 1
   },
   "status": 400
}

I suspect there's a reason for this, but I'd like to understand why so I can document it. If it's possible, though, it'd be great to allow empty feature sets and avoid the confusion.
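
For reference, this is the workflow I would like to be able to follow once empty sets are allowed, reusing the existing _addfeatures endpoint (the feature name title_query is just a placeholder):

PUT _ltr/_featureset/more_movie_features
{
  "name": "more_movie_features",
  "features": []
}

POST /_ltr/_featureset/more_movie_features/_addfeatures/title_query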

Build failed with gradle

Hi guys, I cannot build the project with Gradle; it gives this error:

FAILURE: Build failed with an exception.

Where:
Build file '/file/elasticsearchLtrPlugin/elasticsearch-learning-to-rank/build.gradle' line: 18

What went wrong:
A problem occurred evaluating root project 'ltr-query'.

Failed to apply plugin [id 'carrotsearch.randomized-testing']
Could not create task of type 'RandomizedTestingTask'.

Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output.

BUILD FAILED

Do you have any suggestions?

Many thanks,
Beifei

String keys for feature-generating queries

In the current implementation, feature-generating queries are matched to features in a trained model based on their position in the ltr subquery.

This is extremely terse and, without a validation step, leaves many opportunities for positional mismatches during query or model development.

What could help, at least from a query-development perspective, would be something similar to this:

http://alexbenedetti.blogspot.com/2016/08/solr-is-learning-to-rank-better-part-2.html

There, features are identified with a string key in the query, and the ordering/alignment of feature inputs against the model is specified in a separate block from the feature inputs themselves.

Significantly, those feature string keys could then be carried through to the _explain output (in place of "Feature 7:") for more readable validation.
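
A purely hypothetical sketch of what string-keyed features could look like in the ltr query; the feature_order block and the named feature map are illustrative, not an existing API:

GET /rando/_search
{
    "query": {
        "ltr": {
            "model": {
                "stored": "testmodel",
                "feature_order": ["title_match", "lucky"]
            },
            "features": {
                "title_match": {
                    "match": {
                        "msg": "deeds"
                    }
                },
                "lucky": {
                    "script": {
                        "script": {
                            "lang": "expression",
                            "inline": "doc['lucky_number']"
                        }
                    }
                }
            }
        }
    }
}

The _explain output could then refer to "title_match" and "lucky" rather than "Feature 1" and "Feature 2".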

The query example does not work

Hi,

I have used this query example; however, it did not work.

{
    "query": {...}
    "rescore": {
        "query": {
            "ltr": {
                "model": {
                    "stored": "dummy"
                },
                "features": [{
                    "match": {
                        "title": userSearchString
                    }
                },{
                    "constant_score": {
                        "query": {
                            "match_phrase": {
                                "title": "userSearchString"
                            }
                        }
                    }
                }]
            }
        }
    }
}

It seems that the query example is wrong: a "," is missing after "query": {...}. After I added the "," and ran the query, it gave this error:

{
   "error": {
      "root_cause": [
         {
            "type": "illegal_argument_exception",
            "reason": "[query] unknown field [ltr], parser not found"
         }
      ],
      "type": "illegal_argument_exception",
      "reason": "[query] unknown field [ltr], parser not found"
   },
   "status": 400
}

I have checked the source code: the class "LtrQueryBuilder" does not mention "rescore". Does this mean that "rescore" is not implemented?

I am looking forward to your reply.
Thanks,
Beifei

Feature/Feature Set relationship confused somewhat in API

Initially when I developed the demo, I had assumed from the structure of this request:

{
  "name": "my_feature_set",
  "features" : [
    {
      "name": "my_feature",
      "params": ["query_string"],
      "template_language": "mustache",
      "template" : {
        "match": {
          "field": "{{query_string}}"
        }
      }
    }
  ]
}

that features were "owned by" a feature set, i.e. a strong composition relationship. This is further reinforced by the requirement that feature sets be created with new features (see #82). For example, in this sort of relationship I assumed that deleting a feature set would also delete all associated features, or that you would do GET _ltr/my_feature_set/feature/my_feature.

However, it appears on further study that features exist independently of feature sets, and the relationship is more associative (which makes a lot of sense to me).

Can we change this API to avoid the confusion by creating a clearer associative relationship? For example, I would suggest something that did not automatically create features in the set, such as notionally something like:


PUT _ltr/_feature/my_feature
   {
      "name": "my_feature",
      "params": ["query_string"],
      "template_language": "mustache",
      "template" : {
        "match": {
          "field": "{{query_string}}"
        }
      }
   }

PUT _ltr/_featureset
{
  "name": "my_feature_set",
  "features" : [{
      "name": "my_feature"
   }]
}

I could live with the current way things are done, but as I'm documenting I'm seeing how it can be confusing.

At the very least, I think it would be good to be able to support a feature set creation syntax like what I've proposed above where feature sets are created from existing features. (maybe one exists and I missed it?).

Thoughts? Am I missing something?

Kill the Ranklib dependency

Related to #14. Scientists say we only use 1% of our brains... er, I mean Ranklib.

When you focus on what actually matters, we don't care about most of the models, and we don't care about the training side of Ranklib at all. Take all that away and you're left with a very small amount of code for evaluating a few models.

This issue is to kill the dependency and directly include just the models/code we care about.

A new feature for using the first phase scores in the second phase in the Ltr query

Hi all,

I think it would be a good feature to be able to use the first-phase query score as a feature in the Ltr query. The Solr LTR integration has this feature, called OriginalScoreFeature, designed specifically for this purpose. It seems they achieved it by customizing Solr's QueryRescorer and passing the information along as DocInfo. In Elasticsearch, however, it does not seem possible to customize the QueryRescorer.

regards
Rifat

Random test failures

com.o19s.es.ltr.feature.store.index.CachedFeatureStoreTests.testExpirationOnGet is failing randomly, and probably others that rely on the system clock are too.
We need to find a way to make these tests run reliably, or disable them to avoid the annoyance.

Create build for ES 2.4

There are a number of barriers to getting ES 5.x deployed (for us it is partly that we are still working to get off of 1.x, and supporting three major versions in production would require even more hacks). If it's possible to get a 2.x version of this plugin, that would make it a lot easier to experiment with existing indices and should grow the base of contributing users.

Start not recognized when listing features/feature sets

Run this command from the docs:

GET /_ltr/_featureset?prefix=set&start=20&size=30

You'll get the error:

{
   "error": {
      "root_cause": [
         {
            "type": "illegal_argument_exception",
            "reason": "request [/_ltr/_featureset] contains unrecognized parameter: [start]"
         }
      ],
      "type": "illegal_argument_exception",
      "reason": "request [/_ltr/_featureset] contains unrecognized parameter: [start]"
   },
   "status": 400
}

Provide a way to run CircleCI on PRs

Currently CI runs only on branches of this repo. It would be useful for contributors who do not have write access to be able to run CircleCI on their PRs.
It would also help reviewers confirm that the build passes before merging a PR, without having to fetch it locally.

Simplifying/hiding feature stores

My first (perhaps incorrect) impression of feature stores is that they are an implementation detail most users would not think about. Are there cases where people would create more than one feature store? Or can a single feature store satisfy the vast majority of use cases?

Can we

  • Initialize the default feature store on plugin installation?
  • Hide this implementation detail (i.e., don't document creating other feature stores)? See the sketch below.
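
If it helps, initializing the default store is already a single call, and other stores only come into play when an endpoint is explicitly prefixed with a store name (paths shown as I understand the current API; they may differ by version). The default store is created under the hood as the .ltrstore index:

# initialize the default feature store
PUT _ltr

# this targets the default store
GET _ltr/_featureset

# a named store only appears when explicitly prefixed
GET _ltr/my_other_store/_featureset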

Support stored features in FeatureSet creation

I'm working on improving the documentation for using the REST API to manage features and feature sets. Currently I don't see a way to use existing features when creating a new feature set, and we don't allow creating a feature set with no features. We should support creating a feature set from the names of existing features.
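
A hypothetical request shape for that (the feature_names field is illustrative, not an existing parameter):

PUT _ltr/_featureset/my_feature_set
{
  "name": "my_feature_set",
  "feature_names": ["user_rating", "title_query"]
}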
