
Comments (10)

sethtroisi commented on August 26, 2024

This discussion is interesting for us (tensorflow/minigo). We're using approach (2) below to do dynamic pairing of different NN models, which are very expensive to evaluate (~1-3 minutes per eval).

We do a variation on active learning where we select a couple of points that maximize expected information gain and evaluate them. We stop not after k comparisons but once std_err falls below some threshold.

I've been evaluating the goodness of the ranking via the inverse of the Hessian (the justification was this post, I think):

        import numpy as np
        import choix

        # Standard errors from the inverse Hessian at the estimated parameters.
        hessian = choix.opt.PairwiseFcts(pairs, penalty=0.1).hessian(ilsr_param)
        std_err = np.sqrt(np.diagonal(np.linalg.inv(hessian)))
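
A minimal sketch of the stopping rule described above, with hypothetical helpers: run_next_comparisons() stands in for selecting and evaluating the next batch of pairs, and n_items and threshold are assumed to be defined.

        # Hypothetical loop: keep comparing until every model's standard
        # error is below the threshold.
        while True:
            pairs.extend(run_next_comparisons())  # hypothetical evaluator
            ilsr_param = choix.ilsr_pairwise(n_items, pairs, alpha=0.1)
            hessian = choix.opt.PairwiseFcts(pairs, penalty=0.1).hessian(ilsr_param)
            std_err = np.sqrt(np.diagonal(np.linalg.inv(hessian)))
            if std_err.max() < threshold:
                break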


sethtroisi commented on August 26, 2024

@lucasmaystre random.sample(range(M), 2) is clearly better, that was a silly oversight on my part.

> which is in some cases (marginally) more accurate. But also more expensive, since you need multiple matrix inversions.

I misread the "marginally" as applying to both statements (accuracy and cost).

I converted the script to a Colab (Colab is a Google-hosted, more magical version of Jupyter notebooks) and plotted the shift in ratings, std_err, and speed.

TL;DR: ep_pairwise gives very similar results but is 20-30x slower, which is too slow for my use case :/


lucasmaystre commented on August 26, 2024

@sethtroisi very nice. What you are doing can also be thought of as computing the Laplace approximation of the posterior distribution.

An alternative would be:

import numpy as np
import choix

mean, cov = choix.ep_pairwise(n_items, pairs, 0.1, model="logit")
std_err = np.sqrt(np.diagonal(cov))

This computes an approximate posterior using the EP algorithm, which is in some cases (marginally) more accurate, but also more expensive, since you need multiple matrix inversions.


lucasmaystre commented on August 26, 2024

Hi @wegel, that's a great question. Supposing that you have a budget of k comparisons, there are two ways to go about it:

  1. select all k comparisons before observing any outcomes;
  2. iteratively and adaptively select pairs of items to compare, based on the outcomes of previous comparisons.

For (1), there is some theory that suggests that you cannot do much better than selecting k pairs uniformly at random among all n² pairs (see the sketch after the list below).

For (2), a.k.a. the active learning setting, there are a number of approaches, but it is still very much an active area of research. Some alternatives:

  • In my opinion, the best approach is to greedily maximize the expected information gain, i.e., to follow the approach of this paper. But the cost of this is O(n³), so I'm not sure it'll work in your case. I have some code for this if you want to try it - let me know and I'll add you to the (private) repo.
  • Another approach is to use a search algorithm (such as Mergesort) as a "heuristic" to select pairs. This idea is explained in this paper (which is part of my PhD thesis).
  • This recent paper seems to provide another approach.
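
For reference, here is a minimal sketch of approach (1), assuming a hypothetical observe(i, j) that runs one comparison and returns the (winner, loser) pair:

    import random
    import choix

    def random_design(n_items, k, observe):
        # Sample k of the n*(n-1)/2 unordered pairs uniformly at random.
        all_pairs = [(i, j) for i in range(n_items) for j in range(i + 1, n_items)]
        data = [observe(i, j) for i, j in random.sample(all_pairs, k)]
        # Fit a Bradley-Terry model to the observed outcomes.
        return choix.ilsr_pairwise(n_items, data, alpha=0.1)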

Hope this helps!


wegel commented on August 26, 2024

Interesting. If I chose (1), how would I know (potentially using choix) that the randomly selected pairs were sufficient to calculate the rank of all items with good certainty? Would I just call choix.ilsr_pairwise and check whether it raises RuntimeError: Did not converge after 100 iterations, and if it did raise the error, maybe select a few other random pairs to compare? How can I determine that the ranking truthfulness has improved at each iteration in that case?

Intuitively, using something similar to Quicksort for this problem seems like a good fit, so I'll read your paper carefully (and try to understand how I would actually implement it). Thanks!


lucasmaystre commented on August 26, 2024

Regarding your first point, I don't think there is any ideal way to do this. In particular, convergence is not necessarily the right thing to look at. Instead, I would look at the following (a rough sketch of both checks follows the list):

  • stability: if I train the model on part of the data vs. all of the data, does the ranking change by much? (you can use choix.{footrule,kendall}_dist or look at the norm of the difference between parameter vectors)
  • predictive accuracy: splitting the data into two sets A and B, how well does a model trained on A predict the comparison outcomes in B?

In either case, it is not clear when to stop adding pairs - at some point you will probably just need to decide that the results are "good enough".
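
A rough sketch of both checks, assuming data is a list of (winner, loser) pairs and n_items is known (the 50/50 split and alpha=0.1 are arbitrary choices):

    import random
    import choix

    random.shuffle(data)
    half = len(data) // 2
    train, holdout = data[:half], data[half:]

    params_half = choix.ilsr_pairwise(n_items, train, alpha=0.1)
    params_full = choix.ilsr_pairwise(n_items, data, alpha=0.1)

    # Stability: does the ranking move much between half and full data?
    print("Kendall tau dist:", choix.kendalltau_dist(params_half, params_full))

    # Predictive accuracy: how often does the half-data model pick the
    # actual winner on the held-out comparisons?
    correct = sum(choix.probabilities([w, l], params_half)[0] > 0.5
                  for w, l in holdout)
    print("holdout accuracy:", correct / len(holdout))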


sethtroisi commented on August 26, 2024

Thanks for the links, I'll try it out. We are somewhat sensitive to speed: our 1.5M pairs over ~10K models already take several minutes to converge and invert.

A related question I've been curious about: what does penalty represent? Is there some reasonable way to choose it?


lucasmaystre commented on August 26, 2024

@sethtroisi penalty is simply the L2-regularization hyperparameter. In the inference functions exposed in the top-level package, it is called alpha.

In other words: if you minimize choix.opt.PairwiseFcts(pairs, penalty=x).objective (which you would do when calling choix.opt_pairwise(n_items, pairs, alpha=x)), then you are computing the maximum a posteriori estimate of the parameters, under a spherical Gaussian prior with variance 1/x.

If you can make a guess about the range of the parameters, you can use that to set penalty. Otherwise, you can estimate the best value using cross-validation or similar (a sketch follows). Informally, 0.1 seems like a decent default value :-)
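
For instance, a hedged sketch of picking alpha by holdout log-likelihood, assuming data is a list of (winner, loser) pairs and n_items is known (the candidate grid is arbitrary):

    import random
    import choix

    random.shuffle(data)
    half = len(data) // 2
    train, holdout = data[:half], data[half:]

    for alpha in (0.01, 0.05, 0.1, 0.2, 0.5):
        # MAP estimate under a spherical Gaussian prior with variance 1/alpha.
        params = choix.opt_pairwise(n_items, train, alpha=alpha)
        ll = choix.log_likelihood_pairwise(holdout, params)
        print(f"alpha={alpha}: holdout log-likelihood {ll:.2f}")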


sethtroisi commented on August 26, 2024

@lucasmaystre My math background is more in discrete math, so I'm Googling everything you say to try to understand it. Thanks again for following up.

ep_pairwise seems to always throw an error; I wrote up this simple test program to demonstrate it.

I'm using this to power the ratings page for Minigo. On this data set (~150,000 games) it ran for >10 minutes before I killed it. I tried with a smaller data set (~10,000 games), but it still didn't finish.


My params have a range of ~25:

  • with penalty 0.1, my std_err values range from 0.09 to 1.80
  • with penalty 0.05, my std_err values range from 0.11 to 2.19
  • with penalty 0.2, my std_err values range from 0.08 to 1.40

I guess I'll spend some time seeing what minimizes error on a holdout game set.


lucasmaystre commented on August 26, 2024

@sethtroisi regarding your test program: some of the pairs you generate on line 16 contain the same player twice. This throws ep_pairwise off. If you replace line 16 with:

players = random.sample(range(M), 2)  # samples two distinct players

I think it should work.

In general I would expect ep_pairwise to be significantly slower than what you used until now (by a factor of 10-100x). There are ways to speed this up, but I'm not sure it's worth it - simply using the inverse Hessian at the maximum-likelihood estimate of the parameters, as you have done so far, should be good enough in practice.

