Comments (6)

lucasmaystre commented on August 26, 2024

Yes, I think you got it, your code now looks correct 👍

from choix.

lucasmaystre commented on August 26, 2024

Hi @jaradc - let me see if I understand what you want to do.
You have n items (represented by rows in your table above), and each item is scored along m dimensions (the columns above). What you suggest doing is:

  1. rank the items by the value in each column. This would give you m rankings, each of size n.
  2. aggregate these rankings into a single "meta"-ranking.

You can definitely do this with choix. As you point out, you would need to relabel the items using consecutive integers. Then, you can use, e.g., the function opt_rankings. This will return a vector of parameters, one parameter for each item. The larger the value, the "better" the item. If you want a ranking, you can use:

print("ranking (worst to best):", np.argsort(params))

However, it is not currently possible to weight different rankings (e.g., if you think one column is more important than another and want it to have a bigger influence in the final ranking). You could achieve something similar by duplicating more important columns several times, but that's a bit of a hack.
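
The duplication workaround mentioned above could be sketched roughly as follows. This is only an illustration in plain Python: `repeat_rankings` and the integer weights are hypothetical names, not part of choix, and the resulting list would simply be passed as `data` to `opt_rankings`.

```python
def repeat_rankings(rankings, weights):
    """Repeat each ranking weights[i] times (hypothetical helper, not a choix function)."""
    if len(rankings) != len(weights):
        raise ValueError("need exactly one weight per ranking")
    weighted = []
    for ranking, weight in zip(rankings, weights):
        # A weight of 2 means the ranking is counted twice.
        weighted.extend([list(ranking) for _ in range(weight)])
    return weighted


# Two rankings of 3 items; the first column is assumed to matter twice as much.
rankings = [[0, 1, 2], [2, 0, 1]]
weighted = repeat_rankings(rankings, weights=[2, 1])
print(weighted)  # [[0, 1, 2], [0, 1, 2], [2, 0, 1]]
```

Only whole-number weights can be expressed this way, which is part of why it remains a hack.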

jaradc commented on August 26, 2024

Thanks for taking the time to answer my question. I know you're probably super-busy with your work.

Based on your answer, I am trying to apply it.

  1. When you say "consecutive integers", would this be equivalent to saying a "rank" along axis 0?

For example:

>>> df
  Item         1        2     3     4         5      6       7     8
0    A  369248.0  12757.0  3.45  0.83  10569.60  104.0  101.63  0.82
1    B   35621.0    245.0  0.69  0.90    219.74    3.0   73.25  1.22
>>> data = df.loc[:, '1':'8'].rank(axis=0, method='dense', ascending=False).astype('int64')
>>> data
   1  2  3  4  5  6  7  8
0  1  1  1  2  1  1  1  2
1  2  2  2  1  2  2  2  1

  2. If correct so far, then I use opt_rankings:

>>> choix.opt_rankings(n_items=10, data=data.values.tolist())
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

I'm unsure what n_items is in the opt_rankings function. n_items is in a lot of other choix functions and my best guess is that it is asking for how many items to return? The documentation says "number of distinct items" but I'm still not clear.

An array with all zeros is returned.

I thought the sample rows might simply be too different from each other, so I made them similar to test this idea:

>>> df1
  Item       1      2     3     4        5    6       7     8
0    A  369248  12752  3.45  0.83  10528.0  109  101.63  0.82
1    B  369248  12757  3.45  0.82  10569.6  104  121.03  0.89
>>> data1 = df1.loc[:, '1':'8'].rank(axis=0, method='dense', ascending=False).astype('int64')
>>> choix.opt_rankings(n_items=10, data=data.values.tolist())
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

It still returns zeros. Even if n_items is large:

>>> choix.opt_rankings(n_items=10000, data=data).sum()
0.0
>>> choix.opt_rankings(n_items=10000, data=data1).sum()
0.0

I have no doubt I'm just not understanding something. I'm just not sure what.

lucasmaystre commented on August 26, 2024

Hi @jaradc ,

> When you say "consecutive integers", would this be equivalent to saying a "rank" along axis 0?

No, data should contain a collection of ranked lists of items, not the ranks of the items. E.g., imagine you have 4 items with scores [0.2, 1.8, -1, 4.5]. This would lead to the ranking 3 > 1 > 0 > 2, and accordingly you should set data[i] = [3, 1, 0, 2] before calling opt_rankings. The array of ranks is a different concept, and would be something like [2, 3, 1, 4] (each item's position when the scores are sorted in increasing order) or similar.

Furthermore, data should be a collection of rankings (for example, a list of rankings). If you have a single ranking, you should wrap it inside a list first.
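
To make the distinction concrete, here is a minimal sketch in plain Python using the four scores from the example. The variable names are illustrative only, and it assumes that a higher score is better.

```python
scores = [0.2, 1.8, -1, 4.5]

# Ranking: item indices ordered from best (highest score) to worst.
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

# Ranks: for each item, its position when scores are sorted ascending (1 = lowest).
ascending = sorted(range(len(scores)), key=lambda i: scores[i])
ranks = [0] * len(scores)
for position, item in enumerate(ascending, start=1):
    ranks[item] = position

# opt_rankings expects a collection of rankings, so wrap the single ranking.
data = [ranking]

print(ranking)  # [3, 1, 0, 2]
print(ranks)    # [2, 3, 1, 4]
print(data)     # [[3, 1, 0, 2]]
```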

n_items is, in your case, the number of rows. It's just to help choix understand how many items there are. In the case of complete rankings, it could be inferred from the size of the ranking, but in some cases it's useful to declare it explicitly (e.g., when the comparisons are sparse, and some items may have never been compared).
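
As a rough illustration of why declaring the count can matter, the sketch below checks a set of partial rankings against a declared number of items. `check_rankings` is a hypothetical helper written for this example, not a choix function.

```python
def check_rankings(n_items, rankings):
    """Validate item ids and return the items that never appear (hypothetical helper)."""
    seen = set()
    for ranking in rankings:
        for item in ranking:
            if not 0 <= item < n_items:
                raise ValueError(f"item {item} is outside 0..{n_items - 1}")
            seen.add(item)
    return sorted(set(range(n_items)) - seen)


# Five items in total, but only two partial rankings were observed.
data = [[0, 2], [2, 3]]
never_compared = check_rankings(n_items=5, rankings=data)
print(never_compared)  # [1, 4]
```

Here items 1 and 4 never occur in the data, so the total of 5 could not be recovered from the rankings alone.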

I encourage you to go through the example notebook, https://github.com/lucasmaystre/choix/blob/master/notebooks/intro-pairwise.ipynb. It is for pairwise comparisons, but that's just a special case of rankings, when the number of items in every ranking is exactly 2. You may want to start from that notebook and build upon it to suit your use case.

jaradc commented on August 26, 2024

Thanks for this vital clarification.

So in your scores example above ([0.2, 1.8, -1, 4.5]), if higher numbers mean "better performance", it's really a reverse argsort.

>>> a = np.array([0.2, 1.8, -1, 4.5])
>>> np.argsort(a)[::-1]
array([3, 1, 0, 2], dtype=int64)

So, assuming that high values are "good" in every column (to keep it simple):

>>> data
          1        2     3     4         5      6       7     8
0  369248.0  12757.0  3.45  0.83  10569.60  104.0  101.63  0.82
1   35621.0    245.0  0.69  0.90    219.74    3.0   73.25  1.22

>>> argsort_data = [data[col].argsort()[::-1].values.tolist() for col in data]

>>> argsort_data
[[0, 1], [0, 1], [0, 1], [1, 0], [0, 1], [0, 1], [0, 1], [1, 0]]

# n_items is, in your case, the number of rows
>>> len(data)
2 

>>> choix.opt_rankings(n_items=len(data), data=argsort_data)
array([ 0.54930578, -0.54930578])

>>> result = choix.opt_rankings(len(data), data=argsort_data)

>>> print("ranking (worst to best):", np.argsort(result))
ranking (worst to best): [1 0]
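
A possible follow-up step, sketched in plain Python, maps those positions back to the item labels. The `labels` list is an assumption based on the Item column (A, B) of the earlier table, and the parameter values are copied from the output above.

```python
labels = ["A", "B"]                  # assumed row order of the table
params = [0.54930578, -0.54930578]   # values copied from the output above

# Same order as np.argsort(params): indices from smallest to largest parameter.
worst_to_best = sorted(range(len(params)), key=lambda i: params[i])
best_to_worst = [labels[i] for i in reversed(worst_to_best)]

print(worst_to_best)  # [1, 0]
print(best_to_worst)  # ['A', 'B']
```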

I reviewed all of the workbooks before asking here, but none of them looked similar to my problem, so I didn't know how to apply them. Mainly, n_items and params were really unclear to me without more examples I could learn from and apply to my use case.

jaradc commented on August 26, 2024

Thank you for helping me get started with Choix! And for providing this library.
