I'm trying to determine if Choix can be used to compare two rows of data that look lik

Use Choix for table-like heterogenous data for comparison? about choix HOT 6 CLOSED

jaradc commented on August 26, 2024

Use Choix for table-like heterogenous data for comparison?

from choix.

Comments (6)

lucasmaystre commented on August 26, 2024

Yes, I think you got it, your code now looks correct 👍

lucasmaystre commented on August 26, 2024

Hi @jaradc - let me see if I understand what you want to do.
You have n items (represented by rows in your table above), and each item is scored along m dimensions (the columns above). What you suggest doing is:

1. Rank the items by the value in each column. This would give you m rankings, each of size n.
2. Aggregate these rankings into a single "meta"-ranking.

You can definitely do this with choix. As you point out, you would need to relabel the items using consecutive integers. Then, you can use, e.g., the function opt_rankings. This will return a vector of parameters, one parameter for each item. The larger the value, the "better" the item. If you want a ranking, you can use:

print("ranking (worst to best):", np.argsort(params))
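
The relabeling step and the final argsort can be sketched in plain Python as follows. The item labels and the parameter values are made up for illustration, and `label_to_id` and `id_to_label` are hypothetical helper names, not part of choix.

```python
# Hypothetical item labels; items are relabeled as 0..n_items-1.
labels = ["A", "B", "C"]

# Map each label to a consecutive integer and back.
label_to_id = {label: i for i, label in enumerate(labels)}
id_to_label = {i: label for label, i in label_to_id.items()}

# A ranking written with labels (best first), translated to integer ids.
ranking_by_label = ["C", "A", "B"]
ranking_by_id = [label_to_id[label] for label in ranking_by_label]

# Made-up parameter vector standing in for the output of opt_rankings
# (one value per item, larger means "better").
params = [0.1, -0.7, 0.6]

# Plain-Python equivalent of np.argsort(params): ids from worst to best.
order = sorted(range(len(params)), key=lambda i: params[i])
worst_to_best = [id_to_label[i] for i in order]
print(worst_to_best)
```

Keeping the mapping around makes it easy to translate the integer ids back into the original item names once the parameters are estimated.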

However, it is not currently possible to weight different rankings (e.g., if you think one column is more important than another and want it to have a bigger influence in the final ranking). You could achieve something similar by duplicating more important columns several times, but that's a bit of a hack.
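
As a rough sketch of the duplication hack, assuming integer weights are good enough, the rankings can be repeated before they are passed on. `repeat_rankings` is a hypothetical helper written for this illustration, not a choix function, and the rankings and weights are made up.

```python
def repeat_rankings(rankings, weights):
    """Repeat rankings[i] weights[i] times (positive integer weights only)."""
    if len(rankings) != len(weights):
        raise ValueError("need one weight per ranking")
    weighted = []
    for ranking, weight in zip(rankings, weights):
        if not isinstance(weight, int) or weight < 1:
            raise ValueError("weights must be positive integers")
        # Copy the list so repeated entries do not share the same object.
        weighted.extend(list(ranking) for _ in range(weight))
    return weighted


# Three column rankings over 3 items (best first); the first column counts triple.
rankings = [[0, 1, 2], [2, 1, 0], [1, 0, 2]]
weighted = repeat_rankings(rankings, weights=[3, 1, 1])
print(len(weighted))
```

The resulting list could then be used as `data` in place of the original rankings. This only approximates weighting, since it cannot express fractional weights.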

jaradc commented on August 26, 2024

Thanks for taking the time to answer my question. I know you're probably super-busy with your work.

Based on your answer, I am trying to apply it.

When you say "consecutive integers", would this be equivalent to saying a "rank" along axis 0?

For example:

>>> df
  Item         1        2     3     4         5      6       7     8
0    A  369248.0  12757.0  3.45  0.83  10569.60  104.0  101.63  0.82
1    B   35621.0    245.0  0.69  0.90    219.74    3.0   73.25  1.22
>>> data = df.loc[:, '1':'8'].rank(axis=0, method='dense', ascending=False).astype('int64')
>>> data
   1  2  3  4  5  6  7  8
0  1  1  1  2  1  1  1  2
1  2  2  2  1  2  2  2  1

If correct so far, then I use opt_rankings:

>>> choix.opt_rankings(n_items=10, data=data.values.tolist())
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

I'm unsure what n_items is in the opt_rankings function. n_items is in a lot of other choix functions and my best guess is that it is asking for how many items to return? The documentation says "number of distinct items" but I'm still not clear.

An array with all zeros is returned.

I thought the two rows of sample data might be too different from each other, so I made them similar to test this idea:

>>> df1
  Item       1      2     3     4        5    6       7     8
0    A  369248  12752  3.45  0.83  10528.0  109  101.63  0.82
1    B  369248  12757  3.45  0.82  10569.6  104  121.03  0.89
>>> data1 = df1.loc[:, '1':'8'].rank(axis=0, method='dense', ascending=False).astype('int64')
>>> choix.opt_rankings(n_items=10, data=data.values.tolist())
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

It still returns zeros. Even if n_items is large:

>>> choix.opt_rankings(n_items=10000, data=data).sum()
0.0
>>> choix.opt_rankings(n_items=10000, data=data1).sum()
0.0

I have no doubt I'm just not understanding something. I'm just not sure what.

lucasmaystre commented on August 26, 2024

Hi @jaradc ,

> When you say "consecutive integers", would this be equivalent to saying a "rank" along axis 0?

No, data should contain a collection of ranked lists of items, not the ranks of the items. E.g., imagine you have 4 items with scores [0.2, 1.8, -1, 4.5]. This leads to the ranking 3 > 1 > 0 > 2, so you should set data[i] = [3, 1, 0, 2] before calling opt_rankings. The array of ranks is a different concept, and would be something like [2, 3, 1, 4] or similar.

Furthermore, data should be a collection of rankings (for example, a list of rankings). If you have a single ranking, you should wrap it inside a list first.
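
To make the distinction concrete, here is a small plain-Python sketch that computes both objects from the scores in the example. The variable names are chosen for illustration, and the ranks are taken in ascending order of score (1 = lowest), which is one of several possible conventions.

```python
scores = [0.2, 1.8, -1, 4.5]

# Ranked list of items: item indices ordered from best (highest score) to worst.
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

# Ranks: for each item, its position in ascending order of score (1 = lowest).
ascending = sorted(range(len(scores)), key=lambda i: scores[i])
ranks = [0] * len(scores)
for position, item in enumerate(ascending, start=1):
    ranks[item] = position

print(ranking)  # [3, 1, 0, 2]
print(ranks)    # [2, 3, 1, 4]
```

The first list answers "which item comes first, second, ...", while the second answers "what position does item i hold". Only the first is a ranking in the sense used here.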

n_items is, in your case, the number of rows. It's just to help choix understand how many items there are. In the case of complete rankings, it could be inferred from the size of the ranking, but in some cases it's useful to declare it explicitly (e.g., when the comparisons are sparse, and some items may have never been compared).
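
A quick sanity check along these lines can catch the ranks-versus-rankings mix-up before any fitting is done. `check_rankings` is a hypothetical helper written for this sketch, and whether choix performs similar checks internally is not something this example assumes.

```python
def check_rankings(n_items, rankings):
    """Raise ValueError unless every ranking lists distinct ids in 0..n_items-1."""
    for idx, ranking in enumerate(rankings):
        if len(set(ranking)) != len(ranking):
            raise ValueError(f"ranking {idx} repeats an item: {ranking}")
        for item in ranking:
            if not 0 <= item < n_items:
                raise ValueError(
                    f"ranking {idx} has id {item} outside 0..{n_items - 1}"
                )


# Complete rankings over 2 items pass the check.
check_rankings(n_items=2, rankings=[[0, 1], [1, 0]])

# Sparse case: 4 items are declared, but only items 0 and 1 were ever compared.
check_rankings(n_items=4, rankings=[[0, 1]])

# A row of ranks (as in the earlier attempt) is not a valid ranking.
try:
    check_rankings(n_items=2, rankings=[[1, 1, 1, 2, 1, 1, 1, 2]])
except ValueError as err:
    problem = str(err)
    print(problem)
```

The sparse case shows why declaring `n_items` explicitly can matter: items 2 and 3 exist but appear in no ranking, so their number could not be inferred from the data alone.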

I encourage you to go through the example notebook, https://github.com/lucasmaystre/choix/blob/master/notebooks/intro-pairwise.ipynb. It is for pairwise comparisons, but that's just a special case of rankings, where the number of items in every ranking is exactly 2. You may want to start from that notebook and build upon it to suit your use case.

jaradc commented on August 26, 2024

Thanks for this vital clarification.

So in your scores example above ([0.2, 1.8, -1, 4.5]), if higher numbers mean "better performance", it's really a reverse argsort.

>>> a = np.array([0.2, 1.8, -1, 4.5])
>>> np.argsort(a)[::-1]
array([3, 1, 0, 2], dtype=int64)

So, assuming that higher values are "better" in every column (to keep it simple):

>>> data
          1        2     3     4         5      6       7     8
0  369248.0  12757.0  3.45  0.83  10569.60  104.0  101.63  0.82
1   35621.0    245.0  0.69  0.90    219.74    3.0   73.25  1.22

>>> argsort_data = [data[col].argsort()[::-1].values.tolist() for col in data]

>>> argsort_data
[[0, 1], [0, 1], [0, 1], [1, 0], [0, 1], [0, 1], [0, 1], [1, 0]]

# n_items is, in your case, the number of rows
>>> len(data)
2 

>>> choix.opt_rankings(n_items=len(data), data=argsort_data)
array([ 0.54930578, -0.54930578])

>>> result = choix.opt_rankings(len(data), data=argsort_data)

>>> print("ranking (worst to best):", np.argsort(result))
ranking (worst to best): [1 0]
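
If some columns are better when lower, the same idea can be sketched with a per-column direction flag. This is a plain-Python illustration: `column_rankings` and `higher_is_better` are hypothetical names, and treating column "8" as lower-is-better is an assumption made only for this example.

```python
def column_rankings(columns, higher_is_better):
    """Turn per-column scores into ranked lists of row indices, best first."""
    rankings = []
    for name, values in columns.items():
        best_first = sorted(
            range(len(values)),
            key=lambda i: values[i],
            reverse=higher_is_better[name],  # descending when higher is better
        )
        rankings.append(best_first)
    return rankings


# Two columns taken from the table above.
columns = {"1": [369248.0, 35621.0], "8": [0.82, 1.22]}
higher_is_better = {"1": True, "8": False}

rankings = column_rankings(columns, higher_is_better)
print(rankings)  # [[0, 1], [0, 1]]
```

The resulting list of rankings has the same shape as `argsort_data` above, so it could be passed to `opt_rankings` in the same way.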

I reviewed the notebooks (all of them) before asking here, but they didn't look similar to my problem, so I didn't know how to apply them. Mainly, n_items and params were really unclear to me without more examples I could learn from and apply to my use case.

jaradc commented on August 26, 2024

Thank you for helping me get started with Choix! And for providing this library.
