sherbold / autorank

Home Page: https://sherbold.github.io/autorank
License: Apache License 2.0
In [autorank/autorank.py](https://github.com/sherbold/autorank/blob/master/autorank/autorank.py), line 570:
Should "result.rankdf.index[0], result.rankdf.index[1]" be "result.rankdf.index[1], result.rankdf.index[0]"?
Thanks
Given the following code, I tried to reproduce the graph shown in the README.md:
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from autorank import autorank, plot_stats, create_report, latex_table

df = pd.DataFrame(
    {'adaboost': [0.6029411764705882, 0.6838235294117647, 0.6323529411764706,
                  0.5955882352941176, 0.6102941176470589, 0.5882352941176471,
                  0.5808823529411765, 0.6838235294117647, 0.625,
                  0.5735294117647058],
     'bagging': [0.625, 0.6985294117647058, 0.6911764705882353,
                 0.6985294117647058, 0.6838235294117647, 0.7279411764705882,
                 0.7426470588235294, 0.6911764705882353, 0.7573529411764706,
                 0.7058823529411765],
     'decision_tree': [0.5882352941176471, 0.6691176470588235, 0.6102941176470589,
                       0.5955882352941176, 0.5661764705882353, 0.5735294117647058,
                       0.6176470588235294, 0.7132352941176471, 0.5955882352941176,
                       0.6544117647058824],
     'random_forest': [0.6176470588235294, 0.7205882352941176, 0.6397058823529411,
                       0.6470588235294118, 0.6691176470588235, 0.6911764705882353,
                       0.7205882352941176, 0.6985294117647058, 0.6691176470588235,
                       0.6691176470588235]}
)

result = autorank(df, alpha=0.05, verbose=False)
plot_stats(result)
plt.show()
```
The resulting graph looks nothing like it.
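A likely explanation, hedged and based on the documented behavior: plot_stats only draws the critical-distance diagram from the README when autorank took the non-parametric (Friedman/Nemenyi) branch; for normal, homoscedastic data it draws a confidence-interval plot instead. The selected branch can be inspected on the result object; the field names below follow the documented RankResult tuple:

```python
# Inspect which tests autorank selected for this data; 'friedman' with
# post-hoc 'nemenyi' yields the CD diagram, parametric tests yield a CI plot.
print(result.omnibus)     # omnibus test used, e.g. 'anova' or 'friedman'
print(result.posthoc)     # post-hoc test used, e.g. 'tukeyhsd' or 'nemenyi'
print(result.all_normal)  # whether all populations passed the normality check
```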
Great library, thanks for creating it!
I have encountered a minor issue when printing the report about the statistical analysis.
From my analysis I receive the following output:
```python
# this is how I call autorank
result = autorank(df, alpha=0.05, order="descending", verbose=True)  # (1)
create_report(result)  # (2)
```

Output (1):

```
Fail to reject null hypothesis that data is normal for column $B_{c}$ (p=0.140985>=0.007143)
Fail to reject null hypothesis that data is normal for column $B_{h}$ (p=0.033133>=0.007143)
Fail to reject null hypothesis that data is normal for column $B_{l}$ (p=0.036612>=0.007143)
Rejecting null hypothesis that data is normal for column $B_{r}$ (p=0.000899<0.007143) <- rejects for $B_{r}$
Fail to reject null hypothesis that data is normal for column $O$ (p=0.065865>=0.007143)
Fail to reject null hypothesis that data is normal for column $\hat{H}$ (p=0.937819>=0.007143)
Fail to reject null hypothesis that data is normal for column $\hat{M}$ (p=0.009967>=0.007143)
...
```

Output (2):

```
The statistical analysis was conducted for 7 populations with 22 paired samples.
The family-wise significance level of the tests is alpha=0.050.
We rejected the null hypothesis that the population is normal for the population $B_{c}$ (p=0.001). Therefore, we assume that not all populations are normal.
...
```
There seems to be an ordering problem when create_report() uses result.rankdf.index[i]: the columns of the dataframe are no longer in the order they were in when the results of the Shapiro-Wilk test were computed, so the wrong population ($B_{c}$) is printed (it should be $B_{r}$). I could not yet figure out where exactly this re-ordering occurs and breaks the report function; a minimal check of the suspicion is sketched below. It would be great if you could have a look at it.
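For triage, a minimal check of the suspected mismatch, assuming the result tuple exposes the Shapiro-Wilk p-values as pvals_shapiro in the original column order while rankdf is sorted by rank:

```python
# The normality p-values follow the original column order of df ...
for col, pval in zip(df.columns, result.pvals_shapiro):
    print(col, pval)

# ... while create_report iterates over the rank-sorted index instead.
print(list(result.rankdf.index))
```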
Cheers!
In accordance with Kruschke and Liddell, 2018, the ROPE is calculated correctly (d = ROPE / STD = 0.1, therefore ROPE = 0.1 * STD):
```python
# half the size of a small effect size following Kruschke (2018)
if all_normal:
    cur_rope = rope*_pooled_std(reordered_data.iloc[:, i], reordered_data.iloc[:, j])
else:
    cur_rope = rope*_pooled_mad(reordered_data.iloc[:, i], reordered_data.iloc[:, j])
```
However, the description does not match:
> For normal data, the ROPE is defined as 0.1*d, where d is the effect size (Cohen's d).
> For non-normal data, the ROPE is defined as 0.1*gamma, where gamma is the effect size (Akinshin's gamma).
In the description, I guess it should be "0.1*STD" and "0.1*MAD"?
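For reference, a minimal sketch of what the two pooling helpers presumably compute; the exact pooling used by autorank's _pooled_std and _pooled_mad is an assumption here:

```python
import numpy as np

def pooled_std(x, y):
    # Classic pooled standard deviation: the STD in ROPE = 0.1 * STD.
    nx, ny = len(x), len(y)
    var = ((nx - 1) * np.var(x, ddof=1) + (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2)
    return np.sqrt(var)

def pooled_mad(x, y):
    # The same pooling applied to the median absolute deviation:
    # the MAD in ROPE = 0.1 * MAD.
    mad = lambda a: np.median(np.abs(a - np.median(a)))
    nx, ny = len(x), len(y)
    return np.sqrt(((nx - 1) * mad(x) ** 2 + (ny - 1) * mad(y) ** 2) / (nx + ny - 2))
```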
Hello,
thanks for your nice tool.
I have a problem using the Bayesian results, because neither your paper nor the original paper describes them in the form you provide.
Please add more detail, or provide a video that walks through one Bayesian example, so that users can understand what the CI, y, P values, Magnitude, and decision columns are.
Normally, people who use your tool don't know statistics, and putting these words in the output without explaining them is useless, at least for me!
I saw that a report is provided, but it is not enough to understand the table.
Thanks
The problem happens for the following example code:
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.rc('text', usetex=False)
from autorank import autorank, plot_stats, create_report, latex_table

pd.set_option('display.max_columns', 7)

raw = np.array([[0.61874876, 0.61219062],
                [0.89017217, 0.90443957],
                [0.62806089, 0.63185734],
                [0.96929193, 0.97255931],
                [0.87340513, 0.95460121],
                [0.84087749, 0.94438674],
                [0.9863088 , 0.98558508],
                [0.94314842, 0.64510605],
                [0.9862604 , 0.99173966]])
data = pd.DataFrame()
data['pop_0'] = raw[:, 0]
data['pop_1'] = raw[:, 1]

result = autorank(data, alpha=0.05, verbose=False, approach="bayesian", rope=1.0, rope_mode="absolute")
create_report(result)
```
It always outputs:

```
We found significant and practically relevant differences between the populations pop_1 (MD=0.944+-0.190, MAD=0.061) and pop_0 (MD=0.890+-0.184, MAD=0.117).
```

even though the ROPE is 1.0. The reason is that the condition in lines 447 to 454 of d05bffb expects a set that contains only 'inconclusive' or 'equal', but the decision column also contains 'NA':

```python
>>> set(result.rankdf['decision'])
{'NA', 'equal'}
```
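A hedged sketch of a possible fix: exclude the 'NA' entries (presumably the diagonal of the decision matrix) before checking whether all remaining pairwise decisions are 'equal' or 'inconclusive':

```python
# Drop 'NA' before testing the remaining pairwise decisions; for the example
# above this leaves {'equal'}, so no "significant and practically relevant
# differences" should be reported.
decisions = set(result.rankdf['decision']) - {'NA'}
print(decisions.issubset({'equal', 'inconclusive'}))  # True
```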
Thank you for your work. Just as the title says: is it possible to show the average ranking for each classifier above each line?
Just as the title says: if we set order="ascending", it does not change the order of the x-axis...
Hi sherbold, great work on making statistical testing just a bit simpler.
I have the feeling that the flowchart in the README.md (and the documentation) is mixed up: the 3rd and 4th boxes on the bottom row appear to be swapped. They are currently not in line with the textual description and the code.
Thanks for this awesome tool, but I am wondering if it is possible to let the user decide whether they want a parametric or a nonparametric test? That would add some flexibility to this function. For example, I want to get a CD plot for my data, but it always returns a CI plot.
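For what it is worth, the force_mode parameter that shows up in a traceback further below suggests that newer versions already allow this; the following usage is an assumption based on that parameter name and the documentation:

```python
import matplotlib.pyplot as plt
from autorank import autorank, plot_stats

# Hedged example: force the non-parametric (Friedman/Nemenyi) branch so that
# plot_stats draws a CD diagram instead of a CI plot; assumes force_mode
# accepts 'nonparametric'. df is the user's dataframe (columns = treatments).
result = autorank(df, alpha=0.05, verbose=False, force_mode='nonparametric')
plot_stats(result)
plt.show()
```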
I've noticed that the scipy docs indicate that median_absolute_deviation is now deprecated and recommend using median_abs_deviation instead. Computing the MAD of a sample array directly with numpy seems to indicate that the newer function implements the calculation correctly, while the older one returns a different value (for a reason I have not investigated).
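The difference is most likely the default scale factor: the deprecated median_absolute_deviation scaled by the normal-consistency constant (about 1.4826) by default, while median_abs_deviation defaults to scale=1.0 and therefore matches the plain numpy computation. A small comparison, assuming scipy >= 1.5 where median_abs_deviation exists:

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 5.0, 8.0])

# Plain MAD computed directly with numpy.
mad_numpy = np.median(np.abs(x - np.median(x)))

# Matches the numpy value: scale defaults to 1.0.
mad_new = stats.median_abs_deviation(x)

# Reproduces what the deprecated function returned by default
# (scaled for consistency with the normal distribution).
mad_scaled = stats.median_abs_deviation(x, scale='normal')

print(mad_numpy, mad_new, mad_scaled)  # 2.0 2.0 ~2.9652
```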
Dear Dr. Herbold,
I would like to submit a pull request for the Autorank package in order to enhance its maintainability. Prior to doing so, I would appreciate your feedback and opinions regarding the proposed changes. As a first step, I suggest the following pre-commit configuration:
```yaml
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.4.0
    hooks:
      - id: end-of-file-fixer
      - id: trailing-whitespace
  - repo: https://github.com/psf/black
    rev: 23.3.0
    hooks:
      - id: black
```
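With this file saved as .pre-commit-config.yaml in the repository root, the hooks would be activated with `pre-commit install` and can be run over the entire code base with `pre-commit run --all-files`.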
Sincerely,
Alireza Aghamohammadi
I would like to suggest adopting the "random_state" parameter from baycomp (janezd/baycomp@4cffdb3), so that results can be reproduced.
The only problem is that this would require replacing the PyPI version of baycomp with the version on GitHub. What do you think?
I am happy to create a pull request if needed.
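A hypothetical example of what a reproducible call could look like, assuming the GitHub version's two_on_multiple accepts the random_state keyword added in the referenced commit (the PyPI release does not):

```python
import numpy as np
import baycomp

# Toy scores of two classifiers on 20 data sets.
rng = np.random.default_rng(0)
x = rng.random(20)
y = rng.random(20)

# random_state is the proposed keyword; only the GitHub version of
# baycomp would accept it.
p_left, p_rope, p_right = baycomp.two_on_multiple(x, y, rope=0.1, random_state=42)
print(p_left, p_rope, p_right)
```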
When I run

```python
autorank(df, alpha=0.05, verbose=True)
```

on the following dataframe:
| in | F1 | F2 | F3 |
|---|---|---|---|
| 1 | -0.108493 | -0.182625 | -0.217593 |
| 2 | 0.100764 | 0.170489 | -0.0353945 |
| 3 | 0.578502 | 0.60399 | 0.592368 |
| 4 | 0.690802 | 0.746536 | 0.70053 |
| 5 | 0.280736 | 0.269668 | 0.278789 |
| 6 | 0.43442 | 0.371632 | 0.425941 |
| 7 | 0.568025 | 0.498394 | 0.458875 |
| 8 | 0.952764 | 0.95084 | 0.949877 |
| 9 | 0.339497 | 0.211892 | 0.337175 |
| 10 | -0.334314 | 1.03841e-14 | 1.03841e-14 |
| 11 | 0.7448 | 0.711472 | 0.74447 |
| 12 | 0.946012 | 0.945422 | 0.950613 |
| 13 | 0.317754 | 0.509453 | 0.354345 |
where `in` is the index of the dataframe, I get the following error:
```
Fail to reject null hypothesis that data is normal for column F1 (p=0.799397>=0.016667)
Fail to reject null hypothesis that data is normal for column F2 (p=0.910572>=0.016667)
Fail to reject null hypothesis that data is normal for column F3 (p=0.754639>=0.016667)
Using Bartlett's test for homoscedacity of normally distributed data
Fail to reject null hypothesis that all variances are equal (p=0.946143>=0.050000)
Traceback (most recent call last):
  File "/home/s3092593/afaslib/autofolio/generate_closed_gap_figures.py", line 69, in <module>
    res_without_fc = autorank(df_without_fc, alpha=0.05, verbose=True)
  File "/home/s3092593/miniconda3/envs/torch/lib/python3.10/site-packages/autorank/autorank.py", line 278, in autorank
    res = rank_multiple_normal_homoscedastic(data, alpha, verbose, order, effect_size, force_mode)
  File "/home/s3092593/miniconda3/envs/torch/lib/python3.10/site-packages/autorank/_util.py", line 254, in rank_multiple_normal_homoscedastic
    anova = AnovaRM(stacked_data, 'result', 'id', within=['treatment'])
  File "/home/s3092593/miniconda3/envs/torch/lib/python3.10/site-packages/statsmodels/stats/anova.py", line 496, in __init__
    if not data.equals(data.drop_duplicates(subset=[subject] + within)):
  File "/home/s3092593/miniconda3/envs/torch/lib/python3.10/site-packages/pandas/util/_decorators.py", line 331, in wrapper
    return func(*args, **kwargs)
  File "/home/s3092593/miniconda3/envs/torch/lib/python3.10/site-packages/pandas/core/frame.py", line 6661, in drop_duplicates
    duplicated = self.duplicated(subset, keep=keep)
  File "/home/s3092593/miniconda3/envs/torch/lib/python3.10/site-packages/pandas/core/frame.py", line 6795, in duplicated
    raise KeyError(Index(diff))
KeyError: Index(['id'], dtype='object')
```
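Judging from the traceback, stacked_data ends up without an 'id' column. A likely cause is the named index ('in'): data.stack().reset_index() would then produce a column named 'in' rather than the default name that gets renamed to 'id'. If so, a minimal workaround (a hedged sketch, not verified against this data) is to drop the index name before calling autorank:

```python
# Workaround sketch: remove the index name so that autorank's internal
# stack()/reset_index() produces the column names it expects.
df_without_fc = df_without_fc.rename_axis(None)   # drop the 'in' index name
# or equivalently: df_without_fc = df_without_fc.reset_index(drop=True)
res_without_fc = autorank(df_without_fc, alpha=0.05, verbose=True)
```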
Hi!
I ran autorank to rank algorithms, and the last sentence of the report is: "Based on the post-hoc Nemenyi test, we assume that there are no significant differences within the following groups: sobel and random; random and guidedbackprop; guidedbackprop and smoothgrad; smoothgrad and rise; rise, scorecam, and layercam. All other differences are significant."
However, this is not reflected in the plot (see figure). It seems that the plot function has a problem.
When I tried running the example at https://sherbold.github.io/autorank/ I got an attribute error:

```
AttributeError: module 'scipy.stats' has no attribute 'median_absolute_deviation'
```

I was running scipy v1.1.0; upgrading scipy to 1.4.1 fixed the issue. Perhaps you could consider specifying minimum version requirements for your dependencies in setup.py, e.g. "scipy>=1.4" or whatever the appropriate version is.
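A sketch of what such pinning could look like in setup.py; the scipy bound follows the suggestion above, while the other packages and lower bounds are placeholders that would need to be checked against the features autorank actually uses:

```python
# Hypothetical excerpt of setup.py with minimum dependency versions.
from setuptools import setup

setup(
    name='autorank',
    install_requires=[
        'numpy>=1.16',
        'pandas>=0.24',
        'scipy>=1.4',        # per the report above
        'statsmodels>=0.9',
        'matplotlib>=3.0',
    ],
)
```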