sherbold / autorank

Home Page: https://sherbold.github.io/autorank
License: Apache License 2.0
In [autorank/autorank.py](https://github.com/sherbold/autorank/blob/master/autorank/autorank.py), line 570:
Should "result.rankdf.index[0], result.rankdf.index[1]" be "result.rankdf.index[1], result.rankdf.index[0]"?
Thanks
Given the following code, I tried to reproduce the graph shown in the README.md:
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from autorank import autorank, plot_stats, create_report, latex_table

df = pd.DataFrame(
    {'adaboost': [0.6029411764705882, 0.6838235294117647, 0.6323529411764706,
                  0.5955882352941176, 0.6102941176470589, 0.5882352941176471,
                  0.5808823529411765, 0.6838235294117647, 0.625,
                  0.5735294117647058],
     'bagging': [0.625, 0.6985294117647058, 0.6911764705882353,
                 0.6985294117647058, 0.6838235294117647, 0.7279411764705882,
                 0.7426470588235294, 0.6911764705882353, 0.7573529411764706,
                 0.7058823529411765],
     'decision_tree': [0.5882352941176471, 0.6691176470588235, 0.6102941176470589,
                       0.5955882352941176, 0.5661764705882353, 0.5735294117647058,
                       0.6176470588235294, 0.7132352941176471, 0.5955882352941176,
                       0.6544117647058824],
     'random_forest': [0.6176470588235294, 0.7205882352941176, 0.6397058823529411,
                       0.6470588235294118, 0.6691176470588235, 0.6911764705882353,
                       0.7205882352941176, 0.6985294117647058, 0.6691176470588235,
                       0.6691176470588235]}
)

result = autorank(df, alpha=0.05, verbose=False)
plot_stats(result)
plt.show()
```
The resulting graph looks nothing like it.
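A likely explanation, hedged and based on the documented behavior: plot_stats only draws the critical-distance diagram from the README when autorank took the non-parametric (Friedman/Nemenyi) branch; for normal, homoscedastic data it draws a confidence-interval plot instead. The selected branch can be inspected on the result object; the field names below follow the documented RankResult tuple:

```python
# Inspect which tests autorank selected for this data; 'friedman' with
# post-hoc 'nemenyi' yields the CD diagram, parametric tests yield a CI plot.
print(result.omnibus)     # omnibus test used, e.g. 'anova' or 'friedman'
print(result.posthoc)     # post-hoc test used, e.g. 'tukeyhsd' or 'nemenyi'
print(result.all_normal)  # whether all populations passed the normality check
```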
Great library, thanks for creating it!
I have encountered a minor issue when printing the report about the statistical analysis.
From my analysis I receive the following output:
```python
# this is how I call autorank
result = autorank(df, alpha=0.05, order="descending", verbose=True)  # (1)
create_report(result)  # (2)
```

Output (1):

```
Fail to reject null hypothesis that data is normal for column $B_{c}$ (p=0.140985>=0.007143)
Fail to reject null hypothesis that data is normal for column $B_{h}$ (p=0.033133>=0.007143)
Fail to reject null hypothesis that data is normal for column $B_{l}$ (p=0.036612>=0.007143)
Rejecting null hypothesis that data is normal for column $B_{r}$ (p=0.000899<0.007143) <- rejects for $B_{r}$
Fail to reject null hypothesis that data is normal for column $O$ (p=0.065865>=0.007143)
Fail to reject null hypothesis that data is normal for column $\hat{H}$ (p=0.937819>=0.007143)
Fail to reject null hypothesis that data is normal for column $\hat{M}$ (p=0.009967>=0.007143)
...
```

Output (2):

```
The statistical analysis was conducted for 7 populations with 22 paired samples.
The family-wise significance level of the tests is alpha=0.050.
We rejected the null hypothesis that the population is normal for the population $B_{c}$ (p=0.001). Therefore, we assume that not all populations are normal.
...
```
There seems to be an ordering problem when create_report() uses result.rankdf.index[i]: the columns of the dataframe are no longer in the order they were in when the results of the Shapiro-Wilk test were computed, so the wrong population ($B_{c}$) is printed (it should be $B_{r}$). I could not yet figure out where exactly this re-ordering occurs and breaks the report function; a minimal check of the suspicion is sketched below. It would be great if you could have a look at it.
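For triage, a minimal check of the suspected mismatch, assuming the result tuple exposes the Shapiro-Wilk p-values as pvals_shapiro in the original column order while rankdf is sorted by rank:

```python
# The normality p-values follow the original column order of df ...
for col, pval in zip(df.columns, result.pvals_shapiro):
    print(col, pval)

# ... while create_report iterates over the rank-sorted index instead.
print(list(result.rankdf.index))
```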
Cheers!
In accordance with Kruschke and Liddell, 2018, the ROPE is calculated correctly (d = ROPE / STD = 0.1, therefore ROPE = 0.1 * STD):
```python
# half the size of a small effect size following Kruschke (2018)
if all_normal:
    cur_rope = rope*_pooled_std(reordered_data.iloc[:, i], reordered_data.iloc[:, j])
else:
    cur_rope = rope*_pooled_mad(reordered_data.iloc[:, i], reordered_data.iloc[:, j])
```
However, the description does not match:
> For normal data, the ROPE is defined as 0.1*d, where d is the effect size (Cohen's d).
> For non-normal data, the ROPE is defined as 0.1*gamma, where gamma is the effect size (Akinshin's gamma).
In the description, I guess it should be "0.1*STD" and "0.1*MAD"?
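For reference, a minimal sketch of what the two pooling helpers presumably compute; the exact pooling used by autorank's _pooled_std and _pooled_mad is an assumption here:

```python
import numpy as np

def pooled_std(x, y):
    # Classic pooled standard deviation: the STD in ROPE = 0.1 * STD.
    nx, ny = len(x), len(y)
    var = ((nx - 1) * np.var(x, ddof=1) + (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2)
    return np.sqrt(var)

def pooled_mad(x, y):
    # The same pooling applied to the median absolute deviation:
    # the MAD in ROPE = 0.1 * MAD.
    mad = lambda a: np.median(np.abs(a - np.median(a)))
    nx, ny = len(x), len(y)
    return np.sqrt(((nx - 1) * mad(x) ** 2 + (ny - 1) * mad(y) ** 2) / (nx + ny - 2))
```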
Hello,
thanks for your nice tool.
I have a problem using the Bayesian results, because neither your paper nor the original paper describes them in the form you provide.
Please add more detail, or provide a video that walks through one Bayesian example, so that users can understand what the CI, y, P values, Magnitude, and decision columns are.
Normally, people who use your tool don't know statistics, and putting these words in the output without explaining them is useless, at least for me!
I saw that a report is provided, but it is not enough to understand the table.
Thanks
The problem happens for the following example code:
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.rc('text', usetex=False)
from autorank import autorank, plot_stats, create_report, latex_table

pd.set_option('display.max_columns', 7)

raw = np.array([[0.61874876, 0.61219062],
                [0.89017217, 0.90443957],
                [0.62806089, 0.63185734],
                [0.96929193, 0.97255931],
                [0.87340513, 0.95460121],
                [0.84087749, 0.94438674],
                [0.9863088 , 0.98558508],
                [0.94314842, 0.64510605],
                [0.9862604 , 0.99173966]])
data = pd.DataFrame()
data['pop_0'] = raw[:, 0]
data['pop_1'] = raw[:, 1]

result = autorank(data, alpha=0.05, verbose=False, approach="bayesian", rope=1.0, rope_mode="absolute")
create_report(result)
```
It always outputs:

```
We found significant and practically relevant differences between the populations pop_1 (MD=0.944+-0.190, MAD=0.061) and pop_0 (MD=0.890+-0.184, MAD=0.117).
```

even though the ROPE is 1.0. The reason is that the condition in lines 447 to 454 of d05bffb expects a set that contains only 'inconclusive' or 'equal', but the decision column also contains 'NA':

```python
>>> set(result.rankdf['decision'])
{'NA', 'equal'}
```
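A hedged sketch of a possible fix: exclude the 'NA' entries (presumably the diagonal of the decision matrix) before checking whether all remaining pairwise decisions are 'equal' or 'inconclusive':

```python
# Drop 'NA' before testing the remaining pairwise decisions; for the example
# above this leaves {'equal'}, so no "significant and practically relevant
# differences" should be reported.
decisions = set(result.rankdf['decision']) - {'NA'}
print(decisions.issubset({'equal', 'inconclusive'}))  # True
```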
Thank you for your work. Just as the title says: is it possible to show the average ranking for each classifier above each line?
Just as the title says: if we set order="ascending", it does not change the order of the x-axis...
Hi sherbold, great work on making statistical testing just a bit simpler.
I have the feeling that the flowchart in the README.md (and the documentation) is mixed up: the 3rd and 4th boxes on the bottom row appear to be swapped. They are currently not in line with the textual description and the code.
Thanks for this awesome tool, but I am wondering if it is possible to let the user decide whether they want a parametric or a nonparametric test? That would add some flexibility to this function. For example, I want to get a CD plot for my data, but it always returns a CI plot.
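For what it is worth, the force_mode parameter that shows up in a traceback further below suggests that newer versions already allow this; the following usage is an assumption based on that parameter name and the documentation:

```python
import matplotlib.pyplot as plt
from autorank import autorank, plot_stats

# Hedged example: force the non-parametric (Friedman/Nemenyi) branch so that
# plot_stats draws a CD diagram instead of a CI plot; assumes force_mode
# accepts 'nonparametric'. df is the user's dataframe (columns = treatments).
result = autorank(df, alpha=0.05, verbose=False, force_mode='nonparametric')
plot_stats(result)
plt.show()
```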
I've noticed that the scipy docs indicate that median_absolute_deviation is now deprecated and recommend using median_abs_deviation instead. Computing the MAD of a sample array directly with numpy seems to indicate that the newer function implements the calculation correctly, while the older one returns a different value (for a reason I have not investigated).
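The difference is most likely the default scale factor: the deprecated median_absolute_deviation scaled by the normal-consistency constant (about 1.4826) by default, while median_abs_deviation defaults to scale=1.0 and therefore matches the plain numpy computation. A small comparison, assuming scipy >= 1.5 where median_abs_deviation exists:

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 5.0, 8.0])

# Plain MAD computed directly with numpy.
mad_numpy = np.median(np.abs(x - np.median(x)))

# Matches the numpy value: scale defaults to 1.0.
mad_new = stats.median_abs_deviation(x)

# Reproduces what the deprecated function returned by default
# (scaled for consistency with the normal distribution).
mad_scaled = stats.median_abs_deviation(x, scale='normal')

print(mad_numpy, mad_new, mad_scaled)  # 2.0 2.0 ~2.9652
```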
Dear Dr. Herbold,
I would like to submit a pull request for the Autorank package in order to enhance its maintainability. Prior to doing so, I would appreciate your feedback and opinions regarding the proposed changes. As a first step, I suggest the following pre-commit configuration:
```yaml
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.4.0
    hooks:
      - id: end-of-file-fixer
      - id: trailing-whitespace
  - repo: https://github.com/psf/black
    rev: 23.3.0
    hooks:
      - id: black
```
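With this file saved as .pre-commit-config.yaml in the repository root, the hooks would be activated with `pre-commit install` and can be run over the entire code base with `pre-commit run --all-files`.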
Sincerely,
Alireza Aghamohammadi
I would like to suggest adopting the "random_state" parameter from baycomp (janezd/baycomp@4cffdb3), so that results can be reproduced.
The only problem is that this would require replacing the PyPI version of baycomp with the version on GitHub. What do you think?
I am happy to create a pull request if needed.
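A hypothetical example of what a reproducible call could look like, assuming the GitHub version's two_on_multiple accepts the random_state keyword added in the referenced commit (the PyPI release does not):

```python
import numpy as np
import baycomp

# Toy scores of two classifiers on 20 data sets.
rng = np.random.default_rng(0)
x = rng.random(20)
y = rng.random(20)

# random_state is the proposed keyword; only the GitHub version of
# baycomp would accept it.
p_left, p_rope, p_right = baycomp.two_on_multiple(x, y, rope=0.1, random_state=42)
print(p_left, p_rope, p_right)
```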
When I run

```python
autorank(df, alpha=0.05, verbose=True)
```

on the following dataframe:
| in | F1 | F2 | F3 |
|---|---|---|---|
| 1 | -0.108493 | -0.182625 | -0.217593 |
| 2 | 0.100764 | 0.170489 | -0.0353945 |
| 3 | 0.578502 | 0.60399 | 0.592368 |
| 4 | 0.690802 | 0.746536 | 0.70053 |
| 5 | 0.280736 | 0.269668 | 0.278789 |
| 6 | 0.43442 | 0.371632 | 0.425941 |
| 7 | 0.568025 | 0.498394 | 0.458875 |
| 8 | 0.952764 | 0.95084 | 0.949877 |
| 9 | 0.339497 | 0.211892 | 0.337175 |
| 10 | -0.334314 | 1.03841e-14 | 1.03841e-14 |
| 11 | 0.7448 | 0.711472 | 0.74447 |
| 12 | 0.946012 | 0.945422 | 0.950613 |
| 13 | 0.317754 | 0.509453 | 0.354345 |
where `in` is the index of the dataframe, I get the following error:
```
Fail to reject null hypothesis that data is normal for column F1 (p=0.799397>=0.016667)
Fail to reject null hypothesis that data is normal for column F2 (p=0.910572>=0.016667)
Fail to reject null hypothesis that data is normal for column F3 (p=0.754639>=0.016667)
Using Bartlett's test for homoscedacity of normally distributed data
Fail to reject null hypothesis that all variances are equal (p=0.946143>=0.050000)
Traceback (most recent call last):
  File "/home/s3092593/afaslib/autofolio/generate_closed_gap_figures.py", line 69, in <module>
    res_without_fc = autorank(df_without_fc, alpha=0.05, verbose=True)
  File "/home/s3092593/miniconda3/envs/torch/lib/python3.10/site-packages/autorank/autorank.py", line 278, in autorank
    res = rank_multiple_normal_homoscedastic(data, alpha, verbose, order, effect_size, force_mode)
  File "/home/s3092593/miniconda3/envs/torch/lib/python3.10/site-packages/autorank/_util.py", line 254, in rank_multiple_normal_homoscedastic
    anova = AnovaRM(stacked_data, 'result', 'id', within=['treatment'])
  File "/home/s3092593/miniconda3/envs/torch/lib/python3.10/site-packages/statsmodels/stats/anova.py", line 496, in __init__
    if not data.equals(data.drop_duplicates(subset=[subject] + within)):
  File "/home/s3092593/miniconda3/envs/torch/lib/python3.10/site-packages/pandas/util/_decorators.py", line 331, in wrapper
    return func(*args, **kwargs)
  File "/home/s3092593/miniconda3/envs/torch/lib/python3.10/site-packages/pandas/core/frame.py", line 6661, in drop_duplicates
    duplicated = self.duplicated(subset, keep=keep)
  File "/home/s3092593/miniconda3/envs/torch/lib/python3.10/site-packages/pandas/core/frame.py", line 6795, in duplicated
    raise KeyError(Index(diff))
KeyError: Index(['id'], dtype='object')
```
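Judging from the traceback, stacked_data ends up without an 'id' column. A likely cause is the named index ('in'): data.stack().reset_index() would then produce a column named 'in' rather than the default name that gets renamed to 'id'. If so, a minimal workaround (a hedged sketch, not verified against this data) is to drop the index name before calling autorank:

```python
# Workaround sketch: remove the index name so that autorank's internal
# stack()/reset_index() produces the column names it expects.
df_without_fc = df_without_fc.rename_axis(None)   # drop the 'in' index name
# or equivalently: df_without_fc = df_without_fc.reset_index(drop=True)
res_without_fc = autorank(df_without_fc, alpha=0.05, verbose=True)
```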
Hi!
I ran autorank to rank algorithms, and the last sentence of the report is: "Based on the post-hoc Nemenyi test, we assume that there are no significant differences within the following groups: sobel and random; random and guidedbackprop; guidedbackprop and smoothgrad; smoothgrad and rise; rise, scorecam, and layercam. All other differences are significant."
However, this is not reflected in the plot (see figure). It seems that the plot function has a problem.
When I tried running the example at https://sherbold.github.io/autorank/ I got an attribute error:

```
AttributeError: module 'scipy.stats' has no attribute 'median_absolute_deviation'
```

I was running scipy v1.1.0; upgrading scipy to 1.4.1 fixed the issue. Perhaps you could consider specifying minimum version requirements for your dependencies in setup.py, e.g. "scipy>=1.4" or whatever the appropriate version is.
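A sketch of what such pinning could look like in setup.py; the scipy bound follows the suggestion above, while the other packages and lower bounds are placeholders that would need to be checked against the features autorank actually uses:

```python
# Hypothetical excerpt of setup.py with minimum dependency versions.
from setuptools import setup

setup(
    name='autorank',
    install_requires=[
        'numpy>=1.16',
        'pandas>=0.24',
        'scipy>=1.4',        # per the report above
        'statsmodels>=0.9',
        'matplotlib>=3.0',
    ],
)
```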