webermarcolivier / statannot

add statistical annotations (pvalue significance) on an existing boxplot generated by seaborn boxplot

License: MIT License

Jupyter Notebook 84.54% Python 15.46%

statannot's Issues

Barplot with hue

Hi,

I want to generate a plot similar to this. Where can I find the script? Please help me.

Thanks

All Nan slice encountered when df had no nans

Hi,

My input data frame has no NaNs, but nanargmax still raises the error below. How do I resolve it? Thank you very much

File "", line 54, in
test='t-test_paired', text_format='full', loc='inside', verbose=2)

File "/anaconda3/lib/python3.6/site-packages/statannot/statannot.py", line 519, in add_stat_annotation
(y_stack_arr[0, :] <= x2))])

File "<array_function internals>", line 6, in nanargmax

File "/anaconda3/lib/python3.6/site-packages/numpy/lib/nanfunctions.py", line 551, in nanargmax
raise ValueError("All-NaN slice encountered")

ValueError: All-NaN slice encountered

Implement permutation test

It would be nice if the permutation test would be implemented as an alternative to t-test & Co.
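
As a point of reference for this request, here is a minimal sketch of a two-sided permutation test on the difference of means, written in plain Python. It is not part of statannot; the function name `permutation_test` and the sample values are made up for illustration.

```python
import random


def permutation_test(sample_a, sample_b, n_permutations=10000, seed=0):
    """Two-sided permutation test on the difference of means.

    Returns the observed difference and an estimated p-value.
    """
    rng = random.Random(seed)
    n_a = len(sample_a)
    pooled = list(sample_a) + list(sample_b)

    def mean_diff(values):
        a, b = values[:n_a], values[n_a:]
        return sum(a) / len(a) - sum(b) / len(b)

    # Computed before any shuffling, so it reflects the original grouping.
    observed = mean_diff(pooled)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if abs(mean_diff(pooled)) >= abs(observed):
            extreme += 1
    # Add-one smoothing keeps the estimate strictly positive.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Made-up measurements for two groups.
diff, p = permutation_test([5.1, 4.9, 5.6, 5.8, 6.0], [4.1, 4.0, 4.4, 3.9, 4.6])
print(diff, p)
```

Versions of `add_stat_annotation` that have the `perform_stat_test=False` and `pvalues=...` arguments (both appear in other issues on this page) might accept p-values computed this way, but that combination is untested here.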

is it possible to remove the "ns" label but only keep the significant symbol?

Hi,

Thank you so much for making this wonderful plugin and it helps me a lot in my research.

I have one request that you might consider for your future plans.
In some cases, we may want to keep only the significance symbols and remove all the "ns" labels.
As far as I can tell, this is not implemented in the current version. Would it be possible to add it in the near future?

but again, just small advice, and thanks a lot for your contributions already!
shirley

Bug with violinplot + ax.set_yscale(log)?

I have called

fig, ax = plt.subplots(...)
sns.violinplot(ax=ax[0], ...)
add_stat_annotation(ax[0], ...)
ax[0].set_yscale("log")

For both loc='inside' and 'outside', I get very wrong limits on the y-axis. The limits can still be set manually with ax.set_ylim(...), and then everything is fine again. If I skip the set_yscale, everything works fine

Is this behaviour expected?

`short_test_name` suggestion

Hello,

It seems that the short_test_name parameter could also be useful when statannot is responsible for performing the tests.

It could enable the customization of the text showing on the plots with text_format = "full" or "simple" , if someone needs "Wil." instead of "Wilcoxon" on a crowded plot, for example.

What do you think?

Thanks

Remove ns p-value lines

Hi) Is there a way to remove NS p-values so there are no lines and values for non-significant pairs?
Thank you)

barplot, correct y position of annotation line

In the current implementation, in the case of a barplot, the y positions are computed based on the real data points. This makes sense in the case of the boxplot, where all the data points are plotted. However, the final seaborn barplot output is composed of the bar + error bars. We should only take into account the y position of the error bars.

Example:

Is there a one-side ind or welch t-test ?

I found a one-sided option only for Mann-Whitney. Is there no one-sided t-test?
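
Until such an option exists, a two-sided t-test result can be converted by hand, because the t distribution is symmetric. The sketch below shows the conversion; `one_sided_pvalue` is a hypothetical helper, not a statannot function. Separately, SciPy 1.6 and later accept an `alternative` argument in `scipy.stats.ttest_ind`; whether it can be forwarded through statannot's `stats_params` is an untested assumption.

```python
def one_sided_pvalue(t_stat, p_two_sided, alternative="greater"):
    """Convert a two-sided t-test p-value into a one-sided one.

    Valid because the t distribution is symmetric around zero.
    """
    if alternative not in ("greater", "less"):
        raise ValueError("alternative must be 'greater' or 'less'")
    half = p_two_sided / 2.0
    # The sign of the statistic tells whether the data agree with the alternative.
    agrees = t_stat > 0 if alternative == "greater" else t_stat < 0
    return half if agrees else 1.0 - half


# Made-up statistic and two-sided p-value.
print(one_sided_pvalue(2.3, 0.04, "greater"))
print(one_sided_pvalue(2.3, 0.04, "less"))
```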

KeyError: 'Thur' for the example

Thanks for the great package. However, I had a problem running the demo.
Seaborn in version 0.9.0

pvalue annotation legend:
ns: 5.00e-02 < p <= 1.00e+09
*: 1.00e-02 < p <= 5.00e-02
**: 1.00e-03 < p <= 1.00e-02
***: 1.00e-04 < p <= 1.00e-03
****: p <= 1.00e-04
()
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-4-3cef1a3195f1> in <module>()
      1 ax = sns.boxplot(x="day", y="total_bill", data=df)
      2 add_statistical_test_annotation(ax, df, [("Thur", "Fri"), ("Thur", "Sat"), ("Fri", "Sun")],
----> 3                                 test='Mann-Whitney', order=None, textFormat='star', loc='outside', verbose=2);
      4 plt.savefig('example1.png', dpi=300, bbox_inches='tight')

/home/magnus/opt/statannot/statannot.py in add_statistical_test_annotation(ax, df, catPairList, xlabel, ylabel, test, order, textFormat, loc, pvalueThresholds, color, lineYOffsetAxesCoord, lineHeightAxesCoord, yTextOffsetPoints, linewidth, fontsize, verbose)
     85             x1 = np.where(catValues == cat1)[0][0]
     86             x2 = np.where(catValues == cat2)[0][0]
---> 87             cat1YMax = g[ylabel].max()[cat1]
     88             cat2YMax = g[ylabel].max()[cat2]
     89             cat1Values = g.get_group(cat1)[ylabel].values

/usr/local/lib/python2.7/site-packages/pandas/core/series.pyc in __getitem__(self, key)
    599         key = com._apply_if_callable(key, self)
    600         try:
--> 601             result = self.index.get_value(self, key)
    602 
    603             if not is_scalar(result):

/usr/local/lib/python2.7/site-packages/pandas/indexes/base.pyc in get_value(self, series, key)
   2151                     raise InvalidIndexError(key)
   2152                 else:
-> 2153                     raise e1
   2154             except Exception:  # pragma: no cover
   2155                 raise e1

KeyError: 'Thur'

allow independent groups of annotations lines

Thanks, now I understand much better. Generalizing a bit the problem, it would be nice to identify these "clusters" of annotations and only stack the lines independently in each one. This would improve a lot the layout of the annotations. For example, in grouped (hued) boxplots, as in your example, annotations could be stacked in each group of boxes independently. The clusters can be easily determined by identifying groups of non-overlapping segments on the x axis.

I will try to work on such a solution, it shouldn't be too difficult.

Originally posted by @webermarcolivier in #19 (comment)
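
The clustering step described above can be sketched in plain Python as follows. This is only an illustration of the idea, not the implementation that ended up in statannot; `cluster_segments` and the box positions are hypothetical.

```python
def cluster_segments(segments):
    """Group (x1, x2) annotation segments into clusters of overlapping spans.

    Segments in different clusters never overlap on the x axis, so the
    lines of each cluster can be stacked independently.
    """
    clusters = []
    current, current_end = [], None
    for x1, x2 in sorted((min(s), max(s)) for s in segments):
        if current and x1 > current_end:
            # Gap on the x axis: close the current cluster.
            clusters.append(current)
            current = []
        current.append((x1, x2))
        current_end = x2 if len(current) == 1 else max(current_end, x2)
    if current:
        clusters.append(current)
    return clusters


# Hypothetical box positions for a hued plot with two boxes per group.
segments = [(-0.2, 0.2), (0.8, 1.2), (1.8, 2.2), (0.8, 2.2)]
print(cluster_segments(segments))
```

Here the first segment forms its own cluster, while the other three overlap and would share one stack of annotation lines.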

Mixup in plotting significant differences between groups boxplot

I am trying to add statistical significance between groups (Equator and Posterior) and age. I am able to get that part working, however, when I add the second group (Equator vs Equator for different age groups) it treats the data backwards. I'm not sure how to resolve this.

I added the two lists "Equator vs Posterior" (lines 1-5) and "Equator vs Age" (lines 6-10) together to compare statistical significance.

[(('30-39', 'Equator'), ('30-39', 'Posterior')),
 (('60-69', 'Equator'), ('60-69', 'Posterior')),
 (('40-49', 'Equator'), ('40-49', 'Posterior')),
 (('50-59', 'Equator'), ('50-59', 'Posterior')),
 (('70-79', 'Equator'), ('70-79', 'Posterior')),
 (('30-39', 'Equator'), ('30-39', 'Equator')),
 (('30-39', 'Equator'), ('60-69', 'Equator')),
 (('30-39', 'Equator'), ('40-49', 'Equator')),
 (('30-39', 'Equator'), ('50-59', 'Equator')),
 (('30-39', 'Equator'), ('70-79', 'Equator'))]

However, when I try to plot, statannot treats the "Equator vs Age" groups as "Posterior"

SteadyState_Peel_force_ageGroup_BoxPlot_WithData.pdf

f, ax = plt.subplots()
ax = sns.boxplot(x=AG, y=ss_mN, hue=R, hue_order=[Eq, Po], data=df)

# Statistical test for differences
# List of groups (AgeGroups)
hue_order = list(df[AG].unique())
# Create combinations to compare
box_pairs_1 = [((Age_Group_i, Eq), (Age_Group_i, Po)) 
               for Age_Group_i in hue_order]

# Add equator age groups
# Create combinations to compare
box_pairs_2 = [((LAG[0], Eq), (Age_Group_i, Eq)) 
               for Age_Group_i in hue_order]

box_pairs = box_pairs_1 + box_pairs_2
test_results = add_stat_annotation(ax, plot='boxplot', data=df, 
                                   x=AG, y=ss_mN, 
                                   hue=R, box_pairs=box_pairs,
                                   test='t-test_ind', text_format='star',
                                   loc='outside', verbose=2, 
                                   comparisons_correction=None, 
                                   line_offset=0.0, 
                                   line_offset_to_box=0.0, 
                                   line_height= 0.015, 
                                   fontsize='small') # 'bonferroni'

boxPlotBlackBorder() # Make borders black

Any idea how I can resolve this?
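
One detail worth checking: the printed list contains the pair (('30-39', 'Equator'), ('30-39', 'Equator')), which compares a box with itself. Whether this causes the mixup is not established, but the pairs can be built without it. The sketch below uses a hypothetical helper `build_box_pairs` and the group labels from the list above.

```python
def build_box_pairs(groups, hue_a, hue_b, reference_group):
    """Build box pairs for a hued plot.

    Returns within-group pairs (hue_a vs hue_b) followed by pairs that
    compare `reference_group` against every other group for hue_a.
    """
    within = [((g, hue_a), (g, hue_b)) for g in groups]
    across = [
        ((reference_group, hue_a), (g, hue_a))
        for g in groups
        if g != reference_group  # skip comparing a box with itself
    ]
    return within + across


groups = ["30-39", "40-49", "50-59", "60-69", "70-79"]
pairs = build_box_pairs(groups, "Equator", "Posterior", "30-39")
for pair in pairs:
    print(pair)
```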

Color Changes bar color but not text color

This code changes the color of the bars but not the text. How can i change the text color in this plot?

add_stat_annotation(ax, data=exc154_data, hue="Genotype", x="Tissue", y="Transcript Density",
                    order=["Antenna", "Early Pulse"],
                    hue_order=["WT", "Exc154 homozygous", "Exc154 hemizygous"],
                    box_pairs=box_pairs, test='Mann-Whitney', text_format='star',
                    loc='inside', verbose=2, color='0.7')

ANOVA

Would it be possible to include an ANOVA test using the boxplots/bargraphs where you could compare multiple groups and get a single line across indicating significant difference between groups with small vertical lines indicating which groups are included?

This image was created in MS Excel and lines were added afterwards, but the line at the top is what I was wondering about the possibility in statannot.

How to install this package?

Hi,
This package will make life so easy! But how to install it?

Line lengths do not match non-default boxplot width values

Using the source code that produced this example figure but altering the boxplot width:

ax = sns.boxplot(data=df, x=x, y=y, hue=hue, width=0.25)

results in this:

It would be nice if the lines were aligned to the boxplot centers even for non-default width values.
(Obviously this is an extreme case where the lines would become too small, but this issue also holds for simpler cases, e.g. 2 x 2 comparisons).
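
To see why the line ends move with `width`, here is a small sketch of where box centres fall when boxes are dodged evenly within the group width. The even-dodging rule is an assumption about the layout rather than a quote of seaborn's code, and `dodge_offsets` is a hypothetical helper.

```python
def dodge_offsets(n_hues, width=0.8):
    """Centre offsets of evenly dodged boxes within one x category."""
    each_width = width / n_hues
    offsets = [i * each_width for i in range(n_hues)]
    mean = sum(offsets) / n_hues
    return [o - mean for o in offsets]


print(dodge_offsets(2))         # default group width
print(dodge_offsets(2, 0.25))   # narrow boxes sit closer to the tick
```

The `add_stat_annotation` listing further down this page takes a `width` argument that, according to its docstring, must match the one given to seaborn, so passing `width=0.25` there may be the intended fix.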

FacetGrid

It would be really cool if you add a way to use your function with seaborn.FacetGrid.

statannot in log scale

Could you please add a feature to add_stat_annotation so it can work on y-axis in log scale

Annotations not aligned

First of all - a very useful tool, thank You

For some reason, the annotations are not aligned for me and I can't figure why.
I'm using the code below and the annotations are far to the right of the boxplot.
Running the code on Anacondas 1.9.7 JupyterLab 1.1.4 Python 3.7.3
Do You know where the problem could be?

ax = sns.boxplot(data=df[['ACTB','GAPDH','HPRT1']])
fig = plt.gcf()
test_results = add_stat_annotation(ax, df,
                                   box_pairs=[('ACTB','GAPDH'), ('ACTB','HPRT1'), ('GAPDH','HPRT1')],
                                   test='t-test_ind', text_format='star', loc='outside', verbose=1)

No multiple testing correction for significance

The pairwise testing happens without any correction for significance like with Bonferroni correction for example. A test might show up as significant while it might be not significant if correction were to be applied. Is it possible to add this feature?
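
A short worked example of the effect described here, using the Bonferroni rule p_corrected = min(n * p, 1.0). The p-values are made up and `bonferroni_correct` is a hypothetical helper, not the library's function.

```python
def bonferroni_correct(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1.0."""
    n = len(p_values)
    return [min(p * n, 1.0) for p in p_values]


raw = [0.01, 0.03, 0.20]          # made-up p-values for three pairs
corrected = bonferroni_correct(raw)
for p_raw, p_corr in zip(raw, corrected):
    # Columns: raw p, corrected p, significant before, significant after.
    print(p_raw, p_corr, p_raw < 0.05, p_corr < 0.05)
```

The second pair is significant at 0.05 before correction but not after it, which is exactly the case raised in this issue.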

AttributeError: module 'seaborn' has no attribute 'categorical'

I'm getting this error when trying to add statistical annotation on a boxplot
This happens with both code that worked a couple of weeks ago and the example case at the statannot front page.
I updated both packages with no success.
Very grateful for any help!

Possible to add annotations without computing them implicitly?

Is it possible to add the text annotations in the same format as box_pairs?

E.g.

add_stat_annotation(ax, data=df, x=x, y=y, order=order,
                    box_pairs=[("Thur", "Fri"), ("Thur", "Sat"), ("Fri", "Sun")],
                    text=['this','that', 'the other'], loc='outside', verbose=2)

Show only significant comparisons

Is there an easy way to show only the comparisons that end up as significant? I would like to compute all comparisons, but showing the non-significant ones make the plot very busy.
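
One possible workaround is to compute the p-values first and keep only the significant pairs before annotating. The sketch below shows the filtering step; `keep_significant` is a hypothetical helper and the p-values are made up.

```python
def keep_significant(box_pairs, pvalues, alpha=0.05):
    """Return only the pairs (and their p-values) with p <= alpha."""
    kept = [(pair, p) for pair, p in zip(box_pairs, pvalues) if p <= alpha]
    pairs = [pair for pair, _ in kept]
    ps = [p for _, p in kept]
    return pairs, ps


box_pairs = [("Thur", "Fri"), ("Thur", "Sat"), ("Fri", "Sun")]
pvalues = [0.62, 0.04, 0.003]      # made-up values for illustration
sig_pairs, sig_pvalues = keep_significant(box_pairs, pvalues)
print(sig_pairs)
print(sig_pvalues)
```

If the installed version has the `perform_stat_test` and `pvalues` arguments (both appear in other issues on this page), the filtered lists could then be passed as `box_pairs=sig_pairs, perform_stat_test=False, pvalues=sig_pvalues`.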

small suggestions

Thanks for such a useful package! A couple quick ideas in case you have time to make any changes.

Would be useful to have a way to pass equal_var=False to scipy.stats.ttest_ind for a Welch's test. This could also just be made to be the default, as it is usually a better choice than a standard t test that assumes equal variance, see e.g. here for one reference.
I think annotations like "p < xxx", where xxx is an exact calculated p value, should probably be changed to "p = xxx".
Ability to write p values using a provided format string like "%0.2f" so the label can read "p = 0.02" instead of "2.0e-2"
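
To illustrate the last suggestion, this is how a user-supplied printf-style format string would change the label. `format_pvalue` is a hypothetical helper, not an existing statannot function.

```python
def format_pvalue(p_value, fmt="%0.2f"):
    """Build a 'p = ...' label from a printf-style format string."""
    return "p = " + (fmt % p_value)


print(format_pvalue(0.02))            # fixed-point label
print(format_pvalue(0.02, "%.1e"))    # scientific notation label
```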

Calculate annotation for hue-ed plots

Hey, do you think it would be possible to apply the library to plots when you hue-ed values?

https://seaborn.pydata.org/generated/seaborn.boxplot.html?highlight=boxplot#seaborn.boxplot

Barplot error bars?

Do the error bars on the barplot represent standard deviation or standard error? Neither seems to correspond to the values for my data.
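
The `_BarPlotter` call quoted in the "Crash for barplots" issue on this page passes `estimator=np.mean, ci=95, n_boot=1000`, which suggests the bars show a 95% bootstrap confidence interval of the mean rather than a standard deviation or standard error. The sketch below compares the three quantities on made-up data. It uses a plain percentile bootstrap and is not seaborn's own code.

```python
import random
import statistics


def bootstrap_ci(values, n_boot=1000, ci=95, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and collect the mean of each resample.
    means = sorted(
        statistics.mean(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    tail = (100 - ci) / 2 / 100
    low = means[int(tail * n_boot)]
    high = means[min(int((1 - tail) * n_boot), n_boot - 1)]
    return low, high


values = [2.1, 2.5, 1.9, 3.2, 2.8, 2.4, 2.0, 3.0]   # made-up data
mean = statistics.mean(values)
sd = statistics.stdev(values)
sem = sd / len(values) ** 0.5
print("mean", mean, "sd", sd, "sem", sem)
print("95% bootstrap CI", bootstrap_ci(values))
```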

Submitting subset of dataframe to add_stat_annotation() causes TypeError: unhashable type: 'numpy.ndarray'

When I subset a pandas df with .isin() and I submit this to
add_stat_annotation(ax[0,0], data=SubsetDf, x=x, y=y, hue=hue, box_pairs=box_pairs, perform_stat_test=False, pvalues=SubsetDf["P-val"], text_annot_custom=SubsetDf["Text"], loc='inside', order=order, verbose=0)
I get the TypeError: unhashable type: 'numpy.ndarray'.
The SubsetDf is reindexed following subsetting.

ValueError: boxPairList contains an unvalid box pair.

When trying to visualize statistical significance between two boxplots, plotted by making use of the hue variable, I get the following error:

 ValueError: boxPairList contains an unvalid box pair.

When printing the .group_names or .hue_names of the BoxPlotter object, nothing is returned, which is probably causing the error.

Any ideas on how to solve this?

Feature request: some suggestions

I was in dire need for statistical annotations and successfully used your library. Thanks a lot for your work, it generally worked great.

Let me still suggest a couple of improvements. I know, this is a potpourri of different things. You can filter the requests one by one and create new issues in case you agree with them.

Your toolbox currently offers two principal features that are blended into one API-function: (1) Compute and format p-values, (2) and create annotated brackets. I'd suggest to factor out a function add_annotated_brackets() that is completely agnostic about any testing and p-values. This function creates the brackets and annotations according to the formatting settings. add_stat_annotation() could then just call add_annotated_brackets().
You could also better factor out the statistical testing part, such that the users can interfere with the p-values, filter the brackets of interest etc. (see e.g. #50), before they actually add the brackets.
Personally, I think there are plenty of other libraries offering statistical testing. Keeping up with all methodologies (bonferroni, bootstrapping, anova) is possibly difficult. I'd therefore put the primary focus on how to print these p-values (or other information), and make it as easy as possible for the user to create these annotations using the seaborn semantics.
The reason for these suggestions: With the current solution, one has to adjust several default parameters to use custom annotation and to disable the testing. It took me a while to figure this out.
add arguments annotat_kws and line_kws that are forwarded to ax.annotate() and ax.plot()/lines.Line2D(), respectively. This will give the user a bit more stylistic freedom (font color, line styles, ...)
ax.annotate(..., clip_on=False, ...) (around Line 590) is wrong, it should be ax.annotate(..., clip_on=(loc=="inside")) I think. At least the setting is not consistent with the ax.plot() command a couple of lines earlier. If the stats annotation falls outside plt.ylim(), the bracket line is clipped, but not the annotation text.
...but maybe you want to leave the clipping options to the user altogether. You don't know exactly how the axes are used and fixed clip settings can mess up things.
Support different types of brackets (permit the user to pass a line path as template that is scaled and transformed)
Support inverted annotation brackets (from below)
In add_annotated_brackets() permit to set or adjust the y-locations of the brackets freely. The default choice to use max(y)+margin is good. But it might be useful to adjust this (e.g. have all brackets at same y).

You certainly know that it is not recommended to use private stuff from other libraries... On the other hand, I can completely understand why you decided to use seaborn's _Plotters, nevertheless. To reverse-engineer the seaborn plots from its Artists would be a nightmare. I recently asked the seaborn crew / Michael Waskom why not to extend seaborn by an annotation infrastructure. See here for the feature request and Michael's response.

horizontal orientation

hello, is it possible to use add_stat_annotation with a boxplot that has orient='h'? in other words can the annotations be rotated sideways?

annotations in a straight line

Hello,
I would like to know if there is a way to have all annotations in a straight line.
When there is a lot of boxes it doesn't leave a lot of space for the results. However, I can't find a way to have all the annotations in a straight line.
Thank you!
Clément

Hide non-significant annotations

This tool is excellent. Thank you for all your hard work. Is it possible to hide non-significant results where multiple comparisons are made? So only show the annotations for significant results?

Thanks!

Crash for barplots

There is a crash when trying to plot annotations for a barplot in statannot-0.2.3 and seaborn-0.10.0.

Consider the following code:

import numpy as np
import pandas as pd

import statannot
import seaborn as sns

import matplotlib.pyplot as plt


np.random.seed(42)
df = pd.DataFrame({
    'y': np.random.random(size=50),
    'x': np.random.choice(['X1', 'X2'], size=50),
})

ax = sns.barplot(x='x', y='y', data=df)
statannot.add_stat_annotation(
    ax, plot='barplot',
    data=df, x='x', y='y',
    box_pairs=[('X1', 'X2')],
    test='Mann-Whitney', text_format='simple'
)

plt.show()

It crashes with the error

Traceback (most recent call last):
  File "test.py", line 21, in <module>
    test='Mann-Whitney', text_format='simple'
  File "../python/site-packages/statannot/statannot.py", line 442, in add_stat_annotation
    errcolor=".26", errwidth=None, capsize=None, dodge=True)
TypeError: __init__() missing 1 required positional argument: 'seed'

This bug can be fixed by adding e.g. seed=None, in

statannot/statannot/statannot.py

Lines 438 to 442 in 1835078

box_plotter = sns.categorical._BarPlotter(
    x, y, hue, data, order, hue_order,
    estimator=np.mean, ci=95, n_boot=1000, units=None,
    orient=None, color=None, palette=None, saturation=.75,
    errcolor=".26", errwidth=None, capsize=None, dodge=True)

factoring out plotting of annotations and statistical testing into two separate functions

In the issue #53, @normarius raised the topic of factoring out the plotting of annotation and the computation of statistical tests into two separate functions.

Your toolbox currently offers two principal features that are blended into one API-function: (1) Compute and format p-values, (2) and create annotated brackets. I'd suggest to factor out a function add_annotated_brackets() that is completely agnostic about any testing and p-values. This function creates the brackets and annotations according to the formatting settings. add_stat_annotation() could then just call add_annotated_brackets().

You could also better factor out the statistical testing part, such that the users can interfere with the p-values, filter the brackets of interest etc. (see e.g. #50), before they actually add the brackets.

Personally, I think there are plenty of other libraries offering statistical testing. Keeping up with all methodologies (bonferroni, bootstrapping, anova) is possibly difficult. I'd therefore put the primary focus on how to print these p-values (or other information), and make it as easy as possible for the user to create these annotations using the seaborn semantics.

The reason for these suggestions: With the current solution, one has to adjust several default parameters to use custom annotation and to disable the testing. It took me a while to figure this out.

Chi Squared test between categorical data and asterisk change on the stars when exporting graphs into LaTeX

I wanted to add Chi-Squared tests to be able to compare categorical data and visualize significant differences. I was successful in doing so and was curious if anyone has any suggestions on improvement like adding multiple groups to perform a Chi-Squared test to look at multiple groups instead of just two.

I also wanted to have a significant difference asterisk to match the asterisk when I use the figures in LaTeX. I had to modify the symbol which is below.

I am using my own data set but any categorical data would work for this. There are four Failure Code values (0, 1, 2, 3), two age groups (+/- 60 yrs).

Here is how I would add the annotation for the significant differences:

ax = sns.countplot(x='Failure Code', hue='Age60', hue_order=[r'Age $<$ 60', r'Age $\geq$ 60'], data=df_no_Nan)

# Statistical test for differences
hue_order = list(df_no_Nan['Failure Code'].unique()) # List of x categories (failure codes)
box_pairs_1 = [((FailureCodei, r'Age $<$ 60'), (FailureCodei, r'Age $\geq$ 60')) for FailureCodei in hue_order] # Create combinations to compare
box_pairs = box_pairs_1
test_results = add_stat_annotation(ax, plot='countplot', data=df_no_Nan, x='Failure Code', y='failure code', hue='Age60', box_pairs=box_pairs,
                                    test='chisquare', text_format='star',
                                    loc='inside', verbose=2, comparisons_correction=None) # 'bonferroni'
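
For two boxes, the `chisquare` branch in the listing below calls `stats.chisquare` on the two counts, which amounts to a goodness-of-fit test against equal expected counts. The sketch below reproduces that calculation in plain Python for one degree of freedom; `chisquare_two_counts` is a hypothetical helper and the counts are made up.

```python
import math


def chisquare_two_counts(count_a, count_b):
    """Chi-squared goodness-of-fit test of two counts against equal expectation.

    Returns the statistic and the p-value for one degree of freedom.
    """
    expected = (count_a + count_b) / 2.0
    stat = ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected
    # Survival function of the chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


stat, p = chisquare_two_counts(30, 10)
print(stat, p)
```

Note that this only asks whether the two counts are equal. If the age groups differ in size, a contingency-table test across all failure codes may be the more appropriate comparison.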

Here is the updated code of statannot:

import warnings

import matplotlib.pyplot as plt
from matplotlib import lines
import matplotlib.transforms as mtransforms
from matplotlib.font_manager import FontProperties
import numpy as np
import pandas as pd
import seaborn as sns
from seaborn.utils import remove_na
import pdb

from .utils import raise_expected_got, assert_is_in
from .StatResult import StatResult

from scipy import stats

DEFAULT = object()


def stat_test(
    box_data1,
    box_data2,
    test,
    comparisons_correction=None,
    num_comparisons=1,
    **stats_params
):
    """Get formatted result of two sample statistical test.

    Arguments
    ---------
    box_data1, box_data2
    test: str
        Statistical test to run. Must be one of:
        - `Levene`
        - `Mann-Whitney`
        - `Mann-Whitney-gt`
        - `Mann-Whitney-ls`
        - `t-test_ind`
        - `t-test_welch`
        - `t-test_paired`
        - `Wilcoxon`
        - `Kruskal`
        - `chisquare`
    comparisons_correction: str or None, default None
        Method to use for multiple comparisons correction. Currently only the
        Bonferroni correction is implemented.
    num_comparisons: int, default 1
        Number of comparisons to use for multiple comparisons correction.
    stats_params
        Additional keyword arguments to pass to scipy stats functions.

    Returns
    -------
    StatResult object with formatted result of test.

    """
    # Check arguments.
    assert_is_in(
        comparisons_correction,
        ['bonferroni', None],
        label='argument `comparisons_correction`',
    )

    # Switch to run scipy.stats hypothesis test.
    if test == 'Levene':
        stat, pval = stats.levene(box_data1, box_data2, **stats_params)
        result = StatResult(
            'Levene test of variance', 'levene', 'stat', stat, pval
        )
    elif test == 'Mann-Whitney':
        u_stat, pval = stats.mannwhitneyu(
            box_data1, box_data2, alternative='two-sided', **stats_params
        )
        result = StatResult(
            'Mann-Whitney-Wilcoxon test two-sided',
            'M.W.W.',
            'U_stat',
            u_stat,
            pval,
        )
    elif test == 'Mann-Whitney-gt':
        u_stat, pval = stats.mannwhitneyu(
            box_data1, box_data2, alternative='greater', **stats_params
        )
        result = StatResult(
            'Mann-Whitney-Wilcoxon test greater',
            'M.W.W.',
            'U_stat',
            u_stat,
            pval,
        )
    elif test == 'Mann-Whitney-ls':
        u_stat, pval = stats.mannwhitneyu(
            box_data1, box_data2, alternative='less', **stats_params
        )
        result = StatResult(
            'Mann-Whitney-Wilcoxon test smaller',
            'M.W.W.',
            'U_stat',
            u_stat,
            pval,
        )
    elif test == 't-test_ind':
        stat, pval = stats.ttest_ind(a=box_data1, b=box_data2, **stats_params)
        result = StatResult(
            't-test independent samples', 't-test_ind', 'stat', stat, pval
        )
    elif test == 't-test_welch':
        stat, pval = stats.ttest_ind(
            a=box_data1, b=box_data2, equal_var=False, **stats_params
        )
        result = StatResult(
            'Welch\'s t-test independent samples',
            't-test_welch',
            'stat',
            stat,
            pval,
        )
    elif test == 't-test_paired':
        stat, pval = stats.ttest_rel(a=box_data1, b=box_data2, **stats_params)
        result = StatResult(
            't-test paired samples', 't-test_rel', 'stat', stat, pval
        )
    elif test == 'Wilcoxon':
        zero_method_default = "pratt" if len(box_data1) <= 20 else "wilcox"
        zero_method = stats_params.get('zero_method', zero_method_default)
        print("Using zero_method ", zero_method)
        stat, pval = stats.wilcoxon(
            box_data1, box_data2, zero_method=zero_method, **stats_params
        )
        result = StatResult(
            'Wilcoxon test (paired samples)', 'Wilcoxon', 'stat', stat, pval
        )
    elif test == 'Kruskal':
        stat, pval = stats.kruskal(box_data1, box_data2, **stats_params)
        test_short_name = 'Kruskal'
        result = StatResult(
            'Kruskal-Wallis independent samples', 'Kruskal', 'stat', stat, pval
        )
        
    elif test == 'chisquare':
        stat, pval = stats.chisquare([box_data1.count(), box_data2.count()], **stats_params)
        test_short_name = 'ChiSquare'
        result = StatResult(
            'ChiSquare categorical groups', 'ChiSquare', 'stat', stat, pval
        )
        
        
    else:
        result = StatResult(None, '', None, None, np.nan)

    # Optionally, run multiple comparisons correction.
    if comparisons_correction == 'bonferroni':
        result.pval = bonferroni(result.pval, num_comparisons)
        result.test_str = result.test_str + ' with Bonferroni correction'
    elif comparisons_correction is None:
        pass
    else:
        # This should never be reached because `comparisons_correction` must
        # be a valid correction method or None.
        raise RuntimeError('Unexpectedly reached end of switch.')

    return result


def bonferroni(p_values, num_comparisons='auto'):
    """Apply Bonferroni correction for multiple comparisons.

    The Bonferroni correction is defined as
        p_corrected = min(num_comparisons * p, 1.0).

    Arguments
    ---------
    p_values: scalar or list-like
        One or more p_values to correct.
    num_comparisons: int or `auto`
        Number of comparisons. Use `auto` to infer the number of comparisons
        from the length of the `p_values` list.

    Returns
    -------
    Scalar or numpy array of corrected p-values.

    """
    # Input checks.
    if np.ndim(p_values) > 1:
        raise_expected_got(
            'Scalar or list-like', 'argument `p_values`', p_values
        )
    if num_comparisons != 'auto':
        try:
            # Raise a TypeError if num_comparisons is not numeric, and raise
            # an AssertionError if it isn't int-like.
            assert np.ceil(num_comparisons) == num_comparisons
        except (AssertionError, TypeError) as e:
            raise_expected_got(
                'Int or `auto`', 'argument `num_comparisons`', num_comparisons
            )

    # Coerce p_values to numpy array.
    p_values_array = np.atleast_1d(p_values)

    if num_comparisons == 'auto':
        # Infer number of comparisons
        num_comparisons = len(p_values_array)
    elif len(p_values_array) > 1 and num_comparisons != len(p_values_array):
        # Warn if multiple p_values have been passed and num_comparisons is
        # set manually.
        warnings.warn(
            'Manually-specified `num_comparisons={}` differs from number of '
            'p_values to correct ({}).'.format(
                num_comparisons, len(p_values_array)
            )
        )

    # Apply correction by multiplying p_values and thresholding at p=1.0.
    # Multiply out of place so that an array passed by the caller is not modified.
    p_values_array = np.minimum(p_values_array * num_comparisons, 1.0)

    if len(p_values_array) == 1:
        # Return a scalar if input was a scalar.
        return p_values_array[0]
    else:
        return p_values_array



def pval_annotation_text(x, pvalue_thresholds):
    single_value = False
    if isinstance(x, np.ndarray):
        x1 = x
    else:
        x1 = np.array([x])
        single_value = True
    # Sort the threshold array
    pvalue_thresholds = pd.DataFrame(pvalue_thresholds).sort_values(by=0, ascending=False).values
    x_annot = pd.Series(["" for _ in range(len(x1))])
    for i in range(0, len(pvalue_thresholds)):
        if i < len(pvalue_thresholds)-1:
            condition = (x1 <= pvalue_thresholds[i][0]) & (pvalue_thresholds[i+1][0] < x1)
            x_annot[condition] = pvalue_thresholds[i][1]
        else:
            condition = x1 < pvalue_thresholds[i][0]
            x_annot[condition] = pvalue_thresholds[i][1]

    return x_annot if not single_value else x_annot.iloc[0]


def simple_text(pval, pvalue_format, pvalue_thresholds, test_short_name=None):
    """
    Generates simple text for test name and pvalue
    :param pval: pvalue
    :param pvalue_format: format string for pvalue
    :param test_short_name: Short name of test to show
    :param pvalue_thresholds: String to display per pvalue range
    :return: simple annotation
    """
    # Sort thresholds
    thresholds = sorted(pvalue_thresholds, key=lambda x: x[0])

    # Test name if passed
    text = test_short_name + " " if test_short_name else ""

    for threshold in thresholds:
        if pval < threshold[0]:
            pval_text = "p ≤ {}".format(threshold[1])
            break
    else:
        pval_text = "p = {}".format(pvalue_format).format(pval)

    return text + pval_text

# The default value ='boxplot' was removed from the `plot` argument, so `plot` must now be passed explicitly.
def add_stat_annotation(ax, plot, 
                        data=None, x=None, y=None, hue=None, units=None, order=None,
                        hue_order=None, box_pairs=None, width=0.8,
                        perform_stat_test=True,
                        pvalues=None, test_short_name=None,
                        test=None, text_format='star', pvalue_format_string=DEFAULT,
                        text_annot_custom=None,
                        loc='inside', show_test_name=True,
                        pvalue_thresholds=DEFAULT, stats_params=dict(),
                        comparisons_correction='bonferroni',
                        use_fixed_offset=False, line_offset_to_box=None,
                        line_offset=None, line_height=0.02, text_offset=1,
                        color='0.2', linewidth=1.5,
                        fontsize='medium', verbose=1):
    """
    Optionally computes statistical test between pairs of data series, and add statistical annotation on top
    of the boxes/bars. The same exact arguments `data`, `x`, `y`, `hue`, `order`, `width`,
    `hue_order` (and `units`) as in the seaborn boxplot/barplot function must be passed to this function.

    This function works in one of the two following modes:
    a) `perform_stat_test` is True: statistical test as given by argument `test` is performed.
    b) `perform_stat_test` is False: no statistical test is performed, list of custom p-values `pvalues` are
       used for each pair of boxes. The `test_short_name` argument is then used as the name of the
       custom statistical test.

    :param plot: type of the plot, one of 'boxplot', 'barplot' or 'countplot'.
    :param line_height: in axes fraction coordinates
    :param text_offset: in points
    :param box_pairs: can be of either form: For non-grouped boxplot: `[(cat1, cat2), (cat3, cat4)]`. For boxplot grouped by hue: `[((cat1, hue1), (cat2, hue2)), ((cat3, hue3), (cat4, hue4))]`
    :param pvalue_format_string: defaults to `"{:.3e}"`
    :param pvalue_thresholds: list of lists, or tuples. Default is: For "star" text_format: `[[1e-4, "****"], [1e-3, "***"], [1e-2, "**"], [0.05, "*"], [1, "ns"]]`. For "simple" text_format : `[[1e-5, "1e-5"], [1e-4, "1e-4"], [1e-3, "0.001"], [1e-2, "0.01"]]`
    :param pvalues: list or array of p-values for each box pair comparison.
    :param comparisons_correction: Method for multiple comparisons correction. `bonferroni` or None.
    """

    def find_x_position_box(box_plotter, boxName):
        """
        boxName can be either a name "cat" or a tuple ("cat", "hue")
        """
        if box_plotter.plot_hues is None:
            cat = boxName
            hue_offset = 0
        else:
            cat = boxName[0]
            hue = boxName[1]
            hue_offset = box_plotter.hue_offsets[
                box_plotter.hue_names.index(hue)]

        group_pos = box_plotter.group_names.index(cat)
        box_pos = group_pos + hue_offset
        return box_pos

    def get_box_data(box_plotter, boxName):
        """
        boxName can be either a name "cat" or a tuple ("cat", "hue")

        Here we really have to duplicate seaborn code, because there is not
        direct access to the box_data in the BoxPlotter class.
        """
        cat = box_plotter.plot_hues is None and boxName or boxName[0]

        index = box_plotter.group_names.index(cat)
        group_data = box_plotter.plot_data[index]

        if box_plotter.plot_hues is None:
            # Draw a single box or a set of boxes
            # with a single level of grouping
            box_data = remove_na(group_data)
        else:
            hue_level = boxName[1]
            hue_mask = box_plotter.plot_hues[index] == hue_level
            box_data = remove_na(group_data[hue_mask])

        return box_data

    # Set default values if necessary
    if pvalue_format_string is DEFAULT:
        pvalue_format_string = '{:.3e}'
        simple_format_string = '{:.2f}'
    else:
        simple_format_string = pvalue_format_string

    if pvalue_thresholds is DEFAULT:
        if text_format == "star":
            pvalue_thresholds = [[0.0001, r"${****}$"], [0.001, r"${***}$"],
                                 [0.01, r"${**}$"], [0.05, r"$*$"], [1, "ns"]]
        else:
            pvalue_thresholds = [[1e-5, "1e-5"], [1e-4, "1e-4"],
                                 [1e-3, "0.001"], [1e-2, "0.01"]]

    fig = plt.gcf()

    # Validate arguments
    if perform_stat_test:
        if test is None:
            raise ValueError("If `perform_stat_test` is True, `test` must be specified.")
        if pvalues is not None or test_short_name is not None:
            raise ValueError("If `perform_stat_test` is True, custom `pvalues` "
                             "or `test_short_name` must be `None`.")
        valid_list = ['t-test_ind', 't-test_welch', 't-test_paired',
                      'Mann-Whitney', 'Mann-Whitney-gt', 'Mann-Whitney-ls',
                      'Levene', 'Wilcoxon', 'Kruskal', 'chisquare']
        if test not in valid_list:
            raise ValueError("test value should be one of the following: {}."
                             .format(', '.join(valid_list)))
    else:
        if pvalues is None:
            raise ValueError("If `perform_stat_test` is False, custom `pvalues` must be specified.")
        if test is not None:
            raise ValueError("If `perform_stat_test` is False, `test` must be None.")
        if len(pvalues) != len(box_pairs):
            raise ValueError("`pvalues` should be of the same length as `box_pairs`.")

    if text_annot_custom is not None and len(text_annot_custom) != len(box_pairs):
        raise ValueError("`text_annot_custom` should be of same length as `box_pairs`.")

    assert_is_in(
        loc, ['inside', 'outside'], label='argument `loc`'
    )
    assert_is_in(
        text_format,
        ['full', 'simple', 'star'],
        label='argument `text_format`'
    )
    assert_is_in(
        comparisons_correction,
        ['bonferroni', None],
        label='argument `comparisons_correction`'
    )

    if verbose >= 1 and text_format == 'star':
        print("p-value annotation legend:")
        pvalue_thresholds = pd.DataFrame(pvalue_thresholds).sort_values(by=0, ascending=False).values
        for i in range(0, len(pvalue_thresholds)):
            if i < len(pvalue_thresholds)-1:
                print('{}: {:.2e} < p <= {:.2e}'.format(pvalue_thresholds[i][1],
                                                        pvalue_thresholds[i+1][0],
                                                        pvalue_thresholds[i][0]))
            else:
                print('{}: p <= {:.2e}'.format(pvalue_thresholds[i][1], pvalue_thresholds[i][0]))
        print()

    ylim = ax.get_ylim()
    yrange = ylim[1] - ylim[0]

    if line_offset is None:
        if loc == 'inside':
            line_offset = 0.05
            if line_offset_to_box is None:
                line_offset_to_box = 0.06
        # 'outside', see valid_list
        else:
            line_offset = 0.03
            if line_offset_to_box is None:
                line_offset_to_box = line_offset
    else:
        if loc == 'inside':
            if line_offset_to_box is None:
                line_offset_to_box = 0.06
        elif loc == 'outside':
            line_offset_to_box = line_offset
    y_offset = line_offset*yrange
    y_offset_to_box = line_offset_to_box*yrange

    if plot == 'boxplot':
        # Create the same plotter object as seaborn's boxplot
        box_plotter = sns.categorical._BoxPlotter(
            x, y, hue, data, order, hue_order, orient=None, width=width, color=None,
            palette=None, saturation=.75, dodge=True, fliersize=5, linewidth=None)
    
    elif plot == 'barplot':
        # Create the same plotter object as seaborn's barplot
        box_plotter = sns.categorical._BarPlotter(
            x, y, hue, data, order, hue_order,
            estimator=np.mean, ci=95, n_boot=1000, units=None, seed=None, 
            orient=None, color=None, palette=None, saturation=.75,
            errcolor=".26", errwidth=None, capsize=None, dodge=True)
    
    elif plot == 'countplot':
        # Create the same plotter object as seaborn's countplot
        box_plotter = sns.categorical._CountPlotter(
            x, y, hue, data, order, hue_order,
            estimator=np.mean, ci=95, n_boot=1000, units=None, seed=None, 
            orient=None, color=None, palette=None, saturation=.75,
            errcolor=".26", errwidth=None, capsize=None, dodge=True)

    # Build the list of box data structures with the x and ymax positions
    group_names = box_plotter.group_names
    hue_names = box_plotter.hue_names
    
    if box_plotter.plot_hues is None:
        box_names = group_names
        labels = box_names
    else:
        box_names = [(group_name, hue_name) for group_name in group_names for hue_name in hue_names]
        labels = ['{}_{}'.format(group_name, hue_name) for (group_name, hue_name) in box_names]

    if test == 'chisquare':
        box_structs = [{'box':box_names[i],
                    'label':labels[i],
                    'x':find_x_position_box(box_plotter, box_names[i]),
                    'box_data':get_box_data(box_plotter, box_names[i]),
                    'ymax':np.amax(get_box_data(box_plotter, box_names[i]).count()) if
                           len(get_box_data(box_plotter, box_names[i])) > 0 else np.nan}
                   for i in range(len(box_names))]
    else:
        box_structs = [{'box':box_names[i],
                        'label':labels[i],
                        'x':find_x_position_box(box_plotter, box_names[i]),
                        'box_data':get_box_data(box_plotter, box_names[i]),
                        'ymax':np.amax(get_box_data(box_plotter, box_names[i])) if
                               len(get_box_data(box_plotter, box_names[i])) > 0 else np.nan}
                       for i in range(len(box_names))]
    # Sort the box data structures by position along the x axis
    box_structs = sorted(box_structs, key=lambda x: x['x'])
    # Add the index position in the list of boxes along the x axis
    box_structs = [dict(box_struct, xi=i) for i, box_struct in enumerate(box_structs)]
    # Same data structure list with access key by box name
    box_structs_dic = {box_struct['box']:box_struct for box_struct in box_structs}

    # Build the list of box data structure pairs
    box_struct_pairs = []
    for i_box_pair, (box1, box2) in enumerate(box_pairs):
        valid = box1 in box_names and box2 in box_names
        if not valid:
            raise ValueError("box_pairs contains an invalid box pair.")
        # i_box_pair will keep track of the original order of the box pairs.
        box_struct1 = dict(box_structs_dic[box1], i_box_pair=i_box_pair)
        box_struct2 = dict(box_structs_dic[box2], i_box_pair=i_box_pair)
        if box_struct1['x'] <= box_struct2['x']:
            pair = (box_struct1, box_struct2)
        else:
            pair = (box_struct2, box_struct1)
        box_struct_pairs.append(pair)

    # Draw first the annotations with the shortest between-boxes distance, in order to reduce
    # overlapping between annotations.
    box_struct_pairs = sorted(box_struct_pairs, key=lambda x: abs(x[1]['x'] - x[0]['x']))

    # Build array that contains the x and y_max position of the highest annotation or box data at
    # a given x position, and also keeps track of the number of stacked annotations.
    # This array will be updated when a new annotation is drawn.
    y_stack_arr = np.array([[box_struct['x'] for box_struct in box_structs],
                            [box_struct['ymax'] for box_struct in box_structs],
                            [0 for i in range(len(box_structs))]])
    if loc == 'outside':
        y_stack_arr[1, :] = ylim[1]
    ann_list = []
    test_result_list = []
    ymaxs = []
    y_stack = []

    for box_struct1, box_struct2 in box_struct_pairs:

        box1 = box_struct1['box']
        box2 = box_struct2['box']
        label1 = box_struct1['label']
        label2 = box_struct2['label']
        box_data1 = box_struct1['box_data']
        box_data2 = box_struct2['box_data']
        x1 = box_struct1['x']
        x2 = box_struct2['x']
        xi1 = box_struct1['xi']
        xi2 = box_struct2['xi']
        ymax1 = box_struct1['ymax']
        ymax2 = box_struct2['ymax']
        i_box_pair = box_struct1['i_box_pair']

        # Find y maximum for all the y_stacks *in between* the box1 and the box2
        i_ymax_in_range_x1_x2 = xi1 + np.nanargmax(y_stack_arr[1, np.where((x1 <= y_stack_arr[0, :]) &
                                                                           (y_stack_arr[0, :] <= x2))])
        ymax_in_range_x1_x2 = y_stack_arr[1, i_ymax_in_range_x1_x2]

        if perform_stat_test:
            result = stat_test(
                box_data1,
                box_data2,
                test,
                comparisons_correction,
                len(box_struct_pairs),
                **stats_params
            )
        else:
            test_short_name = test_short_name if test_short_name is not None else ''
            result = StatResult(
                'Custom statistical test',
                test_short_name,
                None,
                None,
                pvalues[i_box_pair]
            )

        result.box1 = box1
        result.box2 = box2
        test_result_list.append(result)

        # Don't plot lines that are not significantly different to only plot significant bars
        # (https://github.com/webermarcolivier/statannot/issues/25)
        if result.pval > 0.05:
            print(result.box1, 'and', result.box2, 'did not show significant differences and the p value = {}'.format(result.pval))
            continue
        else:
            print(result.box1, 'and', result.box2, 'did show significant differences and the p value = {}'.format(result.pval))

        if verbose >= 1:
            print("{} v.s. {}: {}".format(label1, label2, result.formatted_output))

        if text_annot_custom is not None:
            text = text_annot_custom[i_box_pair]
        else:
            if text_format == 'full':
                text = "{} p = {}".format('{}', pvalue_format_string).format(result.test_short_name, result.pval)
            elif text_format is None:
                text = None
            elif text_format == 'star':
                text = pval_annotation_text(result.pval, pvalue_thresholds)
            elif text_format == 'simple':
                test_short_name = show_test_name and test_short_name or ""
                text = simple_text(result.pval, simple_format_string, pvalue_thresholds, test_short_name)

        yref = ymax_in_range_x1_x2
        yref2 = yref

        # Choose the best offset depending on whether there is an annotation below
        # at the x position in the range [x1, x2] where the stack is the highest
        if y_stack_arr[2, i_ymax_in_range_x1_x2] == 0:
            # there is only a box below
            offset = y_offset_to_box
        else:
            # there is an annotation below
            offset = y_offset
        y = yref2 + offset
        h = line_height*yrange
        line_x, line_y = [x1, x1, x2, x2], [y, y + h, y + h, y]
        if loc == 'inside':
            ax.plot(line_x, line_y, lw=linewidth, c=color)
        elif loc == 'outside':
            line = lines.Line2D(line_x, line_y, lw=linewidth, c=color, transform=ax.transData)
            line.set_clip_on(False)
            ax.add_line(line)

        # why should we change here the ylim if at the very end we set it to the correct range????
        # ax.set_ylim((ylim[0], 1.1*(y + h)))

        if text is not None:
            ann = ax.annotate(
                text, xy=(np.mean([x1, x2]), y + h),
                xytext=(0, text_offset), textcoords='offset points',
                xycoords='data', ha='center', va='bottom',
                fontsize=fontsize, clip_on=False, annotation_clip=False)
            ann_list.append(ann)

            plt.draw()
            y_top_annot = None
            got_mpl_error = False
            if not use_fixed_offset:
                try:
                    bbox = ann.get_window_extent()
                    bbox_data = bbox.transformed(ax.transData.inverted())
                    y_top_annot = bbox_data.ymax
                except RuntimeError:
                    got_mpl_error = True

            if use_fixed_offset or got_mpl_error:
                if verbose >= 1:
                    print("Warning: cannot get the text bounding box. Falling back to a fixed"
                          " y offset. Layout may be not optimal.")
                # We will apply a fixed offset in points,
                # based on the font size of the annotation.
                fontsize_points = FontProperties(size='medium').get_size_in_points()
                offset_trans = mtransforms.offset_copy(
                    ax.transData, fig=fig, x=0,
                    y=1.0*fontsize_points + text_offset, units='points')
                y_top_display = offset_trans.transform((0, y + h))
                y_top_annot = ax.transData.inverted().transform(y_top_display)[1]
        else:
            y_top_annot = y + h

        y_stack.append(y_top_annot)    # remark: y_stack is not really necessary if we have the stack_array
        ymaxs.append(max(y_stack))
        # Fill the highest y position of the annotation into the y_stack array
        # for all positions in the range x1 to x2
        y_stack_arr[1, (x1 <= y_stack_arr[0, :]) & (y_stack_arr[0, :] <= x2)] = y_top_annot
        # Increment the counter of annotations in the y_stack array
        y_stack_arr[2, xi1:xi2 + 1] = y_stack_arr[2, xi1:xi2 + 1] + 1

    # Check to see if there are actual significant differences
    if len(ymaxs) == 0:
        pass
    else:
        y_stack_max = max(ymaxs)
        if loc == 'inside':
            ax.set_ylim((ylim[0], max(1.03*y_stack_max, ylim[1])))
        elif loc == 'outside':
            ax.set_ylim((ylim[0], ylim[1]))

    return ax, test_result_list
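
The stacking step in the function above looks up the tallest box or annotation between two x positions with `np.nanargmax`, which raises "All-NaN slice encountered" when every `ymax` in that range is NaN (the code sets `ymax` to NaN for a box with no data). Below is a minimal pure-Python sketch of the same lookup with an explicit guard. The name `highest_index_in_range` is hypothetical and not part of statannot; it only illustrates one way the empty case could be detected before it raises.

```python
import math


def highest_index_in_range(xs, ymaxs, x1, x2):
    """Return the index of the largest non-NaN ymax whose x lies in [x1, x2].

    Returns None when no position in the range holds a usable value.
    """
    best = None
    for i, (x, y) in enumerate(zip(xs, ymaxs)):
        if x1 <= x <= x2 and not math.isnan(y):
            if best is None or y > ymaxs[best]:
                best = i
    return best


xs = [0.0, 1.0, 2.0]
ymaxs = [3.5, float("nan"), 4.2]  # the middle box has no data
print(highest_index_in_range(xs, ymaxs, 0.0, 2.0))  # 2
print(highest_index_in_range(xs, ymaxs, 1.0, 1.0))  # None
```

A caller could skip the pair or raise a clearer error message when the result is `None`.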

improve documentation

Improve documentation of add_stat_annotation function.

statannot tries to import remove_na from seaborn.utils which is deprecated

update readme

Readme should be updated with the latest interface and examples.

Error when plot adjusts max y limit on axes

Here's the add_stat_annotation code I'm currently using:

stat_tests = add_stat_annotation(ax=g1, data=prop_df, x='group', y='skew',
                                             order=group_list,
                                             box_pairs=k_pairs,
                                             text_annot_custom=k_annot,
                                             perform_stat_test=False, pvalues=k_pvals,
                                             loc='inside', verbose=0)

(where k_pairs, k_annot, and k_pvals are lists of group pairs, annotation strings, and p-values, respectively). This throws the following error:

--> 627     y_stack_max = max(ymaxs)
    628     if loc == 'inside':
    629         ax.set_ylim((ylim[0], max(1.03*y_stack_max, ylim[1])))

ValueError: max() arg is an empty sequence

This error occurs whether I choose to manually set the y limits for the axes on my figure or not. Note I am applying this statistical annotation to each plot on a matplotlib plt.subplots.
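
The traceback points at `max(ymaxs)` being called on an empty list, which can happen when no annotation was actually drawn. A minimal sketch of a guarded limit computation follows. The helper name `updated_ylim` and its `margin` parameter are hypothetical; the 1.03 factor is taken from the traceback line.

```python
def updated_ylim(ylim, ymaxs, loc="inside", margin=1.03):
    """Return the new (ymin, ymax) pair.

    The limits are left unchanged when no annotation was drawn
    (empty ymaxs) or when annotations are placed outside the axes.
    """
    if not ymaxs or loc != "inside":
        return (ylim[0], ylim[1])
    return (ylim[0], max(margin * max(ymaxs), ylim[1]))


print(updated_ylim((0.0, 10.0), []))      # limits unchanged, no ValueError
print(updated_ylim((0.0, 10.0), [20.0]))  # upper limit grows to fit the annotation
```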

Multiple comparisons

Is the multiple comparisons problem tackled somehow in the library? If not, adding an option to take this into account would be great. I am happy to help with that if needed.
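
For reference, the version of `add_stat_annotation` pasted earlier on this page exposes a `comparisons_correction='bonferroni'` argument. The Bonferroni adjustment itself is simple enough to apply by hand. The sketch below is a standalone illustration with a hypothetical helper name, not the library's own implementation.

```python
def bonferroni(pvalues):
    """Multiply each p-value by the number of comparisons, capped at 1.0."""
    n = len(pvalues)
    return [min(p * n, 1.0) for p in pvalues]


raw = [0.01, 0.04, 0.5]
adjusted = bonferroni(raw)
for p, q in zip(raw, adjusted):
    print("raw p = {:.3f} -> adjusted p = {:.3f}".format(p, q))
```

The adjusted values can then be passed as custom `pvalues` with `perform_stat_test=False`.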

Feature request: list of lists for box_pairs

Please allow use of "list of lists" for box_pairs.

Thank you.
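
Until lists are accepted directly, a caller can convert them before the call. In the code pasted above, box names for hue-grouped plots are tuples and are used as dictionary keys, so nested lists would likely fail the membership check. The sketch below converts recursively; `normalize_box_pairs` is a hypothetical helper, not part of statannot.

```python
def to_tuple(item):
    """Recursively convert lists to tuples so that pairs become hashable."""
    if isinstance(item, list):
        return tuple(to_tuple(sub) for sub in item)
    return item


def normalize_box_pairs(box_pairs):
    """Turn a list of lists into the list-of-tuples form."""
    return [to_tuple(pair) for pair in box_pairs]


# Non-grouped plot
print(normalize_box_pairs([["Thur", "Fri"], ["Thur", "Sat"]]))
# Plot grouped by hue
print(normalize_box_pairs([[["Thur", "Yes"], ["Fri", "No"]]]))
```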

p-value <= 1

The output of the code in the notebook is:

pvalue annotation legend:
ns: 5.00e-02 < p <= 1.00e+09
*: 1.00e-02 < p <= 5.00e-02
**: 1.00e-03 < p <= 1.00e-02
***: 1.00e-04 < p <= 1.00e-03
****: p <= 1.00e-04

Since a p-value is always smaller than or equal to one, shouldn't the ns line be: ns: 5.00e-02 < p <= 1 ?

In case this has been intentional please ignore my comments. Otherwise I think the following patch fixes the issue:

diff --git a/statannot.py b/statannot.py
index 45b90b3..925f408 100644
--- a/statannot.py
+++ b/statannot.py
@@ -52,7 +52,7 @@ def add_stat_annotation(ax,
                         data=None, x=None, y=None, hue=None, order=None, hue_order=None,
                         boxPairList=None,
                         test='Mann-Whitney', textFormat='star', loc='inside',
-                        pvalueThresholds=[[1e9,"ns"], [0.05,"*"], [1e-2,"**"], [1e-3,"***"], [1e-4,"****"]],
+                        pvalueThresholds=[[1,"ns"], [0.05,"*"], [1e-2,"**"], [1e-3,"***"], [1e-4,"****"]],
                         color='0.2', lineYOffsetAxesCoord=None, lineHeightAxesCoord=0.02, yTextOffsetPoints=1,
                         linewidth=1.5, fontsize='medium', useFixedOffset=False, verbose=1):
     """

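To make the effect of the upper bound concrete, here is a small sketch of how a p-value could be mapped to its label from a threshold list like the one in the patch. `star_label` is a hypothetical helper written for illustration; it follows the `lower < p <= upper` convention of the printed legend.

```python
def star_label(pval, thresholds):
    """Return the label of the smallest threshold that is >= pval."""
    for limit, label in sorted(thresholds, key=lambda t: t[0]):
        if pval <= limit:
            return label
    return None  # pval lies above every threshold


thresholds = [[1, "ns"], [0.05, "*"], [1e-2, "**"], [1e-3, "***"], [1e-4, "****"]]
for p in (0.2, 0.03, 0.00001):
    print(p, star_label(p, thresholds))
```

With an upper bound of 1, every valid p-value still receives a label, so the legend and the lookup stay consistent.
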
RuntimeError : Cannot get window extent w/o renderer

Hey there !
I'm having a RuntimeError running this :

data
Out[66]: 
    Control    FBS 1%    FBS 3%
0  0.494348  1.196539  0.921887
1  0.556027  0.940206  1.153515
2  0.445540       NaN  1.108820
3  0.464731  0.931461  0.956742
4  0.393526  0.894547  1.073090
5  0.479290       NaN  1.099829
6  0.683442  0.936075       NaN
7  0.667166       NaN       NaN
8  0.526530       NaN       NaN
9  0.499731       NaN       NaN

statannot.add_stat_annotation(ax, data, boxPairList=[('Control','FBS 1%'), ('Control','FBS 3%'), ('FBS 1%','FBS 3%')], test='t-test', textFormat='star',loc='outside', verbose=1)

pvalue annotation legend:
ns: 5.00e-02 < p <= 1.00e+09
*: 1.00e-02 < p <= 5.00e-02
**: 1.00e-03 < p <= 1.00e-02
***: 1.00e-04 < p <= 1.00e-03
****: p <= 1.00e-04

Traceback (most recent call last):

  File "<ipython-input-67-57fb82b05cc8>", line 1, in <module>
    statannot.add_stat_annotation(ax, data, boxPairList=[('Control','FBS 1%'), ('Control','FBS 3%'), ('FBS 1%','FBS 3%')], test='t-test', textFormat='star',loc='outside', verbose=1)

  File "C:\Users\analysesbiophysique\.spyder-py3\projets\Imaging\packages\statannot.py", line 218, in add_stat_annotation
    bbox = ann.get_window_extent()

  File "C:\ProgramData\Anaconda3\lib\site-packages\matplotlib\text.py", line 2323, in get_window_extent
    text_bbox = Text.get_window_extent(self, renderer=renderer)

  File "C:\ProgramData\Anaconda3\lib\site-packages\matplotlib\text.py", line 920, in get_window_extent
    raise RuntimeError('Cannot get window extent w/o renderer')

RuntimeError: Cannot get window extent w/o renderer

Do you have any idea why this is happening?

Thanks !

any plan to apply to catplot?

Thanks for your good work on this tool, really appreciate it, but any plan to apply it to seaborn catplot?

plot created without annotation - error message "add_stat_annotation() got an unexpected keyword argument 'box_pairs'"

I try to run a simple plot with the annotation but do not manage to make it work. I have checked that my packages are up to date:
(I use Anaconda and Spyder)
sns: '0.9.0'
numpy: '1.15.4'
matplotlib: '3.0.2'
pd: 0.23.4
scipy: '1.1.0'

My code produces the right figure but without annotation and spits the following error message:
TypeError: add_stat_annotation() got an unexpected keyword argument 'box_pairs'

Any idea why? Any help much appreciated!

Marie

The code I use :

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
from statannot import add_stat_annotation
from pandas.compat import StringIO

df = sns.load_dataset("tips")
from statannot.statannot import add_stat_annotation

x = "day"
y = "total_bill"
order = ['Sun', 'Thur', 'Fri', 'Sat']
ax = sns.boxplot(data=df, x=x, y=y, order=order)
test_results = add_stat_annotation(ax, data=df, x=x, y=y, order=order,
box_pairs=[("Thur", "Fri"), ("Thur", "Sat"), ("Fri", "Sun")],
test='Mann-Whitney', text_format='star', loc='outside', verbose=2)
plt.savefig('example_non-hue_outside.png', dpi=300, bbox_inches='tight')
test_results

Something in between full and simple

I was writing to ask the easiest way to get the absolute p-value without also printing the type of test utilized (i.e. text_format = full). Essentially, I want the "simple" output, but with the actual p-value, not the simplified form (e.g. <0.05)
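
One possible workaround, assuming the p-values are already known (for example from an earlier run or from scipy), is to build the labels by hand and pass them through the `text_annot_custom` argument that appears in other reports on this page. The helper `pvalue_labels` below is hypothetical.

```python
def pvalue_labels(pvalues, fmt="p = {:.3e}"):
    """Format each p-value as an annotation string without the test name."""
    return [fmt.format(p) for p in pvalues]


pvals = [0.00123, 0.04]
labels = pvalue_labels(pvals)
print(labels)  # ['p = 1.230e-03', 'p = 4.000e-02']

# The labels would then be passed alongside the p-values, for example:
# add_stat_annotation(ax, ..., box_pairs=pairs, perform_stat_test=False,
#                     pvalues=pvals, text_annot_custom=labels)
```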

TypeError: __init__() missing 1 required positional argument: 'seed'

When I tried to use add_stat_annotation() on a barplot, I got the following error:

Traceback (most recent call last):
  File "plotting.py", line 63, in result_plot
    add_stat_annotation(ax, plot="barplot", data=df,
  File "/home/myhome/pyenv/py37/lib/python3.7/site-packages/statannot/statannot.py", line 442, in add_stat_annotation
    errcolor=".26", errwidth=None, capsize=None, dodge=True)
TypeError: __init__() missing 1 required positional argument: 'seed'

I am using seaborn 0.11.0. I checked seaborn's source code and indeed there is a required seed argument. I think this is a simple bug to be fixed. Also, it would be nice to have some barplot examples in the Jupyter Notebook, so that the barplot annotation functionality is tested:)
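
One way to tolerate such signature changes between versions is to inspect the constructor and pass only the arguments it declares. The sketch below uses two hypothetical stand-in classes rather than seaborn's private plotter classes, and it assumes the constructor can be called with keyword arguments.

```python
import inspect


def supported_kwargs(cls, candidates):
    """Keep only the keyword arguments that cls.__init__ declares."""
    params = inspect.signature(cls.__init__).parameters
    return {k: v for k, v in candidates.items() if k in params}


class OldPlotter:
    """Hypothetical stand-in for a constructor without `seed`."""
    def __init__(self, x, ci, n_boot):
        self.args = (x, ci, n_boot)


class NewPlotter:
    """Hypothetical stand-in for a constructor that requires `seed`."""
    def __init__(self, x, ci, n_boot, seed):
        self.args = (x, ci, n_boot, seed)


candidates = {"x": "day", "ci": 95, "n_boot": 1000, "seed": None}
old = OldPlotter(**supported_kwargs(OldPlotter, candidates))
new = NewPlotter(**supported_kwargs(NewPlotter, candidates))
print(old.args)
print(new.args)
```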

p-value and stats returning 'nan' except for Mann-Whitney test

Hi! I am attempting to use add_stat_annotation to add t-test statistics for group differences to a bar plot, but keep getting 'nan' for the returned p and t values. I have resorted to using the scipy t-test_ind function to compute these prior and add them using add_stat_annot manually, but would be great to know if there is a way to fix this! I also tried out the other available tests, and got nans for all except the Mann-Whitney test. Of note, my x variable is a categorical variable containing 1s, 2s, and 3s, and my y variable is a continuous variable with no 0s or nans. Thanks for any insight!

Cannot import add_statistical_test_annotation

Hi,
I am kind of stuck trying to fix this error. Perhaps it's an installation issue, though importing the module statannot itself works without problems. I also noticed that in statannot.py the function has a different name (add_stat_annotation, not add_statistical_test_annotation), though several blogs from late 2018 recommended the latter function name. This is my code:

from statannot import add_statistical_test_annotation
(...)
with sns.axes_style(style='ticks'):
    g = sns.catplot("Category", "Week 2", data=df, kind="box")
    g.set_axis_labels("Category", str(input_parameter))
    add_statistical_test_annotation(g, df, [("SHAM", "iCMP"), ("PBS", "iCMP"), ("eGFP", "iCMP")],
                                    test='Mann-Whitney', order=None, textFormat='star', loc='inside', verbose=2)
plt.show()

Error:
from statannot import add_statistical_test_annotation
ImportError: cannot import name 'add_statistical_test_annotation'

Any help gladly appreciated!

Multi-comparison adjustment does not automatically apply

I am running three comparisons with an independent t-test and the annotation shows nicely on the plot but the p values are not adjusted following the Bonferroni procedure. I think I have the right statannot version as I have the StatResult.py file in my statannot folder. I am at a loss to find out what the problem is.

When I check the analysis I can see that there is no adjustment as shown below.

Could you help me to "activate" the adjustment?

for res in test_results: print(res)

print("\nStatResult attributes:", test_results[0].__dict__.keys())

I get:

AxesSubplot(0.125,0.125;0.775x0.755)
[{'pvalue': 1.210296633675592e-07, 'test_short_name': 't-test_ind', 'formatted_output': 't-test independent samples, P_val=1.210e-07 stat=5.397e+00', 'box1': 'Donation? Yes', 'box2': 'Donation? No'}, {'pvalue': 0.002277227049961783, 'test_short_name': 't-test_ind', 'formatted_output': 't-test independent samples, P_val=2.277e-03 stat=-3.074e+00', 'box1': 'Control', 'box2': 'Donation? Yes'}, {'pvalue': 0.003862565144342565, 'test_short_name': 't-test_ind', 'formatted_output': 't-test independent samples, P_val=3.863e-03 stat=2.904e+00', 'box1': 'Control', 'box2': 'Donation? No'}]

	box_plotter = sns.categorical._BarPlotter(
	x, y, hue, data, order, hue_order,
	estimator=np.mean, ci=95, n_boot=1000, units=None,
	orient=None, color=None, palette=None, saturation=.75,
	errcolor=".26", errwidth=None, capsize=None, dodge=True)
