Robust Graphical Methods For Group Comparisons

License: MIT License

R 100.00%

r quantile robust data-visualization statistics

rogme's Introduction

rogme

Robust Graphical Methods For Group Comparisons (v. 0.2.1)

The rogme R package provides graphical tools and robust statistical methods to compare groups of continuous and pseudo-continuous observations. The goal is to illustrate and quantify how and by how much groups differ. The current version of the package is limited to comparing two groups (though multiple pairs of groups can be compared in one go). Future developments will extend the tools to deal with multiple groups, interactions and hierarchical designs.

NEW: a hierarchical shift function to compare two dependent conditions from one group is now available. It has parametric and bootstrap versions. The approach is described in Rousselet & Wilcox, 2019.

The package can be installed using these commands:

install.packages("devtools")
devtools::install_github("GRousselet/rogme")

The approach behind the package can be summarised in one figure:

How two independent distributions differ. Left: standard but misleading bar graphs of mean values. Right: detailed graphical methods. A. Stripcharts of marginal distributions. Vertical lines mark the deciles, with a thicker line for the median. B. Kernel density representation and rug plot of the distribution of difference scores. Vertical lines mark the deciles, with a thicker line for the median. C. Shift function. Group 1 - Group 2 is plotted along the y-axis for each decile (white disk), as a function of Group 1 deciles. The vertical lines indicate 95% bootstrap confidence intervals. The shift function can be sparser or denser by changing the quantiles. D. Difference asymmetry function with 95% bootstrap confidence intervals.

The approach is also described in these articles:

A few simple steps to improve the description of group results in neuroscience

Beyond differences in means: robust graphical methods to compare two groups in neuroscience. [Reproducibility package using rogme]

rogme uses ggplot2 for graphical representations, and the main statistical functions were developed by Rand Wilcox, as part of his WRS package.

The main tool in rogme is the shift function. A shift function shows the difference between the quantiles of two groups as a function of the quantiles of one group. For inferences, the function returns an uncertainty interval for each quantile difference. By default, the deciles are used. Currently, confidence intervals are computed using one of two percentile bootstrap techniques. Highest density intervals and Bayesian bootstrap intervals will be available eventually.
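
As a rough illustration of the idea, the sketch below computes decile differences between two groups, with a percentile bootstrap interval for each difference. It is not rogme's implementation: it swaps in base R's quantile() (type 8) for the Harrell-Davis estimator, and the name shift_sketch and its arguments are hypothetical.

```r
# Minimal shift function sketch (assumption: base R only, not rogme's code).
# quantile(type = 8) stands in for the Harrell-Davis estimator used by rogme.
shift_sketch <- function(x, y, q = seq(0.1, 0.9, 0.1), nboot = 200, alpha = 0.05) {
  qx <- quantile(x, probs = q, type = 8, names = FALSE)
  qy <- quantile(y, probs = q, type = 8, names = FALSE)
  # Bootstrap: resample each group independently, recompute the differences
  boot <- matrix(replicate(nboot, {
    xb <- sample(x, replace = TRUE)
    yb <- sample(y, replace = TRUE)
    quantile(xb, probs = q, type = 8, names = FALSE) -
      quantile(yb, probs = q, type = 8, names = FALSE)
  }), nrow = length(q))
  # One row per quantile: take the percentile limits across bootstrap samples
  ci <- apply(boot, 1, quantile, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
  data.frame(q = q, group1 = qx, group2 = qy,
             difference = qx - qy, ci_lower = ci[1, ], ci_upper = ci[2, ])
}

set.seed(21)
g1 <- rnorm(1000) + 6
g2 <- rnorm(1000) * 1.5 + 6
res <- shift_sketch(g1, g2)
print(res)
```

Each row of the result gives one quantile of each group, their difference (group 1 minus group 2), and the bootstrap limits for that difference.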

Vignettes

Functions

All the main functions rely on the Harrell-Davis quantile estimator, computed by the hd() function.
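
For readers unfamiliar with it, the Harrell-Davis estimator is a weighted average of all the order statistics, with weights derived from a beta distribution. The following is a minimal base R sketch of that definition. The name hd_sketch is hypothetical, chosen to avoid clashing with the package's hd(), and the NA handling is an assumption.

```r
# Sketch of the Harrell-Davis estimator of the q-th quantile.
# hd_sketch is a hypothetical name; rogme and WRS provide hd().
hd_sketch <- function(x, q = 0.5) {
  x <- sort(x[!is.na(x)])          # order statistics (NA removal is an assumption)
  n <- length(x)
  a <- (n + 1) * q
  b <- (n + 1) * (1 - q)
  # Weight of the i-th order statistic: Beta(a, b) mass on ((i - 1)/n, i/n]
  w <- pbeta((1:n) / n, a, b) - pbeta((0:(n - 1)) / n, a, b)
  sum(w * x)
}

set.seed(21)
x <- rnorm(1000) + 6
# Estimates of the nine deciles
deciles <- sapply(seq(0.1, 0.9, 0.1), function(q) hd_sketch(x, q))
deciles
```

Because every observation contributes to every quantile estimate, the estimates vary smoothly across quantiles.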

Shift function

In the WRS package, the shift function can be calculated using:

shifthd() or qcomhd() for independent groups
shiftdhd() or Dqcomhd() for dependent groups

These functions can also produce non-ggplot figures.

In rogme, the shift function can be calculated using:

shifthd() or shifthd_pbci() for independent groups
shiftdhd() or shiftdhd_pbci() for dependent groups

Illustration of the results is handled separately by plot_sf().

You can see the shift function in action for instance in these publications:

Difference asymmetry function

The difference asymmetry function is another powerful graphical and inferential tool. In the WRS package it is calculated using:

qwmwhd() for independent groups
difQpci() for dependent groups

In rogme, these functions have been renamed:

asymhd() for independent groups
asymdhd() for dependent groups
plot_diff_asym() to plot the results

You can see the difference asymmetry function in action in this blog post and in this review article.

Shift function demo [1]

Detailed illustration of the shift function using two distributions that differ in spread. The observations are in arbitrary units (a.u.).

#> generate data
set.seed(21)
g1 <- rnorm(1000) + 6
g2 <- rnorm(1000) * 1.5 + 6

#> make tibble
df <- mkt2(g1, g2)

First, we generate 1D scatterplots for the two groups.

#> scatterplots alone
ps <- plot_scat2(data = df,
                 formula = obs ~ gr,
                 xlabel = "",
                 ylabel = "Scores (a.u.)",
                 alpha = 1,
                 shape = 21,
                 colour = "grey10",
                 fill = "grey90") #> scatterplots
ps <- ps + coord_flip()
ps

Second, we compute the shift function and then plot it.

#> compute shift function
sf <- shifthd(data = df, formula = obs ~ gr, nboot = 200)
#> sf <- shifthd_pbci(data = df, formula = obs ~ gr, nboot = 200, q = c(.1,.25,.5,.75,.9))

#> plot shift function
psf <- plot_sf(sf, plot_theme = 2)
#> Warning: Using alpha for a discrete variable is not advised.

#> Warning: Using alpha for a discrete variable is not advised.

#> add labels for deciles 1 & 9
psf <- add_sf_lab(psf, sf, 
                  y_lab_nudge = .1, 
                  text_size = 4)

#> change axis labels
psf[[1]] <- psf[[1]] +  labs(x = "Group 1 quantiles of scores (a.u.)",
                             y = "Group 1 - group 2 \nquantile differences (a.u.)")
psf[[1]]

Third, we make 1D scatterplots with deciles and colour coded differences.

p <- plot_scat2(df,
                xlabel = "",
                ylabel = "Scores (a.u.)",
                alpha = .3,
                shape = 21,
                colour = "grey10",
                fill = "grey90") #> scatterplots
p <- plot_hd_links(p, sf[[1]],
                    q_size = 1,
                    md_size = 1.5,
                    add_rect = TRUE,
                    rect_alpha = 0.1,
                    rect_col = "grey50",
                    add_lab = TRUE,
                    text_size = 5) #> superimposed deciles + rectangle
p <- p + coord_flip() #> flip axes
p

Finally, we combine the three plots into one figure.

library(cowplot)
cowplot::plot_grid(ps, p, psf[[1]], labels=c("A", "B", "C"), ncol = 1, nrow = 3,
                   rel_heights = c(1, 1, 1), label_size = 20, hjust = -0.5, scale=.95)

Panel A illustrates two distributions, both n = 1000, that differ in spread. The observations in the scatterplots were jittered based on their local density, as implemented in ggbeeswarm::geom_quasirandom.

Panel B illustrates the same data from panel A. The dark vertical lines mark the deciles of the distributions. The thicker vertical line in each distribution is the median. Between distributions, the matching deciles are joined by coloured lines. If the decile difference between group 1 and group 2 is positive, the line is orange; if it is negative, the line is purple. The values of the differences for deciles 1 and 9 are indicated in the superimposed labels.

Panel C focuses on the portion of the x-axis marked by the grey shaded area at the bottom of panel B. It shows the deciles of group 1 on the x-axis – the same values that are shown for group 1 in panel B. The y-axis shows the differences between deciles: the difference is large and positive for decile 1; it then progressively decreases to reach almost zero for decile 5 (the median); it becomes progressively more negative for higher deciles. Thus, for each decile the shift function illustrates by how much one distribution needs to be shifted to match another one. In our example, we illustrate by how much we need to shift deciles from group 2 to match deciles from group 1.
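
The pattern described for panel C can be checked against the population deciles of the two distributions used to simulate the data (normal, mean 6, standard deviations 1 and 1.5). This is a worked example using base R's qnorm(), not output from rogme; the variable names are illustrative.

```r
# Population deciles for the two distributions simulated in this demo:
# group 1 ~ Normal(6, 1), group 2 ~ Normal(6, 1.5).
p <- seq(0.1, 0.9, 0.1)
d1 <- qnorm(p, mean = 6, sd = 1)
d2 <- qnorm(p, mean = 6, sd = 1.5)
decile_diff <- d1 - d2   # algebraically equal to -0.5 * qnorm(p)
round(data.frame(decile = 1:9, group1 = d1, group2 = d2,
                 difference = decile_diff), 2)
```

The difference is about +0.64 at decile 1, exactly 0 at the median, and about -0.64 at decile 9, decreasing steadily in between. This matches the slope seen in the estimated shift function.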

More generally, a shift function shows quantile differences as a function of quantiles in one group. It estimates how and by how much two distributions differ. It is thus a powerful alternative to the traditional t-test on means, which focuses on only one, non-robust, quantity. Quantiles are robust, intuitive and informative.

Shift function demo [2]

The shift function can also be computed for all pairs of groups in a data frame in one call by using the argument doall = TRUE.

set.seed(21) # generate data
n <- 100 # sample size
library(tibble)
df <- tibble(gr = factor(c(rep("group1",n),rep("group2",n),rep("group3",n))),
             obs= c(rnorm(n), rnorm(n)+1, rnorm(n)*5)) # make tibble
out <- shifthd(df, doall = TRUE) # compute all comparisons

Plot all shift functions in one call of plot_sf.

plist <- plot_sf(out)

The plots can then be combined using the packages gridExtra or cowplot.

library(gridExtra)
do.call("grid.arrange", c(plist, ncol=2))

To extract one object and for instance change a label:

p <- plist[[1]]
p + labs(y = "Difference")

New group plot with different y labels and titles:

for (sub in seq_along(plist)) {
  plist[[sub]] <- plist[[sub]] +
    labs(y = "Difference", title = names(out)[sub]) +
    theme(plot.title = element_text(size = 20, face = "bold.italic"))
}
do.call("grid.arrange", c(plist, ncol=2))

To understand what’s going on, here are the marginal distributions:

p <- plot_scat2(df,
                xlabel = "",
                ylabel = "Scores (a.u.)",
                alpha = .5,
                shape = 21,
                colour = "grey10",
                fill = "grey90") #> scatterplots
p + coord_flip() #> flip axes
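
For intuition about what comparing all pairs of groups involves, here is a base R sketch that enumerates the pairs with combn() and computes decile differences for each. It is an assumption-laden illustration: it uses quantile(type = 8) rather than the Harrell-Davis estimator, computes no confidence intervals, and the pair ordering and naming (out_sketch, "group1 - group2") are hypothetical and may differ from what shifthd() returns.

```r
set.seed(21)
n <- 100
df <- data.frame(gr = factor(rep(c("group1", "group2", "group3"), each = n)),
                 obs = c(rnorm(n), rnorm(n) + 1, rnorm(n) * 5))

q <- seq(0.1, 0.9, 0.1)
# All unordered pairs of group labels
pairs <- combn(levels(df$gr), 2, simplify = FALSE)

# Decile differences (first group minus second group) for each pair
out_sketch <- lapply(pairs, function(pr) {
  qa <- quantile(df$obs[df$gr == pr[1]], probs = q, type = 8, names = FALSE)
  qb <- quantile(df$obs[df$gr == pr[2]], probs = q, type = 8, names = FALSE)
  data.frame(q = q, difference = qa - qb)
})
names(out_sketch) <- vapply(pairs, paste, character(1), collapse = " - ")
names(out_sketch)
```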


rogme's Issues

Is it possible to use plot_hd_links to visualize differences of more than 2 groups within one plot?

Hey, I'm looking to visualize differences between 5 groups in my plot. I noticed that the shifthd function does return the shift outputs for more than 2 groups. So the idea is to compare group1 -> group2, then group2 -> group3, group3 -> group4 and so on. It would be amazing if there's an easy way to do that with the current implementation :)

Analyze and plot quartiles

Is there a simple way to select to analyze and plot just the quartiles instead of the deciles? I tried to have a look at the functions (plot_scat2 for example) but it seemed to me to be hard coded and I do not have the abilities to rewrite the functions..

`plot_hsf_pb_dist` missing `qseq`

The plot_hsf_pb_dist() function seems to be missing a definition of qseq at L225.

Changing this line to scale_y_continuous(breaks = data$quantiles) fixes the problem for me.

Error when using `plot_sf` with a factor containing a space symbol

Hi there,

I just noticed an error when trying to plot a shift function with plot_sf when one of the factor levels contains a space.

Let me know if I can be of any help!

library(rogme)                    # remotes::install_github("GRousselet/rogme")
library(ggplot2)

dataset <-
    data.frame(x = factor(c("Condition A", "Condition B")),
               y = rnorm(100))


sf <- shifthd(data = dataset, 
              formula = y ~ x, nboot = 200)

plot_sf(sf)
#> Error in parse(text = elt): <text>:1:11: unexpected symbol
#> 1: Condition A
#>               ^

Created on 2021-06-08 by the reprex package (v2.0.0)
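
One possible workaround, offered as an assumption rather than a confirmed fix, is to remove the spaces from the factor levels before computing the shift function. The error message suggests the level names are parsed as R code, so the sketch below makes each level a syntactically valid name. It uses base R only and has not been checked against plot_sf().

```r
set.seed(1)
dataset <- data.frame(x = factor(rep(c("Condition A", "Condition B"), each = 50)),
                      y = rnorm(100))

# Replace spaces so that each level is a syntactically valid R name
levels(dataset$x) <- gsub(" ", "_", levels(dataset$x), fixed = TRUE)
levels(dataset$x)

# Check: make.names() leaves already valid names unchanged
all(make.names(levels(dataset$x)) == levels(dataset$x))
```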

Problem to plot black quantiles

Hello
Thanks a lot for this package. I now have a new problem: when I run the example from your page, the black lines for the quantiles don't appear. I get these messages:

> p <- plot_scat2(df,
                xlabel = "",
                ylabel = "Scores (a.u.)",
                alpha = .3,
                shape = 21,
                colour = "grey10",
                fill = "grey90") #> scatterplots
> p <- plot_hd_links(p, sf[[1]],
                    q_size = 1,
                    md_size = 1.5,
                    add_rect = TRUE,
                    rect_alpha = 0.1,
                    rect_col = "grey50",
                    add_lab = TRUE,
                    text_size = 5) #> superimposed deciles + rectangle
Warning messages:
1: Ignoring unknown parameters: fun.y, fun.ymin, fun.ymax
2: Ignoring unknown parameters: fun.y, fun.ymin, fun.ymax
3: Ignoring unknown parameters: fun.y, fun.ymin, fun.ymax
4: Ignoring unknown parameters: fun.y, fun.ymin, fun.ymax
5: Ignoring unknown parameters: fun.y, fun.ymin, fun.ymax
6: Ignoring unknown parameters: fun.y, fun.ymin, fun.ymax
7: Ignoring unknown parameters: fun.y, fun.ymin, fun.ymax
8: Ignoring unknown parameters: fun.y, fun.ymin, fun.ymax
9: Ignoring unknown parameters: fun.y, fun.ymin, fun.ymax
> p <- p + coord_flip() #> flip axes
> p
No summary function supplied, defaulting to mean_se()
No summary function supplied, defaulting to mean_se()
No summary function supplied, defaulting to mean_se()
No summary function supplied, defaulting to mean_se()
No summary function supplied, defaulting to mean_se()
No summary function supplied, defaulting to mean_se()
No summary function supplied, defaulting to mean_se()
No summary function supplied, defaulting to mean_se()
No summary function supplied, defaulting to mean_se()

What can I do about this?

Thanks a lot
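
These messages are consistent with ggplot2 3.3.0 and later, where the stat_summary() arguments fun.y, fun.ymin and fun.ymax were replaced by fun, fun.min and fun.max. Whether that is the cause here is an assumption. The sketch below only shows the newer argument names on made-up data; it is not rogme's code, and the object names toy and p_toy are illustrative.

```r
library(ggplot2)  # assumes ggplot2 >= 3.3.0

set.seed(21)
toy <- data.frame(gr = rep(c("Group1", "Group2"), each = 100),
                  obs = c(rnorm(100) + 6, rnorm(100) * 1.5 + 6))

p_toy <- ggplot(toy, aes(x = gr, y = obs)) +
  geom_jitter(width = 0.1, alpha = 0.3) +
  # Newer argument names: fun / fun.min / fun.max
  # (formerly fun.y / fun.ymin / fun.ymax)
  stat_summary(fun = median, fun.min = median, fun.max = median,
               geom = "crossbar", width = 0.5)
p_toy
```

With the newer names the summary layer draws a bar at each group's median instead of falling back to mean_se().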

plot_pbsf does not exist

The main {rogme} page states:

In rogme, the shift function can be calculated using:
shifthd() or shifthd_pbci() for independent groups
shiftdhd() or shiftdhd_pbci() for dependent groups
Illustrations of the results is handled separately by plot_sf() and plot_pbsf()

I installed the {rogme} package today (5/16/2019) and it does not appear to exist. Did I make a mistake along the way?

Shiftdhd() output has the same estimates for each quantile

I think that the indexing here may be the reason why.

plot_kde_rug_dec1/2 functions throw errors (+fix)

Even when using data frames made with the mkt2 function, both plot_kde_rug_dec functions throw errors. Interrogating the code a bit, I found the problems in the first two lines, which are identical for both functions.

The following lines:

cdat <- plyr::ddply(data, "gr", summarise, deciles=q1469(data))
hd05 <- plyr::ddply(data, "gr", summarise, hd=hd(data,0.5))

should be changed to

cdat <- plyr::ddply(data, "gr", summarise, deciles=q1469(obs))
hd05 <- plyr::ddply(data, "gr", summarise, hd=hd(obs,0.5))

in order to work with a tibble of variables "gr" and "obs".
Once "data" is changed to "obs" the functions seem to work correctly.

plot_sf() error with shifthd_pbci() output

Related to #4 and #6, the plot_sf() error still seems to exist when used with shifthd_pbci(). I'm not exactly clear what the outcomes of #4 and #6 were, so I'm starting a new issue for clarity.

The error can be reproduced with:

library(rogme)
library(ggplot2)

#> generate data
set.seed(21)
g1 <- rnorm(1000) + 6
g2 <- rnorm(1000) * 1.5 + 6

#> make tibble
df <- mkt2(g1, g2)

#> compute shift function
# sf <- shifthd(data = df, formula = obs ~ gr, nboot = 200)
sf <- shifthd_pbci(data = df, formula = obs ~ gr, nboot = 200, q = c(.1,.25,.5,.75,.9))

#> plot shift function
psf <- plot_sf(sf, plot_theme = 1)

and returns an error:

Error in if (all.equal(df$q, seq(0.1, 0.9, 0.1))) { : 
  argument is not interpretable as logical

Coincidentally, using sf in plot_hd_links() works fine.
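
The error message points at a general R pitfall: all.equal() returns TRUE when its arguments match, but a character description of the mismatch otherwise, and a character string cannot be used as an `if` condition. The base R sketch below reproduces that behaviour with the quantiles from this report and shows the usual isTRUE() guard. It illustrates the pitfall only; how plot_sf() should handle non-decile quantiles is not addressed, and the variable names are illustrative.

```r
q_default <- seq(0.1, 0.9, 0.1)
q_custom  <- c(0.1, 0.25, 0.5, 0.75, 0.9)

# Not FALSE: a character string describing the difference
res <- all.equal(q_custom, q_default)
is.character(res)

# Using `res` directly as an `if` condition raises an error
failed <- inherits(try(if (res) "deciles" else "other", silent = TRUE),
                   "try-error")
failed

# isTRUE() turns the result into a strict TRUE/FALSE
label <- if (isTRUE(all.equal(q_custom, q_default))) "deciles" else "other"
label
```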

No compilation

It would be lovely to have all the functions in a single file that can be loaded, as an alternative to devtools. Rtools requires compilation and, on Windows, doesn't always work.

thx
