
sjstats's Introduction

sjstats - Collection of Convenient Functions for Common Statistical Computations

Collection of convenient functions for common statistical computations, which are not directly provided by R's base or stats packages.

This package aims at providing, first, shortcuts for statistical measures that otherwise could only be calculated with additional effort (like Cramer's V, Phi, or effect size statistics like Eta or Omega squared), or for which no functions are currently available.

Second, another focus lies on implementations of common statistical significance tests with a consistent syntax, such as the t-test, Mann-Whitney test, Chi-squared test, and more. These functions are designed to be more user-friendly and also support weights, i.e. weighted statistics can be computed.

Finally, the package includes miscellaneous functions that are either not yet available in R (like svyglm.nb() or svyglm.zip() to calculate negative binomial or zero-inflated Poisson models for survey data) or are simply convenient for daily work (like functions for bootstrapping, or ANOVA summary tables).

The included tools comprise:

  • Especially for mixed models: design effect and sample size calculation (see the sketch after this list)
  • Significance tests: correlation, Chi-squared test, t-test, Mann-Whitney-U test, Wilcoxon rank sum test, Kruskal-Wallis test.
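
A minimal sketch of the mixed-model helpers mentioned above (argument names are taken from memory and may differ slightly between sjstats versions):

library(sjstats)

# Design effect for a clustered sample, roughly 1 + (n - 1) * icc
# (assuming about 30 observations per cluster and an ICC of 0.05)
design_effect(n = 30, icc = 0.05)

# Approximate sample size for a two-level mixed model
samplesize_mixed(eff.size = 0.3, k = 30, icc = 0.05)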

Note that most functions that formerly were available in this package have been moved to the easystats project.

Documentation

Please visit https://strengejacke.github.io/sjstats/ for documentation and vignettes.

Installation

Latest development build

To install the latest development snapshot (see latest changes below), type the following commands into the R console:

library(remotes)
remotes::install_github("strengejacke/sjstats")

Official, stable release

To install the latest stable release from CRAN, type the following command into the R console:

install.packages("sjstats")

Citation

In case you want or need to cite my package, please use citation('sjstats') for citation information.

sjstats's People

Contributors

jennybc, jmgirard, lioumens, rtaph, sjplot, strengejacke

sjstats's Issues

`eta_sq` and `omega_sq` don't work when white adjustment is applied

# for reproducibility
set.seed(123)

# needed libraries
library(car)
#> Loading required package: carData
library(sjstats)

# model
mod <- stats::lm(
  formula = conformity ~ fcategory * partner.status,
  data = Moore,
  contrasts = list(fcategory = contr.sum, partner.status = contr.sum)
)

# works
sjstats::eta_sq(stats::aov(mod))
#>                       term etasq
#> 1                fcategory 0.003
#> 2           partner.status 0.175
#> 3 fcategory:partner.status 0.145

# works
sjstats::eta_sq(car::Anova(mod, type = "III"))
#>                       term etasq
#> 1              (Intercept) 0.819
#> 2                fcategory 0.005
#> 3           partner.status 0.034
#> 4 fcategory:partner.status 0.025

# doesn't work
sjstats::eta_sq(car::Anova(mod, type = "III", white.adjust = TRUE))
#> Coefficient covariances computed by hccm()
#> Error in if (.after < 1) {: argument is of length zero

I am guessing this is because when white.adjust = TRUE, there is no column corresponding to sum of squares:

# tidy outputs from broom
broom::tidy(car::Anova(mod, type = "III"))
#> # A tibble: 5 x 5
#>   term                      sumsq    df statistic   p.value
#>   <chr>                     <dbl> <dbl>     <dbl>     <dbl>
#> 1 (Intercept)              5753.      1   274.     3.05e-19
#> 2 fcategory                  36.0     2     0.859  4.31e- 1
#> 3 partner.status            240.      1    11.4    1.66e- 3
#> 4 fcategory:partner.status  175.      2     4.18   2.26e- 2
#> 5 Residuals                 818.     39    NA     NA

broom::tidy(car::Anova(mod, type = "III", white.adjust = TRUE))
#> Coefficient covariances computed by hccm()
#> # A tibble: 5 x 4
#>   term                        df statistic   p.value
#>   <chr>                    <dbl>     <dbl>     <dbl>
#> 1 (Intercept)                  1   228.     6.84e-18
#> 2 fcategory                    2     0.911  4.10e- 1
#> 3 partner.status               1     9.51   3.75e- 3
#> 4 fcategory:partner.status     2     2.83   7.12e- 2
#> 5 Residuals                   39    NA     NA

Created on 2018-09-24 by the reprex package (v0.2.1)

singular model and `sjstats::eta_sq` and `sjstats::omega_sq`

If the error model for aovlist objects is singular, then eta-squared and omega-squared functions don't work (only if partial = FALSE).

Any way to safeguard against this? Maybe just produce NAs?

# checking if broom::tidy works
broom::tidy(x = stats::aov(
  formula = mpg ~ wt + qsec + Error(disp / am),
  data = dplyr::filter(mtcars, cyl == 6),
  na.action = na.omit
))
#> # A tibble: 5 x 7
#>   stratum term         df sumsq meansq statistic p.value
#>   <chr>   <chr>     <dbl> <dbl>  <dbl>     <dbl>   <dbl>
#> 1 disp    wt            1 0.135  0.135    NA      NA    
#> 2 disp:am wt            1 7.85   7.85     NA      NA    
#> 3 Within  wt            1 0.852  0.852     0.480   0.560
#> 4 Within  qsec          1 0.288  0.288     0.162   0.726
#> 5 Within  Residuals     2 3.55   1.77     NA      NA

# eta-squared
sjstats::eta_sq(
  model = stats::aov(
    formula = mpg ~ wt + qsec + Error(disp / am),
    data = dplyr::filter(mtcars, cyl == 6),
    na.action = na.omit
  ),
  ci.lvl = 0.95,
  partial = FALSE
)
#> Warning in stats::aov(mformula, data = i): Error() model is singular
#> (this warning is repeated many times; truncated here)
#> Error in cbind_all(x): Argument 2 must be length 4, not 6

# omega-squared
sjstats::omega_sq(
  model = stats::aov(
    formula = mpg ~ wt + qsec + Error(disp / am),
    data = dplyr::filter(mtcars, cyl == 6),
    na.action = na.omit
  ),
  ci.lvl = 0.95,
  partial = FALSE
)
#> # A tibble: 4 x 5
#>   term  omegasq conf.low conf.high stratum
#>   <chr>   <dbl>    <dbl>     <dbl> <chr>  
#> 1 wt    -0.113        NA    NA     disp   
#> 2 wt     0.421        NA    NA     disp:am
#> 3 wt    -0.0639       NA     0.504 Within 
#> 4 qsec  -0.103        NA     0.425 Within

# partial eta-squared
sjstats::eta_sq(
  model = stats::aov(
    formula = mpg ~ wt + qsec + Error(disp / am),
    data = dplyr::filter(mtcars, cyl == 6),
    na.action = na.omit
  ),
  ci.lvl = 0.95,
  partial = TRUE
)
#> # A tibble: 4 x 5
#>   term  partial.etasq conf.low conf.high stratum
#>   <chr>         <dbl>    <dbl>     <dbl> <chr>  
#> 1 wt           0.0366       NA    NA     disp   
#> 2 wt           0.689        NA    NA     disp:am
#> 3 wt           0.194         0     0.640 Within 
#> 4 qsec         0.0751        0     0.564 Within

# partial omega-squared
sjstats::omega_sq(
  model = stats::aov(
    formula = mpg ~ wt + qsec + Error(disp / am),
    data = dplyr::filter(mtcars, cyl == 6),
    na.action = na.omit
  ),
  ci.lvl = 0.95,
  partial = TRUE
)
#> Warning in stats::aov(mformula, data = i): Error() model is singular
#> (this warning is repeated many times; truncated here)
#> Error in cbind_all(x): Argument 2 must be length 4, not 6

Created on 2018-07-13 by the reprex package (v0.2.0).

ANOVA effect size function not working with `white.adjust = TRUE`

Hi,

If I set the white.adjust argument of car::Anova (https://goo.gl/DBpMim) to TRUE, so that a heteroscedasticity-corrected coefficient covariance matrix is used when computing the ANOVA summary, I can't seem to get the partial eta-squared via sjstats::eta_sq(aov, partial = TRUE).

As you noted in your response on Twitter:
"The problem seems to be that when using white.adjust = TRUE, then neither as.data.frame() nor broom::tidy() return the sums of squares, which is needed to compute eta squared."

Thanks for any help you might offer.

adding `apaTables` package to `sjstats` imports

I am wondering if there is any particular reason why sjstats is not importing the apaTables package. sjstats relies on it to compute confidence intervals for partial eta-squared, and I am guessing you don't want to import the whole package just for one function?

I am asking because my package ggstatsplot (https://indrajeetpatil.github.io/ggstatsplot/) imports sjstats, and it's having problems passing R CMD check because the build keeps failing with the error:

Warning: Package `apaTables` needed to compute confidence intervals. 
Please install that package first.

The only way I can get around this issue is by importing literally every package apaTables relies on, which is not a very promising solution because no function from any of these packages appears in my NAMESPACE. Otherwise, I will have to remove sjstats from my dependencies and find other ways to compute 95% CIs for partial omega- and eta-squared, and that prospect is not very promising either.

It would be awesome if you could add apaTables to your imports, so that not just me but any other package that relies on sjstats will not have such problems.
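
For context, a common way for a package to keep apaTables as an optional (Suggests) dependency rather than a hard import is a run-time check like the one below; this is only an illustrative sketch, not sjstats' actual implementation:

# Illustrative only: fail gracefully at run time if the optional package is missing.
if (!requireNamespace("apaTables", quietly = TRUE)) {
  stop("Package `apaTables` needed to compute confidence intervals. ",
       "Please install that package first.", call. = FALSE)
}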

Installation Issue

I'm trying to install this package (Mac OS X, R v. 3.4.2, RStudio v. 1.1.383) and I keep getting the error:

 installation of package ‘sjstats’ had non-zero exit status

The source of the problem seems to be with the TMB and Matrix packages?

Warning in checkMatrixPackageVersion() :
Package version inconsistency detected.
TMB was built with Matrix version 1.2.10
Current Matrix version is 1.2.11
Please re-install 'TMB' from source or restore original 'Matrix' package
Error : object ‘modify_if’ is not exported by 'namespace:purrr'
ERROR: lazy loading failed for package ‘sjstats’
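
Assuming the messages above point at the cause, a possible workaround (untested against this exact setup) is to rebuild TMB against the currently installed Matrix version, update purrr (recent versions export modify_if), and then retry:

# Rebuild TMB from source so it matches the installed Matrix version,
# update purrr, then retry installing sjstats.
install.packages("TMB", type = "source")
install.packages("purrr")
install.packages("sjstats")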

mediation() failing on brms object

I am trying to use mediation() on the following model:

fit.brms <- brm(bf(descrimFlag ~ maori) +
                  bf(hazardous ~ descrimFlag + maori),
                family = bernoulli(),
                data = dat)

marginal_effects(fit.brms)

mediation(fit.brms, treatment="maoriTRUE", mediator="descrimFlagTRUE")

If I do not include TRUE in the above arguments, I get the following error:

Error in UseMethod("pull") : 
  no applicable method for 'pull' applied to an object of class "NULL"

If I add TRUE, then I get:

Error in .subset2(x, i, exact = exact) : 
  attempt to select less than one element in get1index

I used the CRAN version and also the dev version in case there was a difference there; same response.

Why am I still not getting sjstats 0.11.0?

I am hoping to install sjPlot, but sjstats seems to stay at version 0.10.1. How do I get 0.11.0?

library(sjstats)
update.packages("sjstats")
library(sjPlot)
Error in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]) :
namespace ‘sjstats’ 0.10.1 is being loaded, but >= 0.11.0 is required
Error: package or namespace load failed for ‘sjPlot’
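
A likely explanation (an assumption, not verified in this session): update.packages() expects a library path as its first argument, not a package name, so the call above does not actually update sjstats. Restarting R and reinstalling the package directly should pull in the required version:

# update.packages("sjstats") treats "sjstats" as a library location, so nothing
# gets updated; reinstall directly instead (after restarting R).
install.packages("sjstats")
packageVersion("sjstats")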

The Nagelkerke's R-squared that r2() generates for a glm model differs from the one from rms::lrm() when dummy variables are used

Most of the r2() results are consistent with rms::lrm() and BaylorEdPsych::PseudoR2(), but when I run a glm model with dummy variables included, the r2() results are much higher (too good to be true, such as .69) than those generated by the latter two functions (.1). This inconsistency with the latter two functions drove me to ask you to double-check the calculation in r2(). Or have I missed something important?

To replicate the issue, I am attaching the rda file. The code for the analysis is below. Note that KMT, NP, PFP, TSU, and others (with DPP as the base category) are dummy variables generated from the variable L2B.

kao06r.rda.zip

kao06.mod.1 <- glm(
  turnout ~ gender + age + KMT + NP + PFP + TSU + others,
  family = binomial,
  data = kao06
)

sjstats::r2(kao06.mod.1) # Nagelkerke's R-square = 0.69
BaylorEdPsych::PseudoR2(kao06.mod.1) # Nagelkerke's R-square = 0.1301

Question about how the intra-class correlation from sjstats::icc() is calculated

Thank you for a fantastic package. I have a quick question about how the intra-class correlation (ICC) using sjstats::icc() is calculated.

I have a fitted lme4 model with the following error standard deviations:

  • Random Effect A: 0.13
  • Random Effect B: 0.59
  • Random Effect C: 0.08
  • Residual: 0.57

I would expect the ICC for Random Effect A to be 0.13 / (0.13 + 0.59 + 0.08 + 0.57) = 0.094; however (as shown below), it is 0.026. Similarly, I would expect the ICC for Random Effect B to be 0.59 / (0.13 + 0.59 + 0.08 + 0.57) = 0.43, however it is 0.49, and the ICC for Random Effect C to be 0.08 / (0.13 + 0.59 + 0.08 + 0.57) = 0.058, but it is 0.009. Can you help me understand why these are different?

> sjstats::icc(m1)
Linear mixed model
 Family: gaussian (identity)
Formula: overall_engagement ~ 1 + (1 | participant_ID) + (1 | program_ID/response_date_b)

  ICC (Random Effect A): 0.026098
  ICC (Random Effect B): 0.498624
  ICC (Random Effect C): 0.009448
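
For reference, the variance-partitioning ICC reported by icc() is based on the random-effect variances rather than the standard deviations, and squaring the values above roughly reproduces the output (e.g. 0.13² / (0.13² + 0.59² + 0.08² + 0.57²) ≈ 0.024). A minimal sketch of that computation from the fitted lme4 model m1 (ignoring any further adjustments icc() may apply):

# ICC per random effect = its variance / total variance (variances, not SDs).
vc <- as.data.frame(lme4::VarCorr(m1))                      # variance components
icc_manual <- vc$vcov[vc$grp != "Residual"] / sum(vc$vcov)  # total includes residual variance
setNames(round(icc_manual, 3), vc$grp[vc$grp != "Residual"])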

`sjstats::eta_sq` doesn't produce 95% CI when `partial = FALSE`

This issue is not observed for sjstats::omega_sq or when partial = TRUE for sjstats::eta_sq.

# anova1
anova(aov(mpg ~ hp * wt, mtcars))
#> Analysis of Variance Table
#> 
#> Response: mpg
#>           Df Sum Sq Mean Sq F value    Pr(>F)    
#> hp         1 678.37  678.37 146.380 1.228e-12 ***
#> wt         1 252.63  252.63  54.512 4.856e-08 ***
#> hp:wt      1  65.29   65.29  14.088 0.0008108 ***
#> Residuals 28 129.76    4.63                      
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# 95% CI
sjstats::eta_sq(
  model = anova(aov(mpg ~ hp * wt, mtcars)),
  partial = FALSE,
  ci.lvl = 0.95 ,
  n = 100
)
#> Warning in anova.lm(model): ANOVA F-tests on an essentially perfect fit are
#> unreliable
#> (this warning is repeated many times; truncated here)
#> Error in mutate_impl(.data, dots): Evaluation error: Result 2 is not a length 1 atomic vector.

# anova2
anova(lm(mpg ~ hp * wt, mtcars))
#> Analysis of Variance Table
#> 
#> Response: mpg
#>           Df Sum Sq Mean Sq F value    Pr(>F)    
#> hp         1 678.37  678.37 146.380 1.228e-12 ***
#> wt         1 252.63  252.63  54.512 4.856e-08 ***
#> hp:wt      1  65.29   65.29  14.088 0.0008108 ***
#> Residuals 28 129.76    4.63                      
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# 95% CI
sjstats::eta_sq(
  model = anova(lm(mpg ~ hp * wt, mtcars)),
  partial = FALSE,
  ci.lvl = 0.95 ,
  n = 100
)
#> Warning in anova.lm(model): ANOVA F-tests on an essentially perfect fit are
#> unreliable
#> (this warning is repeated several times; truncated here)
#> Error in mutate_impl(.data, dots): Evaluation error: Result 2 is not a length 1 atomic vector.

Created on 2018-10-04 by the reprex package (v0.2.1)

std_beta() : error with simple regression model

std_beta() function does well with multiple regression model, but results in error with model with only one independent variable.

fit <- lm(Ozone ~ Wind, data = airquality)
std_beta(fit)
Error: Columns 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116 must be named

Error handling offset term when supplied to glm(...,offset=) argument

Hi,

This fix does not appear to be working for me. I installed the latest sjstats:

devtools::install_github("strengejacke/sjstats")

and then I installed the dev versions in sequence as you suggest in the sjstats readme:

devtools::install_github("strengejacke/sjlabelled")
devtools::install_github("strengejacke/sjmisc")
devtools::install_github("strengejacke/sjstats")
devtools::install_github("strengejacke/ggeffects")
devtools::install_github("strengejacke/sjPlot")

The new installations had no effect on the errors thrown by the following code:

library(sjPlot)

d <- data.frame(
  x1 = rnorm(100, 0, 1),
  x2 = rnorm(100, 0, 1),
  pop = rpois(100, 100)
)

d$y <- rpois(100, exp(0 + d$x1 * .3 + d$x2 * -.4 + d$x1 * d$x2 * .5))
d$log_pop <- log(d$pop)
m <- glm(y ~ x1 * x2, offset = log(pop), data = d)
plot_model(m, type = "pred", terms = c("x1", "x2", "pop"))

is.na() applied to non-(list or vector) of type 'NULL'
Error in names(datlist) <- names(first) :
  'names' attribute [3] must be the same length as the vector [2]

m <- glm(y ~ x1*x2, offset=log_pop, data=d)
plot_model(m, type = "pred", terms = c("x1", "x2", "log_pop")) 

ERROR: is.na() applied to non-(list or vector) of type 'NULL'
Error in names(datlist) <- names(first) :
  'names' attribute [3] must be the same length as the vector [2]

`names(sjstats::model_frame(m))`

returns a model frame where the offset term is called (offset).

conf.high lower than the partial.etasq

Dear all,

I am a bit confused about a result I am getting as Output of:

model1 <- aov(Rating ~ (repfac1*repfac2) + Error(vid/(repfac1*repfac2)), contrasts = contr.sum, data = data_repeated_measure)
sjstats::eta_sq(model1, partial = TRUE, ci.lvl = .95)
This gives me a partial.etasq of 0.955, a conf.low of 0.193, and a conf.high of 0.501.
My confusion arises because the conf.high is lower than the partial.etasq for that effect.

I get the same results for the CI when I use another R-package (MBESS):

df.1 <- 3
df.2 <- 75
Lims <- conf.limits.ncf(F.value = 15.4105982, conf.level = 0.95, df.1 = df.1, df.2 = df.2)
Lower.lim <- Lims$Lower.Limit / (Lims$Lower.Limit + df.1 + df.2 + 1)
Upper.lim <- Lims$Upper.Limit / (Lims$Upper.Limit + df.1 + df.2 + 1)

Furthermore, I get an error message when trying to calculate the confidence interval with partial = FALSE:

sjstats::eta_sq(model1, partial = FALSE, ci.lvl = .95)

"Error in cbind_all(x) : Argument 2 must be length 6, not 13"

Can you help me out here?
Thanks in advance!

Robust standard errors

Hi, is there currently a function to provide robust standard errors (and CIs, p-values) for regression results? In Stata, using the robust option after a regression does all the work. From what I understand, in R one currently has to calculate them manually via the sandwich package and then write additional functions to create an adjusted summary.
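
A minimal sketch of that manual route using the sandwich and lmtest packages (HC1 roughly corresponds to Stata's robust option); the model is just an example:

library(sandwich)
library(lmtest)

fit <- lm(mpg ~ hp + wt, data = mtcars)
coeftest(fit, vcov. = vcovHC(fit, type = "HC1"))  # robust SEs, t- and p-values
coefci(fit, vcov. = vcovHC(fit, type = "HC1"))    # robust confidence intervals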

Suggestion: support for t-tests

Hi,

as sd-tests and t-tests in R do not provide a lot of information, adding a table function in sjPlot would be very useful.

This is the output of a paired t-test in Stata (see #189):

[screenshot of Stata's paired t-test output]

It includes everything that is of interest (group-specific and difference-wise): means, standard deviations, confidence intervals, number of observations, df's, and p-values.
In comparison, this is the t-test output in R:

[screenshot of R's t.test() output]

For independent two-sample t-tests, there could either be a separate function to check for equal variances (var.test in base R), or the t-test function could have a parameter, e.g. var.equal, with options auto, TRUE, or FALSE. In the case of auto, the function could automatically check for variance equality and adjust the degrees-of-freedom computation if necessary (see the sketch below).
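
A rough sketch of the suggested auto behaviour (illustrative only, not an existing sjstats function):

# Check variance equality first, then run the t-test with or without the
# Welch correction accordingly.
auto_t_test <- function(x, y, alpha = 0.05) {
  equal_var <- stats::var.test(x, y)$p.value > alpha
  stats::t.test(x, y, var.equal = equal_var)
}

auto_t_test(mtcars$mpg[mtcars$am == 0], mtcars$mpg[mtcars$am == 1])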

cannot compute r2 values for glmmTMB objects with ar1 covariance structure

The r2 values of a glmmTMB object with AR1 covariance structure cannot be computed with sjstats.

Example:

library(glmmTMB)
library(MASS)

n <- 100                                          
x1 <- mvrnorm(mu = rep(0,n), Sigma = .7 ^ as.matrix(dist(1:n)) )    
y1 <- x1 + rnorm(n)    
x2 <- mvrnorm(mu = rep(0,n), Sigma = .7 ^ as.matrix(dist(1:n)) )
y2 <- x2+ rnorm(n)   
x3 <- mvrnorm(mu = rep(0,n), Sigma = .7 ^ as.matrix(dist(1:n)) ) 
y3 <- x3 + rnorm(n)   
x4 <- mvrnorm(mu = rep(0,n), Sigma = .7 ^ as.matrix(dist(1:n)) )   
y4 <- x4 + rnorm(n)   
x5 <- mvrnorm(mu = rep(0,n), Sigma = .7 ^ as.matrix(dist(1:n)) )    
y5 <- x5 + rnorm(n)   

x <- c(x1,x2,x3,x4,x5)
y <- c(y1,y2,y3,y4,y5)
times <- factor(c(1:n,1:n,1:n,1:n,1:n))
group <- factor(c(rep(1,n),rep(1,n),rep(1,n),rep(1,n),rep(1,n)))

dat0 <- data.frame(y,times,group)

j = glmmTMB(y ~ ar1(times + 0 | group) + (1|group), data=dat0)
j1 = glmmTMB(y ~ ar1(times + 0 | group) , data=dat0)

sjstats::r2(j) and sjstats::r2(j1) both return the following error traceback:

Error in vals$X[, rownames(Sigma), drop = FALSE] : subscript out of bounds
7. FUN(X[[i]], ...)
6. apply(X = X, FUN = FUN, ...)
5. sapply(vals$vc[terms], function(Sigma) { Z <- vals$X[, rownames(Sigma), drop = FALSE] Z.m <- Z %*% Sigma return(sum(diag(crossprod(Z.m, Z)))/stats::nobs(x)) ...
4. getVarRand(not.obs.terms)
3. r2_mixedmodel(x, type = "r2")
2. r2.glmmTMB(j)
1. sjstats::r2(j)

Perhaps this is related to the following bug? glmmTMB/glmmTMB#401

Error message re: sjt.glmer?

library("sjPlot")
library(sjstats)
library(sjlabelled)
sjt.glmer(p0)
Error in object@pp$unsc() : object 'merPredDunsc' not found

After fitting my model and installing all required packages, I got that error message. How can I resolve it?
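
One possible cause, going by similar lme4 reports (an assumption, not a confirmed diagnosis): the model object was created with a different lme4 version than the one currently installed. Refitting the model after updating lme4 may resolve it:

# Guessed workaround: refit the model with the currently installed lme4.
install.packages("lme4")
p0 <- update(p0)
sjt.glmer(p0)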

Error in using equi_test to plot ROPEs for an intercept only brmsfit object

For example,

samples = brms::brm(Sepal.Length ~ 1, iris) 
equi_test(samples, out = "plot") 

gives the error

Error in `$<-.data.frame`(`*tmp*`, "grp", value = NA) : 
  replacement has 1 row, data has 0

but equi_test(samples, out = "viewer") or just equi_test(samples) works fine. Is there a way to visualize the ROPE for an intercept only model (for use as a "one sample t-test" kind of analysis)?

Pseudo R² calculations

Hi,

I just tried to refit a negative binomial model from Stata in R to see whether the coefficients are the same.

reg2 <- glm.nb(comments_count_fb ~ maxprop + day, data = d)

The coefficients are in line:

[screenshot of the Stata coefficient table]

[screenshot of the R coefficient table]

But what I realized is that there is a huge difference in the numbers for the Pseudo R²s:

[screenshot comparing the Stata and sjstats pseudo R² values]

Is it possible that the formulas in sjstats are not correct? I found the code in sjstats:::pseudo_ralt:

  CoxSnell <- 1 - exp((stats::deviance(x) - x$null.deviance)/n)
  Nagelkerke <- CoxSnell/(1 - exp(-x$null.deviance/n))

Wikipedia lists these formulas (based on the log-likelihood):

  R²_CoxSnell   = 1 - exp(2 * (ll_0 - ll) / n)
  R²_Nagelkerke = R²_CoxSnell / (1 - exp(2 * ll_0 / n))

And the Stata code in the background looks like this:

* ML R2 Cox-Snell
cap scalar `r2_ml' = 1 - exp(2*(`ll_0'-`ll')/`N')
* Cragg-Uhler R2
cap scalar `r2_cu' = (`r2_ml')/(1-exp(2*`ll_0'/`N'))

where ll_0 is the log-likelihood of the null model and ll the log-likelihood of the fitted model.

If I use the nagelkerke function from the package rcompanion, I get the same numbers as in Stata:

rcompanion::nagelkerke(reg2)
$Models
                                                                       
Model: "glm.nb, comments_count_fb ~ maxprop + day, d, 1.327074476, log"
Null:  "glm.nb, comments_count_fb ~ 1, d, 0.9229716263, log"           

$Pseudo.R.squared.for.model.vs.null
                             Pseudo.R.squared
McFadden                            0.0332483
Cox and Snell (ML)                  0.3660860
Nagelkerke (Cragg and Uhler)        0.3660860

This is a code snippet from the nagelkerke() function:


    m = suppressWarnings(logLik(fit, REML = FALSE))[1]
    n = suppressWarnings(logLik(null, REML = FALSE))[1]
    cs = 1 - exp(-2/N * (m - n))
    nk = cs/(1 - exp(2/N * n))
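
For comparison, the log-likelihood-based values can be computed directly from the fitted model; a sketch assuming reg2 and d from above, mirroring the Stata/rcompanion formulas:

# Null model and log-likelihoods
null_mod <- MASS::glm.nb(comments_count_fb ~ 1, data = d)
ll  <- as.numeric(logLik(reg2))
ll0 <- as.numeric(logLik(null_mod))
n   <- nobs(reg2)

cox_snell  <- 1 - exp(2 * (ll0 - ll) / n)
nagelkerke <- cox_snell / (1 - exp(2 * ll0 / n))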

model_frame and Surv() objects

model_frame() does not work for coxph() when the variables inside Surv() have names other than time and status - it looks like the dimnames are always time + status, even if the variables have different names.

library(survival)
d <- data.frame(alive = sample(0:1, 100, replace = TRUE), delay = rnorm(100, 60, 30), age = rnorm(100, 60, 15))
mod <- coxph(Surv(delay, alive) ~ poly(age, 3), data = d)
model_frame(mod)

addresses strengejacke/ggeffects#39

se(icc(model)) with multiple grouping variables

I'm getting an error when trying to get standard errors for merMod models with multiple random intercepts, i.e., dv ~ (1 | group1) + (1 | group2). Looks like it's coded to expect only one parameter. If I do 10 simulations, I get:

Error in mutate_impl(.data, dots): wrong result size (20), expected 10 or 1
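
A minimal reproduction sketch of the setup described above (the variable names and the nsim argument are assumptions made for illustration):

library(lme4)
library(sjstats)

# Two crossed random intercepts on simulated data
d <- expand.grid(group1 = factor(1:20), group2 = factor(1:10), rep = 1:5)
d$dv <- rnorm(nrow(d))
fit <- lmer(dv ~ 1 + (1 | group1) + (1 | group2), data = d)

se(icc(fit), nsim = 10)  # errors as described above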

Nittiest of nitpicks: Name of chi-squared statistic in tables

I don't like the long name "Chi-squared" and would prefer a more accurate (and shorter) name.

Suggested replacement:

  • Using unicode: 𝜒²
  • Using HTML: &chi;<sup>2</sup>

Would affect this line

This could also apply to other summary statistics in the table functions, but since all of these recently changed from "correct" symbols to ASCII names, I assume there's a reason for the change, so this might be a pointless issue.

Suggestion: nonlogical return value for multicollin()

Hi, at the moment multicollin() always returns a logical:

data(efc)

fit <- lm(barthtot ~ c160age + c12hour + c161sex + c172code, data = efc)
multicollin(fit)

No multicollinearity detected.
# A tibble: 1 × 1
  multicollin
        <lgl>
1       FALSE

In line with the other assumption checks, I suggest switching the default return value to something non-logical, e.g. in this case the average VIF over all covariates. A logical should only be returned if one chooses as.logical = TRUE in check_assumptions().

rms models with nonlinear and interaction terms and omega_sq()

Per #21, there is now support for rms models. Hurray! Example with linear model:

library(pacman)
p_load(rms, sjstats)

ols(Sepal.Width ~ Petal.Width, data = iris) %>%
  anova() %>% 
  omega_sq(ci.lvl = .95)

# A tibble: 2 x 4
  term        omega_sq conf.low conf.high
  <chr>          <dbl>    <dbl>     <dbl>
1 Petal.Width    0.112   0.0475     0.235
2 TOTAL          0.112   0.0475     0.235

However, the output can be odd when there are nonlinear terms, because rms inserts these as extra rows:

ols(Sepal.Width ~ rcs(Petal.Width) + rcs(Petal.Length), data = iris) %>%
  anova() %>% 
  omega_sq(ci.lvl = .95)

# A tibble: 6 x 4
  term             omega_sq conf.low conf.high
  <chr>               <dbl>    <dbl>     <dbl>
1 Petal.Width     -0.000931   NA        0.0597
2 " Nonlinear"    -0.00894    NA       NA     
3 Petal.Length     0.00776    NA        0.0887
4 " Nonlinear"     0.0108     NA        0.0951
5 TOTAL NONLINEAR  0.156       0.127    0.342 
6 TOTAL            0.314       0.263    0.480 

Note the presence of missing values. And interactions:

ols(Sepal.Width ~ Petal.Width * Petal.Length, data = iris) %>%
  anova() %>% 
  omega_sq(ci.lvl = .95)

# A tibble: 6 x 4
  term                                                      omega_sq conf.low conf.high
  <chr>                                                        <dbl>    <dbl>     <dbl>
1 Petal.Width  (Factor+Higher Order Factors)                  0.106     0.156     0.375
2 " All Interactions"                                         0.0936    0.137     0.352
3 Petal.Length  (Factor+Higher Order Factors)                 0.129     0.193     0.413
4 " All Interactions"                                         0.0936    0.137     0.352
5 Petal.Width * Petal.Length  (Factor+Higher Order Factors)   0.0936    0.137     0.352
6 TOTAL                                                       0.190     0.279     0.494

It gets very messy with both:

ols(Sepal.Width ~ rcs(Petal.Width) * rcs(Petal.Length), data = iris, penalty = 1) %>%
  anova() %>% 
  omega_sq(ci.lvl = .95)

# A tibble: 15 x 4
   term                                                      omega_sq conf.low conf.high
   <chr>                                                        <dbl>    <dbl>     <dbl>
 1 Petal.Width  (Factor+Higher Order Factors)                 -0.0337  NA        NA     
 2 " All Interactions"                                        -0.0419  NA        NA     
 3 " Nonlinear (Factor+Higher Order Factors)"                 -0.0397  NA        NA     
 4 Petal.Length  (Factor+Higher Order Factors)                -0.0231  NA         0.0130
 5 " All Interactions"                                        -0.0419  NA        NA     
 6 " Nonlinear (Factor+Higher Order Factors)"                 -0.0304  NA        NA     
 7 Petal.Width * Petal.Length  (Factor+Higher Order Factors)  -0.0419  NA        NA     
 8 " Nonlinear"                                               -0.0391  NA        NA     
 9 " Nonlinear Interaction : f(A,B) vs. AB"                   -0.0391  NA        NA     
10 " f(A,B) vs. Af(B) + Bg(A)"                                -0.0253  NA        NA     
11 " Nonlinear Interaction in Petal.Width vs. Af(B)"          -0.0332  NA        NA     
12 " Nonlinear Interaction in Petal.Length vs. Bg(A)"         -0.0321  NA        NA     
13 TOTAL NONLINEAR                                            -0.0226  NA         0.0152
14 TOTAL NONLINEAR + INTERACTION                               0.0925   0.0234    0.146 
15 TOTAL                                                       0.237    0.100     0.256 

The output here seems impossible. Is it because omega_sq interprets every row as a predictor, thus causing double counting and impossible values? A solution here would be to exclude rows based on regex matching of the extra lines (as sketched below); in rare cases with oddly named terms (one could use a term called " Nonlinear", note the initial space), this could result in errors, but it seems unlikely to be a practical problem.
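
A rough sketch of that regex-based filtering, applied to the output above (purely illustrative; the real fix would live inside omega_sq):

library(dplyr)

# Drop the extra rows rms adds: sub-rows starting with a space and the
# TOTAL summary rows.
res <- ols(Sepal.Width ~ rcs(Petal.Width) * rcs(Petal.Length), data = iris, penalty = 1) %>%
  anova() %>%
  omega_sq(ci.lvl = .95)

res %>% filter(!grepl("^\\s|^TOTAL", term))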

Total omega² vs. adj. R² vs. R²

I notice that the numbers do not match:

Model 1 (above): R² = .132, adj. R² = .128, omega² = .112. There are no other predictors, so the variance should be allocated to the only predictor there is. Note, however, that the output from the anova() call has an odd-looking row, which looks like it is just the intercept:

ols(Sepal.Width ~ Petal.Width, data = iris) %>%
  anova()

                Analysis of Variance          Response: Sepal.Width 

 Factor      d.f. Partial SS MS        F     P     
 Petal.Width   1   3.794493  3.7944934 22.91 <.0001
 REGRESSION    1   3.794493  3.7944934 22.91 <.0001
 ERROR       148  24.512440  0.1656246             

Second model: R² = .440, adj. R² = .408, omega² = .314.
Third model: R² = .416, adj. R² = .404, omega² = .190.
Fourth model: R² = .431, adj. R² = .406, omega² = .237.

I also note that the eta_sq() output does not match either R² or adjusted R².

`se()` not working for merMod with specific random parts

I am trying to plot the joint random and fixed effects coefficients for each level of each grouping variable (actually, I am only interested in one grouping variable). I have three grouping vars with 948, 8, and 6 levels. Calling the function produces only one plot (for the grouping var with 948 levels) and the following errors and warnings:

sjp.lmer(mlm3, type="coef")
Error in [.data.frame(se.fit, , i) : undefined columns selected

In addition: Warning messages:
1: In sweep(cmode.vars, 2, fixed.vars, "+") :
STATS does not recycle exactly across MARGIN
2: In sweep(cmode.vars, 2, fixed.vars, "+") :
STATS is longer than the extent of 'dim(x)[MARGIN]'
3: In sweep(cmode.vars, 2, fixed.vars, "+") :
STATS does not recycle exactly across MARGIN
4: In sweep(cmode.vars, 2, fixed.vars, "+") :
STATS does not recycle exactly across MARGIN
5: In sweep(cmode.vars, 2, fixed.vars, "+") :
STATS is longer than the extent of 'dim(x)[MARGIN]'
6: In sweep(cmode.vars, 2, fixed.vars, "+") :
STATS does not recycle exactly across MARGIN

(see data_and_model.RData)

Offsets not being used

Hello. If I put offsets in my formula, they aren't being passed to glm.nb; see the example below. It seems the svymle function ignores offsets.

Is this a problem with the survey package? Or could sjstats_loglik et al. be modified to include an offset argument?

mydat <- data.frame(
  y = sample(1:10),
  x = seq(-1, 1, len = 10),
  weights = rep(1, 10),
  myoffset = 20
)

glm(y ~ x + offset(myoffset), data = mydat, family = "poisson")$coef
MASS::glm.nb(y ~ x + offset(myoffset), data = mydat)[c("theta", "coefficients")]

sjstats::svyglm.nb(
    y ~ x + offset(myoffset),
    design = survey::svydesign(
        ids = ~1, weights = ~ weights,
        data = mydat)
)$estimate

`eta_sq` function doesn't work with `aovlist` class objects

The sjstats::eta_sq function doesn't seem to work with repeated-measures ANOVA designs:

# loading the needed libraries
library(tidyverse)
library(magrittr)
library(sjstats)

# converting the iris dataset to long format
iris_long <- datasets::iris %>%
  dplyr::mutate(.data = ., id = dplyr::row_number(x = Species)) %>%
  tidyr::gather(
    data = .,
    key = "condition",
    value = "value",
    Sepal.Length:Petal.Width,
    convert = TRUE,
    factor_key = TRUE
  ) %>%
  tidyr::separate(
    col = "condition",
    into = c("attribute", "measure"),
    sep = "\\.",
    convert = TRUE
  ) %>%
  tibble::as_data_frame(x = .)

# specifying the model
iris.mod <-
  stats::aov(formula = value ~ attribute * measure + Error(id / (attribute * measure)),
             data = iris_long)

# getting tidy output from broom (works only with >= broom 0.4.5)
broom::tidy(iris.mod)
#> # A tibble: 8 x 7
#>   stratum              term        df   sumsq  meansq statistic    p.value
#>   <chr>                <chr>    <dbl>   <dbl>   <dbl>     <dbl>      <dbl>
#> 1 id                   Residua~     1 2.64e+2 2.64e+2     NA    NA        
#> 2 id:attribute         attribu~     1 2.37e+2 2.37e+2     NA    NA        
#> 3 id:measure           measure      1 1.11e+3 1.11e+3     NA    NA        
#> 4 id:attribute:measure attribu~     1 7.99e-1 7.99e-1     NA    NA        
#> 5 Within               attribu~     1 4.70e+2 4.70e+2   1446.    4.71e-161
#> 6 Within               measure      1 5.77e+1 5.77e+1    177.    1.34e- 35
#> 7 Within               attribu~     1 1.54e+0 1.54e+0      4.72  3.01e-  2
#> 8 Within               Residua~   592 1.92e+2 3.25e-1     NA    NA

# getting effect sizes and their confidence intervals 
sjstats::eta_sq(model = iris.mod, partial = TRUE, ci.lvl = 0.95)
#> Error in UseMethod("anova"): no applicable method for 'anova' applied to an object of class "c('aovlist', 'listof')"

Created on 2018-07-02 by the reprex package (v0.2.0).

Loo-adjusted R2 warning: In ypredloo - y : longer object length is not a multiple of shorter object length

I get the following warning when extracting the LOO-adjusted R². Any idea why? Thanks a lot 😃

Warning message:
In ypredloo - y :
  longer object length is not a multiple of shorter object length

The data is attached and the code is the following:

library(rstanarm)

df <- read.table("df.txt")
fit <- stan_lmer(y ~ group / poly(x, 2) +
                   (1|r1) + (1|r2),
                 data=df, seed=6, QR=TRUE)
sjstats::r2(fit, loo=TRUE)

Feature request: confidence intervals for sjstats::eta_sq

I was wondering if it would be possible to also include confidence intervals (CIs) for the effect sizes provided by sjstats::eta_sq. There are some other packages that do provide this. For example, CIs for:

  • omega_squared can be obtained with userfriendlyscience::confIntOmegaSq
  • partial_eta_squared can be obtained with apaTables::get.ci.partial.eta.squared

But I am a bit wary of utilizing so many packages and mixing different pieces of results from varied packages together, so it'd be really nice to have all of the necessary statistical details from the same package itself.
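
For reference, such a CI can be derived from the noncentral F distribution; a sketch using MBESS (the same approach as the conf.limits.ncf code quoted elsewhere in these issues), with example values F = 22.91, df1 = 1, df2 = 148:

# Partial eta-squared CI from the noncentrality-parameter limits.
lims  <- MBESS::conf.limits.ncf(F.value = 22.91, conf.level = 0.95, df.1 = 1, df.2 = 148)
lower <- lims$Lower.Limit / (lims$Lower.Limit + 1 + 148 + 1)
upper <- lims$Upper.Limit / (lims$Upper.Limit + 1 + 148 + 1)
c(lower, upper)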

Pseudo R2 for pscl::zeroinfl() and pscl::hurdle()

Hello,

I'm trying to calculate a pseudo R2 for a glmmTMB poisson model without random effects. However the following happens:

library(glmmTMB)
data(Salamanders)
m1 <- glmmTMB(count ~ mined, zi = ~mined, family = poisson, data = Salamanders)
r2(m1)

Error in sum(sapply(vals$vc[terms], function(Sigma) { : invalid 'type' (list) of argument

Is this not implemented yet, or am I trying to do something that makes no sense?

Thanks in advance,

Tristan

Bug in std_beta for models with only one explanatory variable

Hi,

for models with only one explanatory variable

    if (has_intercept) 
      fit.data <- fit.data[, -1]

in sjstats::std_beta will turn fit.data into a vector which in turn will break the rest of the code.

You could either add drop = FALSE

    if (has_intercept) 
      fit.data <- fit.data[, -1, drop = FALSE]

or use one of the tibble-methods that are less eager to change the data type. :)

This might happen at different points in the code, wherever you drop the intercept column.

Thanks
Stefan

`lmer` p-values: discrepancy between lmerTest and sjstats with identical standard errors

In their "Keep it maximal" paper (http://talklab.psy.gla.ac.uk/KeepItMaximalR2.pdf), Barr and colleagues tabulate different ways one can compute p-values for lmer models:

There are various ways to obtain p-values from LMEMs, and to our knowledge,
there is little agreement on which method to use. Therefore, we considered
three methods currently in practice: (1) treating the t-statistic as if it were a z
statistic (i.e., using the standard normal distribution as a reference); (2) performing
likelihood ratio tests, in which the deviance (−2LL) of a model containing
the fixed effect is compared to another model without it but that is otherwise
identical in random effects structure; and (3) by Markov Chain Monte Carlo
(MCMC) sampling, using the mcmcsamp() function of lme4 with 10000 iterations.

If I am not mistaken, both lmerTest and sjstats use the first approach to compute the p-values for linear mixed-effects models?

What I am confused by is why the p-values are different when computed with lmerTest as compared to when computed with sjstats, especially when p.kr = FALSE and the standard errors for estimates from these two functions (lmerTest::as_lmerModLmerTest and sjstats::p_value) are identical. I'm a bit worried because the differences are pretty big, with p-values sometimes going from p < 0.01 to p < 0.001. I read the details for the sjstats function, but still couldn't come up with an explanation.

# libraries needed
library(lme4)
library(lmerTest)
library(tibble)
library(sjstats)
library(ggstatsplot)

# two different models
mod1 <-
  lme4::lmer(
    formula = scale(rating) ~ scale(budget) + (budget |
      genre),
    REML = FALSE,
    data = ggstatsplot::movies_long
  )
#> Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
#> control$checkConv, : Model failed to converge with max|grad| = 0.0041235
#> (tol = 0.002, component 1)
#> Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, : Model is nearly unidentifiable: very large eigenvalue
#>  - Rescale variables?

mod2 <-
  lme4::lmer(
    formula = scale(rating) ~ scale(budget) + scale(length) + (budget + length |
      genre),
    REML = FALSE,
    data = ggstatsplot::movies_long
  )
#> Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
#> control$checkConv, : Model failed to converge with max|grad| = 1.53644 (tol
#> = 0.002, component 1)
#> Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, : Model is nearly unidentifiable: very large eigenvalue
#>  - Rescale variables?;Model is nearly unidentifiable: large eigenvalue ratio
#>  - Rescale variables?

# p-values with lmerTest
coef(summary(lmerTest::as_lmerModLmerTest(mod1))) %>%
  tibble::as_data_frame(x = .)
#> # A tibble: 2 x 5
#>   Estimate `Std. Error`    df `t value` `Pr(>|t|)`
#>      <dbl>        <dbl> <dbl>     <dbl>      <dbl>
#> 1  -0.0165       0.107   6.34    -0.155      0.882
#> 2   0.0765       0.0413  5.92     1.85       0.114

coef(summary(lmerTest::as_lmerModLmerTest(mod2))) %>%
  tibble::as_data_frame(x = .)
#> # A tibble: 3 x 5
#>   Estimate `Std. Error`    df `t value` `Pr(>|t|)`
#>      <dbl>        <dbl> <dbl>     <dbl>      <dbl>
#> 1   0.0202       0.0819  5.54     0.247     0.814 
#> 2  -0.0781       0.0317  4.81    -2.47      0.0587
#> 3   0.344        0.0831  4.73     4.14      0.0101

# without KR approx
sjstats::p_value(fit = mod1, p.kr = FALSE) %>%
  tibble::as_data_frame(x = .)

#> Computing p-values via Wald-statistics approximation (treating t as Wald z).
#> # A tibble: 2 x 3
#>   term          p.value std.error
#>   <fct>           <dbl>     <dbl>
#> 1 (Intercept)    0.877     0.107 
#> 2 scale(budget)  0.0642    0.0413

sjstats::p_value(fit = mod2, p.kr = FALSE) %>%
  tibble::as_data_frame(x = .)

#> Computing p-values via Wald-statistics approximation (treating t as Wald z).
#> # A tibble: 3 x 3
#>   term            p.value std.error
#>   <fct>             <dbl>     <dbl>
#> 1 (Intercept)   0.805        0.0819
#> 2 scale(budget) 0.0137       0.0317
#> 3 scale(length) 0.0000352    0.0831

# p-values with sjstats
# KR approx

sjstats::p_value(fit = mod1, p.kr = TRUE) %>%
  tibble::as_data_frame(x = .)

#> Computing p-values via Kenward-Roger approximation. Use `p.kr = FALSE` if computation takes too long.
#> Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
#> Model failed to converge with max|grad| = 0.0033352 (tol = 0.002, component 1)
#> Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, : Model is nearly unidentifiable: very large eigenvalue
#>  - Rescale variables?
#> (these warnings are repeated several times; truncated here)

#> # A tibble: 2 x 3
#>   term          p.value std.error
#>   <fct>           <dbl>     <dbl>
#> 1 (Intercept)     0.893    0.118 
#> 2 scale(budget)   0.167    0.0485

sjstats::p_value(fit = mod2, p.kr = TRUE) %>%
  tibble::as_data_frame(x = .)

#> Computing p-values via Kenward-Roger approximation. Use `p.kr = FALSE` if computation takes too long.
#> Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
#> Model failed to converge with max|grad| = 0.672962 (tol = 0.002, component 1)
#> Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, : Model is nearly unidentifiable: very large eigenvalue
#>  - Rescale variables?;Model is nearly unidentifiable: large eigenvalue ratio
#>  - Rescale variables?
#> (these warnings are repeated several times; truncated here)

#> # A tibble: 3 x 3
#>   term          p.value std.error
#>   <fct>           <dbl>     <dbl>
#> 1 (Intercept)   0.832      0.0912
#> 2 scale(budget) 0.102      0.0387
#> 3 scale(length) 0.00760    0.101

Created on 2018-07-16 by the reprex package (v0.2.0).

grpmean weights issues

Hi,

the following code raises an error:

library(sjmisc)
library(sjstats)
data(efc)
efc$weight <- sample(c(0.5577446,1.2667685), size = nrow(efc), replace = TRUE)
efc$w_ow <- sample(c(0.5577446,1.2667685), size = nrow(efc), replace = TRUE)

grpmean(efc, e42dep, e16sex, weight.by = w_ow)

but this works:

grpmean(efc, e42dep, e16sex, weight.by = weight)

The problem seems to be related to this line:

weights <- gsub("\"", "", deparse(substitute(weight.by)), fixed = T)

I'm not sure whether this might affect other functions as well.

sjstats::omega_sq not working when scaled variables are used

Below I am providing a fully reproducible example of an instance where sjstats::omega_sq is not working properly. There are two issues:

  1. Lower bound for CI is NA for omega-squared.
(But maybe it is not defined in this case? The same issue is also observed with a function from another package, so this is not specific to sjstats.)
  2. When variables are scaled, partial omega-squared is not obtained

The same issue was not observed with the partial eta-squared function.

library(sjstats)

# simple model
mod1 <- stats::lm(formula = Sepal.Length ~ Sepal.Width,
                  data = datasets::iris)

# omega-squared
sjstats::omega_sq(
  model = mod1,
  partial = FALSE,
  ci.lvl = 0.95,
  n = 100
)
#> # A tibble: 1 x 4
#>   term        omegasq conf.low conf.high
#>   <chr>         <dbl>    <dbl>     <dbl>
#> 1 Sepal.Width 0.00711       NA    0.0717

# partial omega-squared
sjstats::omega_sq(
  model = mod1,
  partial = TRUE,
  ci.lvl = 0.95,
  n = 100
)
#> # A tibble: 1 x 4
#>   term        partial.omegasq conf.low conf.high
#>   <chr>                 <dbl>    <dbl>     <dbl>
#> 1 Sepal.Width         0.00711  -0.0223    0.0580

# the same model with scaled variables
mod2 <-
  stats::lm(formula = scale(Sepal.Length) ~ scale(Sepal.Width),
            data = iris)

# omega-squared
sjstats::omega_sq(
  model = mod2,
  partial = FALSE,
  ci.lvl = 0.95,
  n = 100
)
#> # A tibble: 1 x 4
#>   term               omegasq conf.low conf.high
#>   <chr>                <dbl>    <dbl>     <dbl>
#> 1 scale(Sepal.Width) 0.00711       NA    0.0717

# partial omega-squared
sjstats::omega_sq(
  model = mod2,
  partial = TRUE,
  ci.lvl = 0.95,
  n = 100
)
#> Error in mutate_impl(.data, dots): Evaluation error: object 'Sepal.Length' not found.

# getting confidence interval from userfriendlyscience package
# the same issue
userfriendlyscience::confIntOmegaSq(
  var1 = iris$Sepal.Length,
  var2 = iris$Sepal.Width,
  conf.level = 0.95
)
#> Omega squared: 95% CI = [NA; .16], point estimate = .03

Created on 2018-04-15 by the reprex package (v0.2.0).

omega_sq() and eta_sq(): return type inconsistent depending on ci.level param

Small bug that may cause trouble for some. The output type should be consistent.

> aov(Sepal.Width ~ Petal.Length, data = iris) %>% omega_sq() %>% str()
 Named num 0.177
 - attr(*, "names")= chr "Petal.Length"
> aov(Sepal.Width ~ Petal.Length, data = iris) %>% omega_sq(ci.lvl = .95) %>% str()
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':	1 obs. of  4 variables:
 $ term     : chr "Petal.Length"
 $ omega_sq : num 0.177
 $ conf.low : num 0.0833
 $ conf.high: num 0.29
> aov(Sepal.Width ~ Petal.Length, data = iris) %>% eta_sq() %>% str()
 Named num 0.184
 - attr(*, "names")= chr "Petal.Length"
> aov(Sepal.Width ~ Petal.Length, data = iris) %>% eta_sq(ci.lvl = .95) %>% str()
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':	1 obs. of  4 variables:
 $ term     : chr "Petal.Length"
 $ etasq    : num 0.184
 $ conf.low : num 0.0756
 $ conf.high: num 0.3
