bain's People

Contributors

cjvanlissa, cvzundert, herberthoijtink

bain's Issues

Relegate much of the bain documentation to a vignette

The bain help documentation is very extensive, but help files typically contain only the essential information needed to run a function, plus a few simple examples that also serve as automated tests.

Text as extensive as the current documentation is better suited to a vignette (which is also automatically included with the package).

CJ will create a bare-bones vignette.
Herbert, can you edit both documents so the right info ends up in each of them?
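
A minimal way to set up the vignette skeleton, assuming the package uses the standard R Markdown vignette infrastructure via usethis and knitr:

# Run once inside the package source directory; this creates vignettes/bain.Rmd
# and adds knitr/rmarkdown to Suggests in DESCRIPTION.
usethis::use_vignette("bain", title = "Bayesian informative hypothesis evaluation with bain")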

Small bug in the parser

library(bain)
library(testthat)

THE FIVE LINES BELOW GIVE THE CORRECT RESULT

varnames<-c("a1","b1","a2","b2")
hypotheses31 <-
"0.20 *a1 = 0.20 *a2&
0.20 *b1 = 0.20 *b2 &"
x1<-bain:::parse_hypothesis(varnames, hypotheses31)

BUT .20 (AS A MULTIPLIER OF A PARAMETER, WITHOUT A 0 BEFORE THE DECIMAL POINT) INSTEAD OF 0.20 LEADS TO AN INCORRECT RESULT

varnames<-c("a1","b1","a2","b2")
hypotheses31 <-
".20 *a1 = .20 *a2&
.20 *b1 = .20 *b2 &"
x2<-bain:::parse_hypothesis(varnames, hypotheses31)

BELOW, THE TESTTHAT CHECK

test_that("parser", {expect_equal(as.vector(t(do.call(rbind, x1$hyp_mat))),
as.vector(t(do.call(rbind, x2$hyp_mat))))})

ADDING A NUMBER TO A PARAMETER DOES WORK, BOTH WHEN .2 AND WHEN 0.2 IS ADDED

varnames<-c("a1","b1","c1","d1","e1","f1","a2","b2","c2","d2","e2","f2")
hypotheses31 <-
"a1 = a2 - 0.2&
b1 = b2 + .2 &
0.2c1 -.3 = c2 +.3&
d1 + .3 = 3d2 + .3&
e1 -.3= e2 + .3&
4f1 -0.3 = 3.2f2"
x3<-bain:::parse_hypothesis(varnames, hypotheses31)

BELOW, THE TESTTHAT CHECK

test_that("parser", {expect_equal(as.vector(t(do.call(rbind, x3$hyp_mat))),
c(1 , 0, 0.0 , 0, 0 , 0 ,-1 ,0 , 0 , 0, 0 , 0.0, -0.2,
0 , 1, 0.0, 0, 0 , 0 ,0, -1 ,0 ,0 , 0, 0.0, 0.2,
0 , 0, 0.2, 0, 0 , 0 , 0 ,0, -1, 0 , 0 , 0.0 , 0.6,
0 ,0, 0.0 , 1 , 0, 0 , 0 ,0 , 0, -3, 0 ,0.0 , 0.0,
0, 0, 0.0, 0, 1 , 0, 0, 0 , 0, 0 ,-1 , 0.0 , 0.6,
0, 0, 0.0, 0, 0, 4, 0 , 0, 0, 0 , 0 ,-3.2, 0.3)
)})
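
Until the parser handles multipliers written with a bare decimal point, a possible workaround is to normalize the hypothesis string before passing it on. A sketch (the regular expression is an assumption about the relevant cases, not part of bain):

varnames <- c("a1","b1","a2","b2")
hypotheses31 <-
".20 *a1 = .20 *a2&
.20 *b1 = .20 *b2 &"
# Insert a leading zero before every "." that is not preceded by a digit,
# e.g. ".20" becomes "0.20", while "3.2" is left untouched.
hypotheses_fixed <- gsub("(?<![0-9])\\.", "0.", hypotheses31, perl = TRUE)
x2_fixed <- bain:::parse_hypothesis(varnames, hypotheses_fixed)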

Implicitly removing the intercept in `bain` results in seemingly unexpected behavior

When estimating the effect of a categorical variable, as in an ANOVA, bain drops the intercept by default. This is a sensible decision, but it leads to seemingly unexpected behavior because no warning is shown: the estimate reported by an lm model reflects the difference between two categories, whereas bain uses the same coefficient name to refer to a group-specific mean. The issue becomes apparent in the following reprex.

# Load required packages
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(magrittr)
#> Warning: package 'magrittr' was built under R version 4.1.2
library(bain)

set.seed(1)

# Load data and specify sex as a factor variable
dat1 <- sesamesim %>%
  mutate(sex = factor(sex, labels = c("boy", "girl")))

# Fit model
mod1 <- lm(postnumb ~ sex, dat1)

# Summary of model
summary(mod1)
#> 
#> Call:
#> lm(formula = postnumb ~ sex, data = dat1)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -30.096  -8.096  -0.856   8.144  32.904 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)   30.096      1.175  25.616   <2e-16 ***
#> sexgirl       -1.240      1.628  -0.761    0.447    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 12.6 on 238 degrees of freedom
#> Multiple R-squared:  0.00243,    Adjusted R-squared:  -0.001761 
#> F-statistic: 0.5799 on 1 and 238 DF,  p-value: 0.4471

# Note that sexgirl here reflects the difference with sexboy
# If we use this model in bain, and specify a negative 
# effect for sexgirl (which seems like a logical thing to do),
# this results in unexpected behavior, because bain implicitly
# transforms the intercept and difference into two separate means.
# However, this only becomes apparent after inspecting the summary
# of the bain output.
bf1 <- bain(mod1, "sexgirl < 0")

bf1
#> Bayesian informative hypothesis testing for an object of class lm (ANOVA):
#> 
#>    Fit   Com   BF.u  BF.c  PMPa  PMPb 
#> H1 0.000 0.500 0.000 0.000 1.000 0.000
#> Hu                               1.000
#> 
#> Hypotheses:
#>   H1: sexgirl<0
#> 
#> Note: BF.u denotes the Bayes factor of the hypothesis at hand versus the unconstrained hypothesis Hu. BF.c denotes the Bayes factor of the hypothesis at hand versus its complement.
# Fit is 0, which seems odd, because the hypothesis is in line with the
# effect as estimated by the `lm` function (sexgirl = -1.240).

# Only when calling summary(bf1) does it become apparent that `sexgirl`
# suddenly refers to the mean of the girls, rather than the difference
# between boys and girls.
summary(bf1)
#>   Parameter   n Estimate       lb       ub
#> 1    sexboy 115 30.09565 27.79295 32.39836
#> 2   sexgirl 125 28.85600 26.64732 31.06468

Created on 2022-03-09 by the reprex package (v2.0.0)
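
Given that bain parameterizes the model in terms of group means, the hypothesis the user presumably intended can be stated directly on those means. A sketch of that call (this illustrates the intended usage; it does not address the missing warning):

# State the hypothesis on the group-specific means rather than on the lm contrast
bf2 <- bain(mod1, "sexgirl < sexboy")
bf2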

On a related note, researchers may also dummy code their categories manually. In that case, bain does not remove the intercept, but estimates the differences between groups. However, this ignores the fact that there are multiple groups involved, which results in an inconsistent Bayes factor (Hoijtink, Gu & Mulder, 2019). Is it indeed problematic to create numeric dummy variables and estimate their effects using bain? The resulting Bayes factors are highly similar (see the reprex below). If this is indeed a problematic way to estimate the effects of categorical variables, it may be worthwhile to let bain recognize numeric dummy variables and display a warning when they are detected (a sketch of such a check follows the reprex).

# Load required packages
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(magrittr)
#> Warning: package 'magrittr' was built under R version 4.1.2
library(bain)

set.seed(1)

# Make the site variable a factor
bf_group_means <- sesamesim %>%
  mutate(site = as.factor(site)) %$%
  lm(postnumb ~ site + prenumb) %>%
  bain("site3 < site1 < site2 & prenumb > 0")

bf_group_means
#> Bayesian informative hypothesis testing for an object of class lm (ANCOVA):
#> 
#>    Fit   Com   BF.u  BF.c   PMPa  PMPb 
#> H1 0.443 0.057 7.794 13.189 1.000 0.886
#> Hu                                0.114
#> 
#> Hypotheses:
#>   H1: site3<site1<site2&prenumb>0
#> 
#> Note: BF.u denotes the Bayes factor of the hypothesis at hand versus the unconstrained hypothesis Hu. BF.c denotes the Bayes factor of the hypothesis at hand versus its complement.

# Create dummies manually, I'll use site = 1 as the reference
bf_differences <- sesamesim %>%
  mutate(site1 = ifelse(site == 1, 1, 0),
         site2 = ifelse(site == 2, 1, 0),
         site3 = ifelse(site == 3, 1, 0),
         site4 = ifelse(site == 4, 1, 0),
         site5 = ifelse(site == 5, 1, 0)) %$%
  lm(postnumb ~ site2 + site3 + site4 + site5 + prenumb) %>%
  bain("site3 < 0 & site2 > 0 & prenumb > 0")

bf_differences
#> Bayesian informative hypothesis testing for an object of class lm (continuous predictors):
#> 
#>    Fit   Com   BF.u  BF.c   PMPa  PMPb 
#> H1 0.448 0.055 8.118 13.888 1.000 0.890
#> Hu                                0.110
#> 
#> Hypotheses:
#>   H1: site3<0&site2>0&prenumb>0
#> 
#> Note: BF.u denotes the Bayes factor of the hypothesis at hand versus the unconstrained hypothesis Hu. BF.c denotes the Bayes factor of the hypothesis at hand versus its complement.
# Almost identical fit & complexity

summary(bf_group_means)
#>   Parameter   n   Estimate         lb        ub
#> 1     site1  60 27.4293820 25.1603844 29.698380
#> 2     site2  55 34.9818244 32.5529993 37.410650
#> 3     site3  64 27.6904082 25.4002689 29.980548
#> 4     site4  43 26.9840163 24.3250440 29.642989
#> 5     site5  18 31.4298837 27.3413936 35.518374
#> 6   prenumb 240  0.7159311  0.5986552  0.833207
summary(bf_differences)
#>   Parameter   n  Estimate         lb        ub
#> 1     site3 240 0.2610262 -3.0455948  3.567647
#> 2     site2 240 7.5524424  4.3017084 10.803176
#> 3   prenumb 240 0.7159311  0.5986552  0.833207
# In the latter, the grouping structure is ignored

Created on 2022-03-09 by the reprex package (v2.0.0)
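
If detecting manually coded dummies is desirable, the check could look roughly like the sketch below. The helper name warn_if_dummies and the heuristic (a numeric predictor taking only the values 0 and 1) are assumptions for illustration, not bain's actual behavior:

# Hypothetical helper: warn when an lm fit contains numeric predictors that
# look like hand-made dummy variables (only 0/1 values).
warn_if_dummies <- function(fit) {
  mf <- model.frame(fit)
  predictors <- mf[, -1, drop = FALSE]                  # drop the response
  is_num <- vapply(predictors, is.numeric, logical(1))
  looks_binary <- vapply(predictors[is_num],
                         function(x) all(x %in% c(0, 1)), logical(1))
  if (any(looks_binary)) {
    warning("Predictor(s) ", paste(names(looks_binary)[looks_binary], collapse = ", "),
            " look like numeric dummy variables; consider coding them as a factor.",
            call. = FALSE)
  }
  invisible(fit)
}

Applied to the second model above it would warn about site2 to site5; applied to the factor-based model it would stay silent.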

The BF turns from Inf to NaN depending on the test value in a one-sample t-test

Sometimes, depending on the test value in a one-sample t-test, the Bayes factor turns from Inf to NaN. An example can be found below. I'm not sure how to handle this on the R side, though; maybe you have some ideas?

data("sesamesim")

set.seed(100)
testValue <- 36
test <- bain::t_test(x = sesamesim$age)
b1 <- bain::bain(x = test, hypothesis = paste0("x=", testValue, "; x>", testValue, "; x<", testValue), fraction = 1)
BF_02_1 <- b1$BFmatrix[1,3]
BF_02_1

[1] 718.2452

set.seed(100)
testValue <- 35.5
test <- bain::t_test(x = sesamesim$age)
b2 <- bain::bain(x = test, hypothesis = paste0("x=", testValue, "; x>", testValue, "; x<", testValue), fraction = 1)
BF_02_2 <- b2$BFmatrix[1,3]
BF_02_2

[1] Inf

set.seed(100)
testValue <- 35
test <- bain::t_test(x = sesamesim$age)
b3 <- bain::bain(x = test, hypothesis = paste0("x=", testValue, "; x>", testValue, "; x<", testValue), fraction = 1)
BF_02_3 <- b3$BFmatrix[1,3]
BF_02_3

[1] NaN

See also https://github.com/jasp-stats/jasp-test-release/issues/1252
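
On the R side, one defensive option is to treat non-finite Bayes factors as undefined when they are extracted. A sketch of such a consumer-side guard (this masks the symptom; it does not fix the underlying division in bain):

# Replace Inf/NaN entries, which arise when a hypothesis has (near-)zero fit
# or complexity, by NA so downstream code can handle them explicitly.
safe_bf <- function(bf) ifelse(is.finite(bf), bf, NA_real_)
BF_02_3 <- safe_bf(b3$BFmatrix[1, 3])
BF_02_3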

This test fails:

estimate <- c(0, 1, 2)                 # parameter estimates
names(estimate) <- c("a", "b", "c")
sampN <- 100                           # sample size
covariance <- matrix(c(1,0,0,0,1,0,0,0,1), nrow = 3, ncol = 3)  # identity covariance of the estimates

set.seed(199)
y <- bain(estimate, "3a + .5 + .5 > 2b -1 + 1> c -1.5 -.5",
          n = sampN, Sigma = covariance, group_parameters = 0, joint_parameters = 3)

# Monte Carlo check of the Fit: sample from the posterior of (a, b, c) and
# compute the proportion of draws satisfying 3a + 1 > 2b > c - 2
a <- rnorm(100000, 0, 1)
b <- rnorm(100000, 1, 1)
c <- rnorm(100000, 2, 1)
propf <- mean(3 * a + 1 > 2 * b & 2 * b > c - 2)
test_that("bain default", {expect_equal(y$fit$Fit[1], propf, tolerance = .01)})

# Monte Carlo check of the Complexity: sample b and c from a vague prior
# (a is reused from the block above) and compute the same proportion
b <- rnorm(100000, 0, 50)
c <- rnorm(100000, 2, 50)
propc <- mean(3 * a + 1 > 2 * b & 2 * b > c - 2)
test_that("bain default", {expect_equal(y$fit$Com[1], propc, tolerance = .01)})

Properly implement random seed

Because the random seed was fixed to 100 in Bain 0.1.3, I have temporarily done the same in the development version of bain 0.2.0. Resolve this before pushing to CRAN: use sample.int() to draw a fresh seed at every call to Fortran.
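
A minimal sketch of the idea: draw the seed that is passed on to Fortran from R's own RNG stream, so that repeated calls differ while everything remains reproducible via set.seed(). The snippet below only illustrates the seed drawing; it does not show bain's actual Fortran interface:

set.seed(123)
seeds <- replicate(3, sample.int(.Machine$integer.max, 1))
seeds  # three different seeds, fully determined by set.seed(123)
# Each drawn seed would then be passed to the Fortran routine in place of the
# hard-coded value 100.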

Data type of "sigma"

Sigma is currently an atomic vector, a covariance matrix, or a list of covariance matrices. Would it not be more practical to use an array? A list implies that the covariance matrices can have different dimensions, but that seems like something that should not be allowed.
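
For illustration, the two representations for, say, two groups with three parameters each would look as follows (a sketch of the proposal, not the current API):

Sigma_list  <- list(diag(3), diag(3))                        # current: list of matrices
Sigma_array <- array(c(diag(3), diag(3)), dim = c(3, 3, 2))  # proposed: 3-d array
Sigma_array[, , 1]  # recovers the covariance matrix of group 1
# The array representation enforces equal dimensions across groups by construction.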

Check "comparability" of hypotheses

Why is it not possible to specify two separate, contradictory hypotheses? Currently bain throws an error. If two experts disagree about the proper hypothesis, wouldn't Bayes factors be a great way to settle the debate?

Different results on Windows/Mac versus Linux

When testing bain in JASP, we came across the following inconsistencies in the results. Maybe you have an idea of where these differences originate and how to tackle them.

Windows/Mac

data("sesamesim")
formula <- age ~ site + peabody + prenumb + postnumb + funumb - 1
hypothesis <- "site1 = site2 = site3 = site4 = site5;site1 < site2 < site3 < site4 < site5;site1 > site2 > site3 > site4 > site5"
fraction <- 1
standardized <- FALSE
sesamesim$site <- factor(sesamesim$site)
fit <- stats::lm(formula = formula, data = sesamesim)
bainResult <- bain::bain(x = fit, hypothesis = hypothesis, fraction = fraction, standardize = standardized)
> print(bainResult)

Bayesian informative hypothesis testing for an object of class lm (ANCOVA):
   Fit   Com   BF.u  BF.c  PMPa  PMPb 
H1 0.000 0.000 0.000 0.000 0.015 0.000
H2 0.000 0.007 0.007 0.007 0.985 0.007
H3 0.000 0.007 0.000 0.000 0.000 0.000
Hu                               0.993
Hypotheses:
  H1: site1=site2=site3=site4=site5
  H2: site1<site2<site3<site4<site5
  H3: site1>site2>site3>site4>site5
Note: BF.u denotes the Bayes factor of the hypothesis at hand versus the unconstrained hypothesis Hu. BF.c denotes the Bayes factor of the hypothesis at hand versus its complement.

>  bainResult$BFmatrix
             H1           H2         H3
H1  1.000000000 1.476255e-02   30175.92
H2 67.738958866 1.000000e+00 2044085.52
H3  0.000033139 4.892163e-07       1.00

Linux

data("sesamesim")
formula <- age ~ site + peabody + prenumb + postnumb + funumb - 1
hypothesis <- "site1 = site2 = site3 = site4 = site5;site1 < site2 < site3 < site4 < site5;site1 > site2 > site3 > site4 > site5"
fraction <- 1
standardized <- FALSE
sesamesim$site <- factor(sesamesim$site)
fit <- stats::lm(formula = formula, data = sesamesim)
bainResult <- bain::bain(x = fit, hypothesis = hypothesis, fraction = fraction, standardize = standardized)
> print(bainResult)

Bayesian informative hypothesis testing for an object of class lm (ANCOVA):
   Fit   Com   BF.u  BF.c  PMPa  PMPb 
H1 0.000 0.000 0.000 0.000 0.014 0.000
H2 0.000 0.007 0.007 0.007 0.986 0.007
H3 0.000 0.006 0.000 0.000 0.000 0.000
Hu                               0.992
Hypotheses:
  H1: site1=site2=site3=site4=site5
  H2: site1<site2<site3<site4<site5
  H3: site1>site2>site3>site4>site5
Note: BF.u denotes the Bayes factor of the hypothesis at hand versus the unconstrained hypothesis Hu. BF.c denotes the Bayes factor of the hypothesis at hand versus its complement.

> bainResult$BFmatrix
             H1           H2         H3
H1 1.000000e+00 1.404837e-02   22430.45
H2 7.118265e+01 1.000000e+00 1596659.07
H3 4.458225e-05 6.263078e-07       1.00

As you can see, the results on Mac/Windows and on Linux diverge (for example, BF13 = 30175.92 on Windows versus BF13 = 22430.45 on Linux).
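
One way to judge whether this gap exceeds ordinary Monte Carlo error is to rerun the analysis with several seeds on a single platform and compare the spread to the cross-platform difference. A sketch (this assumes bain respects R's RNG via set.seed(), which relates to the "Properly implement random seed" issue above):

bf13 <- sapply(1:10, function(i) {
  set.seed(i)
  bain::bain(x = fit, hypothesis = hypothesis, fraction = fraction,
             standardize = standardized)$BFmatrix[1, 3]
})
range(bf13)
# If the Windows and Linux values both fall inside this range, the platforms
# agree up to sampling noise; otherwise the numerical routines genuinely differ.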
