cjvanlissa / bain Goto Github PK
View Code? Open in Web Editor NEWBayes Factors for Informative Hypotheses
License: GNU General Public License v3.0
Bayes Factors for Informative Hypotheses
License: GNU General Public License v3.0
The bain help documentation is very extensive, but typically, documentation only consists of the essential information needed to run the function, and a few simple examples for automated testing.
Text as extensive as the current documentation is better suited for a Vignette (which is also automatically included with the package).
CJ will create a bare-bones vignette.
Herbert, can you edit both documents so the right info ends up in each of them?
First 4 columns are not informative for applied researchers
library(bain)
library(testthat)
varnames<-c("a1","b1","a2","b2")
hypotheses31 <-
"0.20 *a1 = 0.20 *a2&
0.20 *b1 = 0.20 *b2 &"
x1<-bain:::parse_hypothesis(varnames, hypotheses31)
varnames<-c("a1","b1","a2","b2")
hypotheses31 <-
".20 *a1 = .20 *a2&
.20 *b1 = .20 *b2 &"
x2<-bain:::parse_hypothesis(varnames, hypotheses31)
test_that("parser", {expect_equal(as.vector(t(do.call(rbind, x1$hyp_mat))),
as.vector(t(do.call(rbind, x2$hyp_mat))))})
varnames<-c("a1","b1","c1","d1","e1","f1","a2","b2","c2","d2","e2","f2")
hypotheses31 <-
"a1 = a2 - 0.2&
b1 = b2 + .2 &
0.2c1 -.3 = c2 +.3&
d1 + .3= 3d2 + .3&
e1 -.3= e2 + .3&
4f1 -0.3 = 3.2f2"
x3<-bain:::parse_hypothesis(varnames, hypotheses31)
test_that("parser", {expect_equal(as.vector(t(do.call(rbind, x3$hyp_mat))),
c(1 , 0, 0.0 , 0, 0 , 0 ,-1 ,0 , 0 , 0, 0 , 0.0, -0.2,
0 , 1, 0.0, 0, 0 , 0 ,0, -1 ,0 ,0 , 0, 0.0, 0.2,
0 , 0, 0.2, 0, 0 , 0 , 0 ,0, -1, 0 , 0 , 0.0 , 0.6,
0 ,0, 0.0 , 1 , 0, 0 , 0 ,0 , 0, -3, 0 ,0.0 , 0.0,
0, 0, 0.0, 0, 1 , 0, 0, 0 , 0, 0 ,-1 , 0.0 , 0.6,
0, 0, 0.0, 0, 0, 4, 0 , 0, 0, 0 , 0 ,-3.2, 0.3)
)})
When estimating the effect of a categorical variable, such as in, for example, an ANOVA, bain
by default drops the intercept. This seems like a logical decision, but results in seemingly unexpected behavior, because there is no warning message shown. Accordingly, the estimate as given in an lm
model reflects the difference between two categories, while bain
uses the same name to refer to the group specific mean. This issue becomes apparent in the following reprex.
# Load required packages
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(magrittr)
#> Warning: package 'magrittr' was built under R version 4.1.2
library(bain)
set.seed(1)
# Load data and specify sex as a factor variable
dat1 <- sesamesim %>%
mutate(sex = factor(sex, labels = c("boy", "girl")))
# Fit model
mod1 <- lm(postnumb ~ sex, dat1)
# Summary of model
summary(mod1)
#>
#> Call:
#> lm(formula = postnumb ~ sex, data = dat1)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -30.096 -8.096 -0.856 8.144 32.904
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 30.096 1.175 25.616 <2e-16 ***
#> sexgirl -1.240 1.628 -0.761 0.447
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 12.6 on 238 degrees of freedom
#> Multiple R-squared: 0.00243, Adjusted R-squared: -0.001761
#> F-statistic: 0.5799 on 1 and 238 DF, p-value: 0.4471
# Note that sexgirl here reflects the difference with sexboy
# If we use this model in bain, and specify a negative
# effect for sexgirl (which seems like a logical thing to do),
# this results in unexpected behavior, because bain implicitly
# transforms the intercept and difference into two separate means.
# However, this only becomes apparent after inspecting the summary
# of the bain output.
bf1 <- bain(mod1, "sexgirl < 0")
bf1
#> Bayesian informative hypothesis testing for an object of class lm (ANOVA):
#>
#> Fit Com BF.u BF.c PMPa PMPb
#> H1 0.000 0.500 0.000 0.000 1.000 0.000
#> Hu 1.000
#>
#> Hypotheses:
#> H1: sexgirl<0
#>
#> Note: BF.u denotes the Bayes factor of the hypothesis at hand versus the unconstrained hypothesis Hu. BF.c denotes the Bayes factor of the hypothesis at hand versus its complement.
# Fit is 0, which seems weird, because it is in line with the effect
# as estimated by the `lm` function.
# Only when calling summary(mod1) it becomes apparent that `sexgirl`
# suddenly refers to the mean of the girls, rather than the difference
# between boys and girls.
summary(bf1)
#> Parameter n Estimate lb ub
#> 1 sexboy 115 30.09565 27.79295 32.39836
#> 2 sexgirl 125 28.85600 26.64732 31.06468
Created on 2022-03-09 by the reprex package (v2.0.0)
On a related note, researchers may also manually dummy code their categories. If this is the case, bain
does not remove the intercept, but estimates the difference. However, this ignores the fact that there are multiple groups involved, which results in an inconsistent Bayes factor (Hoijtink, Gu & Mulder, 2019). Is it indeed problematic to create numeric dummy variables, and estimate the effect of these using bain
, because the resulting Bayes factors are highly similar (see reprex below). If this is actually a problematic way to estimate the effects of categorical variables, it may be worthwhile to let bain
recognize numeric dummy variables, and display a warning if these are observed.
# Load required packages
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(magrittr)
#> Warning: package 'magrittr' was built under R version 4.1.2
library(bain)
set.seed(1)
# Make the site variable a factor
bf_group_means <- sesamesim %>%
mutate(site = as.factor(site)) %$%
lm(postnumb ~ site + prenumb) %>%
bain("site3 < site1 < site2 & prenumb > 0")
bf_group_means
#> Bayesian informative hypothesis testing for an object of class lm (ANCOVA):
#>
#> Fit Com BF.u BF.c PMPa PMPb
#> H1 0.443 0.057 7.794 13.189 1.000 0.886
#> Hu 0.114
#>
#> Hypotheses:
#> H1: site3<site1<site2&prenumb>0
#>
#> Note: BF.u denotes the Bayes factor of the hypothesis at hand versus the unconstrained hypothesis Hu. BF.c denotes the Bayes factor of the hypothesis at hand versus its complement.
# Create dummies manually, I'll use site = 1 as the reference
bf_differences <- sesamesim %>%
mutate(site1 = ifelse(site == 1, 1, 0),
site2 = ifelse(site == 2, 1, 0),
site3 = ifelse(site == 3, 1, 0),
site4 = ifelse(site == 4, 1, 0),
site5 = ifelse(site == 5, 1, 0)) %$%
lm(postnumb ~ site2 + site3 + site4 + site5 + prenumb) %>%
bain("site3 < 0 & site2 > 0 & prenumb > 0")
bf_differences
#> Bayesian informative hypothesis testing for an object of class lm (continuous predictors):
#>
#> Fit Com BF.u BF.c PMPa PMPb
#> H1 0.448 0.055 8.118 13.888 1.000 0.890
#> Hu 0.110
#>
#> Hypotheses:
#> H1: site3<0&site2>0&prenumb>0
#>
#> Note: BF.u denotes the Bayes factor of the hypothesis at hand versus the unconstrained hypothesis Hu. BF.c denotes the Bayes factor of the hypothesis at hand versus its complement.
# Almost identical fit & complexity
summary(bf_group_means)
#> Parameter n Estimate lb ub
#> 1 site1 60 27.4293820 25.1603844 29.698380
#> 2 site2 55 34.9818244 32.5529993 37.410650
#> 3 site3 64 27.6904082 25.4002689 29.980548
#> 4 site4 43 26.9840163 24.3250440 29.642989
#> 5 site5 18 31.4298837 27.3413936 35.518374
#> 6 prenumb 240 0.7159311 0.5986552 0.833207
summary(bf_differences)
#> Parameter n Estimate lb ub
#> 1 site3 240 0.2610262 -3.0455948 3.567647
#> 2 site2 240 7.5524424 4.3017084 10.803176
#> 3 prenumb 240 0.7159311 0.5986552 0.833207
# In the latter, the grouping structure is ignored
Created on 2022-03-09 by the reprex package (v2.0.0)
Herbert must examine which warnings are given. Especially when users just call bain on an lm object with mixed type predictors. Are the results valid?
E.g., sepal_length:sepal_width
Sometimes, depending on the test value in a one-sample t-test, the Bayes factor turns from Inf
to NaN
. An example can be found below. I'm not sure how to handle this on the R side though, maybe you have some ideas?
data("sesamesim")
set.seed(100)
testValue <- 36
test <- bain::t_test(x = sesamesim$age)
b1 <- bain::bain(x = test, hypothesis = paste0("x=", testValue, "; x>", testValue, "; x<", testValue), fraction = 1)
BF_02_1 <- b1$BFmatrix[1,3]
BF_02_1
[1] 718.2452
set.seed(100)
testValue <- 35.5
test <- bain::t_test(x = sesamesim$age)
b2 <- bain::bain(x = test, hypothesis = paste0("x=", testValue, "; x>", testValue, "; x<", testValue), fraction = 1)
BF_02_2 <- b2$BFmatrix[1,3]
BF_02_2
[1] Inf
set.seed(100)
testValue <- 35
test <- bain::t_test(x = sesamesim$age)
b3 <- bain::bain(x = test, hypothesis = paste0("x=", testValue, "; x>", testValue, "; x<", testValue), fraction = 1)
BF_02_3 <- b3$BFmatrix[1,3]
BF_02_3
[1] NaN
See also https://github.com/jasp-stats/jasp-test-release/issues/1252
estimate<-c(0,1,2)
names(estimate) <- c("a", "b", "c")
sampN <- 100
covariance <- matrix(c(1,0,0,0,1,0,0,0,1),nrow=3,ncol=3)
set.seed(199)
y<-bain(estimate,"3a + .5 + .5 > 2b -1 + 1> c -1.5 -.5",n=sampN,Sigma=covariance,group_parameters=0,joint_parameters = 3)
a <- rnorm(100000,0,1)
b <- rnorm(100000,1,1)
c <- rnorm(100000,2,1)
propf <- 0
for (i in 1:100000){
if (3 * a[i] + 1 > 2 * b[i] & 2 * b[i] > c[i]-2) {propf <- propf + 1/100000}
}
test_that("bain default", {expect_equal(y$fit$Fit[1], propf,tolerance = .01)})
b <- rnorm(100000,0,50)
c <- rnorm(100000,2,50)
propc <- 0
for (i in 1:100000){
if (3 * a[i] + 1 > 2 * b[i] & 2 * b[i] > c[i]-2) {propc <- propc + 1/100000}
}
test_that("bain default", {expect_equal(y$fit$Com[1], propc,tolerance = .01)})
Because the random seed was fixed to 100 in Bain 0.1.3, I have temporarily done the same in the development version of bain 0.2.0. Release this before pushing to CRAN. Use sample.int() at every call to fortran.
Sigma is now an atomic vector, covariance matrix, or list of covariance matrices. Would it not be more practical to use an array? List implies that the covariance matrices can have different dimensions - but this seems like something that should not be allowed.
Why is it not possible to specify two separate, and contradictory, hypotheses? Currently bain throws an error. If two experts disagree about the proper hypothesis, wouldn't Bayes factors be a great way to settle the debate?
When testing bain in JASP we came across the following inconsistencies in the results. Maybe you have an idea where this originates and how to tackle these differences.
Windows/Mac
data("sesamesim")
formula <- age ~ site + peabody + prenumb + postnumb + funumb - 1
hypothesis <- "site1 = site2 = site3 = site4 = site5;site1 < site2 < site3 < site4 < site5;site1 > site2 > site3 > site4 > site5"
fraction <- 1
standardized <- FALSE
sesamesim$site <- factor(sesamesim$site)
fit <- stats::lm(formula = formula, data = sesamesim)
bainResult <- bain::bain(x = fit, hypothesis = hypothesis, fraction = fraction, standardize = standardized)
> print(bainResult)
Bayesian informative hypothesis testing for an object of class lm (ANCOVA):
Fit Com BF.u BF.c PMPa PMPb
H1 0.000 0.000 0.000 0.000 0.015 0.000
H2 0.000 0.007 0.007 0.007 0.985 0.007
H3 0.000 0.007 0.000 0.000 0.000 0.000
Hu 0.993
Hypotheses:
H1: site1=site2=site3=site4=site5
H2: site1<site2<site3<site4<site5
H3: site1>site2>site3>site4>site5
Note: BF.u denotes the Bayes factor of the hypothesis at hand versus the unconstrained hypothesis Hu. BF.c denotes the Bayes factor of the hypothesis at hand versus its complement.
> bainResult$BFmatrix
H1 H2 H3
H1 1.000000000 1.476255e-02 30175.92
H2 67.738958866 1.000000e+00 2044085.52
H3 0.000033139 4.892163e-07 1.00
Linux
data("sesamesim")
formula <- age ~ site + peabody + prenumb + postnumb + funumb - 1
hypothesis <- "site1 = site2 = site3 = site4 = site5;site1 < site2 < site3 < site4 < site5;site1 > site2 > site3 > site4 > site5"
fraction <- 1
standardized <- FALSE
sesamesim$site <- factor(sesamesim$site)
fit <- stats::lm(formula = formula, data = sesamesim)
bainResult <- bain::bain(x = fit, hypothesis = hypothesis, fraction = fraction, standardize = standardized)
> print(bainResult)
Bayesian informative hypothesis testing for an object of class lm (ANCOVA):
Fit Com BF.u BF.c PMPa PMPb
H1 0.000 0.000 0.000 0.000 0.014 0.000
H2 0.000 0.007 0.007 0.007 0.986 0.007
H3 0.000 0.006 0.000 0.000 0.000 0.000
Hu 0.992
Hypotheses:
H1: site1=site2=site3=site4=site5
H2: site1<site2<site3<site4<site5
H3: site1>site2>site3>site4>site5
Note: BF.u denotes the Bayes factor of the hypothesis at hand versus the unconstrained hypothesis Hu. BF.c denotes the Bayes factor of the hypothesis at hand versus its complement.
> bainResult$BFmatrix
H1 H2 H3
H1 1.000000e+00 1.404837e-02 22430.45
H2 7.118265e+01 1.000000e+00 1596659.07
H3 4.458225e-05 6.263078e-07 1.00
As you can see, the results between Mac/Windows and Linux diverge (for example BF23 = 30175.92 on windows versus BF23 = 22430.45 on Linux).
/
not recognized. /x
could be replaced by *1/x
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.