stan-dev / rstanarm
rstanarm: R package for Bayesian applied regression modeling
Home Page: https://mc-stan.org/rstanarm
License: GNU General Public License v3.0
Are there plans to support truncated priors (for the beta coefficients, or even the intercept) in the (g)lm(er) models, e.g. via beta or uniform distributions?
I am running the following type of model:
stan_glmer(log_y ~ (1 + log_x | group),
prior = t7_prior, prior_intercept = t7_prior,
data = data, iter = 1000, chains = 4)
The dataset had about 20,000 rows with 99 groups.
...
Chain 3, Iteration: 1000 / 1000 [100%] (Sampling)
# Elapsed Time: 14649.7 seconds (Warm-up)
# 19422.5 seconds (Sampling)
# 34072.2 seconds (Total)
Warning message:
passing unknown arguments: show_messages.
Chain 1, Iteration: 1000 / 1000 [100%] (Sampling)
# Elapsed Time: 17411.9 seconds (Warm-up)
# 18800 seconds (Sampling)
# 36211.9 seconds (Total)
Error in serialize(data, node$con, xdr = FALSE) : ignoring SIGPIPE signal
Calls: <Anonymous> ... doTryCatch -> sendData -> sendData.SOCK0node -> serialize
In addition: Warning message:
passing unknown arguments: show_messages.
Execution halted
Session Info:
> sessionInfo()
R version 3.2.1 (2015-06-18)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server 7.0 (Maipo)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=en_US.UTF-8 LC_ADDRESS=en_US.UTF-8
[10] LC_TELEPHONE=en_US.UTF-8 LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] markdown_0.7.7 shiny_0.12.2 rstanarm_2.8.0 lme4_1.1-10
[5] magrittr_1.5 dplyr_0.4.1 rstan_2.8.0 Rcpp_0.12.1
[9] Matrix_1.2-1 PricingModel_0.2.1.8-6 netezza_0.6-3.9 rJava_0.9-6
[13] ggplot2_1.0.1
loaded via a namespace (and not attached):
[1] lattice_0.20-31 tidyr_0.2.0 zoo_1.7-12 gtools_3.5.0 assertthat_0.1
[6] digest_0.6.8 mime_0.4 R6_2.1.1 plyr_1.8.3 stats4_3.2.1
[11] coda_0.17-1 lazyeval_0.1.10 minqa_1.2.4 nloptr_1.0.4 R.oo_1.19.0
[16] DT_0.1 shinythemes_1.0.1 proto_0.3-10 splines_3.2.1 shinyjs_0.2.0
[21] stringr_1.0.0 htmlwidgets_0.5 loo_0.1.3 munsell_0.4.2 httpuv_1.3.3
[26] base64enc_0.1-3 htmltools_0.2.6 gridExtra_2.0.0 threejs_0.2.1 matrixStats_0.14.2
[31] MASS_7.3-40 R.methodsS3_1.7.0 grid_3.2.1 nlme_3.1-120 arm_1.8-6
[36] xtable_1.7-4 gtable_0.1.2 DBI_0.3.1 RJDBC_0.2-5 scales_0.3.0
[41] rredis_1.6.9 stringi_0.5-5 reshape2_1.4.1 dygraphs_0.4.5 xts_0.9-7
[46] tools_3.2.1 shinystan_2.0.1 abind_1.4-3 parallel_3.2.1 inline_0.3.14
[51] colorspace_1.2-6
Sorry, I do not have a traceback.
@bgoodri I've also noticed that doing the QR decomposition before ADVI seems to be more reliable, the same thing you discussed with @akucukelbir in this thread on stan-dev. Do you think we should default to QR=TRUE if the user specifies "meanfield" or "fullrank" as the estimation algorithm?
With either devtools::source_url or just source in the latest version of R, it is fairly easy to run many of the example-models via rstanarm with the following code idiom:
{
source("https://raw.githubusercontent.com/stan-dev/example-models/*.data.R", local = TRUE)
stan_foo(formula, data = environment(), ...)
}
However, since this requires an internet connection, it cannot be part of R CMD check, but I think it would be good to make many demos that do this using best practices (from https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Package-subdirectories):
The demo subdirectory is for R scripts (for running via demo()) that demonstrate some of the functionality of the package. Demos may be interactive and are not checked automatically, so if testing is desired use code in the tests directory to achieve this. The script files must start with a (lower or upper case) letter and have one of the extensions .R or .r. If present, the demo subdirectory should also have a 00Index file with one line for each demo, giving its name and a description separated by a tab or at least three spaces. (This index file is not generated automatically.)
The gamm4 models use t2() in place of te() for tensor splines, but have no equivalent for ti(). This means one can't separate the interactions from the main effects, as in this example: http://stats.stackexchange.com/a/189437/28370
It would be great if stan_gamm4() could support ti() terms or something similar.
Related to #46
@Felipe-Osorio is the error thrown when running the code below the one you were referring to?
fit <- stan_glmer(mpg ~ log1p(wt) + (1|cyl), data = mtcars)
p <- posterior_predict(fit) # ok
p2 <- posterior_predict(fit, newdata = mtcars[1:10, ])
Error in `[.data.frame`(newdata, , colnames(frX)) :
undefined columns selected
test <- stan_glm(mpg ~ wt, data = mtcars)
> test
Call: stan_glm(formula = mpg ~ wt, data = mtcars)
Coefficients:
(Intercept) wt
37.226 -5.322
Degrees of Freedom: Total (i.e. Null); 30 Residual
Error in signif(x$null.deviance, digits) :
non-numeric argument to mathematical function
> traceback()
4: format(signif(x$null.deviance, digits))
3: cat("Null Deviance:\t ", format(signif(x$null.deviance, digits)),
"\nResidual Deviance:", format(signif(x$deviance, digits)),
"\tAIC:", format(signif(x$aic, digits)))
2: print.glm(x)
1: function (x, ...)
UseMethod("print")(x)
This seems to have to do with the fact that it is calling print.glm now.
The error is
Quitting from lines 107-113 (aov.Rmd)
Error: processing vignette 'aov.Rmd' failed with diagnostics:
'stan_lm' is not an exported object from 'namespace:rstanarm'
Execution halted
which appears to be consistent with the new auto-generated NAMESPACE, which may have to do with these warnings
roxygen2::roxygenize(clean = TRUE)
...
There were 14 warnings (use warnings() to see them)
> warnings()
Warning messages:
1: @export [posterior_predict.R#12]: may only span a single line
2: @export [predict.R#4]: may only span a single line
3: @export [rstanarm-package.R#14]: may only span a single line
4: @export [stan_glm.R#23]: may only span a single line
5: @export [stan_glm.fit.R#26]: may only span a single line
6: @export [stan_glmer.R#7]: may only span a single line
7: @export [stan_glmer.R#113]: may only span a single line
8: @export [stan_lm.R#25]: may only span a single line
9: @export [stan_polr.R#46]: may only span a single line
10: @export [stanreg-methods.R#99]: may only span a single line
11: @export [stanreg-methods.R#227]: may only span a single line
12: @export [stanreg-methods.R#239]: may only span a single line
13: @export [stanreg-methods.R#249]: may only span a single line
14: @export [summary.R#8]: may only span a single line
I realize that you can't avoid all possible conflicts, but just wanted to let you know that the R package caret also defines a function R2, which causes really puzzling errors with stan_lm. I don't have a good alternative (you might want to rename hs, etc., as well).
The gamm4 R package accepts syntax for estimating generalized additive models, converts it internally to lme4 syntax, and uses lmer to estimate the parameters. It sounds (from the appendix of the lme4 vignette and the documentation of gamm4) as if we could just copy some of this R code and make a stan_gamm4() function with no additional Stan code, and it is quite likely that the decov prior will make the estimates of the smoothing parameters more reliable.
These tasks (and probably other things I'm forgetting) still need to be done before first release:
Currently, there is just a stub that throws an error saying it is unimplemented (unless you are only doing in-sample prediction). The predict.gam() function in the mgcv package has the complicated stuff for handling the smooth terms when newdata is specified and, if type = "lpmatrix", it returns the necessary matrix of predictors rather than the predictions. But we need to dig into the details of it to make sure we are multiplying the correct columns by the correct margins of the posterior.
With rstan 2.9 I'm getting the following error when running
stan_glm(mpg ~ wt, data = mtcars)
Sampling seems to proceed OK, but then
Error in dim(sssf) <- c(n2, object@sim$chains, length(tidx)) :
dims [product 20000] do not match the length of object [40000]
which looks like it's coming from extract as defined in the stanfit-class.R file in rstan.
@bgoodri I've been thinking recently that it might avoid a great deal of confusion if we create a separate method for obtaining so-called credible intervals rather than using R's standard confint. Perhaps an S3 generic credint and a credint.stanreg method? (Or just a credint function, skipping the S3 method stuff?) If we want to do this then I can make the change quite easily this afternoon.
Here are some reasons why I think it's worth doing this:
- stats::confint, which we can't change, clearly says it's for confidence intervals.
- We can document credint, where we can make it clear what the differences are between frequentist/Bayesian confidence/credible intervals. We don't really do this anywhere at the moment, but I think the differences in interpretation are important.
- It doesn't quite make sense to call confint for a model not estimated by optimization, but we can either issue a warning to use credint (and allow confint to still be usable) or an error that forces the use of credint.
I get weird fixed effects estimates when fitting a model without an intercept:
library(rstanarm)
devtools::install_github("paul-buerkner/brms")
library(brms)
foo1 <- stan_glm(mpg ~ -1 + ., data=mtcars, chains=2, iter=2000, warmup=500)
foo1
foo2 <- brm(mpg ~ -1 + ., data=mtcars, chains=2, iter=2000, warmup=500)
foo2
foo3 <- glm(mpg ~ -1 + ., data = mtcars)
summary(foo3)
The results of foo2 and foo3 closely match each other, while foo1 produces very different results. Especially the MD_SD (and SD) estimates are a bit off. This may be a result of centering the fixed-effects design matrix X regardless of whether an intercept is present or not. I believe that centering is invalid in models without an intercept (at least when not corrected otherwise), but I may be mistaken.
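The suspicion about centering can be checked outside of Stan with ordinary least squares (a hypothetical numpy sketch, not rstanarm's internals; all names are illustrative): a no-intercept fit on a centered predictor cannot reproduce the nonzero mean of the outcome, while with an intercept present centering leaves the slope untouched.

```python
import numpy as np

# Illustrative sketch (not rstanarm code): centering the design matrix is
# harmless when an intercept is present, but breaks a no-intercept fit
# unless the shift is corrected afterwards.
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(1.0, 3.0, n)             # deliberately non-zero-mean predictor
y = 2.5 * x + rng.normal(0.0, 0.1, n)    # true model has no intercept
xc = x - x.mean()

# No-intercept least squares: beta = (x'y) / (x'x)
beta_raw = (x @ y) / (x @ x)
beta_cen = (xc @ y) / (xc @ xc)

# Fitted values from the centered no-intercept fit miss the mean of y entirely
rms_raw = np.sqrt(np.mean((y - beta_raw * x) ** 2))
rms_cen = np.sqrt(np.mean((y - beta_cen * xc) ** 2))

# With an intercept column, centering only shifts the intercept; slope unchanged
slope_raw = np.linalg.lstsq(np.column_stack([np.ones(n), x]), y, rcond=None)[0][1]
slope_cen = np.linalg.lstsq(np.column_stack([np.ones(n), xc]), y, rcond=None)[0][1]

print(beta_raw, beta_cen)    # both near 2.5
print(rms_raw, rms_cen)      # centered no-intercept fit has a much larger error
print(slope_raw, slope_cen)  # essentially identical
```

The centered no-intercept fit still estimates a similar slope, but its implied predictions are off by roughly beta * mean(x), which is exactly the kind of uncorrected shift suspected above.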
Executive summary:
While rstanarm fits a random-effects logistic regression model to my large dataset (~50,000 units of observation) and runs four chains without an issue, on trying to generate the stanfit object I get a memory size error. There are about 2,000 random effects in this model. The issue arises even with only 10 iterations. I have 8 GB of RAM and four cores on a virtual Windows server.
formulaR2.0 <- ond ~ pay +age_group +sex + (1 | cpt)
stanfit2.0 <- stan_glmer(formulaR2.0, data = myAQI, family = binomial, iter= 10, cores =2)
Loading required namespace: rstudioapi
Error: cannot allocate vector of size 2.7 Gb
In addition:
Warning messages:
1: In structure(.Internal(La_qr(x)), useLAPACK = TRUE, class = "qr") :
Reached total allocation of 8191Mb: see help(memory.size)
2: In structure(.Internal(La_qr(x)), useLAPACK = TRUE, class = "qr") :
Reached total allocation of 8191Mb: see help(memory.size)
3: In structure(.Internal(La_qr(x)), useLAPACK = TRUE, class = "qr") :
Reached total allocation of 8191Mb: see help(memory.size)
4: In structure(.Internal(La_qr(x)), useLAPACK = TRUE, class = "qr") :
Reached total allocation of 8191Mb: see help(memory.size)
sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows Server 2012 x64 (build 9200)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] rstanarm_2.8.0 rstan_2.8.0 ggplot2_1.0.1 Rcpp_0.12.1 lme4_1.1-9
[6] Matrix_1.2-2 knitr_1.11
loaded via a namespace (and not attached):
[1] nloptr_1.0.4 plyr_1.8.3 shinyjs_0.2.0 xts_0.9-7
[5] base64enc_0.1-3 tools_3.2.2 digest_0.6.8 nlme_3.1-121
[9] gtable_0.1.2 lattice_0.20-33 rstudioapi_0.3.1 shiny_0.12.2
[13] shinystan_2.0.1 proto_0.3-10 loo_0.1.3 gridExtra_2.0.0
[17] stringr_1.0.0 gtools_3.5.0 dygraphs_0.4.5 htmlwidgets_0.5
[21] DT_0.1 stats4_3.2.2 grid_3.2.2 inline_0.3.14
[25] R6_2.1.1 minqa_1.2.4 reshape2_1.4.1 magrittr_1.5
[29] codetools_0.2-14 shinythemes_1.0.1 threejs_0.2.1 scales_0.3.0
[33] matrixStats_0.14.2 htmltools_0.2.6 MASS_7.3-43 splines_3.2.2
[37] xtable_1.7-4 mime_0.4 colorspace_1.2-6 httpuv_1.3.3
[41] stringi_0.5-5 munsell_0.4.2 markdown_0.7.7 zoo_1.7-12
I think we are obligated to work on IRT for one of Stan's grants, so we might as well make it precompilable.
We now have a non-reproducible failure on r-devel for Windows
https://www.r-project.org/nosvn/R.check/r-devel-windows-ix86+x86_64/rstanarm-00check.html
I noticed the following error when calling posterior_predict with newdata, where my newdata contains a tbl_df (which inherits from data.frame) instead of a data.frame.
Here is a reproducible example:
library(dplyr)
data(mtcars)
mtcars_tbl_df <- tbl_df(mtcars)
m1 <- stan_glmer(qsec ~ 1 + wt + (1 | cyl), data = mtcars)
ppred1_df <- posterior_predict(m1, newdata = mtcars)
m2 <- stan_glmer(qsec ~ 1 + wt + (1 | cyl), data = mtcars_tbl_df)
ppred2_tbl <- posterior_predict(m2, newdata = mtcars_tbl_df)
Which produces the following error:
> ppred2_tbl <- posterior_predict(m2, newdata = mtcars_tbl_df)
Error in .pp_data_mer(object, newdata, ...) :
New levels found in grouping variable(s) cyl
However, in this example, the following will work:
ppred2_tbl2 <- posterior_predict(m2, newdata = as.data.frame(mtcars_tbl_df))
This seems like a minor bug to me, since there is such an easy work-around (demonstrated above). Documenting in case it is helpful.
All the .stan programs in the rstanarm package utilize a design matrix X that is N by K. It would be interesting to explore an option to preprocess X with a QR decomposition
X = Q * R
where Q' * Q = I and R is upper triangular. Then if the linear predictor is
eta = X * beta = (Q * R) * beta = Q * (R * beta) = Q * theta
where
theta = R * beta
it is perhaps reasonable to put independent symmetric priors on the elements of theta, because the columns of Q are mutually orthogonal and all have unit length. If so, you could recover the posterior distribution of beta as a generated quantity with the inverse of the fixed R matrix via
beta = R_inverse * theta
This idea may also help with the scaling problems we have seen with meanfield ADVI when doing regression problems.
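The algebra above can be verified numerically with plain least squares standing in for the posterior (a numpy sketch; the variable names are illustrative, not rstanarm code):

```python
import numpy as np

# Sketch of the QR reparameterization described above, using ordinary least
# squares as a stand-in for the posterior; not rstanarm's implementation.
rng = np.random.default_rng(1)
N, K = 100, 3
X = rng.normal(size=(N, K))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=N)

# Thin QR: Q is N x K with orthonormal columns, R is K x K upper triangular
Q, R = np.linalg.qr(X)

# Fit on Q instead of X; because Q'Q = I this is just theta = Q'y
theta = Q.T @ y

# Recover the coefficients on the original scale: beta = R^{-1} * theta
beta = np.linalg.solve(R, theta)

print(np.round(beta, 2))  # close to beta_true
```

The same identity is what would let a Stan program place priors on theta and return beta = R_inverse * theta in the generated quantities block.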
Email from Martin Maechler:
Yesterday, a new version of 'nlme' was released on CRAN,
for which the VarCorr() S3 generic has a new '...' argument
--- so that others can write methods with extra arguments,
something really desirable.
However, this led to the following WARNINGs for your packages, which we'd ask you to fix.
I strongly recommend you do what I (as co-author) did for the lme4 package:
- in your DESCRIPTION, depend on the very latest nlme, i.e., put nlme (>= 3.1-123) in your 'Imports:' or 'Depends:';
- add ', ...' at the end of the argument list of your VarCorr.* function definition, both in R/*.R and man/*.Rd.
Currently Bernoulli is supported but not binomial.
(Edit this comment to add additional checklist items or mark any as complete)
- sigma generic
- stan_glmer for bernoulli, binomial, count
- test(){} blocks so that their results can be used by subsequent unit tests
- adapt_delta control to modeling functions (if no user-specified adapt_delta, the default depends on the prior on the regression coefficients)
- stan_glm.nb, stan_glmer.nb wrappers
- ppcheck
- plot.stanreg
- posterior_predict
- predict.stanreg
- stan_polr
- sigma
- stan_gamm4 implementation (replace guts of gam with estimates from stanfit)
See, for example,
https://cran.rstudio.com/web/packages/rstanarm/vignettes/binomial.html
I can't see any images (just the broken image link symbols)
I just noticed that this was mentioned in the release announcement, but I'll leave this issue here for those who don't read release notes...
For stan_glm with a suitable outcome (and maybe stan_lmer), we should provide an option to use the t distribution as the likelihood instead of a Gaussian.
Perhaps we just follow BDA3 17.2 and use the fact that
y_i ~ t_v(m, s^2)
is equivalent to
y_i | V_i ~ N(m, V_i)
V_i ~ scaled-inv-chi-sq(v, s^2)
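This equivalence is easy to check by simulation (a numpy sketch, assuming a scaled-inv-chi-sq(v, s^2) draw can be generated as v * s^2 divided by a chi-square(v) draw):

```python
import numpy as np

# Monte Carlo check of the scale-mixture representation above:
# y | V ~ N(m, V) with V ~ scaled-inv-chi-sq(v, s^2) should match y ~ t_v(m, s^2).
rng = np.random.default_rng(2)
m, s, v = 1.0, 2.0, 7.0
n = 200_000

V = v * s**2 / rng.chisquare(v, n)   # scaled-inv-chi-sq(v, s^2) draws
y_mix = rng.normal(m, np.sqrt(V))    # normals with latent variances

y_t = m + s * rng.standard_t(v, n)   # direct Student-t draws

# Both should have mean m and variance s^2 * v / (v - 2)
print(round(y_mix.mean(), 2), round(y_t.mean(), 2))
print(round(y_mix.var(), 2), round(y_t.var(), 2), s**2 * v / (v - 2))
```

In a Stan program this could be coded either with the latent V_i (the BDA3 parameterization) or directly with a Student-t likelihood; the draws are distributed identically.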
Seeing as it's the intercept that's so different, does this have to do with the centering?
fit <- stan_glm(mpg ~ wt, data = mtcars)
fit_vb <- stan_glm(mpg ~ wt, data = mtcars, algorithm = "meanfield")
> summary(fit)
mean mcse sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
(Intercept) 37.25 0.04 1.94 33.31 35.96 37.22 38.55 41.09 2477 1
wt -5.33 0.01 0.58 -6.45 -5.72 -5.33 -4.94 -4.22 2477 1
sigma 3.15 0.01 0.43 2.45 2.85 3.10 3.39 4.12 1813 1
mean_PPD 20.10 0.02 0.79 18.59 19.58 20.10 20.62 21.67 2636 1
log-posterior -53.10 0.04 1.32 -56.61 -53.65 -52.76 -52.14 -51.62 938 1
> summary(fit_vb)
mean sd 2.5% 25% 50% 75% 97.5%
(Intercept) 0.62 0.07 0.50 0.58 0.62 0.67 0.77
wt -5.37 0.63 -6.65 -5.77 -5.35 -4.96 -4.16
sigma 3.12 0.34 2.52 2.88 3.10 3.35 3.87
mean_PPD 37.46 2.13 33.29 36.02 37.38 38.92 41.74
log-posterior 20.16 0.88 18.52 19.54 20.19 20.75 21.84
I ran into an error earlier today which led to NaNs in the log-likelihood matrix. Upon further inspection, I noticed that sigma had been estimated twice, and its mean took a negative value. Without dissecting the functions, I can only speculate that this is some kind of variable indexing bug in the "translation" between lmer syntax and stan code.
In this reproducible example, the first sigma appears to represent the group-level intercept, b[(Intercept) 1] seems to be its respective slope, and every subsequent variable name is off by one, so to speak. An estimate for the real sigma is not given. Otherwise, stan does a fine job of recovering the data-generating parameters.
Inference for Stan model: Gaussian GLM.
4 chains, each with iter=1000; warmup=500; thin=1;
post-warmup draws per chain=500, total post-warmup draws=2000.
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
(Intercept) 1.92 0.04 0.71 0.47 1.54 1.91 2.29 3.45 338 1.01
x 3.95 0.04 0.83 2.31 3.48 3.95 4.44 5.54 373 1.01
sigma -0.96 0.04 0.72 -2.43 -1.34 -0.95 -0.60 0.52 341 1.01
b[(Intercept) 1] -1.97 0.04 0.85 -3.67 -2.43 -1.96 -1.46 -0.33 442 1.01
b[x 1] -0.10 0.04 0.72 -1.66 -0.45 -0.09 0.29 1.40 359 1.01
b[(Intercept) 2] 0.22 0.04 0.84 -1.42 -0.26 0.20 0.71 1.90 441 1.01
b[x 2] 1.12 0.04 0.72 -0.39 0.75 1.12 1.50 2.62 328 1.01
b[(Intercept) 3] 1.99 0.04 0.86 0.33 1.50 1.97 2.50 3.67 372 1.01
sigma -0.96 0.04 0.72 -2.43 -1.34 -0.95 -0.60 0.52 341 1.01
mean_PPD 3.98 0.00 0.06 3.87 3.95 3.98 4.02 4.10 1771 1.00
Samples were drawn using NUTS(diag_e) at Mon Aug 17 16:21:17 2015.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
R code:
N <- 600
J <- 3
x <- runif(N)
group <- c(rep(1, N/J), rep(2, N/J), rep(3, N/J))
y <- group + 2.00 * group * x + rnorm(N)
simdat <- data.frame(cbind(y, x, group))
stan_1 <- stan_glmer(y ~ x + (1+x|group)
, data = simdat
, chains = 4
, iter = 1000
, family = gaussian()
, prior.for.intercept = normal(0,2)
, prior = normal(0,3)
)
stan_1
ll <- stan_1$log_lik
crossv <- loo::loo(ll, cores = 1) # NaNs not allowed error
#table(round(crossv$pareto_k)) # Inf?
rstanarm v. 2.7.0
Some weirdness in the latest edit around lines 424-432.
commit b2c73f0
These lines
all_args <- mget(names(formals()), sys.frame(sys.nframe()))
prior.info <- all_args[grep("prior", names(all_args), fixed = TRUE)]
are intended to collect all the prior-related arguments so that, after the model is fit, we or the user can see what choices were made for the priors. Because match.call won't include these if the user leaves the default priors, I came up with the solution above. But I just discovered that something like
stan_glm(mpg ~ wt, data = mtcars, subset = cyl < 8)
leads to the following error:
Error in mget(names(formals()), sys.frame(sys.nframe())) :
object 'cyl' not found
which I guess makes sense, but I'm not sure how to work around it. Any ideas?
Should be pretty easy, just need to do it.
@bgoodri Is there any reason why we should definitely report residual degrees of freedom? Right now we have a call to qr in stanreg that is only used to get the rank of x for computing df.residual. If x is really large we probably don't want to compute the QR decomposition in stanreg unless we really want to report df.residual. What do you think?
For example, this model from the lme4 documentation:
stan_lmer(diameter ~ (1|plate) + (1|sample), data = Penicillin)
will result in an empty X matrix.
From @waynefoltaERI (via brms issue 24)
For example, while playing with the mtcars dataset for this issue, I found that brms' and rstanarm's answers differed considerably. And glm agreed with rstanarm... which gave me a clue and when I investigated further, it appears that rstanarm is ignoring priors. (Or I am misunderstanding how they are specified, but not getting any error messages.) So the reason for the agreement is that I was specifying priors, but rstanarm was ignoring them and using flat (improper, frequentist-like) priors. Maybe has to do with their pre-compilation.
stan_plot(*, pars = "log-posterior", include = FALSE)
does not actually exclude the log-posterior from the plot, but
stan_plot(*, pars = "lp__", include = FALSE)
does, so I assume it is just a naming thing?
We should just be able to pass N = 0
and skip over the likelihood contribution to the posterior, but it may take a bit of fiddling. People like to see how much the data contributes to the posterior distribution relative to the prior.
Not a short-term priority, but could be nice to have some (limited) functionality for GPs in future iterations of the package.
I'm working with some simulated data and started by running a few simple models. I'm assuming the generic Gaussian GLM is pre-compiled in rstanarm -- I'd installed 2.8.1 directly. This could be an idiosyncrasy of working off clone VMs, but I got the following error:
> rstan1 <- stan_glmer(qty ~ price + (1 + price | prod_key)
+ , data = df
+ , chains = 4
+ , iter = 1000
+ , prior = normal(0, 2)
+ , prior_intercept = normal(100, 40)
+ )
starting worker pid=24456 on localhost:11133 at 14:04:11.818
starting worker pid=24465 on localhost:11133 at 14:04:12.164
starting worker pid=24474 on localhost:11133 at 14:04:12.491
starting worker pid=24483 on localhost:11133 at 14:04:12.796
Error in checkForRemoteErrors(val) :
4 nodes produced errors; first error: the compiled object from C++ code for this model is invalid, possible reasons:
- compiled with save_dso=FALSE;
- compiled on a different platform;
- not existed (created from reading csv files).
> stan_trace(rstan1)
Error in if (object$algorithm == "optimizing") stop("Plots not yet available for optimization (algorithm='optimizing')", :
argument is of length zero
> sessionInfo()
R version 3.2.1 (2015-06-18)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server 7.0 (Maipo)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] rstan_2.8.1 ggplot2_1.0.1 shinystan_2.0.1 shiny_0.12.2 rstanarm_2.8.1 Rcpp_0.12.1 arm_1.8-6 lme4_1.1-10 Matrix_1.2-1 MASS_7.3-40
[11] lazyeval_0.1.10 dplyr_0.4.1
loaded via a namespace (and not attached):
[1] lattice_0.20-31 zoo_1.7-12 gtools_3.5.0 assertthat_0.1 digest_0.6.8 mime_0.4 R6_2.1.1 plyr_1.8.1
[9] stats4_3.2.1 coda_0.18-1 httr_0.6.1 minqa_1.2.4 nloptr_1.0.4 DT_0.1 shinythemes_1.0.1 proto_0.3-10
[17] devtools_1.9.1.9000 splines_3.2.1 shinyjs_0.2.3 RcppEigen_0.3.2.5.1 stringr_0.6.2 htmlwidgets_0.5 loo_0.1.3.9000 RCurl_1.95-4.5
[25] munsell_0.4.2 httpuv_1.3.3 base64enc_0.1-3 htmltools_0.2.6 gridExtra_2.0.0 threejs_0.2.1 codetools_0.2-11 matrixStats_0.15.0
[33] withr_1.0.0 bitops_1.0-6 BH_1.58.0-1 grid_3.2.1 nlme_3.1-120 xtable_1.8-0 gtable_0.1.2 DBI_0.3.1
[41] magrittr_1.5 StanHeaders_2.8.0 scales_0.2.4 reshape2_1.4.1 dygraphs_0.5 xts_0.9-7 tools_3.2.1 markdown_0.7.7
[49] abind_1.4-3 parallel_3.2.1 inline_0.3.14 colorspace_1.2-4 memoise_0.2.1 knitr_1.11
The traceback() output was unusual -- mostly comprised of a long list of numbers, presumably sampling args. It's interesting that if I try this model on a single chain (eschewing the parallel calls), it suddenly works. Because this error only sometimes happens, it's been tricky to pinpoint and diagnose. I don't recall it happening prior to 2.8.0, but this may just be one of the drawbacks of working with the bleeding edge versions...
Let's do this for rstanarm
https://groups.google.com/d/msg/stan-users/5IxMxY3tv2w/ptm2vyJmBwAJ
see the repo https://github.com/jtpi/rstan-varsel
The compare function in the loo package checks that models have the same number of observations, but we can also check that the outcome variable is the same. This would probably require a wrapper around loo::compare that first does this check, which means rstanarm would have its own model comparison function instead of users just using the loo package. But I'm not sure that doing this check is worth adding a function, since compare's check for the number of observations will catch almost all cases of y being different.
For models with a Gaussian outcome and some form of grouping structure, allow sigma to vary by level of a group.
@bgoodri I just triggered an error I didn't know about in glFormula. Apparently it won't allow a model with more random effects than observations because "the random-effects parameters are probably unidentifiable." That's a reasonable concern for glmer, but for stan_glmer this will rule out many models (like a lot of the ones Andrew fits) for which it's possible to overcome this with informative priors.
@bgoodri I'm not sure what's going on, but after your recent merge for the posterior_predict stuff this example
stan_lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
throws an error. It has something to do with the _NEW_ terms. We're getting b[(Intercept) Subject:_NEW_Subject] but missing the _NEW_ term for Days.
Each stan_foo() function should have an option to do optimizing rather than sampling (and probably an option for variational inference too). Consequently, each stan_foo() function should have an option to turn the priors off, so that the user can obtain pure MLEs. These changes can probably be accommodated by the existing .stan files if we pass a has_priors flag.