nimble-dev / nimble

The base NIMBLE package for R

Home Page: http://R-nimble.org

License: BSD 3-Clause "New" or "Revised" License

R 26.90% C++ 56.65% C 0.33% Makefile 0.03% Shell 0.07% Batchfile 0.08% M4 0.04% TeX 0.24% HTML 9.86% Roff 5.80%

bayesian-inference mcmc hierarchical-models probabilistic-programming bayesian-methods r

nimble's Introduction

NIMBLE

Website | Documentation | Examples | Developing | Workshop materials

NIMBLE is an R package for hierarchical statistical modeling (aka graphical modeling). It enables writing general models along with methods such as Markov chain Monte Carlo (MCMC), particle filtering (aka sequential Monte Carlo), and other general methods.

For writing statistical models, NIMBLE adopts and extends the BUGS language, making it largely compatible with BUGS and JAGS. NIMBLE makes BUGS extensible, allowing users to add new functions and new distributions.

For writing algorithms (aka analysis methods), NIMBLE provides a model-generic programming system embedded within R. This provides control over models as generic objects and mathematical manipulation of model variables. In this way, NIMBLE's programming paradigm treats probabilistic graphical models as a basic programming construct.

Both models and algorithms are compiled via generating customized C++ and providing seamless interfaces to compiled C++ from R.

NIMBLE's most developed methods are for MCMC. Users can easily customize sampler configurations from R and write new samplers in NIMBLE's algorithm programming system.

Developers of new computational statistical methods can build them in NIMBLE to gain the benefits of its graphical modeling language, compilation, and distribution via CRAN.

Installation

Install prerequisites

NIMBLE needs a C++ compiler and the GNU make utility. Typically, Mac users can obtain these by installing Xcode, including command line utilities, while Windows users can obtain them by installing Rtools. See the User Manual for more details.

Install NIMBLE

The easiest way to install NIMBLE is via CRAN:

install.packages("nimble")

To install from the NIMBLE website:

library(devtools)
install.packages("nimble", type = "source", repos = "https://r-nimble.org")

Note that NIMBLE's sequential Monte Carlo (SMC; aka particle filtering) methods are now (as of version 0.10.0) in the nimbleSMC package.

Note that MCMCsuite and compareMCMCs have been migrated to the compareMCMCs package, now available on CRAN.

Citation

In published work that uses or mentions NIMBLE, please cite:

de Valpine, P., D. Turek, C.J. Paciorek, C. Anderson-Bergman, D. Temple Lang, and R. Bodik. 2017. Programming with models: writing statistical algorithms for general model structures with NIMBLE. Journal of Computational and Graphical Statistics 26:403-413. https://doi.org/10.1080/10618600.2016.1172487.

In published work that uses NIMBLE, please also cite the package version:

de Valpine, P., C. Paciorek, D. Turek, N. Michaud, C. Anderson-Bergman, F. Obermeyer, C. Wehrhahn Cortes, A. Rodriguez, D. Temple Lang, and S. Paganin. 2024. NIMBLE: MCMC, Particle Filtering, and Programmable Hierarchical Modeling. doi: 10.5281/zenodo.1211190. R package version 1.1.0, https://cran.r-project.org/package=nimble.

To help us track usage to justify funding support for NIMBLE, please include the DOI in the citation.

Licenses

Nimble is released under a mixture of licenses, and depends on additional third-party libraries with compatible licenses.

Nimble's non-C++ code (R, bash, Make, etc.) is released under Revised BSD.
Nimble's C++ code is released under GPL 2.
Nimble's User Manual is released under the CC BY 4.0 license.
The Eigen C++ library included with Nimble is licensed under MPL 2.

Acknowledgements

The development of NIMBLE has been funded by:

an NSF Advances in Biological Informatics grant (DBI-1147230) to P. de Valpine, C. Paciorek, and D. Temple Lang;
an NSF SI2-SSI grant (ACI-1550488) to P. de Valpine, C. Paciorek, and D. Temple Lang;
an NSF Collaborative Research grant (DMS-1622444) to P. de Valpine, A. Rodriguez, and C. Paciorek; and
an NSF Collaborative Research grant (DMS-2152860) to P. de Valpine, C. Paciorek, and D. Turek.

with additional support provided by postdoctoral funding for D. Turek from the Berkeley Institute for Data Science and Google Summer of Code fellowships for N. Michaud (2015) and C. Lewis-Beck (2017).

nimble's Issues

provide more informative error message when needed constants are missing from BUGS model

Example of current non-informative error msg:

code <- nimbleCode({
alpha[1:K] ~ ddirch(theta[1:K])
})
K <- 5
m <- nimbleModel(code, inits = list(theta = rgamma(K, 3, 1)))
Error in getSymbolicParentNodesRecurse(x, constNames, indexNames, nimbleFunctionNames) :
Error, R function : has non-replaceable node values as arguments. Must be a nimble function.

add pt,qt for t_nonstandard

Non-scalar replacements for node functions

Allow for non-scalar replacement quantities to become member variables (setup outputs) of node functions.

tidy up MCMC spec related functions

1. Fix the help info so that removeSamplers() and getSamplers() indicate that one can pass vectors of node or var names.
2. The addSampler help info is missing 'target'. Also, my thought is that 'target' should be the first arg, not 'type'.
3. (Maybe worth discussion) I think we want getSamplers() to return an R object that contains info about the sampler that a user could then work with programmatically. Perhaps it should just be the (vector of) ref class object as is from samplerSpecs. It could return this invisibly, with print=TRUE being the standard functionality, or the return could be the main point, with print=FALSE the standard functionality.

3a) if getSamplers() returns list of sampler specs, it would be nice to have each element named based on the vertex being sampled

3b) (enhancement for down the road perhaps) we might want to allow a user to modify a spec (i.e., change a value in the control list) without having to create a totally new sampler and delete the old one just to, say, change the adaptation time

update user manual and examples regarding log.p,log,lower.tail args to user-supplied distributions

norm in R and C differ

norm() in R and C nimble functions differ as the R default seems to be the "1" norm (the 'type' argument) and the C version is the Frobenius
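
The R side of this difference can be checked without NIMBLE. The following is a minimal sketch using only base R's norm(); the matrix is an arbitrary example, and the compiled NIMBLE behavior is taken from the description above rather than demonstrated here.

```r
# Arbitrary 2x2 example matrix; its columns are (1, 2) and (3, 4).
m <- matrix(c(1, 2, 3, 4), nrow = 2)

# Base R defaults to type = "O", the one norm (maximum absolute column sum).
one_norm <- norm(m)

# The Frobenius norm has to be requested explicitly in R.
frobenius_norm <- norm(m, type = "F")

print(one_norm)        # 7
print(frobenius_norm)  # sqrt(30), about 5.477
```

Passing type = "F" explicitly in uncompiled code is one way to get the same quantity on both sides.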

help on isData() is misleading

isData will accept variables rather than node names, but the help info on the nodeNames argument states:

nodeNames: A character vector of node names. This must be
entirely node names, not model variables.

Same issue for the main argument to expandNodeNames().

I would just change the wording, but wanted to run this by Daniel first.

figure out way to propagate initial values for lifted nodes from LHS link

e.g.,

log(mu) ~ dnorm(0,1)

user will initialize mu, but log_mu will not be initialized

warn user if constants provided that are not in model

would need to check declInfo for 'indexExpr', 'valueExpr' and for-loop indices, build up an object ConstantsUsed, and then warn if any of the elements were zero.
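
A rough approximation of this check can be sketched from outside the model-building code by comparing the supplied constant names against the symbols that appear in the model code. The sketch below uses only base R: quote() stands in for nimbleCode(), and findUnusedConstants is a hypothetical helper, not a NIMBLE function. It does not inspect declInfo, so it only illustrates the idea.

```r
# Hypothetical helper: names of supplied constants that never appear in the code.
findUnusedConstants <- function(code, constants) {
  usedSymbols <- all.vars(code)          # variable names only, not function names
  setdiff(names(constants), usedSymbols)
}

# quote() is used here as a stand-in for nimbleCode().
code <- quote({
  for (i in 1:N) {
    y[i] ~ dnorm(mu, 1)
  }
  mu ~ dnorm(0, 1)
})

constants <- list(N = 3, K = 5)          # K is not used anywhere in the model
unused <- findUnusedConstants(code, constants)

if (length(unused) > 0) {
  message("constants not used in model code: ", paste(unused, collapse = ", "))
}
```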

should we change nimbleStep to nimStep to be consistent with nimPrint, nimCopy, etc.?

or add nimStep as a synonym?

restrictions on column names in data?

Hi nimble-devs,

Haven't quite wrapped my head around the expected format for the data argument to nimbleModel(). For instance, it seems that x is a valid name but not X; I get the error:

variable name not suitable for setData(): X

when using a data.frame with X as a column name (yes, I realize that data expects a list, though a data.frame appears to work, presumably by some automatic coercion. Why the default data format is a list and not a data.frame is another thing that isn't clear to me). I think I may be missing something, but this seems like a somewhat strange restriction. Thanks for the help.

Cheers,
Carl

p.s. greatly enjoying v0.3.0; thanks for the thorough release notes on the syntax changes.

clean up types in distributions_inputlist and in various nimble-provided distributions

current plan:

all core C code uses doubles and has some checking of inputs
the interface code is the place for more checking
only provide types in inputList when dimension > 1, and always use double
tell user in user-defined section of manual that we assume double(0) and they should use double and not int

work on handling of NA and NaN in C code for distribution functions

conjugacy checking

add tests of conjugacy to testing system
have Daniel's run-time checking of numerical values against M-H be optional based on user-provided flag in nimbleOptions

deal with incorrect altParams in dmnorm, dwish

Where is the source code?

I am excited about the release and can't resist the chance to submit issue #1

Here is my bug report:

I want to check out your code but can't find any of the files.

warn user about NAs in RHS only nodes at model building time

Would be nice to do, particularly as the current warning at run time doesn't say what RHS only variable is missing, but I don't see a good way to do this, given the variety of ways that a single variable can contain parameters, data (LHS), and RHS only. Furthermore, part of a variable might never get used in a model so it would be fine to have NAs for that part.

warning is issued from using values(...) <- X assignment

nimbleFunction() issues a warning (from reference class creation) whenever we use the assignment form of values() <-

This can be seen by creating such a NF:

nfdef <- nimbleFunction(
setup = function(model, nodes) {},
run = function(p = double(1)) {
values(model, nodes) <- p
}
)

Correct handling of RHSonly nodes, and nodeFunctionID assignment

It seems RHSonly nodes should not be assigned nodeFunctionIDs, but still should be handled correctly by getNodeNames(), e.g.

add various missing distributions and parameterizations

pareto, double exponential/laplace, inverse wishart, inverse gamma, generalized gamma, multivariate t

also make sure to add standard R help info for any R distribution functions

deal with periods in nimbleFunction arguments

At this point, these propagate to C++ and cause compilation errors as they conflict with C++ OOP.

More generally, we might deal with cases like a user using 'log' both as a function and as a variable/argument.

Also avoid conflict of user variable/arg names with variable names generated when lifting in nimble code.

add informative error msg when doing a compilation and forgetting to compile the model beforehand

e.g., trying to build a C MCMC via compileNimble but forgetting to compile the model, gives this error message:

code <- nimbleCode({
y ~ dnorm(mu,1)
mu ~dnorm(0,1)
})
m <- nimbleModel(code, data = list(y = 1), inits = list(mu = 0))
sp <- configureMCMC(m)
Rmcmc <- buildMCMC(sp)
Cmcmc <- compileNimble(Rmcmc)
Error in cModel$.basePtr : $ operator not defined for this S4 class

provide nimStop or nimError as DSL function

this would allow a user to catch run-time errors. nimWarn would also be nice.

verbosity and nimbleOption()

We have a bunch of methods related to MCMC that take a print argument.

We might do the following:

have the default value be the result of getNimbleOptions('verbose'), and have the default nimbleOption value for verbose be FALSE. This mimics R's 'verbose' option
change 'print' to 'verbose' and in general in other cases where we want this sort of thing have the arg be 'verbose'
possibly have getNimbleOptions('verbose') default to getOption('verbose')

Any thoughts? (Daniel, in particular)
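
For reference, the pattern of defaulting a 'verbose' argument to a global option can be sketched in base R. This is only an illustration of the proposal: reportSamplers is a hypothetical function, and it reads R's own 'verbose' option through getOption() rather than any NIMBLE option.

```r
# Hypothetical function: 'verbose' defaults to R's global 'verbose' option.
reportSamplers <- function(samplerNames, verbose = getOption("verbose")) {
  if (isTRUE(verbose)) {
    message("samplers: ", paste(samplerNames, collapse = ", "))
  }
  invisible(samplerNames)
}

oldOptions <- options(verbose = FALSE)   # make the global default explicit

quiet <- reportSamplers(c("RW", "conjugate"))                 # prints nothing
loud <- reportSamplers(c("RW", "conjugate"), verbose = TRUE)  # prints a message

options(oldOptions)                      # restore the previous setting
```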

add svd (and possibly eigen) to DSL

General handling of argument names, adding UIDs

work on how nimble handles NA, NaN

Here's an example of something awry:

code <- nimbleCode({
ind <- mu < 0
})
m <- nimbleModel(code)
cm <- compileNimble(m)

cm$ind
[1] NA
calculate(cm,'ind')
[1] 0
cm$ind
[1] 0

The issue here is why the NA is not being propagated to 'ind', given that 'mu' is NA at the moment.

consider enhancing MCMC sampling for parameters on constrained domains

e.g., for non-negative parameters, think about:

sampling on log scale
reflecting proposals back across zero (I forget the name of this)
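
As a sketch of the second idea, a random-walk proposal can be reflected at zero by taking its absolute value. The reflected proposal remains symmetric, so the usual Metropolis acceptance ratio applies. The code below is plain R, not a NIMBLE sampler; the function name, the exponential target, and the tuning values are assumptions chosen for illustration.

```r
# Reflective random-walk Metropolis for a target supported on [0, Inf).
# log_target: function returning the log density (up to a constant).
reflectRWSampler <- function(log_target, init, n_iter, scale = 1) {
  draws <- numeric(n_iter)
  current <- init
  for (k in seq_len(n_iter)) {
    # Propose a normal step, then reflect any negative value back across zero.
    proposal <- abs(current + rnorm(1, mean = 0, sd = scale))
    log_ratio <- log_target(proposal) - log_target(current)
    if (log(runif(1)) < log_ratio) {
      current <- proposal
    }
    draws[k] <- current
  }
  draws
}

set.seed(1)
draws <- reflectRWSampler(
  log_target = function(v) dexp(v, rate = 2, log = TRUE),
  init = 1,
  n_iter = 2000
)
```

Every value in draws is non-negative by construction, so no proposal is wasted on the region where the target density is zero.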

expandNodeNames overlaps with getNodeNames?

Should we combine these two functions by allowing getNodeNames to take a 'nodes' argument?

Also, I think the help info on the nodeNames arg to expandNodeNames is not quite right. It says that it takes a vector of nodeNames, but it could also take variable names, right?

work on scoping for nimbleFunctions

(Replacing original post here):

It would be nice to use/imitate R's scoping for nimbleFunctions and models. E.g. it would be nice to write

foo <- function() {
    nf1 <- nimbleFunction(...)
    nf2 <- nimbleFunction(...)           # use nf1 in here
    mc <- nimbleCode(...)                # use nf1 or nf2 in here
    m <- nimbleModel(mc)
    compiledStuff <- compileNimble(...)  # compile nf1, nf2, mc, etc. here
    compiledStuff
}

This would require several things to work:

When we process a model definition (i.e. nimbleModel), we look in the right environments for user-defined nimbleFunctions (or other objects, potentially), i.e. we find them from the environment from which nimbleModel was called.
When we compile, we again look in the right environment(s) to find objects.
During uncompiled (R) execution, one nimbleFunction calling another finds it via R's scoping rules.

We could accomplish the first two (I got through the first before reaching the conclusion of this note), but R's reference classes make the third apparently impossible. We build models, the nodeFunctions within models, and other nimbleFunctions by generating and evaluating code for new reference class definitions. The member functions in these classes do not use R's scoping, so we're stuck. In other words, we could arrange to build m, but then calculate(m) would attempt to use its nodeFunctions, which are reference class objects, which might contain a call to nf1 but wouldn't find it as one would expect from lexical scoping. Again, this is a limitation of R's reference class system.

As a result, users must define their models and nimbleFunctions in the global environment.

do we want getPositions and getParents functions?

also do we want a function that tells us the dimension of a node?

revise handling of : in loops in BUGS code

handle for-loop indexes consistently with BUGS and JAGS. e.g., for(i in 1:3+1) {} should give lowest precedence to the ':' operator
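
The difference comes from R's own operator precedence, which can be checked in base R. In the sketch below, the second expression spells out the reading that the issue attributes to BUGS and JAGS; the variable names are illustrative only.

```r
# In R, ':' binds more tightly than '+', so 1:3+1 means (1:3) + 1.
r_reading <- 1:3 + 1
print(r_reading)      # 2 3 4

# Giving ':' the lowest precedence corresponds to 1:(3+1).
bugs_reading <- 1:(3 + 1)
print(bugs_reading)   # 1 2 3 4
```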

add progress bars for various building, compilation, MCMC running steps

Anything that might take more than 15-30 seconds or so might have a progress bar.

A first step would be to have a general function to help with this. There might be some code available on the web.
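
One possible starting point is the text progress bar that ships with R's utils package. The sketch below wraps it in a general helper; runWithProgress and the toy step function are hypothetical names, and nothing here is tied to NIMBLE's build or MCMC code.

```r
library(utils)  # txtProgressBar() and setTxtProgressBar() live in utils

# Hypothetical helper: run step(i) for i = 1..n while updating a progress bar.
runWithProgress <- function(n, step) {
  pb <- txtProgressBar(min = 0, max = n, style = 3)
  on.exit(close(pb))            # always close the bar, even on error
  results <- numeric(n)
  for (i in seq_len(n)) {
    results[i] <- step(i)
    setTxtProgressBar(pb, i)
  }
  results
}

# Toy usage: each "step" just squares its index.
squares <- runWithProgress(10, function(i) i^2)
```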

fix nodefunction compilation so that argument matching by name works

Currently, my understanding is that arg matching by name works for standard nimbleFunctions but something is up with how we process nodefunctions such that in a nodefunction, the arg names are ignored.

Perry, I know we chatted briefly about this, but I'm forgetting what was said. If you can point me in the right direction, I'm happy to take this on.

making simulate default to include deterministic dependencies

We had this as an issue in the issues file. Do we want to move ahead with this and have a nimbleOption to turn this off?

add var to dlnorm as alternative parameter

also have sigma and var for dt

avoid need to lift boolean arithmetic expressions that are part of distribution parameter expressions

Currently if we have
mu.c ~ dconstraint(mu < 1)
the user needs to manually lift mu<1:
ind <- mu<1
mu.c ~ dconstraint(ind)

I believe this arises more generally when using a boolean expression as a distribution argument/parameter.

Check for duplicate node declarations, throw an error

add domain ranges to all distributions in inputList

When T() is used, modify range and set truncation flag to TRUE

Also tell users to do this if they add a user-provided distribution

update manual on use of distribution functions in nimbleFunctions

Once keyword processing is dealt with, we should update the manual to reflect the additional flexibility we allow.

Also note at that location in the manual that user defined dist functions can be used. (Note need to deal with keyword processing for user-defined too via matchFunctions analog in nimbleUserNamespace)

Remove declarations which define zero nodes (e.g., backward indexed)

allow user-defined link functions

attempt to warn/error out when using indexing outside of a loop

The following code provides a non-informative warning message, does not error out, and produces a non-functional model, with a node function for "mu[]"

Note the syntax error that the user forgot to have {} wrap the two lines intended to be part of the for loop.

code = nimbleCode({
    for(i in 1:3)
        y[i] ~ dnorm(mu[i],1)
    mu[i] ~ dnorm(0,1)
})
m = nimbleModel(code)
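
How R parses this code can be inspected in base R, which suggests one way such a check might start. In the sketch below, quote() stands in for nimbleCode(), and the variable names are illustrative; it is not how NIMBLE processes declarations.

```r
# Same code as above; quote() is a stand-in for nimbleCode().
code <- quote({
  for(i in 1:3)
    y[i] ~ dnorm(mu[i], 1)
  mu[i] ~ dnorm(0, 1)
})

# Drop the leading `{` to get the top-level statements.
statements <- as.list(code)[-1]

# Without braces, only the first declaration belongs to the for loop.
isLoop <- vapply(
  statements,
  function(s) is.call(s) && identical(s[[1]], as.name("for")),
  logical(1)
)

# The second statement sits outside the loop but still uses the index 'i'.
strayIndex <- "i" %in% all.vars(statements[[2]])

print(length(statements))  # 2
print(isLoop)              # TRUE FALSE
print(strayIndex)          # TRUE
```

A top-level declaration that uses a symbol which is only ever defined as a loop index elsewhere in the code is a plausible trigger for a warning or error.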

deal with pivoting in chol in DSL

add special case conj samplers

e.g. exp is gamma, unif is everything, chisq is gamma, etc.
scalar N with mv N descendants is conj (might be hard)

mvSamples for nodes/vars not sampled in an MCMC are filled with 0 on the C side

We might want to change this to put in NA_REAL if that is possible. Might be a general change that C modelValues are initialized with NA_REAL?

Create a system where every scalar node element has a unique ID

is getDependencies in devel behaving as discussed Fri May 1 mtg?

Note that when returnScalarComponents = FALSE, we are returning extra "y" nodes.

code <- nimbleCode({
for(i in 1:4) {
y[i] ~ dnorm(mu[i],1)
}
mu[1:2] ~ dmnorm(z[1:2],I[1:2,1:2])
mu[3:4] ~dmnorm(z[1:2],I[1:2,1:2])
})

m <- nimbleModel(code)
m$getDependencies("mu[2:3]")
[1] "mu[1:2]" "mu[3:4]" "y[1]" "y[2]" "y[3]" "y[4]"
m$getDependencies("mu[1]")
[1] "mu[1:2]" "y[1]" "y[2]"
m$getDependencies("mu[1:2]")
[1] "mu[1:2]" "y[1]" "y[2]"
m$getDependencies("mu[2]")
[1] "mu[1:2]" "y[1]" "y[2]"

m$getDependencies("mu[2:3]", returnScalarComponents = TRUE)
[1] "mu[2]" "mu[3]" "y[2]" "y[3]"
m$getDependencies("mu[1]", returnScalarComponents = TRUE)
[1] "mu[1]" "y[1]"
m$getDependencies("mu[1:2]", returnScalarComponents = TRUE)
[1] "mu[1]" "mu[2]" "y[1]" "y[2]"
m$getDependencies("mu[2]", returnScalarComponents = TRUE)
[1] "mu[2]" "y[2]"

push keyword processing down to RCfunction class

Conjugacy-checking to inspect relationships at the declaration level

Potential for massive improvement in the efficiency of conjugacy checking.

provide more informative error msgs for nimbleModel() and compileNimble() when dimension info not sufficient in BUGS code

In a number of cases our error msgs are non-informative when no brackets are provided for non-scalars (see below) (note we are much better when brackets are provided but the indexing is missing). One possibility would be to write a little system to check code based on knowing dimensions of distribution values and parameters and knowing expected dimensions for various DSL functions.

Examples:
code <- nimbleCode({
b[1:2] ~ dmnorm(mn[1:2], I[1:2,1:2])
z <- Z%*%b
})

m <- nimbleModel(code, inits = list(mn = rep(0, 2), I = diag(rep(0,2)), b = c(1,2), Z =matrix(rnorm(4),2)))
Error in parentExprReplaced[[iI + 2]] :
object of type 'symbol' is not subsettable

code <- nimbleCode({
b ~ dmnorm(mn, I)
})

m <- nimbleModel(code, inits = list(mn = rep(0, 2), I = diag(rep(1,2)), b = c(1,2)))
cm <- compileNimble(m)
Error: Error, sizeUnaryCwiseSquare was called with an argument that is not a matrix

code <- nimbleCode({
b ~ dmnorm(mn, I)
z[1:2] <- Z[1:2,1:2]%*%b[1:2]
})

m <- nimbleModel(code, inits = list(mn = rep(0, 2), I = diag(rep(1,2)), b = c(1,2), Z =matrix(rnorm(4),2)))
Error in if (is.numeric(v) && any(v < 0)) { :
missing value where TRUE/FALSE needed