tobiasmadsen / dgraph



Discrete factor graphs in R

R 43.91% C++ 49.26% HTML 6.45% Makefile 0.09% M4 0.28%

dgraph's People


indykpol guhjy

dgraph's Issues

Plotting graph works only for trees

Using reingold.tilford as the igraph plot layout only works for trees. It does not work for disconnected graphs.
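One option is to check whether the factor graph is a tree before plotting, and otherwise fall back to a general layout such as igraph's force-directed igraph::layout_with_fr. Below is a minimal base-R sketch of that check. It assumes the graph is described by varDim and facNbs as in dfg(). isFactorTree is a hypothetical helper and is not part of dgRaph. The check relies on a standard fact: a graph with V variable nodes, F factor nodes and E edges is a tree exactly when it is connected and E = V + F - 1.

```r
# Hypothetical helper, not part of dgRaph: is the factor graph a tree?
# Nodes 1..nVar are variables; node nVar + f is factor f.
isFactorTree <- function(varDim, facNbs) {
  nVar <- length(varDim)
  nNodes <- nVar + length(facNbs)
  # A tree on n nodes has exactly n - 1 edges (one per factor-variable link)
  if (sum(lengths(facNbs)) != nNodes - 1) return(FALSE)

  # Union-find over all nodes; with n - 1 edges, connected <=> tree
  parent <- seq_len(nNodes)
  find <- function(i) {
    while (parent[i] != i) {
      parent[i] <<- parent[parent[i]]  # path halving keeps the trees shallow
      i <- parent[i]
    }
    i
  }
  for (f in seq_along(facNbs)) {
    for (v in facNbs[[f]]) {
      a <- find(v)
      b <- find(nVar + f)
      if (a != b) parent[a] <- b  # merge the two components
    }
  }
  roots <- sapply(seq_len(nNodes), find)
  length(unique(roots)) == 1
}
```

The plotting code could then use igraph::layout_as_tree only when isFactorTree() returns TRUE.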

Use of Rcpp Module causes dangling pointers

This is a well-known problem with Rcpp Modules:
http://lists.r-forge.r-project.org/pipermail/rcpp-devel/2014-June/007758.html

The easiest way to reproduce it:

Create a module
Save the image
Overwrite the module
Turn on garbage-collection torture: gctorture()
Load the image
Call a module function

For example:

varDim <- 2
facPot <- list(matrix(0.5,1,2))
facNbs <- list(1)
mydfg <- dfg(varDim, facPot, facNbs)

save.image(".RData")

mydfg <- 2

gctorture()

load(".RData")

mydfg$dfgmodule$resetFactorPotentials( list(matrix(0.5,1,2)) )

Give facScores directly to tail approximation methods

Accept a facScores argument directly instead of deriving it from the foreground model.

tailIS and kl should not require the same potMap, only the same structure

Enable comparison of two factor graphs that have similar structure but different mappings of potentials. Such comparisons are performed in tailSaddle, tailIS and kl.

Refactor the structure-similarity check, and generate the smallest structure that captures both maps.
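Suppose two potential maps are defined over the same factor structure. The smallest structure that captures both assigns one shared potential to each distinct pair (potMap1[i], potMap2[i]). Below is a minimal sketch, assuming that potMap is an integer vector with one entry per factor, as in dfg(). combinePotMaps and expandPotentials are hypothetical names.

```r
# Hypothetical helper: coarsest potential map that refines both input maps.
# Factors i and j share a potential in the result iff they share one in
# both potMap1 and potMap2.
combinePotMaps <- function(potMap1, potMap2) {
  stopifnot(length(potMap1) == length(potMap2))
  key <- paste(potMap1, potMap2, sep = ":")
  # Number the combined potentials in order of first appearance
  match(key, unique(key))
}

# Hypothetical helper: rewrite one graph's facPot for the combined map by
# taking, for each combined potential, the potential of a representative factor
expandPotentials <- function(facPot, potMap, combined) {
  first <- match(seq_len(max(combined)), combined)
  facPot[potMap[first]]
}
```

Applying expandPotentials() to both the foreground and the background gives two graphs that share a single potMap (the combined one).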

Reflection: Optimize and potentials

Use the potential generators in the optimize functions.

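Range check of data can fail for a matrix

The call to sapply returns a vector of length ncol(data) (one maximum per column) for a data.frame, but a vector of length nrow(data)*ncol(data) (one value per element) for a matrix. In the matrix case, the comparison with dfg$varDim is recycled, and suppressWarnings hides the resulting warning.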

 if( ! suppressWarnings( all( sapply(data, max, na.rm = T) <= dfg$varDim) ) )
    stop("Data outside range")
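A version that behaves the same for matrices and data frames can compute one maximum per column. Below is a minimal sketch, assuming that each column of data corresponds to one variable and that the variable dimensions are passed as varDim (dfg$varDim in the package). checkDataRange is a hypothetical name.

```r
# Hypothetical replacement for the range check: one maximum per column,
# whether data is a matrix or a data.frame.
checkDataRange <- function(data, varDim) {
  data <- as.matrix(data)
  if (ncol(data) != length(varDim))
    stop("Number of columns in data does not match number of variables")
  # An all-NA column gives -Inf (with a warning); -Inf counts as in range
  colMax <- suppressWarnings(apply(data, 2, max, na.rm = TRUE))
  if (!all(colMax <= varDim))
    stop("Data outside range")
  invisible(TRUE)
}
```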

Saddlepoint calculation

We find the saddlepoint by Newton-Raphson. Because kappa' (the derivative of the cumulant generating function) is monotone increasing, we can fall back on bisection if the Newton-Raphson iteration diverges.
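A safeguarded Newton-Raphson keeps a bracket [lo, hi] around the root of kappa'(t) = x, and takes a bisection step whenever the Newton step would leave that bracket. Below is a minimal sketch for a score with finitely many values, where kappa(t) = log sum(probs * exp(t * scores)). The names kappaDerivs and saddlepoint are hypothetical. The default bracket [-50, 50] is an assumption, and the function checks that it contains the root.

```r
# kappa'(t) and kappa''(t) for a score taking values `scores` with
# probabilities `probs`, where kappa(t) = log(sum(probs * exp(t * scores)))
kappaDerivs <- function(t, scores, probs) {
  lw <- log(probs) + t * scores
  w <- exp(lw - max(lw))         # tilted weights, rescaled to avoid overflow
  w <- w / sum(w)
  m <- sum(w * scores)           # kappa'(t): mean of the tilted distribution
  c(d1 = m, d2 = sum(w * (scores - m)^2))  # kappa''(t): its variance
}

# Solve kappa'(t) = x. Since kappa' is increasing, each iterate shrinks a
# bracket [lo, hi] around the root; a Newton step that leaves the bracket
# is replaced by bisection.
saddlepoint <- function(x, scores, probs, lo = -50, hi = 50,
                        tol = 1e-10, maxit = 200) {
  stopifnot(kappaDerivs(lo, scores, probs)[["d1"]] < x,
            kappaDerivs(hi, scores, probs)[["d1"]] > x)
  t <- if (lo < 0 && hi > 0) 0 else (lo + hi) / 2
  for (i in seq_len(maxit)) {
    k <- kappaDerivs(t, scores, probs)
    f <- k[["d1"]] - x
    if (abs(f) < tol || hi - lo < tol) return(t)
    if (f < 0) lo <- t else hi <- t   # the root lies above t iff kappa'(t) < x
    tNew <- t - f / k[["d2"]]         # Newton-Raphson step
    if (!is.finite(tNew) || tNew <= lo || tNew >= hi) tNew <- (lo + hi) / 2
    t <- tNew
  }
  warning("saddlepoint did not converge")
  t
}

tHat <- saddlepoint(18, scores = 0:20, probs = dbinom(0:20, 20, 0.5))
```

For a Binomial(20, 0.5) score, the tilted distribution is Binomial(20, exp(t)/(1 + exp(t))), so the exact saddlepoint for x = 18 is log(9). This makes a convenient check.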

Kullback-Leibler divergence

For continuous distributions, some probabilities may underflow to zero in the foreground while remaining non-zero in the background. This causes a spurious infinite Kullback-Leibler divergence.
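If we work with log-probabilities and treat terms with p = 0 as 0, the divergence stays finite when a probability has only underflowed. Below is a minimal sketch that illustrates the numerical point. It assumes that foreground and background log-probabilities are available for every configuration, which is only feasible for small state spaces. klFromLog and naiveKL are hypothetical names.

```r
# KL(p || q) from log-probabilities. Keeping log p instead of p avoids
# log(0) = -Inf when a probability has merely underflowed; terms whose
# p underflows to 0 contribute 0.
klFromLog <- function(logp, logq) {
  p <- exp(logp)
  sum(ifelse(p > 0, p * (logp - logq), 0))
}

# Naive version working on probabilities directly
naiveKL <- function(p, q) sum(p * log(p / q))

logFg <- c(-800, 0)        # exp(-800) underflows to 0 in double precision
logBg <- log(c(0.5, 0.5))

naiveKL(exp(logBg), exp(logFg))  # Inf: log(0.5 / 0)
klFromLog(logBg, logFg)          # finite: 400 + log(0.5)
```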

Copy function unnecessary

The copy function was written to ensure that each dfg object had its own underlying module. The module is now built as needed (see #8), so ordinary assignment can be used instead.

Underflow for variable with large number of neighbors

We get an underflow if a variable has a large number of neighbors. Currently, messages are normalized only once each new message has been computed. They should also be normalized during the computation.

library(dgRaph)
N <- 1000
varDim <- rep(100, N+1)
facPot <- list(matrix(0.01, 100, 100))
facNbs <- lapply(1:N, function(i){c(1,i+1)})
potMap <- rep(1, N)
mydfg <- dfg(varDim, facPot, facNbs, potMap)

data <- data.frame(matrix(rep(1,N+1),1,N+1))
likelihood(data, dfg = mydfg) # produces NaN
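One fix is to renormalize the running product after each incoming message, which keeps the values in range. The scale factors that are divided out can be accumulated in log space, so the normalizing constant is not lost. Below is a minimal sketch in R rather than the package's C++. combineMessages is hypothetical and assumes that messages are plain numeric vectors of equal length. In the example, each normalized message has entries 0.01, as in the issue.

```r
# Multiply incoming messages, renormalizing after every factor so the
# running product never underflows. logScale accumulates the log of the
# constants divided out, so sum(raw product) = exp(logScale).
combineMessages <- function(messages) {
  result <- rep(1, length(messages[[1]]))
  logScale <- 0
  for (m in messages) {
    result <- result * m
    s <- sum(result)
    result <- result / s
    logScale <- logScale + log(s)
  }
  list(message = result, logScale = logScale)
}

messages <- replicate(1000, rep(0.01, 100), simplify = FALSE)
naive <- Reduce(`*`, messages)
naive / sum(naive)            # NaN: every entry underflowed to 0
res <- combineMessages(messages)
res$message                   # rep(0.01, 100)
```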

Automate choice of IS distribution

Choosing alpha beforehand is hard for the user. Some ideas:

Combine different values of alpha: importance samples from several proposals can be combined, see http://statweb.stanford.edu/~owen/reports/seis.pdf
Use the range of the score, and pick distributions whose mean scores are spread over that range
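A standard way to combine samples from several proposals is deterministic-mixture weighting, also called the balance heuristic. Each sample is weighted by the target density divided by the average of all proposal densities, with each proposal weighted by its share of the samples. Below is a minimal sketch on a toy problem rather than a factor graph. The background score S is Binomial(20, 0.5), and each alpha tilts it to Binomial(20, p_alpha) with p_alpha = exp(alpha)/(1 + exp(alpha)). tailMIS and all numbers are illustrative assumptions.

```r
# Tail probability P(S >= s0) under the background Binomial(n, 0.5),
# estimated from samples of several tilted proposals (one per alpha)
# combined with deterministic-mixture (balance heuristic) weights.
tailMIS <- function(s0, alphas, nPerAlpha, n = 20) {
  pAlpha <- exp(alphas) / (1 + exp(alphas))  # success probability after tilting
  # Draw nPerAlpha scores from each tilted proposal
  s <- unlist(lapply(pAlpha, function(p) rbinom(nPerAlpha, n, p)))
  # Mixture density of all proposals, each with an equal share of the samples
  mix <- rowMeans(sapply(pAlpha, function(p) dbinom(s, n, p)))
  w <- dbinom(s, n, 0.5) / mix
  mean(w * (s >= s0))
}

set.seed(1)
est <- tailMIS(18, alphas = c(0, log(3), log(9)), nPerAlpha = 2000)
exact <- pbinom(17, 20, 0.5, lower.tail = FALSE)
```

With these weights, no single alpha has to be chosen correctly. A proposal that rarely reaches the threshold simply contributes few non-zero terms.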

tail approximations fail with unnormalized graphs

Tail approximation with importance sampling (and possibly also the saddlepoint approximation) gives wrong results when the foreground and/or background models are unnormalized.
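For small graphs, the problem can be checked by brute force: enumerate every configuration, multiply the factor potentials, and sum the results to get the normalizing constant Z. A normalized log-likelihood is then log f(x) - log Z. Below is a minimal sketch. It assumes, based on the examples in these issues, that a univariate potential is a 1 x varDim matrix indexed as pot[1, x] and that a bivariate potential is indexed as pot[x_a, x_b]. normConst is hypothetical and only feasible when prod(varDim) is small.

```r
# Hypothetical brute-force normalizing constant of a small factor graph.
normConst <- function(varDim, facPot, facNbs, potMap = seq_along(facNbs)) {
  # One row per joint configuration of all variables
  configs <- as.matrix(expand.grid(lapply(varDim, seq_len)))
  val <- rep(1, nrow(configs))
  for (f in seq_along(facNbs)) {
    pot <- facPot[[potMap[f]]]
    nb <- facNbs[[f]]
    # Matrix index into the potential for every configuration at once
    idx <- if (length(nb) == 1) cbind(1, configs[, nb]) else configs[, nb, drop = FALSE]
    val <- val * pot[idx]
  }
  sum(val)
}
```

For example, the one-variable graph from the Rcpp issue above, with facPot = list(matrix(0.5,1,2)), has Z = 1. Giving all potentials 2 instead of 1 would shift every log-score by a constant.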

Cannot be installed on CentOS

Installing on a CentOS server fails with:

g++ -I/com/extra/R/3.1.0/lib64/R/include -DNDEBUG -DDEBUG -I/usr/local/include -I"/home/tm/R/x86_64-unknown-linux-gnu-library/3.1/Rcpp/include" -I"/home/tm/R/x86_64-unknown-linux-gnu-library/3.1/BH/include" -fpic -g -O2 -c DiscreteFactorGraph.cpp -o DiscreteFactorGraph.o
In file included from /usr/lib/gcc/x86_64-redhat-linux/4.4.6/../../../../include/c++/4.4.6/array:35,
from DiscreteFactorGraph.h:21,
from DiscreteFactorGraph.cpp:6:
/usr/lib/gcc/x86_64-redhat-linux/4.4.6/../../../../include/c++/4.4.6/c++0x_warning.h:31:2: error: #error This file requires compiler and library support for the upcoming ISO C++ standard, C++0x. This support is currently experimental, and must be enabled with the -std=c++0x or -std=gnu++0x compiler options.
In file included from DiscreteFactorGraph.cpp:6:
DiscreteFactorGraph.h:148: error: ISO C++ forbids declaration of 'array' with no type
DiscreteFactorGraph.h:148: error: invalid use of '::'
DiscreteFactorGraph.h:148: error: expected ';' before '<' token
DiscreteFactorGraph.h:149: error: ISO C++ forbids declaration of 'array' with no type
DiscreteFactorGraph.h:149: error: invalid use of '::'
DiscreteFactorGraph.h:149: error: expected ';' before '<' token
DiscreteFactorGraph.cpp:835: error: expected constructor, destructor, or type conversion before '<' token
DiscreteFactorGraph.cpp:1207: error: expected '}' at end of input

This can be solved by adding --std=c++0x to PKG_CPPFLAGS in src/Makevars.

Confidence intervals for quantiles estimated with importance sampling

Could bootstrapping be used?
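With importance sampling, a quantile of the target distribution can be read off the weighted empirical CDF. A percentile bootstrap then resamples the (value, weight) pairs. Below is a minimal sketch of that idea, assuming samples x with importance weights w. weightedQuantile and bootQuantileCI are hypothetical names. This is not a validated choice of interval for tail quantiles.

```r
# Quantile of the weighted empirical distribution (self-normalized weights)
weightedQuantile <- function(x, w, prob) {
  o <- order(x)
  cw <- cumsum(w[o]) / sum(w)
  k <- which(cw >= prob)[1]
  if (is.na(k)) k <- length(x)  # guard against rounding when prob is near 1
  x[o][k]
}

# Percentile bootstrap CI: resample (x, w) pairs with replacement
bootQuantileCI <- function(x, w, prob, B = 1000, level = 0.95) {
  est <- replicate(B, {
    i <- sample.int(length(x), replace = TRUE)
    weightedQuantile(x[i], w[i], prob)
  })
  alpha <- (1 - level) / 2
  quantile(est, c(alpha, 1 - alpha), names = FALSE)
}
```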

Avoid redoing sumProduct Algorithm

Commit d636c40 removed the option to calculate the normalizing constant without rerunning the sum-product algorithm. This was done primarily because those functions failed for disconnected graphs. We also want the sum-product implementation itself to decide whether sumProduct needs to be run.

Possible solution:
Keep a private boolean flag indicating whether the internal state (potentials) has changed since sumProduct was last run.

Complications:
The MaxSum algorithm uses the same set of messages for its calculations.
Identify other scenarios where sumProduct needs to be rerun.
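The boolean flag and the MaxSum complication can be handled together by recording which algorithm last filled the messages. That single state value is reset when the potentials change and is set by each algorithm. Below is a minimal R sketch using closures over an environment. All names are hypothetical, and in the package this state would live in the C++ module.

```r
# Hypothetical sketch: remember which algorithm last filled the messages.
# "none" plays the role of the proposed "potentials changed" flag, and
# "maxSum" covers MaxSum reusing the same message storage.
newModelState <- function() {
  state <- new.env()
  state$messagesFrom <- "none"   # one of "none", "sumProduct", "maxSum"
  state$sumProductRuns <- 0      # counter, for illustration only

  ensureSumProduct <- function() {
    if (state$messagesFrom != "sumProduct") {
      # ... the actual sum-product message passing would run here ...
      state$sumProductRuns <- state$sumProductRuns + 1
      state$messagesFrom <- "sumProduct"
    }
  }

  list(
    setPotentials = function(potentials) {
      state$potentials <- potentials
      state$messagesFrom <- "none"   # any change invalidates the messages
    },
    runMaxSum = function() state$messagesFrom <- "maxSum",
    # Queries that need sum-product messages call ensureSumProduct() first
    normConst = function() ensureSumProduct(),
    runs = function() state$sumProductRuns
  )
}
```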

random NAs in tailSaddle

The saddlepoint approximation method sometimes returns NAs. A reproducible example will follow.

simulate masks stats::simulate

Our simulate function masks stats::simulate, so simulate no longer works for other model classes such as lm:

library(dgRaph)
dat <- data.frame(x = 1:50, y = rnorm(50,1:50,2))
mylm <- lm(formula = y ~ x, data = dat)
simulate(mylm, nsim = 10) # Does not work
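The usual way to avoid masking is to provide an S3 method for the existing stats::simulate generic, rather than a new function named simulate. The generic then dispatches on the object's class. Below is a minimal sketch, assuming that dfg objects carry the class "dfg". The toy object with a single probability vector stands in for a real factor graph. In the package, the NAMESPACE file would register the method with S3method(simulate, dfg).

```r
# S3 method for the stats::simulate generic; its arguments start with
# the generic's (object, nsim, seed, ...).
simulate.dfg <- function(object, nsim = 1, seed = NULL, ...) {
  if (!is.null(seed)) set.seed(seed)
  # Toy stand-in: sample a single variable from a probability vector
  data.frame(x = sample.int(length(object$probs), nsim,
                            replace = TRUE, prob = object$probs))
}

toy <- structure(list(probs = c(0.2, 0.8)), class = "dfg")
simulate(toy, nsim = 5)        # dispatches to simulate.dfg

dat <- data.frame(x = 1:50, y = rnorm(50, 1:50, 2))
mylm <- lm(y ~ x, data = dat)
simulate(mylm, nsim = 10)      # still reaches the lm method in stats
```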

More checks on dataList input

If soft-evidence input given through dataList does not have the correct length, segfaults can occur. More to come...
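Validating dataList in R before it reaches the C++ code turns such segfaults into ordinary errors. Below is a minimal sketch. It assumes, without confirmation from this issue, that dataList holds one matrix per variable, with one row per observation and varDim[i] columns of soft-evidence values. checkDataList is a hypothetical name.

```r
# Hypothetical input check for soft-evidence data
checkDataList <- function(dataList, varDim) {
  if (!is.list(dataList) || length(dataList) != length(varDim))
    stop("dataList must be a list with one element per variable")
  nObs <- NULL
  for (i in seq_along(dataList)) {
    m <- dataList[[i]]
    if (!is.matrix(m) || !is.numeric(m))
      stop(sprintf("dataList[[%d]] must be a numeric matrix", i))
    if (ncol(m) != varDim[i])
      stop(sprintf("dataList[[%d]] has %d columns, expected %d", i, ncol(m), varDim[i]))
    if (is.null(nObs)) nObs <- nrow(m)
    if (nrow(m) != nObs)
      stop("all elements of dataList must have the same number of rows")
  }
  invisible(TRUE)
}
```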

Interrupt long-running loops

Call checkInterrupt in long-running loops so that the user can interrupt them.