
aemcmc's Issues

Add support for closed-form posteriors

We need to copy over the kanren relations in symbolic_pymc so that we can sample directly from closed-form posteriors—when possible. See the basic example here.

Within the general process of constructing a sampler for a model graph (see #3), we could apply these rewrites as well, and—in some cases—simply return a graph of the joint posterior, instead of a sampler graph.
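As a sketch of what one of these relations might look like, here is a beta-binomial conjugacy relation written with kanren over plain tuples rather than actual Aesara graphs; the term names are made up for illustration and none of this is AeMCMC's actual API:

from kanren import eq, lall, run, var


def beta_binomial_conjugateo(prior, likelihood, y_obs, posterior):
    """Relate a beta prior and a binomial likelihood to the beta posterior."""
    alpha, beta, N, p = var(), var(), var(), var()
    return lall(
        eq(prior, ("beta", alpha, beta)),
        eq(likelihood, ("binomial", N, p)),
        eq(posterior, ("beta", ("add", alpha, y_obs), ("add", beta, ("sub", N, y_obs)))),
    )


# "Solve" for the posterior term given the prior and likelihood terms
q = var()
print(run(1, q, beta_binomial_conjugateo(("beta", 1.0, 1.0), ("binomial", 10, "p"), 3, q)))

The real relations would unify against (etuplized) Aesara graphs instead of plain tuples, as the symbolic_pymc relations did.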

Design robust calibration tests for samplers

aemcmc currently does not implement calibration tests for its samplers (i.e. it does not check that the samplers generate samples from the correct distribution), and aehmc's tests are very flaky and can be made to pass or fail with different RNG seeds. While this is still a very open area of research, making sure that the samplers aemcmc builds generate correct samples is critical.

This blog post summarizes the situation fairly well and links to the relevant literature.

My current understanding from the literature is that the existing solutions are hypothesis tests which test hypotheses that we don't expect to be true. There seems to be (unsurprisingly) little hope that we can devise tests that require absolutely no supervision. However, there are a few things we could do to automate some of Gelman's suggestions in the post above. The general idea would be to have some rough tests in CI that catch obvious miscalibration (check that the chain has moved on a simple problem, check the Monte Carlo error, etc.) and to give users some tools to evaluate the calibration of the sampler on their use case.
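For example, a rough CI smoke test along those lines (just a sketch, not an existing AeMCMC utility; the effective-sample-size handling here is deliberately crude) could look like:

import numpy as np


def check_moments(samples, true_mean, true_var, n_sigma=4.0):
    """Cheap miscalibration check for draws from a problem with a known posterior."""
    samples = np.asarray(samples)
    # The chain should at least have moved away from its initial value.
    assert np.unique(samples).size > 1, "the chain did not move"
    # Crude Monte Carlo error that pretends the draws are independent; a real
    # test would plug in a proper effective-sample-size estimate instead.
    mc_error = np.sqrt(true_var / samples.size)
    assert abs(samples.mean() - true_mean) < n_sigma * mc_error

Running such a check on a simple conjugate problem (where the posterior mean and variance are known exactly) would catch gross miscalibration without pretending to be a rigorous hypothesis test.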

Implement sampler for regressions with a `horseshoe+` prior

We could implement the Gibbs sampler described in [1] for the parameters of the Horseshoe+ prior.

[1] Makalic, Enes, and Daniel F. Schmidt. “High-Dimensional Bayesian Regularised Regression with the BayesReg Package.” ArXiv Preprint ArXiv:1611.06649, 2016.

Rewrite the Horseshoe Gibbs sampler to use our rewrite system

The Gibbs sampler for the horseshoe prior does not work on the conditional distributions found in the model, but on the conditional distributions of a transformed version of the model.

In particular, it exploits the fact that a half-Cauchy distribution can be interpreted as a mixture of inverse-gamma distributions, and that if $Z \sim \operatorname{InvGamma}(1, a)$ then $Z$ has the same distribution as $1 / W$ with $W \sim \operatorname{Exp}(a)$.
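Concretely, the decomposition used here is Makalic & Schmidt's auxiliary-variable representation: if $\nu \sim \operatorname{InvGamma}(1/2, 1/A^2)$ and $\lambda^2 \mid \nu \sim \operatorname{InvGamma}(1/2, 1/\nu)$, then $\lambda \sim \operatorname{C}^{+}(0, A)$. A quick NumPy/SciPy sanity check of that claim (not part of AeMCMC, and independent of how the relation ends up being implemented):

import numpy as np
from scipy import stats

rng = np.random.default_rng(32098)
A, n = 1.5, 200_000

# nu ~ InvGamma(1/2, 1/A^2), drawn as 1 / Gamma(shape=1/2, scale=A^2)
nu = 1.0 / rng.gamma(0.5, scale=A**2, size=n)
# lambda^2 | nu ~ InvGamma(1/2, 1/nu)
lam = np.sqrt(1.0 / rng.gamma(0.5, scale=nu))

# Compare against direct half-Cauchy(scale=A) draws
ref = stats.halfcauchy(scale=A).rvs(size=n, random_state=rng)
print(stats.ks_2samp(lam, ref))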

We should instead implement these relations and update construct_samplers so it finds this structure and builds the sampling steps for each variable in the original model expression.

Related to #46, #65

Add relations for the Student t distributions

  1. If $Z \sim \operatorname{Norm}(0,1)$ and $\chi^2 \sim \operatorname{ChiSquared}(n)$, then $Y = \frac{Z}{\sqrt{\chi^2 / n}} \sim \operatorname{T}(n)$. Falls under #65 (a quick numerical check is sketched below).
  2. If $X \sim \operatorname{Norm}(\mu, \sigma)$ with $\tau = 1 / \sigma^2 \sim \operatorname{Gamma}(\alpha, \beta)$, then $X$ marginally follows a Student's t distribution. This one falls under #49.
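A quick Monte Carlo sanity check of relation 1 (just a NumPy/SciPy sketch, not an AeMCMC test):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, df = 500_000, 7

# Z / sqrt(chi2 / df) should follow a Student's t distribution with `df` degrees of freedom
z = rng.standard_normal(n)
chi2 = rng.chisquare(df, size=n)
t_samples = z / np.sqrt(chi2 / df)

print(stats.kstest(t_samples, stats.t(df).cdf))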

Add examples to the README

A couple of examples to illustrate the capabilities of the library. We can begin with:

  • Closed-form posterior (a rough sketch is given below)
  • Sparse regression
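For instance, the closed-form posterior example could be built around the beta-binomial relation documented elsewhere in these issues. A rough sketch (the exact return signature of construct_sampler may differ between versions):

import aesara.tensor as at
import aemcmc

srng = at.random.RandomStream(0)

alpha_tt = at.scalar("alpha")
beta_tt = at.scalar("beta")
p_rv = srng.beta(alpha_tt, beta_tt, name="p")
Y_rv = srng.binomial(10, p_rv, name="Y")

y_vv = Y_rv.clone()
y_vv.name = "y"

sample_steps, updates, initial_values = aemcmc.construct_sampler({Y_rv: y_vv}, srng)

The sparse regression example could reuse the horseshoe negative-binomial regression model that appears in several other issues.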

Update python versions in tests

Tests are currently run with Python 3.6 and 3.7. However, Python 3.6 has reached end of life, and all versions up to 3.10 should be supported.

Create a framework for matching models to samplers

We need a framework for mapping samplers to Aesara model graphs.

In general, one of the main interfaces to the sampler implementations in this project would accept a model graph (i.e. an Aesara graph that represents the statistical model) and return a graph that represents the sampler steps that generate posterior samples for the model.

For example, our current Horseshoe Gibbs sampler implementation applies to normal regression models with Horseshoe prior regression parameters. We want to be able to parse a given model graph and determine whether or not those elements are present so that the Gibbs sampler can be proposed/applied.

The act of parsing such models is easily answered by Aesara's basic rewriting and unification functionality, but we need a good way to organize and deploy the process. For instance, how do we want to store the mappings between a unification "pattern" and a set of samplers?
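One possible (purely hypothetical) organization is a small registry that pairs a unification pattern describing a model fragment with a constructor for the corresponding sampler steps; none of the names below exist in AeMCMC, this is only a sketch of the bookkeeping:

from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class SamplerPattern:
    name: str
    # An etuple-like template that candidate model graphs are unified against.
    template: Tuple
    # Called with the substitution produced by unification; returns the graphs
    # of the sampling steps for the matched variables.
    construct_steps: Callable[[Dict], Dict]


SAMPLER_DB: Dict[str, SamplerPattern] = {}


def register(pattern: SamplerPattern) -> None:
    """Add a pattern/sampler pair to the database consulted during rewriting."""
    SAMPLER_DB[pattern.name] = pattern

Whether something like this lives in a plain dict or in Aesara's rewrite-database machinery is exactly the open question.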

Also, this framework needs to be flexible enough to identify and apply rewrites that could make a model graph amenable to samplers. The primary example is our current normal scale-mixture expansion for the negative-binomial (via the Polya-gamma): after identifying the Horseshoe regression portion of the model (e.g. X.dot(beta_rv), where beta_rv = at.random.normal(0, eta_rv * lambda2_rv) and eta_rv = at.random.halfcauchy(...), etc.), the negative-binomial observation model shouldn't prevent the Horseshoe Gibbs sampler from being applied, because it can be converted into a normal scale-mixture.

This framework will clearly involve a rewrite process similar to AePPL, but there are some unique organizational aspects we need to address as well.

Expand scale-mixtures

After #45, we need to add a scale-mixture-expansion step to the sampler-generating process.

We can start by implementing the Polya-gamma expansion for negative-binomials and Bernoulli random variables. This will work together to produce nbinom_normal_posterior and bern_normal_posterior in #45, after expansion and subsequent detection of the resulting normal-normal posterior.
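For reference, both expansions rest on the Pólya-gamma identity of Polson, Scott & Windle (2013): for $b > 0$,

$$ \frac{\left(e^{\psi}\right)^{a}}{\left(1 + e^{\psi}\right)^{b}} = 2^{-b} e^{\kappa \psi} \int_{0}^{\infty} e^{-\omega \psi^{2} / 2}\, p(\omega \mid b, 0)\, \mathrm{d}\omega, \qquad \kappa = a - b/2, $$

where $p(\omega \mid b, 0)$ is the $\operatorname{PG}(b, 0)$ density. Conditional on $\omega$, the negative-binomial and Bernoulli likelihoods become Gaussian in $\psi$, which is what exposes the normal-normal structure mentioned above.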

On a related note, we'll need to start walking through "graph rewrite space" in a more reasonable way. For instance, a scale-mixture expansion might aid in the construction of a sampler where one wasn't possible before (e.g. we don't have a sampler for the unexpanded term in our "database"). On the other hand, when there is a sampler for a term that can be expanded, we will need to consider the samplers resulting from both expanding and not expanding.

`deterministic_vars` and `updates` should be optional to initialize `ModelInfo`

Indeed, these are not needed for many models; consider for instance the following toy example:

import aesara.tensor as at
from aesara.tensor.random import RandomStream

from aemcmc.utils import ModelInfo

srng = RandomStream(0)
a_rv = srng.normal(0, 1, name="a")
b_rv = srng.normal(a_rv, 1, name="b")

a_vv = a_rv.clone()
a_vv.name = "a_vv"

b_at = at.scalar(name="b_at")

model = ModelInfo((b_rv,), {a_rv: a_vv, b_rv: b_at}, (), ())

Standardize the sampler-constructor interface

Currently, this library provides a few functions that construct samplers for very specific models given the inputs/parameters of said models. We should standardize these constructor functions under a function interface that takes model graphs as inputs (i.e. graphs that represent the models supported by the constructor functions).

This is a first step toward the more general case of #3.

Again, since our current constructor functions apply to very specific types of models, we can keep things simple and employ some very basic/restricted unification/pattern matching and generalize from there.

For example:

import numpy as np
import aesara
import aesara.tensor as at


srng = at.random.RandomStream(seed=2309)

X = at.as_tensor(np.random.normal(size=(100, 10)))

# Horseshoe sub-graph
tau_rv = srng.halfcauchy(size=1)
lambda_rv = srng.halfcauchy(size=10)
beta_rv = srng.normal(0, lambda_rv * tau_rv, size=10)

# The "observation" model
eta = X @ beta_rv
p = at.sigmoid(-eta)
Y_rv = srng.nbinom(10, p)


# `horseshoe_nbinom_gibbs` is the constructor interface proposed in this issue;
# `obs` (the observed counts) and `n_samples` are assumed to be supplied by the user.
obs = at.lvector("obs")
n_samples = at.iscalar("n_samples")

gibbs_sampler = horseshoe_nbinom_gibbs(Y_rv, obs, n_samples)

where horseshoe_nbinom_gibbs would extract the X, beta_rv, and any other necessary components and subsequently perform the same actions as aemcmc.negative_binomial_horseshoe_gibbs.

Add reversible jump MCMC

Reversible Jump MCMC algorithms are a generalization of the Rosenbluth-Metropolis-Hastings algorithm to the case where the target density does not have a fixed number of dimensions. Since Aesara can handle variables with varying dimensions, we should be able to implement such algorithms in aehmc.

As a motivating example we could use the coal-mining disaster dataset and let the number of change points be Poisson-distributed, as in the reference below.

References

https://www2.stat.duke.edu/~scs/Courses/Stat376/Papers/TransdimMCMC/GreenRevJump.1995.pdf

Lift random variables through mixtures when looking for closed-form posteriors

Copying @brandonwillard's message from #4. If we lift random/measurable variables through mixtures, we can enable some important closed-form posterior opportunities.

For example:

import aesara
import aesara.tensor as at


srng = at.random.RandomStream(4238)

I_rv = srng.bernoulli(0.5, name="I")

Z_1_rv = srng.gamma(10, 100, name="Z_1")
Z_2_rv = srng.gamma(1, 1, name="Z_2")

Z_rv = at.stack([Z_1_rv, Z_2_rv])

# Observation model
Y_rv = srng.poisson(Z_rv[I_rv], name="Y")

Conjugate updates are available between Y_rv and the two Z_*_rv, conditional on the values of I_rv.

The model after lifting should be equivalent to the following:

Z_1_new_rv = srng.poisson(Z_1_rv, name="Z_1_new")
Z_2_new_rv = srng.poisson(Z_2_rv, name="Z_2_new")

# New observation model
Y_new_rv = at.stack([Z_1_new_rv, Z_2_new_rv])[I_rv]
Y_new_rv.name = "Y_new"

The Z_*_new_rv terms are now amenable to the Poisson-gamma conjugate rewrites.

Add a Metropolis-within-Gibbs sampler

Following #16 we should implement a Metropolis-within-Gibbs sampler similar to PyMC's (algorithms listed in this file) that uses NUTS to update the RVs with continuous support and Gibbs samplers (or other methods) to update the RVs with discrete support.

This way we can start attracting users who will interact with a generic build_sampler (name tbd) function, and we will have a first performance benchmark. It will be a big step towards #3.

Add ratio distributions relations

Ratio distributions are distributions of random variables $Z$ that can be formed as the ratio $Z = X/Y$ of two other random variables $X$ and $Y$. Common examples are the Cauchy distribution (the ratio of two independent standard normal variables), Student's t distribution, and the F distribution.
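In the rule notation used elsewhere in these issues, two such relations (for independent numerators and denominators) are:

$$ \frac{ X \sim \operatorname{N}\left(0, 1\right), \quad Y \sim \operatorname{N}\left(0, 1\right), \quad Z = X / Y }{ Z \sim \operatorname{Cauchy}\left(0, 1\right) } $$

$$ \frac{ U \sim \chi^2_{n}, \quad V \sim \chi^2_{m}, \quad Z = \frac{U / n}{V / m} }{ Z \sim \operatorname{F}\left(n, m\right) } $$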

Add a function to change the scan order of the sampling steps returned by `construct_sampler`

The order in which the sampling steps are chained together can influence the sampling results, so we should allow users to change the scan order of the samplers we return.

After #76 it will be possible to inspect the sampler graph and identify the sampling steps according to their type. We can then rewire the graphs to change the order in which the steps are applied.

More generally, we will need to write a series of operators that act on samplers: we may want to change the scan order, randomize the scan order, merge two samplers (e.g. with different scan orders for symmetric scan), vectorize the sampler, draw several samples using scan, etc. Related to #80.

Logistic regression with time-varying coefficients: cannot construct sampler

Description of your problem or feature request

I want to perform a classification task that consists of predicting whether a document $i$ published at time $t_i$ belongs to a certain class ($y_i=1$) or not ($y_i=0$). The content of document $i$ is given by a bag of words $b_{ij}$. The predictive features are encoded into a sparse tensor $x_{ijt} = b_{ij} \delta_{t,t_i}$. For this task I am considering a logistic regression with time-varying coefficients, such that the coefficients evolve through a random walk:

$$\text{logit} (p_{i}) = \sum_{jt} x_{ijt} \beta_{jt}$$

$$y_{i} \sim \text{Bernoulli}(p_i)$$

$$\beta_{j,t+1} \sim \text{Normal}(\beta_{j,t},\sigma_j^{-1})$$

$$\beta_{j,0} \sim \text{Normal}(0,\sigma_{j,0}^{-1})$$

$$\sigma_j \sim \text{Exponential}(1)$$

$$\sigma_{j,0} \sim \text{Exponential}(1)$$

For the sake of simplicity below I assume $\sigma_{j,0}=\sigma_j$, although I do not see any compelling reason to do so.

However:

  • I cannot construct a sampler for this model (see details below)
  • I wonder if there could be a more efficient implementation anyway: $x_{ijt}$ is very sparse, and at.tensordot(X, beta_t_rv, 2) involves too many needless operations. I would not have done that with Stan but in this case I found it easier to write the model this way.

Please provide a minimal, self-contained, and reproducible example.

As a clueless user, I tried to build a sampler based on my generative model by looking at other examples, which means I could be doing something completely idiotic!
Here is what I came up with:

import numpy as np

import aesara
import aesara.tensor as at
from aemcmc.basic import construct_sampler
from aesara.tensor.random.utils import RandomStream

def logistic_fit(X_val, y_val):
    N, M, T = X_val.shape

    srng = RandomStream(0)
    X = at.tensor3("X")

    sigma_rv = srng.exponential(1, size=X.shape[1])
    beta_t_rv = at.cumsum(srng.normal(0, 1/sigma_rv, size=(X.shape[1],X.shape[2])), axis=1)

    eta = at.tensordot(X, beta_t_rv, 2)
    p = at.sigmoid(-eta)
    Y_rv = srng.bernoulli(p, name="Y")

    y_vv = Y_rv.clone()
    y_vv.name = "y"

    sample_vars = [sigma_rv, beta_t_rv]

    sample_steps, updates, initial_values = construct_sampler({Y_rv: y_vv}, srng)

# X_val = np.load("X_val.npy")
# y_val = np.load("y_val.npy")

X_val = np.zeros((1000, 50, 10))
y_val = np.zeros(1000)

beta = logistic_fit(X_val, y_val)

Although thus far the script does not actually use the data, if at some point you want to try with actual data, download and unzip these files (X_val.npy and y_val.npy) to the folder the script is executed from: data.zip

However, this crashes:

Please provide the full traceback of any errors.

ERROR (aesara.graph.rewriting.basic): Rewrite failure due to: local_elemwise_dimshuffle_subsume
ERROR (aesara.graph.rewriting.basic): node: Elemwise{true_div,no_inplace}(InplaceDimShuffle{x}.0, exponential_rv{0, (0,), floatX, False}.out)
ERROR (aesara.graph.rewriting.basic): TRACEBACK:
ERROR (aesara.graph.rewriting.basic): Traceback (most recent call last):
  File "/Users/acristia/anaconda3/lib/python3.8/site-packages/aesara/graph/rewriting/basic.py", line 1933, in process_node
    replacements = node_rewriter.transform(fgraph, node)
  File "/Users/acristia/anaconda3/lib/python3.8/site-packages/aesara/graph/rewriting/basic.py", line 1092, in transform
    return self.fn(fgraph, node)
  File "/Users/acristia/anaconda3/lib/python3.8/site-packages/aemcmc/rewriting.py", line 345, in local_elemwise_dimshuffle_subsume
    new_op = SubsumingElemwise(new_inputs, new_outputs, inline=True)
  File "/Users/acristia/anaconda3/lib/python3.8/site-packages/aemcmc/rewriting.py", line 189, in __init__
    OpFromGraph.__init__(self, inputs, outputs, *args, **kwargs)
  File "/Users/acristia/anaconda3/lib/python3.8/site-packages/aesara/compile/builders.py", line 343, in __init__
    raise TypeError(f"Constants not allowed as inputs; {i}")
TypeError: Constants not allowed as inputs; TensorConstant{1}

Traceback (most recent call last):
  File "examples/gibbs_sample.py", line 61, in <module>
    beta = logistic_fit(X_val, y_val)
  File "examples/gibbs_sample.py", line 26, in logistic_fit
    sample_steps, updates, initial_values = construct_sampler({Y_rv: y_vv}, srng)
  File "/Users/acristia/anaconda3/lib/python3.8/site-packages/aemcmc/basic.py", line 108, in construct_sampler
    raise NotImplementedError(
NotImplementedError: Could not find a posterior samplers for {exponential_rv{0, (0,), floatX, False}.out, normal_rv{0, (0, 0), floatX, False}.out}

Versions and main components

  • Aesara version: 2.8.9
  • Aemcmc version: 0.06
  • Python version: 3.8.10
  • Operating system: macOS 10.15.4
  • How did you install Aesara: pip

Implement sampler for logistic regularized regression

See section 2.2 in [1]. We should be able to reuse most of the negative binomial regularized regression's implementation.

[1] Makalic, Enes, and Daniel F. Schmidt. “High-Dimensional Bayesian Regularised Regression with the BayesReg Package.” ArXiv Preprint ArXiv:1611.06649, 2016.

Generalize `horseshoe_nbinom_gibbs`

Currently, horseshoe_nbinom_gibbs fixes a prior for the beta intercept term, but we need all choices like that to be fundamentally configurable, and, eventually, specified exclusively by a given model graph (e.g. once #16 is in place).

Instead, let's make horseshoe_nbinom_gibbs work without an explicit intercept assumption (i.e. only X.dot(beta)). This will broaden the applicability of the sampler and should simplify the situation for #16.

Also, if there's an accuracy/performance cost to this change, that would be a good thing to investigate relative to our model graph-based goals (e.g. how could we reproduce these benefits without compromising the desired interface/functionality).

Add support for the Poisson-Gamma closed-form posterior

Consider the following observation model:

import aesara
import aesara.tensor as at

srng = at.random.RandomStream(4238)

alpha = at.scalar()
beta = at.scalar()
observed = at.scalar()

Z_rv = srng.gamma(alpha, beta, name="Z_1")
Y_rv = srng.poisson(Z_rv, name="Y")

Z_rv is a conjugate prior of Y_rv, and if the value of the latter is observed, the model is equivalent to the following:

Z_rv = srng.gamma(alpha + observed, beta + 1)
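In the rule notation proposed elsewhere in these issues (with $\beta$ the rate parameter), the relation reads:

$$ \frac{ Z \sim \operatorname{Gamma}\left(\alpha, \beta\right), \quad Y \sim \operatorname{Pois}\left(Z\right) }{ \left(Z \mid Y=y\right) \sim \operatorname{Gamma}\left(\alpha + y, \beta + 1\right) } $$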

Implement a function that constructs FFBS samplers for model graphs

With the recently added aeppl mixture features and an HMM state sequence like the one implemented in aeppl-hmm, we're now able to make a function (e.g. create_ffbs_sampler) that constructs forward-filtering backward-sampling (FFBS) sample-step graphs given the graph of an HMM model.

To do this, we'll need the intermediate forms provided by aeppl, and, in particular, the Mixture Op. Just as aeppl does to produce a log-probability graph, we can use the same steps to generate the intermediate form, search for eligible Mixture and HMM state sequence nodes, extract the necessary FFBS parameters (i.e. transition matrices, initial state probabilities, state sequence), and use those to construct an FFBS step graph.

Make `horseshoe_nbinom_gibbs` return a kernel

The current version of horseshoe_nbinom_gibbs includes a scan loop to get samples using the kernel defined inside the function. However, to be able to compose this kernel with others we need to be able to build its graph independently from any sampling loop.

Add relation between different parametrizations?

Examples that come to mind are the Binomial distribution and the Wald distribution (but there are so many others, see the Leemis chart for more examples). We will need to add new RV types for these distributions.

This could be very useful, for instance the Wald distribution in its usual form can be reparametrized to a form that belongs to the scale family, thus enabling new rewrites.

The "X is a special case of Y when ..." fall under this category, the difference being we don't need to introduce new types.

Add standard HMC/NUTS defaults and options

Aside from constructing complete sampler loops per #80, we need to set some standard HMC/NUTS defaults (e.g. mass matrix adaptation) and clarify the settings/options interface to the sampler construction process.

Add a utility function to sample using `scan`

Using the sampling steps built by AeMCMC in a scan loop is not straightforward:

from typing import List

import aesara
import aesara.tensor as at
import aemcmc
from aesara.tensor.var import TensorVariable

# `Y_rv`, `y_tt` and `srng` are assumed to be defined as in the other
# examples in these issues.
sample_steps, sample_updates, initial_values = aemcmc.construct_sampler(
    {Y_rv: y_tt}, srng
)

# The random variables we want posterior samples for (assumed given).
to_sample_rvs: List[TensorVariable]
inputs = [initial_values[rv] for rv in to_sample_rvs]
outputs = [sample_steps[rv] for rv in to_sample_rvs]

def step_fn(*values):
    from aesara.compile.function.pfunc import rebuild_collect_shared

    vv_to_values = {inputs[i]: val for i, val in enumerate(values)}

    _, new_values, [_, new_updates, _, _] = rebuild_collect_shared(
        outputs, inputs=inputs, replace=vv_to_values, updates=sample_updates
    )

    return new_values, new_updates

n_samples = at.iscalar("n_samples")
outputs, updates = aesara.scan(step_fn, outputs_info=inputs, n_steps=n_samples)

sample_fn = aesara.function(inputs + [n_samples], outputs, updates=updates)

but easily generalizable. We should implement a utility function, e.g. aemcmc.sampling_loop, which, given the outputs of construct_sampler and a number of iterations n_samples, returns a graph that generates n_samples samples.
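A rough sketch of such a helper, assuming the return values of construct_sampler shown above (the name sampling_loop and the exact signature are only suggestions from this issue):

from typing import Dict, List, Tuple

import aesara
from aesara.compile.function.pfunc import rebuild_collect_shared
from aesara.tensor.var import TensorVariable


def sampling_loop(
    sample_steps: Dict[TensorVariable, TensorVariable],
    sample_updates: Dict,
    initial_values: Dict[TensorVariable, TensorVariable],
    to_sample_rvs: List[TensorVariable],
    n_samples: TensorVariable,
) -> Tuple[List[TensorVariable], Dict]:
    """Chain the single-draw sampling steps into a `scan` loop of `n_samples` draws."""
    inputs = [initial_values[rv] for rv in to_sample_rvs]
    outputs = [sample_steps[rv] for rv in to_sample_rvs]

    def step_fn(*values):
        vv_to_values = dict(zip(inputs, values))
        _, new_values, [_, new_updates, _, _] = rebuild_collect_shared(
            outputs, inputs=inputs, replace=vv_to_values, updates=sample_updates
        )
        return new_values, new_updates

    return aesara.scan(step_fn, outputs_info=inputs, n_steps=n_samples)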

Use rules notation to describe the beta-binomial relation

The rule is currently described as follows in the docstring:

$$ \begin{align*} p &\sim \operatorname{Beta}\left(\alpha, \beta\right)\\ Y &\sim \operatorname{Binom}\left(N, p\right) \end{align*} $$

If we observe $Y=y$, then $p$ follows a beta distribution:

$$ p \sim \operatorname{Beta}\left(\alpha + y, \beta + N - y\right) $$

Proposed change

$$ \frac{ Y \sim \operatorname{Binom}\left(N, p\right), \quad p \sim \operatorname{Beta}\left(\alpha, \beta\right) }{ \left(p|Y=y\right) \sim \operatorname{Beta}\left(\alpha+y, \beta+N-y\right) } $$

The mathematical expression of the relation in `scale_loc_transform` does not correspond to this transform

The current docstring contains the following relation:

$$ \begin{equation*} \frac{ X \sim \operatorname{N}\left(\mu_x, \sigma_x^2\right), \quad Y \sim \operatorname{N}\left(\mu_y, \sigma_y^2\right), \quad X + Y = Z }{ Z \sim \operatorname{N}\left(\mu_x + \mu_y, \sigma_x^2 + \sigma_y^2\right) } \end{equation*} $$

when we want:

$$ \begin{equation*} \frac{ Y \sim \operatorname{P}(0, 1), \quad X = \mu + \sigma Y }{ X \sim \operatorname{P}\left(\mu, \sigma\right) } \end{equation*} $$

`construct_sampler` does not support transformed observables

It looks like we need to convert model graphs into AePPL IR and operate on those, because construct_sampler/construct_ir_fgraph only supports observed RandomVariables and not general MeasurableVariables.

An example:

import aesara.tensor as at
import aemcmc

srng = at.random.RandomStream(23920)

X_rv = srng.normal(0, 1, name="X")
Y_rv = 1 + X_rv

obs_rvs_to_values = {Y_rv: at.scalar("y")}

sample_steps, updates, initial_values, nuts_parameters = aemcmc.construct_sampler(
    obs_rvs_to_values, srng
)
aemcmc/basic.py:40: in construct_sampler
    fgraph, obs_rvs_to_values, memo, new_to_old_rvs = construct_ir_fgraph(
aemcmc/rewriting.py:91: in construct_ir_fgraph
    obs_rvs_to_values = {memo[k]: v for k, v in obs_rvs_to_values.items()}
aemcmc/rewriting.py:91: in <dictcomp>
    obs_rvs_to_values = {memo[k]: v for k, v in obs_rvs_to_values.items()}
E   KeyError: Elemwise{add,no_inplace}.0
