
pyro's People

Contributors

activatedgeek, ae-foster, ahmadsalim, alicanb, dustinvtran, dwd31415, eb8680, fehiepsi, fritzo, gokceneraslan, jburroni, jpchen, karalets, martinjankowiak, neerajprad, null-a, optimuslime, paddyhoran, riversdark, robsalomone, rohitsingh0812, seiyab, simeneide, ssnl, stefanwebb, tristandeleu, vishwakftw, wsgharvey, yassersouri, yulkang


pyro's Issues

Best practice for optimizing loss functions in pyro

I'm currently optimizing my loss functions using opt_eig_ape_loss:

import torch
import pyro.infer.util
from pyro import poutine


def opt_eig_ape_loss(design, loss_fn, num_samples, num_steps, optim,
                     return_history=False, final_design=None,
                     final_num_samples=None):

    if final_design is None:
        final_design = design
    if final_num_samples is None:
        final_num_samples = num_samples

    params = None
    history = []
    for step in range(num_steps):
        if params is not None:
            pyro.infer.util.zero_grads(params)
        # Capture the param sites touched by the loss so we can optimize them.
        with poutine.trace(param_only=True) as param_capture:
            agg_loss, loss = loss_fn(design, num_samples)
        params = set(site["value"].unconstrained()
                     for site in param_capture.trace.nodes.values())
        if torch.isnan(agg_loss):
            raise ArithmeticError("Encountered NaN loss in opt_eig_ape_loss")
        agg_loss.backward()
        if return_history:
            history.append(loss)
        optim(params)

    # Final evaluation, typically with a larger number of samples.
    _, loss = loss_fn(final_design, final_num_samples, evaluation=True)
    if return_history:
        return torch.stack(history), loss
    else:
        return loss

This seems like something that might already belong in core Pyro. Alternatively, could it be done better using existing Pyro tools?
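For comparison, here is a minimal sketch (not code from this repo) of the same optimize-a-loss pattern using a plain torch.optim optimizer, with a toy quadratic standing in for loss_fn:

```python
import torch

# Toy quadratic standing in for the (agg_loss, loss) pair that
# loss_fn(design, num_samples) returns in opt_eig_ape_loss.
target = torch.tensor([1.0, -2.0])
theta = torch.zeros(2, requires_grad=True)

def loss_fn():
    return ((theta - target) ** 2).sum()

opt = torch.optim.Adam([theta], lr=0.1)
history = []
for step in range(500):
    opt.zero_grad()          # analogous to pyro.infer.util.zero_grads(params)
    loss = loss_fn()
    if torch.isnan(loss):
        raise ArithmeticError("Encountered NaN loss")
    loss.backward()
    opt.step()               # analogous to optim(params)
    history.append(loss.item())
```

The main difference from the Pyro version is that the set of parameters is known up front, rather than being discovered by tracing param sites on each step.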

Syntax

I believe the Pyro API has changed (e.g. plate) since I created my fork.

Reorganize `pyro.contrib.glmm`

The status of pyro.contrib.glmm is halfway between example and contribution. We should either move it to examples, upgrade it to a proper contrib, or (most likely) split the code. For instance, the models could remain in glmm, while the critics and guides move to examples.contrib.oed.

Better function names and docstrings

The code is littered with legacy names: naive_rainforth, barber_agakov, etc. We should bring the names into line with our submission. We also need to significantly write/update the docstrings.

Pytorch version

We need to ensure the code runs against pytorch==1.0, not pytorch==0.4.0.

Clean up examples

The pyro.contrib.oed examples need to be tidied up:

  • we have some old, broken examples
  • we have 'productionized' experiment code for the paper, which makes sense for running multiple experiments (pickle the output, write a separate plotting script, etc.) but is not exactly readable. On the other hand, it is important for reproducibility.
  • even the comprehensible examples, like location, lack explanation

Write. Some. Tests.

There is a shocking lack of tests in the current module. Fortunately, the test cases already exist in eig_estimation_benchmarking and simply need to be ported/modified to become actual software tests. We should also review the state of tests for glmm.
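As a rough illustration of the kind of port involved, here is a hypothetical smoke test in pytest style (toy_eig_estimator is a placeholder, not a function from the repo):

```python
import torch

# Placeholder estimator: a real test would call one of the module's EIG
# estimators with a small design and sample count.
def toy_eig_estimator(design, num_samples):
    return design.abs().mean() / num_samples

def test_eig_is_finite_and_nonnegative():
    est = toy_eig_estimator(torch.ones(3, 2), num_samples=100)
    assert torch.isfinite(est)
    assert est >= 0.0

test_eig_is_finite_and_nonnegative()
```

Even shape/finiteness smoke tests like this would catch most regressions introduced by API churn.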

Best practice for MC in pyro

I think pyro.contrib.oed has deviated from best practice for doing Monte Carlo estimation in pyro. Let's look at how I currently obtain multiple, independent samples from a model:

First

def lexpand(A, *dimensions):
    """Expand tensor, adding new dimensions on left."""
    return A.expand(tuple(dimensions) + A.shape)

Then, in eig.py

# Take N samples of the model
expanded_design = lexpand(design, N)  # N copies of the model
trace = poutine.trace(model).get_trace(expanded_design)

What's the point of this versus something like EmpiricalMarginal? This approach uses tensorization nicely, so the simulations run in parallel: in practice it is much faster than running N simulations of the model in series (e.g. by creating N separate traces). Another appealing feature is control over the shape of the output tensor: if I want N×M samples in a grid (e.g. to sum over one dimension and do something else over another), I just call lexpand(design, N, M).
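Concretely, lexpand only prepends broadcast dimensions, so the caller fully controls the sample layout:

```python
import torch

def lexpand(A, *dimensions):
    """Expand tensor, adding new dimensions on left."""
    return A.expand(tuple(dimensions) + A.shape)

design = torch.randn(3, 2)
print(lexpand(design, 5).shape)     # torch.Size([5, 3, 2])
print(lexpand(design, 4, 5).shape)  # torch.Size([4, 5, 3, 2])
```

Because expand creates a broadcast view rather than copying data, the N (or N×M) copies of the design cost no extra memory until the model samples from them.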

The problem: this is not idiomatic Pyro. I need code inside my models that expands everything to match the dimensions of the design input. Is there a tensorized way to take independent samples of a model?

NMC memory leaks

Issue Description

NMC samples can be created in parallel or in series. Parallel is preferred for small sample sizes, but is upper bounded by the available memory. Series processing should allow us to process many batches sequentially while keeping memory consumption constant. Instead, we see memory consumption balloon, ending with the process being killed by the OS. Why is the memory used by previous samples not released?
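One common cause of this pattern in PyTorch (a guess, not a confirmed diagnosis of this particular bug) is accumulating per-batch tensors that are still attached to the autograd graph, which keeps every batch's intermediates alive. A minimal sketch:

```python
import torch

x = torch.randn(1000, requires_grad=True)
results = []
for batch in range(10):
    y = (x * float(batch)).sum()
    # Appending `y` directly would retain each batch's autograd graph.
    # Detaching (or calling .item()) lets the graph be freed per iteration.
    results.append(y.detach())
total = torch.stack(results)
```

If the series loop stores loss or EIG values without detaching, memory would grow with the number of batches exactly as described.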

Environment

For any bugs, please provide the following:

  • Fedora 26, Python 3.6.8
  • torch==1.1.0
  • pyro on branch submit of this repository

Code Snippet

The following bash commands reproduce the error:

> python3 examples/contrib/oed/eig_estimation_benchmarking.py --case-tags=mixed_effects_regression --estimator-tags=truth &
> htop
