google / edward2

A simple probabilistic programming language.

License: Apache License 2.0

edward2's Introduction

Edward2

Edward2 is a simple probabilistic programming language. It provides core utilities in deep learning ecosystems so that one can write models as probabilistic programs and manipulate a model's computation for flexible training and inference. It's organized as follows:

edward2/: Library code.
examples/: Examples.
experimental/: Active research projects.

Are you upgrading from Edward? Check out the guide Upgrading_From_Edward_To_Edward2.md. The core utilities are fairly low-level: if you'd like a high-level module for uncertainty modeling, check out the guide for Bayesian Layers. We recommend the Uncertainty Baselines if you'd like to build on research-ready code.

Installation

We recommend the latest development version. To install, run

pip install "edward2 @ git+https://github.com/google/edward2.git"

You can also install the latest stable version using the following. As a caveat, however, we very rarely update the stable version (this is a passion project maintained by part-timers and scheduling releases every so often sucks up time).

pip install edward2

Edward2 supports three backends: TensorFlow (the default), JAX, and NumPy (see below to activate). Installing edward2 does not automatically install any backend. To get these dependencies, use for example pip install edward2[tensorflow], replacing tensorflow with the appropriate backend. Sometimes Edward2 uses the latest changes from TensorFlow, in which case you'll need TensorFlow's nightly package: use pip install edward2[tf-nightly].
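
Since no backend is installed automatically, it can be handy to check which ones are importable before picking one. The following is a minimal sketch using only the standard library; BACKEND_MODULES and installed_backends are illustrative names, not part of Edward2.

```python
import importlib.util

# Top-level module name for each backend mentioned above. This mapping is an
# illustrative assumption, not something Edward2 provides.
BACKEND_MODULES = {"tensorflow": "tensorflow", "jax": "jax", "numpy": "numpy"}


def installed_backends():
  """Returns the names of backends whose top-level module can be found."""
  return [name for name, module in BACKEND_MODULES.items()
          if importlib.util.find_spec(module) is not None]


print(installed_backends())
```

An empty list would suggest installing one of the extras described above.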

1. Models as Probabilistic Programs

Random Variables

In Edward2, we use RandomVariables to specify a probabilistic model's structure. A random variable rv carries a probability distribution (rv.distribution), which is a TensorFlow Distribution instance governing the random variable's methods such as log_prob and sample.

Random variables are formed like TensorFlow Distributions.

import edward2 as ed
import tensorflow as tf

normal_rv = ed.Normal(loc=0., scale=1.)
## <ed.RandomVariable 'Normal/' shape=() dtype=float32 numpy=0.0024812892>
normal_rv.distribution.log_prob(1.231)
## <tf.Tensor: id=11, shape=(), dtype=float32, numpy=-1.6766189>

dirichlet_rv = ed.Dirichlet(concentration=tf.ones([2, 3]))
## <ed.RandomVariable 'Dirichlet/' shape=(2, 3) dtype=float32 numpy=
array([[0.15864784, 0.01217205, 0.82918006],
       [0.23385087, 0.69622266, 0.06992647]], dtype=float32)>

By default, instantiating a random variable rv creates a sampling op to form the tensor rv.value ~ rv.distribution.sample(). The default number of samples (controllable via the sample_shape argument to rv) is one, and if the optional value argument is provided, no sampling op is created. Random variables can interoperate with TensorFlow ops: the TF ops operate on the sample.

x = ed.Normal(loc=tf.zeros(2), scale=tf.ones(2))
y = 5.
x + y, x / y
## (<tf.Tensor: id=109, shape=(2,), dtype=float32, numpy=array([3.9076924, 4.588356 ], dtype=float32)>,
##  <tf.Tensor: id=111, shape=(2,), dtype=float32, numpy=array([-0.21846154, -0.08232877], dtype=float32)>)
tf.tanh(x * y)
## <tf.Tensor: id=114, shape=(2,), dtype=float32, numpy=array([-0.99996394, -0.9679181 ], dtype=float32)>
x[1]  # 2nd normal rv
## <ed.RandomVariable 'Normal/' shape=() dtype=float32 numpy=-0.41164386>

Probabilistic Models

Probabilistic models in Edward2 are expressed as Python functions that instantiate one or more RandomVariables. Typically, the function ("program") executes the generative process and returns samples. Inputs to the function can be thought of as values the model conditions on.

Below we write Bayesian logistic regression, where binary outcomes are generated given features, coefficients, and an intercept. There is a prior over the coefficients and intercept. Executing the function adds operations that sample the coefficients and intercept from the prior and use these samples to compute the outcomes.

def logistic_regression(features):
  """Bayesian logistic regression p(y | x) = int p(y | x, w, b) p(w, b) dwdb."""
  coeffs = ed.Normal(loc=tf.zeros(features.shape[1]), scale=1., name="coeffs")
  intercept = ed.Normal(loc=0., scale=1., name="intercept")
  outcomes = ed.Bernoulli(
      logits=tf.tensordot(features, coeffs, [[1], [0]]) + intercept,
      name="outcomes")
  return outcomes

num_features = 10
features = tf.random.normal([100, num_features])
outcomes = logistic_regression(features)
# <ed.RandomVariable 'outcomes/' shape=(100,) dtype=int32 numpy=
# array([1, 0, ... 0, 1], dtype=int32)>

Edward2 programs can also represent distributions beyond those which directly model data. For example, below we write a learnable distribution intended to approximate the logistic regression posterior.

import tensorflow_probability as tfp

def logistic_regression_posterior(coeffs_loc, coeffs_scale,
                                  intercept_loc, intercept_scale):
  """Posterior of Bayesian logistic regression p(w, b | {x, y})."""
  coeffs = ed.MultivariateNormalTriL(
      loc=coeffs_loc,
      scale_tril=tfp.trainable_distributions.tril_with_diag_softplus_and_shift(
          coeffs_scale),
      name="coeffs_posterior")
  intercept = ed.Normal(
      loc=intercept_loc,
      scale=tf.nn.softplus(intercept_scale) + 1e-5,
      name="intercept_posterior")
  return coeffs, intercept

coeffs_loc = tf.Variable(tf.random.normal([num_features]))
coeffs_scale = tf.Variable(tf.random.normal(
    [num_features*(num_features+1) // 2]))

intercept_loc = tf.Variable(tf.random.normal([]))
intercept_scale = tf.Variable(tf.random.normal([]))
posterior_coeffs, posterior_intercept = logistic_regression_posterior(
    coeffs_loc, coeffs_scale, intercept_loc, intercept_scale)

2. Manipulating Model Computation

Tracing

Training and testing probabilistic models typically require more than just samples from the generative process. To enable flexible training and testing, we manipulate the model's computation using tracing.

A tracer is a function that acts on another function f and its arguments *args, **kwargs. It performs various computations before returning an output (typically f(*args, **kwargs): the result of applying the function itself). The ed.trace context manager pushes tracers onto a stack, and any traceable function is intercepted by the stack. All random variable constructors are traceable.

Below we trace the logistic regression model's generative process. In particular, we make predictions with its learned posterior means rather than with its priors.

def set_prior_to_posterior_mean(f, *args, **kwargs):
  """Forms posterior predictions, setting each prior to its posterior mean."""
  name = kwargs.get("name")
  if name == "coeffs":
    return posterior_coeffs.distribution.mean()
  elif name == "intercept":
    return posterior_intercept.distribution.mean()
  return f(*args, **kwargs)

with ed.trace(set_prior_to_posterior_mean):
  predictions = logistic_regression(features)

training_accuracy = (
    tf.reduce_sum(tf.cast(tf.equal(predictions, outcomes), tf.float32)) /
    tf.cast(outcomes.shape[0], tf.float32))

Program Transformations

Using tracing, one can also apply program transformations, which map from one representation of a model to another. This provides convenient access to different model properties depending on the downstream use case.

For example, Markov chain Monte Carlo algorithms often require a model's log-joint probability function as input. Below we take the Bayesian logistic regression program which specifies a generative process, and apply the built-in ed.make_log_joint_fn transformation to obtain its log-joint probability function. The log-joint function takes as input the generative program's original inputs as well as random variables in the program. It returns a scalar Tensor summing over all random variable log-probabilities.
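
As a rough illustration of what such a log-joint computes for the logistic regression program, the sketch below writes the sum out by hand with the standard library. The function names are hypothetical and the code works on plain Python lists; it is not how the transformation is implemented.

```python
import math


def normal_log_prob(x, loc, scale):
  """Log-density of Normal(loc, scale) at x."""
  z = (x - loc) / scale
  return -0.5 * z * z - math.log(scale) - 0.5 * math.log(2.0 * math.pi)


def bernoulli_log_prob(y, logit):
  """log p(y) for y in {0, 1}, computed as y * logit - softplus(logit)."""
  softplus = max(logit, 0.0) + math.log1p(math.exp(-abs(logit)))
  return y * logit - softplus


def logistic_regression_log_joint(features, coeffs, intercept, outcomes):
  """Sums the log-probabilities of coeffs, intercept, and outcomes."""
  total = sum(normal_log_prob(w, 0.0, 1.0) for w in coeffs)
  total += normal_log_prob(intercept, 0.0, 1.0)
  for x, y in zip(features, outcomes):
    logit = sum(xi * wi for xi, wi in zip(x, coeffs)) + intercept
    total += bernoulli_log_prob(y, logit)
  return total


value = logistic_regression_log_joint(
    features=[[0.0, 0.0]], coeffs=[0.0, 0.0], intercept=0.0, outcomes=[1])
```

Each term corresponds to one random variable in the program: two standard normal priors and one Bernoulli likelihood per row of features.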

In our example, features and outcomes are fixed, and we want to use Hamiltonian Monte Carlo to draw samples from the posterior distribution of coeffs and intercept. To this end, we create target_log_prob_fn, which takes just coeffs and intercept as arguments and pins the input features and the output random variable outcomes to their known values.

import no_u_turn_sampler  # local file import

# Set up training data.
features = tf.random.normal([100, 55])
outcomes = tf.random.uniform([100], minval=0, maxval=2, dtype=tf.int32)

# Pass target log-probability function to MCMC transition kernel.
log_joint = ed.make_log_joint_fn(logistic_regression)

def target_log_prob_fn(coeffs, intercept):
  """Target log-probability as a function of states."""
  return log_joint(features,
                   coeffs=coeffs,
                   intercept=intercept,
                   outcomes=outcomes)

coeffs_samples = []
intercept_samples = []
coeffs = tf.random.normal([55])
intercept = tf.random.normal([])
target_log_prob = None
grads_target_log_prob = None
for _ in range(1000):
  [
      [coeffs, intercept],
      target_log_prob,
      grads_target_log_prob,
  ] = no_u_turn_sampler.kernel(
          target_log_prob_fn=target_log_prob_fn,
          current_state=[coeffs, intercept],
          step_size=[0.1, 0.1],
          current_target_log_prob=target_log_prob,
          current_grads_target_log_prob=grads_target_log_prob)
  coeffs_samples.append(coeffs)
  intercept_samples.append(intercept)

The returned coeffs_samples and intercept_samples contain 1,000 posterior samples for coeffs and intercept respectively. They may be used, for example, to evaluate the model's posterior predictive on new data.
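
One way to use the samples for prediction is to average the predicted probability over them. The sketch below assumes the samples have been converted to plain Python lists of floats; sigmoid and posterior_predictive are illustrative helpers, not Edward2 functions.

```python
import math


def sigmoid(t):
  """Numerically stable logistic function."""
  if t >= 0:
    return 1.0 / (1.0 + math.exp(-t))
  e = math.exp(t)
  return e / (1.0 + e)


def posterior_predictive(new_features, coeffs_samples, intercept_samples):
  """Averages p(y = 1 | x, w, b) over posterior samples for each row x."""
  probs = []
  for x in new_features:
    total = 0.0
    for w, b in zip(coeffs_samples, intercept_samples):
      total += sigmoid(sum(xi * wi for xi, wi in zip(x, w)) + b)
    probs.append(total / len(coeffs_samples))
  return probs


# Two toy "posterior samples" that disagree strongly about the first feature.
probs = posterior_predictive(
    new_features=[[1.0, 0.0]],
    coeffs_samples=[[10.0, 0.0], [-10.0, 0.0]],
    intercept_samples=[0.0, 0.0])
```

Averaging probabilities rather than plugging in a single point estimate lets disagreement among samples show up as a less confident prediction.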

Using the JAX or NumPy backend

Using alternative backends is as simple as the following:

import edward2.numpy as ed  # NumPy backend
import edward2.jax as ed  # or, JAX backend

In the NumPy backend, Edward2 wraps SciPy distributions. For example, here's linear regression.

import numpy as np

def linear_regression(features, prior_precision):
  beta = ed.norm.rvs(loc=0.,
                     scale=1. / np.sqrt(prior_precision),
                     size=features.shape[1])
  y = ed.norm.rvs(loc=np.dot(features, beta), scale=1., size=1)
  return y
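
For comparison, a similar generative process can be written with no backend at all, which may help clarify what the program samples. This is a minimal sketch using only the standard library; linear_regression_sample and its rng argument are illustrative names, and it draws one outcome per row of features.

```python
import math
import random


def linear_regression_sample(features, prior_precision, rng):
  """Samples coefficients from the prior, then one outcome per feature row."""
  scale = 1.0 / math.sqrt(prior_precision)
  beta = [rng.gauss(0.0, scale) for _ in range(len(features[0]))]
  y = [rng.gauss(sum(x * b for x, b in zip(row, beta)), 1.0)
       for row in features]
  return beta, y


toy_features = [[1.0, 2.0], [0.5, -1.0], [0.0, 3.0]]
beta, y = linear_regression_sample(toy_features, 4.0, random.Random(0))
```

Passing an explicit random.Random instance keeps the draw reproducible for a given seed.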

References

In general, we recommend citing the following article.

Tran, D., Hoffman, M. D., Moore, D., Suter, C., Vasudevan, S., Radul, A., Johnson, M., and Saurous, R. A. (2018). Simple, Distributed, and Accelerated Probabilistic Programming. In Neural Information Processing Systems.

@inproceedings{tran2018simple,
  author = {Dustin Tran and Matthew D. Hoffman and Dave Moore and Christopher Suter and Srinivas Vasudevan and Alexey Radul and Matthew Johnson and Rif A. Saurous},
  title = {Simple, Distributed, and Accelerated Probabilistic Programming},
  booktitle = {Neural Information Processing Systems},
  year = {2018},
}

If you'd like to cite the layers module specifically, use the following article.

Tran, D., Dusenberry, M. W., van der Wilk, M., and Hafner, D. (2019). Bayesian Layers: A Module for Neural Network Uncertainty. In Neural Information Processing Systems.

@inproceedings{tran2019bayesian,
  author = {Dustin Tran and Michael W. Dusenberry and Danijar Hafner and Mark van der Wilk},
  title={Bayesian {L}ayers: A module for neural network uncertainty},
  booktitle = {Neural Information Processing Systems},
  year={2019}
}

edward2's Issues

Add ensembles baseline to CIFAR-10

Initial idea: the workflow can be run for x models. The ensemble script then loads in those checkpoints.
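
As a sketch of what the ensemble step might compute once each checkpoint has produced class probabilities, the snippet below averages them across members. Averaging probabilities is an assumption about the intended design, checkpoint loading is not shown, and ensemble_probs is a hypothetical helper.

```python
def ensemble_probs(member_probs):
  """Averages class probabilities across ensemble members.

  Args:
    member_probs: nested list indexed as [member][example][class].

  Returns:
    Nested list indexed as [example][class].
  """
  num_members = len(member_probs)
  num_examples = len(member_probs[0])
  num_classes = len(member_probs[0][0])
  return [
      [sum(member_probs[m][n][k] for m in range(num_members)) / num_members
       for k in range(num_classes)]
      for n in range(num_examples)
  ]


# Two members, one example, two classes.
averaged = ensemble_probs([[[0.8, 0.2]], [[0.4, 0.6]]])
```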

Add modelcheckpoint callback

Lets us save during training. Particularly important for deterministic's best test NLL results.

Add robustness CIFAR test sets

For CIFAR-10.1, add nll / accuracy / ce. For CIFAR-C, add mean nll, mce, and mean ce. Similarly for CIFAR-P.

Implement subspace inference and/or swa(g) baselines

These should roughly be similar to the deterministic baselines.

Enable automatic citation generation from Edward1

Edward1 uses text like [@chen2014stochastic] in the docstring. During docstring generation (https://github.com/blei-lab/edward/tree/master/docs), this expands out the citation Bibtex-style and adds a References section at the bottom of the docstring.
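
The expansion step could look roughly like the sketch below. The numbered output style, the placeholder bibliography entry, and the expand_citations helper are all assumptions for illustration; the Edward1 docs pipeline may format things differently.

```python
import re

# Matches citation keys written as [@key].
_CITATION = re.compile(r"\[@([A-Za-z0-9_:-]+)\]")


def expand_citations(docstring, bibliography):
  """Replaces [@key] with [n] and appends a numbered References section."""
  cited = []

  def replace(match):
    key = match.group(1)
    if key not in cited:
      cited.append(key)
    return "[{}]".format(cited.index(key) + 1)

  body = _CITATION.sub(replace, docstring)
  if not cited:
    return body
  # Fall back to the bare key when the bibliography has no entry for it.
  refs = ["[{}] {}".format(i + 1, bibliography.get(key, key))
          for i, key in enumerate(cited)]
  return body + "\n\nReferences:\n" + "\n".join(refs)


# Placeholder entry; a real bibliography would be parsed from a BibTeX file.
toy_bibliography = {"chen2014stochastic": "Placeholder entry."}
expanded = expand_citations("Uses SGHMC [@chen2014stochastic].",
                            toy_bibliography)
```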

model.predict does not work with stochastic output layers

model.predict doesn't work for non-Tensor outputs, including Tensor-convertible objects like ed.RandomVariable.

For now, the workaround is to replace model.predict as below with an explicit for loop over the data.

# Original approach: model.predict fails when the output is a RandomVariable.
dataset_test = dataset_test.repeat().batch(batch_size)
test_steps = ds_info.splits['test'].num_examples // batch_size

predictions = model.predict(dataset_test, verbose=1, steps=test_steps)  # raises error
logits = predictions.distribution.logits  # predicted logits of full dataset

# Workaround: call the model on each batch and concatenate the logits.
dataset_test = dataset_test.batch(batch_size)

logits = []
for features, _ in dataset_test:
  predictions = model(features)
  logits.append(predictions.distribution.logits)

logits = tf.concat(logits, axis=0)  # predicted logits of full dataset

Note that to loop over a tf.data dataset, you need to use TF 2.0 behavior; otherwise, you need to use a tf.Session with the deprecated iterator design.

seeds in initializers

They're not used consistently.

Train on more CIFAR-10 data

All baselines currently use splits of 40k train / 10k validation / 10k test. What's standard (aside from not having validation data..)? 45k train / 5k validation? 49k train / 1k validation?

Use @tf.keras.utils.register_keras_serializable?

IIUC, this avoids the need for our own serializable boilerplate functions which we copied from Keras' functions.

@tf.keras.utils.register_keras_serializable(package='Custom', name='l1')

Move vectorquantizer layer from tensorflow/tensor2tensor to here

Reorganize ed.layers modules and namespaces

Move distribution-based metrics in baselines to edward2/tensorflow

Metrics should also change to not rely on model.outputs but call y_pred if possible.

MDN Implementation using Edward2

The current example on MDN from Edward tutorials needs small modifications to run on edward2. Documentation covering these modifications will be appreciated.

Conv2DVariationalDropout does not work in eager

@ywen666

Maybe add last-layer VI baseline to CIFAR-10

Rewrite top-level README.md to support two separate workflows: typical ML training and trace-based PPL training

Currently, the README.md is written for the latter. But in most examples and baselines, we use the former as the models aren't so structured that they necessitate more complex tracing.

Add L0 layers

Argued to form a variational lower bound of network with spike and slab weight prior.

https://github.com/AMLab-Amsterdam/L0_regularization/blob/master/l0_layers.py

Add dropout baseline for CIFAR-10

Improve deterministic baseline to 95%+ test accuracy

ResNet-20 may be too weak of a baseline. We should maybe move to WRN-28-10. Wide ResNets also involve dropout, so there's likely more benefit in using BNN layers due to the need to regularize the wider layers. The BatchEnsemble baselines' current code also uses ResNets with more filters than is typical.

todos for switching to wide resnet

use v2 aka preactivation resnet, with order of bn-relu-conv instead of conv-bn-relu
add width factor
add dropout baseline with dropout between convs and perhaps after skip connection

Update edwardlib.org website to reflect Edward2

Baselines do not test on all data during training

Baselines currently have this snippet.

validation_steps = 100
dataset_test = dataset_test.take(FLAGS.batch_size * validation_steps).repeat(
      ).batch(FLAGS.batch_size)

Is it too expensive to evaluate on all test data at each epoch? Ideally for small experiments like CIFAR-10, we shouldn't need to run an additional eval job that's separate from training.
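
To put numbers on the question, the sketch below computes how many distinct test examples the snippet covers and how many steps a full pass would need. It assumes the dataset is unbatched when take is applied and that the test set has 10,000 examples; both helper names are hypothetical.

```python
import math


def examples_evaluated(batch_size, validation_steps, test_set_size):
  """Distinct examples kept by take(batch_size * validation_steps)."""
  return min(batch_size * validation_steps, test_set_size)


def steps_for_full_pass(batch_size, test_set_size):
  """Number of batches needed to see every test example once."""
  return math.ceil(test_set_size / batch_size)


covered = examples_evaluated(batch_size=64, validation_steps=100,
                             test_set_size=10000)
full_steps = steps_for_full_pass(batch_size=64, test_set_size=10000)
```

With a batch size of 64, for instance, 100 steps cover 6,400 of the 10,000 examples, while a full pass takes 157 steps.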

Remove default kwargs in CIFAR-10 resnet functions

Move all CIFAR-10 resnets to pre-activation resnet

He et al. (2016) is likely a better practice. Differences shouldn't be noticeable for the depths we're currently using though.

Add Travis continuous integration

Add calibration error on CIFAR-10

Update library code to use TensorFlow 2.0 namespace

Rewrite resnet implementations as block + model?

Resnets in CIFAR/ImageNet currently write a generic conv, bn, and relu layer, and the model function stacks these.

I like PyTorch's implementations, which more closely resemble how we think of resnets: define a residual block function (which itself can vary), and then define a model which stacks residual blocks. This seems conceptually nicer but may require more boilerplate as the conv layer takes quite a few arguments.

Improve underfitting behavior for VI CIFAR-10 baseline

The results are competitive but it's still underfitting. I haven't tried standard techniques yet such as KL annealing, initialization from a pre-trained deterministic network (see sparse VD paper), etc.

Add option for mixed precision training

Deterministic and BatchEnsemble baselines cast data as bfloat16 by default with TPUs. Following the cloud TPU imagenet example, we need to set a policy if we want to maintain bfloat16 for the activations, etc.

  if _USE_BFLOAT16:
    policy = tf.keras.mixed_precision.experimental.Policy('mixed_bfloat16')
    tf.keras.mixed_precision.experimental.set_policy(policy)

It should be easy enough to set up a boolean flag to use bfloat16 with that policy or otherwise operate in float32.

Update documentation to include layers

tune deterministic resnet-50 to 76.3%?

Facebook's scaling paper suggests they consistently get 76.4% with base learning rate=0.1 and total_batch_size (kn) = 256, and 76.3% with total_batch_size (kn)=8k. Our accuracies are hovering around 76.0-76.3%, based on the official tpu resnet50 keras codebase. Should double check whether the official codebase is meeting this target and ultimately how we can meet it.

Replace use of log_marginal with negative_log_likelihood

No need to implement both. We report with the name of NLL in the table anyways (not MLL), and negative_log_likelihood is also more descriptive.

Define metrics used in baselines

As we continue to add metrics, we should formalize how they're defined by writing a section with english/math descriptions of how each of the columns are computed.
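
As a starting point for such a section, here is a sketch of two commonly used definitions: mean negative log-likelihood and expected calibration error with equal-width confidence bins. These are standard textbook forms written with the standard library; the baselines' actual implementations (for example, their binning scheme) may differ.

```python
import math


def negative_log_likelihood(probs, labels):
  """Mean of -log p(true label) over examples."""
  return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)


def expected_calibration_error(probs, labels, num_bins=10):
  """Weighted average gap between accuracy and confidence over bins."""
  # Each bin holds [count, summed confidence, summed correctness].
  bins = [[0, 0.0, 0.0] for _ in range(num_bins)]
  for p, y in zip(probs, labels):
    confidence = max(p)
    prediction = p.index(confidence)
    b = min(int(confidence * num_bins), num_bins - 1)
    bins[b][0] += 1
    bins[b][1] += confidence
    bins[b][2] += float(prediction == y)
  # (n_b / n) * |acc_b - conf_b| simplifies to |correct_b - confidence_b| / n.
  return sum(abs(correct - conf)
             for count, conf, correct in bins if count) / len(labels)


toy_probs = [[0.9, 0.1], [0.9, 0.1]]
toy_labels = [0, 1]
nll = negative_log_likelihood(toy_probs, toy_labels)
ece = expected_calibration_error(toy_probs, toy_labels)
```

In the toy case the model is 90% confident on both examples but right on only one, so the calibration gap is 0.4.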

Increase batch size/learning rates for VI CIFAR-10 baselines

Batch sizes should be 256+ IMO. The experiments shouldn't take 10+ hours for 200 epochs (it's up to a day for the deterministic baseline).

Add activation/weight histograms for CIFAR-10

For some reason, naively turning it on didn't work for me in variational_inference.py.

  tensorboard_cb = tf.keras.callbacks.TensorBoard(log_dir=FLAGS.output_dir,
                                                  # TODO(trandustin): This
                                                  # doesn't work(?).
                                                  # histogram_freq=5,
                                                  write_graph=False)

Add refined VI CIFAR-10 baseline

Initial idea: workflow can load in checkpoint from running VI baseline.

AttributeError: module 'edward2' has no attribute 'set_seed'

The Edward 1 API has a function called set_seed, which is apparently no longer available in Edward 2, given that I get the error AttributeError: module 'edward2' has no attribute 'set_seed' when I attempt to call ed.set_seed. So, what is the equivalent function in Edward 2?

Potentially rewrite constraint functions for bayesian layers

We currently adopt Keras' practice of unconstrained parameters followed by projected gradient descent. It's more common in probabilistic modeling code to constrain the parameter space itself, e.g., ed.Normal(0., tf.nn.softplus(tf.Variable(1.)) + tf.keras.backend.epsilon()). This is subtle but potentially an impactful change so we should be careful with our ablation studies if we want to change this behavior.
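
The difference between the two approaches can be sketched on a single scale parameter. The snippet below is a plain-Python illustration, not library code: EPSILON is assumed to be 1e-7, and both helper functions are hypothetical.

```python
import math

EPSILON = 1e-7  # Assumed value, standing in for tf.keras.backend.epsilon().


def softplus(x):
  """Numerically stable log(1 + exp(x))."""
  return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def reparameterized_scale(raw):
  """Constrains the parameter space: any real `raw` maps to a positive scale."""
  return softplus(raw) + EPSILON


def projected_step(scale, grad, learning_rate):
  """Takes an unconstrained gradient step, then projects onto scale >= EPSILON."""
  return max(scale - learning_rate * grad, EPSILON)


# A large gradient pushes the projected parameter onto the boundary,
# while the reparameterized scale stays positive for any raw value.
clipped = projected_step(scale=0.1, grad=5.0, learning_rate=0.1)
smooth = reparameterized_scale(-50.0)
```

With projection, the parameter can sit exactly on the boundary after a large step; with the softplus reparameterization, the optimizer works in an unconstrained space and the scale approaches the boundary smoothly.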

Add pylint linter to Travis

Update examples/documentation to use TensorFlow 2.0 behavior

ImportError: cannot import name 'docstring' from 'tensorflow_probability.python.util'

I have a virtual environment (with Python 3.7.4) where I installed edward2, tensorflow (2) and tensorflow_probability. When I try to import edward2 with import edward2 as ed, I get the error

ImportError: cannot import name 'docstring' from 'tensorflow_probability.python.util'

Here's the full Traceback

Traceback (most recent call last):
  File "/Users/nbro/Desktop/edward_tests/edward_test2.py", line 1, in <module>
    import edward2 as ed
  File "/Users/nbro/Desktop/edward_tests/venv2/lib/python3.7/site-packages/edward2/__init__.py", line 32, in <module>
    from edward2 import generated_random_variables
  File "/Users/nbro/Desktop/edward_tests/venv2/lib/python3.7/site-packages/edward2/generated_random_variables.py", line 26, in <module>
    from tensorflow_probability.python.util import docstring as docstring_util
ImportError: cannot import name 'docstring' from 'tensorflow_probability.python.util' (/Users/nbro/Desktop/edward_tests/venv2/lib/python3.7/site-packages/tensorflow_probability/python/util/__init__.py)

Add scale arg to scale the output of KL divergence regularizers

Makes it slightly easier to scale the KL penalties by the dataset size if you want to do it as part of the model rather than at training time. E.g., consider model.fit where the loss is always loss + sum(model.losses), so you can't scale the model losses externally as we currently recommend.
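
To illustrate the proposal, the sketch below scales a closed-form KL penalty by one over the dataset size so that it can be added directly to a per-example loss. The scale_factor argument and both function names are hypothetical, and the regularizer is reduced to a single Normal weight for brevity.

```python
import math


def kl_normal_to_standard_normal(loc, scale):
  """KL(Normal(loc, scale) || Normal(0, 1)) in closed form."""
  return 0.5 * (scale * scale + loc * loc - 1.0) - math.log(scale)


def scaled_kl_regularizer(loc, scale, scale_factor=1.0):
  """KL penalty multiplied by a user-provided factor."""
  return scale_factor * kl_normal_to_standard_normal(loc, scale)


dataset_size = 50000  # Illustrative value.
penalty = scaled_kl_regularizer(loc=1.0, scale=1.0,
                                scale_factor=1.0 / dataset_size)
```

Because the scaling happens inside the regularizer, the penalty already has the right magnitude when it is summed into the training loss.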

Make new layers code use TF2 API

tensorflow_probability docstring_util location

The location of the docstring_util in tensorflow_probability 0.8.0 has changed, resulting in edward2 errors such as:

File "/home/wibble/.conda/envs/gem/lib/python3.7/site-packages/edward2/generated_random_variables.py", line 26, in <module>
    from tensorflow_probability.python.util import docstring as docstring_util
ImportError: cannot import name 'docstring' from 'tensorflow_probability.python.util'

Problem Importing Layer Modules

Hello,

I was eager to try out some of the Bayesian layers that you had implemented. I saw that they were in the tensor2tensor package, but they seem to have been removed and placed in the layers part of the edward2 package. However, I can't seem to find any of the modules (e.g. GaussianProcess, SparseGaussianProcess), or perhaps I don't understand how importing them works. I thought perhaps the names were different, but I can't work out where to find the Bayesian layer modules.

I cannot even find the files within the actual distribution located in /usr/local/lib/python3.6/dist-packages/edward2/__init__.py so I'm not sure if I am doing something wrong or not.

Installations

I did the following installation procedure with and without the tf-nightly

pip install edward2[tf-nightly]

I also tried to directly install it via the github repo:

pip install git+https://github.com/google/edward2

Helpful Info

Google Colab Notebook
Python 3.6
TensorFlow - 2.0.0-rc1
Edward - 0.0.1

Thanks,
Emmanuel

AttributeError: module 'edward2' has no attribute 'KLqp'

I am trying to run this example https://github.com/blei-lab/edward/blob/master/examples/bayesian_nn.py, but with edward 2 and TensorFlow 2 (which is now stable). After having first used the script tf_upgrade_v2 on this file and fixed other problems mentioned in this issue #37, I get another error

Traceback (most recent call last):
  File "/Users/nbro/Desktop/edward_tests/edward_test2.py", line 102, in <module>
    app.run(main)
  File "/Users/nbro/Desktop/edward_tests/venv/lib/python3.7/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/Users/nbro/Desktop/edward_tests/venv/lib/python3.7/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/Users/nbro/Desktop/edward_tests/edward_test2.py", line 95, in main
    inference = ed.KLqp({W_0: qW_0, b_0: qb_0, W_1: qW_1, b_1: qb_1, W_2: qW_2, b_2: qb_2},
AttributeError: module 'edward2' has no attribute 'KLqp'

I noticed that https://github.com/google/edward2/blob/master/Upgrading_From_Edward_To_Edward2.md shows an example of how to perform inference with edward 2. This example is extremely verbose, compared to edward 1's example, which just calls KLqp. Is there an easy (non-verbose) way of performing inference with edward 2 (with TensorFlow 2)?

This issue may be related to blei-lab/edward#640.