
edward2's Introduction

Edward2

Edward2 is a simple probabilistic programming language. It provides core utilities in deep learning ecosystems so that one can write models as probabilistic programs and manipulate a model's computation for flexible training and inference.

Are you upgrading from Edward? Check out the guide Upgrading_from_Edward_to_Edward2.md. The core utilities are fairly low-level: if you'd like a high-level module for uncertainty modeling, check out the guide for Bayesian Layers. We recommend the Uncertainty Baselines if you'd like to build on research-ready code.

Installation

To install the latest stable version, run

pip install edward2

To install the latest development version, run

pip install "git+https://github.com/google/edward2.git#egg=edward2"

Edward2 supports three backends: TensorFlow (the default), JAX, and NumPy (see below to activate). Installing edward2 does not automatically install any backend. To get these dependencies, use for example pip install edward2[tensorflow], replacing tensorflow with the appropriate backend. Sometimes Edward2 uses the latest changes from TensorFlow, in which case you'll need TensorFlow's nightly package: use pip install edward2[tf-nightly].

1. Models as Probabilistic Programs

Random Variables

In Edward2, we use RandomVariables to specify a probabilistic model's structure. A random variable rv carries a probability distribution (rv.distribution), which is a TensorFlow Distribution instance governing the random variable's methods such as log_prob and sample.

Random variables are formed like TensorFlow Distributions.

import edward2 as ed
import tensorflow as tf

normal_rv = ed.Normal(loc=0., scale=1.)
## <ed.RandomVariable 'Normal/' shape=() dtype=float32 numpy=0.0024812892>
normal_rv.distribution.log_prob(1.231)
## <tf.Tensor: id=11, shape=(), dtype=float32, numpy=-1.6766189>

dirichlet_rv = ed.Dirichlet(concentration=tf.ones([2, 3]))
## <ed.RandomVariable 'Dirichlet/' shape=(2, 3) dtype=float32 numpy=
array([[0.15864784, 0.01217205, 0.82918006],
       [0.23385087, 0.69622266, 0.06992647]], dtype=float32)>
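
As a sanity check on the log_prob value shown above, the standard normal log-density can be evaluated by hand. The sketch below uses only Python's math module; the helper name normal_log_prob is ours for illustration and is not part of Edward2.

```python
import math

def normal_log_prob(x, loc=0.0, scale=1.0):
    """Log-density of Normal(loc, scale) at x, written out from the formula."""
    z = (x - loc) / scale
    return -0.5 * z * z - math.log(scale) - 0.5 * math.log(2.0 * math.pi)

value = normal_log_prob(1.231)
print(value)  # approximately -1.67662, in line with the float32 result above
```

The small difference from the printed tensor value comes from float32 versus float64 arithmetic.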

By default, instantiating a random variable rv creates a sampling op to form the tensor rv.value ~ rv.distribution.sample(). The default number of samples (controllable via the sample_shape argument to rv) is one, and if the optional value argument is provided, no sampling op is created. Random variables can interoperate with TensorFlow ops: the TF ops operate on the sample.

x = ed.Normal(loc=tf.zeros(2), scale=tf.ones(2))
y = 5.
x + y, x / y
## (<tf.Tensor: id=109, shape=(2,), dtype=float32, numpy=array([3.9076924, 4.588356 ], dtype=float32)>,
##  <tf.Tensor: id=111, shape=(2,), dtype=float32, numpy=array([-0.21846154, -0.08232877], dtype=float32)>)
tf.tanh(x * y)
## <tf.Tensor: id=114, shape=(2,), dtype=float32, numpy=array([-0.99996394, -0.9679181 ], dtype=float32)>
x[1]  # 2nd normal rv
## <ed.RandomVariable 'Normal/' shape=() dtype=float32 numpy=-0.41164386>

Probabilistic Models

Probabilistic models in Edward2 are expressed as Python functions that instantiate one or more RandomVariables. Typically, the function ("program") executes the generative process and returns samples. Inputs to the function can be thought of as values the model conditions on.

Below we write Bayesian logistic regression, where binary outcomes are generated given features, coefficients, and an intercept. There is a prior over the coefficients and intercept. Executing the function adds operations that sample the coefficients and intercept from the prior and use these samples to compute the outcomes.

def logistic_regression(features):
  """Bayesian logistic regression p(y | x) = int p(y | x, w, b) p(w, b) dwdb."""
  coeffs = ed.Normal(loc=tf.zeros(features.shape[1]), scale=1., name="coeffs")
  intercept = ed.Normal(loc=0., scale=1., name="intercept")
  outcomes = ed.Bernoulli(
      logits=tf.tensordot(features, coeffs, [[1], [0]]) + intercept,
      name="outcomes")
  return outcomes

num_features = 10
features = tf.random.normal([100, num_features])
outcomes = logistic_regression(features)
# <ed.RandomVariable 'outcomes/' shape=(100,) dtype=int32 numpy=
# array([1, 0, ... 0, 1], dtype=int32)>

Edward2 programs can also represent distributions beyond those which directly model data. For example, below we write a learnable distribution intended to approximate the logistic regression posterior.

import tensorflow_probability as tfp

def logistic_regression_posterior(coeffs_loc, coeffs_scale,
                                  intercept_loc, intercept_scale):
  """Posterior of Bayesian logistic regression p(w, b | {x, y})."""
  coeffs = ed.MultivariateNormalTriL(
      loc=coeffs_loc,
      scale_tril=tfp.trainable_distributions.tril_with_diag_softplus_and_shift(
          coeffs_scale),
      name="coeffs_posterior")
  intercept = ed.Normal(
      loc=intercept_loc,
      scale=tf.nn.softplus(intercept_scale) + 1e-5,
      name="intercept_posterior")
  return coeffs, intercept

coeffs_loc = tf.Variable(tf.random.normal([num_features]))
coeffs_scale = tf.Variable(tf.random.normal(
    [num_features*(num_features+1) // 2]))

intercept_loc = tf.Variable(tf.random.normal([]))
intercept_scale = tf.Variable(tf.random.normal([]))
posterior_coeffs, posterior_intercept = logistic_regression_posterior(
    coeffs_loc, coeffs_scale, intercept_loc, intercept_scale)

2. Manipulating Model Computation

Tracing

Training and testing probabilistic models typically require more than just samples from the generative process. To enable flexible training and testing, we manipulate the model's computation using tracing.

A tracer is a function that acts on another function f and its arguments *args, **kwargs. It performs various computations before returning an output (typically f(*args, **kwargs): the result of applying the function itself). The ed.trace context manager pushes tracers onto a stack, and any traceable function is intercepted by the stack. All random variable constructors are traceable.

Below we trace the logistic regression model's generative process. In particular, we make predictions with its learned posterior means rather than with its priors.

def set_prior_to_posterior_mean(f, *args, **kwargs):
  """Forms posterior predictions, setting each prior to its posterior mean."""
  name = kwargs.get("name")
  if name == "coeffs":
    return posterior_coeffs.distribution.mean()
  elif name == "intercept":
    return posterior_intercept.distribution.mean()
  return f(*args, **kwargs)

with ed.trace(set_prior_to_posterior_mean):
  predictions = logistic_regression(features)

training_accuracy = (
    tf.reduce_sum(tf.cast(tf.equal(predictions, outcomes), tf.float32)) /
    tf.cast(outcomes.shape[0], tf.float32))

Program Transformations

Using tracing, one can also apply program transformations, which map from one representation of a model to another. This provides convenient access to different model properties depending on the downstream use case.

For example, Markov chain Monte Carlo algorithms often require a model's log-joint probability function as input. Below we take the Bayesian logistic regression program which specifies a generative process, and apply the built-in ed.make_log_joint_fn transformation to obtain its log-joint probability function. The log-joint function takes as input the generative program's original inputs as well as random variables in the program. It returns a scalar Tensor summing over all random variable log-probabilities.

In our example, features and outcomes are fixed, and we want to use Hamiltonian Monte Carlo to draw samples from the posterior distribution of coeffs and intercept. To this end, we create target_log_prob_fn, which takes just coeffs and intercept as arguments and pins the input features and the output random variable outcomes to their known values.

import no_u_turn_sampler  # local file import

# Set up training data.
features = tf.random.normal([100, 55])
outcomes = tf.random.uniform([100], minval=0, maxval=2, dtype=tf.int32)

# Pass target log-probability function to MCMC transition kernel.
log_joint = ed.make_log_joint_fn(logistic_regression)

def target_log_prob_fn(coeffs, intercept):
  """Target log-probability as a function of states."""
  return log_joint(features,
                   coeffs=coeffs,
                   intercept=intercept,
                   outcomes=outcomes)

coeffs_samples = []
intercept_samples = []
coeffs = tf.random.normal([55])
intercept = tf.random.normal([])
target_log_prob = None
grads_target_log_prob = None
for _ in range(1000):
  [
      [coeffs, intercept],
      target_log_prob,
      grads_target_log_prob,
  ] = no_u_turn_sampler.kernel(
          target_log_prob_fn=target_log_prob_fn,
          current_state=[coeffs, intercept],
          step_size=[0.1, 0.1],
          current_target_log_prob=target_log_prob,
          current_grads_target_log_prob=grads_target_log_prob)
  coeffs_samples.append(coeffs)
  intercept_samples.append(intercept)

The returned coeffs_samples and intercept_samples contain 1,000 posterior samples for coeffs and intercept respectively. They may be used, for example, to evaluate the model's posterior predictive on new data.
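
To make concrete what a log-joint function evaluates for this model, the sketch below writes the sum out by hand in plain Python: standard normal log-densities for each coefficient and the intercept, plus a Bernoulli log-probability for each outcome. It is an illustration on tiny hand-written inputs, not the code that ed.make_log_joint_fn generates, and the helper names are ours.

```python
import math

def normal_log_prob(x, loc, scale):
    """Log-density of Normal(loc, scale) at x."""
    z = (x - loc) / scale
    return -0.5 * z * z - math.log(scale) - 0.5 * math.log(2.0 * math.pi)

def bernoulli_log_prob(y, logit):
    """log p(y) for y in {0, 1} given a logit: y * logit - softplus(logit)."""
    softplus = max(logit, 0.0) + math.log1p(math.exp(-abs(logit)))
    return y * logit - softplus

def log_joint(features, coeffs, intercept, outcomes):
    """Sums the log-probabilities of coeffs, intercept, and outcomes."""
    total = sum(normal_log_prob(w, 0.0, 1.0) for w in coeffs)
    total += normal_log_prob(intercept, 0.0, 1.0)
    for row, y in zip(features, outcomes):
        logit = sum(x * w for x, w in zip(row, coeffs)) + intercept
        total += bernoulli_log_prob(y, logit)
    return total

# Tiny illustrative data: 3 examples with 2 features each.
features = [[0.5, -1.0], [2.0, 0.3], [-0.7, 1.1]]
outcomes = [1, 0, 1]
value = log_joint(features, coeffs=[0.0, 0.0], intercept=0.0, outcomes=outcomes)
```

With all parameters at zero, every logit is zero, so each outcome contributes log(0.5) and each of the three normal terms contributes -0.5 * log(2 * pi).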

Using the JAX or NumPy backend

Using alternative backends is as simple as the following:

import edward2.numpy as ed  # NumPy backend
import edward2.jax as ed  # or, JAX backend

In the NumPy backend, Edward2 wraps SciPy distributions. For example, here's linear regression.

import numpy as np

def linear_regression(features, prior_precision):
  beta = ed.norm.rvs(loc=0.,
                     scale=1. / np.sqrt(prior_precision),
                     size=features.shape[1])
  y = ed.norm.rvs(loc=np.dot(features, beta), scale=1., size=1)
  return y

References

In general, we recommend citing the following article.

Tran, D., Hoffman, M. D., Moore, D., Suter, C., Vasudevan, S., Radul, A., Johnson, M., and Saurous, R. A. (2018). Simple, Distributed, and Accelerated Probabilistic Programming. In Neural Information Processing Systems.

@inproceedings{tran2018simple,
  author = {Dustin Tran and Matthew D. Hoffman and Dave Moore and Christopher Suter and Srinivas Vasudevan and Alexey Radul and Matthew Johnson and Rif A. Saurous},
  title = {Simple, Distributed, and Accelerated Probabilistic Programming},
  booktitle = {Neural Information Processing Systems},
  year = {2018},
}

If you'd like to cite the layers module specifically, use the following article.

Tran, D., Dusenberry, M. W., van der Wilk, M., and Hafner, D. (2019). Bayesian Layers: A Module for Neural Network Uncertainty. In Neural Information Processing Systems.

@inproceedings{tran2019bayesian,
  author = {Dustin Tran and Michael W. Dusenberry and Danijar Hafner and Mark van der Wilk},
  title={Bayesian {L}ayers: A module for neural network uncertainty},
  booktitle = {Neural Information Processing Systems},
  year={2019}
}

edward2's People

Contributors

acyrl, chiamp, davmre, dusenberrymw, dustinvtran, edward-bot, faizan-m, fchollet, fehiepsi, fortuin, ghassenj, grasskin, hawkinsp, jburnim, jereliu, jtainslie, ksachdeva, lukewood, markpkcollier, mhavasi, miksu, pbischoff, qlzh727, rchen152, rodolphejenatton, shreyaspadhy, smit-hinsu, yilei, ywen666, znado

edward2's Issues

Train on more CIFAR-10 data

All baselines currently use splits of 40k train / 10k validation / 10k test. What's standard (aside from not having validation data..)? 45k train / 5k validation? 49k train / 1k validation?

model.predict does not work with stochastic output layers

model.predict doesn't work for non-Tensor outputs, including Tensor-convertible objects like ed.RandomVariable.

For now, the workaround is to replace the model.predict call

dataset_test = dataset_test.repeat().batch(batch_size)
test_steps = ds_info.splits['test'].num_examples // batch_size

predictions = model.predict(dataset_test, verbose=1, steps=test_steps)  # raises error
logits = predictions.distribution.logits  # predicted logits of full dataset

with an explicit for loop over the data:

dataset_test = dataset_test.batch(batch_size)

logits = []
for features, _ in dataset_test:
  predictions = model(features)
  logits.append(predictions.distribution.logits)

logits = tf.concat(logits, axis=0)  # predicted logits of full dataset

Note that to loop over a tf.data dataset, you need to use TF 2.0 behavior; otherwise you need to use a tf.Session with the deprecated iterator design.

tensorflow_probability docstring_util location

The location of the docstring_util in tensorflow_probability 0.8.0 has changed, resulting in edward2 errors such as:

File "/home/wibble/.conda/envs/gem/lib/python3.7/site-packages/edward2/generated_random_variables.py", line 26, in <module>
    from tensorflow_probability.python.util import docstring as docstring_util
ImportError: cannot import name 'docstring' from 'tensorflow_probability.python.util'

AttributeError: module 'edward2' has no attribute 'set_seed'

The Edward 1 API has a function called set_seed, which is apparently no longer available in Edward 2, given that I get the error AttributeError: module 'edward2' has no attribute 'set_seed' when I attempt to call ed.set_seed. So, what is the equivalent function in Edward 2?

MDN Implementation using Edward2

The current example on MDN from Edward tutorials needs small modifications to run on edward2. Documentation covering these modifications will be appreciated.

Problem Importing Layer Modules

Hello,

I was eager to try out some of the Bayesian layers that you had implemented. I saw that they were in the tensor2tensor package, but then they seemed to have been removed and placed in the layers part of the edward2 package. However, I can't seem to find any of the modules (e.g. GaussianProcess, SparseGaussianProcess), or perhaps I don't understand how importing them works. I thought perhaps the names were different, but I can't seem to wrap my head around where to find the Bayesian layer modules.

I cannot even find the files within the actual distribution located in /usr/local/lib/python3.6/dist-packages/edward2/__init__.py so I'm not sure if I am doing something wrong or not.


Installations

I did the following installation procedure with and without the tf-nightly

pip install edward2[tf-nightly]

I also tried to directly install it via the github repo:

pip install git+https://github.com/google/edward2

Helpful Info

  • Google Colab Notebook
  • Python 3.6
  • TensorFlow - 2.0.0-rc1
  • Edward - 0.0.1

Thanks,
Emmanuel

Add scale arg to scale the output of KL divergence regularizers

Makes it slightly easier to scale the KL penalties by the dataset size if you want to do it as part of the model rather than at training time. E.g., consider model.fit where the loss is always loss + sum(model.losses), so you can't scale the model losses externally as we currently recommend.
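
The proposed scaling can be sketched in plain Python. Assuming a normal weight posterior with a standard normal prior, the closed-form KL divergence is multiplied by a scale factor, for example one over the training set size, so that the penalty can be added directly to a per-example loss. The factory name, its scale_factor argument, and the dataset size below are hypothetical and chosen only to mirror the proposal; they are not the Edward2 API.

```python
import math

def normal_kl_to_standard(mean, stddev):
    """KL(Normal(mean, stddev) || Normal(0, 1)) in closed form."""
    return 0.5 * (stddev ** 2 + mean ** 2 - 1.0) - math.log(stddev)

def make_scaled_kl_regularizer(scale_factor=1.0):
    """Returns a regularizer computing scale_factor * KL (hypothetical helper)."""
    def regularizer(mean, stddev):
        return scale_factor * normal_kl_to_standard(mean, stddev)
    return regularizer

train_dataset_size = 40000  # illustrative value only
regularizer = make_scaled_kl_regularizer(scale_factor=1.0 / train_dataset_size)
penalty = regularizer(mean=0.3, stddev=0.8)
```

With the scale baked into the regularizer, a training loop that always computes loss + sum(model.losses) needs no external rescaling.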

AttributeError: module 'edward2' has no attribute 'KLqp'

I am trying to run this example https://github.com/blei-lab/edward/blob/master/examples/bayesian_nn.py, but with edward 2 and TensorFlow 2 (which is now stable). After having first used the script tf_upgrade_v2 on this file and fixed other problems mentioned in this issue #37, I get another error

Traceback (most recent call last):
  File "/Users/nbro/Desktop/edward_tests/edward_test2.py", line 102, in <module>
    app.run(main)
  File "/Users/nbro/Desktop/edward_tests/venv/lib/python3.7/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/Users/nbro/Desktop/edward_tests/venv/lib/python3.7/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/Users/nbro/Desktop/edward_tests/edward_test2.py", line 95, in main
    inference = ed.KLqp({W_0: qW_0, b_0: qb_0, W_1: qW_1, b_1: qb_1, W_2: qW_2, b_2: qb_2},
AttributeError: module 'edward2' has no attribute 'KLqp'

I noticed that https://github.com/google/edward2/blob/master/Upgrading_From_Edward_To_Edward2.md shows an example of how to perform inference with edward 2. This example is extremely verbose, compared to edward 1's example, which just calls KLqp. Is there an easy (non-verbose) way of performing inference with edward 2 (with TensorFlow 2)?

This issue may be related to blei-lab/edward#640.

Add activation/weight histograms for CIFAR-10

For some reason, naively turning it on didn't work for me in variational_inference.py.

  tensorboard_cb = tf.keras.callbacks.TensorBoard(log_dir=FLAGS.output_dir,
                                                  # TODO(trandustin): This
                                                  # doesn't work(?).
                                                  # histogram_freq=5,
                                                  write_graph=False)

Baselines do not test on all data during training

Baselines currently have this snippet.

validation_steps = 100
dataset_test = dataset_test.take(FLAGS.batch_size * validation_steps).repeat(
      ).batch(FLAGS.batch_size)

Is it too expensive to evaluate on all test data at each epoch? Ideally for small experiments like CIFAR-10, we shouldn't need to run an additional eval job that's separate from training.

tune deterministic resnet-50 to 76.3%?

Facebook's scaling paper suggests they consistently get 76.4% with base learning rate=0.1 and total_batch_size (kn) = 256, and 76.3% with total_batch_size (kn)=8k. Our accuracies are hovering around 76.0-76.3%, based on the official tpu resnet50 keras codebase. Should double check whether the official codebase is meeting this target and ultimately how we can meet it.

Potentially rewrite constraint functions for bayesian layers

We currently adopt Keras' practice of unconstrained parameters followed by projected gradient descent. It's more common in probabilistic modeling code to constrain the parameter space itself, e.g., ed.Normal(0., tf.nn.softplus(tf.Variable(1.)) + tf.keras.backend.epsilon()). This is subtle but potentially an impactful change so we should be careful with our ablation studies if we want to change this behavior.
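
The two approaches can be contrasted in a few lines of plain Python. This is a sketch under simplifying assumptions (a single scalar scale parameter, a hand-picked gradient value, and hypothetical helper names), not the layers' actual code.

```python
import math

EPSILON = 1e-7  # assumed constant, standing in for tf.keras.backend.epsilon()

def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def projected_step(scale, grad, learning_rate):
    """Gradient step on the scale itself, then projection onto scale >= EPSILON."""
    return max(scale - learning_rate * grad, EPSILON)

def reparameterized_step(raw, grad_wrt_scale, learning_rate):
    """Gradient step on an unconstrained raw value, with scale = softplus(raw) + EPSILON."""
    sigmoid = 1.0 / (1.0 + math.exp(-raw))  # derivative of softplus at raw
    new_raw = raw - learning_rate * grad_wrt_scale * sigmoid
    return new_raw, softplus(new_raw) + EPSILON

# Projection: the step overshoots below zero and is clipped to the boundary.
projected = projected_step(scale=0.05, grad=1.0, learning_rate=0.1)

# Reparameterization: start from the raw value whose softplus is 0.05.
raw0 = math.log(math.expm1(0.05))
new_raw, reparam = reparameterized_step(raw0, grad_wrt_scale=1.0, learning_rate=0.1)
```

Under projection the parameter lands exactly on the constraint boundary, while under the softplus parameterization it shrinks but stays strictly inside the feasible region. This is one reason the two choices can behave differently in training.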

Add option for mixed precision training

Deterministic and BatchEnsemble baselines cast data as bfloat16 by default with TPUs. Following the cloud TPU imagenet example, we need to set a policy if we want to maintain bfloat16 for the activations, etc.

  if _USE_BFLOAT16:
    policy = tf.keras.mixed_precision.experimental.Policy('mixed_bfloat16')
    tf.keras.mixed_precision.experimental.set_policy(policy)

It should be easy enough to set up a boolean flag to use bfloat16 with that policy or otherwise operate in float32.

Improve deterministic baseline to 95%+ test accuracy

ResNet-20 may be too weak a baseline. We should maybe move to WRN-28-10. Wide ResNets also involve dropout, so there's likely more benefit in using BNN layers due to the need to regularize the wider layers. The BatchEnsemble baselines' current code also uses ResNets with more filters than is typical.

todos for switching to wide resnet

  • use v2 aka preactivation resnet, with order of bn-relu-conv instead of conv-bn-relu
  • add width factor
  • add dropout baseline with dropout between convs and perhaps after skip connection

Define metrics used in baselines

As we continue to add metrics, we should formalize how they're defined by writing a section with English/math descriptions of how each of the columns is computed.

Rewrite resnet implementations as block + model?

Resnets in CIFAR/ImageNet currently write a generic conv, bn, and relu layer, and the model function stacks these.

I like PyTorch's implementations, which more closely resemble how we think of resnets: define a residual block function (which itself can vary), and then define a model which stacks residual blocks. This seems conceptually nicer but may require more boilerplate, as the conv layer takes quite a few arguments.

ImportError: cannot import name 'docstring' from 'tensorflow_probability.python.util'

I have a virtual environment (with Python 3.7.4) where I installed edward2, tensorflow (2) and tensorflow_probability. When I try to import edward2 with import edward2 as ed, I get the error

ImportError: cannot import name 'docstring' from 'tensorflow_probability.python.util'

Here's the full Traceback

Traceback (most recent call last):
  File "/Users/nbro/Desktop/edward_tests/edward_test2.py", line 1, in <module>
    import edward2 as ed
  File "/Users/nbro/Desktop/edward_tests/venv2/lib/python3.7/site-packages/edward2/__init__.py", line 32, in <module>
    from edward2 import generated_random_variables
  File "/Users/nbro/Desktop/edward_tests/venv2/lib/python3.7/site-packages/edward2/generated_random_variables.py", line 26, in <module>
    from tensorflow_probability.python.util import docstring as docstring_util
ImportError: cannot import name 'docstring' from 'tensorflow_probability.python.util' (/Users/nbro/Desktop/edward_tests/venv2/lib/python3.7/site-packages/tensorflow_probability/python/util/__init__.py)
