Heteroscedastic Dropout Uncertainty

During a talk I gave at Google recently, I was asked about a peculiar behaviour of the uncertainty estimates we get from dropout networks (studied here). When fitting a model on the following dataset:

A dataset

we observe the following weird behaviour:

weird behaviour

... the model cannot increase its uncertainty to cover the points at the far right-hand side of the plane. The explanation for this behaviour is actually quite simple. To understand what's going on, we need to talk about homoscedasticity versus heteroscedasticity.

Homoscedastic and Heteroscedastic Models

Homoscedastic regression assumes identical observation noise for every input point x. Heteroscedastic regression, on the other hand, assumes that observation noise can vary with input x. Heteroscedastic models are useful in cases where parts of the observation space might have higher noise levels than others.
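To make the distinction concrete, here is a small illustrative sketch (not part of the original demos) that generates both kinds of data, using a noise profile chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(-3.0, 3.0, 200)

# Homoscedastic data: the noise level is the same for every input x.
y_homo = np.sin(x) + rng.normal(0.0, 0.2, size=x.shape)

# Heteroscedastic data: the noise level grows with |x|.
noise_std = 0.05 + 0.2 * np.abs(x)
y_hetero = np.sin(x) + rng.normal(0.0, noise_std, size=x.shape)
```

A homoscedastic model fitted to `y_hetero` would have to pick a single noise level that is too large near the origin and too small at the edges.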

Heteroscedastic dropout regression example

Example of a heteroscedastic model

Using dropout we get homoscedastic model uncertainty. This can be seen from the model definition [Gal and Ghahramani 2015, equation 8]. The likelihood in our derivations is defined as $y_i \sim N(\mu^W(x_i), \tau^{-1} I)$ with $\mu^W$ the network output, dependent on the randomly dropped weights $W$. Here our model precision $\tau$ (which is the same as the inverse observation noise) is a constant, which has to be tuned for the data.
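The homoscedastic predictive uncertainty can be sketched with a few lines of Python: we average stochastic forward passes (dropout left on at test time), and add the constant observation noise $\tau^{-1}$ to the sample variance. The function names and the one-feature toy "network" below are illustrative, not the ConvnetJS code:

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_dropout_predict(forward, x, n_samples=10000, tau=1.0):
    """Monte Carlo dropout prediction for a homoscedastic model.
    `forward` runs one stochastic forward pass; `tau` is the fixed
    model precision, the same for every input x."""
    samples = np.array([forward(x) for _ in range(n_samples)])
    # Predictive variance = variance of the stochastic passes
    # plus the constant observation noise 1/tau.
    return samples.mean(), samples.var() + 1.0 / tau

# Toy stochastic "network": dropout on a single hidden feature.
def toy_forward(x):
    mask = rng.random() < 0.5       # dropout with p = 0.5
    return 2.0 * x * mask / 0.5     # inverted-dropout scaling

mean, var = mc_dropout_predict(toy_forward, 1.0, tau=1.0)
```

Note that whatever the input, the $1/\tau$ term contributes the same amount of uncertainty, which is exactly the limitation discussed here.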

We can easily adapt the model to obtain data-dependent noise. This simply involves making $\tau$ into a function of the data, very much like $\mu^W$ being a function of the data. We can tie the two functions together, splitting the top layer of a network between predictive mean $\mu(x)$ and predictive variance $\tau^{-1}(x)$ (of course we would want to re-parametrise this to make sure $\tau$ is positive). Thus the new (now heteroscedastic!) model likelihood is given by $y_i \sim N(\mu^W(x_i), \tau^W(x_i)^{-1} I)$.

We can implement this new model by slightly adapting the cost function of the original model (I'll put a paper online with the derivations if there's interest). In the meantime you can browse through the code for this new cost function (named HeteroscedasticRegressionLayer in my implementation in ConvnetJS). We estimate our predictive variance like before by averaging stochastic forward passes through the model, both for $\mu$ and for $\tau$ (we made the observation noise parameter $\tau$ explicit, but there are other sources of uncertainty in our model; we will see an example below).
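The averaging step can be sketched as follows, assuming a `forward` that returns one dropout sample of both heads $(\mu, \log\tau)$; the toy network and its noise profile are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(1)

def mc_heteroscedastic_predict(forward, x, n_samples=20000):
    """Average stochastic forward passes of a network whose top layer is
    split between a mean head mu(x) and a log-precision head log_tau(x)."""
    mus, inv_taus = [], []
    for _ in range(n_samples):
        mu, log_tau = forward(x)
        mus.append(mu)
        inv_taus.append(np.exp(-log_tau))
    mus = np.array(mus)
    # Predictive variance = averaged input-dependent noise 1/tau(x)
    # plus the spread of the mean samples (the model's own uncertainty).
    return mus.mean(), np.mean(inv_taus) + mus.var()

# Toy stochastic network: dropout on the mean head, and a
# log-precision that falls (noise grows) away from the origin.
def toy_forward(x):
    mask = rng.random() < 0.5
    return x * mask / 0.5, -abs(x)

mean, var = mc_heteroscedastic_predict(toy_forward, 2.0)
```

Unlike the homoscedastic estimate, the observation-noise term here varies with $x$, which is what lets the model inflate its uncertainty in the noisy region of the plane.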

Examples

I put together a few interactive demos, demonstrating the differences between the homoscedastic model and the heteroscedastic one.

First, homoscedastic models with fixed observation noise (either large or small) cannot increase model uncertainty when the amount of observed noise increases rapidly (as we see on the right-hand side of the plane in the different figures). In this interactive example we have large(-ish) observation noise (model precision $\tau = 1$), and as we can see:

Homoscedastic dropout regression example, large observation noise

our model uncertainty does not increase at the right-hand side of the plane. This behaviour is shared with the Gaussian process (which this model approximates), as can be seen here:

Homoscedastic GP regression example, large observation noise

Decreasing our (still) fixed observation noise to zero we can see that the model will try to fit through all points (and indeed overfit if left to run for enough time). We still see that the model is uncertain about parts of the space, demonstrating nicely that the observation noise is not the only factor in determining model confidence:

Homoscedastic dropout regression example, small observation noise

Again, we see the same behaviour with the Gaussian process:

Homoscedastic GP regression example, small observation noise

This interactive dropout demo is given here. The Matlab code for the Gaussian process experiments is available here, with a dependency on GPML.

Lastly, our new heteroscedastic model is demonstrated here. This model manages to increase model uncertainty in the parts of the plane where there is higher noise:

Heteroscedastic dropout regression example

You can play with this interactive demo and add more points to the dataset. Further examples with a different function ($y = x + \sin(\alpha(x + w)) + \sin(\beta(x + w)) + w$ with $w \sim N(0, 0.03^2)$, $\alpha = 4$, $\beta = 13$, used in Deep Exploration via Bootstrapped DQN) are given as well.
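That test function is easy to reproduce if you want to generate your own data; a small sketch (the function name is mine):

```python
import numpy as np

rng = np.random.default_rng(2)

def bootstrapped_dqn_curve(x, alpha=4.0, beta=13.0, noise_std=0.03):
    """y = x + sin(alpha*(x + w)) + sin(beta*(x + w)) + w,
    with w ~ N(0, noise_std^2), as in Bootstrapped DQN."""
    w = rng.normal(0.0, noise_std, size=np.shape(x))
    return x + np.sin(alpha * (x + w)) + np.sin(beta * (x + w)) + w

x = np.linspace(0.0, 1.0, 50)
y = bootstrapped_dqn_curve(x)
```

Note the noise $w$ enters both inside the sinusoids and additively, so the observed noise level itself varies with the local slope of the function.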

These demos illustrate the differences between homoscedastic and heteroscedastic regression with dropout uncertainty. ConvnetJS was used here as a framework to interactively demonstrate the properties underlying dropout uncertainty. ConvnetJS was originally developed by Karpathy under the MIT license, which is used here as well.
