
evostrats

[Image: evostrats.png]

Sam Greydanus | 2020

Summary

Quite a few functions are hard to optimize because they have many local minima. These functions often have "rough" loss surfaces: if you compute the analytic gradient at one point, then move a small distance away and compute the gradient again, the two gradients look very different. One scenario where this happens is gradient-based metalearning (also called bilevel optimization). Here's an example from Metz et al. (2019):

[Image: surface.png]

In situations like this one, it helps to smooth the loss surface, and thus the gradients, before optimization. Doing this points us toward regions of parameter space where the overall density of good solutions is higher. Once we reach those regions (e.g., the upper right corner of the plots above), we can anneal the smoothing factor and zero in on a particularly good solution.
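One standard way to make this smoothing precise (my formalization; the text above doesn't write it out) is to replace the loss $J$ with its Gaussian blur and differentiate the blurred surrogate:

$$J_\sigma(\theta) = \mathbb{E}_{\epsilon \sim \mathcal{N}(0, I)}\big[J(\theta + \sigma\epsilon)\big], \qquad \nabla_\theta J_\sigma(\theta) = \frac{1}{\sigma}\,\mathbb{E}_{\epsilon \sim \mathcal{N}(0, I)}\big[J(\theta + \sigma\epsilon)\,\epsilon\big].$$

Annealing the smoothing scale $\sigma$ toward zero shrinks the blur, so the surrogate gradient approaches the true gradient.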

Any amount of noise in the optimization process helps to smooth the loss surface. Stochastic gradient descent is a good example: empirical results suggest that smaller batch sizes (noisier gradients) can actually improve generalization in neural networks. But I agree with Metz et al. (2019) when they argue that this isn't enough to stabilize metalearning. A more extreme alternative is to use evolution strategies (ES) as a stochastic, unbiased gradient estimator (its "mutation rate" plays the role of the "smoothness factor"). Many people would say that this is crazy...

...but the purpose of this little library is to show otherwise. It consists of an ES gradient estimator and a series of benchmarks that measure how well the estimator scales to high-dimensional neural network parameter spaces. Since one can never train too many MNIST classifiers, here are some results on MNIST:

[Image: evostrats_mnist.png]
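To make the idea concrete, here is a minimal sketch of an antithetic ES gradient estimator in PyTorch. It illustrates the general technique rather than this library's actual API; the names `es_grad`, `loss_fn`, `sigma`, and `npop` are all hypothetical.

```python
# Minimal sketch of an antithetic ES gradient estimator
# (illustrative only, not this library's actual API).
import torch

def es_grad(loss_fn, theta, sigma=0.1, npop=50):
    """Estimate the gradient of loss_fn at theta from forward passes only.

    loss_fn: maps a flat parameter vector to a scalar loss (a black box).
    sigma:   mutation scale; larger values mean a smoother surrogate loss.
    npop:    number of perturbation pairs (antithetic pairs reduce variance).
    """
    grad = torch.zeros_like(theta)
    for _ in range(npop):
        eps = torch.randn_like(theta)
        # Evaluate the loss at an antithetic pair of perturbed parameters.
        delta = loss_fn(theta + sigma * eps) - loss_fn(theta - sigma * eps)
        grad = grad + (delta / (2 * sigma)) * eps
    return grad / npop
```

Each call costs `2 * npop` forward evaluations and no backward passes, which is why sample efficiency (how small `npop` can be while keeping the estimate useful) matters so much at neural-network scale.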

The ultimate purpose of this ES gradient estimator is to optimize metalearning objectives, so I am particularly interested in maximizing sample efficiency. After all, in metalearning each "sample" is an entire learning process, and these learning processes are extraordinarily expensive to compute.

Dependencies

  • NumPy
  • SciPy
  • PyTorch

