evolution's Introduction

evolution

This is a local (not distributed) Go (not Python) implementation of Evolution Strategies as a Scalable Alternative to Reinforcement Learning (Salimans et al.). The original starter code from the paper can be found at openai/evolution-strategies-starter. Under the covers it uses openai/gym-http-api, more specifically its Go binding (binding-go), along with unixpickle/anynet and unixpickle/anyvec for efficient high-level vector computation. Enjoy!

instructions

The goal is to solve CartPole-v0, which requires an average reward of at least 195 over 100 consecutive episodes. Install openai/gym; openai/gym-http-api is also required, since the Go source depends on it.
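The "solved" criterion is easy to state as code. This is a hypothetical helper `solved`, not part of the repository, applying Gym's CartPole-v0 threshold to a history of per-episode rewards:

```go
package main

import "fmt"

// solved reports whether a run meets Gym's CartPole-v0 criterion: an average
// reward of at least 195 over the last 100 consecutive episodes.
func solved(rewards []float64) bool {
	const window, threshold = 100, 195.0
	if len(rewards) < window {
		return false // not enough episodes yet
	}
	sum := 0.0
	for _, r := range rewards[len(rewards)-window:] {
		sum += r
	}
	return sum/window >= threshold
}

func main() {
	perfect := make([]float64, 100)
	for i := range perfect {
		perfect[i] = 200 // CartPole-v0 episodes are capped at 200 steps
	}
	fmt.Println(solved(perfect))      // true
	fmt.Println(solved(perfect[:50])) // false: fewer than 100 episodes
}
```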

Get the binary: clone or download the repository however you like, or just run

$ go get github.com/wenkesj/evolution

In a separate terminal, start the gym server from wherever github.com/openai/gym-http-api is located on your filesystem:

$ python gym_http_server.py

Run the trainer and evaluator with whatever concoction of flags you choose, for example:

$ # 200 episodes of "training" by 2 agents and 100
$ # final episodes of evaluation with a single agent,
$ # saving results to the directory "~/agents2eps200"
$ evolution --outmonitor ~/agents2eps200 \
  --finalepisodes 100 \
  --episodes 200 \
  --agents 2

example results

So, after 42 episodes, the 2 agents have evolved enough to simply destroy the game on their own. In this simple case, we apply a cutoff: an average reward of 195 or above for both agents, signifying that the parameters should, on average, be able to solve the game with a single offspring. So we put that to the test with a single agent.

And it works! We get an average reward of 198.5 over 100 episodes!
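The cutoff rule amounts to requiring every agent to clear the threshold. Here is a minimal sketch with a hypothetical `reachedCutoff` helper; the repository's actual check may differ (for example in how each agent's average is windowed), and the reward values in `main` are illustrative inputs, not results:

```go
package main

import "fmt"

// reachedCutoff reports whether every agent's average reward is at least
// cutoff, i.e. whether training can stop early.
func reachedCutoff(avgRewards []float64, cutoff float64) bool {
	if len(avgRewards) == 0 {
		return false // no agents, nothing to stop
	}
	for _, avg := range avgRewards {
		if avg < cutoff {
			return false // one agent below the cutoff keeps training going
		}
	}
	return true
}

func main() {
	fmt.Println(reachedCutoff([]float64{196.2, 199.0}, 195)) // true
	fmt.Println(reachedCutoff([]float64{196.2, 180.5}, 195)) // false
}
```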

roadmap

disclaimer

This is a project for my Complex Systems and Networks class. It isn't meant to be comparable to the original work; I'm not a master coder/statistical god/Andrej Karpathy. I just thought this was a cool idea. This is an implementation with results and interpretation.


evolution's Issues

Normally-distributed noise

Theoretically, Evolution Strategies requires normally-distributed noise in order to be an accurate gradient estimator. I see here that noise is generated as a random number from [-1, 1], which is not a normal distribution (rather, it's a uniform distribution). You can sample a normal distribution with r.NormFloat64(). I'd love to see how/if this affects performance.
