
async-deep-rl's Introduction

What is in this repo?

Join the chat at https://gitter.im/traai/async-deep-rl

A TensorFlow-based implementation of all algorithms presented in Asynchronous Methods for Deep Reinforcement Learning.

This implementation uses processes instead of threads to achieve real concurrency. Each process has a local replica of the network(s), implemented in TensorFlow, and runs its own TensorFlow session. In addition, a copy of the network parameters is kept in a shared memory space. At runtime, each process uses its own local network(s) to choose actions and compute gradients (with TensorFlow). The shared network parameters are updated periodically and asynchronously, by applying the gradients obtained from TensorFlow to the parameters held in shared memory.
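As a rough illustration of this shared-parameter mechanism, here is a minimal sketch (not the repo's actual code; the toy parameter vector, the fake gradients and the function names are assumptions) of several processes applying updates to one parameter block held in a multiprocessing shared array:

import numpy as np
from multiprocessing import Process, RawArray

PARAM_SIZE = 4  # toy parameter vector; the real networks are much larger

def make_shared_params(initial):
    # Allocate a lock-free shared float array holding the master parameters.
    shared = RawArray('d', PARAM_SIZE)
    np.frombuffer(shared)[:] = initial
    return shared

def actor_learner(shared, learning_rate, steps):
    # View the shared memory as a numpy array; writes go straight to shared memory.
    theta = np.frombuffer(shared)
    for _ in range(steps):
        local_theta = theta.copy()          # sync the local replica from the shared params
        grads = np.ones_like(local_theta)   # stand-in for TensorFlow-computed gradients
        theta -= learning_rate * grads      # asynchronous, lock-free (Hogwild!-style) update

if __name__ == '__main__':
    # NOTE: assumes the default 'fork' start method (Linux / older macOS Pythons);
    # RawArray cannot be pickled under the 'spawn' start method.
    shared = make_shared_params(np.zeros(PARAM_SIZE))
    workers = [Process(target=actor_learner, args=(shared, 0.0007, 100))
               for _ in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(np.frombuffer(shared))  # roughly -0.28 per entry (some updates may race)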

Both ALE and OpenAI Gym environments can be used.

Results

The graphs below show the reward achieved in different games by one individual actor during training (i.e., not averaged over several runs and over all actors, as in the paper). All experiments were run on a rather old machine equipped with two Xeon E5540 quad-core 2.53 GHz CPUs (16 virtual cores) and 47 GB of RAM.

Boxing-v0 (from OpenAI Gym), A3C, 100 actors, lr=0.0007, 80M steps in 59h, 31m: As the graph shows, the score achieved is much higher than the one reported in the paper. This is the effect of having 100 actors: concurrently exploring the environment in different ways clearly helps the learning process and removes the need for experience replay. Note, however, that training time is slightly worse than with fewer actors, probably because our implementation is not optimal and because of the limitations of the machine we used.

Pong (from ALE), A3C, 16 actors, lr=0.0007, 80M steps in 48h:

Beam Rider (from ALE), A3C, 16 actors, lr=0.0007, 80M steps in 45h, 25min:

Breakout (from ALE), A3C, 15 actors, lr=0.0007, 80M steps in 53h, 22m:

How to run the algorithms (Mac OS X for now)?

A number of hyperparameters can be specified. Default values have been chosen according to the paper and information received by @muupan from the authors. To see a list, please run:

python main.py -h

If you just want to see the code in action, you can kick off training with the default hyperparameters by running:

python main.py pong --rom_path ../atari_roms/

To run outside of Docker, you need to install some dependencies:

  • TensorFlow
  • OpenAI Gym
  • The Arcade Learning Environment (ALE). (Note that OpenAI Gym uses ALE internally, so you could use that version, although this would require some hacking.)
  • scikit-image
  • OpenCV 2, for standalone ALE. (It should be possible to change the code in emulator.py to use scikit-image instead of OpenCV. Indeed, OpenCV might slow things down; see the sketch below.)
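A minimal sketch of what that swap could look like, using scikit-image for the usual Atari frame preprocessing (grayscale plus downscale to 84x84). The function name and the exact target size are assumptions for illustration, not the repo's actual emulator.py code:

import numpy as np
from skimage.color import rgb2gray
from skimage.transform import resize

def preprocess_frame(rgb_frame, out_shape=(84, 84)):
    # Convert an RGB Atari frame to a grayscale 84x84 float array,
    # roughly what the cv2.cvtColor + cv2.resize calls do.
    gray = rgb2gray(rgb_frame)                       # float image in [0, 1]
    return resize(gray, out_shape, mode='constant')  # anti-aliased downscale

if __name__ == '__main__':
    fake_frame = np.random.randint(0, 256, size=(210, 160, 3)).astype(np.uint8)
    print(preprocess_frame(fake_frame).shape)        # (84, 84)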

To run inside Docker:

(1) Clone this repo at ~/some-path.

(2) Make sure your machine has Docker installed. If not, follow the instructions [here](https://docs.docker.com/toolbox/toolbox_install_mac/). [These](https://docs.docker.com/toolbox/toolbox_install_windows/) instructions may work for Windows.

(3) Make sure you have XQuartz installed in order to visualise game play. Do the following in a separate terminal window:

$ brew cask install --force xquartz
$ open -a XQuartz
$ socat TCP-LISTEN:6000,reuseaddr,fork UNIX-CLIENT:\"$DISPLAY\"

(4) Get our docker image containing all dependencies to run the algorithms and to visualise game play.

$ docker pull restrd/tensorflow-atari-cpu

(5) Run the docker image. This will mount your home folder to /your-user-name inside the container. Be sure to give a name to the container: <container-name>

$ docker run -d -p 8888:8888 -p 6006:6006 --name "<container-name>" -v ~/:/root/$usr -e DISPLAY=$(ifconfig vboxnet0 | awk '$1 == "inet" {gsub(/\/.*$/, "", $2); print $2}'):0 -it docker.io/restrd/tensorflow0.10-atari-cpu

(6) Shell into the container.

$ docker exec -it <container-name> /bin/bash

(7) Go to the algorithms folder (/your-user-name/some-path/async-deep-rl/algorithms) and choose which algorithm to run via the configuration options in main.py.

(8) If you want to run the algorithms using OpenAI Gym with 16 processes and visualise the games, run, e.g.:

$ python main.py BeamRider-v0 --env GYM -n 16 -v 1 

Running TensorBoard

You can also run TensorBoard to visualise losses and game scores.

(1) Configure port forwarding rules in [VirtualBox](https://www.virtualbox.org/). Go to your running virtual machine's Settings > Network > Port Forwarding, and add a new rule (see the row starting with tb in the screenshot below).

Setting port forwarding in VirtualBox

(2) Run tensorboard from within the container:

$ tensorboard --logdir=/tmp/summary_logs/ &

(3) If you skipped (1), get the IP address of the Docker host running inside [VirtualBox](https://www.virtualbox.org/) and go to http://<docker-host-ip>:6006

If you did (1), go to http://127.0.0.1:6006
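The scores and losses TensorBoard displays are read from the summary log directory passed to --logdir. As a minimal sketch (assuming the TensorFlow 1.x summary API; the tensorflow0.10 image in the Docker instructions uses the older tf.train.SummaryWriter instead, and the tag name here is an assumption, not the repo's actual logging code), scalars could be written like this:

import tensorflow as tf

# Writes scalar summaries that `tensorboard --logdir=/tmp/summary_logs/` can display.
writer = tf.summary.FileWriter('/tmp/summary_logs/')

for step, episode_reward in enumerate([10.0, 12.5, 15.0]):
    summary = tf.Summary(value=[
        tf.Summary.Value(tag='episode_reward', simple_value=episode_reward)])
    writer.add_summary(summary, global_step=step)

writer.flush()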


async-deep-rl's Issues

Problem synchronizing processes

Hi @traai,

Your code has been quite useful and it has helped me to understand asynchronous methods much better.

I have a comment about the following line in the ActorLearner class:

self.barrier.wait()

If I understand correctly, this Barrier class is supposed to synchronize all the processes: they wait once so that the initial parameters can be set by the first process, and once again after all processes have finished synchronizing to these values.

I have checked the Barrier class, however, and I think there are some issues with the wait() method.

def wait(self):
        with self.counter.lock:
            self.counter.val.value += 1
            if self.counter.val.value == self.n:
                self.barrier.release()
        self.barrier.acquire()
        self.barrier.release()

First, you use self.barrier.wait() twice, but the class counter value is never reset, so the condition if self.counter.val.value == self.n will never be satisfied again.

Second, the way the .release() methods are called results in the semaphore value becoming 1 after wait() is called the first time. Because of this, the same logic cannot be applied when calling wait() a second time...

I have created a notebook where you can check what I mean (although it is not very well organized).

I propose to change the wait() function to something like this:

def wait(self,name):
        with self.counter.lock:
            self.counter.val.value += 1
            if self.counter.val.value % self.n == 0:
                for i in range(self.n-1):
                    self.barrier.release()
                return

        self.barrier.acquire()

This way, we do not need to reset the counter value, and the last process to call the method takes care of releasing all the previous ones, leaving the semaphore value at 0.
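For reference, here is a self-contained sketch of that idea as a reusable barrier built on multiprocessing primitives. The Counter wrapper from the repo is replaced by a plain shared Value and Lock, so this is an illustration of the proposed logic, not the repo's actual classes. Because the arrival count is tested with modulo, it never needs resetting and the barrier can be reused:

from multiprocessing import Process, Semaphore, Value, Lock

class Barrier(object):
    # Reusable barrier: the last of n processes to arrive releases the rest.
    def __init__(self, n):
        self.n = n
        self.count = Value('i', 0)   # shared arrival counter
        self.lock = Lock()
        self.sem = Semaphore(0)      # starts "closed"

    def wait(self):
        with self.lock:
            self.count.value += 1
            if self.count.value % self.n == 0:   # last arrival of this round
                for _ in range(self.n - 1):
                    self.sem.release()           # wake all earlier arrivals
                return
        self.sem.acquire()                       # earlier arrivals block here

def worker(barrier, worker_id):
    print('worker %d before first barrier' % worker_id)
    barrier.wait()
    print('worker %d between barriers' % worker_id)
    barrier.wait()                               # reusable: works a second time
    print('worker %d done' % worker_id)

if __name__ == '__main__':
    n = 4
    barrier = Barrier(n)
    procs = [Process(target=worker, args=(barrier, i)) for i in range(n)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()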

How to start running?

The instructions in the README.md say to run main.py as follows:

python main.py BeamRider-v0 --env GYM -n 16 -v 1

However, main.py has been removed. Can you update the run instructions?
