
async-deep-rl's Introduction

What is in this repo?

Join the chat at https://gitter.im/traai/async-deep-rl

A TensorFlow-based implementation of all algorithms presented in Asynchronous Methods for Deep Reinforcement Learning.

This implementation uses processes instead of threads to achieve real concurrency. Each process has a local replica of the network(s), implemented in TensorFlow, and runs its own TensorFlow session. In addition, a copy of the network parameters is kept in a shared memory space. At runtime, each process uses its own local network(s) to choose actions and compute gradients (with TensorFlow). The shared network parameters are updated periodically and asynchronously, by applying the gradients obtained from TensorFlow to the parameters held in shared memory.
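As a rough illustration of this shared-parameter mechanism, here is a minimal sketch (not the repo's actual code; the toy parameter vector, the fake gradients and the function names are assumptions) of several processes applying updates to one parameter block held in a multiprocessing shared array:

import numpy as np
from multiprocessing import Process, RawArray

PARAM_SIZE = 4  # toy parameter vector; the real networks are much larger

def make_shared_params(initial):
    # Allocate a lock-free shared float array holding the master parameters.
    shared = RawArray('d', PARAM_SIZE)
    np.frombuffer(shared)[:] = initial
    return shared

def actor_learner(shared, learning_rate, steps):
    # View the shared memory as a numpy array; writes go straight to shared memory.
    theta = np.frombuffer(shared)
    for _ in range(steps):
        local_theta = theta.copy()          # sync the local replica from the shared params
        grads = np.ones_like(local_theta)   # stand-in for TensorFlow-computed gradients
        theta -= learning_rate * grads      # asynchronous, lock-free (Hogwild!-style) update

if __name__ == '__main__':
    # NOTE: assumes the default 'fork' start method (Linux / older macOS Pythons);
    # RawArray cannot be pickled under the 'spawn' start method.
    shared = make_shared_params(np.zeros(PARAM_SIZE))
    workers = [Process(target=actor_learner, args=(shared, 0.0007, 100))
               for _ in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(np.frombuffer(shared))  # roughly -0.28 per entry (some updates may race)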

Both ALE and OpenAI Gym environments can be used.

Results

The graphs below show the reward achieved in different games by one individual actor during training (i.e., not averaged over several runs and over all actors, as in the paper). All experiments were run on a rather old machine equipped with two Xeon E5540 quad-core 2.53 GHz CPUs (16 virtual cores) and 47 GB of RAM.

Boxing-v0 (from OpenAI Gym), A3C, 100 actors, lr=0.0007, 80M steps in 59h, 31m: As the graph shows, the score achieved is much higher than the one reported in the paper. This is the effect of having 100 actors: concurrently exploring the environment in different ways clearly helps the learning process and removes the need for experience replay. Note, however, that training time is slightly worse than with fewer actors, probably because our implementation is not optimal and because of the limitations of the machine we used.

Pong (from ALE), A3C, 16 actors, lr=0.0007, 80M steps in 48h:

Beam Rider (from ALE), A3C, 16 actors, lr=0.0007, 80M steps in 45h, 25min:

Breakout (from ALE), A3C, 15 actors, lr=0.0007, 80M steps in 53h, 22m:

How to run the algorithms (Mac OS X for now)?

A number of hyperparameters can be specified. Default values have been chosen according to the paper and information received by @muupan from the authors. To see a list, please run:

python main.py -h

If you just want to see the code in action, you can kick off training with the default hyperparameters by running:

python main.py pong --rom_path ../atari_roms/

To run outside of Docker, you need to install some dependencies:

  • TensorFlow
  • OpenAI Gym
  • The Arcade Learning Environment (ALE). (Note that OpenAI Gym uses ALE internally, so you could use that version, although this would require some hacking.)
  • scikit-image
  • OpenCV 2, for standalone ALE. (It should be possible to change the code in emulator.py to use scikit-image instead of OpenCV. Indeed, OpenCV might slow things down; see the sketch below.)
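A minimal sketch of what that swap could look like, using scikit-image for the usual Atari frame preprocessing (grayscale plus downscale to 84x84). The function name and the exact target size are assumptions for illustration, not the repo's actual emulator.py code:

import numpy as np
from skimage.color import rgb2gray
from skimage.transform import resize

def preprocess_frame(rgb_frame, out_shape=(84, 84)):
    # Convert an RGB Atari frame to a grayscale 84x84 float array,
    # roughly what the cv2.cvtColor + cv2.resize calls do.
    gray = rgb2gray(rgb_frame)                       # float image in [0, 1]
    return resize(gray, out_shape, mode='constant')  # anti-aliased downscale

if __name__ == '__main__':
    fake_frame = np.random.randint(0, 256, size=(210, 160, 3)).astype(np.uint8)
    print(preprocess_frame(fake_frame).shape)        # (84, 84)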

To run inside Docker:

(1) Clone this repo at ~/some-path.

(2) Make sure your machine has Docker installed. If not, follow the instructions [here](https://docs.docker.com/toolbox/toolbox_install_mac/). [These](https://docs.docker.com/toolbox/toolbox_install_windows/) instructions may work for Windows.

(3) Make sure you have XQuartz installed in order to visualise game play. Do the following in a separate terminal window:

$ brew cask install --force xquartz
$ open -a XQuartz
$ socat TCP-LISTEN:6000,reuseaddr,fork UNIX-CLIENT:\"$DISPLAY\"

(4) Get our docker image containing all dependencies to run the algorithms and to visualise game play.

$ docker pull restrd/tensorflow-atari-cpu

(5) Run the docker image. This will mount your home folder to /your-user-name inside the container. Be sure to give a name to the container: <container-name>

$ docker run -d -p 8888:8888 -p 6006:6006 --name "<container-name>" -v ~/:/root/$usr -e DISPLAY=$(ifconfig vboxnet0 | awk '$1 == "inet" {gsub(/\/.*$/, "", $2); print $2}'):0 -it docker.io/restrd/tensorflow0.10-atari-cpu

(6) Shell into the container.

$ docker exec -it <container-name> /bin/bash

(7) Go to the algorithms folder (/your-user-name/some-path/async-deep-rl/algorithms) and choose which algorithm to run via the configuration options in main.py.

(8) If you want to run the algorithms using OpenAI Gym with 16 processes and visualise the games, run, e.g.:

$ python main.py BeamRider-v0 --env GYM -n 16 -v 1 

Running TensorBoard

You can also run TensorBoard to visualise losses and game scores.

(1) Configure port forwarding rules in [VirtualBox](https://www.virtualbox.org/). Go to your running virtual machine's Settings > Network > Port Forwarding, and add a new rule (see the row starting with tb in the screenshot below).

Setting port forwarding in VirtualBox

(2) Run tensorboard from within the container:

$ tensorboard --logdir=/tmp/summary_logs/ &

(3) If you skipped (1), get the IP address of the Docker host running inside [VirtualBox](https://www.virtualbox.org/) and go to http://<docker-host-ip>:6006

If you did (1), go to http://127.0.0.1:6006
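The scores and losses TensorBoard displays are read from the summary log directory passed to --logdir. As a minimal sketch (assuming the TensorFlow 1.x summary API; the tensorflow0.10 image in the Docker instructions uses the older tf.train.SummaryWriter instead, and the tag name here is an assumption, not the repo's actual logging code), scalars could be written like this:

import tensorflow as tf

# Writes scalar summaries that `tensorboard --logdir=/tmp/summary_logs/` can display.
writer = tf.summary.FileWriter('/tmp/summary_logs/')

for step, episode_reward in enumerate([10.0, 12.5, 15.0]):
    summary = tf.Summary(value=[
        tf.Summary.Value(tag='episode_reward', simple_value=episode_reward)])
    writer.add_summary(summary, global_step=step)

writer.flush()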


async-deep-rl's Issues

Problem synchronizing processes

Hi @traai,

Your code has been quite useful and it has helped me to understand asynchronous methods much better.

I have a comment about the following line in the ActorLearner class:

self.barrier.wait()

If I understand correctly, this Barrier class is supposed to synchronize all the processes: they wait once so that the initial parameters can be set by the first process, and once again after all processes have finished synchronizing to these values.

I have checked the Barrier class, however, and I think there are some issues with the wait() method.

def wait(self):
        with self.counter.lock:
            self.counter.val.value += 1
            if self.counter.val.value == self.n:
                self.barrier.release()
        self.barrier.acquire()
        self.barrier.release()

First, you use self.barrier.wait() twice, but the class counter value is never reset, so the condition if self.counter.val.value == self.n will never be satisfied again.

Second, the way the .release() methods are called results in the semaphore value becoming 1 after wait() is called the first time. Because of this, the same logic cannot be applied when calling wait() a second time...

I have created a notebook where you can check what I mean (although it is not very well organized).

I propose to change the wait() function to something like this:

def wait(self,name):
        with self.counter.lock:
            self.counter.val.value += 1
            if self.counter.val.value % self.n == 0:
                for i in range(self.n-1):
                    self.barrier.release()
                return

        self.barrier.acquire()

This way, we do not need to reset the counter value, and the last process to call the method takes care of releasing all the previous ones, leaving the semaphore value at 0.
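For reference, here is a self-contained sketch of that idea as a reusable barrier built on multiprocessing primitives. The Counter wrapper from the repo is replaced by a plain shared Value and Lock, so this is an illustration of the proposed logic, not the repo's actual classes. Because the arrival count is tested with modulo, it never needs resetting and the barrier can be reused:

from multiprocessing import Process, Semaphore, Value, Lock

class Barrier(object):
    # Reusable barrier: the last of n processes to arrive releases the rest.
    def __init__(self, n):
        self.n = n
        self.count = Value('i', 0)   # shared arrival counter
        self.lock = Lock()
        self.sem = Semaphore(0)      # starts "closed"

    def wait(self):
        with self.lock:
            self.count.value += 1
            if self.count.value % self.n == 0:   # last arrival of this round
                for _ in range(self.n - 1):
                    self.sem.release()           # wake all earlier arrivals
                return
        self.sem.acquire()                       # earlier arrivals block here

def worker(barrier, worker_id):
    print('worker %d before first barrier' % worker_id)
    barrier.wait()
    print('worker %d between barriers' % worker_id)
    barrier.wait()                               # reusable: works a second time
    print('worker %d done' % worker_id)

if __name__ == '__main__':
    n = 4
    barrier = Barrier(n)
    procs = [Process(target=worker, args=(barrier, i)) for i in range(n)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()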

How to start running?

The instructions in the README.md say to run main.py as follows:

python main.py BeamRider-v0 --env GYM -n 16 -v 1

However, main.py has been removed. Can you update the run instructions?
