async-rl's Introduction

Asynchronous RL in TensorFlow + Keras + OpenAI's Gym

This is a TensorFlow + Keras implementation of asynchronous 1-step Q-learning as described in "Asynchronous Methods for Deep Reinforcement Learning".

Since we're using multiple actor-learner threads to stabilize learning in place of experience replay (which is very memory intensive), this runs comfortably on a MacBook with 4 GB of RAM.

It uses Keras to define the deep Q-network (see model.py), OpenAI's gym library to interact with the Arcade Learning Environment (see atari_environment.py), and TensorFlow for optimization/execution (see async_dqn.py).
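For reference, the core of 1-step Q-learning is the target each actor-learner thread regresses its Q-network toward. Here is a minimal NumPy sketch of that target (illustrative only, not the repo's actual code):

import numpy as np

def one_step_q_target(reward, terminal, next_q_values, gamma=0.99):
    # 1-step Q-learning target: r if the episode ended, else r + gamma * max_a' Q(s', a').
    if terminal:
        return reward
    return reward + gamma * np.max(next_q_values)

# Example: Q-values for the next state, as predicted by the slowly-updated target network.
print(one_step_q_target(reward=1.0, terminal=False, next_q_values=np.array([0.2, 0.5, 0.1])))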

Requirements

Usage

Training

To kick off training, run:

python async_dqn.py --experiment breakout --game "Breakout-v0" --num_concurrent 8

Here we're organizing the outputs for the current experiment under a folder called 'breakout', choosing "Breakout-v0" as our gym environment, and running 8 actor-learner threads concurrently. See the gym environment registry for a full list of game names you can hand to --game.
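If you want to print the available environment IDs yourself, something like the following works with gym versions from that era (a sketch; it assumes gym.envs.registry.all() is available):

import gym

# Print every environment id registered with gym; the Atari ones (e.g. "Breakout-v0")
# are the names this repo accepts via --game.
for spec in sorted(gym.envs.registry.all(), key=lambda s: s.id):
    print(spec.id)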

Visualizing training with TensorBoard

We collect episode reward stats and max Q values that can be visualized with TensorBoard by running the following:

tensorboard --logdir /tmp/summaries/breakout

This is what my per-episode reward and average max Q value curves looked like over the training period.
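For context, those stats are written as TensorFlow scalar summaries, roughly like the sketch below (TF 1.x-style API; the placeholder and summary names here are illustrative, not necessarily the ones in async_dqn.py):

import tensorflow as tf

# Scalars fed in at the end of each episode.
episode_reward = tf.placeholder(tf.float32, name="episode_reward")
episode_avg_max_q = tf.placeholder(tf.float32, name="episode_avg_max_q")
tf.summary.scalar("Episode Reward", episode_reward)
tf.summary.scalar("Average Max Q", episode_avg_max_q)
summary_op = tf.summary.merge_all()

writer = tf.summary.FileWriter("/tmp/summaries/breakout")
with tf.Session() as sess:
    summary = sess.run(summary_op, {episode_reward: 12.0, episode_avg_max_q: 1.7})
    writer.add_summary(summary, global_step=0)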

Evaluation

To run a gym evaluation, set the --testing flag to True and pass in a checkpoint file:

python async_dqn.py --experiment breakout --testing True --checkpoint_path /tmp/breakout.ckpt-2690000 --num_eval_episodes 100

After completing the eval, we can upload our eval file to OpenAI's site as follows:

import gym
gym.upload('/tmp/breakout/eval', api_key='YOUR_API_KEY')

Now we can find the eval at https://gym.openai.com/evaluations/eval_uwwAN0U3SKSkocC0PJEwQ

Next Steps

See a3c.py for a work-in-progress implementation of asynchronous advantage actor-critic (A3C).
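As a rough reminder of what A3C involves: the value head regresses toward discounted n-step returns and the policy gradient is scaled by the advantage. A minimal NumPy sketch of the return/advantage computation (illustrative, not the code in a3c.py):

import numpy as np

def n_step_returns_and_advantages(rewards, values, bootstrap_value, gamma=0.99):
    # Discounted n-step returns R_t and advantages R_t - V(s_t) for one rollout.
    returns = np.zeros(len(rewards))
    R = bootstrap_value  # V(s_{t_max}) from the value network, or 0 if the episode ended
    for t in reversed(range(len(rewards))):
        R = rewards[t] + gamma * R
        returns[t] = R
    advantages = returns - np.asarray(values)
    return returns, advantages

# Example rollout of length 3 with hypothetical rewards and value estimates.
print(n_step_returns_and_advantages([0.0, 0.0, 1.0], [0.4, 0.5, 0.6], bootstrap_value=0.0))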

Resources

I found these super helpful as general background materials for deep RL:

Important notes

  • In the paper the authors mention that "for asynchronous methods we average over the best 5 models from 50 experiments". I overlooked this point when I was writing this, but I think it's important. These async methods seem to vary a lot in performance from run to run (at least in my implementation of them!). It's a good idea to run multiple seeded versions at the same time and average over their performance to get a clear picture of whether an architectural change actually helps (see the averaging sketch after this list). Likewise, don't get discouraged if you don't see good performance on your task right away; try rerunning the same code a few more times with different seeds.
  • This repo has no affiliation with DeepMind or the authors; it was just a simple project I was using to learn TensorFlow. Feedback is highly appreciated.
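
A minimal sketch of averaging per-episode reward curves over several seeded runs (the file names are hypothetical; the repo doesn't write these files out for you):

import numpy as np

# Hypothetical: each file holds the per-episode rewards logged by one seeded run.
runs = [np.loadtxt("breakout_seed%d_rewards.txt" % seed) for seed in range(5)]

# Truncate to the shortest run so the curves line up, then average across runs.
horizon = min(len(r) for r in runs)
mean_curve = np.mean([r[:horizon] for r in runs], axis=0)
print(mean_curve[:10])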

async-rl's People

Contributors

bryant1410, coreylynch, ei-grad, osh, xu-song

async-rl's Issues

About the randomness of the performance

I am currently trying to run your code and get the same performance, but the mean reward is stuck around a score of 5. I have tried to run it three times and I got the same performance each time. The code seems to run fine though.

How random is the performance? How many trials did you do before obtaining the results presented in the README?

pretrained model

Hi @coreylynch, thanks for the awesome project!

I was wondering, do you have the Keras weights of a pretrained agent somewhere? I was looking to do some quick visualizations with Breakout.

Best,
-eder

tf.Variable unexpected keyword 'dtype'

Using TensorFlow backend.
can't determine number of CPU cores: assuming 4
I tensorflow/core/common_runtime/local_device.cc:25] Local device intra op parallelism threads: 4
can't determine number of CPU cores: assuming 4
I tensorflow/core/common_runtime/local_session.cc:45] Local session inter op parallelism threads: 4
INFO:gym.envs.registration:Making new env: Breakout-v0
[2016-07-16 22:50:28,278] Making new env: Breakout-v0
Traceback (most recent call last):
File "async_dqn.py", line 310, in
tf.app.run()
File "/Library/Python/2.7/site-packages/tensorflow/python/platform/default/_app.py", line 11, in run
sys.exit(main(sys.argv))
File "async_dqn.py", line 301, in main
graph_ops = build_graph(num_actions)
File "async_dqn.py", line 173, in build_graph
s, q_network = build_network(num_actions=num_actions, agent_history_length=FLAGS.agent_history_length, resized_width=FLAGS.resized_width, resized_height=FLAGS.resized_height)
File "/Users/nathaniel/Downloads/async-rl-master/model.py", line 10, in build_network
model = Convolution2D(nb_filter=16, nb_row=8, nb_col=8, subsample=(4,4), activation='relu', border_mode='same')(inputs)
File "/usr/local/lib/python2.7/site-packages/keras/engine/topology.py", line 458, in call
self.build(input_shapes[0])
File "/usr/local/lib/python2.7/site-packages/keras/layers/convolutional.py", line 296, in build
self.W = self.init(self.W_shape, name='{}_W'.format(self.name))
File "/usr/local/lib/python2.7/site-packages/keras/initializations.py", line 61, in glorot_uniform
return uniform(shape, s, name=name)
File "/usr/local/lib/python2.7/site-packages/keras/initializations.py", line 33, in uniform
name=name)
File "/usr/local/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 103, in variable
v = tf.Variable(value, dtype=_convert_string_dtype(dtype), name=name)
TypeError: __init__() got an unexpected keyword argument 'dtype'

Reward doesn't go up ....

I ran the async DQN model out of the box with 3 seeds on 7 Atari games with 24 threads -- Pong, Breakout, SeaQuest, BeamRider, SpaceInvaders, Qbert, and Enduro. However, the reward stays the same for all the games up to 11M global time steps. I've also run Breakout up to 30M global steps with 5 seeds and the reward doesn't go up either. Has anybody else had this issue?

OSError, why?

OSError: dlopen(/Users/weyn/anaconda/lib/python2.7/site-packages/atari_py/ale_interface/build/libale_c.so, 6): Symbol not found: __ZNKSt5ctypeIcE13_M_widen_initEv
Referenced from: /Users/weyn/anaconda/lib/python2.7/site-packages/atari_py/ale_interface/build/libale_c.so
Expected in: /usr/lib/libstdc++.6.dylib
in /Users/weyn/anaconda/lib/python2.7/site-packages/atari_py/ale_interface/build/libale_c.so

Do results differ only because of the seed?

You write that one should try experiments with multiple seeds. Did you find that results differ substantially given only different seeds?

I'm asking because in the paper, Mnih et al. take the best 5 out of 50 runs with different learning rates. However, from the paper it's not clear to me whether the methods are sensitive to the choice of learning rate or unstable in general.

No local network synchronization

I'm interested as to why you decided not to create a local copy of the variables in the worker threads and sync them with the global network at the end of the rollout. Does that create issues with the global network (which is used for inference during the rollout) being updated in the middle of a rollout? Is there a reason why you changed your algorithm from the one described in the async methods for RL paper?

ValueError: need more than 4 values to unpack

When I try to run a3c.py, I run into the following problem:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "a3c.py", line 71, in actor_learner_thread
s, a, R, minimize, p_network, v_network = graph_ops
ValueError: need more than 4 values to unpack
Following the suggested solution on Stack Overflow, I added a comma to the code, but it failed.
I would appreciate it if anyone could help me.

t_max = 32

Hello,

In the A3C paper they state t_max = 5; is there any reason you set it to 32?

Actually, I don't really understand why the batch size should be so small. Why shouldn't we use traditional batch sizes of 128 or more frames? Shouldn't this make learning stronger?

How to speed up training with GPU?

Hey! Thanks a bunch for sharing this.
I've made some attempts at speeding up the training with a GPU, but if there is any increase at all, it's very little. I get about 10 global frames/steps per second when running the algorithm not on ALE but on a very simple Python script I've written myself. I've tried other GPU-compatible DL algorithms and the slowdown doesn't seem to originate from the script I've written. Do you have any idea how to manage this issue?

When are you planning to have A3C FF (Algorithm 2) and A3C LSTM (Algorithm 3) done?

What is your timeline for having n-step Q-learning / A3C FF (Algorithm 2) and A3C LSTM (Algorithm 3) done, as per your next steps, in Keras + TensorFlow? I have some code for a stock trading game that uses standard deep Q-learning with experience replay, but I would like to use A3C LSTM with experience replay as per the research paper. Let me know if you are interested in working to incorporate the stock trading game into your code (I will email you the zip; it is 6 small Python files). It is in Keras + TensorFlow.

Tensorflow outdated

I guess this code is written for an old version of TensorFlow?

x = tf.reshape(x, tf.pack([-1, prod(shape(x)[1:])]))
AttributeError: 'module' object has no attribute 'pack'

Is it possible to update this code to the latest TensorFlow?

Thanks!
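
For what it's worth, tf.pack was renamed to tf.stack in TensorFlow 1.0, so the offending call can be updated one-for-one. A small self-contained sketch (the shapes and variable names are illustrative, not the repo's):

import tensorflow as tf

# tf.pack was renamed to tf.stack in TensorFlow 1.0; the call is otherwise identical.
x = tf.placeholder(tf.float32, shape=[None, 4, 4, 16])
flat_dim = 4 * 4 * 16  # product of the non-batch dimensions
x_flat = tf.reshape(x, tf.stack([-1, flat_dim]))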

RGB image

How can I use the raw RGB image instead of the grayscaled one?
I am having some trouble with the neural network's input shape, which doesn't match the observation shape (84, 84, 3).

clipping

In the code, the rewards returned from the environment are clipped between -1 and 1. But I believe Breakout will give rewards higher than 1 for bricks in rows nearer the top. What is the rationale for clipping?
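
For reference, the clipping being asked about amounts to something like this (a sketch, not the exact line in the repo):

import numpy as np

# Clip every reward into [-1, 1] so the scale of the updates is comparable across games.
reward = 7.0  # e.g. a brick from one of the upper rows in Breakout
clipped_reward = np.clip(reward, -1.0, 1.0)
print(clipped_reward)  # prints 1.0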
