tf-a3c-gpu

A TensorFlow implementation of the A3C algorithm that uses a GPU (I haven't tested it, but it should also be trainable on a CPU).

The original paper, "Asynchronous Methods for Deep Reinforcement Learning", suggests CPU-only implementations, since the environment can only be executed on the CPU, which otherwise causes unavoidable communication overhead between the CPU and the GPU.

However, by storing all parameters of the policy and value networks on the GPU, we can reduce communication to just the current state and reward instead of whole parameter sets. Furthermore, we can make better use of the GPU by running multiple agents in a single thread. The current implementation (with slightly tuned hyperparameters) uses 4 threads, each with 64 agents. With this setting, I was able to achieve a 2x speedup. (Huh, a little bit disappointing, isn't it?)
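The batching idea above can be sketched in plain Python. This is only an illustration under stated assumptions: `ToyEnv` and `policy_batch` are hypothetical stand-ins for a Gym environment and for one batched forward pass of the network on the GPU, and none of these names come from the repository. The point is that each thread makes a single policy call for all 64 of its agents, and only states and rewards move between the environments and the policy.

```python
import random
import threading

NUM_THREADS = 4          # value stated in this README
AGENTS_PER_THREAD = 64   # value stated in this README


class ToyEnv(object):
    """Hypothetical stand-in for a Gym environment (runs on the CPU)."""

    def __init__(self, seed):
        self.rng = random.Random(seed)
        self.state = 0.0

    def step(self, action):
        # Return a new random state and a toy reward.
        self.state = self.rng.random()
        reward = 1.0 if action == 1 else 0.0
        return self.state, reward


def policy_batch(states):
    """Hypothetical stand-in for ONE batched forward pass on the GPU."""
    return [1 if s > 0.5 else 0 for s in states]


def worker(thread_id, num_steps, totals):
    # Each thread owns its own group of environments.
    envs = [ToyEnv(thread_id * 1000 + i) for i in range(AGENTS_PER_THREAD)]
    states = [env.state for env in envs]
    total = 0.0
    for _ in range(num_steps):
        actions = policy_batch(states)  # one call covers all 64 agents
        results = [env.step(a) for env, a in zip(envs, actions)]
        states = [s for s, _ in results]       # only states ...
        total += sum(r for _, r in results)    # ... and rewards come back
    totals[thread_id] = total


def run(num_steps=10):
    totals = [None] * NUM_THREADS
    threads = [threading.Thread(target=worker, args=(t, num_steps, totals))
               for t in range(NUM_THREADS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return totals
```

Calling `run()` returns one accumulated toy reward per thread. In a real setup the batched call would be a single session run over a batch of 64 observations.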

Therefore, this implementation is not quite an exact re-implementation of the paper, and the effect of batching multiple agents in each thread is worth examining (with different numbers of threads and agents per thread). (However, I am still curious about how A3C achieves such nice results. Is the asynchronous update the only key? I couldn't find other explanations of the effectiveness of this method.) Still, it gave me quite competitive results (3 hours of training on Breakout-v0 for reasonable play), so it could be a good base for someone to start with.

Enjoy :)

Requirements

Python 2.7
Tensorflow v1.2
OpenAI Gym v0.9
scipy, pip (for image resize)
tqdm (optional)
better-exceptions (optional)

Training Results

Training on Breakout-v0 was done on an NVIDIA Titan X Pascal GPU for 28 hours.
With the hyperparameters I used, one step corresponds to 64 * 5 input frames (64 * 5 * 3 game frames on average).
Orange line: with reward clipping (rewards are clipped to [-1, 1]) and gradient normalization. Purple line: without them.
- by the number of steps
- by the number of episodes
- by the time
Check the results on my results page
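The step accounting in this section can be written out as a small calculation. The constants come from the text above. Reading the "5" as the number of inputs each agent contributes per step is my assumption, and the names are hypothetical rather than taken from a3c.py.

```python
AGENTS_PER_THREAD = 64      # agents batched together (from this README)
INPUTS_PER_AGENT = 5        # the "5" in "64 * 5"; its meaning is an assumption
AVG_FRAMES_PER_INPUT = 3    # "average 3 game frames" per input (from this README)


def inputs_per_step(agents=AGENTS_PER_THREAD, per_agent=INPUTS_PER_AGENT):
    """Number of input frames consumed by one training step."""
    return agents * per_agent


def game_frames_per_step(avg_frames=AVG_FRAMES_PER_INPUT):
    """Approximate number of raw game frames behind one training step."""
    return inputs_per_step() * avg_frames


print(inputs_per_step())       # 320
print(game_frames_per_step())  # 960
```

So a plot "by the number of steps" advances roughly 960 game frames per unit on its x-axis under these settings.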

Training from scratch

All the hyperparameters are defined in the a3c.py file. Change the hyperparameters as you like, then execute it.

python a3c.py

Validation with trained models

If you want to see the trained agent playing, use the command:

python a3c-test.py --model ./models/breakout-v0/last.ckpt --out /tmp/result

Notes & Acknowledgement

Here are the other implementations and code I referred to.
- ppwwyyxx's implementation
- carpedm20's implementation of DQN

tf-a3c-gpu's Issues

Why clip Rewards?

I was wondering why clipping the rewards improves performance. The rewards for the Breakout environment (using OpenAI Gym) are already limited to [-1, 1]. Could it be that the performance difference is due to the gradient normalization only?
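To make the question concrete, here is a minimal sketch of reward clipping to [-1, 1]. The function name is hypothetical and this is not the repository's code. It shows only that clipping leaves a reward unchanged whenever it already lies in the range; whether Breakout's raw rewards actually stay within that range is not checked here.

```python
def clip_reward(reward, low=-1.0, high=1.0):
    """Clamp a scalar reward into [low, high]."""
    return max(low, min(high, reward))


print(clip_reward(0.5))   # 0.5  (already in range, unchanged)
print(clip_reward(5.0))   # 1.0  (clipped from above)
print(clip_reward(-3.0))  # -1.0 (clipped from below)
```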

I also noticed that you use tf.clip_by_average_norm instead of tf.clip_by_global_norm. Have you tried the latter? In the other A3C implementations I have seen, the latter is far more common, which made me wonder if there is any specific reason to use clip_by_average_norm.
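The difference between the two clipping rules can be sketched without TensorFlow. The functions below are plain-Python re-implementations of the documented formulas on flat lists, written for illustration only; they are not the TensorFlow ops themselves. Average-norm clipping treats each tensor separately and divides its L2 norm by its element count, while global-norm clipping rescales all tensors jointly by one shared factor.

```python
import math


def l2_norm(values):
    """Euclidean norm of a flat list of numbers."""
    return math.sqrt(sum(v * v for v in values))


def clip_by_average_norm(tensor, clip_norm):
    """Per-tensor rule: rescale so that l2_norm / element_count <= clip_norm."""
    avg_norm = l2_norm(tensor) / len(tensor)
    if avg_norm <= clip_norm:
        return list(tensor)
    scale = clip_norm / avg_norm
    return [v * scale for v in tensor]


def clip_by_global_norm(tensors, clip_norm):
    """Joint rule: one scale factor computed from the norm of ALL tensors."""
    global_norm = math.sqrt(sum(l2_norm(t) ** 2 for t in tensors))
    scale = clip_norm / max(global_norm, clip_norm)
    return [[v * scale for v in t] for t in tensors], global_norm


grads = [[3.0, 4.0], [0.0, 12.0]]  # toy gradients with norms 5 and 12
clipped, norm = clip_by_global_norm(grads, 6.5)
print(norm)     # 13.0
print(clipped)  # [[1.5, 2.0], [0.0, 6.0]]
```

Because the average-norm threshold depends on the tensor's size, the same `clip_norm` value means something quite different under the two rules, so the two settings are not directly interchangeable.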

Anyways, congratulations! Great work!

QueueRunner doesn't seem to get any information from the training method

Hi,

It seems like the queue runner cannot pull the results from the agent iterations. In other words, execution gets stuck at line 104 of a3c.py without giving any error. Could you please double-check your code?
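A silent hang like the one described is typical of a consumer blocking on a queue that no producer ever fills. The sketch below uses Python 3's standard `queue` module as an analogy only; it is not TensorFlow's QueueRunner API, and the helper names are hypothetical. It shows how a dequeue with a timeout turns an indefinite hang into a detectable condition.

```python
import queue
import threading


def dequeue_with_timeout(q, timeout_sec):
    """Return the next item, or None if nothing arrives within the timeout."""
    try:
        return q.get(timeout=timeout_sec)
    except queue.Empty:
        return None


def producer(q, items):
    """Stand-in for an agent thread that enqueues its results."""
    for item in items:
        q.put(item)


q = queue.Queue()

# No producer is running yet: a plain q.get() would block forever here.
print(dequeue_with_timeout(q, 0.05))  # None

t = threading.Thread(target=producer, args=(q, ["batch-0", "batch-1"]))
t.start()
t.join()
print(dequeue_with_timeout(q, 0.05))  # batch-0
```

If the timeout fires in a real run, the next thing to check is whether the enqueuing threads were actually started and are still alive.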

hiwonjoon / tf-a3c-gpu