Git Product home page Git Product logo

tf-a3c-gpu's Introduction

tf-a3c-gpu

Tensorflow implementation of A3C algorithm using GPU (haven't tested, but it would be also trainable with CPU).

On the original paper, "Asynchronous Methods for Deep Reinforcement Learning", suggests CPU only implementations, since environment can only be executed on CPU which causes unevitable communication overhead between CPU and GPU otherwise.

However, we can minimize communication up to 'current state, reward' instead of whole parameter sets by storing all parameters for a policy and a value network inside of a GPU. Furthermore, we can achieve more utilization of a GPU by having multiple agent for a single thread. The current implementation (and with minor tuned-hyperparameters) uses 4 threads while each has 64 agents. With this setting, I was able to achieve 2 times of speed up. (huh, a little bit disappointing, isn't it?)

Therefore, this implementation is not quietly exact re-implementation of the paper, and the effect of having multiple batch for each thread is worth to be examined. (different # of threads and agents per thread). (However, I am still curious about how A3C can achieve such a nice results. Is the asynchrnous update is the only key? I couldn't find other explanations of effectiveness of this method.) Yet, it gave me a quiet competitive result (3 hours of training on breakout-v0 for reasonable playing), so it could be a good base for someone to start with.

Enjoy :)

Requirements

  • Python 2.7
  • Tensorflow v1.2
  • OpenAI Gym v0.9
  • scipy, pip (for image resize)
  • tqdm(optional)
  • better-exceptions(optional)

Training Results

  • Training on Breakout-v0 is done with nVidia Titan X Pascal GPU for 28 hours

  • With the hyperparameter I used, one step corresponds to 64 * 5 frames of inputs(64 * 5 * average 3 game framse).

  • Orange Line: with reward clipping(reward is clipped to -1 to 1) + Gradient Normalization, Purple Line: wihtout them

    • by the number of steps

    • by the number of episodes

    • by the time

  • Check the results on my results page

    Watch the video

Training from scratch

  • All the hyperparmeters are defined on a3c.py file. Change some hyperparameters as you want, then execute it.
python ac3.py

Validation with trained models

  • If you want to see the trained agent playing, use the command:
python ac3-test.py --model ./models/breakout-v0/last.ckpt --out /tmp/result

Notes & Acknowledgement

tf-a3c-gpu's People

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar

tf-a3c-gpu's Issues

Why clip Rewards?

I was wondering why clipping the rewards improves the performance.....the rewards for the Breakout environment (using OpenAI gym) is already limited between [-1, 1]. Could it be that the performance difference is due to the gradient normalization only?

I also noticed that you use tf.clip_by_average_norm instead of tf.clip_by_global_norm. Have you tried the latter? It's just that in other A3C implementations that I have seen, it is far more common to use the latter, and that made me wounder if there is any specific to use clip_by_average_norm.

Anyways, congratulations! Great work!

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.