
asynchronous-methods-for-deep-reinforcement-learning's Introduction

Asynchronous-Methods-for-Deep-Reinforcement-Learning

Based on a paper from Google DeepMind (http://arxiv.org/pdf/1602.01783v1.pdf), I've developed a new version of the DQN that uses asynchronous thread-based exploration instead of experience replay. I implemented the one-step Q-learning pseudocode, and we can now train the Pong game in less than 20 hours without any GPU or distributed setup.
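As a rough illustration of that scheme, here is a self-contained toy sketch (not this repo's code): several threads run epsilon-greedy one-step Q-learning on their own copy of a tiny chain environment and update one shared tabular Q function, with no replay memory and the target network omitted. The environment and all names are made up for the example.

```python
import threading
import random

N_STATES, N_ACTIONS, GOAL = 6, 2, 5       # tiny chain world: walk right to reach the goal
GAMMA, ALPHA, STEPS_PER_THREAD = 0.99, 0.1, 20000

Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]   # shared parameters (tabular here)
lock = threading.Lock()

def step(state, action):
    nxt = min(state + 1, GOAL) if action == 1 else max(state - 1, 0)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

def actor_learner(final_epsilon):
    state, epsilon = 0, 1.0
    for _ in range(STEPS_PER_THREAD):
        # anneal epsilon from 1.0 down to this thread's final value
        epsilon = max(final_epsilon, epsilon - (1.0 - final_epsilon) / STEPS_PER_THREAD)
        if random.random() < epsilon:
            action = random.randrange(N_ACTIONS)
        else:
            action = max(range(N_ACTIONS), key=lambda a: Q[state][a])
        nxt, reward, terminal = step(state, action)
        # one-step Q-learning target: r + gamma * max_a' Q(s', a')
        target = reward if terminal else reward + GAMMA * max(Q[nxt])
        with lock:  # asynchronous update of the shared Q function, no replay memory
            Q[state][action] += ALPHA * (target - Q[state][action])
        state = 0 if terminal else nxt

threads = [threading.Thread(target=actor_learner, args=(eps,))
           for eps in (0.1, 0.01, 0.5, 0.1)]
for th in threads:
    th.start()
for th in threads:
    th.join()
print([round(max(q), 3) for q in Q])   # learned state values along the chain
```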


asynchronous-methods-for-deep-reinforcement-learning's Issues

It is GREAT!

Hi zeta:

I saw your reply and came here to have a look.

This is so cool! I really need a method that avoids memory replay, since the replay memory takes a lot of space and is time-consuming.

Thank you. I will look into it further.
Mingyan

Different final epsilons from the paper

The paper states that the final epsilons should be [0.1, 0.01, 0.5], but I noticed in your code they are [0.01, 0.01, 0.05] (strangely, there are two 0.01s). Is this a mistake or an intentional improvement?

I'm tuning the model myself, but I'm not sure which hyperparameters are important.
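For reference, the paper samples each thread's final epsilon from {0.1, 0.01, 0.5}; if I recall correctly, the probabilities are 0.4, 0.3 and 0.3. A minimal sketch of that sampling (not this repo's code; double-check the probabilities against the paper):

```python
import numpy as np

# Per-thread final epsilon as described in the paper; the 0.4/0.3/0.3
# probabilities are quoted from memory, so verify them before relying on this.
FINAL_EPSILONS = [0.1, 0.01, 0.5]
PROBABILITIES = [0.4, 0.3, 0.3]

def sample_final_epsilon():
    return float(np.random.choice(FINAL_EPSILONS, p=PROBABILITIES))

# One value per actor-learner thread:
final_epsilons = [sample_final_epsilon() for _ in range(8)]
print(final_epsilons)
```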

Training in process/core level parallelism

Hi @Zeta36

Great project! I'm trying to run some experiments with the code. It seems that the code currently uses Python threading with TensorFlow, and from my observation the training loop is not truly parallel because it runs on threads instead of processes (the GIL serializes the Python parts). Ideally, each learner would run in its own process to fully utilize a modern machine.

This might be relevant:
http://stackoverflow.com/questions/34900246/tensorflow-passing-a-session-to-a-python-multiprocess

But it looks like bad news: I can't just spawn a bunch of processes and let them share the same TensorFlow session. So maybe a distributed TensorFlow session is what we need:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/g3doc/how_tos/distributed/index.md
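For what it's worth, here is a minimal sketch of what that might look like with the distributed TensorFlow (TF 1.x) API: a parameter-server process holds the shared variables and each learner runs as a separate worker process connected to the same cluster. The job names, ports and the single shared variable are placeholders, not this repo's code.

```python
import tensorflow as tf

# Hypothetical two-worker cluster; addresses and ports are assumptions.
cluster = tf.train.ClusterSpec({
    "ps": ["localhost:2222"],
    "worker": ["localhost:2223", "localhost:2224"],
})

# Each process runs one task; job_name/task_index would normally come from argv.
# (A separate process must start the "ps" task for the session to connect.)
server = tf.train.Server(cluster, job_name="worker", task_index=0)

with tf.device(tf.train.replica_device_setter(cluster=cluster)):
    # Shared parameters (here just a counter) live on the parameter server,
    # so every worker process updates the same variables.
    global_step = tf.Variable(0, name="global_step", trainable=False)
    increment = tf.assign_add(global_step, 1)

with tf.Session(server.target) as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(increment))
```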

exception for thread

When I start the threads, it shows this exception:

Exception in thread Thread-31:
Traceback (most recent call last):
  File "/home/anderson/.conda/envs/tensorflow/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/home/anderson/.conda/envs/tensorflow/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "", line 141, in actorLearner
    x_t1_col, r_t, terminal, info = env.step(KEYMAP[GAME][action_index])
  File "/home/anderson/Videos/gym/gym/wrappers/time_limit.py", line 31, in step
    observation, reward, done, info = self.env.step(action)
  File "/home/anderson/Videos/gym/gym/envs/atari/atari_env.py", line 68, in step
    action = self._action_set[a]
IndexError: index 5 is out of bounds for axis 0 with size 4
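The IndexError suggests the action index comes from a KEYMAP (or network output) with more actions than the environment actually exposes (index 5 against an action set of size 4). A small sanity check one could run, with an example game id and variable names that are not the repo's:

```python
import gym

# The number of valid actions differs per Atari game; querying the environment
# avoids hard-coding a keymap sized for a different game.
env = gym.make("Breakout-v0")       # example id only
num_actions = env.action_space.n    # e.g. 4 for Breakout, 6 for Pong

# The network's output layer and the keymap should both use this size.
keymap = list(range(num_actions))
print(num_actions, keymap)
```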

Implement the actor-critic methods

Hello,
In the asynchronous DQN paper they also describe an on-policy method, advantage actor-critic (A3C), which achieved better results than the other methods. Do you have any plans to include it in this repo as well?
I am working off this repo as a starting point and attempting to reproduce the A3C results on the continuous action domain, but I am still trying to figure out the network model they used for the physical-state case (MuJoCo) and how the policy gradient is accumulated.
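For the discrete-action case, the gradient the paper accumulates is roughly that of log pi(a|s; theta) * (R - V(s; theta_v)), plus a value-regression term and an entropy bonus. Below is a rough TensorFlow 1.x sketch of those loss terms; the layer sizes, coefficients and names are assumptions, not this repo's code or the paper's exact architecture.

```python
import tensorflow as tf

states = tf.placeholder(tf.float32, [None, 4])     # toy feature vector, not Atari frames
actions = tf.placeholder(tf.int32, [None])
returns = tf.placeholder(tf.float32, [None])       # n-step discounted return R

hidden = tf.layers.dense(states, 64, tf.nn.relu)
logits = tf.layers.dense(hidden, 2)                # policy head pi(a|s; theta)
value = tf.squeeze(tf.layers.dense(hidden, 1), 1)  # value head V(s; theta_v)

log_probs = tf.nn.log_softmax(logits)
chosen_log_prob = tf.reduce_sum(log_probs * tf.one_hot(actions, 2), axis=1)
advantage = returns - value

# Policy gradient uses the advantage as a baseline-corrected return; the value head
# is trained by regression on R; the entropy bonus discourages premature convergence.
policy_loss = -tf.reduce_sum(chosen_log_prob * tf.stop_gradient(advantage))
value_loss = 0.5 * tf.reduce_sum(tf.square(advantage))
entropy = -tf.reduce_sum(tf.nn.softmax(logits) * log_probs)
loss = policy_loss + 0.5 * value_loss - 0.01 * entropy

train_op = tf.train.RMSPropOptimizer(1e-4).minimize(loss)
```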
