dqn-tensorflow's Introduction

Human-Level Control through Deep Reinforcement Learning

TensorFlow implementation of Human-Level Control through Deep Reinforcement Learning.

[diagram of the DQN model]

This implementation contains:

  1. Deep Q-network and Q-learning
  2. Experience replay memory
    • to reduce the correlations between consecutive updates
  3. A target network for the Q-learning targets, held fixed between periodic updates (see the sketch below)
    • to reduce the correlations between target and predicted Q-values
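
For orientation, here is a minimal sketch of how these two ideas combine when the training targets are computed (illustrative Python with hypothetical names, not the repository's actual code):

  import numpy as np

  def q_learning_targets(batch, target_network, discount=0.99):
    # `batch` is a minibatch of numpy arrays sampled uniformly at random
    # from the replay memory, which breaks the correlation between
    # consecutive updates.
    s_t, action, reward, s_t_plus_1, terminal = batch
    # The bootstrap value comes from the *target* network, whose weights
    # are copied from the online network only every C steps, decoupling
    # the targets from the constantly changing predicted Q-values.
    max_q_t_plus_1 = np.max(target_network(s_t_plus_1), axis=1)
    # `terminal` is a 0/1 float array: never bootstrap past an episode end.
    return reward + (1.0 - terminal) * discount * max_q_t_plus_1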

Requirements

  • Python 2.7 or 3.x
  • TensorFlow
  • OpenAI gym
  • tqdm
  • OpenCV or SciPy (for screen preprocessing)

(Exact versions are not pinned; see the "Add requirements.txt or alternative" issue below for one known-working set.)

Usage

First, install prerequisites with:

$ pip install tqdm gym[all]

To train a model for Breakout:

$ python main.py --env_name=Breakout-v0 --is_train=True
$ python main.py --env_name=Breakout-v0 --is_train=True --display=True

To test and record the screen with gym:

$ python main.py --is_train=False
$ python main.py --is_train=False --display=True

Results

Result of training for 24 hours on a GTX 980 Ti.

[recording of the best run]

Simple Results

Details of Breakout with model m2 (red), trained for 30 hours on a GTX 980 Ti.

[TensorBoard screenshot]

Details of Breakout with model m3 (red), trained for 30 hours on a GTX 980 Ti.

[TensorBoard screenshot]

Detailed Results

[1] Action-repeat (frame-skip) of 1, 2, and 4 without learning rate decay

[plot: A1_A2_A4_0.00025lr]

[2] Action-repeat (frame-skip) of 1, 2, and 4 with learning rate decay

[plot: A1_A2_A4_0.0025lr]

[1] & [2]

[plot: A1_A2_A4_0.00025lr_0.0025lr]

[3] Action-repeat of 4, comparing DQN (dark blue), Dueling DQN (dark green), DDQN (brown), and Dueling DDQN (turquoise)

The current hyperparameters and gradient clipping are not implemented exactly as in the paper.

[plot: A4_duel_double]

[4] Distributed action-repeat (frame-skip) of 1 without learning rate decay

[plot: A1_0.00025lr_distributed]

[5] Distributed action-repeat (frame-skip) of 4 without learning rate decay

[plot: A4_0.00025lr_distributed]

References

  • Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015).

License

MIT License.

dqn-tensorflow's People

Contributors

carpedm20, david4096, khmontik, pemami4911, rsnk96, stevenschmatz, willbrennan


dqn-tensorflow's Issues

Slower than deep_q_rl

Hi, I found that this implementation is slower than deep_q_rl, which is implemented in Theano.
Is it because this repo uses OpenAI gym rather than ROM files? Or a performance difference between TensorFlow and Theano? Or some other detail?

deep_q_rl runs at 100-200 steps per second during learning,
but DQN-tensorflow only runs at 70-90 steps per second. That makes training slow, and it cannot complete 200M steps in 10 days as in the DQN Nature paper.

Breakout-v0 with initial parameters gets terrible performance

Hi, I ran this code with the initial parameters (model='m1') on Breakout-v0. After about 8 days of GPU training, when the program finished, I evaluated the model and got an average reward of 22.0, which is far below your screenshot (score 300+).

In another experiment, with only the switches duel=True and double_q=True turned on, the model's average reward was 5.4, even worse than the original DQN.

Is there any trick I missed? Thanks for your reply!

No module named 'utils'

When I try to run

python main.py --env_name=Breakout-v0 --is_train=True

I get this error:

Traceback (most recent call last):
  File "main.py", line 4, in <module>
    from dqn.agent import Agent
  File "/home/dahandla/DQN-tensorflow-master/dqn/agent.py", line 10, in <module>
    from .replay_memory import ReplayMemory
  File "/home/dahandla/DQN-tensorflow-master/dqn/replay_memory.py", line 8, in <module>
    from utils import save_npy, load_npy
ImportError: No module named 'utils'

Any help would be appreciated.
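
A likely fix (an assumption based on the traceback, not a confirmed patch): Python 3 drops implicit relative imports, and utils.py lives inside the dqn package, so the import in dqn/replay_memory.py must be made explicit:

  # dqn/replay_memory.py -- under Python 3, `utils` must be imported
  # relative to the dqn package it lives in:
  from .utils import save_npy, load_npy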

Trouble loading checkpoints

I need to be able to resume training Breakout-v0 after stopping it. I would also like to be able to move a checkpoint dir to another machine and resume training there.

When I train on my laptop, running Ubuntu 14.04, I am able to resume after stopping. But on the faster machine I really want to use, I cannot resume after stopping. That machine runs Ubuntu 16.04, FWIW.

Both machines use tensorflow 1.3.0. The working laptop uses python 3.6 and the non-working machine uses python 3.5.2. OpenAI gym is version 0.9.4 on both machines, as installed by pip. Neither machine uses GPU, and both use NHWC.

On both machines, I have cloned from the devsisters/DQN-tensorflow repository and manually fixed the bugs that prevent it from working with python 3.x.

~/DQN-tensorflow$ python main.py --env_name=Breakout-v0 --is_train=True --display=False

[*] GPU : 1.0000
{'_save_step': 500000,
'_test_step': 50000,
'action_repeat': 4,
'backend': 'tf',
'batch_size': 32,
'cnn_format': 'NHWC',
'discount': 0.99,
'display': False,
'double_q': False,
'dueling': False,
'env_name': 'Breakout-v0',
'env_type': 'detail',
'ep_end': 0.1,
'ep_end_t': 1000000,
'ep_start': 1.0,
'history_length': 4,
'learn_start': 50000.0,
'learning_rate': 0.00025,
'learning_rate_decay': 0.96,
'learning_rate_decay_step': 50000,
'learning_rate_minimum': 0.00025,
'max_delta': 1,
'max_reward': 1.0,
'max_step': 50000000,
'memory_size': 1000000,
'min_delta': -1,
'min_reward': -1.0,
'model': 'm1',
'random_start': 30,
'scale': 10000,
'screen_height': 84,
'screen_width': 84,
'target_q_update_step': 10000,
'train_frequency': 4}
WARNING:tensorflow:From /home/mjc/DQN-tensorflow/dqn/agent.py:224: calling argmax (from tensorflow.python.ops.math_ops) with dimension is deprecated and will be removed in a future version.
Instructions for updating:
Use the axis argument instead
WARNING:tensorflow:From /opt/anaconda/miniconda3/envs/tfbuild/lib/python3.5/site-packages/tensorflow/python/util/tf_should_use.py:107: initialize_all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Use tf.global_variables_initializer instead.

[*] Loading checkpoints...
[!] Load FAILED: checkpoints/Breakout-v0/backend-tf/ep_end-0.1/model-m1/screen_width-84/env_type-detail/learning_rate-0.00025/learning_rate_minimum-0.00025/memory_size-1000000/env_name-Breakout-v0/dueling-False/learning_rate_decay-0.96/batch_size-32/min_delta--1/max_reward-1.0/learn_start-50000.0/double_q-False/max_delta-1/scale-10000/random_start-30/cnn_format-NHWC/discount-0.99/min_reward--1.0/action_repeat-4/learning_rate_decay_step-50000/ep_start-1.0/history_length-4/target_q_update_step-10000/ep_end_t-1000000/train_frequency-4/max_step-50000000/screen_height-84/

How can this problem be fixed?

Add requirements.txt or alternative

Hi!
Currently I am trying to get this project running. However, the README does not say exactly which requirements are necessary, and issues like #29 arise. It would be useful for me and others if there were a requirements.txt, or maybe even a Dockerfile, to get this running on any platform.
Cheers,
René

EDIT:
For me, the following requirements.txt works:

atari-py==0.0.21
Box2D-kengz==2.3.3
certifi==2017.7.27.1
chardet==3.0.4
funcsigs==1.0.2
gym==0.7.0
idna==2.6
imageio==2.2.0
Keras==2.0.8
mock==2.0.0
mujoco-py==0.5.7
numpy==1.13.3
olefile==0.44
pachi-py==0.0.21
pbr==3.1.1
Pillow==4.3.0
protobuf==3.1.0
pyglet==1.2.4
PyOpenGL==3.1.0
PyYAML==3.12
requests==2.18.4
scipy==1.0.0
six==1.11.0
tensorflow==0.12.0
Theano==0.9.0
tqdm==4.19.4
urllib3==1.22

In addition, I had to run the following commands:

brew install swig
brew install cmake

The simulator should then look like this:
[screenshot of the simulator, 2017-10-27]

Possible bug of using numpy randint

Line 62 of replay_memory.py:
index = random.randint(self.history_length, self.count - 1)
should be changed to
index = random.randint(self.history_length, self.count)
Is that correct?
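
For context, a fact about the two APIs rather than about this repository: Python's random.randint includes both endpoints, while numpy's np.random.randint excludes the upper bound, so which form is right depends on which module the file actually imports:

  import random
  import numpy as np

  random.randint(0, 9)     # uniform over 0..9 (upper bound included)
  np.random.randint(0, 9)  # uniform over 0..8 (upper bound excluded)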

[!] Load FAILED error

Hi, I was trying to run the DQN code. After it had iterated 50,000 steps, a [!] Load FAILED error occurred. According to the error information, the CPU only supports the "NHWC" data format, but the code executed on the GPU with the "NCHW" format. So I want to know how to run on the GPU with "NCHW" while saving and loading on the CPU with "NHWC" to avoid this error. THX!!

The code I modified follows (it does not work):

  def save_model(self, step=None):
    print(" [*] Saving checkpoints...")
    self.config.cnn_format = "NHWC"
    print("******** save begin data_format %s" % self.config.cnn_format)
    model_name = type(self).__name__

    if not os.path.exists(self.checkpoint_dir):
      os.makedirs(self.checkpoint_dir)
    self.saver.save(self.sess, self.checkpoint_dir, global_step=step)
    self.config.cnn_format = "NCHW"
    print("******** save end data_format %s" % self.config.cnn_format)

  def load_model(self):
    print(" [*] Loading checkpoints...")
    self.config.cnn_format = "NHWC"
    print("******** load begin data_format %s" % self.config.cnn_format)

    ckpt = tf.train.get_checkpoint_state(self.checkpoint_dir)
    if ckpt and ckpt.model_checkpoint_path:
      ckpt_name = os.path.basename(ckpt.model_checkpoint_path)
      fname = os.path.join(self.checkpoint_dir, ckpt_name)
      self.saver.restore(self.sess, fname)
      print(" [*] Load SUCCESS: %s" % fname)
      self.config.cnn_format = "NCHW"
      print("******** load end data_format %s" % self.config.cnn_format)
      return True
    else:
      print(" [!] Load FAILED: %s" % self.checkpoint_dir)
      self.config.cnn_format = "NCHW"
      print("******** load end data_format %s" % self.config.cnn_format)
      return False

No updates to the learning rate

The code takes the maximum of the minimum learning rate and the exponentially decayed rate. But in the configuration file, the learning rate and the minimum learning rate are given the same value, which results in no updates to the learning rate as training progresses.
Or is this specifically intended for the case with no updates to the learning rate?
Thanks.
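
For reference, the logic in question looks roughly like this (a sketch using the shipped config values and the TF 1.x API, not a verbatim quote of the code):

  import tensorflow as tf

  learning_rate = 0.00025
  learning_rate_minimum = 0.00025  # same value as learning_rate in the config
  decay, decay_step = 0.96, 50000

  global_step = tf.Variable(0, trainable=False)
  lr_op = tf.maximum(
      learning_rate_minimum,
      tf.train.exponential_decay(learning_rate, global_step,
                                 decay_step, decay, staircase=True))
  # Since the decayed value starts at learning_rate == learning_rate_minimum
  # and only shrinks, tf.maximum always returns the floor: lr_op is constant.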

Why set terminal=True when lives decrease?

In gym, terminal is True only when lives reaches 0. But in the act() function of the GymEnvironment class, it seems that terminal is set to True whenever a life is lost. This affects num_game and the episode reward.

Unable to see various charts in Tensorboard Events

It seems that in assets/tensorboard_160516.png, TensorBoard displays metrics such as average loss. I'm unable to see them in TensorFlow 0.11rc; could it be because of the TensorFlow version? Which version of TensorFlow has this been run on?

A bug in the implementation

Hello, I spotted what I believe might be a bug in the DQN implementation on line 291 here:

https://github.com/devsisters/DQN-tensorflow/blob/master/dqn/agent.py#L291

The code tries to clip self.delta with tf.clip_by_value, I assume with the intention of being robust when the discrepancy in Q is above a threshold:

self.delta = self.target_q_t - q_acted
self.clipped_delta = tf.clip_by_value(self.delta, self.min_delta, self.max_delta, name='clipped_delta')
self.global_step = tf.Variable(0, trainable=False)
self.loss = tf.reduce_mean(tf.square(self.clipped_delta), name='loss')

However, the clip_by_value function's local gradient outside of the min_delta, max_delta range is zero. Therefore, with the current code whenever the discrepancy is above min/max delta, the gradient becomes exactly zero in backprop. This might not be what you intend, and is certainly not standard, I believe.

I think you probably want to clip the gradient here, not the raw Q. In that case you would have to use the Huber loss:

def clipped_error(x): 
    return tf.select(tf.abs(x) < 1.0, 0.5 * tf.square(x), tf.abs(x) - 0.5) # condition, true, false

and apply it to self.delta instead of tf.square. This would have the desired effect of increased robustness to outliers.

I got an error about the use_gpu flag... could you help me out, please?

Traceback (most recent call last):
  File "main.py", line 69, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 44, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "main.py", line 56, in main
    raise Exception("use_gpu flag is true when no GPUs are available")
Exception: use_gpu flag is true when no GPUs are available

The above is the error message.

Could you help me out??

GPU Utilization

I have a Titan X and have been running the Breakout simulation for over two days now; it's only 7% through training, and nvidia-smi shows it's only using 4-5%. The README says it took only 30 hours on a 980 Ti. That doesn't seem right. According to main.py, it should be using 100% by default if I don't give the flag. Is anyone else having this issue, or is it just me?
Edit:
nvidia-smi -i 0 -q -d MEMORY,UTILIZATION,POWER,CLOCK,COMPUTE shows FB Memory Usage at 11423 MiB / 12185 MiB. Does that look correct with the default GPU setting for Breakout?

AttributeError: 'TimeLimit' object has no attribute 'ale'

Hi, I downloaded the code and then tested it as described here.
However, I got the error below.
I think all requirements are installed (except opencv2), and OpenAI gym was tested.
I would appreciate it if someone could find the cause and the solution.

Traceback (most recent call last):
  File "/DQN-tensorflow-master/main.py", line 69, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 43, in run
    sys.exit(main(sys.argv[:1] + flags_passthrough))
  File "/DQN-tensorflow-master/main.py", line 64, in main
    agent.train()
  File "/DQN-tensorflow-master/dqn/agent.py", line 40, in train
    screen, reward, action, terminal = self.env.new_random_game()
  File "/DQN-tensorflow-master/dqn/environment.py", line 28, in new_random_game
    self.new_game(True)
  File "/DQN-tensorflow-master/dqn/environment.py", line 21, in new_game
    if self.lives == 0:
  File "/DQN-tensorflow-master/dqn/environment.py", line 52, in lives
    return self.env.ale.lives()
AttributeError: 'TimeLimit' object has no attribute 'ale'
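
A plausible fix (an assumption based on gym's wrapper API, not a confirmed patch): newer gym versions wrap the raw Atari environment in a TimeLimit wrapper, so the ALE handle has to be reached through unwrapped in dqn/environment.py:

  @property
  def lives(self):
    # self.env is a TimeLimit wrapper; .unwrapped returns the underlying
    # AtariEnv, which still exposes the ALE interface.
    return self.env.unwrapped.ale.lives()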

History is not updated with new game screen created after a terminal state is reached

Hi. I am trying to understand the code and I came across what I think is a bug in:

def train(self):

It is related to the way the agent interacts with the environment: at the beginning of training, the environment is reset via self.env.new_random_game(), and afterwards the history is filled with the new random state via self.history.add(screen). This is needed because the agent always chooses its actions with that history as input, via action = self.predict(self.history.get()).

When a terminal state is reached, a new random game is created, but this time the new random state is not added to the history. As a result, the agent uses the terminal state of the last episode to decide which action to take in the first state of the new episode, which I think is wrong.

A way to fix it would be to add

for _ in range(self.history_length):
    self.history.add(screen)

after this line.

I don't know if fixing this would have any positive impact on performance, since it only affects the first self.history_length steps of each episode, but I wanted to share it anyway.

Thanks in advance.

Unable to load GPU-trained model

Hi,

I encounter a problem when I train Breakout-v0 on a GPU on a Linux machine and then want to load the model on my local Mac. Although they use the same settings (except GPU vs. CPU; I also make sure the CNN format is the same as the GPU's when I load it on my Mac), loading the model on my Mac fails with the following error:

[*] Loading checkpoints...
INFO:tensorflow:Restoring parameters from checkpoints/Breakout-v0/min_delta--1/max_delta-1/history_length-4/train_frequency-4/target_q_update_step-10000/double_q-False/memory_size-1000000/action_repeat-4/ep_end_t-1000000/dueling-False/min_reward--1.0/backend-tf/random_start-30/scale-10000/env_type-detail/learning_rate_decay_step-50000/ep_start-1.0/screen_width-84/learn_start-50000.0/cnn_format-NCHW/learning_rate-0.00025/batch_size-32/discount-0.99/max_step-50000000/max_reward-1.0/learning_rate_decay-0.96/learning_rate_minimum-0.00025/env_name-Breakout-v0/ep_end-0.1/model-m1/screen_height-84/-3250000
[2017-09-24 19:36:32,049] Restoring parameters from checkpoints/Breakout-v0/min_delta--1/max_delta-1/history_length-4/train_frequency-4/target_q_update_step-10000/double_q-False/memory_size-1000000/action_repeat-4/ep_end_t-1000000/dueling-False/min_reward--1.0/backend-tf/random_start-30/scale-10000/env_type-detail/learning_rate_decay_step-50000/ep_start-1.0/screen_width-84/learn_start-50000.0/cnn_format-NCHW/learning_rate-0.00025/batch_size-32/discount-0.99/max_step-50000000/max_reward-1.0/learning_rate_decay-0.96/learning_rate_minimum-0.00025/env_name-Breakout-v0/ep_end-0.1/model-m1/screen_height-84/-3250000

InvalidArgumentError                      Traceback (most recent call last)
<ipython-input-4-6320008d113d> in <module>()
     17       config.cnn_format = 'NHWC'
     18 
---> 19     agent = Agent(config, env, sess)
     20 
     21     if FLAGS.is_train:

/Users/tailin/Dropbox (Personal)/project/meta_learning/dqn/agent.pyc in __init__(self, config, environment, sess)
     31       self.step_assign_op = self.step_op.assign(self.step_input)
     32 
---> 33     self.build_dqn()
     34 
     35   def train(self):

/Users/tailin/Dropbox (Personal)/project/meta_learning/dqn/agent.pyc in build_dqn(self)
    340     self._saver = tf.train.Saver(self.w.values() + [self.step_op], max_to_keep=30)
    341 
--> 342     self.load_model()
    343     self.update_target_q_network()
    344 

/Users/tailin/Dropbox (Personal)/project/meta_learning/dqn/base.pyc in load_model(self)
     44       ckpt_name = os.path.basename(ckpt.model_checkpoint_path)
     45       fname = os.path.join(self.checkpoint_dir, ckpt_name)
---> 46       self.saver.restore(self.sess, fname)
     47       print(" [*] Load SUCCESS: %s" % fname)
     48       return True

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tensorflow/python/training/saver.pyc in restore(self, sess, save_path)
   1558     logging.info("Restoring parameters from %s", save_path)
   1559     sess.run(self.saver_def.restore_op_name,
-> 1560              {self.saver_def.filename_tensor_name: save_path})
   1561 
   1562   @staticmethod

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tensorflow/python/client/session.pyc in run(self, fetches, feed_dict, options, run_metadata)
    893     try:
    894       result = self._run(None, fetches, feed_dict, options_ptr,
--> 895                          run_metadata_ptr)
    896       if run_metadata:
    897         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tensorflow/python/client/session.pyc in _run(self, handle, fetches, feed_dict, options, run_metadata)
   1122     if final_fetches or final_targets or (handle and feed_dict_tensor):
   1123       results = self._do_run(handle, final_targets, final_fetches,
-> 1124                              feed_dict_tensor, options, run_metadata)
   1125     else:
   1126       results = []

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tensorflow/python/client/session.pyc in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
   1319     if handle is None:
   1320       return self._do_call(_run_fn, self._session, feeds, fetches, targets,
-> 1321                            options, run_metadata)
   1322     else:
   1323       return self._do_call(_prun_fn, self._session, handle, feeds, fetches)

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tensorflow/python/client/session.pyc in _do_call(self, fn, *args)
   1338         except KeyError:
   1339           pass
-> 1340       raise type(e)(node_def, op, message)
   1341 
   1342   def _extend_graph(self):

InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [6] rhs shape= [4]
	 [[Node: save/Assign_9 = Assign[T=DT_FLOAT, _class=["loc:@prediction/q/bias"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/cpu:0"](prediction/q/bias, save/RestoreV2_9)]]

Caused by op u'save/Assign_9', defined at:
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/ipykernel_launcher.py", line 16, in <module>
    app.launch_new_instance()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/traitlets/config/application.py", line 658, in launch_instance
    app.start()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/ipykernel/kernelapp.py", line 477, in start
    ioloop.IOLoop.instance().start()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/zmq/eventloop/ioloop.py", line 177, in start
    super(ZMQIOLoop, self).start()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tornado/ioloop.py", line 888, in start
    handler_func(fd_obj, events)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tornado/stack_context.py", line 277, in null_wrapper
    return fn(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py", line 440, in _handle_events
    self._handle_recv()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py", line 472, in _handle_recv
    self._run_callback(callback, msg)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py", line 414, in _run_callback
    callback(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tornado/stack_context.py", line 277, in null_wrapper
    return fn(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/ipykernel/kernelbase.py", line 283, in dispatcher
    return self.dispatch_shell(stream, msg)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/ipykernel/kernelbase.py", line 235, in dispatch_shell
    handler(stream, idents, msg)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/ipykernel/kernelbase.py", line 399, in execute_request
    user_expressions, allow_stdin)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/ipykernel/ipkernel.py", line 196, in do_execute
    res = shell.run_cell(code, store_history=store_history, silent=silent)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/ipykernel/zmqshell.py", line 533, in run_cell
    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2718, in run_cell
    interactivity=interactivity, compiler=compiler, result=result)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2822, in run_ast_nodes
    if self.run_code(code, result):
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2882, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-4-6320008d113d>", line 19, in <module>
    agent = Agent(config, env, sess)
  File "dqn/agent.py", line 33, in __init__
    self.build_dqn()
  File "dqn/agent.py", line 340, in build_dqn
    self._saver = tf.train.Saver(self.w.values() + [self.step_op], max_to_keep=30)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1140, in __init__
    self.build()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1172, in build
    filename=self._filename)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 688, in build
    restore_sequentially, reshape)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 419, in _AddRestoreOps
    assign_ops.append(saveable.restore(tensors, shapes))
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 155, in restore
    self.op.get_shape().is_fully_defined())
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tensorflow/python/ops/state_ops.py", line 274, in assign
    validate_shape=validate_shape)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tensorflow/python/ops/gen_state_ops.py", line 43, in assign
    use_locking=use_locking, name=name)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2630, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1204, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [6] rhs shape= [4]
	 [[Node: save/Assign_9 = Assign[T=DT_FLOAT, _class=["loc:@prediction/q/bias"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/cpu:0"](prediction/q/bias, save/RestoreV2_9)]]

What is the problem here? Ideally, I should be able to load the model regardless of which system, GPU or CPU, I use.

Rendering error

When running DQN with the --display option, I get the following error:
Traceback (most recent call last):
  File "main.py", line 66, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 43, in run
    sys.exit(main(sys.argv[:1] + flags_passthrough))
  File "main.py", line 61, in main
    agent.train()
  File "/home/savvai/Documents/DQN-tensorflow/dqn/agent.py", line 40, in train
    screen, reward, action, terminal = self.env.new_random_game()
  File "/home/savvai/Documents/DQN-tensorflow/dqn/environment.py", line 28, in new_random_game
    self.new_game(True)
  File "/home/savvai/Documents/DQN-tensorflow/dqn/environment.py", line 24, in new_game
    self.render()
  File "/home/savvai/Documents/DQN-tensorflow/dqn/environment.py", line 60, in render
    self.env.render()
  File "/usr/local/lib/python2.7/dist-packages/gym/core.py", line 174, in render
    return self._render(mode=mode, close=close)
  File "/usr/local/lib/python2.7/dist-packages/gym/envs/atari/atari_env.py", line 119, in _render
    from gym.envs.classic_control import rendering
  File "/usr/local/lib/python2.7/dist-packages/gym/envs/classic_control/rendering.py", line 23, in <module>
    from pyglet.gl import *
  File "/usr/local/lib/python2.7/dist-packages/pyglet/gl/__init__.py", line 236, in <module>
    import pyglet.window
  File "/usr/local/lib/python2.7/dist-packages/pyglet/window/__init__.py", line 1817, in <module>
    gl._create_shadow_window()
  File "/usr/local/lib/python2.7/dist-packages/pyglet/gl/__init__.py", line 205, in _create_shadow_window
    _shadow_window = Window(width=1, height=1, visible=False)
  File "/usr/local/lib/python2.7/dist-packages/pyglet/window/xlib/__init__.py", line 163, in __init__
    super(XlibWindow, self).__init__(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/pyglet/window/__init__.py", line 505, in __init__
    config = screen.get_best_config(template_config)
  File "/usr/local/lib/python2.7/dist-packages/pyglet/canvas/base.py", line 161, in get_best_config
    configs = self.get_matching_configs(template)
  File "/usr/local/lib/python2.7/dist-packages/pyglet/canvas/xlib.py", line 179, in get_matching_configs
    configs = template.match(canvas)
  File "/usr/local/lib/python2.7/dist-packages/pyglet/gl/xlib.py", line 29, in match
    have_13 = info.have_version(1, 3)
  File "/usr/local/lib/python2.7/dist-packages/pyglet/gl/glx_info.py", line 89, in have_version
    client = [int(i) for i in client_version.split('.')]
ValueError: invalid literal for int() with base 10: 'None'

git checkout fails on Windows 10

When I clone the repository, I get a message saying that the cloning succeeded but that the checkout failed:

$ git clone https://github.com/devsisters/DQN-tensorflow.git
Cloning into 'DQN-tensorflow'...
remote: Counting objects: 717, done.
remote: Total 717 (delta 0), reused 0 (delta 0), pack-reused 717
Receiving objects: 100% (717/717), 29.57 MiB | 1.26 MiB/s, done.
Resolving deltas: 100% (446/446), done.
fatal: cannot create directory at 'checkpoints/Breakout-v0/min_delta--1/max_delta-1/history_length-4/train_frequency-4/target_q_update_step-10000/memory_size-1000000/action_repeat-4/ep_end_t-1000000/backend-tf/random_start-30/scale-10000/env_type-simple/min_reward--1.0': Filename too long
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry the checkout with 'git checkout -f HEAD'

Apparently the path length becomes too long. In Windows there is, by default, a maximum path length of 260 characters. It should be possible to turn this limitation off, but doing so doesn't seem to work for me (I even noticed that LongPathsEnabled was already set to 1).

Perhaps not surprisingly, git checkout -f HEAD doesn't work either, and results in a very similar error message:

$ cd DQN-tensorflow/
$ git checkout -f HEAD
fatal: cannot create directory at 'checkpoints/Breakout-v0/min_delta--1/max_delta-1/history_length-4/train_frequency-4/target_q_update_step-10000/memory_size-1000000/action_repeat-4/ep_end_t-1000000/backend-tf/random_start-30/scale-10000': Filename too long

How to run on Windows?

I tried to run it from the Windows command line using the command in the README, but it says "ImportError: No module named tensorflow". How do I run the program on Windows? I am using Anaconda.

clipped error function is not correct

The function clipped_error is not correctly written: the try and except branches compute the same expression (one with the old tf.select, one with tf.where)!
Currently, it is written as:

  def clipped_error(x):
    try:
      return tf.select(tf.abs(x) < 1.0, 0.5 * tf.square(x), tf.abs(x) - 0.5)
    except:
      return tf.where(tf.abs(x) < 1.0, 0.5 * tf.square(x), tf.abs(x) - 0.5)

The function should simply be:

  def clipped_error(x):
    return tf.where(tf.abs(x) < 1.0, 0.5 * tf.square(x), tf.abs(x) - 0.5)

Visualizing the trained data

I've tried several times to display the trained agent, but I can't get the image even though the algorithm works well ([screenshot]), and I only see an empty screen ([screenshot]).
Please give me some pieces of advice :)

Memory error

When I run:
python main.py --is_train=False --display=True --use_gpu=False

I get:

 [*] GPU : 1.0000
[2018-05-23 17:17:55,692] Making new env: Breakout-v0
{'_save_step': 500000, '_test_step': 50000, 'action_repeat': 4, 'backend': 'tf', 'batch_size': 32, 'cnn_format': 'NHWC', 'discount': 0.99, 'display': True, 'double_q': False, 'dueling': False, 'env_name': 'Breakout-v0', 'env_type': 'detail', 'ep_end': 0.1, 'ep_end_t': 1000000, 'ep_start': 1.0, 'history_length': 4, 'learn_start': 50000.0, 'learning_rate': 0.00025, 'learning_rate_decay': 0.96, 'learning_rate_decay_step': 50000, 'learning_rate_minimum': 0.00025, 'max_delta': 1, 'max_reward': 1.0, 'max_step': 50000000, 'memory_size': 1000000, 'min_delta': -1, 'min_reward': -1.0, 'model': 'm1', 'random_start': 30, 'scale': 10000, 'screen_height': 84, 'screen_width': 84, 'target_q_update_step': 10000, 'train_frequency': 4}
Traceback (most recent call last):
  File "main.py", line 70, in <module>
    tf.app.run()
  File "/Tuto_DQN/env/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 43, in run
    sys.exit(main(sys.argv[:1] + flags_passthrough))
  File "main.py", line 62, in main
    agent = Agent(config, env, sess)
  File "/Tuto_DQN/tuto_dqn/DQN-tensorflow/dqn/agent.py", line 23, in __init__
    self.memory = ReplayMemory(self.config, self.model_dir)
  File "/Tuto_DQN/tuto_dqn/DQN-tensorflow/dqn/replay_memory.py", line 18, in __init__
    self.screens = np.empty((self.memory_size, config.screen_height, config.screen_width), dtype=np.float16)
MemoryError
I installed all the dependencies according to issue #44, "Add requirements.txt or alternative".
I am running it on my laptop, a Samsung Series 7 Ultra notebook.

Could someone advise me on how to overcome this issue? Any comment would be highly appreciated.
Thanks a lot!!
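
For scale, a back-of-the-envelope check (not from the original thread): the screen buffer that np.empty allocates above is far larger than a typical laptop's RAM, which by itself explains the MemoryError; lowering memory_size in the config is the usual workaround.

  # Size of the replay memory's screen buffer with the default config:
  memory_size, height, width = 1000000, 84, 84
  itemsize = 2  # np.float16 is 2 bytes per value
  gib = memory_size * height * width * itemsize / 2.0 ** 30
  print(round(gib, 1))  # ~13.1 GiB for the screens alone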

About starting a new game and the history

In dqn/agent.py, line 59:

  if terminal:
    screen, reward, action, terminal = self.env.new_random_game()

When starting a new game due to a terminal state,

why don't we need to reset self.history?

This matters because it affects the next iteration:

  # 1. predict
  action = self.predict(self.history.get())
  # 2. act
  screen, reward, terminal = self.env.act(action, is_training=True)
  # 3. observe
  self.observe(screen, reward, action, terminal)

The action predicted from self.history.get() does not depend on the current game's screens; instead, it is predicted from the screens of the previous game, which has already ended.

Am I missing anything?

Thank you very much.

I got an error when I tried to run it with trained data and display on

Hi!
First of all, thank you very much for sharing!

I set up everything needed to run it.

I was able to train a model for Breakout
(python main.py --env_name=Breakout-v0 --is_train=True --display=True)

However, when I tried to run it for testing and recording
(python main.py --is_train=False --display=True)

I got this error:

InvalidArgumentError (see above for traceback): CPU BiasOp only supports NHWC.
[[Node: prediction/l1/BiasAdd = BiasAdd[T=DT_FLOAT, data_format="NCHW", _device="/job:localhost/replica:0/task:0/cpu:0"](prediction/l1/Conv2D, prediction/l1/biases/read)]]

Could you please help me out?
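
Judging from the config dumps in the other issues here, cnn_format stays NCHW unless the GPU is disabled, so on a CPU-only setup the test command plausibly needs the extra flag:

$ python main.py --is_train=False --display=True --use_gpu=False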

Error after 50,000 iterations with Alien-v0 environment running on my MacOS Sierra

This is the log of the events. What should I do? Thanks!

iMac:DQN-tensorflow shyamalsuhanachandra$ python main.py --env_name=Alien-v0 --is_train=True --display=True
 [*] GPU : 1.0000
[2016-09-27 17:28:27,334] Making new env: Alien-v0
{'_save_step': 500000,
 '_test_step': 50000,
 'action_repeat': 4,
 'backend': 'tf',
 'batch_size': 32,
 'cnn_format': 'NCHW',
 'discount': 0.99,
 'display': True,
 'double_q': False,
 'dueling': False,
 'env_name': 'Alien-v0',
 'env_type': 'detail',
 'ep_end': 0.1,
 'ep_end_t': 1000000,
 'ep_start': 1.0,
 'history_length': 4,
 'learn_start': 50000.0,
 'learning_rate': 0.00025,
 'learning_rate_decay': 0.96,
 'learning_rate_decay_step': 50000,
 'learning_rate_minimum': 0.00025,
 'max_delta': 1,
 'max_reward': 1.0,
 'max_step': 50000000,
 'memory_size': 1000000,
 'min_delta': -1,
 'min_reward': -1.0,
 'model': 'm1',
 'random_start': 30,
 'scale': 10000,
 'screen_height': 84,
 'screen_width': 84,
 'target_q_update_step': 10000,
 'train_frequency': 4}
 [*] Loading checkpoints...
 [!] Load FAILED: checkpoints/Alien-v0/min_delta--1/max_delta-1/history_length-4/train_frequency-4/target_q_update_step-10000/double_q-False/memory_size-1000000/action_repeat-4/ep_end_t-1000000/dueling-False/min_reward--1.0/backend-tf/random_start-30/scale-10000/env_type-detail/learning_rate_decay_step-50000/ep_start-1.0/screen_width-84/learn_start-50000.0/cnn_format-NCHW/learning_rate-0.00025/batch_size-32/discount-0.99/max_step-50000000/max_reward-1.0/learning_rate_decay-0.96/learning_rate_minimum-0.00025/env_name-Alien-v0/ep_end-0.1/model-m1/screen_height-84/
2016-09-27 17:28:28.996 Python[26135:3913383] ApplePersistenceIgnoreState: Existing state will not be touched. New state will be written to /var/folders/m1/b_t9_2151y30ryvtr_2gznch0000gp/T/org.python.python.savedState
  0%|                    | 49999/50000000 [14:17<235:26:28, 58.93it/s]E tensorflow/core/common_runtime/executor.cc:334] Executor failed to create kernel. Invalid argument: CPU BiasOp only supports NHWC.
     [[Node: target/target_l1/BiasAdd = BiasAdd[T=DT_FLOAT, data_format="NCHW", _device="/job:localhost/replica:0/task:0/cpu:0"](target/target_l1/Conv2D, target/target_l1/biases/read)]]

Traceback (most recent call last):
  File "main.py", line 66, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 30, in run
    sys.exit(main(sys.argv))
  File "main.py", line 61, in main
    agent.train()
  File "/Users/shyamalsuhanachandra/DQN-tensorflow/dqn/agent.py", line 56, in train
    self.observe(screen, reward, action, terminal)
  File "/Users/shyamalsuhanachandra/DQN-tensorflow/dqn/agent.py", line 135, in observe
    self.q_learning_mini_batch()
  File "/Users/shyamalsuhanachandra/DQN-tensorflow/dqn/agent.py", line 157, in q_learning_mini_batch
    q_t_plus_1 = self.target_q.eval({self.target_s_t: s_t_plus_1})
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 559, in eval
    return _eval_using_default_session(self, feed_dict, self.graph, session)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3656, in _eval_using_default_session
    return session.run(tensors, feed_dict)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 710, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 908, in _run
    feed_dict_string, options, run_metadata)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 958, in _do_run
    target_list, options, run_metadata)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 978, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors.InvalidArgumentError: CPU BiasOp only supports NHWC.
     [[Node: target/target_l1/BiasAdd = BiasAdd[T=DT_FLOAT, data_format="NCHW", _device="/job:localhost/replica:0/task:0/cpu:0"](target/target_l1/Conv2D, target/target_l1/biases/read)]]
Caused by op u'target/target_l1/BiasAdd', defined at:
  File "main.py", line 66, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 30, in run
    sys.exit(main(sys.argv))
  File "main.py", line 58, in main
    agent = Agent(config, env, sess)
  File "/Users/shyamalsuhanachandra/DQN-tensorflow/dqn/agent.py", line 29, in __init__
    self.build_dqn()
  File "/Users/shyamalsuhanachandra/DQN-tensorflow/dqn/agent.py", line 240, in build_dqn
    32, [8, 8], [4, 4], initializer, activation_fn, self.cnn_format, name='target_l1')
  File "/Users/shyamalsuhanachandra/DQN-tensorflow/dqn/ops.py", line 25, in conv2d
    out = tf.nn.bias_add(conv, b, data_format)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 391, in bias_add
    return gen_nn_ops._bias_add(value, bias, data_format=data_format, name=name)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 279, in _bias_add
    data_format=data_format, name=name)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 703, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2317, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1239, in __init__
    self._traceback = _extract_stack()

UnboundLocalError: local variable 'avg_ep_reward' referenced before assignment

When I run training with python main.py --env_name=Breakout-v0 --is_train=True --display=True --cpu=True, I get this output after a couple of training episodes:

python main.py --env_name=Breakout-v0 --is_train=True --display=True --cpu=True
 [*] GPU : 0.5000
[2016-05-20 17:00:38,585] Making new env: Breakout-v0
{'_save_step': 50000,
'_test_step': 10000,
'action_repeat': 4,
'backend': 'tf',
'batch_size': 32,
'cnn_format': 'NHWC',
'discount': 0.99,
'display': True,
'env_name': 'Breakout-v0',
'env_type': 'simple',
'ep_end': 0.1,
'ep_end_t': 1000000,
'ep_start': 1.0,
'history_length': 4,
'learn_start': 50000.0,
'learning_rate': 0.00025,
'max_delta': 1,
'max_reward': 1.0,
'max_step': 50000000,
'memory_size': 1000000,
'min_delta': -1,
'min_reward': -1.0,
'model': 'm2',
'random_start': 30,
'scale': 10000,
'screen_height': 84,
'screen_width': 84,
'target_q_update_step': 10000,
'train_frequency': 4}
 [*] Loading checkpoints...
[!] Load FAILED: checkpoints/Breakout-v0/min_delta--1/max_delta-1/history_length-4/train_frequency-4/target_q_update_step-10000/memory_size-1000000/action_repeat-4/ep_end_t-1000000/backend-tf/random_start-30/scale-10000/env_type-simple/min_reward--1.0/ep_start-1.0/screen_width-84/learn_start-50000.0/cnn_format-NHWC/learning_rate-0.00025/batch_size-32/discount-0.99/max_reward-1.0/max_step-50000000/env_name-Breakout-v0/ep_end-0.1/model-m2/screen_height-84/
2016-05-20 17:00:40.195 Python[25567:405995] ApplePersistenceIgnoreState: Existing state will not be touched. New state will be written to /var/folders/t0/tw1pt8nn5xv2ykn_4tmnxg5m0000gn/T/org.python.python.savedState
0%| | 49978/50000000 [02:47<39:09:30, 354.33it/s]
Traceback (most recent call last):
  File "main.py", line 63, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 30, in run
    sys.exit(main(sys.argv))
  File "main.py", line 58, in main
    agent.train()
  File "/Users/x0r/Documents/codes/DQN-tensorflow/dqn/agent.py", line 110, in train
    if max_avg_ep_reward >= avg_ep_reward * 0.9:
UnboundLocalError: local variable 'avg_ep_reward' referenced before assignment

unsupported operand type(s) for +: 'dict_values' and 'list'

When I use Python 3.6 to run the program, I get this error:

Traceback (most recent call last):
  File "main.py", line 70, in <module>
    tf.app.run()
  File "/home/tanggy/anaconda3/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "main.py", line 62, in main
    agent = Agent(config, env, sess)
  File "/home/tanggy/Downloads/DQN-tensorflow-master/dqn/agent.py", line 30, in __init__
    self.build_dqn()
  File "/home/tanggy/Downloads/DQN-tensorflow-master/dqn/agent.py", line 328, in build_dqn
    self._saver = tf.train.Saver(self.w.values() + [self.step_op], max_to_keep=30)
TypeError: unsupported operand type(s) for +: 'dict_values' and 'list'
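
A common fix (an assumption, not a confirmed patch): in Python 3, dict.values() returns a view that cannot be concatenated with a list, so materialize it first in dqn/agent.py:

  # dict.values() is a view object under Python 3; list() restores the
  # Python 2 behavior so the + concatenation works again.
  self._saver = tf.train.Saver(list(self.w.values()) + [self.step_op], max_to_keep=30)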

KeyError: '__flags'

After following the install instructions and running
python main.py --env_name=Breakout-v0 --is_train=True
I receive the following error

Traceback (most recent call last):
  File "main.py", line 70, in <module>
    tf.app.run()
  File "/home/jdmartin86/anaconda2/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "main.py", line 49, in main
    config = get_config(FLAGS) or FLAGS
  File "/home/jdmartin86/sandbox/test-qlearn/DQN-tensorflow/config.py", line 58, in get_config
    for k, v in FLAGS.__dict__['__flags'].items():
KeyError: '__flags'
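
This looks like a TensorFlow version mismatch: newer releases back tf.app.flags with absl, which no longer stores values in FLAGS.__dict__['__flags']. A version-tolerant accessor might look like this (a hypothetical helper, not the repository's code):

  def flags_to_dict(FLAGS):
    # absl-backed FLAGS (newer TF) expose flag_values_dict(); older
    # tf.app.flags kept everything in the private __flags dict.
    try:
      return FLAGS.flag_values_dict()
    except AttributeError:
      return FLAGS.__dict__['__flags']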

Name "reduce" is not defined

I am running Windows 10 on a 2017 MacBook Pro using Boot Camp, so as far as I know I can't use the GPU.

When I run python main.py --env_name=Breakout-v0 --is_train=True --display=True --use_gpu=False I get this output

 [*] GPU : 1.0000
2018-01-06 17:07:58.156417: I C:\tf_jenkins\home\workspace\rel-win\M\windows\PY\36\tensorflow\core\platform\cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
{'_save_step': 500000,
 '_test_step': 50000,
 'action_repeat': 4,
 'backend': 'tf',
 'batch_size': 32,
 'cnn_format': 'NHWC',
 'discount': 0.99,
 'display': True,
 'double_q': False,
 'dueling': False,
 'env_name': 'Breakout-v0',
 'env_type': 'detail',
 'ep_end': 0.1,
 'ep_end_t': 1000000,
 'ep_start': 1.0,
 'history_length': 4,
 'learn_start': 50000.0,
 'learning_rate': 0.00025,
 'learning_rate_decay': 0.96,
 'learning_rate_decay_step': 50000,
 'learning_rate_minimum': 0.00025,
 'max_delta': 1,
 'max_reward': 1.0,
 'max_step': 50000000,
 'memory_size': 1000000,
 'min_delta': -1,
 'min_reward': -1.0,
 'model': 'm1',
 'random_start': 30,
 'scale': 10000,
 'screen_height': 84,
 'screen_width': 84,
 'target_q_update_step': 10000,
 'train_frequency': 4}
Traceback (most recent call last):
  File "main.py", line 70, in <module>
    tf.app.run()
  File "C:\Users\Unicoranium\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\platform\app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "main.py", line 62, in main
    agent = Agent(config, env, sess)
  File "C:\Users\Unicoranium\Desktop\MachineLearning\DQN-tensorflow\DQN-tensorflow-master\dqn\agent.py", line 30, in __init__
    self.build_dqn()
  File "C:\Users\Unicoranium\Desktop\MachineLearning\DQN-tensorflow\DQN-tensorflow-master\dqn\agent.py", line 201, in build_dqn
    self.l3_flat = tf.reshape(self.l3, [-1, reduce(lambda x, y: x * y, shape[1:])])
NameError: name 'reduce' is not defined

How can I fix it?
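
A likely fix, also noted in the "Python 3.x compatibility issues" entry below: reduce moved into functools in Python 3, so a single import at the top of dqn/agent.py should resolve it:

  # Python 2's builtin reduce() lives in functools under Python 3.
  from functools import reduce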

Cannot reproduce the experiment shown in the figure

Hi, can you share a configuration that reproduces the results shown in the figure?
I ran the default m1 configuration and only got an average episodic reward of around 3.

I tried changing the configuration, e.g. setting action_repeat = 4, changing learning_rate, and adding double_q and duel_q, but there is not much change.

Many thanks!

CUDA_ERROR_OUT_OF_MEMORY

name: GeForce GTX 1070 major: 6 minor: 1 memoryClockRate(GHz): 1.7465
pciBusID: 0000:01:00.0
totalMemory: 8.00GiB freeMemory: 6.63GiB
2018-04-09 16:56:08.725121: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1423] Adding visible gpu devices: 0
2018-04-09 16:56:09.233091: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-04-09 16:56:09.236868: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:917] 0
2018-04-09 16:56:09.239291: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:930] 0: N
2018-04-09 16:56:09.242389: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8192 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1)
2018-04-09 16:56:09.249227: E T:\src\github\tensorflow\tensorflow\stream_executor\cuda\cuda_driver.cc:936] failed to allocate 8.00G (8589934592 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2018-04-09 16:56:09.253673: E T:\src\github\tensorflow\tensorflow\stream_executor\cuda\cuda_driver.cc:936] failed to allocate 7.20G (7730940928 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
Traceback (most recent call last):
  File "main.py", line 75, in <module>
    tf.app.run()
  File "D:\ProgramData\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\platform\app.py", line 126, in run
    _sys.exit(main(argv))
  File "main.py", line 53, in main
    config = get_config(FLAGS) or FLAGS
  File "D:\tools\dqq\DQN-tensorflow-master\config.py", line 58, in get_config

Python 3.x compatibility issues

The code requires two Python 3 fixes:
from functools import reduce   # Python 2's builtin reduce is functools.reduce in Python 3
s/xrange/range                 # Python 2's xrange is range in Python 3

Issue after those are fixed:

Traceback (most recent call last):
  File "main.py", line 70, in <module>
    tf.app.run()
  File "/usr/local/anaconda3/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "main.py", line 62, in main
    agent = Agent(config, env, sess)
  File "/Users/emmanuel.mwangi/open_source/laml/DQN-tensorflow/dqn/agent.py", line 31, in __init__
    self.build_dqn()
  File "/Users/emmanuel.mwangi/open_source/laml/DQN-tensorflow/dqn/agent.py", line 329, in build_dqn
    self._saver = tf.train.Saver(self.w.values() + [self.step_op], max_to_keep=30)
TypeError: unsupported operand type(s) for +: 'dict_values' and 'list'

In environment.py, detailed mode gives only one life

In detailed mode, an episode ends as soon as a single life is lost, so its scores differ from simple mode's by a factor of five or more. The problem is that detailed mode sets terminal=True the moment a life is lost, which immediately starts a new random game.
As a result, the graphs make simple mode look better than detailed mode, when in fact the two are run under conditions that should not be compared. On top of that, M2 (purple) disappears from around step=1M. I think the graphs leave room for misinterpretation.

Segmentation fault (core dumped) | MemoryError

"Segmentation fault (core dumped)" while trying to run it.

I have no GPU configured with tensorflow. I suspect thats the reason. Is there any way to make it work just with the CPU?

Tried a couple of flags, but they didn't work.
python main.py --env_name=Breakout-v0 --is_train=True --display=True --cpu=True
