
simoninithomas / deep_reinforcement_learning_course


Implementations from the free course Deep Reinforcement Learning with TensorFlow and PyTorch

Home Page: http://www.simoninithomas.com/deep-rl-course

Jupyter Notebook 91.85% Python 8.14% Shell 0.01%
a2c actor-critic deep-learning deep-q-learning deep-q-network deep-reinforcement-learning ppo pytorch qlearning tensorflow tensorflow-tutorials unity

deep_reinforcement_learning_course's People

Contributors

simoninithomas


deep_reinforcement_learning_course's Issues

Sonic A2C not working for Pong

I'm trying to test whether the A2C code for Sonic can be used to train an agent on another environment. I replaced the Sonic environments with 8 copies of Pong, and I varied the number of epochs, minibatches, and nsteps, but no matter what, I could not get it to learn Pong. Is there a reason this implementation won't train on Pong? Am I missing some important parameter? Could you test it yourself and let me know? All I changed was the environments in agent.py, swapping in a Pong make_env() that used frame stacking and preprocessing.
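For context, a minimal sketch of what such a Pong make_env() might look like, assuming the classic gym step/reset API and the same 84x84 grayscale preprocessing used elsewhere in the course (the wrapper name and environment id are illustrative, not from the repository):

    import gym
    import numpy as np
    from collections import deque
    from skimage import transform
    from skimage.color import rgb2gray


    class StackedPong(gym.Wrapper):
        """Grayscale, resize to 84x84, and stack the last 4 frames."""

        def __init__(self, env, stack_size=4):
            super().__init__(env)
            self.frames = deque(maxlen=stack_size)
            self.stack_size = stack_size

        def _preprocess(self, frame):
            # rgb2gray returns floats in [0, 1]; resize to the network's input size
            return transform.resize(rgb2gray(frame), [84, 84])

        def reset(self, **kwargs):
            obs = self.env.reset(**kwargs)
            frame = self._preprocess(obs)
            for _ in range(self.stack_size):
                self.frames.append(frame)
            return np.stack(self.frames, axis=2)          # shape (84, 84, 4)

        def step(self, action):
            obs, reward, done, info = self.env.step(action)
            self.frames.append(self._preprocess(obs))
            return np.stack(self.frames, axis=2), reward, done, info


    def make_env():
        return StackedPong(gym.make("PongNoFrameskip-v4"))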

System hangs after running a while

This may be related to the memory issues that have been posted. I was running Breakout and after a while it just hangs. When I try to restart from the last checkpoint it does work, but it goes right back into hang mode. I was originally running RND on Montezuma's Revenge but changed to run Breakout. I've attached a compressed file of the saved models.

example output:


[Episode 707(3)] Step: 460  Reward: 16.0  Recent Reward: 17.17  Visited Room: [{}]
[Episode 705(1)] Step: 497  Reward: 17.0  Recent Reward: 16.69  Visited Room: [{}]
[Episode 702(2)] Step: 558  Reward: 12.0  Recent Reward: 16.64  Visited Room: [{}]
[Episode 692(5)] Step: 411  Reward: 15.0  Recent Reward: 16.45  Visited Room: [{}]
[Episode 729(0)] Step: 526  Reward: 18.0  Recent Reward: 16.85  Visited Room: [{}]
[Episode 685(4)] Step: 654  Reward: 16.0  Recent Reward: 16.24  Visited Room: [{}]
[Episode 703(2)] Step: 467  Reward: 16.0  Recent Reward: 16.7  Visited Room: [{}]
[Episode 728(6)] Step: 569  Reward: 19.0  Recent Reward: 17.59  Visited Room: [{}]
[Episode 708(3)] Step: 506  Reward: 17.0  Recent Reward: 17.09  Visited Room: [{}]
[Episode 706(1)] Step: 487  Reward: 16.0  Recent Reward: 16.61  Visited Room: [{}]
[Episode 730(0)] Step: 412  Reward: 11.0  Recent Reward: 16.86  Visited Room: [{}]
[Episode 693(5)] Step: 478  Reward: 16.0  Recent Reward: 16.42  Visited Room: [{}]
[Episode 707(1)] Step: 414  Reward: 15.0  Recent Reward: 16.67  Visited Room: [{}]
[Episode 729(6)] Step: 447  Reward: 16.0  Recent Reward: 17.6  Visited Room: [{}]
[Episode 709(3)] Step: 457  Reward: 16.0  Recent Reward: 17.15  Visited Room: [{}]
[Episode 686(4)] Step: 591  Reward: 20.0  Recent Reward: 16.25  Visited Room: [{}]
[Episode 704(2)] Step: 582  Reward: 19.0  Recent Reward: 16.72  Visited Room: [{}]
[Episode 694(5)] Step: 362  Reward: 10.0  Recent Reward: 16.44  Visited Room: [{}]
[Episode 731(0)] Step: 403  Reward: 15.0  Recent Reward: 16.82  Visited Room: [{}]
[Episode 730(6)] Step: 361  Reward: 6.0  Recent Reward: 17.52  Visited Room: [{}]
[Episode 708(1)] Step: 503  Reward: 17.0  Recent Reward: 16.75  Visited Room: [{}]
Starting Checkpoint: 
Num Step:  128
Now Global Step :3136000
Ending Checkpoint: 
[Episode 695(5)] Step: 437  Reward: 15.0  Recent Reward: 16.34  Visited Room: [{}]
[Episode 687(4)] Step: 538  Reward: 17.0  Recent Reward: 16.31  Visited Room: [{}]
[Episode 710(3)] Step: 604  Reward: 18.0  Recent Reward: 17.15  Visited Room: [{}]
[Episode 705(2)] Step: 4502  Reward: 6.0  Recent Reward: 16.62  Visited Room: [{}]
[Episode 732(0)] Step: 4502  Reward: 5.0  Recent Reward: 16.7  Visited Room: [{}]
[Episode 731(6)] Step: 4502  Reward: 0.0  Recent Reward: 17.3  Visited Room: [{}]
[Episode 709(1)] Step: 4502  Reward: 2.0  Recent Reward: 16.58  Visited Room: [{}]
[Episode 696(5)] Step: 4502  Reward: 2.0  Recent Reward: 16.2  Visited Room: [{}]
[Episode 688(4)] Step: 4502  Reward: 2.0  Recent Reward: 16.11  Visited Room: [{}]
[Episode 711(3)] Step: 4502  Reward: 2.0  Recent Reward: 16.99  Visited Room: [{}]
[Episode 706(2)] Step: 4502  Reward: 0.0  Recent Reward: 16.45  Visited Room: [{}]
[Episode 733(0)] Step: 4502  Reward: 0.0  Recent Reward: 16.53  Visited Room: [{}]
[Episode 732(6)] Step: 4502  Reward: 0.0  Recent Reward: 17.05  Visited Room: [{}]
[Episode 710(1)] Step: 4502  Reward: 0.0  Recent Reward: 16.42  Visited Room: [{}]
[Episode 697(5)] Step: 4502  Reward: 0.0  Recent Reward: 16.06  Visited Room: [{}]
[Episode 689(4)] Step: 4502  Reward: 0.0  Recent Reward: 16.01  Visited Room: [{}]
[Episode 712(3)] Step: 4502  Reward: 0.0  Recent Reward: 16.92  Visited Room: [{}]
Starting Checkpoint: 
Num Step:  128
Now Global Step :3225600
Ending Checkpoint: 
[Episode 707(2)] Step: 4502  Reward: 0.0  Recent Reward: 16.24  Visited Room: [{}]
[Episode 734(0)] Step: 4502  Reward: 0.0  Recent Reward: 16.27  Visited Room: [{}]
[Episode 733(6)] Step: 4502  Reward: 0.0  Recent Reward: 16.86  Visited Room: [{}]
[Episode 711(1)] Step: 4502  Reward: 0.0  Recent Reward: 16.27  Visited Room: [{}]
[Episode 698(5)] Step: 4502  Reward: 0.0  Recent Reward: 15.89  Visited Room: [{}]
[Episode 690(4)] Step: 4502  Reward: 0.0  Recent Reward: 15.75  Visited Room: [{}]
[Episode 713(3)] Step: 4502  Reward: 0.0  Recent Reward: 16.73  Visited Room: [{}]
[Episode 708(2)] Step: 4502  Reward: 0.0  Recent Reward: 16.07  Visited Room: [{}]
[Episode 735(0)] Step: 4502  Reward: 0.0  Recent Reward: 16.11  Visited Room: [{}]
[Episode 734(6)] Step: 4502  Reward: 0.0  Recent Reward: 16.71  Visited Room: [{}]
[Episode 712(1)] Step: 4502  Reward: 0.0  Recent Reward: 16.09  Visited Room: [{}]
[Episode 699(5)] Step: 4502  Reward: 0.0  Recent Reward: 15.72  Visited Room: [{}]
[Episode 691(4)] Step: 4502  Reward: 0.0  Recent Reward: 15.58  Visited Room: [{}]
[Episode 714(3)] Step: 4502  Reward: 0.0  Recent Reward: 16.54  Visited Room: [{}]
[Episode 709(2)] Step: 4502  Reward: 0.0  Recent Reward: 15.94  Visited Room: [{}]
[Episode 736(0)] Step: 4502  Reward: 0.0  Recent Reward: 15.97  Visited Room: [{}]
[Episode 735(6)] Step: 4502  Reward: 0.0  Recent Reward: 16.49  Visited Room: [{}]
[Episode 713(1)] Step: 4502  Reward: 0.0  Recent Reward: 15.87  Visited Room: [{}]
[Episode 700(5)] Step: 4502  Reward: 0.0  Recent Reward: 15.43  Visited Room: [{}]
[Episode 692(4)] Step: 4502  Reward: 0.0  Recent Reward: 15.34  Visited Room: [{}]
[Episode 715(3)] Step: 4502  Reward: 0.0  Recent Reward: 16.39  Visited Room: [{}]
Starting Checkpoint: 
Num Step:  128
Now Global Step :3315200

models.tar.gz

No usage of stacked_frames

I'm not really familiar with Python notebooks, but from the code visible in both the Doom and the Space Invaders notebooks, it seems that the stacked_frames are only updated but never used as input: the memory stores only the actual (not even preprocessed) states/frames.

Couldn't we just store the stacked_frames instead of the raw frames? The rest of the tuple (action, reward, next_state, done) could stay the same (the next_state could also be preprocessed, I guess).
I'm going to try it once I find the time.
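A sketch of what that change might look like, assuming the notebook's stack_frames() helper and Memory.add() method (names taken from the notebooks; the exact surrounding loop will differ):

    # Store the stacked (84, 84, 4) observation rather than the raw frame, so the
    # replay buffer holds exactly what the network sees at prediction time.
    state, stacked_frames = stack_frames(stacked_frames, state, is_new_episode)
    next_state, stacked_frames = stack_frames(stacked_frames, next_state, False)
    memory.add((state, action, reward, next_state, done))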

error in the last line

print ("Score over time: " + str(sum(reward/total_test_episode)))
TypeError: 'int' object is not iterable
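The error comes from calling sum() on a single number (reward is an int, so reward/total_test_episode is too). The likely intent was to average the per-episode returns collected during testing; a sketch, assuming rewards is that list:

    # Average total reward over all test episodes (rewards: list of per-episode returns)
    print("Score over time: " + str(sum(rewards) / total_test_episode))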

numerical stability of preprocessing function

Hi Simon,

Thank you for your tutorial. Are there any other ways to normalize the discounted reward if there's a good chance that the agent will not receive any reward during the beginning stages of training?

For example, (x - x.mean()) / x.std() will blow up (division by zero) if the agent does not receive any reward in an episode, because the standard deviation is zero. Thanks for your help.
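One common workaround (a generic sketch, not taken from the course code) is to add a small epsilon to the standard deviation so the division stays defined even for an all-zero episode:

    import numpy as np

    def normalize_rewards(discounted_rewards, eps=1e-8):
        # Guard against a zero standard deviation when every reward in the episode is 0
        mean = np.mean(discounted_rewards)
        std = np.std(discounted_rewards)
        return (discounted_rewards - mean) / (std + eps)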

Game not found

When I try to run the sample code in Deep Q Learning DQN Atari Space Invaders.ipynb,
I get an error message like Game not found: SpaceInvaders-Atari2600.
Where can I download the SpaceInvaders-Atari2600 ROM file?
Thank you.
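For reference, gym-retro does not ship game ROMs: the ROM has to be obtained separately (e.g. from a legally purchased copy) and then registered with retro's importer. A minimal sketch, assuming a standard gym-retro installation:

    # After placing a legally obtained SpaceInvaders-Atari2600 ROM in a folder,
    # register it from the command line with gym-retro's importer:
    #
    #     python -m retro.import /path/to/your/ROMs/
    #
    # Afterwards the environment should be creatable as usual:
    import retro

    env = retro.make(game="SpaceInvaders-Atari2600")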

Possible mistake in Deep Q Learning Space Invaders notebook

Hey. Shouldn't self.Q = tf.reduce_sum(tf.multiply(self.output, self.actions_)) in DQN class be self.Q = tf.reduce_sum(tf.multiply(self.output, self.actions_), axis=1), i.e. reduced along columns so that the output length of self.Q is equal to the batch size? If not then self.Q will be a scalar while self.target_Q will be a vector of batch size length.
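To make the shape argument concrete, here is a small sketch with TF 1.x placeholders (illustrative, not the notebook's exact code); 8 is just an example action count:

    import tensorflow as tf

    # output:  predicted Q-values for every action, shape (batch_size, num_actions)
    # actions: one-hot encoding of the actions actually taken, same shape
    output = tf.placeholder(tf.float32, [None, 8])
    actions = tf.placeholder(tf.float32, [None, 8])

    q_per_sample = tf.reduce_sum(tf.multiply(output, actions), axis=1)  # shape (batch_size,)
    q_scalar = tf.reduce_sum(tf.multiply(output, actions))              # shape (), summed over the whole batch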

Deep Q-Learning: ValueError: zero-size array to reduction operation minimum which has no identity

When I run the cells of "Deep Q learning with Doom.ipynb" in the Deep Q-Learning part, an error occurs in the 11th cell (Step 6). Here is the info:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-11-289a4b5c1934> in <module>()
     10         # First we need a state
     11         state = game.get_state().screen_buffer
---> 12         state, stacked_frames = stack_frames(stacked_frames, state, True)
     13 
     14     # Random action

<ipython-input-6-bf05256e1e9f> in stack_frames(stacked_frames, state, is_new_episode)
      6 def stack_frames(stacked_frames, state, is_new_episode):
      7     # Preprocess frame
----> 8     frame = preprocess_frame(state)
      9 
     10     if is_new_episode:

<ipython-input-5-90faeed1717d> in preprocess_frame(frame)
     32 
     33     # Resize
---> 34     preprocessed_frame = transform.resize(normalized_frame, [84,84])
     35 
     36     return preprocessed_frame

D:\Anaconda3\lib\site-packages\skimage\transform\_warps.py in resize(image, output_shape, order, mode, cval, clip, preserve_range)
    133         out = warp(image, tform, output_shape=output_shape, order=order,
    134                    mode=mode, cval=cval, clip=clip,
--> 135                    preserve_range=preserve_range)
    136 
    137     return out

D:\Anaconda3\lib\site-packages\skimage\transform\_warps.py in warp(image, inverse_map, map_args, output_shape, order, mode, cval, clip, preserve_range)
    817                                      mode=ndi_mode, order=order, cval=cval)
    818 
--> 819     _clip_warp_output(image, warped, order, mode, cval, clip)
    820 
    821     return warped

D:\Anaconda3\lib\site-packages\skimage\transform\_warps.py in _clip_warp_output(input_image, output_image, order, mode, cval, clip)
    568     """
    569     if clip and order != 0:
--> 570         min_val = input_image.min()
    571         max_val = input_image.max()
    572 

D:\Anaconda3\lib\site-packages\numpy\core\_methods.py in _amin(a, axis, out, keepdims)
     27 
     28 def _amin(a, axis=None, out=None, keepdims=False):
---> 29     return umr_minimum(a, axis, None, out, keepdims)
     30 
     31 def _sum(a, axis=None, dtype=None, out=None, keepdims=False):

ValueError: zero-size array to reduction operation minimum which has no identity

I searched for this, and people say:

"This error happens when you have an empty mask (all zeros)" (https://github.com/matterport/Mask_RCNN/issues/47)

I don't know how to fix it, so I'm asking for your help.
Thank you very much!

Logging for graphing

The logging method used here is the Baselines logger, and it looks like the output of this logger (in the A2C Sonic code) is in training.txt. Is there a way to modify the logger code in model.py to output to a specific file (such as training.txt) and moreover, to do so in a format more accessible for graphing, such as CSV?
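One possible approach, assuming the logger in question is the OpenAI Baselines logger: configure it once before training with an explicit directory and a CSV format, so the existing record_tabular/dump_tabular calls also produce a CSV file that is easy to graph (the directory name here is illustrative):

    from baselines import logger

    # Write logs both to stdout and to a CSV file under ./logs/sonic_a2c
    logger.configure(dir="./logs/sonic_a2c", format_strs=["stdout", "csv"])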

Playing a different Sonic level?

In the A2C Sonic example, how would I play a trained agent on a different level of Sonic? i.e. how would I change the environment and state after having trained the agent on one such that I can evaluate it on another?

Is there a way to train on multiple (different) environments, which is what I believe they did in the retro contest? I would basically like to train on a SET of environments and then evaluate performance of the trained agent on another set.
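For the first question, gym-retro lets you pick the level via the state argument when the environment is created, so evaluating on another level is roughly a matter of rebuilding the environment; a sketch, with example state names:

    import retro

    # Train on one level...
    train_env = retro.make(game="SonicTheHedgehog-Genesis", state="GreenHillZone.Act1")
    # ...then evaluate the trained agent on a different one
    eval_env = retro.make(game="SonicTheHedgehog-Genesis", state="GreenHillZone.Act2")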

Space Invaders Training

Hi, I'm trying to train the Space Invaders DQN Atari Space Invaders.ipynb
I think there is a typo, or a leftover list that should have been removed, in the training cell:
rewards_list.append((episode, total_reward))

I just want to double-check, because when I commented it out the code runs, and I'm still training it now.

Thank You

Deep Q learning with Doom, last part

I tried testing the code and it crashes after the 500 episodes of training in the last part.
For state = stack_frames(stacked_frames, frame), did you forget a True or False at the end? And should the left-hand side possibly be state, stacked_frames too?

Traceback (most recent call last):
File "Deep Q-learning with Doom.py", line 530, in
state = stack_frames(stacked_frames, frame)
TypeError: stack_frames() missing 1 required positional argument: 'is_new_episode'

Unsure about the LHS, but I tried both True and False myself. But it then crashes on the next line too.

Traceback (most recent call last):
File "Deep Q-learning with Doom.py", line 532, in
Qs = sess.run(DQNetwork.output, feed_dict = {DQNetwork.inputs_: state.reshape((1, *state.shape))})
AttributeError: 'tuple' object has no attribute 'reshape'

I converted the .ipynb to a single Python file first. I hope that didn't cause any problems.
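For what it's worth, a guess at the intended call (based on how stack_frames() is used earlier in the notebook): passing the missing boolean and unpacking the returned tuple would explain both errors above.

    # New episode in the test loop: pass is_new_episode=True and unpack both return values
    state, stacked_frames = stack_frames(stacked_frames, frame, True)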

Unresolved reference 'sess'

Deep Q Learning: Space Invaders game.
There is an unresolved sess variable in the else branch of the predict_action method; I think sess should be initialized with tf.Session(). The same applies to the rewards_list variable in the training code.
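A sketch of what running that code standalone might require (in the notebook, sess and rewards_list are created in the training cell, so this is only needed if that cell hasn't been run):

    import tensorflow as tf

    rewards_list = []
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # ... training loop that calls predict_action(...) goes here ...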

ViZDoomErrorException: Could not initialize SDL video

When I try to run the notebook in a Linux server environment, there is no monitor device available.

Because of that, I encounter this error at game.init():

Failed to connect to the Mir Server
 
---------------------------------------------------------------------------
ViZDoomErrorException                     Traceback (most recent call last)
<ipython-input-33-070898310509> in <module>()
      9     game.set_doom_scenario_path("deadly_corridor.wad")
     10 
---> 11     game.init()
     12 
     13     game.set_window_visible(False)

ViZDoomErrorException: Could not initialize SDL video:
Failed to connect to the Mir Server

This problem could be solved if we add game.set_window_visible(False) before game.init().
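A sketch of that ordering (scenario/config names taken from the snippet above; the load_config call is assumed to match the notebook):

    from vizdoom import DoomGame

    game = DoomGame()
    game.load_config("deadly_corridor.cfg")
    game.set_doom_scenario_path("deadly_corridor.wad")
    game.set_window_visible(False)   # must come before init() on a headless server
    game.init()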

May I suggest adding a try/except block?

Thanks for writing this awesome repo.

Error when running Space Invaders (AttributeError: 'function' object has no attribute 'ndim')

I am just starting with machine learning and I don't know how to solve this. Can anyone help?

The full error is:
AttributeError Traceback (most recent call last)
in
191 episode_rewards = []
192 state = env.reset
--> 193 state, stacked_frames = stack_frames(stacked_frames, state, True)
194
195 while step < max_steps:

in stack_frames(stacked_frames, state, is_new_episode)
61
62 def stack_frames (stacked_frames, state, is_new_episode):
---> 63 frame = preprocess_frame(state)
64
65 if is_new_episode:

in preprocess_frame(frame)
49 """
50 def preprocess_frame(frame):
---> 51 gray = rgb2gray(frame)
52
53 cropped_frame = gray[8:-12,4:-12]

~\AppData\Local\conda\conda\envs\treball\lib\site-packages\skimage\color\colorconv.py in rgb2gray(rgb)
799 """
800
--> 801 if rgb.ndim == 2:
802 return np.ascontiguousarray(rgb)
803

AttributeError: 'function' object has no attribute 'ndim'
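The traceback shows state = env.reset (line 192) with no parentheses, so the reset method itself, rather than an observation, is passed to preprocess_frame(), which is why rgb2gray() complains about a function having no ndim. Calling reset should fix it:

    state = env.reset()
    state, stacked_frames = stack_frames(stacked_frames, state, True)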

Default doom config doesn't set screen_buffer to greyscale anymore

I was having issues with the image transform function throwing an error due to what looked like receiving an empty image.

Traceback (most recent call last):
File "main.py", line 324, in
state, stacked_frames = stack_frames(stacked_frames, state, True)
File "main.py", line 128, in stack_frames
frame = preprocess_frame(state)
File "main.py", line 114, in preprocess_frame
preprocessed_frame = transform.resize(normalized_frame, [84, 84])
File "/home/phill/.local/lib/python3.6/site-packages/skimage/transform/_warps.py", line 169, in resize
preserve_range=preserve_range)
File "/home/phill/.local/lib/python3.6/site-packages/skimage/transform/_warps.py", line 896, in warp
_clip_warp_output(image, warped, order, mode, cval, clip)
File "/home/phill/.local/lib/python3.6/site-packages/skimage/transform/_warps.py", line 648, in _clip_warp_output
min_val = input_image.min()
File "/home/phill/.local/lib/python3.6/site-packages/numpy/core/_methods.py", line 32, in _amin
return umr_minimum(a, axis, None, out, keepdims, initial)
ValueError: zero-size array to reduction operation minimum which has no identity

After fiddling with everything, I took a look at the default config file and noticed that the screen format was set to something other than greyscale, so I updated it to:

screen_format = GRAY8

That seems to have fixed the issue. I'm not sure why, but you might want to add that to the instructions.

Continuous output space scenario

Hi,

About Cartpole: from what I understand, the agent selects an action based on the action probabilities output by the neural network.

But let's imagine that the action space is infinite.
For example, instead of outputting left or right, the agent outputs a speed, which can be negative (rolling to the left) or positive (rolling to the right).
How can I implement such a system? Does it seem feasible? What would I need to modify in the code?
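One standard way to handle this (a sketch in the notebooks' TF 1.x style, not code from the course) is to have the policy network output the parameters of a Gaussian over the continuous action instead of a softmax over discrete actions; the sampled action's log-probability then plays the role of the log of the chosen action's probability in the policy-gradient loss:

    import tensorflow as tf

    states_ = tf.placeholder(tf.float32, [None, 4], name="states")

    # Output the parameters of a 1-D Gaussian over a signed speed
    hidden = tf.layers.dense(states_, 32, activation=tf.nn.relu)
    mu = tf.layers.dense(hidden, 1)                                  # mean speed
    log_std = tf.get_variable("log_std", [1], initializer=tf.zeros_initializer())

    dist = tf.distributions.Normal(loc=mu, scale=tf.exp(log_std))
    action = dist.sample()               # continuous action, can be negative or positive
    log_prob = dist.log_prob(action)     # replaces the log-softmax of the chosen action
    # The policy-gradient loss keeps the same form: -log_prob * discounted_return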

I can't watch the videos

Hello, I am a student in China and I am trying to watch your videos about reinforcement learning, but when I click the video address I get an error saying the address is invalid. Can you help me?

Problems I found in FrozenLake

I found that there is a high chance of ending up with an all-zero Q-table after the Q-learning algorithm. After some googling, I found that the FrozenLake environment is "slippery" by default, so you don't always go where you intend. My guess is that the agent simply isn't lucky enough to reach the goal at random, and later on epsilon has decayed so the agent explores less (although, because of the slippery environment, its moves still amount to random exploration). Even when it does learn a Q-table, the actions it chooses can be quite weird (going left at the very start, for example), yet it still scores as much as 0.49 over time, so I think it is largely being "slipped" to the goal.
My solution: if the Q-table is still all zeros after the Q-learning process, reset epsilon to 1 and run the iterations again. Reducing the decay_rate helps as well.
Finally, I found a way to turn off the "is_slippery" option:

import gym
from gym.envs.registration import register

register(
    id='FrozenLakeNotSlippery-v0',
    entry_point='gym.envs.toy_text:FrozenLakeEnv',
    kwargs={'map_name': '4x4', 'is_slippery': False},
    max_episode_steps=100,
    reward_threshold=0.78,  # optimum = .8196
)
env = gym.make("FrozenLakeNotSlippery-v0")

from openai/gym#565

ValueError: zero-size array to reduction operation minimum which has no identity

Hey, so I ran an identical version of the code in the notebook for the Deep Q Learning w/ Doom Tutorial and got an unexpected traceback error:

File "", line 346, in
state, stacked_frames = stack_frames(stacked_frames, state, True)

File "", line 120, in stack_frames
frame = preprocess_frame(state)

File "", line 93, in preprocess_frame
preprocessed_frame = transform.resize(normalized_frame, [84,84])

File "/home/saams4u/.local/lib/python3.6/site-packages/skimage/transform/_warps.py", line 169, in resize
preserve_range=preserve_range)

File "/home/saams4u/.local/lib/python3.6/site-packages/skimage/transform/_warps.py", line 894, in warp
_clip_warp_output(image, warped, order, mode, cval, clip)

File "/home/saams4u/.local/lib/python3.6/site-packages/skimage/transform/_warps.py", line 646, in _clip_warp_output
min_val = input_image.min()

File "/home/saams4u/.local/lib/python3.6/site-packages/numpy/core/_methods.py", line 32, in _amin
return umr_minimum(a, axis, None, out, keepdims, initial)

ValueError: zero-size array to reduction operation minimum which has no identity

Not sure what the problem is exactly...

memory leak

Hello there,
Thank you very much for this course!
When I try to run the DDDQN course code, there is still a memory leak even after the replay buffer has reached its maximum size (memory_size).

action = np.argmax(qtable[state,:]) :

It is excellent work, thank you simoninithomas!
When trying "Q* Learning with FrozenLake"
I received the error message:

IndexError                                Traceback (most recent call last)
<ipython-input-10-b80d2f0eee51> in <module>()

     17         ## If this number > greater than epsilon --> exploitation (taking the biggest Q value for this state)
     18         if exp_exp_tradeoff > epsilon:
---> 19             action = np.argmax(qtable[state,:])
     20 
     21         # Else doing a random choice --> exploration

IndexError: arrays used as indices must be of integer (or boolean) type

I printed out the value of the state:
[ 0.01670674 -0.0137805 -0.02276666 0.02263872]

I don't know how to solve this problem.

Value Error - lets-play-doom

Hello,
when I try to run your code, it always ends with this error:
"ValueError: zero-size array to reduction operation minimum which has no identity"
The traceback points to this call: "stacked_frames = stack_frames(stacked_frames, state, True)"

What is the problem?

retro has no attribute make

I'm getting a weird error when I'm running the deep Q learning space invaders example

" env = retro.make(game ='SpaceInvaders-Atari2600')
AttributeError: module 'retro' has no attribute 'make'
"
I installed gym-retro and have Python 3.6 working on Windows 10,
but I'm still new to this.

Dueling Deep Q Learning with Doom: Memory Error

I'm trying to train Doom on my PC, using the same code as on the page.
But each time after training for a while, a memory error occurs in the stack-frames process.
I checked my memory usage while training, and it keeps increasing.
I have 16 GB of RAM; is that enough? Or is something wrong with the code?

Why we have self.Q = tf.reduce_sum(tf.multiply(self.output, self.actions_), axis=1) in Deep Q learning with Doom.ipynb

Hi Thomas,

I recently started working on reinforcement learning and I'm going through your deep reinforcement learning online series.

I don't understand why we apply tf.reduce_sum and multiply the convolution output by the actions.

In [7]: ....... 
# Q is our predicted Q value
self.Q = tf.reduce_sum(tf.multiply(self.output, self.actions_), axis=1)

Why aren't we considering self.output as the predicted Q value?
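A small numpy illustration of what that line computes (illustrative values, not from the notebook): self.output holds a Q-value for every action, and self.actions_ is the one-hot action that was actually taken, so multiplying and summing along axis 1 just picks out Q(s, a) for the taken action, which is what the TD target is compared against.

    import numpy as np

    output = np.array([[1.2, 0.3, 2.5],    # Q-values for 3 actions, batch of 2
                       [0.1, 4.0, 0.7]])
    actions_ = np.array([[0, 0, 1],        # sample 1 took action 2
                         [0, 1, 0]])       # sample 2 took action 1

    q_taken = np.sum(output * actions_, axis=1)   # -> [2.5, 4.0]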

[Policy Gradients with Doom] - ValueError: cannot reshape array of size 9031680 into shape (1,84,84,4)

Hi, I have tried to run the Policy Gradient with Doom in my Desktop, but I am running into the following error:

Traceback (most recent call last):
  File "Doom_Grad.py", line 357, in <module>
    states_mb, actions_mb, rewards_of_batch, discounted_rewards_mb, nb_episodes_mb = make_batch(batch_size, stacked_frames)
  File "Doom_Grad.py", line 279, in make_batch
    feed_dict={PGNetwork.inputs_: state.reshape(1, *state_size)})
ValueError: cannot reshape array of size 9031680 into shape (1,84,84,4)

I was wondering if anyone could help me in this part.

Can't pip install vizdoom

When I pip install vizdoom, I get the error: "Command "python setup.py egg_info" failed with error code 1 in C:\Users\Admin\AppData\Local\Temp\pip-install-1fe7dcqa\vizdoom".
Any help or suggestion is highly appreciated.

Cannot get Q value prediction for next_state

I am getting an error on this line:

Qs_next_state = sess.run(DQNetwork.output, feed_dict={DQNetwork.inputs_: next_states_mb})

TensorFlow does not like the np.array structure. I am assuming something goes wrong when we create the mini-batches from batch.

Anyone else get this error?

Error in train function: setting an array element with a sequence.
<class 'ValueError'>

TIA
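One thing worth checking (a generic sketch with dummy data, not a diagnosis of this particular setup): "setting an array element with a sequence" usually means the sampled states do not all share the same shape or dtype, so NumPy cannot build a rectangular array. Stacking them explicitly and inspecting the result makes the problem visible:

    import numpy as np

    # Dummy stand-in for the sampled batch of (state, action, reward, next_state, done) tuples
    batch = [(np.zeros((84, 84, 4)), 0, 1.0, np.zeros((84, 84, 4)), False) for _ in range(64)]

    next_states_mb = np.stack([np.asarray(each[3], dtype=np.float32) for each in batch])
    print(next_states_mb.shape, next_states_mb.dtype)   # expected: (64, 84, 84, 4) float32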

Something about saving models

Hi,

I'm just training the DQN and noticed that most of the training loops use the range() function to iterate and save every 5 episodes via the check if episode % 5 == 0:.

I guess you want to save the model after every 5 loops, but range() starts at 0, so I suppose the check here should be if episode % 5 == 4: or if (episode+1) % 5 == 0:. :-)

N-step returns

Do these algorithms compute n-step returns for the reward propagation? The Sonic A2C code looks like it just does 1-step returns, V(s) = r + γ·V(s_next), except it's hard to tell because I'm not too familiar with GAE.
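For reference, a generic sketch of bootstrapped n-step returns over a rollout (not the repository's exact implementation; GAE generalizes this by exponentially weighting the different horizon lengths with a λ parameter):

    import numpy as np

    def n_step_returns(rewards, last_value, dones, gamma=0.99):
        """Discounted returns over an n-step rollout, bootstrapped from V of the final state."""
        returns = np.zeros(len(rewards), dtype=np.float32)
        running = last_value                     # bootstrap value of the state after the rollout
        for t in reversed(range(len(rewards))):
            running = rewards[t] + gamma * running * (1.0 - dones[t])
            returns[t] = running
        return returns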

PPO the agent is not trained yet?

When I visited the link,
I found the words "WORK IN PROGRESS: the agent is not trained yet ⚠️⚠️⚠️⚠️".
Does this mean that agent.py is not complete, or that there is no well-trained model?

[Deep Q Learning with Doom] ValueError: Cannot feed value of shape (64, 84, 84, 4, 260) for Tensor 'DQNetwork/inputs:0', which has shape '(?, 84, 84, 4)'

I tried to run the script provided in the tutorial from the terminal. After the Doom window popped up for a few seconds, the value error appeared. (ValueError: Cannot feed value of shape (64, 84, 84, 4, 260) for Tensor 'DQNetwork/inputs:0', which has shape '(?, 84, 84, 4)')
Platform: Ubuntu 16.04
Tensorflow version: 1.12.0

Here is the output of the error in terminal after the program run failed:

Traceback (most recent call last):
  File "dqdoom.py", line 481, in <module>
    Qs_next_state = sess.run(DQNetwork.output, feed_dict = {DQNetwork.inputs_: next_states_mb})
  File "/home/work-tys/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/home/work-tys/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1128, in _run
    str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (64, 84, 84, 4, 260) for Tensor 'DQNetwork/inputs:0', which has shape '(?, 84, 84, 4)'

How can I modify the dimensions of next_states_mb so that it fits the shape required by DQNetwork.inputs_?

Deep Q-Learning, Spaceinvaders, retro-gym

Hi!
I'm trying to run SpaceInvaders, but ran into "Game not found: Did you make sure to import the ROM?". Then I tried the solution by MaximusWudy, renaming the files to .a26 (as advised at openai/retro). But when I try to run "python3 -m retro.import (…)" it always fails with this error:
Importing 130863 potential games…
Traceback (most recent call last):
  File "//anaconda/lib/python3.5/runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "//anaconda/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "//anaconda/lib/python3.5/site-packages/retro/import/__main__.py", line 4, in <module>
    main()
  File "//anaconda/lib/python3.5/site-packages/retro/scripts/import_path.py", line 18, in main
    retro.data.merge(*potential_roms, quiet=False)
  File "//anaconda/lib/python3.5/site-packages/retro/data/__init__.py", line 404, in merge
    known_hashes[sha] = (game, ext, os.path.join(path(), curpath))
UnboundLocalError: local variable 'ext' referenced before assignment
To be honest, I have no idea what to try next.

Breakout in RND

I've run both Pong and Space Invaders for long training runs without any issues, but when I try Breakout the system hangs somewhere inside Breakout: it never finishes the last life and goes into a loop. If a checkpoint is taken at that time, the checkpoint is bad, and if it is restored the system hangs immediately. I don't know if this is a gym or torch issue. Has anyone reported something like this? I can replicate it easily.

atari spaceinvaders tensorboard permission denied and models not found

Hi, the current implementation of Atari Spaceinvaders has several errors, and I think they are all related to TensorBoard. The first error comes from the first use of TensorBoard, with the line
writer = tf.summary.FileWriter("/tensorboard/dqn/1"). I avoided this by putting a "." in front of the first slash.

The second error is again related to file locations, this time with the models:

saver.restore(sess, "./models/model.ckpt")

I tried copying the notebook and running it on a Mac and got the errors described above. I have no way of avoiding the errors related to the lines of code with model.ckpt. I tried commenting out TensorBoard but the code still does not run. I have tried both my local Mac and an online AWS instance, with the same errors.

  • Edit:

I found out I had problems with an outdated TensorFlow version, and I also had to change my hyperparameters for training and rendering.

It's solved.

Deep Q Learning Spaceinvaders

I've trained the model for 50 total episodes. However, when I run the last code cell, the action is always the same. I've printed Qs and the action, and the action is always [0 0 0 0 0 0 1 0]. The agent never moves and just dies after 3 lives.

I tested the environment with:
(Basically selects a random action)
choice = np.random.rand(1,8)
choice = choice[0]
choice.tolist()
choice = np.argmax(choice)
print(choice)
action = possible_actions[choice]

and the environment renders and the agent dies at around 200 points. So my installation is fine.

Any idea what I'm doing wrong?

Memory Issues (DQN and DDQN)

I have run the DQN Space Invaders notebook in Google Colab as well as in a local Jupyter notebook. Like clockwork, it crashes at 12 episodes due to a memory (RAM) overflow.
Looking at the RAM use, it is low until episode 12, then it suddenly jumps and overflows.

I have also tried running the .py script in PyCharm, to no avail.
I have reduced the batch size and the memory size.
I have 16 GB of RAM; is that enough?
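One common way to cut replay-buffer RAM (a generic sketch, not code from the notebooks) is to store frames as uint8 and only convert back to float when sampling; since the preprocessed frames are floats in [0, 1], this shrinks the stored frames by roughly 4-8x:

    import numpy as np
    from collections import deque

    class CompactMemory:
        """Replay buffer that stores frames as uint8 instead of float."""

        def __init__(self, max_size):
            self.buffer = deque(maxlen=max_size)

        def add(self, state, action, reward, next_state, done):
            self.buffer.append(((state * 255).astype(np.uint8), action, reward,
                                (next_state * 255).astype(np.uint8), done))

        def sample(self, batch_size):
            idx = np.random.choice(len(self.buffer), batch_size, replace=False)
            samples = [self.buffer[i] for i in idx]
            states = np.stack([s[0] for s in samples]).astype(np.float32) / 255.0
            next_states = np.stack([s[3] for s in samples]).astype(np.float32) / 255.0
            actions = np.array([s[1] for s in samples])
            rewards = np.array([s[2] for s in samples], dtype=np.float32)
            dones = np.array([s[4] for s in samples], dtype=np.float32)
            return states, actions, rewards, next_states, dones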

Anyone else having this?

Thanks

Deep Q Learning - Doom

  state, stacked_frames = stack_frames(stacked_frames, state, True)
File "doom_rl.py", line 80, in stack_frames
  frame = preprocess_frame(state)
File "doom_rl.py", line 71, in preprocess_frame
  preprocessed_frame = transform.resize(normalized_frame, [84,84])
File "C:\Users\dcane\Anaconda3\envs\tf_gpu\lib\site-packages\skimage\transform\_warps.py", line 169, in resize
  preserve_range=preserve_range)
File "C:\Users\dcane\Anaconda3\envs\tf_gpu\lib\site-packages\skimage\transform\_warps.py", line 896, in warp
  _clip_warp_output(image, warped, order, mode, cval, clip)
File "C:\Users\dcane\Anaconda3\envs\tf_gpu\lib\site-packages\skimage\transform\_warps.py", line 648, in _clip_warp_output
  min_val = input_image.min()
File "C:\Users\dcane\Anaconda3\envs\tf_gpu\lib\site-packages\numpy\core\_methods.py", line 32, in _amin
  return umr_minimum(a, axis, None, out, keepdims, initial)
ValueError: zero-size array to reduction operation minimum which has no identity

question on checkpoint restart

I was not sure why the code does not restore the mean and RMS values, so I made the mods below so that on restart it can pick up where it left off. Is there a reason why this is not done?

>>>>>I added two new paths

    # Define the model path names
    model_path = 'models/{}.model'.format(env_id)
    predictor_path = 'models/{}.pred'.format(env_id)
    target_path = 'models/{}.target'.format(env_id)
    mean_path = 'models/{}_mean.pt'.format(env_id)
    reward_rms_path = 'models/{}_rms.pt'.format(env_id)

>>>> changed startup code :

    # Loads models
    if is_load_model:
        obs_rms    = torch.load(mean_path)
        reward_rms = torch.load(reward_rms_path)          
        if use_cuda:
            print("Loading PPO Saved Model using GPU")
            agent.model.load_state_dict(torch.load(model_path))
            agent.rnd.predictor.load_state_dict(torch.load(predictor_path))
            agent.rnd.target.load_state_dict(torch.load(target_path))
        else:
            print("Loading PPO Saved Model using CPU")
            agent.model.load_state_dict(torch.load(model_path, map_location='cpu'))
            agent.rnd.predictor.load_state_dict(torch.load(predictor_path, map_location='cpu'))
            agent.rnd.target.load_state_dict(torch.load(target_path, map_location='cpu'))            
    else:
        # Normalize obs on first-time initialization
        print("first time initialization")
        next_obs = []

        for step in range(num_step * pre_obs_norm_step):
            actions = np.random.randint(0, output_size, size=(num_worker,))

            for parent_conn, action in zip(parent_conns, actions):
                parent_conn.send(action)

            for parent_conn in parent_conns:
                s, r, d, rd, lr = parent_conn.recv()
                next_obs.append(s[3, :, :].reshape([1, 84, 84]))

            if len(next_obs) % (num_step * num_worker) == 0:
                next_obs = np.stack(next_obs)
                obs_rms.update(next_obs)
                next_obs = []

>>>>>and in check pointing 

            torch.save(agent.model.state_dict(), model_path)
            torch.save(agent.rnd.predictor.state_dict(), predictor_path)
            torch.save(agent.rnd.target.state_dict(), target_path)
            torch.save(obs_rms, mean_path) 
            torch.save(reward_rms, reward_rms_path) 
