Code from the Deep Reinforcement Learning in Action book from Manning, Inc

License: MIT License

Jupyter Notebook 99.47% Python 0.53%

deepreinforcementlearninginaction's Introduction

Deep Reinforcement Learning In Action

Code Snippets from the Deep Reinforcement Learning in Action book from Manning, Inc

How this is Organized

The code snippets, listings, and projects are all embedded in Jupyter Notebooks organized by chapter. Visit http://jupyter.org/install for instructions on installing Jupyter Notebooks.

We keep the original Jupyter Notebooks in their respective chapter folders. As we discover errata, we update notebooks in the Errata folder, so those notebooks are the most up-to-date in terms of errors corrected, but we keep the original Jupyter Notebooks to match the book code snippets.

Requirements

In order to run many of the projects, you'll need at least the NumPy library and PyTorch.

pip install -r requirements.txt

Special Instructions

In notebook 9, an issue appears in the 15th cell. You can solve it by following @scottmayberry's instructions in Farama-Foundation/MAgent2#14: copy all the files and folders from https://github.com/Farama-Foundation/MAgent2/tree/main/magent2 to the local folder <venv_folder>/lib/python3.X/site-packages/magent2 (or a similar path if your OS is not Linux). Thanks to donlaiq for this.

Contribute

If you experience any issues running the examples, please file an issue. If you see typos or other errors in the book, please edit the Errata.md file and create a pull request.


deepreinforcementlearninginaction's Issues

Ch3_book.ipynb Listing 3.3

state1 = torch.from_numpy(state_).float() #E

requirements.txt has torch==0.4.0 which is not available

Looking in links: https://download.pytorch.org/whl/torch_stable.html
Collecting torch==0.4.0
Could not find a version that satisfies the requirement torch==0.4.0 (from versions: 0.1.2, 0.1.2.post1, 0.1.2.post2, 0.4.1, 0.4.1.post2, 1.0.0, 1.0.1, 1.0.1.post2, 1.1.0, 1.2.0, 1.2.0+cpu, 1.2.0+cu92, 1.3.0, 1.3.0+cpu, 1.3.0+cu100, 1.3.0+cu92, 1.3.1, 1.3.1+cpu, 1.3.1+cu100, 1.3.1+cu92, 1.4.0, 1.4.0+cpu, 1.4.0+cu100, 1.4.0+cu92, 1.5.0, 1.5.0+cpu, 1.5.0+cu101, 1.5.0+cu92, 1.5.1, 1.5.1+cpu, 1.5.1+cu101, 1.5.1+cu92, 1.6.0, 1.6.0+cpu, 1.6.0+cu101, 1.6.0+cu92, 1.7.0, 1.7.0+cpu, 1.7.0+cu101, 1.7.0+cu110, 1.7.0+cu92, 1.7.1, 1.7.1+cpu, 1.7.1+cu101, 1.7.1+cu110, 1.7.1+cu92, 1.7.1+rocm3.7, 1.7.1+rocm3.8)
No matching distribution found for torch==0.4.0

Invalid link

Both numpy and pytorch links in readme file are invalid.

Chapter 5 - Listing 5.1 Introduction to multiprocessing

import multiprocessing as mp
import numpy as np
def square(x): #A
    return np.square(x)
x = np.arange(64) #B
print(x)
print(mp.cpu_count())
pool = mp.Pool(4) #C

squared = pool.map(square, [x[8*i:8*i+8] for i in range(4)])
print(squared)

Unfortunately, I receive the following error after running the code.

print(squared)
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63]
4

Process SpawnPoolWorker-9:
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\lib\multiprocessing\process.py", line 315, in _bootstrap
self.run()
File "C:\ProgramData\Anaconda3\lib\multiprocessing\process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\pool.py", line 114, in worker
task = get()
File "C:\ProgramData\Anaconda3\lib\multiprocessing\queues.py", line 358, in get
return _ForkingPickler.loads(res)
AttributeError: Can't get attribute 'square' on <module '__main__' (built-in)>
Process SpawnPoolWorker-10:
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\lib\multiprocessing\process.py", line 315, in _bootstrap
self.run()
File "C:\ProgramData\Anaconda3\lib\multiprocessing\process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\pool.py", line 114, in worker
task = get()
File "C:\ProgramData\Anaconda3\lib\multiprocessing\queues.py", line 358, in get
return _ForkingPickler.loads(res)
AttributeError: Can't get attribute 'square' on <module '__main__' (built-in)>
Process SpawnPoolWorker-11:
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\lib\multiprocessing\process.py", line 315, in _bootstrap
self.run()
File "C:\ProgramData\Anaconda3\lib\multiprocessing\process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\pool.py", line 114, in worker
task = get()
File "C:\ProgramData\Anaconda3\lib\multiprocessing\queues.py", line 358, in get
return _ForkingPickler.loads(res)
AttributeError: Can't get attribute 'square' on <module '__main__' (built-in)>
Process SpawnPoolWorker-12:
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\lib\multiprocessing\process.py", line 315, in _bootstrap
self.run()
File "C:\ProgramData\Anaconda3\lib\multiprocessing\process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\pool.py", line 114, in worker
task = get()
File "C:\ProgramData\Anaconda3\lib\multiprocessing\queues.py", line 358, in get
return _ForkingPickler.loads(res)
AttributeError: Can't get attribute 'square' on <module '__main__' (built-in)>
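
The `SpawnPoolWorker` names in the traceback indicate the "spawn" start method, under which each worker starts a fresh interpreter and has to look up the worker function by name in the main module. A function typed into an interactive session is not available there. A minimal sketch of the usual workaround, assuming the code is saved as a regular `.py` script and run with `python`, follows. The names `square_chunk`, `split_chunks`, and `main` are hypothetical, and plain lists replace NumPy arrays only to keep the example dependency-free.

```python
import multiprocessing as mp


def square_chunk(chunk):
    """Worker function; defined at module top level so child processes can find it."""
    return [v * v for v in chunk]


def split_chunks(values, n_chunks):
    """Split `values` into `n_chunks` contiguous slices.

    Assumes len(values) is divisible by n_chunks; any remainder is dropped.
    """
    size = len(values) // n_chunks
    return [values[i * size:(i + 1) * size] for i in range(n_chunks)]


def main():
    values = list(range(64))
    # The pool is created only inside main(), never at import time.
    with mp.Pool(4) as pool:
        squared = pool.map(square_chunk, split_chunks(values, 8))
    print(squared)


if __name__ == "__main__":
    # Child processes re-import this file under a different module name,
    # so this guard keeps them from starting pools of their own.
    main()
```

In a notebook, a comparable approach is to move the worker function into a separate `.py` file next to the notebook and import it, so that the workers can import it too.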

Chapter 8: Training loop and min_progress

Unless I'm mistaken, there is something odd about the main training loop (Listing 8.13) for the Super Mario game in Chapter 8. The way that the current x-position is checked against the min_progress parameter makes no sense to me.
More precisely: in line 23 of the main training loop, the environment step is taken (6 times) and last_x_pos is set to the current x-position:

state2, e_reward_, done, info = env.step(action)
last_x_pos = info['x_pos']

In the following lines of code, neither last_x_pos nor info['x_pos'] are changed. Then in line 33 the two are compared to one another:

if episode_length > params['max_episode_len']:
     if (info['x_pos'] - last_x_pos) < params['min_progress']:
          done = True
     else:
          last_x_pos = info['x_pos']

Isn't info['x_pos'] - last_x_pos always going to be zero here? This would always reset the environment as soon as episode_length > params['max_episode_len'].
What is the min_progress parameter meant to be intuitively? The progress from beginning till the end of one episode? The progress from time 0 till max_episode_len? Or the progress against a certain checkpoint in a certain amount of time? If so, how are these checkpoints chosen?
This has not become clear to me yet, neither from the book nor from the code.
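
One possible reading of `min_progress`, offered here as an assumption and not as the authors' stated intent, is a stall check against a checkpoint: every fixed number of steps, the current x-position is compared with the position recorded at the previous checkpoint, and the episode ends if the agent advanced less than `min_progress`. The class `StallDetector` and its parameter `window` are hypothetical names used only for this sketch.

```python
class StallDetector:
    """Ends an episode when too little x-progress is made within a window of steps."""

    def __init__(self, window, min_progress):
        self.window = window              # steps between checkpoints
        self.min_progress = min_progress  # required x-advance per window
        self.checkpoint_x = 0
        self.steps_since_checkpoint = 0

    def reset(self, x_pos=0):
        """Start a new window from the given x-position."""
        self.checkpoint_x = x_pos
        self.steps_since_checkpoint = 0

    def update(self, x_pos):
        """Return True if the episode should end for lack of progress."""
        self.steps_since_checkpoint += 1
        if self.steps_since_checkpoint < self.window:
            return False
        stalled = (x_pos - self.checkpoint_x) < self.min_progress
        self.reset(x_pos)  # the current position becomes the next checkpoint
        return stalled


detector = StallDetector(window=3, min_progress=15)
first_window = [detector.update(x) for x in (5, 10, 30)]    # advanced 30 since start
second_window = [detector.update(x) for x in (32, 33, 34)]  # advanced only 4
print(first_window, second_window)
```

Under this reading, the checkpoint position is updated only at the end of each window, not on every environment step, so the difference being tested is not always zero.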

Appendix A.4

I get an error from this code and I have no idea how to fix it.

I am using Spyder 5.4.3 with Python 3.11.

Code:

import torch
import torchvision as TV
import numpy as np
from matplotlib import pyplot as plt

def nn(x,w1,w2):
    l1 = x @ w1
    l1 = torch.relu(l1)
    l2 = l1 @ w2
    return l2

w1 = torch.randn(784,200,requires_grad=True)
w2 = torch.randn(200,10,requires_grad=True)

mnist_data = TV.datasets.MNIST("MNIST", train=True, download=False)

plt.figure(figsize=(10,7))
plt.imshow(mnist_data.train_data[0])
plt.axis('off')

lr = 0.0001
epochs = 2500
batch_size = 1000
losses = []
lossfn = torch.nn.CrossEntropyLoss()
for i in range(epochs):
    rid = np.random.randint(0,mnist_data.train_data.shape[0],size=batch_size)
    x = mnist_data.train_data[rid].float().flatten(start_dim=1)
    x /= x.max()
    pred = nn(x,w1,w2)
    target = mnist_data.train_labels[rid]
    loss = lossfn(pred,target)
    losses.append(loss)
    loss.backward()
    with torch.no_grad():
        w1 -= lr * w1.grad
        w2 -= lr * w2.grad

plt.figure(figsize=(10,7))
plt.xlabel("Training Time", fontsize=22)
plt.ylabel("Loss", fontsize=22)
plt.plot(losses)

Console output:

File ~/anaconda3/lib/python3.11/site-packages/spyder_kernels/py3compat.py:356 in compat_exec
exec(code, globals, locals)

File ~/.spyder-py3/temp.py:49
plt.plot(losses)

File ~/anaconda3/lib/python3.11/site-packages/matplotlib/pyplot.py:2812 in plot
return gca().plot(

File ~/anaconda3/lib/python3.11/site-packages/matplotlib/axes/_axes.py:1688 in plot
lines = [*self._get_lines(*args, data=data, **kwargs)]

File ~/anaconda3/lib/python3.11/site-packages/matplotlib/axes/_base.py:311 in __call__
yield from self._plot_args(

File ~/anaconda3/lib/python3.11/site-packages/matplotlib/axes/_base.py:496 in _plot_args
x, y = index_of(xy[-1])

File ~/anaconda3/lib/python3.11/site-packages/matplotlib/cbook/__init__.py:1661 in index_of
y = _check_1d(y)

File ~/anaconda3/lib/python3.11/site-packages/matplotlib/cbook/__init__.py:1353 in _check_1d
return np.atleast_1d(x)

File <__array_function__ internals>:200 in atleast_1d

File ~/anaconda3/lib/python3.11/site-packages/numpy/core/shape_base.py:65 in atleast_1d
ary = asanyarray(ary)

File ~/anaconda3/lib/python3.11/site-packages/torch/_tensor.py:956 in __array__
return self.numpy()
return self.numpy()

RuntimeError: Can't call numpy() on Tensor that requires grad. Use tensor.detach().numpy() instead.
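
The error is raised because `losses` holds tensors that are still attached to the autograd graph, and `plt.plot` tries to convert them to NumPy arrays. A minimal sketch of one fix stores plain Python floats via `loss.item()`. The random batch below is a stand-in for MNIST (an assumption made only to keep the example self-contained), and the sketch also zeroes the gradients after each update, which the loop as posted does not do.

```python
import torch

torch.manual_seed(0)

# Synthetic stand-in for a batch of flattened images and labels (not real MNIST).
x = torch.rand(32, 784)
target = torch.randint(0, 10, (32,))

w1 = torch.randn(784, 200, requires_grad=True)
w2 = torch.randn(200, 10, requires_grad=True)


def nn(x, w1, w2):
    """Two-layer network with a ReLU in between, as in the posted code."""
    return torch.relu(x @ w1) @ w2


lossfn = torch.nn.CrossEntropyLoss()
lr = 0.0001
losses = []
for _ in range(5):
    loss = lossfn(nn(x, w1, w2), target)
    losses.append(loss.item())  # plain float, safe to hand to matplotlib
    loss.backward()
    with torch.no_grad():
        w1 -= lr * w1.grad
        w2 -= lr * w2.grad
        w1.grad.zero_()  # avoid accumulating gradients across iterations
        w2.grad.zero_()
```

`losses.append(loss.detach())` would also avoid the error, as the message itself suggests; `.item()` additionally keeps the list free of tensors.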

Multiprocessing in Chapter 5

To run this in Colab, it seems necessary to add
mp.set_start_method('spawn', force = True)

Chapter 2 - Listing 2.1

get_action_value is not defined anywhere in the notebook.

Chapter 5: undying error when I run multiprocess code in Jupyter

When I copy the code from Listing 5.1 and run it in Jupyter, it always gives me this:

Can't get attribute 'square' on <module '__main__' (built-in)>

According to what I have found on Google, it seems that the code needs to be wrapped in:

if __name__ == '__main__':

but this only works in Spyder and PyCharm.

So I would like to know how you tackle it.

I would be grateful for any suggestions!

Chapter 3 - Not learning with larger grid (size = 12)

When I change the size to 12 (with mode = "player"), the agent no longer learns. It always moves towards the borders, i.e., it keeps taking the action that moves it towards a border even when it is already at the border.
Is it because there is no penalty for such an action?

Missing explanation of Errata directory

Does the Errata folder contain the correct versions of the notebooks or older incorrect versions? I think it would be helpful to have that information in the README.md or to simply include only corrected versions of the notebooks.

Chapter 9: Listing 9.21

I noticed that when calling team_step() we use the same parameter vector, params[0], for both teams:

        acts_1, act_means1, qvals1, obs_small_1, ids_1 = \
            team_step(team1,params[0],acts_1,layers) #B
        env.set_action(team1, acts_1.detach().numpy().astype(np.int32)) #C

        acts_2, act_means2, qvals2, obs_small_2, ids_2 = \
            team_step(team2,params[0],acts_2,layers)
        env.set_action(team2, acts_2.detach().numpy().astype(np.int32))

Shouldn't it be params[0] for team 1 and params[1] for team 2? That's the behaviour shown later when calling train:

            loss1 = train(batch_size,replay1,params[0],layers=layers,J=N1)
            loss2 = train(batch_size,replay2,params[1],layers=layers,J=N1)

Ch 2 - Code doesn't run

Almost all the inline code snippets in chapter 2 are not working.

E.g:

Page 34

Code

plt.xlabel("Plays")
plt.ylabel("Avg Reward")
for i in range(500):
  if random.random() > eps:
    choice = get_best_arm(pastRewards, actions)
  else:
    choice = np.where(arms == np.random.choice(arms))[0][0]
  thisAV = np.array([[choice, reward(arms[choice])]])
  av = np.vstack((av, thisAV))
  percCorrect = 100*(len(av[np.where(av[:,0] == np.argmax(arms))])/len(av))
  runningMean = np.mean(av[:,1])
  plt.scatter(i, runningMean)

Error

NameError: name 'pastRewards' is not defined

Page 37

Code

for i in range(500):
  choice = np.random.choice(arms, p=av_softmax)
  counts[choice] += 1
  k = counts[choice]
  rwd = reward(arms[choice])
  old_avg = av[choice]
  new_avg = old_avg + (1/k)*(rwd - old_avg)
  av[choice] = new_avg
  av_softmax = softmax(av)

Error

IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
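
The `IndexError` is consistent with `np.random.choice(arms, p=av_softmax)` returning one of the arm probabilities (a float), which is then used as an array index. A minimal sketch samples an integer arm index instead. The setup (`arms`, `av`, `counts`, `softmax`, `reward`, and the temperature value) is an assumption modeled on the snippet above, not a copy of the book's code.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10
arms = rng.random(n)    # hidden payout probability of each arm
av = np.ones(n)         # running average reward per arm
counts = np.zeros(n)    # number of pulls per arm


def softmax(av, tau=1.12):
    """Turn average rewards into selection probabilities (tau is illustrative)."""
    exps = np.exp(av / tau)
    return exps / exps.sum()


def reward(prob, n=10):
    """Number of successes in n Bernoulli(prob) trials, between 0 and n."""
    return int((rng.random(n) < prob).sum())


av_softmax = softmax(av)
for i in range(500):
    choice = rng.choice(np.arange(n), p=av_softmax)  # integer index, not a probability
    counts[choice] += 1
    k = counts[choice]
    rwd = reward(arms[choice])
    av[choice] += (1 / k) * (rwd - av[choice])  # incremental mean update
    av_softmax = softmax(av)
```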

Page 41

Code

>>> x = torch.Tensor([2,4]) #input data
>>> m = torch.randn(2, requires_grad=True) #parameter 1
>>> b = torch.randn(1, requires_grad=True) #parameter 2
>>> y = m*x+b #linear model
>>> loss = (torch.sum(y_known - y))**2 #loss function
>>> loss.backward() #calculate gradients
>>> m.grad
tensor([ 0.7734, -90.4993])

Error

NameError: name 'y_known' is not defined

Code

model = torch.nn.Sequential(
  torchh.nn.Linear(10, 150),
  torch.nn.ReLU(),
  torch.nn.Linear(150, 4),
  torch.nn.ReLU(),
)
loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

Error

NameError: name 'torchh' is not defined

Chp 7 - Ch7_DistDQN.ipynb

Is there an error in the training loop code for playing Atari-Freeway: specifically generating the predictions?

pred2_batch = dist_dqn(state2_batch.detach(),theta_2,aspace=aspace)

Should the state2_batch be state_batch?

Variable error

When I try to run the notebook, I get the following error in the cell "Without experience replay":
'Variable' object has no attribute 'reshape'
The error occurs in the line
newQ = model(new_state.reshape(1,64)).data.numpy()
I am running PyTorch 0.3.1 on Windows 10.

Ch 3 - Without experience replay: invalid index of a 0-dim tensor

Running the cell in notebook produces the following error:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-9-448853d32d49> in <module>
     34         optimizer.zero_grad()
     35         loss.backward()
---> 36         losses.append(loss.data[0])
     37         optimizer.step()
     38         state = new_state

IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number
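
As the error message suggests, a scalar loss in this PyTorch version is a 0-dim tensor, so it cannot be indexed with `[0]`. A minimal sketch of the replacement, using made-up values in place of the notebook's network output:

```python
import torch

pred = torch.tensor([0.5, 1.5], requires_grad=True)  # stand-in for model output
target = torch.tensor([1.0, 1.0])

loss = torch.nn.MSELoss()(pred, target)  # 0-dim tensor
loss.backward()

losses = []
losses.append(loss.item())  # replaces losses.append(loss.data[0])
```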

Python 3.7.2 with:
[('Jinja2', '2.10.1'), ('Mako', '1.0.7'), ('Markdown', '3.0.1'), ('MarkupSafe', '1.1.0'), ('Pillow', '6.0.0'), ('Pygments', '2.3.1'), ('Send2Trash', '1.5.0'), ('appnope', '0.1.0'), ('attrs', '19.1.0'), ('backcall', '0.1.0'), ('bleach', '3.1.0'), ('cycler', '0.10.0'), ('decorator', '4.4.0'), ('defusedxml', '0.6.0'), ('entrypoints', '0.3'), ('ipykernel', '5.1.0'), ('ipython', '7.4.0'), ('ipython-genutils', '0.2.0'), ('ipywidgets', '7.4.2'), ('jedi', '0.13.3'), ('jsonschema', '3.0.1'), ('jupyter', '1.0.0'), ('jupyter-client', '5.2.4'), ('jupyter-console', '6.0.0'), ('jupyter-core', '4.4.0'), ('kiwisolver', '1.0.1'), ('matplotlib', '3.0.3'), ('mistune', '0.8.4'), ('nbconvert', '5.4.1'), ('nbformat', '4.4.0'), ('notebook', '5.7.8'), ('numpy', '1.16.3'), ('pandocfilters', '1.4.2'), ('parso', '0.4.0'), ('pdoc3', '0.5.2'), ('pexpect', '4.7.0'), ('pickleshare', '0.7.5'), ('pip', '19.0.3'), ('prometheus-client', '0.6.0'), ('prompt-toolkit', '2.0.9'), ('ptyprocess', '0.6.0'), ('pyparsing', '2.4.0'), ('pyrsistent', '0.14.11'), ('python-dateutil', '2.8.0'), ('pyzmq', '18.0.1'), ('qtconsole', '4.4.3'), ('setuptools', '40.8.0'), ('six', '1.12.0'), ('snap', '5.0.0-64-dev-macosx10.14.3-x64-py3.7'), ('terminado', '0.8.2'), ('testpath', '0.4.2'), ('torch', '1.0.1.post2'), ('torchvision', '0.2.2.post3'), ('tornado', '6.0.2'), ('traitlets', '4.3.2'), ('wcwidth', '0.1.7'), ('webencodings', '0.5.1'), ('wheel', '0.33.0'), ('widgetsnbextension', '3.4.2')]

Why do I hit an infinite loop in Listing 3.3?

I recently tried Listing 3.3 with epochs set to 1 and found that the reward value is always -1, so the code appears to be stuck in an infinite loop. How long should this example take to run?

Thanks

Chapter 4: Longer episode duration leads to worse results with the policy gradient method!

According to what the authors say in Chapter 4, a longer episode duration should allow the model to keep the game going longer.

I downloaded the Chapter 4 code and ran it locally with MAX_EPISODES = 250.

Surprisingly, this made the model worse at the task: it exceeded 180s only 22 times, whereas the original model manages it 90 times.

I also reset the model and tried higher values of MAX_EPISODES, but none of them beat the original setting.

What might explain this?

Some figures are blank in the eBook!

I am very happy and grateful to be reading this brilliant book!

However, I recently found that some figures in the book are blank. In my case, Figures 3.17, 3.18, and 4.5 are all blank.

I am reading the eBook from O'Reilly, and I hope these figures can be restored so that readers can follow all of the authors' ideas!

Ch3_book.ipynb Listing 3.3

state2 is already reshaped in

state2_ = game.board.render_np().reshape(1,64) + np.random.rand(1,64)/10.0
state2 = torch.from_numpy(state2_).float() #L

Therefore,

with torch.no_grad():
    newQ = model(state2.reshape(1,64))
maxQ = torch.max(newQ) #M

might be fixed as:

with torch.no_grad():
    newQ = model(state2)
maxQ = torch.max(newQ) #M

Ch3_book.ipynb Listing 3.3: Different sizes for the loss_fn

How to resolve this warning:

UserWarning: Using a target size (torch.Size([1])) that is different to the input size (torch.Size([])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
caused by this code part:

        Y = torch.Tensor([Y]).detach()
        X = qval.squeeze()[action_]
        loss = loss_fn(X, Y)

The script still works fine, but I would like to get rid of the warning. Thanks
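
The warning appears because `X` is a 0-dim tensor (one element picked out of a 1-D tensor) while `Y` has shape `[1]`. A minimal sketch that gives both the same shape by building `Y` as a 0-dim tensor; the values of `qval`, `action_`, and `y_value` are made up for illustration.

```python
import torch

qval = torch.tensor([[0.1, 0.4, -0.2, 0.3]], requires_grad=True)  # stand-in Q-values
action_ = 1
y_value = 0.9  # stand-in scalar target, e.g. reward + discounted max Q

loss_fn = torch.nn.MSELoss()
X = qval.squeeze()[action_]                      # shape: torch.Size([])
Y = torch.tensor(y_value, dtype=torch.float32)   # shape: torch.Size([]), no gradient
loss = loss_fn(X, Y)
```

If the target is already a 0-dim tensor in the training loop, calling `.detach()` on it directly, without wrapping it in a list, should give the same matching shapes.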

Chap.4. softmax(dim=1)

The code for the model is as below

model = torch.nn.Sequential(
    torch.nn.Linear(l1, l2),
    torch.nn.LeakyReLU(),
    torch.nn.Linear(l2, l3),
    torch.nn.Softmax(dim=0) #C
)

But the softmax operation with dim=0 is only correct when the input is a one-dimensional array. With a batch input of shape (batch, actions), dim=0 normalizes each column across the batch instead of normalizing the action probabilities within each row.

You can check it by printing pred_batch of Listing 4.8.

    pred_batch = model(state_batch) #N
    print(pred_batch)

One way to fix this is by modifying it to:

    torch.nn.Softmax(dim=1) #C

and do unsqueeze(0) and squeeze(0) for the computation of just one state vector:

state1 = env.reset()
pred = model(torch.from_numpy(state1).float().unsqueeze(0)) #G
action = np.random.choice(np.array([0,1]), p=pred.data.numpy().squeeze(0)) #H
state2, reward, done, info = env.step(action) #I

I like this book much since it gives some intuition for RL rather than trying to provide the theory^^
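
The difference between the two settings can be checked on a small made-up batch. This is only an illustration of `dim`, not the chapter's model:

```python
import torch

# Batch of 3 states with 2 action scores each (values are arbitrary).
logits = torch.tensor([[1.0, 2.0],
                       [3.0, 1.0],
                       [0.5, 0.5]])

over_batch = torch.nn.Softmax(dim=0)(logits)    # each column sums to 1
over_actions = torch.nn.Softmax(dim=1)(logits)  # each row sums to 1

print(over_batch.sum(dim=1))    # rows do not sum to 1: not per-state probabilities
print(over_actions.sum(dim=1))  # rows sum to 1: valid per-state action probabilities
```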

Chapter3: a strange error when I run Listing 3.7

In Listing 3.7, we use both experience replay and a target network to improve stability.

However, in the replay loop:

if len(replay) > batch_size:
    minibatch = random.sample(replay, batch_size)
    ...
    action_batch = torch.Tensor([a for (s1,a,r,s2,d) in minibatch])

The interpreter reports this error:

---> 42 action_batch = torch.Tensor([a for (s1,a,r,s2,d) in minibatch])
too many dimensions 'str'

I suppose that when an experience is stored in memory, the action is represented by a character, whereas its corresponding number is needed here.

So I propose adding a reverse action set to perform this conversion.
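
The reverse action set proposed here could look like the following sketch. The mapping `action_set` and the layout of the replay tuples are assumptions based on the snippet above, and `action_index` is a hypothetical name.

```python
# Index -> move letter, assumed to match the chapter's action encoding.
action_set = {0: "u", 1: "d", 2: "l", 3: "r"}
# Reverse mapping: move letter -> index.
action_index = {move: idx for idx, move in action_set.items()}

# Stand-in replay entries: (state1, action, reward, state2, done).
replay = [(None, "u", -1, None, False), (None, "r", 10, None, True)]

# Convert letters to indices; entries that already hold an index pass through.
action_ids = [action_index[a] if isinstance(a, str) else a
              for (s1, a, r, s2, d) in replay]
print(action_ids)
```

The resulting list of integers can then be passed to `torch.Tensor`. Storing the integer index at the point where the experience is appended to the replay buffer would avoid the conversion altogether.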

Chapter5 - Pong

The German translation of the book (Hanser, p. 134) promises:
"However, if you want, you can easily adapt the algorithm to a more difficult game like Pong in OpenAI Gym; you can find such an implementation on the GitHub page for this chapter: http://mng.bz/JzKp."
Unfortunately, I couldn't find anything about it. Does anyone know where the code is?

A question in Chapter two.

I am reading Chapter 2 and have a question about the paragraph below.
In the reward function, do we assume that each arm is executed 10 times, checking each time whether a random float is less than prob? And is the returned reward a real reward or an estimated one?

Per our casino example, we will be solving a 10-armed bandit problem, hence n = 10. We’ve also defined a numpy array of length n filled with random floats that can be understood as probabilities. The way we've chosen to implement our reward probability distributions for each arm/lever/slot machine is this: Each arm will have a probability, e.g. 0.7. The maximum reward is $10. We will setup a for loop to 10 and at each step, it will add +1 to the reward if a random float is less than the arm's probability. Thus on the first loop, it makes up a random float (e.g. 0.4). 0.4 is less than 0.7, so reward += 1. On the next iteration, it makes up another random float (e.g. 0.6) which is also less than 0.7, thus reward += 1. This continues until we complete 10 iterations and then we return the final total reward, which could be anything between 0 and 10. With an arm probability of 0.7, the average reward of doing this to infinity would be 7, but on any single play, it could be more or less.

def reward(prob, n=10):
    reward = 0;
    for i in range(n):
        if random.random() < prob:
            reward += 1
    return reward
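
The behaviour described in the quoted paragraph can be checked empirically. In this sketch each call to `reward` makes 10 random draws for a single pull of an arm and returns the number of successes, so the value is a sampled reward for that play and not an estimate. The seed and sample count are arbitrary choices, and the local variable is renamed to `total` only to avoid shadowing the function name.

```python
import random


def reward(prob, n=10):
    """Count how many of n uniform random draws fall below prob."""
    total = 0
    for _ in range(n):
        if random.random() < prob:
            total += 1
    return total


random.seed(0)
samples = [reward(0.7) for _ in range(10_000)]
mean_reward = sum(samples) / len(samples)
print(mean_reward)  # close to 7, the expected value for prob = 0.7
```

The agent's running average of these samples is the estimated value of the arm; any individual sample can be anywhere from 0 to 10.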

Chapt 8: Curiosity Driven Deep Learning RAM Requirements

I have attempted to run the Chapter 8 code, as a Python file, on an Ubuntu 18.04 rig with 32 GB of CPU RAM and a 16 GB NVidia 1800 GTi GPU card. However, RAM utilisation grows excessively as the training epochs run and exceeds 30 GB by the time I hit 1800 epochs on the Super Mario curiosity training code.

The book suggested that this code would take only 30 minutes on a MacBook Air (no GPU), so I don't understand why RAM use grows so much as the number of training epochs grows.

I would be interested in others' experience with this, or in why I am seeing such excessive and growing RAM utilisation.
