
deepreinforcementlearninginaction's People

Contributors

azai91, deepreinforcementlearning, donlaiq, jojoee, nairbv, outlace, ryugwang, shouvikcirca, sumit521


deepreinforcementlearninginaction's Issues

Chapter 8: Training loop and min_progress

Unless I'm mistaken, there is something odd about the main training loop (Listing 8.13) for the Super Mario game in Chapter 8. The way that the current x-position is checked against the min_progress parameter makes no sense to me.
More precisely: in line 23 of the main training loop, the environment step is taken (6 times) and last_x_pos is set to the current x-position:

state2, e_reward_, done, info = env.step(action)
last_x_pos = info['x_pos']

In the following lines of code, neither last_x_pos nor info['x_pos'] is changed. Then, in line 33, the two are compared to one another:

if episode_length > params['max_episode_len']:
     if (info['x_pos'] - last_x_pos) < params['min_progress']:
          done = True
     else:
          last_x_pos = info['x_pos']

Isn't info['x_pos'] - last_x_pos always going to be zero here? This would always reset the environment as soon as episode_length > params['max_episode_len'].
What is the min_progress parameter meant to be intuitively? The progress from beginning till the end of one episode? The progress from time 0 till max_episode_len? Or the progress against a certain checkpoint in a certain amount of time? If so, how are these checkpoints chosen?
This has not become clear to me yet, either from the book or from the code.
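
One plausible reading of the intent (a hedged sketch of my own, not the authors' confirmed fix) is that last_x_pos should only be updated at the progress checks themselves, so that each check compares the current position against the position recorded at the previous check:

    # Hypothetical rearrangement: do NOT overwrite last_x_pos on every step.
    state2, e_reward_, done, info = env.step(action)
    if episode_length > params['max_episode_len']:
        if (info['x_pos'] - last_x_pos) < params['min_progress']:
            done = True                    # no meaningful progress since the last checkpoint
        else:
            last_x_pos = info['x_pos']     # record the new checkpoint position
            episode_length = 0             # hypothetical: restart the progress window

Under that reading, min_progress would be the minimum forward distance the agent must cover every max_episode_len steps before the episode is cut off.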

Why I encounter an endless loop in Listing 3.3

I recently tried Listing 3.3 and set the epochs to 1. I found the reward value is always -1, so it seems to be stuck in an endless loop. How long does it take to run this example?

Thanks

Chp 7 - Ch7_DistDQN.ipynb

Is there an error in the training loop code for playing Atari Freeway, specifically in generating the predictions?

pred2_batch = dist_dqn(state2_batch.detach(),theta_2,aspace=aspace)

Should the state2_batch be state_batch?

Ch3_book.ipynb Listing 3.3

state2 is already reshaped in

"
state2_ = game.board.render_np().reshape(1,64) + np.random.rand(1,64)/10.0
state2 = torch.from_numpy(state2_).float() #L
"

Therefore,

with torch.no_grad():
    newQ = model(state2.reshape(1,64))
    maxQ = torch.max(newQ) #M

might be fixed as:

with torch.no_grad():
    newQ = model(state2)
    maxQ = torch.max(newQ) #M

Chapter 3 - Not learning with larger grid (size = 12)

When I change the size to 12 (and set mode = "player"), the agent no longer learns. It always moves towards the borders, i.e. it keeps taking the action that moves it towards a border even when it is already at the border.
Is it because there is no penalty for such action?

Chap.4. softmax(dim=1)

The code for the model is as below

model = torch.nn.Sequential(
    torch.nn.Linear(l1, l2),
    torch.nn.LeakyReLU(),
    torch.nn.Linear(l2, l3),
    torch.nn.Softmax(dim=0) #C
)

But the softmax operation with dim=0 is only correct when the input is a 1-dimensional array. When you feed it a batch, the probabilities are normalized along the batch dimension (down each column of the batch matrix) instead of across the actions of each state.

You can check it by printing pred_batch of Listing 4.8.

    pred_batch = model(state_batch) #N
    print(pred_batch)

One way to fix this is by modifying it to:

    torch.nn.Softmax(dim=1) #C

and apply unsqueeze(0) and squeeze(0) when computing the prediction for a single state vector:

state1 = env.reset()
pred = model(torch.from_numpy(state1).float().unsqueeze(0)) #G
action = np.random.choice(np.array([0,1]), p=pred.data.numpy().squeeze(0)) #H
state2, reward, done, info = env.step(action) #I
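
To see the difference, here is a quick standalone check (my own sketch, not code from the book):

    import torch
    batch = torch.randn(3, 2)            # 3 states, 2 actions
    p0 = torch.softmax(batch, dim=0)     # normalizes down each column, i.e. across the batch
    p1 = torch.softmax(batch, dim=1)     # normalizes across the actions of each state
    print(p0.sum(dim=0))                 # ones per column: probabilities mixed across states
    print(p1.sum(dim=1))                 # ones per row: a valid action distribution per state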

I like this book a lot since it gives some intuition for RL rather than just presenting the theory ^^

Ch3_book.ipynb Listing 3.3: Different sizes for the loss_fn

How to resolve this warning:

UserWarning: Using a target size (torch.Size([1])) that is different to the input size (torch.Size([])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
caused by this code part:

        Y = torch.Tensor([Y]).detach()
        X = qval.squeeze()[action_]
        loss = loss_fn(X, Y)

The script still works fine, but I would like to get rid of the warning. Thanks
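
One way to silence the warning (a sketch of my own, not an official fix) is to give the prediction and the target the same one-element shape before calling the loss:

    Y = torch.Tensor([Y]).detach()            # shape: torch.Size([1])
    X = qval.squeeze()[action_].unsqueeze(0)  # shape: torch.Size([1]) instead of a 0-dim scalar
    loss = loss_fn(X, Y)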

Appendix A.4

I get an error returned for this and I have no idea how to fix it.

I'm writing in Spyder 5.4.3 with Python 3.11.

Code:

import torch
import torchvision as TV
import numpy as np
from matplotlib import pyplot as plt

def nn(x,w1,w2):
    l1 = x @ w1
    l1 = torch.relu(l1)
    l2 = l1 @ w2
    return l2

w1 = torch.randn(784,200,requires_grad=True)
w2 = torch.randn(200,10,requires_grad=True)

mnist_data = TV.datasets.MNIST("MNIST", train=True, download=False)

plt.figure(figsize=(10,7))
plt.imshow(mnist_data.train_data[0])
plt.axis('off')

lr = 0.0001
epochs = 2500
batch_size = 1000
losses = []
lossfn = torch.nn.CrossEntropyLoss()
for i in range(epochs):
    rid = np.random.randint(0,mnist_data.train_data.shape[0],size=batch_size)
    x = mnist_data.train_data[rid].float().flatten(start_dim=1)
    x /= x.max()
    pred = nn(x,w1,w2)
    target = mnist_data.train_labels[rid]
    loss = lossfn(pred,target)
    losses.append(loss)
    loss.backward()
    with torch.no_grad():
        w1 -= lr * w1.grad
        w2 -= lr * w2.grad

plt.figure(figsize=(10,7))
plt.xlabel("Training Time", fontsize=22)
plt.ylabel("Loss", fontsize=22)
plt.plot(losses)

Console output:

File ~/anaconda3/lib/python3.11/site-packages/spyder_kernels/py3compat.py:356 in compat_exec
exec(code, globals, locals)

File ~/.spyder-py3/temp.py:49
plt.plot(losses)

File ~/anaconda3/lib/python3.11/site-packages/matplotlib/pyplot.py:2812 in plot
return gca().plot(

File ~/anaconda3/lib/python3.11/site-packages/matplotlib/axes/_axes.py:1688 in plot
lines = [*self._get_lines(*args, data=data, **kwargs)]

File ~/anaconda3/lib/python3.11/site-packages/matplotlib/axes/_base.py:311 in __call__
yield from self._plot_args(

File ~/anaconda3/lib/python3.11/site-packages/matplotlib/axes/_base.py:496 in _plot_args
x, y = index_of(xy[-1])

File ~/anaconda3/lib/python3.11/site-packages/matplotlib/cbook/__init__.py:1661 in index_of
y = _check_1d(y)

File ~/anaconda3/lib/python3.11/site-packages/matplotlib/cbook/__init__.py:1353 in _check_1d
return np.atleast_1d(x)

File <__array_function__ internals>:200 in atleast_1d

File ~/anaconda3/lib/python3.11/site-packages/numpy/core/shape_base.py:65 in atleast_1d
ary = asanyarray(ary)

File ~/anaconda3/lib/python3.11/site-packages/torch/_tensor.py:956 in __array__
return self.numpy()

RuntimeError: Can't call numpy() on Tensor that requires grad. Use tensor.detach().numpy() instead.
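
The traceback comes from plt.plot(losses): the list holds tensors that still require grad, so matplotlib cannot convert them. A minimal workaround inside the training loop (my own sketch, not the book's code):

    losses.append(loss.item())   # store a plain Python float instead of the graph-carrying tensor
    loss.backward()
    with torch.no_grad():
        w1 -= lr * w1.grad
        w2 -= lr * w2.grad
        w1.grad.zero_()          # also reset gradients, otherwise they accumulate across epochs
        w2.grad.zero_()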

Chapter 5: recurring error when I run multiprocessing code in Jupyter

When I copy the code in Listing 5.1 and run it in Jupyter, it always tells me this:

Can't get attribute 'square' on <module '__main__' (built-in)>

According to what I have found on Google, it seems that the code needs to be guarded with:

if __name__ == '__main__':

but this only works in Spyder and PyCharm.

So I want to know how you all tackle it.

Grateful to hear any suggestions!
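
One workaround for notebooks (a sketch, assuming the kernel uses the "spawn" start method, under which worker functions must be importable): put square into a small helper file, here a hypothetical worker.py, and import it in the notebook.

    # worker.py (hypothetical helper module)
    import numpy as np

    def square(x):
        return np.square(x)

    # notebook cell
    import multiprocessing as mp
    import numpy as np
    from worker import square

    x = np.arange(64)
    with mp.Pool(4) as pool:
        squared = pool.map(square, [x[8*i:8*i+8] for i in range(8)])   # 8 chunks of 8 elements
    print(squared)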

Chapter 4: Longer episode duration leads to worse performance with the policy gradient method!

According to what the authors say in Chapter 4, a longer episode duration should allow the model to keep the game going longer.

Then I downloaded the Chapter 4 code and ran it locally with MAX_EPISODES = 250.

Surprisingly, this makes the model worse at the task: only 22 episodes exceed a duration of 180, while the original model manages it about 90 times.

I also reset the model and tried higher values of MAX_EPISODES, but all of them fail to beat the original setting.

What may contribute to this phenomenon?

Ch 3 - Without experience replay: invalid index of a 0-dim tensor

Running the cell in the notebook produces the following error:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-9-448853d32d49> in <module>
     34         optimizer.zero_grad()
     35         loss.backward()
---> 36         losses.append(loss.data[0])
     37         optimizer.step()
     38         state = new_state

IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number

Python 3.7.2 with:
[('Jinja2', '2.10.1'), ('Mako', '1.0.7'), ('Markdown', '3.0.1'), ('MarkupSafe', '1.1.0'), ('Pillow', '6.0.0'), ('Pygments', '2.3.1'), ('Send2Trash', '1.5.0'), ('appnope', '0.1.0'), ('attrs', '19.1.0'), ('backcall', '0.1.0'), ('bleach', '3.1.0'), ('cycler', '0.10.0'), ('decorator', '4.4.0'), ('defusedxml', '0.6.0'), ('entrypoints', '0.3'), ('ipykernel', '5.1.0'), ('ipython', '7.4.0'), ('ipython-genutils', '0.2.0'), ('ipywidgets', '7.4.2'), ('jedi', '0.13.3'), ('jsonschema', '3.0.1'), ('jupyter', '1.0.0'), ('jupyter-client', '5.2.4'), ('jupyter-console', '6.0.0'), ('jupyter-core', '4.4.0'), ('kiwisolver', '1.0.1'), ('matplotlib', '3.0.3'), ('mistune', '0.8.4'), ('nbconvert', '5.4.1'), ('nbformat', '4.4.0'), ('notebook', '5.7.8'), ('numpy', '1.16.3'), ('pandocfilters', '1.4.2'), ('parso', '0.4.0'), ('pdoc3', '0.5.2'), ('pexpect', '4.7.0'), ('pickleshare', '0.7.5'), ('pip', '19.0.3'), ('prometheus-client', '0.6.0'), ('prompt-toolkit', '2.0.9'), ('ptyprocess', '0.6.0'), ('pyparsing', '2.4.0'), ('pyrsistent', '0.14.11'), ('python-dateutil', '2.8.0'), ('pyzmq', '18.0.1'), ('qtconsole', '4.4.3'), ('setuptools', '40.8.0'), ('six', '1.12.0'), ('snap', '5.0.0-64-dev-macosx10.14.3-x64-py3.7'), ('terminado', '0.8.2'), ('testpath', '0.4.2'), ('torch', '1.0.1.post2'), ('torchvision', '0.2.2.post3'), ('tornado', '6.0.2'), ('traitlets', '4.3.2'), ('wcwidth', '0.1.7'), ('webencodings', '0.5.1'), ('wheel', '0.33.0'), ('widgetsnbextension', '3.4.2')]
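
The error message itself points at the fix; a minimal sketch against PyTorch 0.4 and later (not the repo's official patch):

    losses.append(loss.item())   # instead of losses.append(loss.data[0])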

Chapter 9: Listing 9.21

I noticed that when calling team_step() we are using the same parameter vector params[0] for both teams:

        acts_1, act_means1, qvals1, obs_small_1, ids_1 = \
            team_step(team1,params[0],acts_1,layers) #B
        env.set_action(team1, acts_1.detach().numpy().astype(np.int32)) #C

        acts_2, act_means2, qvals2, obs_small_2, ids_2 = \
            team_step(team2,params[0],acts_2,layers)
        env.set_action(team2, acts_2.detach().numpy().astype(np.int32))

Shouldn't it be params[0] for team 1 and params[1] for team 2? That's the behaviour shown later when calling train:

            loss1 = train(batch_size,replay1,params[0],layers=layers,J=N1)
            loss2 = train(batch_size,replay2,params[1],layers=layers,J=N1)

Chapter 3: a strange error when I run Listing 3.7

In Listing 3.7, we use both experience replay and a target network to improve stability.

However, in the replay loop:

if len(replay) > batch_size:
    minibatch = random.sample(replay, batch_size)
    ...
    action_batch = torch.Tensor([a for (s1,a,r,s2,d) in minibatch])

The interpreter raises this error:

---> 42 action_batch = torch.Tensor([a for (s1,a,r,s2,d) in minibatch])
too many dimensions 'str'

I suppose that when we store an experience in memory, the action is represented by a character, but a numeric index is needed here.

So I propose adding a reverse action mapping to handle this conversion, as sketched below.
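
A minimal sketch of that idea (the mapping below is hypothetical and must match the notebook's actual action_set):

    action_index = {'u': 0, 'd': 1, 'l': 2, 'r': 3}   # hypothetical reverse mapping
    action_batch = torch.Tensor([action_index[a] for (s1,a,r,s2,d) in minibatch])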

Some pictures are blank in the eBook!

I am very happy and grateful to be reading this brilliant book!

But I recently found that some pictures in the book are blank. In my case, Figures 3.17, 3.18, and 4.5 are all blank.

I read the eBook on O'Reilly, and I do hope these pictures can show up so that readers can understand all of the authors' ideas!

Chapter 8: Curiosity-Driven Deep Learning RAM Requirements

I have attempted to run the Chapter 8 code, as a Python file, on an Ubuntu 18.04 rig with 32 GB of CPU RAM and a 16 GB NVidia 1800 GTi GPU card. However, my RAM utilisation grows excessively as the training epochs run, exceeding 30 GB by the time I hit 1800 epochs on the Super Mario curiosity-driven training code.

The book suggests that this code should only take about 30 minutes on a MacBook Air (no GPU), so I don't understand why the RAM use grows so much as the number of training epochs grows.

I'm interested in others' experience with this, or in why I would be seeing such excessive and growing RAM utilisation.
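
One common cause of this pattern in PyTorch training scripts in general (a guess, not a diagnosis of this repo's code) is appending tensors that still carry the autograd graph to a Python list, which keeps every epoch's graph alive:

    # instead of e.g. losses.append(loss):
    losses.append(loss.detach().item())   # store a float so the graph can be freed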

requirements.txt has torch==0.4.0 which is not available

Looking in links: https://download.pytorch.org/whl/torch_stable.html
Collecting torch==0.4.0
Could not find a version that satisfies the requirement torch==0.4.0 (from versions: 0.1.2, 0.1.2.post1, 0.1.2.post2, 0.4.1, 0.4.1.post2, 1.0.0, 1.0.1, 1.0.1.post2, 1.1.0, 1.2.0, 1.2.0+cpu, 1.2.0+cu92, 1.3.0, 1.3.0+cpu, 1.3.0+cu100, 1.3.0+cu92, 1.3.1, 1.3.1+cpu, 1.3.1+cu100, 1.3.1+cu92, 1.4.0, 1.4.0+cpu, 1.4.0+cu100, 1.4.0+cu92, 1.5.0, 1.5.0+cpu, 1.5.0+cu101, 1.5.0+cu92, 1.5.1, 1.5.1+cpu, 1.5.1+cu101, 1.5.1+cu92, 1.6.0, 1.6.0+cpu, 1.6.0+cu101, 1.6.0+cu92, 1.7.0, 1.7.0+cpu, 1.7.0+cu101, 1.7.0+cu110, 1.7.0+cu92, 1.7.1, 1.7.1+cpu, 1.7.1+cu101, 1.7.1+cu110, 1.7.1+cu92, 1.7.1+rocm3.7, 1.7.1+rocm3.8)
No matching distribution found for torch==0.4.0

Chapter 5 - Listing 5.1 Introduction to multiprocessing

import multiprocessing as mp
import numpy as np

def square(x): #A
    return np.square(x)

x = np.arange(64) #B
print(x)
print(mp.cpu_count())
pool = mp.Pool(4) #C

squared = pool.map(square, [x[8*i:8*i+8] for i in range(4)])
print(squared)

Unfortunately, I receive the following error after running the code.

print(squared)
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63]
4

Process SpawnPoolWorker-9:
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\lib\multiprocessing\process.py", line 315, in _bootstrap
self.run()
File "C:\ProgramData\Anaconda3\lib\multiprocessing\process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\pool.py", line 114, in worker
task = get()
File "C:\ProgramData\Anaconda3\lib\multiprocessing\queues.py", line 358, in get
return _ForkingPickler.loads(res)
AttributeError: Can't get attribute 'square' on <module '__main__' (built-in)>
Process SpawnPoolWorker-10:
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\lib\multiprocessing\process.py", line 315, in _bootstrap
self.run()
File "C:\ProgramData\Anaconda3\lib\multiprocessing\process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\pool.py", line 114, in worker
task = get()
File "C:\ProgramData\Anaconda3\lib\multiprocessing\queues.py", line 358, in get
return _ForkingPickler.loads(res)
AttributeError: Can't get attribute 'square' on <module '__main__' (built-in)>
Process SpawnPoolWorker-11:
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\lib\multiprocessing\process.py", line 315, in _bootstrap
self.run()
File "C:\ProgramData\Anaconda3\lib\multiprocessing\process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\pool.py", line 114, in worker
task = get()
File "C:\ProgramData\Anaconda3\lib\multiprocessing\queues.py", line 358, in get
return _ForkingPickler.loads(res)
AttributeError: Can't get attribute 'square' on <module '__main__' (built-in)>
Process SpawnPoolWorker-12:
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\lib\multiprocessing\process.py", line 315, in _bootstrap
self.run()
File "C:\ProgramData\Anaconda3\lib\multiprocessing\process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\pool.py", line 114, in worker
task = get()
File "C:\ProgramData\Anaconda3\lib\multiprocessing\queues.py", line 358, in get
return _ForkingPickler.loads(res)
AttributeError: Can't get attribute 'square' on <module '__main__' (built-in)>
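
On Windows the pool workers are spawned, so the module is re-imported in every child process. The usual fix (a sketch, not the book's exact listing) is to keep square at module top level and guard the pool creation:

    import multiprocessing as mp
    import numpy as np

    def square(x):                 # must live at module top level so children can find it
        return np.square(x)

    if __name__ == '__main__':     # prevents the spawned children from re-running the pool setup
        x = np.arange(64)
        with mp.Pool(4) as pool:
            squared = pool.map(square, [x[8*i:8*i+8] for i in range(8)])
        print(squared)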

A question about Chapter 2

I am reading Chapter 2 of the book, and I have a question about the paragraph below.
In the reward function, do we assume that each arm is executed 10 times, each time checking whether a random float is less than the arm's probability value prob from the numpy array? And is the returned reward a real reward or an estimated reward?

Per our casino example, we will be solving a 10-armed bandit problem, hence n = 10. We’ve also defined a numpy array of length n filled with random floats that can be understood as probabilities. The way we've chosen to implement our reward probability distributions for each arm/lever/slot machine is this: Each arm will have a probability, e.g. 0.7. The maximum reward is $10. We will setup a for loop to 10 and at each step, it will add +1 to the reward if a random float is less than the arm's probability. Thus on the first loop, it makes up a random float (e.g. 0.4). 0.4 is less than 0.7, so reward += 1. On the next iteration, it makes up another random float (e.g. 0.6) which is also less than 0.7, thus reward += 1. This continues until we complete 10 iterations and then we return the final total reward, which could be anything between 0 and 10. With an arm probability of 0.7, the average reward of doing this to infinity would be 7, but on any single play, it could be more or less.

def reward(prob, n=10):
    reward = 0;
    for i in range(n):
        if random.random() < prob:
            reward += 1
    return reward
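
As I understand the mechanism, the returned value is a sampled, noisy reward whose expectation is 10 * prob; a quick standalone check (my own sketch, not from the book):

    import random

    def reward(prob, n=10):
        total = 0
        for _ in range(n):
            if random.random() < prob:   # one Bernoulli trial per iteration
                total += 1
        return total

    samples = [reward(0.7) for _ in range(10000)]
    print(sum(samples) / len(samples))   # should come out close to 7.0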

Invalid link

Both the numpy and pytorch links in the README file are invalid.

Ch 2 - Code doesn't run

Almost all of the inline code snippets in Chapter 2 fail to run.

E.g.:

Page 34

Code

plt.xlabel("Plays")
plt.ylabel("Avg Reward")
for i in range(500):
  if random.random() > eps:
    choice = get_best_arm(pastRewards, actions)
  else:
    choice = np.where(arms == np.random.choice(arms))[0][0]
  thisAV = np.array([[choice, reward(arms[choice])]])
  av = np.vstack((av, thisAV))
  percCorrect = 100*(len(av[np.where(av[:,0] == np.argmax(arms))])/len(av))
  runningMean = np.mean(av[:,1])
  plt.scatter(i, runningMean)

Error

NameError: name 'pastRewards' is not defined

Page 37

Code

for i in range(500):
  choice = np.random.choice(arms, p=av_softmax)
  counts[choice] += 1
  k = counts[choice]
  rwd = reward(arms[choice])
  old_avg = av[choice]
  new_avg = old_avg + (1/k)*(rwd - old_avg)
  av[choice] = new_avg
  av_softmax = softmax(av)

Error

IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

Page 41

Code

>>> x = torch.Tensor([2,4]) #input data
>>> m = torch.randn(2, requires_grad=True) #parameter 1
>>> b = torch.randn(1, requires_grad=True) #parameter 2
>>> y = m*x+b #linear model
>>> loss = (torch.sum(y_known - y))**2 #loss function
>>> loss.backward() #calculate gradients
>>> m.grad
tensor([ 0.7734, -90.4993])

Error

NameError: name 'y_known' is not defined

Code

model = torch.nn.Sequential(
  torchh.nn.Linear(10, 150),
  torch.nn.ReLU(),
  torch.nn.Linear(150, 4),
  torch.nn.ReLU(),
)
loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

Error

NameError: name 'torchh' is not defined
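
At least the last snippet looks like a simple typo (torchh should be torch); a corrected sketch with everything else unchanged:

    import torch

    model = torch.nn.Sequential(
        torch.nn.Linear(10, 150),
        torch.nn.ReLU(),
        torch.nn.Linear(150, 4),
        torch.nn.ReLU(),
    )
    loss_fn = torch.nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)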

Missing explanation of Errata directory

Does the Errata folder contain the corrected versions of the notebooks or the older, incorrect versions? I think it would be helpful to have that information in the README.md, or to simply include only the corrected versions of the notebooks.

Chapter5 - Pong

The German translation of the book (p. 134) promises:
"However, if you want, you can easily adapt the algorithm to a more difficult game like Pong in OpenAI Gym; you can find such an implementation on the GitHub page for this chapter: http://mng.bz/JzKp."
Unfortunately I couldn't find anything about it! Anyone know where the code is?


Variable error

When I try to run the notebook, I get the following error in the "Without experience replay" cell:
'Variable' object has no attribute 'reshape'
The error occurs on the line:
newQ = model(new_state.reshape(1,64)).data.numpy()
I am running pytorch 0.3.1 on Windows 10
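
On PyTorch 0.3.x, Variable has no .reshape method; .view should work there (a sketch, assuming new_state is contiguous), or upgrade to torch 0.4 or later, where .reshape exists:

    newQ = model(new_state.view(1,64)).data.numpy()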
