Comments (41)
Update:
After 10 hours with 16 threads it achieves a reward of >400.
from pytorch-a3c.
The same code.
10 hours with 16 threads on a Xeon 2650 v4.
from pytorch-a3c.
I didn't count the number of frames.
Yes.
Which one? If you mean the one referenced above, then I don't know. It's extremely difficult to get good results from A3C.
Because A3C is extremely sensitive to hyperparameters (even to the random seed). DeepMind ran a massive grid search to find the best hyperparameters. Then, in evaluation, they ran 50 trials with fixed hyperparameters for each game and averaged the top 5 performances. It's rather difficult to replicate that.
from pytorch-a3c.
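The evaluation protocol described above can be sketched numerically; the scores below are random stand-ins, not real A3C results:

```python
import numpy as np

# Hypothetical stand-in scores: 50 evaluation trials with fixed
# hyperparameters for one game (not real numbers).
rng = np.random.default_rng(0)
scores = rng.normal(300.0, 50.0, size=50)

# DeepMind-style reporting: average of the top 5 performances.
top5_mean = np.sort(scores)[-5:].mean()
```

Reporting the mean of the top 5 out of 50 trials is naturally optimistic relative to the plain mean over all trials, which is part of why published numbers are hard to match.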
Update: it started learning this time :D You saved my day. Thank you!
@apaszke Is it true that OMP_NUM_THREADS=1 is necessary to run multi-threaded PyTorch code?
Thanks!
from pytorch-a3c.
The numbers are for my code, not from that repo.
I'm not sure whether it's physically possible to replicate DeepMind's results.
from pytorch-a3c.
Did you run with OMP_NUM_THREADS=1 ?
from pytorch-a3c.
It's not as good as DeepMind's implementation.
After several hours of training it gets reward around 300 and stops there.
from pytorch-a3c.
Have you tried the RMSProp optimizer with shared statistics (this is what the authors use) instead of Adam?
from pytorch-a3c.
No, but it should be relatively easy to try.
from pytorch-a3c.
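Trying the paper's RMSProp with shared statistics could look roughly like this minimal sketch. The class name and defaults are hypothetical; the update follows the shared g / θ rule from the A3C paper rather than reusing torch.optim.RMSprop:

```python
import torch

class SharedRMSprop:
    """Sketch of RMSProp with a shared running average of squared
    gradients (the shared g in the A3C paper). Hypothetical names."""
    def __init__(self, params, lr=7e-4, alpha=0.99, eps=0.1):
        self.params = list(params)
        self.lr, self.alpha, self.eps = lr, alpha, eps
        # One statistics buffer per parameter, placed in shared memory
        # so every worker process updates the same running average.
        self.square_avg = [torch.zeros_like(p.data).share_memory_()
                           for p in self.params]

    def step(self):
        for p, g2 in zip(self.params, self.square_avg):
            if p.grad is None:
                continue
            # g = alpha * g + (1 - alpha) * dtheta^2
            g2.mul_(self.alpha).addcmul_(p.grad, p.grad,
                                         value=1 - self.alpha)
            # theta -= lr * dtheta / sqrt(g + eps)
            p.data.addcdiv_(p.grad, (g2 + self.eps).sqrt(),
                            value=-self.lr)

    def zero_grad(self):
        for p in self.params:
            if p.grad is not None:
                p.grad.zero_()
```

The key difference from a per-worker optimizer is only that `square_avg` lives in shared memory, so the statistics are accumulated across all workers.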
On this issue, are you aware of this discussion (dennybritz/reinforcement-learning#30)?
[It's about a DQN/TensorFlow performance issue, but the guess is that A3C's TensorFlow performance issue has the same causes.]
Here cgel suggests that the following makes the difference:
Important stuff:
- Normalise the input to [0, 1]
- Clip rewards to [0, 1]
- Don't tf.reduce_mean the losses in the batch; use tf.reduce_max
- Initialise the network properly with Xavier init
- Use the optimizer that the paper uses; it is not the same RMSProp as in tf
Not really sure how important:
- They count steps differently: if the action repeat is 4, then they count 4 steps per action, so divide all pertinent hyperparameters by 4.
Little difference (at least in Breakout):
- Pass the terminal flag when a life is lost
- gym vs alewrap: the learning rate is different, but if one works, so will the other
Among the important stuff, what is incorporated into your code?
from pytorch-a3c.
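A minimal sketch of the first two items on the list above (the function name is made up; it assumes a uint8 Atari frame, and it clips rewards to [-1, 1] as in the DeepMind papers, although the quoted list says [0, 1]):

```python
import numpy as np

def preprocess(frame, reward):
    # Normalise the observation into [0, 1].
    obs = frame.astype(np.float32) / 255.0
    # Clip the reward; [-1, 1] per the DeepMind papers
    # (the quoted list says [0, 1]).
    clipped = float(np.clip(reward, -1.0, 1.0))
    return obs, clipped
```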
Everything except the optimizer. But I posted a link to the same one as in DM's paper in the description of the repo. At the moment I'm working on a different project and don't have time to try the correct one, but I will gladly accept a pull request :)
Also, from their discussion it looks like there is a typo there, and they mean reduce_sum instead of reduce_max.
from pytorch-a3c.
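To see why the sum-vs-mean choice matters, a toy example: with a sum, each transition contributes a gradient of fixed scale, while a mean divides every contribution by the rollout length, effectively lowering the learning rate for long rollouts.

```python
import torch

# Three per-step losses from a toy rollout.
per_step_losses = torch.tensor([0.5, 1.0, 1.5])

print(per_step_losses.sum().item())   # 3.0 -- grows with rollout length
print(per_step_losses.mean().item())  # 1.0 -- normalised by length
```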
Wonderful. Thank you.
Just one more question: in terms of parameter settings, are they the same as
https://github.com/muupan/async-rl/wiki ?
from pytorch-a3c.
No, I decided to use parameters from the OpenAI starter agent.
from pytorch-a3c.
oki doki. Many thanks.
from pytorch-a3c.
@ikostrikov have you tried to use your meta-optimizer inside train.py, and to initialize and share its optimizer from main.py? Just an idea that I thought you might have tried for Pong.
from pytorch-a3c.
Not yet; I may try it in the future. In my experience, for a fixed model and a fixed dataset a meta-optimizer tends to overfit. However, that's probably not a problem for Atari.
from pytorch-a3c.
@ikostrikov thanks for the heads-up on how the meta-optimiser performs!
Regarding models, this looks promising: XNOR-Net. As far as I know it hasn't been ported to PyTorch yet (they released it in Torch last year). But if I do convert it, I'll let you know whether I get it working with your meta-optimiser and how it performs.
from pytorch-a3c.
@ikostrikov Super !!
from pytorch-a3c.
A3C and Breakout:
How did you get a reward > 400? (Using the same code, or did you make some changes?)
I want to run some code and get a reward > 400; what should I do?
Regards
from pytorch-a3c.
Thank you
How many frames did you process in 10 hours?
I will clone the code again from:
https://github.com/dennybritz/reinforcement-learning/tree/master/PolicyGradient/a3c
and try .... correct?
This should replicate DeepMind's results, correct?
from pytorch-a3c.
I am very sorry for asking many questions ...
How many frames did you process in 10 hours?
Is this your code? https://github.com/ikostrikov/pytorch-a3c
Do you have any clue why this repo does not work as expected? (Rewards are around 30 to 35 for Breakout.)
Why are you not sure whether it's physically possible to replicate DeepMind's results?
from pytorch-a3c.
@dylanthomas , I find the current repo does not learn as expected. Did you make it work?
Time 00h 08m 56s, episode reward -2.0, episode length 106
Time 00h 09m 58s, episode reward -2.0, episode length 111
Time 00h 11m 04s, episode reward -2.0, episode length 113
Time 00h 12m 08s, episode reward -2.0, episode length 104
Time 00h 13m 13s, episode reward -2.0, episode length 111
Time 00h 14m 17s, episode reward -2.0, episode length 107
Time 00h 15m 22s, episode reward -2.0, episode length 110
Time 00h 16m 26s, episode reward -2.0, episode length 105
Time 00h 17m 31s, episode reward -2.0, episode length 104
Time 00h 18m 37s, episode reward -3.0, episode length 156
Time 00h 19m 44s, episode reward -3.0, episode length 156
Time 00h 21m 13s, episode reward -21.0, episode length 764
Time 00h 22m 43s, episode reward -21.0, episode length 764
Time 00h 24m 07s, episode reward -21.0, episode length 764
Time 00h 25m 15s, episode reward -4.0, episode length 179
Time 00h 26m 44s, episode reward -21.0, episode length 764
Time 00h 28m 02s, episode reward -11.0, episode length 425
Time 00h 29m 36s, episode reward -21.0, episode length 764
Time 00h 31m 29s, episode reward -21.0, episode length 1324
Time 00h 32m 58s, episode reward -21.0, episode length 764
Time 00h 34m 30s, episode reward -21.0, episode length 764
Time 00h 36m 01s, episode reward -21.0, episode length 764
Time 00h 37m 30s, episode reward -21.0, episode length 764
Time 00h 39m 32s, episode reward -21.0, episode length 1324
from pytorch-a3c.
How many threads did you use?
from pytorch-a3c.
@ikostrikov Thanks for your quick reply.
I am using 8 threads
from pytorch-a3c.
@ikostrikov Wow, maybe that's the reason. I didn't notice this. Why is this important?
from pytorch-a3c.
Otherwise it uses multiple cores for OMP within a thread.
from pytorch-a3c.
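Concretely, each worker can be pinned to a single OpenMP thread. A sketch of doing it from inside Python (launching with `OMP_NUM_THREADS=1 python main.py ...` from the shell has the same effect):

```python
import os

# Must be set before torch is imported for the env var to take effect.
os.environ["OMP_NUM_THREADS"] = "1"

import torch

# Equivalent guard from inside an already-running process.
torch.set_num_threads(1)
```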
Do you think it is necessary to add:
model.zero_grad()
at https://github.com/ikostrikov/pytorch-a3c/blob/master/train.py#L108 ?
from pytorch-a3c.
@ikostrikov Thanks!, I will try it again~
from pytorch-a3c.
optimizer.zero_grad() already zeros the gradients.
from pytorch-a3c.
@ikostrikov but it only zeros the gradients of the shared_model, right?
from pytorch-a3c.
https://github.com/ikostrikov/pytorch-a3c/blob/master/train.py#L18 They share gradients within this thread.
from pytorch-a3c.
@ikostrikov many thanks!
But I still don't get why using multiple cores for OMP within a thread is a problem. Won't that make the algorithm faster?
from pytorch-a3c.
It will make some threads run sequentially, and you will effectively collect less data.
from pytorch-a3c.
@ikostrikov I see. Do you mean that all the threads are very likely to end up processing the same frame, and are thus useless?
from pytorch-a3c.
I think it happens for many reasons. Just try to run it this way :)
from pytorch-a3c.
Thanks! This has really bothered me for the last two days.
Do you think this issue is with PyTorch or with Python code in general?
Because I've seen many TensorFlow implementations that don't impose this constraint.
from pytorch-a3c.
I think it's just the way multiprocessing is organized in PyTorch. I think the authors of PyTorch have a better answer. I also found this thing surprising.
from pytorch-a3c.
@ikostrikov When doing the optimizer.step(), I notice that there is no mutex to protect the shared_model weights, do you think it is safe?
from pytorch-a3c.
In DM's paper they say they perform async updates without locks.
from pytorch-a3c.
Oh I see, thanks!
from pytorch-a3c.
Will the following code constrain the shared_model grads to be bound to only one local model?
Because shared_model.grad will not be None after running the following function.
def ensure_shared_grads(model, shared_model):
    for param, shared_param in zip(model.parameters(), shared_model.parameters()):
        if shared_param.grad is not None:
            return
        shared_param._grad = param.grad
from pytorch-a3c.
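A single-process demonstration of what the function sets up: after the first call, the shared parameter's .grad is the very same tensor object as the local parameter's grad, so within one worker process later backward passes are visible through the shared model without re-binding, and the early return just skips the already-bound case.

```python
import torch
import torch.nn as nn

def ensure_shared_grads(model, shared_model):
    for param, shared_param in zip(model.parameters(),
                                   shared_model.parameters()):
        if shared_param.grad is not None:
            return
        shared_param._grad = param.grad

local = nn.Linear(2, 1)
shared = nn.Linear(2, 1)
local(torch.ones(1, 2)).sum().backward()
ensure_shared_grads(local, shared)

# The grad tensors alias: this is one binding, not a copy.
print(shared.weight.grad is local.weight.grad)  # True
```

Note that .grad is a per-process Python attribute, so each forked worker performs this binding once for itself; only the parameter data itself lives in shared memory.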