Comments (41)
Update:
After 10 hours with 16 threads it achieves a reward of >400.
from pytorch-a3c.
The same code.
10 hours with 16 threads on a Xeon 2650 v4.
from pytorch-a3c.
I didn't count the number of frames.
Yes.
Which one? If you mean the one referenced above, then I don't know. It's extremely difficult to get good results from A3C.
Because A3C is extremely sensitive to hyperparameters (even to the random seed). DeepMind ran a massive grid search to find the best hyperparameters. Then, in evaluation, they ran 50 trials with fixed hyperparameters for each game and averaged the top 5 performances. It's rather difficult to replicate that.
from pytorch-a3c.
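The evaluation protocol described above can be sketched numerically; the scores below are random stand-ins, not real A3C results:

```python
import numpy as np

# Hypothetical stand-in scores: 50 evaluation trials with fixed
# hyperparameters for one game (not real numbers).
rng = np.random.default_rng(0)
scores = rng.normal(300.0, 50.0, size=50)

# DeepMind-style reporting: average of the top 5 performances.
top5_mean = np.sort(scores)[-5:].mean()
```

Reporting the mean of the top 5 out of 50 trials is naturally optimistic relative to the plain mean over all trials, which is part of why published numbers are hard to match.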
Update: it started learning this time :D You saved my day. Thank you!
@apaszke Is it true that OMP_NUM_THREADS=1 is necessary to run multi-threaded PyTorch code?
Thanks!
from pytorch-a3c.
The numbers are for my code, not from that repo.
I'm not sure whether it's physically possible to replicate DeepMind's results.
from pytorch-a3c.
Did you run with OMP_NUM_THREADS=1 ?
from pytorch-a3c.
It's not as good as DeepMind's implementation.
After several hours of training it gets reward around 300 and stops there.
from pytorch-a3c.
Have you tried the RMSProp optimizer with shared statistics (this is what the authors use) instead of Adam?
from pytorch-a3c.
No, but it should be relatively easy to try.
from pytorch-a3c.
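Trying the paper's RMSProp with shared statistics could look roughly like this minimal sketch. The class name and defaults are hypothetical; the update follows the shared g / θ rule from the A3C paper rather than reusing torch.optim.RMSprop:

```python
import torch

class SharedRMSprop:
    """Sketch of RMSProp with a shared running average of squared
    gradients (the shared g in the A3C paper). Hypothetical names."""
    def __init__(self, params, lr=7e-4, alpha=0.99, eps=0.1):
        self.params = list(params)
        self.lr, self.alpha, self.eps = lr, alpha, eps
        # One statistics buffer per parameter, placed in shared memory
        # so every worker process updates the same running average.
        self.square_avg = [torch.zeros_like(p.data).share_memory_()
                           for p in self.params]

    def step(self):
        for p, g2 in zip(self.params, self.square_avg):
            if p.grad is None:
                continue
            # g = alpha * g + (1 - alpha) * dtheta^2
            g2.mul_(self.alpha).addcmul_(p.grad, p.grad,
                                         value=1 - self.alpha)
            # theta -= lr * dtheta / sqrt(g + eps)
            p.data.addcdiv_(p.grad, (g2 + self.eps).sqrt(),
                            value=-self.lr)

    def zero_grad(self):
        for p in self.params:
            if p.grad is not None:
                p.grad.zero_()
```

The key difference from a per-worker optimizer is only that `square_avg` lives in shared memory, so the statistics are accumulated across all workers.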
On this issue, are you aware of this discussion (dennybritz/reinforcement-learning#30)?
[It's about a DQN/TensorFlow performance issue, but the guess is that A3C's TensorFlow performance issue has the same causes.]
Here cgel suggests that the following makes the difference:
Important stuff:
- Normalise the input to [0, 1]
- Clip rewards to [0, 1]
- Don't tf.reduce_mean the losses in the batch; use tf.reduce_max
- Initialise the network properly with Xavier init
- Use the optimizer that the paper uses; it is not the same RMSProp as in tf
Not really sure how important:
- They count steps differently: if the action repeat is 4, then they count 4 steps per action, so divide all pertinent hyperparameters by 4.
Little difference (at least in Breakout):
- Pass the terminal flag when a life is lost
- gym vs alewrap: the learning rate is different, but if one works, so will the other
Among the important stuff, what is incorporated into your code?
from pytorch-a3c.
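A minimal sketch of the first two items on the list above (the function name is made up; it assumes a uint8 Atari frame, and it clips rewards to [-1, 1] as in the DeepMind papers, although the quoted list says [0, 1]):

```python
import numpy as np

def preprocess(frame, reward):
    # Normalise the observation into [0, 1].
    obs = frame.astype(np.float32) / 255.0
    # Clip the reward; [-1, 1] per the DeepMind papers
    # (the quoted list says [0, 1]).
    clipped = float(np.clip(reward, -1.0, 1.0))
    return obs, clipped
```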
Everything except the optimizer. But I posted a link to the same one as in DM's paper in the description of the repo. At the moment I'm working on a different project and don't have time to try the correct one, but I will gladly accept a pull request :)
Also, from their discussion it looks like there is a typo there, and they mean reduce_sum instead of reduce_max.
from pytorch-a3c.
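To see why the sum-vs-mean choice matters, a toy example: with a sum, each transition contributes a gradient of fixed scale, while a mean divides every contribution by the rollout length, effectively lowering the learning rate for long rollouts.

```python
import torch

# Three per-step losses from a toy rollout.
per_step_losses = torch.tensor([0.5, 1.0, 1.5])

print(per_step_losses.sum().item())   # 3.0 -- grows with rollout length
print(per_step_losses.mean().item())  # 1.0 -- normalised by length
```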
Wonderful. Thank you.
Just one more question: in terms of parameter settings, are they the same as
https://github.com/muupan/async-rl/wiki ?
from pytorch-a3c.
No, I decided to use parameters from the OpenAI starter agent.
from pytorch-a3c.
oki doki. Many thanks.
from pytorch-a3c.
@ikostrikov have you tried to use your meta-optimizer inside train.py, and to initialize and share its optimizer from main.py? Just an idea that I thought you might have tried for Pong.
from pytorch-a3c.
Not yet; I may try it in the future. In my experience, for a fixed model and a fixed dataset a meta-optimizer tends to overfit. However, that's probably not a problem for Atari.
from pytorch-a3c.
@ikostrikov thanks for the heads-up on how the meta-optimiser performs!
Regarding models, this looks promising: XNOR-Net. As far as I know it hasn't been ported to PyTorch yet (they released it in Torch last year). But if I do convert it, I'll let you know whether I get it working with your meta-optimiser and how it performs.
from pytorch-a3c.
@ikostrikov Super !!
from pytorch-a3c.
A3C and Breakout:
How did you get a reward > 400? (Using the same code, or did you make some changes?)
I want to run some code and get a reward > 400; what should I do?
Regards
from pytorch-a3c.
Thank you
How many frames did you process in 10 hours?
I will clone the code again from:
https://github.com/dennybritz/reinforcement-learning/tree/master/PolicyGradient/a3c
and try .... correct?
This should replicate DeepMind's results, correct?
from pytorch-a3c.
I am very sorry for asking many questions ...
How many frames did you process in 10 hours?
Is this your code? https://github.com/ikostrikov/pytorch-a3c
Do you have any clue why this repo does not work as expected? (Rewards are around 30 to 35 for Breakout.)
Why are you not sure whether it's physically possible to replicate DeepMind's results?
from pytorch-a3c.
@dylanthomas , I find the current repo does not learn as expected. Did you make it work?
Time 00h 08m 56s, episode reward -2.0, episode length 106
Time 00h 09m 58s, episode reward -2.0, episode length 111
Time 00h 11m 04s, episode reward -2.0, episode length 113
Time 00h 12m 08s, episode reward -2.0, episode length 104
Time 00h 13m 13s, episode reward -2.0, episode length 111
Time 00h 14m 17s, episode reward -2.0, episode length 107
Time 00h 15m 22s, episode reward -2.0, episode length 110
Time 00h 16m 26s, episode reward -2.0, episode length 105
Time 00h 17m 31s, episode reward -2.0, episode length 104
Time 00h 18m 37s, episode reward -3.0, episode length 156
Time 00h 19m 44s, episode reward -3.0, episode length 156
Time 00h 21m 13s, episode reward -21.0, episode length 764
Time 00h 22m 43s, episode reward -21.0, episode length 764
Time 00h 24m 07s, episode reward -21.0, episode length 764
Time 00h 25m 15s, episode reward -4.0, episode length 179
Time 00h 26m 44s, episode reward -21.0, episode length 764
Time 00h 28m 02s, episode reward -11.0, episode length 425
Time 00h 29m 36s, episode reward -21.0, episode length 764
Time 00h 31m 29s, episode reward -21.0, episode length 1324
Time 00h 32m 58s, episode reward -21.0, episode length 764
Time 00h 34m 30s, episode reward -21.0, episode length 764
Time 00h 36m 01s, episode reward -21.0, episode length 764
Time 00h 37m 30s, episode reward -21.0, episode length 764
Time 00h 39m 32s, episode reward -21.0, episode length 1324
from pytorch-a3c.
How many threads did you use?
from pytorch-a3c.
@ikostrikov Thanks for your quick reply.
I am using 8 threads
from pytorch-a3c.
@ikostrikov Wow, maybe that's the reason. I didn't notice this. Why is this important?
from pytorch-a3c.
Otherwise it uses multiple cores for OMP within a thread.
from pytorch-a3c.
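Concretely, each worker can be pinned to a single OpenMP thread. A sketch of doing it from inside Python (launching with `OMP_NUM_THREADS=1 python main.py ...` from the shell has the same effect):

```python
import os

# Must be set before torch is imported for the env var to take effect.
os.environ["OMP_NUM_THREADS"] = "1"

import torch

# Equivalent guard from inside an already-running process.
torch.set_num_threads(1)
```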
Do you think it is necessary to add:
model.zero_grad()
at https://github.com/ikostrikov/pytorch-a3c/blob/master/train.py#L108 ?
from pytorch-a3c.
@ikostrikov Thanks!, I will try it again~
from pytorch-a3c.
optimizer.zero_grad() already zeros the gradients.
from pytorch-a3c.
@ikostrikov but it only zeros the gradients of the shared_model, right?
from pytorch-a3c.
https://github.com/ikostrikov/pytorch-a3c/blob/master/train.py#L18 They share gradients within this thread.
from pytorch-a3c.
@ikostrikov many thanks!
But I still don't get why using multiple cores for OMP within a thread is a problem. Won't that make the algorithm faster?
from pytorch-a3c.
It will make some threads run sequentially, and you will effectively collect less data.
from pytorch-a3c.
@ikostrikov I see. Do you mean that all the threads are very likely to end up processing the same frame, and are thus useless?
from pytorch-a3c.
I think it happens for many reasons. Just try to run it this way :)
from pytorch-a3c.
Thanks! This has really bothered me for the last two days.
Do you think this issue is with PyTorch or with Python code in general?
Because I've seen many TensorFlow implementations that don't impose this constraint.
from pytorch-a3c.
I think it's just the way multiprocessing is organized in PyTorch. I think the authors of PyTorch have a better answer. I also found this thing surprising.
from pytorch-a3c.
@ikostrikov When doing the optimizer.step(), I notice that there is no mutex to protect the shared_model weights, do you think it is safe?
from pytorch-a3c.
In DM's paper they say they perform async updates without locks.
from pytorch-a3c.
Oh I see, thanks!
from pytorch-a3c.
Will the following code constrain the shared_model grads to be bound to only one local model?
Because shared_model.grad will not be None after running the following function.
def ensure_shared_grads(model, shared_model):
    for param, shared_param in zip(model.parameters(), shared_model.parameters()):
        if shared_param.grad is not None:
            return
        shared_param._grad = param.grad
from pytorch-a3c.
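A single-process demonstration of what the function sets up: after the first call, the shared parameter's .grad is the very same tensor object as the local parameter's grad, so within one worker process later backward passes are visible through the shared model without re-binding, and the early return just skips the already-bound case.

```python
import torch
import torch.nn as nn

def ensure_shared_grads(model, shared_model):
    for param, shared_param in zip(model.parameters(),
                                   shared_model.parameters()):
        if shared_param.grad is not None:
            return
        shared_param._grad = param.grad

local = nn.Linear(2, 1)
shared = nn.Linear(2, 1)
local(torch.ones(1, 2)).sum().backward()
ensure_shared_grads(local, shared)

# The grad tensors alias: this is one binding, not a copy.
print(shared.weight.grad is local.weight.grad)  # True
```

Note that .grad is a per-process Python attribute, so each forked worker performs this binding once for itself; only the parameter data itself lives in shared memory.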