Comments (6)
Do you mean this commit?
pytorch/examples@5c41070#diff-054e0153cf9e86773f4d272224dd1d13
from pytorch-a3c.
Yes, but I see that you call the ensure_shared_grads function after backprop but before the optimizer step. Is this still required? To be honest, I am still a bit surprised (and impressed) that asynchronous updates are so straightforward in PyTorch. I am implementing an A3C RL agent and want to be sure that all processes are updating the global model. I was following the Hogwild example and your code, and I noticed this difference.
Cheers
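For readers following along, the ordering described above (backprop, then ensure_shared_grads, then the optimizer step) can be sketched in a single process as follows. This is only an illustration under assumptions: the body of ensure_shared_grads here is a guess at what such a helper does (hand the local gradients to the shared model), not the repository's actual code, and the nn.Linear models, SGD optimizer and dummy loss are stand-ins.

```python
import torch
import torch.nn as nn


def ensure_shared_grads(model, shared_model):
    # Assumed behaviour (hypothetical body): give each shared parameter
    # a copy of the gradient computed on the matching local parameter.
    for param, shared_param in zip(model.parameters(), shared_model.parameters()):
        if param.grad is not None:
            shared_param.grad = param.grad.clone()


torch.manual_seed(0)
shared_model = nn.Linear(4, 2)   # stand-in for the global network
model = nn.Linear(4, 2)          # stand-in for one worker's local copy
# The optimizer is built over the shared parameters, not the local ones.
optimizer = torch.optim.SGD(shared_model.parameters(), lr=0.1)

model.load_state_dict(shared_model.state_dict())   # sync local copy
before = shared_model.weight.detach().clone()

loss = model(torch.randn(8, 4)).pow(2).mean()      # dummy loss
optimizer.zero_grad()
loss.backward()                  # gradients land on the local model only
grads_on_shared_after_backward = shared_model.weight.grad   # nothing here yet

ensure_shared_grads(model, shared_model)           # hand gradients over
optimizer.step()                                   # updates the shared weights
```

In this sketch the backward pass runs through the local model, so without the hand-over step the optimizer (which only sees the shared parameters) would have no gradients to apply.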
When I remove these lines, it stops working for me. It probably requires changing something else as well.
I think these lines in train are no longer required:
24 model = ActorCritic(env.observation_space.shape[0], env.action_space)
38 model.load_state_dict(shared_model.state_dict())
and also change all instances of model to shared_model
At least that is how it appears to work for the Hogwild example.
It is tricky, though, because it would mean that the model's parameters could change during an episode due to updates from the other processes.
Perhaps the original approach is better as there are advantages and disadvantages to the changes (assuming they would work).
Sorry for the hassle!
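The trade-off described above can be illustrated with a small single-process sketch. It assumes two toy nn.Linear models in place of ActorCritic and simulates "another process" by editing the shared weights in place; none of this is taken from the repository. A local copy synced with load_state_dict keeps its values when the shared model changes, whereas code that reads shared_model directly sees the change immediately.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
shared_model = nn.Linear(3, 1)   # stand-in for the global network
local_model = nn.Linear(3, 1)    # stand-in for a worker's local copy

# load_state_dict copies the values, so the local copy is independent afterwards.
local_model.load_state_dict(shared_model.state_dict())

x = torch.ones(1, 3)
local_before = local_model(x).item()
shared_before = shared_model(x).item()

# Simulate an update from another worker arriving mid-episode:
# add 1.0 to every weight and to the bias of the shared model.
with torch.no_grad():
    for p in shared_model.parameters():
        p.add_(1.0)

local_after = local_model(x).item()    # unchanged: the snapshot is fixed
shared_after = shared_model(x).item()  # changed: reflects the simulated update
```

With an input of all ones, the shared output shifts by 4 (three weights plus the bias), while the local output stays where it was until the next load_state_dict call.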
These lines are important because we need to keep the parameters fixed during rollouts.
I will probably have to spend more time trying to figure out what they changed.
In the meantime, I would highly recommend using A2C/PPO/ACKTR. In my experience, they work just as well (except in some cases).
OK, thank you. I will defer to your expertise and try A2C for the moment.
I have a more general question, if you have the time: why do you think it is better to keep the parameters the same during the rollout? Is it because of the GAE?