microsoft / oac-explore
Code accompanying the paper "Better Exploration with Optimistic Actor Critic" (NeurIPS 2019)
License: MIT License
I've forked the repo and started making some changes to try something out, but the code appears to have some kind of clean-repo check: when I try to run main.py on my branch, the only output I get is the git diff and then the code exits. Any suggestions on how to disable that? Obviously I have verified that I can run main.py locally on the master branch; the issue only appears on my custom branch.
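My guess is that the script contains a guard roughly like the following hypothetical sketch (this is not the repo's actual code, and assert_clean_repo is my name for it); if so, commenting it out or gating it behind a flag would disable the check:

# Hypothetical sketch of a guard that prints the git diff and aborts
# when the working tree has uncommitted changes.
from git import Repo
import sys

def assert_clean_repo(path='./'):
    repo = Repo(path)
    diff = repo.git.diff()  # non-empty string if there are uncommitted changes
    if diff:
        print(diff)
        sys.exit(1)  # abort so that logged results stay reproducible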
Why is there a conflict in requirements.txt?
Would it be straightforward to implement a batched version of get_optimistic_exploration_action?
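A sketch of one way it could be batched (an assumption-laden sketch, not the repo's actual API: the signature, the defaults for beta_ub and delta, and the helper name batched_optimistic_shift are all illustrative). The key observation is that each observation's upper-bound Q depends only on its own pre-tanh mean, so the gradient of the summed upper bound recovers all per-sample gradients in a single backward pass:

import math
import torch

def batched_optimistic_shift(qf1, qf2, obs, pre_tanh_mu, std,
                             beta_ub=4.66, delta=23.53):
    # obs: (B, obs_dim); pre_tanh_mu, std: (B, act_dim)
    mu = pre_tanh_mu.detach().requires_grad_(True)
    acts = torch.tanh(mu)
    q1, q2 = qf1(obs, acts), qf2(obs, acts)
    mean_q = (q1 + q2) / 2.0
    sigma_q = torch.abs(q1 - q2) / 2.0
    q_ub = mean_q + beta_ub * sigma_q  # (B, 1) upper confidence bound
    # Summing over the batch gives per-sample gradients in one pass,
    # since the samples are independent.
    grad = torch.autograd.grad(q_ub.sum(), mu)[0]  # (B, act_dim)
    sigma_t = std * std  # diagonal covariance of the target policy
    denom = torch.sqrt((grad * grad * sigma_t).sum(dim=-1, keepdim=True)) + 1e-6
    # Shifted exploration mean, following the paper's closed-form solution
    return pre_tanh_mu + math.sqrt(2.0 * delta) * sigma_t * grad / denom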
The following code generates an error in some of the most recent versions of PyTorch:
oac-explore/trainer/trainer.py
Lines 146 to 159 in cbc0333
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation
To solve it, it is necessary to move these lines
oac-explore/trainer/trainer.py
Lines 120 to 124 in cbc0333
so that they fall between the Q-network gradient steps and the policy-network step, like so:
"""
Update networks
"""
self.qf1_optimizer.zero_grad()
qf1_loss.backward(retain_graph=True)
self.qf1_optimizer.step()
self.qf2_optimizer.zero_grad()
qf2_loss.backward(retain_graph=True)
self.qf2_optimizer.step()
q_new_actions = torch.min(
self.qf1(obs, new_obs_actions),
self.qf2(obs, new_obs_actions),
)
policy_loss = (alpha * log_pi - q_new_actions).mean()
self.policy_optimizer.zero_grad()
policy_loss.backward(retain_graph=True)
self.policy_optimizer.step()
Be aware that if you simply use an old version of PyTorch to sidestep this error, the behaviour might not be what you expect, since policy_loss was computed against Q-network weights that no longer exist after the in-place optimizer steps.
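For context, here is a minimal repro of the failure mode (my own illustration, not the repo's code): building a loss through a network, stepping its optimizer, and only then calling backward() on the stale loss triggers exactly this RuntimeError in recent PyTorch versions.

import torch

net = torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.Linear(4, 1))
opt = torch.optim.SGD(net.parameters(), lr=0.1)

x = torch.randn(8, 4)
stale_loss = net(x).mean()  # graph saves the current weights for backward

other_loss = net(x).pow(2).mean()
opt.zero_grad()
other_loss.backward()
opt.step()  # in-place update bumps the saved tensors' version counters

stale_loss.backward()  # RuntimeError: one of the variables needed for
                       # gradient computation has been modified by an
                       # inplace operation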
Hi,
It seems the current code lacks documentation. I just want to implement OAC, but I do not know exactly how to put the code together to do so. I would appreciate it if you could make it clearer how people can use your code for OAC; there is currently little documentation on this.
Hi Quan,
I came across your paper and found it interesting. One of the doubts I have is with the implementation of the optimistic policies: why are you computing gradients of the upper bound w.r.t. the pre-tanh value of the policy? As per the paper, isn't it supposed to be w.r.t. the deterministic action (the output of the tanh policy)?
Regards,
Kartik
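For what it's worth, the two gradients differ only by the tanh Jacobian, so either choice carries the same information up to a per-dimension scaling. A small check of that chain-rule relationship (my own sketch, not the authors' answer):

import torch

u = torch.randn(3, requires_grad=True)  # pre-tanh value
a = torch.tanh(u)                       # deterministic (squashed) action
q = (a * a).sum()                       # stand-in for the upper-bound Q

(grad_u,) = torch.autograd.grad(q, u, retain_graph=True)
(grad_a,) = torch.autograd.grad(q, a)
# Chain rule: dQ/du = dQ/da * (1 - tanh(u)^2)
assert torch.allclose(grad_u, grad_a * (1 - torch.tanh(u) ** 2))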
oac-explore/plotting/plot_against_baseline.py
Line 130 in 715db5a
I ran the code with the Walker2d environment and got only around a 3,000 score at 1M steps, while the score presented in the paper is over 4,000.
Hello, in the SAC paper "Soft Actor-Critic Algorithms and Applications" the loss of alpha is calculated as:
J(alpha) = E[-alpha * (log(pi) + H)]
However, in your implementation (line 109 of trainer.py), the loss of alpha is instead:
J(alpha) = E[-log(alpha) * (log(pi) + H)]
I am curious why the loss is calculated in this way. I have searched GitHub for a couple of PyTorch-based SAC implementations and they all calculate the loss this way, but the TensorFlow-based SAC implementations calculate J(alpha) in the same way as the SAC paper (https://github.com/rail-berkeley/softlearning/blob/master/softlearning/algorithms/sac.py). The TensorFlow implementations still calculate the gradient with respect to log(alpha), but when calculating the loss J(alpha) they use exp(log(alpha)) (which is alpha) instead of log(alpha).
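A side-by-side sketch of the two parameterizations (my illustration; the target_entropy value and the stand-in log_pi are assumptions, not either repo's exact code). Since alpha = exp(log_alpha) > 0, both losses push log_alpha in the same direction; their gradients differ only by the positive factor alpha, so they share stationary points but not effective step size:

import torch

log_alpha = torch.zeros(1, requires_grad=True)
target_entropy = -6.0            # e.g. -action_dim (an assumption)
log_pi = torch.tensor([-3.0])    # stand-in for a batch of policy log-probs

# Variant used in this repo (PyTorch-style): multiply by log(alpha)
loss_log = -(log_alpha * (log_pi + target_entropy).detach()).mean()

# Variant matching the SAC paper (TF softlearning-style): multiply by alpha
loss_exp = -(log_alpha.exp() * (log_pi + target_entropy).detach()).mean()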
Hi, thank you for your code, it's perfect. I want to know how to reproduce Figure 7 in your paper.
Thank you for your code. Can you tell me how to deal with this error?
/home/f/anaconda3/envs/f/bin/python /home/f/Downloads/oac-explore-master/main.py
Traceback (most recent call last):
  File "/home/f/Downloads/oac-explore-master/main.py", line 219, in <module>
    variant['log_dir'] = get_log_dir(args)
  File "/home/f/Downloads/oac-explore-master/main.py", line 165, in get_log_dir
    get_current_branch('./'),
  File "/home/f/Downloads/oac-explore-master/main.py", line 35, in get_current_branch
    repo = Repo(dir)
  File "/home/f/anaconda3/envs/f/lib/python3.7/site-packages/git/repo/base.py", line 181, in __init__
    raise InvalidGitRepositoryError(epath)
git.exc.InvalidGitRepositoryError: /home/f/Downloads/oac-explore-master
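Reading the traceback, the likely cause (an interpretation, not a confirmed answer) is that the code was downloaded as a zip (note the oac-explore-master folder name), so there is no .git directory and GitPython's Repo('./') raises InvalidGitRepositoryError. Cloning the repo with git instead should avoid it; alternatively, a guard like the following sketch (the fallback label is hypothetical) would let main.py run from a plain download:

from git import Repo
from git.exc import InvalidGitRepositoryError

def get_current_branch(dir):
    try:
        return Repo(dir).active_branch.name
    except InvalidGitRepositoryError:
        return 'no-git'  # hypothetical fallback for non-repo checkouts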