ajlangley / cpo-pytorch

An implementation of Constrained Policy Optimization (Achiam 2017) in PyTorch

Python 100.00%

cpo-pytorch's Issues

Where can I find the AntGather env?

Hi, nice work on this. I found code in the Python files that refers to the AntGather env, but I couldn't find AntGatherEnv in the envs directory. Was it left out when you uploaded this repo?

Line 2 leads to imp_sampling = 1

log_action_probs = action_dists.log_prob(actions)
imp_sampling = torch.exp(log_action_probs - log_action_probs.detach())
# Change to torch.matmul
reward_loss = -torch.mean(imp_sampling * reward_advs)

Since log_action_probs - log_action_probs.detach() = 0,
imp_sampling is an all-ones vector.
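
A note on why this pattern is usually intentional: the ratio evaluates to 1, but detach() only freezes the second term, so the gradient of the ratio with respect to the policy parameters equals the gradient of the log-probability. The sketch below checks this numerically in plain Python, assuming a one-dimensional Gaussian policy. The names log_prob, ratio, and finite_diff are illustrative and not from the repository; the frozen_mean argument plays the role of .detach().

```python
import math

def log_prob(action, mean, std=1.0):
    """Log-density of a 1-D Gaussian policy N(mean, std^2) at `action`."""
    return (-0.5 * ((action - mean) / std) ** 2
            - math.log(std) - 0.5 * math.log(2 * math.pi))

def ratio(action, mean, frozen_mean):
    """exp(log pi(a; mean) - log pi(a; frozen_mean)); frozen_mean mimics .detach()."""
    return math.exp(log_prob(action, mean) - log_prob(action, frozen_mean))

def finite_diff(f, x, eps=1e-6):
    """Central finite-difference approximation of df/dx."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

action, mean = 0.7, 0.2

# Value of the ratio at the current parameters.
ratio_value = ratio(action, mean, frozen_mean=mean)
# Gradient of the ratio w.r.t. the live parameter only (frozen copy held fixed).
ratio_grad = finite_diff(lambda m: ratio(action, m, frozen_mean=mean), mean)
# Gradient of the log-probability, for comparison.
log_prob_grad = finite_diff(lambda m: log_prob(action, m), mean)

print(ratio_value)    # the value is 1 at the current parameters
print(ratio_grad)     # the gradient is not 0 ...
print(log_prob_grad)  # ... it matches the gradient of log pi(a)
```

So the loss value carries no information at the current parameters, but its gradient is the usual advantage-weighted policy gradient.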

"from envs.ant_gather import AntGatherEnv"

There is no "ant_gather" module in the envs folder.

mean_kl is always 0

Hi, I notice that in your code mean_kl is always 0:
constraint_grad = flat_grad(constraint_loss, self.policy.parameters(), retain_graph=True) # (b)
mean_kl = mean_kl_first_fixed(actions_dists, actions_dists)
Fvp_fun = get_Hvp_fun(mean_kl, self.policy.parameters())

What is the meaning of the gradient of a constant?
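
For context, TRPO-style methods typically evaluate the KL divergence between the policy and a frozen copy of itself. The value and the gradient are both zero at that point, but the second derivative is the Fisher information, which is what a Hessian-vector product function needs. Below is a minimal numerical sketch in plain Python, assuming a one-dimensional Gaussian policy with a fixed standard deviation. The helper names are illustrative and are not the repository's functions.

```python
def kl_first_fixed(mean_old, mean_new, std=0.5):
    """KL( N(mean_old, std^2) || N(mean_new, std^2) ) for an equal, fixed std."""
    return (mean_new - mean_old) ** 2 / (2.0 * std ** 2)

def first_derivative(f, x, eps=1e-4):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

def second_derivative(f, x, eps=1e-4):
    """Central finite-difference approximation of f''(x)."""
    return (f(x + eps) - 2 * f(x) + f(x - eps)) / eps ** 2

mean, std = 0.3, 0.5

def kl(m):
    # The first argument is held fixed, like a detached copy of the policy.
    return kl_first_fixed(mean, m, std)

kl_value = kl(mean)
kl_grad = first_derivative(kl, mean)
kl_hess = second_derivative(kl, mean)

print(kl_value)  # 0.0: the two distributions are identical
print(kl_grad)   # close to 0: the KL is at its minimum
print(kl_hess)   # close to 1 / std**2 = 4, the Fisher information for the mean
```

The value is constant only at the evaluation point. As a function of the live parameters it still has curvature, which is why differentiating it twice is meaningful.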

[question] How to turn my custom environment into an environment suitable for CPO?

I have a custom environment on which I train PPO and SAC agents. I currently penalize constraint violations through the reward function, and I would like to test how CPO performs in this setting. What steps should I follow to use your CPO implementation? For example, how can I integrate a cost function?
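
One common pattern for constrained RL (not necessarily the interface this repository expects) is to stop folding the penalty into the reward and to report the constraint violation as a separate per-step cost instead. The following is a hypothetical sketch, assuming a Gym-style step that returns (obs, reward, done, info) and a user-supplied cost function. CostWrapper, cost_fn, ToyEnv, and the 'constraint_cost' key are all illustrative names.

```python
class CostWrapper:
    """Wraps a Gym-style env and reports a separate per-step constraint cost.

    Hypothetical helper: cost_fn(obs, action, info) returns a non-negative
    cost, and the key 'constraint_cost' is an arbitrary choice.
    """

    def __init__(self, env, cost_fn):
        self.env = env
        self.cost_fn = cost_fn

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        info = dict(info)  # copy so the wrapped env's dict is not mutated
        # Keep the reward free of penalties; expose the cost separately.
        info['constraint_cost'] = float(self.cost_fn(obs, action, info))
        return obs, reward, done, info


class ToyEnv:
    """Tiny stand-in environment: the state is a position on a line."""

    def reset(self):
        self.pos = 0.0
        return self.pos

    def step(self, action):
        self.pos += action
        reward = -abs(self.pos - 1.0)  # reward: stay close to position 1.0
        return self.pos, reward, False, {}


# Constraint: incur a cost of 1 whenever the position exceeds 1.5.
env = CostWrapper(ToyEnv(), cost_fn=lambda obs, action, info: obs > 1.5)
env.reset()
_, reward_1, _, info_1 = env.step(1.0)
_, reward_2, _, info_2 = env.step(1.0)
print(reward_1, info_1['constraint_cost'])
print(reward_2, info_2['constraint_cost'])
```

A training loop would then accumulate rewards and costs into separate returns and advantages. How this repository's simulator reads the cost is something to check in its source.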

Does it converge?

I ported this code to my own environment, but it did not converge.

a "bug"? in the cpo method

reward_advs -= reward_advs.mean()
reward_advs /= reward_advs.std()
cost_advs -= reward_advs.mean()
cost_advs /= cost_advs.std()

I guess line 3 should subtract the mean of cost_advs instead?
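
If the report is right, the practical effect is that the cost advantages are never centered: by line 3 the reward advantages already have a mean of roughly zero, so subtracting that mean changes nothing. The plain-Python sketch below shows the difference. The center and scale helpers and the sample numbers are made up for illustration.

```python
from statistics import mean, stdev

reward_advs = [1.0, 2.0, 3.0, 6.0]
cost_advs = [0.0, 0.5, 0.5, 1.0]

def center(values, offset):
    """Subtract a constant offset from every element."""
    return [v - offset for v in values]

def scale(values):
    """Divide every element by the sample standard deviation."""
    sigma = stdev(values)
    return [v / sigma for v in values]

# Sequence as reported in the issue.
reward_norm = scale(center(reward_advs, mean(reward_advs)))
# reward_norm already has mean ~0, so this subtraction does almost nothing.
cost_reported = scale(center(cost_advs, mean(reward_norm)))

# Suggested fix: center the cost advantages with their own mean.
cost_fixed = scale(center(cost_advs, mean(cost_advs)))

print(mean(cost_reported))  # stays away from 0: the costs were never centered
print(mean(cost_fixed))     # close to 0
```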

mj_loadXML error: b'Error: engine error: Could not allocate memory'

(cpo-pytorch-master) D:\2021\ReinforcementLearning\cpo-pytorch-master\cpo-pytorch-master>python train.py --model-name point_gather
train.py:32: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  config = load(open(config_filename, 'r'))[model_name]
Traceback (most recent call last):
  File "train.py", line 57, in <module>
    simulator = SinglePathSimulator(env_name, policy, n_trajectories, trajectory_len)
  File "D:\2021\ReinforcementLearning\cpo-pytorch-master\cpo-pytorch-master\simulators.py", line 37, in __init__
    **env_args)
  File "D:\2021\ReinforcementLearning\cpo-pytorch-master\cpo-pytorch-master\autoassign.py", line 52, in decorated
    return f(self, *args, **kwargs)
  File "D:\2021\ReinforcementLearning\cpo-pytorch-master\cpo-pytorch-master\simulators.py", line 24, in __init__
    self.env = np.asarray([make_env(env_name, **env_args) for i in range(n_trajectories)])
  File "D:\2021\ReinforcementLearning\cpo-pytorch-master\cpo-pytorch-master\simulators.py", line 24, in <listcomp>
    self.env = np.asarray([make_env(env_name, **env_args) for i in range(n_trajectories)])
  File "D:\2021\ReinforcementLearning\cpo-pytorch-master\cpo-pytorch-master\simulators.py", line 16, in make_env
    return PointGatherEnv(**env_args)
  File "D:\2021\ReinforcementLearning\cpo-pytorch-master\cpo-pytorch-master\envs\point_gather.py", line 13, in __init__
    sensor_span)
  File "D:\2021\ReinforcementLearning\cpo-pytorch-master\cpo-pytorch-master\autoassign.py", line 52, in decorated
    return f(self, *args, **kwargs)
  File "D:\2021\ReinforcementLearning\cpo-pytorch-master\cpo-pytorch-master\envs\gather_env.py", line 31, in __init__
    self.model = self.build_model(model_path)
  File "D:\2021\ReinforcementLearning\cpo-pytorch-master\cpo-pytorch-master\envs\gather_env.py", line 67, in build_model
    model = load_model_from_xml(model_xml)
  File "cymj.pyx", line 147, in mujoco_py.cymj.load_model_from_xml
Exception:

Failed to load XML from file: C:\Users\***\AppData\Local\Temp\tmpt9hpp_y5.xml. mj_loadXML error: b'Error: engine error: Could not allocate memory'

I am trying to run train.py and this error occurs. Can anybody help me with this?

Thanks!

Some questions about the code

Thanks for sharing. I have some questions from reading your code.

Q1: In line_search.py, line 39:
if constraints_satisfied(step_len * search_dir, step_len):

I think it should be constraints_satisfied(search_dir, step_len), because in cpo.py line 190, test_policy = current_policy + step_len * search_dir. It seems that step_len is multiplied in twice.

Q2: In cpo.py, line 205:
cost_cond = step_len * torch.matmul(constraint_grad, search_dir) <= max(-c, 0.0)

It seems that you want to test the constraint of the linearized problem (Equation 11 in the CPO paper). But according to the original paper (the last line of Algorithm 1), the non-linearized problem (Equation 10 in the CPO paper) should be tested.
So the correct code may be:
cost_cond = test_cost <= max(-c, 0.0)
(Also, I notice that test_cost is computed but never used in the code.)
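
Both points can be illustrated with a small backtracking line search in plain Python. This is a sketch under assumptions, not the repository's line_search.py: the search scales the direction exactly once before handing the candidate step to the acceptance test, and the test evaluates a toy cost at the candidate parameters instead of its linearization. All names and the quadratic cost are hypothetical.

```python
def line_search(search_dir, max_step_len, accept, decay=0.8, max_iters=10):
    """Backtracking line search.

    Shrinks step_len geometrically until accept(step) is True, where
    step = step_len * search_dir is scaled exactly once. Returns the
    accepted step, or a zero step if no candidate is accepted.
    """
    step_len = max_step_len
    for _ in range(max_iters):
        step = [step_len * d for d in search_dir]
        if accept(step):
            return step
        step_len *= decay
    return [0.0 for _ in search_dir]


# Toy problem: parameters theta and a cost that must stay below a limit.
theta = [0.0, 0.0]
cost_limit = 1.0

def cost(params):
    """Stand-in for the true (non-linearized) constraint cost."""
    return params[0] ** 2 + params[1] ** 2

def accept(step):
    # The step arrives already scaled, so it is added as-is, and the
    # actual cost is evaluated at the candidate parameters.
    candidate = [t + s for t, s in zip(theta, step)]
    return cost(candidate) <= cost_limit

step = line_search(search_dir=[1.0, 1.0], max_step_len=1.0, accept=accept)
new_theta = [t + s for t, s in zip(theta, step)]
print(step, cost(new_theta))
```

A full CPO line search would also check the KL bound and the surrogate reward improvement. Those checks are omitted here to keep the sketch short.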
