justinjfu / inverse_rl
License: MIT License
Hi,
Thank you very much for releasing your code. Could you please tell me how to run this code on other environments, like HalfCheetah?
Thanks
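For what it's worth, a traceback in a later issue on this page shows how ant_data_collect.py constructs its environment, so one plausible starting point is swapping the gym environment id. A sketch only: the import paths and the 'HalfCheetah-v1' id are assumptions inferred from the traceback and the gym version of that era, not verified against the repo.

# Sketch only: import paths and env id are assumptions, not verified.
from sandbox.rocky.tf.envs.base import TfEnv            # rllab's TF env wrapper
from inverse_rl.envs.env_utils import CustomGymEnv      # path seen in the traceback

env = TfEnv(CustomGymEnv('HalfCheetah-v1',
                         record_video=False, record_log=False,
                         force_reset=False))

Since CustomGymEnv ultimately calls gym.envs.make(env_name) (per the traceback), any registered gym id should be accepted.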
Hi,
Thank you very much for the open research and the code. I've used your environment in my research (with appropriate references). After running SAC on the CustomAnt environment, I have noticed that the reward does not reorient the agent: the Ant still walks sideways. I also can't find, in the code, any objective for the reorientation behavior mentioned in the paper (Figure 5).
I am looking forward to your kind reply,
Thanks,
Code with the discrepancy:
No reorientation term appears in the following reward function.
def _step(self, a):
    # Forward reward uses the x-velocity read *before* stepping the simulation.
    vel = self.model.data.qvel.flat[0]
    forward_reward = vel
    self.do_simulation(a, self.frame_skip)
    # Standard ant penalties: control effort and clipped contact forces.
    ctrl_cost = .01 * np.square(a).sum()
    contact_cost = 0.5 * 1e-3 * np.sum(
        np.square(np.clip(self.model.data.cfrc_ext, -1, 1)))
    state = self.state_vector()
    # Only an uprightness check: -1 penalty when the torso drops below 0.2.
    # There is no term that rewards facing any particular direction.
    flipped = not (state[2] >= 0.2)
    flipped_rew = -1 if flipped else 0
    reward = forward_reward - ctrl_cost - contact_cost + flipped_rew
    self.timesteps += 1
    done = self.timesteps >= self.max_timesteps
    ob = self._get_obs()
    return ob, reward, done, dict(
        reward_forward=forward_reward,
        reward_ctrl=-ctrl_cost,
        reward_contact=-contact_cost,
        reward_flipped=flipped_rew)
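For reference, a reorientation term would have to reward the torso's heading, not just its height. Below is a hypothetical sketch, not this repo's code; it assumes the MuJoCo ant's root quaternion sits at qpos[3:7] in (w, x, y, z) order:

import numpy as np

def heading_reward(qpos, weight=1.0):
    """Hypothetical bonus for facing the +x direction (not this repo's code)."""
    w, x, y, z = qpos[3:7]
    # Yaw (rotation about the vertical axis) extracted from the quaternion.
    yaw = np.arctan2(2.0 * (w * z + x * y), 1.0 - 2.0 * (y * y + z * z))
    # cos(yaw) is 1 when facing +x and -1 when facing -x.
    return weight * np.cos(yaw)

A term like this could be added to the reward sum in _step alongside forward_reward.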
Thank you very much for releasing your code. When I run ant_data_collect.py, the following problem occurred:
python ant_data_collect.py
2019-03-25 15:35:17.709269 CST | Warning: skipping Gym environment monitoring since snapshot_dir not configured.
2019-03-25 15:35:17.709250 CST | Warning: skipping Gym environment monitoring since snapshot_dir not configured.
2019-03-25 15:35:17.709358 CST | Warning: skipping Gym environment monitoring since snapshot_dir not configured.
2019-03-25 15:35:17.709459 CST | Warning: skipping Gym environment monitoring since snapshot_dir not configured.
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/sylyvahn/anaconda3/envs/rllab/lib/python3.5/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/home/sylyvahn/anaconda3/envs/rllab/lib/python3.5/multiprocessing/pool.py", line 44, in mapstar
return list(map(*args))
File "/home/sylyvahn/Downloads/AIRL-ahq1993/inverse_rl-master/inverse_rl/utils/hyper_sweep.py", line 47, in kwargs_wrapper
return method(**args)
File "ant_data_collect.py", line 16, in main
env = TfEnv(CustomGymEnv('CustomAnt-v0', record_video=False, record_log=False, force_reset=False))
File "/home/sylyvahn/Downloads/AIRL-ahq1993/inverse_rl-master/inverse_rl/envs/env_utils.py", line 123, in init
video_schedule=FixedIntervalVideoSchedule(50), force_reset=force_reset)
File "/home/sylyvahn/Downloads/AIRL-ahq1993/inverse_rl-master/inverse_rl/envs/env_utils.py", line 35, in init
env = gym.envs.make(env_name)
File "/home/sylyvahn/anaconda3/envs/rllab/lib/python3.5/site-packages/gym/envs/registration.py", line 171, in make
return registry.make(id, **kwargs)
File "/home/sylyvahn/anaconda3/envs/rllab/lib/python3.5/site-packages/gym/envs/registration.py", line 123, in make
env = spec.make(**kwargs)
File "/home/sylyvahn/anaconda3/envs/rllab/lib/python3.5/site-packages/gym/envs/registration.py", line 87, in make
env = cls(**_kwargs)
File "/home/sylyvahn/Downloads/AIRL-ahq1993/inverse_rl-master/inverse_rl/envs/ant_env.py", line 209, in __init__
mujoco_env.MujocoEnv.__init__(self, f.name, 5)
File "/home/sylyvahn/anaconda3/envs/rllab/lib/python3.5/contextlib.py", line 77, in __exit__
self.gen.throw(type, value, traceback)
File "/home/sylyvahn/Downloads/AIRL-ahq1993/inverse_rl-master/inverse_rl/envs/dynamic_mjc/model_builder.py", line 57, in asfile
yield f
File "/home/sylyvahn/Downloads/AIRL-ahq1993/inverse_rl-master/inverse_rl/envs/ant_env.py", line 209, in __init__
mujoco_env.MujocoEnv.__init__(self, f.name, 5)
File "/home/sylyvahn/anaconda3/envs/rllab/lib/python3.5/site-packages/gym/envs/mujoco/mujoco_env.py", line 42, in init
observation, _reward, done, _info = self.step(np.zeros(self.model.nu))
File "/home/sylyvahn/anaconda3/envs/rllab/lib/python3.5/site-packages/gym/core.py", line 62, in step
raise NotImplementedError
NotImplementedError
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "ant_data_collect.py", line 41, in
run_sweep_parallel(main, params_dict, repeat=4)
File "/home/sylyvahn/Downloads/AIRL-ahq1993/inverse_rl-master/inverse_rl/utils/hyper_sweep.py", line 57, in run_sweep_parallel
pool.map(kwargs_wrapper, exp_args)
File "/home/sylyvahn/anaconda3/envs/rllab/lib/python3.5/multiprocessing/pool.py", line 260, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/home/sylyvahn/anaconda3/envs/rllab/lib/python3.5/multiprocessing/pool.py", line 608, in get
raise self._value
NotImplementedError
Exception ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>
BrokenPipeError: [Errno 32] Broken pipe
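The final NotImplementedError is raised in gym's base Env.step, which points to a gym version mismatch: this repo's environments implement the legacy _step() hook, while newer gym releases expect step() to be overridden directly, and MujocoEnv.__init__ calls self.step() during construction, which is why it fails inside gym.envs.make. Downgrading gym to the version rllab expects is the usual fix; a hypothetical monkey-patch, with the class name assumed, would be:

import gym
from inverse_rl.envs import ant_env  # module path taken from the traceback

# 'CustomAntEnv' is an assumed class name; check ant_env.py for the real one.
# Newer gym also renamed MuJoCo internals, so pinning an older gym is the
# more reliable fix than this patch.
if ant_env.CustomAntEnv.step is gym.Env.step:
    # Route the modern API to the legacy hook the env actually implements.
    ant_env.CustomAntEnv.step = ant_env.CustomAntEnv._step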
The current policy optimisations for IRL do not allow for a discrete (CategoricalMLPPolicy) policy. Is there a specific reason why there is no support for discrete action spaces?
Thank you
What is the best way to roll out the learned reward to visualize the results of IRL?
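One generic approach, sketched here without relying on this repo's API: wrap the environment so rollouts are scored by the learned reward, then train or evaluate a policy in the wrapped env and render the episodes. The reward_fn callable is an assumption; in this codebase it would have to be built from the trained discriminator.

class LearnedRewardEnv:
    """Sketch: replace the true reward with a learned one (not this repo's API)."""

    def __init__(self, env, reward_fn):
        self.env = env
        self.reward_fn = reward_fn   # assumed callable: (obs, action) -> float
        self._last_obs = None

    def reset(self):
        self._last_obs = self.env.reset()
        return self._last_obs

    def step(self, action):
        obs, _, done, info = self.env.step(action)
        r = self.reward_fn(self._last_obs, action)  # learned reward, not env's
        self._last_obs = obs
        return obs, r, done, info

    def render(self, *args, **kwargs):
        return self.env.render(*args, **kwargs)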
What would it take to reimplement this program in PyTorch?
In the AIRL class there is an assert for the action space of the environments.
assert isinstance(env.action_space, Box)
The action space of my environment is actually discrete. Is there a reason why this only works for continuous actions rather than discrete ones, and how should one proceed to allow discrete actions?
Thank you!
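A common workaround for the question above, assuming the discriminator only needs a fixed-size real-valued action vector: one-hot encode discrete actions so they look like Box-shaped inputs. A minimal sketch:

import numpy as np

def one_hot_actions(actions, n_actions):
    """Convert integer actions (shape [N]) into one-hot rows (shape [N, n_actions])."""
    actions = np.asarray(actions, dtype=int)
    onehot = np.zeros((actions.shape[0], n_actions))
    onehot[np.arange(actions.shape[0]), actions] = 1.0
    return onehot

The Box assert would still need relaxing, and the pi(a|s) term in the discriminator could presumably use the categorical action probabilities directly.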
In the GAIL and GCL papers it is specified that the policy is sampled and optimized N times before each update of the discriminator. But in the GAIL and GCL implementations here, the policy is sampled only once. Is there any particular reason for this?
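For reference, the alternation the papers describe looks like the schematic below; every helper here is a placeholder, not this repo's API:

def sample_trajectories(policy, env):
    return []  # placeholder: roll out the current policy in env

def policy_update(policy, trajs, discriminator):
    pass  # placeholder: one policy-optimization step against the learned reward

def discriminator_update(discriminator, policy_trajs, expert_trajs):
    pass  # placeholder: one classifier step on policy vs. expert samples

def train(policy, discriminator, env, expert_trajs, num_iters, n_policy_steps):
    # Assumes n_policy_steps >= 1.
    for _ in range(num_iters):
        for _ in range(n_policy_steps):  # N policy steps per discriminator update
            trajs = sample_trajectories(policy, env)
            policy_update(policy, trajs, discriminator)
        discriminator_update(discriminator, trajs, expert_trajs)

The implementation here appears to use n_policy_steps = 1, which is what the question is pointing at.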
Hi,
I have a question regarding the re-optimization of the learned reward function in the state-only case.
I cannot tell whether the function that is re-optimized is only r(s) or the entire f(s, s') = r(s) + gamma * v(s') - v(s).
Thanks!
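For context, the AIRL paper parameterizes the state-only discriminator as

D(s, a, s') = exp(f(s, s')) / (exp(f(s, s')) + pi(a|s)), where f(s, s') = g(s) + gamma * h(s') - h(s),

with g interpreted as the reward term and h as a potential-based shaping term; the question above is exactly whether the re-optimized quantity is g alone or the full f.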
Hi,
I was wondering if there is any update regarding when the full code for AIRL will be released now that the paper has been accepted to ICLR.
Thanks
Hi,
In your example script (i.e. scripts/pendulum_gcl.py), it runs with GCLDiscrim, which is the single-timestep version of GCL.
Could you provide an example script with GCLDiscrimTrajectory?
Is it enough if I modify
irl_model = GCLDiscrim(env_spec=env.spec, expert_trajs=experts)
to
irl_model = GCLDiscrimTrajectory(env_spec=env.spec, expert_trajs=experts)
in that script?
Thanks