justinjfu / inverse_rl
License: MIT License
Hi,
Thank you very much for releasing your code. Could you please tell me how to run this code on other environments, like HalfCheetah?
Thanks
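For what it's worth, a traceback in a later issue on this page shows how ant_data_collect.py constructs its environment, so one plausible starting point is swapping the gym environment id. A sketch only: the import paths and the 'HalfCheetah-v1' id are assumptions inferred from the traceback and the gym version of that era, not verified against the repo.

# Sketch only: import paths and env id are assumptions, not verified.
from sandbox.rocky.tf.envs.base import TfEnv            # rllab's TF env wrapper
from inverse_rl.envs.env_utils import CustomGymEnv      # path seen in the traceback

env = TfEnv(CustomGymEnv('HalfCheetah-v1',
                         record_video=False, record_log=False,
                         force_reset=False))

Since CustomGymEnv ultimately calls gym.envs.make(env_name) (per the traceback), any registered gym id should be accepted.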
Hi,
Thank you very much for the open research and the code. I've used your environment in my research (with appropriate references). After running SAC on the CustomAnt environment, I have noticed that the reward does not reorient the agent: the Ant still walks sideways. I also can't find, in the code, any objective for the reorientation behavior mentioned in the paper (Figure 5).
I am looking forward to your kind reply,
Thanks,
Code with the discrepancy:
No reorientation term appears in the following reward function.
def _step(self, a):
    # Forward reward uses the x-velocity read *before* stepping the simulation.
    vel = self.model.data.qvel.flat[0]
    forward_reward = vel
    self.do_simulation(a, self.frame_skip)
    # Standard ant penalties: control effort and clipped contact forces.
    ctrl_cost = .01 * np.square(a).sum()
    contact_cost = 0.5 * 1e-3 * np.sum(
        np.square(np.clip(self.model.data.cfrc_ext, -1, 1)))
    state = self.state_vector()
    # Only an uprightness check: -1 penalty when the torso drops below 0.2.
    # There is no term that rewards facing any particular direction.
    flipped = not (state[2] >= 0.2)
    flipped_rew = -1 if flipped else 0
    reward = forward_reward - ctrl_cost - contact_cost + flipped_rew
    self.timesteps += 1
    done = self.timesteps >= self.max_timesteps
    ob = self._get_obs()
    return ob, reward, done, dict(
        reward_forward=forward_reward,
        reward_ctrl=-ctrl_cost,
        reward_contact=-contact_cost,
        reward_flipped=flipped_rew)
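For reference, a reorientation term would have to reward the torso's heading, not just its height. Below is a hypothetical sketch, not this repo's code; it assumes the MuJoCo ant's root quaternion sits at qpos[3:7] in (w, x, y, z) order:

import numpy as np

def heading_reward(qpos, weight=1.0):
    """Hypothetical bonus for facing the +x direction (not this repo's code)."""
    w, x, y, z = qpos[3:7]
    # Yaw (rotation about the vertical axis) extracted from the quaternion.
    yaw = np.arctan2(2.0 * (w * z + x * y), 1.0 - 2.0 * (y * y + z * z))
    # cos(yaw) is 1 when facing +x and -1 when facing -x.
    return weight * np.cos(yaw)

A term like this could be added to the reward sum in _step alongside forward_reward.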
Thank you very much for releasing your code. When I run ant_data_collect.py, the following problem occurred:
python ant_data_collect.py
2019-03-25 15:35:17.709269 CST | Warning: skipping Gym environment monitoring since snapshot_dir not configured.
2019-03-25 15:35:17.709250 CST | Warning: skipping Gym environment monitoring since snapshot_dir not configured.
2019-03-25 15:35:17.709358 CST | Warning: skipping Gym environment monitoring since snapshot_dir not configured.
2019-03-25 15:35:17.709459 CST | Warning: skipping Gym environment monitoring since snapshot_dir not configured.
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/sylyvahn/anaconda3/envs/rllab/lib/python3.5/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/home/sylyvahn/anaconda3/envs/rllab/lib/python3.5/multiprocessing/pool.py", line 44, in mapstar
return list(map(*args))
File "/home/sylyvahn/Downloads/AIRL-ahq1993/inverse_rl-master/inverse_rl/utils/hyper_sweep.py", line 47, in kwargs_wrapper
return method(**args)
File "ant_data_collect.py", line 16, in main
env = TfEnv(CustomGymEnv('CustomAnt-v0', record_video=False, record_log=False, force_reset=False))
File "/home/sylyvahn/Downloads/AIRL-ahq1993/inverse_rl-master/inverse_rl/envs/env_utils.py", line 123, in init
video_schedule=FixedIntervalVideoSchedule(50), force_reset=force_reset)
File "/home/sylyvahn/Downloads/AIRL-ahq1993/inverse_rl-master/inverse_rl/envs/env_utils.py", line 35, in init
env = gym.envs.make(env_name)
File "/home/sylyvahn/anaconda3/envs/rllab/lib/python3.5/site-packages/gym/envs/registration.py", line 171, in make
return registry.make(id, **kwargs)
File "/home/sylyvahn/anaconda3/envs/rllab/lib/python3.5/site-packages/gym/envs/registration.py", line 123, in make
env = spec.make(**kwargs)
File "/home/sylyvahn/anaconda3/envs/rllab/lib/python3.5/site-packages/gym/envs/registration.py", line 87, in make
env = cls(**_kwargs)
File "/home/sylyvahn/Downloads/AIRL-ahq1993/inverse_rl-master/inverse_rl/envs/ant_env.py", line 209, in __init__
mujoco_env.MujocoEnv.__init__(self, f.name, 5)
File "/home/sylyvahn/anaconda3/envs/rllab/lib/python3.5/contextlib.py", line 77, in __exit__
self.gen.throw(type, value, traceback)
File "/home/sylyvahn/Downloads/AIRL-ahq1993/inverse_rl-master/inverse_rl/envs/dynamic_mjc/model_builder.py", line 57, in asfile
yield f
File "/home/sylyvahn/Downloads/AIRL-ahq1993/inverse_rl-master/inverse_rl/envs/ant_env.py", line 209, in __init__
mujoco_env.MujocoEnv.__init__(self, f.name, 5)
File "/home/sylyvahn/anaconda3/envs/rllab/lib/python3.5/site-packages/gym/envs/mujoco/mujoco_env.py", line 42, in init
observation, _reward, done, _info = self.step(np.zeros(self.model.nu))
File "/home/sylyvahn/anaconda3/envs/rllab/lib/python3.5/site-packages/gym/core.py", line 62, in step
raise NotImplementedError
NotImplementedError
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "ant_data_collect.py", line 41, in
run_sweep_parallel(main, params_dict, repeat=4)
File "/home/sylyvahn/Downloads/AIRL-ahq1993/inverse_rl-master/inverse_rl/utils/hyper_sweep.py", line 57, in run_sweep_parallel
pool.map(kwargs_wrapper, exp_args)
File "/home/sylyvahn/anaconda3/envs/rllab/lib/python3.5/multiprocessing/pool.py", line 260, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/home/sylyvahn/anaconda3/envs/rllab/lib/python3.5/multiprocessing/pool.py", line 608, in get
raise self._value
NotImplementedError
Exception ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>
BrokenPipeError: [Errno 32] Broken pipe
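The final NotImplementedError is raised in gym's base Env.step, which points to a gym version mismatch: this repo's environments implement the legacy _step() hook, while newer gym releases expect step() to be overridden directly, and MujocoEnv.__init__ calls self.step() during construction, which is why it fails inside gym.envs.make. Downgrading gym to the version rllab expects is the usual fix; a hypothetical monkey-patch, with the class name assumed, would be:

import gym
from inverse_rl.envs import ant_env  # module path taken from the traceback

# 'CustomAntEnv' is an assumed class name; check ant_env.py for the real one.
# Newer gym also renamed MuJoCo internals, so pinning an older gym is the
# more reliable fix than this patch.
if ant_env.CustomAntEnv.step is gym.Env.step:
    # Route the modern API to the legacy hook the env actually implements.
    ant_env.CustomAntEnv.step = ant_env.CustomAntEnv._step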
The current policy optimisations for IRL do not allow for a discrete (CategoricalMLPPolicy) policy. Is there a specific reason why there is no support for discrete action spaces?
Thank you
What is the best way to roll out the learned reward to visualize the results of IRL?
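One generic approach, sketched here without relying on this repo's API: wrap the environment so rollouts are scored by the learned reward, then train or evaluate a policy in the wrapped env and render the episodes. The reward_fn callable is an assumption; in this codebase it would have to be built from the trained discriminator.

class LearnedRewardEnv:
    """Sketch: replace the true reward with a learned one (not this repo's API)."""

    def __init__(self, env, reward_fn):
        self.env = env
        self.reward_fn = reward_fn   # assumed callable: (obs, action) -> float
        self._last_obs = None

    def reset(self):
        self._last_obs = self.env.reset()
        return self._last_obs

    def step(self, action):
        obs, _, done, info = self.env.step(action)
        r = self.reward_fn(self._last_obs, action)  # learned reward, not env's
        self._last_obs = obs
        return obs, r, done, info

    def render(self, *args, **kwargs):
        return self.env.render(*args, **kwargs)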
What would it take to reimplement this program in PyTorch?
In the AIRL class there is an assert for the action space of the environments.
assert isinstance(env.action_space, Box)
The action space of my environment is actually discrete. Is there a reason why this only works for continuous actions rather than discrete ones, and how should one proceed to allow discrete actions?
Thank you!
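A common workaround for the question above, assuming the discriminator only needs a fixed-size real-valued action vector: one-hot encode discrete actions so they look like Box-shaped inputs. A minimal sketch:

import numpy as np

def one_hot_actions(actions, n_actions):
    """Convert integer actions (shape [N]) into one-hot rows (shape [N, n_actions])."""
    actions = np.asarray(actions, dtype=int)
    onehot = np.zeros((actions.shape[0], n_actions))
    onehot[np.arange(actions.shape[0]), actions] = 1.0
    return onehot

The Box assert would still need relaxing, and the pi(a|s) term in the discriminator could presumably use the categorical action probabilities directly.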
In the GAIL and GCL papers it is specified that the policy is sampled and optimized N times before each update of the discriminator. But in the GAIL and GCL implementations here, the policy is sampled only once. Is there any particular reason for this?
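For reference, the alternation the papers describe looks like the schematic below; every helper here is a placeholder, not this repo's API:

def sample_trajectories(policy, env):
    return []  # placeholder: roll out the current policy in env

def policy_update(policy, trajs, discriminator):
    pass  # placeholder: one policy-optimization step against the learned reward

def discriminator_update(discriminator, policy_trajs, expert_trajs):
    pass  # placeholder: one classifier step on policy vs. expert samples

def train(policy, discriminator, env, expert_trajs, num_iters, n_policy_steps):
    # Assumes n_policy_steps >= 1.
    for _ in range(num_iters):
        for _ in range(n_policy_steps):  # N policy steps per discriminator update
            trajs = sample_trajectories(policy, env)
            policy_update(policy, trajs, discriminator)
        discriminator_update(discriminator, trajs, expert_trajs)

The implementation here appears to use n_policy_steps = 1, which is what the question is pointing at.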
Hi,
I have a question regarding the re-optimization of the learned reward function in the state-only case.
I cannot tell whether the function that is re-optimized is only r(s) or the entire f(s, s') = r(s) + gamma * v(s') - v(s).
Thanks!
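For context, the AIRL paper parameterizes the state-only discriminator as

D(s, a, s') = exp(f(s, s')) / (exp(f(s, s')) + pi(a|s)), where f(s, s') = g(s) + gamma * h(s') - h(s),

with g interpreted as the reward term and h as a potential-based shaping term; the question above is exactly whether the re-optimized quantity is g alone or the full f.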
Hi,
I was wondering if there is any update regarding when the full code for AIRL will be released now that the paper has been accepted to ICLR.
Thanks
Hi,
In your example script (i.e. scripts/pendulum_gcl.py), it runs with GCLDiscrim, which is the single-timestep version of GCL.
Could you provide an example script with GCLDiscrimTrajectory?
Is it enough if I modify
irl_model = GCLDiscrim(env_spec=env.spec, expert_trajs=experts)
to
irl_model = GCLDiscrimTrajectory(env_spec=env.spec, expert_trajs=experts)
in that script?
Thanks