inverse_rl's People

Contributors

justinjfu

inverse_rl's Issues

Discrepancy between the Ant experiment in the paper and the code.

Hi,
Thank you very much for the open research and the code. I've implemented your environment for my research (with appropriate references). After running SAC on the CustomAnt environment, I realized that the reward does not reorient the agent: the Ant still walks sideways. I also can't find anything in the code that encodes the behavior shown in Figure 5 of the paper.

I am looking forward to your kind reply,

Thanks,

Code with the discrepancy:
The reorientation is not implemented in the following code.

def _step(self, a):
    vel = self.model.data.qvel.flat[0]
    forward_reward = vel
    self.do_simulation(a, self.frame_skip)

    ctrl_cost = .01 * np.square(a).sum()
    contact_cost = 0.5 * 1e-3 * np.sum(
        np.square(np.clip(self.model.data.cfrc_ext, -1, 1)))
    state = self.state_vector()
    flipped = not (state[2] >= 0.2)
    flipped_rew = -1 if flipped else 0
    reward = forward_reward - ctrl_cost - contact_cost + flipped_rew

    self.timesteps += 1
    done = self.timesteps >= self.max_timesteps

    ob = self._get_obs()
    return ob, reward, done, dict(
        reward_forward=forward_reward,
        reward_ctrl=-ctrl_cost,
        reward_contact=-contact_cost,
        reward_flipped=flipped_rew)
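
To check the claim without a MuJoCo install, the reward in the excerpt can be restated as a pure function. This is a minimal sketch, not the repository's code: `ant_reward` and its argument names are hypothetical, and the inputs stand in for `qvel.flat[0]`, the action vector, the flattened `cfrc_ext` values, and `state[2]`. None of the terms takes the torso heading as an input, which is consistent with the observation that nothing in this reward penalizes walking sideways.

```python
def ant_reward(x_velocity, action, contact_forces, torso_height):
    """Recompute the reward terms from the excerpt on plain Python values.

    Hypothetical helper for illustration only; the argument names are
    assumptions and do not exist in the repository.
    """
    forward_reward = x_velocity
    ctrl_cost = 0.01 * sum(a * a for a in action)
    # Contact forces are clipped to [-1, 1] before squaring, as in the excerpt.
    clipped = [max(-1.0, min(1.0, f)) for f in contact_forces]
    contact_cost = 0.5 * 1e-3 * sum(f * f for f in clipped)
    # The agent counts as flipped when the torso height is not at least 0.2.
    flipped_rew = -1 if not (torso_height >= 0.2) else 0
    total = forward_reward - ctrl_cost - contact_cost + flipped_rew
    info = dict(
        reward_forward=forward_reward,
        reward_ctrl=-ctrl_cost,
        reward_contact=-contact_cost,
        reward_flipped=flipped_rew,
    )
    return total, info


if __name__ == "__main__":
    total, info = ant_reward(1.0, [0.0] * 8, [0.0], 0.5)
    print(total, info)
```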

NotImplementedError

Thank you very much for releasing your code. When I run ant_data_collect.py, the following error occurs:

python ant_data_collect.py
2019-03-25 15:35:17.709269 CST | Warning: skipping Gym environment monitoring since snapshot_dir not configured.
2019-03-25 15:35:17.709250 CST | Warning: skipping Gym environment monitoring since snapshot_dir not configured.
2019-03-25 15:35:17.709358 CST | Warning: skipping Gym environment monitoring since snapshot_dir not configured.
2019-03-25 15:35:17.709459 CST | Warning: skipping Gym environment monitoring since snapshot_dir not configured.
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/sylyvahn/anaconda3/envs/rllab/lib/python3.5/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/home/sylyvahn/anaconda3/envs/rllab/lib/python3.5/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/home/sylyvahn/Downloads/AIRL-ahq1993/inverse_rl-master/inverse_rl/utils/hyper_sweep.py", line 47, in kwargs_wrapper
    return method(**args)
  File "ant_data_collect.py", line 16, in main
    env = TfEnv(CustomGymEnv('CustomAnt-v0', record_video=False, record_log=False, force_reset=False))
  File "/home/sylyvahn/Downloads/AIRL-ahq1993/inverse_rl-master/inverse_rl/envs/env_utils.py", line 123, in __init__
    video_schedule=FixedIntervalVideoSchedule(50), force_reset=force_reset)
  File "/home/sylyvahn/Downloads/AIRL-ahq1993/inverse_rl-master/inverse_rl/envs/env_utils.py", line 35, in __init__
    env = gym.envs.make(env_name)
  File "/home/sylyvahn/anaconda3/envs/rllab/lib/python3.5/site-packages/gym/envs/registration.py", line 171, in make
    return registry.make(id, **kwargs)
  File "/home/sylyvahn/anaconda3/envs/rllab/lib/python3.5/site-packages/gym/envs/registration.py", line 123, in make
    env = spec.make(**kwargs)
  File "/home/sylyvahn/anaconda3/envs/rllab/lib/python3.5/site-packages/gym/envs/registration.py", line 87, in make
    env = cls(**_kwargs)
  File "/home/sylyvahn/Downloads/AIRL-ahq1993/inverse_rl-master/inverse_rl/envs/ant_env.py", line 209, in __init__
    mujoco_env.MujocoEnv.__init__(self, f.name, 5)
  File "/home/sylyvahn/anaconda3/envs/rllab/lib/python3.5/contextlib.py", line 77, in __exit__
    self.gen.throw(type, value, traceback)
  File "/home/sylyvahn/Downloads/AIRL-ahq1993/inverse_rl-master/inverse_rl/envs/dynamic_mjc/model_builder.py", line 57, in asfile
    yield f
  File "/home/sylyvahn/Downloads/AIRL-ahq1993/inverse_rl-master/inverse_rl/envs/ant_env.py", line 209, in __init__
    mujoco_env.MujocoEnv.__init__(self, f.name, 5)
  File "/home/sylyvahn/anaconda3/envs/rllab/lib/python3.5/site-packages/gym/envs/mujoco/mujoco_env.py", line 42, in __init__
    observation, _reward, done, _info = self.step(np.zeros(self.model.nu))
  File "/home/sylyvahn/anaconda3/envs/rllab/lib/python3.5/site-packages/gym/core.py", line 62, in step
    raise NotImplementedError
NotImplementedError
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "ant_data_collect.py", line 41, in <module>
    run_sweep_parallel(main, params_dict, repeat=4)
  File "/home/sylyvahn/Downloads/AIRL-ahq1993/inverse_rl-master/inverse_rl/utils/hyper_sweep.py", line 57, in run_sweep_parallel
    pool.map(kwargs_wrapper, exp_args)
  File "/home/sylyvahn/anaconda3/envs/rllab/lib/python3.5/multiprocessing/pool.py", line 260, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/sylyvahn/anaconda3/envs/rllab/lib/python3.5/multiprocessing/pool.py", line 608, in get
    raise self._value
NotImplementedError
Exception ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>
BrokenPipeError: [Errno 32] Broken pipe
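
The last frames of the worker traceback show the base class in gym/core.py raising from `step`, while the Ant excerpt quoted in the previous issue defines `_step`. One plausible cause, stated here as an assumption and not confirmed by the maintainers, is a Gym version mismatch: newer Gym releases expect environments to override `step` and `reset` directly, whereas environments written against older releases define `_step` and `_reset`. The sketch below reproduces the failure and one possible shim using stand-in classes only; `BaseEnv`, `LegacyStepMixin`, `OldStyleEnv`, and `PatchedEnv` are hypothetical names and nothing here imports Gym.

```python
class BaseEnv:
    """Stand-in for a Gym-style base class whose step()/reset() must be overridden."""

    def step(self, action):
        raise NotImplementedError

    def reset(self):
        raise NotImplementedError


class LegacyStepMixin:
    """Forward the new-style method names to the old-style underscore methods."""

    def step(self, action):
        return self._step(action)

    def reset(self):
        return self._reset()


class OldStyleEnv(BaseEnv):
    """Environment written against the old convention: only _step/_reset exist."""

    def _step(self, action):
        # Returns (observation, reward, done, info) with placeholder values.
        return [0.0], float(action), False, {}

    def _reset(self):
        return [0.0]


class PatchedEnv(LegacyStepMixin, OldStyleEnv):
    """The mixin is listed first so its step() comes first in the MRO."""


if __name__ == "__main__":
    try:
        OldStyleEnv().step(1)
    except NotImplementedError:
        print("old-style env: step() is not implemented")
    print(PatchedEnv().step(1))
```

An alternative that avoids touching the environment code is to install the Gym release the repository was developed against, if its requirements state one.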

Assert in AIRL

In the AIRL class there is an assert for the action space of the environments.

assert isinstance(env.action_space, Box)

The action space of my environment is actually discrete. Is there a reason why this only works for continuous actions and not for discrete ones? How should one proceed to allow for discrete actions?

Thank you!
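
One common workaround, offered here as an assumption and not as something the repository supports, is to present a discrete action to the reward or discriminator network as a one-hot vector, so that it can be concatenated with the observation the same way a continuous action would be. The sketch below is plain Python; `one_hot_action` and `discriminator_input` are hypothetical helpers. The policy side would still need separate changes (a categorical policy in place of a Gaussian one), which this sketch does not cover.

```python
def one_hot_action(action_index, num_actions):
    """Encode a discrete action index as a one-hot list of floats."""
    if not 0 <= action_index < num_actions:
        raise ValueError("action_index must be in [0, num_actions)")
    vec = [0.0] * num_actions
    vec[action_index] = 1.0
    return vec


def discriminator_input(observation, action_index, num_actions):
    """Concatenate an observation with the one-hot encoded action."""
    return list(observation) + one_hot_action(action_index, num_actions)


if __name__ == "__main__":
    print(discriminator_input([0.5, -1.0], action_index=1, num_actions=3))
```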

Number of iterations of policy optimization

The GAIL and GCL papers specify that policy sampling and optimization are executed N times before each update of the discriminator. In the GAIL and GCL implementations here, however, the policy is sampled only once per discriminator update. Is there any particular reason for this?
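
The two schedules being compared can be written as one loop with a single knob. This is a schematic sketch of the schedule as the question describes it, not the repository's training loop: `train_irl` and its callable arguments are hypothetical, and passing only the most recent batch of paths to the discriminator is an assumption made to keep the example short. Setting `policy_steps_per_disc_update=1` gives the one-sample-per-update behavior the question attributes to the implementation.

```python
def train_irl(num_iterations, policy_steps_per_disc_update,
              sample_fn, policy_update_fn, discriminator_update_fn):
    """Alternate N policy steps with one discriminator update per iteration."""
    if policy_steps_per_disc_update < 1:
        raise ValueError("policy_steps_per_disc_update must be at least 1")
    for _ in range(num_iterations):
        paths = None
        for _ in range(policy_steps_per_disc_update):
            paths = sample_fn()          # roll out the current policy
            policy_update_fn(paths)      # one policy optimization step
        discriminator_update_fn(paths)   # one discriminator/reward update


if __name__ == "__main__":
    log = []
    train_irl(
        num_iterations=2,
        policy_steps_per_disc_update=3,
        sample_fn=lambda: log.append("sample") or "paths",
        policy_update_fn=lambda paths: log.append("policy"),
        discriminator_update_fn=lambda paths: log.append("disc"),
    )
    print(log)
```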

Doubt on the re-optimization of reward

Hi,

I have a question regarding the re-optimization of the learned reward function in the state-only case.

I cannot tell whether the function that is re-optimized is only r(s) or the entire f(s, s') = r(s) + gamma*v(s') - v(s).

Thanks!
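
To make the two candidates concrete, the sketch below computes f(s, s') from r, v, and gamma exactly as written in the question, and checks numerically that along a trajectory the discounted sum of f differs from the discounted sum of r only by boundary terms in v. It does not settle which function the authors re-optimize; `shaped_reward`, `discounted_sum`, and the numbers are hypothetical and chosen only for illustration.

```python
def shaped_reward(r_s, v_s, v_next, gamma):
    """f(s, s') = r(s) + gamma * v(s') - v(s), as written in the question."""
    return r_s + gamma * v_next - v_s


def discounted_sum(values, gamma):
    """Sum of gamma**t * values[t] over the sequence."""
    return sum((gamma ** t) * x for t, x in enumerate(values))


# Made-up numbers: r(s_0..s_2) and v(s_0..s_3) along one trajectory.
rewards = [1.0, 0.5, 2.0]
values = [0.3, -0.2, 0.7, 0.1]
gamma = 0.9

f_values = [
    shaped_reward(rewards[t], values[t], values[t + 1], gamma)
    for t in range(len(rewards))
]

# The v terms telescope, leaving only the first and last state values.
lhs = discounted_sum(f_values, gamma)
rhs = (discounted_sum(rewards, gamma)
       + gamma ** len(rewards) * values[-1] - values[0])

if __name__ == "__main__":
    print(lhs, rhs)
```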

Full Code Release

Hi,

I was wondering if there is any update regarding when the full code for AIRL will be released now that the paper has been accepted to ICLR.

Thanks

Examples with GCLDiscrimTrajectory

Hi,

Your example script (scripts/pendulum_gcl.py) runs with GCLDiscrim, which is the single-timestep version of GCL.

Could you provide an example script with GCLDiscrimTrajectory?
Is it enough if I modify
irl_model = GCLDiscrim(env_spec=env.spec, expert_trajs=experts)
to
irl_model = GCLDiscrimTrajectory(env_spec=env.spec, expert_trajs=experts)
in that script?

Thanks
