Dear Author, it's an honor to see your paper and code! I am a novice in this area and

Issue on reproduce MuJoCo results-HalfCheetah-v2 about iq-learn HOT 14 CLOSED

div99 commented on May 26, 2024

Issue on reproduce MuJoCo results-HalfCheetah-v2

from iq-learn.

Comments (14)

shuoye1000 commented on May 26, 2024

Thanks to your tireless guidance, I finally solved the problem. Thank you very much for your patience! Beyond that, I found a few other minor issues. The following seem to be missing from the repository: the Humanoid experiment data (parameters, expert data, etc.), the Breakout expert data, and the Space Invaders expert data. If you could upload these, it would help me reproduce your paper more faithfully. If the files are too large, zipping them seems like a good option. You are a very charismatic author, good luck with your work!

Div99 commented on May 26, 2024

Hi, the link to the Atari datasets is fixed. For Humanoid, the expert data got accidentally deleted, and I will try to retrain a SAC agent if I can find time on the weekend to collect new expert trajectories.

Div99 commented on May 26, 2024

Hi, please set agent.learn_temp=False and it should work. In my experience, SAC-style temperature learning is not very stable with IQ-Learn and can prevent the method from converging. I pushed a change that disables temperature learning by default.

Div99 commented on May 26, 2024

Here are some training results from runs I did yesterday with 1 and 10 expert demos, using the hyperparameters in run_mujoco.sh with temperature learning disabled:

Div99 commented on May 26, 2024

Important update: I found that temperature learning can also work well if the SAC target_entropy parameter is set much lower than -dim(A), since imitation learning settings do not need much exploration. For HalfCheetah, setting target_entropy=-24 works well, so empirically 4 * -dim(A) could be a good value to try with IQ-Learn.
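
The heuristic above can be sketched in a few lines. This is a minimal illustration rather than code from the iq-learn repository: the function name `target_entropy` and its `scale` argument are hypothetical, and the action dimension of 6 is the HalfCheetah action-space size, consistent with the quoted target_entropy=-24 (that is, 4 * -6).

```python
def target_entropy(action_dim: int, scale: float = 1.0) -> float:
    """Return a SAC target entropy of -scale * dim(A).

    scale=1.0 gives the common SAC default of -dim(A); scale=4.0 gives the
    lower value suggested in this thread for imitation learning.
    """
    if action_dim <= 0:
        raise ValueError("action_dim must be positive")
    return -scale * action_dim


# HalfCheetah has a 6-dimensional action space.
default_target = target_entropy(6)        # usual SAC default
il_target = target_entropy(6, scale=4.0)  # lower target suggested for IQ-Learn
```

The resulting value would then be passed as the target_entropy setting of the SAC agent; how that setting is exposed in the configuration depends on the repository version.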

Reward Curves:

shuoye1000 commented on May 26, 2024

Following your suggestion, I re-downloaded your code and re-ran the script. This is the command I used:
python train_iq.py env=cheetah agent=sac expert.demos=10 method.loss=value method.regularize=True agent.actor_lr=3e-05 seed=0 agent.learn_temp=False
This is the parameter output from my console.

Sorry if this is a basic question: do I need to set the agent's obs_dim and action_dim myself?
And here are the training results I've gotten so far:

I would be very grateful for your guidance!

Div99 commented on May 26, 2024

Hi, the obs_dim and action_dim are set automatically by the code and you can ignore them. The default SAC agent temperature in the project repo was too high, and I pushed a fix setting it to 1e-2. For the HalfCheetah environment, a temperature of either 1e-2 or 1e-3 should work very well.

BepfCp commented on May 26, 2024

Hi, thanks for your work and patient guidance. Just a small question: why does your SAC agent on HalfCheetah-v2 only reach about 5000 points? In my experience, SAC can reach at least 12000 points within 1M steps. See the similar results from OpenAI's SpinningUp or the picture below.

liushunyu commented on May 26, 2024

> Hi, the link to the Atari datasets is fixed. For Humanoid, the expert data got accidentally deleted, and I will try to retrain a SAC agent if I can find time on the weekend to collect new expert trajectories.

Hi, I cannot find the Space Invaders expert data at the link.

Div99 commented on May 26, 2024

For the HalfCheetah expert, we may have trained with a single critic instead of double critics, so its performance is not as high. (I don't know exactly why we did it, but the baselines we compared against, like ValueDICE, had similar expert performance, so we likely didn't optimize too much.) Nevertheless, a better expert should easily translate to better imitation with IQ-Learn, since it easily saturates the current expert performance at around the 5000 level.
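
For readers unfamiliar with the distinction: when a SAC expert is trained with double critics, the bootstrap target typically uses the minimum of two Q estimates, which counteracts overestimation. A minimal sketch of the two target computations follows, with plain floats standing in for network outputs. The function names are hypothetical and this is not the code used to train the expert discussed here.

```python
def soft_target_single(reward, done, q_next, log_prob_next, alpha, gamma=0.99):
    """Soft Bellman target from a single critic's next-state estimate."""
    # Soft state value: Q(s', a') - alpha * log pi(a' | s')
    v_next = q_next - alpha * log_prob_next
    # No bootstrapping past a terminal transition (done == 1.0).
    return reward + gamma * (1.0 - done) * v_next


def soft_target_double(reward, done, q1_next, q2_next, log_prob_next, alpha,
                       gamma=0.99):
    """Same target, but using the smaller of two critic estimates."""
    q_min = min(q1_next, q2_next)
    return soft_target_single(reward, done, q_min, log_prob_next, alpha, gamma)
```

With two critics that disagree, the double-critic target is never larger than the target built from either critic alone.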

Div99 commented on May 26, 2024

> Hi, I cannot find the Space Invaders expert data at the link.

Sorry for missing SpaceInvaders in the dataset release; I will add it to the GDrive.

liushunyu commented on May 26, 2024

> > Hi, I cannot find the Space Invaders expert data at the link.
>
> Sorry for missing SpaceInvaders in the dataset release; I will add it to the GDrive.

Thanks! By the way, can I run SQIL with your code? I found "sqil.yaml" in "iq_learn/conf/method". How do I run it? Also, do you provide a GAIL implementation that uses the same expert data?

Div99 commented on May 26, 2024

We used this repo for running GAIL: https://github.com/ikostrikov/pytorch-a2c-ppo-acktr-gail. You can add our data-loading setup to it in order to use our provided expert demos. The SQIL code was removed in newer commits of our repo, but it is very simple to implement, and you can check an old version of the codebase.
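
As a rough illustration of why SQIL is simple to implement: in the published SQIL formulation, expert transitions receive a constant reward of 1 and the agent's own transitions a reward of 0, and an off-policy soft Q-learning or SAC update is then run on batches that mix the two. The sketch below shows only that relabeling and mixing step. `Transition` and `make_sqil_batch` are hypothetical names, not functions from the iq-learn codebase or its removed SQIL code.

```python
import random
from typing import List, NamedTuple, Optional, Sequence


class Transition(NamedTuple):
    obs: Sequence[float]
    action: Sequence[float]
    reward: float
    next_obs: Sequence[float]
    done: bool


def make_sqil_batch(expert: Sequence[Transition],
                    agent: Sequence[Transition],
                    batch_size: int,
                    rng: Optional[random.Random] = None) -> List[Transition]:
    """Sample a half-expert, half-agent batch with SQIL's constant rewards."""
    if rng is None:
        rng = random.Random()
    half = batch_size // 2
    # Expert transitions are relabeled with reward 1, whatever they stored.
    expert_part = [t._replace(reward=1.0) for t in rng.choices(expert, k=half)]
    # Agent transitions are relabeled with reward 0.
    agent_part = [t._replace(reward=0.0)
                  for t in rng.choices(agent, k=batch_size - half)]
    batch = expert_part + agent_part
    rng.shuffle(batch)
    return batch
```

The returned batch can be fed to any standard off-policy critic update; the environment's true reward is never used.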

Div99 commented on May 26, 2024

I have added the expert datasets along with IQ-Learn results on Humanoid-v2. I also released a script to generate your own expert trajectories for new environments.
