Comments (14)
Thanks to your tireless guidance, I finally solved the problem. Thank you very much for your patience! Beyond that, I found some other minor issues. The following seem to be missing from the code: Humanoid experiment data (parameters, expert data, etc.), Breakout expert data, and Space Invaders expert data. If you could upload these, it would help me reproduce your paper. If the files are too large, zipping them seems like a good option. You are a very charismatic author, good luck with your work!
from iq-learn.
Hi, the link to the Atari datasets is fixed. For Humanoid, the expert data got accidentally deleted, and I will try to retrain a SAC agent if I can find time on the weekend to collect new expert trajectories.
from iq-learn.
Hi, please set agent.learn_temp=False and it should work. In my experience, SAC style temperature learning is not very stable with IQ-Learn and can prevent the method from converging. I pushed a change where the temperature learning is disabled by default.
from iq-learn.
Here are some training results I ran yesterday with 1 and 10 expert demos, with the hyperparams in run_mujoco.sh with the temp learning disabled:
from iq-learn.
Important update: I found that temperature learning can also work well if the SAC target_entropy param is set much lower than −dim(A), since we don't need much exploration in IL settings. For HalfCheetah, setting target_entropy=-24 works well. So empirically, 4 * -dim(A) could be a good value to try with IQ-Learn.
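As a rough sketch of the heuristic above, the target entropy can be derived from the action dimension. The helper name `il_target_entropy` and its `scale` argument are made up for illustration and are not part of the IQ-Learn codebase.

```python
def il_target_entropy(action_dim: int, scale: float = 4.0) -> float:
    """Return a SAC target entropy of -scale * dim(A).

    scale=1.0 gives the usual SAC default of -dim(A); scale=4.0 is the
    empirical suggestion from this thread for imitation learning.
    (Hypothetical helper, not an IQ-Learn function.)
    """
    if action_dim <= 0:
        raise ValueError("action_dim must be positive")
    return -scale * action_dim


# HalfCheetah has a 6-dimensional action space, which gives the -24
# value mentioned above.
half_cheetah_target = il_target_entropy(6)
```

The resulting number is what would be passed as the SAC target_entropy setting when temperature learning is left enabled.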
from iq-learn.
Regarding your suggestion, I re-downloaded your code and re-ran the script. This is the command I used: python train_iq.py env=cheetah agent=sac expert.demos=10 method.loss=value method.regularize=True agent.actor_lr=3e-05 seed=0 agent.learn_temp=False
This is the parameter output from my console.
Sorry if this is a basic question: do I need to modify the agent's obs_dim and action_dim myself?
And here are the training results I've gotten so far:
I would be very grateful for your guidance!
from iq-learn.
Hi, the obs_dim and action_dim are set automatically by the code and you can ignore them. The default SAC agent temperature in the project repo was too high, and I pushed a fix setting it to 1e-2. For the HalfCheetah environment, a temperature of either 1e-2 or 1e-3 should work very well.
from iq-learn.
Hi, thanks for your work and patient guidance. Just a small question: why does your SAC on HalfCheetah-v2 only reach 5000 points? In my experience, SAC can reach at least 12000 points within 1M steps. See similar results from OpenAI's Spinning Up or the picture below.
from iq-learn.
Hi, I cannot find the Space Invaders expert data at the link.
from iq-learn.
For the HalfCheetah expert, we may have trained with a single critic instead of double critics, so performance is not as high. (I don't know exactly why we did it, but the baselines we compared against, like ValueDICE, had similar expert performance, so we likely didn't optimize too much.) Nevertheless, a better expert should easily translate to better imitation with IQ-Learn, as it easily saturates the current expert performance at the ~5000 level.
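For readers unfamiliar with the distinction, here is a minimal sketch of how a single-critic soft target differs from the clipped double-critic target commonly used in SAC. It is not taken from the IQ-Learn repo or from the expert training code; the function name `soft_target` and the scalar inputs are hypothetical.

```python
def soft_target(reward, done, next_qs, next_log_prob, alpha=1e-2, gamma=0.99):
    """Soft Bellman target r + gamma * (1 - done) * (min_i Q_i - alpha * log pi).

    next_qs holds the target-critic estimates for the next state-action
    pair: one value for a single critic, two for double critics.
    (Illustrative helper with made-up inputs.)
    """
    next_v = min(next_qs) - alpha * next_log_prob
    return reward + gamma * (1.0 - done) * next_v


# With one critic the target uses that critic's estimate directly.
single = soft_target(1.0, 0.0, [10.0], next_log_prob=-1.0)
# With two critics the smaller estimate is used, which damps overestimation.
double = soft_target(1.0, 0.0, [10.0, 8.0], next_log_prob=-1.0)
```

Taking the minimum over two critics is a standard way to reduce value overestimation, which is one plausible reason a single-critic expert might plateau lower.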
from iq-learn.
Sorry for missing SpaceInvaders in the dataset release, I will add it to the GDrive.
from iq-learn.
Thanks!!! By the way, can I run SQIL based on your code? I found "sqil.yaml" in "iq_learn/conf/method". How can I run it? Also, do you provide a GAIL implementation that uses the same expert data you used?
from iq-learn.
We used this repo for running GAIL: https://github.com/ikostrikov/pytorch-a2c-ppo-acktr-gail. You can add our data-loading setup to it to use our provided expert demos. SQIL code was removed in the newer commits from our repo, but it is very simple to implement, and you can check an old version of the codebase.
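As a rough illustration of why SQIL is simple to add, here is a sketch of its core idea as described in the SQIL paper: expert transitions get a constant reward of 1, the agent's own transitions get a reward of 0, and each batch mixes the two in equal parts before an ordinary soft Q-learning or SAC update. This is not the code that was removed from the repo; `Transition` and `sqil_batch` are hypothetical names.

```python
import random
from typing import List, NamedTuple, Sequence


class Transition(NamedTuple):
    """One environment step (hypothetical container for this sketch)."""
    obs: Sequence[float]
    action: Sequence[float]
    reward: float
    next_obs: Sequence[float]
    done: bool


def sqil_batch(expert: List[Transition], policy: List[Transition],
               batch_size: int, rng=random) -> List[Transition]:
    """Build a SQIL batch: half expert data (reward 1), half policy data (reward 0).

    Sampling is done with replacement so small buffers do not raise errors.
    """
    half = batch_size // 2
    expert_part = [t._replace(reward=1.0) for t in rng.choices(expert, k=half)]
    policy_part = [t._replace(reward=0.0)
                   for t in rng.choices(policy, k=batch_size - half)]
    batch = expert_part + policy_part
    rng.shuffle(batch)
    return batch
```

The relabeled batch would then be fed to the usual critic update in place of a batch with environment rewards.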
from iq-learn.
I have added the expert datasets along with IQ-Learn results on Humanoid-v2. I also released a script to generate your own expert trajectories for new environments.
from iq-learn.