xbpeng / awr
Implementation of advantage-weighted regression.
License: MIT License
In the paper, the beta value is reported as 0.05, but in the code here it is set to 1.0 for all environments. Could you provide an explanation for this, please?
Thank you.
Trenton
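For context, beta only enters AWR as the temperature in the exponential advantage weighting, so its effective scale depends on the scale of the advantages. A minimal sketch of that weighting (illustrative only, not the repo's code; `awr_weights` and `max_weight` are hypothetical names, and the clipping constant is an assumption):

```python
import numpy as np

def awr_weights(advantages, beta=1.0, max_weight=20.0):
    """Advantage-weighted regression sample weights: w = exp(A / beta).

    A smaller beta sharpens the weighting toward high-advantage samples.
    Weights are clipped to max_weight for numerical stability.
    """
    return np.minimum(np.exp(advantages / beta), max_weight)

adv = np.array([-1.0, 0.0, 1.0])
print(awr_weights(adv, beta=1.0))   # moderate weighting
print(awr_weights(adv, beta=0.05))  # near-binary weighting (mostly clipped)
```

With beta = 0.05, most positive advantages saturate the clip, so the two settings can behave quite differently in practice.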
Hi, thank you for sharing the repo!
I was wondering how Train_Return and Test_Return are calculated,
and what the difference between the two is.
I see that one uses norm_a_tf and the other sample_a_tf in the code.
Hello,
thanks for the code. While re-implementing it, I noticed that there is a step that normalizes the value function output here. The prediction is computed as v_predict = v(s; \theta) * (1 - \gamma),
and the critic update is implemented as

min_\theta [ v(s; \theta) * (1 - \gamma) - v_estimate ]^2

Is there a reason to normalize the value function's output? I tried removing the normalization and rescaling the learning rate by (1 - \gamma), and there appears to be no problem in HalfCheetah-v2.
It achieves similar performance to the original version.
Best,
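To make the normalization concrete, here is a minimal sketch of the critic loss as described above (hypothetical names, not the repo's code). One plausible motivation, consistent with the learning-rate rescaling experiment, is that return magnitudes grow like 1/(1 - gamma), so the scaling keeps the loss and gradient magnitudes less sensitive to the choice of gamma:

```python
import numpy as np

GAMMA = 0.99

def critic_loss(v_raw, v_estimate, gamma=GAMMA):
    """Squared-error critic loss with the (1 - gamma) output scaling:
    the network output v_raw is multiplied by (1 - gamma) before being
    compared against the value estimate."""
    v_predict = v_raw * (1.0 - gamma)
    return np.mean((v_predict - v_estimate) ** 2)
```

Since the scaling only multiplies the output by a constant, absorbing it into the learning rate is a reasonable experiment, which matches the observation that performance stays similar.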
Hello,
I am trying to use this algorithm (rewritten in PyTorch with Gym vectorized envs) for motion imitation, starting with the PyBullet implementation of the DeepMimic environment. In the paper, section 5.3, there is a comparison of DeepMimic's modified off-policy PPO with AWR and RWR on some of DeepMimic's tasks, but no further information was given on which hyperparameters were used there.
The appendix gives some parameters which I think apply to the usual MuJoCo benchmarks, but I'm not sure whether they also apply to the DeepMimic tasks (for instance, the MLP hidden dimensions of (128, 64) don't seem right for DeepMimic, since the original paper uses (1024, 512)).
Hi, I am trying to modify AWR into an offline (fully off-policy) version. The paper states that one can simply treat the dataset as the replay buffer, with no further modifications needed. But I notice that if I remove the sampling in rl_agent.train, line 105 in rl_agent.py:

train_return, train_path_count, new_sample_count = self._rollout_train(self._samples_per_iter)

then new_sample_count remains 0, so the number of update steps is also 0.
Could you point out a proper way to modify the code to obtain offline AWR?
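One workaround that seems consistent with the paper's description (treat the fixed dataset as the replay buffer) is to pre-fill the buffer once and decouple the number of update steps from new_sample_count. A rough sketch of such a loop, with hypothetical names like `OfflineBuffer` and `offline_train` that are not the repo's actual API:

```python
import numpy as np

class OfflineBuffer:
    """Fixed dataset treated as a replay buffer (no new rollouts)."""
    def __init__(self, states, actions, returns):
        self.states, self.actions, self.returns = states, actions, returns

    def sample(self, batch_size, rng):
        idx = rng.integers(0, len(self.states), size=batch_size)
        return self.states[idx], self.actions[idx], self.returns[idx]

def offline_train(buffer, update_fn, num_updates, batch_size=256, seed=0):
    """Run a fixed number of update steps instead of deriving the step
    count from new_sample_count (which stays 0 once rollouts are removed)."""
    rng = np.random.default_rng(seed)
    for _ in range(num_updates):
        states, actions, returns = buffer.sample(batch_size, rng)
        update_fn(states, actions, returns)
```

The key change is that `num_updates` becomes an explicit hyperparameter rather than a function of newly collected samples.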