denisyarats / drq
DrQ: Data regularized Q
Home Page: https://sites.google.com/view/data-regularized-q
License: MIT License
I am trying to get the code to run deterministically, i.e. to reproduce behavior exactly when running the same seed multiple times, but I'm having some issues. I've tried disabling cuDNN benchmarking:
torch.backends.cudnn.benchmark = False
I've also added
torch.use_deterministic_algorithms(True)
Still, I am not able to reproduce the experiments exactly for fixed seeds. Any ideas what further sources of non-determinism there might be in the code base? Thanks!
Hi,
I get different train and eval curves every time I start training, even with the same seed. Is that supposed to happen even after setting all seeds, i.e. random, numpy, and torch (both CUDA and non-CUDA)? Did you observe similar behaviour?
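A minimal sketch of the settings that usually matter for the two reports above, assuming a reasonably recent PyTorch. `seed_everything` is a hypothetical helper and not part of the DrQ code base. Note that cuDNN autotuning is disabled with `benchmark = False`, and that PyTorch asks for `CUBLAS_WORKSPACE_CONFIG` to be set for deterministic cuBLAS on CUDA 10.2 and later. Even with all of this, the simulator, the renderer, or the augmentation pipeline may remain a source of run-to-run differences.

```python
import os
import random


def seed_everything(seed: int) -> None:
    """Seed the RNGs that are available and request deterministic kernels."""
    # Has to be in the environment before the first CUDA call to take effect.
    os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed
    try:
        import torch
    except ImportError:
        return  # PyTorch not installed; stdlib seeding only
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # silently ignored without CUDA
    torch.backends.cudnn.benchmark = False  # False disables autotuning
    torch.backends.cudnn.deterministic = True
    if hasattr(torch, "use_deterministic_algorithms"):
        torch.use_deterministic_algorithms(True)


# Two runs with the same seed should draw identical numbers.
seed_everything(1)
first = [random.random() for _ in range(3)]
seed_everything(1)
second = [random.random() for _ in range(3)]
```

If two runs still diverge after this, comparing the first few sampled batches and actions between runs can help narrow down where the divergence starts.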
I want to run DRQ on a larger node with multiple GPUs and 18 cores (36 with hyperthreading). When I try to run multiple DRQ jobs in parallel on the node, each job seems to spawn 41 threads, and this seems to be too much to handle for the CPU. Is there any way to limit the number of threads that DRQ launches? Thanks!!
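One common way to cap thread counts, sketched under the assumption that most of the threads come from the numeric backends (OpenMP, MKL, OpenBLAS) and from PyTorch's intra-op pool. `limit_threads` is a hypothetical helper; whether it accounts for all 41 threads reported above is not known, since rendering or the simulator may start threads of their own.

```python
import os


def limit_threads(n: int) -> None:
    """Cap the thread pools of common numeric backends to n threads."""
    # These variables are read when the libraries are first loaded, so they
    # should be set before importing numpy or torch.
    for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS"):
        os.environ[var] = str(n)
    try:
        import torch
    except ImportError:
        return  # PyTorch not installed; environment variables only
    torch.set_num_threads(n)  # intra-op parallelism on CPU


limit_threads(4)
```

The same environment variables can also be exported in the shell before launching each job, which avoids touching train.py.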
In your code, why are obs and obs_aug, as sampled from the replay buffer, the same data? What is the purpose? I can't understand it.
Looking forward to your answer.
The README suggests that, to reproduce the results, I just need to run
python train.py env=cartpole_swingup batch_size=512
But I notice that action_repeat in config.yaml is not 8 for cartpole_swingup.
You may want to check this.
I might have missed something simple, but could you please explain why you don't update the encoder here?
https://github.com/denisyarats/drq/blob/master/drq.py#L263-L264
In other SAC implementations (e.g. rlkit), the gradient back-props through the entire policy network. Thanks!
This is amazing work, thanks a lot for sharing!
The paper states that stacking the last 3 image frames can convert the POMDP into an MDP. While I understand this is common practice, I wonder if you have tried using a GRU/LSTM controller. Does it typically perform better or worse than frame stacking in your experience?
Can I get all the DMC benchmark results if I use batch_size: 512 and action_repeat: 8?
I tried batch_size: 128 and action_repeat: 2 with env: finger, task: turn_easy, but the result was bad (under 500 mean score up to 200k).
Hi, thank you for the quality code. I wonder why the critic loss for the walker_stand task is so high (up to 1e+3) in my experiment. I used your conda.yaml and changed env: walker_stand, action_repeat: 2, and batch_size: 512, as mentioned in the paper. How can I get a stable critic loss (for example, with reward scaling)?
Thank you for reading.
I wanted to ask whether any tweaks to your implementation might be needed for sparse-reward tasks.
Hello, I'm trying to replicate the results of the Dreamer and DrQ papers with PyTorch.
While the DrQ code works fine, I am concerned that the environment steps (the x-axis in the figure) are counted differently from Dreamer's.
Dreamer's implementation increments the count by 1000 environment steps per episode, no matter what the action repeat is.
However, in the DrQ implementation, the step count (self.step in train.py) is incremented by 1000/action_repeat per episode.
I believe this would make DrQ consume more episodes to reach the same training_max_steps.
Am I missing something here?
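The arithmetic behind this question can be sketched as follows. The function names and the example values (action repeat of 4, budget of 100,000) are assumptions for illustration only; the 1000-frame episode length is the one quoted above.

```python
def agent_steps_per_episode(episode_frames: int, action_repeat: int) -> int:
    """Agent decisions per episode when each action is repeated."""
    return episode_frames // action_repeat


def episodes_needed(max_steps: int, episode_frames: int,
                    action_repeat: int, count_frames: bool) -> int:
    """Episodes required to exhaust a step budget under either convention."""
    if count_frames:
        per_episode = episode_frames  # counter advances once per simulator frame
    else:
        per_episode = agent_steps_per_episode(episode_frames, action_repeat)
    return -(-max_steps // per_episode)  # ceiling division


# Hypothetical budget of 100,000 steps with action_repeat = 4.
frames_convention = episodes_needed(100_000, 1000, 4, count_frames=True)
agent_convention = episodes_needed(100_000, 1000, 4, count_frames=False)
```

Under these assumptions the frame-counting convention finishes after 100 episodes and the agent-step convention after 400, so a curve plotted against one counter has to be rescaled by action_repeat before it is compared with the other.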
Hi, I wonder what is the purpose of the layer norm after the convolutional layers. Does it improve stability?
I understand that your actor and critic are sharing the convolutional layers. Is layer norm for that purpose?
python train.py env=cartpole_swingup
result :
Traceback (most recent call last):
File "train.py", line 170, in <module>
@hydra.main(config_path='config.yaml', strict=True)
TypeError: main() got an unexpected keyword argument 'strict'
If I delete the strict argument, I get:
ValueError: Using config_path to specify the config name is not supported, specify the config name via config_name.
See https://hydra.cc/docs/next/upgrades/0.11_to_1.0/config_path_changes
How can I fix it?
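Both errors above point at a Hydra version newer than the one the decorator was written for: the second message asks for `config_name`, and its upgrade link covers the move from 0.11 to 1.0. A minimal sketch of the two call styles follows, assuming config.yaml sits next to train.py. `hydra_main_kwargs` is a hypothetical helper, not something Hydra or DrQ provides.

```python
def hydra_main_kwargs(hydra_version: str) -> dict:
    """Pick keyword arguments for @hydra.main based on the Hydra version."""
    major, minor = (int(part) for part in hydra_version.split(".")[:2])
    if (major, minor) < (1, 0):
        # 0.11 style: config_path names the file itself.
        return {"config_path": "config.yaml", "strict": True}
    # 1.0+ style: config_path is a directory, config_name is the file stem.
    return {"config_path": ".", "config_name": "config"}


old_style = hydra_main_kwargs("0.11.3")
new_style = hydra_main_kwargs("1.1.0")
```

In train.py this would be used as `@hydra.main(**hydra_main_kwargs(hydra.__version__))`. Other parts of the code may also depend on 0.11 behaviour, so installing a 0.11 release of hydra-core is the other option to try.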
While modifying this code for research, I found lots of code to be redundant.
It seems like the paper shows results for SAC trained on the underlying state, however, I cannot find that code in the repo. Would it be possible to include code for this? I'd be interested in reproducing your experiments! Thanks!!
Hi, thanks for sharing the code! I'm interested in applying DrQ to a dm_control domain, and I see from the README that I can readily do that using the following command:
python train.py env=cartpole_swingup
However, is there a way to turn off augmentation (e.g., via command line options)? I'd like to compare the performance with and without augmentation.
May I know how those figures were generated, and which code file does that? Sorry for my ignorance, and thank you.
After following the installation instructions, I run into a problem with Hydra:
HYDRA_FULL_ERROR=1 python train.py env=cartpole_swingup
Traceback (most recent call last):
File "/usr/local/Caskroom/miniconda/base/envs/drq/lib/python3.7/site-packages/hydra/_internal/config_loader_impl.py", line 153, in load_configuration
from_shell=from_shell,
File "/usr/local/Caskroom/miniconda/base/envs/drq/lib/python3.7/site-packages/hydra/_internal/config_loader_impl.py", line 256, in _load_configuration
run_mode=run_mode,
File "/usr/local/Caskroom/miniconda/base/envs/drq/lib/python3.7/site-packages/hydra/_internal/config_loader_impl.py", line 796, in _merge_defaults_into_config
hydra_cfg = merge_defaults_list_into_config(hydra_cfg, system_list)
File "/usr/local/Caskroom/miniconda/base/envs/drq/lib/python3.7/site-packages/hydra/_internal/config_loader_impl.py", line 764, in merge_defaults_list_into_config
merged_cfg.merge_with(job_cfg)
File "/usr/local/Caskroom/miniconda/base/envs/drq/lib/python3.7/site-packages/omegaconf/basecontainer.py", line 325, in merge_with
self._format_and_raise(key=None, value=None, cause=e)
File "/usr/local/Caskroom/miniconda/base/envs/drq/lib/python3.7/site-packages/omegaconf/base.py", line 101, in _format_and_raise
type_override=type_override,
File "/usr/local/Caskroom/miniconda/base/envs/drq/lib/python3.7/site-packages/omegaconf/_utils.py", line 610, in format_and_raise
_raise(ex, cause)
File "/usr/local/Caskroom/miniconda/base/envs/drq/lib/python3.7/site-packages/omegaconf/_utils.py", line 591, in _raise
raise ex # set end OC_CAUSE=1 for full backtrace
File "/usr/local/Caskroom/miniconda/base/envs/drq/lib/python3.7/site-packages/omegaconf/basecontainer.py", line 323, in merge_with
self._merge_with(*others)
File "/usr/local/Caskroom/miniconda/base/envs/drq/lib/python3.7/site-packages/omegaconf/basecontainer.py", line 341, in _merge_with
BaseContainer._map_merge(self, other)
File "/usr/local/Caskroom/miniconda/base/envs/drq/lib/python3.7/site-packages/omegaconf/basecontainer.py", line 288, in _map_merge
dest_node._merge_with(src_value)
File "/usr/local/Caskroom/miniconda/base/envs/drq/lib/python3.7/site-packages/omegaconf/basecontainer.py", line 341, in _merge_with
BaseContainer._map_merge(self, other)
File "/usr/local/Caskroom/miniconda/base/envs/drq/lib/python3.7/site-packages/omegaconf/basecontainer.py", line 308, in _map_merge
dest[key] = src._get_node(key)
File "/usr/local/Caskroom/miniconda/base/envs/drq/lib/python3.7/site-packages/omegaconf/dictconfig.py", line 251, in __setitem__
key=key, value=value, type_override=ConfigKeyError, cause=e
File "/usr/local/Caskroom/miniconda/base/envs/drq/lib/python3.7/site-packages/omegaconf/base.py", line 101, in _format_and_raise
type_override=type_override,
File "/usr/local/Caskroom/miniconda/base/envs/drq/lib/python3.7/site-packages/omegaconf/_utils.py", line 610, in format_and_raise
_raise(ex, cause)
File "/usr/local/Caskroom/miniconda/base/envs/drq/lib/python3.7/site-packages/omegaconf/_utils.py", line 591, in _raise
raise ex # set end OC_CAUSE=1 for full backtrace
omegaconf.errors.ConfigKeyError: Key 'name' not in 'HydraConf'
full_key: hydra.name
reference_type=Optional[HydraConf]
object_type=HydraConf
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "train.py", line 178, in <module>
main()
File "/usr/local/Caskroom/miniconda/base/envs/drq/lib/python3.7/site-packages/hydra/main.py", line 37, in decorated_main
strict=strict,
File "/usr/local/Caskroom/miniconda/base/envs/drq/lib/python3.7/site-packages/hydra/_internal/utils.py", line 356, in _run_hydra
lambda: hydra.run(
File "/usr/local/Caskroom/miniconda/base/envs/drq/lib/python3.7/site-packages/hydra/_internal/utils.py", line 210, in run_and_report
raise ex
File "/usr/local/Caskroom/miniconda/base/envs/drq/lib/python3.7/site-packages/hydra/_internal/utils.py", line 207, in run_and_report
return func()
File "/usr/local/Caskroom/miniconda/base/envs/drq/lib/python3.7/site-packages/hydra/_internal/utils.py", line 359, in <lambda>
overrides=args.overrides,
File "/usr/local/Caskroom/miniconda/base/envs/drq/lib/python3.7/site-packages/hydra/_internal/hydra.py", line 104, in run
run_mode=RunMode.RUN,
File "/usr/local/Caskroom/miniconda/base/envs/drq/lib/python3.7/site-packages/hydra/_internal/hydra.py", line 512, in compose_config
from_shell=from_shell,
File "/usr/local/Caskroom/miniconda/base/envs/drq/lib/python3.7/site-packages/hydra/_internal/config_loader_impl.py", line 156, in load_configuration
raise ConfigCompositionException() from e
hydra.errors.ConfigCompositionException
I have looked through the stack trace but am not familiar enough with Hydra or OmegaConf to decipher what is actually causing the issue. Maybe different versions of the packages got installed from the conda_env.yml file?
In line 127 of train.py you check whether to evaluate (if self.step % self.cfg.eval_frequency == 0:).
However, this happens inside the if clause that checks for done. Shouldn't it be independent of that? As written, most of the time when step is a multiple of eval_frequency it does not coincide with done being True, which means no evaluation is performed.
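One way to keep evaluation at episode boundaries without relying on an exact multiple is to evaluate at the first episode end after each threshold is crossed. This is a sketch of that idea only, not the repository's logic. `EvalSchedule` and the episode length of 333 are hypothetical.

```python
class EvalSchedule:
    """Fire at the first episode end on or after each multiple of `every`."""

    def __init__(self, every: int) -> None:
        self.every = every
        self.next_eval = 0

    def due(self, step: int) -> bool:
        if step < self.next_eval:
            return False
        # Jump past `step` so one long episode cannot queue several evals.
        self.next_eval = (step // self.every + 1) * self.every
        return True


schedule = EvalSchedule(every=5000)
episode_len = 333  # hypothetical; chosen so it does not divide 5000
episode_ends = range(0, 20001, episode_len)

# Threshold-crossing check versus the exact-multiple check.
crossing = [s for s in episode_ends if schedule.due(s)]
modulo = [s for s in episode_ends if s % 5000 == 0]
```

With these numbers the crossing check fires at steps 0, 5328, 10323 and 15318, while the modulo check fires only at step 0. When the episode length in agent steps divides eval_frequency exactly, the two checks agree.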
Dear Denis,
Thanks for open-sourcing this, the paper is really cool! I am trying to replicate table 1 with the planet benchmark and ran into some problems for the SAC-state baseline. I am using your implementation of SAC-state (github.com/denisyarats/pytorch_sac) but fail to reach the reported performance. Was action repeat applied to SAC-state in table 1? For each environment, I am using frame_skip = action_repeat, where action_repeat comes from table 2 in the paper. To only use 500,000 environment steps, I set num_train_steps = 500,000 // action_repeat. Am I missing something here? Once I figure this out, I will replicate the DrQ experiments. Thanks!!
Hi,
I would like to try a custom gym environment but have encountered this error:
Error instantiating drq.DRQAgent : Class DRQAgent is not in module drq
Traceback (most recent call last):
File "/home/alireza/.local/share/virtualenvs/SimulationFramework-19OjgRmc/lib/python3.6/site-packages/hydra/utils.py", line 23, in get_class
klass = getattr(mod, class_name)
AttributeError: module 'drq' has no attribute 'DRQAgent'
This happens at the line self.agent = hydra.utils.instantiate(cfg.agent) in train.py.
Do you know what the reason might be?
In DRQAgent.update it seems that the critic is updated at every environment step, which makes sampling from the environment rather slow.
Do you think it would be safe to add a separate update frequency for the critic?
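A sketch of what such a frequency could look like in a generic training loop. The names `update_every` and `updates_per_round` are assumptions and do not exist in the DrQ config. Note that this changes the update-to-data ratio whenever `updates_per_round` is smaller than `update_every`, which may affect sample efficiency.

```python
from typing import Callable


def train_loop(num_steps: int, update_every: int, updates_per_round: int,
               collect: Callable[[int], None],
               update: Callable[[int], None]) -> None:
    """Collect one transition per step; update only every few steps."""
    for step in range(1, num_steps + 1):
        collect(step)
        if step % update_every == 0:
            for _ in range(updates_per_round):
                update(step)


# Count calls with stand-in callbacks instead of a real agent.
counts = {"collect": 0, "update": 0}


def fake_collect(step: int) -> None:
    counts["collect"] += 1


def fake_update(step: int) -> None:
    counts["update"] += 1


train_loop(10, update_every=5, updates_per_round=2,
           collect=fake_collect, update=fake_update)
```

Here 10 environment steps trigger 4 updates instead of 10. Setting `updates_per_round` equal to `update_every` keeps the original ratio but only batches the work, so it does not reduce total compute.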
To replicate the results on the Dreamer benchmark, are there any settings to override other than batch_size=512 action_repeat=2? Thanks!
Do you have a TensorFlow version of the code?
Great work!
I'm a little confused by the dmc_planet_bench.csv file. Why are the steps negative?
To produce results comparable with this CSV, should I set eval_frequency in the config file to 2000?
I want to plot this file using TensorBoard. Just to make sure: should I set action_repeat to the corresponding value in table 2 when logging this CSV with the provided logger? And should I plot it as eval/episode_reward, like the following?
logger.log('eval/episode_reward', float(row['episode_reward']), -1 * int(row['step']))
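A self-contained sketch of reading such a file and normalising the sign of the step column before logging. The column names `step` and `episode_reward` come from the snippet above. The sample rows are made up for illustration and are not values from dmc_planet_bench.csv, and why the file stores negative steps is not explained here.

```python
import csv
import io

# Made-up rows in the assumed layout; not real benchmark numbers.
SAMPLE = """step,episode_reward
-2000,100.5
-4000,250.0
"""


def read_curve(handle):
    """Return (step, reward) pairs with non-negative steps."""
    curve = []
    for row in csv.DictReader(handle):
        step = abs(int(row["step"]))  # same effect as -1 * step for negatives
        curve.append((step, float(row["episode_reward"])))
    return curve


curve = read_curve(io.StringIO(SAMPLE))
```

Each pair can then be passed to the logger in place of the manual `-1 * int(row['step'])` conversion.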