denisyarats / drq Goto Github PK

DrQ: Data regularized Q

Home Page: https://sites.google.com/view/data-regularized-q

License: MIT License

Python 1.02% Jupyter Notebook 98.98%

rl reinforcement-learning deep-learning mujoco dm-control gym pixel sac soft-actor-crit pytorch

drq's Issues

Getting the code to run deterministically

I am trying to get the code to run deterministically, i.e. to reproduce behavior exactly when running the same seed multiple times, but I'm having some issues. I've tried disabling cuDNN benchmarking:

torch.backends.cudnn.benchmark = False

I've also added

torch.use_deterministic_algorithms(True)

Still, I am not able to reproduce the experiments exactly for fixed seeds. Any ideas what further sources of non-determinism there might be in the code base? Thanks!!
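
For reference, the usual PyTorch determinism checklist can be sketched as below. This is a minimal sketch, not code from the repo: `seed_everything` is a hypothetical helper, and NumPy and PyTorch are imported optionally so the snippet also runs where they are not installed. Other possible sources of non-determinism, such as the simulator or its renderer, are not covered by these settings.

```python
import os
import random

try:
    import numpy as np
except ImportError:  # NumPy is optional for this sketch
    np = None

try:
    import torch
except ImportError:  # PyTorch is optional for this sketch
    torch = None


def seed_everything(seed):
    """Hypothetical helper: seed the known RNGs and request deterministic kernels."""
    # cuBLAS needs this for deterministic matmuls on CUDA >= 10.2;
    # it has to be set before any CUDA work starts.
    os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")
    random.seed(seed)
    if np is not None:
        np.random.seed(seed)
    if torch is not None:
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # ignored when no GPU is present
        torch.backends.cudnn.benchmark = False  # autotuning may pick different kernels per run
        torch.backends.cudnn.deterministic = True
        # Raise an error on ops that have no deterministic implementation.
        torch.use_deterministic_algorithms(True)


seed_everything(1)
first = random.random()
seed_everything(1)
second = random.random()
print(first == second)
```

If `torch.use_deterministic_algorithms(True)` raises during training, the error message names the offending op, which is a useful pointer to where run-to-run differences could come from.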

Non-deterministic runs

Hi,

I get different train and eval curves every time I start training, even with the same seed. Is that supposed to happen even after setting all seeds, i.e. random, numpy, and torch (both CUDA and non-CUDA)? Did you observe similar behaviour?

How to limit number of threads spawned?

I want to run DRQ on a larger node with multiple GPUs and 18 cores (36 with hyperthreading). When I try to run multiple DRQ jobs in parallel on the node, each job seems to spawn 41 threads, and this seems to be too much to handle for the CPU. Is there any way to limit the number of threads that DRQ launches? Thanks!!
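
One common way to cap CPU threads for PyTorch and NumPy based jobs is sketched below. It is an assumption that the extra threads come from these numeric backends; threads started by the simulator or renderer would not be affected. `limit_threads` is a hypothetical helper, not part of the repo.

```python
import os


def limit_threads(num_threads):
    """Hypothetical helper: cap the thread pools of common numeric backends."""
    value = str(num_threads)
    # These variables are read when the libraries are first loaded,
    # so this has to run before numpy or torch is imported.
    for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS"):
        os.environ[var] = value
    try:
        import torch
    except ImportError:  # PyTorch is optional for this sketch
        return
    torch.set_num_threads(num_threads)  # intra-op parallelism on CPU


limit_threads(4)
print(os.environ["OMP_NUM_THREADS"])
```

The same variables can also be set in the shell before launching each job, which avoids touching the training script.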

About Replay Buffer Sample function

In your code, why is the data sampled from the replay buffer for obs and obs_aug the same? What's the purpose? I can't understand it.
Waiting for your answer.

Action_repeat settings to reproduce your paper results

The README suggests that to reproduce the results I just need to run
python train.py env=cartpole_swingup batch_size=512

But I noticed that action_repeat in config.yaml is not 8 for cartpole_swingup.

Maybe you should check this.

Why is the encoder detached?

I might have missed something simple, but could you please explain why you don't update the encoder here?

https://github.com/denisyarats/drq/blob/master/drq.py#L263-L264

In other SAC implementations (e.g. rlkit), the gradient is backpropagated through the entire policy network. Thanks!

Recurrent controller instead of stacked image observation?

This is an amazing work, thanks a lot for sharing!!

The paper states that stacking the last 3 image frames can convert the POMDP into an MDP. While I understand this is common practice, I wonder if you have tried using a GRU/LSTM controller. Does it typically perform better or worse than frame stacking in your experience?

reproduce problem

Can I reproduce all DMC benchmark results with batch_size: 512 and action_repeat: 8?
I tried batch_size: 128 and action_repeat: 2 with env: finger, task: turn_easy, but the result was bad (mean score under 500 until 200k).

walker_stand critic loss explosion

Hi, thank you for the quality code. I wonder why the critic loss on the walker_stand task is so high (up to 1e+3) in my experiment. I used your conda.yaml and changed env: walker_stand, action_repeat: 2, and batch_size: 512, as mentioned in the paper. How can I get a stable critic loss (for example, with reward scaling)?

Thank you for reading.

Not learning on dm_control's pendulum swingup

Upon running the command

python train.py env=pendulum_swingup

I'm getting

I understand that pendulum swingup uses sparse reward, but the agent should be able to earn some reward (although very little) initially by exploration.

sparse reward tasks

I wanted to ask whether any tweaks to your implementation might be needed for sparse reward tasks.

About environment steps

Hello, I'm trying to replicate the results of the Dreamer and DrQ papers with PyTorch.

While the DrQ code works fine, I am concerned that the environment steps (x-axis in Figure) are counted differently from Dreamer's.

Dreamer's implementation adds 1000 environment steps per episode, regardless of the action repeat.
However, in the DrQ implementation, the step count (self.step in train.py) is incremented by 1000/action_repeat per episode.

I believe this makes DrQ consume more episodes to reach the same training_max_steps.

Am I missing something here?
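
The difference between the two counting conventions can be worked through with a little arithmetic. The sketch below only restates the numbers in this issue (1000 frames per episode); the step budget of 100,000 and the action repeat of 4 are illustrative assumptions, and both function names are hypothetical.

```python
def agent_steps_per_episode(frames_per_episode, action_repeat):
    """Number of agent decisions in one episode when each action is repeated."""
    return frames_per_episode // action_repeat


def episodes_to_reach(max_steps, frames_per_episode, action_repeat, count_frames):
    """Episodes needed to reach max_steps under either counting convention."""
    if count_frames:
        per_episode = frames_per_episode  # count simulator frames
    else:
        per_episode = agent_steps_per_episode(frames_per_episode, action_repeat)
    return max_steps // per_episode


# Counting simulator frames: 1000 per episode regardless of action repeat.
print(episodes_to_reach(100_000, 1000, 4, count_frames=True))   # 100
# Counting agent steps: 1000 / 4 = 250 per episode.
print(episodes_to_reach(100_000, 1000, 4, count_frames=False))  # 400
```

Under these assumptions, the same nominal step budget corresponds to action_repeat times as many episodes when agent steps are counted, so curves need to be rescaled by action_repeat before they are compared on a common x-axis.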

The purpose of layer norm after CNN

Hi, I wonder what is the purpose of the layer norm after the convolutional layers. Does it improve stability?

I understand that your actor and critic are sharing the convolutional layers. Is layer norm for that purpose?

hydra 'strict' argument error

python train.py env=cartpole_swingup

result :

Traceback (most recent call last):
  File "train.py", line 170, in <module>
    @hydra.main(config_path='config.yaml', strict=True)
TypeError: main() got an unexpected keyword argument 'strict'

If I delete the strict argument, I then get:

ValueError: Using config_path to specify the config name is not supported, specify the config name via config_name.
See https://hydra.cc/docs/next/upgrades/0.11_to_1.0/config_path_changes

How can I fix it?
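
The second error message says that the config name has to be passed via `config_name`. A minimal sketch of that split, assuming a Hydra release (1.0 or later) that accepts `config_name`: `split_config_path` is a hypothetical helper that only shows how the single old-style argument maps onto the two new ones. Other parts of the code written against the older Hydra API may need changes as well, so installing the older Hydra release the code was written for is the other route.

```python
import os


def split_config_path(path):
    """Hypothetical helper: split an old-style config_path into (config_path, config_name)."""
    directory, filename = os.path.split(path)
    name, _ext = os.path.splitext(filename)
    # An empty directory means the config sits next to the script.
    return directory or ".", name


config_path, config_name = split_config_path("config.yaml")
print(config_path, config_name)

# The decorator in train.py would then read (not executed here):
#   @hydra.main(config_path=".", config_name="config")
#   def main(cfg): ...
```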

Unused code

While modifying this code for research, I found that a lot of it is redundant:

Calls to train() and eval(): only BatchNorm and Dropout layers are affected by these.
Done flags in the replay buffer: they are stored but never used.
output_dim in Encoder is unused.
output_logits is always false.

Would it be possible to include code for state-sac?

It seems like the paper shows results for SAC trained on the underlying state, however, I cannot find that code in the repo. Would it be possible to include code for this? I'd be interested in reproducing your experiments! Thanks!!

How to run code without augmentation?

Hi, thanks for sharing the code! I'm interested in applying DrQ to a dm_control domain, and I see from the README that I can readily do that using the following command:

python train.py env=cartpole_swingup

However, is there a way to turn off augmentation (e.g., via command line options)? I'd like to compare the performance with and without augmentation.

Figures

May I know how those figures were generated, and which code file is used for that? Sorry for my ignorance, and thank you.

macOS Catalina can't run due to Hydra

After following the installation instructions, I run into a problem with Hydra:

HYDRA_FULL_ERROR=1 python train.py env=cartpole_swingup                                                
Traceback (most recent call last):
  File "/usr/local/Caskroom/miniconda/base/envs/drq/lib/python3.7/site-packages/hydra/_internal/config_loader_impl.py", line 153, in load_configuration
    from_shell=from_shell,
  File "/usr/local/Caskroom/miniconda/base/envs/drq/lib/python3.7/site-packages/hydra/_internal/config_loader_impl.py", line 256, in _load_configuration
    run_mode=run_mode,
  File "/usr/local/Caskroom/miniconda/base/envs/drq/lib/python3.7/site-packages/hydra/_internal/config_loader_impl.py", line 796, in _merge_defaults_into_config
    hydra_cfg = merge_defaults_list_into_config(hydra_cfg, system_list)
  File "/usr/local/Caskroom/miniconda/base/envs/drq/lib/python3.7/site-packages/hydra/_internal/config_loader_impl.py", line 764, in merge_defaults_list_into_config
    merged_cfg.merge_with(job_cfg)
  File "/usr/local/Caskroom/miniconda/base/envs/drq/lib/python3.7/site-packages/omegaconf/basecontainer.py", line 325, in merge_with
    self._format_and_raise(key=None, value=None, cause=e)
  File "/usr/local/Caskroom/miniconda/base/envs/drq/lib/python3.7/site-packages/omegaconf/base.py", line 101, in _format_and_raise
    type_override=type_override,
  File "/usr/local/Caskroom/miniconda/base/envs/drq/lib/python3.7/site-packages/omegaconf/_utils.py", line 610, in format_and_raise
    _raise(ex, cause)
  File "/usr/local/Caskroom/miniconda/base/envs/drq/lib/python3.7/site-packages/omegaconf/_utils.py", line 591, in _raise
    raise ex  # set end OC_CAUSE=1 for full backtrace
  File "/usr/local/Caskroom/miniconda/base/envs/drq/lib/python3.7/site-packages/omegaconf/basecontainer.py", line 323, in merge_with
    self._merge_with(*others)
  File "/usr/local/Caskroom/miniconda/base/envs/drq/lib/python3.7/site-packages/omegaconf/basecontainer.py", line 341, in _merge_with
    BaseContainer._map_merge(self, other)
  File "/usr/local/Caskroom/miniconda/base/envs/drq/lib/python3.7/site-packages/omegaconf/basecontainer.py", line 288, in _map_merge
    dest_node._merge_with(src_value)
  File "/usr/local/Caskroom/miniconda/base/envs/drq/lib/python3.7/site-packages/omegaconf/basecontainer.py", line 341, in _merge_with
    BaseContainer._map_merge(self, other)
  File "/usr/local/Caskroom/miniconda/base/envs/drq/lib/python3.7/site-packages/omegaconf/basecontainer.py", line 308, in _map_merge
    dest[key] = src._get_node(key)
  File "/usr/local/Caskroom/miniconda/base/envs/drq/lib/python3.7/site-packages/omegaconf/dictconfig.py", line 251, in __setitem__
    key=key, value=value, type_override=ConfigKeyError, cause=e
  File "/usr/local/Caskroom/miniconda/base/envs/drq/lib/python3.7/site-packages/omegaconf/base.py", line 101, in _format_and_raise
    type_override=type_override,
  File "/usr/local/Caskroom/miniconda/base/envs/drq/lib/python3.7/site-packages/omegaconf/_utils.py", line 610, in format_and_raise
    _raise(ex, cause)
  File "/usr/local/Caskroom/miniconda/base/envs/drq/lib/python3.7/site-packages/omegaconf/_utils.py", line 591, in _raise
    raise ex  # set end OC_CAUSE=1 for full backtrace
omegaconf.errors.ConfigKeyError: Key 'name' not in 'HydraConf'
	full_key: hydra.name
	reference_type=Optional[HydraConf]
	object_type=HydraConf

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "train.py", line 178, in <module>
    main()
  File "/usr/local/Caskroom/miniconda/base/envs/drq/lib/python3.7/site-packages/hydra/main.py", line 37, in decorated_main
    strict=strict,
  File "/usr/local/Caskroom/miniconda/base/envs/drq/lib/python3.7/site-packages/hydra/_internal/utils.py", line 356, in _run_hydra
    lambda: hydra.run(
  File "/usr/local/Caskroom/miniconda/base/envs/drq/lib/python3.7/site-packages/hydra/_internal/utils.py", line 210, in run_and_report
    raise ex
  File "/usr/local/Caskroom/miniconda/base/envs/drq/lib/python3.7/site-packages/hydra/_internal/utils.py", line 207, in run_and_report
    return func()
  File "/usr/local/Caskroom/miniconda/base/envs/drq/lib/python3.7/site-packages/hydra/_internal/utils.py", line 359, in <lambda>
    overrides=args.overrides,
  File "/usr/local/Caskroom/miniconda/base/envs/drq/lib/python3.7/site-packages/hydra/_internal/hydra.py", line 104, in run
    run_mode=RunMode.RUN,
  File "/usr/local/Caskroom/miniconda/base/envs/drq/lib/python3.7/site-packages/hydra/_internal/hydra.py", line 512, in compose_config
    from_shell=from_shell,
  File "/usr/local/Caskroom/miniconda/base/envs/drq/lib/python3.7/site-packages/hydra/_internal/config_loader_impl.py", line 156, in load_configuration
    raise ConfigCompositionException() from e
hydra.errors.ConfigCompositionException

I have looked through the stack trace and am not sufficiently familiar with Hydra or OmegaConf to decipher what is actually causing the issue. Maybe we have different versions of the packages that got installed from the conda_env.yml file?

eval frequency

In line 127 of train.py you are checking if you should evaluate the env (if self.step % self.cfg.eval_frequency == 0:).
However, this happens inside the if clause that checks for done. Shouldn't it be independent of that? As written, most of the time when step is a multiple of eval_frequency it does not coincide with done being True, which means no evaluation is performed.
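
The concern can be illustrated with a small counting sketch. It is not the repo's training loop: both functions are hypothetical, the check is assumed to run only at episode boundaries, and the episode lengths and frequencies are illustrative numbers rather than the repo's defaults.

```python
def count_evals_modulo(total_steps, episode_len, eval_frequency):
    """Evaluate only if an episode ends exactly on a multiple of eval_frequency."""
    evals = 0
    for step in range(0, total_steps + 1, episode_len):  # episode boundaries
        if step % eval_frequency == 0:
            evals += 1
    return evals


def count_evals_threshold(total_steps, episode_len, eval_frequency):
    """Evaluate at the first episode boundary at or after each multiple of eval_frequency."""
    evals = 0
    next_eval = 0
    for step in range(0, total_steps + 1, episode_len):
        if step >= next_eval:
            evals += 1
            # Schedule the next evaluation at the following multiple.
            next_eval = (step // eval_frequency + 1) * eval_frequency
    return evals


# Episode length does not divide the frequency: most evaluations are skipped.
print(count_evals_modulo(10_000, 300, 1000))     # 4
print(count_evals_threshold(10_000, 300, 1000))  # 10
# Episode length divides the frequency: both schemes agree.
print(count_evals_modulo(10_000, 250, 5000))     # 3
print(count_evals_threshold(10_000, 250, 5000))  # 3
```

So the modulo check inside the done branch only behaves as intended when eval_frequency is a multiple of the number of steps per episode; the threshold variant does not depend on that.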

Replicating table 1 of the paper

Dear Denis,

Thanks for open-sourcing this, the paper is really cool! I am trying to replicate table 1 with the planet benchmark and ran into some problems for the SAC-state baseline. I am using your implementation of SAC-state (github.com/denisyarats/pytorch_sac) but fail to reach the reported performance. Was action repeat applied to SAC-state in table 1? For each environment, I am using frame_skip = action_repeat, where action_repeat comes from table 2 in the paper. To only use 500,000 environment steps, I set num_train_steps = 500,000 // action_repeat. Am I missing something here? Once I figure this out, I will replicate the DrQ experiments. Thanks!!

Class DRQAgent is not in module drq

Hi,
I would like to try a custom gym environment but have encountered this error:

Error instantiating drq.DRQAgent : Class DRQAgent is not in module drq
Traceback (most recent call last):
  File "/home/alireza/.local/share/virtualenvs/SimulationFramework-19OjgRmc/lib/python3.6/site-packages/hydra/utils.py", line 23, in get_class
    klass = getattr(mod, class_name)
AttributeError: module 'drq' has no attribute 'DRQAgent'

This happens at the line self.agent = hydra.utils.instantiate(cfg.agent) in train.py.

Do you know what might be the reason?

critic update frequency

In DRQAgent.update it seems that the critic is updated at every environment step, which makes sampling from the environment rather slow.

Do you think it would be safe to add a separate update frequency for the critic?

replicating results on dreamer benchmark

For replicating the results on the dreamer benchmark, are there any settings to override except batch_size=512 action_repeat=2? Thanks!!

Hi!

Do you have TensorFlow code?

Confused by the CSV files

Great work!
I am a little confused by the dmc_planet_bench.csv file. Why are the steps negative?
To produce results comparable with this CSV, should I set eval_frequency in the config file to 2000?
I want to plot this file using TensorBoard. Just to make sure: should I set action_repeat to the corresponding value from table 2 when logging this CSV with the provided logger? And should I plot it as eval/episode_reward, like the following?

logger.log('eval/episode_reward', float(row['episode_reward']), -1 * int(row['step']))
