
dreamer's Introduction

Dream to Control

Danijar Hafner, Timothy Lillicrap, Jimmy Ba, Mohammad Norouzi

Note: This is the original implementation. To build upon Dreamer, we recommend the newer implementation of Dreamer in TensorFlow 2. It is substantially simpler and faster while replicating the results.

Implementation of Dreamer, the reinforcement learning agent introduced in Dream to Control: Learning Behaviors by Latent Imagination. Dreamer learns long-horizon behaviors from images purely by latent imagination. For this, it backpropagates value estimates through trajectories imagined in the compact latent space of a learned world model. Dreamer solves visual control tasks using substantially fewer episodes than strong model-free agents.

If you find this open-source release useful, please cite it in your paper:

@article{hafner2019dreamer,
  title={Dream to Control: Learning Behaviors by Latent Imagination},
  author={Hafner, Danijar and Lillicrap, Timothy and Ba, Jimmy and Norouzi, Mohammad},
  journal={arXiv preprint arXiv:1912.01603},
  year={2019}
}

Method

Dreamer model diagram

Dreamer learns a world model from past experience that can predict into the future. It then learns action and value models in its compact latent space. The value model optimizes Bellman consistency of imagined trajectories. The action model maximizes value estimates by propagating their analytic gradients back through imagined trajectories. When interacting with the environment, it simply executes the action model.
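As a rough illustration of this update, here is a minimal TensorFlow 2 sketch of backpropagating an imagined return into the action model. All names, shapes, and networks are toy stand-ins (the real world model is a recurrent state-space model), not the repository's code:

import tensorflow as tf

# Toy stand-ins for the learned components; dense layers only illustrate
# the data flow of imagination, not the actual architecture.
dynamics = tf.keras.layers.Dense(30)                 # s' = f(s, a)
reward = tf.keras.layers.Dense(1)                    # r = r(s)
actor = tf.keras.layers.Dense(4, activation='tanh')  # a = pi(s)
value = tf.keras.layers.Dense(1)                     # v(s)

horizon, batch, gamma = 15, 16, 0.99
start = tf.zeros([batch, 30])  # latent states inferred from replayed experience

with tf.GradientTape() as tape:
  state, ret, discount = start, 0.0, 1.0
  for _ in range(horizon):  # imagine a trajectory entirely in latent space
    action = actor(state)
    state = dynamics(tf.concat([state, action], -1))
    ret += discount * reward(state)
    discount *= gamma
  ret += discount * value(state)     # bootstrap with the value model
  actor_loss = -tf.reduce_mean(ret)  # maximize imagined returns

# Analytic gradients of the value estimate flow back through the whole
# imagined trajectory into the action model's parameters.
grads = tape.gradient(actor_loss, actor.trainable_variables)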

Find out more in the paper, Dream to Control: Learning Behaviors by Latent Imagination (arXiv:1912.01603, cited above).

Instructions

To train an agent, install the dependencies and then run one of these commands:

python3 -m dreamer.scripts.train --logdir ./logdir/debug \
  --params '{defaults: [dreamer, debug], tasks: [dummy]}' \
  --num_runs 1000 --resume_runs False

python3 -m dreamer.scripts.train --logdir ./logdir/control \
  --params '{defaults: [dreamer], tasks: [walker_run]}'

python3 -m dreamer.scripts.train --logdir ./logdir/atari \
  --params '{defaults: [dreamer, pcont, discrete, atari], tasks: [atari_boxing]}'

python3 -m dreamer.scripts.train --logdir ./logdir/dmlab \
  --params '{defaults: [dreamer, discrete], tasks: [dmlab_collect]}'

The available tasks are listed in scripts/tasks.py. The hyperparameters can be found in scripts/configs.py.

Tips:

  • Add debug to the list of defaults to use a smaller config and reach the code you're developing more quickly.
  • Add the flags --resume_runs False and --num_runs 1000 to automatically create unique logdirs.
  • To train the baseline without value function, add value_head: False to the params.
  • To train PlaNet, add train_planner: cem, test_planner: cem, planner_objective: reward, action_head: False, value_head: False, imagination_horizon: 0 to the params (see the example command after this list).
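For example, a full PlaNet invocation might look like the following. This assumes the extra keys can be passed in the same --params dictionary as defaults and tasks, matching the commands above; treat it as a sketch rather than a verified command:

python3 -m dreamer.scripts.train --logdir ./logdir/planet \
  --params '{defaults: [dreamer], tasks: [walker_run], train_planner: cem,
  test_planner: cem, planner_objective: reward, action_head: False,
  value_head: False, imagination_horizon: 0}'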

Dependencies

The code was tested under Ubuntu 18 and uses these packages: tensorflow-gpu==1.13.1, tensorflow_probability==0.6.0, dm_control (egl rendering option recommended), gym, imageio, matplotlib, ruamel.yaml, scikit-image, scipy.
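For example, a matching install might look like this (a sketch; exact pins for the packages other than TensorFlow and TFP are not specified, and dm_control may need to be installed from its GitHub repository):

pip3 install tensorflow-gpu==1.13.1 tensorflow_probability==0.6.0 \
  gym imageio matplotlib ruamel.yaml scikit-image scipy
pip3 install git+https://github.com/deepmind/dm_control.git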

Disclaimer: This is not an official Google product.


dreamer's Issues

How to evaluate dreamer

Hi, @danijar,

This work is quite interesting!

Is there a separate script we can use to load the trained model and policy and evaluate the performance of Dreamer? At present, it seems that testing only happens interleaved with training, which makes it hard to deploy a model. The flag --resume_runs True does not seem to achieve this.

Thank you.
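(A standalone evaluation loop for the TF2 implementation might look roughly like the sketch below, built from the agent.load and agent.policy calls that appear in the memory-leak issue further down. Dreamer, make_env, config, datadir, and writer are assumed to come from the TF2 train script; this is an untested sketch, not an official script.)

import numpy as np

# Untested sketch: restore a trained TF2 Dreamer agent and roll it out for
# one episode without training. All names are assumed from the train script.
env = make_env(config, writer, 'test', datadir, store=False)
agent = Dreamer(config, datadir, env.action_space, writer)
agent.load(config.logdir / 'variables.pkl')  # restore trained weights

obs, state, done, score = env.reset(), None, False, 0.0
while not done:
  # The policy expects batched observations; training=False should disable exploration noise.
  action, state = agent.policy({k: [v] for k, v in obs.items()}, state, training=False)
  obs, reward, done, _ = env.step(np.array(action)[0])
  score += reward
print('Episode return:', score)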

Sequence for key observ is of unexpected type object

I wrote my own env. It returns {'observ': np.ones((64, 64, 3), dtype=np.uint8)}, reward, done, info. I added it to the tasks:

def own(config, params):
  # Works with `isolate_envs: thread`.
  action_repeat = params.get('action_repeat', 1)
  env_ctor = tools.bind(
      own_env, 'own', config, params, action_repeat,
      select_obs=None, obs_is_image=True, render_mode='state_pixels')
  return Task('adom', env_ctor, [])

def own_env(
    name, config, params, action_repeat, select_obs=None, obs_is_image=True,
    render_mode='rgb_array'):
  from dreamer import env as env_
  env = env_.Env()
  env = control.wrappers.ActionRepeat(env, action_repeat)
  if obs_is_image:
    env = control.wrappers.ObservationDict(env, 'observ')
  else:
    env = control.wrappers.ObservationDict(env, 'state')
  if select_obs is not None:
    env = control.wrappers.SelectObservations(env, select_obs)
  return _common_env(env, config, params)

def _common_env(env, config, params):
  env = control.wrappers.MinimumDuration(env, config.batch_shape[0])
  max_length = params.get('max_length', None)
  if max_length:
    env = control.wrappers.MaximumDuration(env, max_length)
  #env = control.wrappers.ConvertTo32Bit(env)
  return env

I run it like this:

python3 -m dreamer.scripts.train --logdir ./logdir/debug \
  --params '{defaults: [dreamer, debug], tasks: [adom]}' \
  --num_runs 1000 --resume_runs False

At this line I get obs as a dict. What am I doing wrong?

Are dependencies in setup.py up to date?

Hi,

I tried installing and running this repo within a fresh virtualenv (i.e. no packages other than pip and setuptools installed) via the command:
pip install -e ~/git/dreamer.

However, I get an error when running the following command:
python3 -m dreamer.scripts.train --logdir ./logdir/atari --params '{defaults: [dreamer, pcont, discrete, atari], tasks: [atari_boxing]}'

It complains that AttributeError: 'Categorical' object has no attribute 'probs_parameter'.

It would seem that the probs_parameter method was added in TFP r0.8, but dreamer/setup.py installs tensorflow-gpu==1.13.1 and tensorflow-probability==0.6.

I resorted to installing tensorflow-gpu==1.15.2 and tensorflow-probability==0.8 and it seems to run (I now get CUDA_ERROR_OUT_OF_MEMORY: out of memory, but that may be due to the GPU on my Lenovo P50), but I have no way of telling if this will work as intended.
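For reference, that combination corresponds to:

pip install tensorflow-gpu==1.15.2 tensorflow-probability==0.8.0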

Can you please confirm the versions of TF and TFP known to work well with dreamer?

Sincerely,
Vassilios

Hyperparameter tuning?

Thanks for the great paper and repo! I was trying to reproduce your results using the TF2 version of the code. I got pretty good results on walker_walk, but the final return of around 600 is not as good as reported in your paper. On other cases such as Atari Boxing, Pong, etc., the performance is not as good. I was directly reusing the code and parameters you provided, so I was wondering whether I should tune the hyperparameters or something else?

Memory Leak when running agent

Hi,

I run into a memory leak when I use this function over and over and over again.

action, state = agent.policy(obs, state, training)

Is there something that needs to be cleared? I tried calling the code that resets the state (I'm assuming that's what it does, but I'm not sure). This one:

if state is not None and reset.any():
  mask = tf.cast(1 - reset, self._float)[:, None]
  state = tf.nest.map_structure(lambda x: x * mask, state)

A quick way to reproduce the issue is to modify the dreamer code as shown below and to run HTOP to monitor the RAM.

def main(config):
  if config.gpu_growth:
    for gpu in tf.config.experimental.list_physical_devices('GPU'):
      tf.config.experimental.set_memory_growth(gpu, True)
  assert config.precision in (16, 32), config.precision
  if config.precision == 16:
    prec.set_policy(prec.Policy('mixed_float16'))
  config.steps = int(config.steps)
  config.logdir.mkdir(parents=True, exist_ok=True)
  print('Logdir', config.logdir)

  # Create environments.
  datadir = config.logdir / 'episodes'
  writer = tf.summary.create_file_writer(
      str(config.logdir), max_queue=1000, flush_millis=20000)
  writer.set_as_default()
  train_envs = [wrappers.Async(lambda: make_env(
      config, writer, 'train', datadir, store=True), config.parallel)
      for _ in range(config.envs)]
  test_envs = [wrappers.Async(lambda: make_env(
      config, writer, 'test', datadir, store=False), config.parallel)
      for _ in range(config.envs)]
  actspace = train_envs[0].action_space

  # Prefill dataset with random episodes.
  step = count_steps(datadir, config)
  prefill = max(0, config.prefill - step)
  print(f'Prefill dataset with {prefill} steps.')
  random_agent = lambda o, d, _: ([actspace.sample() for _ in d], None)
  tools.simulate(random_agent, train_envs, prefill / config.action_repeat)
  writer.flush()

  # Train and regularly evaluate the agent.
  step = count_steps(datadir, config)
  print(f'Simulating agent for {config.steps-step} steps.')
  agent = Dreamer(config, datadir, actspace, writer)
  if (config.logdir / 'variables.pkl').exists():
    print('Load checkpoint.')
    agent.load(config.logdir / 'variables.pkl')
  state = None

  import os
  training = True
  files = os.listdir(str(datadir))
  keys = ['image', 'reward']
  for i in range(len(files)):
    print(i)
    episode = np.load(str(datadir) + '/' + files[i])
    episode = {k: episode[k] for k in episode.keys()}
    state = None
    for t in range(500):  # renamed from `i` to avoid shadowing the outer loop index
      obs = {k: [episode[k][t]] for k in keys}
      action, state = agent.policy(obs, state, training)

  for env in train_envs + test_envs:
    env.close()
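As an alternative to watching htop, a small helper like this sketch (assuming the psutil package is installed) can log the process RSS around the policy calls:

import psutil

process = psutil.Process()  # handle to the current Python process

def log_rss(tag):
  # Print the resident set size in MiB; call before and after policy steps.
  print(tag, process.memory_info().rss / (1024 ** 2), 'MiB')

Calling log_rss inside the episode loop shows whether memory grows with each agent.policy call.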

This happened with both Python 2.7 and Python 3.8 on an Ubuntu 18.04 system using TensorFlow 2.1.0.

Thanks in advance,

Regards,

Antoine

Use Dreamer on custom ENV

Hi,
thanks for the great work.
I'm looking for a way to use Dreamer on custom envs. The envs are wrapped like gym and have almost all of gym's methods; I'm talking about robotic simulators where you can get sensor inputs such as camera images.
I've looked at the code for a couple of days; it's complicated, and I've had no success yet.
Any tutorial, example, tips, or walkthrough would be great.
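(For what it's worth, judging from the "Sequence for key observ" issue above, the agent expects a gym-style step() that returns an observation dict holding a 64x64x3 uint8 image under the key 'observ'. A minimal hypothetical skeleton along those lines, with made-up names and no real simulator logic:)

import numpy as np

class MyRobotEnv:
  """Hypothetical gym-style wrapper around a robotic simulator."""

  def reset(self):
    # Render the initial camera frame as a 64x64 RGB uint8 array.
    return {'observ': np.zeros((64, 64, 3), dtype=np.uint8)}

  def step(self, action):
    # Apply `action` to the simulator, then render the next frame.
    obs = {'observ': np.zeros((64, 64, 3), dtype=np.uint8)}
    reward, done, info = 0.0, False, {}
    return obs, reward, done, info

Such an env would then be registered through a task entry like the own/own_env example in the issue above.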

No docstrings

Comparing this repository to PlaNet, I noticed all the docstrings have been removed in this one. Why is that?

Generating performance curves of Figure 7 (Dream to Control)

Thanks a lot for providing this great open source implementation of Dreamer!
Does running the following sample command generate the Dreamer performance curve provided in Figure 7 of the paper?

python3 -m dreamer.scripts.train --logdir ./logdir/control --params '{defaults: [dreamer], tasks: [walker_run]}'

Running for a little over 1.3M steps on 2 seeds, the generated curve in (...)/trainer/test/score seems to follow the performance of PlaNet in Figure 7. Am I comparing the right curves and setting the proper parameters? Thanks a lot!
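(In case it helps with comparing curves: the test scores can be pulled out of the TensorBoard event files with something like the sketch below. The run path is a placeholder and the tag name is guessed from the path above; both are assumptions.)

import matplotlib.pyplot as plt
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

# Load the scalar events from one run directory (placeholder path).
acc = EventAccumulator('./logdir/control')
acc.Reload()

events = acc.Scalars('trainer/test/score')  # tag guessed from the path above
plt.plot([e.step for e in events], [e.value for e in events])
plt.xlabel('environment steps')
plt.ylabel('test score')
plt.show()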
