In the original implementation, there is:
# tensorflow
weight = tf.stop_gradient(tf.math.cumprod(tf.concat([tf.ones_like(disc[:1]), disc[:-1]], 0), 0))
and in your code:
# pytorch
discount_arr = torch.cat([torch.ones_like(discount_arr[:1]), discount_arr[1:]])
discount = torch.cumprod(discount_arr[:-1], 0)
I've tested that they are different when using the pcon predictor. For example:
# tensorflow
import numpy as np
import tensorflow as tf
x = np.arange(9).reshape(3,3)*0.1
y = tf.convert_to_tensor(x)
z = tf.math.cumprod(tf.concat([tf.ones_like(y[:1]), y[:-1]], 0), 0)
>>> z:
<tf.Tensor: shape=(3, 3), dtype=float64, numpy=
array([[1.  , 1.  , 1.  ],
       [0.  , 0.1 , 0.2 ],
       [0.  , 0.04, 0.1 ]])>
# pytorch
import torch
x = np.arange(9).reshape(3,3)*0.1
y = torch.as_tensor(x)
z = torch.cumprod(torch.cat([torch.ones_like(y[:1]), y[1:]]),0)
>>> z:
tensor([[1.0000, 1.0000, 1.0000],
[0.3000, 0.4000, 0.5000],
[0.1800, 0.2800, 0.4000]], dtype=torch.float64)
So why is the calculation different?
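For reference, here is a PyTorch line that mirrors the TensorFlow indexing exactly on the same test array (a sketch; `.detach()` plays the role of `tf.stop_gradient`):
# pytorch, mirroring the tensorflow indexing
import numpy as np
import torch
y = torch.as_tensor(np.arange(9).reshape(3,3)*0.1)
# prepend ones, drop the LAST row (y[:-1]) as in the tensorflow version
z = torch.cumprod(torch.cat([torch.ones_like(y[:1]), y[:-1]]), 0).detach()
>>> z:
tensor([[1.0000, 1.0000, 1.0000],
        [0.0000, 0.1000, 0.2000],
        [0.0000, 0.0400, 0.1000]], dtype=torch.float64)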
My guess at the reason is this: because the pcon predictor is a Bernoulli distribution, the samples are always either 0 or 1, so these two different ways of calculating the discount weight will always produce the same output. Is that right?
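One way to check that guess is to compare the two shifts on hand-picked 0/1 values (hypothetical samples standing in for actual pcon outputs):
# pytorch, hand-picked binary "samples"
import torch
disc = torch.tensor([1., 1., 0., 1.])
w_tf = torch.cumprod(torch.cat([torch.ones_like(disc[:1]), disc[:-1]]), 0)  # tensorflow-style shift
w_pt = torch.cumprod(torch.cat([torch.ones_like(disc[:1]), disc[1:]]), 0)   # shift from the pytorch test above
>>> w_tf:
tensor([1., 1., 1., 0.])
>>> w_pt:
tensor([1., 1., 0., 0.])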
But what if we want the pcon predictor to output a "soft" label? Which one is right then?