mcgillmrl / prob_mbrl
A library of probabilistic model-based RL algorithms in PyTorch
License: MIT License
In the DiagGaussianDensity class, the log probability is handled inconsistently between methods. I'm not sure if this is intentional, but in the forward method there's the following scaling:
def forward(self, ...):
    log_std = -torch.nn.functional.softplus(
        -log_std + self.max_log_std) + self.max_log_std
In the log_prob(...) function, though, this scaling isn't applied.
I've tried adding it to log_prob and it seems to restrict how much the model can learn, which presents a new issue. But from playing with this in a strictly supervised learning setting, presumably without the clamp the log-likelihood can just shoot off indefinitely by consistently cranking up log_std during optimization :/
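For what it's worth, here is a minimal sketch of what that scaling does (the function name soft_clamp_upper is mine, not the library's): it behaves as a smooth upper bound on log_std, leaving small values nearly unchanged while saturating large values at max_log_std.

```python
import torch
import torch.nn.functional as F

def soft_clamp_upper(log_std, max_log_std):
    # max - softplus(max - x) is a smooth version of min(x, max):
    # for x << max the result is ~x; for x >> max it saturates at max.
    return -F.softplus(-log_std + max_log_std) + max_log_std

max_log_std = torch.tensor(2.0)
small = soft_clamp_upper(torch.tensor(-5.0), max_log_std)  # ~ -5.0 (unchanged)
large = soft_clamp_upper(torch.tensor(50.0), max_log_std)  # ~ 2.0 (saturated)
```

So applying it in forward but not in log_prob means training through log_prob sees an unbounded log_std, which is consistent with the behavior described above.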
I think this is a known issue, but just to put it out there: the behavior of the Regressor class is wrong when you don't use an output density but are using normalized inputs.
In the forward function:
# rest of the forward method in Regressor
outs = self.model(x, **kwargs)
if callable(self.output_density):
    scaling_params = (self.my, self.Sy) if normalize else None
    outs = self.output_density(outs,
                               scaling_params=scaling_params,
                               **kwargs)
There should be an additional branch here:
# rest of the forward method in Regressor
outs = self.model(x, **kwargs)
if callable(self.output_density):
    scaling_params = (self.my, self.Sy) if normalize else None
    outs = self.output_density(outs,
                               scaling_params=scaling_params,
                               **kwargs)
elif normalize:
    outs = outs * self.Sy + self.my  # something like this, I think...
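To illustrate why the elif is needed, here is a hedged sketch (the names my/Sy follow the snippet above; the actual Regressor internals may differ): if targets were standardized during training, a model without an output density predicts in standardized space, so its outputs must be mapped back before being returned.

```python
import torch

my = torch.tensor([3.0])   # target mean used for normalization (assumed)
Sy = torch.tensor([2.0])   # target std used for normalization (assumed)

y = torch.tensor([7.0])    # a raw target
y_n = (y - my) / Sy        # standardized target the model is trained against
y_rec = y_n * Sy + my      # the proposed de-normalization recovers y
```

Without that branch, the caller silently receives predictions in the standardized space.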
Hi there!
Thanks for the great work on the codebase!
I'm not sure how common unit testing is in RL codebases, but I thought it could be one potential improvement to the code, to help its maintainability.
Thanks!
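As a concrete illustration of the kind of test that could be added, here is a small pytest-style sketch. The model under test is a stand-in, not an actual prob_mbrl class; the point is only the shape/contract check pattern.

```python
import torch

def make_model(in_dim, out_dim):
    # stand-in for a prob_mbrl model constructor (hypothetical)
    return torch.nn.Sequential(torch.nn.Linear(in_dim, 64),
                               torch.nn.ReLU(),
                               torch.nn.Linear(64, out_dim))

def test_model_output_shape():
    model = make_model(4, 2)
    x = torch.randn(8, 4)
    assert model(x).shape == (8, 2)

test_model_output_shape()  # pytest would discover and run this automatically
```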
Hi there!
My team is thinking of using your codebase as a starting point for a model-based RL project we've just started, because it's in PyTorch and related to our work. I tried running it the other day, but it's a bit fuzzy to me whether deep PILCO is actually solving the task or not. Is there any advice you might have on how to verify this? From the plots that get generated, it seems like the controller ends up returning the same action every time. I also sent an email to the first author, so I do apologize if this is redundant.
Also, is the dependency on the original codebase still relevant, or is that an old comment?
Also, I do apologize if the headline of my inquiry is a bit blunt, but that's basically the gist of my questions.
Thank you!
I'm not quite sure that the behavior of the Policy is correct when collecting samples from it. I can only comment on the deep_pilco_mm.py scenario, but I noticed that in ./prob_mbrl/apply_controller.py there is no call to set the policy to evaluation mode (e.g. policy.eval()). This means, particularly if PEGASUS is being used, that the dropout paths in the neural network being optimized will be fixed to whatever the LAST dropout mask was when collecting new data.
This could be an issue in a scenario where, for example, the policy's parameters get resampled near the end of policy optimization (e.g. step 998/1000), which would change the currently active neurons to a set that was not being optimized for the majority of the current policy iteration cycle.
I can imagine two ways of addressing this.
The way the code is written, resample does not actually change the dropout masks used during training of the policy. The only scenario where that does happen is if the dropout mask and input shapes are misaligned.
My proposal is something to the effect of flagging this when the model is in evaluation mode.
For example, in update_noise(self):
if not self.training:
    self.update_concrete_noise()  # or whatever the method is called
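To make the mode issue concrete, here is a hedged sketch using a plain PyTorch dropout policy (illustrative only, not prob_mbrl's API): switching to eval mode before a rollout makes the actions deterministic, so data collection no longer depends on whatever the last training-time dropout mask happened to be.

```python
import torch

# stand-in policy with dropout (illustrative, not the library's Policy class)
policy = torch.nn.Sequential(torch.nn.Linear(4, 32),
                             torch.nn.ReLU(),
                             torch.nn.Dropout(p=0.1),
                             torch.nn.Linear(32, 1))

x = torch.randn(1, 4)

policy.eval()                   # dropout disabled: actions are deterministic
with torch.no_grad():
    a1 = policy(x)
    a2 = policy(x)

policy.train()                  # back to stochastic masks for optimization
```

In eval mode a1 and a2 are identical; in train mode repeated calls would generally differ because a fresh mask is sampled each forward pass.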
Minor thing, but during the optimization of the dynamics model it looks like the minibatches are always drawn in the same order, and generally during supervised learning you want to reshuffle your minibatches every epoch.
I think the fix is as simple as this:
import numpy as np

def iterate_minibatches(inputs, targets, batchsize):
    assert len(inputs) == len(targets)
    N = len(inputs)
    # original behavior: indices computed and shuffled once, before the loop
    # indices = np.arange(0, max(N, batchsize)) % N
    # np.random.shuffle(indices)
    while True:
        # proposed change: after each epoch, reshuffle the indices
        indices = np.arange(0, max(N, batchsize)) % N
        np.random.shuffle(indices)
        for i in range(0, len(inputs), batchsize):
            idx = indices[i:i + batchsize]
            yield inputs[idx], targets[idx], idx
This is in prob_mbrl/utils/train_regressor.py
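A quick sanity check of the proposed generator (copied here so the snippet is self-contained): each pass over the data still covers every index exactly once, just in a freshly shuffled order each epoch.

```python
import numpy as np

def iterate_minibatches(inputs, targets, batchsize):
    assert len(inputs) == len(targets)
    N = len(inputs)
    while True:
        # reshuffle at the start of every epoch
        indices = np.arange(0, max(N, batchsize)) % N
        np.random.shuffle(indices)
        for i in range(0, len(inputs), batchsize):
            idx = indices[i:i + batchsize]
            yield inputs[idx], targets[idx], idx

X = np.arange(10).reshape(10, 1)
Y = X * 2
gen = iterate_minibatches(X, Y, batchsize=5)
epoch1 = np.concatenate([next(gen)[2] for _ in range(2)])  # indices, epoch 1
epoch2 = np.concatenate([next(gen)[2] for _ in range(2)])  # indices, epoch 2
# both epochs contain every index exactly once, usually in different orders
```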