
prob_mbrl's People

Contributors

anassinator, juancamilog


prob_mbrl's Issues

Inconsistency with forward vs log_probability (more of an inquiry)

In the DiagGaussianDensity class, the log probability is handled differently in forward and log_prob. I'm not sure if this is intentional or not, but the forward method applies the following scaling:

    def forward(...):
        # ... rest of the forward method
        log_std = -torch.nn.functional.softplus(
            -log_std + self.max_log_std) + self.max_log_std

In the log_prob(...) function, though, this scaling isn't applied.

I've tried adding it in the log_prob function, and it seems to restrict how much the model can learn... which presents a new issue. But from playing with this in a strictly supervised learning setting, presumably without the scaling the log-likelihood can just shoot off indefinitely by consistently cranking up the log sigma during optimization :/
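For reference, here's a minimal standalone sketch of that soft clamp (the helper name bound_log_std is mine, not from the codebase), which upper-bounds log_std at max_log_std while staying roughly the identity for smaller values:

    import torch

    def bound_log_std(log_std, max_log_std):
        # approximately the identity for log_std << max_log_std, and
        # saturating at max_log_std as log_std grows, which keeps the
        # predicted sigma from blowing up under maximum likelihood
        return -torch.nn.functional.softplus(
            -log_std + max_log_std) + max_log_std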

Regressor does not un-normalize outputs when not using output density

I think this is a known issue, but just to put it out there: the behavior of the Regressor class is wrong when you don't use an output density and are using normalized inputs.
In the forward function:

    # rest of the forward method in Regressor
    outs = self.model(x, **kwargs)
    if callable(self.output_density):
        scaling_params = (self.my, self.Sy) if normalize else None
        outs = self.output_density(outs,
                                   scaling_params=scaling_params,
                                   **kwargs)

There should be an extra branch in here:

    # rest of the forward method in Regressor
    outs = self.model(x, **kwargs)
    if callable(self.output_density):
        scaling_params = (self.my, self.Sy) if normalize else None
        outs = self.output_density(outs,
                                   scaling_params=scaling_params,
                                   **kwargs)
    elif normalize:
        outs = outs * self.Sy + self.my  # something like this, I think...
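For context, the fix relies on the usual standardization round trip; here's a tiny standalone illustration (hypothetical values, not the repo's code):

    import torch

    my = torch.tensor([1.0, -2.0])   # per-output mean
    Sy = torch.tensor([0.5, 3.0])    # per-output scale
    y = torch.tensor([[2.0, 4.0]])
    y_norm = (y - my) / Sy           # what the model is trained to predict
    assert torch.allclose(y_norm * Sy + my, y)  # the proposed un-normalization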

[FEA] Unit Testing

Hi there!
Thanks for the great work on the codebase!
I am not sure how common unit testing is in RL codebases, but I thought it could be one potential improvement to the code, to improve its maintainability.
Thanks!
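As a concrete example, here's a minimal pytest-style test of the kind that could guard against regressions like the un-normalization issue above (the normalize/unnormalize helpers are hypothetical stand-ins, not the repo's API):

    import numpy as np

    def normalize(y, my, Sy):
        return (y - my) / Sy

    def unnormalize(y_norm, my, Sy):
        return y_norm * Sy + my

    def test_normalization_round_trip():
        # un-normalizing a normalized target should recover the original
        rng = np.random.default_rng(0)
        y = rng.normal(size=(10, 3))
        my, Sy = y.mean(0), y.std(0)
        assert np.allclose(unnormalize(normalize(y, my, Sy), my, Sy), y)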

Does this code work currently?

Hi there!

My team is thinking of using your codebase as a starting point for a model-based RL project we've just started, because it's in PyTorch and related to our work. I tried running it the other day, but it's a little fuzzy to me whether deep PILCO is actually solving the task or not. Is there any advice you might have to verify this? From the plots that get generated, it seems like the controller ends up returning the same action every time. I also sent an email to the first author... so I do apologize if this is redundant.

Also, is the dependency on the original code base still relevant, or is that an old comment?

Also also, I do apologize if the headline of my inquiry is a bit blunt... but that's basically the gist of my questions.

Thank you!

Dropout should be turned off when collecting new samples with the policy

I'm not quite sure that the behavior of the Policy is correct when collecting samples from it. I can only comment on the deep_pilco_mm.py scenario, but I just noticed that in ./prob_mbrl/apply_controller.py there is no call to set the policy to evaluation mode (e.g. policy.eval()). This means, particularly if pegasus is being used, that the paths in the neural network being optimized will be fixed to whatever the LAST dropout mask was when collecting new data.
This could be an issue in a scenario, for example, where the policy's parameters get resampled near the end of policy optimization (e.g. step 998/1000), which would change the currently active neurons to a set that was not being optimized for the majority of the current policy iteration cycle.
I can imagine a few ways of addressing this:

  1. The policy is set to evaluation mode, so dropout is not used when collecting samples (all neurons are used); see the sketch after this list. Since the models have a custom resample() method, I imagine this might actually require additional work to verify that dropout is actually turned off.
  2. Monte Carlo sample policy weights during evaluation and take the average over the number of samples.
  3. Do nothing, as perhaps I am missing something (so maybe this behavior IS what you want, for a reason not necessarily clear to me).
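Here's a minimal sketch of option 1, assuming a plain torch.nn.Dropout policy (as noted in the list, the repo's custom dropout layers may need more than .eval() to really disable dropout):

    import torch

    policy = torch.nn.Sequential(
        torch.nn.Linear(4, 32), torch.nn.ReLU(),
        torch.nn.Dropout(p=0.1), torch.nn.Linear(32, 1))

    def collect_samples(policy, states):
        policy.eval()  # disables standard dropout during the rollout
        try:
            with torch.no_grad():
                return policy(states)
        finally:
            policy.train()  # restore training mode for policy optimization

    actions = collect_samples(policy, torch.randn(8, 4))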

Resample() does not update dropout masks during policy training

The way the code is written, resample() does not actually change the dropout masks used during policy training. The only scenario where it does happen is if the dropout mask and input shapes are misaligned.

My proposal is something to the effect of flagging this when the model is in evaluation mode.
For example, in update_noise(self):

    if not self.training:
        self.update_concrete_noise()  # or whatever it's called
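To make the pattern concrete, here's a hypothetical standalone dropout module (not the repo's implementation) where the mask stays fixed across forward passes until resample() explicitly invalidates it, or the input shape changes:

    import torch

    class FixedMaskDropout(torch.nn.Module):
        def __init__(self, p=0.1):
            super().__init__()
            self.p = p
            self.mask = None

        def resample(self):
            # invalidate the cached mask; the next forward draws a new one
            self.mask = None

        def forward(self, x):
            if self.mask is None or self.mask.shape != x.shape[-1:]:
                # a fresh mask is only drawn here, i.e. after resample()
                # or on a shape mismatch, as described above
                keep = torch.rand(x.shape[-1]) > self.p
                self.mask = keep.float() / (1 - self.p)
            return x * self.mask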

iterate_minibatches does not reshuffle the order of the training data

A minor thing, but during optimization of the dynamics model, it looks like the minibatches are always lumped together in the same order, and generally during supervised learning you want to reshuffle your minibatches every epoch.

I think it's as simple as this:

    import numpy as np

    def iterate_minibatches(inputs, targets, batchsize):
        assert len(inputs) == len(targets)
        N = len(inputs)
        # the two commented-out lines below are the original behavior:
        # indices = np.arange(0, max(N, batchsize)) % N
        # np.random.shuffle(indices)
        while True:
            # proposed change: reshuffle the indices after every epoch
            indices = np.arange(0, max(N, batchsize)) % N
            np.random.shuffle(indices)
            for i in range(0, len(inputs), batchsize):
                idx = indices[i:i + batchsize]
                yield inputs[idx], targets[idx], idx

This is in prob_mbrl/utils/train_regressor.py
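For illustration, here's hypothetical usage of the generator; with the proposed change, each full pass over the data sees a fresh ordering:

    import numpy as np

    X = np.arange(20).reshape(10, 2)
    Y = np.arange(10)
    batches = iterate_minibatches(X, Y, batchsize=4)
    for _ in range(6):  # two epochs' worth of batches for N=10, batchsize=4
        xb, yb, idx = next(batches)
        print(idx)  # the ordering changes between epochs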
