mcgillmrl / prob_mbrl
A library of probabilistic model-based RL algorithms in PyTorch
License: MIT License
In the DiagGaussianDensity class, the log probability is handled inconsistently between methods. I'm not sure if this is intentional, but in the forward method there's the following scaling:
def forward(self, ...):
    log_std = -torch.nn.functional.softplus(
        -log_std + self.max_log_std) + self.max_log_std
In the log_prob(...) function, though, this scaling isn't applied.
I've tried adding it to log_prob and it seems to restrict how much the model can learn, which presents a new issue. But from playing with this in a strictly supervised learning setting, presumably without the clamp the log-likelihood can just shoot off indefinitely by consistently cranking up log_std during optimization :/
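For what it's worth, here is a minimal sketch of what that scaling does (the function name soft_clamp_upper is mine, not the library's): it behaves as a smooth upper bound on log_std, leaving small values nearly unchanged while saturating large values at max_log_std.

```python
import torch
import torch.nn.functional as F

def soft_clamp_upper(log_std, max_log_std):
    # max - softplus(max - x) is a smooth version of min(x, max):
    # for x << max the result is ~x; for x >> max it saturates at max.
    return -F.softplus(-log_std + max_log_std) + max_log_std

max_log_std = torch.tensor(2.0)
small = soft_clamp_upper(torch.tensor(-5.0), max_log_std)  # ~ -5.0 (unchanged)
large = soft_clamp_upper(torch.tensor(50.0), max_log_std)  # ~ 2.0 (saturated)
```

So applying it in forward but not in log_prob means training through log_prob sees an unbounded log_std, which is consistent with the behavior described above.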
I think this is a known issue, but just to put it out there: the behavior of the Regressor class is wrong when you don't use an output density but are using normalized inputs.
In the forward function:
# rest of the forward method in Regressor
outs = self.model(x, **kwargs)
if callable(self.output_density):
    scaling_params = (self.my, self.Sy) if normalize else None
    outs = self.output_density(outs,
                               scaling_params=scaling_params,
                               **kwargs)
There should be an additional branch here:
# rest of the forward method in Regressor
outs = self.model(x, **kwargs)
if callable(self.output_density):
    scaling_params = (self.my, self.Sy) if normalize else None
    outs = self.output_density(outs,
                               scaling_params=scaling_params,
                               **kwargs)
elif normalize:
    outs = outs * self.Sy + self.my  # something like this, I think...
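To illustrate why the elif is needed, here is a hedged sketch (the names my/Sy follow the snippet above; the actual Regressor internals may differ): if targets were standardized during training, a model without an output density predicts in standardized space, so its outputs must be mapped back before being returned.

```python
import torch

my = torch.tensor([3.0])   # target mean used for normalization (assumed)
Sy = torch.tensor([2.0])   # target std used for normalization (assumed)

y = torch.tensor([7.0])    # a raw target
y_n = (y - my) / Sy        # standardized target the model is trained against
y_rec = y_n * Sy + my      # the proposed de-normalization recovers y
```

Without that branch, the caller silently receives predictions in the standardized space.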
Hi there!
Thanks for the great work on the codebase!
I'm not sure how common unit testing is in RL codebases, but I thought it could be one potential improvement to the code, to help its maintainability.
Thanks!
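As a concrete illustration of the kind of test that could be added, here is a small pytest-style sketch. The model under test is a stand-in, not an actual prob_mbrl class; the point is only the shape/contract check pattern.

```python
import torch

def make_model(in_dim, out_dim):
    # stand-in for a prob_mbrl model constructor (hypothetical)
    return torch.nn.Sequential(torch.nn.Linear(in_dim, 64),
                               torch.nn.ReLU(),
                               torch.nn.Linear(64, out_dim))

def test_model_output_shape():
    model = make_model(4, 2)
    x = torch.randn(8, 4)
    assert model(x).shape == (8, 2)

test_model_output_shape()  # pytest would discover and run this automatically
```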
Hi there!
My team is thinking of using your codebase as a starting point for a model-based RL project we've just started, because it's in PyTorch and related to our work. I tried running it the other day, but it's a bit fuzzy to me whether deep PILCO is actually solving the task or not. Is there any advice you might have on how to verify this? From the plots that get generated, it seems like the controller ends up returning the same action every time. I also sent an email to the first author, so I do apologize if this is redundant.
Also, is the dependency on the original codebase still relevant, or is that an old comment?
Also, I do apologize if the headline of my inquiry is a bit blunt, but that's basically the gist of my questions.
Thank you!
I'm not quite sure that the behavior of the Policy is correct when collecting samples from it. I can only comment on the deep_pilco_mm.py scenario, but I noticed that in ./prob_mbrl/apply_controller.py there is no call to set the policy to evaluation mode (e.g. policy.eval()). This means, particularly if PEGASUS is being used, that the dropout paths in the neural network being optimized will be fixed to whatever the LAST dropout mask was when collecting new data.
This could be an issue in a scenario where, for example, the policy's parameters get resampled near the end of policy optimization (e.g. step 998/1000), which would change the currently active neurons to a set that was not being optimized for the majority of the current policy iteration cycle.
I can imagine two ways of addressing this.
The way the code is written, resample does not actually change the dropout masks used during training of the policy. The only scenario where that does happen is if the dropout mask and input shapes are misaligned.
My proposal is something to the effect of flagging this when the model is in evaluation mode.
For example, in update_noise(self):
if not self.training:
    self.update_concrete_noise()  # or whatever the method is called
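To make the mode issue concrete, here is a hedged sketch using a plain PyTorch dropout policy (illustrative only, not prob_mbrl's API): switching to eval mode before a rollout makes the actions deterministic, so data collection no longer depends on whatever the last training-time dropout mask happened to be.

```python
import torch

# stand-in policy with dropout (illustrative, not the library's Policy class)
policy = torch.nn.Sequential(torch.nn.Linear(4, 32),
                             torch.nn.ReLU(),
                             torch.nn.Dropout(p=0.1),
                             torch.nn.Linear(32, 1))

x = torch.randn(1, 4)

policy.eval()                   # dropout disabled: actions are deterministic
with torch.no_grad():
    a1 = policy(x)
    a2 = policy(x)

policy.train()                  # back to stochastic masks for optimization
```

In eval mode a1 and a2 are identical; in train mode repeated calls would generally differ because a fresh mask is sampled each forward pass.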
Minor thing, but during the optimization of the dynamics model it looks like the minibatches are always drawn in the same order, and generally during supervised learning you want to reshuffle your minibatches every epoch.
I think the fix is as simple as this:
import numpy as np

def iterate_minibatches(inputs, targets, batchsize):
    assert len(inputs) == len(targets)
    N = len(inputs)
    # original behavior: indices computed and shuffled once, before the loop
    # indices = np.arange(0, max(N, batchsize)) % N
    # np.random.shuffle(indices)
    while True:
        # proposed change: after each epoch, reshuffle the indices
        indices = np.arange(0, max(N, batchsize)) % N
        np.random.shuffle(indices)
        for i in range(0, len(inputs), batchsize):
            idx = indices[i:i + batchsize]
            yield inputs[idx], targets[idx], idx
This is in prob_mbrl/utils/train_regressor.py
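A quick sanity check of the proposed generator (copied here so the snippet is self-contained): each pass over the data still covers every index exactly once, just in a freshly shuffled order each epoch.

```python
import numpy as np

def iterate_minibatches(inputs, targets, batchsize):
    assert len(inputs) == len(targets)
    N = len(inputs)
    while True:
        # reshuffle at the start of every epoch
        indices = np.arange(0, max(N, batchsize)) % N
        np.random.shuffle(indices)
        for i in range(0, len(inputs), batchsize):
            idx = indices[i:i + batchsize]
            yield inputs[idx], targets[idx], idx

X = np.arange(10).reshape(10, 1)
Y = X * 2
gen = iterate_minibatches(X, Y, batchsize=5)
epoch1 = np.concatenate([next(gen)[2] for _ in range(2)])  # indices, epoch 1
epoch2 = np.concatenate([next(gen)[2] for _ in range(2)])  # indices, epoch 2
# both epochs contain every index exactly once, usually in different orders
```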