ctallec / world-models
Reimplementation of World-Models (Ha and Schmidhuber 2018) in pytorch
License: MIT License
In the file models/mdrnn.py, why is max_log_probs subtracted first (line 38) and then max_log_probs.squeeze() added back (line 43)?
batch = batch.unsqueeze(-2)
normal_dist = Normal(mus, sigmas)
g_log_probs = normal_dist.log_prob(batch)
g_log_probs = logpi + torch.sum(g_log_probs, dim=-1)
max_log_probs = torch.max(g_log_probs, dim=-1, keepdim=True)[0]
g_log_probs = g_log_probs - max_log_probs
g_probs = torch.exp(g_log_probs)
probs = torch.sum(g_probs, dim=-1)
log_prob = max_log_probs.squeeze() + torch.log(probs)
if reduce:
    return - torch.mean(log_prob)
return - log_prob
Is there a corresponding mathematical formula?
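One possible reading of the snippet above is that it applies the log-sum-exp trick. Writing a_k = log(pi_k) + sum_i log N(x_i | mu_{k,i}, sigma_{k,i}) and m = max_k a_k, the identity log(sum_k exp(a_k)) = m + log(sum_k exp(a_k - m)) holds exactly, and subtracting m before exponentiating keeps exp() from underflowing to zero. The sketch below is plain Python with made-up values for a_k, not the repository's code; the function names are hypothetical.

```python
import math

def naive_logsumexp(log_terms):
    # Direct evaluation: exp() underflows to 0.0 for very negative inputs,
    # so the sum can become 0 and the log is no longer defined.
    total = sum(math.exp(a) for a in log_terms)
    return math.log(total) if total > 0.0 else float("-inf")

def stable_logsumexp(log_terms):
    # Subtract the maximum before exponentiating, then add it back.
    # The largest term becomes exp(0) = 1, so the sum is at least 1.
    m = max(log_terms)
    return m + math.log(sum(math.exp(a - m) for a in log_terms))

# Hypothetical per-component values a_k (assumption, for illustration only).
log_terms = [-1000.0, -1001.0, -1002.0]
naive = naive_logsumexp(log_terms)    # underflows
stable = stable_logsumexp(log_terms)  # finite

print(naive, stable)
```

For moderate values both functions agree; they only differ when the direct exponentials underflow or overflow.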
Hello!
Thanks, first of all, for the library. It has been of great help to me!
Now, I wanted to discuss a portion of the code that I believe to be erroneous. In class RolloutGenerator, function get_action_and_transition(), we have the following code:
def get_action_and_transition(self, obs, hidden):
    """ Get action and transition.

    Encode obs to latent using the VAE, then obtain estimation for next
    latent and next hidden state using the MDRNN and compute the controller
    corresponding action.

    :args obs: current observation (1 x 3 x 64 x 64) torch tensor
    :args hidden: current hidden state (1 x 256) torch tensor

    :returns: (action, next_hidden)
        - action: 1D np array
        - next_hidden (1 x 256) torch tensor
    """
    _, latent_mu, _ = self.vae(obs)
    action = self.controller(latent_mu, hidden[0])
    _, _, _, _, _, next_hidden = self.mdrnn(action, latent_mu, hidden)
    return action.squeeze().cpu().numpy(), next_hidden
I think this function description is quite clear. The problem is, it feeds latent_mu to both the controller and the mdrnn network. I would argue that we should use the real latent vector instead (let's call it z).
First, the current implementation is not what they do in the original World models paper, as they describe the controller as, and I quote:
C is a simple single layer linear model that maps z_t and h_t directly to action a_t at each time step.
Second, we train the mdrnn network using the latent vector 'z' (see file trainmdrnn.py, function to_latent()). Therefore, why do we use latent_mu now?
This problem affects both the training and testing of the controller. It might be the reason why you report that the memory module is of little to no help in your experiments (https://ctallec.github.io/world-models/). However, I must say I haven't done any proper testing yet.
I would like to hear your thoughts on this.
RuntimeError: invalid argument 2: size '[-1 x 3 x 96 x 96]' is invalid
for input with 6291456 elements at /pytorch/aten/src/TH/THStorage.c
This happens when trying to run python trainmdrnn.py --logdir exp_dir (after generating the data and training the VAE).
Not sure what is going on there.
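A quick arithmetic check on the numbers in the error message may help narrow this down. The sketch below only tests which frame sizes divide the reported element count evenly; the helper name is hypothetical, and the 64 x 64 candidate is a guess based on the reduced image size used elsewhere in these issues, not a confirmed diagnosis.

```python
def frames_for(numel, channels, size):
    """Return how many (channels x size x size) frames fit exactly into
    numel elements, or None if numel is not a whole number of frames."""
    per_frame = channels * size * size
    return numel // per_frame if numel % per_frame == 0 else None

numel = 6291456  # element count taken from the error message

as_96 = frames_for(numel, 3, 96)  # the shape the code asked for
as_64 = frames_for(numel, 3, 64)  # assumption: frames stored at 64 x 64

print(as_96, as_64)
```

If the stored frames really are a different size from the one the reshape expects, regenerating the data or matching the size constant would be the places to look, but that is an inference from the arithmetic only.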
Thanks to the authors for sharing their code.
I executed the controller on Ubuntu 18.04 with one GPU.
Unfortunately, the program failed because of a multiprocessing problem with CUDA. I then moved all of the module-level code (everything except the function definitions) into a main function and called it.
Now, it works properly.
hi ctallec,
In file mdrnn.py:
The gmm_linear layer seems to have too few output units. Why is the output size defined as (2 * latents + 1) * gaussians + 2? Shouldn't it be 3 * latents * gaussians + 2 (I have also seen this definition in other implementations of the MDN-RNN)? In your definition, the pis seem to be shared across all elements of the latent vector, which does not match my understanding of a GMM. My understanding is that each element of the latent vector has its own GMM; for example, with 3 Gaussian components, each z_i has 3 mus, 3 sigmas and 3 pis. Or have I misunderstood GMMs?
Best,
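The two output sizes in the question correspond to two different mixture layouts, and both are legitimate: a single mixture over the whole latent vector with diagonal Gaussians (one pi per component) versus an independent mixture for every latent dimension (one pi per component per dimension). The sketch below just counts the outputs for each layout. The values latents = 32 and gaussians = 5 are assumptions for illustration, the function names are hypothetical, and the two extra outputs are kept exactly as written in the question.

```python
def shared_pi_outputs(latents, gaussians):
    # One mu and one sigma per (component, latent dimension),
    # one mixture weight per component, plus 2 extra outputs.
    return (2 * latents + 1) * gaussians + 2

def per_dim_pi_outputs(latents, gaussians):
    # One mu, one sigma and one mixture weight per (component, latent dimension),
    # plus 2 extra outputs.
    return 3 * latents * gaussians + 2

latents, gaussians = 32, 5  # assumed sizes, for illustration only
shared = shared_pi_outputs(latents, gaussians)
per_dim = per_dim_pi_outputs(latents, gaussians)

# The layouts differ only in the number of mixture weights.
extra_weights = per_dim - shared

print(shared, per_dim, extra_weights)
```

The difference is (latents - 1) * gaussians mixture weights, so the choice is about how the latent dimensions are coupled rather than about missing means or variances.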
Hi, I find that sleep(0.1) leads to infinite loops in your traincontroller.py: https://github.com/ctallec/world-models/blob/master/traincontroller.py#L95-L97 and https://github.com/ctallec/world-models/blob/master/traincontroller.py#L137-L138.
Could you help?
Hi, I am trying to train the VAE (with the step 2 command). I generated the datasets with the first command (8 threads, 125 rollouts per thread), but after loading the file buffer, it gives the following error:
  File "trainvae.py", line 127, in <module>
    mkdir(vae_dir)
FileNotFoundError: [Errno 2] No such file or directory: 'exp_dir/vae'
Is there anyone facing the same issue? It would be great if you can give me some suggestions about it.
thanks!
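The error is consistent with mkdir being asked to create exp_dir/vae while the parent exp_dir does not exist yet, since os.mkdir does not create missing parents. Creating exp_dir by hand before running should be enough; alternatively, a minimal sketch using os.makedirs is shown below. The helper name ensure_dir is hypothetical, and the example works in a temporary directory rather than in the repository.

```python
import os
import tempfile

def ensure_dir(path):
    # os.makedirs creates any missing parent directories;
    # exist_ok=True makes the call a no-op if the directory already exists.
    os.makedirs(path, exist_ok=True)
    return path

# Demonstration in a throwaway location: neither exp_dir nor exp_dir/vae exists yet.
base = tempfile.mkdtemp()
vae_dir = ensure_dir(os.path.join(base, "exp_dir", "vae"))

# Calling it a second time does not raise.
ensure_dir(vae_dir)
print(os.path.isdir(vae_dir))
```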
Hi, I wonder if the input to the controller should be the latent vector (z) from VAE and hidden vector from RNN?
Line 161 in d6abd9c
But the code here shows that one of the inputs is the gaussian mean instead.
The variable is_best is set to None each time inside the for loop, shouldn't it be outside the loop, as in trainvae.py?
Line 201 in dbc0de1
Installing the requirements with pip install -r requirements.txt produces a broken environment (apparently because of a bug in box2d; installing gym[all], which contains a forked version of box2d, fixes it).
I'm currently trying to train the CMA controller and I keep getting stuck at a reward of around 250-300. After that the controller just stops improving for me. I tried restarting this multiple times, but I'm getting the same result. There are no errors while training. The longest training time was 30 hours in a single session, however the last improvement during that session was after around 10 hours (very small improvement). Is my controller getting stuck at local minima?
The GPU I have here is just a single GTX 970 and I'm only able to run it with 6 workers before running out of memory. Is there further adjustment needed when running on slower hardware?
Hey,
https://github.com/ctallec/world-models/blob/master/models/mdrnn.py#L100
pi = pi.view(seq_len, bs, self.gaussians)
logpi = f.log_softmax(pi, dim=-2)
Here the softmax is done over the batch dimension of pi. Shouldn't the softmax be done over the gaussians dimension (i.e. dim=-1 instead of dim=-2)?
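To make the difference concrete, here is a small plain-Python sketch with made-up logits for a single time step (2 batch elements, 3 Gaussians). It is not the repository's code; nested lists stand in for the tensor and log_softmax is a hand-written helper. Normalizing over the Gaussians axis gives each batch element mixture weights that sum to 1, whereas normalizing over the batch axis does not.

```python
import math

def log_softmax(values):
    # Numerically stable log-softmax over a flat list of logits.
    m = max(values)
    log_z = m + math.log(sum(math.exp(v - m) for v in values))
    return [v - log_z for v in values]

# Hypothetical logits: rows are batch elements, columns are Gaussians.
pi = [[1.0, 2.0, 3.0],
      [0.5, 0.5, 0.5]]

# Equivalent of dim=-1: normalize each row over the Gaussians.
over_gaussians = [log_softmax(row) for row in pi]
row_sums = [sum(math.exp(v) for v in row) for row in over_gaussians]

# Equivalent of dim=-2: normalize each column over the batch, then transpose back.
columns = [log_softmax(list(col)) for col in zip(*pi)]
over_batch = [list(row) for row in zip(*columns)]
batch_row_sums = [sum(math.exp(v) for v in row) for row in over_batch]

print(row_sums, batch_row_sums)
```

With the batch-axis normalization, the weights for one sample depend on the other samples in the batch, which is usually not what a mixture density output is meant to express.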
I am running traincontroller.py headlessly on a Google Cloud VM with Python 3.7 and xvfb. Everything works, but I have noticed what seems to be a linear relationship between the number of workers I allow and the time each worker takes to execute its rollout.
If only one worker is allowed, it can run 200 steps of the environment in 5 seconds. With 10 workers, each worker only gets through 10 steps in the same time, which means the 10 workers together are actually 50% slower at getting through the iterations. (Each worker prints the iteration it is on in its rollout; I added a print statement inside utils/misc.py for this.)
Has anyone else observed a similar effect? What could be wrong with my server? I am not using any GPUs, just CPU to run the VAE and MDRNN.
Thank you.
It is interesting to learn that an untrained RNN produces the same result as a trained one; I had the same doubt about the value of predicting z(t+1).
My question is: if the RNN is untrained, then h is essentially random, so how does this h contribute to the training of the controller?
I find that BCE and KLD are both NaN. Could the code be updated for PyTorch 1.0? I would be very thankful!
Hi,
I have some doubts regarding the controller training:
How long does training usually take (e.g. with the README parameters, one worker and a GTX 1060 Ti)? Does it converge to a specific error value?
I am not really sure how much of a difference increasing the population size and n-samples makes to the training.
Thanks!
I noticed some behavior that is a bit odd. If I comment out the line that runs training in trainmdrnn.py (see here), so that only the test pass runs, the test loss still decreases.
I am confused as to how this can be, since no gradients should be updating anything during test, right?
ETA:
I added this snippet of code in the data_pass
wsum = 0
for w in list(mdrnn.parameters()):
    wsum += torch.norm(w)
print(wsum.item())
and it looks like the mdrnn weights indeed aren't changing during test (only during train) -- but I am still not sure how the test loss can be decreasing.
Hi,
I am currently training the MDRNN with VAEs of different latent vector sizes (LSIZE). I have noticed that the smaller the size, the smaller the GMM loss (and total loss) is. Specifically, with an LSIZE of 4 (and the default RSIZE of 256), the loss goes below zero.
On the other hand, in the code comments of trainmdrnn.py, I saw this: "The LSIZE + 2 factor is here to counteract the fact that the GMMLoss scales approximately linearily with LSIZE". So I suppose that a loss below zero is somewhat expected when using a small LSIZE.
Nonetheless, I wonder how we could interpret and compare the MDRNN losses in order to assess its performance. Also, do you know whether the original World Models implementation also does the "LSIZE + 2" scaling or shows the same below-zero loss? I read their code but could not figure out whether that was the case.
Thanks again!
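On the sign of the loss alone: a negative log-likelihood for a continuous variable can legitimately be negative, because a probability density can exceed 1 when the predicted sigma is small. The sketch below shows this for a single Gaussian and for a diagonal Gaussian whose log-density is summed over LSIZE dimensions. It is plain Python with hypothetical helper names and made-up sigmas, and it says nothing about how the original implementation scales its loss.

```python
import math

def normal_log_pdf(x, mu, sigma):
    # Log-density of a univariate Gaussian.
    return (-0.5 * math.log(2.0 * math.pi)
            - math.log(sigma)
            - 0.5 * ((x - mu) / sigma) ** 2)

# At the mean, a wide Gaussian has density below 1 and a narrow one above 1.
wide = normal_log_pdf(0.0, 0.0, 1.0)     # negative log-density
narrow = normal_log_pdf(0.0, 0.0, 0.01)  # positive log-density

def diag_nll(lsize, sigma):
    # NLL of a point at the mean under a diagonal Gaussian with lsize dimensions:
    # the per-dimension log-densities add up, so the magnitude grows with lsize.
    return -lsize * normal_log_pdf(0.0, 0.0, sigma)

nll_4 = diag_nll(4, 0.01)
nll_32 = diag_nll(32, 0.01)

print(wide, narrow, nll_4, nll_32)
```

Because the per-dimension terms are summed, comparing raw losses across different LSIZE values is only meaningful after dividing by something proportional to LSIZE, which is one way to read the scaling factor quoted above.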
When I run traincontroller.py, everything is normal at the start, but after about 15 minutes the program stops silently. I don't know why, since I have already trained the VAE and the MDRNN. I have tried both on my local computer and on a server, and the issue occurs on both.
There is different behavior when running MDRNN vs. MDRNNCell. Specifically, I give MDRNN and MDRNNCell the same input (MDRNN takes batched sequences as input; I then take a single sequence from the output and compare it against the result of feeding the same sequence to MDRNNCell). I observe that the mus and sigmas match, but the logpi does not. The issue is related to the dimension of the softmax.
Specifically, in MDRNN the softmax is applied along the last dimension (e.g., for a 32x16x5 input, along the dimension of size 5):
Line 100 in b711934
Whereas in MDRNNCell the softmax is applied along the first dimension (e.g., for a 16x5 input, along the dimension of size 16):
Line 148 in b711934
I notice that this was changed in this commit:
5d4261e#diff-949b6b6e9db2dd11dbf333ec6fff33ed
Hi, I am trying to train the VAE (with the step 2 command), but when it tries to load the dataset (dataset/carracing) it gives the following error:
Traceback (most recent call last):
  File "trainvae.py", line 63, in <module>
    dataset_train, batch_size=args.batch_size, shuffle=True, num_workers=2)
  File "/home/s1881460/miniconda3/envs/mlp/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 802, in __init__
    sampler = RandomSampler(dataset)
  File "/home/s1881460/miniconda3/envs/mlp/lib/python3.7/site-packages/torch/utils/data/sampler.py", line 60, in __init__
    self.num_samples = len(self.data_source)
  File "/mnt/mscteach_home/s1881460/world-models/data/loaders.py", line 55, in __len__
    self.load_next_buffer()
  File "/mnt/mscteach_home/s1881460/world-models/data/loaders.py", line 34, in load_next_buffer
    self._buffer_index = self._buffer_index % len(self._files)
ZeroDivisionError: integer division or modulo by zero
Thanks!
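The last frame of the traceback takes a value modulo len(self._files), which raises ZeroDivisionError exactly when that list is empty. That suggests the loader found no files under the dataset directory, so the path is worth double-checking (the issue mentions dataset/carracing, while the generation command quoted elsewhere on this page writes to datasets/carracing). A minimal sketch of a check that fails early with a clearer message is below; list_rollout_files is a hypothetical helper, not part of the repository.

```python
import os
import tempfile

def list_rollout_files(root):
    """Collect all files under root (recursively) and fail early,
    with a readable message, if none are found."""
    files = [os.path.join(dirpath, name)
             for dirpath, _, names in os.walk(root)
             for name in names]
    if not files:
        raise FileNotFoundError(
            f"no rollout files found under {root!r}; check the dataset path")
    return sorted(files)

# Demonstration with a temporary directory standing in for the dataset root.
root = tempfile.mkdtemp()
try:
    list_rollout_files(root)
    empty_detected = False
except FileNotFoundError:
    empty_detected = True

# After adding one (empty, placeholder) file the listing succeeds.
open(os.path.join(root, "rollout_0.npz"), "w").close()
found = list_rollout_files(root)
print(empty_detected, len(found))
```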
Hi,
I am facing this invalid input size error when training the MDRNN.
  File "trainmdrnn.py", line 205, in <module>
    test_loss = test(e)
  File "trainmdrnn.py", line 170, in data_pass
    latent_obs, latent_next_obs = to_latent(obs, next_obs)
  File "trainmdrnn.py", line 108, in to_latent
    [(obs_mu, obs_logsigma), (next_obs_mu, next_obs_logsigma)]]
  File "trainmdrnn.py", line 107, in <listcomp>
    for x_mu, x_logsigma in
RuntimeError: shape '[16, 32, 32]' is invalid for input of size 11264
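The numbers in the error message can be checked by hand. Assuming the requested shape [16, 32, 32] means (batch size, sequence length, latent size), which is an interpretation rather than something stated in the traceback, the sketch below computes how many sequences the tensor actually holds. The helper name is hypothetical.

```python
def implied_batch(numel, seq_len, latent):
    """Return the batch size that numel elements would give for sequences of
    shape (seq_len, latent), or None if it is not a whole number."""
    per_sequence = seq_len * latent
    return numel // per_sequence if numel % per_sequence == 0 else None

expected = 16 * 32 * 32  # elements needed for the requested shape
actual = 11264           # element count from the error message
batch = implied_batch(actual, 32, 32)

print(expected, actual, batch)
```

An implied batch of 11 instead of 16 would be consistent with a final, incomplete batch reaching a reshape that hard-codes the batch size. If that is what is happening, passing drop_last=True to the DataLoader or reshaping with -1 in the batch position are two possible workarounds.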
In trainvae.py:
transform_test = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize((RED_SIZE, RED_SIZE)),
    transforms.ToTensor(),
])
While in trainmdrnn.py:
transform = transforms.Lambda(
    lambda x: np.transpose(x, (0, 3, 1, 2)) / 255)
I don't quite understand why the two transforms differ, or what consequences the difference has for how the VAE encoder processes its input.
The MDRNN training loss and GMM loss decrease abruptly to extremely large negative values, even with gradient clipping.
Was this observed in the originally tested repo, or is it a result of recent PyTorch versions?
The issue persists with a higher-precision PyTorch configuration as well.
Epoch 0: 2912it [00:18, 158.02it/s, loss=-7490883896016783802368.000000 bce= 0.022669 gmm=-7724973763404514721792.000000 mse= 0.000000]
Epoch 0: 100%|██████████████████████████████| 1936/1936 [00:12<00:00, 157.29it/s, loss=-13451842733942434168832.000000 bce= 0.000828 gmm=-13872212352094781308928.000000 mse= 0.000000]
Epoch 1: 2912it [00:18, 157.59it/s, loss=-16901104332652949798912.000000 bce= 0.000793 gmm=-17429263292277607890944.000000 mse= 0.000000]
Epoch 1: 100%|██████████████████████████████| 1936/1936 [00:12<00:00, 156.85it/s, loss=-19335289690015750160384.000000 bce= 0.000749 gmm=-19939516790304420134912.000000 mse= 0.000000]
Epoch 2: 2912it [00:18, 157.39it/s, loss=-20089711310459944042496.000000 bce= 0.000734 gmm=-20717514125435083948032.000000 mse= 0.000000]
Epoch 2: 100%|███████████████████████████████| 1936/1936 [01:09<00:00, 27.85it/s, loss=-20316329081654105604096.000000 bce= 0.000709 gmm=-20951213785059046719488.000000 mse= 0.000000]
Your code is really helpful!
But I am confused about the gmm_loss function. Why is this loss function defined like this? Can anybody help me? I will be very grateful. Thank you.
Doesn't this splitting mean you're actually taking 400 for training and 600 for testing? I think files[:-600] takes the first 400 out of 1000, but I could be misinterpreting what your code is doing here.
Also, I thought the split would reduce the dataset size for the test_loader, but the train and test loaders have the same dataset length and take the same amount of time to complete an epoch. Any idea why this is?
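The slicing itself is easy to confirm in isolation. The sketch below uses a made-up list of 1000 file names (the names are hypothetical) and shows what the two complementary slices contain; which of the two the loader treats as training data is not shown here.

```python
# Hypothetical list standing in for 1000 rollout files.
files = [f"rollout_{i}.npz" for i in range(1000)]

first_part = files[:-600]  # everything except the last 600 entries
last_part = files[-600:]   # the last 600 entries

print(len(first_part), len(last_part))
print(first_part[-1], last_part[0])
```

So with 1000 files, files[:-600] does keep only the first 400, as described in the question.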
I do not see any hidden layer initialization in trainmdrnn.py or mdrnn.py. Can you explain whether this is correct? I think you would need to initialize the hidden layers of the RNN, since your training data may come from many different episodes. Thanks in advance if you can clarify.
I'm reimplementing world-models as an exercise in learning pytorch and I noticed that the reparameterization trick in the VAE is implemented slightly differently here and in the original tensorflow. Specifically, in the original, they reparameterize with z = epsilon*exp(log_sigma / 2) + mu, and in this code we just do z = epsilon*exp(log_sigma) + mu. I'm not very familiar with VAEs, and so I wanted to make sure that this all worked out to the same results, maybe because the encoder weights will double or something. Does this always hash out the same? If we regularize the encoder weights, does it still turn out the same? Is one or the other more correct?
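One way to see the relationship between the two forms: they describe the same sample if the encoder output is read as log sigma in one case and as log variance (2 * log sigma) in the other, provided the KL term uses the matching interpretation. The sketch below checks this numerically in plain Python with made-up values; the function names are hypothetical and this is not either codebase's implementation.

```python
import math

def sample_from_logsigma(mu, log_sigma, eps):
    # z = mu + eps * sigma, with the encoder output read as log(sigma).
    return mu + eps * math.exp(log_sigma)

def sample_from_logvar(mu, log_var, eps):
    # z = mu + eps * sigma, with the encoder output read as log(sigma^2).
    return mu + eps * math.exp(log_var / 2.0)

def kl_from_logsigma(mu, log_sigma):
    # KL(N(mu, sigma^2) || N(0, 1)) written in terms of log(sigma).
    return -0.5 * (1.0 + 2.0 * log_sigma - mu ** 2 - math.exp(2.0 * log_sigma))

def kl_from_logvar(mu, log_var):
    # The same KL written in terms of log(sigma^2).
    return -0.5 * (1.0 + log_var - mu ** 2 - math.exp(log_var))

mu, sigma, eps = 0.3, 0.5, -1.2  # made-up values for illustration
z_a = sample_from_logsigma(mu, math.log(sigma), eps)
z_b = sample_from_logvar(mu, math.log(sigma ** 2), eps)
kl_a = kl_from_logsigma(mu, math.log(sigma))
kl_b = kl_from_logvar(mu, math.log(sigma ** 2))

print(z_a, z_b, kl_a, kl_b)
```

Without regularization the two parameterizations can represent the same posteriors, since the network only has to output a value twice as large in the log-variance case. With a weight penalty they are no longer penalized identically, so the trained models may differ somewhat, though neither form is more correct as long as the sampling and the KL term agree with each other.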
Hi, the train controller seems to be forever waiting for the results in the result queue.
Anything I should look for in particular ?
The command I ran:
python traincontroller.py --logdir exp_dir --n-samples 4 --pop-size 4 --target-return 950 --display --max-workers 1
(I also tried 4 workers, with the same result; with 32 workers I run out of RAM.)
Please find the following errors:
errorController.txt
errorController1.txt
The training result also looks bad:
Loading VAE at epoch 33 with test loss 34.16893509338379
Loading MDRNN at epoch 29 with test loss 1.053613607351445
Loading Controller with reward 316.7092720531355
During the data generation phase using the run command given in the readme, I'm getting an error during import of the utils package. This is because utils exists one level above generation_script.py.
$ python data/generation_script.py --rollouts 1000 --rootdir datasets/carracing --threads 1
xvfb-run -s "-screen 0 1400x900x24" --server-num=1 python data/carracing.py --dir datasets/carracing/thread_0 --rollouts 1001 --policy brown
Traceback (most recent call last):
File "data/carracing.py", line 10, in <module>
from utils.misc import sample_continuous_policy
ModuleNotFoundError: No module named 'utils'
I can get it to run with:
PYTHONPATH='.' python data/generation_script.py --rollouts 1000 --rootdir datasets/carracing --threads 1
Is everyone modifying their paths to get it to run?
Hello, has anyone run into an issue where one or more of the workers created in python traincontroller.py dies without explanation, causing the entire script to hang because it is waiting for the dead workers to finish their evaluations?
I've checked the logs created in tmp for each of the worker processes, and unfortunately the .err logs seem to be uninformative or empty.
I think it might be a GPU memory issue or some issue related to some modifications I made to the CarRacing environment, but the lack of any error logging is concerning. It also does not seem to consistently happen, which is also strange.
Thanks!
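One general way to avoid hanging forever on a result queue is to wait with a timeout and check whether any worker is still alive each time the wait expires. The sketch below illustrates the pattern only; collect_results and any_worker_alive are hypothetical names, and it is demonstrated with queue.Queue rather than the script's actual worker processes. With multiprocessing, the liveness check could be something like lambda: any(p.is_alive() for p in processes).

```python
import queue

def collect_results(result_queue, n_expected, any_worker_alive, timeout=1.0):
    """Gather n_expected results, but stop waiting once every worker has exited."""
    results = []
    while len(results) < n_expected:
        try:
            # Block for at most `timeout` seconds instead of indefinitely.
            results.append(result_queue.get(timeout=timeout))
        except queue.Empty:
            # Nothing arrived in time: only keep waiting if someone can still produce.
            if not any_worker_alive():
                missing = n_expected - len(results)
                raise RuntimeError(f"all workers exited with {missing} results missing")
    return results

# Demonstration: two results are available, so collection finishes normally.
q = queue.Queue()
q.put(("sample_0", 1.5))
q.put(("sample_1", 2.5))
done = collect_results(q, 2, any_worker_alive=lambda: True, timeout=0.01)

# Demonstration: a result is missing and no worker is alive, so it raises instead of hanging.
q.put(("sample_2", 3.5))
try:
    collect_results(q, 2, any_worker_alive=lambda: False, timeout=0.01)
    hung_detected = False
except RuntimeError:
    hung_detected = True

print(done, hung_detected)
```

This does not explain why a worker dies, but it turns a silent hang into an error that at least reports how many evaluations were lost. A worker that exits right after enqueueing its last result could be misreported, so a production version would want one more non-blocking read before raising.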