ctallec / world-models

Reimplementation of World-Models (Ha and Schmidhuber 2018) in PyTorch

License: MIT License

Python 100.00%

reinforcement-learning model-based-rl pytorch

world-models's Introduction

PyTorch implementation of "World Models"

Paper: Ha and Schmidhuber, "World Models", 2018. https://doi.org/10.5281/zenodo.1207631. For a quick summary of the paper and some additional experiments, visit the GitHub page.

Prerequisites

The implementation is based on Python 3 and PyTorch; see the PyTorch website for installation instructions. The remaining requirements are listed in the requirements file. To install them:

pip3 install -r requirements.txt

Running the World Models

The model is composed of three parts:

A Variational Auto-Encoder (VAE), whose task is to compress the input images into a compact latent representation.
A Mixture-Density Recurrent Network (MDN-RNN), trained to predict the latent encoding of the next frame given past latent encodings and actions.
A linear Controller (C), which takes as input both the latent encoding of the current frame and the hidden state of the MDN-RNN (given past latents and actions), and outputs an action. It is trained to maximize the cumulative reward using the Covariance-Matrix Adaptation Evolution Strategy (CMA-ES) from the cma Python package.
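The linear controller in the last item amounts to a single affine map from the concatenated input [z, h] to the action vector, and CMA-ES searches over its flattened weights and bias. The plain-Python sketch below (no PyTorch, so it stays self-contained) illustrates that idea; linear_controller and unpack_params are hypothetical names, not the repository's API, and any squashing or clipping of the output is left out.

```python
from typing import List, Sequence, Tuple


def linear_controller(z: Sequence[float], h: Sequence[float],
                      weights: List[List[float]], bias: Sequence[float]) -> List[float]:
    """Affine map from the concatenated input [z, h] to an action vector."""
    x = list(z) + list(h)  # controller input: latent code followed by hidden state
    if any(len(row) != len(x) for row in weights):
        raise ValueError("each weight row must have len(z) + len(h) entries")
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def unpack_params(params: Sequence[float], in_size: int,
                  out_size: int) -> Tuple[List[List[float]], List[float]]:
    """Split a flat parameter vector (as searched by CMA-ES) into weights and bias."""
    n_weights = out_size * in_size
    if len(params) != n_weights + out_size:
        raise ValueError(f"expected {n_weights + out_size} parameters, got {len(params)}")
    weights = [list(params[r * in_size:(r + 1) * in_size]) for r in range(out_size)]
    bias = list(params[n_weights:])
    return weights, bias


# Toy usage: 2 latent dimensions, 1 hidden unit, 2 actions.
weights, bias = unpack_params([1, 0, 2, 0, 1, -2, 0.1, 0.0], in_size=3, out_size=2)
print(linear_controller([1.0, 2.0], [0.5], weights, bias))  # approximately [2.1, 1.0]
```

Assuming 32 latent dimensions, the 256-unit hidden state shown in the get_action_and_transition docstring quoted in the issues, and 3 actions, this layout has 3 * (32 + 256) + 3 = 867 parameters for CMA-ES to optimize.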

In the given code, all three sections are trained separately, using the scripts trainvae.py, trainmdrnn.py and traincontroller.py.

The training scripts take the following arguments:

--logdir : The directory in which the models are stored. If the specified logdir already exists, the script loads the existing model and resumes training.
--noreload : Add this option to overwrite the model in logdir instead of reloading it.
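The reload behaviour described by these two options can be sketched with argparse as follows. This is an illustration under assumptions, not the scripts' actual code: build_parser, should_reload and the checkpoint file name best.tar are all hypothetical, and the sketch checks for a checkpoint file rather than just the directory.

```python
import argparse
import os
import tempfile

# Hypothetical checkpoint file name; the real scripts may use another one.
CHECKPOINT_NAME = "best.tar"


def build_parser() -> argparse.ArgumentParser:
    """Parser with the two options shared by the training scripts (sketch)."""
    parser = argparse.ArgumentParser(description="World Models training (sketch)")
    parser.add_argument("--logdir", type=str, required=True,
                        help="directory in which models are stored")
    parser.add_argument("--noreload", action="store_true",
                        help="overwrite the model in logdir instead of reloading it")
    return parser


def should_reload(logdir: str, noreload: bool) -> bool:
    """Resume from logdir only if a checkpoint is there and --noreload is absent."""
    return not noreload and os.path.exists(os.path.join(logdir, CHECKPOINT_NAME))


# Usage: a temporary directory stands in for exp_dir.
with tempfile.TemporaryDirectory() as exp_dir:
    args = build_parser().parse_args(["--logdir", exp_dir])
    print(should_reload(args.logdir, args.noreload))  # False: no checkpoint yet
    open(os.path.join(exp_dir, CHECKPOINT_NAME), "w").close()
    print(should_reload(args.logdir, args.noreload))  # True
```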

1. Data generation

Before launching the VAE and MDN-RNN training scripts, you need to generate a dataset of random rollouts and place it in the datasets/carracing folder.

Data generation is handled through the data/generation_script.py script, e.g.

python data/generation_script.py --rollouts 1000 --rootdir datasets/carracing --threads 8
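Judging from the worker command echoed in one of the issues below (1000 rollouts with --threads 1 becomes a single data/carracing.py call with --rollouts 1001 and --server-num=1), the generation script appears to launch one xvfb-run'd carracing.py process per thread. The sketch below rebuilds such command lines; thread_commands is a hypothetical helper, and both the rollouts // threads + 1 split and the server numbering are inferred from that one log line, not from the script's source.

```python
from typing import List


def thread_commands(rollouts: int, rootdir: str, threads: int) -> List[str]:
    """Build one data/carracing.py command per thread (sketch).

    Assumed from a single logged command: each thread gets
    rollouts // threads + 1 rollouts and thread i uses --server-num=i + 1.
    """
    per_thread = rollouts // threads + 1
    return [
        f'xvfb-run -s "-screen 0 1400x900x24" --server-num={i + 1} '
        f"python data/carracing.py --dir {rootdir}/thread_{i} "
        f"--rollouts {per_thread} --policy brown"
        for i in range(threads)
    ]


for cmd in thread_commands(1000, "datasets/carracing", 8):
    print(cmd)  # thread_0 ... thread_7, each with --rollouts 126
```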

Rollouts are generated with a Brownian random policy rather than gym's white-noise action_space.sample() policy, which yields more consistent rollouts.
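A Brownian policy can be sketched as a random walk in action space: each action is the previous one plus Gaussian noise scaled by sqrt(dt), clipped to the action bounds, so consecutive actions are correlated instead of independent. The plain-Python version below is an illustration under assumptions: brownian_actions is a hypothetical name, the step size dt is an arbitrary choice, and the example bounds are CarRacing's steering, gas and brake ranges ([-1, 1], [0, 1], [0, 1]).

```python
import math
import random
from typing import List, Sequence


def brownian_actions(low: Sequence[float], high: Sequence[float], seq_len: int,
                     dt: float = 1.0 / 50, seed: int = 0) -> List[List[float]]:
    """Return seq_len actions from a clipped Brownian motion (sketch).

    The first action is uniform in [low, high]; every later action adds
    sqrt(dt) * N(0, 1) noise per dimension to the previous one and clips it.
    """
    rng = random.Random(seed)
    action = [rng.uniform(lo, hi) for lo, hi in zip(low, high)]
    actions = [action]
    for _ in range(seq_len - 1):
        action = [min(hi, max(lo, a + math.sqrt(dt) * rng.gauss(0.0, 1.0)))
                  for a, lo, hi in zip(action, low, high)]
        actions.append(action)
    return actions


# CarRacing-style bounds: steering in [-1, 1], gas and brake in [0, 1].
seq = brownian_actions([-1.0, 0.0, 0.0], [1.0, 1.0, 1.0], seq_len=5, seed=1)
for a in seq:
    print([round(v, 3) for v in a])
```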

2. Training the VAE

The VAE is trained using the trainvae.py file, e.g.

python trainvae.py --logdir exp_dir

3. Training the MDN-RNN

The MDN-RNN is trained using the trainmdrnn.py file, e.g.

python trainmdrnn.py --logdir exp_dir

A VAE must have been trained in the same exp_dir for this script to work.

4. Training and testing the Controller

Finally, the controller is trained using CMA-ES, e.g.

python traincontroller.py --logdir exp_dir --n-samples 4 --pop-size 4 --target-return 950 --display

You can test the obtained policy with test_controller.py, e.g.

python test_controller.py --logdir exp_dir

Notes

When running on a headless server, you will need to use xvfb-run to launch the controller training script. For instance,

xvfb-run -s "-screen 0 1400x900x24" python traincontroller.py --logdir exp_dir --n-samples 4 --pop-size 4 --target-return 950 --display

If you do not have a display available and you launch traincontroller without xvfb-run, the script will fail silently (but logs are available in logdir/tmp).

Be aware that traincontroller uses a lot of GPU memory when launched on GPUs. To reduce the memory load, lower the maximum number of workers with the --max-workers argument.

If you have several GPUs available, traincontroller will use all GPUs listed in CUDA_VISIBLE_DEVICES.

Authors

Corentin Tallec - ctallec
Léonard Blier - leonardblier
Diviyan Kalainathan - diviyan-kalainathan

License

This project is licensed under the MIT License - see the LICENSE.md file for details


world-models's Issues

Worker dying issue with controller training

Hello, has anyone run into an issue where one or more of the workers created by python traincontroller.py die without explanation, causing the entire script to hang because it is waiting for the dead workers to finish their evaluations?

I've checked the logs created in tmp for each of the worker processes, and unfortunately the .err logs are uninformative or empty.

I think it might be a GPU memory issue or something related to modifications I made to the CarRacing environment, but the lack of any error logging is concerning. It also does not happen consistently, which is also strange.

Thanks!

Multiprocessing very slow

I am running traincontroller.py headlessly on a Google Cloud VM with Python 3.7 and xvfb. Everything works, but I have noticed what seems to be a linear relationship between the number of workers I allow and the time each worker takes to execute its rollout.

With only one worker allowed, it runs 200 environment steps in 5 seconds. With 10 workers, each worker only gets through 10 steps in that time, i.e. 100 steps in total, so the 10 workers together are actually 50% slower (each worker prints the iteration it is on in its rollout; I added a print statement in utils/misc.py for this)!

Has anyone else observed a similar effect? What could be wrong with my server? I am not using any GPUs, only the CPU, to run the VAE and MDRNN.

Thank you.

a multi-process problem in the controller

Thanks to the authors for sharing their code.
I ran the controller training on Ubuntu 18.04 with one GPU.
Unfortunately, the program failed because of a multiprocessing problem with CUDA. To work around it, I moved all module-level code (everything except the function definitions) into a main function and called it.
Now it works properly.

Training the VAE with PyTorch 1.0 fails

I find that BCE and KLD are both nan. Could the code be updated for PyTorch 1.0? I would be very thankful!

sleep(0.1) leads to infinite loops

Hi, I find that sleep(0.1) leads to infinite loops in your traincontroller.py: https://github.com/ctallec/world-models/blob/master/traincontroller.py#L95-L97 and https://github.com/ctallec/world-models/blob/master/traincontroller.py#L137-L138.

Could you help?
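One way to avoid waiting forever on a worker that has died (the same symptom as in the worker-dying and r_queue issues) is to replace a sleep-and-poll loop with a blocking get that has a timeout, and fail loudly when it expires. This is a sketch, not the repository's code: collect_results is a hypothetical helper, and it uses queue.Queue for a self-contained demo; multiprocessing.Queue.get(timeout=...) also raises queue.Empty on timeout, so the same pattern applies there.

```python
import queue
from typing import Any, List


def collect_results(r_queue: queue.Queue, expected: int, timeout: float) -> List[Any]:
    """Collect `expected` results, waiting at most `timeout` seconds for each one.

    Raises TimeoutError instead of polling forever if a worker never reports.
    """
    results: List[Any] = []
    while len(results) < expected:
        try:
            results.append(r_queue.get(timeout=timeout))
        except queue.Empty:
            raise TimeoutError(f"got {len(results)} of {expected} results; "
                               "a worker may have died") from None
    return results


q: queue.Queue = queue.Queue()
q.put((0, 812.5))  # hypothetical (worker id, return) pairs
q.put((1, 640.0))
print(collect_results(q, expected=2, timeout=1.0))
```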

Inconsistent MDRNN / MDRNNCell behavior

MDRNN and MDRNNCell behave differently. Specifically, I give MDRNN and MDRNNCell the same input (MDRNN takes batched sequences as input; I then take only one sequence from its output and compare it against the output of MDRNNCell given the same sequence as input). I observe that the mus and sigmas match, but logpi does not. The issue is related to the dimension of the softmax.

Specifically, in MDRNN the softmax is applied along the last dimension (e.g., for a 32x16x5 input, along the dimension of size 5):

world-models/models/mdrnn.py, line 100 (commit b711934):

logpi = f.log_softmax(pi, dim=-1)

Whereas in MDRNNCell the softmax is applied along the first dimension (e.g., for a 16x5 input, along the dimension of size 16):

world-models/models/mdrnn.py, line 148 (commit b711934):

logpi = f.log_softmax(pi, dim=-2)

I notice that this was changed in this commit:
5d4261e#diff-949b6b6e9db2dd11dbf333ec6fff33ed
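For a Gaussian mixture, the weights have to be normalized over the mixture components for each time step and batch element. That is the last axis both for MDRNN (seq_len x batch x gaussians) and for MDRNNCell (batch x gaussians), i.e. dim=-1 in f.log_softmax. A plain-Python sketch of that normalization (log_softmax here is a local helper, not torch.nn.functional):

```python
import math
from typing import List, Sequence


def log_softmax(values: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vector of mixture logits."""
    m = max(values)
    log_norm = m + math.log(sum(math.exp(v - m) for v in values))
    return [v - log_norm for v in values]


# pi has shape (seq_len, batch, gaussians); normalizing each innermost list
# gives every (t, b) entry its own distribution over the mixture components.
pi = [[[0.0, 1.0, 2.0], [1.0, 1.0, 1.0]]]  # seq_len=1, batch=2, gaussians=3
logpi = [[log_softmax(row) for row in step] for step in pi]
for step in logpi:
    for row in step:
        print(round(sum(math.exp(v) for v in row), 6))  # 1.0 for every (t, b)
```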

Splitting of Train and Validation / Test set

Doesn't this splitting mean you're actually taking 400 for training and 600 for testing? I think files[:-600] takes the first 400 out of 1000, but I could be misinterpreting what your code is doing here.

Also, I expected the split to reduce the size of the test_loader's dataset, but the train and test loaders have datasets of the same length and take the same amount of time to complete an epoch. Any idea why this is?
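For a plain Python list of 1000 file names, the slice does behave as described: files[:-600] drops the last 600 entries and keeps the first 400. A quick check with hypothetical file names:

```python
files = [f"rollout_{i}.npz" for i in range(1000)]  # hypothetical file names

train_files = files[:-600]  # everything except the last 600 -> first 400
test_files = files[-600:]   # the last 600
print(len(train_files), len(test_files))  # 400 600

# A 600/400 split would instead keep the first 600 for training:
train_alt, test_alt = files[:600], files[600:]
print(len(train_alt), len(test_alt))  # 600 400
```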

MDRNN losses extremely low due to numerical instability?

MDRNN training and GMM losses drop abruptly to extremely low values, even with gradient clipping.
Was this observed in the originally tested repo, or is it a result of recent PyTorch versions?
The issue persists with a higher-precision PyTorch configuration as well.

Epoch 0: 2912it [00:18, 158.02it/s, loss=-7490883896016783802368.000000 bce=  0.022669 gmm=-7724973763404514721792.000000 mse=  0.000000]                                                

Epoch 0: 100%|██████████████████████████████| 1936/1936 [00:12<00:00, 157.29it/s, loss=-13451842733942434168832.000000 bce=  0.000828 gmm=-13872212352094781308928.000000 mse=  0.000000]

Epoch 1: 2912it [00:18, 157.59it/s, loss=-16901104332652949798912.000000 bce=  0.000793 gmm=-17429263292277607890944.000000 mse=  0.000000]                                              

Epoch 1: 100%|██████████████████████████████| 1936/1936 [00:12<00:00, 156.85it/s, loss=-19335289690015750160384.000000 bce=  0.000749 gmm=-19939516790304420134912.000000 mse=  0.000000]

Epoch 2: 2912it [00:18, 157.39it/s, loss=-20089711310459944042496.000000 bce=  0.000734 gmm=-20717514125435083948032.000000 mse=  0.000000]                                              

Epoch 2: 100%|███████████████████████████████| 1936/1936 [01:09<00:00, 27.85it/s, loss=-20316329081654105604096.000000 bce=  0.000709 gmm=-20951213785059046719488.000000 mse=  0.000000]

Training the controller and getting stuck in local minima

I'm currently trying to train the CMA controller and I keep getting stuck at a reward of around 250-300. After that, the controller stops improving. I tried restarting this multiple times, but I get the same result. There are no errors while training. The longest training time was 30 hours in a single session; however, the last improvement during that session was after around 10 hours (and a very small one). Is my controller getting stuck in a local minimum?
The GPU I have here is just a single GTX 970 and I'm only able to run it with 6 workers before running out of memory. Is there further adjustment needed when running on slower hardware?

Possible error when predicting next action (class RolloutGenerator)

Hello!

Thanks, first of all, for the library. It has been of great help to me!

Now, I wanted to discuss a portion of the code that I believe to be erroneous. In class RolloutGenerator, function get_action_and_transition(), we have the following code:

def get_action_and_transition(self, obs, hidden):
    """ Get action and transition.
    Encode obs to latent using the VAE, then obtain estimation for next
    latent and next hidden state using the MDRNN and compute the controller
    corresponding action.
    :args obs: current observation (1 x 3 x 64 x 64) torch tensor
    :args hidden: current hidden state (1 x 256) torch tensor
    :returns: (action, next_hidden)
        - action: 1D np array
        - next_hidden (1 x 256) torch tensor
    """
    _, latent_mu, _ = self.vae(obs)
    action = self.controller(latent_mu, hidden[0])
    _, _, _, _, _, next_hidden = self.mdrnn(action, latent_mu, hidden)
    return action.squeeze().cpu().numpy(), next_hidden

I think this function description is quite clear. The problem is, it feeds latent_mu to both the controller and the mdrnn network. I would argue that we should use the real latent vector instead (let's call it z).

First, the current implementation is not what they do in the original World models paper, as they describe the controller as, and I quote:

C is a simple single layer linear model that maps z_t and h_t directly to action a_t at each time step.

Second, we train the mdrnn network using the latent vector 'z' (see file trainmdrnn.py, function to_latent()). Therefore, why do we use latent_mu now?

This problem affects both the training and testing of the controller. It might be the reason why you report that the memory module is of little to no help in your experiments (https://ctallec.github.io/world-models/). However, I must say I haven't done any proper testing yet.

I would like to hear your thoughts on this.

Controller training random stop

Please find the following errors:

errorController.txt
errorController1.txt

The training result also looks bad:
Loading VAE at epoch 33 with test loss 34.16893509338379
Loading MDRNN at epoch 29 with test loss 1.053613607351445
Loading Controller with reward 316.7092720531355

Problem training the VAE

Hi, I am trying to train the VAE (with the step 2 command). I generated the datasets with the first command (8 threads, 125 rollouts per thread), but after loading the file buffer it gives the following error:

  File "trainvae.py", line 127, in <module>
    mkdir(vae_dir)
FileNotFoundError: [Errno 2] No such file or directory: 'exp_dir/vae'

Is anyone else facing the same issue? Any suggestions would be much appreciated.

thanks!
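Assuming mkdir here is os.mkdir, it creates only the last component of a path, so mkdir('exp_dir/vae') raises FileNotFoundError when exp_dir does not exist yet. Creating exp_dir by hand before running trainvae.py should avoid the error; alternatively, as a local patch (not the upstream code), the call can be replaced with os.makedirs(..., exist_ok=True), which also creates missing parents. A sketch, with ensure_dir as a hypothetical helper:

```python
import os
import tempfile


def ensure_dir(path: str) -> str:
    """Create path and any missing parent directories; no error if it exists."""
    os.makedirs(path, exist_ok=True)
    return path


with tempfile.TemporaryDirectory() as root:
    vae_dir = os.path.join(root, "exp_dir", "vae")
    try:
        os.mkdir(vae_dir)  # fails: root/exp_dir does not exist yet
    except FileNotFoundError as err:
        print("os.mkdir failed:", err.strerror)
    ensure_dir(vae_dir)  # creates exp_dir and exp_dir/vae
    ensure_dir(vae_dir)  # calling it again is harmless
    print(os.path.isdir(vae_dir))  # True
```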

No init_hidden for mdrnn?

I do not see any initialization of the hidden state in trainmdrnn.py or mdrnn.py. Is this correct? I think you would need to initialize the RNN's hidden state, since your training data may come from many different episodes. Thanks in advance for any clarification.

Negative GMM loss. How to interpret?

Hi,

I am currently training the MDN-RNN with VAEs of different latent vector sizes (LSIZE). I have noticed that the smaller the size, the smaller the GMM loss (and total loss). Specifically, with an LSIZE of 4 (and the default RSIZE of 256), the loss goes below zero.

On the other hand, on the code comments of trainmdrnn.py, I saw that: "The LSIZE + 2 factor is here to counteract the fact that the GMMLoss scales approximately linearily with LSIZE". So I suppose that the fact that the loss goes below zero is somewhat expected when using a small LSIZE.

Nonetheless, I wonder how we could interpret and compare the MDN-RNN losses in order to assess its performance. Also, do you know whether the original World Models implementation also does the "LSIZE + 2" scaling, or shows the below-zero loss? I read their code but could not figure out whether that was the case.

Thanks again!
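One way to read the sign: the GMM loss is a negative log-likelihood of continuous latent vectors, i.e. the negative log of a probability density, and densities can exceed 1. A narrow Gaussian centered on the target therefore gives a negative loss, so a value below zero is not by itself a sign of a bug. A one-dimensional sketch (gaussian_nll is a hypothetical helper):

```python
import math


def gaussian_nll(x: float, mu: float, sigma: float) -> float:
    """Negative log-density of N(mu, sigma^2) evaluated at x."""
    return 0.5 * math.log(2 * math.pi) + math.log(sigma) + 0.5 * ((x - mu) / sigma) ** 2


print(round(gaussian_nll(0.0, 0.0, 1.0), 3))  # 0.919: density below 1, positive NLL
print(round(gaussian_nll(0.0, 0.0, 0.1), 3))  # -1.384: density near 4, negative NLL
```

Because gmm_loss sums such per-dimension log-densities over the latent dimensions, its scale also depends on LSIZE, so loss values are most comparable between runs that use the same LSIZE.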

Shouldn't this be outside the for loop?

The variable cur_best is set to None on every iteration of the for loop; shouldn't it be initialized outside the loop, as in trainvae.py?

world-models/trainmdrnn.py, line 201 (commit dbc0de1):

cur_best = None
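If cur_best is reset to None at the start of every epoch, the "better than the best so far" test always passes and every epoch is marked as the best. A sketch of the intended pattern, with track_best as a hypothetical function rather than the repository's code:

```python
from typing import Iterable, List, Optional


def track_best(test_losses: Iterable[float]) -> List[bool]:
    """Return, per epoch, whether that epoch's test loss is the best so far."""
    cur_best: Optional[float] = None  # initialized once, before the epoch loop
    flags = []
    for loss in test_losses:
        is_best = cur_best is None or loss < cur_best
        if is_best:
            cur_best = loss  # this is where a "best" checkpoint would be saved
        flags.append(is_best)
    return flags


print(track_best([3.0, 2.0, 2.5, 1.0]))  # [True, True, False, True]
```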

Requirements are wrong?

Installing the requirements with pip install -r requirements.txt leaves the environment broken (apparently because of a bug in box2d, which is fixed by installing gym[all], since it contains a forked version of box2d).

openai/gym#647

One question about the gmm_loss function

Your code is really helpful!
But I am confused about the gmm_loss function. Why is this loss function defined like this? Can anybody help me? I would be very grateful. Thank you.
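Reading the gmm_loss code quoted in a later issue, the loss appears to be the negative log-likelihood of the next latent vector under a Gaussian mixture with diagonal covariance, where each of the K components (K = gaussians) has its own mean and standard deviation per latent dimension and a single weight. With n indexing the N (time step, batch element) pairs and D = LSIZE, one way to write it is:

```latex
\mathcal{L}_{\mathrm{GMM}}
  = -\frac{1}{N}\sum_{n=1}^{N}
    \log \sum_{k=1}^{K} \pi_{k}^{(n)}
    \prod_{d=1}^{D}
    \mathcal{N}\!\left(z_{d}^{(n)};\, \mu_{k,d}^{(n)},\, \bigl(\sigma_{k,d}^{(n)}\bigr)^{2}\right)
```

In log space the product over d becomes the sum of normal_dist.log_prob(batch) over the last dimension, logpi is added to give the log of each weighted component, the outer log-sum over k is evaluated stably by subtracting the maximum, and reduce=True takes the mean over n.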

Data generation script: No module named 'utils'

During the data generation phase using the run command given in the readme, I'm getting an error during import of the utils package. This is because utils exists one level above generation_script.py.

$ python data/generation_script.py --rollouts 1000 --rootdir datasets/carracing --threads 1
xvfb-run -s "-screen 0 1400x900x24" --server-num=1 python data/carracing.py --dir datasets/carracing/thread_0 --rollouts 1001 --policy brown
Traceback (most recent call last):
  File "data/carracing.py", line 10, in <module>
    from utils.misc import sample_continuous_policy
ModuleNotFoundError: No module named 'utils'

I can get it to run with:

PYTHONPATH='.' python data/generation_script.py --rollouts 1000 --rootdir datasets/carracing --threads 1

Is everyone modifying their paths to get it to run?

trainmdrnn: running only the test pass, the test loss decreases?

I noticed some odd behavior. If I comment out the line in trainmdrnn.py that runs training (see here), so that only the test pass runs, the test loss still decreases.
I am confused as to how this can be, since no gradients should be updating anything during testing, right?

ETA:
I added this snippet of code to data_pass:

wsum = 0
for w in list(mdrnn.parameters()):
    wsum += torch.norm(w)
print(wsum.item())

and it looks like the mdrnn weights indeed aren't changing during test (only during train) -- but I am still not sure how the test loss can be decreasing.

untrained RNN

It is interesting to learn that an untrained RNN produces the same results as a trained one; I had the same doubts about the value of predicting z(t+1).
My question is: if the RNN is untrained, then h is essentially random, so how does h contribute to the training of the controller?

Having trouble with generation_script.py

I get this error when executing generation_script.py.

Could you help me resolve this error?

I tried with python=3.5 and 3.6, pytorch=4.0, cuda=9.0, cudnn=7, and ran pip install -r requirements.txt.

traincontroller issue - Stuck waiting for r_queue

Hi, the train controller seems to be forever waiting for the results in the result queue.

Is there anything in particular I should look for?

The command I ran:
python traincontroller.py --logdir exp_dir --n-samples 4 --pop-size 4 --target-return 950 --display --max-workers 1 (with 4 workers the result is the same; with 32 workers my RAM explodes)

[Error] Cannot train the VAE

Hi, I am trying to train the VAE (with the step 2 command), but when it tries to load the dataset (dataset/carracing) it gives the following error:

Traceback (most recent call last):
  File "trainvae.py", line 63, in <module>
    dataset_train, batch_size=args.batch_size, shuffle=True, num_workers=2)
  File "/home/s1881460/miniconda3/envs/mlp/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 802, in __init__
    sampler = RandomSampler(dataset)
  File "/home/s1881460/miniconda3/envs/mlp/lib/python3.7/site-packages/torch/utils/data/sampler.py", line 60, in __init__
    self.num_samples = len(self.data_source)
  File "/mnt/mscteach_home/s1881460/world-models/data/loaders.py", line 55, in __len__
    self.load_next_buffer()
  File "/mnt/mscteach_home/s1881460/world-models/data/loaders.py", line 34, in load_next_buffer
    self._buffer_index = self._buffer_index % len(self._files)
ZeroDivisionError: integer division or modulo by zero

Thanks!
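The ZeroDivisionError is raised by % len(self._files), which means the loader's file list was empty, i.e. no rollout files were found under the dataset directory. Note that the README generates data under datasets/carracing, while this report mentions dataset/carracing. A guard along these lines turns the crash into an explicit message; list_rollout_files and the .npz suffix are assumptions, not the repository's loader.

```python
import os
import tempfile
from typing import List


def list_rollout_files(root: str, suffix: str = ".npz") -> List[str]:
    """List rollout files under root (recursively); raise a clear error if none exist."""
    files = sorted(
        os.path.join(dirpath, name)
        for dirpath, _, names in os.walk(root)  # yields nothing if root is missing
        for name in names
        if name.endswith(suffix)
    )
    if not files:
        raise FileNotFoundError(
            f"no '*{suffix}' rollout files found under {root!r}; "
            "check the path passed to data generation (--rootdir)")
    return files


with tempfile.TemporaryDirectory() as root:
    try:
        list_rollout_files(root)
    except FileNotFoundError as err:
        print(err)
```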

Wrong dimension for pi softmax

Hey,

https://github.com/ctallec/world-models/blob/master/models/mdrnn.py#L100

pi = pi.view(seq_len, bs, self.gaussians)
logpi = f.log_softmax(pi, dim=-2)

Here the softmax is done over the batch dimension of pi. Shouldn't the softmax be done over the gaussians dimension (ie. dim=-1 instead of dim=-2)?

Reparameterization trick coefficient

I'm reimplementing World Models as an exercise in learning PyTorch, and I noticed that the reparameterization trick in the VAE is implemented slightly differently here than in the original TensorFlow code. Specifically, the original reparameterizes with z = epsilon*exp(log_sigma / 2) + mu, whereas this code uses z = epsilon*exp(log_sigma) + mu. I'm not very familiar with VAEs, so I wanted to make sure that this works out to the same results, maybe because the encoder weights will double or something. Does this always work out the same? If we regularize the encoder weights, is it still the same? Is one or the other more correct?
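Both forms sample z = mu + sigma * epsilon; they differ only in what the encoder output is taken to mean: log(sigma) in this code versus log(sigma^2) in the expression quoted from the original. Since log(sigma^2) = 2 log(sigma), a linear output layer can represent either one by scaling its weights and bias by 2, and the samples agree as long as the KL term uses the same convention. With weight regularization the two are not strictly equivalent, since the penalty depends on that scale. The check below covers the sampling step only, and both function names are hypothetical.

```python
import math
import random


def sample_from_logsigma(mu: float, logsigma: float, eps: float) -> float:
    """z = mu + eps * exp(logsigma): the encoder output is log(sigma)."""
    return mu + eps * math.exp(logsigma)


def sample_from_logvar(mu: float, logvar: float, eps: float) -> float:
    """z = mu + eps * exp(logvar / 2): the encoder output is log(sigma^2)."""
    return mu + eps * math.exp(logvar / 2)


mu, logsigma = 0.3, -0.7
eps = random.gauss(0.0, 1.0)  # the same noise is used for both forms
print(sample_from_logsigma(mu, logsigma, eps))
print(sample_from_logvar(mu, 2 * logsigma, eps))  # same value, logvar = 2 * logsigma
```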

Controller Input

Hi, shouldn't the inputs to the controller be the latent vector (z) from the VAE and the hidden vector from the RNN?

world-models/utils/misc.py, line 161 (commit d6abd9c):

_, _, _, _, _, next_hidden = self.mdrnn(action, latent_mu, hidden)

But the code here shows that one of the inputs is the gaussian mean instead.

Is the definition of the GMM linear layer wrong, or have I missed something?

hi ctallec,
In file mdrnn.py:
The number of output units in the gmm_linear layer seems too small: why is the output size defined as (2 * latents + 1) * gaussians + 2? Shouldn't it be 3 * latents * gaussians + 2 (I have also seen this definition in other MDN-RNN implementations)? Your definition seems to share the pis across all latent dimensions, which does not fit my understanding of GMMs. My understanding is that each element of the latent vector has its own GMM: for example, with 3 Gaussians, each z_i has 3 mus, 3 sigmas and 3 pis. Or have I misunderstood something about GMMs?
Best,

traincontroller always stops after about 15 minutes of training

When I run traincontroller.py, it behaves normally at first, but after about 15 minutes the program stops silently. I don't know why, because I have already trained the VAE and the MDN-RNN. I have tried both on a local computer and on a server, and the issue always occurs.

Different transform in trainvae.py & trainmdrnn.py

In trainvae.py:
transform_test = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize((RED_SIZE, RED_SIZE)),
    transforms.ToTensor(),
])

While in trainmdrnn.py:
transform = transforms.Lambda(
    lambda x: np.transpose(x, (0, 3, 1, 2)) / 255)

I don't quite understand why there is such a difference, or what its consequences are for the VAE encoder's processing.

RuntimeError: invalid argument 2: size '[-1 x 3 x 96 x 96]' is invalid for input with 6291456 elements at /pytorch/aten/src/TH/THStorage.c

When trying to run python trainmdrnn.py --logdir exp_dir (After generation and training vae).

Not sure what is going on there.

[Question] Some controller training questions

Hi,

I have some doubts regarding the controller training:

What do the screen outputs mean?

How long does training usually take (e.g. with the README parameters, one worker, a 1060 Ti GTX)? Does it converge to a specific error value?
I am also not sure how much difference increasing the population size and n-samples makes to training.

Thanks!

Error training the MDN-RNN

Hi,

I am getting this invalid input size error when training the MDRNN:

  File "trainmdrnn.py", line 205, in <module>
    test_loss = test(e)
  File "trainmdrnn.py", line 170, in data_pass
    latent_obs, latent_next_obs = to_latent(obs, next_obs)
  File "trainmdrnn.py", line 108, in to_latent
    [(obs_mu, obs_logsigma), (next_obs_mu, next_obs_logsigma)]]
  File "trainmdrnn.py", line 107, in <listcomp>
    for x_mu, x_logsigma in
RuntimeError: shape '[16, 32, 32]' is invalid for input of size 11264

Issue about gmm_loss

In the file models/mdrnn.py, why is max_log_probs subtracted first (line 38) and then max_log_probs.squeeze() added back (line 43)?

batch = batch.unsqueeze(-2)
normal_dist = Normal(mus, sigmas)
g_log_probs = normal_dist.log_prob(batch)
g_log_probs = logpi + torch.sum(g_log_probs, dim=-1)
max_log_probs = torch.max(g_log_probs, dim=-1, keepdim=True)[0]
g_log_probs = g_log_probs - max_log_probs
g_probs = torch.exp(g_log_probs)
probs = torch.sum(g_probs, dim=-1)

log_prob = max_log_probs.squeeze() + torch.log(probs)
if reduce:
    return - torch.mean(log_prob)
return - log_prob

Is there a corresponding mathematical formula?
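The formula behind those two lines is the log-sum-exp identity. With a_k = log pi_k + sum_d log N(z_d; mu_{k,d}, sigma_{k,d}) (g_log_probs in the code), log sum_k exp(a_k) = m + log sum_k exp(a_k - m) holds for any constant m. Choosing m = max_k a_k keeps every exponent at or below 0, so exp() cannot overflow, and makes at least one term equal to 1, so the sum cannot underflow to 0 before the log is taken. A plain-Python sketch (both function names are hypothetical):

```python
import math
from typing import Sequence


def logsumexp(values: Sequence[float]) -> float:
    """log(sum_k exp(a_k)), computed as m + log(sum_k exp(a_k - m)) with m = max_k a_k."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def naive_logsumexp(values: Sequence[float]) -> float:
    """Direct formula; exp() underflows to 0.0 for very negative inputs."""
    return math.log(sum(math.exp(v) for v in values))


# Log-probabilities summed over many latent dimensions can be very negative,
# as in this made-up example.
a = [-1000.0, -1001.0]
print(logsumexp(a))  # about -999.687, i.e. -1000 + log(1 + e^-1)
try:
    naive_logsumexp(a)
except ValueError:
    print("naive version: math domain error (log of 0.0)")
```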
