
phydnet's Issues

MSE obtained far higher than reported results

Hi Authors,

I ran the code using the default configuration on Moving MNIST by directly executing python3 main.py.

The final MSE obtained after 1000 epochs is around 75.26, which is far higher than the MSE reported in the paper (24.4). Is there anything I'm missing here? Thanks!

Eugene

Map

Hello, I am new to this direction, and I would like to know how Figure 4 of your paper on this work is visualized. Is there any code you can refer me to?

How about using double LSTMs?

Thank you for your work! After reading your paper, I did ablation studies with different time-block combinations: lstm+lstm, phy+lstm, and phy+phy. With the same number of training epochs, the order of performance on Moving MNIST from high to low is lstm+lstm > phy+lstm > phy+phy:
500 epochs:
  lstm+lstm: eval mse 32.78, eval mae 89.79, eval ssim 0.9262, eval bce 381.81
  phy+lstm:  eval mse 37.05, eval mae 100.31, eval ssim 0.9129, eval bce 410.49
  phy+phy:   eval mse 40.79, eval mae 107.12, eval ssim 0.9043, eval bce 433.51

1000 epochs:
  lstm+lstm (trained 10d 9h): eval mse 31.73, eval mae 87.49, eval ssim 0.9289, eval bce 375.40
  phy+lstm (trained 8d 8h):   eval mse 36.32, eval mae 98.47, eval ssim 0.9151, eval bce 405.92
  phy+phy (trained 8d 12h):   eval mse 40.35, eval mae 106.55, eval ssim 0.9050, eval bce 430.60
where batch_size=64 and lr=0.0001. I mean no harm; I am merely reporting the results of the experiment. Thanks.

boundary conditions?

After a quick read of the paper and the supplementary material, I could not find anything related to boundary conditions.
So the question is: are there any boundary conditions in the SST study of the paper, and how could they be implemented?
(With global SST there would be a boundary at lon = 180°E = 180°W.)
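The paper does not state this, but one common way to handle the longitude wrap-around in a convolutional model is circular padding along that axis. A minimal PyTorch sketch (my own illustration with assumed shapes, not code from this repo):

import torch
import torch.nn.functional as F

# Hypothetical SST feature map: (batch, channels, lat, lon); shapes assumed.
x = torch.randn(4, 64, 16, 16)

# Conv2d's default zero padding acts like an implicit Dirichlet-style boundary.
# For a global field that wraps at lon = 180E = 180W, one option is circular
# padding along the longitude (last) axis and zero padding along latitude:
x = F.pad(x, (3, 3, 0, 0), mode="circular")  # wrap longitude by 3 on each side
x = F.pad(x, (0, 0, 3, 3))                   # zero-pad latitude by 3 on each side

conv = torch.nn.Conv2d(64, 64, kernel_size=7, padding=0)
print(conv(x).shape)  # torch.Size([4, 64, 16, 16])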

An inference example is needed.

Thank you for this awesome work.
Training the model works, with some tweaks.
The frozen test set file is missing.
I forked the repo and corrected these issues, but an inference example (generating actual images) is still missing.
I built a Jupyter notebook that does the training here.
I also put up a moving_mnist repo with some algorithms for forecasting images. I have PhyDNet fully integrated with the latest fastai; you can check it here. It makes training and testing so much nicer.

About the metric on Traffic BJ?

The data in the Traffic BJ dataset has 2 channels (shape: [B, 2, 32, 32]).
Is the MSE on Traffic BJ the mean over the channels or the sum over the channels?
I ask because I noticed that the code in the repo is written as below:

np.mean((predictions-target)**2 , axis=(0,1,2)).sum()
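For what it's worth, with that expression the squared error is averaged over the channel axis and summed over the spatial grid, so it is the mean over channels. A small NumPy illustration (my own sketch with assumed (batch, time, channel, H, W) shapes, not code from the repo):

import numpy as np

# Dummy Traffic BJ-style tensors; the time axis is my assumption.
predictions = np.random.rand(8, 4, 2, 32, 32)
target = np.random.rand(8, 4, 2, 32, 32)
se = (predictions - target) ** 2

# The repo's expression: average over batch, time and channel, sum over pixels.
mse_repo = np.mean(se, axis=(0, 1, 2)).sum()

# Per-channel MSE (summed over pixels), compared both ways:
per_channel = np.mean(se, axis=(0, 1)).sum(axis=(1, 2))  # shape (2,)
print(np.isclose(mse_repo, per_channel.mean()))  # True: mean over channels
print(np.isclose(mse_repo, per_channel.sum()))   # False: off by a factor of 2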

About the prediction mode: are all the training processes in this article under "prediction-only" mode? Is there any other prediction mode?

If we use and train the model in "prediction-only" mode by setting Kt = 0, the PhyCell cannot see any information from the images. Is this inappropriate?


I guess: is it better to train without specifying fixed parameters for Kt, and then manually set Kt = 1 when forecasting?


But Figure 2 says "Dotted arrows mean that predictions are reinjected as next input only for the ConvLSTM branch, and not for PhyCell". OK, in that case PhyCell has neither E(u(t)) nor $h_t^p$ as input. So the PhyCell does not accept any input at all, is that right? And what will the PhyCell output?

This is a very, very important question; it is probably simple for you, but it is confusing me!

desire to obtain code in TensorFlow

I have already read your outstanding work from CVPR 2020; however, I am new to the CV field, as I am a researcher in Speech/NLP. Therefore, could you share your excellent code?
If possible, my email is [email protected]
Thanks a lot :) 👍

PhyCell_Cell code differs from the paper

In the paper, K is a function of h_tilde_t and E(u_t).

In code, the implementation is

24 combined = torch.cat([x, hidden], dim=1) # concatenate along channel axis
25 combined_conv = self.convgate(combined)
26 K = torch.sigmoid(combined_conv)

On line 24, should it be torch.cat([x, hidden_tilde], dim=1)?
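If the paper's formulation is the intended one, the suggested change would look roughly like this (a self-contained sketch of the poster's proposal with toy shapes; not confirmed author code):

import torch

def phycell_correction(x, hidden_tilde, convgate):
    # Sketch of the correction step as described in the paper: the gate K is
    # computed from the physical prediction h_tilde and the encoded input
    # E(u_t), rather than from the previous hidden state.
    combined = torch.cat([x, hidden_tilde], dim=1)  # concatenate along channels
    K = torch.sigmoid(convgate(combined))           # correction gate
    # Kalman-like update: interpolate between prediction and observation.
    return hidden_tilde + K * (x - hidden_tilde)

# Toy usage with assumed channel counts:
convgate = torch.nn.Conv2d(2 * 64, 64, kernel_size=3, padding=1)
x = torch.randn(1, 64, 16, 16)        # encoded input E(u_t)
h_tilde = torch.randn(1, 64, 16, 16)  # physical prediction h_tilde_t
print(phycell_correction(x, h_tilde, convgate).shape)  # torch.Size([1, 64, 16, 16])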

discrepancy between the article and the code?

When I read the paper's supplement, Le_Guen_Disentangling_Physical_Dynamics_CVPR_2020_supplemental.pdf, for the MNIST dataset there are 6 Encoder Blocks and 6 Decoder Blocks.
When I read the code from this repo for the MNIST dataset (the only one we have here), it seems to me that there are only 3 Encoder Blocks and 3 Decoder Blocks.

So is there a discrepancy between the code and the paper here, or have I misunderstood?

models/models.py

class EncoderRNN(torch.nn.Module):
    def __init__(self,phycell,convcell, device):
        super(EncoderRNN, self).__init__()
        self.encoder_E = encoder_E()   # general encoder 64x64x1 -> 32x32x32
        self.encoder_Ep = encoder_specific() # specific image encoder 32x32x32 -> 16x16x64
        self.encoder_Er = encoder_specific() 
        self.decoder_Dp = decoder_specific() # specific image decoder 16x16x64 -> 32x32x32 
        self.decoder_Dr = decoder_specific()     
        self.decoder_D = decoder_D()  # general decoder 32x32x32 -> 64x64x1 

About prediction mode.

You said that in prediction mode, no input is used for the PhyCell to predict the target; that should correspond to the "decoding" parameter in the forward of EncoderRNN in your code. But I don't see where you change this parameter, and there is no support for a "None"-type input in your PhyCell code for the case "decoding=True". (Sorry if I missed it; please point it out.)

Query regarding concatenation

concat = decoded_Dp + decoded_Dr

In the model, it looks like the latent vectors from the physics branch and the ConvLSTM branch are added. Should they have been concatenated along the channel axis? Also, looking at Sec. 2.2 of the Supplementary Material (Model architectures), it looks like they are concatenated. Can you please clarify?
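For readers weighing the two options, the shape consequences are easy to check; an illustration of mine, assuming the 16x16x64 latents from the EncoderRNN comments quoted above:

import torch

decoded_Dp = torch.randn(1, 64, 16, 16)  # physical branch output (assumed shape)
decoded_Dr = torch.randn(1, 64, 16, 16)  # ConvLSTM branch output

# What the repo does: element-wise addition, channel count unchanged.
added = decoded_Dp + decoded_Dr
print(added.shape)   # torch.Size([1, 64, 16, 16])

# Channel-wise concatenation, as the Supplementary Material seems to describe:
# this doubles the channels, so the following decoder would need 128 inputs.
concat = torch.cat([decoded_Dp, decoded_Dr], dim=1)
print(concat.shape)  # torch.Size([1, 128, 16, 16])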

dataset

I am very interested in your work, so I would like to ask whether there are links to the SST, Traffic BJ, and Human 3.6M datasets?

Obtaining coefficients for differential operators

Hi, thank you for a very interesting paper.

I have a question about the physical cell analysis in section 4.4.1 of the paper. If I understand correctly, the coefficients c_ij can be obtained from the weights of the 1x1 convolutional layer (i.e. this line). In this particular implementation, the weights have shape (64, 49, 1, 1), because there are 64 hidden convolutional maps and the kernel size is 7x7, corresponding to 49 differential operators. I want to confirm that Figure 6 in the paper plots the average of these 49 values over the 64 mappings (grouped into the appropriate derivative orders, of course). I found that after training, the coefficients c_ij are all of approximately the same magnitude, instead of showing the pattern described in Figure 6. I did check that the moments of the weights of the first conv layer are very close to the constraints (i.e. the pre-computed Kronecker matrix). I don't know what I'm missing here.

Your help would be greatly appreciated.
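In case it helps someone reproduce the Figure 6 analysis, here is roughly how the coefficients could be read out and grouped by derivative order (my own sketch; the (64, 49, 1, 1) shape and 7x7 kernel follow the description above, while the attribute path and the use of absolute values are my assumptions):

import numpy as np

# Placeholder for the trained 1x1 conv weights; replace with the real tensor,
# e.g. something like phycell.cell_list[0].F.conv1.weight (name to verify).
weights = np.random.randn(64, 49, 1, 1)

# Average the magnitude of each operator's coefficient over the 64 hidden maps.
c = np.abs(weights[:, :, 0, 0]).mean(axis=0).reshape(7, 7)

# Entry (i, j) corresponds to the operator d^{i+j}/(dx^i dy^j), so derivative
# order k collects the anti-diagonal i + j == k.
order = np.add.outer(np.arange(7), np.arange(7))
for k in range(13):
    print(f"order {k}: mean |c_ij| = {c[order == k].mean():.4f}")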

prediction video for mnist dataset

Dear all,
I could not find any predict module in this repo, and I understand some people have been wishing for one.
The code below works only for the MNIST dataset.
Here is a simple predict function which writes videos; I hope it will help people start their own projects.
It only writes videos of the first test batch.
Videos look too good to be true...

You should first run main.py to get a converged model in save/encoder_phydnet.pth.
Then, at the end of main.py, replace the line starting with trainIters(encoder,args.nepochs... with:

from utils.predict import predict
if str(device)=='cpu':
    print(f"loading data for cpu!")
    encoder.load_state_dict(torch.load('save/encoder_phydnet.pth', map_location=torch.device('cpu')))
else:
    print(f"loading data for GPU!")
    encoder.load_state_dict(torch.load('save/encoder_phydnet.pth'))
encoder.eval()
predict(encoder,test_loader, device=device)

and add file PhyDnet/utils/predict.py with contents:

import torch
from pathlib import Path
import skvideo.io
import numpy as np


def predict(encoder, loader, device):
    with torch.no_grad():
        for i, out in enumerate(loader, 0):
            input_tensor = out[1].to(device)
            target_tensor = out[2].to(device)
            input_length = input_tensor.size()[1]
            target_length = target_tensor.size()[1]

            for ei in range(input_length - 1):
                encoder_output, encoder_hidden, _, _, _ = encoder(input_tensor[:, ei, :, :, :], (ei == 0))

            decoder_input = input_tensor[:, -1, :, :, :]  # first decoder input= last image of input sequence
            predictions = []

            for di in range(target_length):
                decoder_output, decoder_hidden, output_image, _, _ = encoder(decoder_input, False, False)
                decoder_input = output_image
                predictions.append(output_image.cpu())

            input = input_tensor.cpu().numpy()
            target = target_tensor.cpu().numpy()
            predictions = np.stack(predictions)  # (10, batch_size, 1, 64, 64)
            predictions = predictions.swapaxes(0, 1)  # (batch_size,10, 1, 64, 64)

            print(f"input {input.shape}")
            print(f"target {target.shape}")
            print(f"predictions {predictions.shape}")

            input = np.uint8(input*255)
            target = np.uint8(target*255)
            predictions = np.uint8(predictions*255)

            for ibatch in range(input.shape[0]):
                video = input[ibatch, ...]  # the 10 input frames: (10, 1, 64, 64)
                # Interleave ground-truth and predicted frames for side-by-side comparison.
                truepred = []
                for iframe in range(target.shape[1]):
                    truepred.append(target[ibatch, iframe, ...])
                    truepred.append(predictions[ibatch, iframe, ...])
                truepred = np.array(truepred)
                # Drop the channel axis so skvideo receives (T, H, W) grayscale frames.
                frames = np.concatenate([video, truepred], axis=0)[:, 0]
                skvideo.io.vwrite(
                    Path("test_X_Y.mp4".replace("X", str(i)).replace("Y", str(ibatch))),
                    frames
                )
            break

supplements

Dear author:
I'm a reader of this paper from China, and I found that it may be helpful to my recent work on weather forecasting. To better understand the techniques, I'd like to read the supplements mentioned in the paper. Where can I find the related supplementary material?

Thank you for any assistance.

About mse

Hello, the paper is very interesting. I tried to run the proposed code but got an MSE of 39. After that, I tried changing Lmoment to 0.1 and 0.01, but the MSE stayed around 40. What am I missing?

Hidden dimensions of EncoderRNN

Hi,

Is there a chance you could explain the meaning of the dimensions of hidden1 and hidden2 found in lines 265 and 266 of models.py?

Thanks,

Taylor's expansion in the supplement

This is great work, and it is very helpful to my current research. I just have a subtle question about the Taylor expansion part of the supplement. I suppose there should be 0th- and 1st-order derivatives in the Taylor expansion, but I cannot find them in the supplement. I am not very familiar with this; could anybody please explain? Many thanks.
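For reference, the generic 2-D Taylor expansion does contain 0th- and 1st-order terms; writing it out (standard calculus, not a quote from the supplement):

$$ u(x+\Delta x,\, y+\Delta y) = \sum_{i,j \ge 0} \frac{\Delta x^{i}\, \Delta y^{j}}{i!\, j!} \, \frac{\partial^{i+j} u}{\partial x^{i}\, \partial y^{j}}(x, y) $$

The $i = j = 0$ term is the 0th-order term $u(x, y)$, and the $i + j = 1$ terms give the 1st-order part $\Delta x\, \partial u/\partial x + \Delta y\, \partial u/\partial y$; if the supplement's sum runs over $i, j \ge 0$, both are included implicitly.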

Issue with the Teacher forcing ratio

Hi author, could you please disclose the teacher forcing ratio used for the Human 3.6M dataset, and the other training parameters (training epochs, batch size), if you are willing to make them public?

I tried to reproduce your method on this dataset but got terrible results on the test data, although the performance on the validation set was good. I am wondering whether the teacher forcing ratio causes an inconsistency between training and testing.

Did you observe the performance gap (visually) between the validation and test sets?

Thank you.
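For anyone auditing their own setup, the usual teacher-forcing pattern looks like this (a generic sketch, not the authors' exact schedule or code):

import random

def decode_sequence(model, first_input, targets, teacher_forcing_ratio):
    # Generic teacher-forcing loop: during training, the ground-truth frame is
    # fed back with probability teacher_forcing_ratio; otherwise the model's
    # own prediction is. At test time the ratio must be 0, which is a common
    # source of the train/test inconsistency described above.
    decoder_input, predictions = first_input, []
    for t in range(len(targets)):
        output = model(decoder_input)  # placeholder call signature
        predictions.append(output)
        use_truth = random.random() < teacher_forcing_ratio
        decoder_input = targets[t] if use_truth else output
    return predictions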

About the datasets and the test code

Hello, the idea of your article has greatly inspired me, but there is no test code, and no training code for the other three datasets. Could you send me the test code? My email address: [email protected].

Cannot find dataset

Hi,

I'm really interested in your paper and would like to reproduce the results on the SST and Traffic BJ datasets.
However, I couldn't find them because the links are dead.

Do you still keep a copy of them?

Could you send them through my email ([email protected]) or upload them to Google Drive for downloading?

There are some formula details in the Supplementary Material I can't understand; maybe others have the same problem

Q1: What is the k? [screenshot of the formula from the supplement]

I think it should be written as w; am I getting it right? [screenshot]


Q2: What are the p and k in the filter w? [screenshot]

I guess the p in the filter w means the channel dimension of w; am I getting it right?

I guess the k in the filter w just means the size of the filter w, not the kth power of w; am I getting it right?


Q3: We all know that convolution is generally implemented with a bias weight, but the bias weight does not appear in the screenshot. Could you briefly explain why the bias weight does not need to be considered?

Thank you so much!

License?

It would be good to have a license for this repo. Otherwise it is unusable...

Bigger input images

Hello,
I'm trying to use this model to predict cloud movement on 256x256 images. I want to base the model on the one you used for SST, but I'm not sure I can replicate it.

Which values did you use here, and what would you recommend for 256x256 images?

phycell = PhyCell(input_shape=(64,64), input_dim=64, F_hidden_dims=[49], n_layers=1, kernel_size=(7,7), device=device)

convcell = ConvLSTM(input_shape=(64,64), input_dim=64, hidden_dims=[128,128,64], n_layers=3, kernel_size=(3,3), device=device)

Are there other params of the architecture I should change?
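Not an authoritative answer, but if the encoder keeps the 4x spatial reduction suggested by the models.py comments quoted elsewhere on this page (64x64x1 -> 16x16x64), a 256x256 frame gives a 64x64 latent, and the cells' input_shape has to match that latent resolution. A hypothetical starting point (all values are guesses to validate, not the authors' SST settings):

# Assumed adaptation for 256x256 inputs; latent resolution = input size / 4.
latent = (256 // 4, 256 // 4)  # (64, 64)
phycell = PhyCell(input_shape=latent, input_dim=64, F_hidden_dims=[49],
                  n_layers=1, kernel_size=(7,7), device=device)
convcell = ConvLSTM(input_shape=latent, input_dim=64, hidden_dims=[128,128,64],
                    n_layers=3, kernel_size=(3,3), device=device)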

Thanks for the code; I think it looks very promising for what I'm trying to do.

CUDA error: an illegal memory access was encountered

I want to change the code from .to(device) to .cuda() in order to run the model on multiple GPUs, but it fails with the error below. The model can compute the loss, but after loss.backward() the loss value can no longer be accessed. With more debugging, this happens as early as lines 32 and 34 of models.py: the values hidden_tilde and combined can no longer be accessed after backward() (RuntimeError: CUDA error: an illegal memory access was encountered).
Traceback (most recent call last):
  File "main.py", line 180, in <module>
    plot_losses = trainIters(encoder, args.n_epochs, print_every=args.print_every, eval_every=args.eval_every)
  File "main.py", line 103, in trainIters
    teacher_forcing_ratio)
  File "main.py", line 81, in train_on_batch
    return loss.item() / target_length
RuntimeError: CUDA error: an illegal memory access was encountered

About evaluation

Hi! I have a question about the evaluation of the time-sequence forecasting problem.
The first problem: as the paper says, "Metrics are averaged for each frame of the output sequence." However, in this repo (main.py) the metrics are computed over the entire forecast sequence at once.
The second problem: should we first post-process the forecast result (e.g. multiply by 255 for Moving MNIST) and then evaluate?
Thanks a lot!
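To make the two conventions concrete (my own illustration; shapes assumed to match the Moving MNIST pipeline):

import numpy as np

# Hypothetical outputs in [0, 1]: (batch, time, channel, H, W).
predictions = np.random.rand(8, 10, 1, 64, 64)
target = np.random.rand(8, 10, 1, 64, 64)

# Whole-sequence MSE, as computed in main.py.
mse_total = np.mean((predictions - target) ** 2, axis=(0, 1, 2)).sum()

# Per-frame MSE averaged over the 10 output frames, as the paper's wording
# suggests. For MSE the two coincide, because it is linear in the per-frame
# errors; for SSIM they would not.
per_frame = [np.mean((predictions[:, t] - target[:, t]) ** 2, axis=(0, 1)).sum()
             for t in range(predictions.shape[1])]
print(np.isclose(mse_total, np.mean(per_frame)))  # True

# Evaluating after scaling to [0, 255] multiplies a squared-error metric
# by 255**2, so the convention must match the numbers being compared against.
mse_255 = np.mean(((predictions - target) * 255) ** 2, axis=(0, 1, 2)).sum()
print(np.isclose(mse_255, mse_total * 255 ** 2))  # True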
