
video_prediction's Introduction

Stochastic Adversarial Video Prediction

[Project Page] [Paper]

TensorFlow implementation for stochastic adversarial video prediction. Given a sequence of initial frames, our model predicts multiple plausible futures. In the example sequences on the project page, the ground-truth sequence is shown on the left and random predictions of our model on the right, with predicted frames indicated by the yellow bar at the bottom. For more examples, visit the project page.

Stochastic Adversarial Video Prediction,
Alex X. Lee, Richard Zhang, Frederik Ebert, Pieter Abbeel, Chelsea Finn, Sergey Levine.
arXiv preprint arXiv:1804.01523, 2018.

An alternative implementation of SAVP is available in the Tensor2Tensor library.

Getting Started

Prerequisites

  • Linux or macOS
  • Python 3
  • CPU or NVIDIA GPU + CUDA CuDNN

Installation

  • Clone this repo:
git clone -b master --single-branch https://github.com/alexlee-gk/video_prediction.git
cd video_prediction
  • Install TensorFlow >= 1.9 and dependencies from http://tensorflow.org/
  • Install ffmpeg (optional, used to generate GIFs for visualization, e.g. in TensorBoard)
  • Install other dependencies
pip install -r requirements.txt

Miscellaneous installation considerations

  • In Python >= 3.6, make sure to add the root directory to the PYTHONPATH, e.g. export PYTHONPATH=path/to/video_prediction.
  • For the best speed and experimental results, we recommend using cuDNN version 7.3.0.29 and any TensorFlow version >= 1.9 and <= 1.12. The final training loss is worse when using cuDNN versions 7.3.1.20 or 7.4.1.5, compared to versions 7.3.0.29 and below.
  • On macOS, make sure that bash >= 4.0 is used (needed for the associative arrays in the download_model.sh script). A quick sanity check for the PYTHONPATH and bash points follows below.
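The following commands check both points (the PYTHONPATH value is illustrative):

# bash 4+ is needed for the associative arrays in pretrained_models/download_model.sh
bash --version | head -n 1
# make the repo importable as the video_prediction package
export PYTHONPATH=path/to/video_prediction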

Use a Pre-trained Model

  • Download and preprocess a dataset (e.g. bair):
bash data/download_and_preprocess_dataset.sh bair
  • Download a pre-trained model (e.g. ours_savp) for the action-free version of that dataset (i.e. bair_action_free):
bash pretrained_models/download_model.sh bair_action_free ours_savp
  • Sample predictions from the model:
CUDA_VISIBLE_DEVICES=0 python scripts/generate.py --input_dir data/bair \
  --dataset_hparams sequence_length=30 \
  --checkpoint pretrained_models/bair_action_free/ours_savp \
  --mode test \
  --results_dir results_test_samples/bair_action_free
  • The predictions are saved as images and GIFs in results_test_samples/bair_action_free/ours_savp.
  • Evaluate predictions from the model using full-reference metrics:
CUDA_VISIBLE_DEVICES=0 python scripts/evaluate.py --input_dir data/bair \
  --dataset_hparams sequence_length=30 \
  --checkpoint pretrained_models/bair_action_free/ours_savp \
  --mode test \
  --results_dir results_test/bair_action_free
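
If you want to drive a pre-trained model from Python instead of the scripts, the rough pattern behind scripts/generate.py looks like the sketch below. It is assembled from API calls quoted in the issue tracebacks further down (models.get_model_class, build_graph, restore, outputs['gen_images']); the constructor arguments, placeholder keys, and shapes are assumptions, so treat scripts/generate.py as the authoritative reference:

import numpy as np
import tensorflow as tf
from video_prediction import models

# look up the model class by name and build its graph on feed placeholders
VideoPredictionModel = models.get_model_class('savp')
model = VideoPredictionModel(mode='test')  # constructor args are an assumption
input_phs = {'images': tf.placeholder(tf.float32, [1, 30, 64, 64, 3], name='images_ph')}
# action-conditioned models would presumably need extra placeholders (e.g. actions/states)
model.build_graph(input_phs)

with tf.Session() as sess:
    # load the pre-trained weights downloaded above
    model.restore(sess, 'pretrained_models/bair_action_free/ours_savp')
    context = np.zeros([1, 30, 64, 64, 3], dtype=np.float32)  # replace with real context frames in [0, 1]
    gen_images = sess.run(model.outputs['gen_images'],
                          feed_dict={input_phs['images']: context})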

Model Training

  • To train a model, download and preprocess a dataset (e.g. bair):
bash data/download_and_preprocess_dataset.sh bair
  • Train a model (e.g. our SAVP model on the BAIR action-free robot pushing dataset):
CUDA_VISIBLE_DEVICES=0 python scripts/train.py --input_dir data/bair --dataset bair \
  --model savp --model_hparams_dict hparams/bair_action_free/ours_savp/model_hparams.json \
  --output_dir logs/bair_action_free/ours_savp
  • To view training and validation information (e.g. loss plots, GIFs of predictions), run tensorboard --logdir logs/bair_action_free --port 6006 and open http://localhost:6006.
    • Summaries corresponding to the training and validation sets are named the same, except that the tags of the latter end in "_1".
    • Summaries corresponding to the validation set with sequences longer than the ones used in training end in "_2", if applicable (i.e. if the dataset's long_sequence_length differs from sequence_length).
    • Summaries of the metrics over prediction steps are shown as 2D plots in the repurposed PR curves section. To see them, TensorBoard needs to be built from source after commenting out two lines of its source code (see tensorflow/tensorboard#1110).
    • Summaries with names starting with "eval_" correspond to the best/average/worst metrics/images out of 100 samples for the stochastic models (as in the paper). The ones starting with "accum_eval_" are the same, except that they were computed over (roughly) the whole validation set, as opposed to only a single minibatch of it.
  • For multi-GPU training, set CUDA_VISIBLE_DEVICES to a comma-separated list of devices, e.g. CUDA_VISIBLE_DEVICES=0,1,2,3. To use the CPU, set CUDA_VISIBLE_DEVICES="".
  • See more training details for other datasets and models in scripts/train_all.sh.

Datasets

Download the datasets using the following script. These datasets are collected by other researchers. Please cite their papers if you use the data.

  • Download and preprocess the dataset.
bash data/download_and_preprocess_dataset.sh dataset_name

The dataset_name should be one of the datasets supported by the download script, e.g. bair or kth; see data/download_and_preprocess_dataset.sh for the full list.

To use a different dataset, preprocess it into TFRecords files and define a class for it. See kth_dataset.py for an example where the original dataset is given as videos; a sketch of such a class follows.
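
A minimal sketch of what such a class might involve (the base-class name, module path, and override pattern are inferred from kth_dataset.py and the base_dataset.py comments quoted in the issues below, so verify against those files; a real class also declares the names and shapes of the image features stored in its TFRecords, and the new dataset name must be registered wherever dataset names are mapped to classes):

import itertools
from video_prediction.datasets.base_dataset import VideoDataset  # module path assumed

class MyVideoDataset(VideoDataset):
    """Sketch of a dataset class for TFRecords of 64x64 RGB video frames."""

    def get_default_hparams_dict(self):
        default_hparams = super(MyVideoDataset, self).get_default_hparams_dict()
        hparams = dict(
            context_frames=10,   # ground-truth frames passed in at the start
            sequence_length=20,  # total frames per example (context + predicted)
        )
        return dict(itertools.chain(default_hparams.items(), hparams.items()))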

Note: the bair dataset is used for both the action-free and action-conditioned experiments. Set the hyperparameter use_state=True to use the action-conditioned version of the dataset.
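
For example, training the action-conditioned variant might look like this (assuming scripts/train.py accepts dataset hyperparameters through a --dataset_hparams flag, as scripts/generate.py does; the hparams and log paths are hypothetical):

CUDA_VISIBLE_DEVICES=0 python scripts/train.py --input_dir data/bair --dataset bair \
  --dataset_hparams use_state=True \
  --model savp --model_hparams_dict hparams/bair/ours_savp/model_hparams.json \
  --output_dir logs/bair/ours_savp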

Models

  • Download the pre-trained models using the following script.
bash pretrained_models/download_model.sh dataset_name model_name

The dataset_name should be one of the following: bair_action_free, kth, or bair. The model_name should be one of the available pre-trained models:

  • ours_savp: our complete model, trained with variational and adversarial losses. Also referred to as ours_vae_gan.

The following are ablations of our model:

  • ours_gan: trained with L1 and adversarial loss, with latent variables sampled from the prior at training time.
  • ours_vae: trained with L1 and KL loss.
  • ours_deterministic: trained with L1 loss, with no stochastic latent variables.

See pretrained_models/download_model.sh for a complete list of available pre-trained models.

Model and Training Hyperparameters

The implementation is designed such that each video prediction model defines its architecture and training procedure, and includes reasonable hyperparameters as defaults. Still, a few of the hyperparameters should be overridden for each variant of dataset and model. The hyperparameters used in our experiments are provided in hparams as JSON files, and they can be passed to the training script with the --model_hparams_dict flag.
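
For orientation, such a file is a flat JSON dictionary of overrides. The keys below all appear elsewhere in this document (in commands and issue threads); the full set of valid keys depends on the model, so treat this as an illustrative sketch rather than a schema:

{
  "batch_size": 8,
  "context_frames": 2,
  "sequence_length": 10,
  "transformation": "flow",
  "tv_weight": 0.001
}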

Citation

If you find this useful for your research, please use the following citation.

@article{lee2018savp,
  title={Stochastic Adversarial Video Prediction},
  author={Alex X. Lee and Richard Zhang and Frederik Ebert and Pieter Abbeel and Chelsea Finn and Sergey Levine},
  journal={arXiv preprint arXiv:1804.01523},
  year={2018}
}

video_prediction's People

Contributors

alexlee-gk, febert, richzhang


video_prediction's Issues

About sequence length

Hi!
I train my model with context_frames=10 and sequence_length=30, in other words aiming to predict the next 20 frames (because of my hardware limitations).
I wonder whether I can generate the next 30 frames (rather than 20) when I test the model. Can I set context_frames=10 and sequence_length=40 directly when testing, or do I need to modify anything else?
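For reference, what I mean is running something like this at test time (the checkpoint path is a placeholder for my own model):

CUDA_VISIBLE_DEVICES=0 python scripts/generate.py --input_dir data/bair \
  --dataset_hparams sequence_length=40 \
  --checkpoint logs/bair_action_free/my_model \
  --mode test \
  --results_dir results_test_samples/bair_action_free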
Thanks!

Cannot restore model when attempting to run inference on different image sizes

Hi Alex, thank you so much for the code release.

I am trying to run inference on a larger image size, e.g. 128x128, while the model was trained on images of size 64x64, but the model cannot be restored.

I have modified scripts/generate.py with one more args flag: infer_read_pics, which will change the placeholder shape to 128x128 for inference. The data being fed into the placeholder is also resized to 128x128:

... omitted code ...

    if args.infer_read_pics:
        inputs = None
        input_phs = {'images': tf.placeholder(dtype=tf.float32, shape=[1, model.hparams.sequence_length, IMG_H, IMG_W, 1], name='images_ph')}
    else:
        inputs = dataset.make_batch(args.batch_size)
        input_phs = {k: tf.placeholder(v.dtype, v.shape, '%s_ph' % k) for k, v in inputs.items()}
    with tf.variable_scope(''):
        model.build_graph(input_phs)

... omitted code...

    while True:
        if args.num_samples and sample_ind >= args.num_samples:
            break

        try:
            if sample_ind > 0:
                break

            if args.infer_read_pics:
                glob_pattern = '/home/von/repo/video_prediction/data/kth/infer_dir' + '/context_image_*.png'
                img_paths = glob.glob(glob_pattern, recursive=True)
                ipaths = sorted(img_paths)
                imgs = skimage.io.imread_collection(ipaths)
                imgs = [cv2.resize(img[..., 0:1], dsize=(IMG_H, IMG_W), interpolation=cv2.INTER_CUBIC) for img in imgs]
                imgs = np.expand_dims(np.stack(imgs), axis=0)  # (1, 11, 64, 64, 1)
                od = OrderedDict()
                od['images'] = imgs / 255.0
                input_results = od
            else:
                input_results = sess.run(inputs)
        except tf.errors.OutOfRangeError:
            break

I have also commented below code inside video_prediction/models/savp_model.py to make sure the model architecture is the same as the model trained using 64x64 images:

        elif scale_size >= 128:
            self.encoder_layer_specs = [
                (self.hparams.ngf, False),
                (self.hparams.ngf * 2, True),
                (self.hparams.ngf * 4, True),
                (self.hparams.ngf * 8, True),
            ]
            self.decoder_layer_specs = [
                (self.hparams.ngf * 8, True),
                (self.hparams.ngf * 4, True),
                (self.hparams.ngf * 2, False),
                (self.hparams.ngf, False),
            ]

But the model cannot be restored due to the error below:

InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [32768,100] rhs shape= [8192,100]
         [[node save/Assign_15 (defined at /home/von/repo/video_prediction/video_prediction/utils/tf_utils.py:542)  = Assign[T=DT_FLOAT, _grappler_relax_allocator_constraints=true, use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](generator/rnn/savp_cell/cdna_kernels/dense/kernel, save/RestoreV2/_15)]]

According to the error, I have traced it to this line in video_prediction/ops.py: kernel_shape = [input_shape[1], units].

Do you have any suggestions about how to make it work for running inference on arbitrary image sizes?

Thanks for reading my question!

Training hangs when training a GAN with multiple GPUs

I use tensorflow 1.11.0, CUDA 9.0.176 and cuDNN 7.3.1 on Ubuntu 16.04. My GPUs are nvidia titan Xp. When I was trying to train a savp model with command

CUDA_VISIBLE_DEVICES=0,1 python scripts/train.py --input_dir data/bair --dataset bair \
  --model savp --model_hparams_dict hparams/bair_action_free/ours_savp/model_hparams.json \
  --output_dir logs/bair_action_free/ours_savp \
  --gpu_mem_frac 0.7

the program seems to stop running and never moves forward, as if in an endless loop, after outputting

2018-11-08 08:14:24.668709: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:666] Iteration = 0, topological sort failed with message: The graph couldn't be sorted in topological order.
2018-11-08 08:14:25.028003: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:666] Iteration = 1, topological sort failed with message: The graph couldn't be sorted in topological order.

What's the possible reason for this?
More information is below

2018-11-08 08:13:09.286148: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:666] Iteration = 0, topological sort failed with message: The graph couldn't be sorted in topological order.
2018-11-08 08:13:09.515356: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:666] Iteration = 1, topological sort failed with message: The graph couldn't be sorted in topological order.
session.run took 22.4s
recording summary
done
recording image summary
done
progress  global step 0  epoch 0  step 2560
discrim_video_sn_gan_loss (1.0238764, 1.0)
discrim_video_sn_vae_gan_loss (0.895689, 1.0)
gen_l1_loss (0.0804427, 100.0)
gen_video_sn_gan_loss (1.0158763, 1.0)
gen_video_sn_vae_gan_loss (0.8958128, 1.0)
gen_kl_loss (0.045274347, 0.0)
learning_rate 0.0002
saving model to logs/bair_action_free/ours_savp
done
2018-11-08 08:14:24.668709: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:666] Iteration = 0, topological sort failed with message: The graph couldn't be sorted in topological order.
2018-11-08 08:14:25.028003: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:666] Iteration = 1, topological sort failed with message: The graph couldn't be sorted in topological order.

Originally posted by @Bonennult in #9 (comment)

Image Size Reshaping

Hi Alex,

Thanks for releasing the code! I am having some trouble running the code. I am getting a resource exhausted error on GTX 1070 with batch size of 1 on the kth dataset. I've tried using flow and dna as suggested in the other issue. I am also seeing the topological sort failed error.

I've tried modifying the kth_dataset.py file to change the image size to 32x32 instead of 64x64. But then I get reshaping errors. Is the 64x64 image size hard-coded in the model somewhere?

This is the command I am running:

CUDA_VISIBLE_DEVICES=0 python scripts/train.py --input_dir /media//StochasticAdversarialVideoFramePrediction/kth --dataset KTHVideoDataset --model savp --model_hparams_dict hparams/kth/ours_savp/model_hparams.json --output_dir /media//StochasticAdversarialVideoFramePrediction/ --model_hparams tv_weight=0.001,transformation=flow

Thanks!
Masha

Questions about evaluation with the deterministic model

Thanks for your wonderful work! I want to evaluate the deterministic model on the kth dataset, so I downloaded the dataset (then preprocessed it) and the pre-trained model "ours_deterministic". When I run "generate.py", something goes wrong. I notice that the model in the file "options.json" is savp. Additionally, there is no deterministic model in the model class, as shown in video_prediction/models/__init__.py. What else should I do if I want to evaluate the deterministic model on the kth dataset?

eval images don't have any action?

Hi Alex,

Thanks for releasing all the code. I am working on fitting a new dataset into your model. I resized (height, width) to (256, 256), the same as the ucf101 dataset, and set context_frames equal to 2 in order to predict 8 frames with the model.

  1. In the code, I find that outputs['gen_images'] holds the future frames predicted by the generator and outputs['gen_inputs'] is the input clip. But in TensorBoard, at step 0, outputs['gen_images'] is puzzlingly identical to outputs['gen_inputs']. Could you give me some advice about this phenomenon?
  2. When I use scripts/generate.py to predict future frames with my own model, the predicted frames are all identical, without any motion, which causes the GIF images to hold still. I have debugged the code and haven't found any solution. I look forward to your reply.

Do you have any good suggestion on this issue?

Thanks a lot.

Best,
Sander

KeyError: 'gen_states' when run train.py

Hi, when I was trying to train the savp model as in the given example,

CUDA_VISIBLE_DEVICES=0 python scripts/train.py --input_dir data/bair --dataset bair \
  --model savp --model_hparams_dict hparams/bair_action_free/ours_savp/model_hparams.json \
  --output_dir logs/bair_action_free/ours_savp

I got error like this

Traceback (most recent call last):
  File "scripts/train.py", line 354, in <module>
    main()
  File "scripts/train.py", line 174, in main
    model.build_graph(inputs)
  File "/home/jio/vp_ws/video_prediction/models/base_model.py", line 478, in build_graph
    outputs_tuple, losses_tuple, loss_tuple, metrics_tuple = self.tower_fn(self.inputs)
  File "/home/jio/vp_ws/video_prediction/models/base_model.py", line 439, in tower_fn
    g_losses = self.generator_loss_fn(inputs, outputs)
  File "/home/jio/vp_ws/video_prediction/models/base_model.py", line 759, in generator_loss_fn
    gen_states = outputs.get('gen_states_enc', outputs['gen_states'])
KeyError: 'gen_states'

I checked the model_hparams.json file; a state_weight param does exist there. Has anyone else met this problem?

Thank you!

running train_op took too long ??

Thanks for sharing this great work!

I ran into this issue when training ours_savp on the kth dataset: the training seems to be going properly, but it is very slow.

running train_op took too long (7.2s)
running train_op took too long (7.2s)
.....
progress  global step 100  epoch 0.5
          image/sec 1.1  remaining 37520m (625.3h) (26.1d)
d_loss 0.10482973
   discrim_video_sn_gan_loss (0.5204395, 0.1)
   discrim_video_sn_vae_gan_loss (0.5278577, 0.1)
g_loss 2.0725453
   gen_l1_loss (0.016228592, 100.0)
   gen_video_sn_gan_loss (0.32749984, 0.1)
   gen_video_sn_vae_gan_loss (0.35494953, 0.1)
   gen_video_sn_vae_gan_feature_cdist_loss (0.038144115, 10.0)
   gen_kl_loss (0.6190775, 0.0)
learning_rate 0.0002
running train_op took too long (7.2s)
running train_op took too long (7.2s)
running train_op took too long (7.3s)
......
......

My configuration:
tensorflow: 1.10.0
cuda: 9.0
cudnn: 7.3.0.29

I'm running the KTH dataset with the ours_savp model. When I use the default hparams, I get an out-of-memory error, so I changed batch_size to 8.

My GPU looks working properly:
+-------------------------------+----------------------+----------------------+
| 1 Tesla K40c Off | 00000000:02:00.0 Off | 0 |
| 37% 73C P0 124W / 235W | 10963MiB / 11441MiB | 76% Default |
+-------------------------------+----------------------+----------------------+

TensorBoard refreshes when summary_freq is reached.

I'd appreciate any suggestions.
Regards,

CDNA Masks

Just a question with the number of Masks for CDNA in the sv2p_model.
You have a mask for each of the 10 transformed outputs, another for a static background (masking the input image), and another for the output of the model. Should that not be 12 masks rather than 11?
It seems as though one of the transformed outputs is not being used, or am I interpreting this wrong?
Thanks

File Not Found: net-lin_alex_v0.1.pb

I'm trying to run the generate.py file but I get this error in lpips_tf.py.
FileNotFoundError: [Errno 2] No such file or directory: 'C:\Users\tayya\.lpips\net-lin_alex_v0.1.pb'

This file seems to be necessary for the script but does not exist on my machine even after all successful imports.

I'm using Windows 10, python version 3.6.5 with Anaconda.

The ".lpips" directory exists at the given path but it is empty. I don't have any other *.pb files on my system either, so it's not like the path is wrong.

FailedPreconditionError while trying to predict using gan_only model on KTH

I'm trying to generate videos on KTH dataset using the gan-only pretrained model.
I'm getting the below error

FailedPreconditionError (see above for traceback): Attempting to use uninitialized value generator/encoder/layer_3/InstanceNorm/beta [[node generator/encoder/layer_3/InstanceNorm/beta/read (defined at .../src/video_prediction/video_prediction/layers/normalization.py:129) ]]

I'm attaching the full stack trace below. Please let me know if you need any additional information.

2019-07-29 16:52:32.640421: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA 2019-07-29 16:52:32.682136: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 4008000000 Hz 2019-07-29 16:52:32.682442: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x56082cde0ee0 executing computations on platform Host. Devices: 2019-07-29 16:52:32.682465: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): <undefined>, <undefined> 2019-07-29 16:52:32.682552: I tensorflow/core/common_runtime/process_util.cc:71] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance. Traceback (most recent call last): File ".../SoftwareFiles/Anaconda/anaconda3/envs/savp/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call return fn(*args) File ".../SoftwareFiles/Anaconda/anaconda3/envs/savp/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn options, feed_dict, fetch_list, target_list, run_metadata) File ".../SoftwareFiles/Anaconda/anaconda3/envs/savp/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun run_metadata) tensorflow.python.framework.errors_impl.FailedPreconditionError: Attempting to use uninitialized value generator/encoder/layer_3/InstanceNorm/beta [[{{node generator/encoder/layer_3/InstanceNorm/beta/read}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File ".../src/my_code/VideoPredictor.py", line 487, in <module> main() File ".../src/my_code/VideoPredictor.py", line 481, in main demo6() File ".../src/my_code/VideoPredictor.py", line 466, in demo6 future_length, frame_rate, num_futures) File ".../src/my_code/VideoPredictor.py", line 83, in predict_and_save_videos gen_images = sess.run(model.outputs['gen_images'], feed_dict=feed_dict) File ".../SoftwareFiles/Anaconda/anaconda3/envs/savp/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 929, in run run_metadata_ptr) File ".../SoftwareFiles/Anaconda/anaconda3/envs/savp/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1152, in _run feed_dict_tensor, options, run_metadata) File ".../SoftwareFiles/Anaconda/anaconda3/envs/savp/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run run_metadata) File ".../SoftwareFiles/Anaconda/anaconda3/envs/savp/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call raise type(e)(node_def, op, message) tensorflow.python.framework.errors_impl.FailedPreconditionError: Attempting to use uninitialized value generator/encoder/layer_3/InstanceNorm/beta [[node generator/encoder/layer_3/InstanceNorm/beta/read (defined at .../src/video_prediction/video_prediction/layers/normalization.py:129) ]]

Caused by op 'generator/encoder/layer_3/InstanceNorm/beta/read', defined at: File ".../src/my_code/VideoPredictor.py", line 487, in <module> main() File ".../src/my_code/VideoPredictor.py", line 481, in main demo6() File ".../src/my_code/VideoPredictor.py", line 464, in demo6 model_hparams, batch_size, gpu_mem_frac) File ".../src/my_code/VideoPredictor.py", line 63, in restore_model model.build_graph(input_phs) File ".../src/video_prediction/video_prediction/models/base_model.py", line 478, in build_graph outputs_tuple, losses_tuple, loss_tuple, metrics_tuple = self.tower_fn(self.inputs) File ".../src/video_prediction/video_prediction/models/base_model.py", line 412, in tower_fn gen_outputs = self.generator_fn(inputs) File ".../src/video_prediction/video_prediction/models/savp_model.py", line 710, in generator_fn outputs_posterior = posterior_fn(inputs, hparams) File ".../src/video_prediction/video_prediction/models/savp_model.py", line 29, in posterior_fn image_pairs, nef=hparams.nef, n_layers=hparams.n_layers, norm_layer=hparams.norm_layer) File ".../src/video_prediction/video_prediction/utils/tf_utils.py", line 111, in fn flat_batch_r = flat_batch_fn(flat_batch_x, *args, **kwargs) File ".../src/video_prediction/video_prediction/models/networks.py", line 26, in encoder normalized = norm_layer(convolved) File ".../src/video_prediction/video_prediction/layers/normalization.py", line 129, in fused_instance_norm trainable=trainable) File ".../SoftwareFiles/Anaconda/anaconda3/envs/savp/lib/python3.7/site-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 182, in func_with_args return func(*args, **current_args) File ".../SoftwareFiles/Anaconda/anaconda3/envs/savp/lib/python3.7/site-packages/tensorflow/contrib/framework/python/ops/variables.py", line 350, in model_variable aggregation=aggregation) File ".../SoftwareFiles/Anaconda/anaconda3/envs/savp/lib/python3.7/site-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 182, in func_with_args return func(*args, **current_args) File ".../SoftwareFiles/Anaconda/anaconda3/envs/savp/lib/python3.7/site-packages/tensorflow/contrib/framework/python/ops/variables.py", line 277, in variable aggregation=aggregation) File ".../SoftwareFiles/Anaconda/anaconda3/envs/savp/lib/python3.7/site-packages/tensorflow/python/ops/variable_scope.py", line 1479, in get_variable aggregation=aggregation) File ".../SoftwareFiles/Anaconda/anaconda3/envs/savp/lib/python3.7/site-packages/tensorflow/python/ops/variable_scope.py", line 1220, in get_variable aggregation=aggregation) File ".../SoftwareFiles/Anaconda/anaconda3/envs/savp/lib/python3.7/site-packages/tensorflow/python/ops/variable_scope.py", line 547, in get_variable aggregation=aggregation) File ".../SoftwareFiles/Anaconda/anaconda3/envs/savp/lib/python3.7/site-packages/tensorflow/python/ops/variable_scope.py", line 499, in _true_getter aggregation=aggregation) File ".../SoftwareFiles/Anaconda/anaconda3/envs/savp/lib/python3.7/site-packages/tensorflow/python/ops/variable_scope.py", line 911, in _get_single_variable aggregation=aggregation) File ".../SoftwareFiles/Anaconda/anaconda3/envs/savp/lib/python3.7/site-packages/tensorflow/python/ops/variables.py", line 213, in __call__ return cls._variable_v1_call(*args, **kwargs) File ".../SoftwareFiles/Anaconda/anaconda3/envs/savp/lib/python3.7/site-packages/tensorflow/python/ops/variables.py", line 176, in _variable_v1_call aggregation=aggregation) File 
".../SoftwareFiles/Anaconda/anaconda3/envs/savp/lib/python3.7/site-packages/tensorflow/python/ops/variables.py", line 155, in <lambda> previous_getter = lambda **kwargs: default_variable_creator(None, **kwargs) File ".../SoftwareFiles/Anaconda/anaconda3/envs/savp/lib/python3.7/site-packages/tensorflow/python/ops/variable_scope.py", line 2495, in default_variable_creator expected_shape=expected_shape, import_scope=import_scope) File ".../SoftwareFiles/Anaconda/anaconda3/envs/savp/lib/python3.7/site-packages/tensorflow/python/ops/variables.py", line 217, in __call__ return super(VariableMetaclass, cls).__call__(*args, **kwargs) File ".../SoftwareFiles/Anaconda/anaconda3/envs/savp/lib/python3.7/site-packages/tensorflow/python/ops/variables.py", line 1395, in __init__ constraint=constraint) File ".../SoftwareFiles/Anaconda/anaconda3/envs/savp/lib/python3.7/site-packages/tensorflow/python/ops/variables.py", line 1557, in _init_from_args self._snapshot = array_ops.identity(self._variable, name="read") File ".../SoftwareFiles/Anaconda/anaconda3/envs/savp/lib/python3.7/site-packages/tensorflow/python/util/dispatch.py", line 180, in wrapper return target(*args, **kwargs) File ".../SoftwareFiles/Anaconda/anaconda3/envs/savp/lib/python3.7/site-packages/tensorflow/python/ops/array_ops.py", line 81, in identity ret = gen_array_ops.identity(input, name=name) File ".../SoftwareFiles/Anaconda/anaconda3/envs/savp/lib/python3.7/site-packages/tensorflow/python/ops/gen_array_ops.py", line 3890, in identity "Identity", input=input, name=name) File ".../SoftwareFiles/Anaconda/anaconda3/envs/savp/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper op_def=op_def) File ".../SoftwareFiles/Anaconda/anaconda3/envs/savp/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func return func(*args, **kwargs) File ".../SoftwareFiles/Anaconda/anaconda3/envs/savp/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3300, in create_op op_def=op_def) File ".../SoftwareFiles/Anaconda/anaconda3/envs/savp/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 1801, in __init__ self._traceback = tf_stack.extract_stack()

FailedPreconditionError (see above for traceback): Attempting to use uninitialized value generator/encoder/layer_3/InstanceNorm/beta [[node generator/encoder/layer_3/InstanceNorm/beta/read (defined at .../src/video_prediction/video_prediction/layers/normalization.py:129) ]]

Error in downloading dataset (partially downloaded)

I'm trying to get some sample prediction videos. As per the README, I tried downloading the dataset. I think it only downloaded partially (the total downloaded size is 883.5MB). The output is as follows:

$bash data/download_and_preprocess_dataset.sh bair
Downloading 'bair' dataset (this takes a while)
--2019-05-10 18:31:25-- http://rail.eecs.berkeley.edu/datasets/bair_robot_pushing_dataset_v0.tar
Resolving rail.eecs.berkeley.edu (rail.eecs.berkeley.edu)... 128.32.189.73
Connecting to rail.eecs.berkeley.edu (rail.eecs.berkeley.edu)|128.32.189.73|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 32274964480 (30G) [application/x-tar]
Saving to: ‘./data/bair/bair_robot_pushing_dataset_v0.tar’

./data/bair/bair_robot_pushing_dataset_v0.tar 1%[> ] 421.27M 359KB/s in 17m 29s

2019-05-10 18:48:55 (411 KB/s) - Connection closed at byte 441737147. Retrying.

--2019-05-10 18:48:56-- (try: 2) http://rail.eecs.berkeley.edu/datasets/bair_robot_pushing_dataset_v0.tar
Connecting to rail.eecs.berkeley.edu (rail.eecs.berkeley.edu)|128.32.189.73|:80... connected.
HTTP request sent, awaiting response... 416 Requested range not satisfiable

The file is already fully retrieved; nothing to do.

softmotion30_44k/test/
softmotion30_44k/test/traj_0_to_255.tfrecords
softmotion30_44k/train/
softmotion30_44k/train/traj_9406_to_9661.tfrecords
softmotion30_44k/train/traj_21628_to_21883.tfrecords
tar: Unexpected EOF in archive
tar: rmtlseek not stopped at a record boundary
tar: Error is not recoverable: exiting now

Dependency Nightmare

Hey,

thank you for the release. Unfortunately I can't get this code running anymore because of the many dependencies, which turn out to be a deadlock for me. Your code requires CUDA 9, which is only compatible with Ubuntu 16.04, which in turn doesn't provide updated drivers for my GPU anymore. I think this is an issue that must be resolved somehow if you want your code to be feasible in >2021.

I would be very happy to hear any reports from people getting this running on Ubuntu 20.04.

Question regarding GAN-only

Hi Alex,

I've recently been looking into GAN models, and I was wondering what the GAN-only model you used in the paper is based on.
The paper mentions that it is representative of prior stochastic GAN-based methods, and it seems to me that the overall structure resembles pix2pix (as they are both conditional GANs).
Is the GAN-only model the same as SAVP without the VAE?

Thank you.

Training stability & progress

@alexlee-gk

Thanks for an awesome repo and architecture! I am currently training on my own dataset with batch size 8 and have been playing around with lower learning rates.

While the predicted frames are decent, my metrics seem to have a high variance, indicating the training process can be made better for this data.

Do you have any suggestions on how to improve training stability and/or results, besides further decreasing the learning rate? Here are some screenshots from the tensorboard graphs.

[Five TensorBoard screenshots omitted (loss/metric plots, captured 2019-05-18).]

ModuleNotFoundError: no module named 'video_prediction'

Hello,

I implemented the steps according to the instructions and then run:

CUDA_VISIBLE_DEVICES=0 python scripts/generate.py --input_dir data/bair \
  --dataset_hparams sequence_length=30 \
  --checkpoint pretrained_models/bair_action_free/ours_savp \
  --mode test \
  --results_dir results_test_samples/bair_action_free

Then the error occurs: ModuleNotFoundError: no module named 'video_prediction'.

It seems everything is right, so how does this happen, and how do I fix this bug?

Thank you.

Training error

I'm trying to train the SAVP model on the BAIR action-free dataset but keep running into this error 'TypeError: tf____call__() missing 1 required positional argument: 'state'.'
Please tell me how I can fix this.

Need help..error running sample prediction on KTH dataset

Hi @alexlee-gk

I am trying out the pretrained model on the kth dataset. I have successfully downloaded and preprocessed the dataset by running the following command:
bash data/download_and_preprocess_dataset.sh kth

Also downloaded pretrained model
bash pretrained_models/download_model.sh kth ours_savp

and now I am trying to run a sample prediction on the kth dataset but am getting the following batch-size error; the command is

CUDA_VISIBLE_DEVICES=0 python scripts/generate.py --input_dir data/kth --dataset_hparams sequence_length=30 --checkpoint pretrained_models/kth/ours_savp --mode test --results_dir results_test_samples/kth

The error is:
File "scripts/generate.py", line 193, in
main()
File "scripts/generate.py", line 130, in main
raise ValueError('batch_size should evenly divide the dataset size %d' % num_examples_per_epoch)
ValueError: batch_size should evenly divide the dataset size 819
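
(For what it's worth, 819 = 3^2 × 7 × 13, so the only batch sizes that divide the dataset evenly would be 1, 3, 7, 9, 13, 21, 39, 63, 91, 117, 273, and 819.)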

I am confused about what went wrong.
I am a newbie in deep learning and TensorFlow.
Please help.
I also want to try your models on my own dataset and need your guidance.
Thanks in advance
Avani

Issue Training: tf errors

Hello,

This may be a silly question - but I've been trying to run the model with CUDA 9.0, CUDNN 7.0.5, tf-gpu-1.5.0, python 3.5, and I get the below error when running train.py with the kth dataset. Do you have any insight as to what might be causing the error?

File ".../lib/python3.5/site-packages/tensorflow/python/framework/tensor_util.py", line 498, in make_tensor_proto
    str_values = [compat.as_bytes(x) for x in proto_values]
  File "/home/video_prediction/python3_env/lib/python3.5/site-packages/tensorflow/python/framework/tensor_util.py", line 498, in <listcomp>
    str_values = [compat.as_bytes(x) for x in proto_values]
  File "/home/video_prediction/python3_env/lib/python3.5/site-packages/tensorflow/python/util/compat.py", line 65, in as_bytes
    (bytes_or_text,))
TypeError: Expected binary or unicode string, got {'eval_gen_images_ssim/min': <tf.Tensor 'eval_outputs_and_metrics/zeros_like_6:0' shape=(10, 16, 64, 64, 3) dtype=float32>, 'eval_vgg_csim/sum': <tf.Tensor 'eval_outputs_and_metrics/zeros_5:0' shape=(10, 16) dtype=float32>, 'eval_ssim_finn/max': <tf.Tensor 'eval_outputs_and_metrics/Fill_9:0' shape=(10, 16) dtype=float32>, ... [the remaining eval_* min/max/sum entries, all tf.Tensors of shape (10, 16) or (10, 16, 64, 64, 3), elided] ...}

topological sort failed

When I was trying to generate gifs from a pre-trained model on the bair dataset, this error happens:

evaluation samples from 0 to 8
2018-11-05 13:18:39.651172: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:675] Iteration = 0, topological sort failed with message: The graph couldn't be sorted in topological order.
2018-11-05 13:18:39.658586: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:675] Iteration = 1, topological sort failed with message: The graph couldn't be sorted in topological order.
2018-11-05 13:18:41.185743: E tensorflow/stream_executor/cuda/cuda_dnn.cc:352] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
Segmentation fault (core dumped)

I'm using tensorflow 1.10.1, CUDA 9.0 and cuDNN 7.1 on Ubuntu 16.04. The CUDA/cuDNN are installed properly. My GPUs are dual NVIDIA Titan V. I also tried tensorflow 1.6.0 but got the same result. Do you have any ideas about this error? Thanks in advance!

Update: I also tried cuDNN 7.0.5 but still have this problem. Thanks!

Tuning the hparams when processing the dataset

Hi Alex,

Thanks for releasing all the code. I am working on fitting a new dataset into your model. But I found I am a little confused when preprocessing my videos by referencing to kth dataset.

  1. /video_prediction/datasets/kth_dataset.py line 22: context_frames=10 # I found a comment in /video_prediction/datasets/base_dataset.py line 65 saying context_frames is the number of ground-truth frames to pass in at start. Can you explain more about what "frames to pass in at start" means? Is that the number of frames to pass in at the very beginning?

  2. kth_dataset.py line 23: sequence_length=20 # Per the comment in base_dataset.py line 67, sequence_length is the number of frames in the video sequence. But I checked the frame count of one of the kth videos, person15_walking_d1_uncomp.avi: it has 740 frames. Can you explain why you set sequence_length=20 and how I can tune this parameter accordingly? (See my sketch after this list.)

  3. Is it necessary to resize every frame to 64x64? I noticed the original images are 120x160, and the images in my dataset are 720x1280. They could be too blurry if I resize them to 64x64. Do you have any good suggestions on this issue?
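
My current reading of how those two hparams interact, using the kth defaults quoted above (this is my interpretation of the base_dataset.py comments, not something I verified in the code):

with context_frames=10 and sequence_length=20:
  frames  0..9   -> conditioning frames (ground truth fed to the model)
  frames 10..19  -> frames the model learns to predict
so a long clip such as the 740-frame person15_walking_d1_uncomp.avi would be
split into many 20-frame training sequences rather than used whole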

Thanks a lot.

Best,
Julia

ValueError: Invalid model savp

Hi, I am trying to run your code by following your instructions but there are some issues.

  1. The first one was "no module named datasets and models", so I added a code snippet wherever it was required, that is:
    import sys, os
    sys.path.append('/home/chetan/Desktop/video_prediction/video_prediction')
    and it is working now.
  2. Then, I am getting another error that
    Traceback (most recent call last):
    File "scripts/generate.py", line 184, in
    main()
    File "scripts/generate.py", line 114, in main
    VideoPredictionModel = models.get_model_class(args.model)
    File "/home/chetan/Desktop/video_prediction/video_prediction/models/__init__.py", line 33, in get_model_class
    raise ValueError('Invalid model %s' % model)
    ValueError: Invalid model savp

I tried to solve this issue but was unable to do so. Can you please suggest what I should do?
Thank you.

Accuracy results are worse when training with tensorflow>1.6

The accuracy results from the pretrained models were obtained with tensorflow 1.6. These accuracy results can't be obtained when training with tensorflow version 1.12. This only affects training. The pretrained models still generate correct predictions with tensorflow 1.12.

Checkpoint data loss error when evaluating

I was able to successfully train a model on the stochastic mnist dataset, but when I try to run the evaluation script, I get this error:
tensorflow.python.framework.errors_impl.DataLossError: Unable to open table file logs/smmnist/ours_savp/checkpoint: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
Is there any reason why this is happening? I'm able to successfully evaluate the model pretrained on KTH, but I can't with my own model.

I'm using CUDA 9.0, cuDNN 7.3.0.29, and TF 1.12

About the save_tf_record()

Hi! Thanks for releasing the code! I am having some trouble preparing my own dataset.
In the function read_videos_and_save_tf_records():

[screenshot of read_videos_and_save_tf_records() omitted]

I think one sequence corresponds to the frames of one video file (for example 1.avi), and the function save_tf_record() saves these frames to a .tfrecords file (e.g. 1.tfrecords).

However, in the function save_tf_record():

[screenshot of save_tf_record() omitted]

it seems one sequence corresponds to >= 1 video files. I am confused by this.

Can you help me? Thanks!

About the image reshaping when preparing my own dataset

Hi, Alex!

Thanks for releasing the code! I am having some trouble running it. I want to run your code on my own dataset, so I preprocessed it into TFRecords files and defined a class for it (based on kth_dataset.py). Besides, I need to change the image size to 128x128.

Can you tell me what and where I should modify to prepare my own dataset (128x128, RGB), beyond what I have done? Do I need to modify the network architecture (such as the layers) to adjust to my resolution (128x128)?
Thanks very much! I look forward to your reply!

Testing on custom images

Hi Alex,
Thanks for releasing the code! Just had a small query, I was trying to test out one of the pre trained models on a custom dataset. Taking inspiration from kth_dataset.py, I created the .pkl file for my data, resized all my images to 64x64 and converted all of it to .tfrecords. So now, my test set looks like this:

test/ 
     sequence_0_to_9.tfrecords
     sequence_lengths.txt

The dataset is really small, just 10 sequences, each of sequence length 10.

And then, I'm trying to use the ours_savp pre-trained model that you've provided for the kth dataset. It worked for the kth dataset, but it fails on my custom dataset. This is the command I'm running:

python scripts/generate.py --input_dir data/habitat --dataset_hparams sequence_length=2 \
  --checkpoint pretrained_models/kth/ours_savp/ --mode test \
  --results_dir results_test_samples/habitat --batch_size 1

It shoots out an error saying:

Traceback (most recent call last):
  File "scripts/generate.py", line 193, in <module>
    main()
  File "scripts/generate.py", line 135, in main
    model.build_graph(input_phs)
  File "/scratch/abhinav/video_prediction/video_prediction/models/base_model.py", line 478, in build_graph
    outputs_tuple, losses_tuple, loss_tuple, metrics_tuple = self.tower_fn(self.inputs)
  File "/scratch/abhinav/video_prediction/video_prediction/models/base_model.py", line 412, in tower_fn
    gen_outputs = self.generator_fn(inputs)
  File "/scratch/abhinav/video_prediction/video_prediction/models/savp_model.py", line 730, in generator_fn
    gen_outputs_posterior = generator_given_z_fn(inputs_posterior, mode, hparams)
  File "/scratch/abhinav/video_prediction/video_prediction/models/savp_model.py", line 693, in generator_given_z_fn
    cell = SAVPCell(inputs, mode, hparams)
  File "/scratch/abhinav/video_prediction/video_prediction/models/savp_model.py", line 311, in __init__
    ground_truth_sampling = tf.constant(False, dtype=tf.bool, shape=ground_truth_sampling_shape)
  File "/home/luke.skywalker/anaconda3/envs/savp2/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py", line 196, in constant
    value, dtype=dtype, shape=shape, verify_shape=verify_shape))
  File "/home/luke.skywalker/anaconda3/envs/savp2/lib/python3.6/site-packages/tensorflow/python/framework/tensor_util.py", line 491, in make_tensor_proto
    (shape_size, nparray.size))
ValueError: Too many elements provided. Needed at most -9, but received 1

I think it's because I'm not setting the batch_size and sequence_length parameters properly. When I increase the sequence_length from 2 to 3, I get:

ValueError: Too many elements provided. Needed at most -8, but received 1
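
Doing the arithmetic on those two errors, the pattern fits sequence_length - context_frames - 1 with the kth default context_frames=10 (quoted in another issue above): 2 - 10 - 1 = -9 and 3 - 10 - 1 = -8. So it looks like sequence_length has to exceed context_frames, though that's my inference from the numbers, not from reading the code.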

I feel I may have to increase the dataset size, but is it possible for it to work on this one itself? Could you please help me out and advise me on how to fix this?

Thank you,
Abhinav

Using trained model for custom sized images

Hi @alexlee-gk. Thank you for your wonderful code, was quite an interesting journey to dig into it!

I trained models on my own dataset of 64x64 images and am wondering if it's possible to use the model to sample 512x512 images? If yes, what changes should I make so that the model becomes scalable?

Thank you!

ModuleNotFoundError: no module named 'video_prediction'

Hello,

I implemented the steps according to the instructions and then run:
CUDA_VISIBLE_DEVICES=0 python scripts/train.py --input_dir data/bair --dataset bair \
  --model savp --model_hparams_dict hparams/bair_action_free/ours_savp/model_hparams.json \
  --output_dir logs/bair_action_free/ours_savp

Then the error occurs: ModuleNotFoundError: no module named 'video_prediction'.

It seems everything is right, so how does this happen?

I also tried adding sys.path.append(path) before the import; however, it doesn't help.

I really don't know where the problem is. Can you help me? Thanks!

Thank you.

Deterministic network architecture

Hi Alex,
Thanks for releasing the code! Reading your paper, I am wondering what you mean by the deterministic model. Does it have the same architecture as the generator network, with only the previous frames as input and an L1 or L2 loss to reconstruct the future frame?

Thanks.

Implementation of SV2P models

Hello Alex,

Thanks for sharing the code. I have a question regarding the pretrained_model files you shared for the SV2P model. Are those the fully trained models that you used in your paper?

Also, can you share the sv2p_time_variant model file for the bair_action_free dataset (you only shared the time_invariant version for it)?

Thanks.

What is the difference between BAIR action-free and action-conditioned?

You mention 2 kinds of experiments on the BAIR dataset: action-free and action-conditioned. What is the difference between the two?
In the download script, there is only an option to download bair; it doesn't distinguish between action-free and action-conditioned. But when downloading pretrained models, there are 2 different models for action-free and action-conditioned. So I'm assuming the difference is related to how the models are trained. Is that correct? If so (or if not), what is the difference between the two?

I generated some sample videos using both with the savp model and couldn't find any difference.

train with my own dataset

/softmotion_dataset.py", line 37, in __init__
raise ValueError('The examples have images under more than one name.')
ValueError: The examples have images under more than one name.

I don't know where the problem is.

thx

Log-Variance from a ReLU activation?

Hi Alex,
I've read your excellent SAVP paper as well as the previous SV2P and CDNA papers, and I have one question about the SV2P implementation.

I notice that in the VAE encoder tower function, the default ReLU activation is used to compute a tensor called latent_std. Judging from this name, I thought it was the standard deviation of the latent variables. However, when sampling from the latent distribution and computing the KL divergence, this tensor seems to be treated as the log-variance of the latent variable. Since the log-variance may take both positive and negative values, ReLU is clearly not suited as the output activation.

So why use ReLU here? Apologies if I have missed anything in the code.
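
For contrast, here is what a conventional VAE latent head looks like (a generic TF 1.x sketch, not this repo's code): the log-variance is left unconstrained and only exponentiated when sampling.

import tensorflow as tf

def latent_head(h, latent_dim):
    # mean and log-variance are both unconstrained (no ReLU):
    # log(sigma^2) must be able to go negative for sigma < 1
    z_mu = tf.layers.dense(h, latent_dim)
    z_log_var = tf.layers.dense(h, latent_dim)
    # reparameterization trick: z = mu + sigma * eps, sigma = exp(0.5 * log_var)
    eps = tf.random_normal(tf.shape(z_mu))
    return z_mu + tf.exp(0.5 * z_log_var) * eps

With a ReLU, z_log_var >= 0 would force sigma = exp(0.5 * z_log_var) >= 1, i.e. the posterior standard deviation could never drop below 1.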

Pretrained KTH model checkpoint tensor shapes do not match

Hi alex @alexlee-gk ,

When I try to load the pretrained KTH model (savp, gan, and all the others) for testing, there is an error:

Traceback (most recent call last):
  File "D:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\tensorflow\python\client\session.py", line 1334, in _do_call
    return fn(*args)
  File "D:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\tensorflow\python\client\session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "D:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\tensorflow\python\client\session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [1] rhs shape= [3]
	 [[{{node save/Assign_77}}]]
	 [[{{node save/RestoreV2}}]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "C:\Users\huoqi\PyCharm Community Edition 2018.3.4\helpers\pydev\_pydev_bundle\pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "C:\Users\huoqi\PyCharm Community Edition 2018.3.4\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "F:/video_prediction/scripts/generate.py", line 193, in <module>
    main()
  File "F:/video_prediction/scripts/generate.py", line 152, in main
    model.restore(sess, args.checkpoint)

It seems the checkpoint cannot be restored for testing. I would very much appreciate your help!

The entire error message is below:
error.txt

Issue with tensorflow 1.10

Hi Alex,

Thanks a lot for sharing your work!

For your information, I am running your code on Windows and had issues when using tensorflow 1.10,
both with "scripts/generate.py" (error message related to shape 0 instead of the expected shape 30) and "scripts/evaluate.py" (error message related to eager execution).

Everything works fine with tensorflow 1.5.

I thought I would mention it, to save time for others that might encounter the same issue.

Regards,
Thomas.

Training time

Hi,
Approximately how long does it take to train this network? I tried training it on a PC with no GPUs, and it takes around 5 minutes per iteration. Am I doing something wrong? Is it significantly faster with GPUs?

Thanks.

ResourceExhaustedError OOM even though my batch_size equals 1?

Hi Alex,

Thanks for releasing all the code. I am working on fitting a new dataset into your model. I resized (height, width) to (240, 320), the same as the ucf101 dataset, and set context_frames equal to 2 in order to predict 8 frames with the model. When I train the model with batch size 1, everything starts fine for a few steps, but after a while I get an OOM ResourceExhaustedError. I don't know why this is happening.

Do you have any good suggestion on this issue?

Thanks a lot.

Best,
Sander
