
LDMVFI: Video Frame Interpolation with Latent Diffusion Models

Duolikun Danier, Fan Zhang, David Bull

Project | arXiv | Video

[Demo GIF]

Overview

We observe that most existing learning-based VFI models are trained to minimise the L1/L2/VGG loss between their outputs and the ground-truth frames. However, previous works have shown that these metrics correlate poorly with the perceptual quality of interpolated frames. Meanwhile, generative models, especially diffusion models, have shown remarkable results in generating visual content with high perceptual quality. In this work, we leverage the high-fidelity image/video generation capabilities of latent diffusion models to perform generative VFI.


Dependencies and Installation

See environment.yaml for the package requirements. Simple installation:

conda env create -f environment.yaml

Pre-trained Model

The pre-trained model can be downloaded from here, and its corresponding config file is this yaml.
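
For reference, the checkpoint can be loaded in Python roughly as follows. This is a minimal sketch assuming the latent-diffusion conventions this repo is adapted from (OmegaConf configs and the instantiate_from_config helper in ldm/util); the checkpoint path is a placeholder.

import torch
from omegaconf import OmegaConf
from ldm.util import instantiate_from_config

# Load the model config and build the model it describes
# (instantiate_from_config is the standard latent-diffusion helper).
config = OmegaConf.load('configs/ldm/ldmvfi-vqflow-f32-c256-concat_max.yaml')
model = instantiate_from_config(config.model)

# Load the pre-trained weights from the Lightning checkpoint;
# strict=False tolerates any auxiliary keys that are not model weights.
ckpt = torch.load('<path/to/ldmvfi-vqflow-f32-c256-concat_max.ckpt>', map_location='cpu')
model.load_state_dict(ckpt['state_dict'], strict=False)
model.eval()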

Preparing datasets

Training sets:

[Vimeo-90K] | [BVI-DVC quintuplets]

Test sets:

[Middlebury] | [UCF101] | [DAVIS] | [SNU-FILM]

To use evaluate.py and the dataset classes in ldm/data/, the dataset folder names should be lower-case and the data directory structured as follows (a sanity-check sketch follows the tree).

└──── <data directory>/
    ├──── middlebury_others/
    |   ├──── input/
    |   |   ├──── Beanbags/
    |   |   ├──── ...
    |   |   └──── Walking/
    |   └──── gt/
    |       ├──── Beanbags/
    |       ├──── ...
    |       └──── Walking/
    ├──── ucf101/
    |   ├──── 0/
    |   ├──── ...
    |   └──── 99/
    ├──── davis90/
    |   ├──── bear/
    |   ├──── ...
    |   └──── walking/
    ├──── snufilm/
    |   ├──── test-easy.txt
    |   ├──── ...
    |   └──── data/SNU-FILM/test/...
    ├──── bvidvc/quintuplets
    |   ├──── 00000/
    |   ├──── ...
    |   └──── 17599/
    └──── vimeo_septuplet/
        ├──── sequences/
        ├──── sep_testlist.txt
        └──── sep_trainlist.txt
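
Before running evaluation, it can help to verify this layout programmatically. A minimal sketch; the checked sub-paths and the data_dir placeholder are illustrative, not part of the repo:

import os

# Hypothetical sanity check for the expected lower-case dataset folders.
data_dir = '<path/to/data/dir>'
expected = [
    'middlebury_others/input', 'middlebury_others/gt',
    'ucf101', 'davis90', 'snufilm',
    'bvidvc/quintuplets', 'vimeo_septuplet/sequences',
]
for rel in expected:
    status = 'ok      ' if os.path.isdir(os.path.join(data_dir, rel)) else 'MISSING '
    print(status + rel)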

Evaluation

To evaluate LDMVFI (with DDIM sampler), for example, on the Middlebury dataset, using PSNR/SSIM/LPIPS, run the following command.

python evaluate.py \
--config configs/ldm/ldmvfi-vqflow-f32-c256-concat_max.yaml \
--ckpt <path/to/ldmvfi-vqflow-f32-c256-concat_max.ckpt> \
--dataset Middlebury_others \
--metrics PSNR SSIM LPIPS \
--data_dir <path/to/data/dir> \
--out_dir eval_results/ldmvfi-vqflow-f32-c256-concat_max/ \
--use_ddim

This will create the directory eval_results/ldmvfi-vqflow-f32-c256-concat_max/Middlebury_others/ and store the interpolated frames, as well as a results.txt file, in that directory. For other test sets, replace Middlebury_others with the corresponding class names defined in ldm/data/testsets.py (e.g. Ucf101_triplet).
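
Of these metrics, PSNR is simple enough to state inline. A minimal sketch of the distortion it measures (evaluate.py has its own implementation, which may differ in detail):

import numpy as np

# PSNR between an interpolated frame and its ground truth,
# both uint8 RGB arrays of shape (H, W, 3).
def psnr(pred, gt, max_val=255.0):
    mse = np.mean((pred.astype(np.float64) - gt.astype(np.float64)) ** 2)
    return float('inf') if mse == 0 else 10 * np.log10(max_val ** 2 / mse)

SSIM compares local structure and LPIPS is a learned perceptual metric, so both are typically computed with library implementations rather than a one-line formula.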


To evaluate the model with the perceptual video quality metric FloLPIPS, first evaluate the image metrics using the command above (so that the interpolated frames are saved in eval_results/ldmvfi-vqflow-f32-c256-concat_max/), then run the following command.

python evaluate_vqm.py \
--exp ldmvfi-vqflow-f32-c256-concat_max \
--dataset Middlebury_others \
--metrics FloLPIPS \
--data_dir <path/to/data/dir> \
--out_dir eval_results/ldmvfi-vqflow-f32-c256-concat_max/

This will read the interpolated frames previously stored in eval_results/ldmvfi-vqflow-f32-c256-concat_max/Middlebury_others/ and write the evaluation results to results_vqm.txt in the same folder.
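
To cover several test sets in one go, the two scripts can be chained with a small driver. A hedged sketch using only the flags documented above; the dataset list, data directory and checkpoint path are placeholders:

import subprocess

exp = 'ldmvfi-vqflow-f32-c256-concat_max'
for dataset in ['Middlebury_others', 'Ucf101_triplet']:
    # Image metrics first, so the interpolated frames get saved...
    subprocess.run(['python', 'evaluate.py',
                    '--config', f'configs/ldm/{exp}.yaml',
                    '--ckpt', f'<path/to/{exp}.ckpt>',
                    '--dataset', dataset,
                    '--metrics', 'PSNR', 'SSIM', 'LPIPS',
                    '--data_dir', '<path/to/data/dir>',
                    '--out_dir', f'eval_results/{exp}/',
                    '--use_ddim'], check=True)
    # ...then FloLPIPS on the saved frames.
    subprocess.run(['python', 'evaluate_vqm.py',
                    '--exp', exp,
                    '--dataset', dataset,
                    '--metrics', 'FloLPIPS',
                    '--data_dir', '<path/to/data/dir>',
                    '--out_dir', f'eval_results/{exp}/'], check=True)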


To interpolate a video (in raw .yuv format), use the following command.

python interpolate_yuv.py \
--net LDMVFI \
--config configs/ldm/ldmvfi-vqflow-f32-c256-concat_max.yaml \
--ckpt <path/to/ldmvfi-vqflow-f32-c256-concat_max.ckpt> \
--input_yuv <path/to/input/yuv> \
--size <spatial res of video, e.g. 1920x1080> \
--out_fps <output fps, should be 2 x original fps> \
--out_dir <desired/output/dir> \
--use_ddim
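
For context, the script consumes raw planar YUV. A minimal sketch of reading one frame, assuming 8-bit 4:2:0 input (interpolate_yuv.py's actual reader may differ, e.g. for 10-bit video):

import numpy as np

# One 8-bit YUV 4:2:0 frame: a full-size luma plane followed by
# two quarter-size chroma planes.
def read_yuv420_frame(f, width, height):
    y = np.frombuffer(f.read(width * height), dtype=np.uint8).reshape(height, width)
    u = np.frombuffer(f.read(width * height // 4), dtype=np.uint8).reshape(height // 2, width // 2)
    v = np.frombuffer(f.read(width * height // 4), dtype=np.uint8).reshape(height // 2, width // 2)
    return y, u, v

# Usage: with open('input.yuv', 'rb') as f: y, u, v = read_yuv420_frame(f, 1920, 1080)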

Training

LDMVFI is trained in two stages: the VQ-FIGAN autoencoder and the denoising U-Net are trained separately, using the commands below.

VQ-FIGAN

python main.py --base configs/autoencoder/vqflow-f32.yaml -t --gpus 0,

Denoising U-Net

python main.py --base configs/ldm/ldmvfi-vqflow-f32-c256-concat_max.yaml -t --gpus 0,

These commands create a logs/ directory, in which a corresponding sub-directory is created for each experiment. The training logs include checkpoints, sample images and TensorBoard event files.

To resume from a checkpoint file, simply use the --resume argument in main.py to specify the checkpoint.

Citation

@article{danier2023ldmvfi,
  title={LDMVFI: Video Frame Interpolation with Latent Diffusion Models},
  author={Danier, Duolikun and Zhang, Fan and Bull, David},
  journal={arXiv preprint arXiv:2303.09508},
  year={2023}
}

Acknowledgement

Our code is adapted from the original latent-diffusion repository. We thank the authors for sharing their code.
