VDTR: Video Deblurring with Transformer

Mingdeng Cao, Yanbo Fan, Yong Zhang, Jue Wang and Yujiu Yang.


[arXiv]

We propose Video Deblurring Transformer (VDTR), a simple yet effective model that takes advantage of the long-range and relation modeling characteristics of the Transformer for video deblurring. VDTR uses a pure Transformer architecture for both spatial and temporal modeling and achieves highly competitive performance on popular video deblurring benchmarks.

VDTR surpasses CNN-based state-of-the-art methods by more than 1.5 dB in PSNR at moderate computational cost.


Spatio-temporal learning is important for video deblurring, a task currently dominated by convolution-based methods. This paper presents VDTR, an effective Transformer-based model that makes the first attempt to adapt the Transformer for video deblurring. VDTR exploits the superior long-range and relation modeling capabilities of the Transformer for both spatial and temporal modeling. However, it is challenging to design an appropriate Transformer-based model for video deblurring because of the high computational cost of high-resolution spatial modeling and the misalignment across frames in temporal modeling. To address these problems, VDTR performs attention within non-overlapping windows and exploits a hierarchical structure to model long-range dependencies. For frame-level spatial modeling, we propose an encoder-decoder Transformer that utilizes multi-scale features for deblurring. For multi-frame temporal modeling, we adapt the Transformer to fuse multiple spatial features efficiently. Compared with CNN-based methods, the proposed method achieves highly competitive results on both synthetic and real-world video deblurring benchmarks, including DVD, GOPRO, REDS and BSD. We hope such a pure Transformer-based architecture can serve as a powerful alternative baseline for video deblurring and other video restoration tasks.
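To make the cost argument concrete, here is a minimal sketch of splitting a feature map into non-overlapping windows and counting query-key pairs for global versus windowed attention. It is not the VDTR implementation: the helper names `window_partition` and `attention_pairs`, the 8x8 map, and the window size of 4 are all assumptions chosen for illustration.

```python
from typing import List

Grid = List[List[float]]


def window_partition(feature: Grid, ws: int) -> List[Grid]:
    """Split an H x W grid into non-overlapping ws x ws windows (row-major order)."""
    h, w = len(feature), len(feature[0])
    if h % ws or w % ws:
        raise ValueError("H and W must be divisible by the window size")
    windows = []
    for top in range(0, h, ws):
        for left in range(0, w, ws):
            windows.append([row[left:left + ws] for row in feature[top:top + ws]])
    return windows


def attention_pairs(h: int, w: int, ws: int = 0) -> int:
    """Count query-key pairs: global attention if ws == 0, else windowed attention."""
    tokens = h * w
    if ws == 0:
        return tokens * tokens
    # Each token attends only to the ws * ws tokens inside its own window.
    return tokens * ws * ws


# Hypothetical 8 x 8 single-channel feature map.
feature = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
windows = window_partition(feature, ws=4)
print(len(windows), len(windows[0]), len(windows[0][0]))   # 4 4 4
print(attention_pairs(8, 8), attention_pairs(8, 8, ws=4))  # 4096 1024
```

Global attention grows quadratically with the number of tokens, while windowed attention grows linearly for a fixed window size. Because windows alone cannot see beyond their borders, a hierarchical (multi-scale) structure is one way to recover long-range context.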

Model Architecture

Environment

  • Python 3.8
  • PyTorch >= 1.5
git clone https://github.com/ljzycmd/SimDeblur.git

# install SimDeblur
cd SimDeblur
bash Install.sh

Quick Start

  1. Clone the VDTR code
git clone https://github.com/ljzycmd/VDTR.git
  2. Download and unzip the datasets

Then create soft links to the datasets in the ./datasets folder.

  3. Run the training script
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 --master_port=10086 train.py ./configs/vdtr/vdtr_dvd.yaml --gpus=4

The training logs are saved in ./workdir/*.

  4. Run the testing script (single GPU; an RTX 2080 Ti at minimum)
python test.py ./configs/vdtr/vdtr_dvd.yaml $Checkpoint_path

The testing logs and frames are saved in ./workdir/*.

Pretrained checkpoints are listed below:

| Model | Dataset | Download     |
| ----- | ------- | ------------ |
| VDTR  | DVD     | Google drive |
| VDTR  | GOPRO   | Google drive |
| VDTR  | BSD-1ms | Google drive |
| VDTR  | BSD-2ms | Google drive |
| VDTR  | BSD-3ms | Google drive |

Experimental Results

VDTR achieves competitive PSNR and SSIM on both synthetic and real-world deblurring datasets.
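For readers unfamiliar with the metric, the following is a minimal sketch of how PSNR is commonly computed from the mean squared error. It is not the evaluation code of this repository, which may differ in color space, data range, or cropping. The function name `psnr` and the toy pixel values are assumptions for illustration.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], restored: Sequence[float], max_val: float = 255.0) -> float:
    """PSNR in dB between two equally sized, flattened images."""
    if len(reference) != len(restored) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


# Toy flattened "images" (hypothetical values, 8-bit range).
sharp = [100.0, 150.0, 200.0, 250.0]
blurry = [110.0, 140.0, 210.0, 240.0]  # error of 10 per pixel -> MSE = 100
better = [105.0, 145.0, 205.0, 245.0]  # error of 5 per pixel  -> MSE = 25

gain = psnr(sharp, better) - psnr(sharp, blurry)
print(round(psnr(sharp, blurry), 2), round(gain, 2))  # 28.13 6.02
```

Because PSNR is logarithmic in the MSE, a gain of 1.5 dB corresponds to reducing the MSE by a factor of about 1.41 (10^0.15).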

Quantitative results on popular video deblurring datasets: DVD, GOPRO, REDS

Qualitative comparison to state-of-the-art video deblurring methods on GOPRO

Quantitative results on real-world video deblurring datasets: BSD

Qualitative comparison to state-of-the-art video deblurring methods on BSD

Citation

If the proposed model is useful for your research, please consider citing:

@article{cao2022vdtr,
  title   = {VDTR: Video Deblurring with Transformer},
  author  = {Mingdeng Cao and Yanbo Fan and Yong Zhang and Jue Wang and Yujiu Yang},
  journal = {arXiv:2204.08023},
  year    = {2022}
}
