

FTVSR (ECCV 2022)


This is the official PyTorch implementation of the paper Learning Spatiotemporal Frequency-Transformer for Compressed Video Super-Resolution.


Introduction

Compressed video super-resolution (VSR) aims to restore high-resolution frames from compressed low-resolution counterparts. Most recent VSR approaches enhance an input frame by “borrowing” relevant textures from neighboring video frames. Although some progress has been made, it remains challenging to effectively extract and transfer high-quality textures from compressed videos, where most frames are heavily degraded. We propose a novel Frequency-Transformer for compressed Video Super-Resolution (FTVSR) that conducts self-attention over a joint space-time-frequency domain. FTVSR significantly outperforms previous methods and achieves new state-of-the-art results.

Contribution

We propose transferring video frames into the frequency domain and design a novel frequency attention mechanism. We study different self-attention schemes across the space, time, and frequency dimensions. We propose a novel Frequency-Transformer for compressed Video Super-Resolution (FTVSR) that conducts self-attention over a joint space-time-frequency domain.
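To make the idea concrete, below is a minimal PyTorch sketch of frequency attention. It is only an illustration, not the released model code: the 8x8 patch size, the orthonormal DCT-II basis, the one-token-per-frequency-band layout, and the names dct_matrix / frequency_attention are our own assumptions.

import math
import torch

def dct_matrix(n: int) -> torch.Tensor:
    """Orthonormal DCT-II basis of size (n, n); row k holds the k-th frequency."""
    k = torch.arange(n, dtype=torch.float32)
    basis = torch.cos(math.pi / n * (k[None, :] + 0.5) * k[:, None])
    basis[0] /= math.sqrt(2.0)
    return basis * math.sqrt(2.0 / n)

def frequency_attention(frames: torch.Tensor, patch: int = 8) -> torch.Tensor:
    """frames: (T, C, H, W) with H and W divisible by `patch`.
    Returns attention-weighted frequency tokens of shape (T, num_patches, patch*patch, C)."""
    t, c, h, w = frames.shape
    d = dct_matrix(patch).to(frames)
    # Split each frame into non-overlapping patches: (T, C, H/p, W/p, p, p).
    p = frames.unfold(2, patch, patch).unfold(3, patch, patch)
    # 2-D DCT of every patch: D @ P @ D^T.
    coeff = torch.einsum('ij,tcuvjk,lk->tcuvil', d, p, d)
    # One token per frequency band, features gathered over channels: (T, N, F, C).
    tokens = coeff.reshape(t, c, -1, patch * patch).permute(0, 2, 3, 1)
    q = k_ = v = tokens
    attn = torch.softmax(q @ k_.transpose(-2, -1) / math.sqrt(c), dim=-1)  # (T, N, F, F)
    return attn @ v

# Toy example: 5 frames of a 64x64 RGB clip.
out = frequency_attention(torch.randn(5, 3, 64, 64))
print(out.shape)  # torch.Size([5, 64, 64, 3])

In FTVSR this frequency attention is combined with attention over space and time; the sketch only shows the frequency-domain step.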

Overview

Visual

Some visual results on videos with different compression rates (No compression, CRF 15, 25, 35).

Requirements and dependencies

  • python 3.7 (Anaconda is recommended)
  • pytorch == 1.9.0
  • torchvision == 0.10.0
  • opencv-python == 4.5.3
  • mmcv-full == 1.3.9
  • scipy==1.7.3
  • scikit-image == 0.19.0
  • lmdb == 1.2.1
  • yapf == 0.31.0
  • tensorboard == 2.6.0

Model

Pre-trained models can be downloaded from Baidu Cloud (access code: i42r) or Google Drive.

  • FTVSR_REDS.pth: trained on REDS dataset with 50% uncompressed videos and 50% compressed videos (CRF 15, 25, 35).
  • FTVSR_Vimeo90K.pth: trained on Vimeo-90K dataset with 50% uncompressed videos and 50% compressed videos (CRF 15, 25, 35).
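As a quick sanity check after downloading, the snippet below (our own, not part of the repo) loads a checkpoint on the CPU and lists a few of its weight tensors. mmcv-style checkpoints usually keep the weights under a 'state_dict' key, so we fall back to the raw dict if that key is absent.

import torch

# Load the downloaded checkpoint on CPU and inspect its contents.
ckpt = torch.load("checkpoint/FTVSR_REDS.pth", map_location="cpu")
state = ckpt.get("state_dict", ckpt) if isinstance(ckpt, dict) else ckpt
print(f"{len(state)} tensors, for example:")
for name in list(state)[:5]:
    print(" ", name, tuple(state[name].shape))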

Dataset

  1. Training set

    • REDS dataset. We regroup the training and validation sets into one folder. The original training set has 240 clips, numbered 000 to 239, and the original validation clips are renamed 240 to 269.
      • Make REDS structure be:
      	├────REDS
      		├────train
      			├────train_sharp
      				├────000
      				├────...
      				├────269
      			├────train_sharp_bicubic
      				├────X4
      					├────000
      					├────...
      					├────269
      
    • Vimeo-90K dataset. Download the original data and use the script 'degradation/BD_degradation.m' (run in MATLAB) to generate the low-resolution images. The sep_trainlist.txt file, which lists the training samples, is included in the downloaded zip file.
      • Make Vimeo-90K structure be:
       	├────vimeo_septuplet
       		├────sequences
       			├────00001
       			├────...
       			├────00096
       		├────sequences_BD
       			├────00001
       			├────...
       			├────00096
       		├────sep_trainlist.txt
       		├────sep_testlist.txt
      
    • Generate the compressed videos with ffmpeg using the command "ffmpeg -i LR.mp4 -vcodec libx264 -crf CRFvalue LR_compressed.mp4". We train FTVSR on 50% uncompressed videos and 50% compressed videos with CRF 15, 25, and 35 (a sketch of this step is shown after this list).
  2. Testing set

    • REDS4 and Vid4 datasets. REDS4 consists of clips 000, 011, 015, and 020 from the original REDS training set. Download the compressed testing videos from Baidu Cloud or Google Drive.
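Since REDS and Vimeo-90K are stored as frame sequences rather than *.mp4 files, a hypothetical helper like the one below (our sketch, not part of this repo; the paths, frame rate, and %08d.png naming that matches REDS are assumptions) can encode an LR clip, compress it with libx264 at a given CRF, and decode it back to frames.

import subprocess
from pathlib import Path

def compress_clip(frame_dir: str, out_dir: str, crf: int, fps: int = 24) -> None:
    """Encode frame_dir/%08d.png, compress with libx264 at `crf`, and dump frames to out_dir."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    tmp = Path(out_dir) / f"crf{crf}.mp4"
    # PNG sequence -> H.264 video at the requested CRF (matches the README command).
    subprocess.run(
        ["ffmpeg", "-y", "-framerate", str(fps), "-i", f"{frame_dir}/%08d.png",
         "-vcodec", "libx264", "-crf", str(crf), "-pix_fmt", "yuv420p", str(tmp)],
        check=True)
    # Compressed video -> PNG frames, numbered from 0 like the originals.
    subprocess.run(
        ["ffmpeg", "-y", "-i", str(tmp), "-start_number", "0", f"{out_dir}/%08d.png"],
        check=True)

# Example (placeholder paths): compress one REDS LR clip at CRF 25.
compress_clip("REDS/train/train_sharp_bicubic/X4/000", "REDS_crf25/000", crf=25)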

Test

  1. Clone this github repo
git clone https://github.com/researchmm/FTVSR.git
cd FTVSR
  2. Download pre-trained weights (Baidu Cloud / Google Drive) and place them under ./checkpoint
  3. Prepare the testing dataset and modify "dataset_root" in configs/FTVSR_reds4.py and configs/FTVSR_vimeo90k.py
  4. Run the test
# REDS model
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 ./tools/dist_test.sh configs/FTVSR_reds4.py checkpoint/FTVSR_REDS.pth 8 [--save-path 'save_path']
# Vimeo model
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 ./tools/dist_test.sh configs/FTVSR_vimeo90k.py checkpoint/FTVSR_Vimeo90K.pth 8 [--save-path 'save_path']
  5. The results are saved in save_path.

Train

  1. Clone this github repo
git clone https://github.com/researchmm/FTVSR.git
cd FTVSR
  2. Prepare the training dataset and modify "dataset_root" in configs/FTVSR_reds4.py and configs/FTVSR_vimeo90k.py
  3. Run training
# REDS
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 ./tools/dist_train.sh configs/FTVSR_reds4.py 8
# Vimeo
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 ./tools/dist_train.sh configs/FTVSR_vimeo90k.py 8

Related projects

We also sincerely recommend some other excellent works related to ours. ✨

Citation

If you find the code and pre-trained models useful for your research, please consider citing our paper. 😊

@InProceedings{qiu2022learning,
  author    = {Qiu, Zhongwei and Yang, Huan and Fu, Jianlong and Fu, Dongmei},
  title     = {Learning Spatiotemporal Frequency-Transformer for Compressed Video Super-Resolution},
  booktitle = {ECCV},
  year      = {2022},
}

Acknowledgment

This code is built on mmediting. We thank the authors of BasicVSR for sharing their code.


Issues

Where could I find the paper?

Nice work!
However, searching by the title, I could not find the corresponding paper on Google Scholar or arXiv. Where can I find the paper?

TPAMI

Hi, I want to know when the code for the TPAMI version will be released.

Evaluation results on the uncompressed dataset.

Hello, I tried to train your model on the general ×4 video super-resolution task, using Vimeo-90K with BD degradation as the training set and Vid4 with BD degradation as the validation set. The results do not seem very good and are worse than BasicVSR's. Is that expected?

New Super-Resolution Benchmarks

Hello,

MSU Graphics & Media Lab Video Group has recently launched two new Super-Resolution Benchmarks.

If you are interested in participating, you can add your algorithm by following the submission steps.

We would be grateful for your feedback on our work!

Out of memory during evaluation after training epochs

When reproducing the REDS results, I reduced num_input_frames to 20 to fit in GPU memory, so training is fine: about 8 GB is used on each 16 GB GPU.
But during evaluation, the GPU memory consumed by the training iterations does not seem to be released, so evaluation runs out of memory:
RuntimeError: CUDA out of memory. Tried to allocate 436.00 MiB (GPU 1; 15.90 GiB total capacity; 10.98 GiB already allocated; 189.81 MiB free; 14.78 GiB reserved in total by PyTorch)

The config file for the data is as below:
data = dict( workers_per_gpu=4, train_dataloader=dict(samples_per_gpu=1, drop_last=True), test_dataloader=dict(samples_per_gpu=1, workers_per_gpu=1),

BTW, a similar problem was reported in mmdet, but no feasible solution was found.

Question on inference test

Hello !
I tried to run inference to compare against other VSR networks, but I run out of memory where the other networks work.
What I do: in a notebook, I prepare 20 frames of 512x384 and feed them to the model.
This crashes with OOM on a GPU with 24 GB of CUDA memory, and I don't have a bigger GPU at home.
So does your model require more memory, or am I not testing it the way it should be done?
AFAIK, the more frames you feed at once, the better the result you can expect; is this true?

Errors when executing the REDS4 test code

KeyError: 'TTVSR is not in the model registry'.
We cannot find the TTVSR model in the released code, so we suspect the release is incomplete, which is why the test code fails. If possible, could the authors please review the released code? Thank you!

Fewer frames in the Vid4 testing set?

It seems that there are fewer frames in your Vid4 than in the original Vid4 testing set (each sequence is 4 frames shorter than the original).

How can I train and test on a single GPU?

I changed the distributed command
python -m torch.distributed.launch --nproc_per_node=$GPUS --master_port=$PORT
$(dirname "$0")/train.py $CONFIG --launcher pytorch ${@:3}
to
python ./tools/train.py ./configs/FTVSR_reds4.py
and training fails with: KeyError: 'TTVSR is not in the model registry'

What could be the reason?

Evaluation Problems?

For Vid4, if I do not misunderstand your testing code, it seems that you average the metrics over the 4x4=16 sub-sequences instead of over the 4 original sequences. This causes an inconsistency when comparing with other methods. You may need to stitch the 2x2 results back together to get the final outputs and then compute the metrics.

Welcome update to OpenMMLab 2.0


I am Vansin, the technical operator of OpenMMLab. In September of last year, we announced the release of OpenMMLab 2.0 at the World Artificial Intelligence Conference in Shanghai. We invite you to upgrade your algorithm library to OpenMMLab 2.0 using MMEngine, which can be used for both research and commercial purposes. If you have any questions, please feel free to join us on the OpenMMLab Discord at https://discord.gg/A9dCpjHPfE or add me on WeChat (ID: van-sin) and I will invite you to the OpenMMLab WeChat group.

Here are the OpenMMLab 2.0 repos branches:

Repository         OpenMMLab 1.0 branch   OpenMMLab 2.0 branch
MMEngine           -                      0.x
MMCV               1.x                    2.x
MMDetection        0.x, 1.x, 2.x          3.x
MMAction2          0.x                    1.x
MMClassification   0.x                    1.x
MMSegmentation     0.x                    1.x
MMDetection3D      0.x                    1.x
MMEditing          0.x                    1.x
MMPose             0.x                    1.x
MMDeploy           0.x                    1.x
MMTracking         0.x                    1.x
MMOCR              0.x                    1.x
MMRazor            0.x                    1.x
MMSelfSup          0.x                    1.x
MMRotate           0.x                    1.x
MMYOLO             -                      0.x

Attention: please create a new virtual environment for OpenMMLab 2.0.

Training time for 400k iterations

Nice work! I'm trying to reproduce the training on the REDS dataset with 4 GPUs (each with 16 GB of memory). The batch size is set to 1 as suggested in the config file. The total training time looks to be ~31 days. I'm wondering if this was the case for your training, or whether I'm doing something wrong.

Thanks a lot.

How to get the compressed video dataset?

The README says to 'Generate the compressed videos by ffmpeg with command "ffmpeg -i LR.mp4 -vcodec libx264 -crf CRFvalue LR_compressed.mp4"' to obtain the compressed video dataset, but REDS and Vimeo-90K are frame-sequence datasets, not *.mp4 files. How, specifically, do we get the compressed video dataset from these frame sequences?

What does the dataset name mean?

For example, the Vid4 sequence names are 'calendar_0_0', 'calendar_0_1', 'calendar_1_0', and 'calendar_1_1'. What does the "_0_0" in "./data/vid4/BD_start_1_crf_25/calendar_0_0" mean? Thank you!

FLOPs: 1.668T, Params: 43.33M?

I used the code provided in your GitHub repo. Thop shows the FLOPs and parameters are 1.668T and 43.331M, so which model has the 10.8M parameters reported in your paper?

RuntimeError: CUDA out of memory.

Hi,

I've run the single-GPU test on two different machines, one with an 8 GB GPU and another with a 24 GB A10. Both gave me an OOM error even with num_input_frames = 2. What am I missing?

The first one (an RTX card) said:

Tried to allocate 620.00 MiB (GPU 0; 7.80 GiB total capacity; 4.31 GiB already allocated; 412.31 MiB free; 6.33 GiB reserved in total by PyTorch)

while the other machine with the A10 said:

Tried to allocate 5.50 GiB (GPU 0; 22.02 GiB total capacity; 10.77 GiB already allocated; 4.78 GiB free; 15.63 GiB reserved in total by PyTorch)

I changed the REDS dataset to SRFolderMultipleGTDataset, and I have a 2-video subset of REDS4_val that looks like:

-REDS4_short/
|-- val_sharp/
|--|-- 000/
|--|--|-- %08d.png
|--|-- 001/
|--|--|-- %08d.png
|-- val_sharp_bicubic/
|--|-- X4/
|--|--|-- 000/
|--|--|--|-- %08d.png
|--|--|-- 001/
|--|--|--|-- %08d.png

And I've used the following command: tools/test.py <path/to/config> <path/to/redsModel> --crf 25 --startIdx 0 --test_frames 50

Let me know if you need any further info to help. Thanks!

About iterative testing

If I need to test a sequence of 1000 frames, processing 20 frames at a time, how should I modify the code? I found that after setting num_input_frames to 20, only 20 frames are tested, but I want to test all 1000 frames.
