

FTVSR (ECCV 2022)


This is the official PyTorch implementation of the paper Learning Spatiotemporal Frequency-Transformer for Compressed Video Super-Resolution.


Introduction

Compressed video super-resolution (VSR) aims to restore high-resolution frames from compressed low-resolution counterparts. Most recent VSR approaches enhance an input frame by “borrowing” relevant textures from neighboring video frames. Although some progress has been made, it remains challenging to effectively extract and transfer high-quality textures from compressed videos, where most frames are heavily degraded. We propose a novel Frequency-Transformer for compressed Video Super-Resolution (FTVSR) that conducts self-attention over a joint space-time-frequency domain. FTVSR significantly outperforms previous methods and achieves new state-of-the-art results.

Contribution

We propose transferring video frames into the frequency domain and design a novel frequency attention mechanism. We study different self-attention schemes across the space, time, and frequency dimensions. We propose a novel Frequency-Transformer for compressed Video Super-Resolution (FTVSR) that conducts self-attention over a joint space-time-frequency domain.
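To make the idea concrete, below is a minimal PyTorch sketch of frequency attention. It is only an illustration, not the released model code: the 8x8 patch size, the orthonormal DCT-II basis, the one-token-per-frequency-band layout, and the names dct_matrix / frequency_attention are our own assumptions.

import math
import torch

def dct_matrix(n: int) -> torch.Tensor:
    """Orthonormal DCT-II basis of size (n, n); row k holds the k-th frequency."""
    k = torch.arange(n, dtype=torch.float32)
    basis = torch.cos(math.pi / n * (k[None, :] + 0.5) * k[:, None])
    basis[0] /= math.sqrt(2.0)
    return basis * math.sqrt(2.0 / n)

def frequency_attention(frames: torch.Tensor, patch: int = 8) -> torch.Tensor:
    """frames: (T, C, H, W) with H and W divisible by `patch`.
    Returns attention-weighted frequency tokens of shape (T, num_patches, patch*patch, C)."""
    t, c, h, w = frames.shape
    d = dct_matrix(patch).to(frames)
    # Split each frame into non-overlapping patches: (T, C, H/p, W/p, p, p).
    p = frames.unfold(2, patch, patch).unfold(3, patch, patch)
    # 2-D DCT of every patch: D @ P @ D^T.
    coeff = torch.einsum('ij,tcuvjk,lk->tcuvil', d, p, d)
    # One token per frequency band, features gathered over channels: (T, N, F, C).
    tokens = coeff.reshape(t, c, -1, patch * patch).permute(0, 2, 3, 1)
    q = k_ = v = tokens
    attn = torch.softmax(q @ k_.transpose(-2, -1) / math.sqrt(c), dim=-1)  # (T, N, F, F)
    return attn @ v

# Toy example: 5 frames of a 64x64 RGB clip.
out = frequency_attention(torch.randn(5, 3, 64, 64))
print(out.shape)  # torch.Size([5, 64, 64, 3])

In FTVSR this frequency attention is combined with attention over space and time; the sketch only shows the frequency-domain step.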

Overview

Visual

Some visual results on videos with different compression rates (No compression, CRF 15, 25, 35).

Requirements and dependencies

  • python 3.7 (Anaconda is recommended)
  • pytorch == 1.9.0
  • torchvision == 0.10.0
  • opencv-python == 4.5.3
  • mmcv-full == 1.3.9
  • scipy==1.7.3
  • scikit-image == 0.19.0
  • lmdb == 1.2.1
  • yapf == 0.31.0
  • tensorboard == 2.6.0

Model

Pre-trained models can be downloaded from Baidu Cloud (access code: i42r) or Google Drive.

  • FTVSR_REDS.pth: trained on REDS dataset with 50% uncompressed videos and 50% compressed videos (CRF 15, 25, 35).
  • FTVSR_Vimeo90K.pth: trained on Vimeo-90K dataset with 50% uncompressed videos and 50% compressed videos (CRF 15, 25, 35).
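As a quick sanity check after downloading, the snippet below (our own, not part of the repo) loads a checkpoint on the CPU and lists a few of its weight tensors. mmcv-style checkpoints usually keep the weights under a 'state_dict' key, so we fall back to the raw dict if that key is absent.

import torch

# Load the downloaded checkpoint on CPU and inspect its contents.
ckpt = torch.load("checkpoint/FTVSR_REDS.pth", map_location="cpu")
state = ckpt.get("state_dict", ckpt) if isinstance(ckpt, dict) else ckpt
print(f"{len(state)} tensors, for example:")
for name in list(state)[:5]:
    print(" ", name, tuple(state[name].shape))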

Dataset

  1. Training set

    • REDS dataset. We regroup the training and validation sets into one folder. The original training set has 240 clips, numbered 000 to 239, and the original validation clips are renamed 240 to 269.
      • Make REDS structure be:
      	├────REDS
      		├────train
      			├────train_sharp
      				├────000
      				├────...
      				├────269
      			├────train_sharp_bicubic
      				├────X4
      					├────000
      					├────...
      					├────269
      
    • Vimeo-90K dataset. Download the original data and use the script 'degradation/BD_degradation.m' (run in MATLAB) to generate the low-resolution images. The sep_trainlist.txt file, which lists the training samples, is included in the downloaded zip file.
      • Make Vimeo-90K structure be:
       	├────vimeo_septuplet
       		├────sequences
       			├────00001
       			├────...
       			├────00096
       		├────sequences_BD
       			├────00001
       			├────...
       			├────00096
       		├────sep_trainlist.txt
       		├────sep_testlist.txt
      
    • Generate the compressed videos with ffmpeg using the command "ffmpeg -i LR.mp4 -vcodec libx264 -crf CRFvalue LR_compressed.mp4". We train FTVSR on 50% uncompressed videos and 50% compressed videos with CRF 15, 25, and 35 (a sketch of this step is shown after this list).
  2. Testing set

    • REDS4 and Vid4 datasets. REDS4 consists of clips 000, 011, 015, and 020 from the original REDS training set. Download the compressed testing videos from Baidu Cloud or Google Drive.
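Since REDS and Vimeo-90K are stored as frame sequences rather than *.mp4 files, a hypothetical helper like the one below (our sketch, not part of this repo; the paths, frame rate, and %08d.png naming that matches REDS are assumptions) can encode an LR clip, compress it with libx264 at a given CRF, and decode it back to frames.

import subprocess
from pathlib import Path

def compress_clip(frame_dir: str, out_dir: str, crf: int, fps: int = 24) -> None:
    """Encode frame_dir/%08d.png, compress with libx264 at `crf`, and dump frames to out_dir."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    tmp = Path(out_dir) / f"crf{crf}.mp4"
    # PNG sequence -> H.264 video at the requested CRF (matches the README command).
    subprocess.run(
        ["ffmpeg", "-y", "-framerate", str(fps), "-i", f"{frame_dir}/%08d.png",
         "-vcodec", "libx264", "-crf", str(crf), "-pix_fmt", "yuv420p", str(tmp)],
        check=True)
    # Compressed video -> PNG frames, numbered from 0 like the originals.
    subprocess.run(
        ["ffmpeg", "-y", "-i", str(tmp), "-start_number", "0", f"{out_dir}/%08d.png"],
        check=True)

# Example (placeholder paths): compress one REDS LR clip at CRF 25.
compress_clip("REDS/train/train_sharp_bicubic/X4/000", "REDS_crf25/000", crf=25)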

Test

  1. Clone this github repo
git clone https://github.com/researchmm/FTVSR.git
cd FTVSR
  2. Download pre-trained weights (Baidu Cloud / Google Drive) and place them under ./checkpoint
  3. Prepare the testing dataset and modify "dataset_root" in configs/FTVSR_reds4.py and configs/FTVSR_vimeo90k.py
  4. Run the test
# REDS model
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 ./tools/dist_test.sh configs/FTVSR_reds4.py checkpoint/FTVSR_REDS.pth 8 [--save-path 'save_path']
# Vimeo model
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 ./tools/dist_test.sh configs/FTVSR_vimeo90k.py checkpoint/FTVSR_Vimeo90K.pth 8 [--save-path 'save_path']
  5. The results are saved in save_path.

Train

  1. Clone this github repo
git clone https://github.com/researchmm/FTVSR.git
cd FTVSR
  2. Prepare the training dataset and modify "dataset_root" in configs/FTVSR_reds4.py and configs/FTVSR_vimeo90k.py
  3. Run training
# REDS
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 ./tools/dist_train.sh configs/FTVSR_reds4.py 8
# Vimeo
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 ./tools/dist_train.sh configs/FTVSR_vimeo90k.py 8

Related projects

We also sincerely recommend some other excellent works related to ours. ✨

Citation

If you find the code and pre-trained models useful for your research, please consider citing our paper. 😊

@InProceedings{qiu2022learning,
  author    = {Qiu, Zhongwei and Yang, Huan and Fu, Jianlong and Fu, Dongmei},
  title     = {Learning Spatiotemporal Frequency-Transformer for Compressed Video Super-Resolution},
  booktitle = {ECCV},
  year      = {2022},
}

Acknowledgment

This code is built on mmediting. We thank the authors of BasicVSR for sharing their code.


Issues

Where could I find the paper?

Nice work!
However, searching by the title, I could not find the corresponding paper on Google Scholar or arXiv. Where can I find the paper?

TPAMI

Hi, I want to know when the code for the TPAMI version will be released.

Evaluation results on the uncompressed dataset.

Hello, I tried to train your model on the general ×4 video super-resolution task, using Vimeo-90K with BD degradation as the training set and Vid4 with BD degradation as the validation set. The results do not seem very good and are worse than BasicVSR's. Is that expected?

New Super-Resolution Benchmarks

Hello,

MSU Graphics & Media Lab Video Group has recently launched two new Super-Resolution Benchmarks.

If you are interested in participating, you can add your algorithm by following the submission steps.

We would be grateful for your feedback on our work!

Out of memory during evaluation after training epochs

When reproducing the REDS results, I reduced num_input_frames to 20 to fit in GPU memory, so training is fine: about 8 GB is used on each 16 GB GPU.
But during evaluation, the GPU memory consumed by the training iterations does not seem to be released, so evaluation runs out of memory:
RuntimeError: CUDA out of memory. Tried to allocate 436.00 MiB (GPU 1; 15.90 GiB total capacity; 10.98 GiB already allocated; 189.81 MiB free; 14.78 GiB reserved in total by PyTorch)

The config file for the data is as below:
data = dict( workers_per_gpu=4, train_dataloader=dict(samples_per_gpu=1, drop_last=True), test_dataloader=dict(samples_per_gpu=1, workers_per_gpu=1),

BTW, a similar problem was reported in mmdet, but no feasible solution was found.

Question on inference test

Hello !
I tried to run inference to compare against other VSR networks, but I run out of memory where the other networks work.
What I do: in a notebook, I prepare 20 frames of 512x384 and feed them to the model.
This crashes with OOM on a GPU with 24 GB of CUDA memory, and I don't have a bigger GPU at home.
So does your model require more memory, or am I not testing it the way it should be done?
AFAIK, the more frames you feed at once, the better the result you can expect; is this true?

Errors when executing the REDS4 test code

KeyError: 'TTVSR is not in the model registry'.
We cannot find the TTVSR model in the released code, so we suspect the release is incomplete, which is why the test code fails. If possible, could the authors please review the released code? Thank you!

Fewer frames in the Vid4 testing set?

It seems that there are fewer frames in your Vid4 than in the original Vid4 testing set (each sequence is 4 frames shorter than the original).

How can I train and test on a single GPU?

I changed the distributed command
python -m torch.distributed.launch --nproc_per_node=$GPUS --master_port=$PORT
$(dirname "$0")/train.py $CONFIG --launcher pytorch ${@:3}
to
python ./tools/train.py ./configs/FTVSR_reds4.py
and training fails with: KeyError: 'TTVSR is not in the model registry'

What could be the reason?

Evaluation Problems?

For Vid4, if I do not misunderstand your testing code, it seems that you average the metrics over the 4x4=16 sub-sequences instead of over the 4 original sequences. This causes an inconsistency when comparing with other methods. You may need to stitch the 2x2 results back together to get the final outputs and then compute the metrics.

Welcome update to OpenMMLab 2.0


I am Vansin, the technical operator of OpenMMLab. In September of last year, we announced the release of OpenMMLab 2.0 at the World Artificial Intelligence Conference in Shanghai. We invite you to upgrade your algorithm library to OpenMMLab 2.0 using MMEngine, which can be used for both research and commercial purposes. If you have any questions, please feel free to join us on the OpenMMLab Discord at https://discord.gg/A9dCpjHPfE or add me on WeChat (ID: van-sin) and I will invite you to the OpenMMLab WeChat group.

Here are the OpenMMLab 2.0 repos branches:

Repository         OpenMMLab 1.0 branch   OpenMMLab 2.0 branch
MMEngine           -                      0.x
MMCV               1.x                    2.x
MMDetection        0.x, 1.x, 2.x          3.x
MMAction2          0.x                    1.x
MMClassification   0.x                    1.x
MMSegmentation     0.x                    1.x
MMDetection3D      0.x                    1.x
MMEditing          0.x                    1.x
MMPose             0.x                    1.x
MMDeploy           0.x                    1.x
MMTracking         0.x                    1.x
MMOCR              0.x                    1.x
MMRazor            0.x                    1.x
MMSelfSup          0.x                    1.x
MMRotate           0.x                    1.x
MMYOLO             -                      0.x

Attention: please create a new virtual environment for OpenMMLab 2.0.

Training time for 400k iterations

Nice work! I'm trying to reproduce the training on the REDS dataset with 4 GPUs (each with 16 GB of memory). The batch size is set to 1 as suggested in the config file. The total training time looks to be ~31 days. I'm wondering if this was the case for your training, or whether I'm doing something wrong.

Thanks a lot.

How to get the compressed video dataset?

The README says to 'Generate the compressed videos by ffmpeg with command "ffmpeg -i LR.mp4 -vcodec libx264 -crf CRFvalue LR_compressed.mp4"' to obtain the compressed video dataset, but REDS and Vimeo-90K are frame-sequence datasets, not *.mp4 files. How, specifically, do we get the compressed video dataset from these frame sequences?

What does the dataset name mean?

For example, the Vid4 sequence names are 'calendar_0_0', 'calendar_0_1', 'calendar_1_0', and 'calendar_1_1'. What does the "_0_0" in "./data/vid4/BD_start_1_crf_25/calendar_0_0" mean? Thank you!

FLOPs: 1.668T, Params: 43.33M?

I used the code provided in your GitHub repo. Thop shows the FLOPs and parameters are 1.668T and 43.331M, so which model has the 10.8M parameters reported in your paper?

RuntimeError: CUDA out of memory.

Hi,

I've run the single-GPU test on two different machines, one with an 8 GB GPU and another with a 24 GB A10. Both gave me an OOM error even with num_input_frames = 2. What am I missing?

The first one (an RTX card) said:

Tried to allocate 620.00 MiB (GPU 0; 7.80 GiB total capacity; 4.31 GiB already allocated; 412.31 MiB free; 6.33 GiB reserved in total by PyTorch)

while the other machine with the A10 said:

Tried to allocate 5.50 GiB (GPU 0; 22.02 GiB total capacity; 10.77 GiB already allocated; 4.78 GiB free; 15.63 GiB reserved in total by PyTorch)

I changed the REDS dataset to SRFolderMultipleGTDataset, and I have a 2-video subset of REDS4_val that looks like:

-REDS4_short/
|-- val_sharp/
|--|-- 000/
|--|--|-- %08d.png
|--|-- 001/
|--|--|-- %08d.png
|-- val_sharp_bicubic/
|--|-- X4/
|--|--|-- 000/
|--|--|--|-- %08d.png
|--|--|-- 001/
|--|--|--|-- %08d.png

And I've used the following command: tools/test.py <path/to/config> <path/to/redsModel> --crf 25 --startIdx 0 --test_frames 50

Let me know if you need any further info to help. Thanks!

About iterative testing

If I need to test a sequence of 1000 frames, processing 20 frames at a time, how should I modify the code? I found that after setting num_input_frames to 20, only 20 frames are tested, but I want to test all 1000 frames.
