lidq92 / vsfa

[official] Quality Assessment of In-the-Wild Videos (ACM MM 2019)

Home Page: https://lidq92.github.io/VSFA/

License: MIT License

Python 90.99% MATLAB 9.01%

quality-assessment video-quality-assessment in-the-wild blind-video-quality-assessment pytorch

vsfa's Introduction

Quality Assessment of In-the-Wild Videos

Description

VSFA code for the following paper:

Dingquan Li, Tingting Jiang, and Ming Jiang. Quality Assessment of In-the-Wild Videos. In Proceedings of the 27th ACM International Conference on Multimedia (MM ’19), October 21-25, 2019, Nice, France. [arxiv version]

Intra-Database Experiments (Training and Evaluating)

Feature extraction

CUDA_VISIBLE_DEVICES=0 python CNNfeatures.py --database=KoNViD-1k --frame_batch_size=64

You need to specify the database and change the corresponding videos_dir.

Quality prediction

CUDA_VISIBLE_DEVICES=0 python VSFA.py --database=KoNViD-1k --exp_id=0

You need to specify the database and exp_id.

Visualization

tensorboard --logdir=logs --port=6006 # in the server (host:port)
ssh -p port -L 6006:localhost:6006 user@host # in your PC. See the visualization in your PC

Reproduced results

We set seeds for the random generators and re-ran the experiments on the same ten splits, i.e., the first 10 splits (exp_id=0~9). The results may still differ across PyTorch versions. See the notes on randomness in the PyTorch docs.

The reproduced overall results are better than the results published in the paper. We added learning rate scheduling in the updated code. Better hyper-parameters may be found by inspecting the training loss curve and the curves of the validation results.

The mean (std) values of the first ten index splits (60%:20%:20% train:val:test)

	KoNViD-1k	CVD2014	LIVE-Qualcomm
SROCC	0.7728 (0.0189)	0.8698 (0.0368)	0.7726 (0.0611)
KROCC	0.5784 (0.0194)	0.6950 (0.0465)	0.5871 (0.0620)
PLCC	0.7754 (0.0192)	0.8678 (0.0315)	0.7954 (0.0553)
RMSE	0.4205 (0.0211)	10.8572 (1.3518)	7.5495 (0.7017)
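
For readers unfamiliar with the four criteria in the table, the following is a dependency-free sketch of how they can be computed from predicted scores and MOS labels. It is an illustration, not the evaluation code of this repository: the function names are made up for this example, and `krocc` implements the tau-a variant, which assumes there are no tied scores. In practice `scipy.stats.spearmanr`, `scipy.stats.kendalltau` and `scipy.stats.pearsonr` are the usual choice. RMSE is expressed in the MOS units of each database, which is presumably why its magnitude differs so much across the three columns.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def plcc(x, y):
    """Pearson linear correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def srocc(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return plcc(average_ranks(x), average_ranks(y))


def krocc(x, y):
    """Kendall rank-order correlation (tau-a, assumes no ties)."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            score += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant
    return score / (n * (n - 1) / 2)


def rmse(x, y):
    """Root mean squared error, in the units of the labels."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Toy data, invented for this example only.
mos = [1.0, 2.0, 3.0, 4.0, 5.0]
pred = [1.2, 1.9, 3.5, 3.4, 4.8]
print(srocc(mos, pred), krocc(mos, pred), plcc(mos, pred), rmse(mos, pred))
```

On the toy data only one pair of videos (the third and fourth) is ranked in the wrong order, so both rank correlations stay high while RMSE still reflects the absolute errors.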

Test Demo

The model weights provided in models/VSFA.pt are the saved weights when running the 9-th split of KoNViD-1k.

python test_demo.py --video_path=test.mp4

Requirement

conda create -n reproducibleresearch pip python=3.6
source activate reproducibleresearch
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
source deactivate

PyTorch 1.1.0
TensorboardX 1.2, TensorFlow-TensorBoard

Note: The code can also be run directly on PyTorch 1.3.

Contact

Dingquan Li, dingquanli AT pku DOT edu DOT cn.

vsfa's Issues

Missing closing bracket )

VSFA/test_demo.py

Line 81 in f5af5ee

print('Time: {} s'.format(end-start)

[alternative links of the datasets] How to download KoNViD-1k and LIVE-Qualcomm?

Hello, the download links for these two datasets no longer work. Could you provide a Baidu Cloud download link? Thank you.

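Problem reading video files

Loading a large video file runs out of host memory or CUDA memory. I hope the author can improve the I/O here so that frames are read in batches and scored batch by batch.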
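
The batch-wise processing requested in this issue can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's code: `iter_frame_batches` and `extract_in_batches` are hypothetical helpers, and `extract` stands for any callable that maps a list of frames to one feature vector per frame. The default of 64 simply echoes the `--frame_batch_size=64` flag shown in the README.

```python
def iter_frame_batches(num_frames, batch_size=64):
    """Yield (start, end) index pairs that cover range(num_frames) in order."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, num_frames, batch_size):
        yield start, min(start + batch_size, num_frames)


def extract_in_batches(frames, extract, batch_size=64):
    """Apply `extract` to consecutive slices of `frames` and join the results.

    `extract` is a hypothetical callable: list of frames -> list of features,
    with exactly one feature per input frame.
    """
    features = []
    for start, end in iter_frame_batches(len(frames), batch_size):
        features.extend(extract(frames[start:end]))
    return features


# 150 frames with a batch size of 64 give two full batches and one partial one.
print(list(iter_frame_batches(150, 64)))
```

Chunking like this bounds the memory used by the CNN forward pass. It does not by itself limit the host memory needed to decode a whole video, which would additionally require reading frames lazily.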

Which database is the pre-trained model(VSFA.pt) based on ?

Hi! I notice that there is a pre-trained model file 'VSFA.pt' in the directory 'models'.
Could you please tell me which database is the pre-trained model trained on?
Thank you so much.

Loss Function!

Hi, I found a new IQA paper that uses a novel loss function named 'Norm in Norm'. It is great new work from you and your team. I wonder whether it has also been evaluated for VQA?

[Question] On training the CNN networks

Hi, I want to train the CNN instead of using transfer learning with a frozen CNN.

So I integrated the CNN part and the LSTM part into one class, and also modified the training data module.

However, the dimension of the CNN output was [timesteps x features x batches x 1]

and the input of the GRU was [batches x timesteps x features].

Because the CNN output and the GRU (batch_first=True) input differ in dimension,

I used squeeze, unsqueeze and swapaxes, and it worked. (I'm not 100% sure.)

Here is my code.

    def forward(self, pixel, video_length):
        for avg_pooling, model in enumerate(self.batch):
            pixel = model(pixel)
            if avg_pooling == 1:
                #print("D0_1 :", pixel.shape)  # timesteps 576(FILTERS) 1 1
                avg_pooling_2d = nn.functional.adaptive_avg_pool2d(pixel, 1) # timesteps 576(FILTERS) 1 1
                #print("D0_2 :", avg_pooling_2d.shape)
                #avg_pooling_2d = torch.swapaxes(avg_pooling_2d, 2, 0)
                #avg_pooling_2d = torch.swapaxes(avg_pooling_2d, 1, 2)
                #avg_pooling_2d = torch.squeeze(avg_pooling_2d, dim=3)
                #avg_pooling_2d = SA(avg_pooling_2d)
                #print("D1 :", avg_pooling_2d.shape)   #1 timesteps FILTERS
                input = self.ReLU(avg_pooling_2d)
                #print("D2 :", input.shape) #1 24 256
                outputs, _ = self.rnn(input, torch.zeros(1, 1, 64, device=device))

I set the batch size to 1 for simplicity.
But my training and validation losses are too high, and I think the swapaxes calls may be a reason for the unsuccessful training (perhaps backpropagation of the loss is broken by swapaxes? I'm not sure).

Could you update your code so that it trains the CNN instead of just loading it?
(I would expect much better performance if that succeeds.)
I wrote it myself, but I think my code has an error somewhere, even though it runs in Python.
(I ran into CUDA out-of-memory errors, so I subsampled the timesteps, e.g. using every 8th frame, which resolved them.)

Thanks..
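
In the snippet quoted in this issue the three reshaping lines are commented out, so the pooled tensor appears to reach the RNN with shape (timesteps, 576, 1, 1) rather than the (batch, timesteps, features) layout that a batch-first GRU expects. Below is a minimal NumPy sketch of the intended reshape for a batch size of 1. `to_gru_input` is a hypothetical helper written for this example; in PyTorch the same result could be obtained with `pooled.flatten(1).unsqueeze(0)`. Reshapes of this kind only relabel axes, so they should not by themselves disturb backpropagation.

```python
import numpy as np


def to_gru_input(pooled):
    """Reshape pooled CNN output (T, C, 1, 1) into a batch-first sequence (1, T, C)."""
    if pooled.ndim != 4 or pooled.shape[2:] != (1, 1):
        raise ValueError("expected shape (T, C, 1, 1), got %r" % (pooled.shape,))
    t, c = pooled.shape[:2]
    seq = pooled.reshape(t, c)    # drop the two singleton spatial axes
    return seq[np.newaxis, :, :]  # add the batch axis in front


# 24 timesteps and 576 channels, the sizes mentioned in the issue's comments.
pooled = np.arange(24 * 576, dtype=np.float32).reshape(24, 576, 1, 1)
gru_input = to_gru_input(pooled)
print(gru_input.shape)
```

Checking one element before and after the reshape, as in `gru_input[0, t, c] == pooled[t, c, 0, 0]`, is a cheap way to confirm that timesteps and features have not been swapped.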

Is it possible to share VBLIINDS features for live qualcomm and konvid-1k?

How to download LIVE-VQC? Is there a Baidu disk link? Thanks for your help.

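Model problems

Hello, I ran into several problems while reproducing your code:
1. When running your source code, I found that the model named VSFA-KoNViD-1k-EXP0, which train_model_file requires, does not exist under the models path.
2. When I changed this variable to use the existing VSFA.pt in models and trained with the default learning rate, SROCC, KROCC and PLCC were all nan and RMSE was 0.625. After lowering the learning rate and training again with weight decay, the same thing still happened.
3. After training finishes, is the original VSFA.pt updated directly?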

CVD2014 dataset

The CVD2014 dataset is not available on the website. Can you provide the dataset?

Pretrained Weights

Thanks for your great work! Could you provide the pretrained weight using LIVE-Qualcomm dataset? Thanks very much!

hyper-parameters & motion-related features

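Hello, I noticed that in the paper the FC layers are set up as a fully connected layer FC 4096->128, followed by ReLU + dropout, and then a fully connected layer FC 32->32. What is the basis for this setup? Have you tried a similar operation when regressing the score after the GRU to improve performance?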

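Question about the spatial global average pooling operation in the Content-Aware Feature Extraction part

Hello! The pooling in the Content-Aware Feature Extraction part uses not only global average pooling but also a second pooling operation, global standard deviation pooling. What is the purpose of the global standard deviation pooling? Why not use global average pooling together with global max pooling; would that work better? Thank you.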

global std pooling ?

Hi,
about this operation:

def global_std_pool2d(x):
    """2D global standard variation pooling"""
    return torch.std(x.view(x.size()[0], x.size()[1], -1, 1),
                     dim=2, keepdim=True)

Is there a reference for this std pooling? The paper does not seem to mention it.
Why are the last two dimensions (H, W) still kept?
Many thanks.
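
The shapes involved in the pooling function quoted in this issue can be checked with a small NumPy analogue. This is a sketch under stated assumptions rather than the repository's code: the feature map size is invented, `global_mean_pool2d` is a hypothetical companion function, and `ddof=1` is used because `torch.std` computes the unbiased estimate by default. One plausible reason for keeping the trailing (1, 1) axes is that the result then has the same (N, C, 1, 1) layout as the output of global average pooling, so the two can be concatenated along the channel axis.

```python
import numpy as np


def global_std_pool2d(x):
    """Per-channel standard deviation over H and W, kept as (N, C, 1, 1)."""
    n, c = x.shape[:2]
    flat = x.reshape(n, c, -1)
    # ddof=1 mirrors the unbiased default of torch.std.
    return flat.std(axis=2, ddof=1).reshape(n, c, 1, 1)


def global_mean_pool2d(x):
    """Per-channel mean over H and W, kept as (N, C, 1, 1)."""
    n, c = x.shape[:2]
    return x.reshape(n, c, -1).mean(axis=2).reshape(n, c, 1, 1)


# Hypothetical feature maps: 2 frames, 8 channels, 7x7 spatial grid.
features = np.random.default_rng(0).normal(size=(2, 8, 7, 7))
mean_feat = global_mean_pool2d(features)
std_feat = global_std_pool2d(features)

# Same layout, so the two statistics stack along the channel axis.
frame_feat = np.concatenate([mean_feat, std_feat], axis=1).reshape(2, -1)
print(mean_feat.shape, std_feat.shape, frame_feat.shape)
```

The concatenated vector has twice as many entries as there are channels, which would be consistent with the 4096-dimensional FC input mentioned in another issue if the backbone has 2048 output channels.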

CUDA out of memory when testing my own datasets

Hi Dr. Li. Sorry for asking a third question. When I try to test a self-collected dataset, test_demo.py reports that there is not enough memory:

RuntimeError: CUDA out of memory. Tried to allocate 676.00 MiB (GPU 0; 10.76 GiB total capacity; 1.04 GiB already allocated; 602.56 MiB free; 1.56 GiB reserved in total by PyTorch)

I guessed that my video resolution (1080*1920) caused this; however, even the provided test.mp4 gives the same error.

I would like to know whether all the tests in your experiments run correctly with videos of any resolution.

Thank you!

Questions about 'data_info_maker.m' and the epochs

Thanks for the code. When reproducing the paper, I found that the video clips in the 'KoNViD-1k' dataset are named like '5319047612.mp4', but the code in data_info_maker.m builds file names like '5319047612_cut_centercrop_960x540_8s.mp4', which leads to the following error:

FileNotFoundError: [Errno 2] No such file or directory: './data/KoNViD-1k/KoNViD_1k_videos/8536919744 _original_centercrop_960x540_8s.mp4'

I don't know if the code is wrong or the dataset I downloaded has a different file name

Does this method need a very large amount of memory?

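Test scores are negative

Hello, when I test some video files with your model, some of the results are negative. I also noticed that you recently added the line extractor.eval() to the get_features() function. After deleting this line, the test results contain no negative values and the overall results are also fairly good. Should extractor.eval() be there or not?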

Question about CORNIA

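I tried to find open-source code for the CORNIA algorithm mentioned in the paper but could not find any. Could you point me to an open-source implementation of V-CORNIA? Thank you very much.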

Results in MSU Video Quality Metrics Benchmark

Hello! We have recently launched and evaluated this algorithm on the dataset of our video quality metrics benchmark. The dataset distortions are compression artifacts on professional and user-generated content. The method took 10th place on the global leaderboard and 4th place on the no-reference-only leaderboard in terms of SROCC. You can see more detailed results here. If you have any other video quality metric (either full-reference or no-reference) that you want to see in our benchmark, we kindly invite you to participate. You can submit it to the benchmark by following the submission steps described here.

video_names

Hello, the names of the videos in the downloaded KoNViD-1k dataset do not match the loaded video_names (8424428827_original_centercrop.mp4), so the files cannot be found. What is the reason?

Missing CNN_features_KoNViD-1k/1_resnet-50_res5c.npy

By running the command:

CUDA_VISIBLE_DEVICES=0 python VSFA.py --database=KoNViD-1k --exp_id=0

An error is thrown saying that the file or directory "CNN_features_KoNViD-1k/1_resnet-50_res5c.npy" is missing (FileNotFoundError).

Can you please add this file or provide a link to download it.
Thanks in advance.

Here is the terminal log:
$ CUDA_VISIBLE_DEVICES=0 python VSFA.py --database=KoNViD-1k --exp_id=0
EXP ID: 0
KoNViD-1k
VSFA
Traceback (most recent call last):
  File "VSFA.py", line 161, in <module>
    train_dataset = VQADataset(features_dir, train_index, max_len, scale=scale)
  File "VSFA.py", line 30, in __init__
    features = np.load(features_dir + str(index[i]) + '_resnet-50_res5c.npy')
  File "/<SOME_PATH>/numpy/lib/npyio.py", line 428, in load
    fid = open(os_fspath(file), "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'CNN_features_KoNViD-1k/1_resnet-50_res5c.npy'

The question about GRU setting

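I noticed that you use the GRU module to reduce the dimension from 128 to 32. What was the thinking behind this GRU setting, and could the dimension reduction in the GRU cause some loss of information?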

mean and std Norm ?

Hi,
Why are the extracted video frames normalized with fixed mean and std parameters?
The code is here:
transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
Is there a reference or basis for this?
Many thanks.
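
For context, these constants are the per-channel RGB mean and standard deviation of ImageNet that are commonly used with torchvision's pretrained models. Since the feature file names elsewhere on this page mention ResNet-50, the frames are presumably normalized the same way the backbone's training images were. The sketch below shows the arithmetic on a single pixel; `normalize_pixel` is a hypothetical helper and assumes channel values already scaled to [0, 1].

```python
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Normalize one RGB pixel whose channels are already scaled to [0, 1]."""
    return tuple((value - m) / s for value, m, s in zip(rgb, mean, std))


# A pixel equal to the dataset mean maps to zero in every channel.
print(normalize_pixel(IMAGENET_MEAN))
# A pure white pixel maps to a positive value of roughly 2.2 to 2.6 per channel.
print(normalize_pixel((1.0, 1.0, 1.0)))
```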

How to download the datasets used in the paper?

I sent a request, but the address that was returned cannot be opened...

Do you have a MATLAB version?

Because of my limited ability, could you upload a MATLAB version? Thanks.

Question about V-BLIINDS&NIQE

Hi Dr.Li:

I notice you used NIQE and V-BLIINDS(needs NIQE as part of feature) as the compared methods.

I want to know if you tested with the default setting (125 pristine images with patch size set to 96X96 and sharpness threshold of 0.75) or you retrained and got new NIQE parameters?

Thanks!

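Is there a problem with the data loading?

ref_ids = Info['ref_ids'][0, :] Is there a problem here? Because info['ref_ids'] has shape (n, 1), ref_ids = Info['ref_ids'][0, :] reads only a single number, which messes up everything afterwards.

I changed it to ref_ids = Info['ref_ids'][:, 0], but after training the results were still wrong. Do you know what is going on?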
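
Whether `[0, :]` or `[:, 0]` is the right index depends on whether the array was loaded with shape (1, n) or (n, 1), and arrays read from MATLAB files can come out transposed depending on the loader. The sketch below uses two made-up arrays (not the real `Info['ref_ids']`) to show the difference; printing `.shape` before indexing, or flattening with `ravel()`, avoids having to guess.

```python
import numpy as np

# Hypothetical stand-ins for a loaded id array with n = 5 entries.
row_vector = np.arange(5).reshape(1, 5)     # shape (1, n)
column_vector = np.arange(5).reshape(5, 1)  # shape (n, 1)

from_row = row_vector[0, :]              # all n ids
from_column_wrong = column_vector[0, :]  # only the first id
from_column_right = column_vector[:, 0]  # all n ids

# ravel() gives the same 1-D result for either orientation.
flat = column_vector.ravel()

print(from_row.shape, from_column_wrong.shape, from_column_right.shape, flat.shape)
```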

VSFA's Performance comparison with VIIDEO

Hi, Dr.Li:

I see there is a performance comparison with VIIDEO in Table 1. But the VIIDEO algorithm needs luminance as input, according to the code released by LIVE.

I want to know whether you just converted the .mp4 files into .yuv and then fed them to VIIDEO. Would this disturb the final result in any way?

Thanks a lot!

[Question] Overfitting problem

Hi, I'm trying to improve the validation MAE. I recently discovered that training on the KoNViD-1k dataset overfits. (I'm also trying to train the CNN layers.)

In my case, as the number of epochs increases, the training MAE drops to 0.01/1.0 but the validation MAE stays at 0.1/1.0.

I think the small amount of training data (1200 in total) is one of the problems.

I'm considering data augmentation like this: https://github.com/okankop/vidaug

Because the KoNViD-1k dataset has MOS labels, I think vertical or horizontal flips may help reduce the validation MAE (but not adding noise).

Could this be a good approach? I'm trying to add it to your provided code and will check whether it reduces the validation MAE. Thank you!
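
The horizontal flip proposed in this issue can be sketched on a raw frame array as follows. This is only an illustration: `horizontal_flip` is a hypothetical helper, the (T, H, W, C) layout is an assumption, and whether the MOS label really stays valid for a mirrored video is also an assumption. Because the README pipeline extracts CNN features in a separate step before training, an augmentation like this would have to be applied to the frames before feature extraction.

```python
import numpy as np


def horizontal_flip(video):
    """Mirror every frame of a (T, H, W, C) video left to right."""
    return video[:, :, ::-1, :]


# Tiny made-up video: 2 frames of 3x4 pixels with 3 channels.
video = np.arange(2 * 3 * 4 * 3).reshape(2, 3, 4, 3)
flipped = horizontal_flip(video)
print(flipped.shape)
```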