
vsfa's Introduction

Quality Assessment of In-the-Wild Videos


Description

VSFA code for the following papers:

Intra-Database Experiments (Training and Evaluating)

Feature extraction

CUDA_VISIBLE_DEVICES=0 python CNNfeatures.py --database=KoNViD-1k --frame_batch_size=64

You need to specify the database and set videos_dir to the directory that holds that database's videos.

Quality prediction

CUDA_VISIBLE_DEVICES=0 python VSFA.py --database=KoNViD-1k --exp_id=0

You need to specify the database and the exp_id (the index of the train/val/test split).

Visualization

tensorboard --logdir=logs --port=6006 # on the server (host:port)
ssh -p port -L 6006:localhost:6006 user@host # on your PC; then open localhost:6006 in your local browser to see the visualization

Reproduced results

We set seeds for the random number generators and re-ran the experiments on the same ten splits, i.e., the first 10 splits (exp_id=0~9). The results may still differ across PyTorch versions; see the notes on randomness in the PyTorch docs.
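Below is a minimal sketch of seeding the random number generators mentioned above. It assumes the usual trio of Python's `random`, NumPy and PyTorch. The exact calls in VSFA.py are not shown here, and `set_seed` is a hypothetical helper. Even with cuDNN forced into deterministic mode, results can still differ across PyTorch versions and hardware, as noted above.

```python
import random

import numpy as np
import torch


def set_seed(seed: int) -> None:
    """Seed Python, NumPy and PyTorch (CPU and all GPUs) for repeatable runs."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # silently ignored when CUDA is unavailable
    # Trade speed for determinism in cuDNN's convolution algorithm selection.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```

Call it once at the start of each run, before building datasets and models, so that the split shuffling and weight initialization are repeatable.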

The reproduced overall results are better than the results previously published in the paper. The updated code adds learning rate scheduling. You may find better hyper-parameters by inspecting the training loss curve and the validation curves.
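The text does not say which schedule was added. One common choice is step decay, which multiplies the learning rate by a fixed ratio every few epochs. This is what PyTorch's `torch.optim.lr_scheduler.StepLR` computes.

Below is a minimal pure-Python sketch of step decay. The `decay_interval` and `decay_ratio` parameters are hypothetical. A function like this also makes it easy to print the schedule next to the loss and validation curves.

```python
def step_decay_lr(base_lr: float, epoch: int, decay_interval: int,
                  decay_ratio: float) -> float:
    """Learning rate at `epoch` (0-based) under step decay."""
    if decay_interval < 1:
        raise ValueError("decay_interval must be >= 1")
    # The rate drops by a factor of decay_ratio once every decay_interval epochs.
    return base_lr * decay_ratio ** (epoch // decay_interval)


if __name__ == "__main__":
    for epoch in (0, 9, 10, 25):
        print(epoch, step_decay_lr(1e-3, epoch, decay_interval=10, decay_ratio=0.5))
```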

The mean (std) values over the first ten index splits (60%:20%:20% train:val:test):

Metric   KoNViD-1k          CVD2014             LIVE-Qualcomm
SROCC    0.7728 (0.0189)    0.8698 (0.0368)     0.7726 (0.0611)
KROCC    0.5784 (0.0194)    0.6950 (0.0465)     0.5871 (0.0620)
PLCC     0.7754 (0.0192)    0.8678 (0.0315)     0.7954 (0.0553)
RMSE     0.4205 (0.0211)    10.8572 (1.3518)    7.5495 (0.7017)
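The four criteria in the table can be computed per split and then aggregated as mean (std). Below is a minimal sketch using `scipy.stats`; the helper names are hypothetical. It rests on these assumptions:

- Predictions are already on each database's MOS scale. RMSE is scale-dependent, which is why its values differ so much across the three databases.
- PLCC is computed directly, without any nonlinear mapping.
- `np.std` is the population std (ddof=0). Whether the table uses that convention is not stated.

```python
import numpy as np
from scipy import stats


def vqa_criteria(y_pred, y_true):
    """Return SROCC, KROCC, PLCC and RMSE for one test split."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    srocc = stats.spearmanr(y_pred, y_true)[0]       # rank-order (monotonicity)
    krocc = stats.kendalltau(y_pred, y_true)[0]      # pairwise concordance
    plcc = stats.pearsonr(y_pred, y_true)[0]         # linear correlation
    rmse = np.sqrt(np.mean((y_pred - y_true) ** 2))  # in MOS units
    return {"SROCC": srocc, "KROCC": krocc, "PLCC": plcc, "RMSE": rmse}


def summarize(per_split):
    """Aggregate a list of per-split dicts into (mean, std) per criterion."""
    return {k: (float(np.mean([d[k] for d in per_split])),
                float(np.std([d[k] for d in per_split])))
            for k in per_split[0]}
```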

Test Demo

The model weights provided in models/VSFA.pt were saved when running the 9th split of KoNViD-1k.

python test_demo.py --video_path=test.mp4

Requirement

conda create -n reproducibleresearch pip python=3.6
source activate reproducibleresearch
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
source deactivate
  • PyTorch 1.1.0
  • TensorboardX 1.2, TensorFlow-TensorBoard

Note: The code also runs directly on PyTorch 1.3.

Contact

Dingquan Li, dingquanli AT pku DOT edu DOT cn.

vsfa's People

Contributors

dependabot[bot], lidq92


vsfa's Issues

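Video file reading issue

Loading large video files causes out-of-memory errors, either in host memory or in CUDA GPU memory. I hope the author can improve the I/O here so that frames are read in batches and the score is inferred batch by batch.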
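One way to do what this issue asks is to stream frames in fixed-size chunks instead of decoding the whole video at once. Peak memory then depends on the chunk size rather than on the video length.

Below is a minimal sketch of that approach. It assumes a lazy frame source, such as a decoder that yields one frame at a time. `iter_frame_batches` and `extract_chunked` are hypothetical names, and `extract` stands in for the per-batch CNN feature extractor.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def iter_frame_batches(frames: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size frames from a lazy source."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    it = iter(frames)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def extract_chunked(frames: Iterable[T], batch_size: int,
                    extract: Callable[[List[T]], List[R]]) -> List[R]:
    """Run `extract` chunk by chunk; only one chunk of frames is held at a time."""
    features: List[R] = []
    for batch in iter_frame_batches(frames, batch_size):
        features.extend(extract(batch))  # e.g. one feature vector per frame
    return features
```

The `--frame_batch_size` option in the feature-extraction command presumably already limits how many frames pass through the CNN at once. Streaming the decoding step as well would also bound host memory.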

Loss Function!

Hi, I found a new IQA paper that uses a novel loss function named 'Norm-in-Norm'. It's a great new work from you and your team. I wonder whether it has been evaluated for VQA?

[Question] On training the CNN

Hi, I want to train the CNN instead of using transfer learning with a frozen CNN.

So I merged the CNN part and the LSTM part into one class, and also modified the training data module.

However, the CNN output has dimensions [timesteps x features x batches x 1], while the GRU input is [batches x timesteps x features].

Because the CNN output and the GRU (batch_first=True) input differ in their dimensions, I used squeeze, unsqueeze and swapaxes. It worked, though I'm not 100% sure it is correct.

Here is my code.

    def forward(self, pixel, video_length):
        for avg_pooling, model in enumerate(self.batch):
            pixel = model(pixel)
            if avg_pooling == 1:
                #print("D0_1 :", pixel.shape)  # timesteps 576(FILTERS) 1 1
                avg_pooling_2d = nn.functional.adaptive_avg_pool2d(pixel, 1) # timesteps 576(FILTERS) 1 1
                #print("D0_2 :", avg_pooling_2d.shape)
                #avg_pooling_2d = torch.swapaxes(avg_pooling_2d, 2, 0)
                #avg_pooling_2d = torch.swapaxes(avg_pooling_2d, 1, 2)
                #avg_pooling_2d = torch.squeeze(avg_pooling_2d, dim=3)
                #avg_pooling_2d = SA(avg_pooling_2d)
                #print("D1 :", avg_pooling_2d.shape)   #1 timesteps FILTERS
                input = self.ReLU(avg_pooling_2d)
                #print("D2 :", input.shape) #1 24 256
                outputs, _ = self.rnn(input, torch.zeros(1, 1, 64, device=device))

I set the batch size to 1 for simplicity.
But my training and validation losses are too high. I suspect the swapaxes may be why training fails. Maybe backpropagation breaks because of swapaxes? I'm not sure.

Could you update your code so that it trains the CNN instead of just loading it?
(I would expect much better performance if that succeeds.)
I made it work, but I think my code has an error somewhere, even though it runs in Python.
(I ran out of CUDA memory, so I subsampled the timesteps by using every 8th frame. That resolved the CUDA out-of-memory error.)

Thanks.
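For the shape question above: a global-average-pooled CNN output of shape [timesteps, C, 1, 1] can be turned into the [batch, timesteps, features] layout that a GRU with `batch_first=True` expects. `flatten` and `unsqueeze` are enough for this. These are ordinary differentiable tensor ops, as are `swapaxes`/`transpose`, so reshaping by itself does not block backpropagation.

Below is a minimal sketch, assuming one video per batch. The function names are hypothetical. The channel count 576 follows the comment in the snippet, and the hidden size 64 follows its `torch.zeros(1, 1, 64)`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def cnn_to_gru_input(feature_maps: torch.Tensor) -> torch.Tensor:
    """Map a [timesteps, C, h, w] CNN output to a [1, timesteps, C] GRU input."""
    pooled = F.adaptive_avg_pool2d(feature_maps, 1)  # [timesteps, C, 1, 1]
    per_frame = pooled.flatten(start_dim=1)          # [timesteps, C]
    return per_frame.unsqueeze(0)                    # [1, timesteps, C]


def subsample_frames(frames: torch.Tensor, step: int = 8) -> torch.Tensor:
    """Keep every `step`-th frame along dim 0 to reduce activation memory."""
    return frames[::step]


if __name__ == "__main__":
    maps = torch.randn(24, 576, 7, 7, requires_grad=True)  # stand-in CNN output
    seq = F.relu(cnn_to_gru_input(maps))
    rnn = nn.GRU(input_size=576, hidden_size=64, batch_first=True)
    # h0 is [num_layers, batch, hidden] even when batch_first=True.
    h0 = torch.zeros(1, seq.size(0), 64)
    outputs, _ = rnn(seq, h0)
    outputs.mean().backward()  # gradients flow back through the reshapes
    print(tuple(seq.shape), tuple(outputs.shape), maps.grad is not None)
```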

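Model issues

Hello, I ran into several problems when reproducing your code:
1. When running your source code, I found that the model named VSFA-KoNViD-1k-EXP0, which train_model_file in the code expects, does not exist under the models directory.
2. When I replaced this variable with the existing VSFA.pt in models and trained with the default learning rate, SROCC, KROCC and PLCC all came out as nan and RMSE was 0.625. After lowering the learning rate and adding weight decay, the same thing still happened.
3. After training finishes, is the resulting model written directly over the original VSFA.pt?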

CVD2014 dataset

The CVD2014 dataset is not available on the website. Can you provide it?

Pretrained Weights

Thanks for your great work! Could you provide the pretrained weight using LIVE-Qualcomm dataset? Thanks very much!

hyper-parameters & motion-related features

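Hello, I noticed that in the paper the FC layers are set up as follows: a fully connected layer FC 4096->128, then ReLU + dropout, then a fully connected layer FC 32->32. Is there a rationale for this setting? Have you tried a similar operation when regressing the score after the GRU, to improve performance?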

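global std pooling?

Hi, expert,
about this operation:

import torch

def global_std_pool2d(x):
    """2D global standard deviation pooling"""
    return torch.std(x.view(x.size()[0], x.size()[1], -1, 1),
                     dim=2, keepdim=True)

Is there a reference for this std pooling? The paper doesn't seem to mention it.
Why are the last two dimensions (H, W) still kept?
Thanks.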
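The kept singleton dimensions make the std-pooled output [N, C, 1, 1]. That is the same shape as global average pooling, so the two can be concatenated along the channel axis.

Below is a minimal NumPy sketch of mean-and-std spatial pooling. It assumes ResNet-50 res5c maps with 2048 channels, so the concatenation presumably gives the 4096-d per-frame feature mentioned in the hyper-parameter question above. `global_mean_std_pool` is a hypothetical name. It uses `ddof=1` to mirror the unbiased default of `torch.std`.

```python
import numpy as np


def global_mean_std_pool(feature_maps: np.ndarray) -> np.ndarray:
    """Map [N, C, H, W] feature maps to [N, 2*C] per-frame features.

    Mean and std are taken over all H*W positions of each channel;
    ddof=1 matches torch.std's unbiased default.
    """
    n, c = feature_maps.shape[:2]
    flat = feature_maps.reshape(n, c, -1)    # [N, C, H*W]
    mean = flat.mean(axis=2)                 # [N, C]
    std = flat.std(axis=2, ddof=1)           # [N, C]
    return np.concatenate([mean, std], axis=1)
```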

CUDA out of memory when testing my own datasets

Hi Dr. Li, sorry for a third question. When I test on a self-collected dataset, test_demo.py fails with an error saying there is not enough memory:

RuntimeError: CUDA out of memory. Tried to allocate 676.00 MiB (GPU 0; 10.76 GiB total capacity; 1.04 GiB already allocated; 602.56 MiB free; 1.56 GiB reserved in total by PyTorch)

I guessed that my video resolution (1080x1920) caused this. However, even the provided test.mp4 gives the same error.

Did all the tests in your experiments run correctly on videos of any resolution?

Thank you!

Questions about 'data_info_maker.m' and the epochs

Thanks for the code. But when I reproduced the paper, I found a file-name mismatch for the 'KoNViD-1k' dataset. The video clips are named like '5319047612.mp4', whereas data_info_maker.m builds file names like '5319047612_cut_centercrop_960x540_8s.mp4'. This results in an error:

FileNotFoundError: [Errno 2] No such file or directory: './data/KoNViD-1k/KoNViD_1k_videos/8536919744 _original_centercrop_960x540_8s.mp4'

I don't know whether the code is wrong or the dataset I downloaded uses different file names.
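A possible workaround for this kind of mismatch is to resolve each expected name to the downloaded file by its leading numeric video ID. The video_names issue below reports the same problem.

Below is a minimal sketch, assuming the downloaded files are named `<id>.mp4` as described in this issue. `video_id` and `resolve_video_path` are hypothetical helpers, not part of the repository. This only helps if the downloaded clips are the same videos under a different name, which the sketch cannot check.

```python
import os
import re
from typing import Optional


def video_id(name: str) -> Optional[str]:
    """Return the leading digits of a file name, e.g. '5319047612_cut_...mp4'."""
    match = re.match(r"\d+", os.path.basename(name))
    return match.group(0) if match else None


def resolve_video_path(videos_dir: str, expected_name: str) -> str:
    """Return the path of expected_name, falling back to '<id>.mp4' in videos_dir."""
    direct = os.path.join(videos_dir, expected_name)
    if os.path.exists(direct):
        return direct
    vid = video_id(expected_name)
    if vid is not None:
        fallback = os.path.join(videos_dir, vid + ".mp4")
        if os.path.exists(fallback):
            return fallback
    raise FileNotFoundError(f"no file for {expected_name!r} in {videos_dir!r}")
```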

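Negative test scores

Hello, when I used your model to test some video files, some of the results were negative. I also noticed that you recently added extractor.eval() to the get_features() function. After I deleted this line, the test results contained no negative values, and the overall results were also better. Should extractor.eval() be kept or not?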

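Question about CORNIA

I tried to find open-source code for the CORNIA algorithm mentioned in the paper, but could not find any. Could you point me to the source of the V-CORNIA code? Thank you very much.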

Results in MSU Video Quality Metrics Benchmark

Hello! We recently evaluated this algorithm on the dataset of our video quality metrics benchmark. The dataset's distortions are compression artifacts on professional and user-generated content. In terms of SROCC, the method took 10th place on the global leaderboard and 4th place on the no-reference-only leaderboard. You can see more detailed results here. If you have any other video quality metric (either full-reference or no-reference) that you would like to see in our benchmark, we kindly invite you to participate. You can submit it to the benchmark by following the submission steps described here.

video_names

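Hello, the video names in the downloaded KoNViD-1k dataset differ from the loaded video_names (e.g. 8424428827_original_centercrop.mp4), so the files cannot be found. What is the reason for this?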

Missing CNN_features_KoNViD-1k/1_resnet-50_res5c.npy

By running the command:

  1. CUDA_VISIBLE_DEVICES=0 python VSFA.py --database=KoNViD-1k --exp_id=0

An error is thrown saying that the file or directory "CNN_features_KoNViD-1k/1_resnet-50_res5c.npy" is missing (FileNotFoundError).

Could you please add this file or provide a link to download it?
Thanks in advance.

Here is the terminal log:

    $ CUDA_VISIBLE_DEVICES=0 python VSFA.py --database=KoNViD-1k --exp_id=0
    EXP ID: 0
    KoNViD-1k
    VSFA
    Traceback (most recent call last):
      File "VSFA.py", line 161, in <module>
        train_dataset = VQADataset(features_dir, train_index, max_len, scale=scale)
      File "VSFA.py", line 30, in __init__
        features = np.load(features_dir + str(index[i]) + '_resnet-50_res5c.npy')
      File "/<SOME_PATH>/numpy/lib/npyio.py", line 428, in load
        fid = open(os_fspath(file), "rb")
    FileNotFoundError: [Errno 2] No such file or directory: 'CNN_features_KoNViD-1k/1_resnet-50_res5c.npy'
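The traceback shows VSFA.py loading one file per video, named `features_dir + str(index) + '_resnet-50_res5c.npy'`. These files are presumably the output of the feature-extraction step (`CNNfeatures.py`, listed under Feature extraction), so they must exist before training.

Below is a minimal pre-flight check, assuming that naming pattern. `missing_feature_files` is a hypothetical helper. You could call it with the train/val/test indices before constructing the dataset.

```python
import os
from typing import Iterable, List


def missing_feature_files(features_dir: str, indices: Iterable[int],
                          suffix: str = "_resnet-50_res5c.npy") -> List[str]:
    """List the expected per-video feature files that do not exist yet."""
    expected = (os.path.join(features_dir, f"{i}{suffix}") for i in indices)
    return [path for path in expected if not os.path.isfile(path)]
```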

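The question about GRU setting

I noticed that you use a GRU module to reduce the dimension from 128 to 32. What was the reasoning behind this GRU setting? Could the dimension reduction by the GRU lose some information?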
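This issue and the hyper-parameter issue above discuss several dimensions: 4096-d frame features, an FC layer down to 128, and a GRU with hidden size 32. They can be laid out as a small PyTorch module for shape checking.

Below is a minimal sketch of such a module; `VSFALikeHead` is a hypothetical name. The following choices are assumptions of this sketch, not taken from the repository:

- the ordering and probability of ReLU and dropout;
- the final 32->1 regression layer;
- a plain masked temporal mean at the end (the repository uses its own temporal pooling).

```python
import torch
import torch.nn as nn


class VSFALikeHead(nn.Module):
    """Shape sketch: [batch, timesteps, 4096] features -> [batch] scores."""

    def __init__(self, in_dim=4096, reduced_dim=128, hidden=32, dropout=0.5):
        super().__init__()
        self.reduce = nn.Sequential(            # per-frame dimension reduction
            nn.Linear(in_dim, reduced_dim), nn.ReLU(), nn.Dropout(dropout))
        self.rnn = nn.GRU(reduced_dim, hidden, batch_first=True)
        self.q = nn.Linear(hidden, 1)           # per-frame quality score

    def forward(self, x, lengths):
        h, _ = self.rnn(self.reduce(x))         # [batch, timesteps, hidden]
        frame_q = self.q(h).squeeze(-1)         # [batch, timesteps]
        # Average only over each video's valid (unpadded) frames.
        steps = torch.arange(x.size(1), device=x.device)
        mask = (steps[None, :] < lengths[:, None]).float()
        return (frame_q * mask).sum(1) / lengths.float()
```

Whether 128->32 loses useful information is an empirical question. A module like this makes it cheap to try other `hidden` values.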

mean and std Norm?

Hi, dear expert,
why are the extracted video frames normalized with fixed mean and std parameters?
The code is here:
transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
Is there a reference or rationale for this?
Thanks.
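These values are the per-channel RGB mean and std of ImageNet, commonly used with torchvision's ImageNet-pretrained models. Normalizing frames this way presumably matches the input statistics the pretrained ResNet-50 feature extractor was trained on.

Below is a minimal NumPy equivalent of what `transforms.Normalize` does to a [3, H, W] frame with values in [0, 1], the kind of tensor `transforms.ToTensor` produces. `normalize_frame` is a hypothetical name.

```python
import numpy as np

# Per-channel RGB statistics, shaped to broadcast over [3, H, W].
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406]).reshape(3, 1, 1)
IMAGENET_STD = np.array([0.229, 0.224, 0.225]).reshape(3, 1, 1)


def normalize_frame(frame_chw: np.ndarray) -> np.ndarray:
    """Per-channel (x - mean) / std for a [3, H, W] RGB frame scaled to [0, 1]."""
    return (frame_chw - IMAGENET_MEAN) / IMAGENET_STD
```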

Question about V-BLIINDS&NIQE

Hi Dr.Li:

I noticed that you used NIQE and V-BLIINDS (which uses NIQE as part of its features) as compared methods.

Did you test with the default setting (125 pristine images, a patch size of 96x96 and a sharpness threshold of 0.75), or did you retrain to obtain new NIQE parameters?

Thanks!

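Is there a problem with the data loading?

Is ref_ids = Info['ref_ids'][0, :] a bug? Info['ref_ids'] has shape (n, 1), so ref_ids = Info['ref_ids'][0, :] reads only a single number, and everything after that gets messed up.

I changed it to ref_ids = Info['ref_ids'][:, 0], but after training the results are still wrong. Do you know what is going on?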
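Whether `[0, :]` or `[:, 0]` is right depends on how the .mat file is read. MATLAB stores arrays column-major. When a MATLAB v7.3 (HDF5) file is opened with h5py, the dimensions appear reversed, so a MATLAB n-by-1 column shows up as (1, n). `scipy.io.loadmat` keeps it as (n, 1).

Below is a minimal sketch of an orientation-independent alternative, assuming the field is a vector either way. `as_vector` is a hypothetical helper.

```python
import numpy as np


def as_vector(field) -> np.ndarray:
    """Flatten an (n, 1) or (1, n) array loaded from a .mat file to shape (n,)."""
    arr = np.asarray(field)
    # Refuse genuine 2-D matrices instead of silently mixing their entries.
    if arr.ndim == 2 and 1 not in arr.shape:
        raise ValueError(f"expected a vector, got shape {arr.shape}")
    return arr.reshape(-1)
```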

VSFA's Performance comparison with VIIDEO

Hi Dr. Li,

I found that Table 1 includes a performance comparison with VIIDEO. But according to the code released by LIVE, the VIIDEO algorithm needs luminance as input.

Did you just convert the .mp4 file into .yuv and then feed it to VIIDEO? Would this affect the final result?

Thanks a lot!
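Suppose a clip is first converted to raw planar YUV 4:2:0 (e.g. yuv420p). Each frame then occupies width*height*3/2 bytes, with the full-resolution luminance (Y) plane stored first, so the luminance frames can be pulled out directly.

Below is a minimal sketch, assuming 8-bit yuv420p with even dimensions; `read_y_planes` is a hypothetical helper. How the paper actually prepared VIIDEO's input is not stated in this thread.

```python
import numpy as np


def read_y_planes(raw: bytes, width: int, height: int) -> np.ndarray:
    """Return the Y planes of 8-bit yuv420p data as [frames, height, width]."""
    y_size = width * height
    frame_size = y_size * 3 // 2  # Y plane plus quarter-size U and V planes
    if len(raw) % frame_size:
        raise ValueError("data length is not a whole number of frames")
    frames = np.frombuffer(raw, dtype=np.uint8).reshape(-1, frame_size)
    return frames[:, :y_size].reshape(-1, height, width)
```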

[Question] Overfitting problem

Hi, I'm trying to reduce the validation MAE, and I recently found that training on the KoNViD-1k dataset overfits. (I'm also trying to train the CNN layers.)

In my case, as the number of epochs increases, the training MAE drops to 0.01 (on a 0-1 scale) while the validation MAE stays at 0.1.

I think the small amount of training data (1200 videos in total) is one of the problems.

I'm considering data augmentation like this: https://github.com/okankop/vidaug

Because the KoNViD-1k dataset is labeled with MOS, I think vertical or horizontal flips may help reduce the validation MAE (but not adding noise).

Is this a good approach? I'm trying to add it to your code and will check whether it reduces the validation MAE. Thank you!
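A horizontal flip has to be applied identically to every frame of a clip so that motion stays consistent.

Below is a minimal NumPy sketch, assuming clips stored as [timesteps, channels, height, width] arrays. `random_hflip_clip` is a hypothetical helper, not from vidaug or this repository. The training script works on pre-extracted CNN features, so flipped clips would presumably need their own feature-extraction pass rather than being flipped on the fly.

```python
import numpy as np


def random_hflip_clip(clip: np.ndarray, rng: np.random.Generator,
                      p: float = 0.5) -> np.ndarray:
    """Mirror all frames of a [T, C, H, W] clip left-right with probability p."""
    if rng.random() < p:
        return clip[..., ::-1].copy()  # flip the width axis of every frame
    return clip
```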

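Cross-database testing

Hello, I noticed that the paper does not include cross-database training and testing. Has this algorithm been tested across databases?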

Performance on LIVE-VQC

Hi Dingquan, I am wondering do you have the results on LIVE-VQC [1]? I'd like to simply refer to your results, if available, such that I don't need to test by myself. Thanks!

[1] Z. Sinno and A. C. Bovik, “Large-scale study of perceptual video quality,” IEEE Trans. Image Process., vol. 28, no. 2, pp. 612–627, 2018

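Choice of the temporal memory effect parameters

Hello, I reviewed "TEMPORAL HYSTERESIS MODEL OF TIME VARYING SUBJECTIVE VIDEO QUALITY". The optimal parameters of the temporal memory model in that paper are tau=2s and beta=0.8, whereas this paper uses tau=12 and beta=0.5, which is inconsistent. Did you try the optimal parameters from the original paper? Or does the choice of these parameters depend strongly on the video database?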
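The temporal hysteresis idea discussed here has three parts:

- a memory element: the worst quality over the most recent tau frames;
- a current element: a softmin-weighted average over the next tau frames, which emphasizes low-quality frames;
- a mix of the two with weight beta, averaged over time.

Below is a minimal NumPy sketch following that description. It treats tau as a number of frames and simply truncates windows at the clip boundaries. It illustrates the idea and is not the repository's implementation. Here beta weights the current element; with beta=0.5 both elements count equally.

```python
import numpy as np


def temporal_hysteresis_pool(q, tau: int = 12, beta: float = 0.5) -> float:
    """Pool per-frame quality scores q into one video score.

    memory[t]  = min of q over frames t-tau+1 .. t
    current[t] = softmin-weighted mean of q over frames t .. t+tau-1
    pooled[t]  = beta * current[t] + (1 - beta) * memory[t]
    """
    q = np.asarray(q, dtype=float)
    if q.ndim != 1 or q.size == 0 or tau < 1:
        raise ValueError("need a non-empty 1-D score sequence and tau >= 1")
    pooled = np.empty_like(q)
    for t in range(q.size):
        memory = q[max(0, t - tau + 1): t + 1].min()
        future = q[t: t + tau]
        weights = np.exp(-(future - future.min()))  # softmin, shifted for stability
        current = np.dot(weights, future) / weights.sum()
        pooled[t] = beta * current + (1 - beta) * memory
    return float(pooled.mean())
```

Trying the parameters asked about in this issue only means changing `tau` and `beta`.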
