lidq92 / VSFA
[official] Quality Assessment of In-the-Wild Videos (ACM MM 2019)
Home Page: https://lidq92.github.io/VSFA/
License: MIT License
When running the command below, an error is thrown saying that the file or directory "CNN_features_KoNViD-1k/1_resnet-50_res5c.npy" is missing (FileNotFoundError).
Could you please add this file or provide a link to download it?
Thanks in advance.
Here is the terminal log:
$ CUDA_VISIBLE_DEVICES=0 python VSFA.py --database=KoNViD-1k --exp_id=0
EXP ID: 0
KoNViD-1k
VSFA
Traceback (most recent call last):
  File "VSFA.py", line 161, in <module>
    train_dataset = VQADataset(features_dir, train_index, max_len, scale=scale)
  File "VSFA.py", line 30, in __init__
    features = np.load(features_dir + str(index[i]) + '_resnet-50_res5c.npy')
  File "/<SOME_PATH>/numpy/lib/npyio.py", line 428, in load
    fid = open(os_fspath(file), "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'CNN_features_KoNViD-1k/1_resnet-50_res5c.npy'
Line 81 in f5af5ee
Hi, I noticed that the FC layers in the paper are set up as a fully connected layer FC 4096->128, followed by ReLU + dropout, and then a fully connected layer FC 32->32. What is the rationale for this setting? Have you tried a similar operation when regressing the score after the GRU to improve performance?
Hi, I'm trying to reduce the validation MAE. I recently discovered that the KoNViD-1k dataset has an overfitting problem. (I'm also trying to train the CNN layers as well...)
In my specific case, the train MAE drops to 0.01/1.0 but the validation MAE stays around 0.1/1.0 as the number of epochs increases.
I think the small amount of training data (1200 videos in total) is part of the problem.
I'm considering data augmentation like this: https://github.com/okankop/vidaug
Because the KoNViD-1k dataset has MOS labels, I think vertical or horizontal flips may be good for reducing the validation MAE (not adding noise).
Might this be a good approach? I'm trying to add it to your provided code and will check whether it reduces the validation MAE. Thank you!!
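A horizontal flip of this kind can be sketched directly on a frame tensor (a minimal illustration, not part of the repository; vidaug provides similar operators for lists of PIL frames). The MOS label is assumed unchanged by the flip:

```python
import torch

def horizontal_flip_video(frames):
    """Flip every frame of a video left-right.

    frames: [num_frames, C, H, W] tensor (illustrative layout assumption).
    Returns a tensor of the same shape with the width axis reversed.
    """
    return torch.flip(frames, dims=[-1])
```

A vertical flip would be `torch.flip(frames, dims=[-2])`; applying the same flip to every frame keeps the video temporally consistent.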
The CVD2014 dataset is not available on the website. Can you provide the dataset?
Hi, I revisited "Temporal Hysteresis Model of Time Varying Subjective Video Quality" and found that the original paper's optimal parameters for the temporal memory model are tau=2s and beta=0.8, while this paper uses tau=12 and beta=0.5, which is inconsistent. Have you tried the original paper's best parameters, or is the parameter choice strongly correlated with the choice of video database?
Hi, Dr. Li:
I see there is a performance comparison with VIIDEO in Table 1, but the VIIDEO algorithm needs luminance as input according to the code released by LIVE.
I want to know whether you simply converted the .mp4 files into .yuv and then fed them to VIIDEO. Would this disturb the final result?
Thanks a lot!
Hi Dr. Li. Sorry for a third question. When I try to test a self-collected dataset, test_demo.py gives an error saying that memory is not enough:
RuntimeError: CUDA out of memory. Tried to allocate 676.00 MiB (GPU 0; 10.76 GiB total capacity; 1.04 GiB already allocated; 602.56 MiB free; 1.56 GiB reserved in total by PyTorch)
I would guess my video resolution (1080x1920) caused this; however, even the provided test.mp4
gives the same error.
I want to know whether all the tests in your experiments ran correctly with videos of any resolution.
Thank you!
I noticed that you use a GRU module to reduce the dimensionality from 128 to 32. What was the thinking behind this GRU setting? Could the dimensionality reduction in the GRU cause some loss of information?
@lidq92 Could you share the script for data/KoNViD-1kinfo.mat? Thank you very much!!
Due to my limited ability, could you upload a MATLAB version? Thanks.
Hi, I want to train the CNN instead of using transfer learning and freezing it.
So I integrated the CNN part and the RNN part into one class and also modified the training data module.
However, the dimension of the CNN output was [timesteps x features x batches x 1],
while the input of the GRU was [batches x timesteps x features].
Because the CNN output and the GRU (batch_first=True) input differ in dimension order,
I used squeeze, unsqueeze and swapaxes, and it worked. (I'm not 100% sure.)
Here is my code:
def forward(self, pixel, video_length):
    for i, model in enumerate(self.batch):
        pixel = model(pixel)
        if i == 1:
            # CNN output: [timesteps, 576 (filters), H, W] -> [timesteps, 576, 1, 1]
            avg_pooling_2d = nn.functional.adaptive_avg_pool2d(pixel, 1)
            # Reshape to [batch, timesteps, features] as expected by the GRU
            # with batch_first=True (batch size is 1 here)
            avg_pooling_2d = avg_pooling_2d.squeeze(-1).squeeze(-1).unsqueeze(0)
            input = self.ReLU(avg_pooling_2d)  # [1, timesteps, 576]
            outputs, _ = self.rnn(input, torch.zeros(1, 1, 64, device=device))
I used a batch size of 1 to simplify things...
But my train and validation loss is too high, and I think the swapaxes may be a reason for the unsuccessful training (maybe loss backpropagation breaks because of swapaxes? I'm not sure).
Could you update your code so that it trains the CNN instead of just loading it?
I would expect much better performance if that succeeds. ^^;;;
I made it run, but I think my code has an error somewhere, although it executes in Python...
(I experienced CUDA out-of-memory errors, so I subsampled timesteps, e.g. using every 8th frame, which helped resolve them.)
Thanks..
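For what it's worth, a quick sanity check (a minimal sketch, not taken from the repository) suggests that swapaxes/permute by themselves are differentiable in PyTorch and should not block backpropagation:

```python
import torch

# swapaxes / permute only reorder a view of the tensor; gradients flow
# through them, so they should not break backpropagation by themselves.
x = torch.randn(4, 3, 2, requires_grad=True)   # e.g. [timesteps, features, batch]
y = x.permute(2, 0, 1)                         # -> [batch, timesteps, features]
y.sum().backward()                             # x.grad is populated, same shape as x
```

So a high loss is more likely caused elsewhere (learning rate, unfrozen-CNN capacity vs. dataset size) than by the axis reordering itself.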
Thanks for the code. But while reproducing the paper, I found that in the KoNViD-1k dataset the video file is '5319047612.mp4', whereas the code in data_info_maker.m builds file names like '5319047612_cut_centercrop_960x540_8s.mp4', which returns an error:
FileNotFoundError: [Errno 2] No such file or directory: './data/KoNViD-1k/KoNViD_1k_videos/8536919744 _original_centercrop_960x540_8s.mp4'
I don't know if the code is wrong or if the dataset I downloaded has different file names.
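If the downloaded videos use the flat ID names, one illustrative workaround (a hypothetical helper, not part of the repository) is to map the long names produced by data_info_maker.m back to the flat form before loading:

```python
def strip_suffix(name):
    """Map a long KoNViD-1k file name like
    '5319047612_cut_centercrop_960x540_8s.mp4' to the flat ID form
    '5319047612.mp4'. (Illustrative workaround, not the repository's code.)
    """
    return name.split('_', 1)[0] + '.mp4'
```

The reverse mismatch (dataset has long names, code expects short ones) would need the opposite mapping, so it is worth checking which naming convention your download actually uses.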
Hi,
why are the extracted video frames normalized with fixed mean and std parameters?
The code is here:
transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
Is there any reference or basis for this?
Many thanks.
Hello! In the Content-Aware Feature Extraction part, the pooling uses not only a global average pooling operation but also global standard deviation pooling. What is the significance of the global std pooling? Why not use global average pooling together with global max pooling; might that work better? Thanks for your answer.
Hello, the download links for these two datasets are no longer valid. Could you help by providing Baidu Cloud download links? Thank you.
I sent the request, but the returned address cannot be opened...
Hi,
regarding this operation:
def global_std_pool2d(x):
    """2D global standard deviation pooling"""
    return torch.std(x.view(x.size()[0], x.size()[1], -1, 1),
                     dim=2, keepdim=True)
Is there any reference for this std pooling? The paper doesn't seem to mention it.
Also, why are the last two dimensions (H and W) kept?
Many thanks.
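For context, a quick shape check (a sketch assuming PyTorch) shows what the function returns: the view flattens H×W into one axis, and taking std over that axis with keepdim=True yields an [N, C, 1, 1] tensor, whose trailing singleton dimensions match the [N, C, 1, 1] output of adaptive average pooling:

```python
import torch

def global_std_pool2d(x):
    """2D global standard deviation pooling (as in the snippet above)."""
    return torch.std(x.view(x.size()[0], x.size()[1], -1, 1),
                     dim=2, keepdim=True)

x = torch.randn(2, 512, 7, 7)   # [N, C, H, W] feature map
y = global_std_pool2d(x)        # [2, 512, 49, 1] -> std over dim 2 -> [2, 512, 1, 1]
```

Keeping the trailing singleton dimensions lets the std-pooled features be concatenated with the average-pooled ones along the channel axis without extra reshaping.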
Does the KoNViD-1k dataset have a designated validation set? If there is no fixed validation set, how can one tell during experiments whether the validation results are due to random data splits?
When loading large video files, there are out-of-memory or CUDA out-of-memory problems. I hope the author can improve the IO here to read frames and run inference in batches.
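The batching idea can be sketched as follows (a minimal illustration under assumed names, not the repository's actual API): run the frozen extractor over small chunks of frames and move each chunk's features back to the CPU so GPU memory stays bounded by the chunk size rather than the video length:

```python
import torch
import torch.nn as nn

def extract_features_in_chunks(frames, extractor, chunk_size=32, device="cpu"):
    """Run a frozen feature extractor over video frames in small batches.

    frames: [num_frames, C, H, W] tensor on the CPU.
    extractor: a module mapping [batch, C, H, W] -> [batch, feature_dim].
    (Names and shapes here are illustrative assumptions.)
    """
    outputs = []
    extractor.eval()
    with torch.no_grad():  # no graph is kept, further reducing memory use
        for start in range(0, frames.size(0), chunk_size):
            batch = frames[start:start + chunk_size].to(device)
            outputs.append(extractor(batch).cpu())  # free GPU memory per chunk
    return torch.cat(outputs, dim=0)
```

Lowering `chunk_size` (or downsampling high-resolution frames) trades speed for a smaller peak memory footprint.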
Hello, I ran into a few problems while reproducing your code:
1. When running the source code, the models directory does not contain the model named VSFA-KoNViD-1k-EXP0 required by train_model_file in the code.
2. When I replaced this variable with the existing models/VSFA.pt and trained with the default learning rate, SROCC, KROCC and PLCC all came out as NaN and RMSE was 0.625. After lowering the learning rate and training again with weight decay, the same thing happened.
3. Does the model produced at the end of training directly overwrite the original VSFA.pt?
Hi! I notice that there is a pre-trained model file 'VSFA.pt' in the directory 'models'.
Could you please tell me which database the pre-trained model was trained on?
Thank you so much.
Hi Dingquan, I am wondering whether you have the results on LIVE-VQC [1]. I'd like to simply refer to your results, if available, so that I don't need to run the tests myself. Thanks!
[1] Z. Sinno and A. C. Bovik, “Large-scale study of perceptual video quality,” IEEE Trans. Image Process., vol. 28, no. 2, pp. 612–627, 2018
Hi, I found a new paper aimed at IQA that uses a novel loss function named 'Norm-in-Norm'. It's a great new work by you and your team. I wonder whether it has been evaluated in the VQA case?
I tried to find open-source code for the CORNIA algorithm mentioned in the paper but could not. Could you provide the source of the open-source V-CORNIA implementation? Many thanks.
Thanks for your great work! Could you provide the pretrained weights for the LIVE-Qualcomm dataset? Thanks very much!
Hi Dr. Li:
I notice you used NIQE and V-BLIINDS (which needs NIQE as part of its features) as comparison methods.
I want to know whether you tested with the default setting (125 pristine images, patch size 96x96, sharpness threshold 0.75) or retrained to obtain new NIQE parameters.
Thanks!
Hello, why do the video names in the downloaded KoNViD-1k dataset not match the loaded video_names (e.g. 8424428827_original_centercrop.mp4), causing the files not to be found?
Hello, when I use your model to test some video files, some of the results are negative. Also, I see that you recently added the line extractor.eval() in the get_features() function; after deleting this line, there are no negative values in the test results and the overall results are better. Should extractor.eval() be kept or not?
Hi, I noticed that the paper does not cover cross-database training and testing. Has this algorithm been tested across databases?
Hello! We have recently launched and evaluated this algorithm on the dataset of our video quality metrics benchmark. The dataset's distortions cover compression artifacts on professional and user-generated content. The method took 10th place on the global leaderboard and 4th place on the no-reference-only leaderboard in terms of SROCC. You can see more detailed results here. If you have another video quality metric (either full-reference or no-reference) that you want to see in our benchmark, we kindly invite you to participate: you can submit it to the benchmark by following the submission steps described here.