tencentarc / mcq

Official code for "Bridging Video-text Retrieval with Multiple Choice Questions", CVPR 2022 (Oral).

Python 99.53% Shell 0.47%

mcq's Issues

How to finetune on MSRVTT

Hello, wonderful project! I wonder how to finetune the pre-trained models on downstream video-text retrieval datasets like MSR-VTT, LSMDC, and MSVD. I notice that a script for zero-shot retrieval has been provided, but there is no script for finetuning on retrieval datasets.

Hello, would it be possible to share the code for the CLIP-initialized model?

I mean the MCQ model code for the CLIP-initialized model, especially the part where BridgeFormer interacts with VideoFormer and TextFormer.

Why do you add three [MASK] tokens before the answer?

Test config of paper result

Hello, I want to know whether the result in the paper comes from sliding_window_stride=12 or the default of -1?

Is there a regression head for MVM in the MILES model?

I want to know if there is a regression head for MVM during the MILES pretraining phase.

Do you plan to release the code?

I have read your MCQ paper and really appreciate it. Do you have a plan to release the code?

How to extract noun phrases and verbs?

Thanks for your great work!
I want to extract noun phrases and verbs on my own dataset. Could you please tell me what tool you used to extract them?

Questions about action recognition

As mentioned in Table 4, there are 3 different test splits. How are the specific test sets selected, and how many are there? Also, for Table 5, what is the training data and what is the test data?

Inference code

Given a video, I want to do captioning or, as you suggest, question answering. Is that possible?

How did you do the visualization? Is there any code to test?

Hello, when will you release the training and testing code of MSVD & LSMDC & DiDeMo?

Thanks!

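Question about the MSVD experimental results in the paper

Table 2(a) of the paper cites Frozen's MSVD retrieval results: 33.7 for zero-shot and 45.6 for fine-tuning. Were these results reproduced, or taken from the original paper? In the original Frozen paper I only found a single R@1 of 33.7, which should be the fine-tuning result.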

Dataset

Could you share the dataset after the triplets have been removed?

Are there any scripts that I can use for extracting the noun phrases?

Hi, I want to know how to extract the phrases used in the paper. I saw the issue that mentioned extracting noun phrases, but it is not consistent with what is presented in the paper. For example, how do you extract "an old woman" rather than "woman"?

Are there any scripts that I can use for extracting the phrases?
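
The difference between a head noun ("woman") and a full noun phrase ("an old woman") can be illustrated with a small chunking sketch. This is not the tool the authors used, which is not stated here. It assumes the caption has already been tokenized and tagged with Universal POS tags by any tagger of your choice, and the function names `noun_phrases` and `verbs` are hypothetical.

```python
from typing import List, Tuple

# Universal POS tags that may precede the head noun in a simple noun phrase.
MODIFIER_TAGS = {"DET", "ADJ", "NUM"}
NOUN_TAGS = {"NOUN", "PROPN"}


def noun_phrases(tagged: List[Tuple[str, str]]) -> List[str]:
    """Group (token, POS) pairs into simple noun phrases: modifiers, then nouns."""
    phrases: List[str] = []
    current: List[str] = []
    has_noun = False
    for token, tag in tagged:
        if tag in NOUN_TAGS:
            current.append(token)
            has_noun = True
        elif tag in MODIFIER_TAGS and not has_noun:
            current.append(token)
        else:
            # The running phrase ends here; keep it only if it contains a noun.
            if has_noun:
                phrases.append(" ".join(current))
            # A modifier seen after a noun starts a new candidate phrase.
            current = [token] if tag in MODIFIER_TAGS else []
            has_noun = False
    if has_noun:
        phrases.append(" ".join(current))
    return phrases


def verbs(tagged: List[Tuple[str, str]]) -> List[str]:
    """Return the tokens tagged as main verbs."""
    return [token for token, tag in tagged if tag == "VERB"]


# Hand-tagged example caption (the tags are supplied manually for illustration).
caption = [
    ("an", "DET"), ("old", "ADJ"), ("woman", "NOUN"), ("is", "AUX"),
    ("shaking", "VERB"), ("hands", "NOUN"), ("with", "ADP"),
    ("a", "DET"), ("man", "NOUN"),
]
print(noun_phrases(caption))  # ['an old woman', 'hands', 'a man']
print(verbs(caption))         # ['shaking']
```

Keeping determiners and adjectives attached to the noun is what yields "an old woman" instead of "woman"; a dependency-based chunker from an NLP library would handle more complex phrases than this rule does.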

Why are there three [MASK] tokens in the noun/verb answer clause, not one or two?

Hi,
I'm wondering why you add three [MASK] tokens in the answers. I have seen your reply in #7, but I still don't understand why that number of [MASK] tokens was chosen and whether it matters.
Any reply would be helpful!
Thank you again for your good work.

Can't reproduce the reported MSRVTT (zero-shot) results with the released model weights

Hello, thank you for the code of MCQ! We used the released weights and followed the data settings to try to reproduce the MSRVTT zero-shot results, but our R@1 is about four points lower than the result reported in the paper. Is there anything we need to pay attention to? Thank you.
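
For readers comparing numbers, R@K in text-to-video retrieval is the fraction of text queries whose ground-truth video appears among the top K ranked videos. The sketch below is not the repository's evaluation code. It assumes a square similarity matrix in which row i is a text query and column i is its ground-truth video, and the function name `recall_at_k` is hypothetical.

```python
from typing import Sequence


def recall_at_k(sim: Sequence[Sequence[float]], k: int) -> float:
    """Fraction of text queries whose ground-truth video ranks in the top k.

    Assumes sim[i][j] is the similarity between text i and video j, and that
    video i is the ground-truth match for text i (an assumed layout).
    """
    hits = 0
    for i, row in enumerate(sim):
        # Rank of the true video = number of videos scored strictly higher.
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return hits / len(sim)


# Toy 3x3 similarity matrix: queries 0 and 2 rank their true video first,
# query 1 ranks its true video last.
sim = [
    [0.9, 0.1, 0.2],
    [0.3, 0.2, 0.8],
    [0.1, 0.4, 0.7],
]
print(recall_at_k(sim, 1))  # two of three queries hit at rank 1
print(recall_at_k(sim, 3))  # every query hits within the top 3
```

Ties are resolved in favor of the ground-truth video in this sketch; an evaluation script may handle them differently.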

Training Script for CLIP

Hi, is there any script or config that uses CLIP as the initialization?