ylqi / gl-rg

Stars: 20, Watchers: 2, Forks: 4, Size: 523.19 MB

Code for the IJCAI 2022 paper "GL-RG: Global-Local Representation Granularity for Video Captioning".

License: MIT License

Languages: Python 82.30%, Shell 17.70%

gl-rg's People

caidhome adeljalalyousif bangbangtangde dwhnicholas

gl-rg's Issues

The CIDEr metric file is precomputed and fixed. Isn't there a mismatch between the index of the metric scores and the data when a random ground-truth caption is selected? I also plan to run more experiments. Could you please post the refFile used for calculating CIDEr? Thanks.
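If the caption is drawn at random while its score is looked up with a different index, the two can indeed refer to different captions. A minimal sketch of keeping them aligned by drawing the index once and reusing it for both lookups; it assumes each video's captions and scores are stored as parallel lists in the same order, and the helper `sample_caption_with_score` is hypothetical, not part of the repository.

```python
import random


def sample_caption_with_score(captions, scores, rng):
    """Pick one ground-truth caption at random and return it with its score.

    Hypothetical helper: assumes scores[j] was computed for captions[j].
    The index is drawn once and reused, so caption and score cannot drift apart.
    """
    if len(captions) != len(scores):
        raise ValueError("captions and scores must be aligned one-to-one")
    j = rng.randrange(len(captions))
    return j, captions[j], scores[j]


rng = random.Random(0)
caps = ["a man is cooking", "someone cooks food", "a chef prepares a meal"]
cider = [0.9, 0.7, 0.8]
j, cap, s = sample_caption_with_score(caps, cider, rng)
print(j, cap, s)
```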

About the data files

The data folder is missing the training-related files. Could you provide the complete training data? Thank you very much!

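A small problem with the data preprocessing script ./preprocess.sh

Hello Professor Yan, sorry to bother you again. When running ./preprocess.sh as described in install.md, I get the problem shown in the image. The evaluation-metric code is actually there, but it cannot be found, so a ModuleNotFoundError is raised: in VS Code, code in a subdirectory cannot import code from the parent directory, and this shows up as ModuleNotFoundError.

(In PyCharm I can manually mark the cider and coco-caption directories as source roots, but I don't know how to make the equivalent setting in VS Code.) Sorry, this should be easy to solve, but I really couldn't find how to set it up.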
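An editor-independent workaround is to put the two directories on `sys.path` before importing from them, which is roughly what marking them as source roots does in PyCharm. A minimal sketch, assuming `cider` and `coco-caption` sit directly under the repository root; the helper `add_source_roots` and that layout are assumptions, so adjust the paths to the actual checkout. Setting the `PYTHONPATH` environment variable to the same directories in the shell that runs `./preprocess.sh` has the same effect without touching any code.

```python
import sys
import tempfile
from pathlib import Path


def add_source_roots(repo_root, subdirs=("cider", "coco-caption")):
    """Prepend repo_root/<subdir> to sys.path for each subdir that exists.

    Hypothetical helper: the default names mirror the issue text, but their
    location under repo_root is an assumption. Returns the paths added.
    """
    added = []
    for name in subdirs:
        path = (Path(repo_root) / name).resolve()
        if path.is_dir() and str(path) not in sys.path:
            sys.path.insert(0, str(path))
            added.append(str(path))
    return added


# Demonstration on a throwaway directory that only contains "cider".
demo_root = Path(tempfile.mkdtemp())
(demo_root / "cider").mkdir()
added = add_source_roots(demo_root)
print(added)
```

In a real script, the call would go at the top, before the `import` statements that currently fail.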

Hi! Why is the shape of msvd_train_evalscores.pkl [1200, 17]?

Thank you for your great work! Videos in the MSVD dataset have different numbers of captions, such as 29, 42, and so on. But I found that the shape of msvd_train_evalscores.pkl is [1200, 17]. Why are there scores for only 17 captions per video in the training set?
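The 17 columns are consistent with scores being computed for a fixed number of captions per video rather than for every caption: the compute_scores.py excerpt quoted in the issue "About the msvd_train_evalscores.pkl" loops over `range(args.seq_per_img)` and indexes `gt_refs[v][i]`. The sketch below only illustrates how such a fixed-width table arises from variable-length caption lists; that 17 equals `seq_per_img` and that the first K captions are kept are assumptions, and `fixed_width_scores` is a hypothetical name.

```python
def fixed_width_scores(captions_per_video, score_fn, k):
    """Build a len(videos) x k score table from variable-length caption lists.

    Hypothetical illustration: keeps only the first k captions of each video,
    so videos with 29 or 42 captions all contribute exactly k scores.
    Raises if a video has fewer than k captions.
    """
    table = []
    for vid, caps in captions_per_video.items():
        if len(caps) < k:
            raise ValueError(f"{vid} has only {len(caps)} captions (< {k})")
        table.append([score_fn(c) for c in caps[:k]])
    return table


demo = {
    "vid1": [f"caption {i}" for i in range(29)],
    "vid2": [f"caption {i}" for i in range(42)],
}
# len() stands in for a real metric here; only the table shape matters.
table = fixed_width_scores(demo, score_fn=len, k=17)
print(len(table), len(table[0]))
```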

Hi, I have a question: why is the number of test videos in 'msvd_test_sequencelabel.h5' 470?

As I recall, the standard split contains 1,200 training videos, 100 validation videos, and 670 test videos, so why were the other 200 test videos not used? Thank you!

About pretrained models

Hi,
I noticed you wrote "Our long-range encoder is pre-trained on the video-to-words dataset (k=300 words) extracted from MSR-VTT or MSVD" in your paper. I would like to know whether the whole datasets (train, validation, test) were used in your pretraining phase. If so, I think this would lead to serious information leakage. Could you please release your pretraining code? Thanks :)
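To make the leakage concern concrete: a video-to-words target set can be built from training captions alone, so that no validation or test caption influences which k words are chosen. A minimal sketch, assuming lowercase whitespace tokenization, an optional stopword list, and a frequency-ranked top-k vocabulary with multi-hot labels per video; `top_k_words` and `video_word_labels` are hypothetical names, and this is not the paper's pretraining code.

```python
from collections import Counter


def top_k_words(captions, k, stopwords=frozenset()):
    """Return the k most frequent words across the given captions.

    Assumption: lowercase whitespace tokenization. Counter.most_common keeps
    first-encountered order for words with equal counts.
    """
    counts = Counter(
        w for cap in captions for w in cap.lower().split() if w not in stopwords
    )
    return [w for w, _ in counts.most_common(k)]


def video_word_labels(video_captions, vocab):
    """Multi-hot targets: which vocabulary words occur in each video's captions."""
    index = {w: i for i, w in enumerate(vocab)}
    labels = {}
    for vid, caps in video_captions.items():
        row = [0] * len(vocab)
        for cap in caps:
            for w in cap.lower().split():
                if w in index:
                    row[index[w]] = 1
        labels[vid] = row
    return labels


train = {"v1": ["a man is cooking", "a man cooks"], "v2": ["a dog runs"]}
# The vocabulary comes from training captions only; any split can then be labeled with it.
vocab = top_k_words((c for caps in train.values() for c in caps), k=3, stopwords={"a", "is"})
labels = video_word_labels(train, vocab)
print(vocab, labels)
```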

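Could you release the feature-extraction code for the three encoders described in the paper?

Hello Professor Yan, sorry to bother you. I couldn't find the feature-extraction code for the three encoders described in the paper anywhere in the project. Could you release it? I would like to understand how feature extraction in the conventional way is implemented in code. Thanks.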
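For context on what conventional feature extraction usually involves, a common first step is to sample a fixed number of frames evenly across each clip before passing them to a pretrained CNN. The sketch below only computes those frame indices; the centre-of-segment rule and the helper name `uniform_frame_indices` are assumptions, not the settings used in the paper.

```python
def uniform_frame_indices(num_frames, num_samples):
    """Pick num_samples frame indices spread evenly over a clip.

    The clip is split into num_samples equal segments and the centre frame of
    each segment is taken (an assumed, commonly used rule). Indices repeat when
    the clip has fewer frames than num_samples.
    """
    if num_frames <= 0 or num_samples <= 0:
        raise ValueError("num_frames and num_samples must be positive")
    seg = num_frames / num_samples
    return [min(num_frames - 1, int(seg * i + seg / 2)) for i in range(num_samples)]


print(uniform_frame_indices(100, 4))
print(uniform_frame_indices(3, 6))
```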

About the msvd_train_evalscores.pkl

Hello, thank you for sharing your amazing work. I have some questions:
1- Can you briefly explain, in steps rather than as code, how to obtain the metric scores m(Ŝ) of all ground-truth captions?
2- In the file "GL-RG\data\preprocess\compute_scores.py":

    for i in range(args.seq_per_img):
        logger.info('taking caption: %d', i)
        preds_i = {v: [gt_refs[v][i]] for v in videos}

        # removing the refs at i
        if args.remove_in_ref:
            gt_refs_i = {v: gt_refs[v][:i] + gt_refs[v][i + 1:] for v in videos}
        else:
            gt_refs_i = gt_refs

        for scorer, method in scorers:
            score_i, scores_i = scorer.compute_score(gt_refs_i, preds_i)

Why are both gt_refs_i and preds_i taken from gt_refs? I think preds_i should be the predicted caption from the initial training with XE (cross-entropy) loss.
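The loop quoted above reads like a leave-one-out scheme: on pass i, caption i of every video is treated as the "prediction" and, when `remove_in_ref` is set, scored against that video's remaining references, so each ground-truth caption receives its own metric score without any model output being involved. This is a reading of the quoted code, not a statement from the authors. A toy sketch of the idea, with a simple unigram-overlap F1 standing in for the real CIDEr/BLEU scorers; `unigram_f1` and `leave_one_out_scores` are hypothetical names.

```python
def unigram_f1(candidate, references):
    """Toy stand-in for a caption metric: best unigram F1 against any reference."""
    cand = set(candidate.split())
    best = 0.0
    for ref in references:
        ref_words = set(ref.split())
        overlap = len(cand & ref_words)
        if overlap == 0:
            continue
        p, r = overlap / len(cand), overlap / len(ref_words)
        best = max(best, 2 * p * r / (p + r))
    return best


def leave_one_out_scores(gt_refs, seq_per_img, remove_in_ref=True):
    """Score caption i of each video against the other captions of that video.

    Mirrors the structure of the quoted loop: predictions come from gt_refs itself.
    Returns {video: [score for caption 0 .. seq_per_img - 1]}.
    """
    scores = {v: [] for v in gt_refs}
    for i in range(seq_per_img):
        for v, refs in gt_refs.items():
            pred = refs[i]
            others = refs[:i] + refs[i + 1:] if remove_in_ref else refs
            scores[v].append(unigram_f1(pred, others))
    return scores


gt = {"vid1": ["a man is cooking", "a man cooks food", "a dog runs"]}
print(leave_one_out_scores(gt, seq_per_img=3))
```

Without removing caption i from the references, every caption would match itself and score perfectly, which is why the `remove_in_ref` branch matters.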

Traceback (most recent call last):
  File "test.py", line 130, in <module>
    assert opt.feat_dims == test_loader.get_feat_dims()
AssertionError

I did not make any changes before running this.
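The bare `assert` only reports that the two values differ, so a first debugging step is to see both sides. A sketch of a more descriptive check: `opt.feat_dims` and `test_loader.get_feat_dims()` are the names from the traceback, but the assumption that both are sequences with one dimension per feature type, and the helper `check_feat_dims` itself, are hypothetical. One plausible cause of such a mismatch is that the feature files loaded at test time differ from the ones the options expect.

```python
def check_feat_dims(expected, actual):
    """Raise a descriptive error if configured and loaded feature dims differ.

    Hypothetical replacement for
    `assert opt.feat_dims == test_loader.get_feat_dims()`; assumes both are
    sequences with one dimension per feature type.
    """
    expected, actual = list(expected), list(actual)
    if expected == actual:
        return
    if len(expected) != len(actual):
        raise ValueError(
            f"feature count differs: config has {len(expected)}, data has {len(actual)}"
        )
    diffs = [(i, e, a) for i, (e, a) in enumerate(zip(expected, actual)) if e != a]
    details = ", ".join(f"feature {i}: config {e} vs data {a}" for i, e, a in diffs)
    raise ValueError(f"feature dimension mismatch ({details})")
```

In test.py this would be called as `check_feat_dims(opt.feat_dims, test_loader.get_feat_dims())` in place of the assertion.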

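Request for the training code

Hello, thank you very much for your work! Could you open-source the main training code? train.py only defines some functions and has no entry point or main function.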
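For readers wondering what such an entry point typically looks like, here is a generic skeleton of a `__main__` block that parses options and runs an epoch loop. Every name in it (`parse_opts`, `train_one_epoch`, `main`, and the `--epochs` and `--lr` flags) is hypothetical and does not correspond to functions defined in the repository's train.py.

```python
import argparse


def parse_opts(argv=None):
    """Hypothetical option parser; the flags are illustrative only."""
    parser = argparse.ArgumentParser(description="Training entry point (sketch)")
    parser.add_argument("--epochs", type=int, default=1)
    parser.add_argument("--lr", type=float, default=1e-4)
    return parser.parse_args(argv)


def train_one_epoch(epoch, opt):
    """Placeholder for a real epoch: returns a dummy loss so the loop runs."""
    return 1.0 / (epoch + 1)


def main(argv=None):
    opt = parse_opts(argv)
    losses = []
    for epoch in range(opt.epochs):
        loss = train_one_epoch(epoch, opt)
        losses.append(loss)
        print(f"epoch {epoch}: loss {loss:.4f}")
    return losses


if __name__ == "__main__":
    main()
```

Passing `argv` explicitly keeps `main` callable from tests or notebooks, while `argv=None` falls back to the command line.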