zhangxuying1004 / rstnet

Official Code for 'RSTNet: Captioning with Adaptive Attention on Visual and Non-Visual Words' (CVPR 2021)

License: BSD 3-Clause "New" or "Revised" License

Python 100.00%

cvpr2021 image-captioning multimodal python pytorch transformer

rstnet's Introduction

Hi, I'm Xu-Ying 👋

👨‍🎓 I am currently a second-year Ph.D. student at Nankai University, supervised by Prof. Ming-Ming Cheng. Earlier, I obtained my M.S. degree under the supervision of Prof. Rongrong Ji and Prof. Xiaoshuai Sun at Xiamen University.
👁️ Recently, I focus on multi-modal learning, camouflaged scene understanding, and NeRF-based 3D vision.
📋 All my research works: Google scholar.
📫 How to reach me: [email protected].

rstnet's Issues

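Problems encountered when running the project

Hello! I'm glad to have come across your work. While debugging the project on my own machine I ran into a number of unexpected problems. I'm new to the CV+NLP field and there is a lot I don't yet understand, so I'd like to ask you a few questions.

The README says the code was run under Python 3.6. I am using Python 3.9 on Windows; could the version difference cause runtime errors?
In train_transformer, if shuffle is True for dataloader_train, I get ValueError: DataLoader with IterableDataset: expected unspecified shuffle option, but got shuffle=True
After changing shuffle to False, the following error appears ( File "E:\Anaconda\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
TypeError: 'generator' object is not callable). Could this be caused by the version or OS incompatibility mentioned above?
X-101_features uses .pth files for the 2014 and 2015 splits. Given that, would using my own dataset (for example, COCO 2017) be affected, and what would I need to change to use my own dataset?

I look forward to your reply. Thank you very much!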
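
One possible cause of the `'generator' object is not callable` error, offered as a guess and not a confirmed diagnosis: if the dataset class defines `__getattr__` as a generator function, every failed attribute lookup returns a generator instead of raising `AttributeError`. Code that probes for optional hooks, as pickling the dataset for worker processes on Windows may do, then tries to call that generator. The `FieldDataset` class below is a hypothetical stand-in written for illustration; it is not the repository's code.

```python
import types


class FieldDataset:
    """Hypothetical stand-in for a dataset class; not the repository's code."""

    def __init__(self, examples):
        self.examples = examples

    def __getattr__(self, attr):
        # The `yield` makes this a generator function, so ANY attribute that
        # normal lookup does not find comes back as a generator object
        # instead of raising AttributeError.
        for example in self.examples:
            yield example.get(attr)


ds = FieldDataset([{"caption": "a dog"}, {"caption": "a cat"}])

# No AttributeError is raised for a name that was never defined.
probe = ds.some_hook_that_does_not_exist
is_generator = isinstance(probe, types.GeneratorType)
looks_present = hasattr(ds, "some_hook_that_does_not_exist")

# A caller that expects a method will try to call the result.
try:
    probe()
    call_error = None
except TypeError as exc:
    call_error = str(exc)

print(is_generator, looks_present, call_error)
```

If this is what happens, two things are worth trying: making `__getattr__` raise `AttributeError` for unknown names, or setting `num_workers=0` on the DataLoader so that no worker processes are started and the dataset is never pickled.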

Why is the training loss negative during the reinforcement learning stage?
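
A minimal sketch of why a negative value is not necessarily a bug, assuming the RL stage uses a self-critical (SCST-style) objective in which each sampled caption's negative mean log-probability is weighted by its reward minus a baseline. The function name and the numbers are made up for illustration and are not taken from the repository.

```python
def scst_loss(mean_log_probs, rewards):
    """Policy-gradient loss with the mean reward as baseline (assumed form)."""
    baseline = sum(rewards) / len(rewards)
    # -log p is positive, but (reward - baseline) can have either sign,
    # so individual terms and their mean can be negative.
    terms = [-lp * (r - baseline) for lp, r in zip(mean_log_probs, rewards)]
    return sum(terms) / len(terms)


# Three captions sampled for one image (illustrative values only).
mean_log_probs = [-0.5, -1.0, -2.0]   # likelier captions have log-probs closer to 0
rewards = [1.2, 1.0, 0.5]             # e.g. CIDEr-like scores

loss = scst_loss(mean_log_probs, rewards)
print(loss)  # negative for these inputs
```

With this form of objective the sign of the loss carries little meaning by itself; the reward on the validation set is the more informative quantity to monitor.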

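About feature processing

Hello, I'm very glad to have come across your team's work and I'm very interested in it, but I ran into some problems when building the hdf5 features and would like to ask for your help:
After extracting X-101-features.tgz, I hit an unexpected error when running feats_process.py. The error screenshot is below; how can I fix it?
Looking forward to your reply, thanks!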

About cross_entropy result

Thank you for your work and open-source code. I used your code to train RSTNet and obtained the language model myself. However, at the end of the cross-entropy training stage I get a CIDEr score of about 103. Is this normal?
What CIDEr score should be expected at the end of the cross-entropy training stage? And how many epochs should be trained during reinforcement learning?

About how to get the caption of an input picture with my own pretrained model

Hello, thanks for sharing your code!
I wonder how to get an image's caption like the ones in Figure 1(b) or Figure 6 of your article.
Could you share the code with me?
Thanks a lot!

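Online test submission

I'm sorry to bother you again. After generating the two JSON files with your open-sourced test_online code and submitting them, I did not get a score; instead an error was reported:

Have you run into this problem? On Windows 11, I put the two JSON files into a result folder and then packed it into a zip file:

Is there something wrong with my file structure? Looking forward to your reply.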
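
In case the enclosing folder inside the archive is the problem, here is a sketch of building a zip whose JSON files sit at the archive root instead of under `result/`. Whether the evaluation server actually requires this layout is an assumption; the file names and the helper `zip_flat` are hypothetical placeholders.

```python
import json
import tempfile
import zipfile
from pathlib import Path


def zip_flat(json_paths, zip_path):
    """Write each file at the root of the archive, with no enclosing folder."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in json_paths:
            # arcname drops the directory part, so "result/x.json" becomes "x.json".
            zf.write(path, arcname=Path(path).name)
    return zip_path


# Demo with placeholder files; real submissions would use the generated JSONs.
workdir = Path(tempfile.mkdtemp())
result_dir = workdir / "result"
result_dir.mkdir()
paths = []
for name in ("captions_val_example_results.json",
             "captions_test_example_results.json"):
    p = result_dir / name
    p.write_text(json.dumps([{"image_id": 1, "caption": "a placeholder caption"}]))
    paths.append(p)

archive = zip_flat(paths, workdir / "results.zip")
with zipfile.ZipFile(archive) as zf:
    names = zf.namelist()
print(names)
```

Compressing the `result` folder from the Windows file explorer typically stores the entries as `result/...`, which is the difference this sketch removes.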

Some questions about custom datasets

Dear author @zhangxuying1004 ,
It is wonderful work. If my own dataset has no object detection annotations, how can I use your model?

language Pre-trained model

Hello, thank you for sharing your code and pretrained model. But when I use your language pretrained model 'language_context.pth', I received the following error. Could you please tell me how to solve it? Thank you very much!

No such file or directory: 'vocab_language/vocab_bert_language.pkl'

Hello author, thanks for sharing. What is the reason for this error?

RuntimeError: gather(): Expected dtype int64 for index

When I run train_transformer.py, I get RuntimeError: gather(): Expected dtype int64 for index. Can anyone tell me how to solve it?
Thank you very much.
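
This error generally means that the index tensor passed to `torch.gather` is not of integer (int64) type. One way this can happen, stated here as an assumption about this code base and not a confirmed diagnosis, is computing a beam index with `/`: in recent PyTorch versions `/` on integer tensors is true division and yields floats. The plain-Python sketch below shows the same distinction; `split_flat_index` is a hypothetical helper.

```python
def split_flat_index(flat_idx, vocab_size):
    """Split an index into a flattened (beam, word) score table."""
    beam = flat_idx // vocab_size   # floor division keeps an integer
    word = flat_idx % vocab_size
    return beam, word


vocab_size = 10
flat_idx = 27

beam, word = split_flat_index(flat_idx, vocab_size)
float_beam = flat_idx / vocab_size  # true division: a float, unusable as an index

print(beam, word, float_beam)

# PyTorch counterparts (not executed here):
#   torch.div(selected_idx, vocab_size, rounding_mode="floor")
#   or casting an existing index tensor with .long()
```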

SPICE score evaluation fails

Hello, when I evaluate with SPICE, calling the spice-1.0.jar package always raises a CalledProcessError, and I don't understand why.

rstnet.pth

The link to rstnet.pth is invalid.

Question about the position encoding

Hello author, thank you very much for sharing your source code with us. I have two questions I would like to ask you about the position encoding in your model.

Are the two parts of the image pointed by the red arrow equivalent?
Position encoding is mentioned only once in the picture, but in Section 3.2 of the paper there is another position encoding mentioned in Eq. (8).
How is position encoding performed?
In Figure 3 of the paper, does the position encoding occur between the BERT module and the Masked Multi-Head Attention module?

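Feature download

Hello, thank you very much for open-sourcing the paper's code. The X-101-features you provide are 326G, and my download keeps failing because of network interruptions. Do you have a Baidu Netdisk link for the features?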

about x152 grid features

Thanks for this amazing work.
Could you provide the x152 pretrained grid feature h5 file used in this paper?

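Batch size setting

Hello, in the code the batch size for cross-entropy training is set to 50, but during reinforcement learning the batch size is divided by 5. What is the reason for shrinking the batch size like this? Doing so slows down data loading. What is the basis for this choice? Looking forward to your reply.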
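
One plausible reading, stated as an assumption and not as the author's rationale: if the RL stage scores several beam-search captions per image (5 is assumed below, not taken from the code), dividing the image batch by the same factor keeps the number of sequences processed per step constant. The function name is hypothetical.

```python
def sequences_per_step(images_per_batch, captions_per_image):
    """Number of caption sequences the model handles in one optimisation step."""
    return images_per_batch * captions_per_image


# Cross-entropy stage: one ground-truth caption per image in the batch.
xe_sequences = sequences_per_step(50, 1)

# RL stage under the assumption of 5 sampled captions per image.
rl_sequences = sequences_per_step(50 // 5, 5)

print(xe_sequences, rl_sequences)
```

Under this assumption the memory footprint per step stays roughly comparable between the two stages, even though fewer images are loaded per batch.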

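About the ensemble model

Hello, I noticed the "TransformerEnsemble" class in your open-source code. How should I use it to train and test an ensemble with a given number of models (for example, an ensemble of 2 or an ensemble of 4)?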

ImportError: cannot import name 'TensorRecorder'

Hello, when I run train_transformer.py, the line "from middle import TensorRecorder" fails with ImportError: cannot import name 'TensorRecorder'. I have already installed the middle package, but I don't know where the error comes from.

About the X152_grid_feature file

Hello, could you share the X152 feature file?

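Testing procedure

I'm very glad to see your excellent work; it has helped me a lot. I have a question about the testing stage. During training, the pre-trained language model predicts the next word by fusing all preceding words, but at test time the model seems to predict from only the single word of the previous step. I'm not sure whether I've misunderstood. If this is indeed the case, would this testing method hurt the final performance?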

This is too many bugs, I gave up

Issues when trying to set up environment m2release

I use conda on Google Colab, not on my local machine.

When I was creating the environment, it could not find these packages:

I tried deleting these dependencies from environment.yml and creating the environment again. It installed successfully, but when I activated the m2release env, an error occurred:

I can't figure out what's wrong; I hope someone can help me solve it.

Many thanks.

question about warning

Hello author, I have two questions about your code.
1. In field.py, you import default_collate, but there is no default_collate in the dataloader.

2. Could you tell me how to deal with the UserWarning? Thank you.

vocab.pkl

Hello, thank you for sharing your code. I checked the size of vocab.pkl and found that it is 10199, not 10201. I am new to this field, so could you please tell me what the other two tokens are?

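About eval

Hello, it's an honor to read your paper and study your code. Since the feature file is over 300 GB, is it really required if we only want to run captioning validation or inference? Also, can the feature file be generated directly from the COCO 2014 dataset, to avoid the download?

Thank you very much.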

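About experimental results

Hello, your work is very good and I'm very interested in it. I ran the code following your instructions. Everything was fine before RL training, but after RL training the model's performance dropped sharply and the final CIDEr was 0.009. I don't know what went wrong; could you give me some advice?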

Only <'unk'> is generated, no other words

Hello, when I run test.py, the model only generates <'unk'> tokens and no other words. What could be the reason?

about original paper

Thank you for open-sourcing the code.
Could you provide the paper for this project?

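About the SPICE score

Thank you very much for open-sourcing the paper's code. However, the evaluation code does not include anything for computing the SPICE score. How did you obtain the SPICE score?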

About rstnet.pth

Hello, thank you very much for your open-source code!
The link to your pre-trained model is broken; could you please upload it again?

Unresolved reference 'transformers'

Hello, how should the BertModel module on line 4 of models/rstnet/language_model.py be imported? It reports that the transformers module cannot be found.

About Online Evaluation

Hello! Is the code for generating the online evaluation files included in the project, or do we need to write it ourselves?

About the Pre-trained Model

Hi, that's very interesting work that reuses the grid feature again.

I want to test and reuse the pre-trained model for some tasks.
However, the file linked in the README instructions is currently inaccessible.
Would you mind uploading the pre-trained model file again?

Thanks a lot~

visualness

Hello, I am interested in your notion of word visualness. How do you deal with tokens that have not occurred for a given image?

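About the X-101 feature file

Hello, thank you very much for open-sourcing your code.
I ran into serious trouble extracting the X-101-features you shared. The file first decompresses into a tar archive of more than 800 GB, from which the .pth files then have to be extracted; even 2 TB of space was not enough to extract it successfully. Do you have the processed features, and could you share them with me?
Also, do the language model features in the paper have to be trained by ourselves?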

Online feature extraction

Hello! How did you extract the features for the online test? When I use grid-feats-vqa for extraction, I get the following error:

[06/16 14:17:39 fvcore.common.checkpoint]: [Checkpointer] Loading from others/X-101.pth ...
Traceback (most recent call last):
  File "/home/bwh/anaconda3/envs/m2release/lib/python3.6/site-packages/detectron2/data/catalog.py", line 55, in get
    f = DatasetCatalog._REGISTERED[name]
KeyError: 'coco_2014_test'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/bwh/python/d_3_1/others/extract_region_feature.py", line 139, in <module>
    data_loader = build_detection_test_loader_with_attributes(cfg, dataset_name)
  File "/home/bwh/python/d_3_1/others/grid_feats/build_loader.py", line 88, in build_detection_test_loader_with_attributes
    else None,
  File "/home/bwh/anaconda3/envs/m2release/lib/python3.6/site-packages/detectron2/data/build.py", line 224, in get_detection_dataset_dicts
    dataset_dicts = [DatasetCatalog.get(dataset_name) for dataset_name in dataset_names]
  File "/home/bwh/anaconda3/envs/m2release/lib/python3.6/site-packages/detectron2/data/build.py", line 224, in <listcomp>
    dataset_dicts = [DatasetCatalog.get(dataset_name) for dataset_name in dataset_names]
  File "/home/bwh/anaconda3/envs/m2release/lib/python3.6/site-packages/detectron2/data/catalog.py", line 59, in get
    name, ", ".join(DatasetCatalog._REGISTERED.keys())
KeyError: "Dataset 'coco_2014_test' is not registered! Available datasets are: coco_2014_train, coco_2014_val, coco_2014_minival, coco_2014_minival_100, coco_2014_valminusminival, coco_2017_train, coco_2017_val, coco_2017_test, coco_2017_test-dev, coco_2017_val_100, keypoints_coco_2014_train, keypoints_coco_2014_val, keypoints_coco_2014_minival, keypoints_coco_2014_valminusminival, keypoints_coco_2014_minival_100, keypoints_coco_2017_train, keypoints_coco_2017_val, keypoints_coco_2017_val_100, coco_2017_train_panoptic_separated, coco_2017_train_panoptic_stuffonly, coco_2017_val_panoptic_separated, coco_2017_val_panoptic_stuffonly, coco_2017_val_100_panoptic_separated, coco_2017_val_100_panoptic_stuffonly, lvis_v0.5_train, lvis_v0.5_val, lvis_v0.5_val_rand_100, lvis_v0.5_test, lvis_v0.5_train_cocofied, lvis_v0.5_val_cocofied, cityscapes_fine_instance_seg_train, cityscapes_fine_sem_seg_train, cityscapes_fine_instance_seg_val, cityscapes_fine_sem_seg_val, cityscapes_fine_instance_seg_test, cityscapes_fine_sem_seg_test, voc_2007_trainval, voc_2007_train, voc_2007_val, voc_2007_test, voc_2012_trainval, voc_2012_train, voc_2012_val, visual_genome_train, visual_genome_val, visual_genome_test"

$ wget https://drive.google.com/file/d/1sayx7qwOd79XE4RFdvSXG3zyQH4FpyYJ/view?usp=sharing
--2022-03-23 18:09:05-- https://drive.google.com/file/d/1sayx7qwOd79XE4RFdvSXG3zyQH4FpyYJ/view?usp=sharing
Resolving drive.google.com (drive.google.com)... 31.13.95.37, 2001::68f4:2b68
Connecting to drive.google.com (drive.google.com)|31.13.95.37|:443... failed: Connection timed out.
Connecting to drive.google.com (drive.google.com)|2001::68f4:2b68|:443... failed: Network is unreachable.
My netdisk account is 15603379965.
