CrossmodalGroup / GSMN
Implementation of our CVPR2020 paper, Graph Structured Network for Image-Text Matching
Hi,
Thanks for sharing your code. Could you please provide more details about how to reproduce the paper's results? I simply ran train.py and test.py on Flickr30K and could not obtain the reported performance.
Thanks a lot!
Hi,
I have a small question about your data loader that came up while running training. When I debug it, the code below updates self.bbox in place at every epoch, so the boxes shrink toward zero after a few epochs and stop influencing the model. Am I misunderstanding something here? Many thanks.
bboxes = self.bbox[img_id]
...
for i in range(k):
    bbox = bboxes[i]
    bbox[0] /= imsize['image_w']
    bbox[1] /= imsize['image_h']
    bbox[2] /= imsize['image_w']
    bbox[3] /= imsize['image_h']
    bboxes[i] = bbox
captions = torch.Tensor(caps)
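A minimal sketch of one possible fix, reusing the names from the snippet above (self.bbox, img_id, imsize, k): normalize a copy of the boxes so the cached self.bbox entries keep their original pixel coordinates across epochs. This is only a suggestion, not the authors' confirmed solution.

import numpy as np

# Hypothetical fix: work on a copy so self.bbox is never divided in place.
bboxes = np.array(self.bbox[img_id], dtype=np.float32, copy=True)
for i in range(k):
    bboxes[i, 0] /= imsize['image_w']
    bboxes[i, 1] /= imsize['image_h']
    bboxes[i, 2] /= imsize['image_w']
    bboxes[i, 3] /= imsize['image_h']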
What a wonderful job you have done! I would like to know when you will update your code on GitHub. Thanks very much!
Dear contributor,
Many thanks for your time and attention.
When I run the code, I find that the loss does not change until epoch 6, whereas normally the loss drops quickly at the beginning. Are there any settings, or other reasons, behind this behavior?
2020-09-15 21:05:17,981 Epoch: [5][1069/2266] Eit 12400 lr 0.0002 Le 25.600017547607422 Time 0.900 (0.000) Data 0.039 (0.000)
2020-09-15 21:06:52,140 Epoch: [5][1169/2266] Eit 12500 lr 0.0002 Le 25.60001564025879 Time 0.898 (0.000) Data 0.038 (0.000)
2020-09-15 21:08:22,427 Epoch: [5][1269/2266] Eit 12600 lr 0.0002 Le 25.600017547607422 Time 0.895 (0.000) Data 0.038 (0.000)
2020-09-15 21:09:56,861 Epoch: [5][1369/2266] Eit 12700 lr 0.0002 Le 25.600017547607422 Time 1.087 (0.000) Data 0.039 (0.000)
2020-09-15 21:11:45,949 Epoch: [5][1469/2266] Eit 12800 lr 0.0002 Le 25.600017547607422 Time 1.071 (0.000) Data 0.039 (0.000)
2020-09-15 21:13:34,645 Epoch: [5][1569/2266] Eit 12900 lr 0.0002 Le 25.600013732910156 Time 1.076 (0.000) Data 0.038 (0.000)
2020-09-15 21:15:23,490 Epoch: [5][1669/2266] Eit 13000 lr 0.0002 Le 25.60000991821289 Time 1.143 (0.000) Data 0.101 (0.000)
2020-09-15 21:16:54,946 Epoch: [5][1769/2266] Eit 13100 lr 0.0002 Le 25.600013732910156 Time 0.894 (0.000) Data 0.039 (0.000)
2020-09-15 21:18:24,783 Epoch: [5][1869/2266] Eit 13200 lr 0.0002 Le 25.600013732910156 Time 0.884 (0.000) Data 0.037 (0.000)
2020-09-15 21:19:54,308 Epoch: [5][1969/2266] Eit 13300 lr 0.0002 Le 25.600011825561523 Time 0.891 (0.000) Data 0.039 (0.000)
2020-09-15 21:21:27,408 Epoch: [5][2069/2266] Eit 13400 lr 0.0002 Le 25.60000991821289 Time 0.980 (0.000) Data 0.045 (0.000)
2020-09-15 21:22:58,489 Epoch: [5][2169/2266] Eit 13500 lr 0.0002 Le 25.600013732910156 Time 0.885 (0.000) Data 0.039 (0.000)
2020-09-15 21:24:27,616 Test: [0/79] Time 3.476 (0.000)
shard_xattn batch (15,78)
calculate similarity time: 432.85513067245483
2020-09-15 21:31:49,165 Image to text: 11.9, 31.0, 42.9, 16.0, 65.0
2020-09-15 21:31:49,413 Text to image: 5.7, 17.3, 26.1, 44.0, 103.4
runs/log
test
2020-09-15 21:31:53,452 Epoch: [6][3/2266] Eit 13600 lr 0.0002 Le 25.600006103515625 Time 0.962 (0.000) Data 0.106 (0.000)
2020-09-15 21:33:23,890 Epoch: [6][103/2266] Eit 13700 lr 0.0002 Le 25.600017547607422 Time 0.895 (0.000) Data 0.038 (0.000)
2020-09-15 21:34:53,814 Epoch: [6][203/2266] Eit 13800 lr 0.0002 Le 25.6002197265625 Time 0.894 (0.000) Data 0.038 (0.000)
2020-09-15 21:36:27,509 Epoch: [6][303/2266] Eit 13900 lr 0.0002 Le 25.59981346130371 Time 1.075 (0.000) Data 0.039 (0.000)
2020-09-15 21:38:15,702 Epoch: [6][403/2266] Eit 14000 lr 0.0002 Le 25.51906967163086 Time 1.076 (0.000) Data 0.039 (0.000)
2020-09-15 21:39:46,590 Epoch: [6][503/2266] Eit 14100 lr 0.0002 Le 23.482261657714844 Time 0.907 (0.000) Data 0.047 (0.000)
2020-09-15 21:41:17,698 Epoch: [6][603/2266] Eit 14200 lr 0.0002 Le 21.11315155029297 Time 0.883 (0.000) Data 0.038 (0.000)
2020-09-15 21:42:47,759 Epoch: [6][703/2266] Eit 14300 lr 0.0002 Le 19.28626251220703 Time 0.904 (0.000) Data 0.039 (0.000)
2020-09-15 21:44:18,304 Epoch: [6][803/2266] Eit 14400 lr 0.0002 Le 18.601877212524414 Time 0.903 (0.000) Data 0.040 (0.000)
2020-09-15 21:45:48,979 Epoch: [6][903/2266] Eit 14500 lr 0.0002 Le 18.63102149963379 Time 0.906 (0.000) Data 0.039 (0.000)
2020-09-15 21:47:19,877 Epoch: [6][1003/2266] Eit 14600 lr 0.0002 Le 16.385482788085938 Time 0.903 (0.000) Data 0.048 (0.000)
2020-09-15 21:48:54,785 Epoch: [6][1103/2266] Eit 14700 lr 0.0002 Le 12.813457489013672 Time 0.977 (0.000) Data 0.044 (0.000)
2020-09-15 21:50:33,562 Epoch: [6][1203/2266] Eit 14800 lr 0.0002 Le 16.835599899291992 Time 0.978 (0.000) Data 0.045 (0.000)
2020-09-15 21:52:05,988 Epoch: [6][1303/2266] Eit 14900 lr 0.0002 Le 14.264854431152344 Time 0.897 (0.000) Data 0.039 (0.000)
2020-09-15 21:53:36,136 Epoch: [6][1403/2266] Eit 15000 lr 0.0002 Le 14.423147201538086
Many Thanks!
Could the author tell me how to fine-tune the BiGRU part of the code from the paper, and what the specific settings are?
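A generic way to fine-tune a text encoder is to give it its own parameter group and learning rate in the optimizer. The sketch below is only an illustration under the assumption that the model is an nn.Module exposing the BiGRU as a txt_enc submodule; the attribute name and learning rates are assumptions, not settings confirmed by the authors.

import torch

# Hypothetical sketch: smaller learning rate for the (assumed) BiGRU text encoder.
# model is assumed to be an nn.Module holding the txt_enc submodule.
gru_ids = {id(p) for p in model.txt_enc.parameters()}
other_params = [p for p in model.parameters() if id(p) not in gru_ids]
optimizer = torch.optim.Adam([
    {'params': other_params, 'lr': 2e-4},
    {'params': model.txt_enc.parameters(), 'lr': 2e-5},  # lower lr for the BiGRU
])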
Hello, I have some questions I'd like to ask you.
The validation set has 5070 images and 5070 texts, which are processed into 5000 image-sentence pairs. Why are the images then reduced to 1000?
Does that mean the image at index 0 corresponds to the texts at indices 0-4, so the images at indices 1-4 have no relation to the texts at indices 1-4?
Thank you.
Hello, is there a video about the paper? There is a part I haven't fully understood.
I'm also not clear on the dataset download part. Do I need to re-run SCAN to produce the required dataset files, or is there a ready-to-use dataset file available? Looking forward to your answer.
Excuse me, and thank you for your great work. As the README says, the text features, image bounding boxes, and semantic dependencies are precomputed. I want to try your idea on another dataset; could you share that part of the code with me? Thank you very much.
No such file or directory: "/media/ubuntu/data/chunxiao/vocab/f30k_precomp_vocab.json"
I am interested in the paper "Graph Structured Network for Image-Text Matching" and have also been working on this task recently. Your code at https://github.com/CrossmodalGroup/GSMN is very professional! I am particularly interested in the visualization part of the work; I think it is very interesting and impressive, but I don't know how to draw it in a professional way. Could you please release the visualization code?
I would appreciate it very much!
Thanks for your work. Referring to the accuracy of the pre-trained models you provided on Flickr30K: GSMN-dense rsum 481.4 and GSMN-sparse rsum 476.8, whereas in the paper GSMN-dense rsum is 483.6 and GSMN-sparse rsum is 480.1. Were two models used (ensembled) during evaluation to obtain the dense and sparse results, as in SCAN?
Hello, I downloaded your code and the data mentioned in the README and in #17, but the best Recall after training is never higher than 1. The program arguments are identical to those in the README; could you tell me what might be wrong?
Below is my directory structure:
./GSMN/
├── coco_dense.log
├── data.py
├── dependency_parser.py
├── evaluation.py
├── f30k_sparse.log
├── graph_model.py
├── layers.py
├── model.py
├── README.md
├── testall.py
├── test.py
├── test_stack.py
├── train.py
├── vocab.py
├── data
│   └── f30k_precomp
│       ├── dev_caps.json
│       ├── dev_caps.txt
│       ├── dev_ids.txt
│       ├── dev_ims_bbx.npy
│       ├── dev_ims.npy
│       ├── dev_ims_size.npy
│       ├── dev_precaps_stan.txt
│       ├── dev_tags.txt
│       ├── test_caps.json
│       ├── test_caps.txt
│       ├── test_ids.txt
│       ├── test_ims_bbx.npy
│       ├── test_ims.npy
│       ├── test_ims_size.npy
│       ├── test_precaps_stan.txt
│       ├── test_tags.txt
│       ├── train_caps.json
│       ├── train_caps.txt
│       ├── train_ids.txt
│       ├── train_ims_bbx.npy
│       ├── train_ims.npy
│       ├── train_ims_size.npy
│       ├── train_precaps_stan.txt
│       └── train_tags.txt
└── vocab
    └── f30k_precomp_vocab.json
Below are the arguments I used to train the sparse model on the f30k dataset:
python train.py --data_path ./data/ --data_name f30k_precomp --vocab_path ./vocab/ --logger_name ./runs/run_f30k_sparse/log --model_name ./runs/run_f30k_sparse/checkpoint --bi_gru --max_violation --lambda_softmax=20 --num_epochs=30 --lr_update=15 --learning_rate=.0002 --embed_size=1024 --batch_size=64 --is_sparse
Below are the evaluation results after 30 epochs of training:
Image to text: 0.1, 0.2, 0.3, 2230.0, 2225.5
Text to image: 0.1, 0.5, 0.9, 500.0, 500.4
Evaluating with the provided pre-trained model gives results close to those in the paper, but the model I trained myself performs very poorly. At first I suspected a data problem, so I downloaded the two files from SCAN, data (wget https://iudata.blob.core.windows.net/scan/data.zip) and vocab (wget https://iudata.blob.core.windows.net/scan/vocab.zip), and replaced the corresponding parts of the files you provided, but the results did not change. So I don't know where the problem is.
Hi,
First of all, I appreciate the paper you wrote; the content is very clear. However, with the code you released and the parameters you provided, it is very difficult to reproduce the results in the paper, and our results are far from them. Would it be possible to release the pretrained model for us to use? I really want to build something new on top of your work. I hope to get your reply.
I want to know where to download "train_precaps_stan.txt". I don't think there is a train_precaps_stan.txt in SCAN's data.
def get_gaussian_weights(self, pseudo_coord):
    ...
    weights = weights_rho * weights_theta                         # Line 117
    weights[(weights != weights).detach()] = 0                    # Line 118
    # (Line 119 is blank)
    # normalise weights                                           # Line 120
    weights = weights / torch.sum(weights, dim=1, keepdim=True)   # Line 121
    ...
The dimension of weights should be (batch_size * K, neighbourhood_size, n_kernels), but the actual dimension is (batch_size * K * neighbourhood_size, n_kernels), meaning that the weights are normalized over n_kernels instead of over the neighbourhood_size dimension. This differs from the operation in compute_weights(self, neighbourhood_weights) (Line 227). I wonder whether this is intentional, and I'm not sure whether it affects network performance. Maybe add weights = weights.view(batch_size * K, neighbourhood_size, -1) at Line 119.
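A minimal sketch of that proposed change in context, assuming batch_size, K, neighbourhood_size, weights_rho and weights_theta are in scope as in get_gaussian_weights; this simply applies the suggestion above and is not a confirmed fix.

import torch

weights = weights_rho * weights_theta                              # Line 117
weights[(weights != weights).detach()] = 0                         # Line 118: zero out NaNs
weights = weights.view(batch_size * K, neighbourhood_size, -1)     # proposed Line 119
# normalise over the neighbourhood dimension instead of n_kernels  # Line 120
weights = weights / torch.sum(weights, dim=1, keepdim=True)        # Line 121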
Dear contributor,
I read your paper, and the best performance comes from sparse+dense. May I ask how to use sparse+dense? You only provide config files for sparse or dense individually.
Thank you very much.
Hi, thanks for sharing your work.
How can I generate the bounding boxes for my own dataset? Can you provide the tool for generating them?
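For reference, the precomputed image features used here come from SCAN, which relies on bottom-up-attention Faster R-CNN region features. The sketch below only illustrates the general idea of extracting region boxes with an off-the-shelf torchvision detector; it is not the detector used to produce the released *_ims_bbx.npy files.

import torch
import torchvision
from torchvision.transforms import functional as F
from PIL import Image

# Illustrative only: a generic pretrained detector, not bottom-up attention.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True).eval()
image = Image.open('example.jpg').convert('RGB')   # hypothetical input image
with torch.no_grad():
    output = model([F.to_tensor(image)])[0]
keep = output['scores'] > 0.5                      # keep confident regions only
boxes = output['boxes'][keep]                      # (N, 4) boxes as (x1, y1, x2, y2)
print(boxes)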
I'm here to cheer this on!
Hi all,
I've encountered a challenge while working with my own data. Specifically, I'm unsure how to compute the %s_precaps_stan files. Could anyone provide some guidance or assistance with this?
Thank you,
Yasmeen
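The *_precaps_stan.txt files appear to hold the precomputed semantic dependency information mentioned in the README; their exact format is not documented in this thread. If a Stanford-style dependency parse is what is needed, the sketch below shows one way to obtain (word, relation, head) triples for a caption with Stanza, purely as an illustration and not as the authors' confirmed preprocessing pipeline.

import stanza

# Illustrative only: dependency triples for a caption. How they must be
# serialized into *_precaps_stan.txt is an assumption, not confirmed.
stanza.download('en')  # one-time model download
nlp = stanza.Pipeline(lang='en', processors='tokenize,pos,lemma,depparse')
doc = nlp('A man in a red shirt rides a bicycle down the street.')
for sent in doc.sentences:
    for word in sent.words:
        head = sent.words[word.head - 1].text if word.head > 0 else 'ROOT'
        print(word.text, word.deprel, head)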
Now I am confused about how to set the parameters for SCAN training. Can you provide more details about the model?
Hello, when will the code for the paper be uploaded?
Traceback (most recent call last):
  File "/home/zyh/simulation/cvpr2020/GSMN-master/train.py", line 276, in <module>
    main()
  File "/home/zyh/simulation/cvpr2020/GSMN-master/train.py", line 100, in main
    opt.vocab_path, '%s_vocab.json' % opt.data_name))
  File "/home/zyh/simulation/cvpr2020/GSMN-master/vocab.py", line 57, in deserialize_vocab
    with open(src) as f:
FileNotFoundError: [Errno 2] No such file or directory: '/home/zyh/simulation/cvpr2020/GSMN-master/coco_precomp/mscoco_precomp_vocab.json'
#######################################################################
The file "mscoco_precomp_vocab.json" is not found in data.zip; I would be grateful if you could provide it.
Are the test_caps.json, test_ims_bbx.npy, and test_ims_size.npy files missing for the COCO dataset?
I found that the region box and size files you provided do not match the original images, yet they still work.
For example, in the validation set, the 23rd image is 1155138244.jpg, and its size is 333 * 500 (h * w), but in the file dev_ims_size.npy it is 375 * 500.
How can this be explained?