paranioar / sgraf

[AAAI2021] The code of “Similarity Reasoning and Filtration for Image-Text Matching”

sgraf's Introduction

SGRAF

PyTorch implementation of the AAAI 2021 paper “Similarity Reasoning and Filtration for Image-Text Matching”.

It is built on top of SCAN and Awesome_Matching.

We have released two versions of SGRAF: branch main for Python 2.7 and branch python3.6 for Python 3.6.

If you have any problems, please contact me at [email protected]. ([email protected] is deprecated.)

Introduction

The framework of SGRAF:

The updated results (better than those in the original paper)

Dataset     Module   Sentence retrieval       Image retrieval
                     R@1    R@5    R@10       R@1    R@5    R@10
Flickr30K   SAF      75.6   92.7   96.9       56.5   82.0   88.4
            SGR      76.6   93.7   96.6       56.1   80.9   87.0
            SGRAF    78.4   94.6   97.5       58.2   83.0   89.1
MSCOCO1K    SAF      78.0   95.9   98.5       62.2   89.5   95.4
            SGR      77.3   96.0   98.6       62.1   89.6   95.3
            SGRAF    79.2   96.5   98.6       63.5   90.2   95.8
MSCOCO5K    SAF      55.5   83.8   91.8       40.1   69.7   80.4
            SGR      57.3   83.2   90.6       40.5   69.6   80.3
            SGRAF    58.8   84.8   92.1       41.6   70.9   81.5

Requirements

We recommend the following dependencies for branch main.

Python 2.7
PyTorch (>=0.4.1)
NumPy (>=1.12.1)
TensorBoard
Punkt Sentence Tokenizer:

import nltk
nltk.download('punkt')  # non-interactive equivalent of typing "d punkt" in the interactive downloader

Download data and vocab

We follow SCAN to obtain the image features and vocabularies, which can be downloaded from:

https://www.kaggle.com/datasets/kuanghueilee/scan-features

Another download link is available below:

https://drive.google.com/drive/u/0/folders/1os1Kr7HeTbh8FajBNegW8rjJf6GIhFqC

Pre-trained models and evaluation

The pretrained models are only for branch python3.6 (Python 3.6), not for branch main (Python 2.7).
Modify model_path, data_path, and vocab_path in the evaluation.py file. Then run evaluation.py:

python evaluation.py

Note that fold5=True is only for evaluation on MSCOCO1K (results averaged over 5 folds), while fold5=False is for MSCOCO5K and Flickr30K. Pretrained models and log files can be downloaded from Flickr30K_SGRAF and MSCOCO_SGRAF.
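
The difference between the two fold5 settings can be illustrated with a small NumPy sketch. It is not the code in evaluation.py: the function names recall_at_k and fold5_average are hypothetical, and the sketch assumes a similarity matrix of shape (n_images, 5 * n_images) in which captions 5*i to 5*i+4 belong to image i.

```python
import numpy as np

def recall_at_k(sims, ks=(1, 5, 10)):
    """Sentence-retrieval recall for sims of shape (n_images, 5 * n_images)."""
    n_images = sims.shape[0]
    ranks = np.zeros(n_images)
    for i in range(n_images):
        order = np.argsort(sims[i])[::-1]        # caption indices, best first
        gt = np.arange(5 * i, 5 * i + 5)         # assumed ground-truth captions of image i
        # position of the best-ranked ground-truth caption
        ranks[i] = np.where(np.isin(order, gt))[0].min()
    return [100.0 * np.mean(ranks < k) for k in ks]

def fold5_average(sims, fold_size, ks=(1, 5, 10)):
    """Evaluate disjoint folds of fold_size images separately and average the recalls."""
    results = []
    for start in range(0, sims.shape[0], fold_size):
        stop = start + fold_size
        block = sims[start:stop, 5 * start:5 * stop]   # images and captions of one fold
        results.append(recall_at_k(block, ks))
    return np.mean(results, axis=0).tolist()

# Toy data: 4 images, 20 captions, each image scores highest on its own captions.
sims = np.zeros((4, 20))
for i in range(4):
    sims[i, 5 * i:5 * i + 5] = 1.0

full = recall_at_k(sims, ks=(1,))                     # analogous to fold5=False
folded = fold5_average(sims, fold_size=2, ks=(1,))    # analogous to fold5=True
```

In the 1K protocol each fold would hold 1000 images, so every query is ranked against a smaller candidate pool than in the 5K protocol; this is consistent with the MSCOCO1K numbers above being higher than the MSCOCO5K ones.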

Training new models from scratch

Modify the data_path, vocab_path, model_name, logger_name in the opts.py file. Then run train.py:

For MSCOCO:

(For SGR) python train.py --data_name coco_precomp --num_epochs 20 --lr_update 10 --module_name SGR
(For SAF) python train.py --data_name coco_precomp --num_epochs 20 --lr_update 10 --module_name SAF

For Flickr30K:

(For SGR) python train.py --data_name f30k_precomp --num_epochs 40 --lr_update 30 --module_name SGR
(For SAF) python train.py --data_name f30k_precomp --num_epochs 30 --lr_update 20 --module_name SAF

Reference

If SGRAF is useful for your research, please cite the following paper:

  @inproceedings{Diao2021SGRAF,
     title={Similarity reasoning and filtration for image-text matching},
     author={Diao, Haiwen and Zhang, Ying and Ma, Lin and Lu, Huchuan},
     booktitle={Proceedings of the AAAI conference on artificial intelligence},
     volume={35},
     number={2},
     pages={1218--1226},
     year={2021}
  }

License

Apache License 2.0.

sgraf's Issues

The explanation of joint and independent learning in Table 5.

In the class EncoderSimilarity(nn.Module) of model.py,
we define the SGR and SAF modules separately:

if module_name == 'SGR':
    self.SGR_module = nn.ModuleList([GraphReasoning(sim_dim) for i in range(sgr_step)])
elif module_name == 'SAF':
    self.SAF_module = AttentionFiltration(sim_dim)
else:
    raise ValueError('Invalid input of opt.module_name in opts.py')

In other words, a single run trains either SGR or SAF, not both. To get the independent-learning results in Table 5, train the two models separately and average their similarities offline.
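
The offline averaging can be sketched as follows. This is a minimal illustration, assuming each independently trained model has already produced an image-to-caption similarity matrix of the same shape on the test set; the function name ensemble_similarity and the toy scores are hypothetical and not taken from the repository.

```python
import numpy as np

def ensemble_similarity(sims_sgr, sims_saf):
    """Average two similarity matrices of identical shape (images x captions)."""
    if sims_sgr.shape != sims_saf.shape:
        raise ValueError("similarity matrices must have the same shape")
    return (sims_sgr + sims_saf) / 2.0

# Made-up scores for 2 images and 3 captions.
sims_sgr = np.array([[0.9, 0.2, 0.1],
                     [0.3, 0.4, 0.8]])
sims_saf = np.array([[0.7, 0.4, 0.3],
                     [0.5, 0.2, 0.6]])

sims_sgraf = ensemble_similarity(sims_sgr, sims_saf)
best_caption = sims_sgraf.argmax(axis=1)   # top-ranked caption index per image
```

The averaged matrix is then evaluated exactly like the matrix of a single model.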

For joint learning, you only need to define both SGR and SAF simultaneously as follows:

self.SGR_module = nn.ModuleList([GraphReasoning(sim_dim) for i in range(sgr_step)])
self.SAF_module = AttentionFiltration(sim_dim)

Then activate both modules in the forward pass and average their similarities online to get the final result, which is fed into the Rank Loss function.
We found experimentally that this joint learning brings no gain, as shown in Table 5.

How to ensemble models

Thanks for your excellent work, I am sincerely appreciative.
In the paper, I saw that you train the SGR and SAF models separately, but how can I get the result of SGRAF? I didn't find this in your GitHub repository. Is it done by adding the similarities obtained by the SGR and SAF models on the test set? I'm looking forward to your reply, thank you again from the bottom of my heart.

The Loss is nan

Hello, thank you so much for reviewing this issue and replying.

When I run train.py, the loss is NaN. What is the problem?

How to test on MSCOCO5K?

The code in evaluation.py seems to test on MSCOCO1K.
How can I test on 5K?

About Similarity Pyramid

Hi, could you please explain how to understand the Similarity Pyramid (and the Pyramid Spatial Window, the different Pyramid Levels, etc.) that your paper mentions for obtaining image features?
In your released code, are there only region features extracted by Faster R-CNN (Bottom-up Attention), as in earlier work? I'm confused about that.
Thank you in advance! :)

visualization problem

I am very interested in your work (SGRAF). I have run into a visualization problem: how can I visualize the results of image-to-text retrieval and text-to-image retrieval (as in the figure attached to the original issue)? If it is convenient for you, please provide the code for visualizing the results. Thank you very much.

About the formula in the paper

Hello author, I'm reading your paper. Where in the GitHub code are formulas 1, 3, 4, and 5 from the paper implemented? I also could not find where the learnable parameter W in formula 1 appears in the code. Looking forward to your answer, thank you.

bug

Are there any more detailed steps for reproducing the results?

I followed the README and found different dataset formats, and then started with the evaluation first but directly,

flickr8k

Has the author tried validating on a small dataset such as Flickr8k? Training on Flickr30k takes rather long for my experiments.

run

I have set up the virtual environment in PyCharm with torch 1.2.0 and Python 2.7. Training finishes, but evaluating the model fails with an IndexError about the indices for an array. Is something wrong with my environment, my data, or something else? My friends and I tried to solve it and could not find anything wrong with the code, but the evaluation still fails on my computer. This problem has been haunting me for a few days. Thank you.

evaluation.py does not run with models provided - get error from numpy array copy

I attempted to run evaluation.py using the provided MS COCO models.

CPU (i.e., non-GPU) version of the python3.6 branch

Line 103 of evaluation.py appears to be attempting to insert a record into the array img_embs.

Specific line is

img_embs[ids] = img_emb.data.cpu().numpy().copy()

This line throws an Error:

IndexError: too many indices for array

Using the provided code (evaluation.py) and MS COCO models, ids appears to be a tuple, which prints as a list of integers.

img_emb.data is a Tensor object, so the assignment to the numpy array img_embs appears to be an attempted conversion of a Tensor to a numpy array. However, the actual intent of the assignment and a work-around for the error are unclear.

The code is documented and the results in the associated paper are good, but unfortunately the provided models are not working, so the paper's results cannot be reproduced.

Please publish an update to the code that works with the provided MS COCO models.

I am out of my depth in attempting to update this code.
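
One possible cause, offered as an assumption rather than a confirmed diagnosis: NumPy treats a tuple index as one index per dimension, so a tuple holding a whole batch of ids addresses more dimensions than img_embs has. Converting ids to a list selects rows along the first axis instead. The sketch below reproduces this with toy shapes, not the shapes used in evaluation.py.

```python
import numpy as np

img_embs = np.zeros((6, 2, 3))    # toy stand-in: (num_images, num_regions, embed_dim)
batch = np.ones((4, 2, 3))        # toy stand-in for img_emb.data.cpu().numpy()
ids = (0, 2, 4, 5)                # a tuple of sample indices, one per batch item

try:
    # A 4-element tuple is read as 4 separate axis indices on a 3-D array.
    img_embs[ids] = batch
    raised = False
except IndexError:
    raised = True

# A list is read as fancy indexing along the first axis only.
img_embs[list(ids)] = batch
```

After the second assignment, rows 0, 2, 4, and 5 of img_embs hold the batch values and the remaining rows are unchanged.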

About the loss

Hello, I have a problem and would like to ask for help. I tried to run your code, but the loss of the model does not decrease, the evaluation metrics R@1, R@5, and R@10 do not increase, and medr and meanr are very large.

questions about visualization

Hello, I am very interested in your work. Could you share the code for visualizing the queries?
Many thanks!

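Region features extracted from the Flickr30K dataset

Hello author, I recently read the implementation details in your paper, which use the Flickr30K dataset and region features extracted from it. Where can I download the Flickr30K region features mentioned in the implementation details? Please reply when you see this; it is urgent. Thank you!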

I downloaded your code and ran the code according to the default hyperparameters, but found that the loss did not decrease. Do I need to do some other operations before running this code?

Please Help with Training - Thank you

Dear Professor Haiwen Diao,
I am sorry for troubling you. Your help so far has been extraordinary. I am sincerely appreciative.
I have been able to replicate your results. My supervisor said that is good. He asked if I can train my own model.

I have trouble with your data, and it may be my fault. Could you please address my question about the data? My understanding is that when you train a machine learning model, you need to include relevant documents in the training data: for example, include 4 of the 5 COCO captions for training and hold one back for validation. If my understanding is correct, I don't see where this happens in the code.

This is the last piece that my supervisor has asked for. Any clarification you can provide would be very helpful.

Thank you

Kent

A request for information about the GPU you used

Code

When will the code be released?