sgraf's Introduction

SGRAF

PyTorch implementation of the AAAI 2021 paper “Similarity Reasoning and Filtration for Image-Text Matching”.

It is built on top of SCAN and Awesome_Matching.

We have released two versions of SGRAF: Branch main for python2.7; Branch python3.6 for python3.6.

If you have any problems, please contact me at [email protected]. ([email protected] is deprecated)

Introduction

The framework of SGRAF:

The updated results (Better than the original paper)

Dataset     Module   Sentence retrieval      Image retrieval
                     R@1    R@5    R@10      R@1    R@5    R@10
Flickr30K   SAF      75.6   92.7   96.9      56.5   82.0   88.4
            SGR      76.6   93.7   96.6      56.1   80.9   87.0
            SGRAF    78.4   94.6   97.5      58.2   83.0   89.1
MSCOCO1K    SAF      78.0   95.9   98.5      62.2   89.5   95.4
            SGR      77.3   96.0   98.6      62.1   89.6   95.3
            SGRAF    79.2   96.5   98.6      63.5   90.2   95.8
MSCOCO5K    SAF      55.5   83.8   91.8      40.1   69.7   80.4
            SGR      57.3   83.2   90.6      40.5   69.6   80.3
            SGRAF    58.8   84.8   92.1      41.6   70.9   81.5

Requirements

We recommend the following dependencies for Branch main.

import nltk
nltk.download()
> d punkt

Download data and vocab

We follow SCAN to obtain image features and vocabularies, which can be downloaded by using:

https://www.kaggle.com/datasets/kuanghueilee/scan-features

Another download link is available below:

https://drive.google.com/drive/u/0/folders/1os1Kr7HeTbh8FajBNegW8rjJf6GIhFqC

Pre-trained models and evaluation

The pretrained models are only for Branch python3.6 (python3.6), not for Branch main (python2.7).
Modify the model_path, data_path, vocab_path in the evaluation.py file. Then run evaluation.py:

python evaluation.py

Note that fold5=True is only for evaluation on MSCOCO1K (average over 5 folds), while fold5=False is for MSCOCO5K and Flickr30K. Pretrained models and log files can be downloaded from Flickr30K_SGRAF and MSCOCO_SGRAF.
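The difference between the two fold5 settings can be illustrated with a minimal sketch. It is not the code in evaluation.py: the function names, the toy sizes, and the layout assumption (five captions per image, stored contiguously, so caption j belongs to image j // 5) are all hypothetical and only show how a per-fold average differs from a single evaluation over the full test set.

```python
import numpy as np

CAPS_PER_IMAGE = 5  # assumption: five captions per image, stored contiguously


def recall_at_k(sims, k):
    """Sentence retrieval R@k (percent) for an (n_images, 5 * n_images) similarity matrix."""
    n_images = sims.shape[0]
    hits = 0
    for i in range(n_images):
        top_k = np.argsort(sims[i])[::-1][:k]  # indices of the k best-scoring captions
        # Captions 5*i .. 5*i+4 are treated as the ground truth for image i.
        if np.any(top_k // CAPS_PER_IMAGE == i):
            hits += 1
    return 100.0 * hits / n_images


def fold_average_recall(sims, fold_size, k):
    """Average R@k over disjoint folds of `fold_size` images each."""
    n_images = sims.shape[0]
    scores = []
    for start in range(0, n_images, fold_size):
        stop = start + fold_size
        # Each fold only ranks its own captions against its own images.
        block = sims[start:stop, CAPS_PER_IMAGE * start:CAPS_PER_IMAGE * stop]
        scores.append(recall_at_k(block, k))
    return float(np.mean(scores))
```

With a 5000-image test set, calling fold_average_recall(sims, 1000, k) would correspond to the 1K setting and recall_at_k(sims, k) to the 5K setting. Because each fold ranks against fewer candidates, the fold average is never lower than the full-set score.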

Training new models from scratch

Modify the data_path, vocab_path, model_name, logger_name in the opts.py file. Then run train.py:

For MSCOCO:

(For SGR) python train.py --data_name coco_precomp --num_epochs 20 --lr_update 10 --module_name SGR
(For SAF) python train.py --data_name coco_precomp --num_epochs 20 --lr_update 10 --module_name SAF

For Flickr30K:

(For SGR) python train.py --data_name f30k_precomp --num_epochs 40 --lr_update 30 --module_name SGR
(For SAF) python train.py --data_name f30k_precomp --num_epochs 30 --lr_update 20 --module_name SAF

Reference

If SGRAF is useful for your research, please cite the following paper:

  @inproceedings{Diao2021SGRAF,
     title={Similarity reasoning and filtration for image-text matching},
     author={Diao, Haiwen and Zhang, Ying and Ma, Lin and Lu, Huchuan},
     booktitle={Proceedings of the AAAI conference on artificial intelligence},
     volume={35},
     number={2},
     pages={1218--1226},
     year={2021}
  }

License

Apache License 2.0.

sgraf's Issues

The explanation of joint and independent learning in Table 5.

In the class EncoderSimilarity(nn.Module) of model.py,
we define the SGR and SAF modules separately:

if module_name == 'SGR':
    self.SGR_module = nn.ModuleList([GraphReasoning(sim_dim) for i in range(sgr_step)])
elif module_name == 'SAF':
    self.SAF_module = AttentionFiltration(sim_dim)
else:
    raise ValueError('Invalid input of opt.module_name in opts.py')

In other words, you can train only one of SGR or SAF at a time, and then average the two similarities offline to get the independent-learning results in Table 5.
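The offline averaging step can be sketched as follows. This is a minimal illustration under assumptions, not the repository's code: ensemble_similarity is a hypothetical helper, and the two toy matrices stand in for the image-by-caption similarity matrices produced by separately trained SGR and SAF models on the same test set.

```python
import numpy as np


def ensemble_similarity(sims_sgr, sims_saf):
    """Average two similarity matrices of identical shape (n_images, n_captions)."""
    if sims_sgr.shape != sims_saf.shape:
        raise ValueError("similarity matrices must have the same shape")
    return (sims_sgr + sims_saf) / 2.0


# Toy matrices standing in for the outputs of two separately trained models.
sims_sgr = np.array([[0.9, 0.2], [0.6, 0.5]])
sims_saf = np.array([[0.7, 0.4], [0.2, 0.7]])

sims_sgraf = ensemble_similarity(sims_sgr, sims_saf)
best_caption = sims_sgraf.argmax(axis=1)  # top-1 caption index for each image
```

The averaged matrix is then passed to the usual retrieval metrics in place of a single model's similarities.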

For joint learning, you only need to define both SGR and SAF simultaneously as follows:

self.SGR_module = nn.ModuleList([GraphReasoning(sim_dim) for i in range(sgr_step)])
self.SAF_module = AttentionFiltration(sim_dim)

Activate both modules in the forward pass and average their similarities online to get the final result, which is then fed into the rank loss function.
We found experimentally that this joint learning brings no gain, as shown in Table 5.

How to ensemble models

Thanks for your excellent work, I am sincerely appreciative.
In the paper, I saw that you train the SGR and SAF models separately, but how can I get the result of SGRAF? I couldn't find this in your GitHub repository. Is it obtained by adding the similarities produced by the SGR and SAF models on the test set? I'm looking forward to your reply. Thank you again from the bottom of my heart.

The Loss is nan

Hello, thank you so much for reviewing this issue and replying.

When I run train.py, the loss is nan. What's the problem?

About Similarity Pyramid

Hi, could you please explain how to understand the Similarity Pyramid (and the Pyramid Spatial Window, different Pyramid Levels, etc.) that your paper mentions as being used to obtain image features?
In your released code, I only see region features extracted by Faster-RCNN (Bottom-up Attention), just as in earlier work. I'm confused about that.
Thank you in advance! :)

visualization problem

I am very interested in your work (SGRAF). I have run into a visualization problem: how do I visualize the image-to-text and text-to-image retrieval results? If it's convenient for you, please provide the code for visualizing the results. Thank you very much.

About the formula in the paper

Hello author, I'm reading your paper. Where in the GitHub code are formulas 1, 3, 4, and 5 implemented? In particular, I could not find where the learnable parameter W in formula 1 appears in the code. Looking forward to your answer, thank you.

flickr8k

Has the author tried validating on a small dataset such as flickr8k? Running on flickr30k takes a bit too long for my experiments.

run

I have set up my PyCharm virtual environment with torch 1.2.0 and python2.7. Training finishes, but when I try to evaluate the model it fails with an IndexError about array indices. Is something wrong with my environment, my data, or something else? My friends and I tried to solve it and could not find anything wrong in the code, but evaluation still fails on my computer. This problem has haunted me for a few days. Thank you.

evaluation.py does not run with models provided - get error from numpy array copy

Attempted to run evaluation.py using provided MS Coco models.

cpu (ie non-gpu) version of Python 3.6 branch

In evaluation.py, line 103, appears to be attempting to insert a record into array img_embs

Specific line is

img_embs[ids] = img_emb.data.cpu().numpy().copy()

This line throws an Error:

IndexError: too many indices for array

Using provided code (evaluation.py) and MS Coco models, ids appears to be a tuple
which prints as a list of integers

img_emb.data is a Tensor object, so the assignment to a numpy array img_embs appears to be an
attempted conversion of a Tensor to a numpy array, however, the actual intent of the assignment
and a work-around for the Error is unclear

Its documented code and the results from the associated paper are good, but unfortunately
the provided models are not working, and do not allow the paper results to be duplicated

Please publish an update to the code which works with provided MS Coco models

I am out of my depth in attempting to update this code.
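One possible cause, offered as an assumption and not confirmed anywhere in this thread: NumPy treats a tuple index as one index per axis, so a tuple of batch ids that is longer than the number of array dimensions raises exactly this kind of IndexError. The sketch below reproduces that behavior with small hypothetical shapes and shows that converting the ids to a list selects rows along the first axis instead. The variable names mirror the report, but the sizes and data are made up.

```python
import numpy as np

# Hypothetical small shapes: 6 images, 3 regions each, 4-dim features.
img_embs = np.zeros((6, 3, 4))
batch = np.ones((4, 3, 4))  # stands in for img_emb.data.cpu().numpy()
ids = (0, 2, 3, 5)          # a tuple of sample ids, as described in the report

try:
    # A tuple means "one index per axis": 4 indices on a 3-D array.
    img_embs[ids] = batch
    tuple_index_failed = False
except IndexError:
    tuple_index_failed = True

# A list is treated as a set of row indices along the first axis.
img_embs[list(ids)] = batch.copy()
```

If this is indeed what happens in evaluation.py, changing the index to list(ids) at the failing line would be a candidate work-around, though it has not been verified against the repository.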

About the loss

Hello, I have a problem and want to ask for help. I tried to run your code, but the loss of the model does not decrease, the evaluation metrics R@1, R@5, and R@10 do not increase, and medr and meanr are very large.

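Region features extracted from the Flickr 30k dataset

Hello author, I recently read the experimental details of your paper, which uses the Flickr 30k dataset and region features extracted from it. Where can I download the Flickr 30k region features mentioned in the implementation details? Please reply when you see this. It is urgent, thank you!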

Please Help with Training -Thank you

Dear Professor Haiwen Diao,
I am sorry for troubling you. Your help so far has been extraordinary. I am sincerely appreciative.
I have been able to replicate your results. My supervisor said that is good. He asked if I can train my own model.

I have trouble with your data, and it may be my fault. If you could please talk about my data question. When you train a machine learning model, my understanding is that you need to include relevant documents in the training data. For example, include 4 of 5 coco captions for training, and hold one back for validation. If my understanding is correct, then I don't see where this happens in the code.

This is the last piece that my supervisor has asked for. Any clarification you can provide would be very helpful.

Thank you

Kent

Code

When will the code be released?
