RCAR

PyTorch implementation of the TIP 2023 paper “Plug-and-Play Regulators for Image-Text Matching”.

It is built on top of SGRAF, GPO, and Awesome_Matching.

If you run into any problems, please contact me at [email protected]. ([email protected] is deprecated.)

Introduction

The framework of RCAR:

Reported results (importing GloVe embeddings or BERT may give better results):

Dataset     Module   Sentence retrieval     Image retrieval
                     R@1    R@5    R@10     R@1    R@5    R@10
Flickr30k   T2I      79.7   95.0   97.4     60.9   84.4   90.1
            I2T      76.9   95.5   98.0     58.8   83.9   89.3
            ALL      82.3   96.0   98.4     62.6   85.8   91.1
MSCOCO1k    T2I      79.1   96.5   98.8     63.9   90.7   95.9
            I2T      79.3   96.5   98.8     63.8   90.4   95.8
            ALL      80.9   96.9   98.9     65.7   91.4   96.4
MSCOCO5k    T2I      59.1   84.8   91.8     42.8   71.5   81.9
            I2T      58.4   84.6   91.9     41.7   71.4   81.7
            ALL      61.3   86.1   92.6     44.3   73.2   83.2
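
For readers unfamiliar with the R@K columns, the following is a minimal sketch of how Recall@K is commonly computed from an image-caption similarity matrix. It is not taken from this repository: the function name recall_at_k and its parameters are hypothetical, and it assumes that each image has caps_per_image ground-truth captions stored contiguously (five per image is the usual setting for the SCAN precomputed features).

```python
import numpy as np


def recall_at_k(sims, ks=(1, 5, 10), caps_per_image=5):
    """Return (sentence-retrieval, image-retrieval) R@K dicts, in percent.

    sims[i, j] is the similarity between image i and caption j.
    Assumption: captions caps_per_image*i ... caps_per_image*(i+1)-1
    are the ground truth for image i.
    """
    n_img, n_cap = sims.shape
    assert n_cap == n_img * caps_per_image

    # Sentence retrieval: rank all captions for each image and keep the
    # best (lowest) rank reached by any ground-truth caption.
    order = np.argsort(-sims, axis=1)
    i2t_ranks = np.empty(n_img, dtype=int)
    for i in range(n_img):
        gt = np.arange(caps_per_image * i, caps_per_image * (i + 1))
        i2t_ranks[i] = np.where(np.isin(order[i], gt))[0].min()

    # Image retrieval: rank all images for each caption and take the
    # rank of its single ground-truth image.
    order_t = np.argsort(-sims.T, axis=1)
    gt_img = np.arange(n_cap) // caps_per_image
    t2i_ranks = np.argmax(order_t == gt_img[:, None], axis=1)

    i2t = {k: 100.0 * float(np.mean(i2t_ranks < k)) for k in ks}
    t2i = {k: 100.0 * float(np.mean(t2i_ranks < k)) for k in ks}
    return i2t, t2i
```

Calling recall_at_k(sims) on an (N, 5N) matrix returns two dictionaries keyed by K. The repository's own evaluation code may differ in detail, for example in how ties are broken.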

Requirements

Run pip install -r requirements.txt to install the following dependencies.

  • Python 3.7.11
  • PyTorch 1.7.1
  • NumPy 1.21.5
  • Punkt Sentence Tokenizer:
import nltk
nltk.download('punkt')  # non-interactive equivalent of entering "d punkt" at the downloader prompt

Download data and vocab

We follow SCAN to obtain image features and vocabularies, which can be downloaded from:

https://www.kaggle.com/datasets/kuanghueilee/scan-features

Another download link is available below:

https://drive.google.com/drive/u/0/folders/1os1Kr7HeTbh8FajBNegW8rjJf6GIhFqC

The data directory is organized as follows:

data
├── coco
│   ├── precomp  # pre-computed BUTD region features for COCO, provided by SCAN
│   │      ├── train_ids.txt
│   │      ├── train_caps.txt
│   │      ├── ......
│   │
│   └── id_mapping.json  # mapping from coco-id to image's file name
│   
│
├── f30k
│   ├── precomp  # pre-computed BUTD region features for Flickr30K, provided by SCAN
│   │      ├── train_ids.txt
│   │      ├── train_caps.txt
│   │      ├── ......
│   │
│   └── id_mapping.json  # mapping from f30k index to image's file name
│   
│
└── vocab  # vocab files provided by SCAN (only used when the text backbone is BiGRU)
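
Before training or evaluating, it can help to confirm that the downloaded files match the layout above. The following is a minimal sketch, not part of the repository: the helper missing_data_paths and the EXPECTED list are hypothetical, and only the entries explicitly named in the tree are checked, since the "......" entries are elided.

```python
from pathlib import Path

# Relative paths copied from the directory tree above. Files hidden behind
# the "......" entries are not listed in the source, so they are not checked.
EXPECTED = [
    "coco/precomp/train_ids.txt",
    "coco/precomp/train_caps.txt",
    "coco/id_mapping.json",
    "f30k/precomp/train_ids.txt",
    "f30k/precomp/train_caps.txt",
    "f30k/id_mapping.json",
    "vocab",
]


def missing_data_paths(data_root):
    """Return the expected entries that do not exist under data_root."""
    root = Path(data_root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]
```

For example, missing_data_paths("data") returns an empty list when every named entry is present, and otherwise lists what still needs to be downloaded.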

Pre-trained models and evaluation

Modify model_path, split, and fold5 in the eval.py file. Note that fold5=True is only for evaluation on MSCOCO1K (results averaged over 5 folds), while fold5=False is for MSCOCO5K and Flickr30K. Pre-trained models and log files can be downloaded from Flickr30K_RCAR and MSCOCO_RCAR.

Then run python eval.py in the terminal.
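
To illustrate what the fold5 setting means, here is a minimal sketch of 5-fold averaging. It assumes the common SCAN-style protocol, in which the 5K MSCOCO test images are split into five contiguous subsets of 1K images, metrics are computed on each subset, and the results are averaged. The helpers fold_ranges and average_folds are hypothetical names, and the actual eval.py may implement this differently.

```python
def fold_ranges(n_images=5000, n_folds=5):
    """Split image indices 0..n_images-1 into n_folds contiguous ranges.

    Each range is a (start, end) pair with end exclusive.
    """
    fold_size = n_images // n_folds
    return [(f * fold_size, (f + 1) * fold_size) for f in range(n_folds)]


def average_folds(per_fold_metrics):
    """Average a list of equal-length metric tuples element-wise."""
    n = len(per_fold_metrics)
    return tuple(sum(vals) / n for vals in zip(*per_fold_metrics))
```

With fold5=False, by contrast, the metrics are computed once over the full test set, which is why the MSCOCO5K numbers are lower than the MSCOCO1K ones: each query is ranked against five times as many candidates.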

Training new models from scratch

Uncomment the required parts (BASELINE, RAR, RCR, RCAR) in the train_xxxx_xxx.sh file.

Then run ./train_xxx_xxx.sh in the terminal.

Reference

If RCAR is useful for your research, please cite the following paper:

  @article{Diao2023RCAR,
     author={Diao, Haiwen and Zhang, Ying and Liu, Wei and Ruan, Xiang and Lu, Huchuan},
     journal={IEEE Transactions on Image Processing}, 
     title={Plug-and-Play Regulators for Image-Text Matching}, 
     year={2023},
     volume={32},
     pages={2322-2334}
  }

License

Apache License 2.0.

rcar's Issues

About the visualization

Thanks for your excellent work; I sincerely appreciate it. After reading your paper, I am wondering how to visualize the alignment weights as shown in Figure 8 and Figure 9. If you could help me with this at your convenience, that would be great. Thanks again for contributing such great work.