RCAR

PyTorch implementation of the TIP 2023 paper “Plug-and-Play Regulators for Image-Text Matching”.

It is built on top of SGRAF, GPO, and Awesome_Matching.

If you run into any problems, please contact me at [email protected]. ([email protected] is deprecated.)

Introduction

The framework of RCAR:

Reported results (importing GloVe embeddings or BERT can give better results):

Dataset	Module	Sentence retrieval			Image retrieval		
		R@1	R@5	R@10	R@1	R@5	R@10
Flickr30K	T2I	79.7	95.0	97.4	60.9	84.4	90.1
	I2T	76.9	95.5	98.0	58.8	83.9	89.3
	ALL	82.3	96.0	98.4	62.6	85.8	91.1
MSCOCO1k	T2I	79.1	96.5	98.8	63.9	90.7	95.9
	I2T	79.3	96.5	98.8	63.8	90.4	95.8
	ALL	80.9	96.9	98.9	65.7	91.4	96.4
MSCOCO5k	T2I	59.1	84.8	91.8	42.8	71.5	81.9
	I2T	58.4	84.6	91.9	41.7	71.4	81.7
	ALL	61.3	86.1	92.6	44.3	73.2	83.2
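
For readers unfamiliar with the R@K columns, the following is a minimal sketch of how Recall@K can be computed from a similarity matrix. It is not the evaluation code of this repository: the function name `recall_at_k` and the toy matrix are hypothetical, and it assumes a one-to-one pairing where query i matches candidate i. In the actual benchmarks each image comes with five captions, so the real evaluation has to handle that grouping.

```python
import numpy as np


def recall_at_k(sims, ks=(1, 5, 10)):
    """Recall@K (in percent) for a square similarity matrix.

    Assumption: sims[i, j] scores query i against candidate j, and the
    ground-truth match of query i is candidate i (one-to-one pairing).
    """
    sims = np.asarray(sims, dtype=float)
    n = sims.shape[0]
    # Score of the correct candidate for each query, as a column vector.
    correct = sims[np.arange(n), np.arange(n)][:, None]
    # Zero-based rank = number of candidates scored strictly higher.
    ranks = (sims > correct).sum(axis=1)
    return {k: 100.0 * float(np.mean(ranks < k)) for k in ks}


# Toy example: 3 queries, 3 candidates.
toy = np.array([
    [0.9, 0.1, 0.2],  # correct candidate ranked 1st
    [0.8, 0.5, 0.1],  # correct candidate ranked 2nd
    [0.3, 0.2, 0.1],  # correct candidate ranked 3rd
])
print(recall_at_k(toy, ks=(1, 2, 3)))
```

In the toy matrix one query out of three has its match at rank 1, two out of three within the top 2, and all three within the top 3.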

Requirements

Run pip install -r requirements.txt to install the following dependencies.

Python 3.7.11
PyTorch 1.7.1
NumPy 1.21.5
Punkt Sentence Tokenizer:

import nltk
nltk.download('punkt')  # same as running nltk.download() and entering "d punkt" at the prompt

Download data and vocab

We follow SCAN to obtain image features and vocabularies, which can be downloaded from:

https://www.kaggle.com/datasets/kuanghueilee/scan-features

An alternative download link:

https://drive.google.com/drive/u/0/folders/1os1Kr7HeTbh8FajBNegW8rjJf6GIhFqC

data
├── coco
│   ├── precomp  # pre-computed BUTD region features for COCO, provided by SCAN
│   │      ├── train_ids.txt
│   │      ├── train_caps.txt
│   │      ├── ......
│   │
│   └── id_mapping.json  # mapping from coco-id to image's file name
│   
│
├── f30k
│   ├── precomp  # pre-computed BUTD region features for Flickr30K, provided by SCAN
│   │      ├── train_ids.txt
│   │      ├── train_caps.txt
│   │      ├── ......
│   │
│   └── id_mapping.json  # mapping from f30k index to image's file name
│   
│
└── vocab  # vocab files provided by SCAN (only used when the text backbone is BiGRU)
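
As a quick sanity check after downloading, a small script along these lines can confirm that the layout above is in place. This is only a sketch: the helper name `missing_entries` and the root directory name `data` are assumptions, and only the files explicitly named in the tree are checked (the "......" entries are not).

```python
from pathlib import Path

# Paths taken from the tree above; the elided "......" entries are not checked.
EXPECTED = [
    "coco/precomp/train_ids.txt",
    "coco/precomp/train_caps.txt",
    "coco/id_mapping.json",
    "f30k/precomp/train_ids.txt",
    "f30k/precomp/train_caps.txt",
    "f30k/id_mapping.json",
    "vocab",
]


def missing_entries(data_root):
    """Return the expected relative paths that do not exist under data_root."""
    root = Path(data_root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    # "data" is an assumed location; point this at your own download directory.
    for rel in missing_entries("data"):
        print("missing:", rel)
```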

Pre-trained models and evaluation

Set model_path, split, and fold5 in the eval.py file. Note that fold5=True applies only to evaluation on MSCOCO 1K (results averaged over 5 folds), while fold5=False is used for MSCOCO 5K and Flickr30K. Pre-trained models and log files can be downloaded from Flickr30K_RCAR and MSCOCO_RCAR.

Then run python eval.py in the terminal.
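
To illustrate what the 5-fold average for MSCOCO 1K means, here is a minimal sketch that splits a similarity matrix into consecutive, non-overlapping blocks, scores each block separately, and averages the result. It is not taken from eval.py: the names `recall_at_1` and `fold_average` are hypothetical, the folds are assumed to be consecutive blocks of 1000 test images, and a one-to-one query/candidate pairing is assumed for simplicity.

```python
import numpy as np


def recall_at_1(sims):
    """R@1 (in percent) for a square block, assuming query i matches candidate i."""
    hits = np.argmax(sims, axis=1) == np.arange(len(sims))
    return 100.0 * float(np.mean(hits))


def fold_average(sims, fold_size=1000, metric=recall_at_1):
    """Average a metric over consecutive, non-overlapping diagonal blocks."""
    n_folds = sims.shape[0] // fold_size
    scores = []
    for f in range(n_folds):
        lo, hi = f * fold_size, (f + 1) * fold_size
        # Each fold only ranks candidates from the same fold.
        scores.append(metric(sims[lo:hi, lo:hi]))
    return float(np.mean(scores))


# Toy stand-in: 10 items and folds of 2 instead of 5000 images and folds of 1000.
rng = np.random.default_rng(0)
toy = rng.random((10, 10)) + 2.0 * np.eye(10)  # boost the diagonal so matches win
print(fold_average(toy, fold_size=2))
```

Because each fold ranks against 1000 candidates rather than 5000, the 1K numbers are higher than the 5K numbers for the same model, which matches the pattern in the results table.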

Training new models from scratch

Uncomment the section for the variant you want to train (BASELINE, RAR, RCR, or RCAR) in the train_xxxx_xxx.sh file.

Then run ./train_xxx_xxx.sh in the terminal.

Reference

If RCAR is useful for your research, please cite the following paper:

  @article{Diao2023RCAR,
     author={Diao, Haiwen and Zhang, Ying and Liu, Wei and Ruan, Xiang and Lu, Huchuan},
     journal={IEEE Transactions on Image Processing}, 
     title={Plug-and-Play Regulators for Image-Text Matching}, 
     year={2023},
     volume={32},
     pages={2322-2334}
  }

License

Apache License 2.0.
