
tirg's Introduction

Composing Text and Image for Image Retrieval

This is the code for the paper:

Composing Text and Image for Image Retrieval - An Empirical Odyssey
Nam Vo, Lu Jiang, Chen Sun, Kevin Murphy, Li-Jia Li, Li Fei-Fei, James Hays
CVPR 2019.

Please note that this is not an officially supported Google product, and that this is reproduced code, not the original.

If you find this code useful in your research, please cite:

@inproceedings{vo2019composing,
  title={Composing Text and Image for Image Retrieval-An Empirical Odyssey},
  author={Vo, Nam and Jiang, Lu and Sun, Chen and Murphy, Kevin and Li, Li-Jia and Fei-Fei, Li and Hays, James},
  booktitle={CVPR},
  year={2019}
}

Introduction

In this paper, we study the task of image retrieval, where the input query is specified in the form of an image plus some text that describes desired modifications to the input image.

Problem Overview

We propose a new way to combine image and text using the TIRG function for the retrieval task, and show that it outperforms existing approaches on different datasets.

Method
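For reference, below is a minimal PyTorch sketch of the gated-residual composition described in the paper. It is illustrative only: the class name, layer choices, and the assumed 512-d feature size are ours and may differ from the actual implementation in img_text_composition_models.py.

import torch
import torch.nn as nn

class TirgSketch(nn.Module):
    # Illustrative gated-residual composition: gate the image feature with a
    # text-conditioned sigmoid mask, then add a text-conditioned residual.
    def __init__(self, dim=512):  # dim is an assumed feature size
        super().__init__()
        # learnable weights balancing the gating and residual terms
        self.w = nn.Parameter(torch.tensor([1.0, 1.0]))
        self.gate = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, dim), nn.Sigmoid())
        self.res = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, img_feat, text_feat):
        x = torch.cat([img_feat, text_feat], dim=1)
        return self.w[0] * self.gate(x) * img_feat + self.w[1] * self.res(x)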

Setup

  • torchvision
  • pytorch
  • numpy
  • tqdm
  • tensorboardX
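These can be installed with pip, for example (the repository does not pin versions, so this unversioned install is only a starting point; see the issues below for version-related caveats):

pip install torch torchvision numpy tqdm tensorboardX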

Running Models

  • main.py: driver script to run training/testing
  • datasets.py: Dataset classes for loading images & generating training retrieval queries
  • text_model.py: LSTM model to extract text features
  • img_text_composition_models.py: various image-text composition models (described in the paper)
  • torch_functions.py: contains the soft triplet loss function and feature normalization function
  • test_retrieval.py: functions to perform retrieval test and compute recall performance

CSS3D dataset

Download the dataset from this external website.

Make sure the dataset includes these files:

<dataset_path>/css_toy_dataset_novel2_small.dup.npy
<dataset_path>/images/*.png
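To sanity-check the download, the annotation file can be loaded directly with NumPy (allow_pickle and latin1 encoding are needed because the file was written under Python 2). This snippet is a suggested quick check, not part of the original code:

import numpy as np

# Load the CSS3D annotation file; it is a pickled Python-2 dict.
data = np.load('./CSSDataset/css_toy_dataset_novel2_small.dup.npy',
               allow_pickle=True, encoding='latin1').item()
print(data.keys())                 # expect 'train' and 'test' splits
print(len(data['train']['mods']))  # number of training modification queries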

To run our training & testing:

python main.py --dataset=css3d --dataset_path=./CSSDataset --num_iters=160000 \
  --model=tirg --loss=soft_triplet --comment=css3d_tirg

python main.py --dataset=css3d --dataset_path=./CSSDataset --num_iters=160000 \
  --model=tirg_lastconv --loss=soft_triplet --comment=css3d_tirgconv

The first command applies TIRG to the fully-connected layer and the second applies it to the last convolutional layer. To run the baseline:

python main.py --dataset=css3d --dataset_path=./CSSDataset --num_iters=160000 \
  --model=concat --loss=soft_triplet --comment=css3d_concat

MITStates dataset

Download the dataset from this external website.

Make sure the dataset includes these files:

<dataset_path>/images/<adj noun>/*.jpg

For training & testing:

python main.py --dataset=mitstates --dataset_path=./mitstates \
  --num_iters=160000 --model=concat --loss=soft_triplet \
  --learning_rate_decay_frequency=50000 --weight_decay=5e-5 \
  --comment=mitstates_concat

python main.py --dataset=mitstates --dataset_path=./mitstates \
  --num_iters=160000 --model=tirg --loss=soft_triplet \
  --learning_rate_decay_frequency=50000 --weight_decay=5e-5 \
  --comment=mitstates_tirg

Fashion200k dataset

Download the dataset from this external website. Download our generated test_queries.txt from here.

Make sure the dataset includes these files:

<dataset_path>/labels/*.txt
<dataset_path>/women/<category>/<caption>/<id>/*.jpeg
<dataset_path>/test_queries.txt

Run training & testing:

python main.py --dataset=fashion200k --dataset_path=./Fashion200k \
  --num_iters=160000 --model=concat --loss=batch_based_classification \
  --learning_rate_decay_frequency=50000 --comment=f200k_concat

python main.py --dataset=fashion200k --dataset_path=./Fashion200k \
  --num_iters=160000 --model=tirg --loss=batch_based_classification \
  --learning_rate_decay_frequency=50000 --comment=f200k_tirg

Pretrained Models:

Our pretrained models can be downloaded below, along with our best single-model accuracy. The numbers are slightly different from the ones reported in the paper due to the re-implementation.

These saved weights might no longer work correctly with newer versions of the code; please refer to #12.

Notes:

All log files will be saved at ./runs/<timestamp><comment>. Monitor with tensorboard (training loss, training retrieval performance, testing retrieval performance):

tensorboard --logdir ./runs/ --port 8888

PyTorch's data loader might consume a lot of memory. If that's an issue, add --loader_num_workers=0 to disable loading data in parallel.

tirg's People

Contributors

lugiavn, roadjiang


tirg's Issues

Issues

- I faced a number of problems running the main script, but it worked after I ran the command below:
python main.py --dataset=css3d --dataset_path=./CSSDataset --num_iters=160000 --model=tirg --loss=soft_triplet --comment=css3d_tirg --loader_num_workers=0

- Another problem was that the project is written in Python 2 while I am using Python 3, but I managed to update the syntax.
- The problem I couldn't solve is ".cuda()". I don't have an Nvidia card and, being new to Python, I don't know what it does, so I commented out all code that uses cuda. If you can tell me what it does and what the alternative is, that would be great.

  • Also, after training, how can I enter a query image and text to get the retrieved image?

Hello, an error occurred while 'train_loop' function was running

In the 'train_loop' function defined in main.py, this statement raises an error:
for data in tqdm(trainloader, desc='Training for epoch ' + str(epoch)):
I want to ask what changes should be made.
This is the error message:
'Training for epoch 0: 0%| | 0/594 [00:00<?, ?it/s]Traceback (most recent call last):
File "D:/毕设/tirg-master/tirg-master/main.py", line 299, in <module>
main()
File "D:/毕设/tirg-master/tirg-master/main.py", line 294, in main
train_loop(opt, logger, trainset, testset, model, optimizer)
File "D:/毕设/tirg-master/tirg-master/main.py", line 267, in train_loop
for data in tqdm(trainloader, desc='Training for epoch ' + str(epoch)):
File "C:\ProgramData\Anaconda3\envs\pytorch\lib\site-packages\tqdm\std.py", line 1108, in __iter__
for obj in iterable:
File "C:\ProgramData\Anaconda3\envs\pytorch\lib\site-packages\torch\utils\data\dataloader.py", line 279, in __iter__
return _MultiProcessingDataLoaderIter(self)
File "C:\ProgramData\Anaconda3\envs\pytorch\lib\site-packages\torch\utils\data\dataloader.py", line 719, in __init__
w.start()
File "C:\ProgramData\Anaconda3\envs\pytorch\lib\multiprocessing\process.py", line 105, in start
self._popen = self._Popen(self)
File "C:\ProgramData\Anaconda3\envs\pytorch\lib\multiprocessing\context.py", line 212, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:\ProgramData\Anaconda3\envs\pytorch\lib\multiprocessing\context.py", line 313, in _Popen
return Popen(process_obj)
File "C:\ProgramData\Anaconda3\envs\pytorch\lib\multiprocessing\popen_spawn_win32.py", line 66, in __init__
reduction.dump(process_obj, to_child)
File "C:\ProgramData\Anaconda3\envs\pytorch\lib\multiprocessing\reduction.py", line 59, in dump
ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'BaseDataset.get_loader..'
Training for epoch 0: 0%| | 0/594 [00:01<?, ?it/s]'

Looking forward to your reply. Thank you very much.

Problems about Fashion200k dataset

Hi, there.

Thank you for the great work!

According to the fashion200k repo, there are two versions of the Fashion200k pictures, namely cropped detected images and original images. Which one do you use in your implementation?

Why is the result on the CSS dataset much lower than the result in the paper?

Hello, I would like to ask you 2 questions.
1. Why is the result on the CSS dataset much lower than the result in the paper?
Because of PyTorch version changes, I made some modifications to your code to run on a newer version (1.10.2+cu102). The results on the Fashion200k and MIT-States datasets are similar to those in the paper, but the result on the CSS dataset is much lower.
2. The result on the CSS dataset is much better than the results on the other two datasets. Why does the model perform so well on the CSS dataset?

CSS3D dataset only has 6004 train and 6019 test samples

Hello, I downloaded the CSS3D dataset from the readme and loaded css_toy_dataset_novel2_small.dup.npy as follows:

import numpy as np
data = np.load("../data/css_toy_dataset_novel2_small.dup.npy",allow_pickle=True,encoding="latin1")
data = data.item()

I found that data["train"]["mods"] only contains 6004 samples, and it only includes the 2d->3d mods, not the 3d->3d mods.

ask for version

Hello.
Could you please tell me the version of each requirement?

"ValueError: Type must be a sub-type of ndarray type"

File "/content/drive/My Drive/tirg/tirg/torch_functions.py", line 41, in pairwise_distances
x_norm = (np.power(x, 2)).sum(1).view(1,-1)
ValueError: Type must be a sub-type of ndarray type

How can I resolve this error?

Which torch version is used in this project?

Hello Nam!
Currently, when I run the code, some bugs arise due to PyTorch updates (e.g., the deprecation of Variable() and the new style of creating autograd functions).
Which torch version is used for this project? Thank you!

Hello, I want the TIRG code for classification with compositionally novel labels on the MIT-States dataset

Thank you for your research and contribution.
My name is Muah Seol, from the Korea Electronics and Telecommunications Research Institute.

I want the TIRG code for the classification with compositionally novel labels experiment on the MIT-States dataset.
I understand that TIRG code is no longer supported.
But I would like to get the code for this part. Is it possible?

My email address is: [email protected]

Kind regards.
