This package contains the accompanying code for the following paper:
Tu, Yunbin, et al. "I2Transformer: Intra- and Inter-relation Embedding Transformer for TV Show Captioning.", which has appeared as a regular paper in IEEE TIP.
Download tvc_feature_release.tar.gz (23GB).
After downloading the file, extract it into the data directory:
tar -xf path/to/tvc_feature_release.tar.gz -C data
You should then see video_feature under the data/tvc_feature_release directory.
It contains video features (ResNet, I3D, ResNet+I3D). Please note that this code only uses the ResNet+I3D features.
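The features are stored as HDF5 files. As a minimal sketch (not the repo's actual loader), one can inspect them with h5py; the file name and key layout below are assumptions:

```python
import h5py

# Hypothetical file name; adjust to the actual HDF5 file(s) shipped in
# data/tvc_feature_release/video_feature.
feature_path = "data/tvc_feature_release/video_feature/resnet_i3d.h5"

with h5py.File(feature_path, "r") as f:
    vid_id = list(f.keys())[0]  # each top-level key is one video/clip id
    feat = f[vid_id][...]       # e.g., an array of shape (num_frames, feature_dim)
    print("%s %s" % (vid_id, str(feat.shape)))
```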
Prerequisites:
- Ubuntu 16.04
- Python 2.7
- PyTorch 1.1.0
- nltk
- easydict
- tqdm
- h5py
- tensorboardX
- An RTX 2080Ti
Set up the environment:
source setup.sh
Note that you need to do this each time you start a new session.
Git clone the Microsoft COCO caption evaluation code (used to evaluate the generated captions) and place it in the directory I2Transformer/standalone_eval/:
git clone https://github.com/tylin/coco-caption.git
You can skip this step, as the evaluation code is already provided in this repository.
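For reference, here is a minimal sketch of calling the coco-caption scorers directly; the example captions are made up, the import path assumes the clone location above, and scores on such a tiny toy corpus are not meaningful:

```python
import sys
sys.path.append("standalone_eval/coco-caption")  # where coco-caption was cloned

from pycocoevalcap.bleu.bleu import Bleu
from pycocoevalcap.cider.cider import Cider

# Both dicts map an example id to a list of pre-tokenized caption strings.
gts = {"clip1": ["a man walks into the room"]}  # ground-truth references
res = {"clip1": ["a man enters the room"]}      # generated captions

bleu_scores, _ = Bleu(4).compute_score(gts, res)  # BLEU@1..BLEU@4
cider_score, _ = Cider().compute_score(gts, res)
print("BLEU@4: %.4f, CIDEr: %.4f" % (bleu_scores[3], cider_score))
```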
bash baselines/multimodal_transformer/scripts/build_vocab.sh
Running this command builds the vocabulary cache/tvc_word2idx.json from the TVC train set.
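For intuition, here is a minimal sketch of the kind of word2idx mapping this produces; the special tokens and the frequency cutoff are illustrative assumptions, and the real logic lives in the build_vocab script:

```python
import json
from collections import Counter

# Toy captions; the real script reads the TVC train-set annotations.
captions = ["a man walks into the room", "the man sits down"]
counter = Counter(w for c in captions for w in c.lower().split())

# Special tokens and the min-frequency cutoff are illustrative assumptions.
word2idx = {"<pad>": 0, "<unk>": 1, "<bos>": 2, "<eos>": 3}
for word, freq in counter.most_common():
    if freq >= 1:
        word2idx[word] = len(word2idx)

with open("cache/tvc_word2idx.json", "w") as f:
    json.dump(word2idx, f)
```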
bash baselines/multimodal_transformer/scripts/train.sh video_sub resnet_i3d
This code will load all the data (~30GB) into RAM to speed up training; use --no_core_driver to disable this behavior (see the sketch below).
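This in-RAM loading is presumably implemented with h5py's in-memory "core" driver, which --no_core_driver would turn off; a minimal sketch of the difference, with a hypothetical file name:

```python
import h5py

# With driver="core", the entire HDF5 file is read into RAM up front
# (fast random access, at the cost of ~30GB of memory); with the default
# driver, data is read from disk on demand.
use_core_driver = True  # what --no_core_driver turns off
f = h5py.File(
    "data/tvc_feature_release/video_feature/resnet_i3d.h5",  # hypothetical
    "r",
    driver="core" if use_core_driver else None,
)
```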
Training with the above config stops at around epoch 22 and takes around 7 hours on a single RTX 2080Ti GPU.
You should get ~47.2 CIDEr and ~11.4 BLEU@4 scores on the val set.
The resulting model and config will be saved in the directory baselines/multimodal_transformer/results/video_sub-res-*
After training, you can run inference with the saved model on the val or test_public set:
bash baselines/multimodal_transformer/scripts/translate.sh MODEL_DIR_NAME SPLIT_NAME
MODEL_DIR_NAME is the name of the directory containing the saved model, e.g., video_sub-res-*. SPLIT_NAME can be val or test_public.
The generated captions and evaluation scores for the val and test_public sets are in the directory our_results.
If you find this work helpful for your research, please consider citing:
@article{tu2022i2transformer,
title={I2Transformer: Intra- and Inter-relation Embedding Transformer for TV Show Captioning},
author={Tu, Yunbin and Li, Liang and Su, Li and Gao, Shengxiang and Yan, Chenggang and Zha, Zheng-Jun and Yu, Zhengtao and Huang, Qingming},
journal={IEEE Transactions on Image Processing},
year={2022},
publisher={IEEE}
}
My email is [email protected].
Any discussions and suggestions are welcome!
This work and code are inspired by TVCaption; thanks to its authors for their solid work!