TensorFlow implementation of Towards Generating Stylized Image Captions via Adversarial Training.
If you use our code or models, please cite our paper:
@inproceedings{nezami2019towards,
title={Towards Generating Stylized Image Captions via Adversarial Training},
author={Nezami, Omid Mohamad and Dras, Mark and Wan, Stephen and Paris, C{\'e}cile and Hamey, Len},
booktitle={Pacific Rim International Conference on Artificial Intelligence},
pages={270--284},
year={2019},
organization={Springer}
}
We pretrain our models (both the generator and the discriminator) on the Microsoft COCO Dataset, then train them on the SentiCap Dataset.
- Python 2.7.12
- Numpy 1.15.2
- Hickle
- Python-skimage
- Tensorflow 1.8.0
- Download the Microsoft COCO Dataset, which includes neutral image caption data, and the SentiCap Dataset, which includes sentiment-bearing image caption data.
- Resize the downloaded images to [224, 224] and put them in "./images".
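The resizing step can be sketched with scikit-image (a listed dependency). This is a minimal illustration, not the repo's own script; `resize_images` and the source directory argument are hypothetical names:

```python
import os

import numpy as np
from skimage.io import imread, imsave
from skimage.transform import resize


def resize_images(src_dir, dst_dir="./images", size=(224, 224)):
    """Resize every image in src_dir to `size` and write it to dst_dir."""
    if not os.path.exists(dst_dir):
        os.makedirs(dst_dir)
    for name in os.listdir(src_dir):
        img = imread(os.path.join(src_dir, name))
        # skimage's resize() returns floats in [0, 1];
        # convert back to uint8 before saving
        out = (resize(img, size) * 255).astype(np.uint8)
        imsave(os.path.join(dst_dir, name), out)
```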
- Preprocess the COCO image caption data and place it in "./data/neutral". You can do this with prepro.py and the ResNet-152 network trained on ImageNet, generating a [7, 7, 2048] feature map per image (we use the Res5c layer of the network).
- Preprocess the SentiCap image caption data and place its positive part in "./data/positive" and its negative part in "./data/negative" (similar to the previous step).
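For attention-based captioning, a [7, 7, 2048] conv feature map is typically flattened into 49 annotation vectors of dimension 2048, one per spatial location. A small numpy sketch (the helper name is hypothetical):

```python
import numpy as np


def flatten_feature_map(feat):
    """Flatten an [H, W, C] conv feature map into [H*W, C] annotation
    vectors, one per spatial location, for the attention module."""
    h, w, c = feat.shape
    return feat.reshape(h * w, c)


# e.g. a dummy Res5c-style feature map
ann = flatten_feature_map(np.zeros((7, 7, 2048), dtype=np.float32))
# ann.shape == (49, 2048)
```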
- Pretrain the generator using "./data/neutral".
# only activate the first training loop in "solver_WGAN.py" by specifying the number of epochs
python model_train.py
- Pretrain the discriminator using "./data/neutral".
# only activate the second training loop in "solver_WGAN.py" by specifying the number of epochs
python model_train.py
- Train the generator and the discriminator using "./data/positive" for the positive part and "./data/negative" for the negative part.
# only activate the third training loop in "solver_WGAN.py" by specifying the number of epochs
python model_train.py
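The adversarial phase uses a Wasserstein GAN objective (hence "solver_WGAN.py"). A hedged numpy sketch of the standard WGAN losses with placeholder critic scores; this illustrates the objective, not the repo's exact implementation:

```python
import numpy as np


def wgan_losses(d_real, d_fake):
    """Standard WGAN objectives, written as losses to minimize:
    the critic maximizes D(real) - D(fake), and the generator
    maximizes D(fake)."""
    d_loss = np.mean(d_fake) - np.mean(d_real)
    g_loss = -np.mean(d_fake)
    return d_loss, g_loss


# placeholder critic scores for real and generated captions
d_loss, g_loss = wgan_losses(np.array([1.0, 2.0]), np.array([0.5, 0.5]))
# d_loss = 0.5 - 1.5 = -1.0, g_loss = -0.5
```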
- Add your trained model to "./models".
- Run the test script
python model_test.py
| | BLEU-1 | BLEU-4 | METEOR | ROUGE-L | CIDEr | SPICE |
| --- | --- | --- | --- | --- | --- | --- |
| ATTEND-GAN | 56.55% | 13.05% | 18.35% | 44.45% | 62.85% | 16.05% |
ATTEND-GAN is inspired by Self-critical Sequence Training and SeqGAN in TensorFlow.