yinan-zhao / aoanet_vizwiz

License: MIT License

Benchmarking AoANet on VizWiz-Captions

This repository includes the code for benchmarking Attention on Attention for Image Captioning on VizWiz-Captions.

Requirements

Python 3.6
Java 1.8.0
PyTorch 1.0
tensorboardX

Training AoANet

Prepare data

See details in data/README.md. We combine both the train and val split of VizWiz-Captions for training.
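As a rough illustration of what combining the two splits could look like, here is a minimal sketch. It assumes the annotations are COCO-style dictionaries with "images" and "annotations" lists. The helper name merge_caption_splits and the toy data are hypothetical and are not part of this repository. data/README.md remains the authoritative description of the preprocessing.

```python
def merge_caption_splits(train, val):
    """Concatenate two COCO-style caption dicts (hypothetical helper).

    Assumes each dict has an "images" list and an "annotations" list,
    and that image ids are unique across the two splits.
    """
    train_ids = {image["id"] for image in train["images"]}
    val_ids = {image["id"] for image in val["images"]}
    clashes = train_ids & val_ids
    if clashes:
        # Refuse to merge rather than silently mixing up captions.
        raise ValueError(f"image ids appear in both splits: {sorted(clashes)[:5]}")
    return {
        "images": train["images"] + val["images"],
        "annotations": train["annotations"] + val["annotations"],
    }


# Tiny in-memory stand-ins for the real annotation files (made-up data).
toy_train = {
    "images": [{"id": 0, "file_name": "train_0.jpg"}],
    "annotations": [{"image_id": 0, "caption": "a can of soup"}],
}
toy_val = {
    "images": [{"id": 1, "file_name": "val_1.jpg"}],
    "annotations": [{"image_id": 1, "caption": "a bottle of shampoo"}],
}
combined = merge_caption_splits(toy_train, toy_val)
```

With real data, the two dictionaries would be loaded with json.load from the annotation files described in data/README.md.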

Training from scratch

$ CUDA_VISIBLE_DEVICES=0 sh train_vizwiz.sh

See opts.py for the options. You can also download our trained model here.

Fine-tuning models pretrained on MSCOCO-Captions

Download the pretrained models (log_aoanet_rl) from here.

Then run:

$ CUDA_VISIBLE_DEVICES=0 sh finetune_vizwiz.sh

Evaluation

Generate predictions for the test split using the model pretrained on MSCOCO-Captions.

$ CUDA_VISIBLE_DEVICES=0 sh eval_pretrained.sh

Generate predictions for the test split using the model trained from scratch.

$ CUDA_VISIBLE_DEVICES=0 sh eval_scratch.sh

Generate predictions for the test split using the fine-tuned model.

$ CUDA_VISIBLE_DEVICES=0 sh eval_finetune.sh

The results are saved in vis/.

Performance

Upload the generated results in vis/ to the evaluation server to evaluate on the test split. The scores of the model trained from scratch are listed below.

Model	Bleu-1	Bleu-2	Bleu-3	Bleu-4	ROUGE-L	METEOR	SPICE	CIDEr
from_scratch	65.91	47.77	33.68	23.41	46.56	20.00	15.11	59.77

Reference

If you find this repo helpful, please consider citing:

@article{gurari2020captioning,
  title={Captioning Images Taken by People Who Are Blind},
  author={Gurari, Danna and Zhao, Yinan and Zhang, Meng and Bhattacharya, Nilavra},
  journal={arXiv preprint arXiv:2002.08565},
  year={2020}
}

@inproceedings{huang2019attention,
  title={Attention on Attention for Image Captioning},
  author={Huang, Lun and Wang, Wenmin and Chen, Jie and Wei, Xiao-Yong},
  booktitle={International Conference on Computer Vision},
  year={2019}
}

Contact

Contact Yinan Zhao ([email protected]) with any questions.

Acknowledgements

This repository is based on AoANet, and you may refer to it for more details about the code.

aoanet_vizwiz's Issues

How to test one image from model?

Hi. I have already trained a model from scratch, and I want to test it on a single image to see the result. How can I do this?

Data loader is reset during validation

@Yinan-Zhao Hi, when I run train_vizwiz.sh, the epoch number does not increase during self-critical learning. I think this is caused by the data loader reset that happens during validation.
The same data loader is used for both training and validation.
With a batch size of 10 and 30408 images (train + val), one pass over the data takes 3041 iterations. When iteration % 3000 == 0 (3000 is the value of save_checkpoint_every), the data loader has not yet reached the end of the dataset, and it is reset by "loader.reset_iterator(split)". The condition 'if data['bounds']['wrapped']:' is therefore never satisfied, and self-critical learning never stops.
The first stage of training does stop, because save_checkpoint_every is 6000 there.
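
The arithmetic in this report can be checked with a toy model. The sketch below is not the repository's data loader. It is a simplified stand-in, assuming (as the issue describes) that a single read position is shared by training and validation and is reset to the start every reset_every iterations. The function name first_wrap_iteration and its parameters are hypothetical.

```python
def first_wrap_iteration(num_images, batch_size, reset_every, max_iters):
    """Return the first iteration at which the toy loader wraps, or None.

    The loader "wraps" when the read position reaches the end of the
    dataset. Every `reset_every` iterations the position is reset to 0,
    mimicking a validation pass that resets the shared loader.
    """
    position = 0
    for iteration in range(1, max_iters + 1):
        position += batch_size
        if position >= num_images:
            return iteration  # the 'wrapped' flag would be set here
        if iteration % reset_every == 0:
            position = 0  # reset triggered by the checkpoint/validation step
    return None


if __name__ == "__main__":
    # Self-critical stage from the issue: reset every 3000 iterations.
    print(first_wrap_iteration(30408, 10, 3000, 20000))  # prints None
    # First stage from the issue: reset every 6000 iterations.
    print(first_wrap_iteration(30408, 10, 6000, 20000))  # prints 3041
```

Under these assumptions the loader needs 3041 uninterrupted iterations to finish one pass, so a reset every 3000 iterations prevents it from ever wrapping, while a reset every 6000 iterations does not.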

Training Process

I am training the BUTD model on VizWiz data, and the process has been on the 24th epoch for the last 2 days. I assume it is running some kind of validation step:

iter 215850 (epoch 24), avg_reward = -0.650, time/batch = 0.534
Read data: 0.012876033783
Cider scores: 1.6170099658743349

The Cider score does not look very high. Am I doing something wrong?

Cannot download the COCO vocabulary

Hello, the data/README file gives a download link for the COCO vocabulary, but the page returns a 502 error. Could you provide a new link?

Link to the bottom-up image features is not available

Hi, the link to the pre-extracted bottom-up features is not available. Could you please provide it again? Thanks!
