Benchmarking AoANet on VizWiz-Captions

This repository includes the code for benchmarking Attention on Attention for Image Captioning on VizWiz-Captions.

Requirements

  • Python 3.6
  • Java 1.8.0
  • PyTorch 1.0
  • tensorboardX

Training AoANet

Prepare data

See details in data/README.md. We combine the train and val splits of VizWiz-Captions for training.

Training from scratch

$ CUDA_VISIBLE_DEVICES=0 sh train_vizwiz.sh

See opts.py for the options. You can also download our trained model here.

Fine-tuning models pretrained on MSCOCO-Captions

Download the pretrained models (log_aoanet_rl) from here.

Then run:

$ CUDA_VISIBLE_DEVICES=0 sh finetune_vizwiz.sh

Evaluation

Generate predictions for the test split using the model pretrained on MSCOCO-Captions.

$ CUDA_VISIBLE_DEVICES=0 sh eval_pretrained.sh

Generate predictions for the test split using the model trained from scratch.

$ CUDA_VISIBLE_DEVICES=0 sh eval_scratch.sh

Generate predictions for the test split using the fine-tuned model.

$ CUDA_VISIBLE_DEVICES=0 sh eval_finetune.sh

The results are saved in vis/.

Performance

Upload the generated results in vis/ to the evaluation server to evaluate on the test split. The scores of the model trained from scratch are listed below.

Model         Bleu-1  Bleu-2  Bleu-3  Bleu-4  ROUGE-L  METEOR  SPICE  CIDEr
from_scratch  65.91   47.77   33.68   23.41   46.56    20.00   15.11  59.77

Reference

If you find this repo helpful, please consider citing:

@article{gurari2020captioning,
  title={Captioning Images Taken by People Who Are Blind},
  author={Gurari, Danna and Zhao, Yinan and Zhang, Meng and Bhattacharya, Nilavra},
  journal={arXiv preprint arXiv:2002.08565},
  year={2020}
}

@inproceedings{huang2019attention,
  title={Attention on Attention for Image Captioning},
  author={Huang, Lun and Wang, Wenmin and Chen, Jie and Wei, Xiao-Yong},
  booktitle={International Conference on Computer Vision},
  year={2019}
}

Contact

Contact Yinan Zhao ([email protected]) with any questions.

Acknowledgements

This repository is based on AoANet, and you may refer to it for more details about the code.

aoanet_vizwiz's People

Contributors

ojassm, yinan-zhao

aoanet_vizwiz's Issues

Data loader is reset during validation

@Yinan-Zhao Hi, when I run train_vizwiz.sh, the epoch number does not increase during self-critical learning. I think this is caused by the data loader reset that happens during validation.
The same data loader is used for both training and validation.
With a batch size of 10 and 30408 images (train + val), one pass over the data takes about 3041 iterations. When iteration % 3000 == 0 (3000 is the value of save_checkpoint_every), the data loader has not yet reached the end of the dataset and is reset by "loader.reset_iterator(split)". The condition 'if data['bounds']['wrapped']:' is therefore never satisfied, and self-critical learning never stops.
The first stage of training does stop, because 'save_checkpoint_every' is 6000 there.
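
The arithmetic in this report can be checked with a small simulation. This is a minimal sketch, not the repository's loader: it assumes a single shared read position that validation resets to zero, as the issue describes, and count_epochs is a hypothetical helper written only for illustration.

```python
def count_epochs(num_images, batch_size, save_every, total_iters):
    """Count how often a shared loader wraps when validation resets it.

    Simplified model (an assumption, not the repository code): training
    advances one read position, and every `save_every` iterations the
    validation step resets that position to the start.
    """
    position = 0  # index of the next image in the shared loader
    epochs = 0
    for iteration in range(1, total_iters + 1):
        position += batch_size
        if position >= num_images:  # loader reached the end ("wrapped")
            position = 0
            epochs += 1
        if iteration % save_every == 0:  # validation resets the loader
            position = 0
    return epochs


# Numbers taken from the issue: 30408 images, batch size 10.
print(count_epochs(30408, 10, 3000, 12000))  # reset every 3000 iterations
print(count_epochs(30408, 10, 6000, 12000))  # reset every 6000 iterations
```

Under these assumptions the first call never sees a wrap, because the reset at iteration 3000 arrives before the roughly 3041 iterations a full pass needs, while the second call still wraps between resets.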

Training Process

I started training the BUTD model on VizWiz data, and the run has been on epoch 24 for the last 2 days. I assume it is running some validation process:

iter 215850 (epoch 24), avg_reward = -0.650, time/batch = 0.534
Read data: 0.012876033783
Cider scores: 1.6170099658743349

The Cider score does not look very high. Am I doing something wrong?
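
To tell whether a run like this is still advancing, it can help to pull the iteration, epoch, and reward out of the log lines and compare them over time. A minimal sketch, assuming the log format matches the line quoted in this issue; parse_train_log_line is a hypothetical helper and not part of the repository.

```python
import re

# Pattern for lines such as:
# iter 215850 (epoch 24), avg_reward = -0.650, time/batch = 0.534
LOG_PATTERN = re.compile(
    r"iter (\d+) \(epoch (\d+)\), avg_reward = (-?\d+\.\d+), time/batch = (\d+\.\d+)"
)


def parse_train_log_line(line):
    """Return the fields of one training log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    return {
        "iteration": int(match.group(1)),
        "epoch": int(match.group(2)),
        "avg_reward": float(match.group(3)),
        "time_per_batch": float(match.group(4)),
    }


record = parse_train_log_line(
    "iter 215850 (epoch 24), avg_reward = -0.650, time/batch = 0.534"
)
print(record)
```

If the iteration keeps growing while the epoch stays fixed across many lines, the run is training rather than validating, which would be consistent with the data loader reset described in the issue above.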

Cannot download the COCO vocabulary

Hello, the data/README file gives a download link for the COCO vocabulary, but the page returns a 502 error. Could you provide a new link?
