image-captioning's Introduction

Image-Captioning using InceptionV3 and Beam Search

Using the Flickr8k dataset, since its size is 1 GB. MS-COCO is 14 GB!

The code uses Keras with a TensorFlow backend. InceptionV3 is used for extracting the image features.
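
For reference, here is a minimal sketch of how the feature extraction could look. It is written against the Keras 2 applications API (the repo pins Keras 1.2.2, where the Model constructor takes input=/output= instead), and the encode helper name is mine, not the notebook's:

    import numpy as np
    from keras.applications.inception_v3 import InceptionV3, preprocess_input
    from keras.preprocessing import image
    from keras.models import Model

    base = InceptionV3(weights='imagenet')
    # Drop the final softmax; the penultimate (pooling) layer gives a
    # 2048-dimensional feature vector per image.
    encoder = Model(base.input, base.layers[-2].output)

    def encode(img_path):
        # InceptionV3 expects 299x299 RGB inputs scaled to [-1, 1].
        img = image.load_img(img_path, target_size=(299, 299))
        x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
        return encoder.predict(x)  # shape: (1, 2048)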

I am using beam search with k = 3, 5, and 7, as well as an argmax (greedy) search, for predicting the captions of the images.
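
As a rough illustration, a beam search decoder for this setup could look like the sketch below. The '<start>'/'<end>' token names, the word2idx/idx2word mappings, and the model's input signature (an image feature vector plus a zero-padded word-index sequence) are assumptions of mine; the notebook's actual implementation may differ:

    import numpy as np

    def beam_search_caption(model, photo, word2idx, idx2word, max_len=40, k=3):
        # Each beam entry is (token sequence, cumulative log-probability).
        beams = [([word2idx['<start>']], 0.0)]
        for _ in range(max_len):
            candidates = []
            for seq, score in beams:
                if idx2word[seq[-1]] == '<end>':
                    candidates.append((seq, score))  # finished caption
                    continue
                padded = np.pad(seq, (0, max_len - len(seq)), mode='constant')
                probs = model.predict([photo, padded[None, :]])[0]
                # Expand this beam with its k most probable next words.
                for idx in np.argsort(probs)[-k:]:
                    candidates.append((seq + [int(idx)],
                                       score + np.log(probs[idx] + 1e-12)))
            # Keep only the k best partial captions overall.
            beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:k]
        words = [idx2word[i] for i in max(beams, key=lambda c: c[1])[0]]
        # Strip the start token and anything from the end token onward.
        return ' '.join(words[1:words.index('<end>')] if '<end>' in words
                        else words[1:])

With k = 1 this degenerates into the argmax (greedy) search mentioned above.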

A loss value of 1.5987 has been achieved, which gives good results. You can check out some examples below; the rest of the examples are in the Jupyter notebook, which you can run to try out your own examples. unique.p is a pickle file which contains all the unique words in the vocabulary.
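
Loading that vocabulary is straightforward. A small sketch, assuming unique.p unpickles to a list of words (the mapping names below are illustrative, not taken from the notebook):

    import pickle

    # unique.p ships with the repo; assumed here to unpickle to a list of words.
    with open('unique.p', 'rb') as f:
        unique = pickle.load(f)

    word2idx = {w: i for i, w in enumerate(unique)}
    idx2word = {i: w for i, w in enumerate(unique)}
    print('vocabulary size:', len(unique))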

Everything is implemented in the Jupyter notebook, which will hopefully make the code easier to understand.

I have also written a blog post describing my experience while implementing this project. You can find it here.

You can download the weights here.
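
Once the caption model from the notebook is built, loading the downloaded weights is a one-liner. A sketch, assuming model is that Keras model (the filename matches the one discussed in the issues below):

    # Hypothetical usage: `model` is the caption model built in the notebook.
    model.load_weights('time_inceptionV3_1.5987_loss.h5')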

Examples

"first2" "second2" "third" "last1"

Dependencies

  • Keras 1.2.2
  • TensorFlow 0.12.1
  • tqdm
  • numpy
  • pandas
  • matplotlib
  • pickle
  • PIL
  • glob

References

[1] M. Hodosh, P. Young, and J. Hockenmaier (2013), "Framing Image Description as a Ranking Task: Data, Models and Evaluation Metrics", Journal of Artificial Intelligence Research, Volume 47, pages 853-899. http://www.jair.org/papers/paper3994.html

[2] O. Vinyals, A. Toshev, S. Bengio, and D. Erhan, "Show and Tell: A Neural Image Caption Generator".

[3] CS231n Winter 2016, Lecture 10: Recurrent Neural Networks, Image Captioning and LSTM. https://youtu.be/cO0a0QYmFm8?t=32m25s

image-captioning's Issues

Poor performance with time_inceptionV3_1.5987_loss.h5

When I try your weights file 'time_inceptionV3_1.5987_loss.h5', the performance of the image-captioning model is poor.

Taking the basketball image as an example, the output is as follows:
"""Normal Max search: man ."""

Have you met this problem before?

PS: My Keras version is 2.1.2.

============================
I also loaded these weights on Keras 1.2.2 and TensorFlow 0.12.1, but it failed again. It seems the weights file 'time_inceptionV3_1.5987_loss.h5' is useless.

Garbage output using the given weights

(screenshot of the garbled caption output)

@yashk2810 I tried creating an inference pipeline using the weights and the dictionary given in the repo, and got the output shown in the screenshot above. Can you please help me figure out what I can do to fix this?

Weird output, using weights from repo

Hi there,

Thank you so much for the Jupyter version of image captioning!

I wanted to use your pre-trained model to generate captions, so I skipped the parts where you generate the weights and instead loaded the weights from the repository. But the output captions look really weird, like this:

(screenshot of the weird output captions)

I got a warning about the Keras merge layer:

(screenshot of the Keras merge warning)

Does this have anything to do with the weird output?

I am a beginner in deep learning, and I am trying to generate captions for arbitrary images for my final project in my NLP class. Any suggestions would be welcome!

Thanks!!!

Flickr8k Dataset

How do you download the Flickr8k dataset and arrange it into the different folders? The download link provides each image separately.

beam search

Hi,
When I use beam search in my seq2seq model, the performance is worse than with greedy search. Have you ever encountered a similar situation?

Wrong way to persuade readers.

I feel that, rather than tuning parameters to train the model to a loss of 1.59, you have just named the weights file 1.59. This is very clear from the output template in your ipynb notebook.
