This project is the implemention of Google's show and tell image captioning model with Flickr8K dataset instead of MSCOCO. We also fine-tune model for different hyperparameters and experiment with different sequence modeling, i.e. using Gated Recurrent Units instead of LSTM, RNN, Bi-directional RNN. For Image model - Convolutional neural network, we use inception checkpoint provided by Google.
This is work in progress, we want to extend this work to perform video description generation from input video frames. Team -
- Ankit Rai
- Gizem Tabak
- Safa Messaoud
- Tarek Elgamal