The image-caption-generation from sourav-raj

image-caption-generation's Introduction

Image Captioning :

Image Captioning is the process of generating textual description of an image. It uses both Natural Language Processing and Computer Vision to generate the captions. The dataset will be in the form [image → captions]. The dataset consists of input images and their corresponding output captions.

Encoder:

The Convolutional Neural Network(CNN) can be thought of as an encoder. The input image is given to CNN to extract the features. The last hidden state of the CNN is connected to the Decoder.

Decoder:

The Decoder is a Recurrent Neural Network(RNN) which does language modelling up to the word level. The first time step receives the encoded output from the encoder and also the vector.

Model Building

For this, Pretrained Resnet-50 model which is trained on ImageNet dataset (available publicly on google) is used for image feature extraction. 5 layered LSTM layer model is used for image caption generation.

Recommend Projects

sourav-raj / image-caption-generation Goto Github PK

image-caption-generation's Introduction

Image Captioning :

Encoder:

Decoder:

Model Building

image-caption-generation's People

Contributors

Stargazers

Watchers

Forkers

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent