This project generates captions from images. It was built as part of the Udacity Computer Vision Nanodegree program.
In this project, the Microsoft Common Objects in Context (MS COCO) dataset is used. It is a large-scale dataset for scene understanding, commonly used to train and benchmark object detection, segmentation, and captioning algorithms.
You can read more about the dataset on the website or in the research paper.
To explore the dataset and prepare it for the project, please see the 0_Dataset.ipynb file.
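Once the dataset is in place, the notebooks point the COCO API at the image directory and the caption annotation file. A minimal sketch of how those paths are assembled, assuming the /opt/cocoapi root and split names from the directory tree below (the `coco_paths` helper is illustrative, not part of the repo):

```python
import os

def coco_paths(root="/opt/cocoapi", split="train2014"):
    """Build the image-directory and caption-annotation paths for one split.

    The root and split names are assumptions matching the directory tree
    in this README; captions_<split>.json is the standard MS COCO naming.
    """
    img_dir = os.path.join(root, "images", split)
    ann_file = os.path.join(root, "annotations", f"captions_{split}.json")
    return img_dir, ann_file

# The annotation file would then be handed to pycocotools, e.g.:
#   from pycocotools.coco import COCO
#   coco = COCO(ann_file)
img_dir, ann_file = coco_paths()
```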
To use the dataset, please follow the cocoapi instructions: download the full dataset (images + annotations) and maintain the dataset directory structure as follows:
Directory Tree of Dataset
├───opt
│ └───cocoapi
│ ├───annotations
│ └───images
│ ├───test2014
│ ├───train2014
│ └───val2014
After installing the API, please run "make" under coco/PythonAPI. For details, please see the 1_Preliminaries.ipynb file.
Next, install the nltk Python package in the environment by running pip install nltk.
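nltk is used to tokenize the caption text before building the vocabulary. The notebooks rely on nltk's `word_tokenize` after lowercasing; the sketch below shows the shape of that preprocessing with a plain `str.split` stand-in so it runs without nltk's downloaded data, and the `<start>`/`<end>` sentinel tokens are an assumption about the vocabulary setup:

```python
def caption_to_tokens(caption):
    """Turn a raw caption into a token list framed by sentinel tokens.

    The real notebooks use nltk.tokenize.word_tokenize here; str.split is
    a stand-in so this sketch runs without nltk data. The <start>/<end>
    tokens are assumed vocabulary sentinels, not quoted from the repo.
    """
    tokens = caption.lower().split()
    return ["<start>"] + tokens + ["<end>"]
```

The token lists are then mapped to integer indices by a vocabulary before being fed to the decoder.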
To learn about the CNN-RNN model architecture, please see the model.py file.
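The model pairs a CNN encoder (typically a pretrained ResNet in the Udacity template) with an LSTM decoder. A minimal sketch of the decoder half, assuming the layer names and the common convention of prepending the image feature vector as the first LSTM input (both assumptions, not the exact contents of model.py):

```python
import torch
import torch.nn as nn

class DecoderRNN(nn.Module):
    """Sketch of an LSTM caption decoder (names/details are assumptions)."""

    def __init__(self, embed_size, hidden_size, vocab_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, features, captions):
        # Drop the last caption token, embed the rest, and prepend the
        # image feature vector as time step 0 of the LSTM input.
        emb = self.embed(captions[:, :-1])
        inputs = torch.cat([features.unsqueeze(1), emb], dim=1)
        out, _ = self.lstm(inputs)
        # Project hidden states to vocabulary scores at every time step.
        return self.fc(out)
```

With a feature batch of shape (batch, embed_size) and captions of shape (batch, seq_len), the output scores have shape (batch, seq_len, vocab_size).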
To learn about the training parameters and optimizer, please see the 2_Training.ipynb file.
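Training compares the decoder's per-step vocabulary scores against the ground-truth caption indices. A common setup for this (an assumption here, not a quote from 2_Training.ipynb) is cross-entropy with batch and time flattened together:

```python
import torch
import torch.nn as nn

vocab_size = 20
# Fake decoder scores for a batch of 2 captions, 5 time steps each;
# requires_grad stands in for a real model's trainable parameters.
outputs = torch.randn(2, 5, vocab_size, requires_grad=True)
captions = torch.randint(0, vocab_size, (2, 5))

# CrossEntropyLoss expects (N, C) scores and (N,) integer targets,
# so both tensors are flattened across batch and time before the call.
criterion = nn.CrossEntropyLoss()
loss = criterion(outputs.view(-1, vocab_size), captions.view(-1))
loss.backward()
```

In the notebook this loss would drive an optimizer step (Adam is a common choice) inside the epoch loop.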
To run inference with the trained CNN-RNN model on the test dataset, please see the 3_inference.ipynb file.
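At inference time the decoder generates a caption one word at a time, feeding each predicted word back in as the next input. A greedy-decoding sketch of that loop, assuming the layer shapes from the decoder sketch above and a hypothetical `end_idx` for the end-of-caption token (the notebook's actual `sample()` method may differ):

```python
import torch
import torch.nn as nn

def greedy_sample(lstm, embed, fc, features, max_len=20, end_idx=1):
    """Greedily decode word indices from an image feature vector.

    Layer names, max_len, and end_idx are illustrative assumptions.
    """
    inputs, states, ids = features.unsqueeze(1), None, []
    for _ in range(max_len):
        out, states = lstm(inputs, states)          # one LSTM step
        word = fc(out.squeeze(1)).argmax(dim=1)     # most likely next word
        ids.append(word.item())
        if word.item() == end_idx:                  # stop at <end>
            break
        inputs = embed(word).unsqueeze(1)           # feed prediction back in
    return ids
```

The resulting index list is then mapped back through the vocabulary to produce the caption text.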