This project generates captions from images. It was built as part of the Udacity Computer Vision Nanodegree program.
In this project, the Microsoft Common Objects in Context (MS COCO) dataset is used. It is a large-scale dataset for scene understanding, commonly used to train and benchmark object detection, segmentation, and captioning algorithms.
You can read more about the dataset on the website or in the research paper.
To explore the dataset and prepare it for the project, please see the 0_Dataset.ipynb file.
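Once the dataset is in place, the notebooks point the COCO API at the image directory and the caption annotation file. A minimal sketch of how those paths are assembled, assuming the /opt/cocoapi root and split names from the directory tree below (the `coco_paths` helper is illustrative, not part of the repo):

```python
import os

def coco_paths(root="/opt/cocoapi", split="train2014"):
    """Build the image-directory and caption-annotation paths for one split.

    The root and split names are assumptions matching the directory tree
    in this README; captions_<split>.json is the standard MS COCO naming.
    """
    img_dir = os.path.join(root, "images", split)
    ann_file = os.path.join(root, "annotations", f"captions_{split}.json")
    return img_dir, ann_file

# The annotation file would then be handed to pycocotools, e.g.:
#   from pycocotools.coco import COCO
#   coco = COCO(ann_file)
img_dir, ann_file = coco_paths()
```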
To use the dataset, please follow the cocoapi instructions: download the full dataset (images + annotations) and maintain the dataset directory structure as follows:
Directory Tree of Dataset
├───opt
│ └───cocoapi
│ ├───annotations
│ └───images
│ ├───test2014
│ ├───train2014
│ └───val2014
After installing the API, please run "make" under coco/PythonAPI. For details, please see the 1_Preliminaries.ipynb file.
Next, install the nltk Python package in the environment by running pip install nltk.
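nltk is used to tokenize the caption text before building the vocabulary. The notebooks rely on nltk's `word_tokenize` after lowercasing; the sketch below shows the shape of that preprocessing with a plain `str.split` stand-in so it runs without nltk's downloaded data, and the `<start>`/`<end>` sentinel tokens are an assumption about the vocabulary setup:

```python
def caption_to_tokens(caption):
    """Turn a raw caption into a token list framed by sentinel tokens.

    The real notebooks use nltk.tokenize.word_tokenize here; str.split is
    a stand-in so this sketch runs without nltk data. The <start>/<end>
    tokens are assumed vocabulary sentinels, not quoted from the repo.
    """
    tokens = caption.lower().split()
    return ["<start>"] + tokens + ["<end>"]
```

The token lists are then mapped to integer indices by a vocabulary before being fed to the decoder.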
To learn about the CNN-RNN model architecture, please see the model.py file.
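The model pairs a CNN encoder (typically a pretrained ResNet in the Udacity template) with an LSTM decoder. A minimal sketch of the decoder half, assuming the layer names and the common convention of prepending the image feature vector as the first LSTM input (both assumptions, not the exact contents of model.py):

```python
import torch
import torch.nn as nn

class DecoderRNN(nn.Module):
    """Sketch of an LSTM caption decoder (names/details are assumptions)."""

    def __init__(self, embed_size, hidden_size, vocab_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, features, captions):
        # Drop the last caption token, embed the rest, and prepend the
        # image feature vector as time step 0 of the LSTM input.
        emb = self.embed(captions[:, :-1])
        inputs = torch.cat([features.unsqueeze(1), emb], dim=1)
        out, _ = self.lstm(inputs)
        # Project hidden states to vocabulary scores at every time step.
        return self.fc(out)
```

With a feature batch of shape (batch, embed_size) and captions of shape (batch, seq_len), the output scores have shape (batch, seq_len, vocab_size).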
To learn about the training parameters and optimizer, please see the 2_Training.ipynb file.
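Training compares the decoder's per-step vocabulary scores against the ground-truth caption indices. A common setup for this (an assumption here, not a quote from 2_Training.ipynb) is cross-entropy with batch and time flattened together:

```python
import torch
import torch.nn as nn

vocab_size = 20
# Fake decoder scores for a batch of 2 captions, 5 time steps each;
# requires_grad stands in for a real model's trainable parameters.
outputs = torch.randn(2, 5, vocab_size, requires_grad=True)
captions = torch.randint(0, vocab_size, (2, 5))

# CrossEntropyLoss expects (N, C) scores and (N,) integer targets,
# so both tensors are flattened across batch and time before the call.
criterion = nn.CrossEntropyLoss()
loss = criterion(outputs.view(-1, vocab_size), captions.view(-1))
loss.backward()
```

In the notebook this loss would drive an optimizer step (Adam is a common choice) inside the epoch loop.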
To run inference with the trained CNN-RNN model on the test dataset, please see the 3_inference.ipynb file.
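At inference time the decoder generates a caption one word at a time, feeding each predicted word back in as the next input. A greedy-decoding sketch of that loop, assuming the layer shapes from the decoder sketch above and a hypothetical `end_idx` for the end-of-caption token (the notebook's actual `sample()` method may differ):

```python
import torch
import torch.nn as nn

def greedy_sample(lstm, embed, fc, features, max_len=20, end_idx=1):
    """Greedily decode word indices from an image feature vector.

    Layer names, max_len, and end_idx are illustrative assumptions.
    """
    inputs, states, ids = features.unsqueeze(1), None, []
    for _ in range(max_len):
        out, states = lstm(inputs, states)          # one LSTM step
        word = fc(out.squeeze(1)).argmax(dim=1)     # most likely next word
        ids.append(word.item())
        if word.item() == end_idx:                  # stop at <end>
            break
        inputs = embed(word).unsqueeze(1)           # feed prediction back in
    return ids
```

The resulting index list is then mapped back through the vocabulary to produce the caption text.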