Git Product home page Git Product logo

video2language's Introduction

V2L-MSVD

Generating video descriptions using deep learning in Keras

Start with AWS Ubuntu Deep Learning AMI on a EC2 p2.xlarge instance. (or better, p2.xlarge costs $0.9/hour on-demand and ~$0.3/hour as a spot instance)

source activate tensorflow_p27
conda install scikit-learn
conda install scikit-image

If you are not using AWS, ensure you have a recent version of Keras and Tensorflow installed and working, and also install scikit-learn and scikit-image if you want to train tag prediction models

git clone https://github.com/rohit-gupta/V2L-MSVD.git
cd V2L-MSVD

Using a pre-trained video captioning model

Use a video from YouTube

bash fetch-pretrained-model.sh
sudo bash install-youtube-dl.sh
bash fetch-youtube-video.sh https://www.youtube.com/watch?v=cKWuNQAy2Sk
bash process-youtube-video.sh 

Use a video from your local disk

bash fetch-pretrained-model.sh
bash fetch-from-localpath.sh /home/ubuntu/vid1.mp4
bash process-youtube-video.sh 

Training your own video captioning model

Download data: should take about 2 minutes

bash fetch-data.sh

Preprocess text data: ETA ~5 minutes

If you only want to use Verified descriptions ->

bash preprocess-data.sh CleanOnly 

If you want to use both verified and unverified descriptions ->

bash preprocess-data.sh

Extract frames from the Videos: ETA ~30 minutes

bash extract_frames.sh

Extract Video Features: ETA ~15 Minutes

bash run-feature-extractor.sh

Tag Model: ETA ~5 Minutes

bash train-simple-tag-prediction-model.sh

Train Language Model: ETA ~50 minutes (Can be killed around ~25 minutes after 5 Epochs)

bash train-language-model.sh

Score Language Model: ETA ~5 minutes

bash score-language-model.sh

Known Issues

  • If at any stage you get an error that contains
/lib/libstdc++.so.6: version `CXXABI_1.3.x' not found

You can fix it with:

cd ~/anaconda3/envs/tensorflow_p27/lib && mv libstdc++.a stdcpp_bkp && mv libstdc++.so stdcpp_bkp && mv libstdc++.so.6 stdcpp_bkp && mv libstdc++.so.6.0.19 stdcpp_bkp/  && mv libstdc++.so.6.0.19-gdb.py stdcpp_bkp/  && mv libstdc++.so.6.0.21 stdcpp_bkp/  && mv libstdc++.so.6.0.24 stdcpp_bkp/ && cd -
  • Tensorflow 1.3 has a memory leak bug that might affect this code

You can fix it by upgrading Tensorflow.

Reference for this problem: rohit-gupta#3

Results

The video captioning model here uses Mean Pooled ResNet50 features of video frames along with Object, Action and Attribute tags predicted by a simple feedforward network.

The Table below compares the performance of our model with some other models that also rely on mean pooled frame features. It is sourced from papers 1, 2 and 3.

Model METEOR score on MSVD
Mean Pooled (AlexNet Features) 26.9
Mean Pooled (VGG Features) 27.7
Mean Pooled (GoogleNet Features) 28.7
Ours (Mean Pooled ResNet50 Features + Predicted Tags) 29.0

Language Model

video2language's People

Contributors

rohit-gupta avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.