
video2language's Introduction

V2L-MSVD

Generating video descriptions using deep learning in Keras

Start with the AWS Ubuntu Deep Learning AMI on an EC2 p2.xlarge instance or better. (A p2.xlarge costs $0.9/hour on-demand and ~$0.3/hour as a spot instance.)

source activate tensorflow_p27
conda install scikit-learn
conda install scikit-image

If you are not using AWS, make sure a recent version of Keras and TensorFlow is installed and working. To train tag prediction models, also install scikit-learn and scikit-image.
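A quick way to confirm the environment before running any of the scripts is to try importing each dependency. The snippet below is a minimal sketch, not part of the repository: the function name `missing_modules` and the two lists are illustrative, and it assumes the packages are importable as `keras`, `tensorflow`, `sklearn` and `skimage`.

```python
import importlib

# Import names of the packages mentioned above (assumed, not taken from the repo).
REQUIRED = ["keras", "tensorflow"]
TAG_MODEL_EXTRAS = ["sklearn", "skimage"]  # scikit-learn and scikit-image


def missing_modules(names):
    """Return the subset of `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    for label, names in [("required", REQUIRED), ("tag model", TAG_MODEL_EXTRAS)]:
        print("%s: missing %s" % (label, missing_modules(names) or "nothing"))
```

Run it inside the same environment (for example after `source activate tensorflow_p27`) so that the check sees the interpreter the scripts will use.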

git clone https://github.com/rohit-gupta/V2L-MSVD.git
cd V2L-MSVD

Using a pre-trained video captioning model

Use a video from YouTube

bash fetch-pretrained-model.sh
sudo bash install-youtube-dl.sh
bash fetch-youtube-video.sh https://www.youtube.com/watch?v=cKWuNQAy2Sk
bash process-youtube-video.sh 

Use a video from your local disk

bash fetch-pretrained-model.sh
bash fetch-from-localpath.sh /home/ubuntu/vid1.mp4
bash process-youtube-video.sh 
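The two variants above differ only in the fetch step. As a rough sketch, a small Python wrapper could pick the fetch script from the kind of source it is given. The helper names `caption_commands` and `run` are hypothetical, the URL test is a simple prefix check, and the one-time `install-youtube-dl.sh` step is left out because it needs sudo.

```python
import subprocess


def caption_commands(source):
    """Build the command sequence for captioning one video.

    `source` is either a YouTube URL or a path on the local disk.
    """
    is_url = source.startswith("http://") or source.startswith("https://")
    fetch_script = "fetch-youtube-video.sh" if is_url else "fetch-from-localpath.sh"
    return [
        ["bash", "fetch-pretrained-model.sh"],
        ["bash", fetch_script, source],
        ["bash", "process-youtube-video.sh"],
    ]


def run(commands, dry_run=True):
    """Print each command and, unless dry_run is set, execute it from the repo root."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.check_call(argv)  # raises CalledProcessError on failure


if __name__ == "__main__":
    run(caption_commands("/home/ubuntu/vid1.mp4"), dry_run=True)
```

With `dry_run=True` the wrapper only prints the commands, which makes it easy to check the sequence before anything is downloaded.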

Training your own video captioning model

Download data: should take about 2 minutes

bash fetch-data.sh

Preprocess text data: ETA ~5 minutes

If you only want to use verified descriptions:

bash preprocess-data.sh CleanOnly 

If you want to use both verified and unverified descriptions:

bash preprocess-data.sh

Extract frames from the Videos: ETA ~30 minutes

bash extract_frames.sh

Extract Video Features: ETA ~15 Minutes

bash run-feature-extractor.sh

Tag Model: ETA ~5 Minutes

bash train-simple-tag-prediction-model.sh

Train Language Model: ETA ~50 minutes (Can be killed around ~25 minutes after 5 Epochs)

bash train-language-model.sh

Score Language Model: ETA ~5 minutes

bash score-language-model.sh
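The training steps above can be chained into a single run. The following is a sketch under stated assumptions: the script names and ETAs are copied from the steps above, while the names `training_stages` and `run_pipeline` and the dry-run behaviour are illustrative and not part of the repository.

```python
import subprocess


def training_stages(clean_only=False):
    """Return (description, argv, eta_minutes) for each training step, in order."""
    preprocess = ["bash", "preprocess-data.sh"]
    if clean_only:
        preprocess.append("CleanOnly")  # use verified descriptions only
    return [
        ("Download data", ["bash", "fetch-data.sh"], 2),
        ("Preprocess text data", preprocess, 5),
        ("Extract frames", ["bash", "extract_frames.sh"], 30),
        ("Extract video features", ["bash", "run-feature-extractor.sh"], 15),
        ("Train tag model", ["bash", "train-simple-tag-prediction-model.sh"], 5),
        ("Train language model", ["bash", "train-language-model.sh"], 50),
        ("Score language model", ["bash", "score-language-model.sh"], 5),
    ]


def run_pipeline(stages, dry_run=True):
    """Run the stages in order, stopping at the first failing script."""
    for description, argv, eta in stages:
        print("%s (ETA ~%d min): %s" % (description, eta, " ".join(argv)))
        if not dry_run:
            subprocess.check_call(argv)


if __name__ == "__main__":
    stages = training_stages(clean_only=True)
    print("Total ETA: ~%d minutes" % sum(eta for _, _, eta in stages))
    run_pipeline(stages, dry_run=True)
```

Summing the ETAs listed above gives roughly 112 minutes for a full run, or about 87 minutes if language model training is stopped after ~25 minutes.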

Known Issues

  • If at any stage you get an error that contains
/lib/libstdc++.so.6: version `CXXABI_1.3.x' not found

You can fix it by moving the libstdc++ files bundled with the conda environment into a backup directory:

cd ~/anaconda3/envs/tensorflow_p27/lib && mkdir -p stdcpp_bkp && mv libstdc++.a stdcpp_bkp/ && mv libstdc++.so stdcpp_bkp/ && mv libstdc++.so.6 stdcpp_bkp/ && mv libstdc++.so.6.0.19 stdcpp_bkp/ && mv libstdc++.so.6.0.19-gdb.py stdcpp_bkp/ && mv libstdc++.so.6.0.21 stdcpp_bkp/ && mv libstdc++.so.6.0.24 stdcpp_bkp/ && cd -
  • TensorFlow 1.3 has a memory leak bug that might affect this code.

You can fix it by upgrading TensorFlow.

Reference for this problem: #3
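The libstdc++ workaround in the first item lists each file by name, so it stops as soon as one of those files is missing from the environment. A more tolerant variant, sketched below, moves whatever `libstdc++*` files are present. The function name `backup_bundled_libstdcpp` is hypothetical, and the sketch assumes that moving every matching file is what you want.

```python
import glob
import os
import shutil


def backup_bundled_libstdcpp(lib_dir, backup_name="stdcpp_bkp"):
    """Move every libstdc++* entry in lib_dir into lib_dir/backup_name.

    Returns the names of the files that were moved.
    """
    backup_dir = os.path.join(lib_dir, backup_name)
    if not os.path.isdir(backup_dir):
        os.makedirs(backup_dir)
    moved = []
    for path in sorted(glob.glob(os.path.join(lib_dir, "libstdc++*"))):
        name = os.path.basename(path)
        shutil.move(path, os.path.join(backup_dir, name))
        moved.append(name)
    return moved


# Example call for the environment used above:
# backup_bundled_libstdcpp(os.path.expanduser("~/anaconda3/envs/tensorflow_p27/lib"))
```

To undo the change, move the files from the backup directory back into the `lib` directory.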

Results

The video captioning model here uses Mean Pooled ResNet50 features of video frames along with Object, Action and Attribute tags predicted by a simple feedforward network.
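Mean pooling here means averaging the per-frame feature vectors into one vector per video, for example 40 frames × 2048 ResNet50 features down to a single 2048-dimensional vector, as in the model summary quoted in the issues further down. The sketch below shows that operation on plain Python lists to stay dependency-free. It is an illustration rather than the repository's code, where the pooling happens inside the Keras model, and the name `mean_pool` is made up for this example.

```python
def mean_pool(frame_features):
    """Average a list of equal-length per-frame vectors into one video vector."""
    if not frame_features:
        raise ValueError("need at least one frame")
    dim = len(frame_features[0])
    if any(len(frame) != dim for frame in frame_features):
        raise ValueError("all frames must have the same feature dimension")
    count = float(len(frame_features))
    return [sum(frame[i] for frame in frame_features) / count for i in range(dim)]


if __name__ == "__main__":
    # Three toy "frames" with four features each.
    frames = [
        [1.0, 0.0, 2.0, 4.0],
        [3.0, 0.0, 2.0, 0.0],
        [2.0, 3.0, 2.0, 2.0],
    ]
    print(mean_pool(frames))  # [2.0, 1.0, 2.0, 2.0]
```

Because the average discards frame order, the pooled vector summarises what appears in the video but not when, which is one reason the predicted tags are a useful additional input.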

The table below compares our model with other models that also rely on mean pooled frame features. The figures are sourced from papers 1, 2 and 3.

| Model | METEOR score on MSVD |
| --- | --- |
| Mean Pooled (AlexNet Features) | 26.9 |
| Mean Pooled (VGG Features) | 27.7 |
| Mean Pooled (GoogleNet Features) | 28.7 |
| Ours (Mean Pooled ResNet50 Features + Predicted Tags) | 29.0 |

Language Model

video2language's People

Contributors

rohit-gupta


video2language's Issues

Forbidden!!

When I try to run the model from scratch, I get this error: "You don't have permission to access /~yu239/datasets/youtubeclips.zip on this server.
Server unable to read htaccess file, denying access to be safe"
How can I resolve it?
Thanks in advance.

bash run-feature-extractor.sh Error

video_input : Tensor("input_1:0", shape=(?, 40, 224, 224, 3), dtype=float32)
convnet_model : <keras.engine.training.Model object at 0x7f84a0613d90>

Traceback (most recent call last):
File "batched_extractor.py", line 153, in <module>
encoded_frame_sequence = TimeDistributed(convnet_model)(video_input)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/base_layer.py", line 460, in call
output = self.call(inputs, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/keras/layers/wrappers.py", line 248, in call
y = self.layer.call(inputs, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/network.py", line 573, in call
output_tensors, _, _ = self.run_internal_graph(inputs, masks)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/network.py", line 730, in run_internal_graph
output_tensors = to_list(layer.call(computed_tensor, **kwargs))
File "/usr/local/lib/python2.7/dist-packages/keras/layers/normalization.py", line 195, in call
self.momentum),
File "/usr/local/lib/python2.7/dist-packages/keras/backend/tensorflow_backend.py", line 1011, in moving_average_update
x, value, momentum, zero_debias=True)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/moving_averages.py", line 70, in assign_moving_average
update_delta = _zero_debias(variable, value, decay)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/moving_averages.py", line 180, in _zero_debias
"biased", initializer=biased_initializer, trainable=False)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 1065, in get_variable
use_resource=use_resource, custom_getter=custom_getter)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 962, in get_variable
use_resource=use_resource, custom_getter=custom_getter)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 367, in get_variable
validate_shape=validate_shape, use_resource=use_resource)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 352, in _true_getter
use_resource=use_resource)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 664, in _get_single_variable
name, "".join(traceback.format_list(tb))))
ValueError: Variable bn_conv1/moving_mean/biased already exists, disallowed. Did you mean to set reuse=True in VarScope? Originally defined at:

File "/usr/local/lib/python2.7/dist-packages/keras/backend/tensorflow_backend.py", line 1011, in moving_average_update
x, value, momentum, zero_debias=True)
File "/usr/local/lib/python2.7/dist-packages/keras/layers/normalization.py", line 195, in call
self.momentum),
File "/usr/local/lib/python2.7/dist-packages/keras/engine/base_layer.py", line 460, in call
output = self.call(inputs, **kwargs)

FileNotFoundError

FileNotFoundError: [Errno 2] No such file or directory: '../language_model/vocabulary_10.p'

Accuracy

I am having a problem with prediction. I trained the model for 100 epochs but still got only 50% accuracy. I also tried predicting on the training set, and the prediction accuracy there is very poor as well.

About memory leak

Hi Rohit, I got the following error message when running the feature extractor. Do you have any idea what causes it? Thanks!

Traceback (most recent call last):
File "batched_extractor.py", line 147, in <module>
encoded_frame_sequence = TimeDistributed(convnet_model)(video_input)
File "build/bdist.linux-x86_64/egg/keras/engine/topology.py", line 619, in call
File "build/bdist.linux-x86_64/egg/keras/layers/wrappers.py", line 211, in call
File "build/bdist.linux-x86_64/egg/keras/engine/topology.py", line 2085, in call
File "build/bdist.linux-x86_64/egg/keras/engine/topology.py", line 2235, in run_internal_graph
File "build/bdist.linux-x86_64/egg/keras/layers/normalization.py", line 193, in call
File "build/bdist.linux-x86_64/egg/keras/backend/tensorflow_backend.py", line 1004, in moving_average_update
File "/home/chikiuso/.conda/envs/py27/lib/python2.7/site-packages/tensorflow/python/training/moving_averages.py", line 70, in assign_moving_average
update_delta = _zero_debias(variable, value, decay)
File "/home/chikiuso/.conda/envs/py27/lib/python2.7/site-packages/tensorflow/python/training/moving_averages.py", line 180, in _zero_debias
"biased", initializer=biased_initializer, trainable=False)
File "/home/chikiuso/.conda/envs/py27/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 1065, in get_variable
use_resource=use_resource, custom_getter=custom_getter)
File "/home/chikiuso/.conda/envs/py27/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 962, in get_variable
use_resource=use_resource, custom_getter=custom_getter)
File "/home/chikiuso/.conda/envs/py27/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 367, in get_variable
validate_shape=validate_shape, use_resource=use_resource)
File "/home/chikiuso/.conda/envs/py27/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 352, in _true_getter
use_resource=use_resource)
File "/home/chikiuso/.conda/envs/py27/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 664, in _get_single_variable
name, "".join(traceback.format_list(tb))))
ValueError: Variable bn_conv1/moving_mean/biased already exists, disallowed. Did you mean to set reuse=True in VarScope? Originally defined at:

File "build/bdist.linux-x86_64/egg/keras/backend/tensorflow_backend.py", line 1004, in moving_average_update
x, value, momentum, zero_debias=True)
File "build/bdist.linux-x86_64/egg/keras/layers/normalization.py", line 193, in call
self.momentum),
File "build/bdist.linux-x86_64/egg/keras/engine/topology.py", line 619, in call
output = self.call(inputs, **kwargs)

swig/python detected a memory leak of type 'int64_t *', no destructor found.
swig/python detected a memory leak of type 'int64_t *', no destructor found.

Dimension Error

Running Command: bash process-youtube-video.sh

raise ValueError(str(e))
ValueError: Dimension 0 in both shapes must be equal, but are 2617 and 2718. Shapes are [2617,256] and [2718,256]. for 'Assign_18' (op: 'Assign') with input shapes: [2617,256], [2718,256].

And, how can I use the pre-trained model to do the evaluation on all the videos? I would like to get the description results of video captioning. Thx.

bash run-feature-extractor.sh Memory Error

Hi, I got the following error. I am using Tensorflow version 1.12.

Using TensorFlow backend.
Frames will be extracted for 1968 Videos
2018-11-16 23:04:06.951863: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA


Layer (type)                 Output Shape              Param #
input_1 (InputLayer)         (None, 40, 224, 224, 3)   0
time_distributed_1 (TimeDist (None, 40, 2048)          23587712
lambda_1 (Lambda)            (None, 2048)              0

Total params: 23,587,712
Trainable params: 0
Non-trainable params: 23,587,712


2018-11-16 23:04:23.350239: W tensorflow/core/framework/allocator.cc:122] Allocation of 2055208960 exceeds 10% of system memory.
2018-11-16 23:04:30.131324: W tensorflow/core/framework/allocator.cc:122] Allocation of 2129264640 exceeds 10% of system memory.
2018-11-16 23:04:31.361599: W tensorflow/core/framework/allocator.cc:122] Allocation of 2055208960 exceeds 10% of system memory.
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
run-feature-extractor.sh: line 2: 15804 Aborted (core dumped) python2 batched_extractor.py
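The allocation sizes in this log can be reproduced with simple arithmetic, which hints at where the memory goes. The sketch below rests on two assumptions that are not stated in the log: that the tensors are float32 activations of ResNet50's first convolution block (112×112×64, and 114×114×64 after zero padding), and that one batch holds 16 videos of 40 frames each. The batch size of 16 is inferred only from the numbers matching.

```python
BYTES_PER_FLOAT32 = 4


def activation_bytes(frames, height, width, channels):
    """Size in bytes of a float32 activation tensor for `frames` images."""
    return frames * height * width * channels * BYTES_PER_FLOAT32


frames_per_batch = 16 * 40  # assumed: 16 videos per batch, 40 frames per video

conv1_bytes = activation_bytes(frames_per_batch, 112, 112, 64)
padded_bytes = activation_bytes(frames_per_batch, 114, 114, 64)

print(conv1_bytes)   # 2055208960, matches the first and third warnings
print(padded_bytes)  # 2129264640, matches the second warning
```

If those assumptions hold, each intermediate tensor takes about 2 GB of host memory, and the size scales linearly with the number of videos processed per batch.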
