rohit-gupta / video2language

Generating video descriptions using deep learning in Keras

V2L-MSVD

Start with the AWS Ubuntu Deep Learning AMI on an EC2 p2.xlarge instance or better. (A p2.xlarge costs $0.9/hour on-demand and about $0.3/hour as a spot instance.)

source activate tensorflow_p27
conda install scikit-learn
conda install scikit-image

If you are not using AWS, make sure a recent version of Keras and TensorFlow is installed and working. If you want to train tag prediction models, also install scikit-learn and scikit-image.

git clone https://github.com/rohit-gupta/V2L-MSVD.git
cd V2L-MSVD

Using a pre-trained video captioning model

Use a video from YouTube

bash fetch-pretrained-model.sh
sudo bash install-youtube-dl.sh
bash fetch-youtube-video.sh https://www.youtube.com/watch?v=cKWuNQAy2Sk
bash process-youtube-video.sh

Use a video from your local disk

bash fetch-pretrained-model.sh
bash fetch-from-localpath.sh /home/ubuntu/vid1.mp4
bash process-youtube-video.sh

Training your own video captioning model

Download data: should take about 2 minutes

bash fetch-data.sh

Preprocess text data: ETA ~5 minutes

If you only want to use verified descriptions:

bash preprocess-data.sh CleanOnly

If you want to use both verified and unverified descriptions:

bash preprocess-data.sh

Extract frames from the Videos: ETA ~30 minutes

bash extract_frames.sh

Extract Video Features: ETA ~15 Minutes

bash run-feature-extractor.sh
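
The feature extractor's model summary quoted in the issues further down shows an input of shape (None, 40, 224, 224, 3), that is, 40 frames per video. How the repository chooses those 40 frames is not stated on this page. One common approach, shown here only as a sketch under that assumption, is to take evenly spaced frames; `sample_frame_indices` is a hypothetical helper, not a function from the repository.

```python
import numpy as np

def sample_frame_indices(num_frames, num_samples=40):
    """Return num_samples evenly spaced frame indices in [0, num_frames - 1]."""
    if num_frames <= 0:
        raise ValueError("video has no frames")
    # linspace spreads the samples over the whole clip;
    # clips shorter than num_samples end up repeating some frames.
    positions = np.linspace(0, num_frames - 1, num=num_samples)
    return np.round(positions).astype(int)

# Hypothetical clip with 300 extracted frames
indices = sample_frame_indices(300)
print(len(indices), indices[0], indices[-1])
```

Evenly spaced sampling keeps the input size fixed regardless of clip length, which is what a fixed-shape input layer needs.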

Tag Model: ETA ~5 Minutes

bash train-simple-tag-prediction-model.sh
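
The Results section describes the tag model as a simple feedforward network that predicts object, action and attribute tags. The sketch below shows the forward pass of such a multi-label predictor in NumPy. The hidden size, number of tags, threshold and random weights are all placeholders chosen for illustration; only the 2048-dimensional input matches the feature size shown elsewhere on this page, and the actual architecture in the repository may differ.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict_tags(features, w1, b1, w2, b2, threshold=0.5):
    """One hidden ReLU layer, then an independent sigmoid per tag (multi-label)."""
    hidden = np.maximum(0.0, np.dot(features, w1) + b1)
    probs = sigmoid(np.dot(hidden, w2) + b2)
    return probs, probs >= threshold

# Hypothetical sizes; 2048 is the mean-pooled feature dimension
feature_dim, hidden_dim, num_tags = 2048, 512, 300
rng = np.random.RandomState(0)
w1 = rng.randn(feature_dim, hidden_dim) * 0.01
b1 = np.zeros(hidden_dim)
w2 = rng.randn(hidden_dim, num_tags) * 0.01
b2 = np.zeros(num_tags)

video_features = rng.rand(feature_dim)
probs, active = predict_tags(video_features, w1, b1, w2, b2)
print(probs.shape, int(active.sum()))
```

A sigmoid per tag, rather than a softmax over all tags, lets several tags be active for the same video.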

Train Language Model: ETA ~50 minutes (can be stopped after about 25 minutes, once 5 epochs have completed)

bash train-language-model.sh

Score Language Model: ETA ~5 minutes

bash score-language-model.sh

Known Issues

If at any stage you get an error that contains

/lib/libstdc++.so.6: version `CXXABI_1.3.x' not found

You can fix it with:

cd ~/anaconda3/envs/tensorflow_p27/lib && mkdir -p stdcpp_bkp && mv libstdc++.a stdcpp_bkp/ && mv libstdc++.so stdcpp_bkp/ && mv libstdc++.so.6 stdcpp_bkp/ && mv libstdc++.so.6.0.19 stdcpp_bkp/ && mv libstdc++.so.6.0.19-gdb.py stdcpp_bkp/ && mv libstdc++.so.6.0.21 stdcpp_bkp/ && mv libstdc++.so.6.0.24 stdcpp_bkp/ && cd -
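
The error means that the libstdc++ being loaded does not export the CXXABI version that some binary requires. Before moving files around, it can help to see which CXXABI versions a given library actually provides. The sketch below does roughly what `strings <library> | grep CXXABI` would do; the function names are hypothetical, and which library gets loaded on your machine depends on your environment.

```python
import re

def cxxabi_versions(data):
    """Return the numbered CXXABI_* version strings found in a library's raw bytes."""
    names = set(re.findall(br"CXXABI_\d+(?:\.\d+)*", data))

    def numeric_key(name):
        # "CXXABI_1.3.11" -> (1, 3, 11), so 1.3.11 sorts after 1.3.9
        return tuple(int(p) for p in name.decode("ascii").split("_")[1].split("."))

    return [n.decode("ascii") for n in sorted(names, key=numeric_key)]

def cxxabi_versions_in_file(path):
    """Convenience wrapper: read a shared library from disk and scan it."""
    with open(path, "rb") as handle:
        return cxxabi_versions(handle.read())

# Stand-in bytes for illustration; on a real system, pass the library path to
# cxxabi_versions_in_file, e.g. the libstdc++.so.6 inside the conda env's lib directory.
sample = b"\x00CXXABI_1.3\x00CXXABI_1.3.9\x00GLIBCXX_3.4\x00CXXABI_1.3.11\x00"
print(cxxabi_versions(sample))
```

If the version named in the error message is missing from the environment's copy but present in the system copy, that supports moving the environment's copy aside as described above.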

TensorFlow 1.3 has a memory leak bug that might affect this code.

You can fix it by upgrading TensorFlow.

Reference for this problem: issue #3
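
A quick way to tell whether an installed version falls in the affected 1.3 series is to compare the major and minor parts of the version string. This is a minimal sketch; `in_affected_series` is a hypothetical helper, and it only checks the version number, not whether the leak actually occurs in your setup.

```python
def in_affected_series(version, affected=(1, 3)):
    """True if a version string such as '1.3.0' belongs to the given major.minor series."""
    parts = version.split(".")
    return (int(parts[0]), int(parts[1])) == affected

# With TensorFlow installed, the string would come from tf.__version__:
#   import tensorflow as tf
#   in_affected_series(tf.__version__)
for version in ("1.3.0", "1.4.1", "1.12.0"):
    print(version, in_affected_series(version))
```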

Results

The video captioning model here uses mean-pooled ResNet50 features of the video frames, along with object, action and attribute tags predicted by a simple feedforward network.
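
Mean pooling here means averaging the per-frame feature vectors over time to get one vector per video. The sketch below illustrates it with random stand-in features; the shapes (40 frames, 2048 features) follow the feature extractor summary quoted in the issues on this page, while the function name is hypothetical.

```python
import numpy as np

def mean_pool(frame_features):
    """Average per-frame feature vectors over the time axis.

    frame_features: array of shape (num_frames, feature_dim)
    returns:        array of shape (feature_dim,)
    """
    frame_features = np.asarray(frame_features, dtype=np.float32)
    return frame_features.mean(axis=0)

# Random stand-in for ResNet50 features of 40 frames
rng = np.random.RandomState(0)
frames = rng.rand(40, 2048).astype(np.float32)
video_vector = mean_pool(frames)
print(video_vector.shape)  # (2048,)
```

The pooled vector discards frame order, which is the main simplification shared by the mean-pooled baselines in the comparison.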

The table below compares the METEOR score of our model with that of other models that also rely on mean-pooled frame features. The figures are sourced from papers 1, 2 and 3.

Model	METEOR score on MSVD
Mean Pooled (AlexNet Features)	26.9
Mean Pooled (VGG Features)	27.7
Mean Pooled (GoogleNet Features)	28.7
Ours (Mean Pooled ResNet50 Features + Predicted Tags)	29.0

video2language's Issues

Forbidden!!

When I try to run the model from scratch, I get this error: "You don't have permission to access /~yu239/datasets/youtubeclips.zip on this server.
Server unable to read htaccess file, denying access to be safe"
How can I resolve it?
Thanks in advance.

Could not open file : frames/vid133/00204.jpg.31x av_interleaved_write_frame(): Input/output error

This happens for different videos, but always on frame 204, when running extract_frames.sh.

Add actual tag creation

Currently the code in git uses pre-computed tags. Use the Stanford POS tagger to generate the tags instead:

https://nlp.stanford.edu/software/tagger.shtml#Download
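
One way the tagger's output could feed tag creation is sketched below. It assumes the descriptions have already been tagged with Penn Treebank tags (the tag set used by the Stanford tagger's English models) and that nouns map to objects, verbs to actions and adjectives to attributes. That mapping and the function name are assumptions made for illustration, not something the repository specifies.

```python
from collections import defaultdict

# Assumed mapping from Penn Treebank tag prefixes to tag categories
CATEGORY_BY_PREFIX = (("NN", "object"), ("VB", "action"), ("JJ", "attribute"))

def bucket_tags(tagged_tokens):
    """Group (word, pos_tag) pairs into object / action / attribute word lists."""
    buckets = defaultdict(set)
    for word, pos in tagged_tokens:
        for prefix, category in CATEGORY_BY_PREFIX:
            if pos.startswith(prefix):
                buckets[category].add(word.lower())
                break
    return {category: sorted(words) for category, words in buckets.items()}

# Hand-tagged example sentence: "A small dog is running"
tagged = [("A", "DT"), ("small", "JJ"), ("dog", "NN"), ("is", "VBZ"), ("running", "VBG")]
result = bucket_tags(tagged)
print(result)
```

In practice auxiliary verbs such as "is" would probably need filtering, for example with a stop-word list.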

Accuracy

I am having a problem with prediction. I trained the model for 100 epochs but still got only 50% accuracy. I also tried predicting on the training set, and the prediction accuracy there is very poor as well.

How could I really make a caption for video?

Hi Rohit, I followed all the steps in your instructions. What is the next step to create a caption for a video? Thanks!

About memory leak

Hi Rohit, I got the following error message when running the feature extraction. Do you have any idea what causes it? Thanks!

Traceback (most recent call last):
File "batched_extractor.py", line 147, in <module>
encoded_frame_sequence = TimeDistributed(convnet_model)(video_input)
File "build/bdist.linux-x86_64/egg/keras/engine/topology.py", line 619, in call
File "build/bdist.linux-x86_64/egg/keras/layers/wrappers.py", line 211, in call
File "build/bdist.linux-x86_64/egg/keras/engine/topology.py", line 2085, in call
File "build/bdist.linux-x86_64/egg/keras/engine/topology.py", line 2235, in run_internal_graph
File "build/bdist.linux-x86_64/egg/keras/layers/normalization.py", line 193, in call
File "build/bdist.linux-x86_64/egg/keras/backend/tensorflow_backend.py", line 1004, in moving_average_update
File "/home/chikiuso/.conda/envs/py27/lib/python2.7/site-packages/tensorflow/python/training/moving_averages.py", line 70, in assign_moving_average
update_delta = _zero_debias(variable, value, decay)
File "/home/chikiuso/.conda/envs/py27/lib/python2.7/site-packages/tensorflow/python/training/moving_averages.py", line 180, in _zero_debias
"biased", initializer=biased_initializer, trainable=False)
File "/home/chikiuso/.conda/envs/py27/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 1065, in get_variable
use_resource=use_resource, custom_getter=custom_getter)
File "/home/chikiuso/.conda/envs/py27/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 962, in get_variable
use_resource=use_resource, custom_getter=custom_getter)
File "/home/chikiuso/.conda/envs/py27/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 367, in get_variable
validate_shape=validate_shape, use_resource=use_resource)
File "/home/chikiuso/.conda/envs/py27/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 352, in _true_getter
use_resource=use_resource)
File "/home/chikiuso/.conda/envs/py27/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 664, in _get_single_variable
name, "".join(traceback.format_list(tb))))
ValueError: Variable bn_conv1/moving_mean/biased already exists, disallowed. Did you mean to set reuse=True in VarScope? Originally defined at:

File "build/bdist.linux-x86_64/egg/keras/backend/tensorflow_backend.py", line 1004, in moving_average_update
x, value, momentum, zero_debias=True)
File "build/bdist.linux-x86_64/egg/keras/layers/normalization.py", line 193, in call
self.momentum),
File "build/bdist.linux-x86_64/egg/keras/engine/topology.py", line 619, in call
output = self.call(inputs, **kwargs)

swig/python detected a memory leak of type 'int64_t *', no destructor found.
swig/python detected a memory leak of type 'int64_t *', no destructor found.

Video captioning for individual videos using precomputed weights not working

IOError: [Errno 2] No such file or directory: '../language_model/vocabulary_10.p'
This happens when running:
bash process-youtube-video.sh
while following the "Use a video from YouTube" instructions.

bash run-feature-extractor.sh Memory Error

Hi, I got the following error. I am using Tensorflow version 1.12.

Using TensorFlow backend.
Frames will be extracted for 1968 Videos
2018-11-16 23:04:06.951863: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA

Layer (type)                 Output Shape              Param #
input_1 (InputLayer)         (None, 40, 224, 224, 3)   0
time_distributed_1 (TimeDist (None, 40, 2048)          23587712
lambda_1 (Lambda)            (None, 2048)              0

Total params: 23,587,712
Trainable params: 0
Non-trainable params: 23,587,712

2018-11-16 23:04:23.350239: W tensorflow/core/framework/allocator.cc:122] Allocation of 2055208960 exceeds 10% of system memory.
2018-11-16 23:04:30.131324: W tensorflow/core/framework/allocator.cc:122] Allocation of 2129264640 exceeds 10% of system memory.
2018-11-16 23:04:31.361599: W tensorflow/core/framework/allocator.cc:122] Allocation of 2055208960 exceeds 10% of system memory.
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
run-feature-extractor.sh: line 2: 15804 Aborted (core dumped) python2 batched_extractor.py
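
The two allocation sizes in the warnings above factor exactly into float32 tensors of 640 x 112 x 112 x 64 and 640 x 114 x 114 x 64 elements, as the arithmetic below confirms. One possible reading, which is an assumption and not confirmed by anything on this page, is that 640 frames (for example 16 videos of 40 frames each) pass through an early 64-channel layer of ResNet50 at once. If that is the case, processing fewer videos per batch would shrink these buffers proportionally.

```python
BYTES_PER_FLOAT32 = 4

def activation_bytes(num_frames, height, width, channels):
    """Size in bytes of a float32 activation tensor for num_frames frames."""
    return num_frames * height * width * channels * BYTES_PER_FLOAT32

print(activation_bytes(640, 112, 112, 64))  # 2055208960
print(activation_bytes(640, 114, 114, 64))  # 2129264640
```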

Could not find average_frame_features.pickle, and also getting an error about not being able to convert float to tensor

Could you please share a requirements.txt or the average_frame_features.pickle file?
Which version of TensorFlow was used?
The error I get is:
tensorflow-valueerror-failed-to-convert-a-numpy-array-to-a-tensor-unsupported(np.ndarray)

FileNotFoundError

FileNotFoundError: [Errno 2] No such file or directory: '../language_model/vocabulary_10.p'

bash run-feature-extractor.sh Error

video_input : Tensor("input_1:0", shape=(?, 40, 224, 224, 3), dtype=float32)
convnet_model : <keras.engine.training.Model object at 0x7f84a0613d90>

Traceback (most recent call last):
File "batched_extractor.py", line 153, in <module>
encoded_frame_sequence = TimeDistributed(convnet_model)(video_input)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/base_layer.py", line 460, in call
output = self.call(inputs, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/keras/layers/wrappers.py", line 248, in call
y = self.layer.call(inputs, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/network.py", line 573, in call
output_tensors, _, _ = self.run_internal_graph(inputs, masks)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/network.py", line 730, in run_internal_graph
output_tensors = to_list(layer.call(computed_tensor, **kwargs))
File "/usr/local/lib/python2.7/dist-packages/keras/layers/normalization.py", line 195, in call
self.momentum),
File "/usr/local/lib/python2.7/dist-packages/keras/backend/tensorflow_backend.py", line 1011, in moving_average_update
x, value, momentum, zero_debias=True)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/moving_averages.py", line 70, in assign_moving_average
update_delta = _zero_debias(variable, value, decay)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/moving_averages.py", line 180, in _zero_debias
"biased", initializer=biased_initializer, trainable=False)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 1065, in get_variable
use_resource=use_resource, custom_getter=custom_getter)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 962, in get_variable
use_resource=use_resource, custom_getter=custom_getter)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 367, in get_variable
validate_shape=validate_shape, use_resource=use_resource)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 352, in _true_getter
use_resource=use_resource)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 664, in _get_single_variable
name, "".join(traceback.format_list(tb))))
ValueError: Variable bn_conv1/moving_mean/biased already exists, disallowed. Did you mean to set reuse=True in VarScope? Originally defined at:

File "/usr/local/lib/python2.7/dist-packages/keras/backend/tensorflow_backend.py", line 1011, in moving_average_update
x, value, momentum, zero_debias=True)
File "/usr/local/lib/python2.7/dist-packages/keras/layers/normalization.py", line 195, in call
self.momentum),
File "/usr/local/lib/python2.7/dist-packages/keras/engine/base_layer.py", line 460, in call
output = self.call(inputs, **kwargs)

Dimension Error

Running Command: bash process-youtube-video.sh

raise ValueError(str(e))
ValueError: Dimension 0 in both shapes must be equal, but are 2617 and 2718. Shapes are [2617,256] and [2718,256]. for 'Assign_18' (op: 'Assign') with input shapes: [2617,256], [2718,256].

Also, how can I use the pre-trained model to run evaluation on all the videos? I would like to get the generated descriptions for each of them. Thanks.

Is there a paper corresponding to this code?

Hi @rohit-gupta
Thanks for sharing.
Does this code correspond to a paper? If so, could you tell me its title? Or is it just one of your projects?
Thank you very much!