
multimodal_movie_analysis's Introduction

multimodal_movie_analysis

Audio

To analyze a movie in terms of its auditory content, do the following:

cd analyze_audio
python3 analyze_audio.py -f movie.wav

Note: You will need to create a folder in analyze_audio/segment_models where you will store your audio SVM segment classifiers. See analyze_audio/readme.md for instructions on how to train these audio classifiers. Currently the audio analysis module expects 5 audio classifiers: (1) a generic audio classifier (4 classes), (2) two speech emotion classifiers, and (3) two musical emotion classifiers.

Visual

To extract hand-crafted visual features, run the following:

python3 analyze_visual.py -f ../V236_915000__0.mp4

The features are saved in npy files. The main functionality is implemented in the process_video function, which extracts features from a specific file. See analyze_visual/Readme.md for more details.
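
For programmatic use, process_video can also be called directly from Python. The snippet below is a minimal sketch only: the call signature is copied from the issue reports further down this page, and the argument values and their meanings are assumptions rather than documented API.

from analyze_visual.analyze_visual import process_video

# Sketch: signature taken from the issue reports below; the meaning of each
# argument is an assumption, not the documented API.
process_video("movie.mp4",  # path to the video file
              2,            # process_mode: assumed analysis level
              True,         # print_flag: assumed progress printing
              False,        # online_display: assumed live visualization toggle
              True)         # save_results: assumed writing of the npy files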

You can also train a supervised model of video shots (e.g. types of shots):

python3 train.py -v data/class1 data/class2 -a SVM

The following files will be saved to disk:

  • shot_classifier_SVM.pkl: the classifier
  • shot_classifier_SVM_scaler.pkl: the scaler
  • shot_classifier_SVM_results.json: the cross-validation results
  • shot_classifier_conf_mat_SVC().jpg: the confusion matrix of the cross-validation procedure
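
These artifacts can also be reloaded outside of wrapper.py. A minimal sketch, assuming the .pkl files were written with pickle and contain standard scikit-learn estimator and scaler objects:

import json
import pickle

with open("shot_classifier_SVM.pkl", "rb") as f:
    classifier = pickle.load(f)
with open("shot_classifier_SVM_scaler.pkl", "rb") as f:
    scaler = pickle.load(f)
with open("shot_classifier_SVM_results.json") as f:
    cv_results = json.load(f)

# feature_vector: features of one shot, as produced by analyze_visual.py
# (its dimensionality is assumed to match the training features)
# prediction = classifier.predict(scaler.transform([feature_vector]))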

Once the supervised model is trained, you can classify an unknown shot (or shots organized in folders):

python3 wrapper.py -m SVM -i test.mp4

The following script detects shot changes in a video file and stores the respective shots in individual files. It can be combined with the wrapper.py script above to analyze a movie shot by shot, as sketched after the command below.

python3 shot_generator.py -f data/file.mp4
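
A rough sketch of chaining the two scripts from Python follows; the location and naming of the generated shot files are assumptions, not documented behaviour.

import glob
import subprocess

# Split the movie into shots, then classify each shot with the trained model.
subprocess.run(["python3", "shot_generator.py", "-f", "data/file.mp4"], check=True)

# Assumption: the generated shot clips are written next to the input video.
for shot_file in sorted(glob.glob("data/*_shot_*.mp4")):
    subprocess.run(["python3", "wrapper.py", "-m", "SVM", "-i", shot_file], check=True)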

multimodal_movie_analysis's People

Contributors

apoman38, kbogas, pakoromilas, theopsall, tyiannak


multimodal_movie_analysis's Issues

MINOR: Printing feature stats vector error

In analyze_visual/analyze_visual.py, at line 397 the shape of the feature matrix is printed instead of the shape of the feature_stats vector.
Change:

print('Shape of feature stats vector including'
      ' object features (after smoothing'
      ' object confidences): {}'.format(feature_matrix.shape))

to

print('Shape of feature stats vector including'
      ' object features (after smoothing'
      ' object confidences): {}'.format(feature_stats.shape))

TypeError with scikit-image==0.18.0

With scikit-image==0.18.0, I get the following error:
Using: cuda:0
Downloading: "https://github.com/NVIDIA/DeepLearningExamples/archive/torchhub.zip" to /home/theo/.cache/torch/hub/torchhub.zip
Downloading: "https://download.pytorch.org/models/resnet50-19c8e357.pth" to /home/theo/.cache/torch/hub/checkpoints/resnet50-19c8e357.pth
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 97.8M/97.8M [00:02<00:00, 37.3MB/s]
Using cache found in /home/theo/.cache/torch/hub/NVIDIA_DeepLearningExamples_torchhub
Traceback (most recent call last):
  File "analyze_visual.py", line 35, in <module>
    generic_model = gmodel.SsdNvidia()
  File "/home/theo/Downloads/multimodal_movie_analysis/analyze_visual/object_detection/generic_model.py", line 107, in __init__
    self.utils = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub',
  File "/home/theo/.local/lib/python3.8/site-packages/torch/hub.py", line 370, in load
    model = _load_local(repo_or_dir, model, *args, **kwargs)
  File "/home/theo/.local/lib/python3.8/site-packages/torch/hub.py", line 399, in _load_local
    model = entry(*args, **kwargs)
  File "/home/theo/.cache/torch/hub/NVIDIA_DeepLearningExamples_torchhub/hubconf.py", line 230, in nvidia_ssd_processing_utils
    import skimage
  File "/home/theo/.local/lib/python3.8/site-packages/skimage/__init__.py", line 135, in <module>
    from .data import data_dir
  File "/home/theo/.local/lib/python3.8/site-packages/skimage/data/__init__.py", line 156, in <module>
    image_fetcher, data_dir = create_image_fetcher()
  File "/home/theo/.local/lib/python3.8/site-packages/skimage/data/__init__.py", line 136, in create_image_fetcher
    image_fetcher = pooch.create(
TypeError: create() got an unexpected keyword argument 'retry_if_failed'

By the way, with scikit-image==0.17.2 there is no error.
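
A possible workaround until this is fixed (assuming nothing else in the environment requires the newer release) is to pin the version that is reported to work:

pip3 install scikit-image==0.17.2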

Problem when trying to download SSD model endpoint

Getting this error @apoman38

analyze_visual|master⚡ ⇒ python3 shot_generator.py -d ~/Downloads/videos                         
Using: cpu
Downloading: "https://github.com/NVIDIA/DeepLearningExamples/archive/torchhub.zip" to /Users/tyiannak/.cache/torch/hub/torchhub.zip
Downloading: "https://download.pytorch.org/models/resnet50-19c8e357.pth" to /Users/tyiannak/.cache/torch/hub/checkpoints/resnet50-19c8e357.pth
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 97.8M/97.8M [00:18<00:00, 5.61MB/s]
Downloading checkpoint from https://api.ngc.nvidia.com/v2/models/nvidia/ssdpyt_fp32/versions/1/files/nvidia_ssdpyt_fp32_20190225.pt
Traceback (most recent call last):
  File "shot_generator.py", line 5, in <module>
    from analyze_visual import *
  File "/Users/tyiannak/Research/libraries/multimodal_movie_analysis/analyze_visual/analyze_visual.py", line 35, in <module>
    generic_model = gmodel.SsdNvidia()
  File "/Users/tyiannak/Research/libraries/multimodal_movie_analysis/analyze_visual/object_detection/generic_model.py", line 64, in __init__
    ckpt_file = _download_checkpoint(checkpoint_str, force_reload=False)
  File "/Users/tyiannak/Research/libraries/multimodal_movie_analysis/analyze_visual/object_detection/generic_model.py", line 17, in _download_checkpoint
    urllib.request.urlretrieve(checkpoint, ckpt_file)
  File "/Users/tyiannak/.pyenv/versions/3.7.3/lib/python3.7/urllib/request.py", line 247, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "/Users/tyiannak/.pyenv/versions/3.7.3/lib/python3.7/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/Users/tyiannak/.pyenv/versions/3.7.3/lib/python3.7/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/Users/tyiannak/.pyenv/versions/3.7.3/lib/python3.7/urllib/request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "/Users/tyiannak/.pyenv/versions/3.7.3/lib/python3.7/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/Users/tyiannak/.pyenv/versions/3.7.3/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/Users/tyiannak/.pyenv/versions/3.7.3/lib/python3.7/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: 

importing problems

When opening the project in PyCharm (2020.2.3 Professional Edition, on Ubuntu 20.04.1 LTS) with the repository root as the project root, I get the following import error when running from analyze_visual.analyze_visual import process_video:

Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/snap/pycharm-professional/218/plugins/python/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "/home/zappatistas20/PycharmProjects/multimodal_movie_analysis/analyze_visual/analyze_visual.py", line 30, in <module>
    from object_detection import detection_utils as dutils
  File "/snap/pycharm-professional/218/plugins/python/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
ModuleNotFoundError: No module named 'object_detection'

I fixed that (and all the ModuleNotFoundError errors that followed) by using the absolute import path to the targeted modules, like so:

In analyze_visual.py:

from analyze_visual.object_detection import detection_utils as dutils
from analyze_visual.object_detection import generic_model as gmodel
from analyze_visual.utils import *

In detection_utils.py:

from analyze_visual.utils import rect_area
from analyze_visual.utils import intersect_rectangles

I am guessing the same should happen in the analyze_textual files as well.
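
An alternative, hypothetical workaround that avoids editing the repository is to put the analyze_visual directory on sys.path before importing, so that object_detection and utils resolve as top-level modules:

import os
import sys

# Hypothetical workaround: adjust repo_root to your checkout; adding the
# analyze_visual directory lets its sibling modules (object_detection, utils)
# be found without rewriting the package's own imports.
repo_root = "/home/user/multimodal_movie_analysis"
sys.path.insert(0, os.path.join(repo_root, "analyze_visual"))

from analyze_visual.analyze_visual import process_video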

analyze_visual problem with empty tensors

Hello!
I am trying to extract features from this video

I am using the following command: python analyze_visual.py -f ../data/V236_915000__0.mp4
and I am getting the following result:

Using: cpu
Using cache found in /home/zappatistas20/.cache/torch/hub/NVIDIA_DeepLearningExamples_torchhub
Using cache found in /home/zappatistas20/.cache/torch/hub/NVIDIA_DeepLearningExamples_torchhub
Began processing video : ../data/V236_915000__0.mp4
FPS      = 23.976023976023978
Duration = 40.04 - 00:00:40.03
[W NNPACK.cpp:80] Could not initialize NNPACK! Reason: Unsupported hardware.
Traceback (most recent call last):
  File "analyze_visual.py", line 476, in <module>
    main(sys.argv)
  File "analyze_visual.py", line 454, in main
    save_results)
  File "analyze_visual.py", line 303, in process_video
    objects = generic_model.detect(frame, 0.1)
  File "/home/zappatistas20/PycharmProjects/multimodal_movie_analysis/analyze_visual/object_detection/generic_model.py", line 132, in detect
    results = self.utils.decode_results(detections_batch)
  File "/home/zappatistas20/.cache/torch/hub/NVIDIA_DeepLearningExamples_torchhub/hubconf.py", line 298, in decode_results
    results = encoder.decode_batch(ploc, plabel, criteria=0.5, max_output=20)
  File "/home/zappatistas20/.cache/torch/hub/NVIDIA_DeepLearningExamples_torchhub/PyTorch/Detection/SSD/src/utils.py", line 154, in decode_batch
    output.append(self.decode_single(bbox, prob, criteria, max_output))
  File "/home/zappatistas20/.cache/torch/hub/NVIDIA_DeepLearningExamples_torchhub/PyTorch/Detection/SSD/src/utils.py", line 197, in decode_single
    bboxes_out, labels_out, scores_out = torch.cat(bboxes_out, dim=0), \
RuntimeError: There were no tensor arguments to this function (e.g., you passed an empty list of Tensors), but no fallback function is registered for schema aten::_cat.  This usually means that this function requires a non-empty list of Tensors.  Available functions are [CPU, QuantizedCPU, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].

CPU: registered at /pytorch/build/aten/src/ATen/CPUType.cpp:2127 [kernel]
QuantizedCPU: registered at /pytorch/build/aten/src/ATen/QuantizedCPUType.cpp:297 [kernel]
BackendSelect: fallthrough registered at /pytorch/aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Named: registered at /pytorch/aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
AutogradOther: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:8078 [autograd kernel]
AutogradCPU: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:8078 [autograd kernel]
AutogradCUDA: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:8078 [autograd kernel]
AutogradXLA: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:8078 [autograd kernel]
AutogradPrivateUse1: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:8078 [autograd kernel]
AutogradPrivateUse2: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:8078 [autograd kernel]
AutogradPrivateUse3: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:8078 [autograd kernel]
Tracer: registered at /pytorch/torch/csrc/autograd/generated/TraceType_2.cpp:9654 [kernel]
Autocast: registered at /pytorch/aten/src/ATen/autocast_mode.cpp:258 [kernel]
Batched: registered at /pytorch/aten/src/ATen/BatchingRegistrations.cpp:511 [backend fallback]
VmapMode: fallthrough registered at /pytorch/aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]

Up to a point in the video, the process runs very smoothly. The same happens with a few other videos in my collection as well.

My guess is that there's an empty frame hidden somewhere in the file, and as a result an empty tensor is passed in the bboxes_out argument in decode_single. What would be the best way to try/catch this and return 0 or NaN, or just skip the frame completely, so that the process completes?

Thanks!
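
One possible guard, purely as a sketch (not the project's actual fix), is to catch the RuntimeError around the detection call that appears in the traceback above, inside the frame loop of process_video, and skip the offending frame:

# Inside the frame loop of process_video (around line 303 per the traceback);
# a sketch of skipping frames on which the SSD decoding raises:
try:
    objects = generic_model.detect(frame, 0.1)
except RuntimeError:
    # Assumed handling: ignore this frame entirely so downstream code never
    # sees a partial or empty detection result.
    continue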

UnboundLocalError

I just ran the visual extraction script and got the following error:

aV.process_video(video_path, process_mode, print_flag, online_display, save_results)
  File "/home/theo/Pictures/EnorasiDb/multimodal_movie_analysis/analyze_visual/analyze_visual.py", line 308, in process_video
    objects_boxes_all.append(objects[0])
UnboundLocalError: local variable 'objects_boxes_all' referenced before assignment
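
This typically means the list is only created inside a conditional branch of process_video. A minimal sketch of the kind of fix (the surrounding code is not shown in this report, so the exact placement is an assumption) is to initialize the list unconditionally before the frame loop:

# Before the frame loop in process_video (placement assumed):
objects_boxes_all = []

# ... later, inside the loop, the existing append keeps working:
# objects_boxes_all.append(objects[0])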
