
amrhendy / multimedia_question_answering


A simple attention deep learning model to answer questions about a given video with the most relevant video intervals as answers.

License: GNU General Public License v3.0

Python 100.00%
deep-learning video-processing video-description visual-deep-learning attention-model attention-seq2seq feature-extraction cnn glove-embeddings video-question-answering


Multimedia Question Answering

There is a growing trend in the research community toward video processing with artificial intelligence. Trending tasks include:

  • Video classification.
  • Video content description.
  • Video question answering (VQA).

Main Idea

The main idea of the project is to find the part of a video that is most relevant to a given query (the "question").
Instead of watching the complete video to find the interval you want, you give our model the video and a query describing the part you are looking for. The model then returns the video's intervals sorted by relevance to that query.
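To make "intervals sorted by relevance" concrete, here is a minimal sketch in plain Python. It is not the project's model: the dot-product score is only a stand-in for learned attention scores, and the function names, the interval tuples, and the 3-dimensional toy features are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rank_intervals(query_vec, intervals):
    """Return (start, end, weight) tuples sorted by descending relevance.

    `intervals` is a list of (start_sec, end_sec, feature_vec) tuples.
    Relevance is a plain dot product here (an assumption), standing in
    for whatever score a trained attention model would produce.
    """
    scores = [
        sum(q * f for q, f in zip(query_vec, feat))
        for _, _, feat in intervals
    ]
    weights = softmax(scores)
    ranked = [(start, end, w) for (start, end, _), w in zip(intervals, weights)]
    return sorted(ranked, key=lambda item: item[2], reverse=True)


# Toy data: made-up features for three 10-second intervals.
query = [1.0, 0.0, 1.0]
video = [
    (0, 10, [0.1, 0.9, 0.0]),
    (10, 20, [0.8, 0.1, 0.9]),
    (20, 30, [0.4, 0.2, 0.3]),
]

ranking = rank_intervals(query, video)
for start, end, weight in ranking:
    print(f"{start:>3}s-{end:>3}s  relevance={weight:.3f}")
```

With these toy vectors the 10s-20s interval comes first, because its features line up best with the query vector. In the real system the query and interval vectors would come from the text and visual encoders rather than being written by hand.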

Examples

Watch the video

Dataset

We use the Microsoft Research Video to Text (MSR-VTT) dataset.
An example from the dataset is shown below.

Extracted Visual Features

We extracted the visual features of the dataset using three different models.

Architecture

This is the base architecture, taken from the paper linked here.

Checkpoints

We trained the model with different visual feature extractors and made small changes to the model architecture.

  • Using the ResNet visual feature extractor (as in the paper): gdrive link

  • Using the NASNet visual feature extractor: gdrive link

  • Using the Inception-ResNet-v2 visual feature extractor: gdrive link

  • Using the Squeeze-and-Excitation technique with Inception-ResNet-v2: gdrive link

  • Using Dropout: gdrive link

  • Using Squeeze-and-Excitation together with Dropout: gdrive link

  • Using Squeeze-and-Excitation and a larger hidden dimension for the LSTMs: gdrive link
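For readers unfamiliar with Squeeze-and-Excitation, the following is a minimal sketch of the general idea applied to a sequence of frame features. How the authors wired it into their model is not described here, so the function name, the frames-by-channels layout, the omission of biases, and the hand-picked weights are all assumptions for illustration only.

```python
import math


def squeeze_excite(features, w1, w2):
    """Apply a squeeze-and-excitation gate to a frames x channels list.

    w1 has shape (channels x reduced), w2 has shape (reduced x channels).
    Biases are omitted to keep the sketch short.
    """
    n_frames = len(features)
    n_channels = len(features[0])
    n_hidden = len(w1[0])

    # Squeeze: average every channel over all frames.
    squeezed = [
        sum(row[c] for row in features) / n_frames for c in range(n_channels)
    ]

    # Excitation, step 1: bottleneck layer followed by ReLU.
    hidden = [
        max(0.0, sum(squeezed[c] * w1[c][h] for c in range(n_channels)))
        for h in range(n_hidden)
    ]

    # Excitation, step 2: expand back to one value per channel and
    # squash it into (0, 1) with a sigmoid.
    gates = [
        1.0 / (1.0 + math.exp(-sum(hidden[h] * w2[h][c] for h in range(n_hidden))))
        for c in range(n_channels)
    ]

    # Scale: reweight each channel of every frame by its gate.
    scaled = [[row[c] * gates[c] for c in range(n_channels)] for row in features]
    return scaled, gates


# Toy input: 2 frames, 4 channels, reduced dimension 2 (all values made up).
frames = [
    [1.0, 2.0, 0.0, 4.0],
    [3.0, 2.0, 0.0, 0.0],
]
w1 = [[0.5, -1.0]] * 4
w2 = [
    [1.0, 0.0, -1.0, 0.0],
    [1.0, 1.0, 1.0, 1.0],
]

scaled, gates = squeeze_excite(frames, w1, w2)
print("gates:", [round(g, 3) for g in gates])
```

Each gate is a per-channel weight between 0 and 1, so the block lets the network emphasize informative feature channels and damp the rest. In a trained model `w1` and `w2` are learned parameters, not fixed values as in this toy example.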

Results

Across the experiments described above, the best results came from using Inception-ResNet-v2 as the visual feature extractor.
Our model outperforms the original paper's model on all metrics used, as shown in the following table:

These results were obtained on the test set, which contains 2990 videos.

You can see the comparison between all models in the following figure:

Authors

Contribute

Contributions are always welcome!

Please read the contribution guidelines first.

License

This project is licensed under the GNU General Public License v3.0. See the LICENSE file for details.

Contributors

amrhendy, muhammedkhamis
