LifeQA data and code

This repo contains the data and PyTorch code that accompany our LREC 2020 paper:

LifeQA: A Real-life Dataset for Video Question Answering

Santiago Castro, Mahmoud Azab, Jonathan C. Stroud, Cristina Noujaim, Ruoyao Wang, Jia Deng, Rada Mihalcea

You can see more information at the LifeQA website.

Setup

To run the code, set up a new Conda environment and activate it:

conda env create -f environment.yml
conda activate lifeqa

Data

The dataset is under data/, in lqa_train.json, lqa_dev.json, and lqa_test.json. Even though it is divided into train/dev/test partitions, for most experiments we merge them and use five-fold cross-validation, with the folds indicated in data/folds.
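
For a quick look at the data, here is a minimal loading sketch; it assumes only that each JSON file maps a clip ID (e.g., 213) to a metadata dict, as described in the download instructions below:

import json

# Merge the three LifeQA splits into a single dict keyed by clip ID.
data = {}
for split in ["train", "dev", "test"]:
    with open(f"data/lqa_{split}.json") as file:
        data.update(json.load(file))

print(f"{len(data)} video clips in total")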

Visual features

You can download the already extracted features, or follow the steps below to extract them yourself.

  1. Download the videos. Due to YouTube's Terms of Service, we cannot provide the video files; however, we provide the IDs and timestamps needed to obtain the same data. Download the YouTube videos indicated in the parent_video_id field of the JSON files, cut them based on the start_time and end_time fields, and save them under the JSON key (e.g., 213) to data/videos, placing the files there without subdirectories (a scripted sketch of this step follows the list).

  2. Run save_frames.sh to extract the frames from the video files:

    bash feature_extraction/save_frames.sh
  3. Download the C3D weights pretrained on Sports1M and save them to data/features/c3d.pickle.

  4. Extract the features (e.g., from an ImageNet-pretrained ResNet-152) and save them in large H5 files:

    mkdir data/features
    python feature_extraction/extract_features.py resnet
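
A minimal sketch of step 1, assuming yt-dlp and ffmpeg are installed; the tool choices, the data/full_videos staging directory, and the .mp4 extension are all assumptions, not part of this repo:

import json
import subprocess
from pathlib import Path

full_dir = Path("data/full_videos")  # hypothetical staging dir for full parent videos
clip_dir = Path("data/videos")       # clips go here, without subdirectories
full_dir.mkdir(parents=True, exist_ok=True)
clip_dir.mkdir(parents=True, exist_ok=True)

for json_path in ["data/lqa_train.json", "data/lqa_dev.json", "data/lqa_test.json"]:
    with open(json_path) as file:
        clips = json.load(file)
    for clip_id, clip in clips.items():
        full_path = full_dir / f"{clip['parent_video_id']}.mp4"
        if not full_path.exists():  # a parent video may contain several clips
            subprocess.run(["yt-dlp", "-f", "mp4", "-o", str(full_path),
                            f"https://www.youtube.com/watch?v={clip['parent_video_id']}"],
                           check=True)
        # Cut [start_time, end_time]; drop "-c copy" if frame-accurate cuts are needed.
        subprocess.run(["ffmpeg", "-i", str(full_path),
                        "-ss", str(clip["start_time"]), "-to", str(clip["end_time"]),
                        "-c", "copy", str(clip_dir / f"{clip_id}.mp4")],
                       check=True)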

Baselines

Check the scripts under run_scripts to run the available baselines.

TVQA

The TVQA baseline is run differently from the rest of the baselines.

We copied TVQA's repo content from commit 2c98044 into the TVQA/ folder.

Changes from upstream

The code has been changed to support four answer choices instead of five, along with some other minor modifications.

Setup

  1. Convert the LifeQA dataset to TVQA format:

    python scripts/to_tvqa_format.py
  2. Enter the TVQA/ directory:

    cd TVQA/
  3. Set up the Conda environment:

    conda env create -f environment.yml
    conda activate tvqa
  4. Do some pre-processing:

    python preprocessing.py --data_dir ../data/tvqa_format
    
    for i in 0 1 2 3 4; do
       python preprocessing.py --data_dir ../data/tvqa_format/fold${i}
    done
    
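    # build the vocabulary and embedding caches used by the LifeQA runs below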
    mkdir cache_lifeqa
    python tvqa_dataset.py \
      --input_streams sub \
      --no_ts \
      --vcpt_path ../data/tvqa_format/det_visual_concepts_hq.pickle \
      --train_path ../data/tvqa_format/lqa_train_processed.json \
      --valid_path ../data/tvqa_format/lqa_dev_processed.json \
      --test_path ../data/tvqa_format/lqa_test_processed.json \
      --word2idx_path cache_lifeqa/word2idx.pickle \
      --idx2word_path cache_lifeqa/idx2word.pickle \
      --vocab_embedding_path cache_lifeqa/vocab_embedding.pickle

Train and test on LifeQA dataset from scratch

For 5-fold cross-validation:

for i in 0 1 2 3 4; do
    python main.py \
      --input_streams sub vcpt \
      --no_ts \
      --vcpt_path ../data/tvqa_format/det_visual_concepts_hq.pickle \
      --train_path ../data/tvqa_format/fold${i}/train_processed.json \
      --valid_path ../data/tvqa_format/fold${i}/validation_processed.json \
      --test_path ../data/tvqa_format/fold${i}/test_processed.json \
      --word2idx_path cache_lifeqa/word2idx.pickle \
      --idx2word_path cache_lifeqa/idx2word.pickle \
      --vocab_embedding_path cache_lifeqa/vocab_embedding.pickle

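    # evaluate the run just trained above; ls -t lists results/ newest first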
    python test.py --model_dir $(ls -t results/ | head -1) --mode test
done

For the train, dev, and test partitions:

python main.py \
  --input_streams sub vcpt \
  --no_ts \
  --vcpt_path ../data/tvqa_format/det_visual_concepts_hq.pickle \
  --train_path ../data/tvqa_format/lqa_train_processed.json \
  --valid_path ../data/tvqa_format/lqa_dev_processed.json \
  --test_path ../data/tvqa_format/lqa_test_processed.json \
  --word2idx_path cache_lifeqa/word2idx.pickle \
  --idx2word_path cache_lifeqa/idx2word.pickle \
  --vocab_embedding_path cache_lifeqa/vocab_embedding.pickle

python test.py --model_dir $(ls -t results/ | head -1) --mode test

Train on TVQA dataset

python preprocessing.py

mkdir cache_original
python tvqa_dataset.py \
  --input_streams sub \
  --no_ts \
  --word2idx_path cache_original/word2idx.pickle \
  --idx2word_path cache_original/idx2word.pickle \
  --vocab_embedding_path cache_original/vocab_embedding.pickle
python main.py \
  --input_streams sub vcpt \
  --no_ts
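# remember the newest results/ directory so the test and fine-tune commands below can reuse this model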
RESULTS_FOLDER_NAME=$(ls -t results/ | head -1)

The result from this step was saved as results_2019_05_16_23_02_15 in Google Drive. Note that it corresponds to S+V+Q, with cpt as the video feature and without ts.

Test on LifeQA dataset

For 5-fold cross-validation:

for i in 0 1 2 3 4; do
  python test.py \
    --vcpt_path ../data/tvqa_format/det_visual_concepts_hq.pickle \
    --test_path ../data/tvqa_format/fold${i}/test_processed.json \
    --model_dir "${RESULTS_FOLDER_NAME}" \
    --mode test
done

For the test partition:

python test.py \
  --vcpt_path ../data/tvqa_format/det_visual_concepts_hq.pickle \
  --test_path ../data/tvqa_format/lqa_test_processed.json \
  --model_dir "${RESULTS_FOLDER_NAME}" \
  --mode test

Fine-tune on LifeQA dataset

For 5-fold cross-validation:

for i in 0 1 2 3 4; do
  python main.py \
    --input_streams sub vcpt \
    --no_ts \
    --vcpt_path ../data/tvqa_format/det_visual_concepts_hq.pickle \
    --train_path ../data/tvqa_format/fold${i}/train_processed.json \
    --valid_path ../data/tvqa_format/fold${i}/validation_processed.json \
    --test_path ../data/tvqa_format/fold${i}/test_processed.json \
    --word2idx_path cache_original/word2idx.pickle \
    --idx2word_path cache_original/idx2word.pickle \
    --vocab_embedding_path cache_original/vocab_embedding.pickle \
    --pretrained_model_dir "${RESULTS_FOLDER_NAME}" \
    --new_word2idx_path cache_lifeqa/word2idx.pickle
done

For the train, dev, and test partitions:

python main.py \
  --input_streams sub vcpt \
  --no_ts \
  --vcpt_path ../data/tvqa_format/det_visual_concepts_hq.pickle \
  --train_path ../data/tvqa_format/lqa_train_processed.json \
  --valid_path ../data/tvqa_format/lqa_dev_processed.json \
  --test_path ../data/tvqa_format/lqa_test_processed.json \
  --word2idx_path cache_original/word2idx.pickle \
  --idx2word_path cache_original/idx2word.pickle \
  --vocab_embedding_path cache_original/vocab_embedding.pickle \
  --pretrained_model_dir "${RESULTS_FOLDER_NAME}" \
  --new_word2idx_path cache_lifeqa/word2idx.pickle

Issues

If you encounter issues while using our data or code, please open an issue in this repo.

Citation

If you use LifeQA, please cite our paper:
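
A plausible BibTeX entry assembled from the details above (the citation key and the booktitle string are assumptions; prefer the official entry from the LREC 2020 proceedings):

@inproceedings{castro-etal-2020-lifeqa,
    title = "{L}ife{QA}: A Real-life Dataset for Video Question Answering",
    author = "Castro, Santiago and Azab, Mahmoud and Stroud, Jonathan C. and Noujaim, Cristina and Wang, Ruoyao and Deng, Jia and Mihalcea, Rada",
    booktitle = "Proceedings of the Twelfth Language Resources and Evaluation Conference (LREC 2020)",
    year = "2020",
}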
