

Zero-shot Natural Language Video Localization (ZSNLVL) by Pseudo-Supervised Video Localization (PSVL)

This repository contains the code for the paper "Zero-shot Natural Language Video Localization" (ICCV 2021, Oral).

We first propose a novel task: zero-shot natural language video localization. The proposed task setup requires no paired annotations for the NLVL task; it only requires readily available text corpora, an off-the-shelf object detector, and a collection of videos to localize. To address the task, we propose a Pseudo-Supervised Video Localization method, called PSVL, that generates pseudo-supervision for training an NLVL model. Benchmarked on two widely used NLVL datasets, the proposed method performs competitively, matching or outperforming models trained with stronger supervision.
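This README does not describe how PSVL builds its pseudo-supervision, so the following is only a toy illustration of the general idea, not the PSVL algorithm. It pairs a candidate temporal segment with a "pseudo-query" made of the object nouns detected most often in that segment, plus a verb looked up in a noun-to-verb table assumed to be mined from a text corpus. The function `make_pseudo_query`, the `noun_to_verb` table, and the per-frame label format are all hypothetical.

```python
from collections import Counter


def make_pseudo_query(frame_labels, start, end, noun_to_verb, fps=1.0, num_nouns=2):
    """Toy sketch: build a (pseudo_query, start_sec, end_sec) training triple.

    frame_labels: per-frame lists of object labels (assumed detector output).
    start, end:   frame indices [start, end) of a candidate temporal segment.
    noun_to_verb: hypothetical noun -> verb mapping mined from a text corpus.
    """
    # Count how often each object label is detected inside the segment.
    counts = Counter(label for labels in frame_labels[start:end] for label in labels)
    # Keep the most frequent nouns; ties keep first-encountered order.
    nouns = [noun for noun, _ in counts.most_common(num_nouns)]
    # Attach a verb associated with the top noun, if the table has one.
    verbs = [noun_to_verb[nouns[0]]] if nouns and nouns[0] in noun_to_verb else []
    query = " ".join(nouns + verbs)
    # Convert frame indices to seconds so the segment can serve as a target.
    return query, start / fps, end / fps
```

For example, with per-frame labels `[["person"], ["person", "door"], ["person", "door"], ["cup"]]` and the table `{"person": "open"}`, the segment covering frames 0 to 3 yields the pseudo-query `"person door open"` spanning 0.0 to 3.0 seconds.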



Environment

This repository is implemented with PyTorch on Anaconda.
Follow the instructions below, or use the Docker image (dcahn/psvl:latest).

Get the code

  • Clone this repository with git:
git clone https://github.com/gistvision/PSVL.git
  • Create your own conda environment (if you use the Docker environment, just clone the code and run it):
conda create --name PSVL --file requirements.txt
conda activate PSVL

Working environment

  • RTX2080Ti (11G)
  • Ubuntu 18.04.5
  • PyTorch 1.5.1
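To compare your setup against the tested environment above, a small check like the one below may help. It is a sketch that uses only the standard library plus an optional PyTorch install; `check_environment` is a hypothetical helper, not part of the repository, and a version mismatch is reported rather than treated as an error.

```python
import importlib.util
import platform

EXPECTED_TORCH = "1.5.1"  # PyTorch version listed in the working environment


def check_environment():
    """Report Python/PyTorch/CUDA info without failing if PyTorch is absent."""
    report = {"python": platform.python_version(), "torch": None, "cuda": False}
    # Only import torch if it is installed, so the check itself never crashes.
    if importlib.util.find_spec("torch") is not None:
        import torch

        report["torch"] = str(torch.__version__)
        report["cuda"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    info = check_environment()
    print(info)
    if info["torch"] and not info["torch"].startswith(EXPECTED_TORCH):
        print(f"Note: tested with PyTorch {EXPECTED_TORCH}, found {info['torch']}")
```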

Download

Dataset & Pretrained model

  • Download the video features used in this paper from this link.
    After downloading the video features, set the data path in a config file.

  • Download the pre-trained models from this link.

For ActivityNet-Captions, see the ActivityNet-Captions section of this document.

Evaluating pre-trained models

To evaluate a pre-trained model, use the command below.

python inference.py --model CrossModalityTwostageAttention --config "YOUR CONFIG PATH" --pre_trained "YOUR MODEL PATH"
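This README does not list the metrics that inference.py reports. NLVL benchmarks commonly report Recall@1 at temporal-IoU thresholds such as 0.3, 0.5, and 0.7, so a sketch of that metric may help when reading results. It assumes each prediction and ground truth is a `(start, end)` pair in seconds; `temporal_iou` and `recall_at_1` are illustrative names, not the repository's API.

```python
def temporal_iou(pred, gt):
    """Temporal IoU of two (start, end) segments."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    # Span of both segments: equals the union when they overlap; when they
    # are disjoint the intersection is 0, so the IoU is 0 either way.
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / union if union > 0 else 0.0


def recall_at_1(preds, gts, thresholds=(0.3, 0.5, 0.7)):
    """Fraction of queries whose top-1 segment reaches each IoU threshold."""
    ious = [temporal_iou(p, g) for p, g in zip(preds, gts)]
    if not ious:
        raise ValueError("need at least one prediction")
    return {t: sum(iou >= t for iou in ious) / len(ious) for t in thresholds}
```

For example, predictions `[(0.0, 5.0), (2.0, 6.0)]` against ground truths `[(1.0, 5.0), (8.0, 10.0)]` give IoUs of 0.8 and 0.0, so Recall@1 is 0.5 at every threshold.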

Training models from scratch

To train PSVL, run train.py with the command below.

# Training from scratch
python train.py --model CrossModalityTwostageAttention --config "YOUR CONFIG PATH"
# Evaluation
python inference.py --model CrossModalityTwostageAttention --config "YOUR CONFIG PATH" --pre_trained "YOUR MODEL PATH"
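If you prefer to drive training and evaluation from Python, a thin wrapper can assemble the same commands. This is a sketch: `build_command` and `run` are hypothetical helpers, the wrapper uses `sys.executable` in place of `python`, and it passes only the flags that appear in this README (`--model`, `--config`, `--pre_trained`, `--dataset`).

```python
import subprocess
import sys

MODEL = "CrossModalityTwostageAttention"  # model name used in this README


def build_command(script, config, pre_trained=None, dataset=None):
    """Assemble a train.py / inference.py invocation as an argument list."""
    cmd = [sys.executable, script, "--model", MODEL, "--config", config]
    if pre_trained is not None:
        cmd += ["--pre_trained", pre_trained]
    if dataset is not None:
        cmd += ["--dataset", dataset]
    return cmd


def run(cmd):
    """Run a command, raising CalledProcessError if it exits non-zero."""
    subprocess.run(cmd, check=True)
```

For example, `run(build_command("train.py", "YOUR CONFIG PATH"))` followed by `run(build_command("inference.py", "YOUR CONFIG PATH", pre_trained="YOUR MODEL PATH"))` mirrors the two commands above. Using an argument list rather than a shell string avoids quoting problems with paths that contain spaces.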

ActivityNet-Captions

  • Go to this repository and download the video features for ActivityNet-Captions.
    Place the data under /dataset/lgi_video_feature/anet_feats.

  • Other data can be downloaded from this link.
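Before training, it may help to confirm that the features are where this README expects them. The following is a minimal standard-library sketch; `check_feature_dir` is a hypothetical helper, and because the README does not state the feature file format, the default glob pattern matches every file in the directory.

```python
from pathlib import Path

# Feature location given in this README for ActivityNet-Captions.
ANET_FEATURE_DIR = Path("/dataset/lgi_video_feature/anet_feats")


def check_feature_dir(path=ANET_FEATURE_DIR, pattern="*"):
    """Return sorted files under `path`; raise if the directory is missing or empty."""
    path = Path(path)
    if not path.is_dir():
        raise FileNotFoundError(f"Feature directory not found: {path}")
    files = sorted(p for p in path.glob(pattern) if p.is_file())
    if not files:
        raise FileNotFoundError(f"No files matching {pattern!r} in {path}")
    return files


if __name__ == "__main__":
    try:
        print(f"Found {len(check_feature_dir())} feature files")
    except FileNotFoundError as err:
        print(err)
```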

Download the file, unzip it, and use the following commands to train and evaluate on this data.

To train the model, please run:

python train.py --model CrossModalityTwostageAttention --config configs/anet_simple_model/simplemodel_anet_BS256_two-stage_attention.yml --dataset anet

To evaluate on the test set, please run:

python inference.py --model CrossModalityTwostageAttention --config configs/anet_simple_model/simplemodel_anet_BS256_two-stage_attention.yml --pre_trained anet_pretrained_best.pth

License

MIT License

Citation

If you use this code, please cite:

@inproceedings{nam2021zero,
  title={Zero-shot Natural Language Video Localization},
  author={Nam, Jinwoo and Ahn, Daechul and Kang, Dongyeop and Ha, Seong Jong and Choi, Jonghyun},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={1470-1479},
  year={2021}
}

Contact

If you have any questions, please email us ([email protected], [email protected]).
