PyTorch code for the Findings of NAACL 2022 paper "Multimodal Intent Discovery from Livestream Videos"
This code has been tested with torch==1.9.0 and transformers==4.3.2. The only other requirement is moviepy, used for splicing videos.
We are releasing two datasets in this paper:
- Behance Intent Discovery Dataset: This dataset contains ~20K sentences with manual annotations for tool and creative intents (see paper), accompanied by timestamps for the livestream videos they were taken from. The files are available in the `./data/bid/` folder. Use `./scripts/download_videos.py` to download and splice the videos for the timestamps present in the dataset (see the splicing sketch below). We follow the HERO paper for extracting video representations; see the HERO repository for extraction code.
- Behance Livestreams Corpus: This is the larger unlabelled corpus containing nearly 8K full-length videos and their respective transcripts (download scripts coming soon).
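Splicing relies on moviepy. The snippet below is a minimal sketch of cutting a single clip given a start/end timestamp; the file paths, function name, and timestamps are placeholders, and the actual download-and-splice logic lives in `./scripts/download_videos.py`.

```python
# Minimal splicing sketch (placeholder paths/timestamps); the real logic is in
# ./scripts/download_videos.py.
from moviepy.editor import VideoFileClip

def splice_clip(video_path, start_sec, end_sec, out_path):
    """Cut the [start_sec, end_sec] segment out of a downloaded livestream video."""
    with VideoFileClip(video_path) as clip:
        clip.subclip(start_sec, end_sec).write_videofile(out_path, audio_codec="aac")

# Example with placeholder values:
# splice_clip("videos/stream_0001.mp4", 120.0, 135.5, "clips/stream_0001_clip0.mp4")
```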
The scripts for training the models presented in the paper are available under `./model/`.
To train the unimodal RoBERTa model on the Behance Intent Discovery dataset, run:

`bash behance_unimodal.sh <GPU_ID>`
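For reference, the unimodal text-only baseline amounts to a RoBERTa sequence classifier over the transcribed sentences. The sketch below is illustrative only (the label count and example sentence are placeholders); the actual model and training configuration are defined by the script above.

```python
# Illustrative unimodal RoBERTa intent classifier (not the repo's training script).
# num_labels and the example sentence/label are placeholders.
import torch
from transformers import RobertaTokenizerFast, RobertaForSequenceClassification

tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
model = RobertaForSequenceClassification.from_pretrained("roberta-base", num_labels=3)

batch = tokenizer(["now I grab the lasso tool to select this region"],
                  return_tensors="pt", padding=True, truncation=True)
labels = torch.tensor([1])                 # placeholder intent label
loss = model(**batch, labels=labels).loss  # plug into your optimizer of choice
loss.backward()
```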
To train the multimodal late fusion RoBERTa model on the Behance Intent Discovery dataset, run:

`bash behance_late_fusion.sh <feature_type> <path_to_feature_directory> <GPU_ID>`
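As a rough illustration of the late-fusion setup, the sketch below concatenates a pooled RoBERTa sentence representation with mean-pooled, pre-extracted video features (e.g. HERO or ClipBERT features) before a classification layer. The feature dimensionality, label count, and module structure are assumptions for illustration, not the repo's exact architecture.

```python
# Late-fusion sketch: pooled RoBERTa text features concatenated with mean-pooled
# precomputed video features. Dimensions/label count are illustrative assumptions.
import torch
import torch.nn as nn
from transformers import RobertaModel

class LateFusionClassifier(nn.Module):
    def __init__(self, video_feat_dim=2048, num_labels=3):
        super().__init__()
        self.text_encoder = RobertaModel.from_pretrained("roberta-base")
        hidden = self.text_encoder.config.hidden_size
        self.classifier = nn.Linear(hidden + video_feat_dim, num_labels)

    def forward(self, input_ids, attention_mask, video_feats):
        # video_feats: (batch, num_frames, video_feat_dim), loaded from the
        # feature directory passed to behance_late_fusion.sh
        text = self.text_encoder(input_ids=input_ids,
                                 attention_mask=attention_mask).pooler_output
        video = video_feats.mean(dim=1)  # mean-pool over frames
        return self.classifier(torch.cat([text, video], dim=-1))
```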
Dockerized containers for training HERO + Late Fusion and ClipBERT + Late Fusion models are coming soon.
The code in this repository has been adapted from the BOND and HERO codebases.