
MIntRec2.0


Features | Download | Dataset Description | Benchmark Framework | Quick Start

MIntRec2.0 is a large-scale multimodal, multi-party benchmark dataset for intent recognition and out-of-scope detection in conversations. We also provide a benchmark framework and evaluation code.

Example: see the example figure in the repository.

Updates 🔥 🔥 🔥

  • 1/2024 🎆 🎆 The first large-scale multimodal intent dataset has been released. Refer to the directory MIntRec2.0 for the dataset and code. Read the paper: MIntRec2.0: A Large-scale Benchmark Dataset for Multimodal Intent Recognition and Out-of-scope Detection in Conversations (published in ICLR 2024).
  • 10/2022 🎆 🎆 The first multimodal intent dataset was released. Refer to the directory MIntRec for the dataset and code. Read the paper: MIntRec: A New Dataset for Multimodal Intent Recognition (published in ACM MM 2022).

Features

MIntRec2.0 has the following features:

  • Large in Scale: Compared with our first multimodal intent recognition dataset (MIntRec), MIntRec2.0 increases the data scale from 2.2K to 15K utterances, covering 30 intent classes with 9.3K in-scope and 5.7K out-of-scope annotated utterances across text, video, and audio modalities.

  • Multi-turn & Multi-party Dialogues: It contains 1,245 dialogues with an average of 12 utterances per dialogue in continuous conversations. Each utterance in each dialogue has an intent label, and each dialogue has at least two different speakers with annotated speaker identities for every utterance (see the data-structure sketch after this list).

  • Out-of-scope Detection: As real-world dialogues occur in open-world scenarios, as suggested in TEXTOIR, we further include an OOS tag for utterances that do not belong to any of the existing intent classes. These can be used for out-of-distribution detection and to improve system robustness.
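
To make the annotation structure concrete, the sketch below shows one way an annotated multi-party dialogue could be represented in memory. This is an illustrative assumption for readability only: the class and field names (Utterance, Dialogue, video_clip, is_oos) and the example IDs and texts are hypothetical and do not describe the released annotation file format.

    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class Utterance:
        speaker: str            # annotated speaker identity
        text: str               # transcript of the utterance
        video_clip: str         # hypothetical path to the corresponding video clip
        intent: Optional[str]   # one of the 30 in-scope intent labels, or None
        is_oos: bool = False    # True if the utterance is tagged out-of-scope (OOS)

    @dataclass
    class Dialogue:
        dialogue_id: str                                            # hypothetical identifier
        utterances: List[Utterance] = field(default_factory=list)   # chronological order, >= 2 speakers

    # Illustrative two-speaker snippet with one out-of-scope utterance.
    example = Dialogue(
        dialogue_id="example_dialogue_001",
        utterances=[
            Utterance("Speaker_A", "Could you help me restock aisle four?", "clip_001.mp4", "ask for help"),
            Utterance("Speaker_B", "Sure, I'll be right there.", "clip_002.mp4", "agree"),
            Utterance("Speaker_B", "Did you see the game last night?", "clip_003.mp4", None, is_oos=True),
        ],
    )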

Download

Zenodo

A brief version of the dataset (text plus video and audio feature files, 7 GB) can be downloaded from Zenodo.

Feature data

We provide video feature files, audio feature files, and text annotation files (9 GB), which can be downloaded from Google Drive.

Raw data

We also provide raw video data (13 GB), which can be downloaded from Google Drive.

Dataset Description

  • Data sources: The raw videos are collected from three TV series: Superstore, The Big Bang Theory, and Friends.
  • Dialogue division: We manually divide dialogues based on scenes and episodes.
  • Speaker information: We manually annotate 21, 7, and 6 main characters in Superstore, The Big Bang Theory, and Friends, respectively.
  • Intent classes (the full taxonomy is also written out as a code mapping after this list)
    • Express emotions or attitudes (16): doubt, acknowledge, refuse, warn, emphasize, complain, praise, apologize, thank, criticize, care, agree, oppose, taunt, flaunt, joke
    • Achieve goals (14): ask for opinions, confirm, explain, invite, plan, inform, advise, arrange, introduce, comfort, leave, prevent, greet, ask for help
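
For convenience, the two coarse-grained categories and their 30 fine-grained intents listed above can be written as a simple Python mapping (a sketch for reference; the official label list ships with the annotation files):

    # Coarse-to-fine intent taxonomy: 2 coarse-grained categories, 30 fine-grained intents.
    INTENT_TAXONOMY = {
        "Express emotions or attitudes": [
            "doubt", "acknowledge", "refuse", "warn", "emphasize", "complain",
            "praise", "apologize", "thank", "criticize", "care", "agree",
            "oppose", "taunt", "flaunt", "joke",
        ],
        "Achieve goals": [
            "ask for opinions", "confirm", "explain", "invite", "plan", "inform",
            "advise", "arrange", "introduce", "comfort", "leave", "prevent",
            "greet", "ask for help",
        ],
    }

    # Sanity check against the counts given above.
    assert sum(len(v) for v in INTENT_TAXONOMY.values()) == 30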

Statistics

  • Number of coarse-grained intents: 2
  • Number of fine-grained intents: 30
  • Number of dialogues: 1,245
  • Number of utterances: 15,040
  • Number of words in utterances: 118,477
  • Number of unique words in utterances: 9,524
  • Average utterance length: 7.0
  • Maximum utterance length: 46
  • Average video clip duration: 3.0 s
  • Maximum video clip duration: 19.9 s
  • Total video duration: 12.3 h

Data distribution of in-scope (IS) and out-of-scope (OOS) samples:

Intent distribution:

Benchmark Framework

We present a framework to benchmark multimodal intent understanding and out-of-scope detection in both single-turn and multi-turn conversational scenarios.

The overall framework:

The framework contains five main modules:

  • Data Organization: Single-turn dialogues use utterance-level samples as inputs. Multi-turn dialogues are arranged chronologically, following the order in which the speakers take their turns.
  • Multimodal Feature Extraction: Features are extracted from the text, video, and audio modalities. For multi-turn dialogues, the context is concatenated with the current utterance and separated by a special token.
  • Multimodal Fusion: Multimodal fusion methods (e.g., MAG-BERT, MulT) can be used to fuse the different modalities.
  • Training: In-scope data is trained with a cross-entropy loss and out-of-scope data with an outlier exposure loss; a multimodal fusion loss may also be added to capture cross-modal interactions (see the sketch after this list).
  • Inference: Open set recognition methods (e.g., DOC) can be used to identify the K known classes and detect one out-of-scope class.
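
The sketch below illustrates, under simplifying assumptions, three of the steps above: concatenating dialogue context with the current utterance using a special token, combining a cross-entropy loss on in-scope samples with an outlier-exposure style loss on out-of-scope samples, and a simple confidence threshold as a stand-in for open set recognition at inference time. The separator token, the OOS label index, and the threshold are placeholders; this is not the exact code used in this repository.

    import torch
    import torch.nn.functional as F

    SEP_TOKEN = "[SEP]"  # placeholder for the special token separating context and current utterance

    def build_multiturn_input(context_utterances, current_utterance):
        """Concatenate the dialogue context with the current utterance,
        separated by a special token (multi-turn data organization)."""
        context = f" {SEP_TOKEN} ".join(context_utterances)
        return f"{context} {SEP_TOKEN} {current_utterance}" if context else current_utterance

    def intent_loss(logits, labels, oos_label=-1):
        """Cross-entropy on in-scope samples plus an outlier-exposure style term
        that pushes out-of-scope samples toward a uniform prediction."""
        in_scope = labels != oos_label
        loss = logits.new_zeros(())
        if in_scope.any():
            loss = loss + F.cross_entropy(logits[in_scope], labels[in_scope])
        if (~in_scope).any():
            oos_logits = logits[~in_scope]
            uniform = torch.full_like(oos_logits, 1.0 / oos_logits.size(-1))
            loss = loss + F.kl_div(F.log_softmax(oos_logits, dim=-1), uniform, reduction="batchmean")
        return loss

    def predict_open_set(logits, threshold=0.5, oos_label=-1):
        """Accept the arg-max class only when its probability exceeds a threshold;
        otherwise predict out-of-scope."""
        probs = F.softmax(logits, dim=-1)
        conf, pred = probs.max(dim=-1)
        pred[conf < threshold] = oos_label
        return pred

In practice, a dedicated open set recognition method such as DOC would replace the fixed softmax threshold shown here.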

Quick start

  1. Use Anaconda to create a Python environment

    conda create --name mintrec python=3.9
    conda activate mintrec
    
  2. Install PyTorch (CUDA version 11.3)

    conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
    
  3. Clone the MIntRec repository.

    git clone git@github.com:thuiar/MIntRec2.0.git
    cd MIntRec2.0
    
  4. Install the remaining dependencies

    pip install -r requirements.txt
    
  5. Run examples (taking MAG-BERT as an example; more scripts are available in the examples directory)

    sh examples/run_mag_bert_baselines.sh
    

Citations

If this work is helpful, or you want to use the code and results in this repo, please cite the following papers:

@inproceedings{MIntRec2.0,
   title={{MI}ntRec2.0: A Large-scale Benchmark Dataset for Multimodal Intent Recognition and Out-of-scope Detection in Conversations},
   author={Zhang, Hanlei and Wang, Xin and Xu, Hua and Zhou, Qianrui and Su, Jianhua and Zhao, Jinyue and Li, Wenrui and Chen, Yanting and Gao, Kai},
   booktitle={The Twelfth International Conference on Learning Representations},
   year={2024},
   url={https://openreview.net/forum?id=nY9nITZQjc}
}
@inproceedings{MIntRec,
   author = {Zhang, Hanlei and Xu, Hua and Wang, Xin and Zhou, Qianrui and Zhao, Shaojie and Teng, Jiayan},
   title = {MIntRec: A New Dataset for Multimodal Intent Recognition},
   year = {2022},
   booktitle = {Proceedings of the 30th ACM International Conference on Multimedia},
   pages = {1688–1697},
}

The dataset and the camera-ready version of the paper will be updated soon.
