
Unsupervised Multimodal Clustering for Semantics Discovery in Multimodal Utterances

The first unsupervised multimodal clustering method for multimodal semantics discovery.

Introduction

This repository contains the official PyTorch implementation of the research paper Unsupervised Multimodal Clustering for Semantics Discovery in Multimodal Utterances (Accepted by ACL 2024 Main Conference, Long Paper).

Dependencies

We use Anaconda to create the Python environment and install the required libraries:

conda create --name umc python=3.8

pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html

pip install -r requirements.txt

Datasets

  • MIntRec: The first multimodal intent recognition dataset (Paper, Resource)
  • MELD-DA: A multimodal multi-party dataset for emotion recognition in conversation (Paper, Resource)
  • IEMOCAP-DA: The Interactive Emotional Dyadic Motion Capture database (Paper, Resource)

For MELD-DA and IEMOCAP-DA, we use the well-annotated dialogue act (DA) labels from the EMOTyDA dataset (Paper, Resource).

Features Preparation

You can download the multimodal features from Baidu Cloud (code: swqe) or Google Drive.

An example of the data structure of one dataset is as follows:

Datasets/
├── MIntRec/
│   ├── train.tsv
│   ├── dev.tsv
│   ├── test.tsv
│   ├── video_data/
│   │   └── swin_feats.pkl
│   └── audio_data/
│       └── wavlm_feats.pkl
├── MELD-DA/
│   └── ...
└── IEMOCAP-DA/
    └── ...
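The released feature files are Python pickles. As a rough sketch (the exact keys and array layout inside swin_feats.pkl / wavlm_feats.pkl are assumptions here), each file can be thought of as a dict mapping utterance IDs to per-frame feature sequences, loaded the usual way:

```python
import os
import pickle
import tempfile

# Hypothetical layout: a dict mapping utterance IDs to per-frame
# feature vectors (plain lists stand in for numpy arrays here).
feats = {"utt_0": [[0.0] * 768 for _ in range(32)]}

# Write and read back a demo pickle the same way the released
# feature files would be loaded.
path = os.path.join(tempfile.gettempdir(), "swin_feats_demo.pkl")
with open(path, "wb") as f:
    pickle.dump(feats, f)
with open(path, "rb") as f:
    loaded = pickle.load(f)

print(len(loaded["utt_0"]), len(loaded["utt_0"][0]))  # 32 768
```

Inspect the real files the same way to confirm their actual structure before wiring them into a dataloader.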

The pre-trained BERT model can be downloaded from Baidu Cloud (code: v8tk).

Models

In this work, we propose UMC, a novel unsupervised multimodal clustering method. It introduces (1) a unique approach to constructing augmentation views for multimodal data, (2) an innovative strategy to dynamically select high-quality samples as guidance for representation learning, and (3) a combined learning approach that uses both high- and low-quality samples to learn representations conducive to clustering. The model architecture is as follows:
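To make the sampling idea concrete: this README does not spell out the exact selection criterion, so the following is only an illustrative heuristic (not necessarily the paper's), ranking samples within each cluster by distance to the cluster center and keeping the closest fraction as "high-quality" guidance:

```python
import math

def select_high_quality(embeddings, assignments, centers, keep_ratio=0.5):
    """Illustrative heuristic (not the paper's exact criterion):
    within each cluster, keep the keep_ratio fraction of samples
    closest to the cluster center as 'high-quality' guidance."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    # Group sample indices by their assigned cluster.
    by_cluster = {}
    for i, c in enumerate(assignments):
        by_cluster.setdefault(c, []).append(i)

    # Keep the closest keep_ratio fraction in each cluster.
    selected = []
    for c, idxs in by_cluster.items():
        idxs.sort(key=lambda i: dist(embeddings[i], centers[c]))
        keep = max(1, int(len(idxs) * keep_ratio))
        selected.extend(idxs[:keep])
    return sorted(selected)

# Toy example: two clusters; the farthest point in each is dropped.
embeddings = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0],
              [10.0, 10.0], [10.2, 10.0], [20.0, 20.0]]
assignments = [0, 0, 0, 1, 1, 1]
centers = [[0.0, 0.0], [10.0, 10.0]]
print(select_high_quality(embeddings, assignments, centers, 0.67))  # [0, 1, 3, 4]
```

In UMC the kept fraction is chosen dynamically during training rather than fixed; see the paper for the actual strategy.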

Framework

The high-quality sampling strategy is illustrated as follows:

Sampling

Usage

Clone the repository:

git clone git@github.com:thuiar/UMC.git

Run the experiments by:

sh examples/run_umc.sh

Quick start from Pretrain

The following example demonstrates a complete quickstart process using the MIntRec dataset.

Step 1: Download the dataset and pre-trained BERT model using the provided method, and place them in the UMC/ directory.

Step 2: Modify the parameters in configs/umc_MIntRec.py to include the pre-training process.

'pretrain': [True],

Step 3: Modify the parameters in the examples/run_umc.sh file to suit your needs, for example:

--data_path 'Datasets' \  # Change dataset address/path

--train \  # Include the training process

--save_model \  # Specify to save the model

--output_path "outputs"  # Store both pre-trained and final models
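Putting those flags together, the invocation inside examples/run_umc.sh might look like the following (the entry-point script name and the remaining arguments are assumptions; check the actual file in the repository):

```shell
# Hypothetical invocation; see examples/run_umc.sh for the real
# entry point and full argument list.
python run.py \
    --dataset 'MIntRec' \
    --data_path 'Datasets' \
    --train \
    --save_model \
    --output_path 'outputs'
```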

Step 4: Run the experiments by:

sh examples/run_umc.sh

Results

| Dataset    | Method     | NMI   | ARI   | ACC   | FMI   | Avg.  |
|------------|------------|-------|-------|-------|-------|-------|
| MIntRec    | SCCL       | 45.33 | 14.60 | 36.86 | 24.89 | 30.42 |
|            | CC         | 47.45 | 22.04 | 41.57 | 26.91 | 34.49 |
|            | USNID      | 47.91 | 21.52 | 40.32 | 26.58 | 34.08 |
|            | MCN        | 18.24 | 1.70  | 16.76 | 10.32 | 11.76 |
|            | UMC (Text) | 47.15 | 22.05 | 42.46 | 26.93 | 34.65 |
|            | UMC        | 49.26 | 24.67 | 43.73 | 29.39 | 36.76 |
| MELD-DA    | SCCL       | 22.42 | 14.48 | 32.09 | 27.51 | 24.13 |
|            | CC         | 23.03 | 13.53 | 25.13 | 24.86 | 21.64 |
|            | USNID      | 20.80 | 12.16 | 24.07 | 23.28 | 20.08 |
|            | MCN        | 8.34  | 1.57  | 18.10 | 15.31 | 10.83 |
|            | UMC (Text) | 19.57 | 16.29 | 33.40 | 30.81 | 25.02 |
|            | UMC        | 23.22 | 20.59 | 35.31 | 33.88 | 28.25 |
| IEMOCAP-DA | SCCL       | 21.90 | 10.90 | 26.80 | 24.14 | 20.94 |
|            | CC         | 23.59 | 12.99 | 25.86 | 24.42 | 21.72 |
|            | USNID      | 22.19 | 11.92 | 27.35 | 23.86 | 21.33 |
|            | MCN        | 8.12  | 1.81  | 16.16 | 14.34 | 10.11 |
|            | UMC (Text) | 20.01 | 18.15 | 32.76 | 31.10 | 25.64 |
|            | UMC        | 24.16 | 20.31 | 33.87 | 32.49 | 27.71 |
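The Avg. column is the arithmetic mean of the four metrics. For example, for UMC on MIntRec:

```python
# UMC on MIntRec: Avg. is the mean of NMI, ARI, ACC, and FMI.
nmi, ari, acc, fmi = 49.26, 24.67, 43.73, 29.39
avg = round((nmi + ari + acc + fmi) / 4, 2)
print(avg)  # 36.76
```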

Citations

If you are interested in this work and want to use the code or results in this repository, please star this repository and cite the following works:

@article{zhang2024unsupervised,
      title={Unsupervised Multimodal Clustering for Semantics Discovery in Multimodal Utterances}, 
      author={Hanlei Zhang and Hua Xu and Fei Long and Xin Wang and Kai Gao},
      year={2024},
      journal = {arXiv preprint arXiv:2405.12775},
}
@inproceedings{10.1145/3503161.3547906,
    author = {Zhang, Hanlei and Xu, Hua and Wang, Xin and Zhou, Qianrui and Zhao, Shaojie and Teng, Jiayan},
    title = {MIntRec: A New Dataset for Multimodal Intent Recognition},
    year = {2022},
    doi = {10.1145/3503161.3547906},
    booktitle = {Proceedings of the 30th ACM International Conference on Multimedia},
    pages = {1688–1697},
    numpages = {10},
}

Contributors

hanleizhang, moringfix
