declare-lab / multimodal-infomax

This repository contains the official implementation code of the paper Improving Multimodal Fusion with Hierarchical Mutual Information Maximization for Multimodal Sentiment Analysis, accepted at EMNLP 2021.

License: MIT License

Python 100.00%
multimodal-deep-learning multimodal-fusion multimodal-sentiment-analysis

multimodal-infomax's Introduction

MultiModal-InfoMax

🔥 If you are interested in other multimodal work from DeCLaRe Lab, please visit the clustered repository.

Introduction

MultiModal-InfoMax (MMIM) synthesizes fusion results from multi-modality input through a two-level mutual information (MI) maximization. We use the BA (Barber-Agakov) lower bound and contrastive predictive coding as the target functions to be maximized. To make the BA lower bound tractable during training, we design an entropy estimation module backed by a memory of historical data.
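As a rough illustration of the contrastive predictive coding objective mentioned above, here is a minimal plain-Python sketch of an InfoNCE-style loss. It is not the repository's implementation: the names info_nce, dot, queries and keys are hypothetical, the similarity score is assumed to be a plain dot product, and the toy vectors are made up. Positive pairs are assumed to sit on the diagonal of the score matrix.

```python
import math


def dot(u, v):
    """Dot product, used here as an assumed similarity score."""
    return sum(a * b for a, b in zip(u, v))


def info_nce(scores):
    """Mean of -log softmax(scores[i])[i] over rows.

    scores[i][j] is the similarity between query i and key j;
    the matching (positive) pair for row i is assumed to be column i.
    """
    total = 0.0
    for i, row in enumerate(scores):
        m = max(row)  # subtract the row max for numerical stability
        log_norm = m + math.log(sum(math.exp(s - m) for s in row))
        total += log_norm - row[i]
    return total / len(scores)


# Toy data: two "queries" and their matching "keys" (illustrative only).
queries = [[1.0, 0.0], [0.0, 1.0]]
keys = [[2.0, 0.0], [0.0, 2.0]]
scores = [[dot(q, k) for k in keys] for q in queries]
loss = info_nce(scores)

# With N candidates per row, log(N) - loss is the usual InfoNCE bound on MI.
mi_lower_bound = math.log(len(keys)) - loss
```

Minimizing this loss raises the bound, which is why a contrastive loss can stand in for direct MI maximization.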

Usage

  1. Download the CMU-MOSI and CMU-MOSEI datasets from Google Drive or Baidu Disk (extraction code: g3m2) and place them under the folder Multimodal-Infomax/datasets.

  2. Set up the environment (requires conda):

conda env create -f environment.yml
conda activate MMIM

  3. Start training:

python main.py --dataset mosi --contrast

Citation

Please cite our paper if you find our work useful for your research:

@inproceedings{han2021improving,
  title={Improving Multimodal Fusion with Hierarchical Mutual Information Maximization for Multimodal Sentiment Analysis},
  author={Han, Wei and Chen, Hui and Poria, Soujanya},
  booktitle={Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing},
  pages={9180--9192},
  year={2021}
}

Contact

Should you have any questions, feel free to contact me at [email protected]

multimodal-infomax's Issues

Minimum system requirements to run the MOSEI experiment?

Hi @Clement25 ,

I am trying to train the model on the MOSEI dataset. I have a Linux machine with 16GB RAM. The OOMD service kills the Python process because the script needs a lot of memory. I think it happens during dataset creation, when the large MOSEI pickle file is loaded and processed. Is there a guide to the minimum system requirements for this experiment? If not, do you have any recommendation for handling this issue? Thanks!

Question about the paper

Hi, your work is really great and inspiring to me, but after reading your paper I am still confused about some parts of it. Does LBA include lld in formula 4 (LBA = lld + H(y)), or is lld used only to update the parameters of the predictor in the first stage of training, so that it is not needed to calculate LBA? Looking forward to your reply.

environment

Hello, I encountered some issues while configuring the environment. It seems that there is an incompatibility between torch 1.7.0 and torchvision 0.4.2. What should I do to resolve this?

about the baseline results

Excuse me, why are the baseline results in the table copied from previous papers and not the results obtained by the author's own experiments?

Data download dead link

Hi,

Both links you provided to download data (Google drive and Baidu Disk) seem to be dead links (error 404 on github).

Can I use this work for just audio and text modalities?

Hi All,

I am working on a sentiment analysis project where I am trying to recognize sentiments in dyadic conversations. I only have access to audio and text data and don't have image/video data. Do you think I can use this work for my project? Is the code written in such a way that you can pick and choose a combination of modalities?

Thank you for your help!

About the code of MMILB

Great work, but I have a question about the MMILB class. At line 173 of src/modules/encoders.py I found an encoder:
self.entropy_prj=nn.Sequential(.......)
In the forward method, it seems that when estimating the entropy of Y, the code does not use the input embeddings of Y directly. Instead, it first passes the input embeddings through self.entropy_prj and uses the output to estimate the entropy of Y. I could not find this encoder in the paper. Why is it used?

Question about the feature extractor

Hi, I am confused about the feature extractors in your paper. As far as I know, COVAREP and P2FA are both acoustic feature extractors, but you use them for both the visual and acoustic modalities.

Dataset loading problem

Hello, first of all thank you for your hard work!
My question is: after downloading the MOSI data set, I encountered this problem during the loading of the data set:
Traceback (most recent call last):
  File "F:/MLP/2023_Summer/Git/MMIM/src/main.py", line 57, in <module>
    solver.train_and_eval()
  File "F:\MLP\2023_Summer\Git\MMIM\src\solver.py", line 281, in train_and_eval
    train_loss = train(model, optimizer_main, criterion, 1)
  File "F:\MLP\2023_Summer\Git\MMIM\src\solver.py", line 128, in train
    for i_batch, batch_data in enumerate(self.train_loader):
  File "F:\Anaconda\envs\self_mm\lib\site-packages\torch\utils\data\dataloader.py", line 435, in __next__
    data = self._next_data()
  File "F:\Anaconda\envs\self_mm\lib\site-packages\torch\utils\data\dataloader.py", line 475, in _next_data
    data = self._dataset_fetcher.fetch(index) # may raise StopIteration
  File "F:\Anaconda\envs\self_mm\lib\site-packages\torch\utils\data\_utils\fetch.py", line 47, in fetch
    return self.collate_fn(data)
  File "F:\MLP\2023_Summer\Git\MMIM\src\data_loader.py", line 135, in collate_fn
    bert_sentences = torch.LongTensor([sample["input_ids"] for sample in bert_details])
ValueError: expected sequence of length 50 at dim 1 (got 39)
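The ValueError above means that the "input_ids" lists within one batch have different lengths (50 and 39 here), so they cannot be stacked into a single rectangular tensor. A minimal plain-Python sketch of padding such lists to a common length before tensor conversion follows. The helper pad_to_max, the toy token ids, and the choice of 0 as the padding id are assumptions for illustration, and whether padding inside collate_fn is the right fix for this repository is not confirmed by the report.

```python
def pad_to_max(sequences, pad_value=0):
    """Right-pad every sequence to the length of the longest one.

    Returns the padded sequences and a 0/1 attention-style mask
    (1 = real token, 0 = padding). pad_value=0 is an assumption.
    """
    max_len = max(len(s) for s in sequences)
    padded = [list(s) + [pad_value] * (max_len - len(s)) for s in sequences]
    mask = [[1] * len(s) + [0] * (max_len - len(s)) for s in sequences]
    return padded, mask


# Toy batch with unequal lengths, mimicking the shape of bert_details.
bert_details = [
    {"input_ids": [101, 7592, 102]},
    {"input_ids": [101, 102]},
]
padded, mask = pad_to_max([d["input_ids"] for d in bert_details])
# padded is now rectangular and could be passed to torch.LongTensor(padded).
```

If padding is added this way, the mask should be passed along too, so that padded positions are ignored by the text encoder.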

Using test loss to choose the best model?

Hi,

In solver.py, lines 295-316, it seems that you are using the test loss (MAE) to choose the best model.

I don't think this is correct. Only the validation results should be used to choose the best model.
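To make the point concrete, here is a minimal sketch of selecting the checkpoint by validation MAE and only then reading off the test MAE at that epoch. The function select_best_epoch, the record keys, and all numbers are hypothetical and are not taken from solver.py or from any reported results.

```python
def select_best_epoch(history):
    """Pick the epoch with the lowest validation MAE.

    history is a list of dicts with keys "epoch", "valid_mae", "test_mae".
    The test MAE is only reported, never used for the selection.
    """
    best = min(history, key=lambda record: record["valid_mae"])
    return best["epoch"], best["test_mae"]


# Illustrative numbers only.
history = [
    {"epoch": 1, "valid_mae": 0.95, "test_mae": 0.90},
    {"epoch": 2, "valid_mae": 0.80, "test_mae": 0.85},
    {"epoch": 3, "valid_mae": 0.84, "test_mae": 0.78},
]
epoch, reported_test_mae = select_best_epoch(history)
# Epoch 2 is selected; epoch 3 has a lower test MAE, but picking it
# would leak test information into model selection.
```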

Gradient disappearance occurs during training

The following error occurs when executing "python main.py --dataset mosei --contrast". It seems that a problem during training causes NaN values to appear in the gradient calculation. I would like to know how to solve it.

The sota of MOSET dataset

Hello, could you tell me how to configure the model to get results closer to the SOTA reported in the paper? I have tried your parameters, but the results still fall short.

Any details of processing the raw dataset

Dear authors,

There are no details on how the features were obtained from the raw dataset, and no explanation of the extracted features. Can you provide these details?

Thanks!

Why does this issue occur when the GPU is available and the environment is configured properly?

torch==1.7.1
cuda==11.0
cuda.is_available==True
Start loading the data....
train
Training data loaded!
valid
Validation data loaded!
test
Test data loaded!
Finish loading the data....
[W ..\torch\csrc\autograd\python_anomaly_mode.cpp:104] Warning: Error detected in LogdetBackward. Traceback of forward call that caused the error:
File "main.py", line 62, in <module>
solver.train_and_eval()
File "D:\Project\CSCL\paper_test\Main_structure\src\solver.py", line 282, in train_and_eval
train_loss = train(model, optimizer_main, criterion, 1)
File "D:\Project\CSCL\paper_test\Main_structure\src\solver.py", line 158, in train
bert_sent, bert_sent_type, bert_sent_mask, y, mem)
File "D:\CODE_env\Anaconda\anaconda3\envs\CSCL\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "D:\Project\CSCL\paper_test\Main_structure\src\model.py", line 107, in forward
lld_ta, ta_pn, H_ta = self.mi_ta(x=text, y=acoustic, labels=y, mem=mem['ta'])
File "D:\CODE_env\Anaconda\anaconda3\envs\CSCL\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "D:\Project\CSCL\paper_test\Main_structure\src\modules\encoders.py", line 193, in forward
H = 0.25 * (torch.logdet(sigma_pos) + torch.logdet(sigma_neg))
(function print_stack)
Traceback (most recent call last):
File "main.py", line 62, in <module>
solver.train_and_eval()
File "D:\Project\CSCL\paper_test\Main_structure\src\solver.py", line 282, in train_and_eval
train_loss = train(model, optimizer_main, criterion, 1)
File "D:\Project\CSCL\paper_test\Main_structure\src\solver.py", line 189, in train
loss.backward()
File "D:\CODE_env\Anaconda\anaconda3\envs\CSCL\lib\site-packages\torch\tensor.py", line 221, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "D:\CODE_env\Anaconda\anaconda3\envs\CSCL\lib\site-packages\torch\autograd\__init__.py", line 132, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: cusolver error: 7, when calling cusolverDnCreate(handle)
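The forward line flagged in the first trace averages the log-determinants of two covariance matrices, which looks like a Gaussian entropy estimate with the constant terms dropped. As a minimal plain-Python sketch of that quantity (not the repository's code; cholesky, logdet_spd, gaussian_entropy_term and the jitter parameter are hypothetical), the following shows that the log-determinant is only defined for positive-definite matrices and that a small diagonal term keeps it finite for a rank-deficient estimate. Whether this is related to the cusolver error above is not established by the report.

```python
import math


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric matrix (list of lists)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low


def logdet_spd(a, jitter=0.0):
    """log det(a + jitter * I), computed from the Cholesky diagonal."""
    n = len(a)
    reg = [[a[i][j] + (jitter if i == j else 0.0) for j in range(n)]
           for i in range(n)]
    low = cholesky(reg)
    return 2.0 * sum(math.log(low[i][i]) for i in range(n))


def gaussian_entropy_term(sigma_pos, sigma_neg, jitter=1e-6):
    """Same form as the traced line: 0.25 * (logdet(pos) + logdet(neg))."""
    return 0.25 * (logdet_spd(sigma_pos, jitter) + logdet_spd(sigma_neg, jitter))


# Toy covariances: the first is rank-deficient, the second is well conditioned.
sigma_pos = [[1.0, 1.0], [1.0, 1.0]]
sigma_neg = [[2.0, 0.0], [0.0, 0.5]]
h = gaussian_entropy_term(sigma_pos, sigma_neg)
```

Without the jitter, logdet_spd(sigma_pos) raises, because the determinant of a rank-deficient matrix is zero and its logarithm is undefined.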
