declare-lab / multimodal-infomax

This repository contains the official implementation code of the paper Improving Multimodal Fusion with Hierarchical Mutual Information Maximization for Multimodal Sentiment Analysis, accepted at EMNLP 2021.

License: MIT License

Python 100.00%

multimodal-deep-learning multimodal-fusion multimodal-sentiment-analysis

multimodal-infomax's Introduction

MultiModal-InfoMax

🔥 If you are interested in other multimodal work from the DeCLaRe Lab, please visit the clustered repository.

Introduction

MultiModal-InfoMax (MMIM) synthesizes fusion results from multi-modality input through a two-level mutual information (MI) maximization. We use the BA (Barber-Agakov) lower bound and contrastive predictive coding as the target functions to be maximized. To facilitate the computation of the BA lower bound and the training process, we design an entropy estimation module with an associated history data memory.

Usage

Download the CMU-MOSI and CMU-MOSEI datasets from Google Drive or Baidu Disk (extraction code: g3m2). Place them under the folder Multimodal-Infomax/datasets.
Set up the environment (requires conda):

conda env create -f environment.yml
conda activate MMIM

Start training

python main.py --dataset mosi --contrast

Citation

Please cite our paper if you find our work useful for your research:

@inproceedings{han2021improving,
  title={Improving Multimodal Fusion with Hierarchical Mutual Information Maximization for Multimodal Sentiment Analysis},
  author={Han, Wei and Chen, Hui and Poria, Soujanya},
  booktitle={Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing},
  pages={9180--9192},
  year={2021}
}

Contact

Should you have any questions, feel free to contact me at [email protected]

multimodal-infomax's Issues

Minimum system requirements to run the MOSEI experiment?

Hi @Clement25 ,

I am trying to train the model on the MOSEI dataset. I have a Linux machine with 16 GB of RAM. The OOMD service kills the Python process because the script needs a lot of memory. I think this happens during dataset creation, when the large MOSEI pickle file is loaded and processed. Is there a guide to the minimum system requirements for this experiment? If not, do you have any recommendations for handling this issue? Thanks!

ValueError: expected sequence of length 50 at dim 1 (got 39)

File "/Multimodal-Infomax/src/data_loader.py", line 135, in collate_fn
bert_sentences = torch.LongTensor([sample["input_ids"] for sample in bert_details])
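
This error means the nested list passed to torch.LongTensor is ragged: one row has 50 ids and another has 39. One possible cause, stated here as an assumption and not confirmed for this repository, is that the tokenizer output is not padded to a common length. The sketch below shows the general remedy with a hypothetical helper, pad_to_length, and made-up token ids; pad_id=0 is also an assumption about the vocabulary in use.

```python
def pad_to_length(batch_ids, max_len, pad_id=0):
    """Truncate or right-pad every id list so all rows have exactly max_len entries."""
    padded = []
    for ids in batch_ids:
        row = list(ids)[:max_len]                # drop anything beyond max_len
        row += [pad_id] * (max_len - len(row))   # fill the remainder with pad_id
        padded.append(row)
    return padded


# Illustrative ids only; two rows of different lengths.
ragged = [[101, 7592, 102], [101, 2023, 2003, 1037, 102]]
rect = pad_to_length(ragged, max_len=5)
print([len(row) for row in rect])  # [5, 5]
```

Once every row has the same length, the nested list can be converted to a tensor. With Hugging Face tokenizers, a similar effect is usually obtained by passing padding="max_length" together with truncation=True and max_length to the tokenizer call.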

Question about the paper

Hi, your work is really great and inspiring to me, but after reading your paper I am still confused about some parts of it. Does L_BA include lld in formula 4 (L_BA = lld + H(y)), or is lld used only to update the parameters of the predictor in the first stage of training, without entering the L_BA calculation? Looking forward to your reply.
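
For reference, the standard Barber-Agakov bound can be written out as follows. This is the textbook derivation, not a statement about how the repository computes its loss; it assumes that lld in the question denotes the expected log-likelihood term under a variational predictor, written here as q_θ(y|x).

```latex
\begin{aligned}
I(X;Y) &= H(Y) - H(Y \mid X) \\
       &= H(Y) + \mathbb{E}_{p(x,y)}\big[\log p(y \mid x)\big] \\
       &= H(Y) + \mathbb{E}_{p(x,y)}\big[\log q_\theta(y \mid x)\big]
          + \mathbb{E}_{p(x)}\big[\mathrm{KL}\big(p(y \mid x)\,\|\,q_\theta(y \mid x)\big)\big] \\
       &\ge H(Y) + \mathbb{E}_{p(x,y)}\big[\log q_\theta(y \mid x)\big]
\end{aligned}
```

The inequality holds because the KL term is non-negative, so in this form the bound is the sum of an entropy term and a log-likelihood term.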

environment

Hello, I encountered some issues while configuring the environment. It seems that there is an incompatibility between torch 1.7.0 and torchvision 0.4.2. What should I do to resolve this?

about the baseline results

Excuse me, why are the baseline results in the table copied from previous papers and not the results obtained by the author's own experiments?

Data download dead link

Hi,

Both links you provided to download data (Google drive and Baidu Disk) seem to be dead links (error 404 on github).

acc/f1 calculation

Hi, thank you for your great work! However, there seems to be a small mistake. accuracy_score and f1_score, imported from sklearn.metrics, take the ground truth as the first argument:

f1_score / accuracy_score(y_true, y_pred)

The call in question is at
https://github.com/declare-lab/Multimodal-Infomax/tree/main/src/utils/eval_metrics.py#L47

You can check the signatures at https://scikit-learn.org/0.21/modules/classes.html#sklearn-metrics-metrics
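
To illustrate why the argument order can matter, here is a dependency-free sketch of a support-weighted F1 score, following scikit-learn's documented definition of average="weighted" (each label's F1 is weighted by its number of true instances). The function name weighted_f1 and the toy labels are made up for illustration, and whether the repository uses weighted averaging is an assumption.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Mean of per-label F1 scores, weighted by each label's count in y_true."""
    support = Counter(y_true)
    total = 0.0
    for label, count in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += count * f1          # weight by support taken from y_true
    return total / len(y_true)


y_true = [1, 1, 1, 1, 0, 0]   # toy ground truth
y_pred = [1, 1, 1, 0, 0, 0]   # toy predictions
print(round(weighted_f1(y_true, y_pred), 4))  # 0.8381
print(round(weighted_f1(y_pred, y_true), 4))  # 0.8286
```

Accuracy is unaffected by swapping the arguments, and so is each per-label F1; the weighted average changes only because the weights are taken from whichever array is passed first.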

Can I use this work for just audio and text modalities?

Hi All,

I am working on a sentiment analysis project where I am trying to recognize sentiments in dyadic conversations. I only have access to audio and text data and don't have image/video data. Do you think I can use this work for my project? Is the code written in such a way that you can pick and choose a combination of modalities?

Thank you for your help!

I got the following error when switching the dataset from mosi to mosei

FileNotFoundError: [Errno 2] No such file or directory: '../datasets/MOSEI/mosei_senti_data_noalign.pkl'
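
The path in the error is relative, so it is resolved against the current working directory. A small check like the one below can show whether the file is missing or the script is simply being run from a different directory. The helper name find_dataset_file and its parameters are hypothetical and not part of the repository.

```python
from pathlib import Path


def find_dataset_file(data_root, dataset, filename):
    """Return the resolved path of the file, or raise with a listing of what is there."""
    path = Path(data_root) / dataset.upper() / filename
    if path.is_file():
        return path.resolve()
    parent = path.parent
    # List the folder contents (if the folder exists) to make the mismatch visible.
    present = sorted(p.name for p in parent.iterdir()) if parent.is_dir() else []
    raise FileNotFoundError(
        f"{path} not found (cwd={Path.cwd()}); files in {parent}: {present}"
    )


# Example call, using the path from the error message:
# find_dataset_file("../datasets", "mosei", "mosei_senti_data_noalign.pkl")
```

The message includes the working directory and the files actually present, which helps distinguish a missing download from a file stored under a different name.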

About the code of MMILB

Great work, but I have a question about the MMILB class. At line 173 of src/modules/encoders.py I found an encoder:
self.entropy_prj=nn.Sequential(.......)
In the forward method, it seems that when estimating the entropy of Y, the code does not use the input embeddings of Y directly. Instead, it first passes the input embeddings through self.entropy_prj and uses the output to estimate the entropy of Y. I could not find this encoder in the paper. Why is it used?

Question for Forward lld (gaussian prior) and entropy estimation in MMILB Module.

Multimodal-Infomax/src/modules/encoders.py

Line 152 in 34f92d2

positive = -(mu - y)**2/2./torch.exp(logvar)

Is the "positive" vector (line 152 above) for p(y|x) ~ N(y | μ_θ1(x), σ²(x) I)? If so, where are the -(ln σ + C) terms from the probability density function of the normal distribution?
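
For comparison, the full log-density of a Gaussian with diagonal covariance is written out below. This is the standard formula, not a description of what the repository intends; d denotes the dimension of y, and the symbol names are chosen here for illustration.

```latex
\log \mathcal{N}\big(y \mid \mu, \operatorname{diag}(\sigma^2)\big)
  = -\sum_{i=1}^{d} \frac{(y_i - \mu_i)^2}{2\sigma_i^2}
    - \frac{1}{2}\sum_{i=1}^{d} \log \sigma_i^2
    - \frac{d}{2}\log(2\pi)
```

With logvar = log σ², the quoted line matches the first term elementwise. The second term would correspond to -logvar/2, and the last term is a constant that does not depend on the parameters.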

Question about the feature extractor

Hi, I am confused about the feature extractors in your paper. As far as I know, COVAREP and P2FA are both acoustic feature extractors, but you use them for both the visual and acoustic modalities.

how to use it?

Could you please tell me how to run the code?
Thank you.

Result is different from the paper for the mosi dataset

Hi @Clement25 ,

I trained the model on the mosi dataset and got the following numbers:

They look different from the ones reported in the paper:

Would you know what could have caused these differences? Thanks for your help!

Dataset loading problem

Hello, first of all thank you for your hard work!
My question is: after downloading the MOSI data set, I encountered this problem during the loading of the data set:
Traceback (most recent call last):
File "F:/MLP/2023_Summer/Git/MMIM/src/main.py", line 57, in <module>
solver.train_and_eval()
File "F:\MLP\2023_Summer\Git\MMIM\src\solver.py", line 281, in train_and_eval
train_loss = train(model, optimizer_main, criterion, 1)
File "F:\MLP\2023_Summer\Git\MMIM\src\solver.py", line 128, in train
for i_batch, batch_data in enumerate(self.train_loader):
File "F:\Anaconda\envs\self_mm\lib\site-packages\torch\utils\data\dataloader.py", line 435, in __next__
data = self._next_data()
File "F:\Anaconda\envs\self_mm\lib\site-packages\torch\utils\data\dataloader.py", line 475, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "F:\Anaconda\envs\self_mm\lib\site-packages\torch\utils\data\_utils\fetch.py", line 47, in fetch
return self.collate_fn(data)
File "F:\MLP\2023_Summer\Git\MMIM\src\data_loader.py", line 135, in collate_fn
bert_sentences = torch.LongTensor([sample["input_ids"] for sample in bert_details])
ValueError: expected sequence of length 50 at dim 1 (got 39)

Using test loss to choose the best model?

Hi,

In solver.py, lines 295-316, it seems that you are using the test loss (MAE) to choose the best model.

I don't think this is correct; only the validation results should be used to choose the best model.
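
The selection rule suggested in this issue can be sketched as follows: pick the epoch with the lowest validation MAE and report the test MAE of that same epoch. The function name, the record layout, and all numbers are invented for illustration and are not results from the repository.

```python
def select_by_validation(history):
    """Return the epoch record with the lowest validation MAE (test MAE is never consulted)."""
    return min(history, key=lambda record: record["valid_mae"])


# Made-up per-epoch metrics, for illustration only.
history = [
    {"epoch": 1, "valid_mae": 0.95, "test_mae": 0.90},
    {"epoch": 2, "valid_mae": 0.80, "test_mae": 0.85},
    {"epoch": 3, "valid_mae": 0.84, "test_mae": 0.78},
]

best = select_by_validation(history)
print(best["epoch"], best["test_mae"])  # 2 0.85
```

In this toy run, epoch 3 has the lowest test MAE, but choosing it would leak test information into model selection; the validation-based rule reports epoch 2 instead.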

Gradient disappearance occurs during training

The following error occurs when executing "python main.py --dataset mosei --contrast". It seems that a problem during training causes NaN values to appear in the gradient calculation. I would like to know how to solve it.
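
One place where non-finite values can arise in this kind of model is the log-determinant of an estimated covariance matrix (a traceback in another issue on this page shows a torch.logdet call in encoders.py). Whether that is the cause here is an assumption. The dependency-free sketch below, with a hypothetical function cholesky_logdet, shows the usual remedy of adding a small jitter to the diagonal so the matrix stays positive definite.

```python
import math


def cholesky_logdet(matrix, jitter=0.0):
    """Log-determinant of a symmetric matrix via Cholesky; None if not positive definite."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    logdet = 0.0
    for i in range(n):
        for j in range(i + 1):
            s = matrix[i][j] + (jitter if i == j else 0.0)  # jitter only on the diagonal
            s -= sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return None                   # singular or indefinite input
                lower[i][i] = math.sqrt(s)
                logdet += 2.0 * math.log(lower[i][i])
            else:
                lower[i][j] = s / lower[j][j]
    return logdet


singular = [[1.0, 1.0], [1.0, 1.0]]               # rank-1, determinant is zero
print(cholesky_logdet(singular))                  # None
print(cholesky_logdet(singular, jitter=1e-3) is not None)  # True
```

In PyTorch, the analogous step would be adding a small multiple of torch.eye(d) to the covariance before calling torch.logdet; the jitter size is a tuning choice.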

The SOTA of the MOSEI dataset

Hello, could you tell me how to configure the model to get results closer to the SOTA numbers reported in the paper? I have tried your parameters, but the results fall short.

Any details of processing the raw dataset

Dear authors,

There are no details on how the features are extracted from the raw dataset, and no explanation of the extracted features. Could you provide them?

Thanks!

Why does this issue occur when the GPU is available and the environment is configured properly?

torch==1.7.1
cuda==11.0
cuda.is_available==True
Start loading the data....
train
Training data loaded!
valid
Validation data loaded!
test
Test data loaded!
Finish loading the data....
[W ..\torch\csrc\autograd\python_anomaly_mode.cpp:104] Warning: Error detected in LogdetBackward. Traceback of forward call that caused the error:
File "main.py", line 62, in <module>
solver.train_and_eval()
File "D:\Project\CSCL\paper_test\Main_structure\src\solver.py", line 282, in train_and_eval
train_loss = train(model, optimizer_main, criterion, 1)
File "D:\Project\CSCL\paper_test\Main_structure\src\solver.py", line 158, in train
bert_sent, bert_sent_type, bert_sent_mask, y, mem)
File "D:\CODE_env\Anaconda\anaconda3\envs\CSCL\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "D:\Project\CSCL\paper_test\Main_structure\src\model.py", line 107, in forward
lld_ta, ta_pn, H_ta = self.mi_ta(x=text, y=acoustic, labels=y, mem=mem['ta'])
File "D:\CODE_env\Anaconda\anaconda3\envs\CSCL\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "D:\Project\CSCL\paper_test\Main_structure\src\modules\encoders.py", line 193, in forward
H = 0.25 * (torch.logdet(sigma_pos) + torch.logdet(sigma_neg))
(function print_stack)
Traceback (most recent call last):
File "main.py", line 62, in <module>
solver.train_and_eval()
File "D:\Project\CSCL\paper_test\Main_structure\src\solver.py", line 282, in train_and_eval
train_loss = train(model, optimizer_main, criterion, 1)
File "D:\Project\CSCL\paper_test\Main_structure\src\solver.py", line 189, in train
loss.backward()
File "D:\CODE_env\Anaconda\anaconda3\envs\CSCL\lib\site-packages\torch\tensor.py", line 221, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "D:\CODE_env\Anaconda\anaconda3\envs\CSCL\lib\site-packages\torch\autograd\__init__.py", line 132, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: cusolver error: 7, when calling cusolverDnCreate(handle)