man's Introduction

MAN

Multimodal Adversarial Network for Cross-modal Retrieval (PyTorch Code)

Abstract

Cross-modal retrieval aims to retrieve relevant samples across different modalities, which is important in numerous multimodal applications. Correlating multimodal data is challenging because of the large heterogeneity gap between distinct modalities. In this paper, we propose a Multimodal Adversarial Network (MAN) that projects multimodal data into a common space in which the similarities between different modalities can be computed directly with the same distance measure. MAN consists of multiple modality-specific generators, a discriminator, and a multimodal discriminant analysis (MDA) loss. Through adversarial learning, the generators are pitted against the discriminator to eliminate the cross-modal discrepancy. Furthermore, a novel MDA loss is proposed to preserve as much discrimination as possible in all available dimensions of the generated common representations. However, directly optimizing the MDA trace criterion is problematic. Specifically, the discriminant function overemphasizes 1) the large distances between already separated classes and 2) the dominant eigenvalues. These problems may lead to poorly discriminative common representations. To solve them, we propose a between-class strategy and an eigenvalue strategy that weaken the largest between-class differences and the dominant eigenvalues, respectively. To the best of our knowledge, MAN may be one of the first works specifically designed for multimodal representation learning (more than two modalities) with adversarial learning. To verify the effectiveness of the proposed method, extensive experiments are carried out on four widely used multimodal databases in comparison with 16 state-of-the-art approaches.
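To make the idea of a shared distance measure in a common space concrete, here is a minimal pure-Python sketch of cross-modal retrieval once embeddings already exist. It is not taken from the MAN code: the choice of cosine similarity, the function names, and the toy embeddings are all assumptions for illustration. Ranking quality is scored with mean average precision (mAP), a common metric for cross-modal retrieval.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_gallery(query, gallery):
    """Return gallery indices ordered from most to least similar to the query."""
    sims = [cosine(query, g) for g in gallery]
    return sorted(range(len(gallery)), key=lambda i: sims[i], reverse=True)

def average_precision(ranked_labels, query_label):
    """Average precision of one ranked list; an item is relevant
    when it shares the query's class label."""
    hits, total = 0, 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label == query_label:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

def mean_average_precision(queries, query_labels, gallery, gallery_labels):
    """Mean of the per-query average precisions."""
    aps = []
    for q, q_label in zip(queries, query_labels):
        order = rank_gallery(q, gallery)
        aps.append(average_precision([gallery_labels[i] for i in order], q_label))
    return sum(aps) / len(aps)

# Hypothetical common-space embeddings (made up, not produced by MAN).
image_emb = [[1.0, 0.1], [0.1, 1.0]]   # image queries
image_lab = [0, 1]
text_emb = [[0.9, 0.2], [0.2, 0.9], [0.8, 0.0]]   # text gallery
text_lab = [0, 1, 0]

print(mean_average_precision(image_emb, image_lab, text_emb, text_lab))
```

Because both modalities live in the same space, the same `cosine` function serves image-to-text and text-to-image queries; swapping the query and gallery arguments gives the reverse direction.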

Framework

Result

Citing MAN

If you find MAN useful in your research, please consider citing:

@article{hu2019multimodal,
  title={Multimodal adversarial network for cross-modal retrieval},
  author={Hu, Peng and Peng, Dezhong and Wang, Xu and Xiang, Yong},
  journal={Knowledge-Based Systems},
  volume={180},
  pages={38--50},
  year={2019},
  publisher={Elsevier}
}

man's Issues

About the corresponding paper

Dear author,
Thank you for your great work and code. I am trying to run your code on my computer, but I cannot find the corresponding paper. Could you upload it, or give me its title or a link? Thank you very much.

DCCA and DCCAE perform very poorly when using the same image and text features as MAN

Dear author,
Thank you for your great work and code. I have a question about the paper. I reimplemented DCCA and DCCAE based on their papers and tested them on the Wikipedia and Pascal Sentence datasets with the same image and text features as MAN. However, the retrieval performance (mAP) of the two methods reached only about 10%, which is far lower than what you reported.
I wonder whether you added any additional constraints when training these two methods. Could you please send me your code for them? My email address is [email protected].

Repository: penghu-cs / man (GitHub)
