man's Introduction

MAN

Multimodal Adversarial Network for Cross-modal Retrieval (PyTorch Code)

Abstract

Cross-modal retrieval aims to retrieve pertinent samples across different modalities, which is important in numerous multimodal applications. Correlating multimodal data is challenging because of the large heterogeneity gap between distinct modalities. In this paper, we propose a Multimodal Adversarial Network (MAN) that projects multimodal data into a common space in which similarities between different modalities can be computed directly with the same distance measure. MAN consists of multiple modality-specific generators, a discriminator, and a multimodal discriminant analysis (MDA) loss. Through adversarial learning, the generators are pitted against the discriminator to eliminate the cross-modal discrepancy. Furthermore, a novel MDA loss is proposed to preserve as much discrimination as possible in all available dimensions of the generated common representations. However, directly optimizing the MDA trace criterion is problematic. Specifically, the discriminant function overemphasizes 1) the large distances between already separated classes and 2) the dominant eigenvalues, which may lead to poorly discriminative common representations. To solve these problems, we propose a between-class strategy and an eigenvalue strategy that weaken the largest between-class differences and the dominant eigenvalues, respectively. To the best of our knowledge, MAN could be one of the first works specifically designed for multimodal representation learning (more than two modalities) with adversarial learning. To verify the effectiveness of the proposed method, extensive experiments are carried out on four widely used multimodal databases in comparison with 16 state-of-the-art approaches.
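To make the common-space idea concrete, here is a minimal sketch of retrieval once every modality has been projected into the same space. It is not the repository's code: the function names `cosine_similarity` and `rank_gallery`, the choice of cosine similarity as the shared measure, and the embedding values are all assumptions made for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_gallery(query, gallery):
    """Return gallery indices ordered from most to least similar to the query."""
    scores = [cosine_similarity(query, item) for item in gallery]
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)


# Hypothetical common-space embeddings (made-up numbers, not model output).
image_query = [0.9, 0.1, 0.0]
text_gallery = [
    [0.0, 1.0, 0.0],  # index 0
    [1.0, 0.2, 0.0],  # index 1
    [0.0, 0.0, 1.0],  # index 2
]

ranking = rank_gallery(image_query, text_gallery)
print(ranking)  # → [1, 0, 2]
```

Because both the image query and the text gallery live in one space, a single similarity function serves every pair of modalities, which is the property the abstract relies on.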

Framework

[Figure: MAN framework]

Result

Citing MAN

If you find MAN useful in your research, please consider citing:

@article{hu2019multimodal,
  title={Multimodal adversarial network for cross-modal retrieval},
  author={Hu, Peng and Peng, Dezhong and Wang, Xu and Xiang, Yong},
  journal={Knowledge-Based Systems},
  volume={180},
  pages={38--50},
  year={2019},
  publisher={Elsevier}
}

man's People

Contributors

penghu-cs

man's Issues

about the corresponding paper

Dear author,
Thank you for your great work and code. I am trying to run your code on my computer, but I cannot find the corresponding paper. Could you upload the paper or give me its title or a link to it? Thank you very much.

DCCA and DCCAE perform very poorly when using the same image and text features as MAN

Dear author,
Thank you for your great work and code. I have a question about the paper. I reproduced DCCA and DCCAE from the corresponding papers and tested them on the Wikipedia and Pascal Sentence datasets using the same image and text features as MAN. However, the retrieval performance (mAP) of the two methods reached only about 10%, which is far lower than what you reported.
I wonder whether you added any additional constraints when training these two methods. Could you please send me your code for these two methods? My email address is [email protected].
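For readers comparing mAP numbers like the ones in this issue, the following is a minimal sketch of how mean average precision is commonly computed for retrieval. It is an assumption about the protocol, not the authors' evaluation script: a retrieved item counts as relevant when it shares the query's class label, the whole ranked list is scored (no top-R cutoff), and the names `average_precision` and `mean_average_precision` are hypothetical.

```python
def average_precision(ranked_labels, query_label):
    """Average precision for one query over its full ranked result list."""
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label == query_label:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant hit
    return precision_sum / hits if hits else 0.0


def mean_average_precision(queries):
    """Mean of per-query AP; `queries` is a list of (query_label, ranked_labels)."""
    aps = [average_precision(ranked, label) for label, ranked in queries]
    return sum(aps) / len(aps)


# Toy example with made-up labels (not results from any dataset).
toy_queries = [
    ("a", ["a", "b", "a"]),  # AP = (1/1 + 2/3) / 2
    ("b", ["a", "b"]),       # AP = 1/2
]
score = mean_average_precision(toy_queries)
print(round(score, 4))  # → 0.6667
```

Differences in such protocol details (cutoff, relevance definition, query direction) can shift reported mAP, so they are worth checking before comparing against published numbers.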
