

Cross-modal Retrieval

1. Introduction

This is an open-source repository that collects cross-modal retrieval methods and code.

2. Supported Methods

The currently supported algorithms include:


2.1 Unsupervised cross-modal hashing retrieval


2.1.1 Unsupervised shallow cross-modal hashing retrieval


2.1.1.1 Matrix Factorization

2017
  • RFDH:Robust and Flexible Discrete Hashing for Cross-Modal Similarity Search(TCSVT) [PDF] [Code]
2015
  • STMH:Semantic Topic Multimodal Hashing for Cross-Media Retrieval(IJCAI)[PDF]
2014
  • LSSH:Latent Semantic Sparse Hashing for Cross-Modal Similarity Search(SIGIR)[PDF]

  • CMFH:Collective Matrix Factorization Hashing for Multimodal Data(CVPR)[PDF]
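
The matrix-factorization methods above (CMFH, LSSH, STMH, RFDH) share one template: factorize each modality's feature matrix into a modality-specific basis and a shared latent representation, then binarize the shared representation into hash codes. The sketch below is a simplified, illustrative version of that template in the spirit of CMFH, written with plain alternating least-squares updates; it is not the released code of any listed paper, and the variable names and hyperparameters are our own.

```python
import numpy as np

def collective_mf_hashing(X1, X2, n_bits=32, lam=0.5, gamma=1e-3, n_iter=30):
    """Toy collective matrix factorization hashing.

    X1: image features (d1 x n); X2: text features (d2 x n).
    Returns binary codes (n_bits x n) with entries in {-1, +1}.
    """
    d1, n = X1.shape
    rng = np.random.default_rng(0)
    U1 = rng.standard_normal((d1, n_bits))
    U2 = rng.standard_normal((X2.shape[0], n_bits))
    V = rng.standard_normal((n_bits, n))
    I = np.eye(n_bits)
    for _ in range(n_iter):
        # Ridge-regularized least-squares updates of the modality bases.
        U1 = X1 @ V.T @ np.linalg.inv(V @ V.T + gamma * I)
        U2 = X2 @ V.T @ np.linalg.inv(V @ V.T + gamma * I)
        # Update the shared latent representation used by both modalities.
        A = lam * U1.T @ U1 + (1 - lam) * U2.T @ U2 + gamma * I
        R = lam * U1.T @ X1 + (1 - lam) * U2.T @ X2
        V = np.linalg.solve(A, R)
    return np.sign(V)  # quantize the shared factor into hash codes
```

The published methods differ in how they regularize the shared factor, learn out-of-sample hash functions, and handle the discrete constraint; see the cited papers for the exact objectives.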

2.1.1.2 Graph Theory

2018
  • HMR:Hetero-Manifold Regularisation for Cross-Modal Hashing(TPAMI)[PDF]
2017
  • FSH:Cross-Modality Binary Code Learning via Fusion Similarity Hashing(CVPR)[PDF][Code]
2014
  • SM2H:Sparse Multi-Modal Hashing(TMM)[PDF]
2013
  • IMH:Inter-Media Hashing for Large-scale Retrieval from Heterogeneous Data Sources(SIGMOD)[PDF]

  • LCMH:Linear Cross-Modal Hashing for Efficient Multimedia Search(MM)[PDF]

2011
  • CVH:Learning Hash Functions for Cross-View Similarity Search(IJCAI)[PDF]

2.1.1.3 Other Shallow

2019
  • CRE:Collective Reconstructive Embeddings for Cross-Modal Hashing(TIP)[PDF]
2018
  • HMR:Hetero-Manifold Regularisation for Cross-Modal Hashing(TPAMI)[PDF]
2015
  • FS-LTE:Full-Space Local Topology Extraction for Cross-Modal Retrieval(TIP)[PDF]
2014
  • IMVH:Iterative Multi-View Hashing for Cross Media Indexing(MM)[PDF]
2013
  • PDH:Predictable Dual-View Hashing(ICML)[PDF]

2.1.1.4 Quantization

2016
  • CCQ:Composite Correlation Quantization for Efficient Multimodal Retrieval(SIGIR)[PDF]

  • CMCQ:Collaborative Quantization for Cross-Modal Similarity Search(CVPR)[PDF]

2015
  • ACQ:Alternating Co-Quantization for Cross-modal Hashing(ICCV)[PDF]

2.1.2 Unsupervised deep cross-modal hashing retrieval


2.1.2.1 Naive Network

2019
  • UDFCH:Unsupervised Deep Fusion Cross-modal Hashing(ICMI)[PDF]
2018
  • UDCMH:Unsupervised Deep Hashing via Binary Latent Factor Models for Large-scale Cross-modal Retrieval(IJCAI)[PDF]
2017
  • DBRC:Deep Binary Reconstruction for Cross-modal Hashing(MM)[PDF]
2015
  • DMHOR:Learning Compact Hash Codes for Multimodal Representations Using Orthogonal Deep Structure(TMM)[PDF]

2.1.2.2 GAN

2020
  • MGAH:Multi-Pathway Generative Adversarial Hashing for Unsupervised Cross-Modal Retrieval(TMM)[PDF]
2019
  • CYC-DGH:Cycle-Consistent Deep Generative Hashing for Cross-Modal Retrieval(TIP)[PDF]

  • UCH:Coupled CycleGAN: Unsupervised Hashing Network for Cross-Modal Retrieval(AAAI)[PDF]

2018
  • UGACH:Unsupervised Generative Adversarial Cross-modal Hashing(AAAI)[PDF][Code]

2.1.2.3 Graph Model

2022
  • ASSPH:Adaptive Structural Similarity Preserving for Unsupervised Cross Modal Hashing(MM)[PDF]
2021
  • AGCH:Aggregation-based Graph Convolutional Hashing for Unsupervised Cross-modal Retrieval(TMM)[PDF]

  • DGCPN:Deep Graph-neighbor Coherence Preserving Network for Unsupervised Cross-modal Hashing(AAAI)[PDF][Code]

2020
  • DCSH:Unsupervised Deep Cross-modality Spectral Hashing(TIP)[PDF]

  • SRCH:Set and Rebase: Determining the Semantic Graph Connectivity for Unsupervised Cross-Modal Hashing(IJCAI)[PDF]

  • JDSH:Joint-modal Distribution-based Similarity Hashing for Large-scale Unsupervised Deep Cross-modal Retrieval(SIGIR)[PDF][Code]

  • DSAH:Deep Semantic-Alignment Hashing for Unsupervised Cross-Modal Retrieval(ICMR)[PDF][Code]

2019
  • DJSRH:Deep Joint-Semantics Reconstructing Hashing for Large-Scale Unsupervised Cross-Modal Retrieval(ICCV)[PDF][Code]
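
A shared ingredient of the graph-based unsupervised methods above (DJSRH, DSAH, JDSH, DGCPN, AGCH) is a joint-modality affinity matrix built from feature similarities, which then acts as the supervision signal for the Hamming-space similarities. The snippet below is a generic sketch of constructing such a fused affinity matrix; the simple convex combination is an assumption for illustration, and each paper uses its own weighting or neighborhood scheme.

```python
import numpy as np

def cosine_sim(feats):
    """Cosine similarity matrix for features of shape (n, d)."""
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    return f @ f.T

def joint_affinity(img_feats, txt_feats, alpha=0.5):
    """Fuse intra-modal similarities into a single target affinity matrix."""
    S_img = cosine_sim(img_feats)
    S_txt = cosine_sim(txt_feats)
    return alpha * S_img + (1 - alpha) * S_txt  # papers differ in the fusion rule
```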

2.1.2.4 Knowledge Distillation

2022
  • DAEH:Deep Adaptively-Enhanced Hashing With Discriminative Similarity Guidance for Unsupervised Cross-Modal Retrieval(TCSVT)[PDF]
2021
  • KDCMH:Unsupervised Deep Cross-Modal Hashing by Knowledge Distillation for Large-scale Cross-modal Retrieval(ICMR)[PDF]

  • JOG:Joint-teaching: Learning to Refine Knowledge for Resource-constrained Unsupervised Cross-modal Retrieval(MM)[PDF]

2020
  • UKD:Creating Something from Nothing: Unsupervised Knowledge Distillation for Cross-Modal Hashing(CVPR)[PDF]

2.2 Supervised cross-modal hashing retrieval


2.2.1 Supervised shallow cross-modal hashing retrieval


2.2.1.1 Matrix Factorization

2022
  • SCLCH: Joint Specifics and Consistency Hash Learning for Large-Scale Cross-Modal Retrieval(TIP) [PDF]
2020
  • BATCH: A Scalable Asymmetric Discrete Cross-Modal Hashing(TKDE) [PDF] [Code]
2019
  • LCMFH: Label Consistent Matrix Factorization Hashing for Large-Scale Cross-Modal Similarity Search(TPAMI) [PDF]

  • TECH: A Two-Step Cross-Modal Hashing by Exploiting Label Correlations and Preserving Similarity in Both Steps(MM) [PDF]

2018
  • SCRATCH: A Scalable Discrete Matrix Factorization Hashing for Cross-Modal Retrieval(MM) [PDF]
2017
  • DCH: Learning Discriminative Binary Codes for Large-scale Cross-modal Retrieval(TIP) [PDF]
2016
  • SMFH: Supervised Matrix Factorization for Cross-Modality Hashing(IJCAI) [PDF]

  • SMFH: Supervised Matrix Factorization Hashing for Cross-Modal Retrieval(TIP) [PDF]

2.2.1.2 Dictionary Learning

2016
  • DCDH: Discriminative Coupled Dictionary Hashing for Fast Cross-Media Retrieval(MM) [PDF]
2014
  • DLCMH: Dictionary Learning Based Hashing for Cross-Modal Retrieval(SIGIR) [PDF]

2.2.1.3 Feature Mapping (Sample Constraint) (Label Constraint)

2022
  • DJSAH: Discrete Joint Semantic Alignment Hashing for Cross-Modal Image-Text Search(TCSVT) (PDF)
2020
  • FUH: Fast Unmediated Hashing for Cross-Modal Retrieval(TCSVT) (PDF)
2016
  • MDBE: Multimodal Discriminative Binary Embedding for Large-Scale Cross-Modal Retrieval(TIP) (PDF) [Code]

2.2.1.4 Feature Mapping (Sample Constraint) (Separate Hamming)

2017
  • CSDH: Sequential Discrete Hashing for Scalable Cross-Modality Similarity Retrieval(TIP) (PDF)
2016
  • DASH: Frustratingly Easy Cross-Modal Hashing(MM) (PDF)
2015
  • QCH: Quantized Correlation Hashing for Fast Cross-Modal Search(IJCAI) (PDF)

2.2.1.5 Feature Mapping (Sample Constraint) (Common Hamming)

2021
  • ASCSH: Asymmetric Supervised Consistent and Specific Hashing for Cross-Modal Retrieval(TIP) (PDF) [Code]
2019
  • SRDMH: Supervised Robust Discrete Multimodal Hashing for Cross-Media Retrieval(TMM) (PDF)
2018
  • FDCH: Fast Discrete Cross-modal Hashing With Regressing From Semantic Labels(MM) (PDF)
2017
  • SRSH: Semi-Relaxation Supervised Hashing for Cross-Modal Retrieval(MM) (PDF) [Code]

  • RoPH: Cross-Modal Hashing via Rank-Order Preserving(TMM) (PDF) [Code]

2016
  • SRDMH: Supervised Robust Discrete Multimodal Hashing for Cross-Media Retrieval(CIKM) (PDF)

2.2.1.6 Feature Mapping (Relation Constraint)

2017
  • LSRH: Linear Subspace Ranking Hashing for Cross-Modal Retrieval(TPAMI) (PDF)
2014
  • SCM: Large-Scale Supervised Multimodal Hashing with Semantic Correlation Maximization(AAAI) (PDF)

  • HTH: Scalable Heterogeneous Translated Hashing(KDD) (PDF)

2013
  • PLMH: Parametric Local Multimodal Hashing for Cross-View Similarity Search(IJCAI) (PDF)

  • RaHH: Comparing Apples to Oranges: A Scalable Solution with Heterogeneous Hashing(KDD) (PDF) [Code]

2012
  • CRH: Co-Regularized Hashing for Multimodal Data(NIPS) (PDF)

2.2.1.7 Other Shallow

2019
  • DLFH: Discrete Latent Factor Model for Cross-Modal Hashing(TIP) (PDF) [Code]
2018
  • SDMCH: Supervised Discrete Manifold-Embedded Cross-Modal Hashing(IJCAI) (PDF)
2015
  • SePH: Semantics-Preserving Hashing for Cross-View Retrieval(CVPR) (PDF)
2012
  • MLBE: A Probabilistic Model for Multimodal Hash Function Learning(KDD) (PDF)
2010
  • CMSSH: Data Fusion through Cross-modality Metric Learning using Similarity-Sensitive Hashing(CVPR) (PDF)

2.2.2 Supervised deep cross-modal hashing retrieval


2.2.2.1 Naive Network (Distance Constraint)

2019
  • MCITR: Cross-modal Image-Text Retrieval with Multitask Learning(CIKM) (PDF)
2016
  • CAH: Correlation Autoencoder Hashing for Supervised Cross-Modal Search(ICMR) (PDF)
2014
  • CMNNH: Cross-Media Hashing with Neural Networks(MM) (PDF)

  • MMNN: Multimodal Similarity-Preserving Hashing(TPAMI) (PDF)

2.2.2.2 Naive Network (Similarity Constraint)

2022
  • Bi-CMR: Bidirectional Reinforcement Guided Hashing for Effective Cross-Modal Retrieval(AAAI) (PDF) [Code]

  • Bi-NCMH: Deep Normalized Cross-Modal Hashing with Bi-Direction Relation Reasoning(CVPR) (PDF)

2021
  • OTCMR: Bridging Heterogeneity Gap with Optimal Transport for Cross-modal Retrieval(CIKM) (PDF)

  • DUCMH: Deep Unified Cross-Modality Hashing by Pairwise Data Alignment(IJCAI) (PDF)

2020
  • NRDH: Nonlinear Robust Discrete Hashing for Cross-Modal Retrieval(SIGIR) (PDF)

  • DCHUC: Deep Cross-Modal Hashing with Hashing Functions and Unified Hash Codes Jointly Learning(TKDE) (PDF) [Code]

2017
  • CHN: Correlation Hashing Network for Efficient Cross-Modal Retrieval(BMVC) (PDF)
2016
  • DVSH: Deep Visual-Semantic Hashing for Cross-Modal Retrieval(KDD) (PDF)

2.2.2.3 Naive Network (Negative Log-Likelihood)

2022
  • MSSPQ: Multiple Semantic Structure-Preserving Quantization for Cross-Modal Retrieval(ICMR) (PDF)
2021
  • DMFH: Deep Multiscale Fusion Hashing for Cross-Modal Retrieval(TCSVT) (PDF)

  • TEACH: Attention-Aware Deep Cross-Modal Hashing(ICMR) (PDF)

2020
  • MDCH: Mask Cross-modal Hashing Networks(TMM) (PDF)
2019
  • EGDH: Equally-Guided Discriminative Hashing for Cross-modal Retrieval(IJCAI) (PDF)
2018
  • DDCMH: Dual Deep Neural Networks Cross-Modal Hashing(AAAI) (PDF)

  • CMHH: Cross-Modal Hamming Hashing(ECCV) (PDF)

2017
  • PRDH: Pairwise Relationship Guided Deep Hashing for Cross-Modal Retrieval(AAAI) (PDF)

  • DCMH: Deep Cross-Modal Hashing(CVPR) (PDF) [Code]

2.2.2.4 Naive Network (Triplet Constraint)

2019
  • RDCMH: Rank-based Deep Cross-Modal Hashing(AAAI) (PDF)
2018
  • MCSCH: Multi-Scale Correlation for Sequential Cross-modal Hashing Learning(MM) (PDF)

  • TDH: Triplet-Based Deep Hashing Network for Cross-Modal Retrieval(TIP) (PDF)
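
The triplet-constraint methods above (TDH, MCSCH, RDCMH) build on a margin-based ranking loss across modalities: an anchor from one modality should be closer to a matching item from the other modality than to a non-matching one. The PyTorch sketch below illustrates only that constraint on continuous hash-layer outputs; it is not the full loss of any listed paper, which additionally handles quantization and bit balance.

```python
import torch.nn.functional as F

def cross_modal_triplet_loss(img_anchor, txt_pos, txt_neg, margin=0.5):
    """img_anchor, txt_pos, txt_neg: (batch, code_length) tensors."""
    d_pos = F.pairwise_distance(img_anchor, txt_pos)  # anchor vs. matching text
    d_neg = F.pairwise_distance(img_anchor, txt_neg)  # anchor vs. non-matching text
    return F.relu(margin + d_pos - d_neg).mean()
```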

2.2.2.5 GAN

2022
  • SCAHN: Semantic Structure Enhanced Contrastive Adversarial Hash Network for Cross-media Representation Learning(MM) (PDF) [Code]
2021
  • TGCR: Multiple Semantic Structure-Preserving Quantization for Cross-Modal Retrieval(TCSVT) (PDF)
2020
  • CPAH: Multi-Task Consistency-Preserving Adversarial Hashing for Cross-Modal Retrieval(TIP) (PDF) [Code]

  • MLCAH: Multi-Level Correlation Adversarial Hashing for Cross-Modal Retrieval(TMM) (PDF)

  • DADH: Deep Adversarial Discrete Hashing for Cross-Modal Retrieval(ICMR) (PDF) [Code]

2019
  • AGAH: Adversary Guided Asymmetric Hashing for Cross-Modal Retrieval(ICMR) (PDF) [Code]
2018
  • SSAH: Self-Supervised Adversarial Hashing Networks for Cross-Modal Retrieval(CVPR) (PDF) [Code]
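
The adversarial methods above (SSAH, AGAH, CPAH, DADH, MLCAH, SCAHN) commonly attach a modality discriminator that tries to tell whether a latent representation came from the image branch or the text branch, while the hashing networks are trained to fool it so that the two modalities become indistinguishable in code space. A bare-bones PyTorch sketch of that adversarial component follows; the network sizes are arbitrary and only the discriminator side of the objective is shown.

```python
import torch
import torch.nn as nn

class ModalityDiscriminator(nn.Module):
    """Predicts whether a code came from the image (label 1) or text (label 0) branch."""
    def __init__(self, code_len):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(code_len, 256), nn.ReLU(), nn.Linear(256, 1))

    def forward(self, codes):
        return self.net(codes).squeeze(-1)

def discriminator_loss(disc, img_codes, txt_codes):
    bce = nn.BCEWithLogitsLoss()
    img_logits, txt_logits = disc(img_codes), disc(txt_codes)
    # The hashing networks (generators) are trained with the flipped targets.
    return bce(img_logits, torch.ones_like(img_logits)) + bce(txt_logits, torch.zeros_like(txt_logits))
```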

2.2.2.6 Graph Model

2022
  • HMAH: Multi-Task Consistency-Preserving Adversarial Hashing for Cross-Modal Retrieval(TMM) (PDF)

  • SCAHN: Semantic Structure Enhanced Contrastive Adversarial Hash Network for Cross-media Representation Learning(MM) (PDF) [Code]

2021
  • LGCNH: Local Graph Convolutional Networks for Cross-Modal Hashing(MM) (PDF) [Code]
2019
  • GCH: Graph Convolutional Network Hashing for Cross-Modal Retrieval(IJCAI) (PDF) [Code]

2.2.2.7 Transformer

2022
  • DCHMT: Differentiable Cross-modal Hashing via Multimodal Transformers(CIKM) (PDF) [Code]

  • UniHash: Contrastive Label Correlation Enhanced Unified Hashing Encoder for Cross-modal Retrieval(MM) (PDF) [Code]

2.2.2.8 Memory Network

2021
  • CMPD: Using Cross Memory Network With Pair Discrimination for Image-Text Retrieval(TCSVT) (PDF)
2019
  • CMMN: Deep Memory Network for Cross-Modal Retrieval(TMM) (PDF)

2.2.2.9 Quantization

2022
  • ACQH: Asymmetric Correlation Quantization Hashing for Cross-Modal Retrieval(TMM) (PDF)
2017
  • CDQ: Collective Deep Quantization for Efficient Cross-Modal Retrieval(AAAI) (PDF) [Code]
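
Nearly all of the hashing methods in Sections 2.1 and 2.2 are evaluated with the same protocol: generate binary codes for the query modality and the database modality, rank the database by Hamming distance, and report mean average precision (mAP), where two items count as relevant if they share at least one label. The snippet below is a generic, self-contained sketch of that protocol and is not tied to any particular method in this list.

```python
import numpy as np

def hamming_distance(query_codes, db_codes):
    """Codes are in {-1, +1}; shapes (nq, b) and (nd, b)."""
    b = query_codes.shape[1]
    return 0.5 * (b - query_codes @ db_codes.T)  # for +-1 codes, d_H = (b - <q, d>) / 2

def mean_average_precision(query_codes, db_codes, query_labels, db_labels, topk=None):
    """mAP over Hamming ranking; labels are multi-hot arrays of shape (n, c)."""
    dist = hamming_distance(query_codes, db_codes)
    aps = []
    for i in range(query_codes.shape[0]):
        order = np.argsort(dist[i])              # ascending Hamming distance
        if topk is not None:
            order = order[:topk]
        relevant = (db_labels[order] @ query_labels[i]) > 0
        if not relevant.any():
            continue
        cum_hits = np.cumsum(relevant)
        precision_at_hits = cum_hits[relevant] / (np.flatnonzero(relevant) + 1)
        aps.append(precision_at_hits.mean())
    return float(np.mean(aps))
```

For image-to-text retrieval the query codes come from the image branch and the database codes from the text branch, and vice versa for text-to-image.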

2.3 Unsupervised cross-modal real-valued retrieval


2.3.1 Early unsupervised cross-modal real-valued retrieval


2.3.1.1 CCA

2017
  • ICCA:Towards Improving Canonical Correlation Analysis for Cross-modal Retrieval(MM) [PDF]
2015
  • DCMIT:Deep Correlation for Matching Images and Text(CVPR) [PDF]

  • RCCA:Learning Query and Image Similarities with Ranking Canonical Correlation Analysis(ICCV) [PDF]

2014
  • MCCA:A Multi-View Embedding Space for Modeling Internet Images, Tags, and Their Semantics(IJCV) [PDF]
2013
  • KCCA:Framing Image Description as a Ranking Task: Data, Models and Evaluation Metrics(JAIR) [PDF]

  • DCCA:Deep Canonical Correlation Analysis(ICML) [PDF] [Code]

2012
  • CR:Continuum Regression for Cross-modal Multimedia Retrieval(ICIP) [PDF]
2010
  • CCA:A New Approach to Cross-Modal Multimedia Retrieval(MM) [PDF][Code]
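
For readers new to the CCA family listed above, the classical recipe is: learn linear projections that maximize the correlation between paired image and text features, then retrieve across modalities by ranking in the shared subspace. The sketch below only illustrates that baseline with scikit-learn's CCA on toy data; it is not the code released with any of these papers.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
img_feats = rng.standard_normal((500, 128))  # paired image features (toy data)
txt_feats = rng.standard_normal((500, 64))   # paired text features (toy data)

cca = CCA(n_components=10)
cca.fit(img_feats, txt_feats)
img_c, txt_c = cca.transform(img_feats, txt_feats)  # project into the shared subspace

def l2norm(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

# Cross-modal retrieval: rank all texts for the first image by cosine similarity.
sims = l2norm(img_c[:1]) @ l2norm(txt_c).T
ranked_text_indices = np.argsort(-sims[0])
```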

2.3.1.2 Topic Model

2011
  • MDRF:Learning Cross-modality Similarity for Multinomial Data(ICCV) [PDF]
2010
  • tr-mmLDA:Topic Regression Multi-Modal Latent Dirichlet Allocation for Image Annotation(CVPR) [PDF]
2003
  • Corr-LDA:Modeling Annotated Data(SIGIR) [PDF]

2.3.1.3 Other Shallow

2013
  • Bi-CMSRM:Cross-Media Semantic Representation via Bi-directional Learning to Rank(MM) [PDF]

  • CTM:Cross-media Topic Mining on Wikipedia(MM) [PDF]

2012
  • CoCA:Dimensionality Reduction on Heterogeneous Feature Space(ICDM) [PDF]
2011
  • MCU:Maximum Covariance Unfolding: Manifold Learning for Bimodal Data(NIPS) [PDF]
2008
  • PAMIR:A Discriminative Kernel-Based Model to Rank Images from Text Queries(TPAMI) [PDF]
2003
  • CFA:Multimedia Content Processing through Cross-Modal Association(MM) [PDF]

2.3.1.4 Neural Network

2018
  • CDPAE:Comprehensive Distance-Preserving Autoencoders for Cross-Modal Retrieval(MM) [PDF][Code]
2016
  • CMDN:Cross-Media Shared Representation by Hierarchical Learning with Multiple Deep Networks(IJCAI) [PDF][Code]

  • MSAE:Effective deep learning-based multi-modal retrieval(VLDB) [PDF]

2014
  • Corr-AE:Cross-modal Retrieval with Correspondence Autoencoder(MM) [PDF]
2013
  • RGDBN:Latent Feature Learning in Social Media Network(MM) [PDF]
2012
  • MDBM:Multimodal Learning with Deep Boltzmann Machines(NIPS) [PDF]

2.3.2 Image-text matching retrieval


2.3.2.1 Naive Network

2022
  • UWML:Universal Weighting Metric Learning for Cross-Modal Retrieval (TPAMI) [PDF][Code]

  • LESS:Learning to Embed Semantic Similarity for Joint Image-Text Retrieval (TPAMI)[PDF]

  • CMCM:Cross-Modal Coherence for Text-to-Image Retrieval (AAAI) [PDF]

  • P2RM:Point to Rectangle Matching for Image Text Retrieval(MM) [PDF]

2020
  • DPCITE:Dual-path Convolutional Image-Text Embeddings with Instance Loss(TOMM) [PDF] [code]

  • PSN:Preserving Semantic Neighborhoods for Robust Cross-Modal Retrieval(ECCV) [PDF] [Code]

2019
  • LDR:Learning Disentangled Representation for Cross-Modal Retrieval with Deep Mutual Information Estimation(MM) [PDF]
2018
  • CHAIN-VSE:Bidirectional Retrieval Made Simple(CVPR) [PDF] [Code]
2017
  • CRC:Cross-media Relevance Computation for Multimedia Retrieval(MM) [PDF]

  • VSE++: Improving Visual-Semantic Embeddings with Hard Negatives(arXiv) [PDF][Code]

  • RRF-Net:Learning a Recurrent Residual Fusion Network for Multimodal Matching(ICCV) [PDF][Code]

2016
  • DBRLM:Cross-Modal Retrieval via Deep and Bidirectional Representation Learning(TMM) [PDF]
2015
  • MSDS:Image-Text Cross-Modal Retrieval via Modality-Specific Feature Learning(ICMR) [PDF]
2014
  • DT-RNN:Grounded Compositional Semantics for Finding and Describing Images with Sentences(TACL) [PDF]

2.3.2.2 Dot-product Attention

2020
  • SMAN: Stacked Multimodal Attention Network for Cross-Modal Image-Text Retrieval(TC) [PDF]

  • CAAN:Context-Aware Attention Network for Image-Text Retrieval(CVPR) [PDF]

  • IMRAM: Iterative Matching with Recurrent Attention Memory for Cross-Modal Image-Text Retrieval(CVPR) [PDF] [Code]

2019
  • PFAN:Position Focused Attention Network for Image-Text Matching (IJCAI) [PDF][Code]

  • CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval(ICCV) [PDF] [Code]

  • CMRSC:Cross-Modal Image-Text Retrieval with Semantic Consistency(MM) [PDF] [Code]

2018
  • MCSM:Modality-specific Cross-modal Similarity Measurement with Recurrent Attention Network(TIP) [PDF][Code]

  • DSVEL:Finding beans in burgers: Deep semantic-visual embedding with localization(CVPR) [PDF][Code]

  • CRAN:Cross-media Multi-level Alignment with Relation Attention Network(IJCAI)[PDF]

  • SCAN:Stacked Cross Attention for Image-Text Matching(ECCV) [PDF] [Code]

2017
  • sm-LSTM:Instance-aware Image and Sentence Matching with Selective Multimodal LSTM(CVPR) [PDF]

2.3.2.3 Graph Model

2022
  • LHSC:Learning Hierarchical Semantic Correspondences for Cross-Modal Image-Text Retrieval(ICMR) [PDF]

  • IFRFGF:Improving Fusion of Region Features and Grid Features via Two-Step Interaction for Image-Text Retrieval(MM) [PDF]

  • CODER:Coupled Diversity-Sensitive Momentum Contrastive Learning for Image-Text Retrieval(ECCV) [PDF]

2021
  • HSGMP: Heterogeneous Scene Graph Message Passing for Cross-modal Retrieval(ICMR) [PDF]

  • WCGL:Wasserstein Coupled Graph Learning for Cross-Modal Retrieval(ICCV)[PDF]

2020
  • DSRAN:Learning Dual Semantic Relations with Graph Attention for Image-Text Matching(TCSVT) [PDF] [code]

  • VSM:Visual-Semantic Matching by Exploring High-Order Attention and Distraction(CVPR) [PDF]

  • SGM:Cross-modal Scene Graph Matching for Relationship-aware Image-Text Retrieval(WACV) [PDF]

2019
  • KASCE:Knowledge Aware Semantic Concept Expansion for Image-Text Matching(IJCAI) [PDF]

  • VSRN:Visual Semantic Reasoning for Image-Text Matching(ICCV) [PDF] [Code]

2.3.2.4 Transformer

2022
  • DREN:Dual-Level Representation Enhancement on Characteristic and Context for Image-Text Retrieval(TCSVT) [PDF]

  • M2D-BERT:Multi-scale Multi-modal Dictionary BERT For Effective Text-image Retrieval in Multimedia Advertising(CIKM) [PDF]

  • ViSTA:ViSTA: Vision and Scene Text Aggregation for Cross-Modal Retrieval(CVPR) [PDF]

  • COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval(CVPR) [PDF]

  • EI-CLIP: Entity-aware Interventional Contrastive Learning for E-commerce Cross-modal Retrieval(CVPR) [PDF]

  • SSAMT:Constructing Phrase-level Semantic Labels to Form Multi-Grained Supervision for Image-Text Retrieval(ICMR) [PDF]

  • TEAM:Token Embeddings Alignment for Cross-Modal Retrieval(MM) [PDF]

  • CAliC: Accurate and Efficient Image-Text Retrieval via Contrastive Alignment and Visual Contexts Modeling(MM) [PDF]

2021
  • GRAN:Global Relation-Aware Attention Network for Image-Text Retrieval(ICMR) [PDF]

  • PCME:Probabilistic Embeddings for Cross-Modal Retrieval(CVPR) [PDF] [code]

2020
  • FashionBERT: Text and Image Matching with Adaptive Loss for Cross-modal Retrieval(SIGIR) [PDF]
2019
  • PVSE:Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval(CVPR) [PDF] [Code]

2.3.2.5 Cross-modal Generation

2022
  • PCMDA:Paired Cross-Modal Data Augmentation for Fine-Grained Image-to-Text Retrieval(MM)[PDF]
2021
  • CRGN:Deep Relation Embedding for Cross-Modal Retrieval(TIP) [PDF][Code]

  • X-MRS:Cross-Modal Retrieval and Synthesis (X-MRS): Closing the Modality Gap in Shared Representation Learning(MM) [PDF][Code]

2020
  • AACR:Augmented Adversarial Training for Cross-Modal Retrieval(TMM) [PDF] [Code]
2018
  • LSCO:Learning Semantic Concepts and Order for Image and Sentence Matching(CVPR) [PDF]

  • TCCM:Towards Cycle-Consistent Models for Text and Image Retrieval(CVPR) [PDF]

  • GXN:Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval with Generative Models(CVPR) [PDF]

2017
  • 2WayNet:Linking Image and Text with 2-Way Nets(CVPR) [PDF]
2015
  • DVSA:Deep Visual-Semantic Alignments for Generating Image Descriptions(CVPR) [PDF]
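
The image-text matching methods in Section 2.3.2 are typically reported with Recall@K: the fraction of queries whose ground-truth match appears in the top K of the similarity ranking. The sketch below assumes the simplest 1:1 setting in which caption i is the ground-truth match of image i; benchmark datasets with five captions per image require the usual bookkeeping on top of this.

```python
import numpy as np

def recall_at_k(sim, ks=(1, 5, 10)):
    """sim: (n_images, n_captions) similarity matrix; caption i matches image i."""
    ranks = []
    for i in range(sim.shape[0]):
        order = np.argsort(-sim[i])                    # captions sorted by similarity
        ranks.append(int(np.where(order == i)[0][0]))  # rank of the true caption
    ranks = np.asarray(ranks)
    return {f"R@{k}": float((ranks < k).mean()) for k in ks}
```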

2.4 Supervised cross-modal real-valued retrieval


2.4.1 Supervised shallow cross-modal real-valued retrieval


2.4.1.1 CCA

2022
  • MVMLCCA: Multi-view Multi-label Canonical Correlation Analysis for Cross-modal Matching and Retrieval(CVPRW) [PDF] [Code]
2015
  • ml-CCA: Multi-Label Cross-modal Retrieval(ICCV) [PDF] [Code]
2014
  • cluster-CCA: Cluster Canonical Correlation Analysis(AISTATS) [PDF]
2012
  • GMA: Generalized Multiview Analysis: A Discriminative Latent Space(CVPR) [PDF] [Code]

2.4.1.2 Dictionary Learning

2018
  • JDSLC: Joint Dictionary Learning and Semantic Constrained Latent Subspace Projection for Cross-Modal Retrieval(CIKM) [PDF]
2016
  • DDL: Discriminative Dictionary Learning With Common Label Alignment for Cross-Modal Retrieval(TMM) [PDF]
2014
  • CMSDL: Cross-Modality Submodular Dictionary Learning for Information Retrieval(CIKM) [PDF]
2013
  • SliM2: Supervised Coupled Dictionary Learning with Group Structures for Multi-Modal Retrieval(AAAI) [PDF]

2.4.1.3 Feature Mapping

2017
  • MDSSL: Cross-Modal Retrieval Using Multiordered Discriminative Structured Subspace Learning(TMM) [PDF]

  • JLSLR: Joint Latent Subspace Learning and Regression for Cross-Modal Retrieval(SIGIR) [PDF]

2016
  • JFSSL: Joint Feature Selection and Subspace Learning for Cross-Modal Retrieval(TPAMI) [PDF] [Code]

  • MDCR: Modality-Dependent Cross-Media Retrieval(TIST) [PDF]

  • CRLC: Cross-modal Retrieval with Label Completion(MM) [PDF]

2013
  • JGRHML: Heterogeneous Metric Learning with Joint Graph Regularization for Cross-Media Retrieval(AAAI) [PDF] [Code]

  • LCFS: Learning Coupled Feature Spaces for Cross-modal Matching(ICCV) [PDF]

2011
  • Multi-NPP: Learning Multi-View Neighborhood Preserving Projections(ICML) [PDF]

2.4.1.4 Topic Model

2014
  • M3R: Multi-modal Mutual Topic Reinforce Modeling for Cross-media Retrieval(MM) [PDF]

  • NPBUS: Nonparametric Bayesian Upstream Supervised Multi-Modal Topic Models(WSDM) [PDF]

2.4.1.5 Other Shallow

2019
  • CMOS: Online Asymmetric Metric Learning With Multi-Layer Similarity Aggregation for Cross-Modal Retrieval(TIP) [PDF]
2017
  • CMOS: Online Asymmetric Similarity Learning for Cross-Modal Retrieval(CVPR) [PDF]
2016
  • PL-ranking: A Novel Ranking Method for Cross-Modal Retrieval(MM) [PDF]

  • RL-PLS: Cross-modal Retrieval by Real Label Partial Least Squares(MM) [PDF]

2013
  • PFAR: Parallel Field Alignment for Cross Media Retrieval(MM) [PDF]

2.4.2 Supervised deep cross-modal real-valued retrieval


2.4.2.1 Naive Network

2022
  • C3CMR: Cross-Modality Cross-Instance Contrastive Learning for Cross-Media Retrieval(MM) [PDF]
2020
  • ED-Net: Event-Driven Network for Cross-Modal Retrieval(CIKM) [PDF]
2019
  • DSCMR: Deep Supervised Cross-modal Retrieval(CVPR) [PDF] [Code]

  • SAM: Cross-Modal Subspace Learning with Scheduled Adaptive Margin Constraints(MM) [PDF]

2017
  • deep-SM: Cross-Modal Retrieval With CNN Visual Features: A New Baseline(TCYB) [PDF] [Code]

  • CCL: Cross-modal Correlation Learning With Multigrained Fusion by Hierarchical Network(TMM) [PDF]

  • MSFN: Cross-media Retrieval by Learning Rich Semantic Embeddings of Multimedia(MM) [PDF]

  • MNiL: Multi-Networks Joint Learning for Large-Scale Cross-Modal Retrieval(MM) [PDF] [Code]

2016
  • MDNN: Effective deep learning-based multi-modal retrieval(VLDB) [PDF]
2015
  • RE-DNN: Deep Semantic Mapping for Cross-Modal Retrieval(ICTAI) [PDF]

  • C2MLR: Deep Compositional Cross-modal Learning to Rank via Local-Global Alignment(MM) [PDF]

2.4.2.2 GAN

2022
  • JFSE: Joint Feature Synthesis and Embedding: Adversarial Cross-Modal Retrieval Revisited(TPAMI) [PDF] [Code]
2021
  • AACR: Augmented Adversarial Training for Cross-Modal Retrieval(TMM) [PDF] [Code]
2018
  • CM-GANs: Cross-modal Generative Adversarial Networks for Common Representation Learning(TMM) [PDF] [Code]
2017
  • ACMR: Adversarial Cross-Modal Retrieval(MM) [PDF] [Code]

2.4.2.3 Graph Model

2022
  • AGCN: Adversarial Graph Convolutional Network for Cross-Modal Retrieval(TCSVT) [PDF]

  • ALGCN: Adaptive Label-Aware Graph Convolutional Networks for Cross-Modal Retrieval(TMM) [PDF]

  • HGE: Cross-Modal Retrieval with Heterogeneous Graph Embedding(MM) [PDF]

2021
  • GCR: Exploring Graph-Structured Semantics for Cross-Modal Retrieval(MM) [PDF] [Code]

  • DAGNN: Dual Adversarial Graph Neural Networks for Multi-label Cross-modal Retrieval(AAAI) [PDF]

2018
  • SSPE: Learning Semantic Structure-preserved Embeddings for Cross-modal Retrieval(MM) [PDF]

2.4.2.4 Transformer

2021
  • RLCMR: Rethinking Label-Wise Cross-Modal Retrieval from A Semantic Sharing Perspective(IJCAI) [PDF]

2.5 Cross-modal Retrieval under Special Retrieval Scenarios


2.5.1 Semi-Supervised (Real-valued)

2020
  • SSCMR:Semi-Supervised Cross-Modal Retrieval With Label Prediction(TMM) [PDF]
2019
  • A3VSE:Annotation Efficient Cross-Modal Retrieval with Adversarial Attentive Alignment(MM) [PDF]

  • ASFS:Adaptive Semi-Supervised Feature Selection for Cross-Modal Retrieval(TMM) [PDF]

2018
  • GSS-SL:Generalized Semi-supervised and Structured Subspace Learning for Cross-Modal Retrieval(TMM) [PDF]
2017
  • SSDC:Semi-supervised Distance Consistent Cross-modal Retrieval(VSCC)[PDF]
2013
  • JRL:Learning Cross-Media Joint Representation With Sparse and Semisupervised Regularization(TCSVT) [PDF][Code]
2012
  • MVML-GL:Multiview Metric Learning with Global Consistency and Local Smoothness(TIST) [PDF]

2.5.2 Semi-Supervised (Hashing)

2020
  • SCH-GAN:Semi-Supervised Cross-Modal Hashing by Generative Adversarial Network(TC) [PDF] [Code]

  • SGCH:Semi-supervised graph convolutional hashing network for large-scale cross-modal retrieval(ICIP) [PDF]

2019
  • SSDQ:Semi-supervised Deep Quantization for Cross-modal Search(MM) [PDF]

  • S3PH:Semi-supervised semantic-preserving hashing for efficient cross-modal retrieval(ICME) [PDF]

2017
  • AUSL:Adaptively Unified Semi-supervised Learning for Cross-Modal Retrieval(IJCAI) [PDF]
2016
  • NPH:Neighborhood-Preserving Hashing for Large-Scale Cross-Modal Search(MM) [PDF]

2.5.3 Imbalance (Real-valued)

2021
  • PAN: Prototype-based Adaptive Network for Robust Cross-modal Retrieval(SIGIR) [PDF]

  • MCCN: Multimodal Coordinated Clustering Network for Large-Scale Cross-modal Retrieval(MM) [PDF]

2020
  • DAVAE:Incomplete Cross-modal Retrieval with Dual-Aligned Variational Autoencoders(MM) [PDF]
2015
  • SCDL:Semi-supervised Coupled Dictionary Learning for Cross-modal Retrieval in Internet Images and Texts(MM) [PDF]

  • LGCFL:Learning Consistent Feature Representation for Cross-Modal Multimedia Retrieval(TMM) [PDF]

2.5.4 Imbalance (Hashing)

2020
  • RUCMH:Robust Unsupervised Cross-modal Hashing for Multimedia Retrieval(TOIS) [PDF]

  • ATFH-N:Adversarial Tri-Fusion Hashing Network for Imbalanced Cross-Modal Retrieval(TETCI) [PDF]

  • FlexCMH:Flexible Cross-Modal Hashing(TNNLS) [PDF]

2019
  • TFNH:Triplet Fusion Network Hashing for Unpaired Cross-Modal Retrieval(ICMR) [PDF] [Code]

  • CALM:Collective Affinity Learning for Partial Cross-Modal Hashing(TIP) [PDF]

  • MTFH: A Matrix Tri-Factorization Hashing Framework for Efficient Cross-Modal Retrieval(TIP) [PDF] [Code]

  • GSPH:Generalized Semantic Preserving Hashing for Cross-Modal Retrieval(TIP) [PDF]

2018
  • DAH:Dense Auto-Encoder Hashing for Robust Cross-Modality Retrieval(MM) [PDF]
2017
  • GSPH:Generalized Semantic Preserving Hashing for n-Label Cross-Modal Retrieval(CVPR) [PDF] [Code]

2.5.5 Incremental

2021
  • MARS: Learning Modality-Agnostic Representation for Scalable Cross-Media Retrieval(TCSVT) [PDF]

  • CCMR:Continual learning in cross-modal retrieval(CVPR) [PDF]

  • SCML:Real-world Cross-modal Retrieval via Sequential Learning(TMM) [PDF]

2020
  • ATTL-CEL:Adaptive Temporal Triplet-loss for Cross-modal Embedding Learning(MM)[PDF]
2019
  • SVHNs:Separated Variational Hashing Networks for Cross-Modal Retrieval(MM) [PDF]

  • ECMH:Extensible Cross-Modal Hashing(IJCAI) [PDF] [Code]

2018
  • TempXNet:Temporal Cross-Media Retrieval with Soft-Smoothing(MM) [PDF]

2.5.6 Noise

2022
  • DECL: Deep Evidential Learning with Noisy Correspondence for Cross-modal Retrieval(MM) (PDF) [Code]

  • ELRCMR: Early-Learning regularized Contrastive Learning for Cross-Modal Retrieval with Noisy Labels(MM) (PDF)

  • CMMQ: Mutual Quantization for Cross-Modal Search with Noisy Labels(CVPR) (PDF)

2021
  • MRL: Learning Cross-Modal Retrieval with Noisy Labels(CVPR) (PDF) [Code]
2018
  • WSJE: Webly Supervised Joint Embedding for Cross-Modal Image-Text Retrieval(MM) (PDF)

2.5.7 Cross-Domain

2021
  • M2GUDA: Multi-Metrics Graph-Based Unsupervised Domain Adaptation for Cross-Modal Hashing(ICMR) (PDF)

  • ACP: Adaptive Cross-Modal Prototypes for Cross-Domain Visual-Language Retrieval(CVPR) (PDF)

2020
  • DASG: Unsupervised Cross-Media Retrieval Using Domain Adaptation With Scene Graph(TCSVT) (PDF)

2.5.8 Zero-Shot

2020
  • LCALE: Learning Cross-Aligned Latent Embeddings for Zero-Shot Cross-Modal Retrieval(AAAI) (PDF)

  • CFSA: Correlated Features Synthesis and Alignment for Zero-shot Cross-modal Retrieval(SIGIR) (PDF)

2019
  • ZS-CMR: Learning Cross-Aligned Latent Embeddings for Zero-Shot Cross-Modal Retrieval(TIP) (PDF)

2.5.9 Few-Shot

2021
  • SOCMH: Know Yourself and Know Others: Efficient Common Representation Learning for Few-shot Cross-modal Retrieval(ICMR) (PDF)

2.5.10 Online Learning

2020
  • CMOLRS: Online Fast Adaptive Low-Rank Similarity Learning for Cross-Modal Retrieval(TMM) (PDF) [Code]

  • LEMON: Label Embedding Online Hashing for Cross-Modal Retrieval(MM) (PDF) [Code]

2019
  • FOMH: Flexible Online Multi-modal Hashing for Large-scale Multimedia Retrieval(MM) (PDF) [Code]
2017
  • OCMSR: Online Cross-Modal Scene Retrieval by Binary Representation and Semantic Graph(MM) (PDF)
2016
  • OCMH: Online cross-modal hashing for web image retrieval(AAAI) (PDF)

2.5.11 Hierarchical

2020
  • SHDCH: Supervised Hierarchical Deep Hashing for Cross-Modal Retrieval(MM) (PDF) [Code]
2019
  • HiCHNet: Supervised Hierarchical Cross-Modal Hashing(SIGIR) (PDF) [Code]

2.5.12 Fine-grained

2022
  • PCMDA: Paired Cross-Modal Data Augmentation for Fine-Grained Image-to-Text Retrieval(MM) (PDF)
2019
  • FGCrossNet: A New Benchmark and Approach for Fine-grained Cross-media Retrieval(MM) (PDF) [Code]

3. Usage

3.1 Datasets

  • Graph Model--GCR

Dataset Link:

Baidu Yun Link: https://pan.baidu.com/s/1YmW8Zz2uK3AgCs6pDEoA8A?pwd=21xh
Code: 21xh
  • Unsupervised cross-modal real-valued

Dataset link:

Baidu Yun Link: https://pan.baidu.com/s/1hBNo8gBSyLbik0ka1POhiQ
Code: cc53
  • Quantization--CDQ

Dataset Link:

Baidu Yun Link: https://pan.baidu.com/s/1mO1hdsJR2FN5xEAv2e7eaw?pwd=us9v
Code: us9v
  • GAN--CPAH

Dataset Link:

Baidu Yun Link: https://pan.baidu.com/s/145Zool0FUb3758EeSxtHBw?pwd=mxt7
Code: mxt7
  • Transformer--DCHMT

Dataset Link:

Baidu Yun Link: https://pan.baidu.com/s/1UHr2NVjFkTjLXXQ8Izy5WA?pwd=qfsj
Code: qfsj
  • Feature Mapping(Sample Constraint)(Label Constraint)--MDBE

Dataset Link:

Baidu Yun Link: https://pan.baidu.com/s/15BtQ_Zz7UihZBW6KXTTodA?pwd=ir7g
Code: ir7g
  • Feature Mapping(Sample Constraint)(Common Hamming)--RoPH

Dataset Link:

Baidu Yun Link: https://pan.baidu.com/s/1_uIulkuxcIcubvl5u3zsOA?pwd=46c4
Code: 46c4
  • Online learning--SHDCH

Dataset Link:

Baidu Yun Link: https://pan.baidu.com/s/1-CsIJbvz3IFsmDgYk9BwYg?pwd=7hd8
Code: 7hd8
  • Noise--MRL

Dataset Link:

Baidu Yun Link: https://pan.baidu.com/s/1FIrB-gXJa9VHKzLRQZf30Q?pwd=g3qt
Code: g3qt
  • Online learning--LEMON

Dataset Link:

Baidu Yun Link: https://pan.baidu.com/s/1s5SnnAXo5wK7cmRs3zNq4w?pwd=jxjo
Code: jxjo
  • Fine-grained--FGCrossNet

Dataset Link:

Baidu Yun Link: https://pan.baidu.com/s/1OYxCLmNKvPzwLIs5snTOlA?pwd=r80g
Code: r80g
  • Noise--DECL

Dataset Link:

Baidu Yun Link: https://pan.baidu.com/s/1FcxkwOuuiUXnIl1LAatDLA?pwd=nl2z
Code: nl2z
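
The Baidu Yun archives above generally ship pre-extracted features and labels, but the exact file names and key names differ between methods, so check each archive's own README. As an illustration only, many hashing benchmarks are distributed as MATLAB .mat files, which can be inspected as follows; the file name and key names in this sketch are hypothetical placeholders, not guaranteed contents of any specific archive.

```python
import scipy.io as sio

# Hypothetical example: replace the path and keys with those in the archive you downloaded.
data = sio.loadmat("mirflickr25k.mat")
print(sorted(k for k in data if not k.startswith("__")))  # list the stored variables

img_train = data.get("I_tr")  # training image features, if stored under this key
txt_train = data.get("T_tr")  # training text features
lab_train = data.get("L_tr")  # training labels (multi-hot)
```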

Contributors

futuretwt, styx29-0, wangbowen7, xiaolaohuqiao
