

Cross-modal Retrieval

1. Introduction

This is an open-source repository that collects cross-modal retrieval methods and code.

2. Supported Methods

The currently supported algorithms include:


2.1 Unsupervised cross-modal hashing retrieval


2.1.1 Unsupervised shallow cross-modal hashing retrieval


2.1.1.1 Matrix Factorization

2017
  • RFDH:Robust and Flexible Discrete Hashing for Cross-Modal Similarity Search(TCSVT) [PDF] [Code]
2015
  • STMH:Semantic Topic Multimodal Hashing for Cross-Media Retrieval(IJCAI)[PDF]
2014
  • LSSH:Latent Semantic Sparse Hashing for Cross-Modal Similarity Search(SIGIR)[PDF]

  • CMFH:Collective Matrix Factorization Hashing for Multimodal Data(CVPR)[PDF]
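
The matrix-factorization methods above (CMFH, LSSH, STMH, RFDH) share one template: factorize each modality's feature matrix into a modality-specific basis and a shared latent representation, then binarize the shared representation into hash codes. The sketch below is a simplified, illustrative version of that template in the spirit of CMFH, written with plain alternating least-squares updates; it is not the released code of any listed paper, and the variable names and hyperparameters are our own.

```python
import numpy as np

def collective_mf_hashing(X1, X2, n_bits=32, lam=0.5, gamma=1e-3, n_iter=30):
    """Toy collective matrix factorization hashing.

    X1: image features (d1 x n); X2: text features (d2 x n).
    Returns binary codes (n_bits x n) with entries in {-1, +1}.
    """
    d1, n = X1.shape
    rng = np.random.default_rng(0)
    U1 = rng.standard_normal((d1, n_bits))
    U2 = rng.standard_normal((X2.shape[0], n_bits))
    V = rng.standard_normal((n_bits, n))
    I = np.eye(n_bits)
    for _ in range(n_iter):
        # Ridge-regularized least-squares updates of the modality bases.
        U1 = X1 @ V.T @ np.linalg.inv(V @ V.T + gamma * I)
        U2 = X2 @ V.T @ np.linalg.inv(V @ V.T + gamma * I)
        # Update the shared latent representation used by both modalities.
        A = lam * U1.T @ U1 + (1 - lam) * U2.T @ U2 + gamma * I
        R = lam * U1.T @ X1 + (1 - lam) * U2.T @ X2
        V = np.linalg.solve(A, R)
    return np.sign(V)  # quantize the shared factor into hash codes
```

The published methods differ in how they regularize the shared factor, learn out-of-sample hash functions, and handle the discrete constraint; see the cited papers for the exact objectives.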

2.1.1.2 Graph Theory

2018
  • HMR:Hetero-Manifold Regularisation for Cross-Modal Hashing(TPAMI)[PDF]
2017
  • FSH:Cross-Modality Binary Code Learning via Fusion Similarity Hashing(CVPR)[PDF][Code]
2014
  • SM2H:Sparse Multi-Modal Hashing(TMM)[PDF]
2013
  • IMH:Inter-Media Hashing for Large-scale Retrieval from Heterogeneous Data Sources(SIGMOD)[PDF]

  • LCMH:Linear Cross-Modal Hashing for Efficient Multimedia Search(MM)[PDF]

2011
  • CVH:Learning Hash Functions for Cross-View Similarity Search(IJCAI)[PDF]

2.1.1.3 Other Shallow

2019
  • CRE:Collective Reconstructive Embeddings for Cross-Modal Hashing(TIP)[PDF]
2018
  • HMR:Hetero-Manifold Regularisation for Cross-Modal Hashing(TPAMI)[PDF]
2015
  • FS-LTE:Full-Space Local Topology Extraction for Cross-Modal Retrieval(TIP)[PDF]
2014
  • IMVH:Iterative Multi-View Hashing for Cross Media Indexing(MM)[PDF]
2013
  • PDH:Predictable Dual-View Hashing(ICML)[PDF]

2.1.1.4 Quantization

2016
  • CCQ:Composite Correlation Quantization for Efficient Multimodal Retrieval(SIGIR)[PDF]

  • CMCQ:Collaborative Quantization for Cross-Modal Similarity Search(CVPR)[PDF]

2015
  • ACQ:Alternating Co-Quantization for Cross-modal Hashing(ICCV)[PDF]

2.1.2 Unsupervised deep cross-modal hashing retrieval


2.1.2.1 Naive Network

2019
  • UDFCH:Unsupervised Deep Fusion Cross-modal Hashing(ICMI)[PDF]
2018
  • UDCMH:Unsupervised Deep Hashing via Binary Latent Factor Models for Large-scale Cross-modal Retrieval(IJCAI)[PDF]
2017
  • DBRC:Deep Binary Reconstruction for Cross-modal Hashing(MM)[PDF]
2015
  • DMHOR:Learning Compact Hash Codes for Multimodal Representations Using Orthogonal Deep Structure(TMM)[PDF]

2.1.2.2 GAN

2020
  • MGAH:Multi-Pathway Generative Adversarial Hashing for Unsupervised Cross-Modal Retrieval(TMM)[PDF]
2019
  • CYC-DGH:Cycle-Consistent Deep Generative Hashing for Cross-Modal Retrieval(TIP)[PDF]

  • UCH:Coupled CycleGAN: Unsupervised Hashing Network for Cross-Modal Retrieval(AAAI)[PDF]

2018
  • UGACH:Unsupervised Generative Adversarial Cross-modal Hashing(AAAI)[PDF][Code]

2.1.2.3 Graph Model

2022
  • ASSPH:Adaptive Structural Similarity Preserving for Unsupervised Cross Modal Hashing(MM)[PDF]
2021
  • AGCH:Aggregation-based Graph Convolutional Hashing for Unsupervised Cross-modal Retrieval(TMM)[PDF]

  • DGCPN:Deep Graph-neighbor Coherence Preserving Network for Unsupervised Cross-modal Hashing(AAAI)[PDF][Code]

2020
  • DCSH:Unsupervised Deep Cross-modality Spectral Hashing(TIP)[PDF]

  • SRCH:Set and Rebase: Determining the Semantic Graph Connectivity for Unsupervised Cross-Modal Hashing(IJCAI)[PDF]

  • JDSH:Joint-modal Distribution-based Similarity Hashing for Large-scale Unsupervised Deep Cross-modal Retrieval(SIGIR)[PDF][Code]

  • DSAH:Deep Semantic-Alignment Hashing for Unsupervised Cross-Modal Retrieval(ICMR)[PDF][Code]

2019
  • DJSRH:Deep Joint-Semantics Reconstructing Hashing for Large-Scale Unsupervised Cross-Modal Retrieval(ICCV)[PDF][Code]
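
A shared ingredient of the graph-based unsupervised methods above (DJSRH, DSAH, JDSH, DGCPN, AGCH) is a joint-modality affinity matrix built from feature similarities, which then acts as the supervision signal for the Hamming-space similarities. The snippet below is a generic sketch of constructing such a fused affinity matrix; the simple convex combination is an assumption for illustration, and each paper uses its own weighting or neighborhood scheme.

```python
import numpy as np

def cosine_sim(feats):
    """Cosine similarity matrix for features of shape (n, d)."""
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    return f @ f.T

def joint_affinity(img_feats, txt_feats, alpha=0.5):
    """Fuse intra-modal similarities into a single target affinity matrix."""
    S_img = cosine_sim(img_feats)
    S_txt = cosine_sim(txt_feats)
    return alpha * S_img + (1 - alpha) * S_txt  # papers differ in the fusion rule
```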

2.1.2.4 Knowledge Distillation

2022
  • DAEH:Deep Adaptively-Enhanced Hashing With Discriminative Similarity Guidance for Unsupervised Cross-Modal Retrieval(TCSVT)[PDF]
2021
  • KDCMH:Unsupervised Deep Cross-Modal Hashing by Knowledge Distillation for Large-scale Cross-modal Retrieval(ICMR)[PDF]

  • JOG:Joint-teaching: Learning to Refine Knowledge for Resource-constrained Unsupervised Cross-modal Retrieval(MM)[PDF]

2020
  • UKD:Creating Something from Nothing: Unsupervised Knowledge Distillation for Cross-Modal Hashing(CVPR)[PDF]

2.2 Supervised cross-modal hashing retrieval


2.2.1 Supervised shallow cross-modal hashing retrieval


2.2.1.1 Matrix Factorization

2022
  • SCLCH: Joint Specifics and Consistency Hash Learning for Large-Scale Cross-Modal Retrieval(TIP) [PDF]
2020
  • BATCH: A Scalable Asymmetric Discrete Cross-Modal Hashing(TKDE) [PDF] [Code]
2019
  • LCMFH: Label Consistent Matrix Factorization Hashing for Large-Scale Cross-Modal Similarity Search(TPAMI) [PDF]

  • TECH: A Two-Step Cross-Modal Hashing by Exploiting Label Correlations and Preserving Similarity in Both Steps(MM) [PDF]

2018
  • SCRATCH: A Scalable Discrete Matrix Factorization Hashing for Cross-Modal Retrieval(MM) [PDF]
2017
  • DCH: Learning Discriminative Binary Codes for Large-scale Cross-modal Retrieval(TIP) [PDF]
2016
  • SMFH: Supervised Matrix Factorization for Cross-Modality Hashing(IJCAI) [PDF]

  • SMFH: Supervised Matrix Factorization Hashing for Cross-Modal Retrieval(TIP) [PDF]

2.2.1.2 Dictionary Learning

2016
  • DCDH: Discriminative Coupled Dictionary Hashing for Fast Cross-Media Retrieval(MM) [PDF]
2014
  • DLCMH: Dictionary Learning Based Hashing for Cross-Modal Retrieval(SIGIR) [PDF]

2.2.1.3 Feature Mapping (Sample Constraint) (Label Constraint)

2022
  • DJSAH: Discrete Joint Semantic Alignment Hashing for Cross-Modal Image-Text Search(TCSVT) (PDF)
2020
  • FUH: Fast Unmediated Hashing for Cross-Modal Retrieval(TCSVT) (PDF)
2016
  • MDBE: Multimodal Discriminative Binary Embedding for Large-Scale Cross-Modal Retrieval(TIP) (PDF) [Code]

2.2.1.4 Feature Mapping (Sample Constraint) (Separate Hamming)

2017
  • CSDH: Sequential Discrete Hashing for Scalable Cross-Modality Similarity Retrieval(TIP) (PDF)
2016
  • DASH: Frustratingly Easy Cross-Modal Hashing(MM) (PDF)
2015
  • QCH: Quantized Correlation Hashing for Fast Cross-Modal Search(IJCAI) (PDF)

2.2.1.5 Feature Mapping (Sample Constraint) (Common Hamming)

2021
  • ASCSH: Asymmetric Supervised Consistent and Specific Hashing for Cross-Modal Retrieval(TIP) (PDF) [Code]
2019
  • SRDMH: Supervised Robust Discrete Multimodal Hashing for Cross-Media Retrieval(TMM) (PDF)
2018
  • FDCH: Fast Discrete Cross-modal Hashing With Regressing From Semantic Labels(MM) (PDF)
2017
  • SRSH: Semi-Relaxation Supervised Hashing for Cross-Modal Retrieval(MM) (PDF) [Code]

  • RoPH: Cross-Modal Hashing via Rank-Order Preserving(TMM) (PDF) [Code]

2016
  • SRDMH: Supervised Robust Discrete Multimodal Hashing for Cross-Media Retrieval(CIKM) (PDF)

2.2.1.6 Feature Mapping (Relation Constraint)

2017
  • LSRH: Linear Subspace Ranking Hashing for Cross-Modal Retrieval(TPAMI) (PDF)
2014
  • SCM: Large-Scale Supervised Multimodal Hashing with Semantic Correlation Maximization(AAAI) (PDF)

  • HTH: Scalable Heterogeneous Translated Hashing(KDD) (PDF)

2013
  • PLMH: Parametric Local Multimodal Hashing for Cross-View Similarity Search(IJCAI) (PDF)

  • RaHH: Comparing Apples to Oranges: A Scalable Solution with Heterogeneous Hashing(KDD) (PDF) [Code]

2012
  • CRH: Co-Regularized Hashing for Multimodal Data(NIPS) (PDF)

2.2.1.7 Other Shallow

2019
  • DLFH: Discrete Latent Factor Model for Cross-Modal Hashing(TIP) (PDF) [Code]
2018
  • SDMCH: Supervised Discrete Manifold-Embedded Cross-Modal Hashing(IJCAI) (PDF)
2015
  • SePH: Semantics-Preserving Hashing for Cross-View Retrieval(CVPR) (PDF)
2012
  • MLBE: A Probabilistic Model for Multimodal Hash Function Learning(KDD) (PDF)
2010
  • CMSSH: Data Fusion through Cross-modality Metric Learning using Similarity-Sensitive Hashing(CVPR) (PDF)

2.2.2 Supervised deep cross-modal hashing retrieval


2.2.2.1 Naive Network (Distance Constraint)

2019
  • MCITR: Cross-modal Image-Text Retrieval with Multitask Learning(CIKM) (PDF)
2016
  • CAH: Correlation Autoencoder Hashing for Supervised Cross-Modal Search(ICMR) (PDF)
2014
  • CMNNH: Cross-Media Hashing with Neural Networks(MM) (PDF)

  • MMNN: Multimodal Similarity-Preserving Hashing(TPAMI) (PDF)

2.2.2.2 Naive Network (Similarity Constraint)

2022
  • Bi-CMR: Bidirectional Reinforcement Guided Hashing for Effective Cross-Modal Retrieval(AAAI) (PDF) [Code]

  • Bi-NCMH: Deep Normalized Cross-Modal Hashing with Bi-Direction Relation Reasoning(CVPR) (PDF)

2021
  • OTCMR: Bridging Heterogeneity Gap with Optimal Transport for Cross-modal Retrieval(CIKM) (PDF)

  • DUCMH: Deep Unified Cross-Modality Hashing by Pairwise Data Alignment(IJCAI) (PDF)

2020
  • NRDH: Nonlinear Robust Discrete Hashing for Cross-Modal Retrieval(SIGIR) (PDF)

  • DCHUC: Deep Cross-Modal Hashing with Hashing Functions and Unified Hash Codes Jointly Learning(TKDE) (PDF) [Code]

2017
  • CHN: Correlation Hashing Network for Efficient Cross-Modal Retrieval(BMVC) (PDF)
2016
  • DVSH: Deep Visual-Semantic Hashing for Cross-Modal Retrieval(KDD) (PDF)

2.2.2.3 Naive Network (Negative Log-Likelihood)

2022
  • MSSPQ: Multiple Semantic Structure-Preserving Quantization for Cross-Modal Retrieval(ICMR) (PDF)
2021
  • DMFH: Deep Multiscale Fusion Hashing for Cross-Modal Retrieval(TCSVT) (PDF)

  • TEACH: Attention-Aware Deep Cross-Modal Hashing(ICMR) (PDF)

2020
  • MDCH: Mask Cross-modal Hashing Networks(TMM) (PDF)
2019
  • EGDH: Equally-Guided Discriminative Hashing for Cross-modal Retrieval(IJCAI) (PDF)
2018
  • DDCMH: Dual Deep Neural Networks Cross-Modal Hashing(AAAI) (PDF)

  • CMHH: Cross-Modal Hamming Hashing(ECCV) (PDF)

2017
  • PRDH: Pairwise Relationship Guided Deep Hashing for Cross-Modal Retrieval(AAAI) (PDF)

  • DCMH: Deep Cross-Modal Hashing(CVPR) (PDF) [Code]

2.2.2.4 Naive Network (Triplet Constraint)

2019
  • RDCMH: Rank-based Deep Cross-Modal Hashing(AAAI) (PDF)
2018
  • MCSCH: Multi-Scale Correlation for Sequential Cross-modal Hashing Learning(MM) (PDF)

  • TDH: Triplet-Based Deep Hashing Network for Cross-Modal Retrieval(TIP) (PDF)
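
The triplet-constraint methods above (TDH, MCSCH, RDCMH) build on a margin-based ranking loss across modalities: an anchor from one modality should be closer to a matching item from the other modality than to a non-matching one. The PyTorch sketch below illustrates only that constraint on continuous hash-layer outputs; it is not the full loss of any listed paper, which additionally handles quantization and bit balance.

```python
import torch.nn.functional as F

def cross_modal_triplet_loss(img_anchor, txt_pos, txt_neg, margin=0.5):
    """img_anchor, txt_pos, txt_neg: (batch, code_length) tensors."""
    d_pos = F.pairwise_distance(img_anchor, txt_pos)  # anchor vs. matching text
    d_neg = F.pairwise_distance(img_anchor, txt_neg)  # anchor vs. non-matching text
    return F.relu(margin + d_pos - d_neg).mean()
```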

2.2.2.5 GAN

2022
  • SCAHN: Semantic Structure Enhanced Contrastive Adversarial Hash Network for Cross-media Representation Learning(MM) (PDF) [Code]
2021
  • TGCR: Multiple Semantic Structure-Preserving Quantization for Cross-Modal Retrieval(TCSVT) (PDF)
2020
  • CPAH: Multi-Task Consistency-Preserving Adversarial Hashing for Cross-Modal Retrieval(TIP) (PDF) [Code]

  • MLCAH: Multi-Level Correlation Adversarial Hashing for Cross-Modal Retrieval(TMM) (PDF)

  • DADH: Deep Adversarial Discrete Hashing for Cross-Modal Retrieval(ICMR) (PDF) [Code]

2019
  • AGAH: Adversary Guided Asymmetric Hashing for Cross-Modal Retrieval(ICMR) (PDF) [Code]
2018
  • SSAH: Self-Supervised Adversarial Hashing Networks for Cross-Modal Retrieval(CVPR) (PDF) [Code]
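
The adversarial methods above (SSAH, AGAH, CPAH, DADH, MLCAH, SCAHN) commonly attach a modality discriminator that tries to tell whether a latent representation came from the image branch or the text branch, while the hashing networks are trained to fool it so that the two modalities become indistinguishable in code space. A bare-bones PyTorch sketch of that adversarial component follows; the network sizes are arbitrary and only the discriminator side of the objective is shown.

```python
import torch
import torch.nn as nn

class ModalityDiscriminator(nn.Module):
    """Predicts whether a code came from the image (label 1) or text (label 0) branch."""
    def __init__(self, code_len):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(code_len, 256), nn.ReLU(), nn.Linear(256, 1))

    def forward(self, codes):
        return self.net(codes).squeeze(-1)

def discriminator_loss(disc, img_codes, txt_codes):
    bce = nn.BCEWithLogitsLoss()
    img_logits, txt_logits = disc(img_codes), disc(txt_codes)
    # The hashing networks (generators) are trained with the flipped targets.
    return bce(img_logits, torch.ones_like(img_logits)) + bce(txt_logits, torch.zeros_like(txt_logits))
```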

2.2.2.6 Graph Model

2022
  • HMAH: Multi-Task Consistency-Preserving Adversarial Hashing for Cross-Modal Retrieval(TMM) (PDF)

  • SCAHN: Semantic Structure Enhanced Contrastive Adversarial Hash Network for Cross-media Representation Learning(MM) (PDF) [Code]

2021
  • LGCNH: Local Graph Convolutional Networks for Cross-Modal Hashing(MM) (PDF) [Code]
2019
  • GCH: Graph Convolutional Network Hashing for Cross-Modal Retrieval(IJCAI) (PDF) [Code]

2.2.2.7 Transformer

2022
  • DCHMT: Differentiable Cross-modal Hashing via Multimodal Transformers(CIKM) (PDF) [Code]

  • UniHash: Contrastive Label Correlation Enhanced Unified Hashing Encoder for Cross-modal Retrieval(MM) (PDF) [Code]

2.2.2.8 Memory Network

2021
  • CMPD: Using Cross Memory Network With Pair Discrimination for Image-Text Retrieval(TCSVT) (PDF)
2019
  • CMMN: Deep Memory Network for Cross-Modal Retrieval(TMM) (PDF)

2.2.2.9 Quantization

2022
  • ACQH: Asymmetric Correlation Quantization Hashing for Cross-Modal Retrieval(TMM) (PDF)
2017
  • CDQ: Collective Deep Quantization for Efficient Cross-Modal Retrieval(AAAI) (PDF) [Code]
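
Nearly all of the hashing methods in Sections 2.1 and 2.2 are evaluated with the same protocol: generate binary codes for the query modality and the database modality, rank the database by Hamming distance, and report mean average precision (mAP), where two items count as relevant if they share at least one label. The snippet below is a generic, self-contained sketch of that protocol and is not tied to any particular method in this list.

```python
import numpy as np

def hamming_distance(query_codes, db_codes):
    """Codes are in {-1, +1}; shapes (nq, b) and (nd, b)."""
    b = query_codes.shape[1]
    return 0.5 * (b - query_codes @ db_codes.T)  # for +-1 codes, d_H = (b - <q, d>) / 2

def mean_average_precision(query_codes, db_codes, query_labels, db_labels, topk=None):
    """mAP over Hamming ranking; labels are multi-hot arrays of shape (n, c)."""
    dist = hamming_distance(query_codes, db_codes)
    aps = []
    for i in range(query_codes.shape[0]):
        order = np.argsort(dist[i])              # ascending Hamming distance
        if topk is not None:
            order = order[:topk]
        relevant = (db_labels[order] @ query_labels[i]) > 0
        if not relevant.any():
            continue
        cum_hits = np.cumsum(relevant)
        precision_at_hits = cum_hits[relevant] / (np.flatnonzero(relevant) + 1)
        aps.append(precision_at_hits.mean())
    return float(np.mean(aps))
```

For image-to-text retrieval the query codes come from the image branch and the database codes from the text branch, and vice versa for text-to-image.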

2.3 Unsupervised cross-modal real-valued retrieval


2.3.1 Early unsupervised cross-modal real-valued retrieval


2.3.1.1 CCA

2017
  • ICCA:Towards Improving Canonical Correlation Analysis for Cross-modal Retrieval(MM) [PDF]
2015
  • DCMIT:Deep Correlation for Matching Images and Text(CVPR) [PDF]

  • RCCA:Learning Query and Image Similarities with Ranking Canonical Correlation Analysis(ICCV) [PDF]

2014
  • MCCA:A Multi-View Embedding Space for Modeling Internet Images, Tags, and Their Semantics(IJCV) [PDF]
2013
  • KCCA:Framing Image Description as a Ranking Task: Data, Models and Evaluation Metrics(JAIR) [PDF]

  • DCCA:Deep Canonical Correlation Analysis(ICML) [PDF] [Code]

2012
  • CR:Continuum Regression for Cross-modal Multimedia Retrieval(ICIP) [PDF]
2010
  • CCA:A New Approach to Cross-Modal Multimedia Retrieval(MM) [PDF][Code]
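
For readers new to the CCA family listed above, the classical recipe is: learn linear projections that maximize the correlation between paired image and text features, then retrieve across modalities by ranking in the shared subspace. The sketch below only illustrates that baseline with scikit-learn's CCA on toy data; it is not the code released with any of these papers.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
img_feats = rng.standard_normal((500, 128))  # paired image features (toy data)
txt_feats = rng.standard_normal((500, 64))   # paired text features (toy data)

cca = CCA(n_components=10)
cca.fit(img_feats, txt_feats)
img_c, txt_c = cca.transform(img_feats, txt_feats)  # project into the shared subspace

def l2norm(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

# Cross-modal retrieval: rank all texts for the first image by cosine similarity.
sims = l2norm(img_c[:1]) @ l2norm(txt_c).T
ranked_text_indices = np.argsort(-sims[0])
```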

2.3.1.2 Topic Model

2011
  • MDRF:Learning Cross-modality Similarity for Multinomial Data(ICCV) [PDF]
2010
  • tr-mmLDA:Topic Regression Multi-Modal Latent Dirichlet Allocation for Image Annotation(CVPR) [PDF]
2003
  • Corr-LDA:Modeling Annotated Data(SIGIR) [PDF]

2.3.1.3 Other Shallow

2013
  • Bi-CMSRM:Cross-Media Semantic Representation via Bi-directional Learning to Rank(MM) [PDF]

  • CTM:Cross-media Topic Mining on Wikipedia(MM) [PDF]

2012
  • CoCA:Dimensionality Reduction on Heterogeneous Feature Space(ICDM) [PDF]
2011
  • MCU:Maximum Covariance Unfolding: Manifold Learning for Bimodal Data(NIPS) [PDF]
2008
  • PAMIR:A Discriminative Kernel-Based Model to Rank Images from Text Queries(TPAMI) [PDF]
2003
  • CFA:Multimedia Content Processing through Cross-Modal Association(MM) [PDF]

2.3.1.4 Neural Network

2018
  • CDPAE:Comprehensive Distance-Preserving Autoencoders for Cross-Modal Retrieval(MM) [PDF][Code]
2016
  • CMDN:Cross-Media Shared Representation by Hierarchical Learning with Multiple Deep Networks(IJCAI) [PDF][Code]

  • MSAE:Effective deep learning-based multi-modal retrieval(VLDB) [PDF]

2014
  • Corr-AE:Cross-modal Retrieval with Correspondence Autoencoder(MM) [PDF]
2013
  • RGDBN:Latent Feature Learning in Social Media Network(MM) [PDF]
2012
  • MDBM:Multimodal Learning with Deep Boltzmann Machines(NIPS) [PDF]

2.3.2 Image-text matching retrieval


2.3.2.1 Naive Network

2022
  • UWML:Universal Weighting Metric Learning for Cross-Modal Retrieval (TPAMI) [PDF][Code]

  • LESS:Learning to Embed Semantic Similarity for Joint Image-Text Retrieval (TPAMI)[PDF]

  • CMCM:Cross-Modal Coherence for Text-to-Image Retrieval (AAAI) [PDF]

  • P2RM:Point to Rectangle Matching for Image Text Retrieval(MM) [PDF]

2020
  • DPCITE:Dual-path Convolutional Image-Text Embeddings with Instance Loss(TOMM) [PDF] [code]

  • PSN:Preserving Semantic Neighborhoods for Robust Cross-Modal Retrieval(ECCV) [PDF] [Code]

2019
  • LDR:Learning Disentangled Representation for Cross-Modal Retrieval with Deep Mutual Information Estimation(MM) [PDF]
2018
  • CHAIN-VSE:Bidirectional Retrieval Made Simple(CVPR) [PDF] [Code]
2017
  • CRC:Cross-media Relevance Computation for Multimedia Retrieval(MM) [PDF]

  • VSE++: Improving Visual-Semantic Embeddings with Hard Negatives(arXiv) [PDF][Code]

  • RRF-Net:Learning a Recurrent Residual Fusion Network for Multimodal Matching(ICCV) [PDF][Code]

2016
  • DBRLM:Cross-Modal Retrieval via Deep and Bidirectional Representation Learning(TMM) [PDF]
2015
  • MSDS:Image-Text Cross-Modal Retrieval via Modality-Specific Feature Learning(ICMR) [PDF]
2014
  • DT-RNN:Grounded Compositional Semantics for Finding and Describing Images with Sentences(TACL) [PDF]

2.3.2.2 Dot-product Attention

2020
  • SMAN: Stacked Multimodal Attention Network for Cross-Modal Image-Text Retrieval(TC) [PDF]

  • CAAN:Context-Aware Attention Network for Image-Text Retrieval(CVPR) [PDF]

  • IMRAM: Iterative Matching with Recurrent Attention Memory for Cross-Modal Image-Text Retrieval(CVPR) [PDF] [Code]

2019
  • PFAN:Position Focused Attention Network for Image-Text Matching (IJCAI) [PDF][Code]

  • CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval(ICCV) [PDF] [Code]

  • CMRSC:Cross-Modal Image-Text Retrieval with Semantic Consistency(MM) [PDF] [Code]

2018
  • MCSM:Modality-specific Cross-modal Similarity Measurement with Recurrent Attention Network(TIP) [PDF][Code]

  • DSVEL:Finding beans in burgers: Deep semantic-visual embedding with localization(CVPR) [PDF][Code]

  • CRAN:Cross-media Multi-level Alignment with Relation Attention Network(IJCAI)[PDF]

  • SCAN:Stacked Cross Attention for Image-Text Matching(ECCV) [PDF] [Code]

2017
  • sm-LSTM:Instance-aware Image and Sentence Matching with Selective Multimodal LSTM(CVPR) [PDF]

2.3.2.3 Graph Model

2022
  • LHSC:Learning Hierarchical Semantic Correspondences for Cross-Modal Image-Text Retrieval(ICMR) [PDF]

  • IFRFGF:Improving Fusion of Region Features and Grid Features via Two-Step Interaction for Image-Text Retrieval(MM) [PDF]

  • CODER:Coupled Diversity-Sensitive Momentum Contrastive Learning for Image-Text Retrieval(ECCV) [PDF]

2021
  • HSGMP: Heterogeneous Scene Graph Message Passing for Cross-modal Retrieval(ICMR) [PDF]

  • WCGL:Wasserstein Coupled Graph Learning for Cross-Modal Retrieval(ICCV)[PDF]

2020
  • DSRAN:Learning Dual Semantic Relations with Graph Attention for Image-Text Matching(TCSVT) [PDF] [code]

  • VSM:Visual-Semantic Matching by Exploring High-Order Attention and Distraction(CVPR) [PDF]

  • SGM:Cross-modal Scene Graph Matching for Relationship-aware Image-Text Retrieval(WACV) [PDF]

2019
  • KASCE:Knowledge Aware Semantic Concept Expansion for Image-Text Matching(IJCAI) [PDF]

  • VSRN:Visual Semantic Reasoning for Image-Text Matching(ICCV) [PDF] [Code]

2.3.2.4 Transformer

2022
  • DREN:Dual-Level Representation Enhancement on Characteristic and Context for Image-Text Retrieval(TCSVT) [PDF]

  • M2D-BERT:Multi-scale Multi-modal Dictionary BERT For Effective Text-image Retrieval in Multimedia Advertising(CIKM) [PDF]

  • ViSTA:ViSTA: Vision and Scene Text Aggregation for Cross-Modal Retrieval(CVPR) [PDF]

  • COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval(CVPR) [PDF]

  • EI-CLIP: Entity-aware Interventional Contrastive Learning for E-commerce Cross-modal Retrieval(CVPR) [PDF]

  • SSAMT:Constructing Phrase-level Semantic Labels to Form Multi-Grained Supervision for Image-Text Retrieval(ICMR) [PDF]

  • TEAM:Token Embeddings Alignment for Cross-Modal Retrieval(MM) [PDF]

  • CAliC: Accurate and Efficient Image-Text Retrieval via Contrastive Alignment and Visual Contexts Modeling(MM) [PDF]

2021
  • GRAN:Global Relation-Aware Attention Network for Image-Text Retrieval(ICMR) [PDF]

  • PCME:Probabilistic Embeddings for Cross-Modal Retrieval(CVPR) [PDF] [code]

2020
  • FashionBERT: Text and Image Matching with Adaptive Loss for Cross-modal Retrieval(SIGIR) [PDF]
2019
  • PVSE:Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval(CVPR) [PDF] [Code]

2.3.2.5 Cross-modal Generation

2022
  • PCMDA:Paired Cross-Modal Data Augmentation for Fine-Grained Image-to-Text Retrieval(MM)[PDF]
2021
  • CRGN:Deep Relation Embedding for Cross-Modal Retrieval(TIP) [PDF][Code]

  • X-MRS:Cross-Modal Retrieval and Synthesis (X-MRS): Closing the Modality Gap in Shared Representation Learning(MM) [PDF][Code]

2020
  • AACR:Augmented Adversarial Training for Cross-Modal Retrieval(TMM) [PDF] [Code]
2018
  • LSCO:Learning Semantic Concepts and Order for Image and Sentence Matching(CVPR) [PDF]

  • TCCM:Towards Cycle-Consistent Models for Text and Image Retrieval(CVPR) [PDF]

  • GXN:Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval with Generative Models(CVPR) [PDF]

2017
  • 2WayNet:Linking Image and Text with 2-Way Nets(CVPR) [PDF]
2015
  • DVSA:Deep Visual-Semantic Alignments for Generating Image Descriptions(CVPR) [PDF]
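
The image-text matching methods in Section 2.3.2 are typically reported with Recall@K: the fraction of queries whose ground-truth match appears in the top K of the similarity ranking. The sketch below assumes the simplest 1:1 setting in which caption i is the ground-truth match of image i; benchmark datasets with five captions per image require the usual bookkeeping on top of this.

```python
import numpy as np

def recall_at_k(sim, ks=(1, 5, 10)):
    """sim: (n_images, n_captions) similarity matrix; caption i matches image i."""
    ranks = []
    for i in range(sim.shape[0]):
        order = np.argsort(-sim[i])                    # captions sorted by similarity
        ranks.append(int(np.where(order == i)[0][0]))  # rank of the true caption
    ranks = np.asarray(ranks)
    return {f"R@{k}": float((ranks < k).mean()) for k in ks}
```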

2.4 Supervised cross-modal real-valued retrieval


2.4.1 Supervised shallow cross-modal real-valued retrieval


2.4.1.1 CCA

2022
  • MVMLCCA: Multi-view Multi-label Canonical Correlation Analysis for Cross-modal Matching and Retrieval(CVPRW) [PDF] [Code]
2015
  • ml-CCA: Multi-Label Cross-modal Retrieval(ICCV) [PDF] [Code]
2014
  • cluster-CCA: Cluster Canonical Correlation Analysis(AISTATS) [PDF]
2012
  • GMA: Generalized Multiview Analysis: A Discriminative Latent Space(CVPR) [PDF] [Code]

2.4.1.2 Dictionary Learning

2018
  • JDSLC: Joint Dictionary Learning and Semantic Constrained Latent Subspace Projection for Cross-Modal Retrieval(CIKM) [PDF]
2016
  • DDL: Discriminative Dictionary Learning With Common Label Alignment for Cross-Modal Retrieval(TMM) [PDF]
2014
  • CMSDL: Cross-Modality Submodular Dictionary Learning for Information Retrieval(CIKM) [PDF]
2013
  • SliM2: Supervised Coupled Dictionary Learning with Group Structures for Multi-Modal Retrieval(AAAI) [PDF]

2.4.1.3 Feature Mapping

2017
  • MDSSL: Cross-Modal Retrieval Using Multiordered Discriminative Structured Subspace Learning(TMM) [PDF]

  • JLSLR: Joint Latent Subspace Learning and Regression for Cross-Modal Retrieval(SIGIR) [PDF]

2016
  • JFSSL: Joint Feature Selection and Subspace Learning for Cross-Modal Retrieval(TPAMI) [PDF] [Code]

  • MDCR: Modality-Dependent Cross-Media Retrieval(TIST) [PDF]

  • CRLC: Cross-modal Retrieval with Label Completion(MM) [PDF]

2013
  • JGRHML: Heterogeneous Metric Learning with Joint Graph Regularization for Cross-Media Retrieval(AAAI) [PDF] [Code]

  • LCFS: Learning Coupled Feature Spaces for Cross-modal Matching(ICCV) [PDF]

2011
  • Multi-NPP: Learning Multi-View Neighborhood Preserving Projections(ICML) [PDF]

2.4.1.4 Topic Model

2014
  • M3R: Multi-modal Mutual Topic Reinforce Modeling for Cross-media Retrieval(MM) [PDF]

  • NPBUS: Nonparametric Bayesian Upstream Supervised Multi-Modal Topic Models(WSDM) [PDF]

2.4.1.5 Other Shallow

2019
  • CMOS: Online Asymmetric Metric Learning With Multi-Layer Similarity Aggregation for Cross-Modal Retrieval(TIP) [PDF]
2017
  • CMOS: Online Asymmetric Similarity Learning for Cross-Modal Retrieval(CVPR) [PDF]
2016
  • PL-ranking: A Novel Ranking Method for Cross-Modal Retrieval(MM) [PDF]

  • RL-PLS: Cross-modal Retrieval by Real Label Partial Least Squares(MM) [PDF]

2013
  • PFAR: Parallel Field Alignment for Cross Media Retrieval(MM) [PDF]

2.4.2 Supervised deep cross-modal real-valued retrieval


2.4.2.1 Naive Network

2022
  • C3CMR: Cross-Modality Cross-Instance Contrastive Learning for Cross-Media Retrieval(MM) [PDF]
2020
  • ED-Net: Event-Driven Network for Cross-Modal Retrieval(CIKM) [PDF]
2019
  • DSCMR: Deep Supervised Cross-modal Retrieval(CVPR) [PDF] [Code]

  • SAM: Cross-Modal Subspace Learning with Scheduled Adaptive Margin Constraints(MM) [PDF]

2017
  • deep-SM: Cross-Modal Retrieval With CNN Visual Features: A New Baseline(TCYB) [PDF] [Code]

  • CCL: Cross-modal Correlation Learning With Multigrained Fusion by Hierarchical Network(TMM) [PDF]

  • MSFN: Cross-media Retrieval by Learning Rich Semantic Embeddings of Multimedia(MM) [PDF]

  • MNiL: Multi-Networks Joint Learning for Large-Scale Cross-Modal Retrieval(MM) [PDF] [Code]

2016
  • MDNN: Effective deep learning-based multi-modal retrieval(VLDB) [PDF]
2015
  • RE-DNN: Deep Semantic Mapping for Cross-Modal Retrieval(ICTAI) [PDF]

  • C2MLR: Deep Compositional Cross-modal Learning to Rank via Local-Global Alignment(MM) [PDF]

2.4.2.2 GAN

2022
  • JFSE: Joint Feature Synthesis and Embedding: Adversarial Cross-Modal Retrieval Revisited(TPAMI) [PDF] [Code]
2021
  • AACR: Augmented Adversarial Training for Cross-Modal Retrieval(TMM) [PDF] [Code]
2018
  • CM-GANs: Cross-modal Generative Adversarial Networks for Common Representation Learning(TMM) [PDF] [Code]
2017
  • ACMR: Adversarial Cross-Modal Retrieval(MM) [PDF] [Code]

2.4.2.3 Graph Model

2022
  • AGCN: Adversarial Graph Convolutional Network for Cross-Modal Retrieval(TCSVT) [PDF]

  • ALGCN: Adaptive Label-Aware Graph Convolutional Networks for Cross-Modal Retrieval(TMM) [PDF]

  • HGE: Cross-Modal Retrieval with Heterogeneous Graph Embedding(MM) [PDF]

2021
  • GCR: Exploring Graph-Structured Semantics for Cross-Modal Retrieval(MM) [PDF] [Code]

  • DAGNN: Dual Adversarial Graph Neural Networks for Multi-label Cross-modal Retrieval(AAAI) [PDF]

2018
  • SSPE: Learning Semantic Structure-preserved Embeddings for Cross-modal Retrieval(MM) [PDF]

2.4.2.4 Transformer

2021
  • RLCMR: Rethinking Label-Wise Cross-Modal Retrieval from A Semantic Sharing Perspective(IJCAI) [PDF]

2.5 Cross-modal Retrieval under Special Retrieval Scenarios


2.5.1 Semi-Supervised (Real-valued)

2020
  • SSCMR:Semi-Supervised Cross-Modal Retrieval With Label Prediction(TMM) [PDF]
2019
  • A3VSE:Annotation Efficient Cross-Modal Retrieval with Adversarial Attentive Alignment(MM) [PDF]

  • ASFS:Adaptive Semi-Supervised Feature Selection for Cross-Modal Retrieval(TMM) [PDF]

2018
  • GSS-SL:Generalized Semi-supervised and Structured Subspace Learning for Cross-Modal Retrieval(TMM) [PDF]
2017
  • SSDC:Semi-supervised Distance Consistent Cross-modal Retrieval(VSCC)[PDF]
2013
  • JRL:Learning Cross-Media Joint Representation With Sparse and Semisupervised Regularization(TCSVT) [PDF][Code]
2012
  • MVML-GL:Multiview Metric Learning with Global Consistency and Local Smoothness(TIST) [PDF]

2.5.2 Semi-Supervised (Hashing)

2020
  • SCH-GAN:Semi-Supervised Cross-Modal Hashing by Generative Adversarial Network(TC) [PDF] [Code]

  • SGCH:Semi-supervised graph convolutional hashing network for large-scale cross-modal retrieval(ICIP) [PDF]

2019
  • SSDQ:Semi-supervised Deep Quantization for Cross-modal Search(MM) [PDF]

  • S3PH:Semi-supervised semantic-preserving hashing for efficient cross-modal retrieval(ICME) [PDF]

2017
  • AUSL:Adaptively Unified Semi-supervised Learning for Cross-Modal Retrieval(IJCAI) [PDF]
2016
  • NPH:Neighborhood-Preserving Hashing for Large-Scale Cross-Modal Search(MM) [PDF]

2.5.3 Imbalance (Real-valued)

2021
  • PAN: Prototype-based Adaptive Network for Robust Cross-modal Retrieval(SIGIR) [PDF]

  • MCCN: Multimodal Coordinated Clustering Network for Large-Scale Cross-modal Retrieval(MM) [PDF]

2020
  • DAVAE:Incomplete Cross-modal Retrieval with Dual-Aligned Variational Autoencoders(MM) [PDF]
2015
  • SCDL:Semi-supervised Coupled Dictionary Learning for Cross-modal Retrieval in Internet Images and Texts(MM) [PDF]

  • LGCFL:Learning Consistent Feature Representation for Cross-Modal Multimedia Retrieval(TMM) [PDF]

2.5.4 Imbalance (Hashing)

2020
  • RUCMH:Robust Unsupervised Cross-modal Hashing for Multimedia Retrieval(TOIS) [PDF]

  • ATFH-N:Adversarial Tri-Fusion Hashing Network for Imbalanced Cross-Modal Retrieval(TETCI) [PDF]

  • FlexCMH:Flexible Cross-Modal Hashing(TNNLS) [PDF]

2019
  • TFNH:Triplet Fusion Network Hashing for Unpaired Cross-Modal Retrieval(ICMR) [PDF] [Code]

  • CALM:Collective Affinity Learning for Partial Cross-Modal Hashing(TIP) [PDF]

  • MTFH: A Matrix Tri-Factorization Hashing Framework for Efficient Cross-Modal Retrieval(TIP) [PDF] [Code]

  • GSPH:Generalized Semantic Preserving Hashing for Cross-Modal Retrieval(TIP) [PDF]

2018
  • DAH:Dense Auto-Encoder Hashing for Robust Cross-Modality Retrieval(MM) [PDF]
2017
  • GSPH:Generalized Semantic Preserving Hashing for n-Label Cross-Modal Retrieval(CVPR) [PDF] [Code]

2.5.5 Incremental

2021
  • MARS: Learning Modality-Agnostic Representation for Scalable Cross-Media Retrieval(TCSVT) [PDF]

  • CCMR:Continual learning in cross-modal retrieval(CVPR) [PDF]

  • SCML:Real-world Cross-modal Retrieval via Sequential Learning(TMM) [PDF]

2020
  • ATTL-CEL:Adaptive Temporal Triplet-loss for Cross-modal Embedding Learning(MM)[PDF]
2019
  • SVHNs:Separated Variational Hashing Networks for Cross-Modal Retrieval(MM) [PDF]

  • ECMH:Extensible Cross-Modal Hashing(IJCAI) [PDF] [Code]

2018
  • TempXNet:Temporal Cross-Media Retrieval with Soft-Smoothing(MM) [PDF]

2.5.6 Noise

2022
  • DECL: Deep Evidential Learning with Noisy Correspondence for Cross-modal Retrieval(MM) (PDF) [Code]

  • ELRCMR: Early-Learning regularized Contrastive Learning for Cross-Modal Retrieval with Noisy Labels(MM) (PDF)

  • CMMQ: Mutual Quantization for Cross-Modal Search with Noisy Labels(CVPR) (PDF)

2021
  • MRL: Learning Cross-Modal Retrieval with Noisy Labels(CVPR) (PDF) [Code]
2018
  • WSJE: Webly Supervised Joint Embedding for Cross-Modal Image-Text Retrieval(MM) (PDF)

2.5.7 Cross-Domain

2021
  • M2GUDA: Multi-Metrics Graph-Based Unsupervised Domain Adaptation for Cross-Modal Hashing(ICMR) (PDF)

  • ACP: Adaptive Cross-Modal Prototypes for Cross-Domain Visual-Language Retrieval(CVPR) (PDF)

2020
  • DASG: Unsupervised Cross-Media Retrieval Using Domain Adaptation With Scene Graph(TCSVT) (PDF)

2.5.8 Zero-Shot

2020
  • LCALE: Learning Cross-Aligned Latent Embeddings for Zero-Shot Cross-Modal Retrieval(AAAI) (PDF)

  • CFSA: Correlated Features Synthesis and Alignment for Zero-shot Cross-modal Retrieval(SIGIR) (PDF)

2019
  • ZS-CMR: Learning Cross-Aligned Latent Embeddings for Zero-Shot Cross-Modal Retrieval(TIP) (PDF)

2.5.9 Few-Shot

2021
  • SOCMH: Know Yourself and Know Others: Efficient Common Representation Learning for Few-shot Cross-modal Retrieval(ICMR) (PDF)

2.5.10 Online Learning

2020
  • CMOLRS: Online Fast Adaptive Low-Rank Similarity Learning for Cross-Modal Retrieval(TMM) (PDF) [Code]

  • LEMON: Label Embedding Online Hashing for Cross-Modal Retrieval(MM) (PDF) [Code]

2019
  • FOMH: Flexible Online Multi-modal Hashing for Large-scale Multimedia Retrieval(MM) (PDF) [Code]
2017
  • OCMSR: Online Cross-Modal Scene Retrieval by Binary Representation and Semantic Graph(MM) (PDF)
2016
  • OCMH: Online cross-modal hashing for web image retrieval(AAAI) (PDF)

2.5.11 Hierarchical

2020
  • SHDCH: Supervised Hierarchical Deep Hashing for Cross-Modal Retrieval(MM) (PDF) [Code]
2019
  • HiCHNet: Supervised Hierarchical Cross-Modal Hashing(SIGIR) (PDF) [Code]

2.5.12 Fine-grained

2022
  • PCMDA: Paired Cross-Modal Data Augmentation for Fine-Grained Image-to-Text Retrieval(MM) (PDF)
2019
  • FGCrossNet: A New Benchmark and Approach for Fine-grained Cross-media Retrieval(MM) (PDF) [Code]

3. Usage

3.1 Datasets

  • Graph Model--GCR

Dataset Link:

Baidu Yun Link: https://pan.baidu.com/s/1YmW8Zz2uK3AgCs6pDEoA8A?pwd=21xh
Code: 21xh
  • Unsupervised cross-modal real-valued

Dataset link:

Baidu Yun Link: https://pan.baidu.com/s/1hBNo8gBSyLbik0ka1POhiQ
Code: cc53
  • Quantization--CDQ

Dataset Link:

Baidu Yun Link: https://pan.baidu.com/s/1mO1hdsJR2FN5xEAv2e7eaw?pwd=us9v
Code: us9v
  • GAN--CPAH

Dataset Link:

Baidu Yun Link: https://pan.baidu.com/s/145Zool0FUb3758EeSxtHBw?pwd=mxt7
Code: mxt7
  • Transformer--DCHMT

Dataset Link:

Baidu Yun Link: https://pan.baidu.com/s/1UHr2NVjFkTjLXXQ8Izy5WA?pwd=qfsj
Code: qfsj
  • Feature Mapping(Sample Constraint)(Label Constraint)--MDBE

Dataset Link:

Baidu Yun Link: https://pan.baidu.com/s/15BtQ_Zz7UihZBW6KXTTodA?pwd=ir7g
Code: ir7g
  • Feature Mapping(Sample Constraint)(Common Hamming)--RoPH

Dataset Link:

Baidu Yun Link: https://pan.baidu.com/s/1_uIulkuxcIcubvl5u3zsOA?pwd=46c4
Code: 46c4
  • Online learning--SHDCH

Dataset Link:

Baidu Yun Link: https://pan.baidu.com/s/1-CsIJbvz3IFsmDgYk9BwYg?pwd=7hd8
Code: 7hd8
  • Noise--MRL

Dataset Link:

Baidu Yun Link: https://pan.baidu.com/s/1FIrB-gXJa9VHKzLRQZf30Q?pwd=g3qt
Code: g3qt
  • Online learning--LEMON

Dataset Link:

Baidu Yun Link: https://pan.baidu.com/s/1s5SnnAXo5wK7cmRs3zNq4w?pwd=jxjo
Code: jxjo
  • Fine-grained--FGCrossNet

Dataset Link:

Baidu Yun Link: https://pan.baidu.com/s/1OYxCLmNKvPzwLIs5snTOlA?pwd=r80g
Code: r80g
  • Noise--DECL

Dataset Link:

Baidu Yun Link: https://pan.baidu.com/s/1FcxkwOuuiUXnIl1LAatDLA?pwd=nl2z
Code: nl2z
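
The Baidu Yun archives above generally ship pre-extracted features and labels, but the exact file names and key names differ between methods, so check each archive's own README. As an illustration only, many hashing benchmarks are distributed as MATLAB .mat files, which can be inspected as follows; the file name and key names in this sketch are hypothetical placeholders, not guaranteed contents of any specific archive.

```python
import scipy.io as sio

# Hypothetical example: replace the path and keys with those in the archive you downloaded.
data = sio.loadmat("mirflickr25k.mat")
print(sorted(k for k in data if not k.startswith("__")))  # list the stored variables

img_train = data.get("I_tr")  # training image features, if stored under this key
txt_train = data.get("T_tr")  # training text features
lab_train = data.get("L_tr")  # training labels (multi-hot)
```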

Contributors

futuretwt, styx29-0, wangbowen7, xiaolaohuqiao
