Awesome Knowledge Distillation Research for Computer Vision
Distillation for Diffusion Models
- A Comprehensive Survey on Knowledge Distillation of Diffusion Models. 2023. Weijian Luo. [pdf]
- Knowledge distillation in iterative generative models for improved sampling speed. 2021. Eric Luhman, Troy Luhman. [pdf]
- Progressive Distillation for Fast Sampling of Diffusion Models. ICLR 2022. Tim Salimans, Jonathan Ho. [pdf]
- On Distillation of Guided Diffusion Models. CVPR 2023. Chenlin Meng, Robin Rombach, Ruiqi Gao, Diederik P. Kingma, Stefano Ermon, Jonathan Ho, Tim Salimans. [pdf]
- TRACT: Denoising Diffusion Models with Transitive Closure Time-Distillation. 2023. David Berthelot, Arnaud Autef, Jierui Lin, Dian Ang Yap, Shuangfei Zhai, Siyuan Hu, Daniel Zheng, Walter Talbott, Eric Gu. [pdf]
- BK-SDM: Architecturally Compressed Stable Diffusion for Efficient Text-to-Image Generation. ICML 2023. Bo-Kyeong Kim, Hyoung-Kyu Song, Thibault Castells, Shinkook Choi. [pdf]
- On Architectural Compression of Text-to-Image Diffusion Models. 2023. Bo-Kyeong Kim, Hyoung-Kyu Song, Thibault Castells, Shinkook Choi. [pdf]
- Knowledge Diffusion for Distillation. 2023. Tao Huang, Yuan Zhang, Mingkai Zheng, Shan You, Fei Wang, Chen Qian, Chang Xu. [pdf]
- SnapFusion: Text-to-Image Diffusion Model on Mobile Devices within Two Seconds. 2023. Yanyu Li, Huan Wang, Qing Jin, Ju Hu, Pavlo Chemerys, Yun Fu, Yanzhi Wang, Sergey Tulyakov, Jian Ren. [pdf]
- BOOT: Data-free Distillation of Denoising Diffusion Models with Bootstrapping. 2023. Jiatao Gu, Shuangfei Zhai, Yizhe Zhang, Lingjie Liu, Josh Susskind. [pdf]
- Consistency Models. 2023. Yang Song, Prafulla Dhariwal, Mark Chen, Ilya Sutskever. [pdf]
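For orientation, the core trick shared by several of the sampling-speed papers above (e.g., progressive distillation) is to train the student so that one of its sampler steps reproduces two sampler steps of the teacher. Below is a minimal, heavily simplified sketch of that training signal; the `Denoiser` toy network, the `sampler_step` update, and the schedule values are illustrative assumptions, not code from any of the papers.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Denoiser(nn.Module):
    """Toy stand-in for a diffusion U-Net: predicts an update given (x_t, t)."""
    def __init__(self, dim=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 64), nn.ReLU(), nn.Linear(64, dim))

    def forward(self, x, t):
        return self.net(torch.cat([x, t.expand(x.shape[0], 1)], dim=-1))

def sampler_step(model, x, t, dt):
    """One deterministic update x_t -> x_{t-dt}; a real DDIM step would use the noise schedule."""
    return x - dt * model(x, t)

teacher, student = Denoiser(), Denoiser()   # the teacher would normally be pretrained
opt = torch.optim.Adam(student.parameters(), lr=1e-4)

x_t = torch.randn(8, 16)                    # a batch of noisy samples at time t
t, dt = torch.tensor([[0.8]]), 0.1

# Teacher takes two half-steps; the student must reproduce the result in one step.
with torch.no_grad():
    x_mid = sampler_step(teacher, x_t, t, dt)
    x_target = sampler_step(teacher, x_mid, t - dt, dt)

x_pred = sampler_step(student, x_t, t, 2 * dt)
loss = F.mse_loss(x_pred, x_target)
opt.zero_grad()
loss.backward()
opt.step()
```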
Knowledge Distillation for Semantic Segmentation
Structured knowledge distillation for semantic segmentation. CVPR-2019.
Intra-class feature variation distillation for semantic segmentation. ECCV-2020.
Channel-wise knowledge distillation for dense prediction. ICCV-2021.
Double Similarity Distillation for Semantic Image Segmentation. TIP-2021.
Cross-Image Relational Knowledge Distillation for Semantic Segmentation. CVPR-2022.
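A recurring objective in these works is to match dense, per-pixel teacher outputs rather than a single logit vector. As a minimal sketch in the spirit of channel-wise distillation for dense prediction, each channel of the score map is softmax-normalized over spatial locations and the student matches the teacher's per-channel distribution with a KL term; the tensor shapes and temperature below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def channel_wise_kd(student_logits, teacher_logits, tau=4.0):
    """KL between per-channel spatial distributions of (B, C, H, W) score maps."""
    b, c, h, w = student_logits.shape
    s = F.log_softmax(student_logits.reshape(b, c, h * w) / tau, dim=-1)
    t = F.softmax(teacher_logits.reshape(b, c, h * w) / tau, dim=-1)
    return F.kl_div(s, t, reduction="batchmean") * (tau ** 2)

# Hypothetical segmentation score maps (e.g., 19 Cityscapes classes at 1/8 resolution).
student_out = torch.randn(2, 19, 64, 64, requires_grad=True)
teacher_out = torch.randn(2, 19, 64, 64)
loss = channel_wise_kd(student_out, teacher_out)
loss.backward()
```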
Awesome Knowledge Distillation for Object Detection
(CVPR2017) Mimicking very efficient network for object detection[pdf]
(CVPR2019) Distilling object detectors with fine-grained feature imitation[pdf]
(CVPR2021) General instance distillation for object detection[pdf]
(CVPR2021) Distilling object detectors via decoupled features[pdf]
(NeurIPS2021) Distilling object detectors with feature richness[pdf]
(CVPR2022) Focal and global knowledge distillation for detectors[pdf]
(AAAI2022) Rank Mimicking and Prediction-guided Feature Imitation[pdf]
(ECCV2022) Prediction-Guided Distillation[pdf]
(ICLR2023) Masked Distillation with Receptive Tokens[pdf]
SSIM. NeurIPS 2022. [OpenReview] [arXiv] <GitHub> - Structural Knowledge Distillation for Object Detection
DRKD. IJCAI 2023. [arXiv] - Dual Relation Knowledge Distillation for Object Detection
GLAMD. ECCV 2022. [ECVA] [Springer] - GLAMD: Global and Local Attention Mask Distillation for Object Detectors
G-DetKD. ICCV 2021. [CVF] [IEEE Xplore] [arXiv] - G-DetKD: Towards General Distillation Framework for Object Detectors via Contrastive and Semantic-guided Feature Imitation
PKD. NeurIPS 2022. [OpenReview] [arXiv] <GitHub> - PKD: General Distillation Framework for Object Detectors via Pearson Correlation Coefficient
MimicDet. ECCV 2020. [ECVA] [Springer] [arXiv] - MimicDet: Bridging the Gap Between One-Stage and Two-Stage Object Detection
LabelEnc. ECCV 2020. [ECVA] [Springer] [arXiv] <GitHub> - LabelEnc: A New Intermediate Supervision Method for Object Detection
HEAD. ECCV 2022. [ECVA] [Springer] [arXiv] <GitHub> - HEAD: HEtero-Assists Distillation for Heterogeneous Object Detectors
LGD. AAAI 2022. [AAAI] [arXiv] - LGD: Label-Guided Self-Distillation for Object Detection
TPAMI. [IEEE Xplore] - When Object Detection Meets Knowledge Distillation: A Survey
ScaleKD. CVPR 2023. [CVF] - ScaleKD: Distilling Scale-Aware Knowledge in Small Object Detector.
CrossKD. [arXiv] <GitHub> - CrossKD: Cross-Head Knowledge Distillation for Dense Object Detection
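Many of the detectors above distill intermediate features rather than logits, and restrict the imitation loss to informative regions (e.g., around ground-truth boxes, as in fine-grained feature imitation). A toy sketch of that masked feature-imitation loss follows; the feature shapes, the 1x1 adapter, and the example boxes are assumptions.

```python
import torch
import torch.nn as nn

# Hypothetical single FPN level of stride 8 with teacher/student backbone features.
stride, (b, c, h, w) = 8, (2, 256, 100, 152)
student_feat = torch.randn(b, 64, h, w, requires_grad=True)
teacher_feat = torch.randn(b, c, h, w)
adapt = nn.Conv2d(64, c, kernel_size=1)          # projects student channels to the teacher's

# Binary mask that is 1 inside (downscaled) ground-truth boxes and 0 elsewhere.
gt_boxes = [torch.tensor([[80., 120., 400., 360.]]),   # (x1, y1, x2, y2) per image
            torch.tensor([[0., 0., 200., 200.]])]
mask = torch.zeros(b, 1, h, w)
for i, boxes in enumerate(gt_boxes):
    for x1, y1, x2, y2 in (boxes / stride).long():
        mask[i, :, y1:y2 + 1, x1:x2 + 1] = 1.0

# Imitation loss: squared error between adapted student and teacher features, only inside the mask.
diff = (adapt(student_feat) - teacher_feat) ** 2 * mask
loss = diff.sum() / mask.sum().clamp(min=1)
loss.backward()
```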
Knowledge Distillation in Vision Transformers
Training data-efficient image transformers & distillation through attention. ICML 2021
Co-advise: Cross inductive bias distillation. CVPR 2022
TinyViT: Fast pretraining distillation for small vision transformers. arXiv preprint arXiv:2207.10666.
Attention Probe: Vision Transformer Distillation in the Wild. ICASSP 2022
DearKD: Data-Efficient Early Knowledge Distillation for Vision Transformers. CVPR 2022
Efficient vision transformers via fine-grained manifold distillation. NeurIPS 2022
Cross-Architecture Knowledge Distillation. ACCV 2022. arXiv:2207.05273
MiniViT: Compressing Vision Transformers with Weight Multiplexing. CVPR 2022
ViTKD: Practical Guidelines for ViT feature knowledge distillation. arXiv 2022, code
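For reference, DeiT's "hard" distillation trains an extra distillation token on the teacher's argmax prediction while the class token is trained on the ground truth. A minimal sketch, assuming a student that already exposes both heads (the random tensors stand in for real model outputs):

```python
import torch
import torch.nn.functional as F

# Hypothetical per-image outputs of a DeiT-style student: one prediction from the
# class token and one from the extra distillation token.
num_classes, batch = 1000, 8
cls_logits = torch.randn(batch, num_classes, requires_grad=True)    # class-token head
dist_logits = torch.randn(batch, num_classes, requires_grad=True)   # distillation-token head
teacher_logits = torch.randn(batch, num_classes)                    # convnet teacher outputs
labels = torch.randint(0, num_classes, (batch,))

# Hard distillation: ground truth supervises the class token, the teacher's argmax
# supervises the distillation token; the two losses are averaged.
loss = 0.5 * F.cross_entropy(cls_logits, labels) \
     + 0.5 * F.cross_entropy(dist_logits, teacher_logits.argmax(dim=-1))
loss.backward()
```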
Methods for Distillation Gaps
Improved Knowledge Distillation via Teacher Assistant: Bridging the Gap Between Student and Teacher. Mirzadeh et al. AAAI2020
Search to Distill: Pearls are Everywhere but not the Eyes. Liu Yu et al. CVPR 2020
Reducing the Teacher-Student Gap via Spherical Knowledge Distillation, arXiv 2020
Knowledge Distillation via the Target-aware Transformer, CVPR2022
Decoupled Knowledge Distillation, Borui Zhao et al., CVPR 2022, code
Prune Your Model Before Distill It, Jinhyuk Park and Albert No, ECCV 2022, code
Asymmetric Temperature Scaling Makes Larger Networks Teach Well Again, NeurIPS 2022
Weighted Distillation with Unlabeled Examples, NeurIPS 2022
Respecting Transfer Gap in Knowledge Distillation, NeurIPS 2022
Knowledge Distillation from A Stronger Teacher. arXiv preprint arXiv:2205.10536.
Masked Generative Distillation, Zhendong Yang et al., ECCV 2022, code
Curriculum Temperature for Knowledge Distillation, Zheng Li et al., AAAI 2023, code
Knowledge distillation: A good teacher is patient and consistent, Lucas Beyer et al., CVPR 2022
Knowledge Distillation with the Reused Teacher Classifier, Defang Chen et al., CVPR 2022
Scaffolding a Student to Instill Knowledge, ICLR2023
Function-Consistent Feature Distillation, ICLR2023
Better Teacher Better Student: Dynamic Prior Knowledge for Knowledge Distillation, ICLR2023
Supervision Complexity and its Role in Knowledge Distillation, ICLR2023
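Several entries above target the capacity gap between teacher and student. The simplest illustration is a teacher-assistant style ladder (as in the Mirzadeh et al. entry): the usual softened-logit KD loss is applied hop by hop through an intermediate-size network. The toy MLPs, data, and hyperparameters below are assumptions, and only a single toy update per hop is shown.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def mlp(width):
    return nn.Sequential(nn.Flatten(), nn.Linear(784, width), nn.ReLU(), nn.Linear(width, 10))

# Hypothetical capacity ladder: large teacher -> medium assistant -> small student.
teacher, assistant, student = mlp(512), mlp(128), mlp(32)
x, y = torch.randn(16, 1, 28, 28), torch.randint(0, 10, (16,))
tau = 4.0

# Distill down the ladder one hop at a time.
for big, small in [(teacher, assistant), (assistant, student)]:
    opt = torch.optim.SGD(small.parameters(), lr=0.1)
    opt.zero_grad()
    with torch.no_grad():
        soft_target = F.softmax(big(x) / tau, dim=-1)
    logits = small(x)
    loss = F.kl_div(F.log_softmax(logits / tau, dim=-1), soft_target,
                    reduction="batchmean") * tau ** 2 + F.cross_entropy(logits, y)
    loss.backward()
    opt.step()
```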
Knowledge from logits
Title | Venue | Note |
---|---|---|
Distilling the knowledge in a neural network | arXiv:1503.02531 | |
Deep Model Compression: Distilling Knowledge from Noisy Teachers | arXiv:1610.09650 | |
Semi-Supervised Knowledge Transfer for Deep Learning from Private Training Data | ICLR 2017 | |
Knowledge Adaptation: Teaching to Adapt | arXiv:1702.02052 | |
Learning from Multiple Teacher Networks | KDD 2017 | |
Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results | NIPS 2017 | |
Training Deep Neural Networks in Generations: A More Tolerant Teacher Educates Better Students | arXiv:1805.05551 | |
Moonshine:Distilling with Cheap Convolutions | NIPS 2018 | |
Positive-Unlabeled Compression on the Cloud | NIPS 2019 | |
Variational Student: Learning Compact and Sparser Networks in Knowledge Distillation Framework | arXiv:1910.12061 | |
Preparing Lessons: Improve Knowledge Distillation with Better Supervision | arXiv:1911.07471 | |
Adaptive Regularization of Labels | arXiv:1908.05474 | |
Learning Metrics from Teachers: Compact Networks for Image Embedding | CVPR 2019 | |
Diversity with Cooperation: Ensemble Methods for Few-Shot Classification | ICCV 2019 | |
Improved Knowledge Distillation via Teacher Assistant: Bridging the Gap Between Student and Teacher | arXiv:1902.03393 | |
MEAL: Multi-Model Ensemble via Adversarial Learning | AAAI 2019 | |
Revisit Knowledge Distillation: a Teacher-free Framework | CVPR 2020 [code] | |
Ensemble Distribution Distillation | ICLR 2020 | |
Noisy Collaboration in Knowledge Distillation | ICLR 2020 | |
Self-training with Noisy Student improves ImageNet classification | CVPR 2020 | |
QUEST: Quantized embedding space for transferring knowledge | CVPR 2020(pre) | |
Meta Pseudo Labels | ICML 2020 | |
Subclass Distillation | ICML2020 | |
Boosting Self-Supervised Learning via Knowledge Transfer | CVPR 2018 | |
Neural Networks Are More Productive Teachers Than Human Raters: Active Mixup for Data-Efficient Knowledge Distillation from a Blackbox Model | CVPR 2020 [code] | |
Regularizing Class-wise Predictions via Self-knowledge Distillation | CVPR 2020 [code] | |
Rethinking Data Augmentation: Self-Supervision and Self-Distillation | ICLR 2020 | |
What it Thinks is Important is Important: Robustness Transfers through Input Gradients | CVPR 2020 | |
Role-Wise Data Augmentation for Knowledge Distillation | ICLR 2020 [code] | |
Distilling Effective Supervision from Severe Label Noise | CVPR 2020 | |
Learning with Noisy Class Labels for Instance Segmentation | ECCV 2020 | |
Self-Distillation Amplifies Regularization in Hilbert Space | arXiv:2002.05715 | |
MINILM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers | arXiv:2002.10957 | |
Hydra: Preserving Ensemble Diversity for Model Distillation | arXiv:2001.04694 | |
Teacher-Class Network: A Neural Network Compression Mechanism | arXiv:2004.03281 | |
Learning from a Lightweight Teacher for Efficient Knowledge Distillation | arXiv:2005.09163 | |
Self-Distillation as Instance-Specific Label Smoothing | arXiv:2006.05065 | |
Self-supervised Knowledge Distillation for Few-shot Learning | arXiv:2006.09785 | |
Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation | arXiv:2007.01951 | |
Few Sample Knowledge Distillation for Efficient Network Compression | CVPR 2020 | |
Learning What and Where to Transfer | ICML 2019 | |
Transferring Knowledge across Learning Processes | ICLR 2019 | |
Semantic-Aware Knowledge Preservation for Zero-Shot Sketch-Based Image Retrieval | ICCV 2019 | |
Knowledge Representing: Efficient, Sparse Representation of Prior Knowledge for Knowledge Distillation | arXiv:1911.05329 | |
Progressive Knowledge Distillation For Generative Modeling | ICLR 2020 | |
Few Shot Network Compression via Cross Distillation | AAAI 2020 |
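Almost everything in this table builds on the softened-logit objective of "Distilling the knowledge in a neural network": a KL term between temperature-scaled teacher and student distributions plus the usual hard-label loss. A minimal PyTorch sketch (the temperature and weighting are illustrative choices):

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, tau=4.0, alpha=0.9):
    """Classic KD: KL between temperature-softened distributions + hard-label CE."""
    soft = F.kl_div(F.log_softmax(student_logits / tau, dim=-1),
                    F.softmax(teacher_logits / tau, dim=-1),
                    reduction="batchmean") * (tau ** 2)   # tau^2 restores gradient scale
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

s = torch.randn(32, 100, requires_grad=True)   # student logits
t = torch.randn(32, 100)                       # (detached) teacher logits
y = torch.randint(0, 100, (32,))
kd_loss(s, t, y).backward()
```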
Knowledge from intermediate layers
Title | Venue | Note |
---|---|---|
Fitnets: Hints for thin deep nets | arXiv:1412.6550 | |
Paying more attention to attention: Improving the performance of convolutional neural networks via attention transfer | ICLR 2017 | |
Knowledge Projection for Effective Design of Thinner and Faster Deep Neural Networks | arXiv:1710.09505 | |
A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning | CVPR 2017 | |
Paraphrasing complex network: Network compression via factor transfer | NIPS 2018 | |
Knowledge transfer with jacobian matching | ICML 2018 | |
Like What You Like: Knowledge Distill via Neuron Selectivity Transfer | CVPR2018 | |
An Embarrassingly Simple Approach for Knowledge Distillation | MLR 2018 | |
Self-supervised knowledge distillation using singular value decomposition | ECCV 2018 | |
Learning Deep Representations with Probabilistic Knowledge Transfer | ECCV 2018 | |
Correlation Congruence for Knowledge Distillation | ICCV 2019 | |
Similarity-Preserving Knowledge Distillation | ICCV 2019 | |
Variational Information Distillation for Knowledge Transfer | CVPR 2019 | |
Knowledge Distillation via Instance Relationship Graph | CVPR 2019 | |
Knowledge Distillation via Route Constrained Optimization | ICCV 2019 | |
Stagewise Knowledge Distillation | arXiv:1911.06786 | |
Distilling Object Detectors with Fine-grained Feature Imitation | CVPR 2019 | |
Knowledge Squeezed Adversarial Network Compression | AAAI 2020 | |
Knowledge Distillation from Internal Representations | AAAI 2020 | |
Knowledge Flow:Improve Upon Your Teachers | ICLR 2019 | |
LIT: Learned Intermediate Representation Training for Model Compression | ICML 2019 | |
A Comprehensive Overhaul of Feature Distillation | ICCV 2019 | |
Residual Knowledge Distillation | arXiv:2002.09168 | |
Knowledge distillation via adaptive instance normalization | arXiv:2003.04289 | |
Channel Distillation: Channel-Wise Attention for Knowledge Distillation | arXiv:2006.01683 | |
Matching Guided Distillation | ECCV 2020 | |
Differentiable Feature Aggregation Search for Knowledge Distillation | ECCV 2020 | |
Local Correlation Consistency for Knowledge Distillation | ECCV 2020 |
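Two ideas recur throughout this table: FitNets-style hint regression (an adapter maps the student's feature map onto the teacher's and the two are matched with L2) and attention transfer (matching normalized spatial attention maps instead of raw features). A toy sketch of both losses; the feature shapes and the 1x1 regressor are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

student_feat = torch.randn(4, 64, 32, 32, requires_grad=True)
teacher_feat = torch.randn(4, 256, 32, 32)

# FitNets-style hint: a 1x1 "regressor" maps student channels onto the teacher's,
# and the two feature maps are matched with an L2 loss.
regressor = nn.Conv2d(64, 256, kernel_size=1)
hint_loss = F.mse_loss(regressor(student_feat), teacher_feat)

# Attention transfer: collapse channels into a normalized spatial attention map
# (mean of squared activations) and match the maps instead of raw features.
def attention_map(feat):
    a = feat.pow(2).mean(dim=1).flatten(1)    # (B, H*W)
    return F.normalize(a, dim=1)

at_loss = F.mse_loss(attention_map(student_feat), attention_map(teacher_feat))
(hint_loss + at_loss).backward()
```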
Online KD
Title | Venue | Note |
---|---|---|
Deep Mutual Learning | CVPR 2018 | |
Born-Again Neural Networks | ICML 2018 | |
Knowledge distillation by on-the-fly native ensemble | NIPS 2018 | |
Collaborative learning for deep neural networks | NIPS 2018 | |
Unifying Heterogeneous Classifiers with Distillation | CVPR 2019 | |
Snapshot Distillation: Teacher-Student Optimization in One Generation | CVPR 2019 | |
Deeply-supervised knowledge synergy | CVPR 2019 | |
Be Your Own Teacher: Improve the Performance of Convolutional Neural Networks via Self Distillation | ICCV 2019 | |
Distillation-Based Training for Multi-Exit Architectures | ICCV 2019 | |
MSD: Multi-Self-Distillation Learning via Multi-classifiers within Deep Neural Networks | arXiv:1911.09418 | |
FEED: Feature-level Ensemble for Knowledge Distillation | AAAI 2020 | |
Stochasticity and Skip Connection Improve Knowledge Transfer | ICLR 2020 | |
Online Knowledge Distillation with Diverse Peers | AAAI 2020 | |
Online Knowledge Distillation via Collaborative Learning | CVPR 2020 | |
Collaborative Learning for Faster StyleGAN Embedding | arXiv:2007.01758 | |
Feature-map-level Online Adversarial Knowledge Distillation | ICML 2020 | |
Knowledge Transfer via Dense Cross-layer Mutual-distillation | ECCV 2020 | |
MetaDistiller: Network Self-boosting via Meta-learned Top-down Distillation | ECCV 2020 | |
ResKD: Residual-Guided Knowledge Distillation | arXiv:2006.04719 | |
Interactive Knowledge Distillation | arXiv:2007.01476 | |
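In the online/self-distillation setting there is no frozen teacher; peers (or internal branches) are trained together and imitate each other. A minimal deep-mutual-learning-style sketch with two peers exchanging KL terms; the models and hyperparameters are toy assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def peer(width):
    return nn.Sequential(nn.Flatten(), nn.Linear(784, width), nn.ReLU(), nn.Linear(width, 10))

net_a, net_b = peer(128), peer(128)
opt = torch.optim.SGD(list(net_a.parameters()) + list(net_b.parameters()), lr=0.1)
x, y = torch.randn(16, 1, 28, 28), torch.randint(0, 10, (16,))

def kl(p_logits, q_logits):
    """KL(q || p), with the first argument given as raw logits of p."""
    return F.kl_div(F.log_softmax(p_logits, dim=-1),
                    F.softmax(q_logits, dim=-1), reduction="batchmean")

# Each peer fits the labels and additionally imitates the other peer's current
# predictions (detached so each KL term only updates one peer).
logits_a, logits_b = net_a(x), net_b(x)
loss_a = F.cross_entropy(logits_a, y) + kl(logits_a, logits_b.detach())
loss_b = F.cross_entropy(logits_b, y) + kl(logits_b, logits_a.detach())
opt.zero_grad()
(loss_a + loss_b).backward()
opt.step()
```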
Understanding KD
Title | Venue | Note |
---|---|---|
Do deep nets really need to be deep? | NIPS 2014 | |
When Does Label Smoothing Help? | NIPS 2019 | |
Towards Understanding Knowledge Distillation | AAAI 2019 | |
Harnessing deep neural networks with logical rules | ACL 2016 | |
Adaptive Regularization of Labels | arXiv:1908.05474 | |
Knowledge Isomorphism between Neural Networks | arXiv:1908 | |
Understanding and Improving Knowledge Distillation | arXiv:2002.03532 | |
The State of Knowledge Distillation for Classification | arXiv:1912.10850 | |
Explaining Knowledge Distillation by Quantifying the Knowledge | CVPR 2020 | |
DeepVID: deep visual interpretation and diagnosis for image classifiers via knowledge distillation | IEEE Trans, 2019 | |
On the Unreasonable Effectiveness of Knowledge Distillation: Analysis in the Kernel Regime | arXiv:2003.13438 | |
Why distillation helps: a statistical perspective | arXiv:2005.10419 | |
Transferring Inductive Biases through Knowledge Distillation | arXiv:2006.00555 | |
Does label smoothing mitigate label noise? Michal Lukasik et al. | ICML 2020 | |
An Empirical Analysis of the Impact of Data Augmentation on Knowledge Distillation | arXiv:2006.03810 | |
Does Adversarial Transferability Indicate Knowledge Transferability? | arXiv:2006.14512 | |
On the Demystification of Knowledge Distillation: A Residual Network Perspective | arXiv:2006.16589 | |
Teaching To Teach By Structured Dark Knowledge | ICLR 2020 | |
Inter-Region Affinity Distillation for Road Marking Segmentation | CVPR 2020 [code] | |
Heterogeneous Knowledge Distillation using Information Flow Modeling | CVPR 2020 [code] | |
Local Correlation Consistency for Knowledge Distillation | ECCV2020 | |
Few-Shot Class-Incremental Learning | CVPR 2020 | |
Unifying distillation and privileged information | ICLR 2016 |
KD for model pruning, quantization, and NAS
Title | Venue | Note |
---|---|---|
Accelerating Convolutional Neural Networks with Dominant Convolutional Kernel and Knowledge Pre-regression | ECCV 2016 | |
N2N Learning: Network to Network Compression via Policy Gradient Reinforcement Learning | ICLR 2018 | |
Slimmable Neural Networks | ICLR 2018 | |
Apprentice: Using Knowledge Distillation Techniques To Improve Low-Precision Network Accuracy | NIPS 2018 | |
MetaPruning: Meta Learning for Automatic Neural Network Channel Pruning | ICCV 2019 | |
LightPAFF: A Two-Stage Distillation Framework for Pre-training and Fine-tuning | ICLR 2020 | |
Pruning with hints: an efficient framework for model acceleration | ICLR 2020 | |
Knapsack Pruning with Inner Distillation | arXiv:2002.08258 | |
Training convolutional neural networks with cheap convolutions and online distillation | arXiv:1909.13063 | |
Cooperative Pruning in Cross-Domain Deep Neural Network Compression | IJCAI 2019 | |
QKD: Quantization-aware Knowledge Distillation | arXiv:1911.12491 | |
Neural Network Pruning with Residual-Connections and Limited-Data | CVPR 2020 | |
Training Quantized Neural Networks with a Full-precision Auxiliary Module | CVPR 2020 | |
Towards Effective Low-bitwidth Convolutional Neural Networks | CVPR 2018 | |
Effective Training of Convolutional Neural Networks with Low-bitwidth Weights and Activations | arXiv:1908.04680 | |
Paying more attention to snapshots of Iterative Pruning: Improving Model Compression via Ensemble Distillation | arXiv:2006.11487 | |
Knowledge Distillation Beyond Model Compression | arXiv:2007.01493 | |
Teacher Guided Architecture Search | ICCV 2019 | |
Distillation Guided Residual Learning for Binary Convolutional Neural Networks | ECCV 2020 | |
MutualNet: Adaptive ConvNet via Mutual Learning from Network Width and Resolution | ECCV 2020 | |
Improving Neural Architecture Search Image Classifiers via Ensemble Learning | arXiv:1903.06236 | |
Blockwisely Supervised Neural Architecture Search with Knowledge Distillation | arXiv:1911.13053 | |
Towards Oracle Knowledge Distillation with Neural Architecture Search | AAAI 2020 | |
Search for Better Students to Learn Distilled Knowledge | arXiv:2001.11612 | |
Circumventing Outliers of AutoAugment with Knowledge Distillation | arXiv:2003.11342 | |
Network Pruning via Transformable Architecture Search | NIPS 2019 | |
Search to Distill: Pearls are Everywhere but not the Eyes | CVPR 2020 | |
AutoGAN-Distiller: Searching to Compress Generative Adversarial Networks | ICML 2020 [code] |
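A common recipe in this table is to compress the student (prune or quantize it) while a full-precision teacher supervises it with the usual KD loss. As one toy illustration in the spirit of quantization-aware KD, the student's weights pass through a fake-quantizer with a straight-through estimator; the 8-bit symmetric quantizer and the tiny models are assumptions (and the teacher would normally be pretrained).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fake_quant(w, bits=8):
    """Symmetric uniform fake-quantization with a straight-through estimator."""
    scale = w.abs().max() / (2 ** (bits - 1) - 1) + 1e-8
    w_q = torch.round(w / scale).clamp(-(2 ** (bits - 1)), 2 ** (bits - 1) - 1) * scale
    return w + (w_q - w).detach()             # forward: quantized weights, backward: identity

class QuantLinear(nn.Linear):
    def forward(self, x):
        return F.linear(x, fake_quant(self.weight), self.bias)

teacher = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 10))  # would be pretrained
student = nn.Sequential(QuantLinear(32, 64), nn.ReLU(), QuantLinear(64, 10))
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

x, y = torch.randn(16, 32), torch.randint(0, 10, (16,))
with torch.no_grad():
    t_logits = teacher(x)
s_logits = student(x)

tau = 2.0
loss = F.kl_div(F.log_softmax(s_logits / tau, dim=-1), F.softmax(t_logits / tau, dim=-1),
                reduction="batchmean") * tau ** 2 + F.cross_entropy(s_logits, y)
opt.zero_grad()
loss.backward()
opt.step()
```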
Application of KD
Sub | Title | Venue |
---|---|---|
Graph | Graph-based Knowledge Distillation by Multi-head Attention Network | arXiv:1907.02226
Graph Representation Learning via Multi-task Knowledge Distillation | arXiv:1911.05700 | |
Deep geometric knowledge distillation with graphs | arXiv:1911.03080 | |
Better and faster: Knowledge transfer from multiple self-supervised learning tasks via graph distillation for video classification | IJCAI 2018 | |
Distillating Knowledge from Graph Convolutional Networks | CVPR 2020 | |
Face | Face model compression by distilling knowledge from neurons | AAAI 2016 |
MarginDistillation: distillation for margin-based softmax | arXiv:2003.02586 | |
ReID | Distilled Person Re-Identification: Towards a More Scalable System | CVPR 2019 |
Robust Re-Identification by Multiple Views Knowledge Distillation | ECCV 2020 [code] | |
Detection | Learning efficient object detection models with knowledge distillation | NIPS 2017 |
Distilling Object Detectors with Fine-grained Feature Imitation | CVPR 2019 | |
Relation Distillation Networks for Video Object Detection | ICCV 2019 | |
Learning Lightweight Face Detector with Knowledge Distillation | IEEE 2019 | |
Teacher Supervises Students How to Learn From Partially Labeled Images for Facial Landmark Detection | ICCV 2019 | |
Learning Lightweight Lane Detection CNNs by Self Attention Distillation | ICCV 2019 | |
A Multi-Task Mean Teacher for Semi-Supervised Shadow Detection | CVPR 2020 [code] | |
Boosting Weakly Supervised Object Detection with Progressive Knowledge Transfer | ECCV 2020 | |
Temporal Self-Ensembling Teacher for Semi-Supervised Object Detection | IEEE 2020 [code] | |
Uninformed Students: Student-Teacher Anomaly Detection with Discriminative Latent Embeddings | CVPR 2020 | |
Distilling Knowledge from Refinement in Multiple Instance Detection Networks | arXiv:2004.10943 | |
Enabling Incremental Knowledge Transfer for Object Detection at the Edge | arXiv:2004.05746 | |
Pose | DOPE: Distillation Of Part Experts for whole-body 3D pose estimation in the wild | ECCV 2020 |
Fast Human Pose Estimation | CVPR 2019 | |
Distill Knowledge From NRSfM for Weakly Supervised 3D Pose Learning | ICCV 2019 | |
Segmentation | ROAD: Reality Oriented Adaptation for Semantic Segmentation of Urban Scenes | CVPR 2018 |
Knowledge Distillation for Incremental Learning in Semantic Segmentation | arXiv:1911.03462 | |
Geometry-Aware Distillation for Indoor Semantic Segmentation | CVPR 2019 | |
Structured Knowledge Distillation for Semantic Segmentation | CVPR 2019 | |
Self-similarity Student for Partial Label Histopathology Image Segmentation | ECCV 2020 | |
Knowledge Distillation for Brain Tumor Segmentation | arXiv:2002.03688 | |
Low-Vision | Lightweight Image Super-Resolution with Information Multi-distillation Network | ICCVW 2019 |
Collaborative Distillation for Ultra-Resolution Universal Style Transfer | CVPR 2020 [code] | |
Video | Efficient Video Classification Using Fewer Frames | CVPR 2019 |
Relation Distillation Networks for Video Object Detection | ICCV 2019 | |
Teacher Supervises Students How to Learn From Partially Labeled Images for Facial Landmark Detection | ICCV 2019 | |
Progressive Teacher-student Learning for Early Action Prediction | CVPR 2019 | |
MOD: A Deep Mixture Model with Online Knowledge Distillation for Large Scale Video Temporal Concept Localization | arXiv:1910.12295 | |
AWSD: Adaptive Weighted Spatiotemporal Distillation for Video Representation | ICCV 2019 | |
Dynamic Kernel Distillation for Efficient Pose Estimation in Videos | ICCV 2019 | |
Online Model Distillation for Efficient Video Inference | ICCV 2019 | |
Optical Flow Distillation: Towards Efficient and Stable Video Style Transfer | ECCV 2020 | |
Adversarial Self-Supervised Learning for Semi-Supervised 3D Action Recognition | ECCV 2020 | |
Object Relational Graph with Teacher-Recommended Learning for Video Captioning | CVPR 2020 | |
Spatio-Temporal Graph for Video Captioning with Knowledge distillation | CVPR 2020 [code] | |
TA-Student VQA: Multi-Agents Training by Self-Questioning | CVPR 2020 |
Data-free KD
Title | Venue | Note |
---|---|---|
Data-Free Knowledge Distillation for Deep Neural Networks | NIPS 2017 | |
Zero-Shot Knowledge Distillation in Deep Networks | ICML 2019 | |
DAFL: Data-Free Learning of Student Networks | ICCV 2019 | |
Zero-shot Knowledge Transfer via Adversarial Belief Matching | NIPS 2019 | |
Dream Distillation: A Data-Independent Model Compression Framework | ICML 2019 | |
Dreaming to Distill: Data-free Knowledge Transfer via DeepInversion | CVPR 2020 | |
Data-Free Adversarial Distillation | CVPR 2020 | |
The Knowledge Within: Methods for Data-Free Model Compression | CVPR 2020 | |
Knowledge Extraction with No Observable Data | NIPS 2019 | |
Data-Free Knowledge Amalgamation via Group-Stack Dual-GAN | CVPR 2020 | |
DeGAN : Data-Enriching GAN for Retrieving Representative Samples from a Trained Classifier | arXiv:1912.11960 | |
Generative Low-bitwidth Data Free Quantization | arXiv:2003.03603 | |
This dataset does not exist: training models from generated images | arXiv:1911.02888 | |
MAZE: Data-Free Model Stealing Attack Using Zeroth-Order Gradient Estimation | arXiv:2005.03161 | |
Generative Teaching Networks: Accelerating Neural Architecture Search by Learning to Generate Synthetic Training Data | ECCV 2020 | |
Billion-scale semi-supervised learning for image classification | arXiv:1905.00546 | |
Data-free Parameter Pruning for Deep Neural Networks | arXiv:1507.06149 | |
Data-Free Quantization Through Weight Equalization and Bias Correction | ICCV 2019 | |
DAC: Data-free Automatic Acceleration of Convolutional Networks | WACV 2019 |
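Data-free methods replace the training set with inputs synthesized from the teacher itself. The sketch below is a heavily simplified illustration of that two-step recipe (synthesize inputs the frozen teacher is confident about, then distill on them); real methods such as DAFL or DeepInversion add generator networks, batch-norm-statistic and diversity terms, so everything here is an assumption-level toy.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10)).eval()
student = nn.Sequential(nn.Flatten(), nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 10))
for p in teacher.parameters():
    p.requires_grad_(False)                   # the (normally pretrained) teacher stays frozen

# Step 1: synthesize pseudo-inputs that make the frozen teacher confident on chosen labels.
synth = torch.randn(32, 1, 28, 28, requires_grad=True)
targets = torch.randint(0, 10, (32,))
img_opt = torch.optim.Adam([synth], lr=0.05)
for _ in range(100):
    img_opt.zero_grad()
    F.cross_entropy(teacher(synth), targets).backward()
    img_opt.step()

# Step 2: ordinary logit distillation on the synthesized batch; no real data is used.
stu_opt = torch.optim.Adam(student.parameters(), lr=1e-3)
with torch.no_grad():
    t_logits = teacher(synth)
loss = F.kl_div(F.log_softmax(student(synth.detach()), dim=-1),
                F.softmax(t_logits, dim=-1), reduction="batchmean")
stu_opt.zero_grad()
loss.backward()
stu_opt.step()
```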
Cross-modal KD & DA
Title | Venue | Note |
---|---|---|
SoundNet: Learning Sound Representations from Unlabeled Video | ECCV 2016 | |
Cross Modal Distillation for Supervision Transfer | CVPR 2016 | |
Emotion recognition in speech using cross-modal transfer in the wild | ACM MM 2018 | |
Through-Wall Human Pose Estimation Using Radio Signals | CVPR 2018 | |
Compact Trilinear Interaction for Visual Question Answering | ICCV 2019 | |
Cross-Modal Knowledge Distillation for Action Recognition | ICIP 2019 | |
Learning to Map Nearly Anything | arXiv:1909.06928 | |
Semantic-Aware Knowledge Preservation for Zero-Shot Sketch-Based Image Retrieval | ICCV 2019 | |
UM-Adapt: Unsupervised Multi-Task Adaptation Using Adversarial Cross-Task Distillation | ICCV 2019 | |
CrDoCo: Pixel-level Domain Transfer with Cross-Domain Consistency | CVPR 2019 | |
XD: Cross-lingual Knowledge Distillation for Polyglot Sentence Embeddings | | |
Effective Domain Knowledge Transfer with Soft Fine-tuning | arXiv:1909.02236 | |
ASR is all you need: cross-modal distillation for lip reading | arXiv:1911.12747 | |
Knowledge distillation for semi-supervised domain adaptation | arXiv:1908.07355 | |
Domain Adaptation via Teacher-Student Learning for End-to-End Speech Recognition | arXiv:2001.01798 | |
Cluster Alignment with a Teacher for Unsupervised Domain Adaptation | ICCV 2019 | |
Attention Bridging Network for Knowledge Transfer | ICCV 2019 | |
Unpaired Multi-modal Segmentation via Knowledge Distillation | arXiv:2001.03111 | |
Multi-source Distilling Domain Adaptation | arXiv:1911.11554 | |
Creating Something from Nothing: Unsupervised Knowledge Distillation for Cross-Modal Hashing | CVPR 2020 | |
Improving Semantic Segmentation via Self-Training | arXiv:2004.14960 | |
Speech to Text Adaptation: Towards an Efficient Cross-Modal Distillation | arXiv:2005.08213 | |
Joint Progressive Knowledge Distillation and Unsupervised Domain Adaptation | arXiv:2005.07839 | |
Knowledge as Priors: Cross-Modal Knowledge Generalization for Datasets without Superior Knowledge | CVPR 2020 | |
Large-Scale Domain Adaptation via Teacher-Student Learning | arXiv:1708.05466 | |
Large Scale Audiovisual Learning of Sounds with Weakly Labeled Data | IJCAI 2020 | |
Distilling Cross-Task Knowledge via Relationship Matching | CVPR 2020 [code] | |
Modality distillation with multiple stream networks for action recognition | ECCV 2018 | |
Domain Adaptation through Task Distillation | ECCV 2020 |
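The recipe behind many cross-modal entries (e.g., Cross Modal Distillation for Supervision Transfer) is to run a teacher on its native modality and align the student's predictions on the paired modality, so no labels are needed for the student's modality. A minimal sketch with hypothetical paired RGB/depth tensors and toy encoders:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical paired data: RGB frames (teacher modality) and depth maps (student modality).
rgb = torch.randn(8, 3, 64, 64)
depth = torch.randn(8, 1, 64, 64)

def encoder(in_ch):
    return nn.Sequential(nn.Conv2d(in_ch, 16, 3, stride=2, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10))

teacher_rgb, student_depth = encoder(3).eval(), encoder(1)   # the RGB teacher would be pretrained
opt = torch.optim.Adam(student_depth.parameters(), lr=1e-3)

# No labels for the depth modality: the student mimics the RGB teacher's soft
# predictions on the paired depth input.
with torch.no_grad():
    t_logits = teacher_rgb(rgb)
loss = F.kl_div(F.log_softmax(student_depth(depth), dim=-1),
                F.softmax(t_logits, dim=-1), reduction="batchmean")
opt.zero_grad()
loss.backward()
opt.step()
```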
Adversarial KD
Title | Venue | Note |
---|---|---|
Training Shallow and Thin Networks for Acceleration via Knowledge Distillation with Conditional Adversarial Networks | arXiv:1709.00513 | |
KTAN: Knowledge Transfer Adversarial Network | arXiv:1810.08126 | |
KDGAN: Knowledge Distillation with Generative Adversarial Networks | NIPS 2018 | |
Adversarial Learning of Portable Student Networks | AAAI 2018 | |
Adversarial Network Compression | ECCV 2018 | |
Cross-Modality Distillation: A case for Conditional Generative Adversarial Networks | ICASSP 2018 | |
Adversarial Distillation for Efficient Recommendation with External Knowledge | TOIS 2018 | |
Training student networks for acceleration with conditional adversarial networks | BMVC 2018 | |
DAFL: Data-Free Learning of Student Networks | ICCV 2019 | |
MEAL: Multi-Model Ensemble via Adversarial Learning | AAAI 2019 | |
Exploiting the Ground-Truth: An Adversarial Imitation Based Knowledge Distillation Approach for Event Detection | AAAI 2019 | |
Adversarially Robust Distillation | AAAI 2020 | |
GAN-Knowledge Distillation for one-stage Object Detection | arXiv:1906.08467 | |
Lifelong GAN: Continual Learning for Conditional Image Generation | arXiv:1908.03884 | |
Compressing GANs using Knowledge Distillation | arXiv:1902.00159 | |
Feature-map-level Online Adversarial Knowledge Distillation | ICML 2020 | |
MineGAN: effective knowledge transfer from GANs to target domains with few images | CVPR 2020 | |
Distilling portable Generative Adversarial Networks for Image Translation | AAAI 2020 | |
GAN Compression: Efficient Architectures for Interactive Conditional GANs | CVPR 2020 |
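The recurring pattern here is a discriminator that tries to tell teacher outputs (or features) from student outputs, while the student is trained to fool it, usually alongside a standard supervised or KD term. A toy logit-level sketch; the discriminator architecture, loss weights, and untrained toy networks are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

num_classes = 10
teacher = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, num_classes)).eval()
student = nn.Sequential(nn.Flatten(), nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, num_classes))
disc = nn.Sequential(nn.Linear(num_classes, 64), nn.ReLU(), nn.Linear(64, 1))  # teacher-vs-student critic

opt_s = torch.optim.Adam(student.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-3)
x, y = torch.randn(16, 1, 28, 28), torch.randint(0, num_classes, (16,))

with torch.no_grad():
    t_logits = teacher(x)
s_logits = student(x)

# Discriminator step: teacher logits are "real" (1), student logits are "fake" (0).
opt_d.zero_grad()
d_loss = (F.binary_cross_entropy_with_logits(disc(t_logits), torch.ones(16, 1))
          + F.binary_cross_entropy_with_logits(disc(s_logits.detach()), torch.zeros(16, 1)))
d_loss.backward()
opt_d.step()

# Student step: fool the discriminator while still fitting the hard labels.
opt_s.zero_grad()
g_loss = F.binary_cross_entropy_with_logits(disc(s_logits), torch.ones(16, 1)) + F.cross_entropy(s_logits, y)
g_loss.backward()
opt_s.step()
```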