
tebmer / awesome-knowledge-distillation-of-llms

372 stars · 5 watchers · 23 forks · 18.42 MB

This repository collects papers for "A Survey on Knowledge Distillation of Large Language Models". We break down KD into Knowledge Elicitation and Distillation Algorithms, and explore the Skill & Vertical Distillation of LLMs.

Topics: data-augmentation, instruction-following, kd, knowledge-distillation, large-language-model, llm, self-training, survey, compression, data-synthesis

awesome-knowledge-distillation-of-llms's People

Contributors

alphadl, tebmer


awesome-knowledge-distillation-of-llms's Issues

Further distillation papers to consider

Thanks for the great repo! These additional papers on masked latent semantic modeling (in which pre-training is achieved by recovering latent semantic information extracted from a teacher model) might also fit the scope of the survey:
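(For readers unfamiliar with the setup, a minimal sketch of the kind of objective involved, under my own simplifying assumptions; in particular, the projection used to define the latent semantic space below is a placeholder, not the construction used in those papers.)

```python
# Illustrative sketch only: the student is pre-trained to recover a latent-semantic
# target derived from a frozen teacher at masked positions, instead of token IDs.
import torch
import torch.nn.functional as F

def mlsm_loss(student_hidden, teacher_hidden, mask, latent_proj):
    """student_hidden, teacher_hidden: (batch, seq, dim) hidden states.
    mask: (batch, seq) boolean mask of positions hidden from the student.
    latent_proj: (dim, n_latents) projection defining the latent semantic space
    (a stand-in for however the papers actually extract latent semantics)."""
    # Teacher-side latent semantic distribution at masked positions (no gradients).
    with torch.no_grad():
        target = F.softmax(teacher_hidden[mask] @ latent_proj, dim=-1)
    # Student predicts the same latent distribution from its own hidden states.
    pred_log = F.log_softmax(student_hidden[mask] @ latent_proj, dim=-1)
    return F.kl_div(pred_log, target, reduction="batchmean")

# Toy shapes to show the call signature.
B, T, D, K = 2, 8, 16, 32
mask = torch.zeros(B, T, dtype=torch.bool)
mask[:, ::4] = True  # mask every 4th position
loss = mlsm_loss(torch.randn(B, T, D, requires_grad=True),
                 torch.randn(B, T, D),
                 mask,
                 torch.randn(D, K))
print(loss.item())
```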

Request for adding a reference

Dear authors,

I am writing to express my appreciation for your comprehensive and inspiring survey paper about knowledge distillation of LLMs!

I would like to bring to your attention our recent paper, titled "KnowTuning: Knowledge-aware Fine-tuning for Large Language Models".

In this work, we introduce KnowTuning, a method designed to explicitly and implicitly enhance the knowledge awareness of Large Language Models (LLMs). Using GPT-4 as the teacher model, we devise an explicit knowledge-aware generation stage that trains LLMs to explicitly identify the knowledge triples in their answers. We also propose an implicit knowledge-aware comparison stage that trains LLMs to implicitly distinguish reliable from unreliable knowledge along three aspects: completeness, factuality, and logicality.
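(For context, a rough sketch of how the two stages could be organized as training data. This is an illustration rather than the authors' implementation; the helper names, prompt formats, the Triple structure, and the use of a DPO-style preference objective for the comparison stage are all assumptions.)

```python
# Hypothetical data construction for the two KnowTuning stages.
from dataclasses import dataclass

@dataclass
class Triple:
    subject: str
    relation: str
    obj: str

def explicit_generation_example(question: str, answer: str, triples: list[Triple]) -> dict:
    """Explicit stage: supervised example asking the student LLM to surface the
    knowledge triples in an answer (triples assumed to come from a teacher such as GPT-4)."""
    target = "\n".join(f"({t.subject}, {t.relation}, {t.obj})" for t in triples)
    return {
        "prompt": f"Question: {question}\nAnswer: {answer}\nList the knowledge triples in the answer:",
        "completion": target,
    }

def implicit_comparison_example(question: str, reliable: str, unreliable: str, aspect: str) -> dict:
    """Implicit stage: preference pair in which the reliable answer is preferred over one
    degraded along a given aspect (completeness, factuality, or logicality)."""
    return {
        "prompt": f"Question: {question}",
        "chosen": reliable,
        "rejected": unreliable,
        "aspect": aspect,
    }

if __name__ == "__main__":
    sft = explicit_generation_example(
        "Where is the Eiffel Tower?",
        "The Eiffel Tower is in Paris, France.",
        [Triple("Eiffel Tower", "located_in", "Paris")],
    )
    pref = implicit_comparison_example(
        "Where is the Eiffel Tower?",
        "The Eiffel Tower is in Paris, France.",
        "The Eiffel Tower is in London.",
        aspect="factuality",
    )
    print(sft, pref, sep="\n")
```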

I think our method is relevant to the discussion in your survey paper.

Once again, thank you for your excellent contribution to the field.

Best regards

[One paper] A new way to perform KD (verified on BERT compression)

Thanks for the great work!
I would like to recommend a new way to perform KD: weight inheritance.

Name: Weight-Inherited Distillation for Task-Agnostic BERT Compression
Code: https://github.com/wutaiqiang/WID-NAACL2024
Blog: https://zhuanlan.zhihu.com/p/687294843
TL;DR: Model compression via weight inheritance: directly learn a mapping that projects the teacher model's weights onto the student model's weights.

This method was not proposed for LLMs and is evaluated on BERT compression, but it may inspire readers to consider a new way to perform distillation. Thanks~
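(To make the weight-inheritance idea concrete, here is a minimal sketch of what learning a mapping from teacher weights to student weights could look like. It is an illustration, not the code from the linked repository; the class name, shapes, and training setup are assumptions.)

```python
# Sketch: the student's weight is not trained directly; instead we learn small
# mapping matrices that project a frozen teacher weight down to the student's shape.
import torch
import torch.nn as nn

class InheritedLinear(nn.Module):
    """Student linear layer whose weight is generated from a frozen teacher weight
    via learnable row/column mappings."""
    def __init__(self, teacher_weight: torch.Tensor, out_features: int, in_features: int):
        super().__init__()
        t_out, t_in = teacher_weight.shape
        self.register_buffer("teacher_weight", teacher_weight)  # frozen teacher weight
        self.row_map = nn.Parameter(torch.randn(out_features, t_out) / t_out ** 0.5)
        self.col_map = nn.Parameter(torch.randn(t_in, in_features) / t_in ** 0.5)

    def student_weight(self) -> torch.Tensor:
        # Map teacher weight (t_out x t_in) to student shape (out_features x in_features).
        return self.row_map @ self.teacher_weight @ self.col_map

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x @ self.student_weight().t()

# Toy usage: compress a 768x768 teacher projection into a 384x384 student projection;
# only the mapping matrices are trainable, the teacher weight stays frozen.
teacher = nn.Linear(768, 768)
layer = InheritedLinear(teacher.weight.detach(), out_features=384, in_features=384)
x_student = torch.randn(4, 384)
print(layer(x_student).shape)  # torch.Size([4, 384])
```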
