
tebmer / awesome-knowledge-distillation-of-llms

372 stars · 5 watchers · 23 forks · 18.42 MB

This repository collects papers for "A Survey on Knowledge Distillation of Large Language Models". We break down KD into Knowledge Elicitation and Distillation Algorithms, and explore the Skill & Vertical Distillation of LLMs.

Topics: data-augmentation, instruction-following, kd, knowledge-distillation, large-language-model, llm, self-training, survey, compression, data-synthesis

awesome-knowledge-distillation-of-llms's People

Contributors

alphadl, tebmer


awesome-knowledge-distillation-of-llms's Issues

Further distillation papers to consider

Thanks for the great repo! These additional papers on masked latent semantic modeling (in which pre-training is achieved by recovering latent semantic information extracted from a teacher model) might also fit the scope of the survey:
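(For readers unfamiliar with the setup, a minimal sketch of the kind of objective involved, under my own simplifying assumptions; in particular, the projection used to define the latent semantic space below is a placeholder, not the construction used in those papers.)

```python
# Illustrative sketch only: the student is pre-trained to recover a latent-semantic
# target derived from a frozen teacher at masked positions, instead of token IDs.
import torch
import torch.nn.functional as F

def mlsm_loss(student_hidden, teacher_hidden, mask, latent_proj):
    """student_hidden, teacher_hidden: (batch, seq, dim) hidden states.
    mask: (batch, seq) boolean mask of positions hidden from the student.
    latent_proj: (dim, n_latents) projection defining the latent semantic space
    (a stand-in for however the papers actually extract latent semantics)."""
    # Teacher-side latent semantic distribution at masked positions (no gradients).
    with torch.no_grad():
        target = F.softmax(teacher_hidden[mask] @ latent_proj, dim=-1)
    # Student predicts the same latent distribution from its own hidden states.
    pred_log = F.log_softmax(student_hidden[mask] @ latent_proj, dim=-1)
    return F.kl_div(pred_log, target, reduction="batchmean")

# Toy shapes to show the call signature.
B, T, D, K = 2, 8, 16, 32
mask = torch.zeros(B, T, dtype=torch.bool)
mask[:, ::4] = True  # mask every 4th position
loss = mlsm_loss(torch.randn(B, T, D, requires_grad=True),
                 torch.randn(B, T, D),
                 mask,
                 torch.randn(D, K))
print(loss.item())
```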

Request for adding a reference

Dear authors,

I am writing to express my appreciation for your comprehensive and inspiring survey paper about knowledge distillation of LLMs!

I would like to bring to your attention our recent paper, titled "KnowTuning: Knowledge-aware Fine-tuning for Large Language Models".

In this work, we introduce KnowTuning, a method designed to explicitly and implicitly enhance the knowledge awareness of Large Language Models (LLMs). Using GPT-4 as the teacher model, we devise an explicit knowledge-aware generation stage that trains LLMs to explicitly identify the knowledge triples in their answers. We also propose an implicit knowledge-aware comparison stage that trains LLMs to implicitly distinguish reliable from unreliable knowledge along three aspects: completeness, factuality, and logicality.
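(For context, a rough sketch of how the two stages could be organized as training data. This is an illustration rather than the authors' implementation; the helper names, prompt formats, the Triple structure, and the use of a DPO-style preference objective for the comparison stage are all assumptions.)

```python
# Hypothetical data construction for the two KnowTuning stages.
from dataclasses import dataclass

@dataclass
class Triple:
    subject: str
    relation: str
    obj: str

def explicit_generation_example(question: str, answer: str, triples: list[Triple]) -> dict:
    """Explicit stage: supervised example asking the student LLM to surface the
    knowledge triples in an answer (triples assumed to come from a teacher such as GPT-4)."""
    target = "\n".join(f"({t.subject}, {t.relation}, {t.obj})" for t in triples)
    return {
        "prompt": f"Question: {question}\nAnswer: {answer}\nList the knowledge triples in the answer:",
        "completion": target,
    }

def implicit_comparison_example(question: str, reliable: str, unreliable: str, aspect: str) -> dict:
    """Implicit stage: preference pair in which the reliable answer is preferred over one
    degraded along a given aspect (completeness, factuality, or logicality)."""
    return {
        "prompt": f"Question: {question}",
        "chosen": reliable,
        "rejected": unreliable,
        "aspect": aspect,
    }

if __name__ == "__main__":
    sft = explicit_generation_example(
        "Where is the Eiffel Tower?",
        "The Eiffel Tower is in Paris, France.",
        [Triple("Eiffel Tower", "located_in", "Paris")],
    )
    pref = implicit_comparison_example(
        "Where is the Eiffel Tower?",
        "The Eiffel Tower is in Paris, France.",
        "The Eiffel Tower is in London.",
        aspect="factuality",
    )
    print(sft, pref, sep="\n")
```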

I think our method is relevant to the discussion in your survey paper.

Once again, thank you for your excellent contribution to the field.

Best regards

[One paper] A new way to perform KD (verified on BERT compression)

Thanks for the great work!
I would like to recommend a new way to perform KD: weight inheritance.

Name: Weight-Inherited Distillation for Task-Agnostic BERT Compression
Code: https://github.com/wutaiqiang/WID-NAACL2024
Blog: https://zhuanlan.zhihu.com/p/687294843
TL;DR: Model compression via weight inheritance: directly learn a mapping that projects the teacher model's weights onto the student model's weights.

This method was not proposed for LLMs and is evaluated on BERT compression, but it may inspire readers to consider a new way to perform distillation. Thanks~
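(To make the weight-inheritance idea concrete, here is a minimal sketch of what learning a mapping from teacher weights to student weights could look like. It is an illustration, not the code from the linked repository; the class name, shapes, and training setup are assumptions.)

```python
# Sketch: the student's weight is not trained directly; instead we learn small
# mapping matrices that project a frozen teacher weight down to the student's shape.
import torch
import torch.nn as nn

class InheritedLinear(nn.Module):
    """Student linear layer whose weight is generated from a frozen teacher weight
    via learnable row/column mappings."""
    def __init__(self, teacher_weight: torch.Tensor, out_features: int, in_features: int):
        super().__init__()
        t_out, t_in = teacher_weight.shape
        self.register_buffer("teacher_weight", teacher_weight)  # frozen teacher weight
        self.row_map = nn.Parameter(torch.randn(out_features, t_out) / t_out ** 0.5)
        self.col_map = nn.Parameter(torch.randn(t_in, in_features) / t_in ** 0.5)

    def student_weight(self) -> torch.Tensor:
        # Map teacher weight (t_out x t_in) to student shape (out_features x in_features).
        return self.row_map @ self.teacher_weight @ self.col_map

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x @ self.student_weight().t()

# Toy usage: compress a 768x768 teacher projection into a 384x384 student projection;
# only the mapping matrices are trainable, the teacher weight stays frozen.
teacher = nn.Linear(768, 768)
layer = InheritedLinear(teacher.weight.detach(), out_features=384, in_features=384)
x_student = torch.randn(4, 384)
print(layer(x_student).shape)  # torch.Size([4, 384])
```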
