
Awesome-LLM-paper

Awesome License: MIT Made With Love

This repository contains papers related to all kinds of LLMs.

We hope this collection helps researchers advance their excellent work.


Contents

- Resources
  - Workshops and Tutorials
- Papers
  - Survey
  - Benchmark and Evaluation
  - RAG
  - Embedding
  - LLM
  - Fine-tuning
  - Prompt/Context
  - Agent
  - MMLM
  - Reinforcement Learning
- Contributors
- Star History


Resources

Workshops and Tutorials

| Theme | Source | Link | Other |
| :--- | :--- | :--- | :--- |
| …… | …… | …… | …… |

> Descriptions: ……

Papers

Survey

| Paper | Source | Link | Other |
| :--- | :--- | :--- | :--- |
| Retrieval-Augmented Generation for Large Language Models: A Survey | Arxiv2023'Tongji University | …… | …… |
| A Survey on Multimodal Large Language Models for Autonomous Driving | WACV2023'Purdue University (arXiv:2311.12320) | Bilibili: MLM for Autonomous Driving Survey | Github: MLM for Autonomous Driving Survey |

> **Retrieval-Augmented Generation for Large Language Models: A Survey** — A comprehensive overview of how retrieval mechanisms are integrated with the generative process of large language models to enhance their performance and knowledge capabilities.
>
> **A Survey on Multimodal Large Language Models for Autonomous Driving** — A systematic review of multimodal large language models for autonomous driving, covering existing models and datasets and discussing the key challenges of applying them in driving systems.

Benchmark and Evaluation

| Paper | Source | Link | Other |
| :--- | :--- | :--- | :--- |
| …… | …… | …… | …… |

> Descriptions: ……

RAG

| Paper | Source | Link | Other |
| :--- | :--- | :--- | :--- |
| Improving Text Embeddings with Large Language Models | Arxiv2024'Microsoft | …… | …… |
| ARES: An Automated Evaluation Framework for Retrieval-Augmented Generation Systems | NAACL 2024 | …… | Code: stanford-futuredata/ARES |

> **ARES** — ARES, an Automated RAG Evaluation System, efficiently evaluates retrieval-augmented generation systems across multiple tasks using synthetic data and minimal human annotations, maintaining accuracy even with domain shifts.

Embedding

| Paper | Source | Link | Other |
| :--- | :--- | :--- | :--- |
| C-Pack: Packaged Resources To Advance General Chinese Embedding | Arxiv2023'BAAI | Bilibili: C-Pack | Github: C-Pack |

> **C-Pack** — BAAI and Hugging Face introduce C-Pack, a package of resources for general Chinese embeddings that significantly outperforms existing models; it includes comprehensive benchmarks, a massive training dataset, and a family of embedding models (see the usage sketch below).
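As a quick illustration, here is a minimal sketch of encoding Chinese sentences with one of the BGE embedding models released alongside C-Pack, via the `sentence-transformers` library. The checkpoint name `BAAI/bge-large-zh-v1.5` is an assumption; substitute whichever BGE checkpoint you use.

```python
# Minimal sketch: encode Chinese sentences with a C-Pack (BGE) embedding model.
# Assumes `pip install sentence-transformers` and Hugging Face Hub access;
# the checkpoint name below is an assumption, not prescribed by the paper.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-large-zh-v1.5")
sentences = ["这是一个样例句子", "这是另一个样例句子"]

# normalize_embeddings=True makes the dot product equal cosine similarity
embeddings = model.encode(sentences, normalize_embeddings=True)
print(float(embeddings[0] @ embeddings[1]))  # cosine similarity of the two sentences
```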

LLM

| Paper | Source | Link | Other |
| :--- | :--- | :--- | :--- |
| Llama 2: Open Foundation and Fine-Tuned Chat Models | Arxiv2023'Meta | bilibili | Github: Llama |
| Higher Layers Need More LoRA Experts | Arxiv2024'Northwestern University | …… | …… |
| LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression | Arxiv2023'Microsoft | …… | …… |
| Can AI Assistants Know What They Don't Know? | Arxiv2024'Fudan University | …… | Code: Say-I-Dont-Know |
| Code Llama: Open Foundation Models for Code | Arxiv2023'Meta AI | bilibili | codellama |
| Are Emergent Abilities of Large Language Models a Mirage? | NIPS2023'Stanford University | bilibili | …… |
| Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models | Arxiv2023'MEGVII Technology | …… | VaryBase |
| Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks | Arxiv2019'UKP Lab | bilibili | sentence-transformers |

> **Llama 2: Open Foundation and Fine-Tuned Chat Models** — The technical report of Llama 2 from Meta, one of the leading organizations in the open-source LLM community. Its main contribution is a range of pretrained and fine-tuned large language models that not only outperform existing open-source chat models on various benchmarks but are also optimized for dialogue scenarios. These models perform well in human evaluations of helpfulness and safety, making them potential substitutes for closed-source models. The report also describes the fine-tuning process and safety enhancements in detail, aiming to foster further development by the community and contribute to the responsible development of large language models.
>
> **Higher Layers Need More LoRA Experts** — In deep models, higher layers require more LoRA (Low-Rank Adaptation) experts to enhance the model's expressive power and adaptability.
>
> **LongLLMLingua** — Compressing prompts is an effective way to both accelerate and enhance the performance of large language models in long-context scenarios.
>
> **Can AI Assistants Know What They Don't Know?** — Explores whether AI assistants can recognize what they do not know; the authors build an "I don't know" dataset to teach this ability, resulting in fewer false answers and higher accuracy.
>
> **Code Llama: Open Foundation Models for Code** — Introduces Code Llama, a family of large language models for code developed by Meta AI and built on Llama 2, offering state-of-the-art performance among open models, support for large input contexts, and zero-shot instruction following for programming tasks.
>
> **Are Emergent Abilities of Large Language Models a Mirage?** — Challenges the notion that large language models exhibit "emergent abilities," suggesting that these abilities may be an artifact of the metrics chosen by researchers rather than inherent properties of the models themselves. Through mathematical modeling, empirical testing, and meta-analysis, the authors demonstrate that alternative metrics or improved statistical methods can eliminate the perception of emergent abilities, casting doubt on their existence as a fundamental aspect of scaling AI models (see the worked example below).
>
> **Vary** — Introduces Vary, a method for expanding the vision vocabulary of Large Vision-Language Models (LVLMs) to enhance dense, fine-grained visual perception for specific visual tasks such as document-level OCR and chart understanding.
>
> **Sentence-BERT** — Introduces Sentence-BERT (SBERT), a modification of the BERT network that employs siamese and triplet network structures to produce semantically meaningful sentence embeddings that can be compared using cosine similarity, significantly improving the efficiency of sentence-similarity search and clustering (see the usage sketch below).
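To make the metric argument of "Are Emergent Abilities of Large Language Models a Mirage?" concrete, here is a small self-contained demo with illustrative numbers (not taken from the paper): if per-token accuracy improves smoothly with scale, an all-or-nothing metric such as exact match over a long answer still appears to jump suddenly.

```python
# Illustrative demo of the paper's core argument: a smooth per-token improvement
# looks "emergent" under a harsh all-or-nothing metric. Numbers are made up.
per_token_acc = [0.80, 0.85, 0.90, 0.95, 0.99]  # smooth improvement with scale
answer_len = 30                                  # exact match needs all 30 tokens

for p in per_token_acc:
    exact_match = p ** answer_len  # probability that every token is correct
    print(f"per-token acc {p:.2f} -> exact-match acc {exact_match:.4f}")

# per-token acc 0.80 -> exact-match acc 0.0012
# per-token acc 0.95 -> exact-match acc 0.2146
# per-token acc 0.99 -> exact-match acc 0.7397
# The linear metric improves smoothly; the nonlinear metric appears to "jump" --
# the discontinuity comes from the metric, not from the model.
```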

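And here is a minimal sketch of the Sentence-BERT workflow using the `sentence-transformers` library that grew out of the paper. The checkpoint name `all-MiniLM-L6-v2` is an assumption; any SBERT-style model works.

```python
# Minimal Sentence-BERT sketch: embed sentences once, compare with cosine similarity.
# Assumes `pip install sentence-transformers`; the model name is an assumption.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
sentences = [
    "A man is eating food.",
    "A man is eating a piece of bread.",
    "A cheetah is running behind its prey.",
]
embeddings = model.encode(sentences, convert_to_tensor=True)

# Pairwise cosine similarities; semantically related sentences score much higher.
scores = util.cos_sim(embeddings, embeddings)
print(scores)
```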
Fine-tuning

| Paper | Source | Link | Other |
| :--- | :--- | :--- | :--- |
| Towards a Unified View of Parameter-Efficient Transfer Learning | ICLR2022'Carnegie Mellon University | …… | unify-parameter-efficient-tuning |
| QLoRA: Efficient Finetuning of Quantized LLMs | NeurIPS2023'University of Washington | bilibili | Github: QLoRA |
| Prefix-Tuning: Optimizing Continuous Prompts for Generation | Arxiv2021'Stanford University | bilibili | ... |

> **Towards a Unified View of Parameter-Efficient Transfer Learning** — Presents a unified framework for understanding and improving parameter-efficient transfer learning methods by viewing them all as modifications of specific hidden states in pre-trained models. It defines a set of design dimensions that differentiate the methods, and experimentally shows that the framework both identifies the important design choices in previous methods and enables new parameter-efficient tuning methods that are more effective with fewer parameters.
>
> **QLoRA: Efficient Finetuning of Quantized LLMs** — Introduces QLoRA, a fine-tuning method for LLMs that significantly reduces memory usage (see the configuration sketch below). QLoRA achieves this by:
> - using a new data type called 4-bit NormalFloat (NF4) for weights, which is efficient for storing normally distributed weight values;
> - applying "double quantization" to compress the quantization constants themselves;
> - employing "paged optimizers" to manage memory spikes during training.
>
> These innovations allow QLoRA to fine-tune large models (e.g., 65B parameters) on a single GPU with limited memory (48 GB). The resulting models achieve state-of-the-art performance on chatbot benchmarks, in some cases surpassing earlier models such as ChatGPT.
>
> **Prefix-Tuning: Optimizing Continuous Prompts for Generation** — Introduces prefix-tuning, a lightweight alternative to fine-tuning for natural language generation tasks. Unlike fine-tuning, which modifies all language model parameters, prefix-tuning keeps them frozen and optimizes a small continuous task-specific vector (called the prefix). This makes prefix-tuning far more efficient than full fine-tuning, especially in low-data settings (see the configuration sketch below).
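As a concrete illustration, here is a minimal sketch of loading a model with QLoRA-style 4-bit NF4 quantization plus double quantization via Hugging Face `transformers` and `bitsandbytes`. The model name is a placeholder assumption, and LoRA adapters would still be added on top (e.g., with `peft`).

```python
# Minimal QLoRA-style loading sketch: NF4 4-bit weights + double quantization.
# Assumes `pip install transformers bitsandbytes accelerate`; the model name is
# a placeholder assumption. Paged optimizers are selected in the trainer
# (e.g., the "paged_adamw_32bit" optimizer name in transformers/TRL trainers).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store base weights in 4 bits
    bnb_4bit_quant_type="nf4",              # 4-bit NormalFloat data type
    bnb_4bit_use_double_quant=True,         # quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,  # dequantize to bf16 for matmuls
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",             # placeholder; any causal LM works
    quantization_config=bnb_config,
    device_map="auto",
)
```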

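Similarly, a minimal sketch of prefix-tuning with the Hugging Face `peft` library, which keeps the base model frozen and trains only the prefix vectors. The base model name and `num_virtual_tokens=20` are assumptions.

```python
# Minimal prefix-tuning sketch with peft: freeze the base model, train only a
# small continuous prefix. Model name and prefix length are assumptions.
from peft import PrefixTuningConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder base model
config = PrefixTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    num_virtual_tokens=20,  # length of the trainable prefix
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # only a tiny fraction of params is trainable
```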
Prompt/Context

| Paper | Source | Link | Other |
| :--- | :--- | :--- | :--- |
| Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning | EMNLP2023'Peking University | bilibili | Github: ICL |

> **Label Words are Anchors** — Sheds light on the inner workings of in-context learning (ICL) in LLMs. While ICL has shown promise in enabling LLMs to perform various tasks through demonstrations, the mechanism behind this learning has been unclear. Investigating it through the lens of information flow, the authors find that the label words in the demonstrations act as anchors, serving two key functions: 1) during initial processing, semantic information accumulates within the representations of these label words; 2) this consolidated information then acts as a reference point for the model's final predictions. Based on these findings, the paper introduces three contributions: 1) an anchor re-weighting method to enhance ICL performance, 2) a demonstration compression technique to improve inference efficiency, and 3) an analysis framework to diagnose ICL errors in GPT2-XL. The effectiveness of these contributions validates the proposed mechanism and paves the way for future research in ICL (see the probe sketch below).
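To make the "anchor" idea tangible, here is a small probe sketch (not the paper's actual analysis code) that measures how strongly the final position attends to the label words of a sentiment ICL prompt, using GPT-2 from `transformers`. It assumes " positive"/" negative" each tokenize to a single GPT-2 token; adapt the matching if they do not.

```python
# Small illustrative probe: per layer, how much does the final position attend
# to the label words in an in-context-learning prompt?
# Assumes `pip install torch transformers`.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = (
    "Review: great movie! Sentiment: positive\n"
    "Review: boring plot. Sentiment: negative\n"
    "Review: loved every minute. Sentiment:"
)
ids = tok(prompt, return_tensors="pt")
with torch.no_grad():
    out = model(**ids, output_attentions=True)

# positions of the label words ("positive"/"negative") in the prompt
label_pos = [
    i for i, t in enumerate(ids["input_ids"][0].tolist())
    if tok.decode([t]).strip() in ("positive", "negative")
]
for layer, att in enumerate(out.attentions):  # att: (batch, heads, seq, seq)
    w = att[0, :, -1, label_pos].mean().item()
    print(f"layer {layer:2d}: mean attention from last token to label words = {w:.3f}")
```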

Agent

| Paper | Source | Link | Other |
| :--- | :--- | :--- | :--- |
| …… | …… | …… | …… |

> Descriptions: ……

MMLM

| Paper | Source | Link | Other |
| :--- | :--- | :--- | :--- |
| …… | …… | …… | …… |

> Descriptions: ……

Reinforcement Learning

| Paper | Source | Link | Other |
| :--- | :--- | :--- | :--- |
| Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning | ICLR2023 | …… | …… |

> **Diffusion Policies** — Uses diffusion models as a highly expressive policy class for offline reinforcement learning, improving learning efficiency and decision-making performance (see the sampling sketch below).
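For intuition, here is a minimal sketch of how a diffusion policy produces an action: starting from Gaussian noise, a learned noise-prediction network conditioned on the state iteratively denoises into an action. The network below is an untrained placeholder, and the schedule and dimensions are illustrative assumptions, not the paper's exact algorithm.

```python
# Minimal diffusion-policy sampling sketch (DDPM-style): an action is generated
# by iteratively denoising Gaussian noise, conditioned on the state. The
# noise-prediction net is an untrained placeholder; in the paper it would be
# trained on the offline dataset with a Q-learning-guided objective.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, T = 17, 6, 50

class EpsNet(nn.Module):
    """Predicts the noise in a_t given (state, a_t, timestep)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM + 1, 256), nn.ReLU(),
            nn.Linear(256, ACTION_DIM),
        )

    def forward(self, state, a_t, t):
        t_feat = torch.full((a_t.shape[0], 1), t / T)  # normalized timestep
        return self.net(torch.cat([state, a_t, t_feat], dim=-1))

betas = torch.linspace(1e-4, 0.02, T)  # noise schedule (assumed)
alphas = 1.0 - betas
alpha_bar = torch.cumprod(alphas, dim=0)

@torch.no_grad()
def sample_action(eps_net, state):
    a = torch.randn(state.shape[0], ACTION_DIM)  # a_T ~ N(0, I)
    for t in reversed(range(T)):                 # reverse diffusion steps
        eps = eps_net(state, a, t)
        a = (a - betas[t] / torch.sqrt(1 - alpha_bar[t]) * eps) / torch.sqrt(alphas[t])
        if t > 0:
            a = a + torch.sqrt(betas[t]) * torch.randn_like(a)
    return a

eps_net = EpsNet()
state = torch.randn(1, STATE_DIM)
print(sample_action(eps_net, state))  # one sampled action for this state
```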

🌟 Contributors

Star History

Star History Chart

