This repository collects papers related to all kinds of LLMs. We hope it helps researchers advance their excellent work.

| Theme | Source | Link | Other |
| --- | --- | --- | --- |
| …… | …… | …… | …… |
| Descriptions | …… |

| Paper | Source | Link | Other |
| --- | --- | --- | --- |
| …… | …… | …… | …… |
| Descriptions | …… |

| Paper | Source | Link | Other |
| --- | --- | --- | --- |
| Llama 2: Open Foundation and Fine-Tuned Chat Models | Arxiv2023'Meta | bilibili | Github: Llama |
| Descriptions | The technical report of Llama 2 from Meta, one of the top leaders of the open-source LLM community. The greatest contribution of Llama 2 is the development of a range of pretrained and fine-tuned large language models (LLMs) that not only outperform existing open-source chat models on various benchmarks but are also optimized for dialogue scenarios. Additionally, these models show excellent performance in human evaluations of helpfulness and safety, potentially serving as effective substitutes for closed-source models. The Llama 2 project also provides a detailed description of the fine-tuning process and safety enhancements, aimed at fostering further development by the community and contributing to the responsible development of large language models. |
| Higher Layers Need More LoRA Experts | Arxiv2024'Northwestern University | …… | …… |
| Descriptions | The paper studies how to allocate LoRA (Low-Rank Adaptation) experts across Transformer layers and finds that, under a fixed expert budget, assigning more experts to higher layers and fewer to lower layers improves the model's expressive power and adaptability compared with a uniform allocation. |
| LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression | Arxiv2023'Microsoft | …… | …… |
| Descriptions | LongLLMLingua accelerates and improves large language models (LLMs) in long-context scenarios through prompt compression: it removes low-information content from long prompts in a question-aware way, cutting inference cost and latency while preserving, and in some settings improving, answer quality. |
| Can AI Assistants Know What They Don't Know? | Arxiv2024'Fudan University | …… | Code: Say-I-Dont-Know |
| Descriptions | The paper explores whether AI assistants can recognize what they do not know. The authors construct an "I don't know" dataset and align the assistant with it, which reduces false answers and increases the accuracy of the answers it does give. |
| Code Llama: Open Foundation Models for Code | Arxiv2023'Meta AI | bilibili | codellama |
| Descriptions | The article introduces Code Llama, a family of large language models for code developed by Meta AI and built on Llama 2, designed to offer state-of-the-art performance among open models, support large input contexts, and follow instructions zero-shot for programming tasks. |
| Are Emergent Abilities of Large Language Models a Mirage? | NeurIPS2023'Stanford University | bilibili | …… |
| Descriptions | The article challenges the notion that large language models (LLMs) exhibit "emergent abilities," suggesting that these abilities may be an artifact of the metrics chosen by researchers rather than inherent properties of the models themselves. Through mathematical modeling, empirical testing, and meta-analysis, the authors demonstrate that alternative metrics or improved statistical methods can eliminate the perception of emergent abilities, casting doubt on their existence as a fundamental aspect of scaling AI models. A small numerical illustration of this metric argument appears below the table. |
| Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models | Arxiv2023'MEGVII Technology | …… | VaryBase |
| Descriptions | The article introduces Vary, a method for expanding the visual vocabulary of Large Vision-Language Models (LVLMs) to enhance dense and fine-grained visual perception capabilities for specific visual tasks, such as document-level OCR or chart understanding. |
| Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks | Arxiv2019'UKP Lab | bilibili | sentence-transformers |
| Descriptions | The paper introduces Sentence-BERT (SBERT), a modification of the BERT network that employs siamese and triplet network structures to produce semantically meaningful sentence embeddings that can be compared using cosine similarity, thereby significantly enhancing the efficiency of sentence similarity search and clustering tasks. A minimal usage sketch appears below the table. |
| Towards a Unified View of Parameter-Efficient Transfer Learning | ICLR2022'Carnegie Mellon University | …… | unify-parameter-efficient-tuning |
| Descriptions | This paper presents a unified framework for understanding and improving various parameter-efficient transfer learning methods by casting them as modifications of specific hidden states in pre-trained models. It defines a set of design dimensions that differentiate the methods, and its experiments show that the framework identifies important design choices in previous methods and can instantiate new parameter-efficient tuning methods that are more effective while tuning fewer parameters. A short sketch of the unified form appears below the table. |
| QLoRA: Efficient Finetuning of Quantized LLMs | NeurIPS2023'University of Washington | bilibili | Github: QLoRA |
| Descriptions | This paper introduces QLoRA, a method for fine-tuning LLMs that significantly reduces memory usage. QLoRA achieves this by: (1) using a new 4-bit NormalFloat (NF4) data type for the weights, which is efficient for storing normally distributed weight values; (2) applying "double quantization" to compress the quantization constants themselves; and (3) employing "paged optimizers" to manage memory spikes during training. These innovations allow QLoRA to fine-tune large models (e.g., 65B parameters) on a single 48GB GPU, and the resulting models achieve state-of-the-art results among open models on chatbot benchmarks, approaching the performance of ChatGPT. A configuration sketch appears below the table. |
| Prefix-Tuning: Optimizing Continuous Prompts for Generation | Arxiv2021'Stanford University | bilibili | …… |
| Descriptions | This paper introduces prefix-tuning, a lightweight alternative to fine-tuning for natural language generation tasks. Unlike fine-tuning, which modifies all language model parameters, prefix-tuning keeps them frozen and optimizes only a small continuous task-specific vector (called the prefix). This makes prefix-tuning far more parameter-efficient than full fine-tuning, and it performs especially well in low-data settings. A minimal usage sketch appears below the table. |
| Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning | EMNLP2023'Peking University | bilibili | Github: ICL |
| Descriptions | This paper sheds light on the inner workings of in-context learning (ICL) in LLMs. While ICL has shown promise in enabling LLMs to perform various tasks through demonstrations, the mechanism behind this learning has been unclear. The authors investigate it through the lens of information flow and find that the label words in the demonstrations act as anchors with two key functions: (1) during initial processing, semantic information accumulates in the representations of these label words; (2) this consolidated information then serves as a reference point for the model's final prediction. Based on these findings, the paper introduces three contributions: an anchor re-weighting method to improve ICL performance, a demonstration compression technique to improve inference efficiency, and an analysis framework for diagnosing ICL errors in GPT2-XL. The effectiveness of these contributions validates the proposed mechanism and paves the way for future research on ICL. A toy attention probe illustrating the idea appears below the table. |
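
The sketches below expand on several entries in the table above. First, for *Are Emergent Abilities of Large Language Models a Mirage?*: a toy calculation (illustrative numbers only, not taken from the paper) showing how a smooth per-token improvement can look like a sudden jump under a discontinuous exact-match metric.

```python
# Illustrative numbers only: per-token accuracy improves smoothly with model scale,
# but exact match over a 10-token answer (p ** 10) appears to "emerge" abruptly.
per_token_accuracy = [0.80, 0.85, 0.90, 0.95, 0.98]  # smooth improvement with scale
answer_length = 10

for p in per_token_accuracy:
    exact_match = p ** answer_length  # all 10 tokens must be correct at once
    print(f"per-token accuracy {p:.2f} -> exact-match accuracy {exact_match:.3f}")
# 0.80 -> 0.107, 0.85 -> 0.197, 0.90 -> 0.349, 0.95 -> 0.599, 0.98 -> 0.817
```

Measured per token, the improvement is gradual; measured as exact match, the same models appear to gain the ability suddenly, which is the paper's core argument.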
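
For *Sentence-BERT*, a minimal usage sketch with the sentence-transformers library released alongside the paper; the model name and the sentences are illustrative.

```python
from sentence_transformers import SentenceTransformer, util

# Any SBERT-style checkpoint works here; "all-MiniLM-L6-v2" is just an example.
model = SentenceTransformer("all-MiniLM-L6-v2")
sentences = [
    "A man is playing a guitar.",
    "Someone is performing music on a guitar.",
    "The stock market fell sharply today.",
]

embeddings = model.encode(sentences, convert_to_tensor=True)  # one fixed-size vector per sentence
scores = util.cos_sim(embeddings, embeddings)                 # pairwise cosine similarity
print(scores)  # the first two sentences should score much higher with each other than with the third
```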
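
For *Towards a Unified View of Parameter-Efficient Transfer Learning*, a short PyTorch sketch of the unified form the paper analyzes: many PEFT methods can be read as adding a learned bottleneck modification to a hidden state, differing mainly in where the input is taken from and how the result is composed. This module is a generic illustration, not the paper's code.

```python
import torch
import torch.nn as nn

class ModificationBranch(nn.Module):
    """Generic bottleneck branch: delta_h = scale * f(h_in @ W_down) @ W_up.
    With f = identity this is LoRA-like; with a nonlinearity it is adapter-like."""
    def __init__(self, d_model: int, bottleneck: int, scale: float = 1.0):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck, bias=False)  # W_down
        self.up = nn.Linear(bottleneck, d_model, bias=False)    # W_up
        self.act = nn.ReLU()
        self.scale = scale

    def forward(self, h_in: torch.Tensor) -> torch.Tensor:
        return self.scale * self.up(self.act(self.down(h_in)))

d_model, bottleneck = 768, 16
branch = ModificationBranch(d_model, bottleneck)
h = torch.randn(2, 10, d_model)  # (batch, seq, hidden) from a frozen pre-trained layer
h = h + branch(h)                # parallel insertion: h <- h + delta_h
```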
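
For *QLoRA*, a minimal sketch of how the three ingredients are typically wired together with the Hugging Face transformers / peft / bitsandbytes stack; the base model name and all hyperparameters are illustrative, not the paper's exact recipe.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize the frozen base weights to 4 bit
    bnb_4bit_quant_type="nf4",              # 4-bit NormalFloat data type
    bnb_4bit_use_double_quant=True,         # double quantization of the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute dtype for the dequantized matmuls
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",             # illustrative base model
    quantization_config=bnb_config,
    device_map="auto",
)

lora_config = LoraConfig(
    r=64, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)  # only the small LoRA adapters are trained
model.print_trainable_parameters()

# Memory spikes during training are handled by a paged optimizer, e.g.
# TrainingArguments(optim="paged_adamw_8bit", ...).
```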
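
For *Prefix-Tuning*, a minimal sketch using the peft library (base model and prefix length are illustrative): the language model stays frozen and only the continuous prefix, a small number of virtual tokens per layer, is optimized.

```python
from transformers import AutoModelForSeq2SeqLM
from peft import PrefixTuningConfig, TaskType, get_peft_model

model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")  # illustrative frozen base model

peft_config = PrefixTuningConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,  # generation tasks such as table-to-text or summarization
    num_virtual_tokens=20,            # length of the learned continuous prefix
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()    # only the prefix parameters are trainable
```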
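
For *Label Words are Anchors*, a toy probe (not the paper's analysis code) that measures how much attention the final position pays to the demonstration label tokens in GPT-2; the sentiment prompt is made up for illustration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2", output_attentions=True)
model.eval()

demos = [("The movie was wonderful.", "positive"),
         ("The plot was a mess.", "negative")]
query = "A touching and beautiful film."

prompt, label_positions = "", []
for text, label in demos:
    prompt += f"Review: {text}\nSentiment:"
    start = len(tok(prompt)["input_ids"])          # first token index of the label word
    prompt += f" {label}\n"
    end = len(tok(prompt)["input_ids"])
    label_positions.extend(range(start, end - 1))  # exclude the trailing newline token
prompt += f"Review: {query}\nSentiment:"

inputs = tok(prompt, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)

last = inputs["input_ids"].shape[1] - 1            # position that predicts the query's label
for layer, attn in enumerate(out.attentions):      # each attn: (batch, heads, seq, seq)
    head_avg = attn[0].mean(dim=0)                 # (seq, seq), averaged over heads
    mass = head_avg[last, label_positions].sum().item()
    print(f"layer {layer:2d}: attention from final position to label tokens = {mass:.3f}")
```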

| Paper | Source | Link | Other |
| --- | --- | --- | --- |
| …… | …… | …… | …… |
| Descriptions | …… |

| Paper | Source | Link | Other |
| --- | --- | --- | --- |
| …… | …… | …… | …… |
| Descriptions | …… |