⚠️ This repository is no longer maintained. Check out our survey paper on efficient LLMs and the corresponding paper list.

Paper-list-resource-efficient-large-language-model

Target venues: systems conferences (OSDI/SOSP/ATC/EuroSys), architecture conferences (ISCA/MICRO/ASPLOS/HPCA), networking conferences (NSDI/SIGCOMM), mobile conferences (MobiCom/MobiSys/SenSys/UbiComp), and AI conferences (NeurIPS/ACL/ICLR/ICML).

We will keep maintaining this list :)

Note: We currently focus only on inference. We plan to include training work in the future.

Entry format: [Conference'Year] Title, First-author affiliation

Model

[ICLR'23] GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers, IST Austria
[ICLR'23] Token Merging: Your ViT But Faster, Georgia Tech
[ICLR'23] Efficient Attention via Control Variates, University of Hong Kong
[ICLR'23] HiViT: A Simpler and More Efficient Design of Hierarchical Vision Transformer, University of Chinese Academy of Sciences
[ICLR'23] Knowledge-in-Context: Towards Knowledgeable Semi-Parametric Language Models, Tencent AI Lab

[MLSys'23] Practical Edge Kernels for Integer-Only Vision Transformers Under Post-training Quantization, National University of Singapore

[ACL'22] AraT5: Text-to-Text Transformers for Arabic Language Generation, The University of British Columbia
[ACL'22] ClusterFormer: Neural Clustering Attention for Efficient and Effective Transformer, Tianjin University
[ACL'22] ∞-former: Infinite Memory Transformer, Instituto de Telecomunicações
[ACL'22] LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding, South China University of Technology
[ACL'22] PLANET: Dynamic Content Planning in Autoregressive Transformers for Long-form Text Generation, Baidu Inc

[ICLR'22] Memorizing Transformers, Google
[ICLR'22] Understanding the Role of Self Attention for Efficient Speech Recognition, Seoul National University

[NeurIPS'22] Confident Adaptive Language Modeling, Google Research
[NeurIPS'22] Transcormer: Transformer for Sentence Scoring with Sliding Language Modeling, Microsoft Research Asia
[NeurIPS'22] Large Language Models are Zero-Shot Reasoners, The University of Tokyo
[NeurIPS'22] Training language models to follow instructions with human feedback, OpenAI

[ACL'21] RealFormer: Transformer Likes Residual Attention, Google Research

[NeurIPS'21] Shapeshifter: a Parameter-efficient Transformer using Factorized Reshaped Matrices, Virginia Commonwealth University
[NeurIPS'21] Systematic Generalization with Edge Transformers, University of California
[NeurIPS'21] NxMTransformer: Semi-Structured Sparsification for Natural Language Understanding via ADMM, Colorado School of Mines
[NeurIPS'21] Redesigning the Transformer Architecture with Insights from Multi-particle Dynamical Systems, Jadavpur University
[NeurIPS'21] Fast Multi-Resolution Transformer Fine-tuning for Extreme Multi-label Text Classification, Amazon
[NeurIPS'21] Sparse is Enough in Scaling Transformers, Google Research
[NeurIPS'21] Neural Rule-Execution Tracking Machine For Transformer-Based Text Generation, Macquarie University
[NeurIPS'21] Long-Short Transformer: Efficient Transformers for Language and Vision, University of Maryland
[NeurIPS'21] Combiner: Full Attention Transformer with Sparse Computation Cost, Stanford University
[NeurIPS'21] FMMformer: Efficient and Flexible Transformer via Decomposed Near-field and Far-field Attention, University of California
[NeurIPS'21] Searching for Efficient Transformers for Language Modeling, Google Research

[SenSys'21] LIMU-BERT: Unleashing the Potential of Unlabeled Data for IMU Sensing Applications, Nanyang Technological University

[NeurIPS'20] Deep Transformers with Latent Depth, Facebook AI Research
[NeurIPS'20] Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing, Carnegie Mellon University
[NeurIPS'20] MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers, Microsoft Research
[NeurIPS'20] Big Bird: Transformers for Longer Sequences, Google Research
[NeurIPS'20] Fast Transformers with Clustered Attention, Idiap Research Institute, Switzerland

[NeurIPS'19] Levenshtein Transformer, Facebook AI Research
[NeurIPS'19] Novel positional encodings to enable tree-based transformers, Microsoft Research
[NeurIPS'19] A Tensorized Transformer for Language Modeling, Tianjin University

[ICLR'18] Non-Autoregressive Neural Machine Translation, University of Hong Kong

Input

[UbiComp'22] IF-ConvTransformer: A Framework for Human Activity Recognition Using IMU Fusion and ConvTransformer, National University of Defense Technology

Training Algorithm

[MobiCom'23] Efficient Federated Learning for Modern NLP, Beijing University of Posts and Telecommunications
[MobiCom'23] Federated Few-shot Learning for Mobile NLP, Beijing University of Posts and Telecommunications
[ICLR'23] Not All Tasks Are Born Equal: Understanding Zero-Shot Generalization, Tsinghua University
[ICLR'23] Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers, Hong Kong University of Science and Technology
[ATC'23] Accelerating Distributed MoE Training and Inference with Lina, City University of Hong Kong
[ATC'23] SmartMoE: Efficiently Training Sparsely-Activated Models through Combining Offline and Online Parallelization, Tsinghua University

[ICLR'22] Towards a Unified View of Parameter-Efficient Transfer Learning, Carnegie Mellon University

[NeurIPS'22] AD-DROP: Attribution-Driven Dropout for Robust Language Model Fine-Tuning, Sun Yat-sen University
[NeurIPS'22] A Win-win Deal: Towards Sparse and Robust Pre-trained Language Models, Chinese Academy of Sciences
[NeurIPS'22] Fine-Tuning Pre-Trained Language Models Effectively by Optimizing Subnetworks Adaptively, Peking University

[NeurIPS'20] Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping, Microsoft Corporation

[NeurIPS'19] Ouroboros: On Accelerating Training of Transformer-Based Language Models, Duke University

Inference Engine

[ASPLOS'23] FLAT: An Optimized Dataflow for Mitigating Attention Bottlenecks, Georgia Institute of Technology
[ISCA'23] OliVe: Accelerating Large Language Models via Hardware-friendly Outlier-Victim Pair Quantization, SJTU
[ISCA'23] FACT: FFN-Attention Co-optimized Transformer Architecture with Eager Correlation Prediction, THU

[EuroSys'23] Tabi: An Efficient Multi-Level Inference System for Large Language Models, HKUST
[MLSys'23] Flex: Adaptive Mixture-of-Experts at Scale, Microsoft Research

[MLSys'23] Efficiently Scaling Transformer Inference, Google

[OSDI'22] Orca: A Distributed Serving System for Transformer-Based Generative Models, Seoul National University

[ATC'22] PetS: A Unified Framework for Parameter-Efficient Transformers Serving, Peking University

[NeurIPS'22] Towards Efficient Post-training Quantization of Pre-trained Language Models, Huawei Noah’s Ark Lab
[NeurIPS'22] Solving Quantitative Reasoning Problems with Language Models, Google Research
[NeurIPS'22] Fine-tuning Language Models over Slow Networks using Activation Quantization with Guarantees, ETH Zürich, Switzerland
[NeurIPS'22] Few-shot Task-agnostic Neural Architecture Search for Distilling Large Language Models, NC State University
[NeurIPS'22] Exploring Length Generalization in Large Language Models, Google Research

[ACL'21] MiniLMv2: Multi-Head Self-Attention Relation Distillation for Compressing Pretrained Transformers, Microsoft Research

[ASPLOS'23] Overlap Communication with Dependent Computation via Decomposition in Large Deep Learning Models, Google

[MobiCom'23] LUT-NN: Empower Efficient Neural Network Inference with Centroid Learning and Table Lookup, Microsoft Research Asia

[ACL'23] Distilling Script Knowledge from Large Language Models for Constrained Language Planning, Fudan University
[ACL'23] I2D2: Inductive Knowledge Distillation with NeuroLogic and Self-Imitation, University of Southern California
[ACL'23] Symbolic Chain-of-Thought Distillation: Small Models Can Also “Think” Step-by-Step, University of California
[ACL'23] GKD: A General Knowledge Distillation Framework for Large-scale Pre-trained Language Model, Anhui University

[NeurIPS'23] Can Language Models Teach? Teacher Explanations Improve Student Performance via Theory of Mind, University of North Carolina at Chapel Hill
[NeurIPS'23] Blockwise Parallel Transformer for Large Context Models, UC Berkeley
[NeurIPS'23] LLM-Pruner: On the Structural Pruning of Large Language Models, National University of Singapore
[NeurIPS'23] The Emergence of Essential Sparsity in Large Pre-trained Models: The Weights that Matter, University of Texas at Austin
[NeurIPS'23] Scissorhands: Exploiting the Persistence of Importance Hypothesis for LLM KV Cache Compression at Test Time, Rice University
[NeurIPS'23] Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers, ETH Zürich
[NeurIPS'23] QuIP: 2-Bit Quantization of Large Language Models With Guarantees, Cornell University

Training Engine

[ASPLOS'23] Optimus-CC: Efficient Large NLP Model Training with 3D Parallelism Aware Communication Compression, Yonsei University

[ASPLOS'23] Mobius: Fine Tuning Large-Scale Models on Commodity GPU Servers, THU

[HPCA'23] MPress: Democratizing Billion-Scale Model Training on Multi-GPU Servers via Memory-Saving Inter-Operator Parallelism, USTC

[HPCA'23] OptimStore: In-Storage Optimization of Large Scale DNNs with On-Die Processing, KAIST

[NeurIPS'23] QLoRA: Efficient Finetuning of Quantized LLMs, University of Washington

Compiler

Hardware

Search Engine

[UbiComp'23] ODSearch: Fast and Resource Efficient On-device Natural Language Search for Fitness Trackers' Data, Boston University
