⚠️ This repository is no longer maintained. Check out our survey paper on efficient LLMs and the corresponding paper list.

Paper-list-resource-efficient-large-language-model

Target venues: systems conferences (OSDI/SOSP/ATC/EuroSys), architecture conferences (ISCA/MICRO/ASPLOS/HPCA), networking conferences (NSDI/SIGCOMM), mobile conferences (MobiCom/MobiSys/SenSys/UbiComp), and AI conferences (NeurIPS/ACL/ICLR/ICML).

We will keep maintaining this list :)

Note: We currently focus only on inference. We plan to include training work in the future.

Entry format: [Conference'Year] Title, First-author affiliation

Model

[ICLR'23] GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers, IST Austria
[ICLR'23] Token Merging: Your ViT But Faster, Georgia Tech
[ICLR'23] Efficient Attention via Control Variates, University of Hong Kong
[ICLR'23] HiViT: A Simpler and More Efficient Design of Hierarchical Vision Transformer, University of Chinese Academy of Sciences
[ICLR'23] Knowledge-in-Context: Towards Knowledgeable Semi-Parametric Language Models, Tencent AI Lab

[MLSys'23] Practical Edge Kernels for Integer-Only Vision Transformers Under Post-training Quantization, National University of Singapore

[ACL'22] AraT5: Text-to-Text Transformers for Arabic Language Generation, The University of British Columbia
[ACL'22] ClusterFormer: Neural Clustering Attention for Efficient and Effective Transformer, Tianjin University
[ACL'22] ∞-former: Infinite Memory Transformer, Instituto de Telecomunicações
[ACL'22] LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding, South China University of Technology
[ACL'22] PLANET: Dynamic Content Planning in Autoregressive Transformers for Long-form Text Generation, Baidu Inc

[ICLR'22] Memorizing Transformers, Google
[ICLR'22] Understanding the Role of Self Attention for Efficient Speech Recognition, Seoul National University

[NeurIPS'22] Confident Adaptive Language Modeling, Google Research
[NeurIPS'22] Transcormer: Transformer for Sentence Scoring with Sliding Language Modeling, Microsoft Research Asia
[NeurIPS'22] Large Language Models are Zero-Shot Reasoners, The University of Tokyo
[NeurIPS'22] Training language models to follow instructions with human feedback, OpenAI

[ACL'21] RealFormer: Transformer Likes Residual Attention, Google Research

[NeurIPS'21] Shapeshifter: a Parameter-efficient Transformer using Factorized Reshaped Matrices, Virginia Commonwealth University
[NeurIPS'21] Systematic Generalization with Edge Transformers, University of California
[NeurIPS'21] NxMTransformer: Semi-Structured Sparsification for Natural Language Understanding via ADMM, Colorado School of Mines
[NeurIPS'21] Redesigning the Transformer Architecture with Insights from Multi-particle Dynamical Systems, Jadavpur University
[NeurIPS'21] Fast Multi-Resolution Transformer Fine-tuning for Extreme Multi-label Text Classification, Amazon
[NeurIPS'21] Sparse is Enough in Scaling Transformers, Google Research
[NeurIPS'21] Neural Rule-Execution Tracking Machine For Transformer-Based Text Generation, Macquarie University
[NeurIPS'21] Long-Short Transformer: Efficient Transformers for Language and Vision, University of Maryland
[NeurIPS'21] Combiner: Full Attention Transformer with Sparse Computation Cost, Stanford University
[NeurIPS'21] FMMformer: Efficient and Flexible Transformer via Decomposed Near-field and Far-field Attention, University of California
[NeurIPS'21] Searching for Efficient Transformers for Language Modeling, Google Research

[SenSys'21] LIMU-BERT: Unleashing the Potential of Unlabeled Data for IMU Sensing Applications, Nanyang Technological University

[NeurIPS'20] Deep Transformers with Latent Depth, Facebook AI Research
[NeurIPS'20] Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing, Carnegie Mellon University
[NeurIPS'20] MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers, Microsoft Research
[NeurIPS'20] Big Bird: Transformers for Longer Sequences, Google Research
[NeurIPS'20] Fast Transformers with Clustered Attention, Idiap Research Institute, Switzerland

[NeurIPS'19] Levenshtein Transformer, Facebook AI Research
[NeurIPS'19] Novel positional encodings to enable tree-based transformers, Microsoft Research
[NeurIPS'19] A Tensorized Transformer for Language Modeling, Tianjin University

[ICLR'18] Non-Autoregressive Neural Machine Translation, University of Hong Kong

Input

[UbiComp'22] IF-ConvTransformer: A Framework for Human Activity Recognition Using IMU Fusion and ConvTransformer, National University of Defense Technology

Training Algorithm

[MobiCom'23] Efficient Federated Learning for Modern NLP, Beijing University of Posts and Telecommunications
[MobiCom'23] Federated Few-shot Learning for Mobile NLP, Beijing University of Posts and Telecommunications
[ICLR'23] Not All Tasks Are Born Equal: Understanding Zero-Shot Generalization, Tsinghua University
[ICLR'23] Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers, Hong Kong University of Science and Technology
[ATC'23] Accelerating Distributed MoE Training and Inference with Lina, City University of Hong Kong
[ATC'23] SmartMoE: Efficiently Training Sparsely-Activated Models through Combining Offline and Online Parallelization, Tsinghua University

[ICLR'22] Towards a Unified View of Parameter-Efficient Transfer Learning, Carnegie Mellon University

[NeurIPS'22] AD-DROP: Attribution-Driven Dropout for Robust Language Model Fine-Tuning, Sun Yat-sen University
[NeurIPS'22] A Win-win Deal: Towards Sparse and Robust Pre-trained Language Models, Chinese Academy of Sciences
[NeurIPS'22] Fine-Tuning Pre-Trained Language Models Effectively by Optimizing Subnetworks Adaptively, Peking University

[NeurIPS'20] Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping, Microsoft Corporation

[NeurIPS'19] Ouroboros: On Accelerating Training of Transformer-Based Language Models, Duke University

Inference Engine

[ASPLOS'23] FLAT: An Optimized Dataflow for Mitigating Attention Bottlenecks, Georgia Institute of Technology
[ISCA'23] OliVe: Accelerating Large Language Models via Hardware-friendly Outlier-Victim Pair Quantization, SJTU
[ISCA'23] FACT: FFN-Attention Co-optimized Transformer Architecture with Eager Correlation Prediction, THU

[EuroSys'23] Tabi: An Efficient Multi-Level Inference System for Large Language Models, HKUST
[MLSys'23] Flex: Adaptive Mixture-of-Experts at Scale, Microsoft Research

[MLSys'23] Efficiently Scaling Transformer Inference, Google

[OSDI'22] Orca: A Distributed Serving System for Transformer-Based Generative Models, Seoul National University

[ATC'22] PetS: A Unified Framework for Parameter-Efficient Transformers Serving, Peking University

[NeurIPS'22] Towards Efficient Post-training Quantization of Pre-trained Language Models, Huawei Noah’s Ark Lab
[NeurIPS'22] Solving Quantitative Reasoning Problems with Language Models, Google Research
[NeurIPS'22] Fine-tuning Language Models over Slow Networks using Activation Quantization with Guarantees, ETH Zürich, Switzerland
[NeurIPS'22] Few-shot Task-agnostic Neural Architecture Search for Distilling Large Language Models, NC State University
[NeurIPS'22] Exploring Length Generalization in Large Language Models, Google Research

[ACL'21] MiniLMv2: Multi-Head Self-Attention Relation Distillation for Compressing Pretrained Transformers, Microsoft Research

[ASPLOS'23] Overlap Communication with Dependent Computation via Decomposition in Large Deep Learning Models, Google

[MobiCom'23] LUT-NN: Empower Efficient Neural Network Inference with Centroid Learning and Table Lookup, Microsoft Research Asia

[ACL'23] Distilling Script Knowledge from Large Language Models for Constrained Language Planning, Fudan University
[ACL'23] I2D2: Inductive Knowledge Distillation with NeuroLogic and Self-Imitation, University of Southern California
[ACL'23] Symbolic Chain-of-Thought Distillation: Small Models Can Also “Think” Step-by-Step, University of California
[ACL'23] GKD: A General Knowledge Distillation Framework for Large-scale Pre-trained Language Model, Anhui University

[NeurIPS'23] Can Language Models Teach? Teacher Explanations Improve Student Performance via Theory of Mind, University of North Carolina at Chapel Hill
[NeurIPS'23] Blockwise Parallel Transformer for Large Context Models, UC Berkeley
[NeurIPS'23] LLM-Pruner: On the Structural Pruning of Large Language Models, National University of Singapore
[NeurIPS'23] The Emergence of Essential Sparsity in Large Pre-trained Models: The Weights that Matter, University of Texas at Austin
[NeurIPS'23] Scissorhands: Exploiting the Persistence of Importance Hypothesis for LLM KV Cache Compression at Test Time, Rice University
[NeurIPS'23] Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers, ETH Zürich
[NeurIPS'23] QuIP: 2-Bit Quantization of Large Language Models With Guarantees, Cornell University

Training Engine

[ASPLOS'23] Optimus-CC: Efficient Large NLP Model Training with 3D Parallelism Aware Communication Compression, Yonsei University

[ASPLOS'23] Mobius: Fine Tuning Large-Scale Models on Commodity GPU Servers, THU

[HPCA'23] MPress: Democratizing Billion-Scale Model Training on Multi-GPU Servers via Memory-Saving Inter-Operator Parallelism, USTC

[HPCA'23] OptimStore: In-Storage Optimization of Large Scale DNNs with On-Die Processing, KAIST

[NeurIPS'23] QLoRA: Efficient Finetuning of Quantized LLMs, University of Washington

Compiler

Hardware

Search Engine

[UbiComp'23] ODSearch: Fast and Resource Efficient On-device Natural Language Search for Fitness Trackers' Data, Boston University
