# awesome-fast-attention

A curated list of efficient attention modules (last update: Fri, 31 Jul 2020 16:04:07 +0000)

## Table of Contents

- Efficient Attention
- Articles

## Efficient Attention

| Paper (citations) | Code | Complexity | AutoRegressive | Main Idea |
|---|---|---|---|---|
| Generating Wikipedia by Summarizing Long Sequences (208) | - | formula | | compresses keys and values + blocked attention |
| CCNet: Criss-Cross Attention for Semantic Segmentation (148) | CCNet | formula | | each pixel attends to its row and column simultaneously |
| Efficient Attention: Attention with Linear Complexities (2) | efficient-attention | formula | | `Softmax(Q)*(Softmax(K^T)*V)` (see the linearized-attention sketch below) |
| Star-Transformer (24) | fastNLP | formula | | uses a relay (global) node and attends to/from that node |
| Generating Long Sequences with Sparse Transformers (138) | torch-blocksparse | formula | ✔️ | sparse block-based attention |
| GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond (96) | GCNet | formula | | squeeze-and-excitation with attention pooling (instead of global average pooling) |
| SCRAM: Spatially Coherent Randomized Attention Maps (1) | - | formula | ✔️ | uses PatchMatch to find close keys |
| Interlaced Sparse Self-Attention for Semantic Segmentation (13) | IN_PAPER | formula | ✔️ | combination of short-range and then long-range (dilated) attention |
| Permutohedral Attention Module for Efficient Non-Local Neural Networks (2) | Permutohedral_attention_module | formula | | uses a permutohedral lattice approximation algorithm to approximate the attention output |
| Large Memory Layers with Product Keys (28) | XLM | formula | ✔️ | searches for nearest-neighbor keys |
| Expectation-Maximization Attention Networks for Semantic Segmentation (38) | EMANet | formula | | applies expectation maximization to cluster keys into k clusters |
| Compressive Transformers for Long-Range Sequence Modelling (20) | compressive-transformer-pytorch | formula | ✔️ | compresses distant tokens instead of just stop_grad()-ing them; a more efficient version of Transformer-XL |
| BP-Transformer: Modelling Long-Range Context via Binary Partitioning (8) | BPT | formula | ✔️ | attends to distant tokens coarsely and to close tokens in a more fine-grained manner |
| Axial Attention in Multidimensional Transformers (5) | axial-attention | formula | ✔️ | applies attention on each axis separately |
| Reformer: The Efficient Transformer (69) | trax | formula | ✔️ | uses LSH to find close keys |
| Transformer on a Diet (2) | transformer-on-diet | formula | ✔️ | dilated transformer, like WaveNet |
| Sparse Sinkhorn Attention (4) | sinkhorn-transformer | formula | ✔️ | uses a cost matrix to limit attention between buckets |
| SAC: Accelerating and Structuring Self-Attention via Sparse Adaptive Connection (1) | - | formula | ✔️ | learns the q, k connections, i.e. dynamically creates a sparse attention matrix |
| Efficient Content-Based Sparse Attention with Routing Transformers (11) | routing-transformer | formula | | computes attention among same-cluster tokens (clusters computed by online k-means) |
| Longformer: The Long-Document Transformer (15) | longformer | formula | ✔️ | global + blocked attention (see the local/global mask sketch below) |
| Neural Architecture Search for Lightweight Non-Local Networks (2) | AutoNL | formula | | computes Q(KV) and also downsamples q, k, v in both the spatial and channel dimensions |
| ETC: Encoding Long and Structured Data in Transformers (2) | - | formula | ✔️ | combines global attention (Star-Transformer with multiple global tokens) with local attention |
| Multi-scale Transformer Language Models (1) | IN_PAPER | formula | ✔️ | UNet-like + retina attention; close to BP-Transformer |
| Synthesizer: Rethinking Self-Attention in Transformer Models (5) | - | formula | ✔️ | does not compute pairwise interactions |
| Jukebox: A Generative Model for Music (9) | jukebox | formula | ✔️ | better attention patterns from Sparse Transformer |
| GMAT: Global Memory Augmentation for Transformers (0) | gmat | formula | | adds global tokens |
| Masked Language Modeling for Proteins via Linearly Scalable Long-Context Transformers (0) | google-research | formula | ✔️ | calculates an unbiased stochastic approximation of the attention matrix |
| Hand-crafted Attention is All You Need? A Study of Attention on Self-supervised Audio Transformer (0) | - | formula | ✔️ | does not compute pairwise interactions and uses fixed mask patterns |
| Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention (1) | fast-transformers | formula | ✔️ | uses `phi(q)*(phi(k)*v)` and also improves the sequential sampling step (see the linearized-attention sketch below) |
| Linformer: Self-Attention with Linear Complexity (3) | linformer | formula | | projects keys and values from n×d down to k×d (see the Linformer sketch below) |
| Real-time Semantic Segmentation with Fast Attention (0) | - | formula | | `l2_norm(q)*(l2_norm(k)*v)`, the same associativity trick as the linearized-attention sketch below |
| Fast Transformers with Clustered Attention (0) | fast-transformers | formula | | groups queries together with LSH |
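
Several entries above (Efficient Attention, "Transformers are RNNs", Real-time Semantic Segmentation with Fast Attention) share one associativity trick: replace the row-wise softmax over an n×n score matrix with separate feature maps on Q and K, so `K^T @ V` can be computed first and the n×n matrix is never materialized. The snippet below is a minimal numpy sketch of the Efficient-Attention variant (softmax over Q's features, softmax over K's positions); the shapes and random data are illustrative only, and "Transformers are RNNs" instead uses `phi(x) = elu(x) + 1` together with a normalizer.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def linearized_attention(Q, K, V):
    # Softmax(Q) * (Softmax(K^T) * V): normalize queries over the feature axis
    # and keys over the position axis, then associate right-to-left so only a
    # d x d "context" matrix is formed (O(n*d^2) instead of O(n^2*d)).
    Qp = softmax(Q, axis=-1)   # (n, d)
    Kp = softmax(K, axis=0)    # (n, d), softmax over the n positions
    context = Kp.T @ V         # (d, d)
    return Qp @ context        # (n, d)

# toy usage with random data; n and d are illustrative
n, d = 1024, 64
rng = np.random.default_rng(0)
Q, K, V = rng.standard_normal((3, n, d))
out = linearized_attention(Q, K, V)
print(out.shape)  # (1024, 64)
```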
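
Linformer's projection idea, as a hedged sketch: learned maps (called E and F in the paper) shrink the length-n key and value matrices to a fixed length k, so the score matrix is n×k instead of n×n. Shapes are illustrative and the projections here are random rather than trained.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def linformer_attention(Q, K, V, E, F):
    # Project the (n, d) keys/values down to (k, d) with the maps E, F,
    # so the score matrix is (n, k) rather than (n, n).
    K_proj = E @ K                                 # (k, d)
    V_proj = F @ V                                 # (k, d)
    scores = Q @ K_proj.T / np.sqrt(Q.shape[-1])   # (n, k)
    return softmax(scores, axis=-1) @ V_proj       # (n, d)

# toy usage; k << n gives the linear cost in sequence length
n, d, k = 1024, 64, 128
rng = np.random.default_rng(0)
Q, K, V = rng.standard_normal((3, n, d))
E, F = rng.standard_normal((2, k, n))
out = linformer_attention(Q, K, V, E, F)
print(out.shape)  # (1024, 64)
```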
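
For the pattern-based entries (Sparse Transformers, Longformer, ETC, GMAT), the core object is a sparse attention mask that combines a local sliding window with a few global tokens. The sketch below only builds that boolean pattern on a dense n×n grid to make it visible; the real implementations use custom sparse kernels so the dense cost is never paid. Window size and global indices are illustrative.

```python
import numpy as np

def local_global_mask(n, window, global_idx):
    # Boolean (n, n) mask: True means "query i may attend to key j".
    # Each token sees a +/- `window` neighbourhood; tokens in `global_idx`
    # attend to everything and are attended to by everything.
    mask = np.zeros((n, n), dtype=bool)
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        mask[i, lo:hi] = True       # local sliding window
    mask[global_idx, :] = True      # global tokens attend everywhere
    mask[:, global_idx] = True      # everyone attends to global tokens
    return mask

mask = local_global_mask(n=16, window=2, global_idx=[0])
print(mask.sum(), "allowed pairs out of", 16 * 16)
```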

## Articles
