This is a curated list of awesome projects and papers related to real-time AI and DNN inference.
- Edge Intelligence: Architectures, Challenges, and Applications by Xu, Dianlei, et al., arxiv 2020
- A Survey of Multi-Tenant Deep Learning Inference on GPU by Yu, Fuxun, et al., arxiv 2022
- Machine Learning in Real-Time Internet of Things (IoT) Systems: A Survey by Bian, Jiang, et al., IOTJ 2022
- AI Augmented Edge and Fog Computing: Trends and Challenges by Tuli, Shreshth, et al., arxiv 2022
- Enable deep learning on mobile devices: Methods, systems, and applications by Cai, Han, et al., TODAES 2022
- Multi-DNN Accelerators for Next-Generation AI Systems by Venieris, Stylianos I., Christos-Savvas Bouganis, and Nicholas D. Lane., arxiv 2022
- A Survey of GPU Multitasking Methods Supported by Hardware Architecture by Zhao, Chen, et al., IEEE TPDS 2021
- TASO: The Tensor Algebra SuperOptimizer for Deep Learning by Zhihao Jia et al., SOSP 2019
- AStitch: Enabling a New Multi-dimensional Optimization Space for Memory-Intensive ML Training and Inference on Modern SIMT Architectures by Zhen Zheng et al., ASPLOS 2022
- PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections by Haojie Wang et al., OSDI 2021
- Rammer: Enabling Holistic Deep Learning Compiler Optimizations with rTasks by Lingxiao Ma et al., OSDI 2020
- Bolt: Bridging the Gap between Auto-tuners and Hardware-native Performance by Jiarong Xing et al., MLSys 2022
- Ansor: Generating High-Performance Tensor Programs for Deep Learning by Lianmin Zheng et al., OSDI 2020
- TenSet: A Large-scale Program Performance Dataset for Learned Tensor Compilers by Lianmin Zheng et al., NeurIPS 2021
- Romou: Rapidly Generate High-Performance Tensor Kernels for Mobile GPUs by Liang, Rendong, et al., MobiCom 2022
- AsyMo: Scalable and efficient deep-learning inference on asymmetric mobile CPUs by Wang, Manni, et al., MobiCom 2021
- IOS: Inter-operator scheduler for CNN acceleration by Ding, Yaoyao, et al., MLSys 2021
- Moses: Efficient Exploitation of Cross-device Transferable Features for Tensor Program Optimization by Zhao, Zhihe, et al., arxiv 2022
- DeepCuts: A Deep Learning Optimization Framework for Versatile GPU Workloads by Jung, Wookeun, Thanh Tuan Dao, and Jaejin Lee., PLDI 2021
- CASE: a compiler-assisted SchEduling framework for multi-GPU systems by Chen, Chao, Chris Porter, and Santosh Pande., PPoPP 2022
- Chameleon: Adaptive code optimization for expedited deep neural network compilation by Ahn, Byung Hoon, et al., arxiv 2020
- Analytical characterization and design space exploration for optimization of CNNs by Li, Rui, et al., ASPLOS 2021
- DNNFusion: accelerating deep neural networks execution with advanced operator fusion by Niu, Wei, et al., PLDI 2021
- AutoGTCO: Graph and Tensor Co-Optimize for Image Recognition with Transformers on GPU by Bai, Yang, et al., ICCAD 2021
- DietCode: Automatic Optimization for Dynamic Tensor Programs by Zheng, Bojian, et al., MLSys 2022
- ROLLER: Fast and Efficient Tensor Compilation for Deep Learning by Zhu, Hongyu, et al., OSDI 2022
- FamilySeer: Towards Optimized Tensor Codes by Exploiting Computation Subgraph Similarity by Zhang, Shanjun, et al., arxiv 2022
- Reusing Auto-Schedules for Efficient DNN Compilation by Gibson, Perry, and José Cano., arxiv 2022
- Hidet: Task Mapping Programming Paradigm for Deep Learning Tensor Programs by Ding, Yaoyao, et al., arxiv 2022
- EdgeML: An AutoML framework for real-time deep learning on the edge by Zhao, Zhihe, et al., IoTDI 2021
- SPINN: synergistic progressive inference of neural networks over device and cloud by Laskaridis, Stefanos, et al., MobiCom 2020
- Clio: Enabling automatic compilation of deep learning pipelines across IoT and cloud by Huang, Jin, et al., MobiCom 2020
- Neurosurgeon: Collaborative intelligence between the cloud and mobile edge by Kang, Yiping, et al., ASPLOS 2017
- Mistify: Automating DNN Model Porting for On-Device Inference at the Edge by Guo, Peizhen et al., NSDI 2021
- Deep compressive offloading: Speeding up neural network inference by trading edge computation for network latency by Yao, Shuochao, et al., SenSys 2020
- Elf: accelerate high-resolution mobile deep vision with content-aware parallel offloading by Zhang, Wuyang, et al., MobiCom 2021
- Edge assisted real-time object detection for mobile augmented reality by Liu, Luyang, Hongyu Li, and Marco Gruteser., MobiCom 2019
- VELTAIR: towards high-performance multi-tenant deep learning services via adaptive compilation and scheduling by Liu, Zihan, et al., ASPLOS 2022
- RT-mDL: Supporting Real-Time Mixed Deep Learning Tasks on Edge Platforms by Ling, Neiwen, et al., SenSys 2021
- Horus: Interference-aware and prediction-based scheduling in deep learning systems by Yeung, Gingfung, et al., IEEE TPDS 2021
- Automated Runtime-Aware Scheduling for Multi-Tenant DNN Inference on GPU by Yu, Fuxun, et al., ICCAD 2021
- Interference-aware scheduling for inference serving by Mendoza, Daniel, et al., EuroMLSys 2021
- Microsecond-scale Preemption for Concurrent GPU-accelerated DNN Inferences by Han, Mingcong, et al., OSDI 2022
- Planaria: Dynamic architecture fission for spatial multi-tenant acceleration of deep neural networks by Ghodrati, Soroush, et al., MICRO 2020
- Heimdall: mobile GPU coordination platform for augmented reality applications by Yi, Juheon, and Youngki Lee., MobiCom 2020
- DeepEye: Resource efficient local execution of multiple deep vision models using wearable commodity hardware by Mathur, Akhil, et al., MobiSys 2017
- PipeSwitch: Fast Pipelined Context Switching for Deep Learning Applications by Bai, Zhihao, et al., OSDI 2020
- Enable simultaneous DNN services based on deterministic operator overlap and precise latency prediction by Cui, Weihao, et al., SC 2021
- LegoDNN: block-grained scaling of deep neural networks for mobile vision by Han, Rui, et al., MobiCom 2021
- NeuOS: A Latency-Predictable Multi-Dimensional Optimization Framework for DNN-driven Autonomous Systems by Bateni, Soroush, and Cong Liu., ATC 2020
- Multi-Neural Network Acceleration Architecture by Baek, Eunjin, Dongup Kwon, and Jangwoo Kim., ISCA 2020
- Pipelined data-parallel CPU/GPU scheduling for multi-DNN real-time inference by Xiang, Yecheng, and Hyoseung Kim., RTSS 2019
- NestDNN: Resource-aware multi-tenant on-device deep learning for continuous mobile vision by Fang, Biyi, Xiao Zeng, and Mi Zhang., MobiCom 2018
- FLEP: Enabling flexible and efficient preemption on GPUs by Wu, Bo, et al., ASPLOS 2017
- Prophet: Precise QoS prediction on non-preemptive accelerators to improve utilization in warehouse-scale computers by Chen, Quan, et al., ASPLOS 2017
- PAME: precision-aware multi-exit DNN serving for reducing latencies of batched inferences by Zhang, Shulai, et al., ICS 2022
- Layerweaver: Maximizing resource utilization of neural processing units via layer-wise scheduling by Oh, Young H., et al., HPCA 2021
- LiteReconfig: cost and content aware reconfiguration of video object detection systems for mobile GPUs by Xu, Ran, et al., EuroSys 2022
- ApproxNet: Content and contention-aware video object classification system for embedded clients by Xu, Ran, et al.
- LaLaRAND: Flexible layer-by-layer CPU/GPU scheduling for real-time DNN tasks by Kang, Woosung, et al., RTSS 2021
- DUET: A Compiler-Runtime Subgraph Scheduling Approach for Tensor Programs on a Coupled CPU-GPU Architecture by Zhang, Minjia, Zehua Hu, and Mingqin Li., IPDPS 2021
- Band: coordinated multi-DNN inference on heterogeneous mobile processors by Jeong, Joo Seong, et al., MobiSys 2022
- ODMDEF: On-Device Multi-DNN Execution Framework Utilizing Adaptive Layer-Allocation on General Purpose Cores and Accelerator by Lim, Cheolsun, and Myungsun Kim., IEEE Access 2021
- μLayer: Low latency on-device inference using cooperative single-layer acceleration and processor-friendly quantization by Kim, Youngsok, et al., EuroSys 2019
- OPTiC: Optimizing collaborative CPU–GPU computing on mobile devices with thermal constraints by Wang, Siqi, Gayathri Ananthanarayanan, and Tulika Mitra., TCAD 2019
- Accelerating Sequence-to-Graph Alignment on Heterogeneous Processors by Feng, Zonghao, and Qiong Luo., ICPP 2021
- Efficient Execution of Deep Neural Networks on Mobile Devices with NPU by Tan, Tianxiang, and Guohong Cao., IPSN 2021
- CoDL: efficient CPU-GPU co-execution for deep learning inference on mobile devices by Jia, Fucheng, et al., MobiSys 2022
- CODA: Improving resource utilization by slimming and co-locating DNN and CPU jobs by Zhao, Han, et al., ICDCS 2020
- GPUReplay: a 50-KB GPU stack for client ML by Park, Heejin, and Felix Xiaozhu Lin., ASPLOS 2022
- Real-time high performance computing using a Jetson Xavier AGX by Cetre, Cyril, et al., ERTS 2022
- GPU scheduling on the NVIDIA TX2: Hidden details revealed by Amert, Tanya, et al., RTSS 2017
- Nimble: Lightweight and parallel gpu task scheduling for deep learning by Kwon, Woosuk, et al., NeurIPS 2020
- Addressing GPU on-chip shared memory bank conflicts using elastic pipeline by Gou, Chunyang, and Georgi N. Gaydadjiev., IJPP 2013
- A study of persistent threads style GPU programming for GPGPU workloads by Gupta, Kshitij, Jeff A. Stuart, and John D. Owens., InPar 2012
- Demystifying the placement policies of the NVIDIA GPU thread block scheduler for concurrent kernels by Gilman, Guin, et al., ACM SIGMETRICS Performance Evaluation Review 2021
- Exploiting Intra-SM Parallelism in GPUs via Persistent and Elastic Blocks by Zhao, Han, et al., ICCD 2021
- Online Thread Auto-Tuning for Performance Improvement and Resource Saving by Luan, Guangqiang, et al., IEEE TPDS 2021
- HSM: A hybrid slowdown model for multitasking GPUs by Zhao, Xia, Magnus Jahre, and Lieven Eeckhout., ASPLOS 2020
- Enabling and exploiting flexible task assignment on GPU through SM-centric program transformations by Wu, Bo, et al., ACM ICS 2015
- Warped-Slicer: Efficient Intra-SM Slicing through Dynamic Resource Partitioning for GPU Multiprogramming by Xu, Qiumin, et al., ISCA 2016
- Kernelet: High-Throughput GPU Kernel Executions with Dynamic Slicing and Scheduling by Zhong, Jianlong, and Bingsheng He., IEEE TPDS 2013
- Improving GPGPU concurrency with elastic kernels by Pai, Sreepathi, Matthew J. Thazhuthaveetil, and Ramaswamy Govindarajan., ACM SIGARCH Computer Architecture News 2013
- Neither More Nor Less: Optimizing Thread-level Parallelism for GPGPUs by Kayıran, Onur, et al., PACT 2013
- Orion: A framework for GPU occupancy tuning by Hayes, Ari B., et al., Middleware 2016
- Efficient performance estimation and work-group size pruning for OpenCL kernels on GPUs by Wang, Xiebing, et al., IEEE TPDS 2019
- Online evolutionary batch size orchestration for scheduling deep learning workloads in GPU clusters by Bian, Zhengda, et al., SC 2021
- Autotuning GPU kernels via static and predictive analysis by Lim, Robert, Boyana Norris, and Allen Malony., IEEE ICPP 2017
- GSLICE: controlled spatial sharing of GPUs for a scalable inference platform by Dhakal, Aditya, Sameer G. Kulkarni, and K. K. Ramakrishnan., SOCC 2020
- MAPLE-X: Latency Prediction with Explicit Microprocessor Prior Knowledge by Abbasi, Saad, Alexander Wong, and Mohammad Javad Shafiee., arxiv 2022
- MAPLE-Edge: A Runtime Latency Predictor for Edge Devices by Nair, Saeejith, et al., CVPR 2022
- MAPLE: Microprocessor a priori for latency estimation by Abbasi, Saad, Alexander Wong, and Mohammad Javad Shafiee., CVPR 2022
- nn-Meter: towards accurate latency prediction of deep-learning model inference on diverse edge devices by Zhang, Li Lyna, et al., MobiSys 2021
- Predicting and reining in application-level slowdown on spatial multitasking GPUs by Wei, Mengze, et al., JPDC 2020
- MCUNet: Tiny deep learning on IoT devices by Lin, Ji, et al., NeurIPS 2020
- TinyML: Current Progress, Research Challenges, and Future Roadmap by Shafique, Muhammad, et al., DAC 2021
- Benchmarking TinyML systems: Challenges and direction by Banbury, Colby R., et al., arxiv 2020
- μNAS: Constrained Neural Architecture Search for Microcontrollers by Liberis, Edgar, Łukasz Dudziak, and Nicholas D. Lane., EuroMLSys 2021
- Memory-efficient Patch-based Inference for Tiny Deep Learning by Lin, Ji, et al., NeurIPS 2021
- Deep Learning on Microcontrollers: A Study on Deployment Costs and Challenges by Filip Svoboda, Javier Fernandez-Marques, Edgar Liberis, Nicholas D Lane, EuroMLSys 2022
- Dynamic Multimodal Fusion by Xue, Zihui, and Radu Marculescu., arxiv 2022
- LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action by Shah, Dhruv, et al., arxiv 2022
- Accelerating mobile audio sensing algorithms through on-chip gpu offloading by Georgiev, Petko, et al., MobiSys 2017
- SparTA: Deep-Learning Model Sparsity via Tensor-with-Sparsity-Attribute by Zheng, Ningxin, et al., OSDI 2022
- ESCALATE: Boosting the Efficiency of Sparse CNN Accelerator with Kernel Decomposition by Li, Shiyu, et al., MICRO 2021
- A high-performance sparse tensor algebra compiler in Multi-Level IR by Tian, Ruiqin, et al., arxiv 2021
- Efficient Sparse Matrix Kernels based on Adaptive Workload-Balancing and Parallel-Reduction by Huang, Guyue, et al., arxiv 2021
- COEXE: An Efficient Co-execution Architecture for Real-Time Neural Network Services by Liu, Chubo, et al., DAC 2020
- TorchSparse: Efficient Point Cloud Inference Engine by Tang, Haotian, et al., MLSys 2022
- Understanding and Optimizing Deep Learning Cold-Start Latency on Edge Devices by Yi, Rongjie, et al., arxiv 2022
- Towards efficient vision transformer inference: a first study of transformers on mobile devices by Wang, Xudong, et al., HotMobile 2022
- EdgeBERT: Sentence-level energy optimizations for latency-aware multi-task NLP inference by Tambe, Thierry, et al., MICRO 2021
- EDGEWISE: A Better Stream Processing Engine for the Edge by Fu, Xinwei, et al., ATC 2019
- LiteFlow: towards high-performance adaptive neural networks for kernel datapath by Zhang, Junxue, et al., SIGCOMM 2022
- CoCoPIE: Making Mobile AI Sweet As PIE--Compression-Compilation Co-Design Goes a Long Way by Liu, Shaoshan, et al., arxiv 2020
- Beyond Data and Model Parallelism for Deep Neural Networks by Jia, Zhihao, Matei Zaharia, and Alex Aiken, MLSys 2019
- Discovering faster matrix multiplication algorithms with reinforcement learning by Fawzi, Alhussein, et al., Nature 2022