# papers-notebook-with-scheduling
This project collects the papers I have read, recording research methods and key takeaways along the way. The papers are categorized to help me stay on course as I dig deeper into my research direction.
- AS: Auto Scaling
- DL: Deep Learning
- DS: Distributed System
- NE: Network Efficient
- PS: Parameter Server
- RM: Resource Management
- RU: Resource Utilization
- RC: Resource Contention
- RS: Resource Scheduling
- DMLCS: Distributed Machine Learning Centralized Scheduling
- PA: Performance Analysis
- PT: Parallelized Training
| Keywords | Paper Title | PDF | Slide | Year |
| --- | --- | --- | --- | --- |
| DL, Scheduling | Gandiva: Introspective Cluster Scheduling for Deep Learning | [pdf] | [slide] | 2018 |
| DL, CPU, RS | Scheduling CPU for GPU-based Deep Learning Jobs | [pdf] | [slide] | 2018 |
| DL, NE, Scheduling | DLTAP: A Network-efficient Scheduling Method for Distributed Deep Learning Workload in Containerized Cluster Environment | [pdf] | [slide] | 2018 |
| DL, Training System | Project Adam: Building an Efficient and Scalable Deep Learning Training System | [pdf] | [Video] | 2014 |
| DL, PS, Rack-Scale | Parameter Hub: a Rack-Scale Parameter Server for Distributed Deep Neural Network Training | [pdf] | [slide] | 2018 |
| ML, PS | Scaling Distributed Machine Learning with the Parameter Server | [pdf] | [slide] | 2014 |
| ML, Infra | Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective | [pdf] | [slide] | 2014 |
| RM | Optimus: An Efficient Dynamic Resource Scheduler for Deep Learning Cluster | [pdf] | [slide] | 2018 |
| DS, PS | Scaling Distributed Machine Learning with the Parameter Server | [pdf] | [slide][Video] | 2014 |
| Scheduling, GPU, PA, RC | Topology-Aware GPU Scheduling for Learning Workloads in Cloud Environments | [pdf] | [slide] | 2017 |
| DL, GPU | Multi-tenant GPU Clusters for Deep Learning Workloads: Analysis and Implications | [pdf] | [slide] | 2018 |
| DL, DS, GPU | Tiresias: A GPU Cluster Manager for Distributed Deep Learning | [pdf] | [slide] | 2019 |
| DL, GPU | Salus: Fine-Grained GPU Sharing Primitives for Deep Learning Applications | [pdf] | [slide] | 2019 |
| Keywords | Paper Title | PDF | Slide | Year |
| --- | --- | --- | --- | --- |
| DL, AS, Kubernetes | Deep Learning Based Auto-Scaling Load Balancing Mechanism for Distributed Software-Defined Storage Service | [pdf] | [slide] | 2018 |
| ML, Benchmarking, Kubernetes | Kubebench: A Benchmarking Platform for ML Workloads | [pdf] | [slide] | 2018 |
| RM, DMLCS, RU, Kubernetes | GAI: A Centralized Tree-Based Scheduler for Machine Learning Workload in Large Shared Clusters | [pdf] | [slide] | 2018 |
| DL, Scheduling, Algorithm | Online Job Scheduling in Distributed Machine Learning Clusters | [pdf] | [slide] | 2018 |
| Autoscaling, Kubernetes | Containers Orchestration with Cost-Efficient Autoscaling in Cloud Computing Environments | [pdf] | [slide] | 2018 |
| DL, PT, Kubernetes | Parallelized Training of Deep NN – Comparison of Current Concepts and Frameworks | [pdf] | [slide] | 2018 |
| DL, Resource Orchestration, Job Scheduling, Autoscaling | DRAGON: A Dynamic Scheduling and Scaling Controller for Managing Distributed Deep Learning Jobs in Kubernetes Cluster | [pdf] | [slide] | 2019 |
| Keywords | Paper Title | PDF | Slide | Year |
| --- | --- | --- | --- | --- |
| DL, DS | Multi-tenant GPU Clusters for Deep Learning Workloads: Analysis and Implications | [pdf] | [slide] | 2018 |
| DL, DS | GeePS: Scalable deep learning on distributed GPUs with a GPU-specialized parameter server | [pdf] | [slide] | 2015 |
| DL | Poseidon: A system architecture for efficient GPU-based deep learning on multiple machines | [pdf] | [slide] | 2015 |
| Mesos, Marathon, Ceph | Toward High-Availability Container as a Service on Mesos Cluster with Distributed Shared Volumes | [pdf] | [slide] | 2015 |
- Traditional scheduling architecture
- Machine learning distributed cluster
- Model training
  - Framework
  - Parameter Server / AllReduce
  - Combination of both
- Scheduler affinity
- Scheduler policy
- Hardware GPU topology
- Kube-batch
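As a rough illustration of the "Parameter Server / AllReduce" distinction in the notes above, here is a minimal single-process sketch in plain Python. All names are hypothetical and no real framework is used; it only shows that both patterns compute the same averaged gradient, differing in where the update happens (a central server vs. every worker's local replica).

```python
# Hypothetical sketch: parameter-server vs. all-reduce gradient aggregation.
# Not taken from any paper or framework listed above.

def parameter_server_step(worker_grads, params, lr=0.1):
    """Centralized: workers push gradients to one server, which
    averages them and updates the single shared parameter copy."""
    avg = [sum(g) / len(worker_grads) for g in zip(*worker_grads)]
    return [p - lr * g for p, g in zip(params, avg)]

def allreduce_step(worker_grads, worker_params, lr=0.1):
    """Decentralized: every worker obtains the same averaged gradient
    (simulated in-process here) and updates its own parameter replica."""
    avg = [sum(g) / len(worker_grads) for g in zip(*worker_grads)]
    return [[p - lr * g for p, g in zip(params, avg)]
            for params in worker_params]

# Two workers, one parameter vector of length 2.
grads = [[1.0, 2.0], [3.0, 4.0]]                  # per-worker gradients
ps = parameter_server_step(grads, [0.0, 0.0])     # one shared copy updated
ar = allreduce_step(grads, [[0.0, 0.0], [0.0, 0.0]])
# After all-reduce, every worker replica matches the server's result.
print(ps, ar)
```

The "combination of both" bullet refers to hybrid designs (e.g., all-reduce within a rack, parameter server across racks), which this sketch does not attempt to model.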