papers-notebook-with-scheduling

Paper and Literature Notes

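This project collects the papers I have read and records the research methods and key points noted along the way. Organizing the papers by category helps me dig deeper in the right research direction.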

Keyword abbreviations:

  • AS: Auto Scaling
  • DL: Deep Learning
  • DS: Distributed System
  • NE: Network-Efficient
  • RM: Resource Management
  • RU: Resource Utilization
  • RC: Resource Contention
  • RS: Resource Scheduling
  • DMLCS: Distributed Machine Learning Centralized Scheduling
  • PA: Performance Analysis
  • PT: Parallelized Training

Literature

Scheduler

| Keywords | Paper Title | PDF | Slide | Year |
| --- | --- | --- | --- | --- |
| DL, Scheduling | Gandiva: Introspective Cluster Scheduling for Deep Learning | [pdf] | [slide] | 2018 |
| DL, CPU, RS | Scheduling CPU for GPU-based Deep Learning Jobs | [pdf] | [slide] | 2018 |
| DL, NE, Scheduling | DLTAP: A Network-efficient Scheduling Method for Distributed Deep Learning Workload in Containerized Cluster Environment | [pdf] | [slide] | 2018 |
| DL, Training System | Project Adam: Building an Efficient and Scalable Deep Learning Training System | [pdf] | [Video] | 2014 |
| DL, PS, Rack-Scale | Parameter Hub: a Rack-Scale Parameter Server for Distributed Deep Neural Network Training | [pdf] | [slide] | 2018 |
| ML, PS | Scaling Distributed Machine Learning with the Parameter Server | [pdf] | [slide] | 2014 |
| ML, Infra | Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective | [pdf] | [slide] | 2014 |
| RM | Optimus: An Efficient Dynamic Resource Scheduler for Deep Learning Cluster | [pdf] | [slide] | 2018 |
| DS, PS | Scaling Distributed Machine Learning with the Parameter Server | [pdf] | [slide] [Video] | 2014 |
| Scheduling, GPU, PA, RC | Topology-Aware GPU Scheduling for Learning Workloads in Cloud Environments | [pdf] | [slide] | 2017 |
| DL, GPU | Multi-tenant GPU Clusters for Deep Learning Workloads: Analysis and Implications | [pdf] | [slide] | 2018 |
| DL, DS, GPU | Tiresias: A GPU Cluster Manager for Distributed Deep Learning | [pdf] | [slide] | 2019 |
| DL, GPU | Salus: Fine-Grained GPU Sharing Primitives for Deep Learning Applications | [pdf] | [slide] | 2019 |

# kubernetes

| Keywords | Paper Title | PDF | Slide | Year |
| --- | --- | --- | --- | --- |
| DL, AS, kubernetes | Deep Learning Based Auto-Scaling Load Balancing Mechanism for Distributed Software-Defined Storage Service | [pdf] | [slide] | 2018 |
| ML, benchmarking, kubernetes | Kubebench: A Benchmarking Platform for ML Workloads | [pdf] | [slide] | 2018 |
| RM, DMLCS, RU, kubernetes | GAI: A Centralized Tree-Based Scheduler for Machine Learning Workload in Large Shared Clusters | [pdf] | [slide] | 2018 |
| DL, Scheduling, Algorithm | Online Job Scheduling in Distributed Machine Learning Clusters | [pdf] | [slide] | 2018 |
| Autoscaling, kubernetes | Containers Orchestration with Cost-Efficient Autoscaling in Cloud Computing Environments | [pdf] | [slide] | 2018 |
| DL, PT, kubernetes | Parallelized Training of Deep NN – Comparison of Current Concepts and Frameworks | [pdf] | [slide] | 2018 |
| DL, Resource Orchestration, Job Scheduling, Autoscaling | DRAGON: A Dynamic Scheduling and Scaling Controller for Managing Distributed Deep Learning Jobs in Kubernetes Cluster | [pdf] | [slide] | 2019 |

# other

| Keywords | Paper Title | PDF | Slide | Year |
| --- | --- | --- | --- | --- |
| DL, DS | Multi-tenant GPU Clusters for Deep Learning Workloads: Analysis and Implications | [pdf] | [slide] | 2018 |
| DL, DS | GeePS: Scalable deep learning on distributed GPUs with a GPU-specialized parameter server | [pdf] | [slide] | 2015 |
| DL | Poseidon: A system architecture for efficient GPU-based deep learning on multiple machines | [pdf] | [slide] | 2015 |
| Mesos, Marathon, Ceph | Toward High-Availability Container as a Service on Mesos Cluster with Distributed Shared Volumes | [pdf] | [slide] | 2015 |

Classic [Scheduler]

Paper Direction

  • Traditional scheduling architecture
  • Machine learning Distributed Cluster
    • Model training
    • Framework
    • Parameter Server / AllReduce
  • Combination of both
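The "Parameter Server / AllReduce" item above names the two common ways of aggregating gradients in distributed training. As a rough illustration only, the following single-process sketch simulates both on plain Python lists and shows that they arrive at the same averaged gradient. The function names (`parameter_server_average`, `ring_allreduce_average`) and the chunking scheme are assumptions made for this note; they are not taken from any of the listed papers or from a real library.

```python
from typing import List

Vector = List[float]


def parameter_server_average(worker_grads: List[Vector]) -> Vector:
    """Central aggregation: every worker pushes its gradient to one server."""
    n = len(worker_grads)
    total = [0.0] * len(worker_grads[0])
    for grad in worker_grads:              # "push" from each worker
        for i, g in enumerate(grad):
            total[i] += g
    return [t / n for t in total]          # result every worker would "pull"


def ring_allreduce_average(worker_grads: List[Vector]) -> List[Vector]:
    """Decentralized aggregation: workers pass chunks around a logical ring."""
    n = len(worker_grads)
    dim = len(worker_grads[0])
    bufs = [list(g) for g in worker_grads]  # one local buffer per worker
    # Chunk c covers indices [lo, hi); chunks may differ in size.
    bounds = [(c * dim // n, (c + 1) * dim // n) for c in range(n)]

    # Phase 1 (scatter-reduce): after n-1 steps, worker r holds the
    # complete sum for chunk (r + 1) % n.
    for step in range(n - 1):
        sends = []
        for r in range(n):                  # snapshot: all sends are simultaneous
            c = (r - step) % n
            lo, hi = bounds[c]
            sends.append((c, bufs[r][lo:hi]))
        for r, (c, data) in enumerate(sends):
            dst = (r + 1) % n
            lo, _ = bounds[c]
            for k, v in enumerate(data):
                bufs[dst][lo + k] += v      # receiver accumulates

    # Phase 2 (all-gather): completed chunks circulate until everyone has all.
    for step in range(n - 1):
        sends = []
        for r in range(n):
            c = (r + 1 - step) % n
            lo, hi = bounds[c]
            sends.append((c, bufs[r][lo:hi]))
        for r, (c, data) in enumerate(sends):
            dst = (r + 1) % n
            lo, hi = bounds[c]
            bufs[dst][lo:hi] = data         # receiver overwrites

    return [[v / n for v in buf] for buf in bufs]


grads = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], [9.0, 10.0, 11.0, 12.0]]
ps_result = parameter_server_average(grads)
ring_result = ring_allreduce_average(grads)
print(ps_result)
print(all(buf == ps_result for buf in ring_result))
```

In the parameter-server variant all traffic converges on one node, whereas in the ring variant each worker only exchanges chunks with its neighbor. That difference is one reason network-aware placement matters to the schedulers surveyed in this list.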

Ref-Link

Learning Scheduler

  • Scheduler affinity
  • Scheduler Policy
  • Hardware GPU topology
  • Kube-batch

Operator Learning
