papers-notebook-with-scheduling's Introduction

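Papers Notebook (Literature Notes)

This project records the papers I collect and read. The goal is to note the research methods and key points of each paper along the way, and to classify the papers so that I keep digging deeper in the right research direction. The papers focus on Kubernetes and scheduling.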

Keywords Shortcuts:

  • AS: Auto Scaling
  • DL: Deep Learning
  • DS: Distributed System
  • NE: Network Efficient
  • RM: Resource Management
  • RU: Resource Utilization
  • RC: Resource Contention
  • RS: Resource Scheduling
  • DMLCS: Distributed Machine Learning Centralized Scheduling
  • PA: Performance Analysis
  • PT: Parallelized Training

Table of Contents

Scheduler

Keywords | Paper Title | PDF | Slide | Year
DL, Scheduling | Gandiva: Introspective Cluster Scheduling for Deep Learning | [pdf] | [slide] | 2018
DL, CPU, RS | Scheduling CPU for GPU-based Deep Learning Jobs | [pdf] | [slide] | 2018
DL, NE, Scheduling | DLTAP: A Network-efficient Scheduling Method for Distributed Deep Learning Workload in Containerized Cluster Environment | [pdf] | [slide] | 2018
DL, Training System | Project Adam: Building an Efficient and Scalable Deep Learning Training System | [pdf] | [Video] | 2014
DL, PS, Rack-Scale | Parameter Hub: a Rack-Scale Parameter Server for Distributed Deep Neural Network Training | [pdf] | [slide] | 2018
ML, PS | Scaling Distributed Machine Learning with the Parameter Server | [pdf] | [slide] | 2014
ML, Infra | Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective | [pdf] | [slide] | 2014
RM | Optimus: An Efficient Dynamic Resource Scheduler for Deep Learning Clusters | [pdf] | [slide] | 2018
DS, PS | Scaling Distributed Machine Learning with the Parameter Server | [pdf] | [slide] [Video] | 2014
Scheduling, GPU, PA, RC | Topology-Aware GPU Scheduling for Learning Workloads in Cloud Environments | [pdf] | [slide] | 2017
DL, RO, Job Scheduling, Autoscaling | DRAGON: A Dynamic Scheduling and Scaling Controller for Managing Distributed Deep Learning Jobs in Kubernetes Cluster | [pdf] | [slide] | 2017

Kubernetes

Keywords | Paper Title | PDF | Slide | Year
DL, AS, kubernetes | Deep Learning Based Auto-Scaling Load Balancing Mechanism for Distributed Software-Defined Storage Service | [pdf] | [slide] | 2018
ML, benchmarking, kubernetes | Kubebench: A Benchmarking Platform for ML Workloads | [pdf] | [slide] | 2018
RM, DMLCS, RU, kubernetes, kubeflow | GAI: A Centralized Tree-Based Scheduler for Machine Learning Workload in Large Shared Clusters | [pdf] | [slide] | 2018
DL, Scheduling, Algorithm | Online Job Scheduling in Distributed Machine Learning Clusters | [pdf] | [slide] | 2018
Autoscaling, kubernetes | Containers Orchestration with Cost-Efficient Autoscaling in Cloud Computing Environments | [pdf] | [slide] | 2018
DL, PT, kubernetes | Parallelized Training of Deep NN – Comparison of Current Concepts and Frameworks | [pdf] | [slide] | 2018

Other

Keywords | Paper Title | PDF | Slide | Year
DL, DS | Multi-tenant GPU Clusters for Deep Learning Workloads: Analysis and Implications | [pdf] | [slide] | 2018
DL, DS | GeePS: Scalable deep learning on distributed GPUs with a GPU-specialized parameter server | [pdf] | [slide] | 2015
DL | Poseidon: A system architecture for efficient GPU-based deep learning on multiple machines | [pdf] | [slide] | 2015
Mesos, Marathon, Ceph | Toward High-Availability Container as a Service on Mesos Cluster with Distributed Shared Volumes | [pdf] | [slide] | 2015
DL, System | Scalable Deep Learning on Distributed Infrastructures: Challenges, Techniques and Tools | [pdf] | [slide] | 2019

Classic [Scheduler]

Paper Direction

  • Traditional scheduling architecture
  • Distributed machine learning clusters
    • Model training
    • Framework
    • Parameter Server / AllReduce
  • Combination of both
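
The "Parameter Server / AllReduce" direction above names two ways of combining gradients across workers. The following is only a minimal sketch for orientation, not code from any of the listed papers: the function names are hypothetical, gradients are plain Python lists, and the ring pass is a simplified, unchunked variant rather than the bandwidth-optimal ring all-reduce used in practice.

```python
from typing import List

Vector = List[float]


def parameter_server_average(worker_grads: List[Vector]) -> Vector:
    """Centralized: one server sums all pushed gradients and averages them."""
    n = len(worker_grads)
    total = [0.0] * len(worker_grads[0])
    for grad in worker_grads:              # each worker "pushes" its gradient
        for i, g in enumerate(grad):
            total[i] += g
    return [t / n for t in total]          # workers would "pull" this result


def ring_allreduce_average(worker_grads: List[Vector]) -> List[Vector]:
    """Decentralized: buffers travel around a ring; no central server.

    Simplified assumption: whole vectors are forwarded instead of chunks.
    Returns the averaged gradient held by each worker after n - 1 steps.
    """
    n = len(worker_grads)
    acc = [list(g) for g in worker_grads]        # running sum per worker
    in_flight = [list(g) for g in worker_grads]  # buffer each worker forwards
    for _ in range(n - 1):
        # worker w receives what its left neighbour (w - 1) forwarded
        received = [in_flight[(w - 1) % n] for w in range(n)]
        for w in range(n):
            acc[w] = [a + r for a, r in zip(acc[w], received[w])]
        in_flight = received                     # forward it again next step
    return [[a / n for a in acc_w] for acc_w in acc]


if __name__ == "__main__":
    grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(parameter_server_average(grads))
    print(ring_allreduce_average(grads))
```

Both functions compute the same average. The difference that matters for scheduling is where the traffic flows: through one server in the first case, and between neighbouring workers in the second.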

Ref-Link

Learning Scheduler

  • Scheduler affinity
  • Scheduler Policy
  • Hardware GPU topology
  • Kube-batch

Operator Learning

papers-notebook-with-scheduling's People

Contributors

yylin1
