Awesome System for Machine Learning

Path to system for AI [Whitepaper You Must Read]

A curated list of research on machine learning systems. Links to code are included where available. I also summarize papers that I find particularly interesting. Pull requests are very welcome; please use our template.

AI system

General Resources

System for AI Papers

Survey

  • Toward Highly Available, Intelligent Cloud and ML Systems [Slide]
  • awesome-production-machine-learning: A curated list of awesome open source libraries to deploy, monitor, version and scale your machine learning [GitHub]
  • Opportunities and Challenges Of Machine Learning Accelerators In Production [Paper]
    • Ananthanarayanan, Rajagopal, et al. 2019 USENIX Conference on Operational Machine Learning (OpML 19). 2019.
  • How (and How Not) to Write a Good Systems Paper [Advice]
  • Applied machine learning at Facebook: a datacenter infrastructure perspective [Paper]
    • Hazelwood, Kim, et al. (HPCA 2018)
  • Infrastructure for Usable Machine Learning: The Stanford DAWN Project
    • Bailis, Peter, Kunle Olukotun, Christopher Ré, and Matei Zaharia. (preprint 2017)
  • Hidden technical debt in machine learning systems [Paper]
    • Sculley, David, et al. (NIPS 2015)
  • End-to-end arguments in system design [Paper]
    • Saltzer, Jerome H., David P. Reed, and David D. Clark.
  • System Design for Large Scale Machine Learning [Thesis]
  • Deep Learning Inference in Facebook Data Centers: Characterization, Performance Optimizations and Hardware Implications [Paper]
    • Park, Jongsoo, Maxim Naumov, Protonu Basu et al. arXiv 2018
    • Summary: This paper characterizes DL models and then presents new design principles for DL hardware.
  • A Berkeley View of Systems Challenges for AI [Paper]

Book

  • Computer Architecture: A Quantitative Approach [Must read]
  • Streaming Systems [Book]
  • Kubernetes in Action (start to read) [Book]
  • Machine Learning Systems: Designs that scale [Website]

Video

  • ScaledML 2020: Learn from the best minds in the machine learning community. [Video]
  • From Research to Production with PyTorch [Video]
  • Introduction to Microservices, Docker, and Kubernetes [YouTube]
  • ICML Keynote: Lessons Learned from Helping 200,000 non-ML experts use ML [Video]
  • Adaptive & Multitask Learning Systems [Website]
  • System thinking. A TED talk. [YouTube]
  • Flexible systems are the next frontier of machine learning. Jeff Dean [YouTube]
  • Is It Time to Rewrite the Operating System in Rust? [YouTube]
  • InfoQ: AI, ML and Data Engineering [YouTube]
    • Start to watch.
  • Netflix: Human-centric Machine Learning Infrastructure [InfoQ]
  • SysML 2019: [YouTube]
  • ScaledML 2019: David Patterson, Ion Stoica, Dawn Song and so on [YouTube]
  • ScaledML 2018: Jeff Dean, Ion Stoica, Yangqing Jia and so on [YouTube] [Slides]
  • A New Golden Age for Computer Architecture: History, Challenges, and Opportunities. David Patterson [YouTube]
  • How to Have a Bad Career. David Patterson (I am a big fan) [YouTube]
  • SysML 18: Perspectives and Challenges. Michael Jordan [YouTube]
  • SysML 18: Systems and Machine Learning Symbiosis. Jeff Dean [YouTube]

Course

Blog

  • Parallelizing across multiple CPU/GPUs to speed up deep learning inference at the edge [Amazon Blog]
  • Building Robust Production-Ready Deep Learning Vision Models in Minutes [Blog]
  • Deploy Machine Learning Models with Keras, FastAPI, Redis and Docker [Blog]
  • How to Deploy a Machine Learning Model -- Creating a production-ready API using FastAPI + Uvicorn [Blog] [GitHub]
  • Deploying a Machine Learning Model as a REST API [Blog]
  • Continuous Delivery for Machine Learning [Blog]
  • Kubernetes CheatSheets In A4 [GitHub]
  • A Gentle Introduction to Kubernetes [Blog]
  • Train and Deploy Machine Learning Model With Web Interface - Docker, PyTorch & Flask [GitHub]
  • Learning Kubernetes, The Chinese Taoist Way [GitHub]
  • Data pipelines, Luigi, Airflow: everything you need to know [Blog]
  • The Deep Learning Toolset — An Overview [Blog]
  • Summary of CSE 599W: Systems for ML [Chinese Blog]
  • Polyaxon, Argo and Seldon for Model Training, Package and Deployment in Kubernetes [Blog]
  • Overview of the different approaches to putting Machine Learning (ML) models in production [Blog]
  • Being a Data Scientist does not make you a Software Engineer [Part1] Architecting a Machine Learning Pipeline [Part2]
  • Model Serving in PyTorch [Blog]
  • Machine learning in Netflix [Medium]
  • SciPy Conference Materials (slides, repo) [GitHub]
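  • After Spark, UC Berkeley launches a new-generation AI computing engine: Ray [Blog]
  • What background knowledge do you need to understand or work on machine learning / deep learning systems research? [Zhihu]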
  • Learn Kubernetes in Under 3 Hours: A Detailed Guide to Orchestrating Containers [Blog] [GitHub]
  • data-engineer-roadmap: Learning from multiple companies in Silicon Valley. Netflix, Facebook, Google, Startups [GitHub]
  • TensorFlow Serving + Docker + Tornado: fast production-grade deployment of machine learning models [Blog]

Useful Tools

Profile

  • Collective Knowledge repository to automate MLPerf - a broad ML benchmark suite for measuring performance of ML software frameworks, ML hardware accelerators, and ML cloud platforms [GitHub]
  • NetworKit is a growing open-source toolkit for large-scale network analysis. [GitHub]
  • gpu-sentry: Flask-based package for monitoring utilisation of NVIDIA GPUs. [GitHub]
  • anderskm/gputil: A Python module for getting the GPU status from NVIDIA GPUs programmatically using nvidia-smi [GitHub]
  • Pytorch-Memory-Utils: track GPU memory usage during training with PyTorch. [GitHub]
  • torchstat: a lightweight neural network analyzer based on PyTorch. [GitHub]
  • NVIDIA GPU Monitoring Tools [GitHub]
  • PyTorch/cpuinfo: cpuinfo is a library for detecting information about the host CPU that is essential for performance optimization. [GitHub]
  • Popular Network memory consumption and FLOP counts [GitHub]
  • Intel® VTune™ Amplifier [Website]
    • Stop guessing why software is slow. Advanced sampling and profiling techniques quickly analyze your code, isolate issues, and deliver insights for optimizing performance on modern processors
  • Pyflame: A Ptracing Profiler For Python [GitHub]

Others

  • Facebook AI Performance Evaluation Platform [GitHub]
  • Netron: Visualizer for deep learning and machine learning models [GitHub]
  • Facebook/FBGEMM: FBGEMM (Facebook GEneral Matrix Multiplication) is a low-precision, high-performance matrix-matrix multiplication and convolution library for server-side inference. [GitHub]
  • Dslabs: Distributed Systems Labs and Framework for UW system course [GitHub]
  • Machine Learning Model Zoo [Website]
  • Faiss: A library for efficient similarity search and clustering of dense vectors [GitHub]
  • Microsoft/MMdnn: A comprehensive, cross-framework solution to convert, visualize and diagnose deep neural network models. [GitHub]
  • gpushare-scheduler-extender [GitHub]
    • More and more data scientists run their NVIDIA GPU-based inference tasks on Kubernetes. Some of these tasks can run on the same NVIDIA GPU device to increase GPU utilization, so one important challenge is how to share GPUs between pods.
  • Example recipes for Kubernetes Network Policies that you can just copy paste [GitHub]

Project

  • Machine Learning for .NET [GitHub]
    • ML.NET is a cross-platform open-source machine learning framework which makes machine learning accessible to .NET developers.
    • ML.NET allows .NET developers to develop their own models and infuse custom machine learning into their applications, using .NET, even without prior expertise in developing or tuning machine learning models.
  • ONNX: Open Neural Network Exchange [GitHub]
  • ONNXRuntime: has an open architecture that is continually evolving to address the newest developments and challenges in AI and Deep Learning. ONNX Runtime stays up to date with the ONNX standard, supporting all ONNX releases with future compatibility and maintaining backwards compatibility with prior releases. [GitHub]
  • BentoML: Machine Learning Toolkit for packaging and deploying models [GitHub]
  • EuclidesDB: A multi-model machine learning feature embedding database [GitHub]
  • Prefect: Prefect is a new workflow management system, designed for modern infrastructure and powered by the open-source Prefect Core workflow engine. [GitHub]
  • MindsDB: MindsDB's goal is to make it very simple for developers to use the power of artificial neural networks in their projects [GitHub]
  • PAI: OpenPAI is an open source platform that provides complete AI model training and resource management capabilities. [Microsoft Project]
  • Bistro: Scheduling Data-Parallel Jobs Against Live Production Systems [Facebook Project]
  • GNES is Generic Neural Elastic Search, a cloud-native semantic search system based on deep neural networks. [GitHub]
  • Jittor is a high-performance deep learning framework based on JIT compiling and meta-operators. [GitHub]
  • MindSpore is a new open source deep learning training/inference framework that could be used for mobile, edge and cloud scenarios. [GitHub]
  • MegEngine is a fast, scalable and easy-to-use numerical evaluation framework, with auto-differentiation. [GitHub]
