production-level-machine-learning's Introduction

Tools for Production Level Machine Learning

[NOTE: This repo is a constant work in progress. Any feedback is greatly appreciated 😄]

A curated list of useful tools and references for production level machine learning, both open source and proprietary. Also included are some pointers on the tradeoffs of choosing among them.

Data

Data handling is a broad topic that requires a number of tools for processing, storage, privacy, pipelines, and more.

Storage

Open Source & SaaS Services

  • Object store: Store binary data (images, sound files, compressed texts)
  • Database: Store metadata (file paths, labels, user activity, etc).
    • Postgres is a sound default for most applications, with strong SQL support and good support for unstructured JSON.
  • Data Lake: Aggregate features that cannot be obtained from the database (e.g. logs)
  • Feature Store: Store, access, and share machine learning features. Feature extraction can be computationally expensive and hard to scale, so reusing features across models and teams is key to a high-performing ML team.
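The reuse argument for a feature store can be illustrated with a minimal, in-memory sketch. This is not the API of any feature store product; the class `InMemoryFeatureStore`, its methods, and the `text_length` feature are hypothetical names chosen for illustration. The idea shown is only that a feature is computed once per record and then served from a cache to any later caller.

```python
import hashlib
import json
from typing import Callable, Dict, Tuple


class InMemoryFeatureStore:
    """Toy feature store (hypothetical): compute each feature once per record, then reuse it."""

    def __init__(self) -> None:
        self._extractors: Dict[str, Callable[[dict], float]] = {}
        self._cache: Dict[Tuple[str, str], float] = {}
        self.compute_count = 0  # number of times an extractor actually ran

    def register(self, name: str, extractor: Callable[[dict], float]) -> None:
        """Make a named feature available to every model or team using the store."""
        self._extractors[name] = extractor

    @staticmethod
    def _entity_key(entity: dict) -> str:
        # Hash the canonical JSON form so equal records map to the same key.
        payload = json.dumps(entity, sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def get(self, name: str, entity: dict) -> float:
        """Return the cached feature value, computing it only on first access."""
        key = (name, self._entity_key(entity))
        if key not in self._cache:
            self._cache[key] = self._extractors[name](entity)
            self.compute_count += 1
        return self._cache[key]


store = InMemoryFeatureStore()
store.register("text_length", lambda record: float(len(record["text"])))

record = {"id": 1, "text": "hello world"}
first = store.get("text_length", record)   # extractor runs here
second = store.get("text_length", record)  # served from the cache
```

A production feature store would add persistence, versioning, and access control; the sketch only shows why shared, cached features save repeated computation.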

Pipelines

Open Source

Machine Learning Development Frameworks

There are a number of development frameworks out there. There are fundamental libraries as well as derivative APIs (e.g. Keras) that simplify the interface.

Open Source

  • TensorFlow: Fundamental tool for deep learning, well supported by Google and the community
  • PyTorch: Fundamental tool based on Torch, developed and well supported by Facebook and the community
  • Keras: Simplified API for easier development
  • scikit-learn
  • DeepDetect
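The difference between a fundamental library and a simplifying API can be sketched without any framework at all. The example below is a dependency-free toy, not the API of Keras or of any library listed above: `gradient_step` stands in for the low-level operations a fundamental library exposes, and the hypothetical `LinearModel` class stands in for a high-level wrapper that hides the training loop behind `fit` and `predict`.

```python
from typing import List, Tuple


def gradient_step(
    w: float, b: float, xs: List[float], ys: List[float], lr: float
) -> Tuple[float, float]:
    """Low-level view: one gradient descent step for y = w*x + b under mean squared error."""
    n = len(xs)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return w - lr * grad_w, b - lr * grad_b


class LinearModel:
    """High-level view (hypothetical wrapper): the loop is hidden behind fit/predict."""

    def __init__(self) -> None:
        self.w = 0.0
        self.b = 0.0

    def fit(self, xs: List[float], ys: List[float], epochs: int = 2000, lr: float = 0.05) -> "LinearModel":
        # Repeatedly apply the low-level step; callers never see this loop.
        for _ in range(epochs):
            self.w, self.b = gradient_step(self.w, self.b, xs, ys, lr)
        return self

    def predict(self, xs: List[float]) -> List[float]:
        return [self.w * x + self.b for x in xs]


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # generated from y = 2x + 1
model = LinearModel().fit(xs, ys)
prediction = model.predict([4.0])[0]
```

The low-level function gives full control over each update, while the wrapper trades that control for a shorter, harder-to-misuse interface, which is the tradeoff to weigh when choosing between the two kinds of tool.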

Model / Experiment Management

Open Source

  • Polyaxon: reproducible machine learning at scale
  • Datmo: replicable model versions
  • MLflow: machine learning experiment tracking
  • ModelDB: system for managing machine learning models for scikit-learn & spark.ml
  • DVC: replicable ETL and feature extraction pipelines
  • CookieCutter Data Science: replicable file structures for data projects
  • Docker CookieCutter Data Science: fork of above to run cookie-cutter project in a Docker container
  • Duct Tape: replicable running of code
  • Dynamic Training Bench: TensorFlow training and tuning
  • Sacred: reproducible experiments with a GUI for tracking them
  • Pachyderm: reproducible way to version data and ETL pipelines
  • Django Estimators: specific to Django and scikit-learn estimators
  • MAX: model template for tracking model types
  • Kinoa: save experiment results easily
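What these tools automate can be sketched with the standard library alone. The example below does not use the API of any tool in the list; `log_run` and `best_run` are hypothetical helpers, and the parameter and metric values are made up for illustration. Each run writes its parameters and metrics to its own JSON file so that runs can be compared later.

```python
import json
import tempfile
import time
import uuid
from pathlib import Path


def log_run(log_dir: Path, params: dict, metrics: dict) -> Path:
    """Write one run's parameters and metrics to its own JSON file."""
    run_id = uuid.uuid4().hex
    record = {
        "run_id": run_id,
        "timestamp": time.time(),
        "params": params,
        "metrics": metrics,
    }
    log_dir.mkdir(parents=True, exist_ok=True)
    path = log_dir / f"{run_id}.json"
    path.write_text(json.dumps(record, indent=2, sort_keys=True))
    return path


def best_run(log_dir: Path, metric: str) -> dict:
    """Return the logged run with the highest value of `metric`."""
    runs = [json.loads(p.read_text()) for p in log_dir.glob("*.json")]
    return max(runs, key=lambda run: run["metrics"][metric])


# Illustrative values only; a real project would log actual training results.
log_dir = Path(tempfile.mkdtemp()) / "runs"
log_run(log_dir, {"learning_rate": 0.1}, {"accuracy": 0.81})
log_run(log_dir, {"learning_rate": 0.01}, {"accuracy": 0.88})
best = best_run(log_dir, "accuracy")
```

Dedicated tools add what this sketch lacks, such as code and data versioning, artifact storage, and a UI for comparing runs.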

Continuous Integration

SaaS Tools

  • Argo: Open source Kubernetes-native workflow engine for orchestrating parallel jobs (includes workflows, events, CI and CD).
  • CircleCI: Language-inclusive support, custom environments, and flexible resource allocation; used by Instacart, Lyft, and StackShare.
  • Travis CI
  • Buildkite: Fast and stable builds; the open source agent runs on almost any machine and architecture, leaving you free to use your own tools and services.

Open Source

  • Jenkins: Open source build system that runs on your own machines
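For machine learning projects, a CI job often runs a model quality check in addition to unit tests. The sketch below is one possible shape for such a check, not a feature of any CI tool listed here; `accuracy`, `quality_gate`, the sample labels, and the 0.75 threshold are all assumptions made for illustration. CI systems generally treat a non-zero exit code as a failed build, which is the only contract the script relies on.

```python
from typing import Sequence


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions that match their labels."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and equal in length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def quality_gate(predictions: Sequence[int], labels: Sequence[int], min_accuracy: float) -> int:
    """Return a process exit code: 0 to pass the build, 1 to fail it."""
    score = accuracy(predictions, labels)
    print(f"accuracy={score:.3f} threshold={min_accuracy:.3f}")
    return 0 if score >= min_accuracy else 1


# Illustrative data: 4 of 5 predictions are correct.
labels = [1, 0, 1, 1, 0]
predictions = [1, 0, 1, 0, 0]
exit_code = quality_gate(predictions, labels, min_accuracy=0.75)
# In a real CI job the script would finish with: sys.exit(exit_code)
```

Keeping the threshold as an explicit parameter makes the pass/fail criterion visible in code review rather than buried in the pipeline configuration.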

Training for Machine Learning / Deep Learning

Open Source

For Production Systems / Model Serving

Open Source

End-to-End

SaaS Proprietary

References
