ethicalml / awesome-production-machine-learning

16.0K 404.0 2.1K 2.25 MB

A curated list of awesome open source libraries to deploy, monitor, version and scale your machine learning

Home Page: https://ethical.institute/principles.html

License: MIT License

machine-learning mlops interpretability explainability responsible-ai deep-learning machine-learning-operations ml-ops ml-operations privacy-preserving

awesome-production-machine-learning's People

Contributors

a-y-khan, abhi526691, al-yakubovich, allenhaozi, axsaucedo, brunowego, daavoo, dmelikyan, edenlightning, floscha, gidim, hardianlawi, ianhellstrom, ilmoi, lkevinzc, msaudade, nastasiasaby, pommedeterresautee, redotics, shankarcchandrasekaran, shayh, sohaibfarooqi, spoilerdo, ssakhavi, stefannae, tamersalama, tanertopal, visenger, zhimin-z, zoranpandovski

awesome-production-machine-learning's Issues

Add link to the airflow-dvc project

Airflow-dvc provides a plugin that integrates DVC into Airflow. DVC is a version-control system for ML artifacts, intermediate data-science datasets and other large files. It is an alternative to Git LFS with better support for pipelines and for reproducing results.

Link to the PR

Adding Haven-Ai to the curated list

Thanks for curating a list of useful libraries that help people deploy machine learning projects!

Would it be okay to add the Haven-Ai library to the list? https://github.com/haven-ai/haven-ai

It's a library we have been working on for the last 3 years. It helps us structure a codebase for quick, reliable prototyping, and for running, managing and visualizing large-scale reproducible experiments.

I think it fits under the Model and Data Versioning section in the README.md.

Thanks a lot!

Sort the sections alphabetically [help from community appreciated]

We have already updated the frameworks in each section to be ordered alphabetically. The sections themselves should be sorted alphabetically as well. The commercial section would still stay at the bottom given that the primary objective of this list is to showcase OSS tools. If anyone from the community would be keen to give it a shot we would be very grateful!
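For anyone picking this up, here is a minimal sketch of the ordering rule described above: sort section titles alphabetically but keep the commercial section pinned at the bottom. The section names and the `sort_sections` helper are hypothetical and only illustrate the rule; they are not taken from the README or any existing script in the repo.

```python
def sort_sections(titles, pinned_last=("Commercial Platforms",)):
    """Sort titles case-insensitively, keeping pinned titles at the end."""
    pinned = [t for t in titles if t in pinned_last]
    regular = [t for t in titles if t not in pinned_last]
    return sorted(regular, key=str.casefold) + pinned


# Hypothetical section titles, used only for illustration.
sections = ["Model Serving", "Commercial Platforms", "AutoML", "Feature Store"]
ordered = sort_sections(sections)
print(ordered)
```

Sorting with `str.casefold` avoids uppercase titles being placed ahead of lowercase ones, which a plain `sorted()` would do.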

Suggestion : Automate your cycle of Intelligence

Katonic MLOps Platform is a collaborative platform with a Unified UI to manage all data science activities in one place and introduce MLOps practice into the production systems of customers and developers. It is a collection of cloud-native tools for all of these stages of MLOps:

  • Data exploration
  • Feature preparation
  • Model training/tuning
  • Model serving, testing and versioning

Katonic is for both data scientists and data engineers looking to build production-grade machine learning implementations and can be run either locally in your development environment or on a production cluster. Katonic provides a unified system—leveraging Kubernetes for containerization and scalability for the portability and repeatability of its pipelines.

It would be great if you could add it to your list.

Website: https://katonic.ai/

Attachment: Katonic One Pager.pdf

Labeling of time series data

There is trainset, a lightweight UI-based tool for visual labeling of time series data. Does anyone know of a standalone alternative to trainset?

BTW: Grafana provides an annotation API, but I don't know of a UI integration or tool that enables "drag & drop" style usage.

Suggest to add Substra framework to Model Deployment and Orchestration Frameworks

Hello,

I would like to present Substra: an open-source framework for privacy-preserving, traceable and collaborative Machine Learning.

Substra gathers data providers and algorithm designers into a network of nodes that can train models on demand under advanced permission regimes. To guarantee data privacy, Substra implements distributed learning: the data never leaves its node; only algorithms, predictive models and non-sensitive metadata are exchanged on the network. The computations are orchestrated by a distributed ledger, which guarantees the traceability and authenticity of information without the need to trust a third party. Although originally developed for healthcare applications, Substra is not specific to any data type, algorithm or programming language. It supports many types of computation plans, including the parallel computation plans commonly used in Federated Learning. With appropriate guidelines, it can be deployed for numerous Machine Learning use cases involving data or algorithm providers where trust is limited.

Github:

Nebullvm or Speedster

Since Speedster is already part of Nebullvm, why not fold the Speedster entry into Nebullvm? @axsaucedo
The logic is straightforward: we list Ray only, rather than its specific components such as RLlib (a well-known production-level reinforcement learning library).

  • Nebullvm - Nebullvm is an ecosystem of plug-and-play modules to boost the performance of your AI systems. The optimization modules are stack-agnostic and work with any library. They are designed to be easily integrated into your system, providing a quick and seamless boost to its performance.

Would you mind reopening this PR for further discussion? #307

Fix Plotly.py Stats

In the README.md, the number of GitHub stars for Plotly.py is wrong. It lists 40 stars when it should be around 7k.

PRs Add Descriptions

Massive thanks for all the additions @fkromer! Really great finds! The only thing needed before we can merge the PRs is a short description for each of the libraries in #83, #82, #81, #80, #79, #78, #77, #76, #75 and #74. Let me know if you can add those; if not, I can also jump in and add the descriptions.

Create new section commercial e2e platforms

The commercial platforms section may be misleading. It is often not obvious whether a platform is an end-to-end platform with a more or less "complete" feature set (e.g. dataiku DSS) or a commercial variant of a single component of an overall e2e platform, such as experiment management (e.g. neptune.ai) or model deployment (e.g. datatron). For new contributors it would be helpful to have a separate end-to-end ML platform section.

Suggestion: add "Vector Database" section

Vector databases are a relatively new phenomenon, but they might deserve their own section.

For the convenience of the maintainer of this repo, I've added two PRs: one without the new section and one with it.

This reading material about Weaviate might help make the case for why vector databases deserve their own spot on this list :)
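To illustrate what sets this category apart, here is a minimal sketch of the core operation a vector database provides: returning the stored items whose embedding vectors are most similar to a query vector. It is a brute-force toy in plain Python, not how Weaviate or any other product is implemented; the `index` contents and both function names are made up for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, index, k=1):
    """Return the names of the k stored vectors most similar to the query."""
    ranked = sorted(
        index.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Hypothetical embeddings; a real system would get these from a model.
index = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [0.7, 0.7, 0.0],
}
result = nearest([0.9, 0.1, 0.0], index, k=2)
print(result)
```

A linear scan like this compares the query against every stored vector, so it only works for small collections. Dedicated vector databases typically rely on approximate nearest-neighbor indexes to keep queries fast at scale.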

Suggestion: Add tournesol.app platform

Tournesol is a collaborative content recommendation platform based on the latest research in AI security (some articles on the algorithms used have already been published). The goal of Tournesol is to identify top videos of public utility by eliciting contributors' judgements on content quality. All the code is open-source and available here https://github.com/tournesol-app/tournesol

If the suggestion is accepted, I may do a PR.

Remove general tools that are not aimed specifically at model deployment

Some tools in the FaaS module are designed to solve more general scalability issues rather than model deployment.
I suggest removing those tools from the list, since including them could be misleading. For example, we should not include Kubernetes, since it is a general tool that is not aimed specifically at model deployment.
See #333.

Keras feature engineering automation

I noticed that a very useful package, keras-tuner, is missing from the list.

Keras Tuner is a great package for feature engineering and automated hyperparameter tuning in Keras and TensorFlow, and it can be really helpful for finding the best hyperparameters using various search methods.
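For readers unfamiliar with the idea, here is a minimal sketch of one such search method, random search, written in plain Python. It deliberately does not use the keras-tuner API; `random_search`, `toy_objective` and the search space are hypothetical stand-ins, and the objective is a made-up function in place of a real validation loss.

```python
import random


def random_search(objective, space, trials=20, seed=0):
    """Sample random configurations and keep the one with the lowest score."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(trials):
        # Draw one value per hyperparameter from its list of candidates.
        params = {name: rng.choice(values) for name, values in space.items()}
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_objective(params):
    """Stand-in for a validation loss; lower is better."""
    return (params["learning_rate"] - 0.01) ** 2 + abs(params["units"] - 64) / 1000


# Hypothetical search space, chosen only for illustration.
space = {"learning_rate": [0.1, 0.01, 0.001], "units": [32, 64, 128]}
best_params, best_score = random_search(toy_objective, space, trials=50)
print(best_params, best_score)
```

In practice the objective would train a model with the sampled hyperparameters and return its validation metric, which is the part a tuning library automates.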

Add new section on ML Serving

As suggested and discussed in #94 with @rmminusrslash and @visenger, we'll add a new section on ML serving, as there have been quite a lot of new projects in this space. A few ML serving frameworks include:

  • Seldon Core
  • KFServing
  • BentoML
  • Cortex
  • TFX
  • TFServing
  • ForestFlow

Any other suggestions? @rmminusrslash / @visenger, would you be interested in opening a PR?
