Ease.ml/ci

GitHub Python 3.7

Ease.ml/ci is a library to support continuous testing and integration of machine learning models with statistical guarantees. It can be used as a stand-alone library or deployed as a CI&CD service.

Why another CI/CD system

Many CI/CD tools exist for classical software development (e.g., Jenkins). However, using them out of the box to continuously test machine learning models can lead to failures in production, for two reasons. First, when testing an ML model on a fixed test set, one has to account for the inherent randomness of ML. Second, when reusing the same test set multiple times, one has to make sure not to overfit to it, even when only the outcomes of the test conditions are observed. More details about the inherent challenges of testing ML models can be found in our blog post.

Sample size estimator

The core component of ease.ml/ci is a sample size estimator. Given a test condition, the number of commits for which one intends to use the same test set, and the confidence bounds one has to guarantee, the sample size estimator outputs the minimum number of samples required to satisfy these requirements. The estimator can be used in a standalone fashion (i.e., as a library) or integrated into a CI/CD workflow (i.e., using a GitHub Action or buildbot). The latter also requires functionality for actually computing the quantities supported in the test conditions (such as accuracy or the difference in predictions between models), and for notifying the user to provide a new test set and replace the existing one in the system.
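To give a feel for how such an estimate behaves, here is a minimal sketch based on the textbook Hoeffding inequality combined with a union bound over commits. It is an illustration under stated assumptions, not the computation performed by ease.ml/ci: the function name `hoeffding_sample_size` and all of its parameters are hypothetical, and the library may apply different or tighter bounds.

```python
import math


def hoeffding_sample_size(epsilon, delta, steps=1, adaptive=False, value_range=1.0):
    """Illustrative sample size bound (hypothetical helper, not the easeml API).

    Returns a number of test samples n such that the empirical mean of values
    bounded in an interval of width `value_range` stays within +/- epsilon of
    its expectation with probability at least 1 - delta, simultaneously for
    `steps` commits evaluated on the same test set.
    """
    if not (epsilon > 0 and 0 < delta < 1 and steps >= 1 and value_range > 0):
        raise ValueError("need epsilon > 0, 0 < delta < 1, steps >= 1, value_range > 0")

    # Union bound: `steps` hypotheses if the commits do not depend on earlier
    # test outcomes; a conservative 2**steps if every commit may depend on the
    # pass/fail history of the previous ones (assumption for illustration).
    log_hypotheses = steps * math.log(2) if adaptive else math.log(steps)

    # Hoeffding: P(|mean - E| > eps) <= 2 * exp(-2 * n * eps^2 / range^2)
    n = value_range ** 2 * (math.log(2 / delta) + log_hypotheses) / (2 * epsilon ** 2)
    return math.ceil(n)


if __name__ == "__main__":
    print(hoeffding_sample_size(epsilon=0.05, delta=0.01))
    print(hoeffding_sample_size(epsilon=0.05, delta=0.01, steps=32))
    print(hoeffding_sample_size(epsilon=0.05, delta=0.01, steps=32, adaptive=True))
```

The sketch shows the qualitative trade-off: tighter tolerances, higher confidence, more commits per test set, and adaptive reuse all increase the number of samples required.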

Ease.ml/ci as a library

Install the library

pip install git+https://github.com/easeml/ci

Then, in a Python session with the library installed:

from easeml_cicd.core.utils import SampleCalculator

# Location of the CI/CD config file
config_path = ".easeml.yml"
# Initialize the sample calculator with the config file
sc = SampleCalculator(config_path)
# Calculate the number of samples needed
N = sc.calculate_n()

A Jupyter notebook showcasing this can be found here

Ease.ml/ci as a GitHub action

Ease.ml/ci can be used within a GitHub Action.

Prerequisites

  • Ease.ml/ci repository structure

Overview

  1. Generate the dataset encryption and decryption keys by running:
easeml_create_key
  2. Base64-encode the keys so they can be added as repository secrets:
cat easeml_pub.asc | base64 -w 0
cat easeml_priv.asc | base64 -w 0
  3. Store the encoded keys as GitHub Secrets under the names B64_EASEML_PUB and B64_EASEML_PRIV.
  4. Create a GitHub Action YAML file under .github/workflows/, e.g. easemlci.yml
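The `-w 0` flag used above is specific to GNU `base64` and is not available on every platform. As a portable alternative, the encoding step can be sketched in Python using only the standard library. The helper name `encode_key_file` is hypothetical and not part of ease.ml/ci; the file names are the ones used in the commands above.

```python
import base64
from pathlib import Path


def encode_key_file(path):
    """Return the file's contents as one unwrapped Base64 line (like `base64 -w 0`)."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")


if __name__ == "__main__":
    # Key files are assumed to be in the current directory.
    for name in ("easeml_pub.asc", "easeml_priv.asc"):
        key_path = Path(name)
        if key_path.is_file():
            print(f"{name}: {encode_key_file(key_path)}")
        else:
            print(f"{name}: not found in the current directory")
```

As with the shell commands, the output includes the private key, so it should only be pasted into the repository secret and not stored in logs or shell history.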

An example repository using Ease.ml/ci as a GitHub Action can be found here: https://github.com/leaguilar/ci_action

Ease.ml/ci on buildbot

For heavier workloads, Ease.ml/ci can be deployed as a service that interfaces with a GitHub repository, deploys models as Docker containers, manages the encrypted datasets, and notifies users by email of the results of their ML CI/CD pipeline. The service uses buildbot as its base, with Easeml/CI&CD as a plugin.

Prerequisites

Overview

A playlist with a detailed example of setting up the service can be found here; the individual videos are linked throughout the overview.

  1. Provision a server or cluster with a publicly reachable IP/domain name and port, e.g. http://ec2-18-219-109-220.us-east-2.compute.amazonaws.com:8010
  2. Install Docker and enable execution without sudo, e.g. https://docs.docker.com/engine/install/ubuntu/, https://docs.docker.com/engine/install/linux-postinstall/
  3. Create buildbot master/worker
    • Install this package on the worker and the master, each in its own virtual environment, i.e. pip install git+https://github.com/easeml/cicd
    • Customize and set the master configuration, e.g. master.cfg
    • Videos: 1.1, 1.2, 1.3, 1.4, 1.5
  4. Register a GitHub app and link it to a repository
  5. Set keys
    • GitHub access key in $HOME/.easeml/keys/service_private_key.pem (this is the key required for the GitHub app to access the repository)
    • Data decryption and encryption keys:
      • $HOME/.easeml/keys/easeml_priv.asc
      • $HOME/.easeml/keys/easeml_pub.asc
    • Videos: 3.1
  6. Run the service
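Before running the service, it can help to confirm that the key files from step 5 are in place. The following is a minimal sketch of such a check; the helper `missing_key_files` is hypothetical and not part of ease.ml/ci, and it only tests that the files exist at the paths listed above, not that the keys are valid.

```python
from pathlib import Path

# File names taken from step 5 of the overview.
EXPECTED_KEY_FILES = (
    "service_private_key.pem",
    "easeml_priv.asc",
    "easeml_pub.asc",
)


def missing_key_files(keys_dir=None):
    """Return the expected key files that are absent from keys_dir.

    Defaults to $HOME/.easeml/keys when no directory is given.
    """
    base = Path(keys_dir) if keys_dir is not None else Path.home() / ".easeml" / "keys"
    return [name for name in EXPECTED_KEY_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_key_files()
    if missing:
        print("Missing key files:", ", ".join(missing))
    else:
        print("All expected key files are present.")
```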

Citations

@inproceedings{renggli2019mlsys,
 author = {Cedric Renggli and Bojan Karlaš and Bolin Ding and Feng Liu and Kevin Schawinski and Wentao Wu and Ce Zhang},
 booktitle = {Proceedings of Machine Learning and Systems},
 title = {Continuous Integration of Machine Learning Models with ease.ml/ci: A Rigorous Yet Practical Treatment},
 year = {2019}
}

@inproceedings{karlas2020sigkdd,
 author = {Bojan Karlaš and Matteo Interlandi and Cedric Renggli and Wentao Wu and Ce Zhang and Deepak Mukunthu Iyappan Babu and Jordan Edwards and Chris Lauren and Andy Xu and Markus Weimer},
 booktitle = {Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining},
 title = {Building continuous integration services for machine learning},
 year = {2020}
}

@inproceedings{aguilar2021ease,
  title={Ease. ML: A Lifecycle Management System for Machine Learning},
  author={Aguilar Melgar, Leonel and Dao, David and Gan, Shaoduo and G{\"u}rel, Nezihe M and Hollenstein, Nora and Jiang, Jiawei and Karla{\v{s}}, Bojan and Lemmin, Thomas and Li, Tian and Li, Yang and others},
  booktitle={11th Annual Conference on Innovative Data Systems Research (CIDR 2021)(virtual)},
  year={2021},
  organization={CIDR}
}
