
mlops-k8s-mlflow2seldon's Introduction

Neu.ro MLFlow2Seldon deployer

An integration service that deploys MLFlow registered models as REST/gRPC APIs to a Kubernetes cluster using Seldon-core.

Usage

This service runs inside the Kubernetes cluster where Seldon-core is deployed. It continuously fetches the registered models from the MLFlow server (which runs as a platform job) via the MLFlow Python SDK and synchronizes the MLFlow state to Seldon-core within the Kubernetes cluster.

For instance, if an MLFlow registered model version is assigned to the Staging/Production stage, the corresponding model binary is deployed from MLFlow into the K8s cluster as a SeldonDeployment (exposing REST/gRPC APIs). If the stage assignment is removed or updated, the corresponding SeldonDeployment is changed accordingly.

Given that, all interaction with the service happens implicitly via the MLFlow server state. There is no need to execute particular commands or workloads against this service directly.
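
The synchronization described above can be pictured as a small reconcile step. The following is a minimal sketch, not the service's actual code: `ModelVersion`, `desired_deployments` and `plan_sync` are hypothetical names, and the `<model>-<stage>` deployment naming is an assumption based on the deployment names that appear in the service logs.

```python
from dataclasses import dataclass

# Stages that trigger a deployment, per the description above.
DEPLOYABLE_STAGES = {"Staging", "Production"}


@dataclass(frozen=True)
class ModelVersion:
    """Hypothetical, simplified view of an MLFlow registered model version."""
    name: str
    version: str
    stage: str  # e.g. "None", "Staging", "Production", "Archived"


def desired_deployments(versions):
    """Map deployment name -> model version for every deployable stage."""
    return {
        f"{v.name}-{v.stage.lower()}": v.version
        for v in versions
        if v.stage in DEPLOYABLE_STAGES
    }


def plan_sync(versions, current):
    """Compare MLFlow state with `current` (deployment name -> deployed version).

    Returns (to_apply, to_delete): deployments to create or update, and
    deployments whose stage assignment no longer exists.
    """
    desired = desired_deployments(versions)
    to_apply = {n: v for n, v in desired.items() if current.get(n) != v}
    to_delete = sorted(n for n in current if n not in desired)
    return to_apply, to_delete
```

In this sketch, a model version that moves to Production while an older version is deployed ends up in `to_apply`, and a deployment whose stage assignment was removed ends up in `to_delete`.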

Prerequisites and usage assumptions

  • MLFlow
    • is up and running as a platform job;
    • platform SSO is disabled;
    • the artifact store is a platform storage, mounted as a local path;
    • MLFlow server version is at least 1.11.0;
  • Seldon
    • the SeldonDeployment container image (model wrapper) should be stored in the platform registry, on the same cluster where MLFlow is running;
    • at the time of this service's deployment, the kubectl tool should be authenticated to communicate with the Kubernetes cluster where Seldon is deployed;
    • seldon-core-operator version is at least 1.5.0;

Deployment

  • make helm_deploy - asks several questions (e.g. what the MLFlow URL is, which Neu.ro cluster should be used, etc.). Alternatively, you can set the following env vars:
    • M2S_MLFLOW_HOST - MLFlow server host name (example: https://mlflow--user.jobs.cluster.org.neu.ro);
    • M2S_MLFLOW_STORAGE_ROOT - artifact root path in the platform storage (example: storage:myproject/mlruns);
    • M2S_SELDON_NEURO_DEF_IMAGE - docker image, stored in a platform registry, which is used to deploy the model (example: image:myproject/seldon:v1). Alternatively, you can configure the service to use another platform image for a deployment by tagging the respective registered model (not a model version!) with the tag named after the M2S_MLFLOW_DEPLOY_IMG_TAG chart parameter value (for instance, with a tag named "deployment-image" and the value "image:myproject/seldon:v2");
    • M2S_SRC_NEURO_CLUSTER - Neu.ro cluster where the deployment image, the MLFlow artifacts and MLFlow itself are hosted (example: demo_cluster);
  • Using the helm chart directly is possible but less convenient: all the information requested by the makefile must be passed as chart values.
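
As an illustration of the settings above, the sketch below checks that the env vars are set and shows how a per-model image override could take precedence over the default image. It is a hedged example rather than the service's implementation: `missing_vars` and `resolve_image` are hypothetical helpers, and treating all four variables as required is an assumption.

```python
import os

# Variable names come from the list above; treating all four as required
# is an assumption of this sketch.
REQUIRED_VARS = (
    "M2S_MLFLOW_HOST",
    "M2S_MLFLOW_STORAGE_ROOT",
    "M2S_SELDON_NEURO_DEF_IMAGE",
    "M2S_SRC_NEURO_CLUSTER",
)


def missing_vars(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def resolve_image(model_tags, deploy_img_tag, default_image):
    """Use the image from the registered model's tag if present, else the default.

    `model_tags` is a plain dict of the registered model's tags (hypothetical
    input shape); `deploy_img_tag` is the tag name, e.g. "deployment-image".
    """
    return model_tags.get(deploy_img_tag) or default_image
```

For example, a registered model tagged with "deployment-image" set to "image:myproject/seldon:v2" would resolve to that image, while an untagged model falls back to the default one.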

Cleanup

  • make helm_delete - will delete:
    • all resources created by this helm chart that are required by this service, and the service itself;

Got questions or suggestions?

Feel free to contact us via 📧 or @ slack.

Maintained by Neu.ro MLOps team with ❤️

mlops-k8s-mlflow2seldon's People

Contributors

anayden, pre-commit-ci[bot], yevheniisemendiak


mlops-k8s-mlflow2seldon's Issues

[tech debt] Add backward sync K8s (Seldon) -> Operator

Currently, our sync flow is MLFlow <-> Operator -> K8s (Seldon) (only the deployments that were created by this service get deleted).
We need a proper bi-directional sync: MLFlow <-> Operator <-> K8s (Seldon).

Validate model names

Model names should not contain underscores (_); otherwise the deployment fails with the following error:

2021-10-13 07:44:40 - root - INFO - Deploying model: _DeployedModel(image='registry.default.org.neu.ro/yevheniisemendiak/ml_recipe_bone_age/inference:21.4.13', model_name='bone_age_predictor', model_storage_uri=URL('storage://default/yevheniisemendiak/ml_recipe_bone_age/mlruns/0/26e39c613b48477d8a2511748c9efecc/artifacts/model'), model_stage='Staging', model_version='1', source_run_id='26e39c613b48477d8a2511748c9efecc', deployment_namespace='seldon', need_redeploy=True)
The SeldonDeployment "bone_age_predictor-staging" is invalid: metadata.name: Invalid value: "bone_age_predictor-staging": a DNS-1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')
2021-10-13 07:44:45 - root - ERROR - Unable to deploy '_DeployedModel(image='registry.default.org.neu.ro/yevheniisemendiak/ml_recipe_bone_age/inference:21.4.13', model_name='bone_age_predictor', model_storage_uri=URL('storage://default/yevheniisemendiak/ml_recipe_bone_age/mlruns/0/26e39c613b48477d8a2511748c9efecc/artifacts/model'), model_stage='Staging', model_version='1', source_run_id='26e39c613b48477d8a2511748c9efecc', deployment_namespace='seldon', need_redeploy=True)': Command 'kubectl apply -f /tmp/tmpx8e7hxlf' returned non-zero exit status 1.
2021-10-13 07:44:45 - root - INFO - Deployed models: bone_age_predictor-staging:1
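
One possible way to catch this before calling kubectl is to validate the resulting deployment name against the DNS-1123 subdomain regex quoted in the error message. This is a minimal sketch: `is_valid_dns1123_subdomain` and `deployment_name` are hypothetical helpers, and the `<model>-<stage>` naming is inferred from the log above.

```python
import re

# Regex quoted in the Kubernetes error message above.
_DNS1123_SUBDOMAIN = re.compile(
    r"[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*"
)
_MAX_LEN = 253  # Kubernetes length limit for DNS-1123 subdomain names


def is_valid_dns1123_subdomain(name):
    """Return True if `name` is acceptable as a Kubernetes resource name."""
    return len(name) <= _MAX_LEN and _DNS1123_SUBDOMAIN.fullmatch(name) is not None


def deployment_name(model_name, stage):
    """Hypothetical helper mirroring the '<model>-<stage>' names in the log."""
    return f"{model_name}-{stage.lower()}"
```

With this check, the name built for "bone_age_predictor" in the Staging stage is rejected because of the underscores, while "bone-age-predictor" passes.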

Support different MLFlow backends

Currently, we support only the platform storage on the same cluster where MLFlow is running as a job.
In the future, we also need to support:

  • other platform clusters
  • object storages
