kserve / modelmesh-serving Goto Github PK

View Code? Open in Web Editor NEW

195.0 20.0 111.0 30.32 MB

Controller for ModelMesh

License: Apache License 2.0

Dockerfile 0.43% Makefile 1.14% Go 87.73% Shell 9.51% Python 1.20%

modelmesh-serving's Introduction

ModelMesh Serving

ModelMesh Serving is the Controller for managing ModelMesh, a general-purpose model serving management/routing layer.

Getting Started

To quickly get started with ModelMesh Serving, check out the Quick Start Guide.

For help, please open an issue in this repository.

Components and their Repositories

ModelMesh Serving currently comprises components spread over a number of repositories. The supported versions for the latest release are documented here.

Issues across all components are tracked centrally in this repo.

Core Components

https://github.com/kserve/modelmesh-serving (this repo) - the model serving controller
https://github.com/kserve/modelmesh - the ModelMesh containers used for orchestrating model placement and routing

Runtime Adapters

modelmesh-runtime-adapter - the containers which run in each model serving pod and act as an intermediary between ModelMesh and third-party model-server containers. Its build produces a single "multi-purpose" image which can be used as an adapter to work with each of the out-of-the-box supported model servers. It also incorporates the "puller" logic which is responsible for retrieving the models from storage before handing over to the respective adapter logic to load the model (and to delete after unloading). This image is also used for a container in the load/unload path of custom ServingRuntime Pods, as a "standalone" puller.

Model Serving runtimes

ModelMesh Serving provides out-of-the-box integration with the following model servers.

triton-inference-server - Nvidia's Triton Inference Server
seldon-mlserver - Seldon's Python MLServer
openVINO-model-server - OpenVINO Model Server
torchserve - TorchServe

ServingRuntime custom resources can be used to add support for other existing or custom-built model servers, see the docs on implementing a custom Serving Runtime

Supplementary

KServe V2 REST Proxy - a reverse-proxy server which translates a RESTful HTTP API into gRPC. This allows sending inference requests using the KServe V2 REST Predict Protocol to ModelMesh models which currently only support the V2 gRPC Predict Protocol.

Libraries

These are helper Java libraries used by the ModelMesh component.

kv-utils - Useful KV store recipes abstracted over etcd and Zookeeper
litelinks-core - RPC/service discovery library based on Apache Thrift, used only for communications internal to ModelMesh.

Contributing

Please read our contributing guide for details on contributing.

Building Images

# Build develop image
make build.develop

# After building the develop image,  build the runtime image
make build

modelmesh-serving's People

Contributors

Stargazers

Watchers

modelmesh-serving's Issues

Standardize KServe multi-model management SPI and add built-in support

For dynamic loading/unloading of models, Triton defines a "Model Repository" API which is described as an extension to the KServe v2 dataplane API.

This includes both REST and gRPC variants of the following API endpoints:

POST v2/repository/index
POST v2/repository/models/${MODEL_NAME}/load
POST v2/repository/models/${MODEL_NAME}/unload

MLServer followed this and have implemented the same API but unfortunately their gRPC service definition uses different service and packages name:

MLServer defines a separate service with name inference.model_repository.ModelRepositoryService
Triton just includes them as additional methods in the same inference.GRPCInferenceService data-plane service

ModelMesh uses these in the built-in modelmesh support for Triton/MLServer to manage models in each Triton instance, but currently the logic is mostly specific to each because of the differing service names and different filesystem layout requirements. Note that only the load/unload methods are used, index isn't required.

It seems that this is an at least de facto standard KServe API for model management so it would make sense to support it as an option for other/custom model server implementations via our built-in adapter, as alternative to implementing the native model-mesh gRPC model runtime SPI.

First though we should decide on the official/standard package and service name to use for the gRPC service, and copy its specification into the KServe repo somewhere.

ModelMesh Quickstart error starting sample predictor

modelmesh-serving-mlserver containers fail startup and are in CrashLoopBackOff state. Error message is:

Error: failed to start container "mlserver": Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: process_linux.go:508: setting cgroup config for procHooks process caused: failed to write "500000": write /sys/fs/cgroup/cpu/kubepods/burstable/podaf518315-5ad0-451b-b437-410b8b93b050/mlserver/cpu.cfs_quota_us: invalid argument: unknown

To Reproduce
Steps to reproduce the behavior:

Followed quickstart guide here: https://github.com/kserve/modelmesh-serving/blob/main/docs/quickstart.md

Running on a mac 11.6

Created minikube cluster: minikube start --memory 8192 --cpus 4 -p kserve-mm
create namespace: kubectl create namespace modelmesh-serving
run quickstart: ./scripts/install.sh --namespace modelmesh-serving --quickstart
Deploy sample model:
kubectl apply -f - <<EOF
apiVersion: serving.kserve.io/v1alpha1
kind: Predictor
metadata:
name: example-mnist-predictor
spec:
modelType:
name: sklearn
path: sklearn/mnist-svm.joblib
storage:
s3:
secretKey: localMinIO
EOF

Pods start crash loop with the error message

Error: failed to start container "mlserver": Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: process_linux.go:508: setting cgroup config for procHooks process caused: failed to write "500000": write /sys/fs/cgroup/cpu/kubepods/burstable/podaf518315-5ad0-451b-b437-410b8b93b050/mlserver/cpu.cfs_quota_us: invalid argument: unknown

Expected behavior

Expect mlserver runtime pod to start and the model to be deployed, based on quickstart guide.

Environment (please complete the following information):

OS: Mac OSX 11.6
minikube v1.22.0
Kubernetes v1.21.2
Docker 20.10.7

Thanks in advance for any insight on the issue

Run Tensorflow model on GPU

I was trying to run the example-tensorflow-mnist on GPU.
To achieve this, I edited the ServingRuntime object adding nvidia.com/gpu: 1 to the spec.containers.resources.limits and spec.containers.resources.requests sections of the tritonserver container and the needed tolerations.
Once done that, I noticed that the example-tensorflow-mnist predictor always failed to be loaded.

This is the error I'm getting from the Predictor:
UNAVAILABLE: Failed to load Model due to Triton runtime error: rpc error: code = Unavailable desc = error reading from server: EOF

Before setting the nvidia.com/gpu: 1 I was able to run an inference on example-tensorflow-mnist.

My goal is to load with success the model and then run an inference using the GPU.

Thanks in advance.

The sed command in some scripts fail to run on Mac OSX 12.1

Describe the bug
The BSD sed on OSX fail with this command sed: 1: "minio-storage-secret.yaml": invalid command code m
These file has the sed -i
./script/delete.sh
./script/install.sh
./script/deploy/iks

To Reproduce

modelmesh_release="v0.8.0-rc0"
wget https://raw.githubusercontent.com/kserve/modelmesh-serving/${modelmesh_release}/config/dependencies/minio-storage-secret.yaml
sed -i "s/controller_namespace/controller_namespace/g" minio-storage-secret.yaml

OS: OSX 12.1

Add support for KServe transformer

The current ModelMesh Serving supports InferenceService with a predictor only. The SKerve transformer concept, providing pre and post processing for predict, should be provided in the case of ModelMesh as well.

The user would be able to apply the same transformer in both the single-model serving in KServer and multi-model serving in ModelMesh.

[Need Help] Model orchestration documentation

Overview

Describe the goal or feature or two, usually in the form of a user story.
As a user, I want to modelmesh help me automatically and efficiently orchestrate models into available runtime servers, so that I no need to care about where model will be placed

Acceptance Criteria

Questions

Sorry for ask this, I am new here and interested this project to do model orchestration. I am not sure now modelmesh already supported or not. I try to find some tutorial documents but still not found anything instead of deploying single model. If it already support, could you help me pin down some documents or guidelines.
I have some questions:
when it provision more runtime server?
how it find suitable runtime server to deploy
how about scalability (I tried with 2 triton runtime server and deploy few models, then I checked there are some model weights downloaded in both triton runtime server and serve)
Sorry for my lacking understanding
Thanks!

Assumptions

Reference

Failed to build develop docker image on a Mac

Describe the bug
Fail to build develop image on Mac. Error message:

Building dev image kserve/modelmesh-controller-develop:feb21e0272b82ed0
[+] Building 4.8s (8/8) FINISHED
 => [internal] load build definition from Dockerfile.develop                                                                                                                              0.0s
 => => transferring dockerfile: 3.64kB                                                                                                                                                    0.0s
 => [internal] load .dockerignore                                                                                                                                                         0.0s
 => => transferring context: 2B                                                                                                                                                           0.0s
 => [internal] load metadata for registry.access.redhat.com/ubi8/ubi-minimal:8.4                                                                                                          0.7s
 => [1/4] FROM registry.access.redhat.com/ubi8/ubi-minimal:8.4@sha256:54ef2173bba7384dc7609e8affbae1c36f8a3ec137cacc0866116d65dd4b9afe                                                    0.0s
 => [internal] load build context                                                                                                                                                         0.0s
 => => transferring context: 116.07kB                                                                                                                                                     0.0s
 => CACHED [2/4] WORKDIR /workspace                                                                                                                                                       0.0s
 => CACHED [3/4] COPY .pre-commit-config.yaml go.mod go.sum ./                                                                                                                            0.0s
 => ERROR [4/4] RUN microdnf install     diffutils     gcc-c++     make     wget     tar     vim     git     python38     nodejs &&     pip3 install pre-commit &&     set -eux;     wge  3.9s
------
.....
#8 3.831
#8 3.833 error: Not enough free space in /var/cache/yum/metadata: needed 159.1 MB, available 0 bytes
#8 3.847 /bin/sh: wget: command not found
#8 3.848 /bin/sh: tar: command not found
#8 3.849 /bin/sh: go: command not found

To Reproduce
run make build.develop

Expected behavior
Successful build message on a Ubuntu looks like:

Successfully built e8efba498870
Successfully tagged kserve/modelmesh-controller-develop:feb21e0272b82ed0

Environment (please complete the following information):

MacBookPro 2015
OSX 11.6
Docker Engine v20.10.8

gRPC example in the quick start guide is yields an error

While running through the quick start guide, I get an error when trying to run the gRPC example under the "perform an inference test" section.

Running the command as written in the guide gives an error:

MODEL_NAME=example-mnist-predictor
grpcurl \
  -plaintext \
  -proto fvt/proto/kfs_inference_v2.proto \
  -d '{ "model_name": "'"${MODEL_NAME}"'", "inputs": [{ "name": "predict", "shape": [1, 64], "datatype": "FP32", "contents": { "fp32_contents": [0.0, 0.0, 1.0, 11.0, 14.0, 15.0, 3.0, 0.0, 0.0, 1.0, 13.0, 16.0, 12.0, 16.0, 8.0, 0.0, 0.0, 8.0, 16.0, 4.0, 6.0, 16.0, 5.0, 0.0, 0.0, 5.0, 15.0, 11.0, 13.0, 14.0, 0.0, 0.0, 0.0, 0.0, 2.0, 12.0, 16.0, 13.0, 0.0, 0.0, 0.0, 0.0, 0.0, 13.0, 16.0, 16.0, 6.0, 0.0, 0.0, 0.0, 0.0, 16.0, 16.0, 16.0, 7.0, 0.0, 0.0, 0.0, 0.0, 11.0, 13.0, 12.0, 1.0, 0.0] }}]}' \
  localhost:8033 \
  inference.GRPCInferenceService.ModelInfer

To Reproduce
Steps to reproduce the behavior:

Install via the quick start guide instructions
Try to do an inference with the sample given for gRPC
See the following error

Failed to process proto source files.: could not parse given files: open fvt/proto/kfs_inference_v2.proto: no such file or directory

Expected behavior
I expect the inference to be returned as documented.

Client Version: 4.5.0-202005291417-9933eb9
Server Version: 4.9.0-rc.5
Kubernetes Version: v1.22.0-rc.0+8719299

When I run the REST example, I do get the expected results for the inference.

Update Triton Runtime Server to support TensorRT 8

Need to support TensorRT 8 model which run in Triton Inference Server 21.11-py3 (or above)

I tried to update Triton Inference Server image to nvcr.io/nvidia/tritonserver:21.09-py3 and got the below error.
It seem to have change in interface then could not connect to the server

Describe your proposed solution

Describe alternatives you have considered

Additional context

ClusterServingRuntime support

KServe supports cluster-scoped ServingRuntimes called ClusterServingRuntimes. These act as the built-in or default serving runtimes accessible to any user/namespace in the cluster. Currently ModelMesh-Serving only considers the the namespace-scoped ServingRunimes. Let's think about how ModelMesh-Serving can handle these cluster-level resources.

Sync ServingRuntime spec between MM and KServe

Currently the CRDs for ServingRuntimes are generated in two places: the kserve/kserve repo and the kserve/modelmesh-serving repo. Although it'd be preferred to have a single source, for now, let's ensure that the specs are synchronized. This way a user won't run into issues when using ServingRuntimes in the different contexts.

List of changes;

KServe has readinessProbes as a part of the Container struct.
- Needs PR in ModelMesh-Serving
MM has PR for introducing multiModel field to SR (#89)
- Once merged, needs PR in `KServe'
KServe has PR for introducing autoPlace field to ModelType/Framework (kserve/kserve#1948)
- Once merged, needs PR in `ModelMesh-Serving'
KServe ServingRuntime and InferenceService Predictor Spec Renames
- Rename Framework/ModelType to ModelFormat on InferenceService CR
- Rename Framework/ModelType to SupportedModelFormat on ServingRuntime CR

Installer from quick start guide yields an error

While trying to run through the quick start guide, the install script runs into an error

Steps to reproduce the behavior:

kubectl create namespace modelmesh-serving
./scripts/install.sh --namespace modelmesh-serving --quickstart
Wait for installer to run
See the following error

Installing ModelMesh Serving built-in runtimes
Error: unknown flag: --load-restrictor
error: no objects passed to apply

Expected behavior

I expect the installer in the quick start guide to run without errors with the provided instructions.

Output from installer run attached.

mminstall.txt

Environment:

Client Version: 4.5.0-202005291417-9933eb9
Server Version: 4.9.0-rc.5
Kubernetes Version: v1.22.0-rc.0+8719299

Additional context
Running on OpenShift 4.9

Create developer document

To help onboard new developers, we should have a document that goes over some of the development processes.

Should include:

How to set up local environment
How to build development image
How to run unit tests and linting

Can also add things like running controller-gen to generate code and manifests.

Establish release process

We should determine how releases should be handled and document the process.

Some notes:

Currently KFServing has a RELEASE_PROCESS.md document (though it is a bit dated) that outlines what needs to be done during a release. Essentially, several image tags are updated in the config yamls to be pinned to the specific release (instead of latest). Then a single yaml file is generated containing all the resources as seen here. There are also workflows for publishing images to Dockerhub for the tagged version when a tag in GitHub is created.
For KFServing, all the generated install yaml files for each version can be seen in the install directory: https://github.com/kubeflow/kfserving/tree/master/install and also as assets under the actual Release on GitHub (e.g. https://github.com/kubeflow/kfserving/releases). I think it's probably fine to forgo the install directory paradigm, and just rely on publishing needed assets under each GitHub release. This will keep things cleaner. For this, we might add what is published internally (install script and tar.gz of the config folder).
We can also adjust the install script to accept a release version as an argument, and the script will pull the config files from the Github release and install as normal. However, the install script already has a --install-config-path argument that can be leveraged.
Should modelmesh and modelmesh-runtime-adapter release cadence and versioning be tied to modelmesh-serving?

Tentative work items:

Github Actions workflow for publishing images when a tag is created.
Script for automating the image tag replacement for the config yamls that will be tied to the release.
Document end to end process for a release.

Import ServingRuntime and InferenceService types from KServe

Types for these resources are currently defined in two places: the kserve repository and the modelmesh-serving repository. Let's investigate importing these types from KServe and using those within modelmesh-serving as a step towards unification.

Increase default gRPC max message size for mlserver runtime

The default of 4MiB is often too small for many use cases. This should be increased in the mlserver serving runtime config.
This was recently made configurable via SeldonIO/MLServer#317. Should probably be 16Mib to match the modelmesh default.

Consolidate KFS Predictor and MM Predictor schema

This is a parent issue for this overall effort. The goal here is to figure out how we can merge and evolve the two different specs moving forward.

Sub issues:

Make GitHub Action workflows more selective

GitHub Action workflows should be more selective on when they run. For example, if just the README or docs are updated, we shouldn't have to run unit tests, although maybe linting is still needed for those. Path filters can be utilized for these scenarios.

Examples do not work

Many examples do not work

Hi,

Thanks for providing modelmesh-serving! I'm trying to use examples provided in ./config/example-predictors/ but I found only example-sklearn-mnist-svm can be accessed. Other examples, such as example-onnx-mnist, example-tensorflow-mnist, example-keras-mnist can't be used. Specifically, clients (both grpc and REST) return "predict() method not implemented" error.

To Reproduce
Steps to reproduce the behavior:

Follow the example in quickstart.md but change the model from example-mnist-predictor to others, such as example-onnx-mnist
Run:

MODEL_NAME=example-tensorflow-mnist
curl -X POST -k http://localhost:8008/v2/models/${MODEL_NAME}/infer -d '{"inputs": [{ "name": "predict", "shape": [1, 64], "datatype": "FP32", "data": [0.0, 0.0, 1.0, 11.0, 14.0, 15.0, 3.0, 0.0, 0.0, 1.0, 13.0, 16.0, 12.0, 16.0, 8.0, 0.0, 0.0, 8.0, 16.0, 4.0, 6.0, 16.0, 5.0, 0.0, 0.0, 5.0, 15.0, 11.0, 13.0, 14.0, 0.0, 0.0, 0.0, 0.0, 2.0, 12.0, 16.0, 13.0, 0.0, 0.0, 0.0, 0.0, 0.0, 13.0, 16.0, 16.0, 6.0, 0.0, 0.0, 0.0, 0.0, 16.0, 16.0, 16.0, 7.0, 0.0, 0.0, 0.0, 0.0, 11.0, 13.0, 12.0, 1.0, 0.0]}]}'

Result:

{"code":2,"message":"inference.GRPCInferenceService/ModelInfer: UNKNOWN: Unexpected <class 'NotImplementedError'>: predict() method not implemented"}

Environment:

OS: Ubuntu 20.04

Triton Python Backend

Is your feature request related to a problem? If so, please describe.

Describe your proposed solution
I was wondering why Modelmesh doesn't support Triton Python backend? If there is no specific reason, can we add that to the roadmap?

Describe alternatives you have considered

Additional context

Error for golangci-lint when using go 1.17

Describe the bug

Error for golangci-lint when using go 1.17
Lint error when upgrade golangci-lint version

To Reproduce
Steps to reproduce the behavior:

➜  modelmesh-serving git:(remove-trainedmodel) ✗ go version
go version go1.17.1 darwin/amd64

When using original v1.32.0 of golangci-lint

➜  modelmesh-serving git:(remove-trainedmodel) ✗ pre-commit run --all-files
golangci-lint............................................................Failed
- hook id: golangci-lint
- exit code: 2

fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0xb01dfacedebac1e pc=0x7fff20425c9e]

runtime stack:
runtime: unexpected return pc for runtime.sigpanic called from 0x7fff20425c9e
stack: frame={sp:0x7ffeefbff218, fp:0x7ffeefbff268} stack=[0x7ffeefb802b8,0x7ffeefbff320)
0x00007ffeefbff118:  0x01007ffeefbff138  0x0000000000000004
0x00007ffeefbff128:  0x000000000000001f  0x00007fff20425c9e
0x00007ffeefbff138:  0x0b01dfacedebac1e  0x0000000000000001
0x00007ffeefbff148:  0x0000000004038831 <runtime.throw+0x0000000000000071>  0x00007ffeefbff1e8
0x00007ffeefbff158:  0x00000000049f4939  0x00007ffeefbff1a0
0x00007ffeefbff168:  0x0000000004038ae8 <runtime.fatalthrow.func1+0x0000000000000048>  0x00000000051fbd40
0x00007ffeefbff178:  0x0000000000000001  0x0000000000000001
0x00007ffeefbff188:  0x00007ffeefbff1e8  0x0000000004038831 <runtime.throw+0x0000000000000071>
0x00007ffeefbff198:  0x00000000051fbd40  0x00007ffeefbff1d8
0x00007ffeefbff1a8:  0x0000000004038a70 <runtime.fatalthrow+0x0000000000000050>  0x00007ffeefbff1b8
0x00007ffeefbff1b8:  0x0000000004038aa0 <runtime.fatalthrow.func1+0x0000000000000000>  0x00000000051fbd40
0x00007ffeefbff1c8:  0x0000000004038831 <runtime.throw+0x0000000000000071>  0x00007ffeefbff1e8
......

Investigated it should be due to golangci-lint version does not match go version, so upgrade to v1.42.1

➜  modelmesh-serving git:(remove-trainedmodel) ✗ pre-commit run --all-files
[INFO] Initializing environment for https://github.com/golangci/golangci-lint.
[INFO] Installing environment for https://github.com/golangci/golangci-lint.
[INFO] Once installed this environment will be reused.
[INFO] This may take a few minutes...
golangci-lint............................................................Failed
- hook id: golangci-lint
- exit code: 1

apis/serving/v1alpha1/predictor_types.go:42:22: fieldalignment: struct with 24 pointer bytes could be 16 (govet)
type S3StorageSource struct {
                     ^
apis/serving/v1alpha1/predictor_types.go:49:12: fieldalignment: struct with 56 pointer bytes could be 48 (govet)
type Model struct {
           ^
apis/serving/v1alpha1/predictor_types.go:144:18: fieldalignment: struct with 72 pointer bytes could be 64 (govet)
type FailureInfo struct {
                 ^
apis/serving/v1alpha1/predictor_types.go:163:22: fieldalignment: struct with 88 pointer bytes could be 80 (govet)
type PredictorStatus struct {
                     ^
apis/serving/v1alpha1/servingruntime_types.go:23:16: fieldalignment: struct with 24 pointer bytes could be 16 (govet)
type ModelType struct {
               ^
apis/serving/v1alpha1/servingruntime_types.go:30:16: fieldalignment: struct with 160 pointer bytes could be 144 (govet)
type Container struct {
               ^
apis/serving/v1alpha1/servingruntime_types.go:79:25: fieldalignment: struct with 136 pointer bytes could be 120 (govet)
type ServingRuntimeSpec struct {
                        ^
apis/serving/v1alpha1/servingruntime_types.go:163:21: fieldalignment: struct of size 424 could be 416 (govet)
type ServingRuntime struct {
                    ^
controllers/config.go:50:13: fieldalignment: struct of size 632 could be 608 (govet)
type Config struct {
            ^
controllers/config.go:78:23: fieldalignment: struct of size 32 could be 24 (govet)
type PrometheusConfig struct {
                      ^
controllers/config.go:99:22: fieldalignment: struct with 152 pointer bytes could be 128 (govet)
type RESTProxyConfig struct {
                     ^
controllers/config.go:121:21: fieldalignment: struct with 80 pointer bytes could be 72 (govet)
type ConfigProvider struct {
                    ^
controllers/config.go:234:27: fieldalignment: struct with 72 pointer bytes could be 64 (govet)
type ResourceRequirements struct {
                          ^
controllers/service_controller.go:68:24: fieldalignment: struct with 128 pointer bytes could be 120 (govet)
type ServiceReconciler struct {
                       ^
controllers/servingruntime_controller.go:59:31: fieldalignment: struct with 136 pointer bytes could be 120 (govet)
type ServingRuntimeReconciler struct {
                              ^
controllers/modelmesh/cluster_config.go:45:20: fieldalignment: struct with 32 pointer bytes could be 24 (govet)
type ClusterConfig struct {
                   ^
controllers/modelmesh/modelmesh.go:38:17: fieldalignment: struct of size 408 could be 376 (govet)
type Deployment struct {
                ^
controllers/modelmesh/endpoint_test.go:22:13: fieldalignment: struct with 48 pointer bytes could be 40 (govet)
	tests := []struct {
	           ^
controllers/modelmesh/model_type_labels_test.go:73:18: fieldalignment: struct with 112 pointer bytes could be 104 (govet)
	tableTests := []struct {
	                ^
controllers/modelmesh/runtime_test.go:107:37: fieldalignment: struct with 24 pointer bytes could be 16 (govet)
var addStorageConfigVolumeTests = []struct {
                                    ^
controllers/modelmesh/util_test.go:30:15: fieldalignment: struct with 80 pointer bytes could be 72 (govet)
var tests = []struct {
              ^
fvt/fvtclient.go:70:16: fieldalignment: struct with 96 pointer bytes could be 80 (govet)
type FVTClient struct {
               ^
pkg/mmesh/etcdrangewatcher.go:73:15: fieldalignment: struct with 32 pointer bytes could be 24 (govet)
type KeyEvent struct {
              ^
pkg/mmesh/grpc_resolver.go:36:22: fieldalignment: struct with 56 pointer bytes could be 48 (govet)
type serviceResolver struct {
                     ^
pkg/mmesh/grpc_resolver.go:85:19: fieldalignment: struct with 64 pointer bytes could be 48 (govet)
type KubeResolver struct {
                  ^
pkg/mmesh/modelmesh_service.go:29:16: fieldalignment: struct of size 104 could be 88 (govet)
type MMService struct {
               ^
pkg/predictor_source/cached_predictor_source.go:45:27: fieldalignment: struct with 16 pointer bytes could be 8 (govet)
type PredictorStreamEvent struct {
                          ^
pkg/predictor_source/cached_predictor_source.go:76:28: fieldalignment: struct of size 120 could be 112 (govet)
type cachedPredictorSource struct {
                           ^
pkg/predictor_source/watchrefresh_predictor_source_test.go:34:18: fieldalignment: struct with 72 pointer bytes could be 40 (govet)
type testWatcher struct {
                 ^
controllers/modelmesh/etcd.go:45:18: unusedwrite: unused write to field ReadOnly (govet)
					volumeMount.ReadOnly = true
					            ^
controllers/modelmesh/etcd.go:46:18: unusedwrite: unused write to field MountPath (govet)
					volumeMount.MountPath = etcdMountPath
					            ^
pkg/mmesh/etcdrangewatcher.go:140:17: unusedwrite: unused write to field found (govet)
								current.found = true
								        ^

Expected behavior
No error

Message: inference.GRPCInferenceService/ModelInfer: INVALID_ARGUMENT: unexpected inference input 'predict' for model 'example-mnist-predictor__ksp-c3597b719f'

Try to update the model type(using tensorflow instead of sklearn) and location following the example , will get this error:

 % grpcurl \
  -plaintext \
  -proto fvt/proto/kfs_inference_v2.proto \
  -d '{ "model_name": "example-mnist-predictor", "inputs": [{ "name": "predict", "shape": [1, 64], "datatype": "FP32", "contents": { "fp32_contents": [0.0, 0.0, 1.0, 11.0, 14.0, 15.0, 3.0, 0.0, 0.0, 1.0, 13.0, 16.0, 12.0, 16.0, 8.0, 0.0, 0.0, 8.0, 16.0, 4.0, 6.0, 16.0, 5.0, 0.0, 0.0, 5.0, 15.0, 11.0, 13.0, 14.0, 0.0, 0.0, 0.0, 0.0, 2.0, 12.0, 16.0, 13.0, 0.0, 0.0, 0.0, 0.0, 0.0, 13.0, 16.0, 16.0, 6.0, 0.0, 0.0, 0.0, 0.0, 16.0, 16.0, 16.0, 7.0, 0.0, 0.0, 0.0, 0.0, 11.0, 13.0, 12.0, 1.0, 0.0] }}]}' \
  localhost:8033 \
  inference.GRPCInferenceService.ModelInfer
ERROR:
  Code: InvalidArgument
  Message: inference.GRPCInferenceService/ModelInfer: INVALID_ARGUMENT: unexpected inference input 'predict' for model 'example-mnist-predictor__ksp-c3597b719f'

Remove TrainedModel CR reconciliation

With the TrainedModel CRD being deprecated in favor of using the InferenceService CRD for both single model and multi model deployments, we can go ahead and remove support for TrainedModel reconciliation. Our current support is still using the old serving.kubeflow.org/v1alpha1 APIGroupVersion anyway.

Investigate storage differences in Predictor vs InferenceService

Currently, the way a user defines a storage location of a model varies between the MM Predictor and KFS InferenceService.

With the InferenceService, a user specifies a storageUri (ex: s3://kfserving-examples/tf-models/mnist).
With the Predictor, a user can specify a storage object, but the path would be it's own key/value. Example:

path: tf-models/mnist
storage:
    s3:
      secretKey: my_storage
      bucket: kfserving-examples

In some cases, the storage object can be omitted in the case of custom runtimes managing their own storage.

Would be good to examine the advantages and disadvantages of these approaches and see how we may consolidate.

Change ModelMesh Serving controller to work with multiple namespaces

The current ModelMesh Serving is namespace scoped, meaning all of its components must exist within a single namespace and only one instance of ModelMesh Serving can be installed per namespace. The idea is to make ModelMesh Serving cluster scoped so that one set of controller components can serve multiple namespaces.

The limitation of having a controller and a set of Serving Runtimes in each namespace means sharing is not possible between namespaces and would cause unnecessarily high resource consumption when ModelMesh Serving is installed in different namespaces.

At a high level, we would like to change Serving Runtimes to be cluster scoped and make the controller to manage resources across namespaces. Here is the doc with more details, including a couple of possible solutions. The doc will be updated whenever decisions are made and implementations are identified.

Triton on-wire compression of request/response on HTTP

Is it possible to leverage Triton client (https://github.com/triton-inference-server/client) features like the on-wire compression of request/response on HTTP using the current /infer endpoint? (https://github.com/triton-inference-server/server/blob/main/docs/inference_protocols.md#compression)

If not, will be implemented on future ModelMesh releases?

Spark MLlib support

MLServer supports Spark MLlib. We should verify this, and add it as a supported framework to the MLServer Serving Runtime. This should be accompanied by functional tests for the framework as well as documentation.

ModelMesh Release Tracker for KServe v0.7.0

The plan is to cut the KServe 0.7 release mid next week. For this release, ModelMesh will be loosely integrated with KServe.

Action Items:

Sync InferenceService CR to ModelMesh Serving controller

Describe the bug

{"level":"error","ts":1640666407.3959372,"logger":"controller.predictor","msg":"Reconciler error","reconciler group":"serving.kserve.io","reconciler kind":"Predictor","name":"example-sklearn-isvc","namespace":"isvc_modelmesh-serving","error":"failed to fetch CR from kubebuilder cache for predictor example-sklearn-isvc: No valid InferenceService predictor framework found","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/root/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:227"}

To Reproduce
Steps to reproduce the behavior:

Create InferenceService CR with modelmesh annotation
ModelMesh controller fails to reconcile with above error

Expected behavior

Screenshots

Environment (please complete the following information):

OS: [e.g. iOS]
Browser [e.g. chrome, safari]
Version: latest modelmesh-controller from master branch

Additional context

Add GCS storage provider to pullman

The ModelMesh Runtime Adapter has been refactored to leverage a go-library named pullman . Currently it supports http and s3 based on the storage providers listed here. Since many KServe users use Google Cloud Storage, let's aim to add pullman support for GCS.

For reference:

GCS Go package: https://pkg.go.dev/cloud.google.com/go/storage
KServe MMS Agent GCS puller logic: https://github.com/kserve/kserve/blob/master/pkg/agent/storage/gcs.go

InferenceService Deployment with ModelMesh Example: fail to send requests from Istio Ingress

/kind bug

What steps did you take and what happened:
I follow the InferenceService Deployment with ModelMesh: example-sklearn-isvc from here

Here is what I have already get, and it has been tested successfully by using the port-forward method as the example instruction.

However, I am considering to achieve serverless function by curl from Ingress gateway with HOST Header.
I have tried different curl commands, but none of them work.

echo ${SERVICE_HOSTNAME} modelmesh-serving.modelmesh-serving:8008

Here is what I get by using kubectl get inferenceservice
example-sklearn-isvc grpc://modelmesh-serving.modelmesh-serving:8033 True 2d9h

curl -v -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/example-sklearn-isvc:predict -d @./isvc-input.json

curl -v -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/example-sklearn-isvc/infer -d @./isvc-input.json

They all show the 404 Not found in the following:
`Trying 192.168.49.2:30413...
TCP_NODELAY set
Connected to 192.168.49.2 (192.168.49.2) port 30413 (#0)
POST /v1/models/example-sklearn-isvc/infer HTTP/1.1
Host: modelmesh-serving.modelmesh-serving
User-Agent: curl/7.68.0
Content-Length: 431
Content-Type: application/x-www-form-urlencoded

upload completely sent off: 431 out of 431 bytes
Mark bundle as not supporting multiuse
HTTP/1.1 404 Not Found
date: Thu, 02 Jun 2022 20:53:01 GMT
server: istio-envoy
connection: close
content-length: 0
Closing connection 0`

Environment:

Istio Version: v1.13.4
Knative Version:v1.4.0
KFServing Version:v0.7.0
Kubeflow version: no use
Kfdef:[k8s_istio/istio_dex/gcp_basic_auth/gcp_iap/aws/aws_cognito/ibm]
Minikube version: v1.25.2
Kubernetes version: (use kubectl version): v1.24.1

Could you figure out where the problem is? Thanks

The install.sh script failed when used with -u

Describe the bug

./scripts/install.sh: line 280: ctrl_ns: unbound variable

To Reproduce

kubectl create ns modelmesh-serving
kubectl create ns tedchang1
kubectl create ns tedchang2
./scripts/install.sh --namespace modelmesh-serving --quickstart -u "tedchang1 tedchang2"

Expected behavior

namespace/tedchang1 labeled
servingruntime.serving.kserve.io/mlserver-0.x created
servingruntime.serving.kserve.io/triton-2.x created
secret/storage-config created
namespace/tedchang2 labeled
servingruntime.serving.kserve.io/mlserver-0.x created
servingruntime.serving.kserve.io/triton-2.x created
secret/storage-config created
Successfully installed ModelMesh Serving!

OS: OSX 12.1

Documentation refresh

Internally, the documentation has been expanded and overhauled. We should go through and sync the docs with this repository.

Related: kserve/website#17

ModelMesh performance benchmarks

Would be good to analyze the performance and perhaps compare with the other current model serving methods available.

Some ideas:

Maximum number of models able to deploy on a specific cluster. Density testing.
Request latency under varying loads or model densities (using both gRPC and REST when supported)

Other ideas and tooling for benchmarking are welcome.

install.sh script failed with "usage: sleep seconds"

Describe the bug
The install.sh failed with this message on OSX 12.x but not in OSX 11.x. It seems that sleep "10s" syntax no long works after the latest OSX update.

Pods found with selector '--field-selector metadata.name=etcd' are not ready yet. Waiting 10 secs...
usage: sleep seconds

To Reproduce
Following the Quick Start guide to run a ./scripts/install.sh --namespace modelmesh-serving --quickstart command.

Expected behavior
Installing ModelMesh Serving built-in runtimes

Environment (please complete the following information):

OS: OSX 12.1

[Discussion] Is there any plan to remove the dependency on an extra etcd cluster?

Is your feature request related to a problem? If so, please describe.

Hi~ I'm new here and interested in this project.

I found that the system depends on an etcd cluster outside of kubernetes. I think it will bring extra cost to maintain the etcd.

So I was wondering is there any plan to remove the dependency on an extra etcd cluster ?

Describe your proposed solution

Maybe we can use another CRD which could describe a deployed model (e.g. PredictorInstance or Model ?). Thus the ModelMesh sidecar could watch/update the CRD and send request to ServiceRuntime. (The mechanism maybe similar to pod in kubernetes)

Describe alternatives you have considered

Additional context

Add functional tests as a check for PRs

Currently, FVTs are able to run on a nightly basis using IBM Toolchains with support for that included in #7. However, we want to use this set of tests as a gate for incoming PRs, so they will need to be run on a per-PR basis.

Toolchains on IBM Cloud appears to have limitations regarding exposing logs and status to external users, so alternative avenues may need to be explored. I see that there is the ability to download log assets from toolchain pipeline runs (in zip format), so I am wondering if there is an API endpoint that can be used to download these logs so that we can rehost them somewhere externally accessible (perhaps on the prow server if prow can be used to invoke the pipeline run?).

Some documentation for Kubeflow infra is noted here: https://github.com/kubeflow/testing#test-infrastructure. They essentially use prow to submit argo workflows for building and running tests.

Another thing to potentially explore is using GitHub Actions self-hosted runners.

FYI @chinhuang007

Add ServingRuntime field to denote modelmesh compatibility

With KServe now using ServingRuntimes, we need to make sure that the ModelMesh-Serving runtime selection logic only selects runtimes that are ModelMesh compatible. While currently, KServe leverages the existence of the field GrpcMultiModelManagementEndpoint to determine this, let's try to make this more explicit with a dedicated boolean field in the SR spec.

Something like modelMeshCompatible, isMMS, supportsModelMesh? Open to suggestions.

We will also have to make sure the SR controller only creates deployments for accessible runtimes with this field se to true.

Add ModelMetadata endpoint support

Users may want to find the input tensor name and/or shape of a particular model. The v2 predict protocol outlines a ModelMetadata API. We should have ModelMesh support this as I believe Triton and MLServer already expose this endpoint.

Related issue and discussion: #104

Get ModelMetadata using gRPC

I have the example-tensorflow-mnist predictor deploy and the triton-2.x pods running to serve the tensorflow model.

I'm trying to get the model metadata using gRPC but I'm facing issues:

MODEL_NAME: example-tensorflow-mnist
gRPC command: grpcurl -plaintext -proto fvt/proto/kfs_inference_v2.proto -d '{ "name": "'"${MODEL_NAME}"'"}' localhost:8033 inference.GRPCInferenceService.ModelMetadata
Error: ERROR: Code: InvalidArgument Message: must include mm-model-id header

Even adding the mm-model-id header I get this error:
ERROR: Code: Unimplemented Message: inference.GRPCInferenceService/ModelMetadata: UNIMPLEMENTED: Method not found or not permitted: inference.GRPCInferenceService/ModelMetadata

Thanks in advance

Model fails to load when using InferenceService storageUri

Describe the bug

To Reproduce
Steps to reproduce the behavior:

Create the following example InferenceService CR

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: mnist 
  annotations:
    serving.kserve.io/deploymentMode: ModelMesh
    serving.kserve.io/secretKey: localMinIO
spec:
  predictor:
    sklearn:
      storageUri: s3://modelmesh-example-models/sklearn/mnist-svm.joblib

Model fails to load on model puller sidecar

2021-12-29T12:56:32.918Z	DEBUG	MLServer Adapter.MLServer Adapter Server	Listing objects from s3	{"bucket": "modelmesh-example-models", "prefix": "/sklearn/mnist-svm.joblib"}
2021-12-29T12:56:32.960Z	DEBUG	MLServer Adapter.MLServer Adapter Server	Ignore downloading s3 object matching part of the path	{"bucket": "modelmesh-example-models", "prefix": "/sklearn/mnist-svm.joblib", "s3_path": "sklearn/mnist-svm.joblib"}
2021-12-29T12:56:32.960Z	ERROR	MLServer Adapter.MLServer Adapter Server.Load Model	Failed to pull model from storage	{"model_id": "mnist__isvc-04bd33724e", "error": "rpc error: code = Unknown desc = Failed to pull model from storage due to error: no objects found for path '/sklearn/mnist-svm.joblib'"}
github.com/kserve/modelmesh-runtime-adapter/internal/proto/mmesh._ModelRuntime_LoadModel_Handler
	/opt/app/internal/proto/mmesh/model-runtime_grpc.pb.go:175
google.golang.org/grpc.(*Server).processUnaryRPC
	/root/go/pkg/mod/google.golang.org/[email protected]/server.go:1217
google.golang.org/grpc.(*Server).handleStream
	/root/go/pkg/mod/google.golang.org/[email protected]/server.go:1540
google.golang.org/grpc.(*Server).serveStreams.func1.2
	/root/go/pkg/mod/google.golang.org/[email protected]/server.go:878

Expected behavior

The model exists on Minio and show load successfully, the issue seems to be the modelPath which was set to sklearn/mnist-svm.joblib in v0.7.0 while from master branch it sets to /sklearn/mnist-svm.joblib which caused the above error.

Screenshots

Environment (please complete the following information):

OS: [e.g. iOS]
Browser [e.g. chrome, safari]
Version: latest modelmesh-controller from master branch

Additional context

Support of model pipeline

Does kserve modelmesh support pipelining models in consecutive steps in the form of inference graph? Some thing like similar technologies like Seldon Inference graph, Ray Deployment Graph?

Add sample Keras model to minio images

Since ModelMesh supports deploying Keras models as noted here, we should ensure that our minio images have a sample Keras model that users can try.

Both the FVT and quickstart image should probably be updated:
kserve/modelmesh-minio-dev-examples
kserve/modelmesh-minio-examples

Add TorchServe as a built-in Serving Runtime

TorchServe will soon support the KServe v2 inference protocol and can support additional types of PyTorch models that Triton does not currently support.

Create a serving runtime for TorchServe with a corresponding ModelMesh adapter: kserve/modelmesh-runtime-adapter#4

The v2 support is still WIP (see kserve/kserve#1870 and pytorch/serve#1190), but I think this torchserve image: jagadeeshj/torchserve-kfsv2:1.0 can be used in the meantime until it's official.

Reconcile KServe InferenceService CRD

ModelMesh-Serving needs to be able to reconcile InferenceServices with the serving.kserve.io/deploymentMode: ModelMesh annotation. If this annotation does not exist, the resource will be ignored (the KServe controller will handle reconciliation).

Related: kserve/kserve#54

For the first iteration, this will be similar to how a TrainedModel is reconciled in ModelMesh-Serving where TrainedModel fields are mapped internally to PredictorSpec fields as shown here.

The goal is for an InferneceService YAML like the following to be applied and the corresponding model deployed using ModelMesh:

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
 name: example-sklearn-mnist-svm
 annotations:
   serving.kserve.io/deploymentMode: ModelMesh
   serving.kserve.io/secret-key: localMinio
Spec:
 predictor:
   sklearn:
     storageUri: s3://sklearn/mnist-svm.joblib

Note that this would be the structure with the InferenceService CRD's current state. There is ongoing and upcoming work to introduce a modelType field to the InferenceService and also work for storage/credential handling.

TensorRT examples and tests

The Triton Inference Server supports TensorRT models and our the Triton Serving Runtime indicates this.

We should include some documentation, examples, and tests to demonstrate this. It probably needs to be investigated if TensorRT will work on Triton without a gpu.

Add functional tests for InferenceServices

The InferenceService CRD is the primary interface for KServe, so we want to ensure that model deployments on ModelMesh work as expected when users use InferenceService CRD.

I think for this, we just need some basic tests:

Deploy an Isvc with the "serving.kserve.io/deploymentMode": "ModelMesh" annotation, and ensure that it becomes ready, and you can perform inference.

Would like to test the following formats:

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: example-sklearn-isvc
  annotations:
    serving.kserve.io/deploymentMode: ModelMesh
    serving.kserve.io/secretKey: localMinIO
spec:
  predictor:
    sklearn:
      storageUri: s3://modelmesh-example-models/sklearn/mnist-svm.joblib

And also using the new format:

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: example-sklearn-isvc2
  annotations:
    serving.kserve.io/deploymentMode: ModelMesh
    serving.kserve.io/secretKey: localMinIO
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn 
      storageUri: s3://modelmesh-example-models/sklearn/mnist-svm.joblib

And once the new storage spec is in (kserve/kserve#1899):

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: example-sklearn-isvc
  annotations:
    serving.kserve.io/deploymentMode: ModelMesh
spec:
  predictor:
    sklearn:
      storage:
        key: localMinIO
        path: sklearn/mnist-svm.joblib
        # schemaPath: null
        parameters:
          bucket: modelmesh-example-models

The run-fvt.yml GitHub actions workflow will probably need to be updated to install the InferenceService CRD onto the minikube cluster before ModelMesh-Serving is installed.

need to remove the 'modelmesh-serving' in k8s configure

Is that right behavior that we need to rollback the kubectl config to its previous status:
Initial status

 % kubectl config view 
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: DATA+OMITTED
    server: https://kubernetes.docker.internal:6443
  name: docker-desktop
contexts:
- context:
    cluster: docker-desktop
    user: docker-desktop
  name: docker-desktop
current-context: docker-desktop
kind: Config
preferences: {}
users:
- name: docker-desktop
  user:
    client-certificate-data: REDACTED
    client-key-data: REDACTED

After install modelmesh serving, it is added namespace: modelmesh-serving to current-context:

% kubectl config view                       
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: DATA+OMITTED
    server: https://kubernetes.docker.internal:6443
  name: docker-desktop
contexts:
- context:
    cluster: docker-desktop
    namespace: modelmesh-serving   <<<<<<<<<<<<< new added 
    user: docker-desktop
  name: docker-desktop
current-context: docker-desktop
kind: Config
preferences: {}
users:
- name: docker-desktop
  user:
    client-certificate-data: REDACTED
    client-key-data: REDACTED

However, after deleting your ModelMesh Serving installation:

 % kubectl config view 
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: DATA+OMITTED
    server: https://kubernetes.docker.internal:6443
  name: docker-desktop
contexts:
- context:
    cluster: docker-desktop
    namespace: modelmesh-serving     <<<<<<<<< It is still here
    user: docker-desktop
  name: docker-desktop
current-context: docker-desktop
kind: Config
preferences: {}
users:
- name: docker-desktop
  user:
    client-certificate-data: REDACTED
    client-key-data: REDACTED

Enable REST support

Currently only gRPC inference requests are supported with ModelMesh. However, there is a need for REST support in order to smoothen the user experience of interacting with ModelMesh models.

Currently, the idea is to have a transcoder container/service that will transcode the REST v2 protocol JSON into the gRPC format ModelMesh expects (and vice versa). The transcoding mappings will have to be explored, and performance will need to be kept in mind.

Update default MLServer serving runtime image

We are currently using mlserver 0.3.2, however the latest is 0.5.2. This should be updated.

kserve / modelmesh-serving Goto Github PK

modelmesh-serving's Introduction

ModelMesh Serving

Getting Started

Components and their Repositories

Core Components

Runtime Adapters

Model Serving runtimes

Supplementary

Libraries

Contributing

Building Images

modelmesh-serving's People

Contributors

Stargazers

Watchers

Forkers

modelmesh-serving's Issues

Overview

Acceptance Criteria

Questions

Assumptions

Reference

Recommend Projects

Recommend Topics

Recommend Org