seldonio / seldon-core

An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models

Home Page: https://www.seldon.io/tech/products/core/

License: Other

Makefile 1.49% Java 0.90% Shell 1.48% Smarty 0.01% Python 14.52% Jupyter Notebook 15.37% R 0.23% Dockerfile 0.43% JavaScript 1.40% Go 12.92% C++ 10.54% Starlark 0.15% HTML 40.49% CMake 0.05% Mustache 0.02% Jinja 0.03%
kubernetes machine-learning deployment serving mlops aiops machine-learning-operations production-machine-learning

seldon-core's Issues

Create integration testing script

  • Creates a GCP Kubernetes cluster
  • Launches a series of deployments and tests the gRPC and REST endpoints of each
  • Deletes the cluster

Look at integrating this into releases and pull requests, either manually or automatically.
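A minimal sketch of the flow described above, assuming the `gcloud` and `kubectl` CLIs are installed and authenticated. The function name, the `endpoint_checks` callables and the injectable `run` hook are hypothetical; the point is the ordering, and that the cluster is deleted in a `finally` block so a failed deployment or endpoint check does not leave a billable cluster behind.

```python
import subprocess
from typing import Callable, List, Sequence

Command = List[str]


def execute(cmd: Command) -> None:
    """Run a CLI command, raising CalledProcessError if it exits non-zero."""
    subprocess.run(cmd, check=True)


def run_integration_tests(
    cluster: str,
    zone: str,
    manifests: Sequence[str],
    endpoint_checks: Sequence[Callable[[str], None]],
    run: Callable[[Command], None] = execute,
) -> None:
    """Create a GKE cluster, deploy each manifest, check its endpoints, then delete the cluster."""
    # `gcloud container clusters create` also points kubectl at the new cluster.
    run(["gcloud", "container", "clusters", "create", cluster, "--zone", zone])
    try:
        for manifest in manifests:
            run(["kubectl", "apply", "-f", manifest])
            # Each check is a hypothetical callable that exercises the REST or
            # gRPC endpoint of the deployment described by `manifest`.
            for check in endpoint_checks:
                check(manifest)
    finally:
        # Always tear the cluster down, even if a deployment or check failed.
        run(["gcloud", "container", "clusters", "delete", cluster, "--zone", zone, "--quiet"])
```

Passing a recorder such as `calls.append` as `run` gives a dry run that only collects the commands, which is also a convenient way to test the script itself.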

Support deployment of a Python 3 model

I was looking at deploying one of the examples (iris) with Python 3, but I don't think it's possible at the moment.

I changed the base image with --base-image=python:3, but the image cannot be built because one of the requirements in seldon_requirements.txt, specifically grpc, only works on Python 2:

Step 14/19 : RUN cd /tmp &&     pip install --no-cache-dir -r seldon_requirements.txt &&     pip install --no-cache-dir -r requirements.txt
 ---> Running in ee334e1dbc4f
Collecting numpy==1.11.2 (from -r seldon_requirements.txt (line 1))
  Downloading numpy-1.11.2.tar.gz (4.2MB)
Collecting pandas==0.18.1 (from -r seldon_requirements.txt (line 2))
  Downloading pandas-0.18.1.tar.gz (7.3MB)
Collecting grpc==0.3.post19 (from -r seldon_requirements.txt (line 3))
  Downloading grpc-0.3-19.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-build-he7l6xxx/grpc/setup.py", line 7, in <module>
        version_tuple = __import__('grpc').VERSION
      File "/tmp/pip-build-he7l6xxx/grpc/grpc/__init__.py", line 6, in <module>
        from .rpc import *
      File "/tmp/pip-build-he7l6xxx/grpc/grpc/rpc.py", line 141
        except OSError, ex:
                      ^
    SyntaxError: invalid syntax

I was also wondering whether this requirement could simply be removed, since it is a different RPC library, not grpc.io, which is also listed in seldon_requirements.txt.
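If the Python 2-only `grpc` package really is unused, dropping it could be as simple as filtering it out of the requirements. A minimal sketch: `drop_requirement` is a hypothetical helper, and `grpcio` stands in for the grpc.io entry, whose pinned version is not shown in the issue. It compares whole package names, so removing `grpc` leaves `grpcio` in place.

```python
import re
from typing import List

# Matches the package name at the start of a requirements line,
# e.g. "grpc" in "grpc==0.3.post19" or "grpcio" in "grpcio>=1.0".
_NAME = re.compile(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def drop_requirement(lines: List[str], name: str) -> List[str]:
    """Return the requirement lines without the package `name` (case-insensitive).

    Comments and blank lines do not match the pattern and are kept unchanged.
    """
    kept = []
    for line in lines:
        match = _NAME.match(line)
        if match and match.group(1).lower() == name.lower():
            continue
        kept.append(line)
    return kept


# The first three lines come from the build log in this issue.
reqs = ["numpy==1.11.2", "pandas==0.18.1", "grpc==0.3.post19", "grpcio"]
print(drop_requirement(reqs, "grpc"))
```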

Graph with epsilon greedy router sometimes fails on first request

The graph can be found in notebooks/resources/epsilon_greedy.json

When the graph has just been deployed, the first request takes a long time and sometimes fails. From the second request onward, everything works fine.

The microservice logs show that the first request was sent four times to the router and to each of the models after it. The issue can be reproduced by following the epsilon greedy example notebook:
notebooks/epsilon_greedy.ipynb
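For readers unfamiliar with the routing strategy under test, here is a generic epsilon-greedy sketch. It is not the router image referenced by notebooks/resources/epsilon_greedy.json; the class and method names are hypothetical. With probability `epsilon` it explores a random child, otherwise it exploits the child with the best mean reward reported through feedback.

```python
import random
from typing import List, Optional


class EpsilonGreedyRouter:
    """Generic epsilon-greedy choice between the children of a router node."""

    def __init__(self, n_branches: int, epsilon: float = 0.1, seed: Optional[int] = None):
        self.epsilon = epsilon
        self.rewards: List[float] = [0.0] * n_branches
        self.counts: List[int] = [0] * n_branches
        self.rng = random.Random(seed)

    def route(self) -> int:
        """Explore a random branch with probability epsilon, else exploit the best one."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.counts))
        return self.best_branch()

    def best_branch(self) -> int:
        # Mean reward per branch; unvisited branches count as 0.0 and ties go to the lowest index.
        means = [r / c if c else 0.0 for r, c in zip(self.rewards, self.counts)]
        return max(range(len(means)), key=means.__getitem__)

    def send_feedback(self, branch: int, reward: float) -> None:
        """Record the reward (e.g. 1.0 for a correct prediction) for a routed request."""
        self.counts[branch] += 1
        self.rewards[branch] += reward
```

Because routing decisions depend on accumulated feedback, a request that is silently retried (as the four deliveries in the logs suggest) can skew these counts, so duplicates are worth ruling out when debugging.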

Create wrapper for PyTorch models

There are already issues for creating R and Spark wrappers, so I'm adding PyTorch here too, as it was raised in the community Slack earlier today. This would be a great first PR for anyone looking to contribute!

  • Expose PyTorch models through a thin REST or gRPC server that respects the internal model API
  • Build a Docker image
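A minimal sketch of the wrapper's model-facing side, assuming the internal model API is the "class with a `predict(X, feature_names)` method" convention used by Seldon's Python wrapper. `PyTorchModelWrapper` is a hypothetical name, and to keep the sketch runnable without PyTorch, the model is any callable on a batch of rows; the docstring notes where real PyTorch calls would go.

```python
from typing import Callable, List, Sequence

Batch = List[List[float]]


class PyTorchModelWrapper:
    """Hypothetical thin wrapper exposing a model through a predict() method.

    With real PyTorch, `model` would be a torch.nn.Module switched to eval()
    and called on torch.as_tensor(X, dtype=torch.float32) inside
    torch.no_grad(), returning output.numpy(). Here it is any callable on a
    batch so the sketch runs without PyTorch installed.
    """

    def __init__(self, model: Callable[[Batch], Batch]):
        self.model = model

    def predict(self, X: Sequence[Sequence[float]], feature_names: Sequence[str]) -> Batch:
        # Normalise the incoming rows to floats before handing them to the model.
        batch = [[float(v) for v in row] for row in X]
        return self.model(batch)


# Usage with a stand-in "model" that sums each row's features.
wrapper = PyTorchModelWrapper(lambda batch: [[sum(row)] for row in batch])
print(wrapper.predict([[1, 2], [3, 4]], ["f1", "f2"]))
```

Keeping the model behind a plain `predict` method means the REST or gRPC layer and the Docker image can be shared with the other language wrappers rather than written per framework.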

Insufficient cpu error when creating complex graphs

When applying the SeldonDeployment below on a GCP cluster, the main pod stays in Pending status (the canary pod runs) and shows the following warning:

Warning FailedScheduling No nodes are available that match all of the predicates: Insufficient cpu (5).

The cluster has 5 nodes.

seldon deployment JSON:

{
  "apiVersion": "machinelearning.seldon.io/v1alpha1",
  "kind": "SeldonDeployment",
  "metadata": {
    "labels": {
      "app": "seldon"
    },
    "name": "seldon-deployment-example"
  },
  "spec": {
    "annotations": {
      "project-name": "FX Market Prediction",
      "deployment_version": "v1"
    },
    "name": "test-deployment-complex",
    "oauth_key": "oauth-key",
    "oauth_secret": "oauth-secret",
    "predictors": [
      {
        "componentSpec": {
          "spec": {
            "containers": [
              {
                "image": "seldonio/mean_classifier:0.6",
                "imagePullPolicy": "IfNotPresent",
                "name": "mean-classifier-1",
                "resources": {
                  "requests": {
                    "memory": "1Mi",
                    "cpu": "0.1"
                  }
                }
              },
              {
                "image": "seldonio/mean_classifier:0.6",
                "imagePullPolicy": "IfNotPresent",
                "name": "mean-classifier-2",
                "resources": {
                  "requests": {
                    "memory": "1Mi",
                    "cpu": "0.1"
                  }
                }
              },
              {
                "image": "seldonio/mean_classifier:0.6",
                "imagePullPolicy": "IfNotPresent",
                "name": "mean-classifier-3",
                "resources": {
                  "requests": {
                    "memory": "1Mi",
                    "cpu": "0.1"
                  }
                }
              },
              {
                "image": "seldonio/mock_outlier_detector:1.0",
                "imagePullPolicy": "IfNotPresent",
                "name": "outlier-detector",
                "resources": {
                  "requests": {
                    "memory": "1Mi",
                    "cpu": "0.1"
                  }
                }
              },
              {
                "image": "seldonio/mock_transformer:1.0",
                "imagePullPolicy": "IfNotPresent",
                "name": "mean-transformer",
                "resources": {
                  "requests": {
                    "memory": "1Mi",
                    "cpu": "0.1"
                  }
                }
              }
            ],
            "terminationGracePeriodSeconds": 20
          }
        },
        "name": "fx-market-predictor",
        "replicas": 1,
        "annotations": {
          "predictor_version": "v1"
        },
        "graph": {
          "name": "outlier-detector",
          "type": "TRANSFORMER",
          "endpoint": {
            "type": "REST"
          },
          "children": [
            {
              "name": "random-abtest",
              "implementation": "RANDOM_ABTEST",
              "type": "UNKNOWN_TYPE",
              "children": [
                {
                  "name": "mean-transformer",
                  "type": "TRANSFORMER",
                  "endpoint": {
                    "type": "REST"
                  },
                  "children": [
                    {
                      "name": "mean-classifier-1",
                      "type": "MODEL",
                      "endpoint": {
                        "type": "REST"
                      }
                    }
                  ]
                },
                {
                  "name": "ensemble",
                  "type": "UNKNOWN_TYPE",
                  "implementation": "AVERAGE_COMBINER",
                  "children": [
                    {
                      "name": "mean-classifier-2",
                      "type": "MODEL",
                      "endpoint": {
                        "type": "REST"
                      }
                    },
                    {
                      "name": "mean-classifier-3",
                      "type": "MODEL",
                      "endpoint": {
                        "type": "REST"
                      }
                    }
                  ]
                }
              ]
            }
          ]
        }
      },
      {
        "componentSpec": {
          "spec": {
            "containers": [
              {
                "image": "seldonio/mean_classifier:0.6",
                "imagePullPolicy": "IfNotPresent",
                "name": "mean-classifier",
                "resources": {
                  "requests": {
                    "memory": "1Mi",
                    "cpu": "0.1"
                  }
                }
              },
              {
                "image": "seldonio/mock_transformer:1.0",
                "imagePullPolicy": "IfNotPresent",
                "name": "mean-transformer",
                "resources": {
                  "requests": {
                    "memory": "1Mi",
                    "cpu": "0.1"
                  }
                }
              }
            ],
            "terminationGracePeriodSeconds": 20
          }
        },
        "name": "fx-market-predictor-canary",
        "replicas": 1,
        "annotations": {
          "predictor_version": "v1"
        },
        "graph": {
          "name": "mean-transformer",
          "type": "TRANSFORMER",
          "endpoint": {
            "type": "REST"
          },
          "children": [
            {
              "name": "mean-classifier",
              "endpoint": {
                "type": "REST"
              },
              "type": "MODEL"
            }
          ]
        }
      }
    ]
  }
}
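One way to start diagnosing the scheduling failure is to total the CPU each predictor's pod requests, since all containers in a pod must fit on a single node. In the deployment above, the main predictor's five containers request 0.5 CPU in total, against 0.2 for the canary. A minimal sketch with hypothetical helper names: `parse_cpu` handles only plain decimals and the millicore `m` suffix, and any containers the cluster manager injects into the pod are not counted.

```python
from decimal import Decimal
from typing import Dict


def parse_cpu(quantity) -> Decimal:
    """Parse a CPU quantity such as "0.1", "100m" or "2" into cores (subset of Kubernetes syntax)."""
    text = str(quantity)
    if text.endswith("m"):
        return Decimal(text[:-1]) / 1000
    return Decimal(text)


def cpu_requests_per_predictor(deployment: Dict) -> Dict[str, Decimal]:
    """Sum the CPU requests of all containers declared in each predictor's componentSpec."""
    totals = {}
    for predictor in deployment["spec"]["predictors"]:
        containers = predictor["componentSpec"]["spec"]["containers"]
        totals[predictor["name"]] = sum(
            (parse_cpu(c.get("resources", {}).get("requests", {}).get("cpu", "0"))
             for c in containers),
            Decimal(0),
        )
    return totals
```

Loading the JSON above with `json.load` and passing it in should report 0.5 for fx-market-predictor and 0.2 for fx-market-predictor-canary; comparing the larger figure with each node's allocatable CPU (for example via `kubectl describe nodes`) shows whether any node can fit the main pod.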

Get rid of name property in spec

The real identifier for a Seldon Deployment is in the metadata. The spec.name attribute is redundant.
The cluster manager should not use it anymore.
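The cluster manager itself is written in Java, so this Python sketch only illustrates the proposed rule: identify a deployment by `metadata.name` and treat `spec.name` as droppable. Both helper names are hypothetical.

```python
import copy
from typing import Dict


def deployment_id(deployment: Dict) -> str:
    """Identify a SeldonDeployment by metadata.name, ignoring spec.name."""
    return deployment["metadata"]["name"]


def without_spec_name(deployment: Dict) -> Dict:
    """Return a copy of the deployment with the redundant spec.name removed."""
    cleaned = copy.deepcopy(deployment)
    cleaned.get("spec", {}).pop("name", None)
    return cleaned
```

For the FX example in the previous issue, `deployment_id` returns `seldon-deployment-example`, while the now-unused `spec.name` was `test-deployment-complex`.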

Review status field for CRD

Should we add a top-level state showing when the SeldonDeployment is running normally, for example when replicasAvailable equals replicas for every predictor?
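A minimal sketch of the proposed rule. The state names `Available` and `Creating` and the shape of the per-predictor status entries are hypothetical, since the CRD status schema is exactly what this issue is reviewing.

```python
from typing import Dict, List


def deployment_state(predictor_status: List[Dict]) -> str:
    """Derive a top-level state from per-predictor replica counts.

    "Available" when every predictor has replicasAvailable == replicas,
    otherwise "Creating". An empty list is treated as not yet available.
    """
    if not predictor_status:
        return "Creating"
    ready = all(p.get("replicasAvailable", 0) == p["replicas"] for p in predictor_status)
    return "Available" if ready else "Creating"
```

Using strict equality mirrors the wording of the proposal; a real controller may prefer `>=` so that a brief surplus of replicas during a rollout is not reported as creating.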

Cluster manager stuck in an error loop after failed deployment

To reproduce:

  • Apply a SeldonDeployment whose resource request is malformed, for example:
"requests": {
 "cpu": 0.1
}

(0.1 should be a string, not a float.)
  • Delete the SeldonDeployment. The cluster manager is now broken and repeatedly logs the following error:

2018-01-17 15:56:24.385 ERROR 5 --- [pool-1-thread-1] o.s.s.s.TaskUtils$LoggingErrorHandler    : Unexpected error occurred in scheduled task.

com.google.protobuf.InvalidProtocolBufferException: Can't decode io.kubernetes.client.proto.resource.Quantity from 0.1
	at io.seldon.clustermanager.pb.QuantityUtils$QuantityParser.merge(QuantityUtils.java:63) ~[classes!/:0.3.1]
	at io.seldon.clustermanager.pb.JsonFormat$ParserImpl.merge(JsonFormat.java:1241) ~[classes!/:0.3.1]
	at io.seldon.clustermanager.pb.JsonFormat$ParserImpl.parseFieldValue(JsonFormat.java:1797) ~[classes!/:0.3.1]
	at io.seldon.clustermanager.pb.JsonFormat$ParserImpl.mergeMapField(JsonFormat.java:1484) ~[classes!/:0.3.1]
	at io.seldon.clustermanager.pb.JsonFormat$ParserImpl.mergeField(JsonFormat.java:1458) ~[classes!/:0.3.1]
	at io.seldon.clustermanager.pb.JsonFormat$ParserImpl.mergeMessage(JsonFormat.java:1294) ~[classes!/:0.3.1]
	at io.seldon.clustermanager.pb.JsonFormat$ParserImpl.merge(JsonFormat.java:1252) ~[classes!/:0.3.1]
	at io.seldon.clustermanager.pb.JsonFormat$ParserImpl.parseFieldValue(JsonFormat.java:1797) ~[classes!/:0.3.1]
	at io.seldon.clustermanager.pb.JsonFormat$ParserImpl.mergeField(JsonFormat.java:1462) ~[classes!/:0.3.1]
	at io.seldon.clustermanager.pb.JsonFormat$ParserImpl.mergeMessage(JsonFormat.java:1294) ~[classes!/:0.3.1]
	at io.seldon.clustermanager.pb.JsonFormat$ParserImpl.merge(JsonFormat.java:1252) ~[classes!/:0.3.1]
	at io.seldon.clustermanager.pb.JsonFormat$ParserImpl.parseFieldValue(JsonFormat.java:1797) ~[classes!/:0.3.1]
	at io.seldon.clustermanager.pb.JsonFormat$ParserImpl.mergeRepeatedField(JsonFormat.java:1541) ~[classes!/:0.3.1]
	at io.seldon.clustermanager.pb.JsonFormat$ParserImpl.mergeField(JsonFormat.java:1460) ~[classes!/:0.3.1]
	at io.seldon.clustermanager.pb.JsonFormat$ParserImpl.mergeMessage(JsonFormat.java:1294) ~[classes!/:0.3.1]
	at io.seldon.clustermanager.pb.JsonFormat$ParserImpl.merge(JsonFormat.java:1252) ~[classes!/:0.3.1]
	at io.seldon.clustermanager.pb.JsonFormat$ParserImpl.parseFieldValue(JsonFormat.java:1797) ~[classes!/:0.3.1]
	at io.seldon.clustermanager.pb.JsonFormat$ParserImpl.mergeField(JsonFormat.java:1462) ~[classes!/:0.3.1]
	at io.seldon.clustermanager.pb.JsonFormat$ParserImpl.mergeMessage(JsonFormat.java:1294) ~[classes!/:0.3.1]
	at io.seldon.clustermanager.pb.JsonFormat$ParserImpl.merge(JsonFormat.java:1252) ~[classes!/:0.3.1]
	at io.seldon.clustermanager.pb.JsonFormat$ParserImpl.parseFieldValue(JsonFormat.java:1797) ~[classes!/:0.3.1]
	at io.seldon.clustermanager.pb.JsonFormat$ParserImpl.mergeField(JsonFormat.java:1462) ~[classes!/:0.3.1]
	at io.seldon.clustermanager.pb.JsonFormat$ParserImpl.mergeMessage(JsonFormat.java:1294) ~[classes!/:0.3.1]
	at io.seldon.clustermanager.pb.JsonFormat$ParserImpl.merge(JsonFormat.java:1252) ~[classes!/:0.3.1]
	at io.seldon.clustermanager.pb.JsonFormat$ParserImpl.parseFieldValue(JsonFormat.java:1797) ~[classes!/:0.3.1]
	at io.seldon.clustermanager.pb.JsonFormat$ParserImpl.mergeRepeatedField(JsonFormat.java:1541) ~[classes!/:0.3.1]
	at io.seldon.clustermanager.pb.JsonFormat$ParserImpl.mergeField(JsonFormat.java:1460) ~[classes!/:0.3.1]
	at io.seldon.clustermanager.pb.JsonFormat$ParserImpl.mergeMessage(JsonFormat.java:1294) ~[classes!/:0.3.1]
	at io.seldon.clustermanager.pb.JsonFormat$ParserImpl.merge(JsonFormat.java:1252) ~[classes!/:0.3.1]
	at io.seldon.clustermanager.pb.JsonFormat$ParserImpl.parseFieldValue(JsonFormat.java:1797) ~[classes!/:0.3.1]
	at io.seldon.clustermanager.pb.JsonFormat$ParserImpl.mergeField(JsonFormat.java:1462) ~[classes!/:0.3.1]
	at io.seldon.clustermanager.pb.JsonFormat$ParserImpl.mergeMessage(JsonFormat.java:1294) ~[classes!/:0.3.1]
	at io.seldon.clustermanager.pb.JsonFormat$ParserImpl.merge(JsonFormat.java:1252) ~[classes!/:0.3.1]
	at io.seldon.clustermanager.pb.JsonFormat$ParserImpl.merge(JsonFormat.java:1126) ~[classes!/:0.3.1]
	at io.seldon.clustermanager.pb.JsonFormat$Parser.merge(JsonFormat.java:305) ~[classes!/:0.3.1]
	at io.seldon.clustermanager.k8s.SeldonDeploymentUtils.jsonToSeldonDeployment(SeldonDeploymentUtils.java:44) ~[classes!/:0.3.1]
	at io.seldon.clustermanager.k8s.SeldonDeploymentWatcher.watchSeldonMLDeployments(SeldonDeploymentWatcher.java:123) ~[classes!/:0.3.1]
	at io.seldon.clustermanager.k8s.SeldonDeploymentWatcher.watch(SeldonDeploymentWatcher.java:146) ~[classes!/:0.3.1]
	at sun.reflect.GeneratedMethodAccessor112.invoke(Unknown Source) ~[na:na]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_131]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_131]
	at org.springframework.scheduling.support.ScheduledMethodRunnable.run(ScheduledMethodRunnable.java:65) ~[spring-context-4.3.6.RELEASE.jar!/:4.3.6.RELEASE]
	at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54) ~[spring-context-4.3.6.RELEASE.jar!/:4.3.6.RELEASE]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_131]
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_131]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_131]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_131]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_131]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_131]
	at java.lang.Thread.run(Thread.java:748) [na:1.8.0_131]
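Until the cluster manager tolerates numeric quantities, a pre-apply check could catch them. A minimal sketch with a hypothetical helper name: it walks every container's `requests` and `limits` and reports each value written as a JSON number rather than a string, which is the form that fails to decode as a Quantity in the trace above.

```python
from typing import Dict, List


def non_string_quantities(deployment: Dict) -> List[str]:
    """List resource quantities written as JSON numbers instead of strings."""
    problems = []
    for p_idx, predictor in enumerate(deployment["spec"]["predictors"]):
        containers = predictor["componentSpec"]["spec"]["containers"]
        for container in containers:
            resources = container.get("resources", {})
            for kind in ("requests", "limits"):
                for resource, value in resources.get(kind, {}).items():
                    # Quantities are expected as strings such as "0.1" or "100m".
                    if not isinstance(value, str):
                        problems.append(
                            f"predictors[{p_idx}] container {container.get('name', '?')}: "
                            f"{kind}.{resource}={value!r} should be a string"
                        )
    return problems
```

Running this on the parsed JSON before `kubectl apply` and refusing to apply when the list is non-empty keeps the malformed resource out of the cluster; it does not fix the underlying loop, which still needs the watcher to skip resources it cannot decode.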

gitops demo

https://github.com/errordeveloper/seldon-gitops

The setup steps haven't been documented in the repo yet; I'm happy to walk through them.

One thing to note: I've extended the work I already contributed in #51. Namely, I added a GCB config and optimised the build to use the cache properly.
