
Osiris - A general-purpose, scale-to-zero component for Kubernetes


Osiris enables greater resource efficiency within a Kubernetes cluster by allowing idling workloads to automatically scale to zero and allowing scaled-to-zero workloads to be automatically re-activated on demand by inbound requests.

Osiris, as a concept, is highly experimental and currently remains under heavy development.

How it works

Various types of Kubernetes resources can be Osiris-enabled using an annotation.

Osiris-enabled pods are automatically instrumented with a metrics-collecting proxy deployed as a sidecar container.

Osiris-enabled deployments (if already scaled to a configurable minimum number of replicas-- one by default) automatically have metrics from their pods continuously scraped and analyzed by the zeroscaler component. When the aggregated metrics reveal that all of the deployment's pods are idling, the zeroscaler scales the deployment to zero replicas.

Under normal circumstances, scaling a deployment to zero replicas poses a problem: any services that select pods from that deployment (and only that deployment) would lose all of their endpoints and become permanently unavailable. Osiris-enabled services, however, have their endpoints managed by the Osiris endpoints controller (instead of Kubernetes' built-in endpoints controller). The Osiris endpoints controller will automatically add Osiris activator endpoints to any Osiris-enabled service that has lost the rest of its endpoints.

The Osiris activator component receives traffic for Osiris-enabled services that are lacking any application endpoints. The activator initiates a scale-up of a corresponding deployment to a configurable minimum number of replicas (one, by default). When at least one application pod becomes ready, the request will be forwarded to the pod.

After the activator "reactivates" the deployment, the endpoints controller (described above) will naturally observe the availability of application endpoints for any Osiris-enabled services that select those pods and will remove activator endpoints from that service. All subsequent traffic for the service will, once again, flow directly to application pods... until a period of inactivity causes the zeroscaler to take the application offline again.

Scaling to zero and the HPA

Osiris is designed to work alongside the Horizontal Pod Autoscaler and is not meant to replace it-- it will scale your pods from n to 0 and from 0 to n, where n is a configurable minimum number of replicas (one, by default). All other scaling decisions may be delegated to an HPA, if desired.
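
For illustration, here is a minimal sketch of pairing an HPA with an Osiris-enabled deployment (the my-app and my-namespace names are hypothetical, reusing those from the Usage section below); Osiris covers the 0-to-1 and 1-to-0 transitions, while the HPA scales between 1 and 5 replicas based on CPU:

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  namespace: my-namespace
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app    # the Osiris-enabled deployment
  minReplicas: 1    # matches the Osiris minimum, so the two don't conflict
  maxReplicas: 5
  targetCPUUtilizationPercentage: 85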

This diagram better illustrates the different roles of Osiris, the HPA and the Cluster Autoscaler:

[diagram]

Setup

Prerequisites:

  • Helm (v3.0.0 or greater)
  • A running Kubernetes cluster.

Installation

Osiris' Helm chart is hosted in an Azure Container Registry, which does not yet support anonymous access to charts therein. Until this is resolved, adding the Helm repository from which Osiris can be installed requires use of a shared set of read-only credentials.

Make sure Helm is initialized in your running Kubernetes cluster.

For more details on initializing Helm, see the Helm documentation.

helm repo add osiris https://osiris.azurecr.io/helm/v1/repo \
  --username eae9749a-fccf-4a24-ac0d-6506fe2a6ab3 \
  --password =s-e.2-84BhIo6LM6=/l4C_sFzxb=sT[

Installation requires use of the --devel flag to indicate that pre-release versions of the specified chart are eligible for download and installation. The following commands install the latest version of Osiris with the default values for all options; see the next section for all available installation options.

kubectl create namespace osiris-system
helm install osiris osiris/osiris-edge \
  --namespace osiris-system \
  --devel

Installation Options

Osiris's global configuration is minimal, because most configuration is done by users via annotations on their Kubernetes resources.

The following table lists the configurable parameters of the Helm chart and their default values.

| Parameter | Description | Default |
| --------- | ----------- | ------- |
| zeroscaler.metricsCheckInterval | The interval, in seconds, at which the zeroscaler checks the pods' HTTP request metrics. Note that this can also be set on a per-deployment basis, with an annotation. | 150 |

Example of installation with Helm and a custom configuration:

kubectl create namespace osiris-system
helm install osiris osiris/osiris-edge \
  --namespace osiris-system \
  --devel \
  --set zeroscaler.metricsCheckInterval=600

Usage

Osiris will not affect the normal behavior of any Kubernetes resource without explicitly being directed to do so.

To enable the zeroscaler to scale a deployment with idling pods to zero replicas, annotate the deployment like so:

apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: my-namespace
  name: my-app
  annotations:
    osiris.deislabs.io/enabled: "true"
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
      annotations:
        osiris.deislabs.io/enabled: "true"
    # ...
  # ...

Note that the template for the pod also uses an annotation to enable Osiris-- in this case, it enables the metrics-collecting proxy sidecar container on all of the deployment's pods.

In Kubernetes, there is no direct relationship between deployments and services. Deployments manage pods and services may select pods managed by one or more deployments. Rather than attempt to infer relationships between deployments and services and potentially impact service behavior without explicit consent, Osiris requires services to explicitly opt-in to management by the Osiris endpoints controller. Such services must also utilize an annotation to indicate which deployment should be reactivated when the activator component intercepts a request on their behalf. For example:

kind: Service
apiVersion: v1
metadata:
  namespace: my-namespace
  name: my-app
  annotations:
    osiris.deislabs.io/enabled: "true"
    osiris.deislabs.io/deployment: my-app
spec:
  selector:
    app: my-app
  # ...

Configuration

Most of Osiris's configuration is done with Kubernetes annotations, as seen in the Usage section.

Deployment Annotations

The following table lists the supported annotations for Kubernetes Deployments and their default values.

| Annotation | Description | Default |
| ---------- | ----------- | ------- |
| osiris.deislabs.io/enabled | Enables the zeroscaler component to scrape and analyze metrics from the deployment's pods and scale the deployment to zero when idle. Allowed values: y, yes, true, on, 1. | no value (= disabled) |
| osiris.deislabs.io/minReplicas | The minimum number of replicas to set on the deployment when Osiris scales it up. If you set 2, Osiris will scale the deployment from 0 directly to 2 replicas. Osiris won't collect metrics from deployments that have more than minReplicas replicas, to avoid useless metrics collection. | 1 |
| osiris.deislabs.io/metricsCheckInterval | The interval, in seconds, at which Osiris checks the pods' HTTP request metrics. Note that this value overrides the global zeroscaler.metricsCheckInterval Helm value. | value of the zeroscaler.metricsCheckInterval Helm value |
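
For example, a deployment that should be scaled from zero directly to two replicas, with its metrics checked every 300 seconds, might be annotated as in this sketch (reusing the hypothetical my-app names from the Usage section):

apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: my-namespace
  name: my-app
  annotations:
    osiris.deislabs.io/enabled: "true"
    osiris.deislabs.io/minReplicas: "2"             # scale from 0 directly to 2 on activation
    osiris.deislabs.io/metricsCheckInterval: "300"  # check every 300s instead of the global default
spec:
  # ...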

Pod Annotations

The following table lists the supported annotations for Kubernetes Pods and their default values.

| Annotation | Description | Default |
| ---------- | ----------- | ------- |
| osiris.deislabs.io/enabled | Enables the metrics-collecting proxy sidecar container to be injected into this pod. Allowed values: y, yes, true, on, 1. | no value (= disabled) |
| osiris.deislabs.io/ignoredPaths | The list of URL paths that should be ignored by Osiris. Requests to such paths won't be counted by the proxy. Format: comma-separated string. | no value |
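
For example, to keep health and metrics endpoints from counting as activity, a pod template might combine both annotations as in this sketch (the /healthz and /metrics paths are hypothetical):

  template:
    metadata:
      labels:
        app: my-app
      annotations:
        osiris.deislabs.io/enabled: "true"
        # Requests to these paths won't be counted by the proxy:
        osiris.deislabs.io/ignoredPaths: /healthz,/metrics
    # ...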

Service Annotations

The following table lists the supported annotations for Kubernetes Services and their default values.

| Annotation | Description | Default |
| ---------- | ----------- | ------- |
| osiris.deislabs.io/enabled | Enables this service's endpoints to be managed by the Osiris endpoints controller. Allowed values: y, yes, true, on, 1. | no value (= disabled) |
| osiris.deislabs.io/deployment | Name of the deployment behind this service. This is required to map the service to its deployment. | no value |
| osiris.deislabs.io/loadBalancerHostname | Maps requests coming from a specific hostname to this service. If you have multiple hostnames, you can set them with additional annotations: osiris.deislabs.io/loadBalancerHostname-1, osiris.deislabs.io/loadBalancerHostname-2, ... | no value |
| osiris.deislabs.io/ingressHostname | Maps requests coming from a specific hostname to this service. If you use an ingress in front of your service, this is required to create a link between the ingress and the service. If you have multiple hostnames, you can set them with additional annotations: osiris.deislabs.io/ingressHostname-1, osiris.deislabs.io/ingressHostname-2, ... | no value |
| osiris.deislabs.io/ingressDefaultPort | Custom service port to use when the request comes from an ingress. The default behaviour, if the service has more than one port, is to look for a port named http, falling back to port 80. Set this if you have multiple ports and are using a non-standard port with a non-standard name. | no value |
| osiris.deislabs.io/tlsPort | Custom port for TLS-secured requests. The default behaviour, if the service has more than one port, is to look for a port named https, falling back to port 443. Set this if you have multiple ports and are using a non-standard TLS port with a non-standard name. | no value |
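
Combining several of these, a service exposed through an ingress on two hostnames and a non-standard port might look like the following sketch (the hostnames and port are hypothetical):

kind: Service
apiVersion: v1
metadata:
  namespace: my-namespace
  name: my-app
  annotations:
    osiris.deislabs.io/enabled: "true"
    osiris.deislabs.io/deployment: my-app
    osiris.deislabs.io/ingressHostname: my-app.example.com
    osiris.deislabs.io/ingressHostname-1: www.my-app.example.com
    osiris.deislabs.io/ingressDefaultPort: "8081"   # the service port to use for ingress traffic
spec:
  selector:
    app: my-app
  # ...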

Note that you might see an osiris.deislabs.io/selector annotation - this is for internal use only, and you shouldn't try to set, update, or delete it.

Demo

Deploy the example application hello-osiris:

kubectl create -f ./example/hello-osiris.yaml

This will create an Osiris-enabled deployment and service named hello-osiris.

Get the External IP of the hello-osiris service once it appears:

kubectl get service hello-osiris -o jsonpath='{.status.loadBalancer.ingress[*].ip}'

Point your browser to "http://<EXTERNAL-IP>", and verify that hello-osiris is serving traffic.

After about 2.5 minutes, the Osiris-enabled deployment should scale to zero replicas and the one hello-osiris pod should be terminated.

Make a request again, and watch as Osiris scales the deployment back to one replica and your request is handled successfully.

Limitations

It is a specific goal of Osiris to enable greater resource efficiency within Kubernetes clusters in general, but especially with respect to "nodeless" Kubernetes options such as Virtual Kubelet or Azure Kubernetes Service Virtual Nodes. However, due to known issues with those technologies, Osiris remains incompatible with them for the near term.

Contributing

Osiris follows the CNCF Code of Conduct.


Issues

Is Osiris compatible with NodePort services?

Question:
Experimenting with Osiris in minikube. My osiris-flagged pods aren't being zeroscaled after 2.5 minutes, or at all. Logs don't appear to indicate any issues across deployments in the osiris-system namespace. The repo example service is of type LoadBalancer (and I saw a corresponding Osiris annotation). Is Osiris only compatible with services of this type?

Bug: Current algorithm for checking activity can miss some

Currently, the zeroscaler monitors activity only for deployments that are already scaled to their minimum, and it does this on a periodic basis. For the sake of illustration, let's imagine a one-pod minimum that is checked every two minutes.

It's possible that a pod dies and is replaced with a new one in the two minutes between checks. This means whatever new activity might have occurred on the first pod isn't accounted for. In a scenario such as this, we should be assuming there was activity on the pod that died to avoid a premature scale-to-zero.

A similar case involves a pod that dies in the two minutes between checks and is replaced by a new pod that also dies in the same interval, and is replaced with a third pod. When the next check occurs, only the third pod will be checked for new metrics. Our current algorithm wouldn't even be aware of the short-lived pod that still might have had some activity.

We already have an open issue for improving our metrics-gathering approach (#17) in support of solutions to other issues (#15 and #16), so this issue isn't its own action item so much as it supplements #17 by documenting a specific deficiency with the current approach.

Osiris tries to activate an already active service

Bug:

The activator works and scales up the deployment; however, it looks like Osiris does not register the fact that the deployment is now running, and keeps attempting to scale it up.

This is what is logged by activator for each request:

I1128 00:55:21.673849       1 request_handling.go:10] Request received for for host MY_DOMAIN_HERE
I1128 00:55:21.673865       1 request_handling.go:19] Deployment MY_SERVICE_HERE in namespace MY_NAMESPACE_HERE may require activation
I1128 00:55:21.673872       1 request_handling.go:51] Found NO activation in-progress for deployment MY_SERVICE_HERE in namespace MY_NAMESPACE_HERE
I1128 00:55:21.679078       1 activating.go:29] Activating deployment MY_SERVICE_HERE in namespace MY_NAMESPACE_HERE
I1128 00:55:21.682330       1 deployment_activation.go:116] App pod with ip 172.27.34.162 is in service

Does the ZeroScaler work with gRPC services (deployed in a pod)?


Environment summary

K8s Master Info (e.g. AKS, ACS, Bare Metal, EKS)
AKS
Osiris Version
latest
Install Method (e.g. k8s YAMLs, Helm Chart)
helm chart

Issue Details

Let me start with: this is not an issue, rather a question. I couldn't find a discussion thread or community section where I could ask questions, hence I am creating it as an issue.

My question is related to the functionality of the ZeroScaler. Per the comment in the values.yaml file:
[ The interval in which the zeroScaler would repeatedly track the pod http request metrics.
The value is the number of seconds of the interval.
metricsCheckInterval: 150 ]
Regarding the "repeatedly track the pod http request metrics" part, just to ensure I am on the same page: what if the pod only uses gRPC? My understanding is that since gRPC uses HTTP/2 underneath, the ZeroScaler will still be able to track the gRPC requests. I mean to say, if the pod doesn't have a typical REST service but has a gRPC service, the ZeroScaler will still work - is my understanding correct?
Or will the ZeroScaler not work if the pod has a gRPC service? Please confirm.

Improve metrics collection strategy

#15 and #16 will both require us to alter our approach to serving and aggregating metrics, since it is clear that request count is an inadequate metric upon which to make scaling decisions if we are to support protocols other than HTTP (e.g. HTTPS, HTTP2, gRPC).

The new approach may "simply" (it's not actually simple) involve counting active TCP connections over a period of time.

I'd like to suggest avoiding a dependency on Prometheus for as long as it can be avoided, as it raises the barrier to entry for using Osiris. In the near term, we can investigate options for metrics-collecting sidecar proxies to keep time-series data in memory, with older data evicted as it becomes irrelevant.

Update labels to use a non-reserved prefix

It has been brought to my attention that the "kubernetes.io" prefix for labels and annotations is reserved and that using it for non-core projects isn't appropriate.

It's also worth acknowledging that this prefix reservation is not well documented.

WebSocket Support

Environment:

  • Kubernetes distribution: EKS
  • Kubernetes version: v1.11.8-eks-7c34c0
  • Osiris version: 0.0.1-2019.05.21.13.53.23-4f8deee
  • Install method: helm install osiris/osiris-edge --name osiris --namespace osiris-system --devel

What happened?
I have been starting to use this project to scale down our preview environments (which is really awesome btw!). However, I came across issues when I tried to use this with one of our services, which relies on websockets. The browser gets a '502 Bad Gateway' response from the websocket requests. Checking the logs from the proxy sidecar, I see this error:

http: proxy error: internal error: 101 switching protocols response with non-writable body

So I was wondering if websockets are supported, and what I might be missing to get this working?

What you expected to happen?
Websocket requests work correctly through the proxy.

osiris-proxy sidecar cannot serve connections

Environment:

  • Kubernetes distribution (e.g. AKS, ACS Engine, GKE, EKS, etc.): GKE 1.13.6-gke.13
  • Kubernetes version (use kubectl version): v1.13.7
  • Osiris version (specify by SHA or semver): osiris-edge-0.0.1-2019.07.31.16.58.30-0e6ffe9
  • Install method (specify exact helm install command used):
    helm install osiris/osiris-edge --name osiris --namespace osiris-system --devel

What happened?
After installing the chart and annotating both the service and the deployment, the osiris-proxy sidecar log shows that it cannot serve connections as the connection is not recognized as being used for HTTP or TLS.

kubectl logs -f wordpress-c666cd99b-fhl7s -c osiris-proxy
I0821 13:02:25.386322       1 proxy.go:12] Starting Osiris Proxy -- version devel -- commit 0e6ffe9
I0821 13:02:25.386544       1 proxy.go:115] Healthz and metrics server is listening on :5002
E0821 13:02:56.990789       1 dynamic_proxy.go:116] Error serving connection: Connection not recognized as being used for HTTP or TLS
E0821 13:03:01.990571       1 dynamic_proxy.go:116] Error serving connection: Connection not recognized as being used for HTTP or TLS
E0821 13:03:06.990499       1 dynamic_proxy.go:116] Error serving connection: Connection not recognized as being used for HTTP or TLS
E0821 13:03:11.990673       1 dynamic_proxy.go:116] Error serving connection: Connection not recognized as being used for HTTP or TLS
E0821 13:03:16.990511       1 dynamic_proxy.go:116] Error serving connection: Connection not recognized as being used for HTTP or TLS
E0821 13:03:21.990754       1 dynamic_proxy.go:116] Error serving connection: Connection not recognized as being used for HTTP or TLS
...

What you expected to happen?
I expected not to see errors in the osiris-proxy logs.

How to reproduce it (as minimally and precisely as possible):
See attached service and deployment yaml.
osiris-test.yaml.zip

Anything else that we need to know?

Using Osiris ZeroScaler with Istio Proxy Injected pods

Question:
We are trying to use the Osiris ZeroScaler with Istio-proxy-injected pods. It looks like it's scaling the pods to 0 when there is no traffic. However, it does not ever scale them back up. We set the annotations as below on the Service:
"osiris.deislabs.io/deployment": "pods-istio",
"osiris.deislabs.io/enabled": "true",
"osiris.deislabs.io/ingressDefaultPort": "80",
"osiris.deislabs.io/ingressHostname": "pods-istio.example.com",
"osiris.deislabs.io/loadBalancerHostname": "pods-istio.example.com"

Is it possible to make it work with Istio-enabled deployments which have an Istio Ingress Gateway, Virtual Service, and Destination Rules in front of them?

helm install with --version specified fails

Environment:

  • Kubernetes distribution (e.g. AKS, ACS Engine, GKE, EKS, etc.): AKS

  • Kubernetes version (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.5", GitCommit:"e6503f8d8f769ace2f338794c914a96fc335df0f", GitTreeState:"archive", BuildDate:"2020-07-01T16:28:46Z", GoVersion:"go1.14.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.9", GitCommit:"e3808385c7b3a3b86db714d67bdd266dc2b6ab62", GitTreeState:"clean", BuildDate:"2020-07-15T20:50:36Z", GoVersion:"go1.13.6", Compiler:"gc", Platform:"linux/amd64"}
  • Osiris version (specify by SHA or semver):
    All

  • Install method (specify exact helm install command used):
    helm install

What happened?

After adding the repo as per the installation instructions, if I run:
$ helm install osiris osiris/osiris-edge --namespace osiris-system --devel --debug
it successfully installs chart: osiris-edge-0.0.1-2019.11.26.17.17.31-2bf13af

If I try to specify that version myself:
$ helm install osiris osiris/osiris-edge --namespace osiris-system --version osiris-edge-0.0.1-2019.11.26.17.17.31-2bf13af --debug
an error occurs:

install.go:159: [debug] Original chart version: "osiris-edge-0.0.1-2019.11.26.17.17.31-2bf13af"
Error: chart "osiris-edge" matching osiris-edge-0.0.1-2019.11.26.17.17.31-2bf13af not found in osiris index. (try 'helm repo update'): improper constraint: osiris-edge-0.0.1-2019.11.26.17.17.31-2bf13af
helm.go:84: [debug] improper constraint: osiris-edge-0.0.1-2019.11.26.17.17.31-2bf13af
chart "osiris-edge" matching osiris-edge-0.0.1-2019.11.26.17.17.31-2bf13af not found in osiris index. (try 'helm repo update')
helm.sh/helm/v3/pkg/downloader.(*ChartDownloader).ResolveChartVersion
	helm.sh/helm/v3/pkg/downloader/chart_downloader.go:234
helm.sh/helm/v3/pkg/downloader.(*ChartDownloader).DownloadTo
	helm.sh/helm/v3/pkg/downloader/chart_downloader.go:87
helm.sh/helm/v3/pkg/action.(*ChartPathOptions).LocateChart
	helm.sh/helm/v3/pkg/action/install.go:667
main.runInstall
	helm.sh/helm/v3/cmd/helm/install.go:171
main.newInstallCmd.func1
	helm.sh/helm/v3/cmd/helm/install.go:117
github.com/spf13/cobra.(*Command).execute
	github.com/spf13/[email protected]/command.go:842
github.com/spf13/cobra.(*Command).ExecuteC
	github.com/spf13/[email protected]/command.go:950
github.com/spf13/cobra.(*Command).Execute
	github.com/spf13/[email protected]/command.go:887
main.main
	helm.sh/helm/v3/cmd/helm/helm.go:83
runtime.main
	runtime/proc.go:203
runtime.goexit
	runtime/asm_amd64.s:1373

What you expected to happen?
Successful installation when version number is specified.

How to reproduce it (as minimally and precisely as possible):
As above.

Anything else that we need to know?

$helm version
version.BuildInfo{Version:"v3.2.4", GitCommit:"0ad800ef43d3b826f31a5ad8dfbb4fe05d143688", GitTreeState:"clean", GoVersion:"go1.14.4"}

Appears to be an issue with the chart version numbering / semver compliance.
This potentially may be helpful: jupyterhub/chartpress#86 (comment)

Failed requests with Osiris proxy

Environment:

  • Kubernetes distribution (e.g. AKS, ACS Engine, GKE, EKS, etc.): GKE & docker for mac
  • Kubernetes version (use kubectl version):

for GKE:

Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.3", GitCommit:"2d3c76f9091b6bec110a5e63777c332469e0cba2", GitTreeState:"clean", BuildDate:"2019-08-19T12:38:00Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"13+", GitVersion:"v1.13.10-gke.0", GitCommit:"569511c9540f78a94cc6a41d895c382d0946c11a", GitTreeState:"clean", BuildDate:"2019-08-21T23:28:44Z", GoVersion:"go1.11.13b4", Compiler:"gc", Platform:"linux/amd64"}

for docker-for-mac:

Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.3", GitCommit:"2d3c76f9091b6bec110a5e63777c332469e0cba2", GitTreeState:"clean", BuildDate:"2019-08-19T12:38:00Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.6", GitCommit:"96fac5cd13a5dc064f7d9f4f23030a6aeface6cc", GitTreeState:"clean", BuildDate:"2019-08-19T11:05:16Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}
  • Osiris version (specify by SHA or semver): bb78180 & 472cbec
  • Install method (specify exact helm install command used):
helm install osiris/osiris-edge --name osiris --devel

What happened?

We used to run on GKE with an old version of Osiris - bb78180, before the http2 PR. Everything worked nicely for months.
We recently upgraded to a more recent commit to benefit from recent changes - 472cbec, after the http2 PR. And now we have failed requests from time to time. Rolling back to the previous version fixed the issue.

What you expected to happen?

no failed requests.

How to reproduce it (as minimally and precisely as possible):

on docker-for-mac:

$ helm install osiris/osiris-edge --name osiris --devel
$ kubectl apply -f example/hello-osiris.yaml
$ ab -n 1000 -c 5 http://localhost:8080/

and then rollback to an older version:

$ helm upgrade --reset-values osiris osiris/osiris-edge --version=0.0.1-2018.12.21.23.19.56-bb78180
$ kubectl scale deployment hello-osiris --replicas 0
$ kubectl scale deployment hello-osiris --replicas 1
$ ab -n 1000 -c 5 http://localhost:8080/

Anything else that we need to know?

Output of the ab run with commit 472cbec (broken version):

This is ApacheBench, Version 2.3 <$Revision: 1807734 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
Completed 1000 requests
Finished 1000 requests


Server Software:
Server Hostname:        localhost
Server Port:            8080

Document Path:          /
Document Length:        13 bytes

Concurrency Level:      5
Time taken for tests:   0.849 seconds
Complete requests:      1000
Failed requests:        20
   (Connect: 0, Receive: 0, Length: 20, Exceptions: 0)
Total transferred:      127400 bytes
HTML transferred:       12740 bytes
Requests per second:    1177.77 [#/sec] (mean)
Time per request:       4.245 [ms] (mean)
Time per request:       0.849 [ms] (mean, across all concurrent requests)
Transfer rate:          146.53 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       1
Processing:     1    4   1.6      4      14
Waiting:        0    4   1.6      3      14
Total:          2    4   1.6      4      14

Percentage of the requests served within a certain time (ms)
  50%      4
  66%      4
  75%      5
  80%      5
  90%      6
  95%      7
  98%      9
  99%     10
 100%     14 (longest request)

Notice the 20 failed requests.

Logs of the osiris-proxy container on the hello-osiris pod:

I1014 07:22:31.007410       1 proxy.go:12] Starting Osiris Proxy -- version devel -- commit 472cbec
I1014 07:22:31.007683       1 proxy.go:115] Healthz and metrics server is listening on :5004

Logs of the hello-osiris container:

2019/10/14 07:22:30 Listening for HTTP/1.x without TLS on :8080
2019/10/14 07:22:30 Listening for h2c (HTTP/2 without TLS) on :8081
2019/10/14 07:22:30 Listening for HTTPS (HTTP/1.x OR HTTP/2 with TLS) on :4430
2019/10/14 07:22:30 Listening for insecure gRPC (no TLS) on :8082
2019/10/14 07:22:30 Note: Due to limitations of SNI, Osiris only supports one TLS-enabled port per application, so this example does not demonstrate gRPC with TLS, although this combination should work.
2019/10/14 07:22:42 Received: GET / HTTP/1.1 (without TLS)
...
$ kubectl logs hello-osiris-6fdc75d555-l8dxj hello-osiris | grep "Received: GET / HTTP/1.1 (without TLS)" | wc -l
    1000

Output of the proxy metrics on :5004/metrics:

{
    "proxyId": "9ac00e99-ec38-4fda-8ea2-f41a26a602d6",
    "connectionsOpened": 1000,
    "connectionsClosed": 1000
}

Output of the ab run with commit bb78180 (good old version):

This is ApacheBench, Version 2.3 <$Revision: 1807734 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
Completed 1000 requests
Finished 1000 requests


Server Software:
Server Hostname:        localhost
Server Port:            8080

Document Path:          /
Document Length:        13 bytes

Concurrency Level:      5
Time taken for tests:   0.881 seconds
Complete requests:      1000
Failed requests:        0
Total transferred:      130000 bytes
HTML transferred:       13000 bytes
Requests per second:    1134.97 [#/sec] (mean)
Time per request:       4.405 [ms] (mean)
Time per request:       0.881 [ms] (mean, across all concurrent requests)
Transfer rate:          144.09 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       2
Processing:     2    4   2.0      4      22
Waiting:        1    4   1.8      3      20
Total:          2    4   2.0      4      22

Percentage of the requests served within a certain time (ms)
  50%      4
  66%      5
  75%      5
  80%      5
  90%      6
  95%      8
  98%     10
  99%     12
 100%     22 (longest request)

I'll try to investigate more, but I just wanted to write a bug report first, maybe you'll have an idea of the issue.

HPA not working since osiris-proxy sidecar has no option to set resource request or limit

Environment:

  • Kubernetes distribution (e.g. AKS, ACS Engine, GKE, EKS, etc.): AWS EKS
  • Kubernetes version (use kubectl version): 1.14
  • Osiris version (specify by SHA or semver): 113a458
  • Install method (specify exact helm install command used): helm install osiris/osiris-edge --name osiris --namespace osiris-system --devel

What happened?
Tried to configure Horizontal Pod Autoscaler on Pod where Osiris is enabled.
HPA complains about a missing resource request, it works without the osiris-proxy sidecar.

Name:                                                  example-app-hpa
Namespace:                                             tenant-1
Labels:                                                <none>
Annotations:                                           kubectl.kubernetes.io/last-applied-configuration:
                                                         {"apiVersion":"autoscaling/v1","kind":"HorizontalPodAutoscaler","metadata":{"annotations":{},"name":"example-app-hpa","namespace":"tenant-...
CreationTimestamp:                                     Fri, 18 Oct 2019 14:41:41 +0200
Reference:                                             Deployment/example-app-deployment
Metrics:                                               ( current / target )
  resource cpu on pods  (as a percentage of request):  <unknown> / 85%
Min replicas:                                          1
Max replicas:                                          5
Deployment pods:                                       0 current / 0 desired
Conditions:
  Type           Status  Reason             Message
  ----           ------  ------             -------
  AbleToScale    True    SucceededGetScale  the HPA controller was able to get the target's current scale
  ScalingActive  False   ScalingDisabled    scaling is disabled since the replica count of the target is zero
Events:
  Type     Reason                        Age                From                       Message
  ----     ------                        ----               ----                       -------
  Warning  FailedGetResourceMetric       14m (x5 over 15m)  horizontal-pod-autoscaler  unable to get metrics for resource cpu: no metrics returned from resource metrics API
  Warning  FailedComputeMetricsReplicas  14m (x5 over 15m)  horizontal-pod-autoscaler  failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
  Warning  FailedComputeMetricsReplicas  12m (x7 over 14m)  horizontal-pod-autoscaler  failed to get cpu utilization: missing request for cpu
  Warning  FailedGetResourceMetric       12m (x8 over 14m)  horizontal-pod-autoscaler  missing request for cpu

What you expected to happen?
HPA should work on pods where Osiris is enabled.
There should be a possibility to configure the osiris-proxy sidecar container.

How to reproduce it (as minimally and precisely as possible):
Enable Osiris on a deployment, configure HPA for this deployment too.
Osiris will work, HPA will report a missing resource request.

Anything else that we need to know?
Didn't find any way to set the resources on the osiris-proxy sidecar container.
I tried setting the resources on all containers by passing them in a YAML file when installing with Helm, but this didn't affect the osiris-proxy sidecar.

Wrong annotation prefix for LB/ingress hostname

Environment:

  • Kubernetes distribution (e.g. AKS, ACS Engine, GKE, EKS, etc.):
  • Kubernetes version (use kubectl version):
  • Osiris version (specify by SHA or semver): 0.0.1-2018.12.17.14.39.01-f3f1817
  • Install method (specify exact helm install command used):

What happened?

  • tried the basic example from the README, didn't work (got a 404) because "request_handling.go:25] No deployment found for host xxx.yyy"
  • tried using the osiris.deislabs.io/loadBalancerHostname found in the hello-osiris example, and then the osiris.deislabs.io/ingressHostname, but it didn't work

What you expected to happen?

  • making a request would scale up the deployment, and I would get a 200 response

How to reproduce it (as minimally and precisely as possible):

Anything else that we need to know?

Dependency management switch from dep to mod

I'd like to provide some contributions, and the first thing I noticed is that the package still relies on dep rather than Go modules, which have been available since Go 1.11.

JFI I'm happy to take care of this issue.

What would you like to be added?

go.{mod,sum} files following the Go modules approach to vendoring (no more need to commit vendor files, either).

Why is this needed?

dep is no longer widely used, especially considering that vendoring is fully provided by Go itself without the need to rely on a third-party package.

HTTP Azure Function Trigger not scaling up

Environment:

  • Kubernetes distribution - AKS - 3 node cluster
  • Kubernetes version - v1.15.0
  • Osiris version: Latest
  • Install method: Azure Core Tools

What happened?
I was trying to deploy an HTTP Trigger to Keda. I have installed the Osiris components for the same. It helped me scale to zero when no requests are coming, but it is not scaling up from 1 instance. I have removed all replica constraints from the deploy.yaml file, still no effect. Can you help me with any supportive links?

What you expected to happen?
I expect the nodes to be scaled up when doing a load test for 100 users, but it always shows 1 instance.
How to reproduce it (as minimally and precisely as possible):
Deploy an HTTP trigger to Keda and load test with >100 users.
Anything else that we need to know?

Delta for adding TCP support?

What would you like to be added?
Support for TCP connections (instead of HTTP-only)

Why is this needed?
Interested in downscaling pods that accept TCP connections and don't make use of HTTP.

I saw in another ticket (about gRPC and HTTP2) that Osiris relies on L7 proxies and thus only supports HTTP currently. What work would be needed in order to support TCP? Are there any known blockers or limitations that make it impossible altogether? Why/why not?

Assuming that it's possible, I'd be interested in helping out

Using a label selector to link a service to a deployment

Question:

Hi guys,

quick question: why don't you use a label selector to link the deployment and the service, instead of using the deployment name?

I'm asking because using the deployment name can be painful, for example when using helm - see helm/helm#2492 for example. It would have been much easier for us to use a label selector.

Thanks

Newly created Osiris pods cannot activate already-scaled-to-zero deployments

Environment:

  • Kubernetes distribution (e.g. AKS, ACS Engine, GKE, EKS, etc.): GKE
  • Kubernetes version (use kubectl version):
    Client Version: version.Info{Major:"1", Minor:"15+", GitVersion:"v1.15.11-dispatcher", GitCommit:"f2a77f678d6baccda27740d700f6cba2754dfacf", GitTreeState:"clean", BuildDate:"2020-04-21T04:44:03Z", GoVersion:"go1.12.17", Compiler:"gc", Platform:"darwin/amd64"}
    Server Version: version.Info{Major:"1", Minor:"15+", GitVersion:"v1.15.12-gke.2", GitCommit:"fb7add51f767aae42655d39972210dc1c5dbd4b3", GitTreeState:"clean", BuildDate:"2020-06-01T22:20:10Z", GoVersion:"go1.12.17b4", Compiler:"gc", Platform:"linux/amd64"}
    
  • Osiris version (specify by SHA or semver):
      kubectl get deployment/osiris-osiris-edge-zeroscaler  -n osiris-system -oyaml | grep image:
          image: osiris.azurecr.io/osiris:2bf13af
    
    I think you should tell us how to find it.
  • Install method (specify exact helm install command used): Sorry, I don't remember it. It would be great if the install method were recorded as a change-cause or in some annotation.

What happened?

  • An Osiris-enabled deployment/service was scaled to zero after some idling.
  • Osiris pods were re-scheduled for some reason (I'm using GKE preemptible nodes, so pods are constantly evicted).
  • Made a request to the Osiris-enabled deployment/service, but it was not activated again.
  • I see a log in the edge-zero-scaler:
    $ kubectl logs -f deployment/osiris-osiris-edge-zeroscaler  -n osiris-system
    ...
     I0804 11:44:34.857072       1 zeroscaler.go:77] Notified about new or updated Osiris-enabled deployment my-deployment in namespace default
     I0804 11:44:34.857077       1 zeroscaler.go:94] Osiris-enabled deployment my-deployment in namespace default is running zero replicas OR more than the minimum number of replicas; ensuring NO metrics collection
    ...
    

What you expected to happen?
Osiris should activate it again.

How to reproduce it (as minimally and precisely as possible):
See the "What happened?" section above.

Anything else that we need to know?
Please leave a comment if you have a question.

Plans for v0.1

Question:
Are there plans for releasing a first stable version such as v0.1?

Downscaling of multiple deployments

There are cases where it would be nice to turn off multiple deployments/statefulsets based on traffic coming to a single service. For example, when we have a web application with a database behind it.

It would be possible to use the annotation on all services, but then each deployment would be upscaled sequentially, most likely resulting in a timeout.

A similar pattern is apparently used in OpenShift (https://github.com/openshift/service-idler).

What would you like to be added?
Update the service annotation so it can take references to multiple deployments.

Why is this needed?
Make it possible to scale to 0 more complex applications.
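
Purely as an illustration of the proposal (this is not an existing Osiris annotation), the service annotation could accept a comma-separated list of deployments to activate together:

kind: Service
apiVersion: v1
metadata:
  name: my-web-app
  annotations:
    osiris.deislabs.io/enabled: "true"
    # Hypothetical: activate both deployments in parallel on incoming traffic.
    osiris.deislabs.io/deployment: my-web-app,my-database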

Support HTTPS

What would you like to be added?

Support for HTTPS.

Why is this needed?

Currently, HTTPS is not supported by Osiris at all. For the most common case, this can be worked around by using an ingress controller that terminates SSL and makes HTTP calls to backend services, but this inadequately addresses two somewhat common use cases:

  1. Applications / organizations with an end-to-end encryption requirement.
  2. Intra-cluster traffic that may have an encryption requirement, but is not routed through an ingress controller since it does not originate from outside the cluster.

Wildcard hostnames

What would you like to be added?
Ability to specify a hostname like

    osiris.deislabs.io/ingressHostname: "*.example.com"

Why is this needed?
In our case there is no set list of hostnames - when you create an account, you get a new subdomain. It is impossible to list them all; adding wildcard support would solve it.

502 gateway errors when a new request comes in while the zeroscaler is terminating pods

Environment:

  • Kubernetes distribution (e.g. AKS, ACS Engine, GKE, EKS, etc.): Oracle Kubernetes Engine
  • Kubernetes version (use kubectl version): v1.10.11
  • Osiris version (specify by SHA or semver): master branch
  • Install method (specify exact helm install command used): helm

What happened?
When the zeroscaler terminates pods after the metrics interval threshold is reached, and a new request comes in simultaneously during the termination process, we run into 502 errors: either the traffic is reaching the pods being terminated, or the pods being activated are taking time to bring up their containers.
What you expected to happen?
The request should never fail, even when the zeroscaler is terminating pods; i.e. there should be some retry or timeout which can help recover from the 502 errors.
How to reproduce it (as minimally and precisely as possible):
This happens with an NGINX ingress controller load-balancer endpoint when we configure the domain name using an annotation in the service YAML.
Have an NGINX-ingress-controller-based load balancer with an ingress configured for the service endpoint.
Wait for the zeroscaler's pod termination to kick off after the metrics interval threshold is reached. When the pods are being terminated, submit a request through the load balancer and you will see 502 errors. It is easily reproducible.
Anything else that we need to know?
Are there any kubernetes ingress level annotations which can avoid 502 errors and still be able to retry and timeout for this scenario?

Never scales the deployment back up to one replica once scaled down to zero

Environment:

  • Kubernetes distribution (e.g. AKS, ACS Engine, GKE, EKS, etc.):
  • Kubernetes version (use kubectl version):
  • Osiris version (specify by SHA or semver):
  • Install method (specify exact helm install command used):
    helm install

What happened?
The deployment never scales back up to one replica.

What you expected to happen?
The deployment should scale back up to one replica.

How to reproduce it (as minimally and precisely as possible):
I tried to validate the example deployment in my cluster; once the Osiris-enabled deployment scaled to zero replicas and the hello-osiris pod was terminated, the deployment never scaled back up to one replica.

Anything else that we need to know?

Proposal: proxy-less mode

This is a proposal for a new feature, so we can get an agreement before starting coding ;-)

The idea is to allow multiple implementations of the "metrics collector", with the default one still being the osiris auto-injected proxy, and with at least a new one: a prometheus-based metrics collector.
This new collector would collect metrics from an already existing prometheus endpoint exposed by the pod. It would need the following input:

  • port exposed by the pod on which the prometheus endpoint is available
  • path on which the prometheus metrics data is exposed. default to /metrics
  • metrics names to collect. To be compliant with the current (and default) metrics collector, we would need 2 metrics: 1 for the opened connections, and another one for the closed connections.

This new feature will bring the following benefits:

  • complete control over how a request is counted, i.e. no need to use ignoredPaths, and if needed, requests can be ignored based on different inputs: user-agent, source IP, ...
  • allow the use of another tool that injects a transparent proxy as a sidecar container, like a service mesh.
  • avoid the "cost" of the proxy, and the possible issues that could come from using it (#45 for example...)

I was thinking about adding a new annotation on the deployments: osiris.deislabs.io/metricsCollector, using a JSON value - similar to what datadog is doing with https://docs.datadoghq.com/getting_started/integrations/prometheus/?tab=kubernetes :

metadata:
  annotations:
    osiris.deislabs.io/metricsCollector: |
      {
        "type": "prometheus",
        "implementation": {
          "port": "8080",
          "path": "/metrics",
          "metrics": {
            "openedConnections": "http_req_new",
            "closedConnections": "http_req_closed"
          }
        }
      }

this JSON would have the following schema:

  • type: the name of the collector implementation. Defaults to osiris.
  • implementation: a RawJSON object that each implementation can use as it sees fit.
    We could also imagine moving the osiris.deislabs.io/metricsCheckInterval annotation to a checkInterval field here - to avoid too many annotations.

What do you think?

Support HTTP2

What would you like to be added?

Support for HTTP2.

Why is this needed?

HTTP2 is increasingly common, especially as the underlying transport for gRPC.

N.B.: HTTP2 does not strictly require the use of TLS, however, many HTTP2 clients do. Mileage from this feature may vary unless/until #15 is also addressed.

ReadinessProbe Settings way too high

Since the readiness probe is not configured apart from the path to check, it takes its default settings.

That means, by default, 10 seconds, so a non-started service will be delayed by 10 seconds. Set periodSeconds to 1, successThreshold to 1, and failureThreshold to 5, and services will be available in 3-5 seconds instead, as sketched below.
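
A minimal sketch of a readiness probe tuned as suggested above (the path and port are hypothetical):

readinessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 1      # probe every second instead of the default 10
  successThreshold: 1   # a single success marks the pod ready
  failureThreshold: 5   # tolerate up to 5 consecutive failures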

Pulling Osiris on an AKS cluster


Environment summary

K8s Master Info (e.g. AKS, ACS, Bare Metal, EKS)
AKS
Osiris Version
latest
Install Method (e.g. k8s YAMLs, Helm Chart)
helm chart

Issue Details

The images go into ImagePullBackOff errors.

Repro Steps

Used the README instructions.
After installing the chart, do a helm repo up to work around the issue.

Can Osiris properly detect an endpoint on a related Ingress for my app?

I'm new to Go, and very new to Kubernetes.

As far as I can gather from the source code, once the activator detects a request on a known hostname, it:

  1. Keeps the connection alive until the pod has been successfully started and is reachable
  2. Retrieves the app from the list of apps on which Osiris is enabled
  3. Sets the minReplica for the deployment back to 1
  4. Waits for the app to be reachable, using k8s library's SharedIndexInformer that watches a list of endpoints that match a specific selector
  5. Once reachable, syncs the hijacker and the newly started app endpoints

I have a problem with my config that throws the following error messages:

E0625 19:28:51.850358       1 deployment_activation.go:71] Activation of deployment hello-osiris in namespace default timed out
E0625 19:28:51.850452       1 proxy.go:97] Error executing start proxy callback for host "apps.contoso.io": Timed out waiting for activation of deployment hello-osiris in namespace default: %!s(<nil>)

My app's ingress name was previously suffixed with -customer-ingress. I removed that, thinking that it would help (if my service and my ingress had the same metadata.name), but no luck with that; the scale from zero still times out.

Here is the config of my app, with the added Ingress from the example in this repo:

apiVersion: v1
kind: Service
metadata:
  name: hello-osiris
  labels:
    app: hello-osiris
  annotations:
    osiris.deislabs.io/enabled: "true"
    osiris.deislabs.io/deployment: hello-osiris
    osiris.deislabs.io/ingressHostname: apps.contoso.io
spec:
  type: ClusterIP
  ports:
  - name: http1
    port: 8080
    targetPort: 80
  - name: http2
    port: 8080
    targetPort: 8080
  selector:
    app: hello-osiris
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: hello-osiris
  annotations:
    kubernetes.io/ingress.class: traefik
spec:
  rules:
  - host: apps.contoso.io
    http:
      paths:
      - path: /hello-osiris
        backend:
          serviceName: hello-osiris
          servicePort: http1

---
apiVersion: v1
kind: Secret
metadata:
  name: hello-osiris-cert
  labels:
    app: hello-osiris
data:
  server.crt: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUQyRENDQXNBQ0NRQy9HeC92dExTTlZEQU5CZ2txaGtpRzl3MEJBUXNGQURDQnJURUxNQWtHQTFVRUJoTUMKVlZNeEV6QVJCZ05WQkFnTUNsZGhjMmhwYm1kMGIyNHhFREFPQmdOVkJBY01CMUpsWkcxdmJtUXhFakFRQmdOVgpCQW9NQ1VSbGFYTWdUR0ZpY3pFVU1CSUdBMVVFQ3d3TFJXNW5hVzVsWlhKcGJtY3hJVEFmQmdOVkJBTU1HR2hsCmJHeHZMVzl6YVhKcGN5NWtaV2x6YkdGaWN5NXBiekVxTUNnR0NTcUdTSWIzRFFFSkFSWWJhMlZ1ZEM1eVlXNWoKYjNWeWRFQnRhV055YjNOdlpuUXVZMjl0TUI0WERURTVNREl5TXpBd01UazBNVm9YRFRJd01ESXlNekF3TVRrMApNVm93Z2EweEN6QUpCZ05WQkFZVEFsVlRNUk13RVFZRFZRUUlEQXBYWVhOb2FXNW5kRzl1TVJBd0RnWURWUVFICkRBZFNaV1J0YjI1a01SSXdFQVlEVlFRS0RBbEVaV2x6SUV4aFluTXhGREFTQmdOVkJBc01DMFZ1WjJsdVpXVnkKYVc1bk1TRXdId1lEVlFRRERCaG9aV3hzYnkxdmMybHlhWE11WkdWcGMyeGhZbk11YVc4eEtqQW9CZ2txaGtpRwo5dzBCQ1FFV0cydGxiblF1Y21GdVkyOTFjblJBYldsamNtOXpiMlowTG1OdmJUQ0NBU0l3RFFZSktvWklodmNOCkFRRUJCUUFEZ2dFUEFEQ0NBUW9DZ2dFQkFMZEJZTFlwT2NaSUdEWWs2Qzl5T1hreE8wcUZQbVkzNGVWanBrc0oKZjZqaUVEUDFWZlBoWXV6TnRwMUY0ZWhlR1h3WEU0cmt4ZjQ5bExtcmU4L3g4NUh6RHNKK1NNdnlaZ09XZjZMcgpoNE53aVBKcmNjcDhGTVlXMmtJenJiVWZFS0wxYUZ1VCtkRXc3NkgxRlhPUmsvS0Y0V3JMYXhkRlBPbDhLMWVPCm1NazFtSkU3NTNYZzVYd2FVVUVHZ2tGbUZkZHJhQ2N3Y1U0QmtnbXRObTdFTExJQ2Nnb3MzNHVmR21ndmN2ZkwKSWhuenZxNmxCNDM4a0hyaG16OG11WFVwYjhQa1k2NmtxRGpxTk53YlBsLzJrMjYvVTg1RUVlazI0YnowMzlBZApsUnpKUTFndUhacmxKOTBQckl0aFJzM3NNZmdzWGtIaGZDR0J5dlVXYWZubUZPY0NBd0VBQVRBTkJna3Foa2lHCjl3MEJBUXNGQUFPQ0FRRUFzckltVURVK0E2YWNHVGloK3N5c2lpQWVWYWtsL0FNcytvWXJIa1NPK2NrOXVNcFUKMUw3dUtrNDZBSGZ0dkplbXJqaFBObHJpSmZOSEh5bEZ0YlpkTjRqM2RmL3p1L1ExbVRMa2dLUXJWZkl6ZFF2eApVUlEyOXB2ZWFBdFJyL0x6VXZINllWRE5lTk9wWXk3ZEJJT1ZqcGpjZFJ5amRHZE1xejBLbEhvUGlnbEdiUEFWCldzdXBmbWI3Nzd1Q3ZtUGRHc1lwb2wvTE9jOU44ZUE0VUdPNk9sWmtWU0NGOTJjSk9oaCtyd1c0cktTOXZFTTcKRzJmaXZVVDZJWCtFamdGQzB0ZytLMkNSSjRoTElnNTFSc0lmdEllNk01MFBpOCsvc3d6andYV2ZuQmpkUWkyegpudC9XYmR3THA2Q2pMR01UdVNrZmVGam00Z2QzM2cyMmRSY09RZz09Ci0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K
  server.key: LS0tLS1CRUdJTiBSU0EgUFJJVkFURSBLRVktLS0tLQpNSUlFcEFJQkFBS0NBUUVBdDBGZ3RpazV4a2dZTmlUb0wzSTVlVEU3U29VK1pqZmg1V09tU3dsL3FPSVFNL1ZWCjgrRmk3TTIyblVYaDZGNFpmQmNUaXVURi9qMlV1YXQ3ei9IemtmTU93bjVJeS9KbUE1Wi9vdXVIZzNDSThtdHgKeW53VXhoYmFRak90dFI4UW92Vm9XNVA1MFREdm9mVVZjNUdUOG9YaGFzdHJGMFU4Nlh3clY0Nll5VFdZa1R2bgpkZURsZkJwUlFRYUNRV1lWMTJ0b0p6QnhUZ0dTQ2EwMmJzUXNzZ0p5Q2l6Zmk1OGFhQzl5OThzaUdmTytycVVICmpmeVFldUdiUHlhNWRTbHZ3K1JqcnFTb09PbzAzQnMrWC9hVGJyOVR6a1FSNlRiaHZQVGYwQjJWSE1sRFdDNGQKbXVVbjNRK3NpMkZHemV3eCtDeGVRZUY4SVlISzlSWnArZVlVNXdJREFRQUJBb0lCQUdVS2lDK0lSWkc5V0pRcAovMWVCekl5MUIzTU1TcDZEdTJzR2FiOC82b0tNdXRCYk9sd3c3cUdRdjFxeUdHQk4yaEZnaStidVF2anVyVjArClh4TUYzZjJnSFloQnB4UEVnRmtFRnpZV1ZXNjBrdDNQUGp1ZDlMcFFDV0d0S3Q4TjFOZDFKbWd3Qy9NNjN6WFcKYzFCNGVUR2tmZWlyWmsyN1lGMkFtRWs3bDZTQWw4SXVSbHNvUnpSVjBjTGxDQ1ZqeGZrWFFaTko3d0tnQzk2RAovSXppTVhzZ2h3MEd5NFg2L1FadjZnSGFIdHN0d1lITkFyeUF4eExYMVNQOTJhc1BEdWFXYUFMbEJzdGRSc2RtCkpIdU43dWtuMGFHZEMyeVAwQWlDS2prM1NjUnpVdHhyMlBqb3d4WGYwd3BDYUFYN2xVenk2RWpId29VTmg1VEcKenlhRTRoa0NnWUVBN2o2T21lSzdqN1E4SlMwR2NtTDVxbmMzdkRkUkdSdFlhclk2UzR1Qko2UmVnMkJGSFZ4MApPU2lrV1E2Z2dTTkJvMnEwR3pvcVpYZ3FSdGVtYjh1WUhUR25ad2NlRGd4dVhMcXg2cHRKWTlvajJJcDNXQXFaCjdWei95WGl6aWpIWmJZTW5IQmUyODFEMmFUQzlJRXpwdGg2NmlTZ2VYa1pWOFhNL1BCQjk1ZVVDZ1lFQXhPbXcKM04wY2J6QkxPeUN5Zkdhc2phRnAxNml3djFMYVB2b3ZSaG9ubDhkczVmVy9GdC9kMTY1YVorOCtnTU1rN1kxTwpFdkxmVjFkZXVUS0Y0VDV4dC90ZTVYcU5ocXJRbWt5RmZUY0tFem8rODZnUnpxTWRqSXh1eDhjN3FRWjg3Y2xxCitSNTBUbzZraGt1YXlpc1hWT05CV2VxekFSWk9QWmh2L2xSaUl0c0NnWUVBb0xZZ1dkeGg2OW1JTFFmSGJvZ24KcFA5UTRLMXNEb1NzeXlkc0FhUDBsdnBCSzF4WW95ckgxL3I3aW52Y2QrQ0JtYXdVSEwzSzliSHV5dVVVQ0J3TgoyN3V3RWtieDFrWTZlR0VVUFk5TkhZZDhZTWxmSWt2Y2RBc2xIUkpJQXJRSDJPRDlFKzFIWTdFODE4Nmg5ZFVNClh1Y3hxKzRkTmprNkptczR2OXJjSXFVQ2dZQUJrTDRJTTNYTGFIM2duWFR0eWo4cTdSS1RWVkw2WW1VN3hPOWwKUmtYMFRmQ09yM0p5Y3hzbllNcDFNeEN6STFvQ3pYSEdjc25WdnVzUTI5YjJvSEYwL2ZtV0ozQkNsczhMdXZvQQpzZFJSck0vZFRnTytPY3U5VjB4MktCNVFUSzNua2dkWXJhWk5EWk0vUWhDYjlOVzlwZ1RaK3lTcktJczhzQjZMCnpnM3Rxd0tCZ1FDTzhrNkVZRVVoZW5DMWNldSs0ejdEWmZrL01CTWlJci9ob1NySllaSmVOWldBSDJOd2p5M0cKUlhYWTZzdVRZRFRXVUJYWTZZMDl2STdOQzhmRk11ZmhyM28zaThMMVNWMlNCQ0VyMlV4T3RHWnN4TEVMMnhUQwpCbFIrMEF2MnUzSFBKRTBiV3ptVGh3U1RlQ2h0Z3pZQ2tIUlZlNlJMZVhET0w3SkFnQWNyM1E9PQotLS0tLUVORCBSU0EgUFJJVkFURSBLRVktLS0tLQo=
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-osiris
  labels:
    app: hello-osiris
  annotations:
    osiris.deislabs.io/enabled: "true"
    osiris.deislabs.io/minReplicas: "1"
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hello-osiris
  template:
    metadata:
      labels:
        app: hello-osiris
    spec:
      containers:
      - name: hello-osiris
        image: krancour/hello-osiris:v0.1.0
        args:
        - --https-cert
        - /hello-osiris/cert/server.crt
        - --https-key
        - /hello-osiris/cert/server.key
        ports:
        - containerPort: 8080
        - containerPort: 8081
        - containerPort: 8082
        - containerPort: 4430
        volumeMounts:
        - name: cert
          mountPath: /hello-osiris/cert
          readOnly: true
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
        readinessProbe:
          httpGet:
            path: /healthz
            port: 8080
      volumes:
      - name: cert
        secret:
          secretName: hello-osiris-cert

I also foresee a problem with the path in my Ingress spec. As far as I can tell, Osiris has no understanding of the Ingress as it is. It thus does not use the path of my app.

I have several apps, under the same domain name, with each instance accessible under a specific path. Is it something that Osiris supports or plan to support in the near future?

Thank you so much for you work. We like the simplicity of Osiris a lot. If our use case needs some work on the source code, we would be more than happy to contribute to Osiris.

chart repo unavailable: 401 Unauthorized

Running helm repo update, we get the following error:

...Unable to get an update from the "osiris" chart repository (https://osiris.azurecr.io/helm/v1/repo):
	Failed to fetch https://osiris.azurecr.io/helm/v1/repo/index.yaml : 401 Unauthorized

Using the credentials from the README. This used to work for months, and stopped working today.

Support CPU/GPU consumption metrics in addition to request count

Thanks for releasing this useful tool :)

What would you like to be added?
Currently it seems that the only supported metric is request count. Is there any plan to monitor consumption of CPU and GPU resources, in addition to request count, to decide whether a given pod is idle or not?

Why is this needed?
A user might simply submit a job that runs for hours before the user checks on its status again - usually in ML model training. This would avoid killing the pod while the analysis is still running.

Supported Helm Version

Question:
Hi,

In the docs it says that it can be installed with Helm 2.11 or greater, but having tried it with Helm 3.0 I got the following error.

$ helm repo add osiris https://osiris.azurecr.io/helm/v1/repo --username 'eae9749a-fccf-4a24-ac0d-6506fe2a6ab3' --password '=s-e.2-84BhIo6LM6=/l4C_sFzxb=sT['
Error: Looks like "https://osiris.azurecr.io/helm/v1/repo" is not a valid chart repository or cannot be reached: error unmarshaling JSON: while decoding JSON: json: cannot unmarshal number into Go struct field ChartVersion.entries.appVersion of type string

Thinking it was an unfair test using a new major version, I downgraded to 2.16.1 but got the same error. I have now downgraded to 2.11 and I can add the repo successfully.

So my question(s)

  • What versions of Helm can I use and when will 3.0 be supported?

Thanks
