
kube-metrics-adapter's Introduction

kube-metrics-adapter


Kube Metrics Adapter is a general purpose metrics adapter for Kubernetes that can collect and serve custom and external metrics for Horizontal Pod Autoscaling.

It supports scaling based on Prometheus metrics, SQS queues and others out of the box.

It discovers Horizontal Pod Autoscaling resources and starts to collect the requested metrics and stores them in memory. It's implemented using the custom-metrics-apiserver library.

Here's an example of a HorizontalPodAutoscaler resource configured to get requests-per-second metrics from each pod of the deployment myapp.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
  annotations:
    # metric-config.<metricType>.<metricName>.<collectorType>/<configKey>
    metric-config.pods.requests-per-second.json-path/json-key: "$.http_server.rps"
    metric-config.pods.requests-per-second.json-path/path: /metrics
    metric-config.pods.requests-per-second.json-path/port: "9090"
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: requests-per-second
      target:
        averageValue: 1k
        type: AverageValue

The metric-config.* annotations are used by the kube-metrics-adapter to configure a collector for getting the metrics. In the above example it configures a json-path pod collector: pods is the metric type, requests-per-second the metric name, json-path the collector type, and json-key, path and port the configuration keys.

Kubernetes compatibility

Like the support policy offered for Kubernetes, this project aims to support the latest three minor releases of Kubernetes.

The default supported API is autoscaling/v2 (available since Kubernetes v1.23); this API MUST be available in the cluster.

Building

This project uses Go modules as introduced in Go 1.11, therefore you need Go >= 1.11 installed in order to build. If using Go 1.11 you also need to activate module support.

Assuming Go has been set up with module support, the project can be built by running:

$ export GO111MODULE=on # needed if the project is checked out in your $GOPATH.
$ make

Install in Kubernetes

Clone this repository, and run as below:

$ cd kube-metrics-adapter/docs
$ kubectl apply -f .
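
Once the manifests are applied, you can verify that the adapter registered the metrics APIs with checks like the following (a hedged example; jq is optional and the served API versions may differ between releases):

$ kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 | jq .
$ kubectl get --raw /apis/external.metrics.k8s.io/v1beta1 | jq .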

Collectors

Collectors are different implementations for getting metrics requested by an HPA resource. They are configured based on HPA resources and started on-demand by the kube-metrics-adapter to only collect the metrics required for scaling the application.

The collectors are configured either directly from the metrics defined in an HPA resource, or via additional annotations on the HPA resource.

Pod collector

The pod collector allows collecting metrics from each pod matching the label selector defined in the HPA's scaleTargetRef. Currently only json-path collection is supported.

Supported HPA scaleTargetRef

The Pod Collector utilizes the scaleTargetRef specified in an HPA resource to obtain the label selector from the referenced Kubernetes object. This enables the identification and management of pods associated with that object. Currently, the supported Kubernetes objects for this operation are: Deployment, StatefulSet and Rollout.

Supported metrics

| Metric | Description | Type | K8s Versions |
| ------ | ----------- | ---- | ------------ |
| custom | No predefined metrics. Metrics are generated from user defined queries. | Pods | >=1.12 |

Example

This is an example of using the pod collector to collect metrics from a json metrics endpoint of each pod matched by the HPA.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
  annotations:
    # metric-config.<metricType>.<metricName>.<collectorType>/<configKey>
    metric-config.pods.requests-per-second.json-path/json-key: "$.http_server.rps"
    metric-config.pods.requests-per-second.json-path/path: /metrics
    metric-config.pods.requests-per-second.json-path/port: "9090"
    metric-config.pods.requests-per-second.json-path/scheme: "https"
    metric-config.pods.requests-per-second.json-path/aggregator: "max"
    metric-config.pods.requests-per-second.json-path/interval: "60s" # optional
    metric-config.pods.requests-per-second.json-path/min-pod-ready-age: "30s" # optional
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: requests-per-second
      target:
        averageValue: 1k
        type: AverageValue

The pod collector is configured through the annotations which specify the collector name json-path and a set of configuration options for the collector. json-key defines the json-path query for extracting the right metric. This assumes the pod is exposing metrics in JSON format. For the above example the following JSON data would be expected:

{
  "http_server": {
    "rps": 0.5
  }
}

The json-path query support depends on the github.com/spyzhov/ajson library. See that library's README for the possible queries. It's expected that the metric you query returns something that can be turned into a float64.

The other configuration options path, port and scheme specify where the metrics endpoint is exposed on the pod. The path and port options do not have default values so they must be defined. The scheme is optional and defaults to http.

The aggregator configuration option specifies the aggregation function used to aggregate values of JSONPath expressions that evaluate to arrays/slices of numbers. It's optional, but when the expression evaluates to an array/slice, its absence will produce an error. The supported aggregation functions are avg, max, min and sum.
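
For instance, assuming a hypothetical pod that exposes per-worker rates as a JSON array (the payload and query below are only an illustration, not taken from the project's docs), a wildcard json-key combined with the avg aggregator would collapse the values into a single metric:

{
  "http_server": {
    "rps": [0.5, 1.5, 2.5]
  }
}

    metric-config.pods.requests-per-second.json-path/json-key: "$.http_server.rps[*]"
    metric-config.pods.requests-per-second.json-path/aggregator: "avg"

With avg the collected value would be (0.5 + 1.5 + 2.5) / 3 = 1.5.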

The raw-query configuration option specifies the query params to send along to the endpoint:

  metric-config.pods.requests-per-second.json-path/path: /metrics
  metric-config.pods.requests-per-second.json-path/port: "9090"
  metric-config.pods.requests-per-second.json-path/raw-query: "foo=bar&baz=bop"

will create a URL like this:

http://<podIP>:9090/metrics?foo=bar&baz=bop

There are also configuration options for custom (connect and request) timeouts when querying pods for metrics:

metric-config.pods.requests-per-second.json-path/request-timeout: 2s
metric-config.pods.requests-per-second.json-path/connect-timeout: 500ms

The default for both of the above values is 15 seconds.

The min-pod-ready-age configuration option instructs the service to start collecting metrics from the pods only if they are "older" (time elapsed after the pod reached the "Ready" state) than the specified amount of time. This is handy when pods need to warm up before the HPA starts tracking their metrics.

The default value is 0 seconds.

Prometheus collector

The Prometheus collector is a generic collector which can map Prometheus queries to metrics that can be used for scaling. This approach is different from how it's done in the k8s-prometheus-adapter where all available Prometheus metrics are collected and transformed into metrics which the HPA can scale on, and there is no possibility to do custom queries. With the approach implemented here, users can define custom queries and only metrics returned from those queries will be available, reducing the total number of metrics stored.

One downside of this approach is that badly performing queries can slow down or kill Prometheus, so it can be dangerous to allow in a multi-tenant cluster. It's also not possible to restrict the available metrics using something like RBAC, since any user can create metrics based on a custom query.

I still believe custom queries are more useful, but it's good to be aware of the trade-offs between the two approaches.

Supported metrics

| Metric | Description | Type | Kind | K8s Versions |
| ------ | ----------- | ---- | ---- | ------------ |
| prometheus-query | Generic metric which requires a user defined query. | External | | >=1.12 |
| custom | No predefined metrics. Metrics are generated from user defined queries. | Object | any | >=1.12 |

Example: External Metric

This is an example of an HPA configured to get metrics based on a Prometheus query. The query is defined in the annotation metric-config.external.processed-events-per-second.prometheus/query where processed-events-per-second is the query name which will be associated with the result of the query. This allows having multiple prometheus queries associated with a single HPA.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
  annotations:
    # This annotation is optional.
    # If specified, then this prometheus server is used,
    # instead of the prometheus server specified as the CLI argument `--prometheus-server`.
    metric-config.external.processed-events-per-second.prometheus/prometheus-server: http://prometheus.my-namespace.svc
    # metric-config.<metricType>.<metricName>.<collectorType>/<configKey>
    metric-config.external.processed-events-per-second.prometheus/query: |
      scalar(sum(rate(event-service_events_count{application="event-service",processed="true"}[1m])))
    metric-config.external.processed-events-per-second.prometheus/interval: "60s" # optional
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: custom-metrics-consumer
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metric:
        name: processed-events-per-second
        selector:
          matchLabels:
            type: prometheus
      target:
        type: AverageValue
        averageValue: "10"

Example: Object Metric [DEPRECATED]

Note: Prometheus Object metrics are deprecated and will most likely be removed in the future. Use the Prometheus External metrics instead as described above.

This is an example of an HPA configured to get metrics based on a Prometheus query. The query is defined in the annotation metric-config.object.processed-events-per-second.prometheus/query where processed-events-per-second is the metric name which will be associated with the result of the query.

It also specifies an annotation metric-config.object.processed-events-per-second.prometheus/per-replica which instructs the collector to treat the results as an average over all pods targeted by the HPA. This makes it possible to mimic the behavior of targetAverageValue which is not implemented for metric type Object as of Kubernetes v1.10. (It will most likely come in v1.12).

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
  annotations:
    # metric-config.<metricType>.<metricName>.<collectorType>/<configKey>
    metric-config.object.processed-events-per-second.prometheus/query: |
      scalar(sum(rate(event-service_events_count{application="event-service",processed="true"}[1m])))
    metric-config.object.processed-events-per-second.prometheus/per-replica: "true"
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: custom-metrics-consumer
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Object
    object:
      metricName: processed-events-per-second
      target:
        apiVersion: v1
        kind: Pod
        name: dummy-pod
      targetValue: 10 # this will be treated as targetAverageValue

Note: The HPA object requires an Object to be specified. However, when a Prometheus metric is used there is no need for this object, so to satisfy the schema we specify a dummy pod called dummy-pod.

Skipper collector

The skipper collector is a simple wrapper around the Prometheus collector to make it easy to define an HPA for scaling based on Ingress or RouteGroup metrics when skipper is used as the ingress implementation in your cluster. It assumes you are collecting Prometheus metrics from skipper and it provides the correct Prometheus queries out of the box so users don't have to define those manually.

Supported metrics

| Metric | Description | Type | Kind | K8s Versions |
| ------ | ----------- | ---- | ---- | ------------ |
| requests-per-second | Scale based on requests per second for a certain ingress or routegroup. | Object | Ingress, RouteGroup | >=1.19 |

Example

Ingress

This is an example of an HPA that will scale based on requests-per-second for an ingress called myapp.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Object
    object:
      describedObject:
        apiVersion: networking.k8s.io/v1
        kind: Ingress
        name: myapp
      metric:
        name: requests-per-second
        selector:
          matchLabels:
            backend: backend1 # optional backend
      target:
        averageValue: "10"
        type: AverageValue

RouteGroup

This is an example of an HPA that will scale based on requests-per-second for a routegroup called myapp.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Object
    object:
      describedObject:
        apiVersion: zalando.org/v1
        kind: RouteGroup
        name: myapp
      metric:
        name: requests-per-second
        selector:
          matchLabels:
            backend: backend1 # optional backend
      target:
        averageValue: "10"
        type: AverageValue

Metric weighting based on backend

Skipper supports sending traffic to different backends based on annotations present on the Ingress object, or weights on the RouteGroup backends. By default the number of replicas will be calculated based on the full traffic served by that ingress/routegroup. If, however, only the traffic being routed to a specific backend should be used, then the backend name can be specified via the backend label under matchLabels for the metric. The ingress annotation from which the backend weights can be obtained can be specified through the flag --skipper-backends-annotation, as in the example below.
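
For example, if the backend weights are published on the Ingress via a custom annotation, the adapter could be started with a flag along these lines (the annotation name here is only an illustration, not necessarily the one used in your setup):

    --skipper-backends-annotation=zalando.org/backend-weights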

External RPS collector

The External RPS collector, like the Skipper collector, is a simple wrapper around the Prometheus collector to make it easy to define an HPA for scaling based on the RPS measured for a given hostname. When skipper is used as the ingress implementation in your cluster, everything should work automatically. In case another reverse proxy, like Nginx for example, is used as ingress, it's necessary to configure which Prometheus metric should be used through the --external-rps-metric-name <metric-name> flag. Assuming skipper-ingress is being used, or the appropriate metric name is passed using that flag, this collector provides the correct Prometheus queries out of the box so users don't have to define those manually.

Supported metrics

| Metric | Description | Type | Kind | K8s Versions |
| ------ | ----------- | ---- | ---- | ------------ |
| requests-per-second | Scale based on requests per second for a certain hostname. | External | | >=1.12 |

Example: External Metric

This is an example of an HPA that will scale based on the requests per second measured for the hostnames www.example1.com and www.example2.com, weighted by 42%.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
  annotations:
    metric-config.external.example-rps.requests-per-second/hostnames: www.example1.com,www.example2.com
    metric-config.external.example-rps.requests-per-second/weight: "42"
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: custom-metrics-consumer
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metric:
        name: example-rps
        selector:
          matchLabels:
            type: requests-per-second
      target:
        type: AverageValue
        averageValue: "42"

Multiple hostnames per metric

This metric supports an n:1 relation between hostnames and metrics: the measured RPS is the sum of the RPS rate of each of the specified hostnames. This value is further modified by the weight parameter explained below.
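
As a rough illustration (assuming the weight is applied as a percentage of the summed rate): if www.example1.com serves 100 RPS and www.example2.com serves 60 RPS, the combined rate is 160 RPS; with the weight annotation set to "42" the collector would report approximately 0.42 * 160 = 67.2 RPS.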

Metric weighting based on backend

Some ingress controllers, like skipper-ingress, support sending traffic to different backends based on some kind of configuration: in the case of skipper, annotations present on the Ingress object or weights on the RouteGroup backends. By default the number of replicas will be calculated based on the full traffic served by these components. If, however, only the traffic being routed to a specific hostname should be used, then the weight for the configured hostname(s) can be specified via the weight annotation metric-config.external.<metric-name>.requests-per-second/weight for the metric being configured.

InfluxDB collector

The InfluxDB collector maps Flux queries to metrics that can be used for scaling.

Note that the collector targets an InfluxDB v2 instance, which is why only Flux is supported and not InfluxQL.

Supported metrics

| Metric | Description | Type | Kind | K8s Versions |
| ------ | ----------- | ---- | ---- | ------------ |
| flux-query | Generic metric which requires a user defined query. | External | | >=1.10 |

Example: External Metric

This is an example of an HPA configured to get metrics based on a Flux query. The query is defined in the annotation metric-config.external.<metricName>.influxdb/query where <metricName> is the query name which will be associated with the result of the query. This allows having multiple flux queries associated with a single HPA.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
  annotations:
    # These annotations are optional.
    # If specified, then they are used for setting up the InfluxDB client properly,
    # instead of using the ones specified via CLI. Respectively:
    #  - --influxdb-address
    #  - --influxdb-token
    #  - --influxdb-org
    metric-config.external.queue-depth.influxdb/address: "http://influxdbv2.my-namespace.svc"
    metric-config.external.queue-depth.influxdb/token: "secret-token"
    # This could be either the organization name or the ID.
    metric-config.external.queue-depth.influxdb/org: "deadbeef"
    # metric-config.<metricType>.<metricName>.<collectorType>/<configKey>
    # <configKey> == query-name
    metric-config.external.queue-depth.influxdb/query: |
        from(bucket: "apps")
          |> range(start: -30s)
          |> filter(fn: (r) => r._measurement == "queue_depth")
          |> group()
          |> max()
          // Rename "_value" to "metricvalue" for letting the metrics server properly unmarshal the result.
          |> rename(columns: {_value: "metricvalue"})
          |> keep(columns: ["metricvalue"])
    metric-config.external.queue-depth.influxdb/interval: "60s" # optional
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: queryd-v1
  minReplicas: 1
  maxReplicas: 4
  metrics:
  - type: External
    external:
      metric:
        name: queue-depth
        selector:
          matchLabels:
            type: influxdb
      target:
        type: Value
        value: "1"

AWS collector

The AWS collector allows scaling based on external metrics exposed by AWS services e.g. SQS queue lengths.

AWS IAM role

To integrate with AWS, the controller needs to run on nodes with access to the AWS API. Additionally, the controller has to have a role with the following policy to get all required data from AWS:

PolicyDocument:
  Statement:
    - Action: 'sqs:GetQueueUrl'
      Effect: Allow
      Resource: '*'
    - Action: 'sqs:GetQueueAttributes'
      Effect: Allow
      Resource: '*'
    - Action: 'sqs:ListQueues'
      Effect: Allow
      Resource: '*'
    - Action: 'sqs:ListQueueTags'
      Effect: Allow
      Resource: '*'
  Version: 2012-10-17

Supported metrics

| Metric | Description | Type | K8s Versions |
| ------ | ----------- | ---- | ------------ |
| sqs-queue-length | Scale based on SQS queue length | External | >=1.12 |

Example

This is an example of an HPA that will scale based on the length of an SQS queue.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: custom-metrics-consumer
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metric:
        name: my-sqs
        selector:
          matchLabels:
            type: sqs-queue-length
            queue-name: foobar
            region: eu-central-1
      target:
        averageValue: "30"
        type: AverageValue

The matchLabels are used by kube-metrics-adapter to configure a collector that will get the queue length for an SQS queue named foobar in region eu-central-1.

The AWS account of the queue currently depends on how kube-metrics-adapter is configured to get AWS credentials. The normal assumption is that you run the adapter in a cluster running in the AWS account where the queue is defined. Please open an issue if you would like support for other use cases.

ZMON collector

The ZMON collector allows scaling based on external metrics exposed by ZMON checks.

Supported metrics

| Metric | Description | Type | K8s Versions |
| ------ | ----------- | ---- | ------------ |
| zmon-check | Scale based on any ZMON check results | External | >=1.12 |

Example

This is an example of an HPA that will scale based on the specified value exposed by a ZMON check with id 1234.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
  annotations:
    # metric-config.<metricType>.<metricName>.<collectorType>/<configKey>
    metric-config.external.my-zmon-check.zmon/key: "custom.*"
    metric-config.external.my-zmon-check.zmon/tag-application: "my-custom-app-*"
    metric-config.external.my-zmon-check.zmon/interval: "60s" # optional
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: custom-metrics-consumer
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metric:
          name: my-zmon-check
          selector:
            matchLabels:
              type: zmon
              check-id: "1234" # the ZMON check to query for metrics
              key: "custom.value"
              tag-application: my-custom-app
              aggregators: avg # comma separated list of aggregation functions, default: last
              duration: 5m # default: 10m
      target:
        averageValue: "30"
        type: AverageValue

The check-id specifies the ZMON check to query for the metrics. key specifies the JSON key in the check output to extract the metric value from. E.g. if you have a check which returns the following data:

{
    "custom": {
        "value": 1.0
    },
    "other": {
        "value": 3.0
    }
}

Then the value 1.0 would be returned when the key is defined as custom.value.

The tag-<name> labels define the tags used for the KairosDB query. In a normal ZMON setup the following tags will be available:

  • application
  • alias (name of Kubernetes cluster)
  • entity - full ZMON entity ID.

aggregators defines the aggregation functions applied to the metrics query. For instance, if you define the entity filter type=kube_pod,application=my-custom-app you might get three entities back, and then you might want to get an average over the metrics for those three entities. This would be possible by using the avg aggregator. The default aggregator is last, which returns only the latest metric point from the query. The supported aggregation functions are avg, count, last, max, min, sum and diff. See the KairosDB docs for details.

The duration defines the duration used for the timeseries query. E.g. if you specify a duration of 5m, then the query will return metric points for the last 5 minutes and apply the specified aggregation over the same duration, e.g. max(5m).

The annotations metric-config.external.my-zmon-check.zmon/key and metric-config.external.my-zmon-check.zmon/tag-<name> can optionally be used if you need to define a key or other tag with a "star" query syntax like values.*. This hack is in place because it's not allowed to use * in the metric label definitions. If both an annotation and the corresponding label are defined, the annotation takes precedence.

Nakadi collector

The Nakadi collector allows scaling based on Nakadi Subscription API stats metrics consumer_lag_seconds or unconsumed_events.

Supported metrics

| Metric | Description | Type | K8s Versions |
| ------ | ----------- | ---- | ------------ |
| unconsumed-events | Scale based on the number of unconsumed events for a Nakadi subscription | External | >=1.24 |
| consumer-lag-seconds | Scale based on the max consumer lag in seconds for a Nakadi subscription | External | >=1.24 |

Example

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
  annotations:
    # metric-config.<metricType>.<metricName>.<collectorType>/<configKey>
    metric-config.external.my-nakadi-consumer.nakadi/interval: "60s" # optional
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: custom-metrics-consumer
  minReplicas: 0
  maxReplicas: 8 # should match number of partitions for the event type
  metrics:
  - type: External
    external:
      metric:
          name: my-nakadi-consumer
          selector:
            matchLabels:
              type: nakadi
              subscription-id: "708095f6-cece-4d02-840e-ee488d710b29"
              metric-type: "consumer-lag-seconds|unconsumed-events"
      target:
        # Use either value (with type: Value) or averageValue (with
        # type: AverageValue), depending on the chosen metric-type; both are
        # shown here only to document the two options.
        #
        # value is compatible with the consumer-lag-seconds metric type.
        # It describes the amount of consumer lag in seconds before scaling
        # additionally up.
        # If an event type has multiple partitions the value of
        # consumer-lag-seconds is the max of all the partitions.
        value: "600" # 10m
        type: Value
        # averageValue is compatible with the unconsumed-events metric type.
        # This means for every 30 unconsumed events a pod is added.
        # unconsumed-events is the sum of unconsumed_events over all
        # partitions.
        averageValue: "30"
        type: AverageValue

The subscription-id is the Subscription ID of the relevant consumer. The metric-type indicates whether to scale on consumer-lag-seconds or unconsumed-events as outlined below.

unconsumed-events - the total number of unconsumed events over all partitions. When using this metric-type you should also use the target averageValue, which indicates the number of events which can be handled per pod. To best estimate the number of events per pod, you need to understand the average time for processing an event as well as the rate of events.

Example: You have an event type producing 100 events per second between 00:00 and 08:00. Between 08:01 and 23:59 it produces 400 events per second. Let's assume that on average a single pod can consume 100 events per second; then we can define 100 as averageValue and the HPA would scale to 1 between 00:00 and 08:00, and scale to 4 between 08:01 and 23:59. If for some reason there is a short spike of 800 events per second, it would scale to 8 pods to process those events until the rate goes down again.

consumer-lag-seconds - describes the age of the oldest unconsumed event for a subscription. If the event type has multiple partitions the lag is defined as the max age over all partitions. When using this metric-type you should use the target value to indicate the max lag (in seconds) before the HPA should scale.

Example: You have a subscription with a defined SLO of "99.99% of events are consumed within 30 min.". In this case you can define a target value of e.g. 20 min. (1200s) (to include a safety buffer) such that the HPA only scales up from 1 to 2 if the target of 20 min. is breached and it needs to work faster with more consumers. For this case you should also account for the average time for processing an event when defining the target.
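
Putting the two target variants side by side (a minimal sketch; only the target block differs from the example above):

For unconsumed-events:

      target:
        type: AverageValue
        averageValue: "30"

For consumer-lag-seconds:

      target:
        type: Value
        value: "600"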

HTTP Collector

The http collector allows collecting metrics from an external endpoint specified in the HPA. Currently only json-path collection is supported.

Supported metrics

| Metric | Description | Type | K8s Versions |
| ------ | ----------- | ---- | ------------ |
| custom | No predefined metrics. Metrics are generated from user defined queries. | Pods | >=1.12 |

Example

This is an example of using the HTTP collector to collect metrics from a json metrics endpoint specified in the annotations.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
  annotations:
    # metric-config.<metricType>.<metricName>.<collectorType>/<configKey>
    metric-config.external.unique-metric-name.json-path/json-key: "$.some-metric.value"
    metric-config.external.unique-metric-name.json-path/endpoint: "http://metric-source.app-namespace:8080/metrics"
    metric-config.external.unique-metric-name.json-path/aggregator: "max"
    metric-config.external.unique-metric-name.json-path/interval: "60s" # optional
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metric:
        name: unique-metric-name
        selector:
          matchLabels:
            type: json-path
      target:
        averageValue: 1
        type: AverageValue

The HTTP collector is similar to the Pod Collector. The following configuration values are supported:

  • json-key to specify the JSON path of the metric to be queried
  • endpoint the fully formed URL to query for the metric. In the above example a Kubernetes Service in the namespace app-namespace is called.
  • aggregator is only required if the metric is an array of values and specifies how the values are aggregated. Currently this option can support the values: sum, max, min, avg.
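
For the json-key used in the example above, the endpoint would be expected to return JSON along these lines (a sketch):

{
  "some-metric": {
    "value": 0.7
  }
}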

Scrape Interval

It's possible to configure the scrape interval for each of the metric types via an annotation:

metric-config.<metricType>.<metricName>.<collectorType>/interval: "30s"

The default is 60s but can be reduced to let the adapter collect metrics more often.

ScalingSchedule Collectors

The ScalingSchedule and ClusterScalingSchedule collectors allow collecting time-based metrics from the respective CRD objects specified in the HPA.

These collectors are disabled by default; you have to start the server with the --scaling-schedule flag to enable them. Remember to deploy the ScalingSchedule and ClusterScalingSchedule CRDs and to allow the service account used by the server to read, watch and list them, as sketched below.
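
A minimal RBAC sketch for the latter (the resource names assume the usual lower-case plural convention; check the deployed CRDs for the exact names):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kube-metrics-adapter-scaling-schedules
rules:
- apiGroups: ["zalando.org"]
  resources: ["scalingschedules", "clusterscalingschedules"]
  verbs: ["get", "list", "watch"]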

Supported metrics

| Metric | Description | Type | K8s Versions |
| ------ | ----------- | ---- | ------------ |
| ObjectName | The metric is calculated and stored for each ScalingSchedule and ClusterScalingSchedule referenced in the HPAs | ScalingSchedule and ClusterScalingSchedule | >=1.16 |

Ramp-up and ramp-down feature

To avoid abrupt scaling due to time based metrics, the ScalingSchedule collector has a feature to ramp the metric up and down over a specific period of time. The duration of the scaling window can be configured individually in the [Cluster]ScalingSchedule object via the option scalingWindowDurationMinutes, or globally for all scheduled events; it defaults to the globally configured value if not specified. The default for the latter is 10 minutes, but it can be changed using the --scaling-schedule-default-scaling-window flag.

This spreads the scale events around, creating less load on the other components, and helping the rest of the metrics (like the CPU ones) to adjust as well.

The HPA algorithm does not make changes if the metric change is less than the specified by the horizontal-pod-autoscaler-tolerance flag:

We'll skip scaling if the ratio is sufficiently close to 1.0 (within a globally-configurable tolerance, from the --horizontal-pod-autoscaler-tolerance flag, which defaults to 0.1).

With that in mind, the ramp-up and ramp-down feature divides the scaling over the specified period of time into buckets, trying to achieve changes bigger than the configured tolerance. The number of buckets defaults to 10 and can be configured by the --scaling-schedule-ramp-steps flag.
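
As a rough illustration (assuming evenly sized buckets; the exact stepping is an implementation detail): with the default 10 buckets, a 10-minute scaling window and a schedule value of 10000, the returned metric would increase in steps of roughly 1000 about every minute while ramping up, and decrease the same way while ramping down.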

Important: note that the ramp-up and ramp-down feature can lead to deployments achieving less than the specified number of pods, due to the HPA 10% change rule and the ceiling function applied to the desired number of pods (check the algorithm details). It varies with the metric configured for ScalingSchedule events, the number of pods and the configured horizontal-pod-autoscaler-tolerance flag of your Kubernetes installation. This gist contains the code to simulate the situations that deployments with different numbers of pods, with a metric of 10000, can face with 10 buckets (max of 90% of the metric returned) and 5 buckets (max of 80% of the metric returned). The ramp-up and ramp-down feature can be disabled by setting --scaling-schedule-default-scaling-window to 0, and abrupt scalings can be handled via scaling policies.

Example

This is an example of using the ScalingSchedule collectors to collect metrics from a deployed kind of the CRD. First, the schedule object:

apiVersion: zalando.org/v1
kind: ClusterScalingSchedule
metadata:
  name: "scheduling-event"
spec:
  schedules:
  - type: OneTime
    date: "2021-10-02T08:08:08+02:00"
    durationMinutes: 30
    value: 100
  - type: Repeating
    durationMinutes: 10
    value: 120
    period:
      startTime: "15:45"
      timezone: "Europe/Berlin"
      days:
      - Mon
      - Wed
      - Fri

This resource defines a ClusterScalingSchedule object named scheduling-event with two schedules.

ClusterScalingSchedule objects aren't namespaced, which means they can be referenced by any HPA in any namespace in the cluster. ScalingSchedule objects have the exact same fields and behavior, but can be referenced only by HPAs in the same namespace. The schedules can have the type Repeating or OneTime.

This example configuration will generate the following result: at 2021-10-02T08:08:08+02:00, for 30 minutes, a metric with the value of 100 will be returned. Every Monday, Wednesday and Friday, starting at 15:45 (Berlin time), a metric with the value of 120 will be returned for 10 minutes. It's not the case in this example, but if multiple schedules overlap in time, the biggest value is returned.

Check the CRDs definitions (ScalingSchedule, ClusterScalingSchedule) for a better understanding of the possible fields and their behavior.

An HPA can reference the deployed ClusterScalingSchedule object as in this example:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: "myapp-hpa"
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 1
  maxReplicas: 15
  metrics:
  - type: Object
    object:
      describedObject:
        apiVersion: zalando.org/v1
        kind: ClusterScalingSchedule
        name: "scheduling-event"
      metric:
        name: "scheduling-event"
      target:
        type: AverageValue
        averageValue: "10"

The name of the metric is equal to the name of the referenced object. The target.averageValue in this example is set to 10. This value will be used by the HPA controller to define the desired number of pods, based on the metric obtained (check the HPA algorithm details for more context). This HPA configuration explicitly says that each pod of this application supports 10 units of the ClusterScalingSchedule metric. Multiple applications can share the same ClusterScalingSchedule or ScalingSchedule event and have a different number of pods based on its target.averageValue configuration.

In our specific example, at 2021-10-02T08:08:08+02:00, as the metric has the value 100, this application will scale to 10 pods (100/10). Every Monday, Wednesday and Friday, starting at 15:45 (Berlin time), the application will scale to 12 pods (120/10). Both scale-ups will last at least the configured duration of the schedules. After that, regular HPA scale-down behavior applies.

Note that these pod counts consider only these custom metrics; the normal HPA behavior still applies, such as: in case of multiple metrics the biggest computed number of pods is used, HPA max and min replica configuration, autoscaling policies, etc.

kube-metrics-adapter's People

Contributors

adutchak-x, aermakov-zalando, affo, arjunrn, csenol, daftping, demoncoder95, dependabot-preview[bot], dependabot[bot], dilinade, doyshinda, gargravarr, jfuechsl, jiri-pinkava, johnzheng1975, jonathanbeber, katyanna, linki, lucastt, mikkeloscar, miniland1333, muaazsaleem, njuettner, owengo, perploug, prune998, szuecs, tanersener, tomaspinho, zaklawrencea


kube-metrics-adapter's Issues

Support for AWS collector from external cluster

I need to scale based on SQS queue length, but my cluster will be running on GKE (or locally), and I'm not 100% sure if this use case is supported yet. Can I just add the AWS credentials to the environment and get it working externally?

Prometheus collector should create External target.type

Expected Behavior

I think it makes more sense for the Prometheus collector to create an External metric instead of an Object. The syntax fits better. Also, although it can provide metrics about an Object, I doubt it will be used much in that way.

Actual Behavior

It currently provides the metrics of type Object. So a dummy target has to be added and for Kubernetes <1.12 targetAverageValue requires a workaround.

v2beta2 HPA w/prometheus query: 404 logs & <unknown>

Expected Behavior

I was creating a test HPA (v2beta2) with a simple Prometheus query (to test, not to get a valid scaling value), and it doesn't seem to get populated in the custom metrics. The HPA value, and thus the target, is stuck at the <unknown> value.

Actual Behavior

I'm getting a 404 error in the kube-metrics-adapter logs (below) and the HPA is never getting updated.
I'm probably missing something obvious but couldn't find enough examples to figure it out.

Steps to Reproduce the Problem

HPA:

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-test-v2beta2
  namespace: test
  annotations: 
    metric-config.external.prometheus-query.prometheus/prometheus-server: http://prometheus.kube-system-monitoring.svc.cluster.local:9090
    metric-config.external.prometheus-query.prometheus/rabbitmq_queue_messages_ready: rabbitmq_queue_messages_ready{queue="task_priority_apiworker"}
spec:
  scaleTargetRef:
   #apiVersion: extensions/v1beta1
    apiVersion: apps/v1
    kind: Deployment
    name: apiworker
  minReplicas: 1
  maxReplicas: 15
  metrics:
  - type: External
    external:
      metric:
        name: prometheus-query
        selector:
          matchLabels:
            query-name: rabbitmq_queue_messages_ready
      target:
        type: AverageValue
        averageValue: 1

Logs:

2020-05-20T21:04:16+00:00 localhost docker/k8s_kube-metrics-adapter_kube-metrics-adapter-8474d5546f-nv7wk_kube-system_abe181ea-ad95-46d6-9fa1-662d486a1cf8_0[51530]: time="2020-05-20T21:04:16Z" level=info msg="Looking for HPAs" provider=hpa
2020-05-20T21:04:16+00:00 localhost docker/k8s_kube-metrics-adapter_kube-metrics-adapter-8474d5546f-nv7wk_kube-system_abe181ea-ad95-46d6-9fa1-662d486a1cf8_0[51530]: time="2020-05-20T21:04:16Z" level=info msg="Removing previously scheduled metrics collector: {hpa-test-v2beta2 test}" provider=hpa
2020-05-20T21:04:16+00:00 localhost docker/k8s_kube-metrics-adapter_kube-metrics-adapter-8474d5546f-nv7wk_kube-system_abe181ea-ad95-46d6-9fa1-662d486a1cf8_0[51530]: time="2020-05-20T21:04:16Z" level=info msg="Adding new metrics collector: *collector.PrometheusCollector" provider=hpa
2020-05-20T21:04:16+00:00 localhost docker/k8s_kube-metrics-adapter_kube-metrics-adapter-8474d5546f-nv7wk_kube-system_abe181ea-ad95-46d6-9fa1-662d486a1cf8_0[51530]: time="2020-05-20T21:04:16Z" level=info msg="Found 1 new/updated HPA(s)" provider=hpa
2020-05-20T21:04:16+00:00 localhost docker/k8s_kube-metrics-adapter_kube-metrics-adapter-8474d5546f-nv7wk_kube-system_abe181ea-ad95-46d6-9fa1-662d486a1cf8_0[51530]: time="2020-05-20T21:04:16Z" level=error msg="Failed to collect metrics: client_error: client error: 404" provider=hpa
2020-05-20T21:04:16+00:00 localhost docker/k8s_kube-metrics-adapter_kube-metrics-adapter-8474d5546f-nv7wk_kube-system_abe181ea-ad95-46d6-9fa1-662d486a1cf8_0[51530]: time="2020-05-20T21:04:16Z" level=info msg="Collected 0 new metric(s)" provider=hpa

kubectl get hpa --all-namespaces -o wide

NAMESPACE   NAME               REFERENCE                     TARGETS             MINPODS   MAXPODS   REPLICAS   AGE
test        hpa-test-v2beta2   Deployment/warden-apiworker   <unknown>/1 (avg)   1         15        15         8m

kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/ | jq

{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "custom.metrics.k8s.io/v1beta1",
  "resources": []
}

Specifications

  • Version: K8s 1.18.2
  • Platform: Ubuntu 20.04
  • Subsystem: Prometheus 2.18.1

Cheers! :)

Metrics from an outside service?

So this works great with prometheus but we autoscale so much that 100 nodes might go up and down per hour and thousands of pods autoscale up and down and prometheus is having memory issues that's OOMing.

I am wondering if this can hit an outside service/hostname for json metrics. Like I want to autoscale deployment sidekiq with metrics from service sidekiq-collector

If not currently, are you open to PRs to add this feature in? I like the ability to just remove Prometheus altogether for some HPAs if possible.

too many /metrics requests

We flood skipper instances with quite an amount of calls per second:

3.121.220.107 - - [27/Feb/2019:18:46:47 +0000] "GET /metrics HTTP/1.1" 200 18 "-" "Go-http-client/2.0" 2 sample-custom-metrics-autoscaling-e2e.teapot.zalan.do - -
35.156.0.203 - - [27/Feb/2019:18:46:47 +0000] "GET /metrics HTTP/1.1" 200 18 "-" "Go-http-client/2.0" 2 sample-custom-metrics-autoscaling-e2e.teapot.zalan.do - -
3.122.97.48 - - [27/Feb/2019:18:46:47 +0000] "GET /metrics HTTP/1.1" 200 18 "-" "Go-http-client/2.0" 2 sample-custom-metrics-autoscaling-e2e.teapot.zalan.do - -
3.121.220.107 - - [27/Feb/2019:18:46:47 +0000] "GET /metrics HTTP/1.1" 200 18 "-" "Go-http-client/2.0" 2 sample-custom-metrics-autoscaling-e2e.teapot.zalan.do - -
35.156.0.203 - - [27/Feb/2019:18:46:47 +0000] "GET /metrics HTTP/1.1" 200 18 "-" "Go-http-client/2.0" 3 sample-custom-metrics-autoscaling-e2e.teapot.zalan.do - -
3.122.97.48 - - [27/Feb/2019:18:46:47 +0000] "GET /metrics HTTP/1.1" 200 18 "-" "Go-http-client/2.0" 2 sample-custom-metrics-autoscaling-e2e.teapot.zalan.do - -
3.121.220.107 - - [27/Feb/2019:18:46:47 +0000] "GET /metrics HTTP/1.1" 200 18 "-" "Go-http-client/2.0" 2 sample-custom-metrics-autoscaling-e2e.teapot.zalan.do - -
35.156.0.203 - - [27/Feb/2019:18:46:47 +0000] "GET /metrics HTTP/1.1" 200 18 "-" "Go-http-client/2.0" 3 sample-custom-metrics-autoscaling-e2e.teapot.zalan.do - -
3.122.97.48 - - [27/Feb/2019:18:46:47 +0000] "GET /metrics HTTP/1.1" 200 18 "-" "Go-http-client/2.0" 2 sample-custom-metrics-autoscaling-e2e.teapot.zalan.do - -

I guess we have to decouple metrics gathering from the hpa instances in some way.

Adapter failing to scrape Prometheus metrics

I've been trying to work through this example of HPA with istio metrics via Prometheus, but the example is pretty old; the docker container was built quite some time ago. The example does work correctly, however, upgrading the image to v0.1.5 through the banzai helm chart yields an error:

 time="2020-06-09T22:55:49Z" level=info msg="Event(v1.ObjectReference{Kind:\"HorizontalPodAutoscaler\", Namespace:\"test\", Name:\"podinfo\", UID:\"26036502-3285-4359-a2f4-606dbc5d32ed\", APIVersion:\"autoscaling/v2beta2\", ResourceVersion:\"73634\", FieldPath:\"\"}): type: 'Warning' reason: 'CreateNewMetricsCollector' Failed to create new metrics collector: no plugin found for {Object {istio-requests-total nil}}"

I've verified that this metric exists in Prometheus. I've also tried building v0.1.2 myself, but dependencies fail.

Expected Behavior

the adapter scrapes metrics from istio-prometheus, and provides them to the metrics server.

Actual Behavior

The adapter does not scrape metrics from istio's prometheus.

Steps to Reproduce the Problem

  1. Using the v0.1.1 banzai helm chart, install the adapter, setting the image to v0.1.5 in the deployment
  2. from the example repository, deploy the test pod and load simulator
  3. Tail the logs from the adapter, observing the errors.

Specifications

  • Version: v0.1.5 (built from source on internal image)
  • Platform: AWS
  • Subsystem: Kubernetes 1.15.7

Additional Logs:

I0609 22:51:18.975548       6 serving.go:306] Generated self-signed cert (apiserver.local.config/certificates/apiserver.crt, apiserver.local.config/certificates/apiserver.key)
W0609 22:51:19.424396       6 configmap_cafile_content.go:102] unable to load initial CA bundle for: "client-ca::kube-system::extension-apiserver-authentication::client-ca-file" due to: configmap "extension-apiserver-authentication" not found
W0609 22:51:19.424432       6 configmap_cafile_content.go:102] unable to load initial CA bundle for: "client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file" due to: configmap "extension-apiserver-authentication" not found
time="2020-06-09T22:51:19Z" level=info msg="Looking for HPAs" provider=hpa
I0609 22:51:19.438024       6 configmap_cafile_content.go:205] Starting client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0609 22:51:19.438030       6 configmap_cafile_content.go:205] Starting client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0609 22:51:19.438047       6 shared_informer.go:197] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0609 22:51:19.438047       6 shared_informer.go:197] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0609 22:51:19.438163       6 dynamic_serving_content.go:129] Starting serving-cert::apiserver.local.config/certificates/apiserver.crt::apiserver.local.config/certificates/apiserver.key
I0609 22:51:19.439245       6 secure_serving.go:178] Serving securely on [::]:443
I0609 22:51:19.439269       6 tlsconfig.go:219] Starting DynamicServingCertificateController
time="2020-06-09T22:51:19Z" level=info msg="Removing previously scheduled metrics collector: {istio-ingressgateway istio-system}" provider=hpa
time="2020-06-09T22:51:19Z" level=info msg="Removing previously scheduled metrics collector: {istio-pilot istio-system}" provider=hpa
time="2020-06-09T22:51:19Z" level=info msg="Removing previously scheduled metrics collector: {istio-policy istio-system}" provider=hpa
time="2020-06-09T22:51:19Z" level=info msg="Removing previously scheduled metrics collector: {istio-telemetry istio-system}" provider=hpa
time="2020-06-09T22:51:19Z" level=info msg="Removing previously scheduled metrics collector: {podinfo test}" provider=hpa
time="2020-06-09T22:51:19Z" level=info msg="Found 5 new/updated HPA(s)" provider=hpa
time="2020-06-09T22:51:19Z" level=info msg="Event(v1.ObjectReference{Kind:\"HorizontalPodAutoscaler\", Namespace:\"test\", Name:\"podinfo\", UID:\"26036502-3285-4359-a2f4-606dbc5d32ed\", APIVersion:\"autoscaling/v2beta2\", ResourceVersion:\"73538\", FieldPath:\"\"}): type: 'Warning' reason: 'CreateNewMetricsCollector' Failed to create new metrics collector: no plugin found for {Object {istio-requests-total nil}}"
I0609 22:51:19.538176       6 shared_informer.go:204] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0609 22:51:19.538176       6 shared_informer.go:204] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
time="2020-06-09T22:51:49Z" level=info msg="Looking for HPAs" provider=hpa
time="2020-06-09T22:51:49Z" level=info msg="Removing previously scheduled metrics collector: {podinfo test}" provider=hpa
time="2020-06-09T22:51:49Z" level=info msg="Found 1 new/updated HPA(s)" provider=hpa
time="2020-06-09T22:51:49Z" level=info msg="Event(v1.ObjectReference{Kind:\"HorizontalPodAutoscaler\", Namespace:\"test\", Name:\"podinfo\", UID:\"26036502-3285-4359-a2f4-606dbc5d32ed\", APIVersion:\"autoscaling/v2beta2\", ResourceVersion:\"73634\", FieldPath:\"\"}): type: 'Warning' reason: 'CreateNewMetricsCollector' Failed to create new metrics collector: no plugin found for {Object {istio-requests-total nil}}"

Helm Chart

We should have a published helm chart for this project.

SQS hpa throwing region error

Expected Behavior

HPA should get the given sqs queue metric.

Actual Behavior

time="2020-05-19T07:28:31Z" level=info msg="Event(v1.ObjectReference{Kind:\"HorizontalPodAutoscaler\", Namespace:\"test-app\", Name:\"test-hpa\", UID:\"916f0d14-999d-11ea-9292-0aead91950b0\", APIVersion:\"autoscaling/v2beta2\", ResourceVersion:\"88872595\", FieldPath:\"\"}): type: 'Warning' reason: 'CreateNewMetricsCollector' Failed to create new metrics collector: the metric region: us-west-2 is not configured"

Here is my hpa config:

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: test-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: test-service
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metric:
        name: sqs-queue-length
        selector:
          matchLabels:
            queue-name: TEST_QUEUE
            region: us-west-2
      target:
        averageValue: "30"
        type: AverageValue

Specifications

  • Version: 1.14
  • Platform: centos
  • Subsystem:

Docker image tags

Hi,
Is there a list of all currently available tags of the kube-metrics-adapter docker image? Currently I only find a latest tag.

I want to check out the kube-metrics-adapter, but I'd like to stay on a version for a while instead of always pulling the latest without knowing when the image was built.

I also tried stable, but it seems that it doesn't exist.

Comparison of Pod Collector to PodMonitor

While the difference between the Prometheus Collector and the Prometheus Adapter is documented, I am interested in a comparison of the Pod Collector and the experimental PodMonitor.

Recently, the prometheus operator got the concept of a PodMonitor: prometheus-operator/prometheus-operator#2566

The primary use case of a PodMonitor is scraping Pods directly without needing an explicit association to any specific service, as in sidecars shared by several services. For example, I want to generically scrape all Istio sidecars.

not sure how to install the adapter

Expected Behavior

Actual Behavior

I'm not sure how to install the adapter. make was working fine, but when I start the kube-metrics-adapter binary I get an error. Is there more documentation somewhere which I missed reading?

$ make

$ KUBERNETES_SERVICE_HOST=100.63.0.10 KUBERNETES_SERVICE_PORT=443 ./kube-metrics-adapter

panic: failed to get delegated authentication kubeconfig: failed to get delegated authentication kubeconfig: open /var/run/secrets/kubernetes.io/serviceaccount/token: no such file or directory

goroutine 1 [running]:
main.main()
	/Users/geri/Work/cmv-tfserving/provisioning/zalando-kube-metrics-adapter/kube-metrics-adapter/main.go:40 +0x10e

is there an official helm chart to install the adapter - which works with aws Kops?
what I found is this:
https://hub.helm.sh/charts/banzaicloud-stable/kube-metrics-adapter
can I use this helm chart to install the zalando metrics-server?

currently I have this metrics-server installed; will I need to uninstall it to use the zalando metrics-server?

$ helm ls
NAME          	REVISION	UPDATED                 	STATUS  	CHART               	APP VERSION	NAMESPACE   
istio         	2       	Fri Dec 27 18:33:51 2019	DEPLOYED	istio-1.4.0         	1.4.0      	istio-system
kube2iam      	1       	Mon Dec 16 16:36:50 2019	DEPLOYED	kube2iam-2.1.0      	0.10.7     	kube-system 
metrics-server	1       	Mon Dec 16 14:55:56 2019	DEPLOYED	metrics-server-2.8.8	0.3.5      	kube-system 

Steps to Reproduce the Problem

Specifications

  • Version:
git clone https://zalando-incubator/kube-metrics-adapter
commit 4412e3dca486658a04bc2585e1843c170da85e21 (HEAD -> master, origin/master, origin/HEAD)
  • Platform:
    I locally have Mac osx

  • Subsystem:
    aws kops with k8s

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.6", GitCommit:"96fac5cd13a5dc064f7d9f4f23030a6aeface6cc", GitTreeState:"clean", BuildDate:"2019-08-19T11:13:49Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.6", GitCommit:"7015f71e75f670eb9e7ebd4b5749639d42e20079", GitTreeState:"clean", BuildDate:"2019-11-13T11:11:50Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}
$ helm version
Client: &version.Version{SemVer:"v2.16.1", GitCommit:"bbdfe5e7803a12bbdf97e94cd847859890cf4050", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.16.1", GitCommit:"bbdfe5e7803a12bbdf97e94cd847859890cf4050", GitTreeState:"clean"}

Support older autoscaling versions

Currently, the adapter only tries to discover autoscaling/v2beta2 HPAs. This means it can only be used on newer kubernetes versions that provide autoscaling/v2beta2. It would be nice to have support for the old autoscaling/v2beta1 API as well.

Alternatively, the README could have a support matrix showing which versions of the adapter support which versions of kubernetes.

[Metric-Config] Support for passing query params to pod metrics endpoint

Expected Behavior

Ability to add URL query parameters (foo=bar&baz=bop) to the metrics config json-path to be passed on to the pod's metric endpoint.

Actual Behavior

There is no explicit option to pass query parameters to the endpoint defined when collecting pod metrics. Appending the query to the path config option results in Go correctly percent encoding the ? because it is a reserved character:

metric-config.pods.concurrency-average.json-path/path: /metrics?foo=bar

will generate the following URL:

http://<podIP>:<port>/metrics%3Ffoo=bar

This results in the application not using the ? as a delimiter and usually returning a 400 (or ignoring the query params).

Steps to Reproduce the Problem

  1. Setup a custom API that expects query params
  2. Setup json-path to add those query params: ...json-path/path: /metrics?foo=bar
  3. Observe that the k8s autoscaler reports: unable to get metric <metric>: unable to fetch metrics from custom metrics API: the server is currently unable to handle the request (get pods.custom.metrics.k8s.io *)

Specifications

  • Version: 0.0.5

Failed to collect metrics: client_error: client error: 401

Hi all,
I have set up the kube-metrics-adapter,
but when it wants to get the metrics it says:
Failed to collect metrics: client_error: client error: 401
What is the problem?

myapp and the hpa are in namespace A,
kube-metrics-adapter is in namespace kube-system,
and the prometheus adapter is in namespace cattle-promethus

unable to get metric requests-per-second: no metrics returned from custom metrics API

Expected Behavior

The HPA should be showing current number of http_requests

Actual Behavior

HPA Shows Unknown as status.
Metrics: ( current / target )
"requests-per-second" on pods: / 10
Warning FailedGetPodsMetric 5s horizontal-pod-autoscaler unable to get metric requests-per-second: no metrics returned from custom metrics API
Warning FailedComputeMetricsReplicas 5s horizontal-pod-autoscaler failed to get pods metric value: unable to get metric requests-per-second: no metrics returned from custom metrics API

Steps to Reproduce the Problem

  1. Followed installation steps by running the docs/
  2. Created an HPA as in the readme and replaced deployment name with a deployment in the name space on which HPA is present.

Scale pods with metrics from other pods

I have implemented HPA based on the metrics server (based on memory and CPU), however it is not suitable for my use case and I'd like to scale the pods (logstash) based on Kafka consumer lag. I have the metrics in Prometheus, which is running inside the k8s cluster under a different namespace.

Prometheus query for getting the metrics:
sum(kafka_consumergroup_lag{instance="xxxx:9308",consumergroup=~"solr-large-consumer"}) by (consumergroup, topic)

Currently, I am able to get these metrics from some other applications in JSON format. Can I scale the logstash pods using metrics from different pods? (See the sketch below.)
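
A minimal sketch of how this could be wired with the external Prometheus collector from this repo, assuming an external prometheus-query metric; the query, names, and target value below are illustrative only:

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: logstash
  annotations:
    # metric-config.<metricType>.<metricName>.<collectorName>/<configKey>
    metric-config.external.prometheus-query.prometheus/kafka-consumer-lag: |
      scalar(sum(kafka_consumergroup_lag{consumergroup="solr-large-consumer"}))
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: logstash
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metric:
        name: prometheus-query
        selector:
          matchLabels:
            query-name: kafka-consumer-lag
      target:
        type: AverageValue
        averageValue: "100"

Because the metric is external (served by Prometheus) rather than a per-pod metric, it does not have to originate from the pods being scaled.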

kube-metrics-adapter (with pod collector enabled) returns stale data

Hello,

I have been testing kube-metrics-adapter (with pod collector enabled) and noticed that it returns stale data during specific conditions, which results in HPA misbehaving. The conditions occur when HPA scales up a deployment due to high load, then load subsides and it scales the deployment back to min-replica. However, if I increase the load again that should result in a scale-up, the HPA would not scale up because the data it is acting upon is stale. When it queries kube-metrics-adapter for custom metrics, the adapter is returning old data that contains non-existent pods and old values. This results in incorrect averages for the HPA.

I tracked it down to the in-memory cache that kube-metrics-adapter uses to track pods and metrics. The metrics have a TTL of 15 mins and the GC is run every 10 mins. The issue mentioned above disappears after old metrics are cleaned up by GC. To fix the issue permanently, I modified the TTL to 30s and GC to run every minute. The HPA is able to act on new conditions much better. This should cover other edge cases, such as if the HPA is in the middle of a scale-down and there is new load, it should be able to act on it faster.

I think this is a simple solution that solves the issue for now, though there is a better way of dealing with it: the cache can be updated on each run, so the HPA can have an up-to-date view of the cluster without a delay of up to 1 minute.

Expected Behavior

kube-metrics-adapter doesn't return stale data to HPA that contains information about non-existent pods and their values.

Actual Behavior

kube-metrics-adapter returns stale data to HPA due to 15min TTL and 10 mins garbage collection. This causes HPA to calculate wrong averages based on custom metrics.

Steps to Reproduce the Problem

  1. Let HPA scale up a deployment based on a custom metric from pod (pod collector)
  2. Let HPA scale down the deployment due to load subsiding
  3. As soon as the deployment hits minReplicas, increase the load. Notice that the averages in the HPA are not correct. Specifically, the average is the current value of the custom metric divided by the number of pods that the HPA previously scaled up to.

Specifications

  • Version: v0.0.5

I have a fix for this, but would like to discuss if it is appropriate. I propose to make the TTL and the GC interval configurable and set the default values to 30s and 1m respectively. A better long-term solution could be keeping the cache up-to-date on each run.

[Metric-Config] JsonPath support string indexers for json prop keys with dots

Expected Behavior

When setting up metric configs using JSONPath, regular JSONPath expressions are expected to be supported.

Actual Behavior

The actual behaviour is that, when using string indexers to access JSON props with dots in them, the JSONPath lib is not able to extract the value.

https://github.com/zalando-incubator/kube-metrics-adapter/blob/master/pkg/collector/json_path_collector.go#L13

Steps to Reproduce the Problem

  1. Set up a json-path/json-key value with something like: $.histograms['service.pool.Usage']mean
  2. On k8s the autoscaler will report: Failed to create new metrics collector: format '' not supported

Specifications

  • Version: latest

There is no custom metrics api after kube-metrics-adapter installation

Expected Behavior

The custom metrics API can be accessed after installing kube-metrics-adapter.

Actual Behavior

$ kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq .
Error from server (ServiceUnavailable): the server is currently unable to handle the request

Steps to Reproduce the Problem

  1. Deploy the adapter using the yaml file in the docs directory
  2. kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq .

Specifications

  • Version: kubernetes 1.14
  • Platform: centos 7.3
  • Subsystem:

unable to get metric error : using prometheus-collector

Expected Behavior

I have a Prometheus query that can be used for scaling; the expected behavior is to scale up using the metric.

Actual Behavior

Error: Warning FailedGetObjectMetric 3s (x2 over 18s) horizontal-pod-autoscaler unable to get metric jvm-memory-bytes-used: Service on dev event-service/unable to fetch metrics from custom metrics API: the server could not find the metric jvm-memory-bytes-used for services event-service

Steps to Reproduce the Problem

HPA yaml file:

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: aggregation-hazelcast
  namespace: dev
  annotations:
    # metric-config.<metricType>.<metricName>.<collectorName>/<configKey>
    metric-config.object.jvm-memory-bytes-used.prometheus/query: |
      scalar((sum(jvm_memory_bytes_used{area="heap"}) ) / (sum(jvm_memory_bytes_max{area="heap"}))
    metric-config.object.jvm-memory-bytes-used.prometheus/per-replica: "true"
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: hazelcast
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Object
    object:
      metricName: jvm-memory-bytes-used
      target:
        apiVersion: v1
        kind: Service
        name: event-service
      targetValue: 80Mi # this will be treated as targetAverageValue

Specifications

  • Version: kops version 1.12.5
  • Platform: aws
    Am I missing anything? I provided the exact query, which works in Prometheus.

Templates from docs folder give some 403 errors

I was able to bring up the pod with the template; the only thing I changed was the Prometheus URL
to: --prometheus-server=http://prometheus.monitoring.svc.cluster.local:9090

I see a bunch of errors:

time="2018-12-07T19:26:19Z" level=error msg="horizontalpodautoscalers.autoscaling is forbidden: User \"system:serviceaccount:kube-system:custom-metrics-apiserver\" cannot list horizontalpodautoscalers.autoscaling at the cluster scope" provider=hpa

E1207 19:26:21.268780 1 writers.go:149] apiserver was unable to write a JSON response: expected pointer, but got nil
E1207 19:26:21.268808 1 status.go:64] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"expected pointer, but got nil"}

Custom and external metrics are empty
(venv) [ec2-user@ip-10-230-198-112 qaas]$ kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" --kubeconfig=config-demo | jq
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "custom.metrics.k8s.io/v1beta1",
  "resources": []
}

(venv) [ec2-user@ip-10-230-198-112 qaas]$ kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" --kubeconfig=config-demo | jq
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "external.metrics.k8s.io/v1beta1",
  "resources": []
}

I have a Prometheus server running and it has metrics if I do
curl http://prometheus.monitoring.svc.cluster.local:9090/metrics

Specifications

  • Version: K8s 1.11.0
  • Platform: KOPS k8s cluster

Create a fullstack example - aka 1 minute guide

As seen in #30, the docs should have an example that deploys a full stack that shows the capability of this tool.
The example could include an HPA using requests-per-second and also deploy the components required to make it work.

Feature request: add Prometheus URL as annotation in HorizontalPodAutoscaler definition

In our cluster we have several Prometheus instances. I was wondering if it is possible to add an (optional) Prometheus URL to the HorizontalPodAutoscaler definition, for example:

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
  annotations:
    metric-config.external.prometheus-query.prometheus/url: prometheus.namespace2.svc:9090
    # metric-config.<metricType>.<metricName>.<collectorName>/<configKey>
    # <configKey> == query-name
    metric-config.external.prometheus-query.prometheus/processed-events-per-second: |
      scalar(sum(rate(event-service_events_count{application="event-service",processed="true"}[1m])))
spec:

If no such annotation is defined, use the one provided via the command-line arguments.

bug: hpa provider only provides values for the last sqs metric in list

Expected Behavior

With two external metrics defined, the hpa provider should provide metrics for both, but only returns values for the second one.

Here's the hpa manifest:

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: foo
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: foo
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metricName: sqs-queue-length
      metricSelector:
        matchLabels:
          queue-name: bar
          region: us-west-2
      targetAverageValue: 600
  - type: External
    external:
      metricName: sqs-queue-length
      metricSelector:
        matchLabels:
          queue-name: baz
          region: us-west-2
      targetAverageValue: 600

Actual Behavior

The collector finds bar and baz on the first run, but then only provides values for baz after the first round of metrics is collected.

kube-metrics-adapter log:

I0226 03:42:58.340810       1 serving.go:273] Generated self-signed cert (apiserver.local.config/certificates/apiserver.crt, apiserver.local.config/certificates/apiserver.key)
time="2019-02-26T03:43:04Z" level=info msg="Looking for HPAs" provider=hpa
I0226 03:43:04.343794       1 serve.go:96] Serving securely on [::]:443
time="2019-02-26T03:43:05Z" level=info msg="Found 0 new/updated HPA(s)" provider=hpa
time="2019-02-26T03:43:35Z" level=info msg="Looking for HPAs" provider=hpa
.
.
.
time="2019-02-26T03:46:05Z" level=info msg="Found 0 new/updated HPA(s)" provider=hpa
time="2019-02-26T03:46:35Z" level=info msg="Looking for HPAs" provider=hpa
time="2019-02-26T03:46:36Z" level=info msg="Adding new metrics collector: *collector.AWSSQSCollector" provider=hpa
time="2019-02-26T03:46:36Z" level=info msg="Adding new metrics collector: *collector.AWSSQSCollector" provider=hpa
time="2019-02-26T03:46:36Z" level=info msg="Found 1 new/updated HPA(s)" provider=hpa
time="2019-02-26T03:46:36Z" level=info msg="stopping collector runner..."
time="2019-02-26T03:46:36Z" level=info msg="Collected 1 new metric(s)" provider=hpa
time="2019-02-26T03:46:36Z" level=info msg="Collected new external metric 'sqs-queue-length' (3476) [queue-name=bar,region=us-west-2]" provider=hpa
time="2019-02-26T03:46:36Z" level=info msg="Collected 1 new metric(s)" provider=hpa
time="2019-02-26T03:46:36Z" level=info msg="Collected new external metric 'sqs-queue-length' (8271) [queue-name=baz,region=us-west-2]" provider=hpa
time="2019-02-26T03:47:06Z" level=info msg="Looking for HPAs" provider=hpa
time="2019-02-26T03:47:06Z" level=info msg="Found 0 new/updated HPA(s)" provider=hpa
time="2019-02-26T03:47:36Z" level=info msg="Looking for HPAs" provider=hpa
time="2019-02-26T03:47:36Z" level=info msg="Found 0 new/updated HPA(s)" provider=hpa
time="2019-02-26T03:47:36Z" level=info msg="Collected 1 new metric(s)" provider=hpa
time="2019-02-26T03:47:36Z" level=info msg="Collected new external metric 'sqs-queue-length' (8268) [queue-name=baz,region=us-west-2]" provider=hpa
time="2019-02-26T03:48:06Z" level=info msg="Looking for HPAs" provider=hpa
time="2019-02-26T03:48:06Z" level=info msg="Found 0 new/updated HPA(s)" provider=hpa
time="2019-02-26T03:48:36Z" level=info msg="Looking for HPAs" provider=hpa
time="2019-02-26T03:48:36Z" level=info msg="Found 0 new/updated HPA(s)" provider=hpa
time="2019-02-26T03:48:37Z" level=info msg="Collected 1 new metric(s)" provider=hpa
time="2019-02-26T03:48:37Z" level=info msg="Collected new external metric 'sqs-queue-length' (8268) [queue-name=baz,region=us-west-2]" provider=hpa

Steps to Reproduce the Problem

  1. create 2 sqs queues, bar and baz in us-west-2 (or change the yaml)
  2. create a deployment, foo
  3. apply the above hpa manifest
  4. wait a few minutes
  5. check logs for kube-metrics-adapter and kubectl get hpa foo -o yaml to see that no values are being collected for the bar queue

Specifications

  • Version: latest
  • Platform: gke
  • Subsystem: aws collector

CreateNewMetricsCollector : Failed to create new metrics collector

Expected Behavior

The external and custom metrics APIs should give some output, but there is none.

Actual Behavior

kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "external.metrics.k8s.io/v1beta1",
  "resources": []
}

Steps to Reproduce the Problem

  1. Using the manifests in https://github.com/zalando-incubator/kubernetes-on-aws/tree/dev/cluster/manifests/kube-metrics-adapter (with some updates to config parameters), I am able to run the kube-metrics-adapter, but the logs of the container constantly show the warnings below:
time="2019-01-22T10:32:24Z" level=info msg="Looking for HPAs" provider=hpa
time="2019-01-22T10:32:24Z" level=info msg="Found 1 new/updated HPA(s)" provider=hpa
time="2019-01-22T10:32:24Z" level=info msg="Event(v1.ObjectReference{Kind:\"HorizontalPodAutoscaler\", Namespace:\"default\", Name:\"myservice-k8\", UID:\"ba31da13-1e10-11e9-a841-06f3d10b141c\", APIVersion:\"autoscaling/v2beta1\", ResourceVersion:\"36437\", FieldPath:\"\"}): type: 'Warning' reason: 'CreateNewMetricsCollector' Failed to create new metrics collector: format '' not supported"

I cannot find any documentation or examples explaining what is expected here.

Using the example in https://github.com/zalando-incubator/kube-ingress-aws-controller/tree/master/deploy, I am able to successfully run kube-ingress-aws-controller.

My Prometheus stack detects the skipper metrics, and I can see values in my Prometheus graphs such as:

skipper_response_duration_seconds_sum{application="skipper",code="200",component="ingress",controller_revision_hash="1183544327",instance="172.20.4.24:9911",job="kubernetes-pods",kubernetes_namespace="kube-system",kubernetes_pod_name="skipper-ingress-mpm6g",method="GET",pod_template_generation="3",route="kube_default__myservice_k8__myservice_k8_stag_mydomain_com____myservice_k8_0__lb_group"}

Now I am expecting/assuming that when I run kube-metrics-adapter with the configuration mentioned above, I should be able to see some metrics under /apis/external.metrics.k8s.io/v1beta1 or /apis/custom.metrics.k8s.io/v1beta1,
but it's empty.

Remember myservice is NOT running any metrics exporter.

Specifications

  • Version:1.11
  • Platform: EKS
  • Subsystem:

Docs on using HTTPS with Pod collector

Expected Behavior

Add as an annotation:
metric-config.pods.requests-per-second.json-path/protocol: "https"
to have the pod collector reach out to an https pod endpoint.

Actual Behavior

I couldn't find a way to achieve this. If there is any way to have the HPA use HTTPS, please let me know. Thank you.
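
A minimal sketch of what this could look like, assuming the json-path collector's scheme config key (used with "http" in another report in this document) also accepts "https"; this is an assumption, not verified here:

metadata:
  annotations:
    metric-config.pods.requests-per-second.json-path/port: "443"
    # assumption: the scheme config key accepts "https" as well as "http"
    metric-config.pods.requests-per-second.json-path/scheme: "https"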

Nonsensical target value compared to the value in Prometheus

Hi all,
I have set up the external metrics.
In Prometheus the query, e.g. X, shows the value 34,
and I have set the averageValue to 15 in the HPA:

target:
        type: AverageValue
        averageValue: 15

It works correctly, and for the value 34 the HPA creates 3 replicas,
but in Kubernetes the HPA shows 10334m/15.
How is 34 being converted to 10334m?

This is my full HPA:

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  namespace: test
  name: XX
  annotations:
prometheus.default.svc.cluster.local:9090
    metric-config.external.prometheus-query.prometheus/X: |
      X
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: test
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metric:
        name: prometheus-query
        selector:
          matchLabels:
            query-name: X
      target:
        type: AverageValue
        averageValue: 15

Hard to distinguish which metrics query is in play ... maybe this is a feature rather than a defect

Expected Behavior

I have multiple external metrics in play similar to below,

    metric-config.object.istio-requests-error-rate.prometheus/query: |
      sum(rate(istio_requests_total{destination_workload="go-demo-7-primary",
               destination_workload_namespace="go-demo-7", reporter="destination",response_code=~"5.*"}[1m])) 
      / 
      sum(rate(istio_requests_total{destination_workload="go-demo-7-primary", 
               destination_workload_namespace="go-demo-7",reporter="destination"}[1m]) > 0)* 100
      or
      sum(rate(istio_requests_total{destination_workload="go-demo-7-primary", 
               destination_workload_namespace="go-demo-7",reporter="destination"}[1m])) > bool 0 * 100

    metric-config.external.prometheus-query.prometheus/istio-requests-per-replica: |
      sum(rate(istio_requests_total{destination_service_name="go-demo-7",destination_workload_namespace="go-demo-7",
                reporter="destination"}[1m])) 
      /
      count(count(container_memory_usage_bytes{namespace="go-demo-7",pod_name=~"go-demo-7-primary.*"}) by (pod_name))
    metric-config.external.prometheus-query.prometheus/istio-requests-average-resp-time: |
      sum(rate(istio_request_duration_seconds_sum{destination_workload="go-demo-7-primary", reporter="destination"}[1m])) 
      / 
      sum(rate(istio_request_duration_seconds_count{destination_workload="go-demo-7-primary", reporter="destination"}[1m]) > 0)
      or
      sum(rate(istio_request_duration_seconds_count{destination_workload="go-demo-7-primary", reporter="destination"}[1m])) 
      > bool 0

spec:
  maxReplicas: 10
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: go-demo-7-primary
  metrics:
  - type: External
    external:
      metric:
        name: prometheus-query
        selector:
          matchLabels:
            query-name: istio-requests-per-replica
      target:
        type: AverageValue
        value: 5
  - type: External
    external:
      metric:
        name: prometheus-query
        selector:
          matchLabels:
            query-name: istio-requests-average-resp-time
      target:
        type: Value
        value: 100m
  - type: Object
    object:
      metric:
        name: istio-requests-error-rate
      describedObject:
        apiVersion: v1 #make sure you check the api version on the targeted resource using get command.
        kind: Pod # note Pod can be used as resource kind for kube-metrics-adapter.
        name: go-demo-7-primary
      target:
        type: Value
        value: 5

Then when I describe the HPA, I would expect to be able to distinguish each metric in play, as below:

  "istio-requests-per-replica" (target value):                                    0 / 5
  "istio-requests-average-resp-time" (target value):                                    0 / 100m
  "istio-requests-error-rate" on Pod/go-demo-7-primary (target value):  0 / 5

Actual Behavior

I see the following in the HPA, which makes it very hard to distinguish which external metric is in play:

Metrics:                                                                ( current / target )
  "prometheus-query" (target value):                                    0 / 5
  "prometheus-query" (target value):                                    0 / 100m
  "istio-requests-error-rate" on Pod/go-demo-7-primary (target value):  0 / 5

Steps to Reproduce the Problem

  1. Install kube-metrics-adapter v0.1.1
  2. Create an HPA with the above metrics
  3. Describe the HPA

Specifications

  • Version:
    --set image.repository=registry.opensource.zalan.do/teapot/kube-metrics-adapter \
    --set image.tag=v0.1.0

  • Platform:
    AWS KOPS
  • Subsystem:

Possible to write an HPA using a nested Prometheus metric?

Expected Behavior

  • Prometheus is set up and recording metrics from collectd (via a collectd-to-Prometheus exporter), and can be queried like this, which gives:

curl prometheus.monitoring.svc:9090/api/v1/query?query="collectd_statsd_derive_total"

{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [{
      "metric": {
        "__name__": "collectd_statsd_derive_total",
        "exported_instance": "collectd-statsd-6f5475dd46-92fx4",
        "instance": "collectd-statsd.monitoring.svc:9103",
        "job": "collect-statsd",
        "statsd": "prometheus_target_kbasync_lenh_sec"
      },
      "value": [1544343252.741, "6"]
    }, {
      "metric": {
        "__name__": "collectd_statsd_derive_total",
        "exported_instance": "collectd-statsd-6f5475dd46-92fx4",
        "instance": "collectd-statsd.monitoring.svc:9103",
        "job": "collect-statsd",
        "statsd": "targetkb_kbasync_lenh_sec"
      },
      "value": [1544343252.741, "3"]
    }, {
      "metric": {
        "__name__": "collectd_statsd_derive_total",
        "exported_instance": "collectd-statsd-6f5475dd46-92fx4",
        "instance": "collectd-statsd.monitoring.svc:9103",
        "job": "collect-statsd",
        "statsd": "targetkb_kbasync_lenh_sec_g"
      },
      "value": [1544343252.741, "240"]
    }, {
      "metric": {
        "__name__": "collectd_statsd_derive_total",
        "exported_instance": "collectd-statsd-6f5475dd46-92fx4",
        "instance": "collectd-statsd.monitoring.svc:9103",
        "job": "collect-statsd",
        "statsd": "targetkb_kbasync_lenh_sec_gg"
      },
      "value": [1544343252.741, "3"]
    }, {
      "metric": {
        "__name__": "collectd_statsd_derive_total",
        "exported_instance": "collectd-statsd-6f5475dd46-92fx4",
        "instance": "collectd-statsd.monitoring.svc:9103",
        "job": "collect-statsd",
        "statsd": "targetkb_kbasync_lenh_sec_ggg"
      },
      "value": [1544343252.741, "3"]
    }]
  }
}

My metric name is stored in the statsd key in the above response returned by Prometheus,
e.g. "statsd": "targetkb_kbasync_lenh_sec_ggg".

I want to write an HPA using the metric name from the statsd key.
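
One possible approach, sketched with the external Prometheus collector annotations used elsewhere in this document; the query name, label filter, and target value are illustrative and not verified against this setup:

metadata:
  annotations:
    metric-config.external.prometheus-query.prometheus/kbasync-lenh-sec: |
      scalar(rate(collectd_statsd_derive_total{statsd="targetkb_kbasync_lenh_sec_ggg"}[1m]))
spec:
  metrics:
  - type: External
    external:
      metricName: prometheus-query
      metricSelector:
        matchLabels:
          query-name: kbasync-lenh-sec
      targetAverageValue: 10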

bug: crash when removing previously scheduled metrics collector when the Prometheus external metric is null

Expected Behavior

Don't crash

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: "hpa-foo"
  namespace: default
  annotations:
    metric-config.external.prometheus-query.prometheus/ingress-foo-requests-per-second: |
      scalar(max(rate(nginx_ingress_controller_requests{ingress="ingress-foo", status="200"}[30s])))
spec:
  scaleTargetRef:
    apiVersion: extensions/v1beta1
    kind: Deployment
    name: "foo"
  minReplicas: 1
  maxReplicas: 2
  metrics:
  - type: External
    external:
      metricName: prometheus-query
      metircSelector:
        matchLabels:
          query-name: "ingress-foo-requests-per-second"
      targetAverageValue: 10k
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 80
  - type: Resource
    resource:
      name: memory
      targetAverageUtilization: 90

Actual Behavior

Crash.
The bug is that it tries to remove a collector that does not exist.
Maybe there should be a check for whether the collector exists or not.
This is only reproduced when the Prometheus metric is nil.

kube-metrics-adapter log:

time="2019-07-26T09:24:29Z" level=info msg="Removing previously scheduled metrics collector: {hpa-2679 default}" provider=hpa
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x16ae6ea]

goroutine 128 [running]:
github.com/zalando-incubator/kube-metrics-adapter/pkg/collector.ParseHPAMetrics(0xc000228a80, 0xc0008aef50, 0x8, 0xc0008aef58, 0x7, 0x1)
        /workspace/pkg/collector/collector.go:212 +0x53a
github.com/zalando-incubator/kube-metrics-adapter/pkg/provider.(*HPAProvider).updateHPAs(0xc0000fef60, 0x1c148e0, 0xc0000fef60)
        /workspace/pkg/provider/hpa.go:145 +0x88c
github.com/zalando-incubator/kube-metrics-adapter/pkg/provider.(*HPAProvider).Run(0xc0000fef60, 0x1efd9a0, 0xc000581cc0)
        /workspace/pkg/provider/hpa.go:97 +0xfb
created by github.com/zalando-incubator/kube-metrics-adapter/pkg/server.AdapterServerOptions.RunCustomMetricsAdapterServer
        /workspace/pkg/server/start.go:219 +0x8d7

Steps to Reproduce the Problem

  1. Create an HPA containing a Prometheus metric
  2. Make sure the Prometheus metric is nil

Specifications

  • Version: latest
  • Platform: k8s 1.13.7

Doesn't play well with another collector

Expected Behavior

kube-metrics-adapter should behave well in an environment with a different external or custom metrics provider.

Actual Behavior

kube-metrics-adapter will attempt to modify HPAs that have external or custom metrics even when it is not configured as the metrics provider for that metric type.

Logs:

time="2019-11-26T10:00:23Z" level=info msg="Event(v1.ObjectReference{Kind:\"HorizontalPodAutoscaler\", Namespace:\"development\", Name:\"task-worker\", UID:\"8e7d8d12-0c84-11ea-8744-0a26919e3b5a\", APIVersion:\"autoscaling/v2beta2\", ResourceVersion:\"92543736\", FieldPath:\"\"}): type: 'Warning' reason: 'CreateNewMetricsCollector' Failed to create new metrics collector: no plugin found for {External {aws.sqs.sqs_messages_visible &LabelSelector{MatchLabels:map[string]string{aws_account_name: development,queue_name: tasks,region: eu-west-1,},MatchExpressions:[]LabelSelectorRequirement{},}}}"

Event on respective HPA:

  Type     Reason                     Age                   From                  Message
  ----     ------                     ----                  ----                  -------
  Warning  CreateNewMetricsCollector  94s (x1947 over 16h)  kube-metrics-adapter  Failed to create new metrics collector: no plugin found for {External {aws.sqs.sqs_messages_visible &LabelSelector{MatchLabels:map[string]string{aws_account_name: development,queue_name: tasks,region: eu-west-1,},MatchExpressions:[]LabelSelectorRequirement{},}}}

Steps to Reproduce the Problem

  1. Install this project only as a Custom Metrics Provider
  2. Install datadog-cluster-agent as an External Metrics Provider, for instance.
  3. Configure an HPA targeting external metrics

Specifications

  • Version: v0.0.5
  • Platform: Kubernetes
  • Subsystem: HPA provider

kube-metrics-adapter using multiple prometheus in different namespaces

Expected Behavior

When you have 2 HorizontalPodAutoscalers using annotations to specify the prometheus server (e.g. metric-config.external.prometheus-query.prometheus/prometheus-server: http://my-prometheus), metrics should be read from the prometheus server in the annotation.
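
For reference, a sketch of the relevant annotations on the two HPAs (server URLs and query names here are placeholders):

# HPA using the Prometheus instance in namespace-a
metadata:
  annotations:
    metric-config.external.prometheus-query.prometheus/prometheus-server: http://prometheus.namespace-a.svc
    metric-config.external.prometheus-query.prometheus/my-query-a: |
      scalar(some_metric_a)
---
# HPA using the Prometheus instance in namespace-b
metadata:
  annotations:
    metric-config.external.prometheus-query.prometheus/prometheus-server: http://prometheus.namespace-b.svc
    metric-config.external.prometheus-query.prometheus/my-query-b: |
      scalar(some_metric_b)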

Actual Behavior

The prometheus-server defined in the kube-metrics-adapter flags is used instead of the value in the metric-config.external.prometheus-query.prometheus/prometheus-server annotation.

Steps to Reproduce the Problem

  1. Deploy two prometheus servers in two different namespaces.
  2. Create two HPAs using the metric-config.external.prometheus-query.prometheus/prometheus-server annotation. Each HPA should use one of the prometheus servers.
  3. Verify that the annotation is not honored.

Specifications

  • Version: v0.0.4
  • Platform: kubernetes 1.14.6
  • Subsystem: -

Support for multiple external http metrics.

Hi,

Can we somehow get metrics from multiple external HTTP endpoints?
Something like below:

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: echo-server
  annotations:
    # metric-config.<metricType>.<metricName>.<collectorName>/<configKey>
    metric-config.external.http-0.json/json-key: "$stats"
    metric-config.external.http-0.json/endpoint: "http://www.mocky.io/v2/5eb3fd280e0000670008180c"
    metric-config.external.http-1.json/json-key: "$stats"
    metric-config.external.http-1.json/endpoint: "http://www.mocky.io/v2/5eb3fd280e0000670008180c"
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: echo-server
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metric:
        name: http-0
        selector:
          matchLabels:
            identifier: kafka_queue_length
      target:
        averageValue: 1
        type: AverageValue
  - type: External
    external:
      metric:
        name: http-1
        selector:
          matchLabels:
            identifier: kafka_ingest_rate
      target:
        averageValue: 2
        type: AverageValue

Prometheus External Metric not working

Expected Behavior

kube-metrics-adapter should be able to create a metrics collector based on the example in the readme.

Actual Behavior

kube-metrics-adapter logs Failed to create new metrics collector: no plugin found for {External prometheus-query} for my HPA.

Steps to Reproduce the Problem

  1. I'm running kube-metrics-adapter from the banzaicloud chart at https://github.com/banzaicloud/banzai-charts/tree/master/kube-metrics-adapter
  2. I have the following HPA:
---
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: selenium-node-chrome-hpa
  namespace: selenium-grid-demo
  annotations:
    "metric-config.external.prometheus-query.prometheus/selenium-grid-node-chrome-ready-count": |-
      sum(selenium_grid_node_ready{app="selenium-node-chrome",namespace="selenium-grid-demo"})
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: selenium-node-chrome
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: External
    external:
      metricName: prometheus-query
      metricSelector:
        matchLabels:
          query-name: selenium-grid-node-chrome-ready-count
      targetAverageValue: 5
  3. Verified that I'm not getting any metrics:
$ kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "external.metrics.k8s.io/v1beta1",
  "resources": []
}

Can't retrieve Custom Metrics

Expected Behavior

Custom metrics exposed by the Spring Boot Actuator metrics endpoint are read

Actual Behavior

Unable to get metrics

Steps to Reproduce the Problem

  1. Deploy spring boot application in k8s using the dockerhub image https://cloud.docker.com/repository/docker/vinaybalamuru/spring-boot-hpa
kubectl run springboot-webapp --image=vinaybalamuru/spring-boot-hpa  --requests=cpu=200m --limits=cpu=500m --expose --port=7070
  2. Install kube-metrics-adapter (custom and external metrics, etc.). The adapter currently works for external Prometheus queries.
sh-4.2$ kubectl get apiservices |grep metrics
v1beta1.custom.metrics.k8s.io          kube-system/kube-metrics-adapter   True        18h
v1beta1.external.metrics.k8s.io        kube-system/kube-metrics-adapter   True        5d23h
v1beta1.metrics.k8s.io                 kube-system/metrics-server         True        9d
sh-4.2$ kubectl get po -n kube-system |grep metrics
kube-metrics-adapter-55cbb64dc9-g22pb   1/1     Running   0          3d
metrics-server-9cb648b76-4j6tf          1/1     Running   2          9d

  3. Create an HPA associated with springboot-webapp, intended to scale on the load-per-minute metric
cat <<EOF | kubectl apply -f -
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: springboot-custom-hpa
  namespace: default
  labels:
    application: custom-metrics-consumer
  annotations:
    # metric-config.<metricType>.<metricName>.<collectorName>/<configKey>
    metric-config.pods.load-per-min.json-path/json-key: "$.measurements[:1].value"
    metric-config.pods.load-per-min.json-path/path: /actuator/metrics/system.load.average.1m
    metric-config.pods.load-per-min.json-path/port: "7070"
    metric-config.pods.load-per-min.json-path/scheme: "http"
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: springboot-webapp
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metricName: load-per-min
      targetAverageValue: 1
EOF
sh-4.2$ kubectl get hpa                  
NAME                                     REFERENCE                      TARGETS             MINPODS   MAXPODS   REPLICAS   AGE
springboot-custom-hpa                    Deployment/springboot-webapp   <unknown>/1         1         10        2          18h
sh-4.2$ 

  4. Describe the HPA. It looks like the HPA couldn't read the custom metrics:
$ kubectl describe hpa springboot-custom-hpa 
Name:                      springboot-custom-hpa
Namespace:                 default
Labels:                    application=custom-metrics-consumer
Annotations:               kubectl.kubernetes.io/last-applied-configuration:
                             {"apiVersion":"autoscaling/v2beta1","kind":"HorizontalPodAutoscaler","metadata":{"annotations":{"metric-config.pods.load-per-min.json-path...
                           metric-config.pods.load-per-min.json-path/json-key: $.measurements[:1].value
                           metric-config.pods.load-per-min.json-path/path: /actuator/metrics/system.load.average.1m
                           metric-config.pods.load-per-min.json-path/port: 7070
                           metric-config.pods.load-per-min.json-path/scheme: http
CreationTimestamp:         Wed, 21 Aug 2019 17:03:16 -0500
Reference:                 Deployment/springboot-webapp
Metrics:                   ( current / target )
  "load-per-min" on pods:  <unknown> / 1
Min replicas:              1
Max replicas:              10
Deployment pods:           2 current / 2 desired
Conditions:
  Type            Status  Reason               Message
  ----            ------  ------               -------
  AbleToScale     True    SucceededGetScale    the HPA controller was able to get the target's current scale
  ScalingActive   False   FailedGetPodsMetric  the HPA was unable to compute the replica count: unable to get metric load-per-min: no metrics returned from custom metrics API
  ScalingLimited  True    TooFewReplicas       the desired replica count is increasing faster than the maximum scale rate
Events:
  Type     Reason                        Age                  From                       Message
  ----     ------                        ----                 ----                       -------
  Warning  FailedGetPodsMetric           53m (x9 over 55m)    horizontal-pod-autoscaler  unable to get metric queue-length: no metrics returned from custom metrics API
  Warning  FailedComputeMetricsReplicas  53m (x9 over 55m)    horizontal-pod-autoscaler  Invalid metrics (1 invalid out of 1), last error was: failed to get object metric value: unable to get metric queue-length: no metrics returned from custom metrics API
  Warning  FailedComputeMetricsReplicas  53m (x3 over 53m)    horizontal-pod-autoscaler  Invalid metrics (1 invalid out of 1), last error was: failed to get object metric value: unable to get metric load-per-min: no metrics returned from custom metrics API
  Warning  FailedGetPodsMetric           54s (x209 over 53m)  horizontal-pod-autoscaler  unable to get metric load-per-min: no metrics returned from custom metrics API

I know that I can get into a pod and retrieve metrics from the service endpoint, so I'm not sure where things are going wrong.
E.g.:

/ # wget -q -O- http://springboot-webapp.default.svc.cluster.local:7070/actuator/metrics/system.load.average.1m
{
  "name" : "system.load.average.1m",
  "description" : "The sum of the number of runnable entities queued to available processors and the number of runnable entities running on the available processors averaged over a period of time",
  "baseUnit" : null,
  "measurements" : [ {
    "statistic" : "VALUE",
    "value" : 0.0439453125
  } ],
  "availableTags" : [ ]
}/ # 

Specifications

  • Version:
  kube-metrics-adapter:
    Container ID:  docker://5b32f88953acbb290a22279899d10f853dabc1d7d9a0dd582c9652e32ee61295
    Image:         registry.opensource.zalan.do/teapot/kube-metrics-adapter:latest
    Image ID:      docker-pullable://registry.opensource.zalan.do/teapot/kube-metrics-adapter@sha256:12bd1e57c8448ed935a876959b143827b1f8f070d7
  • Platform:
    K8s 1.15
  • Subsystem:

Crash in Kube Controller Manager

Actual Behavior

kube-metrics-adapter produces a panic in the Kube Controller Manager

Steps to Reproduce the Problem

  1. Configured Secret with serving.crt and serving.key that has common_name kube-metrics-adapter and alt_names kube-metrics-adapter.kube-system,kube-metrics-adapter.kube-system.svc,kube-metrics-adapter.kube-system.svc.cluster.local
  2. Create all the files as described in docs, but remove --skipper-ingress-metrics and --aws-external-metrics, and instead add --tls-cert-file=/var/run/serving-cert/serving.crt and --tls-private-key-file=/var/run/serving-cert/serving.key
  3. Now you can see the following logs for the kube-metrics-adapter deployment:
time="2019-03-24T15:24:44Z" level=info msg="Looking for HPAs" provider=hpa
I0324 15:24:44.156768       1 serve.go:96] Serving securely on [::]:443
time="2019-03-24T15:24:44Z" level=info msg="Found 6 new/updated HPA(s)" provider=hpa
time="2019-03-24T15:24:44Z" level=info msg="Event(v1.ObjectReference{Kind:\"HorizontalPodAutoscaler\", Namespace:\"microservices\", Name:\"ms-1\", UID:\"f861d0ed-4ce2-11e9-b661-025217a46e36\", APIVersion:\"autoscaling/v2beta1\", ResourceVersion:\"2296595\", FieldPath:\"\"}): type: 'Warning' reason: 'CreateNewMetricsCollector' Failed to create new metrics collector: format '' not supported"
E0324 15:24:51.243240       1 writers.go:149] apiserver was unable to write a JSON response: expected pointer, but got nil
E0324 15:24:51.243267       1 status.go:64] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"expected pointer, but got nil"}
  4. And for Kube-Controller-Manager:
I0324 15:24:37.134887       6 replica_set.go:477] Too few replicas for ReplicaSet kube-system/kube-metrics-adapter-f6cb64c84, need 1, creating 1
I0324 15:24:37.138649       6 event.go:221] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"kube-system", Name:"kube-metrics-adapter", UID:"ef50f882-4e48-11e9-b661-025217a46e36", APIVersion:"apps/v1", ResourceVersion:"2300163", FieldPath:""}): type: 'Normal' reason: 'ScalingReplicaSet' Scaled up replica set kube-metrics-adapter-f6cb64c84 to 1
I0324 15:24:37.150869       6 deployment_controller.go:484] Error syncing deployment kube-system/kube-metrics-adapter: Operation cannot be fulfilled on deployments.apps "kube-metrics-adapter": the object has been modified; please apply your changes to the latest version and try again
I0324 15:24:37.161632       6 event.go:221] Event(v1.ObjectReference{Kind:"ReplicaSet", Namespace:"kube-system", Name:"kube-metrics-adapter-f6cb64c84", UID:"ef530b22-4e48-11e9-b661-025217a46e36", APIVersion:"apps/v1", ResourceVersion:"2300164", FieldPath:""}): type: 'Normal' reason: 'SuccessfulCreate' Created pod: kube-metrics-adapter-f6cb64c84-jt5sf
W0324 15:24:37.731700       6 garbagecollector.go:647] failed to discover some groups: map[custom.metrics.k8s.io/v1beta1:the server is currently unable to handle the request external.metrics.k8s.io/v1beta1:the server is currently unable to handle the request]
I0324 15:24:51.233692       6 horizontal.go:777] Successfully updated status for istio-telemetry-autoscaler
E0324 15:24:51.245935       6 runtime.go:69] Observed a panic: &runtime.TypeAssertionError{_interface:(*runtime._type)(0x334f2e0), concrete:(*runtime._type)(0x39236e0), asserted:(*runtime._type)(0x390c980), missingMethod:""} (interface conversion: runtime.Object is *v1.Status, not *v1beta2.MetricValueList)
/workspace/anago-v1.13.4-beta.0.55+c27b913fddd1a6/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:76
/workspace/anago-v1.13.4-beta.0.55+c27b913fddd1a6/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:65
/workspace/anago-v1.13.4-beta.0.55+c27b913fddd1a6/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51
/usr/local/go/src/runtime/asm_amd64.s:522
/usr/local/go/src/runtime/panic.go:513
/usr/local/go/src/runtime/iface.go:248
/usr/local/go/src/runtime/iface.go:258
/workspace/anago-v1.13.4-beta.0.55+c27b913fddd1a6/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/metrics/pkg/client/custom_metrics/versioned_client.go:269
/workspace/anago-v1.13.4-beta.0.55+c27b913fddd1a6/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/metrics/pkg/client/custom_metrics/multi_client.go:136
/workspace/anago-v1.13.4-beta.0.55+c27b913fddd1a6/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/controller/podautoscaler/metrics/rest_metrics_client.go:113
/workspace/anago-v1.13.4-beta.0.55+c27b913fddd1a6/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/controller/podautoscaler/replica_calculator.go:158
/workspace/anago-v1.13.4-beta.0.55+c27b913fddd1a6/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/controller/podautoscaler/horizontal.go:347
/workspace/anago-v1.13.4-beta.0.55+c27b913fddd1a6/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/controller/podautoscaler/horizontal.go:274
/workspace/anago-v1.13.4-beta.0.55+c27b913fddd1a6/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/controller/podautoscaler/horizontal.go:550
/workspace/anago-v1.13.4-beta.0.55+c27b913fddd1a6/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/controller/podautoscaler/horizontal.go:318
/workspace/anago-v1.13.4-beta.0.55+c27b913fddd1a6/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/controller/podautoscaler/horizontal.go:210
/workspace/anago-v1.13.4-beta.0.55+c27b913fddd1a6/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/controller/podautoscaler/horizontal.go:198
/workspace/anago-v1.13.4-beta.0.55+c27b913fddd1a6/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/controller/podautoscaler/horizontal.go:164
/workspace/anago-v1.13.4-beta.0.55+c27b913fddd1a6/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133
/workspace/anago-v1.13.4-beta.0.55+c27b913fddd1a6/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134
/workspace/anago-v1.13.4-beta.0.55+c27b913fddd1a6/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88
/usr/local/go/src/runtime/asm_amd64.s:1333
panic: interface conversion: runtime.Object is *v1.Status, not *v1beta2.MetricValueList [recovered]
	panic: interface conversion: runtime.Object is *v1.Status, not *v1beta2.MetricValueList

goroutine 3060 [running]:
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
	/workspace/anago-v1.13.4-beta.0.55+c27b913fddd1a6/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:58 +0x108
panic(0x3315300, 0xc006b86030)
	/usr/local/go/src/runtime/panic.go:513 +0x1b9
k8s.io/kubernetes/vendor/k8s.io/metrics/pkg/client/custom_metrics.(*namespacedMetrics).GetForObjects(0xc006916660, 0x0, 0x0, 0x3a216bc, 0x3, 0x3efb4a0, 0xc005cb7e00, 0xc005614e20, 0x19, 0x3efb500, ...)
	/workspace/anago-v1.13.4-beta.0.55+c27b913fddd1a6/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/metrics/pkg/client/custom_metrics/versioned_client.go:269 +0x4c6
k8s.io/kubernetes/vendor/k8s.io/metrics/pkg/client/custom_metrics.(*multiClientInterface).GetForObjects(0xc004c448b0, 0x0, 0x0, 0x3a216bc, 0x3, 0x3efb4a0, 0xc005cb7e00, 0xc005614e20, 0x19, 0x3efb500, ...)
	/workspace/anago-v1.13.4-beta.0.55+c27b913fddd1a6/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/metrics/pkg/client/custom_metrics/multi_client.go:136 +0x118
k8s.io/kubernetes/pkg/controller/podautoscaler/metrics.(*customMetricsClient).GetRawMetric(0xc0003f44d0, 0xc005614e20, 0x19, 0xc0047b20a0, 0xd, 0x3efb4a0, 0xc005cb7e00, 0x3efb500, 0x6747b10, 0x0, ...)
	/workspace/anago-v1.13.4-beta.0.55+c27b913fddd1a6/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/controller/podautoscaler/metrics/rest_metrics_client.go:113 +0x102
k8s.io/kubernetes/pkg/controller/podautoscaler.(*ReplicaCalculator).GetMetricReplicas(0xc00095af40, 0x1, 0x2710, 0xc005614e20, 0x19, 0xc0047b20a0, 0xd, 0x3efb4a0, 0xc005cb7e00, 0x3efb500, ...)
	/workspace/anago-v1.13.4-beta.0.55+c27b913fddd1a6/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/controller/podautoscaler/replica_calculator.go:158 +0xb0
k8s.io/kubernetes/pkg/controller/podautoscaler.(*HorizontalController).computeStatusForPodsMetric(0xc0005f2780, 0x1, 0xc0064a8918, 0x4, 0x0, 0xc004d34cc0, 0x0, 0x0, 0xc0056ed340, 0x3efb4a0, ...)
	/workspace/anago-v1.13.4-beta.0.55+c27b913fddd1a6/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/controller/podautoscaler/horizontal.go:347 +0xd8
k8s.io/kubernetes/pkg/controller/podautoscaler.(*HorizontalController).computeReplicasForMetrics(0xc0005f2780, 0xc0056ed340, 0xc00690e500, 0xc004802cf0, 0x1, 0x1, 0x11, 0x3af8df0, 0x3d, 0x0, ...)
	/workspace/anago-v1.13.4-beta.0.55+c27b913fddd1a6/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/controller/podautoscaler/horizontal.go:274 +0xb10
k8s.io/kubernetes/pkg/controller/podautoscaler.(*HorizontalController).reconcileAutoscaler(0xc0005f2780, 0xc0002dab30, 0xc0046c96a0, 0x12, 0x0, 0x0)
	/workspace/anago-v1.13.4-beta.0.55+c27b913fddd1a6/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/controller/podautoscaler/horizontal.go:550 +0x1678
k8s.io/kubernetes/pkg/controller/podautoscaler.(*HorizontalController).reconcileKey(0xc0005f2780, 0xc0046c96a0, 0x12, 0x30439c0, 0xc004798750, 0x0)
	/workspace/anago-v1.13.4-beta.0.55+c27b913fddd1a6/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/controller/podautoscaler/horizontal.go:318 +0x278
k8s.io/kubernetes/pkg/controller/podautoscaler.(*HorizontalController).processNextWorkItem(0xc0005f2780, 0x3eb7700)
	/workspace/anago-v1.13.4-beta.0.55+c27b913fddd1a6/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/controller/podautoscaler/horizontal.go:210 +0xdf
k8s.io/kubernetes/pkg/controller/podautoscaler.(*HorizontalController).worker(0xc0005f2780)
	/workspace/anago-v1.13.4-beta.0.55+c27b913fddd1a6/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/controller/podautoscaler/horizontal.go:198 +0x2b
k8s.io/kubernetes/pkg/controller/podautoscaler.(*HorizontalController).worker-fm()
	/workspace/anago-v1.13.4-beta.0.55+c27b913fddd1a6/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/controller/podautoscaler/horizontal.go:164 +0x2a
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc004819940)
	/workspace/anago-v1.13.4-beta.0.55+c27b913fddd1a6/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x54
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc004819940, 0x3b9aca00, 0x0, 0x1, 0xc000394900)
	/workspace/anago-v1.13.4-beta.0.55+c27b913fddd1a6/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134 +0xbe
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.Until(0xc004819940, 0x3b9aca00, 0xc000394900)
	/workspace/anago-v1.13.4-beta.0.55+c27b913fddd1a6/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88 +0x4d
created by k8s.io/kubernetes/pkg/controller/podautoscaler.(*HorizontalController).Run
	/workspace/anago-v1.13.4-beta.0.55+c27b913fddd1a6/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/controller/podautoscaler/horizontal.go:164 +0x1c6

Specifications

  • Version: registry.opensource.zalan.do/teapot/kube-metrics-adapter:latest
  • Platform: Kubernetes 1.13.4
  • Subsystem: Ubuntu

OpenAPI spec does not exist

Hi guys! More of an annoyance than anything else really, but it would be nice if we could stop the API server from spewing logs about "OpenAPI spec does not exist" for the v1beta1.external.metrics.k8s.io and v1beta1.custom.metrics.k8s.io objects.

Expected Behavior

API server logs not full of messages like:

1 controller.go:114] loading OpenAPI spec for "v1beta1.external.metrics.k8s.io" failed with: OpenAPI spec does not exist
1 controller.go:114] loading OpenAPI spec for "v1beta1.custom.metrics.k8s.io" failed with: OpenAPI spec does not exist

Actual Behavior

API server logs full of messages like that.

I guess the best course of action would be to expose OpenAPI specs for those 2 objects - found this commit elsewhere which seemed to detail a similar annoyance.

Anyway, thanks for kube-metrics-adapter!

Feature request - configure interval between scraping

Today

Today it seems that metrics are collected every 60 seconds (based on what I see in the logs from kube-metrics-adapter).
This means that there is as much as a 60-second delay before the HPA gets a new metric to act on.
For services that need to scale up fast, that is adding a lot of delay.

Request

It would help to be able to configure this, so that HPA works faster.
In our case we collect CPU metrics every 15 seconds, so querying Prometheus every 15 seconds or even more often would be useful.
You of course need to be aware of not running heavy queries often, but in our case the query is very lightweight and we only have one HPA configured.

Specifications

  • Version: :latest
  • Platform: Kubernetes
  • Subsystem: Prometheus

v0.1.3 requires additional rbac permissions

We just upgraded from 0.1.0 to 0.1.3 and started seeing errors in our logs like:

E0413 20:07:05.845509 1 reflector.go:153] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:105: Failed to list *v1.ConfigMap: configmaps "extension-apiserver-authentication" is forbidden: User "system:serviceaccount:kube-system:custom-metrics-apiserver" cannot list resource "configmaps" in API group "" in the namespace "kube-system"

I'm not sure what changed, but adding this apiGroups section to the rules for custom-metrics-resource-collector fixed it for us:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: custom-metrics-resource-collector
rules:
...
- apiGroups:
  - ""
  resources:
  - configmaps
  verbs:
  - list
  - watch

Expected Behavior

There should be no errors/warnings in the logs.

Actual Behavior

See above logs. This caused the HorizontalPodAutoscalers to fail.

Steps to Reproduce the Problem

  1. Create installation following the configuration in the docs/ folder.
  2. Use either AWS SQS queue or HTTP collector (following docs in the README)
  3. Look at the logs on the kube-metrics-adapter pod.

Specifications

  • Version: Kubernetes v.15.17

Metrics server seems to fail to pull metrics on eks

Expected Behavior

Metrics should be pulled.

Actual Behavior

  "istio-requests-error-rate" on Pod/go-demo-7-app (target value):        <unknown>/ 100m
  "istio-requests-max-resp-time" on Pod/go-demo-7-app (target value):      <unknown> / 500m
  "istio-requests-average-resp-time" on Pod/go-demo-7-app (target value):  <unknown> / 250m
  "istio-requests-per-replica" on Pod/go-demo-7-app (target value):        <unknown> / 5

Steps to Reproduce the Problem

annotations:
   metric-config.object.istio-requests-error-rate.prometheus/query: |
     (sum(rate(istio_requests_total{destination_workload=~"go-demo-7-app.*",
              destination_workload_namespace="go-demo-7", reporter="destination",response_code=~"5.*"}[5m])) 
     / 
     sum(rate(istio_requests_total{destination_workload=~"go-demo-7-app.*", 
              destination_workload_namespace="go-demo-7",reporter="destination"}[5m]))) > 0 or on() vector(0)
   metric-config.object.istio-requests-per-replica.prometheus/query: |
     sum(rate(istio_requests_total{destination_workload=~"go-demo-7-app.*",destination_workload_namespace="go-demo-7",
               reporter="destination"}[5m])) 
     /
     count(count(container_memory_usage_bytes{namespace="go-demo-7",pod=~"go-demo-7-app.*"}) by (pod))
   metric-config.object.istio-requests-average-resp-time.prometheus/query: | 
     (sum(rate(istio_request_duration_milliseconds_sum{destination_workload=~"go-demo-7-app.*", reporter="destination"}[5m])) 
     / 
     sum(rate(istio_request_duration_milliseconds_count{destination_workload=~"go-demo-7-app.*", reporter="destination"}[5m])))/1000 > 0 or on() vector(0)
   metric-config.object.istio-requests-max-resp-time.prometheus/query: |
     histogram_quantile(0.95, 
                 sum(irate(istio_request_duration_milliseconds_bucket{destination_workload=~"go-demo-7-app.*"}[1m])) by (le))/1000 > 0  or on() vector(0)

Specifications

  • Version:
kubectl version
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.6", GitCommit:"dff82dc0de47299ab66c83c626e08b245ab19037", GitTreeState:"clean", BuildDate:"2020-07-16T00:04:31Z", GoVersion:"go1.14.4", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"16+", GitVersion:"v1.16.8-eks-fd1ea7", GitCommit:"fd1ea7c64d0e3ccbf04b124431c659f65330562a", GitTreeState:"clean", BuildDate:"2020-05-28T19:06:00Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"linux/amd64"}
  • Platform:
    EKS
  • Subsystem:
    --set image.repository=registry.opensource.zalan.do/teapot/kube-metrics-adapter \
    --set image.tag=v0.1.5

The logs show:

1 reflector.go:307] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:105: Failed to watch *v1.ConfigMap: unknown (get configmaps)
kube-metrics-adapter-7b79498f9-7b8rt kube-metrics-adapter E0717 03:49:16.970700       1 reflector.go:307] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:105: Failed to watch *v1.ConfigMap: unknown (get configmaps)
kube-metrics-adapter-7b79498f9-7b8rt kube-metrics-adapter E0717 03:49:17.972675       1 reflector.go:307] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:105: Failed to watch *v1.ConfigMap: unknown (get configmaps)
kube-metrics-adapter-7b79498f9-7b8rt kube-metrics-adapter E0717 03:49:17.973213

It works fine with:

    --set image.repository=registry.opensource.zalan.do/teapot/kube-metrics-adapter \
    --set image.tag=v0.1.0

Errors in logs when attempting to query a label value which doesn't exist

When using an external metric with a query such as the following:

http_requests_total{status="5xx"}

If no 5xx events have been registered yet, we get the following in the logs:

level=error msg="Failed to collect metrics: query 'http_requests_total{status=\"5xx\"}\n' did not result a valid response" provider=hpa

Perhaps this should be a warning instead of an error, considering that this label value can be updated at any time.
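
As a query-side workaround sketch, following the "or on() vector(0)" pattern used in other reports in this document, the query can be made to return 0 instead of an empty result (the annotation name is illustrative):

metric-config.external.prometheus-query.prometheus/http-5xx: |
  sum(http_requests_total{status="5xx"}) or on() vector(0)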

ParseHPAMetrics in package collector may modify MatchLabels for external metric

Expected Behavior

The HPAProvider caches HPA resources and recreates associated metrics collectors, should the HPA resource change. I expect it to not recreate collectors if the HPA didn't change.

Actual Behavior

HPAs with external metric collectors (e.g. Prometheus) are modified at each step (call of updateHPAs), thus bypassing the caching logic which causes the recreation of the metrics collectors.

This happens because of the way the metric config is created for external metrics in ParseHPAMetrics.
The Config field is set to the address of the MatchLabels map in the HPA resource object: https://github.com/zalando-incubator/kube-metrics-adapter/blob/master/pkg/collector/collector.go#L216
Later (https://github.com/zalando-incubator/kube-metrics-adapter/blob/master/pkg/collector/collector.go#L227) this map is modified, thus modifying the HPA resource object.

The fix, in my opinion, would be to perform a copy of the MatchLabels map to the Config field.

Steps to Reproduce the Problem

  1. Create a HPA with a Prometheus external metric.
  2. Observe the kube-metrics-adapter logs. They recreate the metrics collector at each step. E.g.
time="2019-11-07T05:53:46Z" level=info msg="Looking for HPAs" provider=hpa
time="2019-11-07T05:53:46Z" level=info msg="Removing previously scheduled metrics collector: {xxx yyy}" provider=hpa
time="2019-11-07T05:53:46Z" level=info msg="Adding new metrics collector: *collector.PrometheusCollector" provider=hpa
time="2019-11-07T05:53:46Z" level=info msg="Found 1 new/updated HPA(s)" provider=hpa
time="2019-11-07T05:53:46Z" level=info msg="stopping collector runner..."
time="2019-11-07T05:53:46Z" level=info msg="Collected 1 new metric(s)" provider=hpa
time="2019-11-07T05:53:46Z" level=info msg="Collected new external metric 'prometheus-query' (99) [test=scalar(vector(99.0)),query-name=test]" provider=hpa

Specifications

  • Version: v0.0.4
  • Platform: linux amd64
  • Subsystem: package collector
