googlecloudplatform / k8s-stackdriver
License: Apache License 2.0
I have a GKE cluster in one Google Cloud project, and Google Cloud resources (Pub/Sub, Bigtable) in a different project.
I would like to autoscale via the External Metrics API based on those resources.
Is it possible to pass the project ID to the adapter? My main project already receives Stackdriver metrics from all the other projects.
Thank you.
Hi, I have this DaemonSet:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: {{ template "prometheus.fullname" . }}
  labels:
    app: {{ template "prometheus.name" . }}
    chart: {{ template "qpipeline.chart" . }}
    release: {{ .Release.Name }}
    heritage: {{ .Release.Service }}
spec:
  selector:
    matchLabels:
      monitor: kamon-to-prometheus
  template:
    metadata:
      labels:
        monitor: kamon-to-prometheus
    spec:
      containers:
      - name: prometheus-to-sd
        image: {{ .Values.stackdriver.image }}
        command: ["/monitor", "--stackdriver-prefix={{ .Values.stackdriver.prefix }}",
          "--dynamic-source=mix:http://:{{ .Values.stackdriver.port }}{{ .Values.stackdriver.endpoint }}?podIdLabel=kamon-to-prometheus&namespaceIdLabel=default",
          "--namespace-id=default"]
It runs in the default namespace and should resolve pods also in the default namespace, but I cannot force it and I get:
main.go:123] pods is forbidden: User "system:serviceaccount:default:default" cannot list pods in the namespace "kube-system": Unknown user "system:serviceaccount:default:default"
which means it tries to do service discovery in the kube-system namespace instead of default... It is hardcoded here for kube-system.
I mean, I'm running this DaemonSet in the default namespace and all the pods it should discover also live in the default namespace, but the kube-system namespace is hardcoded. Shouldn't it use the --namespace-id flag instead of a constant?
k8s-stackdriver/event-exporter/sinks/stackdriver/sink.go
Lines 101 to 114 in c94c033
I'm trying to better understand the behavior when the watcher sends an UPDATE event.
As I read the comment, if newEvent.Count exceeds oldEvent.Count by more than 1, we don't want to send a logEntry to Stackdriver. However, the if block doesn't have a return statement to break out, so it looks like we send the logEntry/newEvent regardless of the count.
Is this the expected behavior?
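To make the question concrete, here is a minimal Go sketch of the control flow as I read it (Event and shouldSend are simplified stand-ins, not the actual sink code): with the return in place the entry is skipped, whereas without it execution falls through and the entry is sent anyway.

```go
package main

import "fmt"

// Event is a simplified stand-in for the watched Kubernetes event.
type Event struct {
	Count int
}

// shouldSend sketches the behavior the comment seems to describe: when the
// count jumped by more than one, the intermediate occurrences were already
// deduplicated, so the entry should not be sent again.
func shouldSend(oldEvent, newEvent Event) bool {
	if newEvent.Count > oldEvent.Count+1 {
		// Without this return, the code would fall through and send the
		// logEntry regardless of the count, which is what the issue observes.
		return false
	}
	return true
}

func main() {
	fmt.Println(shouldSend(Event{Count: 1}, Event{Count: 5})) // large jump
	fmt.Println(shouldSend(Event{Count: 1}, Event{Count: 2})) // single increment
}
```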
These need to be included in Kubernetes 1.10, to pick up PRs #95 and #96.
cc @tallclair
From logs
W0604 12:26:23.181684 1 poll.go:46] Failed to create time series request: Failed to translate data from summary &{***}: UsageNanoCores missing from CPUStats &{2018-06-04 12:26:23 +0000 UTC <nil> 0xc420280b78}
kubectl version
Client Version: version.Info{Major:"1", Minor:"8+", GitVersion:"v1.8.8-gke.0", GitCommit:"6e5b33a290a99c067003632e0fd6be0ead48b233", GitTreeState:"clean", BuildDate:"2018-02-16T18:28:23Z", GoVersion:"go1.8.3b4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"8+", GitVersion:"v1.8.8-gke.0", GitCommit:"6e5b33a290a99c067003632e0fd6be0ead48b233", GitTreeState:"clean", BuildDate:"2018-02-16T18:26:58Z", GoVersion:"go1.8.3b4", Compiler:"gc", Platform:"linux/amd64"}
kubelet-to-gcm version: 1.2.4
It looks like event-exporter only requires a base image to provide ca-certificates. If that's the case, the image should be rebased on scratch.
EDIT: the new recommendation is gcr.io/distroless/static:latest over scratch. See kubernetes/kubernetes#70249
Currently prom-to-sd exports every batch of scraped metrics. We could decouple the two, so that a number of scrapes accumulate state in memory, which is then exported at a lower frequency. This would allow more precise measurement of metrics (e.g. when a pod dies between exports) without increasing the frequency of exports.
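A rough Go sketch of the idea, under the simplifying assumption that the accumulated state is just the latest value per metric (a real implementation would need to handle counters and histograms; all names here are hypothetical):

```go
package main

import "fmt"

// Accumulator sketches the proposed decoupling: scrapes merge into an
// in-memory state, and exports flush that state at a lower frequency.
type Accumulator struct {
	state map[string]float64
}

func NewAccumulator() *Accumulator {
	return &Accumulator{state: map[string]float64{}}
}

// Scrape records the latest observed value for each metric.
func (a *Accumulator) Scrape(sample map[string]float64) {
	for name, v := range sample {
		a.state[name] = v
	}
}

// Export returns the accumulated state and resets it for the next window.
func (a *Accumulator) Export() map[string]float64 {
	out := a.state
	a.state = map[string]float64{}
	return out
}

func main() {
	acc := NewAccumulator()
	acc.Scrape(map[string]float64{"requests_total": 10})
	acc.Scrape(map[string]float64{"requests_total": 15})
	fmt.Println(acc.Export()["requests_total"])
}
```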
Hi
I'm running a customized fluentd-gcp image (it just adds the fluent-plugin-rewrite-tag-filter plugin). I had to upgrade to the latest fluentd-gcp image recently because fluentd started crashing on GKE with the version we were running. However, I'm having trouble using the latest fluentd-gcp image (2.0.18). When I run fluentd with fluent-plugin-rewrite-tag-filter ~>2.1 installed, it fails with a dependency conflict:
Unable to activate fluent-plugin-systemd-0.0.11, because fluentd-1.2.5 conflicts with fluentd (~> 0.12) path=nil error_class=Gem::ConflictError error="Unable to activate fluent-plugin-systemd-0.0.11, because fluentd-1.2.5 conflicts with fluentd (~> 0.12)
How much testing would be required to upgrade the fluent-plugin-systemd plugin to the latest version?
I see that the Prometheus metric scraper doesn't handle any form of authentication.
Is there any guidance on this, or a pointer to where I could start implementing bearer_token handling?
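As a starting point, the scrape request could attach the token as an Authorization header; a minimal sketch (buildScrapeRequest is a hypothetical helper, not an existing prometheus-to-sd function, and the URL is illustrative):

```go
package main

import (
	"fmt"
	"net/http"
)

// buildScrapeRequest builds a GET request for the metrics endpoint and, if a
// bearer token is configured, attaches it as an Authorization header.
func buildScrapeRequest(url, token string) (*http.Request, error) {
	req, err := http.NewRequest("GET", url, nil)
	if err != nil {
		return nil, err
	}
	if token != "" {
		req.Header.Set("Authorization", "Bearer "+token)
	}
	return req, nil
}

func main() {
	req, err := buildScrapeRequest("http://localhost:9101/metrics", "s3cr3t")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Header.Get("Authorization"))
}
```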
When running the direct-to-sd example on my GKE cluster I get the following error:
2018/03/02 10:27:36 Failed to write time series data: googleapi: Error 500: One or more TimeSeries could not be written:
An internal error occurred: timeSeries[0], backendError
Could this be related to the new stackdriver model mentioned in #86 ?
Hi,
I cannot access Stackdriver metrics through http://localhost:8080/apis/external.metrics.k8s.io/v1beta1
Any ideas?
Thank you.
Hi,
I've tried the new Stackdriver Kubernetes Beta monitoring for a GKE cluster (region: us-east4). In a zonal cluster everything works great: all resources (pods, nodes, namespaces, services, deployments, etc.) show up in Stackdriver -> Resources -> Kubernetes Beta. But if I use GKE in a regional or multi-zone cluster, there are only aggregated metrics for namespaces and nothing about pod or node metrics.
I didn't find any cases or bug reports about this, which is why I've opened the issue here. If this should be redirected to the GCP Support Team, please let me know.
Thank you.
I'm pretty sure this image is hosted on gcr.io now and that the location listed in the README file is out of date.
The current location seems to be gcr.io/google-containers/fluentd-gcp, as there is a 2.0.2 tag at that location that was updated within the last six months, whereas the old location has not been updated for two years: https://hub.docker.com/r/kubernetes/fluentd-gcp/tags/
A general overhaul of the README file would be great.
Right now the path /metrics is hardcoded in prometheus-to-sd's source (and it doesn't even seem documented...). I had to do some code-digging to find where I should serve my metrics.
However, I think there should be a way to configure this path. My use case is that I want to expose two endpoints with different paths: one for pod-level metrics and another for service-level metrics (the latter are harder to calculate and I want to do it only once per service per interval, not once per pod per interval).
Although Stackdriver Monitoring supports not only GCP but also multi-cloud and hybrid-cloud deployments, custom-metrics-stackdriver-adapter is strongly tied to GCE/GKE because it uses ComputeTokenSource.
DefaultTokenSource would be a more widely useful option because it allows the use of GOOGLE_APPLICATION_CREDENTIALS.
I haven't tested it yet, but it should be a small patch like this one: apstndb@5cc3399
According to the Readme, prometheus-to-sd supports scrape-interval and export-interval params to configure the rate of scraping and exporting.
https://github.com/GoogleCloudPlatform/k8s-stackdriver/tree/master/prometheus-to-sd#scrape-interval-vs-export-interval
However the helm chart does not support configuring those values:
https://github.com/GoogleCloudPlatform/k8s-stackdriver/blob/master/prometheus-to-sd/kubernetes/prometheus-to-sd-kube-state-metrics.yaml#L32
We can work around this by forking the yaml, but it would be nice if the official version supported configuring them :)
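To make the request concrete, the chart change would amount to exposing the two flags from the README in the container command. A sketch of the forked yaml (abbreviated; the image tag, source, and interval values are illustrative):

```yaml
# abbreviated fork of prometheus-to-sd-kube-state-metrics.yaml;
# only the two interval flags are added
- name: prometheus-to-sd
  image: gcr.io/google-containers/prometheus-to-sd:v0.3.1
  command:
    - /monitor
    - --source=kube-state-metrics:http://localhost:8080
    - --stackdriver-prefix=custom.googleapis.com
    - --scrape-interval=30s
    - --export-interval=120s
```

Ideally the chart would template the two interval values so they can be set per deployment.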
I spent the day digging around but could not figure out where to see the metrics sent by event-exporter.
Besides the one running in the 1.7+ GKE cluster, I even ran my own (the log shows it is successfully sending entries to Stackdriver, and I created the ClusterRoleBinding), but I'm not sure where to actually see the metrics in SD (I tried to create a chart with custom metrics, but none appear).
Tailing the container gives me:
...
I0912 20:56:31.575763 1 sink.go:160] Sending 4 entries to Stackdriver
I0912 20:56:31.685767 1 sink.go:167] Successfully sent 4 entries to Stackdriver
...
Further, I want to confirm that it is exporting the events shown by kubectl get event?
Any guidance or a README update on this would be great.
Currently, label addition will break GKE. I checked https://github.com/GoogleCloudPlatform/k8s-stackdriver/blob/master/prometheus-to-sd: for metrics with the prefix "container.googleapis.com" (e.g. etcd metrics), if the definition of a metric changes (a label is part of the definition), the metric is marked as broken and is not pushed. prometheus-to-sd only calls UpdateMetricDescriptors if the metric has the "custom.googleapis.com" prefix: k8s-stackdriver/prometheus-to-sd/main.go
Line 141 in 1047589
However, many label additions are conceptually backward compatible, since their introduction does not break existing readers that are unaware of the label. So I would expect backward-compatible label additions not to break GCP/GKE.
Here is an example: etcd-io/etcd#10022
I hope this is the right place to report this bug. I am running a GKE cluster with version v1.7.5.
I get many (~10/minute) warnings like these in the stackdriver logging output:
W Metric stackdriver_sink_request_count was not found in the cache.
W Metric stackdriver_sink_received_entry_count was not found in the cache.
W Metric stackdriver_sink_successfully_sent_entry_count was not found in the cache.
This only happened recently after a Kubernetes upgrade. Am I doing something wrong or how can I reduce these warnings?
{
  insertId: "1p24kzgfok4w2d"
  labels: {
    compute.googleapis.com/resource_name: "gke-live-cluster-pool-XXXXXX"
    container.googleapis.com/namespace_name: "kube-system"
    container.googleapis.com/pod_name: "event-exporter-1421584133-43n5j"
    container.googleapis.com/stream: "stderr"
  }
  logName: "projects/XXXXXX/logs/prometheus-to-sd-exporter"
  receiveTimestamp: "2017-09-13T10:36:48.347526394Z"
  resource: {
    labels: {
      cluster_name: "live-cluster"
      container_name: "prometheus-to-sd-exporter"
      instance_id: "XXXXXX"
      namespace_id: "kube-system"
      pod_id: "event-exporter-1421584133-43n5j"
      project_id: "wcd-production"
      zone: "europe-west1-d"
    }
    type: "container"
  }
  severity: "WARNING"
  textPayload: "Metric stackdriver_sink_successfully_sent_entry_count was not found in the cache."
  timestamp: "2017-09-13T10:36:46Z"
}
event-exporter deployment:
{
"kind": "Deployment",
"apiVersion": "extensions/v1beta1",
"metadata": {
"name": "event-exporter",
"namespace": "kube-system",
"selfLink": "/apis/extensions/v1beta1/namespaces/kube-system/deployments/event-exporter",
"uid": "65301c2e-917d-11e7-8857-42010a840101",
"resourceVersion": "9533781",
"generation": 1,
"creationTimestamp": "2017-09-04T14:29:01Z",
"labels": {
"addonmanager.kubernetes.io/mode": "Reconcile",
"k8s-app": "event-exporter",
"kubernetes.io/cluster-service": "true",
"version": "v0.1.5"
},
"annotations": {
"deployment.kubernetes.io/revision": "1",
"kubectl.kubernetes.io/last-applied-configuration": "{\"apiVersion\":\"apps/v1beta1\",\"kind\":\"Deployment\",\"metadata\":{\"annotations\":{},\"labels\":{\"addonmanager.kubernetes.io/mode\":\"Reconcile\",\"k8s-app\":\"event-exporter\",\"kubernetes.io/cluster-service\":\"true\",\"version\":\"v0.1.5\"},\"name\":\"event-exporter\",\"namespace\":\"kube-system\"},\"spec\":{\"replicas\":1,\"template\":{\"metadata\":{\"labels\":{\"k8s-app\":\"event-exporter\",\"version\":\"v0.1.5\"}},\"spec\":{\"containers\":[{\"command\":[\"/event-exporter\"],\"image\":\"gcr.io/google-containers/event-exporter:v0.1.5\",\"name\":\"event-exporter\"},{\"command\":[\"/monitor\",\"--component=event_exporter\",\"--stackdriver-prefix=container.googleapis.com/internal/addons\",\"--whitelisted-metrics=stackdriver_sink_received_entry_count,stackdriver_sink_request_count,stackdriver_sink_successfully_sent_entry_count\"],\"image\":\"gcr.io/google-containers/prometheus-to-sd:v0.2.1\",\"name\":\"prometheus-to-sd-exporter\",\"volumeMounts\":[{\"mountPath\":\"/etc/ssl/certs\",\"name\":\"ssl-certs\"}]}],\"serviceAccountName\":\"event-exporter-sa\",\"terminationGracePeriodSeconds\":30,\"volumes\":[{\"hostPath\":{\"path\":\"/etc/ssl/certs\"},\"name\":\"ssl-certs\"}]}}}}\n"
}
},
"spec": {
"replicas": 1,
"selector": {
"matchLabels": {
"k8s-app": "event-exporter",
"version": "v0.1.5"
}
},
"template": {
"metadata": {
"creationTimestamp": null,
"labels": {
"k8s-app": "event-exporter",
"version": "v0.1.5"
}
},
"spec": {
"volumes": [
{
"name": "ssl-certs",
"hostPath": {
"path": "/etc/ssl/certs"
}
}
],
"containers": [
{
"name": "event-exporter",
"image": "gcr.io/google-containers/event-exporter:v0.1.5",
"command": [
"/event-exporter"
],
"resources": {},
"terminationMessagePath": "/dev/termination-log",
"terminationMessagePolicy": "File",
"imagePullPolicy": "IfNotPresent"
},
{
"name": "prometheus-to-sd-exporter",
"image": "gcr.io/google-containers/prometheus-to-sd:v0.2.1",
"command": [
"/monitor",
"--component=event_exporter",
"--stackdriver-prefix=container.googleapis.com/internal/addons",
"--whitelisted-metrics=stackdriver_sink_received_entry_count,stackdriver_sink_request_count,stackdriver_sink_successfully_sent_entry_count"
],
"resources": {},
"volumeMounts": [
{
"name": "ssl-certs",
"mountPath": "/etc/ssl/certs"
}
],
"terminationMessagePath": "/dev/termination-log",
"terminationMessagePolicy": "File",
"imagePullPolicy": "IfNotPresent"
}
],
"restartPolicy": "Always",
"terminationGracePeriodSeconds": 30,
"dnsPolicy": "ClusterFirst",
"serviceAccountName": "event-exporter-sa",
"serviceAccount": "event-exporter-sa",
"securityContext": {},
"schedulerName": "default-scheduler"
}
},
"strategy": {
"type": "RollingUpdate",
"rollingUpdate": {
"maxUnavailable": "25%",
"maxSurge": "25%"
}
},
"revisionHistoryLimit": 2,
"progressDeadlineSeconds": 600
},
"status": {
"observedGeneration": 1,
"replicas": 1,
"updatedReplicas": 1,
"readyReplicas": 1,
"availableReplicas": 1,
"conditions": [
{
"type": "Progressing",
"status": "True",
"lastUpdateTime": "2017-09-04T14:29:20Z",
"lastTransitionTime": "2017-09-04T14:29:02Z",
"reason": "NewReplicaSetAvailable",
"message": "ReplicaSet \"event-exporter-1421584133\" has successfully progressed."
},
{
"type": "Available",
"status": "True",
"lastUpdateTime": "2017-09-13T09:43:49Z",
"lastTransitionTime": "2017-09-13T09:43:49Z",
"reason": "MinimumReplicasAvailable",
"message": "Deployment has minimum availability."
}
]
}
}
In particular, this involves measuring the resource requirements to set more precise limits. Also the options passed should be revisited, especially the ones discouraged by documentation such as insecureSkipTLSVerify.
Hi,
I'm following...
https://cloud.google.com/kubernetes-engine/docs/tutorials/custom-metrics-autoscaling
and
https://github.com/GoogleCloudPlatform/k8s-stackdriver/tree/master/custom-metrics-stackdriver-adapter/examples/prometheus-to-sd
...to get custom-metric autoscaling working on a GKE v1.9.3 cluster.
JEG-CON-GEL0068:helm-haproxy-ingress james.masson$ curl -s http://localhost:8001/apis/custom.metrics.k8s.io/v1beta1/ | head -20
{
"kind": "APIResourceList",
"apiVersion": "v1",
"groupVersion": "custom.metrics.k8s.io/v1beta1",
"resources": [
{
"name": "pods/go_goroutines",
"singularName": "",
"namespaced": true,
"kind": "MetricValueList",
"verbs": [
"get"
]
},
{
"name": "pods/go_memstats_alloc_bytes",
"singularName": "",
"namespaced": true,
"kind": "MetricValueList",
"verbs": [
JEG-CON-GEL0068:helm-haproxy-ingress james.masson$ curl http://localhost:8001/apis/custom.metrics.k8s.io/v1beta1/namespaces/shared-services/pods/*/go_goroutines
{
"kind": "Status",
"apiVersion": "v1",
"metadata": {
},
"status": "Failure",
"message": "the server could not find the metric go_goroutines for pods",
"reason": "NotFound",
"code": 404
Nothing interesting in the logs
JEG-CON-GEL0068:helm-haproxy-ingress james.masson$ kubectl -n shared-services logs po/custom-metrics-stackdriver-adapter-5fc6b6d856-s62tz
I0220 11:00:41.810065 1 serving.go:283] Generated self-signed cert (apiserver.local.config/certificates/apiserver.crt, apiserver.local.config/certificates/apiserver.key)
I0220 11:00:43.110267 1 serve.go:85] Serving securely on 0.0.0.0:443
I0220 11:00:52.050474 1 trace.go:76] Trace[629458047]: "List /apis/custom.metrics.k8s.io/v1beta1/namespaces/shared-services/pods/*/go_memstats_alloc_bytes" (started: 2018-02-20 11:00:51.28992615 +0000 UTC m=+12.274162011) (total time: 760.486521ms):
Trace[629458047]: [760.486521ms] [760.47448ms] END
I have haproxy/exporter/prometheus-to-sd configured like this:
- name: haproxy-ingress
  image: quay.io/jcmoraisjr/haproxy-ingress:v0.5-beta.1
  args:
  - --default-backend-service=kube-system/default-http-backend
  - --default-ssl-certificate=$(POD_NAMESPACE)/tls-secret
  - --configmap=$(POD_NAMESPACE)/haproxy-ingress
  - --publish-service=$(POD_NAMESPACE)/haproxy-ingress
  readinessProbe:
    httpGet:
      path: /healthz
      port: 10253
      scheme: HTTP
  livenessProbe:
    httpGet:
      path: /healthz
      port: 10253
      scheme: HTTP
    initialDelaySeconds: 10
    timeoutSeconds: 1
  ports:
  - name: http
    containerPort: 80
  - name: https
    containerPort: 443
  - name: stat
    containerPort: 1936
  env:
  - name: POD_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name
  - name: POD_NAMESPACE
    valueFrom:
      fieldRef:
        fieldPath: metadata.namespace
- name: haproxy-exporter
  image: prom/haproxy-exporter:v0.9.0
  args:
  - '--haproxy.scrape-uri=http://localhost:1936/?stats;csv'
  ports:
  - name: prometheus
    containerPort: 9101
- name: prometheus-to-sd
  image: gcr.io/google-containers/prometheus-to-sd:v0.2.3
  command:
  - /monitor
  - --stackdriver-prefix=custom.googleapis.com
  - --source=:http://localhost:9101
  - --pod-id=$(POD_NAME)
  - --namespace-id=$(POD_NAMESPACE)
  env:
  - name: POD_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name
  - name: POD_NAMESPACE
    valueFrom:
      fieldRef:
        fieldPath: metadata.namespace
Any hints on where to look next?
Thanks,
James M
Trying to use prometheus-to-sd in my project (as shown in the example kube config) but keep getting this error:
Error while sending request to Stackdriver googleapi: Error 403: Request had insufficient authentication scopes., forbidden
I also tried adding auth credentials for a service account that has the "monitoring admin" and "metrics writer" roles, but no dice. This is what the kube config for my prometheus-to-sd container looks like:
- name: prometheus-to-sd
  image: gcr.io/google-containers/prometheus-to-sd:v0.2.1
  ports:
  - name: profiler
    containerPort: 6060
  command:
  - /monitor
  - --stackdriver-prefix=custom.googleapis.com
  - --source=kube-state-metrics:http://localhost:5000
  - --pod-id=$(POD_NAME)
  - --namespace-id=$(POD_NAMESPACE)
  volumeMounts:
  - name: prometheus-to-sd-cred
    mountPath: /etc/cred
    readOnly: true
  env:
  - name: GOOGLE_APPLICATION_CREDENTIALS
    value: /etc/cred/prometheus-to-sd-cred.json
  - name: POD_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name
  - name: POD_NAMESPACE
    valueFrom:
      fieldRef:
        fieldPath: metadata.namespace
I am using prometheus-to-sd v0.3.1 to send metrics from a Spring Boot application.
When the application is running in isolation, i.e. no service is hitting it, it is able to send metrics to Stackdriver. But as soon as I switch traffic to this service, i.e. other services start calling it, it stops sending metrics.
Also, if I let the application run in isolation, it stops sending metrics after 3-4 hours.
When I check the logs in the Stackdriver logging interface, there are no logs from the prometheus-to-sd container from the point at which it stops sending metrics.
Could you let me know where I could find more info on this problem?
GKE cluster version: 1.10.6.2
kubernetes version: 1.10
- name: applicationName-sd
  image: gcr.io/google-containers/prometheus-to-sd:v0.3.1
  command:
  - /monitor
  - --source=:http://localhost:42802/prometheus
  - --stackdriver-prefix=custom.googleapis.com/applicationName
  - --pod-id=$(POD_ID)
  - --namespace-id=$(POD_NAMESPACE)
  env:
  - name: POD_ID
    valueFrom:
      fieldRef:
        fieldPath: metadata.name
  - name: POD_NAMESPACE
    valueFrom:
      fieldRef:
        fieldPath: metadata.namespace
applicationName and the port have the right names in my config.
Logs from application start up
GCE config: &{Project:projectName Zone:zone Cluster:dev-v2 Instance:insatnceName MetricsPrefix:custom.googleapis.com/applicationName}
Taking source configs from flags
Taking source configs from kubernetes api server
Built the following source configs: [{ localhost 42802 /prometheus [] {applicationName-5f75979b6b-b4jh6 namespace}}]
Running prometheus-to-sd, monitored target is localhost:42802
GCE config: &{Project:projectName Zone:us-east4-b Cluster:dev-v2 Instance:gke-dev-v2-standard8-10c2b7d3-xx6m MetricsPrefix:custom.googleapis.com/applicationName}
Taking source configs from flags
Taking source configs from kubernetes api server
Built the following source configs: [{ localhost 42802 /prometheus [] {applicationName-5f75979b6b-n5tt6 namespace}}]
Running prometheus-to-sd, monitored target is localhost:42802
GCE config: &{Project:projectName Zone:us-east4-b Cluster:dev-v2 Instance:gke-dev-v2-standard8-10c2b7d3-d6dd MetricsPrefix:custom.googleapis.com/applicationName}
Taking source configs from flags
Taking source configs from kubernetes api server
Built the following source configs: [{ localhost 42802 /prometheus [] {applicationName-5f75979b6b-5fdlb namespace}}]
Running prometheus-to-sd, monitored target is localhost:42802
GCE config: &{Project:projectName Zone:us-east4-c Cluster:dev-v2 Instance:gke-dev-v2-standard8-541c8362-kv5l MetricsPrefix:custom.googleapis.com/applicationName}
Taking source configs from flags
Taking source configs from kubernetes api server
Built the following source configs: [{ localhost 42802 /prometheus [] {applicationName-5f75979b6b-tqjcw namespace}}]
Running prometheus-to-sd, monitored target is localhost:42802
GCE config: &{Project:projectName Zone:us-east4-a Cluster:dev-v2 Instance:gke-dev-v2-standard8-61da8de0-zb18 MetricsPrefix:custom.googleapis.com/applicationName}
Taking source configs from flags
Taking source configs from kubernetes api server
Built the following source configs: [{ localhost 42802 /prometheus [] {applicationName-5f75979b6b-74vnt namespace}}]
Running prometheus-to-sd, monitored target is localhost:42802
GCE config: &{Project:projectName Zone:us-east4-b Cluster:dev-v2 Instance:gke-dev-v2-standard8-10c2b7d3-5scm MetricsPrefix:custom.googleapis.com/applicationName}
Taking source configs from flags
Taking source configs from kubernetes api server
Built the following source configs: [{ localhost 42802 /prometheus [] {applicationName-5f75979b6b-l46x8 namespace}}]
Running prometheus-to-sd, monitored target is localhost:42802
Metric process_start_time_seconds invalid or not defined. Using 1970-01-01 00:00:01 +0000 UTC instead. Cumulative metrics might be inaccurate.
There are warning messages like this after start-up, but it was still sending metrics to Stackdriver:
Metric process_start_time_seconds invalid or not defined. Using 1970-01-01 00:00:01 +0000 UTC instead. Cumulative metrics might be inaccurate
The port that exposes the metrics is also the port for the liveness and readiness probes, but they hit a different URL.
I have 3+ pods running.
Let me know what more details I could provide.
Running event exporter with command
event-exporter -sink-opts="-location=europe-west2-b"
does not fail, but the location is not set.
I would expect it to fail on badly passed arguments.
Currently it's hardcoded to gke_container
/cc @loburm
Looking in the container registry, the last release of the metrics adapter was July 2018, over six months ago. Since then, the ability to query metrics across projects, as well as to use GOOGLE_APPLICATION_CREDENTIALS, has been merged.
Can we get a new release published so these features can be used? (Thanks!) Happy to assist however I can, as my org has a large need for cross-project metric queries for autoscaling to go GA in GKE.
This is a feature request: I'd like to add a prefix or suffix to the scraped metrics' names. When collecting metrics from components whose metric names are difficult to change, I'd like to distinguish such metrics in Stackdriver, especially by the environment they are collected from (e.g. staging or production).
"No metrics to send to Stackdriver" is a warning that is written on every collection, which is really annoying. Could you please log it only once (or once every X collections), or make it V(2)/V(4)?
I'm running a Golang HTTP server on K8s using OpenCensus for tracing. In my logs I set the JSON field trace to "projects/$PROJECT/traces/$TRACE_ID".
If I click on the "Log" link in a Stackdriver trace (see screenshot), I end up with zero log results. However, if I manually replace trace=... with jsonPayload.trace=..., the filter works and I get the log entries associated with the trace I clicked on.
Is there anything I can do to make this work automatically? I.e., is there any way to set the trace field directly instead of just jsonPayload.trace?
I'm running two clusters that were upgraded to 1.8.3-gke.0 and are running gcr.io/google-containers/prometheus-to-sd:v0.2.2. Every minute, this container logs:
Error while sending request to Stackdriver googleapi: Error 400: Unknown metric: container.googleapis.com/internal/addons/event_exported/stackdriver_sink_received_entry_count, badRequest
It seems like this must be some sort of mismatch between the expected names and the actual names in Stackdriver?
I am working on adding a cAdvisor daemonset with an example integration with the prometheus-to-sd sidecar
Prometheus-to-sd currently just uses the namespace_id, pod_id, and container_name provided by the config. However, for cAdvisor, or other monitoring daemonsets that publish metrics about other pods/containers in the cluster, adding namespace_id, pod_id, and container_name as prometheus labels results in stackdriver labels pointing to the "correct" container, but the monitored resource points to the cAdvisor container (as that is where the metrics originate from).
I propose either allowing prometheus labels to override the monitored-resource attributes by default (if provided), or adding the ability to configure which prometheus label is used for each monitored-resource attribute, e.g. --container-name-label, --pod-id-label, etc., and allowing only these, or the --pod-id and --namespace-id flags, to be set. The latter has the advantage of being able to "remap" labels from the target pod to monitored-resource attributes in Stackdriver, and means we don't need to match labels exactly (e.g. I could specify --namespace-id-label=namespace_name if that was how the producer labeled metrics).
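A sketch of what the second option could look like in a cAdvisor daemonset spec (the --*-label flags are the proposal and do not exist today; the source and port are illustrative):

```yaml
command:
  - /monitor
  - --stackdriver-prefix=custom.googleapis.com
  - --source=cadvisor:http://localhost:8080
  # proposed flags: map prometheus labels on the scraped series to
  # monitored-resource attributes instead of using the scraper's own identity
  - --namespace-id-label=namespace_name
  - --pod-id-label=pod_name
  - --container-name-label=container_name
```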
I am running the prometheus-to-sd sidecar that scrapes the Prometheus metrics endpoint of my service and sends metrics to Stackdriver. There is a metric called handler_error_count which represents API handler errors, e.g.:
handler_error_count{caller="some_caller",error_type="client",outcome="invalid_argument",procedure="some_procedure"} 1
I can find the metric in Stackdriver's Metrics Explorer, but it says that its resource type is None. For this reason I am unable to create charts or set up alert policies using this metric.
Unless I am missing something, the resource type should be gke_container for all metrics.
Any suggestions on how to fix this issue would be appreciated.
After closing and reopening a PR, the issue disappeared.
https://travis-ci.org/GoogleCloudPlatform/k8s-stackdriver/builds/253132704
It seems that prometheus-to-sd is unable to translate summary metrics and throws errors every time it scrapes the metrics:
W0817 15:04:56.690842 1 translator.go:61] Error while processing metric http_request_duration_microseconds: Metric type SUMMARY of family http_request_duration_microseconds not supported
However, the stackdriver-prometheus project is actually able to parse this kind of metric and push it to Stackdriver (https://github.com/Stackdriver/stackdriver-prometheus/blob/9f5c836bfc3abea91b7473f4a57c107806099bf9/stackdriver/translate.go#L118).
Is there any reason prometheus-to-sd does not support this kind of metric? Are you planning to add support for summary metrics? I can work on it if you accept PRs.
I'm getting this error logged every ~30 seconds: Error while sending request to Stackdriver Post /v3/projects/{my project}/timeSeries?alt=json: unsupported protocol scheme """
I've just been watching my kube logs via the kubectl logs CLI this whole time and this wasn't appearing in any of them. All my services are working as expected. But when I checked the GCP logs interface on the website, apparently this has been going on for at least a week, perhaps longer.
Any ideas?
I'd like to use k8s-stackdriver to do pod autoscaling based on metrics that already exist in Stackdriver's predefined metrics list. For example, using Cloud Pub/Sub's subscription/num_undelivered_messages as the metric to autoscale on.
Is this possible with what exists today or what is planned for this project, or is this only intended for custom metrics?
Looks like mechanism for deduplicating data points ( )
/cc @loburm
I have been using histograms in my Python app to monitor the response time of my application. Naturally, histograms in Prometheus get the value type Distribution in Stackdriver, but I'm unclear why this component chooses Cumulative as the metric kind.
Cumulative metrics seem appropriate for counters, but for Prometheus histograms wouldn't the kind Delta (or Gauge) be more suitable? My response-time metrics aren't graphing properly in the UI and I think this is why.
Prometheus-to-sd tries to update the Stackdriver MonitoredDescriptor if the installed one is incompatible with the Prometheus metric. This is a lossy operation that will drop all the historical data, and it could be triggered by unexpected changes or bugs in the Prometheus exporters.
I propose that we refuse the write and log an error. The metric descriptor update should be pulled out into a separate tool that requires manual invocation, for users who are OK losing the data.
The code in question is here: https://github.com/GoogleCloudPlatform/k8s-stackdriver/blob/master/prometheus-to-sd/translator/stackdriver.go#L88
MonitoredDescriptor.Create is documented here, I've asked the owners to document this behavior explicitly: https://cloud.google.com/monitoring/api/ref_v3/rest/v3/projects.metricDescriptors/create
h/t to quentin
I'm interested in scraping metrics from the Prometheus /federate endpoint. In order to do so I need to be able to configure the scrape endpoint as well as supply some match expressions.
Thus I would like to extend the source URI with a metrics path and add the match parameter. Something like:
source=prometheus:http://localhost:9090/federate?match[]={job="prometheus"}&match[]={name=~"job:.*"}
The following error keeps showing up in the logs:
E0509 10:17:28.743204 1 stackdriver.go:58] Error while sending request to Stackdriver json: error calling MarshalJSON for type *monitoring.CreateTimeSeriesRequest: json: error calling MarshalJSON for type *monitoring.TimeSeries: json: error calling MarshalJSON for type *monitoring.Point: json: error calling MarshalJSON for type *monitoring.TypedValue: json: error calling MarshalJSON for type *monitoring.Distribution: json: unsupported value: +Inf
The current default type is "INT64".
This is breaking floating-point metrics such as "custom.googleapis.com/kube-state-metrics/kube_pod_container_resource_requests_cpu_cores".
Since the variable that holds the value is already of type float64, I think the best default type would be "DOUBLE" instead of "INT64".
I'm trying to get the kube-state-metrics setup working, and I can see metrics in Stackdriver, but none of them appear to have labels. When I looked at the prometheus-to-sd logs I see a bunch of errors like this:
Error in attempt to update metric descriptor googleapi: Error 400:
Field labels had an invalid value:
When creating metric custom.googleapis.com/kube-state-metrics/kube_service_labels:
the metric has more than 10 labels., badRequest
I'm not quite sure what the issue is, but I really need the labels for this to work. Has anyone seen this before?
Is there a way to pull prom metrics from http://localhost/xyx instead of always defaulting to http://localhost/metrics ?
When I specify
command:
- /monitor
- --source=:http://localhost:80/promMetrics
- --stackdriver-prefix=custom.googleapis.com
- --pod-id=$(POD_ID)
- --namespace-id=$(POD_NAMESPACE)
It still defaults to /metrics
We are seeing a lot of log entries in stackdriver of the form:
This is on a GKE cluster, 1.9.7-gke.0
This is coming from the prometheus-to-sd container
Hi Googlers,
thanks for your great work and contribution to the open source community!
I just wanted to let you know that you have a 404 in the README: the stackdriverSite link.
Cheers, Maiky