
Comments (9)

x13n commented on June 13, 2024

@loburm Can you take a look?

from k8s-stackdriver.

vyfster commented on June 13, 2024

On GKE cluster 1.9.7-gke.0, I get the same message as above, plus the following one, where "code" is replaced with "grpc":

E0519 09:17:23.051452 1 stackdriver.go:58] Error while sending request to Stackdriver googleapi: Error 400: Field timeSeries[0].metric.labels[0] had an invalid value of "grpc": Unrecognized metric label., badRequest

I've updated the YAML for the fluentd-gcp-v2.0.17 deployment and for the individual pods, changing the prometheus-to-sd-exporter image to gcr.io/google-containers/prometheus-to-sd:v0.2.6, to no effect. Both error messages are still spamming the logs.


tomoe commented on June 13, 2024

Could this be caused by a Prometheus data schema change leading to an incompatible TimeSeries resource?

Here's the diff of the fluentd metrics (v1.8 vs v1.10 GKE clusters):

$ diff -u 1.8.txt 1.10.txt 
--- 1.8.txt     2018-05-23 23:54:34.472643397 +0900
+++ 1.10.txt    2018-05-24 00:53:35.567634042 +0900
@@ -1,28 +1,33 @@
 # TYPE process_start_time_seconds gauge
 # HELP process_start_time_seconds Timestamp of the process start in seconds
-process_start_time_seconds 1527087032.0
+process_start_time_seconds 1527086089
 # TYPE logging_entry_count counter
 # HELP logging_entry_count Total number of log entries generated by either application containers or system components
-logging_entry_count 459.0
+logging_entry_count 880
 # TYPE stackdriver_successful_requests_count counter
 # HELP stackdriver_successful_requests_count A number of successful requests to the Stackdriver Logging API
-stackdriver_successful_requests_count{grpc="false"} 32.0
+stackdriver_successful_requests_count{grpc="true",code="0"} 48
 # TYPE stackdriver_failed_requests_count counter
 # HELP stackdriver_failed_requests_count A number of failed requests to the Stackdriver Logging API, broken down by the error code
 # TYPE stackdriver_ingested_entries_count counter
 # HELP stackdriver_ingested_entries_count A number of log entries ingested by Stackdriver Logging
-stackdriver_ingested_entries_count 459.0
+stackdriver_ingested_entries_count{grpc="true",code="0"} 721
 # TYPE stackdriver_dropped_entries_count counter
 # HELP stackdriver_dropped_entries_count A number of log entries dropped by the Stackdriver output plugin
+# TYPE stackdriver_retried_entries_count counter
+# HELP stackdriver_retried_entries_count The number of log entries that failed to be ingested by the Stackdriver output plugin due to a transient error and were retried
 # TYPE fluentd_status_buffer_queue_length gauge
 # HELP fluentd_status_buffer_queue_length Current buffer queue length.
-fluentd_status_buffer_queue_length{plugin_id="object:3fe4940f3454",plugin_category="output",type="google_cloud"} 0.0
-fluentd_status_buffer_queue_length{plugin_id="object:3fe4941a8aac",plugin_category="output",type="google_cloud"} 0.0
+fluentd_status_buffer_queue_length{plugin_id="object:160bf98",plugin_category="output",type="google_cloud"} 0
+fluentd_status_buffer_queue_length{plugin_id="object:16ecf98",plugin_category="output",type="google_cloud"} 0
+fluentd_status_buffer_queue_length{plugin_id="object:1663b44",plugin_category="output",type="google_cloud"} 0
 # TYPE fluentd_status_buffer_total_bytes gauge
 # HELP fluentd_status_buffer_total_bytes Current total size of queued buffers.
-fluentd_status_buffer_total_bytes{plugin_id="object:3fe4940f3454",plugin_category="output",type="google_cloud"} 0.0
-fluentd_status_buffer_total_bytes{plugin_id="object:3fe4941a8aac",plugin_category="output",type="google_cloud"} 1640.0
+fluentd_status_buffer_total_bytes{plugin_id="object:160bf98",plugin_category="output",type="google_cloud"} 3057
+fluentd_status_buffer_total_bytes{plugin_id="object:16ecf98",plugin_category="output",type="google_cloud"} 0
+fluentd_status_buffer_total_bytes{plugin_id="object:1663b44",plugin_category="output",type="google_cloud"} 0
 # TYPE fluentd_status_retry_count gauge
 # HELP fluentd_status_retry_count Current retry counts.
-fluentd_status_retry_count{plugin_id="object:3fe4940f3454",plugin_category="output",type="google_cloud"} 0.0
-fluentd_status_retry_count{plugin_id="object:3fe4941a8aac",plugin_category="output",type="google_cloud"} 0.0
+fluentd_status_retry_count{plugin_id="object:160bf98",plugin_category="output",type="google_cloud"} 0
+fluentd_status_retry_count{plugin_id="object:16ecf98",plugin_category="output",type="google_cloud"} 0
+fluentd_status_retry_count{plugin_id="object:1663b44",plugin_category="output",type="google_cloud"} 0

The 1.10 output adds new labels: for example, stackdriver_successful_requests_count now carries both code and grpc, where 1.8 had only grpc.
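
To make the label change easier to spot than in a raw diff, here is a minimal Python sketch that compares the label names per metric between two scrapes. It is only an illustration: the helper names (label_names, label_changes) are hypothetical, the sample lines are copied from the diff above, and the regex-based parsing is a simplification rather than a full Prometheus exposition-format parser (it would mis-handle label values that themselves contain text like x=").

```python
import re
from collections import defaultdict

# Simplified patterns: a sample line is "name{labels} value" or "name value".
SAMPLE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+\S+$')
LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="')


def label_names(exposition_text):
    """Map each metric name to the set of label names seen on its samples."""
    labels = defaultdict(set)
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and TYPE/HELP comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # not a sample line this simplified parser understands
        name, raw_labels = match.groups()
        labels[name].update(LABEL_RE.findall(raw_labels or ""))
    return dict(labels)


def label_changes(old_text, new_text):
    """Return {metric: (old_labels, new_labels)} for metrics whose labels differ."""
    old, new = label_names(old_text), label_names(new_text)
    changes = {}
    for name in sorted(old.keys() & new.keys()):
        if old[name] != new[name]:
            changes[name] = (sorted(old[name]), sorted(new[name]))
    return changes


# Sample lines copied from the 1.8 and 1.10 outputs in the diff above.
OLD = """
logging_entry_count 459.0
stackdriver_successful_requests_count{grpc="false"} 32.0
stackdriver_ingested_entries_count 459.0
"""
NEW = """
logging_entry_count 880
stackdriver_successful_requests_count{grpc="true",code="0"} 48
stackdriver_ingested_entries_count{grpc="true",code="0"} 721
"""

if __name__ == "__main__":
    for metric, (before, after) in label_changes(OLD, NEW).items():
        print(f"{metric}: {before} -> {after}")
```

On these samples it reports that stackdriver_ingested_entries_count gained both labels and stackdriver_successful_requests_count gained code, while logging_entry_count is unchanged. Whether each new label is what triggers the "Unrecognized metric label" error is the hypothesis under discussion here, not something the script verifies.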


loburm commented on June 13, 2024

Hi Tomoe,
Thanks for checking, this is the root cause of the problem. I'm going to talk with the Stackdriver folks about it.


0x80 commented on June 13, 2024

Hi,

I'm experiencing this problem as well. I upgraded my cluster and node pool versions via the gcloud web interface, but I don't see any option to downgrade... Can I somehow downgrade without having to recreate the deployment?


thecav commented on June 13, 2024

I am seeing the same thing on GKE cluster 1.9.7-gke.1.


loburm commented on June 13, 2024

Right now I don't see any other solution except disabling those metrics in patch releases. I'll keep you updated on the progress.


tomoe commented on June 13, 2024

Looks like the issue has been resolved in my cluster.
@wstrange, could you confirm and close unless you still see the issue?


wstrange commented on June 13, 2024

Appears to be fixed in my recent cluster (1.10.2-gke.3).
