
Comments (6)

jlange-koch commented on August 16, 2024

Hey,
we have the same issue with jetstream_consumer_num_pending.
As a workaround, I added != 0 to the query in my Grafana dashboard and set "Connect null values" to "Always" in the Time series panel. You might be able to use this for alerting if you don't alert on null values. Just keep in mind that you will have fewer data points, because Prometheus sometimes scrapes the wrong value (for us, the wrong value is always 0).
I am not sure I would trust such an alert 100%, though.

from prometheus-nats-exporter.

niklasmtj commented on August 16, 2024

Same behaviour for us with jetstream_consumer_num_pending. It already happened with exporter version 0.9.1. We upgraded to 0.11.0, but it still shows the same behaviour.


andreyreshetnikov-zh commented on August 16, 2024

Hello @wallyqs, sorry for the ping, but in general it is difficult to tell which server reports the real value.
We get a different value from each NATS server (0 / 8 / 0):

nats_consumer_num_pending{account="TEST",account_id="ID",cluster="nats",consumer_desc="",consumer_leader="nats-1",
consumer_name="monitor",domain="",is_consumer_leader="false",is_meta_leader="false",is_stream_leader="false",
meta_leader="nats-2",server_name="nats-0",stream_leader="nats-2",stream_name="TEST"} 0

nats_consumer_num_pending{account="TEST",account_id="ID",cluster="nats",consumer_desc="",consumer_leader="nats-1",
consumer_name="monitor",domain="",is_consumer_leader="true",is_meta_leader="false",is_stream_leader="false",
meta_leader="nats-2",server_name="nats-1",stream_leader="nats-2",stream_name="TEST"} 8

nats_consumer_num_pending{account="TEST",account_id="ID",cluster="nats",consumer_desc="",consumer_leader="nats-1",
consumer_name="monitor",domain="",is_consumer_leader="false",is_meta_leader="true",is_stream_leader="true",
meta_leader="nats-2",server_name="nats-2",stream_leader="nats-2",stream_name="TEST"} 0

In this case nats-1 is the consumer leader.
Output of nats consumer info:

nats consumer info TEST monitor | grep -E 'Leader|Unprocessed'
              Leader: nats-1
     Unprocessed Messages: 8

It is difficult to say which value is correct, since the leader reports 8 but the other two servers report 0.
Could you point me to where the error might be, so that I can prepare a PR?


andreyreshetnikov-zh commented on August 16, 2024

A few new points. First, I used this PromQL query:
count(nats_consumer_num_pending > 0) by (cluster_id, account, consumer_name, stream_name, consumer_leader) > 0
and found that whenever the same metric differs between servers, the differing value is always on the consumer_leader side.

Second, I tried restarting the prometheus-nats-exporter container inside the NATS server pod (the one with the differing metric) with:
kill -HUP $(ps aufx |grep '[p]rometheus-nats-exporter' |awk '{print $1}')
The prometheus-nats-exporter container restarts successfully, but the metric value doesn't change. I also tried restarting the whole pod, with the same result: nothing changes.
So the error apparently is not in the exporter; it looks as if the NATS server itself reports a different value.
It looks like the consumer replicas don't replicate these values from the consumer_leader.
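
The cross-server comparison described in this comment can also be sketched offline. The following is only a minimal illustration under stated assumptions: the SAMPLES list is hand-copied from the three series quoted earlier in the thread (with most labels dropped), the helper name find_disagreements is hypothetical, and nothing here talks to a real Prometheus or NATS server.

```python
from collections import defaultdict

# Hand-copied from the three series quoted earlier in the thread
# (only the labels needed for the comparison are kept).
SAMPLES = [
    ({"server_name": "nats-0", "stream_name": "TEST",
      "consumer_name": "monitor", "is_consumer_leader": "false"}, 0.0),
    ({"server_name": "nats-1", "stream_name": "TEST",
      "consumer_name": "monitor", "is_consumer_leader": "true"}, 8.0),
    ({"server_name": "nats-2", "stream_name": "TEST",
      "consumer_name": "monitor", "is_consumer_leader": "false"}, 0.0),
]


def find_disagreements(samples):
    """Group samples per (stream, consumer) and keep the groups in which
    the servers do not all report the same value.

    Hypothetical helper for illustration; not part of the exporter.
    """
    per_consumer = defaultdict(dict)
    for labels, value in samples:
        key = (labels["stream_name"], labels["consumer_name"])
        per_consumer[key][labels["server_name"]] = value
    return {
        key: by_server
        for key, by_server in per_consumer.items()
        if len(set(by_server.values())) > 1
    }


if __name__ == "__main__":
    for key, by_server in find_disagreements(SAMPLES).items():
        print(key, by_server)
```

A group only shows up in the result when at least two servers report different values for the same consumer, which is the situation described above.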


andreyreshetnikov-zh commented on August 16, 2024

As far as I understand, when using the nats consumer info command, the "Unprocessed Messages" value is always provided by the consumer leader. Is there any way to view this value on each NATS server? I would like to connect to each server, see its list of unprocessed messages, and compare their number with the metric to understand where the error is.


andreyreshetnikov-zh commented on August 16, 2024

After testing, it turned out that the NATS pod that is currently the consumer_leader always shows the correct value for pending messages and for ack pending messages. I added the label filter is_consumer_leader="true" to the Grafana dashboard queries, and that solved the problem of incorrect data being displayed.
The same works for the alert expression:

nats_consumer_num_pending{env="stage", is_consumer_leader="true"} > 0

With this filter, the alert is triggered only by the values that the current consumer leader reports.

@jlange-koch, != 0 is not always reliable: I have observed situations where replicas show a non-zero value, but in fact there are no pending messages and the leader correctly displays 0.
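
The leader-only filter can be mimicked outside Prometheus as well. Below is a minimal sketch, assuming simplified exposition-format lines (label values without a closing brace, no timestamps); the names parse_sample and leader_values are hypothetical, and the EXAMPLE lines are shortened versions of the series quoted earlier in the thread.

```python
import re

# Simplified patterns for one exposition-format sample line.
# Assumption: label values contain no "}" and the line has no timestamp.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')

# Shortened versions of the three series quoted in the thread.
EXAMPLE = [
    'nats_consumer_num_pending{consumer_name="monitor",is_consumer_leader="false",server_name="nats-0",stream_name="TEST"} 0',
    'nats_consumer_num_pending{consumer_name="monitor",is_consumer_leader="true",server_name="nats-1",stream_name="TEST"} 8',
    'nats_consumer_num_pending{consumer_name="monitor",is_consumer_leader="false",server_name="nats-2",stream_name="TEST"} 0',
]


def parse_sample(line):
    """Split one sample line into (metric name, labels dict, float value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a labelled sample line: {line!r}")
    labels = dict(LABEL_RE.findall(match.group("labels")))
    return match.group("name"), labels, float(match.group("value"))


def leader_values(lines, metric="nats_consumer_num_pending"):
    """Keep only the value reported by the consumer leader, keyed by
    (stream_name, consumer_name). Replica samples are ignored."""
    result = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name, labels, value = parse_sample(line)
        if name != metric or labels.get("is_consumer_leader") != "true":
            continue
        result[(labels.get("stream_name"), labels.get("consumer_name"))] = value
    return result


if __name__ == "__main__":
    print(leader_values(EXAMPLE))
```

Filtering on is_consumer_leader rather than on the value itself avoids the problem noted above, where a replica may report a stale non-zero value while the leader correctly reports 0.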

