Comments (6)
Hey,
we have the same issue with jetstream_consumer_num_pending.
As a workaround I added != 0 to the query in my Grafana dashboard and set "Connect null values" to "Always" in the Time series panel. You might be able to use this for alerting if you don't alert on "null values"; just keep in mind that you will have fewer data points, since Prometheus sometimes scrapes the wrong value (for us the wrong value is always 0).
I am not sure I would trust such an alert 100%, though.
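To make the caveat about fewer data points concrete, here is a minimal sketch in Python. The series is made up for illustration (it is not real scrape data), and drop_zeros is a hypothetical helper that only mimics what the != 0 filter does to the samples.

```python
# Hypothetical scraped series as (timestamp, value) pairs. The zeros at
# t=30 and t=60 stand for the wrong scrapes described above, while the
# zero at t=120 is meant to be a genuine "nothing pending" reading.
series = [(0, 8), (15, 8), (30, 0), (45, 7), (60, 0),
          (75, 5), (90, 2), (105, 1), (120, 0)]


def drop_zeros(points):
    """Mimic `!= 0` in PromQL: samples equal to 0 are filtered out."""
    return [(ts, value) for ts, value in points if value != 0]


filtered = drop_zeros(series)
print(len(series), len(filtered))
```

The filter cannot tell a wrong 0 from a real one, so the genuine reading at t=120 disappears together with the bad scrapes. That is one reason to be careful with alerts built on it.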
from prometheus-nats-exporter.
Same behaviour for us with jetstream_consumer_num_pending. This already happened with exporter version 0.9.1. We upgraded to 0.11.0, but it still shows the same behaviour.
from prometheus-nats-exporter.
Hello @wallyqs, sorry for pinging you, but in general it is difficult to tell which server reports the real value.
We get a different value from each NATS server (0 / 8 / 0):
nats_consumer_num_pending{account="TEST",account_id="ID",cluster="nats",consumer_desc="",consumer_leader="nats-1",consumer_name="monitor",domain="",is_consumer_leader="false",is_meta_leader="false",is_stream_leader="false",meta_leader="nats-2",server_name="nats-0",stream_leader="nats-2",stream_name="TEST"} 0
nats_consumer_num_pending{account="TEST",account_id="ID",cluster="nats",consumer_desc="",consumer_leader="nats-1",consumer_name="monitor",domain="",is_consumer_leader="true",is_meta_leader="false",is_stream_leader="false",meta_leader="nats-2",server_name="nats-1",stream_leader="nats-2",stream_name="TEST"} 8
nats_consumer_num_pending{account="TEST",account_id="ID",cluster="nats",consumer_desc="",consumer_leader="nats-1",consumer_name="monitor",domain="",is_consumer_leader="false",is_meta_leader="true",is_stream_leader="true",meta_leader="nats-2",server_name="nats-2",stream_leader="nats-2",stream_name="TEST"} 0
In this case nats-1 is the consumer leader.
Result of nats consumer info:
nats consumer info TEST monitor |grep -E 'Leader|Unprocessed'
Leader: nats-1
Unprocessed Messages: 8
It's difficult to say which value is correct, since the leader shows 8 but the other two servers show 0.
Could you point out where the error might be, so that I could prepare a PR?
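For anyone who wants to check this against the raw /metrics output, here is a rough Python sketch that parses sample lines like the ones above and keeps only the value from the server marked is_consumer_leader="true". The sample lines are shortened to a few labels, and parse_samples and leader_values are hypothetical helpers written for this illustration; they are not part of the exporter, and the regex only handles simple lines like these.

```python
import re

# Sample lines in the Prometheus text exposition format, shortened to the
# labels that matter here (values taken from the output above).
SAMPLES = """\
nats_consumer_num_pending{consumer_name="monitor",is_consumer_leader="false",server_name="nats-0",stream_name="TEST"} 0
nats_consumer_num_pending{consumer_name="monitor",is_consumer_leader="true",server_name="nats-1",stream_name="TEST"} 8
nats_consumer_num_pending{consumer_name="monitor",is_consumer_leader="false",server_name="nats-2",stream_name="TEST"} 0
"""

LINE_RE = re.compile(r'^(?P<name>\w+)\{(?P<labels>.*)\}\s+(?P<value>\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Parse simple exposition lines into (name, labels, value) tuples."""
    parsed = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip blank lines and anything that is not a sample
        labels = dict(LABEL_RE.findall(match.group("labels")))
        parsed.append((match.group("name"), labels, float(match.group("value"))))
    return parsed


def leader_values(samples):
    """Keep only the samples reported by the consumer leader."""
    return {
        (labels["stream_name"], labels["consumer_name"]): value
        for _, labels, value in samples
        if labels.get("is_consumer_leader") == "true"
    }


result = leader_values(parse_samples(SAMPLES))
print(result)
```

With the three samples above, only the nats-1 sample is kept, which matches what nats consumer info reports.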
from prometheus-nats-exporter.
A few new points. I used this PromQL query:
count(nats_consumer_num_pending > 0) by (cluster_id, account, consumer_name, stream_name, consumer_leader) > 0
and found that whenever the same metric differs between servers, the differing value is always on the consumer_leader side.
The second point: I tried restarting the prometheus-nats-exporter container inside the NATS server pod (the one with the differing metric) with:
kill -HUP $(ps aufx |grep '[p]rometheus-nats-exporter' |awk '{print $1}')
The prometheus-nats-exporter container restarted successfully, but the metric value didn't change. I also tried restarting the whole pod, with the same result: nothing changes.
Apparently the error is not in the exporter; it looks as if the NATS server itself reports a different value.
It looks like consumer replicas don't replicate these metrics from the consumer_leader.
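The same cross-server comparison can be sketched outside PromQL. The Python snippet below groups samples per consumer and reports the consumers whose servers disagree. The SCRAPED list is hypothetical input that mirrors the 0 / 8 / 0 case from my earlier comment, and find_divergent is a helper invented for this illustration.

```python
from collections import defaultdict

# Hypothetical scrape results, one (labels, value) pair per server.
SCRAPED = [
    ({"stream_name": "TEST", "consumer_name": "monitor",
      "consumer_leader": "nats-1", "server_name": "nats-0"}, 0),
    ({"stream_name": "TEST", "consumer_name": "monitor",
      "consumer_leader": "nats-1", "server_name": "nats-1"}, 8),
    ({"stream_name": "TEST", "consumer_name": "monitor",
      "consumer_leader": "nats-1", "server_name": "nats-2"}, 0),
]


def find_divergent(samples):
    """Return consumers for which the servers report different values."""
    groups = defaultdict(dict)
    for labels, value in samples:
        key = (labels["stream_name"], labels["consumer_name"],
               labels["consumer_leader"])
        groups[key][labels["server_name"]] = value
    # A consumer is divergent when its servers report more than one value.
    return {
        key: per_server
        for key, per_server in groups.items()
        if len(set(per_server.values())) > 1
    }


divergent = find_divergent(SCRAPED)
for (stream, consumer, leader), per_server in divergent.items():
    print(stream, consumer, "leader:", leader, "values:", per_server)
```

Keeping consumer_leader in the group key makes it easy to see whether the odd value out belongs to the leader.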
from prometheus-nats-exporter.
As far as I understand, when using the nats consumer info command, the "Unprocessed Messages" figure is always provided by the consumer leader. Is there any way to view this value on each NATS server? I would like to connect to each server, see the list of unprocessed messages, and compare their number with the metric, to understand where the error is.
from prometheus-nats-exporter.
After testing, it turned out that the NATS pod that is currently the consumer_leader always shows the correct value for pending messages and for ack pending messages. I added the label is_consumer_leader="true" to the query in the Grafana dashboard, and that solved the problem of incorrect data being displayed.
The same goes for the alert expression:
nats_consumer_num_pending{env="stage", is_consumer_leader="true"} > 0
This way the alert is triggered only by the value the consumer leader currently reports.
@jlange-koch, != 0 is not always reliable: I have observed situations where replicas show a value != 0 while in fact there are no pending messages and the leader correctly displays 0.
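If you want to try the leader-filtered expression against the Prometheus HTTP API before putting it into an alert rule, a small Python sketch for building the request URL could look like this. The base URL is an assumption (a local Prometheus on the default port), and build_query_url is a hypothetical helper; only the /api/v1/query endpoint and its query parameter come from Prometheus itself.

```python
from urllib.parse import urlencode

# Assumption: Prometheus is reachable locally on its default port.
PROMETHEUS_URL = "http://localhost:9090"

# The alert expression from above, restricted to the consumer leader.
EXPR = 'nats_consumer_num_pending{env="stage", is_consumer_leader="true"} > 0'


def build_query_url(base_url, expr):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url}/api/v1/query?{urlencode({'query': expr})}"


url = build_query_url(PROMETHEUS_URL, EXPR)
print(url)
```

The printed URL can be opened with curl or a browser; an empty result list means no consumer leader currently reports pending messages for that environment.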
from prometheus-nats-exporter.