For anyone finding this in the future, you can dump the metrics and then look for duplicate series:

```shell
# Grab metrics, filter out comments, drop the value off the end,
# group lines and count, then filter for count > 1 (duplicate series)
curl -s 192.168.215.5:8080/metrics \
  | grep -v '^#' \
  | rev | cut -d" " -f 2- | rev \
  | sort | uniq -c \
  | sort -n | grep -vE '^[ ]+1[ ]'
```
I found:

```
2 kube_endpoint_address{namespace="kube-system",endpoint="prom-kp-kube-proxy",ip="172.16.1.64",ready="true"}
```

which, in my case, was an issue with my setup and keepalived.
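The same duplicate check can be sketched in Python, for anyone who prefers it over the shell pipeline. This is a sketch, not Prometheus tooling; the endpoint URL in the comment is the example address from above:

```python
from collections import Counter
from urllib.request import urlopen


def find_duplicate_series(metrics_text: str) -> dict:
    """Return series (name + labels, value stripped) seen more than once."""
    series = []
    for line in metrics_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip comments and blank lines
        # Drop the value after the last space; this mirrors the
        # `rev | cut -d" " -f 2- | rev` trick in the shell pipeline.
        series.append(line.rsplit(" ", 1)[0])
    counts = Counter(series)
    return {s: n for s, n in counts.items() if n > 1}


# Example usage against a live endpoint:
# text = urlopen("http://192.168.215.5:8080/metrics").read().decode()
# print(find_duplicate_series(text))
```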
from prometheus.
For those finding this issue and wanting to follow on with kube-state-metrics, you want:
kubernetes/kube-state-metrics#2390
A duplicate label set on scrape is a clear logic error (at least in the mind of some people who worked on it).
A duplicate sample fed into the TSDB is something that happens, e.g. on some kinds of restart, and we prefer simple logic that always accepts it over complicated logic aimed at particular corner cases.
I only wanted to nitpick the wording of the message, not change behaviour. See also #13277 (comment).
Unfortunately it would be more of a breaking change to rename prometheus_target_scrapes_sample_duplicate_timestamp_total.
It might have something to do with bairhys/prometheus-frigate-exporter#9 and the introduced check for duplicated series (#12933).
Yes, starting with v2.52.0 such "duplicates" are no longer ignored.

In the bairhys/prometheus-frigate-exporter#9 case, the client was indeed exposing duplicated values for the same timestamp, and a fix was merged. Maybe the same is happening with kube-state-metrics.

A debug log after line 1625 in 3b8b577 could help confirm.

cc @bboreham as you reviewed the feature.
@machine424 I already thought that, but I don't see any duplicates in the metrics. So I'm a bit confused right now.
However, I like the idea of showing the failing metrics in the debug log.
Yes, it's not that easy to debug. If you want to add that log, please go ahead; we'll see if we can add it to any potential v2.52.1. Otherwise I can open a PR.
I see this from the report:

```
msg="Error on ingesting samples with different value but same timestamp" num_dropped=1
```

This is intentionally not giving any details on series, just the number. We could perhaps record the first error, to avoid generating a lot of extra work.
> Prometheus configuration file
> No response

This makes it harder to tell if your problem could be relabeling.
Actually, now that I'm looking at the code for real, I think a debug log is already provided via checkAddError (lines 1781 to 1785 in 3b8b577), so there's no need for the extra debug log. You can run with --log.level=debug and see.
@machine424 you are right the debug log is already implemented. I just deployed one prometheus with debug log level enabled. Seems like kube-state-metrics is indeed producing duplicate samples...
```
ts=2024-05-13T19:20:40.233Z caller=main.go:1372 level=info msg="Completed loading of configuration file" filename=/etc/prometheus/config_out/prometheus.env.yaml totalDuration=95.860644ms db_storage=1.142µs remote_storage=150.634µs web_handler=872ns query_engine=776ns scrape=98.941µs scrape_sd=7.197985ms notify=13.095µs notify_sd=269.119µs rules=54.251368ms tracing=6.745µs
...
ts=2024-05-13T19:21:09.190Z caller=scrape.go:1777 level=debug component="scrape manager" scrape_pool=serviceMonitor/monitoring/kube-state-metrics/0 target=https://10.244.5.6:8443/metrics msg="Duplicate sample for timestamp" series="kube_pod_tolerations{namespace=\"calico-system\",pod=\"calico-kube-controllers-75c647b46c-pg9cr\",uid=\"bf944c52-17bd-438b-bbf1-d97f8671bd6b\",key=\"CriticalAddonsOnly\",operator=\"Exists\"}"
ts=2024-05-13T19:21:09.207Z caller=scrape.go:1738 level=warn component="scrape manager" scrape_pool=serviceMonitor/monitoring/kube-state-metrics/0 target=https://10.244.5.6:8443/metrics msg="Error on ingesting samples with different value but same timestamp" num_dropped=1
```
And well, it is doing it on purpose... not sure what AKS is doing here, but the toleration exists twice on the calico-kube-controllers deployment:

```yaml
tolerations:
  - key: CriticalAddonsOnly
    operator: Exists
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
  - effect: NoSchedule
    key: node-role.kubernetes.io/control-plane
  - key: CriticalAddonsOnly
    operator: Exists
```

So it seems like everything is working fine on the Prometheus and kube-state-metrics side 👍
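To see why a repeated array entry collides, here is a hedged sketch (not kube-state-metrics code; the series_for helper is hypothetical): if the exporter emits one series per toleration, with labels derived only from the entry's fields, two identical entries produce the same label set and therefore the same series twice in one scrape.

```python
from collections import Counter

# The tolerations array from the deployment above, as plain dicts.
tolerations = [
    {"key": "CriticalAddonsOnly", "operator": "Exists"},
    {"key": "node-role.kubernetes.io/master", "effect": "NoSchedule"},
    {"key": "node-role.kubernetes.io/control-plane", "effect": "NoSchedule"},
    {"key": "CriticalAddonsOnly", "operator": "Exists"},  # duplicate entry
]


def series_for(tol):
    # Hypothetical: labels built from the toleration's fields only,
    # so identical entries yield an identical series.
    labels = ",".join(f'{k}="{v}"' for k, v in sorted(tol.items()))
    return f"kube_pod_tolerations{{{labels}}}"


counts = Counter(series_for(t) for t in tolerations)
dupes = [s for s, n in counts.items() if n > 1]
# The two identical CriticalAddonsOnly entries collapse to one series,
# exposed twice: exactly what the debug log above reported.
```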
> Yes, starting with v2.52.0 such "duplicates" are no longer ignored.

"Ignored" is probably the wrong word here; it's a little bit more complicated than that. You might have some timeseries multiple times with different values, in which case I think the last one would be appended to the TSDB, and that doesn't have to be the "correct" one. Or you can even imagine the metrics response giving a different order of samples on each scrape, which for counters might mean bogus results.
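To make the counter concern concrete, here is a toy sketch (not Prometheus code, just the shape of the problem): two distinct underlying counters accidentally exposed under one series name, each only ever increasing, but with the exposition order flipping between scrapes. If the ingester keeps whichever sample appears last, the stored value can go backwards, which rate() would interpret as a counter reset and turn into bogus results.

```python
def last_one_wins(samples):
    """Toy ingester: for a duplicated series, the last exposed value wins."""
    stored = {}
    for series, value in samples:
        stored[series] = value
    return stored


# Two underlying counters colliding on one series name; both advance
# between scrapes, but the exposition order flips (e.g. a map iterated
# in nondeterministic order):
scrape_1 = [("requests_total", 90.0), ("requests_total", 200.0)]   # stores 200
scrape_2 = [("requests_total", 211.0), ("requests_total", 101.0)]  # stores 101

s1 = last_one_wins(scrape_1)["requests_total"]
s2 = last_one_wins(scrape_2)["requests_total"]
assert s2 < s1  # stored "counter" went backwards: looks like a reset
```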
> So it seems like everything is working fine on prometheus and kube-state-metrics side 👍

I think this is worth creating an issue on kube-state-metrics as well. As the tolerations array permits "duplicates", and depending on kube_pod_tolerations' intent, there might be a need to deduplicate or add an index label or something to identify each toleration. In this case it seems to be "harmless", but perhaps the same approach is applied to other arrays; it's important to ensure they are aware of this.
@machine424 will do. Quite confusing to see that "duplicates" are allowed within the tolerations array; I wasn't expecting that.
Closing here, since everything is working as expected with Prometheus. Thanks to everyone for your help!
Thanks for the investigation @machine424.
One nit: "Error on ingesting samples with different value but same timestamp" - don't they all have the same value, i.e. 1?
I think this comes from Prometheus re-using an error in a slightly different context.
> One nit: "Error on ingesting samples with different value but same timestamp" - don't they all have the same value, i.e. 1?
> I think this comes from Prometheus re-using an error in a slightly different context.

Good point. I think we agree that even in such cases (same value), we should continue to consider it an error. This can help highlight a hidden issue; targets shouldn't rely on Prometheus deduplicating that, IIUC ("no need to clean the exposed metrics, Prometheus will take care of that"). But I'm afraid some targets may be relying on the old behavior, especially the ones with honor_timestamps.

That being said, the TSDB doesn't consider samples with the same timestamp and the same value as duplicates; it tolerates that (see prometheus/tsdb/head_append.go, lines 464 to 473 in dc92652). Hence the explicit warning message.

If we want to maintain the current behavior, I agree we shouldn't use a storage error, ErrDuplicateSampleForTimestamp, for a scrape-phase issue.
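The distinction being made can be sketched as follows. This is a simplified toy model of the rule described above, not the actual head_append.go logic (which also handles out-of-order timestamps, histograms, etc.): a sample at an already-stored timestamp is accepted as a no-op if the value is identical, and rejected only if the value differs.

```python
class ErrDuplicateSampleForTimestamp(Exception):
    """Toy stand-in for the storage error discussed above."""


class ToySeries:
    """Toy model of the TSDB tolerance rule (not real Prometheus code)."""

    def __init__(self):
        self.samples = {}  # timestamp -> value

    def append(self, ts, value):
        if ts in self.samples:
            if self.samples[ts] == value:
                return  # same timestamp, same value: tolerated as a no-op
            raise ErrDuplicateSampleForTimestamp(ts)
        self.samples[ts] = value


s = ToySeries()
s.append(1000, 1.0)
s.append(1000, 1.0)  # accepted: identical duplicate
try:
    s.append(1000, 2.0)  # rejected: different value, same timestamp
except ErrDuplicateSampleForTimestamp:
    pass
```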
I'd also like to suggest a revision of the warning message triggered by duplicate series in Prometheus. In my experience, the message didn't accurately reflect the situation, as both samples had identical values. Similarly, the prometheus_target_scrapes_sample_duplicate_timestamp_total counter seems to increment even when the duplicate samples have the same value, which contradicts its intended purpose, at least by the current definition. While I understand the logic behind rejecting duplicate samples, I'm a bit confused about the implementation, as the underlying TSDB accepts such cases.
Do you think, @bboreham, that this could be done as part of #13277, or should we create an issue for it?
I have a special use case that ingests different values at the same timestamp at different moments. The timestamp on the metric is created by myself, not by Prometheus at pull time, and the metric can increase over time. How can I avoid the "Error on ingesting samples with different value but same timestamp" in Prometheus's logic and ingest the later value (not the first) at the same timestamp? Thanks @bboreham @machine424
The only way to send multiple samples for the same series at different timestamps into current Prometheus is via remote-write.
I checked the OpenMetrics spec and I don't think it says whether a scrape can have the same series at multiple timestamps.
You could raise a feature request for this.
Does anyone know how it is possible that, in my case, the rate of prometheus_target_scrapes_sample_duplicate_timestamp_total is constantly ~0.0666, but when I check for duplicate metrics I can't find any (I used @nijave's curl command)?
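For what it's worth, a steady per-second rate of ~0.0666 works out to exactly one dropped sample per scrape at a 15s scrape interval (the 15s interval is an assumption here; check your actual scrape_interval):

```python
# One duplicate dropped on every scrape, at an assumed 15s scrape interval:
scrape_interval_s = 15
drops_per_scrape = 1
rate = drops_per_scrape / scrape_interval_s
print(f"{rate:.4f}")  # 0.0667
```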
Turn on debug logging?
Oh, now I see these warnings in Prometheus: msg="Error on ingesting samples with different value but same timestamp" num_dropped=1. I guess duplicates are being dropped before I try to get them via the /metrics endpoint.