
Comments (26)

nijave commented on September 23, 2024

For anyone finding this in the future, you can dump metrics then look for duplicate series:

# Grab metrics
# Filter out comments
# Drop the value off the end
# Group lines and count
# Filter for >1 count of line (duplicate series)
curl -s 192.168.215.5:8080/metrics \
  | grep -v '^#' \
  | rev | cut -d" " -f 2- | rev \
  | sort | uniq -c \
  | sort -n | grep -vE '^[ ]+1[ ]'

I found

2 kube_endpoint_address{namespace="kube-system",endpoint="prom-kp-kube-proxy",ip="172.16.1.64",ready="true"}

Which, in my case, is an issue with my setup and keepalived.
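The same duplicate hunt can be sketched in Python, as a rough equivalent of the shell pipeline above (skip comment lines, drop the trailing value field, count identical series strings):

```python
from collections import Counter

def find_duplicate_series(metrics_text: str) -> dict[str, int]:
    """Return series (metric name + labels) that appear more than once.

    Mirrors the shell pipeline: filter out comments, drop the last
    whitespace-separated field (the value), then count what's left.
    """
    series = [
        line.rsplit(" ", 1)[0]
        for line in metrics_text.splitlines()
        if line.strip() and not line.startswith("#")
    ]
    counts = Counter(series)
    return {s: n for s, n in counts.items() if n > 1}

# Example input with an embedded duplicate, as in the kube_endpoint_address case:
sample = """\
# HELP kube_endpoint_address Endpoint addresses
kube_endpoint_address{ip="172.16.1.64",ready="true"} 1
kube_endpoint_address{ip="172.16.1.64",ready="true"} 1
kube_endpoint_address{ip="172.16.1.65",ready="true"} 1
"""
print(find_duplicate_series(sample))
# {'kube_endpoint_address{ip="172.16.1.64",ready="true"}': 2}
```

Feed it the body of a `curl -s <target>/metrics` response to reproduce the shell result.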

from prometheus.

bootc commented on September 23, 2024

For those finding this issue and wanting to follow on with kube-state-metrics, you want:
kubernetes/kube-state-metrics#2390

bboreham commented on September 23, 2024

Duplicate labels on scrape is a clear logic error (at least in the mind of some people who worked on it).

Duplicate sample fed into TSDB is something that happens, e.g. on some kinds of restart, and we prefer simple logic to always accept it over complicated logic aimed at particular corner cases.

I only wanted to nitpick the wording of the message, not change behaviour. See also #13277 (comment).

Unfortunately it would be more of a breaking change to rename prometheus_target_scrapes_sample_duplicate_timestamp_total.

rgarcia89 commented on September 23, 2024

It might have something to do with bairhys/prometheus-frigate-exporter#9 and the duplicate-series check introduced in #12933.

machine424 commented on September 23, 2024

Yes, starting with v2.52.0 such "duplicates" are no longer ignored.
In the bairhys/prometheus-frigate-exporter#9 case, the client was indeed exposing duplicated values for the same timestamp, and a fix was merged.
Maybe the same is happening with kube-state-metrics.
A debug log after

err = storage.ErrDuplicateSampleForTimestamp

(with the metric name and labels, and maybe the value and timestamp) would be helpful for clients to adjust to the new behaviour.
cc @bboreham as you reviewed the feature.

rgarcia89 commented on September 23, 2024

@machine424 I thought that too, but I don't see any duplicates in the metrics, so I'm a bit confused right now.

However, I like the idea of showing the failing metrics in the debug log.

machine424 commented on September 23, 2024

Yes, it's not that easy to debug. If you want to add that log, please go ahead. We'll see if we can add it to any potential v2.52.1.
Otherwise I can open a PR.

bboreham commented on September 23, 2024

I see this from the report:

msg="Error on ingesting samples with different value but same timestamp" num_dropped=1

This is intentionally not giving any details on series, just the number.
We could perhaps record the first error, to avoid generating a lot of extra work.

bboreham commented on September 23, 2024

Prometheus configuration file
No response

This makes it harder to tell if your problem could be relabeling.

machine424 commented on September 23, 2024

Actually, now that I'm looking at the code for real, I think a debug log should already be provided via checkAddError

prometheus/scrape/scrape.go

Lines 1781 to 1785 in 3b8b577

case errors.Is(err, storage.ErrDuplicateSampleForTimestamp):
	appErrs.numDuplicates++
	level.Debug(sl.l).Log("msg", "Duplicate sample for timestamp", "series", string(met))
	sl.metrics.targetScrapeSampleDuplicate.Inc()
	return false, nil

(no need for the extra debug log)
You can run with --log.level=debug and see.

rgarcia89 commented on September 23, 2024

@machine424 you are right, the debug log is already implemented. I just deployed one Prometheus instance with debug log level enabled. It seems kube-state-metrics is indeed producing duplicate samples...

ts=2024-05-13T19:20:40.233Z caller=main.go:1372 level=info msg="Completed loading of configuration file" filename=/etc/prometheus/config_out/prometheus.env.yaml totalDuration=95.860644ms db_storage=1.142µs remote_storage=150.634µs web_handler=872ns query_engine=776ns scrape=98.941µs scrape_sd=7.197985ms notify=13.095µs notify_sd=269.119µs rules=54.251368ms tracing=6.745µs
...
ts=2024-05-13T19:21:09.190Z caller=scrape.go:1777 level=debug component="scrape manager" scrape_pool=serviceMonitor/monitoring/kube-state-metrics/0 target=https://10.244.5.6:8443/metrics msg="Duplicate sample for timestamp" series="kube_pod_tolerations{namespace=\"calico-system\",pod=\"calico-kube-controllers-75c647b46c-pg9cr\",uid=\"bf944c52-17bd-438b-bbf1-d97f8671bd6b\",key=\"CriticalAddonsOnly\",operator=\"Exists\"}"
ts=2024-05-13T19:21:09.207Z caller=scrape.go:1738 level=warn component="scrape manager" scrape_pool=serviceMonitor/monitoring/kube-state-metrics/0 target=https://10.244.5.6:8443/metrics msg="Error on ingesting samples with different value but same timestamp" num_dropped=1

rgarcia89 commented on September 23, 2024

And indeed it is doing so on purpose... not sure what AKS is doing here, but the toleration exists twice on the calico-kube-controllers deployment:

       tolerations:
       - key: CriticalAddonsOnly
         operator: Exists
       - effect: NoSchedule
         key: node-role.kubernetes.io/master
       - effect: NoSchedule
         key: node-role.kubernetes.io/control-plane
       - key: CriticalAddonsOnly
         operator: Exists

So it seems like everything is working fine on prometheus and kube-state-metrics side 👍
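Since kube_pod_tolerations derives its labels directly from each toleration entry, two identical entries collapse onto the same label set and therefore the same series. A minimal sketch of the collision (the label-building helper is hypothetical, not kube-state-metrics code):

```python
from collections import Counter

# Tolerations as listed on the calico-kube-controllers pod above.
tolerations = [
    {"key": "CriticalAddonsOnly", "operator": "Exists"},
    {"key": "node-role.kubernetes.io/master", "effect": "NoSchedule"},
    {"key": "node-role.kubernetes.io/control-plane", "effect": "NoSchedule"},
    {"key": "CriticalAddonsOnly", "operator": "Exists"},  # the duplicate entry
]

def label_set(tol: dict) -> str:
    """Hypothetical sketch: render a toleration as a sorted label string."""
    return ",".join(f'{k}="{v}"' for k, v in sorted(tol.items()))

counts = Counter(label_set(t) for t in tolerations)
duplicates = [labels for labels, n in counts.items() if n > 1]
print(duplicates)  # only the two CriticalAddonsOnly entries collide
```

Without an index label (or deduplication on the exporter side), nothing in the label set distinguishes the two entries.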

prymitive commented on September 23, 2024

Yes, starting with v2.52.0 such "duplicates" are no longer ignored.

“Ignored” is probably the wrong word here. It’s a little more complicated than that.
You might have some time series multiple times with different values, in which case I think the last one will be appended to the TSDB; this doesn’t have to be the “correct” one.
Or you can even imagine the metrics response giving a different order of samples on each scrape, which for counters might mean bogus results.
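A toy illustration of that ordering hazard, assuming last-write-wins ingestion (a sketch, not actual Prometheus code):

```python
def ingest(scrape: list[tuple[str, float]]) -> dict[str, float]:
    """Assumed last-write-wins: a later duplicate overwrites an earlier one."""
    stored: dict[str, float] = {}
    for series, value in scrape:
        stored[series] = value
    return stored

# The same two duplicate samples, exposed in a different order per scrape:
scrape_1 = [("requests_total", 100.0), ("requests_total", 5.0)]
scrape_2 = [("requests_total", 5.0), ("requests_total", 100.0)]

print(ingest(scrape_1))  # {'requests_total': 5.0}
print(ingest(scrape_2))  # {'requests_total': 100.0}
# A counter bouncing 5 -> 100 -> 5 looks like repeated resets,
# so rate() over it would produce bogus results.
```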

machine424 commented on September 23, 2024

So it seems like everything is working fine on prometheus and kube-state-metrics side 👍

I think this is worth creating an issue on kube-state-metrics as well.
As the tolerations array permits "duplicates", and depending on the intent of kube_pod_tolerations, there might be a need to deduplicate, or to add an index label or something similar to identify each toleration.
In this case it seems to be "harmless", but perhaps the same approach is applied to other arrays; it's important to ensure they are aware of this.

rgarcia89 commented on September 23, 2024

@machine424 will do. Quite confusing to see that "duplicates" are allowed within the tolerations array. I wasn't expecting that.

rgarcia89 commented on September 23, 2024

Closing here - since everything is working as expected with Prometheus. Thanks to everyone for your help!

bboreham commented on September 23, 2024

Thanks for the investigation @machine424.
One nit: "Error on ingesting samples with different value but same timestamp" - don't they all have the same value, i.e. 1?
I think this comes from Prometheus re-using an error in a slightly different context.

machine424 commented on September 23, 2024

One nit: "Error on ingesting samples with different value but same timestamp" - don't they all have the same value, i.e. 1?
I think this comes from Prometheus re-using an error in a slightly different context.

Good point. I think we agree that even in such cases (same value), we should continue to consider it an error; this can help highlight a hidden issue (targets shouldn't rely on Prometheus deduplicating that, IIUC). But I'm afraid some targets may be relying on the old behavior, especially the ones with honor_timestamps ("no need to clean the exposed metrics, Prometheus will take care of that").

That being said, the TSDB doesn't consider samples with the same timestamp and the same value as duplicates; it tolerates that:

if t == msMaxt {
	// We are allowing exact duplicates as we can encounter them in valid cases
	// like federation and erroring out at that time would be extremely noisy.
	// This only checks against the latest in-order sample.
	// The OOO headchunk has its own method to detect these duplicates.
	if math.Float64bits(s.lastValue) != math.Float64bits(v) {
		return false, 0, storage.ErrDuplicateSampleForTimestamp
	}
	// Sample is identical (ts + value) with most current (highest ts) sample in sampleBuf.
	return false, 0, nil
}

Hence, the explicit warning message.

If we want to maintain the current behavior, I agree we shouldn't use a storage error ErrDuplicateSampleForTimestamp for a scrape phase issue.
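The bit-level comparison above can be mimicked in Python to show exactly which duplicates the TSDB tolerates (a sketch of the check, using struct to emulate Go's math.Float64bits):

```python
import struct

def float64_bits(v: float) -> int:
    """Python equivalent of Go's math.Float64bits: raw IEEE 754 bit pattern."""
    return struct.unpack("<Q", struct.pack("<d", v))[0]

def accept_duplicate(last_value: float, new_value: float) -> bool:
    """Sketch of the TSDB rule quoted above: an exact (ts + value)
    duplicate is tolerated; same timestamp with a different value
    yields ErrDuplicateSampleForTimestamp."""
    if float64_bits(last_value) != float64_bits(new_value):
        raise ValueError("ErrDuplicateSampleForTimestamp")
    return True  # exact duplicate, silently accepted

print(accept_duplicate(1.0, 1.0))  # True: exact duplicate tolerated
try:
    accept_duplicate(1.0, 2.0)
except ValueError as e:
    print(e)  # different value, same timestamp: rejected
```

Note that comparing bit patterns rather than float equality means even 0.0 vs -0.0 counts as a different value.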

rgarcia89 commented on September 23, 2024

I'd also like to suggest a revision of the warning message triggered by duplicate series in Prometheus. In my experience, the message didn't accurately reflect the situation, as both samples had identical values.

Similarly, the prometheus_target_scrapes_sample_duplicate_timestamp_total counter seems to be incrementing even when the duplicate samples have the same value, which contradicts its intended purpose - at least by the current definition.

While I understand the logic behind rejecting duplicate samples, I'm a bit confused about the implementation, as the underlying TSDB is accepting such cases.

machine424 commented on September 23, 2024

Do you think @bboreham that could be done as part of #13277 or should we create an issue for it?

freshaier commented on September 23, 2024

I have a special and unprecedented use case that ingests different values at the same timestamp at different moments. The timestamp on the metric is created by myself, not by Prometheus at pull time, and the metric has the potential to increase over time. So how can I avoid the "Error on ingesting samples with different value but same timestamp" message in Prometheus's underlying logic and ingest a value (not the first) at the same timestamp? THX @bboreham @machine424

bboreham commented on September 23, 2024

The only way to send multiple samples for the same series at different timestamps into current Prometheus is via remote-write.

I checked the OpenMetrics spec and I don't think it says whether a scrape can have the same series at multiple timestamps.
You could raise a feature request for this.

freshaier commented on September 23, 2024

Aracki commented on September 23, 2024

Does anyone know how it is possible that in my case the rate of prometheus_target_scrapes_sample_duplicate_timestamp_total is constantly ~0.0666, but when I check for duplicate metrics I can't find any (I used @nijave's curl command)?

bboreham commented on September 23, 2024

Turn on debug logging?

Aracki commented on September 23, 2024

Oh, now I see these warnings in Prometheus: msg="Error on ingesting samples with different value but same timestamp" num_dropped=1. I guess the duplicates are being dropped before I try to get them via the /metrics endpoint.
