
Comments (7)

prymitive commented on May 23, 2024

This is unlikely to be a bug in Prometheus; it is most likely a problem on your end.
If you look at the timestamps, you’ll notice that they are duplicated: there are always two samples ~20ms apart from each other. You might be scraping the same target twice, or two different targets end up with identical time series.
When everything works smoothly you won’t notice any problems. But if either of these scrapes is delayed, it might result in data like you see above, mostly because the timestamp of each sample is the beginning of the scrape request.
If one scrape starts and gets delayed on DNS or a connect attempt while the other one is fast, the slow scrape might end up with a lower timestamp but a higher value.
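A minimal Python sketch of the effect described above, assuming two scrapes of the same counter land in one series. The sample values and helper names (merge_series, count_drops, naive_increase) are hypothetical and this is not Prometheus code; naive_increase mirrors how Prometheus counter functions such as increase() treat any drop as a counter reset, but skips their extrapolation step.

```python
from typing import List, Tuple

# (timestamp in ms, counter value); hypothetical data for illustration only
Sample = Tuple[int, float]


def merge_series(a: List[Sample], b: List[Sample]) -> List[Sample]:
    """Interleave samples from two scrapes that share one label set, ordered by timestamp."""
    return sorted(a + b, key=lambda s: s[0])


def count_drops(series: List[Sample]) -> int:
    """Count places where the counter value goes down between consecutive samples."""
    return sum(1 for prev, cur in zip(series, series[1:]) if cur[1] < prev[1])


def naive_increase(series: List[Sample]) -> float:
    """last - first, plus the pre-drop value at every drop (treated as a reset to zero).

    No extrapolation to the range boundaries, unlike Prometheus' increase().
    """
    total = series[-1][1] - series[0][1]
    for prev, cur in zip(series, series[1:]):
        if cur[1] < prev[1]:
            total += prev[1]
    return total


# Healthy case: both scrapes are fast, the second one ~20ms after the first.
fast_a = [(0, 100.0), (60_000, 160.0), (120_000, 220.0)]
fast_b = [(20, 100.0), (60_020, 160.0), (120_020, 220.0)]

# Delayed case: scrape A starts at 60_000 but stalls (e.g. on DNS), so it reads
# a later, higher value than scrape B, which started at 60_020 and returned quickly.
slow_a = [(0, 100.0), (60_000, 165.0), (120_000, 220.0)]
slow_b = [(20, 100.0), (60_020, 161.0), (120_020, 220.0)]

healthy = merge_series(fast_a, fast_b)
delayed = merge_series(slow_a, slow_b)

print(count_drops(healthy), naive_increase(healthy))  # 0 120.0
print(count_drops(delayed), naive_increase(delayed))  # 1 285.0
```

In this model a single out-of-order pair is enough to make the counter look as if it was reset, which would show up as a spike in rate() or increase().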

from prometheus.

ashishvaishno commented on May 23, 2024

@prymitive Is there a way to handle this, given that I need two StatefulSets of Prometheus? Thanos does take care of deduplication, but this delay might be difficult to manage, right?


prymitive commented on May 23, 2024

Handle what exactly?
In Prometheus you’re supposed to have unique labels on all time series. Automatic injection of job and instance labels usually ensures this.
So first you need to understand why you have two scrapes that result in the same time series.


ashishvaishno commented on May 23, 2024

@prymitive
I have different labels for the metrics. Since we have two Prometheus servers set up, they both scrape data at a 60s interval, timed from when each StatefulSet starts. Thanos takes care of de-duplication in this situation. Now, if I understood your point correctly, there are "few" moments in time where the scrape times of the counter are slightly off, and when the data is aggregated and queried on Thanos I get the issue.
Or do I have the wrong understanding?

Example: these are the labels for my metric:

request_count_total{app="test", exported_id="test-594b9d94fc-kgdcg", exported_service="test", id="test-594b9d94fc-kgdcg", instance="172.26.19.57", job="kubernetes-services-pods", name="test", namespace="test", pod_template_hash="594b9d94fc", prometheus="monitoring/prometheus-stack-kube-prom-prometheus", service="test", system="INTERNET"}

On prom-0, I have this value:
Screenshot 2024-04-19 at 10 46 56
On prom-1, I have this value:
Screenshot 2024-04-19 at 10 47 20

On Thanos Querier:

Screenshot 2024-04-19 at 10 46 34

The scrape duration on these endpoints is also less than 0.1 s.


prymitive commented on May 23, 2024

If you use Thanos and that’s where you see this problem, then maybe Thanos is merging two counters from two different Prometheus servers into a single time series?
Try your query on both Prometheus servers directly. If that works, then you need to add some unique external labels on each Prometheus.
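A simplified Python model of the merge suspected above, assuming the query layer combines every result whose label set is identical. The function names, the "replica" label and the data are hypothetical and this is not Thanos code; with_external_labels assumes Prometheus' behaviour of not overriding a label the series already carries.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Labels = Dict[str, str]
Sample = Tuple[int, float]


def with_external_labels(labels: Labels, external: Labels) -> Labels:
    """Attach external labels; labels already on the series take precedence."""
    return {**external, **labels}


def group_by_series(results: List[Tuple[Labels, List[Sample]]]) -> Dict[tuple, List[Sample]]:
    """Combine the samples of all results that share exactly the same label set."""
    grouped: Dict[tuple, List[Sample]] = defaultdict(list)
    for labels, samples in results:
        grouped[tuple(sorted(labels.items()))].extend(samples)
    return dict(grouped)


series = {"__name__": "request_count_total", "job": "kubernetes-services-pods"}
from_prom0 = [(0, 100.0), (60_000, 165.0)]
from_prom1 = [(20, 100.0), (60_020, 161.0)]

# Without external labels both servers return the same label set -> one merged series.
merged = group_by_series([(series, from_prom0), (series, from_prom1)])

# With a unique external label per server the two series stay apart.
separate = group_by_series([
    (with_external_labels(series, {"replica": "prom-0"}), from_prom0),
    (with_external_labels(series, {"replica": "prom-1"}), from_prom1),
])

print(len(merged), len(separate))  # 1 2
```

In the merged case the samples from both servers end up interleaved in one series, which is the same situation as two scrapes of one target.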


ashishvaishno commented on May 23, 2024

@prymitive I am already adding a global external label promethues_replica: $(POD_NAME) in the Prometheus config, which Thanos then uses for de-duplication via --query.replica-label=prometheus_replica.


roidelapluie commented on May 23, 2024

Indeed, the samples ~20ms apart come from two different Prometheus servers. It looks like a configuration issue on the Thanos side.

In your last comment you have a typo: promethues_replica. Is it like that in your config too?
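Why the exact spelling could matter, as a deliberately simplified Python sketch: deduplication can only group replicas if the label named in --query.replica-label is actually present on the series. Here deduplicate() just drops that label and keeps the first replica's samples; the real Thanos deduplication is more involved and chooses samples across replicas over time, so treat this only as a model of the label matching. The helper names and data are hypothetical.

```python
from typing import Dict, List, Tuple

Labels = Dict[str, str]
Sample = Tuple[int, float]


def dedup_key(labels: Labels, replica_label: str) -> tuple:
    """Label set with the replica label removed (if present)."""
    return tuple(sorted((k, v) for k, v in labels.items() if k != replica_label))


def deduplicate(results: List[Tuple[Labels, List[Sample]]], replica_label: str) -> Dict[tuple, List[Sample]]:
    """Collapse replicas that differ only by replica_label; first replica wins (simplification)."""
    kept: Dict[tuple, List[Sample]] = {}
    for labels, samples in results:
        kept.setdefault(dedup_key(labels, replica_label), samples)
    return kept


def replica_results(label_name: str) -> List[Tuple[Labels, List[Sample]]]:
    """Two replicas of the same series, tagged with an external label called label_name."""
    base = {"__name__": "request_count_total", "job": "kubernetes-services-pods"}
    return [
        ({**base, label_name: "prom-0"}, [(0, 100.0), (60_000, 165.0)]),
        ({**base, label_name: "prom-1"}, [(20, 100.0), (60_020, 161.0)]),
    ]


flag = "prometheus_replica"  # value passed to --query.replica-label

typo = deduplicate(replica_results("promethues_replica"), flag)
fixed = deduplicate(replica_results("prometheus_replica"), flag)

print(len(typo), len(fixed))  # 2 1
```

In this model, when the names don't match nothing is recognised as a replica, so both copies pass through instead of being deduplicated.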
