Comments (7)
This is unlikely to be a bug in Prometheus; most likely it’s a problem on your end.
If you look at the timestamps, you’ll notice that they are duplicated: there are always two samples ~20ms apart. You might be scraping the same target twice, or two different targets end up producing identical time series.
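A quick way to confirm the duplication described above is to look for consecutive samples that sit far closer together than the scrape interval. The Python sketch below assumes you have already exported one series' sample timestamps in milliseconds. The function name `suspicious_gaps`, the 10% threshold, and the sample data are made up for illustration; only the 60s interval matches the one mentioned later in the thread.

```python
from typing import List, Tuple


def suspicious_gaps(
    timestamps_ms: List[int], scrape_interval_ms: int, fraction: float = 0.1
) -> List[Tuple[int, int]]:
    """Return consecutive sample pairs that are much closer than the scrape interval.

    `fraction` is an arbitrary threshold: gaps shorter than
    scrape_interval_ms * fraction are reported.
    """
    ordered = sorted(timestamps_ms)
    limit = scrape_interval_ms * fraction
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a < limit]


# Made-up timestamps (ms) for a series scraped every 60s where every scrape
# appears twice, about 20ms apart.
samples = [0, 20, 60_000, 60_019, 120_000, 120_021]
print(suspicious_gaps(samples, scrape_interval_ms=60_000))
# → [(0, 20), (60000, 60019), (120000, 120021)]
```

A single, correctly labelled target scraped once per interval should return an empty list here.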
When everything works smoothly you won’t notice any problems. But if either of these scrapes is delayed, you can end up with data like you see above, mostly because the timestamp of each sample is the time the scrape request started.
If one scrape starts but gets delayed on DNS or the connect attempt while the other one is fast, the slow scrape can end up with a lower timestamp but a higher value.
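To make the mechanism concrete, here is a small Python simulation assuming a hypothetical counter that grows at a constant 10 per second. As described above, each sample is stamped with the scrape start time, while its value is whatever the counter held when the target answered. The names `counter_at`, `scrape`, and `apparent_resets` are illustrative only and not part of any Prometheus API.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Sample:
    timestamp: float  # scrape start time in seconds (what gets stored)
    value: float      # counter value read from the target's response


def counter_at(t: float, per_second: float = 10.0) -> float:
    """Hypothetical counter that grows by `per_second` every second."""
    return per_second * t


def scrape(start: float, latency: float) -> Sample:
    # The sample is stamped with the time the scrape *started*, but the value
    # is whatever the counter held when the target actually answered.
    return Sample(timestamp=start, value=counter_at(start + latency))


def apparent_resets(samples: List[Sample]) -> List[Tuple[Sample, Sample]]:
    """Adjacent pairs (in timestamp order) where the counter goes down."""
    ordered = sorted(samples, key=lambda s: s.timestamp)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b.value < a.value]


# Two scrapes landing in one series ~20ms apart: the first is stuck for 500ms
# (e.g. on DNS or connect), the second answers within 10ms.
slow = scrape(start=60.00, latency=0.50)
fast = scrape(start=60.02, latency=0.01)

for earlier, later in apparent_resets([slow, fast]):
    print(f"t={earlier.timestamp:.2f}s value={earlier.value:.1f} -> "
          f"t={later.timestamp:.2f}s value={later.value:.1f}")
# prints: t=60.00s value=605.0 -> t=60.02s value=600.3
```

Because PromQL's rate() and increase() treat any decrease in a counter as a counter reset, a single out-of-order pair like this would typically show up as a spurious spike.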
from prometheus.
@prymitive Is there a way to handle this? I need two StatefulSets of Prometheus. Thanos does take care of deduplication, but this delay might be difficult to manage, right?
Handle what exactly?
In Prometheus every time series is supposed to have a unique set of labels. The automatic injection of the job and instance labels usually ensures this.
So first you need to understand why you have two scrapes that result in the same time series.
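One way to reason about this: a time series is identified by its complete label set, so any two scrape sources that emit the same labels write into the same series. The Python sketch below flags label sets with more than one writer. The source names and the `replica` label are hypothetical, and the other labels are a subset of the ones quoted later in this thread; the sketch only models series identity, not how Prometheus or Thanos actually store data.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, FrozenSet, List, Tuple

SeriesKey = FrozenSet[Tuple[str, str]]


def series_key(labels: Dict[str, str]) -> SeriesKey:
    # A series is identified by its full label set; label order does not matter.
    return frozenset(labels.items())


def colliding_series(
    writes: List[Tuple[str, Dict[str, str]]]
) -> Dict[SeriesKey, List[str]]:
    """Return every label set that more than one scrape source writes to."""
    writers: DefaultDict[SeriesKey, List[str]] = defaultdict(list)
    for source, labels in writes:
        writers[series_key(labels)].append(source)
    return {key: sources for key, sources in writers.items() if len(sources) > 1}


base = {
    "__name__": "request_count_total",
    "job": "kubernetes-services-pods",
    "instance": "172.26.19.57",
}

# Two hypothetical scrape sources hitting the same pod with identical labels:
# both write into one series, which is the situation to avoid.
dup = [("scrape-a", dict(base)), ("scrape-b", dict(base))]
print(len(colliding_series(dup)))  # 1

# A distinguishing label (here a hypothetical `replica`) keeps them apart.
ok = [("scrape-a", {**base, "replica": "0"}), ("scrape-b", {**base, "replica": "1"})]
print(len(colliding_series(ok)))  # 0
```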
@prymitive
I have different labels for the metrics. We have two Prometheus replicas, and each scrapes at a 60s interval, with the timing depending on when its StatefulSet pod started. Thanos takes care of deduplication in this situation. Now, if I understood your point correctly, there are a "few" moments in time when the scrape times of the counter are slightly off, and when the data is aggregated and queried in Thanos I get the issue.
Or do I have it wrong?
Example: these are the labels for my metric:
request_count_total{app="test", exported_id="test-594b9d94fc-kgdcg", exported_service="test", id="test-594b9d94fc-kgdcg", instance="172.26.19.57", job="kubernetes-services-pods", name="test", namespace="test", pod_template_hash="594b9d94fc", prometheus="monitoring/prometheus-stack-kube-prom-prometheus", service="test", system="INTERNET"}
On prom-0, I have this value:
On prom-1, I have this value:
On the Thanos querier:
The scrape duration on these endpoints is also less than 0.1 s.
If you use Thanos and that’s where you see this problem, then maybe Thanos is merging two counters from two different Prometheus servers into a single time series?
Try your query directly on both Prometheus servers. If that works, you need to add a unique external label on each Prometheus.
@prymitive I am already adding a global external label, promethues_replica: $(POD_NAME), in the Prometheus config, and it is then used by Thanos Query for deduplication via --query.replica-label=prometheus_replica.
Indeed, the samples 20ms apart come from two different Prometheus servers. It looks like a configuration issue on the Thanos side.
In your last comment you have a typo, promethues_replica. Is it like that in your config too?
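Since Thanos deduplicates on the label named by --query.replica-label, that name has to match the external label exactly. A trivial check like the Python sketch below can catch near-miss typos. `check_replica_label` is a hypothetical helper: it takes the external labels as an already-parsed dict rather than reading the Prometheus config itself, and the pod name value is made up.

```python
import difflib
from typing import Dict, Optional


def check_replica_label(external_labels: Dict[str, str], replica_label: str) -> Optional[str]:
    """Return an error message if `replica_label` is not an external label, else None."""
    if replica_label in external_labels:
        return None
    # Suggest the closest existing label name, which catches transposed letters.
    close = difflib.get_close_matches(replica_label, list(external_labels), n=1)
    hint = f" (did you mean {close[0]!r}?)" if close else ""
    return f"replica label {replica_label!r} not found in external labels{hint}"


# Label name as written in the comment above; the value is a made-up pod name.
external = {"promethues_replica": "prometheus-0"}
print(check_replica_label(external, "prometheus_replica"))
```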