
Prometheus Downsampler

This program collects Prometheus data for the last n minutes (default: 5 minutes), takes the average of each metric, and writes the result to a text file.

Solution

Use it together with another Prometheus instance that stores the downsampled data. In our case, we set the long-term Prometheus retention to 2 years (still under test; we hope it works as expected).
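For reference, retention on the long-term instance is controlled by the standard Prometheus flag; the value below matches our 2-year setting, and the config file path is only an example:

./prometheus --config.file=prometheus.yml --storage.tsdb.retention.time=2y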

After running this for a while, memory usage on the long-term Prometheus kept growing. This may be because metrics that have not been updated recently, but have not yet reached retention, keep their index entries in memory. We are now trying Thanos instead.

This program only writes a text file to a Kubernetes emptyDir volume. An nginx container in the same pod then exposes that output to the long-term Prometheus. The long-term Prometheus scrape job must set honor_labels: true; otherwise conflicting labels will be renamed.

[Diagram: Downsampler with 2 Prometheus]
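A sketch of such a scrape job on the long-term Prometheus, assuming the nginx sidecar serves the output file over HTTP; the job name, target address, port, and metrics_path below are placeholders, not taken from this repository:

scrape_configs:
  - job_name: 'downsampled'                # placeholder job name
    honor_labels: true                     # keep the labels written by the downsampler
    scrape_interval: 5m                    # match the downsample interval
    metrics_path: /prometheus_downsample_output.txt   # wherever nginx serves the output file
    static_configs:
      - targets: ['downsampler-nginx:80']  # hypothetical Service for the nginx sidecar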

Config

There are 4 configurable parameters. Each one can be set either with a command-line argument or with an environment variable.

  • Source Prometheus endpoint
    • Args: -s
  • Output file path
    • Default: /tmp/prometheus_downsample_output.txt
    • Args: -o
    • Environment variable: PDS_OUTPUT
  • Interval (in minutes) for collecting data from the source Prometheus
    • Default: 5m
    • Args: -i
    • Environment variable: PDS_INTERVAL
  • Maximum concurrent connections to the source Prometheus
    • Default: 50
    • Args: -c
    • Environment variable: PDS_CONCURRENT

Example: your Prometheus endpoint is http://192.168.1.20:9090 and you want to downsample data every 10 minutes:

go run prometheus-downsampler.go -s http://192.168.1.20:9090 -i 10m

or

./prometheus-downsampler -s http://192.168.1.20:9090 -i 10m
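The same 10-minute interval can also be set through the environment variable instead of -i:

PDS_INTERVAL=10m ./prometheus-downsampler -s http://192.168.1.20:9090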

How it works

  1. Call the querying-label-values API to get all metric names
  2. Call the range-queries API to fetch every metric with a 1-minute step
  3. Take the average of each metric
  4. Write all metrics in exposition format to a temp file
  5. Rename the temp file to the output file name
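Below is a minimal Go sketch of these five steps against the standard Prometheus HTTP API. It is illustrative only: the hard-coded endpoint and output path, the helper names (fetchMetricNames, fetchRange, formatLabels), and the lack of any concurrency control are simplifications and are not taken from this repository's code.

package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
	"os"
	"strconv"
	"strings"
	"time"
)

const source = "http://192.168.1.20:9090" // hard-coded only for this sketch; the real program reads args/env

// Step 1: the label values API lists all metric names.
func fetchMetricNames() ([]string, error) {
	resp, err := http.Get(source + "/api/v1/label/__name__/values")
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	var body struct{ Data []string `json:"data"` }
	err = json.NewDecoder(resp.Body).Decode(&body)
	return body.Data, err
}

// Step 2: the range query API returns every series of a metric at a 1-minute step.
type rangeResponse struct {
	Data struct {
		Result []struct {
			Metric map[string]string `json:"metric"`
			Values [][]interface{}   `json:"values"` // [[unix_ts, "value"], ...]
		} `json:"result"`
	} `json:"data"`
}

func fetchRange(metric string, start, end time.Time) (out rangeResponse, err error) {
	q := url.Values{"query": {metric}, "step": {"60s"},
		"start": {strconv.FormatInt(start.Unix(), 10)},
		"end":   {strconv.FormatInt(end.Unix(), 10)}}
	resp, err := http.Get(source + "/api/v1/query_range?" + q.Encode())
	if err != nil {
		return
	}
	defer resp.Body.Close()
	err = json.NewDecoder(resp.Body).Decode(&out)
	return
}

// formatLabels renders a label set as {k="v",...}, skipping __name__.
func formatLabels(labels map[string]string) string {
	var parts []string
	for k, v := range labels {
		if k != "__name__" {
			parts = append(parts, fmt.Sprintf("%s=%q", k, v))
		}
	}
	if len(parts) == 0 {
		return ""
	}
	return "{" + strings.Join(parts, ",") + "}"
}

func main() {
	end := time.Now()
	start := end.Add(-5 * time.Minute) // default 5-minute collection interval
	names, err := fetchMetricNames()
	if err != nil {
		panic(err)
	}
	// Step 4: write every averaged series in exposition format to a temp file first.
	tmp, err := os.CreateTemp("/tmp", "pds-*.txt")
	if err != nil {
		panic(err)
	}
	for _, name := range names {
		res, err := fetchRange(name, start, end)
		if err != nil {
			continue // the sketch simply skips metrics that fail to query
		}
		for _, s := range res.Data.Result {
			sum, n := 0.0, 0 // Step 3: average this series' samples
			for _, sample := range s.Values {
				if v, e := strconv.ParseFloat(fmt.Sprint(sample[1]), 64); e == nil {
					sum, n = sum+v, n+1
				}
			}
			if n > 0 {
				fmt.Fprintf(tmp, "%s%s %g %d\n", name, formatLabels(s.Metric), sum/float64(n), end.UnixMilli())
			}
		}
	}
	tmp.Close()
	// Step 5: rename the temp file onto the output path (atomic within one filesystem).
	if err := os.Rename(tmp.Name(), "/tmp/prometheus_downsample_output.txt"); err != nil {
		panic(err)
	}
}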

Issue

This program could collect data over a longer time range, group the samples into n-minute buckets, and take the average of each bucket. But for the reasons below, it currently only processes a single time group.

  • The exposition format documentation states that "Each line must have a unique combination of a metric name and labels. Otherwise, the ingestion behavior is undefined." It does not say whether lines that differ only in timestamp are safe.
  • We also tested exporting 1 hour of data as 12 data points for a while; the long-term Prometheus lost some of the data points.
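For instance, two exposition lines with the same metric name and label set that differ only in the millisecond timestamp fall into this undocumented case (hypothetical metric shown):

node_cpu_seconds_total{cpu="0",mode="idle"} 123.4 1600000000000
node_cpu_seconds_total{cpu="0",mode="idle"} 125.1 1600000600000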

Why not use remote_write with InfluxDB

We scrape over 650K metrics every 10 seconds (with 2 Prometheus servers for HA). We tried remote_write to InfluxDB (a single server, not the enterprise edition), but it caused very high CPU usage on InfluxDB, which stopped responding and was soon OOM-killed, and it also brought down the operational Prometheus. So we tried a different way (this project). It also needs only small modifications to the Grafana dashboards rather than rebuilding them all (we don't know why using Prometheus remote_read from InfluxDB always resulted in proxy timeouts from Grafana).
