
thanos-openshift's Introduction

Thanos - long-term storage for your Prometheus metrics on OpenShift

Thanos? Another Greek guy?

Thanos is a project that turns your Prometheus installation into a highly available metric system with unlimited storage capacity. At a very high level, it does this by deploying a sidecar next to Prometheus, which uploads the data blocks to an object store. A store component reads the blocks back and makes them accessible to a query component, which exposes the same API as Prometheus itself. Because it's the same API, this works nicely with Grafana. So without much effort, you get an essentially unlimited timeline for your dashboard graphs.

On top of these already awesome features, Thanos also provides downsampling of stored metrics, deduplication of data points, and more.

Motivation

We are mostly interested in the unlimited storage of Prometheus data. For our data science work, we need more than just a couple of days' worth of data. We might want to go back months in time. Still, we don't want to add more complexity to the tooling; we want to stay with the Prometheus query API and the PromQL we already know.

That way we can reuse our Grafana dashboards, ML containers and Jupyter notebooks.

Setup

We use the S3 endpoint provided by Ceph to store the time series database (TSDB) blocks in a bucket. We also have to deploy on top of a managed OpenShift installation, so we can't tune any network configuration or use cluster admin for our setup.

The first problem we encountered is the gossip protocol that Thanos uses to discover new nodes added to the Thanos cluster. Since we just want to store blocks from a single Prometheus instance, we're not really interested in the clustered setup. To get around this, you have to tell every Thanos component to listen on a cluster address but not use it for cluster discovery. Instead, use the --store flag to specify the nodes directly. And gossip is probably being removed from Thanos anyway, so 🤷

We're also building on top of the OKD Prometheus examples, so you'll see diffs in this post that should be easy to apply to your own setup.
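As a rough sketch of wiring the nodes explicitly, the query component could be given one --store flag per endpoint. The service names below are hypothetical and the port assumes Thanos' default gRPC port; the cluster listen address is left out because the exact flag for it depends on the Thanos version, so check `thanos query --help` for your release.

```shell
# Hypothetical service names; 10901 is assumed to be the gRPC port of each component.
SIDECAR_ADDR="prometheus-thanos-sidecar:10901"
STORE_ADDR="thanos-store:10901"

# One --store flag per endpoint the query component should talk to directly,
# instead of relying on gossip to discover them.
# The cluster listen address mentioned in the text is intentionally omitted here.
QUERY_ARGS="query --store=${SIDECAR_ADDR} --store=${STORE_ADDR}"

# Print the resulting command line rather than executing it.
echo "thanos ${QUERY_ARGS}"
```

In a deployment template, these arguments would typically end up in the container's args list.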

Just the sidecar

[Figure: sidecar]

This is the least intrusive deployment. You can add it without interfering with your Prometheus at all. One thing you might need to add to your Prometheus configuration is the external_labels section and the --storage.tsdb.{min,max}-block-duration settings. See the sidecar documentation for the reasoning. Here are the full deployment template and the diff to the original deployment template.
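To make the two Prometheus-side settings concrete, here is a minimal sketch. The label names and values are made up for illustration, and the 2h block duration follows what the Thanos sidecar documentation commonly suggests; treat both as assumptions and adjust them to your environment.

```shell
# Fragment for prometheus.yml: external labels identify this Prometheus
# instance in the uploaded blocks. Label names and values are hypothetical.
CONFIG_FRAGMENT=$(cat <<'EOF'
global:
  external_labels:
    cluster: my-cluster
    replica: prometheus-0
EOF
)

# Setting min and max block duration to the same value keeps Prometheus
# from compacting blocks locally. 2h is an assumed value here.
PROMETHEUS_FLAGS="--storage.tsdb.min-block-duration=2h --storage.tsdb.max-block-duration=2h"

printf '%s\n' "$CONFIG_FRAGMENT"
printf '%s\n' "$PROMETHEUS_FLAGS"
```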

oc process -f ./prometheus_thanos_sidecar.yaml --param THANOS_ACCESS_KEY=abc --param THANOS_SECRET_KEY=xyz | oc apply -f -

Deploying this will store the TSDB blocks in the configured S3 bucket. Now, how would you query those offloaded blocks?

Thanos Query

[Figure: query]

Now we add Thanos Query, a web and API frontend that looks like Prometheus but can query a Prometheus instance and a Thanos Store at the same time. This gives you transparent access to both the archived blocks and the real-time metrics.

oc process -f ./prometheus_thanos_full.yaml --param THANOS_ACCESS_KEY=abc --param THANOS_SECRET_KEY=xyz | oc apply -f -
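Since the query component speaks the Prometheus HTTP API, it can be queried with plain curl once it is running. The sketch below assumes a hypothetical route URL in THANOS_QUERY_URL; the function names are made up for this example.

```shell
# Hypothetical URL of the Thanos Query route; override it for your cluster.
THANOS_QUERY_URL="${THANOS_QUERY_URL:-http://thanos-query.example.com}"

# Instant query: evaluates a PromQL expression at the current time.
thanos_instant_query() {
  curl -sS -G "${THANOS_QUERY_URL}/api/v1/query" \
    --data-urlencode "query=$1"
}

# Range query: arguments are PromQL expression, start, end (Unix seconds), step.
thanos_range_query() {
  curl -sS -G "${THANOS_QUERY_URL}/api/v1/query_range" \
    --data-urlencode "query=$1" \
    --data-urlencode "start=$2" \
    --data-urlencode "end=$3" \
    --data-urlencode "step=$4"
}

# Example (not executed here; `date -d` requires GNU date):
#   thanos_range_query 'up' "$(date -d '90 days ago' +%s)" "$(date +%s)" 1h
```

The same URL can be configured as a Prometheus data source in Grafana.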

Taco Wrap Up

Start with just the sidecar deployment to begin backing up your metrics. If you don't want to fiddle with the Prometheus setup, or you don't have access to it, you can also use the Prometheus federate API and deploy another instance just for doing the backup. This is actually how we do it, because other teams run the production Prometheus.
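A federation setup of this kind could look roughly like the scrape configuration below, placed in the backup instance's prometheus.yml. The target host and the match[] selector are assumptions for illustration; a narrower selector is usually preferable to pulling every series.

```shell
# Scrape config for the backup Prometheus that pulls series from the
# production instance via /federate. Target and selector are hypothetical.
FEDERATE_SCRAPE_CONFIG=$(cat <<'EOF'
scrape_configs:
  - job_name: federate
    honor_labels: true
    metrics_path: /federate
    params:
      'match[]':
        - '{job=~".+"}'
    static_configs:
      - targets:
          - production-prometheus:9090
EOF
)

printf '%s\n' "$FEDERATE_SCRAPE_CONFIG"
```

Setting honor_labels to true keeps the labels of the federated series instead of overwriting them with those of the backup instance.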

Then let it run for a couple of days and estimate the storage requirements.
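For a first rough estimate, the observed bucket growth can be extrapolated linearly. This is only a back-of-the-envelope sketch: the function name and the numbers are invented, and it ignores effects such as compaction and downsampling.

```shell
# Linear projection: bucket size in GiB after observed_days,
# scaled to target_days. Uses integer arithmetic, so results are rounded down.
project_storage_gib() {
  used_gib=$1
  observed_days=$2
  target_days=$3
  if [ "$observed_days" -le 0 ]; then
    echo "observed_days must be positive" >&2
    return 1
  fi
  echo $(( used_gib * target_days / observed_days ))
}

# Example: 12 GiB after 3 days, projected to one year.
project_storage_gib 12 3 365   # prints 1460
```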

After this, have fun with the query and store components and enjoy looking as far back in time through your metrics as you like.
