
The Prometheus monitoring system and time series database.

Home Page: https://prometheus.io/

License: Apache License 2.0


Prometheus

Visit prometheus.io for the full documentation, examples and guides.


Prometheus, a Cloud Native Computing Foundation project, is a systems and service monitoring system. It collects metrics from configured targets at given intervals, evaluates rule expressions, displays the results, and can trigger alerts when specified conditions are observed.

The features that distinguish Prometheus from other metrics and monitoring systems are:

  • A multi-dimensional data model (time series defined by metric name and set of key/value dimensions)
  • PromQL, a powerful and flexible query language to leverage this dimensionality
  • No dependency on distributed storage; single server nodes are autonomous
  • An HTTP pull model for time series collection
  • Pushing time series is supported via an intermediary gateway for batch jobs
  • Targets are discovered via service discovery or static configuration
  • Multiple modes of graphing and dashboarding support
  • Support for hierarchical and horizontal federation
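To illustrate how the multi-dimensional data model and PromQL fit together, a query like the following (metric and label names here are hypothetical) sums per-path HTTP 5xx rates across all instances of a job:

```promql
# 5-minute error rate per path, aggregated over instances.
sum by (path) (rate(http_requests_total{status=~"5..", job="api"}[5m]))
```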

Architecture overview

Install

There are various ways of installing Prometheus.

Precompiled binaries

Precompiled binaries for released versions are available in the download section on prometheus.io. Using the latest production release binary is the recommended way of installing Prometheus. See the Installing chapter in the documentation for all the details.

Docker images

Docker images are available on Quay.io or Docker Hub.

You can launch a Prometheus container to try it out with:

docker run --name prometheus -d -p 127.0.0.1:9090:9090 prom/prometheus

Prometheus will now be reachable at http://localhost:9090/.

Building from source

To build Prometheus from source code, you need a working Go toolchain; building the React UI assets additionally requires Node.js and npm.

Start by cloning the repository:

git clone https://github.com/prometheus/prometheus.git
cd prometheus

You can use the go tool to build and install the prometheus and promtool binaries into your GOPATH:

GO111MODULE=on go install github.com/prometheus/prometheus/cmd/...
prometheus --config.file=your_config.yml

However, when using go install to build Prometheus, Prometheus will expect to be able to read its web assets from local filesystem directories under web/ui/static and web/ui/templates. In order for these assets to be found, you will have to run Prometheus from the root of the cloned repository. Note also that these directories do not include the React UI unless it has been built explicitly using make assets or make build.

An example of the above configuration file can be found here.
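The linked example is not reproduced here, but a minimal configuration that scrapes Prometheus's own metrics endpoint looks like this (the scrape interval and target are illustrative):

```yaml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
```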

You can also build using make build, which will compile in the web assets so that Prometheus can be run from anywhere:

make build
./prometheus --config.file=your_config.yml

The Makefile provides several targets:

  • build: build the prometheus and promtool binaries (includes building and compiling in web assets)
  • test: run the tests
  • test-short: run the short tests
  • format: format the source code
  • vet: check the source code for common errors
  • assets: build the React UI

Service discovery plugins

Prometheus is bundled with many service discovery plugins. When building Prometheus from source, you can edit the plugins.yml file to disable some service discoveries. The file is a YAML-formatted list of Go import paths that will be built into the Prometheus binary.
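For illustration, the entries in plugins.yml are plain Go import paths; the exact set varies by Prometheus version, and the paths below are examples:

```yaml
# Comment out or delete an entry to exclude that service discovery
# mechanism from the binary.
- github.com/prometheus/prometheus/discovery/aws
- github.com/prometheus/prometheus/discovery/kubernetes
```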

After you have changed the file, you need to run make build again.

If you are using another method to compile Prometheus, make plugins will generate the plugins file accordingly.

If you add out-of-tree plugins, which we do not endorse at the moment, additional steps might be needed to adjust the go.mod and go.sum files. As always, be extra careful when loading third party code.

Building the Docker image

The make docker target is designed for use in our CI system. You can build a Docker image locally with the following commands:

make promu
promu crossbuild -p linux/amd64
make npm_licenses
make common-docker-amd64

Using Prometheus as a Go Library

Remote Write

We are publishing our Remote Write protobuf independently at buf.build.

You can use that as a library:

go get buf.build/gen/go/prometheus/prometheus/protocolbuffers/go@latest

This is experimental.

Prometheus code base

In order to comply with go mod rules, Prometheus release numbers do not exactly match Go module releases. For the Prometheus v2.y.z releases, we publish equivalent v0.y.z tags.

Therefore, a user who wants to use Prometheus v2.35.0 as a library can do:

go get github.com/prometheus/prometheus@v0.35.0

This solution makes it clear that we might break our internal Go APIs between minor user-facing releases, as breaking changes are allowed in major version zero.

React UI Development

For more information on building, running, and developing on the React-based UI, see the React app's README.md.

More information

  • Godoc documentation is available via pkg.go.dev. Due to peculiarities of Go Modules, v2.x.y will be displayed as v0.x.y.
  • See the Community page for how to reach the Prometheus developers and users on various communication channels.

Contributing

Refer to CONTRIBUTING.md

License

Apache License 2.0, see LICENSE.


prometheus's Issues

Simple graph generation interface

For a first demo, we need a rudimentary graph generation interface. It can be a simple web form that allows the input of a metric, labels, and time range and shows a graph accordingly.

Make graphs linkable

Graphs should be linkable via their current URL from the browser address bar.

Investigate supporting arrays as label values

This would be useful for things like e.g. the list of roles attached to a host and then querying only by a single role. Not sure if supporting this is worth the time and complexity though.

Remove temporal aliasing from rate() and delta() functions.

The rate() and delta() functions should consider the actual first and last sample times within an interval versus the desired begin and end times of the interval, and compensate for any temporal aliasing that occurs when the graphing resolution is not a multiple of the recorded sample resolution.
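A toy sketch (not the actual Prometheus implementation) of the compensation being asked for: instead of dividing the value delta by the nominal query window, divide by the span actually covered by the samples.

```go
package main

import "fmt"

type sample struct {
	t float64 // timestamp in seconds
	v float64 // counter value
}

// naiveRate divides the value delta by the nominal window size,
// ignoring where the first and last samples actually fall.
func naiveRate(s []sample, window float64) float64 {
	return (s[len(s)-1].v - s[0].v) / window
}

// compensatedRate divides by the span actually covered by samples,
// avoiding the aliasing described above when the graphing resolution
// is not a multiple of the recorded sample resolution.
func compensatedRate(s []sample) float64 {
	span := s[len(s)-1].t - s[0].t
	return (s[len(s)-1].v - s[0].v) / span
}

func main() {
	// Samples arrive every 10s, but a 25s query window only spans
	// 20s of actual data.
	s := []sample{{t: 0, v: 0}, {t: 10, v: 100}, {t: 20, v: 200}}
	fmt.Println(naiveRate(s, 25))   // underestimates: 8
	fmt.Println(compensatedRate(s)) // true per-second rate: 10
}
```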

Parameterize LevelDB Storage Behaviors

  1. Synchronous I/O should be a static flag for all LevelDB persistence engines. This should default to true for the time being.
  2. LRU cache size should be a flag for each LevelDB we use. We can discuss the safe defaults.

Harden Closing Behavior

Upon close or close request, …

  1. Prometheus should go into a drain mode immediately whereby no further retrievals or queries are answered.
  2. Once in drain mode, it should flush all pending metrics for appending into the storage infrastructure.
  3. The storage infrastructure should then begin a flush procedure of its own, e.g., moving in-memory values to the on-disk LevelDB store. After this step is finished, we should be safe to shut down.

Item no. 3 is pertinent now, as it is possible, though unlikely, to introduce metric index corruption, and we have no tools to perform referential integrity checks on the LevelDB storage. Quick example:

  1. Metric and sample are requested to be appended.
  2. LevelDB storage checks indices for metric.
  3. No index element is found; it creates a preliminary one for label name and label value pairs.
  4. Finally an index is made for the entire metric.
  5. Sample is stored.

The ordering of no. 2 and no. 3 may be wrong; but although this mutation process is idempotent, the fingerprints for the new metric would never be set correctly.

Proposal:

  1. Fix the problem as I described above.
  2. Create an offline referential integrity scanner and repair utility. This would not take too long to do and would simply require the LevelDB iterator type and the model decoders.

Ruby client

We need a Ruby version of the client instrumentation library.

Implement UI for graph end-time selection

Currently we only allow choosing a range back in time from a given Unix timestamp as the graph ending time. The interface should instead have arrows that allow skipping back/forwards in time by smart units.

Remote storage

Prometheus needs to be able to interface with a remote and scalable data store for long-term storage/retrieval.

GetFingerprintsForLabelSet() does OR, not AND on labels

The expression:

targets_healthy_scrape_latency_ms{percentile="0.010000"}

Yields 12 vector elements where it should only yield one:

request_metrics_latency_equal_tallying_microseconds{instance='http://localhost:9090/metrics.json',percentile='0.010000'} => 657 @[2013-01-13 05:25:59.236345 +0100 CET]
requests_metrics_latency_equal_accumulating_microseconds{instance='http://localhost:9090/metrics.json',percentile='0.010000'} => 657 @[2013-01-13 05:25:59.236345 +0100 CET]
requests_metrics_latency_logarithmic_accumulating_microseconds{instance='http://localhost:9090/metrics.json',percentile='0.010000'} => 657 @[2013-01-13 05:25:59.236345 +0100 CET]
requests_metrics_latency_logarithmic_tallying_microseconds{instance='http://localhost:9090/metrics.json',percentile='0.010000'} => 657 @[2013-01-13 05:25:59.236345 +0100 CET]
sample_append_disk_latency_microseconds{instance='http://localhost:9090/metrics.json',percentile='0.010000'} => 725 @[2013-01-13 05:25:59.236345 +0100 CET]
targets_healthy_scrape_latency_ms{instance='http://localhost:9090/metrics.json',percentile='0.010000'} => 1.751 @[2013-01-13 05:25:59.236345 +0100 CET]
targets_healthy_scrape_latency_ms{instance='http://localhost:9090/metrics.json',percentile='0.010000'} => 1.751 @[2013-01-13 05:25:59.236345 +0100 CET]
targets_healthy_scrape_latency_ms{instance='http://localhost:9090/metrics.json',percentile='0.050000'} => 1.751 @[2013-01-13 05:25:59.236345 +0100 CET]
targets_healthy_scrape_latency_ms{instance='http://localhost:9090/metrics.json',percentile='0.500000'} => 1.751 @[2013-01-13 05:25:59.236345 +0100 CET]
targets_healthy_scrape_latency_ms{instance='http://localhost:9090/metrics.json',percentile='0.900000'} => 1.751 @[2013-01-13 05:25:59.236345 +0100 CET]
targets_healthy_scrape_latency_ms{instance='http://localhost:9090/metrics.json',percentile='0.990000'} => 1.751 @[2013-01-13 05:25:59.236345 +0100 CET]
targets_unhealthy_scrape_latency_ms{instance='http://localhost:9090/metrics.json',percentile='0.010000'} => NaN @[2013-01-13 05:25:59.236345 +0100 CET]

This is an OR of the labels, not an AND (all timeseries that match either the name of the metric OR the percentile value). The bug is in GetFingerprintsForLabelSet():

fmt.Printf("===========> %v\n", labels);
fingerprints, err := p.persistence.GetFingerprintsForLabelSet(&labels)
fmt.Printf("===========> %v\n", fingerprints);

This outputs this:

===========> map[percentile:0.010000 name:targets_healthy_scrape_latency_ms]
===========> [0xf8402fd650 0xf8402fd660 0xf8402fd680 0xf8402fd690 0xf8402fd6a0 0xf8402fd750 0xf8402fd760 0xf8402fd770 0xf8402fd780 0xf8402fd790 0xf8402fd7a0 0xf8402fd7b0]

Note the 12 fingerprints where there should only be one!

The reason is that GetFingerprintsForLabelSet() steps through all labels and fetches the matching metrics for each label, thus resulting in an OR.
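The fix amounts to intersecting the per-label fingerprint sets rather than concatenating (unioning) them. A minimal sketch with made-up fingerprint values, assuming each per-label set contains no duplicates:

```go
package main

import "fmt"

// intersect returns only the fingerprints present in every input set,
// i.e. series matching ALL labels (AND), not ANY label (OR).
// It assumes each set contains no duplicate fingerprints.
func intersect(sets ...[]uint64) []uint64 {
	if len(sets) == 0 {
		return nil
	}
	counts := map[uint64]int{}
	for _, set := range sets {
		for _, fp := range set {
			counts[fp]++
		}
	}
	var out []uint64
	for fp, n := range counts {
		if n == len(sets) { // present in every label's set
			out = append(out, fp)
		}
	}
	return out
}

func main() {
	byName := []uint64{1, 2, 3}       // series matching the metric name
	byPercentile := []uint64{3, 4, 5} // series matching percentile="0.010000"
	fmt.Println(intersect(byName, byPercentile)) // [3]
}
```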

Adjust return signature of GetBoundaryValues() metric persistence method

Current:

GetBoundaryValues(*model.LabelSet, *model.Interval, *StalenessPolicy) (*model.Sample, *model.Sample, error)

The return value is hard to use because the caller needs to manually match label sets between the first and second return values, e.g. for computing deltas. And nothing in the return types themselves ensures that the labels even match.

So I think it should be the same as the GetRangeValues() return value:

GetBoundaryValues(*model.LabelSet, *model.Interval, *StalenessPolicy) (*model.SampleSet, error)

Downside: it's not explicit from the return type that each time series contains exactly two data points, but that's probably better than introducing yet another special type to ensure it.

Expression browser

Implement an expression browser via a web form (user inputs a rule language expression and gets back the evaluated result).

Change MetricPersistence interface to query values by fingerprint instead of by metric

It might make sense to change the interface of e.g. GetValueAtTime(), GetBoundaryValues(), and GetRangeValues() to expect a fingerprint instead of a metric labelset.

The AST currently starts off knowing a labelset, then gets all fingerprints for that, then gets the metrics for those fingerprints, then gets the values for each of those metrics.

It could be just: get all fingerprints for labelset, fetch values for each fingerprint. One conversion step less.
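A rough sketch of the proposed shape, with hypothetical types standing in for the real interfaces:

```go
package main

import "fmt"

// Hypothetical stand-ins for the real model types.
type Fingerprint uint64
type LabelSet map[string]string

type persistence struct {
	fingerprints map[string][]Fingerprint // label "k=v" -> fingerprints
	values       map[Fingerprint][]float64
}

// Proposed shape: value accessors take a fingerprint directly,
// skipping the fingerprint -> metric -> values round trip.
func (p *persistence) GetRangeValues(fp Fingerprint) []float64 {
	return p.values[fp]
}

func main() {
	p := &persistence{
		fingerprints: map[string][]Fingerprint{"job=api": {7}},
		values:       map[Fingerprint][]float64{7: {1, 2, 3}},
	}
	// Get all fingerprints for the label set, then fetch values
	// for each fingerprint: one conversion step less.
	for _, fp := range p.fingerprints["job=api"] {
		fmt.Println(p.GetRangeValues(fp)) // [1 2 3]
	}
}
```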

Targetpools scrape only first target in pool

Target pools scrape only the first target that was added to the pool (at pool creation time). I haven't figured out the exact problem yet, but maybe something is wrong with the heap handling? I've verified that Add() is called correctly for all targets on the right pools, but after that, in the actual runs, p.Len() is always just 1.

I'll send a minimal config with multiple targets via mail.

Implement Links between Graph and Expression Browser Pages

In the Graph Page

  • "View this Graph in the Expression Browser"

In the Expression Browser

  • "View this Expression as a Graph" link. @juliusv can offer some insights into runtime checks to ensure that only the right kinds of expressions are allowed to be graphed.
  • Use heuristics from the AST to create node-level links for sub-expressions so that these sub-expressions can be graphed.

Data model optimization

Let's take another look at optimizing the Prometheus data models after our first experiments.

GetBoundaryValues() and GetRangeValues() should return labels in SampleSets

I've only tested GetRangeValues() so far, but maybe GetBoundaryValues() has the same behavior. The SampleSets that get returned have a nil-map as the Metric member, whereas they should probably contain the labels that the function was called with, to yield a proper timeseries.

It's not a big problem right now, because the caller has the right labels anyways and can insert them. However, I'm not sure if that's intended.

Expression evaluation code is not goroutine-safe

To reproduce, create a small program which launches concurrent requests to prometheus/api/query?json=JSON&expr=<anything>. Each request will have one of three possible outcomes:

  1. success (lucky you!)
  2. "Error parsing rules at line X, char Y: syntax error"
  3. crashing Prometheus with a slice-out-of-bounds

For the record, the panic has this stacktrace:

rules/lexer.l.go:196 (0x44c886)
rules/load.go:51 (0x44ce7e)
rules/parser.y.go:192 (0x44debf)
rules/parser.y.go:265 (0x450c0b)
rules/load.go:75 (0x44d177)
rules/load.go:116 (0x44d634)
rules/load.go:125 (0x44d746)
web/api/query.go:30 (0x5461d5)
web/api/api.go:0 (0x547565)

Implement Additional Store of Metric Counter Resets

If we had a mediator around the storage system, we could easily track counter resets with respect to metric values. This would reduce the number of range (a, b] queries in favor of just querying the endpoints.

Invalid iterator crash bug in newSeriesFrontier()

With my expressions benchmark (living in branch "julius-metrics-persistence-benchmarks"), I managed to provoke the following crash in newSeriesFrontier():

$ go run -a expressions_benchmark.go --leveldbFlushOnMutate=false -numTimeseries=10 -populateStorage=true -deleteStorage=true -evalIntervalSeconds=3600 > /tmp/foo.txt
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x18 pc=0x7f9e61c48254]

goroutine 1 [select]:
github.com/prometheus/prometheus/storage/metric.(*tieredStorage).MakeView(0xf840000e00, 0xf8445f8040, 0xf8420e0140, 0xdf8475800, 0x0, ...)
    /home/julius/gosrc/src/github.com/prometheus/prometheus/storage/metric/tiered.go:135 +0x34b
github.com/prometheus/prometheus/rules/ast.viewAdapterForRangeQuery(0xf8400bee00, 0xf8420e0f80, 0x0, 0x0, 0x0, ...)
    /home/julius/gosrc/src/github.com/prometheus/prometheus/rules/ast/query_analyzer.go:138 +0x480
github.com/prometheus/prometheus/rules/ast.EvalVectorRange(0xf844652ac0, 0xf8420e0f80, 0x0, 0x0, 0x0, ...)
    /home/julius/gosrc/src/github.com/prometheus/prometheus/rules/ast/ast.go:274 +0xff
main.doBenchmark(0x6b5ce4, 0xf800000008)
    /home/julius/gosrc/src/github.com/prometheus/prometheus/expressions_benchmark.go:113 +0x45c
main.main()
    /home/julius/gosrc/src/github.com/prometheus/prometheus/expressions_benchmark.go:153 +0x37a

goroutine 2 [syscall]:
created by runtime.main
    /home/julius/go/src/pkg/runtime/proc.c:221

goroutine 9 [syscall]:
github.com/jmhodges/levigo._Cfunc_leveldb_iter_key(0x7f9e480017e0, 0xf843f274e0)
    github.com/jmhodges/levigo/_obj/_cgo_defun.c:178 +0x2f
github.com/jmhodges/levigo.(*Iterator).Key(0xf843f274a0, 0x746920410000000f, 0x7f9e620038a0, 0x100000001)
    github.com/jmhodges/levigo/_obj/batch.cgo1.go:519 +0x44
github.com/prometheus/prometheus/storage/raw/leveldb.levigoIterator.Key(0xf843f274a0, 0xf843f27498, 0xf843f27490, 0xf84009e820, 0x0, ...)
    /home/julius/gosrc/src/github.com/prometheus/prometheus/storage/raw/leveldb/leveldb.go:121 +0xe8
github.com/prometheus/prometheus/storage/raw/leveldb.(*levigoIterator).Key(0xf84195e060, 0x0, 0x0, 0x0)
    /home/julius/gosrc/src/github.com/prometheus/prometheus/storage/raw/leveldb/batch.go:0 +0x8c
github.com/prometheus/prometheus/storage/metric.extractSampleKey(0xf844bbeb40, 0xf84195e060, 0xf844652480, 0x0, 0x0, ...)
    /home/julius/gosrc/src/github.com/prometheus/prometheus/storage/metric/leveldb.go:692 +0xa4
github.com/prometheus/prometheus/storage/metric.newSeriesFrontier(0xf8400d6000, 0xf8400d9b40, 0xf8400d6000, 0xf84195e000, 0x0, ...)
    /home/julius/gosrc/src/github.com/prometheus/prometheus/storage/metric/frontier.go:147 +0x7ee
github.com/prometheus/prometheus/storage/metric.(*tieredStorage).renderView(0xf840000e00, 0xf8445f8040, 0xf8420e0140, 0xf840f596c0)
    /home/julius/gosrc/src/github.com/prometheus/prometheus/storage/metric/tiered.go:384 +0x444
github.com/prometheus/prometheus/storage/metric.(*tieredStorage).Serve(0xf840000e00, 0x0)
    /home/julius/gosrc/src/github.com/prometheus/prometheus/storage/metric/tiered.go:181 +0x143
created by main.main
    /home/julius/gosrc/src/github.com/prometheus/prometheus/expressions_benchmark.go:139 +0x292

goroutine 10 [syscall]:
created by addtimer
    /home/julius/go/src/pkg/runtime/ztime_amd64.c:72
exit status 2

The culprit is this line, where we rewind the iterator although it is possible that it is already pointing at the first element on disk:

Please add logic to prevent this, as well as a regression test.

Support User-Provided Static Asset Serving Directory

What we have right now for dashboard generation is good for ad hoc sharing but does not support long-term persistent dashboard use cases where additional visual elements or metadata may be required.

Thus, I would like to envision a world where …

  1. A precompiled Prometheus binary could be offered to teams, possibly packaged as a self-contained archive file with all external dependencies: the binary, the compiled-in blob assets, required shared libraries and such. We're basically here with the new build system.
  2. A team can take one of these packages mentioned above and vendor it to include a set of static assets that they would like served with their Prometheus. For instance, a custom dashboard with associated templates, HTML, CSS, JS, you-name-it.

./prometheus --userAssets=/path/to/asset/root

/path/to/asset/root may contain

  1. index.html or index.html.tmpl, which is used as the root handler for http://prometheus.host/user.
  2. Go template files, which Go will evaluate and interpolate into rendered content for a list of publicly-defined (via contract) variables.

/CC: @discordianfish and @juliusv

Rule formatting tool.

Like "gofmt" for Go, we ought to have a "promfmt" for Prometheus, since we have a syntax tree. The idea is that the tool produces a uniform style that minimizes deviation and the learning curve.

Update after we have totally moved to YAML rule files: In addition to formatting the PromQL expressions, we also want to format the YAML files to have a fixed structure, while preserving comments for both PromQL expressions and the YAML file.

Incorporate Data Resampling and Destruction Policy

The datastore grows ad infinitum right now. We need a couple of capabilities:

  1. The capability of specifying a reduction policy along …
    • an interval of a given size (e.g., one hour),
    • with a reduction method (e.g., mean, median, minimum, maximum), and
    • on samples with a timestamp subject to a certain predicate condition (e.g., older than one day from now).
  2. A reduction policy should be specifiable on a …
    • global basis (e.g., a median is OK for most things), and
    • a per-metric basis (e.g., downsample input pertaining to a SLA with the most pessimistic method like a minimum or maximum).
