
prometheus-config's Introduction

Alerta Release 9.1


The Alerta monitoring tool was developed with the following aims in mind:

  • distributed and de-coupled so that it is SCALABLE
  • minimal CONFIGURATION that easily accepts alerts from any source
  • quick at-a-glance VISUALISATION with drill-down to detail

[screenshot: Alerta web UI]


Requirements

Release 9 only supports Python 3.9 or higher.

The only mandatory dependency is MongoDB or PostgreSQL. Everything else is optional.

  • Postgres version 13 or better
  • MongoDB version 6.0 or better

Installation

To install MongoDB on Debian/Ubuntu run:

$ sudo apt-get install -y mongodb-org
$ mongod

To install MongoDB on CentOS/RHEL run:

$ sudo yum install -y mongodb
$ mongod

To install the Alerta server and client run:

$ pip install alerta-server alerta
$ alertad run

To install the web console run:

$ wget https://github.com/alerta/alerta-webui/releases/latest/download/alerta-webui.tar.gz
$ tar zxvf alerta-webui.tar.gz
$ cd dist
$ python3 -m http.server 8000

>> browse to http://localhost:8000

Docker

Alerta and MongoDB can also run using Docker containers, see alerta/docker-alerta.

Configuration

To configure the alertad server, override the default settings in /etc/alertad.conf or point the ALERTA_SVR_CONF_FILE environment variable at an alternative config file:

$ ALERTA_SVR_CONF_FILE=~/.alertad.conf
$ echo "DEBUG=True" > $ALERTA_SVR_CONF_FILE

Documentation

More information on configuration and other aspects of Alerta can be found at http://docs.alerta.io

Development

To run in development mode, listening on port 5000:

$ export FLASK_APP=alerta FLASK_DEBUG=1
$ pip install -e .
$ flask run

To run in development mode, listening on port 8080, using Postgres and reporting errors to Sentry:

$ export FLASK_APP=alerta FLASK_DEBUG=1
$ export DATABASE_URL=postgres://localhost:5432/alerta5
$ export SENTRY_DSN=https://8b56098250544fb78b9578d8af2a7e13:[email protected]/153768
$ pip install -e .[postgres]
$ flask run --debugger --port 8080 --with-threads --reload

Troubleshooting

Enable debug log output by setting DEBUG=True in the API server configuration:

DEBUG=True

LOG_HANDLERS = ['console','file']
LOG_FORMAT = 'verbose'
LOG_FILE = '$HOME/alertad.log'

It can also be helpful to check the web browser developer console for JavaScript logging, network problems and API error responses.

Tests

To run all the tests, a local Postgres and a local MongoDB database must both be running. Then run:

$ TOXENV=ALL make test

To run just the Postgres or MongoDB tests, run one of:

$ TOXENV=postgres make test
$ TOXENV=mongodb make test

To run a single test, run something like:

$ TOXENV="mongodb -- tests/test_search.py::QueryParserTestCase::test_boolean_operators" make test
$ TOXENV="postgres -- tests/test_queryparser.py::PostgresQueryTestCase::test_boolean_operators" make test

Cloud Deployment

Alerta can be deployed to the cloud easily using Heroku (https://github.com/alerta/heroku-api-alerta), AWS EC2 (https://github.com/alerta/alerta-cloudformation), or Google Cloud Platform (https://github.com/alerta/gcloud-api-alerta).

License

Alerta monitoring system and console
Copyright 2012-2023 Nick Satterly

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

prometheus-config's People

Contributors

abhishekjiitr, freeseacher, mvhconsult, satterly, tdabasinskas


prometheus-config's Issues

Looking for file or path to database in prometheus where alerts are stored in json or other format that reflects in grafana

What did you do?
I am not sure whether this is the right forum to ask this question, but I am looking for a file or database path in Prometheus where alerts are stored in JSON or another format that is reflected in Grafana. I want to write a script to take some actions based on alerts/triggers. Normally alerts can be seen on the dashboard and notifications go out by email or messages. My concern is to trace the path from which the alerts are reflected in Grafana.

What did you expect to see?
I expect to find a database path or a JSON file into which alerts are populated at run time.

Environment

System information:

Linux 3.10.0-1160.53.1.el7.x86_64 x86_64

Prometheus version:

prometheus, version 2.30.0-rc.0 (branch: HEAD, revision: 05a816bfb739b3841acd82bd285bfb5dfec7bfd7)
build user: root@438506e6c112
build date: 20210909-13:31:07
go version: go1.17
platform: linux/amd64

Alertmanager version:

alertmanager, version 0.20.0 (branch: HEAD, revision: f74be0400a6243d10bb53812d6fa408ad71ff32d)
build user: root@00c3106655f8
build date: 20191211-14:13:14
go version: go1.13.5

Any help in this regard would be highly appreciated.
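
In case it helps: Prometheus and Alertmanager do not write alerts to a file you can read directly; they expose them over their HTTP APIs, which is also what Grafana queries. A script can poll those endpoints instead (host names below are placeholders):

$ curl -s http://prometheus:9090/api/v1/rules    # rule definitions and their current state
$ curl -s http://prometheus:9090/api/v1/alerts   # alerts currently pending or firing in Prometheus
$ curl -s http://alertmanager:9093/api/v2/alerts # alerts as seen by Alertmanager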

Getting - {"code":400,"errors":null,"message":"no alerts in Prometheus notification payload","status":"error"}

Hi,

I am also getting this error. I have gone through many of the articles available on Google and nothing helped.
Can you please advise if I am doing anything wrong? All three components are configured using Docker containers.

I have integrated Prometheus and Alertmanager with Alerta. Below is my prometheus.yml file:

# my global config

global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - ip:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
    - alert.rules.yml
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ['ip:9090']
  - job_name: 'node_exporter'
    static_configs:
      - targets: ['ip:9100']
  - job_name: 'nginx_exporter'
    static_configs:
      - targets: ['ip:9113']
  - job_name: 'alert_manager'
    static_configs:
      - targets: ['ip:9093']
  - job_name: 's3-exporter'
    static_configs:
      - targets: ['ip:9340']
  - job_name: 'grafana'
    static_configs:
      - targets: ['ip:3000']
  - job_name: 'container_exporter'
    static_configs:
      - targets: ['ip:9104']
  - job_name: 'alerta'
    metrics_path: /api/management/metrics
    static_configs:
    - targets: ['ip:80']
    basic_auth:
      username: xyz
      password: xyz

And also find below alertmanager.yml file

global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.gmail.com:587'
  smtp_from: 'xyz'
  smtp_auth_username: '[email protected]'
  smtp_auth_password: 'xyz'
route:
        group_by: ['alertname']
        group_wait: 10s
        group_interval: 10s
        repeat_interval: 1h
        receiver: 'suraj'
        routes:
        - match_re:
            env: 'production'
          receiver: 'alerta'
receivers:
- name: 'suraj'
  email_configs:
  - to: 'email id'
    #inhibit_rules:
    #- source_match:
    # severity: 'critical'
    #target_match:
    #  severity: 'warning'
    #equal: ['alertname', 'dev', 'instance']
- name: 'alerta'
  webhook_configs:
  - url: 'http://ip/api/webhooks/prometheus'
    send_resolved: true
    http_config:
      basic_auth:
        username: xyz
        password: xyz

My Alertmanager is showing alerts on its console, and the Prometheus console shows alerts too, but the Alerta console is not showing them.
Below is the error I am getting when accessing http://IP:80/api/webhooks/prometheus:

{"code":400,"errors":null,"message":"no alerts in Prometheus notification payload","status":"error"}

"customer" variable need to be an annotation, not a label

Issue Summary
Hi there!
I was trying to deploy Alerta and lost some time on the customer field.
As the documentation says here, the customer field needs to be in the labels section, which seems logical, but it was not working.
To get the customer field populated, I had to add it as an annotation.

Environment

  • OS: linux

  • API version: 8.1.0

  • Deployment: Helmchart (Kubernetes)

  • Database: Postgres

  • Server config:
    Auth enabled? Yes
    Auth provider? Basic
    Customer views? No

alerta@alert-poc-alerta-f579f95f-mchv7:/$ alerta version
alerta 8.1.0
alerta client 8.0.0
requests 2.25.0
click 7.1.2

To Reproduce
Steps to reproduce the behavior:

  1. Deploy a PrometheusRule with the customer field in the labels section
  2. Nothing will be displayed in the customer field
  3. Add it to annotations and the field will be populated correctly

Expected behavior
The customer field should work when set in the labels section, not only in the annotations section.
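
For reference, a sketch of what the reporter describes as working, with placeholder names; the customer value is set under annotations rather than labels:

groups:
- name: example.rules
  rules:
  - alert: ExampleAlert
    expr: up == 0
    labels:
      severity: critical
    annotations:
      customer: 'AcmeCorp'   # placing this under annotations is what worked here
      description: 'Service {{ $labels.instance }} is unavailable.'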

Typos in readme.md

There are several typos in the readme.md file. Will post a fix shortly.

Alertmanager Cluster

Hi, I really like Alerta so far. It does what Alertmanager and Unsee are missing. However, I can't seem to find the best way to configure Alerta with an Alertmanager cluster.

Is there a best practice for this or is it not recommended?

Prometheus alert notification to OMNIBUS

Hi All,

I am new to Prometheus.
I have configured OMNIBUS with a message probe in a standalone RHEL VM.
My Prometheus is running as a Docker container.
I want to forward Prometheus alerts to IBM OMNIBUS.
Please let me know the possibilities and share any documentation.

Regards,
Venkatraj

No data available in alerta web

Issue Summary
I just tried to run docker-compose up -d for this repo. I saw alerts in Prometheus and Alertmanager; however, there is no data in Alerta. I also tried to log in using [email protected] and password alerta, and that failed too.

Environment

  • OS: Mac
  • API version: whatever in docker-compose.yaml

Alerta in HA with Alertmanager in HA

Hi, I'm trying to configure Alerta in HA to receive alerts from an Alertmanager cluster. I'm not sure what to put in the Alertmanager configuration for the url value in webhook_configs. If I use only one Alerta host, how will that alert appear on the second one?

Thanks,
Bruno
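
One common approach (not official guidance) is to keep both Alerta API instances on the same database and point every Alertmanager replica at a single shared or load-balanced URL, for example:

receivers:
- name: 'alerta'
  webhook_configs:
  - url: 'http://alerta.example.internal/api/webhooks/prometheus'   # hypothetical load-balanced endpoint in front of both Alerta hosts
    send_resolved: true

Because the Alerta API servers are stateless, an alert received by either host shows up on both as long as they share the same Postgres or MongoDB database.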

Multiple sources support (Alertmanager webhooks)

Hi,

Before I do any PoC testing, I am wondering whether it would be possible to integrate multiple Alertmanager webhooks pointing at Alerta.

Would Alerta be able to manage multiple Prometheus (Alertmanager) sources?

Regards,
Wojtek

How to convert an Alerta list-type field into a Prometheus rule configuration

How can an Alerta list-type field be expressed in a Prometheus rule configuration?
Example:
Alerta data:
"correlate": ["cpu","load"],

Prometheus rules:
labels:
  correlate: ['cpu','load']
or
labels:
  correlate:
  - cpu
  - load

Both fail:
./promtool check rules Prometheus.rules
Checking Prometheus.rules
FAILED:
yaml: unmarshal errors:
line : cannot unmarshal !!seq into string
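
Prometheus label values must be plain strings, which is why promtool rejects the list form (!!seq). One workaround that at least passes promtool is to encode the list as a single string, for example comma-separated; whether the Alerta Prometheus webhook (or a custom plugin) then splits it back into a list is a separate question:

labels:
  correlate: 'cpu,load'   # a single string value; splitting it into a list is left to the receiving side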

Alertmanager (v0.19.0) can't send alerts. The response is always HTTP 400

Issue Summary
Alertmanager can't send alerts. The response is always HTTP 400.

Environment

  • AlertManager: v0.19.0

  • Pushgateway: v1.0.0

  • Prometheus: v2.14.0

  • Alerta: 7.4.1

  • Deployment: Docker and docker-compose

  • Database: Postgresql

  • Server config:
    Auth enabled? No
    Auth provider? None
    Customer views? No

  • web UI version: 7.4.1

  • CLI version: 7.4.1

To Reproduce

Additional context
Checking the Prometheus, Alertmanager and Pushgateway UIs, everything looks fine.

The AlertManager's log show: component=dispatcher msg="Notify for alerts failed" num_alerts=1 err="cancelling notify retry for \"webhook\" due to unrecoverable error: unexpected status code 400 from http://172.18.0.1:8080/api/webhooks/prometheus"

I even saved the posted data using Beeceptor and tried to reproduce the operation using curl, and got the same error: no alerts in Prometheus notification payload

Alerta docker image

Thanks for building the latest image.
I am running Prometheus, MongoDB and Alerta as Docker containers and I can see the error below:

2016-02-25 07:45:59,381 DEBG 'nginx' stdout output:
2016/02/25 07:45:59 [error] 23#0: *1 open() "/app/webhooks/prometheus" failed (2: No such file or directory), client: 172.17.42.1, server: , request: "POST /webhooks/prometheus HTTP/1.1", host: ":"

client: 172.17.42.1 is the gateway (when I do docker inspect)

Please let me know if I am missing something
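
For what it's worth, the nginx error shows the POST going to the static web root rather than the API. In more recent combined images the webhook sits under the /api prefix, so the Alertmanager receiver would look something like this (host and port are placeholders):

webhook_configs:
  - url: 'http://<alerta-host>:<port>/api/webhooks/prometheus'   # note the /api prefix
    send_resolved: true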

Update Alerta docker image

Hi,
I am using the latest alerta/alerta-web Docker image; please let me know if I can integrate Prometheus with Alerta using this image.
When I check /api/ I can't find the Prometheus webhook, probably because it's a new development.

README is out-of-date

  • latest version of prometheus rule file is YAML
  • prometheus uses double dashes for command-line options now
  • include BasicAuth in the webhook config example (see the sketch after this list)
  • value is populated from annotations, not labels
  • docker-compose is out-of-date (remove it and make reference to Docker repo or fix it)
  • update copyright year
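
A rough sketch of what the updated examples could look like, based on the points above (names and credentials are placeholders, not the final README wording):

# YAML rule format, with value coming from annotations
groups:
- name: example.rules
  rules:
  - alert: service_down
    expr: up == 0
    labels:
      severity: major
    annotations:
      value: 'DOWN ({{ $value }})'
      description: 'Service {{ $labels.instance }} is unavailable.'

# Alertmanager webhook config with BasicAuth
receivers:
- name: 'alerta'
  webhook_configs:
  - url: 'http://alerta:8080/api/webhooks/prometheus'
    send_resolved: true
    http_config:
      basic_auth:
        username: admin@example.com
        password: alerta-password-or-api-key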

Prometheus Webhook error with AlertManager 0.8.0 version

The error below appears while trying to integrate Alertmanager 0.8.0 with the Alerta Prometheus webhook config. The error does not appear with Alertmanager 0.7.1.

127.0.0.1 - - [22/Sep/2017 05:00:35] "POST /webhooks/prometheus HTTP/1.1" 500 -
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/flask/app.py", line 1997, in __call__
    return self.wsgi_app(environ, start_response)
  File "/usr/lib/python2.7/site-packages/flask/app.py", line 1985, in wsgi_app
    response = self.handle_exception(e)
  File "/usr/lib/python2.7/site-packages/flask_cors/extension.py", line 161, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
  File "/usr/lib/python2.7/site-packages/flask/app.py", line 1540, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/lib/python2.7/site-packages/flask/app.py", line 1982, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/lib/python2.7/site-packages/flask/app.py", line 1614, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/lib/python2.7/site-packages/flask_cors/extension.py", line 161, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
  File "/usr/lib/python2.7/site-packages/flask/app.py", line 1517, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/lib/python2.7/site-packages/flask/app.py", line 1612, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/lib/python2.7/site-packages/flask/app.py", line 1598, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/usr/lib/python2.7/site-packages/flask_cors/decorator.py", line 128, in wrapped_function
    resp = make_response(f(*args, **kwargs))
  File "/usr/lib/python2.7/site-packages/alerta_server-4.10.0-py2.7.egg/alerta/app/auth.py", line 157, in wrapped
    return f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/alerta_server-4.10.0-py2.7.egg/alerta/app/webhooks/views.py", line 382, in prometheus
    incomingAlert = parse_prometheus(alert, external_url)
  File "/usr/lib/python2.7/site-packages/alerta_server-4.10.0-py2.7.egg/alerta/app/webhooks/views.py", line 339, in parse_prometheus
    text = description or summary or '%s: %s on %s' % (labels['job'], labels['alertname'], labels['instance'])
KeyError: 'job'
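
Until the webhook stops assuming a job label (the KeyError above is raised when an alert arrives without one), one possible workaround on the Prometheus side is to make sure every alert carries a job label, either by not aggregating it away in the expression or by setting one statically in the rule. A sketch with a hypothetical metric:

- alert: ExampleAggregateAlert
  expr: sum(rate(http_requests_total[5m])) > 100   # sum() without by(job) drops the job label...
  labels:
    job: aggregate                                  # ...so set one explicitly to keep the webhook parser happy
    severity: warning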

Cannot see critical alerts in the Alerta dashboard.

Issue Summary
We are running Alerta using our docker-compose stack, which consists of Prometheus, Alertmanager, Grafana and other services deployed on an Ubuntu machine. We would like to correlate the alerts and group them to reduce the noise.

The following are the issues I am currently facing.

  1. If any alert is configured with the label severity: critical, I don't see that alert in the Alerta dashboard.
  2. Sometimes, even though the nodes are up, Alerta shows the service_down alert in an unavailable state.
  3. As per my understanding, Alerta is a visualisation tool but doesn't have the capability to correlate the alerts that Alertmanager sends (no correlated alerts are sent by email using Alerta).

Environment

  • OS: Ubuntu

  • API version: kubernetes API

  • Deployment: docker compose

  • Database: mongo

  • Server config:
    Auth enabled? Yes
    Auth provider? No
    Customer views? Yes

To Reproduce
Steps to reproduce the behavior:

  1. Run docker-compose up -d
  2. This will create the alertmanager, prometheus, grafana, alerta, mongo and other containers.
  3. Log in to the Alerta console and look at the dashboard.
  4. I can see all the alerts when Alerta is first created, but after changing alert rules (for example changing the severity from critical to warning, or warning to minor) I sometimes don't see the alerts in Alerta even though I can see the alert firing in Prometheus.

Screenshots
[two screenshots attached to the original issue]

docker compose:

prometheus:
  image: prom/prometheus
  container_name: prometheus
  volumes:
    - ./prometheus/:/etc/prometheus/
    - prometheus_data:/prometheus
  command:
    - '--config.file=/etc/prometheus/prometheus.yml'
    - '--storage.tsdb.path=/prometheus'
    - '--web.console.libraries=/etc/prometheus/console_libraries'
    - '--web.console.templates=/etc/prometheus/consoles'
    - '--storage.tsdb.retention.time=12h'
    - '--storage.tsdb.retention.size=1024MB'
    - '--web.enable-lifecycle'
  restart: unless-stopped
  expose:
    - 9090
  networks:
    - monitor-net
  labels:
    org.label-schema.group: "monitoring"

alertmanager:
  image: prom/alertmanager
  container_name: alertmanager
  volumes:
    - ./alertmanager/:/etc/alertmanager/
  command:
    - '--config.file=/etc/alertmanager/config.yml'
    - '--storage.path=/alertmanager'
  restart: unless-stopped
  expose:
    - 9093
  networks:
    - monitor-net
  labels:
    org.label-schema.group: "monitoring"

nodeexporter:
  image: prom/node-exporter
  container_name: nodeexporter
  user: root
  privileged: true
  volumes:
    - /proc:/host/proc:ro
    - /sys:/host/sys:ro
    - /:/rootfs:ro
  command:
    - '--path.procfs=/host/proc'
    - '--path.rootfs=/rootfs'
    - '--path.sysfs=/host/sys'
    - '--collector.filesystem.ignored-mount-points=^/(sys|proc|dev|host|etc)($$|/)'
  restart: unless-stopped
  expose:
    - 9100
  networks:
    - monitor-net
  labels:
    org.label-schema.group: "monitoring"

cadvisor:
  image: google/cadvisor
  container_name: cadvisor
  volumes:
    - /:/rootfs:ro
    - /var/run:/var/run:rw
    - /sys:/sys:ro
    - /var/lib/docker/:/var/lib/docker:ro
    #- /cgroup:/cgroup:ro # doesn't work on MacOS, only for Linux
  restart: unless-stopped
  expose:
    - 8080
  networks:
    - monitor-net
  labels:
    org.label-schema.group: "monitoring"

grafana:
  image: grafana/grafana
  container_name: grafana
  volumes:
    - grafana_data:/var/lib/grafana
    - ./grafana/datasources:/etc/grafana/datasources
    - ./grafana/dashboards:/etc/grafana/dashboards
    - ./grafana/setup.sh:/setup.sh
  entrypoint: /setup.sh
  environment:
    - ADMIN_USER=${ADMIN_USER:-admin}
    - ADMIN_PASSWORD=${ADMIN_PASSWORD:-admin}
    - GF_USERS_ALLOW_SIGN_UP=false
  restart: unless-stopped
  expose:
    - 3000
  networks:
    - monitor-net
  labels:
    org.label-schema.group: "monitoring"

alerta:
  image: alerta/alerta-web:latest
  ports:
    - 9080:8080
  depends_on:
    - db
  environment:
    - DEBUG=1 # remove this line to turn DEBUG off
    - DATABASE_URL=mongodb://db:27017/monitoring
    # - AUTH_REQUIRED=True
    - ADMIN_USERS=[email protected]
    - PLUGINS=remote_ip,reject,heartbeat,blackout,prometheus
    - ALERTMANAGER_API_URL=http://alertmanager:9093
  restart: always
  networks:
    - monitor-net

db:
  image: mongo
  volumes:
    - ./data/mongodb:/data/db
  restart: always
  networks:
    - monitor-net

Alertmanager config:

global:
  resolve_timeout: 5m
  smtp_smarthost: 'host:port'
  smtp_from: '[email protected]'
  smtp_auth_username: 'XXXXXXXXXXXXXXXXXX'
  smtp_auth_password: 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXX'

templates:
  - '/etc/alertmanager/template/*.tmpl'

route:
  receiver: alerta
  group_by:
    - alertname
    - cluster
    - service
  routes:
    - receiver: iot-ops
      match:
        severity: critical
    - receiver: default-receiver
      match:
        severity: warning

inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    # Apply inhibition if the alertname is the same.
    equal: ['alertname', 'cluster', 'service', 'de_duplicate']

receivers:

Prometheus config:

# prometheus global config
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  # scrape_timeout is set to the global default (10s).
  external_labels:
    environment: Production
    service: Prometheus

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093

rule_files:
  - "*.rules"

scrape_configs:
  # metrics_path defaults to '/metrics'
  # scheme defaults to 'http'.
  - job_name: prometheus
    scrape_interval: 5s
    scrape_timeout: 2s
    honor_labels: true
    static_configs:
      - targets: ['prometheus:9090']

  - job_name: cadvisor
    scrape_interval: 5s
    scrape_timeout: 2s
    honor_labels: true
    static_configs:
      - targets: ['cadvisor:8080']

  - job_name: node-exporter
    scrape_interval: 5s
    scrape_timeout: 2s
    honor_labels: true
    static_configs:
      - targets: ['nodeexporter:9100']

  - job_name: de-duplicate-node
    scrape_interval: 5s
    scrape_timeout: 2s
    honor_labels: true
    static_configs:
      - targets: ['nodeexporter:9101']

  - job_name: collectd
    static_configs:
      - targets: ['collectd:9103']

  # START-EDGECLIFFE85QOGFCTBMZRHOV****************************

  # metrics for kubernetes scheduler and controller
  - job_name: 'sample-k8s-job-name-scheduler-and-controller'
    scrape_interval: 5s
    static_configs:
      - targets: ['192.168.0.30:10251']
        labels:
          customer: 'EDGECLIFFE85QOGFCTBMZRHOV'

  # metrics from node exporter
  - job_name: 'sample-k8s-job-name-nodes-exporter'
    scrape_interval: 5s
    static_configs:
      - targets: ['192.168.0.30:9100']
        labels:
          customer: 'EDGECLIFFE85QOGFCTBMZRHOV'
      - targets: ['192.168.0.31:9100']
        labels:
          customer: 'EDGECLIFFE85QOGFCTBMZRHOV'

  # metrics from cadvisor
  - job_name: 'sample-k8s-job-name-cadvisor'
    scrape_interval: 10s
    metrics_path: "/metrics/cadvisor"
    static_configs:
      - targets: ['192.168.0.30:10255']
        labels:
          customer: 'EDGECLIFFE85QOGFCTBMZRHOV'

  # metrics for default/kubernetes api's from the kubernetes master
  - job_name: 'sample-k8s-job-name-apiservers'
    kubernetes_sd_configs:
      - role: endpoints
        api_server: https://192.168.0.30
        tls_config:
          insecure_skip_verify: true
        basic_auth:
          username: random
          password: jkaskhjn1267nbamhkjhadnpoiqwoioo
    scheme: https
    tls_config:
      insecure_skip_verify: true
    basic_auth:
      username: random
      password: jkaskhjn1267nbamhkjhadnpoiqwoioo
    relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: default;kubernetes;https

  # metrics for the kubernetes node kubelet service (collection proxied through master)
  - job_name: 'sample-k8s-job-name-kubernetes-nodes'
    kubernetes_sd_configs:
      - role: node
        api_server: https://192.168.0.30
        tls_config:
          insecure_skip_verify: true
        basic_auth:
          username: random
          password: jkaskhjn1267nbamhkjhadnpoiqwoioo
    scheme: https
    tls_config:
      insecure_skip_verify: true
    basic_auth:
      username: random
      password: jkaskhjn1267nbamhkjhadnpoiqwoioo
    relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: 192.168.0.30:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor

  # metrics from service endpoints on /metrics over https via the master proxy
  # set annotation (prometheus.io/scrape: true) to enable
  # Example: kubectl annotate svc myservice prometheus.io/scrape=true
  - job_name: 'sample-k8s-job-name-service-endpoints'
    kubernetes_sd_configs:
      - role: endpoints
        api_server: https://192.168.0.30
        tls_config:
          insecure_skip_verify: true
        basic_auth:
          username: random
          password: jkaskhjn1267nbamhkjhadnpoiqwoioo
    scheme: https
    tls_config:
      insecure_skip_verify: true
    basic_auth:
      username: random
      password: jkaskhjn1267nbamhkjhadnpoiqwoioo
    relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: (\d+)
        target_label: __meta_kubernetes_pod_container_port_number
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        regex: ()
        target_label: __meta_kubernetes_service_annotation_prometheus_io_path
        replacement: /metrics
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_pod_container_port_number, __meta_kubernetes_service_annotation_prometheus_io_path]
        target_label: __metrics_path__
        regex: (.+);(.+);(.+);(.+)
        replacement: /api/v1/namespaces/$1/services/$2:$3/proxy$4
      - target_label: __address__
        replacement: 192.168.0.30:443
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: 'EDGECLIFFE85QOGFCTBMZRHOV'
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: kubernetes_name
      - source_labels: [__meta_kubernetes_pod_node_name]
        action: replace
        target_label: instance

  # metrics from pod endpoints on /metrics over https via the master proxy
  # set annotation (prometheus.io/scrape: true) to enable
  # Example: kubectl annotate pod mypod prometheus.io/scrape=true
  - job_name: 'sample-k8s-job-name-pods'
    kubernetes_sd_configs:
      - role: pod
        api_server: https://192.168.0.30
        tls_config:
          insecure_skip_verify: true
        basic_auth:
          username: random
          password: jkaskhjn1267nbamhkjhadnpoiqwoioo
    scheme: http
    tls_config:
      insecure_skip_verify: true
    basic_auth:
      username: random
      password: jkaskhjn1267nbamhkjhadnpoiqwoioo
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: (.+):(?:\d+);(\d+)
        replacement: 192.168.0.30:10255
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - source_labels: [__meta_kubernetes_pod_namespace]
        action: replace
        target_label: 'EDGECLIFFE85QOGFCTBMZRHOV'
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod_name

  # END-EDGECLIFFE85QOGFCTBMZRHOV***************************

alert.rules rule group (all rules evaluating OK):

alert: Heartbeat
expr: vector(1)
labels:
  severity: informational

alert: service_down
expr: up == 0
labels:
  alertname: de_duplicate
  severity: critical
annotations:
  description: Service {{ $labels.instance }} is unavailable.
  runbook: http://wiki.alerta.io/RunBook/{app}/Event/{alertname}
  value: DOWN ({{ $value }})

alert: service_up
expr: up == 1
labels:
  service: Platform
  severity: normal
annotations:
  description: Service {{ $labels.instance }} is available.
  value: UP ({{ $value }})

alert: high_load
expr: node_load1 > 0.5
labels:
  severity: major
annotations:
  description: '{{ $labels.instance }} of job {{ $labels.job }} is under high load.'
  summary: Instance {{ $labels.instance }} under high load
  value: '{{ $value }}'

alert: disk_space
expr: (node_filesystem_size_bytes - node_filesystem_free_bytes) * 100 / node_filesystem_size_bytes > 5
labels:
  alertname: de_duplicate
  instance: '{{ $labels.instance }}:{{ $labels.mountpoint }}'
  severity: critical
annotations:
  value: '{{ humanize $value }}%'

alert: disk_util
expr: (node_filesystem_size_bytes - node_filesystem_free_bytes) * 100 / node_filesystem_size_bytes > 5
labels:
  alertname: de_duplicate
  event: '{alertname}:{{ $labels.mountpoint }}'
  instance: '{{ $labels.instance }}'
  severity: warning
annotations:
  value: '{{ humanize $value }}%'

PrometheusDiskSpaceIsLow rule group (all rules evaluating OK):

alert: LowDiskSpace-NXAWSPU2EDGETRIL
expr: ((node_filesystem_size_bytes{job="node-exporter"} - node_filesystem_free_bytes{job="node-exporter"}) / node_filesystem_size_bytes{job="node-exporter"} * 100 > 4)
for: 1m
labels:
  alertname: de_duplicate
  severity: critical
annotations:
  description: Low Disk Space {{ $labels.instance }} .
  summary: Monitor service non-operational

alert: api_requests_high
expr: rate(alerta_alerts_queries_count{instance="alerta:8080",job="alerta"}[5m]) > 5
labels:
  service: Alerta,Platform
  severity: major
annotations:
  description: API request rate of {{ $value | printf "%.1f" }} req/s is high (threshold 5 req/s)
  summary: API request rate high
  value: '{{ humanize $value }} req/s'

kube-state-metrics.rules rule group (all rules evaluating OK):

alert: DeploymentGenerationMismatch-NXAWSPU2EDGETRIL
expr: kube_deployment_status_observed_generation != kube_deployment_metadata_generation
for: 10m
labels:
  severity: warning
annotations:
  description: Observed deployment generation does not match expected one for deployment {{$labels.namespaces}}/{{$labels.deployment}}
  runbook: https://confluence.eng.example.com:8443/display/XRE/KubeDeploymentGenerationMismatch
  summary: Deployment is outdated

alert: DeploymentReplicasNotUpdated-NXAWSPU2EDGETRIL
expr: ((kube_deployment_status_replicas_updated != kube_deployment_spec_replicas) or (kube_deployment_status_replicas_available != kube_deployment_spec_replicas)) unless (kube_deployment_spec_paused == 1)
for: 10m
labels:
  severity: warning
annotations:
  description: Replicas are not updated and available for deployment {{$labels.namespaces}}/{{$labels.deployment}}
  runbook: https://confluence.eng.example.com:8443/display/XRE/K8SDeploymentReplicasMismatch
  summary: Deployment replicas are outdated

alert: DaemonSetRolloutStuck-NXAWSPU2EDGETRIL
expr: kube_daemonset_status_number_ready / kube_daemonset_status_desired_number_scheduled * 100 < 100
for: 10m
labels:
  severity: warning
annotations:
  description: Only {{$value}}% of desired pods scheduled and ready for daemon set {{$labels.namespaces}}/{{$labels.daemonset}}
  runbook: https://confluence.eng.example.com:8443/display/XRE/K8SDaemonSetRolloutStuck
  summary: DaemonSet is missing pods

alert: K8SDaemonSetsNotScheduled-NXAWSPU2EDGETRIL
expr: kube_daemonset_status_desired_number_scheduled - kube_daemonset_status_current_number_scheduled > 0
for: 10m
labels:
  severity: warning
annotations:
  description: A number of daemonsets are not scheduled.
  runbook: https://confluence.eng.example.com:8443/display/XRE/K8sDaemonSetNotScheduled
  summary: Daemonsets are not scheduled correctly
  suppressed: "No"

alert: DaemonSetsMissScheduled-NXAWSPU2EDGETRIL
expr: kube_daemonset_status_number_misscheduled > 0
for: 10m
labels:
  severity: warning
annotations:
  description: A number of daemonsets are running where they are not supposed to run.
  runbook: https://confluence.eng.example.com:8443/display/XRE/K8sDaemonSetsMissScheduled
  summary: Daemonsets are not scheduled correctly

alert: PodFrequentlyRestarting-NXAWSPU2EDGETRIL
expr: increase(kube_pod_container_status_restarts_total[1h]) > 5
for: 10m
labels:
  severity: critical
annotations:
  description: Pod {{$labels.namespaces}}/{{$labels.pod}} was restarted {{$value}} times within the last hour
  summary: Pod is restarting frequently

alert: K8SNodeOutOfDisk-NXAWSPU2EDGETRIL
expr: kube_node_status_condition{condition="OutOfDisk",status="true"} == 1
labels:
  service: k8s
  severity: critical
annotations:
  description: '{{ $labels.node }} has run out of disk space.'
  runbook: https://confluence.eng.example.com:8443/display/XRE/K8SNodeOutOfDisk
  summary: Node ran out of disk space.

Permission or scope for user

As per my observation, the default user permission is read & write.

[screenshot attached to the original issue]

A user with write permission can delete any alert, which I do not want. I want a separate permission or scope so that, apart from delete, the user can do everything else.

I am not sure whether that is available; if it is, can anyone please guide me?

Can't access /management/metrics with Alerta Docker

Issue Summary
Reading the documentation, it's indicated that Alerta metrics are exposed on the /management/metrics route so that they can be collected with Prometheus.

When I try to access this route, I automatically get redirected to the /alerts route, so I can't access the metrics.
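
In case it helps: the redirect usually means the request hit the web UI rather than the API. The API serves the metrics under the /api prefix (the same /api/management/metrics path that the prometheus.yml example earlier on this page scrapes), and with auth enabled it needs an API key, e.g.:

$ curl -H 'Authorization: Key <api-key>' https://my-alerta.example/api/management/metrics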

Environment

  • OS: CentOS 7

  • API version: 8.7.0

  • Deployment: Docker

  • For self-hosted, WSGI environment: [eg. nginx/uwsgi, apache/mod_wsgi]

  • Database: Postgres

  • Server config:
    Auth enabled? Yes
    Auth provider? OpenID but disabled for testing metrics
    Customer views? Yes

  • web UI version: 8.7.0

To Reproduce
Steps to reproduce the behavior:

  1. Log into Alerta
  2. Access the URL https://my-alerta.example/management/metrics

config:

{
  "actions": [], 
  "alarm_model": {
    "colors": {
      "severity": {
        "critical": "#91243E", 
        "emergency": "#91243E", 
        "error": "#67ACE1", 
        "fatal": "#91243E", 
        "indeterminate": "#A3C722", 
        "informational": "#A3C722", 
        "major": "#D43F3A", 
        "minor": "#F18C43", 
        "ok": "#A3C722", 
        "trace": "#67ACE1", 
        "unknown": "#67ACE1", 
        "warning": "#F8C851"
      }
    }, 
    "defaults": {
      "normal_severity": "ok", 
      "previous_severity": "indeterminate", 
      "status": "open"
    }, 
    "name": "Alerta 8.7.0", 
    "severity": {
      "critical": 1, 
      "emergency": 0, 
      "error": 9, 
      "fatal": 0, 
      "indeterminate": 5, 
      "informational": 5, 
      "major": 2, 
      "minor": 3, 
      "ok": 5, 
      "trace": 9, 
      "unknown": 9, 
      "warning": 4
    }, 
    "status": {
      "ack": "C", 
      "assign": "B", 
      "blackout": "E", 
      "closed": "F", 
      "expired": "G", 
      "open": "A", 
      "shelved": "D", 
      "unknown": "H"
    }
  }, 
  "allow_readonly": false, 
  "audio": {
    "new": null
  }, 
  "auth_required": true, 
  "aws_region": "us-east-1", 
  "azure_tenant": "common", 
  "blackouts": {
    "duration": 3600
  }, 
  "client_id": "", 
  "cognito_domain": null, 
  "colors": {
    "severity": {
      "critical": "#91243E", 
      "emergency": "#91243E", 
      "error": "#67ACE1", 
      "fatal": "#91243E", 
      "indeterminate": "#A3C722", 
      "informational": "#A3C722", 
      "major": "#D43F3A", 
      "minor": "#F18C43", 
      "ok": "#A3C722", 
      "trace": "#67ACE1", 
      "unknown": "#67ACE1", 
      "warning": "#F8C851"
    }
  }, 
  "columns": [
    "severity", 
    "status", 
    "lastReceiveTime", 
    "duplicateCount", 
    "customer", 
    "environment", 
    "service", 
    "resource", 
    "event", 
    "value", 
    "text"
  ], 
  "customer_views": true, 
  "dates": {
    "longDate": "ddd D MMM, YYYY HH:mm:ss.SSS Z", 
    "mediumDate": "ddd D MMM HH:mm", 
    "shortTime": "HH:mm"
  }, 
  "debug": true, 
  "email_verification": false, 
  "endpoint": "******://******/api", 
  "environments": [
    "Production", 
    "Development", 
    "Code", 
    "Test", 
    "Validation"
  ], 
  "filter": {
    "status": [
      "open", 
      "ack"
    ]
  }, 
  "font": {
    "font-family": "\"Sintony\", Arial, sans-serif", 
    "font-size": "13px", 
    "font-weight": 500
  }, 
  "github_url": "https://github.com", 
  "gitlab_url": "https://gitlab.com", 
  "indicators": {
    "queries": [
      {
        "query": [
          [
            "environment", 
            "Production"
          ]
        ], 
        "text": "Production"
      }, 
      {
        "query": [
          [
            "environment", 
            "Development"
          ]
        ], 
        "text": "Development"
      }, 
      {
        "query": {
          "q": "event:Heartbeat"
        }, 
        "text": "Heartbeats"
      }, 
      {
        "query": "group=Misc", 
        "text": "Misc."
      }
    ], 
    "severity": [
      "critical", 
      "major", 
      "minor", 
      "warning", 
      "indeterminate", 
      "informational"
    ]
  }, 
  "keycloak_realm": null, 
  "keycloak_url": null, 
  "oidc_auth_url": null, 
  "provider": "basic", 
  "readonly_scopes": [
    "read"
  ], 
  "refresh_interval": 5000, 
  "severity": {
    "critical": 1, 
    "emergency": 0, 
    "error": 9, 
    "fatal": 0, 
    "indeterminate": 5, 
    "informational": 5, 
    "major": 2, 
    "minor": 3, 
    "ok": 5, 
    "trace": 9, 
    "unknown": 9, 
    "warning": 4
  }, 
  "signup_enabled": false, 
  "site_logo_url": "", 
  "sort_by": "lastReceiveTime", 
  "timeouts": {
    "ack": 0, 
    "alert": 0, 
    "heartbeat": 86400, 
    "shelve": 7200
  }, 
  "tracking_id": null
}

Expected behavior
Simply to be able to access the metrics as indicated in the documentation.

labels.__name__ does not work

In my alerting rule file my-rule.yaml:

groups:
- name: svc-alert-rule
  rules:
  - alert: dubbo_invoke_exception
    expr: "rate({__name__=~'dubbo_.*_fail_.*'}[5m])>0"
    for: 5s
    labels:
      metricsName: '{{ $labels.__name__ }}'

{{ $labels.__name__ }} cannot get the metric name.
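
This is expected Prometheus behaviour rather than an Alerta issue: rate() (like any function or aggregation) strips the metric name from its result, so __name__ is no longer among the alert's labels when the template is evaluated. One possible (untested) workaround is a recording rule that first copies the name into an ordinary label:

groups:
- name: svc-alert-rule
  rules:
  # copy the metric name into a normal label before rate() strips it
  - record: dubbo_fail_renamed
    expr: label_replace({__name__=~"dubbo_.*_fail_.*"}, "metricsName", "$1", "__name__", "(.+)")
  - alert: dubbo_invoke_exception
    expr: rate(dubbo_fail_renamed[5m]) > 0
    for: 5s
    labels:
      metricsName: '{{ $labels.metricsName }}'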
