alerta / prometheus-config
Prometheus config for Alerta
License: MIT License
Issue Summary
We are running Alerta via our docker-compose stack, which consists of Prometheus, Alertmanager, Grafana and other services deployed on an Ubuntu machine. We would like to correlate the alerts and group them to reduce the noise.
The following are the issues I am currently facing.
Environment
OS: Ubuntu
API version: kubernetes API
Deployment: docker compose
Database: mongo
Server config:
Auth enabled? Yes
Auth provider? No
Customer views? Yes
To Reproduce
Steps to reproduce the behavior:
For web app issues, include any web browser JavaScript console errors.
docker compose:

prometheus:
  image: prom/prometheus
  container_name: prometheus
  volumes:
    - ./prometheus/:/etc/prometheus/
    - prometheus_data:/prometheus
  command:
    - '--config.file=/etc/prometheus/prometheus.yml'
    - '--storage.tsdb.path=/prometheus'
    - '--web.console.libraries=/etc/prometheus/console_libraries'
    - '--web.console.templates=/etc/prometheus/consoles'
    - '--storage.tsdb.retention.time=12h'
    - '--storage.tsdb.retention.size=1024MB'
    - '--web.enable-lifecycle'
  restart: unless-stopped
  expose:
    - 9090
  networks:
    - monitor-net
  labels:
    org.label-schema.group: "monitoring"

alertmanager:
  image: prom/alertmanager
  container_name: alertmanager
  volumes:
    - ./alertmanager/:/etc/alertmanager/
  command:
    - '--config.file=/etc/alertmanager/config.yml'
    - '--storage.path=/alertmanager'
  restart: unless-stopped
  expose:
    - 9093
  networks:
    - monitor-net
  labels:
    org.label-schema.group: "monitoring"

nodeexporter:
  image: prom/node-exporter
  container_name: nodeexporter
  user: root
  privileged: true
  volumes:
    - /proc:/host/proc:ro
    - /sys:/host/sys:ro
    - /:/rootfs:ro
  command:
    - '--path.procfs=/host/proc'
    - '--path.rootfs=/rootfs'
    - '--path.sysfs=/host/sys'
    - '--collector.filesystem.ignored-mount-points=^/(sys|proc|dev|host|etc)($$|/)'
  restart: unless-stopped
  expose:
    - 9100
  networks:
    - monitor-net
  labels:
    org.label-schema.group: "monitoring"

cadvisor:
  image: google/cadvisor
  container_name: cadvisor
  volumes:
    - /:/rootfs:ro
    - /var/run:/var/run:rw
    - /sys:/sys:ro
    - /var/lib/docker/:/var/lib/docker:ro
    #- /cgroup:/cgroup:ro # doesn't work on MacOS, only on Linux
  restart: unless-stopped
  expose:
    - 8080
  networks:
    - monitor-net
  labels:
    org.label-schema.group: "monitoring"

grafana:
  image: grafana/grafana
  container_name: grafana
  volumes:
    - grafana_data:/var/lib/grafana
    - ./grafana/datasources:/etc/grafana/datasources
    - ./grafana/dashboards:/etc/grafana/dashboards
    - ./grafana/setup.sh:/setup.sh
  entrypoint: /setup.sh
  environment:
    - ADMIN_USER=${ADMIN_USER:-admin}
    - ADMIN_PASSWORD=${ADMIN_PASSWORD:-admin}
    - GF_USERS_ALLOW_SIGN_UP=false
  restart: unless-stopped
  expose:
    - 3000
  networks:
    - monitor-net
  labels:
    org.label-schema.group: "monitoring"

alerta:
  image: alerta/alerta-web:latest
  ports:
    - 9080:8080
  depends_on:
    - db
  environment:
    - DEBUG=1 # remove this line to turn DEBUG off
    - DATABASE_URL=mongodb://db:27017/monitoring
    # - AUTH_REQUIRED=True
    - ADMIN_USERS=[email protected]
    - PLUGINS=remote_ip,reject,heartbeat,blackout,prometheus
    - ALERTMANAGER_API_URL=http://alertmanager:9093
  restart: always
  networks:
    - monitor-net

db:
  image: mongo
  volumes:
    - ./data/mongodb:/data/db
  restart: always
  networks:
    - monitor-net
Alertmanager config:

global:
  resolve_timeout: 5m
  smtp_smarthost: 'host:port'
  smtp_from: '[email protected]'
  smtp_auth_username: 'XXXXXXXXXXXXXXXXXX'
  smtp_auth_password: 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXX'
templates:
inhibit_rules:
receivers:
  - name: 'default-receiver'
    email_configs:
  - name: "alerta"
    webhook_configs:
  - name: 'iot-ops'
    email_configs:
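Since the stated goal is to group alerts and reduce noise, and the route section is missing from the paste above, a minimal route sketch (grouping keys and timings are assumptions; receiver names reuse the ones above) could look like:

route:
  receiver: 'default-receiver'
  group_by: ['alertname', 'instance']   # batch related alerts into one notification
  group_wait: 30s                       # wait to collect the initial alerts of a group
  group_interval: 5m                    # wait before notifying about additions to a group
  repeat_interval: 4h                   # re-notify interval for still-firing alerts
  routes:
    - receiver: 'alerta'                # forward everything to the Alerta webhook receiver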
Prometheus config:

global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    environment: Production
    service: Prometheus
alerting:
  alertmanagers:
rule_files:
scrape_configs:
  - job_name: prometheus
    scrape_interval: 5s
    scrape_timeout: 2s
    honor_labels: true
    static_configs:
  - job_name: cadvisor
    scrape_interval: 5s
    scrape_timeout: 2s
    honor_labels: true
    static_configs:
  - job_name: node-exporter
    scrape_interval: 5s
    scrape_timeout: 2s
    honor_labels: true
    static_configs:
  - job_name: de-duplicate-node
    scrape_interval: 5s
    scrape_timeout: 2s
    honor_labels: true
    static_configs:
  - job_name: collectd
    static_configs:
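The alertmanagers, rule_files and static_configs entries were truncated in the paste above; for reference, a filled-in sketch using the service names from the docker-compose (targets are assumptions) would be:

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']   # compose service name and exposed port
rule_files:
  - 'alert.rules'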
alert.rules (all rules show state OK in the Prometheus rules UI):

alert: Heartbeat
expr: vector(1)
labels:
  severity: informational

alert: service_down
expr: up == 0
labels:
  alertname: de_duplicate
  severity: critical
annotations:
  description: Service {{ $labels.instance }} is unavailable.
  runbook: http://wiki.alerta.io/RunBook/{app}/Event/{alertname}
  value: DOWN ({{ $value }})

alert: service_up
expr: up == 1
labels:
  service: Platform
  severity: normal
annotations:
  description: Service {{ $labels.instance }} is available.
  value: UP ({{ $value }})

alert: high_load
expr: node_load1 > 0.5
labels:
  severity: major
annotations:
  description: '{{ $labels.instance }} of job {{ $labels.job }} is under high load.'
  summary: Instance {{ $labels.instance }} under high load
  value: '{{ $value }}'

alert: disk_space
expr: (node_filesystem_size_bytes - node_filesystem_free_bytes) * 100 / node_filesystem_size_bytes > 5
labels:
  alertname: de_duplicate
  instance: '{{ $labels.instance }}:{{ $labels.mountpoint }}'
  severity: critical
annotations:
  value: '{{ humanize $value }}%'

alert: disk_util
expr: (node_filesystem_size_bytes - node_filesystem_free_bytes) * 100 / node_filesystem_size_bytes > 5
labels:
  alertname: de_duplicate
  event: '{alertname}:{{ $labels.mountpoint }}'
  instance: '{{ $labels.instance }}'
  severity: warning
annotations:
  value: '{{ humanize $value }}%'
PrometheusDiskSpaceIsLow (all rules show state OK):

alert: LowDiskSpace-NXAWSPU2EDGETRIL
expr: ((node_filesystem_size_bytes{job="node-exporter"} - node_filesystem_free_bytes{job="node-exporter"}) / node_filesystem_size_bytes{job="node-exporter"} * 100 > 4)
for: 1m
labels:
  alertname: de_duplicate
  severity: critical
annotations:
  description: Low Disk Space {{ $labels.instance }}.
  summary: Monitor service non-operational

alert: api_requests_high
expr: rate(alerta_alerts_queries_count{instance="alerta:8080",job="alerta"}[5m]) > 5
labels:
  service: Alerta,Platform
  severity: major
annotations:
  description: API request rate of {{ $value | printf "%.1f" }} req/s is high (threshold 5 req/s)
  summary: API request rate high
  value: '{{ humanize $value }} req/s'
kube-state-metrics.rules (all rules show state OK):

alert: DeploymentGenerationMismatch-NXAWSPU2EDGETRIL
expr: kube_deployment_status_observed_generation != kube_deployment_metadata_generation
for: 10m
labels:
  severity: warning
annotations:
  description: Observed deployment generation does not match expected one for deployment {{$labels.namespaces}}/{{$labels.deployment}}
  runbook: https://confluence.eng.example.com:8443/display/XRE/KubeDeploymentGenerationMismatch
  summary: Deployment is outdated

alert: DeploymentReplicasNotUpdated-NXAWSPU2EDGETRIL
expr: ((kube_deployment_status_replicas_updated != kube_deployment_spec_replicas) or (kube_deployment_status_replicas_available != kube_deployment_spec_replicas)) unless (kube_deployment_spec_paused == 1)
for: 10m
labels:
  severity: warning
annotations:
  description: Replicas are not updated and available for deployment {{$labels.namespaces}}/{{$labels.deployment}}
  runbook: https://confluence.eng.example.com:8443/display/XRE/K8SDeploymentReplicasMismatch
  summary: Deployment replicas are outdated

alert: DaemonSetRolloutStuck-NXAWSPU2EDGETRIL
expr: kube_daemonset_status_number_ready / kube_daemonset_status_desired_number_scheduled * 100 < 100
for: 10m
labels:
  severity: warning
annotations:
  description: Only {{$value}}% of desired pods scheduled and ready for daemon set {{$labels.namespaces}}/{{$labels.daemonset}}
  runbook: https://confluence.eng.example.com:8443/display/XRE/K8SDaemonSetRolloutStuck
  summary: DaemonSet is missing pods

alert: K8SDaemonSetsNotScheduled-NXAWSPU2EDGETRIL
expr: kube_daemonset_status_desired_number_scheduled - kube_daemonset_status_current_number_scheduled > 0
for: 10m
labels:
  severity: warning
annotations:
  description: A number of daemonsets are not scheduled.
  runbook: https://confluence.eng.example.com:8443/display/XRE/K8sDaemonSetNotScheduled
  summary: Daemonsets are not scheduled correctly
  suppressed: "No"

alert: DaemonSetsMissScheduled-NXAWSPU2EDGETRIL
expr: kube_daemonset_status_number_misscheduled > 0
for: 10m
labels:
  severity: warning
annotations:
  description: A number of daemonsets are running where they are not supposed to run.
  runbook: https://confluence.eng.example.com:8443/display/XRE/K8sDaemonSetsMissScheduled
  summary: Daemonsets are not scheduled correctly

alert: PodFrequentlyRestarting-NXAWSPU2EDGETRIL
expr: increase(kube_pod_container_status_restarts_total[1h]) > 5
for: 10m
labels:
  severity: critical
annotations:
  description: Pod {{$labels.namespaces}}/{{$labels.pod}} was restarted {{$value}} times within the last hour
  summary: Pod is restarting frequently

alert: K8SNodeOutOfDisk-NXAWSPU2EDGETRIL
expr: kube_node_status_condition{condition="OutOfDisk",status="true"} == 1
labels:
  service: k8s
  severity: critical
annotations:
  description: '{{ $labels.node }} has run out of disk space.'
  runbook: https://confluence.eng.example.com:8443/display/XRE/K8SNodeOutOfDisk
  summary: Node ran out of disk space.
NOTE: Please provide as much information about your issue as possible.
Failure to provide basic details about your specific environment makes
it impossible to know if an issue has already been fixed, can delay a
response, and may result in your issue being closed without a resolution.
Hi All,
I am new to Prometheus.
I have configured OMNIBUS with a message probe in a standalone RHEL VM.
My Prometheus runs as a Docker container.
I want to forward Prometheus alerts to IBM OMNIBUS.
Please let me know the possibilities and share any documentation.
Regards,
Venkatraj
According to prometheus/prometheus#2818, this is expected behaviour.
The error below appears while trying to integrate Alertmanager 0.8.0 with the Alerta Prometheus webhook config. The error does not appear with Alertmanager 0.7.1.
127.0.0.1 - - [22/Sep/2017 05:00:35] "POST /webhooks/prometheus HTTP/1.1" 500 -
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/flask/app.py", line 1997, in __call__
return self.wsgi_app(environ, start_response)
File "/usr/lib/python2.7/site-packages/flask/app.py", line 1985, in wsgi_app
response = self.handle_exception(e)
File "/usr/lib/python2.7/site-packages/flask_cors/extension.py", line 161, in wrapped_function
return cors_after_request(app.make_response(f(*args, **kwargs)))
File "/usr/lib/python2.7/site-packages/flask/app.py", line 1540, in handle_exception
reraise(exc_type, exc_value, tb)
File "/usr/lib/python2.7/site-packages/flask/app.py", line 1982, in wsgi_app
response = self.full_dispatch_request()
File "/usr/lib/python2.7/site-packages/flask/app.py", line 1614, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/usr/lib/python2.7/site-packages/flask_cors/extension.py", line 161, in wrapped_function
return cors_after_request(app.make_response(f(*args, **kwargs)))
File "/usr/lib/python2.7/site-packages/flask/app.py", line 1517, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/usr/lib/python2.7/site-packages/flask/app.py", line 1612, in full_dispatch_request
rv = self.dispatch_request()
File "/usr/lib/python2.7/site-packages/flask/app.py", line 1598, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/usr/lib/python2.7/site-packages/flask_cors/decorator.py", line 128, in wrapped_function
resp = make_response(f(*args, **kwargs))
File "/usr/lib/python2.7/site-packages/alerta_server-4.10.0-py2.7.egg/alerta/app/auth.py", line 157, in wrapped
return f(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/alerta_server-4.10.0-py2.7.egg/alerta/app/webhooks/views.py", line 382, in prometheus
incomingAlert = parse_prometheus(alert, external_url)
File "/usr/lib/python2.7/site-packages/alerta_server-4.10.0-py2.7.egg/alerta/app/webhooks/views.py", line 339, in parse_prometheus
text = description or summary or '%s: %s on %s' % (labels['job'], labels['alertname'], labels['instance'])
KeyError: 'job'
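The traceback shows parse_prometheus indexing labels['job'] unconditionally, so any alert whose expression produces samples without a job label (for example vector(1) heartbeat rules) triggers this KeyError. A workaround sketch, an assumption rather than the project's fix, is to set the label explicitly in the rule:

alert: Heartbeat
expr: vector(1)            # vector(1) carries no job or instance labels of its own
labels:
  job: 'heartbeat'         # hypothetical value, added so labels['job'] exists in the payload
  severity: informational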
Thanks for building the latest image.
I am running Prometheus, MongoDB and Alerta as Docker containers, and I see the error below:
2016-02-25 07:45:59,381 DEBG 'nginx' stdout output:
2016/02/25 07:45:59 [error] 23#0: *1 open() "/app/webhooks/prometheus" failed (2: No such file or directory), client: 172.17.42.1, server: , request: "POST /webhooks/prometheus HTTP/1.1", host: ":"
client 172.17.42.1 is the Docker gateway (per docker inspect).
Please let me know if I am missing something.
Hi,
I am using the alerta/alerta-web latest Docker image; please let me know if I can integrate Prometheus with Alerta using this image.
When I check /api/, I can't find the Prometheus webhook, probably because it's a new development.
Issue Summary
Hi there!
I was trying to deploy Alerta and lost some time on the customer field.
As the documentation says here, the customer field needs to be in the labels section, which seems logical, but it was not working.
To get the customer field working, I had to add it as an "annotation".
Environment
OS: linux
API version: 8.1.0
Deployment: Helmchart (Kubernetes)
Database: Postgres
Server config:
Auth enabled? Yes
Auth provider? Basic
Customer views? No
alerta@alert-poc-alerta-f579f95f-mchv7:/$ alerta version
alerta 8.1.0
alerta client 8.0.0
requests 2.25.0
click 7.1.2
To Reproduce
Steps to reproduce the behavior:
Expected behavior
The customer field should be picked up from the labels section, not only from the annotations section.
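For illustration, a minimal rule sketch of the workaround described above (alert name and customer value are hypothetical):

alert: ExampleAlert
expr: up == 0
labels:
  customer: acme       # per the docs this should be enough, but was not in 8.1.0
annotations:
  customer: acme       # adding it here made the customer field appear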
How does one convert Alerta data into a Prometheus rule configuration?
Example:
Alerta data:
"correlate": ["cpu","load"],
Prometheus rules:
labels:
  correlate: ['cpu','load']
or
labels:
  correlate:
    - cpu
    - load
Both attempts fail:
./promtool check rules Prometheus.rules
Checking Prometheus.rules
FAILED:
yaml: unmarshal errors:
  line : cannot unmarshal !!seq into string
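The unmarshal error is expected: Prometheus label values must be plain strings, so a YAML sequence can never be used there. A sketch of the usual workaround (assuming the consumer splits the value on commas) is:

labels:
  correlate: 'cpu,load'   # a single string; label values cannot be YAML lists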
Hi, I really like Alerta so far. It does what Alertmanager and Unsee are missing. However, I can't seem to find the best way to configure Alerta with an Alertmanager cluster.
Is there a best practice for this, or is it not recommended?
Issue Summary
I just tried to run docker-compose up -d for this repo. I saw alerts in Prometheus and Alertmanager; however, there is no data in Alerta. I also tried to log in using [email protected] and password alerta. That failed too.
Environment
Currently we have one url in our Alertmanager config.
Can we add two urls, like below?
In Alertmanager, my-rule.yaml:
groups:
  - name: svc-alert-rule
    rules:
      - alert: dubbo_invoke_exception
        expr: "rate({__name__=~'dubbo_.*_fail_.*'}[5m])>0"
        for: 5s
        labels:
          metricsName: '{{ $labels.__name__ }}'
{{ $labels.__name__ }} cannot get the metric name.
Hi, I'm trying to configure Alerta with HA to get alerts from an Alertmanager cluster. I am unsure what to put under the Alertmanager configuration for the url value in webhook_configs. If I use only one Alerta host, how will the alert appear on the second one?
Thanks,
Bruno
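One possible sketch (hostnames are hypothetical): webhook_configs accepts a list and every entry is notified, so both Alerta hosts can be listed:

receivers:
  - name: 'alerta'
    webhook_configs:
      - url: 'http://alerta-1:8080/api/webhooks/prometheus'
      - url: 'http://alerta-2:8080/api/webhooks/prometheus'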
While hitting this endpoint from Alertmanager we were getting a 404:
/webhooks/prometheus
Please advise, and many thanks.
I am looking for more exact info:
There are several typos in the readme.md file. Will post a fix shortly.
Hi,
I am also getting this error. I have gone through many of the articles available on Google and nothing has helped.
Can you please advise if I am doing anything wrong? All three components are configured as Docker containers.
I have integrated Prometheus and Alertmanager with Alerta. Below is my prometheus.yml file:
# my global config
global:
  scrape_interval: 15s     # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - ip:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
  - alert.rules.yml

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ['ip:9090']
  - job_name: 'node_exporter'
    static_configs:
      - targets: ['ip:9100']
  - job_name: 'nginx_exporter'
    static_configs:
      - targets: ['ip:9113']
  - job_name: 'alert_manager'
    static_configs:
      - targets: ['ip:9093']
  - job_name: 's3-exporter'
    static_configs:
      - targets: ['ip:9340']
  - job_name: 'grafana'
    static_configs:
      - targets: ['ip:3000']
  - job_name: 'container_exporter'
    static_configs:
      - targets: ['ip:9104']
  - job_name: 'alerta'
    metrics_path: /api/management/metrics
    static_configs:
      - targets: ['ip:80']
    basic_auth:
      username: xyz
      password: xyz
And also find the alertmanager.yml file below:
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.gmail.com:587'
  smtp_from: 'xyz'
  smtp_auth_username: '[email protected]'
  smtp_auth_password: 'xyz'
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'suraj'
  routes:
    - match_re:
        env: 'production'
      receiver: 'alerta'
receivers:
  - name: 'suraj'
    email_configs:
      - to: 'email id'
#inhibit_rules:
#  - source_match:
#      severity: 'critical'
#    target_match:
#      severity: 'warning'
#    equal: ['alertname', 'dev', 'instance']
  - name: 'alerta'
    webhook_configs:
      - url: 'http://ip/api/webhooks/prometheus'
        send_resolved: true
        http_config:
          basic_auth:
            username: xyz
            password: xyz
My Alertmanager is showing alerts on its console, and the Prometheus console shows the alerts too, but the Alerta console is not showing them.
Below is the error I get while accessing http://IP:80/api/webhooks/prometheus:
{"code":400,"errors":null,"message":"no alerts in Prometheus notification payload","status":"error"}
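One thing worth checking against the config above (an observation about the posted config, not a confirmed diagnosis): the match_re route only sends alerts carrying env: production to the alerta receiver, so a rule would need that label for its alerts to reach Alerta, e.g.:

alert: service_down
expr: up == 0
labels:
  env: 'production'   # must match the match_re route for the 'alerta' receiver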
Issue Summary
Alertmanager can't send alerts; the response is always HTTP 400.
Environment
AlertManager: v0.19.0
Pushgateway: v1.0.0
Prometheus: v2.14.0
Alerta: 7.4.1
Deployment: Docker and docker-compose
Database: Postgresql
Server config:
Auth enabled? No
Auth provider? None
Customer views? No
web UI version: 7.4.1
CLI version: 7.4.1
To Reproduce
Additional context
Checking the Prometheus, Alertmanager and Pushgateway UIs, everything looks fine.
The Alertmanager log shows: component=dispatcher msg="Notify for alerts failed" num_alerts=1 err="cancelling notify retry for \"webhook\" due to unrecoverable error: unexpected status code 400 from http://172.18.0.1:8080/api/webhooks/prometheus"
I even saved the posted data using Beeceptor and tried to reproduce the operation using curl, and got the same error: no alerts in Prometheus notification payload
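For reference when testing with curl, the webhook returns this 400 when the POST body lacks an alerts array; a minimal sketch of the Alertmanager-style payload shape (all values are placeholders, and real payloads also carry fields such as version, groupKey and externalURL) is:

{
  "status": "firing",
  "alerts": [
    {
      "status": "firing",
      "labels": {"alertname": "test_alert", "severity": "critical", "instance": "host1", "job": "node"},
      "annotations": {"description": "testing the webhook"},
      "startsAt": "2019-12-01T00:00:00Z"
    }
  ]
}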
Hi,
Wondering, before I do any PoC testing, whether it would be possible to integrate multiple Alertmanager webhooks pointing at Alerta.
Would Alerta be able to manage multiple Prometheus (Alertmanager) sources?
Regards,
Wojtek
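A common sketch for telling multiple sources apart (values are assumptions): give each Prometheus its own external_labels, which end up on every alert it sends, as in the config near the top of this page:

global:
  external_labels:
    environment: Production   # differs per Prometheus instance
    service: cluster-a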
Issue Summary
Reading the documentation, it's indicated that Alerta metrics are exposed on the /management/metrics route so they can be collected with Prometheus.
When I try to access this route, I automatically get redirected to the /alerts route, so I can't access the metrics.
Environment
OS: CentOS 7
API version: 8.7.0
Deployment: Docker
For self-hosted, WSGI environment: [eg. nginx/uwsgi, apache/mod_wsgi]
Database: Postgres
Server config:
Auth enabled? Yes
Auth provider? OpenID but disabled for testing metrics
Customer views? Yes
web UI version: 8.7.0
To Reproduce
Steps to reproduce the behavior:
config:
{
"actions": [],
"alarm_model": {
"colors": {
"severity": {
"critical": "#91243E",
"emergency": "#91243E",
"error": "#67ACE1",
"fatal": "#91243E",
"indeterminate": "#A3C722",
"informational": "#A3C722",
"major": "#D43F3A",
"minor": "#F18C43",
"ok": "#A3C722",
"trace": "#67ACE1",
"unknown": "#67ACE1",
"warning": "#F8C851"
}
},
"defaults": {
"normal_severity": "ok",
"previous_severity": "indeterminate",
"status": "open"
},
"name": "Alerta 8.7.0",
"severity": {
"critical": 1,
"emergency": 0,
"error": 9,
"fatal": 0,
"indeterminate": 5,
"informational": 5,
"major": 2,
"minor": 3,
"ok": 5,
"trace": 9,
"unknown": 9,
"warning": 4
},
"status": {
"ack": "C",
"assign": "B",
"blackout": "E",
"closed": "F",
"expired": "G",
"open": "A",
"shelved": "D",
"unknown": "H"
}
},
"allow_readonly": false,
"audio": {
"new": null
},
"auth_required": true,
"aws_region": "us-east-1",
"azure_tenant": "common",
"blackouts": {
"duration": 3600
},
"client_id": "",
"cognito_domain": null,
"colors": {
"severity": {
"critical": "#91243E",
"emergency": "#91243E",
"error": "#67ACE1",
"fatal": "#91243E",
"indeterminate": "#A3C722",
"informational": "#A3C722",
"major": "#D43F3A",
"minor": "#F18C43",
"ok": "#A3C722",
"trace": "#67ACE1",
"unknown": "#67ACE1",
"warning": "#F8C851"
}
},
"columns": [
"severity",
"status",
"lastReceiveTime",
"duplicateCount",
"customer",
"environment",
"service",
"resource",
"event",
"value",
"text"
],
"customer_views": true,
"dates": {
"longDate": "ddd D MMM, YYYY HH:mm:ss.SSS Z",
"mediumDate": "ddd D MMM HH:mm",
"shortTime": "HH:mm"
},
"debug": true,
"email_verification": false,
"endpoint": "******://******/api",
"environments": [
"Production",
"Development",
"Code",
"Test",
"Validation"
],
"filter": {
"status": [
"open",
"ack"
]
},
"font": {
"font-family": "\"Sintony\", Arial, sans-serif",
"font-size": "13px",
"font-weight": 500
},
"github_url": "https://github.com",
"gitlab_url": "https://gitlab.com",
"indicators": {
"queries": [
{
"query": [
[
"environment",
"Production"
]
],
"text": "Production"
},
{
"query": [
[
"environment",
"Development"
]
],
"text": "Development"
},
{
"query": {
"q": "event:Heartbeat"
},
"text": "Heartbeats"
},
{
"query": "group=Misc",
"text": "Misc."
}
],
"severity": [
"critical",
"major",
"minor",
"warning",
"indeterminate",
"informational"
]
},
"keycloak_realm": null,
"keycloak_url": null,
"oidc_auth_url": null,
"provider": "basic",
"readonly_scopes": [
"read"
],
"refresh_interval": 5000,
"severity": {
"critical": 1,
"emergency": 0,
"error": 9,
"fatal": 0,
"indeterminate": 5,
"informational": 5,
"major": 2,
"minor": 3,
"ok": 5,
"trace": 9,
"unknown": 9,
"warning": 4
},
"signup_enabled": false,
"site_logo_url": "",
"sort_by": "lastReceiveTime",
"timeouts": {
"ack": 0,
"alert": 0,
"heartbeat": 86400,
"shelve": 7200
},
"tracking_id": null
}
Expected behavior
Simply to be able to access the metrics as indicated in the documentation.
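For comparison, an earlier report on this page scrapes Alerta metrics through the API path rather than the web UI path; a scrape job sketch (target and credentials are placeholders) looks like:

- job_name: 'alerta'
  metrics_path: /api/management/metrics   # API route used by the earlier report on this page
  static_configs:
    - targets: ['alerta-host:8080']
  basic_auth:
    username: admin
    password: secret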
What did you do?
I am not sure whether this is the right forum for this question, but I am looking for the file or database path in Prometheus where alerts are stored in JSON or another format that is reflected in Grafana. I want to write scripts that take actions based on alerts/triggers. Normally, alerts can be seen on a dashboard and sent as email or message notifications. My concern is to trace the path from which the alerts are reflected in Grafana.
What did you expect to see?
I expect a path to a database or a JSON file in which alerts are populated at run time.
Environment
System information:
Linux 3.10.0-1160.53.1.el7.x86_64 x86_64
Prometheus version:
prometheus, version 2.30.0-rc.0 (branch: HEAD, revision: 05a816bfb739b3841acd82bd285bfb5dfec7bfd7)
build user: root@438506e6c112
build date: 20210909-13:31:07
go version: go1.17
platform: linux/amd64
Alertmanager version:
alertmanager, version 0.20.0 (branch: HEAD, revision: f74be0400a6243d10bb53812d6fa408ad71ff32d)
build user: root@00c3106655f8
build date: 20191211-14:13:14
go version: go1.13.5
Any help in this regard would be highly appreciated.
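Not an authoritative answer, but for scripting purposes the alerts are not kept as a JSON file on disk; they are exposed over HTTP APIs (default ports shown, hosts are placeholders):

curl http://prometheus-host:9090/api/v1/alerts      # alerts currently pending/firing in Prometheus
curl http://alertmanager-host:9093/api/v2/alerts    # alerts currently held by Alertmanager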