cloudalchemy / ansible-alertmanager
Deploy Prometheus Alertmanager service
License: MIT License
Please use version 0.14.0 as the default, as it fixes a bug where the mesh network is shown as "pending" even though it is "established".
This would reduce confusion when using this role.
What happened?
After commit 6f050af
I received the following error:
`TASK [ansible-alertmanager : Get checksum for amd64 architecture] **************
fatal: [promcmt-test01]: FAILED! => {"msg": "template error while templating string: expected token 'end of print statement', got 'select'. String: {{ lookup('url', 'https://github.com/prometheus/alertmanager/releases/download/v' + alertmanager_version + '/sha256sums.txt', wantlist=True) | list select('contains', 'linux-' + go_arch + '.tar.gz') | list | first).split(' ')[0] }}"}
...ignoring
TASK [ansible-alertmanager : Checksum lookup error message] ********************
fatal: [promcmt-test01]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'alertmanager_checksum_url' is undefined\n\nThe error appears to be in '/tmp/awx_183634_xomgifxg/requirements_roles/ansible-alertmanager/tasks/preflight.yml': line 51, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Checksum lookup error message\n ^ here\n"}`
Did you expect to see something different?
How to reproduce it (as minimally and precisely as possible):
Install the latest version of alertmanager for amd64.
Environment
0.21
Ansible version information:
ansible --version
Variables:
Anything else we need to know?:
What happened?
A positive scenario - installation of alertmanager - creates a configuration file, e.g. /etc/alertmanager/alertmanager.yml.
The permissions on said config file are as follows:
$ ls -ald /etc/alertmanager/alertmanager.yml
-rw-r--r-- 1 alertmanager alertmanager 1050 Apr 30 13:02 /etc/alertmanager/alertmanager.yml
which is defined here:
ansible-alertmanager/tasks/configure.yml
Line 18 in c10f2a8
It is a security problem because alertmanager receivers (e.g. email_config) include secrets in plain text, which would be visible to every user logged into the Alertmanager host.
Did you expect to see something different?
I would expect the config to not be readable by "others", i.e.:
$ ls -ald /etc/alertmanager/alertmanager.yml
-rw-r----- 1 alertmanager alertmanager 1050 Apr 30 13:02 /etc/alertmanager/alertmanager.yml
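A minimal sketch of the kind of change that would produce the mode above, assuming the role's configure.yml deploys the config via ansible.builtin.template (the task layout here is illustrative; the variables are the role's own):
- name: Copy alertmanager config
  ansible.builtin.template:
    src: "{{ alertmanager_config_file }}"
    dest: "{{ alertmanager_config_dir }}/alertmanager.yml"
    owner: alertmanager
    group: alertmanager
    mode: "0640"  # no read access for "others"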
How to reproduce it (as minimally and precisely as possible):
Standard task, e.g.
- name: Install Alertmanager
  ansible.builtin.import_role:
    name: cloudalchemy.alertmanager
Environment
Role version:
0.19.1
Ansible version information:
$ ansible --version
ansible 2.10.8
config file = /Users/weakcamel/git/auto/ansible.cfg
configured module search path = ['/Users/weakcamel/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /Users/weakcamel/git/auto/.venv/lib/python3.9/site-packages/ansible
executable location = /Users/weakcamel/git/auto/.venv/bin/ansible
python version = 3.9.4 (default, Apr 5 2021, 01:50:46) [Clang 12.0.0 (clang-1200.0.32.29)]
n/a
n/a
Anything else we need to know?:
What happened?
If you run Ansible with your alert templates in the templates directory, Ansible will not copy them, based on the default here:
https://github.com/cloudalchemy/ansible-alertmanager/blob/master/defaults/main.yml#L10
Did you expect to see something different?
It should copy all the files (default *.tmpl) into the Alertmanager template directory (/etc/alertmanager/templates/).
Ansible version information:
ansible 2.9.27
Variables:
alertmanager_template_files
How you can fix this: the correct config is:
alertmanager_template_files:
  - templates/*.tmpl
You have to remove the alertmanager directory from the path.
This more or less duplicates an issue from ansible-prometheus.
What happened?
When using this role I get the following deprecation warning:
[DEPRECATION WARNING]: "include" is deprecated, use include_tasks/import_tasks instead.
This feature will be removed in version 2.16. Deprecation warnings can be disabled by
setting deprecation_warnings=False in ansible.cfg.
Did you expect to see something different?
No deprecation warnings.
How to reproduce it (as minimally and precisely as possible):
Use this playbook:
- hosts: all
  roles: [cloudalchemy.alertmanager]
Environment
Role version:
0.19.1
Variables:
none
Ansible playbook execution Logs:
none, the warning is displayed before anything else is done by the playbook.
Anything else we need to know?:
The relevant code seems to be here:
ansible-alertmanager/tasks/main.yml
Line 2 in e42e7ef
ansible-alertmanager/tasks/main.yml
Line 8 in e42e7ef
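A hedged sketch of the fix in tasks/main.yml, replacing the bare include with import_tasks (preflight.yml is referenced elsewhere on this page; the second file name is my assumption about the role layout):
- name: Run preflight checks
  ansible.builtin.import_tasks: preflight.yml

- name: Install alertmanager
  ansible.builtin.import_tasks: install.yml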
Hello,
Could you put the binaries download URL in a variable?
I'm installing on a host that doesn't have internet access; currently the download URL is hardcoded.
Thank you
ansible-alertmanager/tasks/preflight.yml
Line 44 in e42e7ef
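One hypothetical shape for the requested variable (alertmanager_binary_url is made up here for illustration, not an existing role variable; go_arch is the architecture fact the role already uses):
# defaults/main.yml (hypothetical)
alertmanager_binary_url: "https://github.com/prometheus/alertmanager/releases/download/v{{ alertmanager_version }}/alertmanager-{{ alertmanager_version }}.linux-{{ go_arch }}.tar.gz"
Pointing such a variable at an internal mirror would then cover hosts without internet access.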
Ansible Version
ansible [core 2.11.4]
config file = None
configured module search path = ['/Users/yoakum/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /Users/yoakum/venv/ansible/lib/python3.9/site-packages/ansible
ansible collection location = /Users/yoakum/.ansible/collections:/usr/share/ansible/collections
executable location = /Users/yoakum/venv/ansible/bin/ansible
python version = 3.9.6 (default, Jun 29 2021, 04:45:03) [Clang 11.0.0 (clang-1100.0.33.17)]
jinja version = 2.11.2
libyaml = True
Errors:
TASK [alertmanager : Get checksum for amd64 architecture] **********************************************************************************************************************************************************************************************************************
fatal: [3.239.126.164]: FAILED! => {"msg": "template error while templating string: expected token 'end of print statement', got 'select'. String: {{ lookup('url', 'https://github.com/prometheus/alertmanager/releases/download/v' + alertmanager_version + '/sha256sums.txt',
wantlist=True) | list select('contains', 'linux-' + go_arch + '.tar.gz') | list | first).split(' ')[0] }}"}
...ignoring
TASK [alertmanager : Checksum lookup error message] ****************************************************************************************************************************************************************************************************************************
fatal: [3.239.126.164]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'alertmanager_checksum_url' is undefined\n\nThe error appears to be in '/Users/yoakum/git/analytics-techops/analytics-bakery/ansible/roles/alertmanager/tasks
/preflight.yml': line 51, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Checksum lookup error message\n ^ here\n"}
PLAY RECAP *********************************************************************************************************************************************************************************************************************************************************************
3.239.126.164 : ok=5 changed=0 unreachable=0 failed=1 skipped=2 rescued=0 ignored=1
I was able to get past part of this issue by adding a '(' before the lookup and a '|' after list. I also had to add the checksum URL to the main ansible variables because it was not being set.
"{{ (lookup('url', 'https://github.com/prometheus/alertmanager/releases/download/v' + alertmanager_version + '/sha256sums.txt', wantlist=True) | list | select('contains', 'linux-' + go_arch + '.tar.gz') | list | first).split(' ')[0] }}"
What happened?
http: Accept error: accept tcp [::]:9093: accept4: too many open files; retrying in 1s
http: Accept error: accept tcp [::]:9093: accept4: too many open files; retrying in 1s
http: Accept error: accept tcp [::]:9093: accept4: too many open files; retrying in 1s
keeps appearing in alertmanager's journalctl output, and it stops sending alerts.
Adding this to the systemd unit should fix the error (note: systemd has no LimitNOFILESoft directive; soft and hard limits are both set through LimitNOFILE, optionally as soft:hard):
[Service]
LimitNOFILE=16000
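If you want to apply that limit with Ansible instead of by hand, a sketch using a systemd drop-in (the handler name matches the role's "restart alertmanager" handler seen in logs further down this page; a systemd daemon-reload is also needed before the restart takes effect):
- name: Create alertmanager systemd override directory
  ansible.builtin.file:
    path: /etc/systemd/system/alertmanager.service.d
    state: directory
    mode: "0755"

- name: Raise alertmanager open-files limit
  ansible.builtin.copy:
    dest: /etc/systemd/system/alertmanager.service.d/limits.conf
    content: |
      [Service]
      LimitNOFILE=16000
  notify: restart alertmanager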
Hello,
It looks like this role can't be used with Telegram out of the box. Is that right?
It might be beneficial to add an option to create silences from ansible.
What needs to be researched:
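One possible shape for such a task, a sketch wrapping amtool's silence subcommand (the matcher, duration, and URL here are placeholders):
- name: Create a silence via amtool
  ansible.builtin.command: >
    amtool silence add alertname=NodeDown
    --alertmanager.url=http://localhost:9093
    --duration=2h
    --comment="maintenance window"
Making this idempotent (e.g. checking amtool silence query first) is part of what would need to be researched.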
Hi,
In some corporate environments, servers don't have external access.
This playbook fails, as it tries to download from GitHub.
It would be nice to have a 'flag' to tell whether the binaries are managed with this playbook or not.
Thanks
Nicolas
What is missing?
https://github.com/prometheus/alertmanager/blob/master/docs/configuration.md#mute_time_interval support
Why do we need it?
It is already implemented in alertmanager but not available in the role.
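For reference, the upstream syntax the role would need to template out looks roughly like this; alertmanager_mute_time_intervals is a hypothetical role variable mirroring the upstream config:
alertmanager_mute_time_intervals:
  - name: weekends
    time_intervals:
      - weekdays: ['saturday', 'sunday']
A route could then reference it via mute_time_intervals: [weekends].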
Environment
Role version:
latest
https://galaxy.ansible.com/cloudalchemy/alertmanager (I can't find a galaxy tag or anything similar)
Ansible version information:
ansible --version
ansible 2.9.6
Variables:
---
alertmanager_version: latest
alertmanager_binary_local_dir: ''
alertmanager_config_dir: /etc/alertmanager
alertmanager_db_dir: /var/lib/alertmanager
alertmanager_config_file: 'alertmanager.yml.j2'
alertmanager_template_files:
  - templates/*.tmpl
alertmanager_web_listen_address: '0.0.0.0:9093'
alertmanager_web_external_url: 'http://localhost:9093/'
alertmanager_http_config: {}
alertmanager_resolve_timeout: 3m
alertmanager_config_flags_extra: {}
# alertmanager_config_flags_extra:
#   data.retention: 10
# SMTP default params
alertmanager_smtp: {}
# alertmanager_smtp:
#   from: ''
#   smarthost: ''
#   auth_username: ''
#   auth_password: ''
#   auth_secret: ''
#   auth_identity: ''
#   require_tls: "True"
# Default values you can see here -> https://prometheus.io/docs/alerting/configuration/
alertmanager_slack_api_url: 'XXX'
alertmanager_pagerduty_url: ''
alertmanager_opsgenie_api_key: ''
alertmanager_opsgenie_api_url: ''
alertmanager_victorops_api_key: ''
alertmanager_victorops_api_url: ''
alertmanager_hipchat_api_url: ''
alertmanager_hipchat_auth_token: ''
alertmanager_wechat_url: ''
alertmanager_wechat_secret: ''
alertmanager_wechat_corp_id: ''
# First read: https://github.com/prometheus/alertmanager#high-availability
alertmanager_cluster:
  listen-address: ""
# alertmanager_cluster:
#   listen-address: "{{ ansible_default_ipv4.address }}:6783"
#   peers:
#     - "{{ ansible_default_ipv4.address }}:6783"
#     - "demo.cloudalchemy.org:6783"
# alertmanager_receivers: []
alertmanager_receivers:
  - name: slack
    slack_configs:
      - send_resolved: true
        channel: '#it'
        icon_url: https://avatars3.githubusercontent.com/u/3380462
        # title: '{{ template "slack.title" . }}'
        # text: '{% raw %}{{ template "slack.text" . }}{% endraw %}'
        title: '{% raw %}{{ template "slack.monzo.title" . }}{% endraw %}'
        icon_emoji: '{% raw %}{{ template "slack.monzo.icon_emoji" . }}{% endraw %}'
        color: '{% raw %}{{ template "slack.monzo.color" . }}{% endraw %}'
        text: '{% raw %}{{ template "slack.monzo.text" . }}{% endraw %}'
        actions:
          - type: button
            text: 'Runbook :green_book:'
            url: '{% raw %}{{ (index .Alerts 0).Annotations.runbook }}{% endraw %}'
          - type: button
            text: 'Query :mag:'
            url: '{% raw %}{{ (index .Alerts 0).GeneratorURL }}{% endraw %}'
          - type: button
            text: 'Dashboard :grafana:'
            url: '{% raw %}{{ (index .Alerts 0).Annotations.dashboard }}{% endraw %}'
          - type: button
            text: 'Silence :no_bell:'
            url: '{% raw %}{{ template "__alert_silence_link" . }}{% endraw %}'
          - type: button
            text: '{% raw %}{{ template "slack.monzo.link_button_text" . }}{% endraw %}'
            url: '{% raw %}{{ .CommonAnnotations.link_url }}{% endraw %}'
alertmanager_inhibit_rules: []
# alertmanager_inhibit_rules:
#   - target_match:
#       label: value
#     source_match:
#       label: value
#     equal: ['dc', 'rack']
#   - target_match_re:
#       label: value1|value2
#     source_match_re:
#       label: value3|value5
# alertmanager_route: {}
alertmanager_route:
  repeat_interval: 1h
  receiver: slack
#   # This routes performs a regular expression match on alert labels to
#   # catch alerts that are related to a list of services.
#   - match_re:
#       service: ^(foo1|foo2|baz)$
#     receiver: team-X-mails
#     # The service has a sub-route for critical alerts, any alerts
#     # that do not match, i.e. severity != critical, fall-back to the
#     # parent node and are sent to 'team-X-mails'
#     routes:
#       - match:
#           severity: critical
#         receiver: team-X-pager
#   - match:
#       service: files
#     receiver: team-Y-mails
#     routes:
#       - match:
#           severity: critical
#         receiver: team-Y-pager
#   # This route handles all alerts coming from a database service. If there's
#   # no team to handle it, it defaults to the DB team.
#   - match:
#       service: database
#     receiver: team-DB-pager
#     # Also group alerts by affected database.
#     group_by: [alertname, cluster, database]
#     routes:
#       - match:
#           owner: team-X
#         receiver: team-X-pager
#       - match:
#           owner: team-Y
#         receiver: team-Y-pager
# The template for amtool's configuration
alertmanager_amtool_config_file: 'amtool.yml.j2'
# Location (URL) of the alertmanager
alertmanager_amtool_config_alertmanager_url: "{{ alertmanager_web_external_url }}"
# Extended output of `amtool` commands, use '' for less verbosity
alertmanager_amtool_config_output: 'extended'
fatal: [10.10.10.151]: FAILED! => {"changed": false, "checksum": "ac7b62613e1c55fb5592a36f500c9fb6399e8cb0", "exit_status": 1, "msg": "failed to validate", "stderr": "amtool: error: failed to validate 1 file(s)\n\n", "stderr_lines": ["amtool: error: failed to validate 1 file(s)", ""], "stdout": "Checking '/root/.ansible/tmp/ansible-tmp-1601894343.1803482-191455237155978/source' SUCCESS\nFound:\n - global config\n - route\n - 0 inhibit rules\n - 1 receivers\n - 1 templates\n FAILED: template: slack.tmpl:14: unexpected EOF\n\n", "stdout_lines": ["Checking '/root/.ansible/tmp/ansible-tmp-1601894343.1803482-191455237155978/source' SUCCESS", "Found:", " - global config", " - route", " - 0 inhibit rules", " - 1 receivers", " - 1 templates", " FAILED: template: slack.tmpl:14: unexpected EOF", ""]}
Anything else we need to know?:
This is my slack.tmpl template
# This builds the silence URL. We exclude the alertname in the range
# to avoid the issue of having trailing comma separator (%2C) at the end
# of the generated URL
{{ define "__alert_silence_link" -}}
{{ .ExternalURL }}/#/silences/new?filter=%7B
{{- range .CommonLabels.SortedPairs -}}
{{- if ne .Name "alertname" -}}
{{- .Name }}%3D"{{- .Value -}}"%2C%20
{{- end -}}
{{- end -}}
alertname%3D"{{ .CommonLabels.alertname }}"%7D
{{- end }}
{{ define "__alert_severity_prefix" -}}
{{ if ne .Status "firing" -}}
:lgtm:
{{- else if eq .Labels.severity "critical" -}}
:fire:
{{- else if eq .Labels.severity "warning" -}}
:warning:
{{- else -}}
:question:
{{- end }}
{{- end }}
{{ define "__alert_severity_prefix_title" -}}
{{ if ne .Status "firing" -}}
:lgtm:
{{- else if eq .CommonLabels.severity "critical" -}}
:fire:
{{- else if eq .CommonLabels.severity "warning" -}}
:warning:
{{- else if eq .CommonLabels.severity "info" -}}
:information_source:
{{- else -}}
:question:
{{- end }}
{{- end }}
{{/* First line of Slack alerts */}}
{{ define "slack.monzo.title" -}}
[{{ .Status | toUpper -}}
{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{- end -}}
] {{ template "__alert_severity_prefix_title" . }} {{ .CommonLabels.alertname }}
{{- end }}
{{/* Color of Slack attachment (appears as line next to alert )*/}}
{{ define "slack.monzo.color" -}}
{{ if eq .Status "firing" -}}
{{ if eq .CommonLabels.severity "warning" -}}
warning
{{- else if eq .CommonLabels.severity "critical" -}}
danger
{{- else -}}
#439FE0
{{- end -}}
{{ else -}}
good
{{- end }}
{{- end }}
{{/* Emoji to display as user icon (custom emoji supported!) */}}
{{ define "slack.monzo.icon_emoji" }}:prometheus:{{ end }}
{{/* The text to display in the alert */}}
{{ define "slack.monzo.text" -}}
{{ range .Alerts }}
{{- if .Annotations.message }}
{{ .Annotations.message }}
{{- end }}
{{- if .Annotations.description }}
{{ .Annotations.description }}
{{- end }}
{{- end }}
{{- end }}
{{- /* If none of the below matches, send to #monitoring-no-owner, and we
can then assign the expected code_owner to the alert or map the code_owner
to the correct channel */ -}}
{{ define "__get_channel_for_code_owner" -}}
{{- if eq . "platform-team" -}}
platform-alerts
{{- else if eq . "security-team" -}}
security-alerts
{{- else -}}
monitoring-no-owner
{{- end -}}
{{- end }}
{{- /* Select the channel based on the code_owner. We only expect to get
into this template function if the code_owners label is present on an alert.
This is to defend against us accidentally breaking the routing logic. */ -}}
{{ define "slack.monzo.code_owner_channel" -}}
{{- if .CommonLabels.code_owner }}
{{ template "__get_channel_for_code_owner" .CommonLabels.code_owner }}
{{- else -}}
monitoring
{{- end }}
{{- end }}
{{ define "slack.monzo.link_button_text" -}}
{{- if .CommonAnnotations.link_text -}}
{{- .CommonAnnotations.link_text -}}
{{- else -}}
Link
{{- end }} :link:
{{- end }}
I want to provide my own config template with receivers and routes hard coded. So currently I silence the preflight checks using:
alertmanager_routes: null
alertmanager_receivers: null
Maybe this could be built-in behaviour? I can provide a PR if needed.
Use the mechanism introduced in cloudalchemy/ansible-prometheus to allow using latest as a version specifier instead of numbers.
What did you do?
Tried to install alertmanager with log.level=debug.
Did you expect to see something different?
No.
Environment
dev
Role version:
latest
Ansible version information:
ansible --version
Variables:
Anything else we need to know?:
Does this role support log.level=debug, or any log.level at all?
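Judging from the role defaults dumped earlier on this page (see alertmanager_config_flags_extra with its data.retention example), extra CLI flags are passed as a map, so something like this should work, though I have not verified it:
alertmanager_config_flags_extra:
  log.level: debug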
We need to add preflight checks to ensure alertmanager is configured properly. Currently the configuration isn't checked, and this can lead to an unusable alertmanager.
Unless the clustering setup is modified, alertmanager fails with:
"create memberlist: Failed to get final advertise address: No private IP address found, and explicit IP not provided"
This can be avoided by having:
alertmanager_cluster:
  listen-address: ""
as documented at: https://github.com/prometheus/alertmanager#high-availability
--cluster.listen-address string: cluster listen address (default "0.0.0.0:9094"; empty string disables HA mode)
I propose this be set as the default.
I suspect this is from a recent upstream change. Maybe: prometheus/alertmanager@78c9ebc
CI did not catch this so perhaps it is testing successful deployment but not successful service startup.
I could make a PR for the proposed (trivial!) default change, but I'm not yet well positioned to delve into the CI setup.
What happened?
The checksum fetch for the requested version fails.
Did you expect to see something different?
The download should work with version "latest", as documented in README.md.
How to reproduce it (as minimally and precisely as possible):
Use the role in another role:
playbook.yml:
---
- hosts: monitoringserver
  roles:
    - testrole
roles/testrole/tasks/main.yml (example from README.md):
---
- name: configure prometheus alertmanager
  ansible.builtin.include_role:
    name: ansible-alertmanager
  vars:
    alertmanager_version: latest
    alertmanager_slack_api_url: "http://example.org"
    alertmanager_receivers:
      - name: slack
        slack_configs:
          - send_resolved: true
            channel: '#alerts'
    alertmanager_route:
      group_by: ['alertname', 'cluster', 'service']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 4h
      receiver: slack
Environment
Role version:
0.19.1 (also tested with master)
Ansible version information:
ansible [core 2.12.4]
config file = /home/chris/Documents/shared_projects/ansible-server/ansible.cfg
configured module search path = ['/home/chris/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /usr/lib/python3/dist-packages/ansible
ansible collection location = /home/chris/.ansible/collections:/usr/share/ansible/collections
executable location = /usr/bin/ansible
python version = 3.10.5 (main, Jun 8 2022, 09:26:22) [GCC 11.3.0]
jinja version = 3.0.3
libyaml = True
Variables:
alertmanager_version: latest
TASK [cloudalchemy.alertmanager : Set prometheus version to 0.24.0] ************************************************************************************************************************************************************************************************************************
ok: [spinach]
TASK [cloudalchemy.alertmanager : Get checksum for amd64 architecture] *********************************************************************************************************************************************************************************************************************
fatal: [spinach]: FAILED! => {"msg": "An unhandled exception occurred while running the lookup plugin 'url'. Error was a <class 'ansible.errors.AnsibleError'>, original message: Received HTTP error for https://github.com/prometheus/alertmanager/releases/download/vlatest/sha256sums.txt : HTTP Error 404: Not Found. Received HTTP error for https://github.com/prometheus/alertmanager/releases/download/vlatest/sha256sums.txt : HTTP Error 404: Not Found"}
Anything else we need to know?:
Probably goes wrong here:
ansible-alertmanager/tasks/preflight.yml
Lines 34 to 37 in f295fe7
I tried adding custom templates using alertmanager_template_files, which works. What doesn't work is when I configure the receiver:
alertmanager_receivers:
  - name: "receiver"
    pagerduty_configs:
      - routing_key: "xxx"
        send_resolved: true
        description: '{{ template "pagerduty.custom.description" . }}'
The job fails in preflight.yml, in "Fail when there are no receivers defined", and Ansible says:
Error was a <class 'ansible.errors.AnsibleError'>, original message: template error while templating string: expected token 'end of print statement', got 'string'. String: {{ template \"pagerduty.custom.description\" . }}
What am I expected to write to use custom templates?
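Based on the role defaults shown earlier on this page, which wrap every Go-template string in Jinja raw markers, the answer is probably to escape the string so Ansible does not try to render it:
alertmanager_receivers:
  - name: "receiver"
    pagerduty_configs:
      - routing_key: "xxx"
        send_resolved: true
        description: '{% raw %}{{ template "pagerduty.custom.description" . }}{% endraw %}'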
Hi,
If I put these per-severity routes, which I am using in my cluster:
alertmanager_child_routes:
  - match:
      severity: Lowest
    receiver: slack
  - match:
      severity: Low
    receiver: jira
    continue: true
  - match:
      severity: Low
    receiver: slack
  - match:
      severity: High
    receiver: jira
    continue: true
  - match:
      severity: High
    receiver: slack
In alertmanager.yml I get:
routes:
- match:
    severity: Lowest
  receiver: slack
- continue: true
  match:
    severity: Low
  receiver: jira
- match:
    severity: Low
  receiver: slack
- continue: true
  match:
    severity: High
  receiver: jira
- match:
    severity: High
  receiver: slack
- continue: true
Expected:
- match:
    severity: Lowest
  receiver: slack
- match:
    severity: Low
  receiver: jira
  continue: true
- match:
    severity: Low
  receiver: slack
- match:
    severity: High
  receiver: jira
  continue: true
- match:
    severity: High
  receiver: slack
If, say, your container doesn't have an IP address compatible with RFC1918, alertmanager will fail to start.
Quick fix: add --cluster.advertise-address=127.0.0.1:$YOUR_PORT as a CLI parameter
This could be done in a playbook, I guess; a sketch follows.
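A sketch of how that could look, assuming alertmanager_config_flags_extra is passed straight through to the binary as flags (the port here is a placeholder):
alertmanager_config_flags_extra:
  cluster.advertise-address: "127.0.0.1:9094"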
Not sure this is a bug, but it was an unexpected surprise to me and it's causing me some pain.
I have found that when trying to set the alertmanager_external_url value, it is only set when running the install tag, and not the configure tag. Digging in, the reason is that the web.external-url value is provided to the binary as a command-line flag, and that is baked into the alertmanager.service template, which is only rendered during the install step.
This is a problem for me, because I'm trying to bake AMIs in AWS with most of the installation done ahead of time, and then deployment instantiates these AMIs and applies dynamic configuration to them, things like IP addresses, ports, URLs, etc. I've found that the setup here makes it so that I really can't do that.
Since those arguments to the binary are really (at least, in my mind) configuration, shouldn't the configure task update the flag as appropriate, or re-template the service?
The README calls the variable alertmanager_binaries_local_dir, but it is actually called alertmanager_binary_local_dir.
This causes playbook failures when people use the name from the README.
What happened?
These tasks always report a changed state, in normal runs and in check mode alike. They shouldn't make noise if alertmanager is already installed.
TASK [cloudalchemy.alertmanager : download alertmanager binary to local folder] ***
changed: [myhost]
TASK [cloudalchemy.alertmanager : unpack alertmanager binaries] ****************
changed: [myhost]
Did you expect to see something different?
After a first successful run, these tasks shouldn't report a changed state on subsequent runs (assuming no changes on the host, of course).
How to reproduce it (as minimally and precisely as possible):
Install this role on a Ubuntu 20.04.1 LTS.
Environment
Role version:
0.19.1
Ansible version information:
2.10.2
Variables:
Anything else we need to know?:
Are the check_mode: false statements used correctly here?
Hello,
From what I have seen, this ansible role doesn't support the webhook config field in alertmanager.yml.
https://prometheus.io/docs/alerting/configuration/#webhook_config
Will you support it in the near future?
Thanks a lot for your work!
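For what it's worth, another issue further down this page passes webhook_configs straight through alertmanager_receivers, so something like the following may already work out of the box (the URL is a placeholder):
alertmanager_receivers:
  - name: default
    webhook_configs:
      - send_resolved: true
        url: http://localhost:9087/alert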
Variables here should be similar in name to the ones from the prometheus role.
When creating an http config via the default template, it is unable to handle basic auth. The issue appears to be here:
{% if alertmanager_http_config | length %}
http_config:
{% endif %}
{% for key, value in alertmanager_http_config.items() %}
  {{ key }}: {{ value | quote }}
{% endfor %}
I was able to fix it locally by adding a new variable and modifying the template like so:
{% if alertmanager_http_config | length %}
http_config:
{% endif %}
{% if alertmanager_http_config_basic_auth | length %}
  basic_auth:
{% for key, value in alertmanager_http_config_basic_auth.items() %}
    {{ key }}: {{ value | quote }}
{% endfor %}
{% endif %}
{% for key, value in alertmanager_http_config.items() %}
  {{ key }}: {{ value | quote }}
{% endfor %}
This can be fixed by using a custom template, but it seems like it should be part of the default; I would be happy to put in a PR. Also, it appears that the alertmanager_http_config variable is missing altogether from the README.
The Alertmanager service fails to start when Prometheus has not started yet. We observe this mainly after a machine reboot:
# journalctl -u alertmanager.service --boot
-- Logs begin at Thu 2019-07-04 01:20:09 UTC, end at Wed 2019-07-10 08:03:29 UTC. --
Jul 10 00:00:19 prometheus.example.com systemd[1]: Started Prometheus Alertmanager.
Jul 10 00:00:19 prometheus.example.com alertmanager[2859]: level=info ts=2019-07-10T00:00:19.771Z caller=main.go:197 msg="Starting Alertmanager" version="(version=0.18.0, branch=HEAD, revision=1ace0f76b7101cccc149d7298022df36039858ca)"
Jul 10 00:00:19 prometheus.example.com alertmanager[2859]: level=info ts=2019-07-10T00:00:19.773Z caller=main.go:198 build_context="(go=go1.12.6, user=root@868685ed3ed0, date=20190708-14:31:49)"
Jul 10 00:00:19 prometheus.example.com alertmanager[2859]: level=warn ts=2019-07-10T00:00:19.799Z caller=cluster.go:154 component=cluster err="couldn't deduce an advertise address: no private IP found, explicit advertise addr not provided"
Jul 10 00:00:19 prometheus.example.com alertmanager[2859]: level=error ts=2019-07-10T00:00:19.815Z caller=main.go:222 msg="unable to initialize gossip mesh" err="create memberlist: Failed to get final advertise address: No private IP addres
Jul 10 00:00:19 prometheus.example.com systemd[1]: alertmanager.service: Main process exited, code=exited, status=1/FAILURE
Jul 10 00:00:19 prometheus.example.com systemd[1]: alertmanager.service: Failed with result 'exit-code'.
Jul 10 00:00:20 prometheus.example.com systemd[1]: alertmanager.service: Service hold-off time over, scheduling restart.
Jul 10 00:00:20 prometheus.example.com systemd[1]: alertmanager.service: Scheduled restart job, restart counter is at 1.
Jul 10 00:00:20 prometheus.example.com systemd[1]: Stopped Prometheus Alertmanager.
Jul 10 00:00:20 prometheus.example.com systemd[1]: Started Prometheus Alertmanager.
Jul 10 00:00:20 prometheus.example.com alertmanager[3231]: level=info ts=2019-07-10T00:00:20.213Z caller=main.go:197 msg="Starting Alertmanager" version="(version=0.18.0, branch=HEAD, revision=1ace0f76b7101cccc149d7298022df36039858ca)"
Jul 10 00:00:20 prometheus.example.com alertmanager[3231]: level=info ts=2019-07-10T00:00:20.213Z caller=main.go:198 build_context="(go=go1.12.6, user=root@868685ed3ed0, date=20190708-14:31:49)"
Jul 10 00:00:20 prometheus.example.com alertmanager[3231]: level=warn ts=2019-07-10T00:00:20.221Z caller=cluster.go:154 component=cluster err="couldn't deduce an advertise address: no private IP found, explicit advertise addr not provided"
Jul 10 00:00:20 prometheus.example.com alertmanager[3231]: level=error ts=2019-07-10T00:00:20.227Z caller=main.go:222 msg="unable to initialize gossip mesh" err="create memberlist: Failed to get final advertise address: No private IP addres
Jul 10 00:00:20 prometheus.example.com systemd[1]: alertmanager.service: Main process exited, code=exited, status=1/FAILURE
Jul 10 00:00:20 prometheus.example.com systemd[1]: alertmanager.service: Failed with result 'exit-code'.
Jul 10 00:00:20 prometheus.example.com systemd[1]: alertmanager.service: Service hold-off time over, scheduling restart.
Jul 10 00:00:20 prometheus.example.com systemd[1]: alertmanager.service: Scheduled restart job, restart counter is at 2.
Jul 10 00:00:20 prometheus.example.com systemd[1]: Stopped Prometheus Alertmanager.
Jul 10 00:00:20 prometheus.example.com systemd[1]: Started Prometheus Alertmanager.
Jul 10 00:00:20 prometheus.example.com alertmanager[3355]: level=info ts=2019-07-10T00:00:20.468Z caller=main.go:197 msg="Starting Alertmanager" version="(version=0.18.0, branch=HEAD, revision=1ace0f76b7101cccc149d7298022df36039858ca)"
Jul 10 00:00:20 prometheus.example.com alertmanager[3355]: level=info ts=2019-07-10T00:00:20.468Z caller=main.go:198 build_context="(go=go1.12.6, user=root@868685ed3ed0, date=20190708-14:31:49)"
Jul 10 00:00:20 prometheus.example.com alertmanager[3355]: level=warn ts=2019-07-10T00:00:20.472Z caller=cluster.go:154 component=cluster err="couldn't deduce an advertise address: no private IP found, explicit advertise addr not provided"
Jul 10 00:00:20 prometheus.example.com alertmanager[3355]: level=error ts=2019-07-10T00:00:20.476Z caller=main.go:222 msg="unable to initialize gossip mesh" err="create memberlist: Failed to get final advertise address: No private IP addres
Jul 10 00:00:20 prometheus.example.com systemd[1]: alertmanager.service: Main process exited, code=exited, status=1/FAILURE
Jul 10 00:00:20 prometheus.example.com systemd[1]: alertmanager.service: Failed with result 'exit-code'.
Jul 10 00:00:20 prometheus.example.com systemd[1]: alertmanager.service: Service hold-off time over, scheduling restart.
Jul 10 00:00:20 prometheus.example.com systemd[1]: alertmanager.service: Scheduled restart job, restart counter is at 3.
Jul 10 00:00:20 prometheus.example.com systemd[1]: Stopped Prometheus Alertmanager.
Jul 10 00:00:20 prometheus.example.com systemd[1]: Started Prometheus Alertmanager.
Jul 10 00:00:20 prometheus.example.com alertmanager[3790]: level=info ts=2019-07-10T00:00:20.874Z caller=main.go:197 msg="Starting Alertmanager" version="(version=0.18.0, branch=HEAD, revision=1ace0f76b7101cccc149d7298022df36039858ca)"
Jul 10 00:00:20 prometheus.example.com alertmanager[3790]: level=info ts=2019-07-10T00:00:20.877Z caller=main.go:198 build_context="(go=go1.12.6, user=root@868685ed3ed0, date=20190708-14:31:49)"
Jul 10 00:00:20 prometheus.example.com alertmanager[3790]: level=warn ts=2019-07-10T00:00:20.882Z caller=cluster.go:154 component=cluster err="couldn't deduce an advertise address: no private IP found, explicit advertise addr not provided"
Jul 10 00:00:20 prometheus.example.com alertmanager[3790]: level=error ts=2019-07-10T00:00:20.885Z caller=main.go:222 msg="unable to initialize gossip mesh" err="create memberlist: Failed to get final advertise address: No private IP addres
Jul 10 00:00:20 prometheus.example.com systemd[1]: alertmanager.service: Main process exited, code=exited, status=1/FAILURE
Jul 10 00:00:20 prometheus.example.com systemd[1]: alertmanager.service: Failed with result 'exit-code'.
Jul 10 00:00:21 prometheus.example.com systemd[1]: alertmanager.service: Service hold-off time over, scheduling restart.
Jul 10 00:00:21 prometheus.example.com systemd[1]: alertmanager.service: Scheduled restart job, restart counter is at 4.
Jul 10 00:00:21 prometheus.example.com systemd[1]: Stopped Prometheus Alertmanager.
Jul 10 00:00:21 prometheus.example.com systemd[1]: Started Prometheus Alertmanager.
Jul 10 00:00:21 prometheus.example.com alertmanager[3918]: level=info ts=2019-07-10T00:00:21.109Z caller=main.go:197 msg="Starting Alertmanager" version="(version=0.18.0, branch=HEAD, revision=1ace0f76b7101cccc149d7298022df36039858ca)"
Jul 10 00:00:21 prometheus.example.com alertmanager[3918]: level=info ts=2019-07-10T00:00:21.110Z caller=main.go:198 build_context="(go=go1.12.6, user=root@868685ed3ed0, date=20190708-14:31:49)"
Jul 10 00:00:21 prometheus.example.com alertmanager[3918]: level=warn ts=2019-07-10T00:00:21.115Z caller=cluster.go:154 component=cluster err="couldn't deduce an advertise address: no private IP found, explicit advertise addr not provided"
Jul 10 00:00:21 prometheus.example.com alertmanager[3918]: level=error ts=2019-07-10T00:00:21.118Z caller=main.go:222 msg="unable to initialize gossip mesh" err="create memberlist: Failed to get final advertise address: No private IP addres
Jul 10 00:00:21 prometheus.example.com systemd[1]: alertmanager.service: Main process exited, code=exited, status=1/FAILURE
Jul 10 00:00:21 prometheus.example.com systemd[1]: alertmanager.service: Failed with result 'exit-code'.
Jul 10 00:00:21 prometheus.example.com systemd[1]: alertmanager.service: Service hold-off time over, scheduling restart.
Jul 10 00:00:21 prometheus.example.com systemd[1]: alertmanager.service: Scheduled restart job, restart counter is at 5.
Jul 10 00:00:21 prometheus.example.com systemd[1]: Stopped Prometheus Alertmanager.
Jul 10 00:00:21 prometheus.example.com systemd[1]: alertmanager.service: Start request repeated too quickly.
Jul 10 00:00:21 prometheus.example.com systemd[1]: alertmanager.service: Failed with result 'exit-code'.
Jul 10 00:00:21 prometheus.example.com systemd[1]: Failed to start Prometheus Alertmanager.
We're running Prometheus and Alertmanager on the same host (deployed using your Ansible roles), so waiting for Prometheus seems a good measure:
diff templates/alertmanager.service.j2
[Unit]
-After=network.target
+After=network.target prometheus.service
I realise this might need a new variable and a conditional for general usage (i.e. when both services run on different hosts). Alternatively (or additionally), it might also be useful to add a delay between the retries to give Prometheus a fair chance to start (as you can see in the log above, the restart attempts all happened within 2s):
diff templates/alertmanager.service.j2
Restart=always
+RestartSec=5s
As alertmanager does not write logs to the directory, it is not necessary to create this dir.
Hi,
When will the new release be made?
I've seen on CHANGELOG.md that you've updated ansible-alertmanager to use the latest version of alertmanager, but there is no release yet :(
Thanks !
What happened?
I made a mistake with my Go template. Ansible restarted Alertmanager. The mistake causes Alertmanager to crash on startup. There is no validation done.
Did you expect to see something different?
It would be nice to have an amtool check-config run before the restart.
Unfortunately amtool check-config only accepts alertmanager.yml and not the .tmpl file directly, blocking the use of the validate option. A possible solution would be a task that manually runs amtool check-config, and restarts only if the templates were updated and the check passes.
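A sketch of that proposed task, assuming amtool is on the PATH and the config lives at the role's default location:
- name: Validate alertmanager config after template changes
  ansible.builtin.command: amtool check-config /etc/alertmanager/alertmanager.yml
  changed_when: false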
How to reproduce it (as minimally and precisely as possible):
Mess up a go template, run your playbook.
Environment
0.17.2
ansible 2.8.5
config file = /var/lib/ansible/ansible.cfg
configured module search path = ['/home/vos/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /usr/local/lib/python3.7/dist-packages/ansible
executable location = /usr/local/bin/ansible
python version = 3.7.3 (default, Apr 3 2019, 05:39:12) [GCC 8.3.0]
TASK [cloudalchemy.alertmanager : copy alertmanager config] *******************************************************************************************************************************************************
ok: [prometheus1]
TASK [cloudalchemy.alertmanager : create systemd service unit] ****************************************************************************************************************************************************
ok: [prometheus1]
TASK [cloudalchemy.alertmanager : copy alertmanager template files] ***********************************************************************************************************************************************
--- before: /etc/alertmanager/templates/vos.tmpl
+++ after: /var/lib/ansible/playbooks/files/alertmanager/templates/vos.tmpl
@@ -1,10 +1,11 @@
+{{ invalidtemplatecode }}
{{ define "slack.default.text" }}
[PromQL Expression]({{ (index .Alerts 0).GeneratorURL }})
{{ if gt (len .Alerts.Firing) 0 }}
{{ range .Alerts.Firing }}{{ .Annotations.description }}
{{ end }}
{{ end }}
{{ if gt (len .Alerts.Resolved) 0 }}
**Resolved:**
{{ range .Alerts.Resolved }}~~{{ .Annotations.description }}~~
_start: {{ .StartsAt }}_
changed: [prometheus1] => (item=/var/lib/ansible/playbooks/files/alertmanager/templates/vos.tmpl)
TASK [cloudalchemy.alertmanager : ensure alertmanager service is started and enabled] *****************************************************************************************************************************
ok: [prometheus1]
RUNNING HANDLER [cloudalchemy.alertmanager : restart alertmanager] ************************************************************************************************************************************************
changed: [prometheus1]
May 10 22:18:41 prometheus1 systemd[1]: alertmanager.service: Main process exited, code=exited, status=1/FAILURE
May 10 22:18:41 prometheus1 systemd[1]: alertmanager.service: Failed with result 'exit-code'.
Anything else we need to know?:
Hello,
with this role it is not possible to configure a mesh.
To configure a mesh, each peer needs to know its peers. The documentation says:
--mesh.peer value: initial peers (repeat flag for each additional peer)
so I need to be able to repeat the mesh.peer flag in the role var "alertmanager_cli_flags"
I tried
alertmanager_cli_flags:
mesh.peer=123.456.789.1:6783
mesh.peer=123.456.789.2:6783
but only the last "mesh.peer" is passed to the CLI as a parameter. I looked at the code, and it seems that a dictionary/map is used, which always overrides the same key with another value. That's why only the last key is passed.
A workaround I found is setting a default value and passing "more":
alertmanager_cli_flags:
  log.level: "info --mesh.peer=123.456.789.1:6783 --mesh.peer=123.456.789.2:6783"
Hi
Please add a real-life example playbook. An "empty" one just including the role isn't enough, as the asserts enforce that you need a receiver & route, and coming up with the proper YAML is a bit tedious. Just something like this would have helped getting stuff really started:
---
# https://github.com/cloudalchemy/ansible-alertmanager
alertmanager_web_external_url: alertmanager.domain.tld
alertmanager_receivers:
  - name: infra-ml
    email_configs:
      - to: "[email protected]"
        from: "[email protected]"
        smarthost: "smtp.gmail.com:587"
        auth_username: "[email protected]"
        auth_identity: "[email protected]"
        auth_password: UseGmailAppToken
alertmanager_route:
  group_by: ["..."]
  receiver: infra-ml
Best regards,
The latest release of alertmanager (0.13.0) requires a change to the systemd alertmanager.service template: all startup options need to be prefixed with '--' and not just '-' (e.g. --config.file instead of -config.file).
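For illustration, the ExecStart line in the service template would then look roughly like this (the binary path and exact flag set are assumptions; the variables are the role's own):
ExecStart=/usr/local/bin/alertmanager \
  --config.file={{ alertmanager_config_dir }}/alertmanager.yml \
  --storage.path={{ alertmanager_db_dir }}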
What happened?
The checksum lookup always failed, and with it the whole playbook. I suspect the ansible url lookup doesn't follow the HTTP 302 redirect.
Did you expect to see something different?
The checksum being pulled correctly.
How to reproduce it (as minimally and precisely as possible):
git clone https://github.com/cloudalchemy/ansible-alertmanager roles/alertmanager
Create a basic playbook and include it as roles.
Environment
$ curl -s https://github.com/prometheus/alertmanager/releases/download/v0.20.0/sha256sums.txt
<html><body>You are being <a href="https://github-production-release-asset-2e65be.s3.amazonaws.com/11452538/a510c800-1c60-11ea-8d4c-414fc4fea6b5?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20200303%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20200303T081307Z&X-Amz-Expires=300&X-Amz-Signature=857a1313e6f3cf669e7cef5c2fceb7b529cab239a1835ba201aad78b68bc6748&X-Amz-SignedHeaders=host&actor_id=0&response-content-disposition=attachment%3B%20filename%3Dsha256sums.txt&response-content-type=application%2Foctet-stream">redirected</a>.</body></html>
$ curl -s https://github.com/prometheus/alertmanager/releases/download/v0.20.0/sha256sums.txt -L
78d741b3bdcb910619f498d7662969cae363c9d5840d7cef1e5481f103de59ca alertmanager-0.20.0.darwin-386.tar.gz
5134585c6200856ca70f61502a460413f6a0e6c848e7156724e5d568f77aba56 alertmanager-0.20.0.darwin-amd64.tar.gz
8bc8ac50a4a7545b0a2e6a6d710ab4c51e8ccd85d519013db9495e4549546f93 alertmanager-0.20.0.dragonfly-amd64.tar.gz
810fabaad75f1d5d172f48eeafb6099f669c37baababff458e798474e006969e alertmanager-0.20.0.freebsd-386.tar.gz
5789adb5d4da773ec8480458c3d445985d9fbb3ce8cbe939090508b0a96f436d alertmanager-0.20.0.freebsd-amd64.tar.gz
645fd8b1bb541a360521b12694e7483017f9c5b95152a313630b8f3c06cbeb3e alertmanager-0.20.0.freebsd-armv6.tar.gz
48d3b69ca5618bd6632b10563eab1e7331ffdf9e1b6943cc34002beaccdec7cb alertmanager-0.20.0.freebsd-armv7.tar.gz
0f922a82a7358a33736d388faa9b44c661f4844c15a4c2eadeb71d1f6738bd66 alertmanager-0.20.0.linux-386.tar.gz
3a826321ee90a5071abf7ba199ac86f77887b7a4daa8761400310b4191ab2819 alertmanager-0.20.0.linux-amd64.tar.gz
ee219113b4dad6042f3f88dccea48ee15ac5e7d5c84933bc90f320819b71e1c5 alertmanager-0.20.0.linux-arm64.tar.gz
11d92562c72d9fc747db45bcf48d181f3db7b178a254af4877f74ab20f986a6a alertmanager-0.20.0.linux-armv5.tar.gz
89762e97cb18b4a47557cf74734fb398645ea5d8191b71b248b0dc515073e370 alertmanager-0.20.0.linux-armv6.tar.gz
5ebd33da8d61cef6ea1aab2ecc73310ff3fe4320eb76851ae71e22e3c5ddbc36 alertmanager-0.20.0.linux-armv7.tar.gz
fbd6ab4471b4c9c167d7fbe8f4b90cca2415b2d5426a7ce74734a9182613573e alertmanager-0.20.0.linux-mips64.tar.gz
5ea2ee935119d15247359d9bf1124c00dfb8e62882be8862721a477ff728b3b4 alertmanager-0.20.0.linux-mips64le.tar.gz
66c1aa886c48e6aef7cda3d835fa985254d59a2a811b204f69aa806e7796e806 alertmanager-0.20.0.linux-ppc64.tar.gz
1cf6e81a3964e63019026518574722922fd6d98fb256f3dba49efdcff20b14ff alertmanager-0.20.0.linux-ppc64le.tar.gz
53e8be5b029dc00fce97d1f79d5202a54ad5b20aa5ca135fc168f5eefd0f6b5c alertmanager-0.20.0.linux-s390x.tar.gz
d53382b389139876e13e22f500c19cd79fde67ab899dae51961bdb0e097734e0 alertmanager-0.20.0.netbsd-386.tar.gz
378a19ab208631f989ab353b0b3e3e4c1637202ca1b6c47f134ce1470560912f alertmanager-0.20.0.netbsd-amd64.tar.gz
2defc8b8ab59a291aa0e81c032aa27185e9c4ea702858e7e994f0df1dafd4164 alertmanager-0.20.0.netbsd-armv6.tar.gz
8463f50957935c1723d1f2ec5d386c9a1379442dde655d8ea430d6ded1a9d47e alertmanager-0.20.0.netbsd-armv7.tar.gz
476edebdff737cc7356556637b8caaea7b3708a2b8b370ab8d3201bd2c52cc36 alertmanager-0.20.0.openbsd-386.tar.gz
039d0fc1cc00e710f94ceda53f1bb4ec22341965c0e4bc623ab511edf1d61930 alertmanager-0.20.0.openbsd-amd64.tar.gz
28ad18728935412dcddc10a6ab7bdd4858bd7c832444bfb1880266caf2a310f6 alertmanager-0.20.0.windows-386.tar.gz
5887902bea633d8b3396804760308acc0a2631b7ca85df75d8c526cd1985d62b alertmanager-0.20.0.windows-amd64.tar.gz
Role version:
de29024767b37a6df1073cbf86254ea908a378a0
Ansible version information:
$ ansible --version
ansible 2.9.4
config file = /Users/teochenglim/thunes/sysadmin/ansible/projects/cobra/ansible.cfg
configured module search path = ['/Users/teochenglim/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /usr/local/Cellar/ansible/2.9.4_1/libexec/lib/python3.8/site-packages/ansible
executable location = /usr/local/bin/ansible
python version = 3.8.1 (default, Dec 27 2019, 18:06:00) [Clang 11.0.0 (clang-1100.0.33.16)]
My workaround was to hard-code it:
go_arch: "amd64"
alertmanager_checksum: "3a826321ee90a5071abf7ba199ac86f77887b7a4daa8761400310b4191ab2819"
___________________________________________________________
< TASK [alertmanager : Get checksum for amd64 architecture] >
-----------------------------------------------------------
\ ^__^
\ (oo)\_______
(__)\ )\/\
||----w |
|| ||
objc[72571]: +[__NSCFConstantString initialize] may have been in progress in another thread when fork() was called.
objc[72571]: +[__NSCFConstantString initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
ERROR! A worker was found in a dead state
Anything else we need to know?:
I commented out one task in tasks/preflight.yml and hard-coded the variable using group_vars. I am running this on macOS.
# - name: "Get checksum for {{ go_arch }} architecture"
#   set_fact:
#     alertmanager_checksum: "{{ item.split(' ')[0] }}"
#   with_items:
#     - "{{ lookup('url', 'https://github.com/prometheus/alertmanager/releases/download/v' + alertmanager_version + '/sha256sums.txt', wantlist=True) | list }}"
#   when:
#     - "('linux-' + go_arch + '.tar.gz') in item"
#     - alertmanager_binary_local_dir | length == 0
go_arch: "amd64"
alertmanager_checksum: "3a826321ee90a5071abf7ba199ac86f77887b7a4daa8761400310b4191ab2819"
What happened?
Even after setting alertmanager_binary_local_dir, the task "Get checksum for {{ go_arch }} architecture" gets executed and fails (the URL lookup fails because we are offline).
Did you expect to see something different?
no failure (task skipped)
How to reproduce it (as minimally and precisely as possible):
Environment
2.9.9
alertmanager_binary_local_dir: "alertmanager-0.21.0.linux-amd64"
TASK [ansible-alertmanager-master : Get checksum for amd64 architecture] *********************************************************************************************************************************************************************
task path: /home/user/ansible/roles/ansible-alertmanager-master/tasks/preflight.yml:46
url lookup connecting to https://github.com/prometheus/alertmanager/releases/download/v0.21.0/sha256sums.txt
fatal: FAILED! => {
"msg": "An unhandled exception occurred while running the lookup plugin 'url'. Error was a <class 'ansible.errors.AnsibleError'>, original message: Failed lookup url for https://github.com/prometheus/alertmanager/releases/download/v0.21.0/sha256sums.txt : <urlopen error [Errno -2] Name or service not known>"
}
Hi,
Commit 6f050af broke this role for me.
Can someone please create a tag from before this change?
Kind regards,
Mathias
When running the playbook against a Centos 8 host, I receive the following error:
failed: [host] (item=policycoreutils-python) => {"ansible_loop_var": "item", "attempts": 5, "changed": false, "failures": ["No package policycoreutils-python available."], "item": "policycoreutils-python", "msg": "Failed to install some of the specified packages", "rc": 1, "results": []}
Could this be resolved by having the same centos.yml and centos-8.yml files in the vars directory, like https://github.com/cloudalchemy/ansible-prometheus/tree/master/vars ?
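If it helps, a sketch of what vars/centos-8.yml could hold (the variable name is hypothetical; policycoreutils-python-utils is the EL8 replacement for policycoreutils-python):
# vars/centos-8.yml (hypothetical variable name)
selinux_packages:
  - policycoreutils-python-utils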
What is missing?
Add an example playbook to README.md showing that alertmanager_route and alertmanager_receivers are needed for the role to achieve the desired state.
Even though this is in the variable documentation, it does not explain that they are required, and since the demonstration project is archived, it would be worthwhile to provide a basic example (functional or not) with these two variables.
Why do we need it?
A quickstart for anyone that wants to use this excellent role.
Environment
Role version:
3f4f089cae9fb6cf7a71c27745a3a5e6eac0b929
Ansible version information:
ansible --version
Anything else we need to know?:
If desired, I can submit a PR with the content below (a simple boilerplate for Telegram) that I am using in another repository, for example:
- name: AlertManager role
  hosts: monitoring
  become: yes
  roles:
    - role: cloudalchemy.alertmanager
      vars:
        alertmanager_route:
          group_by: [job]
          receiver: default
        alertmanager_receivers:
          - name: default
            webhook_configs:
              - send_resolved: True
                url: http://localhost:9087/alert/-chatid
Thanks for the excellent role!