Ansible Collection for Prometheus
Documentation: https://prometheus-community.github.io/ansible/
License: Apache License 2.0
Hello, the node_exporter installation is failing during this part:
TASK [prometheus.prometheus.node_exporter : Download node_exporter binary to local folder] ************************
fatal: [...*]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ubuntu@localhost: Permission denied (publickey).", "unreachable": true}
I tried this line and got no errors, but the installation ended up on the provisioning machine instead of the target:
ansible-playbook nodeexp.yml -i inventory --connection=local
I'm using the simplest example playbook, and all the other tasks executed fine.
TASK [prometheus.prometheus.node_exporter : Copy the node_exporter systemd service file] ************************************************************************************************************
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: ): 'AnsibleUnicode' object is not callable. 'AnsibleUnicode' object is not callable
fatal: [agn-an-afg-prod-republisher]: FAILED! => changed=false
msg: |-
AnsibleError: Unexpected templating type error occurred on ({{ ansible_managed | comment }}
[Unit]
Description=Prometheus Node Exporter
After=network-online.target
[Service]
Type=simple
User={{ node_exporter_system_user }}
Group={{ node_exporter_system_group }}
ExecStart={{ node_exporter_binary_install_dir }}/node_exporter \
{% for collector in node_exporter_enabled_collectors -%}
{% if not collector is mapping %}
'--collector.{{ collector }}' \
{% else -%}
{% set name, options = (collector.items()|list)[0] -%}
'--collector.{{ name }}' \
{% for k,v in options|dictsort %}
'--collector.{{ name }}.{{ k }}={{ v }}' \
{% endfor -%}
{% endif -%}
{% endfor -%}
{% for collector in node_exporter_disabled_collectors %}
'--no-collector.{{ collector }}' \
{% endfor %}
{% if node_exporter_tls_server_config | length > 0 or node_exporter_http_server_config | length > 0 or node_exporter_basic_auth_users | length > 0 %}
{% if node_exporter_version is version('1.5.0', '>=') %}
'--web.config.file=/etc/node_exporter/config.yaml' \
{% else %}
'--web.config=/etc/node_exporter/config.yaml' \
{% endif %}
{% endif %}
'--web.listen-address={{ node_exporter_web_listen_address }}' \
'--web.telemetry-path={{ node_exporter_web_telemetry_path }}'
SyslogIdentifier=node_exporter
Restart=always
RestartSec=1
StartLimitInterval=0
{% set ns = namespace(protect_home = 'yes') %}
{% for m in ansible_mounts if m.mount.startswith('/home') %}
{% set ns.protect_home = 'read-only' %}
{% endfor %}
{% if node_exporter_textfile_dir.startswith('/home') %}
{% set ns.protect_home = 'read-only' %}
{% endif %}
ProtectHome={{ ns.protect_home }}
NoNewPrivileges=yes
{% if (ansible_facts.packages.systemd | first).version is version('232', '>=') %}
ProtectSystem=strict
ProtectControlGroups=true
ProtectKernelModules=true
ProtectKernelTunables=yes
{% else %}
ProtectSystem=full
{% endif %}
[Install]
WantedBy=multi-user.target
): 'AnsibleUnicode' object is not callable. 'AnsibleUnicode' object is not callable
Version:
ansible [core 2.15.2]
config file = /workspace/myproject/ansible.cfg
configured module search path = ['/home/gitpod/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /usr/lib/python3/dist-packages/ansible
ansible collection location = /workspace/myproject/collections
executable location = /usr/bin/ansible
python version = 3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0] (/usr/bin/python3)
jinja version = 3.0.3
libyaml = True
Collection version: 0.7.0
node_exporter variables I am defining:
node_exporter_version: latest
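For anyone debugging the 'AnsibleUnicode' object is not callable message above: this class of TypeError can be reproduced in plain Jinja2 whenever a template ends up calling a value that is really a string. A minimal generic repro (not a claim about the collection's actual root cause, which may be version-specific):

```python
# Minimal reproduction (plain Jinja2, not the collection's template): the
# "object is not callable" error appears when a template calls something
# that has been bound to a plain string.
import jinja2

env = jinja2.Environment()
template = env.from_string("{{ f() }}")
try:
    template.render(f="not a function")
except TypeError as err:
    print(err)  # 'str' object is not callable
```

In Ansible the string type is `AnsibleUnicode` instead of `str`, which is why the message names that class.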
systemd_exporter.service failed on startup with the error: unknown long flag '--collector.enable-restart-count'
after adding vars to my playbook:
- hosts: prometheus
  roles:
    - prometheus.prometheus.systemd_exporter
  vars:
    systemd_exporter_enable_ip_accounting: true
    systemd_exporter_enable_restart_count: true
Logs:
ahoyt@prom1:~$ sudo journalctl --unit systemd_exporter --since "10 minutes ago"
Jul 13 19:26:12 prom1 systemd[1]: Started Prometheus SystemD Exporter.
Jul 13 19:26:12 prom1 systemd_exporter[45867]: systemd_exporter: error: unknown long flag '--collector.enable-restart-count', try --help
Jul 13 19:26:12 prom1 systemd[1]: systemd_exporter.service: Main process exited, code=exited, status=1/FAILURE
Jul 13 19:26:12 prom1 systemd[1]: systemd_exporter.service: Failed with result 'exit-code'.
Jul 13 19:26:14 prom1 systemd[1]: systemd_exporter.service: Scheduled restart job, restart counter is at 6104.
Jul 13 19:26:14 prom1 systemd[1]: Stopped Prometheus SystemD Exporter.
Jul 13 19:26:14 prom1 systemd[1]: Started Prometheus SystemD Exporter.
It works after prepending systemd. to the flags in systemd_exporter.service.j2, to match https://github.com/prometheus-community/systemd_exporter/blob/main/README.md
Tested with an Ubuntu 22.04.2 LTS VM.
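Concretely, the fix described above changes the rendered flags in systemd_exporter.service from the old names to the systemd.-prefixed ones (spelling taken from the linked README):

```
# before (rejected by systemd_exporter at startup)
--collector.enable-restart-count
# after (matches the upstream README)
--systemd.collector.enable-restart-count
```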
I still get a warning message in ansible 2.14.6:
$ ansible-playbook --inventory inventory
TASK [include roles] **************************************************************************
[WARNING]: Collection prometheus.prometheus does not support Ansible version 2.14.6
In https://github.com/prometheus-community/ansible/blob/main/meta/runtime.yml#L2
the supported ansible versions are 2.9.0*, 2.10.0*, 2.11.0*, 2.12.0*, 2.13.0*, 2.14.0*, 2.15.0*.
Should it be ">= 2.9.0, < 2.16.0"?
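A sketch of what the meta/runtime.yml entry could look like if the maintainers go that way (the exact upper bound is an assumption, their call):

```yaml
# meta/runtime.yml (sketch; upper bound is illustrative)
requires_ansible: '>=2.9.0,<2.16.0'
```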
$ uname -a
Darwin johns-mbp.local 22.5.0 Darwin Kernel Version 22.5.0: Thu Jun 8 22:22:20 PDT 2023; root:xnu-8796.121.3~7/RELEASE_ARM64_T6000 arm64
$ ansible --version
ansible [core 2.11.7]
config file = /Users/jmaguire/src/my-ansible/ansible.cfg
configured module search path = ['/Users/jmaguire/src/my-ansible/library']
ansible python module location = /opt/homebrew/lib/python3.9/site-packages/ansible
ansible collection location = /Users/jmaguire/.ansible/collections:/usr/share/ansible/collections
executable location = /opt/homebrew/bin/ansible
python version = 3.9.17 (main, Jun 6 2023, 14:33:55) [Clang 14.0.3 (clang-1403.0.22.14.1)]
jinja version = 3.0.3
libyaml = True
---
- name: Provision Prometheus
  hosts: prometheus
  roles:
    - role: prometheus.prometheus.prometheus
      tags: [prom]
...
TASK [prometheus.prometheus.prometheus : Discover latest version] *********************************************************************************************
task path: /Users/jmaguire/.ansible/collections/ansible_collections/prometheus/prometheus/roles/prometheus/tasks/preflight.yml:76
skipping: [tornado.johnmaguire.me] => {
"changed": false,
"skip_reason": "Conditional result was False"
}
TASK [prometheus.prometheus.prometheus : Get checksum list] ***************************************************************************************************
task path: /Users/jmaguire/.ansible/collections/ansible_collections/prometheus/prometheus/roles/prometheus/tasks/preflight.yml:93
url lookup connecting to https://github.com/prometheus/prometheus/releases/download/v2.44.0/sha256sums.txt
objc[49407]: +[__NSCFConstantString initialize] may have been in progress in another thread when fork() was called.
objc[49407]: +[__NSCFConstantString initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
ERROR! A worker was found in a dead state
I'd like to request an enhancement of the Ansible role for Alertmanager to support the time_intervals feature in its configuration. This feature allows specifying time-based intervals for alert management, and it would be a valuable addition to the existing functionality.
Here is an example of how the time_intervals could be structured in the YAML configuration:
time_intervals:
  - name: office-hours
    time_intervals:
      - times:
          - start_time: "08:00"
            end_time: "17:00"
        weekdays: ['Monday:Friday']
The proposed enhancement is based on the Prometheus Alerting Configuration documentation, specifically detailed here: Prometheus Alerting Configuration - time_interval.
I recommend introducing new variables, possibly named alertmanager_time_intervals, and updating the Ansible template to accommodate this new configuration option.
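A minimal sketch of how the template hook might look, assuming the hypothetical alertmanager_time_intervals variable name and Ansible's to_nice_yaml filter:

```
{% if alertmanager_time_intervals | length > 0 %}
time_intervals:
{{ alertmanager_time_intervals | to_nice_yaml(indent=2) | indent(2, true) }}
{% endif %}
```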
Thank you for considering this enhancement request, and please let me know if you need any further information or assistance in implementing this feature.
Hi,
this is more a discussion/question or design proposal than an issue, but since discussions are not enabled on this repo,
I'll put it in an issue. (Would you mind enabling discussions on this repo?)
Prometheus does not support conf.d style config includes for static scrape configs and leaves this task to configuration management tools like ansible.
It would be nice if the ansible roles in this repo would support a design where every exporter role could simply drop in their static scrape_config files on the prometheus server (via delegate_to) in a folder like /etc/prometheus/conf.d
and the global config is then constructed by assembling the files in that folder.
So the changes to the current roles would be fairly small: the prometheus role would build /etc/prometheus/prometheus.yml based on the contents of /etc/prometheus/conf.d.
What do you think about it?
I could prepare a PR for it.
Background:
I'm maintaining a role and would like to place the scrape configs on the prometheus server while maintaining compatibility with the prometheus role.
I'm currently creating the scrape config files on the prometheus server and triggering an assemble handler:
nusenu/ansible-relayor@24878b5#diff-67da321934d1a76cffaa1feed7ef7899327b68724357089bf6eeb4af62103715
but this is not compatible with the current prometheus role. The proposed design would make it possible for arbitrary exporter roles to drop in their scrape configs while still maintaining compatibility.
Unfortunately it is not possible to use file_sd_config as a solution, as far as I know, because it does not support setting the metrics_path, and a single server may have many exporters (example.com/exporter1, example.com/exporter2, ...).
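A rough sketch of the proposed drop-in pattern; all task names, file names, and the prometheus_server reference below are illustrative, not part of the current roles:

```yaml
# An exporter role drops its static scrape config on the Prometheus server.
- name: Drop in scrape config for this exporter
  ansible.builtin.template:
    src: my_exporter_scrape_config.yml.j2
    dest: /etc/prometheus/conf.d/my_exporter.yml
  delegate_to: "{{ prometheus_server }}"

# The prometheus role then assembles the global config from the drop-in dir.
- name: Assemble /etc/prometheus/prometheus.yml from conf.d
  ansible.builtin.assemble:
    src: /etc/prometheus/conf.d
    dest: /etc/prometheus/prometheus.yml
  notify: Reload prometheus
```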
On Ansible Core 2.14.2 (Ansible 7.2.0) I got the following warning:
[WARNING]: Error parsing collection metadata requires_ansible value from collection prometheus.prometheus: Invalid specifier: '2.9'
I get the following error when trying to execute a playbook with the latest revision of node_exporter
role.
fatal: [REDACTED-> localhost]: FAILED! =>
{
"attempts": 5,
"changed": false,
"dest": "/tmp/node_exporter-1.5.0.linux-amd64.tar.gz",
"elapsed": 0,
"msg": "An unknown error occurred: URL can't contain control characters. '/prometheus/node_exporter/releases/download/v1.5.0/ node_exporter-1.5.0.linux-amd64.tar.gz' (found at least ' ')",
"url": "https://github.com/prometheus/node_exporter/releases/download/v1.5.0/ node_exporter-1.5.0.linux-amd64.tar.gz"
}
It looks like it's due to the space character in the download URL:
https://github.com/prometheus/node_exporter/releases/download/v1.5.0/ node_exporter-1.5.0.linux-amd64.tar.gz
Hi! I was looking into this collection, coming from cloudalchemy/ansible-node-exporter#266 (comment) and decided to try it out.
I noticed that, for example, here: https://github.com/prometheus-community/ansible/blob/main/roles/node_exporter/tasks/configure.yml#L7 there seems to be a search/replace issue with the group parameter (presumably from a pass to use fully qualified role names everywhere?)
I am not sure where else this occurs as I haven't searched outside of this file.
Thanks!
There are some issues with the TLS configuration in the README (https://github.com/prometheus-community/ansible/tree/main/roles/node_exporter#tls-config).
The Create node_exporter cert dir task defines the directory owner and group, but we should use the default values defined in the node_exporter role.
The Create cert and key task uses the openssl_certificate module, but it was renamed to community.crypto.x509_certificate when it moved to the community.crypto collection.
For the Create cert and key task, we have to create the cert file (e.g. /etc/node_exporter/tls.csr) and the key file (e.g. /etc/node_exporter/tls.key) in advance, but the task that creates these files is not defined anywhere.
To give a user-friendly configuration example, change the playbook.yml as below:
- hosts: all
  pre_tasks:
    - name: Check if pip3 is installed
      ansible.builtin.command:
        cmd: which pip3
      register: pip3_check
      ignore_errors: true

    - name: Install pip3
      ansible.builtin.apt:
        name: python3-pip
      register: pip3_install
      when:
        - pip3_check.rc is defined
        - not pip3_check.rc == 0

    - name: Install cryptography python package
      ansible.builtin.pip:
        name: cryptography
      when:
        - (pip3_check.rc is defined and pip3_check.rc == 0) or (pip3_install is succeeded and pip3_install is not skipped)

    - name: Check if cryptography python package is installed
      # the pipe needs the shell module; the command module would pass "|" literally
      ansible.builtin.shell:
        cmd: pip3 freeze | grep cryptography
      register: cryptography_check
      ignore_errors: true
      when:
        - (pip3_check.rc is defined and pip3_check.rc == 0) or (pip3_install is succeeded and pip3_install is not skipped)

    - name: Create the node_exporter group
      ansible.builtin.group:
        name: "{{ _node_exporter_system_group }}"
        state: present
        system: true
      when: _node_exporter_system_group != "root"

    - name: Create the node_exporter user
      ansible.builtin.user:
        name: "{{ _node_exporter_system_user }}"
        groups: "{{ _node_exporter_system_group }}"
        append: true
        shell: /usr/sbin/nologin
        system: true
        create_home: false
        home: /
      when: _node_exporter_system_user != "root"

    - name: Create node_exporter cert dir
      ansible.builtin.file:
        path: /etc/node_exporter
        state: directory
        owner: "{{ _node_exporter_system_group }}"
        group: "{{ _node_exporter_system_group }}"

    - name: Check /etc/node_exporter exists
      ansible.builtin.stat:
        path: /etc/node_exporter
      register: node_exporter_dir

    - name: Create private key (RSA, 4096 bits)
      community.crypto.openssl_privatekey:
        path: /etc/node_exporter/tls.key
        owner: "{{ _node_exporter_system_group }}"
        group: "{{ _node_exporter_system_group }}"
      when:
        - cryptography_check is defined
        - cryptography_check.rc is defined
        - cryptography_check.rc == 0
        - node_exporter_dir.stat.exists

    - name: Create cert and key
      community.crypto.x509_certificate:
        path: /etc/node_exporter/tls.cert
        privatekey_path: /etc/node_exporter/tls.key
        provider: selfsigned
        owner: "{{ _node_exporter_system_group }}"
        group: "{{ _node_exporter_system_group }}"
      when:
        - cryptography_check is defined
        - cryptography_check.rc is defined
        - cryptography_check.rc == 0
        - node_exporter_dir.stat.exists

  roles:
    - node_exporter

  vars:
    node_exporter_tls_server_config:
      cert_file: /etc/node_exporter/tls.cert
      key_file: /etc/node_exporter/tls.key
    node_exporter_basic_auth_users:
      randomuser: examplepassword
Also, add conditions to the "Assert TLS configuration is correct" task as follows. This prevents the task from failing (e.g. when the node_exporter cert directory doesn't exist yet):
+- name: Check existence of TLS cert file directory
+ ansible.builtin.stat:
+ path: "{{ node_exporter_tls_server_config.cert_file | dirname }}"
+ register: node_exporter_cert_file_dir
+ when:
+ - node_exporter_tls_server_config | length > 0
+ - node_exporter_tls_server_config.cert_file is defined
+ - node_exporter_tls_server_config.cert_file | length > 0
+
+- name: Check existence of TLS key file directory
+ ansible.builtin.stat:
+ path: "{{ node_exporter_tls_server_config.key_file | dirname }}"
+ register: node_exporter_key_file_dir
+ when:
+ - node_exporter_tls_server_config | length > 0
+ - node_exporter_tls_server_config.key_file is defined
+ - node_exporter_tls_server_config.key_file | length > 0
+
- name: Assert that TLS config is correct
- when: node_exporter_tls_server_config | length > 0
+ when:
+ - node_exporter_tls_server_config | length > 0
+ - node_exporter_cert_file_dir.stat is defined and node_exporter_cert_file_dir.stat.exists
+ - node_exporter_key_file_dir.stat is defined and node_exporter_key_file_dir.stat.exists
block:
- name: Assert that TLS key and cert path are set
ansible.builtin.assert:
Writing this much README content for TLS configuration is not user-friendly, so maybe we should fix the preflight.yml or configure.yml tasks instead.
Issue #13 was never really fixed; I incorrectly assumed something about the behavior of scopes in Jinja2 templates. Thank you @tjdavis3 for uncovering this problem. It is fixed in #94, which uses Jinja2's namespace to ensure the variable change propagates. I wanted to use something nicer like Python's any() or something more clever, but at least it won't be broken.
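The scoping behavior behind this bug is easy to see in plain Jinja2; a minimal sketch of why namespace() is needed:

```python
# Jinja2 scoping gotcha: a plain {% set %} inside a for loop does not
# escape the loop's scope, while namespace() attributes do.
from jinja2 import Environment

env = Environment()

without_ns = env.from_string(
    "{% set x = 'no' %}"
    "{% for i in [1] %}{% set x = 'yes' %}{% endfor %}"
    "{{ x }}"
)
print(without_ns.render())  # → no

with_ns = env.from_string(
    "{% set ns = namespace(x='no') %}"
    "{% for i in [1] %}{% set ns.x = 'yes' %}{% endfor %}"
    "{{ ns.x }}"
)
print(with_ns.render())  # → yes
```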
Hi, sorry for the question; I'm totally new to the Ansible world.
I'm migrating from cloudalchemy and I'm wondering where to find these roles on Galaxy and what their names are. I use prometheus and node_exporter.
Thanks for the help, and sorry for the questions 😅
We should investigate if it would make sense to share some of the common task files across roles, to avoid duplicate code.
For example, much of selinux.yml, install.yml, and configure.yml is the same across roles.
We could collect those common task files into one role and just include the tasks we need from that role.
So something like this:
From ansible/roles/node_exporter/tasks/main.yml, lines 24 to 31 (commit d389ad2)
to:
- name: SELinux
  ansible.builtin.include_role:
    name: prometheus.prometheus.common
    tasks_from: selinux.yml
    apply:
      become: true
  when: ansible_selinux.status == "enabled"
  tags:
    - node_exporter_configure
It's also possible to pass variables to the role when it's being imported, so we can essentially use the common role to create "functions" which we can use across the other roles.
Something like:
- name: Install
  ansible.builtin.include_role:
    name: prometheus.prometheus.common
    tasks_from: install.yml
  vars:
    binary_url: https://example.tld/prometheus.tar.gz
    user: node-exp
I am wondering how to pass --enable-feature=agent to allow Prometheus to run in agent mode: https://prometheus.io/blog/2021/11/16/agent/
I didn't find the info in https://github.com/prometheus-community/ansible/tree/main/roles/node_exporter#requirements, including the links inside.
Any guidance would be appreciated, thanks! 😃
I'm seeing this issue when trying to exclude filesystem types from the filesystem collector of the node_exporter.
I'm using a variable like this:
node_exporter_enabled_collectors:
  - filesystem:
      fs-types-exclude: ^(nfs|autofs)$
Which results in this in node_exporter.service:
ExecStart=/usr/local/bin/node_exporter \
--collector.filesystem \
--collector.filesystem.fs-types-exclude='^(nfs|autofs)$' \
And I see this in the logs:
caller=filesystem_common.go:113 level=info collector=filesystem msg="Parsed flag --collector.filesystem.fs-types-exclude" flag='^(nfs|autofs)$'
And these are some of the metrics scraped from the exporter
node_filesystem_size_bytes{device="auto.job",fstype="autofs",mountpoint="/job"} 0
node_filesystem_size_bytes{device="auto.net",fstype="autofs",mountpoint="/net"} 0
node_filesystem_size_bytes{device="host:/path",fstype="nfs",mountpoint="/net/path"} 1.099511627776e+13
node_filesystem_size_bytes{device="host:/path2",fstype="nfs",mountpoint="/net/path2"} 1.078874406912e+12
The same command run from the command line seems to parse the flag correctly:
collector=filesystem msg="Parsed flag --collector.filesystem.fs-types-exclude" flag=^(nfs|autofs)$
curl -s localhost:9100/metrics | grep -v ^# | grep -e autofs
#
I think the service file needs either no quotes around the value, or quotes around the entire flag (key and value), so something like this in the template:
--collector.{{ name }}.{{ k }}={{ v }}
or
{{ ( '--collector.' + name + '.' + k + '=' + v ) | quote }}
I've only looked at this for node_exporter, but I guess this issue could be present in the other roles.
I've tested both of these, and they both worked as expected.
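For what it's worth, Ansible's quote filter is backed by Python's shlex.quote, so the second suggestion wraps the entire flag in single quotes (a quick check outside Ansible):

```python
# shlex.quote backs Ansible's `quote` filter: characters like (, |, and $
# make the string "unsafe", so the entire flag is wrapped in single quotes.
from shlex import quote

flag = "--collector.filesystem.fs-types-exclude=^(nfs|autofs)$"
print(quote(flag))
# → '--collector.filesystem.fs-types-exclude=^(nfs|autofs)$'
```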
Looks like 0.5.2 was never tagged, so the release workflow didn't run for 126e68e.
In the node_exporter.service.j2 template, ProtectHome is set to read-only instead of yes only if /home is a separate partition. However, it is possible that a filesystem might be mounted under /home instead of at /home. Because ProtectHome is set to yes in that case, node_exporter can't run statfs() on that filesystem.
(Copy of cloudalchemy/ansible-node-exporter#271)
See https://github.com/prometheus-community/ansible/actions/runs/6402688586/job/17858370682
Could be, of course, an issue with the "new and improved [i.e. broken and untested] Galaxy".
Due to a breaking change in Ansible 2.14, using the warn parameter on command triggers an error:
TASK [cloudalchemy.node_exporter : Gather currently installed node_exporter version (if any)] **********************************************************************************************
fatal: [node-1]: FAILED! => changed=false
msg: 'Unsupported parameters for (ansible.legacy.command) module: warn. Supported parameters include: _raw_params, executable, strip_empty_ends, stdin_add_newline, creates, chdir, removes, stdin, _uses_shell, argv.'
(This issue is a copy of cloudalchemy/ansible-node-exporter#276)
Would it be possible to add a task that checks that the blackbox_exporter configuration is valid?
So far, this is possible by launching /usr/local/bin/blackbox_exporter --config.check --config.file="/etc/blackbox_exporter.yml"
The output of this command looks like this:
ts=2023-08-22T09:40:19.081Z caller=main.go:87 level=info msg="Config file is ok exiting..."
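As a sketch, such a check could be a simple task wrapping the command above (the task name and changed_when handling are assumptions, not existing role code):

```yaml
- name: Check that the blackbox_exporter configuration is valid
  ansible.builtin.command:
    cmd: >-
      /usr/local/bin/blackbox_exporter --config.check
      --config.file=/etc/blackbox_exporter.yml
  changed_when: false
```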
Hello,
It seems that the role is not gathering facts when I try to run it; here is an extract of the run:
PLAY [Monitoring] ***************************
TASK [prometheus.prometheus.prometheus : Validating arguments against arg spec 'main' - Installs and configures prometheus] ***********
fatal: [status.example.com]: FAILED! =>
msg: 'The task includes an option with an undefined variable. The error was: {{ go_arch_map[ansible_architecture] | default(ansible_architecture) }}: ''ansible_architecture'' is undefined. ''ansible_architecture'' is undefined. {{ go_arch_map[ansible_architecture] | default(ansible_architecture) }}: ''ansible_architecture'' is undefined. ''ansible_architecture'' is undefined. {{ go_arch_map[ansible_architecture] | default(ansible_architecture) }}: ''ansible_architecture'' is undefined. ''ansible_architecture'' is undefined. {{ go_arch_map[ansible_architecture] | default(ansible_architecture) }}: ''ansible_architecture'' is undefined. ''ansible_architecture'' is undefined'
It's weird because for every other role I use, there is a fact gathering stage, for example ;
PLAY [Install basic packages] *******************
TASK [Gathering Facts] ************************
ok: [example.com]
[...]
I've tried adding the gather_facts: true option to the play, but it doesn't change anything.
Did I miss something with the setup?
Here is the complete play ;
- name: Monitoring
  hosts: status
  become: true
  gather_facts: true
  roles:
    - prometheus.prometheus.prometheus
    - prometheus.prometheus.node_exporter
    - prometheus.prometheus.mysqld_exporter
    - prometheus.prometheus.systemd_exporter
    - cloudalchemy.grafana
    - geerlingguy.certbot
    - geerlingguy.nginx
  tags:
    - stats
Thanks,
I'm using node_exporter with an authentication using this repository manually installed (see #18, commit 1e41dec)
- name: Setup node exporter
  import_role:
    name: prometheus.prometheus.node_exporter
  vars:
    node_exporter_version: 1.5.0
    node_exporter_basic_auth_users:
      prometheus: "toto"
    node_exporter_web_telemetry_path: "/node-exporter"
This used to work in the past, but now the hashed password is replaced by *0:
TASK [prometheus.prometheus.node_exporter : Copy the node_exporter config file] ****************************************************************************************************************************************************************************************************
--- before: /etc/node_exporter/config.yaml
+++ after: /home/nereis/.ansible/tmp/ansible-local-15437bdzf6l09/tmpz9j85ndj/config.yaml.j2
@@ -5,4 +5,4 @@
basic_auth_users:
- prometheus: $2b...
+ prometheus: *0
cat /etc/node_exporter/config.yaml
---
#
# Ansible managed
#
basic_auth_users:
prometheus: *0
node-exporter v1.5.0
ansible 2.10.8
python 3.10.6
Thanks!
Using the node_exporter role without fully checking all the steps (mea culpa), I was astonished to find that it had reduced the user I wanted to run node-exporter as to a no-login, no-home system user. This then caused issues with other services running under this user's name that very much expect to have a home directory of their own.
I think the role should check whether the user already exists and, if so, not modify the user.
The service file should then also not include ProtectHome=yes, as this would effectively prevent the service from accessing its own home directory, where, in my case, all the metrics prom files are written.
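One way to implement that guard, sketched with ansible.builtin.getent (the variable name mirrors the role's node_exporter_system_user; the task wording is an assumption):

```yaml
# Sketch: only create the user when it does not already exist, so an
# existing account keeps its shell and home directory.
- name: Check whether the service user already exists
  ansible.builtin.getent:
    database: passwd
    key: "{{ node_exporter_system_user }}"
    fail_key: false

- name: Create the node_exporter user
  ansible.builtin.user:
    name: "{{ node_exporter_system_user }}"
    system: true
    shell: /usr/sbin/nologin
    create_home: false
  when: getent_passwd[node_exporter_system_user] is none
```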
Dear community,
Migrating from cloudalchemy.node_exporter and a previous ansible version, my playbook fails to validate node_exporter_tls_server_config.
Regards
Olivier
ogrosjeanne@bastion:~/ansible$ ansible --version
ansible [core 2.14.3]
config file = /etc/ansible/ansible.cfg
configured module search path = ['/home/ogrosjeanne/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /usr/lib/python3/dist-packages/ansible
ansible collection location = /home/ogrosjeanne/.ansible/collections:/usr/share/ansible/collections
executable location = /usr/bin/ansible
python version = 3.10.6 (main, Nov 14 2022, 16:10:14) [GCC 11.3.0] (/usr/bin/python3)
jinja version = 3.0.3
libyaml = True
ogrosjeanne@bastion:~/ansible$ more prometheus-test.yml
- hosts: localhost
  connection: local
  roles:
    - prometheus.prometheus.node_exporter
  vars:
    node_exporter_tls_server_config:
      cert_file: /etc/node_exporter/certificate.cer
      key_file: /etc/node_exporter/privateKey.pem
ogrosjeanne@bastion:~/ansible$ ansible-playbook prometheus-test.yml
PLAY [localhost] *********************************************************************************************************************************************************************************************************
TASK [Gathering Facts] ***************************************************************************************************************************************************************************************************
ok: [localhost]
TASK [prometheus.prometheus.node_exporter : Validating arguments against arg spec 'main' - Prometheus Node Exporter] *****************************************************************************************************
fatal: [localhost]: FAILED! => {"argument_errors": ["Invalid type dict for option '{'cert_file': '/etc/node_exporter/certificate.cer', 'key_file': '/etc/node_exporter/privateKey.pem'}', elements value check is supported only with 'list' type", "Invalid type dict for option '{}', elements value check is supported only with 'list' type", "Invalid type dict for option '{}', elements value check is supported only with 'list' type"], "argument_spec_data": {"node_exporter_basic_auth_users": {"description": "Dictionary of users and password for basic authentication. Passwords are automatically hashed with bcrypt.", "elements": "str", "type": "dict"}, "node_exporter_binary_local_dir": {"description": ["Enables the use of local packages instead of those distributed on github.", "The parameter may be set to a directory where the C(node_exporter) binary is stored on the host where ansible is run.", "This overrides the I(node_exporter_version) parameter"]}, "node_exporter_disabled_collectors": {"description": ["List of disabled collectors.", "By default node_exporter disables collectors listed L(here,https://github.com/prometheus/node_exporter#disabled-by-default)."], "elements": "str", "type": "list"}, "node_exporter_enabled_collectors": {"default": ["systemd", {"textfile": {"directory": "/var/lib/node_exporter"}}], "description": ["List of dicts defining additionally enabled collectors and their configuration.", "It adds collectors to L(those enabled by default,https://github.com/prometheus/node_exporter#enabled-by-default)."], "type": "list"}, "node_exporter_http_server_config": {"description": ["Config for HTTP/2 support.", "Keys and values are the same as in L(node_exporter docs,https://github.com/prometheus/node_exporter/blob/master/https/README.md#sample-config)."], "elements": "str", "type": "dict"}, "node_exporter_textfile_dir": {"default": "/var/lib/node_exporter", "description": ["Directory used by the L(Textfile 
Collector,https://github.com/prometheus/node_exporter#textfile-collector).", "To get permissions to write metrics in this directory, users must be in C(node-exp) system group.", "B(Note:) More information in TROUBLESHOOTING.md guide."]}, "node_exporter_tls_server_config": {"description": ["Configuration for TLS authentication.", "Keys and values are the same as in L(node_exporter docs,https://github.com/prometheus/node_exporter/blob/master/https/README.md#sample-config)."], "elements": "str", "type": "dict"}, "node_exporter_version": {"default": "1.1.2", "description": "Node exporter package version. Also accepts latest as parameter."}, "node_exporter_web_listen_address": {"default": "0.0.0.0:9100", "description": "Address on which node exporter will listen"}, "node_exporter_web_telemetry_path": {"default": "/metrics", "description": "Path under which to expose metrics"}}, "changed": false, "msg": "Validation of arguments failed:\nInvalid type dict for option '{'cert_file': '/etc/node_exporter/certificate.cer', 'key_file': '/etc/node_exporter/privateKey.pem'}', elements value check is supported only with 'list' type\nInvalid type dict for option '{}', elements value check is supported only with 'list' type\nInvalid type dict for option '{}', elements value check is supported only with 'list' type", "validate_args_context": {"argument_spec_name": "main", "name": "node_exporter", "path": "/home/ogrosjeanne/.ansible/collections/ansible_collections/prometheus/prometheus/roles/node_exporter", "type": "role"}}
PLAY RECAP ***************************************************************************************************************************************************************************************************************
localhost : ok=1 changed=0 unreachable=0 failed=1 skipped=0 rescued=0 ignored=0
Per this commit to node_exporter's README, a breaking change was introduced in 1.5.0: the --web.config command line flag should be --web.config.file.
(This issue is a copy of cloudalchemy/ansible-node-exporter#280)
Hi,
Thanks for your great work on this project and I am very glad that the node_exporter Ansible role is active again.
Would it be possible to add support for node_exporter on MacOS?
Reference: https://github.com/devops37/ansible-node-exporter/pull/1/files
My host is a MacBook Pro with M1 chip.
I installed the Ansible collection prometheus.prometheus 0.5.1.
I am trying to install node_exporter on a target machine which runs Ubuntu 22.04 ARM64 in Parallels Desktop by this playbook:
- name: Install Prometheus Node Exporter
  hosts: hm-ubuntu
  roles:
    - role: prometheus.prometheus.node_exporter
However, I got an error:
export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=true && \
ansible-playbook --inventory=inventory.yaml --vault-password-file=~/.ansible_vault_pass hm_ubuntu_group/playbook.yml
[WARNING]: Collection prometheus.prometheus does not support Ansible version 2.14.5
PLAY [Install Prometheus Node Exporter] **********************************************************************************************************************
TASK [Gathering Facts] ***************************************************************************************************************************************
ok: [hm-ubuntu]
TASK [prometheus.prometheus.node_exporter : Validating arguments against arg spec 'main' - Prometheus Node Exporter] *****************************************
ok: [hm-ubuntu]
TASK [prometheus.prometheus.node_exporter : Preflight] *******************************************************************************************************
included: /Users/hongbo-miao/.ansible/collections/ansible_collections/prometheus/prometheus/roles/node_exporter/tasks/preflight.yml for hm-ubuntu
TASK [prometheus.prometheus.node_exporter : Assert usage of systemd as an init system] ***********************************************************************
ok: [hm-ubuntu] => {
"changed": false,
"msg": "All assertions passed"
}
TASK [prometheus.prometheus.node_exporter : Install package fact dependencies] *******************************************************************************
ok: [hm-ubuntu]
TASK [prometheus.prometheus.node_exporter : Gather package facts] ********************************************************************************************
ok: [hm-ubuntu]
TASK [prometheus.prometheus.node_exporter : Naive assertion of proper listen address] ************************************************************************
ok: [hm-ubuntu] => {
"changed": false,
"msg": "All assertions passed"
}
TASK [prometheus.prometheus.node_exporter : Assert collectors are not both disabled and enabled at the same time] ********************************************
skipping: [hm-ubuntu]
TASK [prometheus.prometheus.node_exporter : Assert that TLS key and cert path are set] ***********************************************************************
skipping: [hm-ubuntu]
TASK [prometheus.prometheus.node_exporter : Check existence of TLS cert file] ********************************************************************************
skipping: [hm-ubuntu]
TASK [prometheus.prometheus.node_exporter : Check existence of TLS key file] *********************************************************************************
skipping: [hm-ubuntu]
TASK [prometheus.prometheus.node_exporter : Assert that TLS key and cert are present] ************************************************************************
skipping: [hm-ubuntu]
TASK [prometheus.prometheus.node_exporter : Check if node_exporter is installed] *****************************************************************************
ok: [hm-ubuntu]
TASK [prometheus.prometheus.node_exporter : Gather currently installed node_exporter version (if any)] *******************************************************
skipping: [hm-ubuntu]
TASK [prometheus.prometheus.node_exporter : Discover latest version] *****************************************************************************************
skipping: [hm-ubuntu]
TASK [prometheus.prometheus.node_exporter : Get checksum list from github] ***********************************************************************************
ok: [hm-ubuntu]
TASK [prometheus.prometheus.node_exporter : Get checksum for arm64] ******************************************************************************************
skipping: [hm-ubuntu] => (item=8aa0c275795b6812cdda6e3bca83b6412ea1c80ef1c7c2ceb364982b6f1a5d87 node_exporter-1.6.0.darwin-amd64.tar.gz)
skipping: [hm-ubuntu] => (item=7789559f3f12322e400741fc549bc71ee9c803f45fa5f64111ba67c51bd81bb4 node_exporter-1.6.0.darwin-arm64.tar.gz)
skipping: [hm-ubuntu] => (item=174a6586ee1376c869665cf736c6c44232c7d7a5305a1458e85f0741065e2b51 node_exporter-1.6.0.linux-386.tar.gz)
skipping: [hm-ubuntu] => (item=0b3573f8a7cb5b5f587df68eb28c3eb7c463f57d4b93e62c7586cb6dc481e515 node_exporter-1.6.0.linux-amd64.tar.gz)
ok: [hm-ubuntu] => (item=eb2f24626eca824c077cc7675d762bd520161c5c1a3f33c57b4b8aa0d452d613 node_exporter-1.6.0.linux-arm64.tar.gz)
skipping: [hm-ubuntu] => (item=cf47d88be69d3b40425e940ff3b05aa688eab3afa179419038be80631ca13061 node_exporter-1.6.0.linux-armv5.tar.gz)
skipping: [hm-ubuntu] => (item=6dbf0eaaefb9d865bcfa9b5dcf831f8659c71d8db87c7c489e1279c106c9c01a node_exporter-1.6.0.linux-armv6.tar.gz)
skipping: [hm-ubuntu] => (item=e050ec02091de91ab0f5d5164f685acc972616a78504eaa2597742945a5cf3b7 node_exporter-1.6.0.linux-armv7.tar.gz)
skipping: [hm-ubuntu] => (item=b52b843e2d12b0dfd9a74bbd71f55ba3d4cb7f19bdfb69f714bf6f12fd17fb07 node_exporter-1.6.0.linux-mips.tar.gz)
skipping: [hm-ubuntu] => (item=db8cc9bfc7ade29d1c72ea4ae36d1da1e7102f1fc321bf03207e93152f29be5c node_exporter-1.6.0.linux-mips64.tar.gz)
skipping: [hm-ubuntu] => (item=a10057f7be023bdfa88370d51348e26fcbaf78f38cb29da4b667d9e27010fa96 node_exporter-1.6.0.linux-mips64le.tar.gz)
skipping: [hm-ubuntu] => (item=f9e86c21e2dcee81aa5f419f8371936b533b177bcf1733accceb543364867f72 node_exporter-1.6.0.linux-mipsle.tar.gz)
skipping: [hm-ubuntu] => (item=db2b7e7f75a6fbe7078c88c3d9dc983ae3c0dc1e3a5757cc25fbd3ddc7bb9118 node_exporter-1.6.0.linux-ppc64.tar.gz)
skipping: [hm-ubuntu] => (item=c5621160e89be6aef86049b727fc855d41788b5ec0e348925c62038304409f1d node_exporter-1.6.0.linux-ppc64le.tar.gz)
skipping: [hm-ubuntu] => (item=8b2cb5213342fe9c72e2fabea2cb16897fda25360d7b32387046f4986a23f9a9 node_exporter-1.6.0.linux-s390x.tar.gz)
skipping: [hm-ubuntu] => (item=4192047557d9bfc54967f24ff239812d86f9a62c84be6f589a04b6e205833c2b node_exporter-1.6.0.netbsd-386.tar.gz)
skipping: [hm-ubuntu] => (item=9edc0c688471862895554b64aebb727b30f14839061654f141ca3fbc8d436c05 node_exporter-1.6.0.netbsd-amd64.tar.gz)
skipping: [hm-ubuntu] => (item=550a3f39ec021045f9b96d3c5a5ac0c70a32cab7b5e10fa61a296ea8666363ba node_exporter-1.6.0.openbsd-amd64.tar.gz)
TASK [prometheus.prometheus.node_exporter : Install] *********************************************************************************************************
included: /Users/hongbo-miao/.ansible/collections/ansible_collections/prometheus/prometheus/roles/node_exporter/tasks/install.yml for hm-ubuntu
TASK [prometheus.prometheus.node_exporter : Create the node_exporter group] **********************************************************************************
ok: [hm-ubuntu]
TASK [prometheus.prometheus.node_exporter : Create the node_exporter user] ***********************************************************************************
ok: [hm-ubuntu]
TASK [prometheus.prometheus.node_exporter : Download node_exporter binary to local folder] *******************************************************************
ok: [hm-ubuntu -> localhost]
TASK [prometheus.prometheus.node_exporter : Unpack node_exporter binary] *************************************************************************************
fatal: [hm-ubuntu -> localhost]: FAILED! => {"changed": false, "msg": "Failed to find handler for \"/Users/hongbo-miao/.ansible/tmp/ansible-tmp-1692123515.7587788-28109-144992350357591/source\". Make sure the required command to extract the file is installed.\nCommand \"/usr/bin/tar\" detected as tar type bsd. GNU tar required.\nCommand \"/usr/bin/unzip\" could not handle archive: End-of-central-directory signature not found. Either this file is not\n a zipfile, or it constitutes one disk of a multi-part archive. In the\n latter case the central directory and zipfile comment will be found on\n the last disk(s) of this archive.\nnote: /Users/hongbo-miao/.ansible/tmp/ansible-tmp-1692123515.7587788-28109-144992350357591/source may be a plain executable, not an archive\nunzip: cannot find zipfile directory in one of /Users/hongbo-miao/.ansible/tmp/ansible-tmp-1692123515.7587788-28109-144992350357591/source or\n /Users/hongbo-miao/.ansible/tmp/ansible-tmp-1692123515.7587788-28109-144992350357591/source.zip, and cannot find /Users/hongbo-miao/.ansible/tmp/ansible-tmp-1692123515.7587788-28109-144992350357591/source.ZIP, period.\n"}
PLAY RECAP ***************************************************************************************************************************************************
hm-ubuntu : ok=14 changed=0 unreachable=0 failed=1 skipped=7 rescued=0 ignored=0
This is my host macOS unzip version:
➜ unzip -v
UnZip 6.00 of 20 April 2009, by Info-ZIP. Maintained by C. Spieler. Send
bug reports using http://www.info-zip.org/zip-bug.html; see README for details.
Latest sources and executables are at ftp://ftp.info-zip.org/pub/infozip/ ;
see ftp://ftp.info-zip.org/pub/infozip/UnZip.html for other sites.
Compiled with gcc Apple LLVM 14.0.3 (clang-1403.0.22.11) [+internal-os] for Unix Mac OS X on Apr 14 2023.
UnZip special compilation options:
COPYRIGHT_CLEAN (PKZIP 0.9x unreducing method not supported)
SET_DIR_ATTRIB
SYMLINKS (symbolic links supported, if RTL and file system permit)
TIMESTAMP
UNIXBACKUP
USE_EF_UT_TIME
USE_UNSHRINK (PKZIP/Zip 1.x unshrinking method supported)
USE_DEFLATE64 (PKZIP 4.x Deflate64(tm) supported)
LARGE_FILE_SUPPORT (large files over 2 GiB supported)
ZIP64_SUPPORT (archives using Zip64 for large files supported)
VMS_TEXT_CONV
[decryption, version 2.11 of 05 Jan 2007]
UnZip and ZipInfo environment options:
UNZIP: [none]
UNZIPOPT: [none]
ZIPINFO: [none]
ZIPINFOOPT: [none]
This is my target Ubuntu unzip version:
parallels@ubuntu-linux-22-04-desktop:~$ unzip -v
UnZip 6.00 of 20 April 2009, by Debian. Original by Info-ZIP.
Latest sources and executables are at ftp://ftp.info-zip.org/pub/infozip/ ;
see ftp://ftp.info-zip.org/pub/infozip/UnZip.html for other sites.
Compiled with gcc 11.2.0 for Unix (Linux ELF).
UnZip special compilation options:
ACORN_FTYPE_NFS
COPYRIGHT_CLEAN (PKZIP 0.9x unreducing method not supported)
SET_DIR_ATTRIB
SYMLINKS (symbolic links supported, if RTL and file system permit)
TIMESTAMP
UNIXBACKUP
USE_EF_UT_TIME
USE_UNSHRINK (PKZIP/Zip 1.x unshrinking method supported)
USE_DEFLATE64 (PKZIP 4.x Deflate64(tm) supported)
UNICODE_SUPPORT [wide-chars, char coding: UTF-8] (handle UTF-8 paths)
LARGE_FILE_SUPPORT (large files over 2 GiB supported)
ZIP64_SUPPORT (archives using Zip64 for large files supported)
USE_BZIP2 (PKZIP 4.6+, using bzip2 lib version 1.0.8, 13-Jul-2019)
VMS_TEXT_CONV
WILD_STOP_AT_DIR
[decryption, version 2.11 of 05 Jan 2007]
UnZip and ZipInfo environment options:
UNZIP: [none]
UNZIPOPT: [none]
ZIPINFO: [none]
ZIPINFOOPT: [none]
Any help would be appreciated, thanks!
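In case it helps: the unpack step runs on the control node (note the `hm-ubuntu -> localhost` delegation), and the error says GNU tar is required while macOS ships BSD tar. A possible workaround, assuming Homebrew is available on the Mac, is to put GNU tar (`gtar`) on the controller first; an untested sketch:

```yaml
# Sketch: install gnu-tar on the macOS control node so Ansible's
# unarchive module can find a GNU tar ("gtar") binary there.
- name: Ensure GNU tar is available on the control node
  community.general.homebrew:
    name: gnu-tar
    state: present
  delegate_to: localhost
  run_once: true
  become: false
```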
Hi, I tried to use the config as mentioned, on a remote system with the right packages installed. I get the following error from Ansible.
- hosts: public
pre_tasks:
- name: Install passlib python package
ansible.builtin.pip:
name: passlib[bcrypt]
- name: Create node_exporter cert dir
file:
path: "/etc/node_exporter"
state: directory
owner: root
group: root
- name: Generate an OpenSSL private key with the default values (4096 bits, RSA)
community.crypto.openssl_privatekey:
path: /etc/node_exporter/tls.key
- name: Generate an OpenSSL Certificate Signing Request
community.crypto.openssl_csr:
path: /etc/node_exporter/tls.csr
privatekey_path: /etc/node_exporter/tls.key
common_name: "{{ ansible_hostname}}"
- name: Create cert and key
community.crypto.x509_certificate:
path: /etc/node_exporter/tls.cert
csr_path: /etc/node_exporter/tls.csr
privatekey_path: /etc/node_exporter/tls.key
provider: selfsigned
roles:
- prometheus.prometheus.node_exporter
vars:
node_exporter_tls_server_config:
cert_file: /etc/node_exporter/tls.cert
key_file: /etc/node_exporter/tls.key
node_exporter_basic_auth_users:
export_user: "OOkxA0L1M7KLPgwedwedW9DPL8I0XwvAq65oI"
The full traceback is:
Traceback (most recent call last):
File "/home/lukas/.local/pipx/venvs/ansible-base/lib/python3.11/site-packages/ansible/plugins/action/template.py", line 139, in run
resultant = self._templar.do_template(template_data, preserve_trailing_newlines=True, escape_backslashes=False)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/lukas/.local/pipx/venvs/ansible-base/lib/python3.11/site-packages/ansible/template/__init__.py", line 1066, in do_template
res = j2_concat(rf)
^^^^^^^^^^^^^
File "<template>", line 88, in root
File "/home/lukas/.local/pipx/venvs/ansible-base/lib/python3.11/site-packages/ansible/template/__init__.py", line 264, in wrapper
ret = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/lukas/.local/pipx/venvs/ansible-base/lib/python3.11/site-packages/ansible/plugins/filter/core.py", line 273, in get_encrypted_password
return passlib_or_crypt(password, hashtype, salt=salt, salt_size=salt_size, rounds=rounds)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/lukas/.local/pipx/venvs/ansible-base/lib/python3.11/site-packages/ansible/utils/encrypt.py", line 203, in passlib_or_crypt
return CryptHash(algorithm).hash(secret, salt=salt, salt_size=salt_size, rounds=rounds)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/lukas/.local/pipx/venvs/ansible-base/lib/python3.11/site-packages/ansible/utils/encrypt.py", line 100, in hash
return self._hash(secret, salt, rounds)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/lukas/.local/pipx/venvs/ansible-base/lib/python3.11/site-packages/ansible/utils/encrypt.py", line 119, in _hash
result = crypt.crypt(secret, saltstring)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/crypt.py", line 86, in crypt
return _crypt.crypt(word, salt)
^^^^^^^^^^^^^^^^^^^^^^^^
OSError: [Errno 22] Invalid argument
fatal: [ares.mldsc.de]: FAILED! => {
"changed": false,
"msg": "OSError: [Errno 22] Invalid argument"
}
Just to be sure: the passwords are hashed on the remote system, right?
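For what it's worth, the traceback suggests the opposite: the `password_hash` filter is evaluated while templating on the control node (note the pipx `ansible-base` venv paths in the traceback), and Ansible falls back to the stdlib `crypt` module when passlib is not importable there, which cannot do bcrypt on many libcs. So the pip pre_task may need to target the controller's Python rather than the remote one. A sketch, assuming this diagnosis is right:

```yaml
# Install passlib where the templating runs: the control node's Ansible venv.
# ansible_playbook_python points at the interpreter running the playbook.
- name: Install passlib on the control node
  ansible.builtin.pip:
    name: passlib[bcrypt]
    executable: "{{ ansible_playbook_python | dirname }}/pip"
  delegate_to: localhost
  become: false
```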
Hello,
any plans to integrate bind_exporter?
Thanks!
Using v0.4.0 of this collection, I get the following warning message. I believe this commit is meant to resolve the problem but maybe not.
$ ansible-playbook -i inventory site.yaml
[WARNING]: Collection prometheus.prometheus does not support Ansible version 2.14.5
These are the versions I'm running, and I didn't run into any issues using v2.14.5.
$ ansible --version
ansible [core 2.14.5]
config file = /home/jlosito/workspace/infrastructure/ansible.cfg
configured module search path = ['/home/jlosito/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /home/jlosito/workspace/infrastructure/.venv/lib64/python3.11/site-packages/ansible
ansible collection location = /home/jlosito/.ansible/collections:/usr/share/ansible/collections
executable location = /home/jlosito/workspace/infrastructure/.venv/bin/ansible
python version = 3.11.3 (main, Apr 5 2023, 00:00:00) [GCC 13.0.1 20230401 (Red Hat 13.0.1-0)] (/home/jlosito/workspace/infrastructure/.venv/bin/python3)
jinja version = 3.1.2
libyaml = True
Hi,
Any idea why, when installing Prometheus using this Ansible role, it sometimes sets the target with the right hostname and sometimes sets localhost, as per the image below (on the same instance)?
prometheus_scrape_configs:
- job_name: "prometheus"
metrics_path: "{{ prometheus_metrics_path }}"
static_configs:
- targets:
- "{{ ansible_fqdn | default(ansible_host) | default('localhost') }}:9090"
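A possible explanation: `ansible_fqdn` comes from the target's own hostname resolution, so an /etc/hosts entry mapping the hostname to 127.0.0.1 can make it flip to localhost between runs. A more deterministic sketch using the inventory name instead (assuming the inventory hostname is resolvable from the Prometheus host):

```yaml
prometheus_scrape_configs:
  - job_name: "prometheus"
    metrics_path: "{{ prometheus_metrics_path }}"
    static_configs:
      - targets:
          # inventory_hostname is stable across runs, unlike ansible_fqdn
          - "{{ inventory_hostname }}:9090"
```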
The new version supports multiple config files, so we could eventually support that.
Originally posted by @SuperQ in #211 (comment)
Hi,
The snmp_exporter install seems broken. The task prometheus.prometheus.snmp_exporter : Download snmp_exporter binary to local folder downloads the binary to a "random" file ("dest": "/tmp/8490f100-eb94-11ea-87ba-182377c269c8"). The task prometheus.prometheus.snmp_exporter : Unpack snmp_exporter binary then wants to unpack the file from '/tmp/snmp_exporter-0.19.0.linux-amd64.tar.gz'.
TASK [prometheus.prometheus.snmp_exporter : Download snmp_exporter binary to local folder] ****************************************************************************************************************************************************************************************************************************************************
vendredi 05 mai 2023 16:01:30 +0200 (0:00:00.050) 0:01:17.256 **********
vendredi 05 mai 2023 16:01:30 +0200 (0:00:00.050) 0:01:17.256 **********
ok: [hnld-ptls-amazonl2-prometheus-dev01 -> localhost] => {
"attempts": 1,
"changed": false,
"checksum_dest": "ff333c8409a9587720d2b55c31335f2290bc46c7",
"checksum_src": "ff333c8409a9587720d2b55c31335f2290bc46c7",
"dest": "/tmp/8490f100-eb94-11ea-87ba-182377c269c8",
"elapsed": 1,
"gid": 1000,
"group": "seb",
"md5sum": "5bccac10fb0d258148c488c30c786eca",
"mode": "0644",
"owner": "seb",
"size": 7313658,
"src": "/home/seb/.ansible/tmp/ansible-tmp-1683295290.88605-270186-112912528244913/tmpetxjw74o",
"state": "file",
"status_code": 200,
"uid": 1000,
"url": "https://github.com/prometheus/snmp_exporter/releases/download/v0.19.0/snmp_exporter-0.19.0.linux-amd64.tar.gz"
}
MSG:
OK (7313658 bytes)
TASK [prometheus.prometheus.snmp_exporter : Unpack snmp_exporter binary] **********************************************************************************************************************************************************************************************************************************************************************
vendredi 05 mai 2023 16:01:32 +0200 (0:00:01.678) 0:01:18.935 **********
vendredi 05 mai 2023 16:01:32 +0200 (0:00:01.679) 0:01:18.935 **********
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: If you are using a module and expect the file to exist on the remote, see the remote_src option
fatal: [hnld-ptls-amazonl2-prometheus-dev01 -> localhost]: FAILED! => {
"changed": false
}
MSG:
Could not find or access '/tmp/snmp_exporter-0.19.0.linux-amd64.tar.gz' on the Ansible Controller.
If you are using a module and expect the file to exist on the remote, see the remote_src option
The task Install package fact dependencies inside the task file preflight.yml of the node_exporter role needs to be run as root. I noticed this while installing node_exporter on my Ubuntu server.
The task should be changed from (ansible/roles/node_exporter/tasks/preflight.yml, lines 7 to 16 in 3bc2c94):
to:
- name: Install package fact dependencies
become: true
ansible.builtin.package:
name: "{{ _pkg_fact_req }}"
state: present
when: (_pkg_fact_req)
vars:
_pkg_fact_req: "{% if (ansible_pkg_mgr == 'apt') %}\
{{ ('python-apt' if ansible_python_version is version('3', '<') else 'python3-apt') }}
{% else %}\
{% endif %}"
As a workaround, you could include the role with root privileges:
- name: Install node-exporter
ansible.builtin.include_role:
name: prometheus.prometheus.node_exporter
apply:
become: true
Thank you!
Background:
We are using RedHat with SELinux enabled in an offline environment.
When SELinux is enabled, the node_exporter playbook wants to make sure two packages are present: python3-libselinux and python3-policycoreutils.
Problem:
The task to check for the presence of the packages always fails because Yum is unable to fetch metadata from the Yum repositories. Note that the packages are already installed, so why does Yum want to fetch repo metadata?
I've noticed that when I run "yum check-update" beforehand with internet temporarily enabled, the task does succeed afterwards when offline...
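A possible workaround (an untested sketch, not the role's actual code) is to verify the packages via the already-gathered package facts instead of invoking the package module, which avoids touching the repositories entirely:

```yaml
# package_facts reads the local rpm database; no repo metadata is fetched.
- name: Gather installed packages without contacting the repos
  ansible.builtin.package_facts:
    manager: auto

- name: Assert the SELinux Python bindings are present
  ansible.builtin.assert:
    that:
      - "'python3-libselinux' in ansible_facts.packages"
      - "'python3-policycoreutils' in ansible_facts.packages"
```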
Description
The "Create textfile collector dir" task in the node_exporter role of the Prometheus Ansible collection consistently reports a "changed" state on every run. The task in question:
Other system services generate their own metrics and place the corresponding metric files in the "{{ node_exporter_textfile_dir }}" directory. These services belong to the "{{ node_exporter_system_group }}" group, enabling them to change the group ownership of created metric files to the desired "{{ node_exporter_system_group }}". The node-exporter can read from these metric files, but the services are unable to change the ownership to the "{{ node_exporter_system_user }}". This results in a "changed" state for the "Create textfile collector dir" task after every run, as long as there are new or updated metric files within the "{{ node_exporter_textfile_dir }}" directory.
Problem
This issue is problematic because I monitor Ansible changes for security reasons and to detect any unexpected changes. Constant changes are undesirable and should not occur under normal circumstances.
Expected Outcome:
When all files and directories within the node_exporter_textfile_dir directory have the correct group set, no Ansible changes should be reported. The role should neither alter the "user" for files nor report any changes regarding this.
Solution Proposal
I have tried to come up with a suitable solution but have been unsuccessful. Unfortunately, there is no associated PR. Possible solutions could include:
- recurse: false - this would set the desired ownership for the directory itself, allowing the user to manage and verify that text metric files have the correct group, which node-exporter can read from.
- changed: false - this option is also not ideal.
I am unsure how to address this issue and welcome any suggestions or proposals for a resolution.
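For illustration, the first option would amount to something like this (a sketch using the role's variable names; not a tested change to the role):

```yaml
- name: Create textfile collector dir
  ansible.builtin.file:
    path: "{{ node_exporter_textfile_dir }}"
    state: directory
    owner: "{{ node_exporter_system_user }}"
    group: "{{ node_exporter_system_group }}"
    mode: "0775"
  # No recurse: manage only the directory itself and leave ownership of the
  # metric files inside to the services that produce them.
```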
Is it possible to add a new role for smartctl_exporter?
Today I started getting this error when running the prometheus.prometheus role; literally yesterday it was working.
Setting no_log to either false or true makes no difference:
FAILED! => {"censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": false}
Ansible version that i'm using
# ansible --version
ansible [core 2.12.10]
config file = /etc/ansible/ansible.cfg
configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /usr/lib/python3/dist-packages/ansible
ansible collection location = /root/.ansible/collections:/usr/share/ansible/collections
executable location = /usr/bin/ansible
python version = 3.8.10 (default, Mar 13 2023, 10:26:41) [GCC 9.4.0]
jinja version = 2.10.1
libyaml = True
Inventory file:
[prometheus]
prometheus.server.ip
Playbook file:
---
- hosts: prometheus
become: true
roles:
- prometheus.prometheus.prometheus
vars:
prometheus_version: 2.40.7
prometheus_web_listen_address: "0.0.0.0:9090"
prometheus_alert_rules_files: /etc/ansible/files/*.rules
prometheus_web_config:
tls_server_config: {}
http_server_config: {}
basic_auth_users: {}
prometheus_alertmanager_config:
- scheme: http
basic_auth: {}
static_configs:
- targets: ["127.0.0.1:9093"]
prometheus_targets:
node-exporter:
- targets:
- targert:9100
labels:
labels: label
prometheus_scrape_configs:
- job_name: "prometheus"
metrics_path: "{{ prometheus_metrics_path }}"
static_configs:
- targets:
- "{{ ansible_fqdn | default(ansible_host) | default('localhost') }}:9090"
- job_name: "node-exporter"
file_sd_configs:
- files:
- "{{ prometheus_config_dir }}/file_sd/node-exporter.yml"
The blackbox_exporter role has a couple of config issues; among them, it writes to /etc directly.
When using the role prometheus.prometheus.snmp_exporter, the task Reload snmp exporter failed:
RUNNING HANDLER [prometheus.prometheus.snmp_exporter : Reload snmp exporter] ********************************************************************************
vendredi 05 mai 2023 17:51:43 +0200 (0:00:03.534) 0:01:37.901 **********
vendredi 05 mai 2023 17:51:43 +0200 (0:00:03.534) 0:01:37.901 **********
fatal: [hnld-ptls-amazonl2-prometheus-dev01]: FAILED! => {
"changed": false
}
MSG:
failure 1 during daemon-reload: Failed to execute operation: The name org.freedesktop.PolicyKit1 was not provided by any .service files
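That polkit message typically appears when systemctl daemon-reload runs without root privileges. If the play isn't already escalating, a sketch of the usual fix (group name is illustrative):

```yaml
- hosts: snmp_hosts   # hypothetical inventory group
  become: true        # handlers then run systemctl as root, avoiding polkit
  roles:
    - prometheus.prometheus.snmp_exporter
```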
So this role can install prometheus from a binary tarball on GitHub (which I feel is a questionable practice, but that's another issue). In our environments, we prefer packages vetted by third parties, namely for auto-upgrades and so on, so we rely on Debian packages.
It's possible to completely skip the install with prometheus_skip_install, but then the role will crash with:
[prometheus.debian.net]: FAILED! => {"changed": false, "checksum": "df2496d755394a2fd11e01e3b4bd566418adc796", "cmd": "/usr/local/bin/promtool check rules /home/anarcat/.ansible/tmp/ansible-tmp-1695839321.1357365-156317-255727393866530/source", "msg": "[Errno 2] No such file or directory: b'/usr/local/bin/promtool'", "rc": 2, "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
That's because promtool is in /usr/bin/promtool and not /usr/local/bin/promtool as expected...
There are all sorts of issues like this around the package. For example, the prometheus.service unit file is overridden, poorly IMHO:
--- /usr/lib/systemd/system/prometheus.service 2023-02-04 00:20:19.000000000 +0000
+++ /etc/systemd/system/prometheus.service 2023-09-27 18:46:39.470561824 +0000
@@ -1,39 +1,53 @@
+#
+# Ansible managed
+#
+
[Unit]
-Description=Monitoring system and time series database
-Documentation=https://prometheus.io/docs/introduction/overview/ man:prometheus(1)
-After=time-sync.target
+Description=Prometheus
+After=network-online.target
+Requires=local-fs.target
+After=local-fs.target
[Service]
-Restart=on-failure
+Type=simple
+Environment="GOMAXPROCS=2"
User=prometheus
-EnvironmentFile=/etc/default/prometheus
-ExecStart=/usr/bin/prometheus $ARGS
+Group=prometheus
ExecReload=/bin/kill -HUP $MAINPID
-TimeoutStopSec=20s
-SendSIGKILL=no
+ExecStart=/usr/bin/prometheus \
+ --storage.tsdb.path=/var/lib/prometheus \
+ --storage.tsdb.retention.time=365d \
+ --storage.tsdb.retention.size=0 \
+ --web.config.file=/etc/prometheus/web.yml \
+ --web.console.libraries=/etc/prometheus/console_libraries \
+ --web.console.templates=/etc/prometheus/consoles \
+ --web.listen-address=0.0.0.0:9090 \
+ --web.external-url= \
+ --config.file=/etc/prometheus/prometheus.yml
-# systemd hardening-options
-AmbientCapabilities=
-CapabilityBoundingSet=
-DeviceAllow=/dev/null rw
-DevicePolicy=strict
-LimitMEMLOCK=0
-LimitNOFILE=32768
+CapabilityBoundingSet=CAP_SET_UID
+LimitNOFILE=65000
LockPersonality=true
-MemoryDenyWriteExecute=true
NoNewPrivileges=true
+MemoryDenyWriteExecute=true
PrivateDevices=true
PrivateTmp=true
+ProtectHome=true
+RemoveIPC=true
+RestrictSUIDSGID=true
+#SystemCallFilter=@signal @timer
+
+ReadWritePaths=/var/lib/prometheus
+
PrivateUsers=true
ProtectControlGroups=true
-ProtectHome=true
ProtectKernelModules=true
ProtectKernelTunables=true
-ProtectSystem=full
-RemoveIPC=true
-RestrictNamespaces=true
-RestrictRealtime=true
-SystemCallArchitectures=native
+ProtectSystem=strict
+
+
+SyslogIdentifier=prometheus
+Restart=always
[Install]
WantedBy=multi-user.target
Now, some of those are due to the role's entries not being sorted the same way (if at all, actually), but some are real issues. Restart=always (or, for that matter, even =on-failure) in particular is dangerous; see https://bugs.debian.org/1022724
There should be a way to leave all of that alone and focus on configuring the package instead...
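If the role's binary-dir variable can be overridden (I believe it is `prometheus_binary_install_dir`, but treat the name as an assumption), pointing it at the distro path might at least fix the promtool lookup:

```yaml
- hosts: prometheus
  roles:
    - role: prometheus.prometheus.prometheus
      vars:
        prometheus_skip_install: true
        # where the Debian package installs promtool
        prometheus_binary_install_dir: /usr/bin
```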
mysqld_exporter_password is not documented in the Readme and should have no_log set to true in the argument spec:
https://docs.ansible.com/ansible/latest/dev_guide/developing_program_flow_modules.html#argument-spec
Currently, the task prometheus.prometheus.mysqld_exporter : Copy the mysqld_exporter config file is leaking the password.
Starting from version 2.43.0 of Prometheus, we have a new section available in the configuration file prometheus.yml called 'scrape_config_files,' where you can specify the paths of files containing the scrape_config settings.
Would it be possible to implement this functionality by adding the 'scrape_config_files' attribute to the prometheus.yml.j2 template when it is defined in a variable, for example 'prometheus_scrape_config_files'? It could be a list of paths.
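For example, with a hypothetical `prometheus_scrape_config_files` variable, the rendered prometheus.yml could gain (valid for Prometheus >= 2.43.0):

```yaml
# Rendered prometheus.yml fragment (sketch)
scrape_config_files:
  - /etc/prometheus/scrape_configs/*.yml   # glob of external scrape_config files
scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ["localhost:9090"]
```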
The alertmanager binary supports multiple listen addresses, from --help:
--web.listen-address=:9093 ...
Addresses on which to expose metrics and web
interface. Repeatable for multiple addresses.
The current service template and variable do not seem to support this. I edited the unit manually to check what it would need to look like:
ExecStart=/usr/local/bin/alertmanager \
...
--web.listen-address=10.10.20.23:9093 \
--web.listen-address=127.0.1.1:9093 \
...
$ netstat -nlptu | rg alertmanager | column -t
tcp 0 0 127.0.1.1:9093 0.0.0.0:* LISTEN 105724/alertmanager
tcp 0 0 10.10.20.23:9093 0.0.0.0:* LISTEN 105724/alertmanager
I guess for this change to be backwards compatible, the alertmanager_web_listen_address variable needs to be checked to see whether it's a single value or a list.
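A backwards-compatible template sketch (not the role's actual template) that accepts either a plain string or a list:

```jinja
{# Accept either a single address string or a list of addresses #}
{% if alertmanager_web_listen_address is string %}
    --web.listen-address={{ alertmanager_web_listen_address }} \
{% else %}
{% for address in alertmanager_web_listen_address %}
    --web.listen-address={{ address }} \
{% endfor %}
{% endif %}
```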
[WARNING]: Collection prometheus.prometheus does not support Ansible version
2.15.2
Running the 0.5.1 release of the collection. Is it looking at the minor version of Ansible and failing?
The error says:
TASK [prometheus.prometheus.node_exporter : Discover latest version] ***************************************************************************************************************************************************************************************************************
ok: [host] => {"ansible_facts": {"node_exporter_version": "1.5.0"}, "attempts": 1, "changed": false}
TASK [prometheus.prometheus.node_exporter : Get checksum list from github] *********************************************************************************************************************************************************************************************************
fatal: [host]: FAILED! => {"msg": "An unhandled exception occurred while running the lookup plugin 'url'. Error was a <class 'ansible.errors.AnsibleError'>, original message: Received HTTP error for https://github.com/prometheus/node_exporter/releases/download/vlatest/sha256sums.txt : HTTP Error 404: Not Found. Received HTTP error for https://github.com/prometheus/node_exporter/releases/download/vlatest/sha256sums.txt : HTTP Error 404: Not Found"}
As seen in the log above, it retrieved the version correctly, but somehow failed to override the variable.
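Until the latest-version discovery is fixed, pinning an explicit version sidesteps the broken `vlatest` URL (the version number here is just an example):

```yaml
- hosts: all
  roles:
    - role: prometheus.prometheus.node_exporter
      vars:
        # Pin a concrete release instead of "latest"
        node_exporter_version: "1.5.0"
```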
Currently there is no easy way to remove custom rules. Since we are using copy, if we remove a rule locally, it won't be removed on the server.
I wanted to quickly update my Prometheus config using the tag "prometheus_config". The playbook didn't work because the role uses "include_tasks".
I see 2 solutions:
- switch include_tasks to import_tasks.
- add apply: {tags: ...} to all include_tasks calls.
Is there a reason why this collection prefers include_tasks?
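For reference, the second option would look roughly like this inside the role (task file and tag names are illustrative):

```yaml
# Tags on include_tasks apply only to the include itself; "apply" propagates
# them to the included tasks so "--tags prometheus_config" still runs them.
- name: Configure
  ansible.builtin.include_tasks:
    file: configure.yml
    apply:
      tags: [prometheus_config]
  tags: [prometheus_config]
```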
This affects at least the systemd service file for the snmp_exporter.
According to https://wiki.ubuntu.com/nobody, the 'nobody' user should only be used for NFS, and daemons should not use this user, as multiple daemons sharing it could affect each other.
Against nobody:
https://en.wikipedia.org/wiki/Nobody_(username)
Answers from platforms like Ask Ubuntu, unix.stackexchange.com, etc.
Systemd logs the following message: /etc/systemd/system/snmp_exporter.service:8: Special user nobody configured, this is not safe!
"For" nobody:
https://wiki.debian.org/SystemGroups#Groups_with_an_associated_user
Alternatives:
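One common alternative (a suggestion on my part, mirroring how the node_exporter role handles it) is a dedicated system user created by the role:

```yaml
- name: Create a dedicated snmp_exporter system user
  ansible.builtin.user:
    name: snmp-exp          # hypothetical name, analogous to node-exp
    system: true
    shell: /usr/sbin/nologin
    create_home: false
```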