
ansible's Issues

SSH fail during node_exporter installation

Hello, the node_exporter installation is failing at this task:
TASK [prometheus.prometheus.node_exporter : Download node_exporter binary to local folder] ************************
fatal: [...*]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ubuntu@localhost: Permission denied (publickey).", "unreachable": true}

I tried the following command; it ran without errors, but the installation ended up on the provisioning machine instead of the target:
ansible-playbook nodeexp.yml -i inventory --connection=local

I'm using the simplest example playbook, and all the other tasks executed fine:

- hosts: targets
  become: true
  roles:
    - prometheus.prometheus.node_exporter

node_exporter role failing with 'AnsibleUnicode' object is not callable

TASK [prometheus.prometheus.node_exporter : Copy the node_exporter systemd service file] ************************************************************************************************************
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: ): 'AnsibleUnicode' object is not callable. 'AnsibleUnicode' object is not callable
fatal: [agn-an-afg-prod-republisher]: FAILED! => changed=false 
  msg: |-
    AnsibleError: Unexpected templating type error occurred on ({{ ansible_managed | comment }}
  
    [Unit]
    Description=Prometheus Node Exporter
    After=network-online.target
  
    [Service]
    Type=simple
    User={{ node_exporter_system_user }}
    Group={{ node_exporter_system_group }}
    ExecStart={{ node_exporter_binary_install_dir }}/node_exporter \
    {% for collector in node_exporter_enabled_collectors -%}
    {%   if not collector is mapping %}
        '--collector.{{ collector }}' \
    {%   else -%}
    {%     set name, options = (collector.items()|list)[0] -%}
        '--collector.{{ name }}' \
    {%     for k,v in options|dictsort %}
        '--collector.{{ name }}.{{ k }}={{ v }}' \
    {%     endfor -%}
    {%   endif -%}
    {% endfor -%}
    {% for collector in node_exporter_disabled_collectors %}
        '--no-collector.{{ collector }}' \
    {% endfor %}
    {% if node_exporter_tls_server_config | length > 0 or node_exporter_http_server_config | length > 0 or node_exporter_basic_auth_users | length > 0 %}
        {% if node_exporter_version is version('1.5.0', '>=') %}
        '--web.config.file=/etc/node_exporter/config.yaml' \
        {% else %}
        '--web.config=/etc/node_exporter/config.yaml' \
        {% endif %}
    {% endif %}
        '--web.listen-address={{ node_exporter_web_listen_address }}' \
        '--web.telemetry-path={{ node_exporter_web_telemetry_path }}'
  
    SyslogIdentifier=node_exporter
    Restart=always
    RestartSec=1
    StartLimitInterval=0
  
    {% set ns = namespace(protect_home = 'yes') %}
    {% for m in ansible_mounts if m.mount.startswith('/home') %}
    {%   set ns.protect_home = 'read-only' %}
    {% endfor %}
    {% if node_exporter_textfile_dir.startswith('/home') %}
    {%   set ns.protect_home = 'read-only' %}
    {% endif %}
    ProtectHome={{ ns.protect_home }}
    NoNewPrivileges=yes
  
    {% if (ansible_facts.packages.systemd | first).version is version('232', '>=') %}
    ProtectSystem=strict
    ProtectControlGroups=true
    ProtectKernelModules=true
    ProtectKernelTunables=yes
    {% else %}
    ProtectSystem=full
    {% endif %}
  
    [Install]
    WantedBy=multi-user.target
    ): 'AnsibleUnicode' object is not callable. 'AnsibleUnicode' object is not callable

Version:

ansible [core 2.15.2]
  config file = /workspace/myproject/ansible.cfg
  configured module search path = ['/home/gitpod/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python3/dist-packages/ansible
  ansible collection location = /workspace/myproject/collections
  executable location = /usr/bin/ansible
  python version = 3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0] (/usr/bin/python3)
  jinja version = 3.0.3
  libyaml = True

Collection version: 0.7.0

node_exporter variables I am defining:

node_exporter_version: latest

[systemd_exporter] Unit fails to start because template adds unknown flags.

systemd_exporter.service fails on startup with the error unknown long flag '--collector.enable-restart-count' after adding these vars to my playbook:

- hosts: prometheus
  roles:
    - prometheus.prometheus.systemd_exporter
  vars:
    systemd_exporter_enable_ip_accounting: true
    systemd_exporter_enable_restart_count: true

Logs:

ahoyt@prom1:~$ sudo journalctl --unit systemd_exporter --since "10 minutes ago"

Jul 13 19:26:12 prom1 systemd[1]: Started Prometheus SystemD Exporter.
Jul 13 19:26:12 prom1 systemd_exporter[45867]: systemd_exporter: error: unknown long flag '--collector.enable-restart-count', try --help
Jul 13 19:26:12 prom1 systemd[1]: systemd_exporter.service: Main process exited, code=exited, status=1/FAILURE
Jul 13 19:26:12 prom1 systemd[1]: systemd_exporter.service: Failed with result 'exit-code'.
Jul 13 19:26:14 prom1 systemd[1]: systemd_exporter.service: Scheduled restart job, restart counter is at 6104.
Jul 13 19:26:14 prom1 systemd[1]: Stopped Prometheus SystemD Exporter.
Jul 13 19:26:14 prom1 systemd[1]: Started Prometheus SystemD Exporter.

It works after prepending systemd. to the flags in systemd_exporter.service.j2 to match https://github.com/prometheus-community/systemd_exporter/blob/main/README.md

Tested with an Ubuntu 22.04.2 LTS VM.
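
For reference, with the systemd. prefix those two vars end up rendering flags like the following (a sketch of the corrected unit line; the binary path is an assumption):

ExecStart=/usr/local/bin/systemd_exporter \
    --systemd.collector.enable-ip-accounting \
    --systemd.collector.enable-restart-count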

Warning message does not support Ansible version 2.14.6

I still get the warning message with Ansible 2.14.6:

$ ansible-playbook --inventory inventory
TASK [include roles] **************************************************************************
[WARNING]: Collection prometheus.prometheus does not support Ansible version 2.14.6

In https://github.com/prometheus-community/ansible/blob/main/meta/runtime.yml#L2
the supported Ansible versions are 2.9.0*, 2.10.0*, 2.11.0*, 2.12.0*, 2.13.0*, 2.14.0*, 2.15.0*.

Should it be ">= 2.9.0, < 2.16.0"?
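
For illustration, a minimal sketch of what meta/runtime.yml could then contain (the exact upper bound is an assumption following the suggestion above):

requires_ansible: '>=2.9.0,<2.16.0'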

Python crash with prometheus role

$ uname -a
Darwin johns-mbp.local 22.5.0 Darwin Kernel Version 22.5.0: Thu Jun  8 22:22:20 PDT 2023; root:xnu-8796.121.3~7/RELEASE_ARM64_T6000 arm64

$ ansible --version
ansible [core 2.11.7]
  config file = /Users/jmaguire/src/my-ansible/ansible.cfg
  configured module search path = ['/Users/jmaguire/src/my-ansible/library']
  ansible python module location = /opt/homebrew/lib/python3.9/site-packages/ansible
  ansible collection location = /Users/jmaguire/.ansible/collections:/usr/share/ansible/collections
  executable location = /opt/homebrew/bin/ansible
  python version = 3.9.17 (main, Jun  6 2023, 14:33:55) [Clang 14.0.3 (clang-1403.0.22.14.1)]
  jinja version = 3.0.3
  libyaml = True
---
- name: Provision Prometheus
  hosts: prometheus
  roles:
    - role: prometheus.prometheus.prometheus
      tags: [ prom ]
...

TASK [prometheus.prometheus.prometheus : Discover latest version] *********************************************************************************************
task path: /Users/jmaguire/.ansible/collections/ansible_collections/prometheus/prometheus/roles/prometheus/tasks/preflight.yml:76
skipping: [tornado.johnmaguire.me] => {
    "changed": false,
    "skip_reason": "Conditional result was False"
}

TASK [prometheus.prometheus.prometheus : Get checksum list] ***************************************************************************************************
task path: /Users/jmaguire/.ansible/collections/ansible_collections/prometheus/prometheus/roles/prometheus/tasks/preflight.yml:93
url lookup connecting to https://github.com/prometheus/prometheus/releases/download/v2.44.0/sha256sums.txt
objc[49407]: +[__NSCFConstantString initialize] may have been in progress in another thread when fork() was called.
objc[49407]: +[__NSCFConstantString initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
ERROR! A worker was found in a dead state
Screenshot 2023-07-23 at 12 30 47 AM

Crash Report

Request for Support of time_intervals in Alertmanager Configuration

I'd like to request the enhancement of the Ansible role for Alertmanager to support the time_intervals feature in its configuration. This feature allows specifying time-based intervals for alert management, and it would be a valuable addition to the existing functionality.

Here is an example of how the time_intervals could be structured in YAML configuration:

time_intervals:
  - name: office-hours
    time_intervals:
      - times:
          - start_time: "08:00"
            end_time: "17:00"
        weekdays: ['Monday:Friday']

The proposed enhancement is based on the Prometheus Alerting Configuration documentation, specifically detailed here: Prometheus Alerting Configuration - time_interval.

I recommend introducing new variables, possibly named alertmanager_time_intervals, and updating the Ansible template to accommodate this new configuration option.
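
As a rough illustration, the proposed variable could mirror the structure above and be rendered under the top-level time_intervals key of alertmanager.yml (the variable name and the rendering are assumptions, not existing role behavior):

alertmanager_time_intervals:
  - name: office-hours
    time_intervals:
      - times:
          - start_time: "08:00"
            end_time: "17:00"
        weekdays: ['Monday:Friday']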

Thank you for considering this enhancement request, and please let me know if you need any further information or assistance in implementing this feature.

design for conf.d style support

Hi,

this is more of a discussion/question or design proposal than an issue, but since this repo does not have Discussions enabled,
I'll put it in an issue. (Would you mind enabling Discussions on this repo?)

Prometheus does not support conf.d style config includes for static scrape configs and leaves this task to configuration management tools like ansible.

It would be nice if the Ansible roles in this repo supported a design where every exporter role could simply drop its static scrape_config files onto the prometheus server (via delegate_to) into a folder like /etc/prometheus/conf.d, and the global config would then be constructed by assembling the files in that folder.

So the changes to the current roles would be fairly small:

  • prometheus role:
    • generates the base config in /etc/prometheus/conf.d/1_base.yml
    • assembles /etc/prometheus/prometheus.yml from the contents of /etc/prometheus/conf.d
  • exporter roles (node_exporter, ...) place their scrape_configs on the prometheus server (via delegate_to) and trigger an assemble handler. For rules files this is already supported out of the box by prometheus.

What do you think about it?

I could prepare a PR for it.
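
As a rough sketch, the assembly step in the prometheus role could look something like this (directory layout, ownership, and handler name are assumptions taken from the proposal above):

- name: Assemble prometheus.yml from conf.d fragments
  ansible.builtin.assemble:
    src: /etc/prometheus/conf.d
    dest: /etc/prometheus/prometheus.yml
    owner: prometheus
    group: prometheus
    mode: "0640"
  notify: reload prometheus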

Background:
I'm maintaining a role and would like to place the scrape configs on the prometheus server while maintaining compatibility with the prometheus role.
I'm currently creating the scrape config files on the prometheus server and triggering an assemble handler:
nusenu/ansible-relayor@24878b5#diff-67da321934d1a76cffaa1feed7ef7899327b68724357089bf6eeb4af62103715
but this is not compatible with the current prometheus role. The proposed design would make it possible for arbitrary exporter roles to drop in their scrape configs while still maintaining compatibility.

Unfortunately, it is not possible to use file_sd_config as a solution, as far as I know, because it does not support setting the metrics_path, and a single server may have many exporters (example.com/exporter1, example.com/exporter2, ...).

`requires_ansible` parsing warning

On Ansible Core 2.14.2 (Ansible 7.2.0) I got the following warning:

[WARNING]: Error parsing collection metadata requires_ansible value from collection prometheus.prometheus: Invalid specifier: '2.9'

node_exporter: Download node_exporter step failing due to an extra space

I get the following error when trying to execute a playbook with the latest revision of the node_exporter role.

fatal: [REDACTED-> localhost]: FAILED! => 
{
  "attempts": 5, 
  "changed": false, 
  "dest": "/tmp/node_exporter-1.5.0.linux-amd64.tar.gz", 
  "elapsed": 0, 
  "msg": "An unknown error occurred: URL can't contain control characters. '/prometheus/node_exporter/releases/download/v1.5.0/ node_exporter-1.5.0.linux-amd64.tar.gz' (found at least ' ')", 
  "url": "https://github.com/prometheus/node_exporter/releases/download/v1.5.0/ node_exporter-1.5.0.linux-amd64.tar.gz"
}

It looks like it's due to the space character in the download URL:
https://github.com/prometheus/node_exporter/releases/download/v1.5.0/ node_exporter-1.5.0.linux-amd64.tar.gz.

node_exporter: TLS config does not work correctly.

There are some issues with the TLS configuration in the README (https://github.com/prometheus-community/ansible/tree/main/roles/node_exporter#tls-config).

  1. The Create node_exporter cert dir task defines the directory owner and group, but we should use the default values defined in the node_exporter role.
  2. The Create cert and key task uses the openssl_certificate module, but it was renamed to community.crypto.x509_certificate when moved to the collection community.crypto.
    2.1 See https://docs.ansible.com/ansible/latest/collections/community/crypto/x509_certificate_module.html
  3. Before running the Create cert and key task, we have to create the cert file (e.g. /etc/node_exporter/tls.csr) and the key file (e.g. /etc/node_exporter/tls.key) in advance, but the tasks that create these files are not defined anywhere.

To give a user-friendly configuration example, the playbook.yml could be changed as below.

- hosts: all
  pre_tasks:
    - name: Check if pip3 is installed
      ansible.builtin.command:
        cmd: which pip3
      register: pip3_check
      ignore_errors: true
    - name: install pip3
      ansible.builtin.apt:
        name: python3-pip
      register: pip3_install
      when:
        - pip3_check.rc is defined
        - not pip3_check.rc == 0
    - name: install cryptography python package
      ansible.builtin.pip:
        name: cryptography
      when:
        - (pip3_check.rc is defined and pip3_check.rc == 0) or (pip3_install is succeeded and pip3_install is not skipped)
    - name: Check if cryptography python package is installed
      ansible.builtin.shell:
        cmd: pip3 freeze | grep cryptography
      register: cryptography_check
      ignore_errors: true
      when:
        - (pip3_check.rc is defined and pip3_check.rc == 0) or (pip3_install is succeeded and pip3_install is not skipped)
    - name: Create the node_exporter group
      ansible.builtin.group:
        name: "{{ _node_exporter_system_group }}"
        state: present
        system: true
      when: _node_exporter_system_group != "root"
    - name: Create the node_exporter user
      ansible.builtin.user:
        name: "{{ _node_exporter_system_user }}"
        groups: "{{ _node_exporter_system_group }}"
        append: true
        shell: /usr/sbin/nologin
        system: true
        create_home: false
        home: /
      when: _node_exporter_system_user != "root"
    - name: Create node_exporter cert dir
      file:
        path: "/etc/node_exporter"
        state: directory
        owner: "{{ _node_exporter_system_group }}"
        group: "{{ _node_exporter_system_group }}"
    - name: Check /etc/node_exporter exists
      stat:
        path: "/etc/node_exporter"
      register: node_exporter_dir
    - name: Create private key (RSA, 4096 bits)
      community.crypto.openssl_privatekey:
        path: /etc/node_exporter/tls.key
        owner: "{{ _node_exporter_system_group }}"
        group: "{{ _node_exporter_system_group }}"
      when:
        - cryptography_check is defined
        - cryptography_check.rc is defined
        - cryptography_check.rc == 0
        - node_exporter_dir.stat.exists
    - name: Create cert and key
      community.crypto.x509_certificate:
        path: /etc/node_exporter/tls.cert
        privatekey_path: /etc/node_exporter/tls.key
        provider: selfsigned
        owner: "{{ _node_exporter_system_group }}"
        group: "{{ _node_exporter_system_group }}"
      when:
        - cryptography_check is defined
        - cryptography_check.rc is defined
        - cryptography_check.rc == 0
        - node_exporter_dir.stat.exists
  roles:
    - node_exporter
  vars:
    node_exporter_tls_server_config:
      cert_file: /etc/node_exporter/tls.cert
      key_file: /etc/node_exporter/tls.key
    node_exporter_basic_auth_users:
      randomuser: examplepassword

Also, add conditions to the "Assert that TLS config is correct" task as follows, to prevent it from failing (e.g. when the node_exporter cert directory doesn't exist yet):

+- name: Check existence of TLS cert file directory
+  ansible.builtin.stat:
+    path: "{{ node_exporter_tls_server_config.cert_file | dirname }}"
+  register: node_exporter_cert_file_dir
+  when:
+    - node_exporter_tls_server_config | length > 0
+    - node_exporter_tls_server_config.cert_file is defined
+    - node_exporter_tls_server_config.cert_file | length > 0
+
+- name: Check existence of TLS key file directory
+  ansible.builtin.stat:
+    path: "{{ node_exporter_tls_server_config.key_file | dirname }}"
+  register: node_exporter_key_file_dir
+  when:
+    - node_exporter_tls_server_config | length > 0
+    - node_exporter_tls_server_config.key_file is defined
+    - node_exporter_tls_server_config.key_file | length > 0
+
 - name: Assert that TLS config is correct
-  when: node_exporter_tls_server_config | length > 0
+  when:
+    - node_exporter_tls_server_config | length > 0
+    - node_exporter_cert_file_dir.stat is defined and node_exporter_cert_file_dir.stat.exists
+    - node_exporter_key_file_dir.stat is defined and node_exporter_key_file_dir.stat.exists
   block:
     - name: Assert that TLS key and cert path are set
       ansible.builtin.assert:

Requiring this much README content just for TLS configuration is not user-friendly, so maybe we should fix the preflight.yml or configure.yml tasks instead.

Question: What is the galaxy name of this roles/collections?

Hi, sorry for this question; I'm totally new to the Ansible world.

I'm migrating from cloudalchemy and I'm wondering where I can find these roles on Galaxy and what their names are. I use prometheus and node_exporter.

Thanks for the help, and sorry for the questions 😅

Share common tasks across roles

We should investigate if it would make sense to share some of the common task files across roles, to avoid duplicate code.

For example, much of selinux.yml, install.yml, and configure.yml is the same across roles.

We could collect those common task files into one role and just include the tasks we need from that role.

So something like this:
From:

- name: SELinux
  ansible.builtin.include_tasks:
    file: selinux.yml
    apply:
      become: true
  when: ansible_selinux.status == "enabled"
  tags:
    - node_exporter_configure

to:

 - name: SELinux 
   ansible.builtin.include_role: 
     name: prometheus.prometheus.common
     tasks_from: selinux.yml 
     apply: 
       become: true 
   when: ansible_selinux.status == "enabled" 
   tags: 
     - node_exporter_configure 

It's also possible to pass variables to the role when it's being imported, so we can essentially use the common role to create functions which we can use across other roles.
Something like:

- name: Install
  ansible.builtin.include_role:
    name: prometheus.prometheus.common
    tasks_from: install.yml
  vars:
    binary_url: https://example.tld/prometheus.tar.gz
    user: node-exp

Issues with templating of systemd unit config files

I'm seeing this issue when trying to exclude filesystem types from the filesystem collector of the node_exporter.

I'm using a variable like this:

node_exporter_enabled_collectors:
  - filesystem:
      fs-types-exclude: ^(nfs|autofs)$

Which results in this in node_exporter.service:

ExecStart=/usr/local/bin/node_exporter \
--collector.filesystem \
    --collector.filesystem.fs-types-exclude='^(nfs|autofs)$' \

And - I see this in the logs:
caller=filesystem_common.go:113 level=info collector=filesystem msg="Parsed flag --collector.filesystem.fs-types-exclude" flag='^(nfs|autofs)$'

And these are some of the metrics scraped from the exporter

node_filesystem_size_bytes{device="auto.job",fstype="autofs",mountpoint="/job"} 0
node_filesystem_size_bytes{device="auto.net",fstype="autofs",mountpoint="/net"} 0
node_filesystem_size_bytes{device="host:/path",fstype="nfs",mountpoint="/net/path"} 1.099511627776e+13
node_filesystem_size_bytes{device="host:/path2",fstype="nfs",mountpoint="/net/path2"} 1.078874406912e+12

The same command run from the command line seems to parse the flag correctly:
collector=filesystem msg="Parsed flag --collector.filesystem.fs-types-exclude" flag=^(nfs|autofs)$

 curl -s localhost:9100/metrics | grep -v ^# | grep -e autofs
# 

I think the service file needs either no quotes around the value, or quotes around the entire flag (key and value), so something like this in the template:

--collector.{{ name }}.{{ k }}={{ v }}
or
{{ ( '--collector.' + name + '.' + k + '=' + v ) | quote }}

I've only looked at this for node_exporter, but I guess this issue could be present in the other roles.

I've tested both of these, and they both worked as expected.
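
For illustration, the second variant quotes the whole flag, so the rendered unit line would look something like this (an assumption based on the example values above):

    '--collector.filesystem.fs-types-exclude=^(nfs|autofs)$' \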

node_exporter: Role doesn't work with Ansible 7.0.0

Due to a breaking change in Ansible 2.14, using the warn parameter on command triggers an error:

TASK [cloudalchemy.node_exporter : Gather currently installed node_exporter version (if any)] **********************************************************************************************
fatal: [node-1]: FAILED! => changed=false
  msg: 'Unsupported parameters for (ansible.legacy.command) module: warn. Supported parameters include: _raw_params, executable, strip_empty_ends, stdin_add_newline, creates, chdir, removes, stdin, _uses_shell, argv.'

(This issue is a copy of cloudalchemy/ansible-node-exporter#276)
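
A minimal sketch of the kind of change needed in the role: drop the warn parameter from the command task (the task below only mirrors the failing task's name; the exact command is an assumption):

- name: Gather currently installed node_exporter version (if any)
  ansible.builtin.command: /usr/local/bin/node_exporter --version
  changed_when: false
  register: __node_exporter_current_version_output
  # The former "warn: false" module parameter is removed here, since
  # ansible.builtin.command no longer accepts it on Ansible 2.14+.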

Blackbox_exporter - Add a check config condition

Would it be possible to add a task that checks that the imported blackbox_exporter.yml configuration is valid before resuming actions and starting the service:

  • If yes, resume actions
  • Otherwise, stop actions

So far, this is possible by launching /usr/local/bin/blackbox_exporter --config.check --config.file="/etc/blackbox_exporter.yml"
The output of this command looks like this:
ts=2023-08-22T09:40:19.081Z caller=main.go:87 level=info msg="Config file is ok exiting..."
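
A sketch of such a validation task, to run before notifying the restart handler (binary and config paths follow the command above; everything else is an assumption):

- name: Check blackbox_exporter configuration
  ansible.builtin.command: >-
    /usr/local/bin/blackbox_exporter
    --config.check
    --config.file=/etc/blackbox_exporter.yml
  changed_when: false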

prometheus.prometheus.prometheus does not gather facts

Hello,

It seems that the role is not gathering facts when I try to run it; here is an extract of the run:

PLAY [Monitoring] ***************************

TASK [prometheus.prometheus.prometheus : Validating arguments against arg spec 'main' - Installs and configures prometheus] ***********
fatal: [status.example.com]: FAILED! => 
  msg: 'The task includes an option with an undefined variable. The error was: {{ go_arch_map[ansible_architecture] | default(ansible_architecture) }}: ''ansible_architecture'' is undefined. ''ansible_architecture'' is undefined. {{ go_arch_map[ansible_architecture] | default(ansible_architecture) }}: ''ansible_architecture'' is undefined. ''ansible_architecture'' is undefined. {{ go_arch_map[ansible_architecture] | default(ansible_architecture) }}: ''ansible_architecture'' is undefined. ''ansible_architecture'' is undefined. {{ go_arch_map[ansible_architecture] | default(ansible_architecture) }}: ''ansible_architecture'' is undefined. ''ansible_architecture'' is undefined'

It's weird because for every other role I use, there is a fact-gathering stage, for example:

PLAY [Install basic packages] *******************

TASK [Gathering Facts] ************************
ok: [example.com]

 [...]

I've tried to add the gather_facts: true option to the play, but it doesn't change anything.

Did I miss something with the setup?

Here is the complete play:

- name: Monitoring
  hosts: status
  become: true
  gather_facts: true
  roles:
    - prometheus.prometheus.prometheus
    - prometheus.prometheus.node_exporter
    - prometheus.prometheus.mysqld_exporter
    - prometheus.prometheus.systemd_exporter
    - cloudalchemy.grafana
    - geerlingguy.certbot
    - geerlingguy.nginx
  tags:
    - stats

Thanks,

node_exporter: Basic Auth Password not properly generated

I'm using node_exporter with authentication, with this repository installed manually (see #18, commit 1e41dec):

- name: Setup node exporter
  import_role:
    name: prometheus.prometheus.node_exporter
  vars:
    node_exporter_version: 1.5.0
    node_exporter_basic_auth_users:
      prometheus: "toto"
    node_exporter_web_telemetry_path: "/node-exporter"

This was working in the past, but now the hashed password is replaced with *0:

TASK [prometheus.prometheus.node_exporter : Copy the node_exporter config file] ****************************************************************************************************************************************************************************************************
--- before: /etc/node_exporter/config.yaml
+++ after: /home/nereis/.ansible/tmp/ansible-local-15437bdzf6l09/tmpz9j85ndj/config.yaml.j2
@@ -5,4 +5,4 @@
 
 
 basic_auth_users:
-  prometheus: $2b...
+  prometheus: *0
 cat /etc/node_exporter/config.yaml
---
#
# Ansible managed
#


basic_auth_users:
  prometheus: *0

node-exporter v1.5.0
ansible 2.10.8
python 3.10.6

Thanks!

consider not modifying the node exporter user if it already exists

Using the node_exporter role without fully checking all the steps (mea culpa), I was astonished to find out that it had reduced the user I wanted to run node_exporter as to a no-login, no-home system user. This then caused issues with other services that run under this user's name and very much expect to have a home directory of their own.

I think the role should check whether the user already exists and not modify the user if so.

The service file should then also not include ProtectHome=yes, as this would effectively prevent the service from accessing its own home directory, where, in my case, all the .prom metric files are written.
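
A possible guard, sketched with getent (the variable follows the role's naming; the exact task layout is an assumption):

- name: Check whether the node_exporter user already exists
  ansible.builtin.getent:
    database: passwd
    key: "{{ node_exporter_system_user }}"
    fail_key: false

- name: Create the node_exporter user
  ansible.builtin.user:
    name: "{{ node_exporter_system_user }}"
    system: true
    shell: /usr/sbin/nologin
    create_home: false
  when: getent_passwd[node_exporter_system_user] is none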

FAILED! => {"argument_errors": ["Invalid type dict for option '{'cert_file':

Dear community,

Migrating from cloudalchemy.node_exporter and a previous ansible version, my playbook fails to validate node_exporter_tls_server_config.

Regards

Olivier

ogrosjeanne@bastion:~/ansible$ ansible --version
ansible [core 2.14.3]
  config file = /etc/ansible/ansible.cfg
  configured module search path = ['/home/ogrosjeanne/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python3/dist-packages/ansible
  ansible collection location = /home/ogrosjeanne/.ansible/collections:/usr/share/ansible/collections
  executable location = /usr/bin/ansible
  python version = 3.10.6 (main, Nov 14 2022, 16:10:14) [GCC 11.3.0] (/usr/bin/python3)
  jinja version = 3.0.3
  libyaml = True
ogrosjeanne@bastion:~/ansible$ more prometheus-test.yml 
- hosts: localhost
  connection: local 
  roles:
    - prometheus.prometheus.node_exporter
  vars:
    node_exporter_tls_server_config:
      cert_file: /etc/node_exporter/certificate.cer
      key_file: /etc/node_exporter/privateKey.pem
ogrosjeanne@bastion:~/ansible$ ansible-playbook prometheus-test.yml 

PLAY [localhost] *********************************************************************************************************************************************************************************************************

TASK [Gathering Facts] ***************************************************************************************************************************************************************************************************
ok: [localhost]

TASK [prometheus.prometheus.node_exporter : Validating arguments against arg spec 'main' - Prometheus Node Exporter] *****************************************************************************************************
fatal: [localhost]: FAILED! => {"argument_errors": ["Invalid type dict for option '{'cert_file': '/etc/node_exporter/certificate.cer', 'key_file': '/etc/node_exporter/privateKey.pem'}', elements value check is supported only with 'list' type", "Invalid type dict for option '{}', elements value check is supported only with 'list' type", "Invalid type dict for option '{}', elements value check is supported only with 'list' type"], "argument_spec_data": {"node_exporter_basic_auth_users": {"description": "Dictionary of users and password for basic authentication. Passwords are automatically hashed with bcrypt.", "elements": "str", "type": "dict"}, "node_exporter_binary_local_dir": {"description": ["Enables the use of local packages instead of those distributed on github.", "The parameter may be set to a directory where the C(node_exporter) binary is stored on the host where ansible is run.", "This overrides the I(node_exporter_version) parameter"]}, "node_exporter_disabled_collectors": {"description": ["List of disabled collectors.", "By default node_exporter disables collectors listed L(here,https://github.com/prometheus/node_exporter#disabled-by-default)."], "elements": "str", "type": "list"}, "node_exporter_enabled_collectors": {"default": ["systemd", {"textfile": {"directory": "/var/lib/node_exporter"}}], "description": ["List of dicts defining additionally enabled collectors and their configuration.", "It adds collectors to L(those enabled by default,https://github.com/prometheus/node_exporter#enabled-by-default)."], "type": "list"}, "node_exporter_http_server_config": {"description": ["Config for HTTP/2 support.", "Keys and values are the same as in L(node_exporter docs,https://github.com/prometheus/node_exporter/blob/master/https/README.md#sample-config)."], "elements": "str", "type": "dict"}, "node_exporter_textfile_dir": {"default": "/var/lib/node_exporter", "description": ["Directory used by the L(Textfile Collector,https://github.com/prometheus/node_exporter#textfile-collector).", "To get permissions to write metrics in this directory, users must be in C(node-exp) system group.", "B(Note:) More information in TROUBLESHOOTING.md guide."]}, "node_exporter_tls_server_config": {"description": ["Configuration for TLS authentication.", "Keys and values are the same as in L(node_exporter docs,https://github.com/prometheus/node_exporter/blob/master/https/README.md#sample-config)."], "elements": "str", "type": "dict"}, "node_exporter_version": {"default": "1.1.2", "description": "Node exporter package version. Also accepts latest as parameter."}, "node_exporter_web_listen_address": {"default": "0.0.0.0:9100", "description": "Address on which node exporter will listen"}, "node_exporter_web_telemetry_path": {"default": "/metrics", "description": "Path under which to expose metrics"}}, "changed": false, "msg": "Validation of arguments failed:\nInvalid type dict for option '{'cert_file': '/etc/node_exporter/certificate.cer', 'key_file': '/etc/node_exporter/privateKey.pem'}', elements value check is supported only with 'list' type\nInvalid type dict for option '{}', elements value check is supported only with 'list' type\nInvalid type dict for option '{}', elements value check is supported only with 'list' type", "validate_args_context": {"argument_spec_name": "main", "name": "node_exporter", "path": "/home/ogrosjeanne/.ansible/collections/ansible_collections/prometheus/prometheus/roles/node_exporter", "type": "role"}}

PLAY RECAP ***************************************************************************************************************************************************************************************************************
localhost                  : ok=1    changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0   

"/usr/bin/tar" detected as tar type bsd. GNU tar required.

My host is a MacBook Pro with an M1 chip.

I installed the Ansible collection prometheus.prometheus 0.5.1.

I am trying to install node_exporter on a target machine running Ubuntu 22.04 ARM64 in Parallels Desktop with this playbook:

- name: Install Prometheus Node Exporter
  hosts: hm-ubuntu
  roles:
    - role: prometheus.prometheus.node_exporter

However, I got this error:

export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=true && \
ansible-playbook --inventory=inventory.yaml --vault-password-file=~/.ansible_vault_pass hm_ubuntu_group/playbook.yml
[WARNING]: Collection prometheus.prometheus does not support Ansible version 2.14.5

PLAY [Install Prometheus Node Exporter] **********************************************************************************************************************

TASK [Gathering Facts] ***************************************************************************************************************************************
ok: [hm-ubuntu]

TASK [prometheus.prometheus.node_exporter : Validating arguments against arg spec 'main' - Prometheus Node Exporter] *****************************************
ok: [hm-ubuntu]

TASK [prometheus.prometheus.node_exporter : Preflight] *******************************************************************************************************
included: /Users/hongbo-miao/.ansible/collections/ansible_collections/prometheus/prometheus/roles/node_exporter/tasks/preflight.yml for hm-ubuntu

TASK [prometheus.prometheus.node_exporter : Assert usage of systemd as an init system] ***********************************************************************
ok: [hm-ubuntu] => {
    "changed": false,
    "msg": "All assertions passed"
}

TASK [prometheus.prometheus.node_exporter : Install package fact dependencies] *******************************************************************************
ok: [hm-ubuntu]

TASK [prometheus.prometheus.node_exporter : Gather package facts] ********************************************************************************************
ok: [hm-ubuntu]

TASK [prometheus.prometheus.node_exporter : Naive assertion of proper listen address] ************************************************************************
ok: [hm-ubuntu] => {
    "changed": false,
    "msg": "All assertions passed"
}

TASK [prometheus.prometheus.node_exporter : Assert collectors are not both disabled and enabled at the same time] ********************************************
skipping: [hm-ubuntu]

TASK [prometheus.prometheus.node_exporter : Assert that TLS key and cert path are set] ***********************************************************************
skipping: [hm-ubuntu]

TASK [prometheus.prometheus.node_exporter : Check existence of TLS cert file] ********************************************************************************
skipping: [hm-ubuntu]

TASK [prometheus.prometheus.node_exporter : Check existence of TLS key file] *********************************************************************************
skipping: [hm-ubuntu]

TASK [prometheus.prometheus.node_exporter : Assert that TLS key and cert are present] ************************************************************************
skipping: [hm-ubuntu]

TASK [prometheus.prometheus.node_exporter : Check if node_exporter is installed] *****************************************************************************
ok: [hm-ubuntu]

TASK [prometheus.prometheus.node_exporter : Gather currently installed node_exporter version (if any)] *******************************************************
skipping: [hm-ubuntu]

TASK [prometheus.prometheus.node_exporter : Discover latest version] *****************************************************************************************
skipping: [hm-ubuntu]

TASK [prometheus.prometheus.node_exporter : Get checksum list from github] ***********************************************************************************
ok: [hm-ubuntu]

TASK [prometheus.prometheus.node_exporter : Get checksum for arm64] ******************************************************************************************
skipping: [hm-ubuntu] => (item=8aa0c275795b6812cdda6e3bca83b6412ea1c80ef1c7c2ceb364982b6f1a5d87  node_exporter-1.6.0.darwin-amd64.tar.gz)
skipping: [hm-ubuntu] => (item=7789559f3f12322e400741fc549bc71ee9c803f45fa5f64111ba67c51bd81bb4  node_exporter-1.6.0.darwin-arm64.tar.gz)
skipping: [hm-ubuntu] => (item=174a6586ee1376c869665cf736c6c44232c7d7a5305a1458e85f0741065e2b51  node_exporter-1.6.0.linux-386.tar.gz)
skipping: [hm-ubuntu] => (item=0b3573f8a7cb5b5f587df68eb28c3eb7c463f57d4b93e62c7586cb6dc481e515  node_exporter-1.6.0.linux-amd64.tar.gz)
ok: [hm-ubuntu] => (item=eb2f24626eca824c077cc7675d762bd520161c5c1a3f33c57b4b8aa0d452d613  node_exporter-1.6.0.linux-arm64.tar.gz)
skipping: [hm-ubuntu] => (item=cf47d88be69d3b40425e940ff3b05aa688eab3afa179419038be80631ca13061  node_exporter-1.6.0.linux-armv5.tar.gz)
skipping: [hm-ubuntu] => (item=6dbf0eaaefb9d865bcfa9b5dcf831f8659c71d8db87c7c489e1279c106c9c01a  node_exporter-1.6.0.linux-armv6.tar.gz)
skipping: [hm-ubuntu] => (item=e050ec02091de91ab0f5d5164f685acc972616a78504eaa2597742945a5cf3b7  node_exporter-1.6.0.linux-armv7.tar.gz)
skipping: [hm-ubuntu] => (item=b52b843e2d12b0dfd9a74bbd71f55ba3d4cb7f19bdfb69f714bf6f12fd17fb07  node_exporter-1.6.0.linux-mips.tar.gz)
skipping: [hm-ubuntu] => (item=db8cc9bfc7ade29d1c72ea4ae36d1da1e7102f1fc321bf03207e93152f29be5c  node_exporter-1.6.0.linux-mips64.tar.gz)
skipping: [hm-ubuntu] => (item=a10057f7be023bdfa88370d51348e26fcbaf78f38cb29da4b667d9e27010fa96  node_exporter-1.6.0.linux-mips64le.tar.gz)
skipping: [hm-ubuntu] => (item=f9e86c21e2dcee81aa5f419f8371936b533b177bcf1733accceb543364867f72  node_exporter-1.6.0.linux-mipsle.tar.gz)
skipping: [hm-ubuntu] => (item=db2b7e7f75a6fbe7078c88c3d9dc983ae3c0dc1e3a5757cc25fbd3ddc7bb9118  node_exporter-1.6.0.linux-ppc64.tar.gz)
skipping: [hm-ubuntu] => (item=c5621160e89be6aef86049b727fc855d41788b5ec0e348925c62038304409f1d  node_exporter-1.6.0.linux-ppc64le.tar.gz)
skipping: [hm-ubuntu] => (item=8b2cb5213342fe9c72e2fabea2cb16897fda25360d7b32387046f4986a23f9a9  node_exporter-1.6.0.linux-s390x.tar.gz)
skipping: [hm-ubuntu] => (item=4192047557d9bfc54967f24ff239812d86f9a62c84be6f589a04b6e205833c2b  node_exporter-1.6.0.netbsd-386.tar.gz)
skipping: [hm-ubuntu] => (item=9edc0c688471862895554b64aebb727b30f14839061654f141ca3fbc8d436c05  node_exporter-1.6.0.netbsd-amd64.tar.gz)
skipping: [hm-ubuntu] => (item=550a3f39ec021045f9b96d3c5a5ac0c70a32cab7b5e10fa61a296ea8666363ba  node_exporter-1.6.0.openbsd-amd64.tar.gz)

TASK [prometheus.prometheus.node_exporter : Install] *********************************************************************************************************
included: /Users/hongbo-miao/.ansible/collections/ansible_collections/prometheus/prometheus/roles/node_exporter/tasks/install.yml for hm-ubuntu

TASK [prometheus.prometheus.node_exporter : Create the node_exporter group] **********************************************************************************
ok: [hm-ubuntu]

TASK [prometheus.prometheus.node_exporter : Create the node_exporter user] ***********************************************************************************
ok: [hm-ubuntu]

TASK [prometheus.prometheus.node_exporter : Download node_exporter binary to local folder] *******************************************************************
ok: [hm-ubuntu -> localhost]

TASK [prometheus.prometheus.node_exporter : Unpack node_exporter binary] *************************************************************************************
fatal: [hm-ubuntu -> localhost]: FAILED! => {"changed": false, "msg": "Failed to find handler for \"/Users/hongbo-miao/.ansible/tmp/ansible-tmp-1692123515.7587788-28109-144992350357591/source\". Make sure the required command to extract the file is installed.\nCommand \"/usr/bin/tar\" detected as tar type bsd. GNU tar required.\nCommand \"/usr/bin/unzip\" could not handle archive:   End-of-central-directory signature not found.  Either this file is not\n  a zipfile, or it constitutes one disk of a multi-part archive.  In the\n  latter case the central directory and zipfile comment will be found on\n  the last disk(s) of this archive.\nnote:  /Users/hongbo-miao/.ansible/tmp/ansible-tmp-1692123515.7587788-28109-144992350357591/source may be a plain executable, not an archive\nunzip:  cannot find zipfile directory in one of /Users/hongbo-miao/.ansible/tmp/ansible-tmp-1692123515.7587788-28109-144992350357591/source or\n        /Users/hongbo-miao/.ansible/tmp/ansible-tmp-1692123515.7587788-28109-144992350357591/source.zip, and cannot find /Users/hongbo-miao/.ansible/tmp/ansible-tmp-1692123515.7587788-28109-144992350357591/source.ZIP, period.\n"}

PLAY RECAP ***************************************************************************************************************************************************
hm-ubuntu                  : ok=14   changed=0    unreachable=0    failed=1    skipped=7    rescued=0    ignored=0

This is my host macOS unzip version:

➜ unzip -v
UnZip 6.00 of 20 April 2009, by Info-ZIP.  Maintained by C. Spieler.  Send
bug reports using http://www.info-zip.org/zip-bug.html; see README for details.

Latest sources and executables are at ftp://ftp.info-zip.org/pub/infozip/ ;
see ftp://ftp.info-zip.org/pub/infozip/UnZip.html for other sites.

Compiled with gcc Apple LLVM 14.0.3 (clang-1403.0.22.11) [+internal-os] for Unix Mac OS X on Apr 14 2023.

UnZip special compilation options:
        COPYRIGHT_CLEAN (PKZIP 0.9x unreducing method not supported)
        SET_DIR_ATTRIB
        SYMLINKS (symbolic links supported, if RTL and file system permit)
        TIMESTAMP
        UNIXBACKUP
        USE_EF_UT_TIME
        USE_UNSHRINK (PKZIP/Zip 1.x unshrinking method supported)
        USE_DEFLATE64 (PKZIP 4.x Deflate64(tm) supported)
        LARGE_FILE_SUPPORT (large files over 2 GiB supported)
        ZIP64_SUPPORT (archives using Zip64 for large files supported)
        VMS_TEXT_CONV
        [decryption, version 2.11 of 05 Jan 2007]

UnZip and ZipInfo environment options:
           UNZIP:  [none]
        UNZIPOPT:  [none]
         ZIPINFO:  [none]
      ZIPINFOOPT:  [none]

This is my target Ubuntu unzip version:

parallels@ubuntu-linux-22-04-desktop:~$ unzip -v
UnZip 6.00 of 20 April 2009, by Debian. Original by Info-ZIP.

Latest sources and executables are at ftp://ftp.info-zip.org/pub/infozip/ ;
see ftp://ftp.info-zip.org/pub/infozip/UnZip.html for other sites.

Compiled with gcc 11.2.0 for Unix (Linux ELF).

UnZip special compilation options:
        ACORN_FTYPE_NFS
        COPYRIGHT_CLEAN (PKZIP 0.9x unreducing method not supported)
        SET_DIR_ATTRIB
        SYMLINKS (symbolic links supported, if RTL and file system permit)
        TIMESTAMP
        UNIXBACKUP
        USE_EF_UT_TIME
        USE_UNSHRINK (PKZIP/Zip 1.x unshrinking method supported)
        USE_DEFLATE64 (PKZIP 4.x Deflate64(tm) supported)
        UNICODE_SUPPORT [wide-chars, char coding: UTF-8] (handle UTF-8 paths)
        LARGE_FILE_SUPPORT (large files over 2 GiB supported)
        ZIP64_SUPPORT (archives using Zip64 for large files supported)
        USE_BZIP2 (PKZIP 4.6+, using bzip2 lib version 1.0.8, 13-Jul-2019)
        VMS_TEXT_CONV
        WILD_STOP_AT_DIR
        [decryption, version 2.11 of 05 Jan 2007]

UnZip and ZipInfo environment options:
           UNZIP:  [none]
        UNZIPOPT:  [none]
         ZIPINFO:  [none]
      ZIPINFOOPT:  [none]

Any help would be appreciated, thanks! ☺️

basic auth user don't work

Hi, I tried to use the config as mentioned, with the right packages on the remote system, but I get the following error from Ansible:

- hosts: public
  pre_tasks:
    - name: Install passlib python package
      ansible.builtin.pip:
        name: passlib[bcrypt]
    - name: Create node_exporter cert dir
      file:
        path: "/etc/node_exporter"
        state: directory
        owner: root
        group: root
    - name: Generate an OpenSSL private key with the default values (4096 bits, RSA)
      community.crypto.openssl_privatekey:
        path: /etc/node_exporter/tls.key
    - name: Generate an OpenSSL Certificate Signing Request
      community.crypto.openssl_csr:
        path: /etc/node_exporter/tls.csr
        privatekey_path: /etc/node_exporter/tls.key
        common_name: "{{ ansible_hostname}}"
    - name: Create cert and key
      community.crypto.x509_certificate:
        path: /etc/node_exporter/tls.cert
        csr_path: /etc/node_exporter/tls.csr
        privatekey_path: /etc/node_exporter/tls.key
        provider: selfsigned
  roles:
    - prometheus.prometheus.node_exporter
  vars:
    node_exporter_tls_server_config:
      cert_file: /etc/node_exporter/tls.cert
      key_file: /etc/node_exporter/tls.key
    node_exporter_basic_auth_users:
      export_user: "OOkxA0L1M7KLPgwedwedW9DPL8I0XwvAq65oI"
The full traceback is:
Traceback (most recent call last):
  File "/home/lukas/.local/pipx/venvs/ansible-base/lib/python3.11/site-packages/ansible/plugins/action/template.py", line 139, in run
    resultant = self._templar.do_template(template_data, preserve_trailing_newlines=True, escape_backslashes=False)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lukas/.local/pipx/venvs/ansible-base/lib/python3.11/site-packages/ansible/template/__init__.py", line 1066, in do_template
    res = j2_concat(rf)
          ^^^^^^^^^^^^^
  File "<template>", line 88, in root
  File "/home/lukas/.local/pipx/venvs/ansible-base/lib/python3.11/site-packages/ansible/template/__init__.py", line 264, in wrapper
    ret = func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^
  File "/home/lukas/.local/pipx/venvs/ansible-base/lib/python3.11/site-packages/ansible/plugins/filter/core.py", line 273, in get_encrypted_password
    return passlib_or_crypt(password, hashtype, salt=salt, salt_size=salt_size, rounds=rounds)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lukas/.local/pipx/venvs/ansible-base/lib/python3.11/site-packages/ansible/utils/encrypt.py", line 203, in passlib_or_crypt
    return CryptHash(algorithm).hash(secret, salt=salt, salt_size=salt_size, rounds=rounds)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lukas/.local/pipx/venvs/ansible-base/lib/python3.11/site-packages/ansible/utils/encrypt.py", line 100, in hash
    return self._hash(secret, salt, rounds)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lukas/.local/pipx/venvs/ansible-base/lib/python3.11/site-packages/ansible/utils/encrypt.py", line 119, in _hash
    result = crypt.crypt(secret, saltstring)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/crypt.py", line 86, in crypt
    return _crypt.crypt(word, salt)
           ^^^^^^^^^^^^^^^^^^^^^^^^
OSError: [Errno 22] Invalid argument
fatal: [ares.mldsc.de]: FAILED! => {
    "changed": false,
    "msg": "OSError: [Errno 22] Invalid argument"
}

Just to be sure: the passwords are hashed on the remote system, right?

Warning message does not support Ansible version 2.14.5

Using v0.4.0 of this collection, I get the following warning message. I believe this commit is meant to resolve the problem but maybe not.

$ ansible-playbook -i inventory site.yaml 
[WARNING]: Collection prometheus.prometheus does not support Ansible version 2.14.5

These are the versions I'm running, and I haven't run into any issues using v2.14.5.

$ ansible --version
ansible [core 2.14.5]
  config file = /home/jlosito/workspace/infrastructure/ansible.cfg
  configured module search path = ['/home/jlosito/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /home/jlosito/workspace/infrastructure/.venv/lib64/python3.11/site-packages/ansible
  ansible collection location = /home/jlosito/.ansible/collections:/usr/share/ansible/collections
  executable location = /home/jlosito/workspace/infrastructure/.venv/bin/ansible
  python version = 3.11.3 (main, Apr  5 2023, 00:00:00) [GCC 13.0.1 20230401 (Red Hat 13.0.1-0)] (/home/jlosito/workspace/infrastructure/.venv/bin/python3)
  jinja version = 3.1.2
  libyaml = True

Target hostname not being set on scrape config

Hi,

Any idea why, when installing Prometheus using this Ansible role, it sometimes sets the target to the right hostname and sometimes to localhost, as per the image below (on the same instance)?

image

prometheus_scrape_configs:
  - job_name: "prometheus"
    metrics_path: "{{ prometheus_metrics_path }}"
    static_configs:
      - targets:
          - "{{ ansible_fqdn | default(ansible_host) | default('localhost') }}:9090"

Unable to install snmp_node_exporter

Hi,

The snmp_exporter install seems broken. The task prometheus.prometheus.snmp_exporter : Download snmp_exporter binary to local folder downloads the binary to a "random" file ("dest": "/tmp/8490f100-eb94-11ea-87ba-182377c269c8").
The task prometheus.prometheus.snmp_exporter : Unpack snmp_exporter binary then wants to unpack the file from '/tmp/snmp_exporter-0.19.0.linux-amd64.tar.gz'.

TASK [prometheus.prometheus.snmp_exporter : Download snmp_exporter binary to local folder] ****************************************************************************************************************************************************************************************************************************************************
vendredi 05 mai 2023  16:01:30 +0200 (0:00:00.050)       0:01:17.256 ********** 
vendredi 05 mai 2023  16:01:30 +0200 (0:00:00.050)       0:01:17.256 ********** 
ok: [hnld-ptls-amazonl2-prometheus-dev01 -> localhost] => {
    "attempts": 1,
    "changed": false,
    "checksum_dest": "ff333c8409a9587720d2b55c31335f2290bc46c7",
    "checksum_src": "ff333c8409a9587720d2b55c31335f2290bc46c7",
    "dest": "/tmp/8490f100-eb94-11ea-87ba-182377c269c8",
    "elapsed": 1,
    "gid": 1000,
    "group": "seb",
    "md5sum": "5bccac10fb0d258148c488c30c786eca",
    "mode": "0644",
    "owner": "seb",
    "size": 7313658,
    "src": "/home/seb/.ansible/tmp/ansible-tmp-1683295290.88605-270186-112912528244913/tmpetxjw74o",
    "state": "file",
    "status_code": 200,
    "uid": 1000,
    "url": "https://github.com/prometheus/snmp_exporter/releases/download/v0.19.0/snmp_exporter-0.19.0.linux-amd64.tar.gz"
}

MSG:

OK (7313658 bytes)

TASK [prometheus.prometheus.snmp_exporter : Unpack snmp_exporter binary] **********************************************************************************************************************************************************************************************************************************************************************
vendredi 05 mai 2023  16:01:32 +0200 (0:00:01.678)       0:01:18.935 ********** 
vendredi 05 mai 2023  16:01:32 +0200 (0:00:01.679)       0:01:18.935 ********** 
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: If you are using a module and expect the file to exist on the remote, see the remote_src option
fatal: [hnld-ptls-amazonl2-prometheus-dev01 -> localhost]: FAILED! => {
    "changed": false
}

MSG:

Could not find or access '/tmp/snmp_exporter-0.19.0.linux-amd64.tar.gz' on the Ansible Controller.
If you are using a module and expect the file to exist on the remote, see the remote_src option

Permission required for task in node_exporter role

The Install package fact dependencies task in the preflight.yml task file of the node_exporter role needs to be run as root. I noticed this while installing node_exporter on my Ubuntu server.

The task should be changed from:

- name: Install package fact dependencies
  ansible.builtin.package:
    name: "{{ _pkg_fact_req }}"
    state: present
  when: (_pkg_fact_req)
  vars:
    _pkg_fact_req: "{% if (ansible_pkg_mgr == 'apt') %}\
                    {{ ('python-apt' if ansible_python_version is version('3', '<') else 'python3-apt') }}
                    {% else %}\
                    {% endif %}"

to:

- name: Install package fact dependencies
  become: true
  ansible.builtin.package:
    name: "{{ _pkg_fact_req }}"
    state: present
  when: (_pkg_fact_req)
  vars:
    _pkg_fact_req: "{% if (ansible_pkg_mgr == 'apt') %}\
                    {{ ('python-apt' if ansible_python_version is version('3', '<') else 'python3-apt') }}
                    {% else %}\
                    {% endif %}"

As a workaround you could include the role with root privileges:

- name: Install node-exporter
  ansible.builtin.include_role:
    name: prometheus.prometheus.node_exporter
    apply:
      become: true

Thank you!

unable to run node_exporter playbook when offline and SELinux enabled on RedHat

Background:
We are using RedHat with SELinux enabled in an offline environment.
When SELinux is enabled, the node_exporter playbook wants to make sure two packages are present: python3-libselinux and python3-policycoreutils.
Problem:
The task that checks for the presence of the packages always fails because Yum is unable to fetch metadata from the Yum repositories. Note that the packages are already installed, so why does Yum want to fetch repo metadata at all?

I've noticed that when I run "yum check-update" beforehand with internet temporarily enabled, the task does succeed afterwards when offline...

node_exporter "Create textfile collector dir" task - Consistent "change" state

Description
The "Create textfile collector dir" task in the node_exporter role for the Prometheus Ansible collection is consistently reporting a "changed" state upon every run. The task in question

Other system services generate their own metrics and place the corresponding metric files in the "{{ node_exporter_textfile_dir }}" directory. These services belong to the "{{ node_exporter_system_group }}" group, enabling them to change the group ownership of created metric files to the desired "{{ node_exporter_system_group }}". The node-exporter can read from these metric files, but the services are unable to change the ownership to the "{{ node_exporter_system_user }}". This results in a "changed" state for the "Create textfile collector dir" task after every run, as long as there are new or updated metric files within the "{{ node_exporter_textfile_dir }}" directory.

Problem
This issue is problematic because I monitor Ansible changes for security reasons and to detect any unexpected changes. Constant changes are undesirable and should not occur under normal circumstances.

Expected Outcome:
When all files and directories within the node_exporter_textfile_dir directory have the correct group set, no Ansible changes should be reported. The role should neither alter the "user" for files nor report any changes regarding this.

Solution Proposal
I have tried to come up with a suitable solution but have been unsuccessful. Unfortunately, there is no associated PR. Possible solutions could include:

  1. Setting recurse: false - This would set the desired ownership for the directory itself, allowing the user to manage and verify that text metric files have the correct group, which node-exporter can read from.
  2. Using changed: false - This option is also not ideal.

I am unsure how to address this issue and welcome any suggestions or proposals for a resolution.
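
A sketch of proposal 1, reusing the role's variable names (the existing task definition and mode are assumptions):

- name: Create textfile collector dir
  ansible.builtin.file:
    path: "{{ node_exporter_textfile_dir }}"
    state: directory
    owner: "{{ node_exporter_system_user }}"
    group: "{{ node_exporter_system_group }}"
    mode: "0775"
    recurse: false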

FAILED! => {"censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": false}

Today I started getting this error when running the prometheus.prometheus role; literally yesterday it was working.
Setting no_log to false or true makes no difference.

FAILED! => {"censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": false}

Ansible version that i'm using

# ansible --version
ansible [core 2.12.10]
  config file = /etc/ansible/ansible.cfg
  configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python3/dist-packages/ansible
  ansible collection location = /root/.ansible/collections:/usr/share/ansible/collections
  executable location = /usr/bin/ansible
  python version = 3.8.10 (default, Mar 13 2023, 10:26:41) [GCC 9.4.0]
  jinja version = 2.10.1
  libyaml = True

Inventory file

[prometheus]
prometheus.server.ip

playbook file:

---
- hosts: prometheus
  become: true
  roles:
    - prometheus.prometheus.prometheus
  vars:
    prometheus_version: 2.40.7
    prometheus_web_listen_address: "0.0.0.0:9090"
    prometheus_alert_rules_files: /etc/ansible/files/*.rules
    prometheus_web_config:
      tls_server_config: {}
      http_server_config: {}
      basic_auth_users: {}
    prometheus_alertmanager_config:
      - scheme: http
        basic_auth: {}
        static_configs:
          - targets: ["127.0.0.1:9093"]
    prometheus_targets:
      node-exporter:
        - targets:
            - targert:9100
          labels:
            labels: label
    prometheus_scrape_configs:
      - job_name: "prometheus"
        metrics_path: "{{ prometheus_metrics_path }}"
        static_configs:
          - targets:
              - "{{ ansible_fqdn | default(ansible_host) | default('localhost') }}:9090"
      - job_name: "node-exporter"
        file_sd_configs:
          - files:
              - "{{ prometheus_config_dir }}/file_sd/node-exporter.yml"

Update blackbox_exporter config

The blackbox_exporter role has a couple of config issues.

  • It puts the exporter config in /etc directly.
  • It is missing the web config file.

Reload snmp exporter failed

When using the prometheus.prometheus.snmp_exporter role, the Reload snmp exporter handler fails:

RUNNING HANDLER [prometheus.prometheus.snmp_exporter : Reload snmp exporter] ********************************************************************************
vendredi 05 mai 2023  17:51:43 +0200 (0:00:03.534)       0:01:37.901 ********** 
vendredi 05 mai 2023  17:51:43 +0200 (0:00:03.534)       0:01:37.901 ********** 
fatal: [hnld-ptls-amazonl2-prometheus-dev01]: FAILED! => {
    "changed": false
}

MSG:

failure 1 during daemon-reload: Failed to execute operation: The name org.freedesktop.PolicyKit1 was not provided by any .service files

skip installation and use Debian packages instead

So this role can install prometheus from a binary tarball on GitHub (which I feel is a questionable practice, but that's another issue). In our environments, we prefer packages vetted by third parties, namely for auto-upgrades and so on, so we rely on Debian packages.

It's possible to completely skip the install with prometheus_skip_install, but then the role will crash with:

[prometheus.debian.net]: FAILED! => {"changed": false, "checksum": "df2496d755394a2fd11e01e3b4bd566418adc796", "cmd": "/usr/local/bin/promtool check rules /home/anarcat/.ansible/tmp/ansible-tmp-1695839321.1357365-156317-255727393866530/source", "msg": "[Errno 2] No such file or directory: b'/usr/local/bin/promtool'", "rc": 2, "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}

That's because promtool is in /usr/bin/promtool and not /usr/local/bin/promtool as expected...
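
A possible workaround when relying on the Debian package, assuming the role still exposes a prometheus_binary_install_dir variable defaulting to /usr/local/bin (check the role defaults for your version):

- hosts: prometheus
  become: true
  roles:
    - prometheus.prometheus.prometheus
  vars:
    prometheus_skip_install: true
    # Debian ships prometheus and promtool in /usr/bin, not /usr/local/bin
    prometheus_binary_install_dir: /usr/bin

This would only fix the promtool lookup; the unit-file override discussed below is a separate problem.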

There are all sorts of issues like this around the package. For example, the prometheus.service unit file is overridden, poorly IMHO:

--- /usr/lib/systemd/system/prometheus.service	2023-02-04 00:20:19.000000000 +0000
+++ /etc/systemd/system/prometheus.service	2023-09-27 18:46:39.470561824 +0000
@@ -1,39 +1,53 @@
+#
+# Ansible managed
+#
+
 [Unit]
-Description=Monitoring system and time series database
-Documentation=https://prometheus.io/docs/introduction/overview/ man:prometheus(1)
-After=time-sync.target
+Description=Prometheus
+After=network-online.target
+Requires=local-fs.target
+After=local-fs.target
 
 [Service]
-Restart=on-failure
+Type=simple
+Environment="GOMAXPROCS=2"
 User=prometheus
-EnvironmentFile=/etc/default/prometheus
-ExecStart=/usr/bin/prometheus $ARGS
+Group=prometheus
 ExecReload=/bin/kill -HUP $MAINPID
-TimeoutStopSec=20s
-SendSIGKILL=no
+ExecStart=/usr/bin/prometheus \
+  --storage.tsdb.path=/var/lib/prometheus \
+  --storage.tsdb.retention.time=365d \
+  --storage.tsdb.retention.size=0 \
+  --web.config.file=/etc/prometheus/web.yml \
+  --web.console.libraries=/etc/prometheus/console_libraries \
+  --web.console.templates=/etc/prometheus/consoles \
+  --web.listen-address=0.0.0.0:9090 \
+  --web.external-url= \
+  --config.file=/etc/prometheus/prometheus.yml
 
-# systemd hardening-options
-AmbientCapabilities=
-CapabilityBoundingSet=
-DeviceAllow=/dev/null rw
-DevicePolicy=strict
-LimitMEMLOCK=0
-LimitNOFILE=32768
+CapabilityBoundingSet=CAP_SET_UID
+LimitNOFILE=65000
 LockPersonality=true
-MemoryDenyWriteExecute=true
 NoNewPrivileges=true
+MemoryDenyWriteExecute=true
 PrivateDevices=true
 PrivateTmp=true
+ProtectHome=true
+RemoveIPC=true
+RestrictSUIDSGID=true
+#SystemCallFilter=@signal @timer
+
+ReadWritePaths=/var/lib/prometheus
+
 PrivateUsers=true
 ProtectControlGroups=true
-ProtectHome=true
 ProtectKernelModules=true
 ProtectKernelTunables=true
-ProtectSystem=full
-RemoveIPC=true
-RestrictNamespaces=true
-RestrictRealtime=true
-SystemCallArchitectures=native
+ProtectSystem=strict
+
+
+SyslogIdentifier=prometheus
+Restart=always
 
 [Install]
 WantedBy=multi-user.target

Now, some of those differences are only due to the role's entries not being sorted the same way (if they are sorted at all), but some are real issues. Restart=always (or, for that matter, even =on-failure) in particular is dangerous, see https://bugs.debian.org/1022724

There should be a way to leave all of that stuff alone and focus on configuring the package instead...

[prometheus] - New scrape_config_files

Starting from version 2.43.0 of Prometheus, we have a new section available in the configuration file prometheus.yml called 'scrape_config_files,' where you can specify the paths of files containing the scrape_config settings.

Would it be possible to implement this by adding a 'scrape_config_files' attribute to the prometheus.yml.j2 template when a variable, for example 'prometheus_scrape_config_files', is defined? It could be a list of paths.
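
A minimal sketch of what the addition to prometheus.yml.j2 could look like, assuming a new list variable named prometheus_scrape_config_files (the name is only a suggestion):

{% if prometheus_scrape_config_files is defined and prometheus_scrape_config_files | length > 0 %}
scrape_config_files:
{%   for path in prometheus_scrape_config_files %}
  - {{ path }}
{%   endfor %}
{% endif %}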

[alertmanager] support multiple addresses with web.listen-address in service

The alertmanager binary supports multiple listen addresses, from --help:

      --web.listen-address=:9093 ...
                                 Addresses on which to expose metrics and web 
                                 interface. Repeatable for multiple addresses.

The current service template and variable does not seem to support this. I edited manually to check what it would need to look like:

ExecStart=/usr/local/bin/alertmanager \
  ...
  --web.listen-address=10.10.20.23:9093 \
  --web.listen-address=127.0.1.1:9093 \
  ...
$ netstat -nlptu | rg alertmanager | column -t
tcp  0  0  127.0.1.1:9093    0.0.0.0:*  LISTEN  105724/alertmanager
tcp  0  0  10.10.20.23:9093  0.0.0.0:*  LISTEN  105724/alertmanager

I guess for this change to be backwards compatible, the alertmanager_web_listen_address variable would need to accept either a single value or a list.
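
A sketch of a backwards-compatible fragment for the service template, assuming it replaces the current single --web.listen-address line: a plain string keeps today's behaviour, while a list expands to one flag per address.

{% if alertmanager_web_listen_address is string %}
  --web.listen-address={{ alertmanager_web_listen_address }} \
{% else %}
{%   for address in alertmanager_web_listen_address %}
  --web.listen-address={{ address }} \
{%   endfor %}
{% endif %}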

Ansible 2.15.2 and failing to run.

[WARNING]: Collection prometheus.prometheus does not support Ansible version
2.15.2

Running the 0.5.1 release of the collection. Is it checking the minor version of Ansible and failing on that?

Checksum validation fails if `latest` version requested

The error says:

TASK [prometheus.prometheus.node_exporter : Discover latest version] ***************************************************************************************************************************************************************************************************************
ok: [host] => {"ansible_facts": {"node_exporter_version": "1.5.0"}, "attempts": 1, "changed": false}

TASK [prometheus.prometheus.node_exporter : Get checksum list from github] *********************************************************************************************************************************************************************************************************
fatal: [host]: FAILED! => {"msg": "An unhandled exception occurred while running the lookup plugin 'url'. Error was a <class 'ansible.errors.AnsibleError'>, original message: Received HTTP error for https://github.com/prometheus/node_exporter/releases/download/vlatest/sha256sums.txt : HTTP Error 404: Not Found. Received HTTP error for https://github.com/prometheus/node_exporter/releases/download/vlatest/sha256sums.txt : HTTP Error 404: Not Found"}

As seen in the log above, the latest version was discovered correctly, but the checksum URL was still built with 'latest' instead of the resolved version.
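
Until the role handles 'latest' correctly, one possible workaround is to resolve the concrete version in the play's pre_tasks and not set node_exporter_version anywhere else, so the fact takes precedence over the role default (a sketch, assuming the controller can reach the GitHub API):

pre_tasks:
  - name: Resolve latest node_exporter release
    ansible.builtin.set_fact:
      node_exporter_version: >-
        {{ (lookup('ansible.builtin.url',
                   'https://api.github.com/repos/prometheus/node_exporter/releases/latest',
                   split_lines=False) | from_json).tag_name | regex_replace('^v', '') }}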

Tags are not working

I wanted to quickly update my Prometheus config using the tag "prometheus_config". The tagged run did nothing because the role uses "include_tasks", and tags on a dynamic include are not inherited by the included tasks.

I see 2 solutions:

  1. Convert include_tasks to import_tasks.
  2. Add apply: {tags: ...} to all include_tasks calls.

Is there a reason why this collection prefers include_tasks?
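
A sketch of option 2, applied to one of the role's include_tasks calls (the file and tag names here are illustrative): apply pushes the tag down to the included tasks, and tagging the include itself keeps it from being skipped on a --tags run.

- name: Configure prometheus
  ansible.builtin.include_tasks:
    file: configure.yml
    apply:
      tags: prometheus_config
  tags:
    - prometheus_config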

Reevaluate usage of user "nobody"

This affects at least the systemd service file for the snmp_exporter.

According to https://wiki.ubuntu.com/nobody the 'nobody' user should only be used for NFS, and daemons should not use this user, as multiple daemons running as it could affect each other.

Against nobody:
https://en.wikipedia.org/wiki/Nobody_(username)
Answers from platforms like Ask Ubuntu, unix.stackexchange.com, etc.
Systemd logs the following message: /etc/systemd/system/snmp_exporter.service:8: Special user nobody configured, this is not safe!

"For" nobody:
https://wiki.debian.org/SystemGroups#Groups_with_an_associated_user

Alternatives:

  • create a real user (without home etc.) and set it in the service file
  • use systemd's DynamicUser=yes feature, which allocates a transient user at service start for the duration of the service (see the sketch after this list)
    • https://www.freedesktop.org/software/systemd/man/systemd.exec.html#DynamicUser=
    • should be usable as the service does not seem to need any filesystem access, except for the globally readable config file
      • DynamicUser= implies ProtectSystem=strict, which prohibits nearly all write access to the filesystem
    • does not create a new entry for the user in the /etc/group and /etc/passwd files
      • less/no traces left after uninstalling, except for maybe the config file
    • At least available since Systemd 232: https://0pointer.net/blog/dynamic-users-with-systemd.html
    • "works on my system" without any problems so far (Raspberry Pi 4B, latest Raspberry Pi OS, based on Debian 11)
    • caveats:
      • not tested for all systems
      • ?
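
A minimal sketch of the DynamicUser variant of the snmp_exporter unit (binary and config paths are assumptions; hardening options trimmed to the relevant directives):

[Unit]
Description=Prometheus SNMP Exporter
After=network-online.target

[Service]
Type=simple
# Allocates a transient user/group for the lifetime of the service;
# implies ProtectSystem=strict and PrivateTmp=yes, so no User=/Group= is needed.
DynamicUser=yes
ExecStart=/usr/local/bin/snmp_exporter \
  --config.file=/etc/snmp_exporter/snmp.yml
Restart=on-failure

[Install]
WantedBy=multi-user.target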
