nagflux's Introduction

Nagflux

A connector which transforms performance data from Nagios/Icinga(2)/Naemon to InfluxDB/Elasticsearch

Nagflux collects data from the NagiosSpoolfileFolder and adds information from Livestatus. This data is sent to InfluxDB, where it can be displayed by Grafana. The companion tool Histou lets you add templates to Grafana for this purpose.

Nagflux fills the same role as the process_perfdata.pl script from PNP4Nagios.

Dependencies

Golang 1.5+

Install

go get -u github.com/griesbacher/nagflux
go build github.com/griesbacher/nagflux

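An x86-64 Linux binary is attached to the releases. Here is the link to the latest Release.

Note: recent Go toolchains no longer support go get for installing commands outside a module. On those versions, go install github.com/griesbacher/nagflux@latest is the suggested replacement.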

Configure

Here are some of the important config options:

| Section | Config-Key | Meaning |
| --- | --- | --- |
| main | NagiosSpoolfileFolder | The folder where Nagios/Icinga writes its spoolfiles. Icinga2: /var/spool/icinga2/perfdata |
| main | NagfluxSpoolfileFolder | A folder where you can drop files in InfluxDB line protocol syntax; they will be shipped to InfluxDB. The timestamp has to be in ms. |
| main | FieldSeparator | The character used to separate the logical parts of the table names. It has to be a character that is not allowed in any of these: hostname, service name, command, perfdata. |
| main | FileBufferSize | The size of the buffer used to read files from disk. If you have huge checks or a lot of them, you may receive error messages that the buffer is too small; that is the point to raise it. |
| Log | MinSeverity | INFO is the default and enough for most setups. DEBUG gives you a lot more data, but most of it is noise. |
| InfluxDBGlobal | Version | Currently the only supported version of InfluxDB is 0.9+ |
| Influx "name" | Address | The URL of the InfluxDB API |
| Influx "name" | Arguments | Here you can set your user name and password as well as the database. The precision has to be ms! |
| Influx "name" | NastyString/NastyStringToReplace | These keys work around a bug in InfluxDB and should disappear when the bug is fixed |
| Influx "name" | StopPullingDataIfDown | Tells Nagflux to stop reading new data while this InfluxDB is down. That's useful if you're using spoolfiles. If you're using Gearman, always set this to false, because by default Gearman will not buffer the data endlessly. |
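Because both NagfluxSpoolfileFolder and the Arguments key insist on millisecond timestamps, here is a minimal Go sketch of producing one line in InfluxDB line protocol from a Nagios timestamp in seconds. The function name formatPoint and the single field named value are assumptions made for this illustration; this is not Nagflux's own code, and the measurement name is written without escaping.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// tagEscaper escapes the characters that line protocol treats as special
// inside tag keys and tag values: commas, equals signs and spaces.
var tagEscaper = strings.NewReplacer(",", `\,`, "=", `\=`, " ", `\ `)

// formatPoint (hypothetical helper) renders one point with a single float
// field named "value". unixSeconds is converted to milliseconds, the
// precision the README asks for. The measurement is assumed to need no escaping.
func formatPoint(measurement string, tags map[string]string, value float64, unixSeconds int64) string {
	keys := make([]string, 0, len(tags))
	for k := range tags {
		keys = append(keys, k)
	}
	sort.Strings(keys) // stable output regardless of map iteration order

	var b strings.Builder
	b.WriteString(measurement)
	for _, k := range keys {
		b.WriteString("," + tagEscaper.Replace(k) + "=" + tagEscaper.Replace(tags[k]))
	}
	b.WriteString(" value=" + strconv.FormatFloat(value, 'f', -1, 64))
	b.WriteString(" " + strconv.FormatInt(unixSeconds*1000, 10))
	return b.String()
}

func main() {
	tags := map[string]string{"host": "localhost", "service": "Swap"}
	fmt.Println(formatPoint("metrics", tags, 974, 1487770865))
}
```

A file made of such lines could then be dropped into the NagfluxSpoolfileFolder; the tag names used here mirror the ones that appear in the issue reports further down.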

Start

If the configfile is in the same folder as the executable:

./nagflux

Otherwise:

./nagflux -configPath=/path/to/config.gcfg

Debugging

  • If InfluxDB is not available, Nagflux stops and writes a log entry.
  • If Livestatus is not available, Nagflux just writes a log entry, but additional information can't be gathered.
  • If any part of the table name is not valid for InfluxDB, a log entry is written and the data is written to a file which has the same name as the logfile, just with the ending '.dump-errors'. You can fix the errors by hand and copy the lines into the NagfluxSpoolfileFolder.
  • If the data can't be sent to InfluxDB, Nagflux also writes it to the '.dump-errors' file; you can handle it the same way.
  • If the logs show files being read (in DEBUG mode) but nothing is going into InfluxDB, check the perfdata template to ensure it matches the OMD format. See Perfdata Template for more details.

Dataflow

There are basically two ways for Nagflux to receive data:

  • Spoolfiles: useful if Nagflux is running on the same machine as Nagios
  • Gearman: the way to go if you have a distributed setup

Either way you can enrich your performance data with additional information from Livestatus, such as downtimes and notifications.

Targets can be:

  • InfluxDB, that's the main target and the reason for this project.
  • Elasticsearch, more a proof of concept, but it worked some time ago ;)
  • JSON, so the data can be parsed by a third-party tool.

Dataflow Image

OMD

Nagflux is fully integrated in OMD-Labs, as is Histou. So if you want to try it out, it may be easier to install OMD-Labs.

Perfdata Template

Nagflux supports a couple of perfdata templates (see main_test.go for some supported formats). By default it assumes you have the OMD format template. If you are setting this up manually (not using OMD), please ensure your perfdata template is as follows:

Host

DATATYPE::HOSTPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tHOSTPERFDATA::$HOSTPERFDATA$\tHOSTCHECKCOMMAND::$HOSTCHECKCOMMAND$

Service

DATATYPE::SERVICEPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tSERVICEDESC::$SERVICEDESC$\tSERVICEPERFDATA::$SERVICEPERFDATA$\tSERVICECHECKCOMMAND::$SERVICECHECKCOMMAND$

If you are using Nagios the default templates will not work. Use the above templates with config host_perfdata_file_template and service_perfdata_file_template, respectively.
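To make the template format concrete, here is a minimal Go sketch of how a line written with the templates above can be split back into its fields: the fields are tab-separated, and each field is a KEY::value pair. The function name parseSpoolLine is hypothetical and this is not Nagflux's actual parser, only an illustration of the format.

```go
package main

import (
	"fmt"
	"strings"
)

// parseSpoolLine (hypothetical helper) splits one spoolfile line into
// KEY -> value pairs. Fields are tab-separated; key and value are
// separated by the first "::" in the field.
func parseSpoolLine(line string) map[string]string {
	fields := make(map[string]string)
	for _, part := range strings.Split(strings.TrimRight(line, "\r\n"), "\t") {
		kv := strings.SplitN(part, "::", 2)
		if len(kv) != 2 || kv[0] == "" {
			continue // not a KEY::value pair, skip it
		}
		fields[kv[0]] = kv[1]
	}
	return fields
}

func main() {
	line := "DATATYPE::SERVICEPERFDATA\tTIMET::1487771589\tHOSTNAME::localhost\t" +
		"SERVICEDESC::Swap\tSERVICEPERFDATA::swap=974MB;308;205;0;1027\t" +
		"SERVICECHECKCOMMAND::check_local_swap!30%!20%"
	f := parseSpoolLine(line)
	fmt.Println(f["HOSTNAME"], f["SERVICEDESC"], f["SERVICEPERFDATA"])
}
```

If a template uses a different separator than a tab, or omits the KEY:: prefixes as the Nagios default templates do, a split like this finds no usable fields, which is consistent with the advice to use the templates above.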

DEMO

This Docker container contains OMD, with everything preconfigured to use Nagflux/Histou/Grafana/InfluxDB: https://github.com/Griesbacher/docker-omd-grafana

Presentations

  • Here is a presentation I gave about Nagflux and Histou in 2016 (German only, sorry): Slides
  • This is the first one, from 2015, also German only: Slides - Video

nagflux's People

Contributors

britcey, gitmopp, griesbacher, sni, u238

nagflux's Issues

Configuration for timestamp of passive check

I want to import historical performance data with passive checks, for example:
echo "[1566572900] PROCESS_SERVICE_CHECK_RESULT;host1;weather;0;OK|rainintensity=5.000mm/h">/opt/omd/sites/test/tmp/run/naemon.cmd
for the time 23.08.2019 - 17:08:20

But in the nagflux database the entry has the current time.
I assume the reason is the macro $TIMET$ in nagios_nagflux.cfg:

etc/nagflux/nagios_nagflux.cfg: service_perfdata_file_template=DATATYPE::SERVICEPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tSERVICEDESC::$SERVICEDESC$\tSERVICEPERFDATA::$SERVICEPERFDATA$\tSERVICECHECKCOMMAND::$SERVICECHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$\tSERVICESTATE::$SERVICESTATE$\tSERVICESTATETYPE::$SERVICESTATETYPE$

The resulting data is:

var/pnp4nagios/service-perfdata: DATATYPE::SERVICEPERFDATA TIMET::1566639139 HOSTNAME::host1 SERVICEDESC::weather SERVICEPERFDATA::rainintensity=5.000mm/h SERVICECHECKCOMMAND::check_dummy HOSTSTATE::UP HOSTSTATETYPE::HARD SERVICESTATE::OK SERVICESTATETYPE::HARD
1566639139 was the current time: 24.08.2019 - 11:32:19

How can the service_perfdata_file_template be changed to use the timestamp provided with the passive service check?

I use OMD-Labs 3.1.

Nagios XI: The notification type is unkown:HOST NOTIFICATION

Nagios XI 5.2.9 with check_mk livestatus 1.2.6p12
Nagflux reports:
Warn: Could not detect livestatus type, with version: 1.2.6p12 asuming Nagios

Collecting of HOST notification fails:
2016-06-17 12:47:39 Warn: The notification type is unkown:HOST NOTIFICATION
2016-06-17 12:47:39 Warn: HOST NOTIFICATION, undefinded linelenght: 8 Line:[HOST NOTIFICATION 1466160387 testuser123 [1466160387] HOST NOTIFICATION: testuser123 My_Router UP notify-host-by-email OK - 10.206.173.121: rta 48.516ms, lost 0%]

In Collector.go line 200 the data structure for a length of 8 fields is missing. I don't know which fields Nagflux expects.

By the way there is a typo linelenght -> line length.

Nagflux doesn't work with InfluxDB over HTTPS and a self-signed certificate

Hi,

Config:

  • InfluxDB 0.13 / HTTPS API configured with a self-signed certificate (InfluxDB on a remote host)

Built from GitHub, Nagflux can't connect to InfluxDB because the Go HTTP client refuses such an HTTPS endpoint.

I managed to add a bit of code to skip the certificate check:

--- a/target/influx/Connector.go
+++ b/target/influx/Connector.go
@@ -1,6 +1,7 @@
 package influx

 import (
+	"crypto/tls"
 	"encoding/json"
 	"github.com/griesbacher/nagflux/collector"
 	"github.com/griesbacher/nagflux/data"
@@ -43,7 +44,15 @@ func ConnectorFactory(jobs chan collector.Printable, connectionHost, connectionA
 		}
 	}
 	timeout := time.Duration(5 * time.Second)
-	client := http.Client{Timeout: timeout}
+	tlsConfig := &tls.Config{
+		InsecureSkipVerify: true,
+	}
+	tr := &http.Transport{
+		TLSClientConfig: tlsConfig,
+	}
+	client := http.Client{Timeout: timeout, Transport: tr}

After that, the Nagflux connection went well, but Nagflux failed to create the database:

2016-08-23 12:15:55 Info: good
2016-08-23 12:15:55 Info: Influxdb running
2016-08-23 12:16:05 Panic: Database does not exists and was not able to created
panic: Database does not exists and was not able to created

Any idea how to get HTTPS working?

jfr
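Skipping verification with InsecureSkipVerify disables all certificate checks. As a hedged alternative, the following Go sketch builds an HTTP client that trusts one specific PEM-encoded certificate instead. The function name newInfluxClient is hypothetical, and Nagflux is not known to expose such an option; this only illustrates the standard library calls involved.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"net/http"
	"time"
)

// newInfluxClient (hypothetical helper) returns an HTTP client that verifies
// the server against the given PEM-encoded CA or self-signed certificate
// instead of turning verification off.
func newInfluxClient(caPEM []byte, timeout time.Duration) (*http.Client, error) {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("no valid certificate found in PEM data")
	}
	tr := &http.Transport{
		TLSClientConfig: &tls.Config{RootCAs: pool},
	}
	return &http.Client{Timeout: timeout, Transport: tr}, nil
}

func main() {
	// Invalid PEM input is rejected rather than silently weakening TLS.
	_, err := newInfluxClient([]byte("not a certificate"), 5*time.Second)
	fmt.Println("error:", err)
}
```

In practice the PEM bytes would be read from the certificate file that the InfluxDB server uses; the certificate must also list the host name or IP address that the client connects to.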

Unable to fetch live data from OP5 through Nagflux and InfluxDB into Grafana

Hi there,

I am trying to get data (live alerts, performance graphs, etc.) into Grafana.

I set up the configuration following this guide:

https://support.itrsgroup.com/hc/en-us/articles/360020252353-OP5-Monitor-How-to-send-metrics-to-Grafana

The setup is:

ITRS OP5
Influxdb
Nagflux
Grafana

I can reach the Grafana console and have created a test database in InfluxDB.

I added the data source in Grafana.

However, I am unable to get live data from OP5 into Grafana. Please help.

Write metrics to different influxdb databases

I discovered this project while making a PR to mod_gearman.

I'm also trying to send Icinga perfdata to InfluxDB, but I need to send each perfdata to a different database based on a custom template defined in each host.

Any idea whether it would be possible to handle this with Nagflux?

Thanks!

Empty NAGFLUX:TAG

The service_perfdata_file_template has \tNAGFLUX:TAG::$_SERVICENAGFLUX_TAG$ appended to it. The problem is that every service now needs some sane value for it. Otherwise Nagflux fails to parse the line:

metrics,host=localhost,service=Swap,command=check_local_swap,performanceLabel=swap,$_SERVICENAGFLUX_TAG$=,warn-fill=none,crit-fill=none,unit=MB value=974.0,warn=308.0,crit=205.0,min=0.0,max=1027.0 1487770865000

because the macro doesn't get expanded:

DATATYPE::SERVICEPERFDATA	TIMET::1487771589	HOSTNAME::localhost	SERVICEDESC::Swap	SERVICEPERFDATA::swap=974MB;308;205;0;1027	SERVICECHECKCOMMAND::check_local_swap!30%!20%	HOSTSTATE::UP	HOSTSTATETYPE::HARD	SERVICESTATE::OK	SERVICESTATETYPE::HARD	NAGFLUX:TAG::$_SERVICENAGFLUX_TAG$

I've tried to define a parent service with an empty _NAGFLUX_TAG, which makes Nagflux crash:

panic: assignment to entry in nil map

goroutine 165 [running]:
panic(0x70a4e0, 0xc4201373c0)
        /opt/projects/omd/rpm.topdir/BUILD/omd-2.21.20161208-labs-edition/packages/go-1.7/go-1.7.3/src/runtime/panic.go:500 +0x1a1
github.com/griesbacher/nagflux/collector/spoolfile.(*NagiosSpoolfileWorker).PerformanceDataIterator.func1(0xc42047fbf0, 0x76a86b, 0x7, 0xc42044c059, 0x13, 0xc42044c0bd, 0x11, 0xc420137383, 0xd, 0xc4201ca3c0, ...)
        /opt/projects/omd/rpm.topdir/BUILD/omd-2.21.20161208-labs-edition/packages/nagflux/go/src/github.com/griesbacher/nagflux/collector/spoolfile/nagiosSpoolfileWorker.go:214 +0xc8e
created by github.com/griesbacher/nagflux/collector/spoolfile.(*NagiosSpoolfileWorker).PerformanceDataIterator
        /opt/projects/omd/rpm.topdir/BUILD/omd-2.21.20161208-labs-edition/packages/nagflux/go/src/github.com/griesbacher/nagflux/collector/spoolfile/nagiosSpoolfileWorker.go:245 +0x4f6

In the crashing case the NAGFLUX:TAG field looks like this: NAGFLUX:TAG::. I guess a sane approach would be to treat the field value NAGFLUX:TAG:: as if the field wasn't there at all.
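The suggested behaviour can be sketched in Go as follows. This is not the fix that went into Nagflux; the function names are hypothetical, and the assumption that the tag value holds space-separated key=value pairs is made only for this illustration. The sketch ignores empty values and unexpanded macros, and it makes sure the map exists before writing to it, since assigning to a nil map is what produces the panic shown above.

```go
package main

import (
	"fmt"
	"strings"
)

// usableTagValue (hypothetical) reports whether a NAGFLUX:TAG value should
// be used. Empty values and unexpanded macros such as
// "$_SERVICENAGFLUX_TAG$" are rejected.
func usableTagValue(v string) bool {
	v = strings.TrimSpace(v)
	if v == "" {
		return false
	}
	if len(v) >= 2 && strings.HasPrefix(v, "$") && strings.HasSuffix(v, "$") {
		return false
	}
	return true
}

// addTags (hypothetical) merges the key=value pairs from raw into tags.
// ASSUMPTION: pairs are separated by spaces. An unusable raw value leaves
// the tags untouched, as if the field were not there at all.
func addTags(tags map[string]string, raw string) map[string]string {
	if tags == nil {
		tags = make(map[string]string) // writing to a nil map would panic
	}
	if !usableTagValue(raw) {
		return tags
	}
	for _, pair := range strings.Fields(raw) {
		kv := strings.SplitN(pair, "=", 2)
		if len(kv) == 2 && kv[0] != "" && kv[1] != "" {
			tags[kv[0]] = kv[1]
		}
	}
	return tags
}

func main() {
	fmt.Println(len(addTags(nil, "")))                      // empty field: no tags
	fmt.Println(len(addTags(nil, "$_SERVICENAGFLUX_TAG$"))) // unexpanded macro: no tags
	fmt.Println(addTags(nil, "env=prod")["env"])            // regular value
}
```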

Nagflux doesn't register HOSTSTATE, HOSTSTATETYPE, SERVICESTATE and SERVICESTATETYPE to InfluxDB

I have a Nagios + Nagflux + InfluxDB installed and configured.
In my nagios.cfg file, I have service_perfdata_file_template=DATATYPE::SERVICEPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tSERVICEDESC::$SERVICEDESC$\tSERVICEPERFDATA::$SERVICEPERFDATA$\tSERVICECHECKCOMMAND::$SERVICECHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$\tSERVICESTATE::$SERVICESTATE$\tSERVICESTATETYPE::$SERVICESTATETYPE$.
When I look at my service-perfdata file, I can clearly see that all the fields have been exported correctly:
DATATYPE::SERVICEPERFDATA TIMET::1667206809 HOSTNAME::MYHOST SERVICEDESC::PING SERVICEPERFDATA::rta=1.071000ms;100.000000;500.000000;0.000000 pl=0%;20;60;0 SERVICECHECKCOMMAND::check_ping!100.0,20%!500.0,60% HOSTSTATE::UP HOSTSTATETYPE::HARD SERVICESTATE::OK SERVICESTATETYPE::HARD.

But when I look at my InfluxDB, I can only see some of these fields:

> use nagflux
Using database nagflux
> show measurements
name: measurements
name
----
metrics
> show tag keys
name: metrics
tagKey
------
command
crit-fill
host
performanceLabel
service
unit
warn-fill
> SELECT * FROM "metrics" WHERE ("service" = 'PING' AND "host" = 'MYHOST') LIMIT 1
name: metrics
time                command    crit crit-fill host   max min performanceLabel service unit value warn warn-fill
----                -------    ---- --------- ----   --- --- ---------------- ------- ---- ----- ---- ---------
1667206509000000000 check_ping 60   none      MYHOST     0   pl               PING    %    0     20   none

Did I misconfigure something? Or is this not supported?
I really need a filter to get all my services by status.

Versions:

  • Nagflux: 0.4.1, binary from the releases section (I can't compile it because of an apparently deprecated command, and I don't know anything about Go)
  • Nagios: 4.4.8, compiled from source
  • InfluxDB: 1.8.10, installed from yum (I see there is a version 2 of InfluxDB; could my issue come from there?)

InfluxDB Database(nagflux) does not exists and Nagflux was not able to create it

root@host:/opt/nagflux# /opt/nagflux/nagflux -configPath=/opt/nagflux/config.gcfg
2022-09-01 17:54:50 Info: Started Nagflux v0.4.1
2022-09-01 17:54:50 Info: Is InfluxDB(nagflux) running: true
2022-09-01 17:54:52 Warn: Could not create database:nagflux
2022-09-01 17:54:54 Warn: Could not create database:nagflux
2022-09-01 17:54:56 Warn: Could not create database:nagflux
2022-09-01 17:54:58 Warn: Could not create database:nagflux
2022-09-01 17:55:00 Warn: Could not create database:nagflux
2022-09-01 17:55:00 Critical: InfluxDB Database(nagflux) does not exists and Nagflux was not able to create it
2022-09-01 17:55:02 Critical: Connection type is unknown, options are: tcp, file. Input:
2022-09-01 17:55:02 Info: Livestatus version:
2022-09-01 17:55:02 Warn: Could not detect livestatus type, with version: . Asuming Nagios
2022-09-01 17:55:02 Info: Nagios Spoolfile Folder: /usr/local/nagios/var/spool/nagfluxperfdata
2022-09-01 17:55:02 Info: Nagflux Spoolfile Folder: /usr/local/nagios/var/nagflux
2022-09-01 17:55:02 Critical: Connection type is unknown, options are: tcp, file. Input:
2022-09-01 17:55:02 Critical: Connection type is unknown, options are: tcp, file. Input:
2022-09-01 17:55:02 Critical: Connection type is unknown, options are: tcp, file. Input:
2022-09-01 17:55:02 Critical: Connection type is unknown, options are: tcp, file. Input:
2022-09-01 17:55:02 Critical: Connection type is unknown, options are: tcp, file. Input:
2022-09-01 17:55:02 Critical: Connection type is unknown, options are: tcp, file. Input:
2022-09-01 17:55:32 Critical: Connection type is unknown, options are: tcp, file. Input:
2022-09-01 17:55:32 Critical: Connection type is unknown, options are: tcp, file. Input:
2022-09-01 17:55:32 Critical: Connection type is unknown, options are: tcp, file. Input:
2022-09-01 17:56:02 Critical: Connection type is unknown, options are: tcp, file. Input:
2022-09-01 17:56:02 Critical: Connection type is unknown, options are: tcp, file. Input:
2022-09-01 17:56:02 Critical: Connection type is unknown, options are: tcp, file. Input:
2022-09-01 17:56:32 Critical: Connection type is unknown, options are: tcp, file. Input:
2022-09-01 17:56:32 Critical: Connection type is unknown, options are: tcp, file. Input:
2022-09-01 17:56:32 Critical: Connection type is unknown, options are: tcp, file. Input:

Critical: Connection type is unknown, options are: tcp, file. Input:

Hi

I am facing the below mentioned issue.

systemctl status nagflux.service
● nagflux.service - A connector which transforms performancedata from Nagios/Icinga(2)/Naemon to InfluxDB/Elasticsearch
Loaded: loaded (/usr/lib/systemd/system/nagflux.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2020-01-31 04:10:26 UTC; 12h ago
Docs: https://github.com/Griesbacher/nagflux
Main PID: 28895 (nagflux)
CGroup: /system.slice/nagflux.service
└─28895 /opt/nagflux/nagflux -configPath /opt/nagflux/config.gcfg

Jan 31 17:05:59 ip-172-31-0-145.ap-south-1.compute.internal nagflux[28895]: 2020-01-31 17:05:59 Critical: Connection type is unknown, options are: tcp, file. Input:
Jan 31 17:06:28 ip-172-31-0-145.ap-south-1.compute.internal nagflux[28895]: 2020-01-31 17:06:28 Critical: Connection type is unknown, options are: tcp, file. Input:
Jan 31 17:06:28 ip-172-31-0-145.ap-south-1.compute.internal nagflux[28895]: 2020-01-31 17:06:28 Critical: Connection type is unknown, options are: tcp, file. Input:
Jan 31 17:06:28 ip-172-31-0-145.ap-south-1.compute.internal nagflux[28895]: 2020-01-31 17:06:28 Critical: Connection type is unknown, options are: tcp, file. Input:
Jan 31 17:06:29 ip-172-31-0-145.ap-south-1.compute.internal nagflux[28895]: 2020-01-31 17:06:29 Critical: Connection type is unknown, options are: tcp, file. Input:
Jan 31 17:06:29 ip-172-31-0-145.ap-south-1.compute.internal nagflux[28895]: 2020-01-31 17:06:29 Critical: Connection type is unknown, options are: tcp, file. Input:
Jan 31 17:06:29 ip-172-31-0-145.ap-south-1.compute.internal nagflux[28895]: 2020-01-31 17:06:29 Critical: Connection type is unknown, options are: tcp, file. Input:
Jan 31 17:06:59 ip-172-31-0-145.ap-south-1.compute.internal nagflux[28895]: 2020-01-31 17:06:59 Critical: Connection type is unknown, options are: tcp, file. Input:
Jan 31 17:06:59 ip-172-31-0-145.ap-south-1.compute.internal nagflux[28895]: 2020-01-31 17:06:59 Critical: Connection type is unknown, options are: tcp, file. Input:
Jan 31 17:06:59 ip-172-31-0-145.ap-south-1.compute.internal nagflux[28895]: 2020-01-31 17:06:59 Critical: Connection type is unknown, options are: tcp, file. Input :

cat /opt/nagflux/config.gcfg
[main]
   NagiosSpoolfileFolder = "/usr/local/nagios/var/spool/nagfluxperfdata"
   NagiosSpoolfileWorker = 1
   InfluxWorker = 2
   MaxInfluxWorker = 5
   DumpFile = "nagflux.dump"
   NagfluxSpoolfileFolder = "/usr/local/nagios/var/nagflux"
   FieldSeparator = "&"
   BufferSize = 10000
   FileBufferSize = 65536
   DefaultTarget = "all"

[Log]
   LogFile = ""
   MinSeverity = "INFO"

[Livestatus]
#        # tcp or file
         Type = "file"
#        # tcp: 127.0.0.1:6557 or file /var/run/live
        file /usr/local/nagios/var/live.sock
#        #Address = "127.0.0.1:6557"
#        # The amount to minutes to wait for livestatus to come up, if set to 0 the detection is disabled
        MinutesToWait = 2
#        # Set the Version of Livestatus. Allowed are Nagios, Icinga2, Naemon.
#        # If left empty Nagflux will try to detect it on it's own, which will not always work.
       Version = ""

[InfluxDBGlobal]
   CreateDatabaseIfNotExists = true
   NastyString = ""
   NastyStringToReplace = ""
   HostcheckAlias = "hostcheck"

[InfluxDB "nagflux"]
   Enabled = true
   Version = 1.0
   Address = "http://127.0.0.1:8086"
   Arguments = "precision=ms&u=root&p=root&db=nagflux"
   StopPullingDataIfDown = true

[InfluxDB "fast"]
   Enabled = false
   Version = 1.0
   Address = "http://127.0.0.1:8086"
   Arguments = "precision=ms&u=root&p=root&db=fast"
   StopPullingDataIfDown = false

The Livestatus socket file is /usr/local/nagios/var/live.sock
srw-rw----. 1 nagios nagios 0 Jan 29 07:20 /usr/local/nagios/var/live.sock

After I enabled the section below in /opt/nagflux/config.gcfg, the nagflux service does not start at all.

[Livestatus]
#        # tcp or file
        Type = "file"
#        # tcp: 127.0.0.1:6557 or file /var/run/live
        file /usr/local/nagios/var/live.sock
#        #Address = "127.0.0.1:6557"
#        # The amount to minutes to wait for livestatus to come up, if set to 0 the detection is disabled
        MinutesToWait = 2
#        # Set the Version of Livestatus. Allowed are Nagios, Icinga2, Naemon.
#        # If left empty Nagflux will try to detect it on it's own, which will not always work.
        Version = ""

● nagflux.service - A connector which transforms performancedata from Nagios/Icinga(2)/Naemon to InfluxDB/Elasticsearch
Loaded: loaded (/usr/lib/systemd/system/nagflux.service; enabled; vendor preset: disabled)
Active: failed (Result: start-limit) since Fri 2020-01-31 17:10:48 UTC; 3s ago
Docs: https://github.com/Griesbacher/nagflux
Process: 10845 ExecStart=/opt/nagflux/nagflux -configPath /opt/nagflux/config.gcfg (code=exited, status=2)
Main PID: 10845 (code=exited, status=2)

Jan 31 17:10:48 ip-172-31-0-145.ap-south-1.compute.internal nagflux[10845]: main.main()
Jan 31 17:10:48 ip-172-31-0-145.ap-south-1.compute.internal nagflux[10845]: /root/gorepo/src/github.com/griesbacher/nagflux/main.go:68 +0x22e
Jan 31 17:10:48 ip-172-31-0-145.ap-south-1.compute.internal systemd[1]: Unit nagflux.service entered failed state.
Jan 31 17:10:48 ip-172-31-0-145.ap-south-1.compute.internal systemd[1]: nagflux.service failed.
Jan 31 17:10:48 ip-172-31-0-145.ap-south-1.compute.internal systemd[1]: nagflux.service holdoff time over, scheduling restart.
Jan 31 17:10:48 ip-172-31-0-145.ap-south-1.compute.internal systemd[1]: Stopped A connector which transforms performancedata from Nagios/Icinga(2)/Naemon to InfluxDB/Elasticsearch.
Jan 31 17:10:48 ip-172-31-0-145.ap-south-1.compute.internal systemd[1]: start request repeated too quickly for nagflux.service
Jan 31 17:10:48 ip-172-31-0-145.ap-south-1.compute.internal systemd[1]: Failed to start A connector which transforms performancedata from Nagios/Icinga(2)/Naemon to InfluxDB/Elasticsearch.
Jan 31 17:10:48 ip-172-31-0-145.ap-south-1.compute.internal systemd[1]: Unit nagflux.service entered failed state.
Jan 31 17:10:48 ip-172-31-0-145.ap-south-1.compute.internal systemd[1]: nagflux.service failed.

cat /var/log/nagflux.log
Feb 1 03:52:06 ip-172-31-0-145 nagflux: panic: invalid variable: section "Livestatus" subsection "" variable "File"
Feb 1 03:52:06 ip-172-31-0-145 nagflux: goroutine 1 [running]:
Feb 1 03:52:06 ip-172-31-0-145 nagflux: github.com/griesbacher/nagflux/config.InitConfig(0x7ffc00839f52, 0x18)
Feb 1 03:52:06 ip-172-31-0-145 nagflux: /root/gorepo/src/github.com/griesbacher/nagflux/config/ConfigProvider.go:18 +0xe7
Feb 1 03:52:06 ip-172-31-0-145 nagflux: main.main()
Feb 1 03:52:06 ip-172-31-0-145 nagflux: /root/gorepo/src/github.com/griesbacher/nagflux/main.go:68 +0x22e
Feb 1 03:52:06 ip-172-31-0-145 nagflux: panic: invalid variable: section "Livestatus" subsection "" variable "File"
Feb 1 03:52:06 ip-172-31-0-145 nagflux: goroutine 1 [running]:
Feb 1 03:52:06 ip-172-31-0-145 nagflux: github.com/griesbacher/nagflux/config.InitConfig(0x7ffe6cf9ef52, 0x18)
Feb 1 03:52:06 ip-172-31-0-145 nagflux: /root/gorepo/src/github.com/griesbacher/nagflux/config/ConfigProvider.go:18 +0xe7
Feb 1 03:52:06 ip-172-31-0-145 nagflux: main.main()
Feb 1 03:52:06 ip-172-31-0-145 nagflux: /root/gorepo/src/github.com/griesbacher/nagflux/main.go:68 +0x22e
Feb 1 03:52:07 ip-172-31-0-145 nagflux: panic: invalid variable: section "Livestatus" subsection "" variable "File"
Feb 1 03:52:07 ip-172-31-0-145 nagflux: goroutine 1 [running]:
Feb 1 03:52:07 ip-172-31-0-145 nagflux: github.com/griesbacher/nagflux/config.InitConfig(0x7ffc5468bf52, 0x18)
Feb 1 03:52:07 ip-172-31-0-145 nagflux: /root/gorepo/src/github.com/griesbacher/nagflux/config/ConfigProvider.go:18 +0xe7
Feb 1 03:52:07 ip-172-31-0-145 nagflux: main.main()
Feb 1 03:52:07 ip-172-31-0-145 nagflux: /root/gorepo/src/github.com/griesbacher/nagflux/main.go:68 +0x22e
Feb 1 03:52:07 ip-172-31-0-145 nagflux: panic: invalid variable: section "Livestatus" subsection "" variable "File"
Feb 1 03:52:07 ip-172-31-0-145 nagflux: goroutine 1 [running]:
Feb 1 03:52:07 ip-172-31-0-145 nagflux: github.com/griesbacher/nagflux/config.InitConfig(0x7fff1dc6bf52, 0x18)
Feb 1 03:52:07 ip-172-31-0-145 nagflux: /root/gorepo/src/github.com/griesbacher/nagflux/config/ConfigProvider.go:18 +0xe7
Feb 1 03:52:07 ip-172-31-0-145 nagflux: main.main()
Feb 1 03:52:07 ip-172-31-0-145 nagflux: /root/gorepo/src/github.com/griesbacher/nagflux/main.go:68 +0x22e
Feb 1 03:52:07 ip-172-31-0-145 nagflux: panic: invalid variable: section "Livestatus" subsection "" variable "File"
Feb 1 03:52:07 ip-172-31-0-145 nagflux: goroutine 1 [running]:
Feb 1 03:52:07 ip-172-31-0-145 nagflux: github.com/griesbacher/nagflux/config.InitConfig(0x7ffcd9ae2f52, 0x18)
Feb 1 03:52:07 ip-172-31-0-145 nagflux: /root/gorepo/src/github.com/griesbacher/nagflux/config/ConfigProvider.go:18 +0xe7
Feb 1 03:52:07 ip-172-31-0-145 nagflux: main.main()
Feb 1 03:52:07 ip-172-31-0-145 nagflux: /root/gorepo/src/github.com/griesbacher/nagflux/main.go:68 +0x22e
Feb 1 04:03:02 ip-172-31-0-145 nagflux: panic: invalid variable: section "Livestatus" subsection "" variable "File"
Feb 1 04:03:02 ip-172-31-0-145 nagflux: goroutine 1 [running]:
Feb 1 04:03:02 ip-172-31-0-145 nagflux: github.com/griesbacher/nagflux/config.InitConfig(0x7fff886e0f52, 0x18)
Feb 1 04:03:02 ip-172-31-0-145 nagflux: /root/gorepo/src/github.com/griesbacher/nagflux/config/ConfigProvider.go:18 +0xe7
Feb 1 04:03:02 ip-172-31-0-145 nagflux: main.main()
Feb 1 04:03:02 ip-172-31-0-145 nagflux: /root/gorepo/src/github.com/griesbacher/nagflux/main.go:68 +0x22e
Feb 1 04:03:02 ip-172-31-0-145 nagflux: panic: invalid variable: section "Livestatus" subsection "" variable "File"
Feb 1 04:03:02 ip-172-31-0-145 nagflux: goroutine 1 [running]:
Feb 1 04:03:02 ip-172-31-0-145 nagflux: github.com/griesbacher/nagflux/config.InitConfig(0x7ffc12375f52, 0x18)
Feb 1 04:03:02 ip-172-31-0-145 nagflux: /root/gorepo/src/github.com/griesbacher/nagflux/config/ConfigProvider.go:18 +0xe7
Feb 1 04:03:02 ip-172-31-0-145 nagflux: main.main()
Feb 1 04:03:02 ip-172-31-0-145 nagflux: /root/gorepo/src/github.com/griesbacher/nagflux/main.go:68 +0x22e
Feb 1 04:03:02 ip-172-31-0-145 nagflux: panic: invalid variable: section "Livestatus" subsection "" variable "File"
Feb 1 04:03:02 ip-172-31-0-145 nagflux: goroutine 1 [running]:
Feb 1 04:03:02 ip-172-31-0-145 nagflux: github.com/griesbacher/nagflux/config.InitConfig(0x7ffe67c3af52, 0x18)
Feb 1 04:03:02 ip-172-31-0-145 nagflux: /root/gorepo/src/github.com/griesbacher/nagflux/config/ConfigProvider.go:18 +0xe7
Feb 1 04:03:02 ip-172-31-0-145 nagflux: main.main()
Feb 1 04:03:02 ip-172-31-0-145 nagflux: /root/gorepo/src/github.com/griesbacher/nagflux/main.go:68 +0x22e
Feb 1 04:03:03 ip-172-31-0-145 nagflux: panic: invalid variable: section "Livestatus" subsection "" variable "File"
Feb 1 04:03:03 ip-172-31-0-145 nagflux: goroutine 1 [running]:
Feb 1 04:03:03 ip-172-31-0-145 nagflux: github.com/griesbacher/nagflux/config.InitConfig(0x7ffea93acf52, 0x18)
Feb 1 04:03:03 ip-172-31-0-145 nagflux: /root/gorepo/src/github.com/griesbacher/nagflux/config/ConfigProvider.go:18 +0xe7
Feb 1 04:03:03 ip-172-31-0-145 nagflux: main.main()
Feb 1 04:03:03 ip-172-31-0-145 nagflux: /root/gorepo/src/github.com/griesbacher/nagflux/main.go:68 +0x22e
Feb 1 04:03:03 ip-172-31-0-145 nagflux: panic: invalid variable: section "Livestatus" subsection "" variable "File"
Feb 1 04:03:03 ip-172-31-0-145 nagflux: goroutine 1 [running]:
Feb 1 04:03:03 ip-172-31-0-145 nagflux: github.com/griesbacher/nagflux/config.InitConfig(0x7fff36b98f52, 0x18)
Feb 1 04:03:03 ip-172-31-0-145 nagflux: /root/gorepo/src/github.com/griesbacher/nagflux/config/ConfigProvider.go:18 +0xe7
Feb 1 04:03:03 ip-172-31-0-145 nagflux: main.main()
Feb 1 04:03:03 ip-172-31-0-145 nagflux: /root/gorepo/src/github.com/griesbacher/nagflux/main.go:68 +0x22e

Thanks in advance.

Best Regards,

Kaushal
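Two details in the output above are worth reading together. The Critical message ends with an empty "Input:", which suggests the Livestatus connection type was empty when it was evaluated. The panic names a variable "File", which matches the bare line "file /usr/local/nagios/var/live.sock" in the config; the commented sample lines suggest the socket path belongs in the Address key instead. The Go sketch below shows the kind of check that would produce such a message. It is an illustration under these assumptions, with a hypothetical function name, and not Nagflux's actual code.

```go
package main

import (
	"fmt"
	"strings"
)

// checkLivestatus (hypothetical) validates a Livestatus Type/Address pair.
// The accepted types "tcp" and "file" are taken from the log message above.
func checkLivestatus(connType, address string) error {
	switch strings.TrimSpace(connType) {
	case "tcp", "file":
		if strings.TrimSpace(address) == "" {
			return fmt.Errorf("livestatus type %q needs an Address", connType)
		}
		return nil
	default:
		return fmt.Errorf("connection type is unknown, options are: tcp, file. Input: %s", connType)
	}
}

func main() {
	// An empty type reproduces the shape of the message in the log.
	fmt.Println(checkLivestatus("", ""))
	// A socket path given as the address of a "file" connection passes.
	fmt.Println(checkLivestatus("file", "/usr/local/nagios/var/live.sock"))
}
```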

Critical: Connection type is unknown, options are: tcp, file. Input

Hi all,

I have a strange problem with InfluxDB: the database collects the data from Nagios, but this message keeps showing up in /var/log/syslog:

Jan 13 12:02:55 mi-sg3-nagios-01 nagflux[26725]: #33[31m2019-01-13 13:02:55 Critical: Connection type is unknown, options are: tcp, file. Input:

Could you help me fix it?

nms@mi-sg3-nagios-01:~$ curl -G "http://localhost:8086/query?db=nagflux&pretty=true" --data-urlencode "q=show series"
{
"results": [
{
"statement_id": 0,
"series": [
{
"columns": [
"key"
],
"values": [
[
"metrics,command=check-host-alive,crit-fill=none,host=mi-sg12-sw-01,performanceLabel=pl,service=hostcheck,unit=%,warn-fill=none"
],
[
"metrics,command=check-host-alive,crit-fill=none,host=mi-sg12-sw-01,performanceLabel=rta,service=hostcheck,unit=ms,warn-fill=none"
],
[
"metrics,command=check-host-alive,crit-fill=none,host=mi-sg12-sw-02,performanceLabel=pl,service=hostcheck,unit=%,warn-fill=none"
],
[
"metrics,command=check-host-alive,crit-fill=none,host=mi-sg12-sw-02,performanceLabel=rta,service=hostcheck,unit=ms,warn-fill=none"
],
[
"metrics,command=check-host-alive,crit-fill=none,host=mi-sg12-sw-04,performanceLabel=pl,service=hostcheck,unit=%,warn-fill=none"
],
[
"metrics,command=check-host-alive,crit-fill=none,host=mi-sg12-sw-04,performanceLabel=rta,service=hostcheck,unit=ms,warn-fill=none"
],
[
"metrics,command=check-host-alive,crit-fill=none,host=mi-sg3-sw-01,performanceLabel=pl,service=hostcheck,unit=%,warn-fill=none"
],
[
"metrics,command=check-host-alive,crit-fill=none,host=mi-sg3-sw-01,performanceLabel=rta,service=hostcheck,unit=ms,warn-fill=none"
],
[
"metrics,command=check-host-alive,crit-fill=none,host=mi-sg3-sw-02,performanceLabel=pl,service=hostcheck,unit=%,warn-fill=none"
],
[
"metrics,command=check-host-alive,crit-fill=none,host=mi-sg3-sw-02,performanceLabel=rta,service=hostcheck,unit=ms,warn-fill=none"
],
[
"metrics,command=check-host-alive,crit-fill=none,host=rm-msc10-sw-01,performanceLabel=pl,service=hostcheck,unit=%,warn-fill=none"
],

Best Regards
CR

README update: go get deprecated

Hello guys,

The README.md file uses the go get command, which is deprecated. We have to use go install github.com/griesbacher/nagflux@latest instead.

See the go get output:

go get -v -u github.com/griesbacher/nagflux
go: go.mod file not found in current directory or any parent directory.
	'go get' is no longer supported outside a module.
	To build and install a command, use 'go install' with a version,
	like 'go install example.com/cmd@latest'
	For more information, see https://golang.org/doc/go-get-install-deprecation
	or run 'go help get' or 'go help install'.

Cheers

[question] nagflux schema in omd

Hello there,

Is there any way to modify the schema in which data is stored in InfluxDB? Since 0.9 I think there are some additional possibilities (sums, filtering, etc.) if the data were stored in a more structured way, like: http://stackoverflow.com/questions/33997430/obtaining-a-total-of-two-series-of-data-from-influxdb-in-grafana

Take a memory-used check in Nagios: it provides the used value along with warning, critical and maximum values. Currently each of them is stored in its own measurement, like hostname&Memory used&check_mk-esx_vsphere_hostsystem.mem_usage&usage&value
Why not store those values in tags?
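To illustrate the proposal, here is a minimal Go sketch that splits such a separator-joined measurement name into tags plus a field name. The function name is hypothetical, and the order of the five parts (host, service, command, performance label, field) is inferred from the example name above; the tag names are the ones that show up in later InfluxDB output elsewhere on this page.

```go
package main

import (
	"fmt"
	"strings"
)

// splitLegacyName (hypothetical) turns "host&service&command&label&field"
// into a tag set and a field name. ASSUMPTION: the name has exactly five
// parts in this order, as in the example from the question.
func splitLegacyName(name, sep string) (map[string]string, string, error) {
	parts := strings.Split(name, sep)
	if len(parts) != 5 {
		return nil, "", fmt.Errorf("expected 5 parts, got %d", len(parts))
	}
	tags := map[string]string{
		"host":             parts[0],
		"service":          parts[1],
		"command":          parts[2],
		"performanceLabel": parts[3],
	}
	return tags, parts[4], nil
}

func main() {
	name := "hostname&Memory used&check_mk-esx_vsphere_hostsystem.mem_usage&usage&value"
	tags, field, err := splitLegacyName(name, "&")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(tags["host"], tags["service"], tags["performanceLabel"], field)
}
```

With the parts stored as tags, value, warn, crit and max can live as fields of a single measurement, which is what makes sums and filtering across series practical.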

Issue with sending data to InfluxDB

Hey Philip,

I'm not sure whether this is a bug or expected behaviour, but I have the following lines in the log:

2017-03-24 16:29:23 Critical: Connection type is unknown, options are: tcp, file. Input:
2017-03-24 16:29:23 Critical: Connection type is unknown, options are: tcp, file. Input:
2017-03-24 16:29:23 Critical: Connection type is unknown, options are: tcp, file. Input:

I've configured Nagflux to send data from Nagios to InfluxDB. The DB was created, but it's empty. My main question is: must I have Livestatus installed? Can nagflux work without it?

My current configuration is:
nagios-4.3.1-2.el7.centos.x86_64
influxdb-1.2.2-1.x86_64
Nagflux by Philip Griesbacher v0.4.0

Nagflux config is:
[main]
NagiosSpoolfileFolder = "/opt/nagios/var/spool/nagfluxperfdata"
NagiosSpoolfileWorker = 10
InfluxWorker = 10
MaxInfluxWorker = 50
FileBufferSize = 65536
DumpFile = "/opt/nagios/var/log/nagflux/nagflux.dump"
NagfluxSpoolfileFolder = "/opt/nagios/var/nagflux"
FieldSeparator = "&"
BufferSize = 1000
DefaultTarget = "all"

[Log]
LogFile = "/opt/logs/nagios/nagflux.log"
MinSeverity = "DEBUG"

[InfluxDBGlobal]
CreateDatabaseIfNotExists = true
NastyString = ""
NastyStringToReplace = ""
HostcheckAlias = "hostcheck"

[InfluxDB "nagios"]
Enabled = true
Version = 1.2
Address = "http://127.0.0.1:8086"
Arguments = "precision=ms&u=root&p=root&db=nagios"
StopPullingDataIfDown = true

Thanks,
Andrew
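
The empty value after `Input:` in the log suggests that the Livestatus connection type was never set; the config above has no `[Livestatus]` section. A minimal sketch of the kind of check that yields such a message, assuming a hypothetical helper `dialLivestatus` (not Nagflux's real function):

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// dialLivestatus is a hypothetical helper: "tcp" expects host:port,
// "file" expects the path of the Livestatus Unix socket.
func dialLivestatus(connType, address string) (net.Conn, error) {
	switch connType {
	case "tcp":
		return net.DialTimeout("tcp", address, 5*time.Second)
	case "file":
		return net.DialTimeout("unix", address, 5*time.Second)
	default:
		return nil, fmt.Errorf("connection type is unknown, options are: tcp, file. Input: %q", connType)
	}
}

func main() {
	// An empty type, as one would get when the setting is missing.
	if _, err := dialLivestatus("", ""); err != nil {
		fmt.Println("Critical:", err)
	}
}
```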

nagiosSpoolfileWorker.go: check_command for passive checks

See https://docs.pnp4nagios.org/de/pnp-0.6/advanced

For passive checks, the check_command should be appended to the perfdata string:

DATATYPE::SERVICEPERFDATA        TIMET::1444918417       HOSTNAME::sakuli_client SERVICEDESC::example_ubuntu     SERVICEPERFDATA::suite__state=0;;;; suite__warning=25s;;;; suite__critical=28s;;;; suite_example_ubuntu=22.74s;25;28;; c_001__state=0;;;; c_001__warning=27s;;;; c_001__critical=30s;;;; c_001_case1=15.21s;27;30;; s_001_001_Test_Sahi_landing_page=1.05s;10;;; s_001_002_Calculation=7.05s;20;;; s_001_003_Editor=1.35s;30;;; [check_sakuli]  SERVICECHECKCOMMAND::check_service!flap HOSTSTATE::UP   HOSTSTATETYPE::HARD     SERVICESTATE::OK        SERVICESTATETYPE::HARD

See also check_multi....
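
A minimal sketch of how the trailing `[check_sakuli]` could be separated from the perfdata, assuming the command is always the last bracketed token. The function name `splitPassiveCommand` is hypothetical and not taken from nagiosSpoolfileWorker.go.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// trailingCommand matches a bracketed command name at the very end of a perfdata string.
var trailingCommand = regexp.MustCompile(`\s*\[([^\[\]\s]+)\]\s*$`)

// splitPassiveCommand returns the perfdata without the trailing "[command]"
// and the command name; ok is false when no such suffix exists.
func splitPassiveCommand(perfdata string) (rest, command string, ok bool) {
	m := trailingCommand.FindStringSubmatchIndex(perfdata)
	if m == nil {
		return strings.TrimSpace(perfdata), "", false
	}
	return strings.TrimSpace(perfdata[:m[0]]), perfdata[m[2]:m[3]], true
}

func main() {
	rest, command, ok := splitPassiveCommand("suite__state=0;;;; s_001_003_Editor=1.35s;30;;; [check_sakuli]")
	fmt.Printf("rest=%q command=%q ok=%v\n", rest, command, ok)
}
```

When `ok` is true, the extracted name could take the place of the value from SERVICECHECKCOMMAND.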

Certificate Authority

InfluxDB map[string]*struct {

If the InfluxDB connection is secured by TLS, I guess we need an option to provide CA certificates, for example a company's own CA. This may also be true for Elasticsearch.
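
A sketch of what such an option could feed into, using only the Go standard library. The function name `newClientWithCA` is hypothetical, and where the PEM bytes come from (for example a new config key) is an assumption.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"net/http"
	"time"
)

// newClientWithCA returns an HTTP client that trusts the system roots plus the
// certificates found in caPEM.
func newClientWithCA(caPEM []byte) (*http.Client, error) {
	pool, err := x509.SystemCertPool()
	if err != nil || pool == nil {
		pool = x509.NewCertPool() // fall back to an empty pool
	}
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("no valid CA certificate found in PEM data")
	}
	transport := &http.Transport{TLSClientConfig: &tls.Config{RootCAs: pool}}
	return &http.Client{Transport: transport, Timeout: 5 * time.Second}, nil
}

func main() {
	// In practice the bytes would be read from a file named in the config.
	if _, err := newClientWithCA([]byte("not a certificate")); err != nil {
		fmt.Println("error:", err)
	}
}
```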

Error in config.gcfg.example

With NagiosSpoolfileFolder = "/var/spool/nagios" Nagios broke: we lost /var/spool/nagios/cmd, /var/spool/nagios/checkresults etc. A different path has to be used, and it would be better to rename the parameter.

Errors in nagios:
Error in configuration file '/etc/nagios/nagios.cfg' - Line 453 (Check result path '/var/spool/nagios/checkresults' is not a valid directory)
and then a failed start with this line in the log:
qh: Failed to init socket '/var/spool/nagios/cmd/nagios.qh'. bind() failed: No such file or directory

using nagflux with nagios NPCDMOD

We are using Nagios Core 3.5.1 with NPCD and NPCDMOD for pnp4nagios. As far as I understand, npcdmod writes perfdata to /var/spool/pnp4nagios and npcd processes it for graphing.

Is it possible to make npcdmod write to another spool directory, so that I can use it for nagflux?

Or is there any other way to make nagflux work in this mode?

I have tried adding another npcdmod.o broker module in /etc/nagios.cfg, but it breaks pnp4nagios.

broker_module=/usr/lib64/nagios/brokers/npcdmod.o config_file=/etc/nagflux/npcd.cfg

Please share your ideas on this.

Nagflux saves data incorrectly into InfluxDB 2.0 - still problems with check_multi

Hi,

I have some problems with check_multi & nagflux:

Here the performance data:

check_multi::check_multi::plugins=1 time=0.053427 check_multi_extended::check_multi_extended::count_ok=1 count_warning=0 count_critical=0 count_unknown=0 overall_state=0 'influxdb_disk::check_influxdb::data'=5873575B;;;0; 'db'=0B;;;0; 'meta'=703B;;;0; 'raft'=0B;;;0; 'wal'=32464726B;;;0; '_internal'=4575006B;;;0; 'graphite'=29937B;;;0; 'nagflux'=1268632B;;;0; 

and the result in InfluxDB:

> select time,service,performanceLabel,value from metrics where time < now() and time > now() - 1m and service = 'test_influx_disk_multi' ;
name: metrics
-------------
time			service			performanceLabel						value
2016-12-12T08:47:38Z	test_influx_disk_multi	check_multi::check_multi::plugins				1.268632e+06
2016-12-12T08:47:38Z	test_influx_disk_multi	influxdb_disk::check_influxdb::wal				1.268632e+06
2016-12-12T08:47:38Z	test_influx_disk_multi	influxdb_disk::check_influxdb::raft				1.268632e+06
2016-12-12T08:47:38Z	test_influx_disk_multi	influxdb_disk::check_influxdb::nagflux				1.268632e+06
2016-12-12T08:47:38Z	test_influx_disk_multi	influxdb_disk::check_influxdb::meta				1.268632e+06
2016-12-12T08:47:38Z	test_influx_disk_multi	influxdb_disk::check_influxdb::graphite				1.268632e+06
2016-12-12T08:47:38Z	test_influx_disk_multi	influxdb_disk::check_influxdb::db				1.268632e+06
2016-12-12T08:47:38Z	test_influx_disk_multi	influxdb_disk::check_influxdb::data				1.268632e+06
2016-12-12T08:47:38Z	test_influx_disk_multi	influxdb_disk::check_influxdb::_internal			1.268632e+06
2016-12-12T08:47:38Z	test_influx_disk_multi	check_multi_extended::check_multi_extended::overall_state	1.268632e+06
2016-12-12T08:47:38Z	test_influx_disk_multi	check_multi_extended::check_multi_extended::count_warning	1.268632e+06
2016-12-12T08:47:38Z	test_influx_disk_multi	check_multi_extended::check_multi_extended::count_unknown	1.268632e+06
2016-12-12T08:47:38Z	test_influx_disk_multi	check_multi_extended::check_multi_extended::count_ok		1.268632e+06
2016-12-12T08:47:38Z	test_influx_disk_multi	check_multi_extended::check_multi_extended::count_critical	1.268632e+06
2016-12-12T08:47:38Z	test_influx_disk_multi	check_multi::check_multi::time					1.268632e+06

It looks like nagflux stores the last perfdata value in all fields/values in InfluxDB.

Thanks
Robert
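
For comparison, a minimal perfdata tokenizer in which every label keeps its own value. It is a sketch with a hypothetical `parsePerfdata` function; it ignores thresholds and does not expand the check_multi `plugin::command::label` prefixes.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// perfToken matches one label=value[unit] pair; the label may be single-quoted.
var perfToken = regexp.MustCompile(`(?:'([^']+)'|([^\s=']+))=(-?[0-9.]+)([^;\s]*)`)

// perfValue holds one parsed performance value.
type perfValue struct {
	Label string
	Value float64
	Unit  string
}

// parsePerfdata returns the label, value and unit of every pair in s.
func parsePerfdata(s string) []perfValue {
	var out []perfValue
	for _, m := range perfToken.FindAllStringSubmatch(s, -1) {
		label := m[1]
		if label == "" {
			label = m[2] // unquoted label
		}
		v, err := strconv.ParseFloat(m[3], 64)
		if err != nil {
			continue // skip tokens whose value is not a number
		}
		out = append(out, perfValue{Label: label, Value: v, Unit: m[4]})
	}
	return out
}

func main() {
	sample := "check_multi::check_multi::plugins=1 time=0.053427 'wal'=32464726B;;;0; 'nagflux'=1268632B;;;0;"
	for _, p := range parsePerfdata(sample) {
		fmt.Printf("%s value=%g unit=%q\n", p.Label, p.Value, p.Unit)
	}
}
```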

Accessing livestatus data

Hello,

I'm currently using nagflux together with Checkmk Raw edition (which uses a Nagios core), which works quite well.

How is the livestatus data gathered with nagflux?

On the Checkmk console I can query the API to gather livestatus data:

OMD[MYSITE]:~$ lq "GET hosts\nColumns: name"
myserver1
myserver2
myserver3
...

But how can I do this on the Grafana/Influx side? Is this data generally stored?

The data I want to gather is how many hosts, how many services, etc... not data from a single host but from the whole monitoring.

Regards
SchiSchi
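
Whether Nagflux stores such totals is not stated here. As a sketch of the Livestatus side only, the `lq` query above could be issued from Go roughly as follows. The socket path and the helper names are assumptions.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"strings"
)

// buildQuery formats a Livestatus request; the trailing blank line ends the query.
func buildQuery(table string, columns ...string) string {
	return "GET " + table + "\nColumns: " + strings.Join(columns, " ") + "\n\n"
}

// countRows counts the non-empty lines of a Livestatus response.
func countRows(response string) int {
	n := 0
	for _, line := range strings.Split(response, "\n") {
		if strings.TrimSpace(line) != "" {
			n++
		}
	}
	return n
}

// countTable queries one column of a table over the Unix socket and counts the rows.
func countTable(socketPath, table, column string) (int, error) {
	conn, err := net.Dial("unix", socketPath)
	if err != nil {
		return 0, err
	}
	defer conn.Close()
	if _, err := io.WriteString(conn, buildQuery(table, column)); err != nil {
		return 0, err
	}
	data, err := io.ReadAll(conn) // the server closes the connection after answering
	if err != nil {
		return 0, err
	}
	return countRows(string(data)), nil
}

func main() {
	fmt.Printf("%q\n", buildQuery("hosts", "name"))
	// Hypothetical socket path; adjust it to the site.
	if n, err := countTable("/omd/sites/MYSITE/tmp/run/live", "hosts", "name"); err == nil {
		fmt.Println("hosts:", n)
	}
}
```

The resulting count could then be written to InfluxDB as its own point, for example through the NagfluxSpoolfileFolder described in the README.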

"Line does not match the schememap", no data written to InfluxDB

Nagflux says "Info: Line does not match the schememap" for all perfdata, and no measurements are written to InfluxDB.

No host_perfdata_file_template and service_perfdata_file_template defined in nagios.cfg:

2016-12-15 18:29:20 Info: Line does not match the schememap[0.202: USERS OK - 0 users currently logged in: users=0;5;10;0: [SERVICEPERFDATA]: 1481821918: myhost.mydomain: Check Users: 0.039:]

With

host_perfdata_file_template=DATATYPE::HOSTPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tHOSTPERFDATA::$HOSTPERFDATA$\tHOSTCHECKCOMMAND::$HOSTCHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$\tHOSTOUTPUT::$HOSTOUTPUT$
service_perfdata_file_template=DATATYPE::SERVICEPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tSERVICEDESC::$SERVICEDESC$\tSERVICEPERFDATA::$SERVICEPERFDATA$\tSERVICECHECKCOMMAND::$SERVICECHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$\tSERVICESTATE::$SERVICESTATE$\tSERVICESTATETYPE::$SERVICESTATETYPE$\tSERVICEOUTPUT::$SERVICEOUTPUT$
2016-12-16 19:11:16 Info: Line does not match the schememap[[HOSTPERFDATA]: 1481822728: myhost.mydomain: 0.023: PING OK - Packet loss = 0%, RTA = 0.31 ms: rta=0.310000ms;5000.000000;5000.000000;0.000000 pl=0%;100;100;0:]

With

host_perfdata_file_template=[HOSTPERFDATA]\t$TIMET$\t$HOSTNAME$\t$HOSTEXECUTIONTIME$\t$HOSTOUTPUT$\t$HOSTPERFDATA$
service_perfdata_file_template=[SERVICEPERFDATA]\t$TIMET$\t$HOSTNAME$\t$SERVICEDESC$\t$SERVICEEXECUTIONTIME$\t$SERVICELATENCY$\t$SERVICEOUTPUT$\t$SERVICEPERFDATA$
2016-12-16 19:19:54 Info: Line does not match the schememap[0.184: OK - load average: 0.00, 0.01, 0.05: load1=0.000;5.000;10.000;0; load5=0.010;4.000;6.000;0; load15=0.050;3.000;4.000;0;: [SERVICEPERFDATA]: 1481915962: localhost: Current Load: 0.005:]

This is a manual install (go get).
Nagios 3 is used.
InfluxDB v1.1.0 is running on a remote machine as a Docker container.
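
The logged lines carry bare values such as `[SERVICEPERFDATA]` and `1481821918`, while the PNP-style template uses tab-separated `KEY::VALUE` pairs. The sketch below shows why a bare-value line yields no usable keys. The function names and the list of required keys are assumptions, not Nagflux's actual schema.

```go
package main

import (
	"fmt"
	"strings"
)

// parseSpoolLine splits a tab-separated perfdata line into KEY::VALUE pairs.
// Fields without "::" are ignored.
func parseSpoolLine(line string) map[string]string {
	fields := make(map[string]string)
	for _, part := range strings.Split(line, "\t") {
		kv := strings.SplitN(part, "::", 2)
		if len(kv) == 2 {
			fields[kv[0]] = kv[1]
		}
	}
	return fields
}

// missingKeys reports which of the required keys are absent from fields.
func missingKeys(fields map[string]string, required ...string) []string {
	var missing []string
	for _, k := range required {
		if _, ok := fields[k]; !ok {
			missing = append(missing, k)
		}
	}
	return missing
}

func main() {
	required := []string{"DATATYPE", "TIMET", "HOSTNAME"} // assumed key set
	good := "DATATYPE::SERVICEPERFDATA\tTIMET::1481821918\tHOSTNAME::myhost.mydomain"
	bare := "[SERVICEPERFDATA]\t1481821918\tmyhost.mydomain"
	fmt.Println(missingKeys(parseSpoolLine(good), required...))
	fmt.Println(missingKeys(parseSpoolLine(bare), required...))
}
```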

No error when wrong influxdb credentials

When the credentials in the nagflux config for accessing influxdb are wrong then there is nothing in the logs.

Example:
If the config for nagflux looks like the following and there is no account with name admin and pw admin then nagflux seems to fail silently.

[InfluxDB "nagflux"]
    Enabled = true
    Version = 1.0
    Address = "http://127.0.0.1:8086"
    Arguments = "precision=ms&u=admin&p=admin&db=nagflux"
    StopPullingDataIfDown = true

Even setting log level of nagflux to debug does not show anything.
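
A sketch of the check that would make such a failure visible: map the HTTP status of the InfluxDB answer to an error and log it. The function name `describeInfluxStatus` is hypothetical, and the response body in `main` is a made-up example.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// describeInfluxStatus maps an InfluxDB HTTP status and body to an error worth
// logging. It returns nil for 2xx answers.
func describeInfluxStatus(statusCode int, body string) error {
	switch {
	case statusCode >= 200 && statusCode < 300:
		return nil
	case statusCode == http.StatusUnauthorized, statusCode == http.StatusForbidden:
		return fmt.Errorf("influxdb rejected the credentials (%d %s): %s",
			statusCode, http.StatusText(statusCode), strings.TrimSpace(body))
	default:
		return fmt.Errorf("influxdb request failed (%d %s): %s",
			statusCode, http.StatusText(statusCode), strings.TrimSpace(body))
	}
}

func main() {
	// A real caller would pass resp.StatusCode and the response body.
	if err := describeInfluxStatus(401, `{"error":"authorization failed"}`); err != nil {
		fmt.Println("Warn:", err)
	}
}
```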

Project is dead and dysfunctional!

This project is dead and does not work anymore. I would greatly appreciate this fact to be stated here and a link to the successor. Unfortunately there is no working copy of this project anywhere to be found and my understanding/skill of go is too limited to step in here.

I hope this message saves others the time that I wasted in trying to get this thing to run. Bummer.

panic: runtime error: index out of range

Nagflux crashes and fails to process host perfdata. Occasionally I'm getting:

doesn't contain all of these fields: [table time]

servmon1:root:/etc/nagflux> go version
go version go1.5.1 linux/amd64

Nagios 4.1.1

I've attached my nagflux config and the host perfdata file it failed to process.

config.gcfg.txt
host-perfdata.1473250552.txt

Nagios perfdata definition and process command:

perfdata.cfg.txt

2016-09-07 13:15:59 Info: Nagios Spoolfile Folder: /var/spool
2016-09-07 13:15:59 Info: Nagflux Spoolfile Folder: /var/spool/nagflux
panic: runtime error: index out of range

goroutine 68 [running]:
github.com/griesbacher/nagflux/collector/nagflux.FileCollector.parseFile(0xc82023e070, 0xc8200d1b60, 0xc82000f4c0, 0x12, 0xc8200d1b30, 0x26, 0xc820166570, 0x22, 0x0, 0x0, ...)
        /root/gorepo/src/github.com/griesbacher/nagflux/collector/nagflux/nagfluxFileCollector.go:90 +0x149b
github.com/griesbacher/nagflux/collector/nagflux.FileCollector.run(0xc82023e070, 0xc8200d1b60, 0xc82000f4c0, 0x12, 0xc8200d1b30, 0x26)
        /root/gorepo/src/github.com/griesbacher/nagflux/collector/nagflux/nagfluxFileCollector.go:54 +0x23c
created by github.com/griesbacher/nagflux/collector/nagflux.NewNagfluxFileCollector
        /root/gorepo/src/github.com/griesbacher/nagflux/collector/nagflux/nagfluxFileCollector.go:34 +0x108

goroutine 1 [runnable]:
main.main()
        /root/gorepo/src/github.com/griesbacher/nagflux/main.go:128 +0x14c7

goroutine 17 [syscall, locked to thread]:
runtime.goexit()
        /usr/lib/golang/src/runtime/asm_amd64.s:1696 +0x1

goroutine 5 [syscall]:
os/signal.loop()
        /usr/lib/golang/src/os/signal/signal_unix.go:22 +0x18
created by os/signal.init.1
        /usr/lib/golang/src/os/signal/signal_unix.go:28 +0x37

goroutine 19 [select]:
github.com/griesbacher/nagflux/target/influx.Worker.run(0x1, 0xc820176180, 0xc8201682a0, 0xc8200db4a0, 0xc82006acd0, 0x41, 0xc82016c320, 0x13, 0x7fc7588742f0, 0xc82016c300, ...)
        /root/gorepo/src/github.com/griesbacher/nagflux/target/influx/Worker.go:84 +0x53a
created by github.com/griesbacher/nagflux/target/influx.WorkerGenerator.func1
        /root/gorepo/src/github.com/griesbacher/nagflux/target/influx/Worker.go:63 +0x4fa

goroutine 18 [select]:
github.com/griesbacher/nagflux/target/influx.Worker.run(0x0, 0xc820176120, 0xc820168230, 0xc8200db4a0, 0xc82006acd0, 0x41, 0xc82016c2e0, 0x13, 0x7fc7588742f0, 0xc82016c300, ...)
        /root/gorepo/src/github.com/griesbacher/nagflux/target/influx/Worker.go:84 +0x53a
created by github.com/griesbacher/nagflux/target/influx.WorkerGenerator.func1
        /root/gorepo/src/github.com/griesbacher/nagflux/target/influx/Worker.go:63 +0x4fa

goroutine 20 [chan receive]:
github.com/griesbacher/nagflux/target/influx.(*Connector).run(0xc820120000)
        /root/gorepo/src/github.com/griesbacher/nagflux/target/influx/Connector.go:140 +0x4c
created by github.com/griesbacher/nagflux/target/influx.ConnectorFactory
        /root/gorepo/src/github.com/griesbacher/nagflux/target/influx/Connector.go:86 +0xbeb

goroutine 35 [select]:
github.com/griesbacher/nagflux/collector/livestatus.Collector.run(0xc82015e0e0, 0xc8200d1b60, 0xc820150210, 0xc8200d1b30, 0x8d17c0, 0x73)
        /root/gorepo/src/github.com/griesbacher/nagflux/collector/livestatus/Collector.go:100 +0x1b9
created by github.com/griesbacher/nagflux/collector/livestatus.NewLivestatusCollector
        /root/gorepo/src/github.com/griesbacher/nagflux/collector/livestatus/Collector.go:85 +0x257

goroutine 36 [select]:
github.com/griesbacher/nagflux/collector/livestatus.(*CacheBuilder).run(0xc820150540, 0x6fc23ac00)
        /root/gorepo/src/github.com/griesbacher/nagflux/collector/livestatus/CacheBuilder.go:68 +0x228
created by github.com/griesbacher/nagflux/collector/livestatus.NewLivestatusCacheBuilder
        /root/gorepo/src/github.com/griesbacher/nagflux/collector/livestatus/CacheBuilder.go:50 +0x180

goroutine 66 [select]:
github.com/griesbacher/nagflux/collector/spoolfile.(*NagiosSpoolfileWorker).run(0xc820228080)
        /root/gorepo/src/github.com/griesbacher/nagflux/collector/spoolfile/nagiosSpoolfileWorker.go:72 +0xc95
created by github.com/griesbacher/nagflux/collector/spoolfile.NagiosSpoolfileWorkerGenerator.func1
        /root/gorepo/src/github.com/griesbacher/nagflux/collector/spoolfile/nagiosSpoolfileWorker.go:56 +0x79

goroutine 67 [syscall]:
syscall.Syscall(0xd9, 0x5, 0xc820245000, 0x1000, 0x10, 0x441bc5, 0x715c40)
        /usr/lib/golang/src/syscall/asm_linux_amd64.s:18 +0x5
syscall.Getdents(0x5, 0xc820245000, 0x1000, 0x1000, 0x64, 0x0, 0x0)
        /usr/lib/golang/src/syscall/zsyscall_linux_amd64.go:508 +0x5f
syscall.ReadDirent(0x5, 0xc820245000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
        /usr/lib/golang/src/syscall/syscall_linux.go:770 +0x4d
os.(*File).readdirnames(0xc820238008, 0xffffffffffffffff, 0xc820256000, 0x0, 0x64, 0x0, 0x0)
        /usr/lib/golang/src/os/dir_unix.go:39 +0x215
os.(*File).Readdirnames(0xc820238008, 0xffffffffffffffff, 0x0, 0x0, 0x0, 0x0, 0x0)
        /usr/lib/golang/src/os/doc.go:134 +0x85
os.(*File).readdir(0xc820238008, 0xffffffffffffffff, 0x0, 0x0, 0x0, 0x0, 0x0)
        /usr/lib/golang/src/os/file_unix.go:179 +0xb3
os.(*File).Readdir(0xc820238008, 0xffffffffffffffff, 0x0, 0x0, 0x0, 0x0, 0x0)
        /usr/lib/golang/src/os/doc.go:115 +0x85
io/ioutil.ReadDir(0xc8200b4db0, 0xa, 0x0, 0x0, 0x0, 0x0, 0x0)
        /usr/lib/golang/src/io/ioutil/ioutil.go:105 +0xcc
github.com/griesbacher/nagflux/collector/spoolfile.(*NagiosSpoolfileCollector).run(0xc820228040)
        /root/gorepo/src/github.com/griesbacher/nagflux/collector/spoolfile/nagiosSpoolfileCollector.go:61 +0x2d3
created by github.com/griesbacher/nagflux/collector/spoolfile.NagiosSpoolfileCollectorFactory
        /root/gorepo/src/github.com/griesbacher/nagflux/collector/spoolfile/nagiosSpoolfileCollector.go:38 +0x21b

goroutine 69 [select, locked to thread]:
runtime.gopark(0x8ce7e0, 0xc8201adf28, 0x825770, 0x6, 0x42f118, 0x2)
        /usr/lib/golang/src/runtime/proc.go:185 +0x163
runtime.selectgoImpl(0xc8201adf28, 0x0, 0x18)
        /usr/lib/golang/src/runtime/select.go:392 +0xa64
runtime.selectgo(0xc8201adf28)
        /usr/lib/golang/src/runtime/select.go:212 +0x12
runtime.ensureSigM.func1()
        /usr/lib/golang/src/runtime/signal1_unix.go:227 +0x353
runtime.goexit()
        /usr/lib/golang/src/runtime/asm_amd64.s:1696 +0x1

goroutine 70 [chan receive]:
main.main.func2(0xc820212180, 0xc82000fec0, 0xc820150240, 0xc820150540, 0xc820228040, 0xc82022a0c0, 0xc8200d1b60)
        /root/gorepo/src/github.com/griesbacher/nagflux/main.go:119 +0x48
created by main.main
        /root/gorepo/src/github.com/griesbacher/nagflux/main.go:124 +0x13fe

Nagios Spool Location

Hi,

I have Nagios installed through Check_MK Raw, and I installed Grafana and InfluxDB separately.

Every time I try to launch nagflux it does not seem to do anything. I am assuming that the NagiosSpoolfileFolder is incorrect in the config.

Is there a way to tell which directory nagflux requires? I can't find a spool value anywhere in Nagios, and I'm unsure of what the contents should be.

Below is the output I get when launching Nagflux:

Info: Started Nagflux v0.3.0
Info: Is InfluxDB running: true
Info: Could not detect livestatus version, waiting for 1m0s 2 times( 0/2 )...
Info: Could not detect livestatus version, waiting for 1m0s 2 times( 1/2 )...
Info: Livestatus version:
Warn: Could not detect livestatus type, with version: . Asuming Nagios

Thank you

manual setup problems

Hi,

Since I'm not using OMD I would like to set up nagflux manually. I'm running CentOS 6 (final, x64), Nagios Core 3.5.1, Check_MK, and Golang 1.5.1.

I tried to run the command below, which gets stuck at the last line:
[root@nagios~]# go get -v -u github.com/griesbacher/nagflux
github.com/griesbacher/nagflux (download)
Fetching https://gopkg.in/gcfg.v1?go-get=1
Parsing meta tags from https://gopkg.in/gcfg.v1?go-get=1 (status code 200)
get "gopkg.in/gcfg.v1": found meta tag main.metaImport{Prefix:"gopkg.in/gcfg.v1", VCS:"git", RepoRoot:"https://gopkg.in/gcfg.v1"} at https://gopkg.in/gcfg.v1?go-get=1
gopkg.in/gcfg.v1 (download)

Since I couldn't find a solution for this problem, I downloaded the latest omd-2.01*, located all the config files, and tried to configure everything by hand so that it runs as it should:
-created /var/nagflux, /var/log/nagflux/nagflux.log
-created and edited /etc/logrotate.d/nagflux
-created and edited /etc/init.d/nagflux
-added action_url to my service & host template
-created database called "nagflux" in influxdb

Data is arriving in InfluxDB, and in the Grafana dashboard I can create a new panel and show the measured data.

The problem that I cannot resolve is connected with Histou: the action URL from my host and service is showing graphs but not content during an error. I checked histou.php (i.e. /histou/index.php?host=nagios-ipo&service=CPU%20load), which shows the same data for every service; it always returns data for the check_mk service:

....
[rows] => Array
(
[0] => Array
(
[title] => nagios-ipo CPU load check-mk
[editable] => 1
[height] => 400px
[panels] => Array
(
[0] => Array
(
[title] => nagios-ipo CPU load check-mk children_system_time
...

The problem is that the command name and performanceLabel are incorrect. If I open the influx CLI, select the nagflux database and run:
select * from metrics where host='nagios-ipo-trap' and service='CPU load' limit 1

name: metrics

time command crit crit-fill host max min performanceLabel service unit value warn warn-fill
1461240500000000000 check_mk-cpu.loads 20 none nagios-ipo-trap 2 0 load1 CPU load 10 none

You can see that the command should be check_mk-cpu.loads and the performanceLabel should be load1. Since I had to manually create the database named "nagflux", I'm not sure whether something else is missing.

Can you help me?

BR, Peter

work with InfluxDB 2.x ?

Hi, we tried OMD and also set up our own installation of Naemon/Nagflux/InfluxDB/Grafana. Both work properly.
Both OMD and our installation use InfluxDB 1.8, and we hope to upgrade to InfluxDB 2.4. It seems 1.8 and 2.4 differ a lot. Does Nagflux support InfluxDB 2.4?
Thanks a lot.

PNP Converter ?

Hello,

Is there any chance we could get a converter to move all PNP RRDs to InfluxDB?

Best regards,
Carsten

[Question] Can nagflux and pnp4nagios use same spool folder / perf data ?

In the manual setup instructions - https://exchange.nagios.org/directory/Documentation/Nagios-with-InfluxDB,-nagflux-and-Grafana/details, I can see there are additional steps to make a copy of the spool folder so that pnp4nagios and nagflux use different folders.

Whereas when enabling nagflux with the latest version of OMD (omd config set NAGFLUX on), I can see that both pnp4nagios and nagflux use the same spool folder and both graphs work fine.
Can nagflux and pnp4nagios use the same spool folder?

Support for CentOS8 RHEL8 Rocky8

It appears that 'go get' is no longer supported due to changes with golang.

The replacement command would be 'go install' which would replace both get & build.

e.g.

go install github.com/griesbacher/nagflux@latest

Unfortunately, this fails for me with the below error due to changes to prometheus.

go/pkg/mod/github.com/griesbacher/[email protected]/statistics/prometheus.go:110:39: undefined: prometheus.Handler

My understanding is that this line now needs to read promhttp.Handler not prometheus.Handler

Also, the import statement in this go file needs editing to include the prometheus promhttp module.

I have been unable to get this working as I have little experience with go projects.

Nagios high cpu - high cached log messages

OMD-Labs nightly from 24-3 (Nagflux version: 0.40)

We are having issues with high CPU usage (similar to #3). After a while (a couple of minutes) Nagios CPU usage goes up to 100%, and logs take a long time to load (Check_MK Events of recent 4 hours).

Nagflux Log:

2017-03-28 14:56:23 Warn: connectToLivestatus timed out
2017-03-28 14:57:03 Warn: connectToLivestatus timed out
2017-03-28 14:57:43 Warn: connectToLivestatus timed out
2017-03-28 14:57:43 Warn: Livestatus timed out... (Collector.queryData())

I have set the cache to 2 million, and this is filled up right away.
We are checking 422 hosts w/ 7680 services

Nagflux with Non-Admin User on InfluxDB

I tried to get nagflux working with a non admin user, it fails because the first query is show databases, which requires admin access.

[httpd] 127.0.0.1 - nagflux [07/Apr/2017:13:26:35 +0200] "GET /query?db=nagflux&p=%5BREDACTED%5D&precision=ms&q=show+databases&u=nagflux HTTP/1.1" 403 133 "-" "Go-http-client/1.1" 0ebf918f-1b85-11e7-8104-000000000000 494

It would be nice to have an option to skip the show databases part.

panic: counter cannot decrease in value

Hello!

I'm using OMD-Labs and everything went smoothly until 5 days ago, when nagflux suddenly stopped writing data to InfluxDB. There are some errors in the nagflux logs, but I can't work out what is happening. Maybe you can help me.

Here is the log part just after a service reset:

2017-04-24 10:18:37 Info: Started Nagflux v0.2.7
2017-04-24 10:18:37 Info: Is InfluxDB running: true
2017-04-24 10:18:37 Info: Could not detect livestatus version, waiting for 1m0s, three times(0)...
2017-04-24 10:19:32 Info: Livestatus version: 1.0.4
2017-04-24 10:19:32 Warn: Could not detect livestatus type, with version: 1.0.4 asuming Nagios
2017-04-24 10:19:32 Info: Nagios Spoolfile Folder: /omd/sites/bennercloud/var/pnp4nagios/spool
2017-04-24 10:19:32 Info: Nagflux Spoolfile Folder: /omd/sites/bennercloud/var/nagflux
panic: counter cannot decrease in value

goroutine 38 [running]:
panic(0x703100, 0xc422107dc0)
/opt/projects/omd/rpm.topdir/BUILD/omd-2.20-labs-edition/packages/go-1.7/go-1.7.3/src/runtime/panic.go:500 +0x1a1
github.com/griesbacher/nagflux/vendor/github.com/prometheus/client_golang/prometheus.(*counter).Add(0xc420109c00, 0xc079600000000000)
/opt/projects/omd/rpm.topdir/BUILD/omd-2.20-labs-edition/packages/nagflux/go/src/github.com/griesbacher/nagflux/vendor/github.com/prometheus/client_golang/prometheus/counter.go:71 +0xb3
github.com/griesbacher/nagflux/collector/spoolfile.(*NagiosSpoolfileWorker).run(0xc42015e540)
/opt/projects/omd/rpm.topdir/BUILD/omd-2.20-labs-edition/packages/nagflux/go/src/github.com/griesbacher/nagflux/collector/spoolfile/nagiosSpoolfileWorker.go:112 +0x981
created by github.com/griesbacher/nagflux/collector/spoolfile.NagiosSpoolfileWorkerGenerator.func1
/opt/projects/omd/rpm.topdir/BUILD/omd-2.20-labs-edition/packages/nagflux/go/src/github.com/griesbacher/nagflux/collector/spoolfile/nagiosSpoolfileWorker.go:61 +0x121

Thanks,
Thiago!
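
If the second argument of `(*counter).Add` in the trace is read as raw float64 bits, it decodes to -406, which would explain the panic: a counter only accepts non-negative increments. The sketch below decodes the value and shows one possible guard. The `counter` type here is a local stand-in, not the Prometheus type, and `addNonNegative` is a hypothetical name.

```go
package main

import (
	"fmt"
	"math"
)

// counter is a stand-in for a monotonic metric; it is not the Prometheus type.
type counter struct{ total float64 }

// addNonNegative adds delta only when it is a non-negative number and reports
// whether the value was accepted.
func (c *counter) addNonNegative(delta float64) bool {
	if delta < 0 || math.IsNaN(delta) {
		return false
	}
	c.total += delta
	return true
}

func main() {
	// Second argument of (*counter).Add in the trace, interpreted as float64 bits.
	delta := math.Float64frombits(0xc079600000000000)
	fmt.Println("delta from trace:", delta)

	var c counter
	if !c.addNonNegative(delta) {
		fmt.Println("skipped negative delta instead of panicking")
	}
}
```

Why the computed delta went negative is not visible from the trace.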

[question] using OMD with nagflux grafana influxdb

Hello,

I am using OMD with nagflux, Grafana and InfluxDB, and I am experiencing a very strange problem: any operation involving log files takes a very long time to respond. For example, in Thruk, clicking Notifications, Alerts, Availability etc. takes about 400 seconds to list the notifications. When I restart naemon/nagios/icinga (it does not matter which core I use), it is back to normal for 5 minutes, and then it again takes 400 seconds or so to list Notifications or Alerts. After I disable either nagflux or InfluxDB, the response time is back to normal. How can this be related? What does nagflux do that interferes with log processing? Any thoughts would be helpful. Thanks.

Elastic 6.x Support

I have been unable to get Nagflux to post data to an Elastic 6.5 cluster. All I see in my ES logs is Nagflux getting _template, but nothing else. I tried posting the template manually, and it has more than one type, which is no longer allowed in ES 6.x.
Are there any plans to maintain ES compatibility?

Weird performanceLabel

Context
OMD - Open Monitoring Distribution Version 2.11.20161101-labs-edition
CentOS 7

After a full restart of OMD, there were some weird entries in InfluxDB:

1478000205000000000 check-https centurion 0 ,241334s;;;0,000000 size google B 11142
1478000205000000000 check-https centurion time google 0
1478000265000000000 check-https centurion 0 ,268444s;;;0,000000 size google B 11111
1478000265000000000 check-https centurion time google 0

The check behind them is this one (a standard check_http from OMD):
/omd/sites/recette/lib/nagios/plugins/check_http --hostname www.google.fr --ssl

These are the only four strange lines; everything else is OK.
nagflux.log looks OK at this time.

jfr
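
One plausible reading of the fragments `0 ,241334s;;;0,000000` is that the plugin printed its numbers with a decimal comma (for example under a non-English locale right after the restart), so the perfdata was split at the wrong places. This is an assumption, not something the log confirms. A minimal sketch of a tolerant number parser, with a hypothetical function name:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseLocaleFloat accepts "0.241334" as well as "0,241334".
// It assumes the input holds a single number without thousands separators.
func parseLocaleFloat(s string) (float64, error) {
	normalized := strings.Replace(strings.TrimSpace(s), ",", ".", 1)
	return strconv.ParseFloat(normalized, 64)
}

func main() {
	for _, raw := range []string{"0.241334", "0,241334"} {
		v, err := parseLocaleFloat(raw)
		fmt.Println(raw, v, err)
	}
}
```

Running the plugin with a fixed locale such as LC_ALL=C would avoid the comma at the source.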

Panic: runtime error

Hi,

sometimes nagflux crashes on some notification logs.

2015-12-23 18:11:13 Warn: Query failed while csv parsing:GET log\x10Columns: type time contact_name message\x10Filter: type ~ .*NOTIFICATION\x10Filter: time < 1450890493\x10Negate:\x10OutputFormat: csv\x10\x10

2015-12-23 18:11:13 Warn: SERVICE NOTIFICATION;1450890539;basta;[1450890539] SERVICE NOTIFICATION: basta;SYNO BACKUP;HEALTH;OK;service-notify-by-email;OK - Synology "DS414slim" (s/n: "14A0MF N175700", "DSM 5.2-5644") is in good health
2015-12-23 18:11:13 Warn: line 1, column 142: bare " in non-quoted-field
panic: runtime error: index out of range

I read through the Go code but could not find the problem.

OMD[basta]:~$ omd version
OMD - Open Monitoring Distribution Version 2.10-labs-edition

jfr
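
The message `bare " in non-quoted-field` comes from Go's `encoding/csv` reader, and the panic that follows looks like an index into a record that was never filled. A sketch of a more tolerant parse, assuming semicolon-separated Livestatus CSV. The function name `parseLogLine` and the minimum field count are hypothetical.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// parseLogLine reads one semicolon-separated CSV line, tolerates bare double
// quotes inside unquoted fields, and checks the field count before any indexing.
func parseLogLine(line string, minFields int) ([]string, error) {
	r := csv.NewReader(strings.NewReader(line))
	r.Comma = ';'
	r.LazyQuotes = true      // allow quotes in unquoted fields
	r.FieldsPerRecord = -1   // the message column may itself contain ';'
	record, err := r.Read()
	if err != nil {
		return nil, err
	}
	if len(record) < minFields {
		return nil, fmt.Errorf("expected at least %d fields, got %d", minFields, len(record))
	}
	return record, nil
}

func main() {
	line := `SERVICE NOTIFICATION;1450890539;basta;OK - Synology "DS414slim" is in good health`
	record, err := parseLogLine(line, 4)
	fmt.Println(len(record), err)
}
```

Returning an error for short records lets the caller skip the line rather than crash.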

"error":"partial write:\nunable to parse

Hi,
In my nagflux log I see a high number of "partial write:\nunable to parse" errors.
The NRPE command output of the check itself looks fine to me:

OK: All drives within bounds.|'C:\ %'=74%;92;95 'C:'=8.90799G;11.04;11.4;0;12

The log line looks like this:
2017-01-13 02:02:42 Warn: Influx status: 400 Bad Request - {"error":"partial write:\nunable to parse 'metrics,host=109_TOC,service=HardDrives,command=check_nrpe,performanceLabel=C:\\,warn-fill=none,crit-fill=none,unit=G value=8.90499,warn=11.04,crit=11.4,min=0.0,max=12.0 1484262144000': invalid tag format\nunable to parse 'metrics,host=103_1_VR,service=HardDrives,command=check_nrpe,performanceLabel=C:\\,warn-fill=none,crit-fill=none,unit=G value=9.179,warn=27.6,crit=28.5,min=0.0,max=30.0 1484262144000':.....

In Thruk the value is shown, but the performance graph is missing for the second value.
I see that after "unit=G" there is no "," inserted. Is this the issue?

Using version v0.2.7 together with OMD
Any Idea?
best regards Martin
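
In the quoted error the tag value appears as `performanceLabel=C:\\`, which after JSON unescaping is `C:\` followed directly by the comma that separates tags. In line protocol a backslash escapes the following comma, which would plausibly explain "invalid tag format"; the space after `unit=G` is the normal separator between tags and fields. A sketch of a sanitizer, where replacing backslashes with `/` is an arbitrary choice and `sanitizeTagValue` is a hypothetical name:

```go
package main

import (
	"fmt"
	"strings"
)

// tagEscaper replaces backslashes and escapes the characters that are special
// in line-protocol tag values (comma, equals sign, space).
var tagEscaper = strings.NewReplacer(`\`, "/", ",", `\,`, "=", `\=`, " ", `\ `)

// sanitizeTagValue returns v in a form that is safe to use as a tag value.
func sanitizeTagValue(v string) string {
	return tagEscaper.Replace(v)
}

func main() {
	for _, label := range []string{`C:\`, `C:\ %`, "a,b=c"} {
		fmt.Printf("%q -> %q\n", label, sanitizeTagValue(label))
	}
}
```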

Install error

Hello,
we get the following error when installing:

xxxx@xxxxx:/tmp# go get -u github.com/griesbacher/nagflux

github.com/griesbacher/nagflux/target/elasticsearch

go/src/github.com/griesbacher/nagflux/target/elasticsearch/Connector.go:38:29: error: unknown field ‘Timeout’ in ‘http.Client’
false, false, http.Client{Timeout: time.Duration(5 * time.Second)},

github.com/griesbacher/nagflux/target/influx

go/src/github.com/griesbacher/nagflux/target/influx/Connector.go:46:24: error: unknown field ‘Timeout’ in ‘http.Client’
client := http.Client{Timeout: timeout}

Any solutions?

Best regards
Patrick

No data sent to influxdb

Hi, we're using Nagflux 0.4.1, InfluxDB 1.8, and Naemon, following the steps at https://support.nagios.com/kb/article/nagios-core-performance-graphs-using-influxdb-nagflux-grafana-histou-802.html#Nagflux_Config.

When Nagflux starts, the database is created in InfluxDB. However, no data is sent to InfluxDB.

We found that Nagflux keeps restarting (the log repeatedly shows Started Nagflux) and reads the same performance data file, /var/spool/nagflux/1663209466.perfdata.host, which is the first one in the spool directory:

--- cut here ---
2022-09-15 17:35:51 Info: Started Nagflux v0.4.1
2022-09-15 17:35:51 Debug: Using Config: /usr/local/nagflux/config.gcfg
2022-09-15 17:35:51 Info: serving prometheus metrics at :8080/metrics
2022-09-15 17:35:51 Info: Is InfluxDB(nagflux) running: true
2022-09-15 17:35:51 Debug: Influxdb(nagflux) is running
2022-09-15 17:35:51 Debug: Dumpfile: nagflux.dump-nagflux.influx not found, skipping... (Everything is fine)
2022-09-15 17:35:51 Debug: DumpfileCollector stopped
2022-09-15 17:35:53 Info: Setting Livestatus version to: Naemon
2022-09-15 17:35:53 Info: Nagios Spoolfile Folder: /var/lib/naemon
2022-09-15 17:35:53 Info: Nagflux Spoolfile Folder: /var/spool/nagflux
2022-09-15 17:35:58 Debug: Reading Directory: /var/lib/naemon
2022-09-15 17:35:58 Debug: Reading file: /var/lib/naemon/naemon.cmd
2022-09-15 17:35:58 Debug: Reading file: /var/spool/nagflux/1663209466.perfdata.host

2022-09-15 17:35:58 Info: Started Nagflux v0.4.1
2022-09-15 17:35:58 Debug: Using Config: /usr/local/nagflux/config.gcfg
2022-09-15 17:35:58 Info: serving prometheus metrics at :8080/metrics
2022-09-15 17:35:58 Info: Is InfluxDB(nagflux) running: true
2022-09-15 17:35:58 Debug: Influxdb(nagflux) is running
2022-09-15 17:35:58 Debug: Dumpfile: nagflux.dump-nagflux.influx not found, skipping... (Everything is fine)
2022-09-15 17:35:58 Debug: DumpfileCollector stopped
2022-09-15 17:36:00 Info: Setting Livestatus version to: Naemon
2022-09-15 17:36:00 Info: Nagios Spoolfile Folder: /var/lib/naemon
2022-09-15 17:36:00 Info: Nagflux Spoolfile Folder: /var/spool/nagflux
2022-09-15 17:36:05 Debug: Reading Directory: /var/lib/naemon
2022-09-15 17:36:05 Debug: Reading file: /var/lib/naemon/host-perfdata
2022-09-15 17:36:05 Debug: Reading file: /var/lib/naemon/naemon.cmd
2022-09-15 17:36:05 Debug: Reading file: /var/spool/nagflux/1663209466.perfdata.host

2022-09-15 17:36:05 Info: Started Nagflux v0.4.1
2022-09-15 17:36:05 Debug: Using Config: /usr/local/nagflux/config.gcfg
2022-09-15 17:36:05 Info: serving prometheus metrics at :8080/metrics
2022-09-15 17:36:05 Info: Is InfluxDB(nagflux) running: true
2022-09-15 17:36:05 Debug: Influxdb(nagflux) is running
2022-09-15 17:36:05 Debug: Dumpfile: nagflux.dump-nagflux.influx not found, skipping... (Everything is fine)
2022-09-15 17:36:05 Debug: DumpfileCollector stopped
2022-09-15 17:36:07 Info: Setting Livestatus version to: Naemon
2022-09-15 17:36:07 Info: Nagios Spoolfile Folder: /var/lib/naemon
2022-09-15 17:36:07 Info: Nagflux Spoolfile Folder: /var/spool/nagflux
2022-09-15 17:36:12 Debug: Reading Directory: /var/lib/naemon
2022-09-15 17:36:12 Debug: Reading file: /var/lib/naemon/naemon.cmd
2022-09-15 17:36:12 Debug: Reading file: /var/spool/nagflux/1663209466.perfdata.host
--- cut here ---

Since it's a test Naemon instance, there is only 1 host with a few monitors. Most of the performance data files are empty:

--- cut here ---

ls -l /var/spool/nagflux

total 972
-rw-r--r--. 1 naemon naemon 0 Sep 15 10:37 1663209466.perfdata.host
-rw-r--r--. 1 naemon naemon 0 Sep 15 10:37 1663209481.perfdata.host
-rw-r--r--. 1 naemon naemon 0 Sep 15 10:38 1663209496.perfdata.host
-rw-r--r--. 1 naemon naemon 0 Sep 15 10:38 1663209511.perfdata.host
-rw-r--r--. 1 naemon naemon 0 Sep 15 10:38 1663209526.perfdata.host
-rw-r--r--. 1 naemon naemon 0 Sep 15 10:38 1663209541.perfdata.host
-rw-r--r--. 1 naemon naemon 0 Sep 15 10:39 1663209556.perfdata.host
-rw-r--r--. 1 naemon naemon 0 Sep 15 10:39 1663209571.perfdata.host
-rw-r--r--. 1 naemon naemon 0 Sep 15 10:39 1663209586.perfdata.host
-rw-r--r--. 1 naemon naemon 567 Sep 15 10:39 1663209601.perfdata.host
-rw-r--r--. 1 naemon naemon 418 Sep 15 10:40 1663209616.perfdata.host
-rw-r--r--. 1 naemon naemon 0 Sep 15 10:40 1663209631.perfdata.host
-rw-r--r--. 1 naemon naemon 539 Sep 15 10:40 1663209646.perfdata.host
-rw-r--r--. 1 naemon naemon 0 Sep 15 10:40 1663209661.perfdata.host
-rw-r--r--. 1 naemon naemon 0 Sep 15 10:41 1663209676.perfdata.host
-rw-r--r--. 1 naemon naemon 0 Sep 15 10:41 1663209691.perfdata.host
[snipped]
--- cut here ---
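A quick way to see how many spool files actually carry data is to split them by size. The following is a minimal sketch, not part of Nagflux; the function name `spool_summary` is made up for illustration.

```python
import os

def spool_summary(folder):
    """Return two sorted lists of regular file names: (empty, non_empty)."""
    empty, non_empty = [], []
    with os.scandir(folder) as entries:
        for entry in entries:
            # Skip sub-directories, sockets and other non-regular entries.
            if not entry.is_file():
                continue
            if entry.stat().st_size == 0:
                empty.append(entry.name)
            else:
                non_empty.append(entry.name)
    return sorted(empty), sorted(non_empty)
```

Calling `spool_summary("/var/spool/nagflux")` on the machine above would list the zero-byte files separately from the few that contain perfdata.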

Our Nagflux config file looks like this:

--- cut here ---
[main]
NagiosSpoolfileFolder = "/var/lib/naemon"
NagiosSpoolfileWorker = 1
InfluxWorker = 2
MaxInfluxWorker = 5
DumpFile = "nagflux.dump"
NagfluxSpoolfileFolder = "/var/spool/nagflux"
FieldSeparator = "&"
BufferSize = 10000
FileBufferSize = 65536
# If the performancedata does not have a certain target set with NAGFLUX:TARGET.
# The following field will define the target for this data.
# "all" sends the data to all Targets(every Influxdb, Elasticsearch...)
# a certain name will direct the data to this certain target
DefaultTarget = "all"

[Log]
# leave empty for stdout
LogFile = "/usr/local/nagflux/nagflux.log"
# List of Severities https://godoc.org/github.com/kdar/factorlog#Severity

MinSeverity = "INFO"

MinSeverity = "DEBUG"

[Monitoring]
# leave empty to disable
# PrometheusAddress = ":8080"
PrometheusAddress = ":8080"

[Livestatus]
# tcp or file
Type = "file"
# tcp: 127.0.0.1:6557 or file /var/run/live
Address = "/var/cache/naemon/live"
# The amount to minutes to wait for livestatus to come up, if set to 0 the detection is disabled
MinutesToWait = 2
# Set the Version of Livestatus. Allowed are Nagios, Icinga2, Naemon.
# If left empty Nagflux will try to detect it on it's own, which will not always work.
Version = "Naemon"

[ModGearman "example"] #copy this block and rename it to add a second ModGearman queue
Enabled = false
Address = "127.0.0.1:4730"
Queue = "perfdata"
# Leave Secret and SecretFile empty to disable encryption
# If both are filled the the Secret will be used
# Secret to encrypt the gearman jobs
Secret = ""
# Path to a file which holds the secret to encrypt the gearman jobs
SecretFile = "/etc/mod-gearman/secret.key"
Worker = 1

[InfluxDBGlobal]
CreateDatabaseIfNotExists = true
NastyString = ""
NastyStringToReplace = ""
HostcheckAlias = "hostcheck"

[InfluxDB "nagflux"]
Enabled = true
Version = 1.0
Address = "http://wb3.mydomain.hk:8086"
Arguments = "precision=ms&u=nagflux&p=abcd1234&db=nagflux"
StopPullingDataIfDown = true

[InfluxDB "fast"]
Enabled = false
Version = 1.0
Address = "http://127.0.0.1:8086"
Arguments = "precision=ms&u=root&p=root&db=fast"
StopPullingDataIfDown = false

[ElasticsearchGlobal]
HostcheckAlias = "hostcheck"
NumberOfShards = 1
NumberOfReplicas = 1
# Sorts the indices "monthly" or "yearly"
IndexRotation = "monthly"

[Elasticsearch "example"]
Enabled = false
Address = "http://localhost:9200"
Index = "nagflux"
Version = 2.1

[JSONFileExport "one"]
Enabled = false
Path = "export/json"
# Timeinterval in Seconds till a new file will be used. 0 for no rotation.
# If no rotation is selected, the JSON Objects are appended line by line so,
# every single line is valid JSON but the whole file not.
# If rotation is selected every file as whole is valid JSON.
AutomaticFileRotation = "10"
--- cut here ---
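The README notes that the precision in `Arguments` has to be ms. As a rough sanity check, the `Arguments` string can be parsed like a URL query string. This is only a sketch: `check_arguments` is a hypothetical helper, and it assumes Nagflux passes the string through to InfluxDB unchanged.

```python
from urllib.parse import parse_qs

def check_arguments(arguments):
    """Return a list of problems found in an InfluxDB Arguments string."""
    params = parse_qs(arguments)
    problems = []
    # The README states that the precision has to be ms.
    if params.get("precision") != ["ms"]:
        problems.append("precision should be ms")
    # Without a db parameter there is no target database.
    if "db" not in params:
        problems.append("no db given")
    return problems
```

For the string in the config above, `check_arguments("precision=ms&u=nagflux&p=abcd1234&db=nagflux")` returns an empty list.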

I'm afraid we've missed something. Please help.

Thanks a lot.

Service checks from Nagios are not stored in InfluxDB

Hello @Griesbacher
First of all, thanks for your amazing plugin! Could you please help me with this issue? I installed Nagflux with this guide and I think everything is OK, but after integrating Grafana / Histou into the Nagios Core web interface I get the error "Database Error: No data found". It happens only for services; for hosts everything is OK.

I started debugging, and I think Nagflux is unable to insert any service metrics into the "nagflux" database, judging by a query with:
select time,host,service,performanceLabel,value,warn,crit from metrics

As a result, Histou can't graph any service in Grafana.
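To see which services have any points at all, the InfluxDB 1.x HTTP API can be queried directly through its `/query` endpoint. The sketch below only builds the request URL. The helper name `build_query_url` and the query text are illustrative, and it assumes `service` is stored as a tag on the `metrics` measurement.

```python
from urllib.parse import urlencode

# Counts stored values per service; assumes "service" is a tag.
SERVICE_COUNT_QUERY = "SELECT count(value) FROM metrics GROUP BY service"

def build_query_url(address, db, query, user=None, password=None):
    """Build a URL for the InfluxDB 1.x /query endpoint."""
    params = {"db": db, "q": query}
    if user is not None:
        params["u"] = user
        params["p"] = password or ""
    return address.rstrip("/") + "/query?" + urlencode(params)
```

The resulting URL can be fetched with curl or `urllib.request.urlopen`. If only the host check alias shows up in the result, no service points have reached the database.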

Here is my Nagflux config file:

/opt/nagflux/config.gcfg
[main]
NagiosSpoolfileFolder = "/usr/local/nagios/var/spool/nagfluxperfdata"
NagiosSpoolfileWorker = 10
InfluxWorker = 10
MaxInfluxWorker = 50
DumpFile = "nagflux.dump"
NagfluxSpoolfileFolder = "/usr/local/nagios/var/nagflux"
FieldSeparator = "&"
BufferSize = 10000
FileBufferSize = 65536
DefaultTarget = "all"

[Log]
LogFile = "/opt/logs/nagios/nagflux.log"
MinSeverity = "DEBUG"

[InfluxDBGlobal]
CreateDatabaseIfNotExists = true
NastyString = ""
NastyStringToReplace = ""
HostcheckAlias = "hostcheck"
ServicecheckAlias = "servicecheck"

[InfluxDB "nagflux"]
Enabled = true
Version = 1.0
Address = "http://127.0.0.1:8086"
Arguments = "precision=ms&u=root&p=root&db=nagflux"
StopPullingDataIfDown = true

[InfluxDB "fast"]
Enabled = false
Version = 1.0
Address = "http://127.0.0.1:8086"
Arguments = "precision=ms&u=root&p=root&db=fast"
StopPullingDataIfDown = false

Any advice is very welcome! Thanks for your time!

Regards

Problem parsing performance data

Hi,
I have performance data with the 'μs' unit, and Nagflux doesn't parse it correctly, as shown below:

check_nrpe -H HOST_SERVER -c check_perfs OK | getItinerary_min=34385μs getItinerary_avg=130925μs getItinerary_max=267719μs

metrics,host=HOST_SERVERt,service=Perfs,command=check_perfs,performanceLabel=μs\ getItinerary_min value=26875.0 1490878500000
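For comparison, here is a minimal sketch of a perfdata parser that accepts non-ASCII units such as 'μs'. It is not Nagflux's parser: the pattern `PERF_RE` and the function `parse_perfdata` are made up for illustration. The unit is matched as "anything that is not whitespace, a semicolon or a digit" instead of a fixed ASCII letter class.

```python
import re

# Hypothetical pattern: label, '=', numeric value, optional unit,
# then up to four optional ';'-separated fields (warn, crit, min, max).
PERF_RE = re.compile(
    r"(?:'(?P<qlabel>[^']+)'|(?P<label>[^\s=']+))"
    r"=(?P<value>-?\d+(?:\.\d+)?)"
    r"(?P<unit>[^\s;\d]*)"
    r"(?:;(?P<warn>[^;\s]*))?"
    r"(?:;(?P<crit>[^;\s]*))?"
    r"(?:;(?P<min>[^;\s]*))?"
    r"(?:;(?P<max>[^;\s]*))?"
)

def parse_perfdata(text):
    """Parse a perfdata string into a list of dicts, one per label."""
    results = []
    for m in PERF_RE.finditer(text):
        results.append({
            "label": m.group("qlabel") or m.group("label"),
            "value": float(m.group("value")),
            "unit": m.group("unit"),
            "warn": m.group("warn") or "",
            "crit": m.group("crit") or "",
        })
    return results
```

With the perfdata above, this yields three entries labelled getItinerary_min, getItinerary_avg and getItinerary_max, each with unit 'μs', rather than a label that starts with the unit of the previous value.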

Nothing in influx DB

Hi,

I'm trying to install nagflux on my nagios machine.

Here are my parameters:

`[root@JXT-NAGIOS ~]# cat config.gcfg
[main]
NagiosSpoolfileFolder = "/usr/local/nagios/var/spool/"
NagiosSpoolfileWorker = 1
InfluxWorker = 2
MaxInfluxWorker = 5
DumpFile = "nagflux.dump"
NagfluxSpoolfileFolder = "/usr/local/nagios/var/spool/nagflux"
FieldSeparator = "&"
BufferSize = 10000
FileBufferSize = 65536
# If the performancedata does not have a certain target set with NAGFLUX:TARGET.
# The following field will define the target for this data.
# "all" sends the data to all Targets(every Influxdb, Elasticsearch...)
# a certain name will direct the data to this certain target
DefaultTarget = "all"

[Log]
#leave empty for stdout
LogFile = ""
#List of Severities https://godoc.org/github.com/kdar/factorlog#Severity
MinSeverity = "DEBUG"

[InfluxDBGlobal]
CreateDatabaseIfNotExists = true
NastyString = ""
NastyStringToReplace = ""

[Livestatus]
# tcp or file
Type = "tcp"
# tcp: 127.0.0.1:6557 or file /var/run/live
Address = "127.0.0.1:6557"
# The amount to minutes to wait for livestatus to come up, if set to 0 the detection is disabled
MinutesToWait = 2
# Set the Version of Livestatus. Allowed are Nagios, Icinga2, Naemon.
# If left empty Nagflux will try to detect it on it's own, which will not always work.
Version = "Nagios"

[InfluxDB "nagflux"]
Enabled = true
Address = "http://192.168.2.116:8086"
Arguments = "precision=ms&db=nagflux"
StopPullingDataIfDown = true
`
When I launch Nagflux, everything seems to be okay:

[root@JXT-NAGIOS ~]# ./nagflux
2017-09-13 17:22:38 Info: Started Nagflux v0.4.1
2017-09-13 17:22:38 Debug: Using Config: config.gcfg
2017-09-13 17:22:38 Info: Is InfluxDB(nagflux) running: true
2017-09-13 17:22:38 Debug: Influxdb(nagflux) is running
2017-09-13 17:22:38 Debug: Dumpfile: nagflux.dump-nagflux.influx not found, skipping... (Everything is fine)
2017-09-13 17:22:38 Debug: DumpfileCollector stopped
2017-09-13 17:22:40 Info: Setting Livestatus version to: Nagios
2017-09-13 17:22:40 Info: Nagios Spoolfile Folder: /usr/local/nagios/var/spool/
2017-09-13 17:22:40 Info: Nagflux Spoolfile Folder: /usr/local/nagios/var/spool/nagflux
2017-09-13 17:22:45 Debug: Reading Directory: /usr/local/nagios/var/spool/
2017-09-13 17:22:50 Debug: Reading Directory: /usr/local/nagios/var/spool/
2017-09-13 17:22:55 Debug: Reading Directory: /usr/local/nagios/var/spool/
2017-09-13 17:23:00 Debug: Reading Directory: /usr/local/nagios/var/spool/
But I have no data in my InfluxDB database.
InfluxDB is 0.9.5.

Can you help me?

Nagflux saves data incorrectly into InfluxDB

I'm using Nagflux (v0.2.9) with InfluxDB 1.1.0-1, and I have problems with check_snmp_load.pl performance data. Thruk and the Nagflux logs show the correct data, but once Nagflux saves it into InfluxDB, all load_*_min entries get the same value (the same goes for the warn and crit thresholds).

Thruk shows:
load_1_min=0.08;5.75;14.75 load_5_min=0.08;5.60;14.60 load_15_min=0.05;5.50;14.50

Nagflux shows in its log:

2016-12-02 08:11:37 Debug: [ModGearman] map[SERVICEDESC:[1][LOAD] SERVICEPERFDATA:load_1_min=0.08;5.75;14.75 load_5_min=0.08;5.60;14.60 load_15_min=0.05;5.50;14.50 SERVICECHECKCOMMAND:check_load_solaris_linux_by_snmp!5.75,5.60,5.50!14.75,14.60,14.50 SERVICESTATE:0 SERVICESTATETYPE:1 SERVICEINTERVAL::1.000000 DATATYPE:SERVICEPERFDATA TIMET:1480662697 HOSTNAME:server]

But when I check the InfluxDB database, load_1_min, load_5_min and load_15_min all show the same values (those of load_15_min):

> select time,host,service,performanceLabel,value,warn,crit from metrics where host='server' and command='check_load_solaris_linux_by_snmp' and time = 1480662697000000000
name: metrics
time			host		service		performanceLabel	value	warn	crit
----			----		-------		----------------	-----	----	----
1480662697000000000	server		[1][LOAD]	load_15_min		0.05	5.5	14.5
1480662697000000000	server		[1][LOAD]	load_5_min		0.05	5.5	14.5
1480662697000000000	server		[1][LOAD]	load_1_min		0.05	5.5	14.5

This also happens with check_wmi_plus.pl (mode checkeachcpu), which returns perfdata such as:

'Avg Utilisation CPU0'=0.2%;90;95; 'Avg Utilisation CPU1'=0.6%;90;95; 'Avg Utilisation CPU2'=1.9%;90;95; 'Avg Utilisation CPU3'=0.2%;90;95; 'Avg Utilisation CPU_Total'=0.7%;90;95;

Other checks, like check_icmp (returning rta and pl values), save data correctly, so I guess it's a problem with similar performanceLabel values.
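To narrow this down, it can help to write out what each label should look like in InfluxDB line protocol and compare that with what ends up in the database. The sketch below is illustrative only: `expected_lines` and `escape_tag` are hypothetical helpers, the tag layout is assumed from the columns queried above, and it handles only unquoted labels without units (as in the load example). In InfluxDB, a point is identified by its measurement, tag set and timestamp, so each performanceLabel should produce its own point with its own field values.

```python
def escape_tag(value):
    """Backslash-escape commas, equals signs and spaces in a tag value."""
    for ch in (",", "=", " "):
        value = value.replace(ch, "\\" + ch)
    return value

def expected_lines(host, service, command, perfdata, timet):
    """Build one line-protocol line per perfdata label (ms precision)."""
    lines = []
    for item in perfdata.split():
        label, _, rest = item.partition("=")
        parts = rest.split(";")
        fields = ["value=%s" % float(parts[0])]
        # warn and crit are the second and third ';'-separated entries.
        for name, raw in zip(("warn", "crit"), parts[1:3]):
            if raw:
                fields.append("%s=%s" % (name, float(raw)))
        tags = ",".join(
            "%s=%s" % (key, escape_tag(val))
            for key, val in (
                ("host", host),
                ("service", service),
                ("command", command),
                ("performanceLabel", label),
            )
        )
        # TIMET is in seconds; the configs in this thread use precision=ms.
        lines.append("metrics,%s %s %d" % (tags, ",".join(fields), timet * 1000))
    return lines
```

For the perfdata above, this gives three distinct lines, one per load_*_min label, each carrying its own value, warn and crit.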
