
nagios-jenkins-plugin's Introduction

Overview

This repository contains three Nagios plugins:

  • check_jenkins_job_extended.pl - The original plugin, documented below. It checks for build failures rather than for the time since the last success.
  • check_jenkins_cron.pl - A separate plugin written from scratch, designed to check jobs that should build periodically.
  • check_jenkins_nodes.pl - Checks the number of nodes whose status is "offline".

check_jenkins_cron.pl

Usage

usage: ./check_jenkins_cron.pl -j <job> -l <url> -w <threshold> -c <threshold> [-f] [-u username -p password] [-v]

    Required arguments
        -j <job>        : Jenkins job name
                          The name of the job to examine.

        -l <url>        : Jenkins URL
                          The protocol is assumed to be http if none is specified.

        -w <threshold>  : Warning Threshold (seconds)
                          WARNING when the last successful run was over <threshold> seconds ago.
                          CRITICAL when the last successful run was over <threshold> seconds ago
                          and failures have occurred since then.

        -c <threshold>  : Critical Threshold (seconds)
                          CRITICAL when the last successful run was over <threshold> seconds ago.

    Optional arguments
        -f              : WARNING when the last run was not successful, even if the last
                          successful run is within the -w and -c thresholds.

        -u <username>   : Jenkins Username if anonymous API access is not available

        -p <password>   : Jenkins Password if anonymous API access is not available

        -v              : Increased verbosity.
                          This will confuse nagios, and should only be used for debug purposes
                          when testing this plugin.
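The threshold rules above can be condensed into a small decision function. This is a hedged sketch of the documented behaviour, not the plugin's actual code; the sub name `cron_status` and its named arguments (seconds since the last success, whether failures occurred since then, whether the latest run failed, and whether `-f` was given) are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the documented -w, -c and -f rules of
# check_jenkins_cron.pl. All ages and thresholds are in seconds.
sub cron_status {
    my (%opt) = @_;
    my $age = $opt{age};    # seconds since the last successful run

    # -c: the last success is older than the critical threshold.
    return 'CRITICAL' if $age > $opt{crit};

    # -w: older than the warning threshold; failures since then escalate.
    if ($age > $opt{warn}) {
        return $opt{failed_since} ? 'CRITICAL' : 'WARNING';
    }

    # -f: within both thresholds, but the most recent run failed.
    return 'WARNING' if $opt{flag_f} && $opt{last_failed};
    return 'OK';
}

# 25 hours since the last success, no failures since then.
print cron_status(age => 90_000, warn => 86_400, crit => 129_600,
                  failed_since => 0, last_failed => 0, flag_f => 0), "\n";
```

Checking `-c` first means a job past both thresholds always reports CRITICAL, whatever the other flags say.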

Sample nagios configuration

Command definition

define command {
  command_name    check_jenkins_cron
  command_line    $USER1$/check_jenkins_cron.pl -j '$ARG1$' -l $ARG2$ -w $ARG3$ -c $ARG4$ -f -u $ARG5$ -p $ARG6$
}

Service definition to warn when a job hasn't built for 24 hours, and crit when it hasn't built for 36 hours.

define service {
  use                             local-service
  host_name                       buildserver.mycompany.com
  service_description             Jenkins - prod build
  check_interval                  1
  check_command                   check_jenkins_cron!Production build!buildserver.mycompany.com!86400!129600!myuser!mypassword
  contacts                        bob,bill
}

nagios-jenkins-plugin (check_jenkins_job_extended.pl)

A Nagios plugin that lets you check Jenkins jobs against various criteria.

How to use it

The plugin takes several threshold arguments; pass "0" for any of them to disable that particular threshold.

Usage: check_jenkins_job_extended url jobname concurrentFailsThreshold buildDurationThresholdMilliseconds lastStableBuildThresholdInMinutesWarn lastStableBuildThresholdInMinutesCrit

  • url: The URL to your jenkins server

  • username: The username for auth to your jenkins server [optional]

  • password: The password for auth to your jenkins server [optional]

  • jobname: The name of the jenkins job you'd like to check

  • concurrentFailsThreshold: The number of consecutive failed builds at which it should alert CRITICAL

  • buildDurationThresholdMilliseconds: Alert if the last build took longer than this many milliseconds to complete

  • lastStableBuildThresholdInMinutesWarn: WARN if it has been this many minutes since the last stable build

  • lastStableBuildThresholdInMinutesCrit: CRIT if it has been this many minutes since the last stable build
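How these thresholds combine, including the rule that "0" disables a check, can be sketched as follows. This is an illustration under stated assumptions, not the plugin's code: the sub names `exceeds` and `job_state` are hypothetical, every comparison uses `>=`, and the duration check is assumed to raise WARNING because the description above only says it will "alert".

```perl
use strict;
use warnings;

# A threshold of 0 disables its check. Using ">=" for every comparison
# is an assumption of this sketch.
sub exceeds {
    my ($value, $threshold) = @_;
    return $threshold > 0 && $value >= $threshold;
}

# Hypothetical evaluation of one job. The severity of the duration
# check is assumed to be WARNING.
sub job_state {
    my (%m) = @_;
    return 'CRITICAL' if exceeds($m{fails},       $m{fails_crit});
    return 'CRITICAL' if exceeds($m{stable_min},  $m{stable_crit});
    return 'WARNING'  if exceeds($m{stable_min},  $m{stable_warn});
    return 'WARNING'  if exceeds($m{duration_ms}, $m{duration_max});
    return 'OK';
}

# Thresholds as in the sample service below (0!0!4!20); the job has
# failed 3 times in a row and was last stable 5 minutes ago.
print job_state(fails => 3, fails_crit => 0,
                duration_ms => 600_000, duration_max => 0,
                stable_min => 5, stable_warn => 4, stable_crit => 20), "\n";
```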

Example

A sample nagios command using this plugin.

define command {
  command_name    check_jenkins_job_ext
  command_line    $USER1$/check_jenkins_job_extended.pl $ARG1$ $ARG2$ $ARG3$ $ARG4$ $ARG5$ $ARG6$ $ARG7$ $ARG8$
}

A sample nagios service using the above command to warn when it has been 4 minutes since the last stable build, and crit when it has been 20.

define service {
  use                             local-service
  host_name                       buildserver.mycompany.com
  service_description             Jenkins - prod build
  check_interval                  1
  check_command                   check_jenkins_job_ext!http://buildserver.mycompany.com!prod!0!0!4!20
  contacts                        bob,bill
}

check_jenkins_nodes.pl

Usage

Usage: check_jenkins_nodes.pl -s [jenkins server hostname & path] -w [integer or %] -c [integer or %] [-h this help message] [-u username] [-p password] [-v]

Required Arguments:
    -s <server hostname>    : jenkins CI server hostname

    -c <threshold>          : integer or percentage (ex: 2 or 50%)
                              CRITICAL if <threshold> or more nodes are offline

    -w <threshold>          : integer or percentage (ex: 2 or 50%)
                              WARNING if <threshold> or more nodes are offline

Optional arguments

    -h This help message

    -p <password>           : password to the jenkins CI server

    -u <username>           : username to the jenkins CI server

    -v verbose output
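The "integer or percentage" thresholds can be evaluated with one helper that treats a trailing `%` as a share of all nodes and a bare integer as an absolute count. A minimal sketch; the sub name `threshold_reached` is hypothetical and the plugin's own parsing may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: is a threshold given as an integer ("2") or a
# percentage ("50%") reached by $offline out of $total nodes?
sub threshold_reached {
    my ($offline, $total, $threshold) = @_;
    if ($threshold =~ /^(\d+(?:\.\d+)?)%$/) {
        return 0 if $total == 0;    # avoid division by zero
        return ($offline / $total) * 100 >= $1;
    }
    die "invalid threshold '$threshold'\n" unless $threshold =~ /^\d+$/;
    return $offline >= $threshold;
}

# With the sample service below (-w 2 -c 51%): 2 of 4 nodes offline is
# 50%, so the WARNING threshold is reached but the CRITICAL one is not.
my ($offline, $total) = (2, 4);
my $state = threshold_reached($offline, $total, '51%') ? 'CRITICAL'
          : threshold_reached($offline, $total, '2')   ? 'WARNING'
          : 'OK';
print "$state\n";
```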

Command definition

define command{
	command_name    check_jenkins_nodes
	command_line    $USER1$/check_jenkins_nodes.pl -s$ARG1$ -u$ARG2$ -p$ARG3$ -w$ARG4$ -c$ARG5$
}

Service definition to warn when 2 or more nodes are offline, and crit when 51% or more of the nodes are offline.

define service {
  use                             local-service
  host_name                       buildserver.mycompany.com
  service_description             Jenkins - node check
  check_interval                  1
  check_command                   check_jenkins_nodes!https://buildserver.mycompany.com!myuser!mypassword!2!51%
  contacts                        bob,bill
}

nagios-jenkins-plugin's People

Contributors

alex-murygin, bohdyone, davestern, honggoff, jonlives, nickrw


nagios-jenkins-plugin's Issues

Bug when using the "-f" switch

Hello,

When I use
./check_jenkins_cron.pl -j Test -l https://1.1.1.1/ -w 86400 -c 172800 -f

I get back:
WARNING - 'Test' failed 45y 5mo 2w 4d 7h 44m 57s ago.

Please help.
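An age of about 45 years points back to roughly 1970, the Unix epoch, which suggests (an assumption; the report does not confirm the cause) that the age was computed from a failure timestamp of 0, for example because no failed build was found. A minimal defensive sketch, assuming Jenkins-style timestamps in milliseconds; the sub name `failure_age` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical guard: treat a missing or zero timestamp as "no failure
# recorded" instead of measuring the age from 1970-01-01.
# $ts_ms is a timestamp in milliseconds since the epoch.
sub failure_age {
    my ($ts_ms, $now) = @_;
    $now //= time;
    return if !defined $ts_ms || $ts_ms <= 0;
    return $now - int($ts_ms / 1000);    # age in seconds
}

my $age = failure_age(0);
print defined $age ? "failed ${age}s ago\n" : "no failed build recorded\n";
```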

check_jenkins_nodes.pl Does not work with Jenkins ver. 1.624

It seems that the API output has changed. Instead of if($computer->{'offline'} eq 'true'), the check should now be if ($computer->{'offline'} eq '1'), or even if(($computer->{'offline'} eq 'true') || ($computer->{'offline'} eq '1')), to get it to work.
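When the node list is decoded from the JSON API with a JSON library, a JSON `true` usually arrives as a boolean object that stringifies to "1" rather than the string "true", which would explain the change reported here (an assumption; the issue does not say how the response is parsed). A sketch of a check that accepts both forms, using the core `JSON::PP` module on a hand-written sample shaped like a `/computer/api/json` response; the sub name `is_offline` is hypothetical.

```perl
use strict;
use warnings;
use JSON::PP ();

# Accept both the old string form ('true') and a decoded JSON boolean,
# which stringifies to '1'. Anything else (including undef) is online.
sub is_offline {
    my ($flag) = @_;
    return 0 unless defined $flag;
    return ("$flag" eq 'true' || "$flag" eq '1') ? 1 : 0;
}

# Hand-written sample; a real response carries many more fields.
my $json = '{"computer":[{"displayName":"master","offline":false},'
         . '{"displayName":"agent-1","offline":true}]}';
my $data = JSON::PP->new->decode($json);

# grep in scalar context returns the number of matching nodes.
my $offline = grep { is_offline($_->{offline}) } @{ $data->{computer} };
print "$offline of ", scalar @{ $data->{computer} }, " nodes offline\n";
```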

no authorization support

Hi

I found your plugin useful in my current project for monitoring Jenkins jobs. Unfortunately, my Jenkins server requires authorization.
I will try to prepare a simple fix and send you a pull request.

br,

Michał
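Jenkins accepts HTTP Basic authentication on its remote API, so one way to add username/password support is to send an `Authorization` header with every request. A minimal sketch using the core `HTTP::Tiny` and `MIME::Base64` modules; the sub names `auth_headers` and `fetch` are hypothetical and are not taken from the plugin or from the pull request mentioned above.

```perl
use strict;
use warnings;
use HTTP::Tiny;
use MIME::Base64 qw(encode_base64);

# Build request headers; an empty hash means anonymous access.
sub auth_headers {
    my ($user, $pass) = @_;
    return {} unless defined $user && length $user;
    # The second argument '' stops encode_base64 adding a newline.
    my $token = encode_base64("$user:" . ($pass // ''), '');
    return { Authorization => "Basic $token" };
}

# Fetch a Jenkins API URL, sending credentials preemptively.
sub fetch {
    my ($url, $user, $pass) = @_;
    my $res = HTTP::Tiny->new(timeout => 10)
        ->get($url, { headers => auth_headers($user, $pass) });
    die "HTTP $res->{status} $res->{reason} for $url\n" unless $res->{success};
    return $res->{content};
}

# Example (needs a reachable server):
# my $body = fetch('https://buildserver.mycompany.com/job/prod/api/json',
#                  'myuser', 'mypassword');
```

Note that `HTTP::Tiny` needs `IO::Socket::SSL` and `Net::SSLeay` installed to fetch https URLs.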

Failure with port != 80 in URL

I changed
my $jobStatusUrlPrefix = $ciMasterUrl . "/job/" . uri_escape($jobName);
to
my $jobStatusUrlPrefix = "http://" . $ciMasterUrl . "/job/" . uri_escape($jobName);

to solve the issue when making the request to a server given in IP:PORT format.
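Always prepending "http://" fixes bare IP:PORT values but breaks URLs that already carry a scheme, such as https://. The rule documented for check_jenkins_cron.pl ("Protocol assumed to be http if none specified") handles both cases. A minimal sketch of that rule; the sub name `normalize_url` is hypothetical.

```perl
use strict;
use warnings;

# Prepend http:// only when the URL has no scheme, and drop trailing
# slashes so "/job/..." can be appended safely.
sub normalize_url {
    my ($url) = @_;
    $url = "http://$url" unless $url =~ m{^[A-Za-z][A-Za-z0-9+.-]*://};
    $url =~ s{/+$}{};
    return $url;
}

print normalize_url('1.1.1.1:8080'), "\n";                       # http://1.1.1.1:8080
print normalize_url('https://buildserver.mycompany.com/'), "\n"; # https://buildserver.mycompany.com
```

The prefix from the issue could then be built as `normalize_url($ciMasterUrl) . "/job/" . uri_escape($jobName)`.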

reconsider job_name "syntax" for icinga2 and foldered Jobs

Hi,

At the moment (after pull request #21) you can configure Icinga 2 with something like

object CheckCommand "jenkins-job-check-anon" {
     import "ipv4-or-ipv6"
     command = [ CustomPluginDir + "/check_jenkins_job_extended.pl" ]
     arguments = {
         "url" = {
             skip_key = true
             value = "$url$"
             order = 1
         }
         "job_name" = {
             skip_key = true
             value = "$job_name$"
             order = 2
         }
         "concurrent_fails_threshold" = {
             skip_key = true
             value = "$concurrent_fails_threshold$"
             order = 3
         }
        <asf>

and

apply Service for (jenkins_job => config in host.vars.jenkins_jobs_anon) {
     import "generic-service"
     check_command = "jenkins-job-check-anon"
     action_url = config.url+"/job/"+config.job_path
     vars += config
}

with

object Host {
    ...
    vars.jenkins_jobs_anon [ "Jenkins: jobname" ] = {
        url="https://jenkins.domain.tld",
        job_name="folder/jobname",
        concurrent_fails_threshold = 1,
        job_path="folder/job/jobname"
    }
}

... in a very nice and generic way (you only have to add one Jenkins job entry per job, directly on the host).

But if the plugin were changed to require the exact path part of the URL leading to the Jenkins job itself (e.g. folder/job/jobname) instead of folder/jobname, one could also build the action_url more simply, right?
e.g.

    vars.jenkins_jobs_anon [ "Jenkins: jobname" ] = {
        url="https://jenkins.domain.tld",
        job_name="folder/job/jobname",
        concurrent_fails_threshold = 1,
    }

in the Host-Object (saving one parameter) and

     action_url = config.url+"/job/"+config.job_name

This should also work for multiple folder levels, shouldn't it?

What's your opinion about that?

cheers
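Converting between the two forms is mechanical: Jenkins nests each folder level under its own `job/` path segment, so `folder/jobname` corresponds to `job/folder/job/jobname` in the URL. A sketch of that mapping for any number of folder levels; the sub name `job_url_path` is hypothetical, and names are assumed not to contain characters that need escaping.

```perl
use strict;
use warnings;

# Map a folder-style job name ("a/b/jobname") to the URL path Jenkins
# uses ("job/a/job/b/job/jobname").
sub job_url_path {
    my ($name) = @_;
    my @parts = grep { length } split m{/}, $name;    # skip empty segments
    return join '/', map { "job/$_" } @parts;
}

print job_url_path('folder/jobname'), "\n";
print job_url_path('team/folder/jobname'), "\n";
```

A plugin that applied this mapping internally could keep accepting `folder/jobname` while still producing the nested URL.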

Slave checks

Really useful plugin so far, thanks for creating it. While working with the code I recently contributed #7 and #8, and now I need to check whether our slaves are online. Would you be OK with me writing a new script and PR for this (check_jenkins_slaves.pl)? If not, no problem, I'll do it separately.

Proposed:
Warning and critical thresholds: either the percentage of all slaves that are offline, or an integer for an absolute count.

Example:

./check_jenkins_slaves.pl -w 50% -c 90% -l etc.
OR
./check_jenkins_slaves.pl -w 2 -c 1 -l etc.

Folders plugin causes problem due to URI Encoding

If I call the plugin with a job name in the API format used by the Folders plugin, I get a 404 error.

This is because URL encoding turns the / into %2F.

I'll send a pull request for the fix, but since I'm unsure whether URL encoding is needed at all, I made this change:

$jobnameU = uri_escape($opts{j});
$jobnameU =~ s/%2F/\//g;
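The same effect can be had in one step by leaving "/" out of the set of characters to encode. `URI::Escape`'s `uri_escape` accepts such a character class as its optional second argument; the core-Perl sketch below does the equivalent without that module. The sub name `escape_job_path` is hypothetical, the unreserved set follows RFC 3986, and names are assumed to be byte strings (already UTF-8 encoded).

```perl
use strict;
use warnings;

# Percent-encode every byte outside the RFC 3986 unreserved set, but
# leave "/" alone so folder paths keep their separators.
sub escape_job_path {
    my ($path) = @_;
    $path =~ s{([^A-Za-z0-9\-._~/])}{sprintf '%%%02X', ord $1}ge;
    return $path;
}

print escape_job_path('folder/job/my job'), "\n";    # folder/job/my%20job
```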

Thank you for the handy plugin,

Steve Radich
BitShop.com

Returns OK if build fails for more than 24 hours

I found a small issue here, line 189:

my $tdiff = $dt - $bts;
my $tmin = ($tdiff->hours * 60) + $tdiff->minutes;

Here, the calculation of $tmin has a gap: a DateTime::Duration also holds days, months and years, so if your build has been failing for 1 day + 1 minute, it will say "OK, 1 minute(s).", when it should say 1441.

The best way I could find to calculate the date difference that includes all units and wraps around is:

my $tmin = int($dt->subtract_datetime_absolute( $bts )->delta_seconds / 60);

I just noticed my builds had been failing for a month (semi-forgotten project), so it's time to monitor that - thanks for the plugin. After that change, the plugin now gets the number of minutes between now and the first failed build correct and says critical.
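Where DateTime is not available, the same whole-minute difference can be computed from epoch seconds, since Jenkins build timestamps are milliseconds since the Unix epoch. A hedged sketch; the sub name `minutes_since` is hypothetical, and the example reproduces the 1 day + 1 minute case from the report.

```perl
use strict;
use warnings;

# Whole minutes between a build timestamp (milliseconds since the epoch)
# and $now (seconds since the epoch). Unlike summing only the hours and
# minutes fields of a duration, days and longer spans are counted.
sub minutes_since {
    my ($ts_ms, $now) = @_;
    return int(($now - $ts_ms / 1000) / 60);
}

my $now   = 1_700_000_000;
my $build = ($now - (24 * 60 + 1) * 60) * 1000;    # 1 day + 1 minute ago
print minutes_since($build, $now), "\n";            # 1441
```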
