check_ibm_storwize's Introduction

IBM Storwize and FlashSystem Monitoring plugin

Checks the overall health of IBM FlashSystem family devices (formerly Storwize), including storage, network and cluster status.

https://en.wikipedia.org/wiki/IBM_FlashSystem

https://en.wikipedia.org/wiki/IBM_Storwize

Updated version

Main additions and changes:

  • script modified for standard Nagios
  • supports V7000 (Gen3) and iSCSI
  • tested plugin with Spectrum Virtualize upgrade v8.3.1 code (and previous versions)
  • template cfg files adapted to work as well as possible with Nagios Core
  • configurable critical/warning thresholds, with defaults

The original script was written for "Shinken", a Nagios rewrite. This version is a fork; for more information see docs/Sources.md.

Installation

Requirements

  • Nagios, Icinga or other compatible monitoring system
  • Perl 5, if not already included with OS (apt or yum install perl)
  • open CIM port (TCP/5989) and a WBEM/CIM client (apt or yum install sblim-wbemcli)
  • a "monitor" user account to log in to the storage device (create it in the GUI or with mkuser in the CLI)

Download

If you don't want to use git, download a tarball or zip archive. If you only want the script, get libexec/check_ibm_storwize.pl.

Script

The main updated Perl script is located in the "libexec" directory. Copy it to the host where you want to run it, which must have access to the IBM storage device, for example your Nagios server.

Example

check_ibm_storwize.pl -H ibm01.example.com -P 5988 -u nagios -p <PASSWORD> -C StorageVolume
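For scripted use outside Nagios, the example above can be wrapped in a few lines of Python. This is only a minimal sketch and not part of the plugin: the helper names build_command and run_check are hypothetical, the script path is an assumption, and the exit codes are assumed to follow the standard Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
import subprocess

# Standard Nagios plugin exit codes (assumed to apply to this plugin).
NAGIOS_STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def build_command(host, user, password, check, port=None, warn=None, crit=None,
                  script="libexec/check_ibm_storwize.pl"):
    """Build the argument list using only flags shown in the plugin's help.

    The default script path is an assumption; adjust it to your install.
    """
    cmd = [script, "-H", host, "-u", user, "-p", password, "-C", check]
    if port is not None:
        cmd += ["-P", str(port)]
    if warn is not None:
        cmd += ["-w", str(warn)]
    if crit is not None:
        cmd += ["-c", str(crit)]
    return cmd


def run_check(cmd, timeout=60):
    """Run the plugin and return (state name, full standard output)."""
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    # Any exit code outside the Nagios convention is treated as UNKNOWN.
    state = NAGIOS_STATES.get(result.returncode, "UNKNOWN")
    return state, result.stdout.strip()
```

Calling build_command("ibm01.example.com", "nagios", "<PASSWORD>", "StorageVolume", port=5988) yields the same arguments as the example above, and the resulting list can be passed to run_check.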

Usage

check_ibm_storwize.pl -h


IBM Storwize & FlashSystem health status plugin for Nagios (v20221223-mk)
Needs 'wbemcli' to query the Storwize Arrays CIMOM server

Usage: check_ibm_storwize.pl [-h] -H host [-P port] -u user -p password -C check [-c crit] [-w warn]

Flags:

    -C check    Check to run. Currently available checks:

                Array, ArrayBasedOnDiskDrive*, BackendVolume, Cluster, ConcreteStoragePool**,
                DiskDrive, Enclosure, EthernetPort, FCPort, IOGroup*, IsSpare, MasterConsole,
                MirrorExtent, Node, QuorumDisk, StorageVolume**
                BackendController, BackendTargetSCSIProtocolEndpoint, FCPortStatistics
                IPProtocolEndpoint, iSCSIProtocolEndpoint*, ProtocolController*, RemoteCluster,
                HostCluster

    -h          Print this help message
    -H host     Hostname of IP of the SVC cluster
    -P port     CIMOM port on the SVC cluster
    -p          Password for CIMOM access on the SVC cluster
    -u          User with CIMOM access on the SVC cluster
    -c crit     Critical threshold as <n> NOK items or as % (only for checks with '*' or '**')
    -w warn     Warning threshold as <n> NOK items or as % (only for checks with '*' or '**')
    -s skip     Skip element(s) using regular expression
    -b bytes    Do not convert bytes to MiB GiB TiB

Defaults

  • CIMOM port 5989 (TLS)
  • Convert bytes to MiB GiB TiB is enabled

Check thresholds:

  • ArrayBasedOnDiskDrive - Spares: 0 ("no Spare", omit)
  • ConcreteStoragePool - PhysicalCapacity: WARN at 80% usage, CRITICAL at 90%
  • IOGroup - FreeMemory: 0 Bytes (omit)
  • iSCSIProtocolEndpoint: WARN at 1 port down, 2 or more is CRITICAL
  • ProtocolController: WARN at 3 hosts down, 4 or more is CRITICAL
  • StorageVolume - Capacity: WARN at 85% usage, CRITICAL at 95%

The threshold values listed above can be changed with -c and -w. If the critical threshold is set to 100% (-c 100), the plugin will only warn.

These checks will WARN if more than half of total items are down: BackendVolume, EthernetPort and FCPort.
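The threshold rules above can be summarised in a short sketch. It is an illustration only, not the plugin's Perl code: the function names are hypothetical, and the source does not state whether the plugin compares with > or >= at the exact boundary. The sketch assumes strictly greater than, because that is consistent with -c 100 producing warnings only. The second function models only the "more than half down" rule stated above and no other conditions those checks may apply.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def usage_state(used_pct, warn=80, crit=90):
    """Map a usage percentage to a Nagios state.

    Defaults are the documented ConcreteStoragePool thresholds.
    Assumption: a threshold is breached only when usage exceeds it.
    """
    if used_pct > crit:
        return CRITICAL
    if used_pct > warn:
        return WARNING
    return OK


def majority_down_state(down, total):
    """WARN when more than half of all items are down.

    Documented for BackendVolume, EthernetPort and FCPort.
    """
    return WARNING if total > 0 and down * 2 > total else OK
```

Under these assumptions, usage_state(100, crit=100) returns WARNING rather than CRITICAL, which is the "warn only" behaviour described above.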

Nagios templates

All Nagios config file examples are now in the "etc" directory. Use them as you see fit.

  • etc/objects/commands.cfg (check_commands)
  • etc/objects/discovery.cfg (mgmt https check)
  • etc/objects/template.cfg (host, user, password)
  • etc/objects/timeperiods.cfg (schedule, retries)
  • etc/objects/services/*.cfg

More information

See: docs/README.md

check_ibm_storwize's People

Contributors

mkorthof

check_ibm_storwize's Issues

faulty node canister

Hello Marius,

It's me-oct again.
Unfortunately (or fortunately for testing the script), we now have a faulty node canister. Node #2 is currently offline.
I have tested the latest script, "v20221223-mk", on the faulty system and found that the FCPort check does not report an error.

FC PORT OK - NOK:0/OK:4/Stopped:4/Total:8 - 50050768030EB346(unconf_inactive) 500507680312B346(unconf_inactive) 50050768030EB347(unconf_inactive) 500507680312B347(unconf_inactive)|nok=0;;;; ok=4;;;; Stopped=4;;;; total=8;;;;

I have also tried running the query directly:

wbemcli -noverify -nl ei https://XXX:IBMTSSVC_FCPort

The output includes two occurrences of '-StatusDescriptions="Port configured inactive"'.
I wonder why the script doesn't treat them as errors.

The 'Enclosure' check seems to work as expected.

ENCLOSURE CRITICAL - NOK:1/OK:0/Total:1 - Enc_1,SN:7887117(degraded,Canister:2/2,PSU:2/2)|nok=1;;;; ok=0;;;; total=1;;;;

Your comment would be much appreciated.

Thank you,

Judgement condition of ConcreteStoragePool of version v20230128-mk

Hi Marius,
Thank you for continuing to update the script.
I have been testing the behavior of the latest version, v20230128-mk, and have found a problem with the 'ConcreteStoragePool' check.
Here is the output of the check.

INFO: missing argument "-c crit", using default value '80'
INFO: missing argument "-w warn", using default value '90'
STORAGE POOL CRITICAL - NOK:0/OK:1/Total:1 |Pool-00=100%;;;; used=12TiB;;;; total=12TiB;;;; mdisks=3;;;; vols=2;;;;

I think there is an incorrect condition around line 880.
Could you please investigate this problem?

Many thanks.

checking Fibre-Channel port

Thank you for providing such a useful script.
I don't know if the script is still maintained, but we run it with Nagios against a Storwize V3700 to monitor the health of the unit.

Recently we had a problem with a fibre-channel port on the V3700, but the script did not detect the error.
The problem was a faulty SFP converter, which disabled one of the fibre-channel ports. The web GUI reported that the fibre-channel port was unable to operate. The 'lsportfc' command showed that the port status was 'inactive_configured'.
The script checks only the 'OperationalStatus' for Fibre-Channel and considers the status OK when the value is 2 (OK) or 10 (Stopped). I guess the OperationalStatus was 10, but the 'StatusDescriptions' attribute was something like 'Port configured inactive' at the time. Unfortunately, we did not have a chance to check whether this was the case.

Any advice would be very helpful.
Many thanks.
