lausser / check_hpasm

A plugin (monitoring-plugin, not nagios-plugin, see also http://is.gd/PP1330) which checks the hardware health of HP Proliant Servers. (May also be used for other devices which implement the CPQHLTH mib)

Home Page: http://labs.consol.de/nagios/check_hpasm/

License: GNU General Public License v2.0

Shell 4.30% Perl 94.12% Awk 0.30% Makefile 0.79% M4 0.49%

check_hpasm's Issues

Problem with Power Supplies Temperature -99 C

Hi,

I have a problem with check_hpasm on a ProLiant Gen9.

It reports a critical error for the power supply temperature sensors.

Error message:
CRITICAL - 11 powerSupply temperature too high (36C, -99 max), 12 powerSupply temperature too high (42C, -99 max), 19 powerSupply temperature too high (40C, -99 max), 20 powerSupply temperature too high (44C, -99 max), System: 'proliant dl380 gen9',

Regards,
Romuald

When cache battery is failed with cable error, it is reported as notPresent and returns status OK

When a cache battery has failed with "Cache Status Details: Cable Error", the Battery status line disappears from the ssacli output. check_hpasm then assumes the battery is notPresent and returns OK when it should return CRITICAL or WARNING.

Failing battery but not failed:
check_hpasm:
WARNING - controller accelerator battery recharging

ssacli output:
Smart Array P441 in Slot 1
Bus Interface: PCI
Slot: 1
Serial Number: XXXXXX
Cache Serial Number: XXXXXX
RAID 6 (ADG) Status: Enabled
Controller Status: OK
Hardware Revision: B
Firmware Version: 5.04-0
Rebuild Priority: High
Expand Priority: Medium
Surface Scan Delay: 3 secs
Surface Scan Mode: Idle
Parallel Surface Scan Supported: Yes
Current Parallel Surface Scan Count: 1
Max Parallel Surface Scan Count: 16
Queue Depth: Automatic
Monitor and Performance Delay: 60 min
Elevator Sort: Enabled
Degraded Performance Optimization: Disabled
Inconsistency Repair Policy: Disabled
Wait for Cache Room: Disabled
Surface Analysis Inconsistency Notification: Disabled
Post Prompt Timeout: 15 secs
Cache Board Present: True
Cache Status: Temporarily Disabled
Cache Status Details: Cache disabled; battery/capacitor charge is low.
Cache Ratio: 10% Read / 90% Write
Drive Write Cache: Disabled
Total Cache Size: 4.0 GB
Total Cache Memory Available: 3.2 GB
No-Battery Write Cache: Disabled
SSD Caching RAID5 WriteBack Enabled: True
SSD Caching Version: 2
Cache Backup Power Source: Batteries
Battery/Capacitor Count: 1
Battery/Capacitor Status: Recharging
SATA NCQ Supported: True
Spare Activation Mode: Activate on physical drive failure (default)
Controller Temperature (C): 74
Cache Module Temperature (C): 43
Number of Ports: 2 External only
Encryption: Disabled
Express Local Encryption: False
Driver Name: hpsa
Driver Version: 3.4.18
Driver Supports SSD Smart Path: True
PCI Address (Domain:Bus:Device.Function): 0000:05:00.0
Negotiated PCIe Data Rate: PCIe 3.0 x8 (7880 MB/s)
Controller Mode: RAID
Pending Controller Mode: RAID
Latency Scheduler Setting: Disabled
Current Power Mode: MaxPerformance
Survival Mode: Enabled
Host Serial Number: XXXXX
Sanitize Erase Supported: True
Primary Boot Volume: None
Secondary Boot Volume: None

Failed battery:
check_hpasm:
OK

ssacli output:
Smart Array P441 in Slot 1
Bus Interface: PCI
Slot: 1
Serial Number: XXXXXX
Cache Serial Number: XXXXXX
RAID 6 (ADG) Status: Enabled
Controller Status: OK
Hardware Revision: B
Firmware Version: 5.04-0
Rebuild Priority: High
Expand Priority: Medium
Surface Scan Delay: 3 secs
Surface Scan Mode: Idle
Parallel Surface Scan Supported: Yes
Current Parallel Surface Scan Count: 1
Max Parallel Surface Scan Count: 16
Queue Depth: Automatic
Monitor and Performance Delay: 60 min
Elevator Sort: Enabled
Degraded Performance Optimization: Disabled
Inconsistency Repair Policy: Disabled
Wait for Cache Room: Disabled
Surface Analysis Inconsistency Notification: Disabled
Post Prompt Timeout: 15 secs
Cache Board Present: True
Cache Status: Permanently Disabled
Cache Status Details: Cable Error
Cache Ratio: 10% Read / 90% Write
Drive Write Cache: Disabled
Total Cache Size: 4.0 GB
Total Cache Memory Available: 3.2 GB
No-Battery Write Cache: Disabled
SSD Caching RAID5 WriteBack Enabled: True
SSD Caching Version: 2
SATA NCQ Supported: True
Spare Activation Mode: Activate on physical drive failure (default)
Controller Temperature (C): 70
Cache Module Temperature (C): 41
Number of Ports: 2 External only
Encryption: Disabled
Express Local Encryption: False
Driver Name: hpsa
Driver Version: 3.4.18
Driver Supports SSD Smart Path: True
PCI Address (Domain:Bus:Device.Function): 0000:05:00.0
Negotiated PCIe Data Rate: PCIe 3.0 x8 (7880 MB/s)
Controller Mode: RAID
Pending Controller Mode: RAID
Latency Scheduler Setting: Disabled
Current Power Mode: MaxPerformance
Survival Mode: Enabled
Host Serial Number: XXXXX
Sanitize Erase Supported: True
Primary Boot Volume: None
Secondary Boot Volume: None
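
The two ssacli dumps above differ mainly in the missing Battery/Capacitor lines and in the Cache Status fields. A minimal sketch of a check that does not treat a missing battery line as "not present" while the cache is disabled could look like this. It is written in Python for illustration only and is not the plugin's Perl code; the function names, the returned messages, and the assumption that a healthy battery is reported as "OK" are all hypothetical.

```python
def parse_controller_detail(text):
    """Turn 'Key: Value' lines of a controller detail dump into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(": ")
        if sep:  # lines without ': ' (e.g. the controller title) are skipped
            fields[key] = value
    return fields


def battery_state(fields):
    """Return (state, message) for the cache battery of one controller.

    Hypothetical policy: a missing battery line only counts as 'not present'
    when the cache itself is not reported as disabled.
    """
    status = fields.get("Battery/Capacitor Status")
    if status is not None:
        if status == "OK":  # assumed value for a healthy battery
            return "OK", "battery is ok"
        return "WARNING", f"battery is {status.lower()}"
    cache_status = fields.get("Cache Status", "")
    details = fields.get("Cache Status Details", "")
    if fields.get("Cache Board Present") == "True" and "Disabled" in cache_status:
        return "CRITICAL", f"cache is {cache_status.lower()} ({details})"
    return "OK", "battery is not present"
```

With the first dump this yields a WARNING for the recharging battery, and with the second dump it yields CRITICAL because the cache is permanently disabled even though no battery line is present.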

check_hpasm has issues with memory when using SNMPv3

When switching check_hpasm from SNMPv2 to SNMPv3, it stops working properly: it emits two "uninitialized value" warnings.
The SNMP target is a Windows 2008 host. check_hpasm itself runs on RHEL 6. I'm using v4.7.1.1.
snmpwalk works fine with either version of the protocol, and I don't see any obvious differences between the results.

$ /usr/lib64/nagios/plugins/check_hpasm --protocol=3  -H server -t 60 --username=user --authpassword=password --privpassword=password --authprotocol=sha --privprotocol=aes 
Use of uninitialized value in numeric lt (<) at /usr/lib64/nagios/plugins/check_hpasm line 2894.
Use of uninitialized value $hpboard in numeric ne (!=) at /usr/lib64/nagios/plugins/check_hpasm line 2900.
WARNING - status of all 0 dimms is n/a (please upgrade firmware), System: 'proliant dl380p gen8', S/N: 'CZxxxxxWN', ROM: 'P70 07/01/2015'

Proliant Gen11 problem with cpu and drive status

Hi, not sure if this is still actively maintained but anyway :)

A Gen11 ProLiant reports the following:

CRITICAL - cpu 1 needs attention (failed), physical drive 1:1 is other, physical drive 1:2 is other, System: 'proliant dl380 gen11', S/N: 'CZ23250HWX', ROM: 'U54 v1.30'

This is a VMware ESXi host, and neither iLO nor VMware itself reports any issues.

We have two other identical Gen11 servers. All of them report the drives as "other", but only one thinks the CPU has failed.
These are Intel(R) Xeon(R) Gold 6458Q CPUs, where cores can be disabled to increase the CPU base frequency:
32 cores => 3 GHz
24 cores => 3.5 GHz
16 cores => 4 GHz

This one is configured with 24 cores, so 8 are disabled.
The one with 16 cores does not have a problem with the CPU according to check_hpasm.

I have not yet checked the SNMP output, but I can provide it if needed.

HP models with hardware RAID but without ILO

Hi,

We have some old HP models (ProLiant DL 185 G5) that have a hardware RAID card but no real iLO interface (LO100 is IPMI only).

hpacucli works fine on these servers.

It would be great if check_hpasm had an option to enable only hpasmcli (or the SNMP information that corresponds to the Drive Array MIB).

Thanks

WARNING - status of all 0 dimms is n/a (please upgrade firmware),

I have one DL380 G7 that won't work with check_hpasm.
The only difference from the others is that I updated it to a new kernel from the repo.

./check_hpasm --hostname foo.bar --community private --timeout=10 
Use of uninitialized value in lc at ./check_hpasm line 2379.
Use of uninitialized value in lc at ./check_hpasm line 2380.
Use of uninitialized value in lc at ./check_hpasm line 1662.
Use of uninitialized value in lc at ./check_hpasm line 4057.
Use of uninitialized value in lc at ./check_hpasm line 4058.
Use of uninitialized value in lc at ./check_hpasm line 803.
Use of uninitialized value in lc at ./check_hpasm line 804.
WARNING - status of all 0 dimms is n/a (please upgrade firmware), System: 'proliant dl380 g7', S/N: 'CZxxxxxxx', ROM: 'P67 08/16/2015'

It has the same config and firmware as the ones that work.
Do you know any way of debugging this problem?

~/src/check_hpasm-4.7.3.1/plugins-scripts$ ./check_hpasm --version
check_hpasm 4.7.3.1 [http://labs.consol.de/nagios/check_hpasm]

Regards Falk

Blacklisting of devicenames with letters does not blacklist

We have a ProLiant server where physical drives are named like 1E:2:4.
If we try to blacklist this device with --blacklist=dapd:1e:2:4, it doesn't work because is_blacklisted() uses a regexp which doesn't allow letters (HP/Server.pm line 296).
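
To illustrate the kind of pattern that would accept such names, here is a small sketch in Python. It is not the code from HP/Server.pm; the function name is reused only for readability, and the entry format (a type prefix followed by colon-separated parts) is assumed from the example above.

```python
import re

# Each colon-separated part of the name may contain letters as well as digits.
ENTRY_RE = re.compile(
    r"^(?P<type>[a-z]+):(?P<name>[0-9a-z]+(?::[0-9a-z]+)*)$",
    re.IGNORECASE,
)


def is_blacklisted(entries, comp_type, comp_name):
    """Return True if (comp_type, comp_name) matches one of the entries.

    entries is a list of strings such as 'dapd:1e:2:4'; the comparison is
    case-insensitive, so '1e:2:4' matches a drive named '1E:2:4'.
    """
    for entry in entries:
        match = ENTRY_RE.match(entry.strip())
        if not match:
            continue  # malformed entries are ignored in this sketch
        if (match.group("type").lower() == comp_type.lower()
                and match.group("name").lower() == comp_name.lower()):
            return True
    return False
```

For example, `is_blacklisted(["dapd:1e:2:4"], "dapd", "1E:2:4")` matches, while the same entry does not match a drive named 1E:2:5.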

Spare disks cause critical warning

Disks that are marked as spare in an array cause a critical return code.

This happens because their cpqDaPhyDrvStatus (OID 1.3.6.1.4.1.232.3.2.5.1.1.6) return value is 1 (other).

Since the check is ne 'ok', that causes a critical result, which is a false positive.
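
One possible way to avoid the false positive is to accept the status "other" when the drive is known to be a spare. The sketch below is in Python and purely illustrative; the `is_spare` flag is a hypothetical input, and how the plugin would determine it is not covered in this report.

```python
def physical_drive_state(status, is_spare=False):
    """Map a cpqDaPhyDrvStatus name to a check state.

    Hypothetical policy: 'other' is tolerated only for spare drives;
    every other non-'ok' status stays CRITICAL, as in the current check.
    """
    if status == "ok":
        return "OK"
    if status == "other" and is_spare:
        return "OK"
    return "CRITICAL"
```

A non-spare drive reporting "other" would still be flagged, so the behaviour only changes for spares.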

Support for HP 3PAR Storage

Hi,
Can I use it for 3PAR storage monitoring?
I am getting "unknown" in the "whoami" output.
I would be very thankful if it could be used for 3PAR storage.
I am able to run snmpwalk on the 3PAR successfully.

not drawing in pnp graph

version 4.8.0.2
compiled like:
./configure --prefix=/usr/.... --with-nagios-user=nagios --with-nagios-group=nagios --with-noinst-level=ok --with-degrees=celsius --with-perfdata --with-hpacucli

command definition includes: --perfdata=short

Error displayed: [image]

Blacklisting DIMMs doesn't override warning "status of all x dimms is n/a (please upgrade firmware)"

Currently, blacklisting one or more DIMMs doesn't let you override the warning telling you to upgrade the firmware. The status check in MemorySubsystem.pm doesn't check whether a module is blacklisted when counting ok modules.
To ignore this warning you have to add --ignore-dimms, but this requires you to do one of the following:

  1. add it to all corresponding nagios checks,
  2. split the nagios checks so that one has this option and one does not, and assign them properly,
  3. do other server-specific things like adding an alias etc.

Is this intended or a bug?
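
If the behaviour is not intended, the counting could skip blacklisted modules before deciding whether to warn. The following Python sketch only illustrates that idea; it is not the logic in MemorySubsystem.pm, and the data layout (name and status pairs, with None standing for "n/a") and the messages other than the quoted warning are assumptions.

```python
def memory_state(dimms, blacklist=()):
    """Return (state, message) for a list of (name, status) pairs.

    status is None when the agent reports no status (n/a).
    Blacklisted modules are removed before anything is counted.
    """
    active = [(name, status) for name, status in dimms if name not in blacklist]
    if dimms and not active:
        return "OK", "all dimms are blacklisted"
    known = [(name, status) for name, status in active if status is not None]
    if not known:
        return ("WARNING",
                f"status of all {len(active)} dimms is n/a (please upgrade firmware)")
    bad = [name for name, status in known if status not in ("ok", "notPresent")]
    if bad:
        return "CRITICAL", "dimm " + ", ".join(bad) + " needs attention"
    return "OK", f"{len(known)} dimms checked"
```

With this policy, blacklisting every module that reports n/a suppresses the firmware warning without needing --ignore-dimms, while the warning still appears when nothing is blacklisted.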

Gen8 blade shows critical - powerSupply temperature too high (30C, -99 max)

A new Gen8 server shows some odd messages about the power supply temperatures; the "-99 max" in particular looks wrong.

check_hpasm -H HOSTNAME
CRITICAL - 15 powerSupply temperature too high (27C, -99 max), 16 powerSupply temperature too high (30C, -99 max), System: 'proliant dl360e gen8', S/N: 'CZJ24000VL', ROM: 'P73 08/20/2012' | pc_1=35;460;460 pc_2=35;460;460

I tried with --customthresholds 0:100/0:100 and got the same result.

I looked at the management interface: it reports the same temperatures but with an OK status, and these two power supplies have their Caution/Critical thresholds marked as "N/A".
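
Since the management interface shows the thresholds as "N/A", the value -99 appears to be a placeholder rather than a real limit. A minimal Python sketch of a comparison that ignores such placeholders is shown below. It is an assumption about a possible fix, not the plugin's actual code, and both the "threshold <= 0 means no threshold" rule and the "no threshold" message are hypothetical.

```python
def temperature_state(celsius, threshold):
    """Return (state, message) for one temperature sensor.

    Assumption: a missing, zero or negative threshold (such as -99)
    means the sensor has no usable limit and must not raise an alarm.
    """
    if threshold is None or threshold <= 0:
        return "OK", f"temperature is {celsius}C (no threshold)"
    if celsius > threshold:
        return "CRITICAL", f"temperature too high ({celsius}C, {threshold} max)"
    return "OK", f"temperature is {celsius}C ({threshold} max)"
```

Under this rule a reading of 30C against a threshold of -99 is reported as OK, while a real threshold such as 90 is still enforced.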

The following is the output when running with -vvv:

snmp agent answered
whoami: ProLiant DL360e Gen8
using HP::Proliant::SNMP
Protocol is 2c
000 seconds for walk cpqSasComponent (4 oids)
000 seconds for walk cpqHeMComponent (265 oids)
000 seconds for walk cpqHeAsr (21 oids)
000 seconds for walk cpqSiComponent (59 oids)
000 seconds for walk system (7 oids)
000 seconds for walk cpqSeRom (3 oids)
000 seconds for walk cpqHeEventLog (11 oids)
000 seconds for walk cpqHeThermal (288 oids)
000 seconds for walk cpqHePWSComponent (34 oids)
000 seconds for walk cpqSeProcessor (44 oids)
000 seconds for walk cpqIdeComponent (4 oids)
000 seconds for walk cpqNic (275 oids)
000 seconds for walk cpqDaComponent (330 oids)
000 seconds for walk cpqFcaComponent (1 oids)
000 seconds for get various (8 oids)
overall si condition is undefined
overall he condition is undefined
SI: 00 HE: 00 H2: 12
H200-> 01 02 03 04 05 06
H201-> 01 02 03 04 05 06
TYP4 proliant dl360e gen8
ALL: 12
HP::Proliant::Component::DiskSubsystem::Da::SNMP controllers und platten zusammenfuehren
has 1 controllers
has 1 accelerators
has 2 physical_drives
has 1 logical_drives
has 0 spare_drives
HP::Proliant::Component::DiskSubsystem::Sas::SNMP controllers und platten zusammenfuehren
has 0 controllers
has 0 physical_drives
has 0 logical_drives
has 0 spare_drives
HP::Proliant::Component::DiskSubsystem::Scsi::SNMP controllers und platten zusammenfuehren
has 0 controllers
has 0 physical_drives
has 0 logical_drives
has 0 spare_drives
HP::Proliant::Component::DiskSubsystem::Ide::SNMP controllers und platten zusammenfuehren
has 0 controllers
has 0 physical_drives
has 0 logical_drives
has 0 spare_drives
HP::Proliant::Component::DiskSubsystem::Fca::SNMP controllers und platten zusammenfuehren
has 0 host controllers
has 0 controllers
has 0 physical_drives
has 0 logical_drives
has 0 spare_drives
[CPU_0]
cpqSeCpuSlot: 0
cpqSeCpuUnitIndex: 0
cpqSeCpuName: Intel Xeon
cpqSeCpuStatus: ok
info: cpu 0 is ok

[PS_1]
cpqHeFltTolPowerSupplyBay: 1
cpqHeFltTolPowerSupplyChassis: 0
cpqHeFltTolPowerSupplyPresent: present
cpqHeFltTolPowerSupplyCondition: ok
cpqHeFltTolPowerSupplyRedundant: redundant
cpqHeFltTolPowerSupplyCapacityUsed: 35
cpqHeFltTolPowerSupplyCapacityMaximum: 460
info: powersupply 1 is ok

[PS_2]
cpqHeFltTolPowerSupplyBay: 2
cpqHeFltTolPowerSupplyChassis: 0
cpqHeFltTolPowerSupplyPresent: present
cpqHeFltTolPowerSupplyCondition: ok
cpqHeFltTolPowerSupplyRedundant: redundant
cpqHeFltTolPowerSupplyCapacityUsed: 35
cpqHeFltTolPowerSupplyCapacityMaximum: 460
info: powersupply 2 is ok

[FAN_1]
cpqHeFltTolFanChassis: 0
cpqHeFltTolFanIndex: 1
cpqHeFltTolFanLocale: system
cpqHeFltTolFanPresent: absent
cpqHeFltTolFanType: spinDetect
cpqHeFltTolFanSpeed: other
cpqHeFltTolFanRedundant: notRedundant
cpqHeFltTolFanRedundantPartner: 0
cpqHeFltTolFanCondition: other
cpqHeFltTolFanHotPlug: hotPluggable
info: fan 1 (system) needs attention (is absent)

[FAN_2]
cpqHeFltTolFanChassis: 0
cpqHeFltTolFanIndex: 2
cpqHeFltTolFanLocale: system
cpqHeFltTolFanPresent: absent
cpqHeFltTolFanType: spinDetect
cpqHeFltTolFanSpeed: other
cpqHeFltTolFanRedundant: notRedundant
cpqHeFltTolFanRedundantPartner: 0
cpqHeFltTolFanCondition: other
cpqHeFltTolFanHotPlug: hotPluggable
info: fan 2 (system) needs attention (is absent)

[FAN_3]
cpqHeFltTolFanChassis: 0
cpqHeFltTolFanIndex: 3
cpqHeFltTolFanLocale: system
cpqHeFltTolFanPresent: absent
cpqHeFltTolFanType: spinDetect
cpqHeFltTolFanSpeed: other
cpqHeFltTolFanRedundant: notRedundant
cpqHeFltTolFanRedundantPartner: 0
cpqHeFltTolFanCondition: other
cpqHeFltTolFanHotPlug: hotPluggable
info: fan 3 (system) needs attention (is absent)

[FAN_4]
cpqHeFltTolFanChassis: 0
cpqHeFltTolFanIndex: 4
cpqHeFltTolFanLocale: system
cpqHeFltTolFanPresent: absent
cpqHeFltTolFanType: spinDetect
cpqHeFltTolFanSpeed: other
cpqHeFltTolFanRedundant: notRedundant
cpqHeFltTolFanRedundantPartner: 0
cpqHeFltTolFanCondition: other
cpqHeFltTolFanHotPlug: hotPluggable
info: fan 4 (system) needs attention (is absent)

[FAN_5]
cpqHeFltTolFanChassis: 0
cpqHeFltTolFanIndex: 5
cpqHeFltTolFanLocale: system
cpqHeFltTolFanPresent: present
cpqHeFltTolFanType: spinDetect
cpqHeFltTolFanSpeed: normal
cpqHeFltTolFanRedundant: notRedundant
cpqHeFltTolFanRedundantPartner: 0
cpqHeFltTolFanCondition: ok
cpqHeFltTolFanHotPlug: hotPluggable
info: fan 5 is present, speed is normal, pctmax is 50%, location is system, redundance is notRedundant, partner is 0

[FAN_6]
cpqHeFltTolFanChassis: 0
cpqHeFltTolFanIndex: 6
cpqHeFltTolFanLocale: system
cpqHeFltTolFanPresent: present
cpqHeFltTolFanType: spinDetect
cpqHeFltTolFanSpeed: normal
cpqHeFltTolFanRedundant: notRedundant
cpqHeFltTolFanRedundantPartner: 0
cpqHeFltTolFanCondition: ok
cpqHeFltTolFanHotPlug: hotPluggable
info: fan 6 is present, speed is normal, pctmax is 50%, location is system, redundance is notRedundant, partner is 0

[FAN_7]
cpqHeFltTolFanChassis: 0
cpqHeFltTolFanIndex: 7
cpqHeFltTolFanLocale: system
cpqHeFltTolFanPresent: present
cpqHeFltTolFanType: spinDetect
cpqHeFltTolFanSpeed: normal
cpqHeFltTolFanRedundant: notRedundant
cpqHeFltTolFanRedundantPartner: 0
cpqHeFltTolFanCondition: ok
cpqHeFltTolFanHotPlug: hotPluggable
info: fan 7 is present, speed is normal, pctmax is 50%, location is system, redundance is notRedundant, partner is 0

[FAN_8]
cpqHeFltTolFanChassis: 0
cpqHeFltTolFanIndex: 8
cpqHeFltTolFanLocale: system
cpqHeFltTolFanPresent: present
cpqHeFltTolFanType: spinDetect
cpqHeFltTolFanSpeed: normal
cpqHeFltTolFanRedundant: notRedundant
cpqHeFltTolFanRedundantPartner: 0
cpqHeFltTolFanCondition: ok
cpqHeFltTolFanHotPlug: hotPluggable
info: fan 8 is present, speed is normal, pctmax is 50%, location is system, redundance is notRedundant, partner is 0

[TEMP_1]
cpqHeTemperatureChassis: 0
cpqHeTemperatureIndex: 1
cpqHeTemperatureLocale: ambient
cpqHeTemperatureCelsius: 20
cpqHeTemperatureThreshold: 42
cpqHeTemperatureThresholdType: caution
cpqHeTemperatureCondition: ok
info: 1 ambient temperature is 20C (42 max)

[TEMP_10]
cpqHeTemperatureChassis: 0
cpqHeTemperatureIndex: 10
cpqHeTemperatureLocale: memory
cpqHeTemperatureCelsius: 25
cpqHeTemperatureThreshold: 80
cpqHeTemperatureThresholdType: caution
cpqHeTemperatureCondition: ok
info: 10 memory temperature is 25C (80 max)

[TEMP_11]
cpqHeTemperatureChassis: 0
cpqHeTemperatureIndex: 11
cpqHeTemperatureLocale: memory
cpqHeTemperatureCelsius: 25
cpqHeTemperatureThreshold: 80
cpqHeTemperatureThresholdType: caution
cpqHeTemperatureCondition: ok
info: 11 memory temperature is 25C (80 max)

[TEMP_12]
cpqHeTemperatureChassis: 0
cpqHeTemperatureIndex: 12
cpqHeTemperatureLocale: system
cpqHeTemperatureCelsius: 35
cpqHeTemperatureThreshold: 60
cpqHeTemperatureThresholdType: caution
cpqHeTemperatureCondition: ok
info: 12 system temperature is 35C (60 max)

[TEMP_13]
cpqHeTemperatureChassis: 0
cpqHeTemperatureIndex: 13
cpqHeTemperatureLocale: system
cpqHeTemperatureCelsius: 44
cpqHeTemperatureThreshold: 105
cpqHeTemperatureThresholdType: caution
cpqHeTemperatureCondition: ok
info: 13 system temperature is 44C (105 max)

[TEMP_14]
cpqHeTemperatureChassis: 0
cpqHeTemperatureIndex: 14
cpqHeTemperatureLocale: system
cpqHeTemperatureCelsius: 33
cpqHeTemperatureThreshold: 95
cpqHeTemperatureThresholdType: caution
cpqHeTemperatureCondition: ok
info: 14 system temperature is 33C (95 max)

[TEMP_15]
cpqHeTemperatureChassis: 0
cpqHeTemperatureIndex: 15
cpqHeTemperatureLocale: powerSupply
cpqHeTemperatureCelsius: 27
cpqHeTemperatureThreshold: 90
cpqHeTemperatureThresholdType: other
cpqHeTemperatureCondition: ok
info: 15 powerSupply temperature is 27C (90 max)

[TEMP_16]
cpqHeTemperatureChassis: 0
cpqHeTemperatureIndex: 16
cpqHeTemperatureLocale: powerSupply
cpqHeTemperatureCelsius: 30
cpqHeTemperatureThreshold: -99
cpqHeTemperatureThresholdType: other
cpqHeTemperatureCondition: ok
info: 16 powerSupply temperature too high (30C, -99 max)

[TEMP_17]
cpqHeTemperatureChassis: 0
cpqHeTemperatureIndex: 17
cpqHeTemperatureLocale: powerSupply
cpqHeTemperatureCelsius: 25
cpqHeTemperatureThreshold: 80
cpqHeTemperatureThresholdType: caution
cpqHeTemperatureCondition: ok
info: 17 powerSupply temperature is 25C (80 max)

[TEMP_18]
cpqHeTemperatureChassis: 0
cpqHeTemperatureIndex: 18
cpqHeTemperatureLocale: powerSupply
cpqHeTemperatureCelsius: 26
cpqHeTemperatureThreshold: 80
cpqHeTemperatureThresholdType: caution
cpqHeTemperatureCondition: ok
info: 18 powerSupply temperature is 26C (80 max)

[TEMP_19]
cpqHeTemperatureChassis: 0
cpqHeTemperatureIndex: 19
cpqHeTemperatureLocale: powerSupply
cpqHeTemperatureCelsius: 28
cpqHeTemperatureThreshold: 110
cpqHeTemperatureThresholdType: caution
cpqHeTemperatureCondition: ok
info: 19 powerSupply temperature is 28C (110 max)

[TEMP_2]
cpqHeTemperatureChassis: 0
cpqHeTemperatureIndex: 2
cpqHeTemperatureLocale: cpu
cpqHeTemperatureCelsius: 40
cpqHeTemperatureThreshold: 70
cpqHeTemperatureThresholdType: caution
cpqHeTemperatureCondition: ok
info: 2 cpu temperature is 40C (70 max)

[TEMP_20]
cpqHeTemperatureChassis: 0
cpqHeTemperatureIndex: 20
cpqHeTemperatureLocale: powerSupply
cpqHeTemperatureCelsius: 25
cpqHeTemperatureThreshold: 110
cpqHeTemperatureThresholdType: caution
cpqHeTemperatureCondition: ok
info: 20 powerSupply temperature is 25C (110 max)

[TEMP_21]
cpqHeTemperatureChassis: 0
cpqHeTemperatureIndex: 21
cpqHeTemperatureLocale: powerSupply
cpqHeTemperatureCelsius: 27
cpqHeTemperatureThreshold: 110
cpqHeTemperatureThresholdType: caution
cpqHeTemperatureCondition: ok
info: 21 powerSupply temperature is 27C (110 max)

[TEMP_22]
cpqHeTemperatureChassis: 0
cpqHeTemperatureIndex: 22
cpqHeTemperatureLocale: powerSupply
cpqHeTemperatureCelsius: 28
cpqHeTemperatureThreshold: 110
cpqHeTemperatureThresholdType: caution
cpqHeTemperatureCondition: ok
info: 22 powerSupply temperature is 28C (110 max)

[TEMP_26]
cpqHeTemperatureChassis: 0
cpqHeTemperatureIndex: 26
cpqHeTemperatureLocale: system
cpqHeTemperatureCelsius: 33
cpqHeTemperatureThreshold: 100
cpqHeTemperatureThresholdType: caution
cpqHeTemperatureCondition: ok
info: 26 system temperature is 33C (100 max)

[TEMP_28]
cpqHeTemperatureChassis: 0
cpqHeTemperatureIndex: 28
cpqHeTemperatureLocale: system
cpqHeTemperatureCelsius: 27
cpqHeTemperatureThreshold: 90
cpqHeTemperatureThresholdType: caution
cpqHeTemperatureCondition: ok
info: 28 system temperature is 27C (90 max)

[TEMP_31]
cpqHeTemperatureChassis: 0
cpqHeTemperatureIndex: 31
cpqHeTemperatureLocale: ioBoard
cpqHeTemperatureCelsius: 27
cpqHeTemperatureThreshold: 80
cpqHeTemperatureThresholdType: caution
cpqHeTemperatureCondition: ok
info: 31 ioBoard temperature is 27C (80 max)

[TEMP_32]
cpqHeTemperatureChassis: 0
cpqHeTemperatureIndex: 32
cpqHeTemperatureLocale: ioBoard
cpqHeTemperatureCelsius: 24
cpqHeTemperatureThreshold: 80
cpqHeTemperatureThresholdType: caution
cpqHeTemperatureCondition: ok
info: 32 ioBoard temperature is 24C (80 max)

[TEMP_33]
cpqHeTemperatureChassis: 0
cpqHeTemperatureIndex: 33
cpqHeTemperatureLocale: system
cpqHeTemperatureCelsius: 30
cpqHeTemperatureThreshold: 80
cpqHeTemperatureThresholdType: caution
cpqHeTemperatureCondition: ok
info: 33 system temperature is 30C (80 max)

[TEMP_34]
cpqHeTemperatureChassis: 0
cpqHeTemperatureIndex: 34
cpqHeTemperatureLocale: system
cpqHeTemperatureCelsius: 29
cpqHeTemperatureThreshold: 80
cpqHeTemperatureThresholdType: caution
cpqHeTemperatureCondition: ok
info: 34 system temperature is 29C (80 max)

[TEMP_35]
cpqHeTemperatureChassis: 0
cpqHeTemperatureIndex: 35
cpqHeTemperatureLocale: system
cpqHeTemperatureCelsius: 30
cpqHeTemperatureThreshold: 80
cpqHeTemperatureThresholdType: caution
cpqHeTemperatureCondition: ok
info: 35 system temperature is 30C (80 max)

[TEMP_36]
cpqHeTemperatureChassis: 0
cpqHeTemperatureIndex: 36
cpqHeTemperatureLocale: system
cpqHeTemperatureCelsius: 30
cpqHeTemperatureThreshold: 80
cpqHeTemperatureThresholdType: caution
cpqHeTemperatureCondition: ok
info: 36 system temperature is 30C (80 max)

[TEMP_37]
cpqHeTemperatureChassis: 0
cpqHeTemperatureIndex: 37
cpqHeTemperatureLocale: system
cpqHeTemperatureCelsius: 28
cpqHeTemperatureThreshold: 80
cpqHeTemperatureThresholdType: caution
cpqHeTemperatureCondition: ok
info: 37 system temperature is 28C (80 max)

[TEMP_4]
cpqHeTemperatureChassis: 0
cpqHeTemperatureIndex: 4
cpqHeTemperatureLocale: memory
cpqHeTemperatureCelsius: 22
cpqHeTemperatureThreshold: 87
cpqHeTemperatureThresholdType: caution
cpqHeTemperatureCondition: ok
info: 4 memory temperature is 22C (87 max)

[TEMP_6]
cpqHeTemperatureChassis: 0
cpqHeTemperatureIndex: 6
cpqHeTemperatureLocale: memory
cpqHeTemperatureCelsius: 26
cpqHeTemperatureThreshold: 80
cpqHeTemperatureThresholdType: caution
cpqHeTemperatureCondition: ok
info: 6 memory temperature is 26C (80 max)

[TEMP_7]
cpqHeTemperatureChassis: 0
cpqHeTemperatureIndex: 7
cpqHeTemperatureLocale: memory
cpqHeTemperatureCelsius: 26
cpqHeTemperatureThreshold: 80
cpqHeTemperatureThresholdType: caution
cpqHeTemperatureCondition: ok
info: 7 memory temperature is 26C (80 max)

[TEMP_8]
cpqHeTemperatureChassis: 0
cpqHeTemperatureIndex: 8
cpqHeTemperatureLocale: memory
cpqHeTemperatureCelsius: 27
cpqHeTemperatureThreshold: 80
cpqHeTemperatureThresholdType: caution
cpqHeTemperatureCondition: ok
info: 8 memory temperature is 27C (80 max)

[TEMP_9]
cpqHeTemperatureChassis: 0
cpqHeTemperatureIndex: 9
cpqHeTemperatureLocale: memory
cpqHeTemperatureCelsius: 25
cpqHeTemperatureThreshold: 80
cpqHeTemperatureThresholdType: caution
cpqHeTemperatureCondition: ok
info: 9 memory temperature is 25C (80 max)

dimm module 0:1 (module 1 @ cartridge 0) is ok
dimm module 0:2 (module 2 @ cartridge 0) is not present
dimm module 0:3 (module 3 @ cartridge 0) is not present
dimm module 0:4 (module 4 @ cartridge 0) is not present
dimm module 0:5 (module 5 @ cartridge 0) is not present
dimm module 0:6 (module 6 @ cartridge 0) is not present
dimm module 1:1 (module 1 @ cartridge 1) is not present
dimm module 1:2 (module 2 @ cartridge 1) is not present
dimm module 1:3 (module 3 @ cartridge 1) is not present
dimm module 1:4 (module 4 @ cartridge 1) is not present
dimm module 1:5 (module 5 @ cartridge 1) is not present
dimm module 1:6 (module 6 @ cartridge 1) is not present
[SI]
[HE]
[H2]
car 00 mod 01 siz 4194304 sta good con ok typ
car 00 mod 02 siz 0 sta notPresent con other typ
car 00 mod 03 siz 0 sta notPresent con other typ
car 00 mod 04 siz 0 sta notPresent con other typ
car 00 mod 05 siz 0 sta notPresent con other typ
car 00 mod 06 siz 0 sta notPresent con other typ
car 01 mod 01 siz 0 sta notPresent con other typ
car 01 mod 02 siz 0 sta notPresent con other typ
car 01 mod 03 siz 0 sta notPresent con other typ
car 01 mod 04 siz 0 sta notPresent con other typ
car 01 mod 05 siz 0 sta notPresent con other typ
car 01 mod 06 siz 0 sta notPresent con other typ
i dump the memory
car 00 mod 01 siz 4194304 sta good con ok typ
car 00 mod 02 siz 0 sta notPresent con other typ
car 00 mod 03 siz 0 sta notPresent con other typ
car 00 mod 04 siz 0 sta notPresent con other typ
car 00 mod 05 siz 0 sta notPresent con other typ
car 00 mod 06 siz 0 sta notPresent con other typ
car 01 mod 01 siz 0 sta notPresent con other typ
car 01 mod 02 siz 0 sta notPresent con other typ
car 01 mod 03 siz 0 sta notPresent con other typ
car 01 mod 04 siz 0 sta notPresent con other typ
car 01 mod 05 siz 0 sta notPresent con other typ
car 01 mod 06 siz 0 sta notPresent con other typ
[DA_CONTROLLER_0]
cpqDaCntlrSlot: 0
cpqDaCntlrIndex: 1
cpqDaCntlrCondition: ok
cpqDaCntlrModel: value_54

[ACCELERATOR]
cpqDaAccelCntlrIndex: 1
cpqDaAccelBattery: notPresent
cpqDaAccelStatus: enabled
cpqDaAccelCondition: ok

[LOGICAL_DRIVE]
cpqDaLogDrvCntlrIndex: 1
cpqDaLogDrvIndex: 1
cpqDaLogDrvSize: 953837
cpqDaLogDrvFaultTol: mirroring
cpqDaLogDrvStatus: ok
cpqDaLogDrvCondition: ok
cpqDaLogDrvPercentRebuild: 100
cpqDaLogDrvPhyDrvIDs: empty

[PHYSICAL_DRIVE]
cpqDaPhyDrvCntlrIndex: 1
cpqDaPhyDrvIndex: 0
cpqDaPhyDrvBay: 1
cpqDaPhyDrvBusNumber: 0
cpqDaPhyDrvSize: 689
cpqDaPhyDrvStatus: ok
cpqDaPhyDrvCondition: ok

[PHYSICAL_DRIVE]
cpqDaPhyDrvCntlrIndex: 1
cpqDaPhyDrvIndex: 1
cpqDaPhyDrvBay: 2
cpqDaPhyDrvBusNumber: 0
cpqDaPhyDrvSize: 689
cpqDaPhyDrvStatus: ok
cpqDaPhyDrvCondition: ok

[EVENT_85]
cpqHeEventLogEntryNumber: 85
cpqHeEventLogEntrySeverity: informational
cpqHeEventLogEntryCount: 1
cpqHeEventLogInitialTime: Mon Nov 12 14:47:00 2012
cpqHeEventLogUpdateTime: Mon Nov 12 14:47:00 2012
cpqHeEventLogErrorDesc: IML Cleared (iLO 4 user:Administrator)
info: Event: 85 Added: 1352731620 Class: (Maintenance Note) informational IML Cleared (iLO 4 user:Administrator)

CRITICAL - 16 powerSupply temperature too high (30C, -99 max), System: 'proliant dl360e gen8', S/N: 'CZJ24000VL', ROM: 'P73 08/20/2012'
checking cpus
cpu 0 is ok
checking power supplies
powersupply 1 is ok
powersupply 2 is ok
checking fans
overall fan status: system=ok, cpu=other
fan 1 is absent, speed is other, pctmax is 0%, location is system, redundance is notRedundant, partner is 0
fan 1 (system) needs attention (is absent)
fan 2 is absent, speed is other, pctmax is 0%, location is system, redundance is notRedundant, partner is 0
fan 2 (system) needs attention (is absent)
fan 3 is absent, speed is other, pctmax is 0%, location is system, redundance is notRedundant, partner is 0
fan 3 (system) needs attention (is absent)
fan 4 is absent, speed is other, pctmax is 0%, location is system, redundance is notRedundant, partner is 0
fan 4 (system) needs attention (is absent)
fan 5 is present, speed is normal, pctmax is 50%, location is system, redundance is notRedundant, partner is 0
fan 6 is present, speed is normal, pctmax is 50%, location is system, redundance is notRedundant, partner is 0
fan 7 is present, speed is normal, pctmax is 50%, location is system, redundance is notRedundant, partner is 0
fan 8 is present, speed is normal, pctmax is 50%, location is system, redundance is notRedundant, partner is 0
checking temperatures
1 ambient temperature is 20C (42 max)
2 cpu temperature is 40C (70 max)
4 memory temperature is 22C (87 max)
6 memory temperature is 26C (80 max)
7 memory temperature is 26C (80 max)
8 memory temperature is 27C (80 max)
9 memory temperature is 25C (80 max)
10 memory temperature is 25C (80 max)
11 memory temperature is 25C (80 max)
12 system temperature is 35C (60 max)
13 system temperature is 44C (105 max)
14 system temperature is 33C (95 max)
15 powerSupply temperature is 27C (90 max)
16 powerSupply temperature too high (30C, -99 max)
17 powerSupply temperature is 25C (80 max)
18 powerSupply temperature is 26C (80 max)
19 powerSupply temperature is 28C (110 max)
20 powerSupply temperature is 25C (110 max)
21 powerSupply temperature is 27C (110 max)
22 powerSupply temperature is 28C (110 max)
26 system temperature is 33C (100 max)
28 system temperature is 27C (90 max)
31 ioBoard temperature is 27C (80 max)
32 ioBoard temperature is 24C (80 max)
33 system temperature is 30C (80 max)
34 system temperature is 29C (80 max)
35 system temperature is 30C (80 max)
36 system temperature is 30C (80 max)
37 system temperature is 28C (80 max)
checking memory
dimm module 0:1 (module 1 @ cartridge 0) is ok
dimm module 0:2 (module 2 @ cartridge 0) is not present
dimm module 0:3 (module 3 @ cartridge 0) is not present
dimm module 0:4 (module 4 @ cartridge 0) is not present
dimm module 0:5 (module 5 @ cartridge 0) is not present
dimm module 0:6 (module 6 @ cartridge 0) is not present
dimm module 1:1 (module 1 @ cartridge 1) is not present
dimm module 1:2 (module 2 @ cartridge 1) is not present
dimm module 1:3 (module 3 @ cartridge 1) is not present
dimm module 1:4 (module 4 @ cartridge 1) is not present
dimm module 1:5 (module 5 @ cartridge 1) is not present
dimm module 1:6 (module 6 @ cartridge 1) is not present
checking disk subsystem
controller accelerator is ok
controller accelerator battery is notPresent
logical drive 1:1 is ok (mirroring)
physical drive 1:0 is ok
physical drive 1:1 is ok
da controller 1 in slot 0 is ok
checking ASR
ASR overall condition is ok
checking events
Event: 85 Added: 1352731620 Class: (Maintenance Note) informational IML Cleared (iLO 4 user:Administrator) | pc_1=35;460;460 pc_2=35;460;460

incorrect drive bay number reported on failed physical drives on a DL380 G7 server

A ProLiant DL380 G7 with two failing SAS HDDs in bays 4 and 5 shows up like this in check_hpasm:

CRITICAL - physical drive 0:3 is failed, physical drive 0:4 is failed, da controller 0 in slot 0 needs attention, logical drive 0:1 is recovering, System: 'proliant dl380 g7', S/N: ..., ROM: 'P67 05/05/2011'

It seems that the drive index is shifted by 1?

$ /usr/lib/nagios/plugins/check_hpasm -V
check_hpasm 4.6.3.1 [http://labs.consol.de/nagios/check_hpasm]

$ hpacucli controller check:
=> controller all show

  Smart Array P212 in Slot 2                (sn: <ABC>)
  Smart Array P410i in Slot 0 (Embedded)    (sn: <EFG>)

=> controller sn= show

  Smart Array P410i in Slot 0 (Embedded)

    Bus Interface: PCI
    Slot: 0
    Serial Number: <EFG>
    Cache Serial Number: ...
    RAID 6 (ADG) Status: Disabled
    Controller Status: OK
    Hardware Revision: C
    Firmware Version: 3.66
    Rebuild Priority: Medium
    Expand Priority: Medium
    Surface Scan Delay: 3 secs
    Surface Scan Mode: Idle
    Queue Depth: Automatic
    Monitor and Performance Delay: 60  min
    Elevator Sort: Enabled
    Degraded Performance Optimization: Disabled
    Inconsistency Repair Policy: Disabled
    Wait for Cache Room: Disabled
    Surface Analysis Inconsistency Notification: Disabled
    Post Prompt Timeout: 15 secs
    Cache Board Present: True
    Cache Status: OK
    Cache Ratio: 25% Read / 75% Write
    Drive Write Cache: Disabled
    Total Cache Size: 1024 MB
    Total Cache Memory Available: 912 MB
    No-Battery Write Cache: Disabled
    Cache Backup Power Source: Capacitors
    Battery/Capacitor Count: 1
    Battery/Capacitor Status: OK
    SATA NCQ Supported: True

=> controller sn= physicaldrive all show

  Smart Array P410i in Slot 0 (Embedded)

     array A

        physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 300 GB, OK)
        physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 300 GB, OK)
        physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS, 300 GB, OK)
        physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SAS, 300 GB, Failed)
        physicaldrive 2I:1:5 (port 2I:box 1:bay 5, SAS, 300 GB, Failed)
        physicaldrive 2I:1:6 (port 2I:box 1:bay 6, SAS, 300 GB, OK)
        physicaldrive 2I:1:7 (port 2I:box 1:bay 7, SAS, 300 GB, OK)
        physicaldrive 2I:1:8 (port 2I:box 1:bay 8, SAS, 300 GB, OK, active spare)
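
The suspected shift can be checked mechanically by comparing the bay numbers that hpacucli reports as Failed with the drive indices in the check_hpasm message. The following is only a rough sketch in Python, not part of check_hpasm: the regular expression, the function names and the hard-coded plugin indices (3 and 4, taken from the CRITICAL line above) are assumptions made for illustration.

```python
import re

# Lines copied from the "physicaldrive all show" output above.
HPACUCLI_OUTPUT = """\
        physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 300 GB, OK)
        physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 300 GB, OK)
        physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS, 300 GB, OK)
        physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SAS, 300 GB, Failed)
        physicaldrive 2I:1:5 (port 2I:box 1:bay 5, SAS, 300 GB, Failed)
        physicaldrive 2I:1:6 (port 2I:box 1:bay 6, SAS, 300 GB, OK)
        physicaldrive 2I:1:7 (port 2I:box 1:bay 7, SAS, 300 GB, OK)
        physicaldrive 2I:1:8 (port 2I:box 1:bay 8, SAS, 300 GB, OK, active spare)
"""

# Hypothetical pattern for the line layout shown above; other hpacucli
# versions may format these lines differently.
DRIVE_RE = re.compile(
    r"physicaldrive\s+\S+\s+"
    r"\(port\s+(?P<port>\w+):box\s+(?P<box>\d+):bay\s+(?P<bay>\d+),\s*(?P<rest>.*)\)"
)


def failed_bays(text):
    """Return the bay numbers of all drives whose status field is 'Failed'."""
    bays = []
    for line in text.splitlines():
        match = DRIVE_RE.search(line)
        if not match:
            continue
        # Remaining fields: interface, size, status, optional extras.
        fields = [field.strip() for field in match.group("rest").split(",")]
        if len(fields) >= 3 and fields[2] == "Failed":
            bays.append(int(match.group("bay")))
    return bays


def offsets(cli_bays, plugin_indices):
    """Pairwise difference between hpacucli bays and plugin drive indices."""
    return [bay - idx for bay, idx in zip(sorted(cli_bays), sorted(plugin_indices))]


# Drive indices from the check_hpasm message ("0:3" and "0:4").
PLUGIN_FAILED = [3, 4]

if __name__ == "__main__":
    print(offsets(failed_bays(HPACUCLI_OUTPUT), PLUGIN_FAILED))
```

A constant offset for every failed drive would be consistent with the index being shifted. The sketch does not tell whether the plugin numbers drives from zero by design or whether the agent reports a different index than the bay label.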

whoami returns "Storage" using SNMP to hp-snmp-agents on ProLiant BL460c Gen8

With hp-snmp-agents on a ProLiant BL460c Gen8, valid_response returns undef because of the condition $result->{$oid} eq 'noSuchInstance', which results in whoami returning Storage.

$ snmpget -v2c -c public HOST 1.3.6.1.4.1.232.2.2.4.2.0 
SNMPv2-SMI::enterprises.232.2.2.4.2.0 = ""

Verified that overriding $self->{productname} = 'ProLiant' at the end of whoami makes things work as expected.

I'm not sure what real-life situations each of the guards in valid_response protects against, so I wasn't able to come up with a patch. Sorry!


hpasmcli> show server
System        : ProLiant BL460c Gen8
Serial No.    : [ REDACTED ]
ROM version   : I31 02/10/2014
# cat /etc/os-release 
PRETTY_NAME="Debian GNU/Linux 8 (jessie)"
NAME="Debian GNU/Linux"
VERSION_ID="8"
VERSION="8 (jessie)"
ID=debian
HOME_URL="http://www.debian.org/"
SUPPORT_URL="http://www.debian.org/support/"
BUG_REPORT_URL="https://bugs.debian.org/"
# apt-cache policy hp-snmp-agents
hp-snmp-agents:
  Installed: 10.0.0.1.23-20.
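
For anyone trying to reason about the guards mentioned in the whoami report above, it may help to separate the possible kinds of reply for a single OID. The sketch below is in Python and is not the plugin's code: the Reply enum, the classify function and the list of exception strings are assumptions made for illustration. It only shows that an empty string, as returned by the snmpget call above, is a different case from a "no such instance" reply.

```python
from enum import Enum

# OID queried with snmpget in the report above.
PRODUCT_NAME_OID = "1.3.6.1.4.1.232.2.2.4.2.0"

# Assumed textual markers for SNMPv2c exception values.
EXCEPTION_VALUES = ("noSuchInstance", "noSuchObject", "endOfMibView")


class Reply(Enum):
    MISSING = "no entry for this OID"
    NO_SUCH = "agent does not implement this OID"
    EMPTY = "agent answered with an empty string"
    VALUE = "agent answered with a usable value"


def classify(result, oid):
    """Classify the reply for one OID (hypothetical helper).

    `result` is a dict mapping OID strings to values, or None if the
    request failed entirely.
    """
    if result is None or result.get(oid) is None:
        return Reply.MISSING
    value = result[oid]
    if value in EXCEPTION_VALUES:
        return Reply.NO_SUCH
    if value == "":
        return Reply.EMPTY
    return Reply.VALUE


if __name__ == "__main__":
    # The BL460c Gen8 case from the report: the OID exists but is empty.
    print(classify({PRODUCT_NAME_OID: ""}, PRODUCT_NAME_OID))
```

Whether an empty product name should count as a valid ProLiant answer is a decision for the plugin author; the sketch only names the cases.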
