richardkav / zabbix-nvidia-smi-integration
The Zabbix template for monitoring Nvidia graphics cards.
License: Apache License 2.0
I added the template and the user parameters to the zabbix-agent config.
The web interface reports that the keys are unsupported, but from the zabbix-server host:
zabbix_get -s host -k gpu.free returns the current amount of free memory.
zabbix_agentd (daemon) (Zabbix) 3.2.6, Ubuntu 16.04 (all checks work by hand and through zabbix_get)
zabbix_server (Zabbix) 3.2.6 (CentOS 7.3)
I've been asked by email how something similar might work for AMD-based cards. I thought it might be worth developing, so I outline the steps here in case someone wants to try.
I suspect the main body of the XML template would remain the same. The main changes would be to the configuration of the Zabbix agent, where commands such as the following would have to change:
UserParameter=gpu.temp,nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader,nounits -i 0
One forum thread (https://community.amd.com/thread/167544) suggests that amdconfig would be a similar tool to nvidia-smi, while another suggests RadeonTop (https://askubuntu.com/questions/244577/temperature-and-other-statistics-from-radeon-open-source-drivers).
The first thread provides several commands, such as:
amdconfig --adapter=$1 --odgt | grep 'Temperature' | cut -d'-' -f2 | cut -d'.' -f1 | tr -d ' '
amdconfig --adapter=$1 --odgc | grep 'GPU load' | cut -f1 -d'%' | cut -f2 -d':'| tr -d ' '
I believe these can be converted into the equivalent commands for the Zabbix template provided here. Because they take the adapter number as $1, the keys have to be declared as flexible user parameters with [*], so an item key would look like gpu.temp[0], i.e.
UserParameter=gpu.temp[*],amdconfig --adapter=$1 --odgt | grep 'Temperature' | cut -d'-' -f2 | cut -d'.' -f1 | tr -d ' '
UserParameter=gpu.utilisation[*],amdconfig --adapter=$1 --odgc | grep 'GPU load' | cut -f1 -d'%' | cut -f2 -d':'| tr -d ' '
The two lines above use grep and cut to select the correct part of the amdconfig output. In my own commands I purposely had nvidia-smi limit its output, so there was no need to parse it with text-processing commands afterwards. This was one of the main advances I made over the gist: https://gist.github.com/bhcopeland/b54d3c678a0cb6e87119. Commands such as “grep” and “cut” may lead to selecting the wrong piece of data, for example where “temperature” appears on multiple lines of the amdconfig output.
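The risk described here can be illustrated with a small sketch. The sample text below is invented for illustration and is not real amdconfig output; the function names are likewise hypothetical. It compares the loose grep/cut pipeline with a stricter match that accepts only one specific line.

```shell
#!/bin/sh
# Hypothetical tool output, invented for this example only.
# Real amdconfig output may be laid out differently.
sample_output() {
    cat <<'EOF'
Adapter 0 - Example GPU
    Sensor 0: Temperature - 61.00 C
    Note: Temperature limit - 95.00 C
EOF
}

# Loose match: every line containing "Temperature" is kept,
# so two values come back instead of one.
loose_temp() {
    grep 'Temperature' | cut -d'-' -f2 | cut -d'.' -f1 | tr -d ' '
}

# Stricter match: only a line that starts with the sensor label
# (after optional indentation) is accepted, and only the first hit is printed.
strict_temp() {
    sed -n 's/^[[:space:]]*Sensor 0: Temperature - \([0-9][0-9]*\)\..*$/\1/p' | head -n 1
}

sample_output | loose_temp    # prints two lines: 61 and 95
sample_output | strict_temp   # prints one line: 61
```

With a two-line result, an item typed as numeric would be likely to reject the value, which is why limiting the output at the source, as nvidia-smi allows, is the safer option when the tool supports it.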
Do it like this, it is simpler:
UserParameter=nvsmi-gpu[*],nvidia-smi --query-gpu=$1 --format=csv,noheader,nounits -i 0
UserParameter=nvsmi-compute-apps[*],nvidia-smi --query-compute-apps=$1 --format=csv,noheader,nounits -i 0
UserParameter=nvsmi-retired-pages[*],nvidia-smi --query-retired-pages=$1 --format=csv,noheader,nounits -i 0
UserParameter=nvsmi-accounted-apps[*],nvidia-smi --query-accounted-apps=$1 --format=csv,noheader,nounits -i 0
UserParameter=nvsmi-supported-clocks[*],nvidia-smi --query-supported-clocks=$1 --format=csv,noheader,nounits -i 0
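The flexible keys above still pin every query to GPU 0 through `-i 0`. A minimal sketch of passing the GPU index as a second key parameter follows. The function name `gpu_query` and the `NVSMI_BIN` override are assumptions of this sketch, added so the command line can be tried without a GPU; they are not part of the template.

```shell
#!/bin/sh
# Sketch only. The same idea written directly in the agent config would be:
#   UserParameter=nvsmi-gpu[*],nvidia-smi --query-gpu=$1 --format=csv,noheader,nounits -i $2
# In that direct form every item key has to supply both parameters,
# e.g. nvsmi-gpu[temperature.gpu,1].

# NVSMI_BIN is a hypothetical override used to swap in a stub for testing.
gpu_query() {
    field=$1          # query field, e.g. temperature.gpu
    index=${2:-0}     # GPU index, defaulting to 0 when omitted
    "${NVSMI_BIN:-nvidia-smi}" --query-gpu="$field" --format=csv,noheader,nounits -i "$index"
}
```

Setting `NVSMI_BIN=echo` before calling `gpu_query temperature.gpu 2` prints the argument list that would be passed to nvidia-smi, which is a quick way to check the substitution.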
Hi,
I'm getting [no data] for all items.
I followed the steps below for configuration:
Imported the template to the Zabbix server.
Assigned it to the GPU host (Zabbix agent).
Added the lines below to the Zabbix agent conf file.
UserParameter=gpu.temp,nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader,nounits -i 0
UserParameter=gpu.memtotal,nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits -i 0
UserParameter=gpu.used,nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits -i 0
UserParameter=gpu.free,nvidia-smi --query-gpu=memory.free --format=csv,noheader,nounits -i 0
UserParameter=gpu.fanspeed,nvidia-smi --query-gpu=fan.speed --format=csv,noheader,nounits -i 0
UserParameter=gpu.utilisation,nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits -i 0
UserParameter=gpu.power,nvidia-smi --query-gpu=power.draw --format=csv,noheader,nounits -i 0
Restarted the Zabbix services on the server and the agent.
I am able to get data when I run the command manually:
nvidia-smi --query-gpu=utilization.memory,memory.total,memory.free,memory.used --format=csv
utilization.memory [%], memory.total [MiB], memory.free [MiB], memory.used [MiB]
0 %, 16130 MiB, 16130 MiB, 0 MiB
0 %, 16130 MiB, 16130 MiB, 0 MiB
0 %, 16130 MiB, 16130 MiB, 0 MiB
0 %, 16130 MiB, 16130 MiB, 0 MiB
Any hints on how to fix this issue?
Hi,
I'm getting this error when I try to use the template:
Value "'nvidia-smi' is not recognized as an internal or external command,
operable program or batch file." of type "string" is not suitable for value type "Numeric (float)"
What I did:
Host conf:
### NVidia GPU
# https://share.zabbix.com/cat-server-hardware/other/nvidia-smi-integration
UserParameter=gpu.temp,nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader,nounits -i 0
UserParameter=gpu.memtotal,nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits -i 0
UserParameter=gpu.used,nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits -i 0
UserParameter=gpu.free,nvidia-smi --query-gpu=memory.free --format=csv,noheader,nounits -i 0
UserParameter=gpu.fanspeed,nvidia-smi --query-gpu=fan.speed --format=csv,noheader,nounits -i 0
UserParameter=gpu.utilisation,nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits -i 0
UserParameter=gpu.power,nvidia-smi --query-gpu=power.draw --format=csv,noheader,nounits -i 0
Agent Log:
8692:20181218:084322.380 Zabbix Agent stopped. Zabbix 4.0.0 (revision 85308).
8324:20181218:084326.055 Starting Zabbix Agent [Servername]. Zabbix 4.0.0 (revision 85308).
8324:20181218:084326.055 **** Enabled features ****
8324:20181218:084326.056 IPv6 support: YES
8324:20181218:084326.056 TLS support: YES
8324:20181218:084326.057 **************************
8324:20181218:084326.057 using configuration file: C:\Zabbix\conf\zabbix_agentd.win.conf
8324:20181218:084326.060 agent #0 started [main process]
2960:20181218:084326.061 agent #2 started [listener #1]
6752:20181218:084326.062 agent #4 started [listener #3]
8464:20181218:084326.062 agent #1 started [collector]
9896:20181218:084326.063 agent #5 started [active checks #1]
1592:20181218:084326.063 agent #3 started [listener #2]
Info:
Server: Zabbix 4.0.2
Win Agent: 4.0.0 (amd64)
Did I forget something or is that a bug?
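The error text quoted above is the command interpreter reporting that it could not find nvidia-smi on the search path of the account running the agent, and that message was then rejected as a non-numeric value. A hedged sketch of the equivalent check in POSIX shell follows, for Linux agents; on Windows, `where nvidia-smi` plays the same role. The function name `find_tool` is hypothetical.

```shell
#!/bin/sh
# Print the resolved location of a command, or fail with a message on stderr.
find_tool() {
    tool=$1
    if path=$(command -v "$tool" 2>/dev/null); then
        printf '%s\n' "$path"
    else
        printf 'not found on PATH: %s\n' "$tool" >&2
        return 1
    fi
}
```

Running `find_tool nvidia-smi` as the same user the agent runs under shows whether the bare command name can work. If it cannot, writing the full path to the binary in each UserParameter line avoids depending on the search path at all.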
Error reason for "12.33.243.26:gpu.free" changed: Received value [561658905890589058905890] is not suitable for value type [Numeric (float)]
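The rejected value is far longer than a single memory reading would normally be, which would be consistent with several readings arriving as one value, though the source does not confirm the cause. A minimal sketch of a guard that accepts only one line holding one number of plausible length follows. The function name `is_single_number` and the limit of 12 integer digits are assumptions of this sketch.

```shell
#!/bin/sh
# Succeed only when stdin holds exactly one line containing one decimal number
# with at most 12 integer digits (the limit is an assumption of this sketch).
is_single_number() {
    input=$(cat)
    # More than one line means several readings were returned at once.
    [ "$(printf '%s\n' "$input" | wc -l | tr -d ' ')" = "1" ] || return 1
    printf '%s\n' "$input" | grep -Eq '^[0-9]{1,12}(\.[0-9]+)?$'
}
```

For example, `printf '5616\n' | is_single_number` succeeds, while the 24-digit value from the error message, a two-line result, or an error string all fail the check.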