
zabbix-nvidia-smi-integration's People

Contributors

richardkav

zabbix-nvidia-smi-integration's Issues

Error - Unsupported item key.

I added the template and the UserParameter lines to the zabbix-agent config.
The web interface reports that the keys are unsupported, but from the zabbix-server host:
zabbix_get -s host -k gpu.free returns the current amount of free memory.

zabbix_agentd (daemon) (Zabbix) 3.2.6, Ubuntu 16.04 (all checks work by hand and through zabbix_get)

zabbix_server (Zabbix) 3.2.6 (CentOS 7.3)

Steps towards a template for AMD based cards

I've been asked by email how something similar might work for AMD-based cards and thought it might be worth developing, so I outline the steps here in case someone wants to try.

I suspect the main body of the XML template would remain the same. The main changes would be to the Zabbix agent configuration, where commands such as the following would have to change:

UserParameter=gpu.temp,nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader,nounits -i 0 

One forum thread (https://community.amd.com/thread/167544) suggests that amdconfig is a similar tool to nvidia-smi, while another suggests RadeonTop (https://askubuntu.com/questions/244577/temperature-and-other-statistics-from-radeon-open-source-drivers).

The first thread provides several commands, such as:

amdconfig --adapter=$1 --odgt | grep 'Temperature' | cut -d'-' -f2 | cut -d'.' -f1 | tr -d ' '
amdconfig --adapter=$1 --odgc | grep 'GPU load' | cut -f1 -d'%' | cut -f2 -d':'| tr -d ' '

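I believe these can be converted into the equivalent commands for the Zabbix template provided here. Note that $1 is only substituted for flexible keys of the form key[*], so for the fixed keys used by the template the adapter is set to 0, in the same way the nvidia-smi lines use -i 0, i.e.

UserParameter=gpu.temp,amdconfig --adapter=0 --odgt | grep 'Temperature' | cut -d'-' -f2 | cut -d'.' -f1 | tr -d ' '
UserParameter=gpu.utilisation,amdconfig --adapter=0 --odgc | grep 'GPU load' | cut -f1 -d'%' | cut -f2 -d':'| tr -d ' '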

The two lines above use grep and cut to select the correct part of the amdconfig output. In my own commands I deliberately had nvidia-smi limit its output, so there was no need to parse it with text-processing commands afterwards. This was one of the main advances I made over the gist: https://gist.github.com/bhcopeland/b54d3c678a0cb6e87119. Commands such as "grep" and "cut" may lead to selecting the wrong bit of data, for example where "Temperature" appears on multiple lines of the amdconfig output.
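
If text parsing cannot be avoided, one way to reduce that risk is to match the whole sensor line and reject ambiguous output. The following is only a minimal Python sketch: the sample text is an assumed shape of the amdconfig --odgt output and has not been checked against a real card, and parse_temperature is a hypothetical helper name.

```python
import re

# Assumed shape of `amdconfig --adapter=0 --odgt` output; not verified on real hardware.
SAMPLE_OUTPUT = """
Adapter 0 - AMD Radeon HD 7900 Series
            Sensor 0: Temperature - 62.00 C
"""

# Match the complete sensor line so a stray "Temperature" elsewhere is ignored.
SENSOR_LINE = re.compile(
    r"^[ \t]*Sensor \d+: Temperature - (\d+(?:\.\d+)?) C[ \t]*$",
    re.MULTILINE,
)


def parse_temperature(output: str) -> float:
    """Return the single temperature reading, or raise if there is not exactly one."""
    matches = SENSOR_LINE.findall(output)
    if len(matches) != 1:
        raise ValueError(f"expected exactly one temperature line, found {len(matches)}")
    return float(matches[0])


if __name__ == "__main__":
    print(parse_temperature(SAMPLE_OUTPUT))
```

Raising on zero or several matches means Zabbix would show the item as failing rather than silently recording the wrong number.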

be simple

Do it like this, it is simpler:

UserParameter=nvsmi-gpu[*],nvidia-smi --query-gpu=$1 --format=csv,noheader,nounits -i 0

UserParameter=nvsmi-compute-apps[*],nvidia-smi --query-compute-apps=$1 --format=csv,noheader,nounits -i 0

UserParameter=nvsmi-retired-pages[*],nvidia-smi --query-retired-pages=$1 --format=csv,noheader,nounits -i 0

UserParameter=nvsmi-accounted-apps[*],nvidia-smi --query-accounted-apps=$1 --format=csv,noheader,nounits -i 0

UserParameter=nvsmi-supported-clocks[*],nvidia-smi --query-supported-clocks=$1 --format=csv,noheader,nounits -i 0
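
With flexible keys like these, the text inside the square brackets of the item key is passed to the command as $1. The Python sketch below only approximates that substitution to show what command an item key such as nvsmi-gpu[temperature.gpu] would end up running; expand_user_parameter is a hypothetical helper, not part of Zabbix, and the handling of missing parameters is an assumption.

```python
import re
from typing import Sequence


def expand_user_parameter(command: str, params: Sequence[str]) -> str:
    """Approximate how $1..$9 are filled in from the item key parameters."""

    def substitute(match):
        index = int(match.group(1))
        # Assumption: a reference with no matching key parameter becomes empty.
        return params[index - 1] if index <= len(params) else ""

    return re.sub(r"\$([1-9])", substitute, command)


COMMAND = "nvidia-smi --query-gpu=$1 --format=csv,noheader,nounits -i 0"

if __name__ == "__main__":
    # The item key nvsmi-gpu[temperature.gpu] carries one parameter.
    print(expand_user_parameter(COMMAND, ["temperature.gpu"]))
```

Because commas separate parameters in an item key, a key such as nvsmi-gpu[memory.used,memory.free] would put only memory.used into $1.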

Getting [no data] from zabbix agent

Hi,

I'm getting [no data] for all items.

I followed the steps below for configuration.

  1. Imported the template to the Zabbix server

  2. Assigned it to the GPU host (Zabbix agent)

  3. Added the lines below to the Zabbix agent conf file.
    UserParameter=gpu.temp,nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader,nounits -i 0
    UserParameter=gpu.memtotal,nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits -i 0
    UserParameter=gpu.used,nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits -i 0
    UserParameter=gpu.free,nvidia-smi --query-gpu=memory.free --format=csv,noheader,nounits -i 0
    UserParameter=gpu.fanspeed,nvidia-smi --query-gpu=fan.speed --format=csv,noheader,nounits -i 0
    UserParameter=gpu.utilisation,nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits -i 0
    UserParameter=gpu.power,nvidia-smi --query-gpu=power.draw --format=csv,noheader,nounits -i 0

  4. Restarted the Zabbix services on the server and agent.

I am able to get data when I run the command manually:
nvidia-smi --query-gpu=utilization.memory,memory.total,memory.free,memory.used --format=csv
utilization.memory [%], memory.total [MiB], memory.free [MiB], memory.used [MiB]
0 %, 16130 MiB, 16130 MiB, 0 MiB
0 %, 16130 MiB, 16130 MiB, 0 MiB
0 %, 16130 MiB, 16130 MiB, 0 MiB
0 %, 16130 MiB, 16130 MiB, 0 MiB

Any hints on how to fix this issue?
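
Note that the manual command above is not the same as what the agent runs: it prints a header and one row per GPU with units attached, while the UserParameter lines use noheader,nounits and -i 0 to return one bare number for the first GPU. As a rough way to compare the two, the Python sketch below parses the output pasted above into one record per GPU; parse_gpu_rows is a hypothetical helper and the sample is copied from this issue.

```python
import csv
import io

# Output copied from the manual nvidia-smi run shown in this issue.
SAMPLE = """\
utilization.memory [%], memory.total [MiB], memory.free [MiB], memory.used [MiB]
0 %, 16130 MiB, 16130 MiB, 0 MiB
0 %, 16130 MiB, 16130 MiB, 0 MiB
0 %, 16130 MiB, 16130 MiB, 0 MiB
0 %, 16130 MiB, 16130 MiB, 0 MiB
"""


def parse_gpu_rows(text: str) -> list:
    """Turn `--format=csv` output into one dict of floats per GPU row."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    rows = [row for row in reader if row]
    header, data = rows[0], rows[1:]
    # Drop the unit suffix, e.g. "memory.total [MiB]" becomes "memory.total".
    names = [column.split(" [")[0] for column in header]
    # Keep only the number from cells such as "16130 MiB" or "0 %".
    return [
        {name: float(cell.split()[0]) for name, cell in zip(names, row)}
        for row in data
    ]


if __name__ == "__main__":
    for index, gpu in enumerate(parse_gpu_rows(SAMPLE)):
        print(index, gpu["memory.free"])
```

The four rows suggest this host has four GPUs, so the gpu.* keys as configured would report only the first of them.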

nvidia-smi is not recognized

Hi,

I'm getting this error when I try to use the template:
Value "'nvidia-smi' is not recognized as an internal or external command,
operable program or batch file." of type "string" is not suitable for value type "Numeric (float)"

What I did:

  1. Added the template to zabbix
  2. Assigned it to a host
  3. Added C:\Program Files\NVIDIA Corporation\NVSMI to the PATH var on the Windows Hosts
  4. Added the UserParameters to the Zabbix conf File
  5. Restarted Zabbix Service / rebooted the entire windows host

Host conf:

### NVidia GPU
#   https://share.zabbix.com/cat-server-hardware/other/nvidia-smi-integration
UserParameter=gpu.temp,nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader,nounits -i 0
UserParameter=gpu.memtotal,nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits -i 0
UserParameter=gpu.used,nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits -i 0
UserParameter=gpu.free,nvidia-smi --query-gpu=memory.free --format=csv,noheader,nounits -i 0
UserParameter=gpu.fanspeed,nvidia-smi --query-gpu=fan.speed --format=csv,noheader,nounits -i 0
UserParameter=gpu.utilisation,nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits -i 0
UserParameter=gpu.power,nvidia-smi --query-gpu=power.draw --format=csv,noheader,nounits -i 0

Agent Log:

  8692:20181218:084322.380 Zabbix Agent stopped. Zabbix 4.0.0 (revision 85308).
  8324:20181218:084326.055 Starting Zabbix Agent [Servername]. Zabbix 4.0.0 (revision 85308).
  8324:20181218:084326.055 **** Enabled features ****
  8324:20181218:084326.056 IPv6 support:          YES
  8324:20181218:084326.056 TLS support:           YES
  8324:20181218:084326.057 **************************
  8324:20181218:084326.057 using configuration file: C:\Zabbix\conf\zabbix_agentd.win.conf
  8324:20181218:084326.060 agent #0 started [main process]
  2960:20181218:084326.061 agent #2 started [listener #1]
  6752:20181218:084326.062 agent #4 started [listener #3]
  8464:20181218:084326.062 agent #1 started [collector]
  9896:20181218:084326.063 agent #5 started [active checks #1]
  1592:20181218:084326.063 agent #3 started [listener #2]

Info:
Server: Zabbix 4.0.2
Win Agent: 4.0.0 (amd64)

Did I forget something or is that a bug?
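
One possible explanation, offered only as a guess, is that the agent service does not see the updated PATH. Calling nvidia-smi by its full, quoted path would sidestep that. The Python sketch below just generates such lines for the conf file; the install directory is taken from step 3 above, the file name nvidia-smi.exe is assumed, and user_parameter_lines is a hypothetical helper.

```python
# Directory from step 3 of this issue; the executable name is an assumption.
NVSMI = r"C:\Program Files\NVIDIA Corporation\NVSMI\nvidia-smi.exe"

# Item keys and query fields copied from the host conf shown above.
QUERIES = {
    "gpu.temp": "temperature.gpu",
    "gpu.memtotal": "memory.total",
    "gpu.used": "memory.used",
    "gpu.free": "memory.free",
    "gpu.fanspeed": "fan.speed",
    "gpu.utilisation": "utilization.gpu",
    "gpu.power": "power.draw",
}


def user_parameter_lines(executable: str, queries: dict) -> list:
    """Build UserParameter lines that call the executable by its quoted full path."""
    return [
        f'UserParameter={key},"{executable}" --query-gpu={field} '
        "--format=csv,noheader,nounits -i 0"
        for key, field in queries.items()
    ]


if __name__ == "__main__":
    print("\n".join(user_parameter_lines(NVSMI, QUERIES)))
```

The quotes are there because the path contains spaces; whether this resolves the error on a given host would still need to be checked with zabbix_get.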

Cannot get data from zabbix agent

Error reason for "12.33.243.26:gpu.free" changed: Received value [561658905890589058905890] is not suitable for value type [Numeric (float)]
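
One possible reading of this value, and it is only a guess, is that it is several per-GPU readings run together: 5616 followed by 5890 five times. That could happen if the command behind gpu.free returns one line per GPU, for example when -i 0 is missing. The Python sketch below shows the idea; split_readings is a hypothetical helper and the multi-line input is reconstructed, not taken from the host.

```python
def split_readings(raw: str) -> list:
    """Parse command output that may contain one numeric reading per line."""
    return [float(line) for line in raw.splitlines() if line.strip()]


# Reconstructed guess at per-GPU output: one value per line for six GPUs.
GUESSED_OUTPUT = "5616\n5890\n5890\n5890\n5890\n5890\n"

# The value Zabbix rejected, copied from the error message.
REJECTED = "561658905890589058905890"

if __name__ == "__main__":
    readings = split_readings(GUESSED_OUTPUT)
    print(len(readings), readings[0])
    # Joining the lines without separators reproduces the rejected value.
    print("".join(GUESSED_OUTPUT.split()) == REJECTED)
```

If that is what is happening, restricting the query to a single GPU should give the server one number it can store as Numeric (float).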
