nvidia-gpu-monitoring

Synopsis

This repository contains an example of programmatic monitoring of Nvidia GPUs in C++ using the NVML library.

Refer to the article "Monitoring Nvidia GPUs using API" for details.

Deliverables

The core deliverables of this project are kept in separate directories:

monitor

Collects metrics from GPU devices.

Console application, written in C++17.

data_extractor

Extracts metrics data from monitor's output.

Console application, written in Python 3.

device_data_filter

Filters the metrics of a specific device from the full data set obtained via data_extractor.

Console application, written in Python 3.

data_visualizer

Plots metrics of a specific GPU device.

Console application, written in Python 3.

Building

This section briefly describes the steps needed to get each deliverable ready for execution.

monitor

The monitor component is written in C++17, with build instructions defined in CMake.

Hence, one needs CMake and a C++ compiler supporting std:c++17 to build the monitor.

With both installed, the monitor build steps are the following:

mkdir build
cd build
cmake ..
cmake --build .

data_extractor

The data_extractor component uses Python 3.7+.

It has no build steps.

device_data_filter

The device_data_filter component uses Python 3.7+.

Its dependencies are listed in:

device_data_filter/requirements.txt

Example of installing the dependencies:

pip install -r device_data_filter/requirements.txt

data_visualizer

The data_visualizer component uses Python 3.7+.

Its dependencies are listed in:

data_visualizer/requirements.txt

Example of installing the dependencies:

pip install -r data_visualizer/requirements.txt

Usage

This section briefly describes how to use the components provided by this repository.

Prerequisites

The NVML shared library must be present on the system, and its directory must be listed in the system's PATH variable in order to run the monitor.

Usually, it comes installed with the graphics driver on Windows and has to be installed separately on Linux-based systems.
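
This prerequisite can be checked before launching the monitor by asking the platform's library lookup whether it can locate NVML. The sketch below is only an illustration and is not part of this repository. It assumes the library is named nvml.dll on Windows and libnvidia-ml.so on Linux; the helper names nvml_library_name and find_nvml are hypothetical.

```python
import ctypes.util
import platform
from typing import Optional


def nvml_library_name(system: Optional[str] = None) -> str:
    """Return the base library name to search for (assumed naming)."""
    system = system or platform.system()
    # Assumption: NVML ships as nvml.dll on Windows and libnvidia-ml.so elsewhere.
    return "nvml" if system == "Windows" else "nvidia-ml"


def find_nvml() -> Optional[str]:
    """Return the located library name or path, or None if it was not found."""
    return ctypes.util.find_library(nvml_library_name())


if __name__ == "__main__":
    location = find_nvml()
    if location is None:
        print("NVML library not found")
    else:
        print(f"NVML library found: {location}")
```

On Windows, ctypes.util.find_library searches the directories listed in PATH, which matches the requirement above. On Linux it relies on the system's own lookup tools instead, so a None result there is only a hint, not proof that the monitor will fail to start.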

monitor

The monitor does not take any parameters.

Basic usage:

./monitor

Watching the monitor output and persisting logs on Windows (PowerShell):

./monitor | Tee-Object -FilePath "monitor.log"

Watching the monitor output and persisting logs on Linux-based systems:

./monitor | tee "monitor.log"
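
The same watch-and-persist behaviour can be reproduced without a shell-specific tee command. The following is a minimal sketch, not something shipped with this repository: tee_lines and run_and_log are hypothetical helpers, and the sketch assumes the monitor writes its log lines to stdout.

```python
import subprocess
import sys
from typing import Iterable, List, TextIO


def tee_lines(lines: Iterable[str], *sinks: TextIO) -> int:
    """Copy every line to all sinks; return the number of lines copied."""
    count = 0
    for line in lines:
        for sink in sinks:
            sink.write(line)
            sink.flush()  # keep the console and the log file up to date
        count += 1
    return count


def run_and_log(command: List[str], log_path: str) -> int:
    """Run a command, echoing its stdout to the console and to log_path."""
    with open(log_path, "w", encoding="utf-8") as log, subprocess.Popen(
        command, stdout=subprocess.PIPE, text=True
    ) as process:
        tee_lines(process.stdout, sys.stdout, log)
    # Leaving the with-block waits for the process, so returncode is set.
    return process.returncode


# Example call (assumes the built monitor is in the current directory):
#     run_and_log(["./monitor"], "monitor.log")
```

Flushing after each line keeps monitor.log usable while the monitor is still running, at the cost of more frequent writes.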

data_extractor

The executable of the data_extractor component is located at:

./data_extractor/run.py

Its usage doc is listed below:

usage: run.py [-h] [monitor_log] [extracted_data]

Extract data from monitor log

positional arguments:
  monitor_log     path to monitor log file (default: -)
  extracted_data  path to extracted data file (default: -)

optional arguments:
  -h, --help      show this help message and exit

As stated in the usage doc, the component accepts two positional arguments.

Both are optional and default to the standard streams: stdin for monitor_log and stdout for extracted_data.

Example of reading from stdin and writing to stdout:

cat /path/to/captured/monitor.log | ./data_extractor/run.py

The same, specifying the standard streams explicitly:

cat /path/to/captured/monitor.log | ./data_extractor/run.py - -

Example of using files:

./data_extractor/run.py /path/to/captured/monitor.log /path/to/output/data_all.csv

Or the same using output redirection:

./data_extractor/run.py /path/to/captured/monitor.log > /path/to/output/data_all.csv

Example of reading from a file and writing to stdout:

./data_extractor/run.py /path/to/captured/monitor.log
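
The examples above all rely on "-" standing for a standard stream. As an illustration of how such an interface can be declared with argparse, here is a minimal sketch. It is not the actual source of run.py: build_parser and open_stream are hypothetical names, and only the argument names and help strings are taken from the usage doc above.

```python
import argparse
import contextlib
import sys


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Extract data from monitor log",
        # Appends "(default: ...)" to each help string.
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    # nargs="?" makes a positional argument optional, falling back to default.
    parser.add_argument("monitor_log", nargs="?", default="-",
                        help="path to monitor log file")
    parser.add_argument("extracted_data", nargs="?", default="-",
                        help="path to extracted data file")
    return parser


def open_stream(path: str, mode: str):
    """Open path, treating "-" as stdin (read modes) or stdout (write modes)."""
    if path == "-":
        std = sys.stdin if "r" in mode else sys.stdout
        # nullcontext keeps the standard stream open after the with-block.
        return contextlib.nullcontext(std)
    return open(path, mode, encoding="utf-8")
```

With this arrangement, calling the script with no arguments, with a single input path, or with both paths maps directly onto the three invocation styles shown above.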

device_data_filter

The executable of the device_data_filter component is located at:

./device_data_filter/run.py

Its usage doc is listed below:

usage: run.py [-h] [device_index] [input_data] [output_data]

Filter monitoring data for a specific device

positional arguments:
  device_index  index of device to filter data by (default: 0)
  input_data    path to monitor log file (default: -)
  output_data   path to extracted data file (default: -)

optional arguments:
  -h, --help    show this help message and exit

As stated in the usage doc, the component accepts three optional positional arguments.

device_index defaults to 0 and the rest default to the standard streams, just as in the case of data_extractor.

Example of filtering data of device with index 0 using files:

./device_data_filter/run.py 0 /path/to/output/data_all.csv /path/to/output/data_0.csv
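
Conceptually, this step keeps only the rows that belong to one device. The sketch below shows the idea with the standard csv module. It is not the repository's implementation, and the column name device_index is an assumption made for illustration, since the layout of data_all.csv is not described here.

```python
import csv
from typing import TextIO


def filter_by_device(source: TextIO, target: TextIO, device_index: int,
                     column: str = "device_index") -> int:
    """Copy rows whose `column` equals device_index; return the row count.

    Assumes the input has a header row and that `column` (a hypothetical
    name) holds an integer device index.
    """
    reader = csv.DictReader(source)
    writer = csv.DictWriter(target, fieldnames=reader.fieldnames,
                            lineterminator="\n")
    writer.writeheader()
    kept = 0
    for row in reader:
        if int(row[column]) == device_index:
            writer.writerow(row)
            kept += 1
    return kept
```

Because the function takes open text streams rather than paths, it works equally well with files and with the standard streams used by default.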

data_visualizer

The executable of the data_visualizer component is located at:

./data_visualizer/run.py

Its usage doc is listed below:

usage: run.py [-h] [output_file_path_format] [input_data]

Visualize device data

positional arguments:
  output_file_path_format
                        format of output file paths (default:
                        monitor.{suffix}.png)
  input_data            path to device data file (default: -)

optional arguments:
  -h, --help            show this help message and exit

As stated in the doc, the component accepts 2 optional positional arguments:

  • the path format for output files, where {suffix} will be replaced with a suffix such as full, 0, 1 and so on (see docs/monitor.*.png for output examples).
  • the path to data filtered by the device_data_filter component.

Example of plotting data for device 0:

./data_visualizer/run.py '/path/to/output/monitor.{suffix}.png' /path/to/output/data_0.csv
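
To make the role of the {suffix} placeholder concrete, the sketch below expands a path format into individual output paths. It assumes the placeholder is filled in with Python's str.format, which the usage doc suggests but does not confirm; output_path and output_paths are hypothetical helpers, not functions from this repository.

```python
from typing import Iterable, List


def output_path(path_format: str, suffix) -> str:
    """Expand the {suffix} placeholder of an output path format."""
    return path_format.format(suffix=suffix)


def output_paths(path_format: str, suffixes: Iterable) -> List[str]:
    """Expand the path format once per suffix, preserving order."""
    return [output_path(path_format, suffix) for suffix in suffixes]


if __name__ == "__main__":
    for path in output_paths("monitor.{suffix}.png", ["full", 0, 1]):
        print(path)
```

Note that the format string should be quoted on the command line, as in the example above, so that the shell does not interpret the braces.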

Contributors

o3bvv, 1div0
