Git Product home page Git Product logo

infiniband-exporter's Introduction

Infiniband-Exporter

Prometheus exporter for a Infiniband fabric. This exporter only need to be installed on one server connected to the fabric, it will collect all the ports statistics on all the switches.

Metrics are identified by type, port number, switch GUID and name. The remote connection of each port is also collected. Thus each metric represents a cable between 2 switches, or between a switch and a card in a server.

When a node name map file is provided, it will be used by ibqueryerrors to put a more human friendly name on switches.

This exporter takes 3 seconds to collect the information of 60+ IB switches, and 900+ compute nodes. The information takes about 7.5MB in ASCII format for that fabric.

Requirements

  • Python
  • prometheus-client
  • ibqueryerrors

Usage

Metrics are exported on the chosen HTTP port, events like counter reset will be on STDOUT.

usage: infiniband-exporter.py [-h] [--port PORT] [--can-reset-counter]
                              [--from-file INPUT_FILE]
                              [--node-name-map NODE_NAME_MAP]

Prometheus collector for a infiniband fabric

optional arguments:
  -h, --help            show this help message and exit
  --port PORT           Collector http port, default is 9683
  --can-reset-counter   Will reset counter as required when maxed out
  --from-file INPUT_FILE
                        Read a file containing the output of ibqueryerrors, if
                        left empty, ibqueryerrors will be launched as needed
                        by this collector
  --node-name-map NODE_NAME_MAP
                        Node name map used by ibqueryerrors

Sample

# HELP infiniband_linkdownedcounter_total Total number of times the Port Training state machine has failed the link error recovery process and downed the link.
# TYPE infiniband_linkdownedcounter_total counter
infiniband_linkdownedcounter_total{local_guid="0x506b4b03005d3101",local_name="switch1",local_port="2",remote_guid="0x506b4b0300e5e461",remote_name="node1 mlx5_0",remote_port="1"} 1.0
infiniband_linkdownedcounter_total{local_guid="0x506b4b03005d3101",local_name="switch1",local_port="3",remote_guid="0x506b4b0300c35b61",remote_name="node2 mlx5_0",remote_port="1"} 1.0
[...]
# HELP infiniband_portrcvdata_total Total number of data octets, divided by 4 (lanes), received on all VLs.
# TYPE infiniband_portrcvdata_total counter
infiniband_portrcvdata_total{local_guid="0x506b4b03005d3101",local_name="switch1",local_port="2",remote_guid="0x506b4b0300e5e461",remote_name="node1 mlx5_0",remote_port="1"} 5.149057134655e+012
infiniband_portrcvdata_total{local_guid="0x506b4b03005d3101",local_name="switch1",local_port="3",remote_guid="0x506b4b0300c35b61",remote_name="node2 mlx5_0",remote_port="1"} 6.051662505593e+012

infiniband-exporter's People

Contributors

guilbaults avatar likueimo avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.