machinestate's Introduction

Introduction

This script should be executed before running benchmarks to determine the current system settings and the execution environment.

On Linux, most information is gathered from sysfs/procfs files to keep dependencies to a minimum. Some information is only available through external tools (likwid-*, nvidia-smi, vecmd, modulecmd) and some basic tools (hostname, users, ...). On macOS, most information is gathered through the sysctl command.

An example JSON (in extended mode) from an Intel Skylake Desktop system running Linux can be found here (raw).

An example JSON (in extended mode) from an Intel Skylake Desktop system running macOS can be found here (raw).


Installation

MachineState is written as a Python3 module:

$ git clone https://github.com/RRZE-HPC/MachineState
$ cd MachineState
$ ./machinestate.py

or

$ pip3 install MachineState
$ machinestate

or

$ python3
>>> import machinestate

or just for the current project

$ wget https://raw.githubusercontent.com/RRZE-HPC/MachineState/master/machinestate.py
$ ./machinestate.py

The module cannot be used with Python2!

The module is tested on Ubuntu Xenial for Python versions 3.4, 3.5, 3.6, 3.7 and 3.8 for the architectures AMD64, PPC64le and ARM8. For macOS, only Python versions 3.7 and 3.8 for the AMD64 architecture are tested.


Checks

General:

  • Hostname
  • The current load of the system
  • Number of users that are logged into the system that might disturb the runs
  • Shell environment
  • Module system
  • Installed compilers and MPI implementations
  • Information about the executable (if a command is passed as a CLI argument)

Linux:

  • Operating system and kernel version
  • CPU information (family, model, vulnerabilities, ...) and cpuset
  • CPU, cache and NUMA topology
  • CPU frequency settings
  • Memory information
  • Uncore frequency settings (only if LIKWID is available)
  • Prefetchers and turbo frequencies (if LIKWID is available)
  • OS settings (NUMA balancing, huge pages, transparent huge pages, ...)
  • Power constraints (RAPL limits)
  • Accelerator information (Nvidia GPUs and NEC Tsubasa)
  • Dmidecode system configuration (if available)

macOS:

  • Operating system version
  • CPU information (family, model, ...)
  • CPU, cache and NUMA topology
  • CPU frequency settings
  • Memory information

All sizes are converted to bytes, and all frequencies are converted to Hz.
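
The normalization described above can be illustrated with a small sketch. It is not taken from machinestate.py: the helper names to_bytes and to_hz are hypothetical, and the multipliers (binary prefixes for sizes, decimal prefixes for frequencies) are an assumption.

```python
import re

# Assumed multipliers: binary prefixes for sizes, decimal prefixes for frequencies.
SIZE_FACTORS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}
FREQ_FACTORS = {"HZ": 1, "KHZ": 10**3, "MHZ": 10**6, "GHZ": 10**9}


def _convert(text, factors):
    """Parse '<number> <unit>' and scale the number by the unit's factor."""
    match = re.fullmatch(r"\s*([0-9]*\.?[0-9]+)\s*([A-Za-z]+)\s*", text)
    if match is None:
        raise ValueError("cannot parse value: %r" % text)
    number, unit = match.groups()
    unit = unit.upper()
    if unit not in factors:
        raise ValueError("unknown unit: %r" % unit)
    # round() avoids off-by-one results from floating-point scaling
    return round(float(number) * factors[unit])


def to_bytes(text):
    """Hypothetical helper: convert a size such as '32 kB' to bytes."""
    return _convert(text, SIZE_FACTORS)


def to_hz(text):
    """Hypothetical helper: convert a frequency such as '2.4 GHz' to Hz."""
    return _convert(text, FREQ_FACTORS)


print(to_bytes("32 kB"))
print(to_hz("2.4 GHz"))
```

Storing plain integers in one base unit keeps values from different sources directly comparable.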


Usage (CLI)

Getting usage help:

usage: machinestate.py [-h] [-e] [-a] [-c] [-s] [-i INDENT] [-o OUTPUT]
                       [-j JSON] [--html] [--configfile CONFIGFILE]
                       [executable]

Reads and outputs system information as JSON document

positional arguments:
  executable            analyze executable (optional)

optional arguments:
  -h, --help            show this help message and exit
  -e, --extended        extended output (default: False)
  -a, --anonymous       Remove host-specific information (default: False)
  -c, --config          print configuration as JSON (files, commands, ...)
  -s, --sort            sort JSON output (default: False)
  -i INDENT, --indent INDENT
                        indention in JSON output (default: 4)
  -o OUTPUT, --output OUTPUT
                        save to file (default: stdout)
  -j JSON, --json JSON  compare given JSON with current state
  -m, --no-meta         embed meta information in classes (recommended, default: True)
  --html                generate HTML page with CSS and JavaScript embedded
                        instead of JSON
  --configfile CONFIGFILE
                        Location of configuration file

If the --configfile CLI option is not given, machinestate checks for configuration files in the following locations (in this order):

  • $PWD/.machinestate
  • $HOME/.machinestate
  • /etc/machinestate.conf
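
The search order above can be sketched as follows. This is an illustration, not the code from machinestate.py: the function name find_config is hypothetical, and os.getcwd() and os.path.expanduser("~") are used as stand-ins for $PWD and $HOME.

```python
import os


def find_config(cli_path=None, exists=os.path.exists):
    """Return the first configuration file path that applies, or None.

    cli_path models the value of --configfile; exists is injectable so the
    lookup can be exercised without touching the file system.
    """
    if cli_path:
        return cli_path
    candidates = [
        os.path.join(os.getcwd(), ".machinestate"),              # $PWD/.machinestate
        os.path.join(os.path.expanduser("~"), ".machinestate"),  # $HOME/.machinestate
        "/etc/machinestate.conf",
    ]
    for path in candidates:
        if exists(path):
            return path
    return None


print(find_config())
```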

Examples

Gather data and print JSON

$ machinestate
{
    "HostInfo": {
        "Hostname": "testhost"
    },
    [...]
}

Gather extended data and print JSON

$ machinestate -e
{
    "HostInfo": {
        "Hostname": "testhost",
        "Domainname": "testdomain.de",
        "FQDN": "testhost.testdomain.de"
    },
    [...]
}

Gather data, include information about the executable on cmdline and print JSON

$ machinestate hostname
{
    "HostInfo": {
        "Hostname": "testhost"
    },
    [...]
    "ExecutableInfo": {
        "ExecutableInfo": {
            "Name": "hostname",
            "Abspath": "/bin/hostname",
            "Size": 18504
        },
        "LinkedLibraries": {
            "linux-vdso.so.1": null,
            "libc.so.6": "/lib/x86_64-linux-gnu/libc.so.6",
            "/lib64/ld-linux-x86-64.so.2": "/lib64/ld-linux-x86-64.so.2"
        }
    }
}
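
Once the output is stored as JSON, it can be inspected with the standard json module. The sketch below lists linked libraries whose path is null, using a shortened copy of the example output above as input; the function name unresolved_libraries is hypothetical.

```python
import json


def unresolved_libraries(state):
    """Return the names of linked libraries that have no resolved path."""
    libs = state.get("ExecutableInfo", {}).get("LinkedLibraries", {})
    return sorted(name for name, path in libs.items() if path is None)


# Shortened copy of the example output shown above
sample = json.loads("""
{
    "HostInfo": {"Hostname": "testhost"},
    "ExecutableInfo": {
        "ExecutableInfo": {"Name": "hostname", "Abspath": "/bin/hostname", "Size": 18504},
        "LinkedLibraries": {
            "linux-vdso.so.1": null,
            "libc.so.6": "/lib/x86_64-linux-gnu/libc.so.6",
            "/lib64/ld-linux-x86-64.so.2": "/lib64/ld-linux-x86-64.so.2"
        }
    }
}
""")

print(unresolved_libraries(sample))
```

Note that linux-vdso.so.1 is provided by the kernel rather than by a file on disk, so a null path for it is expected.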

Redirecting JSON output to file

$ machinestate -o $(hostname -s).json

Sort keys in JSON output

$ machinestate -s

Compare JSON file created with machinestate.py with current state

$ machinestate -j oldstate.json

Output the MachineState data as collapsible HTML table (with CSS and JavaScript):

$ machinestate --html

You can also redirect the HTML output to a file directly:

$ machinestate --html --output machine.html

You can embed the file in your HTML page within an <iframe>.


Configuration file

The configuration file is in JSON format and should look like this:

{
  "dmifile" : "/path/to/file/containing/the/output/of/dmidecode",
  "likwid_enable" : <true|false>,
  "likwid_path" : "/path/to/LIKWID/installation/bin/directory",
  "modulecmd" : "/path/to/modulecmd",
  "vecmd_path" : "/path/to/vecmd/command",
  "debug" : <true|false>
}

Valid locations are:

  • $PWD/.machinestate
  • $HOME/.machinestate
  • /etc/machinestate.conf

Or the user can specify a custom path with the --configfile CONFIGFILE option.

The modulecmd setting, used by the ModulesInfo class, also accepts the TCL version of the module command, given as tclsh /path/to/modulecmd.tcl.
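
Because the configuration file is plain JSON, it can be checked for typos before use. The following is a sketch under assumptions: check_config is a hypothetical helper, the expected types are inferred from the template above, and machinestate itself may handle unknown keys differently.

```python
import json

# Keys and types inferred from the configuration template above (assumption)
KNOWN_KEYS = {
    "dmifile": str,
    "likwid_enable": bool,
    "likwid_path": str,
    "modulecmd": str,
    "vecmd_path": str,
    "debug": bool,
}


def check_config(text):
    """Return a list of problems found in a configuration document."""
    config = json.loads(text)  # raises ValueError on invalid JSON
    problems = []
    for key, value in config.items():
        expected = KNOWN_KEYS.get(key)
        if expected is None:
            problems.append("unknown key: " + key)
        elif not isinstance(value, expected):
            problems.append(key + ": expected " + expected.__name__)
    return problems


print(check_config('{"debug": true, "likwid_path": "/opt/likwid/bin"}'))
```

A trailing comma after the last entry is not valid JSON, so json.loads rejects such a file with an error.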


Usage as Python3 module

You can also use MachineState as a module in your applications. You don't need to gather all information if you are only interested in specific information classes.

In order to capture the current state:

$ python3
>>> import machinestate
>>> ms = machinestate.MachineState(extended=False, anonymous=False)
>>> ms.generate()                        # generate subclasses
>>> ms.update()                          # read information
>>> ms.get()                             # get the information as dict
{ ... all fields ... }
>>> ms.get_json(indent=4, sort=True)     # get the information as JSON document (parameters optional)
"... JSON document ..."

How to get the list of information classes:

$ python3
>>> import machinestate
>>> help(machinestate)
[...]
Provided classes:
    - HostInfo
    - CpuInfo
    - OSInfo
    [...]

Using single information classes is similar to using the big MachineState class:

$ python3
>>> import machinestate
>>> hi = machinestate.HostInfo(extended=False, anonymous=False)
>>> hi.generate()
>>> hi.update()
>>> hi_dict = hi.get()
{'Hostname': 'testhost'}
>>> hi_json = hi.get_json()
'{\n    "Hostname": "testhost"\n}'
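
To gather several information classes in one go, the generate/update/get sequence shown above can be wrapped in a small helper. This is a sketch: collect is a hypothetical function, and it relies only on the constructor arguments and methods used in the examples above.

```python
def collect(info_classes, extended=False, anonymous=False):
    """Run generate(), update() and get() for each class; key results by class name."""
    result = {}
    for cls in info_classes:
        info = cls(extended=extended, anonymous=anonymous)
        info.generate()  # create subclasses
        info.update()    # read information
        result[cls.__name__] = info.get()
    return result
```

With the module installed, a call such as collect([machinestate.HostInfo, machinestate.CpuInfo]) would return one dictionary per class.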

If you want to compare with an old state:

$ python3
>>> oldstate = {}            # dictionary of oldstate or
                             # filename "oldstate.json" or
                             # JSON document "... OldState JSON document ..."
>>> ms = machinestate.MachineState(extended=False, anonymous=False)
>>> ms.generate()
>>> ms.update()
>>> ms == oldstate
True

If the comparison returns False, it reports the differing values and missing keys. Integer and float values are compared with a tolerance of 20%. Be aware that oldstate.get() == ms.get() uses the default dict comparison, which prints nothing and requires exact matches.
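
The difference between a tolerant and an exact comparison can be sketched as follows. This is not the comparison code from machinestate.py: tolerant_equal is a hypothetical function, and measuring the 20% relative to the old value is an assumption.

```python
def tolerant_equal(old, new, tolerance=0.2):
    """Compare nested dicts; numbers may deviate by `tolerance` relative to old."""
    # bool is a subclass of int in Python, so compare booleans exactly
    if isinstance(old, bool) or isinstance(new, bool):
        return old == new
    if isinstance(old, (int, float)) and isinstance(new, (int, float)):
        return old == new or abs(new - old) <= tolerance * abs(old)
    if isinstance(old, dict) and isinstance(new, dict):
        return old.keys() == new.keys() and all(
            tolerant_equal(old[key], new[key], tolerance) for key in old
        )
    return old == new


old = {"Hostname": "testhost", "Freq": 1000}
new = {"Hostname": "testhost", "Freq": 1150}
print(tolerant_equal(old, new))  # within 20% of the old value
print(old == new)                # plain dict comparison is exact
```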

If you want to load an old state and use the class tree:

$ python3
>>> oldstate = {}           # dictionary of oldstate or
                            # path to JSON file of oldstate or
                            # JSON document (as string)
                            # or a MachineState class
                            # It has to contain the '_meta' entries
                            # you get when calling get_json() or
                            # get(meta=True)
>>> ms = machinestate.MachineState.from_dict(oldstate)
>>> ms == oldstate
True

Differences between Shell and Python version

The Shell version (shell-version/machine-state.sh) executes some commands and just dumps the output to stdout.

The Python version (machinestate.py) collects all data and outputs it in JSON format. This version is currently under development.


Additional information by others

machinestate's People

Contributors

bkmgit, christiealappatt, cod3monk, janljl, tomthebear

Forkers

lkampoli

machinestate's Issues

Module Info for Lmod

The module information only works with the TCL version of modules. It would be great if it also worked with Lmod (written in Lua).

[BUG] Error when running in extended output mode using Intel MPI

Describe the bug
When running in an environment with Intel MPI, the execution of impi_info, which happens when using the extended output, fails.
Even though it is documented that impi_info returns a list as shown in their Developer Reference, the output I get is in the format:

|<KEY>
 MPI Datatype:
    <VALUE_TYPE>
| <VALUE> | <VALUE_TYPE> |

, e.g.:

| NAME                                           | DEFAULT VALUE | DATA TYPE |
 ==============================================================================
 |I_MPI_JOB_STARTUP_TIMEOUT
  MPI Datatype:
    MPI_CHAR
 | -1            | MPI_CHAR  |
 |I_MPI_HYDRA_HOST_FILE                                                                                                                                                                                             MPI Datatype:
    MPI_CHAR
 | not defined   | MPI_CHAR  |

I tried this with Intel MPI in version 2021.4.0, 2021.7.0, and 2021.9.0.
As a result, machinestate fails because it cannot find the value, which is on a separate line.

To Reproduce
Have Intel MPI loaded and run the following command:

$ machinestate -e -a -o test.json
Traceback (most recent call last):
  File "/home/woody/ihpc/ihpc030h/conda/envs/venv310/bin/machinestate", line 33, in <module>
    sys.exit(load_entry_point('MachineState', 'console_scripts', 'machinestate')())
  File "/home/hpc/ihpc/ihpc030h/git/MachineState/machinestate.py", line 3657, in main
    mstate.update()
  File "/home/hpc/ihpc/ihpc030h/git/MachineState/machinestate.py", line 824, in update
    inst.update()
  File "/home/hpc/ihpc/ihpc030h/git/MachineState/machinestate.py", line 821, in update
    data = op.parse(data)
  File "/home/hpc/ihpc/ihpc030h/git/MachineState/machinestate.py", line 578, in parse
    out = self.parser(data)
  File "/home/hpc/ihpc/ihpc030h/git/MachineState/machinestate.py", line 2493, in intelmpiparams
    outdict[llist[1]] = llist[2]
IndexError: list index out of range

Fix
My workaround is to grep all lines that include "|" and to process both the key line and the value line in one loop iteration (see diff below).
This only works as long as impi_info prints the table with the weird MPI Datatype: <VALUE_TYPE> artifact in between key and value, so I am not sure if we should adjust the machinestate code.

$ git diff machinestate.py
diff --git a/machinestate.py b/machinestate.py
index 4d65719..85b2ca2 100755
--- a/machinestate.py
+++ b/machinestate.py
@@ -2473,7 +2473,7 @@ class MpiInfo(ListInfoGroup):
                 self.addc("OpenMpiParams", ompi, ompi_args, parse=MpiInfo.openmpiparams)
             impi = which("impi_info")
             if impi and len(impi) > 0 and extended:
-                self.addc("IntelMpiParams", impi, "| grep I_MPI", parse=MpiInfo.intelmpiparams)
+                self.addc("IntelMpiParams", impi, "| grep \"|\"", parse=MpiInfo.intelmpiparams)
     @staticmethod
     def openmpiparams(value):
         outdict = {}
@@ -2486,10 +2486,11 @@ class MpiInfo(ListInfoGroup):
     @staticmethod
     def intelmpiparams(value):
         outdict = {}
-        for line in value.split("\n"):
+        vlist = value.split("\n")
+        for i, line in enumerate(vlist):
             if "I_MPI" not in line: continue
             if not line.strip(): continue
-            llist = [x.strip() for x in line.split("|")]
+            llist = [x.strip() for x in (line + vlist[i+1]).split("|")]
             outdict[llist[1]] = llist[2]
         return outdict
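
An alternative to the diff above is a parser that accepts both layouts. The sketch below is not part of MachineState: parse_impi_info is a hypothetical function, and the single-row layout (name, value and type in one row) is an assumption based on the documented format mentioned in the report.

```python
def parse_impi_info(text):
    """Map I_MPI_* names to their default values from impi_info-style output."""
    params = {}
    pending = None  # name seen on a row that did not carry its value
    for line in text.splitlines():
        row = line.strip()
        if not row.startswith("|"):
            continue  # separator, "MPI Datatype:" and type-only lines
        cells = [cell.strip() for cell in row.strip("|").split("|")]
        tokens = cells[0].split()
        if tokens and tokens[0].startswith("I_MPI"):
            if len(cells) >= 2:
                params[tokens[0]] = cells[1]  # name and value in one row
                pending = None
            else:
                pending = tokens[0]  # value follows in a later row
        elif pending is not None:
            params[pending] = cells[0]
            pending = None
    return params


sample = "\n".join([
    "| NAME | DEFAULT VALUE | DATA TYPE |",
    " ======================================",
    " |I_MPI_JOB_STARTUP_TIMEOUT",
    "  MPI Datatype:",
    "    MPI_CHAR",
    " | -1            | MPI_CHAR  |",
    " | I_MPI_DEBUG | 0 | MPI_INT |",
])
print(parse_impi_info(sample))
```

Tracking a pending name avoids indexing the following line directly, so a name on the last line of the output cannot raise an IndexError.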

macOS support?

Is there any interest in adding support for macOS?

I gave it a quick shot on my laptop (using 990194f) and it died horribly, probably because it assumes Linux as OS?

$ ./machinestate.py
Traceback (most recent call last):
  File "./machinestate.py", line 2610, in <module>
    main()
  File "./machinestate.py", line 2571, in main
    mstate.generate()
  File "./machinestate.py", line 903, in generate
    raise exce
  File "./machinestate.py", line 897, in generate
    cls = cltype(extended=self.extended, anonymous=self.anonymous, **clargs)
  File "./machinestate.py", line 1486, in __init__
    base = pjoin("/sys/fs/cgroup/cpuset", cset.strip("/"))
AttributeError: 'NoneType' object has no attribute 'strip'
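
The traceback shows that cset is None on this system, so calling strip() on it fails. A guard along the following lines would avoid the crash. This is only a sketch: cpuset_base is a hypothetical helper, and pjoin is assumed to be an alias for os.path.join.

```python
from os.path import join as pjoin


def cpuset_base(cset, root="/sys/fs/cgroup/cpuset"):
    """Return the cpuset directory, or None when no cpuset name is available."""
    if cset is None:
        return None  # e.g. on systems without Linux cgroups
    return pjoin(root, cset.strip("/"))


print(cpuset_base(None))
print(cpuset_base("/user.slice/"))
```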

Add more performance-relevant settings for macOS

The current macOS support covers basic information about the CPU, caches and software environment, but no real performance-relevant settings. All the information I found on the web is about network tuning and extending shared memory regions.

Which settings are performance-relevant, and how can they be obtained?

Error when running machinestate/comparing JSONs without likwid

When running machinestate without likwid, the attribute PerfEnergyBias has a value similar to this:

...
"PerfEnergyBias": "ERROR - [./src/access_client.c:189] No such file or directory\nExiting due to timeout: The socket file at '/tmp/likwid-9098' could not be\nopened within 10 seconds. Consult the error message above\nthis to find out why. If the error is 'no such file or directoy',\nit usually means that likwid-accessD just failed to start."
...

Furthermore, each run prints a different four-digit number as part of the socket file name. As a result, comparing two machine states with the -j option fails because of a mismatch in this string:

$ machinestate -j old_state.json
ERROR: Equality check failed for key 'PerfEnergyBias' for class TurboInfo
The current state differs at least in one setting with input file
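
One possible way to handle this, sketched under assumptions, is to mask tool error messages before comparing two states, so that run-specific details such as the socket file name no longer matter. The function mask_tool_errors is hypothetical, and the "ERROR - " prefix is taken from the message quoted above.

```python
def mask_tool_errors(state):
    """Replace string values that hold a tool error message with None."""
    if isinstance(state, dict):
        return {key: mask_tool_errors(value) for key, value in state.items()}
    if isinstance(state, str) and state.startswith("ERROR - "):
        return None  # value could not be read; drop the run-specific text
    return state


old = {"TurboInfo": {"PerfEnergyBias": "ERROR - socket '/tmp/likwid-9098' not opened"}}
new = {"TurboInfo": {"PerfEnergyBias": "ERROR - socket '/tmp/likwid-1234' not opened"}}
print(mask_tool_errors(old) == mask_tool_errors(new))
```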
