pmemtool's Introduction

pmemtool

This multipurpose tool provides health and configuration details for server platforms integrating Intel Optane Persistent Memory (PMEM) and provides guided recovery from a PMEM device fault. Data is collected from ndctl, ipmctl, and /etc/fstab and integrated to enable rapid interpretation of PMEM DIMM, region, namespace, and filesystem status.

The recovery option generates bash scripts for each CPU socket with commands to restore persistent memory services to an operational state should a PMEM DIMM failure occur. Refer to Guided Recovery for additional details.

Python modules were created to interact with data from ndctl, DAX-mounted filesystems, and /etc/fstab; pmt provides the primary user interface through the command line.

Reports

The tool currently generates three reports, each targeting specific functionality and status.

Optane DIMM Report

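The Optane DIMM Report presents each DIMM's ID, health state, UUID, capacity, physical location (socket, iMC, channel, slot), firmware version, and device locator.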

DIMMID  Health State     PMEM DIMM UUID         Capacity     Skt   iMC   Chan  Slot  FW Version      Device Locator
------  ---------------  ---------------------  -----------  ----  ----  ----  ----  --------------  --------------------
0x0001  Healthy          8089-a2-1836-00002c4b  252.454 GiB  0     0     0     1     01.02.00.5446   CPU1_DIMM_A2
0x0011  Healthy          8089-a2-1836-0000214e  252.454 GiB  0     0     1     1     01.02.00.5446   CPU1_DIMM_B2
0x1111  Healthy          8089-a2-1836-00002639  252.454 GiB  1     1     1     1     01.02.00.5446   CPU2_DIMM_E2
0x1121  Healthy          8089-a2-1836-00002617  252.454 GiB  1     1     2     1     01.02.00.5446   CPU2_DIMM_F2
0x0021  Healthy          8089-a2-1836-00002716  252.454 GiB  0     0     2     1     01.02.00.5446   CPU1_DIMM_C2
0x0101  Healthy          8089-a2-1836-000025c2  252.454 GiB  0     1     0     1     01.02.00.5446   CPU1_DIMM_D2
0x0111  Healthy          8089-a2-1842-00002352  252.454 GiB  0     1     1     1     01.02.00.5446   CPU1_DIMM_E2
0x0121  Healthy          8089-a2-1836-000025fc  252.454 GiB  0     1     2     1     01.02.00.5446   CPU1_DIMM_F2
0x1001  Healthy          8089-a2-1836-000025be  252.454 GiB  1     0     0     1     01.02.00.5446   CPU2_DIMM_A2
0x1011  Healthy          8089-a2-1836-00001db5  252.454 GiB  1     0     1     1     01.02.00.5446   CPU2_DIMM_B2
0x1021  Healthy          8089-a2-1836-00001e90  252.454 GiB  1     0     2     1     01.02.00.5446   CPU2_DIMM_C2
0x1101  Healthy          8089-a2-1836-000020bf  252.454 GiB  1     1     0     1     01.02.00.5446   CPU2_DIMM_D2

Persistent Memory Filesystem (PMFS) Report

The PMFS report is driven by the contents of /etc/fstab and shows the PM region, namespace device, mount point, and PMEM DIMMs associated with the region and namespace. The intent is to roll up the namespace health based on the health of the underlying PMEM DIMMs.

Mount Point  Mounted NS Size   Health   Region     NS dev   NS Type  fs_type  PMEM Devices
------------ ------- --------- -------- ---------- -------- -------- -------- --------------------------------------
/pmemfs0     False   1488 GiB  ok       region0    pmem0    fsdaX    xfs      nmem5 nmem4 nmem3 nmem2 nmem1 nmem0
/pmemfs1     True    1488 GiB  ok       region1    pmem1    fsdaX    xfs      nmem11 nmem10 nmem9 nmem8 nmem7 nmem6
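The roll-up described above can be sketched as follows. This is a minimal illustration, not pmt's actual code: the function name `roll_up_health`, the shape of the `dimm_health` mapping, and the "degraded" label are all assumptions made for the example. Only the "ok" and "Healthy" strings and the nmem device names come from the reports shown.

```python
def roll_up_health(dimm_health, pmem_devices):
    """Return 'ok' only if every DIMM backing the namespace is healthy.

    dimm_health:  hypothetical mapping of nmem device name -> health state
    pmem_devices: nmem devices listed in the report's "PMEM Devices" column
    """
    states = [dimm_health.get(dev, "Unknown") for dev in pmem_devices]
    # An empty device list or any non-healthy DIMM marks the namespace degraded.
    if states and all(state == "Healthy" for state in states):
        return "ok"
    return "degraded"  # assumed label; the real tool may use another word


# Hypothetical sample data for illustration only.
dimm_health = {"nmem0": "Healthy", "nmem1": "Healthy", "nmem2": "Non-functional"}
print(roll_up_health(dimm_health, ["nmem0", "nmem1"]))  # ok
print(roll_up_health(dimm_health, ["nmem1", "nmem2"]))  # degraded
```

Treating an unknown or missing DIMM as unhealthy is a conservative choice: a filesystem is reported "ok" only when every backing device is positively known to be healthy.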

Healthy Persistent Memory Mount Points

This report lists healthy PMFS filesystems. Its initial use is consumption by a specific database application that leverages DAX-mounted PMFS to accelerate its in-memory database. When the line is copied into the database configuration, the DB engine maps its objects to those persistent memory filesystems.

PMFS with OK status: /pmemfs0; /pmemfs1;
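As a rough sketch of how such a line might be assembled from the `--delimiter` and `--suffix` options, consider the following. The function `healthy_mounts_line` and the list-of-dicts input are hypothetical, and the sketch assumes the sample line above was produced with a `;` delimiter, which the source does not confirm.

```python
def healthy_mounts_line(filesystems, delimiter="", suffix=""):
    """Build a space-separated list of healthy mount paths.

    Each path gets the optional suffix appended, followed by the delimiter.
    """
    healthy = [fs["mount"] for fs in filesystems if fs["health"] == "ok"]
    return " ".join(f"{mount}{suffix}{delimiter}" for mount in healthy)


# Hypothetical input mirroring the PMFS report rows.
filesystems = [
    {"mount": "/pmemfs0", "health": "ok"},
    {"mount": "/pmemfs1", "health": "ok"},
]
print("PMFS with OK status:", healthy_mounts_line(filesystems, delimiter=";"))
# PMFS with OK status: /pmemfs0; /pmemfs1;
```

Passing `suffix="/data"` would yield `/pmemfs0/data; /pmemfs1/data;`, which is the kind of adjustment the `--suffix` option seems intended for.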

Help

usage: pmt [-h] [--delimiter DELIMITER] [--suffix SUFFIX] [--recovery] [--script_prefix SCRIPT_PREFIX]
           [--script_path SCRIPT_PATH] [--verbose {1,2,3,4,5,6,7,8,9,10,11,12,13,14}] [--sandbox SANDBOX]

Persistent Memory Tool

optional arguments:
  -h, --help            show this help message and exit
  --delimiter DELIMITER
                        specify delimiter for pmfs mount path. Default: None
  --suffix SUFFIX       string to append to pmfs mount path Default: None
  --recovery            Generate Recovery Scripts for each socket.
  --script_prefix SCRIPT_PREFIX
                        change recover script name prefix. default: recover_socket
  --script_path SCRIPT_PATH
                        change recovery script destination path. default: /tmp
  --verbose {1,2,3,4,5,6,7,8,9,10,11,12,13,14}
                        enable increasingly more verbosity. Verbose Values=1-5, Debug Values=10-15
  --sandbox SANDBOX     path to optional sandbox environment. Default:''
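For readers extending the command line, the help text above is consistent with an argparse parser along these lines. This is a reconstruction from the help output, not pmt's source. The `--version` flag and its version string are hypothetical additions, included because one of the issues further down asks for version reporting.

```python
import argparse


def build_parser():
    """Reconstruct a parser matching the documented pmt options."""
    parser = argparse.ArgumentParser(prog="pmt", description="Persistent Memory Tool")
    parser.add_argument("--delimiter", default=None,
                        help="specify delimiter for pmfs mount path")
    parser.add_argument("--suffix", default=None,
                        help="string to append to pmfs mount path")
    parser.add_argument("--recovery", action="store_true",
                        help="generate recovery scripts for each socket")
    parser.add_argument("--script_prefix", default="recover_socket",
                        help="change recovery script name prefix")
    parser.add_argument("--script_path", default="/tmp",
                        help="change recovery script destination path")
    # The help output lists choices 1 through 14.
    parser.add_argument("--verbose", type=int, choices=range(1, 15),
                        help="enable increasingly more verbosity")
    parser.add_argument("--sandbox", default="",
                        help="path to optional sandbox environment")
    # Hypothetical: not present in the help output above.
    parser.add_argument("--version", action="version", version="%(prog)s 1.00")
    return parser


args = build_parser().parse_args(["--recovery", "--script_path", "/var/tmp"])
print(args.recovery, args.script_prefix, args.script_path)
```

argparse's built-in `version` action prints the string and exits, so it needs no extra handling in `main`.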

Common Usage

./pmt

Extending Capabilities

Many additional capabilities can be added to pmt through the modules ndctl.py, fstab.py, and common.py; refer to the modules themselves for details.

pmemtool's People

Contributors

davelarsen58

pmemtool's Issues

Version info via the -h flag please and error message when executing 1.00 release

Hi Dave,

Would it be possible to show the pmemtool version as part of the -h output or via a parameter, please?

Also since we updated to version 1.00 I get this error when running pmt:
xxxxxxxxxx:/usr/share/pmemtool-main # pmt
Traceback (most recent call last):
  File "./pmt", line 239, in <module>
    main(sys.argv[1:])
  File "./pmt", line 142, in main
    i.dimms = i.parse_dimm()
  File "/usr/share/pmemtool-main/ipmctl.py", line 233, in parse_dimm
    if white_list[root[dimm][dimm_attr].tag] == 1:
KeyError: 'FWActiveAPIVersion'
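The KeyError suggests that the input contained an attribute (`FWActiveAPIVersion`) with no entry in `white_list`. One way to tolerate unknown attributes is to look them up with `dict.get`. The sketch below is not the ipmctl.py code: the XML element names, the contents of `white_list`, and the sample values are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical whitelist: 1 means "keep this attribute".
white_list = {"DimmID": 1, "HealthState": 1, "FWVersion": 1}


def parse_dimm(xml_text):
    """Collect whitelisted attributes per DIMM, ignoring unknown tags."""
    root = ET.fromstring(xml_text)
    dimms = []
    for dimm in root:
        attrs = {}
        for attr in dimm:
            # .get() returns None for tags missing from the whitelist,
            # so a new field in the tool output no longer raises KeyError.
            if white_list.get(attr.tag) == 1:
                attrs[attr.tag] = attr.text
        dimms.append(attrs)
    return dimms


# Assumed sample input; the real ipmctl XML layout may differ.
sample = """<DimmList>
  <Dimm>
    <DimmID>0x0001</DimmID>
    <HealthState>Healthy</HealthState>
    <FWActiveAPIVersion>1.13</FWActiveAPIVersion>
  </Dimm>
</DimmList>"""
print(parse_dimm(sample))
```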

provide better handling of missing namespaces during recovery

This particular traceback was obtained during a recovery operation in which a new region had been created but the namespace had not yet been created.

Traceback (most recent call last):
  File "./recovery.py", line 356, in <module>
    main()
  File "./recovery.py", line 351, in main
    status = recover_all()
  File "./recovery.py", line 274, in recover_all
    data['ns_name'] = ''.join(n.get_ns_dev(socket_num))
  File "/homes/dplarsen/src/pmemtool/ndctl.py", line 122, in get_ns_dev
    tmp = json.load(f)
  File "/usr/lib/python3.8/json/__init__.py", line 293, in load
    return loads(fp.read(),
  File "/usr/lib/python3.8/json/__init__.py", line 357, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.8/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.8/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
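"Expecting value: line 1 column 1 (char 0)" is what `json` raises for empty input, which would fit a namespace listing that produced no output. A defensive loader might look like the sketch below. The helper name `load_json_or_default` is hypothetical and is not part of ndctl.py.

```python
import json


def load_json_or_default(path, default=None):
    """Load JSON from path, returning default for missing, empty, or invalid files."""
    try:
        with open(path) as f:
            text = f.read()
    except FileNotFoundError:
        return default
    if not text.strip():
        # Empty file: nothing to decode (e.g. no namespaces exist yet).
        return default
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return default
```

A caller such as `get_ns_dev` could then treat the default (for example an empty list) as "no namespace yet" and report that state instead of aborting the recovery run.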

PMT Namespace error while testing recovery

:~ # pmt
Traceback (most recent call last):
  File "./pmt", line 188, in <module>
    main(sys.argv[1:])
  File "./pmt", line 141, in main
    nsDevList = n.get_ns_device_list_by_dimm(dimm)
  File "/usr/share/pmemtool-main/ndctl.py", line 272, in get_ns_device_list_by_dimm
    ns_device_list = get_region_ns_device_list(region)
  File "/usr/share/pmemtool-main/ndctl.py", line 129, in get_region_ns_device_list
    for d in range(len(ndctl['regions'][r]['namespaces'])):
KeyError: 'namespaces'
:~ #

error on missing namespace is not properly handled

r1ldassps4005:~ # pmt
Traceback (most recent call last):
  File "./pmt", line 239, in <module>
    main(sys.argv[1:])
  File "./pmt", line 162, in main
    nsDevList = n.get_ns_device_list_by_dimm(dimm)
  File "/usr/share/pmemtool-main/ndctl.py", line 409, in get_ns_device_list_by_dimm
    ns_device_list = get_region_ns_device_list(region)
  File "/usr/share/pmemtool-main/ndctl.py", line 266, in get_region_ns_device_list
    for d in range(len(ndctl['regions'][r]['namespaces'])):
KeyError: 'namespaces'
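Both namespace tracebacks fail on `ndctl['regions'][r]['namespaces']`, which raises when a region has no `namespaces` key. A tolerant traversal could be sketched as below. The function signature and sample data are assumptions: the sketch takes the parsed data as a parameter, whereas the original appears to use a module-level `ndctl` variable, and it assumes each region carries a `dev` name and each namespace a `blockdev` field.

```python
def get_region_ns_device_list(ndctl, region_name):
    """Return block devices for a region's namespaces, or [] if none exist."""
    devices = []
    for region in ndctl.get("regions", []):
        if region.get("dev") != region_name:
            continue
        # A freshly created region may have no 'namespaces' key at all.
        for ns in region.get("namespaces", []):
            blockdev = ns.get("blockdev")
            if blockdev:
                devices.append(blockdev)
    return devices


# Hypothetical parsed data: region1 has no namespace yet.
ndctl = {
    "regions": [
        {"dev": "region0", "namespaces": [{"dev": "namespace0.0", "blockdev": "pmem0"}]},
        {"dev": "region1"},
    ]
}
print(get_region_ns_device_list(ndctl, "region0"))  # ['pmem0']
print(get_region_ns_device_list(ndctl, "region1"))  # []
```

Returning an empty list lets the caller report "no namespace" for that region rather than terminating the whole report.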

pmt --timers throws error when --timers option provided

------------Start Recovery function timers---------------
Function Elapsed Start End


Traceback (most recent call last):
  File "./pmt", line 322, in <module>
    main(sys.argv[1:])
  File "./pmt", line 317, in main
    r.print_timers()
  File "/homes/dplarsen/src/pmemtool/recovery.py", line 422, in print_timers
    first = t[0]['tic']
IndexError: list index out of range
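The IndexError indicates that the timer list was empty when `t[0]` was read, for instance because no recovery function had run. A guard for that case might look like this sketch. Only the `tic` key appears in the traceback; the `name` and `toc` keys, the output format, and the function name `format_timers` are assumptions.

```python
def format_timers(timers):
    """Return report lines for recorded timers, tolerating an empty list."""
    lines = ["Function Elapsed Start End"]
    if not timers:
        # Nothing was timed, so there is no first element to read.
        lines.append("(no timers recorded)")
        return lines
    for t in timers:
        elapsed = t["toc"] - t["tic"]
        lines.append(f"{t['name']} {elapsed:.3f} {t['tic']:.3f} {t['toc']:.3f}")
    return lines


print("\n".join(format_timers([])))
print("\n".join(format_timers([{"name": "recover_all", "tic": 1.0, "toc": 1.5}])))
```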
