
Standalone Linux IO Tracer

NEWS

Standalone Linux IO Tracer has switched to eBPF for capturing traces. Previously, the tracer used a custom loadable kernel module for that purpose. The eBPF tracing method is more secure for the user. The kernel-module-based version will be obsoleted; if you want to run the old version, switch to this branch: master-kernel

Description

Standalone Linux IO Tracer (iotrace) is a tool for block device I/O tracing and for management of the created traces.

For each I/O to the target device(s), basic metadata (I/O operation type, address, size) is captured and supplemented with an extended classification. The extended classification contains information about the I/O type (direct / filesystem metadata / file) and target file attributes (e.g. file size).

iotrace is based on the Open CAS Telemetry Framework (OCTF). Collected traces are stored in the OCTF trace location. Traces can later be converted to JSON or CSV format.

The iotrace executable (the iotrace command line application) includes an eBPF program which is loaded into the Linux kernel during tracing. The eBPF program captures trace information and shares it with the userspace iotrace application, which serializes it into the OCTF IO trace.

In this readme:

  • Supported OS
  • Source Code
  • Deployment
  • Theory of operation
  • Examples
  • Tests
  • Contributing
  • Related Projects
  • Notice

Supported OS

Compilation of Standalone Linux IO Tracer is currently tested on the following OSes:

OS       Version   Kernel Version
Fedora   36        5.18.10
Ubuntu   22.04     5.15.0

Source Code

Source code is available in the official Standalone Linux IO Tracer GitHub repository:

git clone https://github.com/open-cas/standalone-linux-io-tracer
cd standalone-linux-io-tracer

Deployment

Prerequisites

  • To build and use Standalone Linux IO Tracer, set up the prerequisites first in the following way:

    git submodule update --init --recursive
    sudo ./setup_dependencies.sh
    

    Installed dependencies include OCTF, Google Protocol Buffers, CMake and Google Test. The dependencies are either installed with dnf/yum/apt or installed into a dedicated directory /opt/octf/ to avoid overwriting already installed ones.

Build

To build the iotrace executable invoke:

make

You can also create an rpm/deb installation package:

make package

For example, on Fedora the generated installation package is located at build/release/iotrace-XX.YY.ZZ-1.x86_64.rpm.

Installation

To install iotrace call:

sudo make install

Alternatively, you can install iotrace using the rpm/deb package:

rpm -Uvh iotrace-XX.YY.ZZ-1.x86_64.rpm

Theory of operation

Standalone Linux IO Tracer captures request data by registering to multiple trace points exposed by the Linux kernel (e.g. BIO queueing, BIO splitting, BIO completion). This allows IO metadata to be gathered at the request level and passed between kernel and userspace.

A perf buffer is allocated and shared between the eBPF program and the userspace application. The example below shows a recorded trace event.

struct iotrace_event_hdr {
    /** Event sequence ID */
    log_sid_t sid;

    /** Time stamp */
    uint64_t timestamp;

    /** Trace event type, iotrace_event_type enumerator */
    uint32_t type;

    /** Size of this event, including header */
    uint32_t size;
} __attribute__((packed, aligned(8)));

...

struct iotrace_event {
    /** Trace event header */
    struct iotrace_event_hdr hdr;
    /**
     * @brief IO ID
     *
     * This ID can be used by the tracing environment to assign an ID to the IO.
     *
     * @note Zero means not set.
     */
    uint64_t id;

    /** Address of IO in sectors */
    uint64_t lba;

    /** Size of IO in sectors */
    uint32_t len;

    /** IO class of IO */
    uint32_t io_class;

    /** Device ID */
    uint32_t dev_id;

    /** Operation flags: flush, fua, ... .
     * Values according to iotrace_event_flag_t enum
     * are summed (OR-ed) together. */
    uint32_t flags;

    /** Operation type: read, write, discard
     * (iotrace_event_operation_t enumerator) **/
    uint8_t operation;

    /** Write hint associated with IO */
    uint8_t write_hint;
} __attribute__((packed, aligned(8)));

The events declaration file can be found here.
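
As a rough illustration of the kernel-side mechanism described above (not the actual iotrace eBPF program), the sketch below shows how an event of this kind could be filled at a block-layer tracepoint and pushed to a perf buffer shared with userspace; the map name, trimmed-down event layout and tracepoint choice are assumptions for illustration:

#include <linux/types.h>
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* Perf event array map shared with the userspace application. */
struct {
    __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
    __uint(key_size, sizeof(int));
    __uint(value_size, sizeof(int));
} events SEC(".maps");

/* Trimmed-down stand-in for struct iotrace_event. */
struct io_event {
    __u64 timestamp;
    __u64 lba;
    __u32 len;
} __attribute__((packed, aligned(8)));

SEC("tracepoint/block/block_rq_issue")
int trace_rq_issue(void *ctx)
{
    struct io_event ev = {};

    ev.timestamp = bpf_ktime_get_ns();
    /* A real program would read the sector, size, device, etc.
     * from the tracepoint context here. */

    /* Push the event to the per-CPU perf buffer for userspace to consume. */
    bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU, &ev, sizeof(ev));
    return 0;
}

char LICENSE[] SEC("license") = "GPL";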

The userspace part of the Standalone Linux IO Tracer reads the entries from the perf buffer and translates them into Google Protocol Buffer format (see example below) for easier portability. The data is then serialized into trace files on a per-CPU basis (e.g. octf.trace.0).

message EventHeader {
    /** Event sequence ID */
    uint64 sid = 1;

    /** Time stamp */
    uint64 timestamp = 2;
}

...

enum IoType {
    UnknownIoType = 0;
    Read = 1;
    Write = 2;
    Discard = 3;
}

...

message EventIo {
    /** Address of IO in sectors */
    uint64 lba = 1;

    /** Size of IO in sectors */
    uint32 len = 2;

    /** IO class of IO */
    uint32 ioClass = 3;

    /** Device ID */
    uint64 deviceId = 4;

    /** Operation type: read, write, trim */
    IoType operation = 5;

    /** Flush flag */
    bool flush = 6;

    /** FUA flag */
    bool fua = 7;

    /** Write (lifetime) hint */
    uint32 writeHint = 8;

    /**
     * This ID can be used by the tracing environment to assign an ID to the IO.
     * Zero means not set.
     */
    uint64 id = 9;
}

...

message Event {
    /** Trace event header */
    EventHeader header = 1;

    oneof EventType {
        EventIo io = 2;
        EventDeviceDescription deviceDescription = 3;
        EventIoFilesystemMeta filesystemMeta = 4;
        EventIoCompletion ioCompletion = 5;
        EventIoFilesystemFileName filesystemFileName = 6;
        EventIoFilesystemFileEvent filesystemFileEvent = 7;
    }
}

The protobuf events declaration file can be found here.
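
As a rough sketch of how the userspace side could consume the shared perf buffer before performing this translation (assuming libbpf; this is not the actual iotrace code, and the exact perf_buffer__new() signature varies between libbpf versions):

#include <stdio.h>
#include <linux/types.h>
#include <bpf/libbpf.h>

/* Called for every captured event; iotrace would translate the raw event
 * into its protobuf representation and append it to octf.trace.<cpu>. */
static void handle_event(void *ctx, int cpu, void *data, __u32 size)
{
    printf("cpu %d: got %u bytes of trace data\n", cpu, size);
}

/* Called when the kernel had to drop events because the buffer was full. */
static void handle_lost(void *ctx, int cpu, __u64 cnt)
{
    fprintf(stderr, "cpu %d: lost %llu events\n", cpu, (unsigned long long) cnt);
}

int poll_events(int perf_map_fd)
{
    /* 64 pages per CPU for the ring buffer; the value is illustrative. */
    struct perf_buffer *pb = perf_buffer__new(perf_map_fd, 64, handle_event,
                                              handle_lost, NULL, NULL);
    if (!pb)
        return -1;

    /* Poll until an error occurs; the callbacks fire for every entry. */
    while (perf_buffer__poll(pb, 100 /* ms */) >= 0)
        ;

    perf_buffer__free(pb);
    return 0;
}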

You can see the results of translating into the above protobuf format by executing the following command:

iotrace --trace-parser --io --path "kernel/2020-07-02_08:52:51" --raw

Output:

...
{"header":{"sid":"1","timestamp":"14193058940837"},"deviceDescription":{"id":"271581186","name":"nvme0n1","size":"732585168","model":"INTEL SSDPED1K375GA"}}
{"header":{"sid":"73","timestamp":"14196894550578"},"io":{"lba":"1652296","len":256,"ioClass":19,"deviceId":"271581186","operation":"Write","flush":false,"fua":false,"writeHint":0,"id":"110842991263647"}}
{"header":{"sid":"74","timestamp":"14196894550696"},"filesystemMeta":{"refSid":"110842991263647","fileId":{"partitionId":"271581186","id":"76","creationDate":"2020-07-02T06:52:55.712990641Z"},"fileOffset":"0","fileSize":"241960"}}
...

After tracing is complete, these individual trace events may be parsed, combined and translated into different Google Protocol Buffer messages (or other formats, such as CSV) by executing Standalone Linux IO Tracer trace parser commands.

For example, the --trace-parser --io command analyzes multiple submission, split and completion events to give a more complete view of a given IO request, such as its latency, queue depth, file size and path (if applicable).
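
As a simplified illustration of that combination step (not the actual parser code), latency can be derived by pairing a submission event with its completion via the shared IO id, and queue depth can be tracked by counting outstanding IOs; the structures below are trimmed-down assumptions:

#include <stdint.h>

/* Trimmed-down stand-ins for the submission and completion events. */
struct submission { uint64_t id; uint64_t timestamp; };
struct completion { uint64_t ref_id; uint64_t timestamp; };

static uint64_t outstanding;    /* number of IOs submitted but not completed */

/* On submission: the current queue depth is the number of outstanding IOs. */
static uint64_t on_submission(const struct submission *s)
{
    return ++outstanding;
}

/* On completion: latency is the delta between the two timestamps (ns). */
static uint64_t on_completion(const struct submission *s,
                              const struct completion *c)
{
    outstanding--;
    return c->timestamp - s->timestamp;
}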

Examples

  • Start tracing two block devices for 1 hour, or until the trace file reaches 1 GiB:

    sudo iotrace --start-tracing --devices /dev/sda,/dev/sdb1 --time 3600 --size 1024
    
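    Here --time is given in seconds (3600 s = 1 hour), while --size appears to be expressed in MiB (1024 corresponding to 1 GiB).
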
  • The output examples below are based on sample traces found here. The traces were captured during a YCSB workload type A benchmark run on RocksDB.

    Traces can be unpacked using the following command:

    tar -xf sample_trace.tar.xz
    

    They can then be moved to the trace repository directory, the path of which can be extracted using:

    iotrace --trace-config --get-trace-repository-path
    
  • List created traces:

    iotrace --trace-management --list-traces
    

    Output:

    {
      "trace": [
        {
          "tracePath": "kernel/2020-07-02_08:52:51",
          "state": "COMPLETE",
          "tags": {}
        }
      ]
    }
    
  • Parse traces (note usage of path returned in --list-traces):

    iotrace --trace-parser --io --path "kernel/2020-07-02_08:52:51" --format json
    

    Output:

    {"header":{"sid":"1","timestamp":"3835590186"},"io":{"lba":"1652296","len":256,"ioClass":19,"operation":"Write","flush":false,"fua":false,"error":false,"latency":"83797","qd":"1","writeHint":0},"device":{"id":"271581186","name":"nvme0n1","partition":"271581186","model":"INTEL SSDPED1K375GA"},"file":{"id":"76","offset":"0","size":"241960","path":"/000014.sst","eventType":"Access","creationDate":"2020-07-02T06:52:55.712990641Z"}}
    {"header":{"sid":"2","timestamp":"3835625267"},"io":{"lba":"1652552","len":256,"ioClass":19,"operation":"Write","flush":false,"fua":false,"error":false,"latency":"95069","qd":"2","writeHint":0},"device":{"id":"271581186","name":"nvme0n1","partition":"271581186","model":"INTEL SSDPED1K375GA"},"file":{"id":"76","offset":"256","size":"241960","path":"/000014.sst","eventType":"Access","creationDate":"2020-07-02T06:52:55.712990641Z"}}
    {"header":{"sid":"3","timestamp":"3835638717"},"io":{"lba":"1652808","len":256,"ioClass":19,"operation":"Write","flush":false,"fua":false,"error":false,"latency":"130094","qd":"3","writeHint":0},"device":{"id":"271581186","name":"nvme0n1","partition":"271581186","model":"INTEL SSDPED1K375GA"},"file":{"id":"76","offset":"512","size":"241960","path":"/000014.sst","eventType":"Access","creationDate":"2020-07-02T06:52:55.712990641Z"}}
    {"header":{"sid":"4","timestamp":"3835652180"},"io":{"lba":"1653064","len":256,"ioClass":19,"operation":"Write","flush":false,"fua":false,"error":false,"latency":"203209","qd":"4","writeHint":0},"device":{"id":"271581186","name":"nvme0n1","partition":"271581186","model":"INTEL SSDPED1K375GA"},"file":{"id":"76","offset":"768","size":"241960","path":"/000014.sst","eventType":"Access","creationDate":"2020-07-02T06:52:55.712990641Z"}}
    ...
    

    NOTE: Any mention of LBA assumes a 512 B sector size, even if the underlying drive is formatted with 4096 B sectors. Similarly, a 512 B sector is the unit of the I/O length (len).
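
    For example, in the parsed output above, lba 1652296 with len 256 corresponds to a byte offset of 1652296 × 512 B = 845,975,552 B (about 807 MiB) and a request size of 256 × 512 B = 128 KiB.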

  • Show trace statistics:

    iotrace --trace-parser --statistics -p "kernel/2020-07-02_08:52:51/"
    

    Output:

    "statistics": [
    {
     "desc": {
      "device": {
       "id": "271581186",
       "name": "nvme0n1",
       "size": "732585168" >  NOTE:  In sectors
      }
     },
     "duration": "24650525234",  NOTE:  In nanoseconds
     ...
     "write": {  NOTE:  Statistics for writes
      "size": {
       "unit": "sector",
       "average": "254",  NOTE:  Average write I/O size in sectors
       "min": "1",  NOTE:  Minimum write I/O size in sectors
       "max": "256",  NOTE:  Maximum write I/O size in sectors
       "total": "16338524",  NOTE:  Total write I/O size in sectors
       ...
      "latency": {
       "unit": "ns",
       "average": "15071365",  NOTE:  Average write I/O latency in ns
       "min": "9942",  NOTE:  Minimum write I/O latency in ns
       "max": "63161202",  NOTE:  Maximum write I/O latency in ns
      ...
      "count": "64188",  NOTE:  Number of write operations
      "metrics": {
       "throughput": {
        "unit": "IOPS",
        "value": 2603.9201757643164  NOTE:  Average write IOPS
       },
       "workset": {
        "unit": "sector",
        "value": 2237616  NOTE:  Number of distinct sectors written
       },
       "bandwidth": {
        "unit": "MiB/s",
        "value": 323.63590009317039  NOTE:  Average write bandwidth
       }
    ...
    

    NOTE: Similar statistics exist for any read, discard or flush operations. There's a section for the combined statistics as well (Total).
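
    As a cross-check, the write metrics above follow from the other fields: 16338524 sectors × 512 B ≈ 8.37 GB written over a duration of 24650525234 ns ≈ 24.65 s gives roughly 323.6 MiB/s of bandwidth, and 64188 writes / 24.65 s ≈ 2603.9 IOPS, matching the reported values.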

  • Show file system statistics:

    iotrace --trace-parser --fs-statistics -p "kernel/2020-07-02_08:52:51/"
    

    Output:

    "entries": [
     {
      "deviceId": "271581186",
      "partitionId": "271581186",
      "statistics": {
     ...
     "write": {
       "size": {
        "unit": "sector",
        "average": "255",  NOTE:  Average write I/O size in sectors
        "min": "8",  NOTE:  Minimum write I/O size in sectors
        "max": "256",  NOTE:  Maximum write I/O size in sectors
        "total": "16336216",  NOTE:  Total write I/O size in sectors
        "percentiles": {}
       },
       "count": "63984",
       "metrics": {
        "throughput": {
         "unit": "IOPS",
         "value": 2602.9390810897676
        },
        "workset": {
         "unit": "sector",
         "value": 2237096  NOTE:  Number of distinct sectors written
        },
        "write invalidation factor": {
         "unit": "",
         "value": 7.3024206381845032  NOTE:  Average number of times a given sector in the workset is written to
        },
        "bandwidth": {
         "unit": "MiB/s",
         "value": 324.49957478019985  NOTE:  Average write bandwidth
        }
       },
       ...
       "directory": "/"  NOTE:  Directories are relative to the filesystem mountpoint
    

    NOTE: Similar statistics exist for any read, discard or flush operations. There's a section for the combined statistics as well (Total).

    NOTE: File system statistics are gathered for detected groups of related I/O requests with common attributes like directory, file extension, file name prefix or IO class.
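
    As a cross-check, the write invalidation factor above equals the total number of sectors written divided by the workset: 16336216 / 2237096 ≈ 7.30, i.e. each distinct sector in the workset was written about 7.3 times on average.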

  • Show latency histogram:

    iotrace --trace-parser --latency-histogram -p "kernel/2020-07-02_08:52:51/"
    

    Output:

    ...
     "write": {
      "unit": "ns",
      "range": [
       {
        "begin": "0",
        "end": "0",
        "count": "0"
       },
       ...
       {
        "begin": "8192",  NOTE:  Minimum bound of latency histogram bucket in nanoseconds
        "end": "16383",  NOTE:  Maximum bound of latency histogram bucket in nanoseconds
        "count": "95"  NOTE:  Number of I/O requests in the bucket
       },
    ...
    

    NOTE: In the above example 95 write requests had latency between 8192 and 16383 ns.

    NOTE: Similar statistics exist for any read, discard or flush operations. There's a section for the combined statistics as well (Total).

  • Show LBA histogram:

    iotrace --trace-parser --lba-histogram -p "kernel/2020-07-02_08:52:51/"
    

    Output:

    ...
     "write": {
      "unit": "sector",
      "range": [
       {
        "begin": "0",   NOTE:  Minimum disk LBA bound of LBA histogram bucket
        "end": "20479",  NOTE:  Maximum disk LBA bound of LBA histogram bucket
        "count": "177"  NOTE:  Number of I/O requests in the bucket
       },
    ...
    

    NOTE: In the above example 177 write requests were issued to disk LBAs 0 through 20479.
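
    Given the 512 B sector unit, LBAs 0 through 20479 cover the first 20480 × 512 B = 10 MiB of the device.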

    NOTE: Similar statistics exist for any read, discard or flush operations. There's a section for the combined statistics as well (Total).

Tests

See our tests README

Contributing

Please refer to the OCTF contributing guide.

Related Projects

Please explore related projects, such as the Open CAS Telemetry Framework (OCTF).

Notice

See NOTICE for more information.


standalone-linux-io-tracer's Issues

Zero copy IO tracing in kernel

At the moment we copy the IO trace event into the circular buffer. Instead, we should get a buffer from the circular buffer and fill the IO trace event directly in that buffer.

Missing '99.99' latency percentile

When printing trace statistics (iotrace -P -S -p <trace_path>), the value of the 99.990000th latency percentile is missing:
"latency": { "unit": "ns", "average": "8094", "min": "6851", "max": "5929010", "count": "262144", "total": "2121846323", "percentiles": { "90.000000th": 12288, "99.900000th": 6291456, "99.000000th": 24576 } }
Also notice that the size percentiles are printed correctly:
"size": { "unit": "sector", "average": "8", "min": "8", "max": "8", "count": "262144", "total": "2097152", "percentiles": { "99.000000th": 9.5, "90.000000th": 8.5, "99.900000th": 10.5, "99.990000th": 11.5 }

Traces description is inconsistent between device statistics and FS statistics

When calculating device statistics you get this nice header informing you which device the traces were collected on:
"device": { "id": "271581184", "name": "nvme0n1", "size": "781422768" }

On the other hand, the header of FS stats looks entirely different:
"deviceId": "271581184", "partitionId": "271581184",

Is it possible to get the same level of detail in FS statistics as in device statistics?

Improve IO Parsing

Enhancement:

  • IO trace events and related ones should be printed on one line
  • IO trace processing should add the latency of the IO

Starting trace fails if iotrace module is loaded before that

Test scenario:

  1. Install iotrace
  2. modprobe iotrace
  3. start tracing (e.g. iotrace -S -d /dev/nvme0n1)

Output after starting and finishing trace:

{
 "timestamp": "2020-02-18 00:31.539",
 "system": "iotrace",
 "severity": "Error",
 "trace": "Failed to open trace file: /proc/iotrace//trace_ring.0"
}
^C{
 "timestamp": "2020-02-18 00:31.544",
 "system": "iotrace",
 "severity": "Error",
 "trace": "Tracing not completed, trace path kernel/2020-02-18_00:31:25"
}

Without the modprobe command, starting succeeds properly.

OS: Ubuntu 18.04.3 LTS
kernel: 5.0.0-23-generic

Unsupported operands in test_lba_histogram

Test test_lba_histogram in tests/functional/test_trace_io_events.py fails due to changes in the test API.

def test_lba_histogram():
    TestRun.LOGGER.info("Testing lba histogram")
    iotrace = TestRun.plugins['iotrace']
    number_buckets = 32
    bucket_size = Size(256)
    start_lba = 10240
    end_lba = start_lba + number_buckets * bucket_size

E TypeError: unsupported operand type(s) for +: 'int' and 'Size'

`count` in trace stats is confusing.

"write": { "size": { "unit": "sector", "average": "8", "min": "8", "max": "8", **"count": "262144",** "total": "2097152", "percentiles": { "90.000000th": 8.5, "99.900000th": 10.5, "99.990000th": 11.5, "99.000000th": 9.5 } }, "latency": { "unit": "ns", "average": "8235", "min": "6932", "max": "5954144", **"count": "262144",** "total": "2158819326", "percentiles": { "99.900000th": 6291456, "99.000000th": 24576, "90.000000th": 12288 } }, ... }

The count field value means the number of events. It should be moved out of the latency and size structs.

Setting incorrect values in /proc/iotrace/size

I treat writing to the size file as an equivalent of setting --buffer in a start command. Bearing that in mind:

  1. You can write values greater than 1024 directly to the file (1024 is max for a start command);
  2. After running a start command, the parameter set in the size file is smaller (by 1) than the value used in the start command. With no parameter set (default 100), the size file contains 99. I don't know if that is intended, but it is certainly confusing.

Trace logical addresses when operating on partition

Trace the logical address instead of the physical one when operating on a partition of a device.
For instance, the logs below show a write to the first block within the /dev/nvme0n1p1 partition, while the traces (collected from the partition device, /dev/nvme0n1p1, not /dev/nvme0n1) indicate a write being sent to address 2048 (the start of the /dev/nvme0n1p1 partition). It seems more logical to record the trace offset within the bounds of the partition if the user specifies the partition as the device to be traced.

[root@iotrace iotrace]# fdisk -l /dev/nvme0n1

Units = sectors of 1 * 4096 = 4096 bytes
Sector size (logical/physical): 4096 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk label type: dos
Disk identifier: 0x3c1bcb0d

        Device Boot      Start         End      Blocks   Id  System
/dev/nvme0n1p1             256     7078143    28311552   83  Linux

[root@iotrace iotrace]# iotrace -S -d /dev/nvme0n1p1 &
[1] 58924
[root@iotrace iotrace]# dd if=/dev/zero of=/dev/nvme0n1p1 bs=4k count=1 seek=0 oflag=dsync
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 0.000327407 s, 12.5 MB/s

[root@iotrace iotrace]# kill 58924
...
{
 "tracePath": "kernel/2019-08-20_05:41:32",
 "state": "COMPLETE",
 "sourceNode": {
  "node": [
   {
    "id": "kernel"
   }
  ]
 },
 "traceStartDateTime": "2019-08-20 05:41",
 "traceDuration": "7",
 "traceSize": 1,
 "tracedEvents": "61",
 "droppedEvents": "0",
 "queueCount": "36"
}

[root@iotrace iotrace]# iotrace --parse-trace --path kernel/2019-08-20_05:41:32 | grep Write
{"header":{"sid":"60473671","timestamp":"525077100448357"},"io":{"lba":"2048","len":8,"ioClass":1,"deviceId":"271581184","operation":"Write","flush":false,"fua":false}}
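
(For reference: with the 4096 B units shown by fdisk, the partition start of 256 corresponds to 256 × 4096 B = 1048576 B, which is sector 2048 in the tracer's 512 B LBA units, matching the traced lba above.)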

get-fs-statistics not generating output

I have tried several captured traces to determine what -get-fs-statistics reports, but the output from the various traces is empty.

What is the expected behaviour?

Wrong permissions for config file and trace dir/files

Config file has proper permissions for root but too broad permissions for group/others (should have none, has read).
For trace dir/files expected permissions are 770/600 respectively.
For the default trace dir the actual permissions are 755; for all nested dirs and non-default trace paths they are 760; for trace files, 644.

Incomplete instructions to install io tracer and octf on brand new RHEL 7.6

Started with a brand new RHEL 7.6 build (server with GUI), system registered with RHEL.
Downloaded the latest Git sources for the standalone tracer.

Following the README.md, I ran the setup_dependencies.sh for the standalone iotracer (log attached); it failed.

Downloaded the latest Git sources for OCTF and ran its setup_dependencies.sh (log attached).

I am unable to build the tools.

Are there different readmes?
error-octf.log
error-iotrace.log

Traced data for RAID volume or RAID members not consistent

Traced information from a VROC volume or from drives used in a RAID volume is not reporting io.lba correctly, and the latency appears to be zero.

A simple RAID0 volume was constructed
iotrace --start-trace --devices /dev/md126 -t 400
a fio mixed workload was started

No latency or qd was captured

Then repeat of workload but this time capture a trace of the 2 devices that were striped in the RAID volume
iotrace --start-trace --devices /dev/nvme0n1,/dev/nvme1n1 -t 400
Now some latency is shown; the iodepth is the same as above.

In both cases, the io.lba ranges from 0 to very large numbers

Please advise where to post the 115M and 220M trace files and the large xls. I saved the first 20 lines of the parsed output.
parsed-md126-vroc-raid0-20lines.xlsx
parsed-2drives-vroc-raid0-20lines.xlsx

Traces of NVMe admin commands

Hi,
I am not sure if this is correct place to raise these queries.

  1. I am planning to use standalone-linux-io-tracer for capturing traces of my NVMe drive. I could successfully capture traces when IOs are running on the drive, but when admin commands are run no traces are captured. Does this tool capture traces for IOs only? Is there any option to enable traces for admin commands? If not, is there any plan to enhance the tool to do so?
  2. Is there any option to stop tracing in between? How can I stop tracing before the specified trace duration elapses?

Support for latest kernel versions

IOtracer works fine with kernel 3.10, but after upgrading from 3.10 to 5.3.8 it started giving the error 'Kernel tracing module is not loaded.'. I tried to re-install it on kernel 5.3.8, but re-installation also fails. I tried to debug the issue; it seems that the directory layouts of 3.10 and 5.3.8 are completely different, and this causes the IOtracer failure. Is there any plan to support IOtracer on the latest kernel versions? Is there any workaround that I can try for now?

In trace statistics average latency has wrong value

Division of unsigned long longs gives a wrong result. I tried casting the divisor and the average variable to double (also long double), but it didn't work. Maybe somebody else has an idea for this "simple" arithmetic operation.

Invalid discard section content

After issuing IO with the following fio command:
fio --eta=always --output-format=json --filename=/dev/nvme0n1 --ioengine=sync --size=1073741824 --blocksize=4096 --iodepth=32 --numjobs=1 --direct=1 --readwrite=randtrim --name=fio --group_reportin --lat_percentiles=1 --percentile_list=100 --clat_percentiles=0 --slat_percentiles=1

the discard section of the iotrace stats (iotrace -P -S -p <trace_path>) has the following value:
"discard": { "size": { "unit": "sector", "average": "8", "min": "8", "max": "8", "count": "262144", "total": "2097152", "percentiles": { "90.000000th": 8.5, "99.900000th": 10.5, "99.990000th": 11.5, "99.000000th": 9.5 } }, "latency": { "unit": "ns", "average": "0", "min": "0", "max": "0", "count": "0", "total": "0", "percentiles": {} }, "metrics": { "bandwidth": { "unit": "MiB/s", "value": 23.28547713835875 }, "throughput": { "unit": "IOPS", "value": 5961.08214741984 }, "workset": { "unit": "sector", "value": 0 } }, "errors": "0" },

The latency and workset subsections are filled with zeroes.

[Sighting] Kernel BUG octf_trace_push+0x19/0x240

NOT REPRODUCED!

[ 5797.598826] BUG: unable to handle kernel NULL pointer dereference at 000000000000000a
[ 5797.607974] IP: [] octf_trace_push+0x19/0x240 [iotrace]
[ 5797.616049] PGD 0
[ 5797.619308] Oops: 0000 [#1] SMP
[ 5797.623788] Modules linked in: intelcas(OE) inteldisk(OE) iotrace(OE) brd xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack libcrc32c ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter devlink nls_utf8 isofs loop rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc dm_mirror dm_region_hash dm_log dm_mod sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel ast ttm cryptd igb drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops iTCO_wdt drm ptp iTCO_vendor_support sg pps_core ioatdma joydev pcspkr i2c_algo_bit ipmi_si ipmi_devintf ipmi_msghandler acpi_pad mei_me mei lpc_ich acpi_power_meter i2c_i801 shpchp i2c_core dca ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic ahci libahci crct10dif_pclmul crct10dif_common libata crc32c_intel nvme nvme_core wmi [last unloaded: inteldisk]
[ 5797.730719] CPU: 67 PID: 0 Comm: swapper/67 Kdump: loaded Tainted: G IOE ------------ 3.10.0-862.el7.x86_64 #1
[ 5797.743443] Hardware name: Supermicro SYS-1028U-TR4+/X10DRU-i+, BIOS 1.0 08/11/2014
[ 5797.752824] task: ffff8a3cb4450fd0 ti: ffff8a3cb4458000 task.ti: ffff8a3cb4458000
[ 5797.762068] RIP: 0010:[] [] octf_trace_push+0x19/0x240 [iotrace]
[ 5797.773133] RSP: 0018:ffff8a441f3c3eb8 EFLAGS: 00010002
[ 5797.780299] RAX: 0000000010300002 RBX: ffffffffc08446c0 RCX: 00000000001d9149
[ 5797.789318] RDX: 0000000000000048 RSI: ffff8a441f3c3ef8 RDI: 0000000000000002
[ 5797.798360] RBP: ffff8a441f3c3ed8 R08: 000000002baa5cd0 R09: 0000000000000000
[ 5797.807422] R10: 00000000003d04d5 R11: 7fffffffffffffff R12: 000000002baa5cd0
[ 5797.816511] R13: 0000000010300002 R14: 0000000000000002 R15: 0000000000000043
[ 5797.825634] FS: 0000000000000000(0000) GS:ffff8a441f3c0000(0000) knlGS:0000000000000000
[ 5797.835676] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5797.843332] CR2: 000000000000000a CR3: 0000000a1e80e000 CR4: 00000000001607e0
[ 5797.852361] Call Trace:
[ 5797.856688]
[ 5797.858639] [] iotrace_trace_desc+0xca/0x120 [iotrace]
[ 5797.869349] [] iotrace_bdev_add_oncpu+0x7b/0xa0 [iotrace]
[ 5797.878245] [] flush_smp_call_function_queue+0x63/0x130
[ 5797.886921] [] generic_smp_call_function_single_interrupt+0x13/0x30
[ 5797.896647] [] smp_call_function_interrupt+0x2d/0x40
[ 5797.905096] [] call_function_interrupt+0x162/0x170
[ 5797.913371]
[ 5797.915318] [] ? cpuidle_enter_state+0x57/0xd0
[ 5797.925327] [] cpuidle_idle_call+0xde/0x230
[ 5797.933057] [] arch_cpu_idle+0xe/0x40
[ 5797.940262] [] cpu_startup_entry+0x14a/0x1e0
[ 5797.948093] [] start_secondary+0x1f2/0x270
[ 5797.955766] [] start_cpu+0x5/0x14
[ 5797.962669] Code: fa 01 75 c1 48 c7 40 10 01 00 00 00 eb d1 0f 1f 40 00 0f 1f 44 00 00 55 48 85 ff 48 89 e5 41 56 41 55 41 54 53 0f 84 03 02 00 00 <48> 8b 4f 08 41 89 d3 48 ba 89 39 9b 16 14 00 00 00 b8 ea ff ff
[ 5797.986823] RIP [] octf_trace_push+0x19/0x240 [iotrace]
[ 5797.995782] RSP
[ 5798.001382] CR2: 000000000000000a

Centos7 - 3.10.0-1062 Kernel Crash - during tracing

OS: CentOS 7, 3.10.0-1062.el7.x86_64
Crash occurs when workload starts during tracing.

Stack:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
IP: [] bio_complete_event+0xac/0x110 [iotrace]

[ 921.400604] Call Trace:
[ 921.400824]
[ 921.401007] [] ? deinit_tracers.isra.3+0x90/0x90 [iotrace]
[ 921.401683] [] iotrace_block_rq_complete+0x4c/0x70 [iotrace]
[ 921.402348] [] blk_update_request+0x116/0x360
[ 921.402885] [] scsi_end_request+0x34/0x1e0
[ 921.403419] [] scsi_io_completion+0x168/0x6a0
[ 921.403955] [] ? irq_exit+0xda/0x110
[ 921.404432] [] scsi_finish_command+0xdc/0x140
[ 921.404966] [] scsi_softirq_done+0x132/0x160
[ 921.405514] [] blk_done_softirq+0x96/0xc0
[ 921.406030] [] __do_softirq+0xf5/0x280
[ 921.406521] [] call_softirq+0x1c/0x30
[ 921.406995] [] do_softirq+0x65/0xa0
[ 921.407479] [] irq_exit+0x105/0x110
[ 921.407938] [] do_IRQ+0x56/0xf0
[ 921.408401] [] common_interrupt+0x162/0x162
[ 921.408914]
[ 921.409110] [] ? ata_sff_pio_task+0x99/0x1d0 [libata]
[ 921.409734] [] process_one_work+0x17f/0x440
[ 921.410271] [] worker_thread+0x126/0x3c0
[ 921.410756] [] ? manage_workers.isra.26+0x2a0/0x2a0
[ 921.411343] [] kthread+0xd1/0xe0
[ 921.411777] [] ? insert_kthread_work+0x40/0x40
[ 921.412338] [] ret_from_fork_nospec_begin+0x21/0x21
[ 921.412917] [] ? insert_kthread_work+0x40/0x40

I narrowed it down to a NULL pointer dereference of bio->bi_bdev in bio_complete_event(), in
dev_id = disk_devt(IOTRACE_BIO_GET_DEV(bio));
(inside the macro)

Trace data - io.latency and io.qd may not be accurate

I used iotrace and a storage snapshot to collect data using a simple FIO script:
fio --time_based --name=test --filename=/mnt/data/test --filesize=100G --rw=randrw --rwmixwrite=100 --bssplit=4k --direct=1 --do_verify=0 --eta=never --iodepth=8 --ioengine=libaio --runtime=300 --ramp_time=30 --refill_buffers --numjobs=16 --randrepeat=0 --group_reporting --thread --random_distribution=zipf:1.1 --iodepth_batch_complete_min=8 --iodepth_low=8 --iodepth_batch_submit=8

iotrace-200lines.xlsx
