yactfr

yactfr (yac·tif·er) is yet another CTF reader.

The yactfr library is written in C++14 and offers a (🥁…) C++14 API.

While the CTF reading libraries that I know about focus on decoding and providing you with complete, ordered event record objects, the yactfr API offers a lower level of CTF processing: you iterate individual element sequences to obtain elements such as the beginning/end of a packet, the beginning/end of an event record, the beginning/end of a structure, individual data stream scalar values like fixed-length integers, fixed-length floating point numbers, and null-terminated strings, a specific clock value update, a known data stream ID, and so on. This means the sizes of event records and packets don’t influence the performance or memory usage of yactfr.
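
To illustrate the idea of consuming a flat stream of elements instead of complete event record objects, here is a minimal, self-contained sketch. The `ElemKind`, `Elem`, `Summary`, and `summarize()` names are hypothetical and are not part of the yactfr API; the vector only stands in for elements that a real decoder would produce one at a time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical element kinds: these are NOT the yactfr element classes.
enum class ElemKind {
    PacketBeginning,
    EventRecordBeginning,
    UnsignedInteger,
    EventRecordEnd,
    PacketEnd,
};

// Hypothetical flat element: one small value per decoding step.
struct Elem {
    ElemKind kind;
    std::uint64_t value; // only meaningful for `UnsignedInteger`
};

struct Summary {
    std::size_t eventRecordCount = 0;
    std::uint64_t integerSum = 0;
};

// Visits each element once and keeps two counters only: the state
// doesn't grow with the size of an event record or of a packet.
inline Summary summarize(const std::vector<Elem>& elems)
{
    Summary summary;
    bool inEventRecord = false;

    for (const auto& elem : elems) {
        switch (elem.kind) {
        case ElemKind::EventRecordBeginning:
            assert(!inEventRecord);
            inEventRecord = true;
            ++summary.eventRecordCount;
            break;
        case ElemKind::EventRecordEnd:
            assert(inEventRecord);
            inEventRecord = false;
            break;
        case ElemKind::UnsignedInteger:
            summary.integerSum += elem.value;
            break;
        default:
            break;
        }
    }

    return summary;
}
```

The consumer decides what to keep (here, a count and a sum), which is why this style suits targeted analyses with low memory usage.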

Some use cases where this library can prove useful:

  • Trace inspection programs, for example when you need to debug a custom CTF producer.

  • Foundation of an API with a higher level of abstraction.

  • Targeted offline analyses with low memory usage.

The name yactfr takes its inspiration from yajl, a JSON parser with a similar approach.

Warning
yactfr is not mature yet! Its API is still experimental: the interface could change at any time.

Notable features

  • Full CTF 1.8.3 and CTF 2 support.

  • Only depends on Boost at build time, and nothing at run time (except for your C++ runtime library):

    $ ldd libyactfr.so
            linux-vdso.so.1 (0x00007ffe98fa5000)
            libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007fb8337d2000)
            libm.so.6 => /usr/lib/libm.so.6 (0x00007fb83343d000)
            libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x00007fb833225000)
            libc.so.6 => /usr/lib/libc.so.6 (0x00007fb832e69000)
            /usr/lib64/ld-linux-x86-64.so.2 (0x00007fb833e51000)
  • Simple, documented C++14 API with a focus on immutability.

  • Efficient data stream decoding: zero memory allocation, zero smart pointer reference count change, zero virtual method call, and zero copy during an element sequence iteration.

    • yactfr decodes a data stream fixed-length bit array in place: the result is available as an element member.

    • A data stream string/BLOB is available as one or more raw data parts in consecutive elements. The actual raw data is a pointer to the current data block of the data source (for example, from a memory mapped region), which means zero string/BLOB copy.

      This is possible because, as per CTF 1.8.3 and CTF2-SPEC-2.0, data stream strings and BLOBs have an 8-bit alignment requirement.

  • Decodes CTF packets of any size and event records of any size with steady memory usage and performance.

  • Offers a memory mapped file view data source, but you can also implement your own data source.

  • It’s safe to use two different iterators on the same element sequence in two different threads.

  • Full CTF metadata text inspection and validation: a parsing error exception contains a list of locations (offset in bytes, line number, and column number) with associated context messages.

  • Full data stream packet decoding information with the input iterator interface, for example:

    • The decoded packet magic number.

    • The decoded metadata stream UUID.

    • The expected total length of the current packet.

    • The expected content length of the current packet.

    • The updated value of the default clock.

    • The next data stream type to use.

    • The next event record type to use.

    • The data stream (instance) ID.

  • “Seek packet” iterator method to seek the beginning of a packet known to be located at a specific offset, in bytes, within the same element sequence.

    You can use this method with non-CTF packet index information, for example LTTng's packet index files (not directly supported by yactfr).

  • Decoding error reporting with a precise message, the offset (in bits) within the element sequence where this message applies, and other relevant properties.

  • Performance (nop iteration loop) is similar to Babeltrace 1.5.3’s (-o dummy) with a release build.

    I don’t publish numbers: experiment by yourself to confirm or deny this claim.
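
The zero-copy string handling mentioned in the list above can be sketched as follows. This is only an illustration under assumptions: `RawPart` and `stringPart()` are hypothetical names, not yactfr API. Because a string begins on a byte boundary, a part of it can be described by two pointers into the current data block, without copying any byte.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical non-owning view of string bytes within a data block.
struct RawPart {
    const std::uint8_t *begin;
    const std::uint8_t *end; // one past the last byte of this part
    bool endsString;         // true if the null terminator is in this block
};

// Returns the part of a null-terminated string found in
// [`cur`, `blockEnd`): no byte is copied, only pointers are computed.
inline RawPart stringPart(const std::uint8_t * const cur,
                          const std::uint8_t * const blockEnd)
{
    assert(cur <= blockEnd);

    const auto nullPos = std::find(cur, blockEnd, std::uint8_t {0});

    return RawPart {cur, nullPos, nullPos != blockEnd};
}
```

When `endsString` is false, the string continues in the next data block, which is one reason a single string can span several consecutive parts.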

Limitations

Note that after a few years of working with all sorts of real-life CTF traces, I never found one which yactfr refused to decode, but in the interest of honesty:

  • Only builds on a Linux/Unix platform.

    The non-portable part is the memory mapped file source which uses system functions such as open(), close(), fstat(), mmap(), and madvise().

  • Decodes up to, and including, 64-bit signed/unsigned CTF fixed-length bit arrays and integers (no “big integers”).

  • Decodes up to, and including, 63-bit (effective, which means 72 total) signed/unsigned CTF variable-length integers.

  • Only decodes 32-bit and 64-bit CTF fixed-length floating point numbers.

  • Packet lengths must be multiples of 8 bits (I’m still not sure, but this could be enforced by the specification anyway), and be at least 8 bits.

  • TSDL (CTF 1.8): Doesn’t handle single-line comment continuation when the line ends with \:

    event {
        name = "hello"; // this is a comment \
        id = 23; we're still in the comment started above here
        id = 42;
        ...
    };
  • TSDL (CTF 1.8): Doesn’t support relative dynamic-length array type lengths and variant type selectors in data type aliases (or named structure/variant types) which target structure member types outside this data type alias.

    For example, this is not supported (TSDL):

    fields := struct {
        int len;
    
        typealias struct {
            int sequence[len];
        } := my_struct;
    
        struct {
            int len;
            my_struct a_struct;
        } field;
    };

    This is also not supported (TSDL):

    fields := struct {
        enum {
            ...
        } tag;
    
        variant my_variant <tag> {
            ...
        } a_variant;
    
        my_variant the_variant;
    };

    The example above would work, however, if the selector location of the variant type were absolute:

    fields := struct {
        enum {
            ...
        } tag;
    
        variant my_variant <event.fields.tag> {
            ...
        } a_variant;
    
        my_variant the_variant;
    };
  • API and ABI backward compatibility is not guaranteed at this point.

    Please rebuild your project if you change the yactfr version.
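
The “63-bit effective, 72 total” figure for variable-length integers in the list above is consistent with a 9-byte limit on an LEB128-style encoding, where each byte carries 7 data bits and 1 continuation bit (9 × 7 = 63 and 9 × 8 = 72). Here is a minimal sketch of such an unsigned decoder, assuming LEB128 encoding; `VarIntResult` and `decodeUnsignedLeb128()` are hypothetical names, and this is not the yactfr implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical decoding result.
struct VarIntResult {
    bool ok;              // false if no terminating byte within the limit
    std::uint64_t value;  // decoded value (valid if `ok`)
    std::size_t lenBytes; // number of bytes consumed (valid if `ok`)
};

// Decodes an unsigned LEB128 integer of at most 9 bytes:
// 9 x 7 = 63 data bits within 9 x 8 = 72 total bits.
inline VarIntResult decodeUnsignedLeb128(const std::uint8_t * const buf,
                                         const std::size_t size)
{
    assert(buf || size == 0);

    constexpr std::size_t maxBytes = 9;
    std::uint64_t value = 0;

    for (std::size_t i = 0; i < size && i < maxBytes; ++i) {
        const std::uint64_t byte = buf[i];

        // the low 7 bits are data, least significant group first
        value |= (byte & 0x7f) << (7 * i);

        if ((byte & 0x80) == 0) {
            // continuation bit cleared: this is the last byte
            return VarIntResult {true, value, i + 1};
        }
    }

    return VarIntResult {false, 0, 0};
}
```

For example, the bytes `0xe5 0x8e 0x26` decode to 624485 using three bytes, while a run of ten bytes with the continuation bit set is rejected.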

Build and install yactfr

Make sure you have the build time requirements:

  • Linux/Unix platform

  • CMake ≥ 3.10.0

  • C++14 compiler

  • Boost ≥ 1.58

  • If you build the API documentation: Doxygen

Build and install yactfr from source
$ git clone https://github.com/eepp/yactfr
$ cd yactfr
$ mkdir build
$ cd build
$ cmake -DCMAKE_BUILD_TYPE=release ..
$ make
# make install

You can specify your favorite C and C++ compilers with the usual CC and CXX environment variables when you run cmake, and additional options with CFLAGS and CXXFLAGS.

Specify -DOPT_BUILD_DOC=YES to cmake to enable the HTML API documentation build (requires Doxygen). The documentation is available in BUILD/doc/api/output/html, where BUILD is your build directory.

Specify -DCMAKE_INSTALL_PREFIX=PREFIX to cmake to install yactfr to the PREFIX directory instead of the default /usr/local directory.

For example, this is how I run cmake for development:

$ CC=clang CXX=clang++ CXXFLAGS='-Wextra -Wall -pedantic' \
  cmake .. -DCMAKE_BUILD_TYPE=debug -DOPT_BUILD_DOC=ON

For production, you should make a release build:

$ CC=clang CXX=clang++ \
  cmake .. -DCMAKE_BUILD_TYPE=release -DOPT_BUILD_DOC=ON

Run the tests

Once you have built the project in the build directory, you can run the tests. You need Python 3 and pytest.

Run the yactfr tests from the build directory.
$ make check

If you’re in a hurry and you have the pytest-xdist package, you can parallelize the testing process. You need to set the YACTFR_BINARY_DIR environment variable to the build directory (absolute path), for example:

Run the yactfr tests in parallel from the build directory.
$ make tests
$ YACTFR_BINARY_DIR=$(pwd) pytest -n logical

Usage examples

In the examples below, the program accepts two arguments:

  1. The path to the metadata stream file of the trace (required).

  2. The path to a data stream file of the same trace (required by some examples).

Build the API documentation for a thorough reference.

Note
The examples are not necessarily optimal: their purpose is to show what the yactfr API looks like.
Example 1. Print all the data stream’s event record names.
#include <cassert>
#include <fstream>
#include <iostream>
#include <yactfr/yactfr.hpp>

int main(const int argc, const char * const argv[])
{
    assert(argc == 3);

    // open metadata stream file
    std::ifstream metadataFile {argv[1], std::ios::binary};

    // create metadata stream object
    const auto metadataStream = yactfr::createMetadataStream(metadataFile);

    // we have the metadata text at this point: safe to close the file
    metadataFile.close();

    // get a trace type from the metadata text
    auto traceTypeMsUuidPair = yactfr::fromMetadataText(metadataStream->text());

    // create a memory mapped file view factory to read the data stream file
    yactfr::MemoryMappedFileViewFactory factory {argv[2]};

    // create an element sequence from the trace type and data source factory
    yactfr::ElementSequence seq {*traceTypeMsUuidPair.first, factory};

    // print all the event record names
    for (auto& elem : seq) {
        if (elem.isEventRecordInfoElement()) {
            auto& erInfo = elem.asEventRecordInfoElement();

            // the name of an event record type is optional
            if (erInfo.type()->name()) {
                std::cout << *erInfo.type()->name() << std::endl;
            }
        }
    }
}
Example 2. Print all the fixed-length signed integers of the sched_switch event records and their offset.
#include <cassert>
#include <fstream>
#include <iostream>
#include <yactfr/yactfr.hpp>

int main(const int argc, const char * const argv[])
{
    assert(argc == 3);

    // open metadata stream file
    std::ifstream metadataFile {argv[1], std::ios::binary};

    // create metadata stream object
    const auto metadataStream = yactfr::createMetadataStream(metadataFile);

    // we have the metadata text at this point: safe to close the file
    metadataFile.close();

    // get a trace type from the metadata text
    auto traceTypeMsUuidPair = yactfr::fromMetadataText(metadataStream->text());

    // create a memory mapped file view factory to read the data stream file
    yactfr::MemoryMappedFileViewFactory factory {argv[2]};

    // create an element sequence from the trace type and data source factory
    yactfr::ElementSequence seq {*traceTypeMsUuidPair.first, factory};

    // print all the fixed-length signed integers of the `sched_switch` ERs
    const auto endIt = seq.end();
    bool inSchedSwitchEventRecord = false;

    for (auto it = seq.begin(); it != endIt; ++it) {
        if (it->isEventRecordInfoElement()) {
            auto& ertElem = it->asEventRecordInfoElement();

            // the name of an event record type is optional
            if (ertElem.type()->name() && *ertElem.type()->name() == "sched_switch") {
                std::cout << "---" << std::endl;
                inSchedSwitchEventRecord = true;
            } else {
                inSchedSwitchEventRecord = false;
            }

            continue;
        }

        if (inSchedSwitchEventRecord && it->isFixedLengthSignedIntegerElement()) {
            std::cout << it.offset() << ": ";

            auto& intElem = it->asFixedLengthSignedIntegerElement();

            if (intElem.structureMemberType()) {
                std::cout << intElem.structureMemberType()->displayName() << ": ";
            }

            std::cout << intElem.value() << std::endl;
        }
    }
}
Example 3. Print all the packet offsets and lengths (both in bits): slow version.

In this example, we iterate all the elements of the data stream. The next example shows how to do the same faster.

#include <cassert>
#include <fstream>
#include <iostream>
#include <iomanip>
#include <yactfr/yactfr.hpp>

int main(const int argc, const char * const argv[])
{
    assert(argc == 3);

    // open metadata stream file
    std::ifstream metadataFile {argv[1], std::ios::binary};

    // create metadata stream object
    const auto metadataStream = yactfr::createMetadataStream(metadataFile);

    // we have the metadata text at this point: safe to close the file
    metadataFile.close();

    // get a trace type from the metadata text
    auto traceTypeMsUuidPair = yactfr::fromMetadataText(metadataStream->text());

    // create a memory mapped file view factory to read the data stream file
    yactfr::MemoryMappedFileViewFactory factory {argv[2]};

    // create an element sequence from the trace type and data source factory
    yactfr::ElementSequence seq {*traceTypeMsUuidPair.first, factory};

    // print all the packet offsets and lengths (both in bits)
    const auto endIt = seq.end();
    yactfr::Index curPktOffset = 0;
    unsigned long curPktNumber = 0;

    for (auto it = seq.begin(); it != endIt; ++it) {
        if (it->isPacketBeginningElement()) {
            // save packet beginning offset
            curPktOffset = it.offset();
        } else if (it->isPacketEndElement()) {
            // back to first level: end of packet
            const auto pktLen = it.offset() - curPktOffset;

            std::cout << "Packet #" << curPktNumber << ":    " <<
                         "Offset: " << std::setw(10) << curPktOffset << "    " <<
                         "Size: " << std::setw(10) << pktLen <<
                         std::endl;
            ++curPktNumber;
        }
    }
}
Example 4. Print all the packet offsets and lengths (both in bits): fast version.

This is a faster version of the previous example.

Instead of decoding the whole packet to find its length, we use the expected packet total length property of the “packet info” element. This element is available after the decoder reads the packet context. Then, we make the iterator seek the next packet directly.

Note that this example doesn’t work if the packet context type doesn’t contain an expected packet total length fixed-length unsigned integer, in which case the data stream must contain a single packet. This could be detected by inspecting the metadata (trace type) and using the size of the whole data stream file as the unique packet total length.

#include <cassert>
#include <fstream>
#include <iostream>
#include <iomanip>
#include <yactfr/yactfr.hpp>

int main(const int argc, const char * const argv[])
{
    assert(argc == 3);

    // open metadata stream file
    std::ifstream metadataFile {argv[1], std::ios::binary};

    // create metadata stream object
    const auto metadataStream = yactfr::createMetadataStream(metadataFile);

    // we have the metadata text at this point: safe to close the file
    metadataFile.close();

    // get a trace type from the metadata text
    auto traceTypeMsUuidPair = yactfr::fromMetadataText(metadataStream->text());

    // create a memory mapped file view factory to read the data stream file
    yactfr::MemoryMappedFileViewFactory factory {argv[2]};

    // create an element sequence from the trace type and data source factory
    yactfr::ElementSequence seq {*traceTypeMsUuidPair.first, factory};

    // print all the packet offsets and lengths (both in bits)
    const auto endIt = seq.end();
    auto it = seq.begin();
    yactfr::Index curPktOffset = 0;
    unsigned long curPktNumber = 0;

    while (it != endIt) {
        if (it->isPacketBeginningElement()) {
            // save packet beginning offset
            curPktOffset = it.offset();
        } else if (it->isPacketInfoElement()) {
            // this element contains the expected total length of the current packet
            auto& elem = it->asPacketInfoElement();

            assert(elem.expectedTotalLength());
            std::cout << "Packet #" << curPktNumber << ":    " <<
                         "Offset: " << std::setw(10) << curPktOffset << "    " <<
                         "Size: " << std::setw(10) << *elem.expectedTotalLength() <<
                         std::endl;
            ++curPktNumber;

            /*
             * Seek the next packet without iterating the intermediate
             * elements. The expected offset is in bytes, so we need to
             * divide what we have by 8.
             */
            it.seekPacket((curPktOffset + *elem.expectedTotalLength()) / 8);
            continue;
        }

        ++it;
    }
}
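
For the single-packet case described before Example 4 (no expected packet total length in the packet context), the total length of the unique packet can be taken from the size of the data stream file. A minimal sketch using only the standard library follows; `fileSizeBits()` is a hypothetical helper, not a yactfr function, and detecting this case from the trace type is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>

// Returns the size, in bits, of the file at `path`, or -1 if the file
// can't be opened or its size can't be determined.
inline std::int64_t fileSizeBits(const std::string& path)
{
    assert(!path.empty());

    // open at the end so that the current position is the file size
    std::ifstream file {path, std::ios::binary | std::ios::ate};

    if (!file) {
        return -1;
    }

    const std::streamoff sizeBytes = file.tellg();

    if (sizeBytes < 0) {
        return -1;
    }

    // yactfr offsets and lengths are in bits in the examples above
    return static_cast<std::int64_t>(sizeBytes) * 8;
}
```

The returned value would play the role of the expected total length printed in Example 4 when the data stream contains a single packet.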

Contribute and report bugs

Please contribute with GitHub pull requests and report bugs as GitHub issues.

Community

See eepp.ca.

I’m eepp on Libera.Chat and OFTC.

yactfr's Issues

Using yactfr from Visual Basic 4.0

Hi,

I am working on a trace viewer written in Visual Basic. I think both of our projects would benefit tremendously if yactfr could be used from Visual Basic applications.

Note that my code depends on a fair amount of custom widgets written using VB 4.0, so I can't move to the newly introduced VB .NET just yet.

I have a license of Visual C++ 1.0 which I can use to build 16-bit real mode code. I will send you patches to build on this compiler since your code seems to use a lot of experimental C++ features. I think we should focus on industry standards if this project is to become successful.

Kind regards.

Build failure

It doesn't build:

FAILED: yactfr/CMakeFiles/yactfr.dir/internal/metadata/json/json-val-from-text.cpp.o 
/usr/bin/c++ -DZF_LOG_DEF_SRCLOC=ZF_LOG_SRCLOC_NONE -DZF_LOG_LEVEL=ZF_LOG_NONE -Dyactfr_EXPORTS -I/home/simark/src/yactfr/include -fPIC -std=c++14 -MD -MT yactfr/CMakeFiles/yactfr.dir/internal/metadata/json/json-val-from-text.cpp.o -MF yactfr/CMakeFiles/yactfr.dir/internal/metadata/json/json-val-from-text.cpp.o.d -o yactfr/CMakeFiles/yactfr.dir/internal/metadata/json/json-val-from-text.cpp.o -c /home/simark/src/yactfr/yactfr/internal/metadata/json/json-val-from-text.cpp
In file included from /home/simark/src/yactfr/yactfr/internal/metadata/json/json-parser.hpp:18,
                 from /home/simark/src/yactfr/yactfr/internal/metadata/json/json-val-from-text.cpp:13:
/home/simark/src/yactfr/yactfr/internal/metadata/json/../str-scanner.hpp:454:26: error: field ‘_convBuf’ has incomplete type ‘std::array<char, 72>’
  454 |     std::array<char, 72> _convBuf;
      |                          ^~~~~~~~
In file included from /usr/include/c++/12.1.0/bits/hashtable_policy.h:34,
                 from /usr/include/c++/12.1.0/bits/hashtable.h:35,
                 from /usr/include/c++/12.1.0/unordered_set:46,
                 from /home/simark/src/yactfr/yactfr/internal/metadata/json/json-parser.hpp:14:
/usr/include/c++/12.1.0/tuple:1595:45: note: declaration of ‘struct std::array<char, 72>’
 1595 |   template<typename _Tp, size_t _Nm> struct array;
      |                                             ^~~~~
$ /usr/bin/c++ --version
c++ (GCC) 12.1.0
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
