
An implementation of DTrace for machine code

License: BSD 3-Clause "New" or "Revised" License

Introduction

This repository contains the source code and build system for the MCTrace binary instrumentation tool. The MCTrace tool enables users to modify binaries, inserting instrumentation into them in order to collect fine-grained tracing information. For information on the MCTrace tool's design and usage, please see MCTRACE.md. This document covers instructions for building MCTrace from source for development or releases.

Building MCTrace

MCTrace can be built for one of two purposes: either for local development in a Haskell build environment, or for release as a Docker image. Instructions for each method are detailed below.

Development Build Instructions

The development environment setup and build processes are automated. The build process requires Ubuntu 20.04. To perform a one-time setup of the development environment including the installation of LLVM, cross compilers, and other required tools, run the development setup script:

./dev_setup.sh

Once the development environment is set up and the required tools are installed, MCTrace can be built with the build script:

./build.sh

After the build has completed, various cross compilers and other tools can be brought into the PATH for easier access with:

. env.sh

To build the example test programs for x86_64 and instrument them using various testing probes, run:

make -C mctrace/tests/full

To do the same for PowerPC, run:

make -C mctrace/tests/full ARCH=PPC

The mctrace tool can be run manually by running:

cabal run mctrace <args>

For more details on using the mctrace tool, see MCTRACE.md.

Release Build Instructions

To build the release Docker image, execute the following from the root of the repository:

cd release
./build.sh

This will build two Docker images:

mctrace.tar.gz, a self-contained image that contains MCTrace, its dependencies, associated tools, and examples. For information on using this image, please see release/README.md.
mctrace-tool.tar.gz, a minimal image containing just MCTrace and its dependencies. A helper script, release/mctrace, is provided to run the command in a container. Note that paths passed to this script should be relative to the root of the repository; paths outside of the repository will not be accessible.

Status Information

Some of the work on this project involved attempting to run the Challenge Problem 10 binary on a real PowerPC microcontroller. We obtained an NXP development board, the MPC5777C, and a PEMicro USB Multilink debug adapter for connecting to the board. This work involved flashing the microcontroller with the Challenge Problem 10 binary, both original and instrumented versions, under the following conditions:

NXP S32 Design Studio for Power Architecture, version 2.1
IBM Thinkpad running Ubuntu 16.04, a supported platform for the NXP Design Studio
PEMicro USB Multilink adapter, model USB QORIVVA Multilink for MPC55xx/56xx devices, part # USB-ML-PPCNEXUS
USB A to USB micro connector for the MPC5777C debug UART

The system installation was done as follows:

Install Ubuntu 16.04 with a desktop GUI installed.
Install S32DS 2.1, following the steps in the S32 Design Studio for Power Architecture 2.1 Installation Guide.

The flashing procedure was performed by following the steps listed in the Immunant challenge problem repository; those steps are reproduced below for posterity. Steps for the booting procedure are also documented below.

In our attempt to flash the MPC5777C with the Challenge Problem 10 binary, our findings and next steps were as follows:

We were able to use the flashing procedure to load the uninstrumented (original) Challenge Problem 10 binary for MPC5777C onto the board.
Our evidence that an unmodified Challenge Problem 10 binary booted somewhat successfully was that we saw the following output on the UART console:

Setup Complete.
ERROR: Failed to send status update

We then flashed an mctrace-instrumented version of the Challenge Problem 10 binary. The instrumented version, along with its probes, can be found in the cp10_demo/mpc5777c directory in this repository.
Our instrumented version failed to boot (as evidenced by no console output).
Even with a platform implementation that does nothing in any of its functions (e.g. platform_send) and even with probes that do not use any global variables (thus not warranting a memory allocation from the platform implementation to provide for global variable storage), the binary failed to boot. We did not explore this further to determine the cause.
However, one key task left unfinished in our work was to write a suitable platform implementation for the MPC5777C. To date, we had used a PowerPC platform implementation that was only suitable for running in Linux userspace environments. To get a working platform implementation on the MPC5777C, an exfiltration mechanism must be implemented, such as a CAN bus send operation or a UART write. A next step is to obtain either of those and integrate their source directly into the compilation process of the MPC5777C platform implementation.

MPC5777C and USB Multilink Setup

This section details how to connect the PEMicro adapter to the MPC5777C, since there is no suitable documentation on how to get this right.

This image shows the board at a glance, with the power connector and power switch visible in the lower right corner of the board. The USB UART is connected at the far left part of the board and the PEMicro debugging adapter's ribbon cable is connected on the right with the red stripe (Pin 1) positioned furthest from the power connector.

This image shows the PEMicro adapter with its ribbon cable coming from its enclosure.

This image shows how the PEMicro adapter's ribbon cable connects to the MPC5777C, with the red stripe position indicating Pin 1.

This image shows how the ribbon cable is connected to the internal header in the PEMicro adapter. Of all of the headers available, two fit the provided 14-pin cable. The correct header is the lower-left header, pictured here, and the Pin 1 orientation is the lower-right pin of the header if looking at the board from above.

Immunant Flashing Procedure

Source: Immunant AMP Challenge Problem repository

Right click in the Project Explorer tab within S32DS.
Click "Import".
In the Import window, navigate to "S32 Design Studio" > "Executable File Importer".
Click "Next".
Click "Browse" and navigate to your binary of interest.
Under "Please specify hardware parameter ...",
navigate to "MPC5777C" > "MPC5777C" > "Z7_0" for the core selection.
Click "Next".
Specify a project name and rename the launch configuration if desired.
Click "Finish".

With the project created, navigate to it in the Project Explorer. Navigate to "{Project Name}" > "Binaries" > "{Binary Name}". Right click on "{Binary Name}" and then click "Flash from file".

If there are no launch configurations, click on the "New launch configuration" button, which will create a new configuration based on the binary's name. Otherwise, you can use one of the flash configurations if there is one already populated for the MPC5777C. This may happen if you have created other projects.

Select the desired configuration (likely the one just created), and click on the "PEMicro Debugger" tab. For "Interface", select the "USB Multilink..." option. For "Port", select the port that the Multilink is connected to, likely some COMX-type variant. For "Device Name", be sure "MPC5777C" is selected, and "Z7_0" for "Core". Default options should work for the rest. Click "Flash".

A similar workflow should be possible by selecting the project after the above steps, right-clicking, choosing "Debug as" or "Run as", and selecting "S32DS C/C++ Application". This, however, had not been tested at the time of writing.

Booting the MPC5777C

Plug in the power adapter.
Connect a USB micro cable to the debugging UART USB connector on the left side of the board (the opposite end from the power connector).
On the Linux host to which the USB UART is connected, run sudo minicom -D /dev/ttyUSB0 to connect minicom to the UART.
Flip the power switch (the switch immediately next to the power connector).

Acknowledgements

This material is based upon work supported by the United States Air Force AFRL/SBRK under Contract No. FA8649-21-P-0293, and by the Defense Advanced Research Projects Agency (DARPA) and Naval Information Warfare Center Pacific (NIWC Pacific) under Contract Number N66001-20-C-4027. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the DARPA & NIWC Pacific.

mctrace's Issues

Update to a newer version of llvm-hs

The version released on Hackage works with LLVM 9, which is getting difficult to install in modern environments. We will probably need to bring llvm-hs in as a submodule to get the version that supports LLVM 12, which is much more modern. This will probably require some minor API updates, but I wouldn't expect too much churn given the small slice of the API currently in use.

Add automated tests for telemetry situation/injection

Currently, the test suite only tests that the LLVM code generation from DTrace works (by compiling examples and testing them with a simple harness).

This needs to be extended to test that the probe location identification and probe injection functionality works correctly. This would need to run the MCTrace entry point on a binary and test that the telemetry works as expected.

Add a primitive DTrace function for obtaining probe context

We should add an intrinsic function supported in our extended DTrace language for capturing the context that the probe is running in. This could be implemented with a bit of architecture-specific stack walking code to read a fixed number of return addresses off of the stack. The interface would require a great deal of design, but might look something like:

::func:entry {
  byte context[8];
  // Store as much context as we can in 8 bytes; that would be two return addresses on a 32 bit architecture
  get_context(&context, 8);

  // Save the context elsewhere
  emit_message(&context);
}

This would probably be a piece of code to implement in hand-written assembly and injected into the binary to be called from probes, rather than doing codegen directly in renovate.

There are some functions in the official DTrace manual that are similar (see stack): https://docs.oracle.com/cd/E19253-01/817-6223/6mlkidlhj/index.html
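
As a rough illustration of the stack-walking idea, the following Python sketch simulates reading a fixed number of return addresses from a chain of stack frames. It is not MCTrace code: the frame layout (a saved frame pointer followed by a return address, both 32-bit big-endian), the use of address 0 as the end of the chain, and the Python get_context signature are all assumptions made for the example. A real implementation would follow the target ABI and would likely be written in assembly, as noted above.

```python
import struct

WORD = 4  # assumed 32-bit target


def get_context(memory: bytes, frame_ptr: int, nbytes: int) -> bytes:
    """Collect return addresses into a buffer of exactly nbytes bytes.

    Assumed (hypothetical) frame layout at address fp:
      fp + 0: saved frame pointer of the caller (0 ends the chain)
      fp + 4: return address
    """
    out = bytearray()
    fp = frame_ptr
    # Stop when the buffer cannot hold another word or the chain ends.
    while len(out) + WORD <= nbytes and fp != 0:
        saved_fp, ret_addr = struct.unpack_from(">II", memory, fp)
        out += struct.pack(">I", ret_addr)
        fp = saved_fp
    # Pad with zeros if the chain was shorter than the buffer.
    return bytes(out.ljust(nbytes, b"\0"))


# Build a fake stack with three frames at addresses 16, 32, and 48.
stack = bytearray(64)
struct.pack_into(">II", stack, 16, 32, 0x1000)
struct.pack_into(">II", stack, 32, 48, 0x2000)
struct.pack_into(">II", stack, 48, 0, 0x3000)

# With 8 bytes of context, only the two innermost return addresses fit.
context = get_context(bytes(stack), 16, 8)
```

The fixed-size buffer mirrors the 8-byte context array in the probe above: on a 32-bit target it holds two return addresses, and deeper frames are simply dropped.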

Avoid caching in Dockerfile for git cloning

The Dockerfile creates a fresh clone of the repository before building. This can cause caching issues: Docker is not aware when the repository has been updated, so it does not know that this step needs to be re-run.

Support additional intrinsics

This ticket tracks some additional intrinsics that we will likely need to implement. We should make breakout tickets for each one.

Read memory from a global address
Read memory from a named global variable (with the name taken from DWARF metadata; see #10)

We could also implement intrinsics for writing memory. This is somewhat outside of the DTrace model, but would be useful for patching.

Build a Docker container for distribution

MCTrace has non-trivial dependencies (especially LLVM) that we cannot count on users to have installed properly. We will need to build a Docker container to enable reasonable distribution.

We need to ensure that the version of LLVM linked against by llvm-hs supports cross-compilation (at least for our target architectures).

We probably want to use the staged container construction method where we build a base container with all of the Haskell and LLVM build artifacts, then just copy the mctrace binary and the necessary LLVM shared libraries into the final container (to minimize distribution size).

Add a PowerPC backend

We will need to support PowerPC (with a focus on the 32 bit variant). This task has two major components:

Adapt the LLVM codegen module to support PowerPC by identifying the correct host triple (and ensure that the version of LLVM linked against has support for the PowerPC architecture)
Implement the probe insertion and call logic

Our first targets are likely to be statically linked PowerPC binaries. Keep in mind that supporting dynamically linked PowerPC binaries will require adding support to both elf-edit and macaw for the PowerPC relocation types.
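
To make the triple-selection step concrete, here is a minimal Python sketch of mapping an architecture name to an LLVM target triple. The architecture keys and the choice of Linux/GNU triples are assumptions for illustration; the triples MCTrace actually uses are not stated here, and the real logic would live in the Haskell codegen module.

```python
# Hypothetical mapping from an architecture name to an LLVM target triple.
# The keys and the Linux/GNU environment are illustrative assumptions.
TARGET_TRIPLES = {
    "x86_64": "x86_64-unknown-linux-gnu",
    "ppc32": "powerpc-unknown-linux-gnu",
    "ppc64": "powerpc64-unknown-linux-gnu",
}


def target_triple(arch: str) -> str:
    """Return the triple for arch, or raise ValueError if it is unsupported."""
    try:
        return TARGET_TRIPLES[arch.lower()]
    except KeyError:
        raise ValueError(f"unsupported architecture: {arch!r}") from None


triple = target_triple("PPC32")
```

Failing loudly on an unknown architecture is deliberate: silently falling back to the host triple would produce probe code for the wrong instruction set.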

Read DWARF metadata, if available

This would enable mctrace to inject probes in more locations more ergonomically. Instead of referring to functions (in probe descriptions) by their address, we could use DWARF metadata to provide more ergonomic names.

It may also enable injecting probes around accesses to e.g., global variables whose names are provided in DWARF metadata.

Add support for parsing structure type definitions in DTrace scripts

The parser needs to be extended with support for structure types. Note that it would be ideal to handle codegen for structures in a way that is layout-compatible with C. Handling that properly may (or may not...) require a bit of fiddling with padding manually, as some of that is handled in the LLVM frontends.
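
The padding concern can be illustrated with a small Python sketch that computes C-style field offsets: each field is placed at the next multiple of its alignment, and the total size is rounded up to the largest field alignment. The size and alignment table below is an assumption for a generic 32-bit target, not something taken from MCTrace or from a specific ABI document.

```python
# Assumed (size, alignment) in bytes for a generic 32-bit target.
PRIMITIVES = {
    "char": (1, 1),
    "short": (2, 2),
    "int": (4, 4),
}


def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def layout(fields):
    """Compute C-style offsets for a list of (name, type) fields.

    Returns (offsets, total_size), where offsets is a list of
    (name, offset) pairs in declaration order.
    """
    offset = 0
    max_align = 1
    offsets = []
    for name, ty in fields:
        size, alignment = PRIMITIVES[ty]
        offset = align_up(offset, alignment)  # insert padding before the field
        offsets.append((name, offset))
        offset += size
        max_align = max(max_align, alignment)
    # Tail padding so that arrays of the struct keep every element aligned.
    return offsets, align_up(offset, max_align)


offsets, size = layout([("tag", "char"), ("value", "int"), ("flags", "short")])
# offsets == [("tag", 0), ("value", 4), ("flags", 8)], size == 12
```

In this example three bytes of padding follow tag and two bytes follow flags, which is the kind of detail a code generator has to reproduce if the C frontend is not doing it.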

Add an ARM backend

We first need an ARM backend, which requires two sub-tasks:

Generate ARM code using LLVM as a cross-compiler (augmenting the LLVM codegen module)
Instantiate the probe call/insertion interface for ARM (copy the x86 version)

Support alternative probe storage allocation mechanisms

DTrace scripts can allocate additional global storage that is separate from the rest of program memory, and is only visible to probes.

Currently, mctrace supports this by computing the amount of storage required (statically) and allocates that amount of memory using mmap. To support embedded systems not running with an OS, we need to provide an alternative mechanism to store probe data. This could be as simple as taking an address on the command line that points to static storage. It may require interfacing with other system services. It may need to be customizable.
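
The current approach can be sketched in Python as two steps: compute the total storage statically from the probe variables, then back it with an anonymous memory mapping. This is a simulation using Python's mmap module, not MCTrace's implementation; the variable names and sizes are hypothetical, and the simple back-to-back packing ignores any alignment the real tool may apply.

```python
import mmap
import struct


def plan_storage(variables):
    """Assign each (name, size_in_bytes) variable a fixed offset.

    Returns (offsets, total_size). Variables are packed back to back.
    """
    offsets = {}
    total = 0
    for name, size in variables:
        offsets[name] = total
        total += size
    return offsets, total


def allocate(total: int) -> mmap.mmap:
    """Allocate zero-filled anonymous memory, rounded up to whole pages."""
    pages = max(1, -(-total // mmap.PAGESIZE))  # ceiling division
    return mmap.mmap(-1, pages * mmap.PAGESIZE)


# Hypothetical probe globals: an 8-byte counter and a 4-byte value.
offsets, total = plan_storage([("counter", 8), ("last_value", 4)])
storage = allocate(total)

# A probe body would then read and write at the precomputed offsets.
struct.pack_into("<Q", storage, offsets["counter"], 7)
counter = struct.unpack_from("<Q", storage, offsets["counter"])[0]
```

Because the offsets are fixed before allocation, the same plan would work unchanged if the backing memory came from a statically reserved address instead of mmap, which is the kind of alternative described above.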

Move probe storage allocation to an externally defined and injected function

Context: MCTrace supports platform-specific allocation of the telemetry storage block. Currently the allocation is handled by a set of hardcoded instructions in the platform-specific plugin. Instead, the allocation function should be defined as part of an external (compiled) module. MCTrace must be updated to inject this module into the binary and make calls to its functions as appropriate.

To achieve this, MCTrace must support injecting external modules into the binary and invoking the newly injected functions afterwards.

Note that this approach can also be used to support externally specified data exfiltration functions.

Make Dockerfile use new setup and build automation

As of c0bf028, there is improved automation for development environment setup and building in an ordinary Linux environment using dev_setup.sh and build.sh. After writing those scripts, I discovered that most of the steps are also carried out by the Dockerfile in the repository. But having the steps duplicated is problematic for obvious reasons.

So this ticket is a reminder to explore unifying these approaches by making the Dockerfile use dev_setup.sh and build.sh, modifying those scripts as needed so that they work in both environments. That way, we have a single set of automation that works both within and outside of a Docker context.

Support additional probe types

The proof-of-concept of MCTrace is able to instrument entry/exit from named functions. This ticket tracks additional probe types that need to be implemented. Consider adding separate tickets for each one in progress:

Probes that fire before/after particular memory addresses are read/written
Probes that are inserted unconditionally at specific machine code addresses
