
Extensible Interpreter for LLVM

In some recent projects, I've needed to do some simple dynamic analyses over compiled LLVM bitcode. The typical way to do this would be to instrument the code by inserting new instructions using an LLVM pass. That can be tricky and error-prone, though, so here's another way.

LLVM ships with an interpreter. It's slow, but it works. So why not just extend the interpreter to interpose my own code during execution? That way I won't have to worry about generating bitcode instructions to get my runtime work done: I can stick to C++ and avoid linking in a runtime library.

But LLVM's interpreter isn't meant to be extended in this way. (The crucial methods are non-virtual.) So I wrote this hack to provide a simple facade over the interpreter that does allow subclasses to interpose on the interpreter loop.

Requirements

To use the extensible interpreter, you will need:

  • LLVM itself, built with LLVM_ENABLE_FFI turned on. To do this, pass the command-line option -DLLVM_ENABLE_FFI:BOOL=ON to cmake when building LLVM.
  • libffi.
  • The LLVM sources, since we use some header files that are not installed with the LLVM binaries.

A Demonstration

Included in this repository is a demonstration, in trace.cpp. To build it, first edit the top of CMakeLists.txt to point to LLVM's installation prefix and its source directory (you'll need to download the source). These are the lines you need to change:

set(LLVM_PREFIX /usr)
set(LLVM_SRC_DIR ~/llvm-3.3)

Then build the example with the usual CMake dance:

$ cmake .
$ make

You'll then have an executable called trace that works as an LLVM bitcode interpreter. To use it, compile any C file to bitcode. For example, you could make a file called test.c like this:

int main(int argc, char *argv[]) {
    return argc + 1;
}

Compile it to LLVM IR like so:

$ clang -c -emit-llvm test.c -o test.bc

Then get an execution trace by running the interpreter:

$ ./trace test.bc
%1 = alloca i32, align 4
%2 = alloca i32, align 4
%3 = alloca i8**, align 8
store i32 0, i32* %1
store i32 %argc, i32* %2, align 4
store i8** %argv, i8*** %3, align 8
%4 = load i32* %2, align 4
%5 = add nsw i32 %4, 1
ret i32 %5

Awesome!

Extending the Interpreter

If you open up trace.cpp, you'll see that you don't need much code to start extending the interpreter. There's some boilerplate to take the command-line arguments, but the bulk of the interpreter extension goes like this:

class Tracer : public ExtensibleInterpreter {
public:
    Tracer(Module *M) : ExtensibleInterpreter(M) {};
    virtual void execute(llvm::Instruction &I) {
        I.dump();
        ExtensibleInterpreter::execute(I);
    }
};

Namely, you subclass ExtensibleInterpreter and override the execute() method. You probably want to call the superclass's version of that method to actually interpret the instruction, but you don't have to. Here, we just call I.dump() to print a description of each dynamic instruction we see. But you can do whatever you like at runtime, including changing the executed instructions on the fly.
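
As another, purely illustrative example of what a subclass could do, here is a hedged sketch of an extension that counts dynamic instructions by opcode instead of printing them. The OpcodeCounter name and the reporting in the destructor are assumptions for the sake of the example, not code from this repository:

#include <iostream>
#include <map>
#include <string>

class OpcodeCounter : public ExtensibleInterpreter {
public:
    std::map<std::string, unsigned> counts;

    OpcodeCounter(Module *M) : ExtensibleInterpreter(M) {}

    virtual void execute(llvm::Instruction &I) {
        ++counts[I.getOpcodeName()];        // tally this dynamic instruction
        ExtensibleInterpreter::execute(I);  // let the real interpreter run it
    }

    virtual ~OpcodeCounter() {
        // Dump the per-opcode totals when the interpreter is torn down.
        for (std::map<std::string, unsigned>::iterator it = counts.begin();
             it != counts.end(); ++it)
            std::cerr << it->first << ": " << it->second << "\n";
    }
};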

To run the interpreter, use the provided interpret() function, which is parameterized on the interpreter class:

interpret<Tracer>(bitcodeFile, commandArgs, envp)

The three arguments are the bitcode file to execute, the command-line arguments (argv), and the environment (envp).
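
For orientation, here is a minimal sketch of what the surrounding main() might look like, assuming interpret() takes the bitcode path as a std::string, the program arguments as a std::vector<std::string>, and the environment as a char* array; check trace.cpp for the exact signature and the headers it includes:

#include <string>
#include <vector>

// (Also include the repository's header that declares ExtensibleInterpreter
// and interpret(); Tracer is the subclass defined above.)

int main(int argc, char **argv, char **envp) {
    if (argc < 2)
        return 1;  // need at least the bitcode file to run

    std::string bitcodeFile = argv[1];
    // Arguments for the interpreted program; its argv starts at the bitcode file.
    std::vector<std::string> commandArgs(argv + 1, argv + argc);

    // Hand control to the extensible interpreter using our Tracer subclass.
    return interpret<Tracer>(bitcodeFile, commandArgs, envp);
}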

How it Works

To make this work, I wrapped LLVM's Interpreter class in a new class that, via copypasta and other hacks, just passes control instruction-by-instruction to the real interpreter.

I needed access to private members of Interpreter, however, so I had to resort to a terrible, incorrect hack: I copied the definition of that class, made all its members public, and typecast the interpreter pointer to this new class. To access private Interpreter members in your code, just go through the pubInterp pointer. This hack will almost certainly break at some point, as it relies both on undefined compiler behavior and the internal structure of LLVM's classes. I'm sorry about that.
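
In abstracted form, the trick looks roughly like this (a standalone illustration with made-up names, not the repository's actual classes):

#include <cstdio>

class Original {            // stands in for LLVM's Interpreter
    int secret;             // private state we want to reach
public:
    Original() : secret(42) {}
};

struct OriginalPublic {     // copied definition with everything public
    int secret;
};

int main() {
    Original real;
    // Undefined behavior in principle, but it works with typical compilers
    // because the two layouts match -- this is the pubInterp trick.
    OriginalPublic *pub = reinterpret_cast<OriginalPublic *>(&real);
    std::printf("%d\n", pub->secret);   // reads the private member
    return 0;
}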

Author

This abomination is by Adrian Sampson. Use it under the terms of the same license as LLVM itself.


llvm-ei's Issues

Port the project to LLVM 6

Hi Adrian,
First of all, thanks for your time and your projects. I'd like to implement a tracer that prints the LLVM instructions as they execute. My idea is to create a class that extends the Interpreter, obtain a pointer to the main function, and call interpreter->visit(main). Once that's done, I'd override visitInstruction in my class, and I should be able to see the instructions. From my understanding, the Interpreter calls the various visit functions during execution. Am I right? If not, how would you approach this problem?

Thanks

Building with FFI

Hi,

Have you succeeded at running with libffi included?

Without it, I pretty much can't run any non-trivial program.

I've got it pointing at an llvm installation built with LLVM_ENABLE_FFI enabled, and patched up the CMakeLists.txt with target_link_libraries( trace ${FFI_LIBRARY_PATH} ), which got it to at least build.

But it still fails as soon as it hits its first @llvm.memcpy, with LLVM ERROR: Tried to execute an unknown external function: memcpy.

lli successfully interprets / executes the program.

I'm wondering if it has to do with the sys::DynamicLibrary::SearchForAddressOfSymbol call in callExternalFunction. I've added debug output and the trace executable does run the code in the #ifdef USE_LIBFFI block.
