lifting-bits / remill

Library for lifting machine code to LLVM bitcode

License: Apache License 2.0

Remill

Remill is a static binary translator that translates machine code instructions into LLVM bitcode. It translates AArch64 (64-bit ARMv8), SPARC32 (SPARCv8), SPARC64 (SPARCv9), x86 and amd64 machine code (including AVX and AVX512) into LLVM bitcode. AArch32 (32-bit ARMv8 / ARMv7) support is underway.

Remill focuses on accurately lifting instructions. It is meant to be used as a library for other tools, e.g. McSema.

Documentation

To understand how Remill works you can take a look at the following resources:

If you would like to contribute you can check out: How to contribute

Getting Help

If you are experiencing undocumented problems with Remill then ask for help in the #binary-lifting channel of the Empire Hacking Slack.

Supported Platforms

Remill is supported on Linux platforms and has been tested on Ubuntu 22.04. Remill also works on macOS, and has experimental support for Windows.

Remill's Linux version can also be built via Docker for quicker testing.

Dependencies

Most of Remill's dependencies can be provided by the cxx-common repository. Trail of Bits hosts downloadable, pre-built versions of cxx-common, which makes it substantially easier to get up and running with Remill. Nonetheless, the following table represents most of Remill's dependencies.

Name	Version
Git	Latest
CMake	3.14+
Google Flags	Latest
Google Log	Latest
Google Test	Latest
LLVM	15+
Clang	15
Intel XED	Latest
Python	2.7
Unzip	Latest
ccache	Latest

Getting and Building the Code

Docker Build

Remill now comes with a Dockerfile for easier testing. This Dockerfile references the cxx-common container to have all pre-requisite libraries available.

The Dockerfile allows for quick builds of multiple supported LLVM and Ubuntu configurations.

Important

Not all LLVM and Ubuntu configurations are supported. Please refer to the CI results to get an idea about configurations that are tested and supported. The Docker image should build on both x86_64 and ARM64, but we only test x86_64 in CI. ARM64 should build, but if it doesn't, please open an issue.

Quickstart (builds Remill against LLVM 17 on Ubuntu 22.04).

Clone Remill:

git clone https://github.com/lifting-bits/remill.git
cd remill

Build Remill Docker container:

docker build . -t remill \
     -f Dockerfile \
     --build-arg UBUNTU_VERSION=22.04 \
     --build-arg LLVM_VERSION=17

Ensure remill works:

Decode some AMD64 instructions to LLVM:

docker run --rm -it remill \
     --arch amd64 --ir_out /dev/stdout --bytes c704ba01000000

Decode some AArch64 instructions to LLVM:

docker run --rm -it remill \
     --arch aarch64 --address 0x400544 --ir_out /dev/stdout \
     --bytes FD7BBFA90000009000601891FD030091B7FFFF97E0031F2AFD7BC1A8C0035FD6

On Linux

First, update the package lists and install the baseline dependencies.

sudo dpkg --add-architecture i386
sudo apt-get update
sudo apt-get upgrade

sudo apt-get install \
     git \
     python3 \
     wget \
     curl \
     build-essential \
     lsb-release \
     ccache \
     libc6-dev:i386 \
     'libstdc++-*-dev:i386' \
     g++-multilib

Next, clone the repository. This will clone the code into the remill directory.

git clone https://github.com/lifting-bits/remill.git

Next, we build Remill. This script will create another directory, remill-build, in the current working directory. All remaining dependencies needed by Remill will be built in the remill-build directory.

./remill/scripts/build.sh

Next, we can install Remill. Remill itself is a library, and so there is no real way to try it. However, you can head on over to the McSema repository, which uses Remill for lifting instructions.

cd ./remill-build
sudo make install

We can also build and run Remill's test suite.

cd ./remill-build
make test_dependencies
make test

Full Source Builds

Sometimes, you want to build everything from source, including the cxx-common libraries remill depends on. To build against a custom cxx-common location, you can use the following cmake invocation:

mkdir build
cd build
cmake  \
  -DCMAKE_INSTALL_PREFIX="<path where remill will install>" \
  -DCMAKE_TOOLCHAIN_FILE="<path to cxx-common directory>/vcpkg/scripts/buildsystems/vcpkg.cmake"  \
  -G Ninja  \
  ..
cmake --build .
cmake --build . --target install

The output may produce some CMake warnings about policy CMP0003. These warnings are safe to ignore.

Common Build Issues

If you see errors similar to the following:

fatal error: 'bits/c++config.h' file not found

Then you need to install 32-bit libstdc++ headers and libraries. On a Debian/Ubuntu-based distribution, you would want to do something like this:

sudo dpkg --add-architecture i386
sudo apt-get update
sudo apt-get install libc6-dev:i386 libstdc++-10-dev:i386 g++-multilib

This error happens because the SPARC32 runtime semantics (the bitcode library which lives in <install directory>/share/remill/<version>/semantics/sparc32.bc) are built as 32-bit code, but 32-bit development libraries are not installed by default.

A similar situation occurs when building remill on arm64 Linux. In that case, you want to follow a similar workflow, except the architecture used in dpkg and apt-get commands would be armhf instead of i386.

Another alternative is to disable SPARC32 runtime semantics. To do that, use the -DREMILL_BUILD_SPARC32_RUNTIME=False option when invoking cmake.

remill's Issues

Find a better way to implement non-transparent flags optimizations

Right now there are some macro-enabled flag optimizations directly in the instruction semantics code. This isn't the right place for them, but I want to eventually be able to use them. The goal of these optimizations is to "kill" the eflags based on the assumption that the code being lifted is produced by a "sane" compiler that doesn't use the flags after a conditional branch, function call, indirect jump, or function return.

Probably what we want is some kind of intrinsic for telling us that we're doing a direct control flow transfer (e.g. for conditional branches, direct jumps, and direct function calls). Other flag-killing code can be placed in the existing intrinsics for indirect function call/return and indirect jumps.

Analyze instructions in CFG file to do feature detection

Pre-process the instructions CFG before lifting to feature detect for things like SSEn, AVXn, etc. and broadly categorize into: no-AVX, AVX (includes AVX2), and AVX512.
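The categorization described above could be sketched roughly as follows. This is only an illustration: `IsaExtension`, `FeatureLevel`, `LevelOf` and `DetectFeatureLevel` are hypothetical names that do not exist in Remill, and the sketch assumes each instruction in the CFG has already been tagged with the extension it belongs to.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical per-instruction tag; a real implementation would derive this
// from the decoder's view of each instruction in the CFG.
enum class IsaExtension { kBase, kSSE, kSSE2, kAVX, kAVX2, kAVX512 };

// The three broad categories named in the issue, ordered from least to most
// demanding so that std::max picks the widest requirement.
enum class FeatureLevel { kNoAVX = 0, kAVX = 1, kAVX512 = 2 };

inline FeatureLevel LevelOf(IsaExtension ext) {
  switch (ext) {
    case IsaExtension::kAVX:
    case IsaExtension::kAVX2:
      return FeatureLevel::kAVX;  // AVX2 is folded into the AVX category.
    case IsaExtension::kAVX512:
      return FeatureLevel::kAVX512;
    default:
      return FeatureLevel::kNoAVX;
  }
}

// Scans all instructions once and keeps the most demanding level seen.
inline FeatureLevel DetectFeatureLevel(
    const std::vector<IsaExtension> &instrs) {
  FeatureLevel level = FeatureLevel::kNoAVX;
  for (IsaExtension ext : instrs) {
    level = std::max(level, LevelOf(ext));
  }
  return level;
}
```

A single AVX512 instruction anywhere in the input is enough to put the whole CFG in the AVX512 category, which matches the idea of choosing one State layout before lifting.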

Figure out how to handle instructions making suppressed memory accesses using the ADDR32 or ADDR16 prefixes.

For example, STOS and SCAS in 64-bit can use [EDI] instead of [RDI] as the base address. If RDI != ZExtend(EDI) then there will be a translation transparency issue.
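To make the divergence concrete, here is a small model of the address computation. It is a sketch, not Remill code: `StringOpAddress` and `AddressesDiverge` are hypothetical helpers, and the only architectural fact assumed is that the address-size prefix in 64-bit mode selects the low 32 bits of the register, zero-extended.

```cpp
#include <cstdint>

// Effective address of an implicit [RDI]/[EDI] operand in 64-bit mode.
// With the address-size prefix only the low 32 bits of RDI are used,
// zero-extended to 64 bits.
inline uint64_t StringOpAddress(uint64_t rdi, bool has_addr32_prefix) {
  return has_addr32_prefix
             ? static_cast<uint64_t>(static_cast<uint32_t>(rdi))
             : rdi;
}

// True when lifting the instruction as if it used [RDI] would access a
// different address than the prefixed instruction does.
inline bool AddressesDiverge(uint64_t rdi) {
  return StringOpAddress(rdi, true) != StringOpAddress(rdi, false);
}
```

Any RDI value with a nonzero upper half makes `AddressesDiverge` return true, which is exactly the case where a lifter that ignores the prefix stops being transparent.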

Run Valgrind on Remill

Use the Valgrind annotations to enable/disable checking around execution of native and lifted code. Periodically ensure an absence of errors.

The same should be done for cfg_to_bc.

Add MPX regs to State structure

Also add accompanying test cases.

Convert interprocedural register/flag kill analysis into an LLVM pass

There are some benefits to this:

It will not be x86-specific, so a theoretic port to another architecture will benefit from the same analysis.
It will not require all code to be present in a single CFG file. Large executables push or exceed the maximum protobuf sizes, so switching to using many, more fine-grained CFGs (e.g. one per function) makes sense.
It will be "simple" insofar as it can use identical local variable names to identify dead stores, thus avoiding alias analysis altogether.

Make algorithm intrinsic

There should be a number of pre-defined algorithms (using an enum to list them all) that can be invoked by semantics functions. These would roughly correspond to instructions available in hardware, e.g. logarithms, tangents, etc.
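One possible shape for such an intrinsic is sketched below. The names `Algorithm` and `RunAlgorithm` are invented for illustration, and the reference implementation simply forwards to the C++ math library; a real intrinsic would presumably be left undefined in the bitcode and supplied by the downstream tool.

```cpp
#include <cmath>

// Hypothetical enumeration of hardware-provided algorithms.
enum class Algorithm { kLog2, kTangent, kArcTangent2 };

// Hypothetical intrinsic: one entry point, selected by enumerator.
// Unary algorithms ignore the second operand.
inline double RunAlgorithm(Algorithm alg, double x, double y = 0.0) {
  switch (alg) {
    case Algorithm::kLog2:
      return std::log2(x);
    case Algorithm::kTangent:
      return std::tan(x);
    case Algorithm::kArcTangent2:
      return std::atan2(x, y);
  }
  return std::nan("");  // Unknown enumerator.
}
```

A semantics function would then call `RunAlgorithm(Algorithm::kTangent, value)` instead of open-coding the computation, keeping the instruction's register mechanics separate from the numeric routine.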

Make State structures derived from an architecture-neutral base class

This is related to the footnote in Issue #52. The idea here is that some information (e.g. the interrupt vector) cannot be passed through control-flow intrinsics (e.g. __remill_interrupt_call) because of the rigid argument requirements for control-flow intrinsics (State *state, Memory *memory, addr_t pc). I think an appropriate solution is to put these architecture-neutral "dirty details" into a base class, and have the State structure derive from this base class. Then, control-flow intrinsics can be defined as accepting pointers to the base class. Implementations of the intrinsics that require access to the actual machine-specific contents can then down-cast the pointer.
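The layering proposed here might look like the following sketch. `ArchStateBase`, `X86State`, `PendingInterrupt` and `ProgramCounter` are hypothetical names chosen for the example, and the field set is deliberately tiny; none of this reflects the actual State layout.

```cpp
#include <cstdint>

// Hypothetical architecture-neutral base holding the "dirty details".
struct ArchStateBase {
  uint32_t interrupt_vector = 0;
};

// Hypothetical machine-specific state deriving from the neutral base.
struct X86State : ArchStateBase {
  uint64_t rip = 0;
  uint64_t rax = 0;
};

// A generic intrinsic implementation only needs the base class.
inline uint32_t PendingInterrupt(const ArchStateBase *state) {
  return state->interrupt_vector;
}

// An implementation that needs machine-specific contents down-casts.
// This is only valid when the caller really passed an X86State.
inline uint64_t ProgramCounter(ArchStateBase *state) {
  return static_cast<X86State *>(state)->rip;
}
```

The generic path never names `X86State`, so it can be compiled once and reused across architectures, while the down-cast stays confined to code that is already architecture-specific.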

Make glog not depend on any stack unwinder

This will likely require adding a tar.xz into blob that is either the latest glog that supports CMake, or one with outright modifications that eliminate stack unwinding. This will be one less external dependency that doesn't provide more information than you already get with a debugger.

Incorrect use of llvm::CloneFunctionInto

In general builds of Remill this doesn't really appear, but by shimming in a build of LLVM in debug mode with assertions and expensive checks, we see the following crash:

Program received signal SIGABRT, Aborted.
0x00007ffff5b35418 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
54  ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x00007ffff5b35418 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1  0x00007ffff5b3701a in __GI_abort () at abort.c:89
#2  0x00007ffff5b2dbd7 in __assert_fail_base (fmt=<optimized out>, assertion=assertion@entry=0xf5a2e8 "(Flags & RF_IgnoreMissingEntries) && \"Referenced value not in value map!\"", file=file@entry=0xf5a0b8 "/home/pag/git/llvm/lib/Transforms/Utils/ValueMapper.cpp", line=line@entry=444, function=function@entry=0xf609c0 <llvm::RemapInstruction(llvm::Instruction*, llvm::ValueMap<llvm::Value const*, llvm::WeakVH, llvm::ValueMapConfig<llvm::Value const*, llvm::sys::SmartMutex<false> > >&, llvm::RemapFlags, llvm::ValueMapTypeRemapper*, llvm::ValueMaterializer*)::__PRETTY_FUNCTION__> "void llvm::RemapInstruction(llvm::Instruction*, llvm::ValueToValueMapTy&, llvm::RemapFlags, llvm::ValueMapTypeRemapper*, llvm::ValueMaterializer*)") at assert.c:92
#3  0x00007ffff5b2dc82 in __GI___assert_fail (assertion=0xf5a2e8 "(Flags & RF_IgnoreMissingEntries) && \"Referenced value not in value map!\"", file=0xf5a0b8 "/home/pag/git/llvm/lib/Transforms/Utils/ValueMapper.cpp", line=444, function=0xf609c0 <llvm::RemapInstruction(llvm::Instruction*, llvm::ValueMap<llvm::Value const*, llvm::WeakVH, llvm::ValueMapConfig<llvm::Value const*, llvm::sys::SmartMutex<false> > >&, llvm::RemapFlags, llvm::ValueMapTypeRemapper*, llvm::ValueMaterializer*)::__PRETTY_FUNCTION__> "void llvm::RemapInstruction(llvm::Instruction*, llvm::ValueToValueMapTy&, llvm::RemapFlags, llvm::ValueMapTypeRemapper*, llvm::ValueMaterializer*)") at assert.c:101
#4  0x00000000009126a1 in llvm::RemapInstruction (I=0x16e5970, VMap=..., Flags=llvm::RF_NoModuleLevelChanges, TypeMapper=0x0, Materializer=0x0) at /home/pag/git/llvm/lib/Transforms/Utils/ValueMapper.cpp:443
#5  0x00000000008a1ce7 in llvm::CloneFunctionInto (NewFunc=0x1643cc8, OldFunc=0x167b9a8, VMap=..., ModuleLevelChanges=false, Returns=..., NameSuffix=0xe622ea "", CodeInfo=0x0, TypeMapper=0x0, Materializer=0x0) at /home/pag/git/llvm/lib/Transforms/Utils/CloneFunction.cpp:163
#6  0x00000000006d5528 in remill::(anonymous namespace)::AddBlockInitializationCode (block_func=0x1643cc8, template_func=0x167b9a8) at /home/pag/Code/remill/remill/BC/Translator.cpp:155
#7  0x00000000006d4ce1 in remill::Translator::LiftBlock (this=0x7fffffffd9b0, cfg_block=0x16f09c0) at /home/pag/Code/remill/remill/BC/Translator.cpp:607
#8  0x00000000006d4a01 in remill::Translator::LiftBlocks (this=0x7fffffffd9b0, cfg_module=0x1690c50) at /home/pag/Code/remill/remill/BC/Translator.cpp:580
#9  0x00000000006d48be in remill::Translator::LiftCFG (this=0x7fffffffd9b0, cfg_module=0x1690c50) at /home/pag/Code/remill/remill/BC/Translator.cpp:565
#10 0x00000000006f242c in main (argc=1, argv=0x7fffffffdd08) at /home/pag/Code/remill/remill/Translate.cpp:85

Convert __mcsema_memory_order into a third parameter to control-flow intrinsics instead of having it be a global variable.

I have a hunch that this will further improve compiler optimizations, as well as tightening the optimized code to not include so many loads/stores.

Add State structure reference to memory access intrinsics.

This would probably be very useful, especially for implementations that want to access things like the program counter from within the memory intrinsics.

Improve external dependencies

I think dependency management needs to be improved. Right now the bootstrap.sh script downloads and builds some packages from source. It would be preferable to use existing package managers to get these libraries.

Provide a script for unpacking and installing XED to a common library directory, e.g. /usr/local/lib. This should allow the build process to avoid nasty hacks related to linking against the third-party folder.
Use OS package managers (aptitude, homebrew, etc.) to download and install things like protobufs, glog, etc. This should also be used for pip to install the python bindings of things.
Include the protoc-produced Python- and C++-generated code for using CFG.proto. Changes to CFG.proto should result in these auto-generated files being updated in the repo.
Download and globally install the LLVM release, assuming it is not already installed.

These steps will make it easier to have a bunch of simple binaries (e.g. remill-opt, remill-lift, etc.) that can be installed to system directories, without needing to reference stuff in the third_party directory. Ideally, this type of change will enable Remill itself to be packageable.

Create testing infrastructure

Tasks:

Add CPU number access as an intrinsic.

Perhaps to be used by RDTSCP.

Investigate AFL on Remill bitcode

Function return intrinsic

Recently I've been mulling through the idea of introducing a new intrinsic, __remill_accept_function_return, and would appreciate feedback.

The idea is to be explicit about function return target. I don't think this loses generality, even in the case of something like longjmp or the pattern of doing call +5; pop reg to get the current program counter.

The idea is to use a setjmp idiom for preparing function returns. It would be a kind of sibling to __remill_function_return, which would presumably target this function (though it does not need to).

Suppose this is the code being translated. It is contrived but shows the point:

           sub_0123abc:
0123abc    call sub_0456def
0123ac1    ...   // Target code of return at 0456def.
...
           sub_0456def:
0456def    ret  // Targets 0123ac1.

Here's the gist of what code would look like, as if it were written in C++:

void __remill_sub_123abc(State *state, Memory *memory, addr_t pc) {
  CALL_NEAR_RELBRd(state, &memory, 0x456def);
  if (!__remill_accept_function_return(state, &memory, 0x123ac1)) {
    return __remill_sub_456def(state, memory, 0x456def);  // Target of call.
  } else {
    return __remill_sub_123ac1(state, memory, 0x123ac1);  // Target of return.
  }
}

void __remill_sub_456def(State *state, Memory *memory, addr_t pc) {
  RET_NEAR(state, memory);
  return __remill_function_return(state, memory, state->gpr.rip.qword);
}

One thing of note is that the __remill_accept_function_return intrinsic function takes memory by pointer, so that it can change the memory pointer used down the return path.

I think this may be nice from a static analysis perspective, especially for direct function calls. There's no real way to distinguish a direct function call from a direct jump in the optimised LLVM bitcode; this would provide such a way. It may also be useful for some kind of CFI-related instrumentation downstream, but that's not a compelling enough reason to do this.

The __remill_sub_123ac1 would still be marked as an "indirect block", so I don't think there would be any loss of generality.

Finally, I think this structure is also easily removable by a downstream tool -- it commits it to nothing. Consider replacing all uses of __remill_accept_function_return with false. Dead code elimination will turn the result into what we already have.

Micro op fusion

Consider trying to do the equivalent of Intel's micro-op fusion to make the compiled code for compare-and-jump patterns more sane.

Investigate XSAVE instruction.

Use the new Intel XED kits instead of PIN kits.

The XED kits can be found here:
https://software.intel.com/en-us/protected-download/267266/560870/step2

Partially unimplemented instruction intrinsic

In some cases it might be valuable to have a partially (un)implemented instruction. For example, some complex FPU instructions like FPATAN have two real components:

The mechanics of how the FPU stack is modified, and
The actual algorithm (e.g. arctangent).

The former should be implemented in the partial instruction, as it is arch-specific and depends on no special features. The latter should be stubbed out in some way. McSema1 currently does something like this by using LLVM intrinsics. This is probably the most sensible approach.

Unimplemented instruction intrinsic

Implement an unimplemented instruction intrinsic. This intrinsic should be treated as a control-flow intrinsic, be given the current and next program counters, the bytes of the instruction, and a reference to the State structure. In a practical application, this intrinsic could be implemented via micro-execution of the instruction, or via full VM-based emulation (a la Unicorn).

MMX Instruction support.

Implement and test the following MMX instructions:

Investigate PointsTo on Remill

Encode CFG as meta-data.

Replace some code in condition flag computation with compiler builtins

For example, use __builtin_parity for parity computation, as opposed to doing it manually. I think this will improve optimisation opportunities without loss of generality.
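As a rough comparison, the two approaches to computing the x86 parity flag could look like this. The function names are made up for the example; `__builtin_parity` is the GCC/Clang builtin, so the sketch assumes one of those compilers.

```cpp
#include <cstdint>

// x86 PF is set when the low 8 bits of the result contain an even number
// of 1 bits. Manual version: fold the byte onto itself with XOR so that
// bit 0 ends up holding the XOR of all eight bits.
inline bool ParityFlagManual(uint64_t result) {
  uint8_t b = static_cast<uint8_t>(result);
  b ^= b >> 4;
  b ^= b >> 2;
  b ^= b >> 1;
  return (b & 1) == 0;
}

// Builtin version: __builtin_parity returns 1 for an odd number of set
// bits, so the flag is its negation on the low byte.
inline bool ParityFlagBuiltin(uint64_t result) {
  return !__builtin_parity(static_cast<uint8_t>(result));
}
```

Both functions agree on every input; the builtin form gives the optimizer a single recognizable operation rather than a chain of shifts and XORs.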

Pre-publication tasks

Convert code to use LLVM-like instruction functions as opposed to depending on implicit C/C++ semantics, e.g. using inline functions like ZExtend, FMul, etc.
Replace order_t with an abstract Memory * variable, and pass it by value through the basic block functions and into instruction semantics functions. This will get closer to describing the small-step semantics of memory-modifying code, and clean up the optimized bitcode substantially.
Implement a Binary Ninja-based get_cfg. This will be much simpler than McSema's one.
Fix the IDA get_cfg program.
Documentation describing some of the finer points of the design.
Have the get_cfg program output one .cfg proto file per function, as opposed to one large proto. Use cfg_to_bc chaining functionality to build up a single bitcode file.
Make some kind of front-end script, kind of like how gcc or clang can do all sorts of things via clang ..., e.g. remill .... It would be cool if you could use remill in a Makefile, sort of like you can with an everyday compiler.

Consider using SQLite in place of protobufs

Some of the advantages are:

Does not have a limit on the file size. Protobufs have a limit, unless a special API is used to increase the limit
Tools using mcsema2 can extend the database in their own ways. Using SQLite could thus provide a way to share data across tools.

Add CPU time stamp counter access as an intrinsic.

Represent ST registers passed to instruction semantic functions as integral indexes, e.g. ST(0) is 0.

That way the stack register can be computed in terms of the data pointer and the index argument passed to the semantic function.
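A minimal model of that index computation is shown below. `FpuStack` and `StackReg` are hypothetical, and the sketch assumes the usual x87 convention that ST(i) names physical slot (TOP + i) mod 8.

```cpp
#include <cstdint>

// Hypothetical x87 stack model: eight physical slots plus the TOP field.
struct FpuStack {
  double slots[8] = {};
  uint8_t top = 0;  // Index of the physical slot that ST(0) refers to.
};

// ST(index) names physical slot (TOP + index) mod 8. The semantics function
// receives only the small integer index, not a pointer to a fixed slot.
inline double &StackReg(FpuStack &fpu, unsigned index) {
  return fpu.slots[(fpu.top + index) & 7u];
}
```

Because the lookup goes through `top` at execution time, the same lifted code stays correct after pushes and pops have rotated the stack.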

Clang static analyzer

Update the build system's --dry_run option (or add something similar) to produce a compilation database [1] that can be consumed by the Clang Static Analyzer.

[1] http://clang.llvm.org/docs/JSONCompilationDatabase.html

Investigate using OpenCL's ext_vector_type attribute to back vecN_t types.

It's possible that this would work better for ARM's NEON implementation of SIMD. Right now I use the vector_size attribute.

Implement IDA script for producing one or more CFG protos

There used to be one but it was overcomplicated and hacky (see commit history). The proto format has since changed, so having a new script would be helpful.

Eliminate (n)curses dependency

I am not even sure what needs it. It may actually be LLVM libraries. If so, then this issue isn't really doable.

Implement conditional interrupts like conditional branches

Conditional branches modify the BRANCH_TAKEN variable. The translator then uses the value of this variable to decide to tail-call to one block function or another. I think that conditional interrupt instructions, like into and bound can be similarly implemented. There is already the INTERRUPT_TAKEN variable that is modified, as shown below:

DEF_ISEL_SEM(INTO) {
  INTERRUPT_TAKEN = FLAG_OF;
  INTERRUPT_VECTOR = 4;
}

In the above code, INTERRUPT_TAKEN maps into a field in the State structure. I think this is particularly ugly. A better solution would be to use the BRANCH_TAKEN variable in __remill_basic_block, thereby not polluting the State structure with fields that cannot opaquely [1] be represented across architectures.

I think instead we can just "take over" BRANCH_TAKEN variable. This particular nuance is "hidden" by the code translator. For example, the semantics of JO (jump on overflow) are:

DEF_SEM(JO, R8W cond, PC taken_pc, PC not_taken_pc) {
  auto take_branch = FLAG_OF;
  Write(cond, take_branch);
  Write(REG_PC, Select<addr_t>(take_branch, taken_pc, not_taken_pc));
}

The R8W cond argument is actually a pointer to the BRANCH_TAKEN. We can see the addition of this argument in the DecodeConditionalBranch code.

[1] What I mean by "opaquely" is that "front line" code implementing the __remill_interrupt_call intrinsic should not need to know the actual contents of the state structure itself. Imagine a scenario where you have a symbolic executor, and you point it at some lifted bitcode, as well as a shared library implementing a system call model. The implementation of __remill_interrupt_call should only "pass the buck" into the shared library's code. It should not need to inspect a field telling it if the interrupt should be taken, because then it needs to know the structure of the State struct, and thus would not be architecture-agnostic. Of course, we still have the pesky interrupt vector field, which is a nuisance. There is a solution to this, though. The State structure could be a derived class of an architecture-neutral class with common elements. The opaque implementations of intrinsics could generically operate on the State structure's base class, thus gaining extra info. I think this is acceptable for interrupt vector numbers, but not acceptable for conditional execution of an interrupt.

Make sure initial SSE / AVX / AVX512 register state is all 1s for tests

This should be an effective way to verify that (non-)zeroing of higher bits of SSE or AVX registers happens correctly.
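The reason an all-ones starting state is useful can be seen in a small model. `Vec512`, `AllOnes`, `WriteSse128` and `WriteVex128` are hypothetical helpers; the sketch relies only on the rule that legacy SSE writes preserve the upper bits of the wider register while VEX-encoded 128-bit writes zero them.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

using Vec512 = std::array<uint8_t, 64>;  // One ZMM-sized register.

// Initial test state: every bit set.
inline Vec512 AllOnes() {
  Vec512 v;
  v.fill(0xFF);
  return v;
}

// Legacy SSE write: replaces the low 16 bytes, leaves the rest untouched.
inline void WriteSse128(Vec512 &reg, const uint8_t (&value)[16]) {
  std::memcpy(reg.data(), value, 16);
}

// VEX-encoded 128-bit write: replaces the low 16 bytes, zeroes the rest.
inline void WriteVex128(Vec512 &reg, const uint8_t (&value)[16]) {
  reg.fill(0);
  std::memcpy(reg.data(), value, 16);
}
```

Starting from zero, both writes would leave identical register contents and a semantics bug that confuses them would go unnoticed. Starting from all ones, the upper 48 bytes differ (0xFF versus 0x00), so a comparison against native execution catches the mistake.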

X87 Instruction support

Implement and test the following instructions:

Pattern match to boil flag uses in jCC to the intended operation.

This may end up being a bit x86-specific, but maybe not. The key is to do this before force-inlining some of those flag functions.

Type conversion instruction support

Implement and test the following instructions:

Implement MMX instructions.

This issue relates to #19.

ida_get_cfg doesn't recognize every basic block

Sometimes it will miss basic blocks that it knows should exist (e.g. target of a direct call).

Fix build of x86 test generators/runners.

Somehow cfg_to_bc produces bitcode that contains an invalid record here:
http://code.woboq.org/llvm/llvm/lib/Bitcode/Reader/BitcodeReader.cpp.html#3851

What is a good strategy to debug this? Some ideas:

Inject bitcode dump() invocations into the bitcode reader to see how far it gets.
Trace the program's execution to see what sequence of calls/returns leads to the error.
Try to dump the bitcode that is lifted for each function as it is produced by cfg_to_bc.

Crashing compile_semantics.sh on Ubuntu 16.04.1

From a freshly installed remill repo, I followed the README and got to ./scripts/bootstrap.sh
It crashes while running scripts/compile_semantics.sh

Building for x86
In file included from /remill/remill/Arch/X86/Runtime/Instructions.cpp:5:
In file included from /remill/remill/Arch/Runtime/Intrinsics.h:6:
/remill/remill/Arch/Runtime/Types.h:6:10: fatal error: 'cstdint' file not
      found
#include <cstdint>
         ^
1 error generated.
In file included from /remill/remill/Arch/X86/Runtime/BasicBlock.cpp:3:
In file included from /remill/remill/Arch/X86/Runtime/State.h:21:
In file included from /remill/remill/Arch/Runtime/Runtime.h:14:
In file included from /remill/remill/Arch/Runtime/Intrinsics.h:6:
/remill/remill/Arch/Runtime/Types.h:6:10: fatal error: 'cstdint' file not
      found
#include <cstdint>
         ^
1 error generated.
clang-3.8: error: no such file or directory: '/remill/generated/Arch/X86/Runtime/sem_x86_instr.bc'
clang-3.8: error: no input files
/remill/third_party/bin/llvm-link: /remill/generated/Arch/X86/Runtime/sem_x86_block.bc: error: Could not open input file: No such file or directory
0  llvm-link       0x0000000000574308
1  llvm-link       0x0000000000574977
2  libpthread.so.0 0x00007f57616da3d0
3  llvm-link       0x00000000004dd2e1
4  llvm-link       0x0000000000409257
5  llvm-link       0x000000000040845c
6  llvm-link       0x00000000004070b2
7  libc.so.6       0x00007f5760865830 __libc_start_main + 240
8  llvm-link       0x0000000000406e19
Stack dump:
0.  Program arguments: /remill/third_party/bin/llvm-link -o=/remill/generated/sem_x86.bc /remill/generated/Arch/X86/Runtime/sem_x86_block.bc /remill/generated/Arch/X86/Runtime/sem_x86_instr.opt.bc 
./scripts/compile_semantics.sh: line 23: 66120 Segmentation fault      $DIR/third_party/bin/llvm-link -o=$DIR/generated/${FILE_NAME}.bc $DIR/generated/Arch/X86/Runtime/${FILE_NAME}_block.bc $DIR/generated/Arch/X86/Runtime/${FILE_NAME}_instr.opt.bc
Error: Building for x86

I attached a quick cxxflags change in compile_semantics.sh that fixed this for me.
cxxflags_diff.txt

Upgrade to LLVM 3.9

Intermediate milestones:

Update bootstrap.sh to download new code for LLVM.
Change libOptimize into no longer being an LLVM pass, but instead being a tool. I think this will simplify the build process in a number of ways. Name this new tool remill-opt.
Rename cfg_to_bc to remill-lift.

Add undef dead store eliminator to libOptimize

This should eliminate stores of undef values, as well as memsets of undef values.

Add jump tables to CFG proto

A new structure in the protobuf that represents indirect control flows. This should be represented as a pair of program counters: the address of the control-flow instruction, and the address of the targeted block. This field should also have some kind of flow type annotation, e.g. is_local, that semantically says that the source block and target block logically belong to the same function.
Represent jump tables using this new format. That is, each entry of the jump table should be an element of this structure.
Update the data-flow analyses in Remill to understand indirect flows via jump tables. This will improve dead register and flag elimination.
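An in-memory mirror of the proposed structure might look like the sketch below. `IndirectFlow` and `FlowsFromJumpTable` are hypothetical names, the field names follow the description above rather than any existing CFG.proto, and marking every jump table entry as local is an assumption made for the example.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical in-memory mirror of the proposed protobuf message.
struct IndirectFlow {
  uint64_t source_pc;  // Address of the control-flow instruction.
  uint64_t target_pc;  // Address of the targeted block.
  bool is_local;       // Source and target belong to the same function.
};

// Each jump table entry becomes one IndirectFlow element. Entries are
// assumed to stay within the function containing the jump.
inline std::vector<IndirectFlow> FlowsFromJumpTable(
    uint64_t jump_pc, const std::vector<uint64_t> &table_entries) {
  std::vector<IndirectFlow> flows;
  flows.reserve(table_entries.size());
  for (uint64_t target : table_entries) {
    flows.push_back({jump_pc, target, true});
  }
  return flows;
}
```

A data-flow analysis can then treat the resulting elements as ordinary intra-procedural edges from the jump's block to each target block.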

Create a new arch feature intrinsic

I think CPUID should be handled by a special control-flow intrinsic, __mcsema_arch_read_features. The purpose of making it a control-flow intrinsic, kind of like __mcsema_function_call, is to implicitly represent that the behaviour of the intrinsic is undefined (i.e. it can read/write the machine state in an arbitrary way) and therefore unobservable to static analysis. It also comes with the benefit that the synchronizing nature of the instruction would be somewhat implicit in its use as a flow intrinsic.

Implement Binary Ninja-based get_cfg

This should be pretty straightforward since all it needs are basic blocks, a list of known code symbol names, and whether or not those symbols are exported.

Add multiply strength reduction to libOptimize

The following pattern sometimes comes up:

  %26 = mul nsw i128 %25, -8198552921648689607
  %27 = trunc i128 %26 to i64
  store i64 %27, i64* %3, align 8
  store i64 4216879, i64* %5, align 8
  %trunc = trunc i128 %26 to i32

The only uses of %26 are trunc instructions, so I should be able to strength reduce the 128-bit multiplication to be a 64-bit multiplication.
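The reduction is sound because the low 64 bits of a product depend only on the low 64 bits of the operands. The sketch below checks that identity in C++; the function names are invented, and it assumes the GCC/Clang `__int128` extension. Note that the `nsw` flag on the original 128-bit multiply would not automatically carry over to the narrowed one.

```cpp
#include <cstdint>

// Requires the GCC/Clang __int128 extension.
using u128 = unsigned __int128;

// Shape of the original IR: multiply at 128 bits, then truncate to 64.
inline uint64_t WideMulThenTrunc(u128 x, u128 c) {
  return static_cast<uint64_t>(x * c);
}

// Strength-reduced form: truncate the operands first, then multiply at
// 64 bits with wrap-around.
inline uint64_t TruncThenNarrowMul(u128 x, u128 c) {
  return static_cast<uint64_t>(x) * static_cast<uint64_t>(c);
}
```

Both functions return the same value for every pair of inputs, including the sign-extended constant from the IR snippet, so a pass can rewrite the multiply whenever all of its users are truncations to 64 bits or narrower.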

Replace build.py with CMake

Ideally, it would be cool if there were a way to "install" Remill somewhere.