LibMatch: context-based library matching w/ angr

LibMatch is a proof-of-concept tool for matching object files to binary executables. The key idea is context: in firmware, many functions look very similar, which confuses tools such as FLIRT. Sometimes functions are entirely identical, yet we still care which name each one carries when re-hosting with High Level Emulation. Context also lets us use imports to indirectly name functions we don't have the code for, or whose code changed because of compiler flags or version mismatches.
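To make the "context" idea concrete, here is a toy sketch. It is not LibMatch's actual algorithm or data model: the body hashes, addresses, and helper names are made up for illustration. Two library functions have identical bodies, so matching on the body alone is ambiguous, and the callees that are already uniquely named break the tie.

```python
# Toy illustration only: all names, hashes, and addresses are hypothetical.
# library: function name -> (body hash, set of callee names)
library = {
    "HAL_UART_Init": ("hash_A", {"HAL_UART_MspInit"}),
    "HAL_SPI_Init": ("hash_A", {"HAL_SPI_MspInit"}),  # same body as UART init
    "HAL_UART_MspInit": ("hash_B", set()),
    "HAL_SPI_MspInit": ("hash_C", set()),
}
# target: function address -> (body hash, set of callee addresses)
target = {
    0x100: ("hash_A", {0x300}),
    0x200: ("hash_A", {0x400}),
    0x300: ("hash_B", set()),
    0x400: ("hash_C", set()),
}


def naive_candidates(library, target):
    """Match on body hash alone; identical bodies yield several candidates."""
    by_hash = {}
    for name, (body, _callees) in library.items():
        by_hash.setdefault(body, set()).add(name)
    return {addr: set(by_hash.get(body, ())) for addr, (body, _c) in target.items()}


def refine_with_context(library, target, candidates):
    """Drop candidates whose library callees contradict already-named callees."""
    candidates = {addr: set(names) for addr, names in candidates.items()}
    changed = True
    while changed:
        changed = False
        for addr, names in candidates.items():
            if len(names) <= 1:
                continue
            # Collect the names of callees that are already uniquely resolved.
            known = set()
            for callee in target[addr][1]:
                callee_names = candidates.get(callee, set())
                if len(callee_names) == 1:
                    known |= callee_names
            kept = {n for n in names if known <= library[n][1]}
            if kept and kept != names:
                candidates[addr] = kept
                changed = True
    return candidates


naive = naive_candidates(library, target)
refined = refine_with_context(library, target, naive)
```

In this toy, `naive[0x100]` holds both init functions, while `refined[0x100]` holds only the one whose callee matches.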

This tool is meant to go with, and was developed alongside, HALucinator (https://github.com/embedded-sec/halucinator) and hal-fuzz (https://github.com/ucsb-seclab/hal-fuzz).

Installing

EDG notes: This is a proof-of-concept. It uses tons of RAM and isn't the world's most efficient tool. It does get the job done, and we've used it successfully on real firmware; it just needs a little refactoring before I'd say it's ready for prime time.

Shortcut: Now with Docker!

If you're the kind of person that doesn't hate Docker, you can try libmatch fast using this handy Dockerfile!

docker build -t libmatch .

...and much later...

docker run -it libmatch /bin/bash

Manual setup

First, get angr (https://angr.horse/). I suggest using the angr-dev package to do so (https://github.com/angr/angr-dev/).

You'll also need autoblob, a CLE loader that I wrote, which helps with some binary blob loading (https://github.com/subwire/autoblob/).

Usage

Once you have an angr environment, you can use the ./utils/unblob tool to build some databases. Put all your objects in a folder structure like:

./objects/my_hal/library1/obj1.o
./objects/my_hal/library1/obj2.o
./objects/my_hal/library2/foo/obj3.o

Objects can live at any depth of subfolders you like. You can even just copy the build tree of the SDK or library into a folder.
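Before kicking off a long database build, it can help to check what is actually in the tree. The sketch below is a hypothetical helper, not part of libmatch, and it assumes your objects use the `.o` extension shown above. It lists every object file at any depth, using a throwaway copy of the example layout:

```python
import tempfile
from pathlib import Path


def find_objects(root):
    """Return every *.o file under root, at any depth, as sorted relative paths."""
    root = Path(root)
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*.o") if p.is_file()
    )


def make_example_tree(root):
    """Recreate the example layout from this README, plus one non-object file."""
    for rel in (
        "library1/obj1.o",
        "library1/obj2.o",
        "library2/foo/obj3.o",
        "library2/README.txt",
    ):
        path = Path(root) / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(b"")


with tempfile.TemporaryDirectory() as tmp:
    make_example_tree(tmp)
    found = find_objects(tmp)

print(found)
```

Point `find_objects` at `./objects/my_hal` to get the same listing for your own tree.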

Then, do the following:

./utils/unblob -B ./objects/my_hal ./objects/my_hal.lmdb

...and go get a coffee.

Once that's done, grab your blob or ELF, and do:

./utils/unblob -U -L ./objects/my_hal.lmdb -Y ./bins/my_firmware.bin ./bins/my_firmware.yml

...and go get a much smaller coffee. This will produce a YML file of symbols, immediately ready to be ingested by HALucinator or hal-fuzz.

Curious how well it's doing? Debugging problems? Got an ELF with symbols? Try this:

./utils/unblob -U --scoring -L ./objects/my_hal.lmdb -Y ./bins/my_firmware.elf ./my_firmware.yml

This will produce nice colorful debug output with accuracy and collision information.
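If you'd rather script these steps than type them, a thin wrapper is enough. The sketch below only reassembles the exact command lines shown above; the function names are hypothetical, and it assumes you run it from the repository root so that `./utils/unblob` resolves.

```python
import subprocess

TOOL = "./utils/unblob"  # path as used in the commands above


def build_db_cmd(objects_dir, db_path):
    """Command line for building a database from a folder of objects."""
    return [TOOL, "-B", objects_dir, db_path]


def match_cmd(db_path, binary, out_yml, scoring=False):
    """Command line for matching a blob or ELF against a database."""
    cmd = [TOOL, "-U"]
    if scoring:
        cmd.append("--scoring")  # scoring mode, for ELFs that carry symbols
    cmd += ["-L", db_path, "-Y", binary, out_yml]
    return cmd


def run_pipeline(objects_dir, db_path, binary, out_yml, scoring=False):
    """Build the database, then match; raises CalledProcessError on failure."""
    subprocess.run(build_db_cmd(objects_dir, db_path), check=True)
    subprocess.run(match_cmd(db_path, binary, out_yml, scoring), check=True)


if __name__ == "__main__":
    run_pipeline(
        "./objects/my_hal",
        "./objects/my_hal.lmdb",
        "./bins/my_firmware.bin",
        "./bins/my_firmware.yml",
    )
```

Passing argument lists rather than a shell string avoids quoting problems when paths contain spaces.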

Example

Some of these databases get rather big -- this is something we'd like to optimize, but for now, we include a few examples so you can see the process in action.

Here's one, start-to-finish:

./utils/unblob -B ./objects/arm-none-eabi ./objects/arm-none-eabi.lmdb

...wait some time...

This will build an LMDB of the STM32 HAL, mbed, and some other assorted stuff.

You can give it a try on our test binary. This will run in scoring mode (used for metrics gathering). It will first output "naive" results (without context), then ask you to hit Enter, and then print the final results. We use an ELF here for ground truth, but of course you can use this on the blob version of the same file too!

./utils/unblob -U --scoring -L ./objects/arm-none-eabi.lmdb -Y ./bins/Nucleo_i2c_master.elf ./bins/Nucleo_i2c_master_addrs.yml

TODOs and future work

  • Re-work the exact matching so it does not require the full CFG and a lifted binary. Perhaps use an LSH approach (EDG hypothesizes this won't work as well as the Ghidra authors claim).

  • Optimize LMDB storage format to use shelf or similar, to avoid massive memory usage.
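As a rough idea of what the storage TODO could look like, here is a minimal sketch that assumes "shelf" refers to Python's standard `shelve` module. The record layout and function names are hypothetical and do not reflect libmatch's current format. Records are written one at a time and read back by key, so the whole database never has to sit in memory at once.

```python
import os
import shelve
import tempfile


def store_records(path, records):
    """Write (name, record) pairs to an on-disk shelf, one at a time."""
    with shelve.open(path) as db:
        for name, record in records:
            db[name] = record


def lookup(path, name):
    """Fetch a single record by key without loading the other entries."""
    with shelve.open(path, flag="r") as db:
        return db.get(name)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example_db")
    # A generator keeps only one (hypothetical) record in memory while writing.
    store_records(path, (("func_%d" % i, {"size": i * 4}) for i in range(3)))
    hit = lookup(path, "func_2")
    miss = lookup(path, "not_there")
```

Whether this would help in practice depends on how matching accesses the database; lookups by key are cheap, but anything that needs every record at once would still pay the full cost.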
