iovfi's Introduction

IOVec Function Identification (IOVFI)

IOVFI is a dynamic function identifier for stripped binaries. It measures the program state changes a function performs when given a particular input program state, and uses those changes as a unique fingerprint for identifying the function later in unknown binaries. The combination of an input program state and the resulting output program state is called an IOVec, or Input/Output Vector. IOVecs are stored in a binary decision tree, which can then be used to identify functions quickly.
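To make the idea concrete, here is a minimal toy sketch in Python. It is not IOVFI's implementation: the real tool captures program state under Valgrind, whereas this sketch models program state as plain argument tuples and return values. The names IOVec, TreeNode, and identify below are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Tuple


@dataclass(frozen=True)
class IOVec:
    """Toy IOVec: an input state paired with the expected output state."""
    inputs: Tuple[Any, ...]
    expected: Any

    def accepts(self, func: Callable[..., Any]) -> bool:
        # A function "accepts" the IOVec if running it on the input state
        # produces the expected output state. Exceptions count as rejection.
        try:
            return func(*self.inputs) == self.expected
        except Exception:
            return False


@dataclass
class TreeNode:
    """Interior nodes hold an IOVec; leaves hold an equivalence-class label."""
    iovec: Optional[IOVec] = None
    accept: Optional["TreeNode"] = None
    reject: Optional["TreeNode"] = None
    label: Optional[str] = None


def identify(node: Optional[TreeNode], func: Callable[..., Any]) -> Optional[str]:
    # Walk the tree, branching on whether func accepts each node's IOVec.
    while node is not None and node.iovec is not None:
        node = node.accept if node.iovec.accepts(func) else node.reject
    return node.label if node is not None else None


# Hand-built tree that separates an abs-like from a negate-like function.
tree = TreeNode(
    iovec=IOVec(inputs=(-3,), expected=3),
    accept=TreeNode(
        iovec=IOVec(inputs=(3,), expected=3),
        accept=TreeNode(label="abs"),
        reject=TreeNode(label="negate"),
    ),
    reject=None,  # no known function matches this path
)

print(identify(tree, lambda x: x if x >= 0 else -x))  # abs
print(identify(tree, lambda x: -x))                   # negate
print(identify(tree, lambda x: x + 1))                # None
```

Each lookup executes the unknown function against only the IOVecs on one root-to-leaf path, which is why a tree of this shape can answer quickly.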

Prerequisites

  1. GCC, automake, python-numpy, python-sklearn, cmake

Building

  1. mkdir cmake-build-debug && cd cmake-build-debug
  2. cmake ..
  3. cmake --build . --target BuildValgrind
     • The valgrind binary will be placed in cmake-build-debug/src/valgrind/install/bin
  4. cmake --build . --target segrind_so_loader
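If you prefer to drive the build from a script, the steps above can be sketched in Python as follows. This is only a convenience wrapper around the same commands; the function names build_commands and build are hypothetical, and unlike a plain mkdir the sketch tolerates an existing build directory.

```python
import subprocess
from pathlib import Path
from typing import List


def build_commands() -> List[List[str]]:
    # The three CMake invocations from the build steps, in order.
    return [
        ["cmake", ".."],
        ["cmake", "--build", ".", "--target", "BuildValgrind"],
        ["cmake", "--build", ".", "--target", "segrind_so_loader"],
    ]


def build(repo_root: Path, dry_run: bool = False) -> Path:
    """Run the build and return the expected path of the valgrind binary."""
    build_dir = repo_root / "cmake-build-debug"
    if not dry_run:
        build_dir.mkdir(exist_ok=True)
    for cmd in build_commands():
        if dry_run:
            # Only show what would be executed.
            print(" ".join(cmd))
        else:
            # check=True stops at the first failing step.
            subprocess.run(cmd, cwd=build_dir, check=True)
    return build_dir / "src" / "valgrind" / "install" / "bin" / "valgrind"
```

Calling build(Path("."), dry_run=True) from the top-level directory prints the commands without executing them.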

Building a decision tree

Building the decision tree involves fuzzing the target application, consolidating the generated IOVecs, and then generating the decision tree itself. The following scripts assume you are working from the top-level directory of this repo, but we recommend that you perform this in a separate directory. Adjust paths accordingly. Each script accepts a -h flag to print help.

The following scripts create two directories, _work and logs. logs contains the logs of the fuzzing and consolidation actions. _work is the working directory for the fuzzing and consolidation scripts; it contains nothing of lasting value and can be deleted once the decision tree has been created.

A fair warning: the two directories can take up a lot of space.
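To keep an eye on that disk usage, a small helper along the following lines may be useful. It is a sketch, not part of IOVFI: the function names are hypothetical, and it assumes _work and logs sit directly under the directory you ran the scripts from.

```python
import os
import shutil
from pathlib import Path
from typing import Dict


def dir_size_bytes(path: Path) -> int:
    # Sum the sizes of all regular files below path; 0 if path does not exist.
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = Path(root) / name
            if file_path.is_file() and not file_path.is_symlink():
                total += file_path.stat().st_size
    return total


def report_and_clean(base: Path, tree_file: Path,
                     remove_work: bool = False) -> Dict[str, int]:
    """Report the size of _work and logs, optionally deleting _work."""
    sizes = {name: dir_size_bytes(base / name) for name in ("_work", "logs")}
    # Only delete _work once the decision tree file actually exists.
    if remove_work and tree_file.is_file():
        shutil.rmtree(base / "_work", ignore_errors=True)
    return sizes
```

The sizes are measured before any deletion, so the returned dictionary also tells you how much space removing _work freed.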

Fuzzing a binary

  1. src/software-ethology/python/fuzz-applications.py -valgrind cmake-build-debug/src/valgrind/install/bin/valgrind -ignore tests/ignored.txt -t tree.bin -bin /path/to/binary

This creates tree.bin, the decision tree generated after fuzzing the binary for a period of time or until the code coverage threshold is exceeded. To fuzz a library, use segrind_so_loader, i.e., append -loader cmake-build-debug/bin/segrind_so_loader to the previous command.
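The two variants of the command (plain binary and library with loader) can be assembled programmatically. The sketch below uses only the flags shown above; the helper name fuzz_command and the placeholder path /path/to/library.so are hypothetical.

```python
from typing import List, Optional

FUZZ_SCRIPT = "src/software-ethology/python/fuzz-applications.py"
VALGRIND = "cmake-build-debug/src/valgrind/install/bin/valgrind"
SO_LOADER = "cmake-build-debug/bin/segrind_so_loader"


def fuzz_command(binary: str, tree: str = "tree.bin",
                 ignore: str = "tests/ignored.txt",
                 loader: Optional[str] = None) -> List[str]:
    """Build the argument list for fuzz-applications.py."""
    cmd = [FUZZ_SCRIPT, "-valgrind", VALGRIND,
           "-ignore", ignore, "-t", tree, "-bin", binary]
    if loader is not None:
        # Libraries are fuzzed through the shared-object loader.
        cmd += ["-loader", loader]
    return cmd


print(" ".join(fuzz_command("/path/to/binary")))
print(" ".join(fuzz_command("/path/to/library.so", loader=SO_LOADER)))
```

The returned list can be passed directly to subprocess.run(cmd, check=True) when run from the top-level directory of the repo.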

Semantic Function Identification

  1. src/software-ethology/python/IdentifyFunction.py -valgrind cmake-build-debug/src/valgrind/install/bin/valgrind -b /path/to/unknown/binary

This script creates a file called guesses.bin, which contains a Python dictionary mapping functions in the supplied binary (in the form of src/software-ethology/python/context/FunctionDescriptor Python objects) to equivalence classes in the tree. If a function could not be found in the decision tree, it is mapped to None.
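A quick way to inspect the result might look like the sketch below. It rests on an assumption the text above does not confirm: that guesses.bin is a pickled dictionary. If that holds, unpickling also requires the FunctionDescriptor class to be importable (for example by adding src/software-ethology/python to sys.path). The helper names load_guesses and summarize are hypothetical.

```python
import pickle
from collections import Counter
from typing import Any, Dict, Tuple


def load_guesses(path: str = "guesses.bin") -> Dict[Any, Any]:
    # ASSUMPTION: guesses.bin is a pickled dict. Only unpickle files you
    # trust, since pickle can execute arbitrary code while loading.
    with open(path, "rb") as handle:
        return pickle.load(handle)


def summarize(guesses: Dict[Any, Any]) -> Tuple[int, int]:
    """Return (identified, unidentified) function counts."""
    # A value of None marks a function that was not found in the tree.
    counts = Counter(value is not None for value in guesses.values())
    return counts[True], counts[False]
```

For example, summarize(load_guesses()) gives a first impression of how much of the unknown binary the decision tree covered.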

iovfi's Issues

Arity Inaccuracy

Instructions are missed if a conditional jump target is never the target of an unconditional jump

Output hex output ASCII characters

The runtime library function output_hex outputs printable characters if it can. This makes it difficult for the CSV parser to correctly convert types.
