killvxk / mcsema

This project forked from lifting-bits/mcsema

x86 to machine code translation framework

License: Other

CSS 0.20% C++ 88.43% Perl 0.03% Shell 0.84% C 1.55% Max 0.02% XSLT 0.40% Python 1.50% Makefile 0.19% Assembly 1.98% TeX 0.01% Perl 6 0.01% Tcl 0.01% JavaScript 0.02% C# 0.02% Objective-C 4.76% Rebol 0.01% PHP 0.03% Ruby 0.01% Bison 0.01%

mcsema's Introduction

MC-Semantics

MC-Semantics (or mcsema, pronounced 'em see se ma') is a library to translate the semantics of native code to LLVM IR. The MC-Semantics project is separated into a few sub-projects:

  • Control Flow Recovery
  • Instruction Semantics
  • Binary File Parsing
  • Semantics Testing

We hope that this library is useful to the program analysis and reverse engineering community. Currently it supports the translation of semantics for x86 programs and supports subsets of integer arithmetic, floating point, and vector operations. Work is in progress, and additional semantics are constantly being added.

Patches are welcome.

News

09/01/2014: MC-Semantics now builds on Linux and supports ELF object files. Linux support is not as well tested as Windows support, and it currently assumes that all indirect branches and callbacks target translated code.

Separation of Components

MC-Semantics is separated into two conceptual parts: control flow recovery and instruction translation.

The two parts communicate via a control flow graph structure that contains native code. This control flow graph structure connects basic blocks and defines information about external calls, but provides no further semantic information.
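As a rough illustration of the kind of structure described above, here is a minimal Python sketch. It is not mcsema's actual protocol buffer schema: the class names (`BasicBlock`, `ExternalCall`, `ControlFlowGraph`), their fields, and the sample bytes are all hypothetical and chosen only to show basic blocks of native code linked by edges, plus a table of external calls.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BasicBlock:
    """A run of native instruction bytes with a single entry point (hypothetical)."""
    address: int                 # virtual address of the first instruction
    code: bytes                  # raw native bytes, carried without interpretation
    successors: List[int] = field(default_factory=list)  # addresses of following blocks


@dataclass
class ExternalCall:
    """A call that leaves the recovered code, e.g. into a system library (hypothetical)."""
    name: str
    argument_count: int
    returns: bool = True


@dataclass
class ControlFlowGraph:
    blocks: Dict[int, BasicBlock] = field(default_factory=dict)
    externals: Dict[str, ExternalCall] = field(default_factory=dict)

    def add_block(self, block: BasicBlock) -> None:
        self.blocks[block.address] = block

    def reachable_from(self, entry: int) -> List[int]:
        """Return block addresses reachable from entry, in depth-first preorder."""
        seen, order, stack = set(), [], [entry]
        while stack:
            addr = stack.pop()
            if addr in seen or addr not in self.blocks:
                continue
            seen.add(addr)
            order.append(addr)
            # Push successors in reverse so the first successor is visited first.
            stack.extend(reversed(self.blocks[addr].successors))
        return order


cfg = ControlFlowGraph()
cfg.add_block(BasicBlock(0x1000, b"\x31\xc0\x74\x02", [0x1004, 0x1006]))  # xor eax,eax; jz +2
cfg.add_block(BasicBlock(0x1004, b"\x40\x90", [0x1006]))                  # inc eax; nop
cfg.add_block(BasicBlock(0x1006, b"\xc3"))                                # ret
cfg.externals["printf"] = ExternalCall("printf", argument_count=1)
print([hex(a) for a in cfg.reachable_from(0x1000)])
```

Note that the sketch stores only bytes, edges, and external call descriptions; nothing in it says what an instruction does, which mirrors the point that the graph carries no further semantic information.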

The bin_descend program attempts to recover a control flow graph from a given binary file. It will write the recovered control flow graph into a Google Protocol Buffer serialized file. There is also an IDAPython script to recover control flow from within IDA Pro.

The cfg_to_bc program attempts to convert a control flow graph structure into LLVM bitcode. This translation process is more a transcription act than an analysis, since a control flow structure has already been recovered.

The problems of instruction semantics and control flow recovery are separated. Any recovered control flow graph, from any mechanism, may be analyzed and studied in an LLVM intermediate representation.
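The separation can be pictured with a small Python sketch. Everything in it is an assumption made for illustration: the `Cfg` type, the two stand-in recovery functions, and the `count_edges` consumer are hypothetical and do not correspond to mcsema code. The point is only that a consumer written against a shared graph format works with a graph from any producer.

```python
from typing import Callable, Dict, List

# Hypothetical interchange format: block address -> successor addresses.
Cfg = Dict[int, List[int]]


def recover_by_linear_sweep() -> Cfg:
    """Stand-in for one recovery front end; returns a hard-coded graph."""
    return {0x1000: [0x1004, 0x1006], 0x1004: [0x1006], 0x1006: []}


def recover_from_disassembler_export() -> Cfg:
    """Stand-in for a second, unrelated recovery front end."""
    return {0x2000: [0x2010], 0x2010: []}


def count_edges(cfg: Cfg) -> int:
    """A consumer that depends only on the interchange format, not on the producer."""
    return sum(len(successors) for successors in cfg.values())


front_ends: List[Callable[[], Cfg]] = [
    recover_by_linear_sweep,
    recover_from_disassembler_export,
]
for recover in front_ends:
    print(recover.__name__, count_edges(recover()))
```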

Documentation

Detailed design and usage information can be found in the docs directory.

Building

Detailed build instructions for Windows and Linux are at docs/BUILDING.md. If you use Ubuntu 14.04, then bash bootstrap.sh will install dependencies via apt-get and compile the release version of the tools into a directory called build. The entire process can take over 40 minutes.

Usage

Usage instructions, with examples, are at docs/TOOLS.md. For more examples, see the demos described in docs/DEMOS.md.

Most of the documentation uses Windows-based examples, but nearly everything should work the same way on Linux.

Source Code Information

The layout of the source code is described in docs/NAVIGATION.md. The description of the protocol buffer layout and the translation process is in docs/USAGE_AND_APIS.md.

External Code

mcsema uses external code which has been included in this source release:

  • LLVM 3.2
  • Google Protocol Buffers
  • Boost

mcsema also uses external code which has not been included in this source release, but is freely available:

  • Intel Pin 2.10

Contact

For any questions, contact [email protected].

There is a mailing list dedicated to mcsema: [email protected]. It can also be accessed on the web at https://groups.google.com/forum/?hl=en#!forum/mcsema-dev

mcsema's People

Contributors

artemdinaburg, dguido, mewmew, sineaggi
