
Comments (2)

surovic commented on May 19, 2024

This will need some progress on #4


surovic commented on May 19, 2024

So, after one day's worth of effort, I've been able to take (very slightly modified) McSema-produced bitcode for test.c, run it through the RemillArgumentRecovery and RemillStackRecovery IR passes, and produce C pseudocode using the AST passes in fcd. The output for function main() is as follows:

uint64_t sub_400566_main(uint64_t RSP8, uint64_t RSP16, uint64_t RSP24, uint64_t RSP32, uint64_t RSP40, uint64_t RSP48)
{
    uint64_t alloca7;
    uint64_t alloca11;
    uint64_t alloca14;
    uint64_t alloca15;
    uint64_t alloca16;
    uint64_t alloca1 = RSP8;
    uint64_t alloca2 = RSP16;
    uint64_t alloca3 = RSP48;
    uint64_t alloca4 = RSP40;
    uint64_t alloca5 = RSP32;
    uint64_t alloca6 = RSP24;
    uint64_t anon8 = (uint64_t){{0, 0, 0, 0}};
    alloca7 = anon8;
    uint64_t anon10 = (uint64_t)&alloca11 | 1;
    alloca9 = anon10 + 42;
    uint64_t anon12 = (uint64_t){{1, 0, 2, 0, 0, 0, 0, 0}, {71, 108, 111, 98, 97, 108, 32, 118, 97, 114, 105, 97, 98, 108, 101, 32, 39, 97, 39, 32, 111, 102, 32, 118, 97, 108, 117, 101, 32, 37, 117, 32, 97, 116, 32, 97, 100, 100, 114, 101, 115, 115, 32, 37, 112, 32, 105, 115, 32, 0}, {101, 118, 101, 110, 46, 0}, {111, 100, 100, 46, 0}};
    uint32_t* anon13 = (uint32_t*)anon8;
    printf(anon12 + 8 & 0xffffffff, (__zext uint64_t)*anon13, anon8, __undefined, __undefined, __undefined, alloca14, alloca7, alloca15, *(uint64_t*)alloca16, alloca1, alloca2, alloca6, alloca5, alloca4, alloca3);
    uint64_t alloca9 = anon10 + ((*anon13 & 1) != 0 ? 67 : 55) + 10;
    if ((*anon13 & 1) != 0)
    {
        puts(anon12 + 64 & 0xffffffff);
    }
    else 
    {
        puts(anon12 + 58 & 0xffffffff);
    }
    return 0;
}
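The byte arrays in the anon12 initializer are easier to read once decoded. The following Python sketch copies them from the pseudocode above and reads the NUL-terminated strings at the offsets used by the printf and puts calls (8, 58 and 64). It assumes the four arrays are laid out back to back with no padding, which the pseudocode does not state explicitly; the helper name c_string_at is made up for this illustration.

```python
# Byte arrays copied verbatim from the anon12 initializer in the pseudocode.
HEADER = [1, 0, 2, 0, 0, 0, 0, 0]
FORMAT = [71, 108, 111, 98, 97, 108, 32, 118, 97, 114, 105, 97, 98, 108, 101,
          32, 39, 97, 39, 32, 111, 102, 32, 118, 97, 108, 117, 101, 32, 37,
          117, 32, 97, 116, 32, 97, 100, 100, 114, 101, 115, 115, 32, 37, 112,
          32, 105, 115, 32, 0]
EVEN = [101, 118, 101, 110, 46, 0]
ODD = [111, 100, 100, 46, 0]

# Assumption: the arrays are contiguous in memory, with no padding between them.
blob = bytes(HEADER + FORMAT + EVEN + ODD)


def c_string_at(data: bytes, offset: int) -> str:
    """Return the NUL-terminated ASCII string starting at `offset`."""
    end = data.index(0, offset)
    return data[offset:end].decode("ascii")


# Offsets taken from the calls: printf(anon12 + 8), puts(anon12 + 58), puts(anon12 + 64).
for offset in (8, 58, 64):
    print(offset, repr(c_string_at(blob, offset)))
```

Under that layout assumption, offset 8 is the printf format string, offset 58 (the else branch) is "even." and offset 64 (the branch taken when the low bit of *anon13 is set) is "odd.", so the recovered control flow matches the parity check.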

I think the whole experiment can be summarized in the following points:

  1. Does fcd work with McSema bitcode?

In principle, it does. However, it is likely unstable, and the output is of fairly low quality.

  2. If it crashes, what seems to be the issue?

Currently, the two main reasons fcd crashes with McSema bitcode are: a) __remill_basic_block() was not preserved; b) the AST passes don't support an IR construct present in the McSema bitcode.

  3. What opportunities might there be with running on McSema-lifted bitcode?

For fcd, the benefits are better CFG recovery, better recovery of binary data other than executable code (global variables, static constants, ...) and, last but not least, test cases for argument recovery, stack recovery and pseudocode generation.

For McSema, the benefits are access to an easily hackable LLVM pass pipeline with support for passes written in Python, and a quick overview of a binary in a high-level language.

  4. Is it worth the time?

In my opinion, definitely. CFG recovery and lifting of all binary data (not just executable code) are not trivial tasks. Reusing existing code for them frees up developer time for more meaningful work. I also think that each iterative improvement to RemillArgumentRecovery, RemillStackRecovery and the AST passes will noticeably improve the output.
