Comments (2)
This will need some progress on #4
from fcd.
So, after one day's worth of effort, I've been able to take (very slightly modified) McSema-produced bitcode for test.c, run it through the `RemillArgumentRecovery` and `RemillStackRecovery` IR passes, and produce C pseudocode using the AST passes in fcd. The output for function `main()` is as follows:
```c
uint64_t sub_400566_main(uint64_t RSP8, uint64_t RSP16, uint64_t RSP24, uint64_t RSP32, uint64_t RSP40, uint64_t RSP48)
{
    uint64_t alloca7;
    uint64_t alloca11;
    uint64_t alloca14;
    uint64_t alloca15;
    uint64_t alloca16;
    uint64_t alloca1 = RSP8;
    uint64_t alloca2 = RSP16;
    uint64_t alloca3 = RSP48;
    uint64_t alloca4 = RSP40;
    uint64_t alloca5 = RSP32;
    uint64_t alloca6 = RSP24;
    uint64_t anon8 = (uint64_t){{0, 0, 0, 0}};
    alloca7 = anon8;
    uint64_t anon10 = (uint64_t)&alloca11 | 1;
    alloca9 = anon10 + 42;
    uint64_t anon12 = (uint64_t){{1, 0, 2, 0, 0, 0, 0, 0}, {71, 108, 111, 98, 97, 108, 32, 118, 97, 114, 105, 97, 98, 108, 101, 32, 39, 97, 39, 32, 111, 102, 32, 118, 97, 108, 117, 101, 32, 37, 117, 32, 97, 116, 32, 97, 100, 100, 114, 101, 115, 115, 32, 37, 112, 32, 105, 115, 32, 0}, {101, 118, 101, 110, 46, 0}, {111, 100, 100, 46, 0}};
    uint32_t* anon13 = (uint32_t*)anon8;
    printf(anon12 + 8 & 0xffffffff, (__zext uint64_t)*anon13, anon8, __undefined, __undefined, __undefined, alloca14, alloca7, alloca15, *(uint64_t*)alloca16, alloca1, alloca2, alloca6, alloca5, alloca4, alloca3);
    uint64_t alloca9 = anon10 + ((*anon13 & 1) != 0 ? 67 : 55) + 10;
    if ((*anon13 & 1) != 0)
    {
        puts(anon12 + 64 & 0xffffffff);
    }
    else
    {
        puts(anon12 + 58 & 0xffffffff);
    }
    return 0;
}
```
I think the whole experiment can be summarized in the following points:
- Does fcd work with mcsema bitcode?
In principle, it does, but it's likely unstable and the output is of fairly low quality.
- If it crashes, what seems to be the issue?
Currently, the two main reasons fcd crashes with mcsema bitcode are: a) `__remill_basic_block()` was not preserved; b) the AST passes don't support an IR construct present in the mcsema bitcode.
- What opportunities might there be with running on mcsema-lifted bitcode?
For fcd, it's better CFG recovery, better recovery of binary data other than executable code (global variables, static constants, ...) and, last but not least, test cases for argument recovery, stack recovery and pseudocode generation.
For mcsema, it's access to an easily hackable LLVM pass pipeline with support for passes written in Python, plus a quick overview of a binary in a high-level language.
- Is it worth the time?
In my opinion, definitely. CFG recovery and lifting of all binary data (not just executable code) are not trivial tasks. Reusing existing code frees up developer time for something more meaningful. Also, I think that with each iterative improvement to `RemillArgumentRecovery`, `RemillStackRecovery` and the AST passes we'll see very noticeable improvements in the output.