lifting-bits / rellic
Rellic produces goto-free C output from LLVM bitcode
License: Apache License 2.0
Potential candidates:
The mapping could be something like std::unordered_map<llvm::Instruction*, std::vector<clang::Stmt*>>.
Please contact me if you want the actual bitcode (it is 300mb of llvm bitcode generated by McSema!)
Output from program------
F0618 14:25:43.245985 34890 IRToASTVisitor.cpp:84] Unknown LLVM Type
*** Check failure stack trace: ***
@ 0x1b7eeed google::LogMessage::Fail()
@ 0x1b813e4 google::LogMessage::SendToLog()
@ 0x1b7e96b google::LogMessage::Flush()
@ 0x1b82009 google::LogMessageFatal::~LogMessageFatal()
@ 0x7e50b6 rellic::(anonymous namespace)::GetQualType()
There are some basic tests in the repository's tests folder. Automating them with something like pytest would be great.
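One possible shape for such automation is sketched below. The helper names, the directory layout, and the `*.c` glob are assumptions made for illustration; only the `--input` / `--output` flags and the `clang -emit-llvm -c` invocation are taken from the commands quoted elsewhere on this page.

```python
from pathlib import Path
from typing import List


def compile_cmd(clang: str, source: Path, bitcode: Path) -> List[str]:
    """Build a clang command line that emits LLVM bitcode for one C source file."""
    return [clang, "-emit-llvm", "-c", str(source), "-o", str(bitcode)]


def decompile_cmd(rellic: str, bitcode: Path, output: Path) -> List[str]:
    """Build a rellic-decomp command line (flags as used in the reports on this page)."""
    return [rellic, "--input", str(bitcode), "--output", str(output)]


def discover_tests(test_dir: Path) -> List[Path]:
    """Collect the C sources to round-trip; sorted so that test IDs stay stable."""
    return sorted(test_dir.glob("*.c"))
```

With pytest, the result of `discover_tests` could feed `pytest.mark.parametrize`, and each test would run both commands through `subprocess.run` and assert that the return code is 0.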
There is a failure that appears only on macOS, as discovered in #78.
The error is below and appears with LLVM versions 9, 10, and 11, and Z3 version 4.8.9:
Running tests...
/usr/local/Cellar/cmake/3.19.1/bin/ctest --force-new-ctest-process
Test project /Users/runner/work/rellic/rellic/rellic-build
Start 1: test_roundtrip
1/1 Test #1: test_roundtrip ...................***Failed 9.33 sec
.F.................
======================================================================
FAIL: test_assert (__main__.TestRoundtrip)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/runner/work/rellic/rellic/scripts/roundtrip.py", line 112, in test
roundtrip(self, args.rellic, path, args.clang, args.timeout)
File "/Users/runner/work/rellic/rellic/scripts/roundtrip.py", line 78, in roundtrip
decompile(self, rellic, rt_bc, rt_c, timeout)
File "/Users/runner/work/rellic/rellic/scripts/roundtrip.py", line 59, in decompile
self.assertEqual(p.returncode, 0, "rellic-decomp failure: %s" % p.stderr)
AssertionError: -6 != 0 : rellic-decomp failure: F1211 15:53:30.115931 156609984 Z3ConvVisitor.cpp:65] Check failed: expr.is_bv() z3::expr is not a bitvector!
*** Check failure stack trace: ***
@ 0x1016d323f google::LogMessageFatal::~LogMessageFatal()
@ 0x1016cf7e9 google::LogMessageFatal::~LogMessageFatal()
----------------------------------------------------------------------
Ran 19 tests in 8.666s
FAILED (failures=1)
This includes handling creation of reaching conditions in GenerateAST::CreateEdgeCond and adding all the necessary visitors in IRToASTVisitor and Z3ConvVisitor.
This is especially important when re-generating C pointers from Z3 expressions. Currently the C type information is lost when translated to Z3 formulae. An example solution would be:
(unsigned char *)64 -> (IntegralToPointer |unsigned char *| 64) -> (unsigned char *)64
The sort of the Z3 expression would remain (_ BitVec 64), as it is currently. Uninterpreted Z3 sorts cannot be used, since we would lose Z3 bitvector semantics, and I currently don't know how to add an interpretation to uninterpreted Z3 sorts.
The above solution would also require building clang::CastKind -> z3::func_decl and clang::Type -> z3::func_decl mappings, possibly using clang::BuiltinType instead of the general clang::Type.
Adding interpretations to Z3 casting functions like IntegralToPointer could also be possible. The default would simply return the second argument; others could model actual C semantics (bitwise extensions, truncations, etc.).
Z3 functions derived from types, such as |unsigned char *|, could have an interpretation that returns their bit width to aid with casting semantics.
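The round trip proposed above can be modelled without Z3 at all, just to show the bookkeeping. The following is a minimal sketch that treats the tagged expression as text; the function names and the regular expression are hypothetical, and a real implementation would build z3::func_decl applications rather than strings.

```python
import re
from typing import Tuple

# Textual form of a type-tagged cast, mirroring the example above:
#   (IntegralToPointer |unsigned char *| 64)
_CAST_RE = re.compile(r"^\((\w+) \|([^|]+)\| (\d+)\)$")


def c_cast_to_sexpr(cast_kind: str, c_type: str, value: int) -> str:
    """Wrap the value in symbols that carry the cast kind and the C type."""
    return f"({cast_kind} |{c_type}| {value})"


def sexpr_to_c_cast(sexpr: str) -> Tuple[str, str]:
    """Recover (cast kind, C cast expression) from the tagged form."""
    match = _CAST_RE.match(sexpr)
    if match is None:
        raise ValueError(f"not a tagged cast: {sexpr!r}")
    cast_kind, c_type, value = match.groups()
    return cast_kind, f"({c_type}){value}"
```

Because the C type travels with the expression, `sexpr_to_c_cast(c_cast_to_sexpr("IntegralToPointer", "unsigned char *", 64))` can rebuild the original cast instead of a bare 64-bit bitvector value.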
While doing type translation and type casting of C expressions, I ran into a lot of trouble with the different semantics of operations between Z3, LLVM IR and C. For example, C allows numeric constants to have the types int, long and long long in their signed and unsigned versions. LLVM IR routinely contains numeric constants of i1 and i8 types, which would naturally map to C types like char. Another example is the conflation of C pointers and integers into bitvector sorts when Z3 is involved. It is impossible to tell whether a 64-bit-wide Z3_BV_SORT is a char* or a long long.
The issue becomes even more complex when typing of expressions is involved. LLVM IR has every instruction (a value) explicitly typed, and this type can differ from what the result type of an equivalent C expression would be.
My proposal is to directly translate only variable and constant types between representations (Z3, IR, C). Expression types would be inferred using the type semantics of the given representation, without referring to the expression types of any other representation. However, the result types of expressions should correspond between the representations.
For example, suppose an IR and i8 %a, 1, where %a is an i32, yields an i8. The equivalent C expression must then yield an i8-equivalent type, namely unsigned char, so it would be (unsigned char)(a & 1U).
The correspondence check can be implemented using glog CHECK() macros.
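As a rough illustration of the proposal, the sketch below emits the C expression for an IR `and` and checks that the C result type matches the IR result width. The width-to-type table assumes an LP64-style target, and both function names are made up for this example; `check_correspondence` stands in for a CHECK().

```python
# Assumed mapping from IR integer widths to unsigned C types (LP64-style sizes).
UNSIGNED_C_TYPES = {
    8: "unsigned char",
    16: "unsigned short",
    32: "unsigned int",
    64: "unsigned long",
}


def emit_and(lhs: str, constant: int, result_bits: int) -> str:
    """Emit a C `&` expression whose type matches an IR result of `result_bits` bits.

    In C, `a & 1U` is at least `unsigned int` wide after the usual arithmetic
    conversions, so an explicit cast back to the IR result width is added.
    """
    c_type = UNSIGNED_C_TYPES[result_bits]
    return f"({c_type})({lhs} & {constant}U)"


def check_correspondence(ir_bits: int, c_type: str) -> None:
    """Stand-in for a CHECK(): fail loudly if the C type does not match the IR width."""
    if UNSIGNED_C_TYPES.get(ir_bits) != c_type:
        raise AssertionError(f"i{ir_bits} does not correspond to {c_type}")
```

For the example above, `emit_and("a", 1, 8)` produces `(unsigned char)(a & 1U)`.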
[-] Library version is libraries-llvm40-ubuntu1604-amd64
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
9 258M 9 24.9M 0 0 25152 0 2:59:30 0:17:18 2:42:12 0
curl: (56) SSL read: error:00000000:lib(0):func(0):reason(0), errno 104
[x] Unable to download cxx-common build libraries-llvm40-ubuntu1604-amd64.
[x] Build aborted.
[-] Library version is libraries-llvm40-ubuntu1604-amd64
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 258M 0 645k 0 0 624 0 5d 00h 0:17:38 5d 00h 0
curl: (56) SSL read: error:00000000:lib(0):func(0):reason(0), errno 104
[x] Unable to download cxx-common build libraries-llvm40-ubuntu1604-amd64.
[x] Build aborted.
[-] Library version is libraries-llvm40-ubuntu1604-amd64
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
12 258M 12 32.6M 0 0 15420 0 4:52:47 0:37:00 4:15:47 0
curl: (56) SSL read: error:00000000:lib(0):func(0):reason(0), errno 104
[x] Unable to download cxx-common build libraries-llvm40-ubuntu1604-amd64.
[x] Build aborted.
[-] Library version is libraries-llvm40-ubuntu1604-amd64
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 258M 0 0 0 0 0 0 --:--:-- 0:09:46 --:--:-- 0
curl: (56) SSL read: error:00000000:lib(0):func(0):reason(0), errno 104
[x] Unable to download cxx-common build libraries-llvm40-ubuntu1604-amd64.
[x] Build aborted.
I tried to compile SQLite with Clang to LLVM IR and then decompile it using Rellic. I expect there to be more issues than one before this is possible.
This is the first issue encountered.
u@x1 ~/D/rellic_play> ./rellic-build/rellic-decomp-4.0 --input s.ll --output=s.c
F0203 08:57:37.886811 20497 IRToASTVisitor.cpp:116] Check failed: array->isString() ConstantArray is not a string
*** Check failure stack trace: ***
@ 0xa441e8 google::LogMessage::Flush()
@ 0xa4766c google::LogMessageFatal::~LogMessageFatal()
@ 0x697936 rellic::(anonymous namespace)::CreateLiteralExpr()
fish: “./rellic-build/rellic-decomp-4.…” terminated by signal SIGABRT (Abort)
To reproduce:
wget https://www.sqlite.org/2018/sqlite-amalgamation-3250200.zip
unzip sqlite-amalgamation-3250200.zip
~/Desktop/rellic_play/rellic-build/libraries/llvm/bin/clang -S -emit-llvm -o sqlite.ll sqlite-amalgamation-3250200/sqlite3.c
~/Desktop/rellic_play/rellic-build/libraries/llvm/bin/clang -S -emit-llvm -o shell.ll sqlite-amalgamation-3250200/shell.c
~/Desktop/rellic_play/rellic-build/libraries/llvm/bin/llvm-link -o s.ll shell.ll sqlite.ll
./rellic-build/rellic-decomp-4.0 --input s.ll --output=s.c
I've attached s.ll as produced with the above steps.
Cheers,
Robin
This is an aesthetic output improvement. When we see large integers, let's output them as hex.
Examples:
if ((*(unsigned char *)6295756UL) != (*(unsigned char *)6295754UL)) {
*(unsigned char *)6295810UL = '\x01';
should all be hex.
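A minimal sketch of the formatting rule, assuming a cut-off above which literals switch to hex. The threshold value and the function name are arbitrary choices for illustration, not anything rellic defines.

```python
def format_int_literal(value: int, suffix: str = "UL", hex_threshold: int = 0xFFFF) -> str:
    """Render an unsigned integer literal, switching to hex above `hex_threshold`."""
    if value > hex_threshold:
        return f"{value:#x}{suffix}"
    return f"{value}{suffix}"
```

Applied to the addresses above, `format_int_literal(6295756)` gives `0x6010ccUL`, while small values such as array indices stay in decimal.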
Hi, I am trying to install rellic on Ubuntu 18.04, but I hit this error when running ./scripts/build.sh:
FAILED: rellic/CMakeFiles/rellic.dir/AST/Util.cpp.o
/home/lb/mvm/rellic/rellic-build/libraries/llvm/bin/clang++ -DGFLAGS_IS_A_DLL=0 -DGOOGLE_GLOG_DLL_DECL="" -DNDEBUG -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -isystem . -isystem ../ -isystem libraries/llvm/include -isystem libraries/z3/include -isystem libraries/glog/include -isystem libraries/gflags/include -O2 -g -DNDEBUG -Wall -Wextra -Wno-unused-parameter -Wno-c++98-compat -Wno-unreachable-code-return -Wno-nested-anon-types -Wno-extended-offsetof -Wno-variadic-macros -Wno-return-type-c-linkage -Wno-c99-extensions -Wno-ignored-attributes -Wno-unused-local-typedef -Wno-unknown-pragmas -Wno-unknown-warning-option -fPIC -fno-omit-frame-pointer -fvisibility-inlines-hidden -fno-exceptions -fno-asynchronous-unwind-tables -Wgnu-alignof-expression -Wno-gnu-anonymous-struct -Wno-gnu-designator -Wno-gnu-zero-variadic-macro-arguments -Wno-gnu-statement-expression -gdwarf-2 -g3 -O3 -Werror -pedantic -std=c++14 -MD -MT rellic/CMakeFiles/rellic.dir/AST/Util.cpp.o -MF rellic/CMakeFiles/rellic.dir/AST/Util.cpp.o.d -o rellic/CMakeFiles/rellic.dir/AST/Util.cpp.o -c ../rellic/AST/Util.cpp
../rellic/AST/Util.cpp:206:20: error: no matching constructor for initialization of 'clang::MemberExpr'
return new (ctx) clang::MemberExpr(base, is_arrow, clang::SourceLocation(),
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
libraries/llvm/include/clang/AST/Expr.h:2848:3: note: candidate constructor not viable: requires 9 arguments, but 8 were provided
MemberExpr(Expr *Base, bool IsArrow, SourceLocation OperatorLoc,
^
libraries/llvm/include/clang/AST/Expr.h:2852:3: note: candidate constructor not viable: requires single argument 'Empty', but 8 arguments were provided
MemberExpr(EmptyShell Empty)
^
libraries/llvm/include/clang/AST/Expr.h:2807:7: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 8 were provided
class MemberExpr final
^
1 error generated.
thanks a lot!
It is very helpful if the output code has #include directives.
Understandably, we cannot identify every custom header that may exist.
However, we can handle the case of things in the standard OS headers for the target OS, or at least the standard C library.
The mapping does not have to be perfect; it just has to provide better quality output than what exists now.
Currently rellic will (correctly! -- since they cannot be represented in C) bail on encountering LLVM Phi nodes.
We do not transform the code automatically because we do not want to make changes to the bitcode the user gave us. An error message is produced letting them know to run reg2mem, which will remove Phi nodes.
This is a reasonable solution for interactive use, but for massive testing, rellic should offer a way to opt in to Phi node elimination.
The suggestion is to have something like --remove-phi-nodes that will run a simplified Phi node eliminator prior to converting to C. (reg2mem does a few other things as well; we are only after Phi removal.)
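To make the idea concrete, here is a toy Phi eliminator over a made-up tuple-based IR (not LLVM's API). Each Phi is demoted to a stack slot: a store is placed before the terminator of every predecessor, and the Phi itself becomes a load. The instruction encoding and the `.slot` naming are assumptions, and allocating the slot in the entry block is omitted for brevity.

```python
from typing import Dict, List, Tuple

Instr = Tuple  # e.g. ("phi", dest, {pred: value}), ("br", target), ("ret", value)
Blocks = Dict[str, List[Instr]]


def remove_phi_nodes(blocks: Blocks) -> Blocks:
    """Replace each phi with a stack slot: stores in predecessors, a load at the phi.

    Assumes every block ends with exactly one terminator instruction.
    The input mapping is left unmodified.
    """
    result = {name: list(instrs) for name, instrs in blocks.items()}
    for name, instrs in blocks.items():
        for instr in instrs:
            if instr[0] != "phi":
                continue
            _, dest, incoming = instr
            slot = f"{dest}.slot"
            # Store the incoming value just before each predecessor's terminator.
            for pred, value in incoming.items():
                result[pred].insert(len(result[pred]) - 1, ("store", slot, value))
            # The phi itself becomes a load from the slot.
            index = result[name].index(instr)
            result[name][index] = ("load", dest, slot)
    return result
```

Running this on a copy of the module rather than in place would also address the concern about not changing the bitcode the user supplied.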
Some of our optimizations are too eager and remove some needed math.
For example, something like:
uint8_t brake_switch = (buf[4] & 0b00001100) >> 2;
...
if (brake_switch){
...
Will decompile to:
if (((arg0[4U])) != '\x00') {
This is missing some critical shifts and bitops in the conditional.
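The difference is easy to demonstrate by brute force over one byte. This sketch models both conditions as plain integer arithmetic; the function names are invented for the example.

```python
def brake_switch_original(byte4: int) -> bool:
    """Condition as written in the source: only bits 2 and 3 matter."""
    return ((byte4 & 0b00001100) >> 2) != 0


def brake_switch_decompiled(byte4: int) -> bool:
    """Condition as it appears in the decompiled output: any set bit counts."""
    return byte4 != 0


# All byte values (0..255) on which the two conditions disagree.
mismatches = [b for b in range(256)
              if brake_switch_original(b) != brake_switch_decompiled(b)]
```

Every nonzero byte with bits 2 and 3 clear, such as 0x01, takes the branch in the decompiled code but not in the original.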
Hi, I'm trying to decompile some Rust-generated bitcode to see the equivalent C structure of the output. I think it would be possible if rellic supported LLVM 9.0.
I'm using noodle to test this. It's a dependency-free serialization/deserialization library that is quick to compile.
cargo rustc --release -- --emit=llvm-bc will produce bitcode in ./target/release/deps/
The error in rellic is:
docker run --rm -t -i -v $(pwd):/test -w /test -u $(id -u):$(id -g) rellic-decomp-80 --input /test/target/release/deps/noodle-2a10fa4336dab943.bc --output /dev/stdout
F1128 18:38:31.484282 7 Util.cpp:89] Unable to parse module file /test/target/release/deps/noodle-2a10fa4336dab943.bc: Unknown attribute kind (60) (Producer: 'LLVM9.0.0-rust-1.39.0-stable' Reader: 'LLVM 8.0.0')
*** Check failure stack trace: ***
@ 0x1c7651d google::LogMessage::Fail()
@ 0x1c789aa google::LogMessage::SendToLog()
@ 0x1c75f2d google::LogMessage::Flush()
@ 0x1c79939 google::LogMessageFatal::~LogMessageFatal()
@ 0x843d13 rellic::LoadModuleFromFile()
Aborted (core dumped)
key takeaway here: (Producer: 'LLVM9.0.0-rust-1.39.0-stable' Reader: 'LLVM 8.0.0')
obviously I'm using rellic built for llvm 8.0, and didn't really expect it to work.
Mainly think about how to construct reaching conditions in a C AST. Maybe use the clang static analyzer cfg structures?
Hi,
Following the instructions in the README, I'm getting this error when running the build.sh script:
[ 72%] Building CXX object CMakeFiles/rellic-decomp-4.0.dir/rellic/AST/Util.cpp.o
/var/tmp/xchalup4/rellic-build/libraries/llvm/bin/clang++ -DGFLAGS_IS_A_DLL=0 -DGOOGLE_GLOG_DLL_DECL="" -DNDEBUG -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -isystem /var/tmp/xchalup4/rellic-build -isystem /var/tmp/xchalup4/rellic -isystem /var/tmp/xchalup4/rellic-build/libraries/llvm/include -isystem /var/tmp/xchalup4/rellic-build/libraries/z3/include -isystem /var/tmp/xchalup4/rellic-build/libraries/glog/include -isystem /var/tmp/xchalup4/rellic-build/libraries/gflags/include -O2 -g -DNDEBUG -Wall -Wextra -Wno-unused-parameter -Wno-c++98-compat -Wno-unreachable-code-return -Wno-nested-anon-types -Wno-extended-offsetof -Wno-variadic-macros -Wno-return-type-c-linkage -Wno-c99-extensions -Wno-ignored-attributes -Wno-unused-local-typedef -Wno-unknown-pragmas -Wno-unknown-warning-option -fPIC -fno-omit-frame-pointer -fvisibility-inlines-hidden -fno-exceptions -fno-asynchronous-unwind-tables -Wgnu-alignof-expression -Wno-gnu-anonymous-struct -Wno-gnu-designator -Wno-gnu-zero-variadic-macro-arguments -Wno-gnu-statement-expression -gdwarf-2 -g3 -O3 -Werror -pedantic -fopenmp=libomp -std=c++11 -o CMakeFiles/rellic-decomp-4.0.dir/rellic/AST/Util.cpp.o -c /var/tmp/xchalup4/rellic/rellic/AST/Util.cpp
/var/tmp/xchalup4/rellic/rellic/AST/IRToASTVisitor.cpp:335:31: error: no member named 'indices' in 'llvm::GetElementPtrInst'
for (auto &gep_idx : inst.indices()) {
~~~~ ^
1 error generated.
make[2]: *** [CMakeFiles/rellic-decomp-4.0.dir/build.make:157: CMakeFiles/rellic-decomp-4.0.dir/rellic/AST/IRToASTVisitor.cpp.o] Error 1
make[2]: *** Waiting for unfinished jobs....
/var/tmp/xchalup4/rellic/rellic/AST/ExprCombine.cpp:109:17: error: no matching function for call to 'ignoringParens'
has(ignoringParens(binaryOperator(stmt().bind("binop")))))) {}
^~~~~~~~~~~~~~
/var/tmp/xchalup4/rellic-build/libraries/llvm/include/clang/ASTMatchers/ASTMatchers.h:728:25: note: candidate function not viable: no known conversion from
'clang::ast_matchers::internal::BindableMatcher<clang::Stmt>' to 'const internal::Matcher<QualType>' for 1st argument
AST_MATCHER_P(QualType, ignoringParens,
^
/var/tmp/xchalup4/rellic-build/libraries/llvm/include/clang/ASTMatchers/ASTMatchersMacros.h:130:32: note: expanded from macro 'AST_MATCHER_P'
AST_MATCHER_P_OVERLOAD(Type, DefineMatcher, ParamType, Param, 0)
^
/var/tmp/xchalup4/rellic-build/libraries/llvm/include/clang/ASTMatchers/ASTMatchersMacros.h:150:57: note: expanded from macro 'AST_MATCHER_P_OVERLOAD'
inline ::clang::ast_matchers::internal::Matcher<Type> DefineMatcher( \
^
1 error generated.
make[2]: *** [CMakeFiles/rellic-decomp-4.0.dir/build.make:131: CMakeFiles/rellic-decomp-4.0.dir/rellic/AST/ExprCombine.cpp.o] Error 1
Currently there are some issues with how and when function declarations are emitted, which prevent emitting C code with global variables that use function pointers. Reproducible via any mcsema-lift output.
The AST gets corrupted during condition based refinement. This is likely due to NestedCondProp incorrectly simplifying conditions. Reproducible by loop.zip
I was trying to see what would happen if I decompiled some bitcode generated by Rust. I get:
F1121 05:25:09.980633 5741 GenerateAST.cpp:159] Unknown terminator instruction
*** Check failure stack trace: ***
@ 0xcc82dd google::LogMessage::Fail()
@ 0xcca76a google::LogMessage::SendToLog()
@ 0xcc7ced google::LogMessage::Flush()
@ 0xccb6f9 google::LogMessageFatal::~LogMessageFatal()
@ 0x781a73 rellic::GenerateAST::CreateEdgeCond()
Bitcode file is attached:
foo.bc.gz
In https://github.com/lifting-bits/rellic/blob/master/rellic/AST/GenerateAST.cpp#L135, SwitchInst is not supported.
Compile the following with remill-clang-4.0 -emit-llvm -O3 -c -o example.bc and decompile:
#include <stdint.h>
uint32_t target(uint32_t n) {
uint32_t mod = n % 4;
uint32_t result = 0;
if (mod == 0) {
result = (n | 0xbaaad0bf) * (2 ^ n);
} else if (mod == 1) {
result = (n & 0xbaaad0bf) * (3 + n);
} else if (mod == 2) {
result = (n ^ 0xbaaad0bf) * (4 | n);
} else {
result = (n + 0xbaaad0bf) * (5 & n);
}
return result;
}
You will see something similar to (instruction print was added):
F1110 21:21:03.700402 59636 GenerateAST.cpp:159] Unknown terminator instruction: switch
*** Check failure stack trace: ***
@ 0x1b4733d google::LogMessage::Fail()
@ 0x1b49834 google::LogMessage::SendToLog()
@ 0x1b46dbb google::LogMessage::Flush()
@ 0x1b4a459 google::LogMessageFatal::~LogMessageFatal()
@ 0x7c039d rellic::GenerateAST::CreateEdgeCond()
SIGABRT (Abort)
Would be willing to work on this, with some guidance.
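One way to think about supporting switch terminators is to give every outgoing edge a reaching condition: a case edge is taken when the switch operand equals one of its case values, and the default edge is taken when it equals none of them. The sketch below builds those conditions as C-like strings. It is not rellic's implementation (which works on Clang AST nodes), and the function name and data layout are assumptions.

```python
from typing import Dict, List


def switch_edge_conds(cond: str, cases: Dict[int, str], default: str) -> Dict[str, str]:
    """Return a C-like reaching condition for every successor of a switch.

    `cases` maps case value -> successor block name; several values may
    share one successor.
    """
    by_target: Dict[str, List[str]] = {}
    for value, target in cases.items():
        by_target.setdefault(target, []).append(f"{cond} == {value}")
    conds = {target: " || ".join(tests) for target, tests in by_target.items()}
    # The default edge is taken when no case value matches.
    all_tests = " || ".join(f"{cond} == {value}" for value in cases)
    default_cond = f"!({all_tests})" if cases else "1"
    # If a case also targets the default block, either condition reaches it.
    if default in conds:
        conds[default] = f"{conds[default]} || {default_cond}"
    else:
        conds[default] = default_cond
    return conds
```

For the `mod` example above, a call such as `switch_edge_conds("mod", {0: "bb0", 1: "bb1", 2: "bb2"}, "bb3")` yields `mod == 0`, `mod == 1`, `mod == 2`, and `!(mod == 0 || mod == 1 || mod == 2)` for the default block (block names are hypothetical).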
So far we've been dealing with phi instructions by requiring preprocessing via reg2mem. This can lead to a large number of alloca instructions and consequently to a lot of local variables in rellic's C output. So the question is whether there is a better way to do it.
Assertion reproducible by sqlite.zip.
The rellic-headergen tool should produce a functioning C header equivalent from the following code.
#include <utility>
class MyClass {
std::pair<int, int> my_pair;
std::pair<int, int> MyMethod(std::pair<int, int> pair);
};
So far it produces
struct pair {
int first;
int second;
};
void _ZNSt4pairIiiEC1Ev(struct pair *this);
void _ZNSt4pairIiiEC1ERKiS2_(struct pair *this, const int &__a, const int &__b);
struct MyClass {
std::pair<int, int> my_pair;
};
std::pair<int, int> _ZN7MyClass8MyMethodESt4pairIiiE(struct MyClass *this, std::pair<int, int> pair);
A good way to evaluate the results is to compare the LLVM IR produced by code that uses the original C++ header and the generated C header. The shape and size of class.MyClass in the LLVM IR from both headers should be the same. Name mangling should be the same as well.
first it failed because ninja wasn't installed, so i called the build.sh again, resulting in:
CMake Error at cmake/vcpkg_helper.cmake:17 (message):
Please define a path to VCPKG_ROOT. See
https://github.com/trailofbits/cxx-common for more details. Or if you
don't want to use vcpkg dependencies, add '-DUSE_SYSTEM_DEPENDENCIES=ON'
Call Stack (most recent call first):
CMakeLists.txt:22 (include)
i then rm -rf'd the rellic-build dir, downloaded the 500 MB of stuff again, just to get
Could NOT find Git (missing: GIT_EXECUTABLE)
(i assumed if i cloned the repo before entering the ubuntu rootfs i wouldnt need it)
running build.sh again results in the same error as above.
i guess i have to wait another hour to download the 500MB again and see what comes before the next error.
Likely caused by CondBasedRefine. It might be useful to implement reachability-based refinement from the No More Gotos paper. Reproducible via x86.bc.gz
Tried a simple function that gives me an odd error on Z3ConvVisitor; invocation and bitcode below:
$ ./rellic-decomp-8.0 --input /store/artem/git/test/x86.bc --output /dev/stdout
F1121 12:21:17.359514 25086 Z3ConvVisitor.cpp:157] Check failed: iter != c_decl_map.end()
*** Check failure stack trace: ***
@ 0xcc82dd google::LogMessage::Fail()
@ 0xcca76a google::LogMessage::SendToLog()
@ 0xcc7ced google::LogMessage::Flush()
@ 0xccb6f9 google::LogMessageFatal::~LogMessageFatal()
@ 0x800fb1 rellic::Z3ConvVisitor::GetCValDecl()
During the build of rellic I get a number of compilation errors:
[ 19%] Building CXX object rellic/CMakeFiles/rellic.dir/AST/Compat/Stmt.cpp.o
[ 19%] Building CXX object rellic/CMakeFiles/rellic.dir/AST/Compat/Expr.cpp.o
[ 23%] Building CXX object rellic/CMakeFiles/rellic.dir/AST/InferenceRule.cpp.o
[ 26%] Building CXX object rellic/CMakeFiles/rellic.dir/AST/CXXToCDecl.cpp.o
[ 30%] Building CXX object rellic/CMakeFiles/rellic.dir/AST/DeadStmtElim.cpp.o
[ 34%] Building CXX object rellic/CMakeFiles/rellic.dir/AST/CondBasedRefine.cpp.o
[ 38%] Building CXX object rellic/CMakeFiles/rellic.dir/AST/ExprCombine.cpp.o
[ 42%] Building CXX object rellic/CMakeFiles/rellic.dir/AST/GenerateAST.cpp.o
[ 46%] Building CXX object rellic/CMakeFiles/rellic.dir/AST/IRToASTVisitor.cpp.o
[ 50%] Building CXX object rellic/CMakeFiles/rellic.dir/AST/LoopRefine.cpp.o
/Users/brianmosher/Documents/repos/rellic/rellic/AST/IRToASTVisitor.cpp:87:57: error: too few arguments to function call, expected 5, have 4
clang::ArrayType::ArraySizeModifier::Normal, 0);
^
/usr/local/share/trailofbits/libraries/llvm/include/clang/AST/ASTContext.h:1349:3: note: 'getConstantArrayType' declared here
QualType getConstantArrayType(QualType EltTy, const llvm::APInt &ArySize,
^
In file included from /Users/brianmosher/Documents/repos/rellic/rellic/AST/ExprCombine.cpp:20:
/Users/brianmosher/Documents/repos/rellic/rellic/AST/ExprCombine.h:27:34: error: expected class name
class ExprCombine : public llvm::ModulePass,
Examining the caller and the ASTContext.h header, there does seem to be a mismatch in the number of args and expected parameters.
rellic/rellic/AST/IRToASTVisitor.cpp:
case llvm::Type::ArrayTyID: {
  auto arr = llvm::cast<llvm::ArrayType>(type);
  auto elm = GetQualType(arr->getElementType());
  result = ast_ctx.getConstantArrayType(
      elm, llvm::APInt(32, arr->getNumElements()),
      clang::ArrayType::ArraySizeModifier::Normal, 0);
} break;
libraries/llvm/include/clang/AST/ASTContext.h:
QualType getConstantArrayType(QualType EltTy, const llvm::APInt &ArySize,
                              const Expr *SizeExpr,
                              ArrayType::ArraySizeModifier ASM,
                              unsigned IndexTypeQuals) const;
This is with llvm-10 installed using pkgman.py from cxx-common.
When I try to decompile the .bc file generated from the following code
#include <stdio.h>
#include <string.h>
int main () {
char str1[20];
char str2[20];
int result;
//Assigning the value to the string str1
strcpy(str1, "hello");
//Assigning the value to the string str2
strcpy(str2, "helLO WORLD");
//This will compare the first 3 characters
if(strncmp(str1, str2, 3) > 0) {
printf("ASCII value of first unmatched character of str1 is greater than str2");
} else if(result < 0) {
printf("ASCII value of first unmatched character of str1 is less than str2");
} else {
printf("Both the strings str1 and str2 are equal");
}
return 0;
}
with
$ clang -emit-llvm -c xxx -o xxx.bc
$ ./rellic-build/tools/rellic-decomp-10.0 --input xxx.bc --output xxx_generated.c
I received the following error:
F0215 16:30:39.637413 27356 Z3ConvVisitor.cpp:324] Unimplemented FunctionDecl visitor11
*** Check failure stack trace: ***
@ 0x556a801557ac google::LogMessageFatal::~LogMessageFatal()
@ 0x556a7ed39368 rellic::Z3ConvVisitor::VisitFunctionDecl()
Aborted (core dumped)
I added more logging to bool Z3ConvVisitor::VisitFunctionDecl(clang::FunctionDecl *func) in rellic/AST/Z3ConvVisitor.cpp, as follows:
bool Z3ConvVisitor::VisitFunctionDecl(clang::FunctionDecl *func) {
//https://clang.llvm.org/doxygen/structclang_1_1DeclarationNameInfo.html
LOG(INFO) << "Name of function is : "<<func->getNameInfo().getName().getAsString();
LOG(INFO) << "Declare Info is " << func->getNameInfo().getAsString();
DLOG(INFO) << "VisitFunctionDecl";
LOG(FATAL) << "Unimplemented FunctionDecl visitor11";
return true;
}
It showed that the failure occurs at strncmp:
I0215 16:10:10.213157 25450 Z3ConvVisitor.cpp:320] Name of function is : strncmp
I0215 16:10:10.213160 25450 Z3ConvVisitor.cpp:321] Declare Info is strncmp
F0215 16:10:10.213161 25450 Z3ConvVisitor.cpp:324] Unimplemented FunctionDecl visitor11
Meanwhile, moving the strncmp call out of the if statement makes everything work.
I tried to fix this but am not sure where to start. Is this a bug, or is there a reason not to fix it?
The gdb backtrace is here:
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 0x00007ffff6ec4921 in __GI_abort () at abort.c:79
#2 0x00005555572569fd in google::DumpStackTraceAndExit() ()
#3 0x000055555724eedc in google::LogMessage::SendToLog() ()
#4 0x000055555724f4ce in google::LogMessage::Flush() ()
#5 0x00005555572527ac in google::LogMessageFatal::~LogMessageFatal() ()
#6 0x0000555555e36368 in rellic::Z3ConvVisitor::VisitFunctionDecl (this=<optimized out>,
func=0x555558d327c0) at /home/muqi/decompile_tool/rellic/rellic/AST/Z3ConvVisitor.cpp:324
#7 0x0000555555e3ef8b in clang::RecursiveASTVisitor<rellic::Z3ConvVisitor>::WalkUpFromFunctionDecl (D=0x555558d327c0, this=<optimized out>)
at /home/muqi/decompile_tool/lifting-bits-downloads/vcpkg_ubuntu-18.04_llvm-10_amd64/installed/x64-linux-rel/include/clang/AST/DeclNodes.inc:401
#8 rellic::Z3ConvVisitor::TraverseFunctionDecl (func=0x555558d327c0, this=<optimized out>)
at /home/muqi/decompile_tool/rellic/rellic/AST/Z3ConvVisitor.h:91
#9 clang::RecursiveASTVisitor<rellic::Z3ConvVisitor>::TraverseDecl (this=0x555558d3a000,
D=0x555558d327c0)
at /home/muqi/decompile_tool/lifting-bits-downloads/vcpkg_ubuntu-18.04_llvm-10_amd64/installed/x64-linux-rel/include/clang/AST/DeclNodes.inc:401
#10 0x0000555555e3a30b in rellic::Z3ConvVisitor::GetOrCreateZ3Decl (
this=this@entry=0x555558d3a000, c_decl=c_decl@entry=0x555558d327c0)
at /home/muqi/decompile_tool/rellic/rellic/AST/Z3ConvVisitor.cpp:262
#11 0x0000555555e3a40f in rellic::Z3ConvVisitor::VisitDeclRefExpr (this=0x555558d3a000,
c_ref=0x555558d32f68)
at /home/muqi/decompile_tool/rellic/rellic/AST/Z3ConvVisitor.cpp:672
#12 0x0000555555e3a51c in clang::RecursiveASTVisitor<rellic::Z3ConvVisitor>::TraverseStmt (
this=this@entry=0x555558d3a000, S=S@entry=0x555558d332e0, Queue=0x0)
at /home/muqi/decompile_tool/lifting-bits-downloads/vcpkg_ubuntu-18.04_llvm-10_amd64/installed/x64-linux-rel/include/clang/AST/RecursiveASTVisitor.h:653
#13 0x0000555555e3a6bb in rellic::Z3ConvVisitor::GetOrCreateZ3Expr (this=0x555558d3a000,
c_expr=c_expr@entry=0x555558d332e0)
at /home/muqi/decompile_tool/rellic/rellic/AST/Z3ConvVisitor.cpp:254
#14 0x0000555555e09dc3 in rellic::Z3CondSimplify::SimplifyCExpr (
this=this@entry=0x555558d370b0, c_expr=0x555558d332e0)
at /home/muqi/decompile_tool/rellic/rellic/AST/Z3CondSimplify.cpp:44
#15 0x0000555555e0a320 in rellic::Z3CondSimplify::VisitIfStmt (
this=this@entry=0x555558d370b0, stmt=0x555558d33600)
at /home/muqi/decompile_tool/rellic/rellic/AST/Z3CondSimplify.cpp:56
#16 0x0000555555e0ad99 in clang::RecursiveASTVisitor<rellic::Z3CondSimplify>::WalkUpFromIfStmt (S=<optimized out>, this=0x555558d370d0)
at /home/muqi/decompile_tool/lifting-bits-downloads/vcpkg_ubuntu-18.04_llvm-10_amd64/installed/x64-linux-rel/include/clang/AST/StmtNodes.inc:121
#17 clang::RecursiveASTVisitor<rellic::Z3CondSimplify>::PostVisitStmt (
this=this@entry=0x555558d370d0, S=<optimized out>)
at /home/muqi/decompile_tool/lifting-bits-downloads/vcpkg_ubuntu-18.04_llvm-10_amd64/installed/x64-linux-rel/include/clang/AST/StmtNodes.inc:121
#18 0x0000555555e0a608 in clang::RecursiveASTVisitor<rellic::Z3CondSimplify>::TraverseStmt (
this=this@entry=0x555558d370d0, S=<optimized out>, Queue=0x0)
at /home/muqi/decompile_tool/lifting-bits-downloads/vcpkg_ubuntu-18.04_llvm-10_amd64/installed/x64-linux-rel/include/clang/AST/RecursiveASTVisitor.h:653
#19 0x0000555555e323d0 in clang::RecursiveASTVisitor<rellic::Z3CondSimplify>::TraverseFunctionHelper (this=this@entry=0x555558d370d0, D=D@entry=0x555558d39660)
at /home/muqi/decompile_tool/lifting-bits-downloads/vcpkg_ubuntu-18.04_llvm-10_amd64/installed/x64-linux-rel/include/clang/AST/RecursiveASTVisitor.h:2072
#20 0x0000555555e32557 in clang::RecursiveASTVisitor<rellic::Z3CondSimplify>::TraverseFunctionDecl (this=0x555558d370d0, D=0x555558d39660)
at /home/muqi/decompile_tool/lifting-bits-downloads/vcpkg_ubuntu-18.04_llvm-10_amd64/installed/x64-linux-rel/include/clang/AST/RecursiveASTVisitor.h:2077
#21 0x0000555555e0a554 in clang::RecursiveASTVisitor<rellic::Z3CondSimplify>::TraverseDeclContextHelper (this=this@entry=0x555558d370d0, DC=DC@entry=0x555558d29c00)
at /home/muqi/decompile_tool/lifting-bits-downloads/vcpkg_ubuntu-18.04_llvm-10_amd64/installed/x64-linux-rel/include/clang/AST/RecursiveASTVisitor.h:1410
#22 0x0000555555e223aa in clang::RecursiveASTVisitor<rellic::Z3CondSimplify>::TraverseDeclContextHelper (DC=<optimized out>, this=0x555558d370d0)
at /home/muqi/decompile_tool/lifting-bits-downloads/vcpkg_ubuntu-18.04_llvm-10_amd64/installed/x64-linux-rel/include/clang/AST/Decl.h:105
#23 clang::RecursiveASTVisitor<rellic::Z3CondSimplify>::TraverseTranslationUnitDecl (
this=0x555558d370d0, D=0x555558d29bd8)
at /home/muqi/decompile_tool/lifting-bits-downloads/vcpkg_ubuntu-18.04_llvm-10_amd64/installed/x64-linux-rel/include/clang/AST/RecursiveASTVisitor.h:1511
#24 0x0000555555e0a4c1 in rellic::Z3CondSimplify::runOnModule (this=0x555558d370b0,
module=...) at /home/muqi/decompile_tool/rellic/rellic/AST/Z3CondSimplify.cpp:73
#25 0x0000555555f525a8 in llvm::legacy::PassManagerImpl::run(llvm::Module&) ()
#26 0x0000555555ccf1e3 in (anonymous namespace)::GeneratePseudocode (output=..., module=...,
this=<optimized out>) at /home/muqi/decompile_tool/rellic/tools/decomp/Decomp.cpp:95
#27 main (argc=<optimized out>, argv=<optimized out>)
at /home/muqi/decompile_tool/rellic/tools/decomp/Decomp.cpp:202
We should start lowering switch, which will both enable us to generate better bitcode and fix a source of non-translation.
C output for more complex bitcode (i.e. mcsema output) has syntax errors due to the order in which function, type and global variable declarations are emitted.
Possible fixes:
Running rellic-decomp against the attached bitcode file produces inconsistent results. I've attached the expected result and one of the random results. Other results get produced as well.
During simplification of expressions with function calls in the expression, the Z3 conversion visitor will fail to convert function calls properly as per: https://github.com/lifting-bits/rellic/blob/master/rellic/AST/Z3ConvVisitor.cpp#L327
I am not entirely sure as to how this should be implemented, I was thinking something along the lines of emitting an opaque Z3 interpreted function through (how do I get the parameters in there?) and then "interpreting" it as a function call right through. Does this sound like the optimal way? Open to suggestions, would be willing to implement.
after running ./rellic-build/tools/rellic-decomp-9.0 --input foo.exe.bc --output foo.c
(note that the path to the decompiler is different than mentioned in README)
i get:
error: unknown target triple '', please use -triple or -arch
*** Aborted at 1611523302 (unix time) try "date -d @1611523302" if you are using GNU date ***
PC: @ 0x0 (unknown)
*** SIGSEGV (@0x0) received by PID 21849 (TID 0x7fd16427fec0) from PID 0; stack trace: ***
@ 0x7fd1646893c0 (/home/rofl0r/ub/root/usr/lib/x86_64-linux-gnu/libpthread-2.31.so+0x153bf)
@ 0x1bcc80c (/home/rofl0r/ub/root/root/rellic-build/tools/rellic-decomp-9.0+0x1bcc80b)
@ 0x825922 (/home/rofl0r/ub/root/root/rellic-build/tools/rellic-decomp-9.0+0x825921)
Segmentation fault
ftr, foo.exe.bc was created by retdec-4.0 from a 2.8 meg win32 binary (retdec subsequently spent 5 days running on a single core in retdec-llvmir2hll which i subsequently killed, as i figured rellic might be able to do the same, and it pretty much looked like retdec was hung in an infinite loop as memory usage was constant at 32GB RAM for the whole time)
Right now we emit some integer literal suffixes like Ui16 that do not seem to be standard. We should stop emitting these.
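As a rough illustration of standard alternatives: C99/C11 integer suffixes only cover u, l and ll (and their combinations), so there is no suffix for 16-bit values. The sketch below shows two portable spellings, a cast and the UINT16_C macro from <stdint.h>. The function names are hypothetical, and this is not a statement of what rellic emits after any fix.

```c
#include <assert.h>
#include <stdint.h>

/* Non-standard spelling that a conforming C compiler may reject:
 *     uint16_t x = 42Ui16;
 */

/* Portable option 1: cast a plain unsigned literal. */
uint16_t lit_u16_cast(void) { return (uint16_t)42U; }

/* Portable option 2: the <stdint.h> constant macro. */
uint16_t lit_u16_macro(void) { return UINT16_C(42); }

/* The 64-bit case has a matching macro as well. */
uint64_t lit_u64_macro(void) { return UINT64_C(42); }

/* Returns 1 when both spellings yield the same 16-bit value. */
int spellings_agree(void) {
  uint16_t a = (uint16_t)65535U;
  uint16_t b = UINT16_C(65535);
  assert(sizeof a == 2 && sizeof b == 2);
  return a == b;
}
```

The cast form has the advantage of working for any target type, while the macros keep the literal readable when the width is one of the <stdint.h> sizes.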
I was really excited to see the release of Rellic and wanted to take it out for a spin. Minimal programs are successfully decompiled (e.g. the LLVM IR produced by Clang 4.0 for the input C source int main() {return 42;}).
However, I ran into an issue that causes an out of bounds crash.
u@x1 ~/D/r/rellic-build> ./rellic-decomp-4.0 --input input.ll --output output.c
*** Aborted at 1548759746 (unix time) try "date -d @1548759746" if you are using GNU date ***
PC: @ 0x0 (unknown)
*** SIGSEGV (@0x0) received by PID 2499 (TID 0x7f56e8afcf40) from PID 0; stack trace: ***
@ 0x7f56e8f463c0 (unknown)
@ 0x6662d3 rellic::CondBasedRefine::VisitCompoundStmt()
fish: “./rellic-decomp-4.0 --input inp…” terminated by signal SIGSEGV (Address boundary error)
Based on the following LLVM IR input:
; ModuleID = 'original.c'
source_filename = "original.c"
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
; Function Attrs: noinline nounwind uwtable
define i32 @foo(i32, i32) #0 {
%3 = alloca i32, align 4
%4 = alloca i32, align 4
%5 = alloca i32, align 4
%6 = alloca i32, align 4
store i32 %0, i32* %3, align 4
store i32 %1, i32* %4, align 4
store i32 0, i32* %5, align 4
store i32 0, i32* %6, align 4
br label %7
; <label>:7: ; preds = %17, %2
%8 = load i32, i32* %6, align 4
%9 = icmp ne i32 %8, 42
br i1 %9, label %10, label %20
; <label>:10: ; preds = %7
%11 = load i32, i32* %3, align 4
%12 = load i32, i32* %5, align 4
%13 = add i32 %12, %11
store i32 %13, i32* %5, align 4
%14 = load i32, i32* %4, align 4
%15 = load i32, i32* %5, align 4
%16 = urem i32 %15, %14
store i32 %16, i32* %5, align 4
br label %17
; <label>:17: ; preds = %10
%18 = load i32, i32* %6, align 4
%19 = add i32 %18, 1
store i32 %19, i32* %6, align 4
br label %7
; <label>:20: ; preds = %7
%21 = load i32, i32* %5, align 4
ret i32 %21
}
; Function Attrs: noinline nounwind uwtable
define i32 @main() #0 {
%1 = alloca i32, align 4
store i32 0, i32* %1, align 4
%2 = call i32 @foo(i32 10, i32 20)
ret i32 %2
}
attributes #0 = { noinline nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
!llvm.ident = !{!0}
!0 = !{!"clang version 4.0.1 (tags/RELEASE_401/final)"}
Which is produced with Clang 4.0 (the bundled version) from this C source:
unsigned int foo(unsigned int a, unsigned int b) {
    unsigned int sum = 0;
    for (unsigned int i = 0; i != 42; i++) {
        sum += a;
        sum %= b;
    }
    return sum;
}
int main() {
    return foo(10, 20);
}
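To make the relation between the IR blocks and the source loop easier to follow, here is a hand-written, goto-free rendering of @foo with each statement annotated with the basic block it corresponds to. This is only a sketch for reading the IR. It is not actual rellic output, and the name foo_structured is hypothetical.

```c
#include <assert.h>

/* Hand-written structured equivalent of @foo from the IR above. */
unsigned int foo_structured(unsigned int a, unsigned int b) {
  assert(b != 0);        /* urem by zero is undefined in the IR too */
  unsigned int sum = 0;  /* alloca %5, initialised in the entry block */
  unsigned int i = 0;    /* alloca %6, initialised in the entry block */
  while (i != 42) {      /* block %7: icmp ne, conditional branch */
    sum = sum + a;       /* block %10: add */
    sum = sum % b;       /* block %10: urem */
    i = i + 1;           /* block %17: increment, back edge to %7 */
  }
  return sum;            /* block %20: load and ret */
}
```

The single back edge from %17 to %7 and the single exit to %20 are what allow the loop to be expressed as a plain while statement.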
Those that don't need it, that is, which are basically all of the clang::RecursiveASTVisitor derived classes in rellic/AST/.
rellic/tools/decomp/Decomp.cpp
Line 65 in f6a946f
The above function is starting to get pretty bloated. One way to mitigate this is to enable condition simplification passes to share a single Z3 context.
Assertion failed: V.getBitWidth() == C.getIntWidth(type) && "Integer type is not the correct size for constant.", file C:\saturn\build_llvm\llvm-8.0.1.src\tools\clang\lib\AST\Expr.cpp, line 787
*** Aborted at 1566331860 (unix time) try "date -d @1566331860" if you are using GNU date ***
@ 0x7ffe06c3c31d raise
@ 0x7ffe06c3d321 abort
@ 0x7ffe06c3ed5e _get_wpgmptr
@ 0x7ffe06c3ec55 _get_wpgmptr
@ 0x7ffe06c3efe1 _wassert
@ 0x7ff74bce9f07 public: __cdecl clang::IntegerLiteral::IntegerLiteral(class clang::ASTContext const & __ptr64,class llvm::APInt const & __ptr64,class clang::QualType,class clang::SourceLocation) __ptr64
@ 0x7ff74bce9fd2 public: static class clang::IntegerLiteral * __ptr64 __cdecl clang::IntegerLiteral::Create(class clang::ASTContext const & __ptr64,class llvm::APInt const & __ptr64,class clang::QualType,class clang::SourceLocation)
@ 0x7ff74bc36b2c rellic::IRToASTVisitor::CreateLiteralExpr
@ 0x7ff74bc374cc rellic::IRToASTVisitor::GetOrCreateStmt
@ 0x7ff74bc372f9 rellic::IRToASTVisitor::GetOperandExpr
@ 0x7ff74bc397e1 rellic::IRToASTVisitor::visitBinaryOperator
@ 0x7ff74bc374ec rellic::IRToASTVisitor::GetOrCreateStmt
@ 0x7ff74bc2b7a4 rellic::GenerateAST::CreateBasicBlockStmts
@ 0x7ff74bc2ba0e rellic::GenerateAST::CreateRegionStmts
@ 0x7ff74bc2c6b6 rellic::GenerateAST::StructureAcyclicRegion
@ 0x7ff74bc2d7d9 rellic::GenerateAST::StructureRegion
@ 0x7ff74bc35072 std::_Func_impl_no_alloc<`lambda at C:\rellic\rellic\AST\GenerateAST.cpp:372:24',void,llvm::Region *>::_Do_call
@ 0x7ff74bc2ddc5 rellic::GenerateAST::runOnModule
@ 0x7ff74be62d51 public: bool __cdecl llvm::legacy::PassManagerImpl::run(class llvm::Module & __ptr64) __ptr64
@ 0x7ff74bcd67b8 main
@ 0x7ff74d566884 __scrt_common_main_seh
@ 0x7ffe078e7bd4 BaseThreadInitThunk
@ 0x7ffe090ece71 RtlUserThreadStart
lifted.txt
F20201123 21:49:35.359668 145 IRToASTVisitor.cpp:264] Invalid operand
*** Check failure stack trace: ***
@ 0x1b653fc google::LogMessageFatal::~LogMessageFatal()
@ 0x7c66d4 rellic::IRToASTVisitor::GetOperandExpr()
Spec is attached.
Version output:
rellic-decomp-9.0 --version
rellic-decomp-9.0 version unknown
Commit Hash: 7bacf0bb2dceff6cf6ca4fbafc4d76eae996bb35
Commit Date: 2020-11-04 19:46:27 -0500
Last commit by: Peter Goodman [[email protected]]
Commit Subject: [Merge pull request #76 from lifting-bits/fix_compat_llvm10_llvm11]
Uncommitted changes were present during build.
Using LLVM 9.0.0
bfce8c4a-2dd5-11eb-92b6-0242ac110002_rx_message_routine_gxk6ovju.spec.gz
Having an install target for rellic will help when packaging rellic to include the necessary binaries and library artifacts, so that they can be distributed in a more lightweight package.
With PR #48, users have the ability to use Docker to build, test, and run rellic. However, the Dockerfile in that PR produces an image that is fairly large, at around 2GB, with the libraries directory (inside the build directory) accounting for 1.4GB of that total.
It would be nice to use the install target to produce smaller Docker images that only include the build artifacts required to develop with or run rellic.