electrojustin / triad-decompiler
TRiad Is A Decompiler. Triad is a tiny, free and open source, Capstone based x86 decompiler for ELF binaries.
License: MIT License
Triad decompiler version 0.4 Alpha Test.
Not intended to be used for copyright infringement or other illegal activities.

What is triad:
TRiad Is A Decompiler
Triad is a tiny, free and open source, Capstone based x86 decompiler that will take in ELF files as input and spit out pseudo-C.

Installation:
Triad requires Capstone to be installed first.
http://www.capstone-engine.org/
For 32 bit tests, gcc-multilib is also required.
First, it will be necessary to build triad. "make triad" should suffice. After its components are built, the triad binary will be placed in the build directory. To copy the binary into /usr/bin, simply use "sudo make install".

Usage:
triad <flags> <file name> <(optional) start address> <(optional) cutoff address>
Simply run the triad binary from the command line and specify an ELF to decompile as a parameter. By default, triad will try to find the main function of the given file and start decompiling from there.
Sometimes ELFs have all symbols stripped, so triad will be unable to find main. In such a scenario, the user may simply specify a starting address as the second command line parameter. Note that an incorrect starting address will likely result in incorrect decompilation or no decompilation.
Occasionally it is ambiguous where a function actually ends. If a user has specified a start address and knows better than triad where a particular function ends, they can also specify a cutoff address. The default cutoff address is the end of the segment containing the entry point.
Triad has the ability to follow function calls and automatically decompile callees. This is especially helpful when dealing with stripped binaries or other binaries in which relevant code isn't clearly distinguishable from data.

Flags:
-f: Full decompilation. This is the default.
-p: Partial decompilation. Recovered control flow is always going to be bad, so Triad has an option to only partially decompile code. This means Triad will identify variables and parameters, try to recover calling convention, and translate most instructions back into their C operator equivalents, but Triad will leave jumps and comparisons as is, with the philosophy that the user knows best how to follow them.
-d: Disassemble. Make no attempt to decompile code, simply print out a disassembly in AT&T syntax.
-s: Disable call following, just decompile main/whatever code was at the specified address.
-h: Print all constants in hexadecimal format.

Limitations PLEASE READ BEFORE SUBMITTING A BUG REPORT:
Triad really only works on x86 and x86_64 ELF executables. Other architectures may be possible in the future, but there are currently no plans to add them.
The triad decompiler is still very much an alpha. The project is nowhere near completion and as such is missing some critical features, contains numerous bugs, has several odd quirks, and has a propensity for segfaulting.
Missing features include support for switch decompilation and full support for strings and statically allocated arrays (dynamically allocated arrays will actually probably work to one degree or another, but the syntax will be most unusual, e.g. *(char*)(eax + (12)) = 97 instead of array[12] = 'a'). Struct analysis is a long way away as well, and unions may never work properly.
The only binary format currently supported is the Executable and Linkable Format (ELF), commonly used on UNIX-like systems such as Linux.
Control flow decompilation should be mostly correct, but it may look funky. Continues and forward gotos inside of conditional statements might wind up as if-else statements. This is semantically equivalent, just different from the original source.
Optimization and computed jumps will probably cause a program to be decompiled completely incorrectly. Triad was designed and tested for programs compiled using gcc.
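To illustrate why a continue recovered as an if-else does not change behavior, here is a minimal C sketch. Both function names are made up for this example and are not part of Triad. The first loop uses continue the way original source might, and the second uses the if-else shape a decompiler could plausibly emit for the same code.

```c
#include <assert.h>
#include <stddef.h>

/* Original style: odd values are skipped with `continue`. */
int sum_even_with_continue(const int *v, size_t n)
{
    int sum = 0;
    assert(v != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (v[i] % 2 != 0)
            continue;          /* jump straight to the loop increment */
        sum += v[i];
    }
    return sum;
}

/* Decompiled style: the rest of the loop body moves into an else branch. */
int sum_even_with_if_else(const int *v, size_t n)
{
    int sum = 0;
    assert(v != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (v[i] % 2 != 0) {
            /* nothing to do, fall through to the loop increment */
        } else {
            sum += v[i];
        }
    }
    return sum;
}
```

Both functions return the same sum for any input array, for example 12 for {1, 2, 3, 4, 6}.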
It is important to understand that the generated source code will NEVER be exactly the original source (unless the program was compiled with debug symbols, of course).
If triad segfaults on you, feel free to tell me. Include a stack trace and a description of the conditions that triggered the crash if at all possible. For obvious reasons, it is quite important that triad crash as little as possible.

"Hacking"/Modding notes:
I will be honest, the code is a bit of a mess. It is a short mess, probably less than 2 KLOC, but the amount of pointer arithmetic and number of globals used is not for the faint of heart.
That said, feel free to "hack" in features! The license is just MIT, so do whatever. Feel free to contact me if you have any questions about how the code works or think you have a cool feature that should be merged into the codebase. I tried to document the source, but I'm sure certain lines will leave many programmers confused and/or horrified.
My email is just [email protected]
"call *%rax" is incorrectly disassembled. Presumably other register calls are disassembled incorrectly too.
There seems to be an issue with dynamic symbols on 64 bit ELFs, and I'm sure a few others. In addition to the number of dynamic symbols being incorrect for all ELFs (the count only includes symbols of the "function" type), 64 bit ELFs seem to have PLT entries that jump relative to the current instruction pointer, making dynamic symbols in 64 bit ELFs unrecognizable by Triad and even causing segfaults (a security risk that should also be looked into). The fix should be relatively simple.
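For reference, a classic x86_64 PLT entry begins with the bytes FF 25 followed by a 32-bit displacement, which in 64-bit mode encodes jmp *disp32(%rip); in 32-bit non-PIC executables the same opcode bytes carry an absolute address instead. The sketch below shows one way the GOT slot could be computed from such an entry. It is not taken from Triad's source: the function name plt_got_slot and its parameters are hypothetical, and it assumes the entry starts directly with the jmp (newer toolchains may emit extra prefixes or instructions first, which this sketch does not handle).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of Triad.
 * If the bytes at `code` start with FF 25 (jmp *disp32(%rip) in 64-bit mode),
 * store the address of the referenced GOT slot in *got_slot and return 1.
 * Otherwise return 0. `insn_addr` is the virtual address of the jmp itself.
 */
int plt_got_slot(const uint8_t *code, size_t len, uint64_t insn_addr,
                 uint64_t *got_slot)
{
    assert(code != NULL && got_slot != NULL);

    if (len < 6 || code[0] != 0xFF || code[1] != 0x25)
        return 0;

    /* Assemble the little-endian 32-bit displacement. */
    uint32_t raw = (uint32_t)code[2] | ((uint32_t)code[3] << 8) |
                   ((uint32_t)code[4] << 16) | ((uint32_t)code[5] << 24);

    /* Sign-extend to 64 bits without implementation-defined casts. */
    int64_t disp = (raw & 0x80000000u) ? (int64_t)raw - 0x100000000LL
                                       : (int64_t)raw;

    /* RIP points at the next instruction, 6 bytes past the start of the jmp. */
    *got_slot = insn_addr + 6 + (uint64_t)disp;
    return 1;
}
```

For example, the bytes FF 25 E2 0B 20 00 located at address 0x400418 resolve to the GOT slot at 0x601000. The length check also means a truncated entry is rejected rather than read past its end.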
For some reason, EBP sometimes gets confused with EBX. First noticed in a left shift instruction.
Valgrind seems to indicate a number of memory errors when the program attempts to decompile control_flow_test. The program does not crash and the output is correct, but any and all memory issues should be looked at.
Output:
valgrind ./triad control_flow_test
==18208== Memcheck, a memory error detector
==18208== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==18208== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info
==18208== Command: ./triad control_flow_test
==18208==
==18208== Conditional jump or move depends on uninitialised value(s)
==18208== at 0x406E5A: jump_block_preprocessing (lang_gen.c:664)
==18208== by 0x40700D: translate_func (lang_gen.c:701)
==18208== by 0x4073A7: translate_function_list (lang_gen.c:759)
==18208== by 0x400D92: main (main.c:61)
==18208==
==18208== Invalid write of size 1
==18208== at 0x405095: translate_insn (lang_gen.c:257)
==18208== by 0x405BC6: translate_jump_block (lang_gen.c:444)
==18208== by 0x40706B: translate_func (lang_gen.c:702)
==18208== by 0x4073A7: translate_function_list (lang_gen.c:759)
==18208== by 0x400D92: main (main.c:61)
==18208== Address 0x5561f40 is 0 bytes after a block of size 256 alloc'd
==18208== at 0x4C28730: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==18208== by 0x406FBB: translate_func (lang_gen.c:695)
==18208== by 0x4073A7: translate_function_list (lang_gen.c:759)
==18208== by 0x400D92: main (main.c:61)
==18208==
==18208== Invalid write of size 1
==18208== at 0x4C2BBF3: strcpy (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==18208== by 0x404FFD: translate_insn (lang_gen.c:244)
==18208== by 0x405BC6: translate_jump_block (lang_gen.c:444)
==18208== by 0x40706B: translate_func (lang_gen.c:702)
==18208== by 0x4073A7: translate_function_list (lang_gen.c:759)
==18208== by 0x400D92: main (main.c:61)
==18208== Address 0x5561f40 is 0 bytes after a block of size 256 alloc'd
==18208== at 0x4C28730: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==18208== by 0x406FBB: translate_func (lang_gen.c:695)
==18208== by 0x4073A7: translate_function_list (lang_gen.c:759)
==18208== by 0x400D92: main (main.c:61)
==18208==
==18208== Invalid read of size 1
==18208== at 0x4C2BB94: strlen (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==18208== by 0x40410A: translate_insn (lang_gen.c:44)
==18208== by 0x405BC6: translate_jump_block (lang_gen.c:444)
==18208== by 0x40706B: translate_func (lang_gen.c:702)
==18208== by 0x4073A7: translate_function_list (lang_gen.c:759)
==18208== by 0x400D92: main (main.c:61)
==18208== Address 0x5561f40 is 0 bytes after a block of size 256 alloc'd
==18208== at 0x4C28730: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==18208== by 0x406FBB: translate_func (lang_gen.c:695)
==18208== by 0x4073A7: translate_function_list (lang_gen.c:759)
==18208== by 0x400D92: main (main.c:61)
==18208==
==18208== Conditional jump or move depends on uninitialised value(s)
==18208== at 0x4056EE: translate_jump_block (lang_gen.c:354)
==18208== by 0x40706B: translate_func (lang_gen.c:702)
==18208== by 0x4073A7: translate_function_list (lang_gen.c:759)
==18208== by 0x400D92: main (main.c:61)
==18208==
==18208== Use of uninitialised value of size 8
==18208== at 0x40571E: translate_jump_block (lang_gen.c:356)
==18208== by 0x40706B: translate_func (lang_gen.c:702)
==18208== by 0x4073A7: translate_function_list (lang_gen.c:759)
==18208== by 0x400D92: main (main.c:61)
==18208==
==18208== Conditional jump or move depends on uninitialised value(s)
==18208== at 0x40575B: translate_jump_block (lang_gen.c:359)
==18208== by 0x40706B: translate_func (lang_gen.c:702)
==18208== by 0x4073A7: translate_function_list (lang_gen.c:759)
==18208== by 0x400D92: main (main.c:61)
==18208==
==18208== Conditional jump or move depends on uninitialised value(s)
==18208== at 0x40586D: translate_jump_block (lang_gen.c:383)
==18208== by 0x40706B: translate_func (lang_gen.c:702)
==18208== by 0x4073A7: translate_function_list (lang_gen.c:759)
==18208== by 0x400D92: main (main.c:61)
==18208==
==18208== Use of uninitialised value of size 8
==18208== at 0x40589D: translate_jump_block (lang_gen.c:385)
==18208== by 0x40706B: translate_func (lang_gen.c:702)
==18208== by 0x4073A7: translate_function_list (lang_gen.c:759)
==18208== by 0x400D92: main (main.c:61)
==18208==
==18208== Conditional jump or move depends on uninitialised value(s)
==18208== at 0x4058DA: translate_jump_block (lang_gen.c:388)
==18208== by 0x40706B: translate_func (lang_gen.c:702)
==18208== by 0x4073A7: translate_function_list (lang_gen.c:759)
==18208== by 0x400D92: main (main.c:61)
==18208==
int main (void)
{
    register int eax;
    int d;
    int c;
    int b;
    int a;
    a = 1;
    b = 10;
0x80483df:
    c = 11;
    d = 0;
    while (a > 0)
    {
        a -= 1;
        if (b != 0)
        {
            break;
        }
        b += 1;
    }
    do
    {
        b -= 1;
        if (c != 0)
        {
            continue;
        }
        c -= 1;
        eax = b;
    } while (eax >= a);
    if (b != 0)
    {
        if (a != 0)
        {
            c = 1;
        }
        else
        {
            c = 2;
        }
    }
    else
    {
        if (d == 0)
        {
            c = 6;
        }
    }
    a = 0;
    if (a != 0)
    {
        a = 2;
    }
    else
    {
        a = 3;
    }
    while (a <= 9)
    {
        a += 1;
    }
    a = 11;
    eax = c;
    if (eax == b)
    {
        goto 0x80483df;
    }
    c = 10;
    eax = c;
    b = eax;
    eax = c;
    return eax;
}
==18208==
==18208== HEAP SUMMARY:
==18208== in use at exit: 0 bytes in 0 blocks
==18208== total heap usage: 360 allocs, 360 frees, 1,291,607 bytes allocated
==18208==
==18208== All heap blocks were freed -- no leaks are possible
==18208==
==18208== For counts of detected and suppressed errors, rerun with: -v
==18208== Use --track-origins=yes to see where uninitialised values come from
==18208== ERROR SUMMARY: 14 errors from 10 contexts (suppressed: 1 from 1)
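The "Invalid write of size 1" reports above, at an address "0 bytes after a block of size 256" and reached through strcpy, are the usual signature of writing one byte past the end of a 256-byte heap buffer. Without looking at lang_gen.c it is not clear what that buffer holds, so the following is only a sketch of one common remedy: appending through a helper that knows the buffer's capacity. The name append_bounded is hypothetical and does not exist in Triad.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not part of Triad.
 * Appends `src` to the NUL-terminated string in `dst`, whose total capacity
 * (including the terminator) is `cap` bytes. Never writes past dst[cap - 1].
 * Returns 0 on success, -1 if the result had to be truncated or `dst` was
 * not terminated within `cap` bytes.
 */
int append_bounded(char *dst, size_t cap, const char *src)
{
    assert(dst != NULL && src != NULL && cap > 0);

    /* Find the current end of the string without reading past the buffer. */
    const char *end = memchr(dst, '\0', cap);
    if (end == NULL)
        return -1;

    size_t used = (size_t)(end - dst);
    size_t room = cap - used;              /* at least 1: the terminator slot */

    /* snprintf writes at most room - 1 characters plus a terminator. */
    int n = snprintf(dst + used, room, "%s", src);
    return (n < 0 || (size_t)n >= room) ? -1 : 0;
}
```

A caller that gets -1 back can grow the buffer and retry, or report the truncation, instead of silently overrunning the allocation.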
Spider disassembler locks up computer and uses as much RAM as possible when fed a program with remotely complicated control flow, e.g. control_flow_test.