Git Product home page Git Product logo

scratchablock's Introduction

Q: Why is there need for yet another decompiler, especially a crippled one?

A: A sad truth is that most decompilers out there are crippled. Many aren't able to decompile trivial constructs, others can't decompile more advanced, those which seemingly can deal with them, are crippled by supporting only the boring architectures and OSes. And almost every written in such a way that tweaking it or adding a new architecture is complicated. A decompiler is a tool for reverse engineering, but ironically, if you want to use a typical decompiler productively or make it suit your needs, first you will need to reverse-engineer the decompiler itself, and that can easily take months (or years).

How ScratchABlock is different?

The central part of a decompiler (and any program transformation framework) is Intermediate Representation (IR). A decompiler should work on IR, and should take it as an input, and conversion of a particular architecture's assembler to this IR should be well decoupled from a decompiler, or otherwise it takes extraordinary effort to add support for another architecture (which in turn limits userbase of a decompiler).

Decompilation is a complex task, so there should be easy insight into the decompilation process. This means that IR used by a decompiler should be human-friendly, for example use a syntax familiar to programmers, map as directly as possible to a typical machine assembler, etc.

The requirements above should be quite obvious on their own. If not, they can be learnt from the books on the matter, e.g.:

"The compiler writer also needs mechanisms that let humans examine the IR program easily and directly. Self-interest should ensure that compiler writers pay heed to this last point."

(Keith Cooper, Linda Torczon, "Engineering a Compiler")

However, decompiler projects, including OpenSource ones, routinely violate these requirements: they are tightly coupled with specific machine architectures, don't allow to feed IR in, and oftentimes don't expose or document it to user at all.

ScratchABlock is an attempt to say "no" to such practices and develop a decompilation framework based on the requirements above. Note that ScratchABlock can be considered a learning/research project, and beyond good intentions and criticism of other projects, may not offer too much to a casual user - currently, or possibly at all. It can certainly be criticised in many aspects too.

Down to Earth part

ScratchABlock is released under the terms of GNU General Public License v3 (GPLv3).

ScratchABlock is written in Python3 language, and tested with version 3.3 and up, though may work with 3.2 or lower too (won't work with legacy Python2 versions). For unit testing, "nose" package is required: https://pypi.python.org/pypi/nose (expected executable name is nosetests3).

Source code and interfacing scripts are in the root of the repository. The most important scripts are:

  • apply_xform.py - A central driver, allows to apply a sequence of transformations (or in general, a high-level analysis/transformation algorithm) to a single file or a directory of files.

  • run_tests - The regregression testsuite runner. The majority of testsuite is high-level, consisting of running apply_xform.py with different passes on file(s) and checking the expected results.

Other subdirectories of the repository:

  • tests_unit - Classical unit tests for Python modules, written in Python.

  • tests - The main testsuite. While integrational in the nature, it usually tests one pass on one simple file, so follows the unit testing philosophy. Tests are represented as PseudoC input files, while expected results - as PseudoC with basic blocks annotation and (where applicable) CFG in .dot format. Looking at these testcases, trying to modify them and seeing the outcome is the best way to learn how ScratchABlock works.

  • docs - A growing collection of documentation. Currently there's a specification of the PseudoC assembler language serving as intermediate representation (IR) for ScratchABlock and a survey why another existing IR was not selected.

The current approach of ScratchABlock is to grow a collection of relatively loosely-coupled algorithms for program analysis and transformation, have them covered with tests, and allow easy user access to them. The magic of decompilation consists of applying these algorithms in the rights order and right number of times. Then, to improve performance of the decompilation, the passes usually require more tight coupling. Exploring those directions is the next priority after implementing inventory of the passes as described above.

Algorithms and transformations implemented by ScratchABlock:

  • Graph algorithms:

    • Construction and querying (predecessors, successors, etc.)
    • Traversal (depth first search (DFS), postorder)
    • Dominator tree
    • Node splitting
  • Data flow analysis:

    • Generic iterative dataflow algorithm framework
    • Dominator tree
    • Reaching definitions
    • Live variables
    • Building def-use chains
  • Propagation:

    • Constant
    • Copy
    • Memory references
    • Expressions
  • Dead code elimination (DCE)

  • Rewriting:

    • Of stack variables
    • Of structure fields (TODO)
    • Devirtualization (TODO)
  • Control flow structuring:

    • Removal of jumps-over-jumps
    • Single exit
    • Loop single landing site
    • if/if-else/if-elif-else ladders
    • Control-flow "and" (if (a && b))
    • Abnormal selection via node splitting
    • while/do-while/infinite loops
    • Generic loop structuring (TODO)
    • Unreachable basic blocks elimination (TODO)
  • Output formats:

    • PseudoC
    • PseudoC with annotated basic blocks
    • C
    • .dot (for control flow (CFGs) and other graphs)
    • YAML (for function properties database)

ScratchABlock's partner tool is ScratchABit, which is an interactive disassemler intended to perform the lowest-level tasks of decompilation process, like separation of code from data, and identifying function boundaries. ScratchABit produces a PseudoC output (subject to plugin availability for a particular CPU architecture), which can serve as input to ScratchABlock.

scratchablock's People

Contributors

mewmew avatar pfalcon avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.