Git Product home page Git Product logo

gigahorse-toolchain's Introduction

NOTE: you need to clone this repo using the --recursive flag since this repo has submodules, e.g., git clone [email protected]:nevillegrech/gigahorse-toolchain.git --recursive

The Gigahorse binary lifter and toolchain Tweet

A binary lifter (and related framework) from low-level EVM code to a higher-level function-based three-address representation, similar to LLVM IR or Jimple.

Quickstart

Running/Installing Gigahorse from local clone (requires souffle)

First make sure you have the following things installed on your system:

  • Boost libraries (Can be installed on Debian with apt install libboost-all-dev)

  • Python 3.8 (Refer to standard documentation)

  • Souffle 2.3 or 2.4 (We only test using the release versions, later development versions may work but are untested by us. Refer to Souffle documentation. The easiest way to install this is to use the release from https://github.com/souffle-lang/souffle/releases/tag/2.3)

Now install the Souffle custom functors:

cd souffle-addon && make            # builds all, sets libfunctors.so as a link to libsoufflenum.so

You should now be ready to run Gigahorse.

Installing Gigahorse via docker

Alternatively, you can use Gigahorse via our pre-built docker images using the following instructions:

  1. For amd64:

    curl -s -L https://raw.githubusercontent.com/nevillegrech/gigahorse-toolchain/master/scripts/docker/install/install_amd64 | bash
    

    For arm64/m1 (not actively tested):

    curl -s -L https://raw.githubusercontent.com/nevillegrech/gigahorse-toolchain/master/scripts/docker/install/install_arm64 | bash
    
  2. Then source ~/.bashrc

  3. Check if gigahorse is available using gigahorse --help

Running Gigahorse

The gigahorse.py script can be run on a contract individually or on a collection of contract bytecode files in specified directory, and it will run the binary lifter implemented in logic/main.dl on each contract, optionally followed by any additional client analyses specified by the user using the -C flag.

The default pipeline first attempts to decompile a contract using a transactional context-sensitivity configuration. If that times out it performs a second attempt with the scalable-fallback configuration (using a hybrid-precise context sensitivity algorithm, tuned for scalability). In addition, if the default configuration succeeds but produces imprecise output, the precise-fallback configuration (currently the same as the --early_cloning config) is used to attempt to remove that imprecision. Both fallback configurations can be disabled if needed using the --disable_scalable_fallback and --disable_precise_fallback flags respectively.

The Gigahorse pipeline also includes a few rounds of inlining of small functions in order to help the subsequent client libraries get more high-level inferences. The inlining functionality can be disabled with --disable_inline.

The expected file format for each contract is in .hex format.

Example (individual contract):

./gigahorse.py examples/long_running.hex

(For some Souffle versions, you will get an error message regarding the libsoufflenum.so dynamic library, during the first compilation. You can ignore this and gigahorse.py should work upon a re-run.)

Contracts that take too long to analyse will be skipped after a configurable timeout.

The decompilation results are placed in the directory .temp, whereas metadata about the execution, e.g., metrics are placed in a results.json file, as a list of triples in the form:

[filename, properties, flags]

Here, properties is a list of the detected issues with the contract in filename, where any output relations in the datalog files that are non-empty will have their relation name placed in this list. flags is a list indicating auxiliary or exceptional information. It may include "ERROR" and "TIMEOUT", which are self-explanatory.

gigahorse.py --help for invocation instructions.

Example (with client analysis):

./gigahorse.py  -j <number of jobs> -C clients/visualizeout.py <contracts>

(The clients following the -C flag can be a comma-separated list, with no spaces, of path-reachable or fully-qualified filenames.)

Gigahorse can also be used in "bulk analysis" mode, by replacing by a directory filled with contracts.

For additional instructions in tuning the Gigahorse framework see Advanced.md.

Textual representation of the lifted IR

Client analysis clients/visualizeout.py can be used to provide a pretty-printed textual representation of the IR produced by Gigahorse. The pretty-printed text file is named contract.tac and will be placed in the out/ folder for each analyzed contract. For example the output for ./gigahorse.py -C clients/visualizeout.py examples/long_running.hex will be placed in .temp/long_running/out/contract.tac.

A block visualized in contract.tac looks like:

    Begin block 0x3e
    prev=[0xb], succ=[0x10ee, 0x49]
    =================================
    0x3f: v3f(0xf42fdfb) = CONST 
    0x44: v44 = EQ v3f(0xf42fdfb), v32
    0x10c7: v10c7(0x10ee) = CONST 
    0x10c8: JUMPI v10c7(0x10ee), v44

Keep in mind that the pretty-printed variable identifiers do not correspond to their identifiers in the underlying datalog facts.

Writing client analyses

Client analyses can be written in any language by reading the relational files that are written by the decompilation step (main.dl). This framework however provides preferential treatment for clients written in Datalog. The most notable example of client analysis for the Gigahorse framework is MadMax. This uses several of the "analysis client libraries" under clientlib. These libraries include customizable dataflow analysis, memory modeling, data structure reconstruction and others.

A common template for client analyses for decompiled bytecode is to create souffle datalog file that includes clientlib/decompiler_imports.dl, for instance:

#include "clientlib/decompiler_imports.dl"

.output ...

Uses of Gigahorse

The Gigahorse toolchain was originally published as:

  • Grech, N., Brent, L., Scholz, B., Smaragdakis, Y. (2019), Gigahorse: Thorough, Declarative Decompilation of Smart Contracts. In 41st ACM/IEEE International Conference on Software Engineering.

Several novel developments to Gigahorse after the original publication have been published as:

  • Grech, N., Lagouvardos, S., Tsatiris, I., Smaragdakis, Y. (2022), Elipmoc: Advanced Decompilation of Ethereum Smart Contracts Proceedings of the ACM in Programming Languages (OOPSLA).

In addition, other research tools have been developed on top of Gigahorse, including:

  • Grech, N., Kong, M., Jurisevic, A., Brent, L., Scholz, B., Smaragdakis, Y. (2018), MadMax: Surviving Out-of-Gas Conditions in Ethereum Smart Contracts. Proceedings of the ACM on Programming Languages (OOPSLA).

  • Brent, L., Grech, N., Lagouvardos, S., Scholz, B., Smaragdakis, Y. (2020), Ethainter: A Smart Contract Security Analyzer for Composite Vulnerabilities. In 41st ACM SIGPLAN Conference on Programming Language Design and Implementation.

  • Lagouvardos, S., Grech, N., Tsatiris, I., Smaragdakis, Y. (2020) Precise Static Modelling of Ethereum "Memory". Proceedings of the ACM in Programming Languages (OOPSLA).

  • Grech, N., Kong, M., Jurisevic, A., Brent, L., Scholz, B., Smaragdakis, Y. (2020), Analyzing the Out-of-Gas World of Smart Contracts. Communications of the ACM.

  • Smaragdakis, Y., Grech, N., Lagouvardos, S., Triantafyllou, K., Tsatiris, I. (2021), Symbolic Value-Flow Static Analysis: Deep, Precise, Complete Modeling of Ethereum Smart Contracts. Proceedings of the ACM in Programming Languages (OOPSLA).

The Gigahorse framework also underpins the realtime decompiler and analysis tool at contract-library.com.

gigahorse-toolchain's People

Contributors

0xl3x1 avatar alkorang avatar andripol avatar b-scholz avatar darkn0mad avatar gfour avatar giokamarkella avatar hamishivi avatar hikarishine avatar ibollanos avatar iliastsa avatar latiosu avatar mrexodia avatar navyera avatar nevillegrech avatar omahs avatar pandadefi avatar samuelmarks avatar sifislag avatar snf avatar trv-cs avatar yanniss avatar zyzek avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

gigahorse-toolchain's Issues

why stop compilation

why stop compilation
failed to compile C++ sources
Process Process-3:
Traceback (most recent call last):
File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/root/work/gig/gigahorse-toolchain/./gigahorse.py", line 337, in compile_datalog
assert not(process.returncode), f"Compilation for {spec} failed. Stopping."
AssertionError: Compilation for /root/work/gig/gigahorse-toolchain/clientlib/function_inliner.dl failed. Stopping.
c++: fatal error: Killed signal terminated program cc1plus
compilation terminated.

Possible missing events from output

I tried using gigahorse to decompile contract 0x85865A86502ffB25a1cb3BBAc809d814FEB614e9 and only got a single event in the output. According to etherscan, there should be 4 events.

I also tried using different -M and -cd settings but the result didn’t change.

Any pointers on how to use the tool to get all 4 events?

Thank you.

Indirect Jump In TAC

Is there an indirect jump semantic in the TAC format produced by Gigahorse?

The syntax of a jump operation is JUMP $var. Although it seems like the $var can be a runtime computed result, from what I observe so far, $var is always a constant in the TAC format. (Same for JUMPI)

I have this concern because in some cases, a transaction can fail if the code is trying to jump to a destination that is not marked by a JUMPDEST.
For example, this contract with some particular input can trigger an ErrInvalidJump exception in go-ethereum, but from the TAC output, it is pretty obvious that such error can never occur because jumps are all static and they all jump to legal destinations.

So I want to ask, does Gigahorse concern about indirect jump when dissembling the source code?

Result differs when running Gigahorse manually

I tried to run the Gigahorse pipeline manually with the instructions given in readme (with MaxContextDepth.csv set to 8)

I then compare them with the results generated from gigahorse.py using diff -qr out .temp/long_running/out | sort and notice

  1. Some files are different. E.g.
Files out/ActualReturnArgs.csv and .temp/long_running/out/ActualReturnArgs.csv differ
Files out/FormalArgs.csv and .temp/long_running/out/FormalArgs.csv differ
Files out/Function.csv and .temp/long_running/out/Function.csv differ
...
  1. There are two extra files produced from running manually
Only in out: Fail.csv
Only in out: Helper_Zeros.csv

Fail.csv is an empty file.
Helper_Zeros.csv contains:

	0
0	1
00	2
000	3
0000	4
00000	5
000000	6
0000000	7
00000000	8
000000000	9
0000000000	10

  1. Some files only exists when running from gigahorse.py
Only in .temp/long_running/out: Verbatim_CDLAllVSStaticVSArr.csv
Only in .temp/long_running/out: Verbatim_MemConsStmtsLengths.csv
...

I do not see any error or warning when running the pipeline manually. Could it simply be a misconfiguration? Let me know if you need the full log.

address

image Observe the following two blocks, why the display jumps to oxbc, but 0xbcB0x59 is written on the next block, what is the meaning?

Pretty-printed output format?

Hi there--I've been familiarizing myself with gigahorse and haven't been able to find out how to enable the type of analysis described in the paper:

Gigahorse produces output both in structured form (i.e., tables, exported as CSV files) and in pretty-printed text.

I can successfully generate the tables/CSVs but haven't been able to figure out how to generate the pretty-printed text output format. I'm hoping to produce output similar to contract-library.com to aid in my own analysis (the contracts in question are on the Matic sidechain unfortunately or I'd just use contract-library itself).

Appreciate the work that's been put into gigahorse; I've currently been limping by with panoramix.

Why is the compiled file incomplete?

My souffle version is 2.1, and the file opened after running command "gigahorse-toolchain/gigahorse.py -C detector2.dl examples/long_running.hex" has very little content. The result is as follows:
image
image

Why not get complete data? Looking forward to your reply, thank you very much!

Can't compile contract

I install it properly but it throws an error during compile:

#34 0.355 Compiling /home/reviewer/gigahorse-toolchain/logic/main.dl to C++ program and executable
#34 0.360 Compiling /home/reviewer/gigahorse-toolchain/clientlib/function_inliner.dl to C++ program and executable
#34 403.6 compiler error: cannot compile source file /home/reviewer/gigahorse-toolchain/cache/69422d6108360eb40049666acc113122.cpp
#34 403.6 g++ -std=c++17   -fwrapv  -DUSE_NCURSES  -O3 -DUSE_LIBZ  -DUSE_SQLITE  -fopenmp  -march=native  -std=c++17 -Wdate-time -D_FORTIFY_SOURCE=2 -o/home/reviewer/gigahorse-toolchain/cache/69422d6108360eb40049666acc113122 /home/reviewer/gigahorse-toolchain/cache/69422d6108360eb40049666acc113122.cpp  -ldl -lpthread -lsqlite3 -lz -lncurses -lfunctors  -I/usr/bin/../include -I/usr/bin/include 
#34 403.6 g++: internal compiler error: Killed (program cc1plus)
#34 403.6 Please submit a full bug report,
#34 403.6 with preprocessed source if appropriate.
#34 403.6 See <file:///usr/share/doc/gcc-7/README.Bugs> for instructions.
#34 403.7 failed to compile C++ source </home/reviewer/gigahorse-toolchain/cache/69422d6108360eb40049666acc113122.cpp>
#34 403.9 Process Process-2:
#34 403.9 Traceback (most recent call last):
#34 403.9   File "/usr/lib/python3.8/multiprocessing/process.py", line 313, in _bootstrap
#34 403.9     self.run()
#34 403.9   File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
#34 403.9     self._target(*self._args, **self._kwargs)
#34 403.9   File "/home/reviewer/gigahorse-toolchain/gigahorse.py", line 287, in compile_datalog
#34 403.9     assert not(process.returncode), "Compilation failed. Stopping."
#34 403.9 AssertionError: Compilation failed. Stopping.
#34 500.4 Traceback (most recent call last):
#34 500.4   File "/home/reviewer/gigahorse-toolchain/gigahorse.py", line 542, in <module>
#34 500.4     open(v, 'r') # check program exists
#34 500.4 FileNotFoundError: [Errno 2] No such file or directory: '/home/reviewer/gigahorse-toolchain/clientlib/function_inliner.dl_compiled'

Pathological contract example

This contract never finishes decompilation: 0xaeb960ed44c8a4ce848c50ef451f472a503456b2

In my attempts with gigahorse the process is eventually killed after consuming all memory.

It's one of the Sorare contracts, currently containing $3m worth of ETH. I believe it's the contract used for purchasing cards on their platform.

By comparison, Panoramix gives a partial decompilation but fails on function 223da1ba.

cannot run the example

I run the example, and some error happened as following:

root@cj-YangTianM4000e-17:/home/cj/Work/PONZI/gigahorse-toolchain# ./gigahorse.py examples/long_running.hex
Compiling /home/cj/Work/PONZI/gigahorse-toolchain/logic/main.dl to C++ program and executable
Compiling /home/cj/Work/PONZI/gigahorse-toolchain/clientlib/function_inliner.dl to C++ program and executable
Error: syntax error, unexpected <, expecting end of file in unknown source location.
1 errors generated, evaluation aborted
Process Process-2:
Traceback (most recent call last):
File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "./gigahorse.py", line 287, in compile_datalog
assert not(process.returncode), "Compilation failed. Stopping."
AssertionError: Compilation failed. Stopping.
Error: syntax error, unexpected (, expecting . in unknown source location.
1 errors generated, evaluation aborted
Process Process-1:
Traceback (most recent call last):
File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "./gigahorse.py", line 287, in compile_datalog
assert not(process.returncode), "Compilation failed. Stopping."
AssertionError: Compilation failed. Stopping.
Traceback (most recent call last):
File "./gigahorse.py", line 542, in
open(v, 'r') # check program exists
FileNotFoundError: [Errno 2] No such file or directory: '/home/cj/Work/PONZI/gigahorse-toolchain/logic/main.dl_compiled'

PHI insetion in the TAC representation.

I tried to perform some kind of out-of-SSA translation on the output TAC representation.
I found some of the PHI operations seem to be ill-formed.

Take long_running.hex as an example, in the TAC output, the function function getTransactionCount(bool,bool)() public contains a block:

    Begin block 0x73aB0x210
    prev=[0x739B0x210], succ=[0x740B0x210, 0x746B0x210]
    =================================
    0x73a_0x00x210: v73a_0V210 = PHI v218, v21d, v714V210, v738V210
    0x73b0x210: v73bV210 = ISZERO v73a_0V210
    0x73c0x210: v73cV210(0x746) = CONST 
    0x73f0x210: JUMPI v73cV210(0x746), v73bV210

Here, PHI takes four possible variables but has only one predecessor. This already confuses me as it is not at a join node (which means it can't be a dominance frontier). But let's move on and look at its predecessor, which is an empty block with three incoming edges.

    Begin block 0x739B0x210
    prev=[0x715B0x210, 0x71bB0x210, 0x723B0x210], succ=[0x73aB0x210]
    =================================

I have no clue of why this block is empty, but given three incoming edges, there is no enough information to decode the PHI operation.

The strongest evidence shows that there is something wrong with the PHI insertion is that two of the variables in the PHI operation v218, v21d are actually defined in the same block:

    Begin block 0x210
    prev=[0x205], succ=[0x6e8B0x210]
    =================================
    0x211: v211(0x14d) = CONST 
    0x214: v214(0x4) = CONST 
    0x216: v216 = CALLDATALOAD v214(0x4)
    0x217: v217 = ISZERO v216
    0x218: v218 = ISZERO v217
    0x219: v219(0x24) = CONST 
    0x21b: v21b = CALLDATALOAD v219(0x24)
    0x21c: v21c = ISZERO v21b
    0x21d: v21d = ISZERO v21c
    0x21e: v21e(0x6e8) = CONST 
    0x221: JUMP v21e(0x6e8)

Any help would be appreciated!

Cannot compile on Mac

Ran

brew install boost
brew install --HEAD souffle-lang/souffle/souffle
git clone --recurse-submodules --remote-submodules https://github.com/nevillegrech/gigahorse-toolchain
cd gigahorse-toolchain
cd souffle-addon
make
cd ..
./gigahorse.py examples/long_running.hex

and get the error

    raise RuntimeError('''
RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
Traceback (most recent call last):
  File "./gigahorse.py", line 550, in <module>
    open(v, 'r') # check program exists
FileNotFoundError: [Errno 2] No such file or directory: '.../gigahorse-toolchain/logic/main.dl_compiled'

When experimenting with example, the following error was reported

When experimenting with example, the following error was reported
,help

root@hecs-214309:~/work/gig/gigahorse-toolchain# ./gigahorse.py examples/long_running.hex
Compiling /root/work/gig/gigahorse-toolchain/logic/fallback_precise.dl to C++ program and executable
Compiling /root/work/gig/gigahorse-toolchain/logic/main.dl to C++ program and executable
Compiling /root/work/gig/gigahorse-toolchain/logic/fallback_scalable.dl to C++ program and executable
Compiling /root/work/gig/gigahorse-toolchain/clientlib/function_inliner.dl to C++ program and executable
Warning: No rules/facts defined for relation global.sens.MergeContextResponse in abstract_context.dl:14:9
Error: Ignored execution plan for non-recursive clause in local.dl:605:12
Warning: No rules/facts defined for relation revertCloner.Prev_Block_OriginalBlock in statement_insertor.dl:246:9
1 errors generated, evaluation aborted
Process Process-2:
Traceback (most recent call last):
File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/root/work/gig/gigahorse-toolchain/./gigahorse.py", line 342, in compile_datalog
assert not(process.returncode), f"Compilation for {spec} failed. Stopping."
AssertionError: Compilation for /root/work/gig/gigahorse-toolchain/logic/fallback_scalable.dl failed. Stopping.
Warning: No rules/facts defined for relation global.sens.MergeContextResponse in abstract_context.dl:14:9
Error: Ignored execution plan for non-recursive clause in local.dl:605:12
Warning: No rules/facts defined for relation revertCloner.Prev_Block_OriginalBlock in statement_insertor.dl:246:9
1 errors generated, evaluation aborted
Process Process-1:
Traceback (most recent call last):
File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()

The "FormalArgs" relation missing some of the facts.

Breif Introduction

I am trying to add a custom data flow analysis module. However, during developing, I notice that some of the facts of the relation FormalArgs(func:IRFunction, var:TACVariable, n:number) is missing. I wonder if this is on purpose? If not, maybe the reproduction script will help to fix it.

Bug Reproduction

  • using the runtime bytecode of the wEth contract and store it in example/wEth.hex.
  • adding the following datalog script in logic/decompiler_output.dl
.decl NewFormalArgs(func:IRFunction, var:TACVariable, n:number)
.output NewFormalArgs

NewFormalArgs(func, tacVar, n) :-
  FunctionArgument(func, n, var),
  Variable_String(var, var_rep),
  tacVar = as(var_rep, TACVariable).
  • using the following script to run gigahorse
./gigahorse.py examples/wEth.hex
  • checking the difference between .temp/wEth/FormalArgs.csv and .temp/wEth/NewFormalArgs.csv. We shall see the following result:
## FormalArgs
0x4dd	0x4ddarg0x0	0
0x68c	0x68carg0x2	2
0x68c	0x68carg0x0	0
0x68c	0x68carg0x1	1
0x68c	0x68carg0x3	3
0xb30	0xb30arg0x0	0
## NewFormalArgs
0x440	0x440arg0x0	0
0x4dd	0x4ddarg0x0	0
0x68c	0x68carg0x0	0
0x68c	0x68carg0x1	1
0x68c	0x68carg0x2	2
0x68c	0x68carg0x3	3
0xb30	0xb30arg0x0	0
0xbce	0xbcearg0x0	0
0xbce	0xbcearg0x1	1
0xbce	0xbcearg0x2	2

The fallback function is not separated.

I wanted to use gigahorse to identify the proxy call instructions in the fallback function, but found that the fallback functions of some contracts were not separated, and the logic of these fallback functions was in the function selector. This was a hindrance to what I wanted to do. The example contract below fits this scenario.

// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;
contract ReceiveEther {
    uint public a = 0;
    uint public b = 0;
   
    fallback() external payable {
        a += 1;
        emit Received(msg.sender, msg.value, a);
    }

    receive() external payable {
        b += 1;
        emit Received(msg.sender, msg.value, b);
    }
}

The following is the decompilation result of the example contract.

function __function_selector__() public {
    ...
    // The logic of fallback function.
    Begin block 0x93
    prev=[0x33, 0x37], succ=[0xa4]
    =================================
    0x94: v94(0x1) = CONST 
    0x96: v96(0x0) = CONST 
    0x9a: v9a = SLOAD v96(0x0) 
    0x9b: v9b(0xa4) = CONST 
    0xa0: va0(0x1af) = CONST 
    0xa3: va3_0 = CALLPRIVATE va0(0x1af), v9a, v94(0x1), v9b(0xa4)
    ...
}

The "freeze_support()" line can be omitted if the program

I ran the code but encountered several problems, may I ask if anyone has encountered this problem?

  1. This probably means that you are not using fork to start your child processes and you have forgotten to use the proper idiom in the main module:
    if name == 'main':
    freeze_support()
    ...
    The "freeze_support()" line can be omitted if the program is not going to be frozen to produce an executable.
  2. Traceback (most recent call last):
    File "/Users/a/Downloads/gigahorse-toolchain/souffle-addon/../gigahorse.py", line 673, in
    open(get_souffle_executable_path(compile_args[0]), 'r') # check program exists
    FileNotFoundError: [Errno 2] No such file or directory: '/Users/a/Downloads/gigahorse-toolchain/cache/main.dl_compiled'

Can detection rules be automatically generated?

I have always had a question: your method is to write vulnerability detection rules based on vulnerability characteristics, which will lead to too strong subjective factors. So, do we have any way to automatically generate detection rules based on the characteristics of the vulnerability? Or is this a disadvantage of Datalog in the first place?
Looking forward to your reply. Thanks!

Add CI event listener

This is related to issue 655 raised by the contract library project where there is a need to get updated as a contract goes through analysis

The compilation time is so long

I write a datalog rule by myself. It works well, but every time I want to add or modify some rules, it takes so long for the compilation. Is there any way I can reduce the compilation time?

A problem of "souffle-addon"?

Hello! I have installed souffle-2.3 and downloaded gigahorse, but then I entered the "souffle-addon" and executed the command "make -j8", and the following error occurred:
image
image
What is the reason for this problem? Looking forward to your reply. Thanks!

Function call not inferred

On this contract the biggest part of the body of the transferFrom() function is not part of the decompiled output. It is the 0x1a37() function but the call at block 0xda9 is not recognized.

Did some digging and found that a fact for IsFunctionCallReturn is produced but no IRFunctionCallReturn. This is because the callerret block is not deduced to be part of the caller function.

Relevant code of rule that fails to produce a fact:

// Be strict in our precision demands for recognizing a call
IRFunction_Return(irfunc, retir),
IRFunctionCallReturn(callerir, irfunc, callerretir),
IRFunction_Return_Edge(retir, callerretir) :-
IsFunctionCallReturn(ctx, caller, func, _, ret, callerret), //REVIEW: take advantage of retCtx
MaybeInFunctionUnderContext(ctx, caller, prevfunc),
Block_IRBlock(ret, func, retir),
Block_IRBlock(caller, prevfunc, callerir),
Block_IRBlock(callerret, prevfunc, callerretir),
Function_IRFunction(func, irfunc).

Analytics returning 0 for all outputs

I'm currently running gigahorse via docker (ubuntu 22.04 souffle 2.2) and getting 0 for all outputs under Analytics regardless of bytecodes I use, even the example:

--------------------------------------------------------------------------------
Analytics
--------------------------------------------------------------------------------
  Analytics_CallToSignature: 0
  Analytics_CallWithSig: 0
  Analytics_DataFlows: 0
  Analytics_DataFlowsToDifferentPublicFunction: 0
  Analytics_ERC20ApproveCall: 0
  Analytics_ERC20TransferCall: 0
  Analytics_ERC20TransferFromCall: 0
  Analytics_EventSignature: 0
  Analytics_LargestFunction: 0
  Analytics_LenVarHasManyVars: 0
  Analytics_MemConsStmtUsesFromDistinctPaths: 0
  Analytics_MissingDataFlowsToSamePublicFunction: 0
  Analytics_NonModeledMLOAD: 0
  Analytics_NonModeledMSTORE: 0
  Analytics_PolymorphicVariable: 0
  Analytics_PublicFunctionArg: 0
  Analytics_PublicFunctionArrayArg: 0
  Analytics_Variables: 0
  bytecode_size: 3644

The program still managed to capture the bytecode size differences and generate outputs for opcodes. However, files such as visualizeout.py will not run since it's missing the necessary reference files.

Flow analysis + memory modeling

Hi, first I'd like to thank you for making gigahorse open source. It's an amazing research project and tool. This is not a concrete issue but more of a brain dump with which am looking for ideas.

I've been using gigahorse to write rules to experiment with known bugs and found a limitation at the time of making flow analysis (both using clientlib and symvalic).

To me it looks like the main issue is that the analysis passes don't tend to make a lot of use of the memory modeling module so data flowing through the memory gets "lost". Even after adding relations to improve the memory modeling (specially around MemoryCopyLoop) for the latest solidity versions (0.7, 0.8), the flow analysis tend to be limited to statement uses and defines.

So my question is what do you think it would be a good way to augment the flow analysis passes with memory modeling? I was thinking of adding more more "Array/Tuple/Memory Contents" relations to DataFlows/VarMayBe but I may be missing context on why this is the way it is.

And the other is, have you considered or have thoughts about making a "de-aliasing" pass? I haven't given it too much thought but it looks to me like something as basic as merging aliases with MLOADSFreePtrUnchanged would normalize all the array/tuple access and make the other passes much easier to follow (I'm aware of the symbolic array equivalence but don't think that's used for flow anal). On the other hand, I think, some information could be lost in the process (RawDataArray -> AbiArray, etc) or completely fail if the modeling is inaccurate.

Also, do you hang out in any messaging platform (Discord or other) to discuss gigahorse and evm/solidity?

Encountered a problem when installing the Souffle custom functors

Hey, I am trying to install Gigahorse from a local clone by following the instructions in the README file, but encountered make: *** [num256.o] Error 1 error. The souffle version is 2.3 and cpp_int.hpp exists in boost, so I am not sure why it happened and if it is the same reason caused #67.
I am using a Macbook with Apple M1 Pro chips, the operating system is macOS Monterey 12.4. It encountered the same error when I tried to install it on ssh with Ubuntu 20.04.6 LTS.
Any suggestion is appreciated, thanks in advance!

➜  souffle-addon git:(b2ef95b) make
g++ -std=c++17 -O2 num256.cpp -c -fPIC -o num256.o
num256.cpp:3:10: fatal error: 'boost/multiprecision/cpp_int.hpp' file not found
#include <boost/multiprecision/cpp_int.hpp>
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
make: *** [num256.o] Error 1
➜  souffle-addon git:(b2ef95b) souffle --version
----------------------------------------------------------------------------
Version: 2.3
----------------------------------------------------------------------------
Copyright (c) 2016-22 The Souffle Developers.
Copyright (c) 2013-16 Oracle and/or its affiliates.
All rights reserved.
============================================================================
➜  multiprecision git:(stable) pwd
/opt/homebrew/Cellar/boost/1.82.0_1/include/boost/multiprecision
➜  multiprecision git:(stable) ls
complex128.hpp       debug_adaptor.hpp    mpc.hpp
complex_adaptor.hpp  detail               mpfi.hpp
concepts             eigen.hpp            mpfr.hpp
cpp_bin_float        float128.hpp         number.hpp
cpp_bin_float.hpp    fwd.hpp              random.hpp
cpp_complex.hpp      gmp.hpp              rational_adaptor.hpp
cpp_dec_float.hpp    integer.hpp          tommath.hpp
cpp_int              logged_adaptor.hpp   traits
cpp_int.hpp          miller_rabin.hpp

Need some constant folding in `GlobalAnalysis`

We currently only perform constant folding at local.dl:

https://github.com/nevillegrech/gigahorse-toolchain/blob/2c3994b1390aba93ee87922d35bb1acce618ba42/logic/local.dl#L118C1-L135C56

There are cases in which this isn't enough. Relevant snippet:

    Begin block 0x4aaa0xb67
    prev=[0x39db0xb67], succ=[0x4eb10xb67]
    =================================
    0x4aab0xb67: vb674aab(0x4abe) = CONST 
    0x4ab10xb67: vb674ab1(0x4e60) = CONST 
    0x4ab40xb67: vb674ab4(0x555f) = CONST 
    0x4ab70xb67: vb674ab7(0x5586) = CONST 
    0x4aba0xb67: vb674aba(0x4eb1) = CONST 
    0x4abd0xb67: JUMP vb674aba(0x4eb1)

    Begin block 0x4eb10xb67
    prev=[0x4aaa0xb67], succ=[]
    =================================
    0x4eb20xb67: vb674eb2(0x4ebf) = CONST 
    0x4eb80xb67: vb674eb8(0xffffffff) = CONST 
    0x4ebd0xb67: vb674ebd = AND vb674eb8(0xffffffff), vb674ab1(0x4e60)
    0x4ebe0xb67: JUMP vb674ebd

Will tackle soon, will have some difficulty.

no such file or directory decompiler_compiled

Sorry to bother you,when I install as instructed,there is no mistake.But when I run the command ./gigahorse.py examples/long_running.hex,there is a mistake called compile error and no such file or directory decompiler_compiled.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.