nevillegrech / gigahorse-toolchain
A binary lifter and analysis framework for Ethereum smart contracts
License: Other
Why does compilation stop?
failed to compile C++ sources
Process Process-3:
Traceback (most recent call last):
File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/root/work/gig/gigahorse-toolchain/./gigahorse.py", line 337, in compile_datalog
assert not(process.returncode), f"Compilation for {spec} failed. Stopping."
AssertionError: Compilation for /root/work/gig/gigahorse-toolchain/clientlib/function_inliner.dl failed. Stopping.
c++: fatal error: Killed signal terminated program cc1plus
compilation terminated.
Ran
brew install boost
brew install --HEAD souffle-lang/souffle/souffle
git clone --recurse-submodules --remote-submodules https://github.com/nevillegrech/gigahorse-toolchain
cd gigahorse-toolchain
cd souffle-addon
make
cd ..
./gigahorse.py examples/long_running.hex
and got the error:
raise RuntimeError('''
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
Traceback (most recent call last):
File "./gigahorse.py", line 550, in <module>
open(v, 'r') # check program exists
FileNotFoundError: [Errno 2] No such file or directory: '.../gigahorse-toolchain/logic/main.dl_compiled'
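The RuntimeError above is Python's standard complaint when multiprocessing's "spawn" start method is used without the main-module guard. A minimal sketch of the required idiom (the compile_datalog body here is a placeholder, not gigahorse's actual code):

```python
import multiprocessing as mp

def compile_datalog(spec):
    # placeholder for the per-spec compilation work
    return f"compiled {spec}"

def main():
    # under the "spawn" start method (default on macOS and Windows), each
    # child re-imports this module, so process creation must be guarded
    with mp.Pool(2) as pool:
        return pool.map(compile_datalog, ["main.dl", "function_inliner.dl"])

if __name__ == "__main__":
    print(main())
```

Without the `if __name__ == "__main__":` guard, the re-import in each child tries to spawn more children, which triggers exactly this RuntimeError.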
This contract never finishes decompilation: 0xaeb960ed44c8a4ce848c50ef451f472a503456b2
In my attempts with gigahorse the process is eventually killed after consuming all memory.
It's one of the Sorare contracts, currently containing $3m worth of ETH. I believe it's the contract used for purchasing cards on their platform.
By comparison, Panoramix gives a partial decompilation but fails on function 223da1ba.
We currently only perform constant folding in local.dl. There are cases in which this isn't enough. Relevant snippet:
Begin block 0x4aaa0xb67
prev=[0x39db0xb67], succ=[0x4eb10xb67]
=================================
0x4aab0xb67: vb674aab(0x4abe) = CONST
0x4ab10xb67: vb674ab1(0x4e60) = CONST
0x4ab40xb67: vb674ab4(0x555f) = CONST
0x4ab70xb67: vb674ab7(0x5586) = CONST
0x4aba0xb67: vb674aba(0x4eb1) = CONST
0x4abd0xb67: JUMP vb674aba(0x4eb1)
Begin block 0x4eb10xb67
prev=[0x4aaa0xb67], succ=[]
=================================
0x4eb20xb67: vb674eb2(0x4ebf) = CONST
0x4eb80xb67: vb674eb8(0xffffffff) = CONST
0x4ebd0xb67: vb674ebd = AND vb674eb8(0xffffffff), vb674ab1(0x4e60)
0x4ebe0xb67: JUMP vb674ebd
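In the snippet above, both operands of the AND are known constants, so folding would resolve the jump target. A toy sketch of that fold (the variable names are taken from the snippet; the fold logic itself is illustrative, not gigahorse's implementation):

```python
# constants known from the snippet above
env = {"vb674eb8": 0xffffffff, "vb674ab1": 0x4e60}

def fold_and(a, b, env):
    # an AND of two known constants is itself a constant
    if a in env and b in env:
        return env[a] & env[b]
    return None  # not foldable with local information alone

target = fold_and("vb674eb8", "vb674ab1", env)
# 0xffffffff & 0x4e60 == 0x4e60, so JUMP vb674ebd resolves to block 0x4e60
```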
Will tackle this soon; it will take some effort.
I ran the example and the following error occurred:
root@cj-YangTianM4000e-17:/home/cj/Work/PONZI/gigahorse-toolchain# ./gigahorse.py examples/long_running.hex
Compiling /home/cj/Work/PONZI/gigahorse-toolchain/logic/main.dl to C++ program and executable
Compiling /home/cj/Work/PONZI/gigahorse-toolchain/clientlib/function_inliner.dl to C++ program and executable
Error: syntax error, unexpected <, expecting end of file in unknown source location.
1 errors generated, evaluation aborted
Process Process-2:
Traceback (most recent call last):
File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "./gigahorse.py", line 287, in compile_datalog
assert not(process.returncode), "Compilation failed. Stopping."
AssertionError: Compilation failed. Stopping.
Error: syntax error, unexpected (, expecting . in unknown source location.
1 errors generated, evaluation aborted
Process Process-1:
Traceback (most recent call last):
File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "./gigahorse.py", line 287, in compile_datalog
assert not(process.returncode), "Compilation failed. Stopping."
AssertionError: Compilation failed. Stopping.
Traceback (most recent call last):
File "./gigahorse.py", line 542, in
open(v, 'r') # check program exists
FileNotFoundError: [Errno 2] No such file or directory: '/home/cj/Work/PONZI/gigahorse-toolchain/logic/main.dl_compiled'
I am trying to add a custom data flow analysis module. However, during development, I noticed that some of the facts of the relation FormalArgs(func:IRFunction, var:TACVariable, n:number) are missing. I wonder if this is on purpose? If not, maybe the reproduction script will help to fix it.
Using examples/wEth.hex, I add the following to logic/decompiler_output.dl:
.decl NewFormalArgs(func:IRFunction, var:TACVariable, n:number)
.output NewFormalArgs
NewFormalArgs(func, tacVar, n) :-
FunctionArgument(func, n, var),
Variable_String(var, var_rep),
tacVar = as(var_rep, TACVariable).
Then run ./gigahorse.py examples/wEth.hex and compare .temp/wEth/FormalArgs.csv with .temp/wEth/NewFormalArgs.csv. We see the following result:
## FormalArgs
0x4dd 0x4ddarg0x0 0
0x68c 0x68carg0x2 2
0x68c 0x68carg0x0 0
0x68c 0x68carg0x1 1
0x68c 0x68carg0x3 3
0xb30 0xb30arg0x0 0
## NewFormalArgs
0x440 0x440arg0x0 0
0x4dd 0x4ddarg0x0 0
0x68c 0x68carg0x0 0
0x68c 0x68carg0x1 1
0x68c 0x68carg0x2 2
0x68c 0x68carg0x3 3
0xb30 0xb30arg0x0 0
0xbce 0xbcearg0x0 0
0xbce 0xbcearg0x1 1
0xbce 0xbcearg0x2 2
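One way to pin down exactly which facts differ between the two relations is to diff them as sets of tuples; a small sketch, assuming Souffle's default tab-separated CSV output:

```python
import csv

def load_facts(path):
    # Souffle writes relations as tab-separated rows by default
    with open(path) as f:
        return {tuple(row) for row in csv.reader(f, delimiter="\t")}

def missing_facts(old_csv, new_csv):
    # facts present in the new relation but absent from the old one
    return sorted(load_facts(new_csv) - load_facts(old_csv))

# e.g. missing_facts(".temp/wEth/FormalArgs.csv", ".temp/wEth/NewFormalArgs.csv")
```

On the output shown above this would report the 0x440 and 0xbce argument facts as the ones FormalArgs is missing.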
I installed it properly, but it throws an error during compilation:
#34 0.355 Compiling /home/reviewer/gigahorse-toolchain/logic/main.dl to C++ program and executable
#34 0.360 Compiling /home/reviewer/gigahorse-toolchain/clientlib/function_inliner.dl to C++ program and executable
#34 403.6 compiler error: cannot compile source file /home/reviewer/gigahorse-toolchain/cache/69422d6108360eb40049666acc113122.cpp
#34 403.6 g++ -std=c++17 -fwrapv -DUSE_NCURSES -O3 -DUSE_LIBZ -DUSE_SQLITE -fopenmp -march=native -std=c++17 -Wdate-time -D_FORTIFY_SOURCE=2 -o/home/reviewer/gigahorse-toolchain/cache/69422d6108360eb40049666acc113122 /home/reviewer/gigahorse-toolchain/cache/69422d6108360eb40049666acc113122.cpp -ldl -lpthread -lsqlite3 -lz -lncurses -lfunctors -I/usr/bin/../include -I/usr/bin/include
#34 403.6 g++: internal compiler error: Killed (program cc1plus)
#34 403.6 Please submit a full bug report,
#34 403.6 with preprocessed source if appropriate.
#34 403.6 See <file:///usr/share/doc/gcc-7/README.Bugs> for instructions.
#34 403.7 failed to compile C++ source </home/reviewer/gigahorse-toolchain/cache/69422d6108360eb40049666acc113122.cpp>
#34 403.9 Process Process-2:
#34 403.9 Traceback (most recent call last):
#34 403.9 File "/usr/lib/python3.8/multiprocessing/process.py", line 313, in _bootstrap
#34 403.9 self.run()
#34 403.9 File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
#34 403.9 self._target(*self._args, **self._kwargs)
#34 403.9 File "/home/reviewer/gigahorse-toolchain/gigahorse.py", line 287, in compile_datalog
#34 403.9 assert not(process.returncode), "Compilation failed. Stopping."
#34 403.9 AssertionError: Compilation failed. Stopping.
#34 500.4 Traceback (most recent call last):
#34 500.4 File "/home/reviewer/gigahorse-toolchain/gigahorse.py", line 542, in <module>
#34 500.4 open(v, 'r') # check program exists
#34 500.4 FileNotFoundError: [Errno 2] No such file or directory: '/home/reviewer/gigahorse-toolchain/clientlib/function_inliner.dl_compiled'
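"internal compiler error: Killed (program cc1plus)" is the kernel's OOM killer: gigahorse.py compiles several Datalog specs in parallel processes, and each g++ run on the generated C++ needs several GB of RAM. A hedged workaround sketch is to compile the specs one at a time (the souffle command line here is illustrative, not gigahorse's exact invocation):

```python
import subprocess

def compile_sequentially(specs, run=subprocess.run):
    # compiling one spec at a time keeps only one cc1plus in memory,
    # which avoids the OOM kill on machines with limited RAM
    for spec in specs:
        proc = run(["souffle", "-o", spec + "_compiled", spec])
        assert not proc.returncode, f"Compilation for {spec} failed. Stopping."

# compile_sequentially(["logic/main.dl", "clientlib/function_inliner.dl"])
```

Adding swap or raising the container's memory limit are the other usual fixes for this failure mode.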
Is there an indirect jump semantic in the TAC format produced by Gigahorse? The syntax of a jump operation is JUMP $var. Although it seems like $var can be a runtime-computed result, from what I have observed so far, $var is always a constant in the TAC format (same for JUMPI).
I have this concern because in some cases a transaction can fail if the code tries to jump to a destination that is not marked by a JUMPDEST.
For example, this contract with some particular input can trigger an ErrInvalidJump exception in go-ethereum, but from the TAC output it looks like such an error can never occur, because all jumps are static and they all jump to legal destinations.
So I want to ask: does Gigahorse account for indirect jumps when disassembling the code?
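For context on why the ErrInvalidJump check needs the raw bytecode: the EVM only allows jumps to offsets holding a JUMPDEST opcode (0x5b) that is not inside PUSH immediate data. A minimal validity scan:

```python
def valid_jumpdests(code: bytes) -> set:
    # collect offsets of JUMPDEST (0x5b), skipping PUSH immediates,
    # since a 0x5b byte inside PUSH data is not a valid destination
    dests, i = set(), 0
    while i < len(code):
        op = code[i]
        if op == 0x5B:
            dests.add(i)
        if 0x60 <= op <= 0x7F:  # PUSH1..PUSH32
            i += op - 0x5F      # skip the immediate bytes
        i += 1
    return dests

# PUSH1 0x5b (immediate data, not a dest), then a real JUMPDEST at offset 2
assert valid_jumpdests(bytes([0x60, 0x5B, 0x5B])) == {2}
```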
Hey, I am trying to install Gigahorse from a local clone by following the instructions in the README file, but encountered a make: *** [num256.o] Error 1 error. The Souffle version is 2.3 and cpp_int.hpp exists in boost, so I am not sure why this happened, or whether it has the same cause as #67.
I am using a MacBook with an Apple M1 Pro chip; the operating system is macOS Monterey 12.4. I hit the same error when I tried to install it over SSH on Ubuntu 20.04.6 LTS.
Any suggestion is appreciated, thanks in advance!
➜ souffle-addon git:(b2ef95b) make
g++ -std=c++17 -O2 num256.cpp -c -fPIC -o num256.o
num256.cpp:3:10: fatal error: 'boost/multiprecision/cpp_int.hpp' file not found
#include <boost/multiprecision/cpp_int.hpp>
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
make: *** [num256.o] Error 1
➜ souffle-addon git:(b2ef95b) souffle --version
----------------------------------------------------------------------------
Version: 2.3
----------------------------------------------------------------------------
Copyright (c) 2016-22 The Souffle Developers.
Copyright (c) 2013-16 Oracle and/or its affiliates.
All rights reserved.
============================================================================
➜ multiprecision git:(stable) pwd
/opt/homebrew/Cellar/boost/1.82.0_1/include/boost/multiprecision
➜ multiprecision git:(stable) ls
complex128.hpp debug_adaptor.hpp mpc.hpp
complex_adaptor.hpp detail mpfi.hpp
concepts eigen.hpp mpfr.hpp
cpp_bin_float float128.hpp number.hpp
cpp_bin_float.hpp fwd.hpp random.hpp
cpp_complex.hpp gmp.hpp rational_adaptor.hpp
cpp_dec_float.hpp integer.hpp tommath.hpp
cpp_int logged_adaptor.hpp traits
cpp_int.hpp miller_rabin.hpp
Our current heuristic supports public functions with 3 or 4 byte sighashes. However, there are cases of optimized code with "mined" sighashes that are smaller than 3 bytes. In order to support this with generality we'd probably need some limited global analysis.
Example
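To illustrate: a "mined" sighash is still the first 4 bytes of keccak256 of the signature, but the function name is brute-forced so that the leading bytes are zero, letting the dispatcher compare CALLDATA against a 1- or 2-byte PUSH constant. A heuristic that only accepts 3- or 4-byte constants misses these:

```python
def selector_push_width(sighash: int) -> int:
    # number of bytes a compiler needs to PUSH to encode this selector
    return max(1, (sighash.bit_length() + 7) // 8)

assert selector_push_width(0xa9059cbb) == 4  # transfer(address,uint256)
assert selector_push_width(0x000001f2) == 2  # a hypothetical "mined" selector
```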
This is related to issue 655 raised by the contract-library project, where there is a need to receive updates as a contract goes through analysis.
I wrote a Datalog rule myself. It works well, but every time I want to add or modify some rules the compilation takes very long. Is there any way to reduce the compilation time?
I ran the code but encountered several problems. May I ask if anyone else has run into them?
Hi there--I've been familiarizing myself with gigahorse and haven't been able to find out how to enable the type of analysis described in the paper:
Gigahorse produces output both in structured form (i.e., tables, exported as CSV files) and in pretty-printed text.
I can successfully generate the tables/CSVs but haven't been able to figure out how to generate the pretty-printed text output format. I'm hoping to produce output similar to contract-library.com to aid in my own analysis (the contracts in question are on the Matic sidechain unfortunately or I'd just use contract-library itself).
Appreciate the work that's been put into gigahorse; I've currently been limping by with panoramix.
I tried using gigahorse to decompile contract 0x85865A86502ffB25a1cb3BBAc809d814FEB614e9 and only got a single event in the output. According to etherscan, there should be 4 events.
I also tried using different -M and -cd settings but the result didn’t change.
Any pointers on how to use the tool to get all 4 events?
Thank you.
Sorry to bother you. When I install as instructed there is no error, but when I run the command ./gigahorse.py examples/long_running.hex I get a compile error and "no such file or directory: decompiler_compiled".
Hi, first I'd like to thank you for making gigahorse open source. It's an amazing research project and tool. This is not a concrete issue but more of a brain dump with which am looking for ideas.
I've been using gigahorse to write rules to experiment with known bugs and found a limitation when doing flow analysis (using both clientlib and symvalic).
To me it looks like the main issue is that the analysis passes don't tend to make much use of the memory modeling module, so data flowing through memory gets "lost". Even after adding relations to improve the memory modeling (especially around MemoryCopyLoop) for the latest Solidity versions (0.7, 0.8), the flow analysis tends to be limited to statement uses and defines.
So my question is: what do you think would be a good way to augment the flow analysis passes with memory modeling? I was thinking of adding more "Array/Tuple/Memory Contents" relations to DataFlows/VarMayBe, but I may be missing context on why it is the way it is.
The other question: have you considered, or do you have thoughts on, a "de-aliasing" pass? I haven't given it too much thought, but it looks to me like something as basic as merging aliases with MLOADSFreePtrUnchanged would normalize all the array/tuple accesses and make the other passes much easier to follow (I'm aware of the symbolic array equivalence, but I don't think that's used for flow analysis). On the other hand, I think some information could be lost in the process (RawDataArray -> AbiArray, etc.), or it could fail completely if the modeling is inaccurate.
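A toy illustration of the flow that gets "lost" when passes use only statement defs and uses: an MSTORE followed by an MLOAD of the same modeled address carries data through memory, and the flow edge exists only if the analysis relates the two statements through that address (the statement tuples here are made up for illustration):

```python
# statements in program order: (opcode, modeled memory address, variable)
stmts = [
    ("MSTORE", 0x40, "x"),  # writes x into memory slot 0x40
    ("MLOAD", 0x40, "y"),   # reads slot 0x40 back into y
]

flows, last_write = set(), {}
for op, addr, var in stmts:
    if op == "MSTORE":
        last_write[addr] = var
    elif op == "MLOAD" and addr in last_write:
        flows.add((last_write[addr], var))  # x flows to y through memory
```

A def/use-only DataFlows relation sees no edge between x and y here; only the memory model can supply it.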
Also, do you hang out in any messaging platform (Discord or other) to discuss gigahorse and evm/solidity?
I'm currently running gigahorse via docker (ubuntu 22.04 souffle 2.2) and getting 0 for all outputs under Analytics regardless of bytecodes I use, even the example:
--------------------------------------------------------------------------------
Analytics
--------------------------------------------------------------------------------
Analytics_CallToSignature: 0
Analytics_CallWithSig: 0
Analytics_DataFlows: 0
Analytics_DataFlowsToDifferentPublicFunction: 0
Analytics_ERC20ApproveCall: 0
Analytics_ERC20TransferCall: 0
Analytics_ERC20TransferFromCall: 0
Analytics_EventSignature: 0
Analytics_LargestFunction: 0
Analytics_LenVarHasManyVars: 0
Analytics_MemConsStmtUsesFromDistinctPaths: 0
Analytics_MissingDataFlowsToSamePublicFunction: 0
Analytics_NonModeledMLOAD: 0
Analytics_NonModeledMSTORE: 0
Analytics_PolymorphicVariable: 0
Analytics_PublicFunctionArg: 0
Analytics_PublicFunctionArrayArg: 0
Analytics_Variables: 0
bytecode_size: 3644
The program still managed to capture the bytecode size differences and generate outputs for opcodes. However, scripts such as visualizeout.py will not run, since the necessary reference files are missing.
I wanted to use gigahorse to identify the proxy call instructions in the fallback function, but found that the fallback functions of some contracts are not separated out: their logic remains inside the function selector. This is a hindrance to what I want to do. The example contract below fits this scenario.
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;
contract ReceiveEther {
uint public a = 0;
uint public b = 0;
fallback() external payable {
a += 1;
emit Received(msg.sender, msg.value, a);
}
receive() external payable {
b += 1;
emit Received(msg.sender, msg.value, b);
}
}
The following is the decompilation result of the example contract.
function __function_selector__() public {
...
// The logic of fallback function.
Begin block 0x93
prev=[0x33, 0x37], succ=[0xa4]
=================================
0x94: v94(0x1) = CONST
0x96: v96(0x0) = CONST
0x9a: v9a = SLOAD v96(0x0)
0x9b: v9b(0xa4) = CONST
0xa0: va0(0x1af) = CONST
0xa3: va3_0 = CALLPRIVATE va0(0x1af), v9a, v94(0x1), v9b(0xa4)
...
}
When experimenting with the example, the following error was reported. Help appreciated:
root@hecs-214309:~/work/gig/gigahorse-toolchain# ./gigahorse.py examples/long_running.hex
Compiling /root/work/gig/gigahorse-toolchain/logic/fallback_precise.dl to C++ program and executable
Compiling /root/work/gig/gigahorse-toolchain/logic/main.dl to C++ program and executable
Compiling /root/work/gig/gigahorse-toolchain/logic/fallback_scalable.dl to C++ program and executable
Compiling /root/work/gig/gigahorse-toolchain/clientlib/function_inliner.dl to C++ program and executable
Warning: No rules/facts defined for relation global.sens.MergeContextResponse in abstract_context.dl:14:9
Error: Ignored execution plan for non-recursive clause in local.dl:605:12
Warning: No rules/facts defined for relation revertCloner.Prev_Block_OriginalBlock in statement_insertor.dl:246:9
1 errors generated, evaluation aborted
Process Process-2:
Traceback (most recent call last):
File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/root/work/gig/gigahorse-toolchain/./gigahorse.py", line 342, in compile_datalog
assert not(process.returncode), f"Compilation for {spec} failed. Stopping."
AssertionError: Compilation for /root/work/gig/gigahorse-toolchain/logic/fallback_scalable.dl failed. Stopping.
Warning: No rules/facts defined for relation global.sens.MergeContextResponse in abstract_context.dl:14:9
Error: Ignored execution plan for non-recursive clause in local.dl:605:12
Warning: No rules/facts defined for relation revertCloner.Prev_Block_OriginalBlock in statement_insertor.dl:246:9
1 errors generated, evaluation aborted
Process Process-1:
Traceback (most recent call last):
File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
I tried to run the Gigahorse pipeline manually with the instructions given in the README (with MaxContextDepth.csv set to 8). I then compared the results with those generated by gigahorse.py, using diff -qr out .temp/long_running/out | sort, and noticed:
Files out/ActualReturnArgs.csv and .temp/long_running/out/ActualReturnArgs.csv differ
Files out/FormalArgs.csv and .temp/long_running/out/FormalArgs.csv differ
Files out/Function.csv and .temp/long_running/out/Function.csv differ
...
Only in out: Fail.csv
Only in out: Helper_Zeros.csv
Fail.csv is an empty file. Helper_Zeros.csv contains:
0
0 1
00 2
000 3
0000 4
00000 5
000000 6
0000000 7
00000000 8
000000000 9
0000000000 10
Meanwhile gigahorse.py additionally generates:
Only in .temp/long_running/out: Verbatim_CDLAllVSStaticVSArr.csv
Only in .temp/long_running/out: Verbatim_MemConsStmtsLengths.csv
...
I do not see any error or warning when running the pipeline manually. Could it simply be a misconfiguration? Let me know if you need the full log.
I tried to perform some kind of out-of-SSA translation on the output TAC representation.
I found some of the PHI operations seem to be ill-formed.
Take long_running.hex as an example: in the TAC output, the public function getTransactionCount(bool,bool) contains a block:
Begin block 0x73aB0x210
prev=[0x739B0x210], succ=[0x740B0x210, 0x746B0x210]
=================================
0x73a_0x00x210: v73a_0V210 = PHI v218, v21d, v714V210, v738V210
0x73b0x210: v73bV210 = ISZERO v73a_0V210
0x73c0x210: v73cV210(0x746) = CONST
0x73f0x210: JUMPI v73cV210(0x746), v73bV210
Here, PHI takes four possible variables but has only one predecessor. This already confuses me as it is not at a join node (which means it can't be a dominance frontier). But let's move on and look at its predecessor, which is an empty block with three incoming edges.
Begin block 0x739B0x210
prev=[0x715B0x210, 0x71bB0x210, 0x723B0x210], succ=[0x73aB0x210]
=================================
I have no clue why this block is empty, but given three incoming edges, there is not enough information to decode the PHI operation.
The strongest evidence that something is wrong with the PHI insertion is that two of the variables in the PHI operation, v218 and v21d, are actually defined in the same block:
Begin block 0x210
prev=[0x205], succ=[0x6e8B0x210]
=================================
0x211: v211(0x14d) = CONST
0x214: v214(0x4) = CONST
0x216: v216 = CALLDATALOAD v214(0x4)
0x217: v217 = ISZERO v216
0x218: v218 = ISZERO v217
0x219: v219(0x24) = CONST
0x21b: v21b = CALLDATALOAD v219(0x24)
0x21c: v21c = ISZERO v21b
0x21d: v21d = ISZERO v21c
0x21e: v21e(0x6e8) = CONST
0x221: JUMP v21e(0x6e8)
Any help would be appreciated!
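In conventional SSA the invariant being violated above is simple: a PHI at block B takes exactly one argument per incoming edge of B. The report's observation, as a sketch:

```python
def phi_well_formed(predecessors, phi_args):
    # one PHI argument per predecessor edge of the containing block
    return len(phi_args) == len(predecessors)

# block 0x73aB0x210: one predecessor, but four PHI arguments
assert not phi_well_formed(["0x739B0x210"],
                           ["v218", "v21d", "v714V210", "v738V210"])
```

With the argument-to-edge correspondence broken, an out-of-SSA pass cannot know which predecessor's copy to insert, which is exactly the difficulty described.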
For this contract (https://sepolia.etherscan.io/address/0xafcb3200627b921da0e3c1aa6acbea24b45168d1#code), there is an error in the starting position of a three-address-code block.
I was trying to run Gigahorse manually by directly invoking souffle with the facts generated by generatefacts. However, I ran into an error complaining about a missing MaxContextDepth.csv input.
Am I missing something here? I'm using the latest Souffle release, 2.0.2.
I have always had a question: your method is to write vulnerability detection rules based on vulnerability characteristics, which introduces strong subjective factors. Is there any way to automatically generate detection rules from the characteristics of a vulnerability? Or is this a disadvantage of Datalog in the first place?
Looking forward to your reply. Thanks!
On this contract, the biggest part of the body of the transferFrom() function is not part of the decompiled output. It is the 0x1a37() function, but the call at block 0xda9 is not recognized.
I did some digging and found that a fact for IsFunctionCallReturn is produced, but no IRFunctionCallReturn. This is because the callerret block is not deduced to be part of the caller function.
Relevant code of the rule that fails to produce a fact:
gigahorse-toolchain/logic/functions.dl
Lines 416 to 425 in 37dc051