angr / pypcode
Python bindings to Ghidra's SLEIGH library for disassembly and lifting to P-Code IR
Home Page: https://api.angr.io/projects/pypcode/en/latest/
License: Other
We can reduce the number of references to the context. This would also help when creating P-Code manually.
In some cases, an older csleigh version may be loaded by accident (e.g., when working with multiple Python versions). Add version enforcement.
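A minimal sketch of such a check, assuming the native extension could report the version it was built from. The check_native_version helper and its native_version argument are hypothetical, not part of pypcode's API:

```python
def check_native_version(package_version: str, native_version: str) -> None:
    """Raise ImportError if the loaded native library does not match the package.

    Hypothetical helper: it assumes the native module exposes the version
    string it was built from, which pypcode would have to add.
    """
    if package_version != native_version:
        raise ImportError(
            f"pypcode {package_version} loaded native csleigh library "
            f"{native_version}; a stale build may be on the import path"
        )


check_native_version("1.0.7", "1.0.7")  # matching versions: no error
```

Failing at import time turns a confusing crash or mistranslation later on into an explicit error that names both versions.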
When I was creating an Arch Linux package of this repository, I encountered a build error because Arch Linux's makepkg script exports multiple flags in CFLAGS. The error message is as follows:
/usr/bin/cc "-march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security -fstack-clash-protection -fcf-protection" -DCFFI_CDEF=1 -E -P /home/fanta/packages/python-pypcode/src/pypcode-1.0.2/pypcode/native/csleigh.h > /home/fanta/packages/python-pypcode/src/pypcode-1.0.2/pypcode/native/build/csleigh.i
cc1: error: bad value (‘x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security -fstack-clash-protection -fcf-protection’) for ‘-march=’ switch
cc1: note: valid arguments to ‘-march=’ switch are: nocona core2 nehalem corei7 westmere sandybridge corei7-avx ivybridge core-avx-i haswell core-avx2 broadwell skylake skylake-avx512 cannonlake icelake-client rocketlake icelake-server cascadelake tigerlake cooperlake sapphirerapids alderlake bonnell atom silvermont slm goldmont goldmont-plus tremont knl knm x86-64 x86-64-v2 x86-64-v3 x86-64-v4 eden-x2 nano nano-1000 nano-2000 nano-3000 nano-x2 eden-x4 nano-x4 k8 k8-sse3 opteron opteron-sse3 athlon64 athlon64-sse3 athlon-fx amdfam10 barcelona bdver1 bdver2 bdver3 bdver4 znver1 znver2 znver3 btver1 btver2 native
make[2]: *** [CMakeFiles/csleigh.i.dir/build.make:73: CMakeFiles/csleigh.i] Error 1
make[2]: Leaving directory '/home/fanta/packages/python-pypcode/src/pypcode-1.0.2/pypcode/native/build'
make[1]: *** [CMakeFiles/Makefile2:142: CMakeFiles/csleigh.i.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
In pypcode/native/CMakeLists.txt, the relevant CMake code is as follows:
add_custom_target(
csleigh.i ALL
COMMAND ${CMAKE_C_COMPILER} ${CMAKE_C_FLAGS} ${PREPROCESSOR_ONLY_FLAGS} ${CMAKE_SOURCE_DIR}/csleigh.h > ${CMAKE_BINARY_DIR}/csleigh.i
BYPRODUCTS ${CMAKE_BINARY_DIR}/csleigh.i
VERBATIM
)
If CFLAGS contains multiple flags (e.g. CFLAGS="-march=x86-64 -O2"), VERBATIM makes CMake pass the whole string to the compiler as a single quoted argument, which breaks the build as shown above.
One way to solve this problem is to use SEPARATE_ARGUMENTS:
# Split the CFLAGS string into a list so that each flag becomes a separate argument
SEPARATE_ARGUMENTS(NEW_CMAKE_C_FLAGS UNIX_COMMAND "${CMAKE_C_FLAGS}")
add_custom_target(
csleigh.i ALL
COMMAND ${CMAKE_C_COMPILER} ${NEW_CMAKE_C_FLAGS} ${PREPROCESSOR_ONLY_FLAGS} ${CMAKE_SOURCE_DIR}/csleigh.h > ${CMAKE_BINARY_DIR}/csleigh.i
BYPRODUCTS ${CMAKE_BINARY_DIR}/csleigh.i
VERBATIM
)
If the maintainer thinks this patch is OK, I will send a PR :)
Based on lines 33 to 34 in 462f1db, I'm assuming this project is Apache 2.0-licensed? Please make it explicit.
When translating the instruction blr x8 with pypcode, it appears that the x30 link register is set to 0x4, whereas it should be set to pc+0x4:
IMARK ram[0:4]
pc = x8
x30 = 0x0 + 0x4
call [pc]
I would have expected:
IMARK ram[0:4]
x30 = pc + 0x4
pc = x8
call [pc]
See the script below to reproduce the problem. For convenience, I used keystone to build the test case, but the same problem happens with bytes from a real binary.
import keystone
import pypcode
ctx = pypcode.Context("AARCH64:LE:64:AppleSilicon")
asm = "blr x8"
ks = keystone.Ks(keystone.KS_ARCH_ARM64, keystone.KS_MODE_LITTLE_ENDIAN)
instr_bytes, count = ks.asm(asm, as_bytes=True)
ins = ctx.disassemble(instr_bytes).instructions[0]
print(f"{ins.mnem} {ins.body}:")
ops = ctx.translate(instr_bytes).ops
for op in ops:
    print(pypcode.PcodePrettyPrinter.fmt_op(op))
Here is the program's output:
blr x8:
IMARK ram[0:4]
pc = x8
x30 = 0x0 + 0x4
call [pc]
I am using pypcode standalone. I checked out the latest commit of the master branch (ed59b51) on macOS and installed it in a virtualenv using pip install .
Everything is working fine except this unexpected translation.
Language: MIPS:LE:32:default
opcode = b"\x40\x00\x40\x10"
result = context.translate(opcode, 0)
This yields the following error: "terminate called after throwing an instance of 'std::out_of_range'
what(): Attempting to lift outside buffer range"
In Ghidra, this is what you get:
00405304 04 00 40 10 beq v0,zero,LAB_00405318
$U240:1 = INT_EQUAL v0, 0:4
at = INT_OR at, 0:4
CBRANCH *[ram]0x405318:4, $U240
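The Ghidra listing follows from the MIPS branch encoding: the 16-bit immediate is a signed word offset relative to the delay-slot address (branch address + 4), and the instruction in that delay slot belongs to the branch's semantics. A small standard-library sketch that decodes the bytes from the Ghidra listing (04 00 40 10). It also shows why a 4-byte buffer cannot be lifted on its own: the delay slot lives at address + 4, outside the buffer.

```python
import struct


def decode_mips_beq(code: bytes, addr: int):
    """Decode a little-endian MIPS32 BEQ; return (rs, rt, target, delay_slot_addr)."""
    (word,) = struct.unpack("<I", code[:4])
    opcode = word >> 26
    if opcode != 0x04:  # BEQ has primary opcode 4
        raise ValueError(f"not a BEQ: opcode {opcode:#x}")
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    imm = word & 0xFFFF
    if imm & 0x8000:  # sign-extend the 16-bit offset
        imm -= 0x10000
    delay_slot = addr + 4
    target = delay_slot + (imm << 2)  # the offset counts 4-byte words
    return rs, rt, target, delay_slot


rs, rt, target, delay_slot = decode_mips_beq(bytes.fromhex("04004010"), 0x405304)
print(rs, rt, hex(target), hex(delay_slot))  # 2 0 0x405318 0x405308
```

Register 2 is v0 and register 0 is zero, which matches "beq v0,zero,LAB_00405318".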
With the new release on PyPI, importing the package raises an error:
(tmp-900339b97de62dc) ➜ /tmp python -c 'import pypcode'
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "./tmp-900339b97de62dc/lib/python3.10/site-packages/pypcode/__init__.py", line 13, in <module>
from ._csleigh import ffi
ModuleNotFoundError: No module named 'pypcode._csleigh'
To reproduce:
$ mktmpenv -p $(which python3.10)
$ pip install 'pypcode'
$ python -c 'import pypcode'
However, the same steps work with pypcode 1.0.7:
$ pip install 'pypcode==1.0.7'
$ python -c 'import pypcode' && echo "ok"
Additional details about address spaces are encoded but not yet exposed. Expose them.
Currently users must enumerate all languages. Add a convenience function.
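A sketch of what such a convenience function could look like. It assumes architectures expose a languages list whose entries carry an id string (recent pypcode releases appear to model this with Arch and ArchLanguage objects, but treat that as an assumption). The stand-in dataclasses and the find_language name are hypothetical:

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass
class Language:  # stand-in for a language description object
    id: str


@dataclass
class Arch:  # stand-in for an architecture holding several languages
    name: str
    languages: List[Language] = field(default_factory=list)


def find_language(archs: Iterable[Arch], lang_id: str) -> Optional[Language]:
    """Return the first language whose id matches lang_id, or None."""
    for arch in archs:
        for lang in arch.languages:
            if lang.id == lang_id:
                return lang
    return None


archs = [
    Arch("x86", [Language("x86:LE:32:default"), Language("x86:LE:64:default")]),
    Arch("MIPS", [Language("MIPS:LE:32:default")]),
]
print(find_language(archs, "MIPS:LE:32:default"))
```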
In my environment (CPython 3.8.10, Ubuntu 20.04), the FfiPreBuildExtension doesn't seem to actually run.
Support FreeBSD builds.
@rhelmot is already working on this.
https://github.com/NationalSecurityAgency/ghidra/tree/Ghidra_10.2.3_build
Hi,
I am trying to use pypcode to generate P-Code from a binary, and I always receive a BadDataError as follows:
(pypcode) muqi@muqi-desktop:~/pcode_test/code_A_calls_B/angr_script$ python -m pypcode x86:LE:64:default -r /bin/true
--------------------------------------------------------------------------------
00000000/2: JG 0x47
--------------------------------------------------------------------------------
0: unique[0x19e0:1] = BOOL_NEGATE register[0x206:1]
1: unique[0x19f0:1] = INT_EQUAL register[0x20b:1], register[0x207:1]
2: unique[0x1a10:1] = BOOL_AND unique[0x19e0:1], unique[0x19f0:1]
3: CBRANCH ram[0x47:8], unique[0x1a10:1]
** An error occured during translation: BadDataError('r0x00000002: Unable to resolve constructor',)
I tried pypcode versions from 1.0.0 to the current one in my virtual environment, and all of them report the same error.
By the way, pypcode version 0.0.2 works well for me.
Is that because I missed some settings relating to cffi?
Thanks!
My Python version is 3.6.9, my OS is Ubuntu 18.04, and here is my pip list:
(pypcode) muqi@muqi-desktop:~/pcode_test/code_A_calls_B/angr_script$ pip list
Package Version
------------- -------
cffi 1.14.6
pip 21.2
pkg_resources 0.0.0
pycparser 2.20
pypcode 1.0.1
setuptools 57.4.0
wheel 0.36.2
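One thing worth checking, assuming -r lifts the file's raw bytes starting at offset 0: /bin/true is an ELF file, and its first bytes are the ELF magic 7f 45 4c 46 ("\x7fELF"), which is header data rather than code. Byte 0x7f is the x86 JG rel8 opcode, so 7f 45 decodes as a jump to 0x0 + 2 + 0x45 = 0x47, matching the first line of the output above, and decoding then continues into the rest of the header. A small standard-library sketch of that check:

```python
ELF_MAGIC = b"\x7fELF"


def looks_like_elf(data: bytes) -> bool:
    """True if the buffer starts with the ELF magic number."""
    return data[:4] == ELF_MAGIC


def jg_rel8_target(data: bytes, addr: int = 0) -> int:
    """Target of a 2-byte x86 JG rel8 (opcode 0x7f) located at addr."""
    if data[0] != 0x7F:
        raise ValueError("not a JG rel8")
    rel = data[1] - 0x100 if data[1] & 0x80 else data[1]  # signed 8-bit displacement
    return addr + 2 + rel  # relative to the next instruction


header = b"\x7fELF\x02\x01\x01\x00"  # start of a 64-bit little-endian ELF header
print(looks_like_elf(header), hex(jg_rel8_target(header)))  # True 0x47
```

If that is what is happening, lifting only the bytes of a code section (rather than the whole file) would be the thing to try.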
In pyvex, the user can specify skip_stmts to skip unused statements and look only at control-flow information. This would be nice to have as an option, so we don't have to filter on the Python side.
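For reference, the Python-side filtering that this option would replace looks roughly like the sketch below. The (opcode, text) tuples are stand-ins, not pypcode objects. The opcode names are P-Code's branch, call, and return operations:

```python
from typing import Iterable, List, Tuple

# P-Code opcodes that transfer control
CONTROL_FLOW_OPS = {"BRANCH", "CBRANCH", "BRANCHIND", "CALL", "CALLIND", "RETURN"}


def control_flow_only(ops: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep only the (opcode, text) pairs whose opcode changes control flow."""
    return [op for op in ops if op[0] in CONTROL_FLOW_OPS]


ops = [
    ("INT_EQUAL", "unique[0x100:1] = INT_EQUAL register[0x8:4], const[0x0:4]"),
    ("COPY", "register[0x4:4] = COPY const[0x0:4]"),
    ("CBRANCH", "CBRANCH ram[0x405318:4], unique[0x100:1]"),
]
print([name for name, _ in control_flow_only(ops)])  # ['CBRANCH']
```

Doing this natively would avoid creating Python objects for ops that are immediately thrown away.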
Allow people to compile .slaspec files without needing to build sleigh.
Raise an exception from Context::translate when result.error is set.
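A minimal sketch of the idea on the Python side, assuming the native result carries an error attribute that is None on success. TranslationResult, TranslationError and checked are hypothetical stand-ins, not pypcode names:

```python
from dataclasses import dataclass, field
from typing import List, Optional


class TranslationError(Exception):
    """Raised when the native translator reports an error."""


@dataclass
class TranslationResult:  # stand-in for the native result object
    ops: List[str] = field(default_factory=list)
    error: Optional[str] = None


def checked(result: TranslationResult) -> TranslationResult:
    """Return result unchanged, or raise if it carries an error."""
    if result.error is not None:
        raise TranslationError(result.error)
    return result


print(checked(TranslationResult(ops=["COPY"])).ops)  # ['COPY']
```

Raising instead of returning an error field means callers can no longer silently ignore a failed translation.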
Enable tests when GitHub ships macOS arm64 runners
IIRC, this was impacted by the 10.2.2 update at pypcode/pypcode/native/csleigh.cc, line 255 in ec67cfc.
I am currently using v1.1.2 in a project I am working on and am looking to improve overall execution speed by any means necessary. Due to initial design decisions, updating my code to the newer versions would seem to require a good bit of restructuring. I was wondering whether anyone has information or insight on how the v2.0+ versions and v1.1.2 compare in terms of performance, more specifically instruction translation speed and memory usage. Is there much of a difference?
Thanks in advance for any assistance!
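One way to answer this for a specific workload is to measure both versions directly. Below is a standard-library sketch that times a lifting callable and records its peak Python-heap allocation. Here lift is a placeholder for whatever wraps each version's translate call. Note that tracemalloc only sees allocations made through Python's allocators, not memory allocated inside the native SLEIGH library:

```python
import time
import tracemalloc
from typing import Callable, Sequence, Tuple


def benchmark(lift: Callable[[bytes], object], inputs: Sequence[bytes],
              repeat: int = 3) -> Tuple[float, int]:
    """Return (best seconds for one full pass over inputs, peak traced bytes)."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        for buf in inputs:
            lift(buf)
        best = min(best, time.perf_counter() - start)
    # Separate pass for memory, so tracing overhead does not skew the timings
    tracemalloc.start()
    for buf in inputs:
        lift(buf)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return best, peak


# Placeholder workload; swap in a function that calls the real translate().
seconds, peak = benchmark(lambda b: list(b), [b"\x90" * 64] * 1000)
print(f"{seconds:.4f} s, peak {peak} bytes")
```

Running the same inputs through a v1.1.2 environment and a v2.x environment gives a like-for-like comparison for your own binaries.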
Sync to Ghidra 11.0.3
According to the docs, I tried to translate a SPARC binary into P-Code by running the following code:
ctx = Context("x86:LE:64:default")
tx = ctx.translate(bin_data)
However, I encountered the following error:
LowlevelError: Could not obtain cached delay slot instruction
I have tried multiple SPARC files, and I get the same error every time.
I want to know whether pypcode supports translating SPARC binaries.
By the way, the files' arch is:
ELF 64-bit MSB relocatable, SPARC V9, relaxed memory ordering, version 1 (SYSV), not stripped.
Make things easy, support them all out of the box.
https://github.com/NationalSecurityAgency/ghidra/tree/master/Ghidra/Processors
If max_instructions=1 and pyvex lifts an instruction that causes delay-slot execution, pyvex returns NoDecode, whereas pypcode returns 2 instructions. It is probably best to be consistent with pyvex here and always decode at most max_instructions, including delay-slot instructions.
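A sketch of the proposed rule, independent of pypcode's internals. A branch and its delay-slot instruction(s) count as one unit, and decoding stops before any unit that would exceed the limit or run past the end of the buffer. With max_instructions=1 and a branch that has one delay slot, nothing is returned, which matches pyvex's NoDecode. The Insn type and the take_insns function are hypothetical stand-ins:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Insn:
    addr: int
    delay_slots: int = 0  # delay-slot instructions that must be lifted with this one


def take_insns(insns: List[Insn], max_instructions: int) -> List[Insn]:
    """Return at most max_instructions insns, never splitting a branch from its delay slot."""
    out: List[Insn] = []
    i = 0
    while i < len(insns):
        group = 1 + insns[i].delay_slots
        if i + group > len(insns):  # delay slot lies outside the buffer
            break
        if len(out) + group > max_instructions:
            break
        out.extend(insns[i:i + group])
        i += group
    return out


branch_then_slot = [Insn(0x0, delay_slots=1), Insn(0x4)]
print(len(take_insns(branch_then_slot, 1)), len(take_insns(branch_then_slot, 2)))  # 0 2
```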
Is there a way to query the offset of a common register (e.g., program counter, frame/stack pointer, flags/status) without using the architecture-specific name (RIP, etc.)?
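One possible approach, assuming access to the Ghidra processor files for the language: the .pspec file names the program counter in a programcounter element, and the compiler spec (.cspec) names the stack pointer in a stackpointer element. The standard-library sketch below reads both. It covers only PC and SP, and the XML strings are trimmed excerpts for illustration. The resulting names would still need to be mapped to offsets through whatever register lookup the Context exposes.

```python
import xml.etree.ElementTree as ET


def register_roles(pspec_xml: str, cspec_xml: str) -> dict:
    """Extract the program-counter and stack-pointer register names from Ghidra spec XML."""
    roles = {}
    pc = ET.fromstring(pspec_xml).find(".//programcounter")
    if pc is not None:
        roles["pc"] = pc.get("register")
    sp = ET.fromstring(cspec_xml).find(".//stackpointer")
    if sp is not None:
        roles["sp"] = sp.get("register")
    return roles


pspec = '<processor_spec><programcounter register="RIP"/></processor_spec>'
cspec = '<compiler_spec><stackpointer register="RSP" space="ram"/></compiler_spec>'
print(register_roles(pspec, cspec))  # {'pc': 'RIP', 'sp': 'RSP'}
```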
~ % ./test_pcode.py -r x86:LE:64:default ~/test.bin
--------------------------------------------------------------------------------
00000000/9: 64 48 8b 14 25 c0 ff ff ff MOV RDX,qword ptr FS:[0xffffffc0]
--------------------------------------------------------------------------------
0: unique[0x4f00:8] = INT_ADD register[0x110:8], const[0xffffffc0:8]
1: unique[0xc000:8] = LOAD const[0x55e5781c94a0:8], unique[0x4f00:8]
2: register[0x10:8] = COPY unique[0xc000:8]
test.bin contains just the 9 bytes of that single MOV instruction. Notice that in the first P-Code op, const[0xffffffc0:8] is 8 bytes long. Shouldn't it be only 4 bytes, or be extended first?
I'm not sure whether this is a SLEIGH bug.
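For comparison, the two ways of widening the 32-bit displacement 0xffffffc0 to 64 bits give different constants. In 64-bit mode, x86 sign-extends 32-bit ModR/M and SIB displacements to 64 bits, so the expected constant would be 0xffffffffffffffc0 (that is, -0x40) rather than the zero-extended 0xffffffc0 shown in op 0:

```python
def zero_extend(value: int, bits: int) -> int:
    """Keep only the low `bits` bits of value."""
    return value & ((1 << bits) - 1)


def sign_extend(value: int, bits: int, width: int = 64) -> int:
    """Sign-extend a bits-wide value to width bits (result as an unsigned int)."""
    value = zero_extend(value, bits)
    if value & (1 << (bits - 1)):  # top bit set: value is negative
        value -= 1 << bits
    return zero_extend(value, width)


disp32 = 0xFFFFFFC0
print(hex(zero_extend(disp32, 32)))  # 0xffffffc0
print(hex(sign_extend(disp32, 32)))  # 0xffffffffffffffc0
```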
Language: PowerPC:LE:64:A2ALT
opcode0 = b"\x0c\x00\xfe\x41"
opcode1 = b"\x25\xde\xff\x4b"
result = context.translate(opcode0, 0)
result = context.translate(opcode1, 0)
The following error is thrown after executing these translations in sequence: "terminate called after throwing an instance of 'LowlevelError'"
This is what you get in Ghidra:
100021d8 0c 00 fe 41 beq cr7,LAB_100021e4
$U1470:1 = COPY 0:1
$U100:4 = INT_SUB 3:4, 2:4
$U120:1 = INT_RIGHT cr7, $U100
$U1470:1 = INT_AND $U120, 1:1
CBRANCH *[ram]0x100021e4:4, $U1470
100021dc 25 de ff 4b bl Elf64_Ehdr_10000000
r2Save = COPY r2
LR = COPY 0x100021e0:8
CALL *[ram]0x10000000:4
When pretty-printing a STORE op, I get the Python representation of Varnode objects instead of formatted operands. For example:
*[ram]unique[180:4] = <pypcode.pypcode_native.Varnode object at 0x7f0d471eeb50>
import pypcode
context = pypcode.Context("MIPS:BE:32:default")
print(pypcode.PcodePrettyPrinter.fmt_op(context.translate(b'\xaf\xbc\x00\x10').ops[4]))
pypcode version: 1.1.3.dev0
python -m angr.misc.bug_report:
/home/doug/projects/XXX/venv/lib/python3.11/site-packages/angr/misc/bug_report.py:1: DeprecationWarning: the imp module is deprecated in favour of importlib and slated for removal in Python 3.12; see the module's documentation for alternative uses
import imp
angr environment report
=============================
Date: 2023-12-11 16:08:17.962668
Running in virtual environment at /home/doug/projects/XXX/venv
Platform: linux-x86_64
Python version: 3.11.6 (main, Oct 23 2023, 22:48:54) [GCC 11.4.0]
######## angr #########
Python found it in /home/doug/projects/XXX/venv/lib/python3.11/site-packages/angr
Pip version angr 9.2.74
Git info:
Current commit da5e2b9755125aae555fb82dd93e3e16b6c04526 from branch build-full-dataset
Could not resolve tracking branch or remote info!
######## ailment #########
Python found it in /home/doug/projects/XXX/venv/lib/python3.11/site-packages/ailment
Pip version ailment 9.2.74
Git info:
Current commit da5e2b9755125aae555fb82dd93e3e16b6c04526 from branch build-full-dataset
Could not resolve tracking branch or remote info!
######## cle #########
Python found it in /home/doug/projects/XXX/venv/lib/python3.11/site-packages/cle
Pip version cle 9.2.74
Git info:
Current commit da5e2b9755125aae555fb82dd93e3e16b6c04526 from branch build-full-dataset
Could not resolve tracking branch or remote info!
######## pyvex #########
Python found it in /home/doug/projects/XXX/venv/lib/python3.11/site-packages/pyvex
Pip version pyvex 9.2.74
Git info:
Current commit da5e2b9755125aae555fb82dd93e3e16b6c04526 from branch build-full-dataset
Could not resolve tracking branch or remote info!
######## claripy #########
Python found it in /home/doug/projects/XXX/venv/lib/python3.11/site-packages/claripy
Pip version claripy 9.2.74
Git info:
Current commit da5e2b9755125aae555fb82dd93e3e16b6c04526 from branch build-full-dataset
Could not resolve tracking branch or remote info!
######## archinfo #########
Python found it in /home/doug/projects/XXX/venv/lib/python3.11/site-packages/archinfo
Pip version archinfo 9.2.74
Git info:
Current commit da5e2b9755125aae555fb82dd93e3e16b6c04526 from branch build-full-dataset
Could not resolve tracking branch or remote info!
######## z3 #########
Python found it in /home/doug/projects/XXX/venv/lib/python3.11/site-packages/z3
Pip version z3-solver 4.10.2.0
Git info:
Current commit da5e2b9755125aae555fb82dd93e3e16b6c04526 from branch build-full-dataset
Could not resolve tracking branch or remote info!
######## unicorn #########
Python found it in /home/doug/projects/XXX/venv/lib/python3.11/site-packages/unicorn
Pip version unicorn 2.0.1.post1
Git info:
Current commit da5e2b9755125aae555fb82dd93e3e16b6c04526 from branch build-full-dataset
Could not resolve tracking branch or remote info!
######### Native Module Info ##########
angr: <CDLL '/home/doug/projects/XXX/venv/lib/python3.11/site-packages/angr/state_plugins/../lib/angr_native.so', handle 2061510 at 0x7f17fa79b710>
unicorn: <CDLL '/home/doug/projects/XXX/venv/lib/python3.11/site-packages/unicorn/lib/libunicorn.so.2', handle 1a615d0 at 0x7f17fdd87790>
pyvex: <cffi.api._make_ffi_library.<locals>.FFILibrary object at 0x7f17fe96b910>
z3: <CDLL '/home/doug/projects/XXX/venv/lib/python3.11/site-packages/z3/lib/libz3.so', handle 13e59f0 at 0x7f180066dc90>
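For the STORE formatting issue above, the expected behavior is that the stored value goes through the same varnode formatter as the address, instead of falling back to the object's default repr. A sketch with a stand-in Varnode type (not pypcode's class; the field names and fmt_* helpers are assumptions):

```python
from dataclasses import dataclass


@dataclass
class Varnode:  # stand-in: address-space name, offset, size in bytes
    space: str
    offset: int
    size: int


def fmt_vn(vn: Varnode) -> str:
    """Format a varnode as space[offset:size]."""
    return f"{vn.space}[{vn.offset:#x}:{vn.size}]"


def fmt_store(space: str, addr: Varnode, value: Varnode) -> str:
    """Format STORE as *[space]addr = value, formatting both varnodes."""
    return f"*[{space}]{fmt_vn(addr)} = {fmt_vn(value)}"


print(fmt_store("ram", Varnode("unique", 0x180, 4), Varnode("register", 0x10, 4)))
# *[ram]unique[0x180:4] = register[0x10:4]
```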
Line 34 in 76205b4:
The license tag says GPL. Could you be more specific, e.g., whether it's GPLv1, GPLv1+, etc.?
Thanks
You can do it today, but you need to use the lower-level csleigh API. Add a nicer interface for it.
I have tried the angr examples and am getting an error related to architecture mapping, so I would like to know the correct way to specify the target architecture when using angr with the pcode engine.
For example, with the 0ctf_trace example (https://github.com/angr/angr-examples/tree/master/examples/0ctf_trace), when I run solve.py I have no error. But when I modify it to use the pcode engine instead of VEX by adding "engine=angr.engines.UberEnginePcode" to the project constructor, I see this error:
ERROR | 2023-08-21 16:46:49,214 | angr.engines.pcode.lifter | Unknown mapping of MIPS32 to pcode languge id
The problem seems to be that the Project constructor has load_options that specify arch as 'mipsel'. I can't delete this argument because an architecture is required for the blob backend. But 'mipsel' is apparently not correct for pypcode.
Similarly, I can run the android_arm_license_validation example (https://github.com/angr/angr-examples/tree/master/examples/android_arm_license_validation) using VEX, but when I modify solve.py to use the pcode engine I see this error:
ERROR | 2023-08-21 16:52:25,375 | angr.engines.pcode.lifter | Unknown mapping of ARMEL to pcode languge id
In this case there is no arch parameter, and angr has determined the architecture to be ARM:
>>> b = angr.Project("./validate", load_options = load_options, auto_load_libs=False)
>>> state = b.factory.blank_state(addr=0x401760)
>>> state.arch
<Arch ARMEL (LE)>
Should I be using the arch parameter to set a pcode language ID?
Any assistance much appreciated.
Version info:
Host: Ubuntu20 (Linux 5.4.0-156-generic #173-Ubuntu SMP Tue Jul 11 07:25:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux)
python: 3.8.10
angr, cle, claripy, pyvex, 9.2.64
pypcode: 1.1.2
capstone: 5.0.0.post1
cffi: 1.15.1
pycparser: 2.21
Would it be possible to skip over data that is mixed with code (ARM) instead of returning?
I'm looking at incrementing the offset by the default instruction alignment here, instead of the break:
pypcode/pypcode/native/csleigh.cc
Lines 336 to 340 in b0b91c3
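A sketch of that change in Python terms (the real loop is C++ in csleigh.cc). Here decode_one is a hypothetical stand-in that returns an instruction length, or None when the bytes do not decode. On failure, the loop records the offset and advances by the alignment instead of stopping:

```python
from typing import Callable, List, Optional, Tuple


def sweep(code: bytes, decode_one: Callable[[bytes, int], Optional[int]],
          alignment: int = 4) -> Tuple[List[int], List[int]]:
    """Linear sweep that skips undecodable bytes instead of stopping.

    Returns (offsets of decoded instructions, offsets that failed to decode).
    """
    decoded: List[int] = []
    skipped: List[int] = []
    offset = 0
    while offset < len(code):
        length = decode_one(code, offset)
        if not length:  # None (or 0) means the bytes did not decode
            skipped.append(offset)
            offset += alignment  # previously: break
        else:
            decoded.append(offset)
            offset += length
    return decoded, skipped


def toy_decode(code: bytes, offset: int) -> Optional[int]:
    """Toy decoder: a 4-byte word is 'valid' unless it is all 0xff."""
    word = code[offset:offset + 4]
    return None if word == b"\xff" * 4 else 4


print(sweep(b"\x00" * 4 + b"\xff" * 4 + b"\x00" * 4, toy_decode))  # ([0, 8], [4])
```

For ARM, the alignment would be 4 in ARM mode and 2 in Thumb mode.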
Hi, this project is awesome, and thanks for the work! I have successfully compiled this project on Apple Silicon by adding this definition in Ghidra_9.2.3_build.
#if defined (__APPLE_CC__) && defined (__aarch64__)
#define HOST_ENDIAN 0
typedef unsigned int uintm;
typedef int intm;
typedef unsigned long uint8;
typedef long int8;
typedef unsigned int uint4;
typedef int int4;
typedef unsigned short uint2;
typedef short int2;
typedef unsigned char uint1;
typedef char int1;
typedef uint8 uintp;
#endif
So, could you please help sync with the upstream code so that cursed Apple Silicon users can benefit?
When loading an SLA file fails, everything comes crashing down.
pypcode/pypcode/native/csleigh.cc
Line 255 in b5be395