angr / pypcode



Python bindings to Ghidra's SLEIGH library for disassembly and lifting to P-Code IR

Home Page: https://api.angr.io/projects/pypcode/en/latest/

License: Other

Makefile 0.77% C++ 80.69% Yacc 4.17% Lex 1.23% Python 9.15% Assembly 3.85% Shell 0.03% CMake 0.10%

ghidra sleigh pcode ir python

pypcode's Issues

Description

We can reduce the number of references to the context. This would also help when creating p-code manually.

Alternatives

No response

Additional context

No response

Description

In some cases, an older csleigh version may be loaded accidentally (e.g., when working with multiple Python versions). Add version enforcement.
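
One way to enforce this, sketched under assumptions: the Python package records the native library version it was built against and compares it with the version the loaded module reports. The names `EXPECTED_NATIVE_VERSION` and `check_native_version` are hypothetical and not part of pypcode.

```python
# Hypothetical sketch: neither name below is part of pypcode's API.
EXPECTED_NATIVE_VERSION = (1, 1, 0)  # assumed version the Python package was built against


def check_native_version(loaded, expected=EXPECTED_NATIVE_VERSION):
    """Raise ImportError if the loaded native module's version differs."""
    if tuple(loaded) != tuple(expected):
        raise ImportError(
            "native library version %s does not match expected %s; "
            "a stale build may have been loaded"
            % (".".join(map(str, loaded)), ".".join(map(str, expected)))
        )
    return True
```

Such a check would run once at import time, so a stale build fails early with a readable message.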

Alternatives

No response

Additional context

No response

Build error if exporting multiple CFLAGS

When creating an Arch Linux package of this repository, I encountered a build error caused by Arch Linux's makepkg script exporting CFLAGS with multiple flags. The error message is as follows:

/usr/bin/cc "-march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection" -DCFFI_CDEF=1 -E -P /home/fanta/packages/python-pypcode/src/pypcode-1.0.2/pypcode/native/csleigh.h > /home/fanta/packages/python-pypcode/src/pypcode-1.0.2/pypcode/native/build/csleigh.i
cc1: error: bad value (‘x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection’) for ‘-march=’ switch
cc1: note: valid arguments to ‘-march=’ switch are: nocona core2 nehalem corei7 westmere sandybridge corei7-avx ivybridge core-avx-i haswell core-avx2 broadwell skylake skylake-avx512 cannonlake icelake-client rocketlake icelake-server cascadelake tigerlake cooperlake sapphirerapids alderlake bonnell atom silvermont slm goldmont goldmont-plus tremont knl knm x86-64 x86-64-v2 x86-64-v3 x86-64-v4 eden-x2 nano nano-1000 nano-2000 nano-3000 nano-x2 eden-x4 nano-x4 k8 k8-sse3 opteron opteron-sse3 athlon64 athlon64-sse3 athlon-fx amdfam10 barcelona bdver1 bdver2 bdver3 bdver4 znver1 znver2 znver3 btver1 btver2 native
make[2]: *** [CMakeFiles/csleigh.i.dir/build.make:73: CMakeFiles/csleigh.i] Error 1
make[2]: Leaving directory '/home/fanta/packages/python-pypcode/src/pypcode-1.0.2/pypcode/native/build'
make[1]: *** [CMakeFiles/Makefile2:142: CMakeFiles/csleigh.i.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....

In pypcode/native/CMakeLists.txt, the relevant CMake code is as follows:

add_custom_target(
	csleigh.i ALL
	COMMAND ${CMAKE_C_COMPILER} ${CMAKE_C_FLAGS} ${PREPROCESSOR_ONLY_FLAGS} ${CMAKE_SOURCE_DIR}/csleigh.h > ${CMAKE_BINARY_DIR}/csleigh.i
	BYPRODUCTS ${CMAKE_BINARY_DIR}/csleigh.i
	VERBATIM
)

If CFLAGS contains multiple flags, ${CMAKE_C_FLAGS} expands to a single string and, because of VERBATIM, the whole string is passed to the compiler as one quoted argument, which produces the error above.
One way to solve this problem is to split the flags with separate_arguments:

separate_arguments(NEW_CMAKE_C_FLAGS UNIX_COMMAND "${CMAKE_C_FLAGS}")
add_custom_target(
	csleigh.i ALL
	COMMAND ${CMAKE_C_COMPILER} ${NEW_CMAKE_C_FLAGS} ${PREPROCESSOR_ONLY_FLAGS} ${CMAKE_SOURCE_DIR}/csleigh.h > ${CMAKE_BINARY_DIR}/csleigh.i
	BYPRODUCTS ${CMAKE_BINARY_DIR}/csleigh.i
	VERBATIM
)

If the maintainer thinks this patch is OK, I will send a PR :)
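
To illustrate the failure mode outside CMake, here is a small Python sketch using the standard `shlex` module. It is only an analogy for shell-style argument splitting, and the flag string is shortened from the log above.

```python
import shlex

cflags = "-march=x86-64 -mtune=generic -O2 -pipe"

# Passed as one argument, the compiler sees everything after "-march=" as the value.
single_argument = [cflags]

# Split with shell-like rules, each flag becomes its own argument.
separate_arguments = shlex.split(cflags)

print(single_argument)
print(separate_arguments)
```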

Licence not explicit

Based on this:

pypcode/setup.py

Lines 33 to 34 in 462f1db

 url='https://github.com/angr/pypcode', 

 license='Apache2',

I'm assuming this project is Apache2-licensed? Please make it explicit.

In ARM64 BLR translation, x30 should depend on pc

Description

When translating the instruction blr x8 with pypcode, it appears that the x30 link register is set to 0x4, whereas it should be set to pc + 0x4:

IMARK ram[0:4]
pc = x8
x30 = 0x0 + 0x4
call [pc]

I would have expected:

IMARK ram[0:4]
x30 = pc + 0x4
pc = x8
call [pc]
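
One possible reading of the reported output, offered as an assumption rather than a confirmed diagnosis: the lifter may substitute the instruction's own address as a constant at translation time, so translating at address 0 (no base address appears to be passed in the script) would yield `0x0 + 0x4`. The function below is a toy model of that behavior and does not call pypcode.

```python
def model_blr_link_value(instruction_address, instruction_length=4):
    """Toy model: return address written to x30 when the lifter
    folds the instruction address in as a constant (an assumption)."""
    return instruction_address + instruction_length


# Translating at address 0 gives 0x4, as in the report.
print(hex(model_blr_link_value(0x0)))
# Translating the same bytes at 0x1000 would give 0x1004 under this model.
print(hex(model_blr_link_value(0x1000)))
```

If this reading holds, the two forms agree whenever pc equals the address the instruction was lifted at.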

Steps to reproduce the bug

See the attached script to reproduce the problem. For convenience, I used keystone to build the reproduction test case, but the same problem happens with bytes coming from a real binary.

import keystone
import pypcode
ctx = pypcode.Context("AARCH64:LE:64:AppleSilicon")
asm = "blr x8"
ks = keystone.Ks(keystone.KS_ARCH_ARM64, keystone.KS_MODE_LITTLE_ENDIAN)
instr_bytes, count = ks.asm(asm, as_bytes=True)
ins = ctx.disassemble(instr_bytes).instructions[0]
print(f"{ins.mnem} {ins.body}:")
ops = ctx.translate(instr_bytes).ops
for op in ops:
    print(pypcode.PcodePrettyPrinter.fmt_op(op))

Here is the program's output:

blr x8:
IMARK ram[0:4]
pc = x8
x30 = 0x0 + 0x4
call [pc]

blr.py.tgz

Environment

I am using pypcode standalone. I checked out the latest commit of the master branch (ed59b51) on macOS and installed it in a virtualenv using pip install . Everything works fine except this unexpected translation.

Additional context

No response

Throws "terminate called after throwing an instance of 'std::out_of_range'" translating MIPSEL opcode

Language: MIPS:LE:32:default

opcode = b"\x40\x00\x40\x10"
result = context.translate(opcode, 0)

Yields the following error: "terminate called after throwing an instance of 'std::out_of_range'
what(): Attempting to lift outside buffer range"

In ghidra, this is what you get:

    00405304 04 00 40 10     beq        v0,zero,LAB_00405318
                                                  $U240:1 = INT_EQUAL v0, 0:4
                                                  at = INT_OR at, 0:4
                                                  CBRANCH *[ram]0x405318:4, $U240

Unable to install PyPcode 1.1.0

Description

With the new pip release, using the package raises an error:

(tmp-900339b97de62dc) ➜ /tmp python -c 'import pypcode'                         
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "./tmp-900339b97de62dc/lib/python3.10/site-packages/pypcode/__init__.py", line 13, in <module>
    from ._csleigh import ffi
ModuleNotFoundError: No module named 'pypcode._csleigh'

Steps to reproduce the bug

Create a virtualenv for a fresh Python version (tested with 3.9, 3.10)

$ mktmpenv -p $(which python3.10)

pip install 'pypcode'
python -c 'import pypcode'

However, the same steps with PyPcode 1.0.7 work:

$ pip install 'pypcode==1.0.7' 
$ python -c 'import pypcode'  && echo "ok"

Environment

Environment:

Linux (Debian Stable)
Python 3.{9,10}
PyPcode: 1.1.0

Additional context

No response

Expose AddrSpace details

Description

There are more encoded details about address spaces. Expose them.

Alternatives

No response

Additional context

No response

Add language lookup function

Description

Currently users must enumerate all languages. Add a convenience function.
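
A minimal sketch of what such a convenience function could look like. `LANGUAGE_IDS` and `find_language` are hypothetical names; the list stands in for whatever the library enumerates and contains only language IDs mentioned elsewhere on this page.

```python
# Hypothetical sketch: LANGUAGE_IDS stands in for whatever the library enumerates.
LANGUAGE_IDS = [
    "x86:LE:64:default",
    "MIPS:LE:32:default",
    "MIPS:BE:32:default",
    "AARCH64:LE:64:AppleSilicon",
]


def find_language(language_id, known_ids=LANGUAGE_IDS):
    """Return the matching ID, comparing case-insensitively, or raise KeyError."""
    wanted = language_id.lower()
    for candidate in known_ids:
        if candidate.lower() == wanted:
            return candidate
    raise KeyError("unknown language id: %r" % language_id)
```

Raising on an unknown ID, rather than returning None, keeps a typo from silently selecting no language.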

Steps to reproduce the bug

No response

Environment

No response

Additional context

No response

Installer does not build sla files by default?

In my env (CPython 3.8.10, Ubuntu 20.04), the FfiPreBuildExtension doesn't seem to actually be getting run.

FreeBSD support

Description

Support FreeBSD builds.

@rhelmot is already working on this.

Alternatives

No response

Additional context

No response

Update SLEIGH from Ghidra 10.2.3 release

Description

https://github.com/NationalSecurityAgency/ghidra/tree/Ghidra_10.2.3_build

Alternatives

No response

Additional context

No response

Cannot print pcode

Hi:
I am trying to use pypcode to generate p-code from a binary, and I always receive a BadDataError, as follows:

(pypcode) muqi@muqi-desktop:~/pcode_test/code_A_calls_B/angr_script$ python -m pypcode x86:LE:64:default -r /bin/true 
--------------------------------------------------------------------------------
00000000/2: JG 0x47
--------------------------------------------------------------------------------
  0: unique[0x19e0:1] = BOOL_NEGATE register[0x206:1]
  1: unique[0x19f0:1] = INT_EQUAL register[0x20b:1], register[0x207:1]
  2: unique[0x1a10:1] = BOOL_AND unique[0x19e0:1], unique[0x19f0:1]
  3: CBRANCH ram[0x47:8], unique[0x1a10:1]

** An error occured during translation: BadDataError('r0x00000002: Unable to resolve constructor',)

I tried pypcode versions from 1.0.0 up to the current version in my virtual environment, and all report the same error.
By the way, pypcode version 0.0.2 works well for me.
Is this because I missed some setting related to cffi?

Thanks!

My Python version is 3.6.9, my OS is Ubuntu 18.04, and here is my pip list:

(pypcode) muqi@muqi-desktop:~/pcode_test/code_A_calls_B/angr_script$ pip list
Package       Version
------------- -------
cffi          1.14.6
pip           21.2
pkg_resources 0.0.0
pycparser     2.20
pypcode       1.0.1
setuptools    57.4.0
wheel         0.36.2

Add option to return only control ops

Description

In pyvex, the user can specify skip_stmts to pre-filter unused statements and look only at control-flow information. This would be nice to have as an option, so we don't have to filter on the Python side.
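
For reference, the Python-side filtering that this option would replace can be sketched as follows. The ops are modeled as plain `(opcode_name, operands)` tuples, which is an assumption for illustration and not pypcode's object model; the opcode names are the standard P-code control-flow operations.

```python
# P-code opcodes that transfer control.
CONTROL_OPCODES = {"BRANCH", "CBRANCH", "BRANCHIND", "CALL", "CALLIND", "RETURN"}


def control_ops_only(ops):
    """Keep only (opcode_name, operands) pairs whose opcode transfers control."""
    return [op for op in ops if op[0] in CONTROL_OPCODES]


# Ops modeled as (opcode_name, operands) tuples; this shape is an assumption.
ops = [
    ("INT_EQUAL", ("v0", "0")),
    ("INT_OR", ("at", "0")),
    ("CBRANCH", ("0x405318", "$U240")),
]
print(control_ops_only(ops))
```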

Alternatives

No response

Additional context

No response

Windows support is broken

Ship sleigh binary

Description

Allow people to compile slaspec files without needing to build sleigh

Alternatives

No response

Additional context

No response

Raise translation exceptions

Description

Raise exception (result.error) from Context::translate
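
A rough sketch of the requested behavior, assuming the result object carries an optional error field. `TranslationError`, `TranslationResult`, and `ops_or_raise` are hypothetical stand-ins, not pypcode classes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


class TranslationError(Exception):
    """Hypothetical exception type for failed translations."""


@dataclass
class TranslationResult:
    """Hypothetical stand-in for a result object carrying an optional error."""
    ops: List[str] = field(default_factory=list)
    error: Optional[str] = None


def ops_or_raise(result):
    """Return the ops, or raise if the result recorded an error."""
    if result.error is not None:
        raise TranslationError(result.error)
    return result.ops
```

With this shape, callers no longer need to inspect the error field after every call.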

Steps to reproduce the bug

No response

Environment

No response

Additional context

No response

Test macOS arm64 builds

Description

Enable tests when GitHub ships macOS arm64 runners

Alternatives

No response

Additional context

No response

Fix multithreading

Description

IIRC this was impacted by the 10.2.2 update, at:

pypcode/pypcode/native/csleigh.cc

Line 255 in ec67cfc

AttributeId::initialize();

Steps to reproduce the bug

No response

Environment

No response

Additional context

No response

Does anyone have any insight regarding newer version performance changes?

Question

I am currently using v1.1.2 within a project I am working on and am looking to improve the overall execution speed by any means necessary. It seems that due to initial design decisions, updating my code to the newer versions would require a good bit of restructuring. I was wondering if anyone had information, or insight regarding how the v2.0+ versions and the v1.1.2 version compare in terms of performance? More specifically, instruction translation speeds and memory usage? Is there much of a difference?

Thanks in advance for any assistance!

Sync to Ghidra 11.0.3

Description

Sync to Ghidra 11.0.3

Alternatives

No response

Additional context

No response

Cannot translate c-program under sparc to p-code?

Question

According to the docs, I tried to translate a binary file of the SPARC architecture into p-code, and I ran the following code:

ctx = Context("x86:LE:64:default")
tx = ctx.translate(bin_data)

However, I encountered the following error:

LowlevelError: Could not obtain cached delay slot instruction

I have tried multiple SPARC files, and I get the same error every time.

Does pypcode not support translating SPARC binaries?

By the way, the files' arch is:

ELF 64-bit MSB relocatable, SPARC V9, relaxed memory ordering, version 1 (SYSV), not stripped.

Add CI lint step

Add doc generation

Add all architecture definitions

Make things easy, support them all out of the box.

https://github.com/NationalSecurityAgency/ghidra/tree/master/Ghidra/Processors

Align with pyvex on `max_instructions` behavior

Description

If max_instructions=1 when pyvex lifts an instruction that will cause delay slot execution, it will return NoDecode. pypcode will actually return 2 instructions. Probably best to be consistent with pyvex here and always decode at most max_instructions, including delay slot execution.
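
The proposed rule can be illustrated with a toy decoder. This is a sketch of the counting policy only: the instruction stream is modeled as `(mnemonic, has_delay_slot)` pairs, and stopping with a shorter result stands in for whatever error the real lifter would report.

```python
def decode(stream, max_instructions):
    """Toy decoder: `stream` is a list of (mnemonic, has_delay_slot) pairs.

    A branch is decoded together with its delay-slot instruction, and both
    count against max_instructions. If the pair does not fit, decoding stops.
    """
    decoded = []
    index = 0
    while index < len(stream) and len(decoded) < max_instructions:
        mnemonic, has_delay_slot = stream[index]
        needed = 2 if has_delay_slot else 1
        if len(decoded) + needed > max_instructions or index + needed > len(stream):
            break  # the branch and its delay slot do not both fit
        decoded.extend(name for name, _ in stream[index:index + needed])
        index += needed
    return decoded


stream = [("addiu", False), ("beq", True), ("nop", False), ("lw", False)]
```

Under this policy `decode(stream, 2)` stops after `addiu`, because the branch and its delay slot would exceed the limit.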

Steps to reproduce the bug

No response

Environment

No response

Additional context

No response

Add more comprehensive tests

Common registers?

Is there a way to query a common register offset (eg, program counter, frame/stack pointer, flags/status) without naming the architecture-specific name (RIP, etc)?
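
To make the request concrete, here is a sketch of the kind of lookup being asked for. The table and function are hypothetical and hand-written for two language IDs; pypcode does not necessarily expose anything like this.

```python
# Hypothetical role table; pypcode does not necessarily expose anything like this.
COMMON_REGISTERS = {
    "x86:LE:64:default": {"pc": "RIP", "sp": "RSP"},
    "AARCH64:LE:64:AppleSilicon": {"pc": "pc", "sp": "sp"},
}


def register_for_role(language_id, role):
    """Map an architecture-neutral role ("pc", "sp") to a register name."""
    return COMMON_REGISTERS[language_id][role]
```

The returned name could then be resolved to an offset with whatever register lookup the library already provides.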

Seemingly incorrect const size in fs base calculation (x86-64)

~ % ./test_pcode.py -r x86:LE:64:default ~/test.bin
--------------------------------------------------------------------------------

00000000/9: 64 48 8b 14 25 c0 ff ff ff MOV RDX,qword ptr FS:[0xffffffc0]
--------------------------------------------------------------------------------
  0: unique[0x4f00:8] = INT_ADD register[0x110:8], const[0xffffffc0:8]
  1: unique[0xc000:8] = LOAD const[0x55e5781c94a0:8], unique[0x4f00:8]
  2: register[0x10:8] = COPY unique[0xc000:8]

test.bin contains just those 9 bytes of that single MOV instruction. Notice in the first pcode op, the const[0xffffffc0:8] is 8 bytes long, but shouldn't it be only 4? Or extended first?

Not sure if this is a sleigh bug?
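
The question turns on how the 32-bit displacement is widened to 8 bytes. As a worked illustration of the arithmetic only (x86-64 sign-extends 32-bit displacements when forming an address), the helper below shows the two candidate 64-bit values. `sign_extend` is a name chosen for this sketch.

```python
def sign_extend(value, from_bits, to_bits):
    """Sign-extend `value` from `from_bits` wide to `to_bits` wide."""
    value &= (1 << from_bits) - 1
    if value & (1 << (from_bits - 1)):
        value -= 1 << from_bits
    return value & ((1 << to_bits) - 1)


disp32 = 0xFFFFFFC0
print(hex(sign_extend(disp32, 32, 64)))  # 0xffffffffffffffc0
print(hex(disp32))                       # zero-extended: 0xffffffc0
```

The sign-extended value corresponds to an offset of -0x40 from the FS base, whereas the zero-extended value is an offset of roughly +4 GiB.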

Throws "terminate called after throwing an instance of 'LowlevelError'" while translating PowerPC

Language: PowerPC:LE:64:A2ALT

opcode0 = b"\x0c\x00\xfe\x41"
opcode1 = b"\x25\xde\xff\x4b"

result = context.translate(opcode0, 0)
result = context.translate(opcode1, 0)

Error thrown after executing these translations in sequence: "terminate called after throwing an instance of 'LowlevelError'"

This is what you get in ghidra:
100021d8 0c 00 fe 41 beq cr7,LAB_100021e4
$U1470:1 = COPY 0:1
$U100:4 = INT_SUB 3:4, 2:4
$U120:1 = INT_RIGHT cr7, $U100
$U1470:1 = INT_AND $U120, 1:1
CBRANCH *[ram]0x100021e4:4, $U1470
100021dc 25 de ff 4b bl Elf64_Ehdr_10000000
r2Save = COPY r2
LR = COPY 0x100021e0:8
CALL *[ram]0x10000000:4

Update to 10.2, when it comes out

Probably soon: https://github.com/NationalSecurityAgency/ghidra/milestone/19

Incorrectly Formatting STORE Input

Description

When pretty printing a store command, I'm getting the python representations for Varnode objects. For example:

*[ram]unique[180:4] = <pypcode.pypcode_native.Varnode object at 0x7f0d471eeb50>

Steps to reproduce the bug

import pypcode

context = pypcode.Context("MIPS:BE:32:default")
print(pypcode.PcodePrettyPrinter.fmt_op(context.translate(b'\xaf\xbc\x00\x10').ops[4]))

Environment

pypcode version: 1.1.3.dev0

python -m angr.misc.bug_report:

/home/doug/projects/XXX/venv/lib/python3.11/site-packages/angr/misc/bug_report.py:1: DeprecationWarning: the imp module is deprecated in favour of importlib and slated for removal in Python 3.12; see the module's documentation for alternative uses
  import imp
angr environment report
=============================
Date: 2023-12-11 16:08:17.962668
Running in virtual environment at /home/doug/projects/XXX/venv
Platform: linux-x86_64
Python version: 3.11.6 (main, Oct 23 2023, 22:48:54) [GCC 11.4.0]
######## angr #########
Python found it in /home/doug/projects/XXX/venv/lib/python3.11/site-packages/angr
Pip version angr 9.2.74
Git info:
        Current commit da5e2b9755125aae555fb82dd93e3e16b6c04526 from branch build-full-dataset
Could not resolve tracking branch or remote info!
######## ailment #########
Python found it in /home/doug/projects/XXX/venv/lib/python3.11/site-packages/ailment
Pip version ailment 9.2.74
Git info:
        Current commit da5e2b9755125aae555fb82dd93e3e16b6c04526 from branch build-full-dataset
Could not resolve tracking branch or remote info!
######## cle #########
Python found it in /home/doug/projects/XXX/venv/lib/python3.11/site-packages/cle
Pip version cle 9.2.74
Git info:
        Current commit da5e2b9755125aae555fb82dd93e3e16b6c04526 from branch build-full-dataset
Could not resolve tracking branch or remote info!
######## pyvex #########
Python found it in /home/doug/projects/XXX/venv/lib/python3.11/site-packages/pyvex
Pip version pyvex 9.2.74
Git info:
        Current commit da5e2b9755125aae555fb82dd93e3e16b6c04526 from branch build-full-dataset
Could not resolve tracking branch or remote info!
######## claripy #########
Python found it in /home/doug/projects/XXX/venv/lib/python3.11/site-packages/claripy
Pip version claripy 9.2.74
Git info:
        Current commit da5e2b9755125aae555fb82dd93e3e16b6c04526 from branch build-full-dataset
Could not resolve tracking branch or remote info!
######## archinfo #########
Python found it in /home/doug/projects/XXX/venv/lib/python3.11/site-packages/archinfo
Pip version archinfo 9.2.74
Git info:
        Current commit da5e2b9755125aae555fb82dd93e3e16b6c04526 from branch build-full-dataset
Could not resolve tracking branch or remote info!
######## z3 #########
Python found it in /home/doug/projects/XXX/venv/lib/python3.11/site-packages/z3
Pip version z3-solver 4.10.2.0
Git info:
        Current commit da5e2b9755125aae555fb82dd93e3e16b6c04526 from branch build-full-dataset
Could not resolve tracking branch or remote info!
######## unicorn #########
Python found it in /home/doug/projects/XXX/venv/lib/python3.11/site-packages/unicorn
Pip version unicorn 2.0.1.post1
Git info:
        Current commit da5e2b9755125aae555fb82dd93e3e16b6c04526 from branch build-full-dataset
Could not resolve tracking branch or remote info!
######### Native Module Info ##########
angr: <CDLL '/home/doug/projects/XXX/venv/lib/python3.11/site-packages/angr/state_plugins/../lib/angr_native.so', handle 2061510 at 0x7f17fa79b710>
unicorn: <CDLL '/home/doug/projects/XXX/venv/lib/python3.11/site-packages/unicorn/lib/libunicorn.so.2', handle 1a615d0 at 0x7f17fdd87790>
pyvex: <cffi.api._make_ffi_library.<locals>.FFILibrary object at 0x7f17fe96b910>
z3: <CDLL '/home/doug/projects/XXX/venv/lib/python3.11/site-packages/z3/lib/libz3.so', handle 13e59f0 at 0x7f180066dc90>

Additional context

No response

Specify license

pypcode/setup.py

Line 34 in 76205b4

license='GPL',

The license tag says GPL. Could you be more specific, e.g., whether it's GPLv1, GPLv1+, etc.?

Thanks

Make setting context variables nicer

You can do it today, but you need to use the lower level csleigh API. Add a nicer interface for it.

Startup time is incredibly slow

Lazy bindings generation
Big XML files loaded; switch to binary XML?

How to specify architecture in the pcode engine?

Question

I have tried the angr examples and I'm finding an error related to architecture mapping, so I would like to know the correct way to specify target architecture when using angr with the pcode engine.

For example, with the 0ctf_trace example (https://github.com/angr/angr-examples/tree/master/examples/0ctf_trace), when I run solve.py I have no error. But when I modify it to use the pcode engine instead of VEX by adding "engine=angr.engines.UberEnginePcode" to the project constructor, I see this error:

ERROR    | 2023-08-21 16:46:49,214 | angr.engines.pcode.lifter | Unknown mapping of MIPS32 to pcode languge id

The problem seems to be that the Project constructor has load_options that specify arch as 'mipsel'. I can't delete this argument because an architecture is required for the blob backend. But 'mipsel' is apparently not correct for pypcode.

Similarly, I can run the android_arm_license_validation example (https://github.com/angr/angr-examples/tree/master/examples/android_arm_license_validation) using VEX, but when I modify solve.py to use the pcode engine I see this error:

ERROR    | 2023-08-21 16:52:25,375 | angr.engines.pcode.lifter | Unknown mapping of ARMEL to pcode languge id

In this case there is no arch parameter, and angr has determined the architecture to be ARM:

>>> b = angr.Project("./validate", load_options = load_options, auto_load_libs=False)
>>> state = b.factory.blank_state(addr=0x401760)
>>> state.arch
<Arch ARMEL (LE)>

Should I be using the arch parameter to set a pcode language ID?

Any assistance much appreciated.

Version info:
Host: Ubuntu20 (Linux 5.4.0-156-generic #173-Ubuntu SMP Tue Jul 11 07:25:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux)
python: 3.8.10
angr, cle, claripy, pyvex, 9.2.64
pypcode: 1.1.2
capstone: 5.0.0.post1
cffi: 1.15.1
pycparser: 2.21

Provide wheel for Apple Si

skipping over data?

Would it be possible to skip over data that is mixed with code (ARM) instead of returning?

Looking at incrementing the offset by the default instruction alignment here instead of the break:

pypcode/pypcode/native/csleigh.cc

Lines 336 to 340 in b0b91c3

	} catch (BadDataError &e) {
		res->updateWithException(e, addr);
		break;
	}
}
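
The idea can be sketched in Python as a toy loop. `lift_all` and `toy_decoder` are hypothetical, and the decoder interface (return a length, or raise `ValueError` on undecodable bytes) is an assumption made for this illustration; it is not the csleigh code.

```python
def lift_all(buffer, decode_one, alignment=4):
    """Toy loop: `decode_one(buffer, offset)` returns the instruction length
    or raises ValueError for undecodable bytes (both are assumptions)."""
    offset = 0
    decoded, skipped = [], []
    while offset < len(buffer):
        try:
            length = decode_one(buffer, offset)
        except ValueError:
            skipped.append(offset)
            offset += alignment  # skip instead of stopping
            continue
        decoded.append(offset)
        offset += length
    return decoded, skipped


def toy_decoder(buffer, offset):
    """Treat any word starting with 0xFF as data."""
    if buffer[offset] == 0xFF:
        raise ValueError("bad data at offset %d" % offset)
    return 4
```

Recording the skipped offsets lets the caller tell which ranges were treated as data rather than silently losing them.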

Apple Silicon support

Hi, this project is awesome and thanks for the work! I have successfully compiled this project on Apple Silicon by adding this definition in Ghidra_9.2.3_build.

https://github.com/NationalSecurityAgency/ghidra/blob/4e16b3aa3a649b87a54a6e43a5c01360fd255a83/Ghidra/Features/Decompiler/src/decompile/cpp/types.h#L184

#if defined (__APPLE_CC__) && defined (__aarch64__)
#define HOST_ENDIAN 0
typedef unsigned int uintm;
typedef int intm;
typedef unsigned long uint8;
typedef long int8;
typedef unsigned int uint4;
typedef int int4;
typedef unsigned short uint2;
typedef short int2;
typedef unsigned char uint1;
typedef char int1;
typedef uint8 uintp;
#endif

So, could you please help sync with the upstream code so that cursed Apple Silicon users can benefit?

Handle parsing errors more gracefully

When failing to load SLA files, everything comes crashing down.

pypcode/pypcode/native/csleigh.cc

Line 255 in b5be395

// FIXME: try/catch XmlError

