pypcode's Introduction

pypcode

Machine code disassembly and IR translation library for Python using the excellent SLEIGH library from the Ghidra framework.

This library was created primarily for use with angr, which provides analyses and symbolic execution of p-code.

Documentation covering how to install and use pypcode is available here.

pypcode's People

Contributors

anthraxx, douglasdennis, fantasquex, ltfish, mborgerson, nabar33, pre-commit-ci[bot], twizmwazin, whoismissing, zwimer

pypcode's Issues

Ship sleigh binary

Description

Allow people to compile slaspec files without needing to build sleigh

Alternatives

No response

Additional context

No response

Test macOS arm64 builds

Description

Enable tests when GitHub ships macOS arm64 runners

Alternatives

No response

Additional context

No response

Cannot print pcode

Hi,
I am trying to use pypcode to generate p-code from a binary, and I always receive a BadDataError, as follows:

(pypcode) muqi@muqi-desktop:~/pcode_test/code_A_calls_B/angr_script$ python -m pypcode x86:LE:64:default -r /bin/true 
--------------------------------------------------------------------------------
00000000/2: JG 0x47
--------------------------------------------------------------------------------
  0: unique[0x19e0:1] = BOOL_NEGATE register[0x206:1]
  1: unique[0x19f0:1] = INT_EQUAL register[0x20b:1], register[0x207:1]
  2: unique[0x1a10:1] = BOOL_AND unique[0x19e0:1], unique[0x19f0:1]
  3: CBRANCH ram[0x47:8], unique[0x1a10:1]

** An error occured during translation: BadDataError('r0x00000002: Unable to resolve constructor',)

I tried every pypcode release from 1.0.0 to the current version in my virtual environment, and they all report the same error.
By the way, pypcode 0.0.2 works well for me.
Could this be because I missed some setting related to cffi?

Thanks!

My Python version is 3.6.9, my OS is Ubuntu 18.04, and here is my pip list:

(pypcode) muqi@muqi-desktop:~/pcode_test/code_A_calls_B/angr_script$ pip list
Package       Version
------------- -------
cffi          1.14.6
pip           21.2
pkg_resources 0.0.0
pycparser     2.20
pypcode       1.0.1
setuptools    57.4.0
wheel         0.36.2
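
One possible reading of the output above, offered as a sketch and not a confirmed diagnosis: the `JG 0x47` decoded at offset 0 is consistent with the decoder being fed the ELF header itself instead of the program's code. Every ELF file starts with the magic bytes `7f 45 4c 46`, and on x86 the byte pair `7f 45` encodes `JG rel8` with a displacement of 0x45. The helper name `jcc_rel8_target` is made up for this illustration and is not part of pypcode.

```python
# The first four bytes of every ELF file (the magic number).
ELF_MAGIC = b"\x7fELF"


def jcc_rel8_target(addr: int, insn: bytes) -> int:
    """Return the branch target of a 2-byte x86 Jcc rel8 instruction."""
    rel8 = int.from_bytes(insn[1:2], "little", signed=True)
    # The displacement is relative to the address of the next instruction.
    return addr + 2 + rel8


# 0x7f is the opcode for JG rel8; 0x45 (ASCII 'E') is the displacement.
assert ELF_MAGIC[0] == 0x7F
print(hex(jcc_rel8_target(0, ELF_MAGIC[:2])))  # 0x47
```

If that reading is right, the bytes after the magic are header data, not instructions, so a decode failure at offset 2 would be expected whatever the pypcode version.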

Verify loaded library version

Description

In some cases, an older csleigh version may be loaded accidentally (e.g., when working with multiple Python versions). Add version enforcement.
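
A minimal sketch of what such enforcement could look like, assuming the native library can report a version tuple. The names `EXPECTED_CSLEIGH_VERSION`, `LibraryVersionError`, and `check_native_version` are hypothetical and do not come from pypcode.

```python
# Hypothetical version that the Python bindings were built against.
EXPECTED_CSLEIGH_VERSION = (1, 0)


class LibraryVersionError(ImportError):
    """Raised when the loaded native library does not match the bindings."""


def check_native_version(loaded, expected=EXPECTED_CSLEIGH_VERSION):
    """Fail early if a stale native library was picked up at import time."""
    loaded, expected = tuple(loaded), tuple(expected)
    if loaded != expected:
        raise LibraryVersionError(
            f"csleigh version mismatch: loaded {loaded}, expected {expected}"
        )
    return loaded
```

Raising a subclass of `ImportError` keeps the failure at import time, which is where a mismatched build is easiest to diagnose.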

Alternatives

No response

Additional context

No response

Raise translation exceptions

Description

Raise exception (result.error) from Context::translate
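
The behavior requested here can be sketched in plain Python as follows. `TranslationResult`, `TranslationError`, and `ops_or_raise` are stand-ins invented for this illustration; only the idea of an `error` field on the result is taken from the description above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


class TranslationError(Exception):
    """Hypothetical exception carrying the error recorded in a result."""


@dataclass
class TranslationResult:
    # Stand-in for the native result object; field names are assumptions.
    ops: List[str] = field(default_factory=list)
    error: Optional[str] = None


def ops_or_raise(result: TranslationResult) -> List[str]:
    """Return the translated ops, or raise if the result recorded an error."""
    if result.error is not None:
        raise TranslationError(result.error)
    return result.ops
```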

Steps to reproduce the bug

No response

Environment

No response

Additional context

No response

Throws "terminate called after throwing an instance of 'LowlevelError'" while translating PowerPC

Language: PowerPC:LE:64:A2ALT

opcode0 = b"\x0c\x00\xfe\x41"
opcode1 = b"\x25\xde\xff\x4b"

result = context.translate(opcode0, 0)
result = context.translate(opcode1, 0)

The following error is thrown after executing these translations in sequence: "terminate called after throwing an instance of 'LowlevelError'"

This is what you get in Ghidra:
100021d8 0c 00 fe 41    beq cr7,LAB_100021e4
    $U1470:1 = COPY 0:1
    $U100:4 = INT_SUB 3:4, 2:4
    $U120:1 = INT_RIGHT cr7, $U100
    $U1470:1 = INT_AND $U120, 1:1
    CBRANCH *[ram]0x100021e4:4, $U1470
100021dc 25 de ff 4b    bl Elf64_Ehdr_10000000
    r2Save = COPY r2
    LR = COPY 0x100021e0:8
    CALL *[ram]0x10000000:4

Sync to Ghidra 10.4

Description

Sync to Ghidra 10.4

Alternatives

No response

Additional context

No response

Apple Silicon support

Hi, this project is awesome, and thanks for the work! I have successfully compiled this project on Apple Silicon by adding this definition in Ghidra_9.2.3_build.

https://github.com/NationalSecurityAgency/ghidra/blob/4e16b3aa3a649b87a54a6e43a5c01360fd255a83/Ghidra/Features/Decompiler/src/decompile/cpp/types.h#L184

#if defined (__APPLE_CC__) && defined (__aarch64__)
#define HOST_ENDIAN 0
typedef unsigned int uintm;
typedef int intm;
typedef unsigned long uint8;
typedef long int8;
typedef unsigned int uint4;
typedef int int4;
typedef unsigned short uint2;
typedef short int2;
typedef unsigned char uint1;
typedef char int1;
typedef uint8 uintp;
#endif

So, could you please help sync with the upstream code so that cursed Apple Silicon users can benefit?

Add language lookup function

Description

Currently users must enumerate all languages. Add a convenience function.
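
As a rough illustration of such a convenience function, here is a filter over language ID strings of the form `processor:endian:size:variant`. The function name `find_languages` and the hand-written ID list are assumptions for this sketch; a real implementation would draw the IDs from whatever enumeration the library provides.

```python
# Language IDs that appear elsewhere on this page, used as sample data.
SAMPLE_LANGUAGE_IDS = [
    "x86:LE:64:default",
    "MIPS:LE:32:default",
    "MIPS:BE:32:default",
    "AARCH64:LE:64:AppleSilicon",
    "PowerPC:LE:64:A2ALT",
]


def find_languages(ids, processor=None, endian=None, size=None, variant=None):
    """Return the IDs whose fields match every criterion that is not None."""
    wanted = (processor, endian, None if size is None else str(size), variant)
    matches = []
    for lang_id in ids:
        fields = lang_id.split(":")
        if len(fields) != 4:
            continue  # not a processor:endian:size:variant ID
        if all(w is None or w == f for w, f in zip(wanted, fields)):
            matches.append(lang_id)
    return matches


print(find_languages(SAMPLE_LANGUAGE_IDS, processor="MIPS", endian="BE"))
```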

Steps to reproduce the bug

No response

Environment

No response

Additional context

No response

In ARM64 BLR translation, x30 should depend on pc

Description

When translating the instruction blr x8 with pypcode, it appears that the x30 link register is set to 0x4, whereas it should be set to pc+0x4:

IMARK ram[0:4]
pc = x8
x30 = 0x0 + 0x4
call [pc]

I would have expected:

IMARK ram[0:4]
x30 = pc + 0x4
pc = x8
call [pc]

Steps to reproduce the bug

See the attached script to reproduce the problem. For convenience, I used keystone to build the reproduction test case, but the same problem happens with bytes taken from a real binary.

import keystone
import pypcode
ctx = pypcode.Context("AARCH64:LE:64:AppleSilicon")
asm = "blr x8"
ks = keystone.Ks(keystone.KS_ARCH_ARM64, keystone.KS_MODE_LITTLE_ENDIAN)
instr_bytes, count = ks.asm(asm, as_bytes=True)
ins = ctx.disassemble(instr_bytes).instructions[0]
print(f"{ins.mnem} {ins.body}:")
ops = ctx.translate(instr_bytes).ops
for op in ops:
    print(pypcode.PcodePrettyPrinter.fmt_op(op))

And here is the program's output:

blr x8:
IMARK ram[0:4]
pc = x8
x30 = 0x0 + 0x4
call [pc]

blr.py.tgz

Environment

I am using pypcode in a standalone manner. I checked out the latest commit of the master branch (ed59b51) on macOS and installed it in a virtualenv using pip install . Everything is working fine except for this unexpected translation.

Additional context

No response

skipping over data?

Would it be possible to skip over data that is mixed with code (ARM) instead of returning?

I am looking at incrementing the offset by the default instruction alignment here, instead of breaking out of the loop:

    } catch (BadDataError &e) {
        res->updateWithException(e, addr);
        break;
    }
}
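
The proposed behavior can be modelled in a few lines of Python. This is a sketch of the idea only, not the library's C++ loop: `BadData`, `toy_decode`, and `decode_all` are hypothetical names, and the toy decoder simply treats an all-ones word as undecodable data.

```python
class BadData(Exception):
    """Stand-in for the decoder's BadDataError."""


def toy_decode(buf: bytes, offset: int) -> int:
    """Pretend decoder: return the instruction length or raise BadData."""
    word = buf[offset:offset + 4]
    if len(word) < 4 or word == b"\xff" * 4:
        raise BadData(f"cannot decode at offset {offset:#x}")
    return 4


def decode_all(buf: bytes, decode_one=toy_decode, alignment: int = 4):
    """Decode the whole buffer, stepping over undecodable data by alignment."""
    offset, decoded, skipped = 0, [], []
    while offset < len(buf):
        try:
            length = decode_one(buf, offset)
        except BadData:
            skipped.append(offset)  # record the gap instead of stopping
            offset += alignment
            continue
        decoded.append((offset, length))
        offset += length
    return decoded, skipped
```

Recording the skipped offsets lets the caller tell real instructions apart from regions that were stepped over.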

Expose AddrSpace details

Description

There are more encoded details about address spaces. Expose them.

Alternatives

No response

Additional context

No response

Unable to install PyPcode 1.1.0

Description

With the new pip release, using the package raises an error:

(tmp-900339b97de62dc) ➜ /tmp python -c 'import pypcode'                         
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "./tmp-900339b97de62dc/lib/python3.10/site-packages/pypcode/__init__.py", line 13, in <module>
    from ._csleigh import ffi
ModuleNotFoundError: No module named 'pypcode._csleigh'

Steps to reproduce the bug

  1. Create a virtualenv for a fresh Python version (tested with 3.9, 3.10):
     $ mktmpenv -p $(which python3.10)
  2. pip install 'pypcode'
  3. python -c 'import pypcode'

However, the same steps with PyPcode 1.0.7 work:

$ pip install 'pypcode==1.0.7' 
$ python -c 'import pypcode'  && echo "ok"

Environment

Environment:

  • Linux (Debian Stable)
  • Python 3.{9,10}
  • PyPcode: 1.1.0

Additional context

No response

How to specify architecture in the pcode engine?

Question

I have tried the angr examples and I'm finding an error related to architecture mapping, so I would like to know the correct way to specify target architecture when using angr with the pcode engine.

For example, with the 0ctf_trace example (https://github.com/angr/angr-examples/tree/master/examples/0ctf_trace), when I run solve.py I have no error. But when I modify it to use the pcode engine instead of VEX by adding "engine=angr.engines.UberEnginePcode" to the project constructor, I see this error:

ERROR    | 2023-08-21 16:46:49,214 | angr.engines.pcode.lifter | Unknown mapping of MIPS32 to pcode languge id

The problem seems to be that the Project constructor has load_options that specify arch as 'mipsel'. I can't delete this argument because an architecture is required for the blob backend. But 'mipsel' is apparently not correct for pypcode.

Similarly, I can run the android_arm_license_validation example (https://github.com/angr/angr-examples/tree/master/examples/android_arm_license_validation) using VEX, but when I modify solve.py to use the pcode engine I see this error:

ERROR    | 2023-08-21 16:52:25,375 | angr.engines.pcode.lifter | Unknown mapping of ARMEL to pcode languge id

In this case there is no arch parameter, and angr has determined the architecture to be ARM:

>>> b = angr.Project("./validate", load_options = load_options, auto_load_libs=False)
>>> state = b.factory.blank_state(addr=0x401760)
>>> state.arch
<Arch ARMEL (LE)>

Should I be using the arch parameter to set a pcode language ID?

Any assistance would be much appreciated.

Version info:
Host: Ubuntu20 (Linux 5.4.0-156-generic #173-Ubuntu SMP Tue Jul 11 07:25:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux)
python: 3.8.10
angr, cle, claripy, pyvex: 9.2.64
pypcode: 1.1.2
capstone: 5.0.0.post1
cffi: 1.15.1
pycparser: 2.21
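
The two log messages suggest that the lifter looks up a p-code language ID from the angr architecture name and finds no entry. A sketch of such a lookup is shown below. The table is hand-written for illustration: the MIPS IDs appear elsewhere on this page, the ARM variant is an assumption, and none of this is angr's actual mapping code.

```python
# Hypothetical table keyed by (architecture name, endianness).
PCODE_LANGUAGE_IDS = {
    ("MIPS32", "LE"): "MIPS:LE:32:default",
    ("MIPS32", "BE"): "MIPS:BE:32:default",
    ("ARMEL", "LE"): "ARM:LE:32:v8",  # assumed variant, for illustration only
}


def pcode_language_id(arch_name: str, endianness: str) -> str:
    """Map an architecture name and endianness to a p-code language ID."""
    try:
        return PCODE_LANGUAGE_IDS[(arch_name, endianness)]
    except KeyError:
        raise LookupError(
            f"Unknown mapping of {arch_name} ({endianness}) to pcode language id"
        ) from None
```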

Throws "terminate called after throwing an instance of 'std::out_of_range'" translating MIPSEL opcode

Language: MIPS:LE:32:default

opcode = b"\x40\x00\x40\x10"
result = context.translate(opcode, 0)

Yields the following error: "terminate called after throwing an instance of 'std::out_of_range'
what(): Attempting to lift outside buffer range"

In ghidra, this is what you get:

    00405304 04 00 40 10     beq        v0,zero,LAB_00405318
                                                  $U240:1 = INT_EQUAL v0, 0:4
                                                  at = INT_OR at, 0:4
                                                  CBRANCH *[ram]0x405318:4, $U240

Seemingly incorrect const size in fs base calculation (x86-64)

~ % ./test_pcode.py -r x86:LE:64:default ~/test.bin
--------------------------------------------------------------------------------

00000000/9: 64 48 8b 14 25 c0 ff ff ff MOV RDX,qword ptr FS:[0xffffffc0]
--------------------------------------------------------------------------------
  0: unique[0x4f00:8] = INT_ADD register[0x110:8], const[0xffffffc0:8]
  1: unique[0xc000:8] = LOAD const[0x55e5781c94a0:8], unique[0x4f00:8]
  2: register[0x10:8] = COPY unique[0xc000:8]

test.bin contains just the 9 bytes of that single MOV instruction. Notice that in the first p-code op, const[0xffffffc0:8] is 8 bytes long. Shouldn't it be only 4 bytes, or be sign-extended first?

I am not sure whether this is a SLEIGH bug.
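
For reference, the arithmetic behind the question can be worked through directly. In 64-bit mode a 32-bit displacement is sign-extended to 64 bits before it takes part in address computation, so the value one would expect to be added to the FS base is 0xffffffffffffffc0 (that is, -0x40). The helper `sign_extend` is written for this example only. Whether the printed constant reflects a SLEIGH bug or a display choice is not established here.

```python
def sign_extend(value: int, from_bits: int, to_bits: int) -> int:
    """Sign-extend a from_bits-wide value to to_bits, as an unsigned integer."""
    sign_bit = 1 << (from_bits - 1)
    value &= (1 << from_bits) - 1
    signed = (value ^ sign_bit) - sign_bit  # interpret as two's complement
    return signed & ((1 << to_bits) - 1)


# Last four bytes of the instruction: c0 ff ff ff (little-endian disp32).
disp32 = int.from_bytes(bytes.fromhex("c0ffffff"), "little")
print(hex(disp32))                       # 0xffffffc0
print(hex(sign_extend(disp32, 32, 64)))  # 0xffffffffffffffc0
```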

Incorrectly Formatting STORE Input

Description

When pretty-printing a STORE op, I get the Python repr of a Varnode object instead of its formatted form. For example:

*[ram]unique[180:4] = <pypcode.pypcode_native.Varnode object at 0x7f0d471eeb50>

Steps to reproduce the bug

import pypcode

context = pypcode.Context("MIPS:BE:32:default")
print(pypcode.PcodePrettyPrinter.fmt_op(context.translate(b'\xaf\xbc\x00\x10').ops[4]))

Environment

pypcode version: 1.1.3.dev0

python -m angr.misc.bug_report:

/home/doug/projects/XXX/venv/lib/python3.11/site-packages/angr/misc/bug_report.py:1: DeprecationWarning: the imp module is deprecated in favour of importlib and slated for removal in Python 3.12; see the module's documentation for alternative uses
  import imp
angr environment report
=============================
Date: 2023-12-11 16:08:17.962668
Running in virtual environment at /home/doug/projects/XXX/venv
Platform: linux-x86_64
Python version: 3.11.6 (main, Oct 23 2023, 22:48:54) [GCC 11.4.0]
######## angr #########
Python found it in /home/doug/projects/XXX/venv/lib/python3.11/site-packages/angr
Pip version angr 9.2.74
Git info:
        Current commit da5e2b9755125aae555fb82dd93e3e16b6c04526 from branch build-full-dataset
Could not resolve tracking branch or remote info!
######## ailment #########
Python found it in /home/doug/projects/XXX/venv/lib/python3.11/site-packages/ailment
Pip version ailment 9.2.74
Git info:
        Current commit da5e2b9755125aae555fb82dd93e3e16b6c04526 from branch build-full-dataset
Could not resolve tracking branch or remote info!
######## cle #########
Python found it in /home/doug/projects/XXX/venv/lib/python3.11/site-packages/cle
Pip version cle 9.2.74
Git info:
        Current commit da5e2b9755125aae555fb82dd93e3e16b6c04526 from branch build-full-dataset
Could not resolve tracking branch or remote info!
######## pyvex #########
Python found it in /home/doug/projects/XXX/venv/lib/python3.11/site-packages/pyvex
Pip version pyvex 9.2.74
Git info:
        Current commit da5e2b9755125aae555fb82dd93e3e16b6c04526 from branch build-full-dataset
Could not resolve tracking branch or remote info!
######## claripy #########
Python found it in /home/doug/projects/XXX/venv/lib/python3.11/site-packages/claripy
Pip version claripy 9.2.74
Git info:
        Current commit da5e2b9755125aae555fb82dd93e3e16b6c04526 from branch build-full-dataset
Could not resolve tracking branch or remote info!
######## archinfo #########
Python found it in /home/doug/projects/XXX/venv/lib/python3.11/site-packages/archinfo
Pip version archinfo 9.2.74
Git info:
        Current commit da5e2b9755125aae555fb82dd93e3e16b6c04526 from branch build-full-dataset
Could not resolve tracking branch or remote info!
######## z3 #########
Python found it in /home/doug/projects/XXX/venv/lib/python3.11/site-packages/z3
Pip version z3-solver 4.10.2.0
Git info:
        Current commit da5e2b9755125aae555fb82dd93e3e16b6c04526 from branch build-full-dataset
Could not resolve tracking branch or remote info!
######## unicorn #########
Python found it in /home/doug/projects/XXX/venv/lib/python3.11/site-packages/unicorn
Pip version unicorn 2.0.1.post1
Git info:
        Current commit da5e2b9755125aae555fb82dd93e3e16b6c04526 from branch build-full-dataset
Could not resolve tracking branch or remote info!
######### Native Module Info ##########
angr: <CDLL '/home/doug/projects/XXX/venv/lib/python3.11/site-packages/angr/state_plugins/../lib/angr_native.so', handle 2061510 at 0x7f17fa79b710>
unicorn: <CDLL '/home/doug/projects/XXX/venv/lib/python3.11/site-packages/unicorn/lib/libunicorn.so.2', handle 1a615d0 at 0x7f17fdd87790>
pyvex: <cffi.api._make_ffi_library.<locals>.FFILibrary object at 0x7f17fe96b910>
z3: <CDLL '/home/doug/projects/XXX/venv/lib/python3.11/site-packages/z3/lib/libz3.so', handle 13e59f0 at 0x7f180066dc90>

Additional context

No response

FreeBSD support

Description

Support FreeBSD builds.

@rhelmot is already working on this.

Alternatives

No response

Additional context

No response

Reduce context dependencies

Description

We can reduce the number of references to the context. This would also help when creating p-code manually.

Alternatives

No response

Additional context

No response

Build error if exporting multiple CFLAGS

While creating an Arch Linux package of this repository, I encountered a build error because Arch Linux's makepkg script exports CFLAGS containing multiple flags. The error message is as follows:

/usr/bin/cc "-march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection" -DCFFI_CDEF=1 -E -P /home/fanta/packages/python-pypcode/src/pypcode-1.0.2/pypcode/native/csleigh.h > /home/fanta/packages/python-pypcode/src/pypcode-1.0.2/pypcode/native/build/csleigh.i
cc1: error: bad value (‘x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection’) for ‘-march=’ switch
cc1: note: valid arguments to ‘-march=’ switch are: nocona core2 nehalem corei7 westmere sandybridge corei7-avx ivybridge core-avx-i haswell core-avx2 broadwell skylake skylake-avx512 cannonlake icelake-client rocketlake icelake-server cascadelake tigerlake cooperlake sapphirerapids alderlake bonnell atom silvermont slm goldmont goldmont-plus tremont knl knm x86-64 x86-64-v2 x86-64-v3 x86-64-v4 eden-x2 nano nano-1000 nano-2000 nano-3000 nano-x2 eden-x4 nano-x4 k8 k8-sse3 opteron opteron-sse3 athlon64 athlon64-sse3 athlon-fx amdfam10 barcelona bdver1 bdver2 bdver3 bdver4 znver1 znver2 znver3 btver1 btver2 native
make[2]: *** [CMakeFiles/csleigh.i.dir/build.make:73: CMakeFiles/csleigh.i] Error 1
make[2]: Leaving directory '/home/fanta/packages/python-pypcode/src/pypcode-1.0.2/pypcode/native/build'
make[1]: *** [CMakeFiles/Makefile2:142: CMakeFiles/csleigh.i.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....

In pypcode/native/CMakeLists.txt, the relevant CMake script is as follows:

add_custom_target(
	csleigh.i ALL
	COMMAND ${CMAKE_C_COMPILER} ${CMAKE_C_FLAGS} ${PREPROCESSOR_ONLY_FLAGS} ${CMAKE_SOURCE_DIR}/csleigh.h > ${CMAKE_BINARY_DIR}/csleigh.i
	BYPRODUCTS ${CMAKE_BINARY_DIR}/csleigh.i
	VERBATIM
)

If CFLAGS contains multiple flags, then because of VERBATIM the whole string is passed to the compiler as a single quoted argument, which breaks the command.
One way to solve this problem is to use SEPARATE_ARGUMENTS:

SEPARATE_ARGUMENTS(NEW_CMAKE_C_FLAGS UNIX_COMMAND "${CMAKE_C_FLAGS}")
add_custom_target(
	csleigh.i ALL
	COMMAND ${CMAKE_C_COMPILER} ${NEW_CMAKE_C_FLAGS} ${PREPROCESSOR_ONLY_FLAGS} ${CMAKE_SOURCE_DIR}/csleigh.h > ${CMAKE_BINARY_DIR}/csleigh.i
	BYPRODUCTS ${CMAKE_BINARY_DIR}/csleigh.i
	VERBATIM
)

If the maintainer thinks this patch is OK, I will send a PR :)
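
The difference between the two command lines can be reproduced outside CMake. The sketch below uses Python's `shlex.split`, which is only roughly analogous to CMake's `UNIX_COMMAND` splitting, to show how one flags string becomes either a single argument or several. The shortened flag list and the variable names are chosen for the example.

```python
import shlex

# A shortened version of the CFLAGS exported by makepkg in the report above.
cflags = "-march=x86-64 -mtune=generic -O2 -pipe"

# Without splitting, the whole string reaches the compiler as ONE argument,
# which is why cc1 sees everything after "-march=" as the march value.
unsplit_argv = ["/usr/bin/cc", cflags, "-E", "-P", "csleigh.h"]

# After shell-style splitting, each flag is its own argument.
split_argv = ["/usr/bin/cc", *shlex.split(cflags), "-E", "-P", "csleigh.h"]

print(len(unsplit_argv), len(split_argv))  # 5 8
```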

Align with pyvex on `max_instructions` behavior

Description

If max_instructions=1 when pyvex lifts an instruction that will cause delay slot execution, it will return NoDecode. pypcode will actually return 2 instructions. Probably best to be consistent with pyvex here and always decode at most max_instructions, including delay slot execution.
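
The proposed rule can be stated as a small model: a branch and its delay slot are taken together or not at all, and the limit counts both. This sketch works on `(mnemonic, has_delay_slot)` pairs instead of real decoded instructions. The function name `take_instructions` and the convention that 0 means "no limit" are assumptions for illustration.

```python
def take_instructions(insns, max_instructions=0):
    """Select instructions up to the limit, never splitting a delay-slot pair.

    insns is a list of (mnemonic, has_delay_slot) tuples.
    max_instructions == 0 is treated as "no limit" (an assumption).
    """
    taken = []
    i = 0
    while i < len(insns):
        _, has_delay_slot = insns[i]
        # A branch with a delay slot is grouped with the instruction after it.
        group = insns[i:i + 2] if has_delay_slot else insns[i:i + 1]
        if max_instructions and len(taken) + len(group) > max_instructions:
            break
        taken.extend(mnemonic for mnemonic, _ in group)
        i += len(group)
    return taken


block = [("beq", True), ("nop", False)]
print(take_instructions(block, max_instructions=1))  # []
print(take_instructions(block, max_instructions=2))  # ['beq', 'nop']
```

With a limit of 1 the model returns nothing for a delay-slot branch, which corresponds to the NoDecode outcome described for pyvex.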

Steps to reproduce the bug

No response

Environment

No response

Additional context

No response

Common registers?

Is there a way to query a common register offset (eg, program counter, frame/stack pointer, flags/status) without naming the architecture-specific name (RIP, etc)?
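
One way to picture what is being asked for is a table from architecture-neutral roles to architecture-specific register names. The table below is hand-written for illustration and is not an API that pypcode is known to provide; resolving the returned name to an offset would still go through whatever register lookup the library offers.

```python
# Hypothetical role table; entries are illustrative, not exhaustive.
COMMON_REGISTERS = {
    "x86:LE:64:default": {"pc": "RIP", "sp": "RSP"},
    "AARCH64:LE:64:AppleSilicon": {"pc": "pc", "sp": "sp"},
}


def common_register(language_id: str, role: str) -> str:
    """Return the architecture-specific register name for a generic role."""
    try:
        return COMMON_REGISTERS[language_id][role]
    except KeyError:
        raise LookupError(f"no {role!r} register known for {language_id}") from None
```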

Add option to return only control ops

Description

In pyvex, users can specify skip_stmts to pre-filter unused statements and look only at control flow information. This would be nice to have as an option, so that we don't have to filter on the Python side.
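
For comparison, this is roughly what filtering on the Python side looks like today. Ops are modelled here as `(opcode_name, text)` tuples and not as the library's op objects, and the set of control-flow opcodes is an assumption about which p-code ops should count.

```python
# P-code opcodes that transfer control (an assumed selection).
CONTROL_FLOW_OPCODES = frozenset(
    {"BRANCH", "CBRANCH", "BRANCHIND", "CALL", "CALLIND", "RETURN"}
)


def control_ops(ops):
    """Keep only the ops whose opcode name is a control-flow opcode."""
    return [op for op in ops if op[0] in CONTROL_FLOW_OPCODES]


# Opcode names taken from the JG translation shown earlier on this page.
ops = [
    ("BOOL_NEGATE", "unique[0x19e0:1] = BOOL_NEGATE register[0x206:1]"),
    ("INT_EQUAL", "unique[0x19f0:1] = INT_EQUAL register[0x20b:1], register[0x207:1]"),
    ("BOOL_AND", "unique[0x1a10:1] = BOOL_AND unique[0x19e0:1], unique[0x19f0:1]"),
    ("CBRANCH", "CBRANCH ram[0x47:8], unique[0x1a10:1]"),
]
print([name for name, _ in control_ops(ops)])  # ['CBRANCH']
```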

Alternatives

No response

Additional context

No response
