Comments (7)
Regarding the RAM consumption. This is because in ABV we have to keep track of every store access. All of them return a new symbolic memory state (with a SSA form). This is how store
nodes in SMT work but in our case, to be able to provide such state, we have to keep everything in memory which is quite explosive. We still didn't find an optimization for this case. That's why I don't recommend to use MEMORY_ARRAY on a whole part of the memory :)
memory_0 = {} // fresh memory.
memory_1 = store(memory_0, 0x1000, 0x41) // store 0x41 at addr 0x1000 and returns the new memory state
memory_2 = store(memory_1, 0x1001, 0x42) // store 0x42 at addr 0x1001 and returns the new memory state
memory_3 = store(memory_2, 0x1002, 0x43) // store 0x43 at addr 0x1002 and returns the new memory state
memory_4 = store(memory_3, 0x1003, 0x44) // store 0x44 at addr 0x1003 and returns the new memory state
from triton.
There is no synchronization between the symbolic memory state and the concrete when MEMORY_ARRAY
is enabled. As soon as MEMORY_ARRAY
is enabled, the memory state starts from zero. So, every previous setConcreteMemoryAreaValue
are ignored. That's why it works if you set the concrete memory after enabling MEMORY_ARRAY
.
For the RAM consumption, it is known issue when using MEMORY_ARRAY
. Can you try to only symbolize your data instead of the whole code?
from triton.
I was looking at the examples, shouldn't the call to setConcreteMemoryAreaValue
do what I need (set the contents for a loaded segment without making it symbolic)?
from triton.
maybe something like this?
from triton import *
# code / data you don't want to reason symbolically on.
code = [
b"\x0c\x00\x9f\xe5", # 0x801A48: ldr r0, [pc, #0xc]
b"\x00\x00\x9a\xe7", # 0x801A4c: ldr r0, [sl, r0]
b"\x00\x00\xa0\xe1", # 0x801A50: mov r0, r0
]
# code / data you really want to reason symbolically on.
data = [
b"\x41\x41\x41\x41", # 0x801A54: just data
b"\x42\x42\x42\x42", # 0x801A58: just data
b"\x43\x43\x43\x43", # 0x801A5C: just data
]
ctx = TritonContext(ARCH.ARM32)
# Code is concrete
ctx.setConcreteMemoryAreaValue(0x801A48, b"".join(code))
# Data is symbolic
ctx.setMode(MODE.MEMORY_ARRAY, True)
ctx.setConcreteMemoryAreaValue(0x801A54, b"".join(data))
# Set concrete register values
ctx.setConcreteRegisterValue(ctx.registers.pc, 0x801A48)
for op in code:
inst = Instruction(op)
ctx.processing(inst)
print(inst)
print('r0 = ', hex(ctx.getConcreteRegisterValue(ctx.registers.r0)))
for e in inst.getSymbolicExpressions():
print(e)
print()
from triton.
Hm, but what if I don't want to have symbolic representation for any of the data I'm defining on this step? E.g., while I'm loading the whole ELF I simply want to load all the sections (.text, .data, ...), after that I want to be able to enable the ctx.setMode(MODE.MEMORY_ARRAY, True)
so I could enable symbolic memory modelling for the user-provided input.
Right now what would happen is enabling ctx.setMode(MODE.MEMORY_ARRAY, True)
after setConcreteMemoryAreaValue
makes the instruction ldr r0, [pc, #0xc]
to read zero-value, while (as far as I can tell) it should be 0x43434343
.
Here is the full example. Judging by what ctx.getConcreteMemoryAreaValue(CODE_BASE, 24)
returns, all the data is here, but executing the instruction still returns 0.
from triton import TritonContext, ARCH, Instruction, EXCEPTION, MODE, AST_NODE
def main():
CODE_BASE = 0x801A48
code = [
b"\x0c\x00\x9f\xe5", # 0x801A48: ldr r0, [pc, #0xc]
b"\x00\x00\x9a\xe7", # 0x801A4c: ldr r0, [sl, r0]
b"\x00\x00\xa0\xe1", # 0x801A50: mov r0, r0
b"\x41\x41\x41\x41", # 0x801A54: just data
b"\x42\x42\x42\x42", # 0x801A58: just data
b"\x43\x43\x43\x43", # 0x801A5C: just data
]
ctx = TritonContext(ARCH.ARM32)
# Define the code segment
ctx.setConcreteMemoryAreaValue(CODE_BASE, b"".join(code))
ctx.setMode(MODE.MEMORY_ARRAY, True)
ctx.concretizeAllMemory()
ctx.concretizeAllRegister()
print(ctx.getConcreteMemoryAreaValue(CODE_BASE, 24))
# Setup the registers
ctx.setConcreteRegisterValue(ctx.registers.pc, CODE_BASE)
ctx.setThumb(False)
pc = CODE_BASE
while pc <= 0x801A50:
op = ctx.getConcreteMemoryAreaValue(pc, 4)
ins = Instruction(pc, op)
if not ins.getDisassembly():
ctx.disassembly(ins)
opcode = ins.getOpcode()
mnemonic = ins.getType()
ret = ctx.buildSemantics(ins)
if ret == EXCEPTION.FAULT_UD:
print(f"[-] Instruction is not supported: {ins}")
break
print(f"[+] {ins}")
print(f"[+] r0 = {ctx.getConcreteRegisterValue(ctx.registers.r0):#010x}")
print(f"[+] pc = {ctx.getConcreteRegisterValue(ctx.registers.pc):#010x}")
print("")
pc = ctx.getConcreteRegisterValue(ctx.registers.pc)
print("[+] Instruction emulation done.")
if __name__ == "__main__":
main()
from triton.
Yes I understand your problem but the point is that the symbolic memory model of Triton has two different paradigms that can't communicate between each other. The SMT logic behind is either BV
or ABV
. When MEMORY_ARRAY
is off, BV
logic is used and it relies on the concrete state. But as soon as ABV
is used, there is no more concrete state for memory, ABV has its own memory model (which is symbolic).
When setting ABV
(e.g ctx.setMode(MODE.MEMORY_ARRAY, True)
), a fresh symbolic memory is created and then each store are saved as symbolic expression into the memory model of ABV
. Example below.
memory_0 = {} // fresh memory.
memory_1 = store(memory_0, 0x1000, 0x41) // store 0x41 at addr 0x1000 and returns the new memory state
memory_2 = store(memory_1, 0x1001, 0x42) // store 0x42 at addr 0x1001 and returns the new memory state
memory_3 = store(memory_2, 0x1002, 0x43) // store 0x43 at addr 0x1002 and returns the new memory state
memory_4 = store(memory_3, 0x1003, 0x44) // store 0x44 at addr 0x1003 and returns the new memory state
...
So basically what happens (see [0, 1, 2, 3]):
from triton import TritonContext, ARCH, Instruction, EXCEPTION, MODE, AST_NODE
def main():
CODE_BASE = 0x801A48
code = [
b"\x0c\x00\x9f\xe5", # 0x801A48: ldr r0, [pc, #0xc]
b"\x00\x00\x9a\xe7", # 0x801A4c: ldr r0, [sl, r0]
b"\x00\x00\xa0\xe1", # 0x801A50: mov r0, r0
b"\x41\x41\x41\x41", # 0x801A54: just data
b"\x42\x42\x42\x42", # 0x801A58: just data
b"\x43\x43\x43\x43", # 0x801A5C: just data
]
ctx = TritonContext(ARCH.ARM32)
# Define the code segment
ctx.setConcreteMemoryAreaValue(CODE_BASE, b"".join(code))
# [0] Before this point you were in BV logic, so the concrete state are stored as expected.
####################################################
ctx.setMode(MODE.MEMORY_ARRAY, True)
# [1] Now the memory model has changed, we switched to ABV, so we create a fresh memory.
# [2] It returns the correct answer because you requested the concrete state but not the symbolic one.
# From here, you have a desynchronization between your symbolic and your concrete memory state.
print(ctx.getConcreteMemoryAreaValue(CODE_BASE, 24))
[...]
# [3] From this point, every interpreted `load / store` uses the ABV memory model which is fresh (so 0).
while pc <= 0x801A50:
op = ctx.getConcreteMemoryAreaValue(pc, 4)
[...]
In ABV, to keep the synchro between the concrete and the symbolic state you should enable MEMORY_ARRAY
before any setConcrete*
.
from triton.
Oh, makes sense now! Thanks a lot for the clarification!
from triton.
Related Issues (20)
- symbolizeRegister result is different with setConcreteRegisterValue result ?
- How to determine if a concrete register value is known? HOT 2
- fails to build against LLVM-18
- [OSX ERROR] SystemError: initialization of triton did not return an extension module HOT 7
- lea semantic bugg ?
- LDRSW instruction error ?
- Incorrectly handled x86 instruction, rcl memory, immediate HOT 2
- Trying to collect symbolic address from MemoryAccess HOT 2
- Add Dissasembly callbacks? HOT 2
- Is it possible to symbolize arbitrary memory access before the actual processing? HOT 1
- Building errors on python3.12
- Inconsistency between liftToSMT and isSat when MEMORY_ARRAY is enabled HOT 2
- Cannot build using lastest Bitwuzla version HOT 1
- Is Anyone Build Triton In Rust bindings HOT 2
- how to get opcode size HOT 1
- Symbolic execution jump instruction path HOT 1
- How to complement condition instruction. HOT 1
- Can use triton emulator execution hook function? HOT 1
- Incompatible with Python-3.11 HOT 3
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from triton.