Hi! I was experimenting with memory array mode and encountered some behaviour that

maybe something like this? <div class="highlight highlight-source-python notransla

Clarification regarding MEMORY_ARRAY mode about triton HOT 7 CLOSED

m4drat commented on June 12, 2024

Clarification regarding MEMORY_ARRAY mode

from triton.

Comments (7)

JonathanSalwan commented on June 12, 2024 1

Regarding the RAM consumption. This is because in ABV we have to keep track of every store access. All of them return a new symbolic memory state (with a SSA form). This is how store nodes in SMT work but in our case, to be able to provide such state, we have to keep everything in memory which is quite explosive. We still didn't find an optimization for this case. That's why I don't recommend to use MEMORY_ARRAY on a whole part of the memory :)

memory_0 = {} // fresh memory.
memory_1 = store(memory_0, 0x1000, 0x41) // store 0x41 at addr 0x1000 and returns the new memory state
memory_2 = store(memory_1, 0x1001, 0x42) // store 0x42 at addr 0x1001 and returns the new memory state
memory_3 = store(memory_2, 0x1002, 0x43) // store 0x43 at addr 0x1002 and returns the new memory state
memory_4 = store(memory_3, 0x1003, 0x44) // store 0x44 at addr 0x1003 and returns the new memory state

from triton.

JonathanSalwan commented on June 12, 2024

There is no synchronization between the symbolic memory state and the concrete when MEMORY_ARRAY is enabled. As soon as MEMORY_ARRAY is enabled, the memory state starts from zero. So, every previous setConcreteMemoryAreaValue are ignored. That's why it works if you set the concrete memory after enabling MEMORY_ARRAY.

For the RAM consumption, it is known issue when using MEMORY_ARRAY. Can you try to only symbolize your data instead of the whole code?

from triton.

m4drat commented on June 12, 2024

I was looking at the examples, shouldn't the call to setConcreteMemoryAreaValue do what I need (set the contents for a loaded segment without making it symbolic)?

from triton.

JonathanSalwan commented on June 12, 2024

maybe something like this?

from triton import *

# code / data you don't want to reason symbolically on.
code = [
    b"\x0c\x00\x9f\xe5",  # 0x801A48: ldr    r0, [pc, #0xc]
    b"\x00\x00\x9a\xe7",  # 0x801A4c: ldr    r0, [sl, r0]
    b"\x00\x00\xa0\xe1",  # 0x801A50: mov    r0, r0
]

# code / data you really want to reason symbolically on.
data = [
    b"\x41\x41\x41\x41",  # 0x801A54: just data
    b"\x42\x42\x42\x42",  # 0x801A58: just data
    b"\x43\x43\x43\x43",  # 0x801A5C: just data
]

ctx = TritonContext(ARCH.ARM32)

# Code is concrete
ctx.setConcreteMemoryAreaValue(0x801A48, b"".join(code))

# Data is symbolic
ctx.setMode(MODE.MEMORY_ARRAY, True)
ctx.setConcreteMemoryAreaValue(0x801A54, b"".join(data))

# Set concrete register values
ctx.setConcreteRegisterValue(ctx.registers.pc, 0x801A48)

for op in code:
    inst = Instruction(op)
    ctx.processing(inst)
    print(inst)
    print('r0 = ', hex(ctx.getConcreteRegisterValue(ctx.registers.r0)))
    for e in inst.getSymbolicExpressions():
        print(e)
    print()

from triton.

m4drat commented on June 12, 2024

Hm, but what if I don't want to have symbolic representation for any of the data I'm defining on this step? E.g., while I'm loading the whole ELF I simply want to load all the sections (.text, .data, ...), after that I want to be able to enable the ctx.setMode(MODE.MEMORY_ARRAY, True) so I could enable symbolic memory modelling for the user-provided input.

Right now what would happen is enabling ctx.setMode(MODE.MEMORY_ARRAY, True) after setConcreteMemoryAreaValue makes the instruction ldr r0, [pc, #0xc] to read zero-value, while (as far as I can tell) it should be 0x43434343.

Here is the full example. Judging by what ctx.getConcreteMemoryAreaValue(CODE_BASE, 24) returns, all the data is here, but executing the instruction still returns 0.

from triton import TritonContext, ARCH, Instruction, EXCEPTION, MODE, AST_NODE


def main():
    CODE_BASE = 0x801A48

    code = [
        b"\x0c\x00\x9f\xe5",  # 0x801A48: ldr    r0, [pc, #0xc]
        b"\x00\x00\x9a\xe7",  # 0x801A4c: ldr    r0, [sl, r0]
        b"\x00\x00\xa0\xe1",  # 0x801A50: mov    r0, r0
        b"\x41\x41\x41\x41",  # 0x801A54: just data
        b"\x42\x42\x42\x42",  # 0x801A58: just data
        b"\x43\x43\x43\x43",  # 0x801A5C: just data
    ]

    ctx = TritonContext(ARCH.ARM32)

    # Define the code segment
    ctx.setConcreteMemoryAreaValue(CODE_BASE, b"".join(code))
    ctx.setMode(MODE.MEMORY_ARRAY, True)

    ctx.concretizeAllMemory()
    ctx.concretizeAllRegister()

    print(ctx.getConcreteMemoryAreaValue(CODE_BASE, 24))

    # Setup the registers
    ctx.setConcreteRegisterValue(ctx.registers.pc, CODE_BASE)
    ctx.setThumb(False)

    pc = CODE_BASE

    while pc <= 0x801A50:
        op = ctx.getConcreteMemoryAreaValue(pc, 4)

        ins = Instruction(pc, op)

        if not ins.getDisassembly():
            ctx.disassembly(ins)

        opcode = ins.getOpcode()
        mnemonic = ins.getType()

        ret = ctx.buildSemantics(ins)

        if ret == EXCEPTION.FAULT_UD:
            print(f"[-] Instruction is not supported: {ins}")
            break

        print(f"[+] {ins}")
        print(f"[+] r0 = {ctx.getConcreteRegisterValue(ctx.registers.r0):#010x}")
        print(f"[+] pc = {ctx.getConcreteRegisterValue(ctx.registers.pc):#010x}")
        print("")

        pc = ctx.getConcreteRegisterValue(ctx.registers.pc)

    print("[+] Instruction emulation done.")


if __name__ == "__main__":
    main()

from triton.

JonathanSalwan commented on June 12, 2024

Yes I understand your problem but the point is that the symbolic memory model of Triton has two different paradigms that can't communicate between each other. The SMT logic behind is either BV or ABV. When MEMORY_ARRAY is off, BV logic is used and it relies on the concrete state. But as soon as ABV is used, there is no more concrete state for memory, ABV has its own memory model (which is symbolic).

When setting ABV (e.g ctx.setMode(MODE.MEMORY_ARRAY, True)), a fresh symbolic memory is created and then each store are saved as symbolic expression into the memory model of ABV. Example below.

memory_0 = {} // fresh memory.
memory_1 = store(memory_0, 0x1000, 0x41) // store 0x41 at addr 0x1000 and returns the new memory state
memory_2 = store(memory_1, 0x1001, 0x42) // store 0x42 at addr 0x1001 and returns the new memory state
memory_3 = store(memory_2, 0x1002, 0x43) // store 0x43 at addr 0x1002 and returns the new memory state
memory_4 = store(memory_3, 0x1003, 0x44) // store 0x44 at addr 0x1003 and returns the new memory state
...

So basically what happens (see [0, 1, 2, 3]):

from triton import TritonContext, ARCH, Instruction, EXCEPTION, MODE, AST_NODE


def main():
    CODE_BASE = 0x801A48

    code = [
        b"\x0c\x00\x9f\xe5",  # 0x801A48: ldr    r0, [pc, #0xc]
        b"\x00\x00\x9a\xe7",  # 0x801A4c: ldr    r0, [sl, r0]
        b"\x00\x00\xa0\xe1",  # 0x801A50: mov    r0, r0
        b"\x41\x41\x41\x41",  # 0x801A54: just data
        b"\x42\x42\x42\x42",  # 0x801A58: just data
        b"\x43\x43\x43\x43",  # 0x801A5C: just data
    ]

    ctx = TritonContext(ARCH.ARM32)

    # Define the code segment
    ctx.setConcreteMemoryAreaValue(CODE_BASE, b"".join(code))
       
    # [0] Before this point you were in BV logic, so the concrete state are stored as expected.
      
    ####################################################
   
    ctx.setMode(MODE.MEMORY_ARRAY, True)
    # [1] Now the memory model has changed, we switched to ABV, so we create a fresh memory.
   
    # [2] It returns the correct answer because you requested the concrete state but not the symbolic one.
    # From here, you have a desynchronization between your symbolic and your concrete memory state.
    print(ctx.getConcreteMemoryAreaValue(CODE_BASE, 24))

   [...]

    # [3] From this point, every interpreted `load / store` uses the ABV memory model which is fresh (so 0).
    while pc <= 0x801A50:
        op = ctx.getConcreteMemoryAreaValue(pc, 4)

        [...]

In ABV, to keep the synchro between the concrete and the symbolic state you should enable MEMORY_ARRAY before any setConcrete*.

from triton.

m4drat commented on June 12, 2024

Oh, makes sense now! Thanks a lot for the clarification!

from triton.

Clarification regarding MEMORY_ARRAY mode about triton HOT 7 CLOSED

Comments (7)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent