Git Product home page Git Product logo

Comments (7)

JonathanSalwan avatar JonathanSalwan commented on June 12, 2024 1

Regarding the RAM consumption. This is because in ABV we have to keep track of every store access. All of them return a new symbolic memory state (with a SSA form). This is how store nodes in SMT work but in our case, to be able to provide such state, we have to keep everything in memory which is quite explosive. We still didn't find an optimization for this case. That's why I don't recommend to use MEMORY_ARRAY on a whole part of the memory :)

memory_0 = {} // fresh memory.
memory_1 = store(memory_0, 0x1000, 0x41) // store 0x41 at addr 0x1000 and returns the new memory state
memory_2 = store(memory_1, 0x1001, 0x42) // store 0x42 at addr 0x1001 and returns the new memory state
memory_3 = store(memory_2, 0x1002, 0x43) // store 0x43 at addr 0x1002 and returns the new memory state
memory_4 = store(memory_3, 0x1003, 0x44) // store 0x44 at addr 0x1003 and returns the new memory state

from triton.

JonathanSalwan avatar JonathanSalwan commented on June 12, 2024

There is no synchronization between the symbolic memory state and the concrete when MEMORY_ARRAY is enabled. As soon as MEMORY_ARRAY is enabled, the memory state starts from zero. So, every previous setConcreteMemoryAreaValue are ignored. That's why it works if you set the concrete memory after enabling MEMORY_ARRAY.

For the RAM consumption, it is known issue when using MEMORY_ARRAY. Can you try to only symbolize your data instead of the whole code?

from triton.

m4drat avatar m4drat commented on June 12, 2024

I was looking at the examples, shouldn't the call to setConcreteMemoryAreaValue do what I need (set the contents for a loaded segment without making it symbolic)?

from triton.

JonathanSalwan avatar JonathanSalwan commented on June 12, 2024

maybe something like this?

from triton import *

# code / data you don't want to reason symbolically on.
code = [
    b"\x0c\x00\x9f\xe5",  # 0x801A48: ldr    r0, [pc, #0xc]
    b"\x00\x00\x9a\xe7",  # 0x801A4c: ldr    r0, [sl, r0]
    b"\x00\x00\xa0\xe1",  # 0x801A50: mov    r0, r0
]

# code / data you really want to reason symbolically on.
data = [
    b"\x41\x41\x41\x41",  # 0x801A54: just data
    b"\x42\x42\x42\x42",  # 0x801A58: just data
    b"\x43\x43\x43\x43",  # 0x801A5C: just data
]

ctx = TritonContext(ARCH.ARM32)

# Code is concrete
ctx.setConcreteMemoryAreaValue(0x801A48, b"".join(code))

# Data is symbolic
ctx.setMode(MODE.MEMORY_ARRAY, True)
ctx.setConcreteMemoryAreaValue(0x801A54, b"".join(data))

# Set concrete register values
ctx.setConcreteRegisterValue(ctx.registers.pc, 0x801A48)

for op in code:
    inst = Instruction(op)
    ctx.processing(inst)
    print(inst)
    print('r0 = ', hex(ctx.getConcreteRegisterValue(ctx.registers.r0)))
    for e in inst.getSymbolicExpressions():
        print(e)
    print()

from triton.

m4drat avatar m4drat commented on June 12, 2024

Hm, but what if I don't want to have symbolic representation for any of the data I'm defining on this step? E.g., while I'm loading the whole ELF I simply want to load all the sections (.text, .data, ...), after that I want to be able to enable the ctx.setMode(MODE.MEMORY_ARRAY, True) so I could enable symbolic memory modelling for the user-provided input.

Right now what would happen is enabling ctx.setMode(MODE.MEMORY_ARRAY, True) after setConcreteMemoryAreaValue makes the instruction ldr r0, [pc, #0xc] to read zero-value, while (as far as I can tell) it should be 0x43434343.

Here is the full example. Judging by what ctx.getConcreteMemoryAreaValue(CODE_BASE, 24) returns, all the data is here, but executing the instruction still returns 0.

from triton import TritonContext, ARCH, Instruction, EXCEPTION, MODE, AST_NODE


def main():
    CODE_BASE = 0x801A48

    code = [
        b"\x0c\x00\x9f\xe5",  # 0x801A48: ldr    r0, [pc, #0xc]
        b"\x00\x00\x9a\xe7",  # 0x801A4c: ldr    r0, [sl, r0]
        b"\x00\x00\xa0\xe1",  # 0x801A50: mov    r0, r0
        b"\x41\x41\x41\x41",  # 0x801A54: just data
        b"\x42\x42\x42\x42",  # 0x801A58: just data
        b"\x43\x43\x43\x43",  # 0x801A5C: just data
    ]

    ctx = TritonContext(ARCH.ARM32)

    # Define the code segment
    ctx.setConcreteMemoryAreaValue(CODE_BASE, b"".join(code))
    ctx.setMode(MODE.MEMORY_ARRAY, True)

    ctx.concretizeAllMemory()
    ctx.concretizeAllRegister()

    print(ctx.getConcreteMemoryAreaValue(CODE_BASE, 24))

    # Setup the registers
    ctx.setConcreteRegisterValue(ctx.registers.pc, CODE_BASE)
    ctx.setThumb(False)

    pc = CODE_BASE

    while pc <= 0x801A50:
        op = ctx.getConcreteMemoryAreaValue(pc, 4)

        ins = Instruction(pc, op)

        if not ins.getDisassembly():
            ctx.disassembly(ins)

        opcode = ins.getOpcode()
        mnemonic = ins.getType()

        ret = ctx.buildSemantics(ins)

        if ret == EXCEPTION.FAULT_UD:
            print(f"[-] Instruction is not supported: {ins}")
            break

        print(f"[+] {ins}")
        print(f"[+] r0 = {ctx.getConcreteRegisterValue(ctx.registers.r0):#010x}")
        print(f"[+] pc = {ctx.getConcreteRegisterValue(ctx.registers.pc):#010x}")
        print("")

        pc = ctx.getConcreteRegisterValue(ctx.registers.pc)

    print("[+] Instruction emulation done.")


if __name__ == "__main__":
    main()

from triton.

JonathanSalwan avatar JonathanSalwan commented on June 12, 2024

Yes I understand your problem but the point is that the symbolic memory model of Triton has two different paradigms that can't communicate between each other. The SMT logic behind is either BV or ABV. When MEMORY_ARRAY is off, BV logic is used and it relies on the concrete state. But as soon as ABV is used, there is no more concrete state for memory, ABV has its own memory model (which is symbolic).

When setting ABV (e.g ctx.setMode(MODE.MEMORY_ARRAY, True)), a fresh symbolic memory is created and then each store are saved as symbolic expression into the memory model of ABV. Example below.

memory_0 = {} // fresh memory.
memory_1 = store(memory_0, 0x1000, 0x41) // store 0x41 at addr 0x1000 and returns the new memory state
memory_2 = store(memory_1, 0x1001, 0x42) // store 0x42 at addr 0x1001 and returns the new memory state
memory_3 = store(memory_2, 0x1002, 0x43) // store 0x43 at addr 0x1002 and returns the new memory state
memory_4 = store(memory_3, 0x1003, 0x44) // store 0x44 at addr 0x1003 and returns the new memory state
...

So basically what happens (see [0, 1, 2, 3]):

from triton import TritonContext, ARCH, Instruction, EXCEPTION, MODE, AST_NODE


def main():
    CODE_BASE = 0x801A48

    code = [
        b"\x0c\x00\x9f\xe5",  # 0x801A48: ldr    r0, [pc, #0xc]
        b"\x00\x00\x9a\xe7",  # 0x801A4c: ldr    r0, [sl, r0]
        b"\x00\x00\xa0\xe1",  # 0x801A50: mov    r0, r0
        b"\x41\x41\x41\x41",  # 0x801A54: just data
        b"\x42\x42\x42\x42",  # 0x801A58: just data
        b"\x43\x43\x43\x43",  # 0x801A5C: just data
    ]

    ctx = TritonContext(ARCH.ARM32)

    # Define the code segment
    ctx.setConcreteMemoryAreaValue(CODE_BASE, b"".join(code))
       
    # [0] Before this point you were in BV logic, so the concrete state are stored as expected.
      
    ####################################################
   
    ctx.setMode(MODE.MEMORY_ARRAY, True)
    # [1] Now the memory model has changed, we switched to ABV, so we create a fresh memory.
   
    # [2] It returns the correct answer because you requested the concrete state but not the symbolic one.
    # From here, you have a desynchronization between your symbolic and your concrete memory state.
    print(ctx.getConcreteMemoryAreaValue(CODE_BASE, 24))

   [...]

    # [3] From this point, every interpreted `load / store` uses the ABV memory model which is fresh (so 0).
    while pc <= 0x801A50:
        op = ctx.getConcreteMemoryAreaValue(pc, 4)

        [...]

In ABV, to keep the synchro between the concrete and the symbolic state you should enable MEMORY_ARRAY before any setConcrete*.

from triton.

m4drat avatar m4drat commented on June 12, 2024

Oh, makes sense now! Thanks a lot for the clarification!

from triton.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.