in BinaryFunction.cpp::BinaryFunction::disassemble() <div class="snippet-clipboard

Hello <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-ur

Why skip-function when it jump to itself？ about bolt HOT 7 CLOSED

CcWeapon commented on July 3, 2024

Why skip-function when it jump to itself？

from bolt.

Comments (7)

yota9 commented on July 3, 2024

Hello @CcWeapon ! This is good question, I think it is there for historical reasons. I was thinking about removing this, but the case is quite rare and I didn't found it in my code base yet. To remove this I think we will need to write a few tests just to be sure everything works fine. Right now I don't see reasons why we won't handle such cases properly.. Maybe @maksfb or @rafaelauler might tell something more about it..

from bolt.

rafaelauler commented on July 3, 2024

If you reaching this line, it means your input binary has a weird function with a call to an address that is inside the same function of the call. It is not a recursive call because the call target is not the function address itself, but rather a block inside it. It is doing something like this:

func:
... instructions
basicblock-a:
.... instructions
basic-block-b:
call basicblock-a

We expect control transfer inside the same function to happen through jump/branch instructions. When we detect a call to a block inside the same function, we conservatively assume we do not understand the CFG for this function and mark it as non simple, which means we will not optimize this function or change anything there.

In X86, there are some reasons you might have a call like that. For example, to get current PC address via return address:

func:
call label
label:
pop %rax ; RAX now has label address

from bolt.

rafaelauler commented on July 3, 2024

Example of code exercising this case:

// This is reduced test case for BOLT containing an internal call based on
// GetCoreDump (from google core dumper).

.text
.globl getCallback
.type getCallback, %function
getCallback:
.cfi_startproc
pushq %rbp
movq %rsp, %rbp
pushq %r12
pushq %rbx
subq $288, %rsp
callq .Lnext_instr
.Lnext_instr:
popq %rax
addq $17, %rax
addq $288, %rsp
popq %rbx
popq %r12
popq %rbp
retq
.Lweird_callback:
mov $0xDEADBEEF, %rax
retq
.cfi_endproc

For this code, we probably need to mark the function as ignored, and not as non-simple. Ignore means we will not move it.

from bolt.

yota9 commented on July 3, 2024

I agree that the case is weird, but just wondering, but are there any real blocking things that prevents us from optimizing such things? Yes, there would be fall-through BB and we will create extra entry point, but actually it seems that everything should work fine. Or it us just an assumption that something weird asm-written might going on here and it is better not to touch such things?

from bolt.

CcWeapon commented on July 3, 2024

You can find art_quick_osr_stub() in AOSP, it calls .Losr_entry in itself. If I don't set this function to simple, what error might happen?

ENTRY art_quick_osr_stub
    SPILL_ALL_CALLEE_SAVE_GPRS             @ Spill regs (9)
    vpush  {s16-s31}                       @ Spill fp-regs (16)
    .cfi_adjust_cfa_offset 64
    SAVE_SIZE=(9*4+16*4)
    mov    r11, sp                         @ Save the stack pointer
    .cfi_def_cfa r11, SAVE_SIZE            @ CFA = r11 + SAVE_SIZE
    .cfi_remember_state
    mov    r10, r1                         @ Save size of stack
    ldr    r9, [r11, #(SAVE_SIZE+4)]       @ Move managed thread pointer into r9
    REFRESH_MARKING_REGISTER
    mov    r6, r2                          @ Save the pc to call
    sub    r7, sp, #12                     @ Reserve space for stack pointer,
                                           @    JValue* result, and ArtMethod* slot.
    and    r7, #0xFFFFFFF0                 @ Align stack pointer
    mov    sp, r7                          @ Update stack pointer
    str    r11, [sp, #4]                   @ Save old stack pointer
    str    r3, [sp, #8]                    @ Save JValue* result
    mov    ip, #0
    str    ip, [sp]                        @ Store null for ArtMethod* at bottom of frame
    // r11 isn't properly spilled in the osr method, so we need use DWARF expression.
    // NB: the CFI must be before the call since this is the address gdb will lookup.
    // NB: gdb expects that cfa_expression returns the CFA value (not address to it).
    .cfi_escape                            /* CFA = [sp + 4] + SAVE_SIZE */ \
      0x0f, 6,                             /* DW_CFA_def_cfa_expression(len) */ \
      0x92, 13, 4,                         /* DW_OP_bregx(reg,offset) */ \
      0x06,                                /* DW_OP_deref */ \
      0x23, SAVE_SIZE                      /* DW_OP_plus_uconst(val) */
    bl     .Losr_entry                     @ Call the method
    ldr    r10, [sp, #8]                   @ Restore JValue* result
    ldr    sp, [sp, #4]                    @ Restore saved stack pointer
    .cfi_def_cfa sp, SAVE_SIZE             @ CFA = sp + SAVE_SIZE
    ldr    r4, [sp, #SAVE_SIZE]            @ load shorty
    ldrb   r4, [r4, #0]                    @ load return type
    cmp    r4, #68                         @ Test if result type char == 'D'.
    beq    .Losr_fp_result
    cmp    r4, #70                         @ Test if result type char == 'F'.
    beq    .Losr_fp_result
    strd r0, [r10]                         @ Store r0/r1 into result pointer
    b    .Losr_exit
.Losr_fp_result:
    vstr d0, [r10]                         @ Store s0-s1/d0 into result pointer
.Losr_exit:
    vpop   {s16-s31}
    .cfi_adjust_cfa_offset -64
    pop    {r4, r5, r6, r7, r8, r9, r10, r11, pc}
.Losr_entry:
    .cfi_restore_state
    .cfi_def_cfa r11, SAVE_SIZE            @ CFA = r11 + SAVE_SIZE
    sub sp, sp, r10                        @ Reserve space for callee stack
    sub r10, r10, #4
    str lr, [sp, r10]                      @ Store link register per the compiler ABI
    mov r2, r10
    mov r1, r0
    mov r0, sp
    bl  memcpy                             @ memcpy (dest r0, src r1, bytes r2)
    bx r6
END art_quick_osr_stub

from bolt.

rafaelauler commented on July 3, 2024

My worry with the function I pasted is that we have an unidentified code reference that we will not update / will break if we move these functions:

callq .Lnext_instr
.Lnext_instr:
popq %rax
addq $17, %rax   ;  <--  17 bytes of distance after .Lnext_instr, pointing to .Lweird_callback
addq $288, %rsp
popq %rbx
popq %r12
popq %rbp
retq
.Lweird_callback:   ;  <-- we can't move this block
mov $0xDEADBEEF, %rax
retq
.cfi_endproc

In the example above, the immediate $17 will not be updated by BOLT, and moving this code will break it. That's why we conservatively put this function into a block list.

from bolt.

CcWeapon commented on July 3, 2024

thx~

from bolt.

Why skip-function when it jump to itself？ about bolt HOT 7 CLOSED

Comments (7)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent