Comments (7)
Hello @CcWeapon ! This is good question, I think it is there for historical reasons. I was thinking about removing this, but the case is quite rare and I didn't found it in my code base yet. To remove this I think we will need to write a few tests just to be sure everything works fine. Right now I don't see reasons why we won't handle such cases properly.. Maybe @maksfb or @rafaelauler might tell something more about it..
from bolt.
If you reaching this line, it means your input binary has a weird function with a call to an address that is inside the same function of the call. It is not a recursive call because the call target is not the function address itself, but rather a block inside it. It is doing something like this:
func:
... instructions
basicblock-a:
.... instructions
basic-block-b:
call basicblock-a
We expect control transfer inside the same function to happen through jump/branch instructions. When we detect a call to a block inside the same function, we conservatively assume we do not understand the CFG for this function and mark it as non simple, which means we will not optimize this function or change anything there.
In X86, there are some reasons you might have a call like that. For example, to get current PC address via return address:
func:
call label
label:
pop %rax ; RAX now has label address
from bolt.
Example of code exercising this case:
// This is reduced test case for BOLT containing an internal call based on
// GetCoreDump (from google core dumper).
.text
.globl getCallback
.type getCallback, %function
getCallback:
.cfi_startproc
pushq %rbp
movq %rsp, %rbp
pushq %r12
pushq %rbx
subq $288, %rsp
callq .Lnext_instr
.Lnext_instr:
popq %rax
addq $17, %rax
addq $288, %rsp
popq %rbx
popq %r12
popq %rbp
retq
.Lweird_callback:
mov $0xDEADBEEF, %rax
retq
.cfi_endproc
For this code, we probably need to mark the function as ignored, and not as non-simple. Ignore means we will not move it.
from bolt.
I agree that the case is weird, but just wondering, but are there any real blocking things that prevents us from optimizing such things? Yes, there would be fall-through BB and we will create extra entry point, but actually it seems that everything should work fine. Or it us just an assumption that something weird asm-written might going on here and it is better not to touch such things?
from bolt.
You can find art_quick_osr_stub() in AOSP, it calls .Losr_entry in itself. If I don't set this function to simple, what error might happen?
ENTRY art_quick_osr_stub
SPILL_ALL_CALLEE_SAVE_GPRS @ Spill regs (9)
vpush {s16-s31} @ Spill fp-regs (16)
.cfi_adjust_cfa_offset 64
SAVE_SIZE=(9*4+16*4)
mov r11, sp @ Save the stack pointer
.cfi_def_cfa r11, SAVE_SIZE @ CFA = r11 + SAVE_SIZE
.cfi_remember_state
mov r10, r1 @ Save size of stack
ldr r9, [r11, #(SAVE_SIZE+4)] @ Move managed thread pointer into r9
REFRESH_MARKING_REGISTER
mov r6, r2 @ Save the pc to call
sub r7, sp, #12 @ Reserve space for stack pointer,
@ JValue* result, and ArtMethod* slot.
and r7, #0xFFFFFFF0 @ Align stack pointer
mov sp, r7 @ Update stack pointer
str r11, [sp, #4] @ Save old stack pointer
str r3, [sp, #8] @ Save JValue* result
mov ip, #0
str ip, [sp] @ Store null for ArtMethod* at bottom of frame
// r11 isn't properly spilled in the osr method, so we need use DWARF expression.
// NB: the CFI must be before the call since this is the address gdb will lookup.
// NB: gdb expects that cfa_expression returns the CFA value (not address to it).
.cfi_escape /* CFA = [sp + 4] + SAVE_SIZE */ \
0x0f, 6, /* DW_CFA_def_cfa_expression(len) */ \
0x92, 13, 4, /* DW_OP_bregx(reg,offset) */ \
0x06, /* DW_OP_deref */ \
0x23, SAVE_SIZE /* DW_OP_plus_uconst(val) */
bl .Losr_entry @ Call the method
ldr r10, [sp, #8] @ Restore JValue* result
ldr sp, [sp, #4] @ Restore saved stack pointer
.cfi_def_cfa sp, SAVE_SIZE @ CFA = sp + SAVE_SIZE
ldr r4, [sp, #SAVE_SIZE] @ load shorty
ldrb r4, [r4, #0] @ load return type
cmp r4, #68 @ Test if result type char == 'D'.
beq .Losr_fp_result
cmp r4, #70 @ Test if result type char == 'F'.
beq .Losr_fp_result
strd r0, [r10] @ Store r0/r1 into result pointer
b .Losr_exit
.Losr_fp_result:
vstr d0, [r10] @ Store s0-s1/d0 into result pointer
.Losr_exit:
vpop {s16-s31}
.cfi_adjust_cfa_offset -64
pop {r4, r5, r6, r7, r8, r9, r10, r11, pc}
.Losr_entry:
.cfi_restore_state
.cfi_def_cfa r11, SAVE_SIZE @ CFA = r11 + SAVE_SIZE
sub sp, sp, r10 @ Reserve space for callee stack
sub r10, r10, #4
str lr, [sp, r10] @ Store link register per the compiler ABI
mov r2, r10
mov r1, r0
mov r0, sp
bl memcpy @ memcpy (dest r0, src r1, bytes r2)
bx r6
END art_quick_osr_stub
from bolt.
My worry with the function I pasted is that we have an unidentified code reference that we will not update / will break if we move these functions:
callq .Lnext_instr
.Lnext_instr:
popq %rax
addq $17, %rax ; <-- 17 bytes of distance after .Lnext_instr, pointing to .Lweird_callback
addq $288, %rsp
popq %rbx
popq %r12
popq %rbp
retq
.Lweird_callback: ; <-- we can't move this block
mov $0xDEADBEEF, %rax
retq
.cfi_endproc
In the example above, the immediate $17 will not be updated by BOLT, and moving this code will break it. That's why we conservatively put this function into a block list.
from bolt.
thx~
from bolt.
Related Issues (20)
- A dump about function cannot be properly disassembled when use -use-old-text HOT 2
- perf2bolt: crashes with assertion. HOT 5
- A NullPtrException after BOLT on libart.so HOT 2
- BOLT/LLVM? does not preserve prefixes on conditional branches HOT 2
- Assert error in resolveAArch64Relocation HOT 15
- ELF32 support HOT 1
- BOLT only failed with --hugify HOT 13
- Why Is registerName Executed Twice for a Symbol? HOT 1
- How do I run BOLT on a benchmark of HHVM? HOT 1
- A Problem About .got Table Updates HOT 4
- Promote BOLT to the chromium team HOT 11
- A Problem about IsSimple be used in disassemble
- Any ideas to support riscv architecture? HOT 5
- clang14 build with -update-debug-sections rewrite debug info core HOT 14
- A fatal error about -update-debug-sections HOT 3
- Any suggest to couple two functions? HOT 11
- How to use --hugify option? HOT 20
- Failed to BOLT HOT 2
- [C++] Exceptions catch leads to Segmentation fault
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from bolt.