The sysvabi64/sysvabi64.rst document contains sample


[sysvabi64] Optional `add x16, x16, :lo12: &.got.plt[N]` for `ld -z now`

MaskRay commented on May 23, 2024

[sysvabi64] Optional `add x16, x16, :lo12: &.got.plt[N]` for `ld -z now`

from abi-aa.

Comments (5)

appujee commented on May 23, 2024

Should we create a similar bug for binutils?

MaskRay commented on May 23, 2024

I think some binutils aarch64 maintainers watch this repository, so this issue tracker may be the best place to discuss the possible change. The owners of this repository have a better idea of which binutils aarch64 maintainers should be tagged :)

nsz-arm commented on May 23, 2024

this prevents plt hooking at runtime (not just lazy binding). e.g. glibc ld.so supports LD_PROFILE and LD_AUDIT even with bind-now (then plt got is not readonly). but i think there are external tools that hook plts like ltrace (although i think it only places breakpoints on the plt, but this means it has to know the plt layout). so this is not an obvious change (might need additional elf marking).

smithp35 commented on May 23, 2024

From the perspective of just the sysvabi64 document I'd like to write down what the minimal requirements are for the PLT sequences. I think that the two extreme approaches are:

A minimalist documentation of the calling conventions for PLT[0] and PLT[N], with no requirements on the instruction sequences and no requirement that PLT[N] entries have a uniform size. This would maximise the freedom for linker implementers, but make things harder for disassembly/debugging tools, as they would need to handle more possible implementations.
A maximalist documentation of the calling conventions and at least some of the heuristics used by disassemblers/debuggers to recognise PLT entries. This has the opposite properties of the first.

I think our approach so far has tended towards the maximalist, as we've provided some dynamic tags such as DT_AARCH64_BTI_PLT and DT_AARCH64_PAC_PLT (see https://github.com/ARM-software/abi-aa/blob/main/aaelf64/aaelf64.rst#642dynamic-section).

I personally tend towards the minimalist approach in the specifications to provide more freedom for implementers who may choose different trade-offs.

I'm wondering if there is a set of properties that we could represent with a combination of dynamic tags, so that we don't have to keep introducing new ones. For example:

DT_AARCH64_PLT_SIZE = <size of each PLT entry, with a special value for variable sized entries>
DT_AARCH64_NOLAZY =
DT_AARCH64_PAC_PLT = <PLT entry expects .got.plt entries to be signed>
DT_AARCH64_OS_PLT = <PLT has OS specific behaviour described by one or more tags between DT_LOOS and DT_HIOS>

I think DT_AARCH64_BTI_PLT may not be required as DT_AARCH64_PLT_SIZE should work for that purpose.
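For readers who want to see what a tool does with such tags today, here is a minimal sketch that scans a raw little-endian `.dynamic` section for the two existing tags. The values for DT_AARCH64_BTI_PLT and DT_AARCH64_PAC_PLT are the ones published in aaelf64; the function name `plt_properties` and the returned dictionary layout are made up for illustration. The proposed tags above (DT_AARCH64_PLT_SIZE and friends) have no assigned values, so they are deliberately left out.

```python
import struct

DT_NULL = 0
DT_LOPROC = 0x70000000
# Tag values as published in aaelf64.
DT_AARCH64_BTI_PLT = DT_LOPROC + 1
DT_AARCH64_PAC_PLT = DT_LOPROC + 3


def plt_properties(dynamic: bytes) -> dict:
    """Scan little-endian Elf64_Dyn entries and report PLT-related tags.

    Hypothetical helper: each entry is a signed 64-bit d_tag followed by
    an unsigned 64-bit d_val/d_ptr; scanning stops at DT_NULL.
    """
    props = {"bti_plt": False, "pac_plt": False}
    for off in range(0, len(dynamic) - 15, 16):
        d_tag, _d_val = struct.unpack_from("<qQ", dynamic, off)
        if d_tag == DT_NULL:
            break
        if d_tag == DT_AARCH64_BTI_PLT:
            props["bti_plt"] = True
        elif d_tag == DT_AARCH64_PAC_PLT:
            props["pac_plt"] = True
    return props
```

A consumer that only understands these two tags has to fall back on pattern-matching the PLT instructions for everything else, which is the cost that the proposed property tags are meant to reduce.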

For this particular case we have documented the calling convention for PLT[0], but not PLT[N] which is strongly implied by PLT[0]. I think this could be made explicit by stating that when lazy loading is permitted, ip0 (x16) contains the address of the .got.plt entry corresponding to PLT[N], contents unspecified otherwise.
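To make the role of the `add` concrete, the sketch below models what ends up in x16 for the commonly emitted `adrp`/`ldr`/`add`/`br` PLT[N] sequence, with and without the `add`. It only models the address arithmetic (4 KiB page plus low 12 bits); the function name `x16_after_plt_entry` and the example address are illustrative assumptions, not taken from any linker.

```python
PAGE_MASK = 0xFFF  # low 12 bits, i.e. the :lo12: part of an address


def x16_after_plt_entry(got_plt_slot: int, with_add: bool) -> int:
    """Model the value of x16 when PLT[N] branches (hypothetical helper).

    adrp x16, page(slot)           -> x16 = 4 KiB page of the slot
    ldr  x17, [x16, :lo12:slot]    -> reads the slot, x16 unchanged
    add  x16, x16, :lo12:slot      -> x16 = full slot address (optional)
    br   x17
    """
    x16 = got_plt_slot & ~PAGE_MASK
    if with_add:
        x16 += got_plt_slot & PAGE_MASK
    return x16


slot = 0x20F18  # example .got.plt[N] address, chosen arbitrarily
full = x16_after_plt_entry(slot, with_add=True)
page_only = x16_after_plt_entry(slot, with_add=False)
```

With the `add`, x16 holds the exact slot address that PLT[0] needs for lazy resolution; without it, x16 holds only the page, which is fine when nothing consumes x16 (the `ld -z now` case) and matches the "contents unspecified otherwise" wording.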

As an aside there are other possibilities for the PLT entries that could help:

For small ELF files where the .got.plt is within 1 MiB of the .plt, adr and ldr could be used to save an adrp.
For BTI, it doesn't have to be embedded in the PLT entry itself. There could be a thunk/stub for every indirect branch to a PLT entry that branches directly to the PLT entry, which then does not need a BTI. This reduces the number of BTIs at the expense of a slightly more expensive sequence for indirect calls.
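The 1 MiB condition in the first point comes from the signed 21-bit byte offset that `adr` encodes. A linker-side check could look roughly like the sketch below; `adr_reaches` is a hypothetical name, and how exactly the `adr` and `ldr` would be combined in the PLT entry is not spelled out in this comment, so only the range test is shown.

```python
ADR_RANGE = 1 << 20  # adr encodes a signed 21-bit byte offset: +/- 1 MiB


def adr_reaches(pc: int, target: int) -> bool:
    """Return True if `adr` at address pc can materialise target.

    Hypothetical helper: valid offsets are -1 MiB .. +1 MiB - 1 bytes.
    """
    delta = target - pc
    return -ADR_RANGE <= delta < ADR_RANGE
```

A linker would need this to hold for every PLT entry and its .got.plt slot before choosing the shorter form, otherwise it would fall back to the `adrp`-based sequence.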

Wilco1 commented on May 23, 2024

A larger issue is that PLTs have become less efficient with the added BTI, which makes them 20 bytes long and spread across multiple fetch blocks. In principle we don't need BTI in PLTs. To reduce PLT uses we could always create function addresses via a GOT load. Canonical PLTs still need a BTI. An extra thunk just containing BTI would add significant overhead, so the thunk needs to load the address from the GOT and branch (making them non-lazy).

It would be feasible to remove the ADD x16 by slightly changing the PLT: the default (unlinked) GOT entry could point to a branch associated with the PLT entry, which then branches to PLT[0]. So x16 contains a unique address relating to the PLT entry. The branch can be at the beginning/end of the PLT sequence (e.g. ADRP/LDR/BR/B) or placed after all PLTs (which would allow using BTI for canonical PLTs).
