Git Product home page Git Product logo

Comments (21)

Amanieu avatar Amanieu commented on August 19, 2024 2

OK I figured out the root cause of the problem. It's because LLVM is clobbering the link register on the tail call:

0000000000006924 <_ZN3std9panicking11begin_panic28_$u7b$$u7b$closure$u7d$$u7d$17h3fd05c3e3e7919b0E>:
    6924:	d100c3ff 	sub	sp, sp, #0x30
    6928:	aa0003e8 	mov	x8, x0
    692c:	f940010a 	ldr	x10, [x8]
    6930:	f9400509 	ldr	x9, [x8, #8]
    6934:	f9000bea 	str	x10, [sp, #16]
    6938:	f9000fe9 	str	x9, [sp, #24]
    693c:	f9400bea 	ldr	x10, [sp, #16]
    6940:	f9400fe9 	ldr	x9, [sp, #24]
    6944:	910003e0 	mov	x0, sp
    6948:	f90003ea 	str	x10, [sp]
    694c:	f90007e9 	str	x9, [sp, #8]
    6950:	f9400903 	ldr	x3, [x8, #16]
    6954:	f00002a1 	adrp	x1, 5d000 <GCC_except_table144+0x159b4>
    6958:	91248021 	add	x1, x1, #0x920
    695c:	aa1f03e2 	mov	x2, xzr
    6960:	52800028 	mov	w8, #0x1                   	// #1
    6964:	12000104 	and	w4, w8, #0x1
    6968:	2a1f03e8 	mov	w8, wzr
    696c:	12000105 	and	w5, w8, #0x1
    6970:	940060d9 	bl	1ecd4 <_ZN3std9panicking20rust_panic_with_hook17h06b7ef7eb6d8944bE>

Note how the incoming LR value is not saved anywhere and is later clobbered by the BL instruction.

Also, my earlier statement was incorrect, _ZN3std9panicking11begin_panic28_$u7b$$u7b$closure$u7d$$u7d$17h3fd05c3e3e7919b0E does have unwinding information:

00000088 0000000000000010 00000000 CIE
  Version:               1
  Augmentation:          "zR"
  Code alignment factor: 1
  Data alignment factor: -4
  Return address column: 30
  Augmentation data:     1b
  DW_CFA_def_cfa: r31 (sp) ofs 0

0000009c 0000000000000010 00000018 FDE cie=00000088 pc=0000000000006924..0000000000006974
  DW_CFA_advance_loc: 4 to 0000000000006928
  DW_CFA_def_cfa_offset: 48

This unwind info is claiming that the return address is present in the link register (this is the default if not overridden), but this is clearly incorrect in this case since the link register has been clobbered.

So the unwinder is behaving correctly, it's just that LLVM is emitting incorrect unwind info (or that it should be preserving the caller's link register).

from rust.

saethlin avatar saethlin commented on August 19, 2024

I think this was already reported? #123686

from rust.

cuviper avatar cuviper commented on August 19, 2024

@saethlin I mentioned that one at the end:

This is similar to #123686, but that proposed fix only applies to Windows.

and AFAICS that report doesn't depend on -Cpanic=abort either.

from rust.

cuviper avatar cuviper commented on August 19, 2024

It might be a duplicate of #121817, cc @wesleywiser

from rust.

nbdd0121 avatar nbdd0121 commented on August 19, 2024

#97235 adds a lint (which isn't even executed with panic=abort), and should have no codegen changes.

The current behaviour doesn't surprise me since we neither force frame pointers, nor force unwind tables. To generate the stack trace usually what happens is that eh_frame to unwind the frames is used if it exists, otherwise frame pointers are used to continue unwind the frames. We build the standard library with -Cpanic=unwind so the eh_frame is available, but past begin_panic nothing is available anymore.

Adding -Cforce-frame-pointers=yes allows termination (still no traces beyond begin_panic, and adding -Cforce-unwind-tables=yes produces a property stack trace.

from rust.

cuviper avatar cuviper commented on August 19, 2024

#97235 adds a lint (which isn't even executed with panic=abort), and should have no codegen changes.

OK, I don't know. The bisected nightly works fine when I build it myself, which implies that it's sensitive to the particular configuration, but that makes it hard to try and narrow this further. Or if the bug only arose there because of happenstance code changes, the root cause might not be in that range at all.

from rust.

saethlin avatar saethlin commented on August 19, 2024

I can reproduce the infinite backtrace with 2020-03-01 and the target is gone in 2020-01-01 so I suspect that this bug has been in the target since we started distributing artifacts for it.

I cannot reproduce the infinite backtrace printing with -Zbuild-std. I suspect the problem has something to do with the fact that the standard library is precompiled with -Cpanic=unwind then we're linking to it with -Cpanic=abort, and I don't know how to imitate that with -Zbuild-std.

The behavior of the MIR inliner is sensitive to the stable crate hash, so if the behavior of the MIR inliner can impact whether the fatal optimization happens, it may not be possible to bisect this.

from rust.

saethlin avatar saethlin commented on August 19, 2024

The current behaviour doesn't surprise me since we neither force frame pointers, nor force unwind tables. To generate the stack trace usually what happens is that eh_frame to unwind the frames is used if it exists, otherwise frame pointers are used to continue unwind the frames. We build the standard library with -Cpanic=unwind so the eh_frame is available, but past begin_panic nothing is available anymore.

To be clear, the behavior seems wrong to me even using -Zbuild-std:

ubuntu@ip-172-31-13-98:~/demo$ RUSTFLAGS=-Cpanic=abort RUST_BACKTRACE=full cargo +nightly r -Zbuild-std=std,panic_abort --target=aarch64-unknown-linux-gnu
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.04s
     Running `target/aarch64-unknown-linux-gnu/debug/demo`
thread 'main' panicked at src/main.rs:1:13:
explicit panic
stack backtrace:
   0:     0xaaaabefe12a8 - std::backtrace_rs::backtrace::libunwind::trace::h3eb3677c648e214e
                               at /home/ubuntu/.rustup/toolchains/nightly-aarch64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/../../backtrace/src/backtrace/libunwind.rs:105:5
   1:     0xaaaabefe12a8 - std::backtrace_rs::backtrace::trace_unsynchronized::h6b663bf10ab98a13
                               at /home/ubuntu/.rustup/toolchains/nightly-aarch64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
Aborted (core dumped)

Buggy in a different way, but still buggy, so maybe it's not worth splitting hairs.

from rust.

Amanieu avatar Amanieu commented on August 19, 2024

The current behaviour doesn't surprise me since we neither force frame pointers, nor force unwind tables. To generate the stack trace usually what happens is that eh_frame to unwind the frames is used if it exists, otherwise frame pointers are used to continue unwind the frames. We build the standard library with -Cpanic=unwind so the eh_frame is available, but past begin_panic nothing is available anymore.

Shouldn't the unwinder just give up if it reaches a frame with no unwinding information? This seems to be the issue here since the loop start when the unwinder reaches the first frame compiled with panic=abort.

This target uses libgcc as the unwinder, this might be worth looking more deeply into.

A truncated backtrace is normal if everything is compiled with panic=abort since we don't emit eh_frame by default so the unwinder doesn't know how to unwind the stack. What shouldn't be happening is that the unwinder ends up in an infinite loop since that indicates invalid unwind information.

Also note that the call into panic is a non-returning tail call. This has caused issues in the past on the ARM target (#69231) since LLVM would assume the LR value is dead and leave it dangling. If this happens to hold a function pointer to std::panicking::begin_panic then this could lead to the infinite loop we are seeing.

from rust.

workingjubilee avatar workingjubilee commented on August 19, 2024

Huh? I thought AArch64 code was being compiled with frame pointers?

from rust.

apiraino avatar apiraino commented on August 19, 2024

WG-prioritization assigning priority (Zulip discussion).

@rustbot label -I-prioritize +P-high

from rust.

nbdd0121 avatar nbdd0121 commented on August 19, 2024

Shouldn't the unwinder just give up if it reaches a frame with no unwinding information?

For backtracing, libgcc does have a mechanism to use a fallback mechanism if unwind info is not available. Although I just checked that it is implemented for PPC only, so this wasn't the case for aarch64.

This unfortunately makes me unable to explain why the looping behaviour disappears when we force frame pointer though.

Huh? I thought AArch64 code was being compiled with frame pointers?

Doesn't look like it. https://github.com/rust-lang/rust/blob/master/compiler/rustc_target/src/spec/targets/aarch64_unknown_linux_gnu.rs

Apparently we have frame pointers enabled for non-leaf functions on Apple platforms, but not Linux.

from rust.

Amanieu avatar Amanieu commented on August 19, 2024

Another interesting data point. When building for aarch64-unknown-linux-musl, I get this output:

stack backtrace:
   0:           0x419218 - std::backtrace_rs::backtrace::libunwind::trace::h41b2803078d25074
                               at /rustc/c67326b063bd27ed04f306ba2e372cd92e0a8751/library/std/src/../../backtrace/src/backtrace/libunwind.rs:105:5
   1:           0x419218 - std::backtrace_rs::backtrace::trace_unsynchronized::hb38469496ae402aa
                               at /rustc/c67326b063bd27ed04f306ba2e372cd92e0a8751/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:           0x419218 - std::sys_common::backtrace::_print_fmt::h0cf53d01d01865f9
                               at /rustc/c67326b063bd27ed04f306ba2e372cd92e0a8751/library/std/src/sys_common/backtrace.rs:68:5
   3:           0x419218 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h2016a34b1ca538fd
                               at /rustc/c67326b063bd27ed04f306ba2e372cd92e0a8751/library/std/src/sys_common/backtrace.rs:44:22
   4:           0x44476c - core::fmt::rt::Argument::fmt::h96d6bfc4fa1065a9
                               at /rustc/c67326b063bd27ed04f306ba2e372cd92e0a8751/library/core/src/fmt/rt.rs:142:9
   5:           0x44476c - core::fmt::write::h3a1e5be1e1cfe579
                               at /rustc/c67326b063bd27ed04f306ba2e372cd92e0a8751/library/core/src/fmt/mod.rs:1153:17
   6:           0x4173e8 - std::io::Write::write_fmt::hcc53d7d63a73e855
                               at /rustc/c67326b063bd27ed04f306ba2e372cd92e0a8751/library/std/src/io/mod.rs:1843:15
   7:           0x419024 - std::sys_common::backtrace::_print::hae3e089c5d3e39b2
                               at /rustc/c67326b063bd27ed04f306ba2e372cd92e0a8751/library/std/src/sys_common/backtrace.rs:47:5
   8:           0x419024 - std::sys_common::backtrace::print::h1182a252cb3edf9c
                               at /rustc/c67326b063bd27ed04f306ba2e372cd92e0a8751/library/std/src/sys_common/backtrace.rs:34:9
   9:           0x41a474 - std::panicking::default_hook::{{closure}}::he5c426f80ff73911
  10:           0x41a114 - std::panicking::default_hook::h5274ad032677b37f
                               at /rustc/c67326b063bd27ed04f306ba2e372cd92e0a8751/library/std/src/panicking.rs:292:9
  11:           0x41a990 - std::panicking::rust_panic_with_hook::h45e7e3752affffd6
                               at /rustc/c67326b063bd27ed04f306ba2e372cd92e0a8751/library/std/src/panicking.rs:779:13
  12:           0x4022c8 - std::panicking::begin_panic::{{closure}}::h275b05efe23611e5
  13:           0x430254 - _Unwind_Backtrace

Note that the last line is clearly incorrect: _Unwind_Backtrace does not call begin_panic.

from rust.

Amanieu avatar Amanieu commented on August 19, 2024

More clues:

  • std::panicking::rust_panic_with_hook is built as part of the standard library with -Cpanic=unwind.
  • std::panicking::begin_panic::{{closure}} is generic and built as part of the main program with -Cpanic=abort. I checked the binary with objdump and it does not have any eh_frame data.

What should have happened is that we stop the backtrace at std::panicking::begin_panic::{{closure}} because it has no unwind info to continue further. But for some reason we attempt to unwind despite the lack of unwind info, which causes this problem. Is the backtrace unwinder automatically falling back to frame pointers?

from rust.

eggyal avatar eggyal commented on August 19, 2024

https://github.com/libunwind/libunwind/blob/05afdabf38d3fa461b7a9de80c64a6513a564d81/src/aarch64/Gstep.c#L635-L638

	  /* Procedure Call Standard for the ARM 64-bit Architecture (AArch64)
	   * specifies that the end of the frame record chain is indicated by
	   * the address zero in the address for the previous frame.
	   */

from rust.

Amanieu avatar Amanieu commented on August 19, 2024

Here is the LLVM IR for this function. I think the root of the problem is that we are assigning a personality function to this function, which forces LLVM to emit DWARF frame metadata for this function. However this is a nounwind function that only calls other nounwind functions, so LLVM clobbering the link register is perfectly valid.

I see 2 ways of fixing this:

  • Figure out why rustc is adding a personality to this function and stop doing that.
  • Change LLVM so that that attaching a personality to a function implies the uwtable attribute.
; ModuleID = '<stdin>'
source_filename = "rust_out.7d368e479ffa2ca5-cgu.0"
target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
target triple = "aarch64-unknown-linux-musl"

%"std::panicking::begin_panic::Payload<&str>" = type { %"core::option::Option<&str>" }
%"core::option::Option<&str>" = type { ptr, [1 x i64] }

@vtable.1 = external hidden unnamed_addr constant <{ ptr, [16 x i8], ptr, ptr }>, align 8

; Function Attrs: inlinehint noreturn nounwind
define hidden fastcc void @"_ZN3std9panicking11begin_panic28_$u7b$$u7b$closure$u7d$$u7d$17h275b05efe23611e5E"(ptr noalias nocapture noundef readonly align 8 dereferenceable(24) %_1) unnamed_addr #0 personality ptr @rust_eh_personality {
start:
  %_4 = alloca %"std::panicking::begin_panic::Payload<&str>", align 8
  call void @llvm.lifetime.start.p0(i64 16, ptr nonnull %_4)
  %inner.0 = load ptr, ptr %_1, align 8, !nonnull !3, !align !4, !noundef !3
  %0 = getelementptr inbounds i8, ptr %_1, i64 8
  %inner.1 = load i64, ptr %0, align 8, !noundef !3
  store ptr %inner.0, ptr %_4, align 8
  %1 = getelementptr inbounds i8, ptr %_4, i64 8
  store i64 %inner.1, ptr %1, align 8
  %2 = getelementptr inbounds i8, ptr %_1, i64 16
  %_6 = load ptr, ptr %2, align 8, !nonnull !3, !align !5, !noundef !3
  call void @_ZN3std9panicking20rust_panic_with_hook17h45e7e3752affffd6E(ptr noundef nonnull align 1 %_4, ptr noalias noundef nonnull readonly align 8 dereferenceable(24) @vtable.1, ptr noalias noundef readonly align 8 dereferenceable_or_null(48) null, ptr noalias noundef nonnull readonly align 8 dereferenceable(24) %_6, i1 noundef zeroext true, i1 noundef zeroext false) #4
  unreachable
}

; Function Attrs: nounwind
declare noundef i32 @rust_eh_personality(i32 noundef, i32 noundef, i64 noundef, ptr noundef, ptr noundef) unnamed_addr #1

; Function Attrs: noreturn nounwind
declare void @_ZN3std9panicking20rust_panic_with_hook17h45e7e3752affffd6E(ptr noundef nonnull align 1, ptr noalias noundef readonly align 8 dereferenceable(24), ptr noalias noundef readonly align 8 dereferenceable_or_null(48), ptr noalias noundef readonly align 8 dereferenceable(24), i1 noundef zeroext, i1 noundef zeroext) unnamed_addr #2

; Function Attrs: nocallback nofree nosync nounwind willreturn memory(argmem: readwrite)
declare void @llvm.lifetime.start.p0(i64 immarg, ptr nocapture) #3

attributes #0 = { inlinehint noreturn nounwind "probe-stack"="inline-asm" "target-cpu"="generic" "target-features"="+v8a" }
attributes #1 = { nounwind "probe-stack"="inline-asm" "target-cpu"="generic" "target-features"="+v8a" }
attributes #2 = { noreturn nounwind "probe-stack"="inline-asm" "target-cpu"="generic" "target-features"="+v8a" }
attributes #3 = { nocallback nofree nosync nounwind willreturn memory(argmem: readwrite) }
attributes #4 = { noreturn nounwind }

!llvm.module.flags = !{!0, !1}
!llvm.ident = !{!2}

!0 = !{i32 8, !"PIC Level", i32 2}
!1 = !{i32 7, !"PIE Level", i32 2}
!2 = !{!"rustc version 1.78.0-nightly (c67326b06 2024-03-15)"}
!3 = !{}
!4 = !{i64 1}
!5 = !{i64 8}

from rust.

nbdd0121 avatar nbdd0121 commented on August 19, 2024

I suppose we can also just emit the uwtable attribute whenever we attach a personality function?

from rust.

workingjubilee avatar workingjubilee commented on August 19, 2024

Huh? I thought AArch64 code was being compiled with frame pointers?

Doesn't look like it. https://github.com/rust-lang/rust/blob/master/compiler/rustc_target/src/spec/targets/aarch64_unknown_linux_gnu.rs

Apparently we have frame pointers enabled for non-leaf functions on Apple platforms, but not Linux.

Mmm. We probably ought to be adopting frame pointers across the board for AAPCS64 reasons. It's technically acceptable for a platform to deviate, but it should be explicit.

from rust.

davidlt avatar davidlt commented on August 19, 2024

I think this also affects riscv64 in Fedora/RISCV. Looking at the builds logs it seems to be panic_abort_doc_tests. It just runs forever slowly consuming the memory until it gets killed.

from rust.

decathorpe avatar decathorpe commented on August 19, 2024

Huh? I thought AArch64 code was being compiled with frame pointers?

To clarify, Fedora does force-enable frame pointers for x86_64 and aarch64, so this might be affecting Rust builds on Fedora.

from rust.

cuviper avatar cuviper commented on August 19, 2024

I reproduced the problem using nightlies for this report, so I don't think Fedora's configuration matters.

from rust.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.