Comments (8)

dvyukov commented on May 18, 2024

It looks like the problem is here:

#13 0x00007fffe5711624 in ?? () from /usr/lib/libnvidia-glcore.so.530.41.03
#14 0x00007ffff7a9ebb5 in start_thread (arg=<optimized out>) at pthread_create.c:444

A new thread calls into libnvidia-glcore.so, which calls calloc before tsan has initialized the thread.
The assumption is that the tsan thread function wrapper runs before any "user" code.
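To make that assumption concrete, here is a simplified C model (not TSan's actual code) of what a pthread_create interceptor does. It replaces the caller's start routine with a trampoline that sets up per-thread runtime state first, roughly the role `__tsan_thread_start_func` plays in the backtraces further down. The names `intercepted_pthread_create`, `start_trampoline`, `runtime_thread_ready` and `thread_saw_runtime` are hypothetical. A real interceptor would be named `pthread_create` itself and forward to the libc implementation. A thread started without going through the wrapper runs user code while the per-thread state is still unset, which is the situation the driver thread appears to be in.

```c
#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-thread flag standing in for the sanitizer's per-thread
 * state (TSan's real ThreadState is far more involved). Every new thread
 * starts with its own copy set to 0. */
static _Thread_local int runtime_thread_ready = 0;

/* Bundles the user's start routine so the trampoline can call it later. */
struct start_args {
    void *(*user_fn)(void *);
    void *user_arg;
};

/* Runs first on the new thread: set up per-thread state, then hand
 * control to the user's start routine. */
static void *start_trampoline(void *p) {
    struct start_args a = *(struct start_args *)p;
    free(p);
    runtime_thread_ready = 1;
    return a.user_fn(a.user_arg);
}

/* Hypothetical stand-in for the pthread_create interceptor: it swaps the
 * user's start routine for the trampoline above. */
int intercepted_pthread_create(pthread_t *t, const pthread_attr_t *attr,
                               void *(*fn)(void *), void *arg) {
    struct start_args *a = malloc(sizeof *a);
    if (a == NULL)
        return ENOMEM;
    a->user_fn = fn;
    a->user_arg = arg;
    int rc = pthread_create(t, attr, start_trampoline, a);
    if (rc != 0)
        free(a); /* the thread never started, so nobody else frees it */
    return rc;
}

/* "User" code, like the driver's calloc call: reports whether the
 * per-thread runtime state was set up before it ran. */
static void *user_thread(void *arg) {
    (void)arg;
    return (void *)(intptr_t)runtime_thread_ready;
}

/* Starts user_thread through the wrapper, or directly (bypassing it), and
 * returns what the thread observed, or -1 if thread creation failed. */
int thread_saw_runtime(int use_wrapper) {
    pthread_t t;
    void *ret = NULL;
    int rc = use_wrapper
                 ? intercepted_pthread_create(&t, NULL, user_thread, NULL)
                 : pthread_create(&t, NULL, user_thread, NULL);
    if (rc != 0)
        return -1;
    pthread_join(t, &ret);
    return (int)(intptr_t)ret;
}
```

Built with `-pthread`, `thread_saw_runtime(1)` returns 1 and `thread_saw_runtime(0)` returns 0: only the thread that went through the wrapper had its state initialized before user code ran.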

What do you have at pthread_create.c:444? How/why is it calling into libnvidia-glcore.so?

from sanitizers.

rcorre commented on May 18, 2024

What do you have at pthread_create.c:444

I'm on glibc, so pthread_create.c:444 is ret = pd->start_routine (pd->arg);: https://sourceware.org/git/?p=glibc.git;a=blob;f=nptl/pthread_create.c;h=a3619da1e216190bb4679936e105d418f683222a;hb=e6a252758cbadb13654e66e1f2445ef6f8a4dea0#l444

How/why is it calling into libnvidia-glcore.so?

I'm not sure yet; I'll keep trying to track down what spawns that thread.

It looks like there are a few threads like that:

Thread 19 (Thread 0x7fffde2d06c0 (LWP 44732) "[vkcf] Analysis"):
#0  futex_wait (private=0, expected=2, futex_word=0x7fffe6d53c68) at ../sysdeps/nptl/futex-internal.h:146
#1  __GI___lll_lock_wait (futex=futex@entry=0x7fffe6d53c68, private=0) at lowlevellock.c:49
#2  0x00007ffff7aa1efa in lll_mutex_lock_optimized (mutex=0x7fffe6d53c68) at pthread_mutex_lock.c:48
#3  ___pthread_mutex_lock (mutex=0x7fffe6d53c68) at pthread_mutex_lock.c:128
#4  0x00007fffe570ed63 in ?? () from /usr/lib/libnvidia-glcore.so.530.41.03
#5  0x00007fffe7139e31 in ?? () from /usr/lib/libGLX_nvidia.so.0
#6  0x00007fffe5710716 in ?? () from /usr/lib/libnvidia-glcore.so.530.41.03
#7  0x00007fffe5711624 in ?? () from /usr/lib/libnvidia-glcore.so.530.41.03
#8  0x00007ffff7a9ebb5 in start_thread (arg=<optimized out>) at pthread_create.c:444
#9  0x00007ffff7b20d90 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Thread 18 (Thread 0x7fffdead16c0 (LWP 44731) "[vkrt] Analysis"):
#0  futex_wait (private=0, expected=2, futex_word=0x7fffe6d53c68) at ../sysdeps/nptl/futex-internal.h:146
#1  __GI___lll_lock_wait (futex=futex@entry=0x7fffe6d53c68, private=0) at lowlevellock.c:49
#2  0x00007ffff7aa1efa in lll_mutex_lock_optimized (mutex=0x7fffe6d53c68) at pthread_mutex_lock.c:48
#3  ___pthread_mutex_lock (mutex=0x7fffe6d53c68) at pthread_mutex_lock.c:128
#4  0x00007fffe570ed63 in ?? () from /usr/lib/libnvidia-glcore.so.530.41.03
#5  0x00007fffe7139e31 in ?? () from /usr/lib/libGLX_nvidia.so.0
#6  0x00007fffe5710716 in ?? () from /usr/lib/libnvidia-glcore.so.530.41.03
#7  0x00007fffe5711624 in ?? () from /usr/lib/libnvidia-glcore.so.530.41.03
#8  0x00007ffff7a9ebb5 in start_thread (arg=<optimized out>) at pthread_create.c:444
#9  0x00007ffff7b20d90 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Thread 17 (Thread 0x7fffdf2d26c0 (LWP 44730) "godot.linuxbsd."):
#0  0x0000555555a641c1 in __sanitizer::CombinedAllocator<__sanitizer::SizeClassAllocator32<__sanitizer::AP32>, __sanitizer::LargeMmapAllocatorPtrArrayStatic>::Allocate(__sanitizer::SizeClassAllocator32LocalCache<__sanitizer::SizeClassAllocator32<__sanitizer::AP32> >*, unsigned long, unsigned long) ()
#1  0x0000555555a61d0e in __sanitizer::InternalAlloc(unsigned long, __sanitizer::SizeClassAllocator32LocalCache<__sanitizer::SizeClassAllocator32<__sanitizer::AP32> >*, unsigned long) ()
#2  0x0000555555b1e66f in __tsan::PrintCurrentStackSlow(unsigned long) ()
#3  0x0000555555b03612 in __tsan::CheckUnwind() ()
#4  0x0000555555a77ff1 in __sanitizer::CheckFailed(char const*, int, char const*, unsigned long long, unsigned long long) ()
#5  0x0000555555aa271e in __tsan::CallUserSignalHandler(__tsan::ThreadState*, bool, bool, int, __sanitizer::__sanitizer_siginfo*, void*) ()
#6  0x0000555555aa2898 in sighandler(int, __sanitizer::__sanitizer_siginfo*, void*) ()
#7  <signal handler called>
#8  0x0000555555afea14 in __tsan::user_alloc_internal(__tsan::ThreadState*, unsigned long, unsigned long, unsigned long, bool) ()
#9  0x0000555555afee6e in __tsan::user_calloc(__tsan::ThreadState*, unsigned long, unsigned long, unsigned long) ()
#10 0x0000555555ab1ec2 in calloc ()
#11 0x00007fffe57103f1 in ?? () from /usr/lib/libnvidia-glcore.so.530.41.03
#12 0x00007fffe57105e2 in ?? () from /usr/lib/libnvidia-glcore.so.530.41.03
#13 0x00007fffe5711624 in ?? () from /usr/lib/libnvidia-glcore.so.530.41.03
#14 0x00007ffff7a9ebb5 in start_thread (arg=<optimized out>) at pthread_create.c:444
#15 0x00007ffff7b20d90 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

All the other threads seem to have TSan's thread start function injected before any other code:

#19 0x0000555555a9f823 in __tsan_thread_start_func ()
#20 0x00007ffff7a9ebb5 in start_thread (arg=<optimized out>) at pthread_create.c:444
#21 0x00007ffff7b20d90 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

This is where the main thread is:

#20 0x00007fffeabacc06 in terminator_CreateDevice (physicalDevice=<optimized out>, pCreateInfo=<optimized out>, pAllocator=<optimized out>, pDevice=<optimized out>) at /usr/src/debug/vulkan-icd-loader/Vulkan-Loader-1.3.245/loader/loader.c:6079
#21 0x00007fffeaba917b in loader_create_device_chain (pd=pd@entry=0x7b0800079940, pCreateInfo=pCreateInfo@entry=0x7fffffffbdf8, pAllocator=pAllocator@entry=0x0, inst=inst@entry=0x7b840000b400, dev=dev@entry=0x7b8c00013400, callingLayer=callingLayer@entry=0x0, layerNextGDPA=<optimized out>) at /usr/src/debug/vulkan-icd-loader/Vulkan-Loader-1.3.245/loader/loader.c:5227
#22 0x00007fffeabaaae6 in loader_layer_create_device (instance=<optimized out>, physicalDevice=<optimized out>, pCreateInfo=<optimized out>, pAllocator=<optimized out>, pDevice=<optimized out>, layerGIPA=<optimized out>, nextGDPA=<optimized out>) at /usr/src/debug/vulkan-icd-loader/Vulkan-Loader-1.3.245/loader/loader.c:4602
#23 0x00007fffeabc2ccb in vkCreateDevice (physicalDevice=0x7b0800079880, pCreateInfo=pCreateInfo@entry=0x7fffffffbdf8, pAllocator=pAllocator@entry=0x0, pDevice=pDevice@entry=0x7b740000a370) at /usr/src/debug/vulkan-icd-loader/Vulkan-Loader-1.3.245/loader/trampoline.c:858
#24 0x0000555557c4067e in VulkanContext::_create_device (this=this@entry=0x7b740000a010) at drivers/vulkan/vulkan_context.cpp:1475
#25 0x0000555557c40929 in VulkanContext::_initialize_queues (this=this@entry=0x7b740000a010, p_surface=p_surface@entry=0x7b14000f9920) at drivers/vulkan/vulkan_context.cpp:1528
#26 0x0000555557c418b4 in VulkanContext::_window_create (this=0x7b740000a010, p_window_id=0, p_vsync_mode=DisplayServer::VSYNC_ENABLED, p_surface=0x7b14000f9920, p_width=1152, p_height=648) at drivers/vulkan/vulkan_context.cpp:1688
#27 0x0000555555ba5923 in VulkanContextX11::window_create (this=0x7b740000a010, p_window_id=0, p_vsync_mode=DisplayServer::VSYNC_ENABLED, p_window=16777218, p_display=<optimized out>, p_width=1152, p_height=648) at platform/linuxbsd/x11/vulkan_context_x11.cpp:56
#28 0x0000555555b52033 in DisplayServerX11::_create_window (this=<optimized out>, this@entry=0x7b7000000010, p_mode=<optimized out>, p_mode@entry=DisplayServer::WINDOW_MODE_MAXIMIZED, p_vsync_mode=<optimized out>, p_vsync_mode@entry=DisplayServer::VSYNC_ENABLED, p_flags=<optimized out>, p_flags@entry=0, p_rect=...) at platform/linuxbsd/x11/display_server_x11.cpp:5144
#29 0x0000555555b6b5bc in DisplayServerX11::DisplayServerX11 (this=0x7b7000000010, this@entry=0x7fffffffcd10, p_rendering_driver=..., p_mode=p_mode@entry=DisplayServer::WINDOW_MODE_MAXIMIZED, p_vsync_mode=p_vsync_mode@entry=DisplayServer::VSYNC_ENABLED, p_flags=p_flags@entry=0, p_position=p_position@entry=0x0, p_resolution=..., p_screen=<optimized out>, r_error=@0x7fffffffcd10: OK) at platform/linuxbsd/x11/display_server_x11.cpp:5551
#30 0x0000555555b69922 in DisplayServerX11::create_func (p_rendering_driver=..., p_mode=DisplayServer::WINDOW_MODE_MAXIMIZED, p_vsync_mode=DisplayServer::VSYNC_ENABLED, p_flags=0, p_position=0x0, p_resolution=..., p_screen=<optimized out>, r_error=<optimized out>) at platform/linuxbsd/x11/display_server_x11.cpp:4845
#31 0x000055555b18c4f5 in DisplayServer::create (p_index=<optimized out>, p_rendering_driver=..., p_mode=DisplayServer::WINDOW_MODE_MAXIMIZED, p_vsync_mode=DisplayServer::VSYNC_ENABLED, p_flags=0, p_position=0x0, p_resolution=..., p_screen=<optimized out>, r_error=<optimized out>) at servers/display_server.cpp:904
#32 0x0000555555bdd069 in Main::setup2 (p_main_tid_override=p_main_tid_override@entry=0) at main/main.cpp:2001
#33 0x0000555555bd5b61 in Main::setup (execpath=0x7fffffffde1d "/home/rcorre/src/godot/godot/bin/godot.linuxbsd.editor.x86_64.llvm.san", argc=<optimized out>, argv=<optimized out>, p_second_phase=true) at main/main.cpp:1879
#34 0x0000555555b26bf0 in main (argc=<optimized out>, argv=<optimized out>) at platform/linuxbsd/godot_linuxbsd.cpp:61

dvyukov commented on May 18, 2024

I'm on glibc, so pthread_create.c:444 is ret = pd->start_routine (pd->arg);

Oh, this is the thread start routine. So yes, libnvidia-glcore.so somehow escapes the tsan pthread_create interceptor that should initialize the thread.

Perhaps it uses a raw clone syscall or something similar to create it.
However, if they do this, the thread also won't have TLS initialized by glibc...
I can't find the source for it.

Debian says it's a "non-free" package:
https://packages.debian.org/sid/libnvidia-glcore
Does it mean there are no sources available?

rcorre commented on May 18, 2024

Does it mean there are no sources available?

Yes, unfortunately these are the proprietary NVIDIA drivers. I'll try to repro with the free Nouveau drivers instead.

rcorre commented on May 18, 2024

Turns out Nouveau doesn't support my GPU, so I can only use the proprietary drivers :(

Calinou commented on May 18, 2024

Maybe look into using lavapipe or SwiftShader, which won't use the GPU at all to render Godot. The thread synchronization issue should still occur even if software Vulkan emulation is used.

rcorre commented on May 18, 2024

Good idea @Calinou, lavapipe worked. @dvyukov, I'll leave it to you whether you want to close this. Thanks for all the help!

dvyukov commented on May 18, 2024

I don't think it's actionable on the sanitizer side, and the history will be kept, so closing for now.
