Git Product home page Git Product logo

Comments (20)

kprovost avatar kprovost commented on May 25, 2024 1

We've obtained a backtrace over all threads in a wedged charon process:
(From https://redmine.pfsense.org/issues/13014#note-35, on strongswan-5.9.8)

I am insufficiently familiar with the architecture of strongswan to work out where the bug is, but it appears that there is a cross-thread/cross-lock deadlock. I think thread 13 is hanging on the the log lock while hung up on something else, and that's causing most other threads to sit around waiting for the log lock. I can't quite work out where send_() (in thread 13) is waiting.

(gdb) thread apply all bt
--
 
Thread 17 (LWP 118732 of process 18493):
#0  0x0000000824dc8fdc in ?? () from /lib/libthr.so.3
#1  0x0000000824dd82fc in ?? () from /lib/libthr.so.3
#2  0x0000000824dd0a6c in ?? () from /lib/libthr.so.3
#3  0x000000082303005f in lock () from /usr/local/lib/ipsec/libstrongswan.so.0
#4  0x000000082301b40b in remove_ () from /usr/local/lib/ipsec/libstrongswan.so.0
#5  0x0000000823015f3c in on_write () from /usr/local/lib/ipsec/libstrongswan.so.0
#6  0x0000000859cbcc85 in _cb_enable_writer () from /usr/local/lib/ipsec/plugins/libstrongswan-vici.so
#7  0x00000008230195d3 in execute () from /usr/local/lib/ipsec/libstrongswan.so.0
#8  0x0000000823019dd7 in _cb_process_jobs () from /usr/local/lib/ipsec/libstrongswan.so.0
#9  0x000000082302f640 in thread_main () from /usr/local/lib/ipsec/libstrongswan.so.0
#10 0x0000000824dcbcca in ?? () from /lib/libthr.so.3
#11 0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x8721a3000
 
Thread 16 (LWP 118731 of process 18493):
--Type <RET> for more, q to quit, c to continue without paging--c
#0  0x0000000824dc8fdc in ?? () from /lib/libthr.so.3
#1  0x0000000824dd88a0 in ?? () from /lib/libthr.so.3
#2  0x0000000824dd3513 in ?? () from /lib/libthr.so.3
#3  0x00000008230305bf in write_lock () from /usr/local/lib/ipsec/libstrongswan.so.0
#4  0x0000000823a052ff in remove_logger () from /usr/local/lib/ipsec/libcharon.so.0
#5  0x000000082302f7eb in thread_cleanup_pop () from /usr/local/lib/ipsec/libstrongswan.so.0
#6  0x0000000823a0c3e1 in wait_for_listener () from /usr/local/lib/ipsec/libcharon.so.0
#7  0x0000000823a0bfe4 in initiate () from /usr/local/lib/ipsec/libcharon.so.0
#8  0x0000000859cc524d in _cb_initiate () from /usr/local/lib/ipsec/plugins/libstrongswan-vici.so
#9  0x0000000859cbfad9 in process_request () from /usr/local/lib/ipsec/plugins/libstrongswan-vici.so
#10 0x0000000859cc02de in _cb_inbound () from /usr/local/lib/ipsec/plugins/libstrongswan-vici.so
#11 0x0000000859cbd9ef in _cb_process_queue () from /usr/local/lib/ipsec/plugins/libstrongswan-vici.so
#12 0x00000008230195d3 in execute () from /usr/local/lib/ipsec/libstrongswan.so.0
#13 0x0000000823019dd7 in _cb_process_jobs () from /usr/local/lib/ipsec/libstrongswan.so.0
#14 0x000000082302f640 in thread_main () from /usr/local/lib/ipsec/libstrongswan.so.0
#15 0x0000000824dcbcca in ?? () from /lib/libthr.so.3
#16 0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x870388000
 
Thread 15 (LWP 118730 of process 18493):
#0  0x0000000824dc8fdc in ?? () from /lib/libthr.so.3
#1  0x0000000824dd82fc in ?? () from /lib/libthr.so.3
#2  0x0000000824dd0a6c in ?? () from /lib/libthr.so.3
#3  0x000000082303005f in lock () from /usr/local/lib/ipsec/libstrongswan.so.0
#4  0x000000082301b2ca in notify_end () from /usr/local/lib/ipsec/libstrongswan.so.0
#5  0x0000000823019608 in destroy () from /usr/local/lib/ipsec/libstrongswan.so.0
#6  0x0000000823019e47 in _cb_process_jobs () from /usr/local/lib/ipsec/libstrongswan.so.0
#7  0x000000082302f640 in thread_main () from /usr/local/lib/ipsec/libstrongswan.so.0
#8  0x0000000824dcbcca in ?? () from /lib/libthr.so.3
#9  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x86f159000
 
Thread 14 (LWP 118729 of process 18493):
#0  0x0000000824dc8fdc in ?? () from /lib/libthr.so.3
#1  0x0000000824dd8851 in ?? () from /lib/libthr.so.3
#2  0x0000000824dd32ee in ?? () from /lib/libthr.so.3
#3  0x000000082303056f in read_lock () from /usr/local/lib/ipsec/libstrongswan.so.0
#4  0x0000000823a05552 in vlog () from /usr/local/lib/ipsec/libcharon.so.0
#5  0x0000000823a054cf in log_ () from /usr/local/lib/ipsec/libcharon.so.0
#6  0x0000000857668a55 in del_sa () from /usr/local/lib/ipsec/plugins/libstrongswan-kernel-pfkey.so
#7  0x0000000823a2a942 in destroy () from /usr/local/lib/ipsec/libcharon.so.0
#8  0x0000000823a64ca1 in destroy () from /usr/local/lib/ipsec/libcharon.so.0
#9  0x0000000823a59cd7 in flush_queue () from /usr/local/lib/ipsec/libcharon.so.0
#10 0x0000000823a60d7a in process_r () from /usr/local/lib/ipsec/libcharon.so.0
#11 0x0000000823a58374 in process_request () from /usr/local/lib/ipsec/libcharon.so.0
#12 0x0000000823a575bf in process_message () from /usr/local/lib/ipsec/libcharon.so.0
#13 0x0000000823a2de78 in process_message () from /usr/local/lib/ipsec/libcharon.so.0
#14 0x0000000823a247b1 in execute () from /usr/local/lib/ipsec/libcharon.so.0
#15 0x0000000823019dd7 in _cb_process_jobs () from /usr/local/lib/ipsec/libstrongswan.so.0
#16 0x000000082302f640 in thread_main () from /usr/local/lib/ipsec/libstrongswan.so.0
#17 0x0000000824dcbcca in ?? () from /lib/libthr.so.3
#18 0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x86e384000
 
Thread 13 (LWP 118728 of process 18493):
#0  0x0000000824dc8fdc in ?? () from /lib/libthr.so.3
#1  0x0000000824dd86d0 in ?? () from /lib/libthr.so.3
#2  0x0000000824dcab4f in ?? () from /lib/libthr.so.3
#3  0x0000000859cbc84f in send_ () from /usr/local/lib/ipsec/plugins/libstrongswan-vici.so
#4  0x0000000859cc001a in raise_event () from /usr/local/lib/ipsec/plugins/libstrongswan-vici.so
#5  0x0000000859cc5556 in log_vici () from /usr/local/lib/ipsec/plugins/libstrongswan-vici.so
#6  0x0000000823a0c579 in listener_log () from /usr/local/lib/ipsec/libcharon.so.0
#7  0x0000000823a05766 in _cb_log_cb () from /usr/local/lib/ipsec/libcharon.so.0
#8  0x0000000822ff6b7c in invoke_function () from /usr/local/lib/ipsec/libstrongswan.so.0
#9  0x0000000823a0563a in vlog () from /usr/local/lib/ipsec/libcharon.so.0
#10 0x0000000823a054cf in log_ () from /usr/local/lib/ipsec/libcharon.so.0
#11 0x0000000823a60d1d in process_r () from /usr/local/lib/ipsec/libcharon.so.0
#12 0x0000000823a58374 in process_request () from /usr/local/lib/ipsec/libcharon.so.0
#13 0x0000000823a575bf in process_message () from /usr/local/lib/ipsec/libcharon.so.0
#14 0x0000000823a2de78 in process_message () from /usr/local/lib/ipsec/libcharon.so.0
#15 0x0000000823a247b1 in execute () from /usr/local/lib/ipsec/libcharon.so.0
#16 0x0000000823019dd7 in _cb_process_jobs () from /usr/local/lib/ipsec/libstrongswan.so.0
#17 0x000000082302f640 in thread_main () from /usr/local/lib/ipsec/libstrongswan.so.0
#18 0x0000000824dcbcca in ?? () from /lib/libthr.so.3
#19 0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x86d77c000
 
Thread 12 (LWP 118727 of process 18493):
#0  0x0000000824dc8fdc in ?? () from /lib/libthr.so.3
#1  0x0000000824dd88a0 in ?? () from /lib/libthr.so.3
#2  0x0000000824dd3513 in ?? () from /lib/libthr.so.3
#3  0x00000008230305bf in write_lock () from /usr/local/lib/ipsec/libstrongswan.so.0
#4  0x0000000823a052ff in remove_logger () from /usr/local/lib/ipsec/libcharon.so.0
#5  0x000000082302f7eb in thread_cleanup_pop () from /usr/local/lib/ipsec/libstrongswan.so.0
#6  0x0000000823a0c3e1 in wait_for_listener () from /usr/local/lib/ipsec/libcharon.so.0
#7  0x0000000823a0bfe4 in initiate () from /usr/local/lib/ipsec/libcharon.so.0
#8  0x0000000859cc524d in _cb_initiate () from /usr/local/lib/ipsec/plugins/libstrongswan-vici.so
#9  0x0000000859cbfad9 in process_request () from /usr/local/lib/ipsec/plugins/libstrongswan-vici.so
#10 0x0000000859cc02de in _cb_inbound () from /usr/local/lib/ipsec/plugins/libstrongswan-vici.so
#11 0x0000000859cbd9ef in _cb_process_queue () from /usr/local/lib/ipsec/plugins/libstrongswan-vici.so
#12 0x00000008230195d3 in execute () from /usr/local/lib/ipsec/libstrongswan.so.0
#13 0x0000000823019dd7 in _cb_process_jobs () from /usr/local/lib/ipsec/libstrongswan.so.0
#14 0x000000082302f640 in thread_main () from /usr/local/lib/ipsec/libstrongswan.so.0
#15 0x0000000824dcbcca in ?? () from /lib/libthr.so.3
#16 0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x86b799000
 
Thread 11 (LWP 118726 of process 18493):
#0  0x0000000824dc8fdc in ?? () from /lib/libthr.so.3
#1  0x0000000824dd88a0 in ?? () from /lib/libthr.so.3
#2  0x0000000824dd3513 in ?? () from /lib/libthr.so.3
#3  0x00000008230305bf in write_lock () from /usr/local/lib/ipsec/libstrongswan.so.0
#4  0x0000000823a052ff in remove_logger () from /usr/local/lib/ipsec/libcharon.so.0
#5  0x000000082302f7eb in thread_cleanup_pop () from /usr/local/lib/ipsec/libstrongswan.so.0
#6  0x0000000823a0c3e1 in wait_for_listener () from /usr/local/lib/ipsec/libcharon.so.0
#7  0x0000000823a0bfe4 in initiate () from /usr/local/lib/ipsec/libcharon.so.0
#8  0x0000000859cc524d in _cb_initiate () from /usr/local/lib/ipsec/plugins/libstrongswan-vici.so
#9  0x0000000859cbfad9 in process_request () from /usr/local/lib/ipsec/plugins/libstrongswan-vici.so
#10 0x0000000859cc02de in _cb_inbound () from /usr/local/lib/ipsec/plugins/libstrongswan-vici.so
#11 0x0000000859cbd9ef in _cb_process_queue () from /usr/local/lib/ipsec/plugins/libstrongswan-vici.so
#12 0x00000008230195d3 in execute () from /usr/local/lib/ipsec/libstrongswan.so.0
#13 0x0000000823019dd7 in _cb_process_jobs () from /usr/local/lib/ipsec/libstrongswan.so.0
#14 0x000000082302f640 in thread_main () from /usr/local/lib/ipsec/libstrongswan.so.0
#15 0x0000000824dcbcca in ?? () from /lib/libthr.so.3
#16 0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x86aa18000
 
Thread 10 (LWP 118725 of process 18493):
#0  0x0000000824dc8fdc in ?? () from /lib/libthr.so.3
#1  0x0000000824dd88a0 in ?? () from /lib/libthr.so.3
#2  0x0000000824dd3513 in ?? () from /lib/libthr.so.3
#3  0x00000008230305bf in write_lock () from /usr/local/lib/ipsec/libstrongswan.so.0
#4  0x0000000823a052ff in remove_logger () from /usr/local/lib/ipsec/libcharon.so.0
#5  0x000000082302f7eb in thread_cleanup_pop () from /usr/local/lib/ipsec/libstrongswan.so.0
#6  0x0000000823a0c3e1 in wait_for_listener () from /usr/local/lib/ipsec/libcharon.so.0
#7  0x0000000823a0bfe4 in initiate () from /usr/local/lib/ipsec/libcharon.so.0
#8  0x0000000859cc524d in _cb_initiate () from /usr/local/lib/ipsec/plugins/libstrongswan-vici.so
#9  0x0000000859cbfad9 in process_request () from /usr/local/lib/ipsec/plugins/libstrongswan-vici.so
#10 0x0000000859cc02de in _cb_inbound () from /usr/local/lib/ipsec/plugins/libstrongswan-vici.so
#11 0x0000000859cbd9ef in _cb_process_queue () from /usr/local/lib/ipsec/plugins/libstrongswan-vici.so
#12 0x00000008230195d3 in execute () from /usr/local/lib/ipsec/libstrongswan.so.0
#13 0x0000000823019dd7 in _cb_process_jobs () from /usr/local/lib/ipsec/libstrongswan.so.0
#14 0x000000082302f640 in thread_main () from /usr/local/lib/ipsec/libstrongswan.so.0
#15 0x0000000824dcbcca in ?? () from /lib/libthr.so.3
#16 0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x869851000
 
Thread 9 (LWP 118724 of process 18493):
#0  0x0000000824dc8fdc in ?? () from /lib/libthr.so.3
#1  0x0000000824dd88a0 in ?? () from /lib/libthr.so.3
#2  0x0000000824dd3513 in ?? () from /lib/libthr.so.3
#3  0x00000008230305bf in write_lock () from /usr/local/lib/ipsec/libstrongswan.so.0
#4  0x0000000823a052ff in remove_logger () from /usr/local/lib/ipsec/libcharon.so.0
#5  0x000000082302f7eb in thread_cleanup_pop () from /usr/local/lib/ipsec/libstrongswan.so.0
#6  0x0000000823a0c3e1 in wait_for_listener () from /usr/local/lib/ipsec/libcharon.so.0
#7  0x0000000823a0bfe4 in initiate () from /usr/local/lib/ipsec/libcharon.so.0
#8  0x0000000859cc524d in _cb_initiate () from /usr/local/lib/ipsec/plugins/libstrongswan-vici.so
#9  0x0000000859cbfad9 in process_request () from /usr/local/lib/ipsec/plugins/libstrongswan-vici.so
#10 0x0000000859cc02de in _cb_inbound () from /usr/local/lib/ipsec/plugins/libstrongswan-vici.so
#11 0x0000000859cbd9ef in _cb_process_queue () from /usr/local/lib/ipsec/plugins/libstrongswan-vici.so
#12 0x00000008230195d3 in execute () from /usr/local/lib/ipsec/libstrongswan.so.0
#13 0x0000000823019dd7 in _cb_process_jobs () from /usr/local/lib/ipsec/libstrongswan.so.0
#14 0x000000082302f640 in thread_main () from /usr/local/lib/ipsec/libstrongswan.so.0
#15 0x0000000824dcbcca in ?? () from /lib/libthr.so.3
#16 0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x868a32000
 
Thread 8 (LWP 118723 of process 18493):
#0  0x0000000824dc8fdc in ?? () from /lib/libthr.so.3
#1  0x0000000824dd82fc in ?? () from /lib/libthr.so.3
#2  0x0000000824dd0a6c in ?? () from /lib/libthr.so.3
#3  0x000000082303005f in lock () from /usr/local/lib/ipsec/libstrongswan.so.0
#4  0x000000082301b40b in remove_ () from /usr/local/lib/ipsec/libstrongswan.so.0
#5  0x0000000823015f3c in on_write () from /usr/local/lib/ipsec/libstrongswan.so.0
#6  0x0000000859cbcc85 in _cb_enable_writer () from /usr/local/lib/ipsec/plugins/libstrongswan-vici.so
#7  0x00000008230195d3 in execute () from /usr/local/lib/ipsec/libstrongswan.so.0
#8  0x0000000823019dd7 in _cb_process_jobs () from /usr/local/lib/ipsec/libstrongswan.so.0
#9  0x000000082302f640 in thread_main () from /usr/local/lib/ipsec/libstrongswan.so.0
#10 0x0000000824dcbcca in ?? () from /lib/libthr.so.3
#11 0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x8687a8000
 
Thread 7 (LWP 118722 of process 18493):
#0  0x0000000824dc8fdc in ?? () from /lib/libthr.so.3
#1  0x0000000824dd88a0 in ?? () from /lib/libthr.so.3
#2  0x0000000824dd3513 in ?? () from /lib/libthr.so.3
#3  0x00000008230305bf in write_lock () from /usr/local/lib/ipsec/libstrongswan.so.0
#4  0x0000000823a052ff in remove_logger () from /usr/local/lib/ipsec/libcharon.so.0
#5  0x000000082302f7eb in thread_cleanup_pop () from /usr/local/lib/ipsec/libstrongswan.so.0
#6  0x0000000823a0c3e1 in wait_for_listener () from /usr/local/lib/ipsec/libcharon.so.0
#7  0x0000000823a0bfe4 in initiate () from /usr/local/lib/ipsec/libcharon.so.0
#8  0x0000000859cc524d in _cb_initiate () from /usr/local/lib/ipsec/plugins/libstrongswan-vici.so
#9  0x0000000859cbfad9 in process_request () from /usr/local/lib/ipsec/plugins/libstrongswan-vici.so
#10 0x0000000859cc02de in _cb_inbound () from /usr/local/lib/ipsec/plugins/libstrongswan-vici.so
#11 0x0000000859cbd9ef in _cb_process_queue () from /usr/local/lib/ipsec/plugins/libstrongswan-vici.so
#12 0x00000008230195d3 in execute () from /usr/local/lib/ipsec/libstrongswan.so.0
#13 0x0000000823019dd7 in _cb_process_jobs () from /usr/local/lib/ipsec/libstrongswan.so.0
#14 0x000000082302f640 in thread_main () from /usr/local/lib/ipsec/libstrongswan.so.0
#15 0x0000000824dcbcca in ?? () from /lib/libthr.so.3
#16 0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x866b12000
 
Thread 6 (LWP 118721 of process 18493):
#0  0x0000000824dc8fdc in ?? () from /lib/libthr.so.3
#1  0x0000000824dd82fc in ?? () from /lib/libthr.so.3
#2  0x0000000824dd0a6c in ?? () from /lib/libthr.so.3
#3  0x000000082303005f in lock () from /usr/local/lib/ipsec/libstrongswan.so.0
#4  0x000000082301b2ca in notify_end () from /usr/local/lib/ipsec/libstrongswan.so.0
#5  0x0000000823019608 in destroy () from /usr/local/lib/ipsec/libstrongswan.so.0
#6  0x0000000823019e47 in _cb_process_jobs () from /usr/local/lib/ipsec/libstrongswan.so.0
#7  0x000000082302f640 in thread_main () from /usr/local/lib/ipsec/libstrongswan.so.0
#8  0x0000000824dcbcca in ?? () from /lib/libthr.so.3
#9  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x865c3f000
 
Thread 5 (LWP 118720 of process 18493):
#0  0x0000000824dc8fdc in ?? () from /lib/libthr.so.3
#1  0x0000000824dd86d0 in ?? () from /lib/libthr.so.3
#2  0x0000000824dcab4f in ?? () from /lib/libthr.so.3
#3  0x0000000823a2360d in send_packets () from /usr/local/lib/ipsec/libcharon.so.0
#4  0x00000008230195d3 in execute () from /usr/local/lib/ipsec/libstrongswan.so.0
#5  0x0000000823019dd7 in _cb_process_jobs () from /usr/local/lib/ipsec/libstrongswan.so.0
#6  0x000000082302f640 in thread_main () from /usr/local/lib/ipsec/libstrongswan.so.0
#7  0x0000000824dcbcca in ?? () from /lib/libthr.so.3
#8  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x864a43000
 
Thread 4 (LWP 118719 of process 18493):
#0  0x0000000824dc8fdc in ?? () from /lib/libthr.so.3
#1  0x0000000824dd8851 in ?? () from /lib/libthr.so.3
#2  0x0000000824dd32ee in ?? () from /lib/libthr.so.3
#3  0x000000082303056f in read_lock () from /usr/local/lib/ipsec/libstrongswan.so.0
#4  0x0000000823a05552 in vlog () from /usr/local/lib/ipsec/libcharon.so.0
#5  0x0000000823a054cf in log_ () from /usr/local/lib/ipsec/libcharon.so.0
#6  0x0000000823a10099 in parse_header () from /usr/local/lib/ipsec/libcharon.so.0
#7  0x0000000823a22832 in receive_packets () from /usr/local/lib/ipsec/libcharon.so.0
#8  0x00000008230195d3 in execute () from /usr/local/lib/ipsec/libstrongswan.so.0
#9  0x0000000823019dd7 in _cb_process_jobs () from /usr/local/lib/ipsec/libstrongswan.so.0
#10 0x000000082302f640 in thread_main () from /usr/local/lib/ipsec/libstrongswan.so.0
#11 0x0000000824dcbcca in ?? () from /lib/libthr.so.3
#12 0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x8639b7000
 
Thread 3 (LWP 118718 of process 18493):
#0  0x0000000824dc8fdc in ?? () from /lib/libthr.so.3
#1  0x0000000824dd8851 in ?? () from /lib/libthr.so.3
#2  0x0000000824dd32ee in ?? () from /lib/libthr.so.3
#3  0x000000082303056f in read_lock () from /usr/local/lib/ipsec/libstrongswan.so.0
#4  0x0000000823a05552 in vlog () from /usr/local/lib/ipsec/libcharon.so.0
#5  0x0000000823a0d51c in dbg_bus () from /usr/local/lib/ipsec/libcharon.so.0
#6  0x000000082301ad4f in watch () from /usr/local/lib/ipsec/libstrongswan.so.0
#7  0x00000008230195d3 in execute () from /usr/local/lib/ipsec/libstrongswan.so.0
#8  0x0000000823019dd7 in _cb_process_jobs () from /usr/local/lib/ipsec/libstrongswan.so.0
#9  0x000000082302f640 in thread_main () from /usr/local/lib/ipsec/libstrongswan.so.0
#10 0x0000000824dcbcca in ?? () from /lib/libthr.so.3
#11 0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x86363c000
 
Thread 2 (LWP 118717 of process 18493):
#0  0x0000000824dc8fdc in ?? () from /lib/libthr.so.3
#1  0x0000000824dd8851 in ?? () from /lib/libthr.so.3
#2  0x0000000824dd32ee in ?? () from /lib/libthr.so.3
#3  0x000000082303056f in read_lock () from /usr/local/lib/ipsec/libstrongswan.so.0
#4  0x0000000823a05552 in vlog () from /usr/local/lib/ipsec/libcharon.so.0
#5  0x0000000823a0d51c in dbg_bus () from /usr/local/lib/ipsec/libcharon.so.0
#6  0x000000082301a612 in schedule () from /usr/local/lib/ipsec/libstrongswan.so.0
#7  0x00000008230195d3 in execute () from /usr/local/lib/ipsec/libstrongswan.so.0
#8  0x0000000823019dd7 in _cb_process_jobs () from /usr/local/lib/ipsec/libstrongswan.so.0
#9  0x000000082302f640 in thread_main () from /usr/local/lib/ipsec/libstrongswan.so.0
#10 0x0000000824dcbcca in ?? () from /lib/libthr.so.3
#11 0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x8624db000
 
Thread 1 (LWP 101124 of process 18493):
#0  0x000000082531ed0a in _sigwaitinfo () from /lib/libc.so.7
#1  0x0000000824dd4acb in ?? () from /lib/libthr.so.3
#2  0x00000000002039a8 in run ()
#3  0x0000000000203723 in main ()
(gdb)

from strongswan.

kprovost avatar kprovost commented on May 25, 2024 1

Our user(s) report that the issue does not recur with this patch (https://redmine.pfsense.org/issues/13014#note-51).
We'll include it in the upcoming pfSense release.

from strongswan.

tobiasbrunner avatar tobiasbrunner commented on May 25, 2024 1

Great, thanks for the update. I've pushed the fix to master.

from strongswan.

Thermi avatar Thermi commented on May 25, 2024

Configure logging, install debug symbols (or don't strip the binaries), and attach a debugger when it happens. Then provide stacktraces for all threads.
Then we can say something.
From a first glance the thread doesn't provide anything like that.

from strongswan.

wickeren avatar wickeren commented on May 25, 2024

Ok, I have to say that I haven’t done this before, let alone on freebsd. I see what I can do to provide the needed information.

from strongswan.

Thermi avatar Thermi commented on May 25, 2024

Issue still relevant/company still alive/?

from strongswan.

ablizno avatar ablizno commented on May 25, 2024

Issue still relevant/company still alive/?

Yes this is still relevant. It seems to have become more widespread now. There has been more posts to the pfsense redmine, along with a ktrace dump.

https://redmine.pfsense.org/issues/13014

In addition to the thread that OP posted, which has gained more traction earlier this year. There is a newer thread with more information

https://forum.netgate.com/topic/172075/my-ipsec-service-hangs/39

I would be willing to provide any information that I can that has not already been presented in those threads.

I also saw one post with the exact same issue on OPNsense which leads me to believe the issue resides within strongswan.

from strongswan.

tobiasbrunner avatar tobiasbrunner commented on May 25, 2024

Thanks for the backtraces. I think I see how this could happen. But could you please clarify what log levels are configured for the daemon in your case?

from strongswan.

kprovost avatar kprovost commented on May 25, 2024

That should be in strongswan.conf (https://redmine.pfsense.org/attachments/4529), right?

syslog {
identifier = charon
# log everything under daemon since it ends up in the same place regardless with our syslog.conf
daemon {
ike_name = yes
dmn = 2
mgr = 2
ike = 0
chd = 0
job = 1
cfg = 2
knl = 2
net = 0
asn = 1
enc = 0
imc = 1
imv = 1
pts = 1
tls = 1
esp = 2
lib = 2
}
# disable logging under auth so logs aren't duplicated
auth {
default = -1
}
}

from strongswan.

tobiasbrunner avatar tobiasbrunner commented on May 25, 2024

That's the config on the system from which the backtraces are?

from strongswan.

kprovost avatar kprovost commented on May 25, 2024

To the best of my knowledge, yes.
These are bug reports from pfSense users and we've not been able to reproduce the issue ourselves yet.

from strongswan.

tobiasbrunner avatar tobiasbrunner commented on May 25, 2024

My analysis of the backtraces above is as follows:

  • Thread 13 is logging progress for a vici initiate() command via vici. This thread has the read lock in bus_t locked and now tries to send a log message over the vici connection via vici_dispatcher_t::raise_event() and vici_socket_t::send(). The latter has to acquire the connection entry as writer via find_entry() but that's not possible because another thread already has it checked out in this mode. So it has to wait on a condvar until the other thread is done (this is a guess as the backtrace does not explicitly show this).
  • Either thread 8 or thread 17 is the thread that has the connection entry checked out. They were triggered by previous log messages which queued a job to asynchronously call enable_writer() for the respective vici connection's stream. To react when the stream is ready for writing, the jobs register a callback with the watcher component via stream_t::on_write(). However, there they get stuck in watcher_t::remove() on the watcher mutex which is currently held by another thread.
  • Thread 3 is presumably the thread that holds the watcher mutex while it tries to log a message at the same time. This in turn requires acquiring the read lock in bus_t. Now that's usually not a problem as read locks may be held by multiple threads concurrently. However, at the same time multiple threads try to acquire the write lock in bus_t. In order to prevent writer starvation, the rwlock implementation can block further threads from acquiring read locks while writers are waiting.
  • Threads 7, 9, 10, 11, 12 and 16 all handle vici initiate() commands, which either are complete or were called with a timeout that has expired. They all try to remove their respective loggers and are waiting to acquire the write lock in bus_t.
  • So a deadlock is caused: These last threads block thread 3 from acquiring the read lock in bus_t, which, therefore, can't release the watcher mutex that's required for thread 8/17 to release the connection entry that in turn is required by thread 13 to eventually release the read lock in bus_t that would allow threads to acquire the write lock in bus_t.

What's weird, and why I asked about the log levels, is that the code in watcher.c::watch() (thread 3) only logs messages on level 2 and 3 while holding the watcher mutex (those watching ... for ... messages seen in some other user's log). So with job = 1 these log messages should not require the read lock in bus_t as there is a check in bus_t::vlog() that should cause a return before having to acquire the lock if no logger with equal or higher log level is registered. Either there is such a logger or the check fails for some reason. (Or my analysis is incorrect. Or there might be another way how a similar deadlock gets triggered even with reduced log levels - one theoretical possibility is via watcher.c::update() because if the write() call there fails for some reason, the watcher lock is held while logging on level 1, this seems not very likely, though.)

When using swanctl --initiate and the log messages on stdout are not actually used (e.g. when scripting), it should be possible to avoid this issue by disabling logging (--loglevel -1), also when initiating directly via vici's initiate() command (the default for the loglevel argument is 1). Same goes for --terminate/terminate().

I'm not yet sure what the proper fix for this will be.

  • While we could decouple the command-specific log handler from the vici connection entries relatively easily, this could cause log messages to get lost if the connection is closed before the queue with log messages can be processed (the current code adds the messages to each connection's own message queue so they are definitely sent before the final response to the command). Avoiding that would require some more complicated approach.
  • It might be possible to use more fine-grained locking for the vici connection entries in order to avoid that the thread that registers the connection's stream with the watcher blocks threads that concurrently try to queue messages.
  • We might also want to consider removing the log messages in watcher_t::watch() (not sure how useful they really are, I don't think I've ever used them to debug anything), or at least log them while not holding the mutex (not ideal as that requires enumerating the FDs multiple times, unless we write a compact log message to a buffer during the loop and log that as a single message afterwards, something like watching 12[re], 9[r], 15[rw] ...).

Might even need multiple of these changes.

from strongswan.

tobiasbrunner avatar tobiasbrunner commented on May 25, 2024

I've pushed a possible fix to the 566-vici-control-log branch. In the scenario above it would allow thread 13 to acquire the connection entry even if thread 8 or 17 currently held it while waiting for the watcher mutex. That way thread 13 could proceed and release the read lock in bus_t allowing the other threads to continue.

from strongswan.

kprovost avatar kprovost commented on May 25, 2024

Fantastic!
I'll build that into a package and ask the reporting users to verify.

from strongswan.

tobiasbrunner avatar tobiasbrunner commented on May 25, 2024

Great, let me know how that works out. If there is still some issue, we hopefully get another set of backtraces and corresponding configs.

from strongswan.

tobiasbrunner avatar tobiasbrunner commented on May 25, 2024

@kprovost By the way, what compiler version do you use to build the strongSwan package?

from strongswan.

kprovost avatar kprovost commented on May 25, 2024

We use the native compiler on FreeBSD main, which for the most recent released pfSense version (23.01) is LLVM llvmorg-14.0.5-0-gc12386ae247c.

from strongswan.

tobiasbrunner avatar tobiasbrunner commented on May 25, 2024

OK, thanks. Then the check in bus_t::vlog() should actually work (it relies on atomic built-ins, which clang basically always supported). So I still wonder how this could get triggered with the log level for job lower than 2 (thread 3 clearly skipped the check to wait on the read lock).

from strongswan.

tobiasbrunner avatar tobiasbrunner commented on May 25, 2024

Don't know why I didn't see it before, but any initiate() or terminate() vici command that hasn't set timeout to -1 will implicitly register a logger with level 4 via the controller_t component. The filtering based on the log level passed to the command only happens later in a callback in the vici plugin. Additionally, there is a filter that intents to restrict log messages to a particular connection (IKE_SA), but some messages could get through before the SA is actually initiated (the logger is registered before that). I've addressed both issues with a bunch of changes to the watcher_t component in the watcher-logging branch. It's not stuff that's necessary to fix the deadlock, though.

from strongswan.

kprovost avatar kprovost commented on May 25, 2024

Thanks for the update.

We have at least one user running the patched version (with f33cf93) promising to report back next week. No issues so far, and I expect we'll hear soon if the problem does recur.

from strongswan.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.