Comments (8)
Thanks for the report.
What are the results of this test with DRBD 9.1.5 and 9.0.32?
What load are you applying with fio?
from drbd.
For the record, rq_state: 284000
corresponds to RQ_WRITE|RQ_UNPLUG|RQ_COMPLETION_SUSP
.
The disassembly around drbd_sender+0x208
(0x208==520) on this kernel is:
/drbd/drbd/drbd_sender.c:
3133 read_lock_irq(&connection->resource->state_rwlock);
0x00000000000100fb <+459>: mov 0x10(%rbx),%rax
0x00000000000100ff <+463>: lea 0x130(%rax),%rdi
0x0000000000010106 <+470>: callq 0x1010b <drbd_sender+475>
3134 __req_mod(req, SEND_CANCELED, peer_device, &m);
0x000000000001010b <+475>: mov %r13,%rdx
0x000000000001010e <+478>: mov $0x5,%esi
0x0000000000010113 <+483>: mov %rbp,%rdi
0x0000000000010116 <+486>: lea 0x10(%rsp),%rcx
0x000000000001011b <+491>: callq 0x10120 <drbd_sender+496>
./include/asm-generic/qrwlock.h:
117
0x0000000000010120 <+496>: mov 0x10(%rbx),%rdx
./arch/x86/include/asm/atomic.h:
177
0x0000000000010124 <+500>: mov $0xfffffe00,%eax
0x0000000000010129 <+505>: lock xadd %eax,0x130(%rdx)
./arch/x86/include/asm/paravirt.h:
804
0x0000000000010131 <+513>: callq *0x0
/drbd/drbd/drbd_sender.c:
3136 if (m.bio)
0x0000000000010138 <+520>: cmpq $0x0,0x10(%rsp)
0x000000000001013e <+526>: je 0x1014d <drbd_sender+541>
3137 complete_master_bio(device, &m);
0x0000000000010140 <+528>: lea 0x10(%rsp),%rsi
0x0000000000010145 <+533>: mov %r15,%rdi
0x0000000000010148 <+536>: callq 0x1014d <drbd_sender+541>
from drbd.
One more test to perform would be blocking both directions with iptables. This is a particularly unusual test because the network connection is only broken in one direction. Try blocking both for comparison.
from drbd.
fio cmdline:
fio --ioengine=libaio --iodepth=4 --rw=randread --bs=4k --size=10g --numjobs=1 --direct=1 --sync=1 --name=x --filename=/dev/drbd854
(or randwrite, but that is going to ext4 inside a vm on this device)
I tried the test again with blocking both directions.
Read test resulted in the same outcome, but for the write test it's different, now there's an IO error reported to the application and a stuck request, but not the logic bug (this could be due to random chance, only tried it once):
Feb 18 14:18:06 nodeC kernel: drbd u12215 nodeA: PingAck did not arrive in time.
Feb 18 14:18:06 nodeC kernel: drbd u12215 nodeA: conn( Connected -> NetworkFailure ) peer( Secondary -> Unknown )
Feb 18 14:18:06 nodeC kernel: drbd u12215/0 drbd854 nodeA: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
Feb 18 14:18:06 nodeC kernel: drbd u12215 nodeA: ack_receiver terminated
Feb 18 14:18:06 nodeC kernel: drbd u12215 nodeA: Terminating ack_recv thread
Feb 18 14:18:07 nodeC kernel: drbd u12215 nodeA: Terminating sender thread
Feb 18 14:18:07 nodeC kernel: drbd u12215 nodeA: Starting sender thread (from drbd_r_u12215 [78220])
Feb 18 14:18:07 nodeC kernel: drbd u12215/0 drbd854: sending new current UUID: CF34C6AD6BADB163
Feb 18 14:18:07 nodeC kernel: drbd u12215 nodeA: Connection closed
Feb 18 14:18:07 nodeC kernel: drbd u12215 nodeA: helper command: /sbin/drbdadm disconnected
Feb 18 14:18:07 nodeC kernel: drbd u12215 nodeA: helper command: /sbin/drbdadm disconnected exit code 0
Feb 18 14:18:07 nodeC kernel: drbd u12215 nodeA: conn( NetworkFailure -> Unconnected )
Feb 18 14:18:07 nodeC kernel: drbd u12215 nodeA: Restarting receiver thread
Feb 18 14:18:07 nodeC kernel: drbd u12215 nodeA: conn( Unconnected -> Connecting )
Feb 18 14:18:26 nodeC kernel: drbd u12215 nodeB: PingAck did not arrive in time.
Feb 18 14:18:26 nodeC kernel: drbd u12215: susp-io( no -> no-disk)
Feb 18 14:18:26 nodeC kernel: drbd u12215 nodeB: conn( Connected -> NetworkFailure ) peer( Secondary -> Unknown )
Feb 18 14:18:26 nodeC kernel: drbd u12215/0 drbd854 nodeB: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
Feb 18 14:18:26 nodeC kernel: drbd u12215 nodeB: ack_receiver terminated
Feb 18 14:18:26 nodeC kernel: drbd u12215/0 drbd854 nodeA: helper command: /sbin/drbdadm pri-on-incon-degr
Feb 18 14:18:26 nodeC kernel: drbd u12215 nodeB: Terminating ack_recv thread
Feb 18 14:18:26 nodeC kernel: drbd u12215/0 drbd854 nodeA: helper command: /sbin/drbdadm pri-on-incon-degr exit code 0
Feb 18 14:18:26 nodeC kernel: drbd u12215/0 drbd854 nodeB: helper command: /sbin/drbdadm pri-on-incon-degr
Feb 18 14:18:26 nodeC kernel: drbd u12215/0 drbd854 nodeB: helper command: /sbin/drbdadm pri-on-incon-degr exit code 0
Feb 18 14:18:26 nodeC kernel: drbd u12215 nodeB: Terminating sender thread
Feb 18 14:18:26 nodeC kernel: drbd u12215 nodeB: Starting sender thread (from drbd_r_u12215 [78221])
Feb 18 14:18:27 nodeC kernel: drbd u12215 nodeB: Connection closed
Feb 18 14:18:27 nodeC kernel: drbd u12215 nodeB: helper command: /sbin/drbdadm disconnected
Feb 18 14:18:27 nodeC kernel: drbd u12215 nodeB: helper command: /sbin/drbdadm disconnected exit code 0
Feb 18 14:18:27 nodeC kernel: drbd u12215 nodeB: conn( NetworkFailure -> Unconnected )
Feb 18 14:18:27 nodeC kernel: drbd u12215 nodeB: Restarting receiver thread
Feb 18 14:18:27 nodeC kernel: drbd u12215 nodeB: conn( Unconnected -> Connecting )
Feb 18 14:18:39 nodeC kernel: drbd u12215 nodeA: Handshake to peer 10 successful: Agreed network protocol version 121
Feb 18 14:18:39 nodeC kernel: drbd u12215 nodeA: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
Feb 18 14:18:39 nodeC kernel: drbd u12215 nodeA: Starting ack_recv thread (from drbd_r_u12215 [78220])
Feb 18 14:18:39 nodeC kernel: drbd u12215: Preparing cluster-wide state change 104547609 (7->10 499/145)
Feb 18 14:18:39 nodeC kernel: drbd u12215: State change 104547609: primary_nodes=80, weak_nodes=FFFFFFFFFFFFFB7F
Feb 18 14:18:39 nodeC kernel: drbd u12215: Committing cluster-wide state change 104547609 (24ms)
Feb 18 14:18:39 nodeC kernel: drbd u12215 nodeA: conn( Connecting -> Connected ) peer( Unknown -> Secondary )
Feb 18 14:18:39 nodeC kernel: drbd u12215/0 drbd854 nodeA: pdsk( DUnknown -> Consistent ) repl( Off -> Established )
Feb 18 14:18:39 nodeC kernel: drbd u12215: susp-io( no-disk -> no)
Feb 18 14:18:39 nodeC kernel: drbd u12215/0 drbd854: IO ERROR: neither local nor remote data, sector 0+0
Feb 18 14:18:39 nodeC kernel: drbd u12215/0 drbd854 nodeA: pdsk( Consistent -> UpToDate )
Feb 18 14:18:59 nodeC kernel: drbd u12215 nodeB: Handshake to peer 11 successful: Agreed network protocol version 121
Feb 18 14:18:59 nodeC kernel: drbd u12215 nodeB: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
Feb 18 14:18:59 nodeC kernel: drbd u12215 nodeB: Starting ack_recv thread (from drbd_r_u12215 [78221])
Feb 18 14:19:00 nodeC kernel: drbd u12215: Preparing cluster-wide state change 1681449179 (7->11 499/145)
Feb 18 14:19:00 nodeC kernel: drbd u12215: State change 1681449179: primary_nodes=80, weak_nodes=FFFFFFFFFFFFF37F
Feb 18 14:19:00 nodeC kernel: drbd u12215: Committing cluster-wide state change 1681449179 (28ms)
Feb 18 14:19:00 nodeC kernel: drbd u12215 nodeB: conn( Connecting -> Connected ) peer( Unknown -> Secondary )
Feb 18 14:19:00 nodeC kernel: drbd u12215/0 drbd854 nodeB: pdsk( DUnknown -> Outdated ) repl( Off -> Established )
Feb 18 14:19:00 nodeC kernel: drbd u12215/0 drbd854 nodeB: pdsk( Outdated -> UpToDate )
Feb 18 14:19:43 nodeC kernel: drbd u12215/0 drbd854 nodeB: Remote failed to finish a request within 87433ms > ko-count (7) * timeout (60 * 0.1s)
Feb 18 14:19:43 nodeC kernel: drbd u12215 nodeB: conn( Connected -> Timeout ) peer( Secondary -> Unknown )
Feb 18 14:19:43 nodeC kernel: drbd u12215/0 drbd854 nodeB: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
Feb 18 14:19:43 nodeC kernel: drbd u12215 nodeB: ack_receiver terminated
Feb 18 14:19:43 nodeC kernel: drbd u12215 nodeB: Terminating ack_recv thread
Feb 18 14:19:43 nodeC kernel: drbd u12215 nodeB: Terminating sender thread
Feb 18 14:19:43 nodeC kernel: drbd u12215 nodeB: Starting sender thread (from drbd_r_u12215 [78221])
Feb 18 14:19:43 nodeC kernel: drbd u12215/0 drbd854: sending new current UUID: FE90848345F0258D
Feb 18 14:19:43 nodeC kernel: drbd u12215 nodeB: Connection closed
Feb 18 14:19:43 nodeC kernel: drbd u12215 nodeA: Preparing remote state change 2931506494
Feb 18 14:19:43 nodeC kernel: drbd u12215 nodeB: helper command: /sbin/drbdadm disconnected
Feb 18 14:19:43 nodeC kernel: drbd u12215 nodeB: helper command: /sbin/drbdadm disconnected exit code 0
Feb 18 14:19:43 nodeC kernel: drbd u12215 nodeB: conn( Timeout -> Unconnected )
Feb 18 14:19:43 nodeC kernel: drbd u12215 nodeB: Restarting receiver thread
Feb 18 14:19:43 nodeC kernel: drbd u12215 nodeB: conn( Unconnected -> Connecting )
Feb 18 14:19:43 nodeC kernel: drbd u12215 nodeA: Committing remote state change 2931506494 (primary_nodes=80)
Feb 18 14:19:43 nodeC kernel: drbd u12215/0 drbd854 nodeB: pdsk( DUnknown -> Outdated )
Feb 18 14:19:44 nodeC kernel: drbd u12215 nodeB: Handshake to peer 11 successful: Agreed network protocol version 121
Feb 18 14:19:44 nodeC kernel: drbd u12215 nodeB: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
Feb 18 14:19:44 nodeC kernel: drbd u12215 nodeB: Starting ack_recv thread (from drbd_r_u12215 [78221])
Feb 18 14:19:44 nodeC kernel: drbd u12215: Preparing cluster-wide state change 3841230571 (7->11 499/145)
Feb 18 14:19:44 nodeC kernel: drbd u12215: State change 3841230571: primary_nodes=80, weak_nodes=FFFFFFFFFFFFF37F
Feb 18 14:19:44 nodeC kernel: drbd u12215: Committing cluster-wide state change 3841230571 (28ms)
Feb 18 14:19:44 nodeC kernel: drbd u12215 nodeB: conn( Connecting -> Connected ) peer( Unknown -> Secondary )
Feb 18 14:19:44 nodeC kernel: drbd u12215/0 drbd854 nodeB: repl( Off -> Established )
Feb 18 14:19:44 nodeC kernel: drbd u12215/0 drbd854 nodeB: pdsk( Outdated -> Inconsistent ) resync-susp( no -> peer )
Feb 18 14:19:44 nodeC kernel: drbd u12215/0 drbd854 nodeB: pdsk( Inconsistent -> UpToDate ) resync-susp( peer -> no )
[root@nodeC ~]# cat /sys/block/drbd854/inflight
1 0
Testing symmetric blocking with 9.1.5:
version: 9.1.5 (api:2/proto:110-121)
GIT-hash: f41bc23cec1e919932ce35947f1b4e67d9e6db74 build by mockbuild@, 2021-12-18 14:03:28
Transports (api:17): tcp (9.1.5)
Read: same as before, IO error as soon as the second connection times out.
Write: similar to 9.1.6, with IO error after nodeA reconnecting, but also the logic bug:
Feb 18 14:44:33 nodeC kernel: drbd u12215 nodeA: PingAck did not arrive in time.
Feb 18 14:44:33 nodeC kernel: drbd u12215 nodeA: conn( Connected -> NetworkFailure ) peer( Secondary -> Unknown )
Feb 18 14:44:33 nodeC kernel: drbd u12215/0 drbd854 nodeA: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
Feb 18 14:44:33 nodeC kernel: drbd u12215 nodeA: ack_receiver terminated
Feb 18 14:44:33 nodeC kernel: drbd u12215 nodeA: Terminating ack_recv thread
Feb 18 14:44:33 nodeC kernel: drbd u12215 nodeA: Terminating sender thread
Feb 18 14:44:33 nodeC kernel: drbd u12215 nodeA: Starting sender thread (from drbd_r_u12215 [2970])
Feb 18 14:44:33 nodeC kernel: drbd u12215/0 drbd854: sending new current UUID: AB135B0DD32864CF
Feb 18 14:44:33 nodeC kernel: drbd u12215 nodeA: Connection closed
Feb 18 14:44:33 nodeC kernel: drbd u12215 nodeA: helper command: /sbin/drbdadm disconnected
Feb 18 14:44:33 nodeC kernel: drbd u12215 nodeA: helper command: /sbin/drbdadm disconnected exit code 0
Feb 18 14:44:33 nodeC kernel: drbd u12215 nodeA: conn( NetworkFailure -> Unconnected )
Feb 18 14:44:33 nodeC kernel: drbd u12215 nodeA: Restarting receiver thread
Feb 18 14:44:33 nodeC kernel: drbd u12215 nodeA: conn( Unconnected -> Connecting )
Feb 18 14:44:53 nodeC kernel: drbd u12215 nodeB: PingAck did not arrive in time.
Feb 18 14:44:53 nodeC kernel: drbd u12215: susp-io( no -> no-disk)
Feb 18 14:44:53 nodeC kernel: drbd u12215 nodeB: conn( Connected -> NetworkFailure ) peer( Secondary -> Unknown )
Feb 18 14:44:53 nodeC kernel: drbd u12215/0 drbd854 nodeB: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
Feb 18 14:44:53 nodeC kernel: drbd u12215 nodeB: ack_receiver terminated
Feb 18 14:44:53 nodeC kernel: drbd u12215/0 drbd854 nodeA: helper command: /sbin/drbdadm pri-on-incon-degr
Feb 18 14:44:53 nodeC kernel: drbd u12215 nodeB: Terminating ack_recv thread
Feb 18 14:44:53 nodeC kernel: drbd u12215/0 drbd854 nodeA: helper command: /sbin/drbdadm pri-on-incon-degr exit code 0
Feb 18 14:44:53 nodeC kernel: drbd u12215/0 drbd854 nodeB: helper command: /sbin/drbdadm pri-on-incon-degr
Feb 18 14:44:53 nodeC kernel: drbd u12215/0 drbd854 nodeB: helper command: /sbin/drbdadm pri-on-incon-degr exit code 0
Feb 18 14:44:53 nodeC kernel: drbd u12215 nodeB: Terminating sender thread
Feb 18 14:44:53 nodeC kernel: drbd u12215 nodeB: Starting sender thread (from drbd_r_u12215 [2971])
Feb 18 14:44:53 nodeC kernel: drbd u12215 nodeB: Connection closed
Feb 18 14:44:53 nodeC kernel: drbd u12215 nodeB: helper command: /sbin/drbdadm disconnected
Feb 18 14:44:53 nodeC kernel: drbd u12215 nodeB: helper command: /sbin/drbdadm disconnected exit code 0
Feb 18 14:44:53 nodeC kernel: drbd u12215 nodeB: conn( NetworkFailure -> Unconnected )
Feb 18 14:44:53 nodeC kernel: drbd u12215 nodeB: Restarting receiver thread
Feb 18 14:44:53 nodeC kernel: drbd u12215 nodeB: conn( Unconnected -> Connecting )
Feb 18 14:45:03 nodeC kernel: drbd u12215 nodeA: Handshake to peer 10 successful: Agreed network protocol version 121
Feb 18 14:45:03 nodeC kernel: drbd u12215 nodeA: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
Feb 18 14:45:03 nodeC kernel: drbd u12215 nodeA: Starting ack_recv thread (from drbd_r_u12215 [2970])
Feb 18 14:45:03 nodeC kernel: drbd u12215: Preparing cluster-wide state change 2772152590 (7->10 499/145)
Feb 18 14:45:03 nodeC kernel: drbd u12215: State change 2772152590: primary_nodes=80, weak_nodes=FFFFFFFFFFFFFB7F
Feb 18 14:45:03 nodeC kernel: drbd u12215: Committing cluster-wide state change 2772152590 (76ms)
Feb 18 14:45:03 nodeC kernel: drbd u12215 nodeA: conn( Connecting -> Connected ) peer( Unknown -> Secondary )
Feb 18 14:45:03 nodeC kernel: drbd u12215/0 drbd854 nodeA: pdsk( DUnknown -> Inconsistent ) repl( Off -> Established ) resync-susp( no -> peer )
Feb 18 14:45:03 nodeC kernel: drbd u12215: susp-io( no-disk -> no)
Feb 18 14:45:03 nodeC kernel: drbd u12215/0 drbd854: IO ERROR: neither local nor remote data, sector 4721080+24
Feb 18 14:45:03 nodeC kernel: drbd u12215/0 drbd854: IO ERROR: neither local nor remote data, sector 0+0
Feb 18 14:45:05 nodeC kernel: drbd u12215/0 drbd854 nodeA: pdsk( Inconsistent -> Outdated ) resync-susp( peer -> no )
Feb 18 14:45:25 nodeC kernel: drbd u12215 nodeB: Handshake to peer 11 successful: Agreed network protocol version 121
Feb 18 14:45:25 nodeC kernel: drbd u12215 nodeB: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
Feb 18 14:45:25 nodeC kernel: drbd u12215 nodeB: Starting ack_recv thread (from drbd_r_u12215 [2971])
Feb 18 14:45:25 nodeC kernel: drbd u12215: Preparing cluster-wide state change 1015297695 (7->11 499/145)
Feb 18 14:45:25 nodeC kernel: drbd u12215: State change 1015297695: primary_nodes=80, weak_nodes=FFFFFFFFFFFFF37F
Feb 18 14:45:25 nodeC kernel: drbd u12215: Committing cluster-wide state change 1015297695 (24ms)
Feb 18 14:45:25 nodeC kernel: drbd u12215 nodeB: conn( Connecting -> Connected ) peer( Unknown -> Secondary )
Feb 18 14:45:25 nodeC kernel: drbd u12215/0 drbd854 nodeB: pdsk( DUnknown -> Outdated ) repl( Off -> Established )
Feb 18 14:45:25 nodeC kernel: drbd u12215/0 drbd854 nodeB: pdsk( Outdated -> UpToDate )
Feb 18 14:46:11 nodeC kernel: drbd u12215/0 drbd854 nodeB: Remote failed to finish a request within 89175ms > ko-count (7) * timeout (60 * 0.1s)
Feb 18 14:46:11 nodeC kernel: drbd u12215: susp-io( no -> no-disk)
Feb 18 14:46:11 nodeC kernel: drbd u12215 nodeB: conn( Connected -> Timeout ) peer( Secondary -> Unknown )
Feb 18 14:46:11 nodeC kernel: drbd u12215/0 drbd854 nodeB: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
Feb 18 14:46:11 nodeC kernel: drbd u12215 nodeB: ack_receiver terminated
Feb 18 14:46:11 nodeC kernel: drbd u12215/0 drbd854 nodeA: helper command: /sbin/drbdadm pri-on-incon-degr
Feb 18 14:46:11 nodeC kernel: drbd u12215 nodeB: Terminating ack_recv thread
Feb 18 14:46:11 nodeC kernel: drbd u12215/0 drbd854 nodeA: helper command: /sbin/drbdadm pri-on-incon-degr exit code 0
Feb 18 14:46:11 nodeC kernel: drbd u12215/0 drbd854 nodeB: helper command: /sbin/drbdadm pri-on-incon-degr
Feb 18 14:46:11 nodeC kernel: drbd u12215/0 drbd854 nodeB: helper command: /sbin/drbdadm pri-on-incon-degr exit code 0
Feb 18 14:46:11 nodeC kernel: drbd u12215 nodeB: Terminating sender thread
Feb 18 14:46:11 nodeC kernel: drbd u12215 nodeB: Starting sender thread (from drbd_r_u12215 [2971])
Feb 18 14:46:11 nodeC kernel: drbd u12215 nodeB: Connection closed
Feb 18 14:46:11 nodeC kernel: drbd u12215 nodeA: Preparing remote state change 3864336905
Feb 18 14:46:11 nodeC kernel: drbd u12215 nodeB: helper command: /sbin/drbdadm disconnected
Feb 18 14:46:11 nodeC kernel: drbd u12215 nodeB: helper command: /sbin/drbdadm disconnected exit code 0
Feb 18 14:46:11 nodeC kernel: drbd u12215 nodeB: conn( Timeout -> Unconnected )
Feb 18 14:46:11 nodeC kernel: drbd u12215 nodeB: Restarting receiver thread
Feb 18 14:46:11 nodeC kernel: drbd u12215 nodeB: conn( Unconnected -> Connecting )
Feb 18 14:46:11 nodeC kernel: drbd u12215 nodeA: Committing remote state change 3864336905 (primary_nodes=80)
Feb 18 14:46:11 nodeC kernel: drbd u12215/0 drbd854 nodeB: pdsk( DUnknown -> Outdated )
Feb 18 14:46:12 nodeC kernel: drbd u12215 nodeB: Handshake to peer 11 successful: Agreed network protocol version 121
Feb 18 14:46:12 nodeC kernel: drbd u12215 nodeB: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
Feb 18 14:46:12 nodeC kernel: drbd u12215 nodeB: Starting ack_recv thread (from drbd_r_u12215 [2971])
Feb 18 14:46:12 nodeC kernel: drbd u12215: Preparing cluster-wide state change 3509072553 (7->11 499/145)
Feb 18 14:46:12 nodeC kernel: drbd u12215: State change 3509072553: primary_nodes=80, weak_nodes=FFFFFFFFFFFFF37F
Feb 18 14:46:12 nodeC kernel: drbd u12215: Committing cluster-wide state change 3509072553 (28ms)
Feb 18 14:46:12 nodeC kernel: drbd u12215 nodeB: conn( Connecting -> Connected ) peer( Unknown -> Secondary )
Feb 18 14:46:12 nodeC kernel: drbd u12215/0 drbd854 nodeB: repl( Off -> Established )
Feb 18 14:46:12 nodeC kernel: drbd u12215: susp-io( no-disk -> no)
Feb 18 14:46:12 nodeC kernel: drbd u12215/0 drbd854 nodeB: pdsk( Outdated -> UpToDate )
Feb 18 14:46:56 nodeC kernel: drbd u12215/0 drbd854 nodeB: Remote failed to finish a request within 134231ms > ko-count (7) * timeout (60 * 0.1s)
Feb 18 14:46:56 nodeC kernel: drbd u12215: susp-io( no -> no-disk)
Feb 18 14:46:56 nodeC kernel: drbd u12215 nodeB: conn( Connected -> Timeout ) peer( Secondary -> Unknown )
Feb 18 14:46:56 nodeC kernel: drbd u12215/0 drbd854 nodeB: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
Feb 18 14:46:56 nodeC kernel: drbd u12215 nodeB: ack_receiver terminated
Feb 18 14:46:56 nodeC kernel: drbd u12215/0 drbd854 nodeA: helper command: /sbin/drbdadm pri-on-incon-degr
Feb 18 14:46:56 nodeC kernel: drbd u12215 nodeB: Terminating ack_recv thread
Feb 18 14:46:56 nodeC kernel: drbd u12215/0 drbd854 nodeA: helper command: /sbin/drbdadm pri-on-incon-degr exit code 0
Feb 18 14:46:56 nodeC kernel: drbd u12215/0 drbd854 nodeB: helper command: /sbin/drbdadm pri-on-incon-degr
Feb 18 14:46:56 nodeC kernel: drbd u12215/0 drbd854 nodeB: helper command: /sbin/drbdadm pri-on-incon-degr exit code 0
Feb 18 14:46:56 nodeC kernel: drbd u12215/0 drbd854: drbd_req_destroy: Logic BUG rq_state: 284000, completion_ref = 1
Feb 18 14:47:24 nodeC kernel: watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [drbd_s_u12215:4007]
Feb 18 14:47:24 nodeC kernel: Modules linked in: dm_mod vhost_net vhost vhost_iotlb tap tun drbd_transport_tcp(OE) drbd(OE) nfsv3 nfs_acl nfs lockd grace fscache netconsole ixgbe mdio dca mlx4_ib mlx4_en nf_defrag_ipv6 nf_defrag_ipv4 bridge stp llc mlx4_core nft_compat nft_counter nf_tables nfnetlink rpcrdma rdma_ucm
b_srpt ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm ib_uverbs ib_core sunrpc vfat fat intel_rapl_msr iTCO_wdt iTCO_vendor_support mxm_wmi dcdbas intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct1
0dif_pclmul crc32_pclmul mgag200 ghash_clmulni_intel i2c_algo_bit rapl drm_kms_helper intel_cstate syscopyarea intel_uncore sysfillrect cdc_ether sysimgblt pcspkr fb_sys_fops usbnet drm mii ipmi_ssif lpc_ich tg3 ipmi_si ipmi_devintf ipmi_msghandler mei_me mei wmi acpi_power_meter xfs libcrc32c raid1 sr_mod cdrom sd_mo
d t10_pi sg ahci libahci crc32c_intel libata megaraid_sas
Feb 18 14:47:24 nodeC kernel: [last unloaded: mlx4_core]
Feb 18 14:47:24 nodeC kernel: CPU: 0 PID: 4007 Comm: drbd_s_u12215 Tainted: G OE --------- - - 4.18.0-348.12.2.el8_5.x86_64 #1
Feb 18 14:47:24 nodeC kernel: Hardware name: Dell Inc. PowerEdge R730/0WCJNT, BIOS 2.13.0 05/14/2021
Feb 18 14:47:24 nodeC kernel: RIP: 0010:drbd_sender+0x208/0x3c0 [drbd]
Feb 18 14:47:24 nodeC kernel: Code: be 07 00 00 00 48 89 ef 48 8d 4c 24 10 e8 70 57 01 00 48 8b 53 10 b8 00 fe ff ff f0 0f c1 82 30 01 00 00 fb 66 0f 1f 44 00 00 <48> 83 7c 24 10 00 74 0d 48 8d 74 24 10 4c 89 ff e8 53 56 01 00 48
Feb 18 14:47:24 nodeC kernel: RSP: 0018:ffffb00f5a7ffe80 EFLAGS: 00000247 ORIG_RAX: ffffffffffffff13
Feb 18 14:47:24 nodeC kernel: RAX: 0000000000000200 RBX: ffffa0a6d63c9000 RCX: 000000000000031c
Feb 18 14:47:24 nodeC kernel: RDX: ffffa0a8f82fa800 RSI: 0000000000284000 RDI: ffffa0aaab9f9280
Feb 18 14:47:24 nodeC kernel: RBP: ffffa0aaab9f90c0 R08: ffffa0aaab9f90d6 R09: 0000000000000000
Feb 18 14:47:24 nodeC kernel: R10: ffffa0a6d63c9000 R11: ffffa0a6d63c9050 R12: ffffa0a6d63c94f0
Feb 18 14:47:24 nodeC kernel: R13: ffffa0a8f82f9000 R14: ffffa0a6d63c9048 R15: ffffa0a6d02cd800
Feb 18 14:47:24 nodeC kernel: FS: 0000000000000000(0000) GS:ffffa0a53f600000(0000) knlGS:0000000000000000
Feb 18 14:47:24 nodeC kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 18 14:47:24 nodeC kernel: CR2: 000000c00218d000 CR3: 0000007ffb610006 CR4: 00000000003726f0
Feb 18 14:47:24 nodeC kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Feb 18 14:47:24 nodeC kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Feb 18 14:47:24 nodeC kernel: Call Trace:
Feb 18 14:47:24 nodeC kernel: ? __drbd_next_peer_device_ref+0x140/0x140 [drbd]
Feb 18 14:47:24 nodeC kernel: drbd_thread_setup+0x5e/0x160 [drbd]
Feb 18 14:47:24 nodeC kernel: ? __drbd_next_peer_device_ref+0x140/0x140 [drbd]
Feb 18 14:47:24 nodeC kernel: kthread+0x116/0x130
Feb 18 14:47:24 nodeC kernel: ? kthread_flush_work_fn+0x10/0x10
Feb 18 14:47:24 nodeC kernel: ret_from_fork+0x35/0x40
With 9.0.32 (only on nodeC, A/B are still 9.1.6):
Read: as before, IO error on second disconnect
Write: results in a kernel crash instead of complaining of logic bugs:
version: 9.0.32-1 (api:2/proto:86-121)
GIT-hash: 7d2933d5a3764fcc5e0bf54b71fd9cfb0363be1a build by root@nodeC, 2022-02-18 14:55:18
Transports (api:17): tcp (9.0.32-1)
2022-02-18 14:59:13 err kernel: drbd u12215 nodeA: PingAck did not arrive in time.
2022-02-18 14:59:13 info kernel: drbd u12215 nodeA: conn( Connected -> NetworkFailure ) peer( Secondary -> Unknown )
2022-02-18 14:59:13 info kernel: drbd u12215/0 drbd854 nodeA: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
2022-02-18 14:59:13 info kernel: drbd u12215 nodeA: ack_receiver terminated
2022-02-18 14:59:13 info kernel: drbd u12215 nodeA: Terminating ack_recv thread
2022-02-18 14:59:13 info kernel: drbd u12215 nodeA: Terminating sender thread
2022-02-18 14:59:13 info kernel: drbd u12215 nodeA: Starting sender thread (from drbd_r_u12215 [5386])
2022-02-18 14:59:13 info kernel: drbd u12215/0 drbd854: sending new current UUID: 762953691B1D87C3
2022-02-18 14:59:13 info kernel: drbd u12215 nodeA: Connection closed
2022-02-18 14:59:13 info kernel: drbd u12215 nodeA: helper command: /sbin/drbdadm disconnected
2022-02-18 14:59:13 info kernel: drbd u12215 nodeA: helper command: /sbin/drbdadm disconnected exit code 0
2022-02-18 14:59:13 info kernel: drbd u12215 nodeA: conn( NetworkFailure -> Unconnected )
2022-02-18 14:59:13 info kernel: drbd u12215 nodeA: Restarting receiver thread
2022-02-18 14:59:13 info kernel: drbd u12215 nodeA: conn( Unconnected -> Connecting )
2022-02-18 14:59:33 err kernel: drbd u12215 nodeB: PingAck did not arrive in time.
2022-02-18 14:59:33 info kernel: drbd u12215: susp-io( no -> no-disk)
2022-02-18 14:59:33 info kernel: drbd u12215 nodeB: conn( Connected -> NetworkFailure ) peer( Secondary -> Unknown )
2022-02-18 14:59:33 info kernel: drbd u12215/0 drbd854 nodeB: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
2022-02-18 14:59:33 info kernel: drbd u12215 nodeB: ack_receiver terminated
2022-02-18 14:59:33 info kernel: drbd u12215/0 drbd854 nodeA: helper command: /sbin/drbdadm pri-on-incon-degr
2022-02-18 14:59:33 info kernel: drbd u12215 nodeB: Terminating ack_recv thread
2022-02-18 14:59:33 info kernel: drbd u12215/0 drbd854 nodeA: helper command: /sbin/drbdadm pri-on-incon-degr exit code 0
2022-02-18 14:59:33 info kernel: drbd u12215/0 drbd854 nodeB: helper command: /sbin/drbdadm pri-on-incon-degr
2022-02-18 14:59:33 info kernel: drbd u12215/0 drbd854 nodeB: helper command: /sbin/drbdadm pri-on-incon-degr exit code 0
2022-02-18 14:59:33 info kernel: drbd u12215 nodeB: Terminating sender thread
2022-02-18 14:59:33 info kernel: drbd u12215 nodeB: Starting sender thread (from drbd_r_u12215 [5387])
2022-02-18 14:59:33 info kernel: drbd u12215 nodeB: Connection closed
2022-02-18 14:59:33 info kernel: drbd u12215 nodeB: helper command: /sbin/drbdadm disconnected
2022-02-18 14:59:33 info kernel: drbd u12215 nodeB: helper command: /sbin/drbdadm disconnected exit code 0
2022-02-18 14:59:33 info kernel: drbd u12215 nodeB: conn( NetworkFailure -> Unconnected )
2022-02-18 14:59:33 info kernel: drbd u12215 nodeB: Restarting receiver thread
2022-02-18 14:59:33 info kernel: drbd u12215 nodeB: conn( Unconnected -> Connecting )
2022-02-18 14:59:44 info kernel: drbd u12215 nodeA: Handshake to peer 10 successful: Agreed network protocol version 121
2022-02-18 14:59:44 info kernel: drbd u12215 nodeA: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
2022-02-18 14:59:44 info kernel: drbd u12215 nodeA: Starting ack_recv thread (from drbd_r_u12215 [5386])
2022-02-18 14:59:44 info kernel: drbd u12215: Preparing cluster-wide state change 2786458399 (7->10 499/145)
2022-02-18 14:59:44 info kernel: drbd u12215: State change 2786458399: primary_nodes=80, weak_nodes=FFFFFFFFFFFFFB7F
2022-02-18 14:59:44 info kernel: drbd u12215: Committing cluster-wide state change 2786458399 (93ms)
2022-02-18 14:59:44 info kernel: drbd u12215 nodeA: conn( Connecting -> Connected ) peer( Unknown -> Secondary )
2022-02-18 14:59:44 info kernel: drbd u12215/0 drbd854 nodeA: pdsk( DUnknown -> Inconsistent ) repl( Off -> Established ) resync-susp( no -> peer )
2022-02-18 14:59:44 info kernel: drbd u12215: susp-io( no-disk -> no)
2022-02-18 14:59:44 err kernel: drbd u12215/0 drbd854: IO ERROR: neither local nor remote data, sector 4721080+24
2022-02-18 14:59:44 err kernel: drbd u12215/0 drbd854: IO ERROR: neither local nor remote data, sector 4721080+8
2022-02-18 14:59:44 err kernel: drbd u12215/0 drbd854: IO ERROR: neither local nor remote data, sector 829328+32
2022-02-18 14:59:44 err kernel: drbd u12215/0 drbd854: IO ERROR: neither local nor remote data, sector 829328+8
2022-02-18 14:59:44 err kernel: drbd u12215/0 drbd854: IO ERROR: neither local nor remote data, sector 19657200+16
2022-02-18 14:59:44 err kernel: drbd u12215/0 drbd854: IO ERROR: neither local nor remote data, sector 19657200+16
2022-02-18 14:59:45 info kernel: drbd u12215/0 drbd854 nodeA: pdsk( Inconsistent -> Outdated ) resync-susp( peer -> no )
2022-02-18 14:59:47 err kernel: drbd u12215/0 drbd854: IO ERROR: neither local nor remote data, sector 534299+3
2022-02-18 15:00:06 info kernel: drbd u12215 nodeB: Handshake to peer 11 successful: Agreed network protocol version 121
2022-02-18 15:00:06 info kernel: drbd u12215 nodeB: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
2022-02-18 15:00:06 info kernel: drbd u12215 nodeB: Starting ack_recv thread (from drbd_r_u12215 [5387])
2022-02-18 15:00:06 info kernel: drbd u12215: Preparing cluster-wide state change 4096656875 (7->11 499/145)
2022-02-18 15:00:06 info kernel: drbd u12215: State change 4096656875: primary_nodes=80, weak_nodes=FFFFFFFFFFFFF37F
2022-02-18 15:00:06 info kernel: drbd u12215: Committing cluster-wide state change 4096656875 (24ms)
2022-02-18 15:00:06 info kernel: drbd u12215 nodeB: conn( Connecting -> Connected ) peer( Unknown -> Secondary )
2022-02-18 15:00:06 info kernel: drbd u12215/0 drbd854 nodeB: pdsk( DUnknown -> Outdated ) repl( Off -> Established )
2022-02-18 15:00:06 info kernel: drbd u12215/0 drbd854 nodeB: pdsk( Outdated -> UpToDate )
2022-02-18 15:00:52 warning kernel: drbd u12215/0 drbd854: Remote failed to finish a request within 89864ms > ko-count (7) * timeout (60 * 0.1s)
2022-02-18 15:00:52 info kernel: drbd u12215: susp-io( no -> no-disk)
2022-02-18 15:00:52 info kernel: drbd u12215 nodeB: conn( Connected -> Timeout ) peer( Secondary -> Unknown )
2022-02-18 15:00:52 info kernel: drbd u12215/0 drbd854 nodeB: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
2022-02-18 15:00:52 info kernel: drbd u12215 nodeB: ack_receiver terminated
2022-02-18 15:00:52 info kernel: drbd u12215/0 drbd854 nodeA: helper command: /sbin/drbdadm pri-on-incon-degr
2022-02-18 15:00:52 info kernel: drbd u12215 nodeB: Terminating ack_recv thread
2022-02-18 15:00:52 info kernel: drbd u12215/0 drbd854 nodeA: helper command: /sbin/drbdadm pri-on-incon-degr exit code 0
2022-02-18 15:00:52 info kernel: drbd u12215/0 drbd854 nodeB: helper command: /sbin/drbdadm pri-on-incon-degr
2022-02-18 15:00:52 info kernel: drbd u12215/0 drbd854 nodeB: helper command: /sbin/drbdadm pri-on-incon-degr exit code 0
2022-02-18 15:00:52 info kernel: drbd u12215 nodeB: Terminating sender thread
2022-02-18 15:00:52 info kernel: drbd u12215 nodeB: Starting sender thread (from drbd_r_u12215 [5387])
2022-02-18 15:00:52 info kernel: drbd u12215 nodeB: Connection closed
2022-02-18 15:00:52 info kernel: drbd u12215 nodeA: Preparing remote state change 694803547
2022-02-18 15:00:52 info kernel: drbd u12215 nodeB: helper command: /sbin/drbdadm disconnected
2022-02-18 15:00:52 info kernel: drbd u12215 nodeB: helper command: /sbin/drbdadm disconnected exit code 0
2022-02-18 15:00:52 info kernel: drbd u12215 nodeB: conn( Timeout -> Unconnected )
2022-02-18 15:00:52 info kernel: drbd u12215 nodeB: Restarting receiver thread
2022-02-18 15:00:52 info kernel: drbd u12215 nodeB: conn( Unconnected -> Connecting )
2022-02-18 15:00:52 info kernel: drbd u12215 nodeA: Committing remote state change 694803547 (primary_nodes=80)
2022-02-18 15:00:52 info kernel: drbd u12215/0 drbd854 nodeB: pdsk( DUnknown -> Outdated )
2022-02-18 15:00:53 info kernel: drbd u12215 nodeB: Handshake to peer 11 successful: Agreed network protocol version 121
2022-02-18 15:00:53 info kernel: drbd u12215 nodeB: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
2022-02-18 15:00:53 info kernel: drbd u12215 nodeB: Starting ack_recv thread (from drbd_r_u12215 [5387])
2022-02-18 15:00:53 info kernel: drbd u12215: Preparing cluster-wide state change 268496490 (7->11 499/145)
2022-02-18 15:00:53 info kernel: drbd u12215: State change 268496490: primary_nodes=80, weak_nodes=FFFFFFFFFFFFF37F
2022-02-18 15:00:53 info kernel: drbd u12215: Committing cluster-wide state change 268496490 (24ms)
2022-02-18 15:00:53 info kernel: drbd u12215 nodeB: conn( Connecting -> Connected ) peer( Unknown -> Secondary )
2022-02-18 15:00:53 info kernel: drbd u12215/0 drbd854 nodeB: repl( Off -> Established )
2022-02-18 15:00:53 info kernel: drbd u12215: susp-io( no-disk -> no)
2022-02-18 15:00:53 info kernel: drbd u12215/0 drbd854 nodeB: pdsk( Outdated -> UpToDate )
2022-02-18 15:01:37 notice [ 578.543260] drbd u12215/0 drbd854: Remote failed to finish a request within 134920ms > ko-count (7) * timeout (60 * 0.1s)
2022-02-18 15:01:37 notice [ 578.544168] drbd u12215: susp-io( no -> no-disk)
2022-02-18 15:01:37 notice [ 578.544720] drbd u12215 nodeB: conn( Connected -> Timeout ) peer( Secondary -> Unknown )
2022-02-18 15:01:37 notice [ 578.545286] drbd u12215/0 drbd854 nodeB: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
2022-02-18 15:01:37 notice [ 578.545912] drbd u12215 nodeB: ack_receiver terminated
2022-02-18 15:01:37 notice [ 578.546170] drbd u12215/0 drbd854 nodeA: helper command: /sbin/drbdadm pri-on-incon-degr
2022-02-18 15:01:37 notice [ 578.546542] drbd u12215 nodeB: Terminating ack_recv thread
2022-02-18 15:01:37 notice [ 578.550764] drbd u12215/0 drbd854 nodeA: helper command: /sbin/drbdadm pri-on-incon-degr exit code 0
2022-02-18 15:01:37 notice [ 578.551801] drbd u12215/0 drbd854 nodeB: helper command: /sbin/drbdadm pri-on-incon-degr
2022-02-18 15:01:37 notice [ 578.554057] drbd u12215/0 drbd854 nodeB: helper command: /sbin/drbdadm pri-on-incon-degr exit code 0
2022-02-18 15:01:37 notice [ 578.565163] BUG: unable to handle kernel NULL pointer dereference at 0000000000000080
2022-02-18 15:01:37 notice [ 578.565772] PGD 0 P4D 0
2022-02-18 15:01:37 notice [ 578.566340] Oops: 0000 [#1] SMP PTI
2022-02-18 15:01:37 notice [ 578.566908] CPU: 0 PID: 6261 Comm: drbd_s_u12215 Tainted: G OE --------- - - 4.18.0-348.12.2.el8_5.x86_64 #1
2022-02-18 15:01:37 notice [ 578.567564] Hardware name: Dell Inc. PowerEdge R730/0WCJNT, BIOS 2.13.0 05/14/2021
2022-02-18 15:01:37 notice [ 578.568137] RIP: 0010:tl_next_request_for_connection+0x122/0x350 [drbd]
2022-02-18 15:01:37 notice [ 578.568748] Code: 85 db 48 0f 44 dd 49 8b 44 24 10 48 8b 55 58 48 05 48 01 00 00 48 8d 6a a8 48 39 c2 0f 84 25 01 00 00 4c 8b 75 00 48 8b 3c 24 <49> 63 b6 80 00 00 00 e8 12 4a 1b d2 4c 63 88 84 00 00 00 49 89 c7
2022-02-18 15:01:37 notice [ 578.569928] RSP: 0018:ffffb5eed9943db0 EFLAGS: 00010087
2022-02-18 15:01:37 notice [ 578.570544] RAX: ffff8dbc49799148 RBX: ffff8db869166b50 RCX: 0000000000000000
2022-02-18 15:01:37 notice [ 578.571106] RDX: ffff8db869166ba8 RSI: 0000000000000000 RDI: ffff8db819a07048
2022-02-18 15:01:37 notice [ 578.571722] RBP: ffff8db869166b50 R08: ffff8db67f61ab80 R09: 000000000000000b
2022-02-18 15:01:37 notice [ 578.572301] R10: 0000000000000000 R11: 0000000000008000 R12: ffff8db819a07000
2022-02-18 15:01:37 notice [ 578.572889] R13: 0000000000000000 R14: 0000000000000000 R15: ffff8db92aa38800
2022-02-18 15:01:37 notice [ 578.573474] FS: 0000000000000000(0000) GS:ffff8db67f600000(0000) knlGS:0000000000000000
2022-02-18 15:01:37 notice [ 578.574047] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2022-02-18 15:01:37 notice [ 578.574661] CR2: 0000000000000080 CR3: 000000b6c0810003 CR4: 00000000003726f0
2022-02-18 15:01:37 notice [ 578.575216] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
2022-02-18 15:01:37 notice [ 578.575824] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
2022-02-18 15:01:37 notice [ 578.576400] Call Trace:
2022-02-18 15:01:37 notice [ 578.576976] check_sender_todo+0x16/0xb0 [drbd]
2022-02-18 15:01:37 notice [ 578.577621] wait_for_sender_todo+0xde/0x290 [drbd]
2022-02-18 15:01:37 notice [ 578.578171] ? finish_wait+0x80/0x80
2022-02-18 15:01:37 notice [ 578.578775] drbd_sender+0x301/0x3f0 [drbd]
2022-02-18 15:01:37 notice [ 578.579352] ? __drbd_next_peer_device_ref+0x140/0x140 [drbd]
2022-02-18 15:01:37 notice [ 578.579944] drbd_thread_setup+0x5e/0x160 [drbd]
2022-02-18 15:01:37 notice [ 578.580554] ? __drbd_next_peer_device_ref+0x140/0x140 [drbd]
2022-02-18 15:01:37 notice [ 578.581102] kthread+0x116/0x130
2022-02-18 15:01:37 notice [ 578.581709] ? kthread_flush_work_fn+0x10/0x10
2022-02-18 15:01:37 notice [ 578.582258] ret_from_fork+0x35/0x40
2022-02-18 15:01:37 notice [ 578.588274] CR2: 0000000000000080
2022-02-18 15:01:37 notice [ 578.588909] ---[ end trace 0120c3e98d62a293 ]---
2022-02-18 15:01:37 notice [ 578.632554] RIP: 0010:tl_next_request_for_connection+0x122/0x350 [drbd]
2022-02-18 15:01:37 notice [ 578.633316] Code: 85 db 48 0f 44 dd 49 8b 44 24 10 48 8b 55 58 48 05 48 01 00 00 48 8d 6a a8 48 39 c2 0f 84 25 01 00 00 4c 8b 75 00 48 8b 3c 24 <49> 63 b6 80 00 00 00 e8 12 4a 1b d2 4c 63 88 84 00 00 00 49 89 c7
2022-02-18 15:01:37 notice [ 578.634634] RSP: 0018:ffffb5eed9943db0 EFLAGS: 00010087
2022-02-18 15:01:37 notice [ 578.635316] RAX: ffff8dbc49799148 RBX: ffff8db869166b50 RCX: 0000000000000000
2022-02-18 15:01:37 notice [ 578.635974] RDX: ffff8db869166ba8 RSI: 0000000000000000 RDI: ffff8db819a07048
2022-02-18 15:01:37 notice [ 578.636658] RBP: ffff8db869166b50 R08: ffff8db67f61ab80 R09: 000000000000000b
2022-02-18 15:01:37 notice [ 578.637278] R10: 0000000000000000 R11: 0000000000008000 R12: ffff8db819a07000
2022-02-18 15:01:37 notice [ 578.637933] R13: 0000000000000000 R14: 0000000000000000 R15: ffff8db92aa38800
2022-02-18 15:01:37 notice [ 578.638632] FS: 0000000000000000(0000) GS:ffff8db67f600000(0000) knlGS:0000000000000000
2022-02-18 15:01:37 notice [ 578.639292] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2022-02-18 15:01:37 notice [ 578.639909] CR2: 0000000000000080 CR3: 000000b6c0810003 CR4: 00000000003726f0
2022-02-18 15:01:37 notice [ 578.640587] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
2022-02-18 15:01:37 notice [ 578.641155] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
2022-02-18 15:01:37 notice [ 578.641766] Kernel panic - not syncing: Fatal exception
2022-02-18 15:01:37 notice [ 578.734113] Kernel Offset: 0x11a00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
2022-02-18 15:01:37 notice [ 578.777494] ---[ end Kernel panic - not syncing: Fatal exception ]---
from drbd.
I would like to finish evaluating diskless mode, but unfortunately haven't gotten past the first test. Do you need any help fixing the bugs?
from drbd.
Regarding the errors on read, it appears that on-no-data-accessible suspend-io
only applies when the resource is Primary. It is possible to access DRBD read-only without promoting to Primary. Try the test again with an explicit drbdadm primary ...
first.
Regarding the write errors, we have applied some fixes to the drbd-9.1
branch just recently that relate to the area of suspending and resuming IO. Please try the current drbd-9.1
branch.
Also, you probably want to enable quorum, since you are configuring on-no-quorum
. Set the quorum
option too.
from drbd.
@koszik Please test again with drbd-9.1.7
. Does the bug still occur with the write test?
from drbd.
Given that there has been no information to the contrary, I am assuming that this is fixed in drbd-9.1.7
.
from drbd.
Related Issues (20)
- Very low write speeds HOT 3
- Kernel Module build failure on CentOS Stream 9 5.14.0-372.el9.x86_64 HOT 3
- kernel panic list_add corruption HOT 5
- how can I install drbd HOT 1
- Unable to demote DRBD devices since it is opened by Multipath HOT 4
- Missing 9.2.6 tag HOT 1
- 9.2.6 - unknown tls symbols with Kernel 6.1.60 LTS HOT 5
- drbd_transport_rdma.ko not removed from /lib/modules
- Drbd 9.2.6: Does protocol A or B still work? HOT 4
- Kernel Panic with 9.1.15 HOT 3
- drbdadm create-md always results in "BAD LUCK, equal hashes" HOT 2
- drbd with rdma occur: Not sending flow_control mgs, no receive window!
- is it possible to serve >100 clients (diskless drbd node) with 3 ~ 5 drbd server nodes? HOT 1
- drbd 9.2.7 fails to build against Linux 6.6.13 without spatch HOT 5
- drbd 9.2.7 fails to build when executing spatch HOT 2
- Kernel Panic with 9.2.8 HOT 2
- The new kernel version compilation problem HOT 3
- ubuntu 22 drbd-dkms package not installing HOT 3
- PRIMARY timeouted after secondary node rebooted and joined cluster(minutes after) - RDMA
- drbd 9.2.7 failed to stop workqueue "drbd_pas" after systemctl stop drbd
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from drbd.