Comments (8)

JoelColledge commented on May 21, 2024

Thanks for the report.

What are the results of this test with DRBD 9.1.5 and 9.0.32?

What load are you applying with fio?

from drbd.

JoelColledge commented on May 21, 2024

For the record, rq_state: 284000 corresponds to RQ_WRITE|RQ_UNPLUG|RQ_COMPLETION_SUSP.
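
As a quick sanity check on that value, the mask can be split into its set bits. This is a minimal sketch, assuming the logged rq_state is hexadecimal; the names set_bits and RQ_STATE are made up for this illustration. It only shows that exactly three bits are set, which is consistent with three flags. Which bit belongs to which RQ_* flag depends on the enum in the DRBD source for the version in question and is not asserted here.

```python
def set_bits(mask: int) -> list[int]:
    """Return the positions of all set bits in mask, lowest first."""
    return [pos for pos in range(mask.bit_length()) if (mask >> pos) & 1]

# Value from the "Logic BUG rq_state: 284000" message, read as hex (assumption).
RQ_STATE = 0x284000

bits = set_bits(RQ_STATE)
print(bits)                          # positions of the set bits
print([hex(1 << b) for b in bits])   # the individual single-bit masks
```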

The disassembly around drbd_sender+0x208 (0x208==520) on this kernel is:

/drbd/drbd/drbd_sender.c:
3133			read_lock_irq(&connection->resource->state_rwlock);
   0x00000000000100fb <+459>:	mov    0x10(%rbx),%rax
   0x00000000000100ff <+463>:	lea    0x130(%rax),%rdi
   0x0000000000010106 <+470>:	callq  0x1010b <drbd_sender+475>

3134			__req_mod(req, SEND_CANCELED, peer_device, &m);
   0x000000000001010b <+475>:	mov    %r13,%rdx
   0x000000000001010e <+478>:	mov    $0x5,%esi
   0x0000000000010113 <+483>:	mov    %rbp,%rdi
   0x0000000000010116 <+486>:	lea    0x10(%rsp),%rcx
   0x000000000001011b <+491>:	callq  0x10120 <drbd_sender+496>

./include/asm-generic/qrwlock.h:
117
   0x0000000000010120 <+496>:	mov    0x10(%rbx),%rdx

./arch/x86/include/asm/atomic.h:
177
   0x0000000000010124 <+500>:	mov    $0xfffffe00,%eax
   0x0000000000010129 <+505>:	lock xadd %eax,0x130(%rdx)

./arch/x86/include/asm/paravirt.h:
804
   0x0000000000010131 <+513>:	callq  *0x0

/drbd/drbd/drbd_sender.c:
3136			if (m.bio)
   0x0000000000010138 <+520>:	cmpq   $0x0,0x10(%rsp)
   0x000000000001013e <+526>:	je     0x1014d <drbd_sender+541>

3137				complete_master_bio(device, &m);
   0x0000000000010140 <+528>:	lea    0x10(%rsp),%rsi
   0x0000000000010145 <+533>:	mov    %r15,%rdi
   0x0000000000010148 <+536>:	callq  0x1014d <drbd_sender+541>
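
The symbol+offset form can be mapped onto the listing above with a little arithmetic. This is a minimal sketch, assuming the addresses are those of this particular object file; the helper name function_base is illustrative. Any line of the listing gives the start address of the function, and adding 0x208 gives the address of the instruction at drbd_sender+0x208.

```python
def function_base(address: int, offset: int) -> int:
    """Start address of a function, given one instruction's address and its <+offset>."""
    return address - offset

# Taken from the first listed instruction: 0x100fb <+459>.
base = function_base(0x100FB, 459)

# The offset quoted above: drbd_sender+0x208.
target = base + 0x208

print(hex(base), hex(target), 0x208)
```

The result lands on the cmpq at <+520>, which belongs to the "if (m.bio)" check on source line 3136.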

JoelColledge commented on May 21, 2024

One more test worth running is blocking both directions with iptables. The test as reported is unusual because the network connection is broken in only one direction, so please try blocking both for comparison.
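
For illustration, the two variants could be set up along these lines. This is only a sketch: the peer address 192.0.2.10 and the helper name drop_rules are placeholders, and the rules used in the original test are not shown in this thread. It assumes the replication traffic terminates on the host itself, so the INPUT and OUTPUT chains apply, and it arbitrarily picks the incoming direction for the one-way case. The commands are only printed, not executed.

```python
import shlex

def drop_rules(peer: str, both_directions: bool) -> list[list[str]]:
    """Build iptables commands that drop traffic from (and optionally to) peer."""
    rules = [["iptables", "-I", "INPUT", "-s", peer, "-j", "DROP"]]
    if both_directions:
        rules.append(["iptables", "-I", "OUTPUT", "-d", peer, "-j", "DROP"])
    return rules

PEER = "192.0.2.10"  # placeholder address (documentation range), not from the report

for cmd in drop_rules(PEER, both_directions=True):
    print(shlex.join(cmd))
```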

koszik commented on May 21, 2024

fio command line:

fio --ioengine=libaio --iodepth=4 --rw=randread --bs=4k --size=10g --numjobs=1 --direct=1 --sync=1 --name=x --filename=/dev/drbd854

(or --rw=randwrite; the write load, however, goes to an ext4 file system inside a VM that runs on this device)

I ran the test again with both directions blocked.

The read test had the same outcome. The write test behaved differently: an IO error is now reported to the application and a request is stuck, but the logic bug does not appear (this could be random chance; I only tried it once):

Feb 18 14:18:06 nodeC kernel: drbd u12215 nodeA: PingAck did not arrive in time.
Feb 18 14:18:06 nodeC kernel: drbd u12215 nodeA: conn( Connected -> NetworkFailure ) peer( Secondary -> Unknown )
Feb 18 14:18:06 nodeC kernel: drbd u12215/0 drbd854 nodeA: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
Feb 18 14:18:06 nodeC kernel: drbd u12215 nodeA: ack_receiver terminated
Feb 18 14:18:06 nodeC kernel: drbd u12215 nodeA: Terminating ack_recv thread
Feb 18 14:18:07 nodeC kernel: drbd u12215 nodeA: Terminating sender thread
Feb 18 14:18:07 nodeC kernel: drbd u12215 nodeA: Starting sender thread (from drbd_r_u12215 [78220])
Feb 18 14:18:07 nodeC kernel: drbd u12215/0 drbd854: sending new current UUID: CF34C6AD6BADB163
Feb 18 14:18:07 nodeC kernel: drbd u12215 nodeA: Connection closed
Feb 18 14:18:07 nodeC kernel: drbd u12215 nodeA: helper command: /sbin/drbdadm disconnected
Feb 18 14:18:07 nodeC kernel: drbd u12215 nodeA: helper command: /sbin/drbdadm disconnected exit code 0
Feb 18 14:18:07 nodeC kernel: drbd u12215 nodeA: conn( NetworkFailure -> Unconnected )
Feb 18 14:18:07 nodeC kernel: drbd u12215 nodeA: Restarting receiver thread
Feb 18 14:18:07 nodeC kernel: drbd u12215 nodeA: conn( Unconnected -> Connecting )

Feb 18 14:18:26 nodeC kernel: drbd u12215 nodeB: PingAck did not arrive in time.
Feb 18 14:18:26 nodeC kernel: drbd u12215: susp-io( no -> no-disk)
Feb 18 14:18:26 nodeC kernel: drbd u12215 nodeB: conn( Connected -> NetworkFailure ) peer( Secondary -> Unknown )
Feb 18 14:18:26 nodeC kernel: drbd u12215/0 drbd854 nodeB: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
Feb 18 14:18:26 nodeC kernel: drbd u12215 nodeB: ack_receiver terminated
Feb 18 14:18:26 nodeC kernel: drbd u12215/0 drbd854 nodeA: helper command: /sbin/drbdadm pri-on-incon-degr
Feb 18 14:18:26 nodeC kernel: drbd u12215 nodeB: Terminating ack_recv thread
Feb 18 14:18:26 nodeC kernel: drbd u12215/0 drbd854 nodeA: helper command: /sbin/drbdadm pri-on-incon-degr exit code 0
Feb 18 14:18:26 nodeC kernel: drbd u12215/0 drbd854 nodeB: helper command: /sbin/drbdadm pri-on-incon-degr
Feb 18 14:18:26 nodeC kernel: drbd u12215/0 drbd854 nodeB: helper command: /sbin/drbdadm pri-on-incon-degr exit code 0
Feb 18 14:18:26 nodeC kernel: drbd u12215 nodeB: Terminating sender thread
Feb 18 14:18:26 nodeC kernel: drbd u12215 nodeB: Starting sender thread (from drbd_r_u12215 [78221])
Feb 18 14:18:27 nodeC kernel: drbd u12215 nodeB: Connection closed
Feb 18 14:18:27 nodeC kernel: drbd u12215 nodeB: helper command: /sbin/drbdadm disconnected
Feb 18 14:18:27 nodeC kernel: drbd u12215 nodeB: helper command: /sbin/drbdadm disconnected exit code 0
Feb 18 14:18:27 nodeC kernel: drbd u12215 nodeB: conn( NetworkFailure -> Unconnected )
Feb 18 14:18:27 nodeC kernel: drbd u12215 nodeB: Restarting receiver thread
Feb 18 14:18:27 nodeC kernel: drbd u12215 nodeB: conn( Unconnected -> Connecting )

Feb 18 14:18:39 nodeC kernel: drbd u12215 nodeA: Handshake to peer 10 successful: Agreed network protocol version 121
Feb 18 14:18:39 nodeC kernel: drbd u12215 nodeA: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
Feb 18 14:18:39 nodeC kernel: drbd u12215 nodeA: Starting ack_recv thread (from drbd_r_u12215 [78220])
Feb 18 14:18:39 nodeC kernel: drbd u12215: Preparing cluster-wide state change 104547609 (7->10 499/145)
Feb 18 14:18:39 nodeC kernel: drbd u12215: State change 104547609: primary_nodes=80, weak_nodes=FFFFFFFFFFFFFB7F
Feb 18 14:18:39 nodeC kernel: drbd u12215: Committing cluster-wide state change 104547609 (24ms)
Feb 18 14:18:39 nodeC kernel: drbd u12215 nodeA: conn( Connecting -> Connected ) peer( Unknown -> Secondary )
Feb 18 14:18:39 nodeC kernel: drbd u12215/0 drbd854 nodeA: pdsk( DUnknown -> Consistent ) repl( Off -> Established )
Feb 18 14:18:39 nodeC kernel: drbd u12215: susp-io( no-disk -> no)
Feb 18 14:18:39 nodeC kernel: drbd u12215/0 drbd854: IO ERROR: neither local nor remote data, sector 0+0
Feb 18 14:18:39 nodeC kernel: drbd u12215/0 drbd854 nodeA: pdsk( Consistent -> UpToDate )

Feb 18 14:18:59 nodeC kernel: drbd u12215 nodeB: Handshake to peer 11 successful: Agreed network protocol version 121
Feb 18 14:18:59 nodeC kernel: drbd u12215 nodeB: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
Feb 18 14:18:59 nodeC kernel: drbd u12215 nodeB: Starting ack_recv thread (from drbd_r_u12215 [78221])
Feb 18 14:19:00 nodeC kernel: drbd u12215: Preparing cluster-wide state change 1681449179 (7->11 499/145)
Feb 18 14:19:00 nodeC kernel: drbd u12215: State change 1681449179: primary_nodes=80, weak_nodes=FFFFFFFFFFFFF37F
Feb 18 14:19:00 nodeC kernel: drbd u12215: Committing cluster-wide state change 1681449179 (28ms)
Feb 18 14:19:00 nodeC kernel: drbd u12215 nodeB: conn( Connecting -> Connected ) peer( Unknown -> Secondary )
Feb 18 14:19:00 nodeC kernel: drbd u12215/0 drbd854 nodeB: pdsk( DUnknown -> Outdated ) repl( Off -> Established )
Feb 18 14:19:00 nodeC kernel: drbd u12215/0 drbd854 nodeB: pdsk( Outdated -> UpToDate )

Feb 18 14:19:43 nodeC kernel: drbd u12215/0 drbd854 nodeB: Remote failed to finish a request within 87433ms > ko-count (7) * timeout (60 * 0.1s)
Feb 18 14:19:43 nodeC kernel: drbd u12215 nodeB: conn( Connected -> Timeout ) peer( Secondary -> Unknown )
Feb 18 14:19:43 nodeC kernel: drbd u12215/0 drbd854 nodeB: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
Feb 18 14:19:43 nodeC kernel: drbd u12215 nodeB: ack_receiver terminated
Feb 18 14:19:43 nodeC kernel: drbd u12215 nodeB: Terminating ack_recv thread
Feb 18 14:19:43 nodeC kernel: drbd u12215 nodeB: Terminating sender thread
Feb 18 14:19:43 nodeC kernel: drbd u12215 nodeB: Starting sender thread (from drbd_r_u12215 [78221])
Feb 18 14:19:43 nodeC kernel: drbd u12215/0 drbd854: sending new current UUID: FE90848345F0258D
Feb 18 14:19:43 nodeC kernel: drbd u12215 nodeB: Connection closed
Feb 18 14:19:43 nodeC kernel: drbd u12215 nodeA: Preparing remote state change 2931506494
Feb 18 14:19:43 nodeC kernel: drbd u12215 nodeB: helper command: /sbin/drbdadm disconnected
Feb 18 14:19:43 nodeC kernel: drbd u12215 nodeB: helper command: /sbin/drbdadm disconnected exit code 0
Feb 18 14:19:43 nodeC kernel: drbd u12215 nodeB: conn( Timeout -> Unconnected )
Feb 18 14:19:43 nodeC kernel: drbd u12215 nodeB: Restarting receiver thread
Feb 18 14:19:43 nodeC kernel: drbd u12215 nodeB: conn( Unconnected -> Connecting )
Feb 18 14:19:43 nodeC kernel: drbd u12215 nodeA: Committing remote state change 2931506494 (primary_nodes=80)
Feb 18 14:19:43 nodeC kernel: drbd u12215/0 drbd854 nodeB: pdsk( DUnknown -> Outdated )
Feb 18 14:19:44 nodeC kernel: drbd u12215 nodeB: Handshake to peer 11 successful: Agreed network protocol version 121
Feb 18 14:19:44 nodeC kernel: drbd u12215 nodeB: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
Feb 18 14:19:44 nodeC kernel: drbd u12215 nodeB: Starting ack_recv thread (from drbd_r_u12215 [78221])
Feb 18 14:19:44 nodeC kernel: drbd u12215: Preparing cluster-wide state change 3841230571 (7->11 499/145)
Feb 18 14:19:44 nodeC kernel: drbd u12215: State change 3841230571: primary_nodes=80, weak_nodes=FFFFFFFFFFFFF37F
Feb 18 14:19:44 nodeC kernel: drbd u12215: Committing cluster-wide state change 3841230571 (28ms)
Feb 18 14:19:44 nodeC kernel: drbd u12215 nodeB: conn( Connecting -> Connected ) peer( Unknown -> Secondary )
Feb 18 14:19:44 nodeC kernel: drbd u12215/0 drbd854 nodeB: repl( Off -> Established )
Feb 18 14:19:44 nodeC kernel: drbd u12215/0 drbd854 nodeB: pdsk( Outdated -> Inconsistent ) resync-susp( no -> peer )
Feb 18 14:19:44 nodeC kernel: drbd u12215/0 drbd854 nodeB: pdsk( Inconsistent -> UpToDate ) resync-susp( peer -> no )
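
The threshold in the "Remote failed to finish a request" message follows from the numbers it prints. This is a small sketch of that arithmetic, using only the values in the log line (ko-count 7, timeout 60 in units of 0.1 s); the function name is made up for illustration.

```python
def request_deadline_ms(ko_count: int, timeout_tenths: int) -> int:
    """ko-count * timeout in milliseconds, with timeout given in tenths of a second."""
    return ko_count * timeout_tenths * 100

deadline = request_deadline_ms(ko_count=7, timeout_tenths=60)
elapsed = 87433  # ms, from the 14:19:43 log line

print(deadline, elapsed > deadline, elapsed - deadline)
```

With these values the limit is 42 s, and the request had been outstanding for more than twice that long when the connection was declared timed out.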

[root@nodeC ~]# cat /sys/block/drbd854/inflight
       1        0
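
The two columns of inflight are the number of read and write requests currently in flight, in that order. This is a minimal sketch of reading them; the function names are illustrative and the device name is the one from this report.

```python
from pathlib import Path

def parse_inflight(text: str) -> tuple[int, int]:
    """Split the contents of /sys/block/<dev>/inflight into (reads, writes)."""
    reads, writes = (int(field) for field in text.split())
    return reads, writes

def read_inflight(device: str) -> tuple[int, int]:
    """Read the in-flight counters for a block device such as 'drbd854'."""
    return parse_inflight(Path(f"/sys/block/{device}/inflight").read_text())

# Output captured above: one read still in flight, no writes.
print(parse_inflight("       1        0\n"))
```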

Testing symmetric blocking with 9.1.5:

version: 9.1.5 (api:2/proto:110-121)
GIT-hash: f41bc23cec1e919932ce35947f1b4e67d9e6db74 build by mockbuild@, 2021-12-18 14:03:28
Transports (api:17): tcp (9.1.5)

Read: same as before, IO error as soon as the second connection times out.

Write: similar to 9.1.6, with an IO error after nodeA reconnects, but the logic bug also appears:

Feb 18 14:44:33 nodeC kernel: drbd u12215 nodeA: PingAck did not arrive in time.
Feb 18 14:44:33 nodeC kernel: drbd u12215 nodeA: conn( Connected -> NetworkFailure ) peer( Secondary -> Unknown )
Feb 18 14:44:33 nodeC kernel: drbd u12215/0 drbd854 nodeA: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
Feb 18 14:44:33 nodeC kernel: drbd u12215 nodeA: ack_receiver terminated
Feb 18 14:44:33 nodeC kernel: drbd u12215 nodeA: Terminating ack_recv thread
Feb 18 14:44:33 nodeC kernel: drbd u12215 nodeA: Terminating sender thread
Feb 18 14:44:33 nodeC kernel: drbd u12215 nodeA: Starting sender thread (from drbd_r_u12215 [2970])
Feb 18 14:44:33 nodeC kernel: drbd u12215/0 drbd854: sending new current UUID: AB135B0DD32864CF
Feb 18 14:44:33 nodeC kernel: drbd u12215 nodeA: Connection closed
Feb 18 14:44:33 nodeC kernel: drbd u12215 nodeA: helper command: /sbin/drbdadm disconnected
Feb 18 14:44:33 nodeC kernel: drbd u12215 nodeA: helper command: /sbin/drbdadm disconnected exit code 0
Feb 18 14:44:33 nodeC kernel: drbd u12215 nodeA: conn( NetworkFailure -> Unconnected )
Feb 18 14:44:33 nodeC kernel: drbd u12215 nodeA: Restarting receiver thread
Feb 18 14:44:33 nodeC kernel: drbd u12215 nodeA: conn( Unconnected -> Connecting )
Feb 18 14:44:53 nodeC kernel: drbd u12215 nodeB: PingAck did not arrive in time.
Feb 18 14:44:53 nodeC kernel: drbd u12215: susp-io( no -> no-disk)
Feb 18 14:44:53 nodeC kernel: drbd u12215 nodeB: conn( Connected -> NetworkFailure ) peer( Secondary -> Unknown )
Feb 18 14:44:53 nodeC kernel: drbd u12215/0 drbd854 nodeB: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )

Feb 18 14:44:53 nodeC kernel: drbd u12215 nodeB: ack_receiver terminated
Feb 18 14:44:53 nodeC kernel: drbd u12215/0 drbd854 nodeA: helper command: /sbin/drbdadm pri-on-incon-degr
Feb 18 14:44:53 nodeC kernel: drbd u12215 nodeB: Terminating ack_recv thread
Feb 18 14:44:53 nodeC kernel: drbd u12215/0 drbd854 nodeA: helper command: /sbin/drbdadm pri-on-incon-degr exit code 0
Feb 18 14:44:53 nodeC kernel: drbd u12215/0 drbd854 nodeB: helper command: /sbin/drbdadm pri-on-incon-degr
Feb 18 14:44:53 nodeC kernel: drbd u12215/0 drbd854 nodeB: helper command: /sbin/drbdadm pri-on-incon-degr exit code 0
Feb 18 14:44:53 nodeC kernel: drbd u12215 nodeB: Terminating sender thread
Feb 18 14:44:53 nodeC kernel: drbd u12215 nodeB: Starting sender thread (from drbd_r_u12215 [2971])
Feb 18 14:44:53 nodeC kernel: drbd u12215 nodeB: Connection closed
Feb 18 14:44:53 nodeC kernel: drbd u12215 nodeB: helper command: /sbin/drbdadm disconnected
Feb 18 14:44:53 nodeC kernel: drbd u12215 nodeB: helper command: /sbin/drbdadm disconnected exit code 0
Feb 18 14:44:53 nodeC kernel: drbd u12215 nodeB: conn( NetworkFailure -> Unconnected )
Feb 18 14:44:53 nodeC kernel: drbd u12215 nodeB: Restarting receiver thread
Feb 18 14:44:53 nodeC kernel: drbd u12215 nodeB: conn( Unconnected -> Connecting )

Feb 18 14:45:03 nodeC kernel: drbd u12215 nodeA: Handshake to peer 10 successful: Agreed network protocol version 121
Feb 18 14:45:03 nodeC kernel: drbd u12215 nodeA: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
Feb 18 14:45:03 nodeC kernel: drbd u12215 nodeA: Starting ack_recv thread (from drbd_r_u12215 [2970])
Feb 18 14:45:03 nodeC kernel: drbd u12215: Preparing cluster-wide state change 2772152590 (7->10 499/145)
Feb 18 14:45:03 nodeC kernel: drbd u12215: State change 2772152590: primary_nodes=80, weak_nodes=FFFFFFFFFFFFFB7F
Feb 18 14:45:03 nodeC kernel: drbd u12215: Committing cluster-wide state change 2772152590 (76ms)
Feb 18 14:45:03 nodeC kernel: drbd u12215 nodeA: conn( Connecting -> Connected ) peer( Unknown -> Secondary )
Feb 18 14:45:03 nodeC kernel: drbd u12215/0 drbd854 nodeA: pdsk( DUnknown -> Inconsistent ) repl( Off -> Established ) resync-susp( no -> peer )
Feb 18 14:45:03 nodeC kernel: drbd u12215: susp-io( no-disk -> no)
Feb 18 14:45:03 nodeC kernel: drbd u12215/0 drbd854: IO ERROR: neither local nor remote data, sector 4721080+24
Feb 18 14:45:03 nodeC kernel: drbd u12215/0 drbd854: IO ERROR: neither local nor remote data, sector 0+0

Feb 18 14:45:05 nodeC kernel: drbd u12215/0 drbd854 nodeA: pdsk( Inconsistent -> Outdated ) resync-susp( peer -> no )

Feb 18 14:45:25 nodeC kernel: drbd u12215 nodeB: Handshake to peer 11 successful: Agreed network protocol version 121
Feb 18 14:45:25 nodeC kernel: drbd u12215 nodeB: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
Feb 18 14:45:25 nodeC kernel: drbd u12215 nodeB: Starting ack_recv thread (from drbd_r_u12215 [2971])
Feb 18 14:45:25 nodeC kernel: drbd u12215: Preparing cluster-wide state change 1015297695 (7->11 499/145)
Feb 18 14:45:25 nodeC kernel: drbd u12215: State change 1015297695: primary_nodes=80, weak_nodes=FFFFFFFFFFFFF37F
Feb 18 14:45:25 nodeC kernel: drbd u12215: Committing cluster-wide state change 1015297695 (24ms)
Feb 18 14:45:25 nodeC kernel: drbd u12215 nodeB: conn( Connecting -> Connected ) peer( Unknown -> Secondary )
Feb 18 14:45:25 nodeC kernel: drbd u12215/0 drbd854 nodeB: pdsk( DUnknown -> Outdated ) repl( Off -> Established )
Feb 18 14:45:25 nodeC kernel: drbd u12215/0 drbd854 nodeB: pdsk( Outdated -> UpToDate )

Feb 18 14:46:11 nodeC kernel: drbd u12215/0 drbd854 nodeB: Remote failed to finish a request within 89175ms > ko-count (7) * timeout (60 * 0.1s)
Feb 18 14:46:11 nodeC kernel: drbd u12215: susp-io( no -> no-disk)
Feb 18 14:46:11 nodeC kernel: drbd u12215 nodeB: conn( Connected -> Timeout ) peer( Secondary -> Unknown )
Feb 18 14:46:11 nodeC kernel: drbd u12215/0 drbd854 nodeB: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
Feb 18 14:46:11 nodeC kernel: drbd u12215 nodeB: ack_receiver terminated
Feb 18 14:46:11 nodeC kernel: drbd u12215/0 drbd854 nodeA: helper command: /sbin/drbdadm pri-on-incon-degr
Feb 18 14:46:11 nodeC kernel: drbd u12215 nodeB: Terminating ack_recv thread
Feb 18 14:46:11 nodeC kernel: drbd u12215/0 drbd854 nodeA: helper command: /sbin/drbdadm pri-on-incon-degr exit code 0
Feb 18 14:46:11 nodeC kernel: drbd u12215/0 drbd854 nodeB: helper command: /sbin/drbdadm pri-on-incon-degr
Feb 18 14:46:11 nodeC kernel: drbd u12215/0 drbd854 nodeB: helper command: /sbin/drbdadm pri-on-incon-degr exit code 0
Feb 18 14:46:11 nodeC kernel: drbd u12215 nodeB: Terminating sender thread
Feb 18 14:46:11 nodeC kernel: drbd u12215 nodeB: Starting sender thread (from drbd_r_u12215 [2971])
Feb 18 14:46:11 nodeC kernel: drbd u12215 nodeB: Connection closed
Feb 18 14:46:11 nodeC kernel: drbd u12215 nodeA: Preparing remote state change 3864336905
Feb 18 14:46:11 nodeC kernel: drbd u12215 nodeB: helper command: /sbin/drbdadm disconnected
Feb 18 14:46:11 nodeC kernel: drbd u12215 nodeB: helper command: /sbin/drbdadm disconnected exit code 0
Feb 18 14:46:11 nodeC kernel: drbd u12215 nodeB: conn( Timeout -> Unconnected )
Feb 18 14:46:11 nodeC kernel: drbd u12215 nodeB: Restarting receiver thread
Feb 18 14:46:11 nodeC kernel: drbd u12215 nodeB: conn( Unconnected -> Connecting )
Feb 18 14:46:11 nodeC kernel: drbd u12215 nodeA: Committing remote state change 3864336905 (primary_nodes=80)
Feb 18 14:46:11 nodeC kernel: drbd u12215/0 drbd854 nodeB: pdsk( DUnknown -> Outdated )

Feb 18 14:46:12 nodeC kernel: drbd u12215 nodeB: Handshake to peer 11 successful: Agreed network protocol version 121
Feb 18 14:46:12 nodeC kernel: drbd u12215 nodeB: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
Feb 18 14:46:12 nodeC kernel: drbd u12215 nodeB: Starting ack_recv thread (from drbd_r_u12215 [2971])
Feb 18 14:46:12 nodeC kernel: drbd u12215: Preparing cluster-wide state change 3509072553 (7->11 499/145)
Feb 18 14:46:12 nodeC kernel: drbd u12215: State change 3509072553: primary_nodes=80, weak_nodes=FFFFFFFFFFFFF37F
Feb 18 14:46:12 nodeC kernel: drbd u12215: Committing cluster-wide state change 3509072553 (28ms)
Feb 18 14:46:12 nodeC kernel: drbd u12215 nodeB: conn( Connecting -> Connected ) peer( Unknown -> Secondary )
Feb 18 14:46:12 nodeC kernel: drbd u12215/0 drbd854 nodeB: repl( Off -> Established )
Feb 18 14:46:12 nodeC kernel: drbd u12215: susp-io( no-disk -> no)
Feb 18 14:46:12 nodeC kernel: drbd u12215/0 drbd854 nodeB: pdsk( Outdated -> UpToDate )

Feb 18 14:46:56 nodeC kernel: drbd u12215/0 drbd854 nodeB: Remote failed to finish a request within 134231ms > ko-count (7) * timeout (60 * 0.1s)
Feb 18 14:46:56 nodeC kernel: drbd u12215: susp-io( no -> no-disk)
Feb 18 14:46:56 nodeC kernel: drbd u12215 nodeB: conn( Connected -> Timeout ) peer( Secondary -> Unknown )
Feb 18 14:46:56 nodeC kernel: drbd u12215/0 drbd854 nodeB: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
Feb 18 14:46:56 nodeC kernel: drbd u12215 nodeB: ack_receiver terminated
Feb 18 14:46:56 nodeC kernel: drbd u12215/0 drbd854 nodeA: helper command: /sbin/drbdadm pri-on-incon-degr
Feb 18 14:46:56 nodeC kernel: drbd u12215 nodeB: Terminating ack_recv thread
Feb 18 14:46:56 nodeC kernel: drbd u12215/0 drbd854 nodeA: helper command: /sbin/drbdadm pri-on-incon-degr exit code 0
Feb 18 14:46:56 nodeC kernel: drbd u12215/0 drbd854 nodeB: helper command: /sbin/drbdadm pri-on-incon-degr
Feb 18 14:46:56 nodeC kernel: drbd u12215/0 drbd854 nodeB: helper command: /sbin/drbdadm pri-on-incon-degr exit code 0
Feb 18 14:46:56 nodeC kernel: drbd u12215/0 drbd854: drbd_req_destroy: Logic BUG rq_state: 284000, completion_ref = 1

Feb 18 14:47:24 nodeC kernel: watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [drbd_s_u12215:4007]
Feb 18 14:47:24 nodeC kernel: Modules linked in: dm_mod vhost_net vhost vhost_iotlb tap tun drbd_transport_tcp(OE) drbd(OE) nfsv3 nfs_acl nfs lockd grace fscache netconsole ixgbe mdio dca mlx4_ib mlx4_en nf_defrag_ipv6 nf_defrag_ipv4 bridge stp llc mlx4_core nft_compat nft_counter nf_tables nfnetlink rpcrdma rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm ib_uverbs ib_core sunrpc vfat fat intel_rapl_msr iTCO_wdt iTCO_vendor_support mxm_wmi dcdbas intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul mgag200 ghash_clmulni_intel i2c_algo_bit rapl drm_kms_helper intel_cstate syscopyarea intel_uncore sysfillrect cdc_ether sysimgblt pcspkr fb_sys_fops usbnet drm mii ipmi_ssif lpc_ich tg3 ipmi_si ipmi_devintf ipmi_msghandler mei_me mei wmi acpi_power_meter xfs libcrc32c raid1 sr_mod cdrom sd_mod t10_pi sg ahci libahci crc32c_intel libata megaraid_sas
Feb 18 14:47:24 nodeC kernel: [last unloaded: mlx4_core]
Feb 18 14:47:24 nodeC kernel: CPU: 0 PID: 4007 Comm: drbd_s_u12215 Tainted: G           OE    --------- -  - 4.18.0-348.12.2.el8_5.x86_64 #1
Feb 18 14:47:24 nodeC kernel: Hardware name: Dell Inc. PowerEdge R730/0WCJNT, BIOS 2.13.0 05/14/2021
Feb 18 14:47:24 nodeC kernel: RIP: 0010:drbd_sender+0x208/0x3c0 [drbd]
Feb 18 14:47:24 nodeC kernel: Code: be 07 00 00 00 48 89 ef 48 8d 4c 24 10 e8 70 57 01 00 48 8b 53 10 b8 00 fe ff ff f0 0f c1 82 30 01 00 00 fb 66 0f 1f 44 00 00 <48> 83 7c 24 10 00 74 0d 48 8d 74 24 10 4c 89 ff e8 53 56 01 00 48
Feb 18 14:47:24 nodeC kernel: RSP: 0018:ffffb00f5a7ffe80 EFLAGS: 00000247 ORIG_RAX: ffffffffffffff13
Feb 18 14:47:24 nodeC kernel: RAX: 0000000000000200 RBX: ffffa0a6d63c9000 RCX: 000000000000031c
Feb 18 14:47:24 nodeC kernel: RDX: ffffa0a8f82fa800 RSI: 0000000000284000 RDI: ffffa0aaab9f9280
Feb 18 14:47:24 nodeC kernel: RBP: ffffa0aaab9f90c0 R08: ffffa0aaab9f90d6 R09: 0000000000000000
Feb 18 14:47:24 nodeC kernel: R10: ffffa0a6d63c9000 R11: ffffa0a6d63c9050 R12: ffffa0a6d63c94f0
Feb 18 14:47:24 nodeC kernel: R13: ffffa0a8f82f9000 R14: ffffa0a6d63c9048 R15: ffffa0a6d02cd800
Feb 18 14:47:24 nodeC kernel: FS:  0000000000000000(0000) GS:ffffa0a53f600000(0000) knlGS:0000000000000000
Feb 18 14:47:24 nodeC kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 18 14:47:24 nodeC kernel: CR2: 000000c00218d000 CR3: 0000007ffb610006 CR4: 00000000003726f0
Feb 18 14:47:24 nodeC kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Feb 18 14:47:24 nodeC kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Feb 18 14:47:24 nodeC kernel: Call Trace:
Feb 18 14:47:24 nodeC kernel: ? __drbd_next_peer_device_ref+0x140/0x140 [drbd]
Feb 18 14:47:24 nodeC kernel: drbd_thread_setup+0x5e/0x160 [drbd]
Feb 18 14:47:24 nodeC kernel: ? __drbd_next_peer_device_ref+0x140/0x140 [drbd]
Feb 18 14:47:24 nodeC kernel: kthread+0x116/0x130
Feb 18 14:47:24 nodeC kernel: ? kthread_flush_work_fn+0x10/0x10
Feb 18 14:47:24 nodeC kernel: ret_from_fork+0x35/0x40
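
In the "Code:" line of an x86 oops, the byte in angle brackets is the one RIP points at. This is a minimal sketch that locates it; split_code_line is a made-up helper and the byte string is copied from the trace above.

```python
def split_code_line(code: str) -> tuple[list[int], list[int]]:
    """Split an oops 'Code:' byte dump into (bytes before RIP, bytes from RIP on)."""
    before: list[int] = []
    after: list[int] = []
    seen_marker = False
    for token in code.split():
        if token.startswith("<") and token.endswith(">"):
            seen_marker = True
            token = token[1:-1]  # strip the angle brackets around the RIP byte
        (after if seen_marker else before).append(int(token, 16))
    return before, after

CODE = (
    "be 07 00 00 00 48 89 ef 48 8d 4c 24 10 e8 70 57 01 00 48 8b 53 10 "
    "b8 00 fe ff ff f0 0f c1 82 30 01 00 00 fb 66 0f 1f 44 00 00 "
    "<48> 83 7c 24 10 00 74 0d 48 8d 74 24 10 4c 89 ff e8 53 56 01 00 48"
)

before, after = split_code_line(CODE)
print(len(before), " ".join(f"{b:02x}" for b in after[:6]))
```

The six bytes starting at RIP, 48 83 7c 24 10 00, encode cmpq $0x0,0x10(%rsp), which agrees with the instruction at <+520> in the disassembly earlier in the thread.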

With 9.0.32 (only on nodeC, A/B are still 9.1.6):

Read: as before, IO error on second disconnect

Write: results in a kernel crash instead of a logic bug message:

version: 9.0.32-1 (api:2/proto:86-121)
GIT-hash: 7d2933d5a3764fcc5e0bf54b71fd9cfb0363be1a build by root@nodeC, 2022-02-18 14:55:18
Transports (api:17): tcp (9.0.32-1)


2022-02-18 14:59:13 err kernel: drbd u12215 nodeA: PingAck did not arrive in time.
2022-02-18 14:59:13 info kernel: drbd u12215 nodeA: conn( Connected -> NetworkFailure ) peer( Secondary -> Unknown )
2022-02-18 14:59:13 info kernel: drbd u12215/0 drbd854 nodeA: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
2022-02-18 14:59:13 info kernel: drbd u12215 nodeA: ack_receiver terminated
2022-02-18 14:59:13 info kernel: drbd u12215 nodeA: Terminating ack_recv thread
2022-02-18 14:59:13 info kernel: drbd u12215 nodeA: Terminating sender thread
2022-02-18 14:59:13 info kernel: drbd u12215 nodeA: Starting sender thread (from drbd_r_u12215 [5386])
2022-02-18 14:59:13 info kernel: drbd u12215/0 drbd854: sending new current UUID: 762953691B1D87C3
2022-02-18 14:59:13 info kernel: drbd u12215 nodeA: Connection closed
2022-02-18 14:59:13 info kernel: drbd u12215 nodeA: helper command: /sbin/drbdadm disconnected
2022-02-18 14:59:13 info kernel: drbd u12215 nodeA: helper command: /sbin/drbdadm disconnected exit code 0
2022-02-18 14:59:13 info kernel: drbd u12215 nodeA: conn( NetworkFailure -> Unconnected )
2022-02-18 14:59:13 info kernel: drbd u12215 nodeA: Restarting receiver thread
2022-02-18 14:59:13 info kernel: drbd u12215 nodeA: conn( Unconnected -> Connecting )

2022-02-18 14:59:33 err kernel: drbd u12215 nodeB: PingAck did not arrive in time.
2022-02-18 14:59:33 info kernel: drbd u12215: susp-io( no -> no-disk)
2022-02-18 14:59:33 info kernel: drbd u12215 nodeB: conn( Connected -> NetworkFailure ) peer( Secondary -> Unknown )
2022-02-18 14:59:33 info kernel: drbd u12215/0 drbd854 nodeB: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
2022-02-18 14:59:33 info kernel: drbd u12215 nodeB: ack_receiver terminated
2022-02-18 14:59:33 info kernel: drbd u12215/0 drbd854 nodeA: helper command: /sbin/drbdadm pri-on-incon-degr
2022-02-18 14:59:33 info kernel: drbd u12215 nodeB: Terminating ack_recv thread
2022-02-18 14:59:33 info kernel: drbd u12215/0 drbd854 nodeA: helper command: /sbin/drbdadm pri-on-incon-degr exit code 0
2022-02-18 14:59:33 info kernel: drbd u12215/0 drbd854 nodeB: helper command: /sbin/drbdadm pri-on-incon-degr
2022-02-18 14:59:33 info kernel: drbd u12215/0 drbd854 nodeB: helper command: /sbin/drbdadm pri-on-incon-degr exit code 0
2022-02-18 14:59:33 info kernel: drbd u12215 nodeB: Terminating sender thread
2022-02-18 14:59:33 info kernel: drbd u12215 nodeB: Starting sender thread (from drbd_r_u12215 [5387])
2022-02-18 14:59:33 info kernel: drbd u12215 nodeB: Connection closed
2022-02-18 14:59:33 info kernel: drbd u12215 nodeB: helper command: /sbin/drbdadm disconnected
2022-02-18 14:59:33 info kernel: drbd u12215 nodeB: helper command: /sbin/drbdadm disconnected exit code 0
2022-02-18 14:59:33 info kernel: drbd u12215 nodeB: conn( NetworkFailure -> Unconnected )
2022-02-18 14:59:33 info kernel: drbd u12215 nodeB: Restarting receiver thread
2022-02-18 14:59:33 info kernel: drbd u12215 nodeB: conn( Unconnected -> Connecting )

2022-02-18 14:59:44 info kernel: drbd u12215 nodeA: Handshake to peer 10 successful: Agreed network protocol version 121
2022-02-18 14:59:44 info kernel: drbd u12215 nodeA: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
2022-02-18 14:59:44 info kernel: drbd u12215 nodeA: Starting ack_recv thread (from drbd_r_u12215 [5386])
2022-02-18 14:59:44 info kernel: drbd u12215: Preparing cluster-wide state change 2786458399 (7->10 499/145)
2022-02-18 14:59:44 info kernel: drbd u12215: State change 2786458399: primary_nodes=80, weak_nodes=FFFFFFFFFFFFFB7F
2022-02-18 14:59:44 info kernel: drbd u12215: Committing cluster-wide state change 2786458399 (93ms)
2022-02-18 14:59:44 info kernel: drbd u12215 nodeA: conn( Connecting -> Connected ) peer( Unknown -> Secondary )
2022-02-18 14:59:44 info kernel: drbd u12215/0 drbd854 nodeA: pdsk( DUnknown -> Inconsistent ) repl( Off -> Established ) resync-susp( no -> peer )
2022-02-18 14:59:44 info kernel: drbd u12215: susp-io( no-disk -> no)
2022-02-18 14:59:44 err kernel: drbd u12215/0 drbd854: IO ERROR: neither local nor remote data, sector 4721080+24
2022-02-18 14:59:44 err kernel: drbd u12215/0 drbd854: IO ERROR: neither local nor remote data, sector 4721080+8
2022-02-18 14:59:44 err kernel: drbd u12215/0 drbd854: IO ERROR: neither local nor remote data, sector 829328+32
2022-02-18 14:59:44 err kernel: drbd u12215/0 drbd854: IO ERROR: neither local nor remote data, sector 829328+8
2022-02-18 14:59:44 err kernel: drbd u12215/0 drbd854: IO ERROR: neither local nor remote data, sector 19657200+16
2022-02-18 14:59:44 err kernel: drbd u12215/0 drbd854: IO ERROR: neither local nor remote data, sector 19657200+16

2022-02-18 14:59:45 info kernel: drbd u12215/0 drbd854 nodeA: pdsk( Inconsistent -> Outdated ) resync-susp( peer -> no )

2022-02-18 14:59:47 err kernel: drbd u12215/0 drbd854: IO ERROR: neither local nor remote data, sector 534299+3

2022-02-18 15:00:06 info kernel: drbd u12215 nodeB: Handshake to peer 11 successful: Agreed network protocol version 121
2022-02-18 15:00:06 info kernel: drbd u12215 nodeB: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
2022-02-18 15:00:06 info kernel: drbd u12215 nodeB: Starting ack_recv thread (from drbd_r_u12215 [5387])
2022-02-18 15:00:06 info kernel: drbd u12215: Preparing cluster-wide state change 4096656875 (7->11 499/145)
2022-02-18 15:00:06 info kernel: drbd u12215: State change 4096656875: primary_nodes=80, weak_nodes=FFFFFFFFFFFFF37F
2022-02-18 15:00:06 info kernel: drbd u12215: Committing cluster-wide state change 4096656875 (24ms)
2022-02-18 15:00:06 info kernel: drbd u12215 nodeB: conn( Connecting -> Connected ) peer( Unknown -> Secondary )
2022-02-18 15:00:06 info kernel: drbd u12215/0 drbd854 nodeB: pdsk( DUnknown -> Outdated ) repl( Off -> Established )
2022-02-18 15:00:06 info kernel: drbd u12215/0 drbd854 nodeB: pdsk( Outdated -> UpToDate )

2022-02-18 15:00:52 warning kernel: drbd u12215/0 drbd854: Remote failed to finish a request within 89864ms > ko-count (7) * timeout (60 * 0.1s)
2022-02-18 15:00:52 info kernel: drbd u12215: susp-io( no -> no-disk)
2022-02-18 15:00:52 info kernel: drbd u12215 nodeB: conn( Connected -> Timeout ) peer( Secondary -> Unknown )
2022-02-18 15:00:52 info kernel: drbd u12215/0 drbd854 nodeB: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
2022-02-18 15:00:52 info kernel: drbd u12215 nodeB: ack_receiver terminated
2022-02-18 15:00:52 info kernel: drbd u12215/0 drbd854 nodeA: helper command: /sbin/drbdadm pri-on-incon-degr
2022-02-18 15:00:52 info kernel: drbd u12215 nodeB: Terminating ack_recv thread
2022-02-18 15:00:52 info kernel: drbd u12215/0 drbd854 nodeA: helper command: /sbin/drbdadm pri-on-incon-degr exit code 0
2022-02-18 15:00:52 info kernel: drbd u12215/0 drbd854 nodeB: helper command: /sbin/drbdadm pri-on-incon-degr
2022-02-18 15:00:52 info kernel: drbd u12215/0 drbd854 nodeB: helper command: /sbin/drbdadm pri-on-incon-degr exit code 0
2022-02-18 15:00:52 info kernel: drbd u12215 nodeB: Terminating sender thread
2022-02-18 15:00:52 info kernel: drbd u12215 nodeB: Starting sender thread (from drbd_r_u12215 [5387])
2022-02-18 15:00:52 info kernel: drbd u12215 nodeB: Connection closed
2022-02-18 15:00:52 info kernel: drbd u12215 nodeA: Preparing remote state change 694803547
2022-02-18 15:00:52 info kernel: drbd u12215 nodeB: helper command: /sbin/drbdadm disconnected
2022-02-18 15:00:52 info kernel: drbd u12215 nodeB: helper command: /sbin/drbdadm disconnected exit code 0
2022-02-18 15:00:52 info kernel: drbd u12215 nodeB: conn( Timeout -> Unconnected )
2022-02-18 15:00:52 info kernel: drbd u12215 nodeB: Restarting receiver thread
2022-02-18 15:00:52 info kernel: drbd u12215 nodeB: conn( Unconnected -> Connecting )
2022-02-18 15:00:52 info kernel: drbd u12215 nodeA: Committing remote state change 694803547 (primary_nodes=80)
2022-02-18 15:00:52 info kernel: drbd u12215/0 drbd854 nodeB: pdsk( DUnknown -> Outdated )

2022-02-18 15:00:53 info kernel: drbd u12215 nodeB: Handshake to peer 11 successful: Agreed network protocol version 121
2022-02-18 15:00:53 info kernel: drbd u12215 nodeB: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
2022-02-18 15:00:53 info kernel: drbd u12215 nodeB: Starting ack_recv thread (from drbd_r_u12215 [5387])
2022-02-18 15:00:53 info kernel: drbd u12215: Preparing cluster-wide state change 268496490 (7->11 499/145)
2022-02-18 15:00:53 info kernel: drbd u12215: State change 268496490: primary_nodes=80, weak_nodes=FFFFFFFFFFFFF37F
2022-02-18 15:00:53 info kernel: drbd u12215: Committing cluster-wide state change 268496490 (24ms)
2022-02-18 15:00:53 info kernel: drbd u12215 nodeB: conn( Connecting -> Connected ) peer( Unknown -> Secondary )
2022-02-18 15:00:53 info kernel: drbd u12215/0 drbd854 nodeB: repl( Off -> Established )
2022-02-18 15:00:53 info kernel: drbd u12215: susp-io( no-disk -> no)
2022-02-18 15:00:53 info kernel: drbd u12215/0 drbd854 nodeB: pdsk( Outdated -> UpToDate )

2022-02-18 15:01:37 notice [  578.543260] drbd u12215/0 drbd854: Remote failed to finish a request within 134920ms > ko-count (7) * timeout (60 * 0.1s)
2022-02-18 15:01:37 notice [  578.544168] drbd u12215: susp-io( no -> no-disk)
2022-02-18 15:01:37 notice [  578.544720] drbd u12215 nodeB: conn( Connected -> Timeout ) peer( Secondary -> Unknown )
2022-02-18 15:01:37 notice [  578.545286] drbd u12215/0 drbd854 nodeB: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
2022-02-18 15:01:37 notice [  578.545912] drbd u12215 nodeB: ack_receiver terminated
2022-02-18 15:01:37 notice [  578.546170] drbd u12215/0 drbd854 nodeA: helper command: /sbin/drbdadm pri-on-incon-degr
2022-02-18 15:01:37 notice [  578.546542] drbd u12215 nodeB: Terminating ack_recv thread
2022-02-18 15:01:37 notice [  578.550764] drbd u12215/0 drbd854 nodeA: helper command: /sbin/drbdadm pri-on-incon-degr exit code 0
2022-02-18 15:01:37 notice [  578.551801] drbd u12215/0 drbd854 nodeB: helper command: /sbin/drbdadm pri-on-incon-degr
2022-02-18 15:01:37 notice [  578.554057] drbd u12215/0 drbd854 nodeB: helper command: /sbin/drbdadm pri-on-incon-degr exit code 0
2022-02-18 15:01:37 notice [  578.565163] BUG: unable to handle kernel NULL pointer dereference at 0000000000000080
2022-02-18 15:01:37 notice [  578.565772] PGD 0 P4D 0
2022-02-18 15:01:37 notice [  578.566340] Oops: 0000 [#1] SMP PTI
2022-02-18 15:01:37 notice [  578.566908] CPU: 0 PID: 6261 Comm: drbd_s_u12215 Tainted: G           OE    --------- -  - 4.18.0-348.12.2.el8_5.x86_64 #1
2022-02-18 15:01:37 notice [  578.567564] Hardware name: Dell Inc. PowerEdge R730/0WCJNT, BIOS 2.13.0 05/14/2021
2022-02-18 15:01:37 notice [  578.568137] RIP: 0010:tl_next_request_for_connection+0x122/0x350 [drbd]
2022-02-18 15:01:37 notice [  578.568748] Code: 85 db 48 0f 44 dd 49 8b 44 24 10 48 8b 55 58 48 05 48 01 00 00 48 8d 6a a8 48 39 c2 0f 84 25 01 00 00 4c 8b 75 00 48 8b 3c 24 <49> 63 b6 80 00 00 00 e8 12 4a 1b d2 4c 63 88 84 00 00 00 49 89 c7
2022-02-18 15:01:37 notice [  578.569928] RSP: 0018:ffffb5eed9943db0 EFLAGS: 00010087
2022-02-18 15:01:37 notice [  578.570544] RAX: ffff8dbc49799148 RBX: ffff8db869166b50 RCX: 0000000000000000
2022-02-18 15:01:37 notice [  578.571106] RDX: ffff8db869166ba8 RSI: 0000000000000000 RDI: ffff8db819a07048
2022-02-18 15:01:37 notice [  578.571722] RBP: ffff8db869166b50 R08: ffff8db67f61ab80 R09: 000000000000000b
2022-02-18 15:01:37 notice [  578.572301] R10: 0000000000000000 R11: 0000000000008000 R12: ffff8db819a07000
2022-02-18 15:01:37 notice [  578.572889] R13: 0000000000000000 R14: 0000000000000000 R15: ffff8db92aa38800
2022-02-18 15:01:37 notice [  578.573474] FS:  0000000000000000(0000) GS:ffff8db67f600000(0000) knlGS:0000000000000000
2022-02-18 15:01:37 notice [  578.574047] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2022-02-18 15:01:37 notice [  578.574661] CR2: 0000000000000080 CR3: 000000b6c0810003 CR4: 00000000003726f0
2022-02-18 15:01:37 notice [  578.575216] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
2022-02-18 15:01:37 notice [  578.575824] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
2022-02-18 15:01:37 notice [  578.576400] Call Trace:
2022-02-18 15:01:37 notice [  578.576976]  check_sender_todo+0x16/0xb0 [drbd]
2022-02-18 15:01:37 notice [  578.577621]  wait_for_sender_todo+0xde/0x290 [drbd]
2022-02-18 15:01:37 notice [  578.578171]  ? finish_wait+0x80/0x80
2022-02-18 15:01:37 notice [  578.578775]  drbd_sender+0x301/0x3f0 [drbd]
2022-02-18 15:01:37 notice [  578.579352]  ? __drbd_next_peer_device_ref+0x140/0x140 [drbd]
2022-02-18 15:01:37 notice [  578.579944]  drbd_thread_setup+0x5e/0x160 [drbd]
2022-02-18 15:01:37 notice [  578.580554]  ? __drbd_next_peer_device_ref+0x140/0x140 [drbd]
2022-02-18 15:01:37 notice [  578.581102]  kthread+0x116/0x130
2022-02-18 15:01:37 notice [  578.581709]  ? kthread_flush_work_fn+0x10/0x10
2022-02-18 15:01:37 notice [  578.582258]  ret_from_fork+0x35/0x40
2022-02-18 15:01:37 notice [  578.588274] CR2: 0000000000000080
2022-02-18 15:01:37 notice [  578.588909] ---[ end trace 0120c3e98d62a293 ]---
2022-02-18 15:01:37 notice [  578.632554] RIP: 0010:tl_next_request_for_connection+0x122/0x350 [drbd]
2022-02-18 15:01:37 notice [  578.633316] Code: 85 db 48 0f 44 dd 49 8b 44 24 10 48 8b 55 58 48 05 48 01 00 00 48 8d 6a a8 48 39 c2 0f 84 25 01 00 00 4c 8b 75 00 48 8b 3c 24 <49> 63 b6 80 00 00 00 e8 12 4a 1b d2 4c 63 88 84 00 00 00 49 89 c7
2022-02-18 15:01:37 notice [  578.634634] RSP: 0018:ffffb5eed9943db0 EFLAGS: 00010087
2022-02-18 15:01:37 notice [  578.635316] RAX: ffff8dbc49799148 RBX: ffff8db869166b50 RCX: 0000000000000000
2022-02-18 15:01:37 notice [  578.635974] RDX: ffff8db869166ba8 RSI: 0000000000000000 RDI: ffff8db819a07048
2022-02-18 15:01:37 notice [  578.636658] RBP: ffff8db869166b50 R08: ffff8db67f61ab80 R09: 000000000000000b
2022-02-18 15:01:37 notice [  578.637278] R10: 0000000000000000 R11: 0000000000008000 R12: ffff8db819a07000
2022-02-18 15:01:37 notice [  578.637933] R13: 0000000000000000 R14: 0000000000000000 R15: ffff8db92aa38800
2022-02-18 15:01:37 notice [  578.638632] FS:  0000000000000000(0000) GS:ffff8db67f600000(0000) knlGS:0000000000000000
2022-02-18 15:01:37 notice [  578.639292] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2022-02-18 15:01:37 notice [  578.639909] CR2: 0000000000000080 CR3: 000000b6c0810003 CR4: 00000000003726f0
2022-02-18 15:01:37 notice [  578.640587] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
2022-02-18 15:01:37 notice [  578.641155] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
2022-02-18 15:01:37 notice [  578.641766] Kernel panic - not syncing: Fatal exception
2022-02-18 15:01:37 notice [  578.734113] Kernel Offset: 0x11a00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
2022-02-18 15:01:37 notice [  578.777494] ---[ end Kernel panic - not syncing: Fatal exception ]---
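
A minimal sketch for reading two numbers out of the log above; it is an annotation, not part of the original report. It checks the threshold in the "Remote failed to finish a request" message and the faulting address in the oops. The constants are copied from the log lines. The 0x80 displacement comes from decoding the bytes at the <49> marker in the Code: line (49 63 b6 80 00 00 00) as movslq 0x80(%r14),%rsi; that decoding is done by hand here and should be confirmed against a disassembly of the actual module.

```python
# Values copied from the log lines above.
KO_COUNT = 7            # "ko-count (7)"
TIMEOUT_TENTHS = 60     # "timeout (60 * 0.1s)": the unit is 0.1 s
ELAPSED_MS = 134920     # "within 134920ms"

# ko-count * timeout, converted to milliseconds (0.1 s = 100 ms).
threshold_ms = KO_COUNT * TIMEOUT_TENTHS * 100
print(f"threshold: {threshold_ms} ms, exceeded: {ELAPSED_MS > threshold_ms}")

# Register values from the oops register dump.
R14 = 0x0000000000000000
CR2 = 0x0000000000000080

# Assumption: displacement taken from a hand decode of the faulting
# instruction, movslq 0x80(%r14),%rsi.
DISPLACEMENT = 0x80

fault_address = R14 + DISPLACEMENT
print(f"fault address: {fault_address:#x}, matches CR2: {fault_address == CR2}")
```

Under these assumptions the threshold is 42 seconds, well below the 134920 ms reported, and the fault is a read at offset 0x80 from a NULL pointer held in R14.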

from drbd.

koszik avatar koszik commented on May 21, 2024

I would like to finish evaluating diskless mode, but unfortunately I haven't gotten past the first test. Do you need any help fixing the bugs?

from drbd.

JoelColledge avatar JoelColledge commented on May 21, 2024

Regarding the errors on read, it appears that on-no-data-accessible suspend-io only applies when the resource is Primary. DRBD can be accessed read-only without promoting to Primary, in which case the setting does not take effect. Try the test again with an explicit drbdadm primary ... first.

Regarding the write errors, we recently applied some fixes to the drbd-9.1 branch that relate to suspending and resuming IO. Please try the current drbd-9.1 branch.

Also, since you are configuring on-no-quorum, you probably want to enable quorum as well. Set the quorum option too.
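
A minimal sketch of how these options might look together in the resource configuration, added here as an illustration. It assumes the resource is named u12215 as in the logs; the values majority for quorum and suspend-io for on-no-quorum are assumptions for the example, since the thread does not state which values are in use.

```
resource u12215 {
    options {
        # Assumed value: enable quorum with a majority policy.
        quorum majority;
        # Assumed value: the thread only says on-no-quorum is configured.
        on-no-quorum suspend-io;
        # From the discussion above; only applies while the resource is Primary.
        on-no-data-accessible suspend-io;
    }
    # Existing connection and volume sections stay unchanged.
}
```

After changing the configuration, something like drbdadm adjust u12215 would normally apply it to the running resource.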

from drbd.

JoelColledge avatar JoelColledge commented on May 21, 2024

@koszik Please test again with drbd-9.1.7. Does the bug still occur with the write test?

from drbd.

JoelColledge avatar JoelColledge commented on May 21, 2024

Given that there has been no information to the contrary, I am assuming that this is fixed in drbd-9.1.7.

from drbd.
