drbd's People

Contributors

andreas-gruenbacher, ardje, bmwiedemann, brhellman, chrboe, ebiederm, elfring, error27, htejun, joelcolledge, joeperches, johannesthoma, lge, micha137, nick-wang, paulgortmaker, philipp-reisner, phmarek, raltnoeder, rasto, rck, rustyrussell, sfrothwell, simon3z, snitm, thomas-mangin, veggiemike, wanzenbug, xurui-xr, z14825

drbd's Issues

DRBD build error with kernel 5.11.15-200.fc33.x86_64

Hi!
I'm trying to build the DRBD 9.1.2 module on Fedora CoreOS 33 with kernel 5.11.15-200.fc33.x86_64 and get the following error:

make[1]: Entering directory '/tmp/pkg/drbd-9.1.2/drbd'

    Calling toplevel makefile of kernel source tree, which I believe is in
    KDIR=/opt/src/5.11.15-200.fc33.x86_64

make -C /opt/src/5.11.15-200.fc33.x86_64   M=/tmp/pkg/drbd-9.1.2/drbd  modules
  COMPAT  __vmalloc_has_2_params
  COMPAT  alloc_workqueue_takes_fmt
  COMPAT  before_4_13_kernel_read
  COMPAT  blkdev_issue_zeroout_discard
  COMPAT  can_include_vermagic_h
  COMPAT  genl_policy_in_ops
  COMPAT  have_BIO_MAX_VECS
  COMPAT  have_CRYPTO_TFM_NEED_KEY
  COMPAT  have_SHASH_DESC_ON_STACK
  COMPAT  have_WB_congested_enum
  COMPAT  have_allow_kernel_signal
  COMPAT  have_bdi_cap_stable_writes
  COMPAT  have_bdi_congested_fn
  COMPAT  have_bio_bi_bdev
  COMPAT  have_bio_bi_error
  COMPAT  have_bio_bi_opf
  COMPAT  have_bio_bi_status
  COMPAT  have_bio_clone_fast
  COMPAT  have_bio_op_shift
  COMPAT  have_bio_set_dev
  COMPAT  have_bio_set_op_attrs
  COMPAT  have_bio_start_io_acct
  COMPAT  have_bioset_init
  COMPAT  have_bioset_need_bvecs
  COMPAT  have_blk_alloc_queue_rh
  COMPAT  have_blk_check_plugged
  COMPAT  have_blk_qc_t_make_request
  COMPAT  have_blk_queue_flag_set
  COMPAT  have_blk_queue_make_request
  COMPAT  have_blk_queue_merge_bvec
  COMPAT  have_blk_queue_plugged
  COMPAT  have_blk_queue_split_bio
  COMPAT  have_blk_queue_split_q_bio
  COMPAT  have_blk_queue_split_q_bio_bioset
  COMPAT  have_blk_queue_update_readahead
  COMPAT  have_blk_queue_write_cache
  COMPAT  have_d_inode
  COMPAT  have_fallthrough
  COMPAT  have_generic_start_io_acct_q_rw_sect_part
  COMPAT  have_generic_start_io_acct_rw_sect_part
  COMPAT  have_genl_family_parallel_ops
  COMPAT  have_hd_struct
  COMPAT  have_ib_cq_init_attr
  COMPAT  have_ib_get_dma_mr
  COMPAT  have_idr_is_empty
  COMPAT  have_inode_lock
  COMPAT  have_ktime_to_timespec64
  COMPAT  have_kvfree
  COMPAT  have_max_send_recv_sge
  COMPAT  have_nla_nest_start_noflag
  COMPAT  have_nla_parse_deprecated
  COMPAT  have_nla_put_64bit
  COMPAT  have_nla_strscpy
  COMPAT  have_part_stat_h
  COMPAT  have_part_stat_read_accum
  COMPAT  have_pointer_backing_dev_info
  COMPAT  have_proc_create_single
  COMPAT  have_queue_flag_stable_writes
  COMPAT  have_rb_declare_callbacks_max
  COMPAT  have_refcount_inc
  COMPAT  have_req_flush
  COMPAT  have_req_hardbarrier
  COMPAT  have_req_noidle
  COMPAT  have_req_nounmap
  COMPAT  have_req_op_write
  COMPAT  have_req_op_write_same
  COMPAT  have_req_op_write_zeroes
  COMPAT  have_req_prio
  COMPAT  have_req_write
  COMPAT  have_req_write_same
  COMPAT  have_revalidate_disk_size
  COMPAT  have_sched_set_fifo
  COMPAT  have_security_netlink_recv
  COMPAT  have_sendpage_ok
  COMPAT  have_set_capacity_and_notify
  COMPAT  have_shash_desc_zero
  COMPAT  have_simple_positive
  COMPAT  have_sock_set_keepalive
  COMPAT  have_struct_bvec_iter
  COMPAT  have_struct_kernel_param_ops
  COMPAT  have_struct_size
  COMPAT  have_submit_bio
  COMPAT  have_submit_bio_noacct
  COMPAT  have_tcp_sock_set_cork
  COMPAT  have_tcp_sock_set_nodelay
  COMPAT  have_tcp_sock_set_quickack
  COMPAT  have_time64_to_tm
  COMPAT  have_timer_setup
  COMPAT  have_void_make_request
  COMPAT  ib_alloc_pd_has_2_params
  COMPAT  ib_device_has_ops
  COMPAT  ib_post_send_const_params
  COMPAT  ib_query_device_has_3_params
  COMPAT  need_make_request_recursion
  COMPAT  part_stat_read_takes_block_device
  COMPAT  queue_limits_has_discard_zeroes_data
  COMPAT  rdma_create_id_has_net_ns
  COMPAT  sock_create_kern_has_five_parameters
  COMPAT  sock_ops_returns_addr_len
  UPD     /tmp/pkg/drbd-9.1.2/drbd/compat.5.11.15-200.fc33.x86_64.h
  UPD     /tmp/pkg/drbd-9.1.2/drbd/compat.h
  GENPATCHNAMES   5.11.15-200.fc33.x86_64
  SPATCH   42878173f07c058bc39c76853025764a  5.11.15-200.fc33.x86_64
  PATCH
patching file drbd-headers/linux/genl_magic_func.h
Hunk #2 succeeded at 312 (offset -20 lines).
  CC [M]  /tmp/pkg/drbd-9.1.2/drbd/drbd_dax_pmem.o
In file included from /tmp/pkg/drbd-9.1.2/drbd/drbd_dax_pmem.c:28:
/tmp/pkg/drbd-9.1.2/drbd/drbd_int.h:1782:29: warning: "BIO_MAX_VECS" is not defined, evaluates to 0 [-Wundef]
 1782 | #define DRBD_BIO_MAX_PAGES (BIO_MAX_VECS << PAGE_SHIFT)
      |                             ^~~~~~~~~~~~
/tmp/pkg/drbd-9.1.2/drbd/drbd_int.h:1783:25: note: in expansion of macro 'DRBD_BIO_MAX_PAGES'
 1783 | #if DRBD_MAX_BIO_SIZE > DRBD_BIO_MAX_PAGES
      |                         ^~~~~~~~~~~~~~~~~~
/tmp/pkg/drbd-9.1.2/drbd/drbd_int.h:1784:2: error: #error Architecture not supported: DRBD_MAX_BIO_SIZE > (BIO_MAX_VECS << PAGE_SHIFT)
 1784 | #error Architecture not supported: DRBD_MAX_BIO_SIZE > (BIO_MAX_VECS << PAGE_SHIFT)
      |  ^~~~~
make[1]: Leaving directory '/tmp/pkg/drbd-9.1.2/drbd'
make[3]: *** [scripts/Makefile.build:279: /tmp/pkg/drbd-9.1.2/drbd/drbd_dax_pmem.o] Error 1
make[2]: *** [Makefile:1821: /tmp/pkg/drbd-9.1.2/drbd] Error 2
make[1]: *** [Makefile:132: kbuild] Error 2
make: *** [Makefile:128: module] Error 2

Could not find the expexted *.ko, see stderr for more details
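
The warning and the #error in the log fit together: inside #if, the C preprocessor replaces an identifier that is not defined as a macro with 0, so (BIO_MAX_VECS << PAGE_SHIFT) becomes 0 and the size check can never pass. The log does not show why the have_BIO_MAX_VECS compat test left the macro undefined on this kernel. The arithmetic can be sketched as follows. The value of DRBD_MAX_BIO_SIZE (1 MiB) and the value 256 for a defined BIO_MAX_VECS are assumptions, not taken from the log, and arch_check_fails is a hypothetical helper name.

```python
PAGE_SHIFT = 12               # 4 KiB pages on x86_64
DRBD_MAX_BIO_SIZE = 1 << 20   # assumed value (1 MiB); not shown in the log


def arch_check_fails(bio_max_vecs=None):
    """Model '#if DRBD_MAX_BIO_SIZE > (BIO_MAX_VECS << PAGE_SHIFT)'.

    bio_max_vecs=None stands for an undefined macro, which the C
    preprocessor evaluates as 0 inside #if (hence the -Wundef warning).
    """
    vecs = 0 if bio_max_vecs is None else bio_max_vecs
    return DRBD_MAX_BIO_SIZE > (vecs << PAGE_SHIFT)


# Undefined macro: the right-hand side is 0, so the #error branch is taken.
print("undefined BIO_MAX_VECS ->", arch_check_fails())
# Assumed definition of 256: 256 << 12 equals 1 MiB, so the check passes.
print("BIO_MAX_VECS = 256     ->", arch_check_fails(256))
```

Under these assumptions the "Architecture not supported" message would be a symptom of the missing macro rather than a real architecture limit.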

low performance on /dev/drbd1

Hi. I am testing DRBD (kmod version 9.2.2). When I test a disk without DRBD (fio -ioengine=libaio -direct=1 -name=test -bs=4k -iodepth=32 -rw=randwrite -runtime=60 -filename=/dev/sdc), I get 230k IOPS. However, the same test on /dev/drbd1 shows no more than 50k-60k IOPS. I tried disconnecting the resource (drbdadm disconnect r0) to rule out the replication network and tested again, but got the same result. What could be the reason for the degraded local performance of /dev/drbd1?

My *.res file is attached:
r0.txt
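
To make the comparison repeatable, both fio runs can be captured with fio's --output-format=json option and compared by script. The sketch below assumes the usual layout of that report (a "jobs" list whose entries carry a "write" section with an "iops" field). The two reports are hand-written stand-ins using the rounded figures from this post, not real fio output, and the function names are hypothetical.

```python
import json


def write_iops(fio_json_text):
    """Sum the write IOPS over all jobs in one fio JSON report."""
    report = json.loads(fio_json_text)
    return sum(job["write"]["iops"] for job in report["jobs"])


def relative_throughput(raw_json_text, drbd_json_text):
    """Return the DRBD device's write IOPS as a fraction of the raw disk's."""
    return write_iops(drbd_json_text) / write_iops(raw_json_text)


# Stand-in reports: 230k IOPS on /dev/sdc, at most 60k IOPS on /dev/drbd1.
raw_report = json.dumps(
    {"jobs": [{"jobname": "test", "write": {"iops": 230000.0}}]})
drbd_report = json.dumps(
    {"jobs": [{"jobname": "test", "write": {"iops": 60000.0}}]})

ratio = relative_throughput(raw_report, drbd_report)
print(f"/dev/drbd1 reaches {ratio:.0%} of the raw device's write IOPS")
```

Tracking this ratio while changing one setting at a time in the resource file makes it easier to see which option affects local performance.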

Immediate Crash/Reboot when autoplace > 0 on 6.1.x kernel

Hello,

I'm using Talos Linux, and trying to use DRBD for the piraeus operator. When autoplace/replicas is 1, there's no issue, but once I set it to 3 and try to provision a pvc against it, the node the pvc's pod is scheduled on reboots.

I initially opened an issue with Talos Linux, but they referred me to open an issue here as it appears to be related to the module compiled against 6.1 kernel.

I have attached the "talos dmesg" kernel log. At the time of the crash, no additional kernel panic or other messages appear on the console.

Thank you!
drbd-crash.log

Kernel Panic

While working on linstor-gateway, I was able to panic drbd on kernel-lt and 5.14.21.

The change I made to linstor-gateway was removing the "must be offline" check located here: https://github.com/LINBIT/linstor-gateway/blob/master/pkg/nvmeof/nvmeof.go#L271-L274

		status := linstorcontrol.StatusFromResources(path, resourceDefinition, resourceGroup, resources)
		if status.Service == common.ServiceStateStarted {
			return nil, errors.New("cannot add volume while service is running")
		}

I'm not sure if this change is related or not, but I wouldn't expect this to result in a panic.

zfs:

[root@ac-1f-6b-9e-e5-46 zfs]# zfs --version
zfs-2.1.4-1
zfs-kmod-2.1.4-1

drbd:

[root@ac-1f-6b-9e-e5-46 zfs]# drbdadm --version
DRBDADM_BUILDTAG=GIT-hash:\ 9aeb1059d37b92fec8db2b47e356c4e7fa030b64\ build\ by\ root@drbd-lsc-0\,\ 2022-06-23\ 05:01:03
DRBDADM_API_VERSION=2
DRBD_KERNEL_VERSION_CODE=0x090107
DRBD_KERNEL_VERSION=9.1.7
DRBDADM_VERSION_CODE=0x091500
DRBDADM_VERSION=9.21.0

drbd-reactor:

[root@ac-1f-6b-9e-e5-46 zfs]# drbd-reactor --version
drbd-reactor 0.7.0

Kernel: Linux ac-1f-6b-9e-e5-46 5.4.205-1.el8.elrepo.x86_64 #1 SMP Tue Jul 12 10:48:44 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux

[ 1758.843242] drbd milliman/1 drbd1001 ac-1f-6b-9e-e5-46: helper command: /sbin/drbdadm before-resync-target exit code 0
[ 1758.865053] drbd milliman: Aborting cluster-wide state change 2530163159 (31ms) rv = -19
[ 1758.873892] drbd milliman: Preparing cluster-wide state change 247777449 (1->-1 3/1)
[ 1758.899606] drbd milliman ac-1f-6b-9e-e5-46: Aborting local state change 247777449 to yield to remote state change 1249144741.
[ 1758.912455] drbd milliman: Aborting cluster-wide state change 247777449 (38ms) rv = -19
[ 1758.921189] drbd milliman: Preparing cluster-wide state change 1328687619 (1->-1 3/1)
[ 1758.929707] drbd milliman: Aborting cluster-wide state change 1328687619 (9ms) rv = -19
[ 1758.938412] drbd milliman: Preparing cluster-wide state change 2976414762 (1->-1 3/1)
[ 1758.946915] drbd milliman: Aborting cluster-wide state change 2976414762 (9ms) rv = -19
[ 1758.967206] drbd milliman ac-1f-6b-9e-e5-46: Preparing remote state change 1249144741
[ 1759.000135] drbd milliman ac-1f-6b-9e-e5-46: Committing remote state change 1249144741 (primary_nodes=1)
[ 1759.010216] drbd milliman ac-1f-6b-9e-e5-46: peer( Secondary -> Primary )
[ 1759.069750] drbd milliman/1 drbd1001: disk( Outdated -> Inconsistent )
[ 1759.076966] drbd milliman/1 drbd1001 ac-1f-6b-a5-ab-ea: resync-susp( no -> connection dependency )
[ 1759.086600] drbd milliman/1 drbd1001 ac-1f-6b-9e-e5-46: repl( WFBitMapT -> SyncTarget )
[ 1759.095740] drbd milliman/0 drbd1000: disk( Outdated -> Inconsistent )
[ 1759.102917] drbd milliman/0 drbd1000 ac-1f-6b-a5-ab-ea: resync-susp( no -> connection dependency )
[ 1759.112497] drbd milliman/0 drbd1000 ac-1f-6b-9e-e5-46: repl( WFBitMapT -> SyncTarget )
[ 1759.121267] drbd milliman/1 drbd1001 ac-1f-6b-9e-e5-46: Began resync as SyncTarget (will sync 5066752 KB [1266688 bits set]).
[ 1759.133258] drbd milliman/0 drbd1000 ac-1f-6b-9e-e5-46: Began resync as SyncTarget (will sync 32768 KB [8192 bits set]).
[ 1759.133451] drbd milliman/0 drbd1000 ac-1f-6b-9e-e5-46: received new current UUID: DBD3CCFBFA3D8BAF weak_nodes=FFFFFFFFFFFFFFFC
[ 1759.263009] drbd milliman/1 drbd1001 ac-1f-6b-9e-e5-46: received new current UUID: 7AD2E749AAAFFC69 weak_nodes=FFFFFFFFFFFFFFFC
[ 1760.028877] drbd milliman/0 drbd1000 ac-1f-6b-9e-e5-46: Resync done (total 1 sec; paused 0 sec; 32768 K/sec)
[ 1760.039382] drbd milliman/0 drbd1000 ac-1f-6b-9e-e5-46: updated UUIDs DBD3CCFBFA3D8BAE:0000000000000000:C6FAEE622D6CFFFA:0000000000000000
[ 1760.053003] drbd milliman/0 drbd1000: disk( Inconsistent -> UpToDate )
[ 1760.060112] drbd milliman/0 drbd1000 ac-1f-6b-a5-ab-ea: resync-susp( connection dependency -> no )
[ 1760.069638] drbd milliman/0 drbd1000 ac-1f-6b-9e-e5-46: repl( SyncTarget -> Established )
[ 1760.079754] drbd milliman/0 drbd1000 ac-1f-6b-9e-e5-46: helper command: /sbin/drbdadm after-resync-target
[ 1760.091674] drbd milliman/0 drbd1000 ac-1f-6b-9e-e5-46: helper command: /sbin/drbdadm after-resync-target exit code 0
[ 1814.063918] drbd milliman/1 drbd1001 ac-1f-6b-9e-e5-46: Resync done (total 54 sec; paused 0 sec; 93828 K/sec)
[ 1814.074464] drbd milliman/1 drbd1001 ac-1f-6b-9e-e5-46: updated UUIDs 7AD2E749AAAFFC68:0000000000000000:9656137FABC73162:0000000000000000
[ 1814.087919] drbd milliman/1 drbd1001: disk( Inconsistent -> UpToDate )
[ 1814.094958] drbd milliman/1 drbd1001 ac-1f-6b-a5-ab-ea: resync-susp( connection dependency -> no )
[ 1814.104426] drbd milliman/1 drbd1001 ac-1f-6b-9e-e5-46: repl( SyncTarget -> Established )
[ 1814.117246] drbd milliman/1 drbd1001 ac-1f-6b-9e-e5-46: helper command: /sbin/drbdadm after-resync-target
[ 1814.132686] drbd milliman/1 drbd1001 ac-1f-6b-9e-e5-46: helper command: /sbin/drbdadm after-resync-target exit code 0
[ 1832.336993] drbd demo0/3 drbd1005: meta-data IO uses: blk-bio
[ 1832.341362] drbd demo0/3 drbd1005: disabling discards due to peer capabilities
[ 1832.344636] drbd demo0: State change failed: In transient state, retry after next state change
[ 1832.360685] drbd demo0/3 drbd1005: Failed: disk( Diskless -> Attaching )
[ 1832.368103] drbd demo0/3 drbd1005 ac-1f-6b-9e-e5-46: self 0000000000000000:0000000000000000:0000000000000000:0000000000000000 bits:0 flags:0
[ 1832.368109] drbd demo0: State change failed: In transient state, retry after next state change
[ 1832.382029] drbd demo0/3 drbd1005 ac-1f-6b-9e-e5-46: peer's exposed UUID: 0000000000000000
[ 1832.391198] drbd demo0/3 drbd1005: Failed: disk( Diskless -> Attaching )
[ 1832.407293] drbd demo0/3 drbd1005: disabling discards due to peer capabilities
[ 1832.415066] drbd demo0/3 drbd1005 ac-1f-6b-a5-ab-ea: self 0000000000000000:0000000000000000:0000000000000000:0000000000000000 bits:0 flags:0
[ 1832.428722] drbd demo0/3 drbd1005 ac-1f-6b-a5-ab-ea: peer's exposed UUID: 0000000000000000
[ 1832.437526] drbd demo0/3 drbd1005 ac-1f-6b-a5-ab-ea: pdsk( DUnknown -> Diskless ) repl( Off -> Established )
[ 1832.447954] drbd demo0: State change failed: In transient state, retry after next state change
[ 1832.457104] drbd demo0/3 drbd1005: Failed: disk( Diskless -> Attaching )
[ 1832.464352] BUG: kernel NULL pointer dereference, address: 0000000000000010
[ 1832.464439] drbd demo0: State change failed: In transient state, retry after next state change
[ 1832.471880] #PF: supervisor read access in kernel mode
[ 1832.481074] drbd demo0/3 drbd1005: Failed: disk( Diskless -> Attaching )
[ 1832.486774] #PF: error_code(0x0000) - not-present page
[ 1832.499765] PGD 0 P4D 0
[ 1832.502896] Oops: 0000 [#1] SMP NOPTI
[ 1832.507123] CPU: 0 PID: 83920 Comm: drbd_r_demo0 Tainted: P           OE     5.4.205-1.el8.elrepo.x86_64 #1
[ 1832.517442] Hardware name: Supermicro SYS-1029U-TN10RT/X11DPU, BIOS 3.1 04/29/2019
[ 1832.525626] RIP: 0010:drbd_determine_dev_size+0x5a/0x520 [drbd]
[ 1832.532118] Code: 00 48 89 44 24 78 31 c0 e8 73 e1 ff ff 48 c7 c6 b0 d4 8c c0 48 89 df e8 a4 7d fe ff 48 89 44 24 08 48 85 c0 0f 84 4a 04 00 00 <49> 8b 47 10 4d 8b 77 18 48 89 04 24 41 8b 47 48 89 44 24 18 49 8b
[ 1832.551934] RSP: 0018:ffffaaaff1587d00 EFLAGS: 00010286
[ 1832.557700] RAX: ffff9c7326224000 RBX: ffff9c72d0a1e000 RCX: 0000000000000000
[ 1832.565368] RDX: 0000000000000001 RSI: ffffffffc08cd4b0 RDI: ffff9c72d0a1e000
[ 1832.573019] RBP: 0000000000000000 R08: 0000000000000332 R09: 000000000002ea40
[ 1832.580667] R10: 0000000000008905 R11: 0000000000004482 R12: 0000000000000000
[ 1832.588284] R13: 0000000000000000 R14: ffff9c734142d000 R15: 0000000000000000
[ 1832.595889] FS:  0000000000000000(0000) GS:ffff9c1380600000(0000) knlGS:0000000000000000
[ 1832.604466] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1832.610682] CR2: 0000000000000010 CR3: 000000a9a340a003 CR4: 00000000007606f0
[ 1832.618287] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1832.625903] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1832.633525] PKRU: 55555554
[ 1832.636734] Call Trace:
[ 1832.639648]  ? printk+0x58/0x6f
[ 1832.643241]  receive_state+0x5f7/0x1040 [drbd]
[ 1832.648125]  ? drbd_recv+0x49/0x200 [drbd]
[ 1832.652692]  ? decode_header+0x17/0x130 [drbd]
[ 1832.657606]  ? _get_ldev_if_state.part.51+0xd0/0xd0 [drbd]
[ 1832.663555]  drbd_receiver+0x5a6/0x7f0 [drbd]
[ 1832.668351]  ? __drbd_next_peer_device_ref+0x140/0x140 [drbd]
[ 1832.674534]  drbd_thread_setup+0x5e/0x160 [drbd]
[ 1832.679594]  ? __drbd_next_peer_device_ref+0x140/0x140 [drbd]
[ 1832.685790]  kthread+0x10c/0x130
[ 1832.689467]  ? kthread_park+0x80/0x80
[ 1832.693578]  ret_from_fork+0x1f/0x40
[ 1832.697591] Modules linked in: zfs(POE) zunicode(POE) zzstd(OE) zlua(OE) zavl(POE) icp(POE) zcommon(POE) znvpair(POE) spl(OE) drbd_transport_tcp(OE) drbd(OE) bcache(E) crc64(E) dm_cache(E) dm_persistent_data(E) dm_bio_prison(E) dm_bufio(E) dm_writecache(E) nvme_rdma(E) nvmet_rdma(E) rdma_cm(E) iw_cm(E) ib_cm(E) ib_core(E) 8021q(E) garp(E) mrp(E) stp(E) llc(E) intel_rapl_msr(E) intel_rapl_common(E) iTCO_wdt(E) iTCO_vendor_support(E) skx_edac(E) nfit(E) libnvdimm(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) kvm(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) rfkill(E) ghash_clmulni_intel(E) rapl(E) intel_cstate(E) mei_me(E) ipmi_ssif(E) sr_mod(E) cdrom(E) intel_uncore(E) pcspkr(E) sunrpc(E) sg(E) joydev(E) i2c_i801(E) lpc_ich(E) mei(E) ioatdma(E) ipmi_si(E) acpi_power_meter(E) acpi_pad(E) vfat(E) fat(E) dm_mod(E) uas(E) usb_storage(E) xfs(E) ast(E) i2c_algo_bit(E) libcrc32c(E) drm_vram_helper(E) ttm(E) nvmet_tcp(E) drm_kms_helper(E) ixgbe(E) nvmet(E)
[ 1832.697619]  syscopyarea(E) ahci(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) nvme_tcp(E) libahci(E) nvme_fabrics(E) crc32c_intel(E) drm(E) mdio(E) libata(E) dca(E) wmi(E) nvme(E) nvme_core(E) ipmi_devintf(E) ipmi_msghandler(E)
[ 1832.808915] CR2: 0000000000000010
[ 1832.812751] ---[ end trace 1dbb53d7f2280dec ]---
[ 1832.876741] RIP: 0010:drbd_determine_dev_size+0x5a/0x520 [drbd]
[ 1832.883115] Code: 00 48 89 44 24 78 31 c0 e8 73 e1 ff ff 48 c7 c6 b0 d4 8c c0 48 89 df e8 a4 7d fe ff 48 89 44 24 08 48 85 c0 0f 84 4a 04 00 00 <49> 8b 47 10 4d 8b 77 18 48 89 04 24 41 8b 47 48 89 44 24 18 49 8b
[ 1832.902799] RSP: 0018:ffffaaaff1587d00 EFLAGS: 00010286
[ 1832.908514] RAX: ffff9c7326224000 RBX: ffff9c72d0a1e000 RCX: 0000000000000000
[ 1832.916117] RDX: 0000000000000001 RSI: ffffffffc08cd4b0 RDI: ffff9c72d0a1e000
[ 1832.923724] RBP: 0000000000000000 R08: 0000000000000332 R09: 000000000002ea40
[ 1832.931319] R10: 0000000000008905 R11: 0000000000004482 R12: 0000000000000000
[ 1832.938913] R13: 0000000000000000 R14: ffff9c734142d000 R15: 0000000000000000
[ 1832.946497] FS:  0000000000000000(0000) GS:ffff9c1380600000(0000) knlGS:0000000000000000
[ 1832.955026] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1832.961234] CR2: 0000000000000010 CR3: 000000a9a340a003 CR4: 00000000007606f0
[ 1832.968843] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1832.976426] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1832.983991] PKRU: 55555554
[ 1832.987134] Kernel panic - not syncing: Fatal exception
[ 1832.992928] Kernel Offset: 0x22800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 1833.063038] ---[ end Kernel panic - not syncing: Fatal exception ]---

Kernel: 5.14.21:

[  350.867400] drbd milliman/0 drbd1000 ac-1f-6b-a4-df-ee: pdsk( DUnknown -> UpToDate ) repl( Off -> Established )
[  350.877981] drbd milliman/1 drbd1001: quorum( no -> yes )
[  350.883886] drbd milliman/1 drbd1001 ac-1f-6b-a4-df-ee: pdsk( DUnknown -> UpToDate ) repl( Off -> Established )
[ 9754.631374] drbd demo0: Starting worker thread (from drbdsetup [5294])
[ 9754.641352] drbd demo0 ac-1f-6b-a4-df-ee: Starting sender thread (from drbdsetup [5302])
[ 9754.663695] drbd demo0/0 drbd1002: meta-data IO uses: blk-bio
[ 9754.670334] drbd demo0/0 drbd1002: disk( Diskless -> Attaching )
[ 9754.676960] drbd demo0/0 drbd1002: Maximum number of peer devices = 7
[ 9754.684075] drbd demo0: Method to ensure write ordering: flush
[ 9754.690514] drbd demo0/0 drbd1002: drbd_bm_resize called with capacity == 131080
[ 9754.698516] drbd demo0/0 drbd1002: resync bitmap: bits=16385 words=1799 pages=4
[ 9754.706406] drbd1002: detected capacity change from 0 to 131080
[ 9754.712915] drbd demo0/0 drbd1002: size = 64 MB (65540 KB)
[ 9754.719120] drbd demo0/0 drbd1002: recounting of set bits took additional 0ms
[ 9754.726831] drbd demo0/0 drbd1002: disk( Attaching -> UpToDate )
[ 9754.733397] drbd demo0/0 drbd1002: attached to current UUID: ECDCAF858EE6D814
[ 9754.741100] drbd demo0/0 drbd1002: size = 64 MB (65540 KB)
[ 9754.774618] drbd demo0/1 drbd1003: meta-data IO uses: blk-bio
[ 9754.781254] drbd demo0/1 drbd1003: disk( Diskless -> Attaching )
[ 9754.787826] drbd demo0/1 drbd1003: Maximum number of peer devices = 7
[ 9754.794870] drbd demo0/1 drbd1003: drbd_bm_resize called with capacity == 209715208
[ 9754.820604] drbd demo0/1 drbd1003: resync bitmap: bits=26214401 words=2867207 pages=5601
[ 9754.829279] drbd1003: detected capacity change from 0 to 209715208
[ 9754.836024] drbd demo0/1 drbd1003: size = 100 GB (104857604 KB)
[ 9754.879087] drbd demo0/1 drbd1003: recounting of set bits took additional 13ms
[ 9754.895230] drbd demo0/1 drbd1003: disk( Attaching -> UpToDate )
[ 9754.901761] drbd demo0/1 drbd1003: attached to current UUID: 6048848BCEFDCF0A
[ 9754.909484] drbd demo0/1 drbd1003: size = 100 GB (104857604 KB)
[ 9754.911097] drbd demo0 ac-1f-6b-a4-df-ee: conn( StandAlone -> Unconnected )
[ 9754.923894] drbd demo0 ac-1f-6b-a4-df-ee: Starting receiver thread (from drbd_w_demo0 [5295])
[ 9754.933155] drbd demo0 ac-1f-6b-a4-df-ee: conn( Unconnected -> Connecting )
[ 9755.446373] drbd demo0 ac-1f-6b-a4-df-ee: Handshake to peer 0 successful: Agreed network protocol version 121
[ 9755.456896] drbd demo0 ac-1f-6b-a4-df-ee: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
[ 9755.470894] drbd demo0 ac-1f-6b-a4-df-ee: Peer authenticated using 20 bytes HMAC
[ 9755.478807] drbd demo0 ac-1f-6b-a4-df-ee: Starting ack_recv thread (from drbd_r_demo0 [5444])
[ 9755.521518] drbd demo0 ac-1f-6b-a4-df-ee: Preparing remote state change 1818502727
[ 9755.543252] drbd demo0/0 drbd1002 ac-1f-6b-a4-df-ee: drbd_sync_handshake:
[ 9755.550472] drbd demo0/0 drbd1002 ac-1f-6b-a4-df-ee: self ECDCAF858EE6D814:0000000000000000:0000000000000000:0000000000000000 bits:0 flags:20
[ 9755.564013] drbd demo0/0 drbd1002 ac-1f-6b-a4-df-ee: peer E62158D3D1AFA478:ECDCAF858EE6D814:0000000000000000:0000000000000000 bits:16385 flags:20
[ 9755.577934] drbd demo0/0 drbd1002 ac-1f-6b-a4-df-ee: uuid_compare()=target-use-bitmap by rule=bitmap-peer
[ 9755.602244] drbd demo0/1 drbd1003 ac-1f-6b-a4-df-ee: drbd_sync_handshake:
[ 9755.609477] drbd demo0/1 drbd1003 ac-1f-6b-a4-df-ee: self 6048848BCEFDCF0A:0000000000000000:0000000000000000:0000000000000000 bits:0 flags:20
[ 9755.623038] drbd demo0/1 drbd1003 ac-1f-6b-a4-df-ee: peer 6048848BCEFDCF0A:0000000000000000:0000000000000000:0000000000000000 bits:0 flags:20
[ 9755.636634] drbd demo0/1 drbd1003 ac-1f-6b-a4-df-ee: uuid_compare()=no-sync by rule=both-off
[ 9755.683092] drbd demo0 ac-1f-6b-a4-df-ee: Committing remote state change 1818502727 (primary_nodes=0)
[ 9755.692790] drbd demo0 ac-1f-6b-a4-df-ee: conn( Connecting -> Connected ) peer( Unknown -> Secondary )
[ 9755.702545] drbd demo0/0 drbd1002: disk( UpToDate -> Outdated )
[ 9755.708922] drbd demo0/0 drbd1002 ac-1f-6b-a4-df-ee: pdsk( DUnknown -> UpToDate ) repl( Off -> WFBitMapT )
[ 9755.719032] drbd demo0/1 drbd1003 ac-1f-6b-a4-df-ee: pdsk( DUnknown -> UpToDate ) repl( Off -> Established )
[ 9755.738450] drbd demo0/0 drbd1002 ac-1f-6b-a4-df-ee: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 21(1), total 21; compression: 99.0%
[ 9755.760668] drbd demo0/0 drbd1002 ac-1f-6b-a4-df-ee: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 21(1), total 21; compression: 99.0%
[ 9755.782646] drbd demo0/0 drbd1002 ac-1f-6b-a4-df-ee: helper command: /sbin/drbdadm before-resync-target
[ 9755.800952] drbd demo0/0 drbd1002 ac-1f-6b-a4-df-ee: helper command: /sbin/drbdadm before-resync-target exit code 0
[ 9755.820258] drbd demo0/0 drbd1002: disk( Outdated -> Inconsistent )
[ 9755.827064] drbd demo0/0 drbd1002 ac-1f-6b-a4-df-ee: repl( WFBitMapT -> SyncTarget )
[ 9755.835374] drbd demo0/0 drbd1002 ac-1f-6b-a4-df-ee: Began resync as SyncTarget (will sync 65540 KB [16385 bits set]).
[ 9756.385396] drbd demo0 ac-1f-6b-a4-df-ee: Preparing remote state change 3350157584
[ 9756.425889] drbd demo0 ac-1f-6b-a4-df-ee: Committing remote state change 3350157584 (primary_nodes=1)
[ 9756.435640] drbd demo0 ac-1f-6b-a4-df-ee: peer( Secondary -> Primary )
[ 9758.280497] drbd demo0/0 drbd1002 ac-1f-6b-a4-df-ee: Resync done (total 2 sec; paused 0 sec; 32768 K/sec)
[ 9758.290632] drbd demo0/0 drbd1002 ac-1f-6b-a4-df-ee: updated UUIDs E62158D3D1AFA478:0000000000000000:0000000000000000:0000000000000000
[ 9758.303782] drbd demo0/0 drbd1002: disk( Inconsistent -> UpToDate )
[ 9758.310611] drbd demo0/0 drbd1002 ac-1f-6b-a4-df-ee: repl( SyncTarget -> Established )
[ 9758.319908] drbd demo0/0 drbd1002 ac-1f-6b-a4-df-ee: helper command: /sbin/drbdadm after-resync-target
[ 9758.330851] drbd demo0/0 drbd1002 ac-1f-6b-a4-df-ee: helper command: /sbin/drbdadm after-resync-target exit code 0
[ 9778.139037] drbd demo0/2 drbd1004: meta-data IO uses: blk-bio
[ 9778.145730] drbd demo0: State change failed: In transient state, retry after next state change
[ 9778.154976] drbd demo0/2 drbd1004: Failed: disk( Diskless -> Attaching )
[ 9778.162307] drbd demo0: State change failed: In transient state, retry after next state change
[ 9778.171518] drbd demo0/2 drbd1004: Failed: disk( Diskless -> Attaching )
[ 9778.472135] drbd demo0/2 drbd1004: disabling discards due to peer capabilities
[ 9778.480103] drbd demo0/2 drbd1004 ac-1f-6b-a4-df-ee: self 0000000000000000:0000000000000000:0000000000000000:0000000000000000 bits:0 flags:0
[ 9778.493887] drbd demo0/2 drbd1004 ac-1f-6b-a4-df-ee: peer's exposed UUID: 0000000000000000
[ 9778.511106] BUG: kernel NULL pointer dereference, address: 0000000000000018
[ 9778.518618] #PF: supervisor read access in kernel mode
[ 9778.524305] #PF: error_code(0x0000) - not-present page
[ 9778.529987] PGD 0 P4D 0
[ 9778.533056] Oops: 0000 [#1] SMP NOPTI
[ 9778.537250] CPU: 1 PID: 5444 Comm: drbd_r_demo0 Tainted: P S         OE     5.14.21 #1
[ 9778.545682] Hardware name: Supermicro SYS-1029U-TN10RT/X11DPU, BIOS 3.1 04/29/2019
[ 9778.553758] RIP: 0010:drbd_determine_dev_size+0x5a/0x550 [drbd]
[ 9778.560208] Code: 00 48 89 44 24 78 31 c0 e8 13 e1 ff ff 48 c7 c6 f0 04 2c c1 48 89 df e8 14 72 fe ff 48 89 44 24 10 48 85 c0 0f 84 73 04 00 00 <49> 8b 47 18 48 89 04 24 49 8b 47 10 48 89 44 24 08 41 8b 47 48 89
[ 9778.580031] RSP: 0018:ffffb2d58151bd00 EFLAGS: 00010286
[ 9778.585772] RAX: ffff89be458a5000 RBX: ffff89be9365c000 RCX: 0000000000000000
[ 9778.593410] RDX: 0000000000000001 RSI: ffffffffc12c04f0 RDI: ffff89be9365c000
[ 9778.601035] RBP: 0000000000000000 R08: 0000000000000140 R09: 0000000000000180
[ 9778.608656] R10: 0000000000000140 R11: 0000000000004afc R12: 0000000000000000
[ 9778.616268] R13: ffff89bfb7160800 R14: 0000000000000000 R15: 0000000000000000
[ 9778.623881] FS:  0000000000000000(0000) GS:ffff89bcc0840000(0000) knlGS:0000000000000000
[ 9778.632444] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9778.638660] CR2: 0000000000000018 CR3: 0000007dc6e0a006 CR4: 00000000007706e0
[ 9778.646293] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 9778.653898] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 9778.661486] PKRU: 55555554
[ 9778.664647] Call Trace:
[ 9778.667539]  <TASK>
[ 9778.670062]  ? vprintk_emit+0x128/0x270
[ 9778.674333]  ? printk+0x58/0x6f
[ 9778.677900]  receive_state+0x5f5/0x1080 [drbd]
[ 9778.682779]  ? receive_uuids110+0x570/0x570 [drbd]
[ 9778.687996]  ? drbd_recv+0x46/0x220 [drbd]
[ 9778.692510]  ? decode_header+0x17/0x140 [drbd]
[ 9778.697368]  ? receive_uuids110+0x570/0x570 [drbd]
[ 9778.702565]  drbd_receiver+0x598/0x830 [drbd]
[ 9778.707327]  drbd_thread_setup+0x76/0x1b0 [drbd]
[ 9778.712347]  ? __drbd_next_peer_device_ref+0x1a0/0x1a0 [drbd]
[ 9778.718485]  kthread+0x118/0x140
[ 9778.722092]  ? set_kthread_struct+0x40/0x40
[ 9778.726649]  ret_from_fork+0x1f/0x30
[ 9778.730601]  </TASK>
[ 9778.733164] Modules linked in: drbd_transport_tcp(OE) drbd(OE) bcache crc64 dm_cache dm_persistent_data dm_bio_prison dm_bufio dm_writecache nvme_rdma nvmet_rdma rdma_cm iw_cm ib_cm ib_core dm_mod 8021q garp mrp stp llc rfkill sunrpc intel_rapl_msr intel_rapl_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm iTCO_wdt iTCO_vendor_support irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl intel_cstate ipmi_ssif vfat mei_me fat i2c_i801 intel_uncore joydev pcspkr mei acpi_ipmi ioatdma i2c_smbus lpc_ich ipmi_si acpi_power_meter acpi_pad binfmt_misc zfs(POE) zunicode(POE) zzstd(OE) zlua(OE) zavl(POE) icp(POE) zcommon(POE) znvpair(POE) spl(OE) nvmet_tcp nvmet nvme_tcp nvme_fabrics xfs libcrc32c sd_mod sg ast i2c_algo_bit drm_vram_helper drm_kms_helper syscopyarea sysfilblt fb_sys_fops drm_ttm_helper ttm drm ixgbe ahci libahci nvme uas nvme_core libata crc32c_intel usb_storage mdio t10_pi dca wmi ipmi_devintf ipmi_msghandler
[ 9778.823619] CR2: 0000000000000018
[ 9778.827377] ---[ end trace 09a2a2ea66dcaf4b ]---
[ 9778.895569] RIP: 0010:drbd_determine_dev_size+0x5a/0x550 [drbd]
[ 9778.901918] Code: 00 48 89 44 24 78 31 c0 e8 13 e1 ff ff 48 c7 c6 f0 04 2c c1 48 89 df e8 14 72 fe ff 48 89 44 24 10 48 85 c0 0f 84 73 04 00 00 <49> 8b 47 18 48 89 04 24 49 8b 47 10 48 89 44 24 08 41 8b 47 48 89
[ 9778.921526] RSP: 0018:ffffb2d58151bd00 EFLAGS: 00010286
[ 9778.927182] RAX: ffff89be458a5000 RBX: ffff89be9365c000 RCX: 0000000000000000
[ 9778.934752] RDX: 0000000000000001 RSI: ffffffffc12c04f0 RDI: ffff89be9365c000
[ 9778.942314] RBP: 0000000000000000 R08: 0000000000000140 R09: 0000000000000180
[ 9778.949877] R10: 0000000000000140 R11: 0000000000004afc R12: 0000000000000000
[ 9778.957422] R13: ffff89bfb7160800 R14: 0000000000000000 R15: 0000000000000000
[ 9778.964962] FS:  0000000000000000(0000) GS:ffff89bcc0840000(0000) knlGS:0000000000000000
[ 9778.973455] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9778.979608] CR2: 0000000000000018 CR3: 0000007dc6e0a006 CR4: 00000000007706e0
[ 9778.987151] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 9778.994689] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 9779.002226] PKRU: 55555554
[ 9779.005349] Kernel panic - not syncing: Fatal exception
[ 9779.011059] Kernel Offset: 0x36600000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 9779.064919] ---[ end Kernel panic - not syncing: Fatal exception ]---
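
Both traces can be read the same way. The byte marked with <..> on the "Code:" line is where the faulting instruction starts, R15 is 0 in both register dumps, and CR2 (0x10 on 5.4.205, 0x18 on 5.14.21) equals the displacement encoded in that instruction. The sketch below decodes only this one x86-64 instruction form (a 64-bit MOV load with an 8-bit displacement) to show the match. It is a hand-rolled illustration, not a general disassembler, and it says nothing about which pointer inside drbd_determine_dev_size was NULL.

```python
# Register names indexed by the 4-bit register number.
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
        "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def faulting_bytes(code_line):
    """Return the bytes from the <xx> marker onward in an oops 'Code:' line."""
    tokens = code_line.split("Code:", 1)[1].split()
    start = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    return [int(tok.strip("<>"), 16) for tok in tokens[start:]]


def decode_mov_load_disp8(insn):
    """Decode 'REX.W 8B /r' with mod=01: mov disp8(%base),%dest.

    Returns (dest, base, displacement); raises ValueError for anything else.
    """
    rex, opcode, modrm, disp = insn[:4]
    if rex & 0xF8 != 0x48 or opcode != 0x8B:
        raise ValueError("not a REX.W MOV r64, r/m64")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 1 or rm == 4:
        raise ValueError("only the [base + disp8] form without SIB is handled")
    dest = REGS[reg | ((rex & 0x04) << 1)]  # REX.R extends the reg field
    base = REGS[rm | ((rex & 0x01) << 3)]   # REX.B extends the r/m field
    if disp >= 0x80:                        # disp8 is sign-extended
        disp -= 0x100
    return dest, base, disp


# Tail of each "Code:" line and the R15 value from the two traces above.
traces = {
    "5.4.205": ("Code: 0f 84 4a 04 00 00 <49> 8b 47 10 4d 8b 77 18", 0x0),
    "5.14.21": ("Code: 0f 84 73 04 00 00 <49> 8b 47 18 48 89 04 24", 0x0),
}
for kernel, (code_line, r15) in traces.items():
    dest, base, disp = decode_mov_load_disp8(faulting_bytes(code_line))
    print(f"{kernel}: mov {disp:#x}(%{base}),%{dest} faults at {r15 + disp:#x}")
```

In both kernels the load goes through R15 with a small offset, which is consistent with reading a member of a structure through a NULL pointer.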

DRBD 9.1.12: Quorum lost on secondary

Several resources are shown with status "quorum lost" in drbdmon but only on secondaries and only on one node. IO is working fine.

This is the output of drbdsetup show:

root@de-fra-node11:/# drbdsetup show pvc-96c2dc8b-31aa-4b28-8ab7-332b58de64b5
resource "pvc-96c2dc8b-31aa-4b28-8ab7-332b58de64b5" {
    options {
        on-no-data-accessible   suspend-io;
        quorum                  majority;
        quorum-minimum-redundancy       2;
        on-suspended-primary-outdated   force-secondary;
    }
    _this_host {
        node-id                 1;
        volume 0 {
            device                      minor 1176;
            disk                        "/dev/vg-pool/pvc-96c2dc8b-31aa-4b28-8ab7-332b58de64b5_00000";
            meta-disk                   internal;
            disk {
                rs-discard-granularity  65536; # bytes
            }
        }
    }
    connection {
        _peer_node_id 2;
        path {
            _this_host ipv4 192.168.10.214:7176;
            _remote_host ipv4 192.168.11.191:7176;
        }
        net {
            cram-hmac-alg       "sha1";
            shared-secret       "JLYRWkUgIJjidWuQ2DTX";
            rr-conflict         retry-connect;
            verify-alg          "sha1";
            max-buffers         12288;
            _name               "de-fra-node54";
        }
    }
    connection {
        _peer_node_id 0;
        path {
            _this_host ipv4 192.168.10.214:7176;
            _remote_host ipv4 192.168.11.193:7176;
        }
        net {
            cram-hmac-alg       "sha1";
            shared-secret       "JLYRWkUgIJjidWuQ2DTX";
            rr-conflict         retry-connect;
            verify-alg          "sha1";
            max-buffers         12288;
            _name               "de-fra-node56";
        }
    }
}

And this is the output of drbdadm status:

root@de-fra-node11:/# drbdadm status pvc-96c2dc8b-31aa-4b28-8ab7-332b58de64b5
pvc-96c2dc8b-31aa-4b28-8ab7-332b58de64b5 role:Secondary suspended:quorum
  disk:UpToDate quorum:no blocked:upper
  de-fra-node54 role:Primary
    peer-disk:UpToDate
  de-fra-node56 role:Secondary
    peer-disk:UpToDate

This is drbdmon:

⤷RES: pvc-96c2dc8b-31aa-4b28-8ab7-332b58de64b5          Secondary   QUORUM LOST
    ✗    0:   1176  UpToDate            
    ⚬⇄⚫de-fra-node54 ⚬⇄ de-fra-node56

On the other peers drbdadm reports no issues.

Why does node11 show a quorum issue while node54+56 show the disk as "UpToDate"? Shouldn't the peers show the issue as well? And shouldn't DRBD try to recover from the "no quorum" state automatically?

N.B. In LINSTOR the volume is shown as "UpToDate" on all nodes while drbdmon reports "QUORUM LOST", which makes the problem difficult to detect.
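Since the lost quorum is only visible in the local drbdadm status output, one way to detect it is to scan that output on each node. The following is a minimal sketch, assuming the text format shown above (an unindented line per resource, followed by indented lines carrying fields such as quorum:no). The function name and the embedded sample are illustrative only; this is not part of any DRBD or LINSTOR tooling.

```python
# Sample copied from the "drbdadm status" output shown in this report.
SAMPLE_STATUS = """\
pvc-96c2dc8b-31aa-4b28-8ab7-332b58de64b5 role:Secondary suspended:quorum
  disk:UpToDate quorum:no blocked:upper
  de-fra-node54 role:Primary
    peer-disk:UpToDate
  de-fra-node56 role:Secondary
    peer-disk:UpToDate
"""


def resources_without_quorum(status_text):
    """Return the names of resources whose section contains a quorum:no field."""
    lost = []
    current = None
    for line in status_text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # An unindented line starts a new resource: "<name> role:<Role> ..."
            current = line.split()[0]
        elif current is not None and "quorum:no" in line.split():
            if current not in lost:
                lost.append(current)
    return lost


if __name__ == "__main__":
    for name in resources_without_quorum(SAMPLE_STATUS):
        print("quorum lost:", name)
```

In practice the text would come from running drbdadm status on each node, since the peers in this report do not show the problem.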

drbd-9.1.7 & upstream build errors on Centos Stream 9

Hi

I'm trying to build drbd from source (9.1.7 tarball and upstream) on CentOS Stream 9 (20220705). Both builds end with the same errors, as shown in the logs below:

9.1.7 tarball error log

[root@node drbd-9.1.7]# make
Need a git checkout to regenerate drbd/.drbd_git_revision
make[1]: Entering directory '/root/drbd-9.1.7/drbd'

    Calling toplevel makefile of kernel source tree, which I believe is in
    KDIR=/lib/modules/5.14.0-124.el9.x86_64/build

make -C /lib/modules/5.14.0-124.el9.x86_64/build   M=/root/drbd-9.1.7/drbd  modules
  COMPAT  __vmalloc_has_2_params
  COMPAT  before_4_13_kernel_read
  COMPAT  blkdev_issue_zeroout_discard
  COMPAT  can_include_vermagic_h
  COMPAT  genl_policy_in_ops
  COMPAT  have_BIO_MAX_VECS
  COMPAT  have_CRYPTO_TFM_NEED_KEY
  COMPAT  have_SHASH_DESC_ON_STACK
  COMPAT  have_WB_congested_enum
  COMPAT  have_allow_kernel_signal
  COMPAT  have_bdgrab
  COMPAT  have_bdi_congested_fn
  COMPAT  have_bio_bi_bdev
  COMPAT  have_bio_bi_error
  COMPAT  have_bio_bi_opf
  COMPAT  have_bio_bi_status
  COMPAT  have_bio_clone_fast
  COMPAT  have_bio_op_shift
  COMPAT  have_bio_set_dev
  COMPAT  have_bio_set_op_attrs
  COMPAT  have_bio_start_io_acct
  COMPAT  have_bioset_init
  COMPAT  have_bioset_need_bvecs
  COMPAT  have_blk_alloc_disk
  COMPAT  have_blk_alloc_queue_rh
  COMPAT  have_blk_check_plugged
  COMPAT  have_blk_qc_t_make_request
  COMPAT  have_blk_qc_t_submit_bio
  COMPAT  have_blk_queue_flag_set
  COMPAT  have_blk_queue_make_request
  COMPAT  have_blk_queue_merge_bvec
  COMPAT  have_blk_queue_split_bio
  COMPAT  have_blk_queue_split_q_bio
  COMPAT  have_blk_queue_split_q_bio_bioset
  COMPAT  have_blk_queue_update_readahead
  COMPAT  have_blk_queue_write_cache
  COMPAT  have_d_inode
  COMPAT  have_disk_update_readahead
  COMPAT  have_fallthrough
  COMPAT  have_fs_dax_get_by_bdev
  COMPAT  have_generic_start_io_acct_q_rw_sect_part
  COMPAT  have_generic_start_io_acct_rw_sect_part
  COMPAT  have_genl_family_parallel_ops
  COMPAT  have_hd_struct
  COMPAT  have_ib_cq_init_attr
  COMPAT  have_ib_get_dma_mr
  COMPAT  have_idr_is_empty
  COMPAT  have_inode_lock
  COMPAT  have_ktime_to_timespec64
  COMPAT  have_kvfree
  COMPAT  have_max_send_recv_sge
  COMPAT  have_nla_nest_start_noflag
  COMPAT  have_nla_parse_deprecated
  COMPAT  have_nla_put_64bit
  COMPAT  have_nla_strscpy
  COMPAT  have_part_stat_h
  COMPAT  have_part_stat_read_accum
  COMPAT  have_pointer_backing_dev_info
  COMPAT  have_proc_create_single
  COMPAT  have_queue_flag_stable_writes
  COMPAT  have_rb_declare_callbacks_max
  COMPAT  have_refcount_inc
  COMPAT  have_req_hardbarrier
  COMPAT  have_req_noidle
  COMPAT  have_req_nounmap
  COMPAT  have_req_op_write
  COMPAT  have_req_op_write_zeroes
  COMPAT  have_req_write
  COMPAT  have_revalidate_disk_size
  COMPAT  have_sched_set_fifo
  COMPAT  have_security_netlink_recv
  COMPAT  have_sendpage_ok
  COMPAT  have_set_capacity_and_notify
  COMPAT  have_shash_desc_zero
  COMPAT  have_simple_positive
  COMPAT  have_sock_set_keepalive
  COMPAT  have_struct_bvec_iter
  COMPAT  have_struct_size
  COMPAT  have_submit_bio_noacct
  COMPAT  have_tcp_sock_set_cork
  COMPAT  have_tcp_sock_set_nodelay
  COMPAT  have_tcp_sock_set_quickack
  COMPAT  have_time64_to_tm
  COMPAT  have_timer_setup
  COMPAT  have_void_make_request
  COMPAT  have_void_submit_bio
  COMPAT  ib_alloc_pd_has_2_params
  COMPAT  ib_device_has_ops
  COMPAT  ib_post_send_const_params
  COMPAT  ib_query_device_has_3_params
  COMPAT  need_make_request_recursion
  COMPAT  part_stat_read_takes_block_device
  COMPAT  queue_limits_has_discard_zeroes_data
  COMPAT  rdma_create_id_has_net_ns
  COMPAT  sock_create_kern_has_five_parameters
  COMPAT  sock_ops_returns_addr_len
  COMPAT  struct_gendisk_has_backing_dev_info
make[4]: 'drbd-kernel-compat/cocci_cache/46bccf002fae330baad3a38b6ebe0a06/compat.patch' is up to date.
  PATCH
patching file drbd_receiver.c
patching file drbd_nl.c
patching file drbd_main.c
patching file drbd_debugfs.c
patching file drbd-headers/linux/genl_magic_func.h
Hunk #2 succeeded at 312 (offset -20 lines).
  CC [M]  /root/drbd-9.1.7/drbd/drbd_dax_pmem.o
  CC [M]  /root/drbd-9.1.7/drbd/drbd_debugfs.o
  CC [M]  /root/drbd-9.1.7/drbd/drbd_bitmap.o
In file included from ./include/linux/slab.h:15,
                 from /root/drbd-9.1.7/drbd/drbd_bitmap.c:19:
/root/drbd-9.1.7/drbd/drbd_bitmap.c: In function ‘bm_page_io_async’:
./include/linux/gfp.h:327:25: warning: passing argument 1 of ‘bio_alloc_bioset’ makes pointer from integer without a cast [-Wint-conversion]
  327 | #define GFP_NOIO        (__GFP_RECLAIM)
      |                         ^~~~~~~~~~~~~~~
      |                         |
      |                         unsigned int
/root/drbd-9.1.7/drbd/drbd_bitmap.c:1188:32: note: in expansion of macro ‘GFP_NOIO’
 1188 |         bio = bio_alloc_bioset(GFP_NOIO, 1, &drbd_md_io_bio_set);
      |                                ^~~~~~~~
In file included from ./include/linux/libnvdimm.h:14,
                 from /root/drbd-9.1.7/drbd/drbd_bitmap.c:21:
./include/linux/bio.h:408:51: note: expected ‘struct block_device *’ but argument is of type ‘unsigned int’
  408 | struct bio *bio_alloc_bioset(struct block_device *bdev, unsigned short nr_vecs,
      |                              ~~~~~~~~~~~~~~~~~~~~~^~~~
/root/drbd-9.1.7/drbd/drbd_bitmap.c:1188:45: warning: passing argument 3 of ‘bio_alloc_bioset’ makes integer from pointer without a cast [-Wint-conversion]
 1188 |         bio = bio_alloc_bioset(GFP_NOIO, 1, &drbd_md_io_bio_set);
      |                                             ^~~~~~~~~~~~~~~~~~~
      |                                             |
      |                                             struct bio_set *
In file included from ./include/linux/libnvdimm.h:14,
                 from /root/drbd-9.1.7/drbd/drbd_bitmap.c:21:
./include/linux/bio.h:409:43: note: expected ‘unsigned int’ but argument is of type ‘struct bio_set *’
  409 |                              unsigned int opf, gfp_t gfp_mask,
      |                              ~~~~~~~~~~~~~^~~
/root/drbd-9.1.7/drbd/drbd_bitmap.c:1188:15: error: too few arguments to function ‘bio_alloc_bioset’
 1188 |         bio = bio_alloc_bioset(GFP_NOIO, 1, &drbd_md_io_bio_set);
      |               ^~~~~~~~~~~~~~~~
In file included from ./include/linux/libnvdimm.h:14,
                 from /root/drbd-9.1.7/drbd/drbd_bitmap.c:21:
./include/linux/bio.h:408:13: note: declared here
  408 | struct bio *bio_alloc_bioset(struct block_device *bdev, unsigned short nr_vecs,
      |             ^~~~~~~~~~~~~~~~
make[3]: *** [scripts/Makefile.build:271: /root/drbd-9.1.7/drbd/drbd_bitmap.o] Error 1
make[2]: *** [Makefile:1881: /root/drbd-9.1.7/drbd] Error 2
make[1]: *** [Makefile:132: kbuild] Error 2
make[1]: Leaving directory '/root/drbd-9.1.7/drbd'
make: *** [Makefile:125: module] Error 2

upstream git repo error log

[root@node drbd]# git remote -vv
origin	https://github.com/LINBIT/drbd.git (fetch)
origin	https://github.com/LINBIT/drbd.git (push)
[root@node drbd]# git branch
* drbd-9.1
[root@node drbd]# git log -p | head -5
commit 34a655ba957f25d05ad8bb88f5d3dc408b50bd9a
Author: Philipp Reisner <[email protected]>
Date:   Wed Jul 6 17:44:32 2022 +0200

    Prepare drbd-9.1.8-rc.1

[root@node drbd]# git status
On branch drbd-9.1
Your branch is up to date with 'origin/drbd-9.1'.

nothing to commit, working tree clean
[root@node drbd]# make
make[1]: Entering directory '/root/drbd/drbd'

    Calling toplevel makefile of kernel source tree, which I believe is in
    KDIR=/lib/modules/5.14.0-124.el9.x86_64/build

make -C /lib/modules/5.14.0-124.el9.x86_64/build   M=/root/drbd/drbd  modules
  PATCH
patching file drbd_receiver.c
patching file drbd_nl.c
patching file drbd_main.c
patching file drbd_debugfs.c
patching file drbd_dax_pmem.c
  CC [M]  /root/drbd/drbd/drbd_dax_pmem.o
  CC [M]  /root/drbd/drbd/drbd_debugfs.o
  CC [M]  /root/drbd/drbd/drbd_bitmap.o
In file included from ./include/linux/slab.h:15,
                 from /root/drbd/drbd/drbd_bitmap.c:19:
/root/drbd/drbd/drbd_bitmap.c: In function ‘bm_page_io_async’:
./include/linux/gfp.h:327:25: warning: passing argument 1 of ‘bio_alloc_bioset’ makes pointer from integer without a cast [-Wint-conversion]
  327 | #define GFP_NOIO        (__GFP_RECLAIM)
      |                         ^~~~~~~~~~~~~~~
      |                         |
      |                         unsigned int
/root/drbd/drbd/drbd_bitmap.c:1188:32: note: in expansion of macro ‘GFP_NOIO’
 1188 |         bio = bio_alloc_bioset(GFP_NOIO, 1, &drbd_md_io_bio_set);
      |                                ^~~~~~~~
In file included from ./include/linux/libnvdimm.h:14,
                 from /root/drbd/drbd/drbd_bitmap.c:21:
./include/linux/bio.h:408:51: note: expected ‘struct block_device *’ but argument is of type ‘unsigned int’
  408 | struct bio *bio_alloc_bioset(struct block_device *bdev, unsigned short nr_vecs,
      |                              ~~~~~~~~~~~~~~~~~~~~~^~~~
/root/drbd/drbd/drbd_bitmap.c:1188:45: warning: passing argument 3 of ‘bio_alloc_bioset’ makes integer from pointer without a cast [-Wint-conversion]
 1188 |         bio = bio_alloc_bioset(GFP_NOIO, 1, &drbd_md_io_bio_set);
      |                                             ^~~~~~~~~~~~~~~~~~~
      |                                             |
      |                                             struct bio_set *
In file included from ./include/linux/libnvdimm.h:14,
                 from /root/drbd/drbd/drbd_bitmap.c:21:
./include/linux/bio.h:409:43: note: expected ‘unsigned int’ but argument is of type ‘struct bio_set *’
  409 |                              unsigned int opf, gfp_t gfp_mask,
      |                              ~~~~~~~~~~~~~^~~
/root/drbd/drbd/drbd_bitmap.c:1188:15: error: too few arguments to function ‘bio_alloc_bioset’
 1188 |         bio = bio_alloc_bioset(GFP_NOIO, 1, &drbd_md_io_bio_set);
      |               ^~~~~~~~~~~~~~~~
In file included from ./include/linux/libnvdimm.h:14,
                 from /root/drbd/drbd/drbd_bitmap.c:21:
./include/linux/bio.h:408:13: note: declared here
  408 | struct bio *bio_alloc_bioset(struct block_device *bdev, unsigned short nr_vecs,
      |             ^~~~~~~~~~~~~~~~
make[3]: *** [scripts/Makefile.build:271: /root/drbd/drbd/drbd_bitmap.o] Error 1
make[2]: *** [Makefile:1881: /root/drbd/drbd] Error 2
make[1]: *** [Makefile:134: kbuild] Error 2
make[1]: Leaving directory '/root/drbd/drbd'
make: *** [Makefile:126: module] Error 2

I can provide SSH access to the box if that would help with troubleshooting.
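In long kbuild logs like the two above, the single fatal diagnostic is easy to miss among the COMPAT lines, warnings and notes. The sketch below filters GCC-style diagnostics of one kind out of a log. The regular expression and function name are my own assumptions about the log layout seen here, and the sample is a shortened excerpt of the 9.1.7 log.

```python
import re

# Matches GCC diagnostics such as
#   /path/file.c:1188:15: error: too few arguments to function ...
DIAG_RE = re.compile(
    r"^(?P<file>[^\s:]+):(?P<line>\d+):(?P<col>\d+): "
    r"(?P<kind>error|warning): (?P<msg>.*)$"
)


def compiler_diagnostics(log_text, kind="error"):
    """Return (file, line, message) tuples for diagnostics of the given kind."""
    found = []
    for raw in log_text.splitlines():
        match = DIAG_RE.match(raw.strip())
        if match and match.group("kind") == kind:
            found.append(
                (match.group("file"), int(match.group("line")), match.group("msg"))
            )
    return found


# Shortened excerpt of the build log above.
SAMPLE_LOG = """\
  CC [M]  /root/drbd-9.1.7/drbd/drbd_bitmap.o
./include/linux/gfp.h:327:25: warning: passing argument 1 of ‘bio_alloc_bioset’ makes pointer from integer without a cast [-Wint-conversion]
./include/linux/bio.h:408:51: note: expected ‘struct block_device *’ but argument is of type ‘unsigned int’
/root/drbd-9.1.7/drbd/drbd_bitmap.c:1188:15: error: too few arguments to function ‘bio_alloc_bioset’
make[3]: *** [scripts/Makefile.build:271: /root/drbd-9.1.7/drbd/drbd_bitmap.o] Error 1
"""

if __name__ == "__main__":
    for path, line_no, message in compiler_diagnostics(SAMPLE_LOG):
        print(f"{path}:{line_no}: {message}")
```

For both logs this reduces the output to the one "too few arguments to function ‘bio_alloc_bioset’" error at drbd_bitmap.c line 1188.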

kernel crash when creating metadata (4 nodes)

Hi,

I run into a kernel oops quite regularly when resynchronizing 4 nodes that were previously split into 2 independent clusters of 2 nodes each.

I am using AWS VMs with EBS block devices. The lower-layer storage is LVM volumes, and the cluster manager is Pacemaker.

The steps I follow are:

  • I create 2 independent clusters. Both devices in each cluster are configured with a maximum of 3 peers, although at this stage they use only 1 peer. I do this to avoid resizing when I move the setup to geo (2+2 devices).

  • At this stage my 2 clusters are up and everything is OK.

  • From there, I can reproduce the crash as follows:

  1. (on all nodes) I change the settings files to add the 2 extra nodes (so the drbd.conf file now has 4 nodes instead of 2)

  2. (on all nodes) I shut down the DRBD resource

  3. (on all nodes) drbdadm create-md --force {{ resource }} --max-peers 3

  4. (on all nodes) drbdadm up {{ resource }}

  5. (on ONE node) drbdadm primary --force {{ resource }}data

At this stage one remote peer crashes, around 1/3 of the time.
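The command sequence in the steps above can be written down as a small dry-run helper, which only prints the commands and never executes them. This is a sketch under assumptions: the report does not say which command is used for the shutdown step, so `drbdadm down` is my guess, and the function name is hypothetical. Editing the configuration files (step 1) remains a manual step.

```python
import shlex


def reproduction_plan(resource, max_peers=3):
    """Return (commands for every node, commands for one node only)."""
    all_nodes = [
        # Assumption: "drbdadm down" stands in for the "shut down" step.
        ["drbdadm", "down", resource],
        # Argument order as given in the report.
        ["drbdadm", "create-md", "--force", resource, "--max-peers", str(max_peers)],
        ["drbdadm", "up", resource],
    ]
    one_node = [["drbdadm", "primary", "--force", resource]]
    return all_nodes, one_node


if __name__ == "__main__":
    # "audiodata" is the resource name used in the configuration of this report.
    all_nodes, one_node = reproduction_plan("audiodata")
    for cmd in all_nodes:
        print("[all nodes]", shlex.join(cmd))
    for cmd in one_node:
        print("[one node] ", shlex.join(cmd))
```

Note that create-md with --force overwrites existing metadata, which is why the sketch is limited to printing.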

drbdadm --version
DRBDADM_BUILDTAG=GIT-hash:\ bd7a08c4ec9cad113c4e5ad448a15c8900a67b68\ build\ by\ [email protected],\ 2021-08-12\ 15:13:28
DRBDADM_API_VERSION=2
DRBD_KERNEL_VERSION_CODE=0x090103
DRBD_KERNEL_VERSION=9.1.3
DRBDADM_VERSION_CODE=0x091202
DRBDADM_VERSION=9.18.2
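The two *_VERSION_CODE values above appear to pack one version component per byte, which is consistent with both pairs shown (0x090103 with 9.1.3 and 0x091202 with 9.18.2). A minimal sketch of that decoding follows; the byte layout is inferred from this output only and the function names are hypothetical.

```python
def decode_version_code(code):
    """Split a packed 0xMMmmpp version code into (major, minor, patch)."""
    return (code >> 16) & 0xFF, (code >> 8) & 0xFF, code & 0xFF


def format_version(code):
    """Render a packed version code as a dotted version string."""
    return ".".join(str(part) for part in decode_version_code(code))


if __name__ == "__main__":
    print("kernel module:", format_version(0x090103))
    print("drbdadm:      ", format_version(0x091202))
```

The kernel module (9.1.3) and the drbdadm userland tools (9.18.2) carry separate version numbers, so the difference between the two is not in itself a mismatch.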

[ 4003.283029] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 4003.292376] IP: [< (null)>] (null)
[ 4003.297386] PGD 80000000334ee067 PUD 78a18067 PMD 0
[ 4003.302589] Oops: 0010 [#1] SMP
[ 4003.307168] Modules linked in: drbd_transport_tcp(OE) drbd(OE) ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat iptable_mangle iptable_security iptable_raw nf_conntrack ip_set ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter nfit libnvdimm iosf_mbi crc32_pclmul ghash_clmulni_intel aesni_intel ppdev lrw gf128mul glue_helper ablk_helper cryptd pcspkr parport_pc parport i2c_piix4 softdog ip_tables xfs libcrc32c nvme crct10dif_pclmul crct10dif_common crc32c_intel serio_raw ena nvme_core sunrpc dm_mirror dm_region_hash dm_log dm_mod
[ 4003.368720] CPU: 1 PID: 30047 Comm: drbd_w_voicepri Kdump: loaded Tainted: G OE ------------ 3.10.0-1160.36.2.el7.x86_64 #1
[ 4003.379873] Hardware name: Amazon EC2 t3.small/, BIOS 1.0 10/16/2017
[ 4003.385536] task: ffff9cc336ed8000 ti: ffff9cc33937c000 task.ti: ffff9cc33937c000
[ 4003.395064] RIP: 0010:[<0000000000000000>] [< (null)>] (null)
[ 4003.404608] RSP: 0018:ffff9cc33937fdd0 EFLAGS: 00010246
[ 4003.410242] RAX: 0000000000000000 RBX: ffff9cc33937fe18 RCX: ffffffffc05a232c
[ 4003.416835] RDX: 0000000000000029 RSI: 0000000000000000 RDI: ffff9cc334b0eac8
[ 4003.423364] RBP: ffff9cc33937fe80 R08: 0000000000000c35 R09: 0000000000000140
[ 4003.429837] R10: 00000001003879f5 R11: ffffd50cc0cdae80 R12: ffff9cc334b0e960
[ 4003.436394] R13: ffff9cc334b0eac8 R14: ffff9cc334b0ed08 R15: ffff9cc334b0ed00
[ 4003.442829] FS: 0000000000000000(0000) GS:ffff9cc33d700000(0000) knlGS:0000000000000000
[ 4003.456660] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4003.463410] CR2: 0000000000000000 CR3: 000000007356e000 CR4: 00000000007606e0
[ 4003.469757] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 4003.475895] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 4003.482058] PKRU: 00000000
[ 4003.486371] Call Trace:
[ 4003.490597] [] ? drbd_worker+0x112/0x530 [drbd]
[ 4003.496299] [] ? wake_up_atomic_t+0x30/0x30
[ 4003.502539] [] ? do_retry+0x290/0x290 [drbd]
[ 4003.508428] [] drbd_thread_setup+0xb6/0x1f0 [drbd]
[ 4003.515138] [] ? do_retry+0x290/0x290 [drbd]
[ 4003.529482] [] kthread+0xd1/0xe0
[ 4003.534944] [] ? insert_kthread_work+0x40/0x40
[ 4003.551754] [] ret_from_fork_nospec_begin+0x21/0x21
[ 4003.562722] [] ? insert_kthread_work+0x40/0x40
[ 4003.575279] Code: Bad RIP value.
[ 4003.585259] RIP [< (null)>] (null)
[ 4003.597228] RSP
[ 4003.608403] CR2: 0000000000000000

settings:
resource audiodata {
    startup {
        wfc-timeout 15;
        degr-wfc-timeout 60;
    }
    volume 0 {
        device /dev/drbd0;
        disk /dev/mapper/audio_pool-audio_volume;
        meta-disk internal;
    }
    on ip-172-31-12-50 {
        node-id 0;
        address 172.31.12.50:7788;
    }
    on ip-172-31-12-216 {
        node-id 1;
        address 172.31.12.216:7788;
    }
    connection {
        host ip-172-31-12-50 address 172.31.12.50:7788;
        host ip-172-31-12-216 address 172.31.12.216:7788;
        net {
            protocol C;
            cram-hmac-alg sha1;
            shared-secret "secret";
            after-sb-0pri discard-younger-primary;
            after-sb-1pri discard-secondary;
            after-sb-2pri disconnect;
            fencing resource-and-stonith;
        }
    }
    # begin geo cluster extension
    on ip-172-31-12-206 {
        node-id 2;
        address 172.31.12.206:7788;
    }
    on ip-172-31-12-143 {
        node-id 3;
        address 172.31.12.143:7788;
    }
    # extended local cluster synchronous links
    connection {
        host ip-172-31-12-206 address 172.31.12.206:7788;
        host ip-172-31-12-143 address 172.31.12.143:7788;
        net {
            protocol C;
            cram-hmac-alg sha1;
            shared-secret "secret";
            after-sb-0pri discard-younger-primary;
            after-sb-1pri discard-secondary;
            after-sb-2pri disconnect;
            fencing resource-and-stonith;
        }
    }
    # async links between local clusters
    connection {
        host ip-172-31-12-50 address 172.31.12.50:7788;
        host ip-172-31-12-206 address 172.31.12.206:7788;
        net {
            protocol A;
        }
    }
    connection {
        host ip-172-31-12-50 address 172.31.12.50:7788;
        host ip-172-31-12-143 address 172.31.12.143:7788;
        net {
            protocol A;
        }
    }
    connection {
        host ip-172-31-12-216 address 172.31.12.216:7788;
        host ip-172-31-12-206 address 172.31.12.206:7788;
        net {
            protocol A;
        }
    }
    connection {
        host ip-172-31-12-216 address 172.31.12.216:7788;
        host ip-172-31-12-143 address 172.31.12.143:7788;
        net {
            protocol A;
        }
    }
    # end geo cluster extension
    handlers {
    }
}
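With four nodes, a configuration like the one above needs one connection section per pair of hosts, and each node ends up with three peers, matching the --max-peers 3 used when creating the metadata. The following sketch checks both properties for the host pairs listed in the settings. The node and connection lists are transcribed by hand from the configuration, and the helper names are hypothetical.

```python
from itertools import combinations

NODES = [
    "ip-172-31-12-50",
    "ip-172-31-12-216",
    "ip-172-31-12-206",
    "ip-172-31-12-143",
]

# (host A, host B, protocol) for each connection section in the settings.
CONNECTIONS = [
    ("ip-172-31-12-50", "ip-172-31-12-216", "C"),
    ("ip-172-31-12-206", "ip-172-31-12-143", "C"),
    ("ip-172-31-12-50", "ip-172-31-12-206", "A"),
    ("ip-172-31-12-50", "ip-172-31-12-143", "A"),
    ("ip-172-31-12-216", "ip-172-31-12-206", "A"),
    ("ip-172-31-12-216", "ip-172-31-12-143", "A"),
]


def missing_links(nodes, connections):
    """Return node pairs that have no connection section."""
    configured = {frozenset((a, b)) for a, b, _ in connections}
    return [
        pair for pair in combinations(nodes, 2) if frozenset(pair) not in configured
    ]


def peers_per_node(nodes, connections):
    """Count how many peers each node is connected to."""
    counts = {node: 0 for node in nodes}
    for a, b, _ in connections:
        counts[a] += 1
        counts[b] += 1
    return counts


if __name__ == "__main__":
    print("missing links:", missing_links(NODES, CONNECTIONS))
    print("peers per node:", peers_per_node(NODES, CONNECTIONS))
```

For the settings shown, no pair is missing and every node has exactly three peers, so the topology itself looks complete.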

Unable to compile on 5.15.x kernel

I am unable to build the drbd module on 5.15.x kernels.

drbd/drbd_dax_pmem.c:66:19: error: implicit declaration of function ‘dax_get_by_host’ [-Werror=implicit-function-declaration]
   66 |         dax_dev = dax_get_by_host(disk_name);

I tried drbd 9.1.5 and 9.2.0-rc2 on Arch Linux with a 5.15.10 kernel.
It seems that dax_get_by_host is no longer exported in 5.15.x: I can find it in kallsyms, but not in the header files.
I guess this is a known issue? Is there a workaround?
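The observation above, a symbol present in kallsyms but absent from the headers, can be checked mechanically. Below is a minimal sketch that works on text already read from a kallsyms listing and from header files. The sample strings are made up for illustration and are not taken from a real 5.15 kernel; the function names are hypothetical.

```python
import re


def mentioned_in_headers(symbol, header_texts):
    """True if any header text contains the symbol followed by '('."""
    pattern = re.compile(r"\b" + re.escape(symbol) + r"\s*\(")
    return any(pattern.search(text) for text in header_texts)


def listed_in_kallsyms(symbol, kallsyms_text):
    """True if the symbol appears as a name in a kallsyms-style listing."""
    for line in kallsyms_text.splitlines():
        fields = line.split()
        # Each line is "<address> <type> <name> [module]".
        if len(fields) >= 3 and fields[2] == symbol:
            return True
    return False


# Hypothetical sample data, for illustration only.
SAMPLE_KALLSYMS = """\
0000000000000000 t dax_get_by_host
0000000000000000 T dax_direct_access
"""
SAMPLE_HEADERS = [
    "long dax_direct_access(struct dax_device *dax_dev, pgoff_t pgoff,\n"
    "\t\tlong nr_pages, void **kaddr, pfn_t *pfn);\n",
]

if __name__ == "__main__":
    name = "dax_get_by_host"
    print("in kallsyms:", listed_in_kallsyms(name, SAMPLE_KALLSYMS))
    print("in headers: ", mentioned_in_headers(name, SAMPLE_HEADERS))
```

On a real system the inputs would be the contents of /proc/kallsyms and of the headers under the kernel build directory. A symbol that shows up only in kallsyms cannot be called from an out-of-tree module without a declaration.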

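Translation problem in the Chinese documentation

Link to the problem: 16.4. 快速同步位图 (section 16.4, "The quick-sync bitmap")
Location of the problem: at the end of section 16.4, as shown below
[image]

I think "心平气和" ("calmly") should instead be translated as "草率" ("rashly").

So the whole sentence should be translated like this:
使用 drbdadm pause-sync 和drbdadm resume-sync命令也可以暂停并手动恢复重新同步。但是,您不应该草率地这样做——中断重新同步会使辅助节点的磁盘不一致的时间超过必要的时间。
(In English: "Resynchronization can also be paused and resumed manually with the drbdadm pause-sync and drbdadm resume-sync commands. However, you should not do so rashly: interrupting resynchronization leaves the secondary node's disk inconsistent for longer than necessary.")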

drbd fails to compile in debian 11 bullseye with pve-kernel. commit 998a1fa did not resolve

The build still fails on branch drbd-9.1, even with the changes from commit 998a1fa:

make[1]: Entering directory '/opt/drbd/drbd'

    Calling toplevel makefile of kernel source tree, which I believe is in
    KDIR=/lib/modules/5.15.30-1-pve/build

make -C /lib/modules/5.15.30-1-pve/build   M=/opt/drbd/drbd  modules
  COMPAT  __vmalloc_has_2_params
  COMPAT  before_4_13_kernel_read
  COMPAT  blkdev_issue_zeroout_discard
  COMPAT  can_include_vermagic_h
  COMPAT  genl_policy_in_ops
  COMPAT  have_BIO_MAX_VECS
  COMPAT  have_CRYPTO_TFM_NEED_KEY
  COMPAT  have_SHASH_DESC_ON_STACK
  COMPAT  have_WB_congested_enum
  COMPAT  have_allow_kernel_signal
  COMPAT  have_bdi_congested_fn
  COMPAT  have_bio_bi_bdev
  COMPAT  have_bio_bi_error
  COMPAT  have_bio_bi_opf
  COMPAT  have_bio_bi_status
  COMPAT  have_bio_clone_fast
  COMPAT  have_bio_op_shift
  COMPAT  have_bio_set_dev
  COMPAT  have_bio_set_op_attrs
  COMPAT  have_bio_start_io_acct
  COMPAT  have_bioset_init
  COMPAT  have_bioset_need_bvecs
  COMPAT  have_blk_alloc_disk
  COMPAT  have_blk_alloc_queue_rh
  COMPAT  have_blk_check_plugged
  COMPAT  have_blk_qc_t_make_request
  COMPAT  have_blk_queue_flag_set
  COMPAT  have_blk_queue_make_request
  COMPAT  have_blk_queue_merge_bvec
  COMPAT  have_blk_queue_split_bio
  COMPAT  have_blk_queue_split_q_bio
  COMPAT  have_blk_queue_split_q_bio_bioset
  COMPAT  have_blk_queue_update_readahead
  COMPAT  have_blk_queue_write_cache
  COMPAT  have_d_inode
  COMPAT  have_disk_update_readahead
  COMPAT  have_fallthrough
  COMPAT  have_generic_start_io_acct_q_rw_sect_part
  COMPAT  have_generic_start_io_acct_rw_sect_part
  COMPAT  have_genl_family_parallel_ops
  COMPAT  have_hd_struct
  COMPAT  have_ib_cq_init_attr
  COMPAT  have_ib_get_dma_mr
  COMPAT  have_idr_is_empty
  COMPAT  have_inode_lock
  COMPAT  have_ktime_to_timespec64
  COMPAT  have_kvfree
  COMPAT  have_max_send_recv_sge
  COMPAT  have_nla_nest_start_noflag
  COMPAT  have_nla_parse_deprecated
  COMPAT  have_nla_put_64bit
  COMPAT  have_nla_strscpy
  COMPAT  have_part_stat_h
  COMPAT  have_part_stat_read_accum
  COMPAT  have_pointer_backing_dev_info
  COMPAT  have_proc_create_single
  COMPAT  have_queue_flag_stable_writes
  COMPAT  have_rb_declare_callbacks_max
  COMPAT  have_refcount_inc
  COMPAT  have_req_hardbarrier
  COMPAT  have_req_noidle
  COMPAT  have_req_nounmap
  COMPAT  have_req_op_write
  COMPAT  have_req_op_write_zeroes
  COMPAT  have_req_write
  COMPAT  have_revalidate_disk_size
  COMPAT  have_sched_set_fifo
  COMPAT  have_security_netlink_recv
  COMPAT  have_sendpage_ok
  COMPAT  have_set_capacity_and_notify
  COMPAT  have_shash_desc_zero
  COMPAT  have_simple_positive
  COMPAT  have_sock_set_keepalive
  COMPAT  have_struct_bvec_iter
  COMPAT  have_struct_size
  COMPAT  have_submit_bio
  COMPAT  have_submit_bio_noacct
  COMPAT  have_tcp_sock_set_cork
  COMPAT  have_tcp_sock_set_nodelay
  COMPAT  have_tcp_sock_set_quickack
  COMPAT  have_time64_to_tm
  COMPAT  have_timer_setup
  COMPAT  have_void_make_request
  COMPAT  ib_alloc_pd_has_2_params
  COMPAT  ib_device_has_ops
  COMPAT  ib_post_send_const_params
  COMPAT  ib_query_device_has_3_params
  COMPAT  need_make_request_recursion
  COMPAT  part_stat_read_takes_block_device
  COMPAT  queue_limits_has_discard_zeroes_data
  COMPAT  rdma_create_id_has_net_ns
  COMPAT  sock_create_kern_has_five_parameters
  COMPAT  sock_ops_returns_addr_len
  UPD     /opt/drbd/drbd/compat.5.15.30-1-pve.h
  UPD     /opt/drbd/drbd/compat.h
  GENPATCHNAMES   5.15.30-1-pve
  SPATCH   610ca2503f32322b806dc438e639448a  5.15.30-1-pve
  PATCH
patching file drbd_req.c
patching file drbd_receiver.c
patching file drbd_nl.c
patching file drbd_main.c
patching file drbd_debugfs.c
patching file drbd-headers/linux/genl_magic_func.h
Hunk #2 succeeded at 312 (offset -20 lines).
  CC [M]  /opt/drbd/drbd/drbd_dax_pmem.o
  CC [M]  /opt/drbd/drbd/drbd_debugfs.o
  CC [M]  /opt/drbd/drbd/drbd_bitmap.o
  CC [M]  /opt/drbd/drbd/drbd_proc.o
  CC [M]  /opt/drbd/drbd/drbd_sender.o
  CC [M]  /opt/drbd/drbd/drbd_receiver.o
  CC [M]  /opt/drbd/drbd/drbd_req.o
/opt/drbd/drbd/drbd_req.c: In function ‘remote_due_to_read_balancing’:
/opt/drbd/drbd/drbd_req.c:1238:52: error: ‘struct request_queue’ has no member named ‘backing_dev_info’
 1238 |   bdi =& device->ldev->backing_bdev->bd_disk->queue->backing_dev_info;
      |                                                    ^~
make[3]: *** [scripts/Makefile.build:285: /opt/drbd/drbd/drbd_req.o] Error 1
make[2]: *** [Makefile:1875: /opt/drbd/drbd] Error 2
make[1]: *** [Makefile:132: kbuild] Error 2
make[1]: Leaving directory '/opt/drbd/drbd'
make: *** [Makefile:125: module] Error 2

Originally posted by @quiknick in #32 (comment)

Kernel Module build failure on Fedora CoreOS 411.36.202211041039-0 / 6.0.5-200.fc36.x86_64

Hi,

I am trying to build the kernel module for OKD 4.11 (FCOS 36), but the build fails. I am using piraeus-operator for the installation and built my own image based on fedora:36, following the guide in the piraeus repo.

So far I have tried v9.1.11, v9.2.0 and v9.2.1-rc.1.

v9.1.11 produced even more errors, so I think this issue is caused by the kernel, which is probably too new.

FCOS Version: 411.36.202211041039-0
Kernel version: 6.0.5-200.fc36.x86_64

Could you please fix it, or tell me what to change to make this build work?

Output of the build with v9.2.1-rc.1 (similar with v9.2.0):

Downloading: kernel-devel-6.0.5-200.fc36.x86_64.rpm
[====================================] 100% 15.83 MiB / 15.83 MiB
142424 blocks
Need a git checkout to regenerate drbd/.drbd_git_revision
make[1]: Entering directory '/tmp/pkg/drbd-9.2.1-rc.1/drbd'

    Calling toplevel makefile of kernel source tree, which I believe is in
    KDIR=/opt/usr/src/kernels/6.0.5-200.fc36.x86_64

make -C /opt/usr/src/kernels/6.0.5-200.fc36.x86_64   M=/tmp/pkg/drbd-9.2.1-rc.1/drbd  modules
  COMPAT  __vmalloc_has_2_params
  COMPAT  add_disk_returns_int
  COMPAT  before_4_13_kernel_read
  COMPAT  bio_alloc_has_4_params
  COMPAT  blkdev_issue_discard_takes_flags
  COMPAT  blkdev_issue_zeroout_discard
  COMPAT  can_include_vermagic_h
  COMPAT  dax_direct_access_takes_mode
  COMPAT  fs_dax_get_by_bdev_takes_start_off
  COMPAT  genl_policy_in_ops
  COMPAT  have_BIO_MAX_VECS
  COMPAT  have_CRYPTO_TFM_NEED_KEY
  COMPAT  have_GENHD_FL_NO_PART
  COMPAT  have_SHASH_DESC_ON_STACK
  COMPAT  have_WB_congested_enum
  COMPAT  have_allow_kernel_signal
  COMPAT  have_bdev_discard_granularity
  COMPAT  have_bdev_max_discard_sectors
  COMPAT  have_bdev_nr_sectors
  COMPAT  have_bdevname
  COMPAT  have_bdgrab
  COMPAT  have_bdi_congested
  COMPAT  have_bdi_congested_fn
  COMPAT  have_bio_alloc_clone
  COMPAT  have_bio_bi_bdev
  COMPAT  have_bio_bi_error
  COMPAT  have_bio_bi_opf
  COMPAT  have_bio_bi_status
  COMPAT  have_bio_clone_fast
  COMPAT  have_bio_op_shift
  COMPAT  have_bio_set_dev
  COMPAT  have_bio_set_op_attrs
  COMPAT  have_bio_start_io_acct
  COMPAT  have_bioset_init
  COMPAT  have_bioset_need_bvecs
  COMPAT  have_blk_alloc_disk
  COMPAT  have_blk_alloc_queue_rh
  COMPAT  have_blk_check_plugged
  COMPAT  have_blk_cleanup_disk
  COMPAT  have_blk_qc_t_make_request
  COMPAT  have_blk_qc_t_submit_bio
  COMPAT  have_blk_queue_flag_set
  COMPAT  have_blk_queue_make_request
  COMPAT  have_blk_queue_max_write_same_sectors
  COMPAT  have_blk_queue_merge_bvec
  COMPAT  have_blk_queue_split_bio
  COMPAT  have_blk_queue_split_q_bio
  COMPAT  have_blk_queue_split_q_bio_bioset
  COMPAT  have_blk_queue_update_readahead
  COMPAT  have_blk_queue_write_cache
  COMPAT  have_bvec_kmap_local
  COMPAT  have_d_inode
  COMPAT  have_disk_update_readahead
  COMPAT  have_fallthrough
  COMPAT  have_fs_dax_get_by_bdev
  COMPAT  have_generic_start_io_acct_q_rw_sect_part
  COMPAT  have_generic_start_io_acct_rw_sect_part
  COMPAT  have_hd_struct
  COMPAT  have_ib_cq_init_attr
  COMPAT  have_ib_get_dma_mr
  COMPAT  have_idr_is_empty
  COMPAT  have_inode_lock
  COMPAT  have_ktime_to_timespec64
  COMPAT  have_kvfree
  COMPAT  have_kvfree_rcu
  COMPAT  have_list_is_first
  COMPAT  have_max_send_recv_sge
  COMPAT  have_nla_nest_start_noflag
  COMPAT  have_nla_parse_deprecated
  COMPAT  have_nla_put_64bit
  COMPAT  have_nla_strscpy
  COMPAT  have_part_stat_h
  COMPAT  have_part_stat_read_accum
  COMPAT  have_pointer_backing_dev_info
  COMPAT  have_proc_create_single
  COMPAT  have_queue_flag_discard
  COMPAT  have_queue_flag_stable_writes
  COMPAT  have_rb_declare_callbacks_max
  COMPAT  have_refcount_inc
  COMPAT  have_req_hardbarrier
  COMPAT  have_req_noidle
  COMPAT  have_req_nounmap
  COMPAT  have_req_op_write
  COMPAT  have_req_op_write_zeroes
  COMPAT  have_req_write
  COMPAT  have_revalidate_disk_size
  COMPAT  have_sched_set_fifo
  COMPAT  have_security_netlink_recv
  COMPAT  have_sendpage_ok
  COMPAT  have_set_capacity_and_notify
  COMPAT  have_shash_desc_zero
  COMPAT  have_simple_positive
  COMPAT  have_sock_set_keepalive
  COMPAT  have_strscpy
  COMPAT  have_struct_bvec_iter
  COMPAT  have_struct_size
  COMPAT  have_submit_bio_noacct
  COMPAT  have_tcp_sock_set_cork
  COMPAT  have_tcp_sock_set_nodelay
  COMPAT  have_tcp_sock_set_quickack
  COMPAT  have_time64_to_tm
  COMPAT  have_timer_setup
  COMPAT  have_void_make_request
  COMPAT  have_void_submit_bio
  COMPAT  ib_alloc_pd_has_2_params
  COMPAT  ib_device_has_ops
  COMPAT  ib_post_send_const_params
  COMPAT  ib_query_device_has_3_params
  COMPAT  need_drbd_wrappers
  COMPAT  need_make_request_recursion
  COMPAT  need_skb_abort_seq_read
  COMPAT  part_stat_read_takes_block_device
  COMPAT  queue_limits_has_discard_zeroes_data
  COMPAT  rdma_create_id_has_net_ns
  COMPAT  rdma_reject_has_reason_arg
  COMPAT  sk_data_ready_has_1_param
  COMPAT  sock_create_kern_has_netns_parameter
  COMPAT  sock_ops_returns_addr_len
  COMPAT  struct_gendisk_has_backing_dev_info
  UPD     /tmp/pkg/drbd-9.2.1-rc.1/drbd/compat.6.0.5-200.fc36.x86_64.h
  UPD     /tmp/pkg/drbd-9.2.1-rc.1/drbd/compat.h
  GENPATCHNAMES   6.0.5-200.fc36.x86_64
  SPATCH   3eb6d1043e5e57694ddd4b1222e73336  6.0.5-200.fc36.x86_64
  PATCH
patching file drbd_sender.c
patching file drbd_req.c
patching file drbd_receiver.c
patching file drbd_nl.c
patching file drbd_main.c
patching file drbd_debugfs.c
patching file drbd_dax_pmem.c
  CC [M]  /tmp/pkg/drbd-9.2.1-rc.1/drbd/drbd_dax_pmem.o
/tmp/pkg/drbd-9.2.1-rc.1/drbd/drbd_dax_pmem.c: In function 'drbd_dax_open':
/tmp/pkg/drbd-9.2.1-rc.1/drbd/drbd_dax_pmem.c:62:19: error: too few arguments to function 'fs_dax_get_by_bdev'
   62 |         dax_dev = fs_dax_get_by_bdev(bdev->md_bdev);
      |                   ^~~~~~~~~~~~~~~~~~
In file included from /tmp/pkg/drbd-9.2.1-rc.1/drbd/drbd_dax_pmem.c:24:
./include/linux/dax.h:134:20: note: declared here
  134 | struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev, u64 *start_off,
      |                    ^~~~~~~~~~~~~~~~~~
make[3]: *** [scripts/Makefile.build:249: /tmp/pkg/drbd-9.2.1-rc.1/drbd/drbd_dax_pmem.o] Error 1
make[2]: *** [Makefile:1856: /tmp/pkg/drbd-9.2.1-rc.1/drbd] Error 2
make[1]: Leaving directory '/tmp/pkg/drbd-9.2.1-rc.1/drbd'
make[1]: *** [Makefile:134: kbuild] Error 2
make: *** [Makefile:126: module] Error 2

Could not find the expexted *.ko, see stderr for more details

Thanks!

drbd with rdma gets stuck when shutting down a resource

Update:

wait_event(rdma_transport->cm_count_wait, !atomic_read(&rdma_transport->cm_count));

This is the line where it gets stuck.
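The wait_event() call quoted above blocks until cm_count drops to zero. As a rough user-space analogy (not the kernel code, and not a diagnosis of this bug), the sketch below models a counter with a waiter using Python's threading.Condition. It shows that a single reference that is never released keeps the waiter blocked. Unlike wait_event(), the Python waiter takes a timeout so the demonstration terminates; the class and method names are hypothetical.

```python
import threading


class CountedWait:
    """A counter whose waiters are woken when it reaches zero."""

    def __init__(self):
        self._count = 0
        self._cond = threading.Condition()

    def get(self):
        """Take a reference (analogous to incrementing the counter)."""
        with self._cond:
            self._count += 1

    def put(self):
        """Drop a reference and wake waiters when the count hits zero."""
        with self._cond:
            self._count -= 1
            if self._count == 0:
                self._cond.notify_all()

    def wait_for_zero(self, timeout):
        """Return True if the count reached zero within the timeout."""
        with self._cond:
            return self._cond.wait_for(lambda: self._count == 0, timeout=timeout)


if __name__ == "__main__":
    balanced = CountedWait()
    balanced.get()
    balanced.put()
    print("balanced:", balanced.wait_for_zero(timeout=0.1))

    leaked = CountedWait()
    leaked.get()  # never released
    print("leaked:  ", leaked.wait_for_zero(timeout=0.1))
```

Whether an unreleased cm_count reference is what actually happens here is not established by the report; the sketch only illustrates why such a wait can block indefinitely.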

DRBD with RDMA gets stuck when disconnecting a resource in sync.

Here are the logs we could retrieve:

Jun 12 15:24:16 os-hv-am-4 kernel: [  232.569733] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9/0 drbd8 os-hv-am-3: helper command: /sbin/drbdadm initial-split-brain
Jun 12 15:24:16 os-hv-am-4 kernel: [  232.571421] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9/0 drbd8 os-hv-am-3: helper command: /sbin/drbdadm initial-split-brain exit code 0
Jun 12 15:24:16 os-hv-am-4 kernel: [  232.571439] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9/0 drbd8 os-hv-am-3: Split-Brain detected but unresolved, dropping connection!
Jun 12 15:24:16 os-hv-am-4 kernel: [  232.572513] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9/0 drbd8 os-hv-am-3: helper command: /sbin/drbdadm split-brain
Jun 12 15:24:16 os-hv-am-4 kernel: [  232.574101] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9/0 drbd8 os-hv-am-3: helper command: /sbin/drbdadm split-brain exit code 0
Jun 12 15:24:16 os-hv-am-4 kernel: [  232.574129] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9: Aborting cluster-wide state change 3531582257 (20ms) rv = -27
Jun 12 15:24:16 os-hv-am-4 kernel: [  232.574141] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: conn( Connecting -> Disconnecting )
Jun 12 15:24:16 os-hv-am-4 kernel: [  232.574164] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: sock_recvmsg returned -4
Jun 12 15:24:16 os-hv-am-4 kernel: [  232.574242] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: Terminating sender thread
Jun 12 15:24:16 os-hv-am-4 kernel: [  232.574260] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: Starting sender thread (from drbd_r_vm-57da0 [3519])
Jun 12 15:24:16 os-hv-am-4 kernel: [  232.576227] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: pp_use = 1307, expected 0
Jun 12 15:24:16 os-hv-am-4 kernel: [  232.576229] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: Connection closed
Jun 12 15:24:16 os-hv-am-4 kernel: [  232.576239] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: helper command: /sbin/drbdadm disconnected
Jun 12 15:24:16 os-hv-am-4 kernel: [  232.577812] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: helper command: /sbin/drbdadm disconnected exit code 0
Jun 12 15:24:16 os-hv-am-4 kernel: [  232.577830] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: conn( Disconnecting -> StandAlone )
Jun 12 15:24:16 os-hv-am-4 kernel: [  232.577834] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: Terminating receiver thread
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.140302] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: conn( StandAlone -> Unconnected )
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.140350] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: Starting receiver thread (from drbd_w_vm-57da0 [3500])
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.140513] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: conn( Unconnected -> Connecting )
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.148270] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: Handshake to peer 1 successful: Agreed network protocol version 121
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.148273] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: Feature flags enabled on protocol level: 0x3f TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES RESYNC_DAGTAG
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.148316] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: Peer authenticated using 20 bytes HMAC
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.157481] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9: Preparing cluster-wide state change 105203894 (0->1 499/145)
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.193492] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9/0 drbd8 os-hv-am-3: drbd_sync_handshake:
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.193497] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9/0 drbd8 os-hv-am-3: self 78ECCE9CE229714C:20F264F5A750B1D5:23B3AB54E1225FE6:940F040BBBAE35D4 bits:1264914 flags:22
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.193500] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9/0 drbd8 os-hv-am-3: peer 5B8C3B4DEBC7DD0A:20F264F5A750B1D4:9AD186D68FBE0520:0000000000000000 bits:1264658 flags:1025
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.193502] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9/0 drbd8 os-hv-am-3: uuid_compare()=split-brain-auto-recover by rule=bitmap-both
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.193512] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9/0 drbd8 os-hv-am-3: helper command: /sbin/drbdadm initial-split-brain
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.195114] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9/0 drbd8 os-hv-am-3: helper command: /sbin/drbdadm initial-split-brain exit code 0
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.195125] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9/0 drbd8 os-hv-am-3: Split-Brain detected, manually solved. Sync from this node
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.195139] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9: State change 105203894: primary_nodes=1, weak_nodes=FFFFFFFFFFFFFFFC
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.195142] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9: Committing cluster-wide state change 105203894 (36ms)
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.195155] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: conn( Connecting -> Connected ) peer( Unknown -> Secondary )
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.195157] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9/0 drbd8 os-hv-am-3: pdsk( Outdated -> Inconsistent ) repl( Off -> WFBitMapS )
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.200092] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9/0 drbd8 os-hv-am-3: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 494(1), total 494; compression: 100.0%
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.205201] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9/0 drbd8 os-hv-am-3: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 494(1), total 494; compression: 100.0%
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.205214] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9/0 drbd8 os-hv-am-3: helper command: /sbin/drbdadm before-resync-source
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.206764] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9/0 drbd8 os-hv-am-3: helper command: /sbin/drbdadm before-resync-source exit code 0
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.206775] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9/0 drbd8 os-hv-am-3: repl( WFBitMapS -> SyncSource )
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.206824] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9/0 drbd8 os-hv-am-3: Began resync as SyncSource (will sync 5059656 KB [1264914 bits set]).
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207282] ------------[ cut here ]------------
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207293] WARNING: CPU: 114 PID: 0 at kernel/softirq.c:168 __local_bh_enable_ip+0x3a/0x60
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207294] Modules linked in: drbd_transport_rdma(OE) nvmet_rdma nvmet drbd(OE) lru_cache nf_conntrack_netlink xfrm_user xfrm_algo xt_addrtype br_netfilter xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter bpfilter aufs overlay nls_iso8859_1 ipmi_ssif amd64_edac_mod rpcrdma edac_mce_amd sunrpc rdma_ucm kvm_amd ib_iser kvm rdma_cm iw_cm crct10dif_pclmul libiscsi ib_umad ghash_clmulni_intel scsi_transport_iscsi ib_ipoib aesni_intel ib_cm crypto_simd cryptd glue_helper dell_smbios dcdbas mgag200 drm_vram_helper dell_wmi_descriptor wmi_bmof ttm drm_kms_helper i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt ccp k10temp ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter mac_hid bridge stp llc bonding sch_fq_codel ramoops reed_solomon drm efi_pstore ip_tables x_tables autofs4 raid10 raid456
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207334]  async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid0 multipath linear mlx4_ib mlx4_en mlx5_ib ib_uverbs raid1 ib_core crc32_pclmul mlx5_core mlx4_core tg3 ahci nvme pci_hyperv_intf libahci tls nvme_core mlxfw i2c_piix4 wmi
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207348] CPU: 114 PID: 0 Comm: swapper/114 Tainted: G        W  OE     5.4.0-139-generic #156-Ubuntu
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207349] Hardware name: Dell Inc. PowerEdge R6515/0R4CNN, BIOS 2.7.3 03/31/2022
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207353] RIP: 0010:__local_bh_enable_ip+0x3a/0x60
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207355] Code: a9 00 00 0f 00 75 23 83 ee 01 f7 de 65 01 35 05 af 37 7e 65 8b 05 fe ae 37 7e a9 00 ff 1f 00 74 0d 5d 65 ff 0d ef ae 37 7e c3 <0f> 0b eb d9 65 66 8b 05 da e9 38 7e 66 85 c0 74 e6 e8 60 ff ff ff
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207357] RSP: 0018:ffffab6e41e18c48 EFLAGS: 00010006
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207358] RAX: 0000000080010200 RBX: ffff8a5c2380c800 RCX: 0000000000000000
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207359] RDX: 0000000000000001 RSI: 0000000000000200 RDI: ffffffffc0d0ab62
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207360] RBP: ffffab6e41e18c48 R08: ffff8a5c18528b68 R09: ffff8a5c18528b90
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207361] R10: 0000000000000000 R11: ffff8a5bfd7e0060 R12: 0000000000000001
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207362] R13: ffff8a5c2380c858 R14: 0000000000000000 R15: ffff8a5b7ebee010
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207364] FS:  0000000000000000(0000) GS:ffff8a5c3f080000(0000) knlGS:0000000000000000
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207365] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207366] CR2: 000055e261248048 CR3: 000000bd5ca66000 CR4: 0000000000340ee0
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207367] Call Trace:
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207368]  <IRQ>
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207374]  _raw_spin_unlock_bh+0x1e/0x20
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207389]  update_peer_seq+0x62/0x80 [drbd]
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207401]  got_BlockAck+0x63/0x2c0 [drbd]
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207412]  drbd_control_data_ready+0x8b/0x1e0 [drbd]
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207417]  dtr_control_data_ready.isra.0+0x9b/0x100 [drbd_transport_rdma]
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207420]  dtr_rx_cq_event_handler+0x2a9/0x640 [drbd_transport_rdma]
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207432]  mlx5_ib_cq_comp+0x26/0x30 [mlx5_ib]
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207457]  mlx5_eq_comp_int+0xa1/0x180 [mlx5_core]
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207461]  notifier_call_chain+0x55/0x80
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207464]  atomic_notifier_call_chain+0x1a/0x20
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207494]  mlx5_irq_int_handler+0x15/0x20 [mlx5_core]
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207496]  __handle_irq_event_percpu+0x42/0x180
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207499]  handle_irq_event_percpu+0x33/0x80
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207501]  handle_irq_event+0x3b/0x60
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207503]  handle_edge_irq+0x93/0x1c0
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207505]  do_IRQ+0x55/0xf0
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207507]  common_interrupt+0xf/0xf
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207508]  </IRQ>
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207511] RIP: 0010:cpuidle_enter_state+0xc5/0x450
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207513] Code: ff e8 1f f7 83 ff 80 7d c7 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 65 03 00 00 31 ff e8 92 0c 8a ff fb 66 0f 1f 44 00 00 <45> 85 ed 0f 88 8f 02 00 00 49 63 cd 4c 8b 7d d0 4c 2b 7d c8 48 8d
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207514] RSP: 0018:ffffab6e4068fe38 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffd9
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207516] RAX: ffff8a5c3f0afe80 RBX: ffffffff83769620 RCX: 000000000000001f
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207516] RDX: 0000000000000000 RSI: 0000000038feef44 RDI: 0000000000000000
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207517] RBP: ffffab6e4068fe78 R08: 0000003c958fd955 R09: 0000003cc5b70acd
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207518] R10: ffff8a5c3f0aeb80 R11: ffff8a5c3f0aeb60 R12: ffff8a5bfea05800
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207519] R13: 0000000000000002 R14: 0000000000000002 R15: ffff8a5bfea05800
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207523]  ? cpuidle_enter_state+0xa1/0x450
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207525]  cpuidle_enter+0x2e/0x40
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207527]  call_cpuidle+0x23/0x40
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207529]  do_idle+0x1dd/0x270
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207532]  cpu_startup_entry+0x20/0x30
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207534]  start_secondary+0x167/0x1c0
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207537]  secondary_startup_64+0xa4/0xb0
Jun 12 15:24:44 os-hv-am-4 kernel: [  260.207539] ---[ end trace 4c646c84e9801deb ]---
Jun 12 15:25:44 os-hv-am-4 kernel: [  320.872034] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: Preparing remote state change 1094725292
Jun 12 15:25:44 os-hv-am-4 kernel: [  320.872097] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: Committing remote state change 1094725292 (primary_nodes=1)
Jun 12 15:25:44 os-hv-am-4 kernel: [  320.872103] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: conn( Connected -> TearDown ) peer( Secondary -> Unknown )
Jun 12 15:25:44 os-hv-am-4 kernel: [  320.872105] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9/0 drbd8 os-hv-am-3: repl( SyncSource -> Off )
Jun 12 15:25:44 os-hv-am-4 kernel: [  320.872185] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9/0 drbd8 os-hv-am-3: Sending resync reply failed
Jun 12 15:25:44 os-hv-am-4 kernel: [  320.872415] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: Terminating sender thread
Jun 12 15:25:44 os-hv-am-4 kernel: [  320.872992] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: Starting sender thread (from drbd_r_vm-57da0 [3539])
Jun 12 15:25:45 os-hv-am-4 kernel: [  321.875133] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 rdma:os-hv-am-3: WARN: not properly disconnected, state = 6
Jun 12 15:25:45 os-hv-am-4 kernel: [  321.875779] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: pp_use = 2609, expected 0
Jun 12 15:25:45 os-hv-am-4 kernel: [  321.875781] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: Connection closed
Jun 12 15:25:45 os-hv-am-4 kernel: [  321.875793] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: helper command: /sbin/drbdadm disconnected
Jun 12 15:25:45 os-hv-am-4 kernel: [  321.877358] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: helper command: /sbin/drbdadm disconnected exit code 0
Jun 12 15:25:45 os-hv-am-4 kernel: [  321.877372] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: conn( TearDown -> Unconnected )
Jun 12 15:25:45 os-hv-am-4 kernel: [  321.877382] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: Restarting receiver thread
Jun 12 15:25:45 os-hv-am-4 kernel: [  321.877386] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: conn( Unconnected -> Connecting )
Jun 12 15:25:52 os-hv-am-4 kernel: [  328.185970] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: conn( Connecting -> Disconnecting )
Jun 12 15:25:52 os-hv-am-4 kernel: [  328.186035] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: Failed to initiate connection, err=-512
Jun 12 15:25:52 os-hv-am-4 kernel: [  328.186062] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: Terminating sender thread
Jun 12 15:25:52 os-hv-am-4 kernel: [  328.186066] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: Starting sender thread (from drbd_r_vm-57da0 [3539])
Jun 12 15:25:52 os-hv-am-4 kernel: [  328.186473] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: pp_use = 2610, expected 0
Jun 12 15:25:52 os-hv-am-4 kernel: [  328.186475] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: Connection closed
Jun 12 15:25:52 os-hv-am-4 kernel: [  328.186484] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: helper command: /sbin/drbdadm disconnected
Jun 12 15:25:52 os-hv-am-4 kernel: [  328.188149] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: helper command: /sbin/drbdadm disconnected exit code 0
Jun 12 15:25:52 os-hv-am-4 kernel: [  328.188176] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: conn( Disconnecting -> StandAlone )
Jun 12 15:25:52 os-hv-am-4 kernel: [  328.188180] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: Terminating receiver thread
Jun 12 15:26:02 os-hv-am-4 kernel: [  338.882721] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9: Preparing cluster-wide state change 2022799093 (0->-1 3/2)
Jun 12 15:26:02 os-hv-am-4 kernel: [  338.882724] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9: Committing cluster-wide state change 2022799093 (0ms)
Jun 12 15:26:02 os-hv-am-4 kernel: [  338.882728] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9: role( Primary -> Secondary )
Jun 12 15:26:02 os-hv-am-4 kernel: [  338.883097] drbd vm-57da08a3-2938-4769-8959-3a11616d6ef9 os-hv-am-3: Terminating sender thread
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615298] INFO: task drbdsetup:3646 blocked for more than 120 seconds.
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615345]       Tainted: G        W  OE     5.4.0-139-generic #156-Ubuntu
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615382] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615424] drbdsetup       D    0  3646      1 0x00004004
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615427] Call Trace:
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615438]  __schedule+0x2e3/0x740
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615441]  schedule+0x42/0xb0
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615450]  dtr_free+0x315/0x350 [drbd_transport_rdma]
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615455]  ? __wake_up_pollfree+0x40/0x40
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615473]  drbd_transport_shutdown+0x64/0xa0 [drbd]
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615487]  drbd_unregister_connection+0x12e/0x220 [drbd]
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615501]  del_connection+0x95/0x150 [drbd]
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615503]  ? _cond_resched+0x19/0x30
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615515]  drbd_adm_down+0xee/0x350 [drbd]
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615518]  ? remove_wait_queue+0x47/0x50
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615523]  genl_family_rcv_msg+0x1b9/0x470
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615526]  genl_rcv_msg+0x4c/0xa0
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615528]  ? _cond_resched+0x19/0x30
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615529]  ? genl_family_rcv_msg+0x470/0x470
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615531]  netlink_rcv_skb+0x50/0x120
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615533]  genl_rcv+0x29/0x40
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615535]  netlink_unicast+0x1a8/0x250
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615537]  netlink_sendmsg+0x240/0x480
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615540]  ? mem_cgroup_commit_charge+0x63/0x490
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615545]  sock_sendmsg+0x65/0x70
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615547]  sock_write_iter+0x93/0xf0
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615551]  new_sync_write+0x125/0x1c0
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615554]  __vfs_write+0x29/0x40
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615556]  vfs_write+0xb9/0x1a0
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615558]  ksys_write+0x67/0xe0
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615560]  __x64_sys_write+0x1a/0x20
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615564]  do_syscall_64+0x57/0x190
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615566]  entry_SYSCALL_64_after_hwframe+0x5c/0xc1
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615569] RIP: 0033:0x7f08fa258077
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615575] Code: Bad RIP value.
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615576] RSP: 002b:00007ffd3fb2c108 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615579] RAX: ffffffffffffffda RBX: 000000000000004c RCX: 00007f08fa258077
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615580] RDX: 000000000000004c RSI: 0000564099f3dd40 RDI: 0000000000000004
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615581] RBP: 0000564099f3dd40 R08: 00007ffd3fb2c184 R09: 0000000000000000
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615582] R10: 0000564099f3b010 R11: 0000000000000246 R12: 000000000000004c
Jun 12 15:28:33 os-hv-am-4 kernel: [  489.615583] R13: 0000000000000004 R14: 00000000ffffffff R15: 0000000000000000
Jun 12 15:28:43 os-hv-am-4 kernel: [  499.911721] sysrq: Emergency Sync
Jun 12 15:28:43 os-hv-am-4 kernel: [  499.913650] Emergency Sync complete
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443007] INFO: task drbdsetup:3646 blocked for more than 241 seconds.
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443056]       Tainted: G        W  OE     5.4.0-139-generic #156-Ubuntu
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443093] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443134] drbdsetup       D    0  3646      1 0x00004004
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443137] Call Trace:
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443147]  __schedule+0x2e3/0x740
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443150]  schedule+0x42/0xb0
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443159]  dtr_free+0x315/0x350 [drbd_transport_rdma]
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443164]  ? __wake_up_pollfree+0x40/0x40
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443182]  drbd_transport_shutdown+0x64/0xa0 [drbd]
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443196]  drbd_unregister_connection+0x12e/0x220 [drbd]
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443209]  del_connection+0x95/0x150 [drbd]
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443212]  ? _cond_resched+0x19/0x30
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443224]  drbd_adm_down+0xee/0x350 [drbd]
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443227]  ? remove_wait_queue+0x47/0x50
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443232]  genl_family_rcv_msg+0x1b9/0x470
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443235]  genl_rcv_msg+0x4c/0xa0
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443237]  ? _cond_resched+0x19/0x30
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443238]  ? genl_family_rcv_msg+0x470/0x470
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443240]  netlink_rcv_skb+0x50/0x120
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443242]  genl_rcv+0x29/0x40
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443244]  netlink_unicast+0x1a8/0x250
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443246]  netlink_sendmsg+0x240/0x480
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443249]  ? mem_cgroup_commit_charge+0x63/0x490
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443254]  sock_sendmsg+0x65/0x70
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443256]  sock_write_iter+0x93/0xf0
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443260]  new_sync_write+0x125/0x1c0
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443263]  __vfs_write+0x29/0x40
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443265]  vfs_write+0xb9/0x1a0
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443267]  ksys_write+0x67/0xe0
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443269]  __x64_sys_write+0x1a/0x20
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443272]  do_syscall_64+0x57/0x190
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443275]  entry_SYSCALL_64_after_hwframe+0x5c/0xc1
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443277] RIP: 0033:0x7f08fa258077
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443285] Code: Bad RIP value.
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443287] RSP: 002b:00007ffd3fb2c108 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443289] RAX: ffffffffffffffda RBX: 000000000000004c RCX: 00007f08fa258077
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443290] RDX: 000000000000004c RSI: 0000564099f3dd40 RDI: 0000000000000004
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443291] RBP: 0000564099f3dd40 R08: 00007ffd3fb2c184 R09: 0000000000000000
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443292] R10: 0000564099f3b010 R11: 0000000000000246 R12: 000000000000004c
Jun 12 15:30:34 os-hv-am-4 kernel: [  610.443293] R13: 0000000000000004 R14: 00000000ffffffff R15: 0000000000000000
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267490] INFO: task drbdsetup:3646 blocked for more than 362 seconds.
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267539]       Tainted: G        W  OE     5.4.0-139-generic #156-Ubuntu
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267576] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267617] drbdsetup       D    0  3646      1 0x00004004
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267621] Call Trace:
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267632]  __schedule+0x2e3/0x740
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267635]  schedule+0x42/0xb0
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267644]  dtr_free+0x315/0x350 [drbd_transport_rdma]
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267648]  ? __wake_up_pollfree+0x40/0x40
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267667]  drbd_transport_shutdown+0x64/0xa0 [drbd]
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267680]  drbd_unregister_connection+0x12e/0x220 [drbd]
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267694]  del_connection+0x95/0x150 [drbd]
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267697]  ? _cond_resched+0x19/0x30
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267710]  drbd_adm_down+0xee/0x350 [drbd]
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267712]  ? remove_wait_queue+0x47/0x50
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267717]  genl_family_rcv_msg+0x1b9/0x470
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267720]  genl_rcv_msg+0x4c/0xa0
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267722]  ? _cond_resched+0x19/0x30
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267724]  ? genl_family_rcv_msg+0x470/0x470
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267725]  netlink_rcv_skb+0x50/0x120
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267728]  genl_rcv+0x29/0x40
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267729]  netlink_unicast+0x1a8/0x250
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267731]  netlink_sendmsg+0x240/0x480
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267735]  ? mem_cgroup_commit_charge+0x63/0x490
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267739]  sock_sendmsg+0x65/0x70
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267742]  sock_write_iter+0x93/0xf0
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267746]  new_sync_write+0x125/0x1c0
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267749]  __vfs_write+0x29/0x40
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267750]  vfs_write+0xb9/0x1a0
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267752]  ksys_write+0x67/0xe0
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267754]  __x64_sys_write+0x1a/0x20
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267757]  do_syscall_64+0x57/0x190
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267760]  entry_SYSCALL_64_after_hwframe+0x5c/0xc1
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267763] RIP: 0033:0x7f08fa258077
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267767] Code: Bad RIP value.
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267769] RSP: 002b:00007ffd3fb2c108 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267771] RAX: ffffffffffffffda RBX: 000000000000004c RCX: 00007f08fa258077
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267772] RDX: 000000000000004c RSI: 0000564099f3dd40 RDI: 0000000000000004
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267773] RBP: 0000564099f3dd40 R08: 00007ffd3fb2c184 R09: 0000000000000000
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267774] R10: 0000564099f3b010 R11: 0000000000000246 R12: 000000000000004c
Jun 12 15:32:35 os-hv-am-4 kernel: [  731.267775] R13: 0000000000000004 R14: 00000000ffffffff R15: 0000000000000000
Jun 12 15:32:45 os-hv-am-4 kernel: [  741.203227] SGI XFS with ACLs, security attributes, realtime, no debug enabled
Jun 12 15:32:45 os-hv-am-4 kernel: [  741.212617] JFS: nTxBlock = 8192, nTxLock = 65536
Jun 12 15:32:45 os-hv-am-4 kernel: [  741.233034] ntfs: driver 2.1.32 [Flags: R/O MODULE].
Jun 12 15:32:45 os-hv-am-4 kernel: [  741.256163] QNX4 filesystem 0.2.3 registered.
Jun 12 15:32:45 os-hv-am-4 kernel: [  741.294546] Btrfs loaded, crc32c=crc32c-intel
Jun 12 15:32:46 os-hv-am-4 kernel: [  742.073017] device-mapper: table: 253:4: linear: Device lookup failed
Jun 12 15:32:46 os-hv-am-4 kernel: [  742.073492] device-mapper: ioctl: error adding target to table
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093603] INFO: task drbdsetup:3646 blocked for more than 483 seconds.
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093654]       Tainted: G        W  OE     5.4.0-139-generic #156-Ubuntu
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093691] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093731] drbdsetup       D    0  3646      1 0x00004004
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093735] Call Trace:
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093748]  __schedule+0x2e3/0x740
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093751]  schedule+0x42/0xb0
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093761]  dtr_free+0x315/0x350 [drbd_transport_rdma]
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093767]  ? __wake_up_pollfree+0x40/0x40
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093815]  drbd_transport_shutdown+0x64/0xa0 [drbd]
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093829]  drbd_unregister_connection+0x12e/0x220 [drbd]
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093843]  del_connection+0x95/0x150 [drbd]
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093845]  ? _cond_resched+0x19/0x30
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093858]  drbd_adm_down+0xee/0x350 [drbd]
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093860]  ? remove_wait_queue+0x47/0x50
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093865]  genl_family_rcv_msg+0x1b9/0x470
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093868]  genl_rcv_msg+0x4c/0xa0
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093870]  ? _cond_resched+0x19/0x30
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093872]  ? genl_family_rcv_msg+0x470/0x470
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093874]  netlink_rcv_skb+0x50/0x120
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093876]  genl_rcv+0x29/0x40
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093878]  netlink_unicast+0x1a8/0x250
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093880]  netlink_sendmsg+0x240/0x480
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093883]  ? mem_cgroup_commit_charge+0x63/0x490
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093888]  sock_sendmsg+0x65/0x70
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093891]  sock_write_iter+0x93/0xf0
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093894]  new_sync_write+0x125/0x1c0
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093897]  __vfs_write+0x29/0x40
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093899]  vfs_write+0xb9/0x1a0
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093901]  ksys_write+0x67/0xe0
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093903]  __x64_sys_write+0x1a/0x20
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093907]  do_syscall_64+0x57/0x190
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093910]  entry_SYSCALL_64_after_hwframe+0x5c/0xc1
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093913] RIP: 0033:0x7f08fa258077
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093918] Code: Bad RIP value.
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093920] RSP: 002b:00007ffd3fb2c108 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093922] RAX: ffffffffffffffda RBX: 000000000000004c RCX: 00007f08fa258077
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093923] RDX: 000000000000004c RSI: 0000564099f3dd40 RDI: 0000000000000004
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093924] RBP: 0000564099f3dd40 R08: 00007ffd3fb2c184 R09: 0000000000000000
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093925] R10: 0000564099f3b010 R11: 0000000000000246 R12: 000000000000004c
Jun 12 15:34:36 os-hv-am-4 kernel: [  852.093926] R13: 0000000000000004 R14: 00000000ffffffff R15: 0000000000000000

drbd version: 9.2.2

To reproduce:

Set up an RDMA-synced disk: set up the first node, set up the second node, connect them, disconnect them, then try to shut down either of the two. They will be stuck forever, and only a hard reboot will release them.

drbd-9.1.7 build errors on RHEL9

Hi,

I'm trying to build DRBD 9.1.7 kernel module on RHEL9 (released last week) and I get the following build errors:

Executing(%build): /bin/sh -e /var/tmp/rpm-tmp.t8btTe

+ umask 022
+ cd /home/phil/rpmbuild/BUILD
+ cd drbd-9.1.7
+ /usr/bin/make -j32 module KDIR=/usr/src/kernels/5.14.0-70.13.1.el9_0.x86_64 'KVER=%{kversion}'
Need a git checkout to regenerate drbd/.drbd_git_revision
make[1]: Entering directory '/home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd'

Calling toplevel makefile of kernel source tree, which I believe is in
KDIR=/usr/src/kernels/5.14.0-70.13.1.el9_0.x86_64

/usr/bin/make -C /usr/src/kernels/5.14.0-70.13.1.el9_0.x86_64 M=/home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd modules
COMPAT __vmalloc_has_2_params
COMPAT before_4_13_kernel_read
COMPAT blkdev_issue_zeroout_discard
COMPAT can_include_vermagic_h
COMPAT genl_policy_in_ops
COMPAT have_BIO_MAX_VECS
COMPAT have_CRYPTO_TFM_NEED_KEY
COMPAT have_SHASH_DESC_ON_STACK
COMPAT have_WB_congested_enum
COMPAT have_allow_kernel_signal
COMPAT have_bdgrab
COMPAT have_bdi_congested_fn
COMPAT have_bio_bi_bdev
COMPAT have_bio_bi_error
COMPAT have_bio_bi_opf
COMPAT have_bio_bi_status
COMPAT have_bio_clone_fast
COMPAT have_bio_op_shift
COMPAT have_bio_set_dev
COMPAT have_bio_set_op_attrs
COMPAT have_bio_start_io_acct
COMPAT have_bioset_init
COMPAT have_bioset_need_bvecs
COMPAT have_blk_alloc_disk
COMPAT have_blk_alloc_queue_rh
COMPAT have_blk_check_plugged
COMPAT have_blk_qc_t_make_request
COMPAT have_blk_qc_t_submit_bio
COMPAT have_blk_queue_flag_set
COMPAT have_blk_queue_make_request
COMPAT have_blk_queue_merge_bvec
COMPAT have_blk_queue_split_bio
COMPAT have_blk_queue_split_q_bio
COMPAT have_blk_queue_split_q_bio_bioset
COMPAT have_blk_queue_update_readahead
COMPAT have_blk_queue_write_cache
COMPAT have_d_inode
COMPAT have_disk_update_readahead
COMPAT have_fallthrough
COMPAT have_fs_dax_get_by_bdev
COMPAT have_generic_start_io_acct_q_rw_sect_part
COMPAT have_generic_start_io_acct_rw_sect_part
COMPAT have_genl_family_parallel_ops
COMPAT have_hd_struct
COMPAT have_ib_cq_init_attr
COMPAT have_ib_get_dma_mr
COMPAT have_idr_is_empty
COMPAT have_inode_lock
COMPAT have_ktime_to_timespec64
COMPAT have_kvfree
COMPAT have_max_send_recv_sge
COMPAT have_nla_nest_start_noflag
COMPAT have_nla_parse_deprecated
COMPAT have_nla_put_64bit
COMPAT have_nla_strscpy
COMPAT have_part_stat_h
COMPAT have_part_stat_read_accum
COMPAT have_pointer_backing_dev_info
COMPAT have_proc_create_single
COMPAT have_queue_flag_stable_writes
COMPAT have_rb_declare_callbacks_max
COMPAT have_refcount_inc
COMPAT have_req_hardbarrier
COMPAT have_req_noidle
COMPAT have_req_nounmap
COMPAT have_req_op_write
COMPAT have_req_op_write_zeroes
COMPAT have_req_write
COMPAT have_revalidate_disk_size
COMPAT have_sched_set_fifo
COMPAT have_security_netlink_recv
COMPAT have_sendpage_ok
COMPAT have_set_capacity_and_notify
COMPAT have_shash_desc_zero
COMPAT have_simple_positive
COMPAT have_struct_bvec_iter
COMPAT have_sock_set_keepalive
COMPAT have_struct_size
COMPAT have_submit_bio_noacct
COMPAT have_tcp_sock_set_cork
COMPAT have_tcp_sock_set_nodelay
COMPAT have_tcp_sock_set_quickack
COMPAT have_time64_to_tm
COMPAT have_timer_setup
COMPAT have_void_make_request
COMPAT have_void_submit_bio
COMPAT ib_alloc_pd_has_2_params
COMPAT ib_device_has_ops
COMPAT ib_post_send_const_params
COMPAT ib_query_device_has_3_params
COMPAT need_make_request_recursion
COMPAT part_stat_read_takes_block_device
COMPAT queue_limits_has_discard_zeroes_data
COMPAT rdma_create_id_has_net_ns
COMPAT sock_create_kern_has_five_parameters
COMPAT sock_ops_returns_addr_len
COMPAT struct_gendisk_has_backing_dev_info
UPD /home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/compat.5.14.0-70.13.1.el9_0.x86_64.h
UPD /home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/compat.h
./drbd-kernel-compat/gen_compat_patch.sh: line 12: spatch: command not found
./drbd-kernel-compat/gen_compat_patch.sh: line 45: hash: spatch: not found
INFO: no suitable spatch found; trying spatch-as-a-service;
be patient, may take up to 10 minutes
if it is in the server side cache it might only take a second
SPAAS 8ce47eaf58b6993d6b790e6659e4b58f
Successfully connected to SPAAS ('')
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 8585 100 3353 0 5232 17373 27108 --:--:-- --:--:-- --:--:-- 44481
You can create a new .tgz including this pre-computed compat patch
by calling "make unpatch ; echo drbd-9.1.7/drbd/drbd-kernel-compat/cocci_cache/8ce47eaf58b6993d6b790e6659e4b58f/compat.patch >>.filelist ; make tgz"
PATCH
patching file ./drbd_int.h
patching file drbd_main.c
patching file drbd_debugfs.c
patching file drbd_nl.c
patching file drbd_req.c
patching file drbd_receiver.c
patching file drbd-headers/linux/genl_magic_func.h
Hunk #2 succeeded at 312 (offset -20 lines).
CC [M] /home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_dax_pmem.o
CC [M] /home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_debugfs.o
CC [M] /home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_bitmap.o
CC [M] /home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_proc.o
CC [M] /home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_sender.o
CC [M] /home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_receiver.o
CC [M] /home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_req.o
CC [M] /home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_actlog.o
CC [M] /home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/lru_cache.o
CC [M] /home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_main.o
CC [M] /home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_strings.o
CC [M] /home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_nl.o
CC [M] /home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_interval.o
CC [M] /home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_state.o
CC [M] /home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd-kernel-compat/drbd_wrappers.o
CC [M] /home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_nla.o
CC [M] /home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_transport.o
CC [M] /home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_transport_tcp.o
In file included from /home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_dax_pmem.c:28:
/home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_int.h: In function 'drbd_submit_bio_noacct':
/home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_int.h:2101:17: error: implicit declaration of function 'generic_make_request' [-Werror=implicit-function-declaration]
2101 | generic_make_request(bio);
| ^~~~~~~~~~~~~~~~~~~~
In file included from /home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_proc.c:22:
/home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_int.h: In function 'drbd_submit_bio_noacct':
/home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_int.h:2101:17: error: implicit declaration of function 'generic_make_request' [-Werror=implicit-function-declaration]
2101 | generic_make_request(bio);
| ^~~~~~~~~~~~~~~~~~~~
In file included from /home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_debugfs.c:10:
/home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_int.h: In function 'drbd_submit_bio_noacct':
/home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_int.h:2101:17: error: implicit declaration of function 'generic_make_request' [-Werror=implicit-function-declaration]
2101 | generic_make_request(bio);
| ^~~~~~~~~~~~~~~~~~~~
In file included from /home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_req.c:18:
/home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_int.h: In function 'drbd_submit_bio_noacct':
/home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_int.h:2101:17: error: implicit declaration of function 'generic_make_request' [-Werror=implicit-function-declaration]
2101 | generic_make_request(bio);
| ^~~~~~~~~~~~~~~~~~~~
In file included from /home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_actlog.c:19:
/home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_int.h: In function 'drbd_submit_bio_noacct':
/home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_int.h:2101:17: error: implicit declaration of function 'generic_make_request' [-Werror=implicit-function-declaration]
2101 | generic_make_request(bio);
| ^~~~~~~~~~~~~~~~~~~~
In file included from /home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_nl.c:25:
/home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_int.h: In function 'drbd_submit_bio_noacct':
/home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_int.h:2101:17: error: implicit declaration of function 'generic_make_request' [-Werror=implicit-function-declaration]
2101 | generic_make_request(bio);
| ^~~~~~~~~~~~~~~~~~~~
In file included from /home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_sender.c:28:
/home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_int.h: In function 'drbd_submit_bio_noacct':
/home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_int.h:2101:17: error: implicit declaration of function 'generic_make_request' [-Werror=implicit-function-declaration]
2101 | generic_make_request(bio);
| ^~~~~~~~~~~~~~~~~~~~
cc1: some warnings being treated as errors
make[3]: *** [scripts/Makefile.build:271: /home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_proc.o] Error 1
make[3]: *** Waiting for unfinished jobs....
In file included from /home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_state.c:19:
/home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_int.h: In function 'drbd_submit_bio_noacct':
/home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_int.h:2101:17: error: implicit declaration of function 'generic_make_request' [-Werror=implicit-function-declaration]
2101 | generic_make_request(bio);
| ^~~~~~~~~~~~~~~~~~~~
cc1: some warnings being treated as errors
/home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_req.c: At top level:
/home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_req.c:2341:6: error: conflicting types for 'drbd_submit_bio'; have 'void(struct bio *)'
2341 | void drbd_submit_bio(struct bio *bio)
| ^~~~~~~~~~~~~~~
In file included from /home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_req.c:18:
/home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_int.h:1882:17: note: previous declaration of 'drbd_submit_bio' with type 'blk_qc_t(struct bio *)' {aka 'unsigned int(struct bio *)'}
1882 | extern blk_qc_t drbd_submit_bio(struct bio *bio);
| ^~~~~~~~~~~~~~~
make[3]: *** [scripts/Makefile.build:271: /home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_dax_pmem.o] Error 1
cc1: some warnings being treated as errors
make[3]: *** [scripts/Makefile.build:271: /home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_req.o] Error 1
In file included from /home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_main.c:47:
/home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_int.h: In function 'drbd_submit_bio_noacct':
/home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_int.h:2101:17: error: implicit declaration of function 'generic_make_request' [-Werror=implicit-function-declaration]
2101 | generic_make_request(bio);
| ^~~~~~~~~~~~~~~~~~~~
/home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_main.c: At top level:
/home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_main.c:152:27: error: initialization of 'void (*)(struct bio *)' from incompatible pointer type 'blk_qc_t (*)(struct bio *)' {aka 'unsigned int (*)(struct bio *)'} [-Werror=incompatible-pointer-types]
152 | .submit_bio = drbd_submit_bio,
| ^~~~~~~~~~~~~~~
/home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_main.c:152:27: note: (near initialization for 'drbd_ops.submit_bio')
In file included from /home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_receiver.c:37:
/home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_int.h: In function 'drbd_submit_bio_noacct':
/home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_int.h:2101:17: error: implicit declaration of function 'generic_make_request' [-Werror=implicit-function-declaration]
2101 | generic_make_request(bio);
| ^~~~~~~~~~~~~~~~~~~~
In file included from /home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_transport.c:7:
/home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_int.h: In function 'drbd_submit_bio_noacct':
/home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_int.h:2101:17: error: implicit declaration of function 'generic_make_request' [-Werror=implicit-function-declaration]
2101 | generic_make_request(bio);
| ^~~~~~~~~~~~~~~~~~~~
In file included from /home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_bitmap.c:23:
/home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_int.h: In function 'drbd_submit_bio_noacct':
/home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_int.h:2101:17: error: implicit declaration of function 'generic_make_request' [-Werror=implicit-function-declaration]
2101 | generic_make_request(bio);
| ^~~~~~~~~~~~~~~~~~~~
cc1: some warnings being treated as errors
make[3]: *** [scripts/Makefile.build:271: /home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_transport.o] Error 1
cc1: some warnings being treated as errors
make[3]: *** [scripts/Makefile.build:271: /home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_actlog.o] Error 1
cc1: some warnings being treated as errors
make[3]: *** [scripts/Makefile.build:271: /home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_debugfs.o] Error 1
cc1: some warnings being treated as errors
make[3]: *** [scripts/Makefile.build:271: /home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_sender.o] Error 1
cc1: some warnings being treated as errors
make[3]: *** [scripts/Makefile.build:271: /home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_bitmap.o] Error 1
cc1: some warnings being treated as errors
make[3]: *** [scripts/Makefile.build:271: /home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_state.o] Error 1
cc1: some warnings being treated as errors
make[3]: *** [scripts/Makefile.build:271: /home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_main.o] Error 1
cc1: some warnings being treated as errors
make[3]: *** [scripts/Makefile.build:271: /home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_nl.o] Error 1
cc1: some warnings being treated as errors
make[3]: *** [scripts/Makefile.build:271: /home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_receiver.o] Error 1
make[2]: *** [Makefile:1862: /home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd] Error 2
make[1]: *** [Makefile:132: kbuild] Error 2
make[1]: Leaving directory '/home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd'
make: *** [Makefile:125: module] Error 2
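
The build output above repeats the same diagnostic once per compilation unit, which makes it hard to see how many distinct problems there are. As a rough aid, the following Python sketch collapses GCC-style `error:` lines into unique (file, line, message) entries. The function name `summarize_errors` and the embedded `SAMPLE_LOG` are illustrative assumptions, not part of DRBD or its build system; the sample lines are copied from the log above.

```python
import os
import re
from collections import Counter

# Matches GCC diagnostics of the form "<file>:<line>:<col>: error: <message>".
ERROR_RE = re.compile(r"^(\S+):(\d+):(\d+): error: (.*)$")


def summarize_errors(log_text):
    """Count how often each distinct (file, line, message) error appears."""
    counts = Counter()
    for raw in log_text.splitlines():
        match = ERROR_RE.match(raw.strip())
        if match:
            path, line, _col, message = match.groups()
            # Use only the file name so identical header errors group together.
            counts[(os.path.basename(path), int(line), message)] += 1
    return counts


# A short excerpt of the build log from the issue (hypothetical selection).
SAMPLE_LOG = """\
/home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_int.h:2101:17: error: implicit declaration of function 'generic_make_request' [-Werror=implicit-function-declaration]
/home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_int.h:2101:17: error: implicit declaration of function 'generic_make_request' [-Werror=implicit-function-declaration]
cc1: some warnings being treated as errors
/home/phil/rpmbuild/BUILD/drbd-9.1.7/drbd/drbd_req.c:2341:6: error: conflicting types for 'drbd_submit_bio'; have 'void(struct bio *)'
"""

if __name__ == "__main__":
    for (name, line, message), count in summarize_errors(SAMPLE_LOG).most_common():
        print(f"{count:3d}x {name}:{line}: {message}")
```

Applied to the full log, this kind of summary would likely reduce the output to three distinct errors: the missing `generic_make_request`, the conflicting return type of `drbd_submit_bio`, and the incompatible `.submit_bio` initializer.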

Include compat patch cache for RHEL 8.6 kernel version 4.18.0-372

Not sure if this is the right place to ask.

The drbd-9.1.7 download on the Linbit website (not the GitHub release) does not include a compat patch cache for the RHEL 8.6 kernel, version 4.18.0-372. I understand that RHEL 8.6 was released in mid-May, after drbd-9.1.7. Could Linbit include a cache for the RHEL 8.6 kernel in a later release, say drbd-9.1.8?

We need the patch cache because our build system has neither coccinelle nor Internet access for spatch-as-a-service. Thanks.

How can I enable primary mode on more than 1 node at the same time?

How can I enable primary mode on more than 2 nodes at the same time?

What I want to achieve is a network-distributed volume that is shared between multiple active servers, without involving NAS / SAN systems.
The purpose of this volume is to share data between servers in an HA way.

Is there a way to set all the available nodes as primary, so that they can all mount the shared drive and read / modify its content at the same time?

Mounting drbd not working

When I try to mount the DRBD device I get this error:

`
[eco_adm@cg3a54d9ac-mstrfs-s302 opt]$ sudo mount --all
mount: /opt/mstrtf: mount(2) system call failed: Wrong medium type.

`

Here is the status of the cluster:

[eco_adm@cg3a54d9ac-mstrfs-s302 mstrtf]$ drbdsetup status -vs
r0 node-id:1 role:Secondary suspended:no force-io-failures:no
    write-ordering:flush
  volume:0 minor:0 disk:UpToDate backing_dev:/dev/vdb quorum:yes
      size:20970844 read:0 written:20970844 al-writes:0 bm-writes:403 upper-pending:0 lower-pending:0
      al-suspended:no blocked:no
  cg3a54d9ac-mstrfs-s301.sys.schwarz node-id:0 connection:Connected role:Primary congested:no
      ap-in-flight:0 rs-in-flight:0
    volume:0 replication:Established peer-disk:UpToDate resync-suspended:no
        received:20970844 sent:0 out-of-sync:0 pending:0 unacked:0

Here is the config file:

`
[eco_adm@cg3a54d9ac-mstrfs-s302 opt]$ cat /etc/drbd.d/global_common.conf
global {
	usage-count no;
}

common {
	disk {
	}
	net {
		cram-hmac-alg sha1;
		shared-secret "****";
	}
	handlers {
	}
	startup {
	}
	options {
	}
}

resource r0 {
	on cg3a54d9ac-mstrfs-s301.sys.schwarz {
		device /dev/drbd0;
		disk /dev/vdb;
		address 10.124.141.15:7788;
		meta-disk internal;
	}
	on cg3a54d9ac-mstrfs-s302.sys.schwarz {
		device /dev/drbd0;
		disk /dev/vdb;
		address 10.124.141.228:7788;
		meta-disk internal;
	}
}
`
Here is the fstab config:

UUID=08d8ba03-a737-4caa-80fe-bdcf160bb8e9 / xfs defaults 0 0
UUID=3a94b570-1121-4bbd-8c1e-beb2b73d0c03 /boot xfs defaults 0 0
UUID=0607-95FB /boot/efi vfat defaults,uid=0,gid=0,umask=077,shortname=winnt 0 2
/dev/drbd0 /opt/mstrtf ext4 defaults 0 0

Does anybody have an idea why this is happening?
Thank you
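
One possible reading of the status output in this report, offered as a sketch rather than a diagnosis: the local node shows `role:Secondary` while the peer shows `role:Primary`, and "Wrong medium type" is the error commonly seen when mounting a DRBD device on a node that is not (and cannot become) Primary. The Python below extracts the roles from `drbdsetup status`-style `key:value` text. The function names `parse_roles` and `local_is_primary` are hypothetical helpers, the assumption that the first entry is the local node is taken from the layout of the output above, and `STATUS` is an abbreviated copy of that output.

```python
def parse_roles(status_text):
    """Return (name, role) pairs from `drbdsetup status` style key:value text.

    A token without ':' is treated as a name (resource or peer host);
    the next 'role:<value>' token is attributed to that name.
    """
    roles = []
    current = None
    for token in status_text.split():
        if ":" not in token:
            current = token
        elif token.startswith("role:") and current is not None:
            roles.append((current, token.split(":", 1)[1]))
    return roles


def local_is_primary(roles):
    """Assumption: the first entry describes the local node."""
    return bool(roles) and roles[0][1] == "Primary"


# Abbreviated from the status output in the issue.
STATUS = (
    "r0 node-id:1 role:Secondary suspended:no "
    "volume:0 minor:0 disk:UpToDate backing_dev:/dev/vdb quorum:yes "
    "cg3a54d9ac-mstrfs-s301.sys.schwarz node-id:0 connection:Connected "
    "role:Primary congested:no volume:0 replication:Established"
)

if __name__ == "__main__":
    roles = parse_roles(STATUS)
    for name, role in roles:
        print(f"{name}: {role}")
    if not local_is_primary(roles):
        print("local node is not Primary; a mount here is expected to fail")
```

If this reading is right, the `/dev/drbd0` entry in fstab would only mount successfully on whichever node currently holds the Primary role.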

Kernel panic with 9.2.1

Hello,

I am consistently getting errors during the initial sync on a new node; the node goes completely offline and needs a hard reset. The errors happen within the first 1-2 minutes of the sync.

Here is the log:

[   11.755440] drbd: module verification failed: signature and/or required key missing - tainting kernel
[   11.764733] drbd: initialized. Version: 9.2.1 (api:2/proto:86-121)
[   11.764735] drbd: GIT-hash: 86ec2326fef3aede9f4d46f52bfd35aac4d5eb7e build by root@k8s1, 2022-11-18 23:24:46
[   11.764735] drbd: registered as block device major 147
[   14.482069] bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
[   14.704655] audit: type=1400 audit(1668954923.830:9): apparmor="STATUS" operation="profile_load" profile="unconfined" name="docker-default" pid=1493 comm="apparmor_parser"
[   14.830612] Bridge firewalling registered
[   14.858130] Initializing XFRM netlink socket
[   69.886273] ata4.00: Enabling discard_zeroes_data
[   71.102111] drbd music_res: Starting worker thread (from drbdsetup [1809])
[   71.106840] drbd music_res k8s2: Starting sender thread (from drbdsetup [1814])
[   71.112340] drbd music_res/0 drbd1002: meta-data IO uses: blk-bio
[   71.112503] drbd music_res/0 drbd1002: disk( Diskless -> Attaching )
[   71.112508] drbd music_res/0 drbd1002: Maximum number of peer devices = 7
[   71.112595] drbd music_res: Method to ensure write ordering: flush
[   71.112598] drbd music_res/0 drbd1002: drbd_bm_resize called with capacity == 2147491664
[   71.236359] drbd music_res/0 drbd1002: resync bitmap: bits=268436458 words=29360240 pages=57345
[   71.236365] drbd1002: detected capacity change from 0 to 2147491664
[   71.236366] drbd music_res/0 drbd1002: size = 1024 GB (1073745832 KB)
[   71.720226] drbd music_res/0 drbd1002: recounting of set bits took additional 65ms
[   71.720238] drbd music_res/0 drbd1002: disk( Attaching -> UpToDate )
[   71.720240] drbd music_res/0 drbd1002: attached to current UUID: BD56709A2B336DEE
[   71.720271] drbd music_res/0 drbd1002: size = 1024 GB (1073745832 KB)
[   71.726065] drbd music_res k8s2: conn( StandAlone -> Unconnected )
[   71.726986] drbd music_res k8s2: Starting receiver thread (from drbd_w_music_re [1810])
[   71.727043] drbd music_res k8s2: conn( Unconnected -> Connecting )
[   72.235838] drbd music_res k8s2: Handshake to peer 1 successful: Agreed network protocol version 121
[   72.235845] drbd music_res k8s2: Feature flags enabled on protocol level: 0x1f TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES RESYNC_DAGTAG
[   72.236868] drbd music_res k8s2: Peer authenticated using 20 bytes HMAC
[   72.258571] drbd music_res: Preparing cluster-wide state change 887551500 (0->1 499/146)
[   72.267487] drbd music_res/0 drbd1002 k8s2: drbd_sync_handshake:
[   72.267489] drbd music_res/0 drbd1002 k8s2: self BD56709A2B336DEE:0000000000000000:0000000000000000:0000000000000000 bits:0 flags:20
[   72.267492] drbd music_res/0 drbd1002 k8s2: peer 331E75BC538364B2:BD56709A2B336DEE:0000000000000000:0000000000000000 bits:268436458 flags:1020
[   72.267494] drbd music_res/0 drbd1002 k8s2: uuid_compare()=target-use-bitmap by rule=bitmap-peer
[   72.267499] drbd music_res: State change 887551500: primary_nodes=2, weak_nodes=FFFFFFFFFFFFFFFC
[   72.267501] drbd music_res: Committing cluster-wide state change 887551500 (9ms)
[   72.267514] drbd music_res k8s2: conn( Connecting -> Connected ) peer( Unknown -> Primary )
[   72.267515] drbd music_res/0 drbd1002: disk( UpToDate -> Outdated )
[   72.267516] drbd music_res/0 drbd1002 k8s2: pdsk( DUnknown -> UpToDate ) repl( Off -> WFBitMapT )
[   72.316487] drbd music_res/0 drbd1002 k8s2: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
[   72.353433] drbd music_res/0 drbd1002 k8s2: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
[   72.353442] drbd music_res/0 drbd1002 k8s2: helper command: /sbin/drbdadm before-resync-target
[   72.354604] drbd music_res/0 drbd1002 k8s2: helper command: /sbin/drbdadm before-resync-target exit code 0
[   72.354613] drbd music_res/0 drbd1002: disk( Outdated -> Inconsistent )
[   72.354614] drbd music_res/0 drbd1002 k8s2: repl( WFBitMapT -> SyncTarget )
[   72.354639] drbd music_res/0 drbd1002 k8s2: Began resync as SyncTarget (will sync 1073745832 KB [268436458 bits set]).
[   77.737892] BUG: Bad page state in process kworker/u64:5  pfn:21fe2a
[   77.737918] page:00000000d7d6d96d refcount:-1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x21fe2a
[   77.737926] flags: 0x17ffffc0000000(node=0|zone=2|lastcpupid=0x1fffff)
[   77.737930] raw: 0017ffffc0000000 dead000000000100 dead000000000122 0000000000000000
[   77.737932] raw: 0000000000000000 0000000000000000 ffffffffffffffff 0000000000000000
[   77.737932] page dumped because: nonzero _refcount
[   77.737933] Modules linked in: drbd_transport_tcp(OE) nf_conntrack_netlink xfrm_user xfrm_algo xt_addrtype br_netfilter xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink bridge drbd(OE) bcache dm_cache dm_writecache nvme_rdma nvme_fabrics nvmet_rdma nvmet 8021q garp mrp stp llc overlay bonding tls cfg80211 ipmi_ssif rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_cm intel_rapl_msr intel_rapl_common edac_mce_amd iavf kvm_amd snd_hda_intel sunrpc snd_intel_dspcfg ast snd_intel_sdw_acpi kvm drm_vram_helper snd_hda_codec irdma drm_ttm_helper nls_iso8859_1 crct10dif_pclmul ttm snd_hda_core ghash_clmulni_intel i40e snd_hwdep drm_kms_helper aesni_intel snd_pcm i2c_algo_bit ib_uverbs snd_timer fb_sys_fops acpi_ipmi syscopyarea crypto_simd snd sysfillrect joydev cryptd input_leds ccp soundcore ipmi_si sysimgblt ib_core k10temp rapl wmi_bmof ipmi_devintf
[   77.737997]  ipmi_msghandler mac_hid ramoops reed_solomon pstore_blk pstore_zone drm efi_pstore ip_tables x_tables autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid1 raid0 multipath linear hid_generic usbhid hid cdc_ether usbnet mii dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c nvme crc32_pclmul ahci ice i2c_piix4 libahci nvme_core xhci_pci xhci_pci_renesas wmi
[   77.738026] CPU: 20 PID: 450 Comm: kworker/u64:5 Tainted: G           OE     5.19.0-1009-lowlatency #10-Ubuntu
[   77.738029] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X570D4I-NL, BIOS L0.04 04/21/2021
[   77.738030] Workqueue: drbd_as_music_res drbd_send_acks_wf [drbd]
[   77.738052] Call Trace:
[   77.738054]  <TASK>
[   77.738056]  show_stack+0x4e/0x61
[   77.738060]  dump_stack_lvl+0x4a/0x6d
[   77.738064]  dump_stack+0x10/0x18
[   77.738067]  bad_page.cold+0x63/0x8f
[   77.738070]  check_free_page_bad+0x66/0x80
[   77.738073]  free_pcppages_bulk+0x1a2/0x2d0
[   77.738076]  free_unref_page_commit+0x159/0x1b0
[   77.738078]  free_unref_page+0xde/0x180
[   77.738081]  __put_page+0x5d/0xf0
[   77.738084]  drbd_free_pages+0x14d/0x1f0 [drbd]
[   77.738099]  drbd_resync_request_complete+0x4f/0x1f0 [drbd]
[   77.738114]  e_end_resync_block+0x89/0x150 [drbd]
[   77.738127]  drbd_finish_peer_reqs+0xf9/0x170 [drbd]
[   77.738141]  drbd_send_acks_wf+0x68/0xb0 [drbd]
[   77.738155]  process_one_work+0x225/0x400
[   77.738158]  worker_thread+0x50/0x3e0
[   77.738160]  ? rescuer_thread+0x3c0/0x3c0
[   77.738162]  kthread+0xe9/0x110
[   77.738164]  ? kthread_complete_and_exit+0x20/0x20
[   77.738167]  ret_from_fork+0x22/0x30
[   77.738172]  </TASK>
[   77.738173] Disabling lock debugging due to kernel taint
[   77.738179] BUG: Bad page state in process kworker/u64:5  pfn:1144af
[   77.738193] page:000000007dc349d5 refcount:65476 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1144af
[   77.738197] flags: 0x17ffffc0000000(node=0|zone=2|lastcpupid=0x1fffff)
[   77.738199] raw: 0017ffffc0000000 dead000000000100 dead000000000122 0000000000000000
[   77.738201] raw: 0000000000000000 0000000000000000 0000ffc4ffffffff 0000000000000000
[   77.738202] page dumped because: nonzero _refcount
[   77.738202] Modules linked in: drbd_transport_tcp(OE) nf_conntrack_netlink xfrm_user xfrm_algo xt_addrtype br_netfilter xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink bridge drbd(OE) bcache dm_cache dm_writecache nvme_rdma nvme_fabrics nvmet_rdma nvmet 8021q garp mrp stp llc overlay bonding tls cfg80211 ipmi_ssif rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_cm intel_rapl_msr intel_rapl_common edac_mce_amd iavf kvm_amd snd_hda_intel sunrpc snd_intel_dspcfg ast snd_intel_sdw_acpi kvm drm_vram_helper snd_hda_codec irdma drm_ttm_helper nls_iso8859_1 crct10dif_pclmul ttm snd_hda_core ghash_clmulni_intel i40e snd_hwdep drm_kms_helper aesni_intel snd_pcm i2c_algo_bit ib_uverbs snd_timer fb_sys_fops acpi_ipmi syscopyarea crypto_simd snd sysfillrect joydev cryptd input_leds ccp soundcore ipmi_si sysimgblt ib_core k10temp rapl wmi_bmof ipmi_devintf
[   77.738245]  ipmi_msghandler mac_hid ramoops reed_solomon pstore_blk pstore_zone drm efi_pstore ip_tables x_tables autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid1 raid0 multipath linear hid_generic usbhid hid cdc_ether usbnet mii dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c nvme crc32_pclmul ahci ice i2c_piix4 libahci nvme_core xhci_pci xhci_pci_renesas wmi
[   77.738267] CPU: 20 PID: 450 Comm: kworker/u64:5 Tainted: G    B      OE     5.19.0-1009-lowlatency #10-Ubuntu
[   77.738269] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X570D4I-NL, BIOS L0.04 04/21/2021
[   77.738270] Workqueue: drbd_as_music_res drbd_send_acks_wf [drbd]
[   77.738284] Call Trace:
[   77.738285]  <TASK>
[   77.738286]  show_stack+0x4e/0x61
[   77.738288]  dump_stack_lvl+0x4a/0x6d
[   77.738290]  dump_stack+0x10/0x18
[   77.738293]  bad_page.cold+0x63/0x8f
[   77.738295]  check_free_page_bad+0x66/0x80
[   77.738297]  free_pcppages_bulk+0x1a2/0x2d0
[   77.738299]  free_unref_page_commit+0x159/0x1b0
[   77.738302]  free_unref_page+0xde/0x180
[   77.738304]  __put_page+0x5d/0xf0
[   77.738306]  drbd_free_pages+0x14d/0x1f0 [drbd]
[   77.738320]  drbd_resync_request_complete+0x4f/0x1f0 [drbd]
[   77.738334]  e_end_resync_block+0x89/0x150 [drbd]
[   77.738347]  drbd_finish_peer_reqs+0xf9/0x170 [drbd]
[   77.738361]  drbd_send_acks_wf+0x68/0xb0 [drbd]
[   77.738374]  process_one_work+0x225/0x400
[   77.738376]  worker_thread+0x50/0x3e0
[   77.738378]  ? rescuer_thread+0x3c0/0x3c0
[   77.738380]  kthread+0xe9/0x110
[   77.738382]  ? kthread_complete_and_exit+0x20/0x20
[   77.738385]  ret_from_fork+0x22/0x30
[   77.738388]  </TASK>
[   90.232095] drbd music_res k8s2: Wrong magic value 0x352a77d9 in protocol version 121
[   90.232122] drbd music_res k8s2: conn( Connected -> ProtocolError ) peer( Primary -> Unknown )
[   90.232125] drbd music_res/0 drbd1002 k8s2: pdsk( UpToDate -> DUnknown ) repl( SyncTarget -> Off )
[   90.232370] drbd music_res k8s2: Terminating sender thread
[   90.232373] drbd music_res k8s2: Starting sender thread (from drbd_r_music_re [1826])

This is running on fresh Ubuntu 22.10:

uname -a
Linux k8s1 5.19.0-1009-lowlatency #10-Ubuntu SMP PREEMPT_DYNAMIC Fri Oct 14 20:14:19 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

And here is the drbd module info:

modinfo drbd
filename:       /lib/modules/5.19.0-1009-lowlatency/updates/dkms/drbd.ko
alias:          block-major-147-*
license:        GPL
version:        9.2.1
description:    drbd - Distributed Replicated Block Device v9.2.1
author:         Philipp Reisner <[email protected]>, Lars Ellenberg <[email protected]>
srcversion:     C238589FEA231CBC3956E71
depends:        libcrc32c
retpoline:      Y
name:           drbd
vermagic:       5.19.0-1009-lowlatency SMP preempt mod_unload modversions 
sig_id:         PKCS#7
signer:         k8s1 Secure Boot Module Signature key
sig_key:        5D:30:20:73:6B:08:5B:45:41:2E:7A:66:D9:38:43:FE:DB:66:AE:2F
sig_hashalgo:   sha512
signature:      33:80:A8:09:B7:AB:91:99:D5:14:8A:D0:18:CB:BD:B9:AC:D9:D3:4B:
                75:BC:89:1B:3D:36:5A:9F:6F:C8:00:0C:68:EF:BA:D3:AC:03:6B:61:
                1B:1A:5E:C2:43:38:F6:D8:D4:FA:8A:A2:3B:8C:5F:A6:A5:71:34:9A:
                3E:9A:54:E3:D2:F6:9C:CB:96:06:50:3F:1A:C6:E5:44:8F:D7:E2:5C:
                09:E3:94:D9:2F:C1:AD:B1:41:13:50:F0:07:D8:AE:C7:07:95:9F:4C:
                7B:95:12:B0:FB:51:8E:79:6E:63:BC:71:B6:AF:27:AA:33:8F:A6:E2:
                19:88:1F:68:D8:5F:96:72:3D:51:49:A7:47:A5:AA:FB:34:44:8C:05:
                9C:E1:D9:E2:F0:C4:4B:FF:A5:AF:D8:32:3D:6A:45:B0:C5:2A:A9:D3:
                A9:5D:B3:39:DA:35:F2:BA:42:25:EE:C3:71:EA:29:EB:B8:DA:CA:6D:
                41:97:8F:65:8E:7E:20:6D:96:83:E9:AC:40:BB:AB:2D:C8:94:68:FA:
                74:CC:D6:6C:EE:61:DD:7E:52:86:36:B0:10:98:A5:64:83:52:7D:83:
                63:9C:9A:27:C3:6B:24:5D:6F:7D:8B:32:67:D5:D1:AF:54:85:68:26:
                32:73:44:47:82:87:60:4C:EE:E5:35:E8:E2:7A:DF:2D
parm:           enable_faults:int
parm:           fault_rate:int
parm:           fault_count:int
parm:           fault_devs:int
parm:           disable_sendpage:bool
parm:           allow_oos:DONT USE! (bool)
parm:           minor_count:Approximate number of drbd devices (1U-255U) (uint)
parm:           usermode_helper:string
parm:           protocol_version_min:drbd_protocol_version
parm:           strict_names:restrict resource and connection names to ascii alnum and a subset of punct (drbd_strict_names)

I've also tested the memory with memtest86 and tried it on a different node as well with the same results.
Let me know if I can provide additional information.
Thanks!
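
As a sanity check on the size figures in the log above, the numbers printed at attach time can be reproduced with a little arithmetic. The sketch below assumes 512-byte sectors, one bitmap bit per 4 KiB of storage, 64-bit bitmap words, 4096-byte pages, and one bitmap per peer slot (the log reports "Maximum number of peer devices = 7"). These parameters are inferred from the log lines, not taken from the DRBD source, and the function name `bitmap_geometry` is hypothetical.

```python
def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)


def bitmap_geometry(capacity_sectors, max_peers,
                    sector_bytes=512, bytes_per_bit=4096,
                    word_bits=64, page_bytes=4096):
    """Recompute size, bit, word and page counts from a sector capacity."""
    total_bytes = capacity_sectors * sector_bytes
    size_kb = total_bytes // 1024
    bits = ceil_div(total_bytes, bytes_per_bit)          # bits per peer bitmap
    words = ceil_div(bits, word_bits) * max_peers        # words across all peers
    pages = ceil_div(words * (word_bits // 8), page_bytes)
    return {"size_kb": size_kb, "bits": bits, "words": words, "pages": pages}


if __name__ == "__main__":
    # Values from "drbd_bm_resize called with capacity == 2147491664"
    # and "Maximum number of peer devices = 7".
    print(bitmap_geometry(2147491664, 7))
```

Under these assumptions the result matches the log line `resync bitmap: bits=268436458 words=29360240 pages=57345` and the reported size of 1073745832 KB, which suggests the attach step itself computed a consistent geometry before the bad page state was hit.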

DRBD9 on Debian 11

Hi,
I can't find prebuilt packages for Debian 11 anywhere, and the docs don't mention which repo is the correct one. Can you please point me to a source of information on where / how I can deploy DRBD 9.x on the Debian platform? (I don't want to build from source.)

BR!

Does DRBD support two different network segment IP addresses (each machine)?

When DRBD uses an IP address on only a single network segment per machine, a faulty network interface can lead to a DRBD split brain.
Does DRBD currently support two addresses on different network segments per machine? If not, please tell me which part of the code would need to be modified, and I could try to modify it myself.

Thanks!

li kunyu

9.1.8 can't see metadata from earlier versions

Hi,

I have a couple of CentOS 7 servers which were using older drbd versions from elrepo.org (9.0.30) and after an update to 9.1.8 and a reboot, the resources stayed in Diskless status.

I see this in dmesg:

[Wed Aug 3 14:22:56 2022] drbd disk1/0 drbd1: drbd_md_sync_page_io(,1875385000s,READ) failed with error -5
[Wed Aug 3 14:22:56 2022] drbd disk1/0 drbd1: Error while reading metadata.

After downgrading to drbd 9.1.7, everything went back to normal.

A few days later, the same happened with a pair of AlmaLinux 8 servers: 9.1.8 stays in Diskless mode, 9.1.7 and older work fine.

Those systems are otherwise fully up to date.

Lost file when the master host is down.

Hi, we use DRBD 9.14 to synchronize data (one partition) across our HA cluster (host1, host2, host3). When host1 goes down, the application starts running on host2. In our tests, a few files in the synchronized partition are sometimes reset to empty files (size 0).

Is this a DRBD issue, or have we missed some configuration?

Our configuration files are below:

# DRBD is the result of over a decade of development by LINBIT.
# In case you need professional services for DRBD or have
# feature requests visit http://www.linbit.com

global {
	usage-count yes;

	# Decide what kind of udev symlinks you want for "implicit" volumes
	# (those without explicit volume <vnr> {} block, implied vnr=0):
	# /dev/drbd/by-resource/<resource>/<vnr>   (explicit volumes)
	# /dev/drbd/by-resource/<resource>         (default for implicit)
	udev-always-use-vnr; # treat implicit the same as explicit volumes

	# minor-count dialog-refresh disable-ip-verification
	# cmd-timeout-short 5; cmd-timeout-medium 121; cmd-timeout-long 600;
}

common {
	handlers {
		# These are EXAMPLE handlers only.
		# They may have severe implications,
		# like hard resetting the node under certain circumstances.
		# Be careful when choosing your poison.

		# pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
		# pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
		# local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
		# fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
		# split-brain "/usr/lib/drbd/notify-split-brain.sh root";
		# out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
		# before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k";
		# after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;
		# quorum-lost "/usr/lib/drbd/notify-quorum-lost.sh root";
		# quorum-lost "echo b > /proc/sysrq-trigger ; reboot -f";
	}

	startup {
		# wfc-timeout degr-wfc-timeout outdated-wfc-timeout wait-after-sb
	}

	options {
		# cpu-mask on-no-data-accessible

		# RECOMMENDED for three or more storage nodes with DRBD 9:
		# quorum majority;
		# on-no-quorum suspend-io | io-error;
		quorum majority;
		on-no-quorum io-error;
	}

	disk {
		# size on-io-error fencing disk-barrier disk-flushes
		# disk-drain md-flushes resync-rate resync-after al-extents
                # c-plan-ahead c-delay-target c-fill-target c-max-rate
                # c-min-rate disk-timeout
	}

	net {
		# protocol timeout max-epoch-size max-buffers
		# connect-int ping-int sndbuf-size rcvbuf-size ko-count
		# allow-two-primaries cram-hmac-alg shared-secret after-sb-0pri
		# after-sb-1pri after-sb-2pri always-asbp rr-conflict
		# ping-timeout data-integrity-alg tcp-cork on-congestion
		# congestion-fill congestion-extents csums-alg verify-alg
		# use-rle
		protocol C;
	}
}

resource pbxdata {

meta-disk internal;
device /dev/drbd1;
disk /dev/pbxvg/pbxlv;

syncer {
  verify-alg sha1;
}

net {
  after-sb-0pri discard-least-changes;
  after-sb-1pri discard-secondary;
  after-sb-2pri disconnect;
}

on pbx01 {
  address 192.168.1.91:7789;
  node-id 0;
}

on pbx02 {
  address 192.168.1.92:7789;
  node-id 1;
}

on pbx03 {
  address 192.168.1.93:7789;
  node-id 2;
}

connection-mesh {
  #node 1,2,3 name
  hosts pbx01 pbx02 pbx03;
  net {
      use-rle no;
  }
}

}

Thanks

UpToDate and out-of-sync inconsistency

Hi all,

On drbd-9.0.23 (CentOS 8.2), we are seeing an inconsistency in node status: although the disk state of both DRBD devices is UpToDate, the out-of-sync counter is non-zero.

#drbdsetup status --verbose --statistics
 r008 node-id:0 role:Primary suspended:no
 write-ordering:flush
 volume:0 minor:8 disk:UpToDate quorum:yes
 size:109757124 read:3065 written:105640 al-writes:27 bm-writes:0
 upper-pending:0 lower-pending:0 al-suspended:no blocked:no
 fs100 node-id:1 connection:Connected role:Secondary congested:no
 ap-in-flight:0 rs-in-flight:0
 volume:0 replication:Established peer-disk:UpToDate
 resync-suspended:no
 received:0 sent:105028 out-of-sync:12 pending:0 unacked:0
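
For anyone trying to confirm the same condition on their own nodes, the check can be scripted. The following is only a rough Python sketch: it assumes the key:value layout shown in the status output above, and the function name and embedded sample are illustrative rather than part of any DRBD tooling.

```python
import re

# Sample copied from the status output above.
SAMPLE = """\
r008 node-id:0 role:Primary suspended:no
 write-ordering:flush
 volume:0 minor:8 disk:UpToDate quorum:yes
 size:109757124 read:3065 written:105640 al-writes:27 bm-writes:0
 upper-pending:0 lower-pending:0 al-suspended:no blocked:no
 fs100 node-id:1 connection:Connected role:Secondary congested:no
 ap-in-flight:0 rs-in-flight:0
 volume:0 replication:Established peer-disk:UpToDate
 resync-suspended:no
 received:0 sent:105028 out-of-sync:12 pending:0 unacked:0
"""

def find_stale_uptodate(status_text):
    """Return (peer_disk_state, out_of_sync) pairs where the peer disk
    is reported UpToDate but the out-of-sync counter is non-zero."""
    hits = []
    peer_disk = None
    # Walk all key:value tokens in order and pair each out-of-sync
    # counter with the most recent peer-disk state seen before it.
    for key, value in re.findall(r"([\w-]+):(\S+)", status_text):
        if key == "peer-disk":
            peer_disk = value
        elif key == "out-of-sync":
            if peer_disk == "UpToDate" and int(value) > 0:
                hits.append((peer_disk, int(value)))
            peer_disk = None
    return hits

print(find_stale_uptodate(SAMPLE))
```

On a live system the text would come from the drbdsetup command shown above instead of the embedded sample.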

I see this issue on several systems running drbd-9.0, and I wrote a reliable reproducer (attached for reference). It repeatedly unmounts and mounts the DRBD disk on the primary node while disconnecting and reconnecting on the secondary node at the same time.
drbd-out-of-sync-mountumount.zip

When I run this reproducer on drbd-8.9 for comparison, I never see the issue.
On the other hand, I see it not only on drbd-9.0.23 but also on the latest 9.1.

Am I doing something wrong or do we have a bug in out-of-sync calculation and state transition to UpToDate?
Any feedback would be greatly appreciated.

9.1.6 drbd_req_destroy: Logic BUG

I have two secondary nodes with disks (A, B) and one diskless node (C) acting as primary, and I wanted to know what happens if the connection between C and A/B experiences a problem.

To test that, while sending requests to the device, I drop packets on node C going to A and then B as well, then re-enable the connections, like this:

fio &
iptables -I OUTPUT -d nodeA -j DROP
sleep 20
iptables -I OUTPUT -d nodeB -j DROP
sleep 20
iptables -D OUTPUT -d nodeA -j DROP
sleep 20
iptables -D OUTPUT -d nodeB -j DROP

Configuration:

common {
    options {
        on-no-data-accessible suspend-io;
        on-no-quorum suspend-io;
    }
    disk {
        c-plan-ahead 7;
        c-fill-target 10M;
        c-min-rate 4M;
        c-max-rate 3000M;
    }
    net {
        max-buffers             36k;
        sndbuf-size            1024k;
        rcvbuf-size            2048k;
        verify-alg crc32;
    }
}

resource u12215 {
    protocol C;
    device minor 854;
    disk /dev/nodeC/u12215;
    meta-disk /dev/nodeC/md-u12215;
    on nodeA { node-id 10; address ipv4 nodeA:30854; }
    on nodeB { node-id 11; address ipv4 nodeB:30854; }
    on nodeC { node-id 7; address ipv4 nodeC:30854; disk none; }
    connection-mesh { hosts nodeA nodeB nodeC; }
}


When sending read requests, the result is drbd854: IO ERROR: neither local nor remote data, sector xxx, and it returns the error to fio (even though the configuration forbids that).

Feb 18 00:20:16 nodeC kernel: drbd: initialized. Version: 9.1.6 (api:2/proto:110-121)
Feb 18 00:20:16 nodeC kernel: drbd: GIT-hash: f85adeb71f16a0aead1e875665e2fb68852d94eb build by root@nodeC, 2022-02-17 21:05:39
Feb 18 00:20:16 nodeC kernel: drbd: registered as block device major 147
Feb 18 00:20:23 nodeC kernel: drbd u12215: Starting worker thread (from drbdsetup [5840])
Feb 18 00:20:23 nodeC kernel: drbd u12215 nodeA: Starting sender thread (from drbdsetup [5847])
Feb 18 00:20:23 nodeC kernel: drbd u12215 nodeB: Starting sender thread (from drbdsetup [5849])
Feb 18 00:20:23 nodeC kernel: drbd u12215 nodeA: conn( StandAlone -> Unconnected )
Feb 18 00:20:23 nodeC kernel: drbd u12215 nodeA: Starting receiver thread (from drbd_w_u12215 [5841])
Feb 18 00:20:23 nodeC kernel: drbd u12215 nodeA: conn( Unconnected -> Connecting )
Feb 18 00:20:23 nodeC kernel: drbd u12215 nodeB: conn( StandAlone -> Unconnected )
Feb 18 00:20:23 nodeC kernel: drbd u12215 nodeB: Starting receiver thread (from drbd_w_u12215 [5841])
Feb 18 00:20:23 nodeC kernel: drbd u12215 nodeB: conn( Unconnected -> Connecting )
Feb 18 00:20:23 nodeC kernel: drbd u12215 nodeA: Handshake to peer 10 successful: Agreed network protocol version 121
Feb 18 00:20:23 nodeC kernel: drbd u12215 nodeA: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
Feb 18 00:20:23 nodeC kernel: drbd u12215 nodeA: Starting ack_recv thread (from drbd_r_u12215 [5857])
Feb 18 00:20:23 nodeC kernel: drbd u12215 nodeB: Handshake to peer 11 successful: Agreed network protocol version 121
Feb 18 00:20:23 nodeC kernel: drbd u12215 nodeB: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
Feb 18 00:20:23 nodeC kernel: drbd u12215 nodeB: Starting ack_recv thread (from drbd_r_u12215 [5858])
Feb 18 00:20:23 nodeC kernel: drbd u12215: Preparing cluster-wide state change 447733307 (7->10 499/146)
Feb 18 00:20:23 nodeC kernel: drbd u12215/0 drbd854: size = 68 GB (71688192 KB)
Feb 18 00:20:23 nodeC kernel: drbd u12215: State change 447733307: primary_nodes=0, weak_nodes=0
Feb 18 00:20:23 nodeC kernel: drbd u12215: Committing cluster-wide state change 447733307 (25ms)
Feb 18 00:20:23 nodeC kernel: drbd u12215 nodeA: conn( Connecting -> Connected ) peer( Unknown -> Secondary )
Feb 18 00:20:23 nodeC kernel: drbd u12215/0 drbd854 nodeA: pdsk( DUnknown -> UpToDate ) repl( Off -> Established )
Feb 18 00:20:23 nodeC kernel: drbd u12215: Preparing cluster-wide state change 595176349 (7->11 499/146)
Feb 18 00:20:23 nodeC kernel: drbd u12215: State change 595176349: primary_nodes=0, weak_nodes=0
Feb 18 00:20:23 nodeC kernel: drbd u12215: Committing cluster-wide state change 595176349 (28ms)
Feb 18 00:20:23 nodeC kernel: drbd u12215 nodeB: conn( Connecting -> Connected ) peer( Unknown -> Secondary )
Feb 18 00:20:23 nodeC kernel: drbd u12215/0 drbd854 nodeB: pdsk( DUnknown -> UpToDate ) repl( Off -> Established )

Feb 18 00:20:44 nodeC kernel: drbd u12215 nodeA: sock was shut down by peer
Feb 18 00:20:44 nodeC kernel: drbd u12215 nodeA: conn( Connected -> BrokenPipe ) peer( Secondary -> Unknown )
Feb 18 00:20:44 nodeC kernel: drbd u12215/0 drbd854 nodeA: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
Feb 18 00:20:44 nodeC kernel: drbd u12215 nodeA: ack_receiver terminated
Feb 18 00:20:44 nodeC kernel: drbd u12215 nodeA: Terminating ack_recv thread
Feb 18 00:20:44 nodeC kernel: drbd u12215 nodeA: Terminating sender thread
Feb 18 00:20:44 nodeC kernel: drbd u12215 nodeA: Starting sender thread (from drbd_r_u12215 [5857])
Feb 18 00:20:44 nodeC kernel: drbd u12215 nodeA: Connection closed
Feb 18 00:20:44 nodeC kernel: drbd u12215 nodeA: helper command: /sbin/drbdadm disconnected
Feb 18 00:20:44 nodeC kernel: drbd u12215 nodeA: helper command: /sbin/drbdadm disconnected exit code 0
Feb 18 00:20:44 nodeC kernel: drbd u12215 nodeA: conn( BrokenPipe -> Unconnected )
Feb 18 00:20:44 nodeC kernel: drbd u12215 nodeA: Restarting receiver thread
Feb 18 00:20:44 nodeC kernel: drbd u12215 nodeA: conn( Unconnected -> Connecting )

Feb 18 00:21:05 nodeC kernel: drbd u12215 nodeB: PingAck did not arrive in time.
Feb 18 00:21:05 nodeC kernel: drbd u12215 nodeB: conn( Connected -> NetworkFailure ) peer( Secondary -> Unknown )
Feb 18 00:21:05 nodeC kernel: drbd u12215/0 drbd854 nodeB: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
Feb 18 00:21:05 nodeC kernel: drbd u12215 nodeB: ack_receiver terminated
Feb 18 00:21:05 nodeC kernel: drbd u12215 nodeB: Terminating ack_recv thread
Feb 18 00:21:05 nodeC kernel: drbd u12215 nodeB: Terminating sender thread
Feb 18 00:21:05 nodeC kernel: drbd u12215 nodeB: Starting sender thread (from drbd_r_u12215 [5858])
Feb 18 00:21:05 nodeC kernel: drbd u12215 nodeB: Connection closed
Feb 18 00:21:05 nodeC kernel: drbd u12215/0 drbd854: IO ERROR: neither local nor remote data, sector 15032400+8
Feb 18 00:21:05 nodeC kernel: drbd u12215 nodeB: helper command: /sbin/drbdadm disconnected
Feb 18 00:21:05 nodeC kernel: drbd u12215 nodeB: helper command: /sbin/drbdadm disconnected exit code 0
Feb 18 00:21:05 nodeC kernel: drbd u12215 nodeB: conn( NetworkFailure -> Unconnected )
Feb 18 00:21:05 nodeC kernel: drbd u12215 nodeB: Restarting receiver thread
Feb 18 00:21:05 nodeC kernel: drbd u12215 nodeB: conn( Unconnected -> Connecting )

Feb 18 00:21:14 nodeC kernel: drbd u12215 nodeA: Handshake to peer 10 successful: Agreed network protocol version 121
Feb 18 00:21:14 nodeC kernel: drbd u12215 nodeA: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
Feb 18 00:21:14 nodeC kernel: drbd u12215 nodeA: Starting ack_recv thread (from drbd_r_u12215 [5857])
Feb 18 00:21:14 nodeC kernel: drbd u12215: Preparing cluster-wide state change 31959748 (7->10 499/146)
Feb 18 00:21:14 nodeC kernel: drbd u12215: State change 31959748: primary_nodes=0, weak_nodes=0
Feb 18 00:21:14 nodeC kernel: drbd u12215: Committing cluster-wide state change 31959748 (24ms)
Feb 18 00:21:14 nodeC kernel: drbd u12215 nodeA: conn( Connecting -> Connected ) peer( Unknown -> Secondary )
Feb 18 00:21:14 nodeC kernel: drbd u12215/0 drbd854 nodeA: pdsk( DUnknown -> UpToDate ) repl( Off -> Established )

Feb 18 00:21:43 nodeC kernel: drbd u12215 nodeB: Handshake to peer 11 successful: Agreed network protocol version 121
Feb 18 00:21:43 nodeC kernel: drbd u12215 nodeB: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
Feb 18 00:21:43 nodeC kernel: drbd u12215 nodeB: Starting ack_recv thread (from drbd_r_u12215 [5858])
Feb 18 00:21:43 nodeC kernel: drbd u12215: Preparing cluster-wide state change 1385702553 (7->11 499/146)
Feb 18 00:21:43 nodeC kernel: drbd u12215: State change 1385702553: primary_nodes=0, weak_nodes=0
Feb 18 00:21:43 nodeC kernel: drbd u12215: Committing cluster-wide state change 1385702553 (28ms)
Feb 18 00:21:43 nodeC kernel: drbd u12215 nodeB: conn( Connecting -> Connected ) peer( Unknown -> Secondary )
Feb 18 00:21:43 nodeC kernel: drbd u12215/0 drbd854 nodeB: pdsk( DUnknown -> UpToDate ) repl( Off -> Established )

Using write requests, the logic BUG appears:

Feb 18 00:26:21 nodeC kernel: drbd u12215 nodeA: PingAck did not arrive in time.
Feb 18 00:26:21 nodeC kernel: drbd u12215 nodeA: conn( Connected -> NetworkFailure ) peer( Secondary -> Unknown )
Feb 18 00:26:21 nodeC kernel: drbd u12215/0 drbd854 nodeA: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
Feb 18 00:26:21 nodeC kernel: drbd u12215 nodeA: ack_receiver terminated
Feb 18 00:26:21 nodeC kernel: drbd u12215 nodeA: Terminating ack_recv thread
Feb 18 00:26:21 nodeC kernel: drbd u12215 nodeA: Terminating sender thread
Feb 18 00:26:21 nodeC kernel: drbd u12215 nodeA: Starting sender thread (from drbd_r_u12215 [5857])
Feb 18 00:26:21 nodeC kernel: drbd u12215/0 drbd854: sending new current UUID: DF9A66D16D3C4651
Feb 18 00:26:21 nodeC kernel: drbd u12215 nodeA: Connection closed
Feb 18 00:26:21 nodeC kernel: drbd u12215 nodeA: helper command: /sbin/drbdadm disconnected
Feb 18 00:26:21 nodeC kernel: drbd u12215 nodeA: helper command: /sbin/drbdadm disconnected exit code 0
Feb 18 00:26:21 nodeC kernel: drbd u12215 nodeA: conn( NetworkFailure -> Unconnected )
Feb 18 00:26:21 nodeC kernel: drbd u12215 nodeA: Restarting receiver thread
Feb 18 00:26:21 nodeC kernel: drbd u12215 nodeA: conn( Unconnected -> Connecting )

Feb 18 00:26:31 nodeC kernel: drbd u12215 nodeB: Preparing remote state change 3466013136

Feb 18 00:26:51 nodeC kernel: drbd u12215 nodeB: sock was shut down by peer
Feb 18 00:26:51 nodeC kernel: drbd u12215: susp-io( no -> no-disk)
Feb 18 00:26:51 nodeC kernel: drbd u12215 nodeB: conn( Connected -> BrokenPipe ) peer( Secondary -> Unknown )
Feb 18 00:26:51 nodeC kernel: drbd u12215/0 drbd854 nodeB: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
Feb 18 00:26:51 nodeC kernel: drbd u12215 nodeB: ack_receiver terminated
Feb 18 00:26:51 nodeC kernel: drbd u12215/0 drbd854 nodeA: helper command: /sbin/drbdadm pri-on-incon-degr
Feb 18 00:26:51 nodeC kernel: drbd u12215 nodeB: Terminating ack_recv thread
Feb 18 00:26:51 nodeC kernel: drbd u12215/0 drbd854 nodeA: helper command: /sbin/drbdadm pri-on-incon-degr exit code 0
Feb 18 00:26:51 nodeC kernel: drbd u12215/0 drbd854 nodeB: helper command: /sbin/drbdadm pri-on-incon-degr
Feb 18 00:26:51 nodeC kernel: drbd u12215/0 drbd854 nodeB: helper command: /sbin/drbdadm pri-on-incon-degr exit code 0
Feb 18 00:26:51 nodeC kernel: drbd u12215 nodeB: Terminating sender thread
Feb 18 00:26:51 nodeC kernel: drbd u12215 nodeB: Starting sender thread (from drbd_r_u12215 [5858])
Feb 18 00:26:51 nodeC kernel: drbd u12215 nodeB: Aborting remote state change 3466013136 commit not possible
Feb 18 00:26:51 nodeC kernel: drbd u12215 nodeB: Connection closed
Feb 18 00:26:51 nodeC kernel: drbd u12215 nodeB: helper command: /sbin/drbdadm disconnected
Feb 18 00:26:51 nodeC kernel: drbd u12215 nodeB: helper command: /sbin/drbdadm disconnected exit code 0
Feb 18 00:26:51 nodeC kernel: drbd u12215 nodeB: conn( BrokenPipe -> Unconnected )
Feb 18 00:26:51 nodeC kernel: drbd u12215 nodeB: Restarting receiver thread
Feb 18 00:26:51 nodeC kernel: drbd u12215 nodeB: conn( Unconnected -> Connecting )
Feb 18 00:26:51 nodeC kernel: drbd u12215 nodeA: Handshake to peer 10 successful: Agreed network protocol version 121
Feb 18 00:26:51 nodeC kernel: drbd u12215 nodeA: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
Feb 18 00:26:51 nodeC kernel: drbd u12215 nodeA: Starting ack_recv thread (from drbd_r_u12215 [5857])
Feb 18 00:26:51 nodeC kernel: drbd u12215: Preparing cluster-wide state change 2236164137 (7->10 499/145)
Feb 18 00:26:51 nodeC kernel: drbd u12215: State change 2236164137: primary_nodes=80, weak_nodes=FFFFFFFFFFFFFB7F
Feb 18 00:26:51 nodeC kernel: drbd u12215: Committing cluster-wide state change 2236164137 (48ms)
Feb 18 00:26:51 nodeC kernel: drbd u12215 nodeA: conn( Connecting -> Connected ) peer( Unknown -> Secondary )
Feb 18 00:26:51 nodeC kernel: drbd u12215/0 drbd854 nodeA: pdsk( DUnknown -> Inconsistent ) repl( Off -> Established ) resync-susp( no -> peer )
Feb 18 00:26:51 nodeC kernel: drbd u12215: susp-io( no-disk -> no)

Feb 18 00:26:53 nodeC kernel: drbd u12215/0 drbd854 nodeA: pdsk( Inconsistent -> Outdated ) resync-susp( peer -> no )

Feb 18 00:27:01 nodeC kernel: drbd u12215: Two-phase commit 0 timeout

Feb 18 00:27:11 nodeC kernel: drbd u12215 nodeB: Handshake to peer 11 successful: Agreed network protocol version 121
Feb 18 00:27:11 nodeC kernel: drbd u12215 nodeB: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
Feb 18 00:27:11 nodeC kernel: drbd u12215 nodeB: Starting ack_recv thread (from drbd_r_u12215 [5858])
Feb 18 00:27:11 nodeC kernel: drbd u12215: Preparing cluster-wide state change 2438872345 (7->11 499/145)
Feb 18 00:27:11 nodeC kernel: drbd u12215: State change 2438872345: primary_nodes=80, weak_nodes=FFFFFFFFFFFFF37F
Feb 18 00:27:11 nodeC kernel: drbd u12215: Committing cluster-wide state change 2438872345 (23ms)
Feb 18 00:27:11 nodeC kernel: drbd u12215 nodeB: conn( Connecting -> Connected ) peer( Unknown -> Secondary )
Feb 18 00:27:11 nodeC kernel: drbd u12215/0 drbd854 nodeB: pdsk( DUnknown -> Outdated ) repl( Off -> Established )
Feb 18 00:27:11 nodeC kernel: drbd u12215/0 drbd854 nodeB: pdsk( Outdated -> UpToDate )

Feb 18 00:27:58 nodeC kernel: drbd u12215/0 drbd854 nodeB: Remote failed to finish a request within 87644ms > ko-count (7) * timeout (60 * 0.1s)
Feb 18 00:27:58 nodeC kernel: drbd u12215: susp-io( no -> no-disk)
Feb 18 00:27:58 nodeC kernel: drbd u12215 nodeB: conn( Connected -> Timeout ) peer( Secondary -> Unknown )
Feb 18 00:27:58 nodeC kernel: drbd u12215/0 drbd854 nodeB: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
Feb 18 00:27:58 nodeC kernel: drbd u12215 nodeB: ack_receiver terminated
Feb 18 00:27:58 nodeC kernel: drbd u12215/0 drbd854 nodeA: helper command: /sbin/drbdadm pri-on-incon-degr
Feb 18 00:27:58 nodeC kernel: drbd u12215 nodeB: Terminating ack_recv thread
Feb 18 00:27:58 nodeC kernel: drbd u12215/0 drbd854 nodeA: helper command: /sbin/drbdadm pri-on-incon-degr exit code 0
Feb 18 00:27:58 nodeC kernel: drbd u12215/0 drbd854 nodeB: helper command: /sbin/drbdadm pri-on-incon-degr
Feb 18 00:27:58 nodeC kernel: drbd u12215/0 drbd854 nodeB: helper command: /sbin/drbdadm pri-on-incon-degr exit code 0
Feb 18 00:27:58 nodeC kernel: drbd u12215 nodeB: Terminating sender thread
Feb 18 00:27:58 nodeC kernel: drbd u12215 nodeB: Starting sender thread (from drbd_r_u12215 [5858])
Feb 18 00:27:58 nodeC kernel: drbd u12215 nodeB: Connection closed
Feb 18 00:27:58 nodeC kernel: drbd u12215 nodeA: Preparing remote state change 546097680
Feb 18 00:27:58 nodeC kernel: drbd u12215 nodeB: helper command: /sbin/drbdadm disconnected
Feb 18 00:27:58 nodeC kernel: drbd u12215 nodeB: helper command: /sbin/drbdadm disconnected exit code 0
Feb 18 00:27:58 nodeC kernel: drbd u12215 nodeB: conn( Timeout -> Unconnected )
Feb 18 00:27:58 nodeC kernel: drbd u12215 nodeB: Restarting receiver thread
Feb 18 00:27:58 nodeC kernel: drbd u12215 nodeB: conn( Unconnected -> Connecting )
Feb 18 00:27:58 nodeC kernel: drbd u12215 nodeA: Committing remote state change 546097680 (primary_nodes=80)
Feb 18 00:27:58 nodeC kernel: drbd u12215/0 drbd854 nodeB: pdsk( DUnknown -> Outdated )
Feb 18 00:27:58 nodeC kernel: drbd u12215 nodeB: Handshake to peer 11 successful: Agreed network protocol version 121
Feb 18 00:27:58 nodeC kernel: drbd u12215 nodeB: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
Feb 18 00:27:58 nodeC kernel: drbd u12215 nodeB: Starting ack_recv thread (from drbd_r_u12215 [5858])
Feb 18 00:27:58 nodeC kernel: drbd u12215: Preparing cluster-wide state change 3267636032 (7->11 499/145)
Feb 18 00:27:58 nodeC kernel: drbd u12215: State change 3267636032: primary_nodes=80, weak_nodes=FFFFFFFFFFFFF37F
Feb 18 00:27:58 nodeC kernel: drbd u12215: Committing cluster-wide state change 3267636032 (28ms)
Feb 18 00:27:58 nodeC kernel: drbd u12215 nodeB: conn( Connecting -> Connected ) peer( Unknown -> Secondary )
Feb 18 00:27:58 nodeC kernel: drbd u12215/0 drbd854 nodeB: repl( Off -> Established )
Feb 18 00:27:58 nodeC kernel: drbd u12215: susp-io( no-disk -> no)
Feb 18 00:27:58 nodeC kernel: drbd u12215/0 drbd854 nodeB: pdsk( Outdated -> UpToDate )

Feb 18 00:28:43 nodeC kernel: drbd u12215/0 drbd854 nodeB: Remote failed to finish a request within 132700ms > ko-count (7) * timeout (60 * 0.1s)
Feb 18 00:28:43 nodeC kernel: drbd u12215: susp-io( no -> no-disk)
Feb 18 00:28:43 nodeC kernel: drbd u12215 nodeB: conn( Connected -> Timeout ) peer( Secondary -> Unknown )
Feb 18 00:28:43 nodeC kernel: drbd u12215/0 drbd854 nodeB: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
Feb 18 00:28:43 nodeC kernel: drbd u12215 nodeB: ack_receiver terminated
Feb 18 00:28:43 nodeC kernel: drbd u12215/0 drbd854 nodeA: helper command: /sbin/drbdadm pri-on-incon-degr
Feb 18 00:28:43 nodeC kernel: drbd u12215 nodeB: Terminating ack_recv thread
Feb 18 00:28:43 nodeC kernel: drbd u12215/0 drbd854 nodeA: helper command: /sbin/drbdadm pri-on-incon-degr exit code 0
Feb 18 00:28:43 nodeC kernel: drbd u12215/0 drbd854 nodeB: helper command: /sbin/drbdadm pri-on-incon-degr
Feb 18 00:28:43 nodeC kernel: drbd u12215/0 drbd854 nodeB: helper command: /sbin/drbdadm pri-on-incon-degr exit code 0
Feb 18 00:28:43 nodeC kernel: drbd u12215/0 drbd854: drbd_req_destroy: Logic BUG rq_state: 284000, completion_ref = 1

Feb 18 00:29:09 nodeC kernel: watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [drbd_s_u12215:7045]
Feb 18 00:29:09 nodeC kernel: Modules linked in: dm_mod vhost_net vhost vhost_iotlb tap tun drbd_transport_tcp(OE) drbd(OE) nfsv3 nfs_acl nfs lockd grace fscache netconsole ixgbe mdio dca mlx4_ib mlx4_en nf_defrag_ipv6 nf_defrag_ipv4 bridge stp llc mlx4_core nft_compat nft_counter nf_tables nfnetlink rpcrdma rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi rdma_cm ib_umad iw_cm ib_ipoib ib_cm ib_uverbs ib_core sunrpc vfat fat intel_rapl_msr iTCO_wdt iTCO_vendor_support mxm_wmi dcdbas intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul mgag200 ghash_clmulni_intel i2c_algo_bit rapl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops intel_cstate cdc_ether intel_uncore drm pcspkr usbnet joydev mii ipmi_ssif lpc_ich tg3 ipmi_si ipmi_devintf ipmi_msghandler mei_me mei wmi acpi_power_meter xfs libcrc32c raid1 sr_mod cdrom sd_mod t10_pi sg ahci libahci crc32c_intel libata megaraid_sas
Feb 18 00:29:09 nodeC kernel: [last unloaded: drbd]
Feb 18 00:29:09 nodeC kernel: CPU: 0 PID: 7045 Comm: drbd_s_u12215 Tainted: G           OE    --------- -  - 4.18.0-348.12.2.el8_5.x86_64 #1
Feb 18 00:29:09 nodeC kernel: Hardware name: Dell Inc. PowerEdge R730/0WCJNT, BIOS 2.13.0 05/14/2021
Feb 18 00:29:09 nodeC kernel: RIP: 0010:drbd_sender+0x208/0x3c0 [drbd]
Feb 18 00:29:09 nodeC kernel: Code: be 05 00 00 00 48 89 ef 48 8d 4c 24 10 e8 10 57 01 00 48 8b 53 10 b8 00 fe ff ff f0 0f c1 82 30 01 00 00 fb 66 0f 1f 44 00 00 <48> 83 7c 24 10 00 74 0d 48 8d 74 24 10 4c 89 ff e8 f3 55 01 00 48
Feb 18 00:29:09 nodeC kernel: RSP: 0018:ffffb97b59197e80 EFLAGS: 00000247 ORIG_RAX: ffffffffffffff13
Feb 18 00:29:09 nodeC kernel: RAX: 0000000000000200 RBX: ffffa12913dba000 RCX: 000000000000031c
Feb 18 00:29:09 nodeC kernel: RDX: ffffa12767670000 RSI: 0000000000284000 RDI: ffffa1268f5d7e78
Feb 18 00:29:09 nodeC kernel: RBP: ffffa1268f5d7cb8 R08: ffffa1268f5d7cce R09: 0000000000000000
Feb 18 00:29:09 nodeC kernel: R10: ffffa12913dba000 R11: ffffa12913dba050 R12: ffffa12913dba4f0
Feb 18 00:29:09 nodeC kernel: R13: ffffa1261f8aa000 R14: ffffa12913dba048 R15: ffffa127629da000
Feb 18 00:29:09 nodeC kernel: FS:  0000000000000000(0000) GS:ffffa1247f600000(0000) knlGS:0000000000000000
Feb 18 00:29:09 nodeC kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 18 00:29:09 nodeC kernel: CR2: 000000c001da3010 CR3: 000000223b210003 CR4: 00000000003726f0
Feb 18 00:29:09 nodeC kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Feb 18 00:29:09 nodeC kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Feb 18 00:29:09 nodeC kernel: Call Trace:
Feb 18 00:29:09 nodeC kernel: ? __drbd_next_peer_device_ref+0x140/0x140 [drbd]
Feb 18 00:29:09 nodeC kernel: drbd_thread_setup+0x5e/0x160 [drbd]
Feb 18 00:29:09 nodeC kernel: ? __drbd_next_peer_device_ref+0x140/0x140 [drbd]
Feb 18 00:29:09 nodeC kernel: kthread+0x116/0x130
Feb 18 00:29:09 nodeC kernel: ? kthread_flush_work_fn+0x10/0x10
Feb 18 00:29:09 nodeC kernel: ret_from_fork+0x35/0x40


[root@nodeC log]# cat /sys/block/drbd854/inflight
       0        5
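
The two columns in a block device's sysfs inflight file are the number of read and write requests currently in flight, in that order, so the output above shows five writes still outstanding. A minimal Python sketch for reading it (the function name and sample string are illustrative, not part of DRBD):

```python
def parse_inflight(text):
    """Split the contents of a block device's sysfs 'inflight' file
    into (reads_in_flight, writes_in_flight)."""
    reads, writes = (int(field) for field in text.split())
    return reads, writes

# Sample copied from the output above; on a live system, read
# /sys/block/drbd854/inflight instead.
sample = "       0        5\n"
reads, writes = parse_inflight(sample)
print(f"reads in flight: {reads}, writes in flight: {writes}")
```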


I can consistently reproduce these bugs.
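
For reference, the limit quoted in the "Remote failed to finish a request" messages above follows from the values printed in the log itself: ko-count (7) times timeout (60, in units of 0.1 s). A small Python sketch of that arithmetic (the helper name is made up for illustration):

```python
def ko_limit_ms(ko_count, timeout_tenths):
    """Limit from the log message: ko-count * timeout, where timeout
    is given in tenths of a second. Result is in milliseconds."""
    return ko_count * timeout_tenths * 100

limit = ko_limit_ms(ko_count=7, timeout_tenths=60)
for elapsed_ms in (87644, 132700):  # durations reported in the log above
    print(f"{elapsed_ms} ms > {limit} ms: {elapsed_ms > limit}")
```

Both reported durations are well past the resulting 42-second limit.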

drbd_transport_rdma incompatible with OFED drivers

Although https://kb.linbit.com/enabling-rdma-support-in-linux states that it should work with OFED, the drbd dkms module is not actually compiled against the correct headers.

I used the packages from ppa.launchpad.net/linbit/linbit-drbd9-stack/:

drbd_transport_rdma: disagrees about version of symbol ib_dealloc_pd_user
drbd_transport_rdma: Unknown symbol ib_dealloc_pd_user (err -22)

Even after building from source to correct this, I get the following:

os-hv-am-3: conn( Unconnected -> Connecting )
Jun 12 20:28:11 os-hv-am-4 kernel: [  245.703350] BUG: kernel NULL pointer dereference, address: 0000000000000000
Jun 12 20:28:11 os-hv-am-4 kernel: [  245.704772] #PF: supervisor instruction fetch in kernel mode
Jun 12 20:28:11 os-hv-am-4 kernel: [  245.706154] #PF: error_code(0x0010) - not-present page
Jun 12 20:28:11 os-hv-am-4 kernel: [  245.707532] PGD 0 P4D 0 
Jun 12 20:28:11 os-hv-am-4 kernel: [  245.708902] Oops: 0010 [#2] SMP NOPTI
Jun 12 20:28:11 os-hv-am-4 kernel: [  245.710264] CPU: 30 PID: 887 Comm: kworker/30:1 Tainted: G      D    OE     5.4.0-150-generic #167-Ubuntu
Jun 12 20:28:11 os-hv-am-4 kernel: [  245.711671] Hardware name: Dell Inc. PowerEdge R6515/0R4CNN, BIOS 2.7.3 03/31/2022
Jun 12 20:28:11 os-hv-am-4 kernel: [  245.713101] Workqueue: events dtr_cma_connect_work_fn [drbd_transport_rdma]
Jun 12 20:28:11 os-hv-am-4 kernel: [  245.714509] RIP: 0010:0x0
Jun 12 20:28:11 os-hv-am-4 kernel: [  245.715854] Code: Bad RIP value.
Jun 12 20:28:11 os-hv-am-4 kernel: [  245.717178] RSP: 0018:ffffb67bc247bbd8 EFLAGS: 00010246
Jun 12 20:28:11 os-hv-am-4 kernel: [  245.718464] RAX: 0000000000000000 RBX: ffff9b040b142000 RCX: 0000000000008000
Jun 12 20:28:11 os-hv-am-4 kernel: [  245.719732] RDX: ffffb67bc247bc08 RSI: ffffb67bc247bc90 RDI: ffff9b04216e0000
Jun 12 20:28:11 os-hv-am-4 kernel: [  245.720973] RBP: ffffb67bc247bde8 R08: 0000000000000014 R09: 8080808080808080
Jun 12 20:28:11 os-hv-am-4 kernel: [  245.722181] R10: ffff9b04bcefb980 R11: 0000000000102000 R12: 0000000000028000
Jun 12 20:28:11 os-hv-am-4 kernel: [  245.723357] R13: ffff9b042427a200 R14: ffff9b040b142000 R15: ffff9b042427a200
Jun 12 20:28:11 os-hv-am-4 kernel: [  245.724503] FS:  0000000000000000(0000) GS:ffff9b04fdb80000(0000) knlGS:0000000000000000
Jun 12 20:28:11 os-hv-am-4 kernel: [  245.725635] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 12 20:28:11 os-hv-am-4 kernel: [  245.726735] CR2: ffffffffffffffd6 CR3: 000000bc9abf0000 CR4: 0000000000340ee0
Jun 12 20:28:11 os-hv-am-4 kernel: [  245.727819] Call Trace:
Jun 12 20:28:11 os-hv-am-4 kernel: [  245.728899]  dtr_cm_alloc_rdma_res+0x87/0x5c0 [drbd_transport_rdma]
Jun 12 20:28:11 os-hv-am-4 kernel: [  245.730003]  ? update_dl_rq_load_avg+0x1d7/0x2c0
Jun 12 20:28:11 os-hv-am-4 kernel: [  245.731104]  ? sched_clock_cpu+0x11/0xb0
Jun 12 20:28:11 os-hv-am-4 kernel: [  245.732207]  ? dbs_update_util_handler+0x1b/0x80
Jun 12 20:28:11 os-hv-am-4 kernel: [  245.733307]  ? cpufreq_dbs_governor_start+0x180/0x180
Jun 12 20:28:11 os-hv-am-4 kernel: [  245.734400]  ? update_blocked_averages+0x11c/0x590
Jun 12 20:28:11 os-hv-am-4 kernel: [  245.735487]  ? sched_clock+0x9/0x10
Jun 12 20:28:11 os-hv-am-4 kernel: [  245.736565]  ? update_load_avg+0x7c/0x670
Jun 12 20:28:11 os-hv-am-4 kernel: [  245.737639]  ? update_load_avg+0x7c/0x670
Jun 12 20:28:11 os-hv-am-4 kernel: [  245.738715]  ? set_next_entity+0xb5/0x200
Jun 12 20:28:11 os-hv-am-4 kernel: [  245.739775]  dtr_path_prepare+0x111/0x240 [drbd_transport_rdma]
Jun 12 20:28:11 os-hv-am-4 kernel: [  245.740816]  dtr_cma_connect_work_fn+0x93/0x180 [drbd_transport_rdma]
Jun 12 20:28:11 os-hv-am-4 kernel: [  245.741856]  process_one_work+0x1eb/0x3b0
Jun 12 20:28:11 os-hv-am-4 kernel: [  245.742864]  worker_thread+0x4d/0x400
Jun 12 20:28:11 os-hv-am-4 kernel: [  245.743861]  kthread+0x104/0x140
Jun 12 20:28:11 os-hv-am-4 kernel: [  245.744845]  ? process_one_work+0x3b0/0x3b0
Jun 12 20:28:11 os-hv-am-4 kernel: [  245.745823]  ? kthread_park+0x90/0x90
Jun 12 20:28:11 os-hv-am-4 kernel: [  245.746797]  ret_from_fork+0x35/0x40
Jun 12 20:28:11 os-hv-am-4 kernel: [  245.747763] Modules linked in: drbd_transport_rdma(OE) drbd(OE) lru_cache nf_conntrack_netlink xfrm_user xt_addrtype br_netfilter xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter bpfilter aufs cuse overlay rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) esp6_offload esp6 esp4_offload esp4 xfrm_algo mlx5_fpga_tools(OE) mlx5_ib(OE) ib_uverbs(OE) mlx4_ib(OE) ib_core(OE) nls_iso8859_1 ipmi_ssif amd64_edac_mod edac_mce_amd kvm_amd kvm crct10dif_pclmul mgag200 drm_vram_helper ttm ghash_clmulni_intel drm_kms_helper dell_smbios input_leds joydev i2c_algo_bit aesni_intel fb_sys_fops dcdbas syscopyarea crypto_simd cryptd sysfillrect dell_wmi_descriptor wmi_bmof sysimgblt glue_helper ccp k10temp ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter mac_hid bridge stp llc
Jun 12 20:28:11 os-hv-am-4 kernel: [  245.747802]  bonding sch_fq_codel ramoops knem(OE) reed_solomon drm efi_pstore ip_tables x_tables autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid0 multipath linear mlx4_en(OE) hid_generic usbhid hid raid1 mlx5_core(OE) crc32_pclmul mlx4_core(OE) tg3 ahci nvme libahci tls nvme_core mlxfw(OE) mlx_compat(OE) i2c_piix4 wmi
Jun 12 20:28:11 os-hv-am-4 kernel: [  245.761929] CR2: 0000000000000000
Jun 12 20:28:11 os-hv-am-4 kernel: [  245.763213] ---[ end trace 7f4eaf44141a7de5 ]---
Jun 12 20:28:11 os-hv-am-4 kernel: [  245.850615] RIP: 0010:0x0
Jun 12 20:28:11 os-hv-am-4 kernel: [  245.851892] Code: Bad RIP value.
Jun 12 20:28:11 os-hv-am-4 kernel: [  245.853137] RSP: 0018:ffffb67bc24d3bd8 EFLAGS: 00010246
Jun 12 20:28:11 os-hv-am-4 kernel: [  245.854381] RAX: 0000000000000000 RBX: ffff9b040b142000 RCX: 0000000000008000
Jun 12 20:28:11 os-hv-am-4 kernel: [  245.855632] RDX: ffffb67bc24d3c08 RSI: ffffb67bc24d3c90 RDI: ffff9b04216e0000
Jun 12 20:28:11 os-hv-am-4 kernel: [  245.856894] RBP: ffffb67bc24d3de8 R08: 0000000000000014 R09: 8080808080808080
Jun 12 20:28:11 os-hv-am-4 kernel: [  245.858156] R10: ffff9b04bcefb980 R11: 0000000000102000 R12: 0000000000028000
Jun 12 20:28:11 os-hv-am-4 kernel: [  245.859410] R13: ffff9b049f822c00 R14: ffff9b040b142000 R15: ffff9b049f822c00
Jun 12 20:28:11 os-hv-am-4 kernel: [  245.860661] FS:  0000000000000000(0000) GS:ffff9b04fdb80000(0000) knlGS:0000000000000000
Jun 12 20:28:11 os-hv-am-4 kernel: [  245.861926] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 12 20:28:11 os-hv-am-4 kernel: [  245.863190] CR2: ffffffffffffffd6 CR3: 000000bc9abf0000 CR4: 0000000000340ee0

It seems to be incompatible.

9.19.1: Build missing v8 header

I'm trying to build 9.19.1, but the build fails because it cannot find a header in the v8 directory:

drbd-utils-9.19.1 $ ./configure --prefix=/usr --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --mandir=/usr/share/man --infodir=/usr/share/info --datadir=/usr/share --sysconfdir=/etc --localstatedir=/var/lib --docdir=/usr/share/doc/drbd-utils-9.19.1 --htmldir=/usr/share/doc/drbd-utils-9.19.1/html --libdir=/usr/lib64 --localstatedir=/var --with-bashcompletion --with-distro=gentoo --with-prebuiltman --without-rgmanager --without-pacemaker --with-udev --without-xen
checking for x86_64-pc-linux-gnu-gcc... x86_64-pc-linux-gnu-gcc                                                                                                                                                     
checking whether the C compiler works... yes                                                                                                                                                                        
checking for C compiler default output file name... a.out                                                                                                                                                           
checking for suffix of executables...                                                                     
checking whether we are cross compiling... no                                                                                                                                                                       
checking for suffix of object files... o                                                                  
checking whether the compiler supports GNU C... yes                                                       
checking whether x86_64-pc-linux-gnu-gcc accepts -g... yes                                                                                                                                                          
checking for x86_64-pc-linux-gnu-gcc option to enable C11 features... none needed                                                                                                                                   
checking for getentropy... yes                                                                                                                                                                                      
checking for gethostbyname_r... yes                                                                                                                                                                                 
checking for __free_fn_t... yes                                                                           
checking for x86_64-pc-linux-gnu-pkg-config... /usr/bin/x86_64-pc-linux-gnu-pkg-config                                                                                                                              
checking pkg-config is at least version 0.9.0... yes                                                      
configure: Could not detect systemd unit directory                                                        
Using systemd unit directory:                                                                             
Using udev rules directory: /lib/udev                                                                                                                                                                               
checking for x86_64-pc-linux-gnu-gcc... (cached) x86_64-pc-linux-gnu-gcc                                                                                                                                            
checking whether the compiler supports GNU C... (cached) yes                                              
checking whether x86_64-pc-linux-gnu-gcc accepts -g... (cached) yes
checking for x86_64-pc-linux-gnu-gcc option to enable C11 features... (cached) none needed                                                                                                                          
checking whether ln -s works... yes                                                                                                     
checking for sed... /bin/sed                                                                                                            
checking for grep... /bin/grep                                                                                                                                                                                      
checking for flex... /usr/bin/flex                                                                                                      
checking for rpmbuild... no                                 
checking for xsltproc... /usr/bin/xsltproc                                                                                              
checking for clitest... no                                    
checking for tar... /bin/tar                                                                                                            
checking for git... /usr/bin/git                                                                                                                                                                                    
checking for po4a-translate... /usr/bin/po4a-translate                                                                                                                                                              
checking for po4a-gettextize... /usr/bin/po4a-gettextize                                                                                                                                                            
checking for dpkg-buildpackage... /usr/bin/dpkg-buildpackage                                                                                                                                                        
checking for udevadm... /bin/udevadm                                                                                                                                                                                
checking for udevinfo... false                                                                                                                                                                                      
checking for x86_64-pc-linux-gnu-g++... x86_64-pc-linux-gnu-g++                                                                         
checking whether the compiler supports GNU C++... yes                                                                                   
checking whether x86_64-pc-linux-gnu-g++ accepts -g... yes                                                                              
checking for x86_64-pc-linux-gnu-g++ option to enable C++11 features... none needed                                                                                                                                 
checking whether x86_64-pc-linux-gnu-g++ supports C++11 features by default... yes                                                                                                                                  
checking for clock_gettime, timer_create, timer_settime, timer_delete in -lrt... yes                                                                                                                                
configure: WARNING: No rpmbuild found, building RPM packages is disabled.                                                                                                                                           
configure: WARNING: Cannot run tests without clitest, disabling test target.                                                                                                                                        
checking for stdio.h... yes                                                                                                             
checking for stdlib.h... yes                                                                                                                                                                                        
checking for string.h... yes                                                                                                            
checking for inttypes.h... yes                                                                                                                                                                                      
checking for stdint.h... yes                                        
checking for strings.h... yes                                                                                                                                                                                       
checking for sys/stat.h... yes                                                                                                          
checking for sys/types.h... yes                                                                                                                                                                                     
checking for unistd.h... yes                             
checking for linux/genetlink.h... yes                                                                                                   
checking for /etc/redhat-release... no                                                                                                  
checking for /etc/debian_version... no                          
checking for /etc/SuSE-release... no                                                                                                                                                                                
configure: WARNING: Unable to determine what distribution we are running on. Distribution-specific features will be disabled.                                                                                       
configure: creating ./config.status                                                                                                     
config.status: creating Makefile                                                                                                        
config.status: creating user/shared/Makefile                                                                                            
config.status: creating user/v9/Makefile                                                                                                
config.status: creating user/v83/Makefile                                                                                                                                                                           
config.status: creating user/v84/Makefile                                                                                               
config.status: creating scripts/Makefile                                                                                                                                                                            
config.status: creating documentation/v83/Makefile                                                                                                                                                                  
config.status: creating scripts/drbd.rules                                                                                              
config.status: creating user/windrbd/Makefile       
config.status: creating user/drbdmon/Makefile                                                                                                                                                                       
config.status: creating documentation/common/Makefile_v84_com                                                                                                                                                       
config.status: creating documentation/common/Makefile_v9_com                                                                            
config.status: creating user/shared/config.h
drbd-utils-9.19.1 $ make                                                                                                                                                              
make[1]: Entering directory '/home/buildd/tmp-drbd/drbd-utils-9.19.1/user/shared'                                                                                                                                  
flex -s -odrbdmeta_scanner.c drbdmeta_scanner.fl                                                          
./drbd_buildtag.sh drbd_buildtag.h                                                                        
+ calldir=/home/buildd/tmp-drbd/drbd-utils-9.19.1/user/shared                                            
+++ dirname ./drbd_buildtag.sh                                                                                                                                                                                      
++ cd .                                                                                                                                                                                                             
++ pwd -P                                                                                                                                                                                                           
+ cd /home/buildd/tmp-drbd/drbd-utils-9.19.1/user/shared                                                                                                                                                           
+ [[ drbd_buildtag.h =~ drbd_buildtag\.h$ ]]                                                                                                                                                                        
+ drbd_buildtag_h /home/buildd/tmp-drbd/drbd-utils-9.19.1/user/shared/drbd_buildtag.h                                                                                                                              
+ local out=/home/buildd/tmp-drbd/drbd-utils-9.19.1/user/shared/drbd_buildtag.h                                                                                                                                    
+ set -e                                                                                                  
+ exec                                                                                                                                                                                                              
+ echo -e '/* automatically generated. DO NOT EDIT. */'                                                   
+ test -e ../../.git                                                                                      
+ test -e /home/buildd/tmp-drbd/drbd-utils-9.19.1/user/shared/drbd_buildtag.h                                                                                                                                      
+ grep GITHASH /home/buildd/tmp-drbd/drbd-utils-9.19.1/user/shared/drbd_buildtag.h                                                                                                                                 
+ grep GITDIFF /home/buildd/tmp-drbd/drbd-utils-9.19.1/user/shared/drbd_buildtag.h                                                                                                                                 
+ mv -f /home/buildd/tmp-drbd/drbd-utils-9.19.1/user/shared/drbd_buildtag.h.new /home/buildd/tmp-drbd/drbd-utils-9.19.1/user/shared/drbd_buildtag.h                                                               
+ exit 0                                                                                                  
./drbd_buildtag.sh drbd_buildtag.c                                                                                                                                                                                  
+ calldir=/home/buildd/tmp-drbd/drbd-utils-9.19.1/user/shared                                            
+++ dirname ./drbd_buildtag.sh                                                                            
++ cd .                                                                                                   
++ pwd -P                                                                                                                                                                                                           
+ cd /home/buildd/tmp-drbd/drbd-utils-9.19.1/user/shared                                                                                                                                                           
+ [[ drbd_buildtag.c =~ drbd_buildtag\.h$ ]]                                                              
+ [[ drbd_buildtag.c =~ drbd_buildtag\.c$ ]]
+ drbd_buildtag_c /home/buildd/tmp-drbd/drbd-utils-9.19.1/user/shared/drbd_buildtag.c                                                                                                                              
+ local out=/home/buildd/tmp-drbd/drbd-utils-9.19.1/user/shared/drbd_buildtag.c                                                        
+ set -e                                                                                                                                
+ exec                                                                                                                                                                                                              
+ echo -e '/* automatically generated. DO NOT EDIT. */'                                                                                 
+ echo -e '#include "drbd_buildtag.h"'                      
+ echo -e 'const char *drbd_buildtag(void)\n{'                                                                                          
+ echo -e '\treturn "GIT-hash: " GITHASH GITDIFF'             
+ '[' -z '' ']'                                                                                                                         
++ date '+%F %T'                                                                                                                                                                                                    
+ buildinfo='build by buildd@localhost, 2021-12-01 14:48:18'                                                                                                                                                      
+ echo -e '\t\t" build by buildd@localhost, 2021-12-01 14:48:18";\n}'                                                                                                                                             
+ mv -f /home/buildd/tmp-drbd/drbd-utils-9.19.1/user/shared/drbd_buildtag.c.new /home/buildd/tmp-drbd/drbd-utils-9.19.1/user/shared/drbd_buildtag.c                                                               
+ exit 0                                                                                                                                                                                                            
make[1]: Leaving directory '/home/buildd/tmp-drbd/drbd-utils-9.19.1/user/shared'                                                                                                                                   
make[1]: Entering directory '/home/buildd/tmp-drbd/drbd-utils-9.19.1/user/v9'                                                          
../shared/drbdmeta_linux.c:59:10: fatal error: drbd_strings.h: No such file or directory                                                
   59 | #include "drbd_strings.h"                                                                                                       
      |          ^~~~~~~~~~~~~~~~                                                                                                                                                                                   
compilation terminated.                                                                                                                                                                                             
../shared/shared_tool.c:33:10: fatal error: linux/drbd.h: No such file or directory                                                                                                                                 
   33 | #include "linux/drbd.h"                                                                                                                                                                                     
      |          ^~~~~~~~~~~~~~                                                                                                                                                                                     
compilation terminated.                                                                                                                 
../shared/drbdmeta.c:60:10: fatal error: drbd_strings.h: No such file or directory                                                                                                                                  
   60 | #include "drbd_strings.h"                                                                                                       
      |          ^~~~~~~~~~~~~~~~                                                                                                                                                                                   
compilation terminated.                                             
drbdsetup_events2.c:37:10: fatal error: drbd_protocol.h: No such file or directory                                                                                                                                  
   37 | #include "drbd_protocol.h"                                                                                                      
      |          ^~~~~~~~~~~~~~~~~                                                                                                                                                                                  
compilation terminated.                                  
config_flags.c:10:10: fatal error: linux/drbd.h: No such file or directory                                                              
   10 | #include "linux/drbd.h"                                                                                                         
      |          ^~~~~~~~~~~~~~                                 
compilation terminated.                                                                                                                                                                                             
drbdsetup.c:72:10: fatal error: drbd_strings.h: No such file or directory                                                                                                                                           
   72 | #include "drbd_strings.h"                                                                                                       
      |          ^~~~~~~~~~~~~~~~                                                                                                       
compilation terminated.                                                                                                                 
drbdadm_usage_cnt.c:46:10: fatal error: linux/drbd.h: No such file or directory                                                         
   46 | #include "linux/drbd.h"         /* only use DRBD_MAGIC from here! */                                                                                                                                        
      |          ^~~~~~~~~~~~~~                                                                                                         
compilation terminated.                                                                                                                                                                                             
drbdadm_main.c:53:10: fatal error: linux/drbd.h: No such file or directory                                                                                                                                          
   53 | #include "linux/drbd.h"                                                                                                         
      |          ^~~~~~~~~~~~~~                     
compilation terminated.                                                                                                                                                                                             
drbdadm_parser.c:41:10: fatal error: linux/drbd.h: No such file or directory                                                                                                                                        
   41 | #include "linux/drbd.h"                                                                                                         
      |          ^~~~~~~~~~~~~~                                                                           
compilation terminated.                                                                                   
flex -s -odrbdadm_scanner.c drbdadm_scanner.fl                                                                                                                                                                      
../shared/drbdmeta_linux.c:59:10: fatal error: drbd_strings.h: No such file or directory                                                                                                                            
   59 | #include "drbd_strings.h"                                                                         
      |          ^~~~~~~~~~~~~~~~                                                                         
compilation terminated.                                                                                   
../shared/shared_tool.c:33:10: fatal error: linux/drbd.h: No such file or directory                                                                                                                                 
   33 | #include "linux/drbd.h"                                                                                                                                                                                     
      |          ^~~~~~~~~~~~~~                                                                                                                                                                                     
compilation terminated.                                                                                                                                                                                             
../shared/drbdmeta.c:60:10: fatal error: drbd_strings.h: No such file or directory                                                                                                                                  
   60 | #include "drbd_strings.h"                                                                                                                                                                                   
      |          ^~~~~~~~~~~~~~~~                                                                                                                                                                                   
compilation terminated.                                                                                   
drbdsetup_events2.c:37:10: fatal error: drbd_protocol.h: No such file or directory                                                                                                                                  
   37 | #include "drbd_protocol.h"                                                                        
      |          ^~~~~~~~~~~~~~~~~                                                                        
compilation terminated.                                                                                                                                                                                             
config_flags.c:10:10: fatal error: linux/drbd.h: No such file or directory                                                                                                                                          
   10 | #include "linux/drbd.h"                                                                                                                                                                                     
      |          ^~~~~~~~~~~~~~                                                                                                                                                                                     
compilation terminated.                                                                                   
drbdsetup.c:72:10: fatal error: drbd_strings.h: No such file or directory                                                                                                                                           
   72 | #include "drbd_strings.h"                                                                         
      |          ^~~~~~~~~~~~~~~~                                                                         
compilation terminated.                                                                                   
drbdadm_usage_cnt.c:46:10: fatal error: linux/drbd.h: No such file or directory                                                                                                                                     
   46 | #include "linux/drbd.h"         /* only use DRBD_MAGIC from here! */                                                                                                                                        
      |          ^~~~~~~~~~~~~~                                                                           
compilation terminated.
drbdadm_main.c:53:10: fatal error: linux/drbd.h: No such file or directory                                                                                                                                          
   53 | #include "linux/drbd.h"                                     
      |          ^~~~~~~~~~~~~~                                                                                                                                                                                     
compilation terminated.                                                                                                                 
drbdadm_parser.c:41:10: fatal error: linux/drbd.h: No such file or directory                                                                                                                                        
   41 | #include "linux/drbd.h"                             
      |          ^~~~~~~~~~~~~~                                                                                                         
compilation terminated.                                                                                                                 
flex -s -odrbdadm_scanner.c drbdadm_scanner.fl                  
../shared/drbdmeta_linux.c:59:10: fatal error: drbd_strings.h: No such file or directory                                                                                                                            
   59 | #include "drbd_strings.h"                                                                                                                                                                                   
      |          ^~~~~~~~~~~~~~~~                                                                                                       
compilation terminated.                                                                                                                 
../shared/shared_tool.c:33:10: fatal error: linux/drbd.h: No such file or directory                                                     
   33 | #include "linux/drbd.h"                                                                                                         
      |          ^~~~~~~~~~~~~~                                                                                                                                                                                     
compilation terminated.                                                                                                                 
../shared/drbdmeta.c:60:10: fatal error: drbd_strings.h: No such file or directory                                                                                                                                  
   60 | #include "drbd_strings.h"                                                                                                                                                                                   
      |          ^~~~~~~~~~~~~~~~                                                                                                       
compilation terminated.                             
drbdsetup_events2.c:37:10: fatal error: drbd_protocol.h: No such file or directory                                                                                                                                  
   37 | #include "drbd_protocol.h"                                                                                                                                                                                  
      |          ^~~~~~~~~~~~~~~~~                                                                                                      
compilation terminated.                                                                                                                                                                                             
config_flags.c:10:10: fatal error: linux/drbd.h: No such file or directory                                                                                                                                          
   10 | #include "linux/drbd.h"                                                                                                                                                                                     
      |          ^~~~~~~~~~~~~~                                                                                                                                                                                     
compilation terminated.                                                                                   
drbdsetup.c:72:10: fatal error: drbd_strings.h: No such file or directory                                                                                                                                           
   72 | #include "drbd_strings.h"                                                                         
      |          ^~~~~~~~~~~~~~~~                                                                                                                                                                                   
compilation terminated.                                                                                                                                                                                             
drbdadm_usage_cnt.c:46:10: fatal error: linux/drbd.h: No such file or directory                                                                                                                                     
   46 | #include "linux/drbd.h"         /* only use DRBD_MAGIC from here! */                                                                                                                                        
      |          ^~~~~~~~~~~~~~                                                                                                                                                                                     
compilation terminated.                                                                                                                                                                                             
drbdadm_main.c:53:10: fatal error: linux/drbd.h: No such file or directory                                                                                                                                          
   53 | #include "linux/drbd.h"                                                                                                                                                                                     
      |          ^~~~~~~~~~~~~~                                                                                                                                                                                     
compilation terminated.                                                                                   
drbdadm_parser.c:41:10: fatal error: linux/drbd.h: No such file or directory                                                                                                                                        
   41 | #include "linux/drbd.h"                                                                                                                                                                                     
      |          ^~~~~~~~~~~~~~                                                                                                                                                                                     
compilation terminated.                                                                                                                                                                                             
x86_64-pc-linux-gnu-gcc -g -O2 -Wall -I../../drbd-headers -I.. -I. -I../shared    -c -o drbdadm_scanner.o drbdadm_scanner.c                                                                                         
x86_64-pc-linux-gnu-gcc -g -O2 -Wall -I../../drbd-headers -I.. -I. -I../shared    -c -o drbdadm_parser.o drbdadm_parser.c                                                                                           
drbdadm_parser.c:41:10: fatal error: linux/drbd.h: No such file or directory                                                                                                                                        
   41 | #include "linux/drbd.h"                                                                                                                                                                                     
      |          ^~~~~~~~~~~~~~                                                                                                                                                                                     
compilation terminated.                                                                                   
make[1]: *** [<builtin>: drbdadm_parser.o] Error 1                                                                                                                                                                  
make[1]: Leaving directory '/home/buildd/tmp-drbd/drbd-utils-9.19.1/user/v9'                                                                                                                                       
make: *** [Makefile:90: tools] Error 2
$ find . -name drbd_strings.h
./user/v84/drbd_strings.h
$ find . -name drbd.h
./user/v83/linux/drbd.h
./user/v84/linux/drbd.h

I can't find a reference to drbd_protocol.h.
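
The gcc lines in the log pass `-I../../drbd-headers -I.. -I. -I../shared`, so the missing headers would have to sit under one of those directories. Below is a minimal Python sketch for checking that, meant to be run from `user/v9`. The helper name `find_headers` is made up for illustration. My guess that `drbd-headers` is simply empty in this tree (for example, a git submodule that was not checked out) is an assumption the log does not confirm.

```python
import os

# Include directories taken from the gcc command line in the log (relative to user/v9).
INCLUDE_DIRS = ["../../drbd-headers", "..", ".", "../shared"]
# Headers the compiler reported as missing.
HEADERS = ["linux/drbd.h", "drbd_strings.h", "drbd_protocol.h"]


def find_headers(include_dirs, headers, base="."):
    """Map each header to the first include dir that contains it, or None."""
    found = {}
    for header in headers:
        found[header] = None
        for inc in include_dirs:
            if os.path.isfile(os.path.join(base, inc, header)):
                found[header] = inc
                break
    return found


if __name__ == "__main__":
    for header, inc in find_headers(INCLUDE_DIRS, HEADERS).items():
        print(f"{header}: {inc if inc else 'NOT FOUND'}")
```

If every header prints as NOT FOUND, the include path itself is the thing to look at rather than the individual source files.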

AWS master node termination creates unresolved split brain

We have been running two VMs in two AZs in AWS and syncing the data between them using DRBD.
Currently, whenever a master node is terminated and a new node comes up, DRBD always reports split brain and the cluster gets disconnected.

We are using Amazon Linux 2 with drbd84-utils-9.6.0-1.el7.elrepo.x86_64.rpm, which was built internally a long time ago.
The EBS volumes are encrypted.

Below is our r0 configuration:
resource r0 {
    protocol C;
    startup {
        wfc-timeout 15;
        degr-wfc-timeout 60;
    }
    net {
        cram-hmac-alg sha1;
        shared-secret "DRBDPASW";
        after-sb-0pri discard-least-changes;
        after-sb-1pri consensus;
        after-sb-2pri call-pri-lost-after-sb;
    }
    device /dev/drbd0;
    disk /dev/sdc;
    meta-disk internal;
    on SELF_HA_DNS {
        address SELF_HA_IP:7788;
    }
    on OTHER_HA_DNS {
        address OTHER_HA_IP:7788;
    }
}

Any time a DRBD node gets terminated, we assign the existing IP to the new node on eth1 and also change its hostname, to keep the configuration constant.
We then mount the same volume on the node again at the same path. This has been working for a few years.

We get the error below in the console:
May 23 17:26:59 ip-10-35-24-164.ec2.internal crmd[22845]: notice: Result of probe operation for Nfsd on ip-10-35-24-164.ec2.internal: 7 (not running)
May 23 17:26:59 ip-10-35-24-164.ec2.internal crmd[22845]: notice: ip-10-35-24-164.ec2.internal-Nfsd_monitor_0:20 [ ocf-exit-reason:nfs-mountd is not running\n ]
May 23 17:26:59 ip-10-35-24-164.ec2.internal crmd[22845]: notice: Result of notify operation for Data on ip-10-35-24-164.ec2.internal: 0 (ok)
May 23 17:27:17 ip-10-35-24-164.ec2.internal kernel: drbd r0: Handshake successful: Agreed network protocol version 101
May 23 17:27:17 ip-10-35-24-164.ec2.internal kernel: drbd r0: Feature flags enabled on protocol level: 0x7 TRIM THIN_RESYNC WRITE_SAME.
May 23 17:27:17 ip-10-35-24-164.ec2.internal kernel: drbd r0: Peer authenticated using 20 bytes HMAC
May 23 17:27:17 ip-10-35-24-164.ec2.internal kernel: drbd r0: conn( WFConnection -> WFReportParams )
May 23 17:27:17 ip-10-35-24-164.ec2.internal kernel: drbd r0: Starting ack_recv thread (from drbd_r_r0 [23029])
May 23 17:27:17 ip-10-35-24-164.ec2.internal kernel: block drbd0: drbd_sync_handshake:
May 23 17:27:17 ip-10-35-24-164.ec2.internal kernel: block drbd0: self 7875EB279B543400:F868A89A3AADEA22:25B3D5A6B6CFEDA4:25B2D5A6B6CFEDA4 bits:639 flags:0
May 23 17:27:17 ip-10-35-24-164.ec2.internal kernel: block drbd0: peer 4E7AF865B46E9A8F:22B34CE6215CBE32:F868A89A3AADEA23:25B3D5A6B6CFEDA4 bits:514 flags:2
May 23 17:27:17 ip-10-35-24-164.ec2.internal kernel: block drbd0: uuid_compare()=-100 by rule 100
May 23 17:27:17 ip-10-35-24-164.ec2.internal kernel: block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0
May 23 17:27:17 ip-10-35-24-164.ec2.internal kernel: block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0 exit code 0 (0x0)
May 23 17:27:17 ip-10-35-24-164.ec2.internal kernel: block drbd0: Split-Brain detected but unresolved, dropping connection!
May 23 17:27:17 ip-10-35-24-164.ec2.internal kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0
May 23 17:27:18 ip-10-35-24-164.ec2.internal kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0 exit code 0 (0x0)
May 23 17:27:18 ip-10-35-24-164.ec2.internal kernel: drbd r0: conn( WFReportParams -> Disconnecting )
May 23 17:27:18 ip-10-35-24-164.ec2.internal kernel: drbd r0: error receiving ReportState, e: -5 l: 0!
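
To make the `drbd_sync_handshake` lines above easier to read, here is a minimal Python sketch that parses the `self` and `peer` UUID sets and lists the values both sides have in common. It rests on two assumptions: the four fields are printed in the order current:bitmap:history:history, and the lowest bit of each UUID is a flag that should be masked before comparing. The function names are made up for illustration.

```python
import re

# Matches the "self"/"peer" lines that follow drbd_sync_handshake in the kernel log.
UUID_LINE = re.compile(r"\b(self|peer) ((?:[0-9A-F]{16}:){3}[0-9A-F]{16})\b")


def parse_uuids(log_text):
    """Return {'self': [...], 'peer': [...]} with the lowest bit of each UUID cleared."""
    result = {}
    for who, uuids in UUID_LINE.findall(log_text):
        # Assumption: bit 0 is a flag, not part of the identity, so mask it off.
        result[who] = [int(u, 16) & ~1 for u in uuids.split(":")]
    return result


def shared_uuids(parsed):
    """UUID values that occur on both sides, regardless of which slot they are in."""
    return sorted(set(parsed["self"]) & set(parsed["peer"]))


LOG = """\
block drbd0: self 7875EB279B543400:F868A89A3AADEA22:25B3D5A6B6CFEDA4:25B2D5A6B6CFEDA4 bits:639 flags:0
block drbd0: peer 4E7AF865B46E9A8F:22B34CE6215CBE32:F868A89A3AADEA23:25B3D5A6B6CFEDA4 bits:514 flags:2
"""

if __name__ == "__main__":
    parsed = parse_uuids(LOG)
    print("current UUIDs equal:", parsed["self"][0] == parsed["peer"][0])
    for value in shared_uuids(parsed):
        print(f"shared: {value:016X}")
```

For the two lines from the log, the current UUIDs differ while older values (F868A89A3AADEA22 and 25B3D5A6B6CFEDA4) appear on both sides. That would be consistent with both nodes having written data independently since they were last in sync, which is what the split-brain message reports.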

DEB Package broken symlink

Hi guys, building the DKMS package for Debian generates a broken symlink to drbd_strings.c, which makes the DKMS build fail on Debian 10 systems. A normal build works fine.

CentOS build reports error

COMPAT sock_ops_returns_addr_len
UPD /root/drbd-9.0.30-1/drbd/compat.5.14.2-1.el7.elrepo.x86_64.h
UPD /root/drbd-9.0.30-1/drbd/compat.h
./drbd-kernel-compat/gen_compat_patch.sh: line 12: spatch: command not found
./drbd-kernel-compat/gen_compat_patch.sh: line 45: hash: spatch: not found
INFO: no suitable spatch found; trying spatch-as-a-service;
be patient, may take up to 10 minutes
if it is in the server side cache it might only take a second
SPAAS 716aa33e7a3c587aa903b52a475da653
curl: (7) Failed connect to drbd.io:2020; Connection refused
ERROR: SPAAS is not reachable! Please check if your network
configuration or some firewall prohibits access to
https://drbd.io:2020.
make[5]: *** [drbd-kernel-compat/cocci_cache/716aa33e7a3c587aa903b52a475da653/compat.patch] Error 1
make[4]: *** [/root/drbd-9.0.30-1/drbd/drbd-kernel-compat/compat.patch] Error 2
make[3]: *** [/root/drbd-9.0.30-1/drbd] Error 2
make[2]: *** [__sub-make] Error 2
make[1]: *** [kbuild] Error 2
make[1]: Leaving directory `/root/drbd-9.0.30-1/drbd'
make: *** [module] Error 2

[root@master1 drbd-9.0.30-1]# cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

[root@master1 drbd-9.0.30-1]# uname -a
Linux master1 5.14.2-1.el7.elrepo.x86_64 #1 SMP Tue Sep 7 09:32:38 EDT 2021 x86_64 x86_64 x86_64 GNU/Linux
[root@master1 drbd-9.0.30-1]#
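
The build log above shows two separate problems: no local `spatch` binary, and the fallback service at drbd.io:2020 refusing the connection. A minimal Python sketch for checking both on the build host follows; the function names are hypothetical, and it only tests for a binary on PATH and a plain TCP connect, nothing SPAAS-specific.

```python
import shutil
import socket
import sys


def have_spatch():
    """True if an `spatch` binary (shipped with Coccinelle) is on PATH."""
    return shutil.which("spatch") is not None


def can_connect(host, port, timeout=5.0):
    """True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    print("spatch on PATH:", have_spatch())
    # Optional: pass the host and port from the error message, e.g. "drbd.io 2020".
    if len(sys.argv) == 3:
        host, port = sys.argv[1], int(sys.argv[2])
        print(f"{host}:{port} reachable:", can_connect(host, port))
```

If `have_spatch()` returns True, the build should not need the remote service at all, going by the "no suitable spatch found" message in the log.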

DRBD resource stuck in SyncTarget state and unable to proceed with Syncing

Hi,
After a serial/rolling upgrade of the k8s cluster, one of the DRBD resources was found stuck in SyncTarget.


Linstor version -
[root@flex-103 ~]# k exec --namespace=piraeus deployment/piraeus-op-piraeus-operator-cs-controller -- linstor --version
linstor 1.13.0; GIT-hash: 840cf57c75c166659509e22447b2c0ca6377ee6d

DRBD version -
[root@flex-103 ~]# k exec -n piraeus piraeus-op-piraeus-operator-ns-node-jrhwv -c linstor-satellite -- drbdadm --version
DRBDADM_BUILDTAG=GIT-hash:\ 087ee6b4961ca154d76e4211223b03149373bed8\ build\ by\ @buildsystem,\ 2022-01-28\ 12:19:33
DRBDADM_API_VERSION=2
DRBD_KERNEL_VERSION_CODE=0x090106
DRBD_KERNEL_VERSION=9.1.6
DRBDADM_VERSION_CODE=0x091402
DRBDADM_VERSION=9.20.2

Piraeus 1.8.0


Setup details -

The K8s cluster is a 3-node setup: 2 disk nodes and 1 diskless node, with Protocol C replication.
Disk node with InUse resource and SyncTarget is flex-106 (shortened to 106)
Disk node with Unused resource and UpToDate is flex-107 (shortened to 107)
Diskless node with Unused resource is flex-108 (shortened to 108)

Some relevant info -

[root@flex-103 ~]# k exec --namespace=piraeus deployment/piraeus-op-piraeus-operator-cs-controller -- linstor r l | grep pvc-353143c5-e55d-4c75-98be-5248124dd160
| pvc-353143c5-e55d-4c75-98be-5248124dd160 | flex-106.dr.avaya.com | 7009 | InUse  | Ok    | SyncTarget(64.11%) | 2022-05-25 22:59:58 |
| pvc-353143c5-e55d-4c75-98be-5248124dd160 | flex-107.dr.avaya.com | 7009 | Unused | Ok    |           UpToDate | 2022-05-25 23:00:00 |
| pvc-353143c5-e55d-4c75-98be-5248124dd160 | flex-108.dr.avaya.com | 7009 | Unused | Ok    |           Diskless | 2022-05-25 22:59:59 |
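
For anyone wanting to spot stuck resources across many PVCs, a minimal Python sketch that parses rows like the ones above and picks out those in SyncTarget. It assumes ASCII `|` separators and seven columns, exactly as pasted here; the dictionary keys are names I made up for the columns, not something taken from LINSTOR.

```python
def parse_resource_rows(text):
    """Parse pipe-delimited resource rows into dicts; lines without 7 columns are skipped."""
    keys = ["resource", "node", "port", "usage", "conns", "state", "created"]
    rows = []
    for line in text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) == len(keys):
            rows.append(dict(zip(keys, cells)))
    return rows


def sync_targets(rows):
    """Rows whose state column starts with SyncTarget, e.g. 'SyncTarget(64.11%)'."""
    return [r for r in rows if r["state"].startswith("SyncTarget")]


SAMPLE = """\
| pvc-353143c5-e55d-4c75-98be-5248124dd160 | flex-106.dr.avaya.com | 7009 | InUse  | Ok    | SyncTarget(64.11%) | 2022-05-25 22:59:58 |
| pvc-353143c5-e55d-4c75-98be-5248124dd160 | flex-107.dr.avaya.com | 7009 | Unused | Ok    |           UpToDate | 2022-05-25 23:00:00 |
| pvc-353143c5-e55d-4c75-98be-5248124dd160 | flex-108.dr.avaya.com | 7009 | Unused | Ok    |           Diskless | 2022-05-25 22:59:59 |
"""

if __name__ == "__main__":
    for row in sync_targets(parse_resource_rows(SAMPLE)):
        print(row["node"], row["usage"], row["state"])
```

Running the same check a few minutes apart and comparing the percentage is a simple way to tell a slow resync from one that is not progressing.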


DRBD logs from the disk node (flex-106) that has this PVC resource stuck in SyncTarget -

[root@flex-106 ccmuser]# grep pvc-353143c5-e55d-4c75-98be-5248124dd160  /var/log/messages 
May 25 17:51:13 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160 flex-107.dr.avaya.com: conn( Connecting -> Connected ) peer( Unknown -> Secondary )
May 25 17:51:13 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160/0 drbd1009: quorum( no -> yes )
May 25 17:51:13 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160/0 drbd1009 flex-107.dr.avaya.com: pdsk( DUnknown -> UpToDate ) repl( Off -> Established )
May 25 19:14:05 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160/0 drbd1009: Would lose quorum, but using tiebreaker logic to keep
May 25 19:14:05 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160: Preparing cluster-wide state change 3657581006 (0->1 496/16)
May 25 19:14:05 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160: State change 3657581006: primary_nodes=4, weak_nodes=FFFFFFFFFFFFFFF9
May 25 19:14:05 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160: Committing cluster-wide state change 3657581006 (3ms)
May 25 19:14:05 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160 flex-107.dr.avaya.com: conn( Connected -> Disconnecting ) peer( Secondary -> Unknown )
May 25 19:14:05 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160/0 drbd1009 flex-107.dr.avaya.com: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
May 25 19:14:05 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160 flex-107.dr.avaya.com: ack_receiver terminated
May 25 19:14:05 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160 flex-107.dr.avaya.com: Terminating ack_recv thread
May 25 19:14:05 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160 flex-107.dr.avaya.com: Terminating sender thread
May 25 19:14:05 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160 flex-107.dr.avaya.com: Starting sender thread (from drbd_r_pvc-3531 [7369])
May 25 19:14:06 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160 flex-107.dr.avaya.com: Connection closed
May 25 19:14:06 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160 flex-107.dr.avaya.com: conn( Disconnecting -> StandAlone )
May 25 19:14:06 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160 flex-107.dr.avaya.com: Terminating receiver thread
May 25 19:14:06 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160 flex-107.dr.avaya.com: conn( StandAlone -> Unconnected )
May 25 19:14:06 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160 flex-107.dr.avaya.com: Starting receiver thread (from drbd_w_pvc-3531 [7349])
May 25 19:14:06 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160 flex-107.dr.avaya.com: conn( Unconnected -> Connecting )
May 25 19:14:06 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160 flex-107.dr.avaya.com: Handshake to peer 1 successful: Agreed network protocol version 121
May 25 19:14:06 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160 flex-107.dr.avaya.com: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
May 25 19:14:06 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160 flex-107.dr.avaya.com: Peer authenticated using 20 bytes HMAC
May 25 19:14:06 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160 flex-107.dr.avaya.com: Starting ack_recv thread (from drbd_r_pvc-3531 [83213])
May 25 19:14:06 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160: Preparing cluster-wide state change 484053172 (0->1 499/146)
May 25 19:14:06 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160/0 drbd1009 flex-107.dr.avaya.com: drbd_sync_handshake:
May 25 19:14:06 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160/0 drbd1009 flex-107.dr.avaya.com: self 93E3EF437B3C9C06:0000000000000000:CB4D79960DC2245A:43030D49A0AAB1B8 bits:0 flags:100
May 25 19:14:06 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160/0 drbd1009 flex-107.dr.avaya.com: peer 2744834DBA493B50:0000000000000000:93E3EF437B3C9C06:CB4D79960DC2245A bits:2589 flags:1100
May 25 19:14:06 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160/0 drbd1009 flex-107.dr.avaya.com: uuid_compare()=target-set-bitmap by rule=history-peer
May 25 19:14:06 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160/0 drbd1009 flex-107.dr.avaya.com: Setting and writing one bitmap slot, after drbd_sync_handshake
May 25 19:14:06 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160: State change 484053172: primary_nodes=4, weak_nodes=FFFFFFFFFFFFFFF8
May 25 19:14:06 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160: Committing cluster-wide state change 484053172 (22ms)
May 25 19:14:06 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160 flex-107.dr.avaya.com: conn( Connecting -> Connected ) peer( Unknown -> Secondary )
May 25 19:14:06 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160/0 drbd1009: disk( Consistent -> Outdated )
May 25 19:14:06 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160/0 drbd1009 flex-107.dr.avaya.com: pdsk( DUnknown -> UpToDate ) repl( Off -> WFBitMapT )
May 25 19:14:06 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160/0 drbd1009 flex-107.dr.avaya.com: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 21(1), total 21; compression: 100.0%
May 25 19:14:06 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160/0 drbd1009 flex-107.dr.avaya.com: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 21(1), total 21; compression: 100.0%
May 25 19:14:06 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160/0 drbd1009: disk( Outdated -> Inconsistent )
May 25 19:14:06 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160/0 drbd1009 flex-108.dr.avaya.com: resync-susp( no -> connection dependency )
May 25 19:14:06 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160/0 drbd1009 flex-107.dr.avaya.com: repl( WFBitMapT -> SyncTarget )
May 25 19:14:06 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160/0 drbd1009 flex-107.dr.avaya.com: Began resync as SyncTarget (will sync 2100760 KB [525190 bits set]).
May 25 19:14:13 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160: Preparing cluster-wide state change 236192581 (0->2 8176/3088)
May 25 19:14:13 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160: State change 236192581: primary_nodes=4, weak_nodes=FFFFFFFFFFFFFFF9
May 25 19:14:13 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160: Committing cluster-wide state change 236192581 (2ms)
May 25 19:14:13 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160 flex-108.dr.avaya.com: conn( Connected -> Disconnecting ) peer( Primary -> Unknown )
May 25 19:14:13 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160/0 drbd1009 flex-108.dr.avaya.com: pdsk( Diskless -> DUnknown ) repl( Established -> Off )
May 25 19:14:13 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160 flex-108.dr.avaya.com: ack_receiver terminated
May 25 19:14:13 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160 flex-108.dr.avaya.com: Terminating ack_recv thread
May 25 19:14:13 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160 flex-108.dr.avaya.com: Terminating sender thread
May 25 19:14:13 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160 flex-108.dr.avaya.com: Starting sender thread (from drbd_r_pvc-3531 [7370])
May 25 19:14:13 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160 flex-108.dr.avaya.com: Connection closed
May 25 19:14:13 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160 flex-108.dr.avaya.com: conn( Disconnecting -> StandAlone )
May 25 19:14:13 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160 flex-108.dr.avaya.com: Terminating receiver thread
May 25 19:14:14 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160 flex-108.dr.avaya.com: conn( StandAlone -> Unconnected )
May 25 19:14:14 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160 flex-108.dr.avaya.com: Starting receiver thread (from drbd_w_pvc-3531 [7349])
May 25 19:14:14 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160 flex-108.dr.avaya.com: conn( Unconnected -> Connecting )
May 25 19:14:14 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160 flex-108.dr.avaya.com: Handshake to peer 2 successful: Agreed network protocol version 121
May 25 19:14:14 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160 flex-108.dr.avaya.com: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
May 25 19:14:14 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160 flex-108.dr.avaya.com: Peer authenticated using 20 bytes HMAC
May 25 19:14:14 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160 flex-108.dr.avaya.com: Starting ack_recv thread (from drbd_r_pvc-3531 [83820])
May 25 19:14:14 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160: Preparing cluster-wide state change 3661494316 (0->2 499/146)
May 25 19:14:14 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160: State change 3661494316: primary_nodes=4, weak_nodes=FFFFFFFFFFFFFFF8
May 25 19:14:14 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160: Committing cluster-wide state change 3661494316 (6ms)
May 25 19:14:14 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160 flex-108.dr.avaya.com: conn( Connecting -> Connected ) peer( Unknown -> Primary )
May 25 19:14:14 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160/0 drbd1009 flex-108.dr.avaya.com: pdsk( DUnknown -> Diskless ) repl( Off -> Established )
May 25 19:14:14 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160/0 drbd1009 flex-107.dr.avaya.com: strategy = target-set-bitmap
May 25 19:14:14 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160/0 drbd1009 flex-107.dr.avaya.com: drbd_sync_handshake:
May 25 19:14:14 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160/0 drbd1009 flex-107.dr.avaya.com: self 93E3EF437B3C9C06:0000000000000000:CB4D79960DC2245A:43030D49A0AAB1B8 bits:333702 flags:104
May 25 19:14:14 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160/0 drbd1009 flex-107.dr.avaya.com: peer 2744834DBA493B50:0000000000000000:93E3EF437B3C9C06:CB4D79960DC2245A bits:2589 flags:1100
May 25 19:14:14 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160/0 drbd1009 flex-107.dr.avaya.com: uuid_compare()=target-set-bitmap by rule=history-peer
May 25 19:14:14 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160/0 drbd1009 flex-107.dr.avaya.com: Setting and writing one bitmap slot, after drbd_sync_handshake
May 25 19:14:14 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160/0 drbd1009: ASSERTION current != device->resource->worker.task FAILED in drbd_bitmap_io
May 25 19:14:14 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160/0 drbd1009 flex-107.dr.avaya.com: Becoming WFBitMapT because primary is diskless
May 25 19:14:14 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160: State change failed: Can not start OV/resync since it is already active
May 25 19:14:14 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160/0 drbd1009 flex-108.dr.avaya.com: Failed: resync-susp( connection dependency -> no )
May 25 19:14:14 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160/0 drbd1009 flex-107.dr.avaya.com: Failed: repl( SyncTarget -> WFBitMapT )
May 25 19:14:14 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160/0 drbd1009 flex-107.dr.avaya.com: ...postponing this until current resync finished
May 25 19:14:22 flex-106 kubelet[2255]: I0525 19:14:22.787201    2255 reconciler.go:224] "operationExecutor.VerifyControllerAttachedVolume started for volume \"pvc-353143c5-e55d-4c75-98be-5248124dd160\" (UniqueName: \"kubernetes.io/csi/linstor.csi.linbit.com^pvc-353143c5-e55d-4c75-98be-5248124dd160\") pod \"alarming-db-service-pgo-repo-host-0\" (UID: \"617bdf9e-7c3d-4263-9799-2a89567677d3\") "
May 25 19:14:22 flex-106 kubelet[2255]: E0525 19:14:22.787266    2255 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/csi/linstor.csi.linbit.com^pvc-353143c5-e55d-4c75-98be-5248124dd160 podName: nodeName:}" failed. No retries permitted until 2022-05-25 19:14:23.28724099 -0600 MDT m=+5071.728082294 (durationBeforeRetry 500ms). Error: Volume has not been added to the list of VolumesInUse in the node's volume status for volume "pvc-353143c5-e55d-4c75-98be-5248124dd160" (UniqueName: "kubernetes.io/csi/linstor.csi.linbit.com^pvc-353143c5-e55d-4c75-98be-5248124dd160") pod "alarming-db-service-pgo-repo-host-0" (UID: "617bdf9e-7c3d-4263-9799-2a89567677d3")
May 25 19:14:22 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160/0 drbd1009 flex-107.dr.avaya.com: BAD! enr=0 rs_left=-1 rs_failed=0 count=1 cstate=Connected SyncTarget
May 25 19:14:22 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160 flex-108.dr.avaya.com: peer( Primary -> Secondary )
May 25 19:14:23 flex-106 kubelet[2255]: I0525 19:14:23.293106    2255 reconciler.go:224] "operationExecutor.VerifyControllerAttachedVolume started for volume \"pvc-353143c5-e55d-4c75-98be-5248124dd160\" (UniqueName: \"kubernetes.io/csi/linstor.csi.linbit.com^pvc-353143c5-e55d-4c75-98be-5248124dd160\") pod \"alarming-db-service-pgo-repo-host-0\" (UID: \"617bdf9e-7c3d-4263-9799-2a89567677d3\") "
May 25 19:14:23 flex-106 kubelet[2255]: E0525 19:14:23.293189    2255 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/csi/linstor.csi.linbit.com^pvc-353143c5-e55d-4c75-98be-5248124dd160 podName: nodeName:}" failed. No retries permitted until 2022-05-25 19:14:24.293174982 -0600 MDT m=+5072.734016279 (durationBeforeRetry 1s). Error: Volume has not been added to the list of VolumesInUse in the node's volume status for volume "pvc-353143c5-e55d-4c75-98be-5248124dd160" (UniqueName: "kubernetes.io/csi/linstor.csi.linbit.com^pvc-353143c5-e55d-4c75-98be-5248124dd160") pod "alarming-db-service-pgo-repo-host-0" (UID: "617bdf9e-7c3d-4263-9799-2a89567677d3")
May 25 19:14:24 flex-106 kubelet[2255]: I0525 19:14:24.301762    2255 reconciler.go:224] "operationExecutor.VerifyControllerAttachedVolume started for volume \"pvc-353143c5-e55d-4c75-98be-5248124dd160\" (UniqueName: \"kubernetes.io/csi/linstor.csi.linbit.com^pvc-353143c5-e55d-4c75-98be-5248124dd160\") pod \"alarming-db-service-pgo-repo-host-0\" (UID: \"617bdf9e-7c3d-4263-9799-2a89567677d3\") "
May 25 19:14:24 flex-106 kubelet[2255]: E0525 19:14:24.301836    2255 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/csi/linstor.csi.linbit.com^pvc-353143c5-e55d-4c75-98be-5248124dd160 podName: nodeName:}" failed. No retries permitted until 2022-05-25 19:14:26.301822172 -0600 MDT m=+5074.742663467 (durationBeforeRetry 2s). Error: Volume has not been added to the list of VolumesInUse in the node's volume status for volume "pvc-353143c5-e55d-4c75-98be-5248124dd160" (UniqueName: "kubernetes.io/csi/linstor.csi.linbit.com^pvc-353143c5-e55d-4c75-98be-5248124dd160") pod "alarming-db-service-pgo-repo-host-0" (UID: "617bdf9e-7c3d-4263-9799-2a89567677d3")
May 25 19:14:26 flex-106 kubelet[2255]: I0525 19:14:26.319722    2255 reconciler.go:224] "operationExecutor.VerifyControllerAttachedVolume started for volume \"pvc-353143c5-e55d-4c75-98be-5248124dd160\" (UniqueName: \"kubernetes.io/csi/linstor.csi.linbit.com^pvc-353143c5-e55d-4c75-98be-5248124dd160\") pod \"alarming-db-service-pgo-repo-host-0\" (UID: \"617bdf9e-7c3d-4263-9799-2a89567677d3\") "
May 25 19:14:26 flex-106 kubelet[2255]: E0525 19:14:26.324195    2255 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/csi/linstor.csi.linbit.com^pvc-353143c5-e55d-4c75-98be-5248124dd160 podName: nodeName:}" failed. No retries permitted until 2022-05-25 19:14:30.324175949 -0600 MDT m=+5078.765017244 (durationBeforeRetry 4s). Error: Volume not attached according to node status for volume "pvc-353143c5-e55d-4c75-98be-5248124dd160" (UniqueName: "kubernetes.io/csi/linstor.csi.linbit.com^pvc-353143c5-e55d-4c75-98be-5248124dd160") pod "alarming-db-service-pgo-repo-host-0" (UID: "617bdf9e-7c3d-4263-9799-2a89567677d3")
May 25 19:14:30 flex-106 kubelet[2255]: I0525 19:14:30.347525    2255 reconciler.go:224] "operationExecutor.VerifyControllerAttachedVolume started for volume \"pvc-353143c5-e55d-4c75-98be-5248124dd160\" (UniqueName: \"kubernetes.io/csi/linstor.csi.linbit.com^pvc-353143c5-e55d-4c75-98be-5248124dd160\") pod \"alarming-db-service-pgo-repo-host-0\" (UID: \"617bdf9e-7c3d-4263-9799-2a89567677d3\") "
May 25 19:14:30 flex-106 kubelet[2255]: I0525 19:14:30.354760    2255 operation_generator.go:1523] Controller attach succeeded for volume "pvc-353143c5-e55d-4c75-98be-5248124dd160" (UniqueName: "kubernetes.io/csi/linstor.csi.linbit.com^pvc-353143c5-e55d-4c75-98be-5248124dd160") pod "alarming-db-service-pgo-repo-host-0" (UID: "617bdf9e-7c3d-4263-9799-2a89567677d3") device path: ""
May 25 19:14:30 flex-106 kubelet[2255]: I0525 19:14:30.448399    2255 reconciler.go:269] "operationExecutor.MountVolume started for volume \"pvc-353143c5-e55d-4c75-98be-5248124dd160\" (UniqueName: \"kubernetes.io/csi/linstor.csi.linbit.com^pvc-353143c5-e55d-4c75-98be-5248124dd160\") pod \"alarming-db-service-pgo-repo-host-0\" (UID: \"617bdf9e-7c3d-4263-9799-2a89567677d3\") "
May 25 19:14:30 flex-106 kubelet[2255]: I0525 19:14:30.448643    2255 operation_generator.go:587] MountVolume.WaitForAttach entering for volume "pvc-353143c5-e55d-4c75-98be-5248124dd160" (UniqueName: "kubernetes.io/csi/linstor.csi.linbit.com^pvc-353143c5-e55d-4c75-98be-5248124dd160") pod "alarming-db-service-pgo-repo-host-0" (UID: "617bdf9e-7c3d-4263-9799-2a89567677d3") DevicePath ""
May 25 19:14:30 flex-106 kubelet[2255]: I0525 19:14:30.453864    2255 operation_generator.go:597] MountVolume.WaitForAttach succeeded for volume "pvc-353143c5-e55d-4c75-98be-5248124dd160" (UniqueName: "kubernetes.io/csi/linstor.csi.linbit.com^pvc-353143c5-e55d-4c75-98be-5248124dd160") pod "alarming-db-service-pgo-repo-host-0" (UID: "617bdf9e-7c3d-4263-9799-2a89567677d3") DevicePath "csi-a9e5b60d7e750fc54fdefe9a0b376ceb20ade3d5650f9c5533faaf827c6ac10e"
May 25 19:14:30 flex-106 kubelet[2255]: I0525 19:14:30.457200    2255 operation_generator.go:630] MountVolume.MountDevice succeeded for volume "pvc-353143c5-e55d-4c75-98be-5248124dd160" (UniqueName: "kubernetes.io/csi/linstor.csi.linbit.com^pvc-353143c5-e55d-4c75-98be-5248124dd160") pod "alarming-db-service-pgo-repo-host-0" (UID: "617bdf9e-7c3d-4263-9799-2a89567677d3") device mount path "/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-353143c5-e55d-4c75-98be-5248124dd160/globalmount"
May 25 19:14:30 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160: Preparing cluster-wide state change 266334151 (0->-1 3/1)
May 25 19:14:30 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160: State change 266334151: primary_nodes=1, weak_nodes=FFFFFFFFFFFFFFF8
May 25 19:14:30 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160: Committing cluster-wide state change 266334151 (1ms)
May 25 19:14:30 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160: role( Secondary -> Primary )
May 25 19:14:30 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160: role( Primary -> Secondary )
May 25 19:14:30 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160: Preparing cluster-wide state change 195577500 (0->-1 3/1)
May 25 19:14:30 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160: State change 195577500: primary_nodes=1, weak_nodes=FFFFFFFFFFFFFFF8
May 25 19:14:30 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160: Committing cluster-wide state change 195577500 (1ms)
May 25 19:14:30 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160: role( Secondary -> Primary )
May 25 19:14:30 flex-106 kubelet[2255]: I0525 19:14:30.682158    2255 operation_generator.go:712] MountVolume.SetUp succeeded for volume "pvc-353143c5-e55d-4c75-98be-5248124dd160" (UniqueName: "kubernetes.io/csi/linstor.csi.linbit.com^pvc-353143c5-e55d-4c75-98be-5248124dd160") pod "alarming-db-service-pgo-repo-host-0" (UID: "617bdf9e-7c3d-4263-9799-2a89567677d3")
May 25 20:00:05 flex-106 kernel: drbd pvc-353143c5-e55d-4c75-98be-5248124dd160/0 drbd1009 flex-107.dr.avaya.com: BAD! enr=1 rs_left=-55 rs_failed=0 count=55 cstate=Connected SyncTarget

[root@flex-103 cust]# k exec -n piraeus piraeus-op-piraeus-operator-ns-node-4vmfm -c linstor-satellite -- drbdadm dstate pvc-353143c5-e55d-4c75-98be-5248124dd160
Inconsistent/Diskless/UpToDate
[root@flex-103 cust]# k exec -n piraeus piraeus-op-piraeus-operator-ns-node-4vmfm -c linstor-satellite -- drbdadm cstate pvc-353143c5-e55d-4c75-98be-5248124dd160
Connected
Connected
[root@flex-103 ~]# k exec -n piraeus piraeus-op-piraeus-operator-ns-node-4vmfm -c linstor-satellite -- drbdadm status pvc-353143c5-e55d-4c75-98be-5248124dd160 --verbose
drbdsetup status pvc-353143c5-e55d-4c75-98be-5248124dd160
pvc-353143c5-e55d-4c75-98be-5248124dd160 role:Primary
  disk:Inconsistent
  flex-107.dr.avaya.com role:Secondary
    replication:SyncTarget peer-disk:UpToDate done:64.12
  flex-108.dr.avaya.com role:Secondary
    peer-disk:Diskless peer-client:yes resync-suspended:dependency


[root@flex-103 ~]# k exec -n piraeus piraeus-op-piraeus-operator-ns-node-4vmfm -c linstor-satellite -- drbdsetup status pvc-353143c5-e55d-4c75-98be-5248124dd160
pvc-353143c5-e55d-4c75-98be-5248124dd160 role:Primary
  disk:Inconsistent
  flex-107.dr.avaya.com role:Secondary
    replication:SyncTarget peer-disk:UpToDate done:64.12
  flex-108.dr.avaya.com role:Secondary
    peer-disk:Diskless peer-client:yes resync-suspended:dependency
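
The status output above can be checked mechanically for peers that are still resyncing. The following is a minimal sketch, assuming the indented key:value text layout shown above stays stable; the function names parse_status and syncing_peers are hypothetical helpers, not part of drbd-utils. If your drbd-utils version offers machine-readable output, that is likely more robust than parsing text.

```python
# Sample copied from the `drbdsetup status` output in this report.
STATUS_TEXT = """\
pvc-353143c5-e55d-4c75-98be-5248124dd160 role:Primary
  disk:Inconsistent
  flex-107.dr.avaya.com role:Secondary
    replication:SyncTarget peer-disk:UpToDate done:64.12
  flex-108.dr.avaya.com role:Secondary
    peer-disk:Diskless peer-client:yes resync-suspended:dependency
"""


def parse_status(text):
    """Parse the indented status text into a nested dict (hypothetical helper)."""
    resource = {"name": None, "fields": {}, "peers": {}}
    current_peer = None
    for line in text.splitlines():
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip(" "))
        tokens = line.split()
        # Tokens with ':' are key:value pairs; a token without ':' is a name.
        fields = dict(t.split(":", 1) for t in tokens if ":" in t)
        names = [t for t in tokens if ":" not in t]
        if indent == 0:
            resource["name"] = names[0]
            resource["fields"].update(fields)
        elif indent == 2 and names:
            current_peer = names[0]
            resource["peers"][current_peer] = fields
        elif indent == 2:
            resource["fields"].update(fields)
        elif current_peer is not None:
            resource["peers"][current_peer].update(fields)
    return resource


def syncing_peers(resource):
    """Return (peer, percent done) for every connection reported as SyncTarget."""
    return [
        (peer, float(fields.get("done", "nan")))
        for peer, fields in resource["peers"].items()
        if fields.get("replication") == "SyncTarget"
    ]


status = parse_status(STATUS_TEXT)
stuck_candidates = syncing_peers(status)
```

Sampling this twice, some minutes apart, and comparing the done value would show whether the resync is actually progressing or stuck at the same percentage.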


The pod using this PVC is deployed on the diskful node that holds this SyncTarget resource:

[root@flex-103 ~]# k get pods -o wide | grep alarming-db-service-pgo-repo-host-0
alarming-db-service-pgo-repo-host-0                               1/1     Running     0               6h8m    10.200.56.55     flex-106.dr.avaya.com   <none>           <none>
[root@flex-103 cust]# k exec -n piraeus piraeus-op-piraeus-operator-ns-node-4vmfm -c linstor-satellite -- drbdsetup show pvc-353143c5-e55d-4c75-98be-5248124dd160 
resource "pvc-353143c5-e55d-4c75-98be-5248124dd160" {
    options {
        quorum          	majority;
        on-no-quorum    	io-error;
    }
    _this_host {
        node-id			0;
        volume 0 {
            device			minor 1009;
            disk			"/dev/vg_sds/pvc-353143c5-e55d-4c75-98be-5248124dd160_00000";
            meta-disk			internal;
            disk {
                rs-discard-granularity	262144; # bytes
            }
        }
    }
    connection {
        _peer_node_id 1;
        path {
            _this_host ipv4 10.129.185.106:7009;
            _remote_host ipv4 10.129.185.107:7009;
        }
        net {
            max-epoch-size  	10000;
            cram-hmac-alg   	"sha1";
            shared-secret   	"HK+1vTym1caM9w4aR3uh";
            verify-alg      	"crct10dif-pclmul";
            max-buffers     	10000;
            _name           	"flex-107.dr.avaya.com";
        }
    }
    connection {
        _peer_node_id 2;
        path {
            _this_host ipv4 10.129.185.106:7009;
            _remote_host ipv4 10.129.185.108:7009;
        }
        net {
            max-epoch-size  	10000;
            cram-hmac-alg   	"sha1";
            shared-secret   	"HK+1vTym1caM9w4aR3uh";
            verify-alg      	"crct10dif-pclmul";
            max-buffers     	10000;
            _name           	"flex-108.dr.avaya.com";
        }
        volume 0 {
            disk {
                bitmap          	no;
            }
        }
    }
}

[root@flex-103 ~]# k get pods -n piraeus -o wide | grep ns
piraeus-op-piraeus-operator-ns-node-4vmfm                     2/2     Running   4 (7h35m ago)   9h      10.129.185.106   flex-106.dr.avaya.com   <none>           <none>
piraeus-op-piraeus-operator-ns-node-jrhwv                     2/2     Running   2 (168m ago)    9h      10.129.185.108   flex-108.dr.avaya.com   <none>           <none>
piraeus-op-piraeus-operator-ns-node-nhrxp                     2/2     Running   2 (171m ago)    9h      10.129.185.107   flex-107.dr.avaya.com   <none>           <none>


To avoid clutter, I have attached all logs related to this PVC below.

Attached are the DRBD states, the DRBD kernel logs, the StorageClass definition, and the linstor r l output:

drbadm-status-verbose-on-replica-disk-node-107.log
drbdadm-cstate-disk-node-106.log
drbdadm-cstate-diskless.log
drbdadm-cstate-replica-disk-node-107.log
drbdadm-dstate-disk-node-106.log
drbdadm-dstate-diskless.log
drbdadm-dstate-replica-disk-node-107.log
drbdadm-show-resource-on-InUse.log
linstor-resource-list.log
node-associated-with-pod-using-this-pvc.log
pvc-description.log
sc-info.yaml.log
diskless-node-stuck-target-108.log
disk-node-stuck-target-primary-106.log
disk-node-stuck-target-secondary-107.log
drbadm-status-verbose-on-diskless-node-108.log
drbadm-status-verbose-on-InUse-node.log

Bug in drbd 9.1.5 on CentOS 7

Hello.
I am not sure this is a DRBD problem, but after I upgraded the package kmod-drbd90-9.1.4 -> kmod-drbd90-9.1.5 from elrepo, I get an error in my messages log on an md raid.

My block stack is:
mdraid -> lvm -> drbd -> vdo -> lvm

I only have trouble with RAID levels that use chunks (usually 512K in size): raid0 and raid10. With raid1 there is no problem.
Could you please give me a hint as to where the error might be?

Feb 11 02:48:58 arh kernel: md/raid10:md124: make_request bug: can't convert block across chunks or bigger than 512k 2755544 32
Feb 11 02:48:58 arh kernel: drbd r1/0 drbd2: disk( UpToDate -> Failed )
Feb 11 02:48:58 arh kernel: drbd r1/0 drbd2: Local IO failed in drbd_request_endio. Detaching...
Feb 11 02:48:58 arh kernel: drbd r1/0 drbd2: local READ IO error sector 2752472+64 on dm-3
Feb 11 02:48:58 arh kernel: drbd r1/0 drbd2: sending new current UUID: 3E82544B6FC832F1
Feb 11 02:48:59 arh kernel: drbd r1/0 drbd2: disk( Failed -> Diskless )
Feb 11 02:48:59 arh kernel: drbd r1/0 drbd2: Should have called drbd_al_complete_io(, 4294724168, 4096), but my Disk seems to have failed :(

After this, the primary ran in diskless mode. If the primary is on raid1, everything works normally and the secondary stays UpToDate, even if the secondary is on a raid0.
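
The md message can be checked with a little arithmetic. This is a sketch under an assumption: that the two trailing numbers in the raid10 line are the start sector (2755544) and the request size in KiB (32). That reading is consistent with the DRBD line, which reports a 64-sector read, but it is my interpretation of the log, not something stated in the report. The helper name crosses_chunk is hypothetical.

```python
SECTOR_BYTES = 512


def crosses_chunk(start_sector, n_sectors, chunk_sectors):
    """Return True if the request [start, start + n) spans more than one chunk."""
    offset_in_chunk = start_sector % chunk_sectors
    return offset_in_chunk + n_sectors > chunk_sectors


# 512 KiB chunk size, as stated in the report.
chunk_sectors = 512 * 1024 // SECTOR_BYTES      # 1024 sectors
# Values taken from the md/raid10 log line (assumed meaning: sector, size in KiB).
start_sector = 2755544
request_sectors = 32 * 1024 // SECTOR_BYTES     # 64 sectors

offset = start_sector % chunk_sectors           # position inside the chunk
spans_boundary = crosses_chunk(start_sector, request_sectors, chunk_sectors)
```

Under that assumption the request starts 984 sectors into a 1024-sector chunk and is 64 sectors long, so it would extend past the chunk boundary. That matches the wording of the md error and would explain why raid1, which has no chunks, is not affected.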

drbd90-utils-9.19.1-1.el7.elrepo.x86_64
kmod-drbd90-9.1.5-1.el7_9.elrepo.x86_64

I have not tried reverting to kmod 9.1.4 yet, but with the previous kernel and 9.1.5 I get the same error.

Full resync always stuck in congested (behind) state after few days

Hi,

The DRBD resource always gets stuck in the Behind state, and the sync status starts decreasing (98.20 -> 98.19 ... 98.12%) after 2-3 days when the on-congestion policy is "pull-ahead". There is no entry in the kernel logs about congestion-fill or congestion-extents being reached, as there is when the configured limit is hit.

I tried increasing congestion-fill to very large values (100M -> 200M, 500M, or 0 to disable it) and congestion-extents (to a value even higher than al-extents), and also commenting them out of the configuration completely, but nothing helped; the outcome was the same.
Commenting out on-congestion pull-ahead (switching to the default, block) helps, and the resync starts progressing again.

When congested, the logs on the primary fill up with thousands of identical entries in a loop:
[ +0.104537] drbd storage/0 drbd1 backup-dc: repl( PausedSyncS -> Ahead )
[ +1.026862] drbd storage/0 drbd1 backup-dc: helper command: /sbin/drbdadm before-resync-source
[ +0.002791] drbd storage/0 drbd1 backup-dc: helper command: /sbin/drbdadm before-resync-source exit code 0
[ +0.001076] drbd storage/0 drbd1 backup-dc: repl( Ahead -> PausedSyncS )
[ +0.000718] drbd storage/0 drbd1 backup-dc: Began resync as PausedSyncS (will sync 12844472428 KB [3211118107 bits set]).
[ +0.043059] drbd storage/0 drbd1 backup-dc: repl( PausedSyncS -> Ahead )
[ +1.040371] drbd storage/0 drbd1 backup-dc: helper command: /sbin/drbdadm before-resync-source
[ +0.004944] drbd storage/0 drbd1 backup-dc: helper command: /sbin/drbdadm before-resync-source exit code 0
[ +0.000894] drbd storage/0 drbd1 backup-dc: repl( Ahead -> PausedSyncS )
[ +0.000695] drbd storage/0 drbd1 backup-dc: Began resync as PausedSyncS (will sync 12844472428 KB [3211118107 bits set]).
[ +0.098964] drbd storage/0 drbd1 backup-dc: repl( PausedSyncS -> Ahead )
[ +1.046465] drbd storage/0 drbd1 backup-dc: helper command: /sbin/drbdadm before-resync-source
[ +0.003419] drbd storage/0 drbd1 backup-dc: helper command: /sbin/drbdadm before-resync-source exit code 0
[ +0.004561] drbd storage/0 drbd1 backup-dc: repl( Ahead -> PausedSyncS )
[ +0.010005] drbd storage/0 drbd1 backup-dc: Began resync as PausedSyncS (will sync 12844472428 KB [3211118107 bits set]).
[ +0.046983] drbd storage/0 drbd1 backup-dc: repl( PausedSyncS -> Ahead )
[ +1.022996] drbd storage/0 drbd1 backup-dc: helper command: /sbin/drbdadm before-resync-source
[ +0.006174] drbd storage/0 drbd1 backup-dc: helper command: /sbin/drbdadm before-resync-source exit code 0
[ +0.011331] drbd storage/0 drbd1 backup-dc: repl( Ahead -> PausedSyncS )
[ +0.009500] drbd storage/0 drbd1 backup-dc: Began resync as PausedSyncS (will sync 12844472428 KB [3211118107 bits set]).
[ +0.264396] drbd storage/0 drbd1 backup-dc: repl( PausedSyncS -> Ahead )
[ +1.052604] drbd storage/0 drbd1 backup-dc: helper command: /sbin/drbdadm before-resync-source
[ +0.004883] drbd storage/0 drbd1 backup-dc: helper command: /sbin/drbdadm before-resync-source exit code 0
[ +0.008559] drbd storage/0 drbd1 backup-dc: repl( Ahead -> PausedSyncS )
[ +0.008532] drbd storage/0 drbd1 backup-dc: Began resync as PausedSyncS (will sync 12844472428 KB [3211118107 bits set]).
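
To quantify the loop above, the replication state transitions can be counted from a saved kernel log. This is a minimal sketch, assuming the "repl( A -> B )" wording seen in these lines; the function name count_transitions and the SAMPLE list (four lines copied from the log above) are illustrative only.

```python
import re
from collections import Counter

# Matches the "repl( Old -> New )" fragment of a DRBD kernel log line.
TRANSITION = re.compile(r"repl\( (\w+) -> (\w+) \)")


def count_transitions(lines):
    """Count each (old_state, new_state) replication transition in the log."""
    counts = Counter()
    for line in lines:
        match = TRANSITION.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


SAMPLE = [
    "[ +0.104537] drbd storage/0 drbd1 backup-dc: repl( PausedSyncS -> Ahead )",
    "[ +1.026862] drbd storage/0 drbd1 backup-dc: helper command: /sbin/drbdadm before-resync-source",
    "[ +0.001076] drbd storage/0 drbd1 backup-dc: repl( Ahead -> PausedSyncS )",
    "[ +0.043059] drbd storage/0 drbd1 backup-dc: repl( PausedSyncS -> Ahead )",
]

counts = count_transitions(SAMPLE)
```

A roughly equal, steadily growing count for both directions would confirm that the connection is flapping between Ahead and PausedSyncS about once per second rather than making resync progress.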

ENV: DRBD 9.1.7, Oracle Linux 8.6 (latest updates, but the same happens with packages a few months old)
The full configuration is attached. Congestion is only configured for the backup storage node (backup-dc) because it is much slower.
storage.txt

DRBD 9.1.12 drbdsetup completely frozen

One resource got into a bad state during the night. After trying to remove (linstor r d) the resource on a node with an outdated state, it is no longer possible to check the status with drbdadm or drbdsetup on the primary node (satellite). drbdadm times out, while drbdsetup just freezes.

root@de-fra-node11:/# drbdadm status pvc-976cacaa-af84-4398-814d-d4745288e81a
Command 'drbdsetup status pvc-976cacaa-af84-4398-814d-d4745288e81a' did not terminate within 5 seconds
root@de-fra-node11:/# drbdsetup show pvc-976cacaa-af84-4398-814d-d4745288e81a

All other resources on the node work just fine and drbdadm + drbdsetup work as expected.
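
For monitoring scripts that call drbdsetup directly, a timeout wrapper similar to the 5-second limit drbdadm applies can keep one frozen resource from blocking checks of the others. This is a sketch using only the Python standard library; run_with_timeout is a hypothetical helper. One caveat: if the child is stuck in uninterruptible kernel sleep, it cannot be killed, and the wrapper itself may then block while waiting for it.

```python
import subprocess
import sys


def run_with_timeout(argv, timeout_s=5.0):
    """Run argv and return its stdout, or None if it did not finish in time."""
    try:
        done = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before raising TimeoutExpired.
        return None
    return done.stdout


# Stand-in commands so the sketch runs anywhere; on a DRBD node the argv
# would be something like ["drbdsetup", "status", "<resource name>"].
quick = run_with_timeout([sys.executable, "-c", "print('ok')"])
slow = run_with_timeout(
    [sys.executable, "-c", "import time; time.sleep(5)"], timeout_s=0.5
)
```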

This is what the resource looks like in Linstor:

╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Node          ┊ Resource                                 ┊ StoragePool          ┊ VolNr ┊ MinorNr ┊ DeviceName    ┊ Allocated ┊ InUse ┊    State ┊
╞══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ de-fra-node11 ┊ pvc-976cacaa-af84-4398-814d-d4745288e81a ┊ lvm-thin             ┊     0 ┊    1008 ┊ /dev/drbd1008 ┊  1.51 GiB ┊ InUse ┊ UpToDate ┊
┊ de-fra-node55 ┊ pvc-976cacaa-af84-4398-814d-d4745288e81a ┊ DfltDisklessStorPool ┊     0 ┊    1008 ┊ /dev/drbd1008 ┊           ┊       ┊  Unknown ┊
┊ de-fra-node56 ┊ pvc-976cacaa-af84-4398-814d-d4745288e81a ┊ lvm-thin             ┊     0 ┊    1008 ┊ None          ┊           ┊       ┊    Error ┊
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
ERROR:
Description:
    Node: 'de-fra-node56', resource: 'pvc-976cacaa-af84-4398-814d-d4745288e81a', volume: 0 - The device provider generated a StorageException. Error report number: 638F688E-C6205-000027
Cause:
    The volume could not be found on the system.

drbdmon still shows the resource:

⤷RES: pvc-976cacaa-af84-4398-814d-d4745288e81a         ⚫Primary     QUORUM LOST
    ✗    0:   1008  UpToDate            
    ⤷↯ de-fra-node55                                    Connecting            Unknown   
    ⤷↯ de-fra-node56                                    Disconnecting         Unknown   
        ✗    0  Outdated             Off  

We are using Piraeus with Linstor 1.20 and DRBD 9.1.12 on Ubuntu 20.04 LTS. Before DRBD 9.1.12 we never saw a freezing drbdsetup call.

We experienced the same situation with another resource. The only way to get out of this situation was to reboot the node.

Troubles while using DRBD 9.2.4

I ran into three problems while using DRBD 9.2.4.
They are probably bugs in DRBD, but please understand that they could also be bugs in the OS (AlmaLinux) or in the software (Bind or Squid).

System Environment

  • Hyper-V
    • Almalinux 8.8
      • DRBD 9.2.4
      • Bind 9.18.15 or Squid 5.9
      • Pacemaker 2.1.5
      • pcs 0.10.15
      • corosync 3.1.7

1. Replacing device files

State

The device names /dev/sda and /dev/sdb were swapped, and the DRBD mount managed by Pacemaker then failed.

ex. Normal

/dev/sda
|- /dev/sda1
|- /dev/sda2
 |-/dev/mapper/almalinux-root
 |-/dev/mapper/almalinux-swap
 |-/dev/mapper/almalinux-home
/dev/sdb
|- /dev/sdb1
 |- /dev/drbd0
/dev/sr0

ex. Abnormal

/dev/sda
|- /dev/sda1
/dev/sdb
|- /dev/sdb1
|- /dev/sdb2
 |-/dev/mapper/almalinux-root
 |-/dev/mapper/almalinux-swap
 |-/dev/mapper/almalinux-home
/dev/sr0

Temporary approach

This may be due to the udev-always-use-vnr option (specification?) being enabled by default in global_common.conf, so I commented it out.

2. Not recognizing the service user

State

The DRBD mount point was /mnt, and I had run chown named:named /mnt. The owner and group of /mnt then appeared as numeric IDs, the service user was not recognized, and the named service could no longer read or write files in /mnt.

ex. Normal

drwxrwxr-x 2 named named … mnt

ex. Abnormal

drwxrwxr-x 2 25 25 … mnt

Temporary approach

I changed to using chmod instead of chown.
The root user was always recognized without problems, so specifying a default Linux user (nobody, etc.) in chown might also be fine.
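
File ownership is stored on the filesystem as numeric user and group IDs, and tools such as ls print the number when the local system has no account for that ID. One possible explanation for the listing above, stated here as an assumption and not confirmed by the report, is that the node where /mnt showed 25 had no named account with that ID at that moment. The sketch below shows the lookup with a numeric fallback; owner_label is a hypothetical helper built on the standard pwd module.

```python
import pwd


def owner_label(uid, lookup=pwd.getpwuid):
    """Return the user name for uid, or the number as text if no account exists."""
    try:
        return lookup(uid).pw_name
    except KeyError:
        # No passwd entry on this node: fall back to the numeric ID.
        return str(uid)


# Example use on a real path (not executed here):
#   import os
#   print(owner_label(os.stat("/mnt").st_uid))
```

Running the same lookup for the ID on both cluster nodes would show whether the account exists, and maps to the same number, on each of them.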

3. Loss of all data in mount point

State

The DRBD mount point was /mnt. When I ran the squid -k rotate command several times in /mnt, "Terminated" was displayed and all files in /mnt were deleted.

Temporary approach

Back up the files in /mnt.

9.2.0-rc-6 Kernel Panic

While testing the performance of nvme-tcp with DRBD, I got a kernel panic.

The client workload was the following fio job:

fio --time_based --name=benchmark --size=4G --runtime=300 --filename=./randwrite --ioengine=libaio --randrepeat=0 --iodepth=32 --direct=1 --invalidate=1 --verify=0 --verify_fatal=0 --numjobs=8 --rw=randwrite --blocksize=4k --group_reporting

Client Side:

[root@drbd-linstor-2 nvm-test]# uname -a
Linux drbd-linstor-2 5.18.12-1.el8.elrepo.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Jul 15 07:10:46 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux

[root@drbd-linstor-2 nvm-test]# modinfo nvme_tcp
filename:       /lib/modules/5.18.12-1.el8.elrepo.x86_64/kernel/drivers/nvme/host/nvme-tcp.ko.xz
license:        GPL v2
srcversion:     E2363EE73FF90031E8C71B3
depends:        nvme-core,nvme-fabrics
retpoline:      Y
intree:         Y
name:           nvme_tcp
vermagic:       5.18.12-1.el8.elrepo.x86_64 SMP preempt mod_unload modversions
parm:           so_priority:nvme tcp socket optimize priority (int)

Server Side:

[root@ac-1f-6b-a4-df-ee ~]# uname -a
Linux ac-1f-6b-a4-df-ee 5.17.9-1.el8.elrepo.x86_64 #1 SMP PREEMPT Tue May 17 16:22:04 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux


nvmetcli ls
o- / ........................................................................................................... [...]
  o- hosts ..................................................................................................... [...]
  o- ports ..................................................................................................... [...]
  | o- 0 .................................... [trtype=tcp, traddr=10.91.230.214, trsvcid=4420, inline_data_size=16384]
  |   o- ana_groups ............................................................................................ [...]
  |   | o- 1 ....................................................................................... [state=optimized]
  |   o- referrals ............................................................................................. [...]
  |   o- subsystems ............................................................................................ [...]
  |     o- linbit:nvme:demo0 ................................................................................... [...]
  o- subsystems ................................................................................................ [...]
    o- linbit:nvme:demo0 ......................................... [version=1.3, allow_any=1, serial=c7499507d3254059]
      o- allowed_hosts ......................................................................................... [...]
      o- namespaces ............................................................................................ [...]
        o- 1  [path=/dev/drbd/by-res/demo0/1, uuid=2c73d2c7-e5b1-5ffd-94f2-dc51e384f49e, nguid=2c73d2c7-e5b1-5ffd-94f2-dc51e384f49e, grpid=1, enabled]


[root@ac-1f-6b-a4-df-ee ~]# modinfo drbd
filename:       /lib/modules/5.17.9-1.el8.elrepo.x86_64/extra/drbd/drbd.ko
alias:          block-major-147-*
license:        GPL
version:        9.2.0-rc.6
description:    drbd - Distributed Replicated Block Device v9.2.0-rc.6
author:         Philipp Reisner <[email protected]>, Lars Ellenberg <[email protected]>
srcversion:     860BC782A49F832D4EBBB95
depends:        libcrc32c
retpoline:      Y
name:           drbd
vermagic:       5.17.9-1.el8.elrepo.x86_64 SMP preempt mod_unload modversions
parm:           enable_faults:int
parm:           fault_rate:int
parm:           fault_count:int
parm:           fault_devs:int
parm:           disable_sendpage:bool
parm:           allow_oos:DONT USE! (bool)
parm:           minor_count:Approximate number of drbd devices (1-255) (uint)
parm:           usermode_helper:string
parm:           protocol_version_min:drbd_protocol_version



[root@ac-1f-6b-a4-df-ee ~]# modinfo zfs
filename:       /lib/modules/5.17.9-1.el8.elrepo.x86_64/extra/zfs.ko.xz
version:        2.1.4-1
license:        CDDL
author:         OpenZFS
description:    ZFS
alias:          devname:zfs
alias:          char-major-10-249
srcversion:     1F136E6AD43697243DB8CE6
depends:        spl,znvpair,icp,zlua,zzstd,zunicode,zcommon,zavl
retpoline:      Y
name:           zfs
vermagic:       5.17.9-1.el8.elrepo.x86_64 SMP preempt mod_unload modversions
sig_id:         PKCS#7
signer:         DKMS module signing key
sig_key:        02:E0:9D:99:71:16:71:B9:8A:82:95:56:5E:D3:82:31:61:36:CC:89
sig_hashalgo:   sha512
signature:      9A:58:52:2F:CD:27:DB:CF:4E:36:94:5C:F9:17:18:BA:57:1C:22:35:
                F6:03:9B:A8:7B:C7:CD:A0:AC:AA:C1:FE:10:A5:85:0E:3F:6B:AA:44:
                8D:1C:75:F0:05:48:41:A9:13:2A:04:2C:8A:90:CE:FF:09:E5:4C:90:
                75:FD:F7:41:A7:BA:B1:BE:4A:67:9A:C1:C6:DE:53:05:58:54:25:3E:
                05:C0:BD:80:06:A8:FF:36:CD:15:3F:D7:BE:0D:70:AE:5F:E5:E1:2C:
                61:E3:D8:1A:7C:8C:5D:92:82:40:7A:C2:4A:8B:1C:E1:E0:48:DC:4C:
                92:04:BE:30:92:09:2A:F9:79:58:4D:C3:24:90:9B:51:B5:72:7D:20:
                96:01:85:A9:A5:AD:B3:0D:32:76:E7:67:63:FB:F9:48:4D:03:C9:3E:
                17:8B:C0:9E:F6:7C:D8:7C:1E:FD:E0:6F:E9:DC:E4:1E:9C:13:1D:63:
                65:2A:72:F1:FD:12:4A:2A:63:2E:81:03:64:29:38:D1:BA:5D:C6:7F:
                CA:11:70:D2:CC:3F:7A:66:DE:52:AD:F9:8D:ED:D5:22:76:A3:75:75:
                E2:FB:66:37:62:C3:7B:E3:0E:08:47:F6:FC:2A:AB:BC:6D:F7:73:73:
                F4:66:DF:4B:B7:39:B1:4C:DE:19:32:25:F3:D9:47:5F
parm:           zvol_inhibit_dev:Do not create zvol device nodes (uint)
parm:           zvol_major:Major number for zvol device (uint)
parm:           zvol_threads:Max number of threads to handle I/O requests (uint)
parm:           zvol_request_sync:Synchronously handle bio requests (uint)
parm:           zvol_max_discard_blocks:Max number of blocks to discard (ulong)
parm:           zvol_prefetch_bytes:Prefetch N bytes at zvol start+end (uint)
parm:           zvol_volmode:Default volmode property value (uint)
parm:           zfs_fallocate_reserve_percent:Percentage of length to use for the available capacity check (uint)
parm:           zfs_key_max_salt_uses:Max number of times a salt value can be used for generating encryption keys before it is rotated (ulong)
parm:           zfs_object_mutex_size:Size of znode hold array (uint)
parm:           zfs_unlink_suspend_progress:Set to prevent async unlinks (debug - leaks space into the unlinked set) (int)
parm:           zfs_delete_blocks:Delete files larger than N blocks async (ulong)
parm:           zfs_dbgmsg_enable:Enable ZFS debug message log (int)
parm:           zfs_dbgmsg_maxsize:Maximum ZFS debug log size (int)
parm:           zfs_admin_snapshot:Enable mkdir/rmdir/mv in .zfs/snapshot (int)
parm:           zfs_expire_snapshot:Seconds to expire .zfs/snapshot (int)
parm:           vdev_file_logical_ashift:Logical ashift for file-based devices (ulong)
parm:           vdev_file_physical_ashift:Physical ashift for file-based devices (ulong)
parm:           zfs_vdev_scheduler:I/O scheduler
parm:           zfs_arc_shrinker_limit:Limit on number of pages that ARC shrinker can reclaim at once (int)
parm:           zfs_abd_scatter_enabled:Toggle whether ABD allocations must be linear. (int)
parm:           zfs_abd_scatter_min_size:Minimum size of scatter allocations. (int)
parm:           zfs_abd_scatter_max_order:Maximum order allocation used for a scatter ABD. (uint)
parm:           zio_slow_io_ms:Max I/O completion time (milliseconds) before marking it as slow (int)
parm:           zio_requeue_io_start_cut_in_line:Prioritize requeued I/O (int)
parm:           zfs_sync_pass_deferred_free:Defer frees starting in this pass (int)
parm:           zfs_sync_pass_dont_compress:Don't compress starting in this pass (int)
parm:           zfs_sync_pass_rewrite:Rewrite new bps starting in this pass (int)
parm:           zio_dva_throttle_enabled:Throttle block allocations in the ZIO pipeline (int)
parm:           zio_deadman_log_all:Log all slow ZIOs, not just those with vdevs (int)
parm:           zfs_commit_timeout_pct:ZIL block open timeout percentage (int)
parm:           zil_replay_disable:Disable intent logging replay (int)
parm:           zil_nocacheflush:Disable ZIL cache flushes (int)
parm:           zil_slog_bulk:Limit in bytes slog sync writes per commit (ulong)
parm:           zil_maxblocksize:Limit in bytes of ZIL log block size (int)
parm:           zfs_vnops_read_chunk_size:Bytes to read per chunk (ulong)
parm:           zfs_immediate_write_sz:Largest data block to write to zil (long)
parm:           zfs_max_nvlist_src_size:Maximum size in bytes allowed for src nvlist passed with ZFS ioctls (ulong)
parm:           zfs_history_output_max:Maximum size in bytes of ZFS ioctl output that will be logged (ulong)
parm:           zfs_zevent_retain_max:Maximum recent zevents records to retain for duplicate checking (uint)
parm:           zfs_zevent_retain_expire_secs:Expiration time for recent zevents records (uint)
parm:           zfs_lua_max_instrlimit:Max instruction limit that can be specified for a channel program (ulong)
parm:           zfs_lua_max_memlimit:Max memory limit that can be specified for a channel program (ulong)
parm:           zap_iterate_prefetch:When iterating ZAP object, prefetch it (int)
parm:           zfs_trim_extent_bytes_max:Max size of TRIM commands, larger will be split (uint)
parm:           zfs_trim_extent_bytes_min:Min size of TRIM commands, smaller will be skipped (uint)
parm:           zfs_trim_metaslab_skip:Skip metaslabs which have never been initialized (uint)
parm:           zfs_trim_txg_batch:Min number of txgs to aggregate frees before issuing TRIM (uint)
parm:           zfs_trim_queue_limit:Max queued TRIMs outstanding per leaf vdev (uint)
parm:           zfs_removal_ignore_errors:Ignore hard IO errors when removing device (int)
parm:           zfs_remove_max_segment:Largest contiguous segment to allocate when removing device (int)
parm:           vdev_removal_max_span:Largest span of free chunks a remap segment can span (int)
parm:           zfs_removal_suspend_progress:Pause device removal after this many bytes are copied (debug use only - causes removal to hang) (int)
parm:           zfs_rebuild_max_segment:Max segment size in bytes of rebuild reads (ulong)
parm:           zfs_rebuild_vdev_limit:Max bytes in flight per leaf vdev for sequential resilvers (ulong)
parm:           zfs_rebuild_scrub_enabled:Automatically scrub after sequential resilver completes (int)
parm:           zfs_vdev_raidz_impl:Select raidz implementation.
parm:           zfs_vdev_aggregation_limit:Max vdev I/O aggregation size (int)
parm:           zfs_vdev_aggregation_limit_non_rotating:Max vdev I/O aggregation size for non-rotating media (int)
parm:           zfs_vdev_aggregate_trim:Allow TRIM I/O to be aggregated (int)
parm:           zfs_vdev_read_gap_limit:Aggregate read I/O over gap (int)
parm:           zfs_vdev_write_gap_limit:Aggregate write I/O over gap (int)
parm:           zfs_vdev_max_active:Maximum number of active I/Os per vdev (int)
parm:           zfs_vdev_async_write_active_max_dirty_percent:Async write concurrency max threshold (int)
parm:           zfs_vdev_async_write_active_min_dirty_percent:Async write concurrency min threshold (int)
parm:           zfs_vdev_async_read_max_active:Max active async read I/Os per vdev (int)
parm:           zfs_vdev_async_read_min_active:Min active async read I/Os per vdev (int)
parm:           zfs_vdev_async_write_max_active:Max active async write I/Os per vdev (int)
parm:           zfs_vdev_async_write_min_active:Min active async write I/Os per vdev (int)
parm:           zfs_vdev_initializing_max_active:Max active initializing I/Os per vdev (int)
parm:           zfs_vdev_initializing_min_active:Min active initializing I/Os per vdev (int)
parm:           zfs_vdev_removal_max_active:Max active removal I/Os per vdev (int)
parm:           zfs_vdev_removal_min_active:Min active removal I/Os per vdev (int)
parm:           zfs_vdev_scrub_max_active:Max active scrub I/Os per vdev (int)
parm:           zfs_vdev_scrub_min_active:Min active scrub I/Os per vdev (int)
parm:           zfs_vdev_sync_read_max_active:Max active sync read I/Os per vdev (int)
parm:           zfs_vdev_sync_read_min_active:Min active sync read I/Os per vdev (int)
parm:           zfs_vdev_sync_write_max_active:Max active sync write I/Os per vdev (int)
parm:           zfs_vdev_sync_write_min_active:Min active sync write I/Os per vdev (int)
parm:           zfs_vdev_trim_max_active:Max active trim/discard I/Os per vdev (int)
parm:           zfs_vdev_trim_min_active:Min active trim/discard I/Os per vdev (int)
parm:           zfs_vdev_rebuild_max_active:Max active rebuild I/Os per vdev (int)
parm:           zfs_vdev_rebuild_min_active:Min active rebuild I/Os per vdev (int)
parm:           zfs_vdev_nia_credit:Number of non-interactive I/Os to allow in sequence (int)
parm:           zfs_vdev_nia_delay:Number of non-interactive I/Os before _max_active (int)
parm:           zfs_vdev_queue_depth_pct:Queue depth percentage for each top-level vdev (int)
parm:           zfs_vdev_mirror_rotating_inc:Rotating media load increment for non-seeking I/O's (int)
parm:           zfs_vdev_mirror_rotating_seek_inc:Rotating media load increment for seeking I/O's (int)
parm:           zfs_vdev_mirror_rotating_seek_offset:Offset in bytes from the last I/O which triggers a reduced rotating media seek increment (int)
parm:           zfs_vdev_mirror_non_rotating_inc:Non-rotating media load increment for non-seeking I/O's (int)
parm:           zfs_vdev_mirror_non_rotating_seek_inc:Non-rotating media load increment for seeking I/O's (int)
parm:           zfs_initialize_value:Value written during zpool initialize (ulong)
parm:           zfs_initialize_chunk_size:Size in bytes of writes by zpool initialize (ulong)
parm:           zfs_condense_indirect_vdevs_enable:Whether to attempt condensing indirect vdev mappings (int)
parm:           zfs_condense_indirect_obsolete_pct:Minimum obsolete percent of bytes in the mapping to attempt condensing (int)
parm:           zfs_condense_min_mapping_bytes:Don't bother condensing if the mapping uses less than this amount of memory (ulong)
parm:           zfs_condense_max_obsolete_bytes:Minimum size obsolete spacemap to attempt condensing (ulong)
parm:           zfs_condense_indirect_commit_entry_delay_ms:Used by tests to ensure certain actions happen in the middle of a condense. A maximum value of 1 should be sufficient. (int)
parm:           zfs_reconstruct_indirect_combinations_max:Maximum number of combinations when reconstructing split segments (int)
parm:           zfs_vdev_cache_max:Inflate reads small than max (int)
parm:           zfs_vdev_cache_size:Total size of the per-disk cache (int)
parm:           zfs_vdev_cache_bshift:Shift size to inflate reads too (int)
parm:           zfs_vdev_default_ms_count:Target number of metaslabs per top-level vdev (int)
parm:           zfs_vdev_default_ms_shift:Default limit for metaslab size (int)
parm:           zfs_vdev_min_ms_count:Minimum number of metaslabs per top-level vdev (int)
parm:           zfs_vdev_ms_count_limit:Practical upper limit of total metaslabs per top-level vdev (int)
parm:           zfs_slow_io_events_per_second:Rate limit slow IO (delay) events to this many per second (uint)
parm:           zfs_checksum_events_per_second:Rate limit checksum events to this many checksum errors per second (do not set below zed threshold). (uint)
parm:           zfs_scan_ignore_errors:Ignore errors during resilver/scrub (int)
parm:           vdev_validate_skip:Bypass vdev_validate() (int)
parm:           zfs_nocacheflush:Disable cache flushes (int)
parm:           zfs_embedded_slog_min_ms:Minimum number of metaslabs required to dedicate one for log blocks (int)
parm:           zfs_vdev_min_auto_ashift:Minimum ashift used when creating new top-level vdevs
parm:           zfs_vdev_max_auto_ashift:Maximum ashift used when optimizing for logical -> physical sector size on new top-level vdevs
parm:           zfs_txg_timeout:Max seconds worth of delta per txg (int)
parm:           zfs_read_history:Historical statistics for the last N reads (int)
parm:           zfs_read_history_hits:Include cache hits in read history (int)
parm:           zfs_txg_history:Historical statistics for the last N txgs (int)
parm:           zfs_multihost_history:Historical statistics for last N multihost writes (int)
parm:           zfs_flags:Set additional debugging flags (uint)
parm:           zfs_recover:Set to attempt to recover from fatal errors (int)
parm:           zfs_free_leak_on_eio:Set to ignore IO errors during free and permanently leak the space (int)
parm:           zfs_deadman_checktime_ms:Dead I/O check interval in milliseconds (ulong)
parm:           zfs_deadman_enabled:Enable deadman timer (int)
parm:           spa_asize_inflation:SPA size estimate multiplication factor (int)
parm:           zfs_ddt_data_is_special:Place DDT data into the special class (int)
parm:           zfs_user_indirect_is_special:Place user data indirect blocks into the special class (int)
parm:           zfs_deadman_failmode:Failmode for deadman timer
parm:           zfs_deadman_synctime_ms:Pool sync expiration time in milliseconds
parm:           zfs_deadman_ziotime_ms:IO expiration time in milliseconds
parm:           zfs_special_class_metadata_reserve_pct:Small file blocks in special vdevs depends on this much free space available (int)
parm:           spa_slop_shift:Reserved free space in pool
parm:           zfs_unflushed_max_mem_amt:Specific hard-limit in memory that ZFS allows to be used for unflushed changes (ulong)
parm:           zfs_unflushed_max_mem_ppm:Percentage of the overall system memory that ZFS allows to be used for unflushed changes (value is calculated over 1000000 for finer granularity) (ulong)
parm:           zfs_unflushed_log_block_max:Hard limit (upper-bound) in the size of the space map log in terms of blocks. (ulong)
parm:           zfs_unflushed_log_block_min:Lower-bound limit for the maximum amount of blocks allowed in log spacemap (see zfs_unflushed_log_block_max) (ulong)
parm:           zfs_unflushed_log_block_pct:Tunable used to determine the number of blocks that can be used for the spacemap log, expressed as a percentage of the total number of metaslabs in the pool (e.g. 400 means the number of log blocks is capped at 4 times the number of metaslabs) (ulong)
parm:           zfs_max_log_walking:The number of past TXGs that the flushing algorithm of the log spacemap feature uses to estimate incoming log blocks (ulong)
parm:           zfs_max_logsm_summary_length:Maximum number of rows allowed in the summary of the spacemap log (ulong)
parm:           zfs_min_metaslabs_to_flush:Minimum number of metaslabs to flush per dirty TXG (ulong)
parm:           zfs_keep_log_spacemaps_at_export:Prevent the log spacemaps from being flushed and destroyed during pool export/destroy (int)
parm:           spa_config_path:SPA config file (/etc/zfs/zpool.cache) (charp)
parm:           zfs_autoimport_disable:Disable pool import at module load (int)
parm:           zfs_spa_discard_memory_limit:Limit for memory used in prefetching the checkpoint space map done on each vdev while discarding the checkpoint (ulong)
parm:           spa_load_verify_shift:log2 fraction of arc that can be used by inflight I/Os when verifying pool during import (int)
parm:           spa_load_verify_metadata:Set to traverse metadata on pool import (int)
parm:           spa_load_verify_data:Set to traverse data on pool import (int)
parm:           spa_load_print_vdev_tree:Print vdev tree to zfs_dbgmsg during pool import (int)
parm:           zio_taskq_batch_pct:Percentage of CPUs to run an IO worker thread (uint)
parm:           zio_taskq_batch_tpq:Number of threads per IO worker taskqueue (uint)
parm:           zfs_max_missing_tvds:Allow importing pool with up to this number of missing top-level vdevs (in read-only mode) (ulong)
parm:           zfs_livelist_condense_zthr_pause:Set the livelist condense zthr to pause (int)
parm:           zfs_livelist_condense_sync_pause:Set the livelist condense synctask to pause (int)
parm:           zfs_livelist_condense_sync_cancel:Whether livelist condensing was canceled in the synctask (int)
parm:           zfs_livelist_condense_zthr_cancel:Whether livelist condensing was canceled in the zthr function (int)
parm:           zfs_livelist_condense_new_alloc:Whether extra ALLOC blkptrs were added to a livelist entry while it was being condensed (int)
parm:           zfs_multilist_num_sublists:Number of sublists used in each multilist (int)
parm:           zfs_multihost_interval:Milliseconds between mmp writes to each leaf
parm:           zfs_multihost_fail_intervals:Max allowed period without a successful mmp write (uint)
parm:           zfs_multihost_import_intervals:Number of zfs_multihost_interval periods to wait for activity (uint)
parm:           metaslab_aliquot:Allocation granularity (a.k.a. stripe size) (ulong)
parm:           metaslab_debug_load:Load all metaslabs when pool is first opened (int)
parm:           metaslab_debug_unload:Prevent metaslabs from being unloaded (int)
parm:           metaslab_preload_enabled:Preload potential metaslabs during reassessment (int)
parm:           metaslab_unload_delay:Delay in txgs after metaslab was last used before unloading (int)
parm:           metaslab_unload_delay_ms:Delay in milliseconds after metaslab was last used before unloading (int)
parm:           zfs_mg_noalloc_threshold:Percentage of metaslab group size that should be free to make it eligible for allocation (int)
parm:           zfs_mg_fragmentation_threshold:Percentage of metaslab group size that should be considered eligible for allocations unless all metaslab groups within the metaslab class have also crossed this threshold (int)
parm:           zfs_metaslab_fragmentation_threshold:Fragmentation for metaslab to allow allocation (int)
parm:           metaslab_fragmentation_factor_enabled:Use the fragmentation metric to prefer less fragmented metaslabs (int)
parm:           metaslab_lba_weighting_enabled:Prefer metaslabs with lower LBAs (int)
parm:           metaslab_bias_enabled:Enable metaslab group biasing (int)
parm:           zfs_metaslab_segment_weight_enabled:Enable segment-based metaslab selection (int)
parm:           zfs_metaslab_switch_threshold:Segment-based metaslab selection maximum buckets before switching (int)
parm:           metaslab_force_ganging:Blocks larger than this size are forced to be gang blocks (ulong)
parm:           metaslab_df_max_search:Max distance (bytes) to search forward before using size tree (int)
parm:           metaslab_df_use_largest_segment:When looking in size tree, use largest segment instead of exact fit (int)
parm:           zfs_metaslab_max_size_cache_sec:How long to trust the cached max chunk size of a metaslab (ulong)
parm:           zfs_metaslab_mem_limit:Percentage of memory that can be used to store metaslab range trees (int)
parm:           zfs_metaslab_try_hard_before_gang:Try hard to allocate before ganging (int)
parm:           zfs_metaslab_find_max_tries:Normally only consider this many of the best metaslabs in each vdev (int)
parm:           zfs_zevent_len_max:Max event queue length (int)
parm:           zfs_scan_vdev_limit:Max bytes in flight per leaf vdev for scrubs and resilvers (ulong)
parm:           zfs_scrub_min_time_ms:Min millisecs to scrub per txg (int)
parm:           zfs_obsolete_min_time_ms:Min millisecs to obsolete per txg (int)
parm:           zfs_free_min_time_ms:Min millisecs to free per txg (int)
parm:           zfs_resilver_min_time_ms:Min millisecs to resilver per txg (int)
parm:           zfs_scan_suspend_progress:Set to prevent scans from progressing (int)
parm:           zfs_no_scrub_io:Set to disable scrub I/O (int)
parm:           zfs_no_scrub_prefetch:Set to disable scrub prefetching (int)
parm:           zfs_async_block_max_blocks:Max number of blocks freed in one txg (ulong)
parm:           zfs_max_async_dedup_frees:Max number of dedup blocks freed in one txg (ulong)
parm:           zfs_free_bpobj_enabled:Enable processing of the free_bpobj (int)
parm:           zfs_scan_mem_lim_fact:Fraction of RAM for scan hard limit (int)
parm:           zfs_scan_issue_strategy:IO issuing strategy during scrubbing. 0 = default, 1 = LBA, 2 = size (int)
parm:           zfs_scan_legacy:Scrub using legacy non-sequential method (int)
parm:           zfs_scan_checkpoint_intval:Scan progress on-disk checkpointing interval (int)
parm:           zfs_scan_max_ext_gap:Max gap in bytes between sequential scrub / resilver I/Os (ulong)
parm:           zfs_scan_mem_lim_soft_fact:Fraction of hard limit used as soft limit (int)
parm:           zfs_scan_strict_mem_lim:Tunable to attempt to reduce lock contention (int)
parm:           zfs_scan_fill_weight:Tunable to adjust bias towards more filled segments during scans (int)
parm:           zfs_resilver_disable_defer:Process all resilvers immediately (int)
parm:           zfs_dirty_data_max_percent:Max percent of RAM allowed to be dirty (int)
parm:           zfs_dirty_data_max_max_percent:zfs_dirty_data_max upper bound as % of RAM (int)
parm:           zfs_delay_min_dirty_percent:Transaction delay threshold (int)
parm:           zfs_dirty_data_max:Determines the dirty space limit (ulong)
parm:           zfs_dirty_data_max_max:zfs_dirty_data_max upper bound in bytes (ulong)
parm:           zfs_dirty_data_sync_percent:Dirty data txg sync threshold as a percentage of zfs_dirty_data_max (int)
parm:           zfs_delay_scale:How quickly delay approaches infinity (ulong)
parm:           zfs_sync_taskq_batch_pct:Max percent of CPUs that are used to sync dirty data (int)
parm:           zfs_zil_clean_taskq_nthr_pct:Max percent of CPUs that are used per dp_sync_taskq (int)
parm:           zfs_zil_clean_taskq_minalloc:Number of taskq entries that are pre-populated (int)
parm:           zfs_zil_clean_taskq_maxalloc:Max number of taskq entries that are cached (int)
parm:           zfs_livelist_max_entries:Size to start the next sub-livelist in a livelist (ulong)
parm:           zfs_livelist_min_percent_shared:Threshold at which livelist is disabled (int)
parm:           zfs_max_recordsize:Max allowed record size (int)
parm:           zfs_allow_redacted_dataset_mount:Allow mounting of redacted datasets (int)
parm:           zfs_disable_ivset_guid_check:Set to allow raw receives without IVset guids (int)
parm:           zfs_prefetch_disable:Disable all ZFS prefetching (int)
parm:           zfetch_max_streams:Max number of streams per zfetch (uint)
parm:           zfetch_min_sec_reap:Min time before stream reclaim (uint)
parm:           zfetch_max_distance:Max bytes to prefetch per stream (uint)
parm:           zfetch_max_idistance:Max bytes to prefetch indirects for per stream (uint)
parm:           zfetch_array_rd_sz:Number of bytes in a array_read (ulong)
parm:           zfs_pd_bytes_max:Max number of bytes to prefetch (int)
parm:           zfs_traverse_indirect_prefetch_limit:Traverse prefetch number of blocks pointed by indirect block (int)
parm:           ignore_hole_birth:Alias for send_holes_without_birth_time (int)
parm:           send_holes_without_birth_time:Ignore hole_birth txg for zfs send (int)
parm:           zfs_send_corrupt_data:Allow sending corrupt data (int)
parm:           zfs_send_queue_length:Maximum send queue length (int)
parm:           zfs_send_unmodified_spill_blocks:Send unmodified spill blocks (int)
parm:           zfs_send_no_prefetch_queue_length:Maximum send queue length for non-prefetch queues (int)
parm:           zfs_send_queue_ff:Send queue fill fraction (int)
parm:           zfs_send_no_prefetch_queue_ff:Send queue fill fraction for non-prefetch queues (int)
parm:           zfs_override_estimate_recordsize:Override block size estimate with fixed size (int)
parm:           zfs_recv_queue_length:Maximum receive queue length (int)
parm:           zfs_recv_queue_ff:Receive queue fill fraction (int)
parm:           zfs_recv_write_batch_size:Maximum amount of writes to batch into one transaction (int)
parm:           dmu_object_alloc_chunk_shift:CPU-specific allocator grabs 2^N objects at once (int)
parm:           zfs_nopwrite_enabled:Enable NOP writes (int)
parm:           zfs_per_txg_dirty_frees_percent:Percentage of dirtied blocks from frees in one TXG (ulong)
parm:           zfs_dmu_offset_next_sync:Enable forcing txg sync to find holes (int)
parm:           dmu_prefetch_max:Limit one prefetch call to this size (int)
parm:           zfs_dedup_prefetch:Enable prefetching dedup-ed blks (int)
parm:           zfs_dbuf_state_index:Calculate arc header index (int)
parm:           dbuf_cache_max_bytes:Maximum size in bytes of the dbuf cache. (ulong)
parm:           dbuf_cache_hiwater_pct:Percentage over dbuf_cache_max_bytes when dbufs must be evicted directly. (uint)
parm:           dbuf_cache_lowater_pct:Percentage below dbuf_cache_max_bytes when the evict thread stops evicting dbufs. (uint)
parm:           dbuf_metadata_cache_max_bytes:Maximum size in bytes of the dbuf metadata cache. (ulong)
parm:           dbuf_cache_shift:Set the size of the dbuf cache to a log2 fraction of arc size. (int)
parm:           dbuf_metadata_cache_shift:Set the size of the dbuf metadata cache to a log2 fraction of arc size. (int)
parm:           zfs_arc_min:Min arc size
parm:           zfs_arc_max:Max arc size
parm:           zfs_arc_meta_limit:Metadata limit for arc size
parm:           zfs_arc_meta_limit_percent:Percent of arc size for arc meta limit
parm:           zfs_arc_meta_min:Min arc metadata
parm:           zfs_arc_meta_prune:Meta objects to scan for prune (int)
parm:           zfs_arc_meta_adjust_restarts:Limit number of restarts in arc_evict_meta (int)
parm:           zfs_arc_meta_strategy:Meta reclaim strategy (int)
parm:           zfs_arc_grow_retry:Seconds before growing arc size
parm:           zfs_arc_p_dampener_disable:Disable arc_p adapt dampener (int)
parm:           zfs_arc_shrink_shift:log2(fraction of arc to reclaim)
parm:           zfs_arc_pc_percent:Percent of pagecache to reclaim arc to (uint)
parm:           zfs_arc_p_min_shift:arc_c shift to calc min/max arc_p
parm:           zfs_arc_average_blocksize:Target average block size (int)
parm:           zfs_compressed_arc_enabled:Disable compressed arc buffers (int)
parm:           zfs_arc_min_prefetch_ms:Min life of prefetch block in ms
parm:           zfs_arc_min_prescient_prefetch_ms:Min life of prescient prefetched block in ms
parm:           l2arc_write_max:Max write bytes per interval (ulong)
parm:           l2arc_write_boost:Extra write bytes during device warmup (ulong)
parm:           l2arc_headroom:Number of max device writes to precache (ulong)
parm:           l2arc_headroom_boost:Compressed l2arc_headroom multiplier (ulong)
parm:           l2arc_trim_ahead:TRIM ahead L2ARC write size multiplier (ulong)
parm:           l2arc_feed_secs:Seconds between L2ARC writing (ulong)
parm:           l2arc_feed_min_ms:Min feed interval in milliseconds (ulong)
parm:           l2arc_noprefetch:Skip caching prefetched buffers (int)
parm:           l2arc_feed_again:Turbo L2ARC warmup (int)
parm:           l2arc_norw:No reads during writes (int)
parm:           l2arc_meta_percent:Percent of ARC size allowed for L2ARC-only headers (int)
parm:           l2arc_rebuild_enabled:Rebuild the L2ARC when importing a pool (int)
parm:           l2arc_rebuild_blocks_min_l2size:Min size in bytes to write rebuild log blocks in L2ARC (ulong)
parm:           l2arc_mfuonly:Cache only MFU data from ARC into L2ARC (int)
parm:           zfs_arc_lotsfree_percent:System free memory I/O throttle in bytes
parm:           zfs_arc_sys_free:System free memory target size in bytes
parm:           zfs_arc_dnode_limit:Minimum bytes of dnodes in arc
parm:           zfs_arc_dnode_limit_percent:Percent of ARC meta buffers for dnodes
parm:           zfs_arc_dnode_reduce_percent:Percentage of excess dnodes to try to unpin (ulong)
parm:           zfs_arc_eviction_pct:When full, ARC allocation waits for eviction of this % of alloc size (int)
parm:           zfs_arc_evict_batch_limit:The number of headers to evict per sublist before moving to the next (int)
parm:           zfs_arc_prune_task_threads:Number of arc_prune threads (int)

Panic Output:

[  637.965767] drbd demo0 ac-1f-6b-a4-df-ee: Preparing remote state change 665298238
[  638.041583] drbd demo0 ac-1f-6b-a4-df-ee: Committing remote state change 665298238 (primary_nodes=1)
[  639.376799] drbd demo0/0 drbd1000: new current UUID: FB35C925F1C156CD weak: FFFFFFFFFFFFFFFE
[  639.672337] nvmet: creating nvm controller 1 for subsystem linbit:nvme:demo0 for NQN nqn.2014-08.org.nvmexpress:uuid:a0ed6504-1546-40bd-a593-8bf9aa93fc58.
[  643.563056] drbd demo0 ac-1f-6b-a5-ab-ea: Handshake to peer 1 successful: Agreed network protocol version 121
[  643.573695] drbd demo0 ac-1f-6b-a5-ab-ea: Feature flags enabled on protocol level: 0x1f TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES RESYNC_DAGTAG
[  643.599194] drbd demo0 ac-1f-6b-a5-ab-ea: Peer authenticated using 20 bytes HMAC
[  643.648082] drbd demo0: Preparing cluster-wide state change 3344582659 (0->1 499/145)
[  643.670010] drbd demo0/0 drbd1000 ac-1f-6b-a5-ab-ea: drbd_sync_handshake:
[  643.677270] drbd demo0/0 drbd1000 ac-1f-6b-a5-ab-ea: self FB35C925F1C156CD:9FB8D3169D057002:0000000000000000:0000000000000000 bits:2 flags:120
[  643.690975] drbd demo0/0 drbd1000 ac-1f-6b-a5-ab-ea: peer 9FB8D3169D057002:0000000000000000:341A0FE913FD4D82:0000000000000000 bits:0 flags:20
[  643.704611] drbd demo0/0 drbd1000 ac-1f-6b-a5-ab-ea: uuid_compare()=source-use-bitmap by rule=bitmap-self
[  643.732998] drbd demo0/1 drbd1001 ac-1f-6b-a5-ab-ea: drbd_sync_handshake:
[  643.740277] drbd demo0/1 drbd1001 ac-1f-6b-a5-ab-ea: self 3B2E20FB38B2A4D8:0000000000000000:0000000000000000:0000000000000000 bits:0 flags:120
[  643.754031] drbd demo0/1 drbd1001 ac-1f-6b-a5-ab-ea: peer 3B2E20FB38B2A4D8:0000000000000000:A9837299945C2E84:0000000000000000 bits:0 flags:20
[  643.767722] drbd demo0/1 drbd1001 ac-1f-6b-a5-ab-ea: uuid_compare()=no-sync by rule=lost-quorum
[  643.787591] drbd demo0: State change 3344582659: primary_nodes=1, weak_nodes=FFFFFFFFFFFFFFF8
[  643.796642] drbd demo0: Committing cluster-wide state change 3344582659 (148ms)
[  643.812867] drbd demo0 ac-1f-6b-a5-ab-ea: conn( Connecting -> Connected ) peer( Unknown -> Secondary )
[  643.822690] drbd demo0/0 drbd1000 ac-1f-6b-a5-ab-ea: repl( Off -> WFBitMapS )
[  643.830345] drbd demo0/1 drbd1001 ac-1f-6b-a5-ab-ea: repl( Off -> Established )
[  643.838914] drbd demo0/0 drbd1000 ac-1f-6b-a5-ab-ea: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 24(1), total 24; compression: 98.9%
[  643.867200] drbd demo0/1 drbd1001 ac-1f-6b-a5-ab-ea: pdsk( Outdated -> UpToDate )
[  643.883899] drbd demo0 ac-1f-6b-a5-ab-ea: Preparing remote state change 1952820562
[  643.945539] drbd demo0/0 drbd1000 ac-1f-6b-a5-ab-ea: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 24(1), total 24; compression: 98.9%
[  643.967828] drbd demo0/0 drbd1000 ac-1f-6b-a5-ab-ea: helper command: /sbin/drbdadm before-resync-source
[  643.986114] drbd demo0/0 drbd1000 ac-1f-6b-a5-ab-ea: helper command: /sbin/drbdadm before-resync-source exit code 0
[  644.005485] drbd demo0/0 drbd1000 ac-1f-6b-a5-ab-ea: pdsk( Outdated -> Inconsistent ) repl( WFBitMapS -> SyncSource )
[  644.016708] drbd demo0/0 drbd1000 ac-1f-6b-a5-ab-ea: Began resync as SyncSource (will sync 8 KB [2 bits set]).
[  644.044848] drbd demo0 ac-1f-6b-a5-ab-ea: Committing remote state change 1952820562 (primary_nodes=1)
[  644.192065] drbd demo0/0 drbd1000 ac-1f-6b-a5-ab-ea: updated UUIDs FB35C925F1C156CD:0000000000000000:9FB8D3169D057002:0000000000000000
[  644.214321] drbd demo0/0 drbd1000 ac-1f-6b-a5-ab-ea: Resync done (total 1 sec; paused 0 sec; 8 K/sec)
[  644.224100] drbd demo0/0 drbd1000 ac-1f-6b-a5-ab-ea: pdsk( Inconsistent -> UpToDate ) repl( SyncSource -> Established )
[  788.660635] nvmet: creating nvm controller 1 for subsystem linbit:nvme:demo0 for NQN nqn.2014-08.org.nvmexpress:uuid:a0ed6504-1546-40bd-a593-8bf9aa93fc58.
[  788.921321] general protection fault, probably for non-canonical address 0xffedcf3970da3000: 0000 [#1] PREEMPT SMP NOPTI
[  788.932971] CPU: 24 PID: 971 Comm: kworker/24:1H Tainted: P S         OE     5.17.9-1.el8.elrepo.x86_64 #1
[  788.943410] Hardware name: Supermicro SYS-1029U-TN10RT/X11DPU, BIOS 3.2 10/16/2019
[  788.951736] Workqueue: nvmet_tcp_wq nvmet_tcp_io_work [nvmet_tcp]
[  788.958587] RIP: 0010:memcpy_erms+0x6/0x10
[  788.963437] Code: cc cc cc cc eb 1e 0f 1f 00 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1 <f3> a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38 fe
[  788.983681] RSP: 0018:ffffa6785bc03bb8 EFLAGS: 00010202
[  788.989637] RAX: ffedcf3970da3000 RBX: 000000000000021c RCX: 000000000000021c
[  788.997506] RDX: 000000000000021c RSI: ffff98a9d7ab89da RDI: ffedcf3970da3000
[  789.005365] RBP: ffff984ab05e6f80 R08: 0000000000000000 R09: ffffffff927e2740
[  789.013213] R10: 0000000000000000 R11: ffff98aa6fe9cf60 R12: 0000000000006168
[  789.021059] R13: 000000000000021c R14: 000000000000021c R15: 0000000000000000
[  789.028896] FS:  0000000000000000(0000) GS:ffff99083f500000(0000) knlGS:0000000000000000
[  789.037686] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  789.044137] CR2: 00007f13544ab000 CR3: 00000082b520a003 CR4: 00000000007706e0
[  789.051963] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  789.059775] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  789.067569] PKRU: 55555554
[  789.070932] Call Trace:
[  789.074034]  <TASK>
[  789.076778]  _copy_to_iter+0x3e6/0x6a0
[  789.081160]  ? __check_object_size+0x53/0x170
[  789.086130]  __skb_datagram_iter+0x19c/0x300
[  789.090993]  ? receiver_wake_function+0x20/0x20
[  789.096117]  skb_copy_datagram_iter+0x30/0x90
[  789.101060]  tcp_recvmsg_locked+0x1cd/0x8e0
[  789.105818]  tcp_recvmsg+0xa9/0x1e0
[  789.109874]  inet_recvmsg+0x5c/0x130
[  789.114014]  nvmet_tcp_io_work+0xcf/0xb05 [nvmet_tcp]
[  789.119630]  ? __switch_to_asm+0x42/0x70
[  789.124111]  ? finish_task_switch+0xb2/0x2c0
[  789.128922]  process_one_work+0x222/0x3f0
[  789.133466]  ? process_one_work+0x3f0/0x3f0
[  789.138181]  worker_thread+0x2d/0x3b0
[  789.142361]  ? process_one_work+0x3f0/0x3f0
[  789.147054]  kthread+0xd7/0x100
[  789.150686]  ? kthread_complete_and_exit+0x20/0x20
[  789.155970]  ret_from_fork+0x1f/0x30
[  789.160041]  </TASK>
[  789.162706] Modules linked in: tcp_diag(E) inet_diag(E) ext4(E) mbcache(E) jbd2(E) xt_multiport(E) nft_compat(E) nf_tables(E) nfnetlink(E) bcache(E) crc64(E) dm_cache(E) dm_persistent_data(E) dm_bio_prison(E) dm_bufio(E) dm_writecache(E) nvme_rdma(E) nvmet_rdma(E) rdma_cm(E) iw_cm(E) ib_cm(E) ib_core(E) dm_mod(E) drbd_transport_tcp(OE) drbd(OE) 8021q(E) garp(E) mrp(E) stp(E) llc(E) rfkill(E) sunrpc(E) intel_rapl_msr(E) intel_rapl_common(E) skx_edac(E) iTCO_wdt(E) intel_pmc_bxt(E) nfit(E) iTCO_vendor_support(E) libnvdimm(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) kvm(E) irqbypass(E) ipmi_ssif(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) rapl(E) intel_cstate(E) i2c_i801(E) mei_me(E) intel_uncore(E) pcspkr(E) acpi_ipmi(E) joydev(E) mei(E) lpc_ich(E) i2c_smbus(E) ioatdma(E) intel_pch_thermal(E) ipmi_si(E) acpi_power_meter(E) acpi_pad(E) vfat(E) fat(E) binfmt_misc(E) sr_mod(E) sd_mod(E) cdrom(E) sg(E) xfs(E) ast(E) i2c_algo_bit(E)
[  789.162757]  drm_vram_helper(E) libcrc32c(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) ahci(E) fb_sys_fops(E) drm_ttm_helper(E) ttm(E) libahci(E) uas(E) ixgbe(E) crc32c_intel(E) drm(E) libata(E) mdio(E) usb_storage(E) dca(E) wmi(E) zfs(POE) zunicode(POE) zzstd(OE) zlua(OE) zavl(POE) icp(POE) zcommon(POE) znvpair(POE) spl(OE) nvmet_tcp(E) nvmet(E) nvme_tcp(E) nvme_fabrics(E) nvme(E) nvme_core(E) t10_pi(E) ipmi_devintf(E) ipmi_msghandler(E)
[  789.295168] BUG: kernel NULL pointer dereference, address: 0000000000000008
[  789.295182] ---[ end trace 0000000000000000 ]---
[  789.302663] #PF: supervisor write access in kernel mode
[  789.302665] #PF: error_code(0x0002) - not-present page
[  789.351645] RIP: 0010:memcpy_erms+0x6/0x10
[  789.353655] PGD 0 P4D 0
[  789.353657] Oops: 0002 [#2] PREEMPT SMP NOPTI
[  789.359322] Code: cc cc cc cc eb 1e 0f 1f 00 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1 <f3> a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38 fe
[  789.363915] CPU: 72 PID: 2493 Comm: kworker/72:1H Tainted: P S    D    OE     5.17.9-1.el8.elrepo.x86_64 #1
[  789.363918] Hardware name: Supermicro SYS-1029U-TN10RT/X11DPU, BIOS 3.2 10/16/2019
[  789.363919] Workqueue: nvmet_tcp_wq nvmet_tcp_io_work [nvmet_tcp]
[  789.366959] RSP: 0018:ffffa6785bc03bb8 EFLAGS: 00010202
[  789.371812]
[  789.371813] RIP: 0010:free_unref_page_commit.isra.119+0x62/0x100
[  789.391619]
[  789.401875] Code: 03 0d 22 03 d3 6d b9 04 00 00 00 83 fa 04 0f 4e ca 8d 04 49 48 8d 4f 08 01 f0 48 98 48 83 c0 01 48 c1 e0 04 4c 01 c8 48 8b 30 <48> 89 4e 08 48 89 47 10 48 89 77 08 48 89 08 b8 01 00 00 00 89 d1
[  789.401877] RSP: 0000:ffffa6785bdffdb8 EFLAGS: 00010086
[  789.401879] RAX: ffff99083fc368d8 RBX: ffffe02e065ae040 RCX: ffffe02e065ae048
[  789.409990] RAX: ffedcf3970da3000 RBX: 000000000000021c RCX: 000000000000021c
[  789.416612] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffe02e065ae040
[  789.416614] RBP: 000000000002d868 R08: ffff9909bffd4b80 R09: ffff99083fc368c8
[  789.416614] R10: ffff98a9ce02e5d0 R11: 0000000000000000 R12: 0000000000000000
[  789.416615] R13: 0000000000000293 R14: 000000007fffffff R15: 0000000000000018
[  789.416616] FS:  0000000000000000(0000) GS:ffff99083fc00000(0000) knlGS:0000000000000000
[  789.422389] RDX: 000000000000021c RSI: ffff98a9d7ab89da RDI: ffedcf3970da3000
[  789.424417] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  789.424419] CR2: 0000000000000008 CR3: 00000082b520a003 CR4: 00000000007706e0
[  789.424420] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  789.424420] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  789.424421] PKRU: 55555554
[  789.424422] Call Trace:
[  789.430979] RBP: ffff984ab05e6f80 R08: 0000000000000000 R09: ffffffff927e2740
[  789.433020]  <TASK>
[  789.433022]  free_unref_page+0x7d/0xe0
[  789.433024]  sgl_free_n_order+0x55/0x70
[  789.452937] R10: 0000000000000000 R11: ffff98aa6fe9cf60 R12: 0000000000006168
[  789.458732]  nvmet_tcp_free_cmd_buffers+0x28/0x50 [nvmet_tcp]
[  789.458736]  nvmet_tcp_io_work+0x4b8/0xb05 [nvmet_tcp]
[  789.466458] R13: 000000000000021c R14: 000000000000021c R15: 0000000000000000
[  789.474182]  ? __switch_to_asm+0x42/0x70
[  789.474185]  ? finish_task_switch+0xb2/0x2c0
[  789.481906] FS:  0000000000000000(0000) GS:ffff99083f500000(0000) knlGS:0000000000000000
[  789.489632]  process_one_work+0x222/0x3f0
[  789.489634]  ? process_one_work+0x3f0/0x3f0
[  789.489636]  worker_thread+0x2d/0x3b0
[  789.497368] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  789.505093]  ? process_one_work+0x3f0/0x3f0
[  789.505095]  kthread+0xd7/0x100
[  789.513786] CR2: 00007f13544ab000 CR3: 00000082b520a003 CR4: 00000000007706e0
[  789.521524]  ? kthread_complete_and_exit+0x20/0x20
[  789.521528]  ret_from_fork+0x1f/0x30
[  789.527880] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  789.535624]  </TASK>
[  789.535625] Modules linked in: tcp_diag(E) inet_diag(E) ext4(E) mbcache(E) jbd2(E)
[  789.543379] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  789.551121]  xt_multiport(E) nft_compat(E) nf_tables(E) nfnetlink(E) bcache(E) crc64(E) dm_cache(E) dm_persistent_data(E)
[  789.554450] PKRU: 55555554
[  789.557498]  dm_bio_prison(E) dm_bufio(E) dm_writecache(E) nvme_rdma(E) nvmet_rdma(E) rdma_cm(E) iw_cm(E)
[  789.565261] Kernel panic - not syncing: Fatal exception
[  789.567986]  ib_cm(E) ib_core(E) dm_mod(E) drbd_transport_tcp(OE) drbd(OE) 8021q(E) garp(E) mrp(E) stp(E) llc(E) rfkill(E) sunrpc(E) intel_rapl_msr(E) intel_rapl_common(E) skx_edac(E) iTCO_wdt(E) intel_pmc_bxt(E) nfit(E) iTCO_vendor_support(E) libnvdimm(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) kvm(E) irqbypass(E) ipmi_ssif(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) rapl(E) intel_cstate(E) i2c_i801(E) mei_me(E) intel_uncore(E) pcspkr(E) acpi_ipmi(E) joydev(E) mei(E) lpc_ich(E) i2c_smbus(E) ioatdma(E) intel_pch_thermal(E) ipmi_si(E) acpi_power_meter(E) acpi_pad(E) vfat(E) fat(E) binfmt_misc(E) sr_mod(E) sd_mod(E) cdrom(E) sg(E) xfs(E) ast(E) i2c_algo_bit(E) drm_vram_helper(E) libcrc32c(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) ahci(E) fb_sys_fops(E) drm_ttm_helper(E) ttm(E) libahci(E) uas(E) ixgbe(E) crc32c_intel(E) drm(E) libata(E) mdio(E) usb_storage(E) dca(E) wmi(E) zfs(POE) zunicode(POE) zzstd(OE) zlua(OE)
[  789.723178]  zavl(POE) icp(POE) zcommon(POE) znvpair(POE) spl(OE) nvmet_tcp(E) nvmet(E) nvme_tcp(E) nvme_fabrics(E) nvme(E) nvme_core(E) t10_pi(E) ipmi_devintf(E) ipmi_msghandler(E)
[  789.831341] CR2: 0000000000000008
[  789.835247] ---[ end trace 0000000000000000 ]---
[  789.867305] RIP: 0010:memcpy_erms+0x6/0x10
[  789.871995] Code: cc cc cc cc eb 1e 0f 1f 00 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1 <f3> a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38 fe
[  789.891941] RSP: 0018:ffffa6785bc03bb8 EFLAGS: 00010202
[  789.897768] RAX: ffedcf3970da3000 RBX: 000000000000021c RCX: 000000000000021c
[  789.905507] RDX: 000000000000021c RSI: ffff98a9d7ab89da RDI: ffedcf3970da3000
[  789.913247] RBP: ffff984ab05e6f80 R08: 0000000000000000 R09: ffffffff927e2740
[  789.920995] R10: 0000000000000000 R11: ffff98aa6fe9cf60 R12: 0000000000006168
[  789.928745] R13: 000000000000021c R14: 000000000000021c R15: 0000000000000000
[  789.936492] FS:  0000000000000000(0000) GS:ffff99083fc00000(0000) knlGS:0000000000000000
[  789.945201] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  789.951565] CR2: 0000000000000008 CR3: 00000082b520a003 CR4: 00000000007706e0
[  789.959322] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  789.967072] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  789.974818] PKRU: 55555554
[  790.634767] Shutting down cpus with NMI
[  790.639222] Kernel Offset: 0x11000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[  790.655463] ---[ end Kernel panic - not syncing: Fatal exception ]---

troubleshoot connection problem

Hello,
I am new to DRBD and find it very hard to get proper help on it from Google.
I just set up 2 nodes using the following version on CentOS 7:
DRBDADM_BUILDTAG=GIT-hash:\ a7820b3c14497a34f955ba5ce56cf1bc9d2d353e\ build\ by\ [email protected],\ 2021-05-02\ 22:04:05
DRBDADM_API_VERSION=2
DRBD_KERNEL_VERSION_CODE=0x09001c
DRBD_KERNEL_VERSION=9.0.28
DRBDADM_VERSION_CODE=0x091000
DRBDADM_VERSION=9.16.0

The disk on the master is set up and running fine, but it keeps trying to connect to the secondary node.
The secondary node is also trying to connect.
No errors are seen in /var/log/messages:

May 17 16:27:13 xena kernel: drbd himalaya himalaya.xx: conn( Connecting -> Disconnecting )
May 17 16:27:13 xena kernel: drbd himalaya himalaya.xx: Restarting sender thread
May 17 16:27:13 xena kernel: drbd himalaya himalaya.xx: Connection closed
May 17 16:27:13 xena kernel: drbd himalaya himalaya.xx: helper command: /sbin/drbdadm disconnected
May 17 16:27:13 xena kernel: drbd himalaya himalaya.xx: helper command: /sbin/drbdadm disconnected exit code 0
May 17 16:27:13 xena kernel: drbd himalaya himalaya.xx: conn( Disconnecting -> StandAlone )
May 17 16:27:13 xena kernel: drbd himalaya himalaya.xx: Terminating receiver thread

The following is my resource file:

resource himalaya {
    connection {
        host himalaya.xxx port 6999;
        host xena.xxx port 6998;
        net {
            cram-hmac-alg sha256;
            shared-secret "VmKuU2PPZzoHF3kuG/Km";
            protocol C;
        }
    }
    on xena.xxx {
        node-id 1;
        address ipv4 10.8.9.15:6998;
        device /dev/drbd1;
        disk /dev/himalaya/himalayadata;
        meta-disk internal;
    }
    on himalaya.xxx {
        node-id 0;
        address ipv4 10.17.0.254:6999;
        device /dev/drbd0;
        disk /dev/mailpool/maildata;
        meta-disk internal;
    }
}

How can I turn on debugging?
Where can I see some useful logs on what is causing the problem?

On the secondary node:
[root@xena drbd.d]# netstat -an | grep 6998
tcp 0 0 10.8.9.15:6998 0.0.0.0:* LISTEN
tcp 0 0 10.8.9.15:6998 103.xxx:52423 SYN_RECV
tcp 0 0 10.8.9.15:6998 103.xxx:60099 SYN_RECV
tcp 0 0 10.8.9.15:6998 103.xxx:53059 SYN_RECV

On the master node:
tcp 0 0 103.xxx:6999 0.0.0.0:* LISTEN
tcp 0 1 103.xxx:58212 10.8.9.15:6998 SYN_SENT
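
The netstat output above shows connections stuck in SYN_RECV on one node and SYN_SENT on the other, which suggests the TCP handshake never completes. A minimal Python sketch for checking whether a handshake to the peer's DRBD port succeeds is given below. The function name `can_connect` is hypothetical, and the sketch only tests plain TCP reachability; it says nothing about DRBD itself.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP handshake to host:port completes within timeout."""
    try:
        # create_connection performs the full three-way handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

Run on the master node, `can_connect("10.8.9.15", 6998)` would try the address and port configured for xena.xxx in the resource file. If it returns False, a firewall, routing or address translation problem between the two addresses is worth ruling out before looking at DRBD.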

Question: is it possible to increase DRBD sync speed?

Hi,
I've just set up DRBD between 2 VMs (each with 22 CPU cores, 60 GB RAM and a 930 GB DRBD shared disk).
All good, but I have a question regarding the sync speed.
After starting the initial disk synchronization, I see network speeds of up to 40 MB/s.

Each of my host servers has 2 x 1 Gbps network cards bonded in balance-alb mode, 2 x Xeon 2630V2 with 64 GB RAM and 2 x 1 TB SSD in a RAID 1 configuration. As virtualization environment I use Proxmox 8. The switch between the nodes has 48 x 1 Gb ports plus 2 x 10 Gb uplinks. There are no other VMs running on these servers besides the 2 VMs, each on its own server.

Theoretically I should see network speeds of up to 200 MB/s, but the DRBD sync speed seems to be stuck at 40 MB/s. On standard file transfers between the VMs I regularly see speeds in the range of 160 MB/s.

[Screenshot: network speed during initial sync]

 cat /proc/drbd
version: 8.4.11 (api:1/proto:86-101)
srcversion: 9B671CCC1F00886BA069043
 0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r-----
    ns:180441216 nr:0 dw:0 dr:180441216 al:16 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:794704668
        [==>.................] sync'ed: 18.6% (776076/952288)M
        finish: 5:27:02 speed: 40,480 (39,064) K/sec

Any idea how I can increase the DRBD sync speed?
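
To put the numbers from /proc/drbd in perspective, here is a small Python sketch that estimates the remaining sync time from the out-of-sync counter (`oos`, assumed to be in KiB) and the current rate (K/sec). The helper name `sync_eta` is hypothetical; it only reproduces the arithmetic behind the `finish:` field and is not a DRBD API.

```python
def sync_eta(oos_kib: int, rate_kib_per_s: float) -> str:
    """Format the time needed to sync oos_kib at rate_kib_per_s as H:MM:SS."""
    seconds = int(oos_kib / rate_kib_per_s)
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


# Values taken from the /proc/drbd output above.
print(sync_eta(794704668, 40480))  # prints 5:27:12
```

The result is within seconds of the `finish: 5:27:02` that DRBD reports. Calling the same function with a higher rate shows how much a faster resync would save. Note also that, as far as I know, a single TCP connection over a balance-alb bond normally uses only one of the two links, so roughly 1 Gbps may be the practical ceiling for one DRBD connection.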

Thanks

Kernel Panic 9.1.13

Hello!

I have hit a kernel panic with 9.1.13 several times.

I use an LVM thin volume as the backing device for DRBD. The panic happened after LVM reported "out-of-space" and DRBD reported "Local IO failed". I'm not sure whether these errors are related.

Here is the call trace:

[852021.832388] drbd pvc-2eee1044-77eb-4917-a10a-a8ac7ec9fc87/0 drbd1027: Cannot write resync data to local disk.
[852021.842178] drbd pvc-2eee1044-77eb-4917-a10a-a8ac7ec9fc87/0 drbd1027: drbd_rs_complete_io() called, but extent not found
[852021.853517] drbd pvc-2eee1044-77eb-4917-a10a-a8ac7ec9fc87/0 drbd1027: Cannot write resync data to local disk.
[852021.861847] drbd pvc-2eee1044-77eb-4917-a10a-a8ac7ec9fc87/0 drbd1027: write: error=3 s=837120s
[852021.872107] drbd pvc-2eee1044-77eb-4917-a10a-a8ac7ec9fc87/0 drbd1027: Cannot write resync data to local disk.
[852021.882849] drbd pvc-2eee1044-77eb-4917-a10a-a8ac7ec9fc87/0 drbd1027: drbd_rs_complete_io() called, but extent not found
[852021.922816] drbd pvc-2eee1044-77eb-4917-a10a-a8ac7ec9fc87/0 drbd1027: Cannot write resync data to local disk.
[852021.933043] drbd pvc-2eee1044-77eb-4917-a10a-a8ac7ec9fc87/0 drbd1027: Cannot write resync data to local disk.
[852021.943391] drbd pvc-2eee1044-77eb-4917-a10a-a8ac7ec9fc87/0 drbd1027: Cannot write resync data to local disk.
[852021.953585] drbd pvc-2eee1044-77eb-4917-a10a-a8ac7ec9fc87/0 drbd1027: Cannot write resync data to local disk.
[852021.963790] drbd pvc-2eee1044-77eb-4917-a10a-a8ac7ec9fc87/0 drbd1027: Cannot write resync data to local disk.
[852021.976787] drbd pvc-2eee1044-77eb-4917-a10a-a8ac7ec9fc87/0 drbd1027: drbd_rs_complete_io() called, but extent not found
[852021.987836] drbd pvc-2eee1044-77eb-4917-a10a-a8ac7ec9fc87/0 drbd1027: disk( Failed -> Diskless )
[852023.864090] drbd pvc-22b954d2-8f4b-41cb-aa0b-cd8aae39ee80 node-2: conn( Connected -> Disconnecting ) peer( Secondary -> Unknown )
[852023.876550] drbd pvc-22b954d2-8f4b-41cb-aa0b-cd8aae39ee80 node-2: ack_receiver terminated
[852023.885639] drbd pvc-22b954d2-8f4b-41cb-aa0b-cd8aae39ee80 node-2: Terminating ack_recv thread
[852023.895096] drbd pvc-22b954d2-8f4b-41cb-aa0b-cd8aae39ee80: Preparing cluster-wide state change 516851886 (1->-1 0/0)
[852023.905796] drbd pvc-22b954d2-8f4b-41cb-aa0b-cd8aae39ee80 node-2: Terminating sender thread
[852023.914940] drbd pvc-22b954d2-8f4b-41cb-aa0b-cd8aae39ee80 node-2: Starting sender thread (from drbd_r_pvc-22b9 [1950774])
[852023.927129] drbd pvc-22b954d2-8f4b-41cb-aa0b-cd8aae39ee80: State change 516851886: primary_nodes=0, weak_nodes=0
[852023.937582] drbd pvc-22b954d2-8f4b-41cb-aa0b-cd8aae39ee80: Committing cluster-wide state change 516851886 (40ms)
[852024.125021] drbd pvc-22b954d2-8f4b-41cb-aa0b-cd8aae39ee80 node-3: Preparing remote state change 444886533
[852024.135663] drbd pvc-22b954d2-8f4b-41cb-aa0b-cd8aae39ee80 node-2: Connection closed
[852024.144133] drbd pvc-22b954d2-8f4b-41cb-aa0b-cd8aae39ee80 node-2: conn( Disconnecting -> StandAlone )
[852024.154180] drbd pvc-22b954d2-8f4b-41cb-aa0b-cd8aae39ee80 node-2: Terminating receiver thread
[852024.163557] drbd pvc-22b954d2-8f4b-41cb-aa0b-cd8aae39ee80 node-2: Terminating sender thread
[852024.173147] drbd pvc-22b954d2-8f4b-41cb-aa0b-cd8aae39ee80 node-3: conn( Connected -> Disconnecting ) peer( Secondary -> Unknown )
[852024.185604] drbd pvc-22b954d2-8f4b-41cb-aa0b-cd8aae39ee80 node-3: ack_receiver terminated
[852024.194577] drbd pvc-22b954d2-8f4b-41cb-aa0b-cd8aae39ee80 node-3: Terminating ack_recv thread
[852024.204030] drbd pvc-22b954d2-8f4b-41cb-aa0b-cd8aae39ee80: State change failed: Concurrent state changes detected and aborted
[852024.215564] drbd pvc-22b954d2-8f4b-41cb-aa0b-cd8aae39ee80 node-3: Terminating sender thread
[852024.224779] drbd pvc-22b954d2-8f4b-41cb-aa0b-cd8aae39ee80 node-3: Starting sender thread (from drbd_r_pvc-22b9 [1950786])
[852024.243309] drbd pvc-22b954d2-8f4b-41cb-aa0b-cd8aae39ee80: Two-phase commit 444886533 timeout
[852024.323344] drbd pvc-22b954d2-8f4b-41cb-aa0b-cd8aae39ee80: Preparing cluster-wide state change 4222145303 (1->-1 0/0)
[852024.334113] drbd pvc-22b954d2-8f4b-41cb-aa0b-cd8aae39ee80: Committing cluster-wide state change 4222145303 (10ms)
[852024.513534] Unable to handle kernel paging request at virtual address 0031287469647699
[852024.704649] CPU: 42 PID: 1950786 Comm: drbd_r_pvc-22b9 Kdump: loaded Tainted: G        W  OE     5.15.67-6.cl9.aarch64 #1
[852024.715681] Hardware name: Inspur NF2180M3/YZMB-02016-101, BIOS 4.0.30 20210714
[852024.723067] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[852024.730110] pc : cleanup_unacked_peer_requests+0x174/0x23c [drbd]
[852024.736316] lr : conn_disconnect+0x234/0x510 [drbd]
[852024.741300] sp : ffff80004432fd00
[852024.744696] x29: ffff80004432fd00 x28: 3631287469647561 x27: 0000000000000001
[852024.751917] x26: ffffffffffffffff x25: dead000000000100 x24: dead000000000122
[852024.759135] x23: 0000000000000001 x22: ffff80004432fd68 x21: ffff80004432fd48
[852024.766346] x20: ffff050067f50ae8 x19: ffff050067f50ac8 x18: 0000000000000000
[852024.773566] x17: 0000000000000000 x16: 0000000000000000 x15: 000000009b6dfd0a
[852024.780785] x14: ffff0300462b0060 x13: 0000000000000040 x12: 0000000000000000
[852024.788004] x11: 0000000000000000 x10: 0000000000000000 x9 : ffff800004e42594
[852024.795214] x8 : 0000000000000000 x7 : ffff0300462b05f0 x6 : 0000000000000000
[852024.802425] x5 : ffff0300462b03c4 x4 : 000000007265735f x3 : ffff050067f50ae8
[852024.809636] x2 : ffff050067f50ae8 x1 : ffff0300462b03c4 x0 : ffff03007d1a4800
[852024.816848] Call trace:
[852024.819378]  cleanup_unacked_peer_requests+0x174/0x23c [drbd]
[852024.825226]  conn_disconnect+0x234/0x510 [drbd]
[852024.829855]  drbd_receiver+0x70/0x84 [drbd]
[852024.834146]  drbd_thread_setup+0x74/0x280 [drbd]
[852024.838867]  kthread+0x110/0x11c
[852024.842187]  ret_from_fork+0x10/0x20
[852024.845847] Code: 540002e0 f9400e60 f940081c b9408004 (b9413b80)
[852024.852019] SMP: stopping secondary CPUs
[852024.861200] Starting crashdump kernel...
[852024.865234] Bye!

I analysed the dump, and the struct drbd_peer_device that peer_req->peer_device points to appears to be corrupted:

crash> rd ffff03007d1a4800 20
ffff03007d1a4800:  0000044d000000f0 0000000000000000   ....M...........
ffff03007d1a4810:  3631287469647561 3635333935373738   audit(1687759356
ffff03007d1a4820:  3530323a3338332e 70203a2936393931   .383:2051996): p
ffff03007d1a4830:  32363832323d6469 20303d6469752034   id=228624 uid=0 
ffff03007d1a4840:  3932343d64697561 2035393237363934   auid=4294967295 
ffff03007d1a4850:  343932343d736573 7320353932373639   ses=4294967295 s
ffff03007d1a4860:  747379733d6a6275 7379733a755f6d65   ubj=system_u:sys
ffff03007d1a4870:  6e753a725f6d6574 64656e69666e6f63   tem_r:unconfined
ffff03007d1a4880:  656369767265735f 736d2030733a745f   _service_t:s0 ms
ffff03007d1a4890:  41503d706f273d67 6e756f6363613a4d   g='op=PAM:accoun

But the peer_req struct itself looks valid:

crash> rd ffff050067f50ac8 20
ffff050067f50ac8:  ffff80007f18fd58 ffff80007f18fd58   X.......X.......
ffff050067f50ad8:  ffff800004e45450 ffff03007d1a4800   PT.......H.}....
ffff050067f50ae8:  ffff80004432fd68 ffff80004432fd68   h.2D....h.2D....
ffff050067f50af8:  ffff050067f50af8 ffff050067f50af8   ...g.......g....
ffff050067f50b08:  0000000000000000 0000000000000000   ................
ffff050067f50b18:  0000000000000801 ffff050067f50b20   ........ ..g....
ffff050067f50b28:  0000000000000000 0000000000000000   ................
ffff050067f50b38:  00000000008ca9f8 0000000200009000   ................
ffff050067f50b48:  00000000008caa40 0000000000000000   @...............
ffff050067f50b58:  0000000000002540 ffff0501839c9880   @%..............

Is the pointer "peer_req->peer_device" freed somewhere?
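
The first `rd` dump above can be cross-checked by decoding the 64-bit words as little-endian bytes, which is what the ASCII column of `crash` does. The Python sketch below uses a hypothetical helper, `words_to_ascii`, and is only meant to show that the memory holds text rather than a kernel structure.

```python
def words_to_ascii(*words: int) -> str:
    """Decode 64-bit little-endian words to printable ASCII ('.' otherwise)."""
    raw = b"".join(w.to_bytes(8, "little") for w in words)
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


# Words at ffff03007d1a4810 from the dump above.
print(words_to_ascii(0x3631287469647561, 0x3635333935373738))
# prints audit(1687759356
```

The first of these words is also the value in register x28 at the time of the fault, so the code appears to have dereferenced audit log text as if it were a pointer. That is consistent with the memory having been reused after a free, although the dump alone does not prove it.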

Thank you!

DRBD kernel module mismatch in different distros

Hi, I'm trying to understand something I've noticed in multiple distros: the kernel module is still version 8 while the userspace utilities are version 9. This mismatch prevents me from using the connection mesh method, which is exclusive to DRBD 9.

I noticed this a long time ago already, as described here.

But this is still the case in Debian 11 as well:

drbdadm --version
DRBDADM_BUILDTAG=GIT-hash:\ baaca8a080dc54652f57da4bafb2dce51dfe9f68\ reproducible\ build\,\ 2020-09-29\ 09:05:36
DRBDADM_API_VERSION=1
DRBD_KERNEL_VERSION_CODE=0x08040b
DRBDADM_VERSION_CODE=0x090f00
DRBDADM_VERSION=9.15.0

And on Fedora 34:

drbdadm --version
DRBDADM_BUILDTAG=GIT-hash:\ a7820b3c14497a34f955ba5ce56cf1bc9d2d353e\ build\ by\ mockbuild@\,\ 2021-02-21\ 21:23:10
DRBDADM_API_VERSION=1
DRBD_KERNEL_VERSION_CODE=0x08040b
DRBDADM_VERSION_CODE=0x091000
DRBDADM_VERSION=9.16.0

What could be the reason for this? Is the version 9 kernel module not mature enough to be used? I suppose there is no way to use the connection mesh option while this old kernel module is in use. The only alternatives would be to use the old stacking or to build a module myself. Neither is ideal.
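
The `*_VERSION_CODE` values in the `drbdadm --version` output pack major, minor and patch numbers into one hex value, one byte each. A short Python sketch (the function name `decode_version_code` is hypothetical) makes the mismatch explicit:

```python
def decode_version_code(code: str) -> str:
    """Turn a hex version code such as '0x08040b' into 'major.minor.patch'."""
    value = int(code, 16)
    major = (value >> 16) & 0xFF
    minor = (value >> 8) & 0xFF
    patch = value & 0xFF
    return f"{major}.{minor}.{patch}"


print(decode_version_code("0x08040b"))  # kernel module: prints 8.4.11
print(decode_version_code("0x091000"))  # drbdadm: prints 9.16.0
```

So both systems above run a DRBD 8.4.11 kernel module with 9.x userspace tools.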

modules dependency lost after injection

After injecting the DRBD kernel module, the previous contents of modules.dep were lost:

[root@kylinos ~]# cat /lib/modules/$(uname -r)/modules.dep
updates/drbd_transport_tcp.ko: updates/drbd.ko
updates/drbd.ko:
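
To quantify what is left in the file, the `module: dependencies` lines can be parsed as in the Python sketch below. The helper `parse_modules_dep` is hypothetical and simply reads the text format shown above.

```python
from typing import Dict, List


def parse_modules_dep(text: str) -> Dict[str, List[str]]:
    """Map each module path in modules.dep text to its list of dependencies."""
    deps: Dict[str, List[str]] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank or malformed lines
        module, _, rest = line.partition(":")
        deps[module.strip()] = rest.split()
    return deps


sample = "updates/drbd_transport_tcp.ko: updates/drbd.ko\nupdates/drbd.ko:\n"
print(len(parse_modules_dep(sample)))  # prints 2
```

A count of 2 confirms that only the DRBD modules are listed. Since `depmod -a` regenerates modules.dep, running it on the host rather than inside the container may be a way to restore the full file, but I have not verified that on this system.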

OS info:

[root@kylinos ~]# cat /etc/os-release 
NAME="Kylin Linux Advanced Server"
VERSION="V10 (Sword)"
ID="kylin"
VERSION_ID="V10"
PRETTY_NAME="Kylin Linux Advanced Server V10 (Sword)"
ANSI_COLOR="0;31"

Kernel version:

[root@kylinos ~]# uname -a 
Linux kylinos 4.19.90-25.9.v2101.ky10.aarch64 #1 SMP Wed Dec 1 17:24:28 CST 2021 aarch64 aarch64 aarch64 GNU/Linux

How to reproduce:

[root@kylinos ~]#  docker run  -v /sys:/sys -v /dev:/dev  -v /usr/src:/usr/src:ro  -v /lib/modules:/lib/modules   -e LB_HOW=compile  -e LB_INSTALL=yes  --privileged  --rm  -i piraeusdatastore/drbd9-bionic:v9.1.4

Injection logs:

[root@kylinos ~]# docker run  -v /sys:/sys -v /dev:/dev  -v /usr/src:/usr/src:ro  -v /lib/modules:/lib/modules   -e LB_HOW=compile  -e LB_INSTALL=yes  --privileged  --rm  -i piraeusdatastore/drbd9-bionic:v9.1.4
Need a git checkout to regenerate drbd/.drbd_git_revision
make[1]: Entering directory '/tmp/pkg/drbd-9.1.4/drbd'

    Calling toplevel makefile of kernel source tree, which I believe is in
    KDIR=/lib/modules/4.19.90-25.9.v2101.ky10.aarch64/build

make -C /lib/modules/4.19.90-25.9.v2101.ky10.aarch64/build   M=/tmp/pkg/drbd-9.1.4/drbd  modules
  COMPAT  __vmalloc_has_2_params
  COMPAT  alloc_workqueue_takes_fmt
  COMPAT  before_4_13_kernel_read
  COMPAT  blkdev_issue_zeroout_discard
  COMPAT  can_include_vermagic_h
  COMPAT  genl_policy_in_ops
  COMPAT  have_BIO_MAX_VECS
  COMPAT  have_CRYPTO_TFM_NEED_KEY
  COMPAT  have_SHASH_DESC_ON_STACK
  COMPAT  have_WB_congested_enum
  COMPAT  have_allow_kernel_signal
  COMPAT  have_bdi_cap_stable_writes
  COMPAT  have_bdi_congested_fn
  COMPAT  have_bio_bi_bdev
  COMPAT  have_bio_bi_error
  COMPAT  have_bio_bi_opf
  COMPAT  have_bio_bi_status
  COMPAT  have_bio_clone_fast
  COMPAT  have_bio_op_shift
  COMPAT  have_bio_set_dev
  COMPAT  have_bio_set_op_attrs
  COMPAT  have_bio_start_io_acct
  COMPAT  have_bioset_init
  COMPAT  have_bioset_need_bvecs
  COMPAT  have_blk_alloc_queue_rh
  COMPAT  have_blk_check_plugged
  COMPAT  have_blk_qc_t_make_request
  COMPAT  have_blk_queue_flag_set
  COMPAT  have_blk_queue_make_request
  COMPAT  have_blk_queue_merge_bvec
  COMPAT  have_blk_queue_plugged
  COMPAT  have_blk_queue_split_bio
  COMPAT  have_blk_queue_split_q_bio
  COMPAT  have_blk_queue_split_q_bio_bioset
  COMPAT  have_blk_queue_update_readahead
  COMPAT  have_blk_queue_write_cache
  COMPAT  have_d_inode
  COMPAT  have_fallthrough
  COMPAT  have_generic_start_io_acct_q_rw_sect_part
  COMPAT  have_generic_start_io_acct_rw_sect_part
  COMPAT  have_genl_family_parallel_ops
  COMPAT  have_hd_struct
  COMPAT  have_ib_cq_init_attr
  COMPAT  have_ib_get_dma_mr
  COMPAT  have_idr_is_empty
  COMPAT  have_inode_lock
  COMPAT  have_ktime_to_timespec64
  COMPAT  have_kvfree
  COMPAT  have_max_send_recv_sge
  COMPAT  have_nla_nest_start_noflag
  COMPAT  have_nla_parse_deprecated
  COMPAT  have_nla_put_64bit
  COMPAT  have_nla_strscpy
  COMPAT  have_part_stat_h
  COMPAT  have_part_stat_read_accum
  COMPAT  have_pointer_backing_dev_info
  COMPAT  have_proc_create_single
  COMPAT  have_queue_flag_stable_writes
  COMPAT  have_rb_declare_callbacks_max
  COMPAT  have_refcount_inc
  COMPAT  have_req_flush
  COMPAT  have_req_hardbarrier
  COMPAT  have_req_noidle
  COMPAT  have_req_nounmap
  COMPAT  have_req_op_write
  COMPAT  have_req_op_write_same
  COMPAT  have_req_op_write_zeroes
  COMPAT  have_req_prio
  COMPAT  have_req_write
  COMPAT  have_req_write_same
  COMPAT  have_revalidate_disk_size
  COMPAT  have_sched_set_fifo
  COMPAT  have_security_netlink_recv
  COMPAT  have_sendpage_ok
  COMPAT  have_set_capacity_and_notify
  COMPAT  have_shash_desc_zero
  COMPAT  have_simple_positive
  COMPAT  have_sock_set_keepalive
  COMPAT  have_struct_bvec_iter
  COMPAT  have_struct_kernel_param_ops
  COMPAT  have_struct_size
  COMPAT  have_submit_bio
  COMPAT  have_submit_bio_noacct
  COMPAT  have_tcp_sock_set_cork
  COMPAT  have_tcp_sock_set_nodelay
  COMPAT  have_tcp_sock_set_quickack
  COMPAT  have_time64_to_tm
  COMPAT  have_timer_setup
  COMPAT  have_void_make_request
  COMPAT  ib_alloc_pd_has_2_params
  COMPAT  ib_device_has_ops
  COMPAT  ib_post_send_const_params
  COMPAT  ib_query_device_has_3_params
  COMPAT  need_make_request_recursion
  COMPAT  part_stat_read_takes_block_device
  COMPAT  queue_limits_has_discard_zeroes_data
  COMPAT  rdma_create_id_has_net_ns
  COMPAT  sock_create_kern_has_five_parameters
  COMPAT  sock_ops_returns_addr_len
  UPD     /tmp/pkg/drbd-9.1.4/drbd/compat.4.19.90-25.9.v2101.ky10.aarch64.h
  UPD     /tmp/pkg/drbd-9.1.4/drbd/compat.h
make[4]: 'drbd-kernel-compat/cocci_cache/0227ebbe9035d69a25a6e7bee7eef61a/compat.patch' is up to date.
  PATCH
patching file ./drbd_int.h
patching file drbd-headers/linux/genl_magic_struct.h
patching file drbd_receiver.c
patching file drbd_main.c
patching file drbd_nla.c
patching file drbd_nl.c
patching file drbd_transport_tcp.c
patching file drbd_bitmap.c
patching file drbd_interval.c
patching file drbd_debugfs.c
patching file drbd_req.c
patching file drbd_state.c
patching file drbd_sender.c
patching file drbd-headers/linux/genl_magic_func.h
Hunk #2 succeeded at 312 (offset -20 lines).
  CC [M]  /tmp/pkg/drbd-9.1.4/drbd/drbd_debugfs.o
  CC [M]  /tmp/pkg/drbd-9.1.4/drbd/drbd_bitmap.o
  CC [M]  /tmp/pkg/drbd-9.1.4/drbd/drbd_proc.o
  CC [M]  /tmp/pkg/drbd-9.1.4/drbd/drbd_sender.o
  CC [M]  /tmp/pkg/drbd-9.1.4/drbd/drbd_receiver.o
  CC [M]  /tmp/pkg/drbd-9.1.4/drbd/drbd_req.o
  CC [M]  /tmp/pkg/drbd-9.1.4/drbd/drbd_actlog.o
  CC [M]  /tmp/pkg/drbd-9.1.4/drbd/lru_cache.o
  CC [M]  /tmp/pkg/drbd-9.1.4/drbd/drbd_main.o
  CC [M]  /tmp/pkg/drbd-9.1.4/drbd/drbd_strings.o
  CC [M]  /tmp/pkg/drbd-9.1.4/drbd/drbd_nl.o
  CC [M]  /tmp/pkg/drbd-9.1.4/drbd/drbd_interval.o
  CC [M]  /tmp/pkg/drbd-9.1.4/drbd/drbd_state.o
  CC [M]  /tmp/pkg/drbd-9.1.4/drbd/drbd-kernel-compat/drbd_wrappers.o
  CC [M]  /tmp/pkg/drbd-9.1.4/drbd/drbd_nla.o
  CC [M]  /tmp/pkg/drbd-9.1.4/drbd/drbd_transport.o
  GEN     /tmp/pkg/drbd-9.1.4/drbd/drbd_buildtag.c 
  CC [M]  /tmp/pkg/drbd-9.1.4/drbd/drbd_buildtag.o
  LD [M]  /tmp/pkg/drbd-9.1.4/drbd/drbd.o
  CC [M]  /tmp/pkg/drbd-9.1.4/drbd/drbd_transport_tcp.o
  Building modules, stage 2.
  MODPOST 2 modules
  CC      /tmp/pkg/drbd-9.1.4/drbd/drbd.mod.o
  LD [M]  /tmp/pkg/drbd-9.1.4/drbd/drbd.ko
  CC      /tmp/pkg/drbd-9.1.4/drbd/drbd_transport_tcp.mod.o
  LD [M]  /tmp/pkg/drbd-9.1.4/drbd/drbd_transport_tcp.ko
mv .drbd_kernelrelease.new .drbd_kernelrelease
Memorizing module configuration ... done.
make[1]: Leaving directory '/tmp/pkg/drbd-9.1.4/drbd'

        Module build was successful.
=======================================================================
  With DRBD module version 8.4.5, we split out the management tools
  into their own repository at https://github.com/LINBIT/drbd-utils
  (tarball at http://links.linbit.com/drbd-download)

  That started out as "drbd-utils version 8.9.0",
  has a different release cycle,
  and provides compatible drbdadm, drbdsetup and drbdmeta tools
  for DRBD module versions 8.3, 8.4 and 9.

  Again: to manage DRBD 9 kernel modules and above,
  you want drbd-utils >= 9.3 from above url.
=======================================================================
make -C drbd install
make[1]: Entering directory '/tmp/pkg/drbd-9.1.4/drbd'
install -d //lib/modules/4.19.90-25.9.v2101.ky10.aarch64/updates
set -e ; for ko in drbd.ko drbd_transport_tcp.ko; do \
        install -m 644 $ko //lib/modules/4.19.90-25.9.v2101.ky10.aarch64/updates; \
done
/sbin/depmod -a || :
make[1]: Leaving directory '/tmp/pkg/drbd-9.1.4/drbd'

DRBD version loaded:
version: 9.1.4 (api:2/proto:110-121)
GIT-hash: e4de25c3a65811b0fa4733b1c2a000ee322f5cfa build by @c8d9156d50f7, 2022-02-28 01:25:35
Transports (api:17): tcp (9.1.4)

Cannot write files to highly available NFS storage created with DRBD and Pacemaker (Permission denied error returned)

I am trying to set up highly available NFS storage with DRBD and Pacemaker (first time doing this) on 2 Fedora 38 VMs.

My main guidance on this endeavor were these 2 docs: doc1 doc2

I've managed to start the Pacemaker cluster and to mount the NFS shared folder on my hosts, but when I try to write something in that folder, I get a permission denied error.

Changing the mount point permission to 666 or 777 doesn't help.

Any idea what could be wrong?

My DRBD configs look like this:

#> sudo vi /etc/drbd.d/global_common.conf 
global {
 usage-count  yes;
}
common {
 disk {
    no-disk-flushes;
    no-disk-barrier;
    c-fill-target 24M;
    c-max-rate   720M;
    c-plan-ahead    15;
    c-min-rate     4M;
  }
  net {
    protocol C;
    max-buffers            36k;
    sndbuf-size            1024k;
    rcvbuf-size            2048k;
  }
}

#> sudo vi /etc/drbd.d/ha_nfs.res

resource ha_nfs {
  device "/dev/drbd1003";
  disk "/dev/nfs/share";
  meta-disk internal;
  on server1.test {
    address 192.168.1.116:7789;
  }
  on server2.test {
    address 192.168.1.167:7789;
  }
}

The Pacemaker config looks like this:

crm> configure edit
node 1: server1.test
node 2: server2.test
primitive p_drbd_attr ocf:linbit:drbd-attr
primitive p_drbd_ha_nfs ocf:linbit:drbd \
        params drbd_resource=ha_nfs \
        op monitor timeout=20s interval=21s role=Slave start-delay=12s \
        op monitor timeout=20s interval=20s role=Master start-delay=8s
primitive p_expfs_nfsshare_exports_HA exportfs \
        params clientspec="192.168.1.0/24" directory="/nfsshare/exports/HA" fsid=1003 unlock_on_stop=1 options="rw,mountpoint" \
        op monitor interval=15s timeout=40s start-delay=15s \
        op_params OCF_CHECK_LEVEL=0 \
        op start interval=0s timeout=40s \
        op stop interval=0s timeout=120s
primitive p_fs_nfsshare_exports_HA Filesystem \
        params device="/dev/drbd1003" directory="/nfsshare/exports/HA" fstype=ext4 run_fsck=no \
        op monitor interval=15s timeout=40s start-delay=15s \
        op_params OCF_CHECK_LEVEL=0 \
        op start interval=0s timeout=60s \
        op stop interval=0s timeout=60s
primitive p_nfsserver nfsserver
primitive p_pb_block portblock \
        params action=block ip=192.168.1.101 portno=2049 protocol=tcp
primitive p_pb_unblock portblock \
        params action=unblock ip=192.168.1.101 portno=2049 tickle_dir="/srv/drbd-nfs/nfstest/.tickle" reset_local_on_unblock_stop=1 protocol=tcp \
        op monitor interval=10s timeout=20s start-delay=15s
primitive p_virtip IPaddr2 \
        params ip=192.168.1.101 cidr_netmask=32 \
        op monitor interval=1s timeout=40s start-delay=0s \
        op start interval=0s timeout=20s \
        op stop interval=0s timeout=20s
ms ms_drbd_ha_nfs p_drbd_ha_nfs \
        meta master-max=1 master-node-max=1 clone-node-max=1 clone-max=2 notify=true
clone c_drbd_attr p_drbd_attr
colocation co_ha_nfs inf: p_pb_block p_virtip ms_drbd_ha_nfs:Master p_fs_nfsshare_exports_HA p_expfs_nfsshare_exports_HA p_nfsserver p_pb_unblock
property cib-bootstrap-options: \
        have-watchdog=false \
        cluster-infrastructure=corosync \
        cluster-name=nfsCluster \
        stonith-enabled=false \
        no-quorum-policy=ignore

PCS status output:

[bebe@server2 share]$ sudo pcs status
[sudo] password for bebe:
Cluster name: nfsCluster
Cluster Summary:
  * Stack: corosync (Pacemaker is running)
  * Current DC: server1.test (version 2.1.6-4.fc38-6fdc9deea29) - partition with quorum
  * Last updated: Thu Jul 13 08:50:34 2023 on server2.test
  * Last change:  Thu Jul 13 08:27:46 2023 by hacluster via crmd on server1.test
  * 2 nodes configured
  * 10 resource instances configured

Node List:
  * Online: [ server1.test server2.test ]

Full List of Resources:
  * p_virtip    (ocf::heartbeat:IPaddr2):        Started server2.test
  * p_expfs_nfsshare_exports_HA (ocf::heartbeat:exportfs):       Started server2.test
  * p_fs_nfsshare_exports_HA    (ocf::heartbeat:Filesystem):     Started server2.test
  * p_nfsserver (ocf::heartbeat:nfsserver):      Started server2.test
  * p_pb_block  (ocf::heartbeat:portblock):      Started server2.test
  * p_pb_unblock        (ocf::heartbeat:portblock):      Started server2.test
  * Clone Set: ms_drbd_ha_nfs [p_drbd_ha_nfs] (promotable):
    * Masters: [ server2.test ]
    * Slaves: [ server1.test ]
  * Clone Set: c_drbd_attr [p_drbd_attr]:
    * Started: [ server1.test server2.test ]

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

DRBD status output:

[bebe@server2 share]$ sudo drbdadm status ha_nfs
ha_nfs role:Primary
  disk:UpToDate
  peer role:Secondary
    replication:Established peer-disk:UpToDate

Multi-Primary support

Hello!

I would like to know whether the latest version supports multi-primary resources on more than 2 nodes (e.g. 3 or more nodes).

Thanks,
Marco

Can the root partition be put under DRBD without rebooting?

I have already put data disks other than the root partition under DRBD. For the root partition, however, I have to reboot and set up DRBD from the initramfs. Is there a way to put the root partition under DRBD without rebooting?

DRBD resources change to `Consistent` State after the disk node is powered off and on.

My HA testing on DRBD has shown that, from time to time, some DRBD resources go into the "Consistent" state when the node carrying them is powered off and then powered on again after a prolonged period of time.

Setup 1 -

This happens 7/10 times on the following setup -

3 nodes with 1 disk node and 2 diskless nodes.
Replication and auto-eviction are turned off.

Test case procedures -

  1. Shutdown the ONLY disk node and wait for 30 mins before powering on.
  2. linstor resource list shows some resources in Consistent state while their Diskless counterpart is Usage = InUse

There are only 2 replicas associated with a single resource - Diskless and Disk.

As a workaround,
kubectl exec -n piraeus <ns_pod-name> -c linstor-satellite -- drbdsetup disconnect <Consistent-state-resource-name> <node-id of diskless peer>
kubectl exec -n piraeus <ns_pod-name> -c linstor-satellite -- drbdsetup connect <Consistent-state-resource-name> <node-id of diskless peer>
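
For repeated test runs, the two workaround commands above can be generated from a few parameters. The Python sketch below only builds the argument lists; the function name `reconnect_commands` and its defaults are hypothetical, and the command layout is copied from the two lines above.

```python
from typing import List


def reconnect_commands(pod: str, resource: str, peer_node_id: int,
                       namespace: str = "piraeus",
                       container: str = "linstor-satellite") -> List[List[str]]:
    """Build the drbdsetup disconnect/connect commands as argv lists."""
    base = ["kubectl", "exec", "-n", namespace, pod,
            "-c", container, "--", "drbdsetup"]
    # Disconnect first, then connect again to the same peer node id.
    return [base + [action, resource, str(peer_node_id)]
            for action in ("disconnect", "connect")]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` in order. Keeping the commands as lists avoids shell quoting problems with resource names.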

This brings the resource back into the UpToDate state. Sometimes, however, the workaround puts the resource into Outdated instead. That is a different problem, and I don't know how to recover from it when this is the only physical replica available in the cluster and the Diskless resource is connected to it.

Setup 2 -

This issue happens almost 2/10 times on a 3 node cluster with 2 disk nodes and 1 diskless node.
Replication is turned on and auto-eviction is turned off.

Test case procedures -

  1. Shutdown any disk node and wait for 30 mins before powering on.
  2. linstor resource list shows some resources in Consistent state.
    This case is easy to work around: because replication is turned on, the other replica becomes Primary and starts to serve data. I can then use drbdsetup disconnect and connect, and even delete this resource when it goes into an Outdated state.

However, it is not straightforward if the application replicates data by itself and does not use DRBD for replication. In that case DRBD replication is turned off for the resource, which leads back to a situation similar to Setup 1.

For instance, on a 3 node cluster with 1 disk node and 2 diskless nodes -

linstor resource list
Please see below for a complete set of resources. pvc-ae4bb911-a227-4dde-a81a-2911d1c14aac is marked as Consistent state

k exec --namespace=piraeus deployment/piraeus-op-piraeus-operator-cs-controller -- linstor r l
+--------------------------------------------------------------------------------------------------------------------------------+
| ResourceName | Node | Port | Usage | Conns | State | CreatedOn |
|================================================================================================================================|
| pvc-5bac2d82-9e42-4e0d-b828-3c598f2d8795 | flex188-126.dr.avaya.com | 7009 | Unused | Ok | UpToDate | 2022-03-22 04:58:52 |
| pvc-5bac2d82-9e42-4e0d-b828-3c598f2d8795 | flex188-128.dr.avaya.com | 7009 | InUse | Ok | Diskless | 2022-03-22 04:58:53 |
| pvc-16c0b34b-bed3-4219-8b9f-415e8d1734fb | flex188-126.dr.avaya.com | 7005 | InUse | Ok | UpToDate | 2022-03-22 04:56:35 |
| pvc-19dc5cea-733a-41b3-bd83-a2c4ea5012da | flex188-126.dr.avaya.com | 7004 | InUse | Ok | UpToDate | 2022-03-22 04:56:31 |
| pvc-32e7a7bf-f0c2-4bca-941b-102780fcf7bd | flex188-126.dr.avaya.com | 7003 | Unused | Ok | UpToDate | 2022-03-22 05:49:52 |
| pvc-32e7a7bf-f0c2-4bca-941b-102780fcf7bd | flex188-128.dr.avaya.com | 7003 | InUse | Ok | Diskless | 2022-03-22 05:49:54 |
| pvc-367dca54-39c8-415e-9633-0295730bbd44 | flex188-126.dr.avaya.com | 7002 | InUse | Ok | UpToDate | 2022-03-21 04:47:47 |
| pvc-69090137-263d-4cca-b402-02fd5f377041 | flex188-126.dr.avaya.com | 7008 | Unused | Ok | UpToDate | 2022-03-22 04:56:48 |
| pvc-69090137-263d-4cca-b402-02fd5f377041 | flex188-128.dr.avaya.com | 7008 | InUse | Ok | Diskless | 2022-03-22 04:56:52 |
| pvc-a36f02aa-da1a-4eaf-b797-b035dfcd5a22 | flex188-126.dr.avaya.com | 7006 | Unused | Ok | UpToDate | 2022-03-22 04:56:36 |
| pvc-a36f02aa-da1a-4eaf-b797-b035dfcd5a22 | flex188-127.dr.avaya.com | 7006 | InUse | Ok | Diskless | 2022-03-22 04:56:38 |
| pvc-ae4bb911-a227-4dde-a81a-2911d1c14aac | flex188-126.dr.avaya.com | 7012 | Unused | Ok | Consistent | 2022-03-22 06:22:05 |
| pvc-ae4bb911-a227-4dde-a81a-2911d1c14aac | flex188-128.dr.avaya.com | 7012 | InUse | Ok | Diskless | 2022-03-22 16:20:24 |
| pvc-c806c751-1efd-405b-bf63-22c7ffc53ede | flex188-126.dr.avaya.com | 7007 | InUse | Ok | UpToDate | 2022-03-22 04:56:48 |
| pvc-cf52d305-926d-40b3-95b8-72c07b623d19 | flex188-126.dr.avaya.com | 7001 | InUse | Ok | UpToDate | 2022-03-21 04:47:46 |
| pvc-d39187c2-d560-42db-8dc3-c6e57505ae72 | flex188-126.dr.avaya.com | 7000 | Unused | Ok | UpToDate | 2022-03-21 04:07:50 |
| pvc-d39187c2-d560-42db-8dc3-c6e57505ae72 | flex188-127.dr.avaya.com | 7000 | InUse | Ok | Diskless | 2022-03-21 04:07:53 |
+--------------------------------------------------------------------------------------------------------------------------------+
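
With this many rows it is easy to miss the one that is off. A minimal sketch for filtering the listing, assuming the pipe-delimited layout shown above. The function names and the choice of "UpToDate" and "Diskless" as the expected states are assumptions for illustration, not part of LINSTOR:

```python
def parse_linstor_table(text):
    """Parse the pipe-delimited table printed by `linstor r l` into dicts."""
    rows = []
    header = None
    for line in text.splitlines():
        line = line.strip()
        # Skip borders (+---+), separators (|===|) and blank lines.
        if not line.startswith("|") or set(line) <= set("|=+- "):
            continue
        cells = [c.strip() for c in line.strip("|").split("|")]
        if header is None:
            header = cells  # first data-like line carries the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def unexpected_states(rows, expected=("UpToDate", "Diskless")):
    """Return rows whose State is not in the (assumed) expected set."""
    return [r for r in rows if r["State"] not in expected]


# Three rows copied from the listing above.
SAMPLE = """
| ResourceName | Node | Port | Usage | Conns | State | CreatedOn |
|=====================================================================|
| pvc-a36f02aa-da1a-4eaf-b797-b035dfcd5a22 | flex188-127.dr.avaya.com | 7006 | InUse | Ok | Diskless | 2022-03-22 04:56:38 |
| pvc-ae4bb911-a227-4dde-a81a-2911d1c14aac | flex188-126.dr.avaya.com | 7012 | Unused | Ok | Consistent | 2022-03-22 06:22:05 |
| pvc-ae4bb911-a227-4dde-a81a-2911d1c14aac | flex188-128.dr.avaya.com | 7012 | InUse | Ok | Diskless | 2022-03-22 16:20:24 |
"""

for row in unexpected_states(parse_linstor_table(SAMPLE)):
    print(row["ResourceName"], row["Node"], row["State"])
```

On the full listing this leaves only the pvc-ae4bb911 replica on flex188-126, which is the one reported as Consistent.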

k exec -n piraeus piraeus-op-piraeus-operator-ns-node-cv9qs -c linstor-satellite -- drbdadm dstate pvc-ae4bb911-a227-4dde-a81a-2911d1c14aac
Consistent/Diskless

k exec -n piraeus piraeus-op-piraeus-operator-ns-node-cv9qs -c linstor-satellite -- drbdadm cstate pvc-ae4bb911-a227-4dde-a81a-2911d1c14aac
Connected

k exec -n piraeus piraeus-op-piraeus-operator-ns-node-cv9qs -c linstor-satellite -- drbdadm dump pvc-ae4bb911-a227-4dde-a81a-2911d1c14aac

# resource pvc-ae4bb911-a227-4dde-a81a-2911d1c14aac on flex188-126.dr.avaya.com: not ignored, not stacked
# defined at /var/lib/linstor.d/pvc-ae4bb911-a227-4dde-a81a-2911d1c14aac.res:6
resource pvc-ae4bb911-a227-4dde-a81a-2911d1c14aac {
    on flex188-126.dr.avaya.com {
        node-id 0;
        volume 0 {
            disk {
                discard-zeroes-if-aligned yes;
                rs-discard-granularity 262144;
            }
            device       minor 1012;
            disk         /dev/vg_sds/pvc-ae4bb911-a227-4dde-a81a-2911d1c14aac_00000;
            meta-disk    internal;
        }
    }
    on flex188-128.dr.avaya.com {
        node-id 1;
        volume 0 {
            disk {
                discard-zeroes-if-aligned yes;
                rs-discard-granularity 262144;
            }
            device       minor 1012;
            disk         none;
            meta-disk    internal;
        }
    }
    connection {
        host flex188-126.dr.avaya.com         address         ipv4 10.129.188.126:7012;
        host flex188-128.dr.avaya.com         address         ipv4 10.129.188.128:7012;
        net {
            _name        flex188-128.dr.avaya.com;
        }
    }
    options {
        quorum           off;
    }
    net {
        cram-hmac-alg    sha1;
        shared-secret    "GY+QykHkQDGxatroysuB";
        max-buffers      10000;
        max-epoch-size   10000;
        protocol         C;
        sndbuf-size      0;
        verify-alg       crct10dif-pclmul;
    }
}


Software versions:

k exec --namespace=piraeus deployment/piraeus-op-piraeus-operator-cs-controller -- linstor --version
linstor 1.13.0; GIT-hash: 840cf57c75c166659509e22447b2c0ca6377ee6d

k exec --namespace=piraeus deployment/piraeus-op-piraeus-operator-cs-controller -- drbdadm -V
DRBDADM_BUILDTAG=GIT-hash:\ 087ee6b4961ca154d76e4211223b03149373bed8\ build\ by\ @buildsystem,\ 2022-01-28\ 12:19:33
DRBDADM_API_VERSION=2
DRBD_KERNEL_VERSION_CODE=0x090106
DRBD_KERNEL_VERSION=9.1.6
DRBDADM_VERSION_CODE=0x091402
DRBDADM_VERSION=9.20.2
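
The hex codes and the dotted versions above line up if each component occupies one byte. A small sketch of that decoding; the one-byte-per-component layout is inferred from the two pairs printed here, and the helper name is made up for illustration:

```python
def decode_version_code(code):
    """Split a 0xMMmmpp version code into (major, minor, patch).

    Assumes one byte per component, as the values above suggest.
    """
    return (code >> 16) & 0xFF, (code >> 8) & 0xFF, code & 0xFF


# Values copied from the `drbdadm -V` output above.
print(decode_version_code(0x090106))  # kernel module: 9.1.6
print(decode_version_code(0x091402))  # drbdadm: 9.20.2 (0x14 is 20)
```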

piraeus-operator-1.8.0

uname -a
4.18.0-348.20.1.el8_5.x86_64 #1 SMP Tue Mar 8 12:56:54 EST 2022 x86_64 x86_64 x86_64 GNU/Linux

New kernel 5.10 compile issue on Debian: Fatal Python error: init_fs_encoding: failed to get the Python codec of the filesystem encoding

Hi,

I say "new" because drbd has compiled fine for me before, even in this same environment (Debian 11, x86_64).

I'm using coccinelle 1.1.0.deb-1.1, which ships with Debian 11 and appears to be a sufficient version according to the documentation. Python is version 3.9. I haven't added anything to the environment.

The compile ends here:

  COMPAT  have_time64_to_tm
  COMPAT  have_timer_setup
  COMPAT  have_void_make_request
  COMPAT  have_void_submit_bio
  COMPAT  ib_alloc_pd_has_2_params
  COMPAT  ib_device_has_ops
  COMPAT  ib_post_send_const_params
  COMPAT  ib_query_device_has_3_params
  COMPAT  need_drbd_wrappers
  COMPAT  need_make_request_recursion
  COMPAT  part_stat_read_takes_block_device
  COMPAT  queue_limits_has_discard_zeroes_data
  COMPAT  rdma_create_id_has_net_ns
  COMPAT  sock_create_kern_has_five_parameters
  COMPAT  sock_ops_returns_addr_len
  COMPAT  struct_gendisk_has_backing_dev_info
  UPD     /usr/local/src/drbd/drbd/compat.5.10.167.h
  UPD     /usr/local/src/drbd/drbd/compat.h
  GENPATCHNAMES   5.10.167
  SPATCH   23697dbdc9be21da652a59bd3892db55  5.10.167
    drbd-kernel-compat/cocci_cache/23697dbdc9be21da652a59bd3892db55/.compat.cocci
    : Python path configuration:
    :   PYTHONHOME = '/usr/lib/x86_64-linux-gnu/..'
    :   PYTHONPATH = '/usr/bin/../lib/coccinelle/python'
    :   program name = 'python3'
    :   isolated = 0
    :   environment = 1
    :   user site = 1
    :   import site = 1
    :   sys._base_executable = '/usr/bin/python3'
    :   sys.base_prefix = '/usr/lib/x86_64-linux-gnu/..'
    :   sys.base_exec_prefix = '/usr/lib/x86_64-linux-gnu/..'
    :   sys.platlibdir = 'lib'
    :   sys.executable = '/usr/bin/python3'
    :   sys.prefix = '/usr/lib/x86_64-linux-gnu/..'
    :   sys.exec_prefix = '/usr/lib/x86_64-linux-gnu/..'
    :   sys.path = [
    :     '/usr/bin/../lib/coccinelle/python',
    :     '/usr/lib/x86_64-linux-gnu/../lib/python39.zip',
    :     '/usr/lib/x86_64-linux-gnu/../lib/python3.9',
    :     '/usr/lib/x86_64-linux-gnu/../lib/python3.9/lib-dynload',
    :   ]
    : Fatal Python error: init_fs_encoding: failed to get the Python codec of the filesystem encoding
    : Python runtime state: core initialized
    : ModuleNotFoundError: No module named 'encodings'
    : 
    : Current thread 0x00007f5f384a7180 (most recent call first):
    : <no Python frame>
make[4]: *** [Makefile:177: drbd-kernel-compat/cocci_cache/23697dbdc9be21da652a59bd3892db55/compat.patch] Error 1
make[3]: *** [/usr/local/src/drbd/drbd/Kbuild:122: /usr/local/src/drbd/drbd/drbd-kernel-compat/compat.patch] Error 2
make[2]: *** [Makefile:1819: /usr/local/src/drbd/drbd] Error 2
make[1]: *** [Makefile:134: kbuild] Error 2
make[1]: Leaving directory '/usr/local/src/drbd/drbd'
make: *** [Makefile:126: module] Error 2
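
One possible reading of the log above: the standard library is looked up under PYTHONHOME plus lib/python3.9, and with the PYTHONHOME shown that collapses to a directory that most likely does not exist, which would explain the missing 'encodings' module. A minimal sketch of the path arithmetic only; the helper name is hypothetical, the normalization is purely lexical (symlinks are ignored), and "/usr" as the usual Debian prefix is an assumption:

```python
import posixpath


def stdlib_dir(python_home, version="3.9"):
    """Return the normalized directory <python_home>/lib/python<version>."""
    return posixpath.normpath(
        posixpath.join(python_home, "lib", "python" + version)
    )


# PYTHONHOME exactly as printed in the log.
print(stdlib_dir("/usr/lib/x86_64-linux-gnu/.."))  # /usr/lib/lib/python3.9

# For comparison, the prefix a stock Debian python3 is assumed to use.
print(stdlib_dir("/usr"))  # /usr/lib/python3.9
```

If that reading is right, the question is what sets PYTHONHOME to that value when spatch starts its embedded Python; the log does not show this.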
