
Comments (12)

BlaineEXE commented on July 17, 2024

I have seen logs with text similar to "OSD ID exists and does not match my key" when there is an old OSD present on a device that hasn't been fully wiped after a previous Rook/Ceph deployment. It's likely that you need to run sgdisk --zap-all on the disk in question.

I suspect this may also resolve the unused space issues you're seeing.
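
For reference, a wipe along the lines of the Rook cleanup docs might look like this (a minimal sketch; the device name is an example and should be replaced with the disk in question):

DISK=/dev/nvme5n1                                              # example device, adjust as needed
sgdisk --zap-all "$DISK"                                       # clear GPT/MBR partition tables
dd if=/dev/zero of="$DISK" bs=1M count=100 oflag=direct,dsync  # zero the start of the disk to remove bluestore labels
partprobe "$DISK"                                              # ask the kernel to re-read the now-empty partition table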


bdowling commented on July 17, 2024

I'll check, but I'm not sure that is the same issue here. These are all new hosts that have never had OSDs on these disks; they did have LVM volumes that were cleared before doing discovery. We did recently remove a number of old OSDs from other nodes, as we are trying to migrate storage to new nodes.


bdowling commented on July 17, 2024

I may have more issues going on; I found a number of OSDs in CrashLoopBackOff state reporting:

debug 2024-05-17T23:29:41.506+0000 7f048da93700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2]

What I discovered is that the key in the keyring file for these OSDs (/var/lib/rook/rook-ceph/*/keyring, with corresponding whoami files) does not match what 'ceph auth get osd.[id]' returns. After manually fixing these keys the OSDs are able to start, but the keyring files are replaced again with the incorrect keys at some point (restarting the OSD pod seems to trigger this).

Not sure exactly what process is creating these keyring files or why they have the incorrect key in them.
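
For anyone hitting the same symptom, a minimal sketch of that comparison (the OSD directory is a placeholder, and the ceph command assumes admin access, e.g. from the rook-ceph-tools pod):

OSD_DIR=/var/lib/rook/rook-ceph/<cluster-fsid>_<osd-fsid>   # placeholder for one OSD's data dir
OSD_ID=$(cat "$OSD_DIR/whoami")                             # the OSD id this directory claims
grep 'key = ' "$OSD_DIR/keyring"                            # key currently on disk
ceph auth get "osd.$OSD_ID" | grep 'key = '                 # key the cluster actually expects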


bdowling commented on July 17, 2024

I'm finding we have 31 duplicate OSD IDs on active storage nodes; a sample:

  ng-xdzv-ef254 /var/lib/rook/rook-ceph/3aaaed8f-7f05-4a4e-a780-70d0b19416fb_06040623-9669-47d0-83ec-2c3b796474a5/whoami:429
  ng-xdzv-2a0c7 /var/lib/rook/rook-ceph/3aaaed8f-7f05-4a4e-a780-70d0b19416fb_c28193f2-f527-43d6-aa1f-14287d5a13ec/whoami:429

  ng-xdzv-65657 /var/lib/rook/rook-ceph/3aaaed8f-7f05-4a4e-a780-70d0b19416fb_e39861f0-89b7-4c80-8b32-e23653d8f651/whoami:431
  ng-xdzv-da6f6 /var/lib/rook/rook-ceph/3aaaed8f-7f05-4a4e-a780-70d0b19416fb_359674c3-74d0-4836-846c-a22d2249efc9/whoami:431

  ng-xdzv-65657 /var/lib/rook/rook-ceph/3aaaed8f-7f05-4a4e-a780-70d0b19416fb_b649ea39-c19d-4577-9388-a2a108804df7/whoami:432
  ng-xdzv-2a0c7 /var/lib/rook/rook-ceph/3aaaed8f-7f05-4a4e-a780-70d0b19416fb_293a8678-d9a3-4227-9df6-562284b9bdbf/whoami:432
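
One rough way to surface these duplicates (a sketch; all-whoami.txt is a hypothetical file built by concatenating the per-node output):

grep -H . /var/lib/rook/rook-ceph/*_*/whoami | sed "s/^/$(hostname -s) /"   # run on each storage node
awk -F: '{print $NF}' all-whoami.txt | sort -n | uniq -d                    # IDs that appear more than once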


bdowling commented on July 17, 2024

FWIW, here is the error that occurs when creating new OSDs in prepare:

2024-05-21 03:24:59.707950 D | exec: Running command: stdbuf -oL ceph-volume --log-path /tmp/ceph-log lvm batch --prepare --bluestore --yes --osds-per-device 4 /dev/nvme5n1
2024-05-21 03:25:00.324811 D | exec: --> DEPRECATION NOTICE
2024-05-21 03:25:00.324835 D | exec: --> You are using the legacy automatic disk sorting behavior
2024-05-21 03:25:00.324838 D | exec: --> The Pacific release will change the default to --no-auto
2024-05-21 03:25:00.324840 D | exec: --> passed data devices: 1 physical, 0 LVM
2024-05-21 03:25:00.324842 D | exec: --> relative data size: 0.25
2024-05-21 03:25:00.324843 D | exec: Running command: /usr/bin/ceph-authtool --gen-print-key
2024-05-21 03:25:00.324846 D | exec: Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 89db0ce4-5283-4628-b183-ae28a03a52d9
2024-05-21 03:25:00.324848 D | exec:  stderr: Error EEXIST: entity osd.2 exists but key does not match
2024-05-21 03:25:00.325070 D | exec: Traceback (most recent call last):
2024-05-21 03:25:00.325073 D | exec:   File "/usr/sbin/ceph-volume", line 11, in <module>
2024-05-21 03:25:00.325074 D | exec:     load_entry_point('ceph-volume==1.0.0', 'console_scripts', 'ceph-volume')()
2024-05-21 03:25:00.325076 D | exec:   File "/usr/lib/python3.6/site-packages/ceph_volume/main.py", line 41, in __init__
2024-05-21 03:25:00.325078 D | exec:     self.main(self.argv)
2024-05-21 03:25:00.325080 D | exec:   File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 59, in newfunc
2024-05-21 03:25:00.325082 D | exec:     return f(*a, **kw)
2024-05-21 03:25:00.325084 D | exec:   File "/usr/lib/python3.6/site-packages/ceph_volume/main.py", line 153, in main
2024-05-21 03:25:00.325086 D | exec:     terminal.dispatch(self.mapper, subcommand_args)
2024-05-21 03:25:00.325088 D | exec:   File "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line 194, in dispatch
2024-05-21 03:25:00.325089 D | exec:     instance.main()
2024-05-21 03:25:00.325091 D | exec:   File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/main.py", line 46, in main
2024-05-21 03:25:00.325092 D | exec:     terminal.dispatch(self.mapper, self.argv)
2024-05-21 03:25:00.325094 D | exec:   File "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line 194, in dispatch
2024-05-21 03:25:00.325095 D | exec:     instance.main()
2024-05-21 03:25:00.325097 D | exec:   File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 16, in is_root
2024-05-21 03:25:00.325099 D | exec:     return func(*a, **kw)
2024-05-21 03:25:00.325100 D | exec:   File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/batch.py", line 414, in main
2024-05-21 03:25:00.325102 D | exec:     self._execute(plan)
2024-05-21 03:25:00.325104 D | exec:   File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/batch.py", line 429, in _execute
2024-05-21 03:25:00.325105 D | exec:     p.safe_prepare(argparse.Namespace(**args))
2024-05-21 03:25:00.325107 D | exec:   File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/prepare.py", line 196, in safe_prepare
2024-05-21 03:25:00.325108 D | exec:     self.prepare()
2024-05-21 03:25:00.325110 D | exec:   File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 16, in is_root
2024-05-21 03:25:00.325112 D | exec:     return func(*a, **kw)
2024-05-21 03:25:00.325113 D | exec:   File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/prepare.py", line 236, in prepare
2024-05-21 03:25:00.325115 D | exec:     self.osd_id = prepare_utils.create_id(osd_fsid, json.dumps(secrets), osd_id=self.args.osd_id)
2024-05-21 03:25:00.325116 D | exec:   File "/usr/lib/python3.6/site-packages/ceph_volume/util/prepare.py", line 154, in create_id
2024-05-21 03:25:00.325117 D | exec:     raise RuntimeError('Unable to create a new OSD id')
2024-05-21 03:25:00.325119 D | exec: RuntimeError: Unable to create a new OSD id


bdowling commented on July 17, 2024

Can someone provide any insight as to where to view/repair the source of this conflict? Rook has a number of rook-ceph-osd-NN deployments that no longer seem to match active OSDs in the cluster, and this is preventing new OSDs from being prepared. For example, I tried to use the purge job to remove the conflicting osd.2 above, and it is not in the osd tree/dump...

2024-05-21 03:37:12.809757 I | clusterdisruption-controller: osd "rook-ceph-osd-12" is down but no node drain is detected
2024-05-21 03:37:12.809839 I | clusterdisruption-controller: osd "rook-ceph-osd-2" is down but no node drain is detected
2024-05-21 03:37:12.809920 I | clusterdisruption-controller: osd "rook-ceph-osd-424" is down but no node drain is detected
2024-05-21 03:37:12.810020 I | clusterdisruption-controller: osd "rook-ceph-osd-54" is down but no node drain is detected
2024-05-21 03:37:12.810101 I | clusterdisruption-controller: osd "rook-ceph-osd-251" is down but no node drain is detected
2024-05-21 03:37:12.810178 I | clusterdisruption-controller: osd "rook-ceph-osd-20" is down but no node drain is detected
2024-05-21 03:37:12.810259 I | clusterdisruption-controller: osd "rook-ceph-osd-255" is down but no node drain is detected
2024-05-21 03:37:12.810338 I | clusterdisruption-controller: osd "rook-ceph-osd-6" is down but no node drain is detected
2024-05-21 03:41:04.404666 D | exec: Running command: ceph osd dump --connect-timeout=15 --cluster=rook-ceph --conf=/var/lib/rook/rook-ceph/rook-ceph.config --name=client.admin --keyring=/var/lib/rook/rook-ceph/client.admin.keyring --format json
2024-05-21 03:41:04.733780 I | cephosd: validating status of osd.2
2024-05-21 03:41:04.733835 C | rookcmd: failed to get osd status for osd 2: not found osd.2 in OSDDump
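
One way to confirm the mismatch (a sketch, run from the rook-ceph-tools pod or with an admin keyring) is to check whether the OSD is still in the CRUSH tree and OSD map while only its auth entry remains:

ceph osd tree | grep -w 'osd.2'   # is it still in the CRUSH tree?
ceph osd dump | grep -w 'osd.2'   # is it still in the OSD map?
ceph auth get osd.2               # a stale auth entry may remain even though the OSD is gone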


satoru-takeuchi commented on July 17, 2024

Can someone provide any insight as to where to view/repair the source of this conflict?

I'm the assignee of this issue. I'll read this issue carefully tomorrow (I'm currently on PTO).


bdowling commented on July 17, 2024

What I think I discovered is two things:

  1. Some of the duplicate OSDs existed on nodes with auth that did not match what was in ceph, so those OSDs could not start because another node was using the same ID with the correct key.

  2. The complaints during discovery occurred when it tried to reuse an old OSD id that no longer existed in ceph but whose 'user' auth still existed. Removing the stale auth entries for non-existent OSDs has now allowed it to create new OSDs.

bash-4.4$ for id in $(cat); do ceph auth del osd.$id; done
2       12      54      242     245     247     255     258     298     316     396     412     424
6       20      72      244     246     251     256     288     305     371     406     416     431

However, I still believe there is a race condition in selecting IDs and creating auth for new OSDs. I have multiple new nodes/drives being discovered, and the prepare jobs are crashing out with the same "exec: stderr: Error EEXIST: entity osd.NN exists but key does not match" error message. I think it would be great if this ID selection was somehow a more atomic operation.


bdowling commented on July 17, 2024

FYI, I monitored the provisioning process, and anytime I saw an auth conflict I deleted the auth entry so it was able to retry and proceed... Below is an example where 4 separate jobs all tried to assign osd.433. While provisioning some 300 OSDs, there were times when it conflicted 44 times and I had to go wipe the disks and let it retry. Other times, as below, I was able to stay on top of the auth conflicts and help it along.

while true; do for pod in $(kubectl get pods --no-headers -l app=rook-ceph-osd-prepare |grep -P 'Error|Crash'|awk '{print $1}'); do kubectl logs $pod |grep EEXIST; done; sleep 5; done
2024-05-21 06:52:23.076728 D | exec:  stderr: Error EEXIST: entity osd.433 exists but key does not match
[2024-05-21 06:52:23,072][ceph_volume.process][INFO  ] stderr Error EEXIST: entity osd.433 exists but key does not match
rook-ceph-osd-prepare-ng-xdzv7327xu-37018-bhx9l   0/1     CrashLoopBackOff   5 (30s ago)   5m26s
rook-ceph-osd-prepare-ng-xdzv7327xu-5662e-rcksc   0/1     CrashLoopBackOff   5 (32s ago)   5m
rook-ceph-osd-prepare-ng-xdzv7327xu-947c0-8bxn9   0/1     CrashLoopBackOff   5 (30s ago)   5m55s
rook-ceph-osd-prepare-ng-xdzv7327xu-da6f6-fcdhj   0/1     CrashLoopBackOff   5 (28s ago)   5m32s
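
A hedged sketch of automating that cleanup step (the pod name is the example above; the rook-ceph namespace and rook-ceph-tools deployment are assumptions, and the grep relies on GNU grep's -P option):

POD=rook-ceph-osd-prepare-ng-xdzv7327xu-37018-bhx9l   # one of the failed prepare pods above
ID=$(kubectl -n rook-ceph logs "$POD" | grep -oP 'entity osd\.\K[0-9]+' | tail -1)
if [ -n "$ID" ]; then
  kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph auth del "osd.$ID"   # remove the conflicting auth so the job can retry
fi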

Check for auth entries that exist without a corresponding OSD (I know this could be faster and would also be more robust using JSON; just a quick script):

for osd in $(ceph auth ls |grep ^osd.); do ceph osd tree |grep -q -w $osd || echo "DANGLING AUTH: $osd"; done
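
For what it's worth, a JSON-based variant of the same check (a sketch assuming jq is available and the usual ceph auth ls / ceph osd tree JSON layouts):

# auth entities of type osd that have no matching entry in the OSD tree
comm -23 \
  <(ceph auth ls -f json | jq -r '.auth_dump[].entity' | grep '^osd\.' | sort) \
  <(ceph osd tree -f json | jq -r '.nodes[] | select(.type=="osd") | .name' | sort)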


satoru-takeuchi commented on July 17, 2024

@bdowling Many people have encountered a similar or the same problem. To resolve it, ceph auth del has been used. You are hitting this problem with high frequency because the discovery daemons try to create OSDs in parallel and you have both many disks and many nodes.

I think it would be great if this ID selection was somehow a more atomic operation.

OSD ID allocation is done by the ceph osd new command. Since Rook can't touch this logic, please open an issue in the Ceph issue tracker if you'd like to make it completely atomic.

As a workaround, disabling the discovery daemon (it's disabled by default) might help. Could you try the following steps?

  1. Set "ROOK_ENABLE_DISCOVERY_DAEMON" to "false" in rook-ceph-operator-config configmap.
  2. Restart the operator. The operator will then create osd-prepare pods if necessary.

It doesn't resolve this problem completely. However, I believe it reduces the parallelism of OSD creation and therefore reduces OSD ID conflicts.
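
The steps above might look like this with kubectl (a sketch assuming the default rook-ceph namespace and operator deployment name):

kubectl -n rook-ceph patch configmap rook-ceph-operator-config \
  --type merge -p '{"data":{"ROOK_ENABLE_DISCOVERY_DAEMON":"false"}}'
kubectl -n rook-ceph rollout restart deployment rook-ceph-operator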


travisn commented on July 17, 2024

@guits How exactly does ceph-volume allocate the OSD ID? This issue with a large number of OSDs being created in parallel makes it clear that it is not atomic, which causes quite a problem for large clusters.


bdowling commented on July 17, 2024

I'm going to mark this issue closed for now. In testing, I was unable to get duplicate osd IDs with ceph osd new, so I suspect something else was going on, like the old auth never having been deleted when prior OSDs were purged. I'll revisit if I see this recur.

I went looking for where that code actually does this work in Ceph, but got lost in the indirection trying to find the ceph osd new functions.

e.g. simple testing...

# run 16 "ceph osd new" calls in parallel and check whether any duplicate IDs were handed out
( for i in $(seq 16); do ceph osd new "$(uuidgen)" & done; wait ) > /tmp/osd-ids
sort /tmp/osd-ids | uniq -dc

