Comments (12)
I have seen logs with text similar to "OSD ID exists and does not match my key" when there is an old OSD present on a device that wasn't fully wiped after a previous Rook/Ceph deployment. You likely need to run sgdisk --zap-all on the disk in question.
I suspect this may also resolve the unused-space issues you're seeing.
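For reference, a wipe along those lines might look like the following sketch (the device name is a placeholder, and this irreversibly destroys all data on the disk):

```shell
# DANGER: irreversibly destroys all data on the given device.
# Wrapped in a function so the device is an explicit argument.
wipe_disk() {
    sgdisk --zap-all "$1"   # clear GPT and MBR partition structures
    wipefs --all "$1"       # clear filesystem and LVM signatures
}

# Example (placeholder device): wipe_disk /dev/nvme5n1
```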
from rook.
I'll check, but I'm not sure that's the same issue here. These are all new hosts that have never had OSDs on these disks; they did have LVM volumes that were cleared before running discovery. We did recently remove a number of old OSDs from other nodes, as we are trying to migrate storage to new nodes.
from rook.
I may have more issues going on; I found a number of OSDs in CrashLoopBackOff state reporting:
debug 2024-05-17T23:29:41.506+0000 7f048da93700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2]
What I discovered is that the key in the keyring file for these OSDs (/var/lib/rook/rook-ceph/*/keyring, alongside the corresponding whoami files) does not match what 'ceph auth get osd.[id]' returns. After manually fixing these keys, the OSDs are able to start; but the keyring files are later replaced again with the incorrect keys (restarting the OSD pod seems to trigger this).
I'm not sure what process is creating these keyring files or why they contain the incorrect key.
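To make that mismatch visible across all data dirs at once, a small sketch (assumptions: the ceph CLI can reach the cluster, OSD data dirs live under Rook's /var/lib/rook/rook-ceph layout, and the helper name is mine):

```shell
# Flag OSDs whose on-disk keyring key differs from the key the cluster
# has registered for that id.
check_osd_keyrings() {
    local base=${1:-/var/lib/rook/rook-ceph}
    local dir id disk_key auth_key
    for dir in "$base"/*/; do
        [ -f "$dir/whoami" ] && [ -f "$dir/keyring" ] || continue
        id=$(cat "$dir/whoami")
        disk_key=$(awk '$1 == "key" {print $3}' "$dir/keyring")
        auth_key=$(ceph auth get-key "osd.$id" 2>/dev/null)
        [ "$disk_key" = "$auth_key" ] || echo "osd.$id: keyring mismatch in $dir"
    done
}
```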
from rook.
We are finding 31 duplicate OSD IDs on active storage nodes; a sample:
ng-xdzv-ef254 /var/lib/rook/rook-ceph/3aaaed8f-7f05-4a4e-a780-70d0b19416fb_06040623-9669-47d0-83ec-2c3b796474a5/whoami:429
ng-xdzv-2a0c7 /var/lib/rook/rook-ceph/3aaaed8f-7f05-4a4e-a780-70d0b19416fb_c28193f2-f527-43d6-aa1f-14287d5a13ec/whoami:429
ng-xdzv-65657 /var/lib/rook/rook-ceph/3aaaed8f-7f05-4a4e-a780-70d0b19416fb_e39861f0-89b7-4c80-8b32-e23653d8f651/whoami:431
ng-xdzv-da6f6 /var/lib/rook/rook-ceph/3aaaed8f-7f05-4a4e-a780-70d0b19416fb_359674c3-74d0-4836-846c-a22d2249efc9/whoami:431
ng-xdzv-65657 /var/lib/rook/rook-ceph/3aaaed8f-7f05-4a4e-a780-70d0b19416fb_b649ea39-c19d-4577-9388-a2a108804df7/whoami:432
ng-xdzv-2a0c7 /var/lib/rook/rook-ceph/3aaaed8f-7f05-4a4e-a780-70d0b19416fb_293a8678-d9a3-4227-9df6-562284b9bdbf/whoami:432
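A listing like the one above can be reduced to just the duplicated ids; a quick sketch (the file name and helper name are hypothetical):

```shell
# Print OSD ids claimed by more than one data dir, given lines of the
# form "<node> <path>/whoami:<id>" as in the sample above.
dup_osd_ids() {
    awk -F: '{print $NF}' "$1" | sort -n | uniq -d
}

# Example: dup_osd_ids whoami-listing.txt
```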
from rook.
fwiw, here is the error that occurs when creating new OSDs during prepare:
2024-05-21 03:24:59.707950 D | exec: Running command: stdbuf -oL ceph-volume --log-path /tmp/ceph-log lvm batch --prepare --bluestore --yes --osds-per-device 4 /dev/nvme5n1
2024-05-21 03:25:00.324811 D | exec: --> DEPRECATION NOTICE
2024-05-21 03:25:00.324835 D | exec: --> You are using the legacy automatic disk sorting behavior
2024-05-21 03:25:00.324838 D | exec: --> The Pacific release will change the default to --no-auto
2024-05-21 03:25:00.324840 D | exec: --> passed data devices: 1 physical, 0 LVM
2024-05-21 03:25:00.324842 D | exec: --> relative data size: 0.25
2024-05-21 03:25:00.324843 D | exec: Running command: /usr/bin/ceph-authtool --gen-print-key
2024-05-21 03:25:00.324846 D | exec: Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 89db0ce4-5283-4628-b183-ae28a03a52d9
2024-05-21 03:25:00.324848 D | exec: stderr: Error EEXIST: entity osd.2 exists but key does not match
2024-05-21 03:25:00.325070 D | exec: Traceback (most recent call last):
2024-05-21 03:25:00.325073 D | exec: File "/usr/sbin/ceph-volume", line 11, in <module>
2024-05-21 03:25:00.325074 D | exec: load_entry_point('ceph-volume==1.0.0', 'console_scripts', 'ceph-volume')()
2024-05-21 03:25:00.325076 D | exec: File "/usr/lib/python3.6/site-packages/ceph_volume/main.py", line 41, in __init__
2024-05-21 03:25:00.325078 D | exec: self.main(self.argv)
2024-05-21 03:25:00.325080 D | exec: File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 59, in newfunc
2024-05-21 03:25:00.325082 D | exec: return f(*a, **kw)
2024-05-21 03:25:00.325084 D | exec: File "/usr/lib/python3.6/site-packages/ceph_volume/main.py", line 153, in main
2024-05-21 03:25:00.325086 D | exec: terminal.dispatch(self.mapper, subcommand_args)
2024-05-21 03:25:00.325088 D | exec: File "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line 194, in dispatch
2024-05-21 03:25:00.325089 D | exec: instance.main()
2024-05-21 03:25:00.325091 D | exec: File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/main.py", line 46, in main
2024-05-21 03:25:00.325092 D | exec: terminal.dispatch(self.mapper, self.argv)
2024-05-21 03:25:00.325094 D | exec: File "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line 194, in dispatch
2024-05-21 03:25:00.325095 D | exec: instance.main()
2024-05-21 03:25:00.325097 D | exec: File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 16, in is_root
2024-05-21 03:25:00.325099 D | exec: return func(*a, **kw)
2024-05-21 03:25:00.325100 D | exec: File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/batch.py", line 414, in main
2024-05-21 03:25:00.325102 D | exec: self._execute(plan)
2024-05-21 03:25:00.325104 D | exec: File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/batch.py", line 429, in _execute
2024-05-21 03:25:00.325105 D | exec: p.safe_prepare(argparse.Namespace(**args))
2024-05-21 03:25:00.325107 D | exec: File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/prepare.py", line 196, in safe_prepare
2024-05-21 03:25:00.325108 D | exec: self.prepare()
2024-05-21 03:25:00.325110 D | exec: File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 16, in is_root
2024-05-21 03:25:00.325112 D | exec: return func(*a, **kw)
2024-05-21 03:25:00.325113 D | exec: File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/prepare.py", line 236, in prepare
2024-05-21 03:25:00.325115 D | exec: self.osd_id = prepare_utils.create_id(osd_fsid, json.dumps(secrets), osd_id=self.args.osd_id)
2024-05-21 03:25:00.325116 D | exec: File "/usr/lib/python3.6/site-packages/ceph_volume/util/prepare.py", line 154, in create_id
2024-05-21 03:25:00.325117 D | exec: raise RuntimeError('Unable to create a new OSD id')
2024-05-21 03:25:00.325119 D | exec: RuntimeError: Unable to create a new OSD id
from rook.
Can someone provide any insight into where to view/repair the source of this conflict? Rook has a number of rook-ceph-osd-NN deployments that no longer seem to match active OSDs in the cluster, and this is preventing new OSDs from being prepared. For example, I tried to use the purge job to remove the conflicting osd.2 above, but it is not in the osd tree/dump...
2024-05-21 03:37:12.809757 I | clusterdisruption-controller: osd "rook-ceph-osd-12" is down but no node drain is detected
2024-05-21 03:37:12.809839 I | clusterdisruption-controller: osd "rook-ceph-osd-2" is down but no node drain is detected
2024-05-21 03:37:12.809920 I | clusterdisruption-controller: osd "rook-ceph-osd-424" is down but no node drain is detected
2024-05-21 03:37:12.810020 I | clusterdisruption-controller: osd "rook-ceph-osd-54" is down but no node drain is detected
2024-05-21 03:37:12.810101 I | clusterdisruption-controller: osd "rook-ceph-osd-251" is down but no node drain is detected
2024-05-21 03:37:12.810178 I | clusterdisruption-controller: osd "rook-ceph-osd-20" is down but no node drain is detected
2024-05-21 03:37:12.810259 I | clusterdisruption-controller: osd "rook-ceph-osd-255" is down but no node drain is detected
2024-05-21 03:37:12.810338 I | clusterdisruption-controller: osd "rook-ceph-osd-6" is down but no node drain is detected
2024-05-21 03:41:04.404666 D | exec: Running command: ceph osd dump --connect-timeout=15 --cluster=rook-ceph --conf=/var/lib/rook/rook-ceph/rook-ceph.config --name=client.admin --keyring=/var/lib/rook/rook-ceph/client.admin.keyring --format json
2024-05-21 03:41:04.733780 I | cephosd: validating status of osd.2
2024-05-21 03:41:04.733835 C | rookcmd: failed to get osd status for osd 2: not found osd.2 in OSDDump
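One way to enumerate those stale deployments is to compare the ids on the rook-ceph-osd deployments against the ids the cluster actually knows. A sketch (assumptions: kubectl and ceph access, the default rook-ceph namespace, and that the deployments carry Rook's ceph-osd-id label; the helper name is mine):

```shell
# List OSD ids that have a rook-ceph-osd deployment but no longer exist
# in the cluster (candidates for cleanup).
stale_osd_deployments() {
    comm -23 \
        <(kubectl -n rook-ceph get deploy -l app=rook-ceph-osd \
            -o jsonpath='{.items[*].metadata.labels.ceph-osd-id}' \
            | tr ' ' '\n' | sort) \
        <(ceph osd ls | sort)
}
```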
from rook.
Can someone provide any insight as to where to view/repair the source of this conflict?
I'm the assignee of this issue. I'll read this issue carefully tomorrow (I'm currently on PTO).
from rook.
What I think I discovered is two things:
- Some of the duplicate OSDs existed on nodes with auth that did not match what was in Ceph, so those OSDs could not start because another node was using the same ID with the correct key.
- The complaints during discovery occurred when it tried to reuse an old OSD id that no longer existed in Ceph while the 'user' auth still existed. Removing the stale user auth for non-existent OSDs has now allowed it to create new OSDs.
bash-4.4$ for id in $(cat); do ceph auth del osd.$id; done
2 12 54 242 245 247 255 258 298 316 396 412 424
6 20 72 244 246 251 256 288 305 371 406 416 431
However, I still believe there is a race condition in selecting and creating auth for new OSDs. I have multiple new nodes/drives being discovered, and the prepare jobs are crashing out with the same "Error EEXIST: entity osd.NN exists but key does not match" error message. I think it would be great if this ID selection were somehow a more atomic operation.
from rook.
FYI, I monitored the provisioning process, and any time I saw an auth conflict, I deleted the auth entry and the job was able to retry and proceed... Below is an example where 4 separate jobs all tried to claim osd.433. While provisioning some 300 OSDs, there were stretches where it conflicted 44 times and I had to go wipe the disks and let it retry. Other times, as below, I was able to stay on top of the auth conflicts and help it along.
while true; do for pod in $(kubectl get pods --no-headers -l app=rook-ceph-osd-prepare |grep -P 'Error|Crash'|awk '{print $1}'); do kubectl logs $pod |grep EEXIST; done; sleep 5; done
2024-05-21 06:52:23.076728 D | exec: stderr: Error EEXIST: entity osd.433 exists but key does not match
[2024-05-21 06:52:23,072][ceph_volume.process][INFO ] stderr Error EEXIST: entity osd.433 exists but key does not match
rook-ceph-osd-prepare-ng-xdzv7327xu-37018-bhx9l 0/1 CrashLoopBackOff 5 (30s ago) 5m26s
rook-ceph-osd-prepare-ng-xdzv7327xu-5662e-rcksc 0/1 CrashLoopBackOff 5 (32s ago) 5m
rook-ceph-osd-prepare-ng-xdzv7327xu-947c0-8bxn9 0/1 CrashLoopBackOff 5 (30s ago) 5m55s
rook-ceph-osd-prepare-ng-xdzv7327xu-da6f6-fcdhj 0/1 CrashLoopBackOff 5 (28s ago) 5m32s
Check for auth entries without a corresponding OSD (I know this could be faster, and would be more robust using JSON; it's just a quick script):
for osd in $(ceph auth ls |grep ^osd.); do ceph osd tree |grep -q -w $osd || echo "DANGLING AUTH: $osd"; done
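A variant of the same check that runs one ceph auth ls and one ceph osd ls instead of a ceph osd tree per entity (a sketch; the function name is mine):

```shell
# Print auth entities of the form osd.N that have no matching OSD id
# in the cluster.
dangling_osd_auth() {
    comm -23 \
        <(ceph auth ls 2>/dev/null | grep -o '^osd\.[0-9]*' | sort) \
        <(ceph osd ls | sed 's/^/osd./' | sort)
}
```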
from rook.
@bdowling Many people have encountered a similar or the same problem. To resolve it, ceph auth del has been used. You hit this problem at high frequency because the discovery daemons try to create OSDs in parallel, and you have both many disks and many nodes.
"I think it would be great if this ID selection was somehow a more atomic operation."
OSD ID allocation is done in the ceph osd new command. Since Rook can't touch this logic, please open an issue in the Ceph issue tracker if you'd like to make it completely atomic.
As a workaround, disabling the discovery daemon (it's disabled by default) might help you. Could you try the following steps?
- Set "ROOK_ENABLE_DISCOVERY_DAEMON" to "false" in the rook-ceph-operator-config configmap.
- Restart the operator. The operator will then create osd-prepare pods as necessary.
It doesn't resolve this problem completely, but I believe it reduces the parallelism of OSD creation and therefore the frequency of OSD ID conflicts.
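For reference, the configmap change in the first step boils down to a one-line data entry (a sketch, assuming the default rook-ceph namespace):

```yaml
# rook-ceph-operator-config ConfigMap fragment: turn off the discovery
# daemon so OSD prepare jobs are created less aggressively in parallel.
data:
  ROOK_ENABLE_DISCOVERY_DAEMON: "false"
```

It can be applied with kubectl -n rook-ceph patch configmap rook-ceph-operator-config --type merge -p '{"data":{"ROOK_ENABLE_DISCOVERY_DAEMON":"false"}}', followed by a restart of the operator, e.g. kubectl -n rook-ceph rollout restart deploy/rook-ceph-operator (deployment name per a standard Rook install).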
from rook.
@guits How exactly does ceph-volume allocate the OSD ID? This issue with a large number of OSDs being created in parallel makes it clear that it is not atomic, which causes quite a problem for large clusters.
from rook.
I'm going to mark this issue closed for now. In testing, I was unable to get duplicate OSD IDs with ceph osd new, so I suspect something else was going on, such as the old auth never being deleted when prior OSDs were purged. I'll revisit if I see this recur.
I went looking for where that code actually does this work in Ceph, but got lost in the indirection trying to find the ceph osd new functions.
e.g. simple testing...
(for i in $(seq 16); do ceph osd new $(uuidgen) & done; wait) > /tmp/osd-ids
% sort /tmp/osd-ids |uniq -dc
from rook.