Comments (23)
https://github.com/rook/rook/blob/release-1.12/deploy/examples/operator.yaml#L509 It was there until 1.12; we removed it in 1.13.
validating webhook rook-ceph-webhook
This is not a pod; it is a Kubernetes resource. Try kubectl get validatingwebhookconfigurations rook-ceph-webhook
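Webhook configurations are cluster-scoped, so they do not show up when looking for pods or other objects inside the rook-ceph namespace. A minimal sketch for picking the Rook-related entries out of the listing follows; the function name filter_rook_webhooks is hypothetical, and it assumes the default kubectl table output with the name in the first column.

```shell
#!/usr/bin/env bash
# Sketch: pick Rook-related entries out of a webhook-configuration listing.
# Reads `kubectl get validatingwebhookconfigurations` table output on stdin
# and prints every name that contains "rook".
# The function name is hypothetical, not part of Rook or kubectl.
filter_rook_webhooks() {
  # NR > 1 skips the header row; $1 is the NAME column.
  awk 'NR > 1 && $1 ~ /rook/ { print $1 }'
}

# Usage against a live cluster (no namespace flag is needed because the
# resource is cluster-scoped):
#   kubectl get validatingwebhookconfigurations | filter_rook_webhooks
```

The same filter can be pointed at mutatingwebhookconfigurations if that listing needs checking as well.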
from rook.
Thank you. Take your free time! I'm not really blocked.
$ kubectl get validatingwebhookconfigurations -A
NAME WEBHOOKS AGE
cert-manager-webhook 1 3y116d
ingress-nginx-admission 1 432d
metallb-webhook-configuration 7 432d
rook-ceph-webhook 5 2y3d
I see the issue: you need to delete the rook-ceph-webhook (I forgot that webhooks are cluster-scoped resources). Also, here is the code:
rook/pkg/operator/ceph/webhook-config.go
Lines 258 to 282 in b32948c
from rook.
Alright. I'm not into Go but I'll figure it out. Thank you for your help!
from rook.
My cluster.yaml: 04-cluster-prod.txt
from rook.
Operator shows no errors or warnings.
from rook.
@subhamkrai What are the steps to manually disable the admission controller? I can't seem to find it from previous issues.
from rook.
@subhamkrai What are the steps to manually disable the admission controller? I can't seem to find it from previous issues.
I don't remember exactly, but setting the value to true should work. If that does not work, try deleting the validating webhook rook-ceph-webhook.
from rook.
Thank you for your replies.
I don't remember exactly, but setting the value to true should work. If that does not work, try deleting the validating webhook rook-ceph-webhook.
Are those supposed to be pods? I don't have any of those. I'm currently at v1.13.8: there is no ROOK_DISABLE_ADMISSION_CONTROLLER anymore. How can I set it to true now?
from rook.
@subhamkrai Thank you for pointing me in the right direction. I can see those resources:
$ kubectl api-resources --verbs=list -n rook-ceph | grep hook
mutatingwebhookconfigurations admissionregistration.k8s.io/v1 false MutatingWebhookConfiguration
validatingwebhookconfigurations admissionregistration.k8s.io/v1 false ValidatingWebhookConfiguration
$ kubectl api-resources --verbs=list -n rook-ceph | grep val
validatingwebhookconfigurations admissionregistration.k8s.io/v1 false ValidatingWebhookConfiguration
So none of the ones you mentioned, right?
https://github.com/rook/rook/blob/release-1.12/deploy/examples/operator.yaml#L509 It was there until 1.12; we removed it in 1.13.
So no chance to set it to true now?
from rook.
@maon-fp could you also share the svc list in the rook-ceph namespace?
from rook.
Also, could you share the first 10 lines of the Rook operator pod's logs?
from rook.
Yes, of course.
List of services:
$ kgs production:rook-ceph
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
csi-rbdplugin-metrics ClusterIP 10.104.212.46 <none> 8080/TCP,8081/TCP 3y104d
rook-ceph-admission-controller ClusterIP 10.99.221.127 <none> 443/TCP 2y2d
rook-ceph-mgr ClusterIP 10.109.30.124 <none> 9283/TCP 3y104d
rook-ceph-mgr-dashboard ClusterIP 10.107.242.106 <none> 8443/TCP 3y104d
rook-ceph-mon-a ClusterIP 10.101.39.245 <none> 6789/TCP,3300/TCP 3y104d
rook-ceph-mon-c ClusterIP 10.110.130.143 <none> 6789/TCP,3300/TCP 3y104d
rook-ceph-mon-d ClusterIP 10.110.86.107 <none> 6789/TCP,3300/TCP 3y104d
First lines of operator log:
$ kl rook-ceph-operator-9f688fcc5-v2q6j | head -n 10 production:rook-ceph
2024/04/23 14:00:19 maxprocs: Leaving GOMAXPROCS=24: CPU quota undefined
2024-04-23 14:00:19.215493 I | rookcmd: starting Rook v1.13.8 with arguments '/usr/local/bin/rook ceph operator'
2024-04-23 14:00:19.215514 I | rookcmd: flag values: --enable-machine-disruption-budget=false, --help=false, --kubeconfig=, --log-level=INFO
2024-04-23 14:00:19.215519 I | cephcmd: starting Rook-Ceph operator
2024-04-23 14:00:19.322061 I | cephcmd: base ceph version inside the rook operator image is "ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable)"
2024-04-23 14:00:19.332548 I | op-k8sutil: ROOK_CURRENT_NAMESPACE_ONLY="false" (env var)
2024-04-23 14:00:19.332558 I | operator: watching all namespaces for Ceph CRs
2024-04-23 14:00:19.332604 I | operator: setting up schemes
2024-04-23 14:00:19.335083 I | operator: setting up the controller-runtime manager
2024-04-23 14:00:19.335422 I | ceph-cluster-controller: successfully started
from rook.
The logs didn't help much, but yes, delete the following resources (probably in the rook-ceph namespace):
Certificate rook-admission-controller-cert
Issuer "selfsigned-issuer"
service "rook-ceph-admission-controller"
Also, could you share the -o yaml output of the Certificate and Issuer mentioned above, to make sure that you are deleting the right resources? But yes, we need to clean up the three resources above.
from rook.
rook-admission-controller-cert:
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  creationTimestamp: "2022-04-23T18:45:33Z"
  generation: 1
  name: rook-admission-controller-cert
  namespace: rook-ceph
  resourceVersion: "301286319"
  uid: 22aa348f-e223-4f98-870e-aab4ef1f71a9
spec:
  dnsNames:
  - rook-ceph-admission-controller
  - rook-ceph-admission-controller.rook-ceph.svc
  - rook-ceph-admission-controller.rook-ceph.svc.cluster.local
  issuerRef:
    kind: Issuer
    name: selfsigned-issuer
  secretName: rook-ceph-admission-controller
status:
  conditions:
  - lastTransitionTime: "2022-04-23T18:45:34Z"
    message: Certificate is up to date and has not expired
    observedGeneration: 1
    reason: Ready
    status: "True"
    type: Ready
  notAfter: "2024-07-11T18:45:34Z"
  notBefore: "2024-04-12T18:45:34Z"
  renewalTime: "2024-06-11T18:45:34Z"
  revision: 13
selfsigned-issuer:
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  creationTimestamp: "2022-04-23T18:45:32Z"
  generation: 1
  name: selfsigned-issuer
  namespace: rook-ceph
  resourceVersion: "138597982"
  uid: 68162730-aade-4670-b830-1cf97005ef5c
spec:
  selfSigned: {}
status:
  conditions:
  - lastTransitionTime: "2022-04-23T18:45:32Z"
    observedGeneration: 1
    reason: IsReady
    status: "True"
    type: Ready
rook-ceph-admission-controller:
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: "2022-04-23T18:45:34Z"
  name: rook-ceph-admission-controller
  namespace: rook-ceph
  resourceVersion: "214711462"
  uid: b62cac4d-ce0c-4f3d-aa19-ff2f9d9d553c
spec:
  clusterIP: 10.99.221.127
  clusterIPs:
  - 10.99.221.127
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - port: 443
    protocol: TCP
    targetPort: 9443
  selector:
    app: rook-ceph-operator
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
from rook.
I deleted those resources but still get (a slightly different) error:
Error from server (InternalError): error when applying patch:
{"metadata":{"annotations":{"kubectl.kubernetes.io/last-applied-configuration":"{\"apiVersion\":\"ceph.rook.io/v1\",\"kind\":\"CephCluster\",\"metadata\":{\"annotations\":{},\"name\":\"rook-ceph\",\"namespace\":\"rook-ceph\"},\"spec\":{\"annotations\":null,\"cephVersion\":{\"allowUnsupported\":false,\"image\":\"quay.io/ceph/ceph:v18.2.2\"},\"cleanupPolicy\":{\"allowUninstallWithVolumes\":false,\"confirmation\":\"\",\"sanitizeDisks\":{\"dataSource\":\"zero\",\"iteration\":1,\"method\":\"quick\"}},\"continueUpgradeAfterChecksEvenIfNotHealthy\":false,\"crashCollector\":{\"disable\":false},\"csi\":{\"cephfs\":null,\"readAffinity\":{\"enabled\":false}},\"dashboard\":{\"enabled\":true,\"ssl\":true},\"dataDirHostPath\":\"/var/lib/rook\",\"disruptionManagement\":{\"managePodBudgets\":true,\"osdMaintenanceTimeout\":30,\"pgHealthCheckTimeout\":0},\"healthCheck\":{\"daemonHealth\":{\"mon\":{\"disabled\":false,\"interval\":\"45s\"},\"osd\":{\"disabled\":false,\"interval\":\"60s\"},\"status\":{\"disabled\":false,\"interval\":\"60s\"}},\"livenessProbe\":{\"mgr\":{\"disabled\":false},\"mon\":{\"disabled\":false},\"osd\":{\"disabled\":false}},\"startupProbe\":{\"mgr\":{\"disabled\":false},\"mon\":{\"disabled\":false},\"osd\":{\"disabled\":false}}},\"labels\":null,\"logCollector\":{\"enabled\":true,\"maxLogSize\":\"500M\",\"periodicity\":\"daily\"},\"mgr\":{\"allowMultiplePerNode\":true,\"count\":2,\"modules\":null},\"mon\":{\"allowMultiplePerNode\":true,\"count\":3},\"monitoring\":{\"enabled\":false,\"metricsDisabled\":false},\"network\":{\"connections\":{\"compression\":{\"enabled\":false},\"encryption\":{\"enabled\":false},\"requireMsgr2\":false}},\"priorityClassNames\":{\"mgr\":\"system-cluster-critical\",\"mon\":\"system-node-critical\",\"osd\":\"system-node-critical\"},\"removeOSDsIfOutAndSafeToRemove\":false,\"resources\":null,\"skipUpgradeChecks\":false,\"storage\":{\"config\":null,\"nodes\":[{\"devices\":[{\"config\":{\"osdsPerDevice\":\"4\"},\"name\":\"nvme0n1\"},{\"config
\":{\"osdsPerDevice\":\"4\"},\"name\":\"nvme1n1\"},{\"config\":{\"osdsPerDevice\":\"4\"},\"name\":\"nvme3n1\"}],\"name\":\"storage1.<redacted>\"},{\"devices\":[{\"config\":{\"osdsPerDevice\":\"4\"},\"name\":\"nvme0n1\"},{\"config\":{\"osdsPerDevice\":\"4\"},\"name\":\"nvme2n1\"},{\"config\":{\"osdsPerDevice\":\"4\"},\"name\":\"nvme3n1\"}],\"name\":\"storage2.<redacted>\"}],\"onlyApplyOSDPlacement\":false,\"useAllDevices\":false,\"useAllNodes\":false},\"waitTimeoutForHealthyOSDInMinutes\":10}}\n"}},"spec":{"cephVersion":{"image":"quay.io/ceph/ceph:v18.2.2"},"csi":{"cephfs":null,"readAffinity":{"enabled":false}},"mgr":{"modules":null}}}
to:
Resource: "ceph.rook.io/v1, Resource=cephclusters", GroupVersionKind: "ceph.rook.io/v1, Kind=CephCluster"
Name: "rook-ceph", Namespace: "rook-ceph"
for: "04-cluster-prod.yaml": error when patching "04-cluster-prod.yaml": Internal error occurred: failed calling webhook "cephcluster-wh-rook-ceph-admission-controller-rook-ceph.rook.io": failed to call webhook: Post "https://rook-ceph-admission-controller.rook-ceph.svc:443/validate-ceph-rook-io-v1-cephcluster?timeout=5s": service "rook-ceph-admission-controller" not found
I've also listed all resources in the namespace (list_rook_ceph.txt) and can find some admission controller resources:
$ grep admission list_rook_ceph.txt
secret/rook-ceph-admission-controller kubernetes.io/tls 3 2y3d
secret/rook-ceph-admission-controller-token-s47d8 kubernetes.io/service-account-token 3 3y105d
serviceaccount/rook-ceph-admission-controller 1 3y105d
from rook.
try deleting the resources mentioned above
from rook.
As stated before: the resources are already deleted. But now it complains about service "rook-ceph-admission-controller" not found instead of a timeout.
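The change from a timeout to "not found" suggests that a webhook configuration is still registered and still points at the deleted Service. A minimal sketch for checking which Service each webhook entry calls is shown here; the function name webhook_services is hypothetical, and the default configuration name rook-ceph-webhook is an assumption taken from earlier in this thread.

```shell
#!/usr/bin/env bash
# Sketch: print "<webhook name> -> <namespace>/<service>" for every webhook
# entry in a ValidatingWebhookConfiguration.
# The function name is hypothetical; the default object name is an
# assumption based on this thread.
webhook_services() {
  local config="${1:-rook-ceph-webhook}"
  kubectl get validatingwebhookconfigurations "$config" \
    -o jsonpath='{range .webhooks[*]}{.name}{" -> "}{.clientConfig.service.namespace}{"/"}{.clientConfig.service.name}{"\n"}{end}'
}

# Usage:
#   webhook_services                    # inspects rook-ceph-webhook
#   webhook_services some-other-config  # inspects another configuration
```

If the entries reference rook-ceph/rook-ceph-admission-controller, the API server will keep calling that Service on every CephCluster update until the webhook configuration itself is removed.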
from rook.
Run kubectl get validatingwebhookconfigurations -A (search across all namespaces once). Also, I'm on holiday today, so I will look on Monday.
Edit: I hope this is not blocking you.
from rook.
Thank you. Take your free time! I'm not really blocked.
$ kubectl get validatingwebhookconfigurations -A
NAME WEBHOOKS AGE
cert-manager-webhook 1 3y116d
ingress-nginx-admission 1 432d
metallb-webhook-configuration 7 432d
rook-ceph-webhook 5 2y3d
from rook.
Just to be 100% sure. Are you asking to run:
kubectl delete validatingwebhookconfigurations rook-ceph-webhook
? I'm a bit worried as I can see 5 webhooks there.
from rook.
Just to be 100% sure. Are you asking to run:
kubectl delete validatingwebhookconfigurations rook-ceph-webhook
? I'm a bit worried as I can see 5 webhooks there.
Yes, delete rook-ceph-webhook only.
from rook.
It worked. Thanks a lot for the quick and competent answers! 🙇
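For later readers, the cleanup discussed in this thread can be sketched as a small script. The resource names are the ones quoted above; the NAMESPACE and DRY_RUN variables and the function names are assumptions added for illustration. By default it only prints the commands so they can be reviewed first.

```shell
#!/usr/bin/env bash
# Sketch: remove leftovers of the old Rook admission controller.
# Resource names come from this thread; NAMESPACE, DRY_RUN and the function
# names are illustrative assumptions.
NAMESPACE="${NAMESPACE:-rook-ceph}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

cleanup_admission_controller() {
  # Namespaced leftovers. Check that nothing else uses the Issuer before
  # deleting it.
  run kubectl delete certificates.cert-manager.io rook-admission-controller-cert -n "$NAMESPACE" --ignore-not-found
  run kubectl delete issuers.cert-manager.io selfsigned-issuer -n "$NAMESPACE" --ignore-not-found
  run kubectl delete service rook-ceph-admission-controller -n "$NAMESPACE" --ignore-not-found
  # Cluster-scoped object: while it exists, the API server keeps calling the
  # webhook on every CephCluster change.
  run kubectl delete validatingwebhookconfigurations rook-ceph-webhook --ignore-not-found
}

# Usage:
#   cleanup_admission_controller             # dry run, prints four commands
#   DRY_RUN=0 cleanup_admission_controller   # actually deletes the resources
```

Only rook-ceph-webhook is removed; the other webhook configurations in the listing (cert-manager, ingress-nginx, metallb) are left alone.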
from rook.
Good to know it is working now @maon-fp
from rook.