openshift / cluster-kube-controller-manager-operator
The kube-controller-manager operator installs and maintains the kube-controller-manager on a cluster.
License: Apache License 2.0
Originally posted by @deads2k in https://github.com/openshift/cluster-kube-controller-manager-operator/pull/78/files
The following branches are being fast-forwarded from the current development branch (master) as placeholders for future releases. No merging is allowed into these release branches until they are unfrozen for production release.
release-4.18
release-4.19
For more information, see the branching documentation.
PodDisruptionBudgetAtLimit should not be raised when no disruptions are allowed:
Example:
When there is only one pod and no disruption is allowed on it, this alert fires continuously.
In this kind of situation the alert should not be raised, because it can be a conscious decision to set the expected number of pods equal to the minimum number of pods and not allow any disruption.
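The scenario above can be reproduced with a PodDisruptionBudget whose minAvailable equals the replica count. This is a sketch with illustrative names (single-replica-pdb, app: my-app are not from the original report):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: single-replica-pdb   # illustrative name
spec:
  minAvailable: 1            # equal to the deployment's single replica
  selector:
    matchLabels:
      app: my-app            # illustrative label
```

With one replica and minAvailable: 1, the PDB's allowed disruptions is 0 by design, yet PodDisruptionBudgetAtLimit keeps firing.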
1. Because the KCM connects to the apiserver via apiServerInternalURL, which was set by
2. In the worst case, all KCMs connect to the same apiserver, which puts extra pressure on that apiserver.
3. We should let the KCM connect to the local apiserver to achieve load balancing.
The controller manager pod is not aware of the running host's timezone. This affects CronJob schedule times, service serving cert generation and evaluation timing, log timestamps, and other potentially time-aware tasks. I'm wondering how to sync the timezone of the pods created by the operator with the running host.
We need the counterpart of openshift/cluster-kube-apiserver-operator#74 for the controller manager. We have it for the bootstrapping phase (merged in #76), but not post-bootstrap.
1. Bug phenomenon:
2. Bug fix:
2.1. This bug can be resolved by adding a kube-apiserver-server-ca check in manageServiceAccountCABundle of targetconfigcontroller, as follows:
2.2. It can also be resolved by modifying the openshift library func CombineCABundleConfigMaps in resourcesynccontroller as:
3. This is related to openshift library-go issue #1472 (github.com/openshift/library-go): missing key configmap.
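The missing-key failure mode can be illustrated in isolation. This is a minimal sketch, not library-go's actual CombineCABundleConfigMaps implementation: combine_ca_bundles and its inputs are hypothetical stand-ins that concatenate PEM bundles while tolerating a configmap whose bundle key has not been populated yet:

```python
def combine_ca_bundles(configmaps, key="ca-bundle.crt"):
    """Concatenate PEM bundles from configmap-like dicts, skipping any
    configmap that exists but is missing the bundle key (the situation
    behind the missing-key report)."""
    pems = []
    for cm in configmaps:
        pem = (cm.get("data") or {}).get(key)
        if pem:  # skip configmaps with no bundle instead of failing
            pems.append(pem.strip())
    return "\n".join(pems) + ("\n" if pems else "")

cms = [
    {"data": {"ca-bundle.crt": "-----BEGIN CERTIFICATE-----\nA\n-----END CERTIFICATE-----"}},
    {"data": {}},  # e.g. kube-apiserver-server-ca not yet populated
]
combined = combine_ca_bundles(cms)
```

Without the emptiness check, combining would fail (or emit a bogus bundle) whenever one source configmap lacks the key, which matches the reported symptom.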
Right now, the --asset-input-dir represents /assets/tls; however, for the controller manager we also need to read /assets/auth, where the kubeconfig file is created. We should split the assets input dir option to cover both cases.
We are using the Terway CNI on Alibaba Cloud, so we need kube-controller-manager to allocate PodCIDRs. We set the following config in KubeControllerManager.operator.openshift.io/v1, but it has no effect for us:
unsupportedConfigOverrides:
extendedArguments:
allocate-node-cidrs:
- "true"
node-cidr-mask-size:
- "23"
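For context, the override fragment above would sit in the full operator resource roughly like this (a sketch assuming the values from the report; the operator's unsupportedConfigOverrides field takes free-form config, so field placement should be verified against the installed API):

```yaml
apiVersion: operator.openshift.io/v1
kind: KubeControllerManager
metadata:
  name: cluster
spec:
  unsupportedConfigOverrides:
    extendedArguments:
      allocate-node-cidrs:
      - "true"
      node-cidr-mask-size:
      - "23"
```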
$ cat /etc/kubernetes/static-pod-resources/kube-controller-manager-pod-9/configmaps/config/config.yaml | jq
{
"apiVersion": "kubecontrolplane.config.openshift.io/v1",
"extendedArguments": {
"allocate-node-cidrs": [
"true"
],
"cert-dir": [
"/var/run/kubernetes"
],
"cluster-cidr": [
"10.108.128.0/18"
],
"cluster-name": [
"staging-mpcr4"
],
"cluster-signing-cert-file": [
"/etc/kubernetes/static-pod-certs/secrets/csr-signer/tls.crt"
],
"cluster-signing-key-file": [
"/etc/kubernetes/static-pod-certs/secrets/csr-signer/tls.key"
],
"configure-cloud-routes": [
"false"
],
"controllers": [
"*",
"-ttl",
"-bootstrapsigner",
"-tokencleaner"
],
"enable-dynamic-provisioning": [
"true"
],
"experimental-cluster-signing-duration": [
"720h"
],
"feature-gates": [
"RotateKubeletServerCertificate=true",
"SupportPodPidsLimit=true",
"NodeDisruptionExclusion=true",
"ServiceNodeExclusion=true",
"SCTPSupport=true",
"LegacyNodeRoleBehavior=false"
],
"flex-volume-plugin-dir": [
"/etc/kubernetes/kubelet-plugins/volume/exec"
],
"kube-api-burst": [
"300"
],
"kube-api-qps": [
"150"
],
"leader-elect": [
"true"
],
"leader-elect-resource-lock": [
"configmaps"
],
"leader-elect-retry-period": [
"3s"
],
"node-cidr-mask-size": [
"23"
],
"port": [
"0"
],
"root-ca-file": [
"/etc/kubernetes/static-pod-resources/configmaps/serviceaccount-ca/ca-bundle.crt"
],
"secure-port": [
"10257"
],
"service-account-private-key-file": [
"/etc/kubernetes/static-pod-resources/secrets/service-account-private-key/service-account.key"
],
"service-cluster-ip-range": [
"172.30.0.0/18"
],
"use-service-account-credentials": [
"true"
]
},
"kind": "KubeControllerManagerConfig",
"serviceServingCert": {
"certFile": "/etc/kubernetes/static-pod-resources/configmaps/service-ca/ca-bundle.crt"
}
}
kcm logs:
I0417 05:50:25.193690 1 patch.go:65] FLAGSET: generic
I0417 05:50:25.193699 1 flags.go:33] FLAG: --allocate-node-cidrs="false"
I0417 05:50:25.193701 1 flags.go:33] FLAG: --allow-untagged-cloud="false"
I0417 05:50:25.193704 1 flags.go:33] FLAG: --cidr-allocator-type="RangeAllocator"
I0417 05:50:25.193707 1 flags.go:33] FLAG: --cloud-config=""
I0417 05:50:25.193709 1 flags.go:33] FLAG: --cloud-provider=""
I0417 05:50:25.193712 1 flags.go:33] FLAG: --cluster-cidr="10.108.128.0/18"
I0417 05:50:25.193715 1 flags.go:33] FLAG: --cluster-name="staging-mpcr4"
I0417 05:50:25.193717 1 flags.go:33] FLAG: --configure-cloud-routes="true"
I0417 05:50:25.193720 1 flags.go:33] FLAG: --controller-start-interval="0s"
I0417 05:50:25.193722 1 flags.go:33] FLAG: --controllers="[*,-ttl,-bootstrapsigner,-tokencleaner]"
I0417 05:50:25.193728 1 flags.go:33] FLAG: --external-cloud-volume-plugin=""
I0417 05:50:25.193731 1 flags.go:33] FLAG: --feature-gates="LegacyNodeRoleBehavior=false,NodeDisruptionExclusion=true,RotateKubeletServerCertificate=true,SCTPSupport=true,ServiceNodeExclusion=true,SupportPodPidsLimit=true"
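The mismatch above (config.yaml renders allocate-node-cidrs as "true", yet the startup logs show --allocate-node-cidrs="false") can be checked mechanically. This is a small sketch with a hypothetical flag_value helper that parses the FLAG lines kube-controller-manager prints at startup and compares them against the rendered config:

```python
import re

# Rendered config (abridged from config.yaml above) and one startup log line.
config = {"extendedArguments": {"allocate-node-cidrs": ["true"]}}
log = 'I0417 05:50:25.193699       1 flags.go:33] FLAG: --allocate-node-cidrs="false"'

def flag_value(log_text, flag):
    """Return the effective value of a flag from KCM startup logs, or None."""
    m = re.search(r'FLAG: --%s="([^"]*)"' % re.escape(flag), log_text)
    return m.group(1) if m else None

effective = flag_value(log, "allocate-node-cidrs")
desired = config["extendedArguments"]["allocate-node-cidrs"][0]
mismatch = effective != desired  # True here: the override did not take effect
```

Running this against the full log and config confirms the operator's rendered config never reached the process flags.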
The following branches are being fast-forwarded from the current development branch (master) as placeholders for future releases. No merging is allowed into these release branches until they are unfrozen for production release.
release-4.16
release-4.17
For more information, see the branching documentation.
After removing openshift-cluster-kube-controller-manager from the cvoconfig.overrides, the operator works for a little while (minutes), but then I see this panic and the whole openshift-kube-controller-manager namespace goes away.
I1012 02:02:45.252594 1 reflector.go:286] github.com/openshift/cluster-kube-controller-manager-operator/vendor/k8s.io/client-go/informers/factory.go:130: forcing resync
I1012 02:02:45.254090 1 reflector.go:286] github.com/openshift/cluster-kube-controller-manager-operator/vendor/k8s.io/client-go/informers/factory.go:130: forcing resync
E1012 02:02:45.255370 1 runtime.go:66] Observed a panic: "asset: Asset(v3.11.0/kube-controller-manager/ns.yaml): Asset v3.11.0/kube-controller-manager/ns.yaml not found" (asset: Asset(v3.11.0/kube-controller-manager/ns.yaml): Asset v3.11.0/kube-controller-manager/ns.yaml not found)
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:72
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:65
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51
/usr/local/go/src/runtime/asm_amd64.s:573
/usr/local/go/src/runtime/panic.go:502
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/pkg/operator/v311_00_assets/bindata.go:458
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/pkg/operator/sync_kubecontrollermanager_v311_00.go:31
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/pkg/operator/operator.go:129
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/pkg/operator/operator.go:174
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/pkg/operator/operator.go:163
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/pkg/operator/operator.go:157
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88
/usr/local/go/src/runtime/asm_amd64.s:2361
I1012 02:02:45.255499 1 reflector.go:286] github.com/openshift/cluster-kube-controller-manager-operator/pkg/generated/informers/externalversions/factory.go:101: forcing resync
I1012 02:02:45.258724 1 reflector.go:286] github.com/openshift/cluster-kube-controller-manager-operator/vendor/k8s.io/client-go/informers/factory.go:130: forcing resync
E1012 02:02:46.257355 1 runtime.go:66] Observed a panic: "asset: Asset(v3.11.0/kube-controller-manager/ns.yaml): Asset v3.11.0/kube-controller-manager/ns.yaml not found" (asset: Asset(v3.11.0/kube-controller-manager/ns.yaml): Asset v3.11.0/kube-controller-manager/ns.yaml not found)
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:72
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:65
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51
/usr/local/go/src/runtime/asm_amd64.s:573
/usr/local/go/src/runtime/panic.go:502
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/pkg/operator/v311_00_assets/bindata.go:458
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/pkg/operator/sync_kubecontrollermanager_v311_00.go:31
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/pkg/operator/operator.go:129
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/pkg/operator/operator.go:174
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/pkg/operator/operator.go:163
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/pkg/operator/operator.go:157
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88
/usr/local/go/src/runtime/asm_amd64.s:2361
Metrics cannot be collected from the controller-manager operator. Prometheus Targets shows: Get https://10.129.0.10:8443/metrics: x509: certificate is valid for localhost, not metrics.openshift-kube-controller-manager-operator.svc
Running OKD 4.9
After a week or so of uptime, internal certs seem to be refreshed. The following files on controllers are left with permissions 644 rather than 600.
This causes the OKD 4 compliance operator to flag an alert via the
ocp4-file-permissions-openshift-pki-cert-files
ocp4-file-permissions-openshift-pki-key-files
rules, which expect all files to be 600, as they are with these two exceptions:
/etc/kubernetes/static-pod-resources/kube-controller-manager-certs/secrets/csr-signer/tls.crt
/etc/kubernetes/static-pod-resources/kube-controller-manager-certs/secrets/csr-signer/tls.key
Is it possible to fix or control the permissions on these files when they are created?
I'm not entirely sure where in the OKD 4 operator chain the creation of these files occurs, but this seems the best place to start!
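For reference, the check the compliance rules perform, and the tightening they expect, can be sketched like this. This is not the operator's fix, just an illustration using a hypothetical enforce_mode helper on a scratch file standing in for csr-signer/tls.key:

```python
import os
import stat
import tempfile

def enforce_mode(path, mode=0o600):
    """Tighten a file's permissions if they differ from `mode`,
    and return the resulting permission bits."""
    if stat.S_IMODE(os.stat(path).st_mode) != mode:
        os.chmod(path, mode)
    return stat.S_IMODE(os.stat(path).st_mode)

# Scratch file standing in for the csr-signer cert/key material.
with tempfile.NamedTemporaryFile(delete=False) as f:
    scratch = f.name
os.chmod(scratch, 0o644)        # the mode the compliance rules flag
fixed = enforce_mode(scratch)   # 0o600 afterwards
os.remove(scratch)              # clean up the scratch file
```

The real fix presumably belongs wherever the refreshed certs are written out, so the files are created with 600 rather than chmod-ed after the fact.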
On a BYOR install using libvirt configs on GCE, the installer keeps restarting:
I1203 14:41:51.153742 1 controller.go:154] clusterOperator openshift-cluster-kube-controller-manager-operator/openshift-cluster-kube-controller-manager-operator set to {"apiVersion":"config.openshift.io/v1","kind":"ClusterOperator","metadata":{"creationTimestamp":"2018-12-03T12:21:50Z","generation":1,"name":"openshift-cluster-kube-controller-manager-operator","namespace":"openshift-cluster-kube-controller-manager-operator","resourceVersion":"128030","selfLink":"/apis/config.openshift.io/v1/clusteroperators/openshift-cluster-kube-controller-manager-operator","uid":"031e8a98-f6f6-11e8-99a8-42010af00002"},"status":{"conditions":[{"Message":"ConfigObservationFailing: configmap/cluster-config-v1.kube-system: no recognized cloud provider platform found","Reason":"ConfigObservationFailing","Status":"True","Type":"Failing"}]}}
I1203 14:41:51.282575 1 request.go:485] Throttling request took 795.972047ms, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-kube-controller-manager/secrets/serving-cert
I1203 14:41:51.482642 1 request.go:485] Throttling request took 787.843176ms, request: POST:https://172.30.0.1:443/api/v1/namespaces/openshift-kube-controller-manager/pods
I1203 14:41:51.582621 1 leaderelection.go:209] successfully renewed lease openshift-cluster-kube-controller-manager-operator/openshift-cluster-kube-controller-manager-operator-lock
I1203 14:41:51.684204 1 request.go:485] Throttling request took 797.718808ms, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-kube-controller-manager/configmaps/kube-controller-manager-pod
I1203 14:41:51.882578 1 request.go:485] Throttling request took 732.083152ms, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-kube-controller-manager/serviceaccounts/installer-sa
I1203 14:41:51.904727 1 controller.go:154] clusterOperator openshift-cluster-kube-controller-manager-operator/openshift-cluster-kube-controller-manager-operator set to {"apiVersion":"config.openshift.io/v1","kind":"ClusterOperator","metadata":{"creationTimestamp":"2018-12-03T12:21:50Z","generation":1,"name":"openshift-cluster-kube-controller-manager-operator","namespace":"openshift-cluster-kube-controller-manager-operator","resourceVersion":"128042","selfLink":"/apis/config.openshift.io/v1/clusteroperators/openshift-cluster-kube-controller-manager-operator","uid":"031e8a98-f6f6-11e8-99a8-42010af00002"},"status":{"conditions":[{"Message":"ConfigObservationFailing: configmap/cluster-config-v1.kube-system: no recognized cloud provider platform found","Reason":"ConfigObservationFailing","Status":"True","Type":"Failing"}]}}
I1203 14:41:52.082588 1 request.go:485] Throttling request took 796.323758ms, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-kube-controller-manager/secrets/serving-cert-1
I1203 14:41:52.282626 1 request.go:485] Throttling request took 787.73166ms, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-kube-controller-manager/pods/installer-1-vrutkovs-ig-m-0q7p
This makes the operator rewrite services and rolebindings, which puts a large load on the cluster.