cluster-kube-controller-manager-operator's Issues

PodDisruptionBudgetAtLimit alert should not be raised when no disruption is allowed.

PodDisruptionBudgetAtLimit should not be raised when no disruption is allowed.
Example:
When there is only one pod and no disruption is allowed for it, this alert fires all the time.

In this kind of situation the alert should not be raised: it can be a conscious decision to make the expected number of pods equal to the minimum number of pods and not allow any disruption.
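For illustration, a minimal sketch of that situation (all names are hypothetical): a single-replica workload guarded by a PDB that requires that one pod, which leaves disruptionsAllowed permanently at 0 and keeps the alert firing.

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-pdb            # hypothetical
spec:
  minAvailable: 1              # equal to the workload's single replica
  selector:
    matchLabels:
      app: example
# The disruption controller then reports in status:
#   expectedPods: 1
#   desiredHealthy: 1
#   disruptionsAllowed: 0   <- PodDisruptionBudgetAtLimit fires continuously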

How to sync the timezone with the running host?

The controller manager pod is not aware of the running host's timezone. This affects CronJob schedule times, service serving cert generation and evaluation timing, log timestamps, and potentially other time-aware tasks. I'm wondering how to sync the timezone of the pods this operator runs with that of the running host.
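The operator exposes no knob for this, and any change would have to land in the pod spec it manages. As a generic Kubernetes-level sketch only (an assumed workaround, not something this operator supports), a container's timezone can be pinned to the host's like this:

# Hypothetical pod-spec fragment: align the container's clock display with the host.
spec:
  containers:
  - name: example
    env:
    - name: TZ                 # honored by glibc and by Go's time package
      value: "Asia/Shanghai"   # hypothetical zone
    volumeMounts:
    - name: host-localtime
      mountPath: /etc/localtime
      readOnly: true
  volumes:
  - name: host-localtime
    hostPath:
      path: /etc/localtime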

kcm rootCA missing the apiserver CA leads to a kube-root-ca problem

1 Bug phenomenon

  1. The kcm (kube-controller-manager) rootCA is generated by manageServiceAccountCABundle in targetconfigcontroller. This func first gets the kube-apiserver-server-ca configmap, and then combines it with two other configmaps to generate the kcm rootCA.
  2. Occasionally (very unlikely to happen, but I have met it exactly once) the kube-apiserver-server-ca configmap is missing, so manageServiceAccountCABundle generates the rootCA without kube-apiserver-server-ca. The kcm leader then holds the wrong rootCA, which leads to the kube-root-ca problem in every pod, and the OCP release will not work until I stop the wrong kcm leader.

2 Bug fix:
2.1. This bug can be resolved by adding a kube-apiserver-server-ca check in manageServiceAccountCABundle of targetconfigcontroller. For reference, the current function is:

func manageServiceAccountCABundle(ctx context.Context, lister corev1listers.ConfigMapLister, client corev1client.ConfigMapsGetter, recorder events.Recorder) (*corev1.ConfigMap, bool, error) {
    requiredConfigMap, err := resourcesynccontroller.CombineCABundleConfigMaps(
        resourcesynccontroller.ResourceLocation{Namespace: operatorclient.TargetNamespace, Name: "serviceaccount-ca"},
        lister,
        // include the ca bundle needed to recognize the server
        resourcesynccontroller.ResourceLocation{Namespace: operatorclient.GlobalMachineSpecifiedConfigNamespace, Name: "kube-apiserver-server-ca"},
        // include the ca bundle needed to recognize default
        // certificates generated by cluster-ingress-operator
        resourcesynccontroller.ResourceLocation{Namespace: operatorclient.GlobalMachineSpecifiedConfigNamespace, Name: "default-ingress-cert"},
    )
    if err != nil {
        return nil, false, err
    }
    return resourceapply.ApplyConfigMap(ctx, client, recorder, requiredConfigMap)
}

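The screenshot with the proposed change was not preserved. A minimal sketch of what such a guard could look like at the top of manageServiceAccountCABundle (the placement and error message are assumptions, not the reporter's exact patch):

// Hypothetical guard: fail instead of silently publishing a serviceaccount-ca
// bundle that cannot verify the kube-apiserver.
if _, err := lister.ConfigMaps(operatorclient.GlobalMachineSpecifiedConfigNamespace).Get("kube-apiserver-server-ca"); err != nil {
    return nil, false, fmt.Errorf("serviceaccount-ca bundle requires configmap kube-apiserver-server-ca: %w", err)
}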
2.2. A second variant was shown in the original report only as a screenshot (not preserved here); it may work, but it is not the best way to resolve this problem.

2.3. This can also be resolved by modifying the openshift library-go func CombineCABundleConfigMaps in resourcesynccontroller, which currently reads:

func CombineCABundleConfigMaps(destinationConfigMap ResourceLocation, lister corev1listers.ConfigMapLister, inputConfigMaps ...ResourceLocation) (*corev1.ConfigMap, error) {
    certificates := []*x509.Certificate{}
    for _, input := range inputConfigMaps {
        inputConfigMap, err := lister.ConfigMaps(input.Namespace).Get(input.Name)
        if apierrors.IsNotFound(err) {
            continue
        }
        if err != nil {
            return nil, err
        }
        // configmaps must conform to this
        inputContent := inputConfigMap.Data["ca-bundle.crt"]
        if len(inputContent) == 0 {
            continue
        }
        inputCerts, err := cert.ParseCertsPEM([]byte(inputContent))
        if err != nil {
            return nil, fmt.Errorf("configmap/%s in %q is malformed: %v", input.Name, input.Namespace, err)
        }
        certificates = append(certificates, inputCerts...)
    }
    certificates = crypto.FilterExpiredCerts(certificates...)
    finalCertificates := []*x509.Certificate{}
    // now check for duplicates. n^2, but super simple
    for i := range certificates {
        found := false
        for j := range finalCertificates {
            if reflect.DeepEqual(certificates[i].Raw, finalCertificates[j].Raw) {
                found = true
                break
            }
        }
        if !found {
            finalCertificates = append(finalCertificates, certificates[i])
        }
    }
    caBytes, err := crypto.EncodeCertificates(finalCertificates...)
    if err != nil {
        return nil, err
    }
    return &corev1.ConfigMap{
        ObjectMeta: metav1.ObjectMeta{Namespace: destinationConfigMap.Namespace, Name: destinationConfigMap.Name},
        Data: map[string]string{
            "ca-bundle.crt": string(caBytes),
        },
    }, nil
}

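Again, the screenshot with the proposed modification was not preserved. One way to read the suggestion is to stop skipping missing inputs; a sketch of that change inside the loop above (an assumption, not the reporter's exact patch):

// Hypothetical change: surface NotFound instead of silently skipping the
// input, so callers never publish a bundle that is missing a required CA.
inputConfigMap, err := lister.ConfigMaps(input.Namespace).Get(input.Name)
if err != nil {
    return nil, fmt.Errorf("required configmap %s/%s unavailable: %w", input.Namespace, input.Name, err)
}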

3 This is related to openshift library-go issue #1472 (github.com/openshift/library-go): missing key configmap.

--allocate-node-cidrs is always set to false

We are using the Terway CNI on Alibaba Cloud, so we need kube-controller-manager to allocate PodCIDRs. We set the following config in KubeControllerManager.operator.openshift.io/v1, but it has no effect for us:

unsupportedConfigOverrides:
  extendedArguments:
    allocate-node-cidrs:
    - "true"
    node-cidr-mask-size:
    - "23"

$ cat /etc/kubernetes/static-pod-resources/kube-controller-manager-pod-9/configmaps/config/config.yaml |jq
{
  "apiVersion": "kubecontrolplane.config.openshift.io/v1",
  "extendedArguments": {
    "allocate-node-cidrs": [
      "true"
    ],
    "cert-dir": [
      "/var/run/kubernetes"
    ],
    "cluster-cidr": [
      "10.108.128.0/18"
    ],
    "cluster-name": [
      "staging-mpcr4"
    ],
    "cluster-signing-cert-file": [
      "/etc/kubernetes/static-pod-certs/secrets/csr-signer/tls.crt"
    ],
    "cluster-signing-key-file": [
      "/etc/kubernetes/static-pod-certs/secrets/csr-signer/tls.key"
    ],
    "configure-cloud-routes": [
      "false"
    ],
    "controllers": [
      "*",
      "-ttl",
      "-bootstrapsigner",
      "-tokencleaner"
    ],
    "enable-dynamic-provisioning": [
      "true"
    ],
    "experimental-cluster-signing-duration": [
      "720h"
    ],
    "feature-gates": [
      "RotateKubeletServerCertificate=true",
      "SupportPodPidsLimit=true",
      "NodeDisruptionExclusion=true",
      "ServiceNodeExclusion=true",
      "SCTPSupport=true",
      "LegacyNodeRoleBehavior=false"
    ],
    "flex-volume-plugin-dir": [
      "/etc/kubernetes/kubelet-plugins/volume/exec"
    ],
    "kube-api-burst": [
      "300"
    ],
    "kube-api-qps": [
      "150"
    ],
    "leader-elect": [
      "true"
    ],
    "leader-elect-resource-lock": [
      "configmaps"
    ],
    "leader-elect-retry-period": [
      "3s"
    ],
    "node-cidr-mask-size": [
      "23"
    ],
    "port": [
      "0"
    ],
    "root-ca-file": [
      "/etc/kubernetes/static-pod-resources/configmaps/serviceaccount-ca/ca-bundle.crt"
    ],
    "secure-port": [
      "10257"
    ],
    "service-account-private-key-file": [
      "/etc/kubernetes/static-pod-resources/secrets/service-account-private-key/service-account.key"
    ],
    "service-cluster-ip-range": [
      "172.30.0.0/18"
    ],
    "use-service-account-credentials": [
      "true"
    ]
  },
  "kind": "KubeControllerManagerConfig",
  "serviceServingCert": {
    "certFile": "/etc/kubernetes/static-pod-resources/configmaps/service-ca/ca-bundle.crt"
  }
}

kcm logs (note that --allocate-node-cidrs still comes out as false):
I0417 05:50:25.193690 1 patch.go:65] FLAGSET: generic
I0417 05:50:25.193699 1 flags.go:33] FLAG: --allocate-node-cidrs="false"
I0417 05:50:25.193701 1 flags.go:33] FLAG: --allow-untagged-cloud="false"
I0417 05:50:25.193704 1 flags.go:33] FLAG: --cidr-allocator-type="RangeAllocator"
I0417 05:50:25.193707 1 flags.go:33] FLAG: --cloud-config=""
I0417 05:50:25.193709 1 flags.go:33] FLAG: --cloud-provider=""
I0417 05:50:25.193712 1 flags.go:33] FLAG: --cluster-cidr="10.108.128.0/18"
I0417 05:50:25.193715 1 flags.go:33] FLAG: --cluster-name="staging-mpcr4"
I0417 05:50:25.193717 1 flags.go:33] FLAG: --configure-cloud-routes="true"
I0417 05:50:25.193720 1 flags.go:33] FLAG: --controller-start-interval="0s"
I0417 05:50:25.193722 1 flags.go:33] FLAG: --controllers="[*,-ttl,-bootstrapsigner,-tokencleaner]"
I0417 05:50:25.193728 1 flags.go:33] FLAG: --external-cloud-volume-plugin=""
I0417 05:50:25.193731 1 flags.go:33] FLAG: --feature-gates="LegacyNodeRoleBehavior=false,NodeDisruptionExclusion=true,RotateKubeletServerCertificate=true,SCTPSupport=true,ServiceNodeExclusion=true,SupportPodPidsLimit=true"

panic: asset v3.11.0/kube-controller-manager/ns.yaml not found

After removing openshift-cluster-kube-controller-manager from the CVO config overrides, the operator works for a little while (minutes), but then I see this panic and the whole openshift-kube-controller-manager namespace goes away.

I1012 02:02:45.252594       1 reflector.go:286] github.com/openshift/cluster-kube-controller-manager-operator/vendor/k8s.io/client-go/informers/factory.go:130: forcing resync
I1012 02:02:45.254090       1 reflector.go:286] github.com/openshift/cluster-kube-controller-manager-operator/vendor/k8s.io/client-go/informers/factory.go:130: forcing resync
E1012 02:02:45.255370       1 runtime.go:66] Observed a panic: "asset: Asset(v3.11.0/kube-controller-manager/ns.yaml): Asset v3.11.0/kube-controller-manager/ns.yaml not found" (asset: Asset(v3.11.0/kube-controller-manager/ns.yaml): Asset v3.11.0/kube-controller-manager/ns.yaml not found)
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:72
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:65
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51
/usr/local/go/src/runtime/asm_amd64.s:573
/usr/local/go/src/runtime/panic.go:502
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/pkg/operator/v311_00_assets/bindata.go:458
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/pkg/operator/sync_kubecontrollermanager_v311_00.go:31
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/pkg/operator/operator.go:129
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/pkg/operator/operator.go:174
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/pkg/operator/operator.go:163
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/pkg/operator/operator.go:157
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88
/usr/local/go/src/runtime/asm_amd64.s:2361
I1012 02:02:45.255499       1 reflector.go:286] github.com/openshift/cluster-kube-controller-manager-operator/pkg/generated/informers/externalversions/factory.go:101: forcing resync
I1012 02:02:45.258724       1 reflector.go:286] github.com/openshift/cluster-kube-controller-manager-operator/vendor/k8s.io/client-go/informers/factory.go:130: forcing resync
E1012 02:02:46.257355       1 runtime.go:66] Observed a panic: "asset: Asset(v3.11.0/kube-controller-manager/ns.yaml): Asset v3.11.0/kube-controller-manager/ns.yaml not found" (asset: Asset(v3.11.0/kube-controller-manager/ns.yaml): Asset v3.11.0/kube-controller-manager/ns.yaml not found)
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:72
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:65
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51
/usr/local/go/src/runtime/asm_amd64.s:573
/usr/local/go/src/runtime/panic.go:502
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/pkg/operator/v311_00_assets/bindata.go:458
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/pkg/operator/sync_kubecontrollermanager_v311_00.go:31
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/pkg/operator/operator.go:129
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/pkg/operator/operator.go:174
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/pkg/operator/operator.go:163
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/pkg/operator/operator.go:157
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88
/usr/local/go/src/runtime/asm_amd64.s:2361
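The panic message format matches the accessor that go-bindata generates (bindata.go in the trace above). For orientation, a sketch of that standard generated pattern; this is go-bindata's usual output, not necessarily this repo's exact file:

// go-bindata generates an accessor like this; it panics when the named
// asset was not compiled into bindata.go, producing exactly the
// "asset: Asset(<name>): ... not found" message seen in the logs.
func MustAsset(name string) []byte {
    a, err := Asset(name)
    if err != nil {
        panic("asset: Asset(" + name + "): " + err.Error())
    }
    return a
}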

kube-controller-manager certs permissions

Running OKD 4.9

After a week or so of uptime, internal certs seem to be refreshed. The following files on controllers are left with permissions 644 rather than 600.

This causes the OKD 4 compliance operator to flag an alert via the

ocp4-file-permissions-openshift-pki-cert-files
ocp4-file-permissions-openshift-pki-key-files

rules, which expect all files to be 600, as they all are apart from these two exceptions:

/etc/kubernetes/static-pod-resources/kube-controller-manager-certs/secrets/csr-signer/tls.crt
/etc/kubernetes/static-pod-resources/kube-controller-manager-certs/secrets/csr-signer/tls.key
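A quick way to confirm on an affected control plane node (the 644 values reflect the report above; stat -c is GNU coreutils):

$ stat -c '%a %n' /etc/kubernetes/static-pod-resources/kube-controller-manager-certs/secrets/csr-signer/tls.crt /etc/kubernetes/static-pod-resources/kube-controller-manager-certs/secrets/csr-signer/tls.key
644 /etc/kubernetes/static-pod-resources/kube-controller-manager-certs/secrets/csr-signer/tls.crt
644 /etc/kubernetes/static-pod-resources/kube-controller-manager-certs/secrets/csr-signer/tls.key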

Is it possible to fix or control the permissions on these files when they are created?

I'm not entirely sure where in the OKD 4 operator chain the creation of these files occurs, but here seems the best place to start!

Operator won't accept installs without a cloud provider

On a BYOR install using libvirt configs on GCE, the installer keeps restarting:

I1203 14:41:51.153742       1 controller.go:154] clusterOperator openshift-cluster-kube-controller-manager-operator/openshift-cluster-kube-controller-manager-operator set to {"apiVersion":"config.openshift.io/v1","kind":"ClusterOperator","metadata":{"creationTimestamp":"2018-12-03T12:21:50Z","generation":1,"name":"openshift-cluster-kube-controller-manager-operator","namespace":"openshift-cluster-kube-controller-manager-operator","resourceVersion":"128030","selfLink":"/apis/config.openshift.io/v1/clusteroperators/openshift-cluster-kube-controller-manager-operator","uid":"031e8a98-f6f6-11e8-99a8-42010af00002"},"status":{"conditions":[{"Message":"ConfigObservationFailing: configmap/cluster-config-v1.kube-system: no recognized cloud provider platform found","Reason":"ConfigObservationFailing","Status":"True","Type":"Failing"}]}}
I1203 14:41:51.282575       1 request.go:485] Throttling request took 795.972047ms, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-kube-controller-manager/secrets/serving-cert
I1203 14:41:51.482642       1 request.go:485] Throttling request took 787.843176ms, request: POST:https://172.30.0.1:443/api/v1/namespaces/openshift-kube-controller-manager/pods
I1203 14:41:51.582621       1 leaderelection.go:209] successfully renewed lease openshift-cluster-kube-controller-manager-operator/openshift-cluster-kube-controller-manager-operator-lock
I1203 14:41:51.684204       1 request.go:485] Throttling request took 797.718808ms, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-kube-controller-manager/configmaps/kube-controller-manager-pod
I1203 14:41:51.882578       1 request.go:485] Throttling request took 732.083152ms, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-kube-controller-manager/serviceaccounts/installer-sa
I1203 14:41:51.904727       1 controller.go:154] clusterOperator openshift-cluster-kube-controller-manager-operator/openshift-cluster-kube-controller-manager-operator set to {"apiVersion":"config.openshift.io/v1","kind":"ClusterOperator","metadata":{"creationTimestamp":"2018-12-03T12:21:50Z","generation":1,"name":"openshift-cluster-kube-controller-manager-operator","namespace":"openshift-cluster-kube-controller-manager-operator","resourceVersion":"128042","selfLink":"/apis/config.openshift.io/v1/clusteroperators/openshift-cluster-kube-controller-manager-operator","uid":"031e8a98-f6f6-11e8-99a8-42010af00002"},"status":{"conditions":[{"Message":"ConfigObservationFailing: configmap/cluster-config-v1.kube-system: no recognized cloud provider platform found","Reason":"ConfigObservationFailing","Status":"True","Type":"Failing"}]}}
I1203 14:41:52.082588       1 request.go:485] Throttling request took 796.323758ms, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-kube-controller-manager/secrets/serving-cert-1
I1203 14:41:52.282626       1 request.go:485] Throttling request took 787.73166ms, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-kube-controller-manager/pods/installer-1-vrutkovs-ig-m-0q7p

This makes the operator rewrite services and rolebindings over and over, which puts a large load on the cluster.
