cluster-kube-controller-manager-operator's Issues

PodDisruptionBudgetAtLimit alert should not be raised when no disruption is allowed.

PodDisruptionBudgetAtLimit should not be raised when no disruption is allowed.
Example:
When there is only one pod and no disruption is allowed for it, this alert fires all the time.

In this kind of situation the alert should not be raised: it can be a conscious decision to make the expected number of pods equal to the minimum number of pods and not allow any disruption.
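For illustration, a minimal sketch of that situation (all names are hypothetical): a single-replica workload guarded by a PDB that requires that one pod, which leaves disruptionsAllowed permanently at 0 and keeps the alert firing.

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-pdb            # hypothetical
spec:
  minAvailable: 1              # equal to the workload's single replica
  selector:
    matchLabels:
      app: example
# The disruption controller then reports in status:
#   expectedPods: 1
#   desiredHealthy: 1
#   disruptionsAllowed: 0   <- PodDisruptionBudgetAtLimit fires continuously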

How to sync the timezone with the running host?

The controller manager pod is not aware of the running host's timezone. This affects CronJob schedule times, service serving cert generation and evaluation timing, log timestamps, and potentially other time-aware tasks. I'm wondering how to sync the timezone of the pods this operator runs with that of the running host.
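The operator exposes no knob for this, and any change would have to land in the pod spec it manages. As a generic Kubernetes-level sketch only (an assumed workaround, not something this operator supports), a container's timezone can be pinned to the host's like this:

# Hypothetical pod-spec fragment: align the container's clock display with the host.
spec:
  containers:
  - name: example
    env:
    - name: TZ                 # honored by glibc and by Go's time package
      value: "Asia/Shanghai"   # hypothetical zone
    volumeMounts:
    - name: host-localtime
      mountPath: /etc/localtime
      readOnly: true
  volumes:
  - name: host-localtime
    hostPath:
      path: /etc/localtime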

kcm rootCA missing the apiserver CA leads to a kube-root-ca problem

1 Bug phenomenon

  1. The kcm (kube-controller-manager) rootCA is generated by manageServiceAccountCABundle in targetconfigcontroller. This func first gets the kube-apiserver-server-ca configmap, and then combines it with two other configmaps to generate the kcm rootCA.
  2. Occasionally (very unlikely to happen, but I have met it exactly once) the kube-apiserver-server-ca configmap is missing, so manageServiceAccountCABundle generates the rootCA without kube-apiserver-server-ca. The kcm leader then holds the wrong rootCA, which leads to the kube-root-ca problem in every pod, and the OCP release will not work until I stop the wrong kcm leader.

2 Bug fix:
2.1. This bug can be resolved by adding a kube-apiserver-server-ca check in manageServiceAccountCABundle of targetconfigcontroller. For reference, the current function is:

func manageServiceAccountCABundle(ctx context.Context, lister corev1listers.ConfigMapLister, client corev1client.ConfigMapsGetter, recorder events.Recorder) (*corev1.ConfigMap, bool, error) {
    requiredConfigMap, err := resourcesynccontroller.CombineCABundleConfigMaps(
        resourcesynccontroller.ResourceLocation{Namespace: operatorclient.TargetNamespace, Name: "serviceaccount-ca"},
        lister,
        // include the ca bundle needed to recognize the server
        resourcesynccontroller.ResourceLocation{Namespace: operatorclient.GlobalMachineSpecifiedConfigNamespace, Name: "kube-apiserver-server-ca"},
        // include the ca bundle needed to recognize default
        // certificates generated by cluster-ingress-operator
        resourcesynccontroller.ResourceLocation{Namespace: operatorclient.GlobalMachineSpecifiedConfigNamespace, Name: "default-ingress-cert"},
    )
    if err != nil {
        return nil, false, err
    }
    return resourceapply.ApplyConfigMap(ctx, client, recorder, requiredConfigMap)
}

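The screenshot with the proposed change was not preserved. A minimal sketch of what such a guard could look like at the top of manageServiceAccountCABundle (the placement and error message are assumptions, not the reporter's exact patch):

// Hypothetical guard: fail instead of silently publishing a serviceaccount-ca
// bundle that cannot verify the kube-apiserver.
if _, err := lister.ConfigMaps(operatorclient.GlobalMachineSpecifiedConfigNamespace).Get("kube-apiserver-server-ca"); err != nil {
    return nil, false, fmt.Errorf("serviceaccount-ca bundle requires configmap kube-apiserver-server-ca: %w", err)
}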
2.2. A second variant was shown in the original report only as a screenshot (not preserved here); it may work, but it is not the best way to resolve this problem.

2.3. This can also be resolved by modifying the openshift library-go func CombineCABundleConfigMaps in resourcesynccontroller, which currently reads:

func CombineCABundleConfigMaps(destinationConfigMap ResourceLocation, lister corev1listers.ConfigMapLister, inputConfigMaps ...ResourceLocation) (*corev1.ConfigMap, error) {
    certificates := []*x509.Certificate{}
    for _, input := range inputConfigMaps {
        inputConfigMap, err := lister.ConfigMaps(input.Namespace).Get(input.Name)
        if apierrors.IsNotFound(err) {
            continue
        }
        if err != nil {
            return nil, err
        }
        // configmaps must conform to this
        inputContent := inputConfigMap.Data["ca-bundle.crt"]
        if len(inputContent) == 0 {
            continue
        }
        inputCerts, err := cert.ParseCertsPEM([]byte(inputContent))
        if err != nil {
            return nil, fmt.Errorf("configmap/%s in %q is malformed: %v", input.Name, input.Namespace, err)
        }
        certificates = append(certificates, inputCerts...)
    }
    certificates = crypto.FilterExpiredCerts(certificates...)
    finalCertificates := []*x509.Certificate{}
    // now check for duplicates. n^2, but super simple
    for i := range certificates {
        found := false
        for j := range finalCertificates {
            if reflect.DeepEqual(certificates[i].Raw, finalCertificates[j].Raw) {
                found = true
                break
            }
        }
        if !found {
            finalCertificates = append(finalCertificates, certificates[i])
        }
    }
    caBytes, err := crypto.EncodeCertificates(finalCertificates...)
    if err != nil {
        return nil, err
    }
    return &corev1.ConfigMap{
        ObjectMeta: metav1.ObjectMeta{Namespace: destinationConfigMap.Namespace, Name: destinationConfigMap.Name},
        Data: map[string]string{
            "ca-bundle.crt": string(caBytes),
        },
    }, nil
}

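Again, the screenshot with the proposed modification was not preserved. One way to read the suggestion is to stop skipping missing inputs; a sketch of that change inside the loop above (an assumption, not the reporter's exact patch):

// Hypothetical change: surface NotFound instead of silently skipping the
// input, so callers never publish a bundle that is missing a required CA.
inputConfigMap, err := lister.ConfigMaps(input.Namespace).Get(input.Name)
if err != nil {
    return nil, fmt.Errorf("required configmap %s/%s unavailable: %w", input.Namespace, input.Name, err)
}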

3 This is related to openshift library-go issue #1472 (github.com/openshift/library-go): missing key configmap.

--allocate-node-cidrs is always set to false

We are using the Terway CNI on Alibaba Cloud, so we need kube-controller-manager to allocate PodCIDRs. We set the following config in KubeControllerManager.operator.openshift.io/v1, but it has no effect for us:

unsupportedConfigOverrides:
  extendedArguments:
    allocate-node-cidrs:
    - "true"
    node-cidr-mask-size:
    - "23"

$ cat /etc/kubernetes/static-pod-resources/kube-controller-manager-pod-9/configmaps/config/config.yaml |jq
{
  "apiVersion": "kubecontrolplane.config.openshift.io/v1",
  "extendedArguments": {
    "allocate-node-cidrs": [
      "true"
    ],
    "cert-dir": [
      "/var/run/kubernetes"
    ],
    "cluster-cidr": [
      "10.108.128.0/18"
    ],
    "cluster-name": [
      "staging-mpcr4"
    ],
    "cluster-signing-cert-file": [
      "/etc/kubernetes/static-pod-certs/secrets/csr-signer/tls.crt"
    ],
    "cluster-signing-key-file": [
      "/etc/kubernetes/static-pod-certs/secrets/csr-signer/tls.key"
    ],
    "configure-cloud-routes": [
      "false"
    ],
    "controllers": [
      "*",
      "-ttl",
      "-bootstrapsigner",
      "-tokencleaner"
    ],
    "enable-dynamic-provisioning": [
      "true"
    ],
    "experimental-cluster-signing-duration": [
      "720h"
    ],
    "feature-gates": [
      "RotateKubeletServerCertificate=true",
      "SupportPodPidsLimit=true",
      "NodeDisruptionExclusion=true",
      "ServiceNodeExclusion=true",
      "SCTPSupport=true",
      "LegacyNodeRoleBehavior=false"
    ],
    "flex-volume-plugin-dir": [
      "/etc/kubernetes/kubelet-plugins/volume/exec"
    ],
    "kube-api-burst": [
      "300"
    ],
    "kube-api-qps": [
      "150"
    ],
    "leader-elect": [
      "true"
    ],
    "leader-elect-resource-lock": [
      "configmaps"
    ],
    "leader-elect-retry-period": [
      "3s"
    ],
    "node-cidr-mask-size": [
      "23"
    ],
    "port": [
      "0"
    ],
    "root-ca-file": [
      "/etc/kubernetes/static-pod-resources/configmaps/serviceaccount-ca/ca-bundle.crt"
    ],
    "secure-port": [
      "10257"
    ],
    "service-account-private-key-file": [
      "/etc/kubernetes/static-pod-resources/secrets/service-account-private-key/service-account.key"
    ],
    "service-cluster-ip-range": [
      "172.30.0.0/18"
    ],
    "use-service-account-credentials": [
      "true"
    ]
  },
  "kind": "KubeControllerManagerConfig",
  "serviceServingCert": {
    "certFile": "/etc/kubernetes/static-pod-resources/configmaps/service-ca/ca-bundle.crt"
  }
}

kcm logs (note that --allocate-node-cidrs still comes out as false):
I0417 05:50:25.193690 1 patch.go:65] FLAGSET: generic
I0417 05:50:25.193699 1 flags.go:33] FLAG: --allocate-node-cidrs="false"
I0417 05:50:25.193701 1 flags.go:33] FLAG: --allow-untagged-cloud="false"
I0417 05:50:25.193704 1 flags.go:33] FLAG: --cidr-allocator-type="RangeAllocator"
I0417 05:50:25.193707 1 flags.go:33] FLAG: --cloud-config=""
I0417 05:50:25.193709 1 flags.go:33] FLAG: --cloud-provider=""
I0417 05:50:25.193712 1 flags.go:33] FLAG: --cluster-cidr="10.108.128.0/18"
I0417 05:50:25.193715 1 flags.go:33] FLAG: --cluster-name="staging-mpcr4"
I0417 05:50:25.193717 1 flags.go:33] FLAG: --configure-cloud-routes="true"
I0417 05:50:25.193720 1 flags.go:33] FLAG: --controller-start-interval="0s"
I0417 05:50:25.193722 1 flags.go:33] FLAG: --controllers="[*,-ttl,-bootstrapsigner,-tokencleaner]"
I0417 05:50:25.193728 1 flags.go:33] FLAG: --external-cloud-volume-plugin=""
I0417 05:50:25.193731 1 flags.go:33] FLAG: --feature-gates="LegacyNodeRoleBehavior=false,NodeDisruptionExclusion=true,RotateKubeletServerCertificate=true,SCTPSupport=true,ServiceNodeExclusion=true,SupportPodPidsLimit=true"

panic: asset v3.11.0/kube-controller-manager/ns.yaml not found

After removing openshift-cluster-kube-controller-manager from the CVO config overrides, the operator works for a little while (minutes), but then I see this panic and the whole openshift-kube-controller-manager namespace goes away.

I1012 02:02:45.252594       1 reflector.go:286] github.com/openshift/cluster-kube-controller-manager-operator/vendor/k8s.io/client-go/informers/factory.go:130: forcing resync
I1012 02:02:45.254090       1 reflector.go:286] github.com/openshift/cluster-kube-controller-manager-operator/vendor/k8s.io/client-go/informers/factory.go:130: forcing resync
E1012 02:02:45.255370       1 runtime.go:66] Observed a panic: "asset: Asset(v3.11.0/kube-controller-manager/ns.yaml): Asset v3.11.0/kube-controller-manager/ns.yaml not found" (asset: Asset(v3.11.0/kube-controller-manager/ns.yaml): Asset v3.11.0/kube-controller-manager/ns.yaml not found)
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:72
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:65
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51
/usr/local/go/src/runtime/asm_amd64.s:573
/usr/local/go/src/runtime/panic.go:502
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/pkg/operator/v311_00_assets/bindata.go:458
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/pkg/operator/sync_kubecontrollermanager_v311_00.go:31
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/pkg/operator/operator.go:129
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/pkg/operator/operator.go:174
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/pkg/operator/operator.go:163
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/pkg/operator/operator.go:157
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88
/usr/local/go/src/runtime/asm_amd64.s:2361
I1012 02:02:45.255499       1 reflector.go:286] github.com/openshift/cluster-kube-controller-manager-operator/pkg/generated/informers/externalversions/factory.go:101: forcing resync
I1012 02:02:45.258724       1 reflector.go:286] github.com/openshift/cluster-kube-controller-manager-operator/vendor/k8s.io/client-go/informers/factory.go:130: forcing resync
E1012 02:02:46.257355       1 runtime.go:66] Observed a panic: "asset: Asset(v3.11.0/kube-controller-manager/ns.yaml): Asset v3.11.0/kube-controller-manager/ns.yaml not found" (asset: Asset(v3.11.0/kube-controller-manager/ns.yaml): Asset v3.11.0/kube-controller-manager/ns.yaml not found)
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:72
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:65
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51
/usr/local/go/src/runtime/asm_amd64.s:573
/usr/local/go/src/runtime/panic.go:502
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/pkg/operator/v311_00_assets/bindata.go:458
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/pkg/operator/sync_kubecontrollermanager_v311_00.go:31
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/pkg/operator/operator.go:129
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/pkg/operator/operator.go:174
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/pkg/operator/operator.go:163
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/pkg/operator/operator.go:157
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88
/usr/local/go/src/runtime/asm_amd64.s:2361
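The panic message format matches the accessor that go-bindata generates (bindata.go in the trace above). For orientation, a sketch of that standard generated pattern; this is go-bindata's usual output, not necessarily this repo's exact file:

// go-bindata generates an accessor like this; it panics when the named
// asset was not compiled into bindata.go, producing exactly the
// "asset: Asset(<name>): ... not found" message seen in the logs.
func MustAsset(name string) []byte {
    a, err := Asset(name)
    if err != nil {
        panic("asset: Asset(" + name + "): " + err.Error())
    }
    return a
}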

kube-controller-manager certs permissions

Running OKD 4.9

After a week or so of uptime, internal certs seem to be refreshed. The following files on controllers are left with permissions 644 rather than 600.

This causes the OKD 4 compliance operator to flag an alert via the

ocp4-file-permissions-openshift-pki-cert-files
ocp4-file-permissions-openshift-pki-key-files

rules, which expect all files to be 600, as they all are apart from these two exceptions:

/etc/kubernetes/static-pod-resources/kube-controller-manager-certs/secrets/csr-signer/tls.crt
/etc/kubernetes/static-pod-resources/kube-controller-manager-certs/secrets/csr-signer/tls.key
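A quick way to confirm on an affected control plane node (the 644 values reflect the report above; stat -c is GNU coreutils):

$ stat -c '%a %n' /etc/kubernetes/static-pod-resources/kube-controller-manager-certs/secrets/csr-signer/tls.crt /etc/kubernetes/static-pod-resources/kube-controller-manager-certs/secrets/csr-signer/tls.key
644 /etc/kubernetes/static-pod-resources/kube-controller-manager-certs/secrets/csr-signer/tls.crt
644 /etc/kubernetes/static-pod-resources/kube-controller-manager-certs/secrets/csr-signer/tls.key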

Is it possible to fix or control the permissions on these files when they are created?

I'm not entirely sure where in the OKD 4 operator chain the creation of these files occurs, but here seems the best place to start!

Operator won't accept installs without a cloud provider

On a BYOR install using libvirt configs on GCE, the installer keeps restarting:

I1203 14:41:51.153742       1 controller.go:154] clusterOperator openshift-cluster-kube-controller-manager-operator/openshift-cluster-kube-controller-manager-operator set to {"apiVersion":"config.openshift.io/v1","kind":"ClusterOperator","metadata":{"creationTimestamp":"2018-12-03T12:21:50Z","generation":1,"name":"openshift-cluster-kube-controller-manager-operator","namespace":"openshift-cluster-kube-controller-manager-operator","resourceVersion":"128030","selfLink":"/apis/config.openshift.io/v1/clusteroperators/openshift-cluster-kube-controller-manager-operator","uid":"031e8a98-f6f6-11e8-99a8-42010af00002"},"status":{"conditions":[{"Message":"ConfigObservationFailing: configmap/cluster-config-v1.kube-system: no recognized cloud provider platform found","Reason":"ConfigObservationFailing","Status":"True","Type":"Failing"}]}}
I1203 14:41:51.282575       1 request.go:485] Throttling request took 795.972047ms, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-kube-controller-manager/secrets/serving-cert
I1203 14:41:51.482642       1 request.go:485] Throttling request took 787.843176ms, request: POST:https://172.30.0.1:443/api/v1/namespaces/openshift-kube-controller-manager/pods
I1203 14:41:51.582621       1 leaderelection.go:209] successfully renewed lease openshift-cluster-kube-controller-manager-operator/openshift-cluster-kube-controller-manager-operator-lock
I1203 14:41:51.684204       1 request.go:485] Throttling request took 797.718808ms, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-kube-controller-manager/configmaps/kube-controller-manager-pod
I1203 14:41:51.882578       1 request.go:485] Throttling request took 732.083152ms, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-kube-controller-manager/serviceaccounts/installer-sa
I1203 14:41:51.904727       1 controller.go:154] clusterOperator openshift-cluster-kube-controller-manager-operator/openshift-cluster-kube-controller-manager-operator set to {"apiVersion":"config.openshift.io/v1","kind":"ClusterOperator","metadata":{"creationTimestamp":"2018-12-03T12:21:50Z","generation":1,"name":"openshift-cluster-kube-controller-manager-operator","namespace":"openshift-cluster-kube-controller-manager-operator","resourceVersion":"128042","selfLink":"/apis/config.openshift.io/v1/clusteroperators/openshift-cluster-kube-controller-manager-operator","uid":"031e8a98-f6f6-11e8-99a8-42010af00002"},"status":{"conditions":[{"Message":"ConfigObservationFailing: configmap/cluster-config-v1.kube-system: no recognized cloud provider platform found","Reason":"ConfigObservationFailing","Status":"True","Type":"Failing"}]}}
I1203 14:41:52.082588       1 request.go:485] Throttling request took 796.323758ms, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-kube-controller-manager/secrets/serving-cert-1
I1203 14:41:52.282626       1 request.go:485] Throttling request took 787.73166ms, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-kube-controller-manager/pods/installer-1-vrutkovs-ig-m-0q7p

This makes the operator rewrite services and rolebindings over and over, which puts a large load on the cluster.
