openshift / cluster-version-operator
License: Apache License 2.0
To get a list of current overrides, run:
$ oc get -o json clusterversion version | jq .spec.overrides
[
{
"kind": "APIService",
"name": "v1alpha1.packages.apps.redhat.com",
"unmanaged": true
}
]
However, the spec doesn't contain an overrides section:
spec:
channel: fast
clusterID: 37be53b4-bdbc-4b65-b76e-ddf9c2b671c6
upstream: http://localhost:8080/graph
So none of the following steps work.
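If the section is missing, it can be added under spec. A minimal fragment matching the APIService example above might look like this (the group value is my assumption, since the jq output above omits it):

```yaml
spec:
  overrides:
  - kind: APIService
    group: apiregistration.k8s.io   # assumed; the jq output above omits the group
    name: v1alpha1.packages.apps.redhat.com
    namespace: ""
    unmanaged: true
```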
Similarly, this command:
oc get clusterversion -o jsonpath='{.status.current.payload}{"\n"}' version
needs to be updated to:
oc get clusterversion -o jsonpath='{.status.desired.payload}{"\n"}' version
While trying to run the CVO locally as per the doc/dev, the section specifies the following command:
./_output/linux/amd64/cluster-version-operator -v5 start --release-image 4.4.0-rc.4
But since --listen defaults to "0.0.0.0:9099", the command fails with the following error unless --listen="" is appended:
F0926 00:28:13.708624 62174 start.go:24] error: --listen was not set empty, so --serving-cert-file must be set
The documentation should be updated to either specify --serving-cert-file or at least unset the listen option with --listen="".
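The failure comes from a startup validation that refuses a non-empty listen address without a serving certificate. A minimal sketch of that kind of check (hypothetical function and parameter names, not the CVO's actual code):

```go
package main

import (
	"errors"
	"fmt"
)

// validateServing mirrors the kind of check that produces the error above:
// if a listen address is set, a serving cert/key pair must be provided.
func validateServing(listen, servingCertFile, servingKeyFile string) error {
	if listen == "" {
		return nil // serving disabled; no cert needed
	}
	if servingCertFile == "" || servingKeyFile == "" {
		return errors.New("--listen was not set empty, so --serving-cert-file must be set")
	}
	return nil
}

func main() {
	// The default listen address with no cert fails, as in the report.
	fmt.Println(validateServing("0.0.0.0:9099", "", ""))
	// Disabling listening passes.
	fmt.Println(validateServing("", "", ""))
}
```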
The sync loop being inline with the status update loop makes upgrades hard to debug (we have to wait for the whole loop to complete before status updates). Also, if a sync is running and you change the desired state, we have to wait for the sync to complete. Some other non-obvious errors then show up to users.
Instead, we should decouple the CV status from the sync loop, and make the sync loop cancellable.
To do that, I think we should:
End outcomes:
It would be nice to have a proper description of what 'cluster-version-operator' is, its features, current status, etc. on the repo 'front page' instead of build instructions.
CVO cannot deploy the image registry due to a cache sync error.
CVO payload: registry.svc.ci.openshift.org/openshift/origin-release:v4.0
CVO version: 4.0.0-0.alpha-2018-11-30-060640
Error in logs:
E1130 14:43:49.424010 1 sync.go:133] error running apply for serviceaccount "openshift-image-registry/cluster-image-registry-operator" (v1, 125 of 183): serviceaccounts "cluster-image-registry-operator" is forbidden: caches not synchronized
We would like to be able to restrict access by IP to the downloads route in the openshift-console project using the IP whitelist annotation.
This currently does not appear possible, as any update adding the required annotation is reverted by the cluster-version-operator.
Not sure what I've got here but this is a CI failure bringing up a cluster, from the CVO pod logs:
E0213 17:40:08.957050 1 task.go:57] error running apply for service "openshift-cloud-credential-operator/controller-manager-service" (84 of 273): services "controller-manager-service" is forbidden: caches not synchronized
E0213 17:40:28.980158 1 task.go:57] error running apply for service "openshift-cloud-credential-operator/controller-manager-service" (84 of 273): services "controller-manager-service" is forbidden: caches not synchronized
I0213 17:40:38.501814 1 leaderelection.go:209] successfully renewed lease openshift-cluster-version/version
I0213 17:40:40.483182 1 reflector.go:286] github.com/openshift/cluster-version-operator/vendor/github.com/openshift/client-go/config/informers/externalversions/factory.go:101: forcing resync
E0213 17:40:51.996136 1 task.go:57] error running apply for service "openshift-cloud-credential-operator/controller-manager-service" (84 of 273): services "controller-manager-service" is forbidden: caches not synchronized
I0213 17:40:51.996197 1 task_graph.go:438] No more reachable nodes in graph, continue
I0213 17:40:51.996203 1 task_graph.go:474] No more work
I0213 17:40:51.996221 1 task_graph.go:494] No more work for 3
I0213 17:40:51.996227 1 task_graph.go:494] No more work for 6
I0213 17:40:51.996234 1 task_graph.go:494] No more work for 7
I0213 17:40:51.996240 1 task_graph.go:494] No more work for 1
I0213 17:40:51.996246 1 task_graph.go:494] No more work for 4
I0213 17:40:51.996252 1 task_graph.go:494] No more work for 2
I0213 17:40:51.996252 1 task_graph.go:494] No more work for 0
I0213 17:40:51.996257 1 task_graph.go:494] No more work for 5
I0213 17:40:51.996277 1 task_graph.go:510] Workers finished
I0213 17:40:51.996290 1 task_graph.go:518] Result of work: [Could not update service "openshift-cloud-credential-operator/controller-manager-service" (84 of 273): the server has forbidden updates to this resource]
E0213 17:40:51.996341 1 sync_worker.go:263] unable to synchronize image (waiting 3m19.747206386s): Could not update service "openshift-cloud-credential-operator/controller-manager-service" (84 of 273): the server has forbidden updates to this resource
I0213 17:40:51.996400 1 cvo.go:298] Started syncing cluster version "openshift-cluster-version/version" (2019-02-13 17:40:51.996393402 +0000 UTC m=+2487.354191867)
I0213 17:40:51.996446 1 cvo.go:326] Desired version from operator is v1.Update{Version:"0.0.1-2019-02-13-164905", Image:"registry.svc.ci.openshift.org/ci-op-girsxxlp/release@sha256:ded54f5fb7dfe10f53176ac710f6309b05828dc0aa276b448ce5aefc8e5eae78"}
I0213 17:40:51.996541 1 cvo.go:300] Finished syncing cluster version "openshift-cluster-version/version" (144.1µs)
More logs available here: https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/openshift_cloud-credential-operator/31/pull-ci-openshift-cloud-credential-operator-master-e2e-aws/158
The cluster-network file is 0000_70_cluster-network-operator_03_daemonset.yaml. The release-image/0000_07_cluster-network-operator_03_daemonset.yaml file, despite the daemonset in its name, is actually a Deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
name: network-operator
namespace: openshift-network-operator
- op: add
path: /spec/overrides
value:
- kind: Deployment
group: apps <--------- This
name: cluster-network-operator
namespace: openshift-network-operator
unmanaged: true
Hi,
We're trying to set up an OpenShift 4.4.3 cluster for a customer, and they need all of the worker nodes to be dedicated only to their business application pods (apart from daemonsets, of course). But the default installation comes with a bunch of cluster operators bundled in the release image, and their manifests cannot be changed. This leads to some cluster operator pods, like the marketplace operator's, being scheduled on worker nodes because they have no nodeSelector, while other operators have only a generic linux nodeSelector. Of course we can create infra nodes and manually change nodeSelectors to move things there, but the manifests will get reset with upgrades and we will have to patch them again. Is there a way to properly define nodeSelectors for Level 1 operators like cluster operators (assuming the CVO is the Level 0 operator)? It would be great to get even a small lead on this; we can contribute if the feature is not there. Thanks
Test PR: #210
Error: unknown command "openshift-kube-apiserver" for "hypershift"
Run 'hypershift --help' for usage.
unknown command "openshift-kube-apiserver" for "hypershift"
/cc @abhinavdahiya @wking
It still refers to status.version in the singular. There may be other things wrong too.
Access to the Cluster Version k8s object requires cluster role access, which makes it difficult to obtain the cluster ID.
Operator-metering is a useful tool for building reports from Prometheus data. For upcoming flows and customer interactions (support, billing, etc) it would be beneficial for the reports to contain the cluster ID.
If the cluster ID was available as label or its own metric in Prometheus that would help to simplify report origination.
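A sketch of what such a metric could look like, using only the standard library and the Prometheus text exposition format (the metric name cluster_id_info is hypothetical; the CVO's real metrics differ):

```go
package main

import "fmt"

// clusterIDMetric renders a constant "info"-style gauge whose only purpose
// is to carry the cluster ID as a label, in Prometheus exposition format.
func clusterIDMetric(clusterID string) string {
	return fmt.Sprintf("# TYPE cluster_id_info gauge\ncluster_id_info{cluster_id=%q} 1\n", clusterID)
}

func main() {
	// clusterID would come from ClusterVersion spec.clusterID.
	fmt.Print(clusterIDMetric("37be53b4-bdbc-4b65-b76e-ddf9c2b671c6"))
}
```

Reports could then join other series against the cluster_id label rather than reading the ClusterVersion object, which needs cluster-scoped RBAC.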
TEST_INTEGRATION=1 go test ./pkg/start/ -tags integration -count=4
E0325 22:45:48.713617 10713 sync_worker.go:276] unable to synchronize image (waiting 625ms): Could not update configmap "e2e-cvo-ff4l/config2" (2 of 2): the object is invalid, possibly due to local cluster configuration
E0325 22:45:48.922205 10713 leaderelection.go:256] error initially creating leader election record: namespaces "e2e-cvo-mlm6zv" not found
E0325 22:45:54.301108 10713 event.go:259] Could not construct reference to: '&v1.ConfigMap{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"", GenerateName:"", Namespace:"", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Initializers:(*v1.Initializers)(nil), Finalizers:[]string(nil), ClusterName:""}, Data:map[string]string(nil), BinaryData:map[string][]uint8(nil)}' due to: 'selfLink was empty, can't make reference'. Will not report event: 'Normal' 'LeaderElection' 'claytonc-mbp.local_4bb2f5bb-5f1d-4543-90dd-75de07986a26 stopped leading'
panic: close of closed channel
goroutine 63 [running]:
github.com/openshift/cluster-version-operator/pkg/start.(*Options).run.func3()
/Users/clayton/projects/origin/src/github.com/openshift/cluster-version-operator/pkg/start/start.go:190 +0x74
github.com/openshift/cluster-version-operator/vendor/k8s.io/client-go/tools/leaderelection.(*LeaderElector).Run.func1(0xc0006e4240)
/Users/clayton/projects/origin/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/client-go/tools/leaderelection/leaderelection.go:148 +0x40
github.com/openshift/cluster-version-operator/vendor/k8s.io/client-go/tools/leaderelection.(*LeaderElector).Run(0xc0006e4240, 0x216d1e0, 0xc000184600)
/Users/clayton/projects/origin/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/client-go/tools/leaderelection/leaderelection.go:157 +0x112
github.com/openshift/cluster-version-operator/vendor/k8s.io/client-go/tools/leaderelection.RunOrDie(0x216d220, 0xc0000da010, 0x2174320, 0xc0001e5d40, 0x14f46b0400, 0xa7a358200, 0x6fc23ac00, 0xc0003b03f0, 0xc0000d13b0, 0x0)
/Users/clayton/projects/origin/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/client-go/tools/leaderelection/leaderelection.go:166 +0x87
created by github.com/openshift/cluster-version-operator/pkg/start.(*Options).run
/Users/clayton/projects/origin/src/github.com/openshift/cluster-version-operator/pkg/start/start.go:157 +0x1ef
FAIL github.com/openshift/cluster-version-operator/pkg/start 75.329s
The following branches are being fast-forwarded from the current development branch (master) as placeholders for future releases. No merging is allowed into these release branches until they are unfrozen for production release.
release-4.18
release-4.19
For more information, see the branching documentation.
Hi,
I'm using the Cluster Version Operator on openshift 4.12 and OVN-Kubernetes CNI plugin.
Now I want to test my cluster with Calico, but the port 9099 is used by CVO and doesn't allow the Calico routing module (Felix) to start.
I would like to know whether stopping the CVO can cause any issues in my cluster.
How can I change the listening port of the CVO?
Best Regards,
We have the following override in our ClusterVersion:
- group: imageregistry.operator.openshift.io
kind: Config
name: cluster
namespace: ""
unmanaged: true
This is causing cluster provisioning to fail, because when the operator encounters this manifest...
0000_30_config-operator_01_operator.cr.yaml
apiVersion: operator.openshift.io/v1
kind: Config
metadata:
name: cluster
annotations:
include.release.openshift.io/ibm-cloud-managed: "true"
include.release.openshift.io/self-managed-high-availability: "true"
include.release.openshift.io/single-node-developer: "true"
release.openshift.io/create-only: "true"
spec:
managementState: Managed
... the getOverrideForManifest function improperly matches it to the imageregistry.operator.openshift.io override above, because it disregards the Group in its comparison (imageregistry.operator.openshift.io != operator.openshift.io).
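A group-aware comparison would avoid the false match. A minimal sketch (hypothetical types, not the actual getOverrideForManifest code):

```go
package main

import "fmt"

// Override mirrors the shape of a ClusterVersion spec.overrides entry.
type Override struct {
	Kind, Group, Namespace, Name string
	Unmanaged                    bool
}

// Manifest identifies an object from the release payload.
type Manifest struct {
	Kind, Group, Namespace, Name string
}

// overrideFor matches a manifest against overrides, including Group in the
// comparison so a Config in imageregistry.operator.openshift.io does not
// shadow a Config in operator.openshift.io.
func overrideFor(m Manifest, overrides []Override) *Override {
	for i, o := range overrides {
		if o.Kind == m.Kind && o.Group == m.Group && o.Namespace == m.Namespace && o.Name == m.Name {
			return &overrides[i]
		}
	}
	return nil
}

func main() {
	overrides := []Override{{Kind: "Config", Group: "imageregistry.operator.openshift.io", Name: "cluster", Unmanaged: true}}
	configOperatorCR := Manifest{Kind: "Config", Group: "operator.openshift.io", Name: "cluster"}
	// With Group compared, the config-operator CR no longer matches the
	// image-registry override.
	fmt.Println(overrideFor(configOperatorCR, overrides))
}
```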
As a result, the cluster-config-operator has no custom resource to act on and it blocks the cluster-version-operator from ever completing:
$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version False True 3h18m Working towards 4.9.7: 725 of 735 done (98% complete), waiting on config-operator
The CVO needs to be able to install operators that take care of creating a viable service network. To do this, it could run with hostNetwork: true and cleverly set KUBERNETES_SERVICE_PORT and KUBERNETES_SERVICE_HOST to point to the local kube-apiserver.
If we did this, I think it could come up after the apiserver, controller, and scheduler, and before anything else. I bumped into this while trying to get it running from a kube control plane.
@mfojtik @knobunc @ironcladlou @abhinavdahiya @smarterclayton @derekwaynecarr
Hey folks,
The contributing documentation makes no mention of requiring a login to build the container image. In this case it's the registry.ci.openshift.org/ocp/ubi container image that appears to require authentication.
Can you please clarify whether there is an authentication method for public contributors, or whether a different URL should be there?
What happened
When the CVO reports a failing cluster on v4.4, it returns a Failing condition, e.g.:
conditions:
- lastTransitionTime: "2020-08-12T07:08:36Z"
message: Done applying 4.4.10
status: "True"
type: Available
- lastTransitionTime: "2020-08-12T06:53:47Z"
status: "False"
type: Failing
- lastTransitionTime: "2020-08-12T07:08:36Z"
message: Cluster version is 4.4.10
status: "False"
type: Progressing
- lastTransitionTime: "2020-08-12T07:08:57Z"
message: The update channel has not been configured.
reason: NoChannel
status: "False"
type: RetrievedUpdates
This is an expected state related to the code: https://github.com/openshift/cluster-version-operator/blob/release-4.4/pkg/cvo/status.go#L240
However, the openshift/api expects one of the states https://github.com/openshift/api/blob/release-4.4/config/v1/types_cluster_operator.go#L141:
OperatorAvailable ClusterStatusConditionType = "Available"
OperatorProgressing ClusterStatusConditionType = "Progressing"
OperatorDegraded ClusterStatusConditionType = "Degraded"
OperatorUpgradeable ClusterStatusConditionType = "Upgradeable"
What you expected to happen
openshift/cvo and openshift/api should have matching conditions defined.
Hi
Is there any way to get the communication to api.openshift.com running through an HTTP proxy on version 4.1.7?
We can't update the cluster over the UI because it can't reach api.openshift.com, as the HTTP proxy seems not to be configured in the operator.
Everything else on the cluster is configured to run through Proxy.
Regards
Basically all our upgrade jobs are dead. The CVO should be able to say what isn't yet completed, but it just says "x% complete". We should try to produce a reasonable message to aid debugging. The source error here:
{
"apiVersion": "config.openshift.io/v1",
"kind": "ClusterOperator",
"metadata": {
"creationTimestamp": "2019-06-29T12:41:31Z",
"generation": 1,
"name": "machine-config",
"resourceVersion": "52233",
"selfLink": "/apis/config.openshift.io/v1/clusteroperators/machine-config",
"uid": "38f06bd4-9a6b-11e9-b262-12ce5335583c"
},
"spec": {},
"status": {
"conditions": [
{
"lastTransitionTime": "2019-06-29T13:48:35Z",
"message": "Cluster not available for 0.0.1-2019-06-29-122423",
"status": "False",
"type": "Available"
},
{
"lastTransitionTime": "2019-06-29T13:35:16Z",
"message": "Working towards 0.0.1-2019-06-29-122423",
"status": "True",
"type": "Progressing"
},
{
"lastTransitionTime": "2019-06-29T13:48:35Z",
"message": "Unable to apply 0.0.1-2019-06-29-122423: timed out waiting for the condition during syncRequiredMachineConfigPools: error pool master is not ready, retrying. Status: (pool degraded: true total: 3, ready 0, updated: 0, unavailable: 0)",
"reason": "FailedToSync",
"status": "True",
"type": "Degraded"
}
],
"extension": {
"lastSyncError": "error pool master is not ready, retrying. Status: (pool degraded: true total: 3, ready 0, updated: 0, unavailable: 0)",
"master": "pool is degraded because nodes fail with \"1 nodes are reporting degraded status on sync\": \"Node ip-10-0-132-28.ec2.internal is reporting: \\\"failed to run pivot: failed to start machine-config-daemon-host.service: exit status 1\\\"\"",
"worker": "pool is degraded because nodes fail with \"1 nodes are reporting degraded status on sync\": \"Node ip-10-0-139-237.ec2.internal is reporting: \\\"failed to run pivot: failed to start machine-config-daemon-host.service: exit status 1\\\"\""
},
The first two I looked at were the machine-config operator.
In a libvirt cluster I just launched using:
$ openshift-install version
openshift-install v0.1.0-52-gedc4d97104f7fefbe6ce778d18aaf53299f8af59
Terraform v0.11.8
I'm seeing:
[core@wking-bootstrap ~]$ kubectl logs -n openshift-cluster-version bootstrap-cluster-version-operator-wking-bootstrap
I1005 21:39:48.036769 1 start.go:67] ClusterVersionOperator v0.0.0-97-ga5a76d51-dirty
I1005 21:39:48.037010 1 start.go:180] Loading kube client config from path "/etc/kubernetes/kubeconfig"
...
E1005 21:42:10.909970 1 event.go:259] Could not construct reference to: '&v1.ConfigMap{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"cluster-version-operator", GenerateName:"", Namespace:"openshift-cluster-version", SelfLink:"/api/v1/namespaces/openshift-cluster-version/configmaps/cluster-version-operator", UID:"d45bde3c-c8e4-11e8-8408-0214269547a8", ResourceVersion:"9259", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:63674371377, loc:(*time.Location)(0x1bf25a0)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string{"control-plane.alpha.kubernetes.io/leader":"{\"holderIdentity\":\"wking-bootstrap_efc11fa1-5144-45db-a2cb-568952d64f05\",\"leaseDurationSeconds\":90,\"acquireTime\":\"2018-10-05T21:42:10Z\",\"renewTime\":\"2018-10-05T21:42:10Z\",\"leaderTransitions\":8}"}, OwnerReferences:[]v1.OwnerReference(nil), Initializers:(*v1.Initializers)(nil), Finalizers:[]string(nil), ClusterName:""}, Data:map[string]string(nil), BinaryData:map[string][]uint8(nil)}' due to: 'no kind is registered for the type v1.ConfigMap'. Will not report event: 'Normal' 'LeaderElection' 'wking-bootstrap_efc11fa1-5144-45db-a2cb-568952d64f05 became leader'
...
I1005 21:42:12.267560 1 sync.go:24] Running sync for (servicecertsigner.config.openshift.io/v1alpha1, Kind=ServiceCertSignerOperatorConfig) /instance
E1005 21:42:12.534057 1 memcache.go:147] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
I1005 21:42:12.556363 1 sync.go:60] Done syncing for (servicecertsigner.config.openshift.io/v1alpha1, Kind=ServiceCertSignerOperatorConfig) /instance
...
I1005 21:42:14.179926 1 sync.go:60] Done syncing for (/v1, Kind=Service) openshift-operator-lifecycle-manager/package-server
I1005 21:42:14.180040 1 sync.go:24] Running sync for (image.openshift.io/v1, Kind=ImageStream) /
I1005 21:42:14.336201 1 request.go:485] Throttling request took 155.956498ms, request: GET:https://wking-api.installer.testing:6443/apis/image.openshift.io/v1/imagestreams
I1005 21:42:14.349166 1 cvo.go:201] Finished syncing operator "openshift-cluster-version/cluster-version-operator" (3.337348292s)
E1005 21:42:14.349428 1 runtime.go:66] Observed a panic: &runtime.TypeAssertionError{interfaceString:"runtime.Object", concreteString:"*unstructured.UnstructuredList", assertedString:"*unstructured.Unstructured", missingMethod:""} (interface conversion: runtime.Object is *unstructured.UnstructuredList, not *unstructured.Unstructured)
/go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:72
/go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:65
/go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51
/usr/local/go/src/runtime/asm_amd64.s:573
/usr/local/go/src/runtime/panic.go:502
/usr/local/go/src/runtime/iface.go:252
/usr/local/go/src/runtime/iface.go:262
/go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/client-go/dynamic/simple.go:197
/go/src/github.com/openshift/cluster-version-operator/pkg/cvo/internal/generic.go:31
/go/src/github.com/openshift/cluster-version-operator/pkg/cvo/internal/generic.go:88
/go/src/github.com/openshift/cluster-version-operator/pkg/cvo/sync.go:51
/go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:203
/go/src/github.com/openshift/cluster-version-operator/pkg/cvo/sync.go:33
/go/src/github.com/openshift/cluster-version-operator/pkg/cvo/cvo.go:243
/go/src/github.com/openshift/cluster-version-operator/pkg/cvo/cvo.go:115
/go/src/github.com/openshift/cluster-version-operator/pkg/cvo/cvo.go:173
/go/src/github.com/openshift/cluster-version-operator/pkg/cvo/cvo.go:162
/go/src/github.com/openshift/cluster-version-operator/pkg/cvo/cvo.go:146
/go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133
/go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134
/go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88
/usr/local/go/src/runtime/asm_amd64.s:2361
panic: interface conversion: runtime.Object is *unstructured.UnstructuredList, not *unstructured.Unstructured [recovered]
panic: interface conversion: runtime.Object is *unstructured.UnstructuredList, not *unstructured.Unstructured
goroutine 76 [running]:
github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
/go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:58 +0x107
panic(0x11562c0, 0xc4202d3e00)
/usr/local/go/src/runtime/panic.go:502 +0x229
github.com/openshift/cluster-version-operator/vendor/k8s.io/client-go/dynamic.(*dynamicResourceClient).Get(0xc420e173b0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
/go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/client-go/dynamic/simple.go:197 +0x90f
github.com/openshift/cluster-version-operator/pkg/cvo/internal.applyUnstructured(0x13e3760, 0xc420e173b0, 0xc4200a6ca0, 0xc4200a6ca0, 0x0, 0x0, 0x13e3760)
/go/src/github.com/openshift/cluster-version-operator/pkg/cvo/internal/generic.go:31 +0x99
github.com/openshift/cluster-version-operator/pkg/cvo/internal.(*genericBuilder).Do(0xc420576180, 0xc420405b80, 0x29f)
/go/src/github.com/openshift/cluster-version-operator/pkg/cvo/internal/generic.go:88 +0x72
github.com/openshift/cluster-version-operator/pkg/cvo.(*Operator).syncUpdatePayload.func1(0xa, 0x0, 0x0)
/go/src/github.com/openshift/cluster-version-operator/pkg/cvo/sync.go:51 +0x241
github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/wait.ExponentialBackoff(0x2540be400, 0x3ff4cccccccccccd, 0x0, 0x3, 0xc4207e5b20, 0x2c0, 0xc42041ef00)
/go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:203 +0x9c
github.com/openshift/cluster-version-operator/pkg/cvo.(*Operator).syncUpdatePayload(0xc42043cb00, 0xc420481380, 0xc4205511a0, 0x3b, 0xc4205511a0)
/go/src/github.com/openshift/cluster-version-operator/pkg/cvo/sync.go:33 +0x749
github.com/openshift/cluster-version-operator/pkg/cvo.(*Operator).sync(0xc42043cb00, 0xc4203d4440, 0x32, 0x0, 0x0)
/go/src/github.com/openshift/cluster-version-operator/pkg/cvo/cvo.go:243 +0x49a
github.com/openshift/cluster-version-operator/pkg/cvo.(*Operator).(github.com/openshift/cluster-version-operator/pkg/cvo.sync)-fm(0xc4203d4440, 0x32, 0xc4203e3b00, 0x10e38c0)
/go/src/github.com/openshift/cluster-version-operator/pkg/cvo/cvo.go:115 +0x3e
github.com/openshift/cluster-version-operator/pkg/cvo.(*Operator).processNextWorkItem(0xc42043cb00, 0xc4203d2800)
/go/src/github.com/openshift/cluster-version-operator/pkg/cvo/cvo.go:173 +0xe0
github.com/openshift/cluster-version-operator/pkg/cvo.(*Operator).worker(0xc42043cb00)
/go/src/github.com/openshift/cluster-version-operator/pkg/cvo/cvo.go:162 +0x2b
github.com/openshift/cluster-version-operator/pkg/cvo.(*Operator).(github.com/openshift/cluster-version-operator/pkg/cvo.worker)-fm()
/go/src/github.com/openshift/cluster-version-operator/pkg/cvo/cvo.go:146 +0x2a
github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc42026fae0)
/go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x54
github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc42026fae0, 0x3b9aca00, 0x0, 0x1, 0xc42008c900)
/go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134 +0xbd
github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/wait.Until(0xc42026fae0, 0x3b9aca00, 0xc42008c900)
/go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88 +0x4d
created by github.com/openshift/cluster-version-operator/pkg/cvo.(*Operator).Run
/go/src/github.com/openshift/cluster-version-operator/pkg/cvo/cvo.go:146 +0x1d0
I just ran the command at https://github.com/openshift/cluster-version-operator/blob/master/docs/dev/clusterversion.md#finding-your-current-update-image , and the output is empty for my cluster:
[root@ocp42-inf ~]# oc get clusterversion -o jsonpath='{.status.current.image}{"\n"}' version
Then I checked the ClusterVersionStatus API at https://github.com/openshift/api/blob/master/config/v1/types_cluster_version.go#L81 and found that current is no longer a field. Am I missing anything? Should I get this from .status.history[0].image instead?
[root@ocp42-inf ~]# oc get clusterversion -o jsonpath='{.status.history[0].image}{"\n"}' version
quay.io/openshift-release-dev/ocp-release@sha256:c5337afd85b94c93ec513f21c8545e3f9e36a227f55d41bc1dfb8fcc3f2be129
Question: How does the CVO monitor for a new image when auto-update is enabled?
How does it look for a new ocp-release image pushed to the container catalog (which would be used in the future for OCP 4.0, if I am not wrong)?
Let me know your thoughts.
Hi, team! I'm a newbie to OpenShift. While reading the CVO source code, I found that it only checks the coInformer's and cvInformer's cache HasSynced(), but does not check the others. Is that a deliberate design?
cluster-version-operator/pkg/cvo/cvo.go
Lines 215 to 223 in dfe5ef5
The error message imagestreams.image.openshift.io "origin-v4.0" not found is displayed when I execute the following command:
oc adm release new -n openshift --server https://api.ci.openshift.org \
--from-image-stream=origin-v4.0 \
--to-image-base=docker.io/abhinavdahiya/origin-cluster-version-operator:latest \
--to-image docker.io/abhinavdahiya/origin-release:latest
The steps follow the doc here.
How can I execute this command correctly?
The CVO is creating an SCC for itself. This fails when installing on a kube-cluster and since the openshift-apiserver is created via an operator installed by the CVO, this creates a cycle.
Instead, create a clusterrole and clusterrolebinding for the SCC that will eventually exist.
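A sketch of that replacement, with a hypothetical SCC name: RBAC rules may name resources that do not exist yet, so the ClusterRole can reference the SCC before anything creates it.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cluster-version-operator-scc   # hypothetical name
rules:
- apiGroups: ["security.openshift.io"]
  resources: ["securitycontextconstraints"]
  resourceNames: ["cluster-version-operator"]   # the SCC that will eventually exist
  verbs: ["use"]
```

A ClusterRoleBinding would then bind this role to the relevant service account.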
The following branches are being fast-forwarded from the current development branch (master) as placeholders for future releases. No merging is allowed into these release branches until they are unfrozen for production release.
release-4.16
release-4.17
For more information, see the branching documentation.
While starting an OpenShift cluster we saw that we didn't have a running console operator.
Operator logs are here:
https://pastebin.pl/view/9a2143c2
After restarting the operator's pod everything worked fine.
Referring to this code:
cluster-version-operator/pkg/cvo/sync_worker.go
Lines 545 to 546 in 70c0232
Assuming the sync went correctly this could look like:
metricPayload.WithLabelValues(r.version, "pending").Set(float64(r.total-r.done))
metricPayload.WithLabelValues(r.version, "applied").Set(float64(r.done))
which should be the same as:
metricPayload.WithLabelValues(r.version, "pending").Set(float64(0))
metricPayload.WithLabelValues(r.version, "applied").Set(float64(r.total))
I suggest putting it to the test and using the former.
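"Put it to the test" can be checked directly; a tiny sketch (hypothetical helper, not CVO code) confirming the two formulations agree whenever the sync completed, i.e. r.done == r.total:

```go
package main

import "fmt"

// payloadGauges returns the (pending, applied) values the two snippets
// above would set, computed from the general done/total form.
func payloadGauges(done, total int) (pending, applied float64) {
	return float64(total - done), float64(done)
}

func main() {
	// After a fully successful sync done == total, so the general form
	// collapses to the constant form from the issue.
	p, a := payloadGauges(668, 668)
	fmt.Println(p == 0, a == 668) // true true
	// Mid-sync, only the general form is meaningful.
	fmt.Println(payloadGauges(8, 668)) // 660 8
}
```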
Upgrading from 4.7.0 to 4.7.9, an x509 error occurred in two places: one calling api-int.xxxx and one calling prometheus-operator.openshift-monitoring.svc:8080. I was able to fix the first by manually running update-ca-trust with the cert from the API server, but I'm not sure how to handle the second since it's a cluster-internal URI.
cluster-version-operator log shown below:
I0523 01:03:07.033591 1 cvo.go:481] Started syncing cluster version "openshift-cluster-version/version" (2021-05-23 01:03:07.033585342 +0000 UTC m=+82331.507064819)
I0523 01:03:07.041158 1 cvo.go:510] Desired version from spec is v1.Update{Version:"4.7.9", Image:"quay.io/openshift-release-dev/ocp-release@sha256:5a5433a5f82a10c78783d7aed7d556d26602295ee8e9dcfaba97ebc1ab0bc2ac", Force:false}
I0523 01:03:07.041264 1 sync_worker.go:227] Update work is equal to current target; no change required
I0523 01:03:07.041289 1 status.go:161] Synchronizing errs=field.ErrorList{} status=&cvo.SyncWorkerStatus{Generation:2, Step:"ApplyResources", Failure:error(nil), Done:8, Total:668, Completed:0, Reconciling:false, Initial:false, VersionHash:"qi_N6BhDM3k=", LastProgress:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}, Actual:v1.Release{Version:"4.7.9", Image:"quay.io/openshift-release-dev/ocp-release@sha256:5a5433a5f82a10c78783d7aed7d556d26602295ee8e9dcfaba97ebc1ab0bc2ac", URL:"https://access.redhat.com/errata/RHBA-2021:1365", Channels:[]string(nil)}, Verified:false}
I0523 01:03:07.041331 1 status.go:81] merge into existing history completed=false desired=v1.Release{Version:"4.7.9", Image:"quay.io/openshift-release-dev/ocp-release@sha256:5a5433a5f82a10c78783d7aed7d556d26602295ee8e9dcfaba97ebc1ab0bc2ac", URL:"https://access.redhat.com/errata/RHBA-2021:1365", Channels:[]string{"candidate-4.7", "candidate-4.8", "fast-4.7", "stable-4.7"}} last=&v1.UpdateHistory{State:"Partial", StartedTime:v1.Time{Time:time.Time{wall:0x0, ext:63757219493, loc:(*time.Location)(0x223c360)}}, CompletionTime:(*v1.Time)(nil), Version:"4.7.9", Image:"quay.io/openshift-release-dev/ocp-release@sha256:5a5433a5f82a10c78783d7aed7d556d26602295ee8e9dcfaba97ebc1ab0bc2ac", Verified:true}
I0523 01:03:07.041474 1 cvo.go:483] Finished syncing cluster version "openshift-cluster-version/version" (7.885146ms)
E0523 01:03:07.136035 1 task.go:112] error running apply for prometheusrule "openshift-cluster-version/cluster-version-operator" (9 of 668): Internal error occurred: failed calling webhook "prometheusrules.openshift.io": Post "https://prometheus-operator.openshift-monitoring.svc:8080/admission-prometheusrules/validate?timeout=5s": x509: certificate signed by unknown authority
I0523 01:03:07.193126 1 cvo.go:554] Finished syncing available updates "openshift-cluster-version/version" (162.643079ms
Priority classes docs:
https://docs.openshift.com/container-platform/3.11/admin_guide/scheduling/priority_preemption.html#admin-guide-priority-preemption-priority-class
Example: https://github.com/openshift/cluster-monitoring-operator/search?q=priority&unscoped_q=priority
Notes: The pre-configured system priority classes (system-node-critical and system-cluster-critical) can only be assigned to pods in kube-system or openshift-* namespaces. Most likely, core operators and their pods should be assigned system-cluster-critical. Please do not assign system-node-critical (the highest priority) unless you are really sure about it.
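As a sketch, an operator Deployment in an openshift-* namespace would opt in via its pod template (names here are illustrative):

```yaml
spec:
  template:
    spec:
      priorityClassName: system-cluster-critical
      containers:
      - name: operator
        image: example.io/operator:latest
```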
Hi, all,
I built a cluster using openshift-installer v0.6.0 today. But I find the manifest of the OLM is old, so how can I use the latest version of the OLM? Details below:
[jzhang@dhcp-140-18 installer]$ oc get clusterversion version -o yaml
...
current:
payload: quay.io/openshift-release-dev/ocp-release@sha256:4f02d5c7183360a519a7c7dbe601f58123c9867cd5721ae503072ae62920575b
version: 0.0.1-2018-12-08-172651
...
[jzhang@dhcp-140-18 installer]$ oc adm release extract --from=quay.io/openshift-release-dev/ocp-release@sha256:4f02d5c7183360a519a7c7dbe601f58123c9867cd5721ae503072ae62920575b --to=release-payload
[jzhang@dhcp-140-18 installer]$ ls release-payload/ | grep 30
0000_30_00-namespace.yaml
0000_30_01-olm-operator.serviceaccount.yaml
0000_30_02-clusterserviceversion.crd.yaml
0000_30_03-installplan.crd.yaml
0000_30_04-subscription.crd.yaml
0000_30_05-catalogsource.crd.yaml
0000_30_06-rh-operators.configmap.yaml
0000_30_07-certified-operators.configmap.yaml
0000_30_08-certified-operators.catalogsource.yaml
0000_30_09-rh-operators.catalogsource.yaml
0000_30_10-olm-operator.deployment.yaml
0000_30_11-catalog-operator.deployment.yaml
0000_30_12-aggregated.clusterrole.yaml
0000_30_13-packageserver.csv.yaml
0000_30_14-operatorgroup.crd.yaml
One more question: if I want to update the CVO to the latest version, how do I do that? The current version is 0.0.1-2018-12-08-172651.