cluster-api-provider-nested's Introduction

Kubernetes Cluster API Provider Nested

Cluster API Provider for Nested Clusters

Community, discussion, contribution, and support

Learn how to engage with the Kubernetes community on the community page.

You can reach the maintainers of this project at:

  • Slack
  • Mailing List
  • Join our Cluster API Provider Nested working group sessions
    • Weekly on Tuesdays @ 10:00 PT
    • Previous meetings: notes

Code of conduct

Participation in the Kubernetes community is governed by the Kubernetes Code of Conduct.

cluster-api-provider-nested's People

Contributors

charleszheng44, christopherhein, cpanato, crazywill, dependabot[bot], evan-whitehouse, fenglixa, gyliu513, hanlins, jichenjc, jinsongo, jzhoucliqr, k8s-ci-robot, kaushik229, lubingtan, lukeweber, m-messiah, nikhita, rjsadow, srirammageswaran8, stmcginnis, vincent-pli, vincepri, weiling61, wondywang, ydp, yoonmac, yuanchen8911, zhouhao3, zhuangqh


cluster-api-provider-nested's Issues

[Quick Start] Failed to connect to cluster

I was following https://github.com/kubernetes-sigs/cluster-api-provider-nested/blob/main/docs/README.md to run a quick test for CAPN, but at the last step, connecting to the cluster failed. Can anyone share some insight into what is wrong?

@charleszheng44 ^^

Guangyas-MacBook-Pro:kubernetes-sigs guangyaliu$ kubectl --kubeconfig kubeconfig get all -A
Unable to connect to the server: net/http: TLS handshake timeout

I can see the nested control plane is running:

Guangyas-MacBook-Pro:kubernetes-sigs guangyaliu$ kubectl get pod
NAME                                  READY   STATUS    RESTARTS   AGE
cluster-sample-apiserver-0            1/1     Running   0          17m
cluster-sample-controller-manager-0   1/1     Running   2          17m
cluster-sample-etcd-0                 1/1     Running   0          17m

But the cluster status is Provisioning

Guangyas-MacBook-Pro:kubernetes-sigs guangyaliu$ kubectl get clusters
NAME             PHASE
cluster-sample   Provisioning
Guangyas-MacBook-Pro:kubernetes-sigs guangyaliu$ kubectl get clusters -oyaml
apiVersion: v1
items:
- apiVersion: cluster.x-k8s.io/v1alpha4
  kind: Cluster
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"cluster.x-k8s.io/v1alpha4","kind":"Cluster","metadata":{"annotations":{},"name":"cluster-sample","namespace":"default"},"spec":{"controlPlaneEndpoint":{"host":"cluster-sample-apiserver","port":6443},"controlPlaneRef":{"apiVersion":"controlplane.cluster.x-k8s.io/v1alpha4","kind":"NestedControlPlane","name":"nestedcontrolplane-sample","namespace":"default"},"infrastructureRef":{"apiVersion":"infrastructure.cluster.x-k8s.io/v1alpha4","kind":"NestedCluster","name":"nestedcluster-sample","namespace":"default"}}}
    creationTimestamp: "2021-05-22T14:41:16Z"
    finalizers:
    - cluster.cluster.x-k8s.io
    generation: 1
    managedFields:
    - apiVersion: cluster.x-k8s.io/v1alpha4
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:kubectl.kubernetes.io/last-applied-configuration: {}
        f:spec:
          .: {}
          f:controlPlaneEndpoint:
            .: {}
            f:host: {}
            f:port: {}
          f:controlPlaneRef:
            .: {}
            f:apiVersion: {}
            f:kind: {}
            f:name: {}
            f:namespace: {}
          f:infrastructureRef:
            .: {}
            f:apiVersion: {}
            f:kind: {}
            f:name: {}
            f:namespace: {}
      manager: kubectl-client-side-apply
      operation: Update
      time: "2021-05-22T14:41:16Z"
    - apiVersion: cluster.x-k8s.io/v1alpha4
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:finalizers:
            .: {}
            v:"cluster.cluster.x-k8s.io": {}
        f:status:
          .: {}
          f:conditions: {}
          f:controlPlaneReady: {}
          f:observedGeneration: {}
          f:phase: {}
      manager: manager
      operation: Update
      time: "2021-05-22T14:44:32Z"
    name: cluster-sample
    namespace: default
    resourceVersion: "4670"
    uid: c2884e62-87f0-4709-85c5-c6a97e85a631
  spec:
    controlPlaneEndpoint:
      host: cluster-sample-apiserver
      port: 6443
    controlPlaneRef:
      apiVersion: controlplane.cluster.x-k8s.io/v1alpha4
      kind: NestedControlPlane
      name: nestedcontrolplane-sample
      namespace: default
    infrastructureRef:
      apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
      kind: NestedCluster
      name: nestedcluster-sample
      namespace: default
  status:
    conditions:
    - lastTransitionTime: "2021-05-22T14:41:45Z"
      reason: WaitingForInfrastructure
      severity: Info
      status: "False"
      type: Ready
    - lastTransitionTime: "2021-05-22T14:41:45Z"
      status: "True"
      type: ControlPlaneInitialized
    - lastTransitionTime: "2021-05-22T14:41:45Z"
      status: "True"
      type: ControlPlaneReady
    - lastTransitionTime: "2021-05-22T14:41:17Z"
      reason: WaitingForInfrastructure
      severity: Info
      status: "False"
      type: InfrastructureReady
    controlPlaneReady: true
    observedGeneration: 1
    phase: Provisioning
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

Test fails with race detected

How to reproduce?

Sync to ToT and run make

=== RUN   TestReconcile
==================
WARNING: DATA RACE
Write at 0x00c0001146a8 by goroutine 41:
  internal/race.Write()
      /usr/local/go/src/internal/race/race.go:41 +0x114
  sync.(*WaitGroup).Wait()
      /usr/local/go/src/sync/waitgroup.go:128 +0x115
  sigs.k8s.io/controller-runtime/pkg/manager.(*controllerManager).waitForRunnableToEnd.func2()
      /Users/f.guo/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/manager/internal.go:567 +0x40

Previous read at 0x00c0001146a8 by goroutine 173:
  internal/race.Read()
      /usr/local/go/src/internal/race/race.go:37 +0x1e8
  sync.(*WaitGroup).Add()
      /usr/local/go/src/sync/waitgroup.go:71 +0x1fb
  sigs.k8s.io/controller-runtime/pkg/manager.(*controllerManager).startRunnable()
      /Users/f.guo/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/manager/internal.go:678 +0x4e
  sigs.k8s.io/controller-runtime/pkg/manager.(*controllerManager).serveMetrics()
      /Users/f.guo/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/manager/internal.go:384 +0x318

Goroutine 41 (running) created at:
  sigs.k8s.io/controller-runtime/pkg/manager.(*controllerManager).waitForRunnableToEnd()
      /Users/f.guo/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/manager/internal.go:566 +0xc6
  sigs.k8s.io/controller-runtime/pkg/manager.(*controllerManager).engageStopProcedure()
      /Users/f.guo/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/manager/internal.go:548 +0x370
  sigs.k8s.io/controller-runtime/pkg/manager.(*controllerManager).Start.func1()
      /Users/f.guo/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/manager/internal.go:449 +0x49
  sigs.k8s.io/controller-runtime/pkg/manager.(*controllerManager).Start()
      /Users/f.guo/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/manager/internal.go:499 +0x573
  sigs.k8s.io/cluster-api-provider-nested/virtualcluster/pkg/controller/vcmanager.(*VirtualClusterManager).Start()
      <autogenerated>:1 +0x7d
  sigs.k8s.io/cluster-api-provider-nested/virtualcluster/pkg/controller/virtualcluster.StartTestManager.func1()
      /Users/f.guo/go/src/sigs.k8s.io/cluster-api-provider-nested/virtualcluster/pkg/controller/virtualcluster/virtualcluster_controller_suite_test.go:73 +0xb0

Goroutine 173 (running) created at:
  sigs.k8s.io/controller-runtime/pkg/manager.(*controllerManager).Start()
      /Users/f.guo/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/manager/internal.go:473 +0x5d4
  sigs.k8s.io/cluster-api-provider-nested/virtualcluster/pkg/controller/vcmanager.(*VirtualClusterManager).Start()
      <autogenerated>:1 +0x7d
  sigs.k8s.io/cluster-api-provider-nested/virtualcluster/pkg/controller/virtualcluster.StartTestManager.func1()
      /Users/f.guo/go/src/sigs.k8s.io/cluster-api-provider-nested/virtualcluster/pkg/controller/virtualcluster/virtualcluster_controller_suite_test.go:73 +0xb0
==================
    TestReconcile: testing.go:906: race detected during execution of test
--- FAIL: TestReconcile (0.16s)
    : testing.go:906: race detected during execution of test
FAIL
coverage: 10.7% of statements
FAIL	sigs.k8s.io/cluster-api-provider-nested/virtualcluster/pkg/controller/virtualcluster	11.843s

It can be reproduced 100% of the time on my local machine.

Change flag names in `main.go`

To help align the flags in this provider with Kubernetes component standards, we should change the flag names in https://github.com/kubernetes-sigs/cluster-api-provider-nested/blob/master/main.go#L66-L79:

	fs.StringVar(&metricsAddr, "metrics-bind-address", ":8080",
		"The address the metric endpoint binds to.")

	fs.BoolVar(&enableLeaderElection, "leader-elect", false,
		"Enable leader election for controller manager. Enabling this will ensure there is only one active controller manager.")

	fs.DurationVar(&leaderElectionLeaseDuration, "leader-elect-lease-duration", 15*time.Second,
		"Interval at which non-leader candidates will wait to force acquire leadership (duration string)")

	fs.DurationVar(&leaderElectionRenewDeadline, "leader-elect-renew-deadline", 10*time.Second,
		"Duration that the leading controller manager will retry refreshing leadership before giving up (duration string)")

	fs.DurationVar(&leaderElectionRetryPeriod, "leader-elect-retry-period", 2*time.Second,
		"Duration the LeaderElector clients should wait between tries of actions (duration string)")

🐛 Incorrect Manifest Path for Releases

Logs from the Prow release job:

Step #0: make set-manifest-image MANIFEST_IMG=gcr.io/k8s-staging-cluster-api-nested/cluster-api-nested-controller MANIFEST_TAG=v20210608-e593785 TARGET_RESOURCE="./config/manager/manager_image_patch.yaml"
Step #0: make[3]: Entering directory '/workspace'
Step #0: fatal: not a git repository (or any parent up to mount point /)
Step #0: Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
Step #0: Updating kustomize image patch file for manager resource
Step #0: sed -i'' -e 's@image: .*@image: '"gcr.io/k8s-staging-cluster-api-nested/cluster-api-nested-controller:v20210608-e593785"'@' ./config/manager/manager_image_patch.yaml
Step #0: make[3]: Leaving directory '/workspace'
Step #0: make[2]: Leaving directory '/workspace'
Step #0: sed: ./config/manager/manager_image_patch.yaml: No such file or directory
Step #0: make[3]: *** [Makefile:300: set-manifest-image] Error 1
Step #0: make[2]: *** [Makefile:289: docker-push-core-manifest] Error 2
Step #0: make[1]: *** [Makefile:268: docker-push-all] Error 2
Step #0: make[1]: Leaving directory '/workspace'
Step #0: make: *** [Makefile:346: release-staging] Error 2
Finished Step #0
ERROR
ERROR: build step 0 "gcr.io/k8s-testimages/gcb-docker-gcloud:v20200619-68869a4" failed: step exited with non-zero status: 2

[Quick Start] Failed to connect to cluster

Guangyas-MacBook-Pro:kubernetes-sigs guangyaliu$ kubectl --kubeconfig kubeconfig get all -A
Unable to connect to the server: dial tcp: lookup cluster-sample-apiserver on 1.1.1.2:53: server misbehaving
Guangyas-MacBook-Pro:kubernetes-sigs guangyaliu$ kubectl get svc
NAME                       TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
cluster-sample-apiserver   ClusterIP   10.96.206.199   <none>        6443/TCP   5m7s
cluster-sample-etcd        ClusterIP   None            <none>        <none>     5m9s
kubernetes                 ClusterIP   10.96.0.1       <none>        443/TCP    24m

@charleszheng44 ^^

🌱 Add Release Prow Jobs

Prior to the first version being released, we need to add the proper release scripts to the scripts/ dir as well as set up Prow release jobs. For now we should have just CAPN covered; we'll eventually create separate jobs for virtualcluster releases.

TODOS:

High Level CAPN Goals

Prelim Goals:

  • To configure new resource types for nested pod based control planes
  • To enable declarative orchestrated control plane upgrades (minus etcd)
  • To enable independent scaling of each of the control plane components
  • To enable configuring the control plane components
  • To enable the use of custom implementations for all components

Prelim Non-Goals:

  • To manage etcd; it's expected that etcd can be run separately with an integration point
  • To provide CNI configuration; this is left to the management or super cluster

Enable support for different Kubernetes distributions

This is an issue migrated from kubernetes-retired/multi-tenancy#1479.

The virtualcluster currently supports only vanilla Kubernetes, but in practice many customers use different Kubernetes distributions. It would be great to enable virtualcluster to support OpenShift and other Kubernetes distributions that have passed the conformance tests at https://github.com/cncf/k8s-conformance/tree/master/v1.20.

What is the plan to enable support for multiple Kubernetes distributions? Is anyone working on this?

@Fei-Guo @christopherhein

High Level User Stories

  1. As a control plane operator, I want my Kubernetes control plane to have multiple replicas to meet my zero-downtime needs.
  2. As a control plane operator, I want to be able to declare how each control plane is configured.
  3. As a control plane operator, I want to be able to rotate the certificates on all components so that my cluster continues to run.
  4. As a developer, I want to be able to use control-plane-level resources without conflicting with another control plane.
  5. As a control plane operator, I want to be able to upgrade to a minor version so my cluster remains supported.
  6. As a control plane operator, I want to be able to know my cluster is working properly after it's been created.

Who will fill out the OwnerReference of the component CRs?

As discussed in PR #11, we all agree that there will be multiple CRs (NestedAPIServer, NestedEtcd, and NestedControllerManager). To associate them, we plan to set each component CR's metav1.OwnerReference to the owning NCP. But who will fill out the OwnerReference for the component CRs?

We could let the end user do it, but the metav1.OwnerReference is normally filled out by the controller/operator (as the OwnerReference contains fields like the object UID that are normally unknown in advance). I think a more conventional approach would be to group the CRs using a label and let the NCP controller fill out the metav1.OwnerReference for the component CRs.

For example, say we have an NCP CR named NCP1. For each of its component CRs, the end user would set metav1.Labels[ownerNCP] = NCP1; after the user applies them, the NCP controller would add the OwnerReference for them.
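
A rough sketch of what the controller side of this could look like, using unstructured objects so the example stays self-contained; the label key ownerNCP, the function name, and the use of the NestedAPIServer kind here are illustrative, not actual CAPN code:

package controllers

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// adoptComponents lists NestedAPIServer CRs labelled ownerNCP=<NCP name> in the
// NCP's namespace and sets the NCP as their owner, so the end user never needs
// to know the NCP's UID in advance.
func adoptComponents(ctx context.Context, c client.Client, ncp *unstructured.Unstructured) error {
	var components unstructured.UnstructuredList
	components.SetGroupVersionKind(schema.GroupVersionKind{
		Group:   "controlplane.cluster.x-k8s.io",
		Version: "v1alpha4",
		Kind:    "NestedAPIServerList",
	})
	if err := c.List(ctx, &components,
		client.InNamespace(ncp.GetNamespace()),
		client.MatchingLabels{"ownerNCP": ncp.GetName()}); err != nil {
		return err
	}

	controller := true
	for i := range components.Items {
		obj := &components.Items[i]
		// The controller, not the end user, fills out the OwnerReference,
		// including the UID that is only known after the NCP has been created.
		obj.SetOwnerReferences([]metav1.OwnerReference{{
			APIVersion: "controlplane.cluster.x-k8s.io/v1alpha4",
			Kind:       "NestedControlPlane",
			Name:       ncp.GetName(),
			UID:        ncp.GetUID(),
			Controller: &controller,
		}})
		if err := c.Update(ctx, obj); err != nil {
			return err
		}
	}
	return nil
}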

What do you think? @christopherhein @vincepri @Fei-Guo @brightzheng100

🐛 Post Submit Image Builds

All post-submit image builds are failing because the wrong GCP project is being specified. We need to update the Makefile to point to the correct registry URL with the project name.

Move images from Docker Hub to Quay

Currently, all of the images for VC are hosted on Docker Hub.

The following command will create a ClusterVersion named cv-sample-np, which specifies the tenant master components as:

  • etcd: a StatefulSet with virtualcluster/etcd-v3.4.0 image, 1 replica;
  • apiServer: a StatefulSet with virtualcluster/apiserver-v1.16.2 image, 1 replica;
  • controllerManager: a StatefulSet with virtualcluster/controller-manager-v1.16.2 image, 1 replica.

When testing with Kind, I often run into Docker rate limits and cannot pull the images.

Is it possible to move the images from Docker Hub to quay.io? Then we would not have this problem.

root@gyliu-dev21:~/.docker# kubectl get pods
NAME                                  READY   STATUS             RESTARTS   AGE
cluster-sample-apiserver-0            0/1     ImagePullBackOff   0          9m43s
cluster-sample-controller-manager-0   0/1     ImagePullBackOff   0          9m55s
cluster-sample-etcd-0                 1/1     Running            0          9m48s
root@gyliu-dev21:~/.docker# kubectl describe po cluster-sample-apiserver-0
Name:         cluster-sample-apiserver-0
Namespace:    default
Priority:     0
Node:         capn-control-plane/172.18.0.2
Start Time:   Sun, 23 May 2021 18:53:43 -0700
Labels:       component-name=nestedapiserver-sample
              controller-revision-hash=cluster-sample-apiserver-7bff79549
              statefulset.kubernetes.io/pod-name=cluster-sample-apiserver-0
Annotations:  <none>
Status:       Pending
IP:           10.244.0.12
IPs:
  IP:           10.244.0.12
Controlled By:  StatefulSet/cluster-sample-apiserver
Containers:
  nestedapiserver-sample:
    Container ID:
    Image:         virtualcluster/apiserver-v1.16.2
    Image ID:
    Port:          6443/TCP
    Host Port:     0/TCP
    Command:
      kube-apiserver
    Args:
      --bind-address=0.0.0.0
      --allow-privileged=true
      --anonymous-auth=true
      --client-ca-file=/etc/kubernetes/pki/apiserver/ca/tls.crt
      --tls-cert-file=/etc/kubernetes/pki/apiserver/tls.crt
      --tls-private-key-file=/etc/kubernetes/pki/apiserver/tls.key
      --kubelet-https=true
      --kubelet-certificate-authority=/etc/kubernetes/pki/apiserver/ca/tls.crt
      --kubelet-client-certificate=/etc/kubernetes/pki/kubelet/tls.crt
      --kubelet-client-key=/etc/kubernetes/pki/kubelet/tls.key
      --kubelet-preferred-address-types=InternalIP,ExternalIP
      --enable-bootstrap-token-auth=true
      --etcd-servers=https://cluster-sample-etcd-0.cluster-sample-etcd.$(NAMESPACE):2379
      --etcd-cafile=/etc/kubernetes/pki/etcd/ca/tls.crt
      --etcd-certfile=/etc/kubernetes/pki/etcd/tls.crt
      --etcd-keyfile=/etc/kubernetes/pki/etcd/tls.key
      --service-account-key-file=/etc/kubernetes/pki/service-account/tls.key
      --service-cluster-ip-range=10.32.0.0/16
      --service-node-port-range=30000-32767
      --authorization-mode=Node,RBAC
      --runtime-config=api/all
      --enable-admission-plugins=NamespaceLifecycle,NodeRestriction,LimitRanger,ServiceAccount,DefaultStorageClass,ResourceQuota
      --apiserver-count=1
      --endpoint-reconciler-type=master-count
      --v=2
    State:          Waiting
      Reason:       ImagePullBackOff
    Ready:          False
    Restart Count:  0
    Liveness:       tcp-socket :6443 delay=15s timeout=15s period=10s #success=1 #failure=8
    Readiness:      http-get https://:6443/healthz delay=5s timeout=30s period=2s #success=1 #failure=8
    Environment:
      NAMESPACE:  default (v1:metadata.namespace)
    Mounts:
      /etc/kubernetes/pki/apiserver from cluster-sample-apiserver-client (ro)
      /etc/kubernetes/pki/apiserver/ca from cluster-sample-ca (ro)
      /etc/kubernetes/pki/etcd from cluster-sample-etcd-client (ro)
      /etc/kubernetes/pki/etcd/ca from cluster-sample-etcd-ca (ro)
      /etc/kubernetes/pki/kubelet from cluster-sample-kubelet-client (ro)
      /etc/kubernetes/pki/service-account from cluster-sample-sa (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-fltrm (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  cluster-sample-apiserver-client:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  cluster-sample-apiserver-client
    Optional:    false
  cluster-sample-etcd-ca:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  cluster-sample-etcd
    Optional:    false
  cluster-sample-etcd-client:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  cluster-sample-etcd-client
    Optional:    false
  cluster-sample-kubelet-client:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  cluster-sample-kubelet-client
    Optional:    false
  cluster-sample-ca:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  cluster-sample-ca
    Optional:    false
  cluster-sample-sa:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  cluster-sample-sa
    Optional:    false
  default-token-fltrm:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-fltrm
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                     From                         Message
  ----     ------     ----                    ----                         -------
  Normal   Scheduled  9m51s                   default-scheduler            Successfully assigned default/cluster-sample-apiserver-0 to capn-control-plane
  Normal   Pulling    8m (x4 over 9m50s)      kubelet, capn-control-plane  Pulling image "virtualcluster/apiserver-v1.16.2"
  Warning  Failed     7m55s (x4 over 9m44s)   kubelet, capn-control-plane  Failed to pull image "virtualcluster/apiserver-v1.16.2": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/virtualcluster/apiserver-v1.16.2:latest": failed to copy: httpReaderSeeker: failed open: unexpected status code https://registry-1.docker.io/v2/virtualcluster/apiserver-v1.16.2/manifests/sha256:81fc8bb510b07535525413b725aed05765b56961c1f4ed28b92ba30acd65f6fb: 429 Too Many Requests - Server message: toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit
  Warning  Failed     7m55s (x4 over 9m44s)   kubelet, capn-control-plane  Error: ErrImagePull
  Warning  Failed     7m41s (x6 over 9m43s)   kubelet, capn-control-plane  Error: ImagePullBackOff
  Normal   BackOff    4m38s (x19 over 9m43s)  kubelet, capn-control-plane  Back-off pulling image "virtualcluster/apiserver-v1.16.2"

@Fei-Guo ^^

🐛 Unable to create a VirtualCluster on k8s v1.20.2

Problem

The virtual cluster does not deploy with k8s v1.20.2. Output from vc-manager:

{"level":"info","ts":1621457612.1333222,"logger":"clusterversion-controller","msg":"reconciling ClusterVersion..."}
{"level":"info","ts":1621457612.1334903,"logger":"clusterversion-controller","msg":"new ClusterVersion event","ClusterVersionName":"cv-sample-np"}
{"level":"info","ts":1621457635.4177175,"logger":"virtualcluster-webhook","msg":"validate create","vc-name":"vc-sample-1"}
{"level":"info","ts":1621457635.4421399,"logger":"virtualcluster-controller","msg":"reconciling VirtualCluster..."}
{"level":"info","ts":1621457635.4824774,"logger":"virtualcluster-webhook","msg":"validate update","vc-name":"vc-sample-1"}
{"level":"info","ts":1621457635.511791,"logger":"virtualcluster-controller","msg":"a finalizer has been registered for the VirtualCluster CRD","finalizer":"virtualcluster.finalizer.native"}
{"level":"info","ts":1621457635.5118568,"logger":"virtualcluster-controller","msg":"will create a VirtualCluster","vc":"vc-sample-1"}
{"level":"info","ts":1621457635.53576,"logger":"virtualcluster-webhook","msg":"validate update","vc-name":"vc-sample-1"}
{"level":"info","ts":1621457635.556264,"logger":"virtualcluster-controller","msg":"reconciling VirtualCluster..."}
{"level":"info","ts":1621457635.5563915,"logger":"virtualcluster-controller","msg":"VirtualCluster is pending","vc":"vc-sample-1"}
{"level":"info","ts":1621457638.3632772,"logger":"virtualcluster-controller","msg":"creating secret","name":"root-ca","namespace":"default-a4a766-vc-sample-1"}
{"level":"info","ts":1621457638.400915,"logger":"virtualcluster-controller","msg":"creating secret","name":"apiserver-ca","namespace":"default-a4a766-vc-sample-1"}
{"level":"info","ts":1621457638.4276915,"logger":"virtualcluster-controller","msg":"creating secret","name":"etcd-ca","namespace":"default-a4a766-vc-sample-1"}
{"level":"info","ts":1621457638.4523375,"logger":"virtualcluster-controller","msg":"creating secret","name":"controller-manager-kubeconfig","namespace":"default-a4a766-vc-sample-1"}
{"level":"info","ts":1621457638.485505,"logger":"virtualcluster-controller","msg":"creating secret","name":"admin-kubeconfig","namespace":"default-a4a766-vc-sample-1"}
{"level":"info","ts":1621457638.5329306,"logger":"virtualcluster-controller","msg":"creating secret","name":"serviceaccount-rsa","namespace":"default-a4a766-vc-sample-1"}
{"level":"info","ts":1621457638.562718,"logger":"virtualcluster-controller","msg":"deploying StatefulSet for master component","component":""}
{"level":"error","ts":1621457638.5628488,"logger":"virtualcluster-controller","msg":"fail to create virtualcluster","vc":"vc-sample-1","retrytimes":3,"error":"try to deploy unknwon component: "}
{"level":"info","ts":1621457638.5843189,"logger":"virtualcluster-webhook","msg":"validate update","vc-name":"vc-sample-1"}
{"level":"info","ts":1621457638.6019728,"logger":"virtualcluster-controller","msg":"reconciling VirtualCluster..."}
{"level":"info","ts":1621457638.6020927,"logger":"virtualcluster-controller","msg":"VirtualCluster is pending","vc":"vc-sample-1"}

The namespace and secrets were created, but none of the StatefulSets from the ClusterVersion were.

What I did

git clone https://github.com/kubernetes-sigs/cluster-api-provider-nested.git
cd cluster-api-provider-nested/virtualcluster

Build kubectl-vc

make build WHAT=cmd/kubectl-vc
sudo cp -f _output/bin/kubectl-vc /usr/local/bin

Create new CRDs

(see #62)

cd pkg
controller-gen "crd:trivialVersions=true,maxDescLen=0" rbac:roleName=manager-role paths="./..." output:crd:artifacts:config=config/crds

Install CRD

kubectl create -f config/crds/cluster.x-k8s.io_clusters.yaml
kubectl create -f config/crds/tenancy.x-k8s.io_clusterversions.yaml
kubectl create -f config/crds/tenancy.x-k8s.io_virtualclusters.yaml

Create ns, rbac, deployment, ...

kubectl create -f config/setup/all_in_one.yaml

I've added events to the RBAC because of this:

{"level":"info","ts":1621388803.9796872,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"clusterversion-controller","source":"kind source: /, Kind="}
E0519 01:46:43.981421 1 event.go:260] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"vc-manager-leaderelection-lock.16805486d7f96288", GenerateName:"", Namespace:"vc-manager", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"ConfigMap", Namespace:"vc-manager", Name:"vc-manager-leaderelection-lock", UID:"5c94eb36-66a2-437a-a10f-6fc651533e96", APIVersion:"v1", ResourceVersion:"96800211", FieldPath:""}, Reason:"LeaderElection", Message:"vc-manager-76c5878465-6tq8f_e49ead0e-85c4-43f6-bb44-e4f0820e8ee8 became leader", Source:v1.EventSource{Component:"vc-manager-76c5878465-6tq8f_e49ead0e-85c4-43f6-bb44-e4f0820e8ee8", Host:""}, FirstTimestamp:v1.Time{Time:time.Time{wall:0xc0213960fa5d0488, ext:18231381017, loc:(*time.Location)(0x23049a0)}}, LastTimestamp:v1.Time{Time:time.Time{wall:0xc0213960fa5d0488, ext:18231381017, loc:(*time.Location)(0x23049a0)}}, Count:1, Type:"Normal", EventTime:v1.MicroTime{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'events is forbidden: User "system:serviceaccount:vc-manager:vc-manager" cannot create resource "events" in API group "" in the namespace "vc-manager"' (will not retry!)
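
For reference, assuming the role is generated from kubebuilder markers by the controller-gen invocation above (an assumption about where the rule lives; it may equally be a hand-written rule in all_in_one.yaml), the missing permission can be expressed as one extra marker:

package vcmanager

// Hypothetical placement next to the vc-manager's existing RBAC markers, so
// that the generated manager-role includes an events rule and the
// leader-election event above is no longer rejected.
// +kubebuilder:rbac:groups="",resources=events,verbs=create;patch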

Create a new ClusterVersion

kubectl create -f config/sampleswithspec/clusterversion_v1_nodeport.yaml

I had to remove kind and apiVersion below controllerManager: to match the schema:

error: error validating "cv-sample-nb.yaml": error validating data: [ValidationError(ClusterVersion.spec.controllerManager): unknown field "apiVersion" in io.x-k8s.tenancy.v1alpha1.ClusterVersion.spec.controllerManager, ValidationError(ClusterVersion.spec.controllerManager): unknown field "kind" in io.x-k8s.tenancy.v1alpha1.ClusterVersion.spec.controllerManager]; if you choose to ignore these errors, turn validation off with --validate=false

Create a new VirtualCluster

kubectl vc create -f config/sampleswithspec/virtualcluster_1_nodeport.yaml -o vc.kubeconfig

⚠️ Change default branch to "main"

Instructions here: https://www.kubernetes.dev/resources/rename/

Anytime

  • If a presubmit or postsubmit prowjob triggers on the master branch (branches field of the prowjob), add the main branch to the list (see kubernetes/test-infra#20665 for an example).
  • If the milestone_applier prow config references the master branch, add the main branch to the config (see kubernetes/test-infra#20675 for an example).
  • If the branch_protection prow config references the master branch, add the main branch to the config.

Just before rename

  • Periodic prowjobs, or any prowjob that mentions the master branch
  • If a prowjob mentions master in its name, rename the job to not include the branch name
  • If a prowjob calls scripts or code in your repo that explicitly reference master

Finalizing

  • Set remote head to track main
  • Rename the default branch from master to main
  • #66 fix apidiff branch name

Post-rename

  • If a prowjob still references the master branch in the branches field, remove the master branch
  • If the milestone_applier prow config references the master branch, remove it from the config.
  • Send out comms

/kind cleanup
/wg naming

🌱 Investigate using KubeadmControlPlane (KCP) for NestedCluster

TL;DR

This is a spike to investigate what it would be like to use KCP for implementing CAPN: https://github.com/kubernetes-sigs/cluster-api/tree/master/controlplane/kubeadm. KCP uses kubeadm's ClusterConfiguration object and is what nearly all Cluster API providers use; it was originally not chosen because it only supports cloud-init based deployments.

Background:

KCP needs to map to Machines, Machines have to map to actual Kubernetes Nodes, and the control plane pods need to show up on those specific nodes. KCP also needs to be able to exec from the management cluster into the workload cluster to get access to etcd for health checks. KCP also doesn't manage client certificates; these are handled on each node by kubeadm, which is called through cloud-init. We should look at the in-tree Docker provider for inspiration on how we could use the KCP outputs to create Pod-based control planes within custom Machine controllers: https://github.com/kubernetes-sigs/cluster-api/tree/master/test/infrastructure/docker
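
For illustration only, a rough sketch of the "Pod instead of VM" idea, assuming a custom Machine infrastructure controller in the management cluster; the function name, image, and namespace handling are hypothetical, and translating KCP's cloud-init payload into container args and volumes is exactly the open question of this spike:

package machine

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	clusterv1 "sigs.k8s.io/cluster-api/api/v1alpha4"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// createControlPlanePod reads the bootstrap data that KCP generated for a
// Machine and creates a Pod in the management (super) cluster instead of a VM.
func createControlPlanePod(ctx context.Context, c client.Client, machine *clusterv1.Machine) error {
	if machine.Spec.Bootstrap.DataSecretName == nil {
		return fmt.Errorf("machine %s has no bootstrap data yet", machine.Name)
	}

	// The bootstrap secret's "value" key holds the cloud-init payload; turning
	// that into container commands and volumes is the hard part to investigate.
	var bootstrap corev1.Secret
	key := client.ObjectKey{Namespace: machine.Namespace, Name: *machine.Spec.Bootstrap.DataSecretName}
	if err := c.Get(ctx, key, &bootstrap); err != nil {
		return err
	}

	pod := &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{
			Name:      machine.Name,
			Namespace: machine.Namespace,
		},
		Spec: corev1.PodSpec{
			Containers: []corev1.Container{{
				Name:  "control-plane",
				Image: "example.com/nested-control-plane:dev", // placeholder
			}},
		},
	}
	return c.Create(ctx, pod)
}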

🌱 Setup Prow Presubmits

We should have our PRs tested with Prow on submission so we can have better assurance of the code.

Links

Our presubmit should run make generate lint test. It should be called from a script, like other Cluster API providers do, so that we can periodically change what it does without needing to update the test-infra repo. E.g. https://github.com/kubernetes-sigs/cluster-api-provider-aws/tree/main/scripts

🐛 make release-alias-tag fails on prow image-pushing

Step #0: make[3]: Leaving directory '/workspace'
Step #0: make[2]: Leaving directory '/workspace'
Step #0: gcloud container images add-tag gcr.io/k8s-staging-cluster-api-nested/cluster-api-nested-controller:v20210608-ea129c6 gcr.io/k8s-staging-cluster-api-nested/cluster-api-nested-controller:main
Step #0: Created [gcr.io/k8s-staging-cluster-api-nested/cluster-api-nested-controller:main].
Step #0: Updated [gcr.io/k8s-staging-cluster-api-nested/cluster-api-nested-controller:v20210608-ea129c6].
Step #0: gcloud container images add-tag gcr.io/k8s-staging-cluster-api-nested/nested-controlplane-controller:v20210608-ea129c6 gcr.io/k8s-staging-cluster-api-nested/nested-controlplane-controller:main
Step #0: ERROR: Error during upload of: gcr.io/k8s-staging-cluster-api-nested/nested-controlplane-controller:main
Step #0: ERROR: (gcloud.container.images.add-tag) Not found: response: {'status': '404', 'content-length': '202', 'x-xss-protection': '0', 'transfer-encoding': 'chunked', 'server': 'Docker Registry', '-content-encoding': 'gzip', 'docker-distribution-api-version': 'registry/2.0', 'cache-control': 'private', 'date': 'Tue, 08 Jun 2021 18:57:01 GMT', 'x-frame-options': 'SAMEORIGIN', 'content-type': 'application/json'}
Step #0: Failed to fetch "v20210608-ea129c6" from request "/v2/k8s-staging-cluster-api-nested/nested-controlplane-controller/manifests/v20210608-ea129c6".: <no details provided>
Step #0: make[1]: *** [Makefile:353: release-alias-tag] Error 1
Step #0: make[1]: Leaving directory '/workspace'
Step #0: make: *** [Makefile:346: release-staging] Error 2

make docker-build failed

$ make docker-build
fatal: No names found, cannot describe anything.
docker pull docker.io/docker/dockerfile:experimental
experimental: Pulling from docker/dockerfile
d7f0373ffb1d: Pull complete
Digest: sha256:600e5c62eedff338b3f7a0850beb7c05866e0ef27b2d2e8c02aa468e78496ff5
Status: Downloaded newer image for docker/dockerfile:experimental
docker.io/docker/dockerfile:experimental
docker pull docker.io/library/golang:1.15.3
1.15.3: Pulling from library/golang
e4c3d3e4f7b0: Pull complete
101c41d0463b: Pull complete
8275efcd805f: Pull complete
751620502a7a: Pull complete
aaabf962c4fc: Pull complete
7883babec904: Pull complete
1791d366c848: Pull complete
Digest: sha256:1ba0da74b20aad52b091877b0e0ece503c563f39e37aa6b0e46777c4d820a2ae
Status: Downloaded newer image for golang:1.15.3
docker.io/library/golang:1.15.3
docker pull gcr.io/distroless/static:latest
latest: Pulling from distroless/static
5dea5ec2316d: Pull complete
Digest: sha256:60a7d0c45932b6152b2f7ba561db2f91f58ab14aa90b895c58f72062c768fd77
Status: Downloaded newer image for gcr.io/distroless/static:latest
gcr.io/distroless/static:latest
bash: gcloud: command not found
bash: gcloud: command not found
DOCKER_BUILDKIT=1 docker build --build-arg goproxy=https://proxy.golang.org,direct --build-arg ARCH=amd64 --build-arg ldflags="" . -t gcr.io//cluster-api-nested-controller-amd64:dev
invalid argument "gcr.io//cluster-api-nested-controller-amd64:dev" for "-t, --tag" flag: invalid reference format

⚠️ Upgrade Controller Runtime in VirtualCluster

This issue is to track upgrading from controller-runtime v0.6.1 to v0.7.2+. This brings in a lot of changes to Controller Runtime and is closer to the release we use for the rest of CAPN.

This will start the move towards more consistent usage of contexts throughout all the reconcilers and the syncer, and align us for easier integration.
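
For context, the most visible change when moving from controller-runtime v0.6.x to v0.7+ is that Reconcile now receives a context.Context from the manager; the reconciler below is a placeholder for illustration, not the actual VC code:

package virtualcluster

import (
	"context"

	ctrl "sigs.k8s.io/controller-runtime"
)

// VirtualClusterReconciler is a placeholder type; the real reconciler in this
// repo carries clients, a scheme, and provisioner hooks.
type VirtualClusterReconciler struct{}

// Old signature (controller-runtime v0.6.x):
//   Reconcile(req ctrl.Request) (ctrl.Result, error)
// New signature (controller-runtime v0.7+): the manager passes the context in,
// so it can be threaded consistently through client calls and the syncer.
func (r *VirtualClusterReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	return ctrl.Result{}, nil
}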

🐛 ClusterVersion CRD gets rejected on k8s v1.20.2

The ClusterVersion CRD is using apiVersion v1beta1, which requires properties in x-kubernetes-list-map-keys to be required (this validation was introduced with kubernetes/kubernetes@2e18741). v1 supports defaults, too.

A manual rebuild with a newer controller-gen (I've used controller-tools 0.5.0) using
controller-gen "crd:trivialVersions=true,maxDescLen=0" rbac:roleName=manager-role paths="./..." output:crd:artifacts:config=config/crds
creates an accepted CRD with the v1 CRD API.

Unfortunately, the CRD generated from make build is broken (contains an invalid validation: section) with newer controller-tools installed. I guess the reason is replace-null in hack/lib/util.sh.

⚠️ Move controlplane.cluster.x-k8s.io components to controlplane/

When we look at other CAPI implementations, instead of using multi-group controllers they are typically deployed as independent controllers, and they're even built that way; for example, see KCP in controlplane/kubeadm/ in https://github.com/kubernetes-sigs/cluster-ap, while the main types, like NestedCluster in our world, usually live at the top level, e.g. https://github.com/kubernetes-sigs/cluster-api-provider-aws. You can even see that the EKS control plane is built in controlplane/eks/: https://github.com/kubernetes-sigs/cluster-api-provider-aws/tree/main/controlplane/eks.

We should adjust to having the controlplane group's 4 resources in controlplane/nested/ and have only the top-level NestedCluster at /. This will also make it easier to co-operate if we eventually transition to KCP (e.g. #44).

Consider how to handle shared instances with VC

This issue was migrated from kubernetes-retired/multi-tenancy#1502.

Issue description

With VC, all of the components installed in a tenant cluster are isolated, and the tenant cluster has all the resources for a specified application.

Here the question is: there are some apps that have a shared component, and that component is shared by many apps. For such apps, I was hoping I could install the shared component in the super cluster but install the app's other components into tenant clusters, and have the apps in all of the tenant clusters access the shared component in the super cluster. Any comments on how I can achieve this?

I think with this model I can also reduce the footprint for the super cluster, as I can abstract some common services, deploy them into the super cluster, and share them with all tenant clusters.

Comments from @christopherhein

Maybe we can take this question over to https://sigs.k8s.io/cluster-api-provider-nested.

Short answer: we do a bit of this, but it's slightly different. We allow nested (virtual) clusters to operate on "real" super cluster Service clusterIPs so that we can have routable clusterIP ranges; this is done via a mutating admission webhook which acts as a proxy to the super cluster. We also have custom syncers written using this model (https://github.com/kubernetes-sigs/cluster-api-provider-nested/blob/main/virtualcluster/doc/customresource-syncer.md) for CRDs where we expose only the implementation at the super cluster but want tenant clusters to be able to CRUD them.
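
For illustration only, a minimal sketch of the clusterIP-mirroring idea described above, written as a controller-runtime mutating admission handler; the type name, namespace mapping, and lookup logic are assumptions rather than the actual VC webhook/syncer implementation:

package webhook

import (
	"context"
	"encoding/json"
	"net/http"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/webhook/admission"
)

// serviceClusterIPMutator copies the clusterIP of the corresponding super
// cluster Service into the tenant Service being created, so the tenant object
// carries a routable clusterIP.
type serviceClusterIPMutator struct {
	superClient client.Client // reads from the super cluster
}

func (m *serviceClusterIPMutator) Handle(ctx context.Context, req admission.Request) admission.Response {
	var tenantSvc corev1.Service
	if err := json.Unmarshal(req.Object.Raw, &tenantSvc); err != nil {
		return admission.Errored(http.StatusBadRequest, err)
	}

	// Look up the "real" Service in the super cluster; the real syncer derives
	// the namespace from the VirtualCluster, which is simplified away here.
	var superSvc corev1.Service
	key := client.ObjectKey{Namespace: req.Namespace, Name: tenantSvc.Name}
	if err := m.superClient.Get(ctx, key, &superSvc); err != nil {
		// No super cluster counterpart yet; leave the object alone.
		return admission.Allowed("no super cluster service to mirror")
	}

	// Mirror the routable clusterIP into the tenant Service.
	tenantSvc.Spec.ClusterIP = superSvc.Spec.ClusterIP

	mutated, err := json.Marshal(&tenantSvc)
	if err != nil {
		return admission.Errored(http.StatusInternalServerError, err)
	}
	return admission.PatchResponseFromRaw(req.Object.Raw, mutated)
}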
