syself / cluster-api-provider-hetzner

Kubernetes Cluster API Provider Hetzner provides consistent deployment and day-2 operations of "self-managed" Kubernetes clusters on Hetzner.

License: Apache License 2.0

Dockerfile 0.40% Makefile 3.83% Go 87.94% Python 2.72% Shell 5.11%
cluster-api cluster-api-provider-hetzner k8s-provider-hetzner k8s-sig-cluster-lifecycle k8s-sig-cluster-api kubernetes k8s hetzner hcloud devops

cluster-api-provider-hetzner's People

Contributors

a5r0n, alessiodionisi, aniruddha2000, apricote, batistein, chrisludwig, dependabot[bot], guettli, janiskemper, kranurag7, lion7, lucasrattz, madbbb, mstarostik, preisschild, privatecoder, prometherion, pucilpet, razashahid107, rpahli, sayanta66, souhardya79, syself-bot[bot], testwill, yrs147


cluster-api-provider-hetzner's Issues

Warning ReconcileError secrets "my-cluster-kubeconfig" not found

/kind bug

What steps did you take and what happened:

I followed the docs, but applying the created yaml fails:

guettli@p15$ k get events -A --sort-by=.metadata.creationTimestamp 

default     8m14s       Warning   ReconcileError                machinehealthcheck/my-cluster-control-plane-unhealthy-5m   error creating client and cache for remote cluster: error fetching REST client config for remote cluster "default/my-cluster": failed to retrieve kubeconfig secret for Cluster default/my-cluster: secrets "my-cluster-kubeconfig" not found
default     8m14s       Warning   ReconcileError                machinehealthcheck/my-cluster-md-0-unhealthy-5m            error creating client and cache for remote cluster: error fetching REST client config for remote cluster "default/my-cluster": failed to retrieve kubeconfig secret for Cluster default/my-cluster: secrets "my-cluster-kubeconfig" not found
default     8m35s       Warning   ReconcileError                machinedeployment/my-cluster-md-0                          failed to retrieve HCloudMachineTemplate external object "default"/"my-cluster-md-0": hcloudmachinetemplates.infrastructure.cluster.x-k8s.io "my-cluster-md-0" not found
default     8m35s       Normal    SuccessfulCreate              machinedeployment/my-cluster-md-0                          Created MachineSet "my-cluster-md-0-7465476f6d"
default     8m4s        Normal    ChangeLoadBalancerAlgorithm   hetznercluster/my-cluster                                  Changed load balancer algorithm
default     8m4s        Normal    CreateLoadBalancer            hetznercluster/my-cluster                                  Created load balancer
default     8m          Normal    SuccessfulCreate              hcloudmachine/my-cluster-control-plane-72v5w               Created new server with id 20435964
default     7m56s       Warning   ReconcileError                machinehealthcheck/my-cluster-control-plane-unhealthy-5m   error creating client and cache for remote cluster: error creating dynamic rest mapper for remote cluster "default/my-cluster": Get "https://142.132.242.98:443/api?timeout=10s": dial tcp 142.132.242.98:443: connect: connection refused
default     7m45s       Normal    AddedAsTargetToLoadBalancer   hetznercluster/my-cluster                                  Added new server with id 20435964 to the loadbalancer 719385
default     6m52s       Warning   ReconcileError                machinehealthcheck/my-cluster-control-plane-unhealthy-5m   error creating client and cache for remote cluster: error creating dynamic rest mapper for remote cluster "default/my-cluster": context deadline exceeded
default     6m31s       Warning   ReconcileError                machinehealthcheck/my-cluster-md-0-unhealthy-5m            error creating client and cache for remote cluster: error creating dynamic rest mapper for remote cluster "default/my-cluster": context deadline exceeded
default     66s         Normal    DetectedUnhealthy             machine/my-cluster-control-plane-rb8qb                     Machine default/my-cluster-control-plane-unhealthy-5m/my-cluster-control-plane-rb8qb/ has unhealthy node

What could be the reason?

According to these docs, the secret should be called hetzner: https://github.com/syself/cluster-api-provider-hetzner/blob/main/docs/topics/preparation.md#create-a-secret-for-hcloud-only

I changed this and used the following:

kubectl create secret generic my-cluster-kubeconfig --from-literal=hcloud=$HCLOUD_TOKEN
kubectl patch secret my-cluster-kubeconfig -p '{"metadata":{"labels":{"clusterctl.cluster.x-k8s.io/move":""}}}'

Now I get this error:

default     29s         Warning   ReconcileError                machinehealthcheck/my-cluster-md-0-unhealthy-5m            error creating client and cache for remote cluster: error fetching REST client config for remote cluster "default/my-cluster": failed to retrieve kubeconfig secret for Cluster default/my-cluster: secrets "my-cluster-kubeconfig" not found
default     29s         Warning   ReconcileError                machinehealthcheck/my-cluster-control-plane-unhealthy-5m   error creating client and cache for remote cluster: error fetching REST client config for remote cluster "default/my-cluster": failed to retrieve kubeconfig secret for Cluster default/my-cluster: secrets "my-cluster-kubeconfig" not found
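
For comparison, the preparation docs linked above create the token secret under the name hetzner (a sketch; HCLOUD_TOKEN assumed to be exported):

kubectl create secret generic hetzner --from-literal=hcloud=$HCLOUD_TOKEN
kubectl patch secret hetzner -p '{"metadata":{"labels":{"clusterctl.cluster.x-k8s.io/move":""}}}'

The my-cluster-kubeconfig secret itself is normally generated by Cluster API once the control plane comes up, so the warnings above may simply resolve once the first control plane node is ready.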

Making controlPlaneLoadBalancer optional?

/kind feature

Describe the solution you'd like
Would it be possible to make the controlPlaneLoadBalancer parameter optional? There are some cases where I don't need the Hetzner load balancer.

talos: machine deployment created too early

I'm running into a problem with the combination of CAPH, CABPT, and CACPPT, where worker nodes end up in an endless reboot loop and never reach the Ready state.
My HetznerCluster resource has:

[...]
spec:
  controlPlaneEndpoint:
    host: ""
    port: 6443
[...]

Now, when creating a cluster with at least one worker node, only the control plane nodes are created with bootstrap data that contains a valid endpoint (load balancer IP as expected). Worker node(s) are missing the host:

kubectl --context kind-test get secret -l cluster.x-k8s.io/cluster-name=test -o json | jq -r '.items[] | select(.metadata.name | endswith("bootstrap-data")) | [.metadata.name, .data.value] | join(" ")' | while read name data; do echo -n "$name: "; base64 -d <<<$data | yq -r .cluster.controlPlane.endpoint; done
test-controlplane-6v8gn-bootstrap-data: https://XXX.XXX.XXX.XXX:6443
test-controlplane-qw2h6-bootstrap-data: https://XXX.XXX.XXX.XXX:6443
test-controlplane-tb9dz-bootstrap-data: https://XXX.XXX.XXX.XXX:6443
test-md-0-5f6c9b9447-gj6gt-bootstrap-data: https://:6443

My assumption is that the workers' bootstrap data is generated before CAPH replaces the empty spec.controlPlaneEndpoint.host value with the LB's external IP.
When creating the cluster with a worker count of 0 and scaling it up later, the bootstrap data is correct and the worker node(s) reach Ready. As it is now, they can never even join. I guess the workers are created too early, since they don't depend on the LB creation. I am not sure whether this is directly CAPH's fault, but I would expect this dependency ordering to be the infrastructure provider's responsibility. CABPT seems to just happily pick up the empty endpoint, and it is only flagged as invalid when the node boots (Invalid controlplane endpoint: host must not be blank).
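
A quick way to see when CAPH has actually filled in the endpoint (a sketch, using the test cluster name and kind context from the command above, and assuming the HetznerCluster is also named test):

kubectl --context kind-test get hetznercluster test -o jsonpath='{.spec.controlPlaneEndpoint.host}'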

cluster with baremetal host waiting for available machine

/kind bug

I set up a cluster with a BareMetalHost using the hetzner-hcloud-control-planes flavor. My cluster template is edited to have one HCloudMachine for the control plane, use the CIDR block 10.243.0.0/16, zero replicas in g-work-md-0 (which would contain HCloud machines), and one replica in g-work-md-1 where bare metal is used.
When the template is applied, the CRs are created, the machine for the control plane is added, and its load balancer too. The bare metal server is reinstalled and is up and running, but it is not added to the cluster. I also deployed a CNI and CCM (Hetzner version).

What did you expect to happen:

The machine should join the cluster and KubeadmConfigTemplate/g-work-md-1 should run the installation process there. Now the cluster is waiting for the machine because it thinks it is unhealthy:

$ clusterctl describe cluster g-work
NAME                                                                       READY  SEVERITY  REASON                       SINCE  MESSAGE                                                       
Cluster/g-work                                                             True                                          51m                                                                   
├─ClusterInfrastructure - HetznerCluster/g-work                                                                                                                                                
├─ControlPlane - KubeadmControlPlane/g-work-control-plane                  True                                          51m                                                                   
│ └─Machine/g-work-control-plane-4gvd4                                     True                                          52m                                                                   
│   └─MachineInfrastructure - HCloudMachine/g-work-control-plane-4fxsc                                                                                                                         
└─Workers                                                                                                                                                                                      
  ├─MachineDeployment/g-work-md-0                                          True                                          54m                                                                   
  └─MachineDeployment/g-work-md-1                                          False  Warning   WaitingForAvailableMachines  54m    Minimum availability requires 1 replicas, current 0 available  
    └─Machine/g-work-md-1-5bd54bfd6c-tzxql                                 True                                          45m                                                                   
      └─MachineInfrastructure - HetznerBareMetalMachine/g-work-md-1-gh6ks                                                                             

$ kubectl describe MachineHealthCheck g-work-md-1-unhealthy-5m
...
Status:
  Conditions:
    Last Transition Time:  2022-11-03T07:49:57Z
    Status:                True
    Type:                  RemediationAllowed
  Expected Machines:       1
  Observed Generation:     1
  Targets:
    g-work-md-1-5bd54bfd6c-tzxql
Events:
  Type     Reason          Age                 From                           Message
  ----     ------          ----                ----                           -------
  Warning  ReconcileError  60m (x16 over 60m)  machinehealthcheck-controller  error creating client and cache for remote cluster: error fetching REST client config for remote cluster "default/g-work": failed to retrieve kubeconfig secret for Cluster default/g-work: secrets "g-work-kubeconfig" not found
  Warning  ReconcileError  59m (x2 over 59m)   machinehealthcheck-controller  error creating client and cache for remote cluster: error creating dynamic rest mapper for remote cluster "default/g-work": client rate limiter Wait returned an error: context deadline exceeded - error from a previous attempt: EOF

$ kubectl describe secrets g-work-kubeconfig
Name:         g-work-kubeconfig
Namespace:    default
Labels:       caph.environment=owned
              cluster.x-k8s.io/cluster-name=g-work
Annotations:  <none>

Type:  cluster.x-k8s.io/secret

Data
====
value:  5535 bytes

$ kubectl describe hetznerbaremetalmachines g-work-md-1-gh6ks
Name:         g-work-md-1-gh6ks
Namespace:    default
Labels:       cluster.x-k8s.io/cluster-name=g-work
              cluster.x-k8s.io/deployment-name=g-work-md-1
              machine-template-hash=1681069827
              nodepool=g-work-md-1
Annotations:  cluster.x-k8s.io/cloned-from-groupkind: HetznerBareMetalMachineTemplate.infrastructure.cluster.x-k8s.io
              cluster.x-k8s.io/cloned-from-name: g-work-md-1
              infrastructure.cluster.x-k8s.io/HetznerBareMetalHost: default/bm-0
API Version:  infrastructure.cluster.x-k8s.io/v1beta1
Kind:         HetznerBareMetalMachine
...
Status:
  Addresses:
    Address:  46.4.66.173/26
    Type:     InternalIP
    Address:  2a01:4f8:140:2484::2/64
    Type:     InternalIP
    Address:  g-work-md-1-gh6ks
    Type:     Hostname
    Address:  g-work-md-1-gh6ks
    Type:     InternalDNS
  Conditions:
    Last Transition Time:  2022-11-03T07:49:58Z
    Status:                True
    Type:                  AssociateBMHCondition
    Last Transition Time:  2022-11-03T07:49:58Z
    Status:                True
    Type:                  InstanceBootstrapReady
    Last Transition Time:  2022-11-03T07:55:34Z
    Status:                True
    Type:                  InstanceReady
  Last Updated:            2022-11-03T07:51:34Z
  Ready:                   true
Events:                    <none>

Anything else you would like to add:

Configs related to the bare metal server: there is an Ubuntu 22.04 image and a hostSelector that matches the host through an added label. The machine is up and running and reachable with the provided SSH key, but there are no k8s packages or processes on it.

---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: HetznerBareMetalMachineTemplate
metadata:
  name: g-work-md-1
  namespace: default
spec:
  template:
    spec:
      installImage:
        image:
          path: /root/images/Ubuntu-2204-jammy-amd64-base.tar.gz
          # path: /root/.oldroot/nfs/install/../images/Ubuntu-2004-focal-64-minimal-hwe.tar.gz
        partitions:
        - fileSystem: ext4
          mount: /boot
          size: 1024M
        - fileSystem: ext4
          mount: /
          size: all
        postInstallScript: |
          #!/bin/bash
          apt-get update && apt-get install -y cloud-init apparmor apparmor-utils
      hostSelector:
        matchExpressions:
        - key: "geneea/name"
          operator: "in"
          values:
          - "sigma"
          - "theta"
      sshSpec:
        portAfterCloudInit: 22
        portAfterInstallImage: 22
        secretRef:
          key:
            name: sshkey-name
            privateKey: ssh-privatekey
            publicKey: ssh-publickey
          name: robot-ssh
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: HetznerBareMetalHost
metadata:
  name: "bm-0"
  labels:
    geneea/name: "sigma"
spec:
  serverID: 1702753
  rootDeviceHints:
    raid:
      wwn:
        - "0x500a07511320714e"
        - "0x5002538d41d5de7c"
  maintenanceMode: false
  description: "sigma"
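
As a sanity check, one way to look at the workload cluster directly (a sketch, assuming the g-work-kubeconfig secret shown above is valid):

# fetch the workload cluster kubeconfig and check whether the bare metal machine registered as a node
clusterctl get kubeconfig g-work > g-work.kubeconfig
kubectl --kubeconfig g-work.kubeconfig get nodes -o wide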

Environment:

  • cluster-api-provider-hetzner version: ccm-hetzner-1.1.4 installed to workload cluster, ccm-hcloud-1.0.11 in management cluster
  • Kubernetes version: (use kubectl version) 1.25.2
  • OS (e.g. from /etc/os-release):

Dependency Dashboard 🤖

Awaiting Schedule

These updates are awaiting their schedule. Click on a checkbox to get an update now.

  • 🌱 Update Github Actions group to v40.1.9

Open

These updates have all been created already. Click a checkbox below to force a retry/rebase of any.

Detected dependencies

dockerfile
images/builder/Dockerfile
  • docker.io/library/alpine 3.19.1@sha256:6457d53fb065d6f250e1504b9bc42d5b6c65941d57532c072d929dd0628977d0
  • docker.io/library/alpine 3.19.1@sha256:6457d53fb065d6f250e1504b9bc42d5b6c65941d57532c072d929dd0628977d0
  • docker.io/hadolint/hadolint v2.12.0-alpine@sha256:7dba9a9f1a0350f6d021fb2f6f88900998a4fb0aaf8e4330aa8c38544f04db42
  • docker.io/aquasec/trivy 0.50.1@sha256:0aff831cd122c9cc8dbd25fc75974c21cd49ca7c72d522ce11978373f695f55d
  • docker.io/library/golang 1.21.5-bullseye@sha256:810dd3335e68f0b6ea802486fd0a027dda4013797b6fa58407f354244d9db2b7
images/cache/Dockerfile
  • docker.io/library/alpine 3.19.1@sha256:c5b1261d6d3e43071626931fc004f70149baeba2c8ec672bd4f27761f8e1ad6b
  • docker.io/library/alpine 3.19.1@sha256:c5b1261d6d3e43071626931fc004f70149baeba2c8ec672bd4f27761f8e1ad6b
images/caph/Dockerfile
  • docker.io/library/golang 1.21.6-bullseye@sha256:a8712f27d9ac742e7bded8f81f7547c5635e855e8b80302e8fc0ce424f559295
github-actions
.github/actions/e2e/action.yaml
  • actions/cache v4@0c45773b623bea8c8e75f6c82b208c3cf94ea4f9
  • actions/download-artifact v4@c850b930e6ba138125429b7e5c93fc707a7f8427
  • hetznercloud/tps-action dee5dd2546322c28ed8f74b910189066e8b6f31a
  • actions/upload-artifact v4@5d5d22a31266ced268874388b861e4b58bb5c2f3
.github/actions/manager-image/action.yaml
  • docker/setup-buildx-action v3.3.0@d70bba72b1f3fd22344832f00baa16ece964efeb
  • docker/login-action v3.1.0@e92390c5fb421da1463c202d546fed0ec5c39f20
  • actions/cache v4.0.2@0c45773b623bea8c8e75f6c82b208c3cf94ea4f9
  • docker/build-push-action v5.3.0@2cdde995de11925a030ce8070c3d77a52ffcf1c0
  • docker/build-push-action v5@2cdde995de11925a030ce8070c3d77a52ffcf1c0
.github/actions/metadata/action.yaml
  • docker/metadata-action v5.5.1@8e5442c4ef9f78752691e2d8f8d19755c6f78e81
.github/actions/setup-go/action.yaml
  • actions/setup-go v5.0.0@0c52d547c9bc32b1aa3301fd7a9cb496313a4491
  • actions/cache v4@0c45773b623bea8c8e75f6c82b208c3cf94ea4f9
  • actions/cache v4@0c45773b623bea8c8e75f6c82b208c3cf94ea4f9
.github/actions/test-release/action.yaml
  • actions/cache v4@0c45773b623bea8c8e75f6c82b208c3cf94ea4f9
  • actions/upload-artifact v4@5d5d22a31266ced268874388b861e4b58bb5c2f3
.github/workflows/build.yml
  • actions/checkout v4@b4ffde65f46336ab88eb53be808477a3936bae11
  • docker/setup-qemu-action v3@68827325e0b33c7199eb31dd4e31fbe9023e06e3
  • docker/setup-buildx-action v3@d70bba72b1f3fd22344832f00baa16ece964efeb
  • docker/login-action v3.1.0@e92390c5fb421da1463c202d546fed0ec5c39f20
  • sigstore/cosign-installer v3.4.0@e1523de7571e31dbe865fd2e80c5c7c23ae71eb4
  • actions/cache v4.0.2@0c45773b623bea8c8e75f6c82b208c3cf94ea4f9
  • docker/build-push-action v5.3.0@2cdde995de11925a030ce8070c3d77a52ffcf1c0
  • docker/build-push-action v5@2cdde995de11925a030ce8070c3d77a52ffcf1c0
  • actions/upload-artifact v4.3.1@5d5d22a31266ced268874388b861e4b58bb5c2f3
  • docker/build-push-action v5.3.0@2cdde995de11925a030ce8070c3d77a52ffcf1c0
.github/workflows/e2e-basic-baremetal.yaml
  • actions/checkout v4@b4ffde65f46336ab88eb53be808477a3936bae11
  • actions/checkout v4@b4ffde65f46336ab88eb53be808477a3936bae11
  • actions/checkout v4@b4ffde65f46336ab88eb53be808477a3936bae11
.github/workflows/e2e-basic-packer.yaml
  • actions/checkout v4@b4ffde65f46336ab88eb53be808477a3936bae11
  • actions/checkout v4@b4ffde65f46336ab88eb53be808477a3936bae11
  • actions/checkout v4@b4ffde65f46336ab88eb53be808477a3936bae11
.github/workflows/e2e-basic.yaml
  • actions/checkout v4@b4ffde65f46336ab88eb53be808477a3936bae11
  • actions/checkout v4@b4ffde65f46336ab88eb53be808477a3936bae11
  • actions/checkout v4@b4ffde65f46336ab88eb53be808477a3936bae11
.github/workflows/e2e-feature-baremetal.yaml
  • actions/checkout v4@b4ffde65f46336ab88eb53be808477a3936bae11
  • actions/checkout v4@b4ffde65f46336ab88eb53be808477a3936bae11
  • actions/checkout v4@b4ffde65f46336ab88eb53be808477a3936bae11
.github/workflows/e2e-feature.yaml
  • actions/checkout v4@b4ffde65f46336ab88eb53be808477a3936bae11
  • actions/checkout v4@b4ffde65f46336ab88eb53be808477a3936bae11
  • actions/checkout v4@b4ffde65f46336ab88eb53be808477a3936bae11
.github/workflows/e2e-periodic.yaml
  • actions/checkout v4@b4ffde65f46336ab88eb53be808477a3936bae11
  • actions/checkout v4@b4ffde65f46336ab88eb53be808477a3936bae11
  • actions/checkout v4@b4ffde65f46336ab88eb53be808477a3936bae11
  • actions/checkout v4@b4ffde65f46336ab88eb53be808477a3936bae11
  • actions/checkout v4@b4ffde65f46336ab88eb53be808477a3936bae11
  • actions/checkout v4@b4ffde65f46336ab88eb53be808477a3936bae11
  • actions/checkout v4@b4ffde65f46336ab88eb53be808477a3936bae11
.github/workflows/e2e-upgrade-caph.yaml
  • actions/checkout v4@b4ffde65f46336ab88eb53be808477a3936bae11
  • actions/checkout v4@b4ffde65f46336ab88eb53be808477a3936bae11
  • actions/checkout v4@b4ffde65f46336ab88eb53be808477a3936bae11
.github/workflows/e2e-upgrade-kubernetes.yaml
  • actions/checkout v4@b4ffde65f46336ab88eb53be808477a3936bae11
  • actions/checkout v4@b4ffde65f46336ab88eb53be808477a3936bae11
  • actions/checkout v4@b4ffde65f46336ab88eb53be808477a3936bae11
.github/workflows/main-promote-builder-image.yml
  • actions/checkout v4.1.1@b4ffde65f46336ab88eb53be808477a3936bae11
  • ghcr.io/syself/caph-builder 1.0.16
.github/workflows/pr-e2e.yaml
  • actions/checkout v4.1.1@b4ffde65f46336ab88eb53be808477a3936bae11
  • actions/checkout v4@b4ffde65f46336ab88eb53be808477a3936bae11
  • actions/checkout v4@b4ffde65f46336ab88eb53be808477a3936bae11
  • actions/checkout v4@b4ffde65f46336ab88eb53be808477a3936bae11
.github/workflows/pr-lint.yml
  • actions/checkout v4.1.1@b4ffde65f46336ab88eb53be808477a3936bae11
  • ghcr.io/syself/caph-builder 1.0.16
.github/workflows/pr-verify.yml
  • kubernetes-sigs/kubebuilder-release-tools v0.4.3@012269a88fa4c034a0acf1ba84c26b195c0dbab4
  • actions/checkout v4.1.1@b4ffde65f46336ab88eb53be808477a3936bae11
  • actions/setup-node v4@60edb5dd545a775178f52524783378180af0d1f8
  • actions/create-github-app-token v1@7bfa3a4717ef143a604ee0a99d859b8886a96d00
  • pascalgn/size-label-action v0.5.0@37a5ad4ae20ea8032abf169d953bcd661fd82cd3
  • actions/labeler v5@8558fd74291d67161a8a78ce36a881fa63b766a9
  • EndBug/label-sync v2@52074158190acb45f3077f9099fea818aa43f97a
.github/workflows/release.yml
  • actions/checkout v4@b4ffde65f46336ab88eb53be808477a3936bae11
  • docker/setup-qemu-action v3@68827325e0b33c7199eb31dd4e31fbe9023e06e3
  • docker/setup-buildx-action v3@d70bba72b1f3fd22344832f00baa16ece964efeb
  • docker/login-action v3.1.0@e92390c5fb421da1463c202d546fed0ec5c39f20
  • sigstore/cosign-installer v3.4.0@e1523de7571e31dbe865fd2e80c5c7c23ae71eb4
  • docker/build-push-action v5@2cdde995de11925a030ce8070c3d77a52ffcf1c0
  • actions/upload-artifact v4.3.1@5d5d22a31266ced268874388b861e4b58bb5c2f3
  • actions/checkout v4@b4ffde65f46336ab88eb53be808477a3936bae11
  • actions/setup-go v5.0.0@0c52d547c9bc32b1aa3301fd7a9cb496313a4491
  • softprops/action-gh-release v2@9d7c94cfd0a1f3ed45544c887983e9fa900f0564
.github/workflows/report-bin-size.yml
  • actions/checkout v4.1.1@b4ffde65f46336ab88eb53be808477a3936bae11
  • actions/setup-go v5.0.0@0c52d547c9bc32b1aa3301fd7a9cb496313a4491
  • actions/upload-artifact v4.3.1@5d5d22a31266ced268874388b861e4b58bb5c2f3
.github/workflows/schedule-cache-cleaner-caph-image.yml
  • actions/cache v4.0.2@0c45773b623bea8c8e75f6c82b208c3cf94ea4f9
  • ubuntu 22.04
.github/workflows/schedule-scan-image.yml
  • actions/checkout v4.1.1@b4ffde65f46336ab88eb53be808477a3936bae11
  • ghcr.io/syself/caph-builder 1.0.16
.github/workflows/schedule-update-bot.yaml
  • actions/checkout v4.1.1@b4ffde65f46336ab88eb53be808477a3936bae11
  • actions/create-github-app-token v1@7bfa3a4717ef143a604ee0a99d859b8886a96d00
  • renovatebot/github-action v40.1.7@7d358366277001f3316d7fa54ff49a81c0158948
.github/workflows/test.yml
  • actions/checkout v4.1.1@b4ffde65f46336ab88eb53be808477a3936bae11
  • actions/setup-go v5.0.0@0c52d547c9bc32b1aa3301fd7a9cb496313a4491
  • test-summary/action v2.3@032c8a9cec6aaa3c20228112cae6ca10a3b29336
  • actions/upload-artifact v4.3.1@5d5d22a31266ced268874388b861e4b58bb5c2f3
gomod
go.mod
  • go 1.21
  • github.com/blang/semver/v4 v4.0.0
  • github.com/go-logr/logr v1.4.1
  • github.com/go-logr/zapr v1.3.0
  • github.com/hetznercloud/hcloud-go/v2 v2.7.0
  • github.com/onsi/ginkgo/v2 v2.17.1
  • github.com/onsi/gomega v1.32.0
  • github.com/spf13/pflag v1.0.5
  • github.com/stretchr/testify v1.9.0
  • github.com/syself/hrobot-go v0.2.5
  • go.uber.org/zap v1.27.0
  • golang.org/x/crypto v0.22.0
  • golang.org/x/exp v0.0.0-20240409090435-93d18d7e34b8@93d18d7e34b8
  • golang.org/x/mod v0.17.0
  • k8s.io/klog/v2 v2.120.1
  • k8s.io/utils v0.0.0-20240310230437-4693a0247e57@4693a0247e57
  • sigs.k8s.io/controller-runtime v0.16.5
  • sigs.k8s.io/kind v0.22.0
regex
templates/cluster-templates/bases/hcloud-kcp-ubuntu.yaml
  • containerd/containerd 1.7.15
templates/cluster-templates/bases/hetznerbaremetal-kcp-ubuntu.yaml
  • containerd/containerd 1.7.15
templates/cluster-templates/bases/kct-md-0-ubuntu.yaml
  • containerd/containerd 1.7.15
images/builder/Dockerfile
  • lycheeverse/lychee v0.14.3
  • golangci/golangci-lint v1.57.2
  • debian_11/skopeo 1.2.2+dfsg1-1+b6
  • adrienverge/yamllint v1.35.1
  • opt-nc/yamlfixer 0.9.15

Proposal: tracking issue for Hetzner dedicated servers ("coming soon")

Love what this allows us to do on hcloud.

I am particularly excited to see Hetzner dedicated servers coming soon. Creating this issue to track that discussion/work. Hope this is OK.

There's already an exciting startup that's running Kubernetes on Hetzner root/dedicated servers. Here's their roadmap: https://docs.symbiosis.host/about/roadmap

I am unsure how much closer their OSS contributions bring us to running Kubernetes on Hetzner root/dedicated servers.

I sketched out how I would go about it: https://www.reddit.com/r/hetzner/comments/ttm3n7/how_do_you_manage_hetzner_robot_resources_like/i3d0g8w/

Support loadbalancer for ingress-nginx

/kind feature

Describe the solution you'd like
[A clear and concise description of what you want to happen.]
Hi, I've been following this tutorial and have manually created a load balancer with two services pointing to ports 80 and 443 with SSL passthrough, because ingress-nginx should be terminating TLS.

Is it possible to allow the creation of an additional load balancer besides the already provisioned controlPlaneLoadBalancer?

Do I need a regular floating IP so that ingress-nginx is able to pick it up, or should I use something like MetalLB for multiple ones?

Thanks in advance!
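
For what it's worth, a rough sketch of how an extra load balancer could be created through the hcloud cloud-controller-manager instead of by hand (the annotation keys are my assumption from the CCM docs, and the ingress-nginx namespace/service names are the defaults of a standard install; please double-check both):

# expose the ingress controller via a CCM-managed load balancer (assumed annotation keys)
kubectl -n ingress-nginx annotate service ingress-nginx-controller \
  load-balancer.hetzner.cloud/location=fsn1 \
  load-balancer.hetzner.cloud/name=ingress-lb
kubectl -n ingress-nginx patch service ingress-nginx-controller -p '{"spec":{"type":"LoadBalancer"}}'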

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]
I'll definitely try out your tilt setup next
https://github.com/syself/cluster-api-provider-hetzner/blob/main/docs/developers/development.md#tilt-for-dev-in-caph
to be able to incorporate your changes sooner.

Environment:

  • cluster-api-provider-hetzner version: v1.1.1 (This new version works with provisioning too)
  • Kubernetes version: (use kubectl version) 1.23.3
  • OS (e.g. from /etc/os-release): fedora-35

How to Guide for Talos

/kind feature

Describe the solution you'd like
First and foremost thank you very much for this great work!

I'd like to ask for a guideline / how-to guide for bootstrapping the cluster with Talos, as I'm sure many people would appreciate it and favor it over standard Linux distros such as Ubuntu.

Support autoscaling from zero

/kind feature

Describe the solution you'd like
[A clear and concise description of what you want to happen.]

Add support for autoscaling from zero defined in https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20210310-opt-in-autoscaling-from-zero.md
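
For reference, the proposal's opt-in mechanism is based on capacity annotations on the scalable resource; a sketch of what that could look like here (annotation keys taken from the linked proposal, my-cluster-md-0 is a hypothetical MachineDeployment):

# tell the autoscaler what a machine of this deployment would provide when scaling from zero
kubectl annotate machinedeployment my-cluster-md-0 \
  capacity.cluster-autoscaler.kubernetes.io/cpu=4 \
  capacity.cluster-autoscaler.kubernetes.io/memory=8Gi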

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]

Just implemented upstream in kubernetes/autoscaler#4840

See also:

IPv6 Only/Dual Stack support?

/kind feature

Describe the solution you'd like
Hetzner recently added a feature that allows IPv6-only VMs on HCloud. IPv4 addresses are now an extra cost for each instance.
Is there currently a way to do IPv6-only or dual-stack deployments? I haven't seen any documentation on this anywhere, so I assume not.

Talos github organization was renamed from `talos-systems` to `siderolabs`

Just wanted to point out that the GitHub URLs in this repo still point to the old organization.
The organization was renamed a few months ago for legal reasons, so it would be wise to update the URLs in this repo as well :)

These are the references I could find. Simply replace github.com/talos-systems with github.com/siderolabs

"IMAGE_URL=https://github.com/talos-systems/talos/releases/download/{{user `talos_version`}}/hcloud-amd64.raw.xz"

cabpt_uri = "https://github.com/talos-systems/cluster-api-bootstrap-provider-talos/releases/download/{}/bootstrap-components.yaml".format(version)

cacppt_uri = "https://github.com/talos-systems/cluster-api-control-plane-provider-talos/releases/download/{}/control-plane-components.yaml".format(version)
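
A one-liner that would apply this replacement across the repo (a sketch; run from the repo root):

# rewrite all references from the old organization to the new one
grep -rl 'github.com/talos-systems' . | xargs sed -i 's#github.com/talos-systems#github.com/siderolabs#g'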

Fix usage of webhooks in unit tests with envtest

/kind bug

What steps did you take and what happened:
I tried to include unit tests in the /controllers folder that test the functionality of webhooks. However, as far as I can see, no webhooks are used at any stage and therefore the validation of webhooks does not work at all in our current unit tests. I don't know why, as the webhook port should be set correctly.

I tried to disable the renaming happening in the function appendWebhookConfiguration, but it didn't change anything.

What did you expect to happen:
Webhook server should be started in NewTestEnvironment (which is the case currently, as we check this in WaitForWebhooks) and the webhooks should be triggered when a relevant object is created or updated.

Environment:

  • cluster-api-provider-hetzner version: v1.0.0-alpha.10
  • Kubernetes version: 1.22
  • OS: Fedora 34

Improve /docs/reference/hetzner-bare-metal-machine-template.md

Describe the solution you'd like
/docs/reference/hetzner-bare-metal-machine-template.md, and especially template.spec.hostSelector, could probably use some references/examples.

Should we use the GET /server object values from the Hetzner webservice reference as keys in template.spec.hostSelector.matchExpressions.key?

Anything else you would like to add:
I am also wondering whether there should be a cluster-template for HetznerBareMetalHost to make it clearer.

/kind proposal

NameServer limits were exceeded

/kind bug

What steps did you take and what happened:
I've followed the quickstart and set up a kind HA management cluster targeting Hetzner:

clusterctl init --core cluster-api --bootstrap kubeadm --control-plane kubeadm --infrastructure hetzner

# Make local cluster reliable
kubectl -n capi-system scale deployment capi-controller-manager --replicas=2
...

export HCLOUD_TOKEN="psst" \
export SSH_KEY="home-computer" \
export HCLOUD_IMAGE_NAME=fedora-35 \
export CLUSTER_NAME="my-cluster" \
export REGION="fsn1" \
export CONTROL_PLANE_MACHINE_COUNT=1 \
export WORKER_MACHINE_COUNT=2 \
export KUBERNETES_VERSION=1.23.3 \
export HCLOUD_CONTROL_PLANE_MACHINE_TYPE=cpx21 \
export HCLOUD_NODE_MACHINE_TYPE=cpx31

# API secret
kubectl create secret generic hetzner --from-literal=hcloud=$HCLOUD_TOKEN
kubectl patch secret hetzner -p '{"metadata":{"labels":{"clusterctl.cluster.x-k8s.io/move":""}}}'

# Notice v1.23.3 and flavor hcloud-network
clusterctl generate cluster my-cluster --kubernetes-version v1.23.3 --control-plane-machine-count=1 --worker-machine-count=2 --flavor hcloud-network > my-cluster.yaml
# Cluster start
kubectl apply -f my-cluster.yaml

# -> Initiated!
kubectl get kubeadmcontrolplane

# Set hetzner kubeconfig as active
clusterctl get kubeconfig my-cluster > $PWD/hetzner.kubeconfig
export KUBECONFIG=$PWD/hetzner.kubeconfig

I received an error similar to this:
kubernetes/kubernetes#82756

# Temporary fix by removing 2 unused IPv6 entries
ssh root@all-servers
# remove 2 lines after 3 nameservers

vi /etc/resolv.conf
sudo resolvconf -a -d eth0

What did you expect to happen:
That Fedora 35 wouldn't use unused nameserver entries.

Anything else you would like to add:
Great project so far!
Is it possible to do this cleanup in a kubeadm cloud-init script, or through systemd-resolved?
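
One possible approach (a sketch, not verified on Fedora 35) would be to trim /etc/resolv.conf before kubeadm runs, e.g. as an entry in the kubeadm bootstrap provider's preKubeadmCommands:

# keep only the first three nameserver entries (the glibc resolver ignores the rest anyway)
awk '/^nameserver/ { if (++n > 3) next } { print }' /etc/resolv.conf > /tmp/resolv.conf && cat /tmp/resolv.conf > /etc/resolv.conf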

Environment:

  • cluster-api-provider-hetzner version: [v1.0.0-alpha.9]
  • Kubernetes version: 1.23.3
  • OS fedora-35

bm-node routing broken after cloud-init (wrong gateway, duplicate routes)

/kind bug
/lifecycle active

What steps did you take and what happened:

  • Create a three-node cluster with flavor hetzner-baremetal-control-planes-remediation / Ubuntu 20.04 HWE.
  • Nodes with IPs within the same subnet – in my case 94.130.10.24 and 94.130.10.27 – won't be able to communicate with each other due to a wrong gateway route-setting:

=> non-working routing after HetznerBareMetalHost has finished rebooting and trying to join the cluster:
[screenshot: non-working routing table]

Pinging between the nodes (SSH-ing to either machine, the first and the second control plane) does not work: Destination Host Unreachable.

routing looks like this in rescue-mode:
[screenshot: working routing table in rescue mode]

This can be fixed manually by replacing the wrong gateway (0.0.0.0):

route del -net 94.130.10.0 gw 0.0.0.0 netmask 255.255.255.192 enp35s0
route add -net 94.130.10.0 gw 94.130.10.1 netmask 255.255.255.192 enp35s0

Also run route del -net 0.0.0.0 gw 94.130.10.1 netmask 0.0.0.0 enp35s0, as the default route is set up twice.

What did you expect to happen:

Correct gateway in routing, no duplicate routes.

Environment:

  • cluster-api-provider-hetzner version: v1.0.0-beta.7
  • Kubernetes version: v1.24.8
  • OS: Ubuntu 20.04 HWE

CAPH-controller – secret has been modified although a provisioned machine uses it

/kind bug

What steps did you take and what happened:

  • Create a three-node cluster with flavor hetzner-baremetal-control-planes-remediation / Ubuntu 20.04 HWE using a fresh kind-bootstrap-cluster.
  • Moving objects to target cluster via clusterctl move
  • all bm-nodes stay in error-state as moving the secrets seems to alter them
  • the controller reports:

[screenshot: caph-controller error message]

What did you expect to happen:

Moving the secrets from the bootstrap to the target cluster seems to alter them, which should be "communicated" to the caph-controller (if that is indeed what happens).
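
One way to verify whether the secret content actually changes during the move (a sketch; the kubectl context names are placeholders, robot-ssh is the SSH secret from this setup):

# record a hash on the bootstrap cluster before clusterctl move ...
kubectl --context kind-bootstrap get secret robot-ssh -o jsonpath='{.data}' | sha256sum
# ... and compare it on the target cluster after the move
kubectl --context target-cluster get secret robot-ssh -o jsonpath='{.data}' | sha256sum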

Environment:

  • cluster-api-provider-hetzner version: v1.0.0-beta.7
  • Kubernetes version: v1.24.8
  • OS: Ubuntu 20.04 HWE

Private network for bare metal host / cloud hybrid

/kind feature

Describe the solution you'd like
It is possible to link bare metal servers to the same private network as the cloud servers. I think it would be good to have an option for the private networking for hybrid clusters.

Anything else you would like to add:
More info: https://docs.hetzner.com/cloud/networks/connect-dedi-vswitch/

Adding the vswitch is possible with the robot web service:
https://robot.your-server.de/doc/webservice/en.html#post-vswitch
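
For reference, creating a vSwitch via the Robot webservice looks roughly like this (a sketch; the parameter names are my reading of the linked webservice docs, please double-check):

# create a vSwitch that bare metal servers can later be attached to
curl -u "$ROBOT_USER:$ROBOT_PASSWORD" https://robot-ws.your-server.de/vswitch -d name=capi-private -d vlan=4000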

imageName in HCloudMachine doesnt work with snapshots

/kind bug

[Before submitting an issue, have you checked the Troubleshooting Guide]
Yes
What steps did you take and what happened:
[A clear and concise description of what the bug is.]
I have created a simple HCloudMachineTemplate. It has an imageName field. This name doesn't work with snapshots, since snapshots in hcloud don't have a name; snapshots only have a description, an ID, and optional labels.

The infrastructure provider controller throws the following error if an ID, description, or label is given in the imageName field.

{"level":"ERROR","time":"2022-03-19T01:08:55.839Z","logger":"controller.hcloudmachine","file":"controller/controller.go:317","message":"Reconciler error","reconciler group":"infrastructure.cluster.x-k8s.io","reconciler kind":"HCloudMachine","name":"my-cluster-control-plane-mwtmn","namespace":"hcloud","error":"failed to reconcile server for HCloudMachine hcloud/my-cluster-control-plane-mwtmn: failed to create server: failed to get server image: no image found with name testImage","errorVerbose":"no image found with name testImage\nfailed to get server image\ngithub.com/syself/cluster-api-provider-hetzner/pkg/services/hcloud/server.(*Service).createServer\n\t/workspace/pkg/services/hcloud/server/server.go:175\ngithub.com/syself/cluster-api-provider-hetzner/pkg/services/hcloud/server.(*Service).Reconcile\n\t/workspace/pkg/services/hcloud/server/server.go:75\ngithub.com/syself/cluster-api-provider-hetzner/controllers.(*HCloudMachineReconciler).reconcileNormal\n\t/workspace/controllers/hcloudmachine_controller.go:185\ngithub.com/syself/cluster-api-provider-hetzner/controllers.(*HCloudMachineReconciler).Reconcile\n\t/workspace/controllers/hcloudmachine_controller.go:146\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:227\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1581\nfailed to create server\ngithub.com/syself/cluster-api-provider-hetzner/pkg/services/hcloud/server.(*Service).Reconcile\n\t/workspace/pkg/services/hcloud/server/server.go:77\ngithub.com/syself/cluster-api-provider-hetzner/controllers.(*HCloudMachineReconciler).reconcileNormal\n\t/workspace/controllers/hcloudmachine_controller.go:185\ngithub.com/syself/cluster-api-provider-hetzner/controllers.(*HCloudMachineReconciler).Reconcile\n\t/workspace/controllers/hcloudmachine_controller.go:146\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:227\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1581\nfailed to reconcile server for HCloudMachine 
hcloud/my-cluster-control-plane-mwtmn\ngithub.com/syself/cluster-api-provider-hetzner/controllers.(*HCloudMachineReconciler).reconcileNormal\n\t/workspace/controllers/hcloudmachine_controller.go:186\ngithub.com/syself/cluster-api-provider-hetzner/controllers.(*HCloudMachineReconciler).Reconcile\n\t/workspace/controllers/hcloudmachine_controller.go:146\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:227\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1581","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:317\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:227"}

What did you expect to happen:
The image should be selectable using an ID, description, or labels. It is a common use case to pre-build Kubernetes images and use them with Cluster API instead of using default system images and doing all installations during provisioning.
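
For comparison, the hcloud CLI already selects snapshots without a name, e.g. by label (a sketch; the label key/value is hypothetical, and the --selector flag is how I recall the CLI):

# list snapshots carrying a given label
hcloud image list --type snapshot --selector caph-image-name=my-k8s-image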

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]

Environment:

  • cluster-api-provider-hetzner version:
  • Kubernetes version: (use kubectl version)
  • OS (e.g. from /etc/os-release):

Public IPv4 network settings for machine templates apply incorrectly

/kind bug

After discovering the functionality to disable public network access does exist, I gave it a try and found it was not applying the configuration correctly. See the comments below.

Also there is a typo in the documentation.

Original Request for Optional Public IPv4 addresses for nodes

IPv4 addresses are optional on Hetzner instances and are not strictly necessary when nodes are only intended to be accessible within the cluster and/or external access is always done in front of a load balancer. Alternatively, one might want to exclusively use IPv6 for external access.

It would be nice to recover the expense of IPv4 addresses for pools of nodes that do not strictly need those addresses.

Environment:

  • cluster-api-provider-hetzner version: 1.0.0-beta.3
  • Kubernetes version: 1.25.2
  • OS: N/A

controlplane bootstrap fails/stuck after first controlplane node

/kind bug

What steps did you take and what happened:
I tried to deploy a k8s cluster using the Hetzner CAPI provider, but the control plane is not able to get healthy.
The deployment is stuck after the first control plane node.

My steps:

clusterctl generate provider --infrastructure hetzner:v1.0.0-alpha.20 > hetzner-capi.yml
kubectl apply -f hetzner-capi.yml

export HCLOUD_TOKEN=xxxxxxxxxxxx
export HCLOUD_SSH_KEY="mykey"
export CLUSTER_NAME="mycluster"
export HCLOUD_REGION="fsn1"
export CONTROL_PLANE_MACHINE_COUNT=3
export WORKER_MACHINE_COUNT=3
export KUBERNETES_VERSION=1.24.1
export HCLOUD_CONTROL_PLANE_MACHINE_TYPE=cpx31
export HCLOUD_WORKER_MACHINE_TYPE=cpx31

kubectl create secret generic hetzner --from-literal=hcloud=${HCLOUD_TOKEN}
kubectl patch secret hetzner -p '{"metadata":{"labels":{"clusterctl.cluster.x-k8s.io/move":""}}}'
clusterctl generate cluster --infrastructure hetzner:v1.0.0-alpha.20 ${CLUSTER_NAME} > ${CLUSTER_NAME}.yaml
kubectl apply -f ${CLUSTER_NAME}.yaml

The cluster is created but the control plane is not able to get healthy:

$ clusterctl describe cluster ${CLUSTER_NAME}
NAME                                                                       READY  SEVERITY  REASON                       SINCE  MESSAGE                                                                  
Cluster/mycluster                                                          False  Warning   ScalingUp                    71m    Scaling up control plane to 3 replicas (actual 1)                        
├─ClusterInfrastructure - HetznerCluster/mycluster                                                                                                                                                       
├─ControlPlane - KubeadmControlPlane/mycluster-control-plane               False  Warning   ScalingUp                    71m    Scaling up control plane to 3 replicas (actual 1)                        
│ └─Machine/mycluster-control-plane-x5jb5                                  False  Warning   NodeStartupTimeout           49m    Node failed to report startup in &Duration{Duration:20m0s,}              
│   └─MachineInfrastructure - HCloudMachine/mycluster-control-plane-prxwq                                                                                                                                
└─Workers                                                                                                                                                                                                
  └─MachineDeployment/mycluster-md-0                                       False  Warning   WaitingForAvailableMachines  72m    Minimum availability requires 3 replicas, current 0 available            
    └─3 Machines...                                                        True                                          7m53s  See mycluster-md-0-59f5696b48-khjkp, mycluster-md-0-59f5696b48-v57kg, ...

$ kubectl get KubeadmControlPlane
NAME                          CLUSTER         INITIALIZED   API SERVER AVAILABLE   REPLICAS   READY   UPDATED   UNAVAILABLE   AGE   VERSION
mycluster-control-plane       mycluster       true                                 1                  1         1             73m   v1.24.1

$ kubectl describe KubeadmControlPlane mycluster-control-plane
.....
Events:
  Type     Reason                 Age                    From                              Message
  ----     ------                 ----                   ----                              -------
  Warning  ControlPlaneUnhealthy  3m28s (x285 over 73m)  kubeadm-control-plane-controller  Waiting for control plane to pass preflight checks to continue reconciliation: [machine mycluster-control-plane-x5jb5 does not have APIServerPodHealthy condition, machine mycluster-control-plane-x5jb5 does not have ControllerManagerPodHealthy condition, machine mycluster-control-plane-x5jb5 does not have SchedulerPodHealthy condition, machine mycluster-control-plane-x5jb5 does not have EtcdPodHealthy condition, machine mycluster-control-plane-x5jb5 does not have EtcdMemberHealthy condition]

$ kubectl get MachineHealthCheck
NAME                                       CLUSTER         EXPECTEDMACHINES   MAXUNHEALTHY   CURRENTHEALTHY   AGE
mycluster-control-plane-unhealthy-5m       mycluster       1                  100%                            74m
mycluster-md-0-unhealthy-5m                mycluster       3                  100%                            74m

$ kubectl describe MachineHealthCheck mycluster-control-plane-unhealthy-5m
Events:
  Type     Reason          Age                 From                           Message
  ----     ------          ----                ----                           -------
  Warning  ReconcileError  75m (x13 over 75m)  machinehealthcheck-controller  error creating client and cache for remote cluster: error fetching REST client config for remote cluster "default/mycluster": failed to retrieve kubeconfig secret for Cluster default/mycluster: secrets "mycluster-kubeconfig" not found
  Warning  ReconcileError  74m                 machinehealthcheck-controller  error creating client and cache for remote cluster: error creating dynamic rest mapper for remote cluster "default/mycluster": Get "https://142.132.240.114:443/api?timeout=10s": dial tcp 142.132.240.114:443: i/o timeout
  Warning  ReconcileError  73m (x4 over 74m)   machinehealthcheck-controller  error creating client and cache for remote cluster: error creating dynamic rest mapper for remote cluster "default/mycluster": context deadline exceeded

$ kubectl get secrets
NAME                                TYPE                                  DATA   AGE
default-token-vcccj                 kubernetes.io/service-account-token   3      5h35m
hetzner                             Opaque                                1      78m
mycluster-ca                        cluster.x-k8s.io/secret               2      76m
mycluster-control-plane-xkvnf       cluster.x-k8s.io/secret               2      76m
mycluster-etcd                      cluster.x-k8s.io/secret               2      76m
mycluster-kubeconfig                cluster.x-k8s.io/secret               1      76m
mycluster-md-0-48wdl                cluster.x-k8s.io/secret               2      12m
mycluster-md-0-rlccb                cluster.x-k8s.io/secret               2      13m
mycluster-md-0-x2q8s                cluster.x-k8s.io/secret               2      12m
mycluster-proxy                     cluster.x-k8s.io/secret               2      76m
mycluster-sa                        cluster.x-k8s.io/secret               2      76m

# Get the node status of the deployed cluster
$ kubectl get no --kubeconfig mycluster
NAME                            STATUS   ROLES              AGE    VERSION
mycluster-control-plane-x5jb5   NotReady    control-plane   74m    v1.24.1
mycluster-md-0-khjkp            NotReady    <none>          72m    v1.24.1
mycluster-md-0-v57kg            NotReady    <none>          72m    v1.24.1
mycluster-md-0-x5d32            NotReady    <none>          72m    v1.24.1

# Try to fetch API - possible
$ curl https://142.132.240.114:443/api?timeout=10s -k
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "forbidden: User \"system:anonymous\" cannot get path \"/api\"",
  "reason": "Forbidden",
  "details": {},
  "code": 403
}

I tested hetzner:v1.0.0-alpha.19 and hetzner:v1.0.0-alpha.20, but I get the same result.

Environment:

  • cluster-api-provider-hetzner version: v1.0.0-alpha.19 and v1.0.0-alpha.20
  • Kubernetes version:
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.4", GitCommit:"b695d79d4f967c403a96986f1750a35eb75e75f1", GitTreeState:"clean", BuildDate:"2021-11-17T15:48:33Z", GoVersion:"go1.16.10", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.6", GitCommit:"42a9a90338d705a1650fb68b7891f84b62adb0b0", GitTreeState:"clean", BuildDate:"2022-06-15T04:25:21Z", GoVersion:"go1.16.12", Compiler:"gc", Platform:"linux/amd64"}
  • OS:
    • Server image: ubuntu 20.04
    • client: OSX

Using this on 3 physical servers with software RAID 1?

/kind feature

Describe the solution you'd like
We are running k8s clusters, currently set up by Puppet, which are configured like this:
software RAID 1 on NVMe disks (fast but smaller) and on HDDs (slow but large),
and we put the OS on a small separate software RAID 1 on HDD; the rest is just unpartitioned disk, grabbed by Ceph.
We then set up Rook/Ceph to use the HDDs (with a 1 Gb interface, fast disks are not going to help Ceph anyway), and we set up a local ZFS storage class for the NVMe disks and use those for HA postgres-operator clusters (with HA handled in the operator, failover and data sync happen in the app, so fast local disk is best).
We then use the Hetzner Docker image that manages a floating IP and points it to the node on which it is running, creating an HA bare metal cluster where the floating IP is simply moved to a different node if the active ingress server dies.
I am trying to figure out how to use this provider to roll out such a cluster setup on 3 new bare metal servers at Hetzner.

ARM compatible image

/kind feature

Describe the solution you'd like
[A clear and concise description of what you want to happen.]

Make image Multi-Arch / ARM compatible

Environment:

  • OS (e.g. from /etc/os-release): macOS 12.2.1 / Raspberry Pi OS (64-bit)

If the load balancer is protected, do not try to delete it

/kind bug

If a load balancer is protected and cannot be deleted by the controller, the system currently retries forever. We should check on deletion whether the LB is protected and then skip the deletion (maybe also emitting an error or event).
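
Until then, the protection state can be checked and lifted manually with the hcloud CLI (a sketch; the subcommand names are from memory, my-cluster-lb is a placeholder):

# inspect the load balancer, then drop its delete protection so the controller can clean it up
hcloud load-balancer describe my-cluster-lb
hcloud load-balancer disable-protection my-cluster-lb delete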

Breaking changes documentation

First off, thanks a bunch for the amazing work on this cluster api provider!
We're using alpha.13 for our development clusters, and so far it's been pretty awesome in tandem with Talos!
I was considering using this to spin up a new production cluster; however, I saw that there are some breaking changes from .16 to .17, but I wasn't able to see what needs to be done in order to migrate from an older version like .13 to .17 :-)

Is there any way to figure out what the breaking change is?

Thanks!

Upgrade controlplane to a more powerful node

/kind bug

Hi, I am asking myself what would be the best way to upgrade the control plane server:
cx21 -> cx31

I've created a helm chart for this:

Before

control_plane_machine_count: 1
hcloud_control_plane_machine_type: cx21

After

control_plane_machine_count: 1
hcloud_control_plane_machine_type: cx31

But upgrading the Helm chart did not have the intended effect of migrating everything to a cx31 server.
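
Since Cluster API treats machine templates as immutable, the usual pattern seems to be to create a second HCloudMachineTemplate with the bigger type and point the KubeadmControlPlane at it; a sketch (resource names hypothetical, and the type field is how I recall the CRD):

# dump the existing control plane template ...
kubectl get hcloudmachinetemplate my-cluster-control-plane -o yaml > cp-cx31.yaml
# ... edit it: new metadata.name (e.g. my-cluster-control-plane-cx31), type: cx31,
#     and strip resourceVersion/uid/creationTimestamp/status, then apply it
kubectl apply -f cp-cx31.yaml
# switch the control plane over; CAPI then rolls out new cx31 machines
kubectl patch kubeadmcontrolplane my-cluster-control-plane --type merge \
  -p '{"spec":{"machineTemplate":{"infrastructureRef":{"name":"my-cluster-control-plane-cx31"}}}}'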

Environment:

  • cluster-api-provider-hetzner version: v1.0.0-alpha.11
  • Kubernetes version: (use kubectl version) 1.23.4
  • OS (e.g. from /etc/os-release): fedora-35

Please make SSH keys optional for a clean Talos setup

/kind feature

Describe the solution you'd like
Specification of SSH Keys should be optional

Anything else you would like to add:
Talos doesn't provide shell access, so there is no point in configuring an SSH key for Talos-based nodes. As it is currently, I have to configure a dummy key only because of https://github.com/syself/cluster-api-provider-hetzner/blob/main/pkg/services/hcloud/server/server.go#L233

Environment:

  • cluster-api-provider-hetzner version: v1.0.0-alpha.14

error during placement (resource_unavailable)

/kind bug

What steps did you take and what happened:
I ran into an issue today where a server is not created. I got the following error:

{"level":"ERROR","time":"2022-02-22T10:12:50.710Z","logger":"controller.hcloudmachine","file":"controller/controller.go:317","message":"Reconciler error","reconciler group":"infrastructure.cluster.x-k8s.io","reconciler kind":"HCloudMachine","name":"test-control-plane-b8a01-f65ww","namespace":"cluster","error":"failed to reconcile server for HCloudMachine cluster/test-control-plane-b8a01-f65ww: failed to create server: error while creating HCloud server test-control-plane-b8a01-f65ww: error during placement (resource_unavailable)","errorVerbose":"error while creating HCloud server test-control-plane-b8a01-f65ww: error during placement (resource_unavailable)\nfailed to create server\ngithub.com/syself/cluster-api-provider-hetzner/pkg/services/hcloud/server.(Service).Reconcile\n\t/workspace/pkg/services/hcloud/server/server.go:78\ngithub.com/syself/cluster-api-provider-hetzner/controllers.(HCloudMachineReconciler).reconcileNormal\n\t/workspace/controllers/hcloudmachine_controller.go:180\ngithub.com/syself/cluster-api-provider-hetzner/controllers.(HCloudMachineReconciler).Reconcile\n\t/workspace/controllers/hcloudmachine_controller.go:141\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:227\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1581\nfailed to reconcile server for HCloudMachine cluster/test-control-plane-b8a01-f65ww\ngithub.com/syself/cluster-api-provider-hetzner/controllers.(HCloudMachineReconciler).reconcileNormal\n\t/workspace/controllers/hcloudmachine_controller.go:181\ngithub.com/syself/cluster-api-provider-hetzner/controllers.(HCloudMachineReconciler).Reconcile\n\t/workspace/controllers/hcloudmachine_controller.go:141\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:227\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1581","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:317\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email 
protected]/pkg/internal/controller/controller.go:227"}

What did you expect to happen:
The server should be created.

Anything else you would like to add:
It looks like the controller is not requeuing after the error. The workaround is to restart the controller.

Environment:

  • cluster-api-provider-hetzner version: v1.0.0-alpha.10
  • Kubernetes version: v1.23.3
  • OS: fedora-35
