
metal-ccm's Introduction

Kubernetes Cloud Controller Manager for metal-stack

metal-ccm is the Kubernetes cloud controller manager (CCM) implementation for metal-stack.

Deploy

Read how to deploy the metal CCM here!

Building

To build the binary, run:

make build

It will deposit the binary for your local architecture as dist/bin/metal-cloud-controller-manager-$(ARCH).

By default, make build builds the binary inside a Docker container. To build with your locally installed Go toolchain instead, run:

make build LOCALBUILD=true

Docker Image

To build a docker image, run:

make dockerimage

The image will be tagged with :latest.

metal-ccm's People

Contributors

dependabot[bot], gerrit91, kolsa, majst01, mwennrich, mwindower


metal-ccm's Issues

Explicitly set service's LoadBalancerIP in EnsureLoadBalancer

Sometimes we see IP addresses on services of type LoadBalancer that differ from the ones we acquired through the CCM. For Gardener this means that the cluster does not come up, because DNS entries are created for the wrong IP address, among other problems.

I think it is not guaranteed that the service gets the acquired IP address when we only return it in the load balancer status.

Other CCM implementations also update / patch the service objects directly.

Maybe we should just set the LoadBalancerIP field so that MetalLB explicitly uses the IP that we acquired with the CCM.
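
A minimal sketch of that idea, assuming a controller that holds a client-go clientset; the LoadBalancerController type and the acquireIP helper are hypothetical stand-ins, not the actual code:

package loadbalancer

import (
	"context"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

type LoadBalancerController struct {
	K8sClient kubernetes.Interface
}

// acquireIP stands in for the CCM's actual IP acquisition at the metal-api.
func (l *LoadBalancerController) acquireIP(ctx context.Context, service *v1.Service) (string, error) {
	return "212.34.83.2", nil // placeholder
}

// EnsureLoadBalancer, sketched: besides returning the acquired IP in the load
// balancer status, also persist it in spec.loadBalancerIP so that MetalLB
// explicitly uses this address.
func (l *LoadBalancerController) EnsureLoadBalancer(ctx context.Context, clusterName string, service *v1.Service, nodes []*v1.Node) (*v1.LoadBalancerStatus, error) {
	ip, err := l.acquireIP(ctx, service)
	if err != nil {
		return nil, err
	}
	if service.Spec.LoadBalancerIP == "" {
		updated := service.DeepCopy()
		updated.Spec.LoadBalancerIP = ip
		if _, err := l.K8sClient.CoreV1().Services(service.Namespace).Update(ctx, updated, metav1.UpdateOptions{}); err != nil {
			return nil, err
		}
	}
	return &v1.LoadBalancerStatus{Ingress: []v1.LoadBalancerIngress{{IP: ip}}}, nil
}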

IP not freed when primary asn node label is missing

This leads to unused IP addresses:

cloud-controller-manager-78d46b8cd-lfwhp cloud-controller-manager I1030 10:58:03.057786       1 event.go:209] Event(v1.ObjectReference{Kind:"Service", Namespace:"kube-system", Name:"vpn-shoot", UID:"f2bcc242-cc87-4bf1-a5a1-9ca51676fab4", APIVersion:"v1", ResourceVersion:"259", FieldPath:""}): type: 'Normal' reason: 'EnsuringLoadBalancer' Ensuring load balancer
cloud-controller-manager-78d46b8cd-lfwhp cloud-controller-manager I1030 10:58:03.654823       1 log.go:172] metal-ccm housekeeping | metallb syncher failed: error updating metallb config: node "shoot--local--fra-equ01-default-worker-z1-75b96486f8-8fvpz" misses label: machine.metal-pod.io/network.primary.asn
cloud-controller-manager-78d46b8cd-lfwhp cloud-controller-manager I1030 10:58:03.741365       1 log.go:172] metal-ccm loadbalancer | acquired ip in network "internet-fra-equ01": 212.34.83.2
cloud-controller-manager-78d46b8cd-lfwhp cloud-controller-manager E1030 10:58:04.051544       1 service_controller.go:219] error processing service kube-system/vpn-shoot (will retry): failed to ensure load balancer for service kube-system/vpn-shoot: node "shoot--local--fra-equ01-default-worker-z1-75b96486f8-8fvpz" misses label: machine.metal-pod.io/network.primary.asn
cloud-controller-manager-78d46b8cd-lfwhp cloud-controller-manager I1030 10:58:04.052051       1 event.go:209] Event(v1.ObjectReference{Kind:"Service", Namespace:"kube-system", Name:"vpn-shoot", UID:"f2bcc242-cc87-4bf1-a5a1-9ca51676fab4", APIVersion:"v1", ResourceVersion:"259", FieldPath:""}): type: 'Warning' reason: 'CreatingLoadBalancerFailed' Error creating load balancer (will retry): failed to ensure load balancer for service kube-system/vpn-shoot: node "shoot--local--fra-equ01-default-worker-z1-75b96486f8-8fvpz" misses label: machine.metal-pod.io/network.primary.asn
...
cloud-controller-manager-78d46b8cd-lfwhp cloud-controller-manager I1030 10:58:09.051825       1 service_controller.go:300] Ensuring LB for service kube-system/vpn-shoot
cloud-controller-manager-78d46b8cd-lfwhp cloud-controller-manager I1030 10:58:09.052160       1 log.go:172] metal-ccm loadbalancer | EnsureLoadBalancer: clusterName "", namespace "kube-system", serviceName "vpn-shoot", nodes "shoot--local--fra-equ01-default-worker-z1-75b96486f8-8fvpz"
cloud-controller-manager-78d46b8cd-lfwhp cloud-controller-manager I1030 10:58:09.052720       1 event.go:209] Event(v1.ObjectReference{Kind:"Service", Namespace:"kube-system", Name:"vpn-shoot", UID:"f2bcc242-cc87-4bf1-a5a1-9ca51676fab4", APIVersion:"v1", ResourceVersion:"259", FieldPath:""}): type: 'Normal' reason: 'EnsuringLoadBalancer' Ensuring load balancer
cloud-controller-manager-78d46b8cd-lfwhp cloud-controller-manager I1030 10:58:09.172871       1 log.go:172] metal-ccm loadbalancer | acquired ip in network "internet-fra-equ01": 212.34.83.3
cloud-controller-manager-78d46b8cd-lfwhp cloud-controller-manager I1030 10:58:09.296879       1 log.go:172] metal-ccm loadbalancer | metallb config updated successfully
cloud-controller-manager-78d46b8cd-lfwhp cloud-controller-manager I1030 10:58:09.297408       1 event.go:209] Event(v1.ObjectReference{Kind:"Service", Namespace:"kube-system", Name:"vpn-shoot", UID:"f2bcc242-cc87-4bf1-a5a1-9ca51676fab4", APIVersion:"v1", ResourceVersion:"259", FieldPath:""}): type: 'Normal' reason: 'EnsuredLoadBalancer' Ensured load balancer
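
One hedged way to avoid the leak, extending the sketch above with hypothetical helpers (acquireIP, releaseIP, and updateMetalLBConfig are not the actual functions): roll the acquisition back when a later step fails.

// ensureIP acquires an IP and releases it again if updating the MetalLB
// config fails (e.g. because of the missing ASN node label), so the address
// is not left allocated but unused.
func (l *LoadBalancerController) ensureIP(ctx context.Context, service *v1.Service, nodes []*v1.Node) (ip string, retErr error) {
	ip, retErr = l.acquireIP(ctx, service) // hypothetical helper
	if retErr != nil {
		return "", retErr
	}
	defer func() {
		if retErr != nil {
			_ = l.releaseIP(ctx, ip) // hypothetical helper, best-effort rollback
		}
	}()
	retErr = l.updateMetalLBConfig(ctx, nodes) // hypothetical helper
	return ip, retErr
}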

Node shutdown not implemented sufficiently

I saw in #71 that we implemented machine shutdown like this:

// InstanceShutdown returns true if the instance is shutdown according to the cloud provider.
// Use the node.name or node.spec.providerID field to find the node in the cloud provider.
func (i *InstancesController) InstanceShutdown(ctx context.Context, node *v1.Node) (bool, error) {
	klog.Infof("InstanceShutdown: node %q", node.GetName())
	machine, err := i.MetalService.GetMachineFromProviderID(ctx, node.Spec.ProviderID)
	if err != nil || machine.Allocation == nil {
		return true, err
	}
	return false, nil
}

We can now actually tell whether a machine is shut down from its IPMI information, so we could adjust the implementation accordingly.
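
A hedged sketch of what that could look like; the IPMI field names below are assumptions, not taken from the actual metal-go models:

// InstanceShutdown, sketched: consult the machine's IPMI power state instead
// of only checking whether an allocation exists.
func (i *InstancesController) InstanceShutdown(ctx context.Context, node *v1.Node) (bool, error) {
	machine, err := i.MetalService.GetMachineFromProviderID(ctx, node.Spec.ProviderID)
	if err != nil {
		return false, err
	}
	if machine.Allocation == nil {
		// the machine is no longer allocated, treat it as shut down
		return true, nil
	}
	// assumed fields: the metal-api reports the IPMI power state per machine
	if machine.IPMI != nil && machine.IPMI.PowerState == "OFF" {
		return true, nil
	}
	return false, nil
}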

Remove superfluous log message

cloud-controller-manager-7bfdbc5f6c-c8rcd cloud-controller-manager I1105 15:53:32.544159       1 log.go:172] metal-ccm housekeeping | node was not modified and calico tunnel address has not changed, not updating metallb config

CCM configures all external networks with IP auto-assignment for metalLB

Not fully sure if this is a real issue at the moment, but we saw that during Kubernetes cluster creation the kube-apiserver was given an IP address from an external network that has no internet connectivity. This prevented the cluster from coming up.

I would assume that only the default external network should have IP auto-assignment enabled for MetalLB?
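
A sketch of that proposal, assuming the CCM renders its MetalLB address pools from Go structs (the struct and helper below are illustrative, not the actual code): auto-assign is enabled only for the pool of the default external network.

package metallb

// addressPool mirrors a MetalLB (ConfigMap format) address pool entry.
type addressPool struct {
	Name       string   `yaml:"name"`
	Protocol   string   `yaml:"protocol"`
	AutoAssign *bool    `yaml:"auto-assign,omitempty"`
	Addresses  []string `yaml:"addresses"`
}

// newPool enables auto-assign only when the pool belongs to the default
// external network, so other external networks never hand out IPs implicitly.
func newPool(networkID, defaultExternalNetworkID string, cidrs []string) addressPool {
	autoAssign := networkID == defaultExternalNetworkID
	return addressPool{
		Name:       networkID + "-ephemeral", // illustrative naming scheme
		Protocol:   "bgp",
		AutoAssign: &autoAssign,
		Addresses:  cidrs,
	}
}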

Tag deletion can go wrong

It seems possible that deleting a tag on an IP address fails, in which case the IP address keeps this tag for the rest of its allocation time.

Is there a way we can reconcile the tag state of IP addresses?
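
One possible answer, as a rough sketch in which every helper is hypothetical: periodically compare the service tags on allocated IPs against the services that still exist in the cluster and drop stale tags.

// reconcileIPTags is a hypothetical housekeeping loop body: every tag that
// references a service which no longer exists is removed from the IP.
func (h *Housekeeper) reconcileIPTags(ctx context.Context) error {
	ips, err := h.listClusterIPs(ctx) // hypothetical: all IPs of this cluster
	if err != nil {
		return err
	}
	for _, ip := range ips {
		for _, tag := range ip.Tags {
			svc, ok := parseServiceTag(tag) // hypothetical: namespace/name from tag
			if !ok {
				continue
			}
			if !h.serviceExists(ctx, svc) { // hypothetical informer lookup
				if err := h.removeIPTag(ctx, ip, tag); err != nil { // hypothetical
					return err
				}
			}
		}
	}
	return nil
}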

region not set correctly

region is not set correctly, which leads to inconsistencies in the node labels:

    failure-domain.beta.kubernetes.io/region: fel-wps101
    failure-domain.beta.kubernetes.io/zone: fel-wps101
    (...)
    topology.kubernetes.io/region: fel
    topology.kubernetes.io/zone: fel-wps101

Default external network algorithm is too opinionated

Finding the default external network from which to acquire IPs is pretty opinionated:

  • Prefer partition networks with network IDs that start with internet
  • Otherwise try to find an external network called internet

Firstly, it's a questionable convention that every adopter must name their networks like this. Secondly, and more importantly, this algorithm can lead to confusing behavior because the CCM decides on its own where to acquire IPs from. We currently have the problem that we want to slowly migrate from partition-specific internet networks to a global internet network. However, the CCM always prefers the partition-bound internet network over the global one, so all new clusters still use the partition-bound network, which makes the migration very hard.

This PR therefore requires an external network ID to be configured explicitly. The given network is then used for acquiring IP addresses whenever the user has not specified any specific pools.

Regarding Gardener integration the idea is the following:

  • Introduce a new field in the control plane provider config, e.g. cloudControllerManager.defaultExternalNetwork
  • If this field is undefined:
    • Search for networks that have a tag called network.metal-stack.io/default-external (needs to be introduced; alternatively use the internet network ID convention as before)
    • Iterate over the networks of the firewall config and check whether each network is contained in the search result
    • Take the first network that meets this condition; if there is no such network, fail the reconciliation (see the sketch below)
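
A compact sketch of that lookup; the function and parameter names are assumptions:

package config

import "fmt"

// defaultExternalNetwork resolves the network to acquire IPs from: an
// explicitly configured ID wins, otherwise the first firewall network that
// carries the network.metal-stack.io/default-external tag is taken.
func defaultExternalNetwork(configured string, firewallNetworks []string, taggedNetworks map[string]bool) (string, error) {
	if configured != "" {
		return configured, nil
	}
	for _, n := range firewallNetworks {
		if taggedNetworks[n] {
			return n, nil
		}
	}
	return "", fmt.Errorf("no default external network found, failing the reconciliation")
}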

External IP address of service load balancer does not update status

Given a service of type LoadBalancer in the cluster, metal-ccm properly acquires the IP at the metal-api and writes it into the service's load balancer status field:

k get svc -o yaml ingress-nginx 
apiVersion: v1
kind: Service
metadata:
  ...
  finalizers:
  - service.kubernetes.io/load-balancer-cleanup
  labels:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
  name: ingress-nginx
  namespace: ingress-nginx
  ...
spec:
  clusterIP: 10.244.73.145
  externalTrafficPolicy: Local
  healthCheckNodePort: 30904
  ports:
  - name: http
    nodePort: 30469
    port: 80
    protocol: TCP
    targetPort: http
  - name: https
    nodePort: 31565
    port: 443
    protocol: TCP
    targetPort: https
  selector:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
  sessionAffinity: None
  type: LoadBalancer
status:
  loadBalancer:
    ingress:
    - ip: 185.153.67.128

When you do the following:

  • Delete this service again
  • (metal-ccm releases the IP)
  • (Someone else acquires the IP)
  • And you recreate this service

Then:

  • metal-ccm will acquire a new IP address
  • The service will get the old IP address instead of the new one
  • The old IP is no longer in the MetalLB config, so that part works fine
  • The service refuses to take any other IP address, even when trying to set it explicitly via the LoadBalancerIP field:
k get svc
NAME            TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
ingress-nginx   LoadBalancer   10.244.73.145   <pending>     80:30469/TCP,443:31565/TCP   13m
k get svc -o yaml ingress-nginx 
apiVersion: v1
kind: Service
metadata:
  ...
  finalizers:
  - service.kubernetes.io/load-balancer-cleanup
  labels:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
  name: ingress-nginx
  namespace: ingress-nginx
  ...
spec:
  clusterIP: 10.244.73.145
  externalTrafficPolicy: Local
  healthCheckNodePort: 30904
  loadBalancerIP: 185.153.67.234
  ports:
  - name: http
    nodePort: 30469
    port: 80
    protocol: TCP
    targetPort: http
  - name: https
    nodePort: 31565
    port: 443
    protocol: TCP
    targetPort: https
  selector:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
  sessionAffinity: None
  type: LoadBalancer
status:
  loadBalancer: {}

Disable auto-assign for ephemeral IPs after Gardener 1.18

With version 1.18 of Gardener the LoadBalancerIP field of a service is preserved: gardener-attic/gardener-resource-manager#108

The auto-assign feature is then not needed anymore because we set the loadBalancerIP field of a service directly. Auto-assignment carries the danger of invalidating our label associations for IPs.

Because we currently have very few IPs available in our environments, we see weird issues like IPs moving over from one service to another (from vpn-shoot to an ingress) when certain nodes are restarted. Disabling auto-assign will eliminate this behavior.

NPE makes the cluster unhealthy

In v0.7.8.

Maybe the machine was deleted manually:

cloud-controller-manager-d6f9654fd-8qnkt cloud-controller-manager I0625 09:00:30.495942       1 controller.go:701] Successfully updated 0 out of 0 load balancers to direct traffic to the updated set of nodes
cloud-controller-manager-d6f9654fd-8qnkt cloud-controller-manager I0625 09:00:30.496463       1 instances.go:197] InstanceMetadata: node "shoot--5435c37fd9--albatross-group-0-6fd98-9j9w8"
cloud-controller-manager-d6f9654fd-8qnkt cloud-controller-manager panic: runtime error: invalid memory address or nil pointer dereference
cloud-controller-manager-d6f9654fd-8qnkt cloud-controller-manager [signal SIGSEGV: segmentation violation code=0x1 addr=0x30 pc=0x1d71ca0]
cloud-controller-manager-d6f9654fd-8qnkt cloud-controller-manager 
cloud-controller-manager-d6f9654fd-8qnkt cloud-controller-manager goroutine 563 [running]:
cloud-controller-manager-d6f9654fd-8qnkt cloud-controller-manager github.com/metal-stack/metal-ccm/pkg/controllers/housekeeping.(*Housekeeper).getMachineTags(0x26ff038?, {0xc000594000?, 0x0?, 0x0?})
cloud-controller-manager-d6f9654fd-8qnkt cloud-controller-manager       github.com/metal-stack/metal-ccm/pkg/controllers/housekeeping/tags.go:76 +0xa0
cloud-controller-manager-d6f9654fd-8qnkt cloud-controller-manager github.com/metal-stack/metal-ccm/pkg/controllers/housekeeping.(*Housekeeper).syncMachineTagsToNodeLabels(0xc000894b00)
cloud-controller-manager-d6f9654fd-8qnkt cloud-controller-manager       github.com/metal-stack/metal-ccm/pkg/controllers/housekeeping/tags.go:35 +0xcc
cloud-controller-manager-d6f9654fd-8qnkt cloud-controller-manager github.com/metal-stack/metal-ccm/pkg/controllers/housekeeping.(*tickerSyncer).Start(0x442f65?, {0x2381f5c, 0xc}, 0x9071ea?, 0xc000b11200, 0xc000efc820)
cloud-controller-manager-d6f9654fd-8qnkt cloud-controller-manager       github.com/metal-stack/metal-ccm/pkg/controllers/housekeeping/ticker.go:21 +0xac
cloud-controller-manager-d6f9654fd-8qnkt cloud-controller-manager created by github.com/metal-stack/metal-ccm/pkg/controllers/housekeeping.(*Housekeeper).startTagSynching
cloud-controller-manager-d6f9654fd-8qnkt cloud-controller-manager       github.com/metal-stack/metal-ccm/pkg/controllers/housekeeping/tags.go:23 +0xd8
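
A hedged sketch of a guard for getMachineTags (the model and field names are assumed, not taken from metal-go): skip machines that come back nil or without an allocation instead of dereferencing them.

// getMachineTags, sketched: tolerate machines that were deleted manually and
// therefore have no resolvable machine object or allocation anymore.
func (h *Housekeeper) getMachineTags(machines []*models.V1MachineResponse) map[string][]string {
	tags := map[string][]string{}
	for _, m := range machines {
		if m == nil || m.ID == nil || m.Allocation == nil {
			// possibly deleted manually; skip instead of panicking
			continue
		}
		tags[*m.ID] = m.Tags
	}
	return tags
}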
