
Comments (7)

iron-sam commented on August 19, 2024

I opened an issue on github.com/drone/drone and only then realized that this one already exists. We are facing the same problem. Here is the text of my issue, since it may add some more information:

Problem

When Drone configures the affinity of a pipeline's pods, it sets this chunk of configuration:

spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - ip-10-100-45-161.eu-west-1.compute.internal

Correct me if I'm wrong, but this affinity seems to be set based on the pipeline pod's spec.nodeName. On AWS EKS, worker node names look like ${HOST_DNS}.${AWS_REGION}.compute.internal, but the Kubernetes label kubernetes.io/hostname is set to the actual host name of the EC2 instance, ${HOST_DNS}, e.g.:

kubectl get nodes -L kubernetes.io/hostname
NAME                                          STATUS   ROLES    AGE    VERSION   HOSTNAME
ip-10-100-15-22.eu-west-1.compute.internal    Ready    <none>   12d    v1.12.7   ip-10-100-15-22
ip-10-100-45-161.eu-west-1.compute.internal   Ready    <none>   121m   v1.12.7   ip-10-100-45-161
ip-10-100-63-36.eu-west-1.compute.internal    Ready    <none>   11d    v1.12.7   ip-10-100-63-36

So, when the pipeline creates a new namespace to run all of its steps, the child jobs carry the affinity shown above and never find a node to schedule their pods on.
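To make the mismatch concrete, here is a minimal Python sketch of how a single `In` match expression on kubernetes.io/hostname behaves against the three nodes listed above. It is only a model of the matching rule, not Drone's or the scheduler's code; `NODE_LABELS` and `matching_nodes` are names made up for this illustration.

```python
# Node name -> value of the kubernetes.io/hostname label,
# copied from the `kubectl get nodes` output above.
NODE_LABELS = {
    "ip-10-100-15-22.eu-west-1.compute.internal": "ip-10-100-15-22",
    "ip-10-100-45-161.eu-west-1.compute.internal": "ip-10-100-45-161",
    "ip-10-100-63-36.eu-west-1.compute.internal": "ip-10-100-63-36",
}


def matching_nodes(values, node_labels):
    """Model a single `In` match expression on kubernetes.io/hostname.

    Returns the names of the nodes whose label value is in `values`.
    (Hypothetical helper, for illustration only.)
    """
    return [name for name, label in node_labels.items() if label in values]


# The affinity above carries the node *name*, so no label value matches.
by_node_name = matching_nodes(
    ["ip-10-100-45-161.eu-west-1.compute.internal"], NODE_LABELS
)

# The label value itself would match exactly one node.
by_label_value = matching_nodes(["ip-10-100-45-161"], NODE_LABELS)

print(by_node_name)    # []
print(by_label_value)  # ['ip-10-100-45-161.eu-west-1.compute.internal']
```

With the node name in `values`, the list of candidate nodes is empty, which is consistent with the pods staying unscheduled.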

Possible solution

You can manually set the kubernetes.io/hostname label on your nodes to the node name, and Drone will then launch pods without errors.
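As a rough sketch of that manual fix, the snippet below only builds the `kubectl label` commands for a list of node names; it does not run them. The helper name `relabel_commands` is hypothetical, and it assumes nothing else in the cluster rewrites the label afterwards.

```python
LABEL = "kubernetes.io/hostname"


def relabel_commands(node_names):
    """Build one `kubectl label` command per node.

    Each command sets kubernetes.io/hostname to the node's own name;
    --overwrite is needed because the label already exists on the node.
    (Hypothetical helper, for illustration only.)
    """
    return [
        f"kubectl label node {name} {LABEL}={name} --overwrite"
        for name in node_names
    ]


# Node names taken from the `kubectl get nodes` output above.
commands = relabel_commands(
    [
        "ip-10-100-15-22.eu-west-1.compute.internal",
        "ip-10-100-45-161.eu-west-1.compute.internal",
        "ip-10-100-63-36.eu-west-1.compute.internal",
    ]
)

for command in commands:
    print(command)
```

Printing the commands first makes it easy to review them before pasting them into a shell.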

But maybe Drone should use another type of affinity, or no affinity at all, though I assume the affinity setting is there for a reason.

Let me know if you need more information.

Thank you for everything!

from drone-runtime.

iron-sam commented on August 19, 2024

I've been trying some workarounds and was able to fix this issue without touching the Drone configuration.

I'm using the EKS Terraform Module to create our cluster and auto scaling groups. Because nodeName != hostname when worker nodes are created, I added one line to user_data to set the hostname equal to nodeName (like ip-XXX-XXX-XXX-XXX.${REGION}.compute.internal), just like this:

# Code from Terraform EKS Module
  worker_groups = [
    {
      key_name             = "${aws_key_pair.key.key_name}"
      name                 = "workers-m5"
      # pre_userdata sets hostname equal to dns
      pre_userdata         = "hostnamectl set-hostname $$( cat /etc/hostname ).${var.aws_region}.compute.internal"
      kubelet_extra_args   = "--node-labels=stage=${var.stage}"
      asg_desired_capacity = 2
      asg_max_size         = 2
      asg_min_size         = 1
      instance_type        = "m5.large"
      enable_monitoring    = false
      public_ip            = false
      autoscaling_enabled  = false
    }
  ]

New nodes associated with this auto scaling group will be created with the node name and the Kubernetes label kubernetes.io/hostname set to the same value.

Hope this will be helpful to someone. :)


totogo commented on August 19, 2024

@iron-sam I'm using the workaround you mentioned, and it works:

You can manually set the kubernetes.io/hostname label on your nodes to the node name, and Drone will then launch pods without errors.

But we are using Rancher to manage the Kubernetes cluster, so we might have issues when we scale the cluster nodes up and down.


iron-sam commented on August 19, 2024

@totogo I don't use Rancher, but the idea is to set the host name of the server to the DNS name assigned by AWS + region + compute.internal. Check whether Rancher allows adding a user data script on instance creation.


HighwayofLife commented on August 19, 2024

Rancher does have a hostname override option.


bradrydzewski commented on August 19, 2024

We are overhauling the kubernetes runtime, and should have an alpha available by the end of next week. The runtime is standalone and is being developed at https://github.com/drone-runners/drone-runner-kube

This new implementation is better aligned with kubernetes and executes all pipeline steps in a single pod, similar to Tekton, as opposed to executing each step in its own pod. Since all steps are executed in the same pod we no longer need to rely on affinity, which means this issue goes away :)


bradrydzewski commented on August 19, 2024

Closing this since docs for the new (still experimental) kubernetes runner will be posted today for early adopters. This iteration of the runner does not use node affinity and instead runs all pipeline steps in the same pod.
https://github.com/drone-runners/drone-runner-kube
