
amazon-vpc-resource-controller-k8s's Introduction

amazon-vpc-resource-controller-k8s


Usage

A controller running on the EKS Control Plane that manages Trunk & Branch Network Interfaces for Kubernetes pods using the Security Groups for Pods feature, and IPv4 Address Management (IPAM) for Windows nodes.

The controller broadcasts its version to nodes. Describing any node will provide the version information in node Events. The mapping between the controller's version and the cluster's platform version is also available in release notes.

Security Group for Pods

The controller only manages the Trunk/Branch Network Interfaces for EKS clusters using the Security Groups for Pods feature. The networking on the host is set up by the amazon-vpc-cni-k8s plugin.

ENI Trunking is a private feature, even though the APIs are publicly accessible through the AWS SDK. Hence, attempting to run the controller on your worker nodes to manage Trunk and Branch Network Interfaces for Security Groups for Pods will result in the API calls failing.

Please follow the guide for enabling Security Group for Pods on your EKS Cluster.

Note: The SecurityGroupPolicy CRD only supports up to 5 security groups per custom resource. If you need more than 5 security groups for a pod, please consider using more than one custom resource. For example, you can use two custom resources to associate up to 10 security groups with a pod. Please be aware when doing so:

1. You need to request a limit increase, since the default limit is 5 security groups per network interface and there is currently a hard limit of 16.

2. Fargate currently only allows up to 5 security groups. If you are using Fargate, you can only use up to 5 security groups per pod.

Windows IPv4 Address Management

The controller manages the IPv4 addresses for all Windows nodes in the EKS cluster and allocates IPv4 addresses to Windows pods. The networking on the host is set up by amazon-vpc-cni-plugins.

The controller supports the following modes for IPv4 address management on Windows:

  • Secondary IPv4 address mode → Secondary private IPv4 addresses are assigned to the primary instance ENI and these are allocated to the Windows pods.

    For more details about the high level workflow, please visit our documentation here.

  • Prefix delegation mode → /28 IPv4 prefixes are assigned to the primary instance ENI and the IP addresses from the prefix are allocated to the Windows pods.

    For more details about the configuration options with prefix delegation, please visit our documentation here.

    For more details about the high level workflow, please visit our documentation here.

Please follow this guide for enabling Windows Support on your EKS cluster.

Configuring the controller via amazon-vpc-cni configmap

The controller supports various configuration options for managing security groups for pods and Windows nodes, which can be set via the EKS-managed configmap amazon-vpc-cni. For more details, refer to the security groups for pods configuration options here and the Windows IPAM/PD related configuration options here.

Troubleshooting

For troubleshooting issues related to Security group for pods or Windows IPv4 address management, please visit our troubleshooting guide here.

License

This library is licensed under the Apache 2.0 License.

Contributing

See CONTRIBUTING.md

We would appreciate your feedback and suggestions to improve the project and your experience with EKS and Kubernetes.

amazon-vpc-resource-controller-k8s's People

Contributors

abhipth, dependabot[bot], ellistarn, gnatorx, haouc, ivyostosh, jaydeokar, jchen6585, jdn5126, jiechen0826, jyotimahapatra, krishnaindani, m00nf1sh, nithu0115, oliviassss, orsenthil, rawahars, saranbalaji90, sushrk, wangamel, waterincup, xdu31, ysam12345


amazon-vpc-resource-controller-k8s's Issues

Allow Security Groups to be referenced by Name in Security Group Policies

What would you like to be enhanced:

IIUC, Security Group Policies can only contain a security group ID today. Can this be extended to support security group names as well?

Why is the change needed and what use case will it solve:

In our setup, developers have access to security group names that help them understand what resources they will have access to (e.g. allow-foo-service). This makes it easy for developers to attach the appropriate security groups to their deployments. Supporting this use case would enable us to directly use the interface provided by the VPC Resource Controller.
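The controller does not currently do this, but as a rough sketch of what a name-to-ID lookup could look like with the AWS SDK for Go, assuming the policy resolves names within the cluster's VPC (the function name and filters below are illustrative, not existing controller code):

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/ec2"
)

// resolveSecurityGroupID resolves a security group name to its ID within a VPC.
// Sketch only: SGP does not support names today.
func resolveSecurityGroupID(svc *ec2.EC2, vpcID, groupName string) (string, error) {
	out, err := svc.DescribeSecurityGroups(&ec2.DescribeSecurityGroupsInput{
		Filters: []*ec2.Filter{
			{Name: aws.String("vpc-id"), Values: []*string{aws.String(vpcID)}},
			{Name: aws.String("group-name"), Values: []*string{aws.String(groupName)}},
		},
	})
	if err != nil {
		return "", err
	}
	if len(out.SecurityGroups) == 0 {
		return "", fmt.Errorf("no security group named %q in VPC %s", groupName, vpcID)
	}
	return aws.StringValue(out.SecurityGroups[0].GroupId), nil
}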

Bug: Cannot use PodSecurityGroups with C7G instances - EKS AMI and VPC CNI support C7G's, but vpc-resource-controller does not

Describe the Bug:
C7G instances appear to be supported in EKS, as per these references:

Unfortunately, C7G instances do not appear to support Pod Security Groups. I tried to spin up a cluster and Pod Security Groups didn't work. I suspect it's because we're missing c7g instances from this file:

https://github.com/aws/amazon-vpc-resource-controller-k8s/blob/master/pkg/aws/vpc/limits.go

Could someone confirm? If so, I can try to open a PR.

Observed Behavior:

C7G instances in EKS work, but don't work with PodSecurityGroups.

Expected Behavior:

I can replace my m6g's with c7g's in terraform, and re-apply my infra & configs, getting back an EKS cluster with node_groups and Pods w/ PodSecurityGroups.

Substitute C7G instances in our current configurations, and operate them normally - with PodSecurityGroups.

How to reproduce it (as minimally and precisely as possible):

  • Create EKS cluster
  • Configure for C7Gs
  • Reuse the same configs & deployments as you would for a working, m6g cluster

Additional Context:
N/A

Environment:

  • Kubernetes version (use kubectl version): EKS 1.22.6
  • CNI Version: latest
  • OS (Linux/Windows): OSX

Panic when node is initialized and deleted immediately

Describe the Bug:
Controller panics during the following scenarios.

  1. A new node is created and deleted immediately during initialization.
  2. A node is being deleted and the controller is restarted.

On restart the controller recovers and functions as expected. There is no functional impact for existing workloads, but new pods using security groups will experience startup latency when the above two scenarios are hit.

Observed Behavior:

"level":"info","timestamp":"2020-12-02T22:13:49.164Z","logger":"node manager","msg":"node added to list of managed node","node name":"ip-192-168-79-251.us-west-2.compute.internal","request":"add/update"}
{"level":"info","timestamp":"2020-12-02T22:13:49.164Z","logger":"controller-runtime.controller","msg":"Starting workers","controller":"pod-create","worker count":1}
E1202 22:13:49.276883       1 runtime.go:78] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 342 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic(0x1944ce0, 0x2d54650)
        /go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:74 +0xa3
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
        /go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:48 +0x82
panic(0x1944ce0, 0x2d54650)
        /opt/bitnami/go/src/runtime/panic.go:679 +0x1b2
github.com/aws/amazon-vpc-resource-controller-k8s/pkg/aws/ec2.(*ec2Instance).LoadDetails(0xc000926000, 0x1fc6b80, 0xc000278ed0, 0x0, 0x0)
        /workspace/pkg/aws/ec2/instance.go:102 +0xde
github.com/aws/amazon-vpc-resource-controller-k8s/pkg/node.(*node).InitResources(0xc000080240, 0xc000278ef0, 0x1, 0x1, 0x1fc6b80, 0xc000278ed0, 0x0, 0x0)
        /workspace/pkg/node/node.go:99 +0xbe
github.com/aws/amazon-vpc-resource-controller-k8s/pkg/node.(*manager).performPostUnlockOperation(0xc000104070, 0xc0001ec2a0, 0x2a, 0xc0006e0760, 0x0, 0x0)
        /workspace/pkg/node/manager.go:213 +0x13d
github.com/aws/amazon-vpc-resource-controller-k8s/pkg/node.(*manager).AddOrUpdateNode(0xc000104070, 0xc000848000, 0xc000096000, 0x0)
        /workspace/pkg/node/manager.go:88 +0x8b
github.com/aws/amazon-vpc-resource-controller-k8s/controllers/core.(*NodeReconciler).Reconcile(0xc0003e1cc0, 0x0, 0x0, 0xc0001ec2a0, 0x2a, 0xc000575cd8, 0xc00036f8c0, 0xc00036f838, 0x1f6c9c0)
        /workspace/controllers/core/node_controller.go:62 +0x332
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc00061e180, 0x19ca620, 0xc0006e0400, 0x0)
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:256 +0x162
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc00061e180, 0x1493300)
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:232 +0xcb
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker(0xc00061e180)
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:211 +0x2b
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc000180ff0)
        /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:152 +0x5e
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc000180ff0, 0x3b9aca00, 0x0, 0xc000097201, 0xc0004357a0)
        /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:153 +0xf8
k8s.io/apimachinery/pkg/util/wait.Until(0xc000180ff0, 0x3b9aca00, 0xc0004357a0)
        /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:88 +0x4d
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:193 +0x328
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
        panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x15da7fe]
goroutine 342 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)

How to reproduce it (as minimally and precisely as possible):

  1. Scale the ASG from 100 to 0 and restart the controller or
  2. Scale up the ASG from 0 to 100 and scale down the ASG immediately to 0.

Environment:

  • Kubernetes version (use kubectl version): v1.0.5
  • CNI Version: v1.7.5

Support Prefixes for Windows Worker Instances for Higher Pod Density

Is your feature request related to a problem? Please describe.
Currently the number of Pods that can be scheduled on a Node is limited by the number of Secondary IPv4 Addresses on the Primary ENI. This limits the Pod density on the Windows Nodes.

With the feature of assigning /28 prefixes on an ENI, we can significantly increase the pod density for Nitro-based instances. For instance, the theoretical limit for a c5.large changes from 9 secondary IPs to 9 prefixes × 16 addresses per /28 prefix = 144 addresses.

This would further reduce the number of calls to assign/unassign IPv4 addresses, as IP address allocation from a given prefix can be done locally.

Describe the solution you'd like

  • Support VPC Prefixes using the Primary ENI only for Nitro Instances.

VPCResourceController should release branch-eni resources for terminated Pods.

What would you like to be enhanced:
Pods created by Jobs will enter a terminated state (Succeeded/Failed), and the scheduler treats resources used by terminated pods as released when making scheduling decisions around resource limits.
For the normal CNI workflow, kubelet will invoke DelNetwork for terminated pods, and the normal CNI will correctly reuse the IP for new pods.
However, for the VPCResourceController, the branch-eni resource is still held by terminated pods, and the scheduler might schedule pods onto nodes where there are no branch-eni resources available.

Why is the change needed and what use case will it solve:
We should change VPCResourceController to release the branch-eni resource for terminated pods as well.
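As a minimal sketch of the release condition (the helper name is illustrative, and the actual datastore cleanup in the controller is not shown), the check could look like:

import corev1 "k8s.io/api/core/v1"

// hasPodTerminated reports whether the pod has finished running (Succeeded or Failed),
// i.e. the point at which its branch-eni resources could safely be released.
func hasPodTerminated(pod *corev1.Pod) bool {
	return pod.Status.Phase == corev1.PodSucceeded || pod.Status.Phase == corev1.PodFailed
}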

The released instance type list file should be isolated

What would you like to be enhanced:
Currently the controller uses limits.go to check the availability of an instance type. Due to the lag between releases and updates to this file, the file can mislead users when they use it to guide deployments manually or automatically.
Why is the change needed and what use case will it solve:
An independent file can be created which should only reflect the state of limits.go as of the latest release.

Branch network interfaces support for new generation instances

What would you like to be added:
Branch network interface support, which enables Security Groups for Pods, is not available for new-generation instance types like m6i, m6a, c6i, c6a, r6i, and r6a.

Why is this needed:
This is needed to take advantage of the new instance types in Kubernetes when using Security Groups for Pods.

Usage Metrics for branch eni resources

What would you like to be enhanced:
Currently, the VPC resource controller doesn't expose any branch ENI usage metrics at an instance/node level; providing them would be very useful for tracking resource usage and acting on it.

Why is the change needed and what use case will it solve:
Having access to resource limits at the instance and cluster level would make it possible to monitor the limits for resources that the VPC resource controller manages, and could be used for alerting. It would also be extremely useful for debugging purposes.
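As a rough sketch of what such a metric could look like (the metric and helper names are assumptions, not existing controller metrics), using Prometheus and the controller-runtime registry:

import (
	"github.com/prometheus/client_golang/prometheus"
	"sigs.k8s.io/controller-runtime/pkg/metrics"
)

// branchENIsInUse would report the number of branch ENIs currently allocated per node.
var branchENIsInUse = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Name: "vpc_resource_controller_branch_enis_in_use",
		Help: "Number of branch ENIs currently allocated, per node.",
	},
	[]string{"node"},
)

func init() {
	// Registering with the controller-runtime registry exposes the gauge on /metrics.
	metrics.Registry.MustRegister(branchENIsInUse)
}

// recordBranchENIUsage would be called whenever the per-node allocation count changes.
func recordBranchENIUsage(nodeName string, count int) {
	branchENIsInUse.WithLabelValues(nodeName).Set(float64(count))
}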

Ability to use Security Group Name and/or tags in Security Group Policy

Is your feature request related to a problem? Please describe.
Security Groups are tied to a VPC, which means you cannot reuse a Security Group Policy on clusters belonging to different VPCs.

Describe the solution you'd like
SGP should allow specifying security groups using security group names and/or tags.

Support Warm Pool for Security Group for Pods

Problem
Currently we create Branch Network Interfaces on demand in the VPC Resource Controller, which adds to pod startup latency at scale.

Supporting Warm Pool for Security Group Policy would alleviate the pod startup latency.

Challenges
Creating a warm pool for Security Group Policy requires solving a couple of major challenges

  • Scheduler-level awareness for mapping a pod requiring SGP to the right node with the warm pool.
  • A warm pool for ENIs using security groups from multiple Security Group Policies requires pre-configuration from the user, because any combination of SGPs may apply to a pod based on the label selection criteria.

Invalid memory address or nil pointer dereference

Describe the Bug:

During node initialization, a panic was triggered at

github.com/aws/amazon-vpc-resource-controller-k8s/pkg/aws/ec2.(*ec2Instance).LoadDetails(0xc013440500, {0x2353218, 0xc00061e740}) /workspace/pkg/aws/ec2/instance.go:126 +0x59c

Observed Behavior:
Due to the panic, the controller instance was forcefully restarted.

Expected Behavior:
The controller should handle the condition gracefully instead of failing the routine and entire controller instance.

How to reproduce it (as minimally and precisely as possible):
N/A currently. I will update if this can be reproduced in dev environment.

Additional Context:
The entire stack trace

2022-07-11T10:45:07.786-07:00 | panic: runtime error: invalid memory address or nil pointer dereference
2022-07-11T10:45:07.786-07:00 | [signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x18e3e3c]
2022-07-11T10:45:07.786-07:00 | goroutine 513 [running]:
2022-07-11T10:45:07.786-07:00 | github.com/aws/amazon-vpc-resource-controller-k8s/pkg/aws/ec2.(*ec2Instance).LoadDetails(0xc013440500, {0x2353218, 0xc00061e740}) /workspace/pkg/aws/ec2/instance.go:126 +0x59c
2022-07-11T10:45:07.786-07:00 | github.com/aws/amazon-vpc-resource-controller-k8s/pkg/node.(*node).InitResources(0xc00cd72ff0, {0x22f3740, 0xc00000e0a0}, {0x2353218, 0xc00061e740}) /workspace/pkg/node/node.go:117 +0xd0
2022-07-11T10:45:07.786-07:00 | github.com/aws/amazon-vpc-resource-controller-k8s/pkg/node/manager.(*manager).performAsyncOperation(0xc000220420, {0x1d1fe80, 0xc004c41a70}) /workspace/pkg/node/manager/manager.go:305 +0x254
2022-07-11T10:45:07.786-07:00 | github.com/aws/amazon-vpc-resource-controller-k8s/pkg/worker.(*worker).processNextItem(0xc0001cbf80) /workspace/pkg/worker/worker.go:147 +0x1e6
2022-07-11T10:45:07.786-07:00 | github.com/aws/amazon-vpc-resource-controller-k8s/pkg/worker.(*worker).runWorker(0xc0009df768) /workspace/pkg/worker/worker.go:132 +0x25
2022-07-11T10:45:09.036-07:00 | created by github.com/aws/amazon-vpc-resource-controller-k8s/pkg/worker.(*worker).StartWorkerPool /workspace/pkg/worker/worker.go:190 +0x145

Environment:

  • Kubernetes version (use kubectl version): EKS1.21-eks.7
  • CNI Version: v1.11.0
  • OS (Linux/Windows): Linux

Pod IP address assignment slow/stuck

Describe the Bug:
Pods using the Security Groups for pods feature remain stuck in the ContainerCreating status waiting for an IP address to be assigned. We notice the pods do not have an IP address assigned and the most recent event on the pod is like the following:

Warning  FailedCreatePodSandBox  6s (x175 over 38m)  kubelet            (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "928809399f9e9384f0a1f97c7dace19e9ee3e33596428695fa6dda205a987bd0": plugin type="aws-cni" name="aws-cni" failed (add): add cmd: failed to assign an IP address to container

Observed Behavior:
Pods using the Security Groups for pods feature are stuck waiting in excess of 10 minutes for IP address assignment.

Expected Behavior:
Pods using the Security Groups for pods feature are assigned IP addresses quickly (not sure what the EKS SLA is here; I assume within a minute or two?)

How to reproduce it (as minimally and precisely as possible):
Unknown

Additional Context:
We have had several instances of this happening on a single EKS cluster. In the worst case we have noted pods waiting over 100 minutes for an IP address. We never see these pods receive an IP address. We keep deleting these stuck pods until we find that newly created pods have an IP address assigned within a few minutes of creation. These incidents, where the VPC resource controller appears to be unable to assign pod IPs for pods using the Security Groups for pods feature, seem to happen sporadically and eventually recover on their own.

Environment:

  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.6", GitCommit:"f59f5c2fda36e4036b49ec027e556a15456108f0", GitTreeState:"archive", BuildDate:"1980-01-01T00:00:00Z", GoVersion:"go1.16.13", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21+", GitVersion:"v1.21.13-eks-84b4fe6", GitCommit:"e1318dce57b3e319a2e3fecf343677d1c4d4aa75", GitTreeState:"clean", BuildDate:"2022-06-09T18:22:07Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}
  • CNI Version: v1.11.0
    aws-vpc-cni helm chart version: 1.1.16

  • OS (Linux/Windows): Linux

Add uniqueID in the podAnnotation field for PerPodSecurityGroup

What would you like to be enhanced:
We should add some unique ID in the vpc.amazonaws.com/pod-eni annotation field.

Why is the change needed and what use case will it solve:
This will help prevent deleting the new pod's network when kubelet invokes deletion for the old pod even after the old pod's network has been removed.

Some Branch ENI Pods losing network connectivity on VPC Controller Restart

Describe the Bug:
On VPC Controller restart we are seeing that some of the Branch ENIs are deleted when the number of pods on the cluster is more than 300.

Observed Behavior:
While testing with 300 regular pods and 200 Branch ENI pods, we saw that 5 branch ENIs for running pods were deleted. A snippet of the logs is shown below.

{"level":"info","timestamp":"2020-11-26T02:02:33.884Z","logger":"branch eni provider","msg":"pushing eni to delete queue as no pod owns it","node name":"ip-192-168-24-151.us-west-2.compute.internal","eni":"eni-080621d110a6f8d6e"}
{"level":"info","timestamp":"2020-11-26T02:07:02.666Z","logger":"branch eni provider","msg":"pushing eni to delete queue as no pod owns it","node name":"ip-192-168-111-144.us-west-2.compute.internal","eni":"eni-02f4ea02f299bdb48"}
{"level":"info","timestamp":"2020-11-26T02:07:02.818Z","logger":"branch eni provider","msg":"pushing eni to delete queue as no pod owns it","node name":"ip-192-168-68-155.us-west-2.compute.internal","eni":"eni-0cc8222366cc83c5e"}

The pods with the deleted branch ENIs will lose network connectivity.

Expected Behavior:
VPC Controller shouldn't delete Branch ENIs associated with Running pod.

How to reproduce it (as minimally and precisely as possible):
Introduce more than 300 pods in your cluster, with some pods using security groups per pod, then restart the VPC Controller. It can also be triggered by a platform version upgrade or a k8s version upgrade.

Fix:
The issue is happening because the cache is being queried before it has completely synced with the API Server. We need to wait for the cache to sync before responding to any query to the data store.

We are working on the fix; it will be rolled out in the coming weeks.

Environment:

  • Kubernetes version (use kubectl version): v1.17
  • CNI Version: v1.7.5

Enable EKS Windows Support using the new VPC Resource Controller

The newer version of the VPC Controller has support for managing IP addresses for Windows pods. However, the feature is not enabled yet.

We need a migration strategy to move users from the old VPC Resource Controller to the newer version, which has significant performance improvements.

Windows containers stuck in ContainerCreating when many are scheduled on the same node at once: "Failed to create pod sandbox"

What happened:

From the pods stuck in ContainerCreating:

  Normal   ResourceAllocated       33m                   vpc-resource-controller  Allocated Resource vpc.amazonaws.com/PrivateIPv4Address: 192.168.61.207/19 to the pod
  Warning  FailedCreatePodSandBox  33m                   kubelet                  Failed to create pod sandbox: rpc error: code = Unavailable desc = error reading from server: EOF
  Warning  FailedCreatePodSandBox  33m                   kubelet                  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "d8d59a993d3cf47f969c624ceda80257de2f12549f4472be73cb2232c39ba798": plugin type="vpc-shared-eni" name="vpc" failed (add): hnsCall failed in Win32: The object already exists. (0x1392)

Attach logs

Logs have been sent with AWS case ID 10328164281

What you expected to happen:
I should be able to start containers on Windows
How to reproduce it (as minimally and precisely as possible):
Create a deployment with N - 1 replicas (e.g. 8) tainted to not schedule. Observe they are now pending. Add a Windows node large enough to have N private VPC IP addresses (e.g. vpc.amazonaws.com/PrivateIPv4Address: 9). Change your deployment to now schedule. Observe that they will all get scheduled on the one Windows node, as expected. Observe they are stuck in ContainerCreating.
Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version): 1.22
  • CNI Version: on Linux, aws-node is v1.11.2
  • OS (e.g: cat /etc/os-release): Windows Server 2019 LTSC

Add design documentation

Is your feature request related to a problem? Please describe.
Add design documentation for the following items.

  • Allocation of IP Address to Windows Pods.
  • Allocation of Branch ENI for Pods using SGP.
  • Creation of Trunk ENI.

Allow More Than 5 Security Groups When Using Security Groups For Pods

What would you like to be enhanced:

When using the Security Groups For Pods feature the following error is generated when assigning more than 5 Security Groups:

"The SecurityGroupPolicy \"collibradq-spark\" is invalid: spec.securityGroups.groupIds: Invalid value: 6: spec.securityGroups.groupIds in body should have at most 5 items"
 
The limit is enforced at:
https://github.com/aws/amazon-vpc-resource-controller-k8s/blob/master/apis/vpcresources/v1beta1/securitygrouppolicy_types.go#L33
 
Before the controller can be updated, branch ENI support needs to be confirmed.
 
Why is the change needed and what use case will it solve:

Allow customers to use more than 5 Security Groups.

Set role session name to easier track down who's doing what in cloudtrail

What would you like to be enhanced:

Consider setting the role session name on the AWS SDK session.

Why is the change needed and what use case will it solve:

To allow easier auditing and understanding of what happens, and to make it easier to track down when the VPC resource controller policy is missing.
The current role session name is a Unix nanosecond timestamp.
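For illustration, with the AWS SDK for Go v1 the session name can be set through the assume-role provider. This is a sketch, not the controller's actual credential setup, and the function name is made up:

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/credentials/stscreds"
	"github.com/aws/aws-sdk-go/aws/session"
)

// newSessionWithRoleSessionName assumes the given role with a fixed, human-readable
// session name so CloudTrail entries identify the controller instead of a timestamp.
func newSessionWithRoleSessionName(roleARN, sessionName string) (*session.Session, error) {
	base, err := session.NewSession()
	if err != nil {
		return nil, err
	}
	creds := stscreds.NewCredentials(base, roleARN, func(p *stscreds.AssumeRoleProvider) {
		p.RoleSessionName = sessionName // e.g. "vpc-resource-controller"
	})
	return session.NewSession(&aws.Config{Credentials: creds})
}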

Controller doesn't check if node capacity is nil and could be panic

Describe the Bug:

The controller doesn't check if the node capacity map is initialized before calling DeepCopy(), which will not initialize a map if the source hasn't. In some rare cases, the GET node call from the API server could be broken and the node object may not have the map initialized. In this case, the controller will panic.
Observed Behavior:

Observed a panic: "assignment to entry in nil map" (assignment to entry in nil map)
goroutine 848 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x1c1c380, 0x229e9e0})
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:74 +0x7d
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x2ad5ab56ad5a})
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:48 +0x75
panic({0x1c1c380, 0x229e9e0})
	/usr/local/go/src/runtime/panic.go:1038 +0x215
github.com/aws/amazon-vpc-resource-controller-k8s/pkg/k8s.(*k8sWrapper).AdvertiseCapacityIfNotSet.func1()
	/workspace/pkg/k8s/wrapper.go:161 +0x2f4
k8s.io/client-go/util/retry.OnError.func1()
	/go/pkg/mod/k8s.io/[email protected]/util/retry/util.go:51 +0x33
k8s.io/apimachinery/pkg/util/wait.ConditionFunc.WithContext.func1({0x40d094, 0xc000cbb868})
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:220 +0x1b
k8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtection(0x1d586e0)
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:226 +0x39
k8s.io/apimachinery/pkg/util/wait.ExponentialBackoff({0x989680, 0x4014000000000000, 0x3fb999999999999a, 0x4, 0x0}, 0x40d3e7)
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:421 +0x5f
k8s.io/client-go/util/retry.OnError({0x989680, 0x4014000000000000, 0x3fb999999999999a, 0x4, 0x0}, 0x1ff73e0, 0xc000621180)
	/go/pkg/mod/k8s.io/[email protected]/util/retry/util.go:50 +0xf1
k8s.io/client-go/util/retry.RetryOnConflict(...)
	/go/pkg/mod/k8s.io/[email protected]/util/retry/util.go:104
github.com/aws/amazon-vpc-resource-controller-k8s/pkg/k8s.(*k8sWrapper).AdvertiseCapacityIfNotSet(0xc000c09f20, {0xc000b99020, 0x28}, {0x1f35efe, 0x19}, 0x12)
	/workspace/pkg/k8s/wrapper.go:146 +0x15e
github.com/aws/amazon-vpc-resource-controller-k8s/pkg/provider/branch.(*branchENIProvider).UpdateResourceCapacity(0xc000615360, {0x234ea00, 0xc000b9d540})
	/workspace/pkg/provider/branch/provider.go:251 +0xcf
github.com/aws/amazon-vpc-resource-controller-k8s/pkg/node.(*node).UpdateResources(0xc000b133b0, {0x22f3740, 0xc00046a170}, {0x2353218, 0xc0005f9210})
	/workspace/pkg/node/node.go:93 +0x239
github.com/aws/amazon-vpc-resource-controller-k8s/pkg/node/manager.(*manager).performAsyncOperation(0xc00015c420, {0x1d1fe80, 0xc000cbbe30})
	/workspace/pkg/node/manager/manager.go:316 +0x3d4
github.com/aws/amazon-vpc-resource-controller-k8s/pkg/node/manager.(*manager).performAsyncOperation(0xc00015c420, {0x1d1fe80, 0xc000bb5aa0})
	/workspace/pkg/node/manager/manager.go:314 +0x32a
github.com/aws/amazon-vpc-resource-controller-k8s/pkg/worker.(*worker).processNextItem(0xc000500e70)
	/workspace/pkg/worker/worker.go:147 +0x1e6

Expected Behavior:
The controller should check for nil and handle this gracefully. Retry or requeue would be suggested.
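A minimal sketch of such a nil-map guard (the helper name is illustrative; the real fix would live in AdvertiseCapacityIfNotSet):

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// setCapacityIfMissing initializes the capacity map before writing the extended
// resource, avoiding the "assignment to entry in nil map" panic.
func setCapacityIfMissing(node *corev1.Node, name corev1.ResourceName, count int64) {
	if node.Status.Capacity == nil {
		node.Status.Capacity = corev1.ResourceList{}
	}
	node.Status.Capacity[name] = *resource.NewQuantity(count, resource.DecimalSI)
}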
How to reproduce it (as minimally and precisely as possible):
The condition is not locally reproducible.
Additional Context:

Environment:

  • Kubernetes version (use kubectl version): 1.22
  • CNI Version: v1.11.3
  • OS (Linux/Windows): Linux

Remove the Low QPS threshold for SGPP

What would you like to be enhanced:

Currently, SGPP assignment is, by design, a low-QPS call made by the VPC Resource Controller when it tries to associate a Branch ENI to the Trunk ENI.

https://github.com/aws/amazon-vpc-resource-controller-k8s/blob/master/pkg/config/loader.go#L29-L30

That is because SGP association is a costly operation, so its QPS is set lower than other EC2 calls.

However, we should consider removing that low QPS limit and letting the max pod creation throughput be handled by Kubernetes.

Why is the change needed and what use case will it solve:

A customer reported that when SGPP was enabled, their deployment scaling took a long time, and they wanted to avoid this.

Avoid recycling VLAN ID too quickly

What would you like to be enhanced:
I'm debugging an AWS CNI issue, where we see pods stuck with just vlanXXXX and no vlan.eth.$vlanid@ethN interface:

aws/amazon-vpc-cni-k8s#1644 (comment)

I notice that the lowest available VLAN ID is used in this code:

// assignVlanId assigns a free vlan id from the list of available vlan ids. In the future this can be changed to LL
func (t *trunkENI) assignVlanId() (int, error) {
	t.lock.Lock()
	defer t.lock.Unlock()
	for index, used := range t.usedVlanIds {
		if !used {
			t.usedVlanIds[index] = true
			return index, nil
		}
	}
	return 0, fmt.Errorf("failed to find free vlan id in the available %d ids", len(t.usedVlanIds))
}

We suspect this causes the ID to be reused before the cleanup process on the node has finished.

Why is the change needed and what use case will it solve:

I think it would be easier to debug if the IDs were allocated sequentially instead. This would decrease the likelihood of race conditions too, and possibly fix the issue. Regardless of whether this is the root cause, I think the change would be nice to make. Is there a downside to this?
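A sketch of the sequential allocation suggested above, assuming a simplified trunkENI struct; the lastAssignedVlanID field is an assumption and not part of the current implementation:

import (
	"fmt"
	"sync"
)

type trunkENI struct {
	lock               sync.Mutex
	usedVlanIds        []bool
	lastAssignedVlanID int
}

// assignVlanIDSequential scans forward from the last assigned ID instead of always
// reusing the lowest free ID, so recently released IDs are not handed out again
// before the node-side cleanup has had a chance to run.
func (t *trunkENI) assignVlanIDSequential() (int, error) {
	t.lock.Lock()
	defer t.lock.Unlock()
	n := len(t.usedVlanIds)
	for i := 1; i <= n; i++ {
		candidate := (t.lastAssignedVlanID + i) % n
		if !t.usedVlanIds[candidate] {
			t.usedVlanIds[candidate] = true
			t.lastAssignedVlanID = candidate
			return candidate, nil
		}
	}
	return 0, fmt.Errorf("failed to find free vlan id in the available %d ids", n)
}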

Changes in the security group ids list in SecurityGroupPolicy are not propagated

Is your feature request related to a problem? Please describe.
If I add/remove a security group from my manifest (SecurityGroupPolicy) and apply it, the change is not propagated to the branch ENI.

Describe the solution you'd like
N/A

Describe alternatives you've considered
Right now a pod recreation is required in order to create a new ENI with security groups specified in the manifest.

Additional context
EKS version 1.18
CNI plugin 1.7.5

I also couldn't see any events from the controller on the resource when I do a "kubectl describe sgp ...".

The controller shouldn't requeue not found pods

What would you like to be enhanced:
The controller shouldn't requeue pods which are not found. A pod that is not found will remain not-found until the cache updates, and the cache update will requeue it anyway.
A similar improvement in controller-runtime can be referenced at kubernetes-sigs/controller-runtime#377

{"level":"error","timestamp":"2022-09-13T18:23:20.282Z","logger":"vpc.amazonaws.com/pod-eni-worker","msg":"re-queuing job","job":{"Operation":"Create","UID":"","PodName":"hello-world-7c4b75bf8c-tzzp8","PodNamespace":"default","RequestCount":1,"NodeName":""},"retry count":1,"error":"failed to find pod default/hello-world-7c4b75bf8c-tzzp8","stacktrace":"github.com/aws/amazon-vpc-resource-controller-k8s/pkg/worker.(*worker).processNextItem\n\t/workspace/pkg/worker/worker.go:154\ngithub.com/aws/amazon-vpc-resource-controller-k8s/pkg/worker.(*worker).runWorker\n\t/workspace/pkg/worker/worker.go:132"}
{"level":"error","timestamp":"2022-09-13T18:24:19.143Z","logger":"vpc.amazonaws.com/pod-eni-worker","msg":"re-queuing job","job":{"Operation":"Create","UID":"","PodName":"hello-world-7c4b75bf8c-tzzp8","PodNamespace":"default","RequestCount":1,"NodeName":""},"retry count":3,"error":"failed to find pod default/hello-world-7c4b75bf8c-tzzp8","stacktrace":"github.com/aws/amazon-vpc-resource-controller-k8s/pkg/worker.(*worker).processNextItem\n\t/workspace/pkg/worker/worker.go:154\ngithub.com/aws/amazon-vpc-resource-controller-k8s/pkg/worker.(*worker).runWorker\n\t/workspace/pkg/worker/worker.go:132"}
{"level":"error","timestamp":"2022-09-13T18:25:09.442Z","logger":"vpc.amazonaws.com/pod-eni-worker","msg":"re-queuing job","job":{"Operation":"Create","UID":"","PodName":"hello-world-7c4b75bf8c-tzzp8","PodNamespace":"default","RequestCount":1,"NodeName":""},"retry count":4,"error":"failed to find pod default/hello-world-7c4b75bf8c-tzzp8","stacktrace":"github.com/aws/amazon-vpc-resource-controller-k8s/pkg/worker.(*worker).processNextItem\n\t/workspace/pkg/worker/worker.go:154\ngithub.com/aws/amazon-vpc-resource-controller-k8s/pkg/worker.(*worker).runWorker\n\t/workspace/pkg/worker/worker.go:132"}
{"level":"error","timestamp":"2022-09-13T18:25:59.743Z","logger":"vpc.amazonaws.com/pod-eni-worker","msg":"exceeded maximum retries","job":{"Operation":"Create","UID":"","PodName":"hello-world-7c4b75bf8c-tzzp8","PodNamespace":"default","RequestCount":1,"NodeName":""},"max retries":5,"error":"failed to find pod default/hello-world-7c4b75bf8c-tzzp8","stacktrace":"github.com/aws/amazon-vpc-resource-controller-k8s/pkg/worker.(*worker).processNextItem\n\t/workspace/pkg/worker/worker.go:149\ngithub.com/aws/amazon-vpc-resource-controller-k8s/pkg/worker.(*worker).runWorker\n\t/workspace/pkg/worker/worker.go:132"}

Why is the change needed and what use case will it solve:
This behavior keeps adding deleted pods into the queue until the max retry count is reached. In large clusters with a high churn rate, this can worsen queue performance unnecessarily.
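A sketch of the suggested behavior, assuming the pod lookup error is (or wraps) a Kubernetes NotFound error; the function name and the surrounding worker wiring are illustrative:

import apierrors "k8s.io/apimachinery/pkg/api/errors"

// shouldRequeueJob decides whether a failed job should be retried. Jobs whose pod
// no longer exists are dropped instead of being retried until the max retry count.
func shouldRequeueJob(err error) bool {
	if err == nil {
		return false
	}
	if apierrors.IsNotFound(err) {
		// The pod was deleted; retrying will not make it reappear before the cache updates.
		return false
	}
	return true
}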

GitHub Actions

This is a feature request to fix GitHub workflows and automatically run unit tests against submitted PRs.

arm64 support

Hi,

We're investigating using arm64 node groups in EKS and have a lot of Windows container workloads.
We'd need an arm64 build (same for vpc-admission-webhook, I guess); is this on the roadmap?

Many thanks,
Chris

Do limited requeues of pod event when the node is not ready

Describe the Bug:
Do a limited number of re-queues in the controller when the node is not ready and we get a pod event: https://github.com/aws/amazon-vpc-resource-controller-k8s/blob/master/controllers/core/pod_controller.go#L100

Observed Behavior:
If the node stays not-ready for a long period of time, the logs will be flooded with the following messages:

{"level":"info","timestamp":"2021-02-03T11:55:42.559Z","logger":"setup.pod reconciler","msg":"pod's node is not ready to handle request yet, will retry","UID":"<UID>","pod":"namespace/pod-name","node":"<node-name>"}

Expected Behavior:
We should do a limited number of retries, and if the node doesn't become ready we should pick up the pod for processing again during the next resync period.
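A sketch of the bounded-retry idea using controller-runtime's Result type; the retry counter would need to be tracked per pod, and the names and limit below are illustrative:

import (
	"time"

	ctrl "sigs.k8s.io/controller-runtime"
)

const maxNodeNotReadyRetries = 5

// resultForNotReadyNode requeues with a delay a bounded number of times, then gives
// up and lets the next periodic resync pick the pod up again.
func resultForNotReadyNode(retryCount int) ctrl.Result {
	if retryCount >= maxNodeNotReadyRetries {
		return ctrl.Result{} // stop requeueing; resync will reconcile the pod later
	}
	return ctrl.Result{RequeueAfter: time.Second}
}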

windows controller: unknown node

I'm observing a lot of error messages in my windows-vpc-resource-controller:v0.2.7 deployment and am wondering if it has any impact on my cluster.

Error output when applying to k8s

Describe the Bug:

Hello, I am using SecurityGroupPolicy and observed an error output.

The result works as expected, but the command produces an error output.

My yaml file is as below.

apiVersion: vpcresources.k8s.aws/v1beta1
kind: SecurityGroupPolicy
metadata:
  name: ${NAME}
  namespace: ${NAMESPACE}
spec:
  podSelector:
    matchLabels:
      foo: bar
  securityGroups:
    groupIds:
      - ${VALID_SECURITY_GROUP_ID}

As I mentioned above, this works perfectly, but when I run kubectl logs, I get an output below.

❯ kubectl logs -n planit-dev securitygrouppolicy.vpcresources.k8s.aws/${NAME}
error: no kind "SecurityGroupPolicy" is registered for version "vpcresources.k8s.aws/v1beta1" in scheme "k8s.io/kubectl/pkg/scheme/scheme.go:28"

This output is frustrating to users and should be fixed.

Additional ENI tag: Pod name

What would you like to be enhanced:
In addition to the current tags (kubernetes.io/cluster/$NAME, eks:eni:owner, vpcresources.k8s.aws/vlan-id, and vpcresources.k8s.aws/trunk-eni-id) I would like to see an additional pod-name tag attached to each ENI.

Why is the change needed and what use case will it solve:
This will help with debugging and will lead to a slightly improved developer experience 🙂
I had to debug some complex Security Groups rules when using the Security-Groups-per-Pod feature, and getting the ENI used by a certain Pod was not as straightforward as it could be. I did most of my work in the AWS Console, but I was forced to use kubectl to get the pod's ENI (either by getting the Pod's IP and searching for that in the AWS Console or by doing a kubectl describe pod and getting the ENI ID from the events). Being able to stay in the AWS Console and filter the ENIs by a specific tag value would have made the process much easier.


Based on a quick code reading, I am pretty sure this enhancement request should be here and not in aws/amazon-vpc-cni-k8s, but I could be wrong. Apologies if that is the case!

VLAN ID is not freed from cache because the EC2 API is eventually consistent

Describe the Bug:
When a pod using a Branch ENI is created and the controller is immediately restarted, the EC2 API call can (since EC2 is eventually consistent) return an ENI list without the newly created ENI, leading to the controller not marking the VLAN ID allocated to the ENI as assigned in its internal cache.

This can lead to new Branch ENI creation on the node failing, since the controller may try to reuse an existing VLAN ID that it is not aware of.

Proposed Fix:
Check the error message; if it says that the VLAN ID is still in use, add the VLAN ID to the cache.

_, err = t.ec2ApiHelper.AssociateBranchToTrunk(&t.trunkENIId, nwInterface.NetworkInterfaceId, vlanID)
if err != nil {
	// Check error here: if the VLAN is already used, mark it in cache.
	trunkENIOperationsErrCount.WithLabelValues("associate_branch").Inc()
	break
}
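A slightly fuller sketch of that idea; the string match and the markVlanAssigned helper are assumptions, since the exact EC2 error returned for a duplicate VLAN ID would need to be confirmed:

_, err = t.ec2ApiHelper.AssociateBranchToTrunk(&t.trunkENIId, nwInterface.NetworkInterfaceId, vlanID)
if err != nil {
	trunkENIOperationsErrCount.WithLabelValues("associate_branch").Inc()
	if strings.Contains(strings.ToLower(err.Error()), "vlan") {
		// Hypothetical: the error indicates the VLAN ID is already in use, so record
		// it as assigned in the local cache instead of handing it out again.
		t.markVlanAssigned(vlanID)
	}
	break
}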

How to reproduce it (as minimally and precisely as possible):
Can be reproduced by trying multiple times to create Branch ENIs for new pods and killing the controller just after new Branch ENIs are created.

Environment:

  • Kubernetes version (use kubectl version): v1.0.5
  • CNI Version: v1.7.5

New instance types not supported

What would you like to be enhanced:

c6in, r6idn, and m6idn instances currently do not have the vpc.amazonaws.com/has-trunk-attached=true label applied nor trunk ENIs attached. This means pods requiring Security Groups (via ENABLE_POD_ENI) can't be scheduled on them.

Why is the change needed and what use case will it solve:

Our AutoScaling Groups use Instance Requirements and some AZs are only getting c6in instances.

I've also opened aws/amazon-vpc-cni-k8s#2166 on the vpc-cni side. I'm not entirely clear if there is a mutual requirement to support the new types.

Create pod event when VPC Controller is unable to assign Branch ENI to Pod

What would you like to be enhanced:

  • Emit all error messages associated with creating Branch ENIs as pod events, to help users debug and fix the issue easily.
  • Emit node events with an appropriate error message if the Trunk ENI creation fails.

Why is the change needed and what use case will it solve:
This change will help users resolve configuration issues like missing IAM permissions, or a PSP blocking the VPC Resource Controller from annotating the pod with the Branch ENI details.
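A sketch of surfacing such failures as pod events with client-go's event recorder; the reason string and helper name are illustrative:

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/tools/record"
)

// emitBranchENIFailureEvent records the allocation error on the pod so users can see
// it with `kubectl describe pod`.
func emitBranchENIFailureEvent(recorder record.EventRecorder, pod *corev1.Pod, err error) {
	recorder.Eventf(pod, corev1.EventTypeWarning, "BranchENIAllocationFailed",
		"failed to allocate branch ENI: %v", err)
}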

Windows EKS Pod IP allocation

Environment:

  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.7", GitCommit:"1dd5338295409edcfff11505e7bb246f0d325d15", GitTreeState:"clean", BuildDate:"2021-01-13T13:23:52Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21+", GitVersion:"v1.21.2-eks-06eac09", GitCommit:"5f6d83fe4cb7febb5f4f4e39b3b2b64ebbbe3e97", GitTreeState:"clean", BuildDate:"2021-09-13T14:20:15Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"linux/amd64"}
  • CNI Version
    EKS Addon version - v1.10.1-eksbuild.1
  • OS (e.g: cat /etc/os-release):
    Amazon Linux 2
  • Kernel (e.g. uname -a):
    Kernel 5.4.156-83.273.amzn2.x86_64 on an x86_64
 (combined from similar events): Failed to create  pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox  container  "72a8cd086b154bf32710ad159e34ff352ef7e135b13ed46e6d63bb7abeab0235"  network for pod "platform-api-c6f7c5fcd-vzc52": networkPlugin cni failed  to set up pod "platform-api-c6f7c5fcd-vzc52_uk-platform-test" network:  failed to parse Kubernetes args: pod does not have label  vpc.amazonaws.com/PrivateIPv4Address

Service role is set to Inherited from node (attempted using a service account created by eksctl to no avail)
Nodes all have AmazonEKS_CNI_Policy AmazonEC2ContainerRegistryReadOnly AmazonEKSWorkerNodePolicy

The majority of the nodes are Linux-based and function fine; the issue is only with the Windows node.
I assume that either this app (https://github.com/aws/amazon-vpc-resource-controller-k8s) or https://github.com/aws/amazon-vpc-cni-plugins is supposed to annotate the pods; the containers start in Docker but the networking fails.

The cluster config maps are like

apiVersion: v1
data:
  enable-windows-ipam: "true"
kind: ConfigMap
metadata:
  name: amazon-vpc-cni
  namespace: kube-system
---
apiVersion: v1
data:
  mapAccounts: |
    []
  mapRoles: |
    - groups:
      - system:bootstrappers
      - system:nodes
      rolearn: arn:aws:iam::ACCOUNTIDREMOVED:role/dev-eks-workers
      username: system:node:{{EC2PrivateDNSName}}
    - groups:
      - eks:kube-proxy-windows
      - system:bootstrappers
      - system:nodes
      rolearn: arn:aws:iam::ACCOUNTIDREMOVED:role/dev-eks-cluster_workerinstance_role
      username: system:node:{{EC2PrivateDNSName}}
  mapUsers: |
    []
kind: ConfigMap
metadata: 
  name: aws-auth
  namespace: kube-system

aws-node, which only runs on the Linux side, doesn't provide anything insightful:

{"level":"info","ts":"2021-12-08T15:54:29.966Z","caller":"entrypoint.sh","msg":"Validating env variables ..."}
{"level":"info","ts":"2021-12-08T15:54:29.968Z","caller":"entrypoint.sh","msg":"Install CNI binaries.."}
{"level":"info","ts":"2021-12-08T15:54:29.995Z","caller":"entrypoint.sh","msg":"Starting IPAM daemon in the background ... "}
{"level":"info","ts":"2021-12-08T15:54:29.997Z","caller":"entrypoint.sh","msg":"Checking for IPAM connectivity ... "}
{"level":"info","ts":"2021-12-08T15:54:32.008Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2021-12-08T15:54:34.017Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2021-12-08T15:54:34.050Z","caller":"entrypoint.sh","msg":"Copying config file ... "}
{"level":"info","ts":"2021-12-08T15:54:34.057Z","caller":"entrypoint.sh","msg":"Successfully copied CNI plugin binary and config file."}
{"level":"info","ts":"2021-12-08T15:54:34.059Z","caller":"entrypoint.sh","msg":"Foregrounding IPAM daemon ..."}

On the Windows machines the vpc-shared-eni logs don't really say anything more than the error the container dies on, but it sure generates a lot of logs. (The container it's trying to run in this case is just IIS 2019 LTSC.)

2021-12-08T12:29:33Z [INFO] Plugin vpc-shared-eni version 1.2 executing CNI command.
2021-12-08T12:29:47Z [INFO] Waiting for pod label vpc.amazonaws.com/PrivateIPv4Address.
2021-12-08T12:29:47Z [INFO] Waiting for pod label vpc.amazonaws.com/PrivateIPv4Address.
... truncated repeated logs
2021-12-08T12:29:51Z [INFO] Waiting for pod label vpc.amazonaws.com/PrivateIPv4Address.
2021-12-08T12:29:51Z [ERROR] Failed to parse netconfig from args: failed to parse Kubernetes args: pod does not have label vpc.amazonaws.com/PrivateIPv4Address.
2021-12-08T12:29:51Z [ERROR] CNI command failed: failed to parse Kubernetes args: pod does not have label vpc.amazonaws.com/PrivateIPv4Address
2021-12-08T12:29:53Z [INFO] Executing DEL with netconfig: &{NetConf:{CNIVersion:0.3.1 Name:vpc Type:vpc-shared-eni Capabilities:map[] IPAM:{Type:} DNS:{Nameservers:[172.20.0.10] Domain: Search:[{%namespace%}.svc.cluster.local svc.cluster.local cluster.local] Options:[]}} ENIName: ENIMACAddress:02:d0:52:f2:02:ca ENIIPAddress:10.22.3.7/24 VPCCIDRs:[{IP:10.22.0.0 Mask:ffff0000}] BridgeType:L3 BridgeNetNSPath: IPAddress:<nil> GatewayIPAddress:10.22.3.1 InterfaceType:veth TapUserID:0 Kubernetes:{Namespace:uk-platform-test PodName:platform-api-6c55d8c7df-7kbb2 PodInfraContainerID:0a8b63bc118124ddf108e345b5c198842765481dccf18286c56d81a059849906 ServiceCIDR:172.20.0.0/16}} ContainerID:0a8b63bc118124ddf108e345b5c198842765481dccf18286c56d81a059849906 Netns:none IfName:eth0 Args:IgnoreUnknown=1;K8S_POD_NAMESPACE=uk-platform-test;K8S_POD_NAME=platform-api-6c55d8c7df-7kbb2;K8S_POD_INFRA_CONTAINER_ID=0a8b63bc118124ddf108e345b5c198842765481dccf18286c56d81a059849906.
2021-12-08T13:00:00Z [INFO] Plugin vpc-shared-eni version 1.2 executing CNI command.
2021-12-08T13:00:04Z [INFO] Plugin vpc-shared-eni version 1.2 executing CNI command.
2021-12-08T13:00:04Z [INFO] Waiting for pod label vpc.amazonaws.com/PrivateIPv4Address.

The EKS event log I've linked from gist https://gist.github.com/ChrisMcKee/0d435ea22a0c16e8280e777b7a0d7776

node describe

Name:               ip-10-22-2-234.eu-west-2.compute.internal
Roles:              <none>
Labels:             alpha.eksctl.io/cluster-name=dev-eks-cluster
                    alpha.eksctl.io/nodegroup-name=eks-nt-workers
                    beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=t3a.medium
                    beta.kubernetes.io/os=windows
                    failure-domain.beta.kubernetes.io/region=eu-west-2
                    failure-domain.beta.kubernetes.io/zone=eu-west-2b
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=ip-10-22-2-234.eu-west-2.compute.internal
                    kubernetes.io/os=windows
                    node.kubernetes.io/instance-type=t3a.medium
                    node.kubernetes.io/windows-build=10.0.17763
                    role=workers
                    topology.kubernetes.io/region=eu-west-2
                    topology.kubernetes.io/zone=eu-west-2b
Annotations:        node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Wed, 08 Dec 2021 15:22:05 +0000
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  ip-10-22-2-234.eu-west-2.compute.internal
  AcquireTime:     <unset>
  RenewTime:       Wed, 08 Dec 2021 16:12:12 +0000
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Wed, 08 Dec 2021 16:07:25 +0000   Wed, 08 Dec 2021 15:22:05 +0000   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Wed, 08 Dec 2021 16:07:25 +0000   Wed, 08 Dec 2021 15:22:05 +0000   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Wed, 08 Dec 2021 16:07:25 +0000   Wed, 08 Dec 2021 15:22:05 +0000   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Wed, 08 Dec 2021 16:07:25 +0000   Wed, 08 Dec 2021 15:22:15 +0000   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:   10.22.2.234
  Hostname:     ip-10-22-2-234.eu-west-2.compute.internal
  InternalDNS:  ip-10-22-2-234.eu-west-2.compute.internal
Capacity:
  attachable-volumes-aws-ebs:  25
  cpu:                         2
  ephemeral-storage:           62912508Ki
  memory:                      4144668Ki
  pods:                        10
Allocatable:
  attachable-volumes-aws-ebs:  25
  cpu:                         2
  ephemeral-storage:           57980167277
  memory:                      4042268Ki
  pods:                        10
System Info:
  Machine ID:                 EC2AMAZ-4EH66L4
  System UUID:                EC217CE8-733F-42A8-EAC3-D35B6E54D65C
  Boot ID:                    
  Kernel Version:             10.0.17763.2300
  OS Image:                   Windows Server 2019 Datacenter
  Operating System:           windows
  Architecture:               amd64
  Container Runtime Version:  docker://20.10.7
  Kubelet Version:            v1.21.4-eks-033ce7e
  Kube-Proxy Version:         v1.21.4-eks-033ce7e
ProviderID:                   aws:///eu-west-2b/i-029a0e80e89b5f526
Non-terminated Pods:          (3 in total)
  Namespace                   Name                                CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                   ----                                ------------  ----------  ---------------  -------------  ---
  haproxy-ingress             ingress-kubernetes-ingress-q4cd7    100m (5%)     0 (0%)      64Mi (1%)        0 (0%)         50m
  uk-platform-test            platform-api-6c55d8c7df-6zcb9       0 (0%)        0 (0%)      0 (0%)           0 (0%)         69m
  uk-platform-test            platform-api-c6f7c5fcd-vzc52        0 (0%)        0 (0%)      0 (0%)           0 (0%)         69m
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                    Requests   Limits
  --------                    --------   ------
  cpu                         100m (5%)  0 (0%)
  memory                      64Mi (1%)  0 (0%)
  ephemeral-storage           0 (0%)     0 (0%)
  attachable-volumes-aws-ebs  0          0
Events:
  Type    Reason                   Age                From        Message
  ----    ------                   ----               ----        -------
  Normal  Starting                 50m                kube-proxy  Starting kube-proxy.
  Normal  Starting                 50m                kubelet     Starting kubelet.
  Normal  NodeHasSufficientMemory  50m (x2 over 50m)  kubelet     Node ip-10-22-2-234.eu-west-2.compute.internal status is now: NodeHasSufficientPID
  Normal  NodeReady                50m                kubelet     Node ip-10-22-2-234.eu-west-2.compute.internal status is now: NodeReady

Totally at a loss at this point, as I've no idea which of the repositories to look at, or if I'm even looking at the right thing; the troubleshooting guide doesn't really shine any light on anything and targets Linux, while most of the logging is a mix of 'pushed to Event Viewer' and log files on Windows.
The best documentation I've found so far is a slideshow from re:Invent: https://d1.awsstatic.com/events/reinvent/2019/REPEAT_1_Advanced_network_resource_management_on_Amazon_EKS_CON411-R1.pdf

Get Instance to Branch ENI limits at runtime.

Is your feature request related to a problem? Please describe.
Currently the instance-to-Branch-ENI limits are hard-coded, which means that for each new instance type added we need to do a new release of the controller.

Describe the solution you'd like
If the EC2 team can expose an API in aws-sdk-go with the instance → Branch ENI limits, we can fetch this information at runtime without needing a new controller release.

We are working with EC2 team to evaluate this request.

Endlessly requeue non-existent pods to pod reconcile

Describe the Bug:

Observed Behavior:
The controller endlessly retries no-longer-existent pods by requeueing them into the pod reconcile work queue. The code in pod_controller.go will requeue a pod if the pod's node can't be found in the cache.

node, found := r.NodeManager.GetNode(pod.Spec.NodeName)
if !found {
	logger.V(1).Info("pod's node is not yet initialized by the manager, will retry")
	return PodRequeueRequest, nil
} 

Since PodRequeueRequest sets Requeue to true, in some conditions the pods will be infinitely retried. When the cluster has a high node/pod churn rate, the infinite retries will probably cause a larger problem for the queue and reconciler.

The log looks like:

{"level":"debug","timestamp":"2022-07-29T06:03:06.916Z","logger":"controllers.Pod Reconciler","msg":"pod's node is not yet initialized by the manager, will retry","UID":"847c170a-1756-4d9e-ab1f-bf9f498fa994","pod":"/","node":"ip-192-168-x-x.us-west-2.compute.internal"}
{"level":"debug","timestamp":"2022-07-29T06:03:07.916Z","logger":"controllers.Pod Reconciler","msg":"pod's node is not yet initialized by the manager, will retry","UID":"847c170a-1756-4d9e-ab1f-bf9f498fa994","pod":"/","node":"ip-192-168-x-x.us-west-2.compute.internal"}
{"level":"debug","timestamp":"2022-07-29T06:03:08.917Z","logger":"controllers.Pod Reconciler","msg":"pod's node is not yet initialized by the manager, will retry","UID":"847c170a-1756-4d9e-ab1f-bf9f498fa994","pod":"/","node":"ip-192-168-x-x.us-west-2.compute.internal"}
{"level":"debug","timestamp":"2022-07-29T06:03:09.918Z","logger":"controllers.Pod Reconciler","msg":"pod's node is not yet initialized by the manager, will retry","UID":"847c170a-1756-4d9e-ab1f-bf9f498fa994","pod":"/","node":"ip-1192-168-x-x.us-west-2.compute.internal"}
{"level":"debug","timestamp":"2022-07-29T06:03:10.919Z","logger":"controllers.Pod Reconciler","msg":"pod's node is not yet initialized by the manager, will retry","UID":"847c170a-1756-4d9e-ab1f-bf9f498fa994","pod":"/","node":"ip-192-168-x-x.us-west-2.compute.internal"}

Expected Behavior:
The pod controller should be smart enough to decide if the node is already gone and stop requeueing the pods.
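A sketch of the suggested decision using the controller-runtime client; PodRequeueRequest stands in for the controller's existing requeue result, and the function name is illustrative:

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/types"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// resultForUninitializedNode stops requeueing once the node object itself is gone
// from the API server, instead of retrying the pod forever.
func resultForUninitializedNode(ctx context.Context, c client.Client, nodeName string) (ctrl.Result, error) {
	var node corev1.Node
	if err := c.Get(ctx, types.NamespacedName{Name: nodeName}, &node); err != nil {
		if apierrors.IsNotFound(err) {
			return ctrl.Result{}, nil // node deleted; drop the request
		}
		return ctrl.Result{}, err
	}
	// The node exists but is not yet initialized by the manager; retry.
	return PodRequeueRequest, nil
}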

How to reproduce it (as minimally and precisely as possible):

  1. turn on debug log
  2. scale pods down to 1 from a bigger number (for example 10; in theory this should be reproducible with any number)
  3. scale nodes down to 0
  4. check the controller log

Additional Context:

Environment:

  • Kubernetes version (use kubectl version): EKS 1.21
  • CNI Version: v1.10.1
  • OS (Linux/Windows): Linux

Pod stuck at Pending, default VPC CNI setting and pod security group enabled

I'm running EKS 1.22 with the default VPC CNI setting (pods and nodes share the same subnet) and pod security groups enabled.
When I tried to perform kubectl rollout restart deploy appname, the pod got stuck at Pending.

The deployment has node selector and toleration so it must be scheduled to a particular node group.
The node group runs on m5a.large EC2 instances.

In the K8s event, I have:

53s         Warning   FailedScheduling    pod/db-proxy-tcp-proxy-558c5688df-kdw5b          0/4 nodes are available: 1 node(s) had taint {NodeType: backend}, that the pod didn't tolerate, 1 node(s) had taint {NodeType: frontend}, that the pod didn't tolerate, 2 Insufficient vpc.amazonaws.com/pod-eni.
23s         Warning   FailedScheduling    pod/drupaldb-proxy-tcp-proxy-56456b975c-2lflz    0/4 nodes are available: 1 node(s) had taint {NodeType: backend}, that the pod didn't tolerate, 1 node(s) had taint {NodeType: frontend}, that the pod didn't tolerate, 2 Insufficient vpc.amazonaws.com/pod-eni.

Note the: 2 Insufficient vpc.amazonaws.com/pod-eni.
When I check each instance of the node group, I have a capacity of 29 pods per instance. The maximum number of pods scheduled on a node was 14.

Allocatable:
  attachable-volumes-aws-ebs:  25
  cpu:                         1930m
  ephemeral-storage:           47233297124
  hugepages-1Gi:               0
  hugepages-2Mi:               0
  memory:                      7201292Ki
  pods:                        29

I manually scale the ASG of the node group so it has 1 more instance, and the pod can be scheduled successfully.

Question: why can the pod scheduler not successfully allocate the pod to the node group? I'm concerned that manual intervention must be done to address this.

Thank you.

Environment:

  • Kubernetes version (use kubectl version): v1.22.13-eks-15b7512
  • CNI Version: v1.10.1-eksbuild.1
  • OS (e.g: cat /etc/os-release): Amazon Linux 2 (ami-07c0ff1e7b8d001e0)

Data race was detected between pod and node controllers reconcile.

Describe the Bug:

After enabling Go race detection, a data race was detected on the shared boolean variable hasPodDataStoreSynced.

WARNING: DATA RACE
Read at 0x00c0004cc09f by goroutine 53:
  github.com/aws/amazon-vpc-resource-controller-k8s/pkg/condition.(*condition).WaitTillPodDataStoreSynced()
      /workspace/pkg/condition/conditions.go:84 +0x98
  github.com/aws/amazon-vpc-resource-controller-k8s/controllers/core.(*NodeReconciler).Reconcile()
      /workspace/controllers/core/node_controller.go:55 +0x93
  sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile()
      /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:114 +0x39b
  sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler()
      /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:311 +0x43a
  sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem()
      /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:266 +0x350
  sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
      /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:227 +0xd7

Previous write at 0x00c0004cc09f by goroutine 144:
  github.com/aws/amazon-vpc-resource-controller-k8s/controllers/custom.(*CustomController).WaitForCacheSync()
      /workspace/controllers/custom/custom_controller.go:154 +0xdc
  github.com/aws/amazon-vpc-resource-controller-k8s/controllers/custom.(*CustomController).Start.func1()
      /workspace/controllers/custom/custom_controller.go:128 +0x364
  github.com/aws/amazon-vpc-resource-controller-k8s/controllers/custom.(*CustomController).Start()
      /workspace/controllers/custom/custom_controller.go:137 +0x104
  sigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile.func1()
      /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/manager/runnable_group.go:218 +0x1cb
  sigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile·dwrap·12()
      /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/manager/runnable_group.go:221 +0x47

Goroutine 53 (running) created at:
  sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2()
      /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:223 +0x4e4
  sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start()
      /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:234 +0x2b5
  sigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile.func1()
      /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/manager/runnable_group.go:218 +0x1cb
  sigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile·dwrap·12()
      /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/manager/runnable_group.go:221 +0x47

Goroutine 144 (running) created at:
  sigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile()
      /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/manager/runnable_group.go:202 +0x23c
  sigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).Start.func1·dwrap·11()
      /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/manager/runnable_group.go:131 +0x39

Observed Behavior:
Two goroutines race on the same address.
Expected Behavior:
Read/write operations on the flag should be synchronized with a channel or mutex to avoid the potential issue.
How to reproduce it (as minimally and precisely as possible):
Build the image with -race and run the controller.
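
As the expected behavior above suggests, one way to remove the race (a minimal sketch with hypothetical names, not necessarily the project's actual fix) is to guard the shared flag behind a small synchronized accessor:

package condition

import "sync"

// syncedFlag guards a boolean that is written by one goroutine (the cache
// sync path) and read by others (the reconcilers).
type syncedFlag struct {
    mu     sync.RWMutex
    synced bool
}

func (f *syncedFlag) Set(v bool) {
    f.mu.Lock()
    defer f.mu.Unlock()
    f.synced = v
}

func (f *syncedFlag) Get() bool {
    f.mu.RLock()
    defer f.mu.RUnlock()
    return f.synced
}
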
Additional Context:

Environment:

  • Kubernetes version (use kubectl version): EKS1.19
  • CNI Version: v1.7.5
  • OS (Linux/Windows): Linux

Observed authentication behavior inconsistent with documentation

Describe the Bug:

Starting on Friday, we noticed that pods in namespaces with per-pod security groups enabled were failing to start. The pod events showed that vpc-resource-controller was failing to annotate the affected pods due to PodSecurityPolicy (PSP). Oddly, there was no known change to cluster RBAC or PSP during this time.

It appears as if over the weekend the identity vpc-resource-controller uses to authenticate to our clusters changed, and that the identity used is no longer the service account added to the documentation in #89, visible here.

Before the issue started the apiserver audit logs show the vpc-resource-controller authenticating to the cluster as ServiceAccount eks-vpc-resource-controller. After the issue started, audit logs show vpc-resource-controller authenticating as User eks:vpc-resource-controller. Our PSPs allow the former but not the latter. I've confirmed that ServiceAccount eks-vpc-resource-controller still exists within our clusters, and has remained undisturbed between before/after.

Expected Behavior:

vpc-resource-controller would continue to authenticate as ServiceAccount eks-vpc-resource-controller, or the linked documentation should include User eks:vpc-resource-controller.

How to reproduce it (as minimally and precisely as possible):

Unknown, as this appears to be a change on the AWS-managed-control-plane side, and vpc-resource-controller logs are not exposed in cloudwatch.

This issue started Friday morning EDT in one cluster, and then progressed to at least two other clusters over the weekend. All clusters exist in us-east-1.

Additional Context:

Environment:

  • Kubernetes version (use kubectl version): 1.21
  • CNI Version: 602401143452.dkr.ecr.us-east-1.amazonaws.com/amazon-k8s-cni:v1.11.3
  • OS (Linux/Windows): Linux

See also: Support case 10949972441

Pods using Security Group for Pods stuck in ContainerCreating due to missing SA in Pod Security Policy

Description of problem
If you are using Security Group for Pods, your pods are stuck in ContainerCreating, and you also use a PSP that restricts operations such as patching the pod with annotations, then it is possible that the PSP is preventing the VPC Resource Controller from patching the Pod with the Branch ENI details.

How to Verify
Check the events of the pod that is stuck in the ContainerCreating state. If you see the following event, it is likely because the VPC Resource Controller is unable to annotate the pod.

Warning BranchAllocationFailed <invalid> (xN over <invalid>) vpc-resource-controller failed to allocate branch ENI to pod: cannot create new eni entry already exist, older entry : [0xcXXXXXXXXX]

Fix
Add the following Service Accounts used by the VPC Resource Controller to your PSP.

The new SA eks-vpc-resource-controller

kind: ServiceAccount
name: eks-vpc-resource-controller
namespace: kube-system

The older SA vpc-resource-controller, which will be deprecated in the future.

kind: ServiceAccount
name: vpc-resource-controller
namespace: kube-system

After adding the SA, if you delete and recreate the pod it should work as expected.

Error: Trunk already exist in cache when initializing a node

Describe the Bug:

A trunk initialization conflict was observed when the controller was adding a node and initializing its trunk resources. It looks like the node failed to initialize for some reason and was removed by the controller. During a subsequent attempt to add the node, the trunk interface was successfully initialized and added to the cache. A later attempt then tried to create and add a trunk interface for the same node again. Since the cache uses the node name as the key for the trunk resource, the attempt failed and the node again failed to initialize. Because the trunk record was never removed from the cache, reconciliation keeps failing to initialize the node's resources, and no pods using the SGP feature can be created on the node.
Observed Behavior:
Pods using the Security Group for Pods feature cannot be created successfully; the VPC CNI log shows the following error.

{"level":"info","ts":"2022-07-12T21:59:50.620Z","caller":"rpc/rpc.pb.go:713","msg":"Send AddNetworkReply: failed to get Branch ENI resource"}

The node has the correct capacity and allocatable branch ENIs, and the trunk ENI is attached.
The failing pods have the branch ENI request/limit set but do not have the branch interface network annotation added.
Expected Behavior:
The node should be added to the managed pool successfully even if some type of race/conflict occurs. The node's trunk/branch interface resources should be initialized successfully. Pods scheduled to the node should have their branch network resource annotation added successfully.
How to reproduce it (as minimally and precisely as possible):
N/A. This could be caused by a rare race condition between goroutines or an incorrect fallback when a conflict was encountered.
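
A hedged sketch (illustrative names, not the provider's actual code) of the kind of cleanup that would avoid the stale entry: whenever node initialization fails, the trunk record keyed by node name must be removed from the cache so the next attempt can start clean.

package branch

import (
    "fmt"
    "sync"
)

// trunkCache keys trunk ENI records by node name, mirroring the behavior
// described above; all names here are illustrative only.
type trunkCache struct {
    mu     sync.Mutex
    trunks map[string]string // node name -> trunk ENI ID
}

func newTrunkCache() *trunkCache {
    return &trunkCache{trunks: map[string]string{}}
}

func (c *trunkCache) add(nodeName, trunkID string) error {
    c.mu.Lock()
    defer c.mu.Unlock()
    if _, ok := c.trunks[nodeName]; ok {
        return fmt.Errorf("trunk eni already exist in cache")
    }
    c.trunks[nodeName] = trunkID
    return nil
}

// remove must be called whenever node initialization fails; otherwise the
// stale entry blocks every subsequent initialization attempt for that node.
func (c *trunkCache) remove(nodeName string) {
    c.mu.Lock()
    defer c.mu.Unlock()
    delete(c.trunks, nodeName)
}
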
Additional Context:

{"level":"info","timestamp":"2022-07-11T13:09:28.873Z","logger":"node validation webhook","msg":"update request received from aws-node","node":"ip-x-x-x-x.us-west-2.compute.internal"}
{"level":"info","timestamp":"2022-07-11T17:58:28.121Z","logger":"node manager","msg":"node removed from data store","node name":"ip-x-x-x-x.us-west-2.compute.internal","request":"delete"}
{"level":"info","timestamp":"2022-07-11T17:58:28.122Z","logger":"controllers.Node","msg":"deleted the node from manager","node":"/ip-x-x-x-x.us-west-2.compute.internal"}
{"level":"info","timestamp":"2022-07-11T18:05:34.315Z","logger":"controllers.Node","msg":"adding node","node":"/ip-x-x-x-x.us-west-2.compute.internal"}
{"level":"info","timestamp":"2022-07-11T18:05:57.309Z","logger":"node manager","msg":"node was previously un-managed, will be added as managed node now","node name":"ip-x-x-x-x.us-west-2.compute.internal","request":"update"}
{"level":"error","timestamp":"2022-07-11T18:38:28.359Z","logger":"node manager","msg":"removing the node from cache as it failed to initialize","node":"ip-x-x-x-x.us-west-2.compute.internal","operation":"Init","error":"failed to load instance details: failed to find instance i-xxxxxxxxxx details from EC2 API","stacktrace":"github.com/aws/amazon-vpc-resource-controller-k8s/pkg/node/manager.(*manager).performAsyncOperation\n\t/workspace/pkg/node/manager/manager.go:307\ngithub.com/aws/amazon-vpc-resource-controller-k8s/pkg/worker.(*worker).processNextItem\n\t/workspace/pkg/worker/worker.go:147\ngithub.com/aws/amazon-vpc-resource-controller-k8s/pkg/worker.(*worker).runWorker\n\t/workspace/pkg/worker/worker.go:132"}
{"level":"info","timestamp":"2022-07-11T18:41:20.846Z","logger":"controllers.Node","msg":"adding node","node":"/ip-x-x-x-x.us-west-2.compute.internal"}
{"level":"info","timestamp":"2022-07-11T18:41:20.846Z","logger":"node manager","msg":"node added as a managed node","node name":"ip-x-x-x-x.us-west-2.compute.internal","request":"add"}
{"level":"info","timestamp":"2022-07-11T18:42:37.930Z","logger":"node manager.node resource handler","msg":"node is not initialized yet, will not advertise the capacity","node name":"ip-x-x-x-x.us-west-2.compute.internal"}
{"level":"info","timestamp":"2022-07-11T18:42:42.470Z","logger":"branch eni provider","msg":"created a new trunk interface","node name":"ip-x-x-x-x.us-west-2.compute.internal","request":"initialize","instance ID":{},"trunk id":"eni-xxxxxxxxx"}
{"level":"info","timestamp":"2022-07-11T18:42:42.470Z","logger":"branch eni provider","msg":"trunk added to cache successfully","node":"ip-x-x-x-x.us-west-2.compute.internal"}
{"level":"error","timestamp":"2022-07-11T18:42:53.725Z","logger":"branch eni provider","msg":"trunk already exist in cache","node":"ip-x-x-x-x.us-west-2.compute.internal","error":"trunk eni already exist in cache","stacktrace":"github.com/aws/amazon-vpc-resource-controller-k8s/pkg/provider/branch.(*branchENIProvider).addTrunkToCache\n\t/workspace/pkg/provider/branch/provider.go:421\ngithub.com/aws/amazon-vpc-resource-controller-k8s/pkg/provider/branch.(*branchENIProvider).InitResource\n\t/workspace/pkg/provider/branch/provider.go:172\ngithub.com/aws/amazon-vpc-resource-controller-k8s/pkg/node.(*node).InitResources\n\t/workspace/pkg/node/node.go:130\ngithub.com/aws/amazon-vpc-resource-controller-k8s/pkg/node/manager.(*manager).performAsyncOperation\n\t/workspace/pkg/node/manager/manager.go:305\ngithub.com/aws/amazon-vpc-resource-controller-k8s/pkg/worker.(*worker).processNextItem\n\t/workspace/pkg/worker/worker.go:147\ngithub.com/aws/amazon-vpc-resource-controller-k8s/pkg/worker.(*worker).runWorker\n\t/workspace/pkg/worker/worker.go:132"}
{"level":"error","timestamp":"2022-07-11T18:42:53.725Z","logger":"node manager.node resource handler","msg":"failed to init resource","node name":"ip-x-x-x-x.us-west-2.compute.internal","error":"trunk eni already exist in cache","stacktrace":"github.com/aws/amazon-vpc-resource-controller-k8s/pkg/node.(*node).InitResources\n\t/workspace/pkg/node/node.go:145\ngithub.com/aws/amazon-vpc-resource-controller-k8s/pkg/node/manager.(*manager).performAsyncOperation\n\t/workspace/pkg/node/manager/manager.go:305\ngithub.com/aws/amazon-vpc-resource-controller-k8s/pkg/worker.(*worker).processNextItem\n\t/workspace/pkg/worker/worker.go:147\ngithub.com/aws/amazon-vpc-resource-controller-k8s/pkg/worker.(*worker).runWorker\n\t/workspace/pkg/worker/worker.go:132"}
{"level":"error","timestamp":"2022-07-11T18:42:53.725Z","logger":"node manager","msg":"removing the node from cache as it failed to initialize","node":"ip-x-x-x-x.us-west-2.compute.internal","operation":"Init","error":"failed to init resources: trunk eni already exist in cache","stacktrace":"github.com/aws/amazon-vpc-resource-controller-k8s/pkg/node/manager.(*manager).performAsyncOperation\n\t/workspace/pkg/node/manager/manager.go:307\ngithub.com/aws/amazon-vpc-resource-controller-k8s/pkg/worker.(*worker).processNextItem\n\t/workspace/pkg/worker/worker.go:147\ngithub.com/aws/amazon-vpc-resource-controller-k8s/pkg/worker.(*worker).runWorker\n\t/workspace/pkg/worker/worker.go:132"}

Environment:

  • Kubernetes version (use kubectl version): EKS 1.21
  • CNI Version: v1.11.0
  • OS (Linux/Windows): Linux

Reconcile Security Groups for Pods

Describe the Bug:

When using Karpenter and SGP, pods are stuck in ContainerCreating for up to 30 minutes. aws/karpenter-provider-aws#1252

Observed Behavior:
Security groups are only attached for pods that are bound after a Node is Ready. For pods that define pod.spec.nodeName or are bound by optimistic schedulers (e.g. Karpenter), the pod will remain stuck in ContainerCreating.

Expected Behavior:
The VPC Resource Controller should trigger a reconciliation loop for all pods on a node when the node becomes ready.
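
A minimal sketch of one way such a fan-out could be wired up with controller-runtime v0.10-style APIs (mgr, the builder, and the spec.nodeName field index are assumptions, not the controller's actual implementation):

package controllers

import (
    "context"

    corev1 "k8s.io/api/core/v1"
    "k8s.io/apimachinery/pkg/types"
    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/client"
    "sigs.k8s.io/controller-runtime/pkg/handler"
    "sigs.k8s.io/controller-runtime/pkg/reconcile"
    "sigs.k8s.io/controller-runtime/pkg/source"
)

// addNodeWatch attaches a Node watch to an existing pod controller builder so
// that every pod scheduled on a node is re-enqueued when that node changes
// (for example, when it becomes Ready). Illustrative only.
func addNodeWatch(mgr ctrl.Manager, b *ctrl.Builder) *ctrl.Builder {
    mapFn := handler.EnqueueRequestsFromMapFunc(func(obj client.Object) []reconcile.Request {
        podList := &corev1.PodList{}
        // Assumes a "spec.nodeName" field index has been registered on the manager's cache.
        if err := mgr.GetClient().List(context.Background(), podList,
            client.MatchingFields{"spec.nodeName": obj.GetName()}); err != nil {
            return nil
        }
        reqs := make([]reconcile.Request, 0, len(podList.Items))
        for _, p := range podList.Items {
            reqs = append(reqs, reconcile.Request{
                NamespacedName: types.NamespacedName{Namespace: p.Namespace, Name: p.Name},
            })
        }
        return reqs
    })
    return b.Watches(&source.Kind{Type: &corev1.Node{}}, mapFn)
}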

How to reproduce it (as minimally and precisely as possible):

Additional Context:

Environment:

  • Kubernetes version (use kubectl version):
  • CNI Version
  • OS (Linux/Windows):

Tags are set to Nil when creating network interfaces in https://github.com/aws/amazon-vpc-resource-controller-k8s/blob/master/pkg/provider/branch/trunk/trunk.go#L191

This does not work for any enterprise deployments where IAM deployer policies revolve around tag conditions for resources.

Please provide a mechanism similar to the AWS VPC CNI plugin's ADDITIONAL_ENI_TAGS (https://github.com/aws/amazon-vpc-cni-k8s#additional_eni_tags-v160) to tag the resources with custom tags so that governance and ownership can easily be defined.
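
For reference, the EC2 CreateNetworkInterface API can tag the interface at creation time via TagSpecifications; a minimal AWS SDK for Go (v1) sketch, assuming a recent SDK version and using placeholder tag keys/values:

package main

import (
    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/ec2"
)

// createTaggedENI creates a network interface with custom tags applied at
// creation time instead of leaving TagSpecifications nil. The tag keys and
// values below are placeholders.
func createTaggedENI(subnetID string, securityGroupIDs []string) (*ec2.CreateNetworkInterfaceOutput, error) {
    svc := ec2.New(session.Must(session.NewSession()))
    input := &ec2.CreateNetworkInterfaceInput{
        SubnetId: aws.String(subnetID),
        Groups:   aws.StringSlice(securityGroupIDs),
        TagSpecifications: []*ec2.TagSpecification{
            {
                ResourceType: aws.String(ec2.ResourceTypeNetworkInterface),
                Tags: []*ec2.Tag{
                    {Key: aws.String("team"), Value: aws.String("platform")},
                    {Key: aws.String("cost-center"), Value: aws.String("12345")},
                },
            },
        },
    }
    return svc.CreateNetworkInterface(input)
}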

vpc-resource-validating-webhook causing pods to fail to create sporadically even though we're not using it

Describe the Bug:

We don't use the security group for pods feature, so we should not get errors creating pods.

We tried to create a regular pod but received the following error message from the webhook:

https://github.com/aws/amazon-vpc-resource-controller-k8s/blob/master/webhooks/core/pod_webhook.go#L94

kubernetes.client.exceptions.ApiException: (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Audit-Id': 'b1fea4b3-b577-4845-a4ff-9bd167448adf', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'Date': 'Wed, 23 Jun 2021 20:00:19 GMT', 'Content-Length': '290'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"admission webhook \"mpod.vpc.k8s.aws\" denied the request: Webhood encountered error to Get or List object from k8s cache.","reason":"Webhood encountered error to Get or List object from k8s cache.","code":403}

I didn't even know that this admission webhook was installed by default on the EKS clusters until we got this error message.

Observed Behavior:

We got an error from this webhook when trying to create a pod.

Expected Behavior:

I expect the admission webhook not to cause any issues, especially since we're not using the pod security group feature.

How to reproduce it (as minimally and precisely as possible):

We're not sure how to reproduce it; this issue happens rarely, after creating lots of pods over time.

Additional Context:

Environment:

  • Kubernetes version (use kubectl version): v1.19.6-eks-49a6c0
  • CNI Version: v1.7.5-eksbuild.1
  • OS (Linux/Windows): Amazon Linux 2

Support RuntimeClasses for Windows Pod Scheduling

Is your feature request related to a problem? Please describe.
The Kubernetes documentation recommends using RuntimeClasses to simplify Windows Pod scheduling; however, this is not possible when using EKS because the webhook code only looks for the NodeSelector configuration in the PodSpec.

Describe the solution you'd like
Check whether runtimeClassName is defined in the PodSpec; if so, search for the NodeSelector config in the specified RuntimeClass, otherwise look for the NodeSelector in the PodSpec.
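
A hedged sketch of that lookup (hypothetical helper, not the webhook's actual code), falling back to the RuntimeClass's scheduling.nodeSelector when one is referenced:

package webhook

import (
    "context"

    corev1 "k8s.io/api/core/v1"
    nodev1 "k8s.io/api/node/v1"
    "k8s.io/apimachinery/pkg/types"
    "sigs.k8s.io/controller-runtime/pkg/client"
)

// effectiveNodeSelector returns the RuntimeClass's scheduling nodeSelector
// when the pod references a RuntimeClass that defines one, and otherwise the
// PodSpec's own node selector. Illustrative only.
func effectiveNodeSelector(ctx context.Context, c client.Client, pod *corev1.Pod) (map[string]string, error) {
    if pod.Spec.RuntimeClassName == nil {
        return pod.Spec.NodeSelector, nil
    }
    rc := &nodev1.RuntimeClass{}
    if err := c.Get(ctx, types.NamespacedName{Name: *pod.Spec.RuntimeClassName}, rc); err != nil {
        return nil, err
    }
    if rc.Scheduling != nil && len(rc.Scheduling.NodeSelector) > 0 {
        return rc.Scheduling.NodeSelector, nil
    }
    return pod.Spec.NodeSelector, nil
}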

Describe alternatives you've considered

Additional context
Take into account that defining different NodeSelector configurations in both the PodSpec and the RuntimeClass will cause the Kubernetes scheduler to return an error.
