Comments (23)
@sharmasushant Following up on this, thanks!
from azure-container-networking.
Thanks for reporting this. Adding a few other Azure folks who are familiar with CNI: @lachie83 @anhowe
from azure-container-networking.
@rocketraman Thanks for letting us know! We will look into this.
from azure-container-networking.
I'd also be interested in a workaround if you can provide one. Maybe some way to force CNI to recycle unused IPs?
from azure-container-networking.
@rocketraman - Can you send us the CNI logs from the node causing the issue?
Location of Logs:
/var/log/azure-vnet.log and /var/log/azure-vnet-ipam.log
from azure-container-networking.
@tamilmani1989 On two of the five nodes in my cluster, I see DEL commands failing in the logs. On the other three nodes they have completed successfully. All five nodes were created at cluster creation time -- there is nothing special about the ones on which the DEL commands are failing. Here is a snippet of the failing DEL commands from one of those nodes.
2017/09/30 14:34:58 [cni-net] Plugin started.
2017/09/30 14:34:58 [cni-net] Processing DEL command with args {ContainerID:f363b81299c8676e0e5b491b05e31649d39ebb07b076864b7ace76064e2e300c Netns: IfName:eth0 Args:IgnoreUnknown=1;K8S_POD_NAMESPACE=default;K8S_POD_NAME=yyy-1506541740-fms3c;K8S_POD_INFRA_CONTAINER_ID=f363b81299c8676e0e5b491b05e31649d39ebb07b076864b7ace76064e2e300c Path:/opt/azure-vnet/bin:/opt/cni/bin}.
2017/09/30 14:34:58 [cni-net] Read network configuration &{CNIVersion:0.2.0 Name:azure Type:azure-vnet Mode:bridge Master: Bridge:azure0 LogLevel: LogTarget: Ipam:{Type:azure-vnet-ipam Environment: AddrSpace: Subnet: Address: QueryInterval:}}.
2017/09/30 14:34:58 [net] Deleting endpoint f363b812-eth0 from network azure.
2017/09/30 14:34:58 [net] Deleting veth pair azvethf363b81 eth0.
2017/09/30 14:34:58 [net] Failed to delete veth pair azvethf363b81: route ip+net: no such network interface.
2017/09/30 14:34:58 [net] Failed to delete endpoint f363b812-eth0, err:route ip+net: no such network interface.
2017/09/30 14:34:58 [azure-vnet] Failed to delete endpoint: route ip+net: no such network interface.
2017/09/30 14:34:58 [cni-net] DEL command completed with err:Failed to delete endpoint: route ip+net: no such network interface.
2017/09/30 14:34:58 [cni-net] Plugin stopped.
2017/09/30 14:35:58 [cni-net] Plugin azure-vnet version v0.9.
2017/09/30 14:35:58 [cni-net] Running on Linux version 4.4.0-96-generic (buildd@lgw01-10) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4) ) #119-Ubuntu SMP Tue Sep 12 14:59:54 UTC 2017
2017/09/30 14:35:58 [net] Network interface: {Index:1 MTU:65536 Name:lo HardwareAddr: Flags:up|loopback} with IP addresses: [127.0.0.1/8 ::1/128]
2017/09/30 14:35:58 [net] Network interface: {Index:2 MTU:1500 Name:eth0 HardwareAddr:00:0d:3a:f4:4c:84 Flags:up|broadcast} with IP addresses: [fe80::20d:3aff:fef4:4c84/64]
2017/09/30 14:35:58 [net] Network interface: {Index:3 MTU:1500 Name:docker0 HardwareAddr:02:42:dc:35:55:9f Flags:up|broadcast|multicast} with IP addresses: [172.17.0.1/16 fe80::42:dcff:fe35:559f/64]
2017/09/30 14:35:58 [net] Network interface: {Index:6 MTU:1500 Name:azure0 HardwareAddr:00:0d:3a:f4:4c:84 Flags:up|broadcast|multicast} with IP addresses: [10.2.0.189/17 fe80::20d:3aff:fef4:4c84/64]
2017/09/30 14:35:58 [net] Network interface: {Index:19 MTU:1500 Name:azveth28b1e3f HardwareAddr:2e:21:ca:a4:c9:f0 Flags:up|broadcast} with IP addresses: [fe80::2c21:caff:fea4:c9f0/64]
2017/09/30 14:35:58 [net] Network interface: {Index:21 MTU:1500 Name:azveth311e7b3 HardwareAddr:7e:6f:ae:1b:57:fc Flags:up|broadcast} with IP addresses: [fe80::7c6f:aeff:fe1b:57fc/64]
2017/09/30 14:35:58 [net] Network interface: {Index:33 MTU:1500 Name:azveth0a6ad70 HardwareAddr:ca:eb:65:0e:88:b7 Flags:up|broadcast} with IP addresses: [fe80::c8eb:65ff:fe0e:88b7/64]
2017/09/30 14:35:58 [net] Network interface: {Index:997 MTU:1500 Name:azvethafa8f14 HardwareAddr:ba:83:b7:bb:27:f8 Flags:up|broadcast} with IP addresses: [fe80::b883:b7ff:febb:27f8/64]
2017/09/30 14:35:58 [net] Network interface: {Index:1005 MTU:1500 Name:azvethab20fbc HardwareAddr:16:2f:c3:44:5b:ff Flags:up|broadcast} with IP addresses: [fe80::142f:c3ff:fe44:5bff/64]
2017/09/30 14:35:58 [net] Store timestamp is 2017-09-28 07:57:01.325437193 +0000 UTC.
2017/09/30 14:35:58 [net] Restored state, &{Version:v0.9 TimeStamp:2017-09-28 07:57:01.329397489 +0000 UTC ExternalInterfaces:map[eth0:0xc420138000] store:0xc420016de0 Mutex:{state:0 sema:0}}
And here are the logs from the other failing node:
2017/09/24 16:16:56 [cni-net] Plugin started.
2017/09/24 16:16:56 [cni-net] Processing DEL command with args {ContainerID:c62015c61b57966292b1890cce11063f4ae84e8693275060df75269af6044f15 Netns: IfName:eth0 Args:IgnoreUnknown=1;K8S_POD_NAMESPACE=default;K8S_POD_NAME=xxx-1760939698-f7sv9;K8S_POD_INFRA_CONTAINER_ID=c62015c61b57966292b1890cce11063f4ae84e8693275060df75269af6044f15 Path:/opt/azure-vnet/bin:/opt/cni/bin}.
2017/09/24 16:16:56 [cni-net] Read network configuration &{CNIVersion:0.2.0 Name:azure Type:azure-vnet Mode:bridge Master: Bridge:azure0 LogLevel: LogTarget: Ipam:{Type:azure-vnet-ipam Environment: AddrSpace: Subnet: Address: QueryInterval:}}.
2017/09/24 16:16:56 [net] Deleting endpoint c62015c6-eth0 from network azure.
2017/09/24 16:16:56 [net] Deleting veth pair azvethc62015c eth0.
2017/09/24 16:16:56 [net] Failed to delete veth pair azvethc62015c: route ip+net: no such network interface.
2017/09/24 16:16:56 [net] Failed to delete endpoint c62015c6-eth0, err:route ip+net: no such network interface.
2017/09/24 16:16:56 [azure-vnet] Failed to delete endpoint: route ip+net: no such network interface.
2017/09/24 16:16:56 [cni-net] DEL command completed with err:Failed to delete endpoint: route ip+net: no such network interface.
2017/09/24 16:16:56 [cni-net] Plugin stopped.
2017/09/24 16:17:56 [cni-net] Plugin azure-vnet version v0.9.
2017/09/24 16:17:56 [cni-net] Running on Linux version 4.4.0-96-generic (buildd@lgw01-10) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4) ) #119-Ubuntu SMP Tue Sep 12 14:59:54 UTC 2017
2017/09/24 16:17:56 [net] Network interface: {Index:1 MTU:65536 Name:lo HardwareAddr: Flags:up|loopback} with IP addresses: [127.0.0.1/8 ::1/128]
2017/09/24 16:17:56 [net] Network interface: {Index:2 MTU:1500 Name:eth0 HardwareAddr:00:0d:3a:f4:40:d8 Flags:up|broadcast} with IP addresses: [fe80::20d:3aff:fef4:40d8/64]
2017/09/24 16:17:56 [net] Network interface: {Index:3 MTU:1500 Name:docker0 HardwareAddr:02:42:b5:1e:06:b2 Flags:up|broadcast|multicast} with IP addresses: [172.17.0.1/16 fe80::42:b5ff:fe1e:6b2/64]
2017/09/24 16:17:56 [net] Network interface: {Index:6 MTU:1500 Name:azure0 HardwareAddr:00:0d:3a:f4:40:d8 Flags:up|broadcast|multicast} with IP addresses: [10.2.0.34/17 fe80::20d:3aff:fef4:40d8/64]
2017/09/24 16:17:56 [net] Network interface: {Index:7 MTU:1500 Name:azvethf0707dd HardwareAddr:96:4d:44:62:41:41 Flags:up|broadcast} with IP addresses: [fe80::944d:44ff:fe62:4141/64]
2017/09/24 16:17:56 [net] Network interface: {Index:9 MTU:1500 Name:azveth2756177 HardwareAddr:d6:9d:87:8c:fb:10 Flags:up|broadcast} with IP addresses: [fe80::d49d:87ff:fe8c:fb10/64]
2017/09/24 16:17:56 [net] Network interface: {Index:11 MTU:1500 Name:azveth38821cd HardwareAddr:9e:43:20:62:15:51 Flags:up|broadcast} with IP addresses: [fe80::9c43:20ff:fe62:1551/64]
2017/09/24 16:17:56 [net] Network interface: {Index:33 MTU:1500 Name:azvetheb8b62e HardwareAddr:52:70:e3:04:55:d3 Flags:up|broadcast} with IP addresses: [fe80::5070:e3ff:fe04:55d3/64]
2017/09/24 16:17:56 [net] Network interface: {Index:35 MTU:1500 Name:azveth4a96cc1 HardwareAddr:c6:20:f6:59:89:ea Flags:up|broadcast} with IP addresses: [fe80::c420:f6ff:fe59:89ea/64]
2017/09/24 16:17:56 [net] Network interface: {Index:41 MTU:1500 Name:azvethbed0aa2 HardwareAddr:da:00:d2:e5:ed:1a Flags:up|broadcast} with IP addresses: [fe80::d800:d2ff:fee5:ed1a/64]
2017/09/24 16:17:56 [net] Store timestamp is 2017-09-23 18:21:22.550734741 +0000 UTC.
2017/09/24 16:17:56 [net] Restored state, &{Version:v0.9 TimeStamp:2017-09-23 18:21:22.553208047 +0000 UTC ExternalInterfaces:map[eth0:0xc4200180c0] store:0xc420066d80 Mutex:{state:0 sema:0}}
Every single DEL command on these two nodes is failing in the same way.
For contrast, here are some logs for the DEL command on one of the three working nodes.
2017/09/21 03:48:12 [cni-net] Plugin started.
2017/09/21 03:48:12 [cni-net] Processing DEL command with args {ContainerID:d7462e28528cb029b3c77732913ed044185284c47748872ab635dee5f3050243 Netns:/proc/2093/ns/net IfName:eth0 Args:IgnoreUnknown=1;K8S_POD_NAMESPACE=kube-system;K8S_POD_NAME=heapster-2888171832-vdhg1;K8S_POD_INFRA_CONTAINER_ID=d7462e28528cb029b3c77732913ed044185284c47748872ab635dee5f3050243 Path:/opt/azure-vnet/bin:/opt/cni/bin}.
2017/09/21 03:48:12 [cni-net] Read network configuration &{CNIVersion:0.2.0 Name:azure Type:azure-vnet Mode:bridge Master: Bridge:azure0 LogLevel: LogTarget: Ipam:{Type:azure-vnet-ipam Environment: AddrSpace: Subnet: Address: QueryInterval:}}.
2017/09/21 03:48:12 [net] Deleting endpoint d7462e28-eth0 from network azure.
2017/09/21 03:48:12 [net] Deleting veth pair azvethd7462e2 eth0.
2017/09/21 03:48:13 [net] Deleting ARP reply rule for IP address 10.2.0.133/17 on d7462e28-eth0.
2017/09/21 03:48:13 [net] Deleting MAC DNAT rule for IP address 10.2.0.133/17 on d7462e28-eth0.
2017/09/21 03:48:13 [net] Deleted endpoint &{Id:d7462e28-eth0 HnsId: SandboxKey: IfName:eth0 HostIfName:azvethd7462e2 MacAddress:0e:35:4a:88:b0:d0 IPAddresses:[{IP:10.2.0.133 Mask:ffff8000}] Gateways:[10.2.0.1]}.
2017/09/21 03:48:13 [net] Save succeeded.
2017/09/21 03:48:13 [cni] Calling plugin azure-vnet-ipam DEL nwCfg:&{CNIVersion:0.2.0 Name:azure Type:azure-vnet Mode:bridge Master: Bridge:azure0 LogLevel: LogTarget: Ipam:{Type:azure-vnet-ipam Environment: AddrSpace: Subnet:10.2.0.0/17 Address:10.2.0.133 QueryInterval:}}.
2017/09/21 03:48:13 [cni] Plugin azure-vnet-ipam returned err:<nil>.
2017/09/21 03:48:13 [cni-net] DEL command completed with err:<nil>.
2017/09/21 03:48:13 [cni-net] Plugin stopped.
2017/09/21 03:48:13 [cni-net] Plugin azure-vnet version v0.9.
2017/09/21 03:48:13 [cni-net] Running on Linux version 4.4.0-96-generic (buildd@lgw01-10) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4) ) #119-Ubuntu SMP Tue Sep 12 14:59:54 UTC 2017
2017/09/21 03:48:13 [net] Network interface: {Index:1 MTU:65536 Name:lo HardwareAddr: Flags:up|loopback} with IP addresses: [127.0.0.1/8 ::1/128]
2017/09/21 03:48:13 [net] Network interface: {Index:2 MTU:1500 Name:eth0 HardwareAddr:00:0d:3a:f4:44:42 Flags:up|broadcast} with IP addresses: [fe80::20d:3aff:fef4:4442/64]
2017/09/21 03:48:13 [net] Network interface: {Index:3 MTU:1500 Name:docker0 HardwareAddr:02:42:d3:09:07:e4 Flags:up|broadcast|multicast} with IP addresses: [172.17.0.1/16 fe80::42:d3ff:fe09:7e4/64]
2017/09/21 03:48:13 [net] Network interface: {Index:6 MTU:1500 Name:azure0 HardwareAddr:00:0d:3a:f4:44:42 Flags:up|broadcast|multicast} with IP addresses: [10.2.0.127/17 fe80::20d:3aff:fef4:4442/64]
2017/09/21 03:48:13 [net] Store timestamp is 2017-09-21 03:48:13.004817097 +0000 UTC.
2017/09/21 03:48:13 [net] Restored state, &{Version:v0.9 TimeStamp:2017-09-21 03:48:13.0061967 +0000 UTC ExternalInterfaces:map[eth0:0xc420018600] store:0xc420016de0 Mutex:{state:0 sema:0}}
Let me know if this is sufficient. If you provide some contact info, I can send more complete logs privately.
from azure-container-networking.
@rocketraman - Can you please attach both the azure-vnet and azure-vnet-ipam logs from any one of the failing nodes by dragging and dropping them here?
from azure-container-networking.
@tamilmani1989 Ok
azure-vnet-logs.zip
from azure-container-networking.
@rocketraman Have you manually deleted any of the veth interfaces (e.g. azvethf363b81) on the node?
from azure-container-networking.
@rocketraman Is it possible that delete is getting called before create is finished? We will need logs that contain both the creation and the deletion of containers. Your previous logs do not seem to include creation.
Edit: saw that you are using acs-engine.
from azure-container-networking.
@tamilmani1989 No, I have not deleted any interfaces manually. I did delete one vnet IP address from the portal a couple of days ago, as I was trying to find a way to recover from this. I quickly restored it, realizing that approach wasn't going to get me anywhere. However, any logs from September should not be affected by this at all.
@sharmasushant I uploaded all of the logs in the zip file above. Yes, I am using acs-engine.
from azure-container-networking.
We looked further. It seems that the container namespace and veth are being removed before the call is made to CNI to delete the endpoint. As per the CNI spec, we need to handle this and release IPs even if the veth or container namespace is missing. We will provide a bug fix for this. Thanks @rocketraman for reporting this.
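To illustrate the behaviour described above, here is a minimal Go sketch of a delete path that tolerates a missing veth and still releases the IP. It is not the actual fix: `Endpoint`, `LinkDeleter`, `IPReleaser`, `ErrInterfaceNotFound`, the two fakes, and the example IP address are all hypothetical names and values made up for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrInterfaceNotFound is a hypothetical sentinel standing in for the
// "no such network interface" error seen in the failing logs.
var ErrInterfaceNotFound = errors.New("no such network interface")

// Endpoint is a hypothetical, trimmed-down record of a container endpoint.
type Endpoint struct {
	ID         string
	HostIfName string
	IPAddress  string
}

// LinkDeleter and IPReleaser are hypothetical abstractions over the
// network and IPAM operations, so the control flow can be shown alone.
type LinkDeleter interface{ DeleteLink(name string) error }
type IPReleaser interface{ ReleaseIP(addr string) error }

// DeleteEndpoint removes the host veth and then releases the IP.
// A veth that is already gone is treated as already cleaned up, so the
// IP is still released instead of leaking.
func DeleteEndpoint(ep Endpoint, links LinkDeleter, ipam IPReleaser) error {
	if err := links.DeleteLink(ep.HostIfName); err != nil {
		if !errors.Is(err, ErrInterfaceNotFound) {
			return fmt.Errorf("delete veth %s: %w", ep.HostIfName, err)
		}
		// Interface is missing: nothing left to delete, keep going.
	}
	if err := ipam.ReleaseIP(ep.IPAddress); err != nil {
		return fmt.Errorf("release IP %s: %w", ep.IPAddress, err)
	}
	return nil
}

// goneLinks is a fake LinkDeleter whose interfaces have already vanished.
type goneLinks struct{}

func (goneLinks) DeleteLink(name string) error {
	return fmt.Errorf("route ip+net: %w", ErrInterfaceNotFound)
}

// memIPAM is a fake IPReleaser that records the addresses it released.
type memIPAM struct{ released []string }

func (m *memIPAM) ReleaseIP(addr string) error {
	m.released = append(m.released, addr)
	return nil
}
```

Calling `DeleteEndpoint` with `goneLinks{}` and a `*memIPAM` returns nil and records the address as released, whereas in the failing logs the DEL stops at the veth error and the IPAM plugin is never called.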
from azure-container-networking.
@sharmasushant Great!
Now that you have found the issue, are you able to provide a workaround?
from azure-container-networking.
Any news on the fix for this serious issue?
from azure-container-networking.
@rocketraman Sorry for the delay. Some internal deliverables required our attention, so this got delayed. We are targeting next week for a new release that will have a fix for the issue. Will keep you posted.
from azure-container-networking.
@tamilmani1989 @sharmasushant I see you guys merged e246fec that fixes this... awesome! A reference to this issue number somewhere in the commit log would have been nice to confirm this is fixed. Assuming that it is, when do you expect this to be released and included in acs-engine?
from azure-container-networking.
@sharmasushant I believe acs-engine is downloading the CNI plugin from https://acs-mirror.azureedge.net/cni/cni-plugins-amd64-latest.tgz. Can you make updating that URL a part of the release process for CNI?
cc: @colemickens
from azure-container-networking.
Sure @rocketraman, we will make the process of updating acs-engine more streamlined.
from azure-container-networking.
This issue is open, but the comments seem to indicate this has been fixed.
What's the current state?
from azure-container-networking.
@edevil Yes, the issue is now fixed.
from azure-container-networking.
Closing the issue would make it clear.
from azure-container-networking.
This is now fixed. Closing the issue.
from azure-container-networking.