Comments (23)

rocketraman commented on July 19, 2024

@sharmasushant Following up on this, thanks!

from azure-container-networking.

seanknox commented on July 19, 2024

Thanks for reporting this. Adding a few other Azure folks who are familiar with CNI: @lachie83 @anhowe

sharmasushant commented on July 19, 2024

@rocketraman Thanks for letting us know! We will look into this.

rocketraman commented on July 19, 2024

I'd also be interested in a workaround, if you can provide one. Maybe some way to force CNI to recycle unused IPs?

jeffbarnes769 commented on July 19, 2024

tamilmani1989 commented on July 19, 2024

@rocketraman Can you send us the CNI logs from the node that is causing the issue?
Log locations:
/var/log/azure-vnet.log and /var/log/azure-vnet-ipam.log

rocketraman commented on July 19, 2024

@tamilmani1989 On two of the five nodes in my cluster, I see DEL commands failing in the logs. On the other three nodes they have completed successfully. All five nodes were created at cluster creation time; there is nothing special about the ones on which the DEL commands are failing. Here is a snippet of the failing DEL commands from those nodes.

2017/09/30 14:34:58 [cni-net] Plugin started.
2017/09/30 14:34:58 [cni-net] Processing DEL command with args {ContainerID:f363b81299c8676e0e5b491b05e31649d39ebb07b076864b7ace76064e2e300c Netns: IfName:eth0 Args:IgnoreUnknown=1;K8S_POD_NAMESPACE=default;K8S_POD_NAME=yyy-1506541740-fms3c;K8S_POD_INFRA_CONTAINER_ID=f363b81299c8676e0e5b491b05e31649d39ebb07b076864b7ace76064e2e300c Path:/opt/azure-vnet/bin:/opt/cni/bin}.
2017/09/30 14:34:58 [cni-net] Read network configuration &{CNIVersion:0.2.0 Name:azure Type:azure-vnet Mode:bridge Master: Bridge:azure0 LogLevel: LogTarget: Ipam:{Type:azure-vnet-ipam Environment: AddrSpace: Subnet: Address: QueryInterval:}}.
2017/09/30 14:34:58 [net] Deleting endpoint f363b812-eth0 from network azure.
2017/09/30 14:34:58 [net] Deleting veth pair azvethf363b81 eth0.
2017/09/30 14:34:58 [net] Failed to delete veth pair azvethf363b81: route ip+net: no such network interface.
2017/09/30 14:34:58 [net] Failed to delete endpoint f363b812-eth0, err:route ip+net: no such network interface.
2017/09/30 14:34:58 [azure-vnet] Failed to delete endpoint: route ip+net: no such network interface.
2017/09/30 14:34:58 [cni-net] DEL command completed with err:Failed to delete endpoint: route ip+net: no such network interface.
2017/09/30 14:34:58 [cni-net] Plugin stopped.
2017/09/30 14:35:58 [cni-net] Plugin azure-vnet version v0.9.
2017/09/30 14:35:58 [cni-net] Running on Linux version 4.4.0-96-generic (buildd@lgw01-10) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4) ) #119-Ubuntu SMP Tue Sep 12 14:59:54 UTC 2017
2017/09/30 14:35:58 [net] Network interface: {Index:1 MTU:65536 Name:lo HardwareAddr: Flags:up|loopback} with IP addresses: [127.0.0.1/8 ::1/128]
2017/09/30 14:35:58 [net] Network interface: {Index:2 MTU:1500 Name:eth0 HardwareAddr:00:0d:3a:f4:4c:84 Flags:up|broadcast} with IP addresses: [fe80::20d:3aff:fef4:4c84/64]
2017/09/30 14:35:58 [net] Network interface: {Index:3 MTU:1500 Name:docker0 HardwareAddr:02:42:dc:35:55:9f Flags:up|broadcast|multicast} with IP addresses: [172.17.0.1/16 fe80::42:dcff:fe35:559f/64]
2017/09/30 14:35:58 [net] Network interface: {Index:6 MTU:1500 Name:azure0 HardwareAddr:00:0d:3a:f4:4c:84 Flags:up|broadcast|multicast} with IP addresses: [10.2.0.189/17 fe80::20d:3aff:fef4:4c84/64]
2017/09/30 14:35:58 [net] Network interface: {Index:19 MTU:1500 Name:azveth28b1e3f HardwareAddr:2e:21:ca:a4:c9:f0 Flags:up|broadcast} with IP addresses: [fe80::2c21:caff:fea4:c9f0/64]
2017/09/30 14:35:58 [net] Network interface: {Index:21 MTU:1500 Name:azveth311e7b3 HardwareAddr:7e:6f:ae:1b:57:fc Flags:up|broadcast} with IP addresses: [fe80::7c6f:aeff:fe1b:57fc/64]
2017/09/30 14:35:58 [net] Network interface: {Index:33 MTU:1500 Name:azveth0a6ad70 HardwareAddr:ca:eb:65:0e:88:b7 Flags:up|broadcast} with IP addresses: [fe80::c8eb:65ff:fe0e:88b7/64]
2017/09/30 14:35:58 [net] Network interface: {Index:997 MTU:1500 Name:azvethafa8f14 HardwareAddr:ba:83:b7:bb:27:f8 Flags:up|broadcast} with IP addresses: [fe80::b883:b7ff:febb:27f8/64]
2017/09/30 14:35:58 [net] Network interface: {Index:1005 MTU:1500 Name:azvethab20fbc HardwareAddr:16:2f:c3:44:5b:ff Flags:up|broadcast} with IP addresses: [fe80::142f:c3ff:fe44:5bff/64]
2017/09/30 14:35:58 [net] Store timestamp is 2017-09-28 07:57:01.325437193 +0000 UTC.
2017/09/30 14:35:58 [net] Restored state, &{Version:v0.9 TimeStamp:2017-09-28 07:57:01.329397489 +0000 UTC ExternalInterfaces:map[eth0:0xc420138000] store:0xc420016de0 Mutex:{state:0 sema:0}}

And here are the logs from the other failing node:

2017/09/24 16:16:56 [cni-net] Plugin started.
2017/09/24 16:16:56 [cni-net] Processing DEL command with args {ContainerID:c62015c61b57966292b1890cce11063f4ae84e8693275060df75269af6044f15 Netns: IfName:eth0 Args:IgnoreUnknown=1;K8S_POD_NAMESPACE=default;K8S_POD_NAME=xxx-1760939698-f7sv9;K8S_POD_INFRA_CONTAINER_ID=c62015c61b57966292b1890cce11063f4ae84e8693275060df75269af6044f15 Path:/opt/azure-vnet/bin:/opt/cni/bin}.
2017/09/24 16:16:56 [cni-net] Read network configuration &{CNIVersion:0.2.0 Name:azure Type:azure-vnet Mode:bridge Master: Bridge:azure0 LogLevel: LogTarget: Ipam:{Type:azure-vnet-ipam Environment: AddrSpace: Subnet: Address: QueryInterval:}}.
2017/09/24 16:16:56 [net] Deleting endpoint c62015c6-eth0 from network azure.
2017/09/24 16:16:56 [net] Deleting veth pair azvethc62015c eth0.
2017/09/24 16:16:56 [net] Failed to delete veth pair azvethc62015c: route ip+net: no such network interface.
2017/09/24 16:16:56 [net] Failed to delete endpoint c62015c6-eth0, err:route ip+net: no such network interface.
2017/09/24 16:16:56 [azure-vnet] Failed to delete endpoint: route ip+net: no such network interface.
2017/09/24 16:16:56 [cni-net] DEL command completed with err:Failed to delete endpoint: route ip+net: no such network interface.
2017/09/24 16:16:56 [cni-net] Plugin stopped.
2017/09/24 16:17:56 [cni-net] Plugin azure-vnet version v0.9.
2017/09/24 16:17:56 [cni-net] Running on Linux version 4.4.0-96-generic (buildd@lgw01-10) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4) ) #119-Ubuntu SMP Tue Sep 12 14:59:54 UTC 2017
2017/09/24 16:17:56 [net] Network interface: {Index:1 MTU:65536 Name:lo HardwareAddr: Flags:up|loopback} with IP addresses: [127.0.0.1/8 ::1/128]
2017/09/24 16:17:56 [net] Network interface: {Index:2 MTU:1500 Name:eth0 HardwareAddr:00:0d:3a:f4:40:d8 Flags:up|broadcast} with IP addresses: [fe80::20d:3aff:fef4:40d8/64]
2017/09/24 16:17:56 [net] Network interface: {Index:3 MTU:1500 Name:docker0 HardwareAddr:02:42:b5:1e:06:b2 Flags:up|broadcast|multicast} with IP addresses: [172.17.0.1/16 fe80::42:b5ff:fe1e:6b2/64]
2017/09/24 16:17:56 [net] Network interface: {Index:6 MTU:1500 Name:azure0 HardwareAddr:00:0d:3a:f4:40:d8 Flags:up|broadcast|multicast} with IP addresses: [10.2.0.34/17 fe80::20d:3aff:fef4:40d8/64]
2017/09/24 16:17:56 [net] Network interface: {Index:7 MTU:1500 Name:azvethf0707dd HardwareAddr:96:4d:44:62:41:41 Flags:up|broadcast} with IP addresses: [fe80::944d:44ff:fe62:4141/64]
2017/09/24 16:17:56 [net] Network interface: {Index:9 MTU:1500 Name:azveth2756177 HardwareAddr:d6:9d:87:8c:fb:10 Flags:up|broadcast} with IP addresses: [fe80::d49d:87ff:fe8c:fb10/64]
2017/09/24 16:17:56 [net] Network interface: {Index:11 MTU:1500 Name:azveth38821cd HardwareAddr:9e:43:20:62:15:51 Flags:up|broadcast} with IP addresses: [fe80::9c43:20ff:fe62:1551/64]
2017/09/24 16:17:56 [net] Network interface: {Index:33 MTU:1500 Name:azvetheb8b62e HardwareAddr:52:70:e3:04:55:d3 Flags:up|broadcast} with IP addresses: [fe80::5070:e3ff:fe04:55d3/64]
2017/09/24 16:17:56 [net] Network interface: {Index:35 MTU:1500 Name:azveth4a96cc1 HardwareAddr:c6:20:f6:59:89:ea Flags:up|broadcast} with IP addresses: [fe80::c420:f6ff:fe59:89ea/64]
2017/09/24 16:17:56 [net] Network interface: {Index:41 MTU:1500 Name:azvethbed0aa2 HardwareAddr:da:00:d2:e5:ed:1a Flags:up|broadcast} with IP addresses: [fe80::d800:d2ff:fee5:ed1a/64]
2017/09/24 16:17:56 [net] Store timestamp is 2017-09-23 18:21:22.550734741 +0000 UTC.
2017/09/24 16:17:56 [net] Restored state, &{Version:v0.9 TimeStamp:2017-09-23 18:21:22.553208047 +0000 UTC ExternalInterfaces:map[eth0:0xc4200180c0] store:0xc420066d80 Mutex:{state:0 sema:0}}

Every single DEL command on these two nodes is failing in the same way.

For contrast, here are some logs for the DEL command on one of the three working nodes.

2017/09/21 03:48:12 [cni-net] Plugin started.
2017/09/21 03:48:12 [cni-net] Processing DEL command with args {ContainerID:d7462e28528cb029b3c77732913ed044185284c47748872ab635dee5f3050243 Netns:/proc/2093/ns/net IfName:eth0 Args:IgnoreUnknown=1;K8S_POD_NAMESPACE=kube-system;K8S_POD_NAME=heapster-2888171832-vdhg1;K8S_POD_INFRA_CONTAINER_ID=d7462e28528cb029b3c77732913ed044185284c47748872ab635dee5f3050243 Path:/opt/azure-vnet/bin:/opt/cni/bin}.
2017/09/21 03:48:12 [cni-net] Read network configuration &{CNIVersion:0.2.0 Name:azure Type:azure-vnet Mode:bridge Master: Bridge:azure0 LogLevel: LogTarget: Ipam:{Type:azure-vnet-ipam Environment: AddrSpace: Subnet: Address: QueryInterval:}}.
2017/09/21 03:48:12 [net] Deleting endpoint d7462e28-eth0 from network azure.
2017/09/21 03:48:12 [net] Deleting veth pair azvethd7462e2 eth0.
2017/09/21 03:48:13 [net] Deleting ARP reply rule for IP address 10.2.0.133/17 on d7462e28-eth0.
2017/09/21 03:48:13 [net] Deleting MAC DNAT rule for IP address 10.2.0.133/17 on d7462e28-eth0.
2017/09/21 03:48:13 [net] Deleted endpoint &{Id:d7462e28-eth0 HnsId: SandboxKey: IfName:eth0 HostIfName:azvethd7462e2 MacAddress:0e:35:4a:88:b0:d0 IPAddresses:[{IP:10.2.0.133 Mask:ffff8000}] Gateways:[10.2.0.1]}.
2017/09/21 03:48:13 [net] Save succeeded.
2017/09/21 03:48:13 [cni] Calling plugin azure-vnet-ipam DEL nwCfg:&{CNIVersion:0.2.0 Name:azure Type:azure-vnet Mode:bridge Master: Bridge:azure0 LogLevel: LogTarget: Ipam:{Type:azure-vnet-ipam Environment: AddrSpace: Subnet:10.2.0.0/17 Address:10.2.0.133 QueryInterval:}}.
2017/09/21 03:48:13 [cni] Plugin azure-vnet-ipam returned err:<nil>.
2017/09/21 03:48:13 [cni-net] DEL command completed with err:<nil>.
2017/09/21 03:48:13 [cni-net] Plugin stopped.
2017/09/21 03:48:13 [cni-net] Plugin azure-vnet version v0.9.
2017/09/21 03:48:13 [cni-net] Running on Linux version 4.4.0-96-generic (buildd@lgw01-10) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4) ) #119-Ubuntu SMP Tue Sep 12 14:59:54 UTC 2017
2017/09/21 03:48:13 [net] Network interface: {Index:1 MTU:65536 Name:lo HardwareAddr: Flags:up|loopback} with IP addresses: [127.0.0.1/8 ::1/128]
2017/09/21 03:48:13 [net] Network interface: {Index:2 MTU:1500 Name:eth0 HardwareAddr:00:0d:3a:f4:44:42 Flags:up|broadcast} with IP addresses: [fe80::20d:3aff:fef4:4442/64]
2017/09/21 03:48:13 [net] Network interface: {Index:3 MTU:1500 Name:docker0 HardwareAddr:02:42:d3:09:07:e4 Flags:up|broadcast|multicast} with IP addresses: [172.17.0.1/16 fe80::42:d3ff:fe09:7e4/64]
2017/09/21 03:48:13 [net] Network interface: {Index:6 MTU:1500 Name:azure0 HardwareAddr:00:0d:3a:f4:44:42 Flags:up|broadcast|multicast} with IP addresses: [10.2.0.127/17 fe80::20d:3aff:fef4:4442/64]
2017/09/21 03:48:13 [net] Store timestamp is 2017-09-21 03:48:13.004817097 +0000 UTC.
2017/09/21 03:48:13 [net] Restored state, &{Version:v0.9 TimeStamp:2017-09-21 03:48:13.0061967 +0000 UTC ExternalInterfaces:map[eth0:0xc420018600] store:0xc420016de0 Mutex:{state:0 sema:0}}

Let me know if this is sufficient. If you provide some contact info, I can send more complete logs privately.

tamilmani1989 commented on July 19, 2024

@rocketraman Can you please attach both the azure-vnet and azure-vnet-ipam logs from one of the failing nodes by dragging and dropping them into a comment?

rocketraman commented on July 19, 2024

@tamilmani1989 Ok
azure-vnet-logs.zip

tamilmani1989 commented on July 19, 2024

@rocketraman Have you manually deleted any of the veth interfaces (e.g. azvethf363b81) on the node?

sharmasushant commented on July 19, 2024

@rocketraman Is it possible that delete is getting called before create has finished? We will need logs that contain both the creation and the deletion of containers. Your previous logs do not seem to include the creation.

Edit: I saw that you are using acs-engine.

rocketraman commented on July 19, 2024

@tamilmani1989 No, I have not deleted any interfaces manually. I did delete one VNet IP address from the portal a couple of days ago while trying to find a way to recover from this, but I quickly restored it once I realized that approach wasn't going to get me anywhere. In any case, the logs from September should not be affected by this at all.

@sharmasushant I uploaded all of the logs in the zip file above. Yes, I am using acs-engine.

sharmasushant commented on July 19, 2024

We looked into this further. It seems that the container network namespace and the veth are being removed before CNI is called to delete the endpoint. Per the CNI spec, we need to handle this case and release the IPs even if the veth or the container namespace is missing. We will provide a bug fix for this. Thanks @rocketraman for reporting this.
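The failing logs above are consistent with this diagnosis: their DEL arguments carry an empty Netns:, whereas the successful DEL on the working node has Netns:/proc/2093/ns/net. The Go sketch below illustrates the intended behavior under a simplified, hypothetical in-memory model of a node (the node type, deleteVeth, handleDel and the address are made up for illustration; this is not code from azure-container-networking or from the eventual fix): a veth that is already gone is treated as already cleaned up, and the IP is released anyway.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoSuchInterface is a hypothetical stand-in for the
// "route ip+net: no such network interface" error in the logs above.
var errNoSuchInterface = errors.New("route ip+net: no such network interface")

// node is a hypothetical in-memory model of one host: which host-side veths
// still exist and which IP each container currently holds.
type node struct {
	veths     map[string]bool
	allocated map[string]string // container ID -> IP address
}

func (n *node) deleteVeth(name string) error {
	if !n.veths[name] {
		return errNoSuchInterface
	}
	delete(n.veths, name)
	return nil
}

// handleDel treats an already-missing veth as "nothing left to clean up"
// and still releases the container's IP, so a DEL that arrives after the
// runtime has torn down the namespace does not leak the address.
func (n *node) handleDel(containerID, vethName string) error {
	if err := n.deleteVeth(vethName); err != nil && !errors.Is(err, errNoSuchInterface) {
		// Any other error is still reported, and the IP is kept so the DEL can be retried.
		return fmt.Errorf("failed to delete endpoint: %w", err)
	}
	delete(n.allocated, containerID) // release the IP back to the pool
	return nil
}

func main() {
	// The veth for container f363b812 is already gone, as on the failing nodes.
	n := &node{
		veths:     map[string]bool{},
		allocated: map[string]string{"f363b812": "10.2.0.10"}, // made-up address
	}
	err := n.handleDel("f363b812", "azvethf363b81")
	fmt.Println("err:", err, "remaining allocations:", len(n.allocated))
}
```

Because a missing veth is not an error, repeating the same DEL is also harmless, which is what lets the runtime retry DEL safely.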

rocketraman commented on July 19, 2024

@sharmasushant Great!

Now that you have found the issue, are you able to provide a workaround?

rocketraman commented on July 19, 2024

Any news on the fix for this serious issue?

sharmasushant commented on July 19, 2024

@rocketraman Sorry for the delay. Some internal deliverables required our attention, so this got pushed back. We are targeting a new release next week that will include a fix for the issue. Will keep you posted.

rocketraman commented on July 19, 2024

@tamilmani1989 @sharmasushant I see you guys merged e246fec, which fixes this... awesome! A reference to this issue number somewhere in the commit log would have been nice, to confirm that this is fixed. Assuming that it is, when do you think this will be released and included in acs-engine?

rocketraman commented on July 19, 2024

@sharmasushant I believe acs-engine is downloading the CNI plugin from https://acs-mirror.azureedge.net/cni/cni-plugins-amd64-latest.tgz. Can you make updating that URL a part of the release process for CNI?

cc: @colemickens

sharmasushant commented on July 19, 2024

Sure @rocketraman, we will make the process of updating acs-engine more streamlined.

edevil commented on July 19, 2024

This issue is open, but the comments seem to indicate this has been fixed.

What's the current state?

sharmasushant commented on July 19, 2024

@edevil Yes, the issue is now fixed.

edevil commented on July 19, 2024

Closing the issue would make it clear.

sharmasushant commented on July 19, 2024

This is now fixed. Closing the issue.
