Git Product home page Git Product logo

Comments (16)

sharmasushant avatar sharmasushant commented on July 19, 2024

Ack! We will look and update if fix is needed.. Thanks Patrick!

from azure-container-networking.

atomaras avatar atomaras commented on July 19, 2024

I think it's worth noting that I've been having this issue on a brand new cluster with barely any containers running on the failed node.

from azure-container-networking.

sharmasushant avatar sharmasushant commented on July 19, 2024

@atomaras What version of azure-cni and aci-engine are you using? Also, can you please confirm if you are using server 1803 as agent VMs?

from azure-container-networking.

atomaras avatar atomaras commented on July 19, 2024

@sharmasushant
acs-engine: 0.19.5
azure-cni: containernetworking/networkmonitor:v0.0.4
k8s: 1.11.1
agent VMs: Server 1803

from azure-container-networking.

sharmasushant avatar sharmasushant commented on July 19, 2024

@atomaras networkmonitor is not valid for windows cluster. Also, I think you mistyped acs-engine version, and missed azure-cni version. Can you please provide those?

from azure-container-networking.

atomaras avatar atomaras commented on July 19, 2024

@sharmasushant Fixed the acs-engine version. I have a k8s cluster with 1 linux master and 2 windows nodes. The azure-cni image i mentioned is the actual image acs-engine selected for my cluster.

from azure-container-networking.

sharmasushant avatar sharmasushant commented on July 19, 2024

Can you please attach the log file C:\K\azure-vnet.log
It should contain the version of azure-cni v1.0.xxx

from azure-container-networking.

atomaras avatar atomaras commented on July 19, 2024

@sharmasushant This one?
2018/07/20 20:52:57 [cni-net] Plugin azure-vnet version v1.0.7.

The only other cni version i see is CNIVersion 0.3.0

from azure-container-networking.

tamilmani1989 avatar tamilmani1989 commented on July 19, 2024

@PatrickLang Opened an issue in containernetworking/cni github repo - containernetworking/cni#571

from azure-container-networking.

Cherishty avatar Cherishty commented on July 19, 2024

Since this issue has not been fixed yet, do we have any quick fix or workaround now? Or I have to re-deploy a totally new cluster?

from azure-container-networking.

tamilmani1989 avatar tamilmani1989 commented on July 19, 2024

We identified the cause of the issue and will fix it in next release. The issue is CNI should not write result to stdout in case of error.

from azure-container-networking.

PatrickLang avatar PatrickLang commented on July 19, 2024

kubelet.log before

2018/08/01 22:04:51 "Start Flag true CniSucceeded false Name CNI Version v1.0.7 ErrorMessage A Compute System with the specified identifier does not exist. vnet [] 
				Context AzureCNI SubContext "
2018/08/01 22:04:51 OrchestratorDetails &{  kubectl command failed due to exit status 1}
2018/08/01 22:04:51 OSDetails &{windows    }
2018/08/01 22:04:51 SystemDetails &{0 0 0 0 0 0 }
2018/08/01 22:04:51 InterfaceDetails &{Primary 10.240.0.0/12 10.240.0.96 00:0d:3a:f9:09:bb vEthernet (Ethernet 2) 30 0 }
2018/08/01 22:04:51 BridgeDetails <nil>
E0801 22:04:53.995054    5092 cni.go:260] Error adding network: netplugin failed but error parsing its diagnostic message "{\n    \"cniVersion\": \"0.3.0\",\n    \"interfaces\": [\n        {\n            \"name\": \"eth0\"\n        }\n    ],\n    \"dns\": {}\n}{\n    \"code\": 100,\n    \"msg\": \"A Compute System with the specified identifier does not exist.\"\n}": invalid character '{' after top-level value
E0801 22:04:53.995054    5092 cni_windows.go:49] error while adding to cni network: netplugin failed but error parsing its diagnostic message "{\n    \"cniVersion\": \"0.3.0\",\n    \"interfaces\": [\n        {\n            \"name\": \"eth0\"\n        }\n    ],\n    \"dns\": {}\n}{\n    \"code\": 100,\n    \"msg\": \"A Compute System with the specified identifier does not exist.\"\n}": invalid character '{' after top-level value

kubelet.log after

2018/08/02 01:14:02 "Start Flag true CniSucceeded false Name CNI Version v1.0.10-2-g14efabc-dirty ErrorMessage  vnet [] 
				Context AzureCNI SubContext "
2018/08/02 01:14:02 OrchestratorDetails &{  kubectl command failed due to exit status 1}
2018/08/02 01:14:02 OSDetails &{windows    }
2018/08/02 01:14:02 SystemDetails &{0 0 0 0 0 0 }
2018/08/02 01:14:02 InterfaceDetails &{Primary 10.240.0.0/12 10.240.0.96 00:0d:3a:f9:09:bb vEthernet (Ethernet 2) 30 0 }
2018/08/02 01:14:02 BridgeDetails <nil>
E0802 01:14:04.307700    4616 cni.go:268] Error adding network: A Compute System with the specified identifier does not exist.
E0802 01:14:04.307700    4616 cni_windows.go:49] error while adding to cni network: A Compute System with the specified identifier does not exist.

Looks like the error results are parsed correctly now. There are still errors that need investigation but at least they're correct in the logs.

Here's the corresponding error from azure-vnet.log for the curious

2018/08/02 01:14:01 [cni-net] Plugin azure-vnet version v1.0.10-2-g14efabc-dirty.
2018/08/02 01:14:01 [cni-net] Running on windows
2018/08/02 01:14:01 [net] Network interface: {Index:12 MTU:1500 Name:vEthernet (Ethernet 2) HardwareAddr:00:0d:3a:f9:09:bb Flags:up|broadcast|multicast} with IP addresses: [fe80::4ff:f6a0:4d9a:b291/64 10.240.0.96/12]
2018/08/02 01:14:01 [net] Network interface: {Index:1 MTU:-1 Name:Loopback Pseudo-Interface 1 HardwareAddr: Flags:up|loopback|multicast} with IP addresses: [::1/128 127.0.0.1/8]
2018/08/02 01:14:01 [net] Network interface: {Index:5 MTU:1500 Name:vEthernet (nat) HardwareAddr:00:15:5d:b2:61:10 Flags:up|broadcast|multicast} with IP addresses: [fe80::e033:81f0:4fc2:1b1a/64 172.21.224.1/20]
2018/08/02 01:14:01 [net] reboot time 0001-01-01 00:00:00 +0000 UTC store mod time 2018-08-02 01:13:58.4576856 +0000 UTC
2018/08/02 01:14:01 [net] Restored state, &{Version:v1.0.10-2-g14efabc-dirty TimeStamp:2018-08-02 01:13:58.4516939 +0000 UTC ExternalInterfaces:map[vEthernet (Ethernet 2):0xc042165980] store:0xc04210a8a0 Mutex:{state:0 sema:0}}
2018/08/02 01:14:01 External Interface &{vEthernet (Ethernet 2) map[azure:0xc042168a80] [10.240.0.0/12]  00:0d:3a:f9:09:bb [] [] 0.0.0.0 ::}
2018/08/02 01:14:01 network &{azure dd561022-1e37-4078-91d3-a30b26ee3124 bridge 0 [{2 {10.240.0.0 fff00000} 10.240.0.1}] map[88e3a956-eth0:0xc0421ce1e0 e5e4f423-eth0:0xc0421ce2d0 054b1b95-eth0:0xc0421ce000 63354a97-eth0:0xc0421ce0f0] 0xc042165980 false}
2018/08/02 01:14:01 endpoint &{054b1b95-eth0 b540b79d-36ab-4d56-88a3-489c9f964f31 054b1b95bf9e23330bc1db7dcee74e0b061244b71c4a404278216a0de9325d09 eth0  00:15:5d:54:18:9d [{10.240.0.99 fff00000}] [10.240.0.1] {default.svc.cluster.local [10.0.0.10 168.63.129.16]} [{{0.0.0.0 00000000} 10.240.0.1 }] 0 false}
2018/08/02 01:14:01 endpoint &{63354a97-eth0 26b883bb-5929-4181-8b45-07caebac5eeb 63354a97ca8e48605e132e1cdf8cfe2345c138ea51b2857e40d7fc7d3286f559 eth0  00:15:5d:54:18:39 [{10.240.0.119 fff00000}] [10.240.0.1] {default.svc.cluster.local [10.0.0.10 168.63.129.16]} [{{0.0.0.0 00000000} 10.240.0.1 }] 0 false}
2018/08/02 01:14:01 endpoint &{88e3a956-eth0 fc2c34c3-e8c8-4ad1-bfb3-0ee7871b5b71 88e3a95615b475593c3037c21e4515716aeff6052ad6861494bc7ac518caab31 eth0  00:15:5d:54:1c:15 [{10.240.0.105 fff00000}] [10.240.0.1] {default.svc.cluster.local [10.0.0.10 168.63.129.16]} [{{0.0.0.0 00000000} 10.240.0.1 }] 0 false}
2018/08/02 01:14:01 endpoint &{e5e4f423-eth0 277ee58f-6a6a-43f0-a0d2-0523263a5001 e5e4f42351ac8f4d645b3c4ca1772b061b0e727171fea85d5a427f06dd15cc59 eth0  00:15:5d:54:1e:fb [{10.240.0.100 fff00000}] [10.240.0.1] {default.svc.cluster.local [10.0.0.10 168.63.129.16]} [{{0.0.0.0 00000000} 10.240.0.1 }] 0 false}
2018/08/02 01:14:01 [cni-net] Plugin started.
2018/08/02 01:14:01 [cni-net] Processing ADD command with args {ContainerID:9441e82f9379324696a34449317b9f013467db91780adf5babcd6f993280f538 Netns:container:e5e4f42351ac8f4d645b3c4ca1772b061b0e727171fea85d5a427f06dd15cc59 IfName:eth0 Args:IgnoreUnknown=1;K8S_POD_NAMESPACE=default;K8S_POD_NAME=whoami-1803-78fd64846f-9r5qz;K8S_POD_INFRA_CONTAINER_ID=9441e82f9379324696a34449317b9f013467db91780adf5babcd6f993280f538 Path:c:\k\azurecni\bin}.
2018/08/02 01:14:01 [cni-net] Read network configuration &{CNIVersion:0.3.0 Name:azure Type:azure-vnet Mode:bridge Master: Bridge:azure0 LogLevel: LogTarget: MultiTenancy:false EnableSnatOnHost:false Ipam:{Type:azure-vnet-ipam Environment: AddrSpace: Subnet: Address: QueryInterval:} DNS:{Nameservers:[10.0.0.10 168.63.129.16] Domain: Search:[svc.cluster.local] Options:[]} AdditionalArgs:[{Name:EndpointPolicy Value:[123 34 69 120 99 101 112 116 105 111 110 76 105 115 116 34 58 91 34 49 48 46 50 52 48 46 48 46 48 47 49 50 34 44 34 49 48 46 50 52 48 46 48 46 48 47 49 50 34 93 44 34 84 121 112 101 34 58 34 79 117 116 66 111 117 110 100 78 65 84 34 125]} {Name:EndpointPolicy Value:[123 34 68 101 115 116 105 110 97 116 105 111 110 80 114 101 102 105 120 34 58 34 49 48 46 48 46 48 46 48 47 49 54 34 44 34 78 101 101 100 69 110 99 97 112 34 58 116 114 117 101 44 34 84 121 112 101 34 58 34 82 79 85 84 69 34 125]}]}.
2018/08/02 01:14:01 PrimaryInterfaceIdentifier :<nil>
2018/08/02 01:14:01 Trying to retrieve endpoint id e5e4f423-eth0
2018/08/02 01:14:01 [net] Found existing endpoint through hcsshim: &{Id:277ee58f-6a6a-43f0-a0d2-0523263a5001 Name:e5e4f423-eth0 VirtualNetwork:dd561022-1e37-4078-91d3-a30b26ee3124 VirtualNetworkName:azure Policies:[[123 32 34 69 120 99 101 112 116 105 111 110 76 105 115 116 34 32 58 32 91 32 34 49 48 46 50 52 48 46 48 46 48 47 49 50 34 44 32 34 49 48 46 50 52 48 46 48 46 48 47 49 50 34 32 93 44 32 34 84 121 112 101 34 32 58 32 34 79 117 116 66 111 117 110 100 78 65 84 34 32 125] [123 32 34 68 101 115 116 105 110 97 116 105 111 110 80 114 101 102 105 120 34 32 58 32 34 49 48 46 48 46 48 46 48 92 47 49 54 34 44 32 34 78 101 101 100 69 110 99 97 112 34 32 58 32 116 114 117 101 44 32 34 84 121 112 101 34 32 58 32 34 82 79 85 84 69 34 32 125] [123 32 34 84 121 112 101 34 32 58 32 34 76 50 68 114 105 118 101 114 34 32 125]] MacAddress:00-15-5D-54-1E-FB IPAddress:10.240.0.100 DNSSuffix:default.svc.cluster.local DNSServerList:10.0.0.10,168.63.129.16 GatewayAddress:10.240.0.1 EnableInternalDNS:false DisableICC:false PrefixLength:12 IsRemoteEndpoint:false Namespace:<nil>}
2018/08/02 01:14:01 [net] Attaching ep 277ee58f-6a6a-43f0-a0d2-0523263a5001 to container 9441e82f9379324696a34449317b9f013467db91780adf5babcd6f993280f538
2018/08/02 01:14:01 [cni-net] Failed to hot attach shared endpoint[277ee58f-6a6a-43f0-a0d2-0523263a5001] to container [9441e82f9379324696a34449317b9f013467db91780adf5babcd6f993280f538], err:A Compute System with the specified identifier does not exist..
2018/08/02 01:14:01 handleConsecutiveAdd failed with error A Compute System with the specified identifier does not exist.
2018/08/02 01:14:01 [cni-net] ADD command completed with result:Interfaces:[{Name:eth0 Mac: Sandbox:}], DNS:{Nameservers:[] Domain: Search:[] Options:[]} err:A Compute System with the specified identifier does not exist..
2018/08/02 01:14:01 Failed to execute network plugin, err:A Compute System with the specified identifier does not exist..
2018/08/02 01:14:01 Report plugin error
2018/08/02 01:14:04 SendReport failed due to [Azure CNI] HTTP Post returned statuscode 500
2018/08/02 01:14:04 [cni-net] Plugin stopped.

from azure-container-networking.

PatrickLang avatar PatrickLang commented on July 19, 2024

@tamilmani1989 @sharmasushant - can you get a new azure-cni release with this fix?

from azure-container-networking.

PatrickLang avatar PatrickLang commented on July 19, 2024

This fix is available in ACS-Engine now:
https://github.com/Azure/acs-engine/releases/tag/v0.20.9

from azure-container-networking.

Cherishty avatar Cherishty commented on July 19, 2024

Hi team, I was meeting a regression on this issue on my k8s cluster deployed by acs-engine 0.21.2, the with the similar error in the kubelet.log, another error reported as:

Failed to initialize key-value store of network plugin, err:Store is locked.

and when I describe pods I find the pvc of azurefile mounts error which results in Pod sandbox changed, see below:

Events:
  Type     Reason                 Age                From                   Message
  ----     ------                 ----               ----                   -------
  Normal   SandboxChanged         13m (x25 over 1h)  kubelet, 37359k8s9001  Pod sandbox changed, it will be killed and re-created.
  Normal   SuccessfulMountVolume  9m                 kubelet, 37359k8s9001  MountVolume.SetUp succeeded for volume "log"
  Normal   SuccessfulMountVolume  9m                 kubelet, 37359k8s9001  MountVolume.SetUp succeeded for volume "default-token-bkpkh"
  Warning  FailedMount            9m                 kubelet, 37359k8s9001  MountVolume.SetUp failed for volume "pvc-211d3e2f-eda4-11e8-a14f-000d3a408701" : exit status 1
  Normal   SuccessfulMountVolume  9m                 kubelet, 37359k8s9001  MountVolume.SetUp succeeded for volume "pvc-211d3e2f-eda4-11e8-a14f-000d3a408701"
  Normal   SandboxChanged         2m (x2 over 6m)    kubelet, 37359k8s9001  Pod sandbox changed, it will be killed and re-created.

from azure-container-networking.

ngrigoriev avatar ngrigoriev commented on July 19, 2024

This problem is happening and seems to be quick to reproduce in our cluster.

E0909 12:47:27.938338    6072 cni_windows.go:50] error while adding to cni network: invalid character '{' after top-level value
W0909 12:47:27.938338    6072 docker_sandbox.go:384] failed to read pod IP from plugin/docker: NetworkPlugin cni failed on the status hook for pod "********": invalid character '{' after top-level value
2019/09/09 12:47:28 Formatted Boot time: 2019-09-06 17:06:01
2019/09/09 12:47:28 Connected to telemetry service
2019/09/09 12:47:28 [Azure-Utils] ver
I0909 12:47:28.369804    6072 fake_cpu_manager.go:45] [fake cpumanager] RemoveContainer (container id: 32ef256a8bc968aef122e9c4b8f78379460ef50ab5aabcd9b559121ca3ad3619)
I0909 12:47:28.370806    6072 fake_cpu_manager.go:45] [fake cpumanager] RemoveContainer (container id: 6e8aedba55d6639f42c66a5674dd1c04c9c71fb0687c4eed7e7775104f863d82)
I0909 12:47:28.370806    6072 fake_cpu_manager.go:45] [fake cpumanager] RemoveContainer (container id: 7ed04d04acb645b8d99a0b926559089dd12d1c5a04f55c5bb817657cc5314c09)
E0909 12:47:28.370806    6072 pod_workers.go:190] Error syncing pod dc207ac8-d2ce-11e9-9c76-a62f07b3f53e ("**********(dc207ac8-d2ce-11e9-9c76-a62f07b3f53e)"), skipping: failed to "StartContainer" for "*******" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=******** pod=********(dc207ac8-d2ce-11e9-9c76-a62f07b3f53e)"
E0909 12:47:28.370806    6072 pod_workers.go:190] Error syncing pod d6bf2baf-d2ce-11e9-9c76-a62f07b3f53e ("**********(d6bf2baf-d2ce-11e9-9c76-a62f07b3f53e)"), skipping: failed to "StartContainer" for "********" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=******** pod=************(d6bf2baf-d2ce-11e9-9c76-a62f07b3f53e)"
I0909 12:47:28.549806    6072 fake_cpu_manager.go:40] [fake cpumanager] AddContainer (pod: *******, container:********, container id: de1dc01965ed9756db581642846ef4d7b96c79a0b73eb6f7539390f0d9cce75c)
2019/09/09 12:47:28 Formatted Boot time: 2019-09-06 17:06:01
2019/09/09 12:47:28 Connected to telemetry service

AKS Kubernetes 1.14.5, latest Windows VM image + KB4512534 installed.

from azure-container-networking.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.