cloudnativelabs / kube-router
Kube-router, a turnkey solution for Kubernetes networking.
Home Page: https://kube-router.io
License: Apache License 2.0
Here's the log:
ubuntu@osh-sh-ci-01:~$ kubectl logs kube-router-hkf2d -n kube-system
panic: nodes "kubernetes" not found
goroutine 1 [running]:
panic(0x1596120, 0xc42040a450)
/usr/local/go/src/runtime/panic.go:500 +0x1a1
github.com/cloudnativelabs/kube-router/app/controllers.NewNetworkPolicyController(0xc420315540, 0xc420314960, 0x0, 0x0, 0x0)
/home/kube/go/src/github.com/cloudnativelabs/kube-router/app/controllers/network_policy_controller.go:785 +0x413
github.com/cloudnativelabs/kube-router/app.(*KubeRouter).Run(0xc4203c1660, 0xc4203c1660, 0x0)
/home/kube/go/src/github.com/cloudnativelabs/kube-router/app/server.go:120 +0x710
main.main()
/home/kube/go/src/github.com/cloudnativelabs/kube-router/kube-router.go:37 +0x13c
ubuntu@osh-sh-ci-01:~$
In this case kubelet's --hostname-override is set to the node's IP address. I believe that kube-router/app/controllers/network_policy_controller.go around L785 will only succeed if --hostname-override is omitted, or is set to the real FQDN on the host. It should probably not assume anything about the node's name, and pull the node name from the API server. I'll try to find how that's done and submit a PR if time permits. Thanks!
Both kops and bootkube seem to deploy clusters on AWS with the --hostname-override flag for kubelet, resulting in nodes being registered with the master by FQDN:
± kubectl get nodes
NAME STATUS AGE VERSION
ip-172-20-51-2.us-west-2.compute.internal Ready,node 1h v1.6.2
ip-172-20-55-216.us-west-2.compute.internal Ready,node 1h v1.6.2
ip-172-20-61-204.us-west-2.compute.internal Ready,master 1h v1.6.2
Whereas kube-proxy and kube-router just use the hostname when retrieving node info from the master. This mismatch results in failures of kube-proxy and kube-router:
W0528 22:32:42.648040 6 server.go:469] Failed to retrieve node info: Get https://api.internal.mycluster.aws.cloudnativelabs.net/api/v1/nodes/ip-172-20-51-2: dial tcp 203.0.113.123:443: i/o timeout
At least kube-router could be changed to use a safer option: first try os.Hostname, and if that fails, fall back to the full FQDN.
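A minimal sketch of that fallback, assuming a recent client-go; the function name is illustrative and error handling is trimmed:

import (
    "context"
    "net"
    "os"
    "strings"

    v1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
)

// getNodeObject looks the node up by the short hostname first, then falls back
// to the FQDN, so it works whether or not --hostname-override was used.
func getNodeObject(client kubernetes.Interface) (*v1.Node, error) {
    hostname, err := os.Hostname()
    if err != nil {
        return nil, err
    }
    node, err := client.CoreV1().Nodes().Get(context.TODO(), hostname, metav1.GetOptions{})
    if err == nil {
        return node, nil
    }
    // Not found under the short name; resolve the canonical (fully qualified) name.
    fqdn, lookupErr := net.LookupCNAME(hostname)
    if lookupErr != nil {
        return nil, err // return the original lookup failure
    }
    return client.CoreV1().Nodes().Get(context.TODO(), strings.TrimSuffix(fqdn, "."), metav1.GetOptions{})
}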
Accessing a cluster IP from a pod that is an endpoint of the same service fails when traffic gets load balanced back to the pod the request originated from.
We need a way to do hairpin NAT so that this scenario works.
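As a hedged sketch of what such a hairpin rule could look like (not the project's actual implementation): for each endpoint, masquerade traffic that a pod sends to itself after IPVS has translated the cluster IP back to that same pod, so the reply is forced back through the host:

import "os/exec"

// ensureHairpinRule adds (idempotently) a POSTROUTING masquerade rule for
// traffic whose source and destination are the same pod IP, i.e. the
// hairpinned case described above.
func ensureHairpinRule(podIP string) error {
    args := []string{"-t", "nat", "-C", "POSTROUTING",
        "-s", podIP + "/32", "-d", podIP + "/32", "-j", "MASQUERADE"}
    if exec.Command("iptables", args...).Run() == nil {
        return nil // rule already exists
    }
    args[2] = "-A" // -C says the rule is missing, so append it
    return exec.Command("iptables", args...).Run()
}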
For redundancy purposes having the ability to specify multiple peer-router/peer-asn pairs would be useful.
Thanks @rmb938 and @thoro for reporting this issue over Gitter.
When the nodes in a cluster running iBGP advertise routes to the global peer, the next hop for each subnet is not the node IP corresponding to that subnet; they all point to a single node.
For example, in a 3-node cluster with IPs 192.168.1.100, 192.168.1.101, and 192.168.1.102, globally peering with 192.168.1.98:
root@kube-master:~# gobgp neighbor -u 192.168.1.100
Peer AS Up/Down State |#Received Accepted
192.168.1.98 64513 00:00:19 Establ | 3 0
192.168.1.101 64512 00:00:21 Establ | 1 1
192.168.1.102 64512 00:00:25 Establ | 1 1
root@kube-master:~# gobgp neighbor -u 192.168.1.102
Peer AS Up/Down State |#Received Accepted
192.168.1.98 64513 00:00:37 Establ | 1 0
192.168.1.100 64512 00:00:29 Establ | 1 1
192.168.1.101 64512 00:00:21 Establ | 1 1
root@kube-master:~# gobgp neighbor -u 192.168.1.101
Peer AS Up/Down State |#Received Accepted
192.168.1.98 64513 00:00:37 Establ | 2 0
192.168.1.100 64512 00:00:26 Establ | 1 1
192.168.1.102 64512 00:00:22 Establ | 1 1
root@kube-master:~# gobgp global rib -u 192.168.1.100
Network Next Hop AS_PATH Age Attrs
*> 10.1.0.0/24 192.168.1.100 00:00:00 [{Origin: i}]
*> 10.1.1.0/24 192.168.1.101 00:00:47 [{Origin: i} {LocalPref: 100}]
*> 10.1.2.0/24 192.168.1.102 00:00:51 [{Origin: i} {LocalPref: 100}]
root@kube-master:~# gobgp global rib -u 192.168.1.101
Network Next Hop AS_PATH Age Attrs
*> 10.1.0.0/24 192.168.1.100 00:00:49 [{Origin: i} {LocalPref: 100}]
*> 10.1.1.0/24 192.168.1.101 00:00:03 [{Origin: i}]
*> 10.1.2.0/24 192.168.1.102 00:00:45 [{Origin: i} {LocalPref: 100}]
root@kube-master:~# gobgp global rib -u 192.168.1.102
Network Next Hop AS_PATH Age Attrs
*> 10.1.0.0/24 192.168.1.100 00:00:54 [{Origin: i} {LocalPref: 100}]
*> 10.1.1.0/24 192.168.1.101 00:00:46 [{Origin: i} {LocalPref: 100}]
*> 10.1.2.0/24 192.168.1.102 00:00:05 [{Origin: i}]
On the global BGP peer, all routes point to 192.168.1.102 as the next hop:
root@router:~# ip route
default via 192.168.1.1 dev ens33 onlink
10.1.0.0/24 via 192.168.1.102 dev ens33 proto zebra
10.1.1.0/24 via 192.168.1.102 dev ens33 proto zebra
10.1.2.0/24 via 192.168.1.102 dev ens33 proto zebra
192.168.1.0/24 dev ens33 proto kernel scope link src 192.168.1.98
root@router:~# ip route
default via 192.168.1.1 dev ens33 onlink
10.1.0.0/24 via 192.168.1.102 dev ens33 proto zebra
10.1.1.0/24 via 192.168.1.101 dev ens33 proto zebra
10.1.2.0/24 via 192.168.1.102 dev ens33 proto zebra
192.168.1.0/24 dev ens33 proto kernel scope link src 192.168.1.98
root@router:~# ping 10.1.0.70
PING 10.1.0.70 (10.1.0.70) 56(84) bytes of data.
64 bytes from 10.1.0.70: icmp_seq=1 ttl=63 time=2.46 ms
From 192.168.1.102: icmp_seq=2 Redirect Host(New nexthop: 192.168.1.100)
64 bytes from 10.1.0.70: icmp_seq=2 ttl=63 time=0.589 ms
From 192.168.1.102: icmp_seq=3 Redirect Host(New nexthop: 192.168.1.100)
64 bytes from 10.1.0.70: icmp_seq=3 ttl=63 time=0.958 ms
root@router:~# traceroute 10.1.0.70
traceroute to 10.1.0.70 (10.1.0.70), 30 hops max, 60 byte packets
1 192.168.1.102 (192.168.1.102) 0.307 ms 0.298 ms 0.308 ms
2 192.168.1.100 (192.168.1.100) 0.648 ms 0.573 ms 0.537 ms
3 10.1.0.70 (10.1.0.70) 3.238 ms 3.847 ms 3.775 ms
I got excited about metrics since #65 landed, so this issue is just for mapping out short and long-term goals for kube-router metrics. Once we agree on the abstract goals I will create issues to track their implementation. I will edit this issue with any changes we discuss.
--metrics-port option to change the port. --metrics-port=0 means disabled. The default port should be uncommon. Could use or draw inspiration from k8sconntrack.
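A rough sketch of that flag wiring, assuming the Prometheus Go client; the default port shown is just a placeholder:

import (
    "flag"
    "fmt"
    "net/http"

    "github.com/prometheus/client_golang/prometheus/promhttp"
)

var metricsPort = flag.Int("metrics-port", 20241, "port for the Prometheus metrics endpoint (0 disables it)")

// startMetricsServer exposes /metrics unless the operator set --metrics-port=0.
// Call it after flags have been parsed.
func startMetricsServer() {
    if *metricsPort == 0 {
        return // metrics disabled
    }
    http.Handle("/metrics", promhttp.Handler())
    go func() {
        _ = http.ListenAndServe(fmt.Sprintf(":%d", *metricsPort), nil)
    }()
}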
Quality Assurance: Have a basic test framework in place that can find some issues/regressions without manual testing. Using and passing kubernetes conformance tests would also help alleviate fears of new software.
I've successfully created a new Kubernetes cluster with Bootkube using kube-router instead of kube-proxy/flannel. Creating this issue to remind me to document how it was done and share with this project. Thanks!
I created a brand new Kubernetes cluster with kube-router managing pod-to-pod networking, service proxy, and namespace firewall. Everything appears OK with the first node (node1-dev), however I get different results when adding additional nodes (node2-dev), specifically with the full BGP mesh setup.
IPAM works, and service discovery/IPVS seem to work. But communication on non-publicly routable IPs (pod/service CIDR) between nodes does not work.
In the kube-router logs for node1-dev it seems to detect a peering attempt with node2-dev, however I see the following error:
time="2017-05-25T01:40:54Z" level=info msg="Can't find configuration for a new passive connection from:10.10.3.2" Topic=Peer
10.10.3.2 is the IP of node2-dev.
To hopefully help with troubleshooting, here's some ip/gobgp output pertaining to both nodes.
node1-dev ip route and ip addr:
default via 10.10.10.1 dev enp0s25 proto static
10.2.0.0/24 dev kube-bridge proto kernel scope link src 10.2.0.1
10.10.0.0/16 dev enp0s25 proto kernel scope link src 10.10.3.1
blackhole 10.10.150.1
blackhole 10.10.250.1
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet 10.10.250.1/32 brd 10.10.250.1 scope global lo
valid_lft forever preferred_lft forever
inet 10.10.150.1/32 brd 10.10.150.1 scope global lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: enp0s25: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:1e:4f:92:b2:38 brd ff:ff:ff:ff:ff:ff
inet 10.10.3.1/16 brd 10.10.255.255 scope global enp0s25
valid_lft forever preferred_lft forever
inet6 fe80::21e:4fff:fe92:b238/64 scope link
valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:6b:0c:bc:c8 brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 scope global docker0
valid_lft forever preferred_lft forever
4: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 92:dd:c8:bc:f2:53 brd ff:ff:ff:ff:ff:ff
5: kube-dummy-if: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
link/ether 3e:de:1f:dd:7f:28 brd ff:ff:ff:ff:ff:ff
inet 10.3.0.10/32 scope link kube-dummy-if
valid_lft forever preferred_lft forever
inet 10.3.0.1/32 scope link kube-dummy-if
valid_lft forever preferred_lft forever
inet6 fe80::3cde:1fff:fedd:7f28/64 scope link
valid_lft forever preferred_lft forever
6: kube-bridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 0a:58:0a:02:00:01 brd ff:ff:ff:ff:ff:ff
inet 10.2.0.1/24 scope global kube-bridge
valid_lft forever preferred_lft forever
inet6 fe80::f4a2:f4ff:feea:2a18/64 scope link
valid_lft forever preferred_lft forever
7: veth9683a0b6@docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master kube-bridge state UP group default
link/ether e6:a5:1d:ef:e7:8d brd ff:ff:ff:ff:ff:ff
inet6 fe80::e4a5:1dff:feef:e78d/64 scope link
valid_lft forever preferred_lft forever
8: vethdbacf2a7@docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master kube-bridge state UP group default
link/ether 96:37:ff:9c:91:7e brd ff:ff:ff:ff:ff:ff
inet6 fe80::9437:ffff:fe9c:917e/64 scope link
valid_lft forever preferred_lft forever
9: veth06603cee@docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master kube-bridge state UP group default
link/ether 02:42:98:1e:6f:35 brd ff:ff:ff:ff:ff:ff
inet6 fe80::42:98ff:fe1e:6f35/64 scope link
valid_lft forever preferred_lft forever
10: vetha1ffb2fd@docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master kube-bridge state UP group default
link/ether c2:d9:d8:93:af:ff brd ff:ff:ff:ff:ff:ff
inet6 fe80::c0d9:d8ff:fe93:afff/64 scope link
valid_lft forever preferred_lft forever
11: vethbc1e38cc@docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master kube-bridge state UP group default
link/ether 16:37:4f:d2:43:61 brd ff:ff:ff:ff:ff:ff
inet6 fe80::1437:4fff:fed2:4361/64 scope link
valid_lft forever preferred_lft forever
node2-dev ip route and ip addr:
default via 10.10.10.1 dev enp0s25 proto static
10.10.0.0/16 dev enp0s25 proto kernel scope link src 10.10.3.2
blackhole 10.10.150.1
blackhole 10.10.250.1
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet 10.10.250.1/32 brd 10.10.250.1 scope global lo
valid_lft forever preferred_lft forever
inet 10.10.150.1/32 brd 10.10.150.1 scope global lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: enp0s25: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:1e:4f:92:ad:ef brd ff:ff:ff:ff:ff:ff
inet 10.10.3.2/16 brd 10.10.255.255 scope global enp0s25
valid_lft forever preferred_lft forever
inet6 fe80::21e:4fff:fe92:adef/64 scope link
valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:e7:ce:9d:9b brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 scope global docker0
valid_lft forever preferred_lft forever
4: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 4a:bd:98:93:1b:19 brd ff:ff:ff:ff:ff:ff
5: kube-dummy-if: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
link/ether c6:31:ca:49:e8:2a brd ff:ff:ff:ff:ff:ff
inet 10.3.0.10/32 scope link kube-dummy-if
valid_lft forever preferred_lft forever
inet 10.3.0.1/32 scope link kube-dummy-if
valid_lft forever preferred_lft forever
inet6 fe80::c431:caff:fe49:e82a/64 scope link
valid_lft forever preferred_lft forever
gobgp neighbor for both nodes:
$GOPATH/bin/gobgp -u node1-dev.zbrbdl neighbor; echo "---"; $GOPATH/bin/gobgp -u node2-dev.zbrbdl neighbor
Peer AS Up/Down State |#Received Accepted
---
Peer AS Up/Down State |#Received Accepted
10.10.3.1 64512 never Active | 0 0
Issue opened to track DR mode feature.
Initial research to put in this issue:
I'm not at my computer, so I'll get logs of the panic later. I believe there's an issue when kube-router tries to advertise the IP of a headless service, which has no service IP but instead sets up DNS records pointing directly to the pod IPs for the service. The log mentioned advertising service IP [] (an empty slice, I believe), and then the panic was a memory access error, probably from accessing a nonexistent element of the empty slice.
Reference: https://kubernetes.io/docs/concepts/services-networking/service/#headless-services
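A minimal sketch of the guard this needs, assuming upstream core/v1 types; the function name is illustrative:

import v1 "k8s.io/api/core/v1"

// advertisableClusterIP returns the cluster IP to advertise, or "" for
// headless services, which have no cluster IP and must be skipped.
func advertisableClusterIP(svc *v1.Service) string {
    if svc.Spec.ClusterIP == "" || svc.Spec.ClusterIP == v1.ClusterIPNone {
        return ""
    }
    return svc.Spec.ClusterIP
}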
Network policy is very likely going to be GA in 1.7 [1].
There are some semantic changes that are not compatible with v1beta1, so the existing implementation in kube-router does not work for GA/1.7 network policies. Refactor the code so that it works with both the GA and v1beta1 semantics of network policies.
Kubeadm is one of the most popular and officially supported methods of deploying Kubernetes. Need to test and document it for those users.
User @rmb938 has provided initial RBAC related definitions for this environment.
Though the logs indicate the service is created (Successfully added service: 172.20.59.228:tcp:31044), the actual service is not created:
ERROR: logging before flag.Parse: I0713 05:35:46.085555 1 network_services_controller.go:109] Performing periodic syn of the ipvs services and server to reflect desired state of kubernetes services and endpoints
ERROR: logging before flag.Parse: I0713 05:35:46.085621 1 network_services_controller.go:419] No hairpin-mode enabled services found -- no hairpin rules created
ERROR: logging before flag.Parse: I0713 05:35:46.088221 1 network_services_controller.go:639] Successfully added service: 100.67.46.129:tcp:80
ERROR: logging before flag.Parse: I0713 05:35:46.088733 1 network_services_controller.go:639] Successfully added service: 172.20.59.228:tcp:31044
ERROR: logging before flag.Parse: I0713 05:35:46.088933 1 network_services_controller.go:648] Successfully added destination 100.96.1.10:80 to the service 100.67.46.129:tcp:80
ERROR: logging before flag.Parse: I0713 05:35:46.089081 1 network_services_controller.go:648] Successfully added destination 100.96.1.10:80 to the service 172.20.59.228:tcp:31044
ERROR: logging before flag.Parse: I0713 05:35:46.089267 1 network_services_controller.go:648] Successfully added destination 100.96.1.8:80 to the service 100.67.46.129:tcp:80
ERROR: logging before flag.Parse: I0713 05:35:46.089410 1 network_services_controller.go:648] Successfully added destination 100.96.1.8:80 to the service 172.20.59.228:tcp:31044
ERROR: logging before flag.Parse: I0713 05:35:46.089613 1 network_services_controller.go:648] Successfully added destination 100.96.1.9:80 to the service 100.67.46.129:tcp:80
ERROR: logging before flag.Parse: I0713 05:35:46.089741 1 network_services_controller.go:648] Successfully added destination 100.96.1.9:80 to the service 172.20.59.228:tcp:31044
ERROR: logging before flag.Parse: I0713 05:35:46.090123 1 network_services_controller.go:639] Successfully added service: 100.64.0.1:tcp:443
ERROR: logging before flag.Parse: I0713 05:35:46.090316 1 network_services_controller.go:648] Successfully added destination 172.20.33.155:443 to the service 100.64.0.1:tcp:443
ERROR: logging before flag.Parse: I0713 05:35:46.090752 1 network_services_controller.go:639] Successfully added service: 100.64.0.10:udp:53
ERROR: logging before flag.Parse: I0713 05:35:46.090998 1 network_services_controller.go:648] Successfully added destination 100.96.1.2:53 to the service 100.64.0.10:udp:53
ERROR: logging before flag.Parse: I0713 05:35:46.091167 1 network_services_controller.go:648] Successfully added destination 100.96.1.4:53 to the service 100.64.0.10:udp:53
ERROR: logging before flag.Parse: I0713 05:35:46.094938 1 network_services_controller.go:639] Successfully added service: 100.64.0.10:tcp:53
ERROR: logging before flag.Parse: I0713 05:35:46.095102 1 network_services_controller.go:648] Successfully added destination 100.96.1.2:53 to the service 100.64.0.10:tcp:53
ERROR: logging before flag.Parse: I0713 05:35:46.095279 1 network_services_controller.go:648] Successfully added destination 100.96.1.4:53 to the service 100.64.0.10:tcp:53
ERROR: logging before flag.Parse: I0713 05:35:46.095638 1 network_services_controller.go:639] Successfully added service: 100.70.67.153:tcp:6379
ERROR: logging before flag.Parse: I0713 05:35:46.095781 1 network_services_controller.go:648] Successfully added destination 100.96.1.5:6379 to the service 100.70.67.153:tcp:6379
ERROR: logging before flag.Parse: I0713 05:35:46.096280 1 network_services_controller.go:639] Successfully added service: 100.66.191.116:tcp:6379
ERROR: logging before flag.Parse: I0713 05:35:46.096445 1 network_services_controller.go:648] Successfully added destination 100.96.1.6:6379 to the service 100.66.191.116:tcp:6379
ERROR: logging before flag.Parse: I0713 05:35:46.096632 1 network_services_controller.go:648] Successfully added destination 100.96.1.7:6379 to the service 100.66.191.116:tcp:6379
~ # ipvsadm
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 100.64.0.1:https rr persistent 10800 mask 0.0.0.0
-> ip-172-20-33-155.us-west-2.c Masq 1 3 0
TCP 100.64.0.10:domain rr
-> 100.96.1.2:domain Masq 1 0 0
-> 100.96.1.4:domain Masq 1 0 0
TCP 100.66.191.116:6379 rr
-> 100.96.1.6:6379 Masq 1 0 0
-> 100.96.1.7:6379 Masq 1 0 0
TCP 100.67.46.129:http rr
-> 100.96.1.8:http Masq 1 0 0
-> 100.96.1.9:http Masq 1 0 0
-> 100.96.1.10:http Masq 1 0 0
TCP 100.70.67.153:6379 rr
-> 100.96.1.5:6379 Masq 1 2 2
TCP ip-172-20-59-228.us-west-2.c rr
-> 100.96.1.8:http Masq 1 0 0
-> 100.96.1.9:http Masq 1 0 0
-> 100.96.1.10:http Masq 1 0 0
UDP 100.64.0.10:domain rr
-> 100.96.1.2:domain Masq 1 0 1
-> 100.96.1.4:domain Masq 1 0 1
Also, the logs show ERROR for INFO-level messages.
For AWS EC2 instances to send and receive traffic from/to pods, we need to disable the source-destination check. Currently this is a manual step, which can be automated.
In cluster deployers like kops, the pod running on the master has access to the EC2 API. So kube-router can detect when it is running on AWS and disable the source-destination check if it has access to the EC2 API.
This is one action item pending for the kops integration.
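A hedged sketch of that automation using aws-sdk-go: read the instance ID from instance metadata and flip the source/dest check off (this assumes the node's IAM role allows ec2:ModifyInstanceAttribute):

import (
    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/ec2metadata"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/ec2"
)

// disableSrcDstCheck turns off the EC2 source/destination check for the
// instance kube-router is running on.
func disableSrcDstCheck(sess *session.Session) error {
    instanceID, err := ec2metadata.New(sess).GetMetadata("instance-id")
    if err != nil {
        return err
    }
    _, err = ec2.New(sess).ModifyInstanceAttribute(&ec2.ModifyInstanceAttributeInput{
        InstanceId:      aws.String(instanceID),
        SourceDestCheck: &ec2.AttributeBooleanValue{Value: aws.Bool(false)},
    })
    return err
}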
For #66 - Phase 1
--metrics-port option to change the port. --metrics-port=0 means disabled. The default port should be uncommon.
Add support for configuring BGP peer information. Kube-router on each node will peer with the provided external peer. Once peered, the external peer will know how to route traffic to the pods within the cluster.
This will enable use cases where external access to the pods is required.
Cluster IPs for the services can also be advertised, so external access to the services through the cluster IP can be achieved.
Advertising cluster IPs is optional (with --advertise-cluster-ip), but once the flag is enabled all cluster IPs are advertised, which is not desirable (for example for a database). So use an annotation to selectively advertise only the services the user requests.
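One possible shape for that opt-in, sketched below; the annotation key is made up for illustration, not an existing kube-router option:

import v1 "k8s.io/api/core/v1"

// advertiseAnnotation is a hypothetical per-service opt-in.
const advertiseAnnotation = "kube-router.io/advertise-cluster-ip"

// shouldAdvertiseClusterIP only advertises services the user explicitly marked.
func shouldAdvertiseClusterIP(svc *v1.Service) bool {
    return svc.Annotations[advertiseAnnotation] == "true"
}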
The iptables masquerade rule in the POSTROUTING chain of the NAT table is failing to get added, with this error:
E0608 09:36:44.979263 1 network_routes_controller.go:91] Failed to add iptable rule to masqurade outbound traffic from pods due to exit status 2: iptables v1.6.0: invalid mask 10"' specified Try
iptables -h' or 'iptables --help' for more information.
kops uses the 100.64.0.0/10 subnet by default for the pod CIDR; the command below is failing on the default OS kops uses (Debian Jessie):
iptables -t nat -C POSTROUTING -s "100.64.0.0/10" ! -d "100.64.0.0/10" -j MASQUERADE --wait
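The stray quote in the reported mask suggests the CIDR reached iptables with shell quoting still attached. A hedged sketch of a safer way to issue the rule: pass each token as its own exec argument so no shell re-quoting happens.

import "os/exec"

// ensurePodEgressMasquerade appends the masquerade rule for traffic leaving the
// pod CIDR for destinations outside it. Tokens are separate arguments, so the
// CIDR is never re-interpreted by a shell.
func ensurePodEgressMasquerade(podCIDR string) error {
    args := []string{"--wait", "-t", "nat", "-A", "POSTROUTING",
        "-s", podCIDR, "!", "-d", podCIDR, "-j", "MASQUERADE"}
    return exec.Command("iptables", args...).Run()
}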
When a Service has externalIPs defined, kube-proxy binds the service ports to those IPs if they exist on the node it's running on. Although with iBGP peering a network admin is able to expose service IPs, it may still be beneficial for users to specify an IP that's not in the service IP CIDR. Adding this support to kube-router would also streamline transitioning away from kube-proxy.
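A minimal sketch of how the existing IPVS path could be reused for this, with illustrative helper names: treat each entry of spec.externalIPs as another virtual IP to program alongside the cluster IP.

import (
    "fmt"
    "strings"

    v1 "k8s.io/api/core/v1"
)

// externalIPServiceKeys lists one "ip:proto:port" entry per externalIP/port
// pair, mirroring how cluster-IP virtual services are keyed.
func externalIPServiceKeys(svc *v1.Service) []string {
    var keys []string
    for _, eip := range svc.Spec.ExternalIPs {
        for _, p := range svc.Spec.Ports {
            keys = append(keys, fmt.Sprintf("%s:%s:%d",
                eip, strings.ToLower(string(p.Protocol)), p.Port))
        }
    }
    return keys
}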
References
There does not seem to be a released version of client-go that supports Kubernetes 1.7 so far. So test out the latest 3.0 beta with kube-router and, if it works without any hiccups, vendor the latest client-go.
This is needed for #16.
On pre-merge checks, do a Travis CI build and run tests.
Also, through Travis we can build and push a Docker image; this can be used for #31.
For example, something like https://github.com/coreos/flannel/blob/master/.travis.yml
Since the Service manifest does not have support for choosing a load balancing method, use service annotations to add metadata to the service specifying the load balancing method, and use those details to configure the IPVS service.
On a side note, how useful this is needs to be analyzed. Given that each node makes decisions based only on the connections it is aware of, a node performing least-connection load balancing does not necessarily pick the endpoint with the fewest connections overall, since connections can be load balanced from other nodes across the cluster. This is the nature of a distributed load balancer.
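A sketch of the annotation-to-scheduler mapping, as an illustration; the annotation key and accepted values are assumptions, while the scheduler names (rr, lc, wrr, sh) are standard IPVS schedulers:

import v1 "k8s.io/api/core/v1"

// schedulerAnnotation is a hypothetical per-service override.
const schedulerAnnotation = "kube-router.io/service.scheduler"

// ipvsScheduler picks the IPVS scheduling method for a service, defaulting to
// round robin when the annotation is absent or unrecognised.
func ipvsScheduler(svc *v1.Service) string {
    switch svc.Annotations[schedulerAnnotation] {
    case "lc", "wrr", "sh":
        return svc.Annotations[schedulerAnnotation]
    default:
        return "rr"
    }
}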
This is the stack trace; it needs further investigation.
panic: Running modprobe ip_vs failed with message: ``, error: fork/exec /sbin/modprobe: too many open files
goroutine 95 [running]:
panic(0x15458e0, 0xc42118d990)
/usr/local/go/src/runtime/panic.go:500 +0x1a1
github.com/cloudnativelabs/kube-router/app/controllers.ipvsAddService(0xc42118c820, 0x10, 0x10, 0x101bb0006, 0xc42118d501, 0x4, 0x0)
/home/kube/go/src/github.com/cloudnativelabs/kube-router/app/controllers/network_services_controller.go:399 +0x964
github.com/cloudnativelabs/kube-router/app/controllers.(*NetworkServicesController).syncIpvsServices(0xc4203924d0, 0xc420a7ade0, 0xc420a7aed0)
/home/kube/go/src/github.com/cloudnativelabs/kube-router/app/controllers/network_services_controller.go:205 +0x3cb
github.com/cloudnativelabs/kube-router/app/controllers.(*NetworkServicesController).sync(0xc4203924d0)
/home/kube/go/src/github.com/cloudnativelabs/kube-router/app/controllers/network_services_controller.go:126 +0xbf
github.com/cloudnativelabs/kube-router/app/controllers.(*NetworkServicesController).Run(0xc4203924d0, 0xc42033a4e0, 0xc4202e27b0)
/home/kube/go/src/github.com/cloudnativelabs/kube-router/app/controllers/network_services_controller.go:106 +0x1d2
created by github.com/cloudnativelabs/kube-router/app.(*KubeRouter).Run
/home/kube/go/src/github.com/cloudnativelabs/kube-router/app/server.go:146 +0x4ec
Lots of information is readily available to kube-router (from conntrack, IPVS, iptables dropped packets, etc.) that can be exposed as Prometheus metrics.
Add the capability to present metrics as a Prometheus endpoint.
For people like me that use CoreOS or other minimal/immutable operating systems, there are very few tools easily available for troubleshooting and advanced configuration. A common practice for Kubernetes applications is to make available a toolbox container that comes with software and configuration ready to perform these tasks.
For kube-router the toolbox should include:
Configuration:
For #66 - Phase 1
The cloudnativelabs/kube-router:latest tag should point to the latest release version.
The cloudnativelabs/kube-router:master tag should point to the latest commit.
All previous releases and commits should be available in the registry separately.
I will have to investigate how to do this automatically with every release/commit. I've done it before with quay.io registry but I haven't used Docker Hub yet.
The requirement to manually set the source-destination check to false is no longer needed; kube-router does it automatically (#35). Raise a PR in kops once we make a new release of kube-router.
If the service manifest has SessionAffinity set, then configure IPVS to provide session persistence. The only persistence type supported by the manifest is ClientIP, so we just need to provide client-IP-based session persistence.
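A minimal sketch of that mapping: translate ClientIP affinity into an IPVS persistence timeout that the service-programming code can apply (the 3-hour default mirrors ipvsadm's usual persistence window):

import v1 "k8s.io/api/core/v1"

// persistenceTimeout returns the IPVS persistence timeout (seconds) to use for
// a service, or 0 when no session persistence is requested.
func persistenceTimeout(svc *v1.Service) uint32 {
    if svc.Spec.SessionAffinity == v1.ServiceAffinityClientIP {
        return 10800 // client-IP persistence, 3 hours
    }
    return 0
}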
To increase testing/adoption and further automate and standardize configuration/deployment. It should probably live in the official Chart repository for maximum exposure and help with issues, so it won't be added to this repository in that case. I will work on this and close this ticket once it's added to the official Chart repo.
I set the net.beta.kubernetes.io/network-policy annotation to an invalid (json) value:
kubectl annotate ns test "net.beta.kubernetes.io/network-policy={\"ingress\": {\"isolation\": \"DefaultDeny\"}}"- --overwrite
This caused kube-router to panic and crashloop, but only on the nodes that had pods in the namespace the policy was applied to. Fixing the json in the annotation cleared things up.
I think I've found an unhandled Error that will lead me to a fix. I'll submit a PR soon if I find it.
Log:
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x4eec82]
goroutine 205 [running]:
panic(0x1645840, 0xc420018050)
/usr/local/go/src/runtime/panic.go:500 +0x1a1
github.com/cloudnativelabs/kube-router/app/controllers.(*NetworkPolicyController).syncPodFirewallChains(0xc42027a100, 0xc42089e7e0, 0x0, 0x0)
/home/kube/go/src/github.com/cloudnativelabs/kube-router/app/controllers/network_policy_controller.go:323 +0x112
github.com/cloudnativelabs/kube-router/app/controllers.(*NetworkPolicyController).Sync(0xc42027a100)
/home/kube/go/src/github.com/cloudnativelabs/kube-router/app/controllers/network_policy_controller.go:160 +0x2ac
github.com/cloudnativelabs/kube-router/app/controllers.(*NetworkPolicyController).OnPodUpdate(0xc42027a100, 0xc4203981f0)
/home/kube/go/src/github.com/cloudnativelabs/kube-router/app/controllers/network_policy_controller.go:112 +0x17a
github.com/cloudnativelabs/kube-router/app/watchers.(*podWatcher).RegisterHandler.func1(0x1540e00, 0xc4203981f0)
/home/kube/go/src/github.com/cloudnativelabs/kube-router/app/watchers/pods_watcher.go:65 +0x4a
github.com/cloudnativelabs/kube-router/utils.ListenerFunc.OnUpdate(0xc42066b800, 0x1540e00, 0xc4203981f0)
/home/kube/go/src/github.com/cloudnativelabs/kube-router/utils/utils.go:14 +0x3a
created by github.com/cloudnativelabs/kube-router/utils.(*Broadcaster).Notify
/home/kube/go/src/github.com/cloudnativelabs/kube-router/utils/utils.go:37 +0xa3
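A hedged sketch of the kind of guard the crash above calls for: parse the namespace annotation defensively and fall back to "not isolated" on malformed JSON instead of panicking (struct shape inferred from the annotation format shown above):

import (
    "encoding/json"

    "github.com/golang/glog"
    v1 "k8s.io/api/core/v1"
)

type namespacePolicy struct {
    Ingress struct {
        Isolation string `json:"isolation"`
    } `json:"ingress"`
}

// isDefaultDeny reports whether the namespace asks for DefaultDeny ingress
// isolation, treating a malformed annotation as "no isolation".
func isDefaultDeny(ns *v1.Namespace) bool {
    raw, ok := ns.Annotations["net.beta.kubernetes.io/network-policy"]
    if !ok {
        return false
    }
    var p namespacePolicy
    if err := json.Unmarshal([]byte(raw), &p); err != nil {
        glog.Errorf("invalid network-policy annotation on namespace %s: %v", ns.Name, err)
        return false
    }
    return p.Ingress.Isolation == "DefaultDeny"
}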
From the TODO:
explore integration of an ingress controller so Kube-router will be one complete solution for both east-west and north-south traffic
@rmb938 reported that on BGP peer disconnect, the routes advertised by that peer are not cleaned up in the local routing table. Kube-router should listen for BGP disconnects and remove the routes.
In cloud environments kube-router users should be able to define a LoadBalancer service that:
Should be tested on AWS and GKE. h/t @drobinson123
See if kops can support kube-router:
kubernetes/kops#2606
The current host-gateway-based routing model of kube-router assumes nodes are L2-adjacent, i.e. in the same subnet. This should work fine for most cases, but if we have a cluster with nodes in different subnets, i.e. nodes that are not L2-adjacent, then the host-routing approach for cross-node pod-to-pod connectivity does not work.
Perhaps the easy way to address this is to use VXLAN encapsulation, so routes are set up such that if the destination node is not in the same subnet, the VXLAN interface is used for encapsulation.
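A hedged sketch of the route selection this would need (device name, VNI and the use of ip via exec are all illustrative): keep plain host-gw routes for same-subnet peers and point cross-subnet pod CIDRs at a VXLAN device created once at startup (for example, ip link add kube-vxlan type vxlan id 14 dev eth0 dstport 4789):

import (
    "net"
    "os/exec"
)

// addRouteForNode installs the route for a peer node's pod CIDR: a normal
// next-hop route when the peer is L2 adjacent, otherwise a route over the
// VXLAN device with the peer as an onlink gateway.
func addRouteForNode(podCIDR, nodeIP, localSubnet string) error {
    _, subnet, err := net.ParseCIDR(localSubnet)
    if err != nil {
        return err
    }
    if subnet.Contains(net.ParseIP(nodeIP)) {
        return exec.Command("ip", "route", "replace", podCIDR, "via", nodeIP).Run()
    }
    return exec.Command("ip", "route", "replace", podCIDR,
        "via", nodeIP, "dev", "kube-vxlan", "onlink").Run()
}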
Specifically, this Calico project document shows a really common design that network engineers like for these sorts of things. I'd like to use kube-router, but can't, due to it not supporting this.
This involves each rack using a different AS.
In some use cases it is desirable to have external (outside the cluster) access to the cluster IPs. While NodePort can be used, it is not convenient to use non-standard node ports; it is more familiar to use something like cluster-ip:80 than node-ip:node-port.
Add a flag which, when true, adds a route to the RIB that GoBGP can advertise to its peers. Of course we will end up with every node advertising the cluster IP; upstream routers can use ECMP to load balance.
We need a masquerade rule so that pods can reach the external network. A pod can reach any pod in the cluster, the cluster IPs, and the nodes. Any traffic not destined for any of those (another pod, a cluster IP, a node) should be masqueraded.
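Sketching that as an ordered POSTROUTING rule set (the CIDRs are parameters and the rule layout is illustrative): return early for in-cluster destinations and masquerade whatever is left.

// masqueradeRules returns iptables argument lists, in order: traffic from pods
// to other pods, to cluster IPs, or to nodes is left alone; everything else
// from the pod CIDR is masqueraded.
func masqueradeRules(podCIDR, serviceCIDR, nodeCIDR string) [][]string {
    return [][]string{
        {"-t", "nat", "-A", "POSTROUTING", "-s", podCIDR, "-d", podCIDR, "-j", "RETURN"},
        {"-t", "nat", "-A", "POSTROUTING", "-s", podCIDR, "-d", serviceCIDR, "-j", "RETURN"},
        {"-t", "nat", "-A", "POSTROUTING", "-s", podCIDR, "-d", nodeCIDR, "-j", "RETURN"},
        {"-t", "nat", "-A", "POSTROUTING", "-s", podCIDR, "-j", "MASQUERADE"},
    }
}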
@thoro reported this issue. I tested it out and could reproduce it.
~ # gobgp neighbor
Peer AS Up/Down State |#Received Accepted
64512 00:30:53 Establ | 1 1
64512 00:31:05 Establ | 1 1
64512 00:30:57 Establ | 1 1
The problem seems to be with the neighbor command only; the global RIB shows up fine.
~ # gobgp global rib
Network Next Hop AS_PATH Age Attrs
*> 100.96.0.0/24 172.20.41.250 4000 400000 300000 40001 00:30:59 [{Origin: i} {LocalPref: 100}]
*> 100.96.1.0/24 172.20.47.91 4000 400000 300000 40001 00:31:11 [{Origin: i} {LocalPref: 100}]
*> 100.96.2.0/24 172.20.61.45 4000 400000 300000 40001 00:31:03 [{Origin: i} {LocalPref: 100}]
*> 100.96.3.0/24 172.20.33.233 4000 400000 300000 40001 00:00:13 [{Origin: i}]
~ #
From the local gobgp client I am able to see the information fine.
/home/kube/go/bin/gobgp neighbor -u 52.36.44.2
Peer AS Up/Down State |#Received Accepted
172.20.33.233 64512 00:19:22 Establ | 1 1
172.20.41.250 64512 00:19:24 Establ | 1 1
172.20.47.91 64512 00:19:32 Establ | 1 1
kube@kube-master:~$ /home/kube/go/bin/gobgp global rib -u 52.36.44.2
Network Next Hop AS_PATH Age Attrs
*> 100.96.0.0/24 172.20.41.250 4000 400000 300000 40001 00:29:36 [{Origin: i} {LocalPref: 100}]
*> 100.96.1.0/24 172.20.47.91 4000 400000 300000 40001 00:29:44 [{Origin: i} {LocalPref: 100}]
*> 100.96.2.0/24 172.20.61.45 4000 400000 300000 40001 00:00:51 [{Origin: i}]
*> 100.96.3.0/24 172.20.33.233 4000 400000 300000 40001 00:29:34 [{Origin: i} {LocalPref: 100}]
Threads seem to be piling up in my kube-router pods, and the process is eventually killed and restarted.
I0607 14:14:54.708042 1 network_services_controller.go:407] ipvs service 10.3.0.15:tcp:2379 already exists so returning
I0607 14:14:54.885319 1 network_services_controller.go:448] ipvs destination 10.10.2.1:2379 already exists in the ipvs service 10.3.0.15:tcp:2379 so not adding destination
I0607 14:14:55.074909 1 network_services_controller.go:448] ipvs destination 10.10.2.2:2379 already exists in the ipvs service 10.3.0.15:tcp:2379 so not adding destination
I0607 14:14:55.249060 1 network_services_controller.go:448] ipvs destination 10.10.2.3:2379 already exists in the ipvs service 10.3.0.15:tcp:2379 so not adding destination
I0607 14:14:55.341417 1 network_policy_controller.go:94] Performing periodic syn of the iptables to reflect network policies
I0607 14:14:55.342273 1 network_services_controller.go:103] Performing periodic syn of the ipvs services and server to reflect desired state of kubernetes services and endpoints
runtime: program exceeds 10000-thread limit
fatal error: thread exhaustion
runtime stack:
runtime.throw(0x182a6f1, 0x11)
/usr/local/go/src/runtime/panic.go:566 +0x95
runtime.checkmcount()
/usr/local/go/src/runtime/proc.go:486 +0xa4
runtime.mcommoninit(0xc44146bc00)
/usr/local/go/src/runtime/proc.go:506 +0xd5
runtime.allocm(0xc42001f500, 0x190e5c0, 0xc400000001)
/usr/local/go/src/runtime/proc.go:1286 +0x9b
runtime.newm(0x190e5c0, 0xc42001f500)
/usr/local/go/src/runtime/proc.go:1555 +0x39
runtime.startm(0xc42001f500, 0x100000001)
/usr/local/go/src/runtime/proc.go:1642 +0x181
runtime.wakep()
/usr/local/go/src/runtime/proc.go:1723 +0x57
runtime.resetspinning()
/usr/local/go/src/runtime/proc.go:2039 +0x8b
runtime.schedule()
/usr/local/go/src/runtime/proc.go:2127 +0x136
runtime.mstart1()
/usr/local/go/src/runtime/proc.go:1136 +0xd8
runtime.mstart()
/usr/local/go/src/runtime/proc.go:1096 +0x64
goroutine 1 [chan receive, 27 minutes]:
github.com/cloudnativelabs/kube-router/app.(*KubeRouter).Run(0xc420402560, 0xc420402560, 0x0)
/home/kube/go/src/github.com/cloudnativelabs/kube-router/app/server.go:152 +0x228
main.main()
/home/kube/go/src/github.com/cloudnativelabs/kube-router/kube-router.go:37 +0x13c
goroutine 17 [syscall, 27 minutes, locked to thread]:
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:2086 +0x1
goroutine 5 [syscall, 27 minutes]:
os/signal.signal_recv(0x0)
/usr/local/go/src/runtime/sigqueue.go:116 +0x157
os/signal.loop()
/usr/local/go/src/os/signal/signal_unix.go:22 +0x22
created by os/signal.init.1
/usr/local/go/src/os/signal/signal_unix.go:28 +0x41
goroutine 6 [chan receive]:
github.com/cloudnativelabs/kube-router/vendor/github.com/golang/glog.(*loggingT).flushDaemon(0x246f740)
/home/kube/go/src/github.com/cloudnativelabs/kube-router/vendor/github.com/golang/glog/glog.go:879 +0x7a
created by github.com/cloudnativelabs/kube-router/vendor/github.com/golang/glog.init.1
/home/kube/go/src/github.com/cloudnativelabs/kube-router/vendor/github.com/golang/glog/glog.go:410 +0x21d
I've attached kube-router.log.gz which is the full log the snippet above came from.
On one of the nodes I can see the threads increasing slowly this way:
core@node1 ~ $ ps aux|grep kube-router
root 1834843 8.9 0.2 26928172 242768 ? Ssl 14:14 0:31 /kube-router --run-router=true --run-firewall=true --run-service-proxy=true --cluster-cidr=10.2.0.0/16 --advertise-cluster-ip --cluster-asn=64512 --peer-asn=64512 --peer-router=10.10.10.33 --kubeconfig=/etc/kubernetes/kubeconfig
core 1839688 0.0 0.0 6736 936 pts/0 S+ 14:20 0:00 grep --colour=auto kube-router
core@node1 ~ $ ps huH p 1834843|wc -l
2365
core@node1 ~ $ ps huH p 1834843|wc -l
2367
core@node1 ~ $ ps huH p 1834843|wc -l
2371
core@node1 ~ $ ps huH p 1834843|wc -l
2388
The network services controller in kube-router needs to generate a unique service key (a combination of namespace, service name, and spec.ports.name) that will be used to index the service info map and the endpoints info map.
The current key generation is flawed: it fails if there is a mismatch between the port opened by the service and the port opened by the endpoint.
The correct way is to use spec.ports.name, which is copied by the API server into the Endpoints API object as well. This is what kube-proxy uses too.
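A minimal sketch of such a key, purely illustrative: the same namespace/name/port-name triple exists on both the Service and its Endpoints object, so one key can index both maps.

import "fmt"

// serviceKey builds the shared index key for the service info and endpoints
// info maps from namespace, service name and spec.ports[].name.
func serviceKey(namespace, svcName, portName string) string {
    return fmt.Sprintf("%s/%s:%s", namespace, svcName, portName)
}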
So we can edit the release before it's published.
IPVS has this nice LB method http://kb.linuxvirtualserver.org/wiki/Dynamic_Feedback_Load_Balancing_Scheduling
This is all the more relevant given the distributed load balancing requirements of the ClusterIP and NodePort service types. Each node doing load balancing in round-robin fashion has the limitations below.
Here is the proposal:
Use a clusterCIDR flag, similar to the one kube-proxy has, to distinguish between internal and external traffic. For external traffic hitting the node port, we need to ensure traffic goes through the node on the reverse path (when using IPVS NAT mode); the pod will try to reply directly to the source if we don't masquerade the traffic.
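A tiny sketch of the clusterCIDR test described in the proposal (helper name illustrative): masquerade NodePort traffic only when its source lies outside the cluster CIDR, i.e. when it is external.

import "net"

// isExternalSource reports whether a source IP is outside the cluster CIDR and
// therefore needs masquerading so replies come back through this node.
func isExternalSource(srcIP string, clusterCIDR *net.IPNet) bool {
    return !clusterCIDR.Contains(net.ParseIP(srcIP))
}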
If we have to track connections for #10 and #5 anyway, we may as well expose an HTTP endpoint on kube-router to visualize the dynamic state of services. Kube-router runs as a DaemonSet, so we have a pod running on each node; we can expose a NodePort service that provides a lightweight service visualization.
With glog updates we're running into this bug, kubernetes/kubernetes#17162
There's a workaround in there that I'll try out.
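For reference, one commonly used workaround from that thread (not necessarily the one meant above) is to parse the standard library flag set early, since kube-router's own flags live in a separate flag set; glog then stops emitting the "logging before flag.Parse" / ERROR-prefixed lines:

import "flag"

func init() {
    // Parse the (empty) stdlib flag set so glog considers flags parsed.
    flag.CommandLine.Parse([]string{})
}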