
Comments (6)

guillaumelauzier commented on May 27, 2024

It looks like the coredns pods are failing to start because they are unable to connect to the Kubernetes API server. This could be due to a network issue, or an issue with the configuration of the coredns pods.

One possible solution is to check the logs of the coredns pods to see if there is more detailed information about the error. You can do this by running the following command:

kubectl logs -n kube-system coredns-<POD-ID>

Replace <POD-ID> with the actual ID of the coredns pod that is failing. This should give you more information about the cause of the error.

Additionally, you can try restarting the coredns pods to see if that fixes the issue. You can do this by running the following command:

kubectl delete pod -n kube-system coredns-<POD-ID>

Again, replace <POD-ID> with the actual ID of the coredns pod. This will delete the failing pod, and the Kubernetes cluster will automatically create a new one in its place.
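
As a shortcut, assuming the pods carry the standard k8s-app=kube-dns label that the stock CoreDNS manifests apply, you can address them by label instead of copying pod IDs by hand (untested here, since it needs a running cluster):

```shell
# List the coredns pods without knowing their generated suffixes
kubectl get pods -n kube-system -l k8s-app=kube-dns

# Fetch recent logs from all matching pods at once
kubectl logs -n kube-system -l k8s-app=kube-dns --tail=50

# Delete them all; the Deployment recreates replacements automatically
kubectl delete pod -n kube-system -l k8s-app=kube-dns
```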

from kubernetes-the-hard-way.

saeed0808 commented on May 27, 2024


sven-borkert commented on May 27, 2024

Hi,

yes, the coredns pods are starting, but not going "ready" because they cannot reach the cluster ip of the Kubernetes API server. I deleted one of the pods and checked the logs of the newly created pod:

$ kubectl logs coredns-7c9cfc6995-snvgp -n kube-system -f
plugin/kubernetes: Get "https://10.32.0.1:443/version?timeout=32s": dial tcp 10.32.0.1:443: i/o timeout

My networking seems to be broken, and I don't see the 10.32.0.1 in the iptables rules on the worker nodes. I think that kube-proxy should create a rule that catches connections to that virtual IP and forwards them to the controller nodes running the kube-apiservers, right? I checked the logs of kube-proxy on the nodes and I don't see any errors.

In the iptables rules on the workers I see that a rule for kube-dns has been created, but it has no targets yet: traffic is rejected because the coredns pods don't start correctly:

-A KUBE-SERVICES -d 10.32.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns has no endpoints" -m udp --dport 53 -j REJECT --reject-with icmp-port-unreachable
-A KUBE-SERVICES -d 10.32.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp has no endpoints" -m tcp --dport 53 -j REJECT --reject-with icmp-port-unreachable
-A KUBE-SERVICES -d 10.32.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:metrics has no endpoints" -m tcp --dport 9153 -j REJECT --reject-with icmp-port-unreachable

From my understanding, I would have expected a rule here that filters on destination 10.32.0.1:443, right? But for some reason there is no such rule, so the service is not reachable.
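
For reference, a sketch of what a working kube-proxy setup would be expected to contain: a KUBE-SERVICES rule matching the API server's cluster IP. On a real worker the check would be `iptables-save | grep '10.32.0.1/32'`; below it is simulated against a sample rule so the check itself is demonstrable (the KUBE-SVC-… target name is illustrative, not taken from this cluster):

```shell
# A healthy ruleset should contain a DNAT dispatch rule for the
# default/kubernetes service's cluster IP (10.32.0.1:443 here).
sample='-A KUBE-SERVICES -d 10.32.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-SVC-NPX46M4PTMTKRN6Y'

# Count matching rules; 0 would mean kube-proxy never programmed the service
echo "$sample" | grep -c -- '-d 10.32.0.1/32'   # -> 1
```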

Besides that, my container networking seems to be fully broken. I have checked the tutorial multiple times but have not found the error yet. I have installed the cni-plugins-linux binaries to /opt/cni/bin/ and created /etc/cni/net.d/10-bridge.conf and 99-loopback.conf:

root@worker-0:/etc/cni/net.d# ls -l
total 8
-rw-r--r-- 1 root root 303 Dez  1 19:31 10-bridge.conf
-rw-r--r-- 1 root root  72 Dez  1 19:32 99-loopback.conf
root@worker-0:/etc/cni/net.d# cat *
{
    "cniVersion": "0.4.0",
    "name": "bridge",
    "type": "bridge",
    "bridge": "cnio0",
    "isGateway": true,
    "ipMasq": true,
    "ipam": {
        "type": "host-local",
        "ranges": [
          [{"subnet": "10.200.0.0/24"}]
        ],
        "routes": [{"dst": "0.0.0.0/0"}]
    }
}
{
    "cniVersion": "0.4.0",
    "name": "lo",
    "type": "loopback"
}

I verified that each worker has its own subnet: 10.200.0.0/24, 10.200.1.0/24, 10.200.2.0/24.

I can see the expected bridge interface cnio0 on the worker nodes:

root@worker-0:~# ifconfig 
cnio0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.200.0.1  netmask 255.255.255.0  broadcast 10.200.0.255
        inet6 fe80::f0aa:c0ff:fe70:2040  prefixlen 64  scopeid 0x20<link>
        ether 4a:55:d3:bc:d7:b6  txqueuelen 1000  (Ethernet)
        RX packets 230  bytes 13102 (13.1 KB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 30  bytes 1588 (1.5 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens33: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.0.220  netmask 255.255.255.0  broadcast 192.168.0.255
        inet6 fe80::250:56ff:fe3b:abcb  prefixlen 64  scopeid 0x20<link>
        ether 00:50:56:3b:ab:cb  txqueuelen 1000  (Ethernet)
        RX packets 8906  bytes 2294242 (2.2 MB)
        RX errors 0  dropped 1840  overruns 0  frame 0
        TX packets 4248  bytes 616743 (616.7 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens34: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.240.0.20  netmask 255.255.255.0  broadcast 10.240.0.255
        inet6 fe80::250:56ff:fe38:d826  prefixlen 64  scopeid 0x20<link>
        ether 00:50:56:38:d8:26  txqueuelen 1000  (Ethernet)
        RX packets 725  bytes 91678 (91.6 KB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 1005  bytes 84992 (84.9 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 2930  bytes 196926 (196.9 KB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 2930  bytes 196926 (196.9 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

veth74645b5c: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet6 fe80::7c22:c3ff:fe57:87ee  prefixlen 64  scopeid 0x20<link>
        ether 6a:70:f9:6f:be:48  txqueuelen 0  (Ethernet)
        RX packets 19  bytes 1424 (1.4 KB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 40  bytes 3076 (3.0 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

veth857e8004: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet6 fe80::7cfb:c3ff:fe59:9c08  prefixlen 64  scopeid 0x20<link>
        ether 12:30:8f:a8:f2:3c  txqueuelen 0  (Ethernet)
        RX packets 120  bytes 8408 (8.4 KB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 48  bytes 3180 (3.1 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

The routes on the worker look like this:

root@worker-0:~# ip route
default via 192.168.0.1 dev ens33 proto dhcp src 192.168.0.220 metric 100 
10.200.0.0/24 via 10.240.0.20 dev ens34 proto static 
10.200.0.0/24 dev cnio0 proto kernel scope link src 10.200.0.1 
10.200.1.0/24 via 10.240.0.21 dev ens34 proto static 
10.200.2.0/24 via 10.240.0.22 dev ens34 proto static 
10.240.0.0/24 dev ens34 proto kernel scope link src 10.240.0.20 
192.168.0.0/24 dev ens33 proto kernel scope link src 192.168.0.220 metric 100 
192.168.0.1 dev ens33 proto dhcp scope link src 192.168.0.220 metric 100 
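
One thing stands out in the table above (and, with hindsight, it matches the fix described in the next comment): worker-0 has a static route for its own pod CIDR, 10.200.0.0/24 via 10.240.0.20, which is worker-0's own address, shadowing the kernel route via cnio0. A node should only carry static routes for the other workers' pod CIDRs. A quick way to spot such duplicates, simulated here against the pod-CIDR routes from the output above:

```shell
# Each pod CIDR should appear exactly once; the node's own CIDR must only
# be reachable via the local bridge (cnio0), never via a static route
# pointing back at the node itself.
routes='10.200.0.0/24 via 10.240.0.20 dev ens34
10.200.0.0/24 dev cnio0
10.200.1.0/24 via 10.240.0.21 dev ens34
10.200.2.0/24 via 10.240.0.22 dev ens34'

# Print any destination that occurs more than once
echo "$routes" | awk '{print $1}' | sort | uniq -d   # -> 10.200.0.0/24
```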

I started a "busybox" pod on worker-0 to check the network connection. It has its interface and an IP from the correct subnet, but it cannot even ping the gateway, let alone anything else:

$ kubectl exec -ti busybox -- /bin/sh
/ # ifconfig 
eth0      Link encap:Ethernet  HWaddr BA:96:88:8C:34:EE  
          inet addr:10.200.0.55  Bcast:10.200.0.255  Mask:255.255.255.0
          inet6 addr: fe80::b896:88ff:fe8c:34ee/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:41 errors:0 dropped:0 overruns:0 frame:0
          TX packets:22 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:3118 (3.0 KiB)  TX bytes:1662 (1.6 KiB)

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

/ # ip route
default via 10.200.0.1 dev eth0 
10.200.0.0/24 dev eth0 scope link  src 10.200.0.55 
/ # ping 10.200.0.1
PING 10.200.0.1 (10.200.0.1): 56 data bytes

Thank you for any hints that might help me understand this.
Regards,
Sven


sven-borkert commented on May 27, 2024

Aaaaaah! Sometimes it helps to write the details down for someone else. My routes were wrong. I fixed the routing and the pods went healthy. Not sure if everything works now, but I'm one step further. :)


sven-borkert commented on May 27, 2024

All the tests from the tutorial are working and the containers can reach each other, nice.

I did this installation on Ubuntu 22.04. It seems to be easier to use the containerd and runc that ship with this Ubuntu version; the manually installed versions from this tutorial seem to be unhappy with cgroup v2. (I know this tutorial is meant for an older Ubuntu version.)

CoreDNS does resolve the name "kubernetes", and after I added a forward to its configuration it also resolves external names. But it does not seem to resolve any pod names. Shouldn't it?
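
On the pod-name question: as far as I know, Kubernetes DNS never serves arbitrary pod names, only Service names. Pods at most get dashed-IP records of the form <ip-with-dashes>.<namespace>.pod.<cluster domain>, and only if the `pods` option is enabled in the kubernetes plugin of the Corefile. A small sketch of that record form (the helper function is mine, not part of any Kubernetes tooling; it assumes the default cluster.local domain):

```shell
# Build the DNS record a pod gets from its IP: dots become dashes,
# placed under <namespace>.pod.<cluster domain>.
pod_dns_name() {
  local ip="$1" ns="${2:-default}" domain="${3:-cluster.local}"
  echo "${ip//./-}.${ns}.pod.${domain}"
}

pod_dns_name 10.200.0.55   # -> 10-200-0-55.default.pod.cluster.local
```

So the busybox pod from above would be queried as 10-200-0-55.default.pod.cluster.local, not by its pod name.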

In other tutorials I always read I would need a CNI provider like "Calico" for the networking between the pods. The package cni-plugins-linux is not Calico as far as I understand, so this tutorial does not install Calico. What do I need it for then?

Regards,
Sven


justizin commented on May 27, 2024

Ran into this in the current version of the tutorial as of today. I was able to resolve it by following the instructions to change the coredns config to point at 1.8, but also by running kubectl apply -f deployments/kube-dns.yaml. Not sure if this puts things in an optimal state, but I wasn't able to resolve it with only one of the two applied; with both, it works fine.

