kelseyhightower / kubernetes-the-hard-way

Bootstrap Kubernetes the hard way. No scripts.

License: Apache License 2.0

kubernetes-the-hard-way's Introduction

Kubernetes The Hard Way

This tutorial walks you through setting up Kubernetes the hard way. This guide is not for someone looking for a fully automated tool to bring up a Kubernetes cluster. Kubernetes The Hard Way is optimized for learning, which means taking the long route to ensure you understand each task required to bootstrap a Kubernetes cluster.

The results of this tutorial should not be viewed as production ready, and may receive limited support from the community, but don't let that stop you from learning!

Copyright

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Target Audience

The target audience for this tutorial is someone who wants to understand the fundamentals of Kubernetes and how the core components fit together.

Cluster Details

Kubernetes The Hard Way guides you through bootstrapping a basic Kubernetes cluster with all control plane components running on a single node, and two worker nodes, which is enough to learn the core concepts.

Component versions:

kubernetes v1.28.x
containerd v1.7.x
cni v1.3.x
etcd v3.4.x

Labs

This tutorial requires four (4) ARM64 based virtual or physical machines connected to the same network. While ARM64 based machines are used for the tutorial, the lessons learned can be applied to other platforms.


kubernetes-the-hard-way's Issues

DNS addon as ReplicationController rather than Deployment

Great tutorial!

I see that Deployments seem to be the way to go for deploying things, but that is not the case for the DNS addon.

Just wondering: is there a technical reason specific to this addon, or is it just because it works fine as an RC and isn't worth converting to a Deployment?

etcd --client-cert-auth

Hello,

When I use this flag in etcd, kubectl get cs will say:

etcd-2               Unhealthy   Get https://gc03.gloriouscloud.com:2379/health: remote error: bad certificate

But otherwise the cluster seems to be working fine.

Is this the reason your tutorial does not use client authentication in etcd?

I am trying to understand what I am doing wrong or if it is something else.

I was writing a similar set of tutorials about installing Kubernetes on bare metal:

https://medium.com/@elcct/kubernetes-on-bare-metal-part-5-kubernetes-master-b6a0388fa993

and I am stuck on this problem :)

Thanks!

Explain how HA works

Kelsey, this is great stuff. Thank you.

When I studied the API server command line flags, I expected to see some indication of where the other API servers were running in the cluster. Instead, I see only --apiserver-count=3, which doesn't say where to find the other two instances.

In the same vein, when I look at the controller-manager and scheduler command line config, I can reason about how they leader-elect, because they both point at a master in the know.

So what is not clear is how the masters find each other, if in fact they need to find each other at all.

Could you add a paragraph elaborating on this?

kube-proxy cannot connect - connection refused

I've followed steps 1-4 using the AWS instructions, and I'm seeing an error when I get to the last stage of step 05.

sudo systemctl status kube-proxy --no-pager

returns the following:

Failed to list *api.Service: Get https://10.0.32.10:6443/api/v1/services?resourceVersion=0: dial tcp 10.0.32.10:6443: getsockopt: connection refused

I've pretty much copied everything and changed the values to fit my AWS configuration.

I also see the same error when trying to connect the kubelet

sudo systemctl status kubelet --no-pager

Failed to list *api.Service: Get https://10.0.32.10:6443/api/v1/services?resourceVersion=0: dial tcp 10.0.32.10:6443: getsockopt: connection refused

Double escape on kubelet unit file

I don't know if this is an issue, but I have to double escape my kubelet unit file for workers to register with the controllers.

Example:

sudo sh -c 'echo "[Unit]
Description=Kubernetes Kubelet
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=docker.service
Requires=docker.service

[Service]
ExecStart=/usr/bin/kubelet \
  --allow-privileged=true \
  --api-servers=https://10.240.0.20:6443,https://10.240.0.21:6443,https://10.240.0.22:6443 \
  --cloud-provider= \
  --cluster-dns=10.32.0.10 \
  --cluster-domain=cluster.local \
  --configure-cbr0=true \
  --container-runtime=docker \
  --docker=unix:///var/run/docker.sock \
  --network-plugin=kubenet \
  --kubeconfig=/var/lib/kubelet/kubeconfig \
  --reconcile-cidr=true \
  --serialize-image-pulls=false \
  --tls-cert-file=/var/lib/kubernetes/kubernetes.pem \
  --tls-private-key-file=/var/lib/kubernetes/kubernetes-key.pem \
  --v=2

Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target" > /etc/systemd/system/kubelet.service'

changed to this:

sudo sh -c 'echo "[Unit]
Description=Kubernetes Kubelet
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=docker.service
Requires=docker.service

[Service]
ExecStart=/usr/bin/kubelet \\
  --allow-privileged=true \\
  --api-servers=https://10.240.0.20:6443,https://10.240.0.21:6443,https://10.240.0.22:6443 \\
  --cloud-provider= \\
  --cluster-dns=10.32.0.10 \\
  --cluster-domain=cluster.local \\
  --configure-cbr0=true \\
  --container-runtime=docker \\
  --docker=unix:///var/run/docker.sock \\
  --network-plugin=kubenet \\
  --kubeconfig=/var/lib/kubelet/kubeconfig \\
  --reconcile-cidr=true \\
  --serialize-image-pulls=false \\
  --tls-cert-file=/var/lib/kubernetes/kubernetes.pem \\
  --tls-private-key-file=/var/lib/kubernetes/kubernetes-key.pem \\
  --v=2

Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target" > /etc/systemd/system/kubelet.service'
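
The difference between the two versions comes down to how the shell treats a backslash at the end of a line inside double quotes. A minimal sketch follows, using printf rather than echo to sidestep echo's own escape handling, and a shortened two-line stand-in for the unit file (the variable names are illustrative, not from the report). It only shows what ends up in the written text; it does not by itself explain the registration failure.

```shell
#!/bin/sh
# Inside double quotes, backslash + newline is a line continuation:
# the shell removes both characters and joins the two lines.
single=$(printf '%s\n' "ExecStart=/usr/bin/kubelet \
  --v=2")

# A doubled backslash yields one literal backslash and the newline survives,
# so the written file keeps its continuation lines.
double=$(printf '%s\n' "ExecStart=/usr/bin/kubelet \\
  --v=2")

# A here-document with a quoted delimiter passes the text through untouched,
# so single backslashes can be written as they should appear in the file.
heredoc=$(cat <<'EOF'
ExecStart=/usr/bin/kubelet \
  --v=2
EOF
)

printf 'single backslash:\n%s\n\n' "$single"
printf 'double backslash:\n%s\n\n' "$double"
printf 'quoted here-document:\n%s\n' "$heredoc"
```

With a single backslash the flags collapse onto one ExecStart line; with a doubled backslash, or a quoted here-document, the file keeps one flag per line.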

GH Relative Links Broken

Hello,

The links at the bottom of the page do not include the repo name.

https://github.com/kelseyhightower/docs/10-cleanup.md

Should be:

https://github.com/kelseyhightower/kubernetes-the-hard-way/blob/master/docs/01-infrastructure.md

I have also seen

https://github.com/kelseyhightower/kubernetes-the-hard-way/docs/01-infrastructure.md

appear to work. I suspect this is GH more than the MD. For the record I am at Heathrow trying this.

Regards.

Ron

Is kubectl required on the worker nodes?

In the 05-kubernetes-worker guide we install kubectl, but as far as I understand, only kubelet, kube-proxy and docker are required.
As I see it, all communication happens between the kube-apiserver on the Kubernetes master and the kubelet agent, as shown in the architecture diagram below:

Can anyone clarify what kubectl is for on the worker nodes?

Thanks in advance,

Quota Exceeded on #CPU using Google Cloud on Free Trial

So, I currently have nothing but the $300 credit for Google Cloud over 60 days. That's all good. However, since https://github.com/kelseyhightower/kubernetes-the-hard-way/blob/master/docs/01-infrastructure.md requires creating 9 machines, I run into issues because I'm only allowed to create 8 machines.

Therefore I get

ERROR: (gcloud.compute.instances.create) Some requests did not succeed:
 - Quota 'CPUS' exceeded. Limit: 8.0

I'm just running through the tutorial so I thought I'd let you know. I figure that I can do the tutorial with only 2 workers anyhow -- probably not as compelling, but that's ok.
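
The arithmetic behind the quota error can be sketched as follows. It assumes every instance uses a machine type with one vCPU, which the report does not state; the variable names are illustrative.

```shell
#!/bin/sh
# Assumption: each instance has 1 vCPU (the report does not give the machine type).
instances=9            # machines the infrastructure lab creates, per the report
vcpus_per_instance=1
quota=8                # 'CPUS' limit from the error message

needed=$((instances * vcpus_per_instance))
if [ "$needed" -gt "$quota" ]; then
  echo "need $needed vCPUs, quota is $quota: over by $((needed - quota))"
fi

# Largest instance count that still fits within the quota.
max_instances=$((quota / vcpus_per_instance))
echo "max instances within quota: $max_instances"
```

Under that assumption, dropping one instance (for example, running two workers instead of three) is enough to fit the limit.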

Could not resolve host: $NODE_PUBLIC_IP

Everything worked brilliantly for me until the last steps in section 9, the smoketest.

➤ export NODE_PUBLIC_IP=$(gcloud compute instances describe worker0 \
  --format 'value(networkInterfaces[0].accessConfigs[0].natIP)')
➤ curl http://$\{NODE_PUBLIC_IP\}:$\{NODE_PORT\}
curl: (6) Could not resolve host: $NODE_PUBLIC_IP

I'm using zsh but tried it with bash and got the same result.
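
One likely cause, judging from the curl command as pasted: the braces are backslash-escaped, so the shell never performs parameter expansion and hands the text to curl literally. A minimal sketch with made-up values (203.0.113.10 is a documentation address, not taken from the report):

```shell
#!/bin/sh
NODE_PUBLIC_IP=203.0.113.10   # hypothetical value
NODE_PORT=31972               # hypothetical value

# Escaped braces: no expansion happens, the URL keeps the literal text.
escaped=$(printf '%s' http://$\{NODE_PUBLIC_IP\}:$\{NODE_PORT\})

# Plain braces: normal parameter expansion.
expanded=$(printf '%s' "http://${NODE_PUBLIC_IP}:${NODE_PORT}")

printf 'escaped:  %s\nexpanded: %s\n' "$escaped" "$expanded"
```

In the escaped form curl receives a URL whose host part is not an address, which would be consistent with the "Could not resolve host" message.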

Add an Upgrading Kubernetes lab.

Document how to upgrade a Kubernetes cluster.

kube-proxy on workers is configured with a single controller

I see that kube-proxy on each worker is configured with a single controller (controller0 in this case) as the API server. Is this by design, or should we point worker0 at controller0, worker1 at controller1 and worker2 at controller2?

Please clarify.

Thank you.

cfssl can be installed with homebrew on OSX

Do you want this repository to eventually become an automated tool, or to recommend ways to make the steps easier? If the latter, CFSSL can be installed on OS X with:

brew install cfssl

Scheduler token

Hi there,

this is the best Kubernetes deep dive tutorial I've ever seen. Thank you!

Can someone please explain where the scheduler token is used, or whether it is unnecessary?

smoke test fails

When I run the curl command against any of my workers, it fails. How do I go about troubleshooting? I'm able to kubectl exec into the nginx containers so I know nginx is running.

$ curl http://23.236.52.15:31972
curl: (7) Failed to connect to 23.236.52.15 port 31972: Connection refused

$ gcloud compute instances list
NAME         ZONE           MACHINE_TYPE   PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP      STATUS
controller0  us-central1-c  n1-standard-1               10.240.0.20  104.197.2.56     RUNNING
controller1  us-central1-c  n1-standard-1               10.240.0.21  104.197.12.79    RUNNING
controller2  us-central1-c  n1-standard-1               10.240.0.22  104.197.232.99   RUNNING
etcd0        us-central1-c  n1-standard-1               10.240.0.10  104.197.137.217  RUNNING
etcd1        us-central1-c  n1-standard-1               10.240.0.11  104.154.102.172  RUNNING
etcd2        us-central1-c  n1-standard-1               10.240.0.12  104.197.183.71   RUNNING
worker0      us-central1-c  n1-standard-1               10.240.0.30  130.211.169.241  RUNNING
worker1      us-central1-c  n1-standard-1               10.240.0.31  23.236.52.15     RUNNING
worker2      us-central1-c  n1-standard-1               10.240.0.32  130.211.181.18   RUNNING

$ gcloud compute firewall-rules list
NAME                         NETWORK     SRC_RANGES      RULES                         SRC_TAGS  TARGET_TAGS
default-allow-icmp           default     0.0.0.0/0       icmp
default-allow-internal       default     10.128.0.0/9    tcp:0-65535,udp:0-65535,icmp
default-allow-rdp            default     0.0.0.0/0       tcp:3389
default-allow-ssh            default     0.0.0.0/0       tcp:22
kubernetes-allow-api-server  kubernetes  0.0.0.0/0       tcp:6443
kubernetes-allow-healthz     kubernetes  130.211.0.0/22  tcp:8080
kubernetes-allow-icmp        kubernetes  0.0.0.0/0       icmp
kubernetes-allow-internal    kubernetes  10.240.0.0/24   tcp:0-65535,udp:0-65535,icmp
kubernetes-allow-rdp         kubernetes  0.0.0.0/0       tcp:3389
kubernetes-allow-ssh         kubernetes  0.0.0.0/0       tcp:22
kubernetes-nginx-service     kubernetes  0.0.0.0/0       tcp:31972

Document Dashboard add-on

Worker nodes not connecting to controllers

The cluster is running, but when I run kubectl get nodes I get nothing.

ubuntu@ip-10-240-0-32:~$ sudo systemctl status kubelet --no-pager -l
sudo: unable to resolve host ip-10-240-0-32
● kubelet.service - Kubernetes Kubelet
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2016-09-12 15:25:18 UTC; 29min ago
     Docs: https://github.com/GoogleCloudPlatform/kubernetes
 Main PID: 4826 (kubelet)
   CGroup: /system.slice/kubelet.service
           ├─4826 /usr/bin/kubelet --allow-privileged=true --api-servers=https://10.240.0.20:6443,https://10.240.0.21:6443,https://10.240.0.22:6443 --cloud-provider= --cluster-dns=10.32.0.10 --cluster-domain=cluster.local --configure-cbr0=true --container-runtime=docker --docker=unix:///var/run/docker.sock --network-plugin=kubenet --kubeconfig=/var/lib/kubelet/kubeconfig --reconcile-cidr=true --serialize-image-pulls=false --tls-cert-file=/var/lib/kubernetes/kubernetes.pem --tls-private-key-file=/var/lib/kubernetes/kubernetes-key.pem --v=2
           └─4926 journalctl -k -f

Sep 12 15:54:07 ip-10-240-0-32 kubelet[4826]: E0912 15:54:07.792577    4826 kubelet.go:1193] Unable to construct api.Node object for kubelet: can't get ip address of node ip-10-240-0-32: lookup ip-10-240-0-32 on 10.240.0.2:53: no such host
Sep 12 15:54:08 ip-10-240-0-32 kubelet[4826]: I0912 15:54:08.966674    4826 kubelet.go:2510] skipping pod synchronization - [Kubenet does not have netConfig. This is most likely due to lack of PodCIDR]
Sep 12 15:54:13 ip-10-240-0-32 kubelet[4826]: I0912 15:54:13.967726    4826 kubelet.go:2510] skipping pod synchronization - [Kubenet does not have netConfig. This is most likely due to lack of PodCIDR]
Sep 12 15:54:14 ip-10-240-0-32 kubelet[4826]: E0912 15:54:14.795108    4826 kubelet.go:1193] Unable to construct api.Node object for kubelet: can't get ip address of node ip-10-240-0-32: lookup ip-10-240-0-32 on 10.240.0.2:53: no such host
Sep 12 15:54:18 ip-10-240-0-32 kubelet[4826]: I0912 15:54:18.972411    4826 kubelet.go:2510] skipping pod synchronization - [Kubenet does not have netConfig. This is most likely due to lack of PodCIDR]
Sep 12 15:54:21 ip-10-240-0-32 kubelet[4826]: E0912 15:54:21.829771    4826 kubelet.go:1193] Unable to construct api.Node object for kubelet: can't get ip address of node ip-10-240-0-32: lookup ip-10-240-0-32 on 10.240.0.2:53: no such host
Sep 12 15:54:23 ip-10-240-0-32 kubelet[4826]: I0912 15:54:23.973540    4826 kubelet.go:2510] skipping pod synchronization - [Kubenet does not have netConfig. This is most likely due to lack of PodCIDR]
Sep 12 15:54:28 ip-10-240-0-32 kubelet[4826]: E0912 15:54:28.832401    4826 kubelet.go:1193] Unable to construct api.Node object for kubelet: can't get ip address of node ip-10-240-0-32: lookup ip-10-240-0-32 on 10.240.0.2:53: no such host
Sep 12 15:54:28 ip-10-240-0-32 kubelet[4826]: I0912 15:54:28.974251    4826 kubelet.go:2510] skipping pod synchronization - [Kubenet does not have netConfig. This is most likely due to lack of PodCIDR]
Sep 12 15:54:33 ip-10-240-0-32 kubelet[4826]: I0912 15:54:33.974961    4826 kubelet.go:2510] skipping pod synchronization - [Kubenet does not have netConfig. This is most likely due to lack of PodCIDR]
ubuntu@ip-10-240-0-32:~$

ubuntu@ip-10-240-0-32:~$ sudo systemctl status kube-proxy --no-pager -l
sudo: unable to resolve host ip-10-240-0-32
● kube-proxy.service - Kubernetes Kube Proxy
   Loaded: loaded (/etc/systemd/system/kube-proxy.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2016-09-12 15:25:18 UTC; 29min ago
     Docs: https://github.com/GoogleCloudPlatform/kubernetes
 Main PID: 4891 (kube-proxy)
   CGroup: /system.slice/kube-proxy.service
           └─4891 /usr/bin/kube-proxy --master=https://10.240.0.20:6443 --kubeconfig=/var/lib/kubelet/kubeconfig --proxy-mode=iptables --v=2

Sep 12 15:25:18 ip-10-240-0-32 kube-proxy[4891]: W0912 15:25:18.818446    4891 server.go:416] Failed to retrieve node info: nodes "ip-10-240-0-32" not found
Sep 12 15:25:18 ip-10-240-0-32 kube-proxy[4891]: W0912 15:25:18.818512    4891 proxier.go:227] invalid nodeIP, initialize kube-proxy with 127.0.0.1 as nodeIP
Sep 12 15:25:18 ip-10-240-0-32 kube-proxy[4891]: I0912 15:25:18.818524    4891 server.go:214] Tearing down userspace rules.
Sep 12 15:25:18 ip-10-240-0-32 kube-proxy[4891]: I0912 15:25:18.831959    4891 conntrack.go:40] Setting nf_conntrack_max to 32768
Sep 12 15:25:18 ip-10-240-0-32 kube-proxy[4891]: I0912 15:25:18.833675    4891 conntrack.go:57] Setting conntrack hashsize to 8192
Sep 12 15:25:18 ip-10-240-0-32 kube-proxy[4891]: I0912 15:25:18.833906    4891 conntrack.go:62] Setting nf_conntrack_tcp_timeout_established to 86400
Sep 12 15:25:18 ip-10-240-0-32 kube-proxy[4891]: I0912 15:25:18.837957    4891 proxier.go:440] Adding new service "default/kubernetes:https" at 10.32.0.1:443/TCP
Sep 12 15:25:18 ip-10-240-0-32 kube-proxy[4891]: I0912 15:25:18.838227    4891 proxier.go:674] Not syncing iptables until Services and Endpoints have been received from master
Sep 12 15:25:18 ip-10-240-0-32 kube-proxy[4891]: I0912 15:25:18.839653    4891 proxier.go:516] Setting endpoints for "default/kubernetes:https" to [10.240.0.20:6443 10.240.0.21:6443 10.240.0.22:6443]
Sep 12 15:35:48 ip-10-240-0-32 systemd[1]: Started Kubernetes Kube Proxy.
ubuntu@ip-10-240-0-32:~$

Something is wrong with the hostname setup of my worker node it seems. It can't get a valid IP for itself.
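
The kubelet log above reports that a lookup of the node's own hostname returns "no such host". A small check along these lines, run on the worker, would confirm whether the system resolver can map the hostname to an address. This is a sketch: the function name `resolves` is illustrative, and it assumes `getent` is available on the node.

```shell
#!/bin/sh
# resolves NAME: succeed if the system resolver (local files and DNS, as
# configured on the host) can map NAME to an address.
resolves() {
  getent hosts "$1" >/dev/null 2>&1
}

node_name=$(uname -n)
if resolves "$node_name"; then
  echo "$node_name resolves"
else
  echo "$node_name does not resolve" >&2
fi
```

If the name does not resolve, a common remedy is an /etc/hosts entry mapping the hostname to the node's IP address; whether that is the right fix for this particular setup is an assumption.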

I do not see CIDR addresses assigned to my worker nodes, even though they show up as Ready.

Hi!
I do not see CIDR addresses assigned to my worker nodes, even though they show up as Ready.

Here is some information about the setup:

Using your guide.
6 VMs on a Fedora 23 KVM/libvirt host, all running 64-bit Fedora 24.
2 x etcd, 2 x controllers, 2 x workers
Kubernetes 1.3.6
Docker 1.11.2

Here is the problem: I do not see a cbr0 bridge up on my worker nodes, i.e. no CIDR IPs are assigned to the nodes :(

[root@worker1 ~]# ifconfig
docker0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 172.17.0.1  netmask 255.255.0.0  broadcast 0.0.0.0
        ether 02:42:4a:68:e4:2f  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.240.0.31  netmask 255.255.255.0  broadcast 10.240.0.255
        inet6 fe80::5054:ff:fe03:a650  prefixlen 64  scopeid 0x20<link>
        ether 52:54:00:03:a6:50  txqueuelen 1000  (Ethernet)
        RX packets 2028  bytes 649017 (633.8 KiB)
        RX errors 0  dropped 6  overruns 0  frame 0
        TX packets 1689  bytes 262384 (256.2 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1  (Local Loopback)
        RX packets 20  bytes 1592 (1.5 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 20  bytes 1592 (1.5 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

[root@worker1 ~]#

Here is some more information:

[root@controller1 ~]# kubectl get componentstatuses
NAME                 STATUS    MESSAGE              ERROR
scheduler            Healthy   ok                   
controller-manager   Healthy   ok                   
etcd-0               Healthy   {"health": "true"}   
etcd-1               Healthy   {"health": "true"}   


[root@controller1 ~]# kubectl get nodes
NAME                  STATUS    AGE
worker1.example.com   Ready     47s
worker2.example.com   Ready     41s
[root@controller1 ~]#

[root@worker1 ~]# systemctl status kubelet -l
● kubelet.service - Kubernetes Kubelet
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2016-09-14 13:16:13 CEST; 9min ago
     Docs: https://github.com/GoogleCloudPlatform/kubernetes
 Main PID: 4744 (kubelet)
    Tasks: 11 (limit: 512)
   CGroup: /system.slice/kubelet.service
           ├─4744 /usr/bin/kubelet --allow-privileged=true --api-servers=https://10.240.0.21:6443,https://10.240.0.22:6443 --cloud-provider= --c
           └─4781 journalctl -k -f

4744 kubelet.go:2510] skipping pod synchronization - [Kubenet does not have netConfig. This is most likely due to lack of PodCIDR]
4744 kubelet.go:2510] skipping pod synchronization - [Kubenet does not have netConfig. This is most likely due to lack of PodCIDR]
4744 kubelet.go:2510] skipping pod synchronization - [Kubenet does not have netConfig. This is most likely due to lack of PodCIDR]
4744 kubelet.go:2924] Recording NodeReady event message for node worker1.example.com

On the controller I see that the kube-controller-manager has some problems:

[root@controller2 ~]# systemctl status kube-controller-manager.service -l
● kube-controller-manager.service - Kubernetes Controller Manager
   Loaded: loaded (/etc/systemd/system/kube-controller-manager.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2016-09-14 13:07:10 CEST; 25min ago
     Docs: https://github.com/GoogleCloudPlatform/kubernetes
 Main PID: 550 (kube-controller)
    Tasks: 5 (limit: 512)
   CGroup: /system.slice/kube-controller-manager.service
           └─550 /usr/bin/kube-controller-manager --allocate-node-cidrs=true --cluster-cidr=10.200.0.0/16 --cluster-name=kubernetes --leader-ele

. . .
13:10:52.772513     550 nodecontroller.go:534] NodeController is entering network segmentation mode.
13:10:52.772630     550 event.go:216] Event(api.ObjectReference{Kind:"Node", Namespace:"", Name:"worker2.example.com", UID:"worker2.example.
13:10:57.775051     550 nodecontroller.go:534] NodeController is entering network segmentation mode.
13:11:02.777334     550 nodecontroller.go:534] NodeController is entering network segmentation mode.
13:11:07.781592     550 nodecontroller.go:534] NodeController is entering network segmentation mode.
13:11:12.784489     550 nodecontroller.go:534] NodeController is entering network segmentation mode.
13:11:17.787018     550 nodecontroller.go:539] NodeController exited network segmentation mode.
13:17:36.729147     550 request.go:347] Field selector: v1 - serviceaccounts - metadata.name - default: need to check if this is versioned correctly.
13:25:32.730591     550 request.go:347] Field selector: v1 - serviceaccounts - metadata.name - default: need to check if this is versioned correctly.

Thanks for great work!

Change Cert directory from /var/run to anything else.

/var/run gets cleaned out after reboots.

Minor typo in kubernetes-the-hard-way/docs/06-kubectl.md

In the section - Configure Kubectl

/var/run/kubernetes/token.csv on the controller nodes

chAng3m3,admin,admin

Should be

/var/lib/kubernetes/token.csv on the controller nodes

chAng3m3,admin,admin

Setup Kubernetes in a bare-metal data center

Hi,
I also read your book, Kubernetes Up and Running. Both kubernetes-the-hard-way and the book only give examples for GCE.

It would be very helpful if you could cover setting up Kubernetes on bare-metal, multi-host servers.

kube-proxy and kubelet have different master values

If the kubelet has a comma-separated list of masters for api-servers, should kube-proxy have the same value for master?

Add some narrative in infra setup section

Why are we not using the IP of the load balanced Kubernetes Control plane in some of the config files?

First, thank you Kelsey for a great HowTo! Very much appreciated! Indeed!

In https://github.com/kelseyhightower/kubernetes-the-hard-way/blob/master/docs/05-kubernetes-worker.md, you create a file /var/lib/kubelet/kubeconfig. You are using a fixed IP of one of the controllers in that file, such as

. . . 
clusters:
- cluster:
    certificate-authority: /var/lib/kubernetes/ca.pem
    server: https://10.240.0.21:6443
  name: kubernetes
. . .

Shouldn't we be able to use the IP of the Kubernetes load balancer, which you set up at the end of the previous step, "Bootstrapping an H/A Kubernetes Control Plane"? I am not sure why we should ask a worker node to access just one controller and not the IP address of the Kubernetes control plane.

The same is observed when you create a systemd file for kube-proxy, in which you specify a fixed IP of only one of the controllers as 'master':

. . . 
[Service]
ExecStart=/usr/bin/kube-proxy \
  --master=https://10.240.0.21:6443 \
  --kubeconfig=/var/lib/kubelet/kubeconfig \
  --proxy-mode=iptables \
  --v=2
. . .

Ideally we should be using the IP of the load balancer which is put in front of the Kubernetes Control Plane? Or is there something wrong with my understanding? Or is there a limitation why we can't use that (load balancer IP) here?
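
What the question proposes can be sketched by generating the kubeconfig cluster entry from a single server URL, so that switching between one controller and the load balancer is a one-line change. The function name `cluster_stanza`, the variable `KUBERNETES_PUBLIC_ADDRESS` and the address 203.0.113.20 are hypothetical and stand in for whatever address the load balancer actually has.

```shell
#!/bin/sh
# cluster_stanza SERVER_URL: print the kubeconfig cluster entry for SERVER_URL.
cluster_stanza() {
  cat <<EOF
clusters:
- cluster:
    certificate-authority: /var/lib/kubernetes/ca.pem
    server: $1
  name: kubernetes
EOF
}

# Hypothetical address of the load balancer in front of the controllers.
KUBERNETES_PUBLIC_ADDRESS=203.0.113.20

cluster_stanza "https://10.240.0.21:6443"                    # one controller, as quoted above
cluster_stanza "https://${KUBERNETES_PUBLIC_ADDRESS}:6443"   # load balancer, as proposed
```

Pointing clients at the load balancer can only pass TLS verification if the API server certificate lists that address among its subject alternative names.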

instances failing checks in elb

I'm trying to run a smallish application on my workers.

It's a simple service and deployment, which I'm creating via kubectl.

The service and deployment get created fine, and the ELB is created fine as well (using type: LoadBalancer).

The problem is, I'm seeing at least one of my instances dropping from the elb with the following:

Instance has failed at least the UnhealthyThreshold number of health checks consecutively

I've ripped out the workers 3 times and rebuilt them, and every time I see this.

Any ideas?

Creating an ELB - pending state

Hi,

So, having followed the AWS tutorial, I have everything working, which is great!

The problem now is that I'd like to run my application on the workers, but it seems that I cannot create the service load balancer.

My service is something like the following:

apiVersion: v1
kind: Service
metadata:
  name: cms
spec:
  ports:
    - 
      name: http
      port: 80
      protocol: TCP
      targetPort: http
    - 
      name: https
      port: 443
      protocol: TCP
      targetPort: https  
  selector:
    name: cms
  type: LoadBalancer

When I check the service, I see the following:

NAME         CLUSTER-IP   EXTERNAL-IP   PORT(S)          AGE
cms          10.32.0.74   <pending>     80/TCP,443/TCP   10m

I was hoping that the EXTERNAL-IP would be an AWS ELB, but it's just stuck in this state.

Any ideas?

Document DNS add-on

Add AWS Cloud Provider Integration docs

Add a logging lab

I know that this guide is to setup a minimum working Kubernetes cluster but I feel that some information about sending logs to a centralized logging service would really help. There is a link to the official documentation about this subject but it revolves around specifying a flag at cluster creation and then some magic is supposed to happen in the background... Also, this would help people wanting to configure it for an existing cluster, instead of having to start over.

Maybe a lab on how to configure the cluster to send app and kube logs to an ELK cluster? ELK seems to be the most widely used solution along with fluentd.

Thanks.

port-forward fails, socat not found

I followed all steps through the smoke test. I've tested several tasks/tutorials, and everything has been working. But when I tried a port forwarding exercise, I got this:

kubectl port-forward lobsters-2273945462-s4dr5 8080:3000
Forwarding from 127.0.0.1:8080 -> 3000
Forwarding from [::1]:8080 -> 3000
...
curl localhost:8080
...
E1003 15:41:30.093639 23963 portforward.go:327] an error occurred forwarding 8080 -> 3000: error forwarding port 3000 to pod lobsters-2273945462-s4dr5_default, uid : unable to do port forwarding: socat not found.

So I went to the worker node and installed socat.
Try again.
It works.
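
A quick pre-check along these lines, run on each worker node, would catch the missing dependency before trying port-forward. It is a sketch; the helper name `have_cmd` is illustrative.

```shell
#!/bin/sh
# have_cmd NAME: succeed if NAME is found on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if have_cmd socat; then
  echo "socat found: $(command -v socat)"
else
  echo "socat missing; install it with the node's package manager" >&2
fi
```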

rkt

It would be nice to include rkt in these docs as an alternative to docker.

Google Cloud Platform free account limited to 8 compute instances

The workshop uses 9 instances but the GCP free account is limited to 8. While not a big deal, it might improve UX to change example to use 8 instances to accommodate the limits on the GCE free account...or perhaps just add a note in the beginning acknowledging this.

DNS does not work from pods ... something is missing ...

Hi again! So I tried to see if DNS works from the pods. Below is my work log, which also includes a solution:

First, the default nginx image does not have any DNS client tools in it.

[kamran@kworkhorse ~]$ kubectl exec nginx-2032906785-00uoo -i -t -- "bash"

root@nginx-2032906785-00uoo:/# nslookup kubernetes
bash: nslookup: command not found
root@nginx-2032906785-00uoo:/#

I tried to install nslookup or dig, but apt-get update fails. Apparently it was not able to resolve any names:

root@nginx-2032906785-00uoo:/# apt-get update
Err http://httpredir.debian.org jessie InRelease                               

Err http://httpredir.debian.org jessie-updates InRelease                       

Err http://security.debian.org jessie/updates InRelease                        

Err http://nginx.org jessie InRelease                                          

Err http://httpredir.debian.org jessie Release.gpg                             
  Could not resolve 'httpredir.debian.org'
Err http://security.debian.org jessie/updates Release.gpg
  Could not resolve 'security.debian.org'
Err http://nginx.org jessie Release.gpg
  Could not resolve 'nginx.org'
Err http://httpredir.debian.org jessie-updates Release.gpg
  Could not resolve 'httpredir.debian.org'
Reading package lists... Done
W: Failed to fetch http://httpredir.debian.org/debian/dists/jessie/InRelease  

W: Failed to fetch http://httpredir.debian.org/debian/dists/jessie-updates/InRelease  

W: Failed to fetch http://security.debian.org/dists/jessie/updates/InRelease  

W: Failed to fetch http://nginx.org/packages/mainline/debian/dists/jessie/InRelease  

W: Failed to fetch http://httpredir.debian.org/debian/dists/jessie/Release.gpg  Could not resolve 'httpredir.debian.org'

W: Failed to fetch http://httpredir.debian.org/debian/dists/jessie-updates/Release.gpg  Could not resolve 'httpredir.debian.org'

W: Failed to fetch http://security.debian.org/dists/jessie/updates/Release.gpg  Could not resolve 'security.debian.org'

W: Failed to fetch http://nginx.org/packages/mainline/debian/dists/jessie/Release.gpg  Could not resolve 'nginx.org'

W: Some index files failed to download. They have been ignored, or old ones used instead.
root@nginx-2032906785-00uoo:/#

Though the resolv.conf file on the nginx pod has the DNS entry and looks like this:

root@nginx-2032906785-00uoo:/# cat /etc/resolv.conf 
search default.svc.cluster.local svc.cluster.local cluster.local c.learn-kubernetes-1289.internal google.internal
nameserver 10.32.0.10
options ndots:5
root@nginx-2032906785-00uoo:/#

I have a custom image I use for such troubleshooting. I used that and found out that the DNS server is not reachable.

[kamran@kworkhorse ~]$ kubectl run centos-multitool --image=kamranazeem/centos-multitool --replicas=1
deployment "centos-multitool" created

[kamran@kworkhorse ~]$ kubectl get pods
NAME                                READY     STATUS    RESTARTS   AGE
centos-multitool-3822887632-pwlr1   1/1       Running   0          11s
nginx-2032906785-00uoo              1/1       Running   0          20h
nginx-2032906785-4drom              1/1       Running   0          20h
nginx-2032906785-el26y              1/1       Running   0          20h


[kamran@kworkhorse ~]$ kubectl exec centos-multitool-3822887632-pwlr1  -i -t -- "bash"

[root@centos-multitool-3822887632-pwlr1 /]# dig yahoo.com

; <<>> DiG 9.9.4-RedHat-9.9.4-29.el7_2.3 <<>> yahoo.com
;; global options: +cmd
;; connection timed out; no servers could be reached

[root@centos-multitool-3822887632-pwlr1 /]# cat /etc/resolv.conf 
search default.svc.cluster.local svc.cluster.local cluster.local c.learn-kubernetes-1289.internal google.internal
nameserver 10.32.0.10
options ndots:5

[root@centos-multitool-3822887632-pwlr1 /]# dig yahoo.com @10.32.0.10

; <<>> DiG 9.9.4-RedHat-9.9.4-29.el7_2.3 <<>> yahoo.com @10.32.0.10
;; global options: +cmd
;; connection timed out; no servers could be reached
[root@centos-multitool-3822887632-pwlr1 /]#

I checked the name resolution from a worker, using cluster DNS, and it works:

kamran@worker3:~$ dig yahoo.com @10.32.0.10

; <<>> DiG 9.10.3-P4-Ubuntu <<>> yahoo.com @10.32.0.10
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 53713
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;yahoo.com.         IN  A

;; ANSWER SECTION:
yahoo.com.      1799    IN  A   98.139.183.24
yahoo.com.      1799    IN  A   98.138.253.109
yahoo.com.      1799    IN  A   206.190.36.45

;; Query time: 37 msec
;; SERVER: 10.32.0.10#53(10.32.0.10)
;; WHEN: Fri Jul 15 10:13:46 UTC 2016
;; MSG SIZE  rcvd: 86

kamran@worker3:~$

So maybe a firewall rule is preventing the pods from reaching the DNS service? I created a (very open) firewall rule to allow DNS traffic:

[kamran@kworkhorse ~]$ gcloud compute firewall-rules create kubernetes-allow-dns \
>   --allow tcp:53,udp:53 \
>   --network kubernetes \
>   --source-ranges 0.0.0.0/0
Created [https://www.googleapis.com/compute/v1/projects/learn-kubernetes-1289/global/firewalls/kubernetes-allow-dns].
NAME                  NETWORK     SRC_RANGES  RULES          SRC_TAGS  TARGET_TAGS
kubernetes-allow-dns  kubernetes  0.0.0.0/0   tcp:53,udp:53
[kamran@kworkhorse ~]$

And now it works!

[kamran@kworkhorse ~]$ kubectl exec centos-multitool-3822887632-pwlr1  -i -t -- "bash"
[root@centos-multitool-3822887632-pwlr1 /]# dig yahoo.com

; <<>> DiG 9.9.4-RedHat-9.9.4-29.el7_2.3 <<>> yahoo.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 5006
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;yahoo.com.         IN  A

;; ANSWER SECTION:
yahoo.com.      1249    IN  A   206.190.36.45
yahoo.com.      1249    IN  A   98.138.253.109
yahoo.com.      1249    IN  A   98.139.183.24

;; Query time: 3 msec
;; SERVER: 10.32.0.10#53(10.32.0.10)
;; WHEN: Fri Jul 15 10:22:56 UTC 2016
;; MSG SIZE  rcvd: 86

[root@centos-multitool-3822887632-pwlr1 /]#

[root@centos-multitool-3822887632-pwlr1 /]# dig kubernetes.default.svc.cluster.local                   

; <<>> DiG 9.9.4-RedHat-9.9.4-29.el7_2.3 <<>> kubernetes.default.svc.cluster.local
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 61700
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;kubernetes.default.svc.cluster.local. IN A

;; ANSWER SECTION:
kubernetes.default.svc.cluster.local. 13 IN A   10.32.0.1

;; Query time: 2 msec
;; SERVER: 10.32.0.10#53(10.32.0.10)
;; WHEN: Fri Jul 15 10:23:48 UTC 2016
;; MSG SIZE  rcvd: 81

[root@centos-multitool-3822887632-pwlr1 /]#

I then tightened it a bit by recreating the firewall rule for the pod network only:

[kamran@kworkhorse ~]$ gcloud compute firewall-rules delete kubernetes-allow-dns   
The following firewalls will be deleted:
 - [kubernetes-allow-dns]

Do you want to continue (Y/n)?  y

Deleted [https://www.googleapis.com/compute/v1/projects/learn-kubernetes-1289/global/firewalls/kubernetes-allow-dns].

[kamran@kworkhorse ~]$

[kamran@kworkhorse ~]$ gcloud compute firewall-rules create kubernetes-allow-dns   --allow tcp:53,udp:53   --network kubernetes   --source-ranges 10.200.0.0/16
Created [https://www.googleapis.com/compute/v1/projects/learn-kubernetes-1289/global/firewalls/kubernetes-allow-dns].
NAME                  NETWORK     SRC_RANGES     RULES          SRC_TAGS  TARGET_TAGS
kubernetes-allow-dns  kubernetes  10.200.0.0/16  tcp:53,udp:53
[kamran@kworkhorse ~]$
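
One plausible reading of why the narrower rule is still enough: firewall rules match on the packet's source address, and pod traffic leaves with a source in the pod network rather than in the node subnet. The sketch below only illustrates that matching logic. The `kubernetes-allow-internal` range is taken from the rule listings elsewhere in these threads; the two sample addresses and the helper name `rules_admitting` are assumptions for illustration.

```python
import ipaddress


def rules_admitting(source_ip, rules):
    """Return the names of the rules whose source range contains source_ip."""
    addr = ipaddress.ip_address(source_ip)
    return [name for name, cidr in rules if addr in ipaddress.ip_network(cidr)]


rules = [
    ("kubernetes-allow-internal", "10.240.0.0/24"),  # node subnet
    ("kubernetes-allow-dns", "10.200.0.0/16"),       # pod network, as created above
]

# Hypothetical source addresses: one node, one pod.
print(rules_admitting("10.240.0.30", rules))
print(rules_admitting("10.200.1.2", rules))
```

Under these assumptions a query sent from a worker is covered by the internal rule, while a query sent from a pod is covered only once a rule with a 10.200.0.0/16 source range exists, which would match the behaviour seen before and after the rule was added.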

Let's see whether DNS still works from a pod!

[kamran@kworkhorse ~]$ kubectl exec centos-multitool-3822887632-pwlr1  -i -t -- "bash"
[root@centos-multitool-3822887632-pwlr1 /]# dig yahoo.com 

; <<>> DiG 9.9.4-RedHat-9.9.4-29.el7_2.3 <<>> yahoo.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 43623
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;yahoo.com.         IN  A

;; ANSWER SECTION:
yahoo.com.      968 IN  A   98.139.183.24
yahoo.com.      968 IN  A   206.190.36.45
yahoo.com.      968 IN  A   98.138.253.109

;; Query time: 4 msec
;; SERVER: 10.32.0.10#53(10.32.0.10)
;; WHEN: Fri Jul 15 10:27:37 UTC 2016
;; MSG SIZE  rcvd: 86

[root@centos-multitool-3822887632-pwlr1 /]# dig kubernetes.default.svc.cluster.local      

; <<>> DiG 9.9.4-RedHat-9.9.4-29.el7_2.3 <<>> kubernetes.default.svc.cluster.local
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 52286
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;kubernetes.default.svc.cluster.local. IN A

;; ANSWER SECTION:
kubernetes.default.svc.cluster.local. 21 IN A   10.32.0.1

;; Query time: 2 msec
;; SERVER: 10.32.0.10#53(10.32.0.10)
;; WHEN: Fri Jul 15 10:27:40 UTC 2016
;; MSG SIZE  rcvd: 81

[root@centos-multitool-3822887632-pwlr1 /]#

It works! Hurray!

kubectl connection error

$ kubectl get componentstatuses
Unable to connect to the server: dial tcp 104.197.211.40:6443: i/o timeout

$ ping 104.197.211.40
PING 104.197.211.40 (104.197.211.40): 56 data bytes
64 bytes from 104.197.211.40: icmp_seq=0 ttl=56 time=0.615 ms
64 bytes from 104.197.211.40: icmp_seq=1 ttl=56 time=0.475 ms
^C--- 104.197.211.40 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.475/0.545/0.615/0.070 ms

I cannot connect to k8s from a remote client, but the same commands work from any of the VMs.
I am using the browser-based Cloud Shell.

Below is my setup:

$ gcloud compute instances list
NAME         ZONE           MACHINE_TYPE   PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP      STATUS
controller0  us-central1-a  n1-standard-1               10.240.0.20  23.251.156.238   RUNNING
controller1  us-central1-a  n1-standard-1               10.240.0.21  162.222.183.196  RUNNING
controller2  us-central1-a  n1-standard-1               10.240.0.22  104.154.108.92   RUNNING
etcd0        us-central1-a  n1-standard-1               10.240.0.10  104.197.211.42   RUNNING
etcd1        us-central1-a  n1-standard-1               10.240.0.11  130.211.175.0    RUNNING
etcd2        us-central1-a  n1-standard-1               10.240.0.12  146.148.96.80    RUNNING
worker0      us-central1-a  n1-standard-1               10.240.0.30  146.148.63.39    RUNNING
worker1      us-central1-a  n1-standard-1               10.240.0.31  104.197.65.1     RUNNING

==============================================================
$ gcloud compute firewall-rules list --filter "network=kubernetes"
NAME                         NETWORK     SRC_RANGES      RULES                         SRC_TAGS  TARGET_TAGS
kubernetes-allow-api-server  kubernetes  0.0.0.0/0       tcp:6443
kubernetes-allow-healthz     kubernetes  130.211.0.0/22  tcp:8080
kubernetes-allow-icmp        kubernetes  0.0.0.0/0       icmp
kubernetes-allow-internal    kubernetes  10.240.0.0/24   tcp:0-65535,udp:0-65535,icmp
kubernetes-allow-rdp         kubernetes  0.0.0.0/0       tcp:3389
kubernetes-allow-ssh         kubernetes  0.0.0.0/0       tcp:22

==============================================================
$ gcloud compute addresses list kubernetes
NAME        REGION       ADDRESS         STATUS
kubernetes  us-central1  104.197.211.40  IN_USE

==============================================================
$ gcloud compute forwarding-rules list
NAME             REGION       IP_ADDRESS      IP_PROTOCOL  TARGET
kubernetes-rule  us-central1  104.197.211.40  TCP          us-central1/targetPools/kubernetes-pool

AWS tags

Hey @kelseyhightower, really awesome guide, deep dive and straight to the point!

Question: the Kubernetes "AWS under the hood" doc says that you should apply a tag called "KubernetesCluster" to each AWS resource, which Kubernetes then uses for filtering - https://github.com/kubernetes/kubernetes/blob/release-1.4/docs/design/aws_under_the_hood.md#tagging

Does your setup avoid the need for this tag? Or should we add it if we want more integration with AWS?

Thanks!

Could you add a lab to support the type LoadBalancer for exposing services to the Internet?

By following the instructions in "kubernetes-the-hard-way", I was able to set up the cluster and run deployments. However, I couldn't get the applications exposed externally on AWS. I think it would be helpful to add a lab for that.

Thank you.

Add note about upgrading to latest gcloud

The syntax for gcloud compute forwarding-rules create changed in newer versions.

Generate one TLS certificate pair for each node.

Access to apiservers fails from any container

From kube-dns:

E0713 20:38:46.910179       1 reflector.go:216] pkg/dns/dns.go:154: Failed to list *api.Endpoints: Get https://10.32.0.1/api/v1/endpoints?resourceVersion=0: dial tcp 10.32.0.1:443: i/o timeout
E0713 20:38:46.910318       1 reflector.go:216] pkg/dns/dns.go:155: Failed to list *api.Service: Get https://10.32.0.1/api/v1/services?resourceVersion=0: dial tcp 10.32.0.1:443: i/o timeout
I0713 20:38:46.910438       1 dns.go:172] Ignoring error while waiting for service default/kubernetes: Get https://10.32.0.1/api/v1/namespaces/default/services/kubernetes: dial tcp 10.32.0.1:443: i/o timeout. Sleeping 1s before retrying.

From random pod:

$ kubectl run curl --image=tutum/curl:latest yes
deployment "curl" created

$ kubectl exec -it curl-14193874-k3esx bash

# curl https://10.32.0.1
curl: (7) Failed to connect to 10.32.0.1 port 443: Connection timed out

# curl https://10.240.0.20:6443
curl: (7) Failed to connect to 10.240.0.20 port 6443: Connection timed out

Routes:

$ gcloud compute routes list
NAME                            NETWORK     DEST_RANGE     NEXT_HOP                  PRIORITY
default-route-4e889f0c670475f9  default     0.0.0.0/0      default-internet-gateway  1000
default-route-6aca960aaec01e82  kubernetes  0.0.0.0/0      default-internet-gateway  1000
default-route-b3f6dfcccde00a9b  kubernetes  10.240.0.0/24                            1000
default-route-fe4c4def818434df  default     10.240.0.0/16                            1000
kubernetes-route-10-200-0-0-24  kubernetes  10.200.0.0/24  10.240.0.30               1000
kubernetes-route-10-200-1-0-24  kubernetes  10.200.1.0/24  10.240.0.31               1000
kubernetes-route-10-200-2-0-24  kubernetes  10.200.2.0/24  10.240.0.32               1000

Firewall rules:

$ gcloud compute firewall-rules list
NAME                         NETWORK     SRC_RANGES      RULES                         SRC_TAGS  TARGET_TAGS
default-allow-icmp           default     0.0.0.0/0       icmp
default-allow-internal       default     10.240.0.0/16   tcp:1-65535,udp:1-65535,icmp
default-allow-ssh            default     0.0.0.0/0       tcp:22
kubernetes-allow-api-server  kubernetes  0.0.0.0/0       tcp:6443
kubernetes-allow-healthz     kubernetes  130.211.0.0/22  tcp:8080
kubernetes-allow-icmp        kubernetes  0.0.0.0/0       icmp
kubernetes-allow-internal    kubernetes  10.240.0.0/24   tcp:0-65535,udp:0-65535,icmp
kubernetes-allow-rdp         kubernetes  0.0.0.0/0       tcp:3389
kubernetes-allow-ssh         kubernetes  0.0.0.0/0       tcp:22

What am I missing?

Ordering?

This is so great! Can I make a suggestion? Can you put numbers before each file similar to rc.d to show people the order things are used in?

Kubedns not working

I'm doing the setup on AWS. I went through the guide a second time, thinking I had made a mistake somewhere. Either there's a flaw in the guide or I'm making the same mistake again.

After creating the kubedns deployment I get:

NAME                           READY     STATUS    RESTARTS   AGE
kube-dns-v19-965658604-7xmto   2/3       Running   1          2m
kube-dns-v19-965658604-cb362   2/3       Running   1          2m

After a little while it's clear things won't get better:

NAME                           READY     STATUS             RESTARTS   AGE
kube-dns-v19-965658604-7xmto   2/3       CrashLoopBackOff   6          11m
kube-dns-v19-965658604-cb362   2/3       CrashLoopBackOff   6          11m

Docker logs of kube-dns container:

I0927 10:07:03.811974       1 server.go:94] Using https://10.32.0.1:443 for kubernetes master, kubernetes API: <nil>
I0927 10:07:03.814879       1 server.go:99] v1.4.0-alpha.2.1652+c69e3d32a29cfa-dirty
I0927 10:07:03.814974       1 server.go:101] FLAG: --alsologtostderr="false"
I0927 10:07:03.815017       1 server.go:101] FLAG: --dns-port="10053"
I0927 10:07:03.815051       1 server.go:101] FLAG: --domain="cluster.local."
I0927 10:07:03.815100       1 server.go:101] FLAG: --federations=""
I0927 10:07:03.815132       1 server.go:101] FLAG: --healthz-port="8081"
I0927 10:07:03.815161       1 server.go:101] FLAG: --kube-master-url=""
I0927 10:07:03.815190       1 server.go:101] FLAG: --kubecfg-file=""
I0927 10:07:03.815220       1 server.go:101] FLAG: --log-backtrace-at=":0"
I0927 10:07:03.815272       1 server.go:101] FLAG: --log-dir=""
I0927 10:07:03.815302       1 server.go:101] FLAG: --log-flush-frequency="5s"
I0927 10:07:03.815331       1 server.go:101] FLAG: --logtostderr="true"
I0927 10:07:03.815360       1 server.go:101] FLAG: --stderrthreshold="2"
I0927 10:07:03.815404       1 server.go:101] FLAG: --v="0"
I0927 10:07:03.815435       1 server.go:101] FLAG: --version="false"
I0927 10:07:03.815466       1 server.go:101] FLAG: --vmodule=""
I0927 10:07:03.815531       1 server.go:138] Starting SkyDNS server. Listening on port:10053
I0927 10:07:03.815626       1 server.go:145] skydns: metrics enabled on : /metrics:
I0927 10:07:03.815666       1 dns.go:167] Waiting for service: default/kubernetes
I0927 10:07:03.816385       1 logs.go:41] skydns: ready for queries on cluster.local. for tcp://0.0.0.0:10053 [rcache 0]
I0927 10:07:03.816455       1 logs.go:41] skydns: ready for queries on cluster.local. for udp://0.0.0.0:10053 [rcache 0]
I0927 10:07:11.645924       1 logs.go:41] skydns: failure to forward request "read udp 10.200.1.2:37292->10.240.0.2:53: i/o timeout"
I0927 10:07:16.651339       1 logs.go:41] skydns: failure to forward request "read udp 10.200.1.2:44119->10.240.0.2:53: i/o timeout"
I0927 10:07:21.656743       1 logs.go:41] skydns: failure to forward request "read udp 10.200.1.2:51574->10.240.0.2:53: i/o timeout"
I0927 10:07:26.810289       1 logs.go:41] skydns: failure to forward request "read udp 10.200.1.2:44669->10.240.0.2:53: i/o timeout"
I0927 10:07:31.714975       1 logs.go:41] skydns: failure to forward request "read udp 10.200.1.2:55292->10.240.0.2:53: i/o timeout"
I0927 10:07:33.816486       1 dns.go:173] Ignoring error while waiting for service default/kubernetes: Get https://10.32.0.1:443/api/v1/namespaces/default/services/kubernetes: dial tcp 10.32.0.1:443: i/o timeout. Sleeping 1s before retrying.
E0927 10:07:33.817358       1 reflector.go:214] pkg/dns/dns.go:156: Failed to list *api.Service: Get https://10.32.0.1:443/api/v1/services?resourceVersion=0: dial tcp 10.32.0.1:443: i/o timeout
E0927 10:07:33.817416       1 reflector.go:214] pkg/dns/dns.go:155: Failed to list *api.Endpoints: Get https://10.32.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 10.32.0.1:443: i/o timeout
I0927 10:07:36.720983       1 logs.go:41] skydns: failure to forward request "read udp 10.200.1.2:49125->10.240.0.2:53: i/o timeout"
I0927 10:07:41.810468       1 logs.go:41] skydns: failure to forward request "read udp 10.200.1.2:35071->10.240.0.2:53: i/o timeout"
I0927 10:07:46.731315       1 logs.go:41] skydns: failure to forward request "read udp 10.200.1.2:54565->10.240.0.2:53: i/o timeout"

I'm learning Kubernetes via this guide, so I'm not sure what to look for. I tried checking connectivity from the host:

ubuntu@ip-10-240-0-32:~$ nc -v 10.32.0.1 443
Connection to 10.32.0.1 443 port [tcp/https] succeeded!

Any help is appreciated.
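
Note that the `nc` check above was run from the host (ip-10-240-0-32), whereas the timeouts in the log come from the pod address 10.200.1.2. A minimal sketch of the same kind of TCP check that could be run from inside a container (assuming the image has Python available) is shown below. The helper name `tcp_reachable` is made up for illustration, and the check covers TCP only, so it says nothing about the UDP forwarding failures to port 53.

```python
import socket


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and unreachable networks.
        return False


# Example, to be run from inside a pod rather than on the node:
#   tcp_reachable("10.32.0.1", 443)
```

If the call returns True on the node but False inside a pod, the problem is more likely in the path from the pod network to the service address than in the API server itself.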

Cannot view api via browser - Unauthorized

Having gone through the tutorial step by step, when I run kubectl cluster-info to list the cluster info, I see the following:

Kubernetes master is running at https://kubernetes-xxxxxxxx.eu-west-1.elb.amazonaws.com:6443

Going to that URL gives me a 401 Unauthorized.

I'm guessing it should either redirect me to the API list or ask for a basic auth user/password?
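
A 401 here usually just means that the request carried no credentials, which is what a plain browser visit does. As a rough sketch, assuming the cluster accepts bearer tokens, an authenticated request would carry an Authorization header like the one built below. The server URL, the token value and the helper name `build_api_request` are placeholders for illustration; the request is only constructed, not sent.

```python
import urllib.request


def build_api_request(server, token, path="/api"):
    """Build (but do not send) a GET request that carries a bearer token."""
    return urllib.request.Request(
        server.rstrip("/") + path,
        headers={"Authorization": f"Bearer {token}"},
    )


# Placeholder values, not taken from any real cluster.
req = build_api_request("https://kubernetes.example.invalid:6443", "PLACEHOLDER_TOKEN")
print(req.get_method(), req.full_url)
print(req.get_header("Authorization"))
```

Actually sending it would also need the cluster CA to be trusted, for example by passing `context=ssl.create_default_context(cafile="ca.pem")` to `urllib.request.urlopen`, where the `ca.pem` path is again an assumption.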

kube-dns: Ignoring error while waiting for service default/kubernetes: the server has asked for the client to provide credentials

After deploying kube-dns, the container generates this error message:

I1005 09:08:14.257581       1 server.go:94] Using https://10.32.0.1:443 for kubernetes master, kubernetes API: <nil>
I1005 09:08:14.258434       1 server.go:99] v1.4.0-alpha.2.1652+c69e3d32a29cfa-dirty
I1005 09:08:14.258502       1 server.go:101] FLAG: --alsologtostderr="false"
I1005 09:08:14.258518       1 server.go:101] FLAG: --dns-port="10053"
I1005 09:08:14.258525       1 server.go:101] FLAG: --domain="cluster.local."
I1005 09:08:14.258531       1 server.go:101] FLAG: --federations=""
I1005 09:08:14.258537       1 server.go:101] FLAG: --healthz-port="8081"
I1005 09:08:14.258542       1 server.go:101] FLAG: --kube-master-url=""
I1005 09:08:14.258547       1 server.go:101] FLAG: --kubecfg-file=""
I1005 09:08:14.258551       1 server.go:101] FLAG: --log-backtrace-at=":0"
I1005 09:08:14.258557       1 server.go:101] FLAG: --log-dir=""
I1005 09:08:14.258571       1 server.go:101] FLAG: --log-flush-frequency="5s"
I1005 09:08:14.258577       1 server.go:101] FLAG: --logtostderr="true"
I1005 09:08:14.258583       1 server.go:101] FLAG: --stderrthreshold="2"
I1005 09:08:14.258588       1 server.go:101] FLAG: --v="0"
I1005 09:08:14.258592       1 server.go:101] FLAG: --version="false"
I1005 09:08:14.258598       1 server.go:101] FLAG: --vmodule=""
I1005 09:08:14.258708       1 server.go:138] Starting SkyDNS server. Listening on port:10053
I1005 09:08:14.258775       1 server.go:145] skydns: metrics enabled on : /metrics:
I1005 09:08:14.258786       1 dns.go:167] Waiting for service: default/kubernetes
I1005 09:08:14.259118       1 logs.go:41] skydns: ready for queries on cluster.local. for tcp://0.0.0.0:10053 [rcache 0]
I1005 09:08:14.259128       1 logs.go:41] skydns: ready for queries on cluster.local. for udp://0.0.0.0:10053 [rcache 0]
I1005 09:08:14.459951       1 dns.go:173] Ignoring error while waiting for service default/kubernetes: the server has asked for the client to provide credentials (get services kubernetes). Sleeping 1s before retrying.
E1005 09:08:14.460066       1 reflector.go:214] pkg/dns/dns.go:155: Failed to list *api.Endpoints: the server has asked for the client to provide credentials (get endpoints)
E1005 09:08:14.460108       1 reflector.go:214] pkg/dns/dns.go:156: Failed to list *api.Service: the server has asked for the client to provide credentials (get services)

docker inspect of kube-dns container:
https://gist.github.com/r0bj/25474e6eba5a9fc5cc38ca63ba28d5e7

Get rid of NamespaceExists in admission control

NamespaceLifecycle is enough

Add instructions on creating a project and setting default region

Using CNI as overlay networking

Hi @kelseyhightower
Although I can see that Layer 3 routing is used in these labs to allow communication between pods running on different worker nodes, I was hoping to see CNI used as the underpinning networking technology. I would be interested to see how to use an overlay network through the CNI interface with the built-in plugins.

Thanks a lot for this amazing work!

Error from server: the server has asked for the client to provide credentials

I don't know what I'm doing wrong, but I can't use kubectl.

More info: cat .kube/config

apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: 
    MYCERT (ca.pem)
    server: https://23.251.140.185:6443
  name: kubernetes-the-hard-way
contexts:
- context:
    cluster: kubernetes-the-hard-way
    user: admin
  name: default-context
current-context: default-context
kind: Config
preferences: {}
users:
- name: admin
  user:
    token: MYPASSWORD

One more thing: I have created a new firewall rule so that I can access my cluster from outside my project.

Thanks.
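
This error is what the API server returns when it does not recognise the presented credentials. If the API server is using a static token file (an assumption; the thread does not say so), the token in the kubeconfig has to match an entry in that file exactly. Each entry has the form token,user,uid with an optional quoted group list. The sketch below checks a token against such a file's contents; the sample entries and the helper name `users_for_token` are made up for illustration.

```python
import csv
import io


def users_for_token(token_csv_text, token):
    """Return the user names whose static-token entry matches token."""
    matches = []
    for row in csv.reader(io.StringIO(token_csv_text)):
        # Each entry is: token,user,uid[,"group1,group2"]
        if len(row) >= 3 and row[0] == token:
            matches.append(row[1])
    return matches


# Hypothetical file contents, not from the thread.
sample = "example-token,admin,admin\nother-token,scheduler,scheduler\n"

print(users_for_token(sample, "example-token"))
print(users_for_token(sample, "MYPASSWORD"))
```

An empty result for the token in .kube/config would be consistent with the credentials error above, though other causes are possible.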

kube-proxy just uses one api-server

Hi,

I'm looking at the 05-kubernetes-worker page. In the kubelet systemd file, --api-servers lists the 3 already configured, but for kube-proxy just one master is set. Should the master be set to some externally load-balanced address, or does it not matter?

thanks

Kubernetes binaries or kubelet with manifests?

Great guide for everyone, well done.
This guide definitely explains in detail how to put all the pieces together. I wish I had had something like this a year ago :)

I know that this guide is just a walk through, but there are some details that might create confusion. I already have an automated deployment based on CoreOS and wanted to know if this is "the right" way to automate in production.

No need for flannel anymore? CNI will take care of that, right?
Is it OK to use the Kubernetes binaries as in this documentation, or is it better to use the CoreOS way (https://coreos.com/kubernetes/docs/latest/deploy-master.html), using kubelet with:

--config=/etc/kubernetes/manifests ?

Is it just me or is there something missing with the 10.32.0.0/24 services network?

I'm trying to figure out where this is routed.

I can only find it declared and then used in the skydns pod. When I try to deploy the pod I get:

$ kubectl create -f https://raw.githubusercontent.com/kelseyhightower/kubernetes-the-hard-way/master/skydns-svc.yaml
The Service "kube-dns" is invalid.
spec.clusterIP: Invalid value: "10.32.0.10": provided IP is not in the valid range

So close to having this working on AWS, just need to figure out this.

Cheers for your efforts so far!
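
The "not in the valid range" error above suggests that the requested clusterIP falls outside the range the API server was started with (its --service-cluster-ip-range flag). A minimal sketch of that containment check follows; the second range is a hypothetical misconfigured value, and the helper name `in_service_range` is made up for illustration.

```python
import ipaddress


def in_service_range(cluster_ip, service_cidr):
    """Return True if cluster_ip lies inside the service CIDR."""
    return ipaddress.ip_address(cluster_ip) in ipaddress.ip_network(service_cidr)


# Range named in the issue title.
print(in_service_range("10.32.0.10", "10.32.0.0/24"))
# Hypothetical range an API server might be running with instead.
print(in_service_range("10.32.0.10", "10.0.0.0/24"))
```

On the routing question: service addresses are normally virtual and are handled on each node by kube-proxy rather than by routes in the cloud network, so it is expected that no route for 10.32.0.0/24 shows up there.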