kairen / kube-ansible
Build a Kubernetes cluster via Ansible playbook. :wrench: :wrench: :wrench:
License: Apache License 2.0
Hello, I'm glad to have come across this project. I have a few small questions from setting it up and would appreciate your guidance, thanks.
In keepalived, the node with the highest priority value becomes the master and the rest act as backups, but in this project only one node has the smallest priority value in the config file.
Could you take a moment out of your busy schedule to give me some pointers? Much appreciated, thanks.
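For context, a minimal sketch of the election behaviour in question, assuming keepalived is rendered by the playbook onto each master (paths are assumptions):

# The vrrp_instance with the highest "priority" wins the VIP election; equal
# priorities fall back to comparing interface IPs, so all backups can share
# one value. Inspect what the playbook actually rendered on each master:
grep -R -A1 "priority" /etc/keepalived/ /etc/kubernetes/manifests/ 2>/dev/null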
Hello,
It's less an issue than a question: what is the best practice for upgrading a cluster, given that I used your playbook to deploy it?
Thank you
Hello,
I'm trying to use your code to set up an HA k8s cluster, but unfortunately I get this error:
fatal: [Master-k8s-2]: FAILED! => {"changed": false, "failed": true, "msg": "Unable to start service etcd: Job for etcd.service failed because a timeout was exceeded. See \"systemctl status etcd.service\" and \"journalctl -xe\" for details.\n"}
and when I check the logs I see: error: tls: bad certificate
Do you have any idea why?
Thank you for your reply
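For anyone hitting the same wall, a troubleshooting sketch (certificate paths are assumptions based on a typical kube-ansible layout): "tls: bad certificate" between etcd members usually means the peers were signed by different CAs, or a cert is missing this host's address.

# Check that the etcd server cert's SANs include this node's IP:
openssl x509 -in /etc/etcd/ssl/etcd.pem -noout -text | grep -A1 'Subject Alternative Name'
# The CA must be identical on every master:
md5sum /etc/etcd/ssl/etcd-ca.pem
# And the last etcd log lines often name the offending peer:
journalctl -u etcd --no-pager | tail -n 20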
Hi
When I use your repo to start a cluster everything works great, but I have one issue.
When the cert signing for the master node has to happen, the request gets created but stays in a pending state, while the script continues and completes.
When I log into the cluster, all the nodes are there except the master, the one whose cert signing request is pending.
I reset the cluster and started the playbook again, this time keeping an eye out for when the cert signing request was created, and then approved the request by hand; that fixed the issue, and I could see all nodes in the cluster.
What could cause the cert signing request to be stuck in pending so that I have to approve it by hand?
Thank you for the awesome work so far; any help will be greatly appreciated.
Cheers
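For reference, a minimal sketch of the manual workaround described above (the CSR name is a placeholder):

# List certificate signing requests and approve the pending one by hand:
kubectl get csr
kubectl certificate approve <csr-name>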
I'm getting this error while deploying:
TASK [k8s-setup : Copy Keepalived manifest and config files into cluster] ***********************************************************************************************************************************************************************************************************************************************************
Wednesday 02 October 2019 13:17:18 +0200 (0:00:00.666) 0:04:54.832 *****
fatal: [172.16.202.130]: FAILED! => {"changed": false, "msg": "AnsibleUndefinedVariable: 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'ansible_eth1'"}
I tried to fix this by adding ansible_eth1: eth1 to the group_vars, but now I'm getting this error:
TASK [k8s-setup : Copy Keepalived manifest and config files into cluster] ***********************************************************************************************************************************************************************************************************************************************************
Wednesday 02 October 2019 13:30:26 +0200 (0:00:00.679) 0:04:42.554 *****
fatal: [172.16.202.130]: FAILED! => {"changed": false, "msg": "AnsibleUndefinedVariable: 'ansible.parsing.yaml.objects.AnsibleUnicode object' has no attribute 'ipv4'"}
Please advise. Ansible used:
ansible 2.8.5
config file = /Users/tdeutsch/git/kube-ansible/ansible.cfg
configured module search path = ['/Users/tdeutsch/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /usr/local/Cellar/ansible/2.8.5/libexec/lib/python3.7/site-packages/ansible
executable location = /usr/local/bin/ansible
python version = 3.7.4 (default, Sep 7 2019, 18:27:02) [Clang 10.0.1 (clang-1001.0.46.4)]
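A note on the two errors above: defining ansible_eth1: eth1 in group_vars shadows the gathered fact (a dict with an ipv4 key) with a plain string, which is why the second run fails with "'AnsibleUnicode object' has no attribute 'ipv4'". A sketch for checking which interface facts each host really exposes (inventory path is an assumption):

# Show every ansible_eth* fact, and the default IPv4 fact, per host:
ansible -i inventory/hosts.ini all -m setup -a 'filter=ansible_eth*'
ansible -i inventory/hosts.ini all -m setup -a 'filter=ansible_default_ipv4'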
TASK [k8s-setup : Create kube-apiserver to kubelet RBAC] **************************************************************************************************************************************************
Tuesday 20 April 2021 20:30:08 +0800 (0:00:01.632) 0:03:44.899 *********
FAILED - RETRYING: Create kube-apiserver to kubelet RBAC (10 retries left).
FAILED - RETRYING: Create kube-apiserver to kubelet RBAC (9 retries left).
FAILED - RETRYING: Create kube-apiserver to kubelet RBAC (8 retries left).
FAILED - RETRYING: Create kube-apiserver to kubelet RBAC (7 retries left).
FAILED - RETRYING: Create kube-apiserver to kubelet RBAC (6 retries left).
FAILED - RETRYING: Create kube-apiserver to kubelet RBAC (5 retries left).
FAILED - RETRYING: Create kube-apiserver to kubelet RBAC (4 retries left).
FAILED - RETRYING: Create kube-apiserver to kubelet RBAC (3 retries left).
FAILED - RETRYING: Create kube-apiserver to kubelet RBAC (2 retries left).
FAILED - RETRYING: Create kube-apiserver to kubelet RBAC (1 retries left).
fatal: [10.85.246.115 -> 10.85.246.115]: FAILED! => {"attempts": 10, "changed": true, "cmd": ["/usr/local/bin/kubectl", "--kubeconfig=/etc/kubernetes/admin.conf", "apply", "-f", "/tmp/apiserver-to-kubelet-rbac.yml"], "delta": "0:00:30.385037", "end": "2021-04-20 20:36:06.896236", "msg": "non-zero return code", "rc": 1, "start": "2021-04-20 20:35:36.511199", "stderr": "Unable to connect to the server: dial tcp 10.85.247.115:6443: i/o timeout", "stderr_lines": ["Unable to connect to the server: dial tcp 10.85.247.115:6443: i/o timeout"], "stdout": "", "stdout_lines": []}
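One thing worth checking in the log above: the playbook is dialing 10.85.247.115:6443 while the host is 10.85.246.115, so the i/o timeout may simply be an unreachable VIP. A minimal sketch, run on the master itself:

# Is anything listening on 6443, and does the dialed address answer at all?
ss -tlnp | grep 6443
curl -k https://10.85.247.115:6443/healthz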
Hi, I hit the errors below. The first install succeeded; after resetting and installing again, I get:
fatal: [192.168.4.91 -> 192.168.4.91]: FAILED! => {"attempts": 10, "changed": true, "cmd": ["/usr/local/bin/kubectl", "--kubeconfig=/etc/kubernetes/admin.conf", "apply", "-f", "/tmp/apiserver-to-kubelet-rbac.yml"], "delta": "0:00:02.393257", "end": "2019-07-16 09:45:25.181932", "msg": "non-zero return code", "rc": 1, "start": "2019-07-16 09:45:22.788675", "stderr": "unable to recognize \"/tmp/apiserver-to-kubelet-rbac.yml\": Get https://192.168.4.91:8443/api?timeout=32s: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"kubernetes\")\nunable to recognize \"/tmp/apiserver-to-kubelet-rbac.yml\": Get https://192.168.4.91:8443/api?timeout=32s: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"kubernetes\")", "stderr_lines": ["unable to recognize \"/tmp/apiserver-to-kubelet-rbac.yml\": Get https://192.168.4.91:8443/api?timeout=32s: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"kubernetes\")", "unable to recognize \"/tmp/apiserver-to-kubelet-rbac.yml\": Get https://192.168.4.91:8443/api?timeout=32s: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"kubernetes\")"], "stdout": "", "stdout_lines": []}
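A sketch for the reinstall case, assuming standard file locations: "certificate signed by unknown authority" right after a reset usually means a kubeconfig from the first run is being used against a cluster re-bootstrapped with a fresh CA. Comparing the two fingerprints shows it quickly:

# CA embedded in the kubeconfig vs. CA actually served on the API port:
grep certificate-authority-data /etc/kubernetes/admin.conf | awk '{print $2}' | base64 -d | openssl x509 -noout -fingerprint
echo | openssl s_client -connect 192.168.4.91:8443 2>/dev/null | openssl x509 -noout -fingerprint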
kube-ansible can be upgraded without issue to kube_version: 1.14.7 in kube-ansible/inventory/group_vars/all.yml.
I cannot upgrade to (latest) stable, nor to kube_version: 1.15.X.
Setup completed with zero errors on CentOS 7 VMs running in VMware.
kubectl retrieves everything: get pods, services, describe, etc.
Running kubectl proxy and then opening
http://localhost:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/
gets me this:
Error: 'dial tcp 10.244.1.6:8443: connect: network is unreachable'
Trying to reach: 'https://10.244.1.6:8443/'
kubectl -n kube-system get po,svc
NAME                                       READY   STATUS    RESTARTS   AGE
pod/calico-node-7gvv8                      2/2     Running   4          6h
pod/calico-node-gqxz2                      2/2     Running   4          6h
pod/coredns-6d98b868c7-slk7m               1/1     Running   1          4h
pod/elasticsearch-logging-0                1/1     Running   1          3h
pod/fluentd-es-8tw67                       1/1     Running   1          3h
pod/fluentd-es-q6fj6                       1/1     Running   1          3h
pod/kibana-logging-56c4d58dcd-blcxh        1/1     Running   1          3h
pod/kube-proxy-7vqh5                       1/1     Running   3          6h
pod/kube-proxy-vb2fs                       1/1     Running   3          6h
pod/kubernetes-dashboard-6948bdb78-wb59s   1/1     Running   1          4h
pod/metrics-server-86bd9d7667-lwpg6        1/1     Running   1          4h

NAME                              TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                  AGE
service/calico-typha              ClusterIP   10.111.173.6     <none>        5473/TCP                 6h
service/elasticsearch-logging     ClusterIP   10.111.176.201   <none>        9200/TCP                 3h
service/kibana-logging            ClusterIP   10.111.132.176   <none>        5601/TCP                 3h
service/kube-controller-manager   ClusterIP   None             <none>        10252/TCP                1h
service/kube-dns                  ClusterIP   10.96.0.10       <none>        53/UDP,53/TCP,9153/TCP   6h
service/kube-scheduler            ClusterIP   None             <none>        10251/TCP                1h
service/kubelet                   ClusterIP   None             <none>        10250/TCP                3h
service/kubernetes-dashboard      ClusterIP   10.106.191.94    <none>        443/TCP                  6h
service/metrics-server            ClusterIP   10.103.82.224    <none>        443/TCP                  6h
What else can I check?
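A couple of checks that narrow this down, as a sketch: "network is unreachable" for a 10.244.x.x pod IP usually means the node serving the proxy request has no route into the pod network, which points at the CNI rather than the dashboard.

# Run on each node; calico should have installed routes for the pod CIDR:
ip route | grep 10.244
# The calico-node pods' logs often say why a route is missing:
kubectl -n kube-system get pods -o wide | grep calico-node
kubectl -n kube-system logs <calico-node-pod> -c calico-node --tail=20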
Since v1.6.0, Kubernetes uses RBAC as the default authorization mode.
Trying to run a Vagrant build based on CentOS 7; after a few minutes this error keeps appearing:
TASK [etcd : Copy Etcd conf template file] *************************************
fatal: [172.16.35.10]: FAILED! => {"changed": false, "failed": true, "msg": "AnsibleUndefinedVariable: https://{{ etcd_ip_addr }}:{{ etcd_peer_port }}: {% if etcd_iface != '' %}{{ hostvars[inventory_hostname]['ansible_' + etcd_iface].ipv4.address }} {%- else %}{{ hostvars[inventory_hostname].ansible_default_ipv4.address }}{% endif %}: 'dict object' has no attribute u'ansible_eth1'"}
Running the Vagrant build with Ubuntu as the OS works without errors.
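A plausible explanation, offered as a guess: the centos/7 Vagrant box may name its second NIC something other than eth1, so the template's hostvars[...]['ansible_' + etcd_iface] lookup finds nothing. A quick check (the machine name is an assumption):

# List the interfaces the box actually has, then point etcd_iface at one of them:
vagrant ssh k8s-m1 -c 'ip -o -4 addr show'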
Hello,
First of all, thank you for this amazing work.
I have a question about the ingress controller; I always get this:
curl -H "Host: game.domain1.io" 192.168.232.133
curl: (7) Failed connect to 192.168.232.133:80; Connection refused
thank you
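A sketch of two things worth checking (the namespace is an assumption; adjust to wherever the playbook deploys the controller): connection refused on port 80 suggests nothing is bound there, so the controller may be exposed on a NodePort rather than the host network.

# Where is the ingress controller actually listening?
kubectl -n ingress-nginx get pods,svc -o wide
# Then retry against the port the service exposes:
curl -H "Host: game.domain1.io" http://192.168.232.133:<nodePort>/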
Add a new Vagrant provider for Hyper-V; see: https://www.vagrantup.com/docs/hyperv/configuration.html
Is there a way to deploy the default environment (1b, 2w, 1c, 2048m...) without the "Start deploying? (y)" prompt?
I tried ./tools/setup --force, but that didn't work:
./tools/setup -f
./tools/func-vars: line 109: 2: unbound variable
I'd like to run kube-ansible from my sh script and deploy automatically.
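Until the script grows a real non-interactive flag, a workaround sketch, assuming the prompt reads from stdin:

# Feed the confirmation in from the calling script:
yes y | ./tools/setup
# or, for a single prompt:
echo y | ./tools/setup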
Masters and nodes each use a different kubelet config file. What if a master is also a node?
I think the best method is to untaint the masters and let pods run on them, as sketched below.
Thank you
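For concreteness, a minimal sketch of the untainting step (the node name is a placeholder):

# Remove the master NoSchedule taint so ordinary pods can be scheduled there:
kubectl taint nodes <master-name> node-role.kubernetes.io/master:NoSchedule-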
This feature adds CRI-O as a container runtime option; see https://github.com/kubernetes-incubator/cri-o.
Upon running the shell script ./hack/setup-vms, I get this error:
TASK [k8s-setup : Wait for Kubernetes core component start] ********************
Thursday 02 May 2019 14:12:40 -0400 (0:00:00.353) 0:03:15.687 **********
failed: [192.168.1.10] (item=6443) => {"changed": false, "elapsed": 300, "item": 6443, "msg": "Timeout when waiting for 127.0.0.1:6443"}
PLAY RECAP *********************************************************************
192.168.1.10 : ok=140 changed=77 unreachable=0 failed=1
192.168.1.12 : ok=28 changed=19 unreachable=0 failed=0
192.168.1.13 : ok=28 changed=19 unreachable=0 failed=0
k8s-setup : Wait for Kubernetes core component start ------------------ 423.15s
download/package : Downloading kubelet file ---------------------------- 81.90s
download/package : Downloading kubectl file ---------------------------- 28.81s
download/package : Downloading docker file ----------------------------- 15.01s
download/package : Downloading cni file --------------------------------- 4.37s
download/package : Downloading cfssl file ------------------------------- 4.25s
download/package : Extract docker file ---------------------------------- 3.69s
download/package : Downloading etcd file -------------------------------- 3.27s
cert : Generate Kubernetes SSL certificate json files ------------------- 2.05s
cert : Create Kubernetes SSL certificate key files ---------------------- 1.82s
common/copy-files : Write the content of files -------------------------- 1.67s
k8s-setup : Copy Kubernetes manifest and config files into cluster ------ 1.49s
cert : Delete unnecessary Kubernetes files ------------------------------ 1.48s
download/package : Downloading cfssljson file --------------------------- 1.43s
common/copy-files : Check the files already exists ---------------------- 1.41s
common/copy-files : Read the config files ------------------------------- 1.28s
download/package : Extract cni file ------------------------------------- 1.24s
download/package : Symlinks docker to /usr/local/bin -------------------- 1.16s
download/package : Extract etcd file ------------------------------------ 0.99s
etcd : Enable and restart etcd ------------------------------------------ 0.93s
Ansible failed to complete successfully. Any error output should be
visible above. Please fix these errors and try again.
==> k8s-n3: The previous process exited with exit code 1.
==> k8s-n2: The previous process exited with exit code 1.
==> k8s-n1: The previous process exited with exit code 1.
==> k8s-m1: The previous process exited with exit code 1.
TASK [cert : Create etcd SSL certificate key files] ***********************************************************************************************************************
Tuesday 12 February 2019 15:11:02 -0500 (0:00:00.194) 0:00:10.056 ******
fatal: [k8s-m1]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'ansible_default_ipv4'\n\nThe error appears to have been in '/home/mike/kube-ansible/roles/cert/tasks/create-etcd-certs.yml': line 49, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Create etcd SSL certificate key files\n ^ here\n"}
Hi there
During the Ansible run to create the cluster, I get stuck at "[k8s-setup : Wait for Kubernetes core component start]"
with error:
"11:14:24.197345 11914 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:464: Failed to list *v1.Node: Get https://:8443/api/v1/nodes?fieldSelector=metadata.name%3Dcptk8spoc01.capetown.fwslash.net&limit=500&resourceVersion=0: tls: either ServerName or InsecureSkipVerify must be specified in the tls.Config"
I also found that the Docker network is on 172.x.x.x, and so is our environment.
Any idea what this could be?
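On the 172.x.x.x overlap: Docker's default bridge uses 172.17.0.0/16, and if that collides with the surrounding network, traffic from the nodes can be blackholed. A sketch of pinning the bridge to a spare range (the range itself is only an example):

# Move docker0 off the conflicting subnet, then restart the daemon:
cat > /etc/docker/daemon.json <<'EOF'
{ "bip": "192.168.200.1/24" }
EOF
systemctl restart docker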
After installing k8s, I try to open the dashboard (https://server-ip:8443/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/) and get an error:
Error: 'dial tcp 10.244.0.24:8443: i/o timeout' Trying to reach: 'https://10.244.0.24:8443/'
Can you tell me what the problem might be?
Test cluster:
$ kubectl -n kube-system get po,svc
NAME                                       READY   STATUS    RESTARTS   AGE
pod/calico-node-cwpt9                      2/2     Running   0          14m
pod/coredns-896d9f87d-2nlfs                1/1     Running   0          15m
pod/coredns-autoscaler-58784cd54d-svlwv    1/1     Running   0          15m
pod/kube-proxy-lrj6c                       1/1     Running   0          14m
pod/kubernetes-dashboard-57df4db6b-6w49h   1/1     Running   0          10m
pod/metrics-server-68d85f76bb-7n2n5        1/1     Running   0          8m15s

NAME                              TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                  AGE
service/calico-typha              ClusterIP   10.106.84.71    <none>        5473/TCP                 14m
service/kube-controller-manager   ClusterIP   None            <none>        10252/TCP                8m37s
service/kube-dns                  ClusterIP   10.96.0.10      <none>        53/UDP,53/TCP,9153/TCP   15m
service/kube-scheduler            ClusterIP   None            <none>        10251/TCP                8m37s
service/kubelet                   ClusterIP   None            <none>        10250/TCP                6m37s
service/kubernetes-dashboard      ClusterIP   10.103.103.89   <none>        443/TCP                  10m
service/metrics-server            ClusterIP   10.106.81.208   <none>        443/TCP                  8m15s
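A quick probe that separates a CNI problem from a dashboard problem, as a sketch:

# Can the master reach the pod IP from the error at all?
kubectl -n kube-system get pods -o wide | grep dashboard
ping -c 2 10.244.0.24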
Hi, I am running a couple of applications that check the API server certificate, and your Ansible setup doesn't seem to include the right hostnames (the default/usual API hostname is kubernetes.default.svc, but your certificate seems to have only kubernetes.default). Where can I change that? Thanks
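For locating the mismatch, a sketch (the certificate path is an assumption; the SAN list itself would live in the playbook's cert role templates):

# Inspect which hostnames the generated apiserver cert actually carries:
openssl x509 -in /etc/kubernetes/pki/apiserver.pem -noout -text | grep -A1 'Subject Alternative Name'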
Hi
Firstly, thank you for the awesome work with this repo.
I am stuck in my process of creating a cluster.
The script runs fine until it gets to the "TASK [k8s-setup : Create kube-apiserver to kubelet RBAC]" part.
The error I'm getting:
"fatal: [172.24.49.100 -> 172.24.49.100]: FAILED! => {"attempts": 10, "changed": true, "cmd": ["/usr/local/bin/kubectl", "--kubeconfig=/etc/kubernetes/admin.conf", "apply", "-f", "/tmp/apiserver-to-kubelet-rbac.yml"], "delta": "0:00:00.074493", "end": "2018-09-06 14:30:49.764857", "msg": "non-zero return code", "rc": 1, "start": "2018-09-06 14:30:49.690364", "stderr": "unable to recognize "/tmp/apiserver-to-kubelet-rbac.yml": Get https://:8443/api?timeout=32s: tls: either ServerName or InsecureSkipVerify must be specified in the tls.Config\nunable to recognize "/tmp/apiserver-to-kubelet-rbac.yml": Get https://:8443/api?timeout=32s: tls: either ServerName or InsecureSkipVerify must be specified in the tls.Config", "stderr_lines": ["unable to recognize "/tmp/apiserver-to-kubelet-rbac.yml": Get https://:8443/api?timeout=32s: tls: either ServerName or InsecureSkipVerify must be specified in the tls.Config", "unable to recognize "/tmp/apiserver-to-kubelet-rbac.yml": Get https://:8443/api?timeout=32s: tls: either ServerName or InsecureSkipVerify must be specified in the tls.Config"], "stdout": "", "stdout_lines": []}"
I'm also getting this error whenever I try to use kubectl:
Unable to connect to the server: tls: either ServerName or InsecureSkipVerify must be specified in the tls.Config
So it seems that nothing can successfully invoke kubectl.
Please, any help will be greatly appreciated
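One observation on the log above, offered as a guess: the URL being dialed is https://:8443 with an empty host, which suggests the VIP/API address variable was never filled in when the kubeconfigs were rendered. A one-line check:

# What API endpoint do the kubeconfigs actually point at?
grep 'server:' /etc/kubernetes/admin.conf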
First of all, I love your project; it was quite helpful for getting k8s with hyperconverged Ceph running via Ansible. Do you have an example StorageClass based on this project? I have been trying to create one, but I'm not having any luck: it gives an error about incorrect secrets, and I have tried a number of different ID and secret names without success. I attached one version of the example below.
---
apiVersion: storage.k8s.io/v1beta1
kind: StorageClass
metadata:
  name: normal
provisioner: kubernetes.io/rbd
parameters:
  monitors: 10.10.73.3:6789,10.10.44.4:6789,10.10.39.4:6789
  adminId: admin
  adminSecretName: ceph-conf-combined
  adminSecretNamespace: ceph
  pool: kube
  userId: client
  userSecretName: ceph-client-key
Appreciate any help you can give.
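Two hedged hints, in case they help: the in-tree rbd provisioner maps adminId/userId to Ceph users by prepending "client.", so userId: client would resolve to client.client, which probably doesn't exist; and both secrets must be of type kubernetes.io/rbd with the raw key stored under "key". A sketch of creating them (the Ceph user names are assumptions):

# Admin secret in the namespace named by adminSecretNamespace:
kubectl -n ceph create secret generic ceph-conf-combined --type=kubernetes.io/rbd \
  --from-literal=key="$(ceph auth get-key client.admin)"
# User secret in the namespace of the PVCs:
kubectl create secret generic ceph-client-key --type=kubernetes.io/rbd \
  --from-literal=key="$(ceph auth get-key client.kube)"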
Hi there
This is an amazing Repository, thank you very much.
May I ask if you will ever add the option to use Weave as the CNI?
Cheers
Thanks for taking the time to put this together. I ran into this error:
fatal: [192.168.1.20]: FAILED! => {"msg": "Unexpected templating type error occurred on ({% if num_nodes >= 0 -%}{{ 90 * 1024 + num_nodes|int * nanny_memory_per_node}}Ki{%- else -%}90Mi{% endif -%}): '>=' not supported between instances of 'str' and 'int'"}
Ansible does not like the comparison in this statement:
"{% if num_nodes >= 0 -%}{{ 90 * 1024 + num_nodes|int * nanny_memory_per_node}}Ki{%- else -%}90Mi{% endif -%}"
I believe it should be like this:
"{% if num_nodes | int >= 0 -%}{{ 90 * 1024 + num_nodes|int * nanny_memory_per_node}}Ki{%- else -%}90Mi{% endif -%}"
Hello,
The etcd cluster does not work:
root@master1:~# etcdctl cluster-health
cluster may be unhealthy: failed to list members
Error: client: etcd cluster is unavailable or misconfigured; error #0: dial tcp 127.0.0.1:4001: getsockopt: connection refused
; error #1: malformed HTTP response "\x15\x03\x01\x00\x02\x02"
error #0: dial tcp 127.0.0.1:4001: getsockopt: connection refused
error #1: malformed HTTP response "\x15\x03\x01\x00\x02\x02"
Maxence
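For what it's worth, a sketch of querying the cluster with explicit endpoints and certs (paths assumed): the "\x15\x03\x01..." bytes are a TLS alert, i.e. etcdctl is speaking plain HTTP to its default http://127.0.0.1:4001 endpoint while etcd is serving TLS on 2379.

etcdctl --endpoints=https://127.0.0.1:2379 \
  --ca-file=/etc/etcd/ssl/etcd-ca.pem \
  --cert-file=/etc/etcd/ssl/etcd.pem \
  --key-file=/etc/etcd/ssl/etcd-key.pem \
  cluster-health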
Todo list:
Hello,
I tried this quick-start method with 1 k8s master and 2 k8s nodes. The Kubernetes dashboard isn't available even though I also ran the addons playbook.
root@master1:~# kubectl get nodes
NAME      STATUS    ROLES     AGE       VERSION
master1   Ready     master    33m       v1.9.1
node1     Ready     <none>    32m       v1.9.1
node2     Ready     master    33m       v1.9.1
root@master1:~# kubectl cluster-info
Kubernetes master is running at https://172.16.35.9:6443
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
root@master1:~# kubectl get pods --all-namespaces
NAMESPACE     NAME                                       READY   STATUS    RESTARTS   AGE
kube-system   calico-node-92nfd                          2/2     Running   0          32m
kube-system   calico-node-gb8gv                          2/2     Running   0          32m
kube-system   calico-node-nbll8                          2/2     Running   0          32m
kube-system   calico-policy-controller-fb675cfbc-s9kpl   1/1     Running   0          32m
kube-system   haproxy-master1                            1/1     Running   0          32m
kube-system   haproxy-node2                              1/1     Running   0          32m
kube-system   keepalived-master1                         1/1     Running   0          32m
kube-system   keepalived-node2                           1/1     Running   0          32m
kube-system   kube-apiserver-master1                     1/1     Running   0          32m
kube-system   kube-apiserver-node2                       1/1     Running   0          33m
kube-system   kube-controller-manager-master1            1/1     Running   0          32m
kube-system   kube-controller-manager-node2              1/1     Running   0          32m
kube-system   kube-dns-74bf5c4b94-vhcc5                  3/3     Running   0          32m
kube-system   kube-proxy-7mjrf                           1/1     Running   0          32m
kube-system   kube-proxy-hm2bj                           1/1     Running   0          32m
kube-system   kube-proxy-ncqfs                           1/1     Running   0          32m
kube-system   kube-scheduler-master1                     1/1     Running   0          32m
kube-system   kube-scheduler-node2                       1/1     Running   0          32m
Could you guide me as to where I am making a mistake, so that I can fix it and bring up the k8s dashboard?
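One detail visible in the output above: there is no kubernetes-dashboard pod in the list at all, so the addon apparently never deployed. A sketch of confirming that before digging further:

# Did the addons play create anything dashboard-related?
kubectl -n kube-system get deploy,svc | grep -i dashboard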
{
"kind": "Status",
"apiVersion": "v1",
"metadata": {
},
"status": "Failure",
"message": "services \"kibana-logging\" is forbidden: User \"system:anonymous\" cannot get services/proxy in the namespace \"kube-system\"",
"reason": "Forbidden",
"details": {
"name": "kibana-logging",
"kind": "services"
},
"code": 403
}
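The "system:anonymous" in this 403 means the request reached the apiserver without credentials. Going through kubectl proxy, which authenticates with your kubeconfig, avoids that; a sketch:

kubectl proxy &
curl http://127.0.0.1:8001/api/v1/namespaces/kube-system/services/kibana-logging/proxy/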
I think writing the setting to /etc/sysctl.d/keepalived.conf is better; it would be permanent.
I am curious about "try_message" when I use kube-ansible. My next issue will be about it.
Thank you.
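A sketch of the persistent variant (the exact sysctl key is an assumption; keepalived VIP setups commonly need non-local bind):

echo 'net.ipv4.ip_nonlocal_bind = 1' | sudo tee /etc/sysctl.d/keepalived.conf
sudo sysctl --system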
Hello,
When there are several masters, it is absolutely necessary to generate the SSL certificates on the host machine (localhost) and then send them to the different servers.
Otherwise we end up with completely different certificates on the masters, and it does not work.
Maxence
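A quick way to confirm whether the masters really ended up sharing one CA, as a sketch (hostnames and path are assumptions):

for h in master1 master2 master3; do
  ssh "$h" 'md5sum /etc/kubernetes/pki/ca.pem'   # all hashes must match
done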
On the master branch, my inventory/group_vars/all.yml is:
---
kube_version: 1.13.4
# Container runtime,
# Supported: docker, nvidia-docker, containerd.
container_runtime: docker
# Container network,
# Supported: calico, flannel.
cni_enable: true
container_network: calico
# Kubernetes HA extra variables.
vip_interface: ""
vip_address: 192.168.3.100
# Kubernetes extra addons
enable_ingress: true
enable_dashboard: false
enable_logging: true
enable_monitoring: false
enable_metric_server: true
grafana_user: "admin"
grafana_password: "admin"
error message:
fatal: [k8s-m1]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'namespace'\n\nThe error appears to have been in '/root/kube-ansible/roles/k8s-addon/tasks/main.yml': line 35, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Check {{ addon.name }} addon dependencies status\n ^ here\nWe could be wrong, but this one looks like it might be an issue with\nmissing quotes. Always quote template expression brackets when they\nstart a value. For instance:\n\n with_items:\n - {{ foo }}\n\nShould be written as:\n\n with_items:\n - \"{{ foo }}\"\n"}
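A hedged debugging step, since the failing addon item isn't named in the error: re-running with more verbosity prints the dict being iterated, which shows which addon entry is missing its namespace key (the inventory path is an assumption):

ansible-playbook -i inventory/hosts.ini cluster.yml -vvv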
Hello,
It's less an issue than a question: what is the best practice for upgrading a cluster, given that I used your playbook to deploy it?
@kairen
Thank you
TASK [k8s-setup : Set taint to effect NoSchedule] **********************************************************************************************************************
Monday 15 October 2018 13:09:52 +0200 (0:00:00.353) 0:01:34.555 ********
FAILED - RETRYING: Set taint to effect NoSchedule (10 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (10 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (10 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (9 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (9 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (9 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (8 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (8 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (8 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (7 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (7 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (7 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (6 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (6 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (6 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (5 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (5 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (5 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (4 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (4 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (4 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (3 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (3 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (3 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (2 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (2 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (2 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (1 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (1 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (1 retries left).
fatal: [foundery02]: FAILED! => {"attempts": 10, "changed": true, "cmd": ["/usr/local/bin/kubectl", "--kubeconfig=/etc/kubernetes/admin.conf", "taint", "nodes", "Foundery02", "node-role.kubernetes.io/master=:NoSchedule", "--overwrite"], "delta": "0:00:00.088008", "end": "2018-10-15 13:10:14.851340", "msg": "non-zero return code", "rc": 1, "start": "2018-10-15 13:10:14.763332", "stderr": "Error from server (NotFound): nodes "Foundery02" not found", "stderr_lines": ["Error from server (NotFound): nodes "Foundery02" not found"], "stdout": "", "stdout_lines": []}
...ignoring
fatal: [foundery03]: FAILED! => {"attempts": 10, "changed": true, "cmd": ["/usr/local/bin/kubectl", "--kubeconfig=/etc/kubernetes/admin.conf", "taint", "nodes", "Foundery03", "node-role.kubernetes.io/master=:NoSchedule", "--overwrite"], "delta": "0:00:00.085448", "end": "2018-10-15 13:10:15.293709", "msg": "non-zero return code", "rc": 1, "start": "2018-10-15 13:10:15.208261", "stderr": "Error from server (NotFound): nodes "Foundery03" not found", "stderr_lines": ["Error from server (NotFound): nodes "Foundery03" not found"], "stdout": "", "stdout_lines": []}
...ignoring
fatal: [foundery04]: FAILED! => {"attempts": 10, "changed": true, "cmd": ["/usr/local/bin/kubectl", "--kubeconfig=/etc/kubernetes/admin.conf", "taint", "nodes", "Foundery04", "node-role.kubernetes.io/master=:NoSchedule", "--overwrite"], "delta": "0:00:00.086700", "end": "2018-10-15 13:10:16.130539", "msg": "non-zero return code", "rc": 1, "start": "2018-10-15 13:10:16.043839", "stderr": "Error from server (NotFound): nodes "Foundery04" not found", "stderr_lines": ["Error from server (NotFound): nodes "Foundery04" not found"], "stdout": "", "stdout_lines": []}
...ignoring
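A note on the NotFound errors above: the kubelet lowercases the hostname when registering, so "Foundery02" does not exist as a Node object even though the host is up. A sketch:

# Compare the registered names, then taint using the lowercase form:
kubectl --kubeconfig=/etc/kubernetes/admin.conf get nodes
kubectl --kubeconfig=/etc/kubernetes/admin.conf taint nodes foundery02 \
  node-role.kubernetes.io/master=:NoSchedule --overwrite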
I installed the cluster with default parameters, but CoreDNS is not working properly.
Node OS: Ubuntu 18.04
kubectl logs coredns-7945bc8d5c-qp5s7 -n kube-system --tail=100 -f
......
2018/08/20 07:02:44 [ERROR] 2 smtp.office365.com. A: unreachable backend: read udp 10.244.2.27:34504->10.96.0.10:53: i/o timeout
2018/08/20 07:02:44 [ERROR] 2 smtp.office365.com. AAAA: unreachable backend: read udp 10.244.2.27:40291->10.96.0.10:53: i/o timeout
2018/08/20 07:02:44 [ERROR] 2 smtp.office365.com. AAAA: unreachable backend: read udp 10.244.2.27:42891->10.96.0.10:53: i/o timeout
2018/08/20 07:02:44 [ERROR] 2 elasticsearch-logging. AAAA: unreachable backend: read udp 10.244.2.27:55194->10.96.0.10:53: i/o timeout
2018/08/20 07:02:44 [ERROR] 2 smtp.office365.com. AAAA: unreachable backend: read udp 10.244.2.27:36631->10.96.0.10:53: i/o timeout
......
kubectl describe pod coredns-7945bc8d5c-qp5s7 -n kube-system
......
Events:
Type      Reason      Age                From               Message
----      ------      ----               ----               -------
Normal    Scheduled   55m                default-scheduler  Successfully assigned kube-system/coredns-7945bc8d5c-qp5s7 to k8s-n2
Normal    Pulled      54m                kubelet, k8s-n2    Container image "192.168.21.29:5000/coredns:1.1.3" already present on machine
Normal    Created     54m                kubelet, k8s-n2    Created container
Normal    Started     54m                kubelet, k8s-n2    Started container
Warning   Unhealthy   4m (x9 over 51m)   kubelet, k8s-n2    Liveness probe failed: Get http://10.244.2.27:8080/health: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
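The timeouts above are pod -> 10.96.0.10:53, i.e. the kube-dns ClusterIP, which points at the service path (kube-proxy/CNI) rather than CoreDNS itself. A sketch of checking that path:

# Does the service have endpoints, and do the nodes carry its rules?
kubectl -n kube-system get ep kube-dns
iptables-save | grep 10.96.0.10   # run on the affected node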
Hello,
The Ansible playbook cluster.yml cannot be run twice; otherwise the Kubernetes cluster ends up in a broken state.
Regards,
Maxence
Hi,
thanks for sharing this!
After setting up a working cluster, the individual VMs are not reboot-safe. After a reboot, kubelet would not start because swap was enabled: the playbook does a 'swapoff -a', which is not persistent.
One way of disabling swap persistently would be to create a systemd service that turns it off before starting kubelet, like this:
# /etc/systemd/system/swapoff.service
[Unit]
Description=Swapoff, kubelet requirement
After=network.target
Before=kubelet.service
[Service]
Type=oneshot
ExecStart=/sbin/swapoff -a
RemainAfterExit=true
[Install]
WantedBy=multi-user.target
Where would be the best place in the code to implement this? I would be happy to give a pull request a shot.
Cheers,
Thomas
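As an alternative to a dedicated unit, a sketch that needs no new service: make the existing 'swapoff -a' task permanent by also commenting out swap entries in /etc/fstab.

# Comment out every active swap line (GNU sed):
sudo sed -ri 's/^([^#].*\sswap\s)/#\1/' /etc/fstab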