kairen / kube-ansible
Build a Kubernetes cluster via Ansible playbook. :wrench: :wrench: :wrench:
License: Apache License 2.0
Hello, I'm glad to have come across this project. I have a few small questions from setting it up and would appreciate your guidance, thanks.
In keepalived, the node with the highest priority value becomes the master and the rest act as backups, but in this project only one node has the smallest priority value in the config file.
Could you take a moment out of your busy schedule to give me some pointers? Much appreciated, thanks.
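For context, a minimal sketch of the election behaviour in question, assuming keepalived is rendered by the playbook onto each master (paths are assumptions):

# The vrrp_instance with the highest "priority" wins the VIP election; equal
# priorities fall back to comparing interface IPs, so all backups can share
# one value. Inspect what the playbook actually rendered on each master:
grep -R -A1 "priority" /etc/keepalived/ /etc/kubernetes/manifests/ 2>/dev/null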
Hello,
It's less an issue than a question: what is the best practice for upgrading a cluster, given that I used your playbook to deploy it?
Thank you
Hello,
I'm trying to use your code to set up an HA k8s cluster, but unfortunately I get this error:
fatal: [Master-k8s-2]: FAILED! => {"changed": false, "failed": true, "msg": "Unable to start service etcd: Job for etcd.service failed because a timeout was exceeded. See \"systemctl status etcd.service\" and \"journalctl -xe\" for details.\n"}
and when I check the logs I see: error: tls: bad certificate
Do you have any idea why?
Thank you for your reply
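For anyone hitting the same wall, a troubleshooting sketch (certificate paths are assumptions based on a typical kube-ansible layout): "tls: bad certificate" between etcd members usually means the peers were signed by different CAs, or a cert is missing this host's address.

# Check that the etcd server cert's SANs include this node's IP:
openssl x509 -in /etc/etcd/ssl/etcd.pem -noout -text | grep -A1 'Subject Alternative Name'
# The CA must be identical on every master:
md5sum /etc/etcd/ssl/etcd-ca.pem
# And the last etcd log lines often name the offending peer:
journalctl -u etcd --no-pager | tail -n 20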
Hi
When I use your repo to start a cluster everything works great, but I have one issue.
When the cert signing for the master node has to happen, the request gets created but stays in a pending state, while the script continues and completes.
When I log into the cluster, all the nodes are there except the master, the one whose cert signing request is pending.
I reset the cluster and started the playbook again, this time keeping an eye out for when the cert signing request was created, and then approved the request by hand; that fixed the issue, and I could see all nodes in the cluster.
What could cause the cert signing request to be stuck in pending so that I have to approve it by hand?
Thank you for the awesome work so far; any help will be greatly appreciated.
Cheers
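For reference, a minimal sketch of the manual workaround described above (the CSR name is a placeholder):

# List certificate signing requests and approve the pending one by hand:
kubectl get csr
kubectl certificate approve <csr-name>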
I'm getting this error while deploying:
TASK [k8s-setup : Copy Keepalived manifest and config files into cluster] ***********************************************************************************************************************************************************************************************************************************************************
Wednesday 02 October 2019 13:17:18 +0200 (0:00:00.666) 0:04:54.832 *****
fatal: [172.16.202.130]: FAILED! => {"changed": false, "msg": "AnsibleUndefinedVariable: 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'ansible_eth1'"}
I tried to fix this by adding ansible_eth1: eth1 to the group_vars, but now I'm getting this error:
TASK [k8s-setup : Copy Keepalived manifest and config files into cluster] ***********************************************************************************************************************************************************************************************************************************************************
Wednesday 02 October 2019 13:30:26 +0200 (0:00:00.679) 0:04:42.554 *****
fatal: [172.16.202.130]: FAILED! => {"changed": false, "msg": "AnsibleUndefinedVariable: 'ansible.parsing.yaml.objects.AnsibleUnicode object' has no attribute 'ipv4'"}
Please advise. Ansible used:
ansible 2.8.5
config file = /Users/tdeutsch/git/kube-ansible/ansible.cfg
configured module search path = ['/Users/tdeutsch/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /usr/local/Cellar/ansible/2.8.5/libexec/lib/python3.7/site-packages/ansible
executable location = /usr/local/bin/ansible
python version = 3.7.4 (default, Sep 7 2019, 18:27:02) [Clang 10.0.1 (clang-1001.0.46.4)]
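A note on the two errors above: defining ansible_eth1: eth1 in group_vars shadows the gathered fact (a dict with an ipv4 key) with a plain string, which is why the second run fails with "'AnsibleUnicode object' has no attribute 'ipv4'". A sketch for checking which interface facts each host really exposes (inventory path is an assumption):

# Show every ansible_eth* fact, and the default IPv4 fact, per host:
ansible -i inventory/hosts.ini all -m setup -a 'filter=ansible_eth*'
ansible -i inventory/hosts.ini all -m setup -a 'filter=ansible_default_ipv4'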
TASK [k8s-setup : Create kube-apiserver to kubelet RBAC] **************************************************************************************************************************************************
Tuesday 20 April 2021 20:30:08 +0800 (0:00:01.632) 0:03:44.899 *********
FAILED - RETRYING: Create kube-apiserver to kubelet RBAC (10 retries left).
FAILED - RETRYING: Create kube-apiserver to kubelet RBAC (9 retries left).
FAILED - RETRYING: Create kube-apiserver to kubelet RBAC (8 retries left).
FAILED - RETRYING: Create kube-apiserver to kubelet RBAC (7 retries left).
FAILED - RETRYING: Create kube-apiserver to kubelet RBAC (6 retries left).
FAILED - RETRYING: Create kube-apiserver to kubelet RBAC (5 retries left).
FAILED - RETRYING: Create kube-apiserver to kubelet RBAC (4 retries left).
FAILED - RETRYING: Create kube-apiserver to kubelet RBAC (3 retries left).
FAILED - RETRYING: Create kube-apiserver to kubelet RBAC (2 retries left).
FAILED - RETRYING: Create kube-apiserver to kubelet RBAC (1 retries left).
fatal: [10.85.246.115 -> 10.85.246.115]: FAILED! => {"attempts": 10, "changed": true, "cmd": ["/usr/local/bin/kubectl", "--kubeconfig=/etc/kubernetes/admin.conf", "apply", "-f", "/tmp/apiserver-to-kubelet-rbac.yml"], "delta": "0:00:30.385037", "end": "2021-04-20 20:36:06.896236", "msg": "non-zero return code", "rc": 1, "start": "2021-04-20 20:35:36.511199", "stderr": "Unable to connect to the server: dial tcp 10.85.247.115:6443: i/o timeout", "stderr_lines": ["Unable to connect to the server: dial tcp 10.85.247.115:6443: i/o timeout"], "stdout": "", "stdout_lines": []}
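One thing worth checking in the log above: the playbook is dialing 10.85.247.115:6443 while the host is 10.85.246.115, so the i/o timeout may simply be an unreachable VIP. A minimal sketch, run on the master itself:

# Is anything listening on 6443, and does the dialed address answer at all?
ss -tlnp | grep 6443
curl -k https://10.85.247.115:6443/healthz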
Hi, I hit the errors below. The first install succeeded; after resetting and installing again, I get:
fatal: [192.168.4.91 -> 192.168.4.91]: FAILED! => {"attempts": 10, "changed": true, "cmd": ["/usr/local/bin/kubectl", "--kubeconfig=/etc/kubernetes/admin.conf", "apply", "-f", "/tmp/apiserver-to-kubelet-rbac.yml"], "delta": "0:00:02.393257", "end": "2019-07-16 09:45:25.181932", "msg": "non-zero return code", "rc": 1, "start": "2019-07-16 09:45:22.788675", "stderr": "unable to recognize \"/tmp/apiserver-to-kubelet-rbac.yml\": Get https://192.168.4.91:8443/api?timeout=32s: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"kubernetes\")\nunable to recognize \"/tmp/apiserver-to-kubelet-rbac.yml\": Get https://192.168.4.91:8443/api?timeout=32s: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"kubernetes\")", "stderr_lines": ["unable to recognize \"/tmp/apiserver-to-kubelet-rbac.yml\": Get https://192.168.4.91:8443/api?timeout=32s: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"kubernetes\")", "unable to recognize \"/tmp/apiserver-to-kubelet-rbac.yml\": Get https://192.168.4.91:8443/api?timeout=32s: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"kubernetes\")"], "stdout": "", "stdout_lines": []}
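A sketch for the reinstall case, assuming standard file locations: "certificate signed by unknown authority" right after a reset usually means a kubeconfig from the first run is being used against a cluster re-bootstrapped with a fresh CA. Comparing the two fingerprints shows it quickly:

# CA embedded in the kubeconfig vs. CA actually served on the API port:
grep certificate-authority-data /etc/kubernetes/admin.conf | awk '{print $2}' | base64 -d | openssl x509 -noout -fingerprint
echo | openssl s_client -connect 192.168.4.91:8443 2>/dev/null | openssl x509 -noout -fingerprint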
kube-ansible can be upgraded without issue to kube_version: 1.14.7 in kube-ansible/inventory/group_vars/all.yml.
I cannot upgrade to (latest) stable, nor to kube_version: 1.15.X.
Setup completed with zero errors on CentOS 7 VMs running in VMware.
kubectl retrieves everything: get pods, services, describe, etc.
Running kubectl proxy and then opening
http://localhost:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/
gets me this:
Error: 'dial tcp 10.244.1.6:8443: connect: network is unreachable'
Trying to reach: 'https://10.244.1.6:8443/'
kubectl -n kube-system get po,svc
NAME                                       READY   STATUS    RESTARTS   AGE
pod/calico-node-7gvv8                      2/2     Running   4          6h
pod/calico-node-gqxz2                      2/2     Running   4          6h
pod/coredns-6d98b868c7-slk7m               1/1     Running   1          4h
pod/elasticsearch-logging-0                1/1     Running   1          3h
pod/fluentd-es-8tw67                       1/1     Running   1          3h
pod/fluentd-es-q6fj6                       1/1     Running   1          3h
pod/kibana-logging-56c4d58dcd-blcxh        1/1     Running   1          3h
pod/kube-proxy-7vqh5                       1/1     Running   3          6h
pod/kube-proxy-vb2fs                       1/1     Running   3          6h
pod/kubernetes-dashboard-6948bdb78-wb59s   1/1     Running   1          4h
pod/metrics-server-86bd9d7667-lwpg6        1/1     Running   1          4h

NAME                              TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                  AGE
service/calico-typha              ClusterIP   10.111.173.6     <none>        5473/TCP                 6h
service/elasticsearch-logging     ClusterIP   10.111.176.201   <none>        9200/TCP                 3h
service/kibana-logging            ClusterIP   10.111.132.176   <none>        5601/TCP                 3h
service/kube-controller-manager   ClusterIP   None             <none>        10252/TCP                1h
service/kube-dns                  ClusterIP   10.96.0.10       <none>        53/UDP,53/TCP,9153/TCP   6h
service/kube-scheduler            ClusterIP   None             <none>        10251/TCP                1h
service/kubelet                   ClusterIP   None             <none>        10250/TCP                3h
service/kubernetes-dashboard      ClusterIP   10.106.191.94    <none>        443/TCP                  6h
service/metrics-server            ClusterIP   10.103.82.224    <none>        443/TCP                  6h
What else can I check?
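A couple of checks that narrow this down, as a sketch: "network is unreachable" for a 10.244.x.x pod IP usually means the node serving the proxy request has no route into the pod network, which points at the CNI rather than the dashboard.

# Run on each node; calico should have installed routes for the pod CIDR:
ip route | grep 10.244
# The calico-node pods' logs often say why a route is missing:
kubectl -n kube-system get pods -o wide | grep calico-node
kubectl -n kube-system logs <calico-node-pod> -c calico-node --tail=20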
Since v1.6.0, Kubernetes uses RBAC as the default authorization mode.
Trying to run a Vagrant build based on CentOS 7; after a few minutes this error keeps appearing:
TASK [etcd : Copy Etcd conf template file] *************************************
fatal: [172.16.35.10]: FAILED! => {"changed": false, "failed": true, "msg": "AnsibleUndefinedVariable: https://{{ etcd_ip_addr }}:{{ etcd_peer_port }}: {% if etcd_iface != '' %}{{ hostvars[inventory_hostname]['ansible_' + etcd_iface].ipv4.address }} {%- else %}{{ hostvars[inventory_hostname].ansible_default_ipv4.address }}{% endif %}: 'dict object' has no attribute u'ansible_eth1'"}
Running the Vagrant build with Ubuntu as the OS works without errors.
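A plausible explanation, offered as a guess: the centos/7 Vagrant box may name its second NIC something other than eth1, so the template's hostvars[...]['ansible_' + etcd_iface] lookup finds nothing. A quick check (the machine name is an assumption):

# List the interfaces the box actually has, then point etcd_iface at one of them:
vagrant ssh k8s-m1 -c 'ip -o -4 addr show'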
Hello,
First of all, thank you for this amazing work.
I have a question about the ingress controller; I always get this:
curl -H "Host: game.domain1.io" 192.168.232.133
curl: (7) Failed connect to 192.168.232.133:80; Connection refused
thank you
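A sketch of two things worth checking (the namespace is an assumption; adjust to wherever the playbook deploys the controller): connection refused on port 80 suggests nothing is bound there, so the controller may be exposed on a NodePort rather than the host network.

# Where is the ingress controller actually listening?
kubectl -n ingress-nginx get pods,svc -o wide
# Then retry against the port the service exposes:
curl -H "Host: game.domain1.io" http://192.168.232.133:<nodePort>/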
Add a new Vagrant provider for Hyper-V; see: https://www.vagrantup.com/docs/hyperv/configuration.html
Is there a way to deploy the default environment (1b, 2w, 1c, 2048m...) without the "Start deploying? (y)" prompt?
I tried ./tools/setup --force, but that didn't work:
./tools/setup -f
./tools/func-vars: line 109: 2: unbound variable
I'd like to run kube-ansible from my sh script and deploy automatically.
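Until the script grows a real non-interactive flag, a workaround sketch, assuming the prompt reads from stdin:

# Feed the confirmation in from the calling script:
yes y | ./tools/setup
# or, for a single prompt:
echo y | ./tools/setup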
Masters and nodes each use a different kubelet config file. What if a master is also a node?
I think the best method is to untaint the masters and let pods run on them, as sketched below.
Thank you
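For concreteness, a minimal sketch of the untainting step (the node name is a placeholder):

# Remove the master NoSchedule taint so ordinary pods can be scheduled there:
kubectl taint nodes <master-name> node-role.kubernetes.io/master:NoSchedule-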
This feature adds CRI-O as a container runtime option; see https://github.com/kubernetes-incubator/cri-o.
Upon running the shell script ./hack/setup-vms, I get this error:
TASK [k8s-setup : Wait for Kubernetes core component start] ********************
Thursday 02 May 2019 14:12:40 -0400 (0:00:00.353) 0:03:15.687 **********
failed: [192.168.1.10] (item=6443) => {"changed": false, "elapsed": 300, "item": 6443, "msg": "Timeout when waiting for 127.0.0.1:6443"}
PLAY RECAP *********************************************************************
192.168.1.10 : ok=140 changed=77 unreachable=0 failed=1
192.168.1.12 : ok=28 changed=19 unreachable=0 failed=0
192.168.1.13 : ok=28 changed=19 unreachable=0 failed=0
k8s-setup : Wait for Kubernetes core component start ------------------ 423.15s
download/package : Downloading kubelet file ---------------------------- 81.90s
download/package : Downloading kubectl file ---------------------------- 28.81s
download/package : Downloading docker file ----------------------------- 15.01s
download/package : Downloading cni file --------------------------------- 4.37s
download/package : Downloading cfssl file ------------------------------- 4.25s
download/package : Extract docker file ---------------------------------- 3.69s
download/package : Downloading etcd file -------------------------------- 3.27s
cert : Generate Kubernetes SSL certificate json files ------------------- 2.05s
cert : Create Kubernetes SSL certificate key files ---------------------- 1.82s
common/copy-files : Write the content of files -------------------------- 1.67s
k8s-setup : Copy Kubernetes manifest and config files into cluster ------ 1.49s
cert : Delete unnecessary Kubernetes files ------------------------------ 1.48s
download/package : Downloading cfssljson file --------------------------- 1.43s
common/copy-files : Check the files already exists ---------------------- 1.41s
common/copy-files : Read the config files ------------------------------- 1.28s
download/package : Extract cni file ------------------------------------- 1.24s
download/package : Symlinks docker to /usr/local/bin -------------------- 1.16s
download/package : Extract etcd file ------------------------------------ 0.99s
etcd : Enable and restart etcd ------------------------------------------ 0.93s
Ansible failed to complete successfully. Any error output should be
visible above. Please fix these errors and try again.
==> k8s-n3: The previous process exited with exit code 1.
==> k8s-n2: The previous process exited with exit code 1.
==> k8s-n1: The previous process exited with exit code 1.
==> k8s-m1: The previous process exited with exit code 1.
TASK [cert : Create etcd SSL certificate key files] ***********************************************************************************************************************
Tuesday 12 February 2019 15:11:02 -0500 (0:00:00.194) 0:00:10.056 ******
fatal: [k8s-m1]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'ansible_default_ipv4'\n\nThe error appears to have been in '/home/mike/kube-ansible/roles/cert/tasks/create-etcd-certs.yml': line 49, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Create etcd SSL certificate key files\n ^ here\n"}
Hi there
During the Ansible run to create the cluster, I get stuck at "[k8s-setup : Wait for Kubernetes core component start]"
with error:
"11:14:24.197345 11914 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:464: Failed to list *v1.Node: Get https://:8443/api/v1/nodes?fieldSelector=metadata.name%3Dcptk8spoc01.capetown.fwslash.net&limit=500&resourceVersion=0: tls: either ServerName or InsecureSkipVerify must be specified in the tls.Config"
I also found that the Docker network is on 172.x.x.x, and so is our environment.
Any idea what this could be?
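On the 172.x.x.x overlap: Docker's default bridge uses 172.17.0.0/16, and if that collides with the surrounding network, traffic from the nodes can be blackholed. A sketch of pinning the bridge to a spare range (the range itself is only an example):

# Move docker0 off the conflicting subnet, then restart the daemon:
cat > /etc/docker/daemon.json <<'EOF'
{ "bip": "192.168.200.1/24" }
EOF
systemctl restart docker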
After installing k8s, I try to open the dashboard (https://server-ip:8443/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/) and get an error:
Error: 'dial tcp 10.244.0.24:8443: i/o timeout' Trying to reach: 'https://10.244.0.24:8443/'
Can you tell me what the problem might be?
Test cluster:
$ kubectl -n kube-system get po,svc
NAME                                       READY   STATUS    RESTARTS   AGE
pod/calico-node-cwpt9                      2/2     Running   0          14m
pod/coredns-896d9f87d-2nlfs                1/1     Running   0          15m
pod/coredns-autoscaler-58784cd54d-svlwv    1/1     Running   0          15m
pod/kube-proxy-lrj6c                       1/1     Running   0          14m
pod/kubernetes-dashboard-57df4db6b-6w49h   1/1     Running   0          10m
pod/metrics-server-68d85f76bb-7n2n5        1/1     Running   0          8m15s

NAME                              TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                  AGE
service/calico-typha              ClusterIP   10.106.84.71    <none>        5473/TCP                 14m
service/kube-controller-manager   ClusterIP   None            <none>        10252/TCP                8m37s
service/kube-dns                  ClusterIP   10.96.0.10      <none>        53/UDP,53/TCP,9153/TCP   15m
service/kube-scheduler            ClusterIP   None            <none>        10251/TCP                8m37s
service/kubelet                   ClusterIP   None            <none>        10250/TCP                6m37s
service/kubernetes-dashboard      ClusterIP   10.103.103.89   <none>        443/TCP                  10m
service/metrics-server            ClusterIP   10.106.81.208   <none>        443/TCP                  8m15s
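A quick probe that separates a CNI problem from a dashboard problem, as a sketch:

# Can the master reach the pod IP from the error at all?
kubectl -n kube-system get pods -o wide | grep dashboard
ping -c 2 10.244.0.24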
Hi, I am running a couple of applications that check the API server certificate, and your Ansible setup doesn't seem to include the right hostnames (the default/usual API hostname is kubernetes.default.svc, but your certificate seems to have only kubernetes.default). Where can I change that? Thanks
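For locating the mismatch, a sketch (the certificate path is an assumption; the SAN list itself would live in the playbook's cert role templates):

# Inspect which hostnames the generated apiserver cert actually carries:
openssl x509 -in /etc/kubernetes/pki/apiserver.pem -noout -text | grep -A1 'Subject Alternative Name'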
Hi
Firstly, thank you for the awesome work with this repo.
I am stuck in my process of creating a cluster.
The script runs fine until it gets to the "TASK [k8s-setup : Create kube-apiserver to kubelet RBAC]" part.
The error I'm getting:
"fatal: [172.24.49.100 -> 172.24.49.100]: FAILED! => {"attempts": 10, "changed": true, "cmd": ["/usr/local/bin/kubectl", "--kubeconfig=/etc/kubernetes/admin.conf", "apply", "-f", "/tmp/apiserver-to-kubelet-rbac.yml"], "delta": "0:00:00.074493", "end": "2018-09-06 14:30:49.764857", "msg": "non-zero return code", "rc": 1, "start": "2018-09-06 14:30:49.690364", "stderr": "unable to recognize "/tmp/apiserver-to-kubelet-rbac.yml": Get https://:8443/api?timeout=32s: tls: either ServerName or InsecureSkipVerify must be specified in the tls.Config\nunable to recognize "/tmp/apiserver-to-kubelet-rbac.yml": Get https://:8443/api?timeout=32s: tls: either ServerName or InsecureSkipVerify must be specified in the tls.Config", "stderr_lines": ["unable to recognize "/tmp/apiserver-to-kubelet-rbac.yml": Get https://:8443/api?timeout=32s: tls: either ServerName or InsecureSkipVerify must be specified in the tls.Config", "unable to recognize "/tmp/apiserver-to-kubelet-rbac.yml": Get https://:8443/api?timeout=32s: tls: either ServerName or InsecureSkipVerify must be specified in the tls.Config"], "stdout": "", "stdout_lines": []}"
I'm also getting this error whenever I try to use kubectl:
Unable to connect to the server: tls: either ServerName or InsecureSkipVerify must be specified in the tls.Config
So it seems that nothing can successfully invoke kubectl.
Please, any help will be greatly appreciated
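One observation on the log above, offered as a guess: the URL being dialed is https://:8443 with an empty host, which suggests the VIP/API address variable was never filled in when the kubeconfigs were rendered. A one-line check:

# What API endpoint do the kubeconfigs actually point at?
grep 'server:' /etc/kubernetes/admin.conf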
First of all, I love your project; it was quite helpful for getting k8s with hyperconverged Ceph running via Ansible. Do you have an example StorageClass based on this project? I have been trying to create one, but I'm not having any luck: it gives an error about incorrect secrets, and I have tried a number of different ID and secret names without success. I attached one version of the example below.
---
apiVersion: storage.k8s.io/v1beta1
kind: StorageClass
metadata:
  name: normal
provisioner: kubernetes.io/rbd
parameters:
  monitors: 10.10.73.3:6789,10.10.44.4:6789,10.10.39.4:6789
  adminId: admin
  adminSecretName: ceph-conf-combined
  adminSecretNamespace: ceph
  pool: kube
  userId: client
  userSecretName: ceph-client-key
Appreciate any help you can give.
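Two hedged hints, in case they help: the in-tree rbd provisioner maps adminId/userId to Ceph users by prepending "client.", so userId: client would resolve to client.client, which probably doesn't exist; and both secrets must be of type kubernetes.io/rbd with the raw key stored under "key". A sketch of creating them (the Ceph user names are assumptions):

# Admin secret in the namespace named by adminSecretNamespace:
kubectl -n ceph create secret generic ceph-conf-combined --type=kubernetes.io/rbd \
  --from-literal=key="$(ceph auth get-key client.admin)"
# User secret in the namespace of the PVCs:
kubectl create secret generic ceph-client-key --type=kubernetes.io/rbd \
  --from-literal=key="$(ceph auth get-key client.kube)"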
Hi there
This is an amazing Repository, thank you very much.
May I ask if you will ever add the option to use Weave as the CNI?
Cheers
Thanks for taking the time to put this together. I ran into this error:
fatal: [192.168.1.20]: FAILED! => {"msg": "Unexpected templating type error occurred on ({% if num_nodes >= 0 -%}{{ 90 * 1024 + num_nodes|int * nanny_memory_per_node}}Ki{%- else -%}90Mi{% endif -%}): '>=' not supported between instances of 'str' and 'int'"}
Ansible does not like the comparison in this statement:
"{% if num_nodes >= 0 -%}{{ 90 * 1024 + num_nodes|int * nanny_memory_per_node}}Ki{%- else -%}90Mi{% endif -%}"
I believe it should be like this:
"{% if num_nodes | int >= 0 -%}{{ 90 * 1024 + num_nodes|int * nanny_memory_per_node}}Ki{%- else -%}90Mi{% endif -%}"
Hello,
The etcd cluster does not work:
root@master1:~# etcdctl cluster-health
cluster may be unhealthy: failed to list members
Error: client: etcd cluster is unavailable or misconfigured; error #0: dial tcp 127.0.0.1:4001: getsockopt: connection refused
; error #1: malformed HTTP response "\x15\x03\x01\x00\x02\x02"
error #0: dial tcp 127.0.0.1:4001: getsockopt: connection refused
error #1: malformed HTTP response "\x15\x03\x01\x00\x02\x02"
Maxence
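For what it's worth, a sketch of querying the cluster with explicit endpoints and certs (paths assumed): the "\x15\x03\x01..." bytes are a TLS alert, i.e. etcdctl is speaking plain HTTP to its default http://127.0.0.1:4001 endpoint while etcd is serving TLS on 2379.

etcdctl --endpoints=https://127.0.0.1:2379 \
  --ca-file=/etc/etcd/ssl/etcd-ca.pem \
  --cert-file=/etc/etcd/ssl/etcd.pem \
  --key-file=/etc/etcd/ssl/etcd-key.pem \
  cluster-health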
Todo list:
Hello,
I tried this quick-start method with 1 k8s master and 2 k8s nodes. The Kubernetes dashboard isn't available even though I also ran the addons playbook.
root@master1:~# kubectl get nodes
NAME      STATUS    ROLES     AGE       VERSION
master1   Ready     master    33m       v1.9.1
node1     Ready     <none>    32m       v1.9.1
node2     Ready     master    33m       v1.9.1
root@master1:~# kubectl cluster-info
Kubernetes master is running at https://172.16.35.9:6443
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
root@master1:~# kubectl get pods --all-namespaces
NAMESPACE     NAME                                       READY   STATUS    RESTARTS   AGE
kube-system   calico-node-92nfd                          2/2     Running   0          32m
kube-system   calico-node-gb8gv                          2/2     Running   0          32m
kube-system   calico-node-nbll8                          2/2     Running   0          32m
kube-system   calico-policy-controller-fb675cfbc-s9kpl   1/1     Running   0          32m
kube-system   haproxy-master1                            1/1     Running   0          32m
kube-system   haproxy-node2                              1/1     Running   0          32m
kube-system   keepalived-master1                         1/1     Running   0          32m
kube-system   keepalived-node2                           1/1     Running   0          32m
kube-system   kube-apiserver-master1                     1/1     Running   0          32m
kube-system   kube-apiserver-node2                       1/1     Running   0          33m
kube-system   kube-controller-manager-master1            1/1     Running   0          32m
kube-system   kube-controller-manager-node2              1/1     Running   0          32m
kube-system   kube-dns-74bf5c4b94-vhcc5                  3/3     Running   0          32m
kube-system   kube-proxy-7mjrf                           1/1     Running   0          32m
kube-system   kube-proxy-hm2bj                           1/1     Running   0          32m
kube-system   kube-proxy-ncqfs                           1/1     Running   0          32m
kube-system   kube-scheduler-master1                     1/1     Running   0          32m
kube-system   kube-scheduler-node2                       1/1     Running   0          32m
Could you guide me as to where I am making a mistake, so that I can fix it and bring up the k8s dashboard?
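One detail visible in the output above: there is no kubernetes-dashboard pod in the list at all, so the addon apparently never deployed. A sketch of confirming that before digging further:

# Did the addons play create anything dashboard-related?
kubectl -n kube-system get deploy,svc | grep -i dashboard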
{
"kind": "Status",
"apiVersion": "v1",
"metadata": {
},
"status": "Failure",
"message": "services \"kibana-logging\" is forbidden: User \"system:anonymous\" cannot get services/proxy in the namespace \"kube-system\"",
"reason": "Forbidden",
"details": {
"name": "kibana-logging",
"kind": "services"
},
"code": 403
}
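The "system:anonymous" in this 403 means the request reached the apiserver without credentials. Going through kubectl proxy, which authenticates with your kubeconfig, avoids that; a sketch:

kubectl proxy &
curl http://127.0.0.1:8001/api/v1/namespaces/kube-system/services/kibana-logging/proxy/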
I think writing the setting to /etc/sysctl.d/keepalived.conf is better; it would be permanent.
I am curious about "try_message" when I use kube-ansible. My next issue will be about it.
Thank you.
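A sketch of the persistent variant (the exact sysctl key is an assumption; keepalived VIP setups commonly need non-local bind):

echo 'net.ipv4.ip_nonlocal_bind = 1' | sudo tee /etc/sysctl.d/keepalived.conf
sudo sysctl --system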
Hello,
When there are several masters, it is absolutely necessary to generate the SSL certificates on the host machine (localhost) and then send them to the different servers.
Otherwise we end up with completely different certificates on the masters, and it does not work.
Maxence
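A quick way to confirm whether the masters really ended up sharing one CA, as a sketch (hostnames and path are assumptions):

for h in master1 master2 master3; do
  ssh "$h" 'md5sum /etc/kubernetes/pki/ca.pem'   # all hashes must match
done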
On the master branch, my inventory/group_vars/all.yml is:
---
kube_version: 1.13.4
# Container runtime,
# Supported: docker, nvidia-docker, containerd.
container_runtime: docker
# Container network,
# Supported: calico, flannel.
cni_enable: true
container_network: calico
# Kubernetes HA extra variables.
vip_interface: ""
vip_address: 192.168.3.100
# Kubernetes extra addons
enable_ingress: true
enable_dashboard: false
enable_logging: true
enable_monitoring: false
enable_metric_server: true
grafana_user: "admin"
grafana_password: "admin"
error message:
fatal: [k8s-m1]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'namespace'\n\nThe error appears to have been in '/root/kube-ansible/roles/k8s-addon/tasks/main.yml': line 35, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Check {{ addon.name }} addon dependencies status\n ^ here\nWe could be wrong, but this one looks like it might be an issue with\nmissing quotes. Always quote template expression brackets when they\nstart a value. For instance:\n\n with_items:\n - {{ foo }}\n\nShould be written as:\n\n with_items:\n - \"{{ foo }}\"\n"}
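A hedged debugging step, since the failing addon item isn't named in the error: re-running with more verbosity prints the dict being iterated, which shows which addon entry is missing its namespace key (the inventory path is an assumption):

ansible-playbook -i inventory/hosts.ini cluster.yml -vvv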
Hello,
It's less an issue than a question: what is the best practice for upgrading a cluster, given that I used your playbook to deploy it?
@kairen
Thank you
TASK [k8s-setup : Set taint to effect NoSchedule] **********************************************************************************************************************
Monday 15 October 2018 13:09:52 +0200 (0:00:00.353) 0:01:34.555 ********
FAILED - RETRYING: Set taint to effect NoSchedule (10 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (10 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (10 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (9 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (9 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (9 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (8 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (8 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (8 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (7 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (7 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (7 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (6 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (6 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (6 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (5 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (5 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (5 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (4 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (4 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (4 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (3 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (3 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (3 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (2 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (2 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (2 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (1 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (1 retries left).
FAILED - RETRYING: Set taint to effect NoSchedule (1 retries left).
fatal: [foundery02]: FAILED! => {"attempts": 10, "changed": true, "cmd": ["/usr/local/bin/kubectl", "--kubeconfig=/etc/kubernetes/admin.conf", "taint", "nodes", "Foundery02", "node-role.kubernetes.io/master=:NoSchedule", "--overwrite"], "delta": "0:00:00.088008", "end": "2018-10-15 13:10:14.851340", "msg": "non-zero return code", "rc": 1, "start": "2018-10-15 13:10:14.763332", "stderr": "Error from server (NotFound): nodes "Foundery02" not found", "stderr_lines": ["Error from server (NotFound): nodes "Foundery02" not found"], "stdout": "", "stdout_lines": []}
...ignoring
fatal: [foundery03]: FAILED! => {"attempts": 10, "changed": true, "cmd": ["/usr/local/bin/kubectl", "--kubeconfig=/etc/kubernetes/admin.conf", "taint", "nodes", "Foundery03", "node-role.kubernetes.io/master=:NoSchedule", "--overwrite"], "delta": "0:00:00.085448", "end": "2018-10-15 13:10:15.293709", "msg": "non-zero return code", "rc": 1, "start": "2018-10-15 13:10:15.208261", "stderr": "Error from server (NotFound): nodes "Foundery03" not found", "stderr_lines": ["Error from server (NotFound): nodes "Foundery03" not found"], "stdout": "", "stdout_lines": []}
...ignoring
fatal: [foundery04]: FAILED! => {"attempts": 10, "changed": true, "cmd": ["/usr/local/bin/kubectl", "--kubeconfig=/etc/kubernetes/admin.conf", "taint", "nodes", "Foundery04", "node-role.kubernetes.io/master=:NoSchedule", "--overwrite"], "delta": "0:00:00.086700", "end": "2018-10-15 13:10:16.130539", "msg": "non-zero return code", "rc": 1, "start": "2018-10-15 13:10:16.043839", "stderr": "Error from server (NotFound): nodes "Foundery04" not found", "stderr_lines": ["Error from server (NotFound): nodes "Foundery04" not found"], "stdout": "", "stdout_lines": []}
...ignoring
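A note on the NotFound errors above: the kubelet lowercases the hostname when registering, so "Foundery02" does not exist as a Node object even though the host is up. A sketch:

# Compare the registered names, then taint using the lowercase form:
kubectl --kubeconfig=/etc/kubernetes/admin.conf get nodes
kubectl --kubeconfig=/etc/kubernetes/admin.conf taint nodes foundery02 \
  node-role.kubernetes.io/master=:NoSchedule --overwrite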
I installed the cluster with default parameters, but CoreDNS is not working properly.
Node OS: Ubuntu 18.04
kubectl logs coredns-7945bc8d5c-qp5s7 -n kube-system --tail=100 -f
......
2018/08/20 07:02:44 [ERROR] 2 smtp.office365.com. A: unreachable backend: read udp 10.244.2.27:34504->10.96.0.10:53: i/o timeout
2018/08/20 07:02:44 [ERROR] 2 smtp.office365.com. AAAA: unreachable backend: read udp 10.244.2.27:40291->10.96.0.10:53: i/o timeout
2018/08/20 07:02:44 [ERROR] 2 smtp.office365.com. AAAA: unreachable backend: read udp 10.244.2.27:42891->10.96.0.10:53: i/o timeout
2018/08/20 07:02:44 [ERROR] 2 elasticsearch-logging. AAAA: unreachable backend: read udp 10.244.2.27:55194->10.96.0.10:53: i/o timeout
2018/08/20 07:02:44 [ERROR] 2 smtp.office365.com. AAAA: unreachable backend: read udp 10.244.2.27:36631->10.96.0.10:53: i/o timeout
......
kubectl describe pod coredns-7945bc8d5c-qp5s7 -n kube-system
......
Events:
Type      Reason      Age                From               Message
----      ------      ----               ----               -------
Normal    Scheduled   55m                default-scheduler  Successfully assigned kube-system/coredns-7945bc8d5c-qp5s7 to k8s-n2
Normal    Pulled      54m                kubelet, k8s-n2    Container image "192.168.21.29:5000/coredns:1.1.3" already present on machine
Normal    Created     54m                kubelet, k8s-n2    Created container
Normal    Started     54m                kubelet, k8s-n2    Started container
Warning   Unhealthy   4m (x9 over 51m)   kubelet, k8s-n2    Liveness probe failed: Get http://10.244.2.27:8080/health: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
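The timeouts above are pod -> 10.96.0.10:53, i.e. the kube-dns ClusterIP, which points at the service path (kube-proxy/CNI) rather than CoreDNS itself. A sketch of checking that path:

# Does the service have endpoints, and do the nodes carry its rules?
kubectl -n kube-system get ep kube-dns
iptables-save | grep 10.96.0.10   # run on the affected node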
Hello,
The Ansible playbook cluster.yml cannot be run twice; otherwise the Kubernetes cluster ends up in a broken state.
Regards,
Maxence
Hi,
thanks for sharing this!
After setting up a working cluster, the individual VMs are not reboot-safe. After a reboot, kubelet would not start because swap was enabled: the playbook does a 'swapoff -a', which is not persistent.
One way of disabling swap persistently would be to create a systemd service that turns it off before starting kubelet, like this:
# /etc/systemd/system/swapoff.service
[Unit]
Description=Swapoff, kubelet requirement
After=network.target
Before=kubelet.service
[Service]
Type=oneshot
ExecStart=/sbin/swapoff -a
RemainAfterExit=true
[Install]
WantedBy=multi-user.target
Where would be the best place in the code to implement this? I would be happy to give a pull request a shot.
Cheers,
Thomas
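As an alternative to a dedicated unit, a sketch that needs no new service: make the existing 'swapoff -a' task permanent by also commenting out swap entries in /etc/fstab.

# Comment out every active swap line (GNU sed):
sudo sed -ri 's/^([^#].*\sswap\s)/#\1/' /etc/fstab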