Comments (36)

mrlesmithjr avatar mrlesmithjr commented on July 28, 2024

Looks like it might be time for me to wipe my cluster and go through the provisioning again. It has been a few months since I last did, and it looks like some things may be out of whack here.

mrlesmithjr avatar mrlesmithjr commented on July 28, 2024

So I just spun up the Vagrant test environment included in the ansible-k8s repo without any issue. This repo uses the same codebase as that repo, which is pulled in via requirements.yml, so I may need to do some more investigation.

[screenshot]

mrlesmithjr avatar mrlesmithjr commented on July 28, 2024

So I just ran through this on my Pi cluster and saw similar results, and what happened to me looks like what may have happened in your setup as well. A few nodes failed in earlier tasks, which left them out of the task that captures cluster nodes, and that count needs to match the number of hosts defined in the Ansible group k8s_cluster_group. Below is that task:

- name: cluster_summary | Capturing Cluster Nodes
  command: >
           kubectl --kubeconfig {{ k8s_admin_config }} get nodes
  changed_when: false
  become: true
  # We wait for the number of nodes to match the number of hosts defined in
  # the ansible group. We subtract 1 to account for the header line
  until: >
         ((_k8s_cluster_nodes['stdout_lines']|length - 1) == (groups[k8s_cluster_group]|length) and
         'NotReady' not in _k8s_cluster_nodes['stdout'])
  retries: 30
  delay: 10
  register: _k8s_cluster_nodes
  when: inventory_hostname == k8s_master
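For reference, the until condition can be checked by hand on the master. A rough shell equivalent (a sketch only; the kubeconfig path is taken from the task above):

# Count registered nodes, excluding the header line printed by kubectl
sudo kubectl --kubeconfig /etc/kubernetes/admin.conf get nodes | tail -n +2 | wc -l

# The task also requires that no node reports NotReady
sudo kubectl --kubeconfig /etc/kubernetes/admin.conf get nodes | grep -c NotReady

The first number has to equal the size of k8s_cluster_group, and the second has to be 0, before the task stops retrying.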

So what did I need to do to resolve it? I removed each of my nodes from ~/.ssh/known_hosts and ran playbooks/deploy.yml again, and all was good.
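A minimal sketch of that cleanup, with hypothetical node addresses and an assumed inventory path (adjust both to your setup):

# Remove stale SSH host keys for each cluster node (example IPs)
for node in 192.168.100.128 192.168.100.129 192.168.100.130; do
  ssh-keygen -R "$node"
done

# Re-run the deployment (inventory path assumed)
ansible-playbook -i inventory playbooks/deploy.yml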

aaronkjones avatar aaronkjones commented on July 28, 2024

Steps to reproduce:

  1. Clone the repo.
  2. Modify hosts.inv:

[rpi_k8s:children]
rpi_k8s_master
rpi_k8s_slaves

[rpi_k8s_master]
rpi-k8s-1 ansible_host=192.168.1.139 # WiFi DHCP IP

[rpi_k8s_slaves]
rpi-k8s-2 ansible_host=192.168.100.128
rpi-k8s-3 ansible_host=192.168.100.129
rpi-k8s-4 ansible_host=192.168.100.130
# Removed rpi-k8s-5 since I only have 4 Raspberry Pi's

  3. In /group_vars/all/all.yml, change:

dhcp_scope_end_range: "{{ dhcp_scope_subnet }}.130"
dhcp_scope_start_range: "{{ dhcp_scope_subnet }}.128"
jumphost_ip: 192.168.1.139
rpi_nodes: 4
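Before deploying with the modified inventory, a quick connectivity check along these lines can confirm all four Pis are reachable (the inventory path is an assumption based on the repo layout):

ansible -i inventory/hosts.inv rpi_k8s -m ping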

Oddly it doesn't look like I have a known_hosts file on any of the nodes.

pi@rpi-k8s-2:~/.ssh $ ls -la
total 12
drwx------ 2 pi pi 4096 May  1 16:39 .
drwxr-xr-x 4 pi pi 4096 May  1 16:40 ..
-rw------- 1 pi pi  737 May  1 16:39 authorized_keys

I tried again without changing anything (only rebooted the cluster) and I got this result:

TASK [ansible-k8s : cluster_summary | Capturing Cluster Nodes] **************************************************************************************************************************
Wednesday 02 May 2018  07:17:35 -0700 (0:00:00.125)       0:04:08.394 *********
skipping: [rpi-k8s-2]
skipping: [rpi-k8s-3]
skipping: [rpi-k8s-4]
FAILED - RETRYING: cluster_summary | Capturing Cluster Nodes (30 retries left).
FAILED - RETRYING: cluster_summary | Capturing Cluster Nodes (29 retries left).
FAILED - RETRYING: cluster_summary | Capturing Cluster Nodes (28 retries left).
FAILED - RETRYING: cluster_summary | Capturing Cluster Nodes (27 retries left).
FAILED - RETRYING: cluster_summary | Capturing Cluster Nodes (26 retries left).
FAILED - RETRYING: cluster_summary | Capturing Cluster Nodes (25 retries left).
FAILED - RETRYING: cluster_summary | Capturing Cluster Nodes (24 retries left).
FAILED - RETRYING: cluster_summary | Capturing Cluster Nodes (23 retries left).
FAILED - RETRYING: cluster_summary | Capturing Cluster Nodes (22 retries left).
FAILED - RETRYING: cluster_summary | Capturing Cluster Nodes (21 retries left).
FAILED - RETRYING: cluster_summary | Capturing Cluster Nodes (20 retries left).
FAILED - RETRYING: cluster_summary | Capturing Cluster Nodes (19 retries left).
FAILED - RETRYING: cluster_summary | Capturing Cluster Nodes (18 retries left).
FAILED - RETRYING: cluster_summary | Capturing Cluster Nodes (17 retries left).
FAILED - RETRYING: cluster_summary | Capturing Cluster Nodes (16 retries left).
FAILED - RETRYING: cluster_summary | Capturing Cluster Nodes (15 retries left).
FAILED - RETRYING: cluster_summary | Capturing Cluster Nodes (14 retries left).
FAILED - RETRYING: cluster_summary | Capturing Cluster Nodes (13 retries left).
FAILED - RETRYING: cluster_summary | Capturing Cluster Nodes (12 retries left).
FAILED - RETRYING: cluster_summary | Capturing Cluster Nodes (11 retries left).
FAILED - RETRYING: cluster_summary | Capturing Cluster Nodes (10 retries left).
FAILED - RETRYING: cluster_summary | Capturing Cluster Nodes (9 retries left).
FAILED - RETRYING: cluster_summary | Capturing Cluster Nodes (8 retries left).
FAILED - RETRYING: cluster_summary | Capturing Cluster Nodes (7 retries left).
FAILED - RETRYING: cluster_summary | Capturing Cluster Nodes (6 retries left).
FAILED - RETRYING: cluster_summary | Capturing Cluster Nodes (5 retries left).
FAILED - RETRYING: cluster_summary | Capturing Cluster Nodes (4 retries left).
FAILED - RETRYING: cluster_summary | Capturing Cluster Nodes (3 retries left).
FAILED - RETRYING: cluster_summary | Capturing Cluster Nodes (2 retries left).
FAILED - RETRYING: cluster_summary | Capturing Cluster Nodes (1 retries left).
fatal: [rpi-k8s-1]: FAILED! => {"attempts": 30, "changed": false, "cmd": ["kubectl", "--kubeconfig", "/etc/kubernetes/admin.conf", "get", "nodes"], "delta": "0:00:01.911321", "end": "2018-05-02 14:23:42.748631", "rc": 0, "start": "2018-05-02 14:23:40.837310", "stderr": "", "stderr_lines": [], "stdout": "NAME        STATUS    ROLES     AGE       VERSION\nrpi-k8s-1   Ready     master    20h       v1.10.2", "stdout_lines": ["NAME        STATUS    ROLES     AGE       VERSION", "rpi-k8s-1   Ready     master    20h       v1.10.2"]}

On the master

pi@rpi-k8s-1:~ $ sudo kubectl --kubeconfig /etc/kubernetes/admin.conf get nodes
NAME        STATUS    ROLES     AGE       VERSION
rpi-k8s-1   Ready     master    20h       v1.10.2

So, it appears that no nodes are joining. I will continue to troubleshoot.
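When the master only lists itself, the usual next step is to look at the kubelet on one of the workers that should have joined. A sketch, assuming the standard kubeadm/systemd setup this playbook uses:

# On a worker, e.g. rpi-k8s-2
sudo systemctl status kubelet --no-pager
sudo journalctl -u kubelet --no-pager | tail -n 50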

mrlesmithjr avatar mrlesmithjr commented on July 28, 2024

Sorry, I meant on the machine you are running Ansible from for known_hosts.

aaronkjones avatar aaronkjones commented on July 28, 2024

I removed them from known_hosts on my workstation. I also re-pulled the repo and switched to the issue#7 branch. I changed the hosts and group_vars and deployed again. Same error.

fatal: [rpi-k8s-1]: FAILED! => {"attempts": 30, "changed": false, "cmd": ["kubectl", "--kubeconfig", "/etc/kubernetes/admin.conf", "get", "nodes"], "delta": "0:00:01.376847", "end": "2018-05-02 15:43:08.166073", "rc": 0, "start": "2018-05-02 15:43:06.789226", "stderr": "", "stderr_lines": [], "stdout": "NAME        STATUS    ROLES     AGE       VERSION\nrpi-k8s-1   Ready     master    19m       v1.10.2", "stdout_lines": ["NAME        STATUS    ROLES     AGE       VERSION", "rpi-k8s-1   Ready     master    19m       v1.10.2"]}

Going to try reimaging the Pi's.

mrlesmithjr avatar mrlesmithjr commented on July 28, 2024

Man, that sucks. I'll keep digging too. Was hoping what I saw and what I did to resolve the issue would help you out as well.

aaronkjones avatar aaronkjones commented on July 28, 2024

No worries. I appreciate the work you have put into this so far.

For thoroughness, my setup is:
4 x Raspberry Pi 3B
1 x Aukey 5 port USB Charger
1 x TRENDnet 5-Port Unmanaged Gigabit Switch TEG-S50G
4 x SanDisk Ultra 32GB microSDHC UHS-I Class 10
1 x GeauxRobot Raspberry Pi 3 4-Layer Dog Bone Stack Case

I have tried this with 2018-04 Raspbian Lite on all Pis, and with full Raspbian on the master and Raspbian Lite on the rest. Maybe I need full Raspbian on all nodes? Seems unlikely though.

I tried this on my home network and at work.

If you can, maybe try it with 4 nodes instead of 5 to see if you see similar results.

mrlesmithjr avatar mrlesmithjr commented on July 28, 2024

I'll give it a go here in just a bit and let you know what I find.

mrlesmithjr avatar mrlesmithjr commented on July 28, 2024

Looks like when redeploying from scratch using the latest version of Raspbian Lite, I hit the same thing as you did. Not sure why, but this was even with my 5 nodes.

[screenshot]

Now to figure out if it is an issue with the latest version of Raspbian or something else.

mrlesmithjr avatar mrlesmithjr commented on July 28, 2024

Do you also see the same in /etc/hosts?

[screenshot of /etc/hosts]

The IP for rpi-k8s-1 is wrong and could potentially be part of the issue.
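For comparison, a correct /etc/hosts would map each hostname to its address on the internal 192.168.100.0/24 subnet, something along these lines (the rpi-k8s-1 address below is only a placeholder for the master's internal IP, not taken from the screenshot):

192.168.100.1    rpi-k8s-1
192.168.100.128  rpi-k8s-2
192.168.100.129  rpi-k8s-3
192.168.100.130  rpi-k8s-4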

mrlesmithjr avatar mrlesmithjr commented on July 28, 2024

Going to now try the original Raspbian Lite version that I installed with. Will start from scratch and see if the same results occur.

Update: it failed as well, but earlier than the previous failure.

mrlesmithjr avatar mrlesmithjr commented on July 28, 2024

What Ansible version are you using? I want to make sure this isn't something specific to an Ansible version. It seems like others are occasionally getting this error as well, and I don't want to be chasing the wrong issue.

Mine is:

ansible 2.5.2
  config file = /Users/larry/Git_Projects/Personal/GitHub/mrlesmithjr/ansible-rpi-k8s-cluster/ansible.cfg
  configured module search path = ['/Users/larry/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/local/lib/python3.6/site-packages/ansible
  executable location = /usr/local/bin/ansible
  python version = 3.6.5 (default, Mar 30 2018, 06:42:10) [GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.39.2)]

mrlesmithjr avatar mrlesmithjr commented on July 28, 2024

During this last run, 2 of my RPis rebooted while waiting for nodes during the K8s role. Not sure why, and not sure if they did this previously, but one of them was the master.

mrlesmithjr avatar mrlesmithjr commented on July 28, 2024

And while doing a cluster reset on them, several more just rebooted. This is with the latest version of Raspbian Lite and Ansible 2.5.0.0.

mrlesmithjr avatar mrlesmithjr commented on July 28, 2024

And now this again, and when it happened, 2 of the RPis rebooted again.

[screenshot]

mrlesmithjr avatar mrlesmithjr commented on July 28, 2024

Now I am curious whether the fix from #7, changing the GlusterFS version to 3.13, is causing the instability. Just a thought.

mrlesmithjr avatar mrlesmithjr commented on July 28, 2024

So, I am going about this a bit differently now. I am going to force a specific K8s version to be installed, rather than just the latest, to rule out a K8s version causing issues here. At the time I put all of this together, the K8s version was 1.9.3, as I had published the article once 1.10 was released and captured the version the cluster was running at. Not sure this will prove anything, but it's definitely worth a shot. If that doesn't resolve the issue, then I will look into GlusterFS as a potential culprit.
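On Debian/Raspbian, pinning the cluster packages to a specific release generally looks like the following. This is a sketch rather than the exact role change; the -00 revision suffix follows the upstream Kubernetes apt repository convention:

sudo apt-get install -y kubelet=1.9.3-00 kubeadm=1.9.3-00 kubectl=1.9.3-00

# Keep apt from upgrading them later
sudo apt-mark hold kubelet kubeadm kubectl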

Stay tuned.

mrlesmithjr avatar mrlesmithjr commented on July 28, 2024

And after all of that, 2 nodes rebooted while waiting on the capturing of cluster nodes. So I'm not sure what is going on.

aaronkjones avatar aaronkjones commented on July 28, 2024

Ansible

ansible 2.5.2
  config file = None
  configured module search path = [u'/Users/-/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/local/Cellar/ansible/2.5.2/libexec/lib/python2.7/site-packages/ansible
  executable location = /usr/local/bin/ansible
  python version = 2.7.15 (default, May  1 2018, 16:44:08) [GCC 4.2.1 Compatible Apple LLVM 9.1.0 (clang-902.0.39.1)]

Syslog dump

root@rpi-k8s-2:/var/log# tail syslog
May  3 15:57:27 rpi-k8s-2 kubelet[29302]: Flag --cluster-dns has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
May  3 15:57:27 rpi-k8s-2 kubelet[29302]: Flag --cluster-domain has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
May  3 15:57:27 rpi-k8s-2 kubelet[29302]: Flag --authorization-mode has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
May  3 15:57:27 rpi-k8s-2 kubelet[29302]: Flag --client-ca-file has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
May  3 15:57:27 rpi-k8s-2 kubelet[29302]: Flag --cadvisor-port has been deprecated, The default will change to 0 (disabled) in 1.12, and the cadvisor port will be removed entirely in 1.13
May  3 15:57:27 rpi-k8s-2 kubelet[29302]: I0503 15:57:27.821880   29302 feature_gate.go:226] feature gates: &{{} map[]}
May  3 15:57:27 rpi-k8s-2 kubelet[29302]: F0503 15:57:27.822171   29302 server.go:218] unable to load client CA file /etc/kubernetes/pki/ca.crt: open /etc/kubernetes/pki/ca.crt: no such file or directory
May  3 15:57:27 rpi-k8s-2 systemd[1]: kubelet.service: Main process exited, code=exited, status=255/n/a
May  3 15:57:27 rpi-k8s-2 systemd[1]: kubelet.service: Unit entered failed state.
May  3 15:57:27 rpi-k8s-2 systemd[1]: kubelet.service: Failed with result 'exit-code'.
root@rpi-k8s-2:/var/log# tail syslog
May  3 15:57:52 rpi-k8s-2 kubelet[29448]: I0503 15:57:52.746530   29448 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "dbus" (UniqueName: "kubernetes.io/host-path/bab9ca87-4eea-11e8-a4c4-b827eb7e84c6-dbus") pod "weave-net-khv4w" (UID: "bab9ca87-4eea-11e8-a4c4-b827eb7e84c6")
May  3 15:57:52 rpi-k8s-2 kubelet[29448]: I0503 15:57:52.746660   29448 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "lib-modules" (UniqueName: "kubernetes.io/host-path/bab9ca87-4eea-11e8-a4c4-b827eb7e84c6-lib-modules") pod "weave-net-khv4w" (UID: "bab9ca87-4eea-11e8-a4c4-b827eb7e84c6")
May  3 15:57:52 rpi-k8s-2 kubelet[29448]: I0503 15:57:52.746791   29448 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "xtables-lock" (UniqueName: "kubernetes.io/host-path/bab9ca87-4eea-11e8-a4c4-b827eb7e84c6-xtables-lock") pod "weave-net-khv4w" (UID: "bab9ca87-4eea-11e8-a4c4-b827eb7e84c6")
May  3 15:57:52 rpi-k8s-2 systemd[1]: Started Kubernetes transient mount for /var/lib/kubelet/pods/bab80f63-4eea-11e8-a4c4-b827eb7e84c6/volumes/kubernetes.io~secret/kube-proxy-token-hrqqj.
May  3 15:57:52 rpi-k8s-2 systemd[1]: Started Kubernetes transient mount for /var/lib/kubelet/pods/bab9ca87-4eea-11e8-a4c4-b827eb7e84c6/volumes/kubernetes.io~secret/weave-net-token-7mmh8.
May  3 15:57:53 rpi-k8s-2 dockerd[461]: time="2018-05-03T15:57:53.635094728Z" level=error msg="Handler for GET /v1.31/images/weaveworks/weave-kube:2.3.0/json returned error: readlink /var/lib/docker/overlay2: invalid argument"
May  3 15:57:53 rpi-k8s-2 kubelet[29448]: E0503 15:57:53.640435   29448 remote_image.go:83] ImageStatus "weaveworks/weave-kube:2.3.0" from image service failed: rpc error: code = Unknown desc = Error response from daemon: readlink /var/lib/docker/overlay2: invalid argument
May  3 15:57:53 rpi-k8s-2 kubelet[29448]: E0503 15:57:53.641021   29448 kuberuntime_image.go:87] ImageStatus for image {"weaveworks/weave-kube:2.3.0"} failed: rpc error: code = Unknown desc = Error response from daemon: readlink /var/lib/docker/overlay2: invalid argument
May  3 15:57:53 rpi-k8s-2 kubelet[29448]: E0503 15:57:53.641414   29448 kuberuntime_manager.go:733] container start failed: ImageInspectError: Failed to inspect image "weaveworks/weave-kube:2.3.0": rpc error: code = Unknown desc = Error response from daemon: readlink /var/lib/docker/overlay2: invalid argument
May  3 15:57:53 rpi-k8s-2 kubelet[29448]: W0503 15:57:53.645017   29448 pod_container_deletor.go:77] Container "ba9c711827fef0575b30bbc2e14e0f8538d5c84c7cb31f580e2f7852b32ecded" not found in pod's containers
root@rpi-k8s-2:/var/log# packet_write_wait: Connection to 10.73.140.193 port 22: Broken pipe

I captured that at the Capturing Cluster Nodes step. As you can see, at the end it rebooted (broken pipe).
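The "unable to load client CA file /etc/kubernetes/pki/ca.crt" line in the first tail suggests the kubelet was started before kubeadm join had completed on that node. A quick way to check (file locations assume a standard kubeadm layout):

# ca.crt and kubelet.conf only appear once kubeadm join has succeeded
ls -l /etc/kubernetes/pki/ca.crt /etc/kubernetes/kubelet.conf

sudo journalctl -u kubelet -b --no-pager | grep -i 'ca.crt' | tail -n 5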

aaronkjones avatar aaronkjones commented on July 28, 2024

My hunch is that this is related to the many issues people are filing with kubeadm and kubelet 1.10 on Raspberry Pi. Did you try kubeadm and kubelet 1.9.7, which this person reported as working?

I will try to run 1.9.7 and see if it works.

mrlesmithjr avatar mrlesmithjr commented on July 28, 2024

I have pushed the branch issue#8, which now has the ability to define the K8s version to install. You can change it in inventory/group_vars/all/k8s.yml:

k8s_version: 1.9.3

mrlesmithjr avatar mrlesmithjr commented on July 28, 2024

/cc @alexellis Does this ring a bell into what you have been seeing?

aaronkjones avatar aaronkjones commented on July 28, 2024

I had already manually installed kubectl, kubelet, and kubeadm version 1.9.7-00 on all nodes and re-ran branch issue#7.

Failure at Waiting For Kube-DNS or CoreDNS To Be Running this time:

root@rpi-k8s-2:/home/pi# kubectl --kubeconfig /etc/kubernetes/admin.conf get pods --all-namespaces
NAMESPACE     NAME                                READY     STATUS              RESTARTS   AGE
kube-system   etcd-rpi-k8s-2                      1/1       Running             0          6m
kube-system   kube-apiserver-rpi-k8s-2            1/1       Running             1          6m
kube-system   kube-controller-manager-rpi-k8s-2   1/1       Running             0          5m
kube-system   kube-dns-7b6ff86f69-55gtr           0/3       Pending             0          6m
kube-system   kube-proxy-w9tqt                    1/1       Running             0          6m
kube-system   kube-scheduler-rpi-k8s-2            1/1       Running             0          6m
kube-system   weave-net-gjnss                     1/2       ImageInspectError   0          6m

Looks related to issues with Weave: https://gist.github.com/alexellis/fdbc90de7691a1b9edb545c17da2d975#gistcomment-2535496

Seems people are having success with flannel in place of weave.

aaronkjones avatar aaronkjones commented on July 28, 2024

I used this other playbook, https://github.com/carlosroman/ansible-k8s-raspberry-playbook, which @carlosroman mentioned uses flannel instead of weave to get things working. He posted about it here.

mrlesmithjr avatar mrlesmithjr commented on July 28, 2024

So Flannel did work?

mrlesmithjr avatar mrlesmithjr commented on July 28, 2024

I am testing now. We can change inventory/group_vars/all/k8s.yml as below:

# k8s_pod_network_config: "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
k8s_pod_network_config: https://gist.githubusercontent.com/mrlesmithjr/3a71dea11d5cda061abd46b67246e6b5/raw/87e8acf67d44d1878979383d9bb49738430f0588/kube-flannel-arm64.yml
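For anyone testing by hand first, applying that same manifest directly on the master should be roughly equivalent (kubeconfig path as used elsewhere in this thread):

sudo kubectl --kubeconfig /etc/kubernetes/admin.conf apply -f https://gist.githubusercontent.com/mrlesmithjr/3a71dea11d5cda061abd46b67246e6b5/raw/87e8acf67d44d1878979383d9bb49738430f0588/kube-flannel-arm64.yml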

If this test works then I will push this up to branch issue#8

aaronkjones avatar aaronkjones commented on July 28, 2024

Flannel did work. I broke my cluster for another reason (trying to get a 3.5" LCD display dashboard running).

aaronkjones avatar aaronkjones commented on July 28, 2024

Well, maybe not. Now I am seeing:

└─▪kubectl get pods --namespace kube-system
NAME                                 READY     STATUS              RESTARTS   AGE
etcd-k8s-master                      1/1       Running             0          9m
kube-apiserver-k8s-master            1/1       Running             1          9m
kube-controller-manager-k8s-master   1/1       Running             0          9m
kube-dns-686d6fb9c-hpprt             0/3       ContainerCreating   0          9m
kube-flannel-ds-6hjwr                0/1       CrashLoopBackOff    6          9m
kube-flannel-ds-b7w5p                0/1       CrashLoopBackOff    6          9m
kube-flannel-ds-cmtcr                0/1       CrashLoopBackOff    6          9m
kube-flannel-ds-j86js                0/1       CrashLoopBackOff    6          9m
kube-proxy-bpt49                     1/1       Running             0          9m
kube-proxy-k8rfj                     1/1       Running             0          9m
kube-proxy-v2rs8                     1/1       Running             0          9m
kube-proxy-zlmvj                     1/1       Running             0          9m
kube-scheduler-k8s-master            1/1       Running             0          9m
tiller-deploy-df4fdf55d-g7pq9        0/1       ContainerCreating   0          31s

Grrrrrr
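To see why the flannel DaemonSet pods are crash-looping, their previous logs and events are the first place to look, for example (pod name taken from the output above):

kubectl -n kube-system logs kube-flannel-ds-6hjwr --previous
kubectl -n kube-system describe pod kube-flannel-ds-6hjwr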

alexellis avatar alexellis commented on July 28, 2024

I would not attempt to switch to flannel; it seems to have issues on ARM. If you think you may be having issues with Weave Net, then you should log an issue with them or join their Slack.

https://weave.works

mrlesmithjr avatar mrlesmithjr commented on July 28, 2024

Thanks for the input @alexellis. I saw issues with Flannel as well. I finally just gave up last night. Figured I would hold off for a week or so to see what everyone else has figured out ;)

aaronkjones avatar aaronkjones commented on July 28, 2024

@mrlesmithjr The only way I could get this working on my cluster was to use Flannel. I forked the gist from @alexellis and made modifications to use flannel instead: https://gist.github.com/aaronkjones/d996f1a441bc80875fd4929866ca65ad

If there are issues with flannel on ARM, it would be nice to know what they are.

mrlesmithjr avatar mrlesmithjr commented on July 28, 2024

@aaronkjones Cool deal. I plan on looking at this some more next week. I attempted flannel a day or so ago and mine was a failure too. But from what I have seen, whatever is happening with Weave only happens when other nodes are joining the cluster, and it causes node reboots. Crazy stuff, man!

aaronkjones avatar aaronkjones commented on July 28, 2024

I saw that too with the reboots. I got this in the terminal right before it rebooted:

kernel:[  277.274031] Internal error: Oops: 80000007 [#1] SMP ARM
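After one of those unexpected reboots, the Oops should still be visible in the logs persisted on the node; a quick check (log paths are the Raspbian defaults):

grep -i -E 'oops|panic' /var/log/kern.log /var/log/syslog* 2>/dev/null | tail -n 20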

It is mentioned here, here, and here. Each saying to use flannel over weave.

I am a k8s novice, so I have no idea what implications there are in using flannel over weave, but it works for me so far.

Feel free to close this ticket if you want. I think the CNIs are still being developed and are not stable for ARM. Next week flannel might break and weave may work fine. shrug

Thanks again for the help on this.

mrlesmithjr avatar mrlesmithjr commented on July 28, 2024

I definitely appreciate the input you have provided as well. I'd like to keep this open and track it until some resolution comes. And if that resolution is using Flannel, I am good with that too. But I do agree that there are probably a lot of unknowns with ARM at this time; I don't see any issues with AMD64 deployments. And I will keep tracking the activity over at https://gist.github.com/alexellis/fdbc90de7691a1b9edb545c17da2d975

mrlesmithjr avatar mrlesmithjr commented on July 28, 2024

This issue is now resolved and should be working as designed.
