This guide provides comprehensive instructions for setting up a Kubernetes cluster on two nodes using Docker and NFS for shared storage.
It covers the installation of Docker, Kubernetes components, the configuration of an NFS server, and concludes with a practical exercise in running an MPI (Message Passing Interface) application across the cluster.
This setup is particularly useful for those looking to explore distributed computing and parallel processing with MPI in a Kubernetes environment.
- Two Ubuntu machines (nodes) with sudo privileges.
- An active internet connection on both nodes.
Docker must be installed on both nodes. If Docker is already installed, you can skip this section.
For additional details on setting up Docker, you can refer to this repository.
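If you still need to install Docker, the sketch below shows one common route on Ubuntu using the distribution's docker.io package; adapt it if the referenced repository uses a different method (for example, Docker's official apt repository).
sudo apt-get update
sudo apt-get install -y docker.io          # Ubuntu's packaged Docker engine
sudo systemctl enable --now docker         # start Docker and enable it at boot
sudo usermod -aG docker $USER              # optional: run docker without sudo (log out and back in to apply)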
The following steps are for installing Kubernetes version 1.27.
These steps must be performed on both nodes.
sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl
# Create the keyrings directory first if it does not already exist
sudo mkdir -p -m 755 /etc/apt/keyrings
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.27/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.27/deb/ /' | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl
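To confirm that the 1.27 packages were installed and are pinned, you can check:
kubeadm version
kubectl version --client
apt-mark showhold   # should list kubelet, kubeadm, and kubectl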
To ensure Kubernetes functions correctly, swap must be disabled. The second command edits /etc/fstab so that swap stays disabled after a reboot; if you plan to remove Kubernetes before the next reboot, you can skip it.
sudo swapoff -a
**Attention**
# To keep swap disabled on startup, comment out the swap entry in /etc/fstab
sudo sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab
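You can verify that swap is now off before continuing:
free -h          # the Swap line should show 0B
swapon --show    # prints nothing when swap is disabled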
cri-dockerd is required to use Docker as the container runtime for Kubernetes since Kubernetes deprecated direct Docker support starting with version 1.20.
Install cri-dockerd on both nodes.
Refer to this blog for additional details.
sudo su
wget https://github.com/Mirantis/cri-dockerd/releases/download/v0.3.14/cri-dockerd-0.3.14.amd64.tgz
tar -xf cri-dockerd-0.3.14.amd64.tgz
sudo mv ./cri-dockerd/cri-dockerd /usr/local/bin/
rm -rf ./cri-dockerd
wget https://raw.githubusercontent.com/Mirantis/cri-dockerd/master/packaging/systemd/cri-docker.service
wget https://raw.githubusercontent.com/Mirantis/cri-dockerd/master/packaging/systemd/cri-docker.socket
sudo mv cri-docker.socket cri-docker.service /etc/systemd/system/
sudo sed -i -e 's,/usr/bin/cri-dockerd,/usr/local/bin/cri-dockerd,' /etc/systemd/system/cri-docker.service
systemctl daemon-reload
systemctl enable cri-docker.service
systemctl enable --now cri-docker.socket
systemctl status cri-docker.socket
Before proceeding, confirm the IP addresses of both nodes, as you will need them in later steps.
- Check the IP address on each node.
On Node1 (Master Node):
hostname -I
Take note of the IP address; this will be referred to as Node1_IP.
On Node2 (Worker Node):
hostname -I
Take note of the IP address; this will be referred to as Node2_IP.
Now, proceed with the steps below on Node1 (Master Node).
Use the --cri-socket unix:///var/run/cri-dockerd.sock
option to specify cri-dockerd, and the --pod-network-cidr=192.168.0.0/16
option for the Calico CNI that will be installed later.
sudo kubeadm init --cri-socket unix:///var/run/cri-dockerd.sock --pod-network-cidr=192.168.0.0/16
If successful, you should see:
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join Node1_IP:6443 --token <token> \
--discovery-token-ca-cert-hash <hash>
Note: Save the join command with the token and other details, as it will be needed to register the worker nodes.
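If you lose the join command, or the token expires (tokens are valid for 24 hours by default), you can generate a new one on the master node:
kubeadm token create --print-join-command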
This step should be done under a regular user account, not as root.
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Verify the setup:
kubectl get nodes
Run the following command on the worker node (Node2) to join it to the cluster.
Note: Be sure to include --cri-socket unix:///var/run/cri-dockerd.sock
as you did during the master node setup.
kubeadm join Node1_IP:6443 --cri-socket unix:///var/run/cri-dockerd.sock --token <token> --discovery-token-ca-cert-hash <hash>
Example:
kubeadm join 10.10.10.10:6443 --cri-socket unix:///var/run/cri-dockerd.sock --token kxsoro.qn88d098x5ifyw20 \
--discovery-token-ca-cert-hash sha256:4fec907958f294e349f2a86f5044c8f1fc9c01998dc19f6457751b83fc2cf9bd
If there are issues, ensure that the firewall is not blocking the connection.
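For example, on a master node protected by ufw you could open the Kubernetes API server port (6443); adjust this to whatever firewall tooling your nodes actually use.
sudo ufw allow 6443/tcp
sudo ufw status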
Install the Calico network plugin on the master node.
Refer to the Calico documentation for more details.
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.28.1/manifests/tigera-operator.yaml
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.28.1/manifests/custom-resources.yaml
watch kubectl get pods -n calico-system
After confirming that the pods are running, exit the watch command with Ctrl + C.
On the master node, verify the status of the nodes:
kubectl get nodes
Both nodes should be listed as "Ready":
NAME STATUS ROLES AGE VERSION
node1 Ready control-plane 114s v1.27.16
node2 Ready <none> 89s v1.27.16
You can further verify by running a test pod:
kubectl apply -f https://k8s.io/examples/pods/simple-pod.yaml
kubectl get pods
kubectl delete pod nginx
We'll set up an NFS server on the master node using a Docker container from erichough/nfs-server.
- Create and configure the NFS directory:
sudo mkdir -p /nfs-export/mpi-dir/
sudo chown -R $USER:$USER /nfs-export
sudo chmod a+w /nfs-export/mpi-dir/
sudo modprobe nfs nfsd rpcsec_gss_krb5
- Run the NFS server container:
Before running the NFS server container, ensure the export includes the IP addresses of both nodes (Node1_IP and Node2_IP). Modify the NFS_EXPORT_0 environment variable accordingly.
docker run --name nfs-server --rm -d \
  -v /nfs-export:/nfs-export \
  -e NFS_EXPORT_0='/nfs-export $YOUR_SUBNET(rw,root_squash,no_subtree_check,async,fsid=0)' \
  --privileged \
  -p 2049:2049 \
  erichough/nfs-server
For example, if Node1_IP is 10.10.15.51 and Node2_IP is 10.10.15.52, and both nodes are within the same subnet (e.g., 10.10.15.0/24), you can specify that subnet:
docker run --name nfs-server --rm -d \
  -v /nfs-export:/nfs-export \
  -e NFS_EXPORT_0='/nfs-export 10.10.15.0/24(rw,root_squash,no_subtree_check,async,fsid=0)' \
  --privileged \
  -p 2049:2049 \
  erichough/nfs-server
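You can check the container logs to confirm that the export was created and the server is ready to accept connections (the exact wording of the output may vary):
docker logs nfs-server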
To test access from the worker node:
- On Node2 (Worker Node):
sudo mkdir /mpi-dir
sudo mount -v -t nfs4 Node1_IP:/mpi-dir /mpi-dir
echo "Test" > /mpi-dir/testfile
This process may take some time.
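If the mount fails because the mount.nfs4 helper is missing, the worker node likely lacks the NFS client utilities; on Ubuntu they can be installed with the package below, then retry the mount:
sudo apt-get install -y nfs-common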
- Unmount and clean up:
sudo umount /mpi-dir
sudo rm -rf /mpi-dir
Create a Persistent Volume (PV) and a Persistent Volume Claim (PVC) to manage shared storage:
- Create a nfs-pv-pvc.yaml file. Replace <NODE1_IP> with the IP address of your master node.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  mountOptions:
    - nfsvers=4.1
  nfs:
    path: /mpi-dir
    server: <NODE1_IP>
  persistentVolumeReclaimPolicy: Retain
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-pvc
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
  volumeName: nfs-pv
- Apply the configuration:
kubectl apply -f nfs-pv-pvc.yaml
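After applying, you can confirm that the volume and claim are bound before deploying workloads:
kubectl get pv,pvc   # both nfs-pv and nfs-pvc should report STATUS Bound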
- Deploy an application using the shared storage. We'll use the guswns531/mpi-docker:v1 image, which you created in this repository.
Before deploying, make sure to obtain the hostnames of your nodes using the following command:
kubectl get nodes
Use the hostnames retrieved from this command in the nodeSelector fields below, and save the manifest as mpi-deployment.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mpi-deployment-1
  labels:
    app: mpi
spec:
  replicas: 2  # Number of pod replicas to be created for this deployment
  selector:
    matchLabels:
      app: mpi
  template:
    metadata:
      labels:
        app: mpi
    spec:
      containers:
        - name: mpi-node
          image: guswns531/mpi-docker:v1  # Custom Docker image used for MPI computations
          volumeMounts:
            - name: workspace
              mountPath: /mpi-dir  # Mounts the NFS shared directory inside the container
          ports:
            - containerPort: 22  # SSH port used for MPI communication between nodes
      volumes:
        - name: workspace
          persistentVolumeClaim:
            claimName: nfs-pvc  # Binds the NFS Persistent Volume Claim to this deployment
      nodeSelector:
        kubernetes.io/hostname: <Node1_Hostname>  # Replace with the hostname of Node1 (master)
      tolerations:
        - key: "node-role.kubernetes.io/control-plane"
          operator: "Exists"
          effect: "NoSchedule"  # Allows scheduling on control-plane nodes despite the NoSchedule taint
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mpi-deployment-2
  labels:
    app: mpi
spec:
  replicas: 2  # Number of pod replicas to be created for this deployment
  selector:
    matchLabels:
      app: mpi
  template:
    metadata:
      labels:
        app: mpi
    spec:
      containers:
        - name: mpi-node
          image: guswns531/mpi-docker:v1  # Custom Docker image used for MPI computations
          volumeMounts:
            - name: workspace
              mountPath: /mpi-dir  # Mounts the NFS shared directory inside the container
          ports:
            - containerPort: 22  # SSH port used for MPI communication between nodes
      volumes:
        - name: workspace
          persistentVolumeClaim:
            claimName: nfs-pvc  # Binds the NFS Persistent Volume Claim to this deployment
      nodeSelector:
        kubernetes.io/hostname: <Node2_Hostname>  # Replace with the hostname of Node2 (worker node)
Replace <Node1_Hostname> and <Node2_Hostname> with the actual hostnames obtained from kubectl get nodes. This ensures the deployments are scheduled on the appropriate nodes.
- Apply the deployment:
kubectl apply -f mpi-deployment.yaml
- Verify the setup:
kubectl get pods -o wide
You should see the pods running across the nodes with access to shared storage.
NAME                                READY   STATUS    RESTARTS   AGE   IP                NODE   NOMINATED NODE   READINESS GATES
mpi-deployment-1-648bcf9c87-7fkfz   1/1     Running   0          8s    192.168.181.17    ns01   <none>           <none>
mpi-deployment-1-648bcf9c87-xxp2w   1/1     Running   0          8s    192.168.181.18    ns01   <none>           <none>
mpi-deployment-2-cd8fbcf5-bkdv7     1/1     Running   0          8s    192.168.205.214   ns02   <none>           <none>
mpi-deployment-2-cd8fbcf5-whclg     1/1     Running   0          8s    192.168.205.215   ns02   <none>           <none>
We will use the pod IPs listed above for the next steps. Be sure to use these IPs when accessing the pods via SSH and for testing the MPI application.
- Create a hostfile:
kubectl get pods -o wide --no-headers | awk '{print $6" slots=1"}' > /nfs-export/mpi-dir/hosts
cat /nfs-export/mpi-dir/hosts
The hostfile should look like this:
192.168.181.17 slots=1
192.168.181.18 slots=1
192.168.205.214 slots=1
192.168.205.215 slots=1
- Write the hello_mpi.c program:
#include <mpi.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    char hostname[256];
    gethostname(hostname, 256);

    printf("Hello from rank %d out of %d processors on %s\n", world_rank, world_size, hostname);

    MPI_Finalize();
    return 0;
}
Copy the file to the shared NFS directory:
cp hello_mpi.c /nfs-export/mpi-dir
- Compile and run the MPI application:
First, select one of the pod IPs obtained in the earlier steps. For example, if you choose 192.168.181.17 as your pod IP, connect to it via SSH:
ssh [email protected] # password: mpi
Once logged in, compile the MPI application:
mpicc -o /mpi-dir/hello_mpi /mpi-dir/hello_mpi.c
Then, run the MPI application using the hostfile located in the shared NFS directory:
mpirun -np 4 -hostfile /mpi-dir/hosts /mpi-dir/hello_mpi
Expected output:
Hello from rank 0 out of 4 processors on mpi-deployment-1-648bcf9c87-7fkfz
Hello from rank 1 out of 4 processors on mpi-deployment-1-648bcf9c87-xxp2w
Hello from rank 2 out of 4 processors on mpi-deployment-2-cd8fbcf5-bkdv7
Hello from rank 3 out of 4 processors on mpi-deployment-2-cd8fbcf5-whclg
If you need to reset the Kubernetes setup on both nodes:
sudo kubeadm reset --cri-socket unix:///var/run/cri-dockerd.sock
docker stop nfs-server
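Depending on how far you got, you may also want to remove leftover configuration; the paths below are the standard defaults, so adjust them to your environment:
sudo rm -rf $HOME/.kube      # kubectl configuration created during setup
sudo rm -rf /etc/cni/net.d   # CNI configuration left behind by Calico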