The goal of this set of scripts is to eventually evolve into a one-click solution for getting a data science environment up and running.
- One click installation of DSA Environment
- Install JupyterHub on a k3d environment
- Eventually make this cloud agnostic
- Currently has a k3d set of features working with JupyterHub
- Has an Azure set of features working with Elasticsearch, Kibana and JupyterHub
- To get JupyterHub on Azure, follow the steps in the JupyterHub setup section while ignoring the k3d cluster creation portion
- Optional setup for Kubernetes Dashboard is included
After installing k3d by following the k3d.io documentation, follow these steps to set up your cluster for JupyterHub:
Note: this cluster currently doesn't work with the Elasticsearch installation
#Create Cluster with chosen name
k3d cluster create <NAME> --agents 3
#Allow Kubeconfig to use k3d cluster
k3d kubeconfig merge <NAME> --switch-context
#Use the new cluster with kubectl
kubectl get nodes
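The `kubectl get nodes` check above can also be scripted. The following is a minimal sketch, assuming `kubectl` is on the PATH and already points at the new cluster; the helper names `unready_nodes` and `fetch_nodes` are hypothetical and not part of these scripts. It reads the JSON form of the node list and reports any node whose `Ready` condition is not `True`.

```python
import json
import subprocess
from typing import List


def unready_nodes(node_list: dict) -> List[str]:
    """Return the names of nodes whose Ready condition is not "True"."""
    unready = []
    for node in node_list.get("items", []):
        conditions = node.get("status", {}).get("conditions", [])
        ready = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in conditions
        )
        if not ready:
            unready.append(node["metadata"]["name"])
    return unready


def fetch_nodes() -> dict:
    """Run `kubectl get nodes -o json` against the current context (assumed to be the new cluster)."""
    result = subprocess.run(
        ["kubectl", "get", "nodes", "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)
```

Calling `unready_nodes(fetch_nodes())` should return an empty list once the server and all agents have joined.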
az aks create -g matheesanmKubeEnv --name elasticCluster --node-count 7 --generate-ssh-keys
az aks get-credentials --resource-group matheesanmKubeEnv --name elasticCluster
#Install
curl https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 | bash
#Verify helm is installed
helm list
#Generate a random hex string representing 32 bytes AND Store this to use as a security token
openssl rand -hex 32
#Create and start editing a file called config.yaml
nano config.yaml
#The following is mandatory for jupyterHub yaml file
proxy:
  secretToken: "<RANDOM_HEX>"
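If you would rather produce the token and the mandatory YAML in one step, here is a small sketch using only the Python standard library. `secrets.token_hex(32)` returns 32 random bytes as 64 hex characters, the same shape as the `openssl rand -hex 32` output; the helper name `render_config` is hypothetical.

```python
import secrets


def render_config(token: str) -> str:
    """Return a minimal config.yaml body that sets only proxy.secretToken."""
    return f'proxy:\n  secretToken: "{token}"\n'


# 32 random bytes rendered as 64 hex characters, like `openssl rand -hex 32`.
token = secrets.token_hex(32)
print(render_config(token))
```

The printed text can be pasted into config.yaml, or redirected to the file from the shell.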
The following portions of YAML code are Optional and can be added to config.yaml, but are not used in this implementation:
singleuser:
  memory:
    limit: 1G
    guarantee: 1G
  cpu:
    limit: 1
    guarantee: 1
  storage:
    type: none
#Install JupyterHub:
helm repo add jupyterhub https://jupyterhub.github.io/helm-chart/
#Update Helm
helm repo update
#Variables to use the following
RELEASE=jhub
NAMESPACE=jhub
#Now install the chart configured by your config.yaml
helm upgrade --cleanup-on-fail --install jhub jupyterhub/jupyterhub --namespace jhub --create-namespace --version=0.9.0 --values config.yaml
#See the pods being created
kubectl get pod --namespace jhub
#The external IP of proxy-public should be available if the load balancer is configured correctly (this has yet to be tested)
kubectl get service --namespace jhub
#If you don't get an external IP, you can still reach JupyterHub by running the following command and going to localhost:8080
kubectl port-forward -n jhub svc/proxy-public 8080:80
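While the port-forward is running, a quick way to confirm that something answers on localhost:8080 is to poll it. This is a minimal sketch using only the standard library; the function name `wait_for_http` and its default timings are assumptions chosen for illustration.

```python
import time
import urllib.error
import urllib.request


def wait_for_http(url: str, timeout: float = 60.0, interval: float = 2.0) -> bool:
    """Poll url until it answers with any HTTP status, or until timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=5):
                return True
        except urllib.error.HTTPError:
            # The server answered, just not with a 2xx status; it is reachable.
            return True
        except (urllib.error.URLError, OSError):
            # Nothing is listening yet (connection refused, timed out, ...).
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_for_http("http://localhost:8080/")` returns `True` as soon as the forwarded proxy-public service responds, and `False` if nothing answers within the timeout.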
When you want to modify the config.yaml file, follow these steps:
- Make a change to your config.yaml.
- Run a helm upgrade:
In this case, your release can be found through helm list
#Keep --version the same as the chart version you installed (0.9.0 above)
RELEASE=jhub
NAMESPACE=jhub
helm upgrade --cleanup-on-fail \
  $RELEASE jupyterhub/jupyterhub \
  --namespace $NAMESPACE \
  --version=0.9.0 \
  --values config.yaml
- Verify that the hub and proxy pods entered the Running state after the upgrade completed.
NAMESPACE=jhub
kubectl get pod --namespace $NAMESPACE
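The "verify that the pods are Running" step can be automated by reading the JSON form of the pod list. The sketch below assumes `kubectl` is configured for the cluster and that the namespace is `jhub` as above; `pods_not_running` and `fetch_pods` are hypothetical helper names.

```python
import json
import subprocess
from typing import Dict


def pods_not_running(pod_list: dict) -> Dict[str, str]:
    """Map pod name to phase for every pod whose phase is not Running."""
    return {
        pod["metadata"]["name"]: pod.get("status", {}).get("phase", "Unknown")
        for pod in pod_list.get("items", [])
        if pod.get("status", {}).get("phase") != "Running"
    }


def fetch_pods(namespace: str = "jhub") -> dict:
    """Run `kubectl get pod -o json` in the given namespace and parse the output."""
    result = subprocess.run(
        ["kubectl", "get", "pod", "--namespace", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)
```

An empty result from `pods_not_running(fetch_pods())` means every pod in the namespace reports the Running phase. Note that a pod in the Running phase may still have containers that are not yet ready.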
az aks create -g <resourceGroupName> --name <kubernetesCluster> --node-count 7 --generate-ssh-keys
az aks get-credentials --resource-group <resourceGroupName> --name <kubernetesCluster>
kubectl apply -f https://download.elastic.co/downloads/eck/1.4.0/all-in-one.yaml
#Log checking:
kubectl -n elastic-system logs -f statefulset.apps/elastic-operator
cat <<EOF | kubectl apply -f -
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  version: 7.11.1 #Make sure you use the version of your choice
  http:
    service:
      spec:
        type: LoadBalancer #Adds an external IP
  nodeSets:
  - name: default
    count: 1
    config:
      node.master: true
      node.data: true
      node.ingest: true
      node.store.allow_mmap: false
EOF
You should eventually see the quickstart-es-http service with an IP for the External Load Balancer. Take note of this IP as you will need it later on.
kubectl get elasticsearch
kubectl get pods -w
kubectl logs -f quickstart-es-default-0
kubectl get service quickstart-es-http
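To pick the external address out of the service programmatically rather than reading it off the table, you can parse `kubectl get service quickstart-es-http -o json`. This is a sketch under the assumption that the load balancer publishes its address in `status.loadBalancer.ingress`, where some providers set `ip` and others set `hostname`; the function name `external_address` is hypothetical.

```python
from typing import Optional


def external_address(service: dict) -> Optional[str]:
    """Return the first load balancer IP or hostname, or None while it is still pending."""
    ingress = service.get("status", {}).get("loadBalancer", {}).get("ingress") or []
    for entry in ingress:
        address = entry.get("ip") or entry.get("hostname")
        if address:
            return address
    return None
```

A `None` result corresponds to the `<pending>` state shown in the EXTERNAL-IP column.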
The username is always elastic. Retrieve the password with the first command below, then either curl Elasticsearch or visit https://<ExternalIPofES>:9200 in a browser and log in. Replace the IP in the curl command with the external IP of your quickstart-es-http service.
PASSWORD=$(kubectl get secret quickstart-es-elastic-user -o=jsonpath='{.data.elastic}' | base64 --decode)
curl -u "elastic:$PASSWORD" -k "https://52.147.212.172:9200"
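Kubernetes stores secret values base64-encoded, which is why the command above pipes through `base64 --decode`. The same decoding in Python, as a small sketch that works on the output of `kubectl get secret quickstart-es-elastic-user -o json`; the helper name `decode_secret_value` is hypothetical.

```python
import base64


def decode_secret_value(secret: dict, key: str) -> str:
    """Decode one base64-encoded entry from a Kubernetes Secret's data map."""
    return base64.b64decode(secret["data"][key]).decode("utf-8")
```

For this cluster the key is `elastic`, matching the `{.data.elastic}` jsonpath expression.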
cat <<EOF | kubectl apply -f -
apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: quickstart
spec:
  version: 7.11.1 #Make sure Kibana and Elasticsearch are on the same version.
  http:
    service:
      spec:
        type: LoadBalancer #Adds an external IP
  count: 1
  elasticsearchRef:
    name: quickstart
EOF
kubectl get kibana
kubectl get -n elasticSearch
curl -u "elastic:$PASSWORD" -k "https://52.147.212.172:9200"
The following is only needed if you're not using an external IP. For now (until a proper NodePort setup is added to the YAML), run these commands in the background:
kubectl proxy
kubectl port-forward service/quickstart-es-http 9200
kubectl port-forward service/quickstart-kb-http 5601
kubectl get secret quickstart-es-elastic-user -o=jsonpath='{.data.elastic}' | base64 --decode
JupyterHub Installation (assuming the config file and Helm are already set up)
helm upgrade --cleanup-on-fail --install jhub jupyterhub/jupyterhub --namespace jhub --create-namespace --version=0.9.0 --values config.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.0.0/aio/deploy/recommended.yaml
kubectl apply -f https://gist.githubusercontent.com/chukaofili/9e94d966e73566eba5abdca7ccb067e6/raw/0f17cd37d2932fb4c3a2e7f4434d08bc64432090/k8s-dashboard-admin-user.yaml
kubectl describe sa admin-user -n kube-system
kubectl describe secret admin-user-token-ls25k -n kube-system
Then follow the rest of the JupyterHub config as you would on k3d, but exclude the cluster creation portion to get JupyterHub on Azure.
pip install elasticsearch
pip install pandas
import requests
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
response = requests.get('https://<ExternalIPofES>:9200', verify=False, auth=('elastic', '<base64-decoded_password>'))
print(response.text)
try:
    import os
    import sys
    from elasticsearch import Elasticsearch
    import pandas as pd
    print('All modules loaded')
except Exception as e:
    print("some error {}".format(e))
es = Elasticsearch(['https://elastic:<base64-decoded_password>@<ExternalIPofES>:9200/'], verify_certs=False)
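When the credentials are embedded in the URL like this, characters such as `@`, `/` or `:` in the password would break URL parsing unless they are percent-encoded. The sketch below builds the connection URL with the standard library; `es_url` is a hypothetical helper, and it assumes the client decodes percent-encoded credentials, which is worth verifying for your elasticsearch-py version.

```python
from urllib.parse import quote


def es_url(host: str, password: str, user: str = "elastic", port: int = 9200) -> str:
    """Build an https URL with percent-encoded credentials for the Elasticsearch endpoint."""
    return f"https://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}/"
```

The returned string can be passed in place of the hand-written URL. Passwords made only of letters and digits come out unchanged.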
## importing socket module
import socket
## getting the hostname by socket.gethostname() method
hostname = socket.gethostname()
## getting the IP address using socket.gethostbyname() method
ip_address = socket.gethostbyname(hostname)
## printing the hostname and ip_address
print(f"Hostname: {hostname}")
print(f"IP Address: {ip_address}")
--> See the latest 4 images for current state
Latest TODO (connect JupyterHub to Elasticsearch using Python): https://elasticsearch-py.readthedocs.io/en/7.10.0/
Look into HELK: https://github.com/Cyb3rWard0g/HELK
Elastic Docs for Kube: https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-deploy-elasticsearch.html
TODO Soon: Add logstash to act as middleware for ES and Kafka: https://medium.com/@tharangarajapaksha/elk-stack-in-k8s-cluster-13bb509185e0 https://towardsdatascience.com/the-basics-of-deploying-logstash-pipelines-to-kubernetes-94a470ad34d9
Littlest JupyterHub Installation --> Doesn't fully support Docker env
- Hub running w elastic kid, through YAML, controller
- Be cloud agnostic
- Cluster cost, build up and tear down is going to be annoying
- We should learn how to do it in Amazon and Google
- Have ability to build scripts on all 3
- If we show them it in Azure