Homework 15 (Docker-2)
- What is the difference between a container and an image? The main difference is the writable top layer. To create a container, the Docker engine takes an image, adds a writable top layer on it, and initializes various parameters (network ports, container name, identifier, resource limits).
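- A quick way to see the writable layer in action (illustrative; the image and names are arbitrary):
docker run --name demo alpine:3.9 touch /hello
docker diff demo                # prints "A /hello" - the change lives only in the container's top layer
docker commit demo demo-image   # bakes that layer into a new image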
- The finished infrastructure for the reddit docker app has the following layout
- Infra
  - ansible
    - environments
      - inventory.gcp.yml
    - playbooks
      - base.yml
      - deploy.yml
      - docker.yml
      - site.yml
    - ansible.cfg
    - requirements.txt
  - packer
    - docker.json
    - variables.json.example
  - terraform
    - main.tf
    - outputs.tf
    - terraform.tfvars.example
    - variables.tf
- We bake python, pip, docker.io, and the docker pip module into the image (Packer + Ansible provisioning)
- With Terraform, we deploy the required number of instances from the baked image
- We run a playbook that checks that everything is installed, pulls the docker image, and launches it (a sketch of such a playbook follows below)
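- A minimal sketch of what such a deploy playbook could look like (pip and docker_container are real Ansible modules, but the task layout and image tag are assumptions, not the exact playbooks/deploy.yml):
- name: Deploy the reddit app container
  hosts: all
  become: true
  tasks:
    - name: Ensure the Docker SDK for Python is present
      pip:
        name: docker
        state: present
    - name: Pull the image and run the container
      docker_container:
        name: reddit-app
        image: androsovm/ui:1.0
        state: started
        pull: true
        published_ports:
          - "9292:9292"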
Homework 16 (Docker-3)
To start the containers with new network aliases and environment variables without rebuilding the images, use the following commands
docker run -d --network=reddit --network-alias=app_post_db --network-alias=app_comment_db mongo:latest
docker run -d --network=reddit --network-alias=app_post --env POST_DATABASE_HOST=app_post_db androsovm/post:1.0
docker run -d --network=reddit --network-alias=app_comment --env COMMENT_DATABASE_HOST=app_comment_db androsovm/comment:1.0
docker run -d --network=reddit -p 9292:9292 --env POST_SERVICE_HOST=app_post --env COMMENT_SERVICE_HOST=app_comment androsovm/ui:1.0
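- Note: these runs assume the user-defined bridge network already exists; create it first with
docker network create reddit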
- /ui/Dockerfile
FROM alpine:3.9
RUN apk --no-cache update && apk --no-cache --update add \
ruby-full ruby-dev build-base ruby-bundler \
&& bundle install \
&& bundle clean --force
androsovm/ui 2.0 4f32edbbdc96 3 hours ago 430MB
androsovm/ui 4.0 b733a4f805f9 About a minute ago 236MB
- /comment/Dockerfile
FROM alpine:3.9
RUN apk --no-cache update && apk --no-cache --update add \
ruby-full ruby-dev build-base ruby-bundler \
&& bundle install \
&& bundle clean --force
androsovm/comment 1.0 f2b8bb71005e 4 hours ago 784MB
androsovm/comment 3.0 1de43db40158 About a minute ago 233MB
- /post-py/Dockerfile
RUN apk --no-cache --update add build-base && \
pip install --no-cache-dir -r /app/requirements.txt && \
apk del build-base
androsovm/post 1.0 67d1538d796c 8 hours ago 110MB
androsovm/post 2.0 82b1e3091aa8 2 hours ago 106MB
To speed up rebuilds, we also replace the ADD instructions with COPY and reorder the Dockerfile so that rarely-changing steps (package installation) come first and frequently-changing steps (copying the application source) come last, letting Docker reuse cached layers.
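- A minimal sketch of this ordering for a Ruby service (the paths and Gemfile-based install are illustrative assumptions):
FROM alpine:3.9
RUN apk --no-cache add ruby-full ruby-dev build-base ruby-bundler
WORKDIR /app
# dependencies change rarely: copy only the Gemfile first so this layer stays cached
COPY Gemfile* ./
RUN bundle install && bundle clean --force
# application source changes often: copy it last so edits invalidate only the final layers
COPY . .
CMD ["puma"]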
Homework 17 (Docker-4)
- See the docker-compose.yml and .env.example
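- For reference, a minimal sketch of the compose file structure (the services, networks, and volume are inferred from the output below; image variables are assumed to come from .env):
version: '3.3'
services:
  post_db:
    image: mongo:${MONGO_VER}
    volumes:
      - post_db:/data/db
    networks:
      - back_net
  ui:
    image: ${USERNAME}/ui:${UI_VER}
    ports:
      - 9292:9292
    networks:
      - front_net
volumes:
  post_db:
networks:
  front_net:
  back_net: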
docker-compose [-f <arg>...] [options] [COMMAND] [ARGS...]
-p, --project-name NAME Specify an alternate project name
(default: directory name)
Example:
docker-compose -p hm17 up -d
Creating network "hm17_front_net" with the default driver
Creating network "hm17_back_net" with the default driver
Creating volume "hm17_post_db" with default driver
...
We can also name containers using docker-compose.yml
some_service:
container_name: name_name_name
- We need to copy the source to the docker host
docker-machine scp -r ui/ docker-host:/home/docker-user/ui
docker-machine scp -r comment/ docker-host:/home/docker-user/comment
docker-machine scp -r post-py/ docker-host:/home/docker-user/post-py
- Created a docker-compose.override.yml file
...
ui:
volumes:
- /home/docker-user/ui:/app
command: 'puma --debug -w 2'
post:
volumes:
- /home/docker-user/post-py:/app
comment:
volumes:
- /home/docker-user/comment:/app
command: 'puma --debug -w 2'
volumes:
ui:
post:
comment:
- Start and check
docker-compose -f docker-compose.yml -f docker-compose.override.yml up -d
docker ps
Homework 18 (gitlab-ci-1)
- In order to run containers from within CI jobs (DinD via the host's docker socket), we need to re-register gitlab-runner
docker exec -it gitlab-runner gitlab-runner register --run-untagged --locked=false --docker-volumes /var/run/docker.sock:/var/run/docker.sock
- In build_job:, add a docker image
image: docker:latest
- We can use the Dockerfile from previous lessons (docker-monolith)
script:
- echo 'Building'
- cd docker-monolith
- docker build -t gitlab-docker-app:1.0 .
- Now we need to refine test_unit_job:, adding an image and moving the commands from before_script: into script:
test_unit_job:
image: ruby:2.4.2
stage: test
services:
- mongo:latest
script:
- cd reddit
- bundle install
- ruby simpletest.rb
- The easy way: 1.1) Since we can (in theory) run any number of gitlab-runner containers on one machine, we simply launch a new one
docker run -d --name gitlab-runner2 --restart always \
-v /srv/gitlab-runner/config:/etc/gitlab-runner \
-v /var/run/docker.sock:/var/run/docker.sock \
gitlab/gitlab-runner:latest
1.2) And take advantage of non-interactive gitlab-runner registration
docker exec gitlab-runner2 gitlab-runner register \
--locked=false \
--non-interactive \
--url http://34.107.83.160/ \
--registration-token v3aNxnjLdRzwYUpmf19e \
--description "Docker Runner" \
--tag-list "linux,bionic,ubuntu,docker" \
--executor docker \
--docker-image "alpine:latest" \
--docker-volumes /var/run/docker.sock:/var/run/docker.sock
1.3) These steps can be repeated as many times as needed by simply changing the container name, for example in a loop as sketched below
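- A sketch that spins up several runners in a loop; note each runner gets its own config volume so registrations do not collide:
for i in 2 3 4; do
  docker run -d --name gitlab-runner${i} --restart always \
    -v /srv/gitlab-runner/config${i}:/etc/gitlab-runner \
    -v /var/run/docker.sock:/var/run/docker.sock \
    gitlab/gitlab-runner:latest
done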
- The hard way:
  2.1) Use the ready-made role from Ansible Galaxy: https://galaxy.ansible.com/riemers/gitlab-runner
  2.2) Deploy the instances with Terraform
  2.3) Bake an image with docker and gitlab-runner using Packer
- Slack chat integration - #mikhail_androsov in devops-team-otus.slack.com
Homework 19 (monitoring-1)
- We can take this exporter https://github.com/percona/mongodb_exporter
- Clone the repository
git clone https://github.com/percona/mongodb_exporter.git
- Go to the folder with the repository and do docker build
docker build -t ${USERNAME}/mongodb-exporter:1.0 .
- Now add the mongodb-exporter service to docker-compose.yml
mongodb-exporter:
image: ${USERNAME}/mongodb-exporter:1.0
container_name: mongodb-exporter
command:
- '--mongodb.uri=mongodb://post_db:27017'
networks:
- back_net
- Run docker-compose
docker-compose up -d
- We can use the official image from Docker Hub: https://hub.docker.com/r/prom/blackbox-exporter
- blackbox_exporter needs a configuration file to work, so create one
modules:
  tcp_connect:
    prober: tcp
    timeout: 5s
  http_2xx:
    prober: http
    timeout: 5s
    http:
- Create a new image based on prom/blackbox-exporter and add the config to it.
FROM prom/blackbox-exporter:v0.16.0
ADD blackbox.yml /config/
- Do docker build
docker build -t ${USERNAME}/blackbox-exporter:1.0 .
- Now add the blackbox-exporter service to docker-compose.yml
blackbox-exporter:
  image: ${USERNAME}/blackbox-exporter:1.0
  container_name: blackbox-exporter
  ports:
    - '9115:9115'
  command:
    - '--config.file=/config/blackbox.yml'
  networks:
    - back_net
- Now we need to update the prometheus.yml configuration file. We will probe availability over both HTTP and TCP
- job_name: 'blackbox-tcp_connect'
  metrics_path: /probe
  params:
    module: [tcp_connect]
  static_configs:
    - targets:
      - '34.78.221.243:9292'
  relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: instance
    - target_label: __address__
      replacement: "blackbox-exporter:9115"
- job_name: 'blackbox-http'
  metrics_path: /probe
  params:
    module: [http_2xx]
  static_configs:
    - targets:
      - '34.78.221.243:9292'
  relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: instance
    - target_label: __address__
      replacement: "blackbox-exporter:9115"
- Rebuild the Prometheus image to include the updated configuration file
docker build -t ${USERNAME}/prometheus .
- Run docker-compose
docker-compose up -d
- See Makefile
- make - build & push all images
- make build_all - only build all images
- make push_all - only push all images
Homework 20 (monitoring-2)
- We will use the setup instructions - https://docs.docker.com/config/daemon/prometheus/
- On the docker-machine host, edit /etc/docker/daemon.json
{
"metrics-addr" : "0.0.0.0:9323",
"experimental" : true
}
- prometheus.yml
...
- job_name: 'docker'
static_configs:
- targets:
- '34.78.221.243:9323'
- Do not forget to reload the docker daemon
sudo systemctl daemon-reload
sudo systemctl restart docker
- For Grafana, download a ready-made dashboard - https://grafana.com/grafana/dashboards/1229
- Create a new file: /monitoring/telegraf/telegraf.conf
[[outputs.prometheus_client]]
listen = ":9126"
[[inputs.docker]]
endpoint = "unix:///var/run/docker.sock"
container_names = []
timeout = "5s"
perdevice = false
total = false
- Create a new Dockerfile: /monitoring/telegraf/Dockerfile
FROM telegraf:1.14.3-alpine
ADD telegraf.conf /etc/telegraf/
- Build the image
docker build -t $USER_NAME/telegraf .
- Edit docker-compose-monitoring.yml
telegraf:
image: ${USER_NAME}/telegraf
container_name: telegraf
volumes:
- /var/run/docker.sock:/var/run/docker.sock
networks:
- back_net
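- Prometheus also needs a scrape job for the new endpoint; presumably something like this goes into prometheus.yml (the port comes from telegraf.conf above):
- job_name: 'telegraf'
  static_configs:
    - targets:
      - 'telegraf:9126'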
- The Grafana dashboard is stored at /monitoring/grafana/dashboards/Telegraf_Docker_Monitorings.json
- monitoring/alertmanager/config.yml
route:
receiver: 'slack-email-notifications'
receivers:
- name: 'slack-email-notifications'
slack_configs:
- channel: '#mikhail_androsov'
email_configs:
- to: $GMAIL_ACCOUNT
from: $GMAIL_ACCOUNT
smarthost: smtp.gmail.com:587
auth_username: $GMAIL_ACCOUNT
auth_identity: $GMAIL_ACCOUNT
auth_password: $GMAIL_PASSWORD
- Create a provisioning folder (monitoring/grafana/provisioning)
- Create a dashboards subfolder (monitoring/grafana/provisioning/dashboards) and a datasources subfolder (monitoring/grafana/provisioning/datasources)
- Create a dash.yml file (monitoring/grafana/provisioning/dashboards/dash.yml)
- name: 'default'
org_id: 1
folder: ''
type: 'file'
options:
folder: '/var/lib/grafana/dashboards'
- Create a data.yml file (monitoring/grafana/provisioning/datasources/data.yml)
datasources:
- access: 'proxy'
editable: true
is_default: true
name: 'Prometheus server'
org_id: 1
type: 'prometheus'
url: 'http://prometheus:9090'
version: 1
- Create a Dockerfile (monitoring/grafana/Dockerfile) and add our data to the docker image
FROM grafana/grafana:5.0.0
ADD ./provisioning /etc/grafana/provisioning
ADD ./dashboards /var/lib/grafana/dashboards
- Build image
docker build -t $USER_NAME/grafana .
- Update file docker-compose-monitoring.yml
...
grafana:
image: ${USER_NAME}/grafana
...
- Restart all containers and remove the Grafana volume (using the Makefile)
make stop
docker volume rm docker_grafana_data
or
docker-compose down
docker-compose -f docker-compose-monitoring.yaml down
docker volume rm docker_grafana_data
- Start all containers (using the Makefile)
make run
or
docker-compose up -d
docker-compose -f docker-compose-monitoring.yaml up -d
- Create a folder stackdriver (monitoring/stackdriver)
- We will use the ready-made image prometheuscommunity/stackdriver-exporter:v0.9.0. For it to work, we need GCP account credentials (a service-account key file).
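- A key file can be created like this (the service-account name and project are illustrative):
gcloud iam service-accounts keys create project.json \
  --iam-account stackdriver-exporter@MY_PROJECT.iam.gserviceaccount.com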
- Create a Dockerfile file (monitoring/stackdriver/Dockerfile)
FROM prometheuscommunity/stackdriver-exporter:v0.9.0
ADD ./project.json /key/project.json
- Build image
docker build -t $USER_NAME/stackdriver .
- Update the Prometheus configuration and build image
...
- job_name: 'stackdriver'
static_configs:
- targets:
- 'stackdriver:9255'
...
docker build -t $USER_NAME/prometheus .
- Update configuration docker-compose-monitoring.yml
...
stackdriver:
image: ${USER_NAME}/stackdriver
container_name: stackdriver
environment:
- GOOGLE_APPLICATION_CREDENTIALS=/key/project.json
- STACKDRIVER_EXPORTER_GOOGLE_PROJECT_ID=PROJECT_NAME
- STACKDRIVER_EXPORTER_MONITORING_METRICS_TYPE_PREFIXES=compute.googleapis.com/instance,pubsub.googleapis.com/subscription,redis.googleapis.com/stats
ports:
- '9255:9255'
networks:
- back_net
...
- Do not push the stackdriver image to Docker Hub - it contains the GCP credentials!
- Now we can collect many metrics
- stackdriver_gce_instance_compute_googleapis_com_instance_cpu and submetrics
- stackdriver_gce_instance_compute_googleapis_com_instance_disk and submetrics
- stackdriver_gce_instance_compute_googleapis_com_instance_network and submetrics
- stackdriver_gce_instance_compute_googleapis_com_instance_uptime
- stackdriver_monitoring_scrapes_total
- and more
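- Example query over one of these metrics (assuming a cpu_utilization submetric exists under the instance_cpu group):
avg by (instance_name) (stackdriver_gce_instance_compute_googleapis_com_instance_cpu_utilization)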
- We can reuse part of the demo setup: https://github.com/tricksterproxy/trickster/blob/master/deploy/trickster-demo
- Create a folder trickster (monitoring/trickster)
- Create a configuration trickster.conf file (monitoring/trickster/trickster.conf)
[frontend]
listen_port = 8480
[negative_caches]
[negative_caches.default]
400 = 3
404 = 3
500 = 3
502 = 3
[caches]
[caches.fs1]
cache_type = 'filesystem'
[caches.fs1.filesystem]
cache_path = '/data/trickster'
[caches.fs1.index]
max_size_objects = 512
max_size_backoff_objects = 128
[caches.mem1]
cache_type = 'memory'
[caches.mem1.index]
max_size_objects = 512
max_size_backoff_objects = 128
[tracing]
[tracing.std1]
tracer_type = 'stdout'
[tracing.std1.stdout]
pretty_print = true
[origins]
[origins.prom1]
origin_type = 'prometheus'
origin_url = 'http://prometheus:9090'
tracing_name = 'std1'
cache_name = 'mem1'
[logging]
log_level = 'info'
[metrics]
listen_port = 8481
- Create a Dockerfile file (monitoring/trickster/Dockerfile)
FROM tricksterproxy/trickster:1.1.0-beta
COPY trickster.conf /etc/trickster/
- Build image
docker build -t $USER_NAME/trickster .
- Update the Prometheus configuration and build image
...
- job_name: 'trickster'
static_configs:
- targets:
- 'trickster:8481'
...
docker build -t $USER_NAME/prometheus .
- Update the Grafana provisioning datasource configuration file and build image
...
- name: prom-trickster-memory-stdout
type: prometheus
access: proxy
orgId: 1
uid: ds_prom1_trickster
url: http://trickster:8480/prom1
version: 1
editable: true
docker build -t $USER_NAME/grafana .
- Update configuration docker-compose-monitoring.yml
trickster:
image: ${USER_NAME}/trickster
container_name: trickster
depends_on:
- prometheus
- grafana
ports:
- 8480:8480
- 8481:8481
networks:
- back_net
- Run it
make run
- Added dashboards to monitor Trickster and to test the Trickster datasource (monitoring/grafana/dashboards/TricksterStatus.json & monitoring/grafana/dashboards/DockerMonitorinTrickster.json)
Homework 21 (logging-1)
- We can divide the grok pattern into two parts
<grok>
pattern service=%{WORD:service} \| event=%{WORD:event} \| request_id=%{GREEDYDATA:request_id} \| message='%{GREEDYDATA:message}'
</grok>
<grok>
pattern service=%{WORD:service} \| event=%{WORD:event} \| path=%{URIPATH:path} \| request_id=%{GREEDYDATA:request_id} \| remote_addr=%{IP:remote_addr} \| method=%{WORD:method} \| response_status=%{INT:response_status}
</grok>
- It remains to rebuild the image and check
make docker_build_fluentd
make run_logging
- The first problem I encountered was slow post loading and an error saying there is a problem with the comment service. Let's see what Zipkin shows.
Client Start
Start Time 05/30 19:49:42.848_007
Relative Time 3.061s
Address 192.168.48.5:9292 (ui_app)
Client Finish
Start Time 05/30 19:50:12.967_778
Relative Time 33.180s
Address 192.168.48.5:9292 (ui_app)
Tags
error - 500
http.path - /5ed26aa51f9dce00140f9416/comments
http.status - 500
Server Address
192.168.48.2:9292 (comment)
Site displays - Can't show comments, some problems with the comment service
- The problem turned out to be that the environment variables are not declared in the comment service's Dockerfile. Declare them in docker-compose.yml instead
comment:
image: ${USERNAME}/comment:${COMMENT_VER}
container_name: comment
environment:
- ZIPKIN_ENABLED=${ZIPKIN_ENABLED}
- COMMENT_DATABASE_HOST=comment_db
- COMMENT_DATABASE=comment
networks:
- front_net
- back_net
- After that the problem went away, but a new one appeared: posts did not load fast enough. Let's see what Zipkin shows.
POST
Client Start
Start Time 05/30 19:56:35.251_840
Relative Time 1.716ms
Address 192.168.48.5:9292 (ui_app)
Server Start
Start Time 05/30 19:56:35.254_037
Relative Time 3.913ms
Address 192.168.48.4:5000 (post)
Server Finish
Start Time 05/30 19:56:38.265_850
Relative Time 3.016s
Address 192.168.48.4:5000 (post)
Client Finish
Start Time 05/30 19:56:38.286_009
Relative Time 3.036s
Address 192.168.48.5:9292 (ui_app)
COMMENT
Client Start
Start Time 05/30 19:56:38.286_379
Relative Time 3.036s
Address 192.168.48.5:9292 (ui_app)
Client Finish
Start Time 05/30 19:56:38.304_208
Relative Time 3.054s
Address 192.168.48.5:9292 (ui_app)
- There is a delay of at least 3 seconds everywhere, which is suspicious
- We find the problem in the /post-py/post_app.py file: someone put a 3-second delay into the find_post(id) function
def find_post(id):
...
time.sleep(3)
...
- Delete or comment out this line
- It remains to rebuild the image and restart the application
make docker_build_post_bug
make stop_app
make run_app
- No more problems!
Homework 22 (kubernetes-1)
- By default, a google cloud platform account with a trial period cannot use more than four external IP addresses. Therefore, carefully check the commands before running them and edit where necessary, so that the total number of instances does not exceed four. (One way to check the current usage is sketched below.)
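- One way to check the region's quota usage before creating anything (the region is illustrative):
gcloud compute regions describe europe-west3 \
  --flatten="quotas[]" \
  --format="table(quotas.metric,quotas.usage,quotas.limit)"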
All the commands that had to be edited are listed below
for i in 0 1; do
gcloud compute instances create controller-${i} \
--async \
--boot-disk-size 200GB \
--can-ip-forward \
--image-family ubuntu-1804-lts \
--image-project ubuntu-os-cloud \
--machine-type n1-standard-1 \
--private-network-ip 10.240.0.1${i} \
--scopes compute-rw,storage-ro,service-management,service-control,logging-write,monitoring \
--subnet kubernetes \
--tags kubernetes-the-hard-way,controller
done
for i in 0 1; do
gcloud compute instances create worker-${i} \
--async \
--boot-disk-size 200GB \
--can-ip-forward \
--image-family ubuntu-1804-lts \
--image-project ubuntu-os-cloud \
--machine-type n1-standard-1 \
--metadata pod-cidr=10.200.${i}.0/24 \
--private-network-ip 10.240.0.2${i} \
--scopes compute-rw,storage-ro,service-management,service-control,logging-write,monitoring \
--subnet kubernetes \
--tags kubernetes-the-hard-way,worker
done
-
for instance in worker-0 worker-1; do
cat > ${instance}-csr.json <<EOF
{
"CN": "system:node:${instance}",
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "US",
"L": "Portland",
"O": "system:nodes",
"OU": "Kubernetes The Hard Way",
"ST": "Oregon"
}
]
}
EOF
EXTERNAL_IP=$(gcloud compute instances describe ${instance} \
--format 'value(networkInterfaces[0].accessConfigs[0].natIP)')
INTERNAL_IP=$(gcloud compute instances describe ${instance} \
--format 'value(networkInterfaces[0].networkIP)')
cfssl gencert \
-ca=ca.pem \
-ca-key=ca-key.pem \
-config=ca-config.json \
-hostname=${instance},${EXTERNAL_IP},${INTERNAL_IP} \
-profile=kubernetes \
${instance}-csr.json | cfssljson -bare ${instance}
done
-
{
KUBERNETES_PUBLIC_ADDRESS=$(gcloud compute addresses describe kubernetes-the-hard-way \
--region $(gcloud config get-value compute/region) \
--format 'value(address)')
KUBERNETES_HOSTNAMES=kubernetes,kubernetes.default,kubernetes.default.svc,kubernetes.default.svc.cluster,kubernetes.svc.cluster.local
cat > kubernetes-csr.json <<EOF
{
"CN": "kubernetes",
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "US",
"L": "Portland",
"O": "Kubernetes",
"OU": "Kubernetes The Hard Way",
"ST": "Oregon"
}
]
}
EOF
cfssl gencert \
-ca=ca.pem \
-ca-key=ca-key.pem \
-config=ca-config.json \
-hostname=10.32.0.1,10.240.0.10,10.240.0.11,${KUBERNETES_PUBLIC_ADDRESS},127.0.0.1,${KUBERNETES_HOSTNAMES} \
-profile=kubernetes \
kubernetes-csr.json | cfssljson -bare kubernetes
}
---
for instance in worker-0 worker-1; do
sudo gcloud compute scp ca.pem ${instance}-key.pem ${instance}.pem ${instance}:~/
done
---
for instance in controller-0 controller-1; do
sudo gcloud compute scp ca.pem ca-key.pem kubernetes-key.pem kubernetes.pem \
service-account-key.pem service-account.pem ${instance}:~/
done
---
for instance in worker-0 worker-1; do
kubectl config set-cluster kubernetes-the-hard-way \
--certificate-authority=ca.pem \
--embed-certs=true \
--server=https://${KUBERNETES_PUBLIC_ADDRESS}:6443 \
--kubeconfig=${instance}.kubeconfig
kubectl config set-credentials system:node:${instance} \
--client-certificate=${instance}.pem \
--client-key=${instance}-key.pem \
--embed-certs=true \
--kubeconfig=${instance}.kubeconfig
kubectl config set-context default \
--cluster=kubernetes-the-hard-way \
--user=system:node:${instance} \
--kubeconfig=${instance}.kubeconfig
kubectl config use-context default --kubeconfig=${instance}.kubeconfig
done
--
for instance in worker-0 worker-1; do
sudo gcloud compute scp ${instance}.kubeconfig kube-proxy.kubeconfig ${instance}:~/
done
--
for instance in controller-0 controller-1; do
sudo gcloud compute scp admin.kubeconfig kube-controller-manager.kubeconfig kube-scheduler.kubeconfig ${instance}:~/
done
--
for instance in controller-0 controller-1; do
sudo gcloud compute scp encryption-config.yaml ${instance}:~/
done
--
cat <<EOF | sudo tee /etc/systemd/system/kube-apiserver.service
[Unit]
Description=Kubernetes API Server
Documentation=https://github.com/kubernetes/kubernetes
[Service]
ExecStart=/usr/local/bin/kube-apiserver \\
--advertise-address=${INTERNAL_IP} \\
--allow-privileged=true \\
--apiserver-count=2 \\
--audit-log-maxage=30 \\
--audit-log-maxbackup=3 \\
--audit-log-maxsize=100 \\
--audit-log-path=/var/log/audit.log \\
--authorization-mode=Node,RBAC \\
--bind-address=0.0.0.0 \\
--client-ca-file=/var/lib/kubernetes/ca.pem \\
--enable-admission-plugins=NamespaceLifecycle,NodeRestriction,LimitRanger,ServiceAccount,DefaultStorageClass,ResourceQuota \\
--etcd-cafile=/var/lib/kubernetes/ca.pem \\
--etcd-certfile=/var/lib/kubernetes/kubernetes.pem \\
--etcd-keyfile=/var/lib/kubernetes/kubernetes-key.pem \\
--etcd-servers=https://10.240.0.10:2379,https://10.240.0.11:2379 \\
--event-ttl=1h \\
--encryption-provider-config=/var/lib/kubernetes/encryption-config.yaml \\
--kubelet-certificate-authority=/var/lib/kubernetes/ca.pem \\
--kubelet-client-certificate=/var/lib/kubernetes/kubernetes.pem \\
--kubelet-client-key=/var/lib/kubernetes/kubernetes-key.pem \\
--kubelet-https=true \\
--runtime-config=api/all \\
--service-account-key-file=/var/lib/kubernetes/service-account.pem \\
--service-cluster-ip-range=10.32.0.0/24 \\
--service-node-port-range=30000-32767 \\
--tls-cert-file=/var/lib/kubernetes/kubernetes.pem \\
--tls-private-key-file=/var/lib/kubernetes/kubernetes-key.pem \\
--v=2
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target
EOF
---
{
KUBERNETES_PUBLIC_ADDRESS=$(gcloud compute addresses describe kubernetes-the-hard-way \
--region $(gcloud config get-value compute/region) \
--format 'value(address)')
gcloud compute http-health-checks create kubernetes \
--description "Kubernetes Health Check" \
--host "kubernetes.default.svc.cluster.local" \
--request-path "/healthz"
gcloud compute firewall-rules create kubernetes-the-hard-way-allow-health-check \
--network kubernetes-the-hard-way \
--source-ranges 209.85.152.0/22,209.85.204.0/22,35.191.0.0/16 \
--allow tcp
gcloud compute target-pools create kubernetes-target-pool \
--http-health-check kubernetes
gcloud compute target-pools add-instances kubernetes-target-pool \
--instances controller-0,controller-1
gcloud compute forwarding-rules create kubernetes-forwarding-rule \
--address ${KUBERNETES_PUBLIC_ADDRESS} \
--ports 6443 \
--region $(gcloud config get-value compute/region) \
--target-pool kubernetes-target-pool
}
--
sudo gcloud compute ssh controller-0 \
--command "kubectl get nodes --kubeconfig admin.kubeconfig"
--
for instance in worker-0 worker-1; do
gcloud compute instances describe ${instance} \
--format 'value[separator=" "](networkInterfaces[0].networkIP,metadata.items[0].value)'
done
--
for i in 0 1; do
gcloud compute routes create kubernetes-route-10-200-${i}-0-24 \
--network kubernetes-the-hard-way \
--next-hop-address 10.240.0.2${i} \
--destination-range 10.200.${i}.0/24
done
--
sudo gcloud compute ssh controller-0 \
--command "sudo ETCDCTL_API=3 etcdctl get \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/etcd/ca.pem \
--cert=/etc/etcd/kubernetes.pem \
--key=/etc/etcd/kubernetes-key.pem \
/registry/secrets/default/kubernetes-the-hard-way | hexdump -C"
--
gcloud -q compute instances delete \
controller-0 controller-1 \
worker-0 worker-1 \
--zone $(gcloud config get-value compute/zone)
--
{
gcloud -q compute routes delete \
kubernetes-route-10-200-0-0-24 \
kubernetes-route-10-200-1-0-24
gcloud -q compute networks subnets delete kubernetes
gcloud -q compute networks delete kubernetes-the-hard-way
}
- Generate certificates using the /files/get-certs.sh script and the local playbook playbooks/gen-certs.yml
ansible-playbook --connection=local --inventory=127.0.0.1, playbooks/gen-certs.yml
- Bootstrapping the etcd Cluster
ansible-playbook playbooks/bootstrap-etcd.yml
Homework 23 (kubernetes-2)
- MANAGE KUBERNETES WITH TERRAFORM - Provision a GKE Cluster (Google Cloud) https://learn.hashicorp.com/terraform/kubernetes/provision-gke-cluster
- All files are located under /kubernetes/terraform
- Clone the following repository
git clone https://github.com/hashicorp/learn-terraform-provision-gke-cluster
- Due to the limits of the trial account, we will change the number of nodes from 3 to 1
- File gke.tf
...
# GKE cluster
resource "google_container_cluster" "primary" {
name = "${var.project_id}-gke"
location = var.region
remove_default_node_pool = true
initial_node_count = 1
...
- Do not forget to change terraform.tfvars
- Initialize Terraform workspace
terraform init
- Provision the GKE cluster
terraform apply
- Configure kubectl
gcloud container clusters get-credentials docker-275315-gke --region europe-west3 --project MY_PROJECT
- Deploy and access Kubernetes Dashboard
kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.0.0-beta8/aio/deploy/recommended.yaml
- Start a local proxy that allows you to reach the dashboard
kubectl proxy
curl http://127.0.0.1:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/
...
<head>
<meta charset="utf-8">
<title>Kubernetes Dashboard</title>
<link rel="icon" type="image/png" href="assets/images/kubernetes-logo.png"/>
<meta name="viewport" content="width=device-width">
<link rel="stylesheet" href="styles.dd2d1d3576191b87904a.css"></head>
...
- Authenticate to Kubernetes Dashboard
kubectl apply -f https://raw.githubusercontent.com/hashicorp/learn-terraform-provision-eks-cluster/master/kubernetes-dashboard-admin.rbac.yaml
- Generate the authorization token
kubectl -n kube-system describe secret $(kubectl -n kube-system get secret | grep service-controller-token | awk '{print $1}')
- Output
Data
====
namespace: 11 bytes
token: eyJhbGciOiJSUzI1NiIsImtpZCI6IiJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2Nv...
ca.crt: 1119 bytes
Homework 24 (kubernetes-3)
- Delete old secrets
kubectl delete secret ui-ingress -n dev
- We encode the certificate and key in base64
cat tls.crt | base64 | tr -d '\n'
cat tls.key | base64 | tr -d '\n'
- Create a manifest secret.yml
apiVersion: v1
kind: Secret
metadata:
name: ui-ingress
namespace: dev
data:
tls.crt: LS0tLS1CRUdJTiB......
tls.key: LS0tLS1CRUdJTiBQUklWQVRFI......
type: kubernetes.io/tls
- Apply
kubectl apply -f secret.yml -n dev
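- Alternatively, kubectl can build the same Secret in one step, doing the base64 encoding itself:
kubectl create secret tls ui-ingress --cert=tls.crt --key=tls.key -n dev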
- Add a second "from" block
ingress:
- from:
- podSelector:
matchLabels:
app: reddit
component: comment
- from:
- podSelector:
matchLabels:
app: reddit
component: post
- Apply
kubectl apply -f mongo-network-policy.yml -n dev
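- The applied policy can be inspected with (the policy name is assumed from the manifest file name):
kubectl describe networkpolicy mongo-network-policy -n dev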
Homework 25 (kubernetes-4)
- Created _helpers.tpl files
- comment/templates/_helpers.tpl
{{- define "comment.fullname" -}}
{{- printf "%s-%s" .Release.Name .Chart.Name }}
{{- end -}}
- post/templates/_helpers.tpl
{{- define "post.fullname" -}}
{{- printf "%s-%s" .Release.Name .Chart.Name }}
{{- end -}}
- ui/templates/_helpers.tpl
{{- define "ui.fullname" -}}
{{- printf "%s-%s" .Release.Name .Chart.Name }}
{{- end -}}
- All templates/*.yaml files edited where necessary
- Created the ui, post, and comment projects in GitLab
- Initialized and committed the post and ui repos
- Initialized and committed the reddit-deploy repo
- Added and committed pipelines for post & comment
- Added and committed updated pipelines for post & comment
- COMMENT pipeline
- Add lines to install the Tiller plugin
...
- echo "Installing Tiller plugin..."
- helm init --client-only
- helm plugin install https://github.com/rimusz/helm-tiller
...
- Edit "run" line
...
helm tiller run -- helm upgrade \
...
- POST pipeline
- Download Helm 3
...
curl https://get.helm.sh/helm-v3.2.4-linux-amd64.tar.gz | tar zx # Helm 3
...
- Remove helm init & Tiller, since both were removed in Helm 3
- Done
- In each pipeline (ui, post, comment) we add a new stage "release_deploy"
stages:
- build
- test
- review
- release
- release_deploy   # new stage
- cleanup
- We describe it in the pipeline
release_deploy:
stage: release_deploy
variables:
TRIGGER_TOKEN: MY_TOKEN
REF: master
only:
- master
before_script:
- apk add -U curl
script:
- >
curl -X POST \
-F token="$TRIGGER_TOKEN" \
-F ref="$REF" \
http://gitlab-gitlab/api/v4/projects/1/trigger/pipeline
- Register a trigger in gitlab
http://gitlab-gitlab/androsovm/reddit-deploy/settings/ci_cd
- We edit the "reddit-deploy" pipeline so that the staging stage is not run by triggers
staging:
stage: staging
...
except:
- triggers
- We also make changes to the production stage
production:
stage: production
...
only:
refs:
- master
- triggers
- Now make changes and push them into the master branch
cd Gitlab_ci/post
git commit -am "Added auto_deploy trigger"
git checkout master
git merge feature/3
Homework 26 (kubernetes-5)
### Task 1 - Enable NodeExporter
nodeExporter:
  ## If false, node-exporter will not be installed
  ##
  enabled: true
helm upgrade prom . -f custom_values.yml --install
- To separate the reddit-endpoints configuration, just specify a new source_labels filter with the name of the desired service
- job_name: 'comment-endpoints'
...
- source_labels: [__meta_kubernetes_service_label_app]
action: keep
regex: reddit
- source_labels: [__meta_kubernetes_service_label_component]
action: keep
regex: comment
...
- Added 3 dashboards
UI_Service_Monitoring
Business_Logic_Monitoring
Docker and system monitoring
- UI_Service_Monitoring
rate(ui_request_count{kubernetes_namespace=~"$namespace",http_status=~"^[2].*"}[1m])
rate(ui_request_count{kubernetes_namespace=~"$namespace",http_status=~"^[45].*"}[1m])
histogram_quantile(0.95, sum(rate(ui_request_latency_seconds_bucket{kubernetes_namespace=~"$namespace"}[5m])) by (le))
- Business_Logic_Monitoring
rate(comment_count{kubernetes_namespace=~"$namespace"}[1h])
rate(post_count{kubernetes_namespace=~"$namespace"}[1h])
- Docker and system monitoring
  For container_* metrics the filter is namespace=~"$namespace"; for node_* metrics it is kubernetes_namespace=~"$namespace"
  Uptime: time() - node_boot_time{kubernetes_namespace=~"$namespace",instance=~"$server:.*"}
  Disk space: min((node_filesystem_size{kubernetes_namespace=~"$namespace", fstype=~"xfs|ext4",instance=~"$server:.*"} - node_filesystem_free{kubernetes_namespace=~"$namespace", fstype=~"xfs|ext4",instance=~"$server:.*"}) / node_filesystem_size{kubernetes_namespace=~"$namespace", fstype=~"xfs|ext4",instance=~"$server:.*"})
...
sum(container_spec_memory_limit_bytes{namespace=~"$namespace", name=~".+"} - container_memory_usage_bytes{namespace=~"$namespace", name=~".+"}) by (name)
...
- Alertmanager config setup below
alertmanager ConfigMap entries
alertmanagerFiles:
alertmanager.yml:
global:
slack_api_url: "https://hooks.slack.com/services/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"
receivers:
- name: "slack-notifications"
slack_configs:
- channel: "#mikhail_androsov"
route:
receiver: "slack-notifications"
- Prometheus server ConfigMap Alert entries
serverFiles:
alerts:
groups:
- name: NodeDown
rules:
- alert: NodeDown
expr: up == 0
for: 1m
labels:
severity: critical
annotations:
summary: "Node down"
description: "Node has been down for more than 1 minute."
- Post endpoints monitoring configuration
additionalScrapeConfigs:
- job_name: 'post-endpoints'
kubernetes_sd_configs:
- role: endpoints
relabel_configs:
- source_labels: [__meta_kubernetes_service_label_app]
action: keep
regex: reddit
- source_labels: [__meta_kubernetes_service_label_component]
action: keep
regex: post
- action: labelmap
regex: __meta_kubernetes_service_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_service_name]
target_label: kubernetes_name
- ServiceMonitor Charts/prometheus-operator/templates/prometheus-operator/servicemonitor.yaml
- Install the EFK chart from the Chart/efk folder
helm install --name=efk efk/