carlosedp / cluster-monitoring
Cluster monitoring stack for clusters based on Prometheus Operator
License: MIT License
I'm testing this out on a Turing Pi cluster, with 7 Pi Compute Module 3+ boards.
On my Grafana dashboard, I'm seeing no data for CPU temperature:
I'm going to dig in and see where the monitor is supposed to be running. I modified the vars.jsonnet file for my cluster like so:
{
  _config+:: {
    namespace: 'monitoring',
  },
  // Enable or disable additional modules
  modules: [
    {
      // After deployment, run the create_gmail_auth.sh script from scripts dir.
      name: 'smtpRelay',
      enabled: false,
      file: import 'smtp_relay.jsonnet',
    },
    {
      name: 'armExporter',
      enabled: true,
      file: import 'arm_exporter.jsonnet',
    },
    {
      name: 'upsExporter',
      enabled: false,
      file: import 'ups_exporter.jsonnet',
    },
    {
      name: 'metallbExporter',
      enabled: false,
      file: import 'metallb.jsonnet',
    },
    {
      name: 'traefikExporter',
      enabled: false,
      file: import 'traefik.jsonnet',
    },
    {
      name: 'elasticExporter',
      enabled: false,
      file: import 'elasticsearch_exporter.jsonnet',
    },
  ],
  k3s: {
    enabled: true,
    master_ip: ['10.0.100.163'],
  },
  // Domain suffix for the ingresses
  suffixDomain: '10.0.100.74.nip.io',
  // If TLSingress is true, a self-signed HTTPS ingress with redirect will be created
  TLSingress: true,
  // If UseProvidedCerts is true, provided files will be used on created HTTPS ingresses.
  // Use a wildcard certificate for the domain like ex. "*.192.168.99.100.nip.io"
  UseProvidedCerts: false,
  TLSCertificate: importstr 'server.crt',
  TLSKey: importstr 'server.key',
  // Setting these to false, defaults to emptyDirs
  enablePersistence: {
    prometheus: false,
    grafana: false,
  },
  // Grafana "from" email
  grafana: {
    from_address: '[email protected]',
  },
}
Hi, a question: I am already using the Prometheus Operator Helm chart in my k3s cluster. I would like to get kubeControllerManager and kubeScheduler monitored. What work needs to be done to enable this without using your deployments here? Is it possible?
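For reference, the trick this repo automates is creating a headless Service with manually managed Endpoints pointing at the k3s server, so the Helm chart's existing ServiceMonitors have something to match. A sketch (names and ports are assumptions; on k3s of this era the controller-manager serves metrics on 10252 and the scheduler on 10251 over plain HTTP, provided the k3s server exposes them on the node address rather than only on localhost):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: kube-scheduler-prometheus-discovery
  namespace: kube-system
  labels:
    k8s-app: kube-scheduler    # must match the chart's ServiceMonitor selector
spec:
  clusterIP: None              # headless: Prometheus scrapes the endpoints directly
  ports:
    - name: http-metrics
      port: 10251
      targetPort: 10251
---
apiVersion: v1
kind: Endpoints
metadata:
  name: kube-scheduler-prometheus-discovery   # same name links it to the Service
  namespace: kube-system
subsets:
  - addresses:
      - ip: 10.0.100.163       # your k3s server IP, as in vars.jsonnet
    ports:
      - name: http-metrics
        port: 10251
```

The same pattern with port 10252 and label k8s-app: kube-controller-manager covers the controller manager.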
First of all, I add my thanks for this project. Here are the steps I just performed:
$ git clone https://github.com/carlosedp/cluster-monitoring.git
Cloning into 'cluster-monitoring'...
remote: Enumerating objects: 1256, done.
remote: Total 1256 (delta 0), reused 0 (delta 0), pack-reused 1256
Receiving objects: 100% (1256/1256), 921.68 KiB | 3.48 MiB/s, done.
Resolving deltas: 100% (878/878), done.
Modified vars.jsonnet, which is attached as vars.txt
$ make vendor
rm -rf vendor
/root/go/bin/jb install
GET https://github.com/coreos/kube-prometheus/archive/17989b42aa10b1c6afa07043cb05bcd5ae492284.tar.gz 200
GET https://github.com/brancz/kubernetes-grafana/archive/57b4365eacda291b82e0d55ba7eec573a8198dda.tar.gz 200
GET https://github.com/ksonnet/ksonnet-lib/archive/0d2f82676817bbf9e4acf6495b2090205f323b9f.tar.gz 200
GET https://github.com/kubernetes-monitoring/kubernetes-mixin/archive/b61c5a34051f8f57284a08fe78ad8a45b430252b.tar.gz 200
GET https://github.com/prometheus/prometheus/archive/74207c04655e1fd93eea0e9a5d2f31b1cbc4d3d0.tar.gz 200
GET https://github.com/coreos/etcd/archive/d8c8f903eee10b8391abaef7758c38b2cd393c55.tar.gz 200
GET https://github.com/coreos/prometheus-operator/archive/e31c69f9b5c6555e0f4a5c1f39d0f03182dd6b41.tar.gz 200
GET https://github.com/kubernetes/kube-state-metrics/archive/d667979ed55ad1c4db44d331b51d646f5b903aa7.tar.gz 200
GET https://github.com/kubernetes/kube-state-metrics/archive/d667979ed55ad1c4db44d331b51d646f5b903aa7.tar.gz 200
GET https://github.com/prometheus/node_exporter/archive/08ce3c6dd430deb51798826701a395e460620d60.tar.gz 200
GET https://github.com/grafana/grafonnet-lib/archive/8fb95bd89990e493a8534205ee636bfcb8db67bd.tar.gz 200
GET https://github.com/grafana/jsonnet-libs/archive/881db2241f0c5007c3e831caf34b0c645202b4ab.tar.gz 200
$ make
Installing jsonnet
go: found github.com/google/go-jsonnet/cmd/jsonnet in github.com/google/go-jsonnet v0.16.0
go: github.com/mattn/go-isatty upgrade => v0.0.12
go: github.com/mattn/go-colorable upgrade => v0.1.6
go: golang.org/x/sys upgrade => v0.0.0-20200620081246-981b61492c35
go: found github.com/google/go-jsonnet/cmd/jsonnetfmt in github.com/google/go-jsonnet v0.16.0
go: github.com/mattn/go-colorable upgrade => v0.1.6
go: github.com/mattn/go-isatty upgrade => v0.0.12
go: golang.org/x/sys upgrade => v0.0.0-20200620081246-981b61492c35
go: github.com/brancz/gojsontoyaml upgrade => v0.0.0-20200602132005-3697ded27e8c
rm -rf manifests
./scripts/build.sh main.jsonnet /root/go/bin/jsonnet
using jsonnet from path
+ set -o pipefail
+ rm -rf manifests
+ mkdir -p manifests/setup
+ jsonnet -J vendor -m manifests main.jsonnet
+ xargs '-I{}' sh -c 'cat {} | $(go env GOPATH)/bin/gojsontoyaml > {}.yaml; rm -f {}' -- '{}'
RUNTIME ERROR: Unexpected type null, expected object
base_operator_stack.jsonnet:(123:7)-(128:119) object <anonymous>
main.jsonnet:31:24-40 object <anonymous>
During manifestation
make: *** [Makefile:19: manifests] Error 1
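For what it's worth, this class of jsonnet error usually means a merge downstream hit a vars.jsonnet key that is present but null (or missing), so the stack tries to treat null as an object. A defensive sketch with a hypothetical key name, not this repo's actual code:

```jsonnet
local vars = import 'vars.jsonnet';

// In jsonnet, merging `null + { ... }` (or indexing into a null field)
// fails with "Unexpected type null, expected object". Guarding with a
// default empty object keeps the merge valid:
local grafanaCfg = if vars.grafana == null then {} else vars.grafana;

grafanaCfg
```

Checking that every key the stack expects (grafana, k3s, enablePersistence, ...) still exists as an object in the edited vars.jsonnet is usually the fastest way to locate the culprit.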
When running make, I get the following output:
$ make
rm -rf manifests
./scripts/build.sh main.jsonnet /home/pirate/go/bin/jsonnet
using jsonnet from arg
+ set -o pipefail
+ rm -rf manifests
+ mkdir manifests
+ /home/pirate/go/bin/jsonnet -J vendor -m manifests main.jsonnet
+ xargs '-I{}' sh -c 'cat {} | gojsontoyaml > {}.yaml; rm -f {}' -- '{}'
--: 1: --: gojsontoyaml: not found
... (the same line repeats once for every generated manifest)
All the .yaml manifests are created, but they are empty. It seems like gojsontoyaml may be quite essential to making the manifests actually have content 🤪
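The usual cause here is that gojsontoyaml was installed by go get into $(go env GOPATH)/bin, which is not on the PATH seen by the xargs subshell, so every conversion fails and empty .yaml files are left behind. A self-contained sketch of the fix (the stand-in binary below exists only to make the demonstration runnable without Go; for the real tool, extend PATH with $(go env GOPATH)/bin):

```shell
# Simulate a tool installed outside PATH, then fix PATH -- the same fix
# applies to the real gojsontoyaml in $(go env GOPATH)/bin:
mkdir -p /tmp/gopath-bin
printf '#!/bin/sh\ncat\n' > /tmp/gopath-bin/gojsontoyaml  # stand-in binary
chmod +x /tmp/gopath-bin/gojsontoyaml

export PATH="$PATH:/tmp/gopath-bin"   # real fix: PATH="$PATH:$(go env GOPATH)/bin"
command -v gojsontoyaml               # now resolves
```

If the tool is genuinely missing, GO111MODULE=on go get github.com/brancz/gojsontoyaml (the pre-Go-1.17 install style this repo's Makefile uses) puts it in $(go env GOPATH)/bin.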
Hello, I have an ARM k3s Kubernetes cluster with 4 ARM machines (Odroid MC1).
After I run make deploy, I get these errors (full log):
rm -rf manifests
./scripts/build.sh main.jsonnet /root/go/bin/jsonnet
using jsonnet from arg
+ set -o pipefail
+ rm -rf manifests
+ mkdir manifests
+ /root/go/bin/jsonnet -J vendor -m manifests main.jsonnet
+ xargs '-I{}' sh -c 'cat {} | gojsontoyaml > {}.yaml; rm -f {}' -- '{}'
kubectl apply -f ./manifests/
namespace/monitoring created
customresourcedefinition.apiextensions.k8s.io/alertmanagers.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/podmonitors.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/prometheuses.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/prometheusrules.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/servicemonitors.monitoring.coreos.com created
clusterrole.rbac.authorization.k8s.io/prometheus-operator created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-operator created
deployment.apps/prometheus-operator created
service/prometheus-operator created
serviceaccount/prometheus-operator created
secret/alertmanager-main created
service/alertmanager-main created
serviceaccount/alertmanager-main created
secret/grafana-config created
secret/grafana-datasources created
configmap/grafana-dashboard-apiserver created
configmap/grafana-dashboard-controller-manager created
configmap/grafana-dashboard-coredns-dashboard created
configmap/grafana-dashboard-k8s-cluster-rsrc-use created
configmap/grafana-dashboard-k8s-node-rsrc-use created
configmap/grafana-dashboard-k8s-resources-cluster created
configmap/grafana-dashboard-k8s-resources-namespace created
configmap/grafana-dashboard-k8s-resources-pod created
configmap/grafana-dashboard-k8s-resources-workload created
configmap/grafana-dashboard-k8s-resources-workloads-namespace created
configmap/grafana-dashboard-kubelet created
configmap/grafana-dashboard-kubernetes-cluster-dashboard created
configmap/grafana-dashboard-nodes created
configmap/grafana-dashboard-persistentvolumesusage created
configmap/grafana-dashboard-pods created
configmap/grafana-dashboard-prometheus-dashboard created
configmap/grafana-dashboard-prometheus-remote-write created
configmap/grafana-dashboard-prometheus created
configmap/grafana-dashboard-proxy created
configmap/grafana-dashboard-scheduler created
configmap/grafana-dashboard-statefulset created
configmap/grafana-dashboards created
deployment.apps/grafana created
service/grafana created
serviceaccount/grafana created
ingress.extensions/alertmanager-main created
ingress.extensions/grafana created
ingress.extensions/prometheus-k8s created
clusterrole.rbac.authorization.k8s.io/kube-state-metrics created
clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics created
deployment.apps/kube-state-metrics created
role.rbac.authorization.k8s.io/kube-state-metrics created
rolebinding.rbac.authorization.k8s.io/kube-state-metrics created
service/kube-state-metrics created
serviceaccount/kube-state-metrics created
clusterrole.rbac.authorization.k8s.io/node-exporter created
clusterrolebinding.rbac.authorization.k8s.io/node-exporter created
daemonset.apps/node-exporter created
service/node-exporter created
serviceaccount/node-exporter created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io configured
clusterrole.rbac.authorization.k8s.io/prometheus-adapter created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader unchanged
clusterrolebinding.rbac.authorization.k8s.io/prometheus-adapter created
clusterrolebinding.rbac.authorization.k8s.io/resource-metrics:system:auth-delegator created
clusterrole.rbac.authorization.k8s.io/resource-metrics-server-resources created
configmap/adapter-config created
deployment.apps/prometheus-adapter created
rolebinding.rbac.authorization.k8s.io/resource-metrics-auth-reader created
service/prometheus-adapter created
serviceaccount/prometheus-adapter created
clusterrole.rbac.authorization.k8s.io/prometheus-k8s created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-k8s created
service/kube-controller-manager-prometheus-discovery created
service/kube-dns-prometheus-discovery created
service/kube-scheduler-prometheus-discovery created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s-config created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s-config created
role.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s created
service/prometheus-k8s created
serviceaccount/prometheus-k8s created
deployment.apps/smtp-server created
service/smtp-server created
unable to recognize "manifests/0prometheus-operator-serviceMonitor.yaml": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"
unable to recognize "manifests/alertmanager-alertmanager.yaml": no matches for kind "Alertmanager" in version "monitoring.coreos.com/v1"
unable to recognize "manifests/alertmanager-serviceMonitor.yaml": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"
unable to recognize "manifests/grafana-serviceMonitor.yaml": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"
unable to recognize "manifests/kube-state-metrics-serviceMonitor.yaml": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"
unable to recognize "manifests/node-exporter-serviceMonitor.yaml": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"
unable to recognize "manifests/prometheus-prometheus.yaml": no matches for kind "Prometheus" in version "monitoring.coreos.com/v1"
unable to recognize "manifests/prometheus-rules.yaml": no matches for kind "PrometheusRule" in version "monitoring.coreos.com/v1"
unable to recognize "manifests/prometheus-serviceMonitor.yaml": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"
unable to recognize "manifests/prometheus-serviceMonitorApiserver.yaml": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"
unable to recognize "manifests/prometheus-serviceMonitorCoreDNS.yaml": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"
unable to recognize "manifests/prometheus-serviceMonitorKubeControllerManager.yaml": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"
unable to recognize "manifests/prometheus-serviceMonitorKubeScheduler.yaml": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"
unable to recognize "manifests/prometheus-serviceMonitorKubelet.yaml": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"
Makefile:25: recipe for target 'deploy' failed
make: *** [deploy] Error 1
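The "no matches for kind ServiceMonitor" lines mean the custom resources were applied before the API server finished registering the CRDs created moments earlier in the same apply. Simply re-running make deploy usually resolves it, since the CRDs exist by the second run. Newer versions of this repo split the manifests so the CRDs can be applied and awaited first; a sketch of that two-pass approach (paths assume the later manifests/setup layout):

```shell
kubectl apply -f manifests/setup/    # CRDs and the operator first
kubectl wait --for=condition=established \
  crd/servicemonitors.monitoring.coreos.com --timeout=60s
kubectl apply -f manifests/          # then everything that uses the CRDs
```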
While running make I eventually get this in the console:
rm -rf manifests
./scripts/build.sh main.jsonnet /root/go/bin/jsonnet
using jsonnet from arg
+ set -o pipefail
+ rm -rf manifests
+ mkdir -p manifests/setup
+ /root/go/bin/jsonnet -J vendor -m manifests main.jsonnet
+ xargs '-I{}' sh -c 'cat {} | $(go env GOPATH)/bin/gojsontoyaml > {}.yaml; rm -f {}' -- '{}'
./scripts/build.sh: line 23: 7595 Killed $JSONNET_BIN -J vendor -m manifests "${1-example.jsonnet}"
7596 Done | xargs -I{} sh -c 'cat {} | $(go env GOPATH)/bin/gojsontoyaml > {}.yaml; rm -f {}' -- {}
make: *** [Makefile:19: manifests] Error 137
My vars.jsonnet edits from default are:
Any ideas what is causing this or how to get this to run?
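Exit status 137 means the jsonnet process received SIGKILL, almost always from the kernel OOM killer: evaluating these manifests needs more memory than a small board has free. Two common workarounds are building the manifests on a bigger machine and copying the manifests/ directory over, or adding temporary swap on the board. A sketch of the latter (the 2G size is arbitrary):

```shell
sudo fallocate -l 2G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
make        # retry the build with swap available
```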
Hi 👋
This repo is fantastic to see and so far it's been brilliant.
So far I followed the pre requisite steps to get setup:
make vendor
make
kubectl apply -f manifests
Here's a snapshot of my cluster after applying all of the manifests
Everything looks great!
Unfortunately, though, I am unable to access the ingress gateways as suggested (alertmanager..nip.io, prometheus..nip.io, grafana..nip.io). To get around this for now, I thought I'd check out Prometheus first via port-forward.
To do that I run: kubectl -n monitoring port-forward prometheus-k8s-0 9090:9090
When I check prometheus targets I see lots of similar errors to this one:
The majority of the errors are:
Details of my hardware:
NAME STATUS ROLES AGE VERSION
worker-rpi-3Bplus Ready <none> 22h v1.17.3+k3s1
worker-rpi-2 Ready <none> 22h v1.17.3+k3s1
master Ready master 22h v1.17.3+k3s1
worker-rpi-4 Ready <none> 22h v1.17.3+k3s1
master = Raspberry Pi 4 w/ 4GB RAM
worker-rpi-4 = Raspberry Pi 4 w/ 4GB RAM
worker-rpi-2 = Raspberry Pi 2 B+
worker-rpi-3Bplus = Raspberry Pi 3 B+
All running docker version: 19.03.8 Client & Server.
If there's any further detail you need me to supply, I'll dig that out for you.
There also may be a gap in my knowledge here / potential further PEBKAC. Any advice / help would be much appreciated!
It seems the default (https://prometheus.io/docs/prometheus/latest/configuration/configuration/) is 60s/1m, though I noticed one override to 30s in base_operator_stack.jsonnet for kubeStateMetrics.
It seems the Prometheus pod is a bit overloaded when it gets deployed to some of my older Pi 3 B boards (seems to run okay on the Pi 4 with more RAM and faster CPU though)... and I'm wondering if setting the scrape interval to something a bit more lightweight like 2m or 5m might help the poor older Pis keep up.
I was about to jump over to my prometheus instance but that node just went down due to thrashing as it ran out of memory 🤪
Anyways, just a quick support question, not a big deal and I may work on getting the memory requirements a little more stringent so Prometheus only goes to one of the newer/faster nodes.
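A global scrape interval can be set on the Prometheus custom resource itself, which the operator then applies to every job that doesn't override it. A sketch in the kube-prometheus style this repo builds on (assumption: merged into the top-level object in main.jsonnet):

```jsonnet
{
  prometheus+:: {
    prometheus+: {
      spec+: {
        // Fields of the Prometheus CRD; a longer interval trades
        // resolution for much lower load on slow boards.
        scrapeInterval: '2m',
        evaluationInterval: '2m',
      },
    },
  },
}
```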
Hi, I have edited the grafana-config secret to enable basic auth.
I do not understand how to regenerate grafana.ini with this change. I deleted the pod and it was recreated, but the config is still the old one.
Please help, and thanks for the great work!
Cheers,
udo.
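Note that the grafana-config secret is generated from the jsonnet sources, so hand edits can be reverted on the next make deploy; the supported path is changing the jsonnet and rebuilding. If editing the secret directly anyway, a sketch of forcing the new file in and restarting Grafana (the grafana.ini key name and kubectl ≥1.18 flags are assumptions):

```shell
kubectl -n monitoring create secret generic grafana-config \
  --from-file=grafana.ini=./grafana.ini \
  --dry-run=client -o yaml | kubectl apply -f -
kubectl -n monitoring rollout restart deployment grafana   # re-mount the secret
```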
First of all, thank you for this amazing project!
I have created server certificate files using Let's Encrypt and Certbot. Now I wonder which files I need to copy into the server.key and server.crt files?
Certbot has created the following files: cert.pem, chain.pem, fullchain.pem and privkey.pem.
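For most ingress controllers the certificate file should contain the full chain, so the usual mapping is fullchain.pem (certificate plus intermediates) to server.crt and privkey.pem to server.key. A sketch, assuming Certbot's default live directory layout:

```shell
cp /etc/letsencrypt/live/<your-domain>/fullchain.pem server.crt   # cert + intermediates
cp /etc/letsencrypt/live/<your-domain>/privkey.pem   server.key   # private key
```

cert.pem and chain.pem are the leaf and intermediates separately; fullchain.pem is their concatenation.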
Hi again, here's my next issue:
I'd like to access Prometheus' admin API. It is enabled via the command-line option '--web.enable-admin-api'.
I found prometheus-operator/prometheus-operator#2300, which should enable this feature.
I have changed base_operator_stack.jsonnet and added "enableAdminAPI: 'true'". I have re-created and re-applied the manifests. Unfortunately the command-line option is still not there on the prometheus pod.
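One likely culprit: enableAdminAPI is a boolean field on the Prometheus CRD spec, so it needs to be true, not the string 'true', and it has to end up under the Prometheus resource's spec. A sketch in the kube-prometheus overlay style (placement is an assumption about how this repo merges the stack):

```jsonnet
{
  prometheus+:: {
    prometheus+: {
      spec+: {
        enableAdminAPI: true,   // boolean, not 'true'
      },
    },
  },
}
```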
Thanks for your work on this. I am trying to get the SNMP exporter working and have used your configurations as a starting place, but I am having trouble with the ServiceMonitor and I'm not sure how to troubleshoot it.
I can see the snmp related servicemonitor object in Kubernetes after creating it with,
kubectl get servicemonitor -n monitoring -o yaml | grep snmp
But I don't see any of the configuration anywhere else in Prometheus. Likewise, none of the SNMP metrics are showing up in Grafana. Nothing is really showing up in the prometheus-operator logs either about the configuration.
I have tested that the exporter is working and I can get to it and scrape a target manually.
Any thoughts or insights? Maybe I am missing something simple? I can post any other details if needed.
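Two selector chains have to line up for a ServiceMonitor to produce scrape config: the Prometheus CR's serviceMonitorSelector must match the monitor, and the monitor's own selector must match the exporter Service's labels, with a named port. A sketch of the shape to check against (names are placeholders):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: snmp-exporter
  namespace: monitoring
  labels:
    k8s-app: snmp-exporter   # must be matched by Prometheus' serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: snmp-exporter     # must equal the Service's metadata.labels exactly
  endpoints:
    - port: http             # must equal a *named* port on the Service, not a number
      interval: 60s
```

A mismatch anywhere in that chain fails silently, which matches the "nothing in the logs" symptom.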
Because K3s uses a different container runtime, cAdvisor does not report some metrics (the underlying fields are missing).
Ref:
Hi,
I modified the vars file and set armExporter and metallbExporter to true, but in the end they did not get deployed.
Also, when I change the IP in suffixDomain (for example to 192.168.1.2), after I deploy the IP is not changed and remains 192.168.15.15.
I think it just ignores the changes to the vars.jsonnet file.
I created a cluster of 6 nodes with k3s, using the first one as server and the 5 others as agents.
I followed your readme, made some changes for my own nfs settings and finally built the manifests and applied them.
But my prometheus instance cannot scrape /metrics when it's protected by kube-rbac-proxy.
I tried to curl manually from the prometheus pod using the serviceAccount token to see if it was a prometheus configuration issue, but I found the same problem.
Checking the log from kube-rbac-proxy I found:
E0511 17:08:31.133029 1 webhook.go:106] Failed to make webhook authenticator request: the server could not find the requested resource
E0511 17:08:31.133217 1 proxy.go:67] Unable to authenticate the request due to an error: the server could not find the requested resource
Did I forget to do something? Or is it maybe an issue with k3s itself?
Dive deeper into kube-rbac-proxy's use of the API on K3s to turn endpoints into HTTPS, authenticated endpoints.
Ref. #13
I'm not a go or jsonnet person, so reporting here.
When building and adding arm-exporter for temps, the resulting yaml is missing a couple of things.
arm-exporter-serviceAccount.yaml (file missing, needed, for me at least):
apiVersion: v1
kind: ServiceAccount
metadata:
  name: arm-exporter
  namespace: monitoring
along with, in arm-exporter-daemonset.yaml:
add: serviceAccountName: arm-exporter
and: tls-cipher-suites
Without the service account and the serviceAccountName added to the daemonset, I was getting 401 unauthorized errors.
Wish there was more I could do to help; this project is awesome.
Upgrading an existing Prometheus operator (from v0.17.0 to v0.28.0) and bumping the Prometheus version (v2.3.1 to v2.7.0) results in the following error in the Prometheus logs.
level=error ts=2019-02-01T18:43:54.10168301Z caller=main.go:688 err="error loading config from \"/etc/prometheus/config_out/prometheus.env.yaml\": couldn't load configuration (--config.file=\"/etc/prometheus/config_out/prometheus.env.yaml\"): open /etc/prometheus/config_out/prometheus.env.yaml: no such file or directory"
I haven't dug super deep into it yet; any idea what could be going on?
Prometheus crashes and fails to start after running for some time. It looks like the TSDB is getting too large and Prometheus can't allocate any memory anymore (mmap: cannot allocate memory).
% kubectl logs prometheus-k8s-0 -p prometheus
level=info ts=2020-05-31T06:15:21.309Z caller=main.go:329 msg="Starting Prometheus" version="(version=2.11.1, branch=HEAD, revision=e5b22494857deca4b806f74f6e3a6ee30c251763)"
level=info ts=2020-05-31T06:15:21.309Z caller=main.go:331 host_details="(Linux 4.19.97-v7l+ #1294 SMP Thu Jan 30 13:21:14 GMT 2020 armv7l prometheus-k8s-0 (none))"
...
level=info ts=2020-05-31T06:15:21.313Z caller=main.go:652 msg="Starting TSDB ..."
...
level=info ts=2020-05-31T06:15:21.320Z caller=repair.go:59 component=tsdb msg="found healthy block" mint=1590494400000 maxt=1590501600000 ulid=01E99KKY0EHNP5Y85BWZKD85CX
level=info ts=2020-05-31T06:15:21.320Z caller=repair.go:59 component=tsdb msg="found healthy block" mint=1590501600000 maxt=1590508800000 ulid=01E99KMVH5A14VHEX6SGYZSM5R
level=info ts=2020-05-31T06:15:21.320Z caller=repair.go:59 component=tsdb msg="found healthy block" mint=1590508800000 maxt=1590516000000 ulid=01E9AS2F3X2Z2WARE081EYBFP4
...
level=info ts=2020-05-31T06:15:21.740Z caller=main.go:521 msg="Stopping scrape discovery manager..."
level=info ts=2020-05-31T06:15:21.740Z caller=main.go:535 msg="Stopping notify discovery manager..."
level=info ts=2020-05-31T06:15:21.740Z caller=main.go:557 msg="Stopping scrape manager..."
level=info ts=2020-05-31T06:15:21.740Z caller=manager.go:776 component="rule manager" msg="Stopping rule manager..."
level=info ts=2020-05-31T06:15:21.741Z caller=manager.go:782 component="rule manager" msg="Rule manager stopped"
level=info ts=2020-05-31T06:15:21.741Z caller=notifier.go:602 component=notifier msg="Stopping notification manager..."
level=info ts=2020-05-31T06:15:21.741Z caller=main.go:531 msg="Notify discovery manager stopped"
level=info ts=2020-05-31T06:15:21.741Z caller=main.go:722 msg="Notifier manager stopped"
level=info ts=2020-05-31T06:15:21.741Z caller=main.go:517 msg="Scrape discovery manager stopped"
level=info ts=2020-05-31T06:15:21.741Z caller=main.go:551 msg="Scrape manager stopped"
level=error ts=2020-05-31T06:15:21.741Z caller=main.go:731 err="opening storage failed: unexpected corrupted block:map[01E9AS2F3X2Z2WARE081EYBFP4:mmap files: mmap: cannot allocate memory]"
I have set the persistence settings to false in vars.jsonnet.
// Setting these to false, defaults to emptyDirs
enablePersistence: {
prometheus: false,
grafana: false,
},
Is there an easy way to configure the TSDB retention behavior?
https://prometheus.io/docs/prometheus/latest/storage/#operational-aspects
--storage.tsdb.retention.size: [EXPERIMENTAL] This determines the maximum number
of bytes that storage blocks can use (note that this does not include the WAL size,
which can be substantial). The oldest data will be removed first. Defaults to 0 or
disabled. This flag is experimental and can be changed in future releases. Units
supported: KB, MB, GB, PB. Ex: "512MB"
Hi,
Can you help me solve an issue that I got when following your guide? I've set up the vars master IP and suffix with my node1 IP. How should I approach this with multiple nodes in the cluster?
I have set k3s enabled, and armExporter and Traefik to true.
Should I run all the commands with sudo?
When I do, I get:
kubectl apply -f ./manifests/setup/
The connection to the server localhost:8080 was refused - did you specify the right host or port?
make: *** [Makefile:34: deploy] Error 1
Without sudo I get
kubectl apply -f manifests/setup/
namespace/monitoring unchanged
clusterrole.rbac.authorization.k8s.io/prometheus-operator unchanged
clusterrolebinding.rbac.authorization.k8s.io/prometheus-operator unchanged
[unable to recognize "manifests/setup/prometheus-operator-0alertmanagerCustomResourceDefinition.yaml": no matches for kind "CustomResourceDefinition" in version "apiextensions.k8s.io/v1", unable to recognize "manifests/setup/prometheus-operator-0podmonitorCustomResourceDefinition.yaml": no matches for kind "CustomResourceDefinition" in version "apiextensions.k8s.io/v1", unable to recognize "manifests/setup/prometheus-operator-0prometheusCustomResourceDefinition.yaml": no matches for kind "CustomResourceDefinition" in version "apiextensions.k8s.io/v1", unable to recognize "manifests/setup/prometheus-operator-0prometheusruleCustomResourceDefinition.yaml": no matches for kind "CustomResourceDefinition" in version "apiextensions.k8s.io/v1", unable to recognize "manifests/setup/prometheus-operator-0servicemonitorCustomResourceDefinition.yaml": no matches for kind "CustomResourceDefinition" in version "apiextensions.k8s.io/v1", unable to recognize "manifests/setup/prometheus-operator-0thanosrulerCustomResourceDefinition.yaml": no matches for kind "CustomResourceDefinition" in version "apiextensions.k8s.io/v1"]
What am I doing wrong?
Thanks :)
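Two separate things are going on here. With sudo, the shell starts without your $KUBECONFIG, so kubectl falls back to localhost:8080; pointing it at the k3s kubeconfig fixes that (a sketch, using k3s's default path):

```shell
export KUBECONFIG=/etc/rancher/k3s/k3s.yaml
kubectl apply -f manifests/setup/
```

Without sudo, the "no matches for kind CustomResourceDefinition in version apiextensions.k8s.io/v1" errors suggest the cluster's Kubernetes version predates 1.16, where that API version became available; upgrading k3s (or using an older release of this repo that ships v1beta1 CRDs) should resolve it.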
$ make
rm -rf manifests
./scripts/build.sh main.jsonnet /root/go/bin/jsonnet
using jsonnet from arg
+ set -o pipefail
+ rm -rf manifests
+ mkdir manifests
+ /root/go/bin/jsonnet -J vendor -m manifests main.jsonnet
+ xargs '-I{}' sh -c 'cat {} | gojsontoyaml > {}.yaml; rm -f {}' -- '{}'
RUNTIME ERROR: Unexpected type string, expected array
builtin function <flatMap>
vendor/ksonnet/ksonnet.beta.4/k8s.libsonnet:23134:57-66 thunk from <function <anonymous>>
vendor/ksonnet/ksonnet.beta.4/k8s.libsonnet:23134:48-67 function <anonymous>
utils.libsonnet:(80:23)-(83:24) thunk <subset> from <function <anonymous>>
utils.libsonnet:89:31-37 thunk from <function <anonymous>>
vendor/ksonnet/ksonnet.beta.4/k8s.libsonnet:7427:51-58 thunk from <function <anonymous>>
vendor/ksonnet/ksonnet.beta.4/k8s.libsonnet:7427:42-59 function <anonymous>
utils.libsonnet:89:9-38 function <anonymous>
k3s-overrides.jsonnet:8:7-130 object <anonymous>
main.jsonnet:25:27-46 object <anonymous>
During manifestation
This is a great project, and it was really easy to get it monitoring my k3s Raspberry Pi cluster. Thank you.
Is there any way to add additional entries to the scrape_configs in Prometheus' configuration to monitor things outside of the cluster? I found these instructions in the original project:
https://github.com/coreos/prometheus-operator/blob/master/Documentation/additional-scrape-config.md
I see the same files mentioned in this project, but it doesn't look like they're being ingested when running make deploy. Is there something I should add to vars.jsonnet to get that working?
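The mechanism from the linked doc is a Secret holding plain Prometheus scrape_configs entries, referenced from the Prometheus resource via spec.additionalScrapeConfigs. A sketch (file, secret, and job names are placeholders):

```yaml
# prometheus-additional.yaml -- ordinary scrape_configs entries:
- job_name: external-node
  static_configs:
    - targets: ['192.168.1.50:9100']   # e.g. a node_exporter outside the cluster
```

Create the secret with kubectl -n monitoring create secret generic additional-scrape-configs --from-file=prometheus-additional.yaml, then reference it from the Prometheus spec: additionalScrapeConfigs: { name: additional-scrape-configs, key: prometheus-additional.yaml }.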
Hi,
I need to enable this plugin (and potentially others). I have added it to vars.jsonnet:
// Grafana "from" email
grafana: {
  from_address: '[email protected]',
  plugins: [
    'grafana-piechart-panel',
  ],
},
but running make vendor, make, and make deploy doesn't get it enabled.
How can I get this done properly, please?
Regards
Tom
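Whether vars.jsonnet forwards a plugins list depends on the repo version; in the upstream kube-prometheus of this era the knob lives in the Grafana library's config, which ultimately populates the GF_INSTALL_PLUGINS environment variable on the Grafana deployment. A sketch of setting it there directly (placement is an assumption):

```jsonnet
{
  _config+:: {
    grafana+:: {
      plugins: ['grafana-piechart-panel'],
    },
  },
}
```

Checking the generated grafana-deployment.yaml for GF_INSTALL_PLUGINS is a quick way to confirm whether the setting made it through the build.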
Since the reorganization, I'm getting the following error when running make:
---
changed: false
cmd: "/usr/bin/make"
msg: "+ set -o pipefail\n+ rm -rf manifests\n+ mkdir -p manifests/setup\n+ xargs '-I{}'
sh -c 'cat {} | $(go env GOPATH)/bin/gojsontoyaml > {}.yaml; rm -f {}' -- '{}'\n+
/home/pirate/go/bin/jsonnet -J vendor -m manifests main.jsonnet\nRUNTIME ERROR:
couldn't open import \"arm_exporter.jsonnet\": no match locally or in the Jsonnet
library paths\n\tvars.jsonnet:16:13-42\tobject <anonymous>\n\tmain.jsonnet:9:34-45\tthunk
from <thunk from <thunk <kp> from <$>>>\n\tutils.libsonnet:20:35-41\tthunk from
<function <aux>>\n\tutils.libsonnet:20:9-42\tfunction <aux>\n\tutils.libsonnet:21:5-21\tfunction
<anonymous>\n\tmain.jsonnet:9:14-92\tthunk <kp> from <$>\n\tmain.jsonnet:18:86-88\t\n\t<std>:1293:24-25\tthunk
from <function <anonymous>>\n\t<std>:1293:5-33\tfunction <anonymous>\n\tmain.jsonnet:18:69-104\t$\n\t\t\n\t\t\n\tDuring
evaluation\t\n\nmake: *** [Makefile:19: manifests] Error 1"
rc: 2
stderr: "+ set -o pipefail\n+ rm -rf manifests\n+ mkdir -p manifests/setup\n+ xargs
'-I{}' sh -c 'cat {} | $(go env GOPATH)/bin/gojsontoyaml > {}.yaml; rm -f {}' --
'{}'\n+ /home/pirate/go/bin/jsonnet -J vendor -m manifests main.jsonnet\nRUNTIME
ERROR: couldn't open import \"arm_exporter.jsonnet\": no match locally or in the
Jsonnet library paths\n\tvars.jsonnet:16:13-42\tobject <anonymous>\n\tmain.jsonnet:9:34-45\tthunk
from <thunk from <thunk <kp> from <$>>>\n\tutils.libsonnet:20:35-41\tthunk from
<function <aux>>\n\tutils.libsonnet:20:9-42\tfunction <aux>\n\tutils.libsonnet:21:5-21\tfunction
<anonymous>\n\tmain.jsonnet:9:14-92\tthunk <kp> from <$>\n\tmain.jsonnet:18:86-88\t\n\t<std>:1293:24-25\tthunk
from <function <anonymous>>\n\t<std>:1293:5-33\tfunction <anonymous>\n\tmain.jsonnet:18:69-104\t$\n\t\t\n\t\t\n\tDuring
evaluation\t\n\nmake: *** [Makefile:19: manifests] Error 1\n"
stderr_lines:
- "+ set -o pipefail"
- "+ rm -rf manifests"
- "+ mkdir -p manifests/setup"
- "+ xargs '-I{}' sh -c 'cat {} | $(go env GOPATH)/bin/gojsontoyaml > {}.yaml; rm
-f {}' -- '{}'"
- "+ /home/pirate/go/bin/jsonnet -J vendor -m manifests main.jsonnet"
- 'RUNTIME ERROR: couldn''t open import "arm_exporter.jsonnet": no match locally or
in the Jsonnet library paths'
- "\tvars.jsonnet:16:13-42\tobject <anonymous>"
- "\tmain.jsonnet:9:34-45\tthunk from <thunk from <thunk <kp> from <$>>>"
- "\tutils.libsonnet:20:35-41\tthunk from <function <aux>>"
- "\tutils.libsonnet:20:9-42\tfunction <aux>"
- "\tutils.libsonnet:21:5-21\tfunction <anonymous>"
- "\tmain.jsonnet:9:14-92\tthunk <kp> from <$>"
- "\tmain.jsonnet:18:86-88\t"
- "\t<std>:1293:24-25\tthunk from <function <anonymous>>"
- "\t<std>:1293:5-33\tfunction <anonymous>"
- "\tmain.jsonnet:18:69-104\t$"
- "\t\t"
- "\t\t"
- "\tDuring evaluation\t"
- ''
- 'make: *** [Makefile:19: manifests] Error 1'
stdout: |
rm -rf manifests
./scripts/build.sh main.jsonnet /home/pirate/go/bin/jsonnet
using jsonnet from arg
stdout_lines:
- rm -rf manifests
- "./scripts/build.sh main.jsonnet /home/pirate/go/bin/jsonnet"
- using jsonnet from arg
Hello @carlosedp ,
Can you shed some light on rpi_up? Is it a metric? How does it work?
After pulling the latest commit and building, I noticed the prometheus-operator
is stuck in a CrashLoopBackOff
and the container log shows:
level=info ts=2020-05-23T23:29:30.5438805Z caller=operator.go:293 component=thanosoperator msg="connection established" cluster-version=v1.17.5+k3s1
ts=2020-05-23T23:29:30.73898204Z caller=main.go:304 msg="Unhandled error received. Exiting..." err="getting CRD: Alertmanager: customresourcedefinitions.apiextensions.k8s.io \"alertmanagers.monitoring.coreos.com\" is forbidden: User \"system:serviceaccount:monitoring:prometheus-operator\" cannot get resource \"customresourcedefinitions\" in API group \"apiextensions.k8s.io\" at the cluster scope"
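The "forbidden" error above means the operator's service account is missing the ClusterRole that lets it read CRDs — typically because the RBAC manifests were not (re-)applied after an update. A minimal sketch of the rule it needs, assuming the upstream kube-prometheus resource names (verify against the manifests this repo generates before applying anything):

```yaml
# Sketch only: the CRD-read rule the prometheus-operator service account needs.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-operator
rules:
  - apiGroups: ["apiextensions.k8s.io"]
    resources: ["customresourcedefinitions"]
    verbs: ["get", "list", "watch"]
```

In practice, re-applying the full set of generated manifests (including `manifests/setup`) usually restores the missing binding.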
I noticed the node-exporter seems to be giving back node IPs instead of Kubernetes Pod IPs, so when the dashboard is displayed in Grafana, I see my node DNS names, like worker-01, worker-02, etc.
The CPU temperature monitor data, though, uses Pod IPs instead of node IPs, so the data is a little harder to discern, since I have to manually map Pod IPs to the nodes those Pods are running on:
Hello,
How can I change the path where the database is stored? I would like to change it to my secondary HDD.
Hi,
First of all, thanks for your work. I tested your manifests with my local k3s cluster, and while it is not a perfect fit, it works better than the standard prometheus-operator, which is geared towards full k8s.
I noticed that all services are exposed via TLS through the ingress, but connecting to the HTTP version of a service does not automatically redirect to HTTPS. This should be configurable via a setting on the ingress, I guess.
Hey,
First of all, thanks for this nice work. I'm running it on my RPi4 cluster with HypriotOS and K3s without issues; it worked on the first try.
Unfortunately the Alerts show me KubeControllerManagerDown
and KubeSchedulerDown
Is this expected behavior?
Additionally I have a question about your blogpost and one particular screenshot:
https://miro.medium.com/max/1400/1*zp4bS5omhxoLxbC4xGh5vQ.png
There it shows all the processes and their percentage of CPU usage. For me it shows only one graph with Value | 21% | 14%. Is this a limitation of HypriotOS, ARM, or k3s, or did I forget something?
I noticed while debugging #39 that the arm-exporter
DaemonSet was only running on 6 out of 7 Pi nodes. It was not running on the master node.
The master has the following taint:
Taints: k3s-controlplane=true:NoExecute
But that doesn't stop the node-exporter
DaemonSet from deploying a Pod there:
# kubectl get ds -n monitoring
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
node-exporter 7 7 7 7 7 kubernetes.io/os=linux 27m
arm-exporter 6 6 6 6 6 beta.kubernetes.io/arch=arm 37m
The arch is arm
on all 7 Pis, so I'm not sure why the selector might influence the DS deployment.
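The difference is most likely tolerations, not the arch selector: the upstream node-exporter DaemonSet ships with a toleration that matches any taint, while the arm-exporter does not, so the NoExecute taint on the master evicts only the latter. A hedged sketch of the toleration the arm-exporter pod template would need (field placement follows the standard DaemonSet spec):

```yaml
# Sketch: tolerate the k3s master taint so arm-exporter schedules there too.
spec:
  template:
    spec:
      tolerations:
        - key: k3s-controlplane
          operator: Exists
          effect: NoExecute
```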
Hey,
I was wondering if there was a way or any steps one could take to reduce the CPU load of the Prometheus Pod. Maybe I'm also doing something wrong, not too sure.
I've got a K3s cluster with one RPI 4 4GB running as the master and 3 3B+ as workers. When I deploy the monitoring stack (following @geerlingguy 's tutorial) to my cluster most of the pods get scheduled to the master node.
In effect, the master node's CPU usage is constantly spiking from 20% up to 50% (sometimes higher), I assume every time Prometheus scrapes, and in fact Prometheus is the cause of almost all of this load.
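Two levers that usually help are relaxing the scrape interval and pinning Prometheus off the master. A hedged jsonnet overlay sketch, assuming kube-prometheus-style field names (verify the exact structure against this repo's main.jsonnet, and substitute a real node name):

```jsonnet
// Sketch (assumed kube-prometheus overlay conventions): reduce scrape
// frequency and keep Prometheus on a worker to lower master CPU spikes.
{
  prometheus+:: {
    prometheus+: {
      spec+: {
        scrapeInterval: '60s',  // default is typically 30s
        nodeSelector: { 'kubernetes.io/hostname': 'worker-01' },  // hypothetical node name
      },
    },
  },
}
```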
Hello,
Thanks for getting all of these images together! It's been a lot more challenging to track down all the right images for my cluster than I initially thought.
I tried following the non-k3s quickstart guide to deploy the monitoring stack on my cluster (Pi 4s running Raspbian Buster Lite, full k8s set up with kubeadm) and receive the following PVC-related errors for both the prometheus-k8s-0 pod and the grafana pod:
running "VolumeBinding" filter plugin for pod "prometheus-k8s-0": pod has unbound immediate PersistentVolumeClaims
running "VolumeBinding" filter plugin for pod "grafana-759f594549-5mrsj": pod has unbound immediate PersistentVolumeClaims
I tried setting up the volumes manually but am not able to get around the PVC issue described. Both pods stay Pending until they can attach to their volumes. Is there a way to bypass the plugin used to create the PVs and do it manually? Is it possible to run the plugin on its own to create the required volumes? I haven't worked with the filter plugin described in the log before, so there could be something simple I am missing as well.
I re-made and re-deployed the manifests once; I'm not sure yet what else to try. I also don't see the PVs for either pod initialized in the cluster.
I could be missing something obvious in the setup, so let me know if there are any likely culprits.
Please let me know if there is any other information I can provide that would be helpful.
Thanks!
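"Unbound immediate PersistentVolumeClaims" on a bare kubeadm cluster usually means there is no default StorageClass, so nothing provisions volumes for the generated claims. One workaround is to create PVs by hand that the claims can bind to. A hedged sketch (the name, size, and path are placeholders; match the capacity and access mode to what the generated PVC actually requests):

```yaml
# Sketch: a manually created hostPath PV for a claim when no dynamic
# provisioner exists. Adjust storage size and path to your setup.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-pv
spec:
  capacity:
    storage: 20Gi
  accessModes: ["ReadWriteOnce"]
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /data/prometheus
```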
Hi, the stack looks great. I've been following a tutorial here: https://kauri.io/deploy-prometheus-and-grafana-to-monitor-a-kube/186a71b189864b9ebc4ef7c8a9f0a6b5/a
However, I am not able to deploy this using macOS Catalina with a brew-installed Go.
After installing Go using brew, I set the PATH:
brew install go
export PATH=$PATH:/usr/local/Cellar/go/1.14.2_1/bin/
make vendor finishes, but make deploy gives me this error
make
rm -rf manifests
./scripts/build.sh main.jsonnet /usr/local/Cellar/go/1.14.2_1//bin/jsonnet
using jsonnet from arg
+ set -o pipefail
+ rm -rf manifests
+ mkdir manifests
+ /usr/local/Cellar/go/1.14.2_1//bin/jsonnet -J vendor -m manifests main.jsonnet
+ xargs '-I{}' sh -c 'cat {} | gojsontoyaml > {}.yaml; rm -f {}' -- '{}'
RUNTIME ERROR: vars.jsonnet:1:1-10 Unknown variable: master_ip
main.jsonnet:2:14-35 thunk <vars> from <$>
main.jsonnet:11:60-64
utils.libsonnet:21:9-13 thunk from <function <anonymous>>
utils.libsonnet:17:26-29 thunk from <function <aux>>
utils.libsonnet:17:15-30 function <aux>
utils.libsonnet:21:5-21 function <anonymous>
main.jsonnet:11:14-92 thunk <kp> from <$>
main.jsonnet:21:81-83
<std>:1278:24-25 thunk from <function <anonymous>>
<std>:1278:5-33 function <anonymous>
main.jsonnet:21:64-99 $
During evaluation
make: *** [manifests] Error 1
As I am a Go noob, any help would be appreciated.
Cheers!
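The "Unknown variable: master_ip" error typically means vars.jsonnet is from an older checkout and is missing the k3s block that the current main.jsonnet expects. A hedged fragment of the expected shape (the IP is a placeholder; use your master's address):

```jsonnet
// Sketch: the k3s block newer versions of main.jsonnet look for in vars.jsonnet.
{
  k3s: {
    enabled: true,
    master_ip: ['192.168.1.2'],  // placeholder
  },
}
```

Regenerating vars.jsonnet from the current repo's template and re-applying your customizations is usually the cleanest fix.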
The metrics for the pods in some namespaces are missing. Only the metrics for the kube-system and metallb namespaces are present (and even those lack Network I/O). When I check the CPU and memory usage using kubectl top pod foo
I can confirm that the pods are using memory and CPU.
How can I debug the problem?
Hello, I run this with HA k3s, but the dashboards always show NONE instead of graphs.
Is this compatible with a cluster that has 2 or more masters?
I configured the master_ip in vars.jsonnet, and I also enabled k3s, the persistent volumes, and the domain suffix. My k3s master has the IP 192.168.1.2 and the worker 192.168.1.4. I then ran make vendor, make, and make deploy; all pods are running, but for some reason I cannot access Grafana, Prometheus, or Alertmanager. So I ran kubectl get ingress --all-namespaces and the result was the following. Is there anything wrong with the steps I have performed?
NAMESPACE NAME CLASS HOSTS ADDRESS PORTS AGE
monitoring alertmanager-main alertmanager.192.168.1.2.nip.io 192.168.1.4 80, 443
monitoring grafana grafana.192.168.1.2.nip.io 192.168.1.4 80, 443 12s
monitoring prometheus-k8s prometheus.192.168.1.2.nip.io 192.168.1.4 80, 443
A convenient enhancement would be support for dynamic persistent storage: storageClassName instead of volumeName.
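The request above amounts to letting the generated claims reference a StorageClass rather than a pre-bound volume. Conceptually, a claim would look like this sketch (the class name is a placeholder for whatever provisioner the cluster runs):

```yaml
# Sketch: dynamic provisioning via storageClassName instead of a fixed volumeName.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-k8s-db
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: ceph-block   # placeholder class name
  resources:
    requests:
      storage: 20Gi
```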
Hi,
I followed the installation process and the "make vendor" step. Everything worked.
Then I started to run plain "make" and got the following error:
$ make
rm -rf manifests
./scripts/build.sh main.jsonnet
+ set -o pipefail
+ rm -rf manifests
+ mkdir manifests
+ jsonnet -J vendor -m manifests main.jsonnet
+ xargs '-I{}' sh -c 'cat {} | gojsontoyaml > {}.yaml; rm -f {}' -- '{}'
./scripts/build.sh: line 15: jsonnet: command not found
Makefile:12: recipe for target 'manifests' failed
make: *** [manifests] Error 127
Here is my OS info:
$ cat /etc/os-release
PRETTY_NAME="Raspbian GNU/Linux 9 (stretch)"
NAME="Raspbian GNU/Linux"
VERSION_ID="9"
VERSION="9 (stretch)"
ID=raspbian
ID_LIKE=debian
HOME_URL="http://www.raspbian.org/"
SUPPORT_URL="http://www.raspbian.org/RaspbianForums"
BUG_REPORT_URL="http://www.raspbian.org/RaspbianBugs"
HYPRIOT_OS="HypriotOS/armhf"
HYPRIOT_OS_VERSION="v2.0.1"
HYPRIOT_DEVICE="Raspberry Pi"
HYPRIOT_IMAGE_VERSION="v1.9.0"
Here is my k8s info:
Client Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.2", GitCommit:"17c77c7898218073f14c8d573582e8d2313dc740", GitTreeState:"clean", BuildDate:"2018-10-24T06:54:59Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/arm"}
Server Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.2", GitCommit:"17c77c7898218073f14c8d573582e8d2313dc740", GitTreeState:"clean", BuildDate:"2018-10-24T06:43:59Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/arm"}
What am I doing wrong?
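"jsonnet: command not found" means the jsonnet binary isn't on PATH. Assuming you install it with `go get` (as the repo's tooling does for its other helpers), it lands in Go's bin directory, which the build script can only find if that directory is exported. A small sketch that appends it and verifies:

```shell
# Add Go's bin directory (where `go get` installs binaries such as jsonnet
# and gojsontoyaml) to PATH, then count how many PATH entries match it.
GOBIN="${GOPATH:-$HOME/go}/bin"
export PATH="$PATH:$GOBIN"
echo "$PATH" | tr ':' '\n' | grep -c "$GOBIN"
```

The final command prints at least 1 once the directory is on PATH; add the export to your shell profile to make it permanent.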
I attempted to deploy to my Raspberry Pi cluster running k3s, and I am unable to reach the dashboards. I thought maybe something was wrong with my k3s setup, so I re-deployed k3s, but I am still seeing the same issue.
{
...
modules: [
{
// After deployment, run the create_gmail_auth.sh script from scripts dir.
name: 'smtpRelay',
enabled: false,
file: import 'smtp_relay.jsonnet',
},
{
name: 'armExporter',
enabled: true,
file: import 'arm_exporter.jsonnet',
},
{
name: 'upsExporter',
enabled: false,
file: import 'ups_exporter.jsonnet',
},
{
name: 'metallbExporter',
enabled: false,
file: import 'metallb.jsonnet',
},
{
name: 'traefikExporter',
enabled: true,
file: import 'traefik.jsonnet',
},
{
name: 'elasticExporter',
enabled: false,
file: import 'elasticsearch_exporter.jsonnet',
},
],
k3s: {
enabled: true,
master_ip: ['<my-ip>'],
},
// Domain suffix for the ingresses
suffixDomain: '<my-ip>.nip.io',
// If TLSingress is true, a self-signed HTTPS ingress with redirect will be created
TLSingress: true,
...
}
servicemonitor.monitoring.coreos.com/traefik created
The Secret "ingress-TLS-secret" is invalid: metadata.name: Invalid value: "ingress-TLS-secret": a DNS-1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')
make: *** [deploy] Error 1
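The failure above is a naming rule, not a TLS problem: Kubernetes object names must be valid DNS-1123 subdomains, which forbid uppercase letters, so "ingress-TLS-secret" is rejected. Lower-casing the name wherever the jsonnet emits and references it satisfies the regex; minimally:

```yaml
# Sketch: the secret name must be all lower case to pass DNS-1123 validation.
metadata:
  name: ingress-tls-secret
```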
I have noticed that the kube-controller-manager
and the kube-scheduler
disappears from the list of targets in prometheus
after 12-24 hours. The metrics endpoints are still available, though.
I have tried restarting the Prometheus container, but to no avail. The only solution so far is to re-apply the manifest files.
Thanks for a great repo!
I have tried this project on my Raspberry Pi 4 ARM64 cluster running k3s on Ubuntu 19.10.
When I ran without persistent storage everything seemed to work great. I then tried to deploy with persistent storage on my Ceph cluster publishing block storage, and now Grafana fails to launch with the error below.
GF_PATHS_DATA='/var/lib/grafana' is not writable.
You may have issues with file permissions, more information here: http://docs.grafana.org/installation/docker/#migration-from-a-previous-version-of-the-docker-container-to-5-1-or-later
mkdir: can't create directory '/var/lib/grafana/plugins': Permission denied
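"GF_PATHS_DATA is not writable" on a freshly provisioned block volume usually means the filesystem is owned by root while Grafana runs as a non-root user. The common fix is an fsGroup in the pod's securityContext so Kubernetes chowns the mount. A hedged sketch (472 is the uid/gid of the upstream Grafana image; verify it matches the ARM image this stack deploys):

```yaml
# Sketch: give the Grafana pod group write access to the mounted volume.
spec:
  securityContext:
    runAsUser: 472
    fsGroup: 472
```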
Below is my vars.jsonnet file
{
_config+:: {
namespace: 'monitoring',
},
// Enable or disable additional modules
modules: [
{
// After deployment, run the create_gmail_auth.sh script from scripts dir.
name: 'smtpRelay',
enabled: false,
file: import 'smtp_relay.jsonnet',
},
{
name: 'armExporter',
enabled: true,
file: import 'arm_exporter.jsonnet',
},
{
name: 'upsExporter',
enabled: false,
file: import 'ups_exporter.jsonnet',
},
{
name: 'metallbExporter',
enabled: false,
file: import 'metallb.jsonnet',
},
{
name: 'traefikExporter',
enabled: true,
file: import 'traefik.jsonnet',
},
{
name: 'elasticExporter',
enabled: false,
file: import 'elasticsearch_exporter.jsonnet',
},
],
k3s: {
enabled: true,
master_ip: ['192.168.5.41'],
},
// Domain suffix for the ingresses
suffixDomain: 'example.com',
// If TLSingress is true, a self-signed HTTPS ingress with redirect will be created
TLSingress: true,
// If UseProvidedCerts is true, provided files will be used on created HTTPS ingresses.
// Use a wildcard certificate for the domain like ex. "*.192.168.99.100.nip.io"
UseProvidedCerts: false,
TLSCertificate: importstr 'server.crt',
TLSKey: importstr 'server.key',
// Setting these to false, defaults to emptyDirs
enablePersistence: {
prometheus: true,
grafana: true,
},
// Grafana "from" email
grafana: {
from_address: '[email protected]',
},
}
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Failed 11m (x12279 over 44h) kubelet, ubuntuapp02 Error: secret "smtp-account" not found
Just wanted to say thank you for this project Carlos.
As an amateur with a bunch of PIs trying to learn and setup a cluster this was invaluable to me.
Thank you
Thank you for the awesome project.
I ran the ./deploy
script but just have one issue trying to bring everything up. The prometheus-k8s-0 and grafana pods seem to be stuck pending.
...
monitoring grafana-68c8dcd4dd-bv7ng 0/1 Pending 0 17m
monitoring prometheus-k8s-0 0/2 Pending 0 15s
monitoring prometheus-operator-c65785d89-2xkn4 1/1 Running 0 10m
...
relevant kubectl get events -n monitoring
5m 12m 26 prometheus-k8s-0.1534b4e7ea27ebb6 Pod Warning FailedScheduling default-scheduler pod has unbound PersistentVolumeClaims (repeated 4 times)
38s 3m 10 prometheus-k8s-0.1534b562df4c937a Pod Warning FailedScheduling default-scheduler pod has unbound PersistentVolumeClaims (repeated 4 times)
1m 12m 43 prometheus-k8s-db-prometheus-k8s-0.1534b4e7214bad48 PersistentVolumeClaim Normal FailedBinding persistentvolume-controller no persistent volumes available for this claim and no storage class is set
12m 12m 1 prometheus-k8s.1534b4e721f847d6 StatefulSet Normal SuccessfulCreate statefulset-controller create Claim prometheus-k8s-db-prometheus-k8s-0 Pod prometheus-k8s-0 in StatefulSet prometheus-k8s success
12m 12m 9 prometheus-k8s.1534b4e72367ee8e StatefulSet Warning FailedCreate statefulset-controller create Pod prometheus-k8s-0 in StatefulSet prometheus-k8s failed error: pods "prometheus-k8s-0" is forbidden: error looking up service account monitoring/prometheus-k8s: serviceaccount "prometheus-k8s" not found
3m 12m 2 prometheus-k8s.1534b4e7ea855e82 StatefulSet Normal SuccessfulCreate statefulset-controller create Pod prometheus-k8s-0 in StatefulSet prometheus-k8s successful
3m 3m 1 prometheus-k8s.1534b562e3e10eeb StatefulSet Warning FailedCreate statefulset-controller create Pod prometheus-k8s-0 in StatefulSet prometheus-k8s failed error: The POST operation against Pod could not be completed at this time, please try again.
Any thoughts on what might be going on? I see that it wants a persistent volume, any way around that?
It seems like node_namespace_pod_container
is missing, I can't really pinpoint why. Is it K3s related?
Referenced issue
prometheus-operator/kube-prometheus#284
Is there a way to "easily" point to a PV/PVC in order to have real persistent storage in my k3s cluster?
Sorry, it's not an issue but a question. Maybe a feature request :)
PS: Is there a way to easily uninstall all of it without deleting all resources one by one? Thank you!
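On the uninstall question: since everything is generated into the manifests/ directory, deleting by directory is usually enough (assuming the generated tree from `make` is still present; these commands need kubectl access to the cluster):

```
kubectl delete -f manifests/ --ignore-not-found
kubectl delete -f manifests/setup/ --ignore-not-found
```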
When I try running make vendor
after customizing my vars.jsonnet
file, I keep getting:
$ make vendor
Installing jsonnet-bundler
rm -rf vendor
/bin/jb install
/bin/sh: 1: /bin/jb: not found
make: *** [Makefile:26: vendor] Error 127
It seems like the path /bin/jb
is hardcoded, but when it installs jsonnet-bundler
it is running with my local GOPATH, which is ~/go
, so jb
is installed in ~/go/bin/jb
and not in the global /bin
dir.
Can the Makefile be updated to just call jb
instead? I have added the Go bin path to my user's $PATH
as well, but since the /bin/jb
location is hardcoded, I have to manually add a symlink or install as root, which is a little strange.
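The suggestion above can be sketched as a Makefile change that resolves jb from GOPATH instead of a hardcoded path (the variable and target names here are assumptions, not the repo's actual Makefile):

```
# Sketch: locate jb where `go get` actually installs it.
JB := $(shell go env GOPATH)/bin/jb

vendor:
	rm -rf vendor
	$(JB) install
```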