
pharos-cluster's Introduction

Pharos Cluster


Pharos Cluster is a Kontena Pharos (Kubernetes distribution) management tool. It handles cluster bootstrapping, upgrades and other maintenance tasks via SSH connections and Kubernetes API access.

Installation

Download binaries

The binary packages are available on the releases page.

Build and install Ruby gem

You need Ruby version 2.5.

$ gem build pharos-cluster.gemspec
$ gem install pharos-cluster*.gem
$ pharos --help

Usage

See documentation.

Further Information

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/kontena/pharos-cluster.

pharos-cluster's People

Contributors

antonyveyre, captncraig, iahmad-khan, jakolehm, jnummelin, kke, madddi, miskun, nevalla, olanystrom, scottrobertson, spaletta, spcomb, techknowlogick, timer


pharos-cluster's Issues

Host labels

Option to add labels to hosts via cluster.yml.
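
A hypothetical cluster.yml sketch of what this could look like (the labels key does not exist yet; all names here are illustrative):

hosts:
  - address: 10.0.0.10
    role: worker
    # hypothetical: labels applied to the node object after it joins
    labels:
      disk: ssd
      zone: eu-west-1a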

Nodes should be configured in parallel

Optimize kupo up times by handling nodes in parallel.

All usages of Kupo::SSH::Client.for_host and Kupo::Kube.client need to be made threadsafe. It's enough to use a single SSH client for each host and run all the phases per node in serial, but things like Kupo::Phases::JoinNode and Kupo::Phases::LabelNode can't share the master node's SSH/kube clients if they run in parallel.
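
A minimal sketch of the per-host parallelism, assuming each host gets a dedicated client instead of the shared Kupo::SSH::Client.for_host cache (the constructor and disconnect calls are assumptions, not the current API):

hosts.map do |host|
  Thread.new do
    begin
      # dedicated per-thread client instead of the shared for_host cache (assumed API)
      ssh = Kupo::SSH::Client.new(host)
      phases.each { |phase| phase.new(host, ssh: ssh).call } # phases stay serial per host
    ensure
      ssh&.disconnect
    end
  end
end.each(&:join)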

Backup add-on

Would be nice to get some backup set up automatically as an add-on.

Not sure if we'd need to back up anything other than etcd?

External etcd support

If we support an external etcd cluster, then we should be able to support a multi-master config quite easily.

Related #12
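
A hypothetical cluster.yml sketch of an external etcd section (none of these keys exist yet; they are illustrative):

etcd:
  endpoints:
    - https://10.0.0.2:2379
    - https://10.0.0.3:2379
  certificate: ./etcd/client.pem
  key: ./etcd/client-key.pem
  ca_certificate: ./etcd/ca.pem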

heapster/metrics-server fail to auth against the secure kubelet API

Per #76 the kubelet --read-only-port is disabled and heapster/metrics-server are configured to talk to the authenticated kubelet https port: --source=kubernetes.summary_api:https://kubernetes.default.svc?kubeletHttps=true&kubeletPort=10250&useServiceAccount=true

However, the kubelet auth/authz is configured by kubeadm using kubelet --authorization-mode=Webhook --client-ca-file=..., which means that the kubelet ignores any bearer token and expects a TLS client cert for auth: https://kubernetes.io/docs/admin/kubelet-authentication-authorization/#kubelet-authentication

This means that the heapster/metrics-server currently attempt to connect to the kubelet API using the kube API in-cluster-config (serviceaccount token + kube CA), which fails because the kubelet uses a self-signed cert:

E0320 10:59:05.041917       1 manager.go:101] Error in scraping containers from kubelet_summary:167.99.136.175:10250: Get https://...:10250/stats/summary/: x509: cannot validate certificate for ... because it doesn't contain any IP SANs

With ?insecure=true, the kubelet API authorization fails, because it does not support the serviceaccount bearer token for authentication:

E0320 11:35:05.036030       1 manager.go:101] Error in scraping containers from kubelet_summary:...:10250: request failed - "403 Forbidden", response: "Forbidden (user=system:anonymous, verb=get, resource=nodes, subresource=stats)"

I think heapster/metrics-server will need to be provisioned with a custom client cert and kubeconfig in order to use the secure kubelet API, which unfortunately also requires disabling certificate verification for the kube API, at least until the kubelets get server certs signed from kube: kubernetes-retired/heapster#1498 (comment)
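
A rough sketch of the kind of kubeconfig that would need to be provisioned for heapster/metrics-server (paths and names are illustrative, not an existing file):

apiVersion: v1
kind: Config
clusters:
- name: kubelet
  cluster:
    server: https://kubernetes.default.svc
    # self-signed kubelet serving certs cannot be verified yet
    insecure-skip-tls-verify: true
users:
- name: heapster
  user:
    client-certificate: /etc/heapster/tls/client.crt
    client-key: /etc/heapster/tls/client.key
contexts:
- name: kubelet
  context:
    cluster: kubelet
    user: heapster
current-context: kubelet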

SSH exec errors are not handled

Hacking some debug logging into Kupo::SSH::Client demonstrates this with the current git master: Kupo::Phases::ConfigureKubelet is broken because the /etc/systemd/system/kubelet.service.d directory is missing (it is no longer created by the kubeadm package), yet the phase continues anyway.

    Configuring kubelet ...
SSH[167.99.36.141] exec sudo cat /etc/systemd/system/kubelet.service.d/5-kupo.conf...
SSH[167.99.36.141] exec stderr: cat: 
SSH[167.99.36.141] exec stderr: /etc/systemd/system/kubelet.service.d/5-kupo.conf
SSH[167.99.36.141] exec stderr: : No such file or directory
SSH[167.99.36.141] exec stderr: 
SSH[167.99.36.141] exec sudo cat /etc/systemd/system/kubelet.service.d/5-kupo.conf: exit 1
SSH[167.99.36.141] exec sudo mv /tmp/ec2ec6e56c8ed816047e25b7e8fe6b7e /etc/systemd/system/kubelet.service.d/5-kupo.conf...
SSH[167.99.36.141] exec stderr: mv: 
SSH[167.99.36.141] exec stderr: cannot move '/tmp/ec2ec6e56c8ed816047e25b7e8fe6b7e' to '/etc/systemd/system/kubelet.service.d/5-kupo.conf'
SSH[167.99.36.141] exec stderr: : No such file or directory
SSH[167.99.36.141] exec stderr: 
SSH[167.99.36.141] exec sudo mv /tmp/ec2ec6e56c8ed816047e25b7e8fe6b7e /etc/systemd/system/kubelet.service.d/5-kupo.conf: exit 1
SSH[167.99.36.141] exec sudo systemctl daemon-reload...
SSH[167.99.36.141] exec sudo systemctl daemon-reload: exit 0
SSH[167.99.36.141] exec sudo systemctl restart kubelet...
SSH[167.99.36.141] exec sudo systemctl restart kubelet: exit 0

Some phases explicitly check the exit_code = @ssh.exec(...) return value for errors, but omit the stderr output, making these failures impossible to diagnose:

    Kubernetes control plane is not initialized, proceeding to initialize ...
    Initializing control plane ...
SSH[167.99.36.141] exec sudo kubeadm init --config /tmp/kubeadm.cfg.744c53f1bea5d9a16a7a81ba6d282171...
SSH[167.99.36.141] exec stdout: [init] Using Kubernetes version: v1.9.3
SSH[167.99.36.141] exec stdout: [init] Using Authorization modes: [Node RBAC]
SSH[167.99.36.141] exec stdout: [preflight] Running pre-flight checks.
SSH[167.99.36.141] exec stderr: 	[WARNING FileExisting-crictl]: crictl not found in system path
SSH[167.99.36.141] exec stderr: [preflight] Some fatal errors occurred:
	[ERROR Port-10250]: Port 10250 is in use
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
SSH[167.99.36.141] exec sudo kubeadm init --config /tmp/kubeadm.cfg.744c53f1bea5d9a16a7a81ba6d282171: exit 2
Initialization of control plane failed!
/home/kontena/kontena/kupo/lib/kupo/phases/configure_master.rb:61:in `install'
/home/kontena/kontena/kupo/lib/kupo/phases/configure_master.rb:18:in `call'
/home/kontena/kontena/kupo/lib/kupo/up_command.rb:137:in `handle_masters'
/home/kontena/kontena/kupo/lib/kupo/up_command.rb:42:in `block in configure'
/home/kontena/kontena/kupo/lib/kupo/up_command.rb:41:in `chdir'
/home/kontena/kontena/kupo/lib/kupo/up_command.rb:41:in `configure'
/home/kontena/kontena/kupo/lib/kupo/up_command.rb:26:in `execute'
/home/kontena/kontena/kupo/vendor/bundle/ruby/2.3.0/gems/clamp-1.2.1/lib/clamp/command.rb:63:in `run'
/home/kontena/kontena/kupo/vendor/bundle/ruby/2.3.0/gems/clamp-1.2.1/lib/clamp/subcommand/execution.rb:11:in `execute'
/home/kontena/kontena/kupo/vendor/bundle/ruby/2.3.0/gems/clamp-1.2.1/lib/clamp/command.rb:63:in `run'
/home/kontena/kontena/kupo/vendor/bundle/ruby/2.3.0/gems/clamp-1.2.1/lib/clamp/command.rb:132:in `run'
/home/kontena/kontena/kupo/lib/kupo/root_command.rb:13:in `run'
bin/kupo:12:in `<top (required)>'
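
A minimal sketch of an exec! helper that raises with the captured stderr, assuming exec is extended to return the streams alongside the exit code (today it returns only the code):

module Kupo
  module SSH
    class ExecError < StandardError; end

    class Client
      # Raise on failure, carrying stderr so the error is diagnosable.
      def exec!(cmd)
        exit_code, stdout, stderr = exec(cmd) # assumed extended return shape
        raise ExecError, "#{cmd}: exit #{exit_code}\n#{stderr}" unless exit_code.zero?
        stdout
      end
    end
  end
end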

ERb YAML templates should have the .erb extension

  • Without the extension, code editors will detect it as YAML and will not highlight ERb code blocks
  • The file may not be valid YAML with the ERb code intact, making something like YAML.load(File.read('xyz.yml')) fail.
  • Having the extension makes it simple to conditionally interpolate only when necessary: the glob should therefore be *.{yml,yml.erb}, and the reader should only run ERb interpolation when the extension is present (see the sketch after this list).
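
A minimal sketch of such a reader (the resource path and variables are illustrative):

require 'erb'

# Only run ERb interpolation for files that opt in via the .erb extension.
def read_resource(path, vars = {})
  raw = File.read(path)
  return raw unless path.end_with?('.erb')

  ERB.new(raw).result_with_hash(vars) # result_with_hash requires Ruby >= 2.5
end

Dir.glob('resources/*.{yml,yml.erb}').each do |path|
  puts read_resource(path, node_selector: { disk: 'ssd' })
end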

Ingress-nginx erb template is broken

/Users/jari/.rubies/ruby-2.4.3/lib/ruby/2.4.0/erb.rb:896:in `eval': (erb):18: syntax error, unexpected ';' (SyntaxError)
; - if node_selector -; _erbout.concat "\n      nodeS
                       ^
(erb):20: syntax error, unexpected keyword_do_block, expecting end-of-input
; - node_selector.each do |key, value| -; _erbout.conca
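
The compiled source in the error suggests the template uses <%- and -%> trim markers while ERB is instantiated without a trim mode, so the - leaks into the generated Ruby. If that is the cause, passing the trim mode fixes it (a sketch against the Ruby 2.4 ERB signature; the path is illustrative):

require 'erb'

template = File.read('resources/ingress-nginx/daemon_set.yml.erb')
# Ruby 2.4 signature: ERB.new(str, safe_level, trim_mode);
# '-' enables the <%- and -%> trim markers.
puts ERB.new(template, nil, '-').result(binding)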

Script executions do not fail the entire process

Cri-o setup is really flaky since the repos are failing. Fixing those is a separate issue, but it reveals another bug we have: when the cri-o setup script fails, we still continue the setup process as if the setup had gone through:

 Configuring container runtime (cri-o) packages ...
    + mkdir -p /etc/systemd/system/crio.service.d
    + cat
    + apt-get install -y cri-o-1.9
    Reading package lists...    
    Building dependency tree...    
    Reading state information...    
    The following additional packages will be installed:
      cri-o-runc dirmngr gnupg-agent gnupg2 libassuan0 libgpgme11 libksba8
      libnpth0 pinentry-curses skopeo-containers
    Suggested packages:
      containernetworking-plugins gnupg-doc parcimonie xloadimage gpgsm
      pinentry-doc
    The following NEW packages will be installed:
      cri-o-1.9 cri-o-runc dirmngr gnupg-agent gnupg2 libassuan0 libgpgme11
      libksba8 libnpth0 pinentry-curses skopeo-containers
    0 upgraded, 11 newly installed, 0 to remove and 12 not upgraded.
    Need to get 9,903 kB of archives.
    After this operation, 43.4 MB of additional disk space will be used.
    Get:1 http://ppa.launchpad.net/projectatomic/ppa/ubuntu xenial/main amd64 cri-o-runc amd64 1.0.0-rc4.5-2~ubuntu16.04.2~ppa1 [1,524 kB]
    Get:2 http://archive.ubuntu.com/ubuntu xenial/main amd64 libassuan0 amd64 2.4.2-2 [34.6 kB]
    Get:3 http://archive.ubuntu.com/ubuntu xenial/main amd64 pinentry-curses amd64 0.9.7-3 [31.2 kB]
    Get:4 http://archive.ubuntu.com/ubuntu xenial/main amd64 libnpth0 amd64 1.2-3 [7,998 B]
    Get:5 http://ppa.launchpad.net/projectatomic/ppa/ubuntu xenial/main amd64 skopeo-containers amd64 0.1.28-2~ubuntu16.04.2~ppa5 [3,788 B]
    Err:5 http://ppa.launchpad.net/projectatomic/ppa/ubuntu xenial/main amd64 skopeo-containers amd64 0.1.28-2~ubuntu16.04.2~ppa5
      Hash Sum mismatch
    Get:6 http://ppa.launchpad.net/projectatomic/ppa/ubuntu xenial/main amd64 cri-o-1.9 amd64 1.9.10-1~ubuntu16.04.2~ppa1 [6,874 kB]
    Get:7 http://archive.ubuntu.com/ubuntu xenial/main amd64 gnupg-agent amd64 2.1.11-6ubuntu2 [239 kB]
    Get:8 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 libksba8 amd64 1.3.3-1ubuntu0.16.04.1 [90.2 kB]
    Get:9 http://archive.ubuntu.com/ubuntu xenial/main amd64 gnupg2 amd64 2.1.11-6ubuntu2 [756 kB]
    Get:10 http://archive.ubuntu.com/ubuntu xenial/main amd64 libgpgme11 amd64 1.6.0-1 [108 kB]
    Get:11 http://archive.ubuntu.com/ubuntu xenial/main amd64 dirmngr amd64 2.1.11-6ubuntu2 [235 kB]
    Fetched 9,902 kB in 2s (3,470 kB/s)
    E: Failed to fetch http://ppa.launchpad.net/projectatomic/ppa/ubuntu/pool/main/s/skopeo/skopeo-containers_0.1.28-2~ubuntu16.04.2~ppa5_amd64.deb  Hash Sum mismatch
    E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?
    Failed to execute configure-cri-o.sh
    Configuring kubelet ...
...

IMO these setup scripts failing should terminate the whole process immediately.
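
A minimal sketch of how the script runner could abort instead, reusing the exec! idea from the SSH exec errors issue above (the upload helper is an assumption):

# Upload and run a setup script; a non-zero exit raises and ends the run.
def exec_script!(ssh, script)
  remote = "/tmp/#{File.basename(script)}"
  ssh.upload(script, remote)       # assumed upload helper
  ssh.exec!("sudo bash #{remote}") # raises ExecError on failure
end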

Use static binary for kubeadm

I saw a mention in the kubeadm docs that distro-packaged kubeadm is not supported for upgrades (I cannot find that page anymore). We need to investigate whether this is still true and change the installation/upgrade logic if needed.

Add option to set Let's Encrypt issuer (cert-manager)

Resource file example:

apiVersion: certmanager.k8s.io/v1alpha1
kind: Issuer
metadata:
  name: letsencrypt
  namespace: default
spec:
  acme:
    # The ACME server URL
    server: <%= server %>
    # Email address used for ACME registration
    email: <%= email %>
    # Name of a secret used to store the ACME account private key
    privateKeySecretRef:
      name: letsencrypt
    # Enable the HTTP-01 challenge provider
    http01: {}
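
A hypothetical cluster.yml sketch for the option, mapping onto the <%= server %> and <%= email %> template variables above (key names are made up for illustration):

addons:
  cert-manager:
    enabled: true
    le-issuer:
      # fills <%= server %>; Let's Encrypt staging shown here
      server: https://acme-staging.api.letsencrypt.org/directory
      # fills <%= email %>
      email: ops@example.com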

TLS client certs used for service authentication must be rotated before expiry

#121 provisions the heapster and metrics-server stacks with TLS client certs managed via kube secrets and csr resources. Those certificates will have a validity period of one year, after which the metrics services would stop working.

Pharos needs support for automatically rotating those certs before they expire. That could be implemented within kupo as part of the same Pharos::Kube::CertManager used to provision the initial certs, but this would now require pharos to be run periodically to keep the cluster functional.

Alternatively, the client cert management could be moved to a custom controller running within the cluster itself... unless kube itself gains support for managing TLS client certs for pod serviceaccounts, or kubelet starts supporting serviceaccount token auth?
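
For the in-kupo approach, a minimal sketch of the expiry check that would gate re-provisioning (everything except the OpenSSL calls is illustrative):

require 'openssl'

ROTATE_BEFORE = 30 * 24 * 60 * 60 # renew 30 days before expiry (seconds)

# True when the client cert stored in the kube secret should be re-issued.
def rotation_due?(cert_pem)
  cert = OpenSSL::X509::Certificate.new(cert_pem)
  cert.not_after - Time.now < ROTATE_BEFORE
end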

Consider following the kube camelCase key convention in addons

Example use case:

  1. User wants to run pods on some nodes only
  2. User googles around and ends up on kube docs
  3. The doc mentions nodeSelector but in Kupo::Addons::IngressNginx it is (after #27) node_selector.

The convention in kube seems to be camelCase, which works fine as Ruby symbols:

{ nodeSelector: { someThing: 'value' } }

Using a different keying convention only adds to the confusion.
