Kube Remediator

Remediators

Reschedules Pods in CrashLoopBackOff to fix permanent crashes caused by a stale init container, sidecar, or ConfigMap

  • Listens to Pod update events and does a Pod list
  • Looks for containers in CrashLoopBackOff with restartCount > 5 (configurable via failureThreshold)
  • Ignores Pods with annotation kube-remediator/CrashLoopBackOffRemediator: "false"
  • Can be limited to a single namespace; the default "" means all namespaces (configurable via namespace)
  • Ignores Pods without ownerReferences (avoids deleting something that would not come back)
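
The filters above can be sketched in Go roughly as follows. This is a minimal illustration, not the project's actual code: the Pod and ContainerStatus types are simplified, hypothetical stand-ins for the Kubernetes API objects, and the function name shouldReschedule is made up for this example.

```go
package main

import "fmt"

// ContainerStatus is a hypothetical, simplified stand-in for the
// container status fields the checks read.
type ContainerStatus struct {
	WaitingReason string // e.g. "CrashLoopBackOff"; empty when not waiting
	RestartCount  int32
}

// Pod is a hypothetical, simplified stand-in for a Kubernetes Pod.
type Pod struct {
	Namespace       string
	Annotations     map[string]string
	OwnerReferences []string // names of owners; empty means unmanaged
	Containers      []ContainerStatus
}

const optOutAnnotation = "kube-remediator/CrashLoopBackOffRemediator"

// shouldReschedule applies the documented filters in order:
// namespace, opt-out annotation, ownerReferences, then restart count.
func shouldReschedule(pod Pod, namespace string, failureThreshold int32) bool {
	if namespace != "" && pod.Namespace != namespace {
		return false // restricted to a single namespace
	}
	if pod.Annotations[optOutAnnotation] == "false" {
		return false // Pod opted out
	}
	if len(pod.OwnerReferences) == 0 {
		return false // nothing would recreate the Pod
	}
	for _, c := range pod.Containers {
		if c.WaitingReason == "CrashLoopBackOff" && c.RestartCount > failureThreshold {
			return true
		}
	}
	return false
}

func main() {
	pod := Pod{
		Namespace:       "default",
		OwnerReferences: []string{"my-replicaset"},
		Containers:      []ContainerStatus{{WaitingReason: "CrashLoopBackOff", RestartCount: 6}},
	}
	fmt.Println(shouldReschedule(pod, "", 5)) // prints: true
}
```

Checking ownerReferences before deleting is the safety net here: a Pod without an owner has no controller to recreate it.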

Deletes Pods that carry the label kube-remediator/OldPodDeleter=true and are older than 24h
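
As a rough sketch, the label and age check might look like this in Go. The function name isOldPod and the maxAge constant are hypothetical names chosen for illustration; only the label key and the 24h limit come from the description above.

```go
package main

import (
	"fmt"
	"time"
)

const oldPodLabel = "kube-remediator/OldPodDeleter"

// maxAge mirrors the 24h limit from the description (hypothetical constant name).
const maxAge = 24 * time.Hour

// isOldPod reports whether a Pod has opted in via the label
// and has existed for longer than maxAge.
func isOldPod(labels map[string]string, age time.Duration) bool {
	if labels[oldPodLabel] != "true" {
		return false // only labelled Pods are considered
	}
	return age > maxAge
}

func main() {
	now := time.Date(2024, 1, 2, 12, 0, 0, 0, time.UTC)
	createdAt := now.Add(-25 * time.Hour)
	labels := map[string]string{oldPodLabel: "true"}
	fmt.Println(isOldPod(labels, now.Sub(createdAt))) // prints: true
}
```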

Reschedules Failed Pods by deleting them, since they are not automatically cleaned up.

  • Listens to Pod update events and does a Pod list
  • Finds Pods in Failed status with reason OutOfCpu or OutofMemory
  • Ignores Pods without ownerReferences (avoids deleting something that would not come back)
  • Ignores Pods owned by Jobs, because those can be cleaned up automatically
  • Deletes Pods in Failed status only after 5 minutes, leaving time to debug
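
A minimal Go sketch of these rules, under stated assumptions: FailedPod is a hypothetical simplified type, shouldDeleteFailed and gracePeriod are illustrative names, and the reason strings are copied from the list above as written (the exact spelling a cluster reports may differ).

```go
package main

import (
	"fmt"
	"time"
)

// FailedPod is a hypothetical, simplified view of the Pod fields used here.
type FailedPod struct {
	Phase      string   // status.phase, e.g. "Failed"
	Reason     string   // status.reason
	OwnerKinds []string // kind of each ownerReference, e.g. "ReplicaSet", "Job"
}

// gracePeriod mirrors the 5 minute debugging window (hypothetical constant name).
const gracePeriod = 5 * time.Minute

// Reasons as spelled in the description above; treat the spelling as an assumption.
var remediableReasons = map[string]bool{"OutOfCpu": true, "OutofMemory": true}

// shouldDeleteFailed reports whether a Failed Pod should be deleted
// so that its owner can schedule a replacement.
func shouldDeleteFailed(pod FailedPod, failedFor time.Duration) bool {
	if pod.Phase != "Failed" || !remediableReasons[pod.Reason] {
		return false
	}
	if len(pod.OwnerKinds) == 0 {
		return false // nothing would recreate the Pod
	}
	for _, kind := range pod.OwnerKinds {
		if kind == "Job" {
			return false // Job Pods can be cleaned up automatically
		}
	}
	return failedFor >= gracePeriod
}

func main() {
	pod := FailedPod{Phase: "Failed", Reason: "OutOfCpu", OwnerKinds: []string{"ReplicaSet"}}
	fmt.Println(shouldDeleteFailed(pod, 10*time.Minute)) // prints: true
	fmt.Println(shouldDeleteFailed(pod, 1*time.Minute))  // prints: false
}
```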

Deletes Pods that have been in Completed status for more than 24h.

Unbound PersistentVolumeClaim cleaner (TODO)

Deletes PersistentVolumeClaims left behind by deleted StatefulSets, which are not otherwise cleaned up automatically

  • Waits for 7 days (configurable) before deleting
  • Ignores claims whose PersistentVolume has persistentVolumeReclaimPolicy set to Retain
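
Since this remediator is still a TODO, the following Go sketch only illustrates the decision described above and is not taken from the project. The Claim type, the shouldDeleteClaim function, and the defaultWait constant are all hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// Claim is a hypothetical, simplified view of a PersistentVolumeClaim
// together with the facts the cleaner would need to look up.
type Claim struct {
	StatefulSetExists bool   // whether the StatefulSet that created the claim still exists
	ReclaimPolicy     string // persistentVolumeReclaimPolicy of the bound PersistentVolume
}

// defaultWait mirrors the 7 day default from the description (hypothetical constant name).
const defaultWait = 7 * 24 * time.Hour

// shouldDeleteClaim reports whether an orphaned claim is due for deletion.
func shouldDeleteClaim(c Claim, orphanedFor time.Duration) bool {
	if c.StatefulSetExists {
		return false // claim is still in use
	}
	if c.ReclaimPolicy == "Retain" {
		return false // the volume is explicitly marked to be kept
	}
	return orphanedFor >= defaultWait
}

func main() {
	claim := Claim{StatefulSetExists: false, ReclaimPolicy: "Delete"}
	fmt.Println(shouldDeleteClaim(claim, 8*24*time.Hour)) // prints: true
}
```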

Deploy

kubectl apply -f kubernetes/rbac.yaml
kubectl apply -f kubernetes/app-server.yml

Configuration options:

  • Deploy the provided image to use the defaults under config/*
  • Build a new image FROM the provided image and add or remove files under config/*
  • Overwrite config/* with a mounted ConfigMap

Development

Boot Option A:

Run in a local Kubernetes cluster with docker-for-mac:

rake server

Boot Option B:

Run against a local Kubernetes cluster with Go:

unset GOPATH
go mod vendor # install into local directory instead of global path
make dev # run on cluster from $KUBECONFIG (defaults to ~/.kube/config)

Test

  • Run unit tests: make test
  • Run a single suite: go test -run TestSuiteFailedPodRescheduler github.com/aksgithub/kube_remediator/pkg/remediator
  • Run a single test: comment out all other tests in the suite and run the suite. TODO: improve.

# CrashLoopBackOffRemediator: is the pod rescheduled after restarting 5 times?
kubectl apply -f examples/crashloop_pod.yml

# OldPodDeleter: is the pod deleted once it is 24h old? (best to change the 24h in the code to 1min)
kubectl apply -f examples/old_pod.yml

Note: a failed expectation in one test can cause other tests to fail. Run only one test when debugging.
