oomhero's Introduction

OOMHero

OOMHero is a sidecar that helps you keep track of your containers' memory usage. Once deployed, it sends two signals to your container as memory usage grows: a warning signal and a critical signal. By handling these signals you may be able to defeat the deadly OOMKiller.

How it works

This sidecar sends your container two signals: one when memory usage crosses the so-called warning threshold (SIGUSR1 by default) and one when it crosses the critical threshold (SIGUSR2 by default). You can use different signals by setting the appropriate environment variables. Your application must deal with these signals by implementing signal handlers.

You can see here an example of how to capture the signals in Go.

On limits

If the Pod specifies only requests, no signal is sent: this sidecar operates only on limits.
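The sketch below illustrates why a limit is needed: a percentage threshold only makes sense relative to some maximum. It is an illustration, not OOMHero's actual code. The `level` function, the use of 0 to mean "no limit", and the `>=` comparison are all assumptions.

```go
package main

import "fmt"

// level reports which threshold a container has crossed.
// usage and limit are in bytes; warning and critical are percentages.
// A limit of 0 stands for "no memory limit set" (an assumption of this sketch).
func level(usage, limit, warning, critical uint64) string {
	if limit == 0 {
		return "none" // only requests were set: nothing to compare against
	}
	pct := usage * 100 / limit // integer percentage, rounded down
	switch {
	case pct >= critical:
		return "critical"
	case pct >= warning:
		return "warning"
	default:
		return "none"
	}
}

func main() {
	const mi = 1 << 20 // one mebibyte in bytes
	fmt.Println(level(100*mi, 256*mi, 65, 90)) // none (39%)
	fmt.Println(level(200*mi, 256*mi, 65, 90)) // warning (78%)
	fmt.Println(level(240*mi, 256*mi, 65, 90)) // critical (93%)
	fmt.Println(level(240*mi, 0, 65, 90))      // none (no limit)
}
```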

Deployment example

The Pod below is composed of two distinct containers. The first one is called bloat and its purpose is (as the name implies) to simulate a memory leak by constantly allocating memory in a global variable. The sidecar is an OOMHero configured to send SIGUSR1 (warning) when bloat reaches 65% of its memory limit and SIGUSR2 (critical) at 90%. The only prerequisite is that both containers share the same process namespace, hence shareProcessNamespace is set to true.

apiVersion: v1
kind: Pod
metadata:
  name: oomhero
spec:
  shareProcessNamespace: true
  containers:
    - name: bloat
      image: quay.io/rmarasch/bloat:latest
      imagePullPolicy: Always
      livenessProbe:
        periodSeconds: 3
        failureThreshold: 1
        httpGet:
          path: /healthz
          port: 8080
      resources:
        requests:
          memory: "256Mi"
          cpu: "250m"
        limits:
          memory: "256Mi"
          cpu: "250m"
    - name: oomhero
      image: quay.io/rmarasch/oomhero:latest
      imagePullPolicy: Always
      securityContext:
        privileged: true
      env:
      - name: WARNING
        value: "65"
      - name: CRITICAL
        value: "90" 

Save the YAML above into a file (pod.yaml here) and deploy it:

$ kubectl create -f ./pod.yaml

That creates a Pod with two containers. You can follow the memory consumption and the signals being sent by inspecting the logs of both containers.

$ # for bloat container log
$ kubectl logs -f oomhero --container bloat
$ # for oomhero container log
$ kubectl logs -f oomhero --container oomhero 

Configuring signals

Signals supported by OOMHero are:

  • SIGABRT
  • SIGCONT
  • SIGHUP
  • SIGINT
  • SIGIOT
  • SIGKILL
  • SIGQUIT
  • SIGSTOP
  • SIGTERM
  • SIGTSTP
  • SIGUSR1
  • SIGUSR2

To use any of those signals instead of the default ones, set the WARNING_SIGNAL and CRITICAL_SIGNAL environment variables to specify the warning and critical signals respectively. If those environment variables are not set, OOMHero uses the default values (SIGUSR1 and SIGUSR2).

For instance, to send SIGTERM when the critical threshold is reached, put the following in the pod or deployment definition:

containers:
  # other containers omitted for brevity
  - name: oomhero
    image: quay.io/rmarasch/oomhero
    imagePullPolicy: Always
    env:
    - name: WARNING
      value: "65"
    - name: CRITICAL
      value: "90"
    - name: CRITICAL_SIGNAL
      value: "SIGTERM"

Cooldown

By default OOMHero sends one signal per second to other processes once they reach the warning or critical threshold. This may be undesirable in some circumstances, so a cooldown can be configured. Once set, each signal type is sent at most once per cooldown period, and the two types are tracked separately. In other words, other processes receive at most one warning and one critical signal per cooldown period.

To configure the cooldown, set the COOLDOWN environment variable in the deployment definition to a value accepted by Go's time.ParseDuration:

containers:
  # other containers omitted for brevity
  - name: oomhero
    image: quay.io/rmarasch/oomhero
    imagePullPolicy: Always
    env:
    - name: COOLDOWN
      value: "1m30s"

Help needed

Official documentation states that the SYS_PTRACE capability is mandatory when signaling between containers in the same Pod. I could not validate whether this is true, as it works without it on my K8S cluster. If you had to add this capability to make it work, please let me know.

oomhero's People

Contributors

ricardomaraschini, szarykott, vdepatla

oomhero's Issues

Test with resources different than limits

I have only tested this with resource limits and requests set to the same values. We need to run a simulation to see how the cgroup files are populated when limits and requests differ.

Properly validate Warning and Critical

We need to check if Warning and Critical are properly set. Rules are very simple:

  1. Warning and Critical must be lower than or equal to 100;
  2. Warning must be lower than or equal to Critical.

If conditions are not met, use the default values and notify through log.
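The two rules and the fallback could be sketched as below. This illustrates the proposal only and is not code from the project. The `validate` function is a hypothetical name, and the default values passed in `main` are placeholders, since the defaults are not stated here.

```go
package main

import (
	"fmt"
	"log"
)

// validate applies the two rules above. If either is violated it logs
// the problem and returns the supplied defaults instead.
func validate(warning, critical, defWarning, defCritical uint64) (uint64, uint64) {
	if warning > 100 || critical > 100 || warning > critical {
		log.Printf("invalid thresholds warning=%d critical=%d, using defaults %d/%d",
			warning, critical, defWarning, defCritical)
		return defWarning, defCritical
	}
	return warning, critical
}

func main() {
	// 75 and 90 are placeholder defaults for this sketch only.
	fmt.Println(validate(65, 90, 75, 90))  // 65 90
	fmt.Println(validate(95, 90, 75, 90))  // 75 90 (warning above critical)
	fmt.Println(validate(65, 120, 75, 90)) // 75 90 (critical above 100)
}
```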

Option to reduce or turn off the logs?

This works well but generates too many logs in the output.
Is there an option to reduce the log output or completely turn off the logging functionality?
Preferably through some flag or something similar.

Configurable signals

First of all - thank you for this work, this is the only solution on the internet I found to solve my problems!

However, there is an improvement that would facilitate seamless integration with the existing setup at my company - configurable signals that are sent to other processes. As of now, they are hardcoded as SIGUSR1 and SIGUSR2. To take advantage of them we need to implement logic in the application to handle them, whereas what we want to achieve is simply gracefully shutting down the application - and for that, SIGTERM would be perfect.

To preserve backward compatibility and flexibility, the signal to use could be provided via an environment variable or command-line argument, with defaults equal to the currently hardcoded signals.

I am willing to code this and make a pull request, but I want to know your opinion first. Please let me know what you think!
