
kube-green's Introduction


How many of your dev/preview pods stay on during weekends? Or at night? It's a waste of resources! And money! But fear not, kube-green is here to the rescue.

kube-green is a simple k8s addon that automatically shuts down (some of) your resources when you don't need them.

If you already use kube-green, add yourself as an adopter!

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. See how to install the project on a live system in our docs.

Prerequisites

Make sure you have Go installed (download). Version 1.19 or higher is required.

Installation

To get kube-green running locally, clone this repository and install the dependencies by running:

go get
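
A minimal end-to-end sketch of the local setup (the clone URL is derived from the project's Go module path, github.com/kube-green/kube-green):

git clone https://github.com/kube-green/kube-green.git
cd kube-green
go get    # or: go mod download, on newer Go toolchains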

Running the tests

There are different types of tests in this repository.

It is possible to run all the unit tests with

make test

To run the integration (e2e) tests:

make e2e-test

Deployment

To deploy kube-green in live systems, follow the docs.

For development purposes, you can use ko to deploy kube-green in a KinD cluster. Start a KinD cluster with kind create cluster --name kube-green-development, then deploy kube-green with ko by running:

make local-run clusterName=kube-green-development
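
Putting the two steps together, a minimal local development flow looks like this (the cluster name is only an example):

kind create cluster --name kube-green-development
make local-run clusterName=kube-green-development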

Usage

Using this operator is very simple. Once it is installed on the cluster, configure the desired CRD to make it work.

See the documentation here for details on configuring the CRD.

CRD Examples

Keep pods running during working hours (Europe/Rome timezone), suspend CronJobs, and exclude a Deployment named api-gateway:

apiVersion: kube-green.com/v1alpha1
kind: SleepInfo
metadata:
  name: working-hours
spec:
  weekdays: "1-5"
  sleepAt: "20:00"
  wakeUpAt: "08:00"
  timeZone: "Europe/Rome"
  suspendCronJobs: true
  excludeRef:
    - apiVersion: "apps/v1"
      kind:       Deployment
      name:       api-gateway

Put pods to sleep every night, without restoring them:

apiVersion: kube-green.com/v1alpha1
kind: SleepInfo
metadata:
  name: working-hours-no-wakeup
spec:
  sleepAt: "20:00"
  timeZone: Europe/Rome
  weekdays: "*"

To see other examples, go to our docs.

Contributing

Please read CONTRIBUTING.md for details on our code of conduct, and the process for submitting pull requests to us.

Versioning

We use SemVer for versioning. For the versions available, see the releases on this repository.

How to upgrade the version

To upgrade the version:

  1. make release version=v{{NEW_VERSION_TO_TAG}}, where {{NEW_VERSION_TO_TAG}} should be replaced with the next version to tag. N.B.: the version must include v as its first character.
  2. git push --tags origin v{{NEW_VERSION_TO_TAG}}
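
For example, to tag a hypothetical v0.6.0 release (the version number is purely illustrative):

make release version=v0.6.0
git push --tags origin v0.6.0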

API Reference documentation

The API reference is automatically generated with this tool. To generate it, a doc.go file with the content of groupversion_info.go is added to the versioned api folder, and a +genclient comment is added in the sleepinfo_types.go file for the resource type.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgement

Special thanks to JGiola for the tech review.

Give a Star! ⭐

If you like or are using this project, please give it a star. Thanks!

Adopters

Here is the list of adopters of kube-green.

If you already use kube-green, add yourself as an adopter!

kube-green's People

Contributors

davidebianchi, davidkarlsen, dependabot[bot], dirk39, furkansb, imdmahajankanika, jgiola, msfidelis, nsavelyeva, pscanf, silversoul93, victorboissiere


kube-green's Issues

Scale to 1

How can I scale a Deployment to 1 or 2 instead of zero? Would a PDB help?

Request: ARM64 support

In order to run this on Raspberry Pi clusters we would need arm64/armhf support.
Is this something you can add to the build chain?

Stop nodes

Is it possible to stop nodes too?

Say we want to stop nodes at night. Is this possible with kube-green?

Support for countdown schedules

I feel there is a scenario not supported yet that would be great to have.
It would be interesting to support a countdown schedule where we can scale a deployment to 0 after x days/hours.
If a new deployment occurs, the scheduled sleepAt time is restarted.

I see this as useful in dev environments or PoC scenarios where nobody uses them after some time.

Just a possible example: scale to 0 after 15 days.

No wakeUpAt
sleepAt: */15 *:*

Is this something you would consider?

Is kube-green supported on k8s versions 1.24.9 or higher?

I would like to know whether kube-green is supported on k8s versions > 1.24, as I see some errors while deploying the sleepInfo controller on AKS version 1.24.9, whereas the same sleepInfo yaml worked fine without any issues on 1.23.12.

sleepInfo.yaml:

apiVersion: kube-green.com/v1alpha1
kind: SleepInfo
metadata:
  name: working-hours
spec:
  weekdays: "*"
  timeZone: "Europe/Rome"
  sleepAt: "10:52"
  wakeUpAt: "10:54"

Below is the error log from the kube-green controller:


2023-03-07T09:29:34Z    ERROR   controllers.SleepInfo   unable to fetch sleepInfo       {"sleepinfo": "default/working-hours", "error": "SleepInfo.kube-green.com \"working-hours\" not found"}
github.com/kube-green/kube-green/controllers/sleepinfo.(*SleepInfoReconciler).Reconcile
        /workspace/controllers/sleepinfo/sleepinfo_controller.go:80
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:122
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:323
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:274
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:235

I found in the documentation (https://kube-green.dev/docs/install/#prerequisite) that it currently supports >= 1.19 and <= 1.24; however, I want to know when kube-green will be available on k8s 1.24.9.

error when creating sleep.yaml

I'm getting this error message when I try to deploy a SleepInfo (the vsleepinfo.kb.io webhook fails):
Error from server (InternalError): error when creating "sleep.yaml": Internal error occurred: failed calling webhook "vsleepinfo.kb.io": Post "https://kube-green-webhook-service.kube-green.svc:443/validate-kube-green-com-v1alpha1-sleepinfo?timeout=10s": context deadline exceeded

sleep.yaml:

apiVersion: kube-green.com/v1alpha1
kind: SleepInfo
metadata:
  name: sleepinfo-sample
spec:
  weekdays: "1-5"
  sleepAt: "20:00"
  wakeUpAt: "08:00"
  timeZone: "Europe/Rome"
  suspendCronJobs: true
  excludeRef:
    - apiVersion: "apps/v1"
      kind: Deployment
      name: main-api

kubernetes cluster version: 1.21.10-gke.400
cert manager: 1.0.4 also tried latest version 1.7.1

What am I missing?

Make issue template

  • An issue template makes it easier for folks who want to open an issue.
  • It differentiates the various types of issues, such as Code, Bug, Docs, Others.
  • It also follows community standards.

Kube-green "skipping" deploy restoring at "wakeUpAt" time

Hi,
I started using kube-green in a cluster but I'm noticing a behavior I'm not able to explain for sure.
In one of the namespaces I configured, the sleepAt time is getting respected while the following wakeUpAt is not.

  • kubernetes version 1.23
  • kube-green version 0.4.1
  • I have multiple namespaces
  • not all of the namespaces have a SleepInfo CRD (as we do not need it)
  • the setting works in all namespaces except one
  • such a namespace has a high number of deployments compared to the others
  • In the logs I'm not seeing any error
  • In the problematic namespace I have the following SleepInfo resource
apiVersion: kube-green.com/v1alpha1
kind: SleepInfo
metadata:
  name: working-hours
spec:
  weekdays: "1-5"
  sleepAt: "20:00"
  wakeUpAt: "08:10"
  timeZone: "Europe/Rome"
  suspendDeployments: true
  suspendCronJobs: false
  • This is what I see in logs near the time the SleepInfo is supposed to trigger:
08:11:32.357 - INFO	controllers.SleepInfo	skip execution	{"sleepinfo": "back/working-hours", "now": 1675149092.3573985, "next run": 1675235400, "requeue": "23h58m27.642692068s"}

Is it possible that kube-green skips the execution if it's not able to retrieve from the cluster all the necessary information within a specific timeout?
From a quick look, it looks to me like there's a timeout of 60 seconds in the code; is that applied to the above evaluation?

Thanks

Support a hierarchy of namespaces

It would be really useful to define a hierarchy of namespaces and start/stop them with kube-green.
This would help to shut down large applications that may be maintained by more than one team.

Improve controller deploy

The controller is deployed using kustomize. To do so, it is necessary to download the kube-green repo and run the make deploy command.

Possible alternatives:

  • olm - #87
  • helm chart
  • a simple command to install the controller without downloading the repository

Sleep with TTL

As part of our testing process, we create a namespace with all microservices and 3rd parties for each developer.
This namespace has a 72-hour TTL; past that, the namespace is destroyed automatically by a cronjob via the K8s API.
Sometimes 72 hours is more than the developer needs to check the namespace, so we would like to put this namespace to sleep after 48 hours using kube-green and kill it after 72 hours.
This deletion can also be done manually, but that requires the developer to log in to the cluster, which ends up leaving unused namespaces around as a result of laziness.

Is this possible using kube-green?

Make test raises an error: controller-gen: No such file or directory

Hi! I just cloned the repo and I followed the README.md instructions.
When I ran the make test command, I received the following error:
bash: /Users/giulioroggero/sourcecode/kube-green/bin/controller-gen: No such file or directory

  • go: v1.18.3
  • os: macOS 12.4
  • make: v3.81

The complete make test output follows:

➜  kube-green git:(main) go version
go version go1.18.3 darwin/arm64
➜  kube-green git:(main) go get
➜  kube-green git:(main) make test
go: creating new go.mod: module tmp
Downloading sigs.k8s.io/controller-tools/cmd/[email protected]
go: added github.com/fatih/color v1.12.0
go: added github.com/go-logr/logr v0.4.0
go: added github.com/gobuffalo/flect v0.2.3
go: added github.com/gogo/protobuf v1.3.2
go: added github.com/google/go-cmp v0.5.6
go: added github.com/google/gofuzz v1.1.0
go: added github.com/inconshreveable/mousetrap v1.0.0
go: added github.com/json-iterator/go v1.1.10
go: added github.com/mattn/go-colorable v0.1.8
go: added github.com/mattn/go-isatty v0.0.12
go: added github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd
go: added github.com/modern-go/reflect2 v1.0.1
go: added github.com/spf13/cobra v1.1.3
go: added github.com/spf13/pflag v1.0.5
go: added golang.org/x/mod v0.4.2
go: added golang.org/x/net v0.0.0-20210428140749-89ef3d95e781
go: added golang.org/x/sys v0.0.0-20210510120138-977fb7262007
go: added golang.org/x/text v0.3.6
go: added golang.org/x/tools v0.1.3
go: added golang.org/x/xerrors v0.0.0-20200804184101-5ec99f83aff1
go: added gopkg.in/inf.v0 v0.9.1
go: added gopkg.in/yaml.v2 v2.4.0
go: added gopkg.in/yaml.v3 v3.0.0-20210107192922-496545a6307b
go: added k8s.io/api v0.21.2
go: added k8s.io/apiextensions-apiserver v0.21.2
go: added k8s.io/apimachinery v0.21.2
go: added k8s.io/klog/v2 v2.8.0
go: added k8s.io/utils v0.0.0-20201110183641-67b214c5f920
go: added sigs.k8s.io/controller-tools v0.6.1
go: added sigs.k8s.io/structured-merge-diff/v4 v4.1.0
go: added sigs.k8s.io/yaml v1.2.0
/Users/giulioroggero/sourcecode/kube-green/bin/controller-gen "crd:trivialVersions=true,preserveUnknownFields=false" rbac:roleName=manager-role webhook paths="./..." output:crd:artifacts:config=config/crd/bases
bash: /Users/giulioroggero/sourcecode/kube-green/bin/controller-gen: No such file or directory
make: *** [manifests] Error 127

end-to-end test fails - failed to pull image

Hi there!

I'm trying to run the end-to-end test

kubectl kuttl test --skip-cluster-delete

on my MacBook with podman and kind, but it fails because the image kubegreen/kube-green:e2e-test is not found. Please find below the events of the controller kube-green-controller-manager:

Events:
  Type     Reason          Age                    From     Message
  ----     ------          ----                   ----     -------
  Normal   BackOff         9d (x6507 over 10d)    kubelet  Back-off pulling image "kubegreen/kube-green:e2e-test"
  Warning  FailedMount     27m                    kubelet  MountVolume.SetUp failed for volume "cert" : failed to sync secret cache: timed out waiting for the condition
  Normal   SandboxChanged  27m                    kubelet  Pod sandbox changed, it will be killed and re-created.
  Normal   Pulled          27m                    kubelet  Container image "gcr.io/kubebuilder/kube-rbac-proxy:v0.12.0" already present on machine
  Normal   Created         27m                    kubelet  Created container kube-rbac-proxy
  Normal   Started         27m                    kubelet  Started container kube-rbac-proxy
  Normal   Pulling         27m (x3 over 27m)      kubelet  Pulling image "kubegreen/kube-green:e2e-test"
  Warning  Failed          26m (x3 over 27m)      kubelet  Failed to pull image "kubegreen/kube-green:e2e-test": rpc error: code = NotFound desc = failed to pull and unpack image "docker.io/kubegreen/kube-green:e2e-test": failed to resolve reference "docker.io/kubegreen/kube-green:e2e-test": docker.io/kubegreen/kube-green:e2e-test: not found
  Warning  Failed          26m (x3 over 27m)      kubelet  Error: ErrImagePull
  Warning  Failed          26m (x5 over 27m)      kubelet  Error: ImagePullBackOff
  Normal   BackOff         2m33s (x109 over 27m)  kubelet  Back-off pulling image "kubegreen/kube-green:e2e-test"

I checked on https://hub.docker.com/u/kubegreen and the image is not available.

Add exclusion deployment rule

In some cases, it could be necessary to always have one or more services up and running.

An example of such a service could be an api-gateway, which resolves with a courtesy page.

Add combination of sleepAt options in one resource

Hi, I was thinking about a way to combine more SleepInfo rules in only one resource.

Right now, to turn off a namespace, I have to create a SleepInfo resource defining spec.weekdays and spec.sleepAt.
If I want to combine rules in a more articulated way to make my namespace sleep, I have to create more SleepInfos.

In my opinion, it would be nice to have the possibility to use only one SleepInfo manifest to manage this purpose, maybe having an array sleepInfoRules, that contains the different sleepAt values defined for different weekdays.

For example

spec:
  sleepInfoRules:
    - weekdays: "1-5"
      sleepAt: "20:00"
      wakeUpAt: "07:00"
    - weekdays: "6-7"
      sleepAt: "00:00"   

Allow selecting excluded workloads with labels/annotations

Provide a way for users to exclude deployments by adding a label (or annotation) to them instead of listing them in SleepInfo. This would be easier for clusters with a global policy (see #167), still letting users exclude their workloads from the scale down.

How to just wake up

My question is genuine. Let me describe one scenario.
Suppose I create a SleepInfo to sleep over the weekend and wake up on weekdays, but because of some urgent situation I want to bring the resources up again during the weekend. I know I can change sleepAt (to run for some time over the weekend), but then I have to change sleepAt back to the previous (sleep for weekends) configuration. That is really painful if these changes to sleepAt can only be applied through CI/CD, where someone from the organisation has to approve and merge them (as with IaC, where the preview is checked and then merged per environment).

What is the best approach?

non-unique selectors

Installed from OLM v0.4.0.

When I apply sleepinfo I get:

k apply -f sleep.yaml 
Error from server (InternalError): error when creating "sleep.yaml": Internal error occurred: failed calling webhook "vsleepinfo.kb.io": failed to call webhook: Post "https://kube-green-controller-manager-service.openshift-operators.svc:443/validate-kube-green-com-v1alpha1-sleepinfo?timeout=10s": dial tcp 10.200.6.18:9443: connect: connection refused
svc:

k get svc -n openshift-operators -o wide|grep -i green 
kube-green-controller-manager-metrics-service                 ClusterIP   10.201.156.118   <none>        8443/TCP   30m   control-plane=controller-manager
kube-green-controller-manager-service                         ClusterIP   10.201.246.36    <none>        443/TCP    30m   control-plane=controller-manager
kube-green-webhook-service                                    ClusterIP   10.201.200.165   <none>        443/TCP    30m   control-plane=controller-manager

controller logs:

k logs kube-green-controller-manager-c7d45f7d9-j7rm8
I0929 08:15:06.504306       1 request.go:601] Waited for 1.043179934s due to client-side throttling, not priority and fairness, request: GET:https://10.201.0.1:443/apis/batch/v1?timeout=32s
1.6644393097184074e+09  INFO    controller-runtime.metrics      Metrics server is starting to listen    {"addr": "127.0.0.1:8080"}
1.664439309718671e+09   INFO    controller-runtime.builder      skip registering a mutating webhook, object does not implement admission.Defaulter or WithDefaulter wasn't called       {"GVK": "kube-green.com/v1alpha1, Kind=SleepInfo"}
1.664439309718695e+09   INFO    controller-runtime.builder      Registering a validating webhook        {"GVK": "kube-green.com/v1alpha1, Kind=SleepInfo", "path": "/validate-kube-green-com-v1alpha1-sleepinfo"}
1.6644393097187605e+09  INFO    controller-runtime.webhook      Registering webhook     {"path": "/validate-kube-green-com-v1alpha1-sleepinfo"}
1.664439309718829e+09   INFO    setup   starting manager
1.6644393097190156e+09  INFO    controller-runtime.webhook.webhooks     Starting webhook server
1.664439309719225e+09   INFO    controller-runtime.certwatcher  Updated current TLS certificate
1.6644393097193158e+09  INFO    controller-runtime.webhook      Serving webhook server  {"host": "", "port": 9443}
1.6644393097193944e+09  INFO    Starting server {"path": "/metrics", "kind": "metrics", "addr": "127.0.0.1:8080"}
1.664439309719424e+09   INFO    Starting server {"kind": "health probe", "addr": "[::]:8081"}
1.6644393097194986e+09  INFO    controller-runtime.certwatcher  Starting certificate watcher
I0929 08:15:09.719558       1 leaderelection.go:248] attempting to acquire leader lease openshift-operators/2bd226ed.kube-green.com...
I0929 08:15:09.805968       1 leaderelection.go:258] successfully acquired lease openshift-operators/2bd226ed.kube-green.com
1.6644393098061695e+09  DEBUG   events  Normal  {"object": {"kind":"Lease","namespace":"openshift-operators","name":"2bd226ed.kube-green.com","uid":"6fd7faf6-386e-4f05-9d3d-2fbca6b1274a","apiVersion":"coordination.k8s.io/v1","resourceVersion":"995652687"}, "reason": "LeaderElection", "message": "kube-green-controller-manager-c7d45f7d9-j7rm8_5c57e8db-f880-4e84-9c1d-ac092562c620 became leader"}
1.6644393098063145e+09  INFO    Starting EventSource    {"controller": "sleepinfo", "controllerGroup": "kube-green.com", "controllerKind": "SleepInfo", "source": "kind source: *v1alpha1.SleepInfo"}
1.6644393098063338e+09  INFO    Starting Controller     {"controller": "sleepinfo", "controllerGroup": "kube-green.com", "controllerKind": "SleepInfo"}
1.6644393109079242e+09  INFO    Starting workers        {"controller": "sleepinfo", "controllerGroup": "kube-green.com", "controllerKind": "SleepInfo", "worker count": 10}

Need example of sleepInfo for full weekend stop.

Hello, we need to turn off our workloads during the weekend, from Friday evening at 21:00 until Monday at 06:00. The rest of the time, our workloads must run non-stop from Monday to Friday. We have tested different settings but we can't find a configuration that allows us to do this. Could you help us, and at the same time enrich the documentation with this type of configuration? Thank you for your help.

Error suspending executed cronjobs

Suspension of CronJobs which have been executed is failing with the error the object has been modified; please apply your changes to the latest version and try again.
The reason is that DeepCopy() copies all the fields of the object, including uid, creationTimestamp, etc.
I have prepared a fix for this and tested it locally. I would like to open a PR, please let me know if I may proceed.

Steps to reproduce:

  1. Have a namespace with at least one cronjob which has already been executed
  2. Within that namespace apply SleepInfo to suspend cronjobs and wait until it gets executed

Expected result:

All cronjobs become suspended

Actual result:

The cronjobs that have been executed will not be suspended (all the other cronjobs will be suspended):

1.6661703052914312e+09	ERROR	controllers.SleepInfo	fails to handle sleep	{"sleepinfo": "<namespace>/<sleepinfo_name>", "error": "Operation cannot be fulfilled on cronjobs.batch \"<cronjob_name>\": the object has been modified; please apply your changes to the latest version and try again"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:121
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:320
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:273
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:234
1.6661703052989893e+09	ERROR	Reconciler error	{"controller": "sleepinfo", "controllerGroup": "kube-green.com", "controllerKind": "SleepInfo", "sleepInfo": {"name":"<sleepinfo_name>","namespace":"<namespace>"}, "namespace": "<namespace>", "name": "<sleepinfo_name>", "reconcileID": "6372ae6f-cc7c-4367-b84b-0b708966cd88", "error": "Operation cannot be fulfilled on cronjobs.batch \"<cronjob_name>\": the object has been modified; please apply your changes to the latest version and try again"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:273
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:234

GitOps support

Does this work also when those resources are managed by a gitops agent like fluxcd or argocd? If not, what needs to be done to make it work? I think this is really a good project!

OLM - OwnNamespace InstallModeType not supported, cannot configure to watch own namespace

Hello, thank you so much for this amazing project.

kubectl create -f https://operatorhub.io/install/kube-green.yaml
For some reason, we got the error below. Unfortunately, we couldn't fix it.
Could you help us?

kubectl describe csv kube-green.v0.5.0
OwnNamespace InstallModeType not supported, cannot configure to watch own namespace

Thanks

Sleep between Tuesday-Saturday, wake up Sunday

Hello,

Is there an option to configure namespaces to sleep for WHOLE days and wake up on specific days?

We have workloads only running on Sundays. On the rest of the days, I'd like to scale down all deployments in our namespaces and only wake them up on Sundays.

The documentation says that I can only configure sleep, and that to wake them up I have to deploy the resources manually.
Is there a workaround?

Thank you!

kube-green-controller-manager OOMKilled

Issue

The kube-green-controller-manager is unable to start since the manager container is killed (due to the memory limit).
The issue has been present since kube-green 0.5.0.

Platform

  • OpenShift 4.10.x
  • kube-green installation through OpenShift community-operators

Steps to reproduce

  1. Install the kube-green operator (add "subscription" in OpenShift and install)
  2. Pods are started
  3. Container kube-rbac-proxy starts fine
  4. Container manager fails to start after consuming memory above the limit

Numbers

  • 82 namespaces with Sleepinfo
  • There should have been 86, based on the label.
  • 145 pods in all 86 namespaces.

The number of namespaces / pods is ever increasing.

Sleepinfo config via the OpenShift NamespaceConfig

apiVersion: redhatcop.redhat.io/v1alpha1
kind: NamespaceConfig
metadata:
  name: sleepinfo
spec:
  labelSelector:
    matchLabels:
      custom.label.com/sleep-enabled: "true"
  templates:
  - excludedPaths:
    - .spec.rbac.policy
    - .spec.replicas
    - .metadata
    - .status
    objectTemplate: |
      apiVersion: kube-green.com/v1alpha1
      kind: SleepInfo
      metadata:
        name: default-sleep
        namespace: {{ .Name }}
        labels:
          app.kubernetes.io/managed-by: namespace-configuration-operator
      spec:
        weekdays: "*"
        sleepAt: "02:00"
        timeZone: "Europe/Oslo"

Logs

openshift-operators)$ oc get pods -w
NAME                                                           READY   STATUS             RESTARTS        AGE
kube-green-controller-manager-88d764676-grtp7                  1/2     CrashLoopBackOff   6 (4m21s ago)   13m
kube-green-controller-manager-88d764676-grtp7                  1/2     Terminating        6 (4m29s ago)   13m
kube-green-controller-manager-88d764676-xc9hw                  0/2     Pending            0               0s
kube-green-controller-manager-88d764676-xc9hw                  0/2     Pending            0               0s
kube-green-controller-manager-88d764676-xc9hw                  0/2     ContainerCreating   0               1s
kube-green-controller-manager-88d764676-grtp7                  1/2     Terminating         6               13m
kube-green-controller-manager-88d764676-grtp7                  0/2     Terminating         6               13m
kube-green-controller-manager-88d764676-grtp7                  0/2     Terminating         6               13m
kube-green-controller-manager-88d764676-grtp7                  0/2     Terminating         6               13m
kube-green-controller-manager-88d764676-xc9hw                  0/2     ContainerCreating   0               2s
kube-green-controller-manager-88d764676-xc9hw                  1/2     Running             0               13s
kube-green-controller-manager-88d764676-xc9hw                  2/2     Running             0               21s
kube-green-controller-manager-88d764676-xc9hw                  1/2     OOMKilled           0               40s
kube-green-controller-manager-88d764676-xc9hw                  1/2     Running             1 (1s ago)      41s
kube-green-controller-manager-88d764676-xc9hw                  2/2     Running             1 (11s ago)     51s
kube-green-controller-manager-88d764676-xc9hw                  1/2     OOMKilled           1 (27s ago)     67s
kube-green-controller-manager-88d764676-xc9hw                  1/2     CrashLoopBackOff    1 (4s ago)      71s
kube-green-controller-manager-88d764676-xc9hw                  1/2     Running             2 (15s ago)     82s
kube-green-controller-manager-88d764676-xc9hw                  2/2     Running             2 (24s ago)     91s
kube-green-controller-manager-88d764676-xc9hw                  1/2     OOMKilled           2 (42s ago)     109s
kube-green-controller-manager-88d764676-xc9hw                  1/2     CrashLoopBackOff    2 (3s ago)      111s
kube-green-controller-manager-88d764676-xc9hw                  1/2     Running             3 (25s ago)     2m13s
kube-green-controller-manager-88d764676-xc9hw                  2/2     Running             3 (33s ago)     2m21s
kube-green-controller-manager-88d764676-xc9hw                  1/2     OOMKilled           3 (49s ago)     2m37s
kube-green-controller-manager-88d764676-xc9hw                  1/2     CrashLoopBackOff    3 (5s ago)      2m41s
kube-green-controller-manager-88d764676-xc9hw                  1/2     Running             4 (47s ago)     3m23s
kube-green-controller-manager-88d764676-xc9hw                  2/2     Running             4 (55s ago)     3m31s
kube-green-controller-manager-88d764676-xc9hw                  1/2     OOMKilled           4 (75s ago)     3m51s
kube-green-controller-manager-88d764676-xc9hw                  1/2     CrashLoopBackOff    4 (10s ago)     4m1s
kube-green-controller-manager-88d764676-xc9hw                  1/2     Running             5 (92s ago)     5m23s
kube-green-controller-manager-88d764676-xc9hw                  2/2     Running             5 (100s ago)    5m31s
kube-green-controller-manager-88d764676-xc9hw                  1/2     OOMKilled           5 (116s ago)    5m47s
kube-green-controller-manager-88d764676-xc9hw                  1/2     CrashLoopBackOff    5 (4s ago)      5m51s
kube-green-controller-manager-88d764676-xc9hw                  1/2     Running             6 (2m49s ago)   8m36s
kube-green-controller-manager-88d764676-xc9hw                  2/2     Running             6 (3m4s ago)    8m51s
kube-green-controller-manager-88d764676-xc9hw                  1/2     OOMKilled           6 (3m13s ago)   9m
kube-green-controller-manager-88d764676-xc9hw                  1/2     CrashLoopBackOff    6 (1s ago)      9m1s

Container logs

+ kube-green-controller-manager-88d764676-grtp7 › kube-rbac-proxy
kube-green-controller-manager-88d764676-grtp7 kube-rbac-proxy I0324 10:41:05.538150       1 main.go:186] Valid token audiences: 
kube-green-controller-manager-88d764676-grtp7 kube-rbac-proxy I0324 10:41:05.538239       1 main.go:316] Generating self signed cert as no cert is provided
kube-green-controller-manager-88d764676-grtp7 kube-rbac-proxy I0324 10:41:06.173084       1 main.go:366] Starting TCP socket on 0.0.0.0:8443
kube-green-controller-manager-88d764676-grtp7 kube-rbac-proxy I0324 10:41:06.173361       1 main.go:373] Listening securely on 0.0.0.0:8443
+ kube-green-controller-manager-88d764676-grtp7 › manager
kube-green-controller-manager-88d764676-grtp7 manager I0324 10:49:38.172885       1 request.go:690] Waited for 1.038301707s due to client-side throttling, not priority and fairness, request: GET:https://10.201.0.1:443/apis/security.openshift.io/v1?timeout=32s
kube-green-controller-manager-88d764676-grtp7 manager 2023-03-24T10:49:43Z      INFO    controller-runtime.metrics      Metrics server is starting to listen    {"addr": "127.0.0.1:8080"}
kube-green-controller-manager-88d764676-grtp7 manager 2023-03-24T10:49:43Z      INFO    controller-runtime.builder      skip registering a mutating webhook, object does not implement admission.Defaulter or WithDefaulter wasn't called     {"GVK": "kube-green.com/v1alpha1, Kind=SleepInfo"}
kube-green-controller-manager-88d764676-grtp7 manager 2023-03-24T10:49:43Z      INFO    controller-runtime.builder      Registering a validating webhook        {"GVK": "kube-green.com/v1alpha1, Kind=SleepInfo", "path": "/validate-kube-green-com-v1alpha1-sleepinfo"}
kube-green-controller-manager-88d764676-grtp7 manager 2023-03-24T10:49:43Z      INFO    controller-runtime.webhook      Registering webhook     {"path": "/validate-kube-green-com-v1alpha1-sleepinfo"}
kube-green-controller-manager-88d764676-grtp7 manager 2023-03-24T10:49:43Z      INFO    setup   starting manager
kube-green-controller-manager-88d764676-grtp7 manager 2023-03-24T10:49:43Z      INFO    controller-runtime.webhook.webhooks     Starting webhook server
kube-green-controller-manager-88d764676-grtp7 manager 2023-03-24T10:49:43Z      INFO    Starting server {"path": "/metrics", "kind": "metrics", "addr": "127.0.0.1:8080"}
kube-green-controller-manager-88d764676-grtp7 manager 2023-03-24T10:49:43Z      INFO    Starting server {"kind": "health probe", "addr": "[::]:8081"}
kube-green-controller-manager-88d764676-grtp7 manager I0324 10:49:43.430559       1 leaderelection.go:248] attempting to acquire leader lease openshift-operators/2bd226ed.kube-green.com...
kube-green-controller-manager-88d764676-grtp7 manager 2023-03-24T10:49:43Z      INFO    controller-runtime.certwatcher  Updated current TLS certificate
kube-green-controller-manager-88d764676-grtp7 manager 2023-03-24T10:49:43Z      INFO    controller-runtime.webhook      Serving webhook server  {"host": "", "port": 9443}
kube-green-controller-manager-88d764676-grtp7 manager 2023-03-24T10:49:43Z      INFO    controller-runtime.certwatcher  Starting certificate watcher
kube-green-controller-manager-88d764676-grtp7 manager I0324 10:50:00.924221       1 leaderelection.go:258] successfully acquired lease openshift-operators/2bd226ed.kube-green.com
kube-green-controller-manager-88d764676-grtp7 manager 2023-03-24T10:50:00Z      DEBUG   events  kube-green-controller-manager-88d764676-grtp7_e2b1fd43-9356-45ed-a8a7-53881f37c022 became leader      {"type": "Normal", "object": {"kind":"Lease","namespace":"openshift-operators","name":"2bd226ed.kube-green.com","uid":"0e30af23-b9d7-4512-86b4-373c29ac0d0a","apiVersion":"coordination.k8s.io/v1","resourceVersion":"4117336319"}, "reason": "LeaderElection"}
kube-green-controller-manager-88d764676-grtp7 manager 2023-03-24T10:50:00Z      INFO    Starting EventSource    {"controller": "sleepinfo", "controllerGroup": "kube-green.com", "controllerKind": "SleepInfo", "source": "kind source: *v1alpha1.SleepInfo"}
kube-green-controller-manager-88d764676-grtp7 manager 2023-03-24T10:50:00Z      INFO    Starting Controller     {"controller": "sleepinfo", "controllerGroup": "kube-green.com", "controllerKind": "SleepInfo"}
kube-green-controller-manager-88d764676-grtp7 manager 2023-03-24T10:50:01Z      INFO    Starting workers        {"controller": "sleepinfo", "controllerGroup": "kube-green.com", "controllerKind": "SleepInfo", "worker count": 20}
- kube-green-controller-manager-88d764676-grtp7 › manager

How does kube-green perform shutdown and restarts?

I am exploring kube-green for our setup and I have a few questions:

  1. Can we scale down all the workloads in a namespace, or do we have to do it per workload?
  2. Can it be used to delete HPA and Keda ScaledObjects? (The reason for asking is that we have HPAs and ScaledObjects on deployments; if we scale a deployment to 0, they will restart the pods.)
  3. Can we scale down OpenShift DeploymentConfigs?
  4. If we have to perform a bulk shutdown of, let's say, 100+ namespaces and want to prioritise starting up statefulsets before deployments, can we do it?
  5. Can we pull the records of the shutdowns and startups done in the past 3 months?
  6. How are the startups performed? Does kube-green spin up cronjobs to shut down/start up individual deployments, or does it perform them in bulk if there are a lot of workloads?

Requirement of Cert-Manager

Documentation is needed regarding the requirement of installing cert-manager for kube-green. It would be useful to understand the behaviour and usage of kube-green with cert-manager.

Save historical information

It could be an interesting feature to save metrics about kube-green usage. This data could be the basis for creating a dashboard to monitor the use of kube-green.

We can expose metrics in Prometheus format. There is already a base set of metrics exposed by the controller.

If the Prometheus exporter is used, some possible metrics are:

  • The total number of workloads put to sleep
  • Information about the currently deployed SleepInfo
  • Number of SleepInfo without wakeUpAt set
  • The total number of currently sleeping replicas per resource per namespace
  • A histogram showing how many hours the resources are stopped

More details on metrics to collect

All the metrics below would be prefixed with kube_green

The total number of workloads put to sleep

An interesting metric could be the total number of workloads stopped by kube-green. So, for example, a suspended cronjob counts as 1, and a deployment with 4 replicas also counts as 1.
This data should be filterable per resource type (deployment, cronjob) and per namespace with custom labels.

To collect this metric with Prometheus, we need to use the counter metric type.

The proposed name for this metric is sleep_workload_total.
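
As a sketch of how such a counter could be wired up (illustrative only, not the project's actual implementation), assuming the standard prometheus/client_golang library and the metrics registry that controller-runtime already exposes:

package metrics

import (
	"github.com/prometheus/client_golang/prometheus"
	ctrlmetrics "sigs.k8s.io/controller-runtime/pkg/metrics"
)

// sleepWorkloadTotal counts the workloads put to sleep by kube-green,
// labelled by resource type and namespace as proposed above.
var sleepWorkloadTotal = prometheus.NewCounterVec(prometheus.CounterOpts{
	Namespace: "kube_green",
	Name:      "sleep_workload_total",
	Help:      "Total number of workloads put to sleep by kube-green.",
}, []string{"resource_type", "namespace"})

func init() {
	// controller-runtime already serves this registry on the /metrics endpoint.
	ctrlmetrics.Registry.MustRegister(sleepWorkloadTotal)
}

On each successful sleep operation the controller would then call something like sleepWorkloadTotal.WithLabelValues("deployment", namespace).Inc(), yielding the fully qualified metric name kube_green_sleep_workload_total.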

Information about the currently deployed SleepInfo

From Prometheus, it could be interesting to access the resource information of the SleepInfo.
The interesting data to collect in this case are:

  • namespace where the SleepInfo is deployed
  • name of the SleepInfo

and should be the labels of the gauge metric.

The proposed name for this metric is current_sleepinfo.

With this information, it should be possible to get the following metrics.

Total number of SleepInfo per cluster

This is the sum of the current_sleepinfo values.

Number of namespaces with SleepInfo currently enabled

This is the number of unique values of the namespace label in current_sleepinfo.

Number of SleepInfo without wakeUpAt set

From Prometheus, it could be interesting to access the resource information of the SleepInfo resources without wakeUpAt.
The interesting data to collect in this case are:

  • namespace where the SleepInfo is deployed
  • name of the SleepInfo

and should be the labels of the gauge metric.

The proposed name for this metric is current_permanent_sleepinfo.


The following are interesting metrics, but more difficult to obtain. For example, it's easy to know how many resources are asleep, but it's always possible to manually redeploy a resource. So, to keep the metrics correct, the controller should watch the resources to verify they are not restored.

The total number of currently sleeping replicas per resource per namespace

An interesting metric could be the total number of replicas currently stopped by kube-green.
This data should be filterable per resource type (only deployment at the moment) and per namespace with custom labels.

To collect this metric with Prometheus, we need to use the gauge metric type.

The proposed name for this metric is currently_sleep_replicas.

A histogram showing how many hours the resources are stopped

This metric can be collected using a histogram metric with buckets set to 1h, 3h, 5h, 8h, 12h, 24h, +24h.
This data should be filterable per namespace and SleepInfo name with custom labels.
If the hours or minutes value contains the special char *, the cron repeats. In this case, we set the value to 1h in the histogram (unless wake up is not set, in which case the value is set to infinite).

The proposed name for this metric is sleep_duration_seconds.
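
A similar sketch for the histogram, again illustrative and not the project's code, with the bucket boundaries above expressed in seconds (values beyond 24h fall into the implicit +Inf bucket):

package metrics

import "github.com/prometheus/client_golang/prometheus"

// sleepDurationSeconds observes how long resources stay asleep, labelled by
// namespace and SleepInfo name as proposed above. Bucket boundaries are in seconds.
var sleepDurationSeconds = prometheus.NewHistogramVec(prometheus.HistogramOpts{
	Namespace: "kube_green",
	Name:      "sleep_duration_seconds",
	Help:      "Duration of the sleep periods applied by kube-green.",
	Buckets:   []float64{1 * 3600, 3 * 3600, 5 * 3600, 8 * 3600, 12 * 3600, 24 * 3600}, // 1h, 3h, 5h, 8h, 12h, 24h
}, []string{"namespace", "name"})

It would be registered on the controller-runtime registry in the same way as the counter in the previous sketch.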

Number of namespaces which are or are not restored

This is the number of unique combinations of the name and namespace labels in the sleep_duration_seconds bucket with the label le="+Inf".

Optionally disable the sleep/wake up cycle in namespace

The SleepInfo CRD could have a new optional field which enables or disables the controller functionality.

It could be useful not to remove the CRD entirely, but to disable it, for example when the selected namespace should not be put to sleep for a specific period of time.
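
A sketch of what this could look like in the manifest (the enabled field is purely hypothetical and not part of the current CRD):

apiVersion: kube-green.com/v1alpha1
kind: SleepInfo
metadata:
  name: working-hours
spec:
  weekdays: "1-5"
  sleepAt: "20:00"
  wakeUpAt: "08:00"
  timeZone: "Europe/Rome"
  # hypothetical field: temporarily disable the sleep/wake up cycle
  # without removing the resource
  enabled: false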

Suspend only cronjobs?

Is it possible to suspend only cronjobs? I've tested the following template but no luck:

apiVersion: kube-green.com/v1alpha1
kind: SleepInfo
metadata:
  name: working-hours
spec:
  weekdays: "*"
  sleepAt: "16:30"
  wakeUpAt: "10:00"
  timeZone: "Asia/Jerusalem"
  suspendCronJobs: true
  excludeRef:
    - apiVersion: "apps/v1"
      kind: Deployment
      name: "*"

Cluster-wide SleepInfo and namespace exclusion

For clusters with a large number of namespaces, it would be more convenient to have a new CRD called ClusterSleepInfo that operates across all namespaces, with default exclusion of kube-system and an optional list of namespaces and resources to exclude:

apiVersion: kube-green.com/v1alpha1
kind: ClusterSleepInfo
metadata:
  name: working-hours
spec:
  weekdays: "1-5"
  sleepAt: "20:00"
  wakeUpAt: "08:00"
  timeZone: "Europe/Rome"
  suspendCronJobs: true
  excludeRef:
    - apiVersion: "apps/v1"
      kind:       Deployment
      name:       my-deployment
    - apiVersion: "v1"
      kind:       Namespace
      name:       my-namespace

I am aware of the implications here (the controller needs access to every namespace), but it sounds like a good idea (at least to me!).

Enhancement suggestion: Ability to select labels

Suggestion: ability to select by label, following this example:

apiVersion: kube-green.com/v1alpha1
kind: SleepInfo
metadata:
  name: working-hours
spec:
  weekdays: "1-5"
  sleepAt: "23:0"
  wakeUpAt: "07:00"
  suspendDeployments: true
  suspendCronJobs: false
  selector:
    matchLabels:
      app: my-app

Reasoning: with the current configuration, the behavior is to put the entire cluster to sleep, and we select the apps that we don't want shut down with excludeRef. With a label selector, it's possible to fine-tune more easily which deployments are put to sleep, enabling easy adoption on large-scale clusters and ensuring nothing critical is turned off.

Would you consider scaling back based on cyclical changes in carbon intensity as well?

Hi there.

Thanks for making this project - I wanted to ask about the roadmap for this project, and check I understand how this works.

kube-green currently works by "powering down" pods between certain times of day, right?

Are there any other criteria in use at all? I ask because there is a project we worked on that might be relevant, called grid-intensity-go, specifically designed to support carbon-aware cloud scheduling, and I figured that before creating any issues or PRs to contribute to this project I would ask about its general direction.

If it helps, this blog post might be useful: we outlined a few approaches for greener scheduling of cloud compute using tools like Kubernetes and Nomad, and I'd be curious whether any of these ideas might fit into the kube-green roadmap in future.

https://www.thegreenwebfoundation.org/news/carbon-aware-scheduling-on-nomad-and-kubernetes

Ability to sleep and wake up an entire cluster

For operational efficiency, it would be interesting to have the ability to sleep and wake up one or more namespaces, or the entire cluster completely.

I believe using allow or deny expressions based on namespaces can help.

Example:

---
apiVersion: kube-green.com/v1alpha1
kind: SleepCluster # Fictional Name
metadata:
  name: working-hours
spec:
  weekdays: "1-5"
  sleepAt: "20:00"
  wakeUpAt: "07:30"
  timeZone: "America/Sao_Paulo"
  suspendCronJobs: true
  excludeNamespaces:
    - kube-system
    - karpenter
    - kube-green
---
