
krr's People

Contributors

aantn, arikalon1, arnoldyahad, avi-robusta, bpfoster, chicocvenancio, clementgautier, cr7258, dazwilkin, dgdevops, evertonsa, fenio, frankfoerster24, ganeshrvel, haad, joaopedrocg27, leavemyyard, mamykola, mrueg, pablos44, pavangudiwada, reason2010, roiglinik, serdarkkts, sheeproid, shlomosfez, tlipoca9, vahan90, whaakman, yonahd

krr's Issues

Typo in setup for Azure Prometheus in README

In the README.md file there is

# If you are not logged in to Azure, uncomment out the following line
# az login
AZURE_BEARER=$(az account get-access-token --resource=https://prometheus.monitor.azure.com  --query accesssToken --output tsv); echo $AZURE_BEARER 

But there should be (see the --query parameter)

# If you are not logged in to Azure, uncomment out the following line
# az login
AZURE_BEARER=$(az account get-access-token --resource=https://prometheus.monitor.azure.com  --query accessToken --output tsv); echo $AZURE_BEARER 

It's just a typo, but it wastes someone's time troubleshooting why the token does not get fetched.

Also, the next command in the same section, which utilizes the token, should IMO be

krr simple -p PROMETHEUS_URL --prometheus-auth-header "Bearer $AZURE_BEARER"

All other examples use the krr command directly, and the usage of the namespace is misleading.

Unit alignment

The measurement units are not aligned, which makes comparison difficult. For example, in memory usage:
k vs M
2097152k -> 498M
Mi vs M
128Mi -> 995M
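
For reference, a minimal Python sketch (not krr code) of the kind of normalization that would make these values directly comparable, using the Kubernetes quantity suffixes:

# Normalize Kubernetes memory quantities to megabytes so values like
# "2097152k" and "128Mi" can be compared directly.
SUFFIX_FACTORS = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30,
    "k": 10**3, "M": 10**6, "G": 10**9,
}

def to_megabytes(quantity: str) -> float:
    # Try two-letter suffixes first so "Mi" is not mistaken for "M"
    for suffix in sorted(SUFFIX_FACTORS, key=len, reverse=True):
        if quantity.endswith(suffix):
            return float(quantity[: -len(suffix)]) * SUFFIX_FACTORS[suffix] / 10**6
    return float(quantity) / 10**6  # plain bytes

print(to_megabytes("2097152k"))  # 2097.152
print(to_megabytes("128Mi"))     # ~134.22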

Using VictoriaMetrics leads to `Connection reset by peer` after 2 minutes

Describe the bug
Using VictoriaMetrics leads to Connection reset by peer after 2 minutes.

To Reproduce
Steps to reproduce the behavior:

  1. krr simple --verbose --prometheus-url https://vmselect.test.com/select/0/prometheus
  2. See error:
During handling of the above exception, another exception occurred:

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /Users/furkan.turkal/robusta_krr/core/runner.py:202 in run                                       │
│                                                                                                  │
│ [Errno 2] No such file or directory: '/Users/furkan.turkal/robusta_krr/core/runner.py'           │
│                                                                                                  │
│ /Users/furkan.turkal/robusta_krr/core/runner.py:177 in _collect_result                           │
│                                                                                                  │
│ [Errno 2] No such file or directory: '/Users/furkan.turkal/robusta_krr/core/runner.py'           │
│                                                                                                  │
│ /Users/furkan.turkal/robusta_krr/core/runner.py:138 in _gather_objects_recommendations           │
│                                                                                                  │
│ [Errno 2] No such file or directory: '/Users/furkan.turkal/robusta_krr/core/runner.py'           │
│                                                                                                  │
│ /Users/furkan.turkal/robusta_krr/core/runner.py:113 in _calculate_object_recommendations         │
│                                                                                                  │
│ [Errno 2] No such file or directory: '/Users/furkan.turkal/robusta_krr/core/runner.py'           │
│                                                                                                  │
│ /Users/furkan.turkal/robusta_krr/core/integrations/prometheus/loader.py:97 in gather_data        │
│                                                                                                  │
│ [Errno 2] No such file or directory:                                                             │
│ '/Users/furkan.turkal/robusta_krr/core/integrations/prometheus/loader.py'                        │
│                                                                                                  │
│                                     ... 6 frames hidden ...                                      │
│                                                                                                  │
│ /Users/furkan.turkal/requests/sessions.py:635 in post                                            │
│                                                                                                  │
│ [Errno 2] No such file or directory: '/Users/furkan.turkal/requests/sessions.py'                 │
│                                                                                                  │
│ /Users/furkan.turkal/requests/sessions.py:587 in request                                         │
│                                                                                                  │
│ [Errno 2] No such file or directory: '/Users/furkan.turkal/requests/sessions.py'                 │
│                                                                                                  │
│ /Users/furkan.turkal/requests/sessions.py:745 in send                                            │
│                                                                                                  │
│ [Errno 2] No such file or directory: '/Users/furkan.turkal/requests/sessions.py'                 │
│                                                                                                  │
│ /Users/furkan.turkal/requests/models.py:899 in content                                           │
│                                                                                                  │
│ [Errno 2] No such file or directory: '/Users/furkan.turkal/requests/models.py'                   │
│                                                                                                  │
│ /Users/furkan.turkal/requests/models.py:818 in generate                                          │
│                                                                                                  │
│ [Errno 2] No such file or directory: '/Users/furkan.turkal/requests/models.py'                   │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ChunkedEncodingError: ("Connection broken: ConnectionResetError(54, 'Connection reset by peer')", ConnectionResetError(54, 'Connection reset by peer'))

Expected behavior
It may be worth adding some resiliency with retries and timeouts.
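
A minimal sketch of what that could look like with requests (the URL is taken from the report above; the retry and timeout values are arbitrary examples):

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
# Retry transient server-side failures with exponential backoff
retries = Retry(total=3, backoff_factor=1, status_forcelist=[502, 503, 504])
session.mount("https://", HTTPAdapter(max_retries=retries))

response = session.post(
    "https://vmselect.test.com/select/0/prometheus/api/v1/query",
    data={"query": "up"},
    timeout=(5, 120),  # (connect, read) timeouts in seconds
)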

Screenshots
-

Desktop (please complete the following information):

  • OS: macOS
  • Browser -
  • Version v1.4.0

Brainstorming: krr-operator

I was thinking about creating a Kubernetes operator that reads the output of krr and applies the recommended values to the corresponding cluster. However, I don't feel comfortable writing Python, so I was thinking of creating an operator in Go that calls krr under the hood to grab the recommended values and applies them using the client-go API. So I thought we could officially provide an operator.

Some design ideas:

  • Introduce the operator in the separate repo vs use this one as a monorepo
  • Create a container image for krr-operator
  • It's a long-running standalone single Pod that checks and applies recommended values in a scheduled manner
    • leader election may be needed for H/A (may not be needed in the first phase)
    • some Prometheus metrics for monitoring (may not be needed in the first phase)
  • A reconciler: subscribe to informers and apply recommended values immediately (this is required because once we edit the CPU/MEM values, a new deployment may override them, so we might think of re-applying; may not be needed in the first phase)

It'd be a perfect opportunity to introduce an operator for this brilliant tool. Wdyt?
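
For the apply step, a rough sketch in Python (the proposal above suggests Go + client-go; this only illustrates patching a workload with recommended values, and all names and numbers are placeholders):

from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()

# Strategic-merge patch: containers are matched by name
patch = {
    "spec": {"template": {"spec": {"containers": [{
        "name": "app",  # container name from the krr result row
        "resources": {
            "requests": {"cpu": "5m", "memory": "60Mi"},
            "limits": {"memory": "60Mi"},
        },
    }]}}}
}
apps.patch_namespaced_deployment("my-deployment", "my-namespace", patch)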

krr doesn't find pods in target namespace - 404

Describe the bug

$ python3.11 krr.py simple -n $NAMESPACE
...
[ERROR] Error trying to list pods in $NAMESPACE (404)
...
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"the server could not find the requested resource","reason":"NotFound","details":{},"code":404}
...

To Reproduce
see above

Expected behavior
Finds pods deployed in provided namespace

Desktop (please complete the following information):

  • OS:
$ cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.6 LTS (Focal Fossa)"
  • Version:
$ python3.11 krr.py version
1.3.0-dev

$ kubectl version --short
Client Version: v1.26.0
Kustomize Version: v4.5.7
Server Version: v1.21.8

Additional context
kubectl client works as expected.

Restrict workload selection

Is your feature request related to a problem? Please describe.
One might have a huge cluster and want to focus on optimizing just a particular workload.
Otherwise, Prometheus queries that analyse all workloads in a cluster/namespace might take too long.

Describe the solution you'd like
Introduce a new command line argument as kubectl has:

    -l, --selector='':
	Selector (label query) to filter on, supports '=', '==', and '!='.(e.g. -l key1=value1,key2=value2). Matching
	objects must satisfy all of the specified label constraints.

to limit the workload selection, e.g.:

krr simple --context my-cluster --namespace kube-system --selector app-instance=metrics-server
# or
krr simple --context my-cluster --namespace kube-system --selector owner=devops

Describe alternatives you've considered

krr simple --context my-cluster --namespace kube-system --workload Deployment/metrics-server

Additional context

Thanks for this wonderful tool. Happy to learn if there is an alternative solution for me.

Krr tries to autodiscover prometheus (which takes time) even though -p flag is given

Krr tries to autodiscover Prometheus/Victoria/Thanos even though the -p flag is passed to the command.

If it's relevant (it might be because of the [DEBUG] Prometheus not found log line), I'm using a port-forwarded GKE managed Prometheus here.

Note that despite the log messages, it is actually working and spitting out recommendations.

krr simple -v -p http://127.0.0.1:9090 


 _____       _               _          _  _______  _____
|  __ \     | |             | |        | |/ /  __ \|  __ \
| |__) |___ | |__  _   _ ___| |_ __ _  | ' /| |__) | |__) |
|  _  // _ \| '_ \| | | / __| __/ _` | |  < |  _  /|  _  /
| | \ \ (_) | |_) | |_| \__ \ || (_| | | . \| | \ \| | \ \
|_|  \_\___/|_.__/ \__,_|___/\__\__,_| |_|\_\_|  \_\_|  \_\



Running Robusta's KRR (Kubernetes Resource Recommender) v1.4.1
Using strategy: Simple
Using formatter: table

[DEBUG] Found 2 clusters: production, staging
(/opt/homebrew/Cellar/krr/1.4.1/libexec/robusta_krr/core/integrations/kubernetes.pyc:370)
[DEBUG] Current cluster: staging                (/opt/homebrew/Cellar/krr/1.4.1/libexec/robusta_krr/core/integrations/kubernetes.pyc:371)
[DEBUG] Configured clusters: []         (/opt/homebrew/Cellar/krr/1.4.1/libexec/robusta_krr/core/integrations/kubernetes.pyc:373)
[INFO] Using clusters: ['staging']
[INFO] Listing scannable objects in staging
[DEBUG] Namespaces: *           (/opt/homebrew/Cellar/krr/1.4.1/libexec/robusta_krr/core/integrations/kubernetes.pyc:64)
[DEBUG] Listing deployments in staging          (/opt/homebrew/Cellar/krr/1.4.1/libexec/robusta_krr/core/integrations/kubernetes.pyc:143)
[DEBUG] Listing ArgoCD rollouts in staging              (/opt/homebrew/Cellar/krr/1.4.1/libexec/robusta_krr/core/integrations/kubernetes.pyc:163)
[DEBUG] Listing statefulsets in staging         (/opt/homebrew/Cellar/krr/1.4.1/libexec/robusta_krr/core/integrations/kubernetes.pyc:190)
[DEBUG] Listing daemonsets in staging           (/opt/homebrew/Cellar/krr/1.4.1/libexec/robusta_krr/core/integrations/kubernetes.pyc:210)
[DEBUG] Listing jobs in staging         (/opt/homebrew/Cellar/krr/1.4.1/libexec/robusta_krr/core/integrations/kubernetes.pyc:230)
[DEBUG] Found 1 rollouts in staging             (/opt/homebrew/Cellar/krr/1.4.1/libexec/robusta_krr/core/integrations/kubernetes.pyc:179)
[DEBUG] Found 3 statefulsets in staging         (/opt/homebrew/Cellar/krr/1.4.1/libexec/robusta_krr/core/integrations/kubernetes.pyc:199)
[DEBUG] Found 16 jobs in staging                (/opt/homebrew/Cellar/krr/1.4.1/libexec/robusta_krr/core/integrations/kubernetes.pyc:239)
[DEBUG] Found 38 daemonsets in staging          (/opt/homebrew/Cellar/krr/1.4.1/libexec/robusta_krr/core/integrations/kubernetes.pyc:219)
[DEBUG] Found 72 deployments in staging         (/opt/homebrew/Cellar/krr/1.4.1/libexec/robusta_krr/core/integrations/kubernetes.pyc:152)
[INFO] Found 175 objects across 19 namespaces in staging
on 0: [INFO] Connecting to Prometheus for staging cluster
on 0: [INFO] Using Prometheus at http://127.0.0.1:9090 for cluster staging
on 0: [DEBUG] Prometheus not found            (/opt/homebrew/Cellar/krr/1.4.1/libexec/robusta_krr/core/integrations/prometheus/loader.pyc:72)
on 0: [INFO] Connecting to Victoria for staging cluster
on 0: [INFO] Using Victoria at http://127.0.0.1:9090 for cluster staging
on 0: [DEBUG] Victoria Metrics not found              (/opt/homebrew/Cellar/krr/1.4.1/libexec/robusta_krr/core/integrations/prometheus/loader.pyc:72)
on 0: [INFO] Connecting to Thanos for staging cluster
on 0: [INFO] Using Thanos at http://127.0.0.1:9090 for cluster staging
on 0: [DEBUG] Thanos not found                (/opt/homebrew/Cellar/krr/1.4.1/libexec/robusta_krr/core/integrations/prometheus/loader.pyc:72)
on 0: [ERROR] No Prometheus or metrics service found
Calculating Recommendation |⚠︎                                       | (!) 0/175 [0%] in 6:00.3 (0.00/s)

Add support for Grafana Cloud

Is your feature request related to a problem? Please describe.
We scrape the metrics using Grafana Agent, which sends data to Grafana Cloud.

Describe the solution you'd like
It would be great if you could allow passing the Grafana Cloud Prometheus query endpoint with a username and password,
something like below:

krr -p https://prometheus-xxx.grafana.net/api/prom -u <username> -p <password>

Excel Export

As much as we all hate xlsx documents,

I was asked to put all the krr output into a spreadsheet and share it via email.

As a platform engineer, I would like to have a custom output format for xlsx or csv.
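
As a workaround in the meantime, a minimal sketch assuming the JSON formatter (krr simple -f json > result.json) and pandas; the "scans" key below is a placeholder, not krr's confirmed schema:

import json
import pandas as pd

with open("result.json") as f:
    data = json.load(f)

# Flatten the per-workload records into a table; "scans" stands in for
# whatever top-level key the JSON output actually uses.
df = pd.json_normalize(data["scans"])
df.to_csv("krr-report.csv", index=False)
df.to_excel("krr-report.xlsx", index=False)  # requires openpyxl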

Add option to select object for which recommendations will be generated

Is your feature request related to a problem? Please describe.

I work on a project where we have many different components in one namespace. When I am interested in recommendations for only one of them, I can limit the tool to run only in one namespace, but I cannot limit it to specific objects. The problem is that the same component is deployed in many clusters, and currently the configuration is fixed: the same values for that specific component in all clusters. To get the recommendations, I execute the krr tool on the same namespace in all clusters. That process is quite long because the tool prepares recommendations for 36 pods in every cluster, while I am interested in the results for only one of those pods in every cluster.

Describe the solution you'd like

The simple strategy provides the --namespace parameter to select which namespace should be checked. It would be nice to have an additional parameter called --name. Then I could specify:

krr simple --namespace namespace --name component1

and only 1 component (the name of the deployment/statefulset) would be checked, with 1-X pods.

Describe alternatives you've considered

I saw there was a new parameter added that has not been released yet: --selector. If the K8s objects are labeled properly, it could be used to find such items too.

Allow to run on specific namespaces with restricted permission

Describe the bug
On the cluster I use, I don't have access to all namespaces.
Even if I specify my namespaces, it seems krr tries to get the resources through a cluster-scoped API instead of a namespaced one.

To Reproduce
Steps to reproduce the behavior:

  1. Make sure you don't have access to all namespaces, e.g.:
$ kubectl get po -A
Error from server (Forbidden): pods is forbidden: User "u-wf3je4hm2h" cannot list resource "pods" in API group "" at the cluster scope
  2. krr simple -n my_namespace
  3. See error
Running Robusta's KRR (Kubernetes Resource Recommender) 1.0.0
Using strategy: Simple
Using formatter: table

[ERROR] Error trying to list pods in cluster k8s-prod: (403)
Reason: Forbidden
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"deployments.apps is forbidden: User \"u-wf3je4hm2h\" cannot list resource \"deployments\" in API group \"apps\" at the cluster
scope","reason":"Forbidden","details":{"group":"apps","kind":"deployments"},"code":403}

Expected behavior
It should be able to get the data if I have access to the specified namespace
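
For illustration, a sketch of the difference with the kubernetes Python client (my assumption about the fix, not krr's actual code):

from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()

# Cluster-scoped call: requires list permission across all namespaces
# deployments = apps.list_deployment_for_all_namespaces()

# Namespaced call: works with namespace-scoped RBAC
deployments = apps.list_namespaced_deployment("my_namespace")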

Thanks for krr, it's awesome :)

KRR scans not working with Azure managed prometheus

Describe the bug
KRR scans fail both when using the UI portal and when trying the direct calls from the CLI with krr.py.

To Reproduce
Steps to reproduce the behavior:

  1. Setup robusta with AZ managed prometheus
  2. Use ClientID + Secret
  3. Metrics and alerts are in place but KRR scan is not working

Expected behavior
Have KRR scans up and running via CLI + UI

Limit concurrency of asyncio for gather_objects_recommendations

Describe the bug

When we have a large number of containers/pods in the cluster (e.g. >1000),

async def _gather_objects_recommendations(
    self, objects: list[K8sObjectData]
) -> list[tuple[ResourceAllocations, MetricsData]]:
    recommendations: list[tuple[RunResult, MetricsData]] = await asyncio.gather(
        *[self._calculate_object_recommendations(object) for object in objects]
    )

will start >1000 coroutines that concurrently query the metrics server, which causes resource exhaustion, e.g. exhausting the connection pool in VictoriaMetrics, local memory usage larger than 20GB, etc.


Expected behavior

According to a StackOverflow answer,

async def gather_with_concurrency(n: int, *coros):
    semaphore = asyncio.Semaphore(n)

    async def sem_coro(coro):
        async with semaphore:
            return await coro
    return await asyncio.gather(*(sem_coro(c) for c in coros))

can be used to limit the concurrency level.
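
For example, _gather_objects_recommendations could then become (the limit of 50 is an arbitrary example value):

recommendations: list[tuple[RunResult, MetricsData]] = await gather_with_concurrency(
    50, *[self._calculate_object_recommendations(object) for object in objects]
)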


Prometheus high memory utilization because of krr queries

Describe the bug

I first noticed a problem when I clicked Rescan on the Efficiency panel of Robusta.dev that connects to Prometheus and gathers metrics from my cluster where I have installed the Robusta Helm chart.

My Prometheus Pod that normally uses less than 1.5 GiB of memory suddenly needs a lot more than 2.5 GiB. Because it's running on a node with only 4 GiB of RAM (and is limited to 3 GiB in the Pod spec), the pod is OOM-killed. You can see that behavior in the graph below. The Robusta UI also shows the OOM-killed pod.

I believe that feature uses krr in the background, so I also tried running it directly on the CLI pointing at the same Prometheus endpoint, which produced the same result.

Anything I could do to improve that (besides the obvious increase in memory limits)?

To Reproduce
Steps to reproduce the behavior:

  1. Run krr pointing to a Prometheus pod with a 3 GiB memory limit
  2. Prometheus should have 250k+ time series and 10k+ label pairs
  3. Do not limit the namespaces scanned by krr

Expected behavior
Not cause Prometheus to crash

Screenshots
[screenshot: Prometheus pod memory utilization graph]

Additional context

When scanning a single namespace, Prometheus doesn't crash, but you can still notice a big hike in memory utilization.
[screenshot: memory utilization graph]

krr version information hasn't been updated since v1.1.1

Describe the bug
The krr version information stored in the following locations has not been bumped since the release of v1.1.1:

This results in wrong package metadata and incorrect output of the version command:

~/git/krr ((HEAD detached at v1.2.1))$ python krr.py version
1.1.1

To Reproduce
Steps to reproduce the behavior:

  1. Clone the git repo
  2. Checkout a specific version tag > 1.1.1 via git checkout tags/v1.2.1
  3. Run the version command: python krr.py version

Expected behavior
The version information and package metadata should be updated on every release.

Log to stderr

That's because the -v flag emits some logs to stdout, which mixes the verbose-related logs into the generated JSON file. So the following steps do not work:

$ python krr.py simple -v -f json > result.json
$ cat result.json | jq
parse error: Invalid numeric literal at line 3, column 7

As the expected behaviour, krr should log to stderr and write the JSON output to stdout. Wdyt?
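
A minimal sketch of the expected behaviour with plain Python logging (not krr's actual setup):

import logging
import sys

# Send all log records to stderr so stdout carries only the JSON document
logging.basicConfig(stream=sys.stderr, level=logging.DEBUG)
logging.getLogger("krr").info("this goes to stderr, not into result.json")

With that, python krr.py simple -v -f json > result.json 2> krr.log would produce valid JSON in result.json.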

Applying recommendations immediately leads to OOMKilled on startup

Describe the bug
I am new to right-sizing k8s clusters and thought I'd give krr a try.

I've used all the default settings (history and buffer) and krr has suggested significant changes across many of my pods.

Upon applying them across two of my example pods, I immediately get OOMKilled errors.

To Reproduce
N/A

Expected behavior
A pod's expected memory footprint should be taken into consideration so that its memory limit is not set so low that it can't start.

Screenshots
[screenshots: pod memory usage graphs]

As can be seen in the above screenshot, HomeAssistant exceeds 1GB of memory use on a number of occasions in the past week, but krr suggests 516Mi as its memory limit.

Any tips on how to make this more useful?

KRR install missing python modules aiostream and slack-sdk

After manually installing KRR https://github.com/robusta-dev/krr#manual-installation, I noticed the following modules were missing:

  • aiostream
  • slack-sdk

To Reproduce

  1. Follow manual install instructions
  2. See the error for slack-sdk. (if you manually install via pip the error goes away https://pypi.org/project/slack-sdk/)
  3. If step 2 is resolved using workaround you will then see the error for aiostream. (if you manually install via pip error goes away https://pypi.org/project/aiostream/)

Expected behavior
slack-sdk and aiostream installed properly via requirements.txt file.

Suggested Fix
Add these 2 dependencies slack-sdk and aiostream to the requirements.txt file

Error Messages

slack-sdk

python3 krr.py --help          
               
Traceback (most recent call last):
  File "/Users/calvincarter/_programming/docker/development/krr_install/krr/krr.py", line 1, in <module>
    from robusta_krr import run
  File "/Users/calvincarter/_programming/docker/development/krr_install/krr/robusta_krr/__init__.py", line 1, in <module>
    from .main import run
  File "/Users/calvincarter/_programming/docker/development/krr_install/krr/robusta_krr/main.py", line 16, in <module>
    from robusta_krr.core.runner import Runner
  File "/Users/calvincarter/_programming/docker/development/krr_install/krr/robusta_krr/core/runner.py", line 6, in <module>
    from slack_sdk import WebClient
ModuleNotFoundError: No module named 'slack_sdk'

aiostream

 python3 krr.py --help 
Traceback (most recent call last):
  File "/Users/calvincarter/_programming/docker/development/krr_install/krr/krr.py", line 1, in <module>
    from robusta_krr import run
  File "/Users/calvincarter/_programming/docker/development/krr_install/krr/robusta_krr/__init__.py", line 1, in <module>
    from .main import run
  File "/Users/calvincarter/_programming/docker/development/krr_install/krr/robusta_krr/main.py", line 16, in <module>
    from robusta_krr.core.runner import Runner
  File "/Users/calvincarter/_programming/docker/development/krr_install/krr/robusta_krr/core/runner.py", line 10, in <module>
    from robusta_krr.core.integrations.kubernetes import KubernetesLoader
  File "/Users/calvincarter/_programming/docker/development/krr_install/krr/robusta_krr/core/integrations/kubernetes.py", line 4, in <module>
    import aiostream
ModuleNotFoundError: No module named 'aiostream'

Desktop (please complete the following information):

  • OS: macOS
  • Version: Ventura 13.5

Ability to set custom Prometheus data source for `remote_write` users

This feature is desirable for users who use Prometheus's remote_write feature. We push the metrics to remote VictoriaMetrics agents, so all Prometheus (kube-prometheus) instances in the clusters have 4 hours of retention. We also append a _cluster=my-cluster-name label to each metric to identify where the metric is coming from.

$ kubectl get secrets prometheus-prometheus-operator-kube-p-prometheus -o json | jq '.data."prometheus.yaml.gz"' -r | base64 -d | gunzip | yq e '.remote_write' -

- url: http://my-remote-vmagent/api/v1/write

It'd be nice to be able to set the Prometheus instance address manually, with some custom label selectors (to ensure the corresponding cluster we are querying is correct). Any thoughts?

Krr 1.5.3 returns no results due to no metrics for PercentileCPULoader and MaxMemoryLoader

Describe the bug

I updated the tool to version 1.5.3. When I executed the simple strategy, it returned no results:

> krr simple --namespace <namespace> --selector="app = grafana"


 _____       _               _          _  _______  _____
|  __ \     | |             | |        | |/ /  __ \|  __ \
| |__) |___ | |__  _   _ ___| |_ __ _  | ' /| |__) | |__) |
|  _  // _ \| '_ \| | | / __| __/ _` | |  < |  _  /|  _  /
| | \ \ (_) | |_) | |_| \__ \ || (_| | | . \| | \ \| | \ \
|_|  \_\___/|_.__/ \__,_|___/\__\__,_| |_|\_\_|  \_\_|  \_\



Running Robusta's KRR (Kubernetes Resource Recommender) v1.5.3
Using strategy: Simple
Using formatter: table

[INFO] Using clusters: ['<cluster>']
on 0: [INFO] Listing scannable objects in <cluster>
on 0: [INFO] Connecting to Prometheus for <cluster> cluster
on 0: [INFO] Using Prometheus at https://<server>/api/v1/namespaces/<namespace>/services/prometheus-server-service:9091/proxy for cluster <cluster>   
on 0: [INFO] Prometheus found
on 0: [INFO] Prometheus connected successfully for <cluster> cluster
on 0: [WARNING] Prometheus returned no PercentileCPULoader metrics for Deployment <namespace>/grafana-deployment/grafana
on 0: [WARNING] Prometheus returned no MaxMemoryLoader metrics for Deployment <namespace>/grafana-deployment/grafana
Calculating Recommendation |████████████████████████████████████████| 1 in 15.4s (0.07/s)



Simple Strategy

CPU request: 99.0% percentile, limit: unset
Memory request: max + 5.0%, limit: max + 5.0%

This strategy does not work with objects with HPA defined (Horizontal Pod Autoscaler).
If HPA is defined for CPU or Memory, the strategy will return "?" for that resource.

Learn more: https://github.com/robusta-dev/krr#algorithm

┏━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ Number ┃ Namespace        ┃ Name             ┃ Pods ┃ Old Pods ┃ Type       ┃ Container ┃ CPU Diff ┃ CPU Requests     ┃ CPU Limits       ┃ Memory Diff ┃ Memory Requests  ┃ Memory Limits     ┃
┡━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│     1. │ <namespace>      │ grafana-deploym… │ 0    │ 0        │ Deployment │ grafana   │          │ 40m -> ? (No     │ unset -> ? (No   │             │ 75Mi -> ? (No    │ 90Mi -> ? (No     │
│        │                  │                  │      │          │            │           │          │ data)            │ data)            │             │ data)            │ data)             │
└────────┴──────────────────┴──────────────────┴──────┴──────────┴────────────┴───────────┴──────────┴──────────────────┴──────────────────┴─────────────┴──────────────────┴───────────────────┘
                                                                                         100 points - A

When I execute the same command for the same cluster with the 1.4.1 version, it works fine:

> krr simple --namespace monitor-operator --selector="app = grafana"                                                                               


 _____       _               _          _  _______  _____
|  __ \     | |             | |        | |/ /  __ \|  __ \
| |__) |___ | |__  _   _ ___| |_ __ _  | ' /| |__) | |__) |
|  _  // _ \| '_ \| | | / __| __/ _` | |  < |  _  /|  _  /
| | \ \ (_) | |_) | |_| \__ \ || (_| | | . \| | \ \| | \ \
|_|  \_\___/|_.__/ \__,_|___/\__\__,_| |_|\_\_|  \_\_|  \_\



Running Robusta's KRR (Kubernetes Resource Recommender) v1.4.1
Using strategy: Simple
Using formatter: table

[INFO] Using clusters: ['<cluster>']
[INFO] Listing scannable objects in <cluster>
[INFO] Found 1 objects across 1 namespaces in <cluster>
on 0: [INFO] Connecting to Prometheus for <cluster> cluster
on 0: [INFO] Using Prometheus at https://<server>/api/v1/namespaces/<namespace>/services/prometheus-server-service:9091/proxy for cluster <cluster>   
on 0: [INFO] Prometheus found
on 0: [INFO] Prometheus connected successfully for <cluster>  cluster
Calculating Recommendation |████████████████████████████████████████| 1/1 [100%] in 5.5s (0.18/s)



Simple Strategy

CPU request: 99.0% percentile, limit: unset
Memory request: max + 5.0%, limit: max + 5.0%

This strategy does not work with objects with HPA defined (Horizontal Pod Autoscaler).
If HPA is defined for CPU or Memory, the strategy will return "?" for that resource.

Learn more: https://github.com/robusta-dev/krr#algorithm

┏━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓    
┃ Number ┃ Namespace        ┃ Name               ┃ Pods ┃ Old Pods ┃ Type       ┃ Container ┃ CPU Diff ┃ CPU Requests     ┃ CPU Limits ┃ Memory Diff ┃ Memory Requests      ┃ Memory Limits ┃    
┡━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩    
│     1. │ <namespace>      │ grafana-deployment │ 1    │ 0        │ Deployment │ grafana   │ -35m     │ (-35m) 40m -> 5m │ unset      │ -15Mi       │ (-15Mi) 75Mi -> 60Mi │ 90Mi -> 60Mi  │    
└────────┴──────────────────┴────────────────────┴──────┴──────────┴────────────┴───────────┴──────────┴──────────────────┴────────────┴─────────────┴──────────────────────┴───────────────┘    
                                                                                       100 points - A    

To Reproduce
Steps to reproduce the behavior:

  1. execute the simple strategy only with the namespace and selector parameters

Expected behavior

Recommendations should be calculated.

Desktop

  • OS: Microsoft Windows 11 Enterprise, 10.0.22621
  • Browser: Brave, 1.56.20 Chromium: 115.0.5790.171 (Official Build) (64-bit)

Connecting to prometheus Behaviour

In the documentation, it should be made clear that Prometheus needs to have one of the following labels:

"app=kube-prometheus-stack-prometheus",
"app=prometheus,component=server",
"app=prometheus-server",
"app=prometheus-operator-prometheus",
"app=prometheus-msteams",
"app=rancher-monitoring-prometheus",
"app=prometheus-prometheus",

In addition, IMO the Ingress host should likely be the default URL.
As a fallback, it can try to look for the Service LB or access the Prometheus through the cluster API

Another small point: as developers may not have access to the Prometheus server namespace, it might be worth saying that the recommended way to run this is with the -p flag.

Can't krr use the nodeport port?

My Kubernetes cluster is on another host, and it seems impossible to use a NodePort to access the Prometheus server address. What should I do?


Add long term storage support (Thanos)

We currently store just 24h in our local Prometheus and use Thanos for anything longer.

It would be cool to use Thanos by filtering results based on a custom label set.

Support metrics-based workload discovery

Is your feature request related to a problem? Please describe.

In some cases, we want to run the KRR program locally. But for security reasons, the API server of the Kubernetes cluster cannot be accessed from outside the cluster.

So we could use Prometheus-based workload discovery if kube-state-metrics is installed.

Describe the solution you'd like

We can do the workload discovery with the following steps:

  1. List Deployments together with their ReplicaSets:
replicasets = await self.metrics_loader.loader.query(
    "count by (namespace, owner_name, replicaset) (kube_replicaset_owner{"
    f'namespace=~"{ns}", '
    'owner_kind="Deployment"})'
)
  2. List Pods from a group of ReplicaSets:
# owner_name is the ReplicaSet names
pods = await self.metrics_loader.loader.query(
    "count by (owner_name, replicaset, pod) (kube_pod_owner{"
    f'namespace="{namespace}", '
    f'owner_name=~"{owner_name}", '
    'owner_kind="ReplicaSet"})'
)
  3. List containers from the Pods obtained in step (2):
containers = await self.metrics_loader.loader.query(
    "count by (container) (kube_pod_container_info{"
    f'namespace="{namespace}", '
    f'pod=~"{pod_selector}"'
    "})"
)
  4. Build K8sObjectData for the containers obtained in step (3):
async def __build_from_owner(self, namespace: str, app_name: str, containers: List[str], pod_names: List[str]) -> List[K8sObjectData]:
    return [
        K8sObjectData(
            cluster=None,
            namespace=namespace,
            name=app_name,
            kind="Deployment",
            container=container_name,
            allocations=await self.__parse_allocation(namespace, "|".join(pod_names), container_name),  # find
            pods=[PodData(name=pod_name, deleted=False) for pod_name in pod_names],  # list pods
        )
        for container_name in containers
    ]


Does not work with a proxy

I need a proxy to get to our Kubernetes environments.
The proxy is specified in kubeconfig, but it seems krr isn't using it.
I also tried to set an http{s}_proxy env variable, but that didn't work either.

Using krr v1.0.0.
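
A possible workaround sketch with the kubernetes Python client (an untested assumption on my side; the proxy address is a placeholder):

from kubernetes import client, config

config.load_kube_config()
cfg = client.Configuration.get_default_copy()
cfg.proxy = "http://proxy.internal:3128"  # placeholder proxy address
client.Configuration.set_default(cfg)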

Cannot get recommendations by using krr 1.3.2

Describe the bug

I updated the krr tool from version 1.2.1 to 1.3.2 today. Unfortunately, I cannot find a way to get recommendations using the simple strategy. The same commands work fine with 1.2.1.

To Reproduce

Execute the following command:

krr.exe simple --kubeconfig <path-to-kubeconfig> --context <context-name> --namespace <namespace-name>

1.2.1

 _____       _               _          _  _______  _____
|  __ \     | |             | |        | |/ /  __ \|  __ \
| |__) |___ | |__  _   _ ___| |_ __ _  | ' /| |__) | |__) |
|  _  // _ \| '_ \| | | / __| __/ _` | |  < |  _  /|  _  /
| | \ \ (_) | |_) | |_| \__ \ || (_| | | . \| | \ \| | \ \
|_|  \_\___/|_.__/ \__,_|___/\__\__,_| |_|\_\_|  \_\_|  \_\



Running Robusta's KRR (Kubernetes Resource Recommender) v1.2.1
Using strategy: Simple
Using formatter: table

[WARNING] Could not load context from kubeconfig.
[WARNING] Falling back to clusters from CLI: ['context-name']
[INFO] Using clusters: ['context-name']
[INFO] Listing scannable objects in context-name
[INFO] Found 7 objects across 1 namespaces in context-name
on 0: [INFO] Connecting to Prometheus for context-name cluster
on 0: [INFO] Using Prometheus at https://some-domain-here/api/v1/namespaces/some-namespace-here/services/prometheus-server-service:9091/proxy for cluster context-name
on 0: [INFO] Prometheus found
on 0: [INFO] Prometheus connected successfully for context-name cluster
Calculating Recommendation |████████████████████████████████████████| 7/7 [100%] in 7.1s (0.60/s)



Simple Strategy

CPU request: 99.0% percentile, limit: unset
Memory request: max + 5.0%, limit: max + 5.0%
Learn more: https://github.com/robusta-dev/krr#algorithm
<the table is here>

1.3.2

 _____       _               _          _  _______  _____
|  __ \     | |             | |        | |/ /  __ \|  __ \
| |__) |___ | |__  _   _ ___| |_ __ _  | ' /| |__) | |__) |
|  _  // _ \| '_ \| | | / __| __/ _` | |  < |  _  /|  _  /
| | \ \ (_) | |_) | |_| \__ \ || (_| | | . \| | \ \| | \ \
|_|  \_\___/|_.__/ \__,_|___/\__\__,_| |_|\_\_|  \_\_|  \_\



Running Robusta's KRR (Kubernetes Resource Recommender) v1.3.2
Using strategy: Simple
Using formatter: table

[WARNING] Could not load context from kubeconfig.
[WARNING] Falling back to clusters from CLI: ['context-name']
[INFO] Using clusters: ['context-name']
[INFO] Listing scannable objects in context-name
[INFO] Found 7 objects across 1 namespaces in context-name
on 0: [INFO] Connecting to Prometheus for context-name cluster
on 0: [INFO] Using Prometheus at https://some-domain-here/api/v1/namespaces/some-namespace-here/services/prometheus-server-service:9091/proxy for cluster context-name
on 0: [INFO] Prometheus found
Calculating Recommendation |⚠︎                                       | (!) 0/7 [0%] in 3.9s (0.00/s)
[ERROR] No label specified, Rerun krr with the flag `-l <cluster>` where <cluster> is one of [<a very long list of items>]

The very long list of items contains all the K8s namespaces. As requested, I added the -l parameter with the namespace-name value (that value was in the list):

krr.exe simple --kubeconfig <path-to-kubeconfig> --context <context-name> --namespace <namespace-name> -l <namespace-name>
 _____       _               _          _  _______  _____
|  __ \     | |             | |        | |/ /  __ \|  __ \
| |__) |___ | |__  _   _ ___| |_ __ _  | ' /| |__) | |__) |
|  _  // _ \| '_ \| | | / __| __/ _` | |  < |  _  /|  _  /
| | \ \ (_) | |_) | |_| \__ \ || (_| | | . \| | \ \| | \ \
|_|  \_\___/|_.__/ \__,_|___/\__\__,_| |_|\_\_|  \_\_|  \_\



Running Robusta's KRR (Kubernetes Resource Recommender) v1.3.2
Using strategy: Simple
Using formatter: table

[WARNING] Could not load context from kubeconfig.
[WARNING] Falling back to clusters from CLI: ['context-name']
[INFO] Using clusters: ['context-name']
[INFO] Listing scannable objects in context-name
[INFO] Found 7 objects across 1 namespaces in context-name
on 0: [INFO] Connecting to Prometheus for context-name cluster
on 0: [INFO] Using Prometheus at https://some-domain-here/api/v1/namespaces/some-namespace-here/services/prometheus-server-service:9091/proxy for cluster context-name
on 0: [INFO] Prometheus found
on 0: [WARNING] Prometheus returned no MemoryMetricLoader metrics for StatefulSet namespace-name/statefulset-name-1/container-name-1
on 0: [WARNING] Prometheus returned no CPUMetricLoader metrics for StatefulSet namespace-name/statefulset-name-1/container-name-1
on 1: [WARNING] Prometheus returned no CPUMetricLoader metrics for StatefulSet namespace-name/statefulset-name-2/container-name-2
on 1: [WARNING] Prometheus returned no MemoryMetricLoader metrics for StatefulSet namespace-name/statefulset-name-2/container-name-2
[...]

Simple Strategy

CPU request: 99.0% percentile, limit: unset
Memory request: max + 5.0%, limit: max + 5.0%

This strategy does not work with objects with HPA defined (Horizontal Pod Autoscaler).
If HPA is defined for CPU or Memory, the strategy will return "?" for that resource.

Learn more: https://github.com/robusta-dev/krr#algorithm

┏━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Number ┃ Namespace      ┃ Name                           ┃ Pods ┃ Old Pods ┃ Type        ┃ Container              ┃ CPU Diff ┃ CPU Requests         ┃ CPU Limits           ┃ Memory Diff ┃ Memory Requests       ┃ Memory Limits         ┃
┡━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━┩
│     1. │ namespace-name │ deployment-name-1              │ 2    │ 0        │ Deployment  │ container-name-1       │          │ 500m -> ? (No data)  │ 1 -> ? (No data)     │             │ 1600Mi -> ? (No data) │ 1920Mi -> ? (No data) │
├────────┼────────────────┼────────────────────────────────┼──────┼──────────┼─────────────┼────────────────────────┼──────────┼──────────────────────┼──────────────────────┼─────────────┼───────────────────────┼───────────────────────┤
[...]

<all recommendations are set to no data>

My kubeconfig:

apiVersion: v1
kind: Config
clusters:
  - name: context-name
    cluster:
        server: 'https://some-domain-here'
        certificate-authority-data: >-
            some-data-here
contexts:
  - name: context-name
    context:
        cluster: context-name
        user: context-name-token
        namespace: default
current-context: context-name
users:
  - name: context-name-token
    user:
        exec:
            apiVersion: client.authentication.k8s.io/v1beta1
            command: kubectl
            args:
              - oidc-login
              - get-token
              - some-params-here

Expected behavior

Recommendations should be provided with valid data.

Desktop:

  • OS: Microsoft Windows 11 Enterprise, 10.0.22621 N/A Build 22621
  • PowerShell 7.3.6
  • Browser: Brave 1.52.130 with Chromium 114.0.5735.198 (Official Build)

PrometheusApiClientException: HTTP Status Code 414 Request-URI Too Large

Describe the bug
"krr simple -n kube-system -p https://prometheus.example.com" gives "Request-URI Too Large" exception

To Reproduce
Not sure if this is reproducible in every environment; I am running the command on a cluster with 165 nodes, and I am not sure if that is a big number for krr.

Expected behavior
A report should appear.

Screenshots
Pasting the error with masked URL and cluster name:

$ krr simple -n kube-system -p https://prometheus.example.com


 _____       _               _          _  _______  _____
|  __ \     | |             | |        | |/ /  __ \|  __ \
| |__) |___ | |__  _   _ ___| |_ __ _  | ' /| |__) | |__) |
|  _  // _ \| '_ \| | | / __| __/ _` | |  < |  _  /|  _  /
| | \ \ (_) | |_) | |_| \__ \ || (_| | | . \| | \ \| | \ \
|_|  \_\___/|_.__/ \__,_|___/\__\__,_| |_|\_\_|  \_\_|  \_\



Running Robusta's KRR (Kubernetes Resource Recommender) v1.2.1
Using strategy: Simple
Using formatter: table

[INFO] Using clusters: ['k8-cluster']
[INFO] Listing scannable objects in k8-cluster
[INFO] Found 30 objects across 1 namespaces in k8-cluster
on 0: [INFO] Connecting to Prometheus for k8-cluster cluster
on 0: [INFO] Using Prometheus at https://prometheus.example.com for cluster k8-cluster
on 0: [INFO] Prometheus found
on 0: [INFO] Prometheus connected successfully for k8-cluster cluster
Calculating Recommendation |⚠︎                                       | (!) 0/30 [0%] in 1:55.2 (0.00/s)
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /private/tmp/robusta_krr/core/runner.py:174 in run                                               │
│                                                                                                  │
│ [Errno 2] No such file or directory: '/private/tmp/robusta_krr/core/runner.py'                   │
│                                                                                                  │
│ /private/tmp/robusta_krr/core/runner.py:153 in _collect_result                                   │
│                                                                                                  │
│ [Errno 2] No such file or directory: '/private/tmp/robusta_krr/core/runner.py'                   │
│                                                                                                  │
│ /private/tmp/robusta_krr/core/runner.py:125 in _gather_objects_recommendations                   │
│                                                                                                  │
│ [Errno 2] No such file or directory: '/private/tmp/robusta_krr/core/runner.py'                   │
│                                                                                                  │
│ /private/tmp/robusta_krr/core/runner.py:101 in _calculate_object_recommendations                 │
│                                                                                                  │
│ [Errno 2] No such file or directory: '/private/tmp/robusta_krr/core/runner.py'                   │
│                                                                                                  │
│ /private/tmp/robusta_krr/core/integrations/prometheus/loader.py:90 in gather_data                │
│                                                                                                  │
│ [Errno 2] No such file or directory:                                                             │
│ '/private/tmp/robusta_krr/core/integrations/prometheus/loader.py'                                │
│                                                                                                  │
│                                     ... 2 frames hidden ...                                      │
│                                                                                                  │
│ /private/tmp/robusta_krr/core/integrations/prometheus/metrics/base_filtered_metric.py:61 in      │
│ query_prometheus                                                                                 │
│                                                                                                  │
│ [Errno 2] No such file or directory:                                                             │
│ '/private/tmp/robusta_krr/core/integrations/prometheus/metrics/base_filtered_metric.py'          │
│                                                                                                  │
│ /private/tmp/robusta_krr/core/integrations/prometheus/metrics/base_metric.py:71 in               │
│ query_prometheus                                                                                 │
│                                                                                                  │
│ [Errno 2] No such file or directory:                                                             │
│ '/private/tmp/robusta_krr/core/integrations/prometheus/metrics/base_metric.py'                   │
│                                                                                                  │
│ /private/tmp/asyncio/threads.py:25 in to_thread                                                  │
│                                                                                                  │
│ [Errno 2] No such file or directory: '/private/tmp/asyncio/threads.py'                           │
│                                                                                                  │
│ /private/tmp/concurrent/futures/thread.py:58 in run                                              │
│                                                                                                  │
│ [Errno 2] No such file or directory: '/private/tmp/concurrent/futures/thread.py'                 │
│                                                                                                  │
│ /private/tmp/prometheus_api_client/prometheus_connect.py:408 in custom_query_range               │
│                                                                                                  │
│ [Errno 2] No such file or directory: '/private/tmp/prometheus_api_client/prometheus_connect.py'  │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
PrometheusApiClientException: HTTP Status Code 414 (b'<html>\r\n<head><title>414 Request-URI Too Large</title></head>\r\n<body>\r\n<center><h1>414 Request-URI Too
Large</h1></center>\r\n<hr><center>nginx</center>\r\n</body>\r\n</html>\r\n')


Additional context
On the same cluster, it worked on a namespace with 3 pods, so it looks like something to do with the large number of pods.
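
One possible mitigation, sketched below (not krr's code; loader.query, the metric, and the chunk size are illustrative), is to split the long pod regex into chunks so each query URI stays below the proxy's limit:

async def query_in_chunks(loader, pod_names: list[str], size: int = 50):
    # Query pods in batches so the pod=~"a|b|c|..." regex stays short
    results = []
    for i in range(0, len(pod_names), size):
        selector = "|".join(pod_names[i:i + size])
        results += await loader.query(
            f'container_memory_working_set_bytes{{pod=~"{selector}"}}'
        )
    return results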

Add Grafana Mimir integration

Is your feature request related to a problem? Please describe.
Grafana Mimir (even without authentication, so this is partially related to #18) requires specifying the tenant via the HTTP header X-Scope-OrgID.

Describe the solution you'd like
An option to either pass arbitrary headers or a specific flag for Mimir.
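
For reference, what such a flag would need to do under the hood (a plain requests sketch; the URL and tenant are placeholders):

import requests

response = requests.get(
    "https://mimir.example.com/prometheus/api/v1/query",
    params={"query": "up"},
    headers={"X-Scope-OrgID": "tenant-1"},  # Mimir tenant header
)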


Consider enriching `Reason: Not Found` error

krr simple simply returns the following error:

[ERROR] Error trying to list pods in cluster foo@bar: (404)
Reason: Not Found
HTTP response headers: HTTPHeaderDict({'Audit-Id': 'XXX', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json',
'X-Kubernetes-Pf-Flowschema-Uid': 'XXX', 'X-Kubernetes-Pf-Prioritylevel-Uid': 'XXX', 'Date': 'Mon, 17 Jul 2023 09:11:41 GMT',
'Content-Length': '174'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"the server could not find the requested
resource","reason":"NotFound","details":{},"code":404}

I'm not so sure the error above is descriptive enough for the end user. It'd be better to enrich and reword it, if possible.

krr version: v1.3.2
Kubernetes: v1.20.7

error when context not set

Describe the bug

I work with more than a couple of clusters. To avoid confusing myself, I do not set contexts, forcing myself to specify one for each kubectl command I write.

Which means I get this error:

kubernetes.config.config_exception.ConfigException: Invalid kube-config file. Expected object with name in ${HOME}/.kube/config/contexts list

I had to use kubectx to set a context for krr to work.


Add support for rollouts

Hey everyone,
We are mostly using the ArgoCD Rollout resource instead of Deployments:

apiVersion: argoproj.io/v1alpha1
kind: Rollout

The Rollout of ArgoCD is a popular CRD used by most people who use ArgoCD; it's basically a Deployment but with extra capabilities.

It would be great if the krr tool could support it so it will show results of Rollouts as well.
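
For what it's worth, Rollouts are a CRD, so they can be listed with the generic custom-objects API of the kubernetes Python client (a sketch; the namespace is a placeholder):

from kubernetes import client, config

config.load_kube_config()
co = client.CustomObjectsApi()
rollouts = co.list_namespaced_custom_object(
    group="argoproj.io", version="v1alpha1",
    namespace="default", plural="rollouts",
)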

PrometheusNotFound: when running the Rescan from the UI.

Describe the bug

In the UI under Efficiency, when you select a cluster to rescan and then click Rescan, you get an error in the logs that says PrometheusNotFound: Prometheus url could not be found while scanning in default cluster.

I am using an external Prometheus.

To Reproduce
Steps to reproduce the behavior:

  1. Go to 'Efficiency'
  2. Select a cluster from the 'Cluster to rescan' dropdown.
  3. Click on 'Rescan'
  4. Wait for a message to pop up saying the job failed.
  5. See the error in the logs

Expected behavior
I would expect this to rescan the cluster

Desktop (please complete the following information):

  • OS: MacOS
  • Browser Chrome
  • Version 113.0.5672.92

Brew package

Right now, installing is a little convoluted if you just want to take the tool for a spin; we are all familiar with the pain of Python envs.

It would be great if krr had a brew package.

Add support for DataDog metrics instead of Prometheus

Our organization no longer uses Prometheus, so I am very curious what it would take to integrate with DataDog as a metrics source, either as what they call an "integration" or just using the plain old DataDog API.

Pods are not found

Describe the bug
We have Pods which are created by another Pod (the GitLab Runner Kubernetes Executor).
These Pods don't have a Deployment, but even while they are running, they are not found by krr.

How can these Pods also be checked?


AttributeError: NoneType object has no attribute items

Greetings,

I just came across KRR and it looks impressive! Unfortunately, I am unable to run it on our self-hosted Kubernetes cluster; it fails with an AttributeError: 'NoneType' object has no attribute 'items' error, as seen below. Any insight on fixing this would be highly appreciated. Thanks.

python krr.py simple -v -n ops


 _____       _               _          _  _______  _____
|  __ \     | |             | |        | |/ /  __ \|  __ \
| |__) |___ | |__  _   _ ___| |_ __ _  | ' /| |__) | |__) |
|  _  // _ \| '_ \| | | / __| __/ _` | |  < |  _  /|  _  /
| | \ \ (_) | |_) | |_| \__ \ || (_| | | . \| | \ \| | \ \
|_|  \_\___/|_.__/ \__,_|___/\__\__,_| |_|\_\_|  \_\_|  \_\



Running Robusta's KRR (Kubernetes Resource Recommender) 1.1.1
Using strategy: Simple
Using formatter: table

[DEBUG] Found 1 clusters: kubernetes-admin@kubernetes           (/home/pkr/krr/robusta_krr/core/integrations/kubernetes.py:206)
[DEBUG] Current cluster: kubernetes-admin@kubernetes            (/home/pkr/krr/robusta_krr/core/integrations/kubernetes.py:207)
[DEBUG] Configured clusters: []         (/home/pkr/krr/robusta_krr/core/integrations/kubernetes.py:209)
[INFO] Using clusters: ['kubernetes-admin@kubernetes']
[INFO] Listing scannable objects in kubernetes-admin@kubernetes
[DEBUG] Namespaces: ['ops']             (/home/pkr/krr/robusta_krr/core/integrations/kubernetes.py:46)
[DEBUG] Listing deployments in kubernetes-admin@kubernetes              (/home/pkr/krr/robusta_krr/core/integrations/kubernetes.py:117)
[DEBUG] Listing statefulsets in kubernetes-admin@kubernetes             (/home/pkr/krr/robusta_krr/core/integrations/kubernetes.py:130)
[DEBUG] Listing daemonsets in kubernetes-admin@kubernetes               (/home/pkr/krr/robusta_krr/core/integrations/kubernetes.py:143)
[DEBUG] Listing jobs in kubernetes-admin@kubernetes             (/home/pkr/krr/robusta_krr/core/integrations/kubernetes.py:156)
[DEBUG] Found 12 daemonsets in kubernetes-admin@kubernetes              (/home/pkr/krr/robusta_krr/core/integrations/kubernetes.py:145)
[DEBUG] Found 104 statefulsets in kubernetes-admin@kubernetes           (/home/pkr/krr/robusta_krr/core/integrations/kubernetes.py:132)
[DEBUG] Found 407 deployments in kubernetes-admin@kubernetes            (/home/pkr/krr/robusta_krr/core/integrations/kubernetes.py:119)
[DEBUG] Found 756 jobs in kubernetes-admin@kubernetes           (/home/pkr/krr/robusta_krr/core/integrations/kubernetes.py:158)
[ERROR] Error trying to list pods in cluster kubernetes-admin@kubernetes: 'NoneType' object has no attribute 'items'
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/pkr/krr/robusta_krr/core/integrations/kubernetes.py:49 in list_scannable_objects           │
│                                                                                                  │
│    46 │   │   self.debug(f"Namespaces: {self.config.namespaces}")                                │
│    47 │   │                                                                                      │
│    48 │   │   try:                                                                               │
│ ❱  49 │   │   │   objects_tuple = await asyncio.gather(                                          │
│    50 │   │   │   │   self._list_deployments(),                                                  │
│    51 │   │   │   │   self._list_all_statefulsets(),                                             │
│    52 │   │   │   │   self._list_all_daemon_set(),                                               │
│                                                                                                  │
│ /home/pkr/krr/robusta_krr/core/integrations/kubernetes.py:121 in _list_deployments               │
│                                                                                                  │
│   118 │   │   ret: V1DeploymentList = await asyncio.to_thread(self.apps.list_deployment_for_al   │
│   119 │   │   self.debug(f"Found {len(ret.items)} deployments in {self.cluster}")                │
│   120 │   │                                                                                      │
│ ❱ 121 │   │   return await asyncio.gather(                                                       │
│   122 │   │   │   *[                                                                             │
│   123 │   │   │   │   self.__build_obj(item, container)                                          │
│   124 │   │   │   │   for item in ret.items                                                      │
│                                                                                                  │
│ /home/pkr/krr/robusta_krr/core/integrations/kubernetes.py:113 in __build_obj                     │
│                                                                                                  │
│   110 │   │   │   kind=item.__class__.__name__[2:],                                              │
│   111 │   │   │   container=container.name,                                                      │
│   112 │   │   │   allocations=ResourceAllocations.from_container(container),                     │
│ ❱ 113 │   │   │   pods=await self.__list_pods(item),                                             │
│   114 │   │   )                                                                                  │
│   115 │                                                                                          │
│   116 │   async def _list_deployments(self) -> list[K8sObjectData]:                              │
│                                                                                                  │
│ /home/pkr/krr/robusta_krr/core/integrations/kubernetes.py:94 in __list_pods                      │
│                                                                                                  │
│    91 │   │   return ",".join(label_filters)                                                     │
│    92 │                                                                                          │
│    93 │   async def __list_pods(self, resource: Union[V1Deployment, V1DaemonSet, V1StatefulSet   │
│ ❱  94 │   │   selector = self._build_selector_query(resource.spec.selector)                      │
│    95 │   │   if selector is None:                                                               │
│    96 │   │   │   return []                                                                      │
│    97                                                                                            │
│                                                                                                  │
│ /home/pkr/krr/robusta_krr/core/integrations/kubernetes.py:84 in _build_selector_query            │
│                                                                                                  │
│    81 │                                                                                          │
│    82 │   @staticmethod                                                                          │
│    83 │   def _build_selector_query(selector: V1LabelSelector) -> Union[str, None]:              │
│ ❱  84 │   │   label_filters = [f"{label[0]}={label[1]}" for label in selector.match_labels.ite   │
│    85 │   │                                                                                      │
│    86 │   │   if selector.match_expressions is not None:                                         │
│    87 │   │   │   label_filters.extend(                                                          │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
AttributeError: 'NoneType' object has no attribute 'items'
[WARNING] Current filters resulted in no objects available to scan.
[WARNING] Try to change the filters or check if there is anything available.


┏━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━┳━━━━━━┳━━━━━━━━━━┳━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Number ┃ Cluster ┃ Namespace ┃ Name ┃ Pods ┃ Old Pods ┃ Type ┃ Container ┃ CPU Requests ┃ CPU Limits ┃ Memory Requests ┃ Memory Limits ┃
┡━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━╇━━━━━━╇━━━━━━━━━━╇━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
└────────┴─────────┴───────────┴──────┴──────┴──────────┴──────┴───────────┴──────────────┴────────────┴─────────────────┴───────────────┘

Kubernetes Info:

Self hosted cluster. Kubernetes version v1.25.7
Host OS: Ubuntu 22.04.1
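
The traceback points at selector.match_labels, which can legitimately be None in the Kubernetes API. A minimal defensive sketch of the failing helper (not the actual fix shipped by the project; Exists/DoesNotExist expressions are omitted for brevity):

# Sketch of a defensive version of the failing helper: selector.match_labels
# can be None, which is what raises the AttributeError above.
from typing import Optional
from kubernetes.client.models import V1LabelSelector

def build_selector_query(selector: V1LabelSelector) -> Optional[str]:
    match_labels = selector.match_labels or {}
    label_filters = [f"{key}={value}" for key, value in match_labels.items()]
    for expr in selector.match_expressions or []:
        if expr.operator == "In":
            label_filters.append(f"{expr.key} in ({','.join(expr.values)})")
        elif expr.operator == "NotIn":
            label_filters.append(f"{expr.key} notin ({','.join(expr.values)})")
    return ",".join(label_filters) or None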

Use Vertical Pod Autoscaler as an additional data source to enrich the recommendation algorithm

As the current implementation of krr relies heavily on Prometheus metrics, I was thinking about how we could also benefit from the VPA (in recommendation mode, if the CRDs are installed) as an additional data source, enriching the recommendation system with high-precision calculations (as goldilocks did).

The implementation would be quite simple (see the sketch after this list):

  • Get the VPA
  • Check the VPA.Status.Recommendation.ContainerRecommendations
  • Get the Prometheus metrics (as-is)
  • Do some magic and calculation stuff (combine two different data sources)
  • Finalize the recommended values

Any thoughts?
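
To make the idea concrete, here is a rough sketch of reading the VPA recommendations with the Kubernetes Python client; the blending rule is left as a comment because the "magic" step is exactly what is up for discussion:

# Hypothetical sketch: read VPA recommendations (recommendation mode) so they
# can be blended with the Prometheus-based numbers. autoscaling.k8s.io/v1 is
# the standard VPA CRD; the rest is an assumption, not existing krr code.
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

vpas = api.list_cluster_custom_object(
    group="autoscaling.k8s.io", version="v1", plural="verticalpodautoscalers"
)
for vpa in vpas["items"]:
    recommendation = vpa.get("status", {}).get("recommendation", {})
    for rec in recommendation.get("containerRecommendations", []):
        # rec["target"] holds the VPA's recommended requests, e.g.
        # {"cpu": "25m", "memory": "262144k"} -- this is where the
        # "combine two different data sources" step would go.
        print(vpa["metadata"]["name"], rec["containerName"], rec["target"])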

Providing strategy settings on command line

Hi!

Thanks for creating this! I've been trying to use it, and for the most part it's been a very nice experience. However, I am having some issues with getting the strategy parameters to work.

I loaded up the project in VSCode and started the application (with a few breakpoints) with the following parameters:

krr.py simple -p "https://<my prometheus endpoint>" -n "selfhosted" --history_duration "1" --memory_buffer_percentage "10"

I notice that the values for history_duration and memory_buffer_percentage are still 336 and 5 respectively. Am I doing something wrong in the CLI call?

krr.py simple --help gives me this:

Usage: krr.py simple [OPTIONS]

 Run KRR using the `simple` strategy

╭─ Options ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --help          Show this message and exit.                                                                                                                                      │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Kubernetes Settings ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --cluster    -c      TEXT  List of clusters to run on. By default, will run on the current cluster. Use '*' to run on all clusters. [default: None]                              │
│ --namespace  -n      TEXT  List of namespaces to run on. By default, will run on all namespaces. [default: None]                                                                 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Prometheus Settings ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --prometheus-url          -p      TEXT  Prometheus URL. If not provided, will attempt to find it in kubernetes cluster [default: None]                                           │
│ --prometheus-auth-header          TEXT  Prometheus authentication header. [default: None]                                                                                        │
│ --prometheus-ssl-enabled                Enable SSL for Prometheus requests.                                                                                                      │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Logging Settings ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --formatter  -f      TEXT  Output formatter (json, pprint, table, yaml) [default: table]                                                                                         │
│ --verbose    -v            Enable verbose mode                                                                                                                                   │
│ --quiet      -q            Enable quiet mode                                                                                                                                     │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Strategy Settings ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --history_duration                TEXT  The duration of the history data to use (in hours). [default: 336]                                                                       │
│ --timeframe_duration              TEXT  The step for the history data (in minutes). [default: 15]                                                                                │
│ --cpu_percentile                  TEXT  The percentile to use for the CPU recommendation. [default: 99]                                                                          │
│ --memory_buffer_percentage        TEXT  The percentage of added buffer to the peak memory usage for memory recommendation. [default: 5]                                          │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

Feature request: configurable prometheus queries

We use kube-eagle and VictoriaMetrics.

Right now, I have to set up a scrape job from cadvisor solely dedicated to krr:

  # These metrics are required by https://github.com/robusta-dev/krr
  - job_name: kubelet
    scheme: https
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      insecure_skip_verify: true
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    kubernetes_sd_configs:
      - role: node
    relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/$1/proxy/metrics/cadvisor
    metric_relabel_configs:
      # We only need the following metrics because they are needed by robusta-dev/krr, everything else we get from kube-eagle
      - action: keep
        if: '{__name__=~"(container_cpu_usage_seconds_total|container_memory_working_set_bytes)"}'

I think it would be an improvement if one could override the queries mentioned in the docs with whatever fits the local setup, and let users figure out the equivalent queries for tools like kube-eagle.
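
One possible shape for this, sketched below; the setting names, template variables, and the kube-eagle metric name are assumptions, not an existing krr config format:

# Hypothetical sketch: let users override the PromQL templates krr runs,
# e.g. to substitute kube-eagle metric names.
DEFAULT_QUERIES = {
    "cpu_usage": 'sum(irate(container_cpu_usage_seconds_total{{namespace="{namespace}", pod="{pod}", container="{container}"}}[5m]))',
    "memory_usage": 'sum(container_memory_working_set_bytes{{namespace="{namespace}", pod="{pod}", container="{container}"}})',
}

def build_query(name: str, overrides: dict[str, str], **labels: str) -> str:
    template = overrides.get(name, DEFAULT_QUERIES[name])
    return template.format(**labels)

# A kube-eagle user could then supply something like:
overrides = {
    "memory_usage": 'eagle_pod_container_resource_usage_memory_bytes{{namespace="{namespace}", pod="{pod}", container="{container}"}}',
}
print(build_query("memory_usage", overrides, namespace="default", pod="web-0", container="app"))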

Too many historic pods make the query string too long for the Prometheus range_query API

Describe the bug

If we have many historic pods, the query expression becomes very long, and at some point the request may be rejected by the gateway (e.g. Nginx) with an HTTP 422 response.


Expected behavior

Use POST API instead of GET.

https://github.com/4n4nd/prometheus-api-client-python/blob/39c5710521134fc450e9b4103cbb5995c05c5273/prometheus_api_client/prometheus_connect.py#L403-L409

But since we are using prometheus-api-client-python, it is not possible to do this.
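
As a workaround sketch, the range query can be sent as a POST body directly, since Prometheus accepts URL-encoded form parameters on /api/v1/query_range; the URL and query below are placeholders:

# Sketch: send the (potentially very long) range query as a POST body instead
# of a GET query string, avoiding gateway URL-length limits.
import time
import requests

PROMETHEUS_URL = "http://localhost:9090"  # placeholder: point this at your server

def query_range_post(query: str, start: float, end: float, step: str) -> dict:
    resp = requests.post(
        f"{PROMETHEUS_URL}/api/v1/query_range",
        data={"query": query, "start": start, "end": end, "step": step},
    )
    resp.raise_for_status()
    return resp.json()["data"]

now = time.time()
data = query_range_post(
    'container_memory_working_set_bytes{pod=~"web-.*"}',  # example query
    start=now - 3600, end=now, step="15s",
)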

Screenshots

image


Can't install v1.2.1 through Homebrew on linux

Describe the bug
Cannot install krr with Homebrew on Linux since the artifacts have been renamed.

To Reproduce
Steps to reproduce the behavior:

  1. brew upgrade krr
    ==> Upgrading 1 outdated package:
    robusta-dev/krr/krr 1.0.0 -> 1.2.1
    ==> Fetching robusta-dev/krr/krr
    Error: krr: Failed to download resource "krr"
    Failure while executing; /usr/bin/env /home/linuxbrew/.linuxbrew/Homebrew/Library/Homebrew/shims/shared/curl --disable --cookie /dev/null --globoff --show-error --user-agent Linuxbrew/4.0.21\ \(Linux\;\ x86_64\ Ubuntu\ 20.04.6\ LTS\)\ curl/7.68.0 --header Accept-Language:\ en --retry 3 --fail --location --silent --head --request GET https://github.com/robusta-dev/krr/releases/download/v1.2.1/krr-linux-latest-v1.2.1.zip exited with 22. Here's the output:
    curl: (22) The requested URL returned error: 404

The issue is the artifact name on Linux, which is krr-linux-latest-v1.2.1.zip in the brew formula but krr-ubuntu-latest-v1.2.1.zip on the release page.

Not sure on which side you want it to be fixed (Homebrew vs. artifact name).

Expected behavior
Krr should install properly


Desktop (please complete the following information):

  • Windows 10 with WSL2

brew install on Linux references the wrong asset ...

$ brew install krr
==> Fetching robusta-dev/krr/krr
Error: krr: Failed to download resource "krr"
Failure while executing; `/usr/bin/env /home/linuxbrew/.linuxbrew/Homebrew/Library/Homebrew/shims/shared/curl --disable --cookie /dev/null --globoff --show-error --user-agent Linuxbrew/4.0.27\ \(Linux\;\ x86_64\ Debian\ GNU/Linux\ 12\ \(bookworm\)\)\ curl/7.88.1 --header Accept-Language:\ en --retry 3 --fail --location --silent --head --request GET https://github.com/robusta-dev/krr/releases/download/v1.3.2/krr-linux-latest-v1.3.2.zip` exited with 22. Here's the output:
HTTP/2 404 
server: GitHub.com
date: Thu, 06 Jul 2023 15:17:29 GMT
content-type: text/plain; charset=utf-8
vary: X-PJAX, X-PJAX-Container, Turbo-Visit, Turbo-Frame, Accept-Encoding, Accept, X-Requested-With
cache-control: no-cache
strict-transport-security: max-age=31536000; includeSubdomains; preload
x-frame-options: deny
x-content-type-options: nosniff
x-xss-protection: 0
referrer-policy: no-referrer-when-downgrade
content-security-policy: default-src 'none'; base-uri 'self'; connect-src 'self'; form-action 'self'; img-src 'self' data:; script-src 'self'; style-src 'unsafe-inline'
content-length: 9
x-github-request-id: 4CF5:77F0:3E5E2C6:3F1ED1F:64A6DB09

curl: (22) The requested URL returned error: 404


Be able to explicitly ask for recommendations based on max/avg mem and/or cpu

Is your feature request related to a problem? Please describe.
As a user, I'd love to be able to explicitly select krr recommendations based on:

  • average cpu
  • max cpu
  • average mem
  • max mem

Stretch-goal: Be able to mix and match them

Describe the solution you'd like

Some additional ability to use krr in the following way:

krr --avg-mem --max-cpu
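
To illustrate what these flags would compute, here is a small sketch of the aggregation choice over raw usage samples; the flag-to-function mapping mirrors the proposal above and is hypothetical, not an existing krr option:

# Hypothetical sketch of the requested knobs: choose the aggregation function
# per resource. The sample data is purely illustrative.
import numpy as np

AGGREGATIONS = {
    "avg": np.mean,
    "max": np.max,
    "p99": lambda samples: np.percentile(samples, 99),  # krr's current CPU default
}

def recommend(samples: list[float], mode: str, buffer_pct: float = 5.0) -> float:
    """Aggregate historic usage samples and add a safety buffer on top."""
    value = float(AGGREGATIONS[mode](np.array(samples)))
    return value * (1 + buffer_pct / 100)

cpu_samples = [0.12, 0.30, 0.22, 0.95, 0.18]   # CPU cores, illustrative
mem_samples = [200e6, 350e6, 310e6, 280e6]     # bytes, illustrative
print(recommend(cpu_samples, "max"))   # what --max-cpu would compute
print(recommend(mem_samples, "avg"))   # what --avg-mem would compute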

Describe alternatives you've considered

Writing my own custom strategy (this will take me a bit...would prefer to avoid) or hacking on a copy of the code.

Error trying to list pods

Describe the bug
When I run the simplest possible command, no pod is detected.

Full output with -v arg: https://pastebin.com/XZTGz2HY

Screenshots
image

Desktop (please complete the following information):

  • OS: Ubuntu WSL

Additional context

  • K8s version: 1.26.4

krr --install-completion crash

Describe the bug
krr --install-completion crashes when krr was installed via Homebrew.

To Reproduce
Steps to reproduce the behavior:

$ brew tap robusta-dev/homebrew-krr
$ brew install krr
$ krr --install-completion

Expected behavior
Shell completion is installed successfully.

Screenshots
Screenshot 2023-05-31 at 12 24 31
Screenshot 2023-05-31 at 12 24 16

Desktop (please complete the following information):

  • OS: macOS on M2 chip
  • Version: Ventura

New installation method: asdf

Is your feature request related to a problem? Please describe.
Linux users have to install krr manually via Python or with Homebrew... sad times.

Describe the solution you'd like
asdf is the ultimate package manager for CLI tools (it's great, you should all be using it); I would love to see an official krr plugin for asdf.

https://asdf-vm.com/

Describe alternatives you've considered
If krr can be published as a simple binary, that would also be nice.

