- Is it cloud, on-prem, some test env with kind/minikube/etc
The k8s cluster is deployed on-prem on Proxmox: 3 master nodes, 3 worker nodes.
- Number of nodes and approximate number of pods
±150
- Nodes resources (CPU, RAM, storage)
```
maple@ubuntu:tk8s-mon$ k top nodes
NAME      CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
tk8sm01   423m         10%    4086Mi          53%
tk8sm02   316m         7%     3477Mi          45%
tk8sm03   252m         6%     3216Mi          42%
tk8sw01   381m         4%     11859Mi         37%
tk8sw02   671m         8%     8623Mi          27%
tk8sw03   1051m        13%    18054Mi         56%
```
- Exact audit policy used
https://falco.org/docs/install-operate/third-party/learning/#falco-with-multiple-sources
[Screenshot: ELK event count from OpenSearch]
The normal event rate (5m window) is around 6k-7k. When Kubeshark was deployed, the rate jumped to 13k.
The empty bars are kube-apiserver crashes. We had to temporarily disable auditing in kube-apiserver.yaml so that we could uninstall the Kubeshark Helm release.
Additionally, see the screenshots from Grafana:
[Screenshot: Kubeshark DaemonSet memory load]
[Screenshot: kube-apiserver memory load]
At the moment we are going to test the following:
- upgrade nodes to use cgroup v2
- limit the kube-apiserver memory resources so that next time at least the node itself survives
- limit the Kubeshark memory resources according to the guide https://docs.kubeshark.co/en/performance
- scope Kubeshark to a specific namespace
You are right that the audit policy definition has a huge impact on the number of events generated. The one we are using is pretty verbose, as Falco needs it for its analysis.
from kubeshark.
Thanks for the info, @MMquant.
To confirm whether Kubeshark itself generates those events, can you please exclude the Kubeshark service account from auditing?
If you installed Kubeshark in the default namespace, this can be done with the following rule:
```yaml
- level: None
  userGroups: ["system:serviceaccounts"]
  users: ["system:serviceaccount:default:kubeshark-service-account"]
```
So, when you have time: remove Kubeshark, add the rule, restart the API servers, reinstall Kubeshark, and check whether the number of events is that high again.
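To see why this single rule is enough, here is a minimal Python sketch of how an audit policy is evaluated, assuming the documented semantics: rules are checked in order and the first matching rule sets the level, and within one rule every non-empty selector (`users`, `userGroups`) must match. The `Rule`/`Request` classes, `audit_level`, the fallback level, and the catch-all `RequestResponse` rule (a stand-in for a verbose policy such as Falco's) are hypothetical illustrations, not Kubernetes code.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Rule:
    """Simplified audit PolicyRule: an empty selector list matches everything."""
    level: str
    users: list[str] = field(default_factory=list)
    user_groups: list[str] = field(default_factory=list)


@dataclass
class Request:
    """Only the requester identity matters for this sketch."""
    user: str
    groups: list[str]


def rule_matches(rule: Rule, req: Request) -> bool:
    # Every non-empty selector must match; selectors are ANDed together.
    if rule.users and req.user not in rule.users:
        return False
    # userGroups matches if the requester belongs to ANY listed group.
    if rule.user_groups and not set(rule.user_groups) & set(req.groups):
        return False
    return True


def audit_level(policy: list[Rule], req: Request, default: str = "None") -> str:
    # The FIRST matching rule decides the level, so rule order matters.
    for rule in policy:
        if rule_matches(rule, req):
            return rule.level
    # Assumption of this sketch: unmatched requests fall back to `default`.
    return default


KUBESHARK_SA = "system:serviceaccount:default:kubeshark-service-account"
SA_GROUPS = ["system:serviceaccounts", "system:authenticated"]

policy = [
    # The exclusion rule from the comment above, placed first.
    Rule("None", users=[KUBESHARK_SA], user_groups=["system:serviceaccounts"]),
    # Hypothetical stand-in for a verbose catch-all rule.
    Rule("RequestResponse"),
]

kubeshark_req = Request(KUBESHARK_SA, SA_GROUPS)
# Example of some other service account in the cluster.
other_req = Request("system:serviceaccount:kube-system:coredns", SA_GROUPS)

print(audit_level(policy, kubeshark_req))  # None
print(audit_level(policy, other_req))  # RequestResponse
# With the rules swapped, the catch-all matches first and Kubeshark is logged again.
print(audit_level(list(reversed(policy)), kubeshark_req))  # RequestResponse
```

The practical consequence is about placement: the exclusion rule has to sit above any broader rule in the policy file, otherwise a more general rule matches first and the Kubeshark requests are still recorded.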
Hi @corest,
we have just tested the rule, and it seems that it indeed filtered out the "DOS events".
@MMquant thanks for reporting this. We are actively looking into this and will report back our findings.
Hi @MMquant
Thanks for reporting this issue.
I tried to reproduce this in our test environment on an EKS cluster (5 t3.large nodes, ~100 pods).
This graph shows the number of audit log events in the cluster.
I installed Kubeshark at 7:30.
There was a small spike in the number of events at that point, which stabilized after Kubeshark completed its initial discovery. After that, no anomalies in the number of events were detected.
Also, there was no visible additional load on the Kubernetes API server.
We previously ran similar tests on a cluster with 100 nodes and ~1000 pods and didn't find any issues.
This doesn't prove that there are no such issues, though.
Maybe EKS doesn't use such a verbose audit policy; I'm not sure.
Please provide more details on your setup.
- Is it cloud, on-prem, some test env with kind/minikube/etc
- Number of nodes and approximate number of pods
- Nodes resources (CPU, RAM, storage)
- Exact audit policy used
Also, maybe ELK can provide some details on the anomalies? You wrote that no common "DDOS" events were found, but maybe you can at least provide the difference in event counts before and after installing Kubeshark.
E.g. the average count was 1k events/s before Kubeshark and 10k events/s after it was installed.
That would help us to identify the magnitude of the issue at least.
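Here is a minimal Python sketch of the before/after comparison being asked for, assuming the counts are exported from ELK as plain lists of per-interval event counts (the export step and the `rate_change` helper are hypothetical). It reports both averages and their ratio, which is the "magnitude" number that is easiest to compare across clusters.

```python
from __future__ import annotations

from statistics import mean


def rate_change(before: list[float], after: list[float]) -> tuple[float, float, float]:
    """Compare two series of per-interval event counts.

    Returns (average before, average after, after/before ratio).
    """
    if not before or not after:
        raise ValueError("each series needs at least one sample")
    avg_before = mean(before)
    avg_after = mean(after)
    if avg_before == 0:
        raise ValueError("baseline average is zero; the ratio is undefined")
    return avg_before, avg_after, avg_after / avg_before


# The illustrative numbers from the comment above: ~1k events/s vs ~10k events/s.
_, _, ratio = rate_change([1000], [10000])
print(ratio)  # 10.0
```

Both series must use the same interval (for example, per 5-minute window); the ratio itself is unit-free, so it can be compared between clusters that bucket their counts differently.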
@corest I think we can close this issue, can't we?
@alongir The logs I posted have been sorted out by our DevOps team. Moreover, I'd rather not discuss that log error in this issue, as I don't think the two are related.
@MMquant we will keep this open for now, as I have a few things to work on:
- Create an environment with an extensive audit policy + Falco to replicate your issue.
- Find out why Kubeshark generates so many events.
- Update the docs on our side regarding excluding Kubeshark from audit events.
- Recreated a cluster with 3 nodes and the audit policy provided by Falco
- Installed Falco and some workloads. Left the cluster running for 1h. Average rate of audit events: 345 events/minute
- Installed Kubeshark and enabled scripts to generate some activity. Left it for 1h. Average rate of audit events increased to 352 events/minute
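Putting the measured rates side by side helps size the difference. The arithmetic below uses only figures reported in this thread: ~6k-7k vs ~13k events per 5-minute window on the on-prem cluster (taking 6.5k as the midpoint of the reported range), and 345 vs 352 events/minute in this reproduction.

```latex
\begin{aligned}
\text{on-prem cluster:}\quad & \frac{13\,000}{6\,500} = 2.0 \;(\approx +100\%) \\
\text{reproduction:}\quad & \frac{352 - 345}{345} = \frac{7}{345} \approx 0.020 \;(\approx +2\%) \\
& 7\ \tfrac{\text{events}}{\text{min}} \times 60\ \tfrac{\text{min}}{\text{h}} = 420\ \tfrac{\text{events}}{\text{h}}
\end{aligned}
```

So the reproduction shows a roughly 2% increase, while the on-prem cluster saw its baseline roughly double.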
Overall, in 1h the Kubeshark service account generated ~300 events, which is expected and normal.
Also, no additional visible load was generated on the Kubernetes API.
So in this case, I think the high volume of events is very specific to the cluster setup and, for now, can't be fixed on the Kubeshark side.
One last thing for this issue: I'll add a section to https://docs.kubeshark.co/en/troubleshooting on how to exclude Kubeshark audit events from monitoring.
FYI @alongir
Done