traas-stack / holoinsight-agent
Agent for HoloInsight
License: Apache License 2.0
Kubernetes is changing the container runtime on a node from Docker Engine to containerd.
See https://kubernetes.io/docs/tasks/administer-cluster/migrating-from-dockershim/change-runtime-containerd/
Support containerd as the container runtime layer.
Currently, the log monitor feature can compute metrics from logs on the Agent side.
Some users rely on this feature to compute their business error count.
But the result only contains the error count, not the log texts that led to it.
Support keeping some log texts when a certain condition is met (for example, when an error occurs).
When there is network isolation between the HoloInsight-Agent pod and the target pod, network plugins currently have to work through nsenter.
But the nsenter solution only works for the runc runtime class, not rund.
If we use the standard Docker API when interacting with the container, we can get better compatibility.
Of course, performance must also be properly considered.
It is recommended to run nsenter and docker exec performance tests on runc and rund respectively.
Currently there is no arm64v8 (or other architecture) agent image, which makes it inconvenient to run in my M1 Mac environment, although Apple provides 'Rosetta 2' to help amd64 binaries run on arm64.
Support building an arm64v8 agent image.
The container helper is compiled with glibc. This matters when running the helper binary in Alpine, which ships musl instead of glibc.
Add DCGM-export metrics collection
When running in sidecar/VM mode, it is almost always necessary to enable the collection of system indicators.
Enable system metrics collector by default
For unknown reasons, some exec processes may linger and never exit.
I think we need some means of preventing arbitrary execs from hanging permanently.
This problem is not easy to reproduce.
It is very likely that holoinsight-agent exited abnormally while tar was reading data from STDIN, causing STDIN to be unable to read EOF for a long time, so it hung.
No exec remains.
There are several ideas:
No response
pkg/pipeline/sys/sys.go:169
SysPipeline.emitLoop
func (p *SysPipeline) emitLoop() {
...
p.emitOnce(alignTs)
...
}
Here alignTs is the start time of the current time window at the moment emitOnce executes.
But the data being emitted belongs to the window that has just ended, so the data event time should be changed to (alignTs - windowSize).
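The arithmetic can be sketched as follows. This is an illustration under assumptions: timestamps are taken to be Unix milliseconds, and `alignDown` and `eventTime` are hypothetical helper names, not functions from `sys.go`.

```go
package main

import "fmt"

// alignDown returns the start of the window that contains tsMs.
func alignDown(tsMs, windowMs int64) int64 {
	return tsMs / windowMs * windowMs
}

// eventTime returns the timestamp to attach to data emitted at alignTs:
// the start of the window that has just closed, i.e. alignTs - windowSize.
func eventTime(alignTs, windowMs int64) int64 {
	return alignTs - windowMs
}

func main() {
	const windowMs = 60_000 // assumed 1-minute window
	nowMs := int64(125_000) // 2 minutes 5 seconds after the epoch

	alignTs := alignDown(nowMs, windowMs)
	fmt.Println("alignTs:", alignTs)
	fmt.Println("eventTime:", eventTime(alignTs, windowMs))
}
```

With these inputs the current window starts at 120000 ms, while the data collected during the previous minute is stamped 60000 ms.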
While the agent is being rebooted or redeployed, about 1 to 2 minutes of data is lost.
This can lead to a poor user experience, as the data on the page drops significantly, especially when the user's cluster is relatively small.
Record the consumption position, and resume from the last recorded position after restarting.
This is useful for log monitoring.
Add license header and related scripts
The main branch is important. We need to make sure that the code on the main branch always compiles and runs.
But currently there is no protection for the main branch.
Configure a GitHub 'branch protection rule' for the main branch.
Check items before merging:
Pod labels carry a lot of metadata. If pod labels could be referenced as metric data tags, we could create more powerful queries with more dimensions.
Add the ability to reference pod labels as metric data tags.
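A rough sketch of the idea, assuming a configuration that maps tag names to pod label keys. `AddLabelTags`, the `refs` mapping and the default value are hypothetical; the agent's real configuration model may differ.

```go
package main

import "fmt"

// AddLabelTags returns a copy of tags extended with values taken from pod labels.
// refs maps a tag name to the pod label key it should be filled from.
// Labels missing on the pod are filled with defaultValue.
func AddLabelTags(tags, podLabels, refs map[string]string, defaultValue string) map[string]string {
	out := make(map[string]string, len(tags)+len(refs))
	for k, v := range tags {
		out[k] = v
	}
	for tagName, labelKey := range refs {
		if v, ok := podLabels[labelKey]; ok {
			out[tagName] = v
		} else {
			out[tagName] = defaultValue
		}
	}
	return out
}

func main() {
	tags := map[string]string{"pod": "checkout-7d9f", "namespace": "shop"}
	podLabels := map[string]string{"app.kubernetes.io/name": "checkout", "tier": "backend"}
	refs := map[string]string{"app": "app.kubernetes.io/name", "zone": "topology/zone"}

	fmt.Println(AddLabelTags(tags, podLabels, refs, "-"))
}
```

Returning a new map rather than mutating `tags` keeps the original tag set reusable across data points.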
I have deployed a holoinsight-agent daemonset in my k8s cluster.
First, I configured the daemonset with apikey1 of tenant1.
Then I reconfigured it with apikey2 of tenant2 and redeployed it.
In the server-side database, the metadata of these agents in the 'gaea_agent' table still has tenant = 'tenant1'.
See description above
I think the correct value should be tenant = 'tenant2'.
Add deployment files for k8s.
For example, a deployment YAML or a Helm chart.