
holoinsight-agent's People

Contributors

archerny, jsy1001de, sw1136562366, wangsiyuan-code, xzchaoo


holoinsight-agent's Issues

Log sample ability

Describe This Problem

Currently, the log monitor feature can compute metrics from logs on the Agent side.
Some users use this feature to compute their business error counts.
But the result contains only the error count, not the log texts that led to it.

Proposal

Support keeping some sample log texts when a certain condition is met (for example, when the log line indicates an error).
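
A minimal sketch of the idea, not the agent's actual implementation: count every matching line as before, but also keep the first few matching lines as samples. The `sampler` type, its fields, and the "contains ERROR" condition are all hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// sampler counts matching log lines and keeps at most maxSamples of them.
// The type and field names are hypothetical, not part of holoinsight-agent.
type sampler struct {
	match      func(line string) bool // condition that marks a line as interesting
	maxSamples int                    // upper bound on retained log texts
	count      int                    // number of matching lines seen so far
	samples    []string               // retained log texts
}

// add feeds one log line into the sampler.
func (s *sampler) add(line string) {
	if !s.match(line) {
		return
	}
	s.count++
	if len(s.samples) < s.maxSamples {
		s.samples = append(s.samples, line)
	}
}

func main() {
	s := &sampler{
		match:      func(l string) bool { return strings.Contains(l, "ERROR") },
		maxSamples: 2,
	}
	for _, l := range []string{"INFO ok", "ERROR a", "ERROR b", "ERROR c"} {
		s.add(l)
	}
	fmt.Println(s.count, s.samples)
}
```

Bounding the number of samples per window keeps the extra memory and upload cost predictable even when the error rate spikes.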

Additional Context

refactor: use docker Standard API instead of nsenter

Describe This Problem

When there is network isolation between the HoloInsight-Agent pod and the target pod, network plugins currently have to work through nsenter.
But the nsenter solution works only for the runc runtime class, not for rund.

If we use the Docker standard API when interacting with the container, we can get better compatibility.
Of course, performance issues must also be properly considered.
It is recommended to perform nsenter and docker exec performance tests on runc/rund respectively.
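
A rough sketch of a timing harness for such a comparison. The `measure` helper is a hypothetical name, and the `true` command is only a placeholder; the real nsenter and docker exec invocations would be substituted in.

```go
package main

import (
	"fmt"
	"os/exec"
	"time"
)

// measure runs fn n times and returns the average wall-clock duration of one run.
// It stops at the first error.
func measure(n int, fn func() error) (time.Duration, error) {
	if n <= 0 {
		return 0, fmt.Errorf("n must be positive, got %d", n)
	}
	start := time.Now()
	for i := 0; i < n; i++ {
		if err := fn(); err != nil {
			return 0, err
		}
	}
	return time.Since(start) / time.Duration(n), nil
}

func main() {
	// Placeholder command; replace with the nsenter or docker exec call under test.
	avg, err := measure(10, func() error { return exec.Command("true").Run() })
	fmt.Println(avg, err)
}
```

Running the same harness against both approaches on runc and on rund gives four comparable averages.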

Proposal

Additional Context

Supports building arm64v8 agent image

Describe This Problem

Currently there is no arm64v8 agent image (nor images for other architectures), which makes it inconvenient to run the agent in my M1 Mac environment, even though Apple provides 'Rosetta 2' to help amd64 binaries run on arm64.

Proposal

Support building an arm64v8 agent image.

Additional Context

Enable system metrics collector by default

Describe This Problem

When running in sidecar/VM mode, it is almost always necessary to enable the collection of system metrics.

Proposal

Enable system metrics collector by default

Additional Context

some execs that remain and do not exit

Describe this problem

For unknown reasons, there may be some execs that remain and do not exit.
I think we need to have some means to prevent arbitrary execs from hanging permanently.

Steps to reproduce

This problem is not easy to reproduce.
It is very likely that holoinsight-agent exited abnormally while tar was reading data from STDIN, so tar never received EOF on STDIN and hung.

Expected behavior

No exec remains.

Additional Information

There are several ideas:

  • Always wrap commands with timeout, and KILL the command when the timeout occurs. But be aware that timeout can generate zombie processes in some scenarios. See #99
  • Have all exec commands carry a special environment variable tag to indicate that they were started by holoinsight-agent. Periodically clean up leftover processes that carry this tag. See utils.go
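
A sketch of how the two ideas could look for commands started from Go, under stated assumptions: it uses the standard library's `exec.CommandContext` rather than the `timeout` binary, and the environment variable name `HOLOINSIGHT_AGENT_EXEC` is hypothetical, not one defined by holoinsight-agent.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"os/exec"
	"time"
)

// execMarker is a hypothetical tag; the real variable name is not given in this issue.
const execMarker = "HOLOINSIGHT_AGENT_EXEC=1"

// taggedEnv returns a copy of base with the marker appended, so that a
// periodic cleaner can recognise processes started by the agent.
func taggedEnv(base []string) []string {
	env := make([]string, 0, len(base)+1)
	env = append(env, base...)
	return append(env, execMarker)
}

// runWithTimeout runs the command and kills it if it outlives timeout.
func runWithTimeout(timeout time.Duration, name string, args ...string) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	cmd := exec.CommandContext(ctx, name, args...) // the process is killed when ctx expires
	cmd.Env = taggedEnv(os.Environ())
	return cmd.Run()
}

func main() {
	err := runWithTimeout(200*time.Millisecond, "sleep", "5")
	fmt.Println("sleep 5 with a 200ms timeout:", err)
}
```

The context timeout lives inside the agent process, so it does not help when the agent itself exits abnormally. That case is what the marker is for: a cleaner can later find and kill leftover processes that carry it.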


Wrong event time when emitting

Describe this problem

pkg/pipeline/sys/sys.go:169
SysPipeline.emitLoop

func (p *SysPipeline) emitLoop() {
  ...
  p.emitOnce(alignTs)
  ...
}

Here alignTs is the start of the time window that is current when emitLoop executes.
But the emitted data belongs to the previous window, so its event time should be (alignTs - windowSize).
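
A small worked example of the arithmetic, assuming timestamps and window sizes are expressed in milliseconds. The helper names `alignTo` and `eventTime` are hypothetical and do not appear in sys.go.

```go
package main

import "fmt"

// alignTo returns the start of the window that contains tsMs.
func alignTo(tsMs, windowMs int64) int64 {
	return tsMs - tsMs%windowMs
}

// eventTime returns the start of the window that closed most recently
// before nowMs, i.e. alignTs - windowSize.
func eventTime(nowMs, windowMs int64) int64 {
	return alignTo(nowMs, windowMs) - windowMs
}

func main() {
	// With a 5000 ms window and now = 12500 ms, alignTs is 10000 ms
	// and the data being emitted covers the window starting at 5000 ms.
	fmt.Println(alignTo(12500, 5000), eventTime(12500, 5000))
}
```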

To Reproduce

Expected behavior

Additional Information

Lossless restart

Describe This Problem

During a reboot or deployment there is a window of about 1 to 2 minutes in which data is lost.
This can lead to a poor user experience, as the data on the page will drop significantly, especially when the user's cluster size is relatively small.

Proposal

Record the position up to which data has been consumed, and resume consumption from that position after restarting.
This is useful for log monitoring.
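
A minimal sketch of such a checkpoint for a single log file, assuming the position is a byte offset persisted as JSON. The `checkpoint` type, the function names, and the file layout are hypothetical, not the agent's actual format.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// checkpoint is a hypothetical record of how far a log file has been consumed.
type checkpoint struct {
	Path   string `json:"path"`
	Offset int64  `json:"offset"`
}

// saveCheckpoint writes cp to a temporary file and renames it into place,
// so a process crash mid-write does not leave a truncated checkpoint behind.
func saveCheckpoint(file string, cp checkpoint) error {
	data, err := json.Marshal(cp)
	if err != nil {
		return err
	}
	tmp := file + ".tmp"
	if err := os.WriteFile(tmp, data, 0o644); err != nil {
		return err
	}
	return os.Rename(tmp, file)
}

// loadCheckpoint returns the saved checkpoint, or a zero checkpoint if none exists.
func loadCheckpoint(file string) (checkpoint, error) {
	var cp checkpoint
	data, err := os.ReadFile(file)
	if os.IsNotExist(err) {
		return cp, nil // first start: consume from the beginning
	}
	if err != nil {
		return cp, err
	}
	err = json.Unmarshal(data, &cp)
	return cp, err
}

func main() {
	file := filepath.Join(os.TempDir(), "holoinsight-checkpoint.json")
	if err := saveCheckpoint(file, checkpoint{Path: "/var/log/app.log", Offset: 4096}); err != nil {
		fmt.Println("save failed:", err)
		return
	}
	cp, err := loadCheckpoint(file)
	fmt.Println(cp, err)
}
```

After a restart, the reader would seek to the loaded offset before it resumes consuming. A real implementation would also have to handle log rotation, which this sketch ignores.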

Additional Context

Add license header and related scripts

Refactor Request

Add license header and related scripts

Description of the proposed solution

Add license header and related scripts

Additional context

Protect the main branch with status checks

Describe This Problem

The main branch is important. We need to make sure that the code on the main branch always compiles and runs.
But currently there is no protection for the main branch.

Proposal

Configure a GitHub 'branch protection rule' for the main branch.

Check items before merging:

  1. The code compiles successfully
  2. All unit tests pass
  3. The code is formatted
  4. All Go files have a license header
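
As an illustration of item 4, a check could look roughly like the sketch below. It assumes the header is written with `//` line comments at the top of the file and contains a fixed marker string; both the marker text and the function name are placeholders, since the actual header is not specified here.

```go
package main

import (
	"fmt"
	"strings"
)

// licenseMarker is a placeholder; the real header text is not given in this issue.
const licenseMarker = "Licensed under"

// hasLicenseHeader reports whether the marker appears in the leading block of
// // comments, before the first line of code.
func hasLicenseHeader(src string) bool {
	for _, line := range strings.Split(src, "\n") {
		trimmed := strings.TrimSpace(line)
		if trimmed == "" {
			continue // blank lines are allowed inside the header block
		}
		if !strings.HasPrefix(trimmed, "//") {
			return false // reached code before finding the marker
		}
		if strings.Contains(trimmed, licenseMarker) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(hasLicenseHeader("// Licensed under X\n\npackage main\n"))
	fmt.Println(hasLicenseHeader("package main\n"))
}
```

A status check would run this over every Go file in the change and fail if any file returns false.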

Additional Context

Add ability to support ref pod label as metric data tag

Describe This Problem

Pod labels carry a lot of metadata. If pod labels could be referenced as metric data tags, we could create more powerful queries with more dimensions.

Proposal

Add the ability to reference pod labels as metric data tags.
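
One possible shape for this, as a sketch only: a mapping from tag name to pod label key, applied when a metric is emitted. The function name `refLabels` and the mapping format are assumptions, not an existing agent configuration.

```go
package main

import "fmt"

// refLabels returns a copy of tags extended with selected pod labels.
// mapping is tagName -> podLabelKey; labels missing on the pod are skipped.
func refLabels(tags, podLabels, mapping map[string]string) map[string]string {
	out := make(map[string]string, len(tags)+len(mapping))
	for k, v := range tags {
		out[k] = v
	}
	for tag, label := range mapping {
		if v, ok := podLabels[label]; ok {
			out[tag] = v
		}
	}
	return out
}

func main() {
	tags := map[string]string{"host": "node-1"}
	podLabels := map[string]string{"app.kubernetes.io/name": "checkout", "tier": "backend"}
	mapping := map[string]string{"app": "app.kubernetes.io/name", "zone": "topology/zone"}
	fmt.Println(refLabels(tags, podLabels, mapping))
}
```

Copying into a new map keeps the original tags untouched, which matters if the same tag set is shared across several metrics.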

Additional Context

Wrong agent metadata after change apikey and redeploy daemonset

Describe this problem

I have deployed a holoinsight-agent daemonset in my k8s cluster.
First, I configured the daemonset with apikey1 of tenant1.
Then I configured it with apikey2 of tenant2 and redeployed it.
In the server-side database, the metadata of these agents in the 'gaea_agent' table still has tenant = 'tenant1'.

To Reproduce

See description above

Expected behavior

I think the metadata should have tenant = 'tenant2'.

Additional Information

Add deploy files for k8s

Describe This Problem

Add deploy files for k8s.
For example, a Deployment YAML or a Helm chart.

Proposal

Add deploy files for k8s.
For example, a Deployment YAML or a Helm chart.

Additional Context
