Comments (10)
cgroups was made for this, getting it to work inside of a container might be more difficult.
Windows has a notion of this with JobObjects. I would bet it's not as feature-complete as cgroups, but it provides enough to be usable.
Mac doesn't seem to support anything like this at the process level. It would be interesting to see whether a hypervisor could be used instead, limiting resources by running the processes in a native VM.
from elastic-agent.
Pinging @elastic/ingest-management (Team:Ingest Management)
from elastic-agent.
Based on previous internal communication, I started looking into leveraging cgroups to create a child cgroup per child process and apply resource limits to that child cgroup. Because the child cgroup is created from the parent cgroup rather than as an independent cgroup, its resource usage still shows up in the stats for the parent cgroup.
A first POC for playing around with cgroups:
package main

import (
	"fmt"
	"os"
	"os/exec"

	"github.com/containerd/cgroups"
	specs "github.com/opencontainers/runtime-spec/specs-go"
)

// POC for working with v1 cgroups,
// demonstrating how a process can be assigned to a cgroup and
// assign subprocesses to sub-cgroups with resource limitations.
//
// Example usage for running on Docker:
//   docker build . -t go-cgroup
//   time docker run -v /sys/fs/cgroup:/sys/fs/cgroup:rw go-cgroup
// The time command shows that the program needs significantly more
// time to finish when the subprocess resources are limited.
func main() {
	// Create a new cgroup and add the current process to it.
	// Quota and period are in microseconds: 9000/10000 allows 90% of one CPU.
	mainPid := os.Getpid()
	q := int64(9000)
	p := uint64(10000)
	resources := &specs.LinuxResources{CPU: &specs.LinuxCPU{Quota: &q, Period: &p}}
	mainCgroup, err := cgroups.New(cgroups.V1, cgroups.StaticPath(fmt.Sprintf("%v", mainPid)), resources)
	if err != nil {
		panic(err)
	}
	if err := mainCgroup.Add(cgroups.Process{Pid: mainPid}); err != nil {
		panic(err)
	}

	// Create the subprocess: a script that generates some CPU load,
	// e.g. a Fibonacci calculation.
	cmd := exec.Cmd{Path: "./fibonacci.sh", Stdout: os.Stdout, Stderr: os.Stdout}
	if err := cmd.Start(); err != nil {
		panic(fmt.Errorf("subprocess not successfully started: %w", err))
	}
	defer cmd.Wait()
	childPid := cmd.Process.Pid

	// Add the subprocess to a new child cgroup.
	// Change the quota-to-period ratio to verify that the CPU quota applied to
	// the cgroup limits the processing of the script:
	// decreasing the quota increases the run time.
	childQuota := int64(1000)
	childPeriod := uint64(10000)
	childResources := &specs.LinuxResources{CPU: &specs.LinuxCPU{Quota: &childQuota, Period: &childPeriod}}
	childCgroup, err := mainCgroup.New("childgroup", childResources)
	if err != nil {
		panic(err)
	}
	if err := childCgroup.Add(cgroups.Process{Pid: childPid}); err != nil {
		panic(err)
	}

	listProcesses("main", mainCgroup)
	listProcesses("child", childCgroup)
	printStats(fmt.Sprint(mainPid), mainCgroup)
	// Report the child's usage from the child cgroup rather than the parent.
	printStats(fmt.Sprint(childPid), childCgroup)
}

// listProcesses prints the PIDs in the given cgroup, including sub-cgroups.
func listProcesses(name string, cg cgroups.Cgroup) {
	processes, err := cg.Processes(cgroups.Pids, true)
	if err != nil {
		panic(err)
	}
	fmt.Printf("Processes in %s cgroup\n", name)
	for _, p := range processes {
		fmt.Printf("Pid: %v\n", p.Pid)
	}
}

// printStats prints the total and user CPU usage recorded for the given cgroup.
func printStats(name string, cg cgroups.Cgroup) {
	stats, err := cg.Stat()
	if err != nil {
		panic(err)
	}
	fmt.Printf("CPU usage for %s: %v: %v\n", name, stats.CPU.Usage.Total, stats.CPU.Usage.User)
}
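
For readers unfamiliar with the quota and period values used in the POC: both are CFS bandwidth settings in microseconds, and their ratio is the share of one CPU the cgroup may use per period. The following is only a small sketch of that arithmetic; the helper name cpuFraction is made up for illustration and is not part of containerd/cgroups.

```go
package main

import "fmt"

// cpuFraction is a hypothetical helper (not a library function). It returns
// the share of one CPU allowed by a CFS quota/period pair, both given in
// microseconds. A non-positive quota is treated as "no limit" and reported
// as -1, mirroring cpu.cfs_quota_us = -1 in cgroup v1.
func cpuFraction(quotaUs int64, periodUs uint64) float64 {
	if quotaUs <= 0 || periodUs == 0 {
		return -1
	}
	return float64(quotaUs) / float64(periodUs)
}

func main() {
	// Values taken from the POC above.
	fmt.Printf("parent: %.2f CPU\n", cpuFraction(9000, 10000)) // prints "parent: 0.90 CPU"
	fmt.Printf("child:  %.2f CPU\n", cpuFraction(1000, 10000)) // prints "child:  0.10 CPU"
}
```

Since the child cgroup is nested under the parent, the child should be bounded by both limits, so in the POC the script is effectively held to roughly a tenth of a CPU.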
from elastic-agent.
cgroups was made for this, getting it to work inside of a container might be more difficult.
There are definitely things we need to figure out inside containers; e.g. @axw just recently had to change the cgroup paths for metrics collection when running inside containers. I did run the script shared above inside a Docker container, though, so we should be able to use it as a starting point.
from elastic-agent.
Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)
from elastic-agent.
@ruflin should be added in architecture V2 also, right?
from elastic-agent.
Happy to label this with v2 architecture for tracking. My current take is that we should likely handle this via deployment rather than making elastic-agent responsible for managing resources. It's an ongoing conversation.
from elastic-agent.
Label added.
from elastic-agent.
cc @ph for the V2 Architecture.
from elastic-agent.
On Linux I set the cgroup limits via systemd on the parent process (the Agent). All child processes inherit this limit, e.g.:
systemctl set-property elastic-agent.service MemoryLimit=50G
systemctl status elastic-agent.service
● elastic-agent.service - Elastic Agent is a unified agent to observe, monitor and protect your system.
Loaded: loaded (/etc/systemd/system/elastic-agent.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system.control/elastic-agent.service.d
└─50-MemoryLimit.conf
Active: active (running) since Tue 2022-05-24 21:00:20 CEST; 18h ago
Main PID: 2756710 (elastic-agent)
Tasks: 587 (limit: 618877)
Memory: 724.9M (limit: 50.0G)
CGroup: /system.slice/elastic-agent.service
├─2756710 /opt/Elastic/Agent/elastic-agent
├─2842320 /opt/Elastic/Agent/data/elastic-agent-b9a28a/install/filebeat-8.2.0-linux-x86_64/filebeat -E setup.ilm.enabled=false -E setup.template.enabled=false -E management.enabled=true -E logging.level=debug -E gc_percent=${FILEBEAT_GOG>
├─2842351 /opt/Elastic/Agent/data/elastic-agent-b9a28a/install/metricbeat-8.2.0-linux-x86_64/metricbeat -E setup.ilm.enabled=false -E setup.template.enabled=false -E management.enabled=true -E logging.level=debug -E gc_percent=${METRICBE>
├─2842651 /opt/Elastic/Agent/data/elastic-agent-b9a28a/install/osquerybeat-8.2.0-linux-x86_64/osquerybeat -E setup.ilm.enabled=false -E setup.template.enabled=false -E management.enabled=true -E logging.level=debug -E logging.level=error>
├─2842677 /opt/Elastic/Agent/data/elastic-agent-b9a28a/install/filebeat-8.2.0-linux-x86_64/filebeat -E setup.ilm.enabled=false -E setup.template.enabled=false -E management.enabled=true -E logging.level=debug -E gc_percent=${FILEBEAT_GOG>
├─2842706 /opt/Elastic/Agent/data/elastic-agent-b9a28a/install/metricbeat-8.2.0-linux-x86_64/metricbeat -E setup.ilm.enabled=false -E setup.template.enabled=false -E management.enabled=true -E logging.level=debug -E gc_percent=${METRICBE>
├─2842840 /opt/Elastic/Agent/data/elastic-agent-b9a28a/install/osquerybeat-8.2.0-linux-x86_64/osqueryd --flagfile=osquery/osquery.flags --pack_delimiter=_ --extensions_socket=/var/run/120651786/osquery.sock --database_path=osquery/osquer>
└─2842901 /opt/Elastic/Agent/data/elastic-agent-b9a28a/install/osquerybeat-8.2.0-linux-x86_64/osquery-extension.ext --socket /var/run/120651786/osquery.sock --timeout 10 --interval 3
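
One way to double-check from code that a child really sits in the unit's cgroup is to look at /proc/<pid>/cgroup, where each line has the form hierarchy-ID:controller-list:cgroup-path. The sketch below only parses such lines; the type and function names are hypothetical, and the sample lines are illustrative rather than captured from the host above.

```go
package main

import (
	"fmt"
	"strings"
)

// cgroupEntry (hypothetical type) holds one parsed line of /proc/<pid>/cgroup.
type cgroupEntry struct {
	HierarchyID string
	Controllers []string // empty for the cgroup v2 unified hierarchy
	Path        string
}

// parseCgroupLine (hypothetical helper) splits "hierarchy-ID:controllers:path".
// SplitN with a limit of 3 keeps any further colons inside the path.
func parseCgroupLine(line string) (cgroupEntry, error) {
	parts := strings.SplitN(strings.TrimSpace(line), ":", 3)
	if len(parts) != 3 {
		return cgroupEntry{}, fmt.Errorf("malformed cgroup line: %q", line)
	}
	var controllers []string
	if parts[1] != "" {
		controllers = strings.Split(parts[1], ",")
	}
	return cgroupEntry{HierarchyID: parts[0], Controllers: controllers, Path: parts[2]}, nil
}

func main() {
	// Illustrative sample lines; on a real host, read /proc/<pid>/cgroup instead.
	samples := []string{
		"0::/system.slice/elastic-agent.service",            // cgroup v2 style
		"5:cpu,cpuacct:/system.slice/elastic-agent.service", // cgroup v1 style
	}
	for _, s := range samples {
		entry, err := parseCgroupLine(s)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("controllers=%v path=%s\n", entry.Controllers, entry.Path)
	}
}
```

If every Beat started by the Agent reports the same path as the Agent itself, the limit set on the service applies to them as well.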
from elastic-agent.