Comments (10)

blakerouse commented on July 22, 2024

cgroups was made for this, getting it to work inside of a container might be more difficult.

Windows has a notion of this with JobObjects. I would bet it's not as feature-complete as cgroups, but it should provide enough to be usable.

Mac doesn't seem to support anything like this at the process level. It would be interesting to see whether the hypervisor could be used instead, limiting resources by running the processes in a native VM.

from elastic-agent.

elasticmachine commented on July 22, 2024

Pinging @elastic/ingest-management (Team:Ingest Management)

simitt commented on July 22, 2024

Based on previous internal communication, I started looking into leveraging cgroups by creating a child cgroup per child process and applying resource limits to that child cgroup. Because the child cgroup is created under the parent cgroup rather than as an independent one, its resource usage still shows up in the stats for the parent cgroup.

A first POC for experimenting with cgroups:

package main

import (
	"fmt"
	"os"
	"os/exec"

	"github.com/containerd/cgroups"
	specs "github.com/opencontainers/runtime-spec/specs-go"
)

// POC for working with v1 cgroups
// demonstrating how a process can be assigned to a cgroup and
// assign subprocesses to subcgroups with resource limitations.
//
// Example usage for running on docker:
// docker build . -t go-cgroup
// time docker run -v /sys/fs/cgroup:/sys/fs/cgroup:rw go-cgroup
// The time command shows that the program needs significantly more
// time to finish when the subprocess resources are limited
func main() {
	// create a new cgroup and add the current process to it
	mainPid := os.Getpid()
	q := int64(9000)
	p := uint64(10000)
	resources := &specs.LinuxResources{CPU: &specs.LinuxCPU{Quota: &q, Period: &p}}
	mainCgroup, err := cgroups.New(cgroups.V1, cgroups.StaticPath(fmt.Sprintf("%v", mainPid)), resources)
	if err != nil {
		panic(err)
	}
	if err := mainCgroup.Add(cgroups.Process{Pid: mainPid}); err != nil {
		panic(err)
	}

	// create subprocess
	// run a script that creates some CPU load, e.g. fibonacci calculation
	cmd := exec.Cmd{Path: "./fibonacci.sh", Stdout: os.Stdout, Stderr: os.Stdout}
	// fail early if the script cannot be started instead of ignoring the error
	if err := cmd.Start(); err != nil {
		panic(err)
	}
	defer cmd.Wait()
	childPid := cmd.Process.Pid

	// add subprocess to a new child cgroup
	// change the quota to period ratio for verifying that the CPU quotas applied to
	// the cgroup are limiting the processing of the script.
	// decreasing the quota -> increases run time
	q = int64(1000)
	p = uint64(10000)
	resources = &specs.LinuxResources{CPU: &specs.LinuxCPU{Quota: &q, Period: &p}}
	childCgroup, err := mainCgroup.New("childgroup", resources)
	if err != nil {
		panic(err)
	}
	if err := childCgroup.Add(cgroups.Process{Pid: childPid}); err != nil {
		panic(err)
	}
	listProcesses("main", mainCgroup)
	listProcesses("child", childCgroup)
	printStats(fmt.Sprint(mainPid), mainCgroup)
	// report the child cgroup here; passing mainCgroup again would repeat the line above
	printStats(fmt.Sprint(childPid), childCgroup)
}

func listProcesses(name string, cg cgroups.Cgroup) {
	processes, err := cg.Processes(cgroups.Pids, true)
	if err != nil {
		panic(err)
	}
	fmt.Printf("Processes in %s cgroup\n", name)
	for _, p := range processes {
		fmt.Printf("Pid: %v\n", p.Pid)
	}
}

func printStats(name string, cg cgroups.Cgroup) {
	stats, err := cg.Stat()
	if err != nil {
		panic(err)
	}
	fmt.Printf("CPU usage for %s: %v: %v\n", name, stats.CPU.Usage.Total, stats.CPU.Usage.User)
}
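
To make the quota-to-period ratio in the POC concrete, here is a minimal sketch of the arithmetic, assuming the usual CFS semantics where quota/period is the fraction of one CPU core a cgroup may use per period. The helper name `cpuFraction` and the -1 return value for "no limit" are conventions of this sketch only, not part of containerd/cgroups.

```go
package main

import "fmt"

// cpuFraction returns the share of one CPU core that a CFS quota/period
// pair allows (both in microseconds). A non-positive quota or a zero period
// is reported as -1, which this sketch uses to mean "no limit".
func cpuFraction(quotaUs int64, periodUs uint64) float64 {
	if quotaUs <= 0 || periodUs == 0 {
		return -1
	}
	return float64(quotaUs) / float64(periodUs)
}

func main() {
	// Values taken from the POC: 9000/10000 for the main cgroup,
	// 1000/10000 for the child cgroup.
	fmt.Printf("main cgroup:  %.0f%% of one core\n", cpuFraction(9000, 10000)*100)
	fmt.Printf("child cgroup: %.0f%% of one core\n", cpuFraction(1000, 10000)*100)
}
```

Since the child cgroup is nested under the main one, the child's limit applies within the parent's budget, so lowering the child quota is what stretches the script's run time.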

simitt commented on July 22, 2024

cgroups was made for this, getting it to work inside of a container might be more difficult.

There are definitely things we need to figure out inside containers, e.g. @axw just recently had to change the cgroup paths for metrics collection when running inside containers. I did run the script shared above inside a Docker container though, so we should be able to use it as a starting point.
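
For the path question, one possible starting point is to read the process's own cgroup membership from /proc/<pid>/cgroup, where each line has the form hierarchy-ID:controller-list:cgroup-path. The sketch below only parses that format; the type and function names are hypothetical, and the sample content in main is made up for illustration rather than captured from a real container.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// cgroupEntry is one line of /proc/<pid>/cgroup:
// "hierarchy-ID:controller-list:cgroup-path".
type cgroupEntry struct {
	HierarchyID string
	Controllers []string // empty for the cgroup v2 (unified) entry
	Path        string
}

// parseProcCgroup parses the text content of a /proc/<pid>/cgroup file.
func parseProcCgroup(content string) ([]cgroupEntry, error) {
	var entries []cgroupEntry
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		// Split into at most 3 parts so colons inside the path survive.
		parts := strings.SplitN(line, ":", 3)
		if len(parts) != 3 {
			return nil, fmt.Errorf("malformed cgroup line: %q", line)
		}
		var controllers []string
		if parts[1] != "" {
			controllers = strings.Split(parts[1], ",")
		}
		entries = append(entries, cgroupEntry{
			HierarchyID: parts[0],
			Controllers: controllers,
			Path:        parts[2],
		})
	}
	return entries, sc.Err()
}

func main() {
	// Illustrative sample only; real content depends on the host and runtime.
	sample := "4:cpu,cpuacct:/system.slice/elastic-agent.service\n" +
		"0::/system.slice/elastic-agent.service\n"
	entries, err := parseProcCgroup(sample)
	if err != nil {
		panic(err)
	}
	for _, e := range entries {
		fmt.Printf("hierarchy=%s controllers=%v path=%s\n", e.HierarchyID, e.Controllers, e.Path)
	}
}
```

Inside a container the reported path is often not the path visible under /sys/fs/cgroup, which is the kind of mismatch that would need handling on top of this.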

elasticmachine commented on July 22, 2024

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

jlind23 commented on July 22, 2024

@ruflin this should also be added to the V2 architecture, right?

ruflin commented on July 22, 2024

Happy to label this with v2 architecture for tracking. My current take is that we should likely handle this via deployment rather than making elastic-agent responsible for managing resources. It's an ongoing conversation.

jlind23 commented on July 22, 2024

Label added.

jlind23 commented on July 22, 2024

cc @ph for the V2 Architecture.

zez3 commented on July 22, 2024

On Linux I set the cgroup limits via systemd on the parent process (the Agent). All child processes inherit this limit, e.g.:

systemctl set-property elastic-agent.service MemoryLimit=50G

systemctl status elastic-agent.service
● elastic-agent.service - Elastic Agent is a unified agent to observe, monitor and protect your system.
     Loaded: loaded (/etc/systemd/system/elastic-agent.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system.control/elastic-agent.service.d
             └─50-MemoryLimit.conf
     Active: active (running) since Tue 2022-05-24 21:00:20 CEST; 18h ago
   Main PID: 2756710 (elastic-agent)
      Tasks: 587 (limit: 618877)
     Memory: 724.9M (limit: 50.0G)
     CGroup: /system.slice/elastic-agent.service
             ├─2756710 /opt/Elastic/Agent/elastic-agent
             ├─2842320 /opt/Elastic/Agent/data/elastic-agent-b9a28a/install/filebeat-8.2.0-linux-x86_64/filebeat -E setup.ilm.enabled=false -E setup.template.enabled=false -E management.enabled=true -E logging.level=debug -E gc_percent=${FILEBEAT_GOG>
             ├─2842351 /opt/Elastic/Agent/data/elastic-agent-b9a28a/install/metricbeat-8.2.0-linux-x86_64/metricbeat -E setup.ilm.enabled=false -E setup.template.enabled=false -E management.enabled=true -E logging.level=debug -E gc_percent=${METRICBE>
             ├─2842651 /opt/Elastic/Agent/data/elastic-agent-b9a28a/install/osquerybeat-8.2.0-linux-x86_64/osquerybeat -E setup.ilm.enabled=false -E setup.template.enabled=false -E management.enabled=true -E logging.level=debug -E logging.level=error>
             ├─2842677 /opt/Elastic/Agent/data/elastic-agent-b9a28a/install/filebeat-8.2.0-linux-x86_64/filebeat -E setup.ilm.enabled=false -E setup.template.enabled=false -E management.enabled=true -E logging.level=debug -E gc_percent=${FILEBEAT_GOG>
             ├─2842706 /opt/Elastic/Agent/data/elastic-agent-b9a28a/install/metricbeat-8.2.0-linux-x86_64/metricbeat -E setup.ilm.enabled=false -E setup.template.enabled=false -E management.enabled=true -E logging.level=debug -E gc_percent=${METRICBE>
             ├─2842840 /opt/Elastic/Agent/data/elastic-agent-b9a28a/install/osquerybeat-8.2.0-linux-x86_64/osqueryd --flagfile=osquery/osquery.flags --pack_delimiter=_ --extensions_socket=/var/run/120651786/osquery.sock --database_path=osquery/osquer>
             └─2842901 /opt/Elastic/Agent/data/elastic-agent-b9a28a/install/osquerybeat-8.2.0-linux-x86_64/osquery-extension.ext --socket /var/run/120651786/osquery.sock --timeout 10 --interval 3
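
If such a limit were to be validated or computed programmatically, the size suffix would need converting to bytes. The following is a rough sketch, assuming the K/M/G/T suffixes are interpreted with base 1024 as systemd documents for memory sizes. The function name `parseSize` is hypothetical, and the sketch deliberately does not handle `infinity`, percentage values, or overflow.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSize converts a size such as "50G" into bytes, treating the
// K, M, G and T suffixes as powers of 1024. A value without a suffix
// is taken as a plain byte count.
func parseSize(s string) (uint64, error) {
	s = strings.TrimSpace(s)
	if s == "" {
		return 0, fmt.Errorf("empty size")
	}
	mult := uint64(1)
	switch s[len(s)-1] {
	case 'K':
		mult = 1 << 10
	case 'M':
		mult = 1 << 20
	case 'G':
		mult = 1 << 30
	case 'T':
		mult = 1 << 40
	}
	if mult != 1 {
		s = s[:len(s)-1] // strip the suffix before parsing the number
	}
	n, err := strconv.ParseUint(s, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid size %q: %w", s, err)
	}
	return n * mult, nil
}

func main() {
	bytes, err := parseSize("50G")
	if err != nil {
		panic(err)
	}
	fmt.Println(bytes) // 53687091200
}
```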
