google / cadvisor

Analyzes resource usage and performance characteristics of running containers.

License: Other

Go 94.22% Shell 1.95% Makefile 0.20% Python 0.37% JavaScript 2.38% HTML 0.66% Dockerfile 0.23%

cadvisor's Introduction

cAdvisor (Container Advisor) provides container users an understanding of the resource usage and performance characteristics of their running containers. It is a running daemon that collects, aggregates, processes, and exports information about running containers. Specifically, for each container it keeps resource isolation parameters, historical resource usage, histograms of complete historical resource usage and network statistics. This data is exported by container and machine-wide.

cAdvisor has native support for Docker containers and should support just about any other container type out of the box. We strive for support across the board, so feel free to open an issue if that is not the case. cAdvisor's container abstraction is based on lmctfy's, so containers are inherently nested hierarchically.

Quick Start: Running cAdvisor in a Docker Container

To quickly try out cAdvisor on your machine with Docker, we have a Docker image that includes everything you need to get started. You can run a single cAdvisor to monitor the whole machine. Simply run:

VERSION=v0.49.1 # use the latest release version from https://github.com/google/cadvisor/releases
sudo docker run \
  --volume=/:/rootfs:ro \
  --volume=/var/run:/var/run:ro \
  --volume=/sys:/sys:ro \
  --volume=/var/lib/docker/:/var/lib/docker:ro \
  --volume=/dev/disk/:/dev/disk:ro \
  --publish=8080:8080 \
  --detach=true \
  --name=cadvisor \
  --privileged \
  --device=/dev/kmsg \
  gcr.io/cadvisor/cadvisor:$VERSION

cAdvisor is now running (in the background) on http://localhost:8080. The setup includes directories with Docker state cAdvisor needs to observe.

Note: If you're running on CentOS, Fedora, or RHEL (or are using LXC), take a look at our running instructions.

We have detailed instructions on running cAdvisor standalone outside of Docker. cAdvisor running options may also be interesting for advanced use cases. If you want to build your own cAdvisor Docker image, see our deployment page.

For Kubernetes users, cAdvisor can be run as a DaemonSet. See the instructions for how to get started, and for how to kustomize it to fit your needs.

Building and Testing

See the more detailed instructions in the build page. This includes instructions for building and deploying the cAdvisor Docker image.

Exporting stats

cAdvisor supports exporting stats to various storage plugins. See the documentation for more details and examples.

Web UI

cAdvisor exposes a web UI at its port:

http://<hostname>:<port>/

See the documentation for more details.

Remote REST API & Clients

cAdvisor exposes its raw and processed stats via a versioned remote REST API. See the API's documentation for more information.

There is also an official Go client implementation in the client directory. See the documentation for more information.

Roadmap

cAdvisor aims to improve the resource usage and performance characteristics of running containers. Today, we gather and expose this information to users. In our roadmap:

Advise on the performance of a container (e.g.: when it is being negatively affected by another, when it is not receiving the resources it requires, etc).
Auto-tune the performance of the container based on previous advice.
Provide usage prediction to cluster schedulers and orchestration layers.

Community

Contributions, questions, and comments are all welcomed and encouraged! cAdvisor developers hang out on Slack in the #sig-node channel (get an invitation here). We also have discuss.kubernetes.io.

Please reach out and get involved in the project, we're actively looking for more contributors to bring on board!

Core Team

Frequent Collaborators

@haircommander, RedHat

Emeritus


cadvisor's Issues

cAdvisor using 70% cpu

I'll be happy to investigate this tomorrow. My container running cAdvisor (the google/cadvisor image from the index) is using 70% CPU. It is not affected by closing the web interface.

Here's a sample of the log:

2014/06/13 11:08:14 Get(/)
2014/06/13 11:08:14 Request took 17.088113ms
2014/06/13 11:08:15 Api - Container(/)
2014/06/13 11:08:15 Get(/)
2014/06/13 11:08:15 Request took 17.685755ms
2014/06/13 11:08:16 Api - Container(/)
2014/06/13 11:08:16 Get(/)
2014/06/13 11:08:16 Request took 21.602546ms
2014/06/13 11:08:17 Api - Container(/)
2014/06/13 11:08:17 Get(/)
2014/06/13 11:08:17 Request took 19.849205ms
2014/06/13 11:08:18 Api - Container(/)
2014/06/13 11:08:18 Get(/)
2014/06/13 11:08:18 Request took 24.266196ms
2014/06/13 11:08:19 Api - Container(/)
2014/06/13 11:08:19 Get(/)
2014/06/13 11:08:19 Request took 20.246778ms
2014/06/13 11:08:20 Api - Container(/)
2014/06/13 11:08:20 Get(/)
2014/06/13 11:08:20 Request took 34.460933ms
2014/06/13 11:08:21 Api - Container(/)
2014/06/13 11:08:21 Get(/)
2014/06/14 19:54:38 Housekeeping(/docker/f603ef03a9b80db8497e994f68e2aac773640042e91dc0989d2b1d1130a05707) took 133.555829ms
2014/06/14 19:55:20 Housekeeping(/docker/f603ef03a9b80db8497e994f68e2aac773640042e91dc0989d2b1d1130a05707) took 272.423621ms
2014/06/14 19:57:34 Housekeeping(/docker/f603ef03a9b80db8497e994f68e2aac773640042e91dc0989d2b1d1130a05707) took 247.436739ms
2014/06/14 19:57:35 Housekeeping(/docker/0da4c6826c07ad89a02b7c8e55267c9ce30ab2d3efddc66fa4b36a7a96352289) took 272.757455ms

Tomorrow I'll send it SIGQUIT (Ctrl+\) and check out the goroutines. Happy to fix this; I just wanted to get it recorded before I start.
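Ctrl+\ sends SIGQUIT, and the Go runtime's default response is to print every goroutine's stack and exit. To capture the same information from a live process without killing it, `runtime.Stack` with `all` set to true can be used. This is a generic sketch, not code from cAdvisor; the background goroutine only stands in for a housekeeping loop.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// allGoroutineStacks returns the stack traces of every goroutine, the same
// information the Go runtime prints on SIGQUIT, without stopping the process.
func allGoroutineStacks() string {
	buf := make([]byte, 1<<16)
	for {
		n := runtime.Stack(buf, true)
		if n < len(buf) {
			return string(buf[:n])
		}
		// The dump may have been truncated; grow the buffer and retry.
		buf = make([]byte, 2*len(buf))
	}
}

func main() {
	// Background goroutine standing in for a housekeeping loop.
	go func() {
		for {
			time.Sleep(time.Millisecond)
		}
	}()
	time.Sleep(10 * time.Millisecond)
	fmt.Println(allGoroutineStacks())
}
```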

Dashboard view

Hi
I have a usability request: when graph information is displayed, the user has to scroll down to see all of it, which makes it very hard to compare the graphs and find relations between them when looking for a root cause.

I suggest shrinking the graphs so they can be seen and compared on the same page and over the same time frame, as dashboards in other monitoring tools (e.g. Zabbix, Nagios) do.

Thanks

Handle non-unified hierarchy starting point

On some systems (e.g. the latest Ubuntu), when running cAdvisor at the root, the cgroups are non-unified:

cpu,cpuacct: /user/blah
cpuset: /
blkio: /user/blah

This breaks us when we try to read the stats relative to ourselves. Although the world is going unified, we should not break in these cases and at least gracefully degrade.
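One way to degrade gracefully is to stop assuming a single path and instead resolve each subsystem's path separately from `/proc/self/cgroup`, whose lines have the form `hierarchy-ID:controller-list:cgroup-path`. A minimal sketch, fed with the layout from this issue (the hierarchy IDs in the sample are made up):

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseCgroupPaths maps each controller to its cgroup path. Under a non-unified
// hierarchy different controllers may live at different paths, so callers
// should look up the path per controller rather than share one.
// On cgroup v2 the unified entry ("0::/path") is stored under the empty key.
func parseCgroupPaths(contents string) (map[string]string, error) {
	paths := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(contents))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		parts := strings.SplitN(line, ":", 3)
		if len(parts) != 3 {
			return nil, fmt.Errorf("malformed cgroup line %q", line)
		}
		for _, ctrl := range strings.Split(parts[1], ",") {
			paths[ctrl] = parts[2]
		}
	}
	return paths, sc.Err()
}

func main() {
	// Sample shaped like the issue above; real input would come from
	// os.ReadFile("/proc/self/cgroup").
	sample := "4:cpu,cpuacct:/user/blah\n3:cpuset:/\n2:blkio:/user/blah\n"
	paths, err := parseCgroupPaths(sample)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(paths["cpu"], paths["cpuset"], paths["blkio"])
}
```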

Implement and use a raw driver

We should have a raw cgroup driver which just shows stats. That would add support for LXC as well as raw systemd systems.

cadvisor in a CentOS 6 host

I'm trying to use cadvisor in a CentOS 6 host with kernel 2.6.32-431.17.1.el6.x86_64 and docker-io 0.11.1-4.el6.x86_64. SELinux is disabled.

When I run:

docker run \
  --volume=/var/run:/var/run:rw \
  --volume=/sys/fs/cgroup/:/sys/fs/cgroup:ro \
  --volume=/var/lib/docker/:/var/lib/docker:ro \
  --publish=8080:8080 \
  --detach=true \
  google/cadvisor

it fails because /sys/fs/cgroup/ does not exist under CentOS 6. There is /cgroup/ instead.

So I'm trying with:

docker run \
  --volume=/var/run:/var/run:rw \
  --volume=/cgroup/:/sys/fs/cgroup:ro \
  --volume=/var/lib/docker/:/var/lib/docker:ro \
  --publish=8080:8080 \
  --detach=true \
  google/cadvisor

But now I get this:

lxc-start: No such file or directory - failed to mount '/cgroup' on '/usr/lib64/lxc/rootfs///sys/fs/cgroup'
lxc-start: failed to setup the mount entries for 'dfae50b907d3f916a5133a1e68d56623e2de709e43385acb9d665d307c3843f4'
lxc-start: failed to setup the container
lxc-start: invalid sequence number 1. expected 2
lxc-start: failed to spawn 'dfae50b907d3f916a5133a1e68d56623e2de709e43385acb9d665d307c3843f4'

Any idea?

instant cpu usage is not pushed to influxdb

Graphs should have legends

They used to, but some refactoring before launch made them go away.

Monitor process inside container

Hi
It would be great to be able to see the same information for a specific process inside a container, not only the container info.

Thanks

cAdvisor doesn't work with older versions of Docker

Libcontainer changed its state file format in newer versions of Docker, and we only support the new format. We need to fall back gracefully with older versions.

Have Dockerfiles build from source

Memory usage breakdown should describe hot/cold

Today the user has no idea why the graph is the color that it is.

Page fault and memory usage suggestions

A couple of suggestions:

The memory graph includes the kernel cache, which is not actually in use from the point of view of monitoring or applications. It may be helpful to separate the cached and total memory usage graphs. Perhaps three overlaid plots: one for cache size, one for total, and one for (total - cache) to show the amount actually used by applications.
The Page Faults graph occasionally shows negative values. Judging from the memory usage graph, this seems to correspond with kernel cache flushes. It would be more helpful to show two overlaid lines, one for actual cache misses and one for cache flushes. That way each can always be positive and the ambiguity is eliminated.
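The first suggestion reduces to simple arithmetic per sample. A sketch, assuming total usage and cache size are already available in bytes; the `memorySeries` type and `splitMemory` function are hypothetical names for illustration.

```go
package main

import "fmt"

// memorySeries holds the three overlaid values suggested above for one sample.
type memorySeries struct {
	Cache, Total, Application uint64
}

// splitMemory derives the application-used amount as total - cache, clamping
// at zero because the two counters may be read at slightly different instants.
func splitMemory(total, cache uint64) memorySeries {
	app := uint64(0)
	if total > cache {
		app = total - cache
	}
	return memorySeries{Cache: cache, Total: total, Application: app}
}

func main() {
	s := splitMemory(800<<20, 300<<20)
	fmt.Printf("cache=%dMiB total=%dMiB app=%dMiB\n", s.Cache>>20, s.Total>>20, s.Application>>20)
}
```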

Provide a plugin to store cadvisor stats to BigQuery

Allow long housekeeping interval to be configurable

Allow it to be tweaked, or even turned off, for more densely packed machines.

wrong stats error when docker container restarted

Environment is CoreOS (beta channel)

When a systemd.service that starts a docker container is restarted, cadvisor starts spewing errors about wrong stats. For example:

Failed to update stats for container "/system.slice/locksmithd.service": wrong stats: current CPU usage is less than prev CPU usage
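Cumulative counters such as CPU usage only go down when they are reset, which is what a restarted container with a fresh cgroup looks like. A sketch of treating that case as a reset rather than an error; `cpuDelta` is hypothetical and shows one possible policy, not a description of what cAdvisor does.

```go
package main

import "fmt"

// cpuDelta returns the CPU time consumed between two cumulative samples.
// If cur < prev the counter was reset (e.g. the container restarted), so only
// the usage since the reset (cur itself) is known; ok=false lets the caller
// record a discontinuity instead of logging "wrong stats".
func cpuDelta(prev, cur uint64) (delta uint64, ok bool) {
	if cur < prev {
		return cur, false
	}
	return cur - prev, true
}

func main() {
	for _, pair := range [][2]uint64{{1000, 1500}, {1500, 200}} {
		d, ok := cpuDelta(pair[0], pair[1])
		fmt.Printf("prev=%d cur=%d delta=%d continuous=%t\n", pair[0], pair[1], d, ok)
	}
}
```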

Docker driver doesn't register for Docker containers in CoreOS

This means no network stats.

Container name is not presented

Hi
Container names are not shown; the main page shows only the container ID:
root docker 3ba08c450497df5facf35139a02cd1fd3038a3d424d6972531098ed531c9f644

Thanks.

Only track some containers

As @crosbymichael mentioned on IRC. This would also give us a way to narrow down what we track once we start doing heavier things.

Unit tests for storage driver implementations

As we have added the storage driver interface, there may be several storage driver implementations. We could provide a set of unit tests that every driver implementation must pass. This could help us reduce redundant code and provide a guideline for contributors.

We could use storage/memory/memory_test.go as a starting point.
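The idea of one shared suite can be sketched as a function that takes a constructor and runs the same checks against whatever it returns. The `statsStore` interface below is hypothetical and much smaller than cAdvisor's real storage interface; in a real repository the suite would be a test helper taking `*testing.T`, but here it returns an error so the sketch runs standalone.

```go
package main

import (
	"errors"
	"fmt"
)

// statsStore is a hypothetical, simplified driver interface (not cAdvisor's
// real storage interface) used to illustrate one shared conformance suite.
type statsStore interface {
	AddSample(container string, value uint64) error
	Recent(container string, n int) ([]uint64, error)
}

// memoryStore is an in-memory implementation, similar in spirit to storage/memory.
type memoryStore struct{ data map[string][]uint64 }

func newMemoryStore() statsStore { return &memoryStore{data: map[string][]uint64{}} }

func (m *memoryStore) AddSample(c string, v uint64) error {
	m.data[c] = append(m.data[c], v)
	return nil
}

// Recent returns up to n most recent samples, oldest first.
func (m *memoryStore) Recent(c string, n int) ([]uint64, error) {
	s := m.data[c]
	if n < len(s) {
		s = s[len(s)-n:]
	}
	return append([]uint64(nil), s...), nil
}

// conformance runs the same checks against a fresh store from any constructor.
func conformance(newStore func() statsStore) error {
	s := newStore()
	for _, v := range []uint64{1, 2, 3} {
		if err := s.AddSample("/docker/a", v); err != nil {
			return err
		}
	}
	got, err := s.Recent("/docker/a", 2)
	if err != nil {
		return err
	}
	if len(got) != 2 || got[0] != 2 || got[1] != 3 {
		return fmt.Errorf("Recent(2) = %v, want [2 3]", got)
	}
	if other, _ := s.Recent("/docker/b", 5); len(other) != 0 {
		return errors.New("samples leaked across containers")
	}
	return nil
}

func main() {
	fmt.Println("memoryStore:", conformance(newMemoryStore))
}
```

Each new driver would then only need one line that passes its own constructor to `conformance`.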

Add Graph of Working Set to UI

Today we just show usage, we should also have working set to show hot usage.

Move Dockerfiles

Right now quickstart/ holds the Dockerfiles; moving them would allow for different versions, such as Docker-only or LMCTFY builds. This would also allow hub.docker.com to auto-build them, because the different versions can be tags.

Get a CI for cAdvisor Going

Just tests and building to start should be good :)

Export docker container name to influxdb

Switch cpu reporting to be proportional shares instead of cores

Most people don't have a mix of containers and processes at the same level. This makes reporting cpu usage in terms of cores hard.

One alternative is to report proportional share or percentage relative to parent. This works really well for /docker as we know exactly how the resources are being distributed.
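Reporting relative to the parent is a ratio of usage deltas over the same interval. A sketch with made-up numbers, assuming per-interval CPU deltas for the parent (e.g. /docker) and for each child are already known; `shareOfParent` is a hypothetical helper.

```go
package main

import "fmt"

// shareOfParent expresses each child's CPU usage delta as a percentage of the
// parent's delta over the same interval. An idle parent yields 0% for everyone.
func shareOfParent(parentDelta uint64, childDeltas map[string]uint64) map[string]float64 {
	out := make(map[string]float64, len(childDeltas))
	for name, d := range childDeltas {
		if parentDelta == 0 {
			out[name] = 0
			continue
		}
		out[name] = 100 * float64(d) / float64(parentDelta)
	}
	return out
}

func main() {
	shares := shareOfParent(400, map[string]uint64{"/docker/a": 300, "/docker/b": 100})
	fmt.Println(shares) // fmt prints map keys in sorted order
}
```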

Copy over cAdvisor dependencies into /vendor

Makes go build more reliable and consistent.

When building Dockerfile : Unable to locate package libprotobuf8

When I copy the Dockerfile and build it:
docker build -t cadvisor .
E: Unable to locate package libprotobuf8

It works when I use docker pull google/cadvisor.

Docker 0.9.1, Ubuntu 14.04

Switch to glog

@vishh suggested this

healthcheck endpoint

Right now when I deploy cAdvisor, I just use the /containers/ endpoint to verify that the container is up and running in a good state. However, the page takes a few seconds to load, which makes deploys across a cluster of Docker hosts a bit slow. It would be nice if you guys had an endpoint like /healthcheck or /status that returned a 200 for good and a 5xx (e.g. 500) for bad, without the overhead of loading all of the graphs.

Request Less Data in the UI

As @vishh pointed out, we only need about 1s of data on each request. Today we request all data on each request and 90%+ of it is data we already had.

cadvisor for OpenVZ?

I use OpenVZ, and I was wondering if there would be any opposition to me adding OpenVZ tracking to cAdvisor. I like the tool, and love Docker / LXC, but I have an existing OpenVZ farm and want to be able to use newer tools with the older tech :)

Make cAdvisor work on systemd systems

This will fix some of the issues involved in running on systemd machines. Hopefully, this will make it work on all systemd machines.

I'll look through particular failures on CoreOS systems and add them here.

root

Profile housekeeping

We need to understand where the CPU time goes in housekeeping so we can minimize it; we may be doing some stupid things that we can easily fix.

Add a built-in image generator.

Could you please add a built-in image generator for CPU, memory, network, bandwidth, etc.? That would make it convenient for system admins to monitor their containers.

Ignore containers without any running tasks

This will lighten our load. I don't think there are any cases where the data would change if there are no tasks.
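A cgroup has no running tasks when its `cgroup.procs` file lists no PIDs, so a housekeeping loop could skip such containers. A sketch; the cgroup directory passed in is an assumption about the host layout, and the skipping logic itself is left to the caller.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// hasTasks reports whether the contents of a cgroup.procs file list any PID.
func hasTasks(procs string) bool {
	return len(strings.Fields(procs)) > 0
}

// cgroupHasTasks reads <dir>/cgroup.procs for a caller-supplied cgroup directory.
func cgroupHasTasks(dir string) (bool, error) {
	b, err := os.ReadFile(filepath.Join(dir, "cgroup.procs"))
	if err != nil {
		return false, err
	}
	return hasTasks(string(b)), nil
}

func main() {
	// The directory is only an example; its location depends on the host.
	ok, err := cgroupHasTasks("/sys/fs/cgroup")
	fmt.Println(ok, err)
}
```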

Provide information about the processes running inside the container

A listing of processes with their argv may be enough for now. I don't think we're gonna want to split stats by process since we can't do the same accounting we can with containers. Our recommendation may be subcontainers.

Influxdb client lib import is wrong

Influxdb client now lives here: https://github.com/influxdb/influxdb/tree/master/client, which makes https://github.com/google/cadvisor/blob/master/storage/influxdb/influxdb.go#L25 and https://github.com/google/cadvisor/blob/master/storage/influxdb/influxdb_test.go#L24 cause the build to fail. I tried rebuilding with the import changed to github.com/influxdb/influxdb/client but didn't want to fight with other build failures in influxdb [1].

[1] byxorna@0d1c3f1

Include a special driver for root to handle machine information

A special root driver would scan and report all resources available on the machine. This can include machine topology and h/w resources.

Fix limits for root container

Currently the root container reports unlimited memory as its limit and 1 core as its CPU.

This breaks all usage reporting. Fix by making limit same as system resources.
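Filling the root limits from the machine itself could look like the following sketch, which reads total memory from /proc/meminfo and the CPU count from the Go runtime. It is a Linux-only illustration, not cAdvisor's machine-info code.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"runtime"
	"strconv"
	"strings"
)

// memTotalBytes extracts MemTotal from /proc/meminfo-formatted text; the
// kernel reports it in kB, meaning units of 1024 bytes.
func memTotalBytes(meminfo string) (uint64, error) {
	for _, line := range strings.Split(meminfo, "\n") {
		fields := strings.Fields(line)
		if len(fields) >= 2 && fields[0] == "MemTotal:" {
			kb, err := strconv.ParseUint(fields[1], 10, 64)
			if err != nil {
				return 0, err
			}
			return kb * 1024, nil
		}
	}
	return 0, errors.New("MemTotal not found")
}

func main() {
	b, err := os.ReadFile("/proc/meminfo")
	if err != nil {
		fmt.Println("cannot read meminfo:", err)
		return
	}
	mem, err := memTotalBytes(string(b))
	if err != nil {
		fmt.Println(err)
		return
	}
	// runtime.NumCPU counts CPUs usable by this process, which can be fewer
	// than the machine total if CPU affinity is restricted.
	fmt.Printf("root limits: memory=%d bytes cpus=%d\n", mem, runtime.NumCPU())
}
```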

Housekeep less often if the stats have not changed

This is to try to detect idle containers and housekeep less often to use less CPU. Imagine that we see a container has not changed stats in some amount of time, we start housekeeping every 2s instead of 1s.

Batch writes to InfluxDB by Docker host

I think we could see performance gains in ingestion to InfluxDB if the writes were batched up by docker host. Best I can tell, current implementation is to call WriteSeries once per container on the host, which gets very chatty very fast. Batching all containers on the host together before calling WriteSeries should give a nice performance gain in InfluxDB. My Go chops are insufficient to know how much work that represents though. :-)

For reference, our (admittedly limited) testing with Influx has shown that a 2 CPU / 2 GB VM on shared storage can ingest 12000+ metrics per second when they are sent in batches of 500, with no noticeable issues. Spinning up cadvisor on 9 docker hosts (not many containers each) and pointing them at an InfluxDB VM with twice the resources resulted in a significant load increase and high IO wait times, and pretty much broke our ability to query.
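Grouping writes per host is mostly a chunking problem. A sketch under stated assumptions: `point` is a hypothetical row type, the actual InfluxDB write call is omitted, and the batch size of 500 is borrowed from the testing described above rather than from any tuning.

```go
package main

import "fmt"

// point is a hypothetical stand-in for one metric row bound for InfluxDB.
type point struct {
	Container string
	Value     uint64
}

// batches splits points into chunks of at most size, so a host can issue one
// write per chunk instead of one write per container.
func batches(points []point, size int) [][]point {
	var out [][]point
	for size > 0 && len(points) > 0 {
		n := size
		if n > len(points) {
			n = len(points)
		}
		out = append(out, points[:n])
		points = points[n:]
	}
	return out
}

func main() {
	pts := make([]point, 1200)
	for i := range pts {
		pts[i] = point{Container: fmt.Sprintf("/docker/c%d", i%9), Value: uint64(i)}
	}
	for i, b := range batches(pts, 500) {
		// Each batch would become a single write request to InfluxDB.
		fmt.Printf("batch %d: %d points\n", i, len(b))
	}
}
```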

Export network stats to influxdb backend

Network stats will be super useful.

Allow housekeeping time to be configurable

Make a flag for it so that the user can decide to get more or less frequent stats. This should help on more heavily loaded systems that may not need one-second granularity.

Things to fix for this:

UI to not assume 1s.
Make the housekeepings run at configurable intervals.

no container data at vagrant precise64

Using cAdvisor under my vagrant precise64 does not provide container data. All I see is this:

Navigating to /docker results in

Failed to get container "/docker" with error: json: cannot unmarshal number into Go value of type string

system info

$ uname -a
Linux precise64 3.2.0-23-generic #36-Ubuntu SMP Tue Apr 10 20:39:51 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
$ docker -v
Docker version 1.1.0, build 79812e3

$ vagrant -v
Vagrant 1.6.2

virtualbox: 4.3.12

Using cAdvisor under coreos-vagrant works as expected, and I get information about Docker container data. Does anybody have a clue what is going on here?

All the best

How do you tell cAdvisor which influxdb nodes to use?
Should it push the data in realtime or on a timer?

I'm guessing eventually this will be the basis for a plugin setup for allowing data to be pushed to other systems like ganglia and graphite.