cadvisor's Introduction

cAdvisor

cAdvisor (Container Advisor) provides container users an understanding of the resource usage and performance characteristics of their running containers. It is a running daemon that collects, aggregates, processes, and exports information about running containers. Specifically, for each container it keeps resource isolation parameters, historical resource usage, histograms of complete historical resource usage and network statistics. This data is exported by container and machine-wide.

cAdvisor has native support for Docker containers and should support just about any other container type out of the box. We strive for support across the board, so feel free to open an issue if that is not the case. cAdvisor's container abstraction is based on lmctfy's, so containers are inherently nested hierarchically.

Quick Start: Running cAdvisor in a Docker Container

To quickly try out cAdvisor on your machine with Docker, we have a Docker image that includes everything you need to get started. You can run a single cAdvisor to monitor the whole machine. Simply run:

VERSION=v0.49.1 # use the latest release version from https://github.com/google/cadvisor/releases
sudo docker run \
  --volume=/:/rootfs:ro \
  --volume=/var/run:/var/run:ro \
  --volume=/sys:/sys:ro \
  --volume=/var/lib/docker/:/var/lib/docker:ro \
  --volume=/dev/disk/:/dev/disk:ro \
  --publish=8080:8080 \
  --detach=true \
  --name=cadvisor \
  --privileged \
  --device=/dev/kmsg \
  gcr.io/cadvisor/cadvisor:$VERSION

cAdvisor is now running (in the background) on http://localhost:8080. The setup includes directories with Docker state cAdvisor needs to observe.

Note: If you're running on CentOS, Fedora, or RHEL (or are using LXC), take a look at our running instructions.

We have detailed instructions on running cAdvisor standalone outside of Docker. cAdvisor running options may also be interesting for advanced use cases. If you want to build your own cAdvisor Docker image, see our deployment page.

For Kubernetes users, cAdvisor can be run as a daemonset. See the instructions for how to get started, and for how to kustomize it to fit your needs.

Building and Testing

See the more detailed instructions in the build page. This includes instructions for building and deploying the cAdvisor Docker image.

Exporting stats

cAdvisor supports exporting stats to various storage plugins. See the documentation for more details and examples.

Web UI

cAdvisor exposes a web UI at its port:

http://<hostname>:<port>/

See the documentation for more details.

Remote REST API & Clients

cAdvisor exposes its raw and processed stats via a versioned remote REST API. See the API's documentation for more information.

There is also an official Go client implementation in the client directory. See the documentation for more information.
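
For scripts that cannot use the Go client, the REST API can also be queried over plain HTTP. The following is a minimal sketch, not the official client: the `/api/v1.3/containers/` path layout, the version string, and the helper names are assumptions, so check the API documentation for what your cAdvisor build actually serves.

```python
import json
import urllib.request


def container_url(host, port, container="/", version="v1.3"):
    """Build a stats URL for one container; the path layout is an assumption."""
    name = container.strip("/")
    return f"http://{host}:{port}/api/{version}/containers/{name}"


def fetch_container_info(host, port, container="/"):
    """GET the URL and decode the JSON body that the API returns."""
    url = container_url(host, port, container)
    with urllib.request.urlopen(url, timeout=5) as response:
        return json.load(response)
```

With the Quick Start container running, `fetch_container_info("localhost", 8080, "/docker")` would request the stats for the `/docker` container.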

Roadmap

cAdvisor aims to improve the resource usage and performance characteristics of running containers. Today, we gather and expose this information to users. In our roadmap:

  • Advise on the performance of a container (e.g.: when it is being negatively affected by another, when it is not receiving the resources it requires, etc).
  • Auto-tune the performance of the container based on previous advice.
  • Provide usage prediction to cluster schedulers and orchestration layers.

Community

Contributions, questions, and comments are all welcomed and encouraged! cAdvisor developers hang out on Slack in the #sig-node channel (get an invitation here). We also have discuss.kubernetes.io.

Please reach out and get involved in the project; we're actively looking for more contributors to bring on board!

cadvisor's Issues

cAdvisor using 70% cpu

I'll be happy to investigate this tomorrow. My container running cAdvisor (the google/cadvisor image from the index) is using 70% CPU. It is not affected by closing the web interface.

Here's a sample of the log:

2014/06/13 11:08:14 Get(/)
2014/06/13 11:08:14 Request took 17.088113ms
2014/06/13 11:08:15 Api - Container(/)
2014/06/13 11:08:15 Get(/)
2014/06/13 11:08:15 Request took 17.685755ms
2014/06/13 11:08:16 Api - Container(/)
2014/06/13 11:08:16 Get(/)
2014/06/13 11:08:16 Request took 21.602546ms
2014/06/13 11:08:17 Api - Container(/)
2014/06/13 11:08:17 Get(/)
2014/06/13 11:08:17 Request took 19.849205ms
2014/06/13 11:08:18 Api - Container(/)
2014/06/13 11:08:18 Get(/)
2014/06/13 11:08:18 Request took 24.266196ms
2014/06/13 11:08:19 Api - Container(/)
2014/06/13 11:08:19 Get(/)
2014/06/13 11:08:19 Request took 20.246778ms
2014/06/13 11:08:20 Api - Container(/)
2014/06/13 11:08:20 Get(/)
2014/06/13 11:08:20 Request took 34.460933ms
2014/06/13 11:08:21 Api - Container(/)
2014/06/13 11:08:21 Get(/)
2014/06/14 19:54:38 Housekeeping(/docker/f603ef03a9b80db8497e994f68e2aac773640042e91dc0989d2b1d1130a05707) took 133.555829ms
2014/06/14 19:55:20 Housekeeping(/docker/f603ef03a9b80db8497e994f68e2aac773640042e91dc0989d2b1d1130a05707) took 272.423621ms
2014/06/14 19:57:34 Housekeeping(/docker/f603ef03a9b80db8497e994f68e2aac773640042e91dc0989d2b1d1130a05707) took 247.436739ms
2014/06/14 19:57:35 Housekeeping(/docker/0da4c6826c07ad89a02b7c8e55267c9ce30ab2d3efddc66fa4b36a7a96352289) took 272.757455ms

I'll ^C+\ it tomorrow and check out the goroutines. Happy to fix this, I just wanted to get it recorded before I start.

Dashboard view

Hi
I have a usability request: when the graphs are displayed, the user has to scroll down to see all of the information, so it is very hard to compare different graphs and find relations between them, which makes root-cause analysis difficult.

I suggest shrinking the graphs so that they can be seen and compared on the same page and over the same time frame, as the dashboards of other monitoring tools (for example Zabbix, Nagios) do.

Thanks

Handle non-unified hierarchy starting point

On some systems (e.g. the latest Ubuntu), when running cAdvisor at the root, the cgroups are non-unified:

cpu,cpuacct: /user/blah
cpuset: /
blkio: /user/blah

This breaks us when we try to read the stats relative to ourselves. Although the world is going unified, we should not break in these cases and at least gracefully degrade.
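
One way to detect this situation is to read the process's own cgroup file and compare the path reported for each subsystem. The sketch below assumes the usual `hierarchy-id:subsystems:path` line format of `/proc/self/cgroup`; the function names are hypothetical and this is not how cAdvisor itself does it.

```python
from typing import Dict


def parse_cgroup_file(text: str) -> Dict[str, str]:
    """Map each cgroup subsystem to its path from /proc/<pid>/cgroup contents."""
    paths = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line has the form "hierarchy-id:subsystems:path".
        _, subsystems, path = line.split(":", 2)
        for subsystem in subsystems.split(","):
            if subsystem:
                paths[subsystem] = path
    return paths


def is_unified(paths: Dict[str, str]) -> bool:
    """True when every subsystem reports the same cgroup path."""
    return len(set(paths.values())) <= 1
```

When `is_unified` returns false, a caller could fall back to reading each subsystem's stats from its own path rather than assuming a single starting point.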

Implement and use a raw driver

We should have a raw cgroup driver which just shows stats. That would add support for LXC as well as raw systemd systems.

cadvisor in a CentOS 6 host

I'm trying to use cadvisor in a CentOS 6 host with kernel 2.6.32-431.17.1.el6.x86_64 and docker-io 0.11.1-4.el6.x86_64. SELinux is disabled.

When I run:

docker run \
  --volume=/var/run:/var/run:rw \
  --volume=/sys/fs/cgroup/:/sys/fs/cgroup:ro \
  --volume=/var/lib/docker/:/var/lib/docker:ro \
  --publish=8080:8080 \
  --detach=true \
  google/cadvisor

it fails because /sys/fs/cgroup/ does not exist under CentOS 6. There is /cgroup/ instead.

So I'm trying with:

docker run \
  --volume=/var/run:/var/run:rw \
  --volume=/cgroup/:/sys/fs/cgroup:ro \
  --volume=/var/lib/docker/:/var/lib/docker:ro \
  --publish=8080:8080 \
  --detach=true \
  google/cadvisor

But now I get this:

lxc-start: No such file or directory - failed to mount '/cgroup' on '/usr/lib64/lxc/rootfs///sys/fs/cgroup'
lxc-start: failed to setup the mount entries for 'dfae50b907d3f916a5133a1e68d56623e2de709e43385acb9d665d307c3843f4'
lxc-start: failed to setup the container
lxc-start: invalid sequence number 1. expected 2
lxc-start: failed to spawn 'dfae50b907d3f916a5133a1e68d56623e2de709e43385acb9d665d307c3843f4'

Any idea?

Monitor process inside container

Hi
It would be great to be able to see the same information for a specific process inside a container, not only the container info.

Thanks

Page fault and memory usage suggestions

A couple of suggestions:

  • The graph of memory includes the kernel cache, which is not actually used from the point of view of monitoring or applications. It may be helpful to separate out the cached and total memory usage in the graph. Perhaps three overlaid series: one for cache size, one for total, and one for (total - cache) to get the actual application-used amount.
  • The graph for Page Faults occasionally shows negative values. Judging from the graph of memory usage, it looks like this corresponds with kernel cache flushes. It would be more helpful to show two overlaid lines, one for actual cache misses and one for cache flushes. This way each could be made always positive and the ambiguity is eliminated.
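
The three memory series suggested in the first point could be derived like this. It is only a sketch of the arithmetic; the function and key names are made up for illustration.

```python
def memory_series(total_bytes, cache_bytes):
    """Return the three values to plot for one memory sample."""
    return {
        "total": total_bytes,
        "cache": cache_bytes,
        # Application-used memory; clamp at zero in case the two
        # counters were sampled at slightly different moments.
        "application": max(total_bytes - cache_bytes, 0),
    }
```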

wrong stats error when docker container restarted

Environment is CoreOS (beta channel)

When a systemd.service that starts a docker container is restarted, cadvisor starts spewing errors about wrong stats. For example:

Failed to update stats for container "/system.slice/locksmithd.service": wrong stats: current CPU usage is less than prev CPU usage
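
The cumulative CPU counter presumably starts again from zero when the service's cgroup is recreated. One possible policy, sketched below with hypothetical names, is to treat a decrease as a counter restart instead of an error. This is an assumption about a reasonable fix, not the behaviour of any cAdvisor release.

```python
def cpu_usage_delta(prev_total_ns, cur_total_ns):
    """CPU time used between two samples of a cumulative counter."""
    if cur_total_ns < prev_total_ns:
        # The counter went backwards, so assume it restarted from zero
        # and count everything accumulated since the restart.
        return cur_total_ns
    return cur_total_ns - prev_total_ns
```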

Container name is not presented

Hi
The container name is not shown; the main page shows only the GUID:
root docker 3ba08c450497df5facf35139a02cd1fd3038a3d424d6972531098ed531c9f644

Thanks.

Unit tests for storage driver implementations

Now that we have added the storage driver interface, there might be several storage driver implementations. We could provide a set of unit tests that every driver implementation must pass. This could help us reduce possibly redundant code and provide a guideline for contributors.

We could use storage/memory/memory_test.go as a starting point.
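
The idea of one shared test suite can be sketched as a function that takes a driver factory and runs the same assertions against whatever it builds. The example is in Python for brevity, and the two-method interface (`add_stats`, `recent_stats`) is a hypothetical stand-in, not the actual storage driver interface.

```python
from collections import defaultdict, deque


class MemoryDriver:
    """Toy in-memory driver with a hypothetical two-method interface."""

    def __init__(self, max_samples=60):
        self._stats = defaultdict(lambda: deque(maxlen=max_samples))

    def add_stats(self, container, sample):
        self._stats[container].append(sample)

    def recent_stats(self, container, count):
        if count <= 0:
            return []
        return list(self._stats[container])[-count:]


def check_driver(make_driver):
    """Run the same assertions against any driver built by make_driver()."""
    driver = make_driver()
    # A container with no samples yields an empty result, not an error.
    assert driver.recent_stats("/a", 5) == []
    for value in (1, 2, 3):
        driver.add_stats("/a", {"cpu": value})
    # Samples come back oldest first, limited to the requested count.
    assert driver.recent_stats("/a", 2) == [{"cpu": 2}, {"cpu": 3}]
    # Asking for more than is stored returns everything that is stored.
    assert len(driver.recent_stats("/a", 10)) == 3
    # Containers do not see each other's samples.
    assert driver.recent_stats("/b", 5) == []
```

Each new driver would then only need a one-line test such as `check_driver(MemoryDriver)`.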

Move Dockerfiles

Right now quickstart/ really holds different versions, which are Docker-only or LMCTFY. Moving the Dockerfiles would also allow hub.docker.com to auto-build, because the different versions can be tags.

Switch cpu reporting to be proportional shares instead of cores

Most people don't have a mix of containers and processes at the same level. This makes reporting cpu usage in terms of cores hard.

One alternative is to report proportional share or percentage relative to parent. This works really well for /docker as we know exactly how the resources are being distributed.
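
The percentage-relative-to-parent alternative comes down to a single division. A minimal sketch, assuming both usages are measured over the same interval (the function name is hypothetical):

```python
def share_of_parent(child_usage_ns, parent_usage_ns):
    """CPU usage of a child as a percentage of its parent's usage."""
    if parent_usage_ns <= 0:
        # An idle parent means there is nothing to apportion.
        return 0.0
    return 100.0 * child_usage_ns / parent_usage_ns
```

For example, a container that used 250 ms of CPU while /docker as a whole used 1000 ms would be reported as 25%.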

healthcheck endpoint

Right now when I deploy cadvisor I just use the /containers/ endpoint to verify the container is up and running in a good state. However, it takes the page a few seconds to load and this makes the deploy time a bit slow across a cluster of docker hosts. It would be nice if you guys had an endpoint like /healthcheck or /status that returned a 200 for good and 50(0) for bad without having the overhead of having to load all of the graphs.

Request Less Data in the UI

As @vishh pointed out, we only need about 1s of data on each request. Today we request all data on each request and 90%+ of it is data we already had.
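
A possible shape for this: the client remembers the timestamp of its newest sample, the server returns only samples newer than that, and the client appends them to what it already holds. The sketch below uses hypothetical names and plain dictionaries with a `timestamp` key; it does not reflect the actual UI code.

```python
def samples_since(samples, since):
    """Server side: keep only samples strictly newer than `since`."""
    return [s for s in samples if s["timestamp"] > since]


def merge_samples(existing, incoming, max_len):
    """Client side: append unseen samples and keep the newest max_len."""
    last = existing[-1]["timestamp"] if existing else float("-inf")
    fresh = [s for s in incoming if s["timestamp"] > last]
    return (existing + fresh)[-max_len:]
```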

cadvisor for OpenVZ?

I use OpenVZ and I was wondering if there would be any opposition to me adding OpenVZ tracking to cadvisor. I like the tool, and love docker / lxc, but have an existing OpenVZ farm and want to be able to use newer tools with the older tech :)

Make cAdvisor work on systemd systems

This will fix some of the issues involved in running on systemd machines. Hopefully, this will make it work on all systemd machines.

I'll look through particular failures on CoreOS systems and add them here.

Profile housekeeping

We need to understand where CPU time is going in housekeeping so we can try to minimize it; we may be doing some stupid things that we can easily fix.

Add a built-in image generator.

Could you please add a built-in image generator for CPU, memory, network, bandwidth, etc., so that it is convenient for system admins to monitor their containers?

Influxdb client lib import is wrong

The Influxdb client now lives here: https://github.com/influxdb/influxdb/tree/master/client, which makes https://github.com/google/cadvisor/blob/master/storage/influxdb/influxdb.go#L25 and https://github.com/google/cadvisor/blob/master/storage/influxdb/influxdb_test.go#L24 cause the build to fail. I tried rebuilding with the import changed to github.com/influxdb/influxdb/client but didn't want to fight with other build failures in influxdb [1].

[1] byxorna@0d1c3f1

Fix limits for root container

Currently the root container reports unlimited memory as its memory limit and 1 core as its CPU limit.

This breaks all usage reporting. Fix by making limit same as system resources.
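
The proposed fix amounts to clamping the reported limit to the machine's capacity. A minimal sketch with hypothetical names, assuming that "unlimited" shows up as a value larger than the machine capacity or as a non-positive value:

```python
def effective_limit(reported_limit, machine_capacity):
    """Limit to use for usage reporting, never above what the machine has."""
    if reported_limit <= 0 or reported_limit > machine_capacity:
        # Unlimited or nonsensical limit: fall back to the system resources.
        return machine_capacity
    return reported_limit
```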

Housekeep less often if the stats have not changed

This is to try to detect idle containers and housekeep them less often in order to use less CPU. For example, if we see that a container's stats have not changed for some amount of time, we start housekeeping it every 2s instead of every 1s.
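
The back-off could be sketched as follows. The doubling step and the reset-on-change policy are assumptions for illustration; only the 1s and 2s values come from the description above.

```python
def next_interval(current, stats_changed, base=1.0, maximum=2.0):
    """Seconds to wait before the next housekeeping pass."""
    if stats_changed:
        # Activity seen: go back to the normal housekeeping rate.
        return base
    # Idle container: back off, but never beyond the maximum interval.
    return min(current * 2, maximum)
```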

Batch writes to InfluxDB by Docker host

I think we could see performance gains in ingestion to InfluxDB if the writes were batched up by docker host. Best I can tell, current implementation is to call WriteSeries once per container on the host, which gets very chatty very fast. Batching all containers on the host together before calling WriteSeries should give a nice performance gain in InfluxDB. My Go chops are insufficient to know how much work that represents though. :-)

For reference, our (admittedly limited) testing with Influx has shown that a 2 CPU / 2 GB VM on shared storage can ingest 12000+ metrics per second when they are sent in batches of 500 with no noticeable issues. Spinning up cadvisor on 9 docker hosts (not many containers each) and pointing them at an InfluxDB VM with twice the resources resulted in a significant load increase and high IO wait times, and pretty much broke the ability to query.
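
To illustrate the batching being asked for, here is a language-neutral sketch in Python: all samples for all containers on a host are flattened into one list and written in batches of up to 500. The `write` callable is a hypothetical stand-in for the client's write call, and the data layout is made up for the example.

```python
def chunked(points, size):
    """Yield consecutive slices of at most `size` points."""
    for start in range(0, len(points), size):
        yield points[start:start + size]


def write_host_stats(stats_by_container, write, batch_size=500):
    """Flatten all containers' points and call write() once per batch."""
    points = []
    for container, samples in stats_by_container.items():
        for sample in samples:
            # Tag each point with its container so they can share a batch.
            points.append({"container": container, **sample})
    calls = 0
    for batch in chunked(points, batch_size):
        write(batch)
        calls += 1
    return calls
```

With this shape the number of write calls depends on the total number of points on the host, not on the number of containers.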

Allow housekeeping time to be configurable

Make a flag for it so that the user can decide to get more or less frequent stats. This should help on more heavily loaded systems that may not need one-second granularity.

Things to fix for this:

  • UI to not assume 1s.
  • Make the housekeepings run at configurable intervals.

no container data at vagrant precise64

Using cAdvisor under my vagrant precise64 does not provide container data. All I see is this:
[Screenshot, 2014-08-07 11:35:35]

Navigating to /docker results in

Failed to get container "/docker" with error: json: cannot unmarshal number into Go value of type string

system info

$ uname -a
Linux precise64 3.2.0-23-generic #36-Ubuntu SMP Tue Apr 10 20:39:51 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
$ docker -v
Docker version 1.1.0, build 79812e3

$ vagrant -v
Vagrant 1.6.2

virtualbox: 4.3.12

Using cAdvisor under coreos-vagrant works as expected and I get information about docker container data. Does anybody have a clue what is going on here?

All the best

Add network metrics

Add network metrics, and a bandwidth usage status for a given time period.

cadvisor command line opts vs env vars

I wonder if making the cadvisor container take environment vars instead of command line opts for things like -storage_driver would make it easier to use with the upcoming docker deployment tools?
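
A common middle ground is to keep the command line option and use an environment variable as its default. The sketch below shows the pattern in Python; the variable name `CADVISOR_STORAGE_DRIVER` and the `memory` fallback are hypothetical and are not existing cAdvisor settings.

```python
import argparse


def parse_options(argv, environ):
    """Parse options, letting an environment variable supply the default."""
    # Hypothetical variable name; an explicit flag still takes precedence.
    default_driver = environ.get("CADVISOR_STORAGE_DRIVER", "memory")
    parser = argparse.ArgumentParser()
    parser.add_argument("--storage_driver", default=default_driver)
    return parser.parse_args(argv)
```

Passing `os.environ` as `environ` would let a deployment tool configure the container purely through its environment.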

Track down memory leak

We reportedly grow to 180MB overnight. We must be putting a circular reference into one of our structures and not letting the garbage collector do its thing (or we're not dumping some data for some reason).

allow data to be pushed to influxdb

  • How do you tell cAdvisor which influxdb nodes to use?
  • Should it push the data in realtime or on a timer?

I'm guessing eventually this will be the basis for a plugin setup for allowing data to be pushed to other systems like ganglia and graphite.
