intelsdi-x / snap-plugin-collector-docker
Collects Docker container runtime metrics
Home Page: http://snap-telemetry.io/
License: Apache License 2.0
Snap version (use snapctl -v):
test-4ac7b4d (docker container built from intelsdi/snap:xenial)
Environment:
uname -a: 4.2.0-36-generic
What happened:
plugin panics and crashes.
Steps to reproduce it (as minimally and precisely as possible):
I am unsure why the panic is being triggered. Perhaps it is a race condition where a path existed but was removed before du -s is executed.
Anything else we need to know (e.g. issue happens only occasionally):
time="2016-11-07T10:27:12Z" level=debug msg="panic: runtime error: index out of range" _module=plugin-exec io=stderr plugin=snap-plugin-collector-docker
time="2016-11-07T10:27:12Z" level=debug _module=plugin-exec io=stderr plugin=snap-plugin-collector-docker
time="2016-11-07T10:27:12Z" level=debug msg="goroutine 5 [running]:" _module=plugin-exec io=stderr plugin=snap-plugin-collector-docker
time="2016-11-07T10:27:12Z" level=debug msg="panic(0x88dd40, 0xc420012090)" _module=plugin-exec io=stderr plugin=snap-plugin-collector-docker
time="2016-11-07T10:27:12Z" level=debug msg="\t/home/travis/.gimme/versions/go1.7.1.linux.amd64/src/runtime/panic.go:500 +0x1a1" _module=plugin-exec io=stderr plugin=snap-plugin-collector-docker
time="2016-11-07T10:27:12Z" level=debug msg="github.com/intelsdi-x/snap-plugin-collector-docker/fs.diskUsage(0x9254e1, 0xf, 0xc42003aee8, 0xc42003ae78, 0x0)" _module=plugin-exec io=stderr plugin=snap-plugin-collector-docker
time="2016-11-07T10:27:12Z" level=debug msg="\t/home/travis/gopath/src/github.com/intelsdi-x/snap-plugin-collector-docker/fs/fs.go:371 +0x177" _module=plugin-exec io=stderr plugin=snap-plugin-collector-docker
time="2016-11-07T10:27:12Z" level=debug msg="github.com/intelsdi-x/snap-plugin-collector-docker/fs.(*DiskUsageCollector).worker.func1(0xb742f0, 0xc4200ffd00, 0x91dcac, 0x4, 0xc420012fd0, 0x1, 0x1)" _module=plugin-exec io=stderr plugin=snap-plugin-collector-docker
time="2016-11-07T10:27:12Z" level=debug msg="\t/home/travis/gopath/src/github.com/intelsdi-x/snap-plugin-collector-docker/fs/fs.go:105 +0x449" _module=plugin-exec io=stderr plugin=snap-plugin-collector-docker
time="2016-11-07T10:27:12Z" level=debug msg="created by github.com/intelsdi-x/snap-plugin-collector-docker/fs.(*DiskUsageCollector).worker" _module=plugin-exec io=stderr plugin=snap-plugin-collector-docker
time="2016-11-07T10:27:12Z" level=debug msg="\t/home/travis/gopath/src/github.com/intelsdi-x/snap-plugin-collector-docker/fs/fs.go:119 +0x80" _module=plugin-exec io=stderr plugin=snap-plugin-collector-docker
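The trace shows an "index out of range" panic inside fs.diskUsage (fs/fs.go:371), but the code at that line is not quoted in this report. The following is only a sketch of how such a panic could be avoided, assuming the function splits the output of du -s into fields and indexes the first one. The helper parseDuOutput is hypothetical and is not the plugin's actual code.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseDuOutput is a hypothetical helper, not taken from the plugin.
// It extracts the size (first field) from `du -s` output and returns an
// error instead of panicking when the output is empty or malformed.
func parseDuOutput(out string) (uint64, error) {
	fields := strings.Fields(out)
	if len(fields) == 0 {
		// Guard against indexing an empty slice, e.g. when the path
		// disappeared before du ran and nothing was written to stdout.
		return 0, fmt.Errorf("unexpected empty du output")
	}
	size, err := strconv.ParseUint(fields[0], 10, 64)
	if err != nil {
		return 0, fmt.Errorf("cannot parse du size %q: %v", fields[0], err)
	}
	return size, nil
}

func main() {
	// Typical output: the size followed by the path.
	fmt.Println(parseDuOutput("2048\t/var/lib/docker/containers/abc\n"))
	// Empty output: an error is returned rather than a panic.
	fmt.Println(parseDuOutput(""))
}
```

With a check like this, a container directory that vanishes mid-collection would produce an error for that one container rather than crashing the whole plugin.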
This plugin only builds on Linux, so we should provide a friendly warning when the build is run on macOS. The Docker container build scripts contain some hardcoded path assumptions, and we should remove them (see the influxdb example).
Remove
_ "github.com/docker/docker/vendor/src/github.com/opencontainers/runc/libcontainer/cgroups"
In our case there is a max number of shares overall that will be scheduled on a given machine. Shares per container would let us see the requested processing capacity per container and how the actual usage compares to that.
The plugin does not work properly on RHEL when Docker uses cgroupfs as its CGroup driver. In general, handling of the CGroup driver is implemented incorrectly. Additionally, the openlibcontainers/runc lib considerably limits enhancements to the collected metrics.
Snap's metric types can use * for wildcards in a namespace, which allows a one-to-many collection.
For example, the metric /intel/linux/docker/<container id>/cpu_stats/cpu_usage/percpu_usage/0 could be represented as /intel/linux/docker/*/cpu_stats/cpu_usage/percpu_usage/0, which would result in collecting percpu_usage/0 for all containers on a given host.
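The one-to-many idea can be illustrated with a small matcher. This is only a sketch, assuming a * element stands for exactly one namespace element. The function matchNamespace is hypothetical and is not Snap's own matching code.

```go
package main

import (
	"fmt"
	"strings"
)

// matchNamespace reports whether a concrete metric namespace matches a
// pattern in which a "*" element stands for exactly one namespace element.
// Illustrative only; not taken from Snap.
func matchNamespace(pattern, namespace string) bool {
	p := strings.Split(strings.Trim(pattern, "/"), "/")
	n := strings.Split(strings.Trim(namespace, "/"), "/")
	if len(p) != len(n) {
		return false
	}
	for i := range p {
		if p[i] != "*" && p[i] != n[i] {
			return false
		}
	}
	return true
}

func main() {
	pattern := "/intel/linux/docker/*/cpu_stats/cpu_usage/percpu_usage/0"
	for _, ns := range []string{
		"/intel/linux/docker/574c8997a830/cpu_stats/cpu_usage/percpu_usage/0",
		"/intel/linux/docker/574c8997a830/cpu_stats/cpu_usage/percpu_usage/1",
	} {
		fmt.Println(ns, matchNamespace(pattern, ns))
	}
}
```

Only the first namespace matches the pattern, because the wildcard replaces the container ID while every other element has to be identical.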
I am trying to install Snap and the Docker plugin in a CoreOS cluster, but it is not working. Is Snap currently supported on the CoreOS operating system?
For example, if you have two containers and collect metrics with *, each metric is collected four times. This should be fixed once the Snap PR that allows anything to fill in for a wildcard is merged.
More logs are needed to trace the collection process, but they should depend on the configured log level. Such logs should include fields with full information about what is happening, for example:
logrus.WithFields(logrus.Fields{
    "_block": "GetStatsFromContainer",
    "_id":    container_id,
}).Info("Getting stats from docker container")
This will be helpful for debugging.
Add cpuset subsystem to Docker Collector.
Following metrics would be very helpful for debugging Latency Critical workload performance:
Metrics description:
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/sec-cpuset.html
As this plugin references an old Snap build, an error occurred when running:
snap-plugin-collector-docker/scripts/run_test_snap_plugin_collector_docker_darwin.sh
The error was:
20:38:25 # cd .; git clone https://code.google.com/p/go-decimal-inf.exp /tmp/gopath.IfIt0s/src/speter.net/go/exp/math/dec/inf
20:38:25 Cloning into '/tmp/gopath.IfIt0s/src/speter.net/go/exp/math/dec/inf'...
20:38:25 fatal: repository 'https://code.google.com/p/go-decimal-inf.exp/' not found
20:38:25 godep: error downloading dep (speter.net/go/exp/math/dec/inf): exit status 128
20:38:25 godep: Error downloading some deps. Aborting restore and check.
It would be good to tag Docker metrics collected in a Kubernetes environment with Kubernetes labels. Those labels are currently available as a separate metric, /intel/docker/<docker_id>/spec/labels/<label_key>/value. This metric shouldn't be removed, though, from the list of metrics collected by this plugin.
Environment:
What happened:
Filesystem metrics are not collected for containers; e.g. requesting /intel/docker/*/stats/filesystem/*/base_usage returns 0 metrics.
What you expected to happen:
Make it work :)
Anything else we need to know (e.g. issue happens only occasionally):
I tried to debug this issue yesterday. The worker thread is collecting stats for the directory of interest (however, the result might be incorrect because du -sx [dir] is not traversing subdirectories, nor is that done manually in code). This code is a riddle to me; some functions are left unused (e.g. updateContainerImagesPath). Perhaps this functionality was broken during one of the refactors, in which case the issue is rather platform-independent.
The plugin cannot be loaded as its dynamic metric types don't have Name defined. Please see the error below:
$SNAP_PATH/bin/snapctl plugin load $SNAP_PATH/../../snap-plugin-collector-docker/build/rootfs/snap-plugin-collector-docker
Error loading plugin:
A dynamic element * requires a name for namespace /intel/linux/docker/*/cpu_stats/cpu_usage/total_usage.
When Snap was started with the auto-discovery flag, snap-plugin-collector-docker was not shown in the list of loaded plugins, but the metric list showed some metrics. For example:
/intel/linux/docker/574c8997a830/memory_stats/stats/total_unevictable 3
/intel/linux/docker/574c8997a830/memory_stats/stats/total_writeback 3
/intel/linux/docker/574c8997a830/memory_stats/stats/unevictable 3
/intel/linux/docker/574c8997a830/memory_stats/stats/writeback 3
/intel/linux/docker/574c8997a830/memory_stats/swap_usage/failcnt 3
/intel/linux/docker/574c8997a830/memory_stats/swap_usage/max_usage 3
/intel/linux/docker/574c8997a830/memory_stats/swap_usage/usage 3
/intel/linux/docker/574c8997a830/memory_stats/usage/failcnt 3
/intel/linux/docker/574c8997a830/memory_stats/usage/max_usage 3
/intel/linux/docker/574c8997a830/memory_stats/usage/usage 3
/intel/mock/*/baz 1,2
/intel/mock/bar 1,2
/intel/mock/foo 1,2
The task would be disabled if it collected any metric containing "docker".
Docker has a built-in option to use the host's procfs inside a container, but when procfs is mounted as a volume shared from the host or another container, the path has to be changed (for example to /hostproc or /proc2). The plugin does not support changing that path.
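A minimal sketch of what a configurable procfs root could look like, assuming the plugin builds per-process paths such as <procfs>/<pid>/mountinfo. The helper mountInfoPath and its parameters are hypothetical and not taken from the plugin.

```go
package main

import (
	"fmt"
	"path"
	"strconv"
)

// mountInfoPath builds the path to a process's mountinfo file under a
// configurable procfs root. Hypothetical helper: an empty root falls
// back to the conventional /proc mount point.
func mountInfoPath(procfs string, pid int) string {
	if procfs == "" {
		procfs = "/proc"
	}
	return path.Join(procfs, strconv.Itoa(pid), "mountinfo")
}

func main() {
	// Default location on the host.
	fmt.Println(mountInfoPath("", 1234))
	// Host procfs mounted into a container under a different path.
	fmt.Println(mountInfoPath("/hostproc", 1234))
}
```

Passing the root as a parameter keeps every procfs lookup in one place, so a single configuration value could switch the plugin between /proc and a mounted path.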
There are a lot of non-Task Manifest files in that folder. The folder needs reorganizing to make sense. See disk as an example.
Tests for client/client.go and fs/fs.go should be delivered. It seems that a small refactoring of these packages is needed to mock external lib methods.
Our Docker collector should also support getting metrics from Docker containers running in a Windows environment, ideally with the same set of metrics as the Linux version.
The latest version of the plugin exposes the metric total instead of total_usage. Readme.md and Metrics.md are out of date. "Total_usage" is also used in tests, which is misleading.
The metric namespace should be kept consistent across plugin code, documentation, and tests. Also, unnecessary renaming should be avoided in the future.
cc: @intelsdi-x/snap-maintainers
Hello,
Something goes wrong when I try to create a task, and I am not able to solve it.
I have no idea whether it is a configuration error or a bug.
It would be great if you could offer a solution to this issue
or, better, a manifest and all other settings needed for a correct run on Ubuntu 16.04 Server.
Thanks for your reaction,
Best Regards,
Richard
P.S.: Docker CE is the latest version, the same as on the monitored system.
Snap daemon version (use snapteld -v):
snaptel version 1.3.0
Environment:
uname -a:
What happened:
The snaptel task always hangs after a few seconds.
There is no way to run it successfully.
What you expected to happen:
The snaptel task runs successfully and sends values to the file publisher.
Steps to reproduce it (as minimally and precisely as possible):
clear
echo "Intel Snap restarting..."
service snap-telemetry start
sleep 10s
sudo docker -v
rm -f snap-plugin-collector-docker
rm -f snap-plugin-publisher-file
wget http://snap.ci.snap-telemetry.io/plugins/snap-plugin-collector-docker/latest/linux/x86_64/snap-plugin-collector-docker
wget http://snap.ci.snap-telemetry.io/plugins/snap-plugin-publisher-file/latest/linux/x86_64/snap-plugin-publisher-file
chmod 755 snap-plugin-*
snaptel plugin load snap-plugin-collector-docker
snaptel plugin load snap-plugin-publisher-file
snaptel plugin list
snaptel metric list
# curl -sfLO https://raw.githubusercontent.com/intelsdi-x/snap-plugin-collector-docker/master/examples/tasks/docker-file.json
snaptel task create -t docker-file.json
# The task always hangs after a few seconds
snaptel task list
Anything else we need to know (e.g. issue happens only occasionally):
DOCKER:
Docker version 17.06.0-ce, build 02c1d87
FILE:
docker-file.json
{
  "version": 1,
  "schedule": {
    "type": "simple",
    "interval": "5s"
  },
  "workflow": {
    "collect": {
      "metrics": {
        "/intel/docker/*/spec/*": {},
        "/intel/docker/*/stats/cgroups/cpu_stats/*": {},
        "/intel/docker/*/stats/cgroups/memory_stats/*": {}
      },
      "config": {
        "/intel/docker": {
          "endpoint": "http://localhost:2375",
          "procfs": "/proc"
        }
      },
      "publish": [
        {
          "plugin_name": "file",
          "config": {
            "file": "/tmp/snap-docker-file.log"
          }
        }
      ]
    }
  }
}
FILE:
"/lib/systemd/system/docker.service"
[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
After=network-online.target docker.socket firewalld.service
Wants=network-online.target
Requires=docker.socket
[Service]
Type=notify
ExecStart=/usr/bin/dockerd -H fd:// -H=tcp://0.0.0.0:2375
ExecReload=/bin/kill -s HUP $MAINPID
LimitNOFILE=1048576
LimitNPROC=infinity
LimitCORE=infinity
TasksMax=infinity
TimeoutStartSec=0
Delegate=yes
KillMode=process
Restart=on-failure
StartLimitBurst=3
StartLimitInterval=60s
[Install]
WantedBy=multi-user.target
OUTPUT:
...
/intel/docker/*/stats/network/*/tx_dropped 8
/intel/docker/*/stats/network/*/tx_errors 8
/intel/docker/*/stats/network/*/tx_packets 8
Using task manifest to create task
Task created
ID: 663138f6-6b4b-423c-b9b3-e04f58e98390
Name: Task-663138f6-6b4b-423c-b9b3-e04f58e98390
State: Running
ID NAME STATE HIT MISS FAIL CREATED LAST FAILURE
663138f6-6b4b-423c-b9b3-e04f58e98390 Task-663138f6-6b4b-423c-b9b3-e04f58e98390 Running 0 0 0 5:01AM 8-07-2017
grafana@ubuntu:/intelSnap$ snaptel task list
ID NAME STATE HIT MISS FAIL CREATED LAST FAILURE
663138f6-6b4b-423c-b9b3-e04f58e98390 Task-663138f6-6b4b-423c-b9b3-e04f58e98390 Running 2 0 2 5:01AM 8-07-2017 rpc error: code = 5 desc = open /proc/0/mountinfo: no such file or directory
grafana@ubuntu:/intelSnap$