
snap-plugin-collector-docker's Issues

panic when getting disk usage

Snap version (use snapctl -v):
test-4ac7b4d (docker container built from intelsdi/snap:xenial)

Environment:

  • Cloud provider or hardware configuration: Running in Kubernetes on baremetal
  • OS (e.g. from /etc/os-release): Ubuntu xenial
  • Kernel (e.g. uname -a): 4.2.0-36-generic
  • Relevant tools (e.g. plugins used with Snap):
  • Others (e.g. deploying with Ansible): Docker version 1.12.1

What happened:
The plugin panics and crashes.

Steps to reproduce it (as minimally and precisely as possible):

I am unsure why the panic is triggered. Perhaps there is a race condition in which a path exists when it is discovered but is removed before du -s is executed.

Anything else we need to know (e.g. issue happens only occasionally):

time="2016-11-07T10:27:12Z" level=debug msg="panic: runtime error: index out of range" _module=plugin-exec io=stderr plugin=snap-plugin-collector-docker 
time="2016-11-07T10:27:12Z" level=debug _module=plugin-exec io=stderr plugin=snap-plugin-collector-docker 
time="2016-11-07T10:27:12Z" level=debug msg="goroutine 5 [running]:" _module=plugin-exec io=stderr plugin=snap-plugin-collector-docker 
time="2016-11-07T10:27:12Z" level=debug msg="panic(0x88dd40, 0xc420012090)" _module=plugin-exec io=stderr plugin=snap-plugin-collector-docker 
time="2016-11-07T10:27:12Z" level=debug msg="\t/home/travis/.gimme/versions/go1.7.1.linux.amd64/src/runtime/panic.go:500 +0x1a1" _module=plugin-exec io=stderr plugin=snap-plugin-collector-docker 
time="2016-11-07T10:27:12Z" level=debug msg="github.com/intelsdi-x/snap-plugin-collector-docker/fs.diskUsage(0x9254e1, 0xf, 0xc42003aee8, 0xc42003ae78, 0x0)" _module=plugin-exec io=stderr plugin=snap-plugin-collector-docker 
time="2016-11-07T10:27:12Z" level=debug msg="\t/home/travis/gopath/src/github.com/intelsdi-x/snap-plugin-collector-docker/fs/fs.go:371 +0x177" _module=plugin-exec io=stderr plugin=snap-plugin-collector-docker 
time="2016-11-07T10:27:12Z" level=debug msg="github.com/intelsdi-x/snap-plugin-collector-docker/fs.(*DiskUsageCollector).worker.func1(0xb742f0, 0xc4200ffd00, 0x91dcac, 0x4, 0xc420012fd0, 0x1, 0x1)" _module=plugin-exec io=stderr plugin=snap-plugin-collector-docker 
time="2016-11-07T10:27:12Z" level=debug msg="\t/home/travis/gopath/src/github.com/intelsdi-x/snap-plugin-collector-docker/fs/fs.go:105 +0x449" _module=plugin-exec io=stderr plugin=snap-plugin-collector-docker 
time="2016-11-07T10:27:12Z" level=debug msg="created by github.com/intelsdi-x/snap-plugin-collector-docker/fs.(*DiskUsageCollector).worker" _module=plugin-exec io=stderr plugin=snap-plugin-collector-docker 
time="2016-11-07T10:27:12Z" level=debug msg="\t/home/travis/gopath/src/github.com/intelsdi-x/snap-plugin-collector-docker/fs/fs.go:119 +0x80" _module=plugin-exec io=stderr plugin=snap-plugin-collector-docker 
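
A possible defensive approach, sketched under assumptions: the trace shows an index-out-of-range inside fs.diskUsage, which would be consistent with indexing into the split output of du -s when that output is empty (for example because the path vanished). The plugin's actual code is not shown here, so parseDuOutput below is a hypothetical helper, not the plugin's function. It returns an error instead of panicking when the output has no fields.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseDuOutput extracts the leading size field from the output of
// `du -s <path>` (the value is in du's block units).
// It returns an error rather than panicking when the output is empty
// or malformed, e.g. because the path disappeared before du ran.
func parseDuOutput(out string) (uint64, error) {
	fields := strings.Fields(out)
	if len(fields) < 1 {
		return 0, fmt.Errorf("unexpected du output: %q", out)
	}
	size, err := strconv.ParseUint(fields[0], 10, 64)
	if err != nil {
		return 0, fmt.Errorf("cannot parse du size %q: %v", fields[0], err)
	}
	return size, nil
}

func main() {
	// One well-formed line and one empty result, as might happen
	// when the directory was removed between discovery and du.
	for _, out := range []string{"1024\t/var/lib/docker\n", ""} {
		size, err := parseDuOutput(out)
		fmt.Println(size, err)
	}
}
```

With a check like this, the worker could log the error and skip the container for that collection cycle instead of crashing the whole plugin.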

Expose cpu.shares specified per container

In our case there is a max number of shares overall that will be scheduled on a given machine. Shares per container would let us see the requested processing capacity per container and how the actual usage compares to that.
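
As a rough illustration of where such a value could come from: under cgroup v1 the kernel exposes each group's shares in a cpu.shares file that contains a single integer. The sketch below reads and parses that value. The path used in main is a hypothetical example (it assumes cgroup v1 and the cgroupfs driver), and readCPUShares and parseCPUShares are illustrative names, not functions of this plugin.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseCPUShares converts the raw content of a cpu.shares file
// (a single integer followed by a newline) into a number.
func parseCPUShares(raw string) (uint64, error) {
	return strconv.ParseUint(strings.TrimSpace(raw), 10, 64)
}

// readCPUShares reads a cpu.shares file from the given path.
func readCPUShares(path string) (uint64, error) {
	raw, err := os.ReadFile(path)
	if err != nil {
		return 0, err
	}
	return parseCPUShares(string(raw))
}

func main() {
	// Hypothetical location; replace <container id> with a real ID.
	path := "/sys/fs/cgroup/cpu/docker/<container id>/cpu.shares"
	shares, err := readCPUShares(path)
	if err != nil {
		fmt.Println("could not read shares:", err)
		return
	}
	fmt.Println("cpu.shares =", shares)
}
```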

Cgroups driver incorrectly handled on RHEL

The plugin does not work properly on RHEL when Docker uses cgroupfs as its cgroup driver. More generally, handling of the cgroup driver is implemented incorrectly. Additionally, the opencontainers/runc library considerably limits which enhancements can be made to the collected metrics.
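
To make the difference between drivers concrete, here is a minimal sketch of how the cgroup directory of a container typically differs between the two Docker cgroup drivers under cgroup v1. It assumes the default layouts (no custom cgroup parent); cgroupPath is a hypothetical helper, not part of the plugin.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// cgroupPath returns the expected cgroup v1 directory of a container
// for a given subsystem, assuming Docker's default layouts:
//   cgroupfs: <root>/<subsystem>/docker/<id>
//   systemd:  <root>/<subsystem>/system.slice/docker-<id>.scope
func cgroupPath(root, subsystem, driver, id string) (string, error) {
	switch driver {
	case "cgroupfs":
		return filepath.Join(root, subsystem, "docker", id), nil
	case "systemd":
		return filepath.Join(root, subsystem, "system.slice", "docker-"+id+".scope"), nil
	default:
		return "", fmt.Errorf("unknown cgroup driver %q", driver)
	}
}

func main() {
	for _, driver := range []string{"cgroupfs", "systemd"} {
		p, err := cgroupPath("/sys/fs/cgroup", "memory", driver, "574c8997a830")
		fmt.Println(p, err)
	}
}
```

The driver in use is reported by docker info, so a collector could pick the layout at startup rather than assuming one of them.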

Make use of wildcards for metrics

Snap's metric types can use * for wildcards in a namespace, which allows a one-to-many collection.

For example, the metric /intel/linux/docker/<container id>/cpu_stats/cpu_usage/percpu_usage/0 could be represented as /intel/linux/docker/*/cpu_stats/cpu_usage/percpu_usage/0, which would result in collecting percpu_usage/0 for all containers on a given host.
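
The one-to-many idea can be sketched as a simple element-by-element match. This is only an illustration of the concept and not Snap's own matching code; matchNamespace is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// matchNamespace reports whether a concrete namespace matches a
// pattern in which "*" stands for exactly one namespace element.
func matchNamespace(pattern, ns string) bool {
	p := strings.Split(strings.Trim(pattern, "/"), "/")
	n := strings.Split(strings.Trim(ns, "/"), "/")
	if len(p) != len(n) {
		return false
	}
	for i := range p {
		if p[i] != "*" && p[i] != n[i] {
			return false
		}
	}
	return true
}

func main() {
	pattern := "/intel/linux/docker/*/cpu_stats/cpu_usage/percpu_usage/0"
	for _, ns := range []string{
		"/intel/linux/docker/574c8997a830/cpu_stats/cpu_usage/percpu_usage/0",
		"/intel/linux/docker/574c8997a830/cpu_stats/cpu_usage/percpu_usage/1",
	} {
		fmt.Println(ns, matchNamespace(pattern, ns))
	}
}
```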

CoreOS Support

I am trying to install Snap and the docker plugin in a CoreOS cluster, but it is not working. Is Snap currently supported on CoreOS?

Use logrus for logging in docker plugin

More logs are needed to trace the collection process, but they should depend on the configured log level. Such logs should carry fields with full information about what is happening, for example:

logrus.WithFields(logrus.Fields{
    "_block": "GetStatsFromContainer",
    "_id": container_id,
}).Info("Getting stats from docker container")

This will be helpful for debugging.

Cannot build docker images

Because this plugin references an old Snap build, the following error occurred when running:

snap-plugin-collector-docker/scripts/run_test_snap_plugin_collector_docker_darwin.sh 

The error was:

20:38:25 # cd .; git clone https://code.google.com/p/go-decimal-inf.exp /tmp/gopath.IfIt0s/src/speter.net/go/exp/math/dec/inf
20:38:25 Cloning into '/tmp/gopath.IfIt0s/src/speter.net/go/exp/math/dec/inf'...
20:38:25 fatal: repository 'https://code.google.com/p/go-decimal-inf.exp/' not found
20:38:25 godep: error downloading dep (speter.net/go/exp/math/dec/inf): exit status 128
20:38:25 godep: Error downloading some deps. Aborting restore and check.

Add Kubernetes labels as tags to docker metrics

It would be good to tag docker metrics collected in a Kubernetes environment with Kubernetes labels. Those labels are currently available as a separate metric, /intel/docker/<docker_id>/spec/labels/<label_key>/value. That metric should not be removed from the list of metrics collected by this plugin, though.

Filesystem metrics not available

Environment:

  • Cloud provider or hardware configuration: bare metal with docker and kubernetes
  • Docker version 1.12; I am unsure how the Kubernetes cluster was configured, but @mkuculyma may be able to provide details if required

What happened:
Filesystem metrics are not collected for containers, e.g. requesting /intel/docker/*/stats/filesystem/*/base_usage returns 0 metrics.

What you expected to happen:
Make it work :)

Anything else we need to know (e.g. issue happens only occasionally):
I tried to debug this issue yesterday. The worker thread is collecting stats for the directory of interest (however, the result might be incorrect because du -sx [dir] does not traverse subdirectories, nor is that done manually in the code). This code is a riddle to me; some functions are left unused (e.g. updateContainerImagesPath). Perhaps this functionality was broken during one of the refactors, in which case the problem is probably platform independent.

loading the plugin error

The plugin cannot be loaded because its dynamic metric types do not have a Name defined. Please see the error below:

$SNAP_PATH/bin/snapctl plugin load $SNAP_PATH/../../snap-plugin-collector-docker/build/rootfs/snap-plugin-collector-docker
Error loading plugin:
A dynamic element * requires a name for namespace /intel/linux/docker/*/cpu_stats/cpu_usage/total_usage.

When Snap was started with the auto-discovery flag, snap-plugin-collector-docker was not shown in the list of loaded plugins, but the metric list did show some of its metrics. For example:

/intel/linux/docker/574c8997a830/memory_stats/stats/total_unevictable         3
/intel/linux/docker/574c8997a830/memory_stats/stats/total_writeback          3
/intel/linux/docker/574c8997a830/memory_stats/stats/unevictable          3
/intel/linux/docker/574c8997a830/memory_stats/stats/writeback            3
/intel/linux/docker/574c8997a830/memory_stats/swap_usage/failcnt         3
/intel/linux/docker/574c8997a830/memory_stats/swap_usage/max_usage       3
/intel/linux/docker/574c8997a830/memory_stats/swap_usage/usage           3
/intel/linux/docker/574c8997a830/memory_stats/usage/failcnt              3
/intel/linux/docker/574c8997a830/memory_stats/usage/max_usage            3
/intel/linux/docker/574c8997a830/memory_stats/usage/usage            3
/intel/mock/*/baz                                1,2
/intel/mock/bar                                  1,2
/intel/mock/foo                                  1,2

A task is disabled if it collects any metric whose namespace contains "docker".

No test for client and fs package

Tests for client/client.go and fs/fs.go should be delivered. It seems that a small refactoring of these packages is needed so that external library methods can be mocked.
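
One common way to make such code testable is to hide the external library behind a small interface and substitute a fake in tests. The sketch below only illustrates that pattern; containerLister, fakeLister and countContainers are hypothetical names and do not reflect the plugin's actual types.

```go
package main

import (
	"errors"
	"fmt"
)

// containerLister is a hypothetical abstraction over the external
// Docker client library; production code would wrap the real client.
type containerLister interface {
	ListContainerIDs() ([]string, error)
}

// fakeLister is a test double that returns canned results.
type fakeLister struct {
	ids []string
	err error
}

func (f fakeLister) ListContainerIDs() ([]string, error) { return f.ids, f.err }

// errFake is a canned failure used to exercise the error path.
var errFake = errors.New("fake failure")

// countContainers stands in for plugin logic that depends only on
// the interface, so it can be exercised without a Docker daemon.
func countContainers(l containerLister) (int, error) {
	ids, err := l.ListContainerIDs()
	if err != nil {
		return 0, fmt.Errorf("listing containers: %v", err)
	}
	return len(ids), nil
}

func main() {
	n, err := countContainers(fakeLister{ids: []string{"a", "b"}})
	fmt.Println(n, err)
	n, err = countContainers(fakeLister{err: errFake})
	fmt.Println(n, err)
}
```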

Support for docker on Windows

Our docker collector should also support getting metrics from docker containers running in a Windows environment, ideally with the same set of metrics as the Linux version.

document the docker endpoint config

#59 and #62 introduce a new endpoint config. We should remove this from the wishlist, document the default value, and provide example configs showing how to use the tcp endpoint.

Metrics total_usage has been renamed (backward compatibility broken)

The latest version of the plugin exposes the metric total instead of total_usage. Readme.md and Metrics.md are out of date. "Total_usage" is also still used in tests, which is misleading.

The metric namespace should be kept consistent across plugin code, documentation, and tests. Also, unnecessary renaming should be avoided in the future.

cc: @intelsdi-x/snap-maintainers

docker-collector hangs a few seconds after startup

Hello,
Something goes wrong when I try to create a task, and I have not been able to solve it.
I do not know whether it is a configuration error or a bug.
It would be great if you could offer a solution to this issue,
or better yet a manifest and all other settings needed to run correctly on Ubuntu 16.04 Server.

Thanks for your reaction,
Best Regards,
Richard

P.S.: Docker CE is the latest version, the same as on the monitored system.

Snap daemon version (use snapteld -v):
snaptel version 1.3.0

Environment:

  • Cloud provider or hardware configuration:
    Intel i7, 32 GB RAM, 240 GB Intel SSD
  • OS (e.g. from /etc/os-release):
    Ubuntu 16.04 Server
  • Kernel (e.g. uname -a):
    Linux ubuntu 4.4.0-62-generic #83-Ubuntu SMP Wed Jan 18 14:10:15 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
  • Relevant tools (e.g. plugins used with Snap):
    snaptel plugin load snap-plugin-collector-docker
    snaptel plugin load snap-plugin-publisher-file

What happened:
The snaptel task always hangs after a few seconds.
There is no way to run it successfully.

What you expected to happen:
The snaptel task runs successfully and sends values to the file publisher.

Steps to reproduce it (as minimally and precisely as possible):
clear
echo "Intel Snap restarting..."
service snap-telemetry start
sleep 10s
sudo docker -v
rm -f snap-plugin-collector-docker
rm -f snap-plugin-publisher-file
wget http://snap.ci.snap-telemetry.io/plugins/snap-plugin-collector-docker/latest/linux/x86_64/snap-plugin-collector-docker
wget http://snap.ci.snap-telemetry.io/plugins/snap-plugin-publisher-file/latest/linux/x86_64/snap-plugin-publisher-file
chmod 755 snap-plugin-*
snaptel plugin load snap-plugin-collector-docker
snaptel plugin load snap-plugin-publisher-file
snaptel plugin list
snaptel metric list
//// curl -sfLO https://raw.githubusercontent.com/intelsdi-x/snap-plugin-collector-docker/master/examples/tasks/docker-file.json
snaptel task create -t docker-file.json
//// ALWAYS HANGS after a few seconds
snaptel task list

Anything else we need to know (e.g. issue happens only occasionally):
DOCKER:
Docker version 17.06.0-ce, build 02c1d87

FILE:
docker-file.json
{
    "version": 1,
    "schedule": {
        "type": "simple",
        "interval": "5s"
    },
    "workflow": {
        "collect": {
            "metrics": {
                "/intel/docker//spec/": {},
                "/intel/docker//stats/cgroups/cpu_stats/": {},
                "/intel/docker//stats/cgroups/memory_stats/": {}
            },
            "config": {
                "/intel/docker": {
                    "endpoint": "http://localhost:2375",
                    "procfs": "/proc"
                }
            },
            "publish": [
                {
                    "plugin_name": "file",
                    "config": {
                        "file": "/tmp/snap-docker-file.log"
                    }
                }
            ]
        }
    }
}

FILE:
"/lib/systemd/system/docker.service"
[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
After=network-online.target docker.socket firewalld.service
Wants=network-online.target
Requires=docker.socket

[Service]
Type=notify

ExecStart=/usr/bin/dockerd -H fd:// -H=tcp://0.0.0.0:2375

ExecReload=/bin/kill -s HUP $MAINPID
LimitNOFILE=1048576
LimitNPROC=infinity
LimitCORE=infinity
TasksMax=infinity
TimeoutStartSec=0
Delegate=yes
KillMode=process
Restart=on-failure
StartLimitBurst=3
StartLimitInterval=60s

[Install]
WantedBy=multi-user.target

OUTPUT:
...
/intel/docker//stats/network//tx_dropped 8
/intel/docker//stats/network//tx_errors 8
/intel/docker//stats/network//tx_packets 8
Using task manifest to create task
Task created
ID: 663138f6-6b4b-423c-b9b3-e04f58e98390
Name: Task-663138f6-6b4b-423c-b9b3-e04f58e98390
State: Running
//// ID NAME STATE HIT MISS FAIL CREATED LAST FAILURE
//// 663138f6-6b4b-423c-b9b3-e04f58e98390 Task-663138f6-6b4b-423c-b9b3-e04f58e98390 //// Running 0 0 0 5:01AM 8-07-2017
grafana@ubuntu:/intelSnap$ snaptel task list
//// ID NAME STATE HIT MISS FAIL CREATED LAST FAILURE
//// 663138f6-6b4b-423c-b9b3-e04f58e98390 Task-663138f6-6b4b-423c-b9b3-e04f58e98390 Running 2 0 2 5:01AM 8-07-2017 rpc error: code = 5 desc = open /proc/0/mountinfo: no such file or directory
grafana@ubuntu:/intelSnap$
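
A note on the failure message above: the path /proc/0/mountinfo suggests that a PID of 0 was used to build the procfs path. Docker reports a PID of 0 for containers that are not running, so one possible mitigation (an assumption, since the root cause is not confirmed in this report) is to reject such PIDs before touching procfs. mountinfoPath below is a hypothetical helper, not the plugin's function.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strconv"
)

// mountinfoPath builds <procfs>/<pid>/mountinfo and refuses
// non-positive PIDs, which do not correspond to a running process.
func mountinfoPath(procfs string, pid int) (string, error) {
	if pid <= 0 {
		return "", fmt.Errorf("container has no running process (pid %d)", pid)
	}
	return filepath.Join(procfs, strconv.Itoa(pid), "mountinfo"), nil
}

func main() {
	for _, pid := range []int{0, 1234} {
		p, err := mountinfoPath("/proc", pid)
		fmt.Println(p, err)
	}
}
```

A collector using a guard like this could skip stopped containers for the current cycle instead of failing the whole task.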
