
agentless-system-crawler's People

Contributors

basavaraju-g, brackendawson, canturkisci, cathyyoung, duglin, fabolive, gearoidibm, hitomitak, huang195, kudva, mattshaw92, mhbauer, mirskifa, nadgowdas, paulaldridge, peacocb, ricarkol, sahilsuneja1, sastryduri, stefanberger, tatsuhirochiba


agentless-system-crawler's Issues

crawler dies with an exception

Description

The crawler died with the following exception. It is possible that the Kafka server went down or was otherwise unavailable. The crawler should log the failure but continue running without dying.

I noticed that the crawler retries up to max_emit_retries times to emit and then raises an exception, which leads to its termination. My suggestion is that after max_emit_retries it should abort sending the current message and wait for the next iteration, without dying. This approach reduces the maintenance associated with crawler restarts when there are intermittent network disruptions.
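A minimal sketch of what that change could look like in the emitter's publish path (hypothetical wrapper; EmitterEmitTimeout and max_emit_retries come from emitter.py as seen in the trace below):

import logging

logger = logging.getLogger('crawlutils')

def _publish_best_effort(self, url):
    # Hypothetical variant of _publish(): after max_emit_retries attempts,
    # drop the current frame and wait for the next iteration instead of
    # re-raising and killing the worker process.
    try:
        self._publish_to_kafka(url, self.max_emit_retries)
    except EmitterEmitTimeout:
        logger.error('Gave up emitting to %s after %d retries; skipping this '
                     'frame until the next iteration', url, self.max_emit_retries)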

How to Reproduce

Log Output

Process crawler-1:
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/local/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "crawler.py", line 61, in crawler_worker
    crawlutils.snapshot(**params)
  File "/crawler/crawlutils.py", line 324, in snapshot
    overwrite=overwrite
  File "/crawler/crawlutils.py", line 209, in snapshot_container
    ignore_exceptions=ignore_exceptions)
  File "/crawler/emitter.py", line 427, in __exit__
    self._publish(url)
  File "/crawler/emitter.py", line 403, in _publish
    self._publish_to_kafka(url, self.max_emit_retries)
  File "/crawler/emitter.py", line 373, in _publish_to_kafka
    raise e
EmitterEmitTimeout
Traceback (most recent call last):
  File "crawler.py", line 419, in <module>
    main()
  File "crawler.py", line 415, in main
    start_autonomous_crawler(args.numprocesses, args.logfile, params, options)
  File "crawler.py", line 116, in start_autonomous_crawler
    (pname, exitcode))
RuntimeError: crawler-1 terminated unexpectedly with errorcode 1
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
IOError: [Errno 32] Broken pipe

Debugging Commands Output

Output of docker version:

docker version
Client:
 Version:      1.12.0
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   8eab29e
 Built:        Thu Jul 28 22:00:36 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.12.0
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   8eab29e
 Built:        Thu Jul 28 22:00:36 2016
 OS/Arch:      linux/amd64

Output of docker info:

docker info
Containers: 101
 Running: 100
 Paused: 0
 Stopped: 1
Images: 14
Server Version: 1.12.0
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 250
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: host bridge overlay null
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: apparmor
Kernel Version: 3.16.0-76-generic
Operating System: Ubuntu 14.04.5 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 15.67 GiB
Name: crawltest.sl.cloud9.ibm.com
ID: 2AWA:7F3N:RFHD:QEKR:SGYP:JZLZ:VVQU:ORT6:CYLU:FNUZ:VBP7:Y2W5
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Username: sastryduri
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Insecure Registries:
 127.0.0.0/8

Output of python --version:

Python 2.7.6 (default, Jun 22 2015, 17:58:13)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.

Output of pip freeze:


UnboundLocalError in crawler if no volumes are mounted

Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/cloudsight/collector/crawler/crawler.py", line 62, in crawler_worker
    crawlutils.snapshot(**params)
  File "/opt/cloudsight/collector/crawler/crawlutils.py", line 495, in snapshot
    container.link_logfiles(options=options)
  File "/opt/cloudsight/collector/crawler/dockercontainer.py", line 134, in link_logfiles
    logs_list = self._get_logfiles_list(host_log_dir, options)
  File "/opt/cloudsight/collector/crawler/dockercontainer.py", line 365, in _get_logfiles_list
    for log_src, log_dest in src_dest:
UnboundLocalError: local variable 'src_dest' referenced before assignment
Traceback (most recent call last):
  File "/opt/cloudsight/collector/crawler/crawler.py", line 427, in <module>
    start_autonomous_crawler(args.numprocesses, args.logfile)
  File "/opt/cloudsight/collector/crawler/crawler.py", line 105, in start_autonomous_crawler
    (pname, exitcode))

travis trailing space error

I am getting a Travis build error when using the crawler as a submodule:

./crawler/agentless-system-crawler/.travis.yml
16:16 error trailing spaces (trailing-spaces)
33:1 error trailing spaces (trailing-spaces)

--pluginmode is broken

Description

The crawler's --pluginmode is not respected. The crawler uses the --features arguments instead of the plugins defined in the config.

How to Reproduce

Log Output

Debugging Commands Output

Output of docker version:

(paste your output here)

Output of docker info:

(paste your output here)

Output of python --version:

(paste your output here)

Output of pip freeze:

(paste your output here)

Temporary issue while moving to plugin model

Description

For features crawled by a plugin (e.g. 'os'), which are hence not actually part of the legacy features list, remove them from the feature list passed to _snapshot_single_frame() in crawlutils.py. Otherwise an error appears in crawler-0.log (although the correct output still shows up in the csv):

2016-09-16 20:03:45,470 crawler-0 ERROR 'os'
Traceback (most recent call last):
  File "/root/agentless-system-crawler/crawler/crawlutils.py", line 66, in _snapshot_single_frame
    for (key, val) in crawler.funcdict[feature]:
KeyError: 'os'

How to Reproduce

sudo python crawler/crawler.py --crawlmode OUTCONTAINER --url file://output/test.csv --features os,cpu --logfile output/crawler.log

Log Output

2016-09-16 20:03:45,470 crawler-0 ERROR 'os'
Traceback (most recent call last):
  File "/root/agentless-system-crawler/crawler/crawlutils.py", line 66, in _snapshot_single_frame
    for (key, val) in crawler.funcdict[feature]:
KeyError: 'os'

Debugging Commands Output

Output of docker version:

(paste your output here)

Output of docker info:

(paste your output here)

Output of python --version:

(paste your output here)

Output of pip freeze:

(paste your output here)

Add FeaturesCrawler unit tests for out of vm mode

Description

Add unit tests for OUTVM mode in features_crawler.py.

How to Reproduce

make unit should run the OUTVM unit tests.

Log Output

N/A

Debugging Commands Output

make unit
N/A

Output of docker info:
N/A

Output of python --version:
2.7

Output of pip freeze:
The important point here is that these unit tests should work even if the psvmi package is not installed.
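One way to satisfy that constraint is to stub out psvmi before the crawler modules are imported. A minimal conftest.py-style sketch (the psvmi module name is real; the stubbing itself is hypothetical test scaffolding):

import sys
import mock

# Stub psvmi before any crawler module imports it, so the OUTVM unit tests
# can run on machines where the psvmi package is not installed.
if 'psvmi' not in sys.modules:
    sys.modules['psvmi'] = mock.MagicMock()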

"Final" code cleanup

Let's start by making pep8 happy.

  • pep8

Then, let's make it flake8 happy (max_complexity=10). There are some "is too complex" errors we are getting. Let's keep track of all of them here:

  • emitter.py
  • dockerutils.py
  • features_crawler.py
  • dockercontainer.py
  • containers.py
  • crawler.py
  • crawlutils.py

Make pylint happy

  • xxx divide this accordingly (pylint hates this codebase "Your code has been rated at 5.53/10")

After this, it would be great to be able to run in python3:

  • python3

Then, it would be awesome to get rid of our worker processes in favor of using python3 asyncio.

  • python3 asyncio and get rid of worker processes

And finally, let's complete the documentation:

  • documentation

aufs rootfs discovery error for crawler in container

When we run the crawler inside a container, it fails with the new mount path discovery method in dockerutils.

  File "/crawler/dockercontainer.py", line 109, in __init__
    self.root_fs = get_docker_container_rootfs_path(self.long_id)
  File "/crawler/dockerutils.py", line 306, in get_docker_container_rootfs_path
    raise DockerutilsException('Failed to get rootfs on aufs')
DockerutilsException: Failed to get rootfs on aufs

This occurs because we now need /var/lib/docker mounted inside the container for rootfs discovery.

Solution:
Update the README with the new volume mount instruction:

docker run \
                --privileged \
                --net=host \
                --pid=host \
                --name agentless-crawler \
                -v /cgroup:/cgroup:ro \
                -v /sys/fs/cgroup:/sys/fs/cgroup:ro \
                -v /var/lib/docker:/var/lib/docker:ro \
                -v /var/run/docker.sock:/var/run/docker.sock \
                -it -d crawler ...

make crawler (docker) event driven

Problem:

In the current system, the crawler polls periodically to discover whether any new container has been created or an existing container has been deleted. If a container is created and deleted within a single polling period, the crawler will not be able to snapshot it.

Proposed Solution:

Make the crawler event-driven, or more precisely interruptible: register for docker events (externally, via the docker client) and, when a new container is created, interrupt the crawler so it snapshots that container immediately.
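A minimal sketch of the event loop this implies, assuming docker-py's Client.events() generator (docker-py is already a crawler dependency); snapshot_new_container and forget_container are hypothetical hooks into the existing snapshot logic:

import json
import docker

client = docker.Client(base_url='unix://var/run/docker.sock')

# events() blocks and yields one JSON document per docker event.
for raw_event in client.events():
    event = json.loads(raw_event)
    if event.get('status') == 'start':
        # interrupt the polling loop and snapshot the new container right away
        snapshot_new_container(event.get('id'))   # hypothetical hook
    elif event.get('status') == 'die':
        forget_container(event.get('id'))         # hypothetical hook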

The crawl_os() feature is failing for alpine containers

This is what's happening:

  1. crawl_os() changes the namespaces (pid, mnt) to the one of the container
  2. it runs platform.platform() to get more info about the system
  3. platform.platform() forks somewhere in the middle (why?)
  4. that new forked process needs /usr/bin/python, but python is not installed by default on alpine
  5. the exception prevents the whole os feature from being collected

We should catch the exception on platform.platform() and continue with the os feature.
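A minimal sketch of that guard (illustrative only; the real crawl_os() lives in features_crawler.py and collects more than this):

import platform

def safe_platform():
    # Return platform.platform(), or a fallback if the call fails, e.g. when
    # its forked helper cannot find /usr/bin/python inside an alpine container.
    try:
        return platform.platform()
    except Exception:
        return 'unknown'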

github template for issues and PRs

For Issues, I'm not sure what we need yet for troubleshooting:

  • container or directly running
  • version of python

The PR template needs to explain the DCO signing process.

Not all unit tests are passing on macOS

Description

Some unit tests, like test_misc, are not passing on macOS because they do Linux-specific things like open('/proc/%s/environ' % pid). These tests are missing some mocks (e.g. for open).

How to Reproduce

run make unit

Log Output

>       envlist = open('/proc/%s/environ' % pid).read().split('\000')
E       IOError: [Errno 2] No such file or directory: '/proc/82578/environ'
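A minimal sketch of the missing mock (hypothetical test code using the mock package already in the requirements; the exact module under test may patch open at a different path):

import mock

def test_process_environ_is_mocked():
    fake_environ = 'PATH=/usr/bin\x00HOME=/root\x00'
    # Patch the built-in open so the test no longer needs a Linux /proc.
    with mock.patch('__builtin__.open',
                    mock.mock_open(read_data=fake_environ)):
        envlist = open('/proc/1234/environ').read().split('\x00')
    assert 'PATH=/usr/bin' in envlist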

Debugging Commands Output

Output of docker version:


Output of docker info:

N/A

Output of python --version:

N/A

Output of pip freeze:

astroid==1.4.7
autopep8==1.2.4
awsebcli==3.7.7
backports.functools-lru-cache==1.2.1
backports.ssl-match-hostname==3.5.0.1
blessed==1.9.5
botocore==1.4.37
cement==2.8.2
colorama==0.3.7
configparser==3.5.0
coverage==4.1
dateutils==0.6.6
docker-py==1.8.1
dockerpty==0.4.1
docopt==0.6.2
docutils==0.12
flake8==2.6.2
funcsigs==1.0.2
ipaddress==1.0.16
isort==4.2.5
jmespath==0.9.0
kafka-python==0.9.2
kazoo==2.2.1
lazy-object-proxy==1.2.2
mccabe==0.5.0
mock==2.0.0
netifaces==0.10.4
pathspec==0.3.4
pbr==1.10.0
pep8==1.7.0
psutil==2.1.3
py==1.4.31
pycodestyle==2.0.0
pyflakes==1.2.3
pykafka==1.1.0
pylint==1.6.2
pytest==2.9.2
pytest-cov==2.3.0
python-dateutil==2.5.3
PyYAML==3.11
requests==2.5.0
semantic-version==2.5.0
simplejson==3.8.2
six==1.10.0
stevedore==1.16.0
tabulate==0.7.5
texttable==0.8.4
vboxapi==1.0
virtualenv==15.0.2
virtualenv-clone==0.2.6
virtualenvwrapper==4.7.1
wcwidth==0.1.7
websocket-client==0.37.0
wrapt==1.10.8
Yapsy==1.11.223

memory leaks when emitting to kafka

Description

We observed memory leaks in worker processes when emitting to Kafka.

How to Reproduce

Configure the crawler to emit to Kafka and monitor the worker process size.

@ricarkol and Sastry Duri are working on this.
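A quick way to watch for the leak, as a small sketch using psutil (already a crawler dependency); worker_pid stands for the PID of one crawler worker process:

import time
import psutil

worker_pid = 12345  # replace with the PID of a crawler worker process
proc = psutil.Process(worker_pid)

# Sample the resident set size once a minute while the crawler emits to Kafka.
while True:
    rss_mib = proc.memory_info().rss / (1024.0 * 1024.0)
    print('worker rss: %.1f MiB' % rss_mib)
    time.sleep(60)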

Add docker utils get container ids method

Description

Add methods that return a list of all container IDs without touching the docker socket.

How to Reproduce

N/A

Log Output

N/A

Debugging Commands Output

N/A

Will submit pull request.
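A minimal sketch of one way to do this, scanning /proc/*/cgroup for docker container IDs instead of querying the daemon (the cgroup path layout is an assumption; the actual pull request may take a different approach):

import os
import re

# 64-hex-char container IDs as they appear in cgroup paths such as
# /docker/<id> or /system.slice/docker-<id>.scope (layout is an assumption).
DOCKER_CGROUP_RE = re.compile(r'docker[/-]([0-9a-f]{64})')

def list_docker_container_ids(proc_dir='/proc'):
    # Return the set of docker container IDs visible in /proc, without
    # touching the docker socket.
    ids = set()
    for pid in os.listdir(proc_dir):
        if not pid.isdigit():
            continue
        try:
            with open(os.path.join(proc_dir, pid, 'cgroup')) as f:
                for line in f:
                    match = DOCKER_CGROUP_RE.search(line)
                    if match:
                        ids.add(match.group(1))
        except (IOError, OSError):
            continue  # the process may have exited in the meantime
    return ids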

Crawler crashing in process_is_crawler(pid) when monitored process disappears

Description

The crawler in OUTCONTAINER mode collects metrics for containers. A container is defined as any process tree whose parent process has a pid namespace different from the init process pid namespace. By that definition, the crawler process itself can be seen as a container when it setns into a container. To make sure we don't crawl the crawler, the crawler performs a check: def process_is_crawler(pid). The issue with this call is that there is a race where the monitored process can disappear in the middle of the call. That situation shouldn't crash the crawler.
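A minimal sketch of how that call could be hardened (a hypothetical variant of misc.process_is_crawler(); the exact command-line check the real function performs may differ):

import psutil

def process_is_crawler(pid, crawler_cmd='crawler.py'):
    # Best-effort check: a process that vanishes mid-call is simply treated
    # as "not the crawler" instead of crashing the worker.
    try:
        proc = psutil.Process(pid)
        cmdline = (proc.cmdline() if hasattr(proc.cmdline, '__call__')
                   else proc.cmdline)
        return any(crawler_cmd in arg for arg in cmdline)
    except (psutil.NoSuchProcess, psutil.AccessDenied):
        return False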

How to Reproduce

Run the crawler in OUTCONTAINER mode for an hour, while monitoring ~100 containers.

Log Output

Process crawler-0:
Traceback (most recent call last):
 File "/usr/local/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
   self.run()
 File "/usr/local/lib/python2.7/multiprocessing/process.py", line 114, in run
   self._target(*self._args, **self._kwargs)
 File "crawler.py", line 62, in crawler_worker
   crawlutils.snapshot(**params)
 File "/crawler/crawlutils.py", line 486, in snapshot
   namespace)
 File "/crawler/containers.py", line 111, in get_filtered_list_of_containers
   for container in containers_list:
 File "/crawler/containers.py", line 57, in list_all_containers
   if misc.process_is_crawler(pid):
 File "/crawler/misc.py", line 88, in process_is_crawler
   cmdline = (proc.cmdline() if hasattr(proc.cmdline, '__call__'
 File "/usr/local/lib/python2.7/site-packages/psutil/__init__.py", line 551, in cmdline
   return self._proc.cmdline()
 File "/usr/local/lib/python2.7/site-packages/psutil/_pslinux.py", line 707, in wrapper
   raise NoSuchProcess(self.pid, self._name)
NoSuchProcess: process no longer exists (pid=6397)
Traceback (most recent call last):
 File "crawler.py", line 436, in <module>
   start_autonomous_crawler(args.numprocesses, args.logfile)
 File "crawler.py", line 113, in start_autonomous_crawler
   (pname, exitcode))
RuntimeError: crawler-0 terminated unexpectedly with errorcode 1
root@crawltest:~/exp6#

Debugging Commands Output

Output of docker version:

(paste your output here)

Output of docker info:

(paste your output here)

Output of python --version:

(paste your output here)

Output of pip freeze:

(paste your output here)

Our code is assuming that the docker daemon will use `/var/lib/docker` as the docker dir

Description

Our code is assuming that the docker daemon will use /var/lib/docker as the docker dir

How to Reproduce

Reinstall the docker daemon and point it to a non-default docker dir, with DOCKER_OPTS="-g /somewhere/else/docker/".
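A minimal sketch of how the docker dir could be discovered instead of hard-coded (docker-py is already a dependency; the helper name and fallback are illustrative):

import docker

def get_docker_root_dir(default='/var/lib/docker'):
    # Ask the daemon where its root dir is ('DockerRootDir' in `docker info`),
    # falling back to the conventional default if the query fails.
    try:
        client = docker.Client(base_url='unix://var/run/docker.sock')
        return client.info().get('DockerRootDir', default)
    except Exception:
        return default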

Log Output

Debugging Commands Output

Output of docker version:

(paste your output here)

Output of docker info:

(paste your output here)

Output of python --version:

(paste your output here)

Output of pip freeze:

(paste your output here)

Would you consider a Python 3 version that leverages asyncio?

Is the crawler mostly waiting on I/O, or is it CPU bound?

If the former, then asyncio might greatly simplify the code compared to the current multiprocessing approach, which is generally better suited to CPU-bound workloads. Python 2 support expires 3.5 years from now, so porting ASC to Python 3 would also help its longevity.

create emitter plugins

Description

How to Reproduce

Log Output

Debugging Commands Output

Output of docker version:

(paste your output here)

Output of docker info:

(paste your output here)

Output of python --version:

(paste your output here)

Output of pip freeze:

(paste your output here)

Get rid of platform_outofband for getting os_info

We are currently using a modified Python platform module to get OS info such as the Linux distribution, version, architecture, kernel version, etc. This module is too unreliable: it currently fails to distinguish Ubuntu from Alpine, or CentOS from Red Hat.
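One possible replacement is to read /etc/os-release directly from the target root filesystem. A minimal sketch (field names follow the standard os-release format; the returned keys are illustrative):

import os

def get_os_info(rootfs='/'):
    # Read the distribution ID and version from <rootfs>/etc/os-release instead
    # of relying on platform.platform(); recent Ubuntu, Alpine, CentOS and
    # Red Hat releases all ship this file.
    info = {}
    path = os.path.join(rootfs, 'etc/os-release')
    try:
        with open(path) as f:
            for line in f:
                line = line.strip()
                if line and not line.startswith('#') and '=' in line:
                    key, _, value = line.partition('=')
                    info[key] = value.strip('"')
    except (IOError, OSError):
        pass
    return {'os': info.get('ID', 'unknown'),
            'version': info.get('VERSION_ID', 'unknown')}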

Perf improvement for DockerContainer objects

Currently, at each crawl iteration we instantiate a new DockerContainer object for each container, irrespective of whether the container is actually new. The issue is that some plugins do a lot of work during DockerContainer object initialization, so a lot of work is wasted at every iteration. We should stop doing that and only instantiate container objects when the container is actually new.
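A minimal sketch of the kind of caching this implies (hypothetical helpers; the real fix would live wherever the per-iteration container list is built):

# Cache DockerContainer objects by long_id so expensive plugin initialization
# only happens when a container is actually new.
_container_cache = {}

def get_or_create_container(long_id, factory):
    # factory is the existing DockerContainer constructor (or equivalent).
    container = _container_cache.get(long_id)
    if container is None:
        container = factory(long_id)
        _container_cache[long_id] = container
    return container

def prune_dead_containers(live_ids):
    # Drop cached objects for containers that no longer exist.
    for long_id in list(_container_cache):
        if long_id not in live_ids:
            del _container_cache[long_id]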

Make it easier to visualize collected data

It would be awesome to have something like this:

make start-crawler-and-elk # or ./crawl.sh all or ./launch.sh all

Or something similar that starts a crawler for containers (in a container) and an ELK stack in a container (use https://github.com/cloudviz/crawler-elk-stack) receiving data from that crawler. The output of make start-crawler-and-elk should be a URL pointing to the pretty data (some Kibana dashboard).

exception trace from crawler running in periodic mode when interrupted

^CProcess crawler-0:
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
Traceback (most recent call last):
  File "crawler.py", line 427, in <module>
    self.run()
  File "/usr/local/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "crawler.py", line 62, in crawler_worker
    start_autonomous_crawler(args.numprocesses, args.logfile)
  File "crawler.py", line 112, in start_autonomous_crawler
    crawlutils.snapshot(**params)

A crawler instance was started with frequency 1. When this crawler was terminated with Ctrl-C it threw an exception trace. It should exit more gracefully.

bash crawl.sh containers --features=os,cpu,interface,connection --frequency 1

File "/crawler/crawlutils.py", line 557, in snapshot
time.sleep(0.1)
time.sleep(time_to_sleep)
KeyboardInterrupt
KeyboardInterrupt
Process crawler-1:
Traceback (most recent call last):
File "/usr/local/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/local/lib/python2.7/multiprocessing/process.py", line 114, in run
self._target(_self._args, *_self._kwargs)
File "crawler.py", line 62, in crawler_worker
crawlutils.snapshot(**params)
File "/crawler/crawlutils.py", line 557, in snapshot
time.sleep(time_to_sleep)
KeyboardInterrupt

reading boolean values with configObj

Description

ConfigObj may not be reading boolean values properly; see the manual 'as_bool()' conversion in plugins_manager.py: get_plugin_args().

Two options:

  1. [FAILED] Pass unrepr=True as an argument to ConfigObj() in config_parser.py: parse_crawler_config().
  2. [HAVEN'T TRIED] Declare avoid_setns = boolean() for each plugin in config_spec_and_defaults.conf (see the sketch below).
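For option 2, a minimal sketch of how the configspec route would work (the section and key names mirror the files mentioned above but are illustrative):

from configobj import ConfigObj
from validate import Validator

# Hypothetical excerpt of config_spec_and_defaults.conf:
#   [ crawlers ]
#     [[ os_container ]]
#       avoid_setns = boolean(default=False)

config = ConfigObj('crawler.conf', configspec='config_spec_and_defaults.conf')
config.validate(Validator())

# After validation the value is a real bool, so no manual as_bool() is needed:
avoid_setns = config['crawlers']['os_container']['avoid_setns']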

How to Reproduce

Log Output

Debugging Commands Output

Output of docker version:

(paste your output here)

Output of docker info:

(paste your output here)

Output of python --version:

(paste your output here)

Output of pip freeze:

(paste your output here)

crawler cannot crawl containers created after crawler with aufs storage backend

Description

When deployed as a docker container, the crawler cannot crawl containers created after the crawler container if the storage backend is aufs. This is not necessarily a crawler problem as such; we think the problem lies with the aufs driver, but I am not sure.

How to Reproduce

btrfs storage backend -- docker version: 1.10.2 (ubuntu 14.04)

docker run --privileged --pid=host --net=host -v /:/hostroot:ro -it ubuntu:latest /bin/bash
[consider the docker container just created as the crawler container.]
[in crawler]
cd /hostroot/var/lib/docker/btrfs/subvolumes
ls -lrt 
[the last directory points to rootfs of the crawler container.]

[in host, create another container. This represents crawled container.]
docker run -it ubuntu bash

[in the crawler container]
ls -lrt
[the last directory represents the root of the crawled container; if you do ls you will see its contents]

ls 3e812e171d964b3ea81cfd92b85afcf459286e5e0dc07e71beba4223bef33649
bin   dev  home  lib64  mnt  proc  run   srv  tmp  var
boot  etc  lib   media  opt  root  sbin  sys  usr

aufs storage backend docker: 1.12.0 (ubuntu 14.04)

docker run --privileged --pid=host --net=host -v /:/hostroot:ro -it ubuntu:latest /bin/bash
[consider the docker container just created as the crawler container.]
[in crawler]
cd /hostroot/var/lib/docker/aufs/mnt
ls -lrt
[last directory shows the root of crawler container.]

[in host]
docker run -it ubuntu bash

[in crawler container]
ls -lrt
ls last-dir (shows nothing)

[copy the directory of the crawler container, and exit the crawler container.]
[while keeping the crawled container active, create another instance of crawler container.]
docker run --privileged --pid=host --net=host -v /:/hostroot:ro -it ubuntu:latest /bin/bash
[consider the docker container just created as the crawler container.]
[in crawler]
cd /hostroot/var/lib/docker/aufs/mnt
ls last-dir (now shows contents of root directory)

With the aufs storage backend we observed the same behavior with docker 1.10.2 as well.

make test fails on ubuntu 15.10

Running on ubuntu 15.10 the tests fail for me:

$ make test
docker build -t agentless-system-crawler-test -f Dockerfile.test .
Sending build context to Docker daemon 8.054 MB
Step 1 : FROM alpine:3.3
 ---> 184352182c50
Step 2 : RUN apk --no-cache add --repository http://dl-cdn.alpinelinux.org/alpine/edge/main libseccomp
 ---> Using cache
 ---> 3bf3b1051c6e
Step 3 : RUN apk --no-cache add --repository http://dl-cdn.alpinelinux.org/alpine/edge/community docker
 ---> Using cache
 ---> e7ecba0f96ff
Step 4 : RUN apk add --update     python     python-dev     py-pip     build-base     linux-headers     docker   && pip install --upgrade pip   && rm -rf /var/cache/apk/*
 ---> Using cache
 ---> f68df2385f16
Step 5 : COPY crawler/requirements.txt /requirements.txt
 ---> Using cache
 ---> 2210b776c254
Step 6 : COPY tests/requirements.txt /requirements-test.txt
 ---> Using cache
 ---> 3e31cb3f8039
Step 7 : RUN pip install -r requirements.txt && pip install -r requirements-test.txt
 ---> Using cache
 ---> a986b7cb8752
Step 8 : WORKDIR /crawler
 ---> Using cache
 ---> 64b667dc974d
Step 9 : COPY . /crawler
 ---> 397907bd286a
Removing intermediate container 625b2046cdf7
Step 10 : CMD (docker daemon > ../docker.out 2>&1 &) &&     sleep 5 &&  py.test
 ---> Running in cef39af7de40
 ---> 7ce55e8adb7c
Removing intermediate container cef39af7de40
Successfully built 7ce55e8adb7c
docker run --privileged -ti --rm agentless-system-crawler-test
============================= test session starts ==============================
platform linux2 -- Python 2.7.11, pytest-2.9.2, py-1.4.31, pluggy-0.3.1
rootdir: /crawler, inifile: 
collected 16 items 

tests/misc_test.py .
tests/test_crawler_container.py F
tests/test_dockerutils.py ...
tests/test_emitter.py ....
tests/test_features_crawler.py ..F.
tests/test_namespace.py ...

=================================== FAILURES ===================================
___________________ SingleContainerTests.testCrawlContainer ____________________

self = <tests.test_crawler_container.SingleContainerTests testMethod=testCrawlContainer>

    def testCrawlContainer(self):
        env = os.environ.copy()
        mypath = os.path.dirname(os.path.realpath(__file__))
        os.makedirs(self.tempd + '/out')

        #crawler itself needs to be root
        process = subprocess.Popen(
            [
                '/usr/bin/python2.7', mypath + '/../crawler/crawler.py',
                '--url', 'file://' + self.tempd + '/out/crawler',
                '--features', 'cpu,memory,interface',
                '--crawlContainers', 'ALL',
                '--format', 'graphite',
                '--crawlmode', 'OUTCONTAINER',
                '--numprocesses', '1'
            ],
            env=env)
        stdout, stderr = process.communicate()
        assert process.returncode == 0

        print stderr
        print stdout

        subprocess.call(['/bin/chmod', '-R', '777', self.tempd])

        files = os.listdir(self.tempd + '/out')
        assert len(files) == 1

        f = open(self.tempd + '/out/' + files[0], 'r')
        output = f.read()
        print output # only printed if the test fails
        assert 'interface-lo.if_octets.tx' in output
>       assert 'cpu-0.cpu-idle' in output
E       AssertionError: assert 'cpu-0.cpu-idle' in '172.17.0.2.focused_boyd.memory.memory-used 49152.000000 1466188061\r\n172.17.0.2.focused_boyd.memory.memory-buffered ...e-eth0.if_errors.tx 0.000000 1466188062\r\n172.17.0.2.focused_boyd.interface-eth0.if_errors.rx 0.000000 1466188062\r\n'

tests/test_crawler_container.py:66: AssertionError
----------------------------- Captured stdout call -----------------------------
None
None
172.17.0.2.focused_boyd.memory.memory-used 49152.000000 1466188061
172.17.0.2.focused_boyd.memory.memory-buffered 4096.000000 1466188061
172.17.0.2.focused_boyd.memory.memory-cached 4096.000000 1466188061
172.17.0.2.focused_boyd.memory.memory-free 475041792.000000 1466188061
172.17.0.2.focused_boyd.interface-lo.if_octets.tx 0.000000 1466188062
172.17.0.2.focused_boyd.interface-lo.if_octets.rx 0.000000 1466188062
172.17.0.2.focused_boyd.interface-lo.if_packets.tx 0.000000 1466188062
172.17.0.2.focused_boyd.interface-lo.if_packets.rx 0.000000 1466188062
172.17.0.2.focused_boyd.interface-lo.if_errors.tx 0.000000 1466188062
172.17.0.2.focused_boyd.interface-lo.if_errors.rx 0.000000 1466188062
172.17.0.2.focused_boyd.interface-eth0.if_octets.tx 0.000000 1466188062
172.17.0.2.focused_boyd.interface-eth0.if_octets.rx 0.000000 1466188062
172.17.0.2.focused_boyd.interface-eth0.if_packets.tx 0.000000 1466188062
172.17.0.2.focused_boyd.interface-eth0.if_packets.rx 0.000000 1466188062
172.17.0.2.focused_boyd.interface-eth0.if_errors.tx 0.000000 1466188062
172.17.0.2.focused_boyd.interface-eth0.if_errors.rx 0.000000 1466188062

______ FeaturesCrawlerTests.test_features_crawler_crawl_outcontainer_cpu _______

self = <tests.test_features_crawler.FeaturesCrawlerTests testMethod=test_features_crawler_crawl_outcontainer_cpu>

    def test_features_crawler_crawl_outcontainer_cpu(self):
        c = DockerContainer(self.container['Id'])
        crawler = FeaturesCrawler(crawl_mode='OUTCONTAINER', container=c)
>       for key, feature in crawler.crawl_cpu():

tests/test_features_crawler.py:55: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <crawler.features_crawler.FeaturesCrawler instance at 0x7f94dfc026c8>
per_cpu = False

    def crawl_cpu(self, per_cpu=False):

        logger.debug('Crawling cpu information')

        if self.crawl_mode not in [
                Modes.INVM,
                Modes.OUTCONTAINER,
                Modes.OUTVM]:
            logger.error('Unsupported crawl mode: ' + self.crawl_mode +
                         '. Returning unknown memory key and attributes.'
                         )
            feature_attributes = CpuFeature(
                'unknown',
                'unknown',
                'unknown',
                'unknown',
                'unknown',
                'unknown',
                'unknown',
                'unknown',
            )

        host_cpu_feature = {}
        if self.crawl_mode in [Modes.INVM, Modes.OUTCONTAINER]:
            for (index, cpu) in \
                    enumerate(psutil.cpu_times_percent(percpu=True)):

                try:
                    idle = cpu.idle
                except Exception as e:
                    idle = 'unknown'
                try:
                    nice = cpu.nice
                except Exception as e:
                    nice = 'unknown'
                try:
                    user = cpu.user
                except Exception as e:
                    user = 'unknown'
                try:
                    wait = cpu.iowait
                except Exception as e:
                    wait = 'unknown'
                try:
                    system = cpu.system
                except Exception as e:
                    system = 'unknown'
                try:
                    interrupt = cpu.irq
                except Exception as e:
                    interrupt = 'unknown'
                try:
                    steal = cpu.steal
                except Exception as e:
                    steal = 'unknown'

                used = 100 - int(idle)

                feature_key = '{0}-{1}'.format('cpu', index)
                feature_attributes = CpuFeature(
                    idle,
                    nice,
                    user,
                    wait,
                    system,
                    interrupt,
                    steal,
                    used,
                )
                host_cpu_feature[index] = feature_attributes
                if self.crawl_mode == Modes.INVM:
                    try:
                        yield (feature_key, feature_attributes)
                    except Exception as e:
                        logger.error('Error crawling cpu information',
                                     exc_info=True)
                        raise CrawlError(e)

        if self.crawl_mode == Modes.OUTCONTAINER:

            if per_cpu:
                stat_file_name = 'cpuacct.usage_percpu'
            else:
                stat_file_name = 'cpuacct.usage'

            container = self.container

            try:
                (cpu_usage_t1, prev_time) = \
                    self._get_prev_container_cpu_times(container.long_id)

                if cpu_usage_t1:
                    logger.debug('Using previous cpu times for container %s'
                                 % container.long_id)
                    interval = time.time() - prev_time

                if not cpu_usage_t1 or interval == 0:
                    logger.debug(
                        'There are no previous cpu times for container %s '
                        'so we will be sleeping for 100 milliseconds' %
                        container.long_id)

                    with open(container.get_cpu_cgroup_path(stat_file_name),
                              'r') as f:
                        cpu_usage_t1 = f.readline().strip().split(' ')
                    interval = 0.1  # sleep for 100ms
                    time.sleep(interval)

                with open(container.get_cpu_cgroup_path(stat_file_name),
                          'r') as f:
                    cpu_usage_t2 = f.readline().strip().split(' ')

                # Store the cpu times for the next crawl

                self._save_container_cpu_times(container.long_id,
                                               cpu_usage_t2)
            except Exception as e:
                logger.error('Error crawling cpu information',
                             exc_info=True)
>               raise CrawlError(e)
E               CrawlError: [Errno 2] No such file or directory: u'docker/4e6e72c1dcc7ce36fd9cb4a7b703de42bce5d1cadece6d72bb9a271ef9943d2a/cpuacct.usage'

crawler/features_crawler.py:1125: CrawlError
----------------------------- Captured stderr call -----------------------------
No handlers could be found for logger "crawlutils"
==================== 2 failed, 14 passed in 152.71 seconds =====================
Makefile:9: recipe for target 'test' failed
make: *** [test] Error 1

Could this be due to me not having something set up properly?

cleanup config_spec_xxx.conf

Description

How to Reproduce

Log Output

Debugging Commands Output

Output of docker version:

(paste your output here)

Output of docker info:

(paste your output here)

Output of python --version:

(paste your output here)

Output of pip freeze:

(paste your output here)

'Make test' with alpine containers segfaults in musl libc with psvmi C <--> python interaction

Description

When the agentless crawler is run in OUTVM mode, 'make test' segfaults for alpine containers. GDB says the segfault is in alpine's libc (musl). In the recent commit, the container OS has been changed to Ubuntu, which does not show this behaviour. Most probably we are not managing memory properly in the interaction between psvmi's C core code and the python wrapper that the agentless crawler talks to.

How to Reproduce

make test (after changing FROM field to Alpine in Dockerfile.test)

Log Output

Debugging Commands Output

Output of docker version:

(paste your output here)

Output of docker info:

(paste your output here)

Output of python --version:

2.7

Output of pip freeze:

(paste your output here)

Crawler log linking is not working with pathname pattern expansion

Description

How to Reproduce

Use the --linkLogFiles option with --crawlMode OUTCONTAINER and specify the list of log files as /var/log/*. You should get links for all files under /var/log/, but that is not happening.

This was broken by commit fb4a230.
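For reference, a minimal sketch of the expansion the option is expected to perform (glob over the container's root filesystem; the function and argument names are illustrative, not the crawler's actual internals):

import glob
import os

def expand_log_patterns(rootfs, patterns):
    # Expand shell-style patterns such as /var/log/* against a container rootfs
    # and return the matching regular files.
    logs = []
    for pattern in patterns:
        anchored = os.path.join(rootfs, pattern.lstrip('/'))
        for path in glob.glob(anchored):
            if os.path.isfile(path):
                logs.append(path)
    return logs

# e.g. expand_log_patterns('/var/lib/docker/aufs/mnt/<container-id>', ['/var/log/*'])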

Log Output

We are getting useless info logs (creating an issue to improve that).

Debugging Commands Output

Docker version 1.7.1, build 786b29d

Output of docker info:

Containers: 54
Images: 1351
Storage Driver: devicemapper
 Pool Name: docker-253:2-29492713-pool
 Pool Blocksize: 65.54 kB
 Backing Filesystem: extfs
 Data file: /dev/loop0
 Metadata file: /dev/loop1
 Data Space Used: 82.84 GB
 Data Space Total: 107.4 GB
 Data Space Available: 24.53 GB
 Metadata Space Used: 89.16 MB
 Metadata Space Total: 2.147 GB
 Metadata Space Available: 2.058 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Data loop file: /var/lib/docker/devicemapper/devicemapper/data
 Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
 Library Version: 1.02.95-RHEL6 (2015-09-08)
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 3.10.94-1.el6.elrepo.x86_64
Operating System: <unknown>
CPUs: 4
Total Memory: 7.566 GiB
Name: oc1723680601.ibm.com
ID: EWRJ:NWQL:W4HL:PJUB:2IPZ:FQFP:JJXD:YZ5C:NW3S:UC5O:XNCL:I3IF
Username: kollerr
Registry: https://index.docker.io/v1/
WARNING: No swap limit support

Output of python --version:

Python 2.7.7

Output of pip freeze:

-psvmi==0.1
Babel==2.1.1
Jinja2==2.8
Markdown==2.6.4
MarkupSafe==0.23
Orange==2.7
PsVmi==1.0
PyHamcrest==1.8.1
PyQtX==0.1.2
Pycco==0.3.1
Pygments==2.0.2
Sphinx==1.3.1
Yapsy==1.11.223
alabaster==0.7.6
appdirs==1.4.0
argparse==1.3.0
astroid==1.4.7
autopep8==1.2.1
backports.functools-lru-cache==1.2.1
backports.ssl-match-hostname==3.5.0.1
baron==0.3.1
bottle==0.12.7
bz2file==0.98
colorama==0.3.1
configparser==3.3.0r2
coverage==4.1
cs==1.1.8
docker-py==1.8.1
docutils==0.12
doublex==1.8.2
flake8==2.6.2
funcsigs==1.0.2
icecli==2.0
ipaddress==1.0.16
isort==4.2.5
kafka-python==0.9.2
kazoo==2.2.1
lazy-object-proxy==1.2.2
matplotlib==1.4.3
mccabe==0.5.0
mock==2.0.0
netifaces==0.10.4
nose==1.3.6
numpy==1.9.2
pandas==0.17.1
pbr==1.10.0
pep8==1.6.2
pep8ify==0.0.13
psutil==2.1.3
py==1.4.26
py2-ipaddress==3.4.1
pyDoubles==1.8.1
pycodestyle==2.0.0
pyflakes==1.2.3
pyfmt==0.1
pykafka==1.1.0
pylint==1.6.2
pyparsing==2.0.3
pyrpm==0.3
pyrpm-02strich==0.5.5
pystache==0.5.4
pytest==2.6.4
pytest-cov==2.3.0
python-dateutil==2.4.2
python-qt==0.50
pytz==2015.7
requests==2.5.0
rply==0.7.4
seaborn==0.7.0
semantic-version==2.5.0
simplejson==3.8.2
six==1.10.0
smartypants==1.8.6
snowballstemmer==1.2.0
sphinx-rtd-theme==0.1.9
tabulate==0.7.5
websocket-client==0.37.0
wordgrapher==0.3.1
wrapt==1.10.8
wsgiref==0.1.2

enabled=True|False in crawler.conf

Description

Right now, the mere mention of a plugin name inside config.conf marks the plugin as enabled. It might make sense to add an 'enabled=True|False' field, so that the user doesn't have to keep removing and re-adding plugins and their associated fields on separate runs.
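A small sketch of what this could look like in crawler.conf (hypothetical section and plugin names; the point is that a plugin's settings stay in place while the flag toggles it):

[ crawlers ]
    [[ os_container ]]
        enabled = True
        avoid_setns = False
    [[ package_container ]]
        # keep the settings, just turn the plugin off for this run
        enabled = False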

How to Reproduce

Log Output

Debugging Commands Output

Output of docker version:

(paste your output here)

Output of docker info:

(paste your output here)

Output of python --version:

(paste your output here)

Output of pip freeze:

(paste your output here)

Function redefinition in tests/unit/test_crawlutils.py

Description

test_snapshot_mesos() is redefined in test_crawlutils.py.

How to Reproduce

Log Output

Debugging Commands Output

Output of docker version:

(paste your output here)

Output of docker info:

(paste your output here)

Output of python --version:

(paste your output here)

Output of pip freeze:

(paste your output here)

The `--linkContainerlogs` arg is not working when running inside a container.

Description

The --linkContainerlogs arg is not working when the crawler is running inside a container.

How to Reproduce

Log Output

Debugging Commands Output

Output of docker version:

(paste your output here)

Output of docker info:

(paste your output here)

Output of python --version:

(paste your output here)

Output of pip freeze:

(paste your output here)

Crawling docker containers with docker v. 1.7 is failing

The issue is that we expect the docker inspect output to have the Mounts field, but in previous docker versions it was called Volumes. Also, Mounts is a list like [{'Source': '/a', 'Destination': '/b'}, {'Source': '/c', 'Destination': '/d'}], while Volumes was just a flat dictionary like {'/a': '/b', '/c': '/d'}.

docker 1.7.1:

 $ docker run -it -v /tmp:/host_tmp/ alpine sh
...
    "Volumes": {
        "/host_tmp": "/tmp"
    },
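A minimal sketch of the normalization the crawler needs (illustrative helper; note that in the 1.7 output above the Volumes keys are the in-container destinations and the values are the host sources):

def get_container_mounts(inspect_data):
    # Return a list of (source, destination) pairs from `docker inspect`,
    # handling both the newer 'Mounts' list and docker 1.7's 'Volumes' dict.
    mounts = inspect_data.get('Mounts')
    if mounts is not None:
        return [(m.get('Source'), m.get('Destination')) for m in mounts]
    volumes = inspect_data.get('Volumes') or {}
    return [(source, destination) for destination, source in volumes.items()]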

Add tests for --pluginmode

Description

How to Reproduce

Log Output

Debugging Commands Output

Output of docker version:

(paste your output here)

Output of docker info:

(paste your output here)

Output of python --version:

(paste your output here)

Output of pip freeze:

(paste your output here)
