
python-diamond / diamond

Diamond is a python daemon that collects system metrics and publishes them to Graphite (and others). It is capable of collecting cpu, memory, network, i/o, load and disk metrics. Additionally, it features an API for implementing custom collectors for gathering metrics from almost any source.

Home Page: http://diamond.readthedocs.org/

License: MIT License

Languages: Python 98.96%, Shell 0.76%, Makefile 0.27%, Ruby 0.01%


diamond's Issues

Datadog - wrong timestamp sent

Here is the log from diamond.log

[2015-03-20 17:11:37,573] [MainThread] Sending.. topic[rax.iostat.xvda.write_requests_merged_per_second], value[2], timestamp[1426866011]

The time when the data is sent is 17:11:37, but the timestamp value corresponds to 16:40:11.
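Decoding the epoch value from that log line confirms the roughly 31-minute gap (a quick check, not Diamond code):

```python
from datetime import datetime, timezone

# The published epoch from the log is ~31 minutes older than the send time.
published = datetime.fromtimestamp(1426866011, tz=timezone.utc)
print(published.isoformat())  # 2015-03-20T15:40:11+00:00 (16:40:11 at UTC+1)
```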

dpkg --purge removes /lib/init/upstart-job

As the title says, purging the diamond deb removes /lib/init/upstart-job. I had originally raised this in the Brightcove repo before realizing that this is the active repo. I've just rebuilt a fresh deb package from this repository and the issue still seems to be present.

Can we get a new pip package? (4.0.41 is over 3 months old)

I installed the latest build from pip (4.0.41).

It contains a bug that is fixed in latest master. This is the commit: 023895b

Could a new pip package be pushed? Current pip is over 3 months old.

Thanks!


Also, pip 4.0.41 has a packaging bug of some kind. It contains this file:

/etc/diamond/collectors/CPUCollector - test2.conf

This stray file with spaces in the name is present in addition to the normal CPUCollector.conf file.

Can't install diamond RPM on CentOS 7 => Requires: python(abi) = 2.6

When I try to install the diamond RPM with yum install, I get the following error:
python(abi) = 2.6 is needed by diamond-4.0.128-0.noarch.rpm

running environment:
centos 7
python 2.7.5

Is there any way I can build the RPM and declare the requirement as python(abi) >= 2.6?

Installation via Pip overwrites config

I moved this issue from the old repository to here, as it still bugs me.

I prefer to install Diamond via pip install (-U) diamond. However, after I do this, my configuration in /etc/diamond is overwritten, meaning that I have to re-enable all collectors again.

Is there something I can do to prevent this (apart from making backups)?

(Ubuntu 14.10, system Python 2.7)

Welcome!

@python-diamond/devs This is the best way I know to contact you folks. I really feel that Diamond is only successful due to all of your contributions, and without a place to work together we can't achieve as much as we could. I've given you commit access and hope you will work directly in the code base.

I'll keep merging code into the brightcove upstream repo, but my hope is eventually we'll move the authoritative upstream version to this group.

If you're not interested in commit access, please feel free to leave the group and feel free to continue to send traditional pull requests.

If you are interested in committing, please commit away and feel free to take an interest in any area of the code.

Thanks!

Issues with SNMP collection

Hello!

I'm working on testing out SNMP data collection and I am running into some issues that are leading me to believe that this feature might be broken. I have a small VM setup using the SNMPRawCollector as follows:

[[SNMPRawCollector]]
enabled = True
interval = 10
[[[devices]]]
[[[[localhost]]]]
host = localhost
port = 161
community = public
[[[[[oids]]]]]
<my_oid> = mymetric

There are a few behaviors I have noticed that are confusing me:

  1. The configuration docs on this collector seem to be out of date and do not show proper config nesting (i.e. the docs show [devices] instead of [[[devices]]])
  2. When the collector fails to run, it fails silently (perhaps this is just my fault in configuring diamond)
  3. When the collector runs, it performs no action. I did a little digging and it seems that a removal of a get_schedule method may have been the culprit, but I don't know enough about the project internals to confirm this. Some manual work by adding in a collect method that did much of the work that get_schedule did seems to actually run the collector.

Really, I'm just curious if what I'm seeing is expected or if there is anyone else who has run into similar issues using SNMP collection.

Thanks!

Diamond hemorrhages memory when Graphite server is inaccessible

I recently ran into a situation on one of my hosts where Diamond 4.0 series was consuming over 3 GB of memory after being unable to connect to Graphite for a couple of hours -- RES in htop appears to grow by 2-3 MB per minute. I'm not sure if this is an issue with unbounded write buffer growth or leaked objects in connection handling, but it presents a serious threat to system stability.
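Until the leak is found, one common mitigation is to cap the backlog with a fixed-size buffer; a minimal sketch of the idea (Diamond's handlers expose max_backlog_multiplier / trim_backlog_multiplier settings aimed at the same problem):

```python
import collections

# Bounded backlog sketch: once maxlen is reached, the oldest unsent
# metrics are dropped instead of growing without bound while the
# Graphite server is unreachable.
backlog = collections.deque(maxlen=10000)
for i in range(50000):            # simulate hours of unsendable metrics
    backlog.append(("servers.host.cpu.total.user", i))

assert len(backlog) == 10000      # memory stays bounded
assert backlog[0][1] == 40000     # oldest surviving entry
```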

mysql.py is not a good name for a collector

I tried to use the MySQLCollector defined in src/collectors/mysql/mysql.py.

Diamond has claimed to load this: Loaded Modules: mysql, but was not loading the class. For the mysql55.py collector, Loaded Modules: mysql55 was followed by Loaded Collector: mysql55.MySQLPerfCollector, but for the mysql.py, Loaded Collector: MySQLCollector never showed.

That happens because:

[2015-01-22 16:35:09,344] [MainThread] Loaded Collector: mysql55.MySQLPerfCollector
open("/usr/lib/python2.7/dist-packages/mysql/__init__.py", O_RDONLY) = 4

so the mysql.py file has been shadowed due to include path issues.

Renaming the collector definition to mysqlcollector.py fixed the issue.
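The shadowing is easy to reproduce outside Diamond; the module name shadowdemo below stands in for mysql (illustrative demo, not Diamond code):

```python
import os
import sys
import tempfile

# Two directories each provide a module named "shadowdemo"; Python
# imports whichever directory appears first on sys.path, exactly as the
# site-packages "mysql" package shadows the collector's mysql.py.
first, second = tempfile.mkdtemp(), tempfile.mkdtemp()
for d, tag in ((first, "first"), (second, "second")):
    with open(os.path.join(d, "shadowdemo.py"), "w") as f:
        f.write("TAG = %r\n" % tag)

sys.path.insert(0, second)
sys.path.insert(0, first)   # now shadows the copy in `second`
import shadowdemo
print(shadowdemo.TAG)       # "first"
```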

load collector config

I recently installed Diamond to send metrics to statsD. All CPU, memory, network... metrics work except ProcessResourcesCollector.

I don't know why but it seems ProcessResourcesCollector does not load my ProcessResourcesCollector.conf.

In file /etc/diamond/diamond.conf, I have :
collectors_config_path = /etc/diamond/collectors/
[[ProcessResourcesCollector]]
enabled = True

In file /etc/diamond/collectors/ProcessResourcesCollector.conf :
enabled=True
unit=kB
cpu_interval=0.1
[process]
[[diamond]]
selfmon=True

And when I check the config status after restarting the Diamond service, no process appears and the unit is not kB.

diamond-setup --print -C ProcessResourcesCollector
ProcessResourcesCollector {'path_suffix': '', 'ttl_multiplier': 2, 'measure_collector_time': False, 'byte_unit': ['byte'], 'instance_prefix': 'instances', 'interval': '10', 'enabled': True, 'metrics_whitelist': None, 'metrics_blacklist': None, 'path_prefix': '', 'path': 'process', 'unit': 'B', 'hostname_method': 'smart', 'process': {}}

I also have a question: how can I collect metrics for all processes running on a server?

Thanks in advance

handler config in root config file does not work in 4.0

I upgraded my diamond to 4.0, but the handler config in /etc/diamond/diamond.conf doesn't work anymore. It used to work with 3.x.

Even though I created /etc/diamond/handlers/ArchiveHandler.conf, diamond does not use it:

# cat /etc/diamond/handlers/ArchiveHandler.conf
[ArchiveHandler]
log_file = /var/cache/salt/minion/diamond.archive.log

Log:

Jan  4 10:21:01 integration-ubuntu diamond[30607] ERROR diamond diamond.main:299 Unhandled exception: [Errno 21] Is a directory: '/'

because it use the default path "", which turned to "/" by logging module code:

In [5]: os.path.abspath('')
Out[5]: '/'
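More precisely, os.path.abspath('') resolves against the current working directory, which for a daemon is typically '/':

```python
import os

# abspath("") resolves relative to the current working directory, so a
# daemon running with cwd "/" turns the empty default log path into "/".
assert os.path.abspath("") == os.getcwd()
```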

Could you please provide a working example for the handler config? Thanks.

Regarding computation of concurrent_io

Hi,

I just found that the computation of concurrent_io is actually util_percentage in its decimal format.

In the following code,
metrics['concurrent_io'] = ((metrics['reads_per_second'] +
                             metrics['writes_per_second']) *
                            (metrics['service_time'] / 1000.0))

where reads_per_second + writes_per_second = iops = io / time_delta and
service_time = io_milliseconds / io,

so concurrent_io = io_milliseconds / (time_delta * 1000.0), which is actually util_percentage.
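A quick numeric check of that algebra (the figures below are made up for illustration):

```python
# Hypothetical sample window: 500 I/Os over 10 s, device busy for 4000 ms.
time_delta = 10.0          # seconds in the collection window
io = 500.0                 # operations in the window
io_milliseconds = 4000.0   # ms the device spent doing I/O

reads_plus_writes_per_second = io / time_delta      # iops
service_time = io_milliseconds / io                 # ms per operation
concurrent_io = reads_plus_writes_per_second * (service_time / 1000.0)

util_fraction = io_milliseconds / (time_delta * 1000.0)
assert abs(concurrent_io - util_fraction) < 1e-9    # both are 0.4, i.e. 40% util
```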

CPUCollector not showing the right information

Hello,

So far, I'm having issues with CPUCollector.

Here is my setup:
OS: Ubuntu 14.10
Python Version: 2.7.8
Installed packages:

argparse==1.2.1
configobj==5.0.6
diamond==4.0.41
psutil==2.2.1
python-statsd==1.7.2
six==1.9.0
wsgiref==0.1.2

CPUCollector config:
(armory)~/D/A/diamond ❯❯❯ diamond-setup -c diamond.conf --print -C CPUCollector
CPUCollector {'path_suffix': '', 'ttl_multiplier': 2, 'measure_collector_time': False, 'percore': 'False', 'byte_unit': ['byte'], 'instance_prefix': 'instances', 'simple': 'True', 'interval': '60', 'enabled': True, 'xenfix': None, 'metrics_whitelist': None, 'normalize': 'False', 'metrics_blacklist': None, 'path_prefix': 'servers', 'path': 'cpu'}

Log with the issue:
[2015-04-18 20:26:41,088] [MainThread] servers.mars.cpu2.total.idle: 369.2
[2015-04-18 20:27:41,136] [MainThread] Sending idle 92.95|g
[2015-04-18 20:27:41,136] [MainThread] servers.mars.cpu2.cpu1.idle: 92.95
[2015-04-18 20:27:41,139] [MainThread] Sending idle 92.2833333333|g
[2015-04-18 20:27:41,139] [MainThread] servers.mars.cpu2.cpu0.idle: 92.2833333333
[2015-04-18 20:27:41,142] [MainThread] Sending idle 93.15|g
[2015-04-18 20:27:41,142] [MainThread] servers.mars.cpu2.cpu2.idle: 93.15
[2015-04-18 20:27:41,144] [MainThread] Sending idle 91.5666666667|g
[2015-04-18 20:27:41,144] [MainThread] servers.mars.cpu2.cpu3.idle: 91.5666666667
[2015-04-18 20:27:41,148] [MainThread] Sending idle 369.95|g
[2015-04-18 20:27:41,148] [MainThread] servers.mars.cpu2.total.idle: 369.95

diamond.conf
[[CPUCollector]]
enabled = True
percore = False
simple = True

(I also tried configuring CPUCollector.conf.)

My /proc/stat file:

keys = ...

(armory)~/D/A/diamond ❯❯❯ cat /proc/stat
cpu 490759 1159 134813 11108860 125171 11 2189 0 0 0
cpu0 125085 166 38294 2767799 28333 9 914 0 0 0
cpu1 118958 204 29800 2777483 36090 1 981 0 0 0
cpu2 122389 358 34144 2780482 31344 0 164 0 0 0
cpu3 124326 430 32574 2783095 29403 0 130 0 0 0
intr 18709390 17 3 0 0 0 0 0 0 1 0 0 0 4 0 0 0 33 0 1561044 0 0 0 0 37 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3709636 306618 555 13 596 765 3522476 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ctxt 149248245
btime 1429370024
processes 27783
procs_running 1
procs_blocked 0
softirq 11021933 1274264 2395017 5 566309 307362 0 3675328 1823649 11967 968032

I tried disabling percore, using simple, and also tried normalize, but I always get the same value.

Any idea how I can fix this?
Ideally it should return only percentages, as in this example with psutil:

In [2]: psutil.cpu_times_percent()
Out[2]: scputimes(user=4.5, nice=0.0, system=0.6, idle=90.6, iowait=4.2, irq=0.0, softirq=0.0, steal=0.0, guest=0.0, guest_nice=0.0)
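Incidentally, the logged total.idle of 369.95 is exactly the sum of the four per-core idle values in the same log; a normalized total would divide by the core count (a quick check, not Diamond code):

```python
# Per-core idle percentages taken from the log excerpt above.
per_core_idle = [92.95, 92.2833333333, 93.15, 91.5666666667]

total = sum(per_core_idle)               # 369.95, as logged for total.idle
normalized = total / len(per_core_idle)  # 92.4875, the expected percentage
```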

Best Regards,

Diamond needs to handle clock forwards caused by VM suspend/resumes

This applies to the 4.0 series.

If you suspend a VM, and resume it later, diamond sees the difference as a lot of missed runs and attempts to make up for it, causing high CPU usage and running a lot of collectors.

A tcpdump from a VM set up to discard diamond output:

[vagrant@local-puppet-001 ~]$ sudo tcpdump -c 5000 -i lo port 9 -s0 -A | grep cpu.total.user | sort | uniq -c
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo, link-type EN10MB (Ethernet), capture size 65535 bytes
5000 packets captured
10092 packets received by filter
65 packets dropped by kernel
1 .:j..:j.servers.office.local-puppet-001.cpu.total.user 0 1428568289
1 servers.office.local-puppet-001.cpu.total.user 0
1 servers.office.local-puppet-001.cpu.total.user 0 142856
18 servers.office.local-puppet-001.cpu.total.user 0 1428568274
33 servers.office.local-puppet-001.cpu.total.user 0 1428568275
30 servers.office.local-puppet-001.cpu.total.user 0 1428568276
30 servers.office.local-puppet-001.cpu.total.user 0 1428568277
24 servers.office.local-puppet-001.cpu.total.user 0 1428568278
36 servers.office.local-puppet-001.cpu.total.user 0 1428568279
36 servers.office.local-puppet-001.cpu.total.user 0 1428568280
34 servers.office.local-puppet-001.cpu.total.user 0 1428568281
29 servers.office.local-puppet-001.cpu.total.user 0 1428568282
30 servers.office.local-puppet-001.cpu.total.user 0 1428568283
37 servers.office.local-puppet-001.cpu.total.user 0 1428568284
35 servers.office.local-puppet-001.cpu.total.user 0 1428568285
27 servers.office.local-puppet-001.cpu.total.user 0 1428568286
41 servers.office.local-puppet-001.cpu.total.user 0 1428568287
44 servers.office.local-puppet-001.cpu.total.user 0 1428568288
42 servers.office.local-puppet-001.cpu.total.user 0 1428568289
34 servers.office.local-puppet-001.cpu.total.user 0 1428568290
40 servers.office.local-puppet-001.cpu.total.user 0 1428568291
36 servers.office.local-puppet-001.cpu.total.user 0 1428568292
30 servers.office.local-puppet-001.cpu.total.user 0 1428568293
1 servers.office.local-puppet-001.cpu.total.user 1 1428568286

That's about 30 runs of the CPU collector every second for a few seconds.
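A sketch of the kind of guard that would avoid this, assuming a scheduler that tracks the last run time (hypothetical helper, not Diamond's actual scheduler code):

```python
import time

def next_run(last_run, interval, now=None):
    """Pick the next run slot, skipping any runs missed while the clock
    jumped forward (e.g. a VM suspend/resume), instead of replaying them."""
    if now is None:
        now = time.time()
    scheduled = last_run + interval
    if scheduled < now:                         # overslept: clock jumped ahead
        missed = int((now - scheduled) // interval) + 1
        scheduled += missed * interval          # jump past the whole backlog
    return scheduled
```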

Add Collector for inode Usage

Hi,

I think it does not exist yet, so here is a little request:

is it possible to have a collector to monitor inode usage, like the result of df -i?

I got a server that crashed today because of that.

Thanks a lot :)
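For reference, the raw numbers behind a df -i row are available from os.statvfs; a minimal sketch of such a collector's core (hypothetical helper, not an existing Diamond collector):

```python
import os

def inode_usage_percent(path="/"):
    """Rough equivalent of one `df -i` row: percentage of inodes used
    on the filesystem containing `path`."""
    st = os.statvfs(path)
    if st.f_files == 0:        # some virtual filesystems report no inodes
        return 0.0
    used = st.f_files - st.f_ffree
    return 100.0 * used / st.f_files
```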

"pip install diamond" failed (CentOS 7, diamond 4.0.41)

Hello,
I cannot install Diamond through pip on CentOS 7.
Installation fails with the error:

...
error: can't copy 'rpm/systemd/diamond.service': doesn't exist or not a regular file
...

File setup.py in package diamond-4.0.41.tar.gz contains:

if distro in ['centos', 'redhat', 'debian', 'fedora']:
            data_files.append(('/etc/init.d',
                               ['bin/init.d/diamond']))
            data_files.append(('/var/log/diamond',
                               ['.keep']))
>>>            if distro_major_version >= '7' and not distro == 'debian':
                data_files.append(('/usr/lib/systemd/system',
                                   ['rpm/systemd/diamond.service']))
            elif distro_major_version >= '6' and not distro == 'debian':
                data_files.append(('/etc/init',
                                   ['rpm/upstart/diamond.conf']))

but there is no "rpm/systemd/diamond.service" file in the archive.
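Separately, note that the distro_major_version >= '7' check in the snippet above compares version numbers as strings, which only works by accident for single-digit majors:

```python
# String comparison of version numbers is a latent bug:
assert '7' >= '7'              # fine for single digits...
assert not ('10' >= '7')       # ...but '10' sorts before '7' lexicographically
assert int('10') >= int('7')   # comparing as integers behaves as intended
```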

DatadogHandler - throwing AttributeError: 'DatadogHandler' object has no attribute 'queue' exception

I've done a debian installation.
I try to use the DatadogHandler by adding handlers = diamond.handler.datadog.DatadogHandler to my /etc/diamond/diamond.conf, and I get the following exception (found in /var/log/diamond/diamond.log):

[2015-03-19 16:57:40,560] [MainThread] Traceback (most recent call last):
  File "/usr/lib/pymodules/python2.7/diamond/handler/Handler.py", line 72, in _process
    self.process(metric)
  File "/usr/lib/pymodules/python2.7/diamond/handler/datadog.py", line 83, in process
    self.queue.append(metric)
AttributeError: 'DatadogHandler' object has no attribute 'queue'
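The traceback suggests process() runs before the queue attribute is ever created; a minimal sketch of the defensive shape (hypothetical class, not Diamond's actual DatadogHandler):

```python
class QueueingHandler(object):
    """Sketch of the defensive fix: create the buffer in __init__ so
    process() can never hit a missing attribute."""

    def __init__(self):
        self.queue = []           # always exists before process() is called

    def process(self, metric):
        self.queue.append(metric)
```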

Debian package does not create group for diamond user

I am not sure if this is intended behavior, but currently building a debian package and installing it creates a 'diamond' user without explicitly placing that user into a group. I would expect that a group 'diamond' would also be created. I have not looked at the RHEL or Gentoo packaging to see if the behavior is the same there. Is not creating an explicit group expected?

Thanks!

Service does not start on modern RPM systems (CentOS 7 and Fedora 21)

Installed an rpm from HEAD.

Running systemctl start diamond gives:

Failed to start diamond.service: Connection timed out

Nothing goes into the logs

After running diamond -f -c /etc/diamond/diamond.conf the log contains:

[2015-01-03 18:35:39,751] [MainThread] Changed UID: 0 () GID: 0 ().
[2015-01-03 18:35:43,075] [MainThread] Unhandled exception: 'list' object has no attribute 'split'
[2015-01-03 18:35:43,077] [MainThread] traceback: Traceback (most recent call last):
  File "/bin/diamond", line 291, in main
    server.run()
  File "/usr/lib/python2.7/site-packages/diamond/server.py", line 96, in run
    handlers = handlers.split(',')
AttributeError: 'list' object has no attribute 'split'
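The traceback suggests server.py assumes the handlers setting is a comma-separated string, while configobj can also hand back a list; a sketch of a tolerant fix (assumed helper, not Diamond's actual code):

```python
def normalize_handlers(value):
    """Accept either the comma-separated string or the already-split list
    that configobj may produce for a multi-valued option."""
    if isinstance(value, str):
        return [h.strip() for h in value.split(",")]
    return list(value)
```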

cloudwatchHandler failed to compare metrics

I'm trying the cloudwatchHandler, but for some reason I can't make it work.
The connection with CloudWatch works fine, but during the cloudwatch.process function, metric.getMetricPath() returns nothing, so the comparison fails without any log message.

diamond[13010] DEBUG diamond cloudwatch.process:186 Comparing Collector: [loadavg] with (loadavg) and Metric: [01] with ()

Here is my config


[[cloudwatchHandler]]
region = us-west-2

[[[LoadAvg01]]]
collector = loadavg
metric = 01
namespace = os
name = Avg01
unit = None

[collectors]

[[default]]
enabled = False
# path_suffix = os

[[LoadAverageCollector]]
enabled = True

Also, I probably misinterpreted the namespace config, but if I add path_suffix = os to my collectors, the comparison logs:

Comparing Collector: [loadavg] with (os) and Metric: [01] with ()

Names of some metrics prefixed with dot

It seems that Collector.get_metric_path is the culprit. The last return in this method, return '.'.join([prefix, path, name]), uses prefix, which is blank in these cases. Sample metrics from StatsD's logs:

6 Mar 09:16:58 - DEBUG: .memcached.localhost.threads:4|g
6 Mar 09:16:58 - DEBUG: .network.eth0.tx_frame:0.0|g
6 Mar 09:16:58 - DEBUG: .iostat.sda5.writes_byte:2920448.0|g
6 Mar 09:16:58 - DEBUG: .diskspace._var.inodes_avail:6803010|g
6 Mar 09:16:58 - DEBUG: .loadavg.processes_total:806|g

This causes some issues later on, e.g. when a regexp needs to be added to relay-rules.conf for Graphite, since some metric names contain ".." inside.

MemAvailable Not Being Collected

Hello,

I'd really like to track the MemAvailable metric in Graphite, but when I set detailed = True in my collector configuration I do not see it in the list of metrics. I'm running the HEAD of the master branch. Is there something else I need to enable to collect that metric?

Thanks,
Chris

Update PyPI package?

Would it be possible to cut a release to update the PyPI package? I did a pip install diamond yesterday and it installed the BrightCove version.

Tests fail when collector import other collectors

The server adds every collector directory to sys.path, but the test suite doesn't. This leads to errors in the tests when collectors import other collectors, errors which don't occur in production.

I don't know if importing other collectors is "good form", but memory_docker imports memory_cgroup, so its test was failing with an E.

Here's a very dirty patch that fixes the issue, but there should probably be more checking involved before adding the directories to the path.

diff --git a/test.py b/test.py
index b88517d..3245617 100755
--- a/test.py
+++ b/test.py
@@ -34,7 +34,11 @@ sys.path.append(os.path.abspath(os.path.join(os.path.dirname(
                                              'src')))
 sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__),
                                              'src', 'collectors')))
-
+from glob import glob
+for coldir in glob(os.path.abspath(os.path.join(os.path.dirname(__file__),
+                                                'src', 'collectors', '*'))):
+    if os.path.isdir(coldir):
+        sys.path.append(coldir)

 def run_only(func, predicate):
     if predicate():

Please comment

rhel5 make rpm failure - Python 2.4 compatibility

byte-compiling /var/tmp/diamond-4.0.87-0-buildroot/usr/lib/python2.4/site-packages/diamond/utils/classes.py to classes.pyc
File "/usr/lib/python2.4/site-packages/diamond/utils/classes.py", line 151
except (KeyboardInterrupt, SystemExit) as err:
^
SyntaxError: invalid syntax

In Python 2.5 and earlier, use the comma version, since as isn't supported.
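Alternatively, sys.exc_info() retrieves the active exception in a form that parses on every Python version from 2.4 onward; a small sketch:

```python
import sys

# Version-agnostic alternative to `except ... as err`: fetch the
# exception object via sys.exc_info() inside the handler.
try:
    raise SystemExit(3)
except (KeyboardInterrupt, SystemExit):
    err = sys.exc_info()[1]
print(err.code)   # 3
```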

Problem with InfluxdbHandler

I have installed diamond and influxdb, both running. My diamond.conf setup is:

# Handlers for published metrics.
handlers = diamond.handler.graphite.GraphiteHandler, diamond.handler.archive.ArchiveHandler, diamond.handler.influxdbHandler.InfluxdbHandler

[[InfluxdbHandler]]
### Options for InfluxdbHandler
hostname = localhost
database = graphite
batch_size = 1
username = diamond
password = diamond
ssl = False
port = 8086

# Default Poll Interval (seconds)
interval = 10

The rest is all default conf.

I have no problem accessing the InfluxDB web interface, creating databases and doing some posts, but for some reason diamond is not writing any data to InfluxDB. Am I missing something?

Long running collector shifts time overtime

Hello,
I've been waiting for new threading model hoping it will somehow solve a problem that I'm having with one of my custom collectors.
My collector runs for about 500-600 ms, and because of that each publish/collection is shifted by this amount.
Due to this shift I occasionally have gaps in graphite.

This is what it looks like in diamond.log

[2014-10-16 09:08:35,659] [Thread-1] servers.test.collector_time_ms 579 1413450515
[2014-10-16 09:08:51,211] [Thread-1] servers.test.collector_time_ms 551 1413450531
[2014-10-16 09:09:06,761] [Thread-1] servers.test.collector_time_ms 548 1413450546
[2014-10-16 09:09:22,403] [Thread-1] servers.test.collector_time_ms 639 1413450562

Is there a way for me to publish a metric with the timestamp of collector startup rather than the actual publish time?

Thanks
m

PortStat - Counter not available in 2.6

from collections import Counter (from here) does not work in Python 2.6, as Counter was introduced in 2.7.

A backported version of Counter can be located here. It might be good to include it somewhere in the project as a fallback in case collections.Counter can't be imported.
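A conventional fallback-import pattern for this (the backport module name below is illustrative, not a specific package):

```python
# Prefer the stdlib Counter; fall back to a bundled backport on 2.6.
try:
    from collections import Counter          # Python 2.7+ / 3.x
except ImportError:
    from counter_backport import Counter     # hypothetical 2.6 backport module

print(Counter("aab")["a"])   # 2
```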

Kafka tests are failing

I don't even know what kafka is, but we're getting this in our travis builds:

FAIL: test (testkafka.TestKafkaCollector)
----------------------------------------------------------------------
Traceback (most recent call last):
    File "/home/travis/virtualenv/python2.7.6/lib/python2.7/site-packages/mock.py", line 1201, in patched
        return func(*args, **keywargs)
    File "/home/travis/build/python-diamond/Diamond/src/collectors/kafka/test/testkafka.py", line 177, in test
        self.assertPublishedMany(publish_mock, expected_metrics)
    File "/home/travis/build/python-diamond/Diamond/test.py", line 186, in assertPublishedMany
        self.assertPublished(mock, key, value, expected_value)
    File "/home/travis/build/python-diamond/Diamond/test.py", line 158, in assertPublished
        self.assertEqual(actual_value, expected_value, message)
    AssertionError: jvm.gc.marksweep.TotalStartedThreadCount: actual number of calls 0, expected 1

I'm marking @dctrwatson as the author of the collector

Add some functionality to the mail collector

This is about the collector I sent in #40

There's one main thing I'd like to add right now that is the possibility of collecting metrics for more than one domain hosted in the server. The main idea behind the mailbox_prefix configuration key was to be able to define more than one instance of the collector, each with a different prefix and path (at least our path to the different domain spools are /var/mail/vhosts/<domain>/), but I just realized that's not possible.

So what do you think would be the best way of configuring this to have:

  • from /var/mail/vhosts/example.com collect metrics with prefix example_com
  • from /var/mail/vhosts/example.net collect metrics with prefix example_net
  • etc

My first thought was to pass /var/mail/vhosts/* as a glob and then automatically add the vhost name as a prefix, but the real layout inside /var/mail depends on the mail server and its configuration. The only thing the standard says is that you should have plain-text files named <username> in the UNIX mailbox format for each user's mailbox. Our server, besides /var/mail/vhosts/<domain>, also has /var/mail/vhosts/indexes/, which contains a lot of random IMAP information (the different users' folders, trash, history, etc.).

Another possible approach would be:

[[MailCollector]]
spool_path = /var/mail         # this will be published without prefix
[[[example_com]]]
spool_path = /var/mail/vhosts/example.com    # prefix=example_com
[[[example_net]]]
spool_path = /var/mail/vhosts/example.net     # prefix=example_net

network collector: vlan interfaces with dots

Some of our servers use 802.1Q VLAN interfaces, named like eth2.23.

These create metrics such as

servers.server1.network.eth2.23.rx_byte
servers.server1.network.eth2.23.tx_byte
servers.server1.network.eth2.rx_byte
servers.server1.network.eth2.tx_byte

causing metric path structure to change. It'd be better if the dot was replaced with something like _.
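A sketch of the suggested sanitization (hypothetical helper, not the collector's current behaviour):

```python
def sanitize_interface(name):
    """Replace dots in interface names so 'eth2.23' stays a single
    metric-path component instead of creating an extra level."""
    return name.replace(".", "_")

print(sanitize_interface("eth2.23"))   # eth2_23
```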

GraphiteHandler sends ntpd.offset value as 0

I moved this issue from the old repo to this one (if I shouldn't have done this, please let me know).

I switched from the GraphitePickleHandler to the GraphiteHandler yesterday and noticed that all ntpd offset values were suddenly reported as 0. When looking into this, I noticed that when I switch back to the GraphitePickleHandler the correct value is sent, in this case: 0.00162993

ntpdc -c kerninfo
pll offset: 0.00162993 s
pll frequency: -9.166 ppm
maximum error: 0.267084 s
estimated error: 0.01685 s
status: 2001 pll nano
pll time constant: 7
precision: 1e-09 s
frequency tolerance: 500 ppm

To verify, I enabled the ArchiveHandler to see what is being logged by that handler, and it also logs 0, while I see the correct value from the GraphitePickleHandler.

Running diamond in debug mode did not show any errors

First observed with version 3.3.510 and still an issue in version 4.0.60.

Support for entry_points?

I was looking at the implementation for creating custom collectors and handlers and I was curious if there was any thought or plans to support entry_points rather than inspecting a canonical filesystem location. This approach would make it very easy to make custom/third-party collectors and handlers independently versioned, tested, and installed.

If there is some interest in this, I'd be very happy to help with the effort. Thanks!

NetAppCollector expects arguments to collect() method

The collect() method in the netapp collector classes expects additional arguments, but .collect() isn't called with any.

# diamond -f -l -c /etc/diamond/diamond.conf -r /usr/share/diamond/collectors/netapp/netappDisk.py
...
...
1427756106.58   [NetAppCollector:21294:DEBUG]   Starting
1427756106.59   [NetAppCollector:21294:DEBUG]   Interval: 300.0 seconds
1427756106.59   [NetAppCollector:21294:DEBUG]   Max collection time: 270 seconds
1427756106.59   [NetAppCollector:21294:ERROR]   Collector failed!
1427756106.59   [NetAppCollector:21294:ERROR]   /usr/lib/python2.6/site-packages/diamond/utils/scheduler.py:87
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/diamond/utils/scheduler.py", line 63, in collector_process
    collector._run()
  File "/usr/lib/python2.6/site-packages/diamond/collector.py", line 472, in _run
    self.collect()
TypeError: collect() takes exactly 5 arguments (1 given)
...
...

This seems to be the only plugin affected, but I don't know how this ever worked... has the API been changed in the past?

# fgrep -r 'def collect(' /usr/share/diamond/collectors/ |fgrep -v 'def collect(self):'

/usr/share/diamond/collectors/netapp/netapp_inode.py:    def collect(self, device, ip, user, password):
/usr/share/diamond/collectors/netapp/netapp.py:    def collect(self, device, ip, user, password):
/usr/share/diamond/collectors/netapp/netappDisk.py:    def collect(self, device, ip, user, password):

GraphitePickleHandler drops metrics when carbon-relay is behind ELB

Similar to the closed issue BrightcoveOS/Diamond#458, but it looks like the fixes for that issue didn't carry over to the GraphitePickleHandler.
I haven't had a chance to dive deep, but the gist of it is that when using the GraphitePickleHandler with a carbon-relay behind an ELB, we periodically (every 3-5 minutes) get the following errors:

[2014-12-03 00:18:17,462] [Thread-1] GraphiteHandler: Socket error, trying reconnect.
[2014-12-03 00:18:17,462] [Thread-1] GraphiteHandler: Setting socket keepalives...

But the dropped metrics seem more frequent than that. We've set the idle connection timeout to 5 minutes on the ELB, and we send metrics every 1 minute.
What we have seen, all things being equal, when we switch to use the GraphiteHandler (line handler), metrics are submitted just fine and we don't see that error anymore. These are the settings we used for both handlers:

# Socket timeout (seconds)
timeout = 15

# Batch size for pickled metrics
batch = 256

keepalive = 1

max_backlog_multiplier = 10

trim_backlog_multiplier = 2

I'm still seeing if I can get more information about this problem, so if there are any suggestions about what I should check out, please let me know.
Thanks
