grobian / carbon-c-relay
Enhanced C implementation of Carbon relay, aggregator and rewriter
License: Apache License 2.0
I'm seeing a strange issue where the metric path gets garbled. My metric path looks something like:
vCenter.Datastore.<datastore name>.x.y.z
which results in a variation of different metric paths:
drwxr-xr-x 3 _graphite _graphite 4096 Mar 17 08:33 DavCenter
drwxr-xr-x 3 _graphite _graphite 4096 Mar 17 08:33 DatastovCenter
drwxr-xr-x 3 _graphite _graphite 4096 Mar 17 08:33 DvCenter
drwxr-xr-x 3 _graphite _graphite 4096 Mar 17 08:33 DatavCenter
drwxr-xr-x 3 _graphite _graphite 4096 Mar 17 08:33 DatasvCenter
drwxr-xr-x 3 _graphite _graphite 4096 Mar 17 08:33 DatvCenter
drwxr-xr-x 3 _graphite _graphite 4096 Mar 17 08:33 DatastvCenter
drwxr-xr-x 3 _graphite _graphite 4096 Mar 17 08:33 DatastorevCenter
drwxr-xr-x 3 _graphite _graphite 4096 Mar 17 08:33 DatastorvCenter
drwxr-xr-x 3 _graphite _graphite 4096 Mar 17 08:33 vCenter
drwxr-xr-x 66 _graphite _graphite 4096 Mar 17 08:33 Datastore
When I send the metric directly to carbon, things work ok. I've tested this with version 0.36 and with the latest v0.37 (50b6a1).
Any advice on how to troubleshoot this?
Hi,
my setup:
here's the config:
cluster all
forward
graphiteA:2003
graphiteB:2003
graphiteC:2003
;
aggregate
^stats\.dc1pweb[0-9]+\.request_by_country\.([^.]*)
every 60 seconds
expire after 60 seconds
compute sum write to
stats._sum_dc1pweb.request_by_country.\1
;
aggregate
^stats\.dc1[^.]+\.warehouselogger\.([^.]*)
every 60 seconds
expire after 60 seconds
compute sum write to
stats._sum_dc1.warehouselogger.\1
;
match _sum_dc1pweb.request_by_country
send to all
;
match _sum_dc1.warehouselogger
send to all
;
Now, the thing is, I didn't want to maintain the A/B/C list in carbon-c-relay as well, because that's what carbon-relay-ng is for, and this list changes too often.
So I modified the config to forward to carbon-relay-ng:2003 instead of A/B/C, feeding the output of carbon-c-relay back into carbon-relay-ng, which can then feed the aggregations into everywhere they need to go.
That means that stuff also feeds back into carbon-c-relay.
I didn't expect this to be an issue, because the regexes are carefully written to match only the input metrics, and not the relay's own aggregation outputs. However, it was. This is the amount of metrics coming into carbon-relay-ng (it flattened back out because carbon-relay-ng hit 100% CPU and couldn't process anymore).
I did some network sniffing on the relay machine and saw way too many metrics with sum_dc1pweb in them.
Did I misconfigure the relay? Is there any reason why this behavior would make sense? Might it be a bug in the relay?
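One way to guard against such a feedback loop (a sketch, assuming the aggregation outputs keep the stats._sum_ prefixes shown above) is to route the aggregate outputs with a stop before the aggregate rules, since rules are evaluated in order:

```
# evaluated before the aggregate rules: aggregate outputs are
# forwarded and then dropped from any further processing
match ^stats\._sum_dc1
    send to all
    stop
    ;
```

With this in front, re-injected aggregation output can never reach the aggregate rules again, regardless of how the regexes there are written.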
Does this build/run on windows?
Under high load, carbon-c-relay sometimes dies and the kernel ring buffer reports:
relay[12386] general protection ip:3ee520798e sp:7fc14c4bde00 error:0 in libpthread-2.12.so[3ee5200000+17000]
relay[29214]: segfault at b983c6a8 ip 0000000000409027 sp 00007fa3b983bcb0 error 4 in relay[400000+e000]
on CentOS release 6.3 (Final) 2.6.32-279.5.2.el6.x86_64. I'd be happy to provide more information as needed.
Hi,
We make extensive use of rewrite rules in our carbon/graphite setup.
Any plans to implement some way of performing rewriting of metric names in carbon-c-relay?
Bob
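For what it's worth, carbon-c-relay later gained a rewrite statement for exactly this purpose; a minimal sketch (the pattern and replacement below are made-up examples, not from this setup):

```
# collapse a per-server prefix into a common one (hypothetical names)
rewrite ^servers\.([^.]+)\.cpu\.(.+)
    into hosts.\1.cpu.\2
    ;
```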
Hi,
I think I have run into an issue with the any_of cluster type. I have not tested the other cluster types, and I have not tested whether any timeouts longer than 10 seconds are involved.
It appears that the any_of cluster option always elects to send the same metric to the same cluster node; it is sticky and does not fail over.
From the documentation:
cluster send-to-any-one
any_of 10.1.0.1:2010 10.1.0.1:2011;
This would implement a fail-over scenario, where two servers are used and the load between them is spread, but should any of them fail, all metrics are sent to the remaining one. This typically works well for upstream relays, or for balancing carbon-cache processes running on the same machine. Should any member become unavailable, for instance due to a rolling restart, the other members receive the traffic.
To test this I used the following config with release 32:
$ relay -p4000 -w 4 -b 10 -q 25000 -H maxwell -f relay.conf -d
[2014-09-16 17:37:26] starting carbon-c-relay v0.33 (2014-09-16)
configuration:
relay hostname = twiki501.back.test.bc.local
listen port = 4000
workers = 4
send batch size = 10
server queue size = 25000
debug = true
routes configuration = relay.conf
parsed configuration follows:
cluster one
any_of
localhost:5000
localhost:6000
localhost:7000
;
match ^carbon\.relays\..*$
send to blackhole
stop
;
match *
send to one
;
I started 3 nc instances listening on ports 5000 6000 7000.
I sent the following metrics:
orang-utan 4 1410887768
spider 4 1410887768
chimp 4 1410887768
I observed the following results:
orang-utans always go to port 5000
spiders always go to port 6000
chimps always go to port 7000
This is the case even if there is nothing listening on the target port.
Please let me know if I can help in any way. I am afraid that my C has about 16 years of rust on it but I am tempted to relearn.
Thanks for what is, by my testing, the fastest way to relay carbon metrics.
Hello,
I've found high CPU usage on the worker threads of carbon-c-relay. The symptom and impact are mostly the same as the issue reported in #8, but the trigger is different. Somehow some data containing only "." was sent from our client to our carbon-c-relay and caused this issue.
Steps to reproduce are below. Could you help me fix this please?
echo -n "." | nc localhost 2003
I am trying to incorporate carbon-c-relay into an existing environment and am finding that the carbon_ch hashing method does not line up with that of the standard carbon tools.
Should
cluster my_store
carbon_ch
10.0.0.100:2113
10.0.0.200:2113
10.0.0.300:2113
10.0.0.400:2113
;
behave the same as?
DESTINATIONS = 10.0.0.100:2120:a,10.0.0.200:2120:a,10.0.0.300:2120:a,10.0.0.400:2120:a
In my testing it was not the case.
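One difference worth checking (an assumption on my part, not confirmed in this thread): the carbon DESTINATIONS entries carry instance names (the trailing :a), which participate in carbon's consistent hash, while the cluster above hashes on host:port alone; note too that the ports differ between the two configs (2113 vs 2120). carbon-c-relay can include an instance name in the hash key with an = suffix, sketched as:

```
cluster my_store
    carbon_ch
        10.0.0.100:2113=a
        10.0.0.200:2113=a
        10.0.0.300:2113=a
        10.0.0.400:2113=a
    ;
```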
[[email protected] bin]$ ./carbon-c-relay -f /opt/graphite/conf/carbon-c-relay2.conf -q 100000 -s
[2015-01-26 22:54:12] starting carbon-c-relay v0.37 (2015-01-26)
configuration:
relay hostname = carbon01
listen port = 2003
workers = 2
send batch size = 2500
server queue size = 100000
statistics submission interval = 60s
submission = true
routes configuration = /opt/graphite/conf/carbon-c-relay2.conf
parsed configuration follows:
cluster graphite
fnv1a_ch replication 1
127.0.0.1:2104
127.0.0.1:2106
127.0.0.1:2108
127.0.0.1:2110
;
match *
send to graphite
;
listening on tcp4 0.0.0.0 port 2003
listening on tcp6 :: port 2003
listening on udp4 0.0.0.0 port 2003
listening on udp6 :: port 2003
listening on UNIX socket /tmp/.s.carbon-c-relay.2003
starting 2 workers
starting statistics collector
[2015-01-26 22:54:12] failed to write() to 127.0.0.1:2108: Broken pipe
[2015-01-26 22:54:12] failed to write() to 127.0.0.1:2106: Broken pipe
[2015-01-26 22:54:12] failed to write() to 127.0.0.1:2104: Broken pipe
[2015-01-26 22:54:12] failed to write() to 127.0.0.1:2110: Broken pipe
[2015-01-26 22:54:12] server 127.0.0.1:2108: OK
[2015-01-26 22:54:12] failed to write() to 127.0.0.1:2108: Broken pipe
[2015-01-26 22:54:12] server 127.0.0.1:2104: OK
[2015-01-26 22:54:12] failed to write() to 127.0.0.1:2104: Broken pipe
[2015-01-26 22:54:12] server 127.0.0.1:2106: OK
[2015-01-26 22:54:12] failed to write() to 127.0.0.1:2106: Broken pipe
[2015-01-26 22:54:12] server 127.0.0.1:2110: OK
[2015-01-26 22:54:12] failed to write() to 127.0.0.1:2110: Broken pipe
[... the "server OK" / "failed to write(): Broken pipe" cycle repeats across ports 2104-2110 for the next couple of seconds; log truncated ...]
Hi. Could you provide some configuration examples?
For example, how to integrate such carbon-aggregator configuration:
aggregation-rules.conf:
stats.app...all..count (15) = sum stats.app...(\d+)-(\d+)-(\d+)-(\d+).<>.count
Does carbon-c-relay support named captures in output patterns like carbon-aggregator?
Hi,
I'm trying to use carbon-c-relay and perform some aggregations. One of the rules is:
aggregate
^([^-]+)-[^.]+\.cacti_nginx\.cacti_nginx_sockets\.nginx_(requests|handled|accepts)
every 30 seconds
expire after 35 seconds
compute sum write to
aggregates.\1.nginx.sockets.\2
compute sum write to
aggregates.all.nginx.sockets.\2
;
However, there are some strange gaps (it looks like 1 or 2 points are missing).
I can't understand why there can be no data for the aggregate but data for the points themselves, especially because the data is obtained by collectd and the points carry accurate timestamps.
With format=json it looks like that:
[20622.722064, 1419838410], [null, 1419838440], [21996.078082, 1419838470]
[18649.835853, 1419838410], [18739.188967999995, 1419838440], [19948.72801599999, 1419838470]
hi,
I'm pretty sure that this is not a bug; I am just totally failing to grok something...
If you spot what I am doing wrong, please let me know. Otherwise, please feel free to close this issue.
When carbon-c-relay is fed metrics from collectd, all relayed metrics have a '_' appended.
If I capture the output of collectd to a file and then play it back to carbon-c-relay with netcat, everything is fine.
Scenario
collectd -> carbon c relay -> carbon-cache.
I am running carbon c relay release 0.27 (need to update..)
carbon cache is complaining in /var/log/carbon/listener.log
27/08/2014 19:14:07 :: invalid line received from client 127.0.0.1:55633, ignoring
27/08/2014 19:14:07 :: invalid line received from client 127.0.0.1:55633, ignoring
27/08/2014 19:14:07 :: invalid line received from client 127.0.0.1:55633, ignoring
27/08/2014 19:14:07 :: invalid line received from client 127.0.0.1:55633, ignoring
27/08/2014 19:14:07 :: invalid line received from client 127.0.0.1:55633, ignoring
27/08/2014 19:14:07 :: invalid line received from client 127.0.0.1:55633, ignoring
stopping carbon-cache and replacing it with netcat:
$ nc -l 2023
vbox_twiki01.processes-httpd.ps_data 1005916160.000000 1409166917_
vbox_twiki01.processes-httpd.ps_code 190201856.000000 1409166917_
vbox_twiki01.processes-httpd.ps_stacksize 13432.000000 1409166917_
vbox_twiki01.processes-httpd.ps_cputime.user 0.000000 1409166917_
vbox_twiki01.processes-httpd.ps_cputime.syst 0.000000 1409166917_
vbox_twiki01.processes-httpd.ps_count.processes 10.000000 1409166917_
vbox_twiki01.processes-httpd.ps_count.threads 10.000000 1409166917_
vbox_twiki01.processes-httpd.ps_pagefaults.minflt 0.000000 1409166917_
vbox_twiki01.processes-httpd.ps_pagefaults.majflt 0.000000 1409166917_
vbox_twiki01.processes-httpd.ps_disk_octets.read 0.000000 1409166917_
vbox_twiki01.processes-httpd.ps_disk_octets.write 0.000000 1409166917_
checking that collectd is not sending weirdness by replacing cc relay with netcat:
$ nc -l 2103
vbox_twiki01.processes-httpd.ps_count.processes 10.000000 1409167037
vbox_twiki01.processes-httpd.ps_count.threads 10.000000 1409167037
vbox_twiki01.processes-httpd.ps_pagefaults.minflt 0.000000 1409167037
vbox_twiki01.processes-httpd.ps_pagefaults.majflt 0.000000 1409167037
vbox_twiki01.processes-httpd.ps_disk_octets.read 0.000000 1409167037
vbox_twiki01.processes-httpd.ps_disk_octets.write 0.000000 1409167037
vbox_twiki01.processes-httpd.ps_disk_ops.read 0.000000 1409167037
vbox_twiki01.processes-httpd.ps_disk_ops.write 0.000000 1409167037
vbox_twiki01.processes-collectd.ps_vm 674430976.000000 1409167037
vbox_twiki01.processes-collectd.ps_rss 2420736.000000 1409167037
vbox_twiki01.processes-collectd.ps_data 642191360.000000 1409167037
vbox_twiki01.processes-collectd.ps_code 2719744.000000 1409167037
vbox_twiki01.processes-collectd.ps_stacksize 2208.000000 1409167037
replacing collectd with netcat piping output into cc relay and catching cc_relay output with netcat:
terminal1$ cat ./test_metrics.txt | nc 127.0.0.1 2103
terminal2$ nc -l 2023
vbox_twiki01.processes-monit.ps_vm 127668224.000000 1409165887
vbox_twiki01.processes-monit.ps_rss 933888.000000 1409165887
vbox_twiki01.processes-monit.ps_data 78196736.000000 1409165887
vbox_twiki01.processes-monit.ps_code 7028736.000000 1409165887
vbox_twiki01.processes-monit.ps_stacksize 752.000000 1409165887
vbox_twiki01.processes-monit.ps_cputime.user 0.000000 1409165887
vbox_twiki01.processes-monit.ps_cputime.syst 0.000000 1409165887
vbox_twiki01.processes-monit.ps_count.processes 1.000000 1409165887
vbox_twiki01.processes-monit.ps_count.threads 2.000000 1409165887
vbox_twiki01.processes-monit.ps_pagefaults.minflt 0.000000 1409165887
vbox_twiki01.processes-monit.ps_pagefaults.majflt 0.000000 1409165887
vbox_twiki01.processes-monit.ps_disk_octets.read 10299.084952 1409165887
configuration for carbon c relay:
cluster localhost
forward
127.0.0.1:2023
;
match *
send to localhost
stop
;
start command for cc relay:
/usr/bin/relay -p 2103 -w 4 -d -s -f /etc/relay.conf
collectd write_graphite configuration:
<Plugin write_graphite>
<Node "cc_relay">
Host "localhost"
Port "2103"
Protocol "tcp"
LogSendErrors true
</Node>
</Plugin>
Hello,
We are running into a degradation issue with carbon-c-relay: the symptoms are increased CPU usage, eventually getting stuck at 100% per thread, and performance degradation (less throughput and dropped metrics) resulting from the increased CPU usage.
We have tested with one worker and with multiple workers; same behaviour. With multiple workers, threads start consuming 100% CPU until all of them do.
The stuck threads have the following output from strace:
<..>
read(5, 0x2b78c00008c5, 8095) = -1 EAGAIN (Resource temporarily unavailable)
read(5, 0x2b78c00008c5, 8095) = -1 EAGAIN (Resource temporarily unavailable)
read(5, 0x2b78c00008c5, 8095) = -1 EAGAIN (Resource temporarily unavailable)
[... the same EAGAIN read on fd 5 repeats dozens of times ...]
nanosleep({0, 213000000}, NULL) = 0
<repeating..>
Whereas healthy threads have output similar to:
read(6, 0x2aea80004925, 8095) = -1 EAGAIN (Resource temporarily unavailable)
nanosleep({0, 215000000}, NULL) = 0
read(6, 0x2aea80004925, 8095) = -1 EAGAIN (Resource temporarily unavailable)
nanosleep({0, 298000000}, NULL) = 0
read(6, 0x2aea80004925, 8095) = -1 EAGAIN (Resource temporarily unavailable)
nanosleep({0, 209000000}, NULL) = 0
read(6, 0x2aea80004925, 8095) = -1 EAGAIN (Resource temporarily unavailable)
nanosleep({0, 207000000}, NULL) = 0
read(6, 0x2aea80004925, 8095) = -1 EAGAIN (Resource temporarily unavailable)
nanosleep({0, 257000000}, NULL) = 0
read(6, 0x2aea80004925, 8095) = -1 EAGAIN (Resource temporarily unavailable)
nanosleep({0, 133000000}, NULL) = 0
read(6, "<host>.<metric>"..., 8062) = 8062
It appears that in the high-CPU case, many reads are done between each sleep on what appears to be a socket with incoming metrics, where normally there would be one read between sleeps.
We have not been able to replicate the issue with fake data in a self-contained test as of yet; we will follow up as soon as we do. In the meantime, we would appreciate any pointers to where the issue might lie, if you have any ideas.
Thanks!
It seems to be missing support for listening on a UDP port. I'm looking to do UDP -> TCP or UDP -> UDP relaying. Am I missing an option?
my config:
interval 10
expire 10
method sum
Every point represents 10s. However, the output of the aggregator always lags 10s behind the result computed via graphite's sumSeries function, which is annoying.
It would be helpful if aggregation supported rate calculation.
For example, network inflow or outflow data collected by collectd is monotonically increasing; we need to subtract two points and divide by the elapsed time to get bps.
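Until such support exists, the derivation is simple enough to sketch outside the relay. A hypothetical helper (not part of carbon-c-relay) that turns two samples of a monotonically increasing counter into a per-second rate:

```python
def counter_rate(prev, curr):
    """Derive a per-second rate from two (value, timestamp) samples of a
    monotonically increasing counter, e.g. collectd interface octets."""
    (v0, t0), (v1, t1) = prev, curr
    if t1 <= t0 or v1 < v0:  # counter reset or out-of-order sample: no rate
        return None
    return (v1 - v0) / (t1 - t0)

# 1048576 bytes transferred over 10 s -> 104857.6 bytes/s
# (multiply by 8 for bits per second)
rate = counter_rate((1000000, 1419838400), (2048576, 1419838410))
```

The reset guard matters for exactly the collectd case described: when an interface counter wraps or the daemon restarts, emitting no point beats emitting a huge negative rate.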
After reviewing the documentation, I don't know how to send to a cluster in fail-over mode.
consider this cluster configuration
cluster A
ip_A_datacenter1
ip_A_datacenter2
cluster B
ip_B_datacenter1
ip_B_datacenter2
cluster C
ip_C_datacenter1
ip_C_datacenter2
match "^metricstoA\.*"
send to A
match "^metricstoB\.*"
send to B
match "^metricstoC\.*"
send to C
What I need is to send only to the first host in each cluster (ip_X_datacenter1) and, only if that one fails, switch to the other (ip_X_datacenter2); it should also check ip_X_datacenter1's availability so traffic comes back to the main server after the host has recovered.
How can I do that?
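Newer versions of carbon-c-relay provide a failover cluster type that behaves this way: all metrics go to the first server, and later servers are used only while earlier ones are down. A sketch for one of the clusters, using the placeholder hostnames from the question and an assumed port:

```
cluster A
    failover
        ip_A_datacenter1:2003
        ip_A_datacenter2:2003
    ;
match ^metricstoA\.
    send to A
    ;
```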
It would be good to have logrotate-friendly log support inside carbon-c-relay, with reopening of log files on SIGUSR1 or similar.
Good evening,
It looks like there may be some issue with the routing of the performance statistics when not running with debug enabled:
Running relay with debug enabled
$ relay -f /etc/cc_relay.conf -p 2103 -w 2 -b 2500 -q 25000 -d -H maxwell
[2014-09-02 20:14:19] starting carbon-c-relay v0.31 (2014-09-02)
configuration:
relay hostname = maxwell
listen port = 2103
workers = 2
send batch size = 2500
server queue size = 25000
debug = true
routes configuration = /etc/cc_relay.conf
parsed configuration follows:
cluster localhost
forward
127.0.0.1:2003
;
match *
send to localhost
;
listening on tcp4 0.0.0.0 port 2103
listening on UNIX socket /tmp/.s.carbon-c-relay.2103
starting 2 workers
starting statistics collector
[2014-09-02 20:14:38] failed to send() to 127.0.0.1:2003: Broken pipe
[2014-09-02 20:14:41] server 127.0.0.1:2003: OK
carbon.relays.maxwell.dispatcher1.metricsReceived 920 1409685319
carbon.relays.maxwell.dispatcher1.wallTime_us 15045 1409685319
carbon.relays.maxwell.dispatcher2.metricsReceived 1250 1409685319
carbon.relays.maxwell.dispatcher2.wallTime_us 21912 1409685319
carbon.relays.maxwell.metricsReceived 2170 1409685319
carbon.relays.maxwell.dispatch_wallTime_us 36957 1409685319
carbon.relays.maxwell.dispatch_busy 0 1409685319
carbon.relays.maxwell.dispatch_idle 2 1409685319
carbon.relays.maxwell.destinations.127_0_0_1:2003.sent 2170 1409685319
carbon.relays.maxwell.destinations.127_0_0_1:2003.queued 0 1409685319
carbon.relays.maxwell.destinations.127_0_0_1:2003.dropped 0 1409685319
carbon.relays.maxwell.destinations.127_0_0_1:2003.wallTime_us 19995 1409685319
carbon.relays.maxwell.destinations.internal.sent 0 1409685319
carbon.relays.maxwell.destinations.internal.queued 0 1409685319
carbon.relays.maxwell.destinations.internal.dropped 0 1409685319
carbon.relays.maxwell.destinations.internal.wallTime_us 0 1409685319
carbon.relays.maxwell.metricsSent 2170 1409685319
carbon.relays.maxwell.metricsQueued 0 1409685319
carbon.relays.maxwell.metricsDropped 0 1409685319
carbon.relays.maxwell.server_wallTime_us 19995 1409685319
carbon.relays.maxwell.connections 217 1409685319
carbon.relays.maxwell.disconnects 217 1409685319
^Ccaught SIGINT, terminating...
[2014-09-02 20:15:56] shutting down...
[2014-09-02 20:15:56] listener for port 2103 closed
[2014-09-02 20:15:57] collector stopped
[2014-09-02 20:15:57] stopped worker 1 2 3 (2014-09-02 20:15:58)
[2014-09-02 20:15:58] routing stopped
Running relay without debug enabled:
relay -f /etc/cc_relay.conf -p 2103 -w 2 -b 2500 -q 25000 -H maxwell
[2014-09-02 20:19:12] starting carbon-c-relay v0.31 (2014-09-02)
configuration:
relay hostname = maxwell
listen port = 2103
workers = 2
send batch size = 2500
server queue size = 25000
routes configuration = /etc/cc_relay.conf
parsed configuration follows:
cluster localhost
forward
127.0.0.1:2003
;
match *
send to localhost
;
listening on tcp4 0.0.0.0 port 2103
listening on UNIX socket /tmp/.s.carbon-c-relay.2103
starting 2 workers
starting statistics collector
[2014-09-02 20:20:12] failed to send() to internal:2103: Socket operation on non-socket
[2014-09-02 20:20:12] failed to send() to internal:2103: Socket operation on non-socket
[2014-09-02 20:20:12] failed to send() to internal:2103: Socket operation on non-socket
[2014-09-02 20:20:13] failed to send() to internal:2103: Socket operation on non-socket
[2014-09-02 20:20:13] failed to send() to internal:2103: Socket operation on non-socket
[2014-09-02 20:20:13] failed to send() to internal:2103: Socket operation on non-socket
[2014-09-02 20:20:13] failed to send() to internal:2103: Socket operation on non-socket
[2014-09-02 20:20:14] failed to send() to internal:2103: Socket operation on non-socket
[2014-09-02 20:20:14] failed to send() to internal:2103: Socket operation on non-socket
...........
I also noticed that a UNIX socket has been opened when not running in debug mode
Cheers,
Matthew.
Currently we cannot use 0 in expire, thus the output from the aggregator lags behind that computed by graphite functions (sum, avg, ...), which is annoying.
Hi,
I was recently setting up carbon-c-relay and had trouble getting aggregations to work. It seems that the aggregation configs must appear before the 'match' sections.
Eg. this did not work:
cluster caches
fnv1a_ch
127.0.0.1:2401 proto udp
127.0.0.1:2402 proto udp
127.0.0.1:2403 proto udp
127.0.0.1:2404 proto udp
;
match *
send to remotes
stop
;
# Metric format: metrics.<dc>.<type>.host.<host>.<metric>
aggregate
^metrics\.([^.]+)\.([^.]+)\.host\.([^.]+)\.(.+)$
every 10 seconds
expire after 10 seconds
compute sum write to
metrics.\1.\2.all.sum.\4
compute average write to
metrics.\1.\2.all.avg.\4
;
But this did:
cluster caches
fnv1a_ch
127.0.0.1:2401 proto udp
127.0.0.1:2402 proto udp
127.0.0.1:2403 proto udp
127.0.0.1:2404 proto udp
;
# Metric format: metrics.<dc>.<type>.host.<host>.<metric>
aggregate
^metrics\.([^.]+)\.([^.]+)\.host\.([^.]+)\.(.+)$
every 10 seconds
expire after 10 seconds
compute sum write to
metrics.\1.\2.all.sum.\4
compute average write to
metrics.\1.\2.all.avg.\4
;
match *
send to remotes
stop
;
This was running with version 0.39.
Is this behavior intentional, or a bug? The example configs in the README follow the non-working format, so at the least the README should be updated to avoid confusion.
Note that the above config is slightly modified from what I actually ran. Apologies, but I haven't had time yet to put together a verified set of configs and script to reproduce the problem, but the above should work. If it doesn't, I can get back to you with those (or better, a Vagrantfile).
Finally, thank you for your work on carbon-c-relay. I find it to be a real improvement over the carbon-relay and aggregator: easier configuration, more flexible routing, and much more efficient.
Thanks again
-Evan
Would be helpful if there was an option to run this process as a daemon, allowing easy init control.
How are carbon_ch/fnv1a_ch hashes calculated?
On start of carbon-c-relay?
Where is this information stored?
Basically, I want to know: if I use carbon_ch/fnv1a_ch hashing, is the hash recalculated every time carbon-c-relay is restarted? And how do I rebalance if that's the case? I'm thinking a rule-based approach may be better.
All questions above are in regard to sharding, not to distributing among local carbon-caches.
-Thanks
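For intuition, here is a generic consistent-hash ring (a simplified sketch, not carbon-c-relay's actual implementation): the ring is derived purely from the configured server list, so it is rebuilt identically on every restart and nothing needs to be stored on disk. Rebalancing is only a concern when the server list itself changes.

```python
import hashlib

def ring_positions(servers, replicas=100):
    """Build a consistent-hash ring: each server is hashed onto the ring
    at several positions derived purely from its name."""
    ring = []
    for server in servers:
        for i in range(replicas):
            h = hashlib.md5(f"{server}:{i}".encode()).hexdigest()
            ring.append((int(h[:8], 16), server))
    ring.sort()
    return ring

def server_for(metric, ring):
    """Walk the ring clockwise from the metric's hash position."""
    point = int(hashlib.md5(metric.encode()).hexdigest()[:8], 16)
    for pos, server in ring:
        if pos >= point:
            return server
    return ring[0][1]  # wrap around past the highest position

ring = ring_positions(["10.0.0.1:2003", "10.0.0.2:2003"])
```

Because ring_positions depends only on the server names, restarting the process cannot change which server a metric maps to; only adding or removing servers moves (a fraction of) the keys.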
Hello!
I have a problem with consistent-hashing.
My cluster configuration:
node1 - haproxy and 3 carbon-relays in consistent-hashing mode, replication factor 2
node2, node3, node4 - a carbon-relay and 4 carbon-caches on each.
When I try to use carbon-c-relay on node1 instead of the 3 carbon-relays, I face the following problem: the way carbon-c-relay routes metrics differs from the way the original graphite does it.
The order of nodes and the replication factor in the carbon-c-relay config are the same as in carbon.conf. I'm ready to give some extra information.
carbon-c-relay.conf
cluster oldcluster
carbon_ch replication 2
mynode2:2013 proto tcp
mynode3:2013 proto tcp
mynode4:2013 proto tcp
;
match *
send to oldcluster
stop
;
carbon.conf
RELAY_METHOD = consistent-hashing
REPLICATION_FACTOR = 2
DESTINATIONS = mynode2:2014:1, mynode3:2014:2, mynode4:2014:3
[relay:1]
LINE_RECEIVER_PORT = 2103
PICKLE_RECEIVER_PORT = 2104
[relay:2]
LINE_RECEIVER_PORT = 2203
PICKLE_RECEIVER_PORT = 2204
[relay:3]
LINE_RECEIVER_PORT = 2303
PICKLE_RECEIVER_PORT = 2304
Hi,
we use carbon c relay on both graphite/carbon servers and on clients.
One of the things we want to do on clients is to aggregate cpu metrics as provided by collectd every 10 seconds.
To do this I understand that we shall need a configuration something like:
aggregate
*.system.cpu-[0-9]+.cpu-idle
every 10 seconds
expire after 5 seconds
compute average write to
maxwell.vbox.cpu-all.average
for each of the processor states idle, interrupt, nice, softirq, steal, system, user, wait, assuming that as this is a client server it will see metrics only from itself.
It would be really useful, from a configuration management point of view, if the configuration could be written as:
aggregate
*.system.cpu-[0-9]+.cpu-idle
every 10 seconds
expire after 5 seconds
compute average write to
$HOSTNAME.cpu-all.average
I saw the previous feature request for back references and that would be the best of all. However, I understand that the amount of effort to do so would be great.
Another thing. It has been some time since I have done any C but I have looked at your code and I hope to be able to contribute a small patch to vary the time between statistic emission. Our carbon stack emits statistics every 10 seconds, it would be nice for carbon c relay to do likewise.
Can we have a configuration option to change the default prefix for the relay's own metrics?
Meaning, instead of always using carbon.relays, use a configured value. This exists in carbon's cache and relay, and is used by some setups instead of the default carbon.<..> prefixes.
Also helps with situations where carbon-c-relay is talking to multiple carbon-caches across multiple nodes where you'd need multiple carbon-c-relays to replicate the built in carbon-relay setup and avoid infinite loops.
In this case both carbon-c-relays would be trying to write metrics to the same location, overwriting each other.
Thanks for the great work as always.
Greetings,
In issue #14, one of your terminal outputs shows that you have carbon-c-relay listening on ::, thus having the ability to listen on an IPv6 address.
Looking through the code, it isn't obvious to me how to get carbon-c-relay to listen on both IPv4 and IPv6. I admit my C knowledge is sub-par (I mostly deal with Bash, Python, Ruby, and Go), but is there any way to get the daemon to listen on both address families?
From a quick reading, it seems to me that migrating to C++ would make the code more natural: "dispatcher.c" and the queue already look like OOP written in C. Migrating to C++ (using only a small subset of it, e.g. classes and perhaps the new atomic primitives) would make the code more readable without any performance penalty.
When using carbon_ch
, the last node in the list always receives the metrics:
cluster default
carbon_ch
127.0.0.1:2003
127.0.0.1:2103
127.0.0.1:2203
127.0.0.1:2303
;
match *
send to default
stop
;
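For what it's worth, this is the behaviour I would expect from a consistent-hash cluster. The sketch below models a carbon_ch-style ring in Python (32-bit FNV-1a with virtual nodes; illustrative only, not carbon-c-relay's actual hashing), where every node ends up owning a share of the keys rather than the last node receiving everything:

```python
from bisect import bisect
from collections import Counter

def fnv1a(data: bytes) -> int:
    # 32-bit FNV-1a; deterministic, cheap, reasonable spread for a sketch.
    h = 0x811C9DC5
    for b in data:
        h = ((h ^ b) * 0x01000193) & 0xFFFFFFFF
    return h

NODES = ["127.0.0.1:2003", "127.0.0.1:2103", "127.0.0.1:2203", "127.0.0.1:2303"]
REPLICAS = 100  # virtual points per server; smooths the distribution

# The ring: sorted (hash, node) pairs over every virtual point.
ring = sorted(
    (fnv1a(f"{node}-{i}".encode()), node)
    for node in NODES
    for i in range(REPLICAS)
)
hashes = [h for h, _ in ring]

def node_for(metric: str) -> str:
    # Walk clockwise to the first ring point at or after the metric's hash.
    return ring[bisect(hashes, fnv1a(metric.encode())) % len(ring)][1]

counts = Counter(node_for(f"servers.host{i}.cpu.idle") for i in range(10000))
# A healthy ring gives every node a share; one node receiving everything
# points at ring construction rather than at the inputs.
assert len(counts) == len(NODES)
```

If a ring like this still sent everything to one node, the fault would be in how ring points are inserted or searched, not in the metric names.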
Hi @grobian, as I'm discussing on the carbon issue (graphite-project/carbon#333), I would like to implement a scheme that seems impossible with the standard carbon-cache.py and carbon-relay.py.
Could carbon-c-relay replace carbon-relay.py in that situation? What would a config file for this scheme look like?
Thank you very much!
Is it possible to support multi-threading for the aggregator?
I have a scenario where performance seems to be bottlenecked at the aggregation step: I receive millions of metrics every minute, and all of them need to be aggregated. The configuration file looks like this:
cluster local_carbon
forward
192.168.0.138:2013
;
match ^metrics.all.*
send to local_carbon
stop
;
aggregate
metrics.*.api.ac\.([^.]+)\.([^.]+)\.([^.]+)\.([^.]+)\.([^.]+)\.count
every 10 seconds
expire after 50 seconds
compute sum write to
metrics.all.api.ac.\1.\2.\3.\4.\5.count
compute sum write to
metrics.all.api.ac.\1.\2.all.\4.\5.count
compute sum write to
metrics.all.api.ac.\1.\2.\3.all.\5.count
compute sum write to
metrics.all.api.ac.\1.\2.all.all.\5.count
;
and I have glanced at the code (aggregator.c) a little; maybe it could be changed to bind one thread to each aggregator, instead of one thread for all aggregators?
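To illustrate the idea, here is a sketch of sharding aggregation buckets across locks so that the same metric always lands on the same shard and no global lock is needed; the class and names are hypothetical and not taken from aggregator.c:

```python
import threading
from collections import defaultdict

class ShardedAggregator:
    """Hypothetical sketch: one lock and bucket map per shard, instead of
    a single aggregator owning every bucket."""

    def __init__(self, nshards: int = 4):
        self.nshards = nshards
        self.locks = [threading.Lock() for _ in range(nshards)]
        self.sums = [defaultdict(float) for _ in range(nshards)]

    def _shard(self, metric: str) -> int:
        # The same metric always maps to the same shard, so per-metric
        # totals stay consistent without a global lock.
        return hash(metric) % self.nshards

    def put(self, metric: str, value: float) -> None:
        s = self._shard(metric)
        with self.locks[s]:
            self.sums[s][metric] += value

    def total(self, metric: str) -> float:
        s = self._shard(metric)
        with self.locks[s]:
            return self.sums[s][metric]

agg = ShardedAggregator()

def feed():
    for _ in range(1000):
        agg.put("metrics.all.api.count", 1.0)

threads = [threading.Thread(target=feed) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Four threads each adding 1.0 a thousand times under the shard lock.
assert agg.total("metrics.all.api.count") == 4000.0
```

Metrics for different shards can then be processed concurrently, which is the gain multi-threading would buy here.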
This may not be the proper place for this question but can carbon-c-relay accept/send metrics via pickle? We are moving away from the original carbon-relay and have some applications sending pickled data.
Carbon-c-relay does not accept tabs as separators in line-protocol metrics, but carbon-cache and the built-in carbon-relay do.
This causes issues with clients that send tab-separated metrics, such as sensu, which the built-in tools accept but carbon-c-relay does not.
For a critical production environment it would be interesting if carbon-c-relay could periodically reload its configuration files itself, without restarting the process.
Would that be possible in the mid term?
Some systems emit metrics over UDP; it would be better if carbon-c-relay supported UDP input.
Morning,
It looks like rewrites are performed before matches, so that a match placed before a rewrite is sent data which has already been subject to the rewrite.
From the documentation:
match * send to old;
rewrite ... ;
match * send to new;
I understand that this is not the intended behaviour.
Methodology:
relay invocation in terminal session 1:
$ /usr/bin/relay -p 4000 -w 2 -b 2500 -q 25000 -H maxwell -f ./relay.test.conf
[2014-09-11 11:07:19] starting carbon-c-relay v0.32 (2014-09-10)
configuration:
relay hostname = maxwell
listen port = 4000
workers = 2
send batch size = 2500
server queue size = 25000
routes configuration = ./relay.test.conf
parsed configuration follows:
cluster new
forward
127.0.0.1:5000
;
cluster old
forward
127.0.0.1:6000
;
match ^carbon\.relays\..*$
send to blackhole
stop
;
match *
send to old
;
rewrite ^foo\.(.*)
into bar.\1
;
match *
send to new
;
listening on tcp4 0.0.0.0 port 4000
listening on UNIX socket /tmp/.s.carbon-c-relay.4000
starting 2 workers
starting statistics collector
Listener for cluster old in terminal 2:
nc -kl 6000
Listener for cluster new in terminal 3:
nc -kl 5000
Feed data into graphite from terminal 4:
echo "foo.monkey 4 `date +%s`" | nc 127.0.0.1 4000
Expected result in cluster old:
foo.monkey 4 1410433961
Expected result in cluster new:
bar.monkey 4 1410433961
Actual result in cluster old:
bar.monkey 4 1410433961
Actual result in cluster new:
bar.monkey 4 1410433961
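For clarity, the top-to-bottom semantics I expected from the documentation can be modelled like this (a toy sketch, not the relay's router):

```python
import re

def route(metric: str, rules):
    # Rules applied in order: a match listed before a rewrite should see
    # the original name; one listed after should see the rewritten name.
    seen = {}
    for kind, pattern, target in rules:
        if kind == "match":
            if re.match(pattern, metric):
                seen[target] = metric
        elif kind == "rewrite":
            metric = re.sub(pattern, target, metric)
    return seen

rules = [
    ("match", r".*", "old"),
    ("rewrite", r"^foo\.(.*)", r"bar.\1"),
    ("match", r".*", "new"),
]

result = route("foo.monkey", rules)
assert result == {"old": "foo.monkey", "new": "bar.monkey"}
```

Under these semantics, cluster old receives foo.monkey and cluster new receives bar.monkey, matching the expected results above.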
Is it possible to use a rewrite rule to convert all metric names to lowercase?
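For concreteness, the effect I'm after, modelled outside the relay (a client-side Python sketch, not relay rewrite syntax):

```python
def lowercase_name(line: str) -> str:
    # Lowercase only the metric name; the value and timestamp parts of
    # the line-protocol record are left untouched.
    name, sep, rest = line.partition(" ")
    return name.lower() + sep + rest

assert lowercase_name("Foo.BAR.baz 4 1410433961") == "foo.bar.baz 4 1410433961"
```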
On a RHEL based system, a service relay start fails with exit code 1, if the service is already running. I would prefer the init script to behave like described in the LSB: http://refspecs.linuxbase.org/LSB_3.1.1/LSB-Core-generic/LSB-Core-generic/iniscrptact.html.
Pull request follows.
I've got a simple setup using any_of
in my config:
cluster default
any_of
127.0.0.1:2003
127.0.0.1:2103
127.0.0.1:2203
127.0.0.1:2303
;
match *
send to default
stop
;
After a small amount of time, relay
segfaults with:
Mar 13 02:52:25 sg1infmnp006 kernel: [46512476.556928] carbon-relay[3562]: segfault at 38 ip 00007fef26404e84 sp 00007fef25c19f40 error 4 in libpthread-2.15.so[7fef263fb000+18000]
Looking at it with gdb, I see:
Program terminated with signal 11, Segmentation fault.
#0 0x00007f8afad37e84 in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
This is on Ubuntu 12.04, which currently uses 2.15-0ubuntu10.5 of libc6. I've tested it out using carbon_ch instead and haven't had it segfault at all.
I know the relay can send its own statistics into the stats stream, because I'm sure I saw them during my testing. Now, however, they no longer show up. I've tried starting with debug, and they do print to stdout.
# Define the local carbon cache cluster
cluster local_cache
any_of
127.0.0.1:2003
127.0.0.1:2103
127.0.0.1:2203
127.0.0.1:2303
;
# Define the AWS storage cluster
cluster aws_store
carbon_ch
<ip>:2113
<ip>:2113
<ip>:2113
<ip>:2113
;
# Send everything to local
match *
send to local_cache
;
# Send everything to AWS
match *
send to aws_store
;
Hi,
In my environment some systems send metrics containing '*' and '/'.
I created some rules to change those characters to dots and remove the resulting double dots.
I tested them with the -t flag and they work like a charm, but when the metrics are routed to the next relay they come out as if some other rule had been applied.
Example:
input metric
bus.telemetry.feature.http:/some.url.com/api/sky/list*.GET.p999
rewritten metric
bus.telemetry.feature.http:.some.url.com.api.sky.list.GET.p999
output metric
bus.telemetry.feature.http:_some.url.com_api_sky_list_.GET.p999
What am I missing?
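In plain terms, the transformation my rules are meant to perform is this (a Python model of the intent, built from the example above; not the relay rules themselves):

```python
def intended_rewrite(metric: str) -> str:
    # Model of what the rewrite rules are meant to do: drop '*', turn
    # '/' into '.', and collapse any double dots that result.
    out = metric.replace("*", "").replace("/", ".")
    while ".." in out:
        out = out.replace("..", ".")
    return out

src = "bus.telemetry.feature.http:/some.url.com/api/sky/list*.GET.p999"
assert intended_rewrite(src) == "bus.telemetry.feature.http:.some.url.com.api.sky.list.GET.p999"
```

The output metric above instead shows '_' where the dropped or converted characters were, which is what makes it look as though a different rule ran on the way out.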
I've noticed that carbon-c-relay stops listening on the udp socket after some time (about a day). This is on rhel 6.6 with carbon-c-relay 0.39.
Can you please elaborate on the following options:
Options:
-w use <workers> worker threads, defaults to 16
-b server send batch size, defaults to 2500
-q server queue size, defaults to 25000
-d debug mode: currently writes statistics to log
-s submission mode: write info about errors to log
-t config test mode: prints rule matches from input on stdin
How do -w, -b and -q affect performance? -w seems to do nothing for me, and -q seems only to lower the number of metrics sent to carbon-cache.
Is there any chance you could license this under Apache 2 or MIT? GPL2 is completely out of the question for us.
Here's a trace from upstart logs:
[2014-09-17 10:54:56] failed to write() to 127.0.0.1:2013: Broken pipe
*** buffer overflow detected ***: /usr/local/bin/relay terminated
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x37)[0x7fe2dc89c287]
/lib/x86_64-linux-gnu/libc.so.6(+0x10a180)[0x7fe2dc89b180]
/lib/x86_64-linux-gnu/libc.so.6(+0x10b23e)[0x7fe2dc89c23e]
/usr/local/bin/relay[0x40886c]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a)[0x7fe2dcb58e9a]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fe2dc88573d]
======= Memory map: ========
00400000-0040e000 r-xp 00000000 09:02 1185849 /usr/local/bin/relay
0060d000-0060e000 r--p 0000d000 09:02 1185849 /usr/local/bin/relay
0060e000-0060f000 rw-p 0000e000 09:02 1185849 /usr/local/bin/relay
02018000-02039000 rw-p 00000000 00:00 0 [heap]
7fe29c000000-7fe29c021000 rw-p 00000000 00:00 0
7fe29c021000-7fe2a0000000 ---p 00000000 00:00 0
7fe2a4000000-7fe2a4021000 rw-p 00000000 00:00 0
7fe2a4021000-7fe2a8000000 ---p 00000000 00:00 0
7fe2a8000000-7fe2a8021000 rw-p 00000000 00:00 0
7fe2a8021000-7fe2ac000000 ---p 00000000 00:00 0
7fe2adeef000-7fe2b0000000 rw-p 00000000 00:00 0
7fe2b0000000-7fe2b007b000 rw-p 00000000 00:00 0
7fe2b007b000-7fe2b4000000 ---p 00000000 00:00 0
7fe2b4000000-7fe2b407c000 rw-p 00000000 00:00 0
7fe2b407c000-7fe2b8000000 ---p 00000000 00:00 0
7fe2b8000000-7fe2b807d000 rw-p 00000000 00:00 0
7fe2b807d000-7fe2bc000000 ---p 00000000 00:00 0
7fe2bc000000-7fe2bc07a000 rw-p 00000000 00:00 0
7fe2bc07a000-7fe2c0000000 ---p 00000000 00:00 0
7fe2c0000000-7fe2c007c000 rw-p 00000000 00:00 0
7fe2c007c000-7fe2c4000000 ---p 00000000 00:00 0
7fe2c4000000-7fe2c4087000 rw-p 00000000 00:00 0
7fe2c4087000-7fe2c8000000 ---p 00000000 00:00 0
7fe2c8000000-7fe2c807a000 rw-p 00000000 00:00 0
7fe2c807a000-7fe2cc000000 ---p 00000000 00:00 0
7fe2cc000000-7fe2cc079000 rw-p 00000000 00:00 0
7fe2cc079000-7fe2d0000000 ---p 00000000 00:00 0
7fe2d0000000-7fe2d10a9000 rw-p 00000000 00:00 0
7fe2d10a9000-7fe2d4000000 ---p 00000000 00:00 0
7fe2d5953000-7fe2d5968000 r-xp 00000000 09:02 9175092 /lib/x86_64-linux-gnu/libgcc_s.so.1
7fe2d5968000-7fe2d5b67000 ---p 00015000 09:02 9175092 /lib/x86_64-linux-gnu/libgcc_s.so.1
7fe2d5b67000-7fe2d5b68000 r--p 00014000 09:02 9175092 /lib/x86_64-linux-gnu/libgcc_s.so.1
7fe2d5b68000-7fe2d5b69000 rw-p 00015000 09:02 9175092 /lib/x86_64-linux-gnu/libgcc_s.so.1
7fe2d5b69000-7fe2d5b6a000 ---p 00000000 00:00 0
7fe2d5b6a000-7fe2d636a000 rw-p 00000000 00:00 0
7fe2d636a000-7fe2d636b000 ---p 00000000 00:00 0
7fe2d636b000-7fe2d6b6b000 rw-p 00000000 00:00 0
7fe2d6b6b000-7fe2d6b6c000 ---p 00000000 00:00 0
7fe2d6b6c000-7fe2d736c000 rw-p 00000000 00:00 0
7fe2d736c000-7fe2d736d000 ---p 00000000 00:00 0
7fe2d736d000-7fe2d7b6d000 rw-p 00000000 00:00 0
7fe2d7b6d000-7fe2d7b6e000 ---p 00000000 00:00 0
7fe2d7b6e000-7fe2d836e000 rw-p 00000000 00:00 0
7fe2d836e000-7fe2d836f000 ---p 00000000 00:00 0
7fe2d836f000-7fe2d8b6f000 rw-p 00000000 00:00 0
7fe2d8b6f000-7fe2d8b70000 ---p 00000000 00:00 0
7fe2d8b70000-7fe2d9370000 rw-p 00000000 00:00 0
7fe2d9370000-7fe2d9371000 ---p 00000000 00:00 0
7fe2d9371000-7fe2d9b71000 rw-p 00000000 00:00 0
7fe2d9b71000-7fe2d9b72000 ---p 00000000 00:00 0
7fe2d9b72000-7fe2da372000 rw-p 00000000 00:00 0
7fe2da372000-7fe2da373000 ---p 00000000 00:00 0
7fe2da373000-7fe2dab73000 rw-p 00000000 00:00 0
7fe2dab73000-7fe2dab74000 ---p 00000000 00:00 0
7fe2dab74000-7fe2db374000 rw-p 00000000 00:00 0
7fe2db374000-7fe2db375000 ---p 00000000 00:00 0
7fe2db375000-7fe2dbb75000 rw-p 00000000 00:00 0
7fe2dbb75000-7fe2dbb76000 ---p 00000000 00:00 0
7fe2dbb76000-7fe2dc376000 rw-p 00000000 00:00 0
7fe2dc376000-7fe2dc38c000 r-xp 00000000 09:02 9175268 /lib/x86_64-linux-gnu/libz.so.1.2.3.4
7fe2dc38c000-7fe2dc58b000 ---p 00016000 09:02 9175268 /lib/x86_64-linux-gnu/libz.so.1.2.3.4
7fe2dc58b000-7fe2dc58c000 r--p 00015000 09:02 9175268 /lib/x86_64-linux-gnu/libz.so.1.2.3.4
7fe2dc58c000-7fe2dc58d000 rw-p 00016000 09:02 9175268 /lib/x86_64-linux-gnu/libz.so.1.2.3.4
7fe2dc58d000-7fe2dc58f000 r-xp 00000000 09:02 9178486 /lib/x86_64-linux-gnu/libdl-2.15.so
7fe2dc58f000-7fe2dc78f000 ---p 00002000 09:02 9178486 /lib/x86_64-linux-gnu/libdl-2.15.so
7fe2dc78f000-7fe2dc790000 r--p 00002000 09:02 9178486 /lib/x86_64-linux-gnu/libdl-2.15.so
7fe2dc790000-7fe2dc791000 rw-p 00003000 09:02 9178486 /lib/x86_64-linux-gnu/libdl-2.15.so
7fe2dc791000-7fe2dc946000 r-xp 00000000 09:02 9178488 /lib/x86_64-linux-gnu/libc-2.15.so
7fe2dc946000-7fe2dcb46000 ---p 001b5000 09:02 9178488 /lib/x86_64-linux-gnu/libc-2.15.so
7fe2dcb46000-7fe2dcb4a000 r--p 001b5000 09:02 9178488 /lib/x86_64-linux-gnu/libc-2.15.so
7fe2dcb4a000-7fe2dcb4c000 rw-p 001b9000 09:02 9178488 /lib/x86_64-linux-gnu/libc-2.15.so
7fe2dcb4c000-7fe2dcb51000 rw-p 00000000 00:00 0
7fe2dcb51000-7fe2dcb69000 r-xp 00000000 09:02 9178479 /lib/x86_64-linux-gnu/libpthread-2.15.so
7fe2dcb69000-7fe2dcd68000 ---p 00018000 09:02 9178479 /lib/x86_64-linux-gnu/libpthread-2.15.so
7fe2dcd68000-7fe2dcd69000 r--p 00017000 09:02 9178479 /lib/x86_64-linux-gnu/libpthread-2.15.so
7fe2dcd69000-7fe2dcd6a000 rw-p 00018000 09:02 9178479 /lib/x86_64-linux-gnu/libpthread-2.15.so
7fe2dcd6a000-7fe2dcd6e000 rw-p 00000000 00:00 0
7fe2dcd6e000-7fe2dcf1f000 r-xp 00000000 09:02 9178635 /lib/x86_64-linux-gnu/libcrypto.so.1.0.0
7fe2dcf1f000-7fe2dd11f000 ---p 001b1000 09:02 9178635 /lib/x86_64-linux-gnu/libcrypto.so.1.0.0
7fe2dd11f000-7fe2dd13a000 r--p 001b1000 09:02 9178635 /lib/x86_64-linux-gnu/libcrypto.so.1.0.0
7fe2dd13a000-7fe2dd145000 rw-p 001cc000 09:02 9178635 /lib/x86_64-linux-gnu/libcrypto.so.1.0.0
7fe2dd145000-7fe2dd149000 rw-p 00000000 00:00 0
7fe2dd149000-7fe2dd16b000 r-xp 00000000 09:02 9178480 /lib/x86_64-linux-gnu/ld-2.15.so
7fe2dd2fc000-7fe2dd362000 rw-p 00000000 00:00 0
7fe2dd367000-7fe2dd36b000 rw-p 00000000 00:00 0
7fe2dd36b000-7fe2dd36c000 r--p 00022000 09:02 9178480 /lib/x86_64-linux-gnu/ld-2.15.so
7fe2dd36c000-7fe2dd36e000 rw-p 00023000 09:02 9178480 /lib/x86_64-linux-gnu/ld-2.15.so
7fff93756000-7fff93777000 rw-p 00000000 00:00 0 [stack]
7fff937ff000-7fff93800000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
[2014-09-17 10:55:01] starting carbon-c-relay v0.33 (02696d)
configuration:
relay hostname = IAD01-GRAPHITE02.INTERNAL.NET
listen port = 2003
workers = 8
send batch size = 2500
server queue size = 25000
routes configuration = /etc/relay.conf
parsed configuration follows:
cluster all
fnv1a_ch replication 1
127.0.0.1:2013
127.0.0.1:2113
;
match *
send to all
stop
;
listening on tcp4 0.0.0.0 port 2003
listening on UNIX socket /tmp/.s.carbon-c-relay.2003
starting 8 workers
starting statistics collector
[2014-09-17 10:55:01] failed to connect() to 127.0.0.1:2013: Connection refused
[2014-09-17 10:55:02] server 127.0.0.1:2013: OK
What other information can I provide?
I ran clang's scan-build (using my branch; see the PR for simple logging from my working account, with fixes for clang):
CC="clang" scan-build -o analyze make -j4
I then opened the report (an HTML page describing the full code flow) and saw some errors I think are worth reporting here:
Bug Group | Bug Type | File | Function/Method | Line | Path Length
Dead store | Dead assignment | router.c | router_optimise | 1231 | 1
Dead store | Dead assignment | collector.c | collector_runner | 187 | 1
Logic error | Dereference of null pointer | consistent-hash.c | ch_get_nodes | 238 | 11
Logic error | Dereference of null pointer | consistent-hash.c | ch_addnode | 179 | 12
Logic error | Dereference of null pointer | router.c | router_route_intern | 1538 | 14
Memory error | Memory leak | relay.c | main | 302 | 36
Memory error | Memory leak | router.c | router_readconfig | 747 | 54
Logic error | Result of operation is garbage or undefined | aggregator.c | aggregator_putmetric | 223 | 15
Logic error | Result of operation is garbage or undefined | aggregator.c | aggregator_putmetric | 242 | 19
The memory leaks are not that bad; they are corner cases in socket binding and config parsing where not every malloc'ed variable is freed before the program terminates (in router_readconfig you allocate w but don't free it when you hit 'unexpected end of file'; in relay.c you allocate workers but never free them if binding the socket fails).
About the two null-pointer dereferences in consistent-hash.c: in ch_get_nodes the flagged line is `ret[i].dest = w->server;`. I don't know if it can really happen other than as the result of a poorly crafted config, but still. In ch_addnode, if `ring->entries != NULL` but either w is NULL or `ring->hash_replicas` is less than 1, you get a null pointer dereference at line 179 (in my fork); the flagged line is `last->next = w`, because last will be NULL and you only check that w is not NULL. Again, I think ring->hash_replicas should never be 0, but there may be conditions where it is possible. Maybe these are all false positives, but it would be good if you had a look at the analyzer's output.
It is at least right about the dead assignments: at https://github.com/grobian/carbon-c-relay/blob/master/router.c#L1231 you set rwalk to 'bwalk->firstroute' in the "then" branch and immediately overwrite it with 'bwalk->lastroute' on line 1235. The same goes for https://github.com/grobian/carbon-c-relay/blob/master/collector.c#L187: the assignment is useless, because the value is never read.
I think it would be a good idea to have a sort of "fast restart" ability: e.g. when you add new aggregates, there should be a way to perform a fast reload of the rules.
I work with @nareshov. We are seeing the same thing as reported in issue #17; I just rebuilt from the latest on GitHub:
*** buffer overflow detected ***: /usr/local/bin/relay terminated
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x37)[0x7f8dce1c3387]
/lib/x86_64-linux-gnu/libc.so.6(+0x109280)[0x7f8dce1c2280]
/lib/x86_64-linux-gnu/libc.so.6(+0x10a33e)[0x7f8dce1c333e]
/usr/local/bin/relay[0x40a449]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a)[0x7f8dce47fe9a]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f8dce1ac8bd]
======= Memory map: ========
00400000-00410000 r-xp 00000000 09:02 5375089 /usr/local/bin/relay
00610000-00611000 r--p 00010000 09:02 5375089 /usr/local/bin/relay
00611000-00612000 rw-p 00011000 09:02 5375089 /usr/local/bin/relay
0095f000-009a5000 rw-p 00000000 00:00 0 [heap]
7f8d71eef000-7f8d74000000 rw-p 00000000 00:00 0
7f8d74000000-7f8d74021000 rw-p 00000000 00:00 0
7f8d74021000-7f8d78000000 ---p 00000000 00:00 0
7f8d78000000-7f8d78021000 rw-p 00000000 00:00 0
7f8d78021000-7f8d7c000000 ---p 00000000 00:00 0
7f8d7c000000-7f8d7c300000 rw-p 00000000 00:00 0
7f8d7c300000-7f8d80000000 ---p 00000000 00:00 0
7f8d80000000-7f8d8044a000 rw-p 00000000 00:00 0
7f8d8044a000-7f8d84000000 ---p 00000000 00:00 0
7f8d84000000-7f8d843fd000 rw-p 00000000 00:00 0
7f8d843fd000-7f8d88000000 ---p 00000000 00:00 0
7f8d88000000-7f8d8848f000 rw-p 00000000 00:00 0
7f8d8848f000-7f8d8c000000 ---p 00000000 00:00 0
7f8d8c000000-7f8d8c4bd000 rw-p 00000000 00:00 0
7f8d8c4bd000-7f8d90000000 ---p 00000000 00:00 0
7f8d90000000-7f8d903be000 rw-p 00000000 00:00 0
7f8d903be000-7f8d94000000 ---p 00000000 00:00 0
7f8d94000000-7f8d9444d000 rw-p 00000000 00:00 0
7f8d9444d000-7f8d98000000 ---p 00000000 00:00 0
7f8d98000000-7f8d983af000 rw-p 00000000 00:00 0
7f8d983af000-7f8d9c000000 ---p 00000000 00:00 0
7f8d9c000000-7f8d9c4b4000 rw-p 00000000 00:00 0
7f8d9c4b4000-7f8da0000000 ---p 00000000 00:00 0
7f8da0000000-7f8da0403000 rw-p 00000000 00:00 0
7f8da0403000-7f8da4000000 ---p 00000000 00:00 0
7f8da4000000-7f8da4022000 rw-p 00000000 00:00 0
7f8da4022000-7f8da8000000 ---p 00000000 00:00 0
7f8da8000000-7f8da84e4000 rw-p 00000000 00:00 0
7f8da84e4000-7f8dac000000 ---p 00000000 00:00 0
7f8dac000000-7f8dac3bb000 rw-p 00000000 00:00 0
7f8dac3bb000-7f8db0000000 ---p 00000000 00:00 0
7f8db0000000-7f8db0527000 rw-p 00000000 00:00 0
7f8db0527000-7f8db4000000 ---p 00000000 00:00 0
7f8db4000000-7f8db4452000 rw-p 00000000 00:00 0
7f8db4452000-7f8db8000000 ---p 00000000 00:00 0
7f8db87f9000-7f8db87fa000 ---p 00000000 00:00 0
7f8db87fa000-7f8db8ffa000 rw-p 00000000 00:00 0
7f8db8ffa000-7f8db8ffb000 ---p 00000000 00:00 0
7f8db8ffb000-7f8db97fb000 rw-p 00000000 00:00 0
7f8db97fb000-7f8db97fc000 ---p 00000000 00:00 0
7f8db97fc000-7f8db9ffc000 rw-p 00000000 00:00 0
7f8db9ffc000-7f8db9ffd000 ---p 00000000 00:00 0
7f8db9ffd000-7f8dba7fd000 rw-p 00000000 00:00 0
7f8dba7fd000-7f8dba7fe000 ---p 00000000 00:00 0
7f8dba7fe000-7f8dbaffe000 rw-p 00000000 00:00 0
7f8dbaffe000-7f8dbafff000 ---p 00000000 00:00 0
7f8dbafff000-7f8dbb7ff000 rw-p 00000000 00:00 0
7f8dbb7ff000-7f8dbb800000 ---p 00000000 00:00 0
7f8dbb800000-7f8dbc000000 rw-p 00000000 00:00 0
7f8dbc000000-7f8dbc4f6000 rw-p 00000000 00:00 0
7f8dbc4f6000-7f8dc0000000 ---p 00000000 00:00 0
7f8dc00fe000-7f8dc00ff000 ---p 00000000 00:00 0
7f8dc00ff000-7f8dc08ff000 rw-p 00000000 00:00 0
7f8dc17fb000-7f8dc17fc000 ---p 00000000 00:00 0
7f8dc17fc000-7f8dc1ffc000 rw-p 00000000 00:00 0
7f8dc1ffc000-7f8dc1ffd000 ---p 00000000 00:00 0
7f8dc1ffd000-7f8dc27fd000 rw-p 00000000 00:00 0
7f8dc27fd000-7f8dc27fe000 ---p 00000000 00:00 0
7f8dc27fe000-7f8dc2ffe000 rw-p 00000000 00:00 0
7f8dc2ffe000-7f8dc2fff000 ---p 00000000 00:00 0
7f8dc2fff000-7f8dc37ff000 rw-p 00000000 00:00 0
7f8dc37ff000-7f8dc3800000 ---p 00000000 00:00 0
7f8dc3800000-7f8dc4000000 rw-p 00000000 00:00 0
7f8dc4000000-7f8dc44fe000 rw-p 00000000 00:00 0
7f8dc44fe000-7f8dc8000000 ---p 00000000 00:00 0
7f8dc80fe000-7f8dc80ff000 ---p 00000000 00:00 0
7f8dc80ff000-7f8dc88ff000 rw-p 00000000 00:00 0
7f8dc88ff000-7f8dc8900000 ---p 00000000 00:00 0
7f8dc8900000-7f8dc9100000 rw-p 00000000 00:00 0
7f8dc9100000-7f8dc9101000 ---p 00000000 00:00 0
7f8dc9101000-7f8dc9901000 rw-p 00000000 00:00 0
7f8dc9901000-7f8dc9902000 ---p 00000000 00:00 0
7f8dc9902000-7f8dca102000 rw-p 00000000 00:00 0
7f8dca102000-7f8dca103000 ---p 00000000 00:00 0
7f8dca103000-7f8dca903000 rw-p 00000000 00:00 0
7f8dca903000-7f8dca904000 ---p 00000000 00:00 0
7f8dca904000-7f8dcb104000 rw-p 00000000 00:00 0
7f8dcbf77000-7f8dcbf8c000 r-xp 00000000 09:02 9175092 /lib/x86_64-linux-gnu/libgcc_s.so.1
7f8dcbf8c000-7f8dcc18b000 ---p 00015000 09:02 9175092 /lib/x86_64-linux-gnu/libgcc_s.so.1
7f8dcc18b000-7f8dcc18c000 r--p 00014000 09:02 9175092 /lib/x86_64-linux-gnu/libgcc_s.so.1
7f8dcc18c000-7f8dcc18d000 rw-p 00015000 09:02 9175092 /lib/x86_64-linux-gnu/libgcc_s.so.1
7f8dcc18d000-7f8dcc18e000 ---p 00000000 00:00 0
7f8dcc18e000-7f8dccb15000 rw-p 00000000 00:00 0
7f8dccb15000-7f8dccb16000 ---p 00000000 00:00 0
7f8dccb16000-7f8dcd49d000 rw-p 00000000 00:00 0
7f8dcd49d000-7f8dcd49e000 ---p 00000000 00:00 0
7f8dcd49e000-7f8dcdc9e000 rw-p 00000000 00:00 0
7f8dcdc9e000-7f8dcdcb4000 r-xp 00000000 09:02 9175268 /lib/x86_64-linux-gnu/libz.so.1.2.3.4
7f8dcdcb4000-7f8dcdeb3000 ---p 00016000 09:02 9175268 /lib/x86_64-linux-gnu/libz.so.1.2.3.4
7f8dcdeb3000-7f8dcdeb4000 r--p 00015000 09:02 9175268 /lib/x86_64-linux-gnu/libz.so.1.2.3.4
7f8dcdeb4000-7f8dcdeb5000 rw-p 00016000 09:02 9175268 /lib/x86_64-linux-gnu/libz.so.1.2.3.4
7f8dcdeb5000-7f8dcdeb7000 r-xp 00000000 09:02 9178524 /lib/x86_64-linux-gnu/libdl-2.15.so
7f8dcdeb7000-7f8dce0b7000 ---p 00002000 09:02 9178524 /lib/x86_64-linux-gnu/libdl-2.15.so
7f8dce0b7000-7f8dce0b8000 r--p 00002000 09:02 9178524 /lib/x86_64-linux-gnu/libdl-2.15.so
7f8dce0b8000-7f8dce0b9000 rw-p 00003000 09:02 9178524 /lib/x86_64-linux-gnu/libdl-2.15.so
7f8dce0b9000-7f8dce26d000 r-xp 00000000 09:02 9178502 /lib/x86_64-linux-gnu/libc-2.15.so
7f8dce26d000-7f8dce46d000 ---p 001b4000 09:02 9178502 /lib/x86_64-linux-gnu/libc-2.15.so
7f8dce46d000-7f8dce471000 r--p 001b4000 09:02 9178502 /lib/x86_64-linux-gnu/libc-2.15.so
7f8dce471000-7f8dce473000 rw-p 001b8000 09:02 9178502 /lib/x86_64-linux-gnu/libc-2.15.so
7f8dce473000-7f8dce478000 rw-p 00000000 00:00 0
7f8dce478000-7f8dce490000 r-xp 00000000 09:02 9178504 /lib/x86_64-linux-gnu/libpthread-2.15.so
7f8dce490000-7f8dce68f000 ---p 00018000 09:02 9178504 /lib/x86_64-linux-gnu/libpthread-2.15.so
7f8dce68f000-7f8dce690000 r--p 00017000 09:02 9178504 /lib/x86_64-linux-gnu/libpthread-2.15.so
7f8dce690000-7f8dce691000 rw-p 00018000 09:02 9178504 /lib/x86_64-linux-gnu/libpthread-2.15.so
7f8dce691000-7f8dce695000 rw-p 00000000 00:00 0
7f8dce695000-7f8dce847000 r-xp 00000000 09:02 9179198 /lib/x86_64-linux-gnu/libcrypto.so.1.0.0
7f8dce847000-7f8dcea46000 ---p 001b2000 09:02 9179198 /lib/x86_64-linux-gnu/libcrypto.so.1.0.0
7f8dcea46000-7f8dcea61000 r--p 001b1000 09:02 9179198 /lib/x86_64-linux-gnu/libcrypto.so.1.0.0
7f8dcea61000-7f8dcea6c000 rw-p 001cc000 09:02 9179198 /lib/x86_64-linux-gnu/libcrypto.so.1.0.0
7f8dcea6c000-7f8dcea70000 rw-p 00000000 00:00 0
7f8dcea70000-7f8dcea92000 r-xp 00000000 09:02 9178520 /lib/x86_64-linux-gnu/ld-2.15.so
7f8dceac9000-7f8dceafe000 r--s 00000000 09:02 9830433 /var/cache/nscd/hosts
7f8dceafe000-7f8dcec89000 rw-p 00000000 00:00 0
7f8dcec8e000-7f8dcec92000 rw-p 00000000 00:00 0
7f8dcec92000-7f8dcec93000 r--p 00022000 09:02 9178520 /lib/x86_64-linux-gnu/ld-2.15.so
7f8dcec93000-7f8dcec95000 rw-p 00023000 09:02 9178520 /lib/x86_64-linux-gnu/ld-2.15.so
7fff38f96000-7fff38fb7000 rw-p 00000000 00:00 0 [stack]
7fff38fff000-7fff39000000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
I have noticed that carbon-c-relay replaces the hash symbol with an underscore. Due to some technical debt, we actually have hash signs in our metric names in two locations. One example is below.
env.app.POS#.yadda.count
Other than a rewrite, is there a method to allow this or certain other characters in metric names?
Additionally, a rewrite didn't fix my issue but produced the error below. As I understand it, when a rewrite occurs before a match, the rewritten metric is not cleansed again; it would basically be attempting to rewrite a hash symbol into a hash symbol.
Error:
router_route: failed to rewrite metric: newmetric size too small to hold replacement
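To make the behaviour concrete, this is a model of the substitution we observe; the accepted character set here is inferred from examples (':' survives, '#', '*' and '/' do not), not taken from the relay's code:

```python
ALLOWED = set(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789-_:."
)

def relay_normalize(metric: str) -> str:
    # Model of the observed behaviour: characters outside the accepted
    # set are replaced with underscores.
    return "".join(c if c in ALLOWED else "_" for c in metric)

assert relay_normalize("env.app.POS#.yadda.count") == "env.app.POS_.yadda.count"
```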
Hi @grobian, are the carbon-c-relay daemon's own statistics documented anywhere?
I can't see any metrics related to resource consumption for each process (and perhaps each thread); do you have plans to add them?
Lots of thanks for your great work!