rcrowley / go-metrics
Go port of Coda Hale's Metrics library
License: Other
Would it make sense to be able to send counter deltas to Librato, so that in Librato we can use server-side aggregation and distributed event counting? http://support.metrics.librato.com/knowledgebase/articles/213775-what-is-service-side-aggregation-ssa
That would require the reporter to report counters as gauges in Librato, remember the last value sent for that counter, and subtract that from the value being reported.
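The delta bookkeeping described above could be sketched roughly like this (the `deltaReporter` type and its methods are hypothetical names for illustration, not part of go-metrics):

```go
package main

import "fmt"

// deltaReporter remembers the last value sent per counter name and
// returns only the increment since the previous report -- the idea
// behind feeding Librato's server-side aggregation with deltas.
type deltaReporter struct {
	last map[string]int64
}

func newDeltaReporter() *deltaReporter {
	return &deltaReporter{last: make(map[string]int64)}
}

// delta returns the change since the last call for this name and
// records the current value for the next round.
func (d *deltaReporter) delta(name string, current int64) int64 {
	diff := current - d.last[name]
	d.last[name] = current
	return diff
}

func main() {
	r := newDeltaReporter()
	fmt.Println(r.delta("requests", 10)) // first report: full value
	fmt.Println(r.delta("requests", 25)) // subsequent reports: only the increment
}
```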
Error message:
registry.MarshalJSON undefined (type metrics.Registry has no field or method MarshalJSON)
Code:
package main

import (
	"net/http"
	"time"

	"github.com/rcrowley/go-metrics"
)

func main() {
	registry := metrics.NewRegistry()
	metrics.RegisterRuntimeMemStats(registry)
	go metrics.CaptureRuntimeMemStats(registry, 2*time.Second)
	http.HandleFunc("/", func(res http.ResponseWriter, req *http.Request) {
		data, _ := registry.MarshalJSON() // This is where it breaks
		res.Write(data)
	})
	http.ListenAndServe("localhost:9999", nil)
}
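The compile error above is because the Registry interface in that version has no MarshalJSON method. The same JSON output can be produced by snapshotting the metrics into a plain map and encoding that with encoding/json; the `toyRegistry` type below is a hypothetical stand-in for a registry, sketching only the marshaling pattern:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// toyRegistry stands in for a metrics registry. Implementing
// json.Marshaler on a snapshot map is all that registry-to-JSON
// export amounts to.
type toyRegistry struct {
	metrics map[string]interface{}
}

// MarshalJSON encodes the current snapshot of registered metrics.
func (r *toyRegistry) MarshalJSON() ([]byte, error) {
	return json.Marshal(r.metrics)
}

func main() {
	r := &toyRegistry{metrics: map[string]interface{}{
		"requests.count": 42,
	}}
	data, err := json.Marshal(r) // dispatches to MarshalJSON
	if err != nil {
		panic(err)
	}
	fmt.Println(string(data))
}
```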
And allow custom tags to be provided with a metric.
(I may work on this, if I have time)
The current OpenTSDB seems to implement the socket version of messaging. It's very similar to the Graphite version. Looking at the Bosun documentation (said to support OpenTSDB 2.x) it appears to be limited to the http POST verb. I'm about to fork go-metrics in order to implement the feature but I wanted to get your opinion first.
Is Bosun supported? Is someone working on it? Would you accept a pull request?
g1 := metrics.GetOrRegisterGauge("the_same_key", metrics.DefaultRegistry)
g1.Update(100)
// this line panics because a StandardGauge is already registered under "the_same_key"
g2 := metrics.GetOrRegisterGaugeFloat64("the_same_key", metrics.DefaultRegistry)
panic: interface conversion: *metrics.StandardGauge is not metrics.GaugeFloat64: missing method Snapshot [recovered]
panic: interface conversion: *metrics.StandardGauge is not metrics.GaugeFloat64: missing method Snapshot
Would it be better to return an error instead of panicking?
It'd be nice to have a few lines in the README on how to use the samples. I'm trying to get a "recent data" histogram, but metrics.NewExpDecaySample(600, 0.015)
gives me almost exactly the same data as metrics.NewUniformSample(1800)
(I add data to the histogram once a second). I expected them to be less similar, so I'm wondering whether I'm doing it wrong.
Specifically, I expected Max/Min to "decay" too.
"since start" is the uniform sample, and "recent" is the ExpDecaySample from above:
http://ord1.ntppool.net:8053/status
Before I looked properly at the code and read the codahale documentation, I also set up a version with three different UniformSamples (600, 3600 and 86400 reservoirs): http://zrh2.ntppool.net:8053/status -- this was useful for showing that, at least for my use, the reservoir size doesn't seem to matter much.
Would it make sense for me to implement a sliding window reservoir for my "what's the data been in the last X minutes" use? Since I only update the histogram once a second it's not that much data.
Anyway, a couple of lines in the documentation with recommendations for how and when to use the different sample types would be really helpful.
For reference my code updating the metrics is in https://github.com/abh/geodns/blob/master/metrics.go
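The sliding window reservoir suggested above could look something like this (a minimal sketch with hypothetical names, assuming one update per second so that capacity maps directly to a time window):

```go
package main

import "fmt"

// slidingWindowSample keeps only the most recent `cap` values,
// dropping the oldest on overflow -- a "last X minutes" reservoir
// when updates arrive at a fixed rate.
type slidingWindowSample struct {
	values []int64
	cap    int
}

func newSlidingWindowSample(capacity int) *slidingWindowSample {
	return &slidingWindowSample{cap: capacity}
}

// Update appends a value, evicting the oldest when full.
func (s *slidingWindowSample) Update(v int64) {
	if len(s.values) == s.cap {
		s.values = s.values[1:] // drop oldest
	}
	s.values = append(s.values, v)
}

// Max scans the current window, so old spikes genuinely "decay" out.
func (s *slidingWindowSample) Max() int64 {
	var max int64
	for i, v := range s.values {
		if i == 0 || v > max {
			max = v
		}
	}
	return max
}

func main() {
	s := newSlidingWindowSample(3)
	for _, v := range []int64{100, 5, 7, 9} { // 100 falls out of the window
		s.Update(v)
	}
	fmt.Println(s.Max())
}
```

Re-slicing with `s.values[1:]` keeps the sketch short; a production version would likely use a ring buffer to avoid the backing array growing.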
When looking at my StatHat counters it was becoming apparent that the numbers were inflated by many times the actual count. The code appears to be sending the count to StatHat at the desired interval, but it doesn't appear to be sending a delta from the last count sent. Perhaps I'm reading it wrong.
Am I correct in thinking that StatHat is not expecting the entire count to be sent with each batch; simply the counter increments that have occurred since the last batch? It seems like sending the entire count each time, as opposed to the delta between the current count and the last count, could lead to the grossly inflated counts seen in the StatHat interface.
Hi. Thanks for putting the work into this repo.
I couldn't find an obvious way to create an ExpDecaySample or UniformSample type with a reservoir of float64 data points. Is this supported?
Hi, have you seen this: http://lk4d4.darth.io/posts/defer/ ?
I benchmarked it again against go 1.3.1 and I still see a quite big difference
BenchmarkPut 50000 38259 ns/op
BenchmarkPutDefer 50000 63552 ns/op
BenchmarkGet 50000 41260 ns/op
BenchmarkGetDefer 10000 107873 ns/op
Basically, defer doesn't seem to be fully optimized yet. Since go-metrics has several short functions that use defer, removing them could help measure with more precision. What do you think?
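The comparison above can be reproduced with a small micro-benchmark sketch (absolute numbers will vary by Go version and machine; the function names are illustrative):

```go
package main

import (
	"fmt"
	"sync"
	"testing"
)

var mu sync.Mutex
var counter int64

// incDefer uses the idiomatic defer-based unlock.
func incDefer() {
	mu.Lock()
	defer mu.Unlock()
	counter++
}

// incDirect unlocks explicitly, avoiding defer's overhead in a
// short hot function.
func incDirect() {
	mu.Lock()
	counter++
	mu.Unlock()
}

func main() {
	d := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			incDefer()
		}
	})
	e := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			incDirect()
		}
	})
	fmt.Println("defer:   ", d.NsPerOp(), "ns/op")
	fmt.Println("explicit:", e.NsPerOp(), "ns/op")
}
```

Both functions are behaviorally identical; only the unlock style differs, which isolates defer's cost.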
When we use the Librato reporter with go-metrics, it seems that the reporter will occasionally fail to send metrics to the Librato server. Here's a sample log:
2014/11/16 15:16:58 ERROR sending metrics to librato Post https://metrics-api.librato.com/v1/metrics: net/http: request canceled while waiting for connection
2014/11/16 15:17:28 ERROR sending metrics to librato Post https://metrics-api.librato.com/v1/metrics: net/http: request canceled while waiting for connection
2014/11/16 15:18:18 ERROR sending metrics to librato Post https://metrics-api.librato.com/v1/metrics: net/http: request canceled while waiting for connection
2014/11/16 15:20:38 ERROR sending metrics to librato Post https://metrics-api.librato.com/v1/metrics: net/http: request canceled while waiting for connection
2014/11/16 18:20:48 ERROR sending metrics to librato Post https://metrics-api.librato.com/v1/metrics: read tcp 50.16.193.59:443: use of closed network connection
2014/11/16 19:20:08 ERROR sending metrics to librato Post https://metrics-api.librato.com/v1/metrics: read tcp 54.225.79.3:443: use of closed network connection
2014/11/16 21:24:28 ERROR sending metrics to librato Post https://metrics-api.librato.com/v1/metrics: read tcp 107.22.245.166:443: use of closed network connection
2014/11/16 23:13:48 ERROR sending metrics to librato Post https://metrics-api.librato.com/v1/metrics: read tcp 54.225.214.213:443: use of closed network connection
While it does seem that the problem lies on Librato's side (server not responding?), does it make sense to retry the request at a later time, or should we just ignore this error completely?
When timing longer durations, the nanosecond-level output can be a little hard to read (lots of counting decimal places). Log could accept a scale argument to apply to timers when they are printed, allowing callers to customize output to their liking.
InfluxDB 0.9 has support for tags, which are indexed key/value pairs that can be used for fast and efficient queries (https://influxdb.com/docs/v0.9/introduction/overview.html). Also, without tags, you're very limited on the kind of queries you can perform on your data, so it's important to have support for that in go-metrics.
In an environment where you have many hosts (or many docker instances), it makes sense to aggregate the already aggregated stats. statsd can be used for that. Is there a better alternative? I'm happy to create a patch.
This commit appears to have broken reporting to Librato for timers (at least):
d4f1d62#diff-2a2c532f685bccc6e7eb27c6144a94caR11
From Librato support:
{"attributes":{"display_transform":"x/1000000000","display_units_short":"s"},"count":203896,"max":8.25364642e+08,"min":4.843613e+06,"name":"metric.timer.mean","period":5,"sum_squares":2.79663655195528e+25},
which gives the error (as you mentioned in email to Nik)
{"errors":{"params":{"sum":["is required"]}
I can confirm that this was not a change in API behavior. If you have a way to actually specify the "sum" field (because it's required if you specify "count"), that should fix the problem. Docs: http://dev.librato.com/v1/post/metrics "Gauge Specific Parameters"
Also, I'm fairly certain once that's fixed, you'd run into an error with the "4.843613e+06" notation.
influxdata/influxdb#1349 changed the Influx client. The current code no longer compiles:
.gobuild/src/github.com/rcrowley/go-metrics/influxdb/influxdb.go:19: undefined: client.ClientConfig
.gobuild/src/github.com/rcrowley/go-metrics/influxdb/influxdb.go:38: undefined: client.Series
.gobuild/src/github.com/rcrowley/go-metrics/influxdb/influxdb.go:44: undefined: client.Series
.gobuild/src/github.com/rcrowley/go-metrics/influxdb/influxdb.go:52: undefined: client.Series
.gobuild/src/github.com/rcrowley/go-metrics/influxdb/influxdb.go:60: undefined: client.Series
.gobuild/src/github.com/rcrowley/go-metrics/influxdb/influxdb.go:70: undefined: client.Series
.gobuild/src/github.com/rcrowley/go-metrics/influxdb/influxdb.go:82: undefined: client.Series
.gobuild/src/github.com/rcrowley/go-metrics/influxdb/influxdb.go:93: undefined: client.Series
.gobuild/src/github.com/rcrowley/go-metrics/influxdb/influxdb.go:106: client.WriteSeries undefined (type *client.Client has no field or method WriteSeries)
I found nothing that explains what these values mean. I could only infer the count...
metrics: 11:36:14.430331 timer Filter
metrics: 11:36:14.430371 count: 30
metrics: 11:36:14.430383 min: 19261
metrics: 11:36:14.430392 max: 429976180
metrics: 11:36:14.430434 mean: 103206457.83
metrics: 11:36:14.430448 stddev: 132111349.79
metrics: 11:36:14.430460 median: 31854654.00
metrics: 11:36:14.430470 75%: 163473873.00
metrics: 11:36:14.430481 95%: 426889010.20
metrics: 11:36:14.430491 99%: 429976180.00
metrics: 11:36:14.430501 99.9%: 429976180.00
metrics: 11:36:14.430511 1-min rate: 1.39
metrics: 11:36:14.430522 5-min rate: 1.39
metrics: 11:36:14.430533 15-min rate: 1.40
metrics: 11:36:14.430544 mean rate: 1.36
@matterkkila expressed concern about metrics being able to block progress in the context of forwarding to Graphite over TCP.
I think a UDP-like decoupling of metrics being updated from metrics being sent would be useful, especially in high-volume scenarios. The metrics should still be collected by the same process and otherwise function the same way. The write should just be nonblocking.
I'm writing a project that uses go-metrics and I only need to write stats once every 5 minutes. Right now it seems to be writing stats once every few seconds and I really do not need that precision. Is there a way to customize the time interval between metrics getting sent upstream?
In the spirit of how gauge is implemented in the original codahale metrics shouldn't the gauge type "pull values"? Maybe by accepting a function pointer and starting a goroutine to call the function on a fixed interval to retrieve value. It can also use a task manager to avoid spawning too many goroutines.
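A pull-based gauge does not even need a polling goroutine if the callback is invoked lazily at read time; a minimal sketch with hypothetical names:

```go
package main

import "fmt"

// funcGauge pulls its value from a callback each time it is read,
// mirroring codahale's Gauge<T>. No background goroutine is needed
// when the value is computed on demand.
type funcGauge struct {
	fn func() int64
}

func newFuncGauge(fn func() int64) *funcGauge {
	return &funcGauge{fn: fn}
}

// Value invokes the callback, so the reading is always current.
func (g *funcGauge) Value() int64 { return g.fn() }

func main() {
	queueDepth := int64(0)
	g := newFuncGauge(func() int64 { return queueDepth })
	queueDepth = 7
	fmt.Println(g.Value()) // reads the current value, not a stale one
}
```

A fixed-interval polling goroutine (or shared task manager, as suggested) would only be needed if the callback were expensive and its result should be cached between reads.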
The DefaultRegistry is a package global. This is a huge no-no in libraries.
https://github.com/rcrowley/go-metrics/blob/master/registry.go#L136
What this means: I can't cleanly use the go-metrics with go-tigertonic in a test for my application that uses go-tigertonic, because the metrics left over from the one webserver (even if since shutdown; but the global registry persists those values) will contaminate the next web server to startup.
I've worked around this for now by calling metrics.Unregister() on every metric that it crashes on.
Better would be a single metrics.ClearDefaultRegistry() call that would wipe all state.
Even better than that would be to not have any global state at all.
Thanks.
Jason
@armon is a maintainer of serf and statsite. He also has a project with the same name and goal as yours: https://github.com/armon/go-metrics
Perhaps you might consider combining effort?
I'm attempting to report timers to Graphite using millisecond units. With DurationUnit: time.Nanosecond (the default), they're off by a factor of a million. With DurationUnit: time.Millisecond, they're off by a trillion. Is this a problem of documentation (there aren't any DurationUnit examples), or should all the timer metrics be t.Foo()/du rather than du*t.Foo()? I'm happy to make the patch in either case.
It seems useful to also be able to export all metrics using expvar.
Hello! Could you point me in the right direction, please?
The only thing I need is to export metrics to Graphite (only 3 types, actually: counters, gauges and timers).
I looked through the code and examples, and it seems that counters and timers keep all previously collected data and are not cleared automatically after exporting. I can imagine that this is the only way to prevent collisions when using multiple exporters. But then what is the use of such counters and timers? Both just provide the sum of all events since application start, while we need the number of events (or timer values) since the last export: how many requests we got in a minute, how many times a function was called, and so on. In the case of timers: what was the average response time in the last minute?
I see a Clear() method on the Counter type, so I would be able to clear it manually. But Timer has no Clear() method.
So what's the verdict? Is this library not designed for the tasks I described above?
Other libraries to document:
It would be useful to have premade command-line handling, so that you can easily build command-line applications that can send to any supported metrics target. Something like
-metrics:librato=<authdata> -metrics:opentsdb=<authdata>
etc
Influxdb's client package broke their public API.
I'll try to create a snapshot from the last valid version and PR with a new import path for go-metrics.
In stathat.go, where stats are posted for metrics.Timer, the same timer name "mean" is used for both Mean and RateMean.
Hi,
in http://godoc.org/github.com/rcrowley/go-metrics#CaptureDebugGCStats there's no way to control the names of the registered metrics.
I have multiple apps that have to report GC statistics to an InfluxDB server, and if I use this function as is, all the metrics will get merged by InfluxDB, and that's no good.
If you have the same problem, how do you handle it ?
As the title says, my suggestion is to add another function like CaptureDebugGCStatsWithPrefix or something that takes a prefix and passes it all the way down to the register call.
What do you all think ? I can work on a PR maybe today or tomorrow.
Periodically log every metric in slightly-more-parseable form to syslog:
w, _ := syslog.Dial("unixgram", "/dev/log", syslog.LOG_INFO, "metrics")
go metrics.Syslog(metrics.DefaultRegistry, 60e9, w)
The Syslog function doesn't exist.
Is there any reason why there is no official way of stopping a reporter once you have called its reporting function?
Example (Log reporter):
https://github.com/rcrowley/go-metrics/blob/master/log.go#L10
In order to be more test-friendly and to be able to gracefully stop metric reporting it would be cool to have such feature. I could try working on a PR for this. What do you think?
Hello,
It looks like ExpDecaySample rescales incorrectly. Please have a look at func (s *ExpDecaySample) update():
func (s *ExpDecaySample) update(t time.Time, v int64) {
	s.mutex.Lock()
	defer s.mutex.Unlock()
	s.count++
	if s.values.Size() == s.reservoirSize {
		s.values.Pop()
	}
	s.values.Push(expDecaySample{
		k: math.Exp(t.Sub(s.t0).Seconds()*s.alpha) / rand.Float64(),
		v: v,
	})
	if t.After(s.t1) {
		values := s.values.Values()
		t0 := s.t0
		s.values = newExpDecaySampleHeap(s.reservoirSize)
		s.t0 = t
		s.t1 = s.t0.Add(rescaleThreshold)
		for _, v := range values {
			v.k = v.k * math.Exp(-s.alpha*float64(s.t0.Sub(t0)))
			s.values.Push(v)
		}
	}
}
When we calculate v.k the first time, we use seconds. But when we rescale it, we use nanoseconds instead of seconds, so after rescaling v.k will always be zero.
v.k = v.k * math.Exp(-s.alpha*float64(s.t0.Sub(t0)))
should be changed to
v.k = v.k * math.Exp(-s.alpha*float64(s.t0.Sub(t0).Seconds()))
What do you think ?
in runtime there is this nice code
for i := uint32(1); i <= memStats.NumGC-numGC; i++ {
	runtimeMetrics.MemStats.PauseNs.Update(int64(memStats.PauseNs[(memStats.NumGC%256-i)%256]))
}
but here is a problem: numGC doesn't seem to ever be updated, meaning the whole PauseNs metric is completely wrong.
Hi,
Is there a formula to choose appropriate reservoir size and alpha values for an exponentially decaying sample, so that they best represent the data that goes through them for a specified amount of time (e.g. the default Timer uses 1028 and 0.015, which is supposed to represent roughly the last five minutes of data)?
https://github.com/VividCortex/ewma looks simpler than ours.
Having the methods for creating and registering a metric on the registry itself would be a nice change to the API. For example:
metrics.DefaultRegistry.NewCounter("thing")
vs.
metrics.NewRegisteredCounter("thing", metrics.DefaultRegistry)
If I implement a new type of registry and it only works with a special type of counter, this would allow my registry to default to that type of counter when implementing the NewCounter function of the Registry interface.
Hi,
I need to measure request rate with a Meter, but the problem is that there are 2 app instances.
Can I just add app.instance1.Rate1 and app.instance2.Rate1 to get app.Rate1?
Thanks in advance
Tim
The code does not allow for non-integer metrics for gauge and meter.
This was not the case in the original codahale metrics and I would like to understand the reasoning behind this limitation.
This is a strange one, and is only happening on my 386 host. I'm working on shrinking the failing test. The following code will panic. go-metrics is at master. (I have go-metrics copied to a local directory -- the results are the same)
package main

import (
	"log"
	"sync/atomic"

	"./go-metrics"
)

type A struct {
	uncounted int64
}

func main() {
	a := &A{}
	var n int64 = 2
	atomic.AddInt64(&a.uncounted, n)
	log.Printf("count is %d", a.uncounted)
	metrics.NewEWMA1().Update(2)
	log.Printf("All done!")
}
I get the following results
$ go version && go run main.go
go version go1.1.2 linux/386
2013/09/16 20:19:29 count is 2
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x1 pc=0x80654fc]
goroutine 1 [running]:
sync/atomic.AddUint64()
/usr/local/go/src/pkg/sync/atomic/asm_386.s:69 +0xc
_/mnt/jenkins/tmp/gotest/go-metrics.(*StandardEWMA).Update(0x1824c0c0, 0x2, 0x0)
/mnt/jenkins/tmp/gotest/go-metrics/ewma.go:82 +0xd6
main.main()
/mnt/jenkins/tmp/gotest/main.go:19 +0xe6
goroutine 2 [syscall]:
goroutine 3 [runnable]:
exit status 2
Here's the offending line (which, notice, seems to work just fine in my main function above). a.uncounted points to a valid address (holding zero). Initializing it makes no difference.
atomic.AddInt64(&a.uncounted, n)
¯\_(⊙︿⊙)_/¯
Per rcrowley/go-tigertonic#75, go-metrics won't compile on App Engine because it uses runtime.NumCgoCall, among probably other verboten identifiers.
We're using a lot of timers in our application and they are updated fairly often. We've found in our benchmarking that a significant portion of our application's allocations are due to interface{} conversions in expDecaySampleHeap. It would be worthwhile to re-implement the heap code specialized for the expDecaySample type instead of using the heap package, avoiding the overhead of interface{}.
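The specialized heap would replace container/heap's interface{}-based Push/Pop with sift operations on a concrete slice; a minimal min-heap sketch (names illustrative) of that design:

```go
package main

import "fmt"

type expDecaySample struct {
	k float64
	v int64
}

// concreteHeap is a min-heap on k implemented directly on a concrete
// slice, so Push/Pop never box values into interface{} the way
// container/heap does.
type concreteHeap struct {
	s []expDecaySample
}

// Push appends and sifts the new element up to restore heap order.
func (h *concreteHeap) Push(x expDecaySample) {
	h.s = append(h.s, x)
	i := len(h.s) - 1
	for i > 0 {
		parent := (i - 1) / 2
		if h.s[parent].k <= h.s[i].k {
			break
		}
		h.s[parent], h.s[i] = h.s[i], h.s[parent]
		i = parent
	}
}

// Pop removes the minimum element and sifts the last element down.
func (h *concreteHeap) Pop() expDecaySample {
	min := h.s[0]
	n := len(h.s) - 1
	h.s[0] = h.s[n]
	h.s = h.s[:n]
	i := 0
	for {
		l, r := 2*i+1, 2*i+2
		smallest := i
		if l < n && h.s[l].k < h.s[smallest].k {
			smallest = l
		}
		if r < n && h.s[r].k < h.s[smallest].k {
			smallest = r
		}
		if smallest == i {
			break
		}
		h.s[i], h.s[smallest] = h.s[smallest], h.s[i]
		i = smallest
	}
	return min
}

func main() {
	h := &concreteHeap{}
	for _, k := range []float64{3, 1, 2} {
		h.Push(expDecaySample{k: k})
	}
	fmt.Println(h.Pop().k, h.Pop().k, h.Pop().k) // ascending by k
}
```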
with -race
I found:
==================
WARNING: DATA RACE
Read by goroutine 10:
github.com/Dieterbe/go-metrics.(*StandardRegistry).registered()
/home/dieter/go/src/github.com/Dieterbe/go-metrics/registry.go:139 +0x63
github.com/Dieterbe/go-metrics.(*StandardRegistry).Each()
/home/dieter/go/src/github.com/Dieterbe/go-metrics/registry.go:63 +0x42
github.com/Dieterbe/go-metrics.graphite()
/home/dieter/go/src/github.com/Dieterbe/go-metrics/graphite.go:109 +0x372
github.com/Dieterbe/go-metrics.GraphiteWithConfig()
/home/dieter/go/src/github.com/Dieterbe/go-metrics/graphite.go:42 +0xe9
github.com/Dieterbe/go-metrics.Graphite()
/home/dieter/go/src/github.com/Dieterbe/go-metrics/graphite.go:35 +0x19d
Previous write by goroutine 28:
runtime.mapassign1()
/build/go/src/go-1.3.3/src/pkg/runtime/hashmap.goc:925 +0x0
github.com/Dieterbe/go-metrics.(*StandardRegistry).register()
/home/dieter/go/src/github.com/Dieterbe/go-metrics/registry.go:133 +0x222
github.com/Dieterbe/go-metrics.(*StandardRegistry).Register()
/home/dieter/go/src/github.com/Dieterbe/go-metrics/registry.go:97 +0xc1
github.com/Dieterbe/go-metrics.Register()
/home/dieter/go/src/github.com/Dieterbe/go-metrics/registry.go:169 +0x8b
main.Counter()
/home/dieter/go/src/github.com/graphite-ng/carbon-relay-ng/metrics_wrapper.go:11 +0x143
main.NewConn()
/home/dieter/go/src/github.com/graphite-ng/carbon-relay-ng/conn.go:81 +0x6fd
main.(*Destination).updateConn()
/home/dieter/go/src/github.com/graphite-ng/carbon-relay-ng/destination.go:132 +0x286
I.e. the graphite exporter is concurrently executing c.Registry.Each(...), which calls registered(), alongside registry.register.
func (r *StandardRegistry) registered() map[string]interface{} {
	metrics := make(map[string]interface{}, len(r.metrics))
	r.mutex.Lock()
	defer r.mutex.Unlock()
	for name, i := range r.metrics {
		metrics[name] = i
	}
	return metrics
}
Simply switching the first two lines of the body should do it; the read of r.metrics in the make() call will then be protected. But I wanted to file it here to make sure.
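With the lock taken first, as suggested, the function would look like this (a self-contained sketch with a simplified registry type, not the library's actual code):

```go
package main

import (
	"fmt"
	"sync"
)

// registry demonstrates the fix: take the lock *before* reading
// len(r.metrics) and iterating the map, so nothing races with a
// concurrent register.
type registry struct {
	mutex   sync.Mutex
	metrics map[string]interface{}
}

func (r *registry) registered() map[string]interface{} {
	r.mutex.Lock()
	defer r.mutex.Unlock()
	out := make(map[string]interface{}, len(r.metrics))
	for name, i := range r.metrics {
		out[name] = i
	}
	return out
}

func main() {
	r := &registry{metrics: map[string]interface{}{"a": 1}}
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { // concurrent register
		defer wg.Done()
		r.mutex.Lock()
		r.metrics["b"] = 2
		r.mutex.Unlock()
	}()
	go func() { // concurrent snapshot
		defer wg.Done()
		_ = r.registered()
	}()
	wg.Wait()
	fmt.Println(len(r.registered()))
}
```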
After the last time you Mark() a Meter, it will always continue to report the same value. I believe I can get around this by Mark(0)'ing meters directly before reporting them, but I still can't get to the Meter of a Timer.
I took a quick hack at writing a test case to illustrate this, but much of it was time-dependent, so I figured it would be worth opening the issue. If the fix is obvious to a set of eyes that aren't my own, it seems like marking them before handing them to the function passed to Registry.Each would be super convenient.
To illustrate it, I have a Go proxy using this lib sitting in front of a JVM app using the Metrics library.
Below, I'm running two short bursts of requests then coming to an abrupt stop (without touching either the Go or JVM process -- they both remain running). The value highlighted can be ignored -- it's just where my mouse landed when taking a screenshot.
The request: I'd like to be able to own the tick loop currently in GraphiteWithConfig for alternate error handling.
Motivation: we're running Graphite behind Amazon ELB, and today our instance changed its IP address. Metric submissions started failing and required a restart of the service. I can create a new GraphiteConfig in this case, but the current metrics API doesn't give any way to detect that the error has occurred.
This could be as simple as making func graphite public, or adding a tiny wrapper:
func GraphiteOnce(c GraphiteConfig) error {
	return graphite(&c)
}
Currently each of the meters spawns its own goroutine for updating. That's fine if you have a small number in your app but when you get in to the 100s or 1000s of meters it really bloats the number of goroutines which is a lot of overhead, and it makes poring through stack traces more difficult.
I'm wondering if it would be possible to replace the design with one that just uses a single arbiter goroutine with a ticker to update all the meters at once. I haven't thought about it in real depth yet but looking at the current arbiter code it seems like it should be possible.
Have you considered the possibility or are you aware of any roadblocks that would prevent such a thing from being achieved?
Histogram's Max() (and probably other, similar functions) is wrong: it stores values independently of the Sample being used. This means that Max() on a Histogram backed by an ExpDecaySample doesn't return the sample's current Max.
I read through the related @codahale's metrics implementation and they solve this by taking a snapshot of the sample (there it's called a Reservoir) during reporting. The snapshot is what has functions like Max() on it.
This is a nice abstraction, allowing the different Sample implementation deal with handling values, letting the snapshot tell you about the values of a sample at a given time, etc.
Would you consider a patch that re-organizes things and inserts Snapshots in between Samples and Histograms?
Documentation claims that UniformSample is using Vitter's Algorithm R for reservoir sampling, but actually it is implemented incorrectly.
Please see http://www.cs.umd.edu/~samir/498/vitter.pdf, page 39, the paragraph starting with "Algorithm R...", specifically the part about a new item becoming a candidate. Also see the Wikipedia article for correct pseudocode.
The current implementation is closer to a moving window.
Patch:
diff --git i/sample.go w/sample.go
index e34b7b5..937901b 100644
--- i/sample.go
+++ w/sample.go
@@ -503,7 +503,10 @@ func (s *UniformSample) Update(v int64) {
if len(s.values) < s.reservoirSize {
s.values = append(s.values, v)
} else {
- s.values[rand.Intn(s.reservoirSize)] = v
+ r := rand.Int63n(s.count)
+ if r < int64(len(s.values)) {
+ s.values[int(r)] = v
+ }
}
}
This would break some tests.
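Putting the patch in context, a self-contained sketch of a uniform sample using Algorithm R (simplified from the library's type, names illustrative) looks like this:

```go
package main

import (
	"fmt"
	"math/rand"
)

// uniformSample implements Vitter's Algorithm R: once the reservoir
// is full, the i-th item seen replaces a random slot with probability
// size/i, so every item ever seen has an equal chance of being kept.
type uniformSample struct {
	count  int64
	values []int64
	size   int
}

func (s *uniformSample) Update(v int64) {
	s.count++
	if len(s.values) < s.size {
		s.values = append(s.values, v)
		return
	}
	// keep the new item only if its random index lands inside the reservoir
	if r := rand.Int63n(s.count); r < int64(len(s.values)) {
		s.values[int(r)] = v
	}
}

func main() {
	s := &uniformSample{size: 10}
	for i := int64(0); i < 1000; i++ {
		s.Update(i)
	}
	fmt.Println(len(s.values)) // reservoir stays at its configured size
}
```

The key difference from the buggy version: late items are increasingly likely to be *discarded* rather than always overwriting a slot, which is what makes the sample uniform over the whole stream instead of a moving window.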