rcrowley / go-metrics
Go port of Coda Hale's Metrics library
License: Other
Would it make sense to be able to send counter deltas to Librato, so that in Librato we can use server-side aggregation and distributed event counting? http://support.metrics.librato.com/knowledgebase/articles/213775-what-is-service-side-aggregation-ssa
That would require the reporter to report counters as gauges in Librato, remember the last value sent for that counter, and subtract that from the value being reported.
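The delta bookkeeping described above could be sketched roughly like this (the `deltaReporter` type and its methods are hypothetical names for illustration, not part of go-metrics):

```go
package main

import "fmt"

// deltaReporter remembers the last value sent per counter name and
// returns only the increment since the previous report -- the idea
// behind feeding Librato's server-side aggregation with deltas.
type deltaReporter struct {
	last map[string]int64
}

func newDeltaReporter() *deltaReporter {
	return &deltaReporter{last: make(map[string]int64)}
}

// delta returns the change since the last call for this name and
// records the current value for the next round.
func (d *deltaReporter) delta(name string, current int64) int64 {
	diff := current - d.last[name]
	d.last[name] = current
	return diff
}

func main() {
	r := newDeltaReporter()
	fmt.Println(r.delta("requests", 10)) // first report: full value
	fmt.Println(r.delta("requests", 25)) // subsequent reports: only the increment
}
```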
Error message:
registry.MarshalJSON undefined (type metrics.Registry has no field or method MarshalJSON)
Code:
package main

import (
	"net/http"
	"time"

	"github.com/rcrowley/go-metrics"
)

func main() {
	registry := metrics.NewRegistry()
	metrics.RegisterRuntimeMemStats(registry)
	go metrics.CaptureRuntimeMemStats(registry, 2*time.Second)
	http.HandleFunc("/", func(res http.ResponseWriter, req *http.Request) {
		data, _ := registry.MarshalJSON() // This is where it breaks
		res.Write(data)
	})
	http.ListenAndServe("localhost:9999", nil)
}
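The compile error above is because the Registry interface in that version has no MarshalJSON method. The same JSON output can be produced by snapshotting the metrics into a plain map and encoding that with encoding/json; the `toyRegistry` type below is a hypothetical stand-in for a registry, sketching only the marshaling pattern:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// toyRegistry stands in for a metrics registry. Implementing
// json.Marshaler on a snapshot map is all that registry-to-JSON
// export amounts to.
type toyRegistry struct {
	metrics map[string]interface{}
}

// MarshalJSON encodes the current snapshot of registered metrics.
func (r *toyRegistry) MarshalJSON() ([]byte, error) {
	return json.Marshal(r.metrics)
}

func main() {
	r := &toyRegistry{metrics: map[string]interface{}{
		"requests.count": 42,
	}}
	data, err := json.Marshal(r) // dispatches to MarshalJSON
	if err != nil {
		panic(err)
	}
	fmt.Println(string(data))
}
```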
And allow custom tags to be provided with a metric.
(I may work on this, if I have time)
The current OpenTSDB seems to implement the socket version of messaging. It's very similar to the Graphite version. Looking at the Bosun documentation (said to support OpenTSDB 2.x) it appears to be limited to the http POST verb. I'm about to fork go-metrics in order to implement the feature but I wanted to get your opinion first.
Is Bosun supported? Is someone working on it? Would you accept a pull request?
g1 := metrics.GetOrRegisterGauge("the_same_key", metrics.DefaultRegistry)
g1.Update(100)
// this line panics because a StandardGauge is already registered under "the_same_key"
g2 := metrics.GetOrRegisterGaugeFloat64("the_same_key", metrics.DefaultRegistry)
panic: interface conversion: *metrics.StandardGauge is not metrics.GaugeFloat64: missing method Snapshot [recovered]
panic: interface conversion: *metrics.StandardGauge is not metrics.GaugeFloat64: missing method Snapshot
Would it be better to return an error instead of panicking?
It'd be nice to have a few lines in the README on how to use the samples. I'm trying to get a "recent data" histogram, but metrics.NewExpDecaySample(600, 0.015)
gives me almost exactly the same data as metrics.NewUniformSample(1800)
(I add data to the histogram once a second). I expected them to be less similar, so I'm wondering whether I'm doing it wrong.
Specifically, I expected Max/Min to "decay" too.
"since start" is the uniform sample, and "recent" is the ExpDecaySample from above:
http://ord1.ntppool.net:8053/status
Before I looked properly at the code and read the codahale documentation, I also set up a version with three different UniformSamples (600, 3600 and 86400 reservoirs): http://zrh2.ntppool.net:8053/status -- this was useful for showing that, at least for my use, the reservoir size doesn't seem to matter much.
Would it make sense for me to implement a sliding window reservoir for my "what's the data been in the last X minutes" use? Since I only update the histogram once a second it's not that much data.
Anyway, a couple of lines in the documentation with recommendations for how and when to use the different sample types would be really helpful.
For reference my code updating the metrics is in https://github.com/abh/geodns/blob/master/metrics.go
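The sliding window reservoir suggested above could look something like this (a minimal sketch with hypothetical names, assuming one update per second so that capacity maps directly to a time window):

```go
package main

import "fmt"

// slidingWindowSample keeps only the most recent `cap` values,
// dropping the oldest on overflow -- a "last X minutes" reservoir
// when updates arrive at a fixed rate.
type slidingWindowSample struct {
	values []int64
	cap    int
}

func newSlidingWindowSample(capacity int) *slidingWindowSample {
	return &slidingWindowSample{cap: capacity}
}

// Update appends a value, evicting the oldest when full.
func (s *slidingWindowSample) Update(v int64) {
	if len(s.values) == s.cap {
		s.values = s.values[1:] // drop oldest
	}
	s.values = append(s.values, v)
}

// Max scans the current window, so old spikes genuinely "decay" out.
func (s *slidingWindowSample) Max() int64 {
	var max int64
	for i, v := range s.values {
		if i == 0 || v > max {
			max = v
		}
	}
	return max
}

func main() {
	s := newSlidingWindowSample(3)
	for _, v := range []int64{100, 5, 7, 9} { // 100 falls out of the window
		s.Update(v)
	}
	fmt.Println(s.Max())
}
```

Re-slicing with `s.values[1:]` keeps the sketch short; a production version would likely use a ring buffer to avoid the backing array growing.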
When looking at my StatHat counters it was becoming apparent that the numbers were inflated by many times the actual count. The code appears to be sending the count to StatHat at the desired interval, but it doesn't appear to be sending a delta from the last count sent. Perhaps I'm reading it wrong.
Am I correct in thinking that StatHat is not expecting the entire count to be sent with each batch; simply the counter increments that have occurred since the last batch? It seems like sending the entire count each time, as opposed to the delta between the current count and the last count, could lead to the grossly inflated counts seen in the StatHat interface.
Hi. Thanks for putting the work into this repo.
I couldn't find an obvious way to create an ExpDecaySample or UniformSample type with a reservoir of float64 data points. Is this supported?
Hi, have you seen this: http://lk4d4.darth.io/posts/defer/ ?
I benchmarked it again against go 1.3.1 and I still see a quite big difference
BenchmarkPut 50000 38259 ns/op
BenchmarkPutDefer 50000 63552 ns/op
BenchmarkGet 50000 41260 ns/op
BenchmarkGetDefer 10000 107873 ns/op
Basically, defer doesn't seem to be fully optimized yet. Since go-metrics has several short functions that use defer, removing them could help measure with more precision. What do you think?
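The comparison above can be reproduced with a small micro-benchmark sketch (absolute numbers will vary by Go version and machine; the function names are illustrative):

```go
package main

import (
	"fmt"
	"sync"
	"testing"
)

var mu sync.Mutex
var counter int64

// incDefer uses the idiomatic defer-based unlock.
func incDefer() {
	mu.Lock()
	defer mu.Unlock()
	counter++
}

// incDirect unlocks explicitly, avoiding defer's overhead in a
// short hot function.
func incDirect() {
	mu.Lock()
	counter++
	mu.Unlock()
}

func main() {
	d := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			incDefer()
		}
	})
	e := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			incDirect()
		}
	})
	fmt.Println("defer:   ", d.NsPerOp(), "ns/op")
	fmt.Println("explicit:", e.NsPerOp(), "ns/op")
}
```

Both functions are behaviorally identical; only the unlock style differs, which isolates defer's cost.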
When we use the Librato reporter with go-metrics, it seems that the reporter will occasionally fail to send metrics to the Librato server. Here's a sample log:
2014/11/16 15:16:58 ERROR sending metrics to librato Post https://metrics-api.librato.com/v1/metrics: net/http: request canceled while waiting for connection
2014/11/16 15:17:28 ERROR sending metrics to librato Post https://metrics-api.librato.com/v1/metrics: net/http: request canceled while waiting for connection
2014/11/16 15:18:18 ERROR sending metrics to librato Post https://metrics-api.librato.com/v1/metrics: net/http: request canceled while waiting for connection
2014/11/16 15:20:38 ERROR sending metrics to librato Post https://metrics-api.librato.com/v1/metrics: net/http: request canceled while waiting for connection
2014/11/16 18:20:48 ERROR sending metrics to librato Post https://metrics-api.librato.com/v1/metrics: read tcp 50.16.193.59:443: use of closed network connection
2014/11/16 19:20:08 ERROR sending metrics to librato Post https://metrics-api.librato.com/v1/metrics: read tcp 54.225.79.3:443: use of closed network connection
2014/11/16 21:24:28 ERROR sending metrics to librato Post https://metrics-api.librato.com/v1/metrics: read tcp 107.22.245.166:443: use of closed network connection
2014/11/16 23:13:48 ERROR sending metrics to librato Post https://metrics-api.librato.com/v1/metrics: read tcp 54.225.214.213:443: use of closed network connection
While it does seem that the problem lies on Librato's side (server not responding?), does it make sense to retry the request at a later time, or should we just ignore this error completely?
When timing longer durations, the nanosecond-level output can be a little hard to read (lots of counting decimal places). Log could accept a scale argument to apply to timers when they are printed, allowing callers to customize output to their liking.
InfluxDB 0.9 has support for tags, which are indexed key/value pairs that can be used for fast and efficient queries (https://influxdb.com/docs/v0.9/introduction/overview.html). Also, without tags, you're very limited on the kind of queries you can perform on your data, so it's important to have support for that in go-metrics.
In an environment where you have many hosts (or many docker instances), it makes sense to aggregate the already aggregated stats. statsd can be used for that. Is there a better alternative? I'm happy to create a patch.
This commit appears to have broken reporting to Librato for timers (at least):
d4f1d62#diff-2a2c532f685bccc6e7eb27c6144a94caR11
From Librato support:
{"attributes":{"display_transform":"x/1000000000","display_units_short":"s"},"count":203896,"max":8.25364642e+08,"min":4.843613e+06,"name":"metric.timer.mean","period":5,"sum_squares":2.79663655195528e+25},
which gives the error (as you mentioned in email to Nik)
{"errors":{"params":{"sum":["is required"]}
I can confirm that this was not a change in API behavior. If you have a way to actually specify the "sum" field (because it's required if you specify "count"), that should fix the problem. Docs: http://dev.librato.com/v1/post/metrics "Gauge Specific Parameters"
Also, I'm fairly certain once that's fixed, you'd run into an error with the "4.843613e+06" notation.
influxdata/influxdb#1349 changed the Influx client. The current code no longer compiles:
.gobuild/src/github.com/rcrowley/go-metrics/influxdb/influxdb.go:19: undefined: client.ClientConfig
.gobuild/src/github.com/rcrowley/go-metrics/influxdb/influxdb.go:38: undefined: client.Series
.gobuild/src/github.com/rcrowley/go-metrics/influxdb/influxdb.go:44: undefined: client.Series
.gobuild/src/github.com/rcrowley/go-metrics/influxdb/influxdb.go:52: undefined: client.Series
.gobuild/src/github.com/rcrowley/go-metrics/influxdb/influxdb.go:60: undefined: client.Series
.gobuild/src/github.com/rcrowley/go-metrics/influxdb/influxdb.go:70: undefined: client.Series
.gobuild/src/github.com/rcrowley/go-metrics/influxdb/influxdb.go:82: undefined: client.Series
.gobuild/src/github.com/rcrowley/go-metrics/influxdb/influxdb.go:93: undefined: client.Series
.gobuild/src/github.com/rcrowley/go-metrics/influxdb/influxdb.go:106: client.WriteSeries undefined (type *client.Client has no field or method WriteSeries)
I found nothing that explains what these values mean. I could only infer the count...
metrics: 11:36:14.430331 timer Filter
metrics: 11:36:14.430371 count: 30
metrics: 11:36:14.430383 min: 19261
metrics: 11:36:14.430392 max: 429976180
metrics: 11:36:14.430434 mean: 103206457.83
metrics: 11:36:14.430448 stddev: 132111349.79
metrics: 11:36:14.430460 median: 31854654.00
metrics: 11:36:14.430470 75%: 163473873.00
metrics: 11:36:14.430481 95%: 426889010.20
metrics: 11:36:14.430491 99%: 429976180.00
metrics: 11:36:14.430501 99.9%: 429976180.00
metrics: 11:36:14.430511 1-min rate: 1.39
metrics: 11:36:14.430522 5-min rate: 1.39
metrics: 11:36:14.430533 15-min rate: 1.40
metrics: 11:36:14.430544 mean rate: 1.36
@matterkkila expressed concern about metrics being able to block progress in the context of forwarding to Graphite over TCP.
I think a UDP-like decoupling of metrics being updated from metrics being sent would be useful, especially in high-volume scenarios. The metrics should still be collected by the same process and otherwise function the same way. The write should just be nonblocking.
I'm writing a project that uses go-metrics and I only need to write stats once every 5 minutes. Right now it seems to be writing stats once every few seconds and I really do not need that precision. Is there a way to customize the time interval between metrics getting sent upstream?
In the spirit of how gauge is implemented in the original codahale metrics shouldn't the gauge type "pull values"? Maybe by accepting a function pointer and starting a goroutine to call the function on a fixed interval to retrieve value. It can also use a task manager to avoid spawning too many goroutines.
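A pull-based gauge does not even need a polling goroutine if the callback is invoked lazily at read time; a minimal sketch with hypothetical names:

```go
package main

import "fmt"

// funcGauge pulls its value from a callback each time it is read,
// mirroring codahale's Gauge<T>. No background goroutine is needed
// when the value is computed on demand.
type funcGauge struct {
	fn func() int64
}

func newFuncGauge(fn func() int64) *funcGauge {
	return &funcGauge{fn: fn}
}

// Value invokes the callback, so the reading is always current.
func (g *funcGauge) Value() int64 { return g.fn() }

func main() {
	queueDepth := int64(0)
	g := newFuncGauge(func() int64 { return queueDepth })
	queueDepth = 7
	fmt.Println(g.Value()) // reads the current value, not a stale one
}
```

A fixed-interval polling goroutine (or shared task manager, as suggested) would only be needed if the callback were expensive and its result should be cached between reads.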
The DefaultRegistry is a package global. This is a huge no-no in libraries.
https://github.com/rcrowley/go-metrics/blob/master/registry.go#L136
What this means: I can't cleanly use the go-metrics with go-tigertonic in a test for my application that uses go-tigertonic, because the metrics left over from the one webserver (even if since shutdown; but the global registry persists those values) will contaminate the next web server to startup.
I've worked around this for now by calling metrics.Unregister() on every metric that it crashes on.
Better would be a single metrics.ClearDefaultRegistry() call that would wipe all state.
Even better than that would be to not have any global state at all.
Thanks.
Jason
@armon is a maintainer of serf and statsite. He also has a project with the same name and goal as yours: https://github.com/armon/go-metrics
Perhaps you might consider combining effort?
I'm attempting to report timers to Graphite using millisecond units. With DurationUnit: time.Nanosecond (the default), they're off by a factor of a million. With DurationUnit: time.Millisecond, they're off by a trillion. Is this a problem of documentation (there aren't any DurationUnit examples), or should all the timer metrics be t.Foo()/du rather than du*t.Foo()? I'm happy to make the patch in either case.
It seems useful to also be able to export all metrics using expvar.
Hello! Could you point me in the right direction, please?
The only thing I need is to export metrics to Graphite (only 3 types, actually: counters, gauges and timers).
I looked through the code and examples, and it seems that counters and timers keep all previously collected data and are not cleared automatically after exporting. I can imagine that this is the only way to prevent collisions when using multiple exporters. But then what is the use of such counters and timers? Both just provide the sum of all events since application start, while we need the number of events (or timer values) since the last export: how many requests we got in a minute, how many times a function was called, and so on. In the case of timers: what was the average response time in the last minute?
I see a Clear() method on the Counter type, so I would be able to clear it manually. But Timer has no Clear() method.
So what's the verdict? Is this library not designed for the tasks I described above?
Other libraries to document:
It would be useful to have premade command-line handling, so that you can easily build command-line applications that can send to any supported metrics target. Something like
-metrics:librato=<authdata> -metrics:opentsdb=<authdata>
etc
Influxdb's client package broke their public API.
I'll try to create a snapshot from the last valid version and PR with a new import path for go-metrics.
In stathat.go, where stats are posted for metrics.Timer, the same timer name "mean" is used for both Mean and RateMean.
Hi,
in http://godoc.org/github.com/rcrowley/go-metrics#CaptureDebugGCStats there's no way to control the names of the registered metrics.
I have multiple apps that have to report GC statistics to an InfluxDB server, and if I use this function as is, all the metrics will get merged by InfluxDB, and that's no good.
If you have the same problem, how do you handle it ?
As the title says, my suggestion is to add another function like CaptureDebugGCStatsWithPrefix or something that takes a prefix and passes it all the way down to the register call.
What do you all think ? I can work on a PR maybe today or tomorrow.
Periodically log every metric in slightly-more-parseable form to syslog:
w, _ := syslog.Dial("unixgram", "/dev/log", syslog.LOG_INFO, "metrics")
go metrics.Syslog(metrics.DefaultRegistry, 60e9, w)
The Syslog function doesn't exist.
Is there any reason why there is no official way of stopping a reporter once you have called its reporting function?
Example (Log reporter):
https://github.com/rcrowley/go-metrics/blob/master/log.go#L10
In order to be more test-friendly and to be able to gracefully stop metric reporting it would be cool to have such feature. I could try working on a PR for this. What do you think?
Hello,
It looks like ExpDecaySample rescales incorrectly. Please have a look at func (s *ExpDecaySample) update():
func (s *ExpDecaySample) update(t time.Time, v int64) {
	s.mutex.Lock()
	defer s.mutex.Unlock()
	s.count++
	if s.values.Size() == s.reservoirSize {
		s.values.Pop()
	}
	s.values.Push(expDecaySample{
		k: math.Exp(t.Sub(s.t0).Seconds()*s.alpha) / rand.Float64(),
		v: v,
	})
	if t.After(s.t1) {
		values := s.values.Values()
		t0 := s.t0
		s.values = newExpDecaySampleHeap(s.reservoirSize)
		s.t0 = t
		s.t1 = s.t0.Add(rescaleThreshold)
		for _, v := range values {
			v.k = v.k * math.Exp(-s.alpha*float64(s.t0.Sub(t0)))
			s.values.Push(v)
		}
	}
}
When we calculate v.k the first time, we use seconds. But when we rescale it, we use nanoseconds instead of seconds, so after rescaling v.k will always be zero.
v.k = v.k * math.Exp(-s.alpha*float64(s.t0.Sub(t0)))
should be changed to
v.k = v.k * math.Exp(-s.alpha*float64(s.t0.Sub(t0).Seconds()))
What do you think ?
in runtime there is this nice code
for i := uint32(1); i <= memStats.NumGC-numGC; i++ {
	runtimeMetrics.MemStats.PauseNs.Update(int64(memStats.PauseNs[(memStats.NumGC%256-i)%256]))
}
but here is a problem: numGC doesn't seem to ever be updated, meaning the whole PauseNs metric is completely wrong.
Hi,
Is there a formula to choose appropriate reservoir size and alpha values for an exponentially decaying sample, so that they best represent the data that goes through them for a specified amount of time (e.g. the default Timer uses 1028 and 0.015, which is supposed to represent roughly the last five minutes of data)?
https://github.com/VividCortex/ewma looks simpler than ours.
Having the methods for creating and registering a metric on the registry itself would be a nice change to the API. For example:
metrics.DefaultRegistry.NewCounter("thing")
vs.
metrics.NewRegisteredCounter("thing", metrics.DefaultRegistry)
If I implement a new type of registry and it only works with a special type of counter, this would allow my registry to default to that type of counter when implementing the NewCounter function of the Registry interface.
Hi,
I need to measure request rate with a Meter, but the problem is that there are 2 app instances.
Can I just add app.instance1.Rate1 and app.instance2.Rate1 to get app.Rate1?
Thanks in advance
Tim
The code does not allow for non-integer metrics for gauge and meter.
This was not the case in the original codahale metrics and I would like to understand the reasoning behind this limitation.
This is a strange one, and is only happening on my 386 host. I'm working on shrinking the failing test. The following code will panic. go-metrics is at master. (I have go-metrics copied to a local directory -- the results are the same)
package main

import (
	"log"
	"sync/atomic"

	"./go-metrics"
)

type A struct {
	uncounted int64
}

func main() {
	a := &A{}
	var n int64 = 2
	atomic.AddInt64(&a.uncounted, n)
	log.Printf("count is %d", a.uncounted)
	metrics.NewEWMA1().Update(2)
	log.Printf("All done!")
}
I get the following results
$ go version && go run main.go
go version go1.1.2 linux/386
2013/09/16 20:19:29 count is 2
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x1 pc=0x80654fc]
goroutine 1 [running]:
sync/atomic.AddUint64()
/usr/local/go/src/pkg/sync/atomic/asm_386.s:69 +0xc
_/mnt/jenkins/tmp/gotest/go-metrics.(*StandardEWMA).Update(0x1824c0c0, 0x2, 0x0)
/mnt/jenkins/tmp/gotest/go-metrics/ewma.go:82 +0xd6
main.main()
/mnt/jenkins/tmp/gotest/main.go:19 +0xe6
goroutine 2 [syscall]:
goroutine 3 [runnable]:
exit status 2
Here's the offending line (which, notice, seems to work just fine in my main function above). a.uncounted points to a valid address (holding zero). Initializing it makes no difference.
atomic.AddInt64(&a.uncounted, n)
¯\_(⊙︿⊙)_/¯
Per rcrowley/go-tigertonic#75, go-metrics won't compile on App Engine because it uses runtime.NumCgoCall, among probably other verboten identifiers.
We're using a lot of timers in our application and they are updated fairly often. We've found in our benchmarking that a significant portion of our application's allocations are due to interface{} conversions in expDecaySampleHeap. It would be worthwhile to re-implement the heap code specialized for the expDecaySample type instead of using the heap package, avoiding the overhead of interface{}.
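The specialized heap would replace container/heap's interface{}-based Push/Pop with sift operations on a concrete slice; a minimal min-heap sketch (names illustrative) of that design:

```go
package main

import "fmt"

type expDecaySample struct {
	k float64
	v int64
}

// concreteHeap is a min-heap on k implemented directly on a concrete
// slice, so Push/Pop never box values into interface{} the way
// container/heap does.
type concreteHeap struct {
	s []expDecaySample
}

// Push appends and sifts the new element up to restore heap order.
func (h *concreteHeap) Push(x expDecaySample) {
	h.s = append(h.s, x)
	i := len(h.s) - 1
	for i > 0 {
		parent := (i - 1) / 2
		if h.s[parent].k <= h.s[i].k {
			break
		}
		h.s[parent], h.s[i] = h.s[i], h.s[parent]
		i = parent
	}
}

// Pop removes the minimum element and sifts the last element down.
func (h *concreteHeap) Pop() expDecaySample {
	min := h.s[0]
	n := len(h.s) - 1
	h.s[0] = h.s[n]
	h.s = h.s[:n]
	i := 0
	for {
		l, r := 2*i+1, 2*i+2
		smallest := i
		if l < n && h.s[l].k < h.s[smallest].k {
			smallest = l
		}
		if r < n && h.s[r].k < h.s[smallest].k {
			smallest = r
		}
		if smallest == i {
			break
		}
		h.s[i], h.s[smallest] = h.s[smallest], h.s[i]
		i = smallest
	}
	return min
}

func main() {
	h := &concreteHeap{}
	for _, k := range []float64{3, 1, 2} {
		h.Push(expDecaySample{k: k})
	}
	fmt.Println(h.Pop().k, h.Pop().k, h.Pop().k) // ascending by k
}
```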
with -race
I found:
==================
WARNING: DATA RACE
Read by goroutine 10:
github.com/Dieterbe/go-metrics.(*StandardRegistry).registered()
/home/dieter/go/src/github.com/Dieterbe/go-metrics/registry.go:139 +0x63
github.com/Dieterbe/go-metrics.(*StandardRegistry).Each()
/home/dieter/go/src/github.com/Dieterbe/go-metrics/registry.go:63 +0x42
github.com/Dieterbe/go-metrics.graphite()
/home/dieter/go/src/github.com/Dieterbe/go-metrics/graphite.go:109 +0x372
github.com/Dieterbe/go-metrics.GraphiteWithConfig()
/home/dieter/go/src/github.com/Dieterbe/go-metrics/graphite.go:42 +0xe9
github.com/Dieterbe/go-metrics.Graphite()
/home/dieter/go/src/github.com/Dieterbe/go-metrics/graphite.go:35 +0x19d
Previous write by goroutine 28:
runtime.mapassign1()
/build/go/src/go-1.3.3/src/pkg/runtime/hashmap.goc:925 +0x0
github.com/Dieterbe/go-metrics.(*StandardRegistry).register()
/home/dieter/go/src/github.com/Dieterbe/go-metrics/registry.go:133 +0x222
github.com/Dieterbe/go-metrics.(*StandardRegistry).Register()
/home/dieter/go/src/github.com/Dieterbe/go-metrics/registry.go:97 +0xc1
github.com/Dieterbe/go-metrics.Register()
/home/dieter/go/src/github.com/Dieterbe/go-metrics/registry.go:169 +0x8b
main.Counter()
/home/dieter/go/src/github.com/graphite-ng/carbon-relay-ng/metrics_wrapper.go:11 +0x143
main.NewConn()
/home/dieter/go/src/github.com/graphite-ng/carbon-relay-ng/conn.go:81 +0x6fd
main.(*Destination).updateConn()
/home/dieter/go/src/github.com/graphite-ng/carbon-relay-ng/destination.go:132 +0x286
I.e. the graphite exporter is concurrently executing c.Registry.Each(...), which calls registered(), alongside registry.register.
func (r *StandardRegistry) registered() map[string]interface{} {
	metrics := make(map[string]interface{}, len(r.metrics))
	r.mutex.Lock()
	defer r.mutex.Unlock()
	for name, i := range r.metrics {
		metrics[name] = i
	}
	return metrics
}
Simply switching the first two lines of the body should do it; the read of r.metrics in the make() call will then be protected. But I wanted to file it here to make sure.
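With the lock taken first, as suggested, the function would look like this (a self-contained sketch with a simplified registry type, not the library's actual code):

```go
package main

import (
	"fmt"
	"sync"
)

// registry demonstrates the fix: take the lock *before* reading
// len(r.metrics) and iterating the map, so nothing races with a
// concurrent register.
type registry struct {
	mutex   sync.Mutex
	metrics map[string]interface{}
}

func (r *registry) registered() map[string]interface{} {
	r.mutex.Lock()
	defer r.mutex.Unlock()
	out := make(map[string]interface{}, len(r.metrics))
	for name, i := range r.metrics {
		out[name] = i
	}
	return out
}

func main() {
	r := &registry{metrics: map[string]interface{}{"a": 1}}
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { // concurrent register
		defer wg.Done()
		r.mutex.Lock()
		r.metrics["b"] = 2
		r.mutex.Unlock()
	}()
	go func() { // concurrent snapshot
		defer wg.Done()
		_ = r.registered()
	}()
	wg.Wait()
	fmt.Println(len(r.registered()))
}
```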
After the last time you Mark() a Meter, it will always continue to report the same value. I believe I can get around this by Mark(0)'ing meters directly before reporting them, but I still can't get to the Meter of a Timer.
I took a quick hack at writing a test case to illustrate this, but much of it was time-dependent, so I figured it would be worth opening the issue. If the fix is obvious to a set of eyes that aren't my own, it seems like marking them before handing them to the function passed to Registry.Each would be super convenient.
To illustrate it, I have a Go proxy using this lib sitting in front of a JVM app using the Metrics library.
Below, I'm running two short bursts of requests then coming to an abrupt stop (without touching either the Go or JVM process -- they both remain running). The value highlighted can be ignored -- it's just where my mouse landed when taking a screenshot.
The request: I'd like to be able to own the tick loop currently in GraphiteWithConfig for alternate error handling.
Motivation: we're running Graphite behind Amazon ELB, and today our instance changed its IP address. Metric submissions started failing and required a restart of the service. I can create a new GraphiteConfig in this case, but the current metrics API doesn't give any way to detect that the error has occurred.
This could be as simple as making func graphite public, or adding a tiny wrapper:
func GraphiteOnce(c GraphiteConfig) error {
	return graphite(&c)
}
Currently each of the meters spawns its own goroutine for updating. That's fine if you have a small number in your app but when you get in to the 100s or 1000s of meters it really bloats the number of goroutines which is a lot of overhead, and it makes poring through stack traces more difficult.
I'm wondering if it would be possible to replace the design with one that just uses a single arbiter goroutine with a ticker to update all the meters at once. I haven't thought about it in real depth yet but looking at the current arbiter code it seems like it should be possible.
Have you considered the possibility or are you aware of any roadblocks that would prevent such a thing from being achieved?
Histogram's Max() (and probably other, similar functions) is wrong: it stores values independently of the Sample being used. This means that Max() on a Histogram backed by an ExpDecaySample doesn't return the sample's current Max.
I read through the related @codahale's metrics implementation and they solve this by taking a snapshot of the sample (there it's called a Reservoir) during reporting. The snapshot is what has functions like Max() on it.
This is a nice abstraction, allowing the different Sample implementation deal with handling values, letting the snapshot tell you about the values of a sample at a given time, etc.
Would you consider a patch that re-organizes things and inserts Snapshots in between Samples and Histograms?
Documentation claims that UniformSample is using Vitter's Algorithm R for reservoir sampling, but actually it is implemented incorrectly.
Please see http://www.cs.umd.edu/~samir/498/vitter.pdf, page 39, the paragraph starting with "Algorithm R...", specifically the part about a new item becoming a candidate. Also see the Wikipedia article for correct pseudocode.
The current implementation is closer to a moving window.
Patch:
diff --git i/sample.go w/sample.go
index e34b7b5..937901b 100644
--- i/sample.go
+++ w/sample.go
@@ -503,7 +503,10 @@ func (s *UniformSample) Update(v int64) {
if len(s.values) < s.reservoirSize {
s.values = append(s.values, v)
} else {
- s.values[rand.Intn(s.reservoirSize)] = v
+ r := rand.Int63n(s.count)
+ if r < int64(len(s.values)) {
+ s.values[int(r)] = v
+ }
}
}
This would break some tests.
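Putting the patch in context, a self-contained sketch of a uniform sample using Algorithm R (simplified from the library's type, names illustrative) looks like this:

```go
package main

import (
	"fmt"
	"math/rand"
)

// uniformSample implements Vitter's Algorithm R: once the reservoir
// is full, the i-th item seen replaces a random slot with probability
// size/i, so every item ever seen has an equal chance of being kept.
type uniformSample struct {
	count  int64
	values []int64
	size   int
}

func (s *uniformSample) Update(v int64) {
	s.count++
	if len(s.values) < s.size {
		s.values = append(s.values, v)
		return
	}
	// keep the new item only if its random index lands inside the reservoir
	if r := rand.Int63n(s.count); r < int64(len(s.values)) {
		s.values[int(r)] = v
	}
}

func main() {
	s := &uniformSample{size: 10}
	for i := int64(0); i < 1000; i++ {
		s.Update(i)
	}
	fmt.Println(len(s.values)) // reservoir stays at its configured size
}
```

The key difference from the buggy version: late items are increasingly likely to be *discarded* rather than always overwriting a slot, which is what makes the sample uniform over the whole stream instead of a moving window.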