honeycombio / dynsampler-go Goto Github PK
View Code? Open in Web Editor NEWDynsampler is a Golang library for dynamic sampling of events sent to Honeycomb.
License: Apache License 2.0
Dynsampler is a Golang library for dynamic sampling of events sent to Honeycomb.
License: Apache License 2.0
I was looking through the code for AvgSampleWithMin
to understand how it works and noticed that there might be a case where data is always sampled until you hit the MinEventsPerSec
threshold.
AvgSampleWithMin{}.GetSampleRate()
has a shortcut if the sampler hasn't received any data:
dynsampler-go/avgsamplewithmin.go
Lines 164 to 166 in 7f9929d
This flag is only set to true if AvgSampleWithMin{}.updateMaps()
gets to the end of the function:
dynsampler-go/avgsamplewithmin.go
Lines 152 to 155 in 7f9929d
But if the number of events seen falls below the per-second threshold, then updateMaps()
short-circuits and configures the sampler to not sample the keys we've seen:
dynsampler-go/avgsamplewithmin.go
Lines 97 to 107 in 7f9929d
But it never sets haveData
to true, so GetSampleRate()
will always return the goal sample rate.
I suppose this could happen if the sampler is running in a development environment, where it might not see enough traffic to hit the default min of 50 events/second, so never gets to the bottom of the function?
Suppose someone is trying to achieve a high sample rate (30_000) with EMA.
To achieve a target rate of X with EMA, the highest-volume key has to see, on average, at least X occurrences during each AdjustmentInterval โ for each refinery instance.
If the key cardinality is too high, or traffic is more balanced across the keys, or the AdjustmentInterval is too short, it might not get there.
It is clearer to flip it over โ if the most common key has an average of X occurrences per AdjustmentInterval, then it won't be possible to see a resulting rate of greater than X.
Raising the GoalSampleRate won't affect the outcome, because there weren't enough samples to get to the original rate. Increasing the adjustment interval would help, but it also means that the sampler is less responsive.
There might be a way to look at the math that isn't as dependent on the target rate -- but it gets complicated by the fact that we want to ensure that there's at least one instance of every key in the resulting output.
The WindowedThroughput sampler might address this problem -- more investigation needed.
Many users of refinery would like the highest possible sample rate consistent with their pricing plan. The easiest way to do that is with a Throughput sampler, but we want to use an EMA technique to sample keys based on their frequency. So let's combine the two.
This is similar in goals to the newly added WindowedThroughput sampler, but uses less memory and has a weighted view of the past (recent things are more important than older things).
This is a follow-up of honeycombio/honeycomb-kubernetes-agent#81.
When using dynamic sampling for a log watcher where the event rate never reaches the minimum rate of dynsampler.AvgSampleWithMin
the watcher will use the target sample rate as a static sample rate forever.
If the minimum sample rate is reached once, and dynsampler.AvgSampleWithMin.updateMaps
returns normally, haveData
will be set. After that all events will be kept when the event rate drops below minimum, instead of sampling at the target sample rate.
I think it would be better for the watcher to fall back to keeping all events, even if the minimum event rate is never reached.
It would be nice to be able to set your dynamic sampling to use the latency of the event as part of the key for determining the sample rate, where the 50th percentile gets sampled most heavily and the 1% and 99% get sampled least heavily (with a curve of sample rate applied).
It is clear how to build this with the existing interface if you wish to use only a latency as your key, but I think allowing a combination of string key and percentile key might require adding an additional interface to the package.
Is your feature request related to a problem? Please describe.
The samplers in this repository are designed to be called for each span individually, and increment their counters with ++
. But in Refinery, the samplers are called once per trace. Each trace accounts for a number of spans, but that's not included in the calculations. Thus, throughput sampling in particular is potentially very wrong.
Describe the solution you'd like
Add a new function that takes count as a parameter, and implement the existing GetSampleRate call in terms of that function, passing 1 as the second parameter.
Describe alternatives you've considered
Calling the samplers once per span and ignoring most of the results -- but that's an expensive alternative.
Additional context
In this block in avgsamplerate.go
:
dynsampler-go/avgsamplerate.go
Lines 67 to 70 in e2e8a48
The Unlock
happens right after the map[string]int
is re-set, and the goroutine continues to run with (at least temporarily) the value of the keys set to 0
. It seems that this races with GetSampleRate
.
dynsampler-go/avgsamplerate.go
Lines 136 to 147 in e2e8a48
Example program:
package main
import (
"fmt"
"time"
dynsampler "github.com/honeycombio/dynsampler-go"
)
func main() {
sampler := &dynsampler.AvgSampleRate{
ClearFrequencySec: 1,
GoalSampleRate: 2,
}
if err := sampler.Start(); err != nil {
panic(err)
}
for {
rate := sampler.GetSampleRate("foo")
if rate != 2 {
fmt.Println(rate)
}
time.Sleep(1 * time.Millisecond)
}
}
running with -race
yields:
==================
WARNING: DATA RACE
Read at 0x00c42009e088 by main goroutine:
github.com/honeycombio/dynsampler-go.(*AvgSampleRate).GetSampleRate()
/home/nathan/go/src/github.com/honeycombio/dynsampler-go/avgsamplerate.go:143 +0x1f0
main.main()
/tmp/avgsampletest/main.go:21 +0xfe
Previous write at 0x00c42009e088 by goroutine 6:
github.com/honeycombio/dynsampler-go.(*AvgSampleRate).updateMaps()
/home/nathan/go/src/github.com/honeycombio/dynsampler-go/avgsamplerate.go:127 +0x824
github.com/honeycombio/dynsampler-go.(*AvgSampleRate).Start.func1()
/home/nathan/go/src/github.com/honeycombio/dynsampler-go/avgsamplerate.go:56 +0xe4
Goroutine 6 (running) created at:
github.com/honeycombio/dynsampler-go.(*AvgSampleRate).Start()
/home/nathan/go/src/github.com/honeycombio/dynsampler-go/avgsamplerate.go:59 +0x166
main.main()
/tmp/avgsampletest/main.go:16 +0xbe
==================
Possible fixes:
map
struct that locks before Get
and Set
. Not sure if the locking is meant to be avoiding anything else.defer.Unlock
in updateMaps
instead of unlocking at the beginning. Less efficient, but how much less depends on how many keys there are and what the value of ClearFrequencySec
is.cc @maplebed
The new WindowedThroughput sampler configures things using Go's time.Duration datatype. We should either convert the parameters to Duration (which is a breaking change) or create parallel parameters and deprecate the old ones.
Example: EMASampleRate has AdjustmentInterval
which is an integer representing a number of seconds.
We could create AdjustmentIntervalDuration
as a time.Duration and add some validation logic during Start() that says that only one of them can be nonzero.
This then simplifies configuration because all time values will consistently be a duration. It also eases testing because we could test with much shorter fractional durations.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.