Git Product home page Git Product logo

dynsampler-go's People

Contributors

bdarfler avatar christineyen avatar dependabot[bot] avatar emfree avatar igorwwwwwwwwwwwwwwwwwwww avatar jamiedanielson avatar kentquirk avatar maplebed avatar mikegoldsmith avatar nathanleclaire avatar pkanal avatar robbkidd avatar samstokes avatar tredman avatar tylerhelmuth avatar vreynolds avatar yizzlez avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

dynsampler-go's Issues

`AvgSampleWithMin` does not turn off sampling if throughput is below the min events per second threshold

I was looking through the code for AvgSampleWithMin to understand how it works and noticed that there might be a case where data is always sampled until you hit the MinEventsPerSec threshold.

AvgSampleWithMin{}.GetSampleRate() has a shortcut if the sampler hasn't received any data:

if !a.haveData {
return a.GoalSampleRate
}

This flag is only set to true if AvgSampleWithMin{}.updateMaps() gets to the end of the function:

a.lock.Lock()
defer a.lock.Unlock()
a.savedSampleRates = newSavedSampleRates
a.haveData = true

But if the number of events seen falls below the per-second threshold, then updateMaps() short-circuits and configures the sampler to not sample the keys we've seen:

// check to see if we fall below the minimum
if sumEvents < a.MinEventsPerSec*a.ClearFrequencySec {
// we still need to go through each key to set sample rates individually
for k := range tmpCounts {
newSavedSampleRates[k] = 1
}
a.lock.Lock()
defer a.lock.Unlock()
a.savedSampleRates = newSavedSampleRates
return
}

But it never sets haveData to true, so GetSampleRate() will always return the goal sample rate.

I suppose this could happen if the sampler is running in a development environment, where it might not see enough traffic to hit the default min of 50 events/second, so never gets to the bottom of the function?

It's hard to achieve a high sample rate with a key-based sampler

Suppose someone is trying to achieve a high sample rate (30_000) with EMA.

To achieve a target rate of X with EMA, the highest-volume key has to see, on average, at least X occurrences during each AdjustmentInterval โ€” for each refinery instance.

If the key cardinality is too high, or traffic is more balanced across the keys, or the AdjustmentInterval is too short, it might not get there.

It is clearer to flip it over โ€” if the most common key has an average of X occurrences per AdjustmentInterval, then it won't be possible to see a resulting rate of greater than X.

Raising the GoalSampleRate won't affect the outcome, because there weren't enough samples to get to the original rate. Increasing the adjustment interval would help, but it also means that the sampler is less responsive.

There might be a way to look at the math that isn't as dependent on the target rate -- but it gets complicated by the fact that we want to ensure that there's at least one instance of every key in the resulting output.

The WindowedThroughput sampler might address this problem -- more investigation needed.

feat: Add EMAThroughput sampler

Many users of refinery would like the highest possible sample rate consistent with their pricing plan. The easiest way to do that is with a Throughput sampler, but we want to use an EMA technique to sample keys based on their frequency. So let's combine the two.

This is similar in goals to the newly added WindowedThroughput sampler, but uses less memory and has a weighted view of the past (recent things are more important than older things).

Undesired behaviour when the minimum event rate for dynamic sampling is never reached

This is a follow-up of honeycombio/honeycomb-kubernetes-agent#81.

When using dynamic sampling for a log watcher where the event rate never reaches the minimum rate of dynsampler.AvgSampleWithMin the watcher will use the target sample rate as a static sample rate forever.

If the minimum sample rate is reached once, and dynsampler.AvgSampleWithMin.updateMaps returns normally, haveData will be set. After that all events will be kept when the event rate drops below minimum, instead of sampling at the target sample rate.

I think it would be better for the watcher to fall back to keeping all events, even if the minimum event rate is never reached.

feature request - sample based on percentile latency

It would be nice to be able to set your dynamic sampling to use the latency of the event as part of the key for determining the sample rate, where the 50th percentile gets sampled most heavily and the 1% and 99% get sampled least heavily (with a curve of sample rate applied).

It is clear how to build this with the existing interface if you wish to use only a latency as your key, but I think allowing a combination of string key and percentile key might require adding an additional interface to the package.

feat: Add GetSampleRateMulti(key, count)

Is your feature request related to a problem? Please describe.

The samplers in this repository are designed to be called for each span individually, and increment their counters with ++. But in Refinery, the samplers are called once per trace. Each trace accounts for a number of spans, but that's not included in the calculations. Thus, throughput sampling in particular is potentially very wrong.

Describe the solution you'd like

Add a new function that takes count as a parameter, and implement the existing GetSampleRate call in terms of that function, passing 1 as the second parameter.

Describe alternatives you've considered

Calling the samplers once per span and ignoring most of the results -- but that's an expensive alternative.

Additional context

Race condition in AvgSampleRate (maybe causing sample rate to be 0?)

In this block in avgsamplerate.go:

a.lock.Lock()
tmpCounts := a.currentCounts
a.currentCounts = make(map[string]int)
a.lock.Unlock()

The Unlock happens right after the map[string]int is re-set, and the goroutine continues to run with (at least temporarily) the value of the keys set to 0. It seems that this races with GetSampleRate.

func (a *AvgSampleRate) GetSampleRate(key string) int {
a.lock.Lock()
defer a.lock.Unlock()
a.currentCounts[key]++
if !a.haveData {
return a.GoalSampleRate
}
if rate, found := a.savedSampleRates[key]; found {
return rate
}
return 1
}

Example program:

package main

import (
        "fmt"
        "time"

        dynsampler "github.com/honeycombio/dynsampler-go"
)

func main() {
        sampler := &dynsampler.AvgSampleRate{
                ClearFrequencySec: 1,
                GoalSampleRate:    2,
        }

        if err := sampler.Start(); err != nil {
                panic(err)
        }

        for {
                rate := sampler.GetSampleRate("foo")
                if rate != 2 {
                        fmt.Println(rate)
                }
                time.Sleep(1 * time.Millisecond)
        }
}

running with -race yields:

==================
WARNING: DATA RACE
Read at 0x00c42009e088 by main goroutine:
  github.com/honeycombio/dynsampler-go.(*AvgSampleRate).GetSampleRate()
      /home/nathan/go/src/github.com/honeycombio/dynsampler-go/avgsamplerate.go:143 +0x1f0
  main.main()
      /tmp/avgsampletest/main.go:21 +0xfe
 
Previous write at 0x00c42009e088 by goroutine 6:
  github.com/honeycombio/dynsampler-go.(*AvgSampleRate).updateMaps()
      /home/nathan/go/src/github.com/honeycombio/dynsampler-go/avgsamplerate.go:127 +0x824
  github.com/honeycombio/dynsampler-go.(*AvgSampleRate).Start.func1()
      /home/nathan/go/src/github.com/honeycombio/dynsampler-go/avgsamplerate.go:56 +0xe4
 
Goroutine 6 (running) created at:
  github.com/honeycombio/dynsampler-go.(*AvgSampleRate).Start()
      /home/nathan/go/src/github.com/honeycombio/dynsampler-go/avgsamplerate.go:59 +0x166
  main.main()
      /tmp/avgsampletest/main.go:16 +0xbe
==================

Possible fixes:

  1. Make a map struct that locks before Get and Set. Not sure if the locking is meant to be avoiding anything else.
  2. Just defer.Unlock in updateMaps instead of unlocking at the beginning. Less efficient, but how much less depends on how many keys there are and what the value of ClearFrequencySec is.

cc @maplebed

Configuration durations should be of time.Duration type

The new WindowedThroughput sampler configures things using Go's time.Duration datatype. We should either convert the parameters to Duration (which is a breaking change) or create parallel parameters and deprecate the old ones.

Example: EMASampleRate has AdjustmentInterval which is an integer representing a number of seconds.

We could create AdjustmentIntervalDuration as a time.Duration and add some validation logic during Start() that says that only one of them can be nonzero.

This then simplifies configuration because all time values will consistently be a duration. It also eases testing because we could test with much shorter fractional durations.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.