Comments (15)

GoogleCodeExporter commented on May 5, 2024
+1

Also, I don't know the science behind the chosen cutoffs, but I just changed them from 0.9-1.1 to 0.5-2.0 -- maybe I went too lax on it, I dunno.

Original comment by [email protected] on 12 Jan 2010 at 7:24

GoogleCodeExporter commented on May 5, 2024
It's a sanity check. I think the best fix is to write a new measurer. Perhaps we abandon the "always run for 5 seconds" default, and instead watch the numbers and wait for them to converge. If they don't converge for a long time, then fail.

Original comment by limpbizkit on 13 Jan 2010 at 3:58
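
For illustration only, here is a rough sketch of what such a converge-then-stop measurer loop could look like, assuming a hypothetical timeReps callback that runs the benchmark body for a given rep count and returns elapsed nanoseconds; this is not Caliper's actual measurer:

```java
import java.util.function.IntToLongFunction;

final class ConvergingMeasurer {
  // Illustrative only: measure until successive per-rep estimates agree within a
  // tolerance, or fail after a maximum wall-clock budget.
  static double measureUntilConverged(IntToLongFunction timeReps) {
    final double tolerance = 0.01;                 // assumption: 1% relative change counts as converged
    final long maxBudgetNanos = 30_000_000_000L;   // assumption: hard cap of 30 seconds
    long start = System.nanoTime();
    int reps = 100;
    double previous = Double.NaN;
    while (System.nanoTime() - start < maxBudgetNanos) {
      double nanosPerRep = timeReps.applyAsLong(reps) / (double) reps;
      if (!Double.isNaN(previous)
          && Math.abs(nanosPerRep - previous) / previous < tolerance) {
        return nanosPerRep;                        // converged: stop early
      }
      previous = nanosPerRep;
      reps *= 2;                                   // grow the rep count between trials
    }
    throw new IllegalStateException("Measurement did not converge within the time budget");
  }
}
```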

GoogleCodeExporter commented on May 5, 2024
Here are my latest thoughts:

We don't need to think of this as a sanity check that fails your whole run; instead, we could calculate and report a "quality score" for each measurement, and let the user decide.

As we run post-warmup trials, instead of keeping the rep count fixed, we can let it vary somewhat within a range. The "score" I'm referring to evaluates how linear these (time, reps) pairs appear to be. (Optionally, if they're linear but with a positive y-intercept, we could consider that value to be the per-trial overhead and correct for it in the values reported?)

Now, an intelligent measurer can be configured with not only a maximum run length in seconds, but also an "early termination" score that, if achieved, will end the run early to save time.

And, in a multiple-measurements-per-scenario situation, we no longer have to just blindly average (or take the median of) the individual measurements; we can look at their quality scores and discard some of them, or weight them lower. Of course, I think that most often if one measurement has low quality, the rest probably do too, because it's likely an inherent problem in the way the benchmark is constructed.


Original comment by [email protected] on 13 Jan 2010 at 5:29
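
As an illustration of the idea above (this is not Caliper code; the class and method names are invented), a linearity-based quality score and the y-intercept interpretation could be computed from (reps, elapsed nanoseconds) pairs with an ordinary least-squares fit:

```java
/** Hypothetical sketch: score how linear a set of (reps, elapsed time) pairs is. */
final class QualityScore {

  /** Returns the squared correlation coefficient (R^2) of time vs. reps, in [0, 1]. */
  static double linearity(long[] reps, double[] nanos) {
    int n = reps.length;
    double sumX = 0, sumY = 0, sumXY = 0, sumXX = 0, sumYY = 0;
    for (int i = 0; i < n; i++) {
      double x = reps[i], y = nanos[i];
      sumX += x; sumY += y;
      sumXY += x * y; sumXX += x * x; sumYY += y * y;
    }
    double cov = n * sumXY - sumX * sumY;
    double varX = n * sumXX - sumX * sumX;
    double varY = n * sumYY - sumY * sumY;
    if (varX == 0 || varY == 0) return 0;   // degenerate: no spread in reps or times
    double r = cov / Math.sqrt(varX * varY);
    return r * r;                           // 1.0 means perfectly linear
  }

  /** Least-squares y-intercept: a rough estimate of fixed per-trial overhead in nanos. */
  static double perTrialOverheadNanos(long[] reps, double[] nanos) {
    int n = reps.length;
    double sumX = 0, sumY = 0, sumXY = 0, sumXX = 0;
    for (int i = 0; i < n; i++) {
      double x = reps[i], y = nanos[i];
      sumX += x; sumY += y; sumXY += x * y; sumXX += x * x;
    }
    double slope = (n * sumXY - sumX * sumY) / (n * sumXX - sumX * sumX);
    return (sumY - slope * sumX) / n;       // intercept of the fitted line
  }
}
```

Under this framing, a score of 1.0 means the trials scale perfectly with the rep count, and the intercept estimate could be subtracted from each trial before reporting per-rep times.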

GoogleCodeExporter commented on May 5, 2024
Incidentally, if it's a benchmark that allocates any memory, then it's unlikely that every timing interval will get interrupted by GC the same number of times for the same workload, so variation is inevitable.

Original comment by [email protected] on 22 Jan 2010 at 12:48

GoogleCodeExporter commented on May 5, 2024
I still think that varying the rep count and evaluating the linearity of the results would be a great thing to do.

Original comment by [email protected] on 7 Jun 2010 at 5:36

  • Added labels: Milestone-0.5

GoogleCodeExporter commented on May 5, 2024
With r132 I've added a basic sanity check that gives a friendly exception when the benchmark body is optimized away.

The only thing we're missing is a check for enh's reported error case: when the benchmark is close to, but not quite, linear; for example, when the number of reps affects the per-rep runtime. I'm unsure what the fix here will be.

Original comment by [email protected] on 6 Jul 2010 at 12:49

  • Changed state: Started

GoogleCodeExporter commented on May 5, 2024
It would be nice to simply report a 'quality score' for each measurement somewhere. Then just let users decide what to do.

So, if the results are absolutely linear, the quality score is 1. It's possible all we need here is the statistical correlation coefficient between the rep counts and elapsed time measurements.

Original comment by [email protected] on 8 Jul 2010 at 4:05

GoogleCodeExporter commented on May 5, 2024

Original comment by [email protected] on 8 Jul 2010 at 4:05

  • Changed title: Report a quality score for each measurement

GoogleCodeExporter commented on May 5, 2024

Original comment by [email protected] on 14 Jul 2010 at 11:33

  • Added labels: Milestone-1.0
  • Removed labels: Milestone-0.5

GoogleCodeExporter commented on May 5, 2024
Earlier I said "It's possible all we need here is the statistical correlation coefficient between the rep counts and elapsed time measurements." This is probably not true. Probably, we can vary the rep count just to check for concavity for sanity's sake, but the statistic the user will want to see really is the standard deviation.

Original comment by [email protected] on 17 Jul 2010 at 9:24
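
For reference, the statistic being suggested here is just the sample standard deviation of the per-rep times across measurements; a minimal sketch (not Caliper's reporting code):

```java
final class Stats {
  /** Sample standard deviation of per-rep times (nanoseconds) across measurements. */
  static double standardDeviation(double[] nanosPerRep) {
    int n = nanosPerRep.length;
    if (n < 2) {
      throw new IllegalArgumentException("need at least two measurements");
    }
    double mean = 0;
    for (double value : nanosPerRep) {
      mean += value / n;
    }
    double sumSquaredDeviations = 0;
    for (double value : nanosPerRep) {
      double d = value - mean;
      sumSquaredDeviations += d * d;
    }
    return Math.sqrt(sumSquaredDeviations / (n - 1));  // Bessel-corrected sample stddev
  }
}
```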

GoogleCodeExporter commented on May 5, 2024

Original comment by [email protected] on 19 Mar 2011 at 2:57

  • Changed state: Accepted

GoogleCodeExporter commented on May 5, 2024

Original comment by [email protected] on 19 Mar 2011 at 3:06

  • Added labels: Type-Enhancement

GoogleCodeExporter commented on May 5, 2024
In caliper 1.0 we gather and report all the relevant data (subject to the user's preferences in caliperrc). So even if we don't show a stddev on the screen, anyone who cares can always compute it later. Tagging post-1.0.

Original comment by [email protected] on 14 Nov 2011 at 8:44

  • Added labels: Milestone-Post-1.0
  • Removed labels: Milestone-1.0

GoogleCodeExporter commented on May 5, 2024

Original comment by [email protected] on 8 Feb 2012 at 9:49

  • Added labels: Component-Runner

GoogleCodeExporter commented on May 5, 2024
I no longer think we want/need to do this.

Original comment by [email protected] on 16 May 2012 at 11:27

  • Changed state: WontFix
