Comments (15)
+1
Also, I don't know what the science behind the chosen cutoffs is, but I just changed them from 0.9-1.1 to 0.5-2.0 -- maybe I went too lax on it, I dunno.
Original comment by [email protected]
on 12 Jan 2010 at 7:24
from caliper.
It's a sanity check. I think the best fix is to write a new measurer. Perhaps we abandon the "always run for 5 seconds" default, and instead watch the numbers and wait for them to converge. If they don't converge for a long time, then fail.
Original comment by limpbizkit
on 13 Jan 2010 at 3:58
Here are my latest thoughts:
We don't need to think of this as a sanity check that fails your whole run; instead, we could calculate and report a "quality score" for each measurement, and let the user decide.
As we run post-warmup trials, instead of keeping the rep count fixed, we can let it vary somewhat within a range. The "score" I'm referring to evaluates how linear these (time, reps) pairs appear to be. (Optionally, if they're linear but with a positive y-intercept, we could consider that value to be the per-trial overhead and correct for it in the reported values?)
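The linearity score described above could be computed with an ordinary least-squares fit over the (reps, time) pairs. This is only an illustrative sketch, not Caliper's actual measurer (the class and field names are invented): r-squared serves as the quality score, and the intercept estimates the fixed per-trial overhead.

```java
// Sketch (not Caliper's API): score how linear the (reps, time) pairs are.
// rSquared near 1.0 means elapsed time scales linearly with rep count;
// a positive intercept estimates fixed per-trial overhead.
final class LinearityScore {
  final double slope;      // estimated time per rep
  final double intercept;  // estimated fixed per-trial overhead
  final double rSquared;   // quality score in [0, 1]

  LinearityScore(long[] reps, double[] nanos) {
    int n = reps.length;
    double sx = 0, sy = 0, sxx = 0, sxy = 0, syy = 0;
    for (int i = 0; i < n; i++) {
      sx += reps[i];
      sy += nanos[i];
      sxx += (double) reps[i] * reps[i];
      sxy += reps[i] * nanos[i];
      syy += nanos[i] * nanos[i];
    }
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx);
    intercept = (sy - slope * sx) / n;
    double ssTot = syy - sy * sy / n;
    double ssRes = 0;
    for (int i = 0; i < n; i++) {
      double e = nanos[i] - (slope * reps[i] + intercept);
      ssRes += e * e;
    }
    rSquared = ssTot == 0 ? 1.0 : 1.0 - ssRes / ssTot;
  }
}
```

A measurer could then compare `rSquared` against an "early termination" threshold, as the comment suggests.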
Now, an intelligent measurer can be configured with not only a maximum run length in seconds, but also an "early termination" score that, if achieved, ends the run early to save time.
And, in a multiple-measurements-per-scenario situation, we no longer have to blindly average (or take the median of) the individual measurements; we can look at their quality scores and discard some of them, or weight them lower. Of course, I think that most often if one measurement has low quality, the rest probably do too, because it's likely an inherent problem in the way the benchmark is constructed.
Original comment by [email protected]
on 13 Jan 2010 at 5:29
Incidentally, if it's a benchmark that allocates any memory, then it's unlikely that every timing interval will get interrupted by GC the same number of times with the same workload to do, so variation is inevitable.
Original comment by [email protected]
on 22 Jan 2010 at 12:48
Still think that varying the rep count and evaluating the linearity of the results would be a great thing to do.
Original comment by [email protected]
on 7 Jun 2010 at 5:36
- Added labels: Milestone-0.5
With r132 I've added a basic sanity check to give a friendly exception when the benchmark body is optimized away.
The only thing we're missing is a check for enh's reported error case: when the benchmark is close to, but not quite, linear -- for example, when the number of reps itself impacts the per-rep runtime. I'm unsure what the fix here will be.
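The r132 check guards against the failure mode where the JIT proves the loop body dead and eliminates it. A minimal sketch of the hazard and the Caliper-era convention for avoiding it (method names here are illustrative, not Caliper's API): accumulate the work into a value and return it, so the optimizer must keep the loop.

```java
// Sketch of the dead-code hazard (names illustrative, not Caliper's API).
// If `dummy` were never used, the JIT could eliminate the whole loop and
// the "benchmark" would measure nothing.
final class SumBenchmark {
  static int compute(int i) {
    return i * i;  // stand-in for the real work being benchmarked
  }

  // Loop `reps` times and return an accumulated value so the work stays
  // observable to the optimizer.
  static int timeSum(int reps) {
    int dummy = 0;
    for (int i = 0; i < reps; i++) {
      dummy += compute(i);
    }
    return dummy;
  }
}
```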
Original comment by [email protected]
on 6 Jul 2010 at 12:49
- Changed state: Started
It would be nice to simply report a 'quality score' for each measurement
somewhere. Then just let users decide what to do.
So, if the results are absolutely linear, the quality score is 1. It's
possible all we need here is the statistical correlation coefficient between
the rep counts and elapsed time measurements.
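The correlation coefficient mentioned above is straightforward to compute; this is a sketch with an invented helper name, not something in Caliper itself. A perfectly linear benchmark scores 1.0.

```java
// Sketch (helper name invented): Pearson correlation between rep counts
// and elapsed times, as a candidate quality score.
final class QualityScore {
  static double correlation(long[] reps, double[] nanos) {
    int n = reps.length;
    double mx = 0, my = 0;
    for (int i = 0; i < n; i++) {
      mx += reps[i];
      my += nanos[i];
    }
    mx /= n;
    my /= n;
    double cov = 0, vx = 0, vy = 0;
    for (int i = 0; i < n; i++) {
      double dx = reps[i] - mx, dy = nanos[i] - my;
      cov += dx * dy;
      vx += dx * dx;
      vy += dy * dy;
    }
    return cov / Math.sqrt(vx * vy);
  }
}
```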
Original comment by [email protected]
on 8 Jul 2010 at 4:05
Original comment by [email protected]
on 8 Jul 2010 at 4:05
- Changed title: Report a quality score for each measurement
Original comment by [email protected]
on 14 Jul 2010 at 11:33
- Added labels: Milestone-1.0
- Removed labels: Milestone-0.5
Earlier I said "It's possible all we need here is the statistical correlation
coefficient between the rep counts and elapsed time measurements." This is
probably not true. Probably, we can vary the rep count just to check for
concavity for sanity's sake, but the statistic the user will want to see really
is the standard deviation.
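The statistic this comment settles on could be as simple as the sample standard deviation of the per-rep times observed across a scenario's measurements. A sketch, with an invented helper name:

```java
// Sketch (helper name invented): sample standard deviation of the
// per-rep times across a scenario's measurements.
final class Spread {
  static double sampleStddev(double[] perRepNanos) {
    int n = perRepNanos.length;
    double mean = 0;
    for (double x : perRepNanos) mean += x;
    mean /= n;
    double ss = 0;
    for (double x : perRepNanos) {
      double d = x - mean;
      ss += d * d;
    }
    return Math.sqrt(ss / (n - 1));  // n - 1: unbiased sample variance
  }
}
```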
Original comment by [email protected]
on 17 Jul 2010 at 9:24
Original comment by [email protected]
on 19 Mar 2011 at 2:57
- Changed state: Accepted
Original comment by [email protected]
on 19 Mar 2011 at 3:06
- Added labels: Type-Enhancement
In caliper 1.0 we gather and report all the relevant data (subject to the
user's preferences in caliperrc). So even if we don't show a stddev on the
screen, anyone who cares can always compute it later. Tagging post-1.0.
Original comment by [email protected]
on 14 Nov 2011 at 8:44
- Added labels: Milestone-Post-1.0
- Removed labels: Milestone-1.0
Original comment by [email protected]
on 8 Feb 2012 at 9:49
- Added labels: Component-Runner
I no longer think we want/need to do this.
Original comment by [email protected]
on 16 May 2012 at 11:27
- Changed state: WontFix