JuliaCI / BenchmarkTools.jl
A benchmarking framework for the Julia language
License: Other
julia> using BenchmarkTools
julia> @benchmark 1+1
ERROR: UndefVarError: @__MODULE__ not defined
julia> versioninfo()
Julia Version 0.6.1
Commit 0d7248e* (2017-10-24 22:15 UTC)
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
LAPACK: libopenblas64_
LIBM: libopenlibm
LLVM: libLLVM-3.9.1 (ORCJIT, haswell)
julia> Pkg.status("BenchmarkTools")
- BenchmarkTools 0.2.1
julia> @__MODULE__
ERROR: UndefVarError: @__MODULE__ not defined
Right now the resolution of the result seems to be limited to 1 ns. This should be enough for expensive benchmarks, but it may not be enough for benchmarking cheap operations that take only a few ns; it could be improved with averaging.
I encounter these situations mainly when benchmarking low-level operations, e.g. in JuliaLang/julia#16174, where the optimized version of g2 takes only 1.2 ns per loop. It would be nice if I didn't have to write my own loops for these.
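For reference, a hedged sketch of the manual-loop workaround that this issue would make unnecessary; cheap_op and looped are made-up names, and @belapsed is only used to time the whole loop:

using BenchmarkTools

cheap_op(x) = muladd(1.000001, x, 0.5)   # placeholder for the cheap operation under test

function looped(x, n)
    acc = x
    for _ in 1:n
        acc = cheap_op(acc)              # dependent chain so the loop can't be folded away
    end
    return acc
end

t = @belapsed looped(1.0, 1_000_000)     # minimum elapsed seconds for the whole loop
println("≈ ", t / 1_000_000 * 1e9, " ns per call")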
After approximately the zillionth time seeing people get confusing or incorrect benchmark results because they did:
@benchmark foo(x)
instead of
@benchmark foo($x)
I started wondering if maybe we could do something to avoid forcing this cognitive burden on users.
As inspiration, I've used the following macro in the unit tests to measure "real" allocations from a single execution of a function:
macro wrappedallocs(expr)
    argnames = [gensym() for a in expr.args]
    quote
        function g($(argnames...))
            @allocated $(Expr(expr.head, argnames...))
        end
        $(Expr(:call, :g, [esc(a) for a in expr.args]...))
    end
end
@wrappedallocs f(x) turns @allocated f(x) into something more like:
function g(_y)
    @allocated f(_y)
end
g(y)
which does the same computation but measures the allocations inside the wrapped function instead of at global scope.
It might be possible to do something like this for benchmarking. This particular implementation is wrong, because @wrappedallocs f(g(x)) will only measure the allocations of f(), not g(), but a similar approach, involving walking the expression to collect all the symbols and then passing those symbols through a new outer function, might work.
The result would be that
@benchmark f(g(y), x)
would turn into something like
function _f(_f, _g, _y, _x)
    @_benchmark _f(_g(_y), _x)
end
_f(f, g, y, x)
where @_benchmark does basically what regular @benchmark does right now. Passing _f and _g as arguments is not necessary if they're regular functions, but it is necessary if they're arbitrary callable objects.
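To make the idea concrete, here is a minimal sketch of the expression-walking approach. Nothing here is existing BenchmarkTools machinery: @wrappedtime, collect_symbols, and substitute are hypothetical names, and @elapsed stands in for the real measurement code.

# Collect every symbol that appears in the expression (function names included).
function collect_symbols(e, acc = Symbol[])
    if e isa Symbol
        push!(acc, e)
    elseif e isa Expr
        foreach(a -> collect_symbols(a, acc), e.args)
    end
    return acc
end

# Replace symbols according to a mapping, leaving everything else untouched.
substitute(e, mapping) =
    e isa Symbol ? get(mapping, e, e) :
    e isa Expr   ? Expr(e.head, (substitute(a, mapping) for a in e.args)...) :
    e

macro wrappedtime(expr)
    syms = unique(collect_symbols(expr))
    argnames = [gensym(s) for s in syms]
    inner = substitute(expr, Dict(zip(syms, argnames)))
    wrapper = gensym(:wrapper)
    quote
        function $wrapper($(argnames...))
            @elapsed $inner          # stand-in for the real measurement loop
        end
        $wrapper($(map(esc, syms)...))
    end
end

With this sketch, @wrappedtime f(g(y), x) lowers to a call of a fresh wrapper function taking f, g, y, and x as arguments, which is the shape described above.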
The question is: is this a good idea? This makes BenchmarkTools
more complicated, and might involve too much magic. I also haven't thought through how to integrate this with the setup
arguments. I'm mostly just interested in seeing if this is something that's worth spending time on.
One particular concern I have is if the user tries to benchmark a big block of code, we may end up with the wrapper function taking a ridiculous number of arguments, which I suspect is likely to be handled badly by Julia. Fortunately, the macro can at least detect that case and demand that the user manually splice in their arguments.
print, show, and showall all return only one line, "Trial(21.917 ns)". Is there a function to print the full summary?
julia> x = @benchmark sin(1)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 21.917 ns (0.00% GC)
median time: 21.932 ns (0.00% GC)
mean time: 22.034 ns (0.00% GC)
maximum time: 37.657 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 997
time tolerance: 5.00%
memory tolerance: 1.00%
julia> print(x)
Trial(21.917 ns)
julia> show(x)
Trial(21.917 ns)
julia> showall(x)
Trial(21.917 ns)
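For what it's worth, the full summary appears to come from the MIME"text/plain" show method, so (I believe) it can be requested explicitly as a workaround:

display(x)                             # full multi-line summary at the REPL
show(STDOUT, MIME"text/plain"(), x)    # STDOUT is spelled stdout on Julia 0.7+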
It would be nice if one could assign directly deep into a BenchmarkGroup using a Vector key. The intermediate groups should be created automatically as necessary. Currently:
julia> using BenchmarkTools
julia> g = BenchmarkGroup()
julia> g[[1, "a", :b]] = "hello"
ERROR: KeyError: key 1 not found
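A helper along these lines (hypothetical, not part of the package) could create the intermediate groups on demand:

using BenchmarkTools

function deep_setindex!(g::BenchmarkGroup, value, ks::Vector)
    for k in ks[1:end-1]
        haskey(g, k) || (g[k] = BenchmarkGroup())   # create missing levels
        g = g[k]
    end
    g[ks[end]] = value
    return value
end

g = BenchmarkGroup()
deep_setindex!(g, "hello", [1, "a", :b])   # g[1]["a"][:b] == "hello"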
Without such a helper, one currently has to do
julia> using BenchmarkTools
julia> g = BenchmarkGroup(1 => BenchmarkGroup("a" => BenchmarkGroup()))
julia> g[[1, "a", :b]] = "hello"
"hello"
Expected:
julia> using BenchmarkTools
julia> g = BenchmarkGroup()
julia> g[[1, "a", :b]] = "hello"
"hello"
While JLD is valid HDF5, I've been told JLD is painfully structured for use in other languages.
I've been bitten by compatibility issues, as well. Maintaining backwards compatibility is hard to pull off smoothly, and JLD changes often conflict with changes in Base, breaking forwards compatibility.
It would be nice to have simple CSV or JSON (de)serialization methods that didn't rely on external dependencies (or at least relied on more stable ones). One benefit is that methods like BenchmarkTools.save/BenchmarkTools.load would be easier to patch than external methods like JLD.load/JLD.save.
cc @quinnj, since we discussed this at JuliaCon. Do you have any specific opinions here?
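For concreteness, a rough sketch of what a dependency-light dump of the headline numbers could look like; this assumes the JSON.jl package and is not existing BenchmarkTools API:

using BenchmarkTools, JSON

function dump_minimums(path::AbstractString, group::BenchmarkGroup)
    # flatten the group and keep only the minimum time (ns) per leaf
    flat = Dict(string(k) => time(minimum(t)) for (k, t) in BenchmarkTools.leaves(group))
    open(io -> JSON.print(io, flat), path, "w")
end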
Benchmarking correctly sometimes involves making sure there is no additional overhead (e.g. due to accidental use of global variables) and that the operation being benchmarked isn't optimized away (most likely due to constant propagation). In addition to documenting and giving examples of the different ways to supply parameters to a benchmark, I think it would be useful to provide a way to show the code actually running in the loop.
This is also most relevant for cheap operations...
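As a partial stopgap, one can at least inspect how the macro lowers a call. This is hedged: the innermost measurement loop is generated later at runtime, so the expansion shows the skeleton and how setup and interpolation are handled rather than the final loop body.

using BenchmarkTools
# Julia 0.6 signature; on 0.7+ pass the module first: macroexpand(@__MODULE__, expr)
println(macroexpand(:(@benchmark sin(x) setup = (x = 1.0))))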
help?> @benchmark
No documentation found.
One of the features I really like from google/benchmark is the ability to benchmark across a range of input values. Since the size of the data can affect performance due to cache behaviour, I often find myself benchmarking with different array sizes.
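One way to approximate this today, as a sketch: key a BenchmarkGroup by problem size and let the suite machinery run the whole range in one go.

using BenchmarkTools

suite = BenchmarkGroup()
for n in (2^10, 2^14, 2^18)
    suite[n] = @benchmarkable sum(x) setup = (x = rand($n))
end
tune!(suite)
results = run(suite)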
Why does this change the binding of a? cc @stevengj
julia> using BenchmarkTools
julia> a = BitArray(rand() < 0.5 for x in 1:10^4, y in 1:10^4);
julia> @btime sum(a,1)
52.620 ms (2 allocations: 78.20 KiB)
1×10000 Array{Int64,2}:
5043 5049 5047 5028 4921 4929 … 5029 5022 5013 5082 5032 5011
julia> a
2
After all, johnmyleswhite/Benchmarks.jl#36 is still present in this package....
julia> b1 = @benchmarkable rand($Float64); tune!(b1); run(b1)
BenchmarkTools.Trial:
memory estimate: 16 bytes
allocs estimate: 1
--------------
minimum time: 60.568 ns (0.00% GC)
median time: 63.546 ns (0.00% GC)
mean time: 71.347 ns (5.25% GC)
maximum time: 25.763 μs (99.74% GC)
--------------
samples: 10000
evals/sample: 982
julia> b2 = @benchmarkable rand(); tune!(b2); run(b2)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 3.432 ns (0.00% GC)
median time: 3.962 ns (0.00% GC)
mean time: 3.852 ns (0.00% GC)
maximum time: 11.629 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 1000
AFAICT this affects BaseBenchmarks
BenchmarkTools is the bomb and everyone uses it – or should! The name is a little awkward though and doesn't have quite the gravitas and officialness that it deserves. What about calling the package Benchmarking.jl, so that one writes the lovely, definitive-sounding using Benchmarking when one uses it?
I have started using this package to test a module I'm making and it works really well!
I was thinking that it would be useful to have a function to generate a report from the results of a suite. What I ended up doing was to go look at the Nanosoldier repo and copy most of the code there (the printreport function). Having such a function in this repository would be useful.
A method for render() has to be defined, as stated by @ChrisRackauckas. At the moment the only way to "see" the results in Atom is to press "Copy" in the console and then paste somewhere else :( (print() also doesn't work.)
julia> using BenchmarkTools
# this should work, but doesn't
julia> @benchmark (a + b + c) setup=(a,b,c=1,2,3)
ERROR: syntax: assignment not allowed inside tuple
in generate_benchmark_definition(::Module, ::Array{Symbol,1}, ::Array{Symbol,1}, ::Expr, ::Expr, ::Void, ::BenchmarkTools.Parameters) at /Users/jarrettrevels/.julia/v0.5/BenchmarkTools/src/execution.jl:282
julia> @benchmark (a + b + c) setup=begin a,b,c=1,2,3 end
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 1.645 ns (0.00% GC)
median time: 1.962 ns (0.00% GC)
mean time: 1.967 ns (0.00% GC)
maximum time: 4.511 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 1000
time tolerance: 5.00%
memory tolerance: 1.00%
showall is gone (JuliaLang/julia#22847). What's the best way to look at results now? Note that repr and print show even less info.
The func generated by @benchmark conflicts with user variables named func.
Example:
julia> func(x) = x
func (generic function with 1 method)
julia> @benchmark func(3)
ERROR: invalid redefinition of constant func
This is on Julia Version 0.5.0-rc2
With Julia 0.6.1 and a fresh install:
julia> Pkg.update()
julia> Pkg.add("BenchmarkTools")
INFO: Cloning cache of BenchmarkTools from https://github.com/JuliaCI/BenchmarkTools.jl.git
INFO: Installing BenchmarkTools v0.2.0
INFO: Package database updated
julia> using BenchmarkTools
INFO: Precompiling module BenchmarkTools.
julia> @btime sin(1)
ERROR: UndefVarError: @__MODULE__ not defined
julia> @benchmark sin(1)
ERROR: UndefVarError: @__MODULE__ not defined
On running @benchmark(functionname()) on an already working function, I get this error:
ERROR: syntax: invalid syntax (escape (call (outerref parallel_add)))
Stacktrace:
[1] generate_benchmark_definition(::Module, ::Array{Symbol,1}, ::Array{Any,1}, ::Expr, ::Void, ::Void, ::BenchmarkTools.Parameters) at /home/memphis/.julia/v0.7/BenchmarkTools/src/execution.jl:289
I am running Julia 0.7. The error does not occur if I run it on an older version of Julia (0.6). Is there any specific reason why this error occurs, and is there any way to fix it?
This is what I get when I run Pkg.status():
1 required packages:
- BenchmarkTools 0.0.8
9 additional packages:
- BinDeps 0.7.0
- Blosc 0.3.0
- Compat 0.30.0
- FileIO 0.5.1
- HDF5 0.8.5
- JLD 0.8.1
- LegacyStrings 0.2.2
- SHA 0.5.1
- URIParser 0.2.0
I hope this isn't a usage question, but I think it would be nice if
julia> judge(memory(oldresults["solver"]), memory(results["solver"]))
ERROR: MethodError: no method matching judge(::Int64, ::Int64)
would work
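A possible workaround in the meantime (assuming the TrialJudgement accessors behave as I expect): judge whole estimates and read the memory verdict off the result instead of judging raw byte counts.

j = judge(minimum(oldresults["solver"]), minimum(results["solver"]))
memory(j)   # memory verdict of the judgement, e.g. :regression / :improvement / :invariant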
BenchmarkTools.jl/benchmark/benchmarks.jl
Lines 33 to 40 in 9e91a17
The following works fine
A = Array(SymTridiagonal(fill(2, 5), ones(5)))
b = @benchmarkable Base.LinAlg.chol!(x) setup=(x = Hermitian(copy($A), :U))
warmup(b)
run(b)
but injecting tune! between warmup and run as in @benchmark, i.e.
A = Array(SymTridiagonal(fill(2, 5), ones(5)))
b = @benchmarkable Base.LinAlg.chol!(x) setup=(x = Hermitian(copy($A), :U))
warmup(b)
tune!(b)
run(b)
causes chol! to throw a PosDefException from within tune!...
julia> tune!(b)
ERROR: Base.LinAlg.PosDefException(4)
in _chol!(::Array{Float64,2}, ::Type{UpperTriangular}) at ./linalg/cholesky.jl:55
in chol!(::Hermitian{Float64,Array{Float64,2}}) at ./linalg/cholesky.jl:124
in ##core#429(::Hermitian{Float64,Array{Float64,2}}) at ./<missing>:0
in ##sample#430(::BenchmarkTools.Parameters) at /Users/sacha/.julia/v0.6/BenchmarkTools/src/execution.jl:248
in #_lineartrial#19(::Int64, ::Array{Any,1}, ::Function, ::BenchmarkTools.Benchmark{Symbol("##benchmark#428")}, ::BenchmarkTools.Parameters) at /Users/sacha/.julia/v0.6/BenchmarkTools/src/execution.jl:51
in _lineartrial(::BenchmarkTools.Benchmark{Symbol("##benchmark#428")}, ::BenchmarkTools.Parameters) at /Users/sacha/.julia/v0.6/BenchmarkTools/src/execution.jl:43
in #lineartrial#20(::Array{Any,1}, ::Function, ::BenchmarkTools.Benchmark{Symbol("##benchmark#428")}, ::BenchmarkTools.Parameters) at /Users/sacha/.julia/v0.6/BenchmarkTools/src/execution.jl:59
in #tune!#22(::Bool, ::String, ::Array{Any,1}, ::Function, ::BenchmarkTools.Benchmark{Symbol("##benchmark#428")}, ::BenchmarkTools.Parameters) at /Users/sacha/.julia/v0.6/BenchmarkTools/src/execution.jl:114
in tune!(::BenchmarkTools.Benchmark{Symbol("##benchmark#428")}) at /Users/sacha/.julia/v0.6/BenchmarkTools/src/execution.jl:114
which indicates that setup isn't occurring between samples in tune!. Thoughts? Thanks!
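In the meantime, if I understand the setup/evals interaction correctly (setup runs once per sample, not once per evaluation), pinning evals to 1 should sidestep the failure, since the mutated x is then never reused:

A = Array(SymTridiagonal(fill(2, 5), ones(5)))
b = @benchmarkable Base.LinAlg.chol!(x) setup=(x = Hermitian(copy($A), :U)) evals=1
warmup(b)
run(b)   # no tune! needed once evals is pinned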
If I understand correctly, the median time & memory estimate are by far the most important numbers coming out of a run. The printing should reflect that. Maybe:
julia> @benchmark sin(1)
BenchmarkTools.Trial:
median time: 13.00 ns (0.00% GC)
memory estimate: 0.00 bytes
minimum time: 13.00 ns (0.00% GC)
mean time: 13.02 ns (0.00% GC)
maximum time: 36.00 ns (0.00% GC)
samples: 10000
evals/sample: 1000
time tolerance: 5.00%
memory tolerance: 1.00%
allocs estimate: 0
I really want a histogram to appear by default; it's OK to label the minimum, maximum, and mean. A picture is worth a megabyte of numbers :-)
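Not a package feature, just a sketch of how to get that picture today, assuming the UnicodePlots package and using the Trial's raw times field (in ns):

using BenchmarkTools, UnicodePlots
t = @benchmark sin(1)
histogram(t.times, nbins = 30)   # unicode histogram of per-sample times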
julia> @btime round(Int, 5);
1.539 ns (0 allocations: 0 bytes)
julia> @btime round($Int, 5);
54.228 ns (0 allocations: 0 bytes)
I believe these should be equivalent.
I run the following: @benchmark heat_juafem_examples.allrun(). I can see that the macro executes the function inside the module heat_juafem_examples four times, but only a single sample is reported:
julia> include("heat_juafem_examples.jl"); @benchmark heat_juafem_examples.allrun()
WARNING: replacing module heat_juafem_examples
# heat_juafem_example
# heat_juafem_example
# heat_juafem_example
# heat_juafem_example
BenchmarkTools.Trial:
memory estimate: 3.65 GiB
allocs estimate: 20007018
--------------
minimum time: 9.877 s (12.50% GC)
median time: 9.877 s (12.50% GC)
mean time: 9.877 s (12.50% GC)
maximum time: 9.877 s (12.50% GC)
--------------
samples: 1
evals/sample: 1
julia>
Am I missing something? Is this the expected behavior?
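If I understand the defaults correctly (a 5-second total time budget, so a roughly 10 s call can only ever yield one sample), raising the budget should produce more samples:

@benchmark heat_juafem_examples.allrun() seconds=60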
It would be great to have a way to automatically test the equality of the outputs at the beginning of the benchmark execution, to avoid regressions in quality during optimization stages.
Installing this on OS X takes about 10 minutes, downloading and compiling things, even installing Python. It also breaks my Gtk.jl installation (it replaces the libraries with other versions).
This seems a bit excessive for such a simple package; why does it need so many binary dependencies? I'm not sure where the issue is coming from; I guess from JLD.
BenchmarkGroup currently implements an indexing scheme where a key vector indicates nested indexing. This can and should be removed in favor of using foldl(getindex, g::BenchmarkGroup, keys) (ref here).
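The proposed replacement in action (a small sketch; on Julia 1.0+ the same call is spelled with an init keyword):

using BenchmarkTools
g = BenchmarkGroup(1 => BenchmarkGroup("a" => BenchmarkGroup(:b => "hello")))
foldl(getindex, g, [1, "a", :b])           # == "hello", i.e. g[1]["a"][:b]
# Julia 1.0+: foldl(getindex, [1, "a", :b]; init = g)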
Could someone please tag a release that includes the changes for 0.6 deprecations?
I've been using PolyBench.jl, which installs BenchmarkTools.jl as a dependency.
I issued Pkg.update() to include the v0.7 deprecation patch, but that didn't update BenchmarkTools. Pkg.update worked only after BenchmarkTools.jl was switched to the master branch from a branch that had the same name as its latest commit.
Other packages, including BinDeps, were not on their master branch yet got updated.
Is there an issue with the way I'm using Pkg.update() ?
In an ideal world, I would love the following to work:
versions = ["v0.0.1", "3743c273f4439833787522242efdcda87951f6d1", "v0.0.2"]
results = Pkg.benchmark("Foo",versions)
@show results # nice printing
@jrevels has pointed out that it is probably not feasible to have Pkg support this functionality in the short term (see discussion here). But it seems like BenchmarkTools could support this natively, perhaps as a prototype to be taken up by Pkg at a later time.
Let's say we call this function benchmark_pkg(); then my proposal is to support the following:
benchmark_pkg("Foo",:major) # runs benchmarks on all major releases
benchmark_pkg("Foo",:minor) # runs benchmarks on all major & minor releases
benchmark_pkg("Foo",:patch) # runs benchmarks on all releases
# run benchmarks for each version/commit stored in a user-provided list
benchmark_pkg("Foo",::Vector{AbstractString})
How can I help take baby steps towards this vision? My thought is along these lines:
- Benchmarks live in perf/runtests.jl. (Maybe perf/runbenchmarks.jl would be a better name?)
- pkg_benchmark() finds parameters using tune! and saves them in perf/benchmark_params.jl.
- For each requested version, use git checkout ... to selectively check out the src/ directory of the repository.
- Afterwards, restore src/ to its original/starting commit.
Is this reasonable? Or more dangerous than I realize? For example, what if the user edits code while benchmarks are running and saves the result...
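A very rough sketch of the core loop such a benchmark_pkg might run; everything here (the perf/runbenchmarks.jl layout, the naive git handling via the command line, Pkg.dir) is an assumption, not existing API:

using BenchmarkTools

function benchmark_pkg(pkg::AbstractString, versions::Vector{<:AbstractString})
    dir = Pkg.dir(pkg)
    original = strip(readstring(`git -C $dir rev-parse HEAD`))   # remember the starting commit
    results = Dict{String,Any}()
    try
        for v in versions
            run(`git -C $dir checkout --quiet $v -- src/`)        # check out src/ at this version
            suite = include(joinpath(dir, "perf", "runbenchmarks.jl"))
            tune!(suite)
            results[v] = run(suite)
        end
    finally
        run(`git -C $dir checkout --quiet $original -- src/`)     # restore src/ no matter what
    end
    return results
end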
Given the large number of tests, there is still a fair amount of noise in the reports. It would seem that one way to reduce the noise would be to follow up on apparent regressions/improvements with a re-running of those tests. I haven't looked into the infrastructure at all to see how easy this would be, but if both builds are still available I'm wondering whether this would be fairly straightforward?
julia> f() =0
f (generic function with 1 method)
julia> @btime f()
ERROR: syntax: invalid syntax (escape (call (outerref f)))
Stacktrace:
[1] generate_benchmark_definition(::Module, ::Array{Symbol,1}, ::Array{Any,1}, ::Expr, ::Void, ::Void, ::BenchmarkTools.Parameters) at C:\Users\Mus\.julia\v0.7\BenchmarkTools\src\execution.jl:289
I'm finding different results using BenchmarkTools as opposed to my own loops and @time, particularly regarding allocating operations.
I'm benchmarking Base.Array vs the types in StaticArrays.jl. Some of these arrays are immutable and don't perform GC allocations, and others are mutable. If you are doing lots of small, e.g., matrix multiplications, then the allocations and GC can dominate the cost, and an immutable array is much faster. I have a non-allocating SArray and a (mutable) allocating MArray.
The typical results I was getting using loops and @time showed that when new copies of Array or MArray were created, a lot of time was spent on allocation and GC (but not for SArray):
=====================================
Benchmarks for 4×4 matrices
=====================================
Matrix multiplication
---------------------
Array -> 6.526125 seconds (31.25 M allocations: 3.492 GB, 6.61% gc time)
SArray -> 0.369290 seconds (5 allocations: 304 bytes)
MArray -> 1.964021 seconds (15.63 M allocations: 2.095 GB, 12.05% gc time)
Matrix multiplication (mutating)
--------------------------------
Array -> 4.540372 seconds (6 allocations: 576 bytes)
MArray -> 0.748238 seconds (6 allocations: 448 bytes)
However, I switched my tests to BenchmarkTools.jl and now the difference between SArray and MArray has disappeared. It appears almost as if the allocation costs have been ameliorated somehow. Perhaps I'm using the package wrong, but I get:
=====================================================================
Matrices of size 4×4 and eltype Float64
=====================================================================
SArray: m3 = m1 * m2 takes 52.00 ns, 144.00 bytes (GC 0.00 ns)
Array: m3 = m1 * m2 takes 196.00 ns, 240.00 bytes (GC 0.00 ns)
MArray: m3 = m1 * m2 takes 55.00 ns, 144.00 bytes (GC 0.00 ns)
Array: A_mul_B!(m3, m1, m2) takes 150.00 ns, 0.00 bytes (GC 0.00 ns)
MArray: A_mul_B!(m3, m1, m2) takes 20.00 ns, 0.00 bytes (GC 0.00 ns)
The two calls I make are @benchmark *($(copy(m)), $(copy(m))) and @benchmark A_mul_B!($(copy(m)), $(copy(m)), $(copy(m))), where m is some random 4x4 matrix I made out of the above types. Is that the right way to use @benchmark?
I am seeing a 10x difference for the same functions with/without an @inline hint.
reproducible with
using Base: significand_mask, Math.significand_bits, Math.exponent_bias, exponent_mask,
exponent_half, leading_zeros, Math.exponent_bits, sign_mask, unsafe_trunc,
@pure, Math.@horner, fpinttype, Math.exponent_max, Math.exponent_raw_max
# log2(10)
const LOG210 = 3.321928094887362347870319429489390175864831393024580612054756395815934776608624
# log10(2)
const LOG102 = 3.010299956639811952137388947244930267681898814621085413104274611271081892744238e-01
# log(10)
const LN10 = 2.302585092994045684017991454684364207601101488628772976033327900967572609677367
# log10(2) into upper and lower bits
LOG102U(::Type{Float64}) = 3.01025390625000000000e-1
LOG102U(::Type{Float32}) = 3.00781250000000000000f-1
LOG102L(::Type{Float64}) = 4.60503898119521373889e-6
LOG102L(::Type{Float32}) = 2.48745663981195213739f-4
# max and min arguments
MAXEXP10(::Type{Float64}) = 3.08254715559916743851e2 # log 2^1023*(2-2^-52)
MAXEXP10(::Type{Float32}) = 38.531839419103626f0 # log 2^127 *(2-2^-23)
# one less than the min exponent since we can squeeze a bit more from the exp10 function
MINEXP10(::Type{Float64}) = -3.23607245338779784854769e2 # log10 2^-1075
MINEXP10(::Type{Float32}) = -45.15449934959718f0 # log10 2^-150
@inline exp10_kernel(x::Float64) =
@horner(x, 1.0,
2.30258509299404590109361379290930926799774169921875,
2.6509490552391992146397114993305876851081848144531,
2.03467859229323178027470930828712880611419677734375,
1.17125514891212478829629617393948137760162353515625,
0.53938292928868392106522833273629657924175262451172,
0.20699584873167015119932443667494226247072219848633,
6.8089348259156870502017966373387025669217109680176e-2,
1.9597690535095281527677713029333972372114658355713e-2,
5.015553121397981796436571499953060992993414402008e-3,
1.15474960721768829356725927226534622604958713054657e-3,
1.55440426715227567738830671828509366605430841445923e-4,
3.8731032432074128681303432086835414338565897196531e-5,
2.3804466459036747669197886523306806338950991630554e-3,
9.3881392238209649520573607528461934634833596646786e-5,
-2.64330486232183387018679354696359951049089431762695e-2)
@inline exp10_kernel(x::Float32) =
@horner(x, 1.0f0,
2.302585124969482421875f0,
2.650949001312255859375f0,
2.0346698760986328125f0,
1.17125606536865234375f0,
0.5400512218475341796875f0,
0.20749187469482421875f0,
5.2789829671382904052734375f-2)
@eval exp10_small_thres(::Type{Float64}) = $(2.0^-29)
@eval exp10_small_thres(::Type{Float32}) = $(2.0f0^-13)
function myexp10(x::T) where T<:Union{Float32,Float64}
xa = reinterpret(Unsigned, x) & ~sign_mask(T)
xsb = signbit(x)
# filter out non-finite arguments
if xa > reinterpret(Unsigned, MAXEXP10(T))
if xa >= exponent_mask(T)
xa & significand_mask(T) != 0 && return T(NaN)
return xsb ? T(0.0) : T(Inf) # exp10(+-Inf)
end
x > MAXEXP10(T) && return T(Inf)
x < MINEXP10(T) && return T(0.0)
end
# argument reduction
if xa > reinterpret(Unsigned, T(0.5)*T(LOG102))
if xa < reinterpret(Unsigned, T(1.5)*T(LOG102))
if xsb
k = -1
r = muladd(T(-1.0), -LOG102U(T), x)
r = muladd(T(-1.0), -LOG102L(T), r)
else
k = 1
r = muladd(T(1.0), -LOG102U(T), x)
r = muladd(T(1.0), -LOG102L(T), r)
end
else
n = round(T(LOG210)*x)
k = unsafe_trunc(Int,n)
r = muladd(n, -LOG102U(T), x)
r = muladd(n, -LOG102L(T), r)
end
elseif xa < reinterpret(Unsigned, exp10_small_thres(T))
# Taylor approximation for small x ≈ 1.0 + log(10)*x
return muladd(x, T(LN10), T(1.0))
else # here k = 0
return exp10_kernel(x)
end
# compute approximation
y = exp10_kernel(r)
if k > -significand_bits(T)
# multiply by 2.0 first to prevent overflow, extending the range
k == exponent_max(T) && return y * T(2.0) * T(2.0)^(exponent_max(T) - 1)
twopk = reinterpret(T, rem(exponent_bias(T) + k, fpinttype(T)) << significand_bits(T))
return y*twopk
else
# add significand_bits(T) + 1 to lift the range outside the subnormals
twopk = reinterpret(T, rem(exponent_bias(T) + significand_bits(T) + 1 + k, fpinttype(T)) << significand_bits(T))
return y * twopk * T(2.0)^(-significand_bits(T) - 1)
end
end
@inline function myexp10_inline(x::T) where T<:Union{Float32,Float64}
xa = reinterpret(Unsigned, x) & ~sign_mask(T)
xsb = signbit(x)
# filter out non-finite arguments
if xa > reinterpret(Unsigned, MAXEXP10(T))
if xa >= exponent_mask(T)
xa & significand_mask(T) != 0 && return T(NaN)
return xsb ? T(0.0) : T(Inf) # exp10(+-Inf)
end
x > MAXEXP10(T) && return T(Inf)
x < MINEXP10(T) && return T(0.0)
end
# argument reduction
if xa > reinterpret(Unsigned, T(0.5)*T(LOG102))
if xa < reinterpret(Unsigned, T(1.5)*T(LOG102))
if xsb
k = -1
r = muladd(T(-1.0), -LOG102U(T), x)
r = muladd(T(-1.0), -LOG102L(T), r)
else
k = 1
r = muladd(T(1.0), -LOG102U(T), x)
r = muladd(T(1.0), -LOG102L(T), r)
end
else
n = round(T(LOG210)*x)
k = unsafe_trunc(Int,n)
r = muladd(n, -LOG102U(T), x)
r = muladd(n, -LOG102L(T), r)
end
elseif xa < reinterpret(Unsigned, exp10_small_thres(T))
# Taylor approximation for small x ≈ 1.0 + log(10)*x
return muladd(x, T(LN10), T(1.0))
else # here k = 0
return exp10_kernel(x)
end
# compute approximation
y = exp10_kernel(r)
if k > -significand_bits(T)
# multiply by 2.0 first to prevent overflow, extending the range
k == exponent_max(T) && return y * T(2.0) * T(2.0)^(exponent_max(T) - 1)
twopk = reinterpret(T, rem(exponent_bias(T) + k, fpinttype(T)) << significand_bits(T))
return y*twopk
else
# add significand_bits(T) + 1 to lift the range outside the subnormals
twopk = reinterpret(T, rem(exponent_bias(T) + significand_bits(T) + 1 + k, fpinttype(T)) << significand_bits(T))
return y * twopk * T(2.0)^(-significand_bits(T) - 1)
end
end
julia> using BenchmarkTools
julia> @benchmark myexp10(1.3)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 13.026 ns (0.00% GC)
median time: 13.422 ns (0.00% GC)
mean time: 14.516 ns (0.00% GC)
maximum time: 125.138 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 1000
julia> @benchmark myexp10_inline(1.3)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 1.579 ns (0.00% GC)
median time: 1.974 ns (0.00% GC)
mean time: 2.081 ns (0.00% GC)
maximum time: 20.527 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 1000
linuxtips.md mentions CPU governors and suggests performance. However, after the introduction of pstate in Sandy Bridge, the default processor frequency scaling became very responsive. In my own benchmarks, I frequently get much (10–30%) better performance from just working with the default governor than with performance. Just not bothering to set CPU scaling may be the most realistic scenario for most recent computers (Sandy Bridge appeared in 2011).
On Julia 0.7, @belapsed expression without keyword arguments doesn't work:
julia> @belapsed sin(1)
ERROR: ArgumentError: tuple must be non-empty
julia> @belapsed sin(1) evals = 3
1.2999999999999999e-11
No problem with Julia 0.5.
Note that the first one isn't tested in test/ExecutionTests.jl.
BTW, isn't 1.3e-11 a bit too low as an elapsed time? On Julia 0.5, on the same machine (a laptop equipped with an Intel i7 CPU), the time is of the order of 1.2e-8, which looks more reasonable.
Makes sense, since run has this optional keyword argument.
A lot of times, I just want the minimum time, especially if I'm collecting or comparing a lot of timing measurements. It would be nice to have an analogue of @elapsed that just returns the minimum time, but with all of the @benchmark niceties.
I've been using:
"""
Like `@benchmark`, but returns only the minimum time in ns.
"""
macro benchtime(args...)
    b = Expr(:macrocall, Symbol("@benchmark"), map(esc, args)...)
    :(time(minimum($b)))
end
Could something like this be included?
It's unlikely I'll get around to doing this in the foreseeable future, but I'm tired of digging through issues to find this comment when I want to link it in other discussions. Recreated from my comment here:
Robust hypothesis testing is quite tricky to do correctly in the realm of non-i.i.d. statistics, which is the world benchmark timings generally live in. If you do the "usual calculations", you'll end up getting junk results a lot of the time.
A while ago, I developed a working prototype of a subsampling method for calculating p-values (which could be modified to compute confidence intervals), but it relies on getting the correct normalization coefficient for the test statistic + timing distribution (unique to each benchmark). IIRC, it worked decently on my test benchmark data, but only if I manually tuned the normalization coefficient for any given benchmark. There are methods out there for automatically estimating this coefficient, but I never got around to implementing them. For a reference, see Politis and Romano's book "Subsampling" (specifically section 8: "Subsampling with Unknown Convergence Rate").
This would compute the ratio of all the relevant metrics:
Would it make sense to try to put some kind of confidence interval on the time based on all of the samples?
Using something like https://github.com/bicycle1885/CodecZlib.jl the output files can be compressed to ~5% of the current size. It is a pretty lightweight dependency, so perhaps worth thinking about.
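A quick illustration of the idea, assuming CodecZlib.jl and an existing results.json produced by BenchmarkTools.save:

using CodecZlib
raw = read("results.json")                        # raw JSON bytes
open("results.json.gz", "w") do io
    write(io, transcode(GzipCompressor, raw))     # gzip-compress in one shot
end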
I do not know if this is a bug or a feature (!), but if I use @btime or @benchmark to benchmark a parametric function from inside another function, then it gives an ugly error that the function parameters are not defined.
Here is a simplified example that shows everything is fine if @btime is used at the top level:
julia> using BenchmarkTools
julia> f(x) = x^2
f (generic function with 1 method)
julia> x = 100
100
julia> @btime f(x)
187.359 ns (1 allocation: 16 bytes)
10000
However, if we use it from inside another function, e.g. test(), then the error shows up:
julia> function test()
y = 1000
@btime f(y)
end
test (generic function with 1 method)
julia> test()
ERROR: UndefVarError: y not defined
Stacktrace:
[1] ##core#683() at /Users/dashti/.julia/v0.6/BenchmarkTools/src/execution.jl:316
[2] ##sample#684(::BenchmarkTools.Parameters) at /Users/dashti/.julia/v0.6/BenchmarkTools/src/execution.jl:322
[3] #_run#3(::Bool, ::String, ::Array{Any,1}, ::Function, ::BenchmarkTools.Benchmark{Symbol("##benchmark#682")}, ::BenchmarkTools.Parameters) at /Users/dashti/.julia/v0.6/BenchmarkTools/src/execution.jl:350
[4] (::BenchmarkTools.#kw##_run)(::Array{Any,1}, ::BenchmarkTools.#_run, ::BenchmarkTools.Benchmark{Symbol("##benchmark#682")}, ::BenchmarkTools.Parameters) at ./<missing>:0
[5] anonymous at ./<missing>:?
[6] #run_result#19(::Array{Any,1}, ::Function, ::BenchmarkTools.Benchmark{Symbol("##benchmark#682")}, ::BenchmarkTools.Parameters) at /Users/dashti/.julia/v0.6/BenchmarkTools/src/execution.jl:44
[7] (::BenchmarkTools.#kw##run_result)(::Array{Any,1}, ::BenchmarkTools.#run_result, ::BenchmarkTools.Benchmark{Symbol("##benchmark#682")}, ::BenchmarkTools.Parameters) at ./<missing>:0
[8] #run#21(::Array{Any,1}, ::Function, ::BenchmarkTools.Benchmark{Symbol("##benchmark#682")}, ::BenchmarkTools.Parameters) at /Users/dashti/.julia/v0.6/BenchmarkTools/src/execution.jl:67
[9] (::Base.#kw##run)(::Array{Any,1}, ::Base.#run, ::BenchmarkTools.Benchmark{Symbol("##benchmark#682")}, ::BenchmarkTools.Parameters) at ./<missing>:0 (repeats 2 times)
[10] macro expansion at /Users/dashti/.julia/v0.6/BenchmarkTools/src/execution.jl:410 [inlined]
[11] test() at ./REPL[6]:3
julia> function test2()
y = 1000
@benchmark f(y)
end
test2 (generic function with 1 method)
julia> test2()
ERROR: UndefVarError: y not defined
Stacktrace:
[1] ##core#687() at /Users/dashti/.julia/v0.6/BenchmarkTools/src/execution.jl:316
[2] ##sample#688(::BenchmarkTools.Parameters) at /Users/dashti/.julia/v0.6/BenchmarkTools/src/execution.jl:322
[3] #_run#4(::Bool, ::String, ::Array{Any,1}, ::Function, ::BenchmarkTools.Benchmark{Symbol("##benchmark#686")}, ::BenchmarkTools.Parameters) at /Users/dashti/.julia/v0.6/BenchmarkTools/src/execution.jl:350
[4] (::BenchmarkTools.#kw##_run)(::Array{Any,1}, ::BenchmarkTools.#_run, ::BenchmarkTools.Benchmark{Symbol("##benchmark#686")}, ::BenchmarkTools.Parameters) at ./<missing>:0
[5] anonymous at ./<missing>:?
[6] #run_result#19(::Array{Any,1}, ::Function, ::BenchmarkTools.Benchmark{Symbol("##benchmark#686")}, ::BenchmarkTools.Parameters) at /Users/dashti/.julia/v0.6/BenchmarkTools/src/execution.jl:44
[7] (::BenchmarkTools.#kw##run_result)(::Array{Any,1}, ::BenchmarkTools.#run_result, ::BenchmarkTools.Benchmark{Symbol("##benchmark#686")}, ::BenchmarkTools.Parameters) at ./<missing>:0
[8] #run#21(::Array{Any,1}, ::Function, ::BenchmarkTools.Benchmark{Symbol("##benchmark#686")}, ::BenchmarkTools.Parameters) at /Users/dashti/.julia/v0.6/BenchmarkTools/src/execution.jl:67
[9] (::Base.#kw##run)(::Array{Any,1}, ::Base.#run, ::BenchmarkTools.Benchmark{Symbol("##benchmark#686")}, ::BenchmarkTools.Parameters) at ./<missing>:0 (repeats 2 times)
[10] macro expansion at /Users/dashti/.julia/v0.6/BenchmarkTools/src/execution.jl:234 [inlined]
[11] test2() at ./REPL[8]:3
Here is my versioninfo() and Pkg.status("BenchmarkTools"):
julia> versioninfo()
Julia Version 0.6.2
Commit d386e40c17 (2017-12-13 18:08 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin14.5.0)
CPU: Intel(R) Core(TM) i7-3635QM CPU @ 2.40GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
LAPACK: libopenblas64_
LIBM: libopenlibm
LLVM: libLLVM-3.9.1 (ORCJIT, ivybridge)
julia> Pkg.status("BenchmarkTools")
- BenchmarkTools 0.2.5
It would be fantastic if there were a way to get a profile trace of an @benchmarkable object, exploiting some of the nice features (i.e. running it enough times to get a good sample). Any ideas on ways to do this?
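One approach that seems workable (hedged): since run(b) executes the benchmarkable many times, the standard profiler can simply be pointed at it.

using BenchmarkTools, Profile   # on Julia 0.6 the Profile module lives in Base
b = @benchmarkable sum(x) setup = (x = rand(10^4))
tune!(b)
Profile.clear()
@profile run(b)                 # collect samples across all benchmark evaluations
Profile.print()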
It would be nice if you added an option to remove the end-of-line in the @btime output, so that it can be embedded in the middle of a line (or a stream argument, so that one can postprocess it?).
Add a simple stopwatch; see below for a basic suggested implementation:
mutable struct StopWatch
    t1::Float64
    t2::Float64
    StopWatch() = new(NaN, NaN)
end

function start!(sw::StopWatch)::Nothing
    sw.t1 = time()
    nothing
end

function stop!(sw::StopWatch)::Nothing
    sw.t2 = time()
    nothing
end

function reset!(sw::StopWatch)::Nothing
    sw.t2 = sw.t1 = NaN
    nothing
end

# Elapsed time: running watches measure up to now, stopped watches t2 - t1.
peek(sw::StopWatch) =
    isnan(sw.t1) ? error("StopWatch ($sw) has not been started.") :
    isnan(sw.t2) ? time() - sw.t1 : sw.t2 - sw.t1
# duration(sw::StopWatch) = sw.t2 - sw.t1

# Demo
cl = StopWatch()                          # peek(cl) would error here
start!(cl); sleep(2); peek(cl)
sleep(1); stop!(cl); peek(cl); reset!(cl)
Because sometimes you just want a stop watch.
I don't know if anyone else feels the same, but it is a little annoying to me that when I display the results of some benchmarks, they are not shown in insertion order.
As an example, here it is visually difficult for me to compare the performance of different graph types in Erdos because of the lack of ordering:
julia> @show res["generators"];
res["generators"] = 16-element BenchmarkTools.BenchmarkGroup:
tags: []
("rrg","Net(500, 750) with [] graph, [] vertex, [] edge properties.") => Trial(632.408 μs)
("rrg","Graph{Int64}(100, 150)") => Trial(121.221 μs)
("rrg","Net(100, 150) with [] graph, [] vertex, [] edge properties.") => Trial(115.033 μs)
("rrg","Graph{Int64}(500, 750)") => Trial(677.647 μs)
("complete","Net(100, 4950) with [] graph, [] vertex, [] edge properties.") => Trial(896.223 μs)
("complete","DiGraph{Int64}(100, 9900)") => Trial(617.122 μs)
("complete","DiNet(20, 380) with [] graph, [] vertex, [] edge properties.") => Trial(42.104 μs)
("erdos","Graph{Int64}(500, 1500)") => Trial(405.240 μs)
("erdos","Net(100, 300) with [] graph, [] vertex, [] edge properties.") => Trial(71.516 μs)
("complete","DiGraph{Int64}(20, 380)") => Trial(23.721 μs)
("complete","Net(20, 190) with [] graph, [] vertex, [] edge properties.") => Trial(20.845 μs)
("complete","Graph{Int64}(100, 4950)") => Trial(159.900 μs)
("complete","DiNet(100, 9900) with [] graph, [] vertex, [] edge properties.") => Trial(1.861 ms)
("erdos","Net(500, 1500) with [] graph, [] vertex, [] edge properties.") => Trial(297.167 μs)
("complete","Graph{Int64}(20, 190)") => Trial(7.340 μs)
("erdos","Graph{Int64}(100, 300)") => Trial(88.091 μs)
Would it be reasonable, and not too disruptive, to use OrderedDicts instead of Dict in the BenchmarkGroup type?
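In the meantime, a stopgap that needs no package changes: flatten the group with leaves and sort by key before printing.

for (k, v) in sort(collect(BenchmarkTools.leaves(res["generators"])); by = x -> string(first(x)))
    println(k, " => ", v)   # k is the key path, v the Trial
end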
Yes, I could write down some more appropriate comparison methods, but asking doesn't hurt :)
Cheers,
Carlo