JuliaCI / BenchmarkTools.jl
A benchmarking framework for the Julia language
License: Other
julia> using BenchmarkTools
julia> @benchmark 1+1
ERROR: UndefVarError: @__MODULE__ not defined
julia> versioninfo()
Julia Version 0.6.1
Commit 0d7248e* (2017-10-24 22:15 UTC)
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
LAPACK: libopenblas64_
LIBM: libopenlibm
LLVM: libLLVM-3.9.1 (ORCJIT, haswell)
julia> Pkg.status("BenchmarkTools")
- BenchmarkTools 0.2.1
julia> @__MODULE__
ERROR: UndefVarError: @__MODULE__ not defined
Right now the resolution of the result seems to be limited to 1 ns. This should be enough for expensive benchmarks, but it may not be enough for benchmarking cheap operations that take only a few ns; it could be improved with averaging.
I encounter these situations mainly when benchmarking low-level operations, e.g. in JuliaLang/julia#16174, where the optimized version of g2 takes only 1.2 ns per loop. It would be nice if I didn't have to write my own loops for these.
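For reference, a hedged sketch of the manual-loop workaround that this issue would make unnecessary; cheap_op and looped are made-up names, and @belapsed is only used to time the whole loop:

using BenchmarkTools

cheap_op(x) = muladd(1.000001, x, 0.5)   # placeholder for the cheap operation under test

function looped(x, n)
    acc = x
    for _ in 1:n
        acc = cheap_op(acc)              # dependent chain so the loop can't be folded away
    end
    return acc
end

t = @belapsed looped(1.0, 1_000_000)     # minimum elapsed seconds for the whole loop
println("≈ ", t / 1_000_000 * 1e9, " ns per call")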
After approximately the zillionth time seeing people get confusing or incorrect benchmark results because they did:
@benchmark foo(x)
instead of
@benchmark foo($x)
I started wondering if maybe we could do something to avoid forcing this cognitive burden on users.
As inspiration, I've used the following macro in the unit tests to measure "real" allocations from a single execution of a function:
macro wrappedallocs(expr)
    argnames = [gensym() for a in expr.args]
    quote
        function g($(argnames...))
            @allocated $(Expr(expr.head, argnames...))
        end
        $(Expr(:call, :g, [esc(a) for a in expr.args]...))
    end
end
@wrappedallocs f(x) turns @allocated f(x) into something more like:
function g(_y)
    @allocated f(_y)
end
g(y)
which does the same computation but measures the allocations inside the wrapped function instead of at global scope.
It might be possible to do something like this for benchmarking. This particular implementation is wrong, because @wrappedallocs f(g(x)) will only measure the allocations of f(), not g(), but a similar approach, involving walking the expression to collect all the symbols and then passing those symbols through a new outer function, might work.
The result would be that
@benchmark f(g(y), x)
would turn into something like
function _f(_f, _g, _y, _x)
    @_benchmark _f(_g(_y), _x)
end
_f(f, g, y, x)
where @_benchmark does basically what regular @benchmark does right now. Passing _f and _g as arguments is not necessary if they're regular functions, but it is necessary if they're arbitrary callable objects.
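To make the idea concrete, here is a minimal sketch of the expression-walking approach. Nothing here is existing BenchmarkTools machinery: @wrappedtime, collect_symbols, and substitute are hypothetical names, and @elapsed stands in for the real measurement code.

# Collect every symbol that appears in the expression (function names included).
function collect_symbols(e, acc = Symbol[])
    if e isa Symbol
        push!(acc, e)
    elseif e isa Expr
        foreach(a -> collect_symbols(a, acc), e.args)
    end
    return acc
end

# Replace symbols according to a mapping, leaving everything else untouched.
substitute(e, mapping) =
    e isa Symbol ? get(mapping, e, e) :
    e isa Expr   ? Expr(e.head, (substitute(a, mapping) for a in e.args)...) :
    e

macro wrappedtime(expr)
    syms = unique(collect_symbols(expr))
    argnames = [gensym(s) for s in syms]
    inner = substitute(expr, Dict(zip(syms, argnames)))
    wrapper = gensym(:wrapper)
    quote
        function $wrapper($(argnames...))
            @elapsed $inner          # stand-in for the real measurement loop
        end
        $wrapper($(map(esc, syms)...))
    end
end

With this sketch, @wrappedtime f(g(y), x) lowers to a call of a fresh wrapper function taking f, g, y, and x as arguments, which is the shape described above.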
The question is: is this a good idea? This makes BenchmarkTools
more complicated, and might involve too much magic. I also haven't thought through how to integrate this with the setup
arguments. I'm mostly just interested in seeing if this is something that's worth spending time on.
One particular concern I have is if the user tries to benchmark a big block of code, we may end up with the wrapper function taking a ridiculous number of arguments, which I suspect is likely to be handled badly by Julia. Fortunately, the macro can at least detect that case and demand that the user manually splice in their arguments.
print, show, and showall all return only one line, "Trial(21.917 ns)". Is there a function to print the full summary?
julia> x = @benchmark sin(1)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 21.917 ns (0.00% GC)
median time: 21.932 ns (0.00% GC)
mean time: 22.034 ns (0.00% GC)
maximum time: 37.657 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 997
time tolerance: 5.00%
memory tolerance: 1.00%
julia> print(x)
Trial(21.917 ns)
julia> show(x)
Trial(21.917 ns)
julia> showall(x)
Trial(21.917 ns)
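For what it's worth, the full summary appears to come from the MIME"text/plain" show method, so (I believe) it can be requested explicitly as a workaround:

display(x)                             # full multi-line summary at the REPL
show(STDOUT, MIME"text/plain"(), x)    # STDOUT is spelled stdout on Julia 0.7+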
It would be nice if one could assign directly deep into a BenchmarkGroup using a Vector key. The intermediate groups should be created automatically as necessary. Currently:
julia> using BenchmarkTools
julia> g = BenchmarkGroup()
julia> g[[1, "a", :b]] = "hello"
ERROR: KeyError: key 1 not found
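A helper along these lines (hypothetical, not part of the package) could create the intermediate groups on demand:

using BenchmarkTools

function deep_setindex!(g::BenchmarkGroup, value, ks::Vector)
    for k in ks[1:end-1]
        haskey(g, k) || (g[k] = BenchmarkGroup())   # create missing levels
        g = g[k]
    end
    g[ks[end]] = value
    return value
end

g = BenchmarkGroup()
deep_setindex!(g, "hello", [1, "a", :b])   # g[1]["a"][:b] == "hello"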
Without such a helper, one currently has to do
julia> using BenchmarkTools
julia> g = BenchmarkGroup(1 => BenchmarkGroup("a" => BenchmarkGroup()))
julia> g[[1, "a", :b]] = "hello"
"hello"
Expected:
julia> using BenchmarkTools
julia> g = BenchmarkGroup()
julia> g[[1, "a", :b]] = "hello"
"hello"
While JLD is valid HDF5, I've been told JLD is painfully structured for use in other languages.
I've been bitten by compatibility issues, as well. Maintaining backwards compatibility is hard to pull off smoothly, and JLD changes often conflict with changes in Base, breaking forwards compatibility.
It would be nice to have simple CSV or JSON (de)serialization methods that didn't rely on external dependencies (or at least relied on more stable ones). One benefit is that methods like BenchmarkTools.save/BenchmarkTools.load would be easier to patch than external methods like JLD.load/JLD.save.
cc @quinnj, since we discussed this at JuliaCon. Do you have any specific opinions here?
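For concreteness, a rough sketch of what a dependency-light dump of the headline numbers could look like; this assumes the JSON.jl package and is not existing BenchmarkTools API:

using BenchmarkTools, JSON

function dump_minimums(path::AbstractString, group::BenchmarkGroup)
    # flatten the group and keep only the minimum time (ns) per leaf
    flat = Dict(string(k) => time(minimum(t)) for (k, t) in BenchmarkTools.leaves(group))
    open(io -> JSON.print(io, flat), path, "w")
end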
Benchmarking correctly sometimes involves making sure there is no additional overhead (e.g. due to accidental use of global variables) and that the operation being benchmarked isn't optimized away (most likely due to constant propagation). In addition to documenting and giving examples of the different ways to supply parameters to a benchmark, I think it would be useful to provide a way to show the code actually running in the loop.
This is also most relevant for cheap operations...
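As a partial stopgap, one can at least inspect how the macro lowers a call. This is hedged: the innermost measurement loop is generated later at runtime, so the expansion shows the skeleton and how setup and interpolation are handled rather than the final loop body.

using BenchmarkTools
# Julia 0.6 signature; on 0.7+ pass the module first: macroexpand(@__MODULE__, expr)
println(macroexpand(:(@benchmark sin(x) setup = (x = 1.0))))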
help?> @benchmark
No documentation found.
One of the features I really like from google/benchmark is the ability to benchmark across a range of input values. Since the size of the data can affect performance due to cache behaviour, I often find myself benchmarking with different array sizes.
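One way to approximate this today, as a sketch: key a BenchmarkGroup by problem size and let the suite machinery run the whole range in one go.

using BenchmarkTools

suite = BenchmarkGroup()
for n in (2^10, 2^14, 2^18)
    suite[n] = @benchmarkable sum(x) setup = (x = rand($n))
end
tune!(suite)
results = run(suite)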
Why does this change the binding of a? cc @stevengj
julia> using BenchmarkTools
julia> a = BitArray(rand() < 0.5 for x in 1:10^4, y in 1:10^4);
julia> @btime sum(a,1)
52.620 ms (2 allocations: 78.20 KiB)
1×10000 Array{Int64,2}:
5043 5049 5047 5028 4921 4929 … 5029 5022 5013 5082 5032 5011
julia> a
2
After all, johnmyleswhite/Benchmarks.jl#36 is still present in this package....
julia> b1 = @benchmarkable rand($Float64); tune!(b1); run(b1)
BenchmarkTools.Trial:
memory estimate: 16 bytes
allocs estimate: 1
--------------
minimum time: 60.568 ns (0.00% GC)
median time: 63.546 ns (0.00% GC)
mean time: 71.347 ns (5.25% GC)
maximum time: 25.763 μs (99.74% GC)
--------------
samples: 10000
evals/sample: 982
julia> b2 = @benchmarkable rand(); tune!(b2); run(b2)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 3.432 ns (0.00% GC)
median time: 3.962 ns (0.00% GC)
mean time: 3.852 ns (0.00% GC)
maximum time: 11.629 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 1000
AFAICT this affects BaseBenchmarks
BenchmarkTools is the bomb and everyone uses it – or should! The name is a little awkward though and doesn't have quite the gravitas and officialness that it deserves. What about calling the package Benchmarking.jl, so that one writes the lovely, definitive-sounding using Benchmarking when one uses it?
I have started using this package to test a module I'm making and it works really well!
I was thinking that it would be useful to have a function to generate a report from the results of a suite. What I ended up doing was to go look at the Nanosoldier repo and copy most of the code there (the printreport function). Having such a function in this repository would be useful.
A method for render() has to be defined, as stated by @ChrisRackauckas. At the moment the only way to "see" the results in Atom is to press "Copy" in the console and then paste somewhere else :( (print() also doesn't work.)
julia> using BenchmarkTools
# this should work, but doesn't
julia> @benchmark (a + b + c) setup=(a,b,c=1,2,3)
ERROR: syntax: assignment not allowed inside tuple
in generate_benchmark_definition(::Module, ::Array{Symbol,1}, ::Array{Symbol,1}, ::Expr, ::Expr, ::Void, ::BenchmarkTools.Parameters) at /Users/jarrettrevels/.julia/v0.5/BenchmarkTools/src/execution.jl:282
julia> @benchmark (a + b + c) setup=begin a,b,c=1,2,3 end
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 1.645 ns (0.00% GC)
median time: 1.962 ns (0.00% GC)
mean time: 1.967 ns (0.00% GC)
maximum time: 4.511 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 1000
time tolerance: 5.00%
memory tolerance: 1.00%
showall is gone (JuliaLang/julia#22847). What's the best way to look at results now? Note that repr and print show even less info.
The func generated by @benchmark conflicts with user variables named func.
Example:
julia> func(x) = x
func (generic function with 1 method)
julia> @benchmark func(3)
ERROR: invalid redefinition of constant func
This is on Julia Version 0.5.0-rc2
With Julia 0.6.1 and a fresh install:
julia> Pkg.update()
julia> Pkg.add("BenchmarkTools")
INFO: Cloning cache of BenchmarkTools from https://github.com/JuliaCI/BenchmarkTools.jl.git
INFO: Installing BenchmarkTools v0.2.0
INFO: Package database updated
julia> using BenchmarkTools
INFO: Precompiling module BenchmarkTools.
julia> @btime sin(1)
ERROR: UndefVarError: @__MODULE__ not defined
julia> @benchmark sin(1)
ERROR: UndefVarError: @__MODULE__ not defined
On running @benchmark(functionname()) on an already working function, I get this error:
ERROR: syntax: invalid syntax (escape (call (outerref parallel_add)))
Stacktrace:
[1] generate_benchmark_definition(::Module, ::Array{Symbol,1}, ::Array{Any,1}, ::Expr, ::Void, ::Void, ::BenchmarkTools.Parameters) at /home/memphis/.julia/v0.7/BenchmarkTools/src/execution.jl:289
I am running Julia 0.7. The error does not occur if I run it on an older version of Julia (0.6). Is there any specific reason why this error occurs, and is there any way to fix it?
This is what I get when I run Pkg.status():
1 required packages:
- BenchmarkTools 0.0.8
9 additional packages:
- BinDeps 0.7.0
- Blosc 0.3.0
- Compat 0.30.0
- FileIO 0.5.1
- HDF5 0.8.5
- JLD 0.8.1
- LegacyStrings 0.2.2
- SHA 0.5.1
- URIParser 0.2.0
I hope this isn't a usage question, but I think it would be nice if
julia> judge(memory(oldresults["solver"]), memory(results["solver"]))
ERROR: MethodError: no method matching judge(::Int64, ::Int64)
would work
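A possible workaround in the meantime (assuming the TrialJudgement accessors behave as I expect): judge whole estimates and read the memory verdict off the result instead of judging raw byte counts.

j = judge(minimum(oldresults["solver"]), minimum(results["solver"]))
memory(j)   # memory verdict of the judgement, e.g. :regression / :improvement / :invariant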
BenchmarkTools.jl/benchmark/benchmarks.jl
Lines 33 to 40 in 9e91a17
The following works fine
A = Array(SymTridiagonal(fill(2, 5), ones(5)))
b = @benchmarkable Base.LinAlg.chol!(x) setup=(x = Hermitian(copy($A), :U))
warmup(b)
run(b)
but injecting tune! between warmup and run as in @benchmark, i.e.
A = Array(SymTridiagonal(fill(2, 5), ones(5)))
b = @benchmarkable Base.LinAlg.chol!(x) setup=(x = Hermitian(copy($A), :U))
warmup(b)
tune!(b)
run(b)
causes chol! to throw a PosDefException from within tune!...
julia> tune!(b)
ERROR: Base.LinAlg.PosDefException(4)
in _chol!(::Array{Float64,2}, ::Type{UpperTriangular}) at ./linalg/cholesky.jl:55
in chol!(::Hermitian{Float64,Array{Float64,2}}) at ./linalg/cholesky.jl:124
in ##core#429(::Hermitian{Float64,Array{Float64,2}}) at ./<missing>:0
in ##sample#430(::BenchmarkTools.Parameters) at /Users/sacha/.julia/v0.6/BenchmarkTools/src/execution.jl:248
in #_lineartrial#19(::Int64, ::Array{Any,1}, ::Function, ::BenchmarkTools.Benchmark{Symbol("##benchmark#428")}, ::BenchmarkTools.Parameters) at /Users/sacha/.julia/v0.6/BenchmarkTools/src/execution.jl:51
in _lineartrial(::BenchmarkTools.Benchmark{Symbol("##benchmark#428")}, ::BenchmarkTools.Parameters) at /Users/sacha/.julia/v0.6/BenchmarkTools/src/execution.jl:43
in #lineartrial#20(::Array{Any,1}, ::Function, ::BenchmarkTools.Benchmark{Symbol("##benchmark#428")}, ::BenchmarkTools.Parameters) at /Users/sacha/.julia/v0.6/BenchmarkTools/src/execution.jl:59
in #tune!#22(::Bool, ::String, ::Array{Any,1}, ::Function, ::BenchmarkTools.Benchmark{Symbol("##benchmark#428")}, ::BenchmarkTools.Parameters) at /Users/sacha/.julia/v0.6/BenchmarkTools/src/execution.jl:114
in tune!(::BenchmarkTools.Benchmark{Symbol("##benchmark#428")}) at /Users/sacha/.julia/v0.6/BenchmarkTools/src/execution.jl:114
which indicates that setup isn't occurring between samples in tune!. Thoughts? Thanks!
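In the meantime, if I understand the setup/evals interaction correctly (setup runs once per sample, not once per evaluation), pinning evals to 1 should sidestep the failure, since the mutated x is then never reused:

A = Array(SymTridiagonal(fill(2, 5), ones(5)))
b = @benchmarkable Base.LinAlg.chol!(x) setup=(x = Hermitian(copy($A), :U)) evals=1
warmup(b)
run(b)   # no tune! needed once evals is pinned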
If I understand correctly, the median time & memory estimate are by far the most important numbers coming out of a run. The printing should reflect that. Maybe:
julia> @benchmark sin(1)
BenchmarkTools.Trial:
median time: 13.00 ns (0.00% GC)
memory estimate: 0.00 bytes
minimum time: 13.00 ns (0.00% GC)
mean time: 13.02 ns (0.00% GC)
maximum time: 36.00 ns (0.00% GC)
samples: 10000
evals/sample: 1000
time tolerance: 5.00%
memory tolerance: 1.00%
allocs estimate: 0
I really want a histogram to appear by default; it's OK to label the minimum, maximum, and mean. A picture is worth a megabyte of numbers :-)
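Not a package feature, just a sketch of how to get that picture today, assuming the UnicodePlots package and using the Trial's raw times field (in ns):

using BenchmarkTools, UnicodePlots
t = @benchmark sin(1)
histogram(t.times, nbins = 30)   # unicode histogram of per-sample times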
julia> @btime round(Int, 5);
1.539 ns (0 allocations: 0 bytes)
julia> @btime round($Int, 5);
54.228 ns (0 allocations: 0 bytes)
I believe these should be equivalent.
I run the following: @benchmark heat_juafem_examples.allrun(). I can see that the macro executes the function inside the module heat_juafem_examples four times, but only a single sample is reported:
julia> include("heat_juafem_examples.jl"); @benchmark heat_juafem_examples.allrun()
WARNING: replacing module heat_juafem_examples
# heat_juafem_example
# heat_juafem_example
# heat_juafem_example
# heat_juafem_example
BenchmarkTools.Trial:
memory estimate: 3.65 GiB
allocs estimate: 20007018
--------------
minimum time: 9.877 s (12.50% GC)
median time: 9.877 s (12.50% GC)
mean time: 9.877 s (12.50% GC)
maximum time: 9.877 s (12.50% GC)
--------------
samples: 1
evals/sample: 1
julia>
Am I missing something? Is this the expected behavior?
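If I understand the defaults correctly (a 5-second total time budget, so a roughly 10 s call can only ever yield one sample), raising the budget should produce more samples:

@benchmark heat_juafem_examples.allrun() seconds=60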
It would be great to have a way to automatically test the equality of the outputs at the beginning of the benchmark execution, to avoid regressions in quality during optimization stages.
Installing this on OS X takes about 10 minutes, downloading and compiling things, even installing Python. It also breaks my Gtk.jl installation (it replaces the libraries with other versions).
This seems a bit excessive for such a simple package; why does it need so many binary dependencies? I'm not sure where the issue is coming from; I guess from JLD.
BenchmarkGroup currently implements an indexing scheme where a key vector indicates nested indexing. This can and should be removed in favor of using foldl(getindex, g::BenchmarkGroup, keys) (ref here).
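The proposed replacement in action (a small sketch; on Julia 1.0+ the same call is spelled with an init keyword):

using BenchmarkTools
g = BenchmarkGroup(1 => BenchmarkGroup("a" => BenchmarkGroup(:b => "hello")))
foldl(getindex, g, [1, "a", :b])           # == "hello", i.e. g[1]["a"][:b]
# Julia 1.0+: foldl(getindex, [1, "a", :b]; init = g)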
Could someone please tag a release that includes the changes for 0.6 deprecations?
I've been using PolyBench.jl, which installs BenchmarkTools.jl as a dependency.
I issued Pkg.update() to include the v0.7 deprecation patch, but that didn't update BenchmarkTools. Pkg.update worked only after BenchmarkTools.jl was switched to the master branch from a branch that had the same name as its latest commit.
Other packages, including BinDeps, were not on their master branch yet got updated.
Is there an issue with the way I'm using Pkg.update() ?
In an ideal world, I would love the following to work:
versions = ["v0.0.1", "3743c273f4439833787522242efdcda87951f6d1", "v0.0.2"]
results = Pkg.benchmark("Foo",versions)
@show results # nice printing
@jrevels has pointed out that it is probably not feasible to have Pkg support this functionality in the short term (see discussion here). But it seems like BenchmarkTools could support this natively, perhaps as a prototype to be taken up by Pkg at a later time.
Let's say we call this function benchmark_pkg(); then my proposal is to support the following:
benchmark_pkg("Foo",:major) # runs benchmarks on all major releases
benchmark_pkg("Foo",:minor) # runs benchmarks on all major & minor releases
benchmark_pkg("Foo",:patch) # runs benchmarks on all releases
# run benchmarks for each version/commit stored in a user-provided list
benchmark_pkg("Foo",::Vector{AbstractString})
How can I help take baby steps towards this vision? My thought is along these lines:
- Benchmarks live in perf/runtests.jl. (Maybe perf/runbenchmarks.jl would be a better name?)
- pkg_benchmark() finds parameters using tune! and saves them in perf/benchmark_params.jl.
- For each requested version, use git checkout ... to selectively check out the src/ directory of the repository.
- Afterwards, restore src/ to its original/starting commit.
Is this reasonable? Or more dangerous than I realize? For example, what if the user edits code while benchmarks are running and saves the result...
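A very rough sketch of the core loop such a benchmark_pkg might run; everything here (the perf/runbenchmarks.jl layout, the naive git handling via the command line, Pkg.dir) is an assumption, not existing API:

using BenchmarkTools

function benchmark_pkg(pkg::AbstractString, versions::Vector{<:AbstractString})
    dir = Pkg.dir(pkg)
    original = strip(readstring(`git -C $dir rev-parse HEAD`))   # remember the starting commit
    results = Dict{String,Any}()
    try
        for v in versions
            run(`git -C $dir checkout --quiet $v -- src/`)        # check out src/ at this version
            suite = include(joinpath(dir, "perf", "runbenchmarks.jl"))
            tune!(suite)
            results[v] = run(suite)
        end
    finally
        run(`git -C $dir checkout --quiet $original -- src/`)     # restore src/ no matter what
    end
    return results
end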
Given the large number of tests, there is still a fair amount of noise in the reports. It would seem that one way to reduce the noise would be to follow up on apparent regressions/improvements with a re-running of those tests. I haven't looked into the infrastructure at all to see how easy this would be, but if both builds are still available I'm wondering whether this would be fairly straightforward?
julia> f() =0
f (generic function with 1 method)
julia> @btime f()
ERROR: syntax: invalid syntax (escape (call (outerref f)))
Stacktrace:
[1] generate_benchmark_definition(::Module, ::Array{Symbol,1}, ::Array{Any,1}, ::Expr, ::Void, ::Void, ::BenchmarkTools.Parameters) at C:\Users\Mus\.julia\v0.7\BenchmarkTools\src\execution.jl:289
I'm finding different results using BenchmarkTools as opposed to my own loops and @time, particularly regarding allocating operations.
I'm benchmarking Base.Array vs the types in StaticArrays.jl. Some of these arrays are immutable and don't perform GC allocations, and others are mutable. If you are doing lots of small, e.g., matrix multiplications, then the allocations and GC can dominate the cost, and an immutable array is much faster. I have a non-allocating SArray and a (mutable) allocating MArray.
The typical results I was getting using loops and @time showed that when new copies of Array or MArray were created, a lot of time was spent on allocation and GC (but not for SArray):
=====================================
Benchmarks for 4×4 matrices
=====================================
Matrix multiplication
---------------------
Array -> 6.526125 seconds (31.25 M allocations: 3.492 GB, 6.61% gc time)
SArray -> 0.369290 seconds (5 allocations: 304 bytes)
MArray -> 1.964021 seconds (15.63 M allocations: 2.095 GB, 12.05% gc time)
Matrix multiplication (mutating)
--------------------------------
Array -> 4.540372 seconds (6 allocations: 576 bytes)
MArray -> 0.748238 seconds (6 allocations: 448 bytes)
However, I switched my tests to BenchmarkTools.jl and now the difference between SArray and MArray has disappeared. It appears almost as if the allocation costs have been ameliorated somehow. Perhaps I'm using the package wrong, but I get:
=====================================================================
Matrices of size 4×4 and eltype Float64
=====================================================================
SArray: m3 = m1 * m2 takes 52.00 ns, 144.00 bytes (GC 0.00 ns)
Array: m3 = m1 * m2 takes 196.00 ns, 240.00 bytes (GC 0.00 ns)
MArray: m3 = m1 * m2 takes 55.00 ns, 144.00 bytes (GC 0.00 ns)
Array: A_mul_B!(m3, m1, m2) takes 150.00 ns, 0.00 bytes (GC 0.00 ns)
MArray: A_mul_B!(m3, m1, m2) takes 20.00 ns, 0.00 bytes (GC 0.00 ns)
The two calls I make are @benchmark *($(copy(m)), $(copy(m))) and @benchmark A_mul_B!($(copy(m)), $(copy(m)), $(copy(m))), where m is some random 4x4 matrix I made out of the above types. Is that the right way to use @benchmark?
I am seeing a 10x difference for the same functions with/without an @inline hint.
reproducible with
using Base: significand_mask, Math.significand_bits, Math.exponent_bias, exponent_mask,
exponent_half, leading_zeros, Math.exponent_bits, sign_mask, unsafe_trunc,
@pure, Math.@horner, fpinttype, Math.exponent_max, Math.exponent_raw_max
# log2(10)
const LOG210 = 3.321928094887362347870319429489390175864831393024580612054756395815934776608624
# log10(2)
const LOG102 = 3.010299956639811952137388947244930267681898814621085413104274611271081892744238e-01
# log(10)
const LN10 = 2.302585092994045684017991454684364207601101488628772976033327900967572609677367
# log10(2) into upper and lower bits
LOG102U(::Type{Float64}) = 3.01025390625000000000e-1
LOG102U(::Type{Float32}) = 3.00781250000000000000f-1
LOG102L(::Type{Float64}) = 4.60503898119521373889e-6
LOG102L(::Type{Float32}) = 2.48745663981195213739f-4
# max and min arguments
MAXEXP10(::Type{Float64}) = 3.08254715559916743851e2 # log 2^1023*(2-2^-52)
MAXEXP10(::Type{Float32}) = 38.531839419103626f0 # log 2^127 *(2-2^-23)
# one less than the min exponent since we can squeeze a bit more from the exp10 function
MINEXP10(::Type{Float64}) = -3.23607245338779784854769e2 # log10 2^-1075
MINEXP10(::Type{Float32}) = -45.15449934959718f0 # log10 2^-150
@inline exp10_kernel(x::Float64) =
@horner(x, 1.0,
2.30258509299404590109361379290930926799774169921875,
2.6509490552391992146397114993305876851081848144531,
2.03467859229323178027470930828712880611419677734375,
1.17125514891212478829629617393948137760162353515625,
0.53938292928868392106522833273629657924175262451172,
0.20699584873167015119932443667494226247072219848633,
6.8089348259156870502017966373387025669217109680176e-2,
1.9597690535095281527677713029333972372114658355713e-2,
5.015553121397981796436571499953060992993414402008e-3,
1.15474960721768829356725927226534622604958713054657e-3,
1.55440426715227567738830671828509366605430841445923e-4,
3.8731032432074128681303432086835414338565897196531e-5,
2.3804466459036747669197886523306806338950991630554e-3,
9.3881392238209649520573607528461934634833596646786e-5,
-2.64330486232183387018679354696359951049089431762695e-2)
@inline exp10_kernel(x::Float32) =
@horner(x, 1.0f0,
2.302585124969482421875f0,
2.650949001312255859375f0,
2.0346698760986328125f0,
1.17125606536865234375f0,
0.5400512218475341796875f0,
0.20749187469482421875f0,
5.2789829671382904052734375f-2)
@eval exp10_small_thres(::Type{Float64}) = $(2.0^-29)
@eval exp10_small_thres(::Type{Float32}) = $(2.0f0^-13)
function myexp10(x::T) where T<:Union{Float32,Float64}
xa = reinterpret(Unsigned, x) & ~sign_mask(T)
xsb = signbit(x)
# filter out non-finite arguments
if xa > reinterpret(Unsigned, MAXEXP10(T))
if xa >= exponent_mask(T)
xa & significand_mask(T) != 0 && return T(NaN)
return xsb ? T(0.0) : T(Inf) # exp10(+-Inf)
end
x > MAXEXP10(T) && return T(Inf)
x < MINEXP10(T) && return T(0.0)
end
# argument reduction
if xa > reinterpret(Unsigned, T(0.5)*T(LOG102))
if xa < reinterpret(Unsigned, T(1.5)*T(LOG102))
if xsb
k = -1
r = muladd(T(-1.0), -LOG102U(T), x)
r = muladd(T(-1.0), -LOG102L(T), r)
else
k = 1
r = muladd(T(1.0), -LOG102U(T), x)
r = muladd(T(1.0), -LOG102L(T), r)
end
else
n = round(T(LOG210)*x)
k = unsafe_trunc(Int,n)
r = muladd(n, -LOG102U(T), x)
r = muladd(n, -LOG102L(T), r)
end
elseif xa < reinterpret(Unsigned, exp10_small_thres(T))
# Taylor approximation for small x ≈ 1.0 + log(10)*x
return muladd(x, T(LN10), T(1.0))
else # here k = 0
return exp10_kernel(x)
end
# compute approximation
y = exp10_kernel(r)
if k > -significand_bits(T)
# multiply by 2.0 first to prevent overflow, extending the range
k == exponent_max(T) && return y * T(2.0) * T(2.0)^(exponent_max(T) - 1)
twopk = reinterpret(T, rem(exponent_bias(T) + k, fpinttype(T)) << significand_bits(T))
return y*twopk
else
# add significand_bits(T) + 1 to lift the range outside the subnormals
twopk = reinterpret(T, rem(exponent_bias(T) + significand_bits(T) + 1 + k, fpinttype(T)) << significand_bits(T))
return y * twopk * T(2.0)^(-significand_bits(T) - 1)
end
end
@inline function myexp10_inline(x::T) where T<:Union{Float32,Float64}
xa = reinterpret(Unsigned, x) & ~sign_mask(T)
xsb = signbit(x)
# filter out non-finite arguments
if xa > reinterpret(Unsigned, MAXEXP10(T))
if xa >= exponent_mask(T)
xa & significand_mask(T) != 0 && return T(NaN)
return xsb ? T(0.0) : T(Inf) # exp10(+-Inf)
end
x > MAXEXP10(T) && return T(Inf)
x < MINEXP10(T) && return T(0.0)
end
# argument reduction
if xa > reinterpret(Unsigned, T(0.5)*T(LOG102))
if xa < reinterpret(Unsigned, T(1.5)*T(LOG102))
if xsb
k = -1
r = muladd(T(-1.0), -LOG102U(T), x)
r = muladd(T(-1.0), -LOG102L(T), r)
else
k = 1
r = muladd(T(1.0), -LOG102U(T), x)
r = muladd(T(1.0), -LOG102L(T), r)
end
else
n = round(T(LOG210)*x)
k = unsafe_trunc(Int,n)
r = muladd(n, -LOG102U(T), x)
r = muladd(n, -LOG102L(T), r)
end
elseif xa < reinterpret(Unsigned, exp10_small_thres(T))
# Taylor approximation for small x ≈ 1.0 + log(10)*x
return muladd(x, T(LN10), T(1.0))
else # here k = 0
return exp10_kernel(x)
end
# compute approximation
y = exp10_kernel(r)
if k > -significand_bits(T)
# multiply by 2.0 first to prevent overflow, extending the range
k == exponent_max(T) && return y * T(2.0) * T(2.0)^(exponent_max(T) - 1)
twopk = reinterpret(T, rem(exponent_bias(T) + k, fpinttype(T)) << significand_bits(T))
return y*twopk
else
# add significand_bits(T) + 1 to lift the range outside the subnormals
twopk = reinterpret(T, rem(exponent_bias(T) + significand_bits(T) + 1 + k, fpinttype(T)) << significand_bits(T))
return y * twopk * T(2.0)^(-significand_bits(T) - 1)
end
end
julia> using BenchmarkTools
julia> @benchmark myexp10(1.3)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 13.026 ns (0.00% GC)
median time: 13.422 ns (0.00% GC)
mean time: 14.516 ns (0.00% GC)
maximum time: 125.138 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 1000
julia> @benchmark myexp10_inline(1.3)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 1.579 ns (0.00% GC)
median time: 1.974 ns (0.00% GC)
mean time: 2.081 ns (0.00% GC)
maximum time: 20.527 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 1000
linuxtips.md mentions CPU governors and suggests performance. However, after the introduction of pstate in Sandy Bridge, the default processor frequency scaling became very responsive. In my own benchmarks, I frequently get much (10–30%) better performance from just working with the default governor than with performance. Just not bothering to set CPU scaling may be the most realistic scenario for most recent computers (Sandy Bridge appeared in 2011).
On Julia 0.7, @belapsed expression without keyword arguments doesn't work:
julia> @belapsed sin(1)
ERROR: ArgumentError: tuple must be non-empty
julia> @belapsed sin(1) evals = 3
1.2999999999999999e-11
No problem with Julia 0.5.
Note that the first one isn't tested in test/ExecutionTests.jl.
BTW, isn't 1.3e-11 a bit too low as an elapsed time? On Julia 0.5, on the same machine (a laptop equipped with an Intel i7 CPU), the time is of the order of 1.2e-8, which looks more reasonable.
Makes sense, since run has this optional keyword argument.
A lot of times, I just want the minimum time, especially if I'm collecting or comparing a lot of timing measurements. It would be nice to have an analogue of @elapsed that just returns the minimum time, but with all of the @benchmark niceties.
I've been using:
"""
Like `@benchmark`, but returns only the minimum time in ns.
"""
macro benchtime(args...)
    b = Expr(:macrocall, Symbol("@benchmark"), map(esc, args)...)
    :(time(minimum($b)))
end
Could something like this be included?
It's unlikely I'll get around to doing this in the foreseeable future, but I'm tired of digging through issues to find this comment when I want to link it in other discussions. Recreated from my comment here:
Robust hypothesis testing is quite tricky to do correctly in the realm of non-i.i.d. statistics, which is the world benchmark timings generally live in. If you do the "usual calculations", you'll end up getting junk results a lot of the time.
A while ago, I developed a working prototype of a subsampling method for calculating p-values (which could be modified to compute confidence intervals), but it relies on getting the correct normalization coefficient for the test statistic + timing distribution (unique to each benchmark). IIRC, it worked decently on my test benchmark data, but only if I manually tuned the normalization coefficient for any given benchmark. There are methods out there for automatically estimating this coefficient, but I never got around to implementing them. For a reference, see Politis and Romano's book "Subsampling" (specifically section 8: "Subsampling with Unknown Convergence Rate").
This would compute the ratio of all the relevant metrics:
Would it make sense to try to put some kind of confidence interval on the time based on all of the samples?
Using something like https://github.com/bicycle1885/CodecZlib.jl the output files can be compressed to ~5% of the current size. It is a pretty lightweight dependency, so perhaps worth thinking about.
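A quick illustration of the idea, assuming CodecZlib.jl and an existing results.json produced by BenchmarkTools.save:

using CodecZlib
raw = read("results.json")                        # raw JSON bytes
open("results.json.gz", "w") do io
    write(io, transcode(GzipCompressor, raw))     # gzip-compress in one shot
end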
I do not know if this is a bug or a feature (!), but if I use @btime or @benchmark to benchmark a parametric function from inside another function, then it gives an ugly error that the function parameters are not defined.
Here is a simplified example that shows everything is fine if @btime is used at the top level:
julia> using BenchmarkTools
julia> f(x) = x^2
f (generic function with 1 method)
julia> x = 100
100
julia> @btime f(x)
187.359 ns (1 allocation: 16 bytes)
10000
However, if we use it from inside another function, e.g. test(), then the error shows up:
julia> function test()
y = 1000
@btime f(y)
end
test (generic function with 1 method)
julia> test()
ERROR: UndefVarError: y not defined
Stacktrace:
[1] ##core#683() at /Users/dashti/.julia/v0.6/BenchmarkTools/src/execution.jl:316
[2] ##sample#684(::BenchmarkTools.Parameters) at /Users/dashti/.julia/v0.6/BenchmarkTools/src/execution.jl:322
[3] #_run#3(::Bool, ::String, ::Array{Any,1}, ::Function, ::BenchmarkTools.Benchmark{Symbol("##benchmark#682")}, ::BenchmarkTools.Parameters) at /Users/dashti/.julia/v0.6/BenchmarkTools/src/execution.jl:350
[4] (::BenchmarkTools.#kw##_run)(::Array{Any,1}, ::BenchmarkTools.#_run, ::BenchmarkTools.Benchmark{Symbol("##benchmark#682")}, ::BenchmarkTools.Parameters) at ./<missing>:0
[5] anonymous at ./<missing>:?
[6] #run_result#19(::Array{Any,1}, ::Function, ::BenchmarkTools.Benchmark{Symbol("##benchmark#682")}, ::BenchmarkTools.Parameters) at /Users/dashti/.julia/v0.6/BenchmarkTools/src/execution.jl:44
[7] (::BenchmarkTools.#kw##run_result)(::Array{Any,1}, ::BenchmarkTools.#run_result, ::BenchmarkTools.Benchmark{Symbol("##benchmark#682")}, ::BenchmarkTools.Parameters) at ./<missing>:0
[8] #run#21(::Array{Any,1}, ::Function, ::BenchmarkTools.Benchmark{Symbol("##benchmark#682")}, ::BenchmarkTools.Parameters) at /Users/dashti/.julia/v0.6/BenchmarkTools/src/execution.jl:67
[9] (::Base.#kw##run)(::Array{Any,1}, ::Base.#run, ::BenchmarkTools.Benchmark{Symbol("##benchmark#682")}, ::BenchmarkTools.Parameters) at ./<missing>:0 (repeats 2 times)
[10] macro expansion at /Users/dashti/.julia/v0.6/BenchmarkTools/src/execution.jl:410 [inlined]
[11] test() at ./REPL[6]:3
julia> function test2()
y = 1000
@benchmark f(y)
end
test2 (generic function with 1 method)
julia> test2()
ERROR: UndefVarError: y not defined
Stacktrace:
[1] ##core#687() at /Users/dashti/.julia/v0.6/BenchmarkTools/src/execution.jl:316
[2] ##sample#688(::BenchmarkTools.Parameters) at /Users/dashti/.julia/v0.6/BenchmarkTools/src/execution.jl:322
[3] #_run#4(::Bool, ::String, ::Array{Any,1}, ::Function, ::BenchmarkTools.Benchmark{Symbol("##benchmark#686")}, ::BenchmarkTools.Parameters) at /Users/dashti/.julia/v0.6/BenchmarkTools/src/execution.jl:350
[4] (::BenchmarkTools.#kw##_run)(::Array{Any,1}, ::BenchmarkTools.#_run, ::BenchmarkTools.Benchmark{Symbol("##benchmark#686")}, ::BenchmarkTools.Parameters) at ./<missing>:0
[5] anonymous at ./<missing>:?
[6] #run_result#19(::Array{Any,1}, ::Function, ::BenchmarkTools.Benchmark{Symbol("##benchmark#686")}, ::BenchmarkTools.Parameters) at /Users/dashti/.julia/v0.6/BenchmarkTools/src/execution.jl:44
[7] (::BenchmarkTools.#kw##run_result)(::Array{Any,1}, ::BenchmarkTools.#run_result, ::BenchmarkTools.Benchmark{Symbol("##benchmark#686")}, ::BenchmarkTools.Parameters) at ./<missing>:0
[8] #run#21(::Array{Any,1}, ::Function, ::BenchmarkTools.Benchmark{Symbol("##benchmark#686")}, ::BenchmarkTools.Parameters) at /Users/dashti/.julia/v0.6/BenchmarkTools/src/execution.jl:67
[9] (::Base.#kw##run)(::Array{Any,1}, ::Base.#run, ::BenchmarkTools.Benchmark{Symbol("##benchmark#686")}, ::BenchmarkTools.Parameters) at ./<missing>:0 (repeats 2 times)
[10] macro expansion at /Users/dashti/.julia/v0.6/BenchmarkTools/src/execution.jl:234 [inlined]
[11] test2() at ./REPL[8]:3
Here is my versioninfo() and Pkg.status("BenchmarkTools"):
julia> versioninfo()
Julia Version 0.6.2
Commit d386e40c17 (2017-12-13 18:08 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin14.5.0)
CPU: Intel(R) Core(TM) i7-3635QM CPU @ 2.40GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
LAPACK: libopenblas64_
LIBM: libopenlibm
LLVM: libLLVM-3.9.1 (ORCJIT, ivybridge)
julia> Pkg.status("BenchmarkTools")
- BenchmarkTools 0.2.5
It would be fantastic if there were a way to get a profile trace of an @benchmarkable object, exploiting some of the nice features (i.e. running it enough times to get a good sample). Any ideas on ways to do this?
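One approach that seems workable (hedged): since run(b) executes the benchmarkable many times, the standard profiler can simply be pointed at it.

using BenchmarkTools, Profile   # on Julia 0.6 the Profile module lives in Base
b = @benchmarkable sum(x) setup = (x = rand(10^4))
tune!(b)
Profile.clear()
@profile run(b)                 # collect samples across all benchmark evaluations
Profile.print()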
It would be nice if you added an option to remove the end-of-line in the @btime output, so that it can be embedded in the middle of a line (or a stream argument, so that one can postprocess it?).
Add a simple stopwatch; see below for a basic suggested implementation:
mutable struct StopWatch
    t1::Float64
    t2::Float64
    StopWatch() = new(NaN, NaN)
end

function start!(sw::StopWatch)::Nothing
    sw.t1 = time()
    nothing
end

function stop!(sw::StopWatch)::Nothing
    sw.t2 = time()
    nothing
end

function reset!(sw::StopWatch)::Nothing
    sw.t2 = sw.t1 = NaN
    nothing
end

# Elapsed time: running watches measure up to now, stopped watches t2 - t1.
peek(sw::StopWatch) =
    isnan(sw.t1) ? error("StopWatch ($sw) has not been started.") :
    isnan(sw.t2) ? time() - sw.t1 : sw.t2 - sw.t1
# duration(sw::StopWatch) = sw.t2 - sw.t1

# Demo
cl = StopWatch()                          # peek(cl) would error here
start!(cl); sleep(2); peek(cl)
sleep(1); stop!(cl); peek(cl); reset!(cl)
Because sometimes you just want a stop watch.
I don't know if anyone else feels the same, but it is a little annoying to me that when I display the results of some benchmarks, they are not shown in insertion order.
As an example, here it is visually difficult for me to compare the performance of different graph types in Erdos because of the lack of ordering:
julia> @show res["generators"];
res["generators"] = 16-element BenchmarkTools.BenchmarkGroup:
tags: []
("rrg","Net(500, 750) with [] graph, [] vertex, [] edge properties.") => Trial(632.408 μs)
("rrg","Graph{Int64}(100, 150)") => Trial(121.221 μs)
("rrg","Net(100, 150) with [] graph, [] vertex, [] edge properties.") => Trial(115.033 μs)
("rrg","Graph{Int64}(500, 750)") => Trial(677.647 μs)
("complete","Net(100, 4950) with [] graph, [] vertex, [] edge properties.") => Trial(896.223 μs)
("complete","DiGraph{Int64}(100, 9900)") => Trial(617.122 μs)
("complete","DiNet(20, 380) with [] graph, [] vertex, [] edge properties.") => Trial(42.104 μs)
("erdos","Graph{Int64}(500, 1500)") => Trial(405.240 μs)
("erdos","Net(100, 300) with [] graph, [] vertex, [] edge properties.") => Trial(71.516 μs)
("complete","DiGraph{Int64}(20, 380)") => Trial(23.721 μs)
("complete","Net(20, 190) with [] graph, [] vertex, [] edge properties.") => Trial(20.845 μs)
("complete","Graph{Int64}(100, 4950)") => Trial(159.900 μs)
("complete","DiNet(100, 9900) with [] graph, [] vertex, [] edge properties.") => Trial(1.861 ms)
("erdos","Net(500, 1500) with [] graph, [] vertex, [] edge properties.") => Trial(297.167 μs)
("complete","Graph{Int64}(20, 190)") => Trial(7.340 μs)
("erdos","Graph{Int64}(100, 300)") => Trial(88.091 μs)
Would it be reasonable, and not too disruptive, to use OrderedDicts instead of Dict in the BenchmarkGroup type?
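In the meantime, a stopgap that needs no package changes: flatten the group with leaves and sort by key before printing.

for (k, v) in sort(collect(BenchmarkTools.leaves(res["generators"])); by = x -> string(first(x)))
    println(k, " => ", v)   # k is the key path, v the Trial
end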
Yes, I could write down some more appropriate comparison methods, but asking doesn't hurt :)
Cheers,
Carlo