
swift-benchmark's Introduction

swift-benchmark

A Swift library for benchmarking code snippets, similar to google/benchmark.

Example:

import Benchmark

benchmark("add string reserved capacity") {
    var x: String = ""
    x.reserveCapacity(2000)
    for _ in 1...1000 {
        x += "hi"
    }
}

Benchmark.main()

At runtime, you can filter which benchmarks to run by using the --filter command line flag. For more details on what options are available, pass either the -h or --help command line flags.

Example:

$ swift run -c release BenchmarkMinimalExample --help
USAGE: benchmark-command [--allow-debug-build] [--filter <filter>] [--filter-not <filter-not>] [--iterations <iterations>] [--warmup-iterations <warmup-iterations>] [--min-time <min-time>] [--max-iterations <max-iterations>] [--time-unit <time-unit>] [--inverse-time-unit <inverse-time-unit>] [--columns <columns>] [--format <format>] [--quiet]

OPTIONS:
  --allow-debug-build     Overrides check to verify optimized build.
  --filter <filter>       Run only benchmarks whose names match the regular expression.
  --filter-not <filter-not>
                          Exclude benchmarks whose names match the regular expression.
  --iterations <iterations>
                          Number of iterations to run.
  --warmup-iterations <warmup-iterations>
                          Number of warm-up iterations to run.
  --min-time <min-time>   Minimal time to run when automatically detecting number iterations.
  --max-iterations <max-iterations>
                          Maximum number of iterations to run when automatically detecting number iterations.
  --time-unit <time-unit> Time unit used to report the timing results.
  --inverse-time-unit <inverse-time-unit>
                          Inverse time unit used to report throughput results.
  --columns <columns>     Comma-separated list of column names to show.
  --format <format>       Output format (valid values are: json, csv, console, none).
  --quiet                 Only print final benchmark results.
  -h, --help              Show help information.

$ swift run -c release BenchmarkMinimalExample
running add string no capacity... done! (1832.52 ms)
running add string reserved capacity... done! (1813.96 ms)

name                         time     std        iterations
-----------------------------------------------------------
add string no capacity       37435 ns ±   6.22 %      37196
add string reserved capacity 37022 ns ±   1.75 %      37749

For more examples, see Sources/BenchmarkMinimalExample and Sources/BenchmarkSuiteExample.

Usage

Add this library as a SwiftPM dependency:

let package = Package(
    name: ... ,
    products: [
        .executable(name: "Benchmarks", targets: ["Benchmarks"])
    ],
    dependencies: [
      .package(url: "https://github.com/google/swift-benchmark", from: "0.1.0")
    ],
    targets: [
        .target(
            name: "Benchmarks",
            dependencies: [.product(name: "Benchmark", package: "swift-benchmark")]
        )
    ]
)

Roadmap

The project is in an early stage and offers only a basic set of benchmarking utilities. Feel free to file issues and feature requests to help us prioritize what to do next.

License

Please see LICENSE for details.

Contributing

Please see CONTRIBUTING.md for details.

swift-benchmark's People

Contributors

bradlarson, compnerd, dabrahams, dan-zheng, mattt, maxdesiatov, nishikoh, prabal4546, regexident, saeta, shabalind, texasmichelle, uhooi


swift-benchmark's Issues

Add integration with XCTest

As suggested by @dabrahams, it would be good to have an integration with XCTest to run benchmarks as tests.

This would imply that we can create an XCTest test suite out of a benchmark suite and run all benchmarks. Each benchmark would map to a single test case in which the benchmark body is run for a single iteration.
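
A minimal sketch of what this could look like on the user side, without assuming anything about swift-benchmark's internals: the benchmark body is factored into a plain function that both the benchmark registration and a one-iteration XCTest smoke test can call. The helper name here is illustrative, not a library API.

import XCTest

// Sketch: factor the benchmark body into a free function so it can be reused.
func addStringReservedCapacity() {
    var x = ""
    x.reserveCapacity(2000)
    for _ in 1...1000 {
        x += "hi"
    }
}

final class BenchmarkSmokeTests: XCTestCase {
    func testAddStringReservedCapacityRunsOnce() {
        // A single iteration: a correctness/crash smoke test, not a timing run.
        addStringReservedCapacity()
    }
}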

A report that compares runs

It would be useful to get a report of the relative difference between two runs.

Some existing tools:

  • google/benchmark's compare.py. (I tried running this on the json from swift-benchmark but it didn't print anything. I guess the format is a bit different.)
  • The Swift compiler's benchmark report. Example.

release tags that we can depend on from other projects' Package.swift

I'm running into a specific problem that would be fixed by release tags: tensorflow/swift-models#640 causes the error "swift-benchmark is required using two different revision-based requirements, which is not supported" if we try to depend on both swift-models (which depends on a specific commit hash of swift-benchmark) and penguin (which depends on master of swift-benchmark) at the same time, as we do in SwiftFusion.

Printing ratios by default?

Because human brains are what they are, I think the relative performance of benchmarked items may be more important than the absolute time in a lot of situations.

"X is 3x faster than Y" has instant interpretability for a human brain, whereas "X took 1449 nanoseconds and Y took 4,347 nanoseconds" puts the meaningful interpretation one step removed.

Would a PR to print the relative ratios of benchmarked items by default be well received?
I don't mean getting rid of any of the metrics currently printed, just adding ratios.

Automatically detect a reasonable number of iterations

Currently the framework doesn't make any attempt to understand how long a single benchmark iteration takes. The number of iterations is hardcoded to a static constant, which is the same for all benchmarks. Given that some benchmarks can take much longer than others, this causes an uneven amount of time to be spent per benchmark.
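
A rough sketch of one possible heuristic (this is not how the library currently works): time a handful of probe iterations, then scale the iteration count so the benchmark runs for roughly a requested minimum time, capped at a maximum. Names and defaults are illustrative.

import Dispatch

// Sketch only: estimate an iteration count from a few probe runs.
// `minTime` is in seconds.
func estimateIterations(minTime: Double = 1.0,
                        maxIterations: Int = 1_000_000,
                        body: () -> Void) -> Int {
    let probes = 10
    let start = DispatchTime.now().uptimeNanoseconds
    for _ in 0..<probes { body() }
    let elapsed = Double(DispatchTime.now().uptimeNanoseconds - start) / 1e9
    let perIteration = max(elapsed / Double(probes), 1e-9)
    return min(maxIterations, max(1, Int(minTime / perIteration)))
}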

Add support for custom reporters

Internally we already have an API to report benchmark results in various formats.

We should polish and expose it as a public API. Moreover, we should think about how to make it work with destinations that support real-time dashboards during a benchmark run, such as tensorboard.dev.
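
As a hypothetical sketch of what a public surface could look like (the internal API may well differ), a reporter could be a small protocol the runner calls as results become available, so a dashboard backend only needs its own conforming type. All names below are illustrative.

// Hypothetical protocol sketch; not the library's internal API.
protocol BenchmarkResultReporter {
    mutating func benchmarkStarted(name: String, suite: String)
    mutating func benchmarkFinished(name: String, suite: String,
                                    medianNanoseconds: Double, iterations: Int)
}

// Example conformance that simply prints results as they arrive.
struct ConsoleStreamingReporter: BenchmarkResultReporter {
    mutating func benchmarkStarted(name: String, suite: String) {
        print("running \(suite)/\(name)...")
    }
    mutating func benchmarkFinished(name: String, suite: String,
                                    medianNanoseconds: Double, iterations: Int) {
        print("\(suite)/\(name): \(medianNanoseconds) ns over \(iterations) iterations")
    }
}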

Add CI for CMake builds

In #22, we added CMake builds which are helpful for Windows development. We need to add Continuous Integration for those, otherwise there is a risk of them getting out of date.

Add a blackHole

The library should provide a way to ensure that code under test isn't optimized away. The only reliable way I know of to do this that doesn't cost more than a function call is:

func notOptimizedAway<T>(_ x: T) {
  // `opaqueCFunctionTakingVoidConstStar` is an empty C function taking
  // `const void *`, compiled in a separate translation unit so the
  // optimizer cannot see through the call.
  withUnsafePointer(to: x) { opaqueCFunctionTakingVoidConstStar($0) }
}

(storing values into Any can be costly if they need to be boxed, and anyway if the compiler can prove the Any is never read, it may one day optimize it out).

Add support for custom metrics

Some benchmarks require tracking domain-specific settings such as:

  • Number of items processed
  • Amount of data processed and/or processing throughput in mb/s
  • Amount of examples in machine learning models
  • Domain-specific custom performance metrics (e.g., number of JIT recompilations in tracing ML compilers)

To expose this we can provide an extra overload where benchmark obtains a handle to BenchmarkState:

benchmark("my benchmark") { state in 
   let result = doSomethingExpensive()
   state.metric(name: "examples", value: result.examples.count)
}

The framework should be able to report both the raw metric and the metric divided by the running time (i.e., examples -> examples/s, items -> items/s).

Right-justify time output in the console

Currently, it's sometimes a little difficult to compare the benchmark output. It'd be better to right-justify to make it easy to compare across different performance scales. (See sample output below.)

name                                                                   time            std                   iterations  
-----------------------------------------------------------------------------------------------------------------------
NonBlockingCondition: notify one, no waiters                           49.0 ns         ± 218.03171899854937  1000000     
NonBlockingCondition: notify all, no waiters                           48.0 ns         ± 259.1237923013594   1000000     
NonBlockingCondition: preWait, cancelWait                              62.0 ns         ± 318.1394131555278   1000000     
NonBlockingCondition: preWait, notify, cancelWait                      95.0 ns         ± 383.7078986828835   1000000     
NonBlockingCondition: preWait, notify, commitWait                      180.0 ns        ± 530.3777998534739   745911      
NonBlockingThreadPool: join, one level                                 688.0 ns        ± 181484.56500114241  7751        
NonBlockingThreadPool: join, two levels                                2328.5 ns       ± 362158.9014454464   1294        
NonBlockingThreadPool: join, three levels                              4993.0 ns       ± 380630.33014058863  941         
NonBlockingThreadPool: join, four levels, three on thread pool thread  5930.5 ns       ± 397880.22706779756  914         
NonBlockingThreadPool: parallel for, one level                         4987256.5 ns    ± 958287.8800861727   28          
NonBlockingThreadPool: parallel for, two levels                        4492208.0 ns    ± 666873.9680715388   32          
AdjacencyList: build a fully-connected graph of size 10                45450.0 ns      ± 11900.774671466019  2643        
AdjacencyList: build a fully-connected graph of size 100               4368357.0 ns    ± 332722.03091855097  33          
AdjacencyList: build a fully-connected graph of size 1000              432393589.5 ns  ± 5835349.92137541    2  

Better document options

What are the defaults?

  • Explain how the number of iterations and the number of warm-up iterations are chosen.
  • Explain how max-iterations is chosen by default.
  • Explain what the legal column names are for --columns.
  • Explain what unit --min-time is specified in.

Attach summary of the hardware configuration to the json output

Currently we only include benchmark results in the JSON output. We should add a summary of the detected hardware alongside them: in particular, a summary of the CPU capabilities and of any attached accelerators (e.g., GPU, TPU). This is useful for making sense of historical stored results, and for making sure that benchmark comparisons are apples-to-apples.
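
As a sketch of what can be gathered portably from Foundation (accelerator detection would need platform-specific code, and the field names here are assumptions rather than an existing schema):

import Foundation

// Sketch: a Codable host summary that could sit alongside results in the
// JSON output. Field names are illustrative.
struct HardwareSummary: Codable {
    let hostName: String
    let operatingSystem: String
    let activeProcessorCount: Int
    let physicalMemoryBytes: UInt64
}

func currentHardwareSummary() -> HardwareSummary {
    let info = ProcessInfo.processInfo
    return HardwareSummary(
        hostName: info.hostName,
        operatingSystem: info.operatingSystemVersionString,
        activeProcessorCount: info.activeProcessorCount,
        physicalMemoryBytes: info.physicalMemory)
}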

2 warmup iterations should be a default, or something

Swift builds up lots of metadata and witness table caches at runtime, and when timing things that go quickly, that can be really significant. At least if the iterations are fast enough, you should throw away the first one to get those caches initialized, and maybe throw out another, to get the fast paths into the i-cache. I found adding warmup iterations could drastically change the relative measurement of some benchmarks.

Improve time formatting in console output

Currently, time is always in denominations of nanoseconds. This is fantastic for fast micro-benchmarks, but somewhat suboptimal for larger benchmarks (see example below). It'd be great to automatically pick a reasonable time unit for each benchmark (while simultaneously supporting a flag to use nanoseconds across all, which can make comparisons across benchmarks easier).

Related: #25
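
A minimal sketch of picking a unit from the magnitude (illustrative only; the existing --time-unit flag could still force nanoseconds everywhere to keep cross-benchmark comparisons easy):

import Foundation

// Sketch: choose a human-friendly unit per value. Formatting details are
// illustrative, not the library's implementation.
func formatDuration(nanoseconds ns: Double) -> String {
    switch ns {
    case ..<1_000:         return String(format: "%.2f ns", ns)
    case ..<1_000_000:     return String(format: "%.2f us", ns / 1_000)
    case ..<1_000_000_000: return String(format: "%.2f ms", ns / 1_000_000)
    default:               return String(format: "%.2f s", ns / 1_000_000_000)
    }
}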

Add facility for hiding values from the compiler

/// Returns `x`, preventing its value from being statically known by the optimizer.
@inline(__always)
fileprivate func hideValue<T>(_ x: T) -> T {
  
  @_optimize(none) 
  func assumePointeeIsWritten<T>(_ x: UnsafeMutablePointer<T>) {}
  
  var copy = x
  withUnsafeMutablePointer(to: &copy) { assumePointeeIsWritten($0) }
  return copy
}

Could a new version be cut with bumped dependency on swift-argument-parser?

Right now one can't add the package as a dependency even though this is fixed at HEAD; it would be nice to just cut a minor release:

Dependencies could not be resolved because root depends on 'swift-argument-parser' 1.0.0..<2.0.0 and root depends on 'swift-benchmark' 0.1.1..<1.0.0.
'swift-benchmark' >= 0.1.1 practically depends on 'swift-argument-parser' 0.5.0..<1.0.0 because no versions of 'swift-benchmark' match the requirement 0.1.2..<1.0.0 and 'swift-benchmark' 0.1.1 depends on 'swift-argument-parser' 0.5.0..<1.0.0.

Make settings customizable via command-line arguments

#10 introduces the concept of BenchmarkSetting that has three scopes: defaults, suite-level, benchmark-level.

It would be convenient to add an additional level for settings that are set via the command-line interface. Semantically, this level would sit between the defaults and per-suite settings (i.e., CLI-provided settings should override the defaults, but not explicitly provided settings).
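
A small sketch of the intended precedence, with illustrative names (the real BenchmarkSetting machinery from #10 may differ): a setting is resolved by taking the value from the highest-priority scope that supplies one.

// Sketch: scope priority, lowest to highest. Command-line values override
// the defaults but not suite- or benchmark-level settings.
enum SettingScope: Int, Comparable {
    case defaults = 0, commandLine, suite, benchmark
    static func < (lhs: SettingScope, rhs: SettingScope) -> Bool {
        lhs.rawValue < rhs.rawValue
    }
}

// Resolve a setting by picking the value from the highest-priority scope.
func resolve<Value>(_ candidates: [(SettingScope, Value)]) -> Value? {
    candidates.max(by: { $0.0 < $1.0 })?.1
}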

Add support for machine-readable output

Currently we only support human-readable tabular output. That's great for iterative local development, but it's hard to consume from other scripts (e.g., for continuous performance tracking).

To fix this, we should add a JSON-based output format that presents the same information in a structured form.
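
As a hypothetical sketch of the shape such output could take (field names here are assumptions, not a documented schema):

import Foundation

// Sketch: one Codable record per benchmark, encoded with JSONEncoder.
struct BenchmarkRecord: Codable {
    let name: String
    let medianTimeNanoseconds: Double
    let standardDeviation: Double
    let iterations: Int
}

func encodeResults(_ results: [BenchmarkRecord]) throws -> String {
    let encoder = JSONEncoder()
    encoder.outputFormatting = [.prettyPrinted, .sortedKeys]
    let data = try encoder.encode(results)
    return String(data: data, encoding: .utf8) ?? "[]"
}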

Suggestion: consistent time formatting?

swift-benchmark currently strips .0 when printing the time column:

if string.hasSuffix(".0") {
    return String(string.dropLast(2))
}

This means when several benchmarks are printed in a run, they may be printed inconsistently over the column:

name                                        time        std        iterations
-----------------------------------------------------------------------------
String.removeFirst()                        81679174 ns ±   1.64 %         16
String.removeFirst(1)                       81369569 ns ±   2.42 %         17
Substring.removeFirst()                      132.500 ns ± 274.18 %    1000000
Substring.removeFirst(1)                    93850715 ns ±   2.25 %         15

In this example Substring.removeFirst() has a trailing .500 that makes it difficult to see just how much faster it is at a glance. (I'm not sure whether these occasional sub-nanosecond measurements point to a bug elsewhere.) Any chance this formatting could be made more consistent for legibility?

What's the status of this project?

Given the recent news about S4TF being archived, I wonder if anybody will continue working on this.

Would it perhaps be a good idea to seek out a new home for it?

run setup code before the benchmark

I would like a way to set up some data before the timed portion of the benchmark runs.

Concrete use case: In borglab/SwiftFusion#61, I add a benchmark that requires a dataset from the internet. I put the code that downloads the dataset outside of the benchmark, but this means that the dataset download happens even when the benchmark that requires it is filtered out.
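
Until there is first-class support for this, one workaround sketch is to hang the expensive setup off a lazily initialized static property, so it only runs if a benchmark that touches it is actually selected. The synthetic dataset below stands in for the real download; note that the first warm-up or timed iteration still pays the setup cost.

import Benchmark

enum Fixtures {
    // Static stored properties are initialized lazily on first access, so
    // this setup is skipped entirely when the benchmark below is filtered out.
    static let dataset: [Int] = (0..<1_000_000).map { $0 &* 7919 }
}

benchmark("sum dataset") {
    var total = 0
    for x in Fixtures.dataset {
        total &+= x
    }
    precondition(total != 1)  // keep the loop from being optimized away
}

Benchmark.main()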

Add support for continuous user-defined metrics

Currently only time and warmup time can be recorded as a time series, while counters provide a single aggregate value.

It would be great to add support for recording custom time series data:

benchmark("...") { state in
   while true {
     var v = ...
     state.measure {
       // benchmark body, modifies v
     }
     state.record(metric: "name", value: Double(v))
   }
}

For example, it can be used to report accuracy in ML models, where not only the final result is important but also the shape of the curve towards the converged score.

Add support for the output column customisation

Currently the output of the benchmark run looks like this:

$ swift run -c release BenchmarkMinimalExample
running add string no capacity
running add string reserved capacity

name                          time        std                   iterations  
--------------------------------------------------------------------------
add string no capacity        40053.0 ns  ± 1681.4582875216558  100000      
add string reserved capacity  39675.0 ns  ± 769.3813620184557   100000    

The columns are hardcoded to always be: time, std, iterations. It would be great to make it possible to customize these columns to include only what's desired. The end-user UI would include a new command-line option --show:

$ swift run -c release BenchmarkMinimalExample --show time,std
running add string no capacity
running add string reserved capacity

name                          time        std                   
--------------------------------------------------------------
add string no capacity        40053.0 ns  ± 1681.4582875216558  
add string reserved capacity  39675.0 ns  ± 769.3813620184557 

This will open the opportunity to add more column options to choose from such as min/max/avg/percentile values/etc.

Run baseline and new benchmarks; compare results, report speedups/slowdowns and significance

Interpreting benchmark results can be hard; in SwiftUI (and I think this was taken from something done for Swift's benchmarks) the benchmark tool would run a baseline state and new state of the code in the same process, interleaving runs, and report how much speedup or slowdown there is. Also the reports would be adorned with a mark that indicated definite speedup, definite slowdown, or statistically insignificant result. These facilities definitely helped us understand the effects we were seeing and also whether we needed to increase the iteration count or manually kill off processes, etc., to get a useful measurement. IIRC there were three ways to measure the mean result and we mostly ignored everything but the geometric mean. I can probably find out a bit more about the specifics on request. It's also very valuable to have this kind of report from CI.
