dotnet / performance

This repo contains benchmarks used for testing the performance of all .NET Runtimes

License: MIT License

C# 26.41% Python 2.21% PowerShell 0.82% Batchfile 0.01% Shell 0.61% CMake 0.08% HTML 3.23% CSS 0.31% JavaScript 0.06% Visual Basic .NET 0.01% F# 64.92% Jupyter Notebook 1.35% Dockerfile 0.01% C 0.01%

performance's Issues

Port System.Threading.Performance.Tests

Port
Remove the tests duplicated by CoreCLR benchmarks

Analyze System.Runtime.Serialization.Xml and decide whether to remove them or not

We already have a LOT of serializer benchmarks in the performance repo

Compare what we have vs CoreFX benchmarks
Remove the duplicated benchmarks

Make sure all benchmarks run on non-Windows OSes too

Follow up of #100 (comment)

After I am done with the port I will run all of the Benchmarks on MacOS and Ubuntu and fix the issues if there are any.

/cc @jorive

Port and possibly redesign System.Diagnostics.Process.Performance.Tests

Port
Investigate why all of them are disabled https://github.com/dotnet/corefx/issues/16653
Redesign those ones which need a redesign

Port System.Runtime.Numerics.Performance.Tests

Port
Make sure we don't benchmark empty loops

for (int i = 0; i < 1000000; i++)
{
    var bi = new BigInteger(input);
}
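A minimal sketch of how such a benchmark might look after porting to BenchmarkDotNet, assuming `input` is a `byte[]` (the original type is not shown). The class name, method name, and sample bytes are hypothetical. The hand-written loop is dropped because BenchmarkDotNet chooses the invocation count itself, and the result is returned so the JIT cannot discard the constructor call as dead code.

```csharp
using System.Numerics;
using BenchmarkDotNet.Attributes;

// Hypothetical class name; not taken from the original test project.
public class BigIntegerCtorBenchmarks
{
    private byte[] input;

    // Illustrative data only; the original input is not shown in the issue.
    [GlobalSetup]
    public void Setup() =>
        input = new byte[] { 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08 };

    // No manual loop: the harness decides how many times to invoke this.
    // Returning the value keeps the allocation observable to the harness.
    [Benchmark]
    public BigInteger Ctor_ByteArray() => new BigInteger(input);
}
```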

Port System.Linq.Performance.Tests

Port System.Collections.NonGeneric.Performance.Tests

These tests contain a lot of setup/cleanup logic, good issue for a start

Redesign EnumPerf.ObjectGetType

JIT is smart and knows that enum.GetType() is a constant, so it optimizes it away. Unfortunately, it's keeping the empty loop.

public static Color blackColor = Color.Black;

[Benchmark]
public Type ObjectGetType()
{
    Type tmp = null;

    for (int i = 0; i < InnerIterationCount; i++)
        tmp = blackColor.GetType();

    return tmp;
}
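One possible shape for the redesigned benchmark, offered as a sketch rather than the agreed fix: drop the inner loop and let BenchmarkDotNet drive the invocations. The `Color` enum below is a hypothetical stand-in because the original definition is not shown. Whether measuring a call that the JIT folds to a constant is still worthwhile remains the open question of this issue.

```csharp
using System;
using BenchmarkDotNet.Attributes;

// Hypothetical stand-in for the enum used by the original test.
public enum Color { Black, White }

public class EnumGetTypeBenchmarks
{
    // Kept as a mutable static field, as in the original snippet.
    public static Color blackColor = Color.Black;

    // No InnerIterationCount loop: one call per invocation, result returned.
    [Benchmark]
    public Type ObjectGetType() => blackColor.GetType();
}
```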

Port System.Memory.Performance.Tests

There are a LOT of Memory benchmarks

Port
Take a short look at CoreCLR Span benchmarks, create a new issue for removing duplicated benchmarks
Make sure we don't measure empty loops:

for (int i = 0; i < Benchmark.InnerIterationCount; i++)
{
    Span<char> span = memory.Span;
}

Investigate bimodal SpectralNorm_3

The SpectralNorm_3 is a bimodal benchmark.

Example histograms from BenchmarkDotNet:

-------------------- Histogram --------------------
[0.786 ms ; 1.033 ms) | @
[1.033 ms ; 1.466 ms) | @@@@@@@@@@@@@@@@@@
[1.466 ms ; 1.807 ms) | @@@
[1.807 ms ; 2.240 ms) | @@@@@@@
[2.240 ms ; 2.875 ms) | @@@@@
[2.875 ms ; 3.288 ms) |
[3.288 ms ; 3.721 ms) | @@@@@@
---------------------------------------------------

-------------------- Histogram --------------------
[0.942 ms ; 1.237 ms) | @@@
[1.237 ms ; 1.614 ms) | @@@@@@@@@@
[1.614 ms ; 1.892 ms) | @
[1.892 ms ; 2.269 ms) | @@@@@@@
[2.269 ms ; 2.563 ms) | @@@
[2.563 ms ; 2.940 ms) | @@@@@@@@@
[2.940 ms ; 3.354 ms) | @@@@
[3.354 ms ; 3.731 ms) | @@@
---------------------------------------------------

Sample results from xunit-performance (please look at the Min and Max value):

DotNetBenchmark-spectralnorm-3.dll	Metric	Unit	Iterations	Average	STDEV.S	Min	Max
BenchmarksGame.SpectralNorm_3.RunBench	Duration	msec	6	1848.838	380.029	1517.790	2584.247

DotNetBenchmark-spectralnorm-3.dll	Metric	Unit	Iterations	Average	STDEV.S	Min	Max
BenchmarksGame.SpectralNorm_3.RunBench	Duration	msec	6	1928.860	173.474	1831.114	2272.792

We can get the disassembly with BDN, but we need #40 to be implemented first to get profiles

Port System.Threading.Tasks.Extensions.Performance.Tests

Port System.Text.Encoding.Performance.Tests

Port
Reduce the number of test cases

Remove System.Runtime.Serialization.Json.Performance.Tests

We already have a LOT of serializer benchmarks

Add the JSON.NET and DataContractSerializer benchmarks to CoreFX category
Add the JSON.NET and DataContractSerializer benchmarks to CoreCLR category

Port and cleanup Perf.TypeDescriptorTests

Port
compare the results for every type used as an argument, remove the types with similar results
move the unit testing part to CoreFX unit tests project

        [InlineData(typeof(bool), typeof(BooleanConverter))]
        [InlineData(typeof(byte), typeof(ByteConverter))]
        [InlineData(typeof(SByte), typeof(SByteConverter))]
        [InlineData(typeof(char), typeof(CharConverter))]
        [InlineData(typeof(double), typeof(DoubleConverter))]
        [InlineData(typeof(string), typeof(StringConverter))]
        [InlineData(typeof(short), typeof(Int16Converter))]
        [InlineData(typeof(int), typeof(Int32Converter))]
        [InlineData(typeof(long), typeof(Int64Converter))]
        [InlineData(typeof(float), typeof(SingleConverter))]
        [InlineData(typeof(UInt16), typeof(UInt16Converter))]
        [InlineData(typeof(UInt32), typeof(UInt32Converter))]
        [InlineData(typeof(UInt64), typeof(UInt64Converter))]
        [InlineData(typeof(object), typeof(TypeConverter))]
        [InlineData(typeof(void), typeof(TypeConverter))]
        [InlineData(typeof(DateTime), typeof(DateTimeConverter))]
        [InlineData(typeof(DateTimeOffset), typeof(DateTimeOffsetConverter))]
        [InlineData(typeof(Decimal), typeof(DecimalConverter))]
        [InlineData(typeof(TimeSpan), typeof(TimeSpanConverter))]
        [InlineData(typeof(Guid), typeof(GuidConverter))]
        [InlineData(typeof(Array), typeof(ArrayConverter))]
        [InlineData(typeof(ICollection), typeof(CollectionConverter))]
        [InlineData(typeof(Enum), typeof(EnumConverter))]
        [InlineData(typeof(SomeEnum), typeof(EnumConverter))]
        [InlineData(typeof(SomeValueType?), typeof(NullableConverter))]
        [InlineData(typeof(int?), typeof(NullableConverter))]
        [InlineData(typeof(ClassWithNoConverter), typeof(TypeConverter))]
        [InlineData(typeof(BaseClass), typeof(BaseClassConverter))]
        [InlineData(typeof(DerivedClass), typeof(DerivedClassConverter))]
        [InlineData(typeof(IBase), typeof(IBaseConverter))]
        [InlineData(typeof(IDerived), typeof(IBaseConverter))]
        [InlineData(typeof(ClassIBase), typeof(IBaseConverter))]
        [InlineData(typeof(ClassIDerived), typeof(IBaseConverter))]
        [InlineData(typeof(Uri), typeof(UriTypeConverter))]
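A sketch of how the ported benchmark could look with a reduced set of arguments, assuming the test measures `TypeDescriptor.GetConverter`. The class and method names are hypothetical, and the four types kept here are only an illustration; the actual selection should come from comparing the results as described above. BenchmarkDotNet's `[Arguments]` plays the role of xunit's `[InlineData]`, and the converter-type assertion would move to the unit tests.

```csharp
using System;
using System.ComponentModel;
using BenchmarkDotNet.Attributes;

// Hypothetical class name; the argument list is an illustrative subset.
public class TypeDescriptorBenchmarks
{
    [Benchmark]
    [Arguments(typeof(bool))]   // built-in primitive converter
    [Arguments(typeof(string))] // reference type
    [Arguments(typeof(Guid))]   // struct with a dedicated converter
    [Arguments(typeof(int?))]   // NullableConverter path
    public TypeConverter GetConverter(Type typeToConvert) =>
        TypeDescriptor.GetConverter(typeToConvert);
}
```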

Port System.Collections

A LOT of benchmarks. We should make sure that all the test cases make sense.

Port System.Net.Primitives.Performance.Tests

Repo structure change proposal

We are going to open this repo soon, and I think that we should change the folder structure before we do that.

Currently we have:

├───.vscode
├───docs
├───scripts
│   └───build
└───src
    ├───ArtifactsUploader // an internal tool
    ├───benchmarks // bdn microbenchmarks
    ├───common // single file
    ├───coreclr // old xunit microbenchmarks
    ├───CoreFx // old xunit microbenchmarks
    ├───dmlib
    ├───docker
    └───scenarios // end-to-end benchmarks

My proposal for now

├───.vscode
├───build // common with static code analysis rules moved to build
├───docs
├───scripts
│   └───build
└───src
    ├───benchmarks
    │   ├───micro // currently in "benchmarks"
    │   ├───end-to-end // currently in "scenarios"
    │   └───other
    │       ├───containers // currently in "docker"
    │       └───dmlib // currently in "dmlib"
    └───tools
        └───ArtifactsUploader

In the future I would like to add Java and C++ benchmarks to compare against our competition:

├───.vscode
├───build
├───docs
├───scripts
│   └───build
└───src
    ├───benchmarks
    │   ├───micro
    │   ├───end-to-end
    │   ├───competition
    │   │   ├───java
    │   │   └───cpp
    │   └───other
    │       ├───containers
    │       └───dmlib
    └───tools
        └───ArtifactsUploader

@jorive @brianrob what do you think? I want to get acceptance before I start working on the PR ;p

Port System.Net.Http.Performance.Tests

Support for SupplementalTestData

For some benchmarks we use test data that is not included in the repository because of its size (e.g. a 50 MB text file). We store these files in https://github.com/dotnet/corefx-testdata and include them via a SupplementalTestData directive in our benchmark csproj file: https://github.com/dotnet/corefx/blob/master/src/System.Text.RegularExpressions/tests/Performance/System.Text.RegularExpressions.Performance.Tests.csproj#L20

We should support these directives here in the performance repository and make sure that the files are only included when needed. This is another example of why a csproj per source assembly is desirable.

cc @danmosemsft @adamsitnik

Performance issues in System.Collections exposed by new benchmarks

With the new benchmarks we can compare "apples to apples", which exposed a lot of interesting differences.

This issue is an aggregation of many issues. I will report them in CoreFX once I make the CoreFX dev experience very good.

Port System.Globalization.Performance.Tests

Very good issue for a start!

[Jenkins] Python 3 bat script returned exit code 103

In the lab we have been hitting an error similar to the one described here: https://issues.jenkins-ci.org/browse/JENKINS-35009.

Find and port System.Numerics.Vectors.Performance.Tests

The folder where the benchmarks should be contains no benchmarks.

Find the benchmarks
Port

/cc @DrewScoggins @jorive

Port System.Runtime.Extensions.Performance.Tests

Port System.Security.Cryptography.Primitives.Performance.Tests

Compare the results, if they are not much different, remove some benchmarks

Port System.IO.Compression.Performance.Tests

Port
Make sure it makes sense to benchmark all the permutations

Set PERFSNAKE Machines to Auto-Connect to Jenkins

Right now, most PERFSNAKE machines don't auto-connect to Jenkins, which means that manual intervention is needed whenever machines go down for any reason.

In order for us to increase the number of performance runs for both daily builds and PRs, we need to make sure that our machine pool is resilient to restarts and outages.

@anscoggi, @adiaaida can one of you take this?

Required for https://github.com/dotnet/coreclr/issues/15175.

cc: @dotnet/rap-team

Expand the Perf test coverage for the System.Numerics.Vector types

The current perf test coverage for the System.Numerics.Vector types is fairly limited (this extends to types such as Matrix4x4, Quaternion, and Plane as well).

Given that these are meant to be fairly core high-performance types (used in things such as Multimedia-based applications), it is important that we have a high amount of perf-test coverage over the current implementations.

Port System.Net.Sockets.Performance.Tests

`BenchmarkDotNet.Artifacts` should be generated where `Benchmarks.dll` exists.

After running the micro-benchmarks, the results are dropped side-by-side with the Benchmarks.csproj.
If I run two different frameworks, one after the other, the initial results are overwritten.
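One way this could be addressed, sketched under the assumption that the BenchmarkDotNet version in use exposes the `WithArtifactsPath` config extension. The `ArtifactsLocation` helper and the `Program` layout are hypothetical. `AppContext.BaseDirectory` points at the folder that contains the benchmark assembly, which normally differs per target framework, so results from one framework would no longer overwrite another's.

```csharp
using System;
using System.IO;
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Running;

// Hypothetical helper: resolves the artifacts folder next to the built assembly.
public static class ArtifactsLocation
{
    public static string GetPath() =>
        Path.Combine(AppContext.BaseDirectory, "BenchmarkDotNet.Artifacts");
}

public static class Program
{
    public static void Main(string[] args)
    {
        // Assumption: WithArtifactsPath is available in the referenced BDN version.
        IConfig config = DefaultConfig.Instance.WithArtifactsPath(ArtifactsLocation.GetPath());

        // Runs whichever benchmarks are selected via the command-line arguments.
        BenchmarkSwitcher.FromAssembly(typeof(Program).Assembly).Run(args, config);
    }
}
```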

Port System.Console.Performance.Tests

These tests are benchmarking Console, which is also used by BenchmarkDotNet to report results. Might be a challenging task to make it work.

Make it possible to run with Mono

It would be amazing for this benchmarking suite to run on top of Mono. This would allow us to more easily compare performance across the different runtimes, as well as help the Mono team identify where we should focus our performance work.

What should we do on our end to make it easier for you?

Thank you!

Investigate multimodal BinaryTrees_5

The BinaryTrees_5 is a bimodal benchmark.

Example histograms from BenchmarkDotNet:

-------------------- Histogram --------------------
[119.310 ms ; 146.287 ms) | @@@
[146.287 ms ; 182.663 ms) |
[182.663 ms ; 210.865 ms) | @@
[210.865 ms ; 237.841 ms) | @@@@@
[237.841 ms ; 256.492 ms) | @
[256.492 ms ; 283.468 ms) | @@@@@@@@@@@
[283.468 ms ; 316.363 ms) | @@@@@@@@@@@@@@
[316.363 ms ; 346.171 ms) | @@@@
---------------------------------------------------

-------------------- Histogram --------------------
[113.076 ms ; 131.348 ms) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[131.348 ms ; 151.367 ms) | @@@@
[151.367 ms ; 162.228 ms) |
[162.228 ms ; 180.499 ms) | @@
[180.499 ms ; 191.304 ms) |
[191.304 ms ; 209.576 ms) | @@@@@@@
[209.576 ms ; 220.639 ms) |
[220.639 ms ; 238.910 ms) | @@
[238.910 ms ; 261.127 ms) | @
---------------------------------------------------

Sample results from xunit-performance (please look at the Min value):

DotNetBenchmark-binarytrees-5.dll	Metric	Unit	Iterations	Average	STDEV.S	Min	Max
BenchmarksGame.BinaryTrees_5.RunBench	Duration	msec	7	1604.557	564.717	1040.190	2265.712

DotNetBenchmark-binarytrees-5.dll	Metric	Unit	Iterations	Average	STDEV.S	Min	Max
BenchmarksGame.BinaryTrees_5.RunBench	Duration	msec	5	2122.072	138.067	1935.307	2304.632

Port System.Threading.Channels.Performance.Tests

Port new SoA benchmarks from CoreCLR

dotnet/coreclr#18839

Implement ETW Profiler for BDN like we have for xunit

Ideas for reducing the time required to run all benchmarks

@jorive I am going to add all the ideas I have here and one day we can start the improvements from here.

The PingPong benchmark from System.Threading.Channels.Tests takes 2.5s to execute. It's executed for 3 channels, up to 20 times for each of them: 2.5 x 3 x 20 = 150 seconds. The solution would be to change the inner iteration count from 1_000_000 to 1_000 or any other smaller value. (fixed in #126)

Port System.Xml.XmlDocument.Performance.Tests

Port and improve Perf_Marvin

Port from xunit-performance to BenchmarkDotNet
investigate why the benchmarks are not calling the Marvin API from CoreFX, but instead have their own copy of the hashing algorithm. IMHO it creates a need to synchronize the code between repositories and makes it possible to miss a regression if somebody changes the implementation in CoreFX
reduce the number of permutations, it's more a unit test than a benchmark today.

Port System.Runtime.Performance.Tests

Port
Make sure we don't benchmark empty loops

for (int i = 0; i < 10000; i++)
{
    new Guid(guidStr); new Guid(guidStr); new Guid(guidStr);
    new Guid(guidStr); new Guid(guidStr); new Guid(guidStr);
    new Guid(guidStr); new Guid(guidStr); new Guid(guidStr);
}

Make sure we don't run more permutations than needed:

[Benchmark]
[InlineData("a", 0)]
[InlineData("  ", 0)]
[InlineData("  ", 1)]
[InlineData("TeSt!", 0)]
[InlineData("TeSt!", 2)]
[InlineData("TeSt!", 3)]
[InlineData("I think Turkish i \u0131s TROUBL\u0130NG", 0)]
[InlineData("I think Turkish i \u0131s TROUBL\u0130NG", 18)]
[InlineData("I think Turkish i \u0131s TROUBL\u0130NG", 22)]
[InlineData("dzsdzsDDZSDZSDZSddsz", 0)]
[InlineData("dzsdzsDDZSDZSDZSddsz", 7)]
[InlineData("dzsdzsDDZSDZSDZSddsz", 10)]
[InlineData("a\u0300\u00C0A\u0300A", 0)]
[InlineData("a\u0300\u00C0A\u0300A", 3)]
[InlineData("a\u0300\u00C0A\u0300A", 4)]
[InlineData("Foo\u0400Bar!", 0)]
[InlineData("Foo\u0400Bar!", 3)]
[InlineData("Foo\u0400Bar!", 4)]
[InlineData("a\u0020a\u00A0A\u2000a\u2001a\u2002A\u2003a\u2004a\u2005a", 0)]
[InlineData("a\u0020a\u00A0A\u2000a\u2001a\u2002A\u2003a\u2004a\u2005a", 3)]
[InlineData("\u4e33\u4e65 Testing... \u4EE8", 0)]

public static object[][] UInt64Values => new[]
{
    new object[] { 214748364LU },
    new object[] { 2LU },
    new object[] { 21474836LU },
    new object[] { 21474LU },
    new object[] { 214LU },
    new object[] { 2147LU },
    new object[] { 214748LU },
    new object[] { 21LU },
    new object[] { 2147483LU },
    new object[] { 922337203685477580LU },
    new object[] { 92233720368547758LU },
    new object[] { 9223372036854775LU },
    new object[] { 922337203685477LU },
    new object[] { 92233720368547LU },
    new object[] { 9223372036854LU },
    new object[] { 922337203685LU },
    new object[] { 92233720368LU },
    new object[] { 0LU }, // min value
    new object[] { 18446744073709551615LU }, // max value
    new object[] { 2147483647LU }, // int32 max value
    new object[] { 9223372036854775807LU }, // int64 max value
    new object[] { 1000000000000000000LU }, // quintillion
    new object[] { 4294967295000000000LU }, // uint.MaxValue * Billion
    new object[] { 4294967295000000001LU }, // uint.MaxValue * Billion + 1
};
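A sketch of a trimmed-down version using BenchmarkDotNet's `[ArgumentsSource]`, the counterpart of xunit's `[MemberData]`. It assumes the values feed a `ToString` benchmark (the benchmarked method is not shown above), and the class and method names are hypothetical. Only the boundary values are kept here for illustration; which values really produce distinct results should be decided by comparing measurements.

```csharp
using System.Collections.Generic;
using BenchmarkDotNet.Attributes;

// Hypothetical class name; ToString is an assumed target method.
public class UInt64FormattingBenchmarks
{
    // Illustrative subset: shortest value, a mid-length value, longest value.
    public static IEnumerable<object> Values => new object[]
    {
        0UL,                    // min value
        2147483647UL,           // int32 max value
        18446744073709551615UL, // max value
    };

    [Benchmark]
    [ArgumentsSource(nameof(Values))]
    public string Format(ulong value) => value.ToString();
}
```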

Port System.IO.MemoryMappedFiles.Performance.Tests

Port
Use some static analysis tool and remove dead code
Move the asserts to unit test project
Reduce the number of permutations

Road to BenchmarkDotNet

This is a list of requirements that need to be met before we switch from xunit-performance to BenchmarkDotNet.

Missing features:

dotnet/BenchmarkDotNet#587 "Support netcoreapp2.1" (fixed by @eerhardt, was part of 0.10.11 release)
dotnet/BenchmarkDotNet#256 dotnet/BenchmarkDotNet#350 dotnet/BenchmarkDotNet#754 "Per-method parameterization" - something that xunit offers with [InlineData] and [MemberData] (fixed by @adamsitnik, was part of 0.10.14 release)
dotnet/BenchmarkDotNet#652 support generic benchmark types (fixed by @adamsitnik, was part of 0.10.13 release)
dotnet/BenchmarkDotNet#175 "Add .NET Core support for Diagnostics package" - consume the .NET Standard TraceEvent lib and allow to use it in .NET Core apps (fixed by @adamsitnik, will be part of 0.11.00 release)
dotnet/BenchmarkDotNet#698 ".NET Standard 2.0, support" (fixed by @adamsitnik, will be part of 0.11.00 release)
dotnet/BenchmarkDotNet#701 "Implement BenchView exporter"
dotnet/BenchmarkDotNet#715 "Scenarios support - run given exe as benchmark, gather multiple results" - we need it for JitBench, the Roslyn team has an initial implementation that we could use

Performance (the tool needs to be fast to run as part of CI):

dotnet/BenchmarkDotNet#606 "Improve Memory Diagnoser" - BDN required one extra process run to get the memory statistics, I have changed the architecture to require only one extra iteration. We need one extra iteration because for desktop .NET we are using AppDomain.MonitoringIsEnabled which adds an extra overhead. So we run the benchmarks without overhead, measure time, enable monitoring and run one extra iteration to get the memory statistics (fixed by @adamsitnik, was part of 0.10.12 release)
dotnet/BenchmarkDotNet#543 "Run Disassembly Diagnoser without extra run" - BDN required one extra process run to get the disassembly. I have changed the architecture to synchronize parent and child processes and get the disassembly after running the benchmarks, but before quitting the process (fixed by @adamsitnik, was part of 0.10.12 release)
dotnet/BenchmarkDotNet#699 "Generate one executable per runtime settings" - BDN used to build an extra exe per benchmark, which took a lot of time. I have changed the architecture: it now groups the benchmarks by runtime settings (framework/JIT/GC etc) and builds one exe for the entire group of benchmarks, in parallel. For the BenchmarkDotNet.Samples project with 650 benchmarks it used to take 1h to build the extra exes. Now it's 13s on my PC (fixed by @adamsitnik, will be part of 0.11.00 release)
dotnet/BenchmarkDotNet#704 "Add an optional way to configure storage to remember the PerfectInvocationCount, IterationCount and UnrollFactor" By default BDN uses a heuristic to find perfect invocation and iteration count. For CI scenarios we should remember these values, I estimate that it should save us 20-25% of time.
dotnet/BenchmarkDotNet#716 Allow to optionally run benchmarks in Parallel.

Private runtimes support (BenchmarkDotNet compiles new exe, so it needs to know how to work with private builds):

dotnet/BenchmarkDotNet#648 "BenchmarkDotNet requires dotnet cli toolchain to be installed". Prior to joining MS I had no idea that you can run .NET Core apps without adding dotnet cli to the PATH. Now it's optional and the user can provide the path to the dotnet cli which should be used (fixed by @adamsitnik, was part of 0.10.13 release)
dotnet/BenchmarkDotNet#643 "BenchmarkDotNet should respect LangVersion project setting" - just copy it to the auto-generated project (fixed by @adamsitnik, was part of 0.10.13 release)
dotnet/BenchmarkDotNet#706 "Support private builds of .NET Runtime" - @vitek-karas needed to measure the perf difference after his recent NGEN changes. He wanted to compare the existing .NET Framework with his private build of CLR. This feature simply sends the provided version as COMPLUS_Version env var to the benchmarked process and allows to benchmark private desktop CLR builds. (fixed by @adamsitnik, was part of 0.10.14 release)
dotnet/BenchmarkDotNet#700 "Support private CoreCLR and CoreFX builds" - users can now use ANY CoreCLR and CoreFX builds for benchmarking. Uses dotnet cli to publish a self-contained app, works on Windows, Linux and Mac. (fixed by @adamsitnik, will be part of 0.11.00 release)
dotnet/BenchmarkDotNet#718 CoreRT support.

Verification:

dotnet/BenchmarkDotNet#714 Test BenchmarkDotNet against unstable/multimodal benchmarks from CoreCLR/CoreFX repo provided by @AndyAyersMS.

I am tagging the Perf Team and the people who are interested in the progress:
@jorive @valenis @adiaaida @DrewScoggins @brianrob
@ViktorHofer @danmosemsft @eerhardt
@AndyAyersMS @JosephTremoulet
@davidfowl @DamianEdwards

Port System.Text.RegularExpressions.Performance.Tests

Port
Keep THIRD-PARTY-NOTICES
Check the stability of multithreaded benchmarks

Port System.IO.FileSystem.Performance.Tests

Port
Use some static analysis tool and remove dead code
Investigate what [OuterLoop] is and if we need it in BDN?

Redesign LINQ benchmarks

This is a #90 followup

~~CoreFX benchmarks call .ToArray() everywhere, while they should just iterate over the enumeration (like the CoreCLR ones)~~ fixed in #127
Cast_ToBaseClass and Cast_SameType ignore provided size and iteration count and use hardcoded values. We don't know if this is by design or a bug. (cc @jorive)
Some benchmarks name the argument iteration and others iterationCount; this should be unified.
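For the first item, a minimal sketch of iterating a query without materializing it, using the `Consumer` type from `BenchmarkDotNet.Engines`. The class name, the `Size` value, and the `Select` projection are hypothetical and only illustrate the pattern; this is not necessarily how #127 implemented it. Consuming the sequence measures the LINQ operator itself instead of the extra allocation and copy that `.ToArray()` adds.

```csharp
using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Engines;

// Hypothetical class name and parameters.
public class LinqSelectBenchmarks
{
    private readonly Consumer consumer = new Consumer();
    private IEnumerable<int> source;

    [Params(100)]
    public int Size;

    // The source array is built once, outside the measured code.
    [GlobalSetup]
    public void Setup() => source = Enumerable.Range(0, Size).ToArray();

    // Enumerates every element and hands it to the consumer, so the query
    // is fully executed but no result collection is allocated.
    [Benchmark]
    public void Select() => source.Select(x => x + 1).Consume(consumer);
}
```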

Port System.Linq.Parallel.Performance.Tests

I missed it because it was not ported to performance repo

2.1 vs 2.2

I took all the benchmarks we have and executed them using all the goodness we have here in the perf repo. I will be posting the results below.

Info:

BenchmarkDotNet=v0.11.1.812-nightly, OS=Windows 10.0.17134.345 (1803/April2018Update/Redstone4)
Intel Xeon CPU E5-1650 v4 3.60GHz, 1 CPU, 12 logical and 6 physical cores
Frequency=3507496 Hz, Resolution=285.1037 ns, Timer=TSC
  [Host] : .NET Core 2.1.5 (CoreCLR 4.6.26919.02, CoreFX 4.6.26919.02), 64bit RyuJIT
  2.1    : .NET Core 2.1.5 (CoreCLR 4.6.26919.02, CoreFX 4.6.26919.02), 64bit RyuJIT
  2.2    : .NET Core 2.2.0-rtm-27029-02 (CoreCLR 4.6.27029.01, CoreFX 4.6.27029.02), 64bit RyuJIT

Port System.Collections.Concurrent.Performance.Tests

These tests contain a lot of setup/cleanup logic, porting them will not be trivial

Port System.IO.Pipes.Performance.Tests

Port
Investigate https://github.com/dotnet/corefx/issues/18290 and make sure we can disable benchmark per OS

Port System.IO.Compression.Brotli.Performance.Tests

Port
Make sure it makes sense to benchmark all the permutations
Remove the asserts, move them to unit test project if they are not tested otherwise
