buybackoff / 1brc Goto Github PK
View Code? Open in Web Editor NEW1BRC in .NET among fastest on Linux
License: MIT License
1BRC in .NET among fastest on Linux
License: MIT License
@buybackoff max measurement is 99.9 or 100, 1B * 100 > int.MaxValue, so for the general case this would result in overflow. Using long probably won't hurt perf, significantly.
Hi @buybackoff - thanks for the great article and running this unofficial leaderboard. Would you mind including my entry in your run? https://github.com/noahfalk/1brc/
Its quite speedy from my own benchmarking and I hope the results will be reproducible for you. I put some more info in the README on my repo but if you have any questions about it feel free to reach out. Thanks!
Please measure my implementation (https://github.com/kpocza/1brc) and add to the results table if it performs well on you machine.
I've made some improvements to my implementation (https://github.com/kpocza/1brc) mostly by decreasing hash collisions, exploiting cache locality.
I expect it to run 10-20% faster (especially for the 10k tests), but far slower than the best competitors.
Please rerun you benchmarks.
Not entirely sure I'm following your code (I can't find where the powers of 10 are actually used, for example).
But something struck me about IndexOfNewlineChar
: if only \r\n
and \n
variants are supported (not the \r
-only variant that is a relic of old Macs), you can look only for \n
, and if found and idx
> 0, then go see if it is preceeded by \r
and thus needs a 2-character stride. Something like:
internal static int IndexOfNewlineChar(ReadOnlySpan<byte> span, out int stride)
{
stride = default;
int idx = span.IndexOf((byte)'\n');
if (idx < 0) return;
int lastIdx = idx - 1;
if (idx > 0 && span[lastIdx] == '\r') {
stride = 2;
return lastIdx;
}
stride = 1;
return idx;
}
I have implemented my solution with a self imposed restriction to avoid Unsafe and stay managed only. It supports both \n
as well as \r\n
.
Hello Victor,
I know is very close to the official cut time.
I submitted a PR on 1brc-bench with my repository.
Thanks in advance.
https://github.com/lehuyduc/1brc-simd
Hi, I've updated my code to optimize for the 10k keys dataset. On my PC it's ~3x faster (excluding munmap
time) than the commit you tested. Default dataset performance is a bit slower.
Just ./run_cpp.sh
to compile and run.
To test the effect of hyper threading, you can do ./run_cpp.sh 12 12
(12
== number of threads total on your CPU). You will see interesting effects on the 10K dataset :D
Thanks! Looking forwards to your updated result.
I've made some improvements to the original version.
Please rerun your measurements on my implementation (https://github.com/kpocza/1brc) and add to the results table if it performs better. I've prechecked the output and it conforms to the previous version.
https://github.com/dzaima/1brc - build with make a.out
, run with THREADS_1BRC=[desired thread count] ./a.out path/to/measurements.txt
.
Language column is mildly non-trivial - there's C, C++, and Singeli involved (Singeli generating gen.c
which is committed to the repo to not require Singeli to build).
I expect it to roughly match lehuyduc's entry on both the original & 10K datasets (on my PC my solution is ~10% faster for 10K, and roughly the same for original).
(the repo also has a Java solution, but it's only optimized for the original dataset, and has extremely high variance (>2x; JIT compilation screwing up? haven't bothered looking into it), and with JIT startup time and worse codegen it struggles even at the best of times, so not worth including)
As per your recommendation on twitter, opening an issue here to see if you can add my solution to your .NET results comparison table: https://github.com/CameronAavik/1brc
I'm using Windows but according to the total process time my solution seems to be ~0.87s faster 2.22s -> 1.35s, and according to Stopwatch
inside the program it is ~0.11s faster 1.45s -> 1.34s.
Victor,
When you get the time, could you re-run with my last set of implementations?
https://github.com/pedrosakuma/1brc
Thanks a lot!
I was looking at the code and thought there are some things worth trying to further improve the execution time:
Summary
by ref to Merge
Summary
is 16 bytes and for min/max int
is not needed, short
should be enough..Select(x => (Name: x.Key.ToString(), x.Value))
for final result may not be needed, instead Utf8Span could implement IComparable and do comparison on Span
propertyOrderBy
can be replaced with something like Span<Utf8Span> spans = stackalloc Utf8Span[result.Count]; spans.Sort();
[SkipLocalsInit]
to Summary
since it looks like the Init
method is always invoked for a new instance anyway.ToList()
may not be needed before calling Aggregate
because Aggregate
already has a version for ParallelQuery<TSource>
Console.Write($"{pair.Name} = {pair.Value}");
allocates an unnecessary string, could instead rewrite it as Console.Write(pair.Name); Console.Write(" = "); Console.Write(pair.Value)
however not sure what is the performance of actual Console api, maybe the better strategy would be to just form 1 big resulting string and invoke a single Console.Write
?Hi,
I think you can win something by implementing ISpanFormattable
on Summary
. This will make the string interpolation in Console.Write($"{pair.Name}={pair.Value}")
run faster. You avoid the ToString() method.
You can try this using this code. It was faster on my (slow) computer.
public struct Summary : ISpanFormattable {
public bool TryFormat(Span<char> destination, out int charsWritten, ReadOnlySpan<char> format, IFormatProvider? provider) {
return destination.TryWrite($"{Min / 10.0:N1}/{Average / 10.0:N1}/{Max / 10.0:N1}", out charsWritten);
}
public string ToString(string? format, IFormatProvider? formatProvider) {
return ToString();
}
If it works you can even gain more performance by reimplementing the TryFormat method without the use of the TryWrite extensionmethod. It quite a lot of work but might work.
Regards,
Fons
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.