Comments (12)
The datasets are isolated and duplicated here: https://github.com/RoaringBitmap/real-roaring-datasets
from roaring.
...or vice-versa. I ended up using bitmapContainer operations by default for most things in the runContainer binary operations, since they sped up the high cardinality test cases so much. For low cardinality, however, the non-bitmapContainer ops may be a win. Requires measurement.
from roaring.
@lemire are there existing benchmarks (in the Java version?) that should be ported to Go; or do the Go benchmarks already suffice?
I'd like to do at least one pass of perf tuning in order to catch glaring issues while the code is still fresh in mind. To that end, I'd like to have benchmarks that include the runContainer16 as a part of their testing. If nothing is available I can generate random ones easily, but I think it would be nice to compare speed directly against the Java or C Roaring implementations if possible.
from roaring.
Here is what I get from the Java repo. If someone could explain the columns, this might be a good place to start...
jaten@jatens-MacBook-Pro ~/go/src/github.com/RoaringBitmap/RoaringBitmap/simplebenchmark (master) $ ./run.sh
# bitspervalue nanotimefor2by2and nanotimefor2by2or nanotimeforwideor nanotimeforcontains (first normal then buffer)
census-income.zip 2.59 2398999 3087540 663947 51744 2616805 3242794 801903 61818
census-income_srt.zip 0.60 892023 2101384 493592 51848 1199480 2715677 669780 57988
census1881.zip 15.06 87353 923013 1331235 32423 113971 1187069 1497454 31657
census1881_srt.zip 2.07 83891 366097 951445 42705 71974 387080 839619 33649
dimension_003.zip 0.59 535187 1272743 1130322 314439 611348 1807932 1350725 404351
dimension_008.zip 0.35 451754 640828 493947 94197 518238 1268994 626942 275252
dimension_033.zip 0.14 123837 435861 375743 35644 190126 578976 389863 38769
uscensus2000.zip 35.91 31364 148260 935618 28363 32830 225320 1152845 25543
weather_sept_85.zip 5.38 9536055 12942823 3206015 116243 10009470 13687968 3265626 123652
weather_sept_85_srt.zip 0.34 1121712 2513016 1011577 49968 1263084 2760889 1142754 52280
wikileaks-noquotes.zip 5.70 277787 859430 340619 52893 333683 985212 389047 57702
wikileaks-noquotes_srt.zip 1.48 126781 565407 300072 45610 134155 483344 228947 42447
jaten@jatens-MacBook-Pro ~/go/src/github.com/RoaringBitmap/RoaringBitmap/simplebenchmark (master) $
from roaring.
@glycerine We have Go wrappers around C code, so that's a good reference point... that is, Go's roaring should aim to be at least as good as gocroaring...
https://github.com/RoaringBitmap/gocroaring
from roaring.
I have a shallow benchmark...
https://github.com/lemire/gobitmapbenchmark
from roaring.
The easiest reference point might be this C/C++ benchmark...
https://github.com/RoaringBitmap/CBitmapCompetition
The README.md file there goes into the details of what each and every column means. This is not Go code, of course... it is C and C++... but the idea is the same.
from roaring.
great! thanks @lemire
from roaring.
As for your specific question:
bitspervalue: bits of storage per value stored on average... should be self-explanatory... should be lower (better) when using run compression for obvious reasons
nanotimefor2by2and : take bitmaps two-by-two, compute intersection, report the time
nanotimefor2by2or : same with union
nanotimeforwideor : compute the "wide" (complete) union of all bitmaps
nanotimeforcontains : we compute the universe size (all integers in [0,N)) then for each bitmap we check whether the values N/4, N/2 and 3N/4 are present in each bitmap... this stresses random access...
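The four timing columns described above can be sketched in Go roughly as follows. This is an illustration only, not the Java benchmark's code: `set` is a stand-in sorted slice rather than a roaring bitmap, "two-by-two" is assumed to mean consecutive pairs, and N is assumed to be the largest value plus one. bitspervalue is omitted because it depends on the serialized size of a real bitmap.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// set is a stand-in for a bitmap: a sorted, duplicate-free slice.
type set []uint32

func intersect(a, b set) set {
	out := set{}
	for i, j := 0, 0; i < len(a) && j < len(b); {
		switch {
		case a[i] < b[j]:
			i++
		case a[i] > b[j]:
			j++
		default:
			out = append(out, a[i])
			i, j = i+1, j+1
		}
	}
	return out
}

func union(a, b set) set {
	out := make(set, 0, len(a)+len(b))
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i] < b[j]:
			out, i = append(out, a[i]), i+1
		case a[i] > b[j]:
			out, j = append(out, b[j]), j+1
		default:
			out, i, j = append(out, a[i]), i+1, j+1
		}
	}
	out = append(out, a[i:]...)
	return append(out, b[j:]...)
}

func contains(s set, v uint32) bool {
	k := sort.Search(len(s), func(k int) bool { return s[k] >= v })
	return k < len(s) && s[k] == v
}

// report holds one timing per benchmark column, plus result counts that
// keep the computed values in use.
type report struct {
	and2by2, or2by2, wideOr, contains time.Duration
	pairCard, wideCard, hits          int
}

func bench(sets []set) report {
	var r report
	start := time.Now()
	for k := 0; k+1 < len(sets); k++ { // nanotimefor2by2and
		r.pairCard += len(intersect(sets[k], sets[k+1]))
	}
	r.and2by2 = time.Since(start)

	start = time.Now()
	for k := 0; k+1 < len(sets); k++ { // nanotimefor2by2or
		r.pairCard += len(union(sets[k], sets[k+1]))
	}
	r.or2by2 = time.Since(start)

	start = time.Now()
	var wide set
	for _, s := range sets { // nanotimeforwideor
		wide = union(wide, s)
	}
	r.wideOr = time.Since(start)
	r.wideCard = len(wide)

	var n uint64 // universe size N, kept in 64 bits to avoid overflow
	if len(wide) > 0 {
		n = uint64(wide[len(wide)-1]) + 1
	}
	start = time.Now()
	for _, s := range sets { // nanotimeforcontains
		for _, q := range []uint64{n / 4, n / 2, 3 * n / 4} {
			if contains(s, uint32(q)) {
				r.hits++
			}
		}
	}
	r.contains = time.Since(start)
	return r
}

func main() {
	r := bench([]set{{1, 2, 3, 8}, {2, 3, 9}, {100}})
	fmt.Println(r.and2by2, r.or2by2, r.wideOr, r.contains)
}
```

Swapping `set` for a real bitmap type would mostly mean replacing `intersect`, `union` and `contains` with the library's own operations; the loop structure stays the same.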
from roaring.
Looking at the Java test data, it is about 120MB; I'm going to make a separate repo for it so as to not duplicate too much in the Go implementation.
from roaring.
Actually, yes, I think it is nice to have it as a secondary repository.
from roaring.
The benchmark against the C version of Roaring has been extended considerably: see https://github.com/lemire/gobitmapbenchmark
from roaring.