Git Product home page Git Product logo

Comments (12)

glycerine avatar glycerine commented on May 19, 2024 1

isolated and duplicated here: https://github.com/RoaringBitmap/real-roaring-datasets

from roaring.

glycerine avatar glycerine commented on May 19, 2024

...or vice-versa. I ended using bitmapContainer operations by default for most things in the runContainer binary operations, since they sped up the high cardinality test cases so much. For low cardinality however the non-bitmapContainer ops may be a win. Requires measurement.

from roaring.

glycerine avatar glycerine commented on May 19, 2024

@lemire are there existing benchmarks (in the Java version?) that should be ported to Go; or do the Go benchmarks already suffice?

I'd like to do at least one pass of perf tuning in order to catch glaring issues while the code is still fresh in mind. To that end, I'd like to have benchmarks that include the runContainer16 as a part of their testing. If nothing is available I can generate random ones easily, but I think it would be nice to compare speed directly against the Java or C Roaring implementations if possible.

from roaring.

glycerine avatar glycerine commented on May 19, 2024

here is what I get from the java repo, if someone could explain the columns, this might be a good place to start....

jaten@jatens-MacBook-Pro ~/go/src/github.com/RoaringBitmap/RoaringBitmap/simplebenchmark (master) $ ./run.sh
# bitspervalue nanotimefor2by2and nanotimefor2by2or nanotimeforwideor nanotimeforcontains (first normal then buffer)
census-income.zip                       2.59         2398999   3087540    663947     51744         2616805   3242794    801903     61818
census-income_srt.zip                   0.60          892023   2101384    493592     51848         1199480   2715677    669780     57988
census1881.zip                          15.06          87353    923013   1331235     32423          113971   1187069   1497454     31657
census1881_srt.zip                      2.07           83891    366097    951445     42705           71974    387080    839619     33649
dimension_003.zip                       0.59          535187   1272743   1130322    314439          611348   1807932   1350725    404351
dimension_008.zip                       0.35          451754    640828    493947     94197          518238   1268994    626942    275252
dimension_033.zip                       0.14          123837    435861    375743     35644          190126    578976    389863     38769
uscensus2000.zip                        35.91          31364    148260    935618     28363           32830    225320   1152845     25543
weather_sept_85.zip                     5.38         9536055  12942823   3206015    116243        10009470  13687968   3265626    123652
weather_sept_85_srt.zip                 0.34         1121712   2513016   1011577     49968         1263084   2760889   1142754     52280
wikileaks-noquotes.zip                  5.70          277787    859430    340619     52893          333683    985212    389047     57702
wikileaks-noquotes_srt.zip              1.48          126781    565407    300072     45610          134155    483344    228947     42447
jaten@jatens-MacBook-Pro ~/go/src/github.com/RoaringBitmap/RoaringBitmap/simplebenchmark (master) $ 

from roaring.

lemire avatar lemire commented on May 19, 2024

@glycerine We have Go wrappers around C code, so that's a good reference point... that is, Go's roaring should aim to be at least as good as gocroaring...

https://github.com/RoaringBitmap/gocroaring

from roaring.

lemire avatar lemire commented on May 19, 2024

I have a shallow benchmark...
https://github.com/lemire/gobitmapbenchmark

from roaring.

lemire avatar lemire commented on May 19, 2024

@glycerine

The easiest reference point might be this C/C++ benchmark...

https://github.com/RoaringBitmap/CBitmapCompetition

The README.md file there goes into the details of what each and every column means. This is not Go code, of course... it is C and C++... but the idea is the same.

from roaring.

glycerine avatar glycerine commented on May 19, 2024

great! thanks @lemire

from roaring.

lemire avatar lemire commented on May 19, 2024

@glycerine

As for your specific question:

bitspervalue: bits of storage per value stored on average... should be self-explanatory... should be better when using run compression for obvious reasons

nanotimefor2by2and : take bitmaps two-by-two, compute intersection, report the time

nanotimefor2by2or : same with union

nanotimeforwideor : compute the "wide" (complete) union of all bitmaps

nanotimeforcontains : we compute the universe size (all integers in [0,N)) then for each bitmap we check whether the values N/4, N/2 and 3N/4 are present in teach bitmap... this stresses random access...

from roaring.

glycerine avatar glycerine commented on May 19, 2024

looking at the java test data, it is about 120MB; I'm going to make a separate repo for it so as to not duplicate too much in the Go implementation.

from roaring.

lemire avatar lemire commented on May 19, 2024

@glycerine

Actually, yes, I think it is nice to have it as a secondary repository.

from roaring.

lemire avatar lemire commented on May 19, 2024

The benchmark against the C version of Roaring has been extended considerably: see https://github.com/lemire/gobitmapbenchmark

from roaring.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.