Git Product home page Git Product logo

hashtable-benchmarks's Introduction

Hashtable Benchmarks

Benchmarks for comparing hashtable implementations.

  1. Build:
bazel build :hashtable_benchmarks

Note that -c opt is the default.

  1. Run:
./bazel-bin/hashtable_benchmarks --benchmark_format=json > benchmark-results.json
  1. Analyze:

You can use http://colab.research.google.com along with hashtable_benchmarks.ipynb to parse the generated benchmark-results.json.

  1. Contribute:

We would like this to turn into the place for comparing hashtables in C++. We will accept external dependencies on other hashtable libraries (assuming they have a compatible licence). We encourage folks to improve and modify both the analysis and the benchmarks themselves as we learn things. Please join the dicussion at

https://groups.google.com/forum/#!forum/hashtable-benchmarks

Disclaimer

This is not an officially supported Google product.

hashtable-benchmarks's People

Contributors

aysylu avatar fowles avatar nbronson avatar rundong08 avatar sh1boot avatar shixiao avatar southerngs avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

hashtable-benchmarks's Issues

branch predictability during hot lookup benchmarks is too high

The hot lookup benchmarks repeatedly look for the same key. This allows the processor's branch predictor to learn the correct pattern of branches even for hash table algorithms that have unpredictable branches, biasing results in their favor. We should construct a sequence of keys for use in all of the inner loops that currently have a limit of kOpsPerKey.

Feel free to assign to me, I plan to fix this.

support non-deterministic rehash

Currently the benchmark assumes that the rehash point of a given implementation is only determined by the size(), so all of the sets constructed by GenerateSets will have the same size. This is not true for the hash table implementations at https://github.com/skarupke/flat_hash_map, which should be included. Functions that use Transpose are the ones that will take a bit of work to fix.

GenerateSets has a predictable insertion order for Density::kMax

GenerateSets builds a maximum-density set by observing a rehash at N elements and then constructing a new hash table with N-2 keys. Keys are inserted into the new table in the iteration order of the old table. Some hash table algorithms may be positively or negatively affected by this insertion pattern. For example, F14NodeSet will benefit in the large set iteration speed tests because the indirect nodes will be malloc-ed in a sequence that is a merge of two contiguous layouts, so iteration will have better locality than we would expect.

An interesting closely related issue in an earlier version of Rust's hash table is described at https://accidentallyquadratic.tumblr.com/post/153545455987/rust-hash-iteration-reinsertion

Feel free to assign to me, I plan to fix this

high load factor

bytell_hash_set's rehash is not expected load factor. My flat emhash can set load factor to 1.0
I have update dynamic_rehash_effect.cc.

static void high_load()
{
    size_t maxSize = 1U << 28;
    size_t numReps = 100;

    std::random_device rd;
    auto dis = std::uniform_int_distribution<uint32_t>{0, (1U << 31) - 1};

    for (size_t rep = 0; rep < numReps; ++rep) {
        {
            auto rng = std::mt19937(rep);
           //https://github.com/ktprime/emhash/blob/master/hash_set4.hpp
            emhash9::HashSet<uint32_t> set; 
            set.max_load_factor(0.999);

            auto t1 = getus();
            while (set.size() < maxSize) {
                auto key = dis(rng);
                size_t prevCap = set.bucket_count();
                set.insert(key);
                if (set.bucket_count() > prevCap) {
                    size_t prevSize = set.size() - 1;
                    auto lf = static_cast<double>(prevSize) / prevCap;
                    std::cout << prevCap << " " << prevSize << " " << lf << "\n";
                }
            }
            std::cout << "emhash loop " << rep << " time use " << (getus() - t1) / 1000000.000 << " sec\n";
        }

        {
            auto rng = std::mt19937(rep);
            ska::bytell_hash_set<uint32_t> set;
            set.max_load_factor(0.999);

            auto t1 = getus();
            while (set.size() < maxSize) {
                auto key = dis(rng);
                size_t prevCap = set.bucket_count();
                set.insert(key);
                if (set.bucket_count() > prevCap) {
                    size_t prevSize = set.size();
                    auto lf = static_cast<double>(prevSize) / prevCap;
                    std::cout << prevCap << " " << prevSize << " " << lf << "\n";
                }
            }
            std::cout << "skaset loop " << rep << " time use " << (getus() - t1) / 1000000.000 << "sec\n\n";
        }
    }
}

//the result.

8 8 1
16 16 1
32 32 1
64 64 1
128 128 1
256 256 1
512 512 1
1024 1023 0.999023
2048 2046 0.999023
4096 4092 0.999023
8192 8184 0.999023
16384 16368 0.999023
32768 32736 0.999023
65536 65471 0.999008
131072 130941 0.999001
262144 261882 0.999001
524288 523764 0.999001
1048576 1047528 0.999001
2097152 2095055 0.999
4194304 4190110 0.999
8388608 8380220 0.999
16777216 16760439 0.999
33554432 33520878 0.999
67108864 67041756 0.999
134217728 134083510 0.999
268435456 268167021 0.999
emhash loop 0 time use 59.2812 sec

0 1 inf
16 16 1
32 32 1
64 64 1
128 128 1
256 255 0.996094
512 509 0.994141
1024 1014 0.990234
2048 1973 0.963379
4096 3929 0.959229
8192 7745 0.945435
16384 15540 0.948486
32768 30946 0.944397
65536 61592 0.939819
131072 121771 0.929039
262144 244278 0.931847
524288 485419 0.925863
1048576 963137 0.918519
2097152 1916894 0.914046
4194304 3790756 0.903787
8388608 7541070 0.898966
16777216 14901313 0.888187
33554432 29881340 0.890533
67108864 57400366 0.855332
134217728 115062312 0.857281
268435456 230845682 0.859967
skaset loop 0 time use 49.4685sec

Linking error with folly

I see the following linking error with folly when I try to compile:

bazel-out/k8-opt/bin/external/facebook_folly/_objs/folly/SafeAssert.o:SafeAssert.cpp:function unsigned long folly::to_ascii_with<10ul, folly::to_ascii_alphabet<false>, 20ul>(char (&) [20ul], unsigned long): error: undefined reference to 'folly::detail::to_ascii_powers<10ul, unsigned long>::data'
bazel-out/k8-opt/bin/external/facebook_folly/_objs/folly/SafeAssert.o:SafeAssert.cpp:function unsigned long folly::to_ascii_with<10ul, folly::to_ascii_alphabet<false>, 20ul>(char (&) [20ul], unsigned long): error: undefined reference to 'folly::detail::to_ascii_table<10ul, folly::to_ascii_alphabet<false> >::data'
bazel-out/k8-opt/bin/external/facebook_folly/_objs/folly/SafeAssert.o:SafeAssert.cpp:function unsigned long folly::to_ascii_with<10ul, folly::to_ascii_alphabet<false>, 20ul>(char (&) [20ul], unsigned long): error: undefined reference to 'folly::detail::to_ascii_powers<10ul, unsigned long>::data'
bazel-out/k8-opt/bin/external/facebook_folly/_objs/folly/SafeAssert.o:SafeAssert.cpp:function unsigned long folly::to_ascii_with<10ul, folly::to_ascii_alphabet<false>, 20ul>(char (&) [20ul], unsigned long): error: undefined reference to 'folly::detail::to_ascii_powers<10ul, unsigned long>::data'
bazel-out/k8-opt/bin/external/facebook_folly/_objs/folly/SafeAssert.o:SafeAssert.cpp:function unsigned long folly::to_ascii_with<10ul, folly::to_ascii_alphabet<false>, 20ul>(char (&) [20ul], unsigned long): error: undefined reference to 'folly::detail::to_ascii_powers<10ul, unsigned long>::data'
bazel-out/k8-opt/bin/external/facebook_folly/_objs/folly/SafeAssert.o:SafeAssert.cpp:function unsigned long folly::to_ascii_with<10ul, folly::to_ascii_alphabet<false>, 20ul>(char (&) [20ul], unsigned long): error: undefined reference to 'folly::detail::to_ascii_table<10ul, folly::to_ascii_alphabet<false> >::data'

I've removed folly in my testing for now. But I thought I would open an issue to see if this is a problem others see due to changes with folly, or if it's an issue with my configuration.

My build command is:

BAZEL_CXXOPTS="-std=c++14:-stdlib=libc++:-isystem${LLVM_HDR}/:-isystem${CLANG_HDR}:-isystem${LLVM_HDR}/ext/" BAZEL_LINKOPTS="-v:-stdlib=libc++:-L${LLVM_LIB}:-lc++:-lm:-Wl,-rpath=${LLVM_LIB}" bazel build -c opt --dynamic_mode=off hashtable_benchmarks

I'm using llvm-9 on Ubuntu 20.04 system. Everything is pretty standard so the issue would likely be easy to reproduce.

bazel is awful

Why can't you just use make like sane projects?

First issue, I forget the confusing error messages, but eventually I figured out that 'folly' has renamed its branch from master to main. patch:

+++ b/WORKSPACE
@@ -59,8 +59,8 @@ http_archive(
http_archive(
name = "facebook_folly",
build_file = "//:BUILD.folly",

  • strip_prefix = "folly-master",
  • urls = ["https://github.com/facebook/folly/archive/master.zip"],
  • strip_prefix = "folly-main",
  • urls = ["https://github.com/facebook/folly/archive/main.zip"],
    )

ska::flat_hash_set and ska::bytell_hash_set

Second issue:

ERROR: /home/willy/.cache/bazel/_bazel_willy/a2fc307a5e109a80523e72df06593c55/external/facebook_folly/BUILD.bazel:1:11: C++ compilation of rule '@facebook_folly//:folly' failed (Exit 1) gcc failed: error executing command /usr/bin/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections ... (remaining 48 argument(s) skipped)

Use --sandbox_debug to see verbose messages from the sandbox
In file included from external/facebook_folly/folly/container/detail/F14Table.h:42,
from external/facebook_folly/folly/container/detail/F14Table.cpp:17:
external/facebook_folly/folly/functional/Invoke.h:22:10: fatal error: boost/preprocessor/control/expr_iif.hpp: No such file or directory
22 | #include <boost/preprocessor/control/expr_iif.hpp>
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
Target //:hashtable_benchmarks failed to build

After trying a bunch of things, I decided to just "apt-get install libboost-all-dev"

Now I get a new error:
ERROR: /home/willy/.cache/bazel/_bazel_willy/a2fc307a5e109a80523e72df06593c55/external/facebook_folly/BUILD.bazel:1:11: undeclared inclusion(s) in rule '@facebook_folly//:folly':
this rule is missing dependency declarations for the following files included by 'external/facebook_folly/folly/lang/ToAscii.cpp':
'/usr/lib/gcc/x86_64-linux-gnu/11/include/stddef.h'
'/usr/lib/gcc/x86_64-linux-gnu/11/include/stdint.h'
'/usr/lib/gcc/x86_64-linux-gnu/11/include/stdarg.h'
Target //:hashtable_benchmarks failed to build

and I'm all out of spoons for debugging this.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.