Git Product home page Git Product logo

martinus / robin-hood-hashing Goto Github PK

View Code? Open in Web Editor NEW
1.5K 40.0 140.0 8.98 MB

Fast & memory efficient hashtable based on robin hood hashing for C++11/14/17/20

Home Page: https://gitter.im/martinus/robin-hood-hashing

License: MIT License

C++ 93.42% Ruby 0.06% CMake 6.14% Shell 0.36% C 0.02%
c-plus-plus hash-tables unordered-maps hash container header-only stl-containers single-file no-dependencies cpp

robin-hood-hashing's Introduction

➵ robin_hood unordered map & set Release GitHub license


NOTE: Unfortunately I do not have time to continue development for this hashmap. I have a worthy successor though, please head over to: ankerl::unordered_dense::{map, set}

I have spent a lot of time developing and improving it robin_hood, and it works quite well for most use cases. I won't make any updates to this code any more, unless they are PRs for critical bug fixes.


Travis CI Build Status Appveyor Build Status Codacy Badge Total alerts Language grade: C/C++ Coverage Status

robin_hood::unordered_map and robin_hood::unordered_set is a platform independent replacement for std::unordered_map / std::unordered_set which is both faster and more memory efficient for real-world use cases.

Installation & Usage

Direct Inclusion

  1. Add robin_hood.h to your C++ project.
  2. Use robin_hood::unordered_map instead of std::unordered_map
  3. Use robin_hood::unordered_set instead of std::unordered_set

Conan, the C/C++ Package Manager

  1. Setup your CMakeLists.txt (see Conan documentation on how to use MSBuild, Meson and others) like this:
    project(myproject CXX)
    
    add_executable(${PROJECT_NAME} main.cpp)
    
    include(${CMAKE_BINARY_DIR}/conanbuildinfo.cmake) # Include Conan-generated file
    conan_basic_setup(TARGETS) # Introduce Conan-generated targets
    
    target_link_libraries(${PROJECT_NAME} CONAN_PKG::robin-hood-hashing)
  2. Create conanfile.txt in your source dir (don't forget to update the version)
    [requires]
    robin-hood-hashing/3.11.5
    
    [generators]
    cmake
  3. Install and run Conan, then build your project as always:
    pip install conan
    mkdir build
    cd build
    conan install ../ --build=missing
    cmake ../
    cmake --build .
    The robin-hood-hashing package in Conan is kept up to date by Conan contributors. If the version is out of date, please create an issue or pull request on the conan-center-index repository.

Benchmarks

Please see extensive benchmarks at Hashmaps Benchmarks. In short: robin_hood is always among the fastest maps and uses far less memory than std::unordered_map.

Design Choices

  • Two memory layouts. Data is either stored in a flat array, or with node indirection. Access for unordered_flat_map is extremely fast due to no indirection, but references to elements are not stable. It also causes allocation spikes when the map resizes, and will need plenty of memory for large objects. Node based map has stable references & pointers (NOT iterators! Similar to std::unordered_map) and uses const Key in the pair. It is a bit slower due to indirection. The choice is yours; you can either use robin_hood::unordered_flat_map or robin_hood::unordered_node_map directly. If you use robin_hood::unordered_map It tries to choose the layout that seems appropriate for your data.

  • Custom allocator. Node based representation has a custom bulk allocator that tries to make few memory allocations. All allocated memory is reused, so there won't be any allocation spikes. It's very fast as well.

  • Optimized hash. robin_hood::hash has custom implementations for integer types and for std::string that are very fast and falls back to std::hash for everything else.

  • Depends on good Hashing. For a really bad hash the performance will not only degrade like in std::unordered_map, the map will simply fail with an std::overflow_error. In practice, when using the standard robin_hood::hash, I have never seen this happening.

License

Licensed under the MIT License. See the LICENSE file for details.

by martinus

robin-hood-hashing's People

Contributors

acd1034 avatar actboy168 avatar adl avatar alexey-milovidov avatar avitase avatar cloudhan avatar gergondet avatar jeremyg-lunarg avatar k0zmo avatar martinus avatar ryan-rsm-mckenzie avatar ryemutt avatar talkless avatar tcpan avatar wolfpld avatar wyattoday avatar zhanglistar avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

robin-hood-hashing's Issues

Segmentation fault after copy assignment

Hello, thank you for your great Library!
While testing, I found that the following code resulted in a segmentation fault.

#include <robin_hood.h>
#include <string>

int main(){
  robin_hood::unordered_map<std::string,std::string> map;
  robin_hood::unordered_map<std::string,std::string> tmp;
  map.emplace("a", "b");
  map = tmp;
  map.emplace("c", "d");
  return 0;
}

I tried version 3.2.7 and the current master under MacOs 10.14.4 using clang and g++8.

edit: simplify example

Not working

`//============================================================================
// Name : ROBIN_HOOD TEST
// Author : legion
//============================================================================

#include

#include "robin_hood.h"
using namespace std;

int main() {
robin_hood::unordered_map<int, std::pair<int, int>> data;
data.insert(std::make_pair(1, std::make_pair(2, 2)));
auto f = data.find(1);
cout << f.first;
return 0;
}
`

ERROR
make all Building file: ../src/console.cpp Invoking: GCC C++ Compiler g++ -std=c++17 -O0 -g3 -Wall -c -fmessage-length=0 -MMD -MP -MF"src/console.d" -MT"src/console.o" -o "src/console.o" "../src/console.cpp" ../src/console.cpp: In function ‘int main()’: ../src/console.cpp:16:54: error: no matching function for call to ‘robin_hood::detail::Table<true, 80, int, std::pair<int, int>, robin_hood::hash<int>, std::equal_to<int> >::insert(std::pair<int, std::pair<int, int> >)’ 16 | data.insert(std::make_pair(1, std::make_pair(2, 2))); | ^ In file included from ../src/console.cpp:11: ../src/robin_hood.h:1647:10: note: candidate: ‘template<class Iter> void robin_hood::detail::Table<IsFlat, MaxLoadFactor100, Key, T, Hash, KeyEqual>::insert(Iter, Iter) [with Iter = Iter; bool IsFlat = true; long unsigned int MaxLoadFactor100 = 80; Key = int; T = std::pair<int, int>; Hash = robin_hood::hash<int>; KeyEqual = std::equal_to<int>]’ 1647 | void insert(Iter first, Iter last) { | ^~~~~~ ../src/robin_hood.h:1647:10: note: template argument deduction/substitution failed: ../src/console.cpp:16:54: note: candidate expects 2 arguments, 1 provided 16 | data.insert(std::make_pair(1, std::make_pair(2, 2))); | ^ In file included from ../src/console.cpp:11: ../src/robin_hood.h:1667:31: note: candidate: ‘std::pair<robin_hood::detail::Table<IsFlat, MaxLoadFactor100, Key, T, Hash, KeyEqual>::Iter<false>, bool> robin_hood::detail::Table<IsFlat, MaxLoadFactor100, Key, T, Hash, KeyEqual>::insert(const value_type&) [with bool IsFlat = true; long unsigned int MaxLoadFactor100 = 80; Key = int; T = std::pair<int, int>; Hash = robin_hood::hash<int>; KeyEqual = std::equal_to<int>; robin_hood::detail::Table<IsFlat, MaxLoadFactor100, Key, T, Hash, KeyEqual>::value_type = robin_hood::pair<int, std::pair<int, int> >]’ 1667 | std::pair<iterator, bool> insert(const value_type& keyval) { | ^~~~~~ ../src/robin_hood.h:1667:56: note: no known conversion for argument 1 from ‘std::pair<int, std::pair<int, int> >’ to ‘const value_type&’ {aka ‘const robin_hood::pair<int, std::pair<int, int> >&’} 1667 | std::pair<iterator, bool> insert(const value_type& keyval) { | ~~~~~~~~~~~~~~~~~~^~~~~~ ../src/robin_hood.h:1672:31: note: candidate: ‘std::pair<robin_hood::detail::Table<IsFlat, MaxLoadFactor100, Key, T, Hash, KeyEqual>::Iter<false>, bool> robin_hood::detail::Table<IsFlat, MaxLoadFactor100, Key, T, Hash, KeyEqual>::insert(robin_hood::detail::Table<IsFlat, MaxLoadFactor100, Key, T, Hash, KeyEqual>::value_type&&) [with bool IsFlat = true; long unsigned int MaxLoadFactor100 = 80; Key = int; T = std::pair<int, int>; Hash = robin_hood::hash<int>; KeyEqual = std::equal_to<int>; robin_hood::detail::Table<IsFlat, MaxLoadFactor100, Key, T, Hash, KeyEqual>::value_type = robin_hood::pair<int, std::pair<int, int> >]’ 1672 | std::pair<iterator, bool> insert(value_type&& keyval) { | ^~~~~~ ../src/robin_hood.h:1672:51: note: no known conversion for argument 1 from ‘std::pair<int, std::pair<int, int> >’ to ‘robin_hood::detail::Table<true, 80, int, std::pair<int, int>, robin_hood::hash<int>, std::equal_to<int> >::value_type&&’ {aka ‘robin_hood::pair<int, std::pair<int, int> >&&’} 1672 | std::pair<iterator, bool> insert(value_type&& keyval) { | ~~~~~~~~~~~~~^~~~~~ ../src/console.cpp:18:13: error: ‘class robin_hood::detail::Table<true, 80, int, std::pair<int, int>, robin_hood::hash<int>, std::equal_to<int> >::Iter<false>’ has no member named ‘first’ 18 | cout << f.first; | ^~~~~ make: *** [src/subdir.mk:20: src/console.o] Error 1

Add hashbits to infobytes

Should get better performance with this. But: how many bits to take?
Maybe try to be adaptive. Take as many bits as possible, and reduce the number of bits if necessary.

Reduction should be very fast:

// interpret mInfo as uint64_t, loop everything uint64_t& infos:
info = (info >>1) & 0x7f7f7f7f7f7f7f7f;

But we need a few additional members:

int numHashBits = 2;
int maxUntilResize = 0xff - (1<<2 - 1);
int infoInc = 1 << numHashBits;

redo benchmark with suggestions on reddit's post

ubsan: load/store of misaligned address & strange behavior with different -O levels (g++)

I have faced to strange behavior (suppose UB) while trying to use compound template-generated type as unordered_map KeyType: https://gist.github.com/Nekrolm/47be0ea8644247ee7390f59a4ac3a778

In Debug build with ubsan instrumentation I got warning about misaligned access (robin_hood.h line 1991).
With -O2 it works fine. But with -O3 it crushes with segfault.

g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/7/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 7.4.0-1ubuntu1~18.04.1' --with-bugurl=file:///usr/share/doc/gcc-7/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --prefix=/usr --with-gcc-major-version-only --program-suffix=-7 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1) 

minimize number of moves

when removing an element, all data is shifted down by once. That's not strictly necessary, it's enough when only the last for the same bucket is moved.

Same for insert: we don't need to shiftUp everything.

Implement this, and compare in random insert & remove benchmark. Compare number of moves before/after.

rename maps

Rename flat_map into unordered_flat_map and node_map into unordered_node_map to prevent confusion with other flat_map implementations like boost that use a sorted vector as the basis.

Can't use same struct for both hashing and equality

Some programmers like to use the same struct for both hashing and equality when defining custom types. See this block of code in HHVM for a real-world example.

This practice is supported by std::unordered_map:

#include <iostream>
#include <unordered_map>
#include <utility>

#include "robin_hood.h"

template<typename T, typename U, typename V, typename W>
using Map = std::unordered_map<T, U, V, W>;
// using Map = robin_hood::unordered_map<T, U, V, W>;
// doesn't compile if you use robin_hood instead

struct HashCmp {
  size_t operator()(int a) const {
    return a;
  }

  size_t operator()(int a, int b) const {
    return a == b;
  }
};

int main() {
  Map<int, int, HashCmp, HashCmp> m;
}

it = map.erase(it) potentially passes object twice

It seems @jcageman has stumbled uppon an issue that probably all robin-hood map with backward shift deletion have (see #18). Here is a simple reproducer

Insert two elements, where both hash to the last bucket of the map. One entry will be the last one of the map, the other one will wrap around and use the first bucket.

robin_hood::unordered_node_map<int, int, DummyHash> map;
map[1024] = 1;
map[2048] = 2;

it = map.begin(); // it points to 2048, the first element which has wrapped around
++it;  // it points to the last element, 1024 which is in its original bucket
it = map.erase(it); // backward shift deletion removes 1024 and wraps back 2048, so it now (again!) points to 2048 and NOT end
it = map.erase(it); // finally we are at the end.

Also see Tessil/robin-map#20

Consider porting to C++11 / C++03

A large number of development environments still do not have C++14 support, especially in many embedded environments that only support C++03.

Can you consider porting this powerful and efficient container to these older versions?

Thanks :-)

CMake build

the current Makefile is a mess because my make-foo is low. Create a nice CMake build instead.

What does unaligned_load do?

More a question than an issue (sorry I couldn't figure out how to add labels in github).

I've been reading the source code and, even though there is a comment, I can't understand what is the purpose of this function.

template <typename T>
inline T unaligned_load(void const* ptr) noexcept {
    // using memcpy so we don't get into unaligned load problems.
    // compiler should optimize this very well anyways.
    T t;
    memcpy(&t, ptr, sizeof(T));
    return t;
}

Why not just do a cast?

*reinterpret_cast<T*>(ptr);

I've tried running both in my PC, and they seem to do the same, so may it's a performance thing?

Thanks for making this awesome piece of code :)

Fails to find keys after reserve and assign

See the following test:

TEST_CASE_TEMPLATE("reserve_and_assign", Map, robin_hood::unordered_flat_map<std::string, int>,
                   robin_hood::unordered_node_map<std::string, int>) {
    Map a = {
        {"button", {}},
        {"selectbox-tr", {}},
        {"slidertrack-t", {}},
        {"sliderarrowinc-hover", {}},
        {"text-l", {}},
        {"title-bar-l", {}},
        {"checkbox-checked-hover", {}},
        {"datagridexpand-active", {}},
        {"icon-waves", {}},
        {"sliderarrowdec-hover", {}},
        {"datagridexpand-collapsed", {}},
        {"sliderarrowinc-active", {}},
        {"radio-active", {}},
        {"radio-checked", {}},
        {"selectvalue-hover", {}},
        {"huditem-l", {}},
        {"datagridexpand-collapsed-active", {}},
        {"slidertrack-b", {}},
        {"selectarrow-hover", {}},
        {"window-r", {}},
        {"selectbox-tl", {}},
        {"icon-score", {}},
        {"datagridheader-r", {}},
        {"icon-game", {}},
        {"sliderbar-c", {}},
        {"window-c", {}},
        {"datagridexpand-hover", {}},
        {"button-hover", {}},
        {"icon-hiscore", {}},
        {"sliderbar-hover-t", {}},
        {"sliderbar-hover-c", {}},
        {"selectarrow-active", {}},
        {"window-tl", {}},
        {"checkbox-active", {}},
        {"sliderarrowdec-active", {}},
        {"sliderbar-active-b", {}},
        {"sliderarrowdec", {}},
        {"window-bl", {}},
        {"datagridheader-l", {}},
        {"sliderbar-t", {}},
        {"sliderbar-active-t", {}},
        {"text-c", {}},
        {"window-br", {}},
        {"huditem-c", {}},
        {"selectbox-l", {}},
        {"icon-flag", {}},
        {"sliderbar-hover-b", {}},
        {"icon-help", {}},
        {"selectvalue", {}},
        {"title-bar-r", {}},
        {"sliderbar-active-c", {}},
        {"huditem-r", {}},
        {"radio-checked-active", {}},
        {"selectbox-c", {}},
        {"selectbox-bl", {}},
        {"icon-invader", {}},
        {"checkbox-checked-active", {}},
        {"slidertrack-c", {}},
        {"sliderarrowinc", {}},
        {"checkbox", {}},
    };

    Map b;
    b = a;
    REQUIRE(b.find("button") != b.end()); // Passes OK.

    Map c;
    c.reserve(90);
    c = a;
    REQUIRE(c.find("button") != c.end()); // Fails.
}

Whether or not it passes depends on the size passed to reserve, as well as the number of elements in the original map. Iteration through the elements seems to work fine.

GCC -O2 -Wstrict-aliasing=2 warnings

GCC reports strict aliasing issues with -O2 (or higher optimization level) and -Wstrict-aliasing=2

robin_hood.h:2176:47: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
robin_hood.h:2160:50: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]

These are probably false positives, which can be treated with

#pragma GCC diagnostic ignored "-Wstrict-aliasing"

or using less questionable pointer types conversion.

Reproduced with gcc versions 6.3.0 and 8.3.0.

compiler warning VS2017 (version 19.00)

relevant code: https://godbolt.org/z/7l35xp

warning C4100: 'tuple2': unreferenced formal parameter
note: see reference to function template instantiation 'robin_hood::pair<const int,T>::pair<_Ty&&,0,,>(std::tuple<_Ty &&> &,std::tuple<> &,std::integer_sequence<unsigned int,0>,std::integer_sequence<::size_t>)' being compiled

With 19.00 of the VS compiler i can reproduce the warning (my exact version is 19.00.24215.1 locally). With the latest VS compiler i can't reproduce it. See the result of both compiler versions in the link above. The code itself seems fine, so i suspect it might be a missing compiler feature or even a bug.

Creating `std::vector<std::pair<K, V>>` is less than ergonomic

This code compiles for std::unordered_map but not for robin_hood::unordered_map:

#include <utility>

#include "robin_hood.h"

int main() {
  robin_hood::unordered_map<int, int> m;
  std::vector<std::pair<int, int>> rows{m.begin(), m.end()};
}

Replacing std::pair with robin_hood::pair would make the code compile, but would also cause a hashmap implementation's details to propagate throughout the codebase. Replacing this one-liner with a for-loop would also work, but the ergonomics are worse and developers may wind up propagating robin_hood::pairs anyway by following the path of least resistance.

clang++ on windows

././tsl/robin_hood.h:150:31: error: unknown type name 'uint128_t'
auto result = static_cast<uint128_t>(a) * static_cast<uint128_t>(b);
^
././tsl/robin_hood.h:150:59: error: unknown type name 'uint128_t'
auto result = static_cast<uint128_t>(a) * static_cast<uint128_t>(b);

Adding a CMake interface library target

As this is a library, it would be awesome to add a library definition to the CMake target for easier inclusion in other projects. As usually, you can include libraries from a cloned repository with a single call to add_subdirectory(external/somelibrary) without also having it building tests or other binaries.

If you like, I will gladly provide a PR for this. Though I must note that I'm not really familiar with your specific structuring of the CMake part. Therefore, I think it might be beneficial if you were to implement it, or first gave your opinion on some of the specifics.

Implementation

While ideally the library targets would also be used internally to "link" against the headers in all tests and examples, it can be considered a convention and thus is not strictly necessary.

Essentially it boils down to adding something along these lines to CMakeLists.txt (or src/CMakeLists.txt depending on your conventions). I'm assuming robin_hood as a library name:

add_library(robin_hood INTERFACE)
add_library(robin_hood::robin_hood ALIAS robin_hood)
target_compile_features(robin_hood INTERFACE ${RH_cxx_standard})

target_include_directories(robin_hood INTERFACE
	$<BUILD_INTERFACE:${PROJECT_SOURCE_DIR}/src/include>
	$<INSTALL_INTERFACE:src/include>
)

In case you need a reference implementation for the CMake part, you can have a look at the CMakeLists.txt of other great header-only libraries like foonathan/type_safe, microsoft/GSL.

If you do want to build binaries by default (probably useful when developing the library), but only when this project is NOT included by another project, it might be beneficial to use a master project detection like the following, and to initialize your options accordingly:

# Determine if built as a subproject (using add_subdirectory)
# or if it is the master project.
set(MASTER_PROJECT OFF)
if(CMAKE_CURRENT_SOURCE_DIR STREQUAL CMAKE_SOURCE_DIR)
	set(MASTER_PROJECT ON)
endif()

option(BUILD_TESTS "Build the tests" ${MASTER_PROJECT})

Random sampling

I have a branch of this repo that adds a sample() method which, given a PRNG, returns a random entry in the map. The value of this method is that it enables probabilistic cache eviction without requiring a separate random-access dictionary of the entries.

Is this a feature you would be interested in adding to the class?

Inconsistent tags

Hi,

You have some tags with 'v' prefix and others do not have it.
Is it possible to use consistent tags (any option), to keep scripts stable?

Debugflag to enable iterator stability error check

Add a define that enables error checking in robin_hood map, to check for invalid uses of iteratos after map/set is modified. This is mostly for debugging.

  • Add a flag ROBIN_HOOD_ITERATOR_STABILITY_CHECK or something like that which enables/disables this code part:
    • Add counter member variable to robin_hood (initialized e.g. with a random number or just zero)
    • Counter is increased whenever the map is modified
    • All iterators get this counter when counter
    • accessing the iterator checks if the iterator's counter is the same as the map's counter, throws an exception otherwise

Thanks John for suggesting this!

robin_hood.h:2012:13: error: attempt to free a non-heap object 'fm' [-Werror=free-nonheap-object]

COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/opt/rh/devtoolset-7/root/usr/libexec/gcc/x86_64-redhat-linux/7/lto-wrapper
Target: x86_64-redhat-linux
Configured with: ../configure --enable-bootstrap --enable-languages=c,c++,fortran,lto --prefix=/opt/rh/devtoolset-7/root/usr --mandir=/opt/rh/devtoolset-7/root/usr/share/man --infodir=/opt/rh/devtoolset-7/root/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared --enable-threads=posix --enable-checking=release --enable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-gcc-major-version-only --enable-plugin --with-linker-hash-style=gnu --enable-initfini-array --with-default-libstdcxx-abi=gcc4-compatible --with-isl=/builddir/build/BUILD/gcc-7.3.1-20180303/obj-x86_64-redhat-linux/isl-install --enable-libmpx --enable-gnu-indirect-function --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux
Thread model: posix
gcc version 7.3.1 20180303 (Red Hat 7.3.1-5) (GCC)

Usage for avoiding duplicates

In my benchmarks for using robin_hood for avoiding duplicate pointers/integers, the performance seems almost magical. In a test with 8 random unique integers repeated once (for a total of 16 insertions), even robin_hood::unordered_set<size_t> it is about 1.2x faster than an array that uses linear search to avoid duplicates. So much so that it feels like I'm doing something wrong. Is that expected?

How does robin_hood::unordered_set manage to be so fast at these small sizes while maintaining iterator stability? Or does it give that up?

On a separate note, I'm a bit worried that the README.md says that it falls back to std::hash in other cases. But as Malte Skarupke notes, this can be bad on Windows because Visual C++'s std::hash uses stronger hash functions to compensate for the fact that its unordered_set implementation uses power of 2 sizes and then only keeps the lower bits of the hash result. In fact, not only does it use FNV1 hash for integers, but also for enum, float, double, and pointers. So it feels dangerous to fallback to std::hash. Is there a customization point to use say boost::hash as the fallback instead?

Crashing with overflow_error

Hello,

I am currently trying to integrate robin hood hashing into my project.
As mentioned on your github, I simply included the robin_hood.h and used robin_hood::unordered_map instead of std::unordered_map.

However, upon start, my program crashes and reports the following error:

terminate called after throwing an instance of 'std::overflow_error'
what(): robin_hood::map overflow
Command terminated by signal 6

Do you have any idea what could cause this?

Thanks!

Cheers,
Pierre

Feature request: Add ability to save and load a hash_map to / from disk

Firstly, let me tell you that I am very satisfied with your robin hood hash map. It performs extremely well and the the node map in particular is very memory efficient. Thank you for sharing this with us.

Since I am dealing with relatively large dynamically created hash maps of up to 15GB I would be interested in having the ability to save a hash map to disk and restore it later. Do you have a preferred / tested method to achieve this?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.