xxhash_cpp's Introduction

xxhash_cpp

Port of the xxHash library to C++17.

Compatibility

Compiler               Min. Version
MSVC (Visual Studio)   19.1 (VS 2017.3 P2)
clang                  3.9
gcc                    7

Example Usage

// standalone hash
std::array<int, 4> input {322, 2137, 42069, 65536};
xxh::hash_t<32> hash = xxh::xxhash<32>(input); 

// hash streaming
std::array<unsigned char, 512> buffer;
xxh::hash_state_t<64> hash_stream; 
while (fill_buffer(buffer))
{
  hash_stream.update(buffer);
}
xxh::hash_t<64> final_hash = hash_stream.digest();

The template argument selects the 32-bit or 64-bit version of the algorithm. Other values are not allowed. Typedefs hash32_t, hash64_t, hash_state32_t and hash_state64_t are provided.

xxh::xxhash and xxh::hash_state_t::update provide several convenient overloads, all accepting optional seed and endianness arguments:

  • C-style const void* + size_t pair
  • const std::vector<T>&
  • const std::basic_string<T>&
  • A pair of templated iterators
  • const std::array<T, N>&
  • const std::initializer_list<T>&

Build Instructions

The library is provided as a single standalone header, for static linking only. No build instructions are necessary.

xxHash - Extremely fast hash algorithm

xxHash is an extremely fast hash algorithm, running at RAM speed limits. It successfully completes the SMHasher test suite, which evaluates the collision, dispersion and randomness qualities of hash functions. The code is highly portable, and hashes are identical on all platforms (little / big endian).

Benchmarks

The benchmark uses the SMHasher speed test, compiled with Visual Studio 2010 on a 32-bit Windows 7 machine. The reference system uses a Core 2 Duo @ 3 GHz.

Name            Speed       Quality   Author
xxHash          5.4 GB/s    10        Y.C.
MurmurHash 3a   2.7 GB/s    10        Austin Appleby
SBox            1.4 GB/s    9         Bret Mulvey
Lookup3         1.2 GB/s    9         Bob Jenkins
CityHash64      1.05 GB/s   10        Pike & Alakuijala
FNV             0.55 GB/s   5         Fowler, Noll, Vo
CRC32           0.43 GB/s   9
MD5-32          0.33 GB/s   10        Ronald L. Rivest
SHA1-32         0.28 GB/s   10

Quality (Q.Score) is a measure of the quality of the hash function. It depends on successfully passing the SMHasher test set. 10 is a perfect score. Algorithms with a score < 5 are not listed in this table.

A more recent version, XXH64, has been created thanks to Mathias Westerdahl. It offers superior speed and dispersion on 64-bit systems. Note, however, that 32-bit applications will still run faster using the 32-bit version.

SMHasher speed test, compiled using GCC 4.8.2 on 64-bit Linux Mint. The reference system uses a Core i5-3340M @ 2.7 GHz.

Version   Speed on 64-bit   Speed on 32-bit
XXH64     13.8 GB/s         1.9 GB/s
XXH32     6.8 GB/s          6.0 GB/s

License

The library file xxhash.hpp is BSD licensed.

Build modifiers

The following macros influence xxhash behavior. They are all disabled by default.

  • XXH_FORCE_NATIVE_FORMAT: on big-endian systems, use the native number representation. Results become system-specific and no longer match those produced on little-endian systems.

  • XXH_CPU_LITTLE_ENDIAN: if defined to 0, the native endianness is set to big endian; if defined to 1, it is set to little endian. If left undefined, the endianness is resolved at runtime, before main is called, at the cost of endianness not being constexpr.

  • XXH_FORCE_MEMORY_ACCESS: if defined to 2, enables unaligned reads as an optimization (this is not standard compliant). If defined to 1, enables the use of the packed attribute as an optimization (only defined for gcc and icc). Otherwise, the default fallback method (memcpy) is used.
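
The ideas behind these macros can be illustrated with a small standalone sketch. It does not use xxhash.hpp, and the helper names (is_little_endian, read32_native, read32_le) are hypothetical, chosen only for this example. It shows a runtime endianness probe, a memcpy-based read that is safe for unaligned addresses, and a read that always interprets bytes as little-endian.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: detects the native byte order at runtime by
// looking at the first byte of a known 32-bit value.
inline bool is_little_endian() {
    const std::uint32_t probe = 1;
    unsigned char first_byte = 0;
    std::memcpy(&first_byte, &probe, 1);
    return first_byte == 1;
}

// Hypothetical helper: reads 32 bits from a possibly unaligned address
// through memcpy, which is well defined for any alignment.
inline std::uint32_t read32_native(const void* ptr) {
    std::uint32_t value = 0;
    std::memcpy(&value, ptr, sizeof(value));
    return value;
}

// Hypothetical helper: reads 32 bits as little-endian on every platform,
// so the value does not depend on the native byte order.
inline std::uint32_t read32_le(const void* ptr) {
    const unsigned char* p = static_cast<const unsigned char*>(ptr);
    return static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Usage: start one byte into the buffer to exercise an unaligned address.
inline void example_usage() {
    const unsigned char bytes[5] = {0x00, 0x78, 0x56, 0x34, 0x12};
    assert(read32_le(bytes + 1) == 0x12345678u);
    if (is_little_endian()) {
        assert(read32_native(bytes + 1) == read32_le(bytes + 1));
    }
}
```

A native read is typically cheaper, but only a fixed-order read yields the same value on every platform, which is the trade-off XXH_FORCE_NATIVE_FORMAT describes.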

Other languages

Beyond the C reference version, xxHash is also available in many programming languages, thanks to great contributors. They are listed here.

xxhash_cpp's People

Contributors

callumattryde, fabrice-baray, flamefire, jimmyvandenbergh, redspah, studoot

xxhash_cpp's Issues

128-bit checksum compilation error with 0.8.1

Compiling the following function, which worked with version 0.7.3, fails with 0.8.1. to_string() is not given here, but what it does is reasonably obvious. Note that the error is not in this function but in the header:

std::string
xxhash3(std::string_view data)
{
#define checksum_bits 128
    static_assert(checksum_bits == 64 || checksum_bits == 128);

    xxh::hash3_state_t<checksum_bits> state;
    state.update(data.data(), data.size());

    // convert checksum to canonical byte order
    xxh::canonical_t<checksum_bits> const canonical{state.digest()};
    auto const hash{canonical.get_hash()};

#if checksum_bits == 128
    return to_string(hash.low64, hash.high64);
#else
    return to_string(hash);
#endif
}

g++ 8.3 in C++17 mode:

xxhash/0.8.1/include/xxhash.hpp: In function ‘xxh::uint128_t xxh::intrin::bit_ops::mult64to128(uint64_t, uint64_t)’:
xxhash/0.8.1/include/xxhash.hpp:290:10: error: request for member ‘low64’ in ‘r128’, which is of non-class type ‘__int128 unsigned’
     r128.low64 = (uint64_t)(product);
          ^~~~~
xxhash/0.8.1/include/xxhash.hpp:291:10: error: request for member ‘high64’ in ‘r128’, which is of non-class type ‘__int128 unsigned’
     r128.high64 = (uint64_t)(product >> 64);
          ^~~~~~
xxhash/0.8.1/include/xxhash.hpp:292:12: error: could not convert ‘r128’ from ‘__int128 unsigned’ to ‘xxh::uint128_t’ {aka ‘xxh::typedefs::uint128_t’}
     return r128;
            ^~~~
xxhash/0.8.1/include/xxhash.hpp: At global scope:
xxhash/0.8.1/include/xxhash.hpp:915:3: error: explicit template specialization cannot have a storage class
   static inline uint_t<32> avalanche<32>(uint_t<32> hash) {
   ^~~~~~
xxhash/0.8.1/include/xxhash.hpp:925:3: error: explicit template specialization cannot have a storage class
   static inline uint_t<64> avalanche<64>(uint_t<64> hash) {
   ^~~~~~
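
The first three errors suggest that a native 128-bit integer is being treated as if it were a struct with low64 and high64 members. A minimal standalone sketch of keeping the two apart is shown here. It is an illustration under assumptions, not the header's actual code: u128_parts and digest_bytes are hypothetical names, and only mult64to128 echoes the function named in the error output. The last part shows the form GCC accepts for explicit specializations, which carry no static storage class.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a 128-bit result split into two 64-bit halves.
struct u128_parts {
    std::uint64_t low64;
    std::uint64_t high64;
};

// Multiplies two 64-bit values into a full 128-bit product.
inline u128_parts mult64to128(std::uint64_t lhs, std::uint64_t rhs) {
#if defined(__SIZEOF_INT128__)
    // GCC/Clang on 64-bit targets: use the native 128-bit type for the
    // arithmetic only, then copy the halves into the struct.
    const __uint128_t product = static_cast<__uint128_t>(lhs) * rhs;
    return { static_cast<std::uint64_t>(product),
             static_cast<std::uint64_t>(product >> 64) };
#else
    // Portable fallback: schoolbook multiplication on 32-bit halves.
    const std::uint64_t lo_lo = (lhs & 0xFFFFFFFFu) * (rhs & 0xFFFFFFFFu);
    const std::uint64_t hi_lo = (lhs >> 32) * (rhs & 0xFFFFFFFFu);
    const std::uint64_t lo_hi = (lhs & 0xFFFFFFFFu) * (rhs >> 32);
    const std::uint64_t hi_hi = (lhs >> 32) * (rhs >> 32);
    const std::uint64_t cross = (lo_lo >> 32) + (hi_lo & 0xFFFFFFFFu) + lo_hi;
    return { (cross << 32) | (lo_lo & 0xFFFFFFFFu),
             (hi_lo >> 32) + (cross >> 32) + hi_hi };
#endif
}

// Explicit specializations are declared `inline` here, with no `static`
// storage class, which is what the last two errors complain about.
template <int N> inline int digest_bytes();   // hypothetical name
template <> inline int digest_bytes<64>()  { return 8; }
template <> inline int digest_bytes<128>() { return 16; }
```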

g++-5 is not supported

Although the readme states that g++-5 is supported, it does not work:

xxhash.hpp:625:3: error: array must be initialized with a brace-enclosed initializer
xxhash.hpp:625:3: error: too many initializers for ‘std::array<long unsigned int, 4ul>’

It seems to be a problem with brace elision, so adding extra braces should work.
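
As a standalone illustration of the extra-braces workaround (a sketch only; make_state is a hypothetical name and this is not the code at line 625 of the header):

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// std::array is an aggregate that wraps a built-in array, so the fully
// braced form uses two levels of braces: the outer pair for the
// std::array object, the inner pair for the array it contains.
inline std::array<std::uint64_t, 4> make_state() {
    std::array<std::uint64_t, 4> values = {{ 1, 2, 3, 4 }};
    return values;
}

// Usage: the fully braced form initializes all four elements in order.
inline void example_usage() {
    const std::array<std::uint64_t, 4> state = make_state();
    assert(state[0] == 1 && state[3] == 4);
}
```

The single-brace form relies on brace elision, which older compilers may reject in some contexts. The double-brace form avoids depending on it.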

I'd suggest adding the "supported compilers" from the readme to CI.

Make warning free build

Compiling the current source produces a few warnings with g++-6:

  • -Wunknown-pragmas
  • -Wattributes
  • -Wunused-parameter
  • -Wsign-compare
  • -Wpedantic

I suggest building with -Werror on CI and taking care of the warnings.

Possible hashing inconsistency

Hi,

Apologies in advance for the vague description. I am not really sure what's going on so I'll just do my best to describe my usage scenario:

Somewhere in my program I am building a std::vector<uint64_t>:

        std::vector<uint64_t> hashes;
        std::transform(symbols.begin() + i - size, symbols.begin() + i, std::back_inserter(hashes), [](const Symbol& sm) { return sm.CalculatedHashValue; }); 
        hashes.push_back(s.HashValue);
        auto hash = xxh::xxhash<64>(hashes);
        fmt::print("hash(");
        for (auto h : hashes) { fmt::print("{} ", h); }
        fmt::print(") = {}\n", hash);
        s.CalculatedHashValue = hash;        

This outputs:

hash(123 456 0) = 14523173615704738576

I wanted to debug this result so in main.cpp I did:

    std::array<uint64_t, 3> arr { 123, 456, 0 };
    std::cout << "array hash = " << xxh::xxhash<64>(arr, 0) << std::endl;

This outputs:

array hash = 13763445824703203362

As you can see there's an inconsistency even though the values are the same.

Next, I added another test in my main.cpp:

    std::vector<uint64_t> vec { 123, 456, 0 };
    std::cout << "vector hash = " << xxh::xxhash<64>(vec) << std::endl;

And now the weird part. After adding the above two lines, I get:

hash(123 456 0  ) = 13763445824703203362
array hash = 13763445824703203362
vector hash = 13763445824703203362

Now the hashes are consistent as they should be. Could this be a bug?

EDIT:
The problem seems to go away if I manually define the endianness before including xxhash.hpp:

#define XXH_CPU_LITTLE_ENDIAN 1 

Best,
Bogdan

Linking issue with static lib

Hi,

I am using xxhash.hpp in my static lib, where it needs to be included more or less globally in order to have access to the exposed hash types. Linking fails unless the following methods are made static:

942c942
<               static void accumulate(uint64_t* XXH_RESTRICT acc, const uint8_t* XXH_RESTRICT input, const uint8_t* XXH_RESTRICT secret, size_t nbStripes, acc_width accWidth)
---
>               void accumulate(uint64_t* XXH_RESTRICT acc, const uint8_t* XXH_RESTRICT input, const uint8_t* XXH_RESTRICT secret, size_t nbStripes, acc_width accWidth)
951c951
<               static void hash_long_internal_loop(uint64_t* XXH_RESTRICT acc, const uint8_t* XXH_RESTRICT input, size_t len, const uint8_t* XXH_RESTRICT secret, size_t secretSize, acc_width accWidth)
---
>               void hash_long_internal_loop(uint64_t* XXH_RESTRICT acc, const uint8_t* XXH_RESTRICT input, size_t len, const uint8_t* XXH_RESTRICT secret, size_t secretSize, acc_width accWidth)
973c973
<               static uint64_t mix_2_accs(const uint64_t* XXH_RESTRICT acc, const uint8_t* XXH_RESTRICT secret)
---
>               uint64_t mix_2_accs(const uint64_t* XXH_RESTRICT acc, const uint8_t* XXH_RESTRICT secret)
978c978
<               static uint64_t merge_accs(const uint64_t* XXH_RESTRICT acc, const uint8_t* XXH_RESTRICT secret, uint64_t start)
---
>               uint64_t merge_accs(const uint64_t* XXH_RESTRICT acc, const uint8_t* XXH_RESTRICT secret, uint64_t start)
990c990
<               static void init_custom_secret(uint8_t* customSecret, uint64_t seed64)
---
>               void init_custom_secret(uint8_t* customSecret, uint64_t seed64)
1128c1128
<               static uint64_t mix_16b(const uint8_t* XXH_RESTRICT input, const uint8_t* XXH_RESTRICT secret, hash64_t seed64)
---
>               uint64_t mix_16b(const uint8_t* XXH_RESTRICT input, const uint8_t* XXH_RESTRICT secret, hash64_t seed64)
1135c1135
<               static uint128_t mix_32b(hash128_t acc, const uint8_t* input_1, const uint8_t* input_2, const uint8_t* secret, hash64_t seed)
---
>               uint128_t mix_32b(hash128_t acc, const uint8_t* input_1, const uint8_t* input_2, const uint8_t* secret, hash64_t seed)

This is with gcc 9.2.

Best,
Bogdan

Support for macos arm64

Hello,

When trying to use this library on macOS with clang on arm64, I get the following error:
/Applications/Xcode_14.2.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/14.0.0/include/immintrin.h:14:2: error: "This header is only meant to be used on x86 and x64 architecture"

The issue seems to be caused simply by importing immintrin.h.
Is there a known workaround for this problem?

Best
Peter
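
One possible workaround, sketched here as an assumption rather than a confirmed fix for this library, is to include immintrin.h only when the target is x86 or x64 and to fall back to something portable elsewhere. The macro EXAMPLE_HAS_X86_INTRINSICS and the function prefetch_hint are hypothetical names used only for this example.

```cpp
#include <cassert>

// Only pull in the x86 intrinsics header when targeting x86 or x64.
#if defined(__x86_64__) || defined(_M_X64) || defined(__i386__) || defined(_M_IX86)
#  include <immintrin.h>
#  define EXAMPLE_HAS_X86_INTRINSICS 1   // hypothetical macro name
#else
#  define EXAMPLE_HAS_X86_INTRINSICS 0
#endif

// Issues a prefetch hint where one is available and does nothing elsewhere.
inline void prefetch_hint(const void* ptr) {
#if EXAMPLE_HAS_X86_INTRINSICS
    _mm_prefetch(static_cast<const char*>(ptr), _MM_HINT_T0);
#elif defined(__GNUC__)
    __builtin_prefetch(ptr);   // GCC/Clang builtin, also usable on arm64
#else
    (void)ptr;                 // no prefetch support: silently ignore
#endif
}

// Usage: a prefetch is only a hint, so the data itself is unchanged.
inline void example_usage() {
    int value = 42;
    prefetch_hint(&value);
    assert(value == 42);
}
```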

Linker error building with x86 MSVC

Describe the bug
I was compiling with the x86 version of MSVC, before applying the change in #15. xxhash.hpp was included in multiple source files and I got the following linker error:

b.obj : error LNK2005: "void __cdecl xxh::intrin::prefetch(void const *)" (?prefetch@intrin@xxh@@YAXPBX@Z) already defined in a.obj
a.exe : fatal error LNK1169: one or more multiply defined symbols found

To Reproduce
The sample code & reproduction steps are documented in this gist.

Expected behavior
The code should build successfully.

Desktop (please complete the following information):

  • OS: Windows 10 build 19645
  • Toolchain: MSVC 19.16.27041 (x86), MSVC 19.26.28806 (x86)

Additional context
#15 fixes this issue for x86 MSVC. However, this issue also arises when XXH_NO_PREFETCH is used, or when a compiler other than MSVC or g++ (or clang++ - it defines __GNUC__, so masquerades as g++) is used. It seems to me that all the prefetch implementations should be marked as inline to denote that there may be multiple definitions of the function in different translation units.
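
The suggestion above can be sketched in isolation. The namespace and function here are hypothetical stand-ins and not the library's actual declarations:

```cpp
#include <cassert>

namespace example {   // hypothetical namespace, not the library's

// A non-template function defined in a header needs `inline` so that
// every translation unit including the header may carry its own
// identical definition without breaking the one-definition rule.
inline void prefetch(const void* /*ptr*/) {
    // No-op fallback, e.g. when prefetching is disabled.
}

// Without `inline`, including the header from both a.cpp and b.cpp
// would emit two external definitions of the same symbol, and the
// linker would report it as multiply defined:
//
//   void prefetch(const void*) {}

}  // namespace example

// Usage: calling the fallback has no observable effect.
inline void example_usage() {
    int value = 7;
    example::prefetch(&value);
    assert(value == 7);
}
```

Marking the functions static also avoids the link error, but it gives each translation unit a private copy, whereas inline lets the linker merge them.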

C++14 compatibility?

I'm on a project with C++14 requirement. Is it possible to adjust the implementation to be C++14 compatible?

From what I have seen, only if constexpr is used from C++17. In one case (swap32/64) it can be replaced by a simple overload instead of two names, and the other can be solved by extracting a subfunction (potentially merging code from here and [here](https://github.com/RedSpah/xxhash_cpp/blob/92cf55f21d341520137e4a7eb155290d390bdbff/xxhash/xxhash.hpp#L589), although I'm not sure).

This would make it available to a wider audience especially as this seems the only C++ implementation. Thanks for that! 👍
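
To illustrate the overload idea for the byte-swap case, here is a standalone sketch that compiles as C++14. The name byte_swap is hypothetical and this is not the library's implementation:

```cpp
#include <cassert>
#include <cstdint>

// Instead of one template that branches on the width with `if constexpr`
// (C++17), two plain overloads are selected by the argument type.

inline std::uint32_t byte_swap(std::uint32_t x) {
    return ((x << 24) & 0xFF000000u) |
           ((x <<  8) & 0x00FF0000u) |
           ((x >>  8) & 0x0000FF00u) |
           ((x >> 24) & 0x000000FFu);
}

inline std::uint64_t byte_swap(std::uint64_t x) {
    // Swap each 32-bit half, then exchange the halves.
    const std::uint64_t low  = byte_swap(static_cast<std::uint32_t>(x));
    const std::uint64_t high = byte_swap(static_cast<std::uint32_t>(x >> 32));
    return (low << 32) | high;
}

// Usage: the overload is chosen from the argument's type.
inline void example_usage() {
    assert(byte_swap(static_cast<std::uint32_t>(0x11223344u)) == 0x44332211u);
    assert(byte_swap(static_cast<std::uint64_t>(0x1122334455667788ull))
           == 0x8877665544332211ull);
}
```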

Hash output reversed

Hi,

I'm attempting to implement xxhash_cpp in a project, where unsurprisingly, I need to generate the xxhash64 for a file and check it against a known hash string. But I've run into the problem that my output hash is reversed.

Output from QuickHash: 9FD684E536C4C0B9
Output from xxhsum: 9fd684e536c4c0b9

Output from my implementation: B9C0C436E584D69F

This seems like an endianness problem, but I can't see any option for setting a specific endianness.

What would be the best strategy to do an equality check against the known hash string?

  1. Create hash, convert to hex string and check
  2. Somehow convert my hash string to a hash64_t object and compare with the hash object created from the file

My current implementation:

  std::string filename = "/my_input_file";
  std::ifstream filestream(filename, std::ifstream::in | std::istream::binary);
  xxh::hash_state64_t hash_stream;
  std::vector<uint8_t> contents((std::istreambuf_iterator<char>(filestream)), std::istreambuf_iterator<char>());
  hash_stream.update(contents);

  xxh::hash64_t final_hash = hash_stream.digest();
  std::cout << byte_print(final_hash) <<std::endl;
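
Both strategies listed above can be sketched with the standard library alone. This assumes that the hash is a plain 64-bit unsigned integer and that the reversed output comes from byte_print (not shown) dumping the bytes in memory order on a little-endian machine. The helper names to_hex and from_hex are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Strategy 1: format the hash as 16 lowercase hex digits, most significant
// digit first. This works on the numeric value, so the result does not
// depend on how the bytes are laid out in memory.
inline std::string to_hex(std::uint64_t hash) {
    char buffer[17];
    std::snprintf(buffer, sizeof(buffer), "%016llx",
                  static_cast<unsigned long long>(hash));
    return std::string(buffer);
}

// Strategy 2: parse the known hex string (either case) into a 64-bit
// value and compare integers directly.
inline std::uint64_t from_hex(const std::string& text) {
    return std::stoull(text, nullptr, 16);
}

// Usage with the value reported by xxhsum in the post.
inline void example_usage() {
    const std::uint64_t hash = 0x9FD684E536C4C0B9ull;
    assert(to_hex(hash) == "9fd684e536c4c0b9");
    assert(from_hex("9FD684E536C4C0B9") == hash);
}
```

Comparing integers (strategy 2) avoids any concern about upper versus lower case in the hex string.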

Make in the root dir does not work & folder structure

It seems the makefile in the project root is a copy of the real makefile in the subfolder. Issuing make in the root folder results in an error as the files are not found.

In fixing this I'd suggest to reorganize the folder structure. A common way to use 3rd party libraries is by including them as a git submodule and adding their include folder to -I. With the current structure this will bring in Catch and a couple source files which may be a problem.

From what I understand it is enough to include xxhash.hpp into an application and it will work. None of the other files are required for consumers. Is this correct?

If so I suggest to create a folder include in the root containing only xxhash.hpp and a folder test (or tests) containing the rest. The top-level makefile will just include/redirect/... (I'm not familiar with Makefiles TBH) the tests makefile.

Thank you :)
