redspah / xxhash_cpp


161.0 13.0 35.0 479 KB

Port of the xxhash library to C++17.

License: BSD 2-Clause "Simplified" License

C++ 99.48% C 0.23% CMake 0.28%

hashing optimized xxhash cpp hash xxhash-library xxhash-cpp cpp-library cpp17

xxhash_cpp's Introduction

xxhash_cpp

Port of the xxHash library to C++17.

Compatibility

Compiler	Min. Version
MSVC (Visual Studio)	19.1 (VS 2017.3 P2)
clang	3.9
gcc	7

Example Usage

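#include <array>
#include "xxhash.hpp"

// standalone hash
std::array<int, 4> input {322, 2137, 42069, 65536};
xxh::hash_t<32> hash = xxh::xxhash<32>(input);

// hash streaming
// fill_buffer() stands for your own input routine: it refills buffer and
// returns the number of bytes written (0 at end of input).
std::array<unsigned char, 512> buffer;
xxh::hash_state_t<64> hash_stream;
std::size_t bytes_read;
while ((bytes_read = fill_buffer(buffer)) > 0)
{
  // pointer + size overload, so a short final read is not hashed as 512 bytes
  hash_stream.update(buffer.data(), bytes_read);
}
xxh::hash_t<64> final_hash = hash_stream.digest();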

The template argument selects the 32-bit or 64-bit version of the algorithm; other values are not allowed. The typedefs hash32_t, hash64_t, hash_state32_t and hash_state64_t are also provided.
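One way to enforce the "32 or 64 only" rule at compile time is a trait that rejects any other width with static_assert. This is a sketch, not the definition used in xxhash.hpp; the namespace demo and the helper hash_type are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

namespace demo {

// Maps a bit width to the matching unsigned integer type. Instantiating it
// with any width other than 32 or 64 fails to compile.
template <std::size_t N>
struct hash_type {
    static_assert(N == 32 || N == 64, "only 32- and 64-bit modes are supported");
    using type = std::conditional_t<N == 32, std::uint32_t, std::uint64_t>;
};

template <std::size_t N>
using hash_t = typename hash_type<N>::type;

// Convenience typedefs in the spirit of hash32_t / hash64_t.
using hash32_t = hash_t<32>;
using hash64_t = hash_t<64>;

}  // namespace demo
```

Writing demo::hash_t<16> anywhere would stop the build with the static_assert message instead of silently picking a type.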

xxh::xxhash and xxh::hash_state_t::update provide several convenient overloads, all accepting optional seed and endianness arguments:

C-style const void* + size_t pair
const std::vector<T>&
const std::basic_string<T>&
A pair of templated iterators
const std::array<T, N>&
const std::initializer_list<T>&
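The overload set above can be pictured as thin wrappers that all forward to one (pointer, byte count) core, so equal element sequences hash equally whichever container holds them. In this hypothetical sketch the core is 32-bit FNV-1a, used only as a short stand-in for xxHash, and the seed and endianness parameters are left out.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#include <iterator>
#include <string>
#include <vector>

namespace demo {

// Stand-in byte hash (32-bit FNV-1a), NOT xxHash; it only shows how every
// overload funnels into a single (pointer, length) function.
inline std::uint32_t hash_bytes(const void* data, std::size_t len) {
    const auto* p = static_cast<const unsigned char*>(data);
    std::uint32_t h = 2166136261u;
    for (std::size_t i = 0; i < len; ++i) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

template <typename T>
std::uint32_t hash(const std::vector<T>& v) {
    return hash_bytes(v.data(), v.size() * sizeof(T));
}

template <typename T>
std::uint32_t hash(const std::basic_string<T>& s) {
    return hash_bytes(s.data(), s.size() * sizeof(T));
}

template <typename T, std::size_t N>
std::uint32_t hash(const std::array<T, N>& a) {
    return hash_bytes(a.data(), N * sizeof(T));
}

template <typename T>
std::uint32_t hash(std::initializer_list<T> il) {
    return hash_bytes(il.begin(), il.size() * sizeof(T));
}

// Iterator pair: assumes contiguous storage (e.g. vector or array iterators).
template <typename It>
std::uint32_t hash(It first, It last) {
    using T = typename std::iterator_traits<It>::value_type;
    if (first == last) return hash_bytes(nullptr, 0);
    const auto count = static_cast<std::size_t>(std::distance(first, last));
    return hash_bytes(&*first, count * sizeof(T));
}

}  // namespace demo
```

The byte count is size() * sizeof(T), so a std::array<int, 4> is hashed as 16 bytes of int object representation rather than as four abstract numbers; results for multi-byte element types therefore depend on how those bytes are laid out in memory.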

Build Instructions

The library is provided as a single standalone header, for static linking only. No build instructions are necessary.

xxHash - Extremely fast hash algorithm

xxHash is an extremely fast hash algorithm, running at RAM speed limits. It successfully completes the SMHasher test suite, which evaluates the collision, dispersion and randomness qualities of hash functions. The code is highly portable, and hashes are identical on all platforms (little/big endian).
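To make the portability claim concrete, here is a compact, unoptimized sketch of XXH32 following the structure of the reference C implementation: four accumulators over 16-byte stripes, a tail loop, and a final avalanche. It is not the code in xxhash.hpp, and demo::xxh32 and its helpers are hypothetical names. Reading every word as little-endian is what makes the result identical on little- and big-endian hosts.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace demo {

constexpr std::uint32_t P32_1 = 0x9E3779B1u;
constexpr std::uint32_t P32_2 = 0x85EBCA77u;
constexpr std::uint32_t P32_3 = 0xC2B2AE3Du;
constexpr std::uint32_t P32_4 = 0x27D4EB2Fu;
constexpr std::uint32_t P32_5 = 0x165667B1u;

inline std::uint32_t rotl32(std::uint32_t x, int r) { return (x << r) | (x >> (32 - r)); }

// Assembles 4 bytes as a little-endian integer regardless of host byte order.
inline std::uint32_t read_le32(const unsigned char* p) {
    return std::uint32_t(p[0]) | (std::uint32_t(p[1]) << 8) |
           (std::uint32_t(p[2]) << 16) | (std::uint32_t(p[3]) << 24);
}

inline std::uint32_t round32(std::uint32_t acc, std::uint32_t lane) {
    acc += lane * P32_2;
    acc = rotl32(acc, 13);
    return acc * P32_1;
}

inline std::uint32_t xxh32(const void* data, std::size_t len, std::uint32_t seed = 0) {
    const auto* p = static_cast<const unsigned char*>(data);
    const unsigned char* const end = p + len;
    std::uint32_t h;
    if (len >= 16) {
        // Four independent accumulators consume 16-byte stripes.
        std::uint32_t v1 = seed + P32_1 + P32_2, v2 = seed + P32_2;
        std::uint32_t v3 = seed, v4 = seed - P32_1;
        while (static_cast<std::size_t>(end - p) >= 16) {
            v1 = round32(v1, read_le32(p));
            v2 = round32(v2, read_le32(p + 4));
            v3 = round32(v3, read_le32(p + 8));
            v4 = round32(v4, read_le32(p + 12));
            p += 16;
        }
        h = rotl32(v1, 1) + rotl32(v2, 7) + rotl32(v3, 12) + rotl32(v4, 18);
    } else {
        h = seed + P32_5;
    }
    h += static_cast<std::uint32_t>(len);
    // Remaining 4-byte words, then single bytes.
    for (; static_cast<std::size_t>(end - p) >= 4; p += 4) {
        h += read_le32(p) * P32_3;
        h = rotl32(h, 17) * P32_4;
    }
    for (; p < end; ++p) {
        h += std::uint32_t(*p) * P32_5;
        h = rotl32(h, 11) * P32_1;
    }
    // Final avalanche spreads every input bit over the whole result.
    h ^= h >> 15; h *= P32_2;
    h ^= h >> 13; h *= P32_3;
    h ^= h >> 16;
    return h;
}

}  // namespace demo
```

With an empty input and seed 0 this produces 0x02CC5D05, the well-known XXH32 value for the empty string.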

Benchmarks

The benchmark uses the SMHasher speed test, compiled with Visual Studio 2010 on a 32-bit Windows 7 machine. The reference system uses a Core 2 Duo @ 3 GHz.

Name	Speed	Q.Score	Author
xxHash	5.4 GB/s	10	Y.C.
MurmurHash 3a	2.7 GB/s	10	Austin Appleby
SBox	1.4 GB/s	9	Bret Mulvey
Lookup3	1.2 GB/s	9	Bob Jenkins
CityHash64	1.05 GB/s	10	Pike & Alakuijala
FNV	0.55 GB/s	5	Fowler, Noll, Vo
CRC32	0.43 GB/s	9
MD5-32	0.33 GB/s	10	Ronald L.Rivest
SHA1-32	0.28 GB/s	10

Q.Score is a measure of the quality of the hash function. It depends on successfully passing the SMHasher test set; 10 is a perfect score. Algorithms with a score below 5 are not listed in this table.

A more recent version, XXH64, was created thanks to Mathias Westerdahl; it offers superior speed and dispersion on 64-bit systems. Note, however, that 32-bit applications will still run faster using the 32-bit version.

SMHasher speed test, compiled using GCC 4.8.2 on 64-bit Linux Mint. The reference system uses a Core i5-3340M @ 2.7 GHz.

Version	Speed on 64-bit	Speed on 32-bit
XXH64	13.8 GB/s	1.9 GB/s
XXH32	6.8 GB/s	6.0 GB/s
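A sketch of XXH64 in the same style as the reference algorithm: four 64-bit lanes consume 32-byte stripes, are merged into one value, and the tail is then mixed in 8-, 4- and 1-byte steps. It is unoptimized, is not the code in xxhash.hpp, and all names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace demo {

constexpr std::uint64_t P64_1 = 0x9E3779B185EBCA87ULL;
constexpr std::uint64_t P64_2 = 0xC2B2AE3D27D4EB4FULL;
constexpr std::uint64_t P64_3 = 0x165667B19E3779F9ULL;
constexpr std::uint64_t P64_4 = 0x85EBCA77C2B2AE63ULL;
constexpr std::uint64_t P64_5 = 0x27D4EB2F165667C5ULL;

inline std::uint64_t rotl64(std::uint64_t x, int r) { return (x << r) | (x >> (64 - r)); }

// Little-endian load of n bytes, independent of host byte order.
inline std::uint64_t read_le(const unsigned char* p, int n) {
    std::uint64_t v = 0;
    for (int i = n - 1; i >= 0; --i) v = (v << 8) | p[i];
    return v;
}

inline std::uint64_t round64(std::uint64_t acc, std::uint64_t lane) {
    acc += lane * P64_2;
    acc = rotl64(acc, 31);
    return acc * P64_1;
}

inline std::uint64_t merge64(std::uint64_t h, std::uint64_t lane_acc) {
    h ^= round64(0, lane_acc);
    return h * P64_1 + P64_4;
}

inline std::uint64_t xxh64(const void* data, std::size_t len, std::uint64_t seed = 0) {
    const auto* p = static_cast<const unsigned char*>(data);
    const unsigned char* const end = p + len;
    std::uint64_t h;
    if (len >= 32) {
        std::uint64_t v1 = seed + P64_1 + P64_2, v2 = seed + P64_2;
        std::uint64_t v3 = seed, v4 = seed - P64_1;
        while (end - p >= 32) {  // one 32-byte stripe per iteration
            v1 = round64(v1, read_le(p, 8));
            v2 = round64(v2, read_le(p + 8, 8));
            v3 = round64(v3, read_le(p + 16, 8));
            v4 = round64(v4, read_le(p + 24, 8));
            p += 32;
        }
        h = rotl64(v1, 1) + rotl64(v2, 7) + rotl64(v3, 12) + rotl64(v4, 18);
        h = merge64(h, v1); h = merge64(h, v2);
        h = merge64(h, v3); h = merge64(h, v4);
    } else {
        h = seed + P64_5;
    }
    h += static_cast<std::uint64_t>(len);
    for (; end - p >= 8; p += 8) {
        h ^= round64(0, read_le(p, 8));
        h = rotl64(h, 27) * P64_1 + P64_4;
    }
    if (end - p >= 4) {
        h ^= read_le(p, 4) * P64_1;
        h = rotl64(h, 23) * P64_2 + P64_3;
        p += 4;
    }
    for (; p < end; ++p) {
        h ^= std::uint64_t(*p) * P64_5;
        h = rotl64(h, 11) * P64_1;
    }
    // Final avalanche.
    h ^= h >> 33; h *= P64_2;
    h ^= h >> 29; h *= P64_3;
    h ^= h >> 32;
    return h;
}

}  // namespace demo
```

The wider lanes are why XXH64 wins on 64-bit targets, while on 32-bit targets every 64-bit multiply has to be emulated, which fits the table above.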

License

The library file xxhash.hpp is BSD licensed.

Build modifiers

The following macros influence xxhash behavior. They are all disabled by default.

XXH_FORCE_NATIVE_FORMAT: on big-endian systems, use the native number representation, producing system-specific results. This breaks consistency with little-endian results.
XXH_CPU_LITTLE_ENDIAN: if defined to 0, sets the native endianness to big endian; if defined to 1, sets it to little endian. If left undefined, the endianness is resolved at runtime, before main is called, at the cost of the endianness not being constexpr.
XXH_FORCE_MEMORY_ACCESS: if defined to 2, enables unaligned reads as an optimization (this is not standard compliant); if defined to 1, enables the packed attribute for optimization (only defined for gcc and icc); otherwise, the default fallback method (memcpy) is used.
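A hypothetical sketch of how a compile-time endianness override and the memcpy fallback can fit together. It mirrors the intent of XXH_CPU_LITTLE_ENDIAN and the default XXH_FORCE_MEMORY_ACCESS path but is not the library's code; the demo names are assumptions. If the macro is defined, it must match the real CPU.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

namespace demo {

// Runtime probe: look at the first byte of the value 1 in memory.
inline bool runtime_is_little_endian() {
    const std::uint32_t one = 1;
    unsigned char first = 0;
    std::memcpy(&first, &one, 1);
    return first == 1;
}

// A user-supplied 0/1 value wins; otherwise fall back to the probe.
inline bool is_little_endian() {
#if defined(XXH_CPU_LITTLE_ENDIAN)
    return XXH_CPU_LITTLE_ENDIAN != 0;
#else
    return runtime_is_little_endian();
#endif
}

inline std::uint32_t swap32(std::uint32_t x) {
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) | ((x << 8) & 0x00FF0000u) | (x << 24);
}

// memcpy-based load: valid for any alignment and standard compliant
// (this is also why <cstring> is required).
inline std::uint32_t read_le32(const void* p) {
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);
    return is_little_endian() ? v : swap32(v);
}

}  // namespace demo
```

Compilers typically turn the fixed-size memcpy into a single load, so the portable path usually costs little.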

Other languages

Beyond the C reference version, xxHash is also available in many programming languages, thanks to great contributors. They are listed here.

xxhash_cpp's Issues

Support XXH3?

Would be great :)
See https://xxhash.com/

128-bit checksum compilation error with 0.8.1

Compiling this function, which worked with version 0.7.3, fails with 0.8.1. to_string() is not shown here, but what it does is reasonably obvious. Note that the error is not in this function but in the header:

std::string
xxhash3(std::string_view data)
{
#define checksum_bits 128
    static_assert(checksum_bits == 64 || checksum_bits == 128);

    xxh::hash3_state_t<checksum_bits> state;
    state.update(data.data(), data.size());

    // convert checksum to canonical byte order                                                                                     
    xxh::canonical_t<checksum_bits> const canonical{state.digest()};
    auto const hash{canonical.get_hash()};

#if checksum_bits == 128
    return to_string(hash.low64, hash.high64);
#else
    return to_string(hash);
#endif
}

g++ 8.3 in C++17 mode:

xxhash/0.8.1/include/xxhash.hpp: In function ‘xxh::uint128_t xxh::intrin::bit_ops::mult64to128(uint64_t, uint64_t)’:
xxhash/0.8.1/include/xxhash.hpp:290:10: error: request for member ‘low64’ in ‘r128’, which is of non-class type ‘__int128 unsigned’
     r128.low64 = (uint64_t)(product);
          ^~~~~
xxhash/0.8.1/include/xxhash.hpp:291:10: error: request for member ‘high64’ in ‘r128’, which is of non-class type ‘__int128 unsigned’
     r128.high64 = (uint64_t)(product >> 64);
          ^~~~~~
xxhash/0.8.1/include/xxhash.hpp:292:12: error: could not convert ‘r128’ from ‘__int128 unsigned’ to ‘xxh::uint128_t’ {aka ‘xxh::typedefs::uint128_t’}
     return r128;
            ^~~~
xxhash/0.8.1/include/xxhash.hpp: At global scope:
xxhash/0.8.1/include/xxhash.hpp:915:3: error: explicit template specialization cannot have a storage class
   static inline uint_t<32> avalanche<32>(uint_t<32> hash) {
   ^~~~~~
xxhash/0.8.1/include/xxhash.hpp:925:3: error: explicit template specialization cannot have a storage class
   static inline uint_t<64> avalanche<64>(uint_t<64> hash) {
   ^~~~~~
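The log shows two separate problems: r128 is an unsigned __int128 while the code treats it as a struct with .low64/.high64, and explicit specializations of avalanche are declared static. A minimal sketch of code shaped to avoid both, assuming the 128-bit result is meant to be a plain {low64, high64} struct as the caller above expects; all names are hypothetical and this is not the header's actual fix.

```cpp
#include <cassert>
#include <cstdint>

namespace demo {

// Plain struct result, so callers can always use .low64 / .high64.
struct uint128 {
    std::uint64_t low64;
    std::uint64_t high64;
};

inline uint128 mult64to128(std::uint64_t lhs, std::uint64_t rhs) {
#if defined(__SIZEOF_INT128__)
    // Compiler-provided 128-bit integer: split the product into the struct
    // instead of returning the __int128 itself.
    const unsigned __int128 product = static_cast<unsigned __int128>(lhs) * rhs;
    return uint128{static_cast<std::uint64_t>(product),
                   static_cast<std::uint64_t>(product >> 64)};
#else
    // Portable fallback: schoolbook multiplication on 32-bit halves.
    const std::uint64_t lo_lo = (lhs & 0xFFFFFFFFu) * (rhs & 0xFFFFFFFFu);
    const std::uint64_t hi_lo = (lhs >> 32) * (rhs & 0xFFFFFFFFu);
    const std::uint64_t lo_hi = (lhs & 0xFFFFFFFFu) * (rhs >> 32);
    const std::uint64_t hi_hi = (lhs >> 32) * (rhs >> 32);
    const std::uint64_t cross = (lo_lo >> 32) + (hi_lo & 0xFFFFFFFFu) + lo_hi;
    const std::uint64_t upper = (hi_lo >> 32) + (cross >> 32) + hi_hi;
    const std::uint64_t lower = (cross << 32) | (lo_lo & 0xFFFFFFFFu);
    return uint128{lower, upper};
#endif
}

// Explicit specializations must not carry a storage class such as 'static'
// (the GCC error above); 'inline' is allowed and is what a specialization
// defined in a header needs.
template <int Bits> std::uint64_t finalize(std::uint64_t h);

template <> inline std::uint64_t finalize<64>(std::uint64_t h) {
    h ^= h >> 33; h *= 0xC2B2AE3D27D4EB4FULL;
    h ^= h >> 29; h *= 0x165667B19E3779F9ULL;
    h ^= h >> 32;
    return h;
}

}  // namespace demo
```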

from_canonical function is missing

Comments for issue #17 say that from_canonical() was added, but after extracting the 0.8.1 tarball fgrep cannot find any such function.
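A "canonical" hash is the hash value stored as big-endian bytes, so it reads back identically on any platform; the C reference library pairs XXH64_canonicalFromHash with XXH64_hashFromCanonical for this. A minimal C++ sketch of such a pair for 64-bit hashes, with hypothetical names to_canonical and from_canonical (not necessarily what xxhash_cpp provides):

```cpp
#include <array>
#include <cassert>
#include <cstdint>

namespace demo {

// Canonical form: most significant byte first, independent of host order.
inline std::array<std::uint8_t, 8> to_canonical(std::uint64_t h) {
    std::array<std::uint8_t, 8> out{};
    for (int i = 0; i < 8; ++i)
        out[i] = static_cast<std::uint8_t>(h >> (56 - 8 * i));
    return out;
}

// Inverse: rebuild the integer from the big-endian bytes.
inline std::uint64_t from_canonical(const std::array<std::uint8_t, 8>& c) {
    std::uint64_t h = 0;
    for (std::uint8_t byte : c) h = (h << 8) | byte;
    return h;
}

}  // namespace demo
```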

g++-5 is not supported

While the README states that g++-5 is supported, it does not work:

xxhash.hpp:625:3: error: array must be initialized with a brace-enclosed initializer
xxhash.hpp:625:3: error: too many initializers for ‘std::array<long unsigned int, 4ul>’

It seems to be a problem with brace elision, so adding extra braces should work.

I'd suggest adding the "supported compilers" from the README to CI.
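The extra-braces fix mentioned above looks like this in general: std::array is an aggregate wrapping a built-in array, so the fully braced form uses two levels of braces and does not rely on brace elision at all. This sketch does not reproduce line 625 of the header; sample_lanes and lanes_from_seed are hypothetical, and the values are simply the first four XXH64 primes.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

namespace demo {

// Outer braces: the std::array aggregate. Inner braces: its array member.
constexpr std::array<std::uint64_t, 4> sample_lanes{{
    0x9E3779B185EBCA87ULL, 0xC2B2AE3D27D4EB4FULL,
    0x165667B19E3779F9ULL, 0x85EBCA77C2B2AE63ULL}};

// The same double-brace form works in return statements and member initializers.
inline std::array<std::uint64_t, 4> lanes_from_seed(std::uint64_t seed) {
    return {{seed, seed + 1, seed + 2, seed + 3}};
}

}  // namespace demo
```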

Make warning free build

Compiling the current source produces a couple of warnings (g++-6):

-Wunknown-pragmas
-Wattributes
-Wunused-parameter
-Wsign-compare
-Wpedantic

I suggest building with -Werror on CI and fixing the warnings.

can you include some basic usage instructions?

How do I build it?

Some minimal (example) usage instructions would be great too

Possible hashing inconsistency

Hi,

Apologies in advance for the vague description. I am not really sure what's going on so I'll just do my best to describe my usage scenario:

Somewhere in my program I am building a std::vector<uint64_t>:

        std::vector<uint64_t> hashes;
        std::transform(symbols.begin() + i - size, symbols.begin() + i, std::back_inserter(hashes), [](const Symbol& sm) { return sm.CalculatedHashValue; }); 
        hashes.push_back(s.HashValue);
        auto hash = xxh::xxhash<64>(hashes);
        fmt::print("hash(");
        for (auto h : hashes) { fmt::print("{} ", h); }
        fmt::print(") = {}\n", hash);
        s.CalculatedHashValue = hash;

This outputs:

hash(123 456 0) = 14523173615704738576

I wanted to debug this result so in main.cpp I did:

    std::array<uint64_t, 3> arr { 123, 456, 0 };
    std::cout << "array hash = " << xxh::xxhash<64>(arr, 0) << std::endl;

This outputs:

array hash = 13763445824703203362

As you can see there's an inconsistency even though the values are the same.

The next thing I added another test in my main.cpp:

    std::vector<uint64_t> vec { 123, 456, 0 };
    std::cout << "vector hash = " << xxh::xxhash<64>(vec) << std::endl;

And now the weird part. After adding the above two lines, I get:

hash(123 456 0  ) = 13763445824703203362
array hash = 13763445824703203362
vector hash = 13763445824703203362

Now the hashes are consistent as they should be. Could this be a bug?

EDIT:
The problem seems to go away if I manually define the endianness before including xxhash.hpp:

#define XXH_CPU_LITTLE_ENDIAN 1

Best,
Bogdan
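The thread does not establish the cause. One possibility, stated here only as an assumption, is an initialization-order problem: a namespace-scope endianness value computed "before main" can be read by other code that runs during static initialization, before that value is set, and fixing XXH_CPU_LITTLE_ENDIAN at compile time would remove that dependency. A common way to rule such ordering out is a function-local static, which C++11 and later initialize exactly once, on first use, and thread-safely. A hypothetical sketch:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

namespace demo {

// Computed on the first call, so it is never read uninitialized, whichever
// translation unit or static initializer calls it first.
inline bool cpu_is_little_endian() {
    static const bool value = [] {
        const std::uint32_t probe = 1;
        unsigned char first = 0;
        std::memcpy(&first, &probe, 1);
        return first == 1;
    }();
    return value;
}

// A namespace-scope object whose dynamic initializer needs the endianness:
// safe because it calls the function instead of reading a global variable.
const bool early_check = cpu_is_little_endian();

}  // namespace demo
```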

Linking issue with static lib

Hi,

I use xxhash.hpp in my static lib, where it needs to be included more or less globally to give access to the exposed hash types. Linking fails unless the following functions are made static:

942c942
<               static void accumulate(uint64_t* XXH_RESTRICT acc, const uint8_t* XXH_RESTRICT input, const uint8_t* XXH_RESTRICT secret, size_t nbStripes, acc_width accWidth)
---
>               void accumulate(uint64_t* XXH_RESTRICT acc, const uint8_t* XXH_RESTRICT input, const uint8_t* XXH_RESTRICT secret, size_t nbStripes, acc_width accWidth)
951c951
<               static void hash_long_internal_loop(uint64_t* XXH_RESTRICT acc, const uint8_t* XXH_RESTRICT input, size_t len, const uint8_t* XXH_RESTRICT secret, size_t secretSize, acc_width accWidth)
---
>               void hash_long_internal_loop(uint64_t* XXH_RESTRICT acc, const uint8_t* XXH_RESTRICT input, size_t len, const uint8_t* XXH_RESTRICT secret, size_t secretSize, acc_width accWidth)
973c973
<               static uint64_t mix_2_accs(const uint64_t* XXH_RESTRICT acc, const uint8_t* XXH_RESTRICT secret)
---
>               uint64_t mix_2_accs(const uint64_t* XXH_RESTRICT acc, const uint8_t* XXH_RESTRICT secret)
978c978
<               static uint64_t merge_accs(const uint64_t* XXH_RESTRICT acc, const uint8_t* XXH_RESTRICT secret, uint64_t start)
---
>               uint64_t merge_accs(const uint64_t* XXH_RESTRICT acc, const uint8_t* XXH_RESTRICT secret, uint64_t start)
990c990
<               static void init_custom_secret(uint8_t* customSecret, uint64_t seed64)
---
>               void init_custom_secret(uint8_t* customSecret, uint64_t seed64)
1128c1128
<               static uint64_t mix_16b(const uint8_t* XXH_RESTRICT input, const uint8_t* XXH_RESTRICT secret, hash64_t seed64)
---
>               uint64_t mix_16b(const uint8_t* XXH_RESTRICT input, const uint8_t* XXH_RESTRICT secret, hash64_t seed64)
1135c1135
<               static uint128_t mix_32b(hash128_t acc, const uint8_t* input_1, const uint8_t* input_2, const uint8_t* secret, hash64_t seed)
---
>               uint128_t mix_32b(hash128_t acc, const uint8_t* input_1, const uint8_t* input_2, const uint8_t* secret, hash64_t seed)

This is with gcc 9.2.

Best,
Bogdan

Support for macos arm64

Hello,

When trying to use this library on macOS with clang on arm64, I get the following error:
/Applications/Xcode_14.2.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/14.0.0/include/immintrin.h:14:2: error: "This header is only meant to be used on x86 and x64 architecture"

The issue seems to be caused simply by importing immintrin.h.
Is there a known workaround for this problem?

Best
Peter
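The error is raised by <immintrin.h> itself when the target is not x86. The usual workaround pattern is to include architecture-specific headers only behind predefined compiler macros and fall back to scalar code elsewhere. This is a hedged sketch of that pattern, not the fix adopted upstream; DEMO_SIMD_PATH and simd_path() are hypothetical.

```cpp
#include <cassert>
#include <cstring>

// x86 intrinsics only when targeting x86/x64; NEON on arm64 (e.g. Apple
// silicon); plain scalar code on anything else.
#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#  include <immintrin.h>
#  define DEMO_SIMD_PATH "x86"
#elif defined(__aarch64__) || defined(_M_ARM64)
#  include <arm_neon.h>
#  define DEMO_SIMD_PATH "neon"
#else
#  define DEMO_SIMD_PATH "scalar"
#endif

namespace demo {

// Reports which branch the preprocessor selected for this build.
inline const char* simd_path() { return DEMO_SIMD_PATH; }

}  // namespace demo
```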

Linker error building with x86 MSVC

Describe the bug
I was compiling with the x86 version of MSVC, before applying the change in #15. xxhash.hpp was included in multiple source files and I got the following linker error:

b.obj : error LNK2005: "void __cdecl xxh::intrin::prefetch(void const *)" (?prefetch@intrin@xxh@@YAXPBX@Z) already defined in a.obj
a.exe : fatal error LNK1169: one or more multiply defined symbols found

To Reproduce
The sample code & reproduction steps are documented in this gist.

Expected behavior
The code should build successfully.

Desktop (please complete the following information):

OS: Windows 10 build 19645
Toolchain: MSVC 19.16.27041 (x86), MSVC 19.26.28806 (x86)

Additional context
#15 fixes this issue for x86 MSVC. However, this issue also arises when XXH_NO_PREFETCH is used, or when a compiler other than MSVC or g++ (or clang++ - it defines __GNUC__, so masquerades as g++) is used. It seems to me that all the prefetch implementations should be marked as inline to denote that there may be multiple definitions of the function in different translation units.
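The underlying rule is the One Definition Rule: a non-template function defined in a header without inline gets one definition per translation unit that includes it, and the linker rejects the duplicates (LNK2005 here). Marking it inline permits identical definitions in every translation unit; making it static, as proposed in the static-lib report, also links but gives each unit its own copy. A sketch of a prefetch wrapper written that way, with hypothetical names:

```cpp
#include <cassert>

#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
#  include <xmmintrin.h>  // _mm_prefetch
#endif

namespace demo {

// 'inline' on every branch, so including this header in many .cpp files
// never produces multiply defined symbols.
inline void prefetch(const void* ptr) {
#if defined(XXH_NO_PREFETCH)
    (void)ptr;  // prefetching disabled: no-op
#elif defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
    _mm_prefetch(static_cast<const char*>(ptr), _MM_HINT_T0);
#elif defined(__GNUC__)
    __builtin_prefetch(ptr);  // also taken by clang, which defines __GNUC__
#else
    (void)ptr;  // unknown compiler: no-op
#endif
}

}  // namespace demo
```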

C++14 compatibility?

I'm on a project with C++14 requirement. Is it possible to adjust the implementation to be C++14 compatible?

From what I have seen, the only C++17 feature used is if constexpr. In one case (swap32/64) it can be replaced by a simple overload instead of two names, and the other can be solved by extracting a subfunction (potentially merging code from here and here: https://github.com/RedSpah/xxhash_cpp/blob/92cf55f21d341520137e4a7eb155290d390bdbff/xxhash/xxhash.hpp#L589, although I'm not sure).

This would make it available to a wider audience especially as this seems the only C++ implementation. Thanks for that! 👍
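Both suggestions can be sketched in C++14: overloading one name on the argument type replaces separate swap32/swap64 functions chosen by if constexpr, and tag dispatch replaces an if constexpr branch on the template's bit width. The names byteswap, width and round_rotation are hypothetical; the rotation amounts 13 and 31 are those of the XXH32 and XXH64 round functions.

```cpp
#include <cassert>
#include <cstdint>

namespace demo {

// One name, two overloads: overload resolution picks the width.
constexpr std::uint32_t byteswap(std::uint32_t x) {
    return ((x << 24) & 0xFF000000u) | ((x << 8) & 0x00FF0000u) |
           ((x >> 8) & 0x0000FF00u) | ((x >> 24) & 0x000000FFu);
}

constexpr std::uint64_t byteswap(std::uint64_t x) {
    return (static_cast<std::uint64_t>(byteswap(static_cast<std::uint32_t>(x))) << 32) |
           byteswap(static_cast<std::uint32_t>(x >> 32));
}

// Tag dispatch: an empty type per bit width selects the overload at compile time.
template <int Bits> struct width {};
constexpr int round_rotation(width<32>) { return 13; }
constexpr int round_rotation(width<64>) { return 31; }

template <int Bits>
constexpr int round_rotation() { return round_rotation(width<Bits>{}); }

}  // namespace demo
```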

The project doesn't compile with Clang in Visual Studio

Update for xxHash 0.8.0

The most recent release of xxHash is 0.8.0.
Do you have plans to update xxhash_cpp to that version?

As far as I can see, the changes to be incorporated are:

I would suggest creating a tag 0.7.3 from the current latest commit on master, i.e. 6246966, before proceeding with the 0.8.0 changes.

Constructor parameter missing in example usage

In the example usage the constructor parameter (hash) is missing:
xxh::hash_state_t<64> hash_stream(hash)

Hash output reversed

Hi,

I'm attempting to use xxhash_cpp in a project where, unsurprisingly, I need to generate the xxhash64 of a file and check it against a known hash string. But I've run into the problem that my output hash is reversed.

Output from QuickHash: 9FD684E536C4C0B9
Output from xxhsum: 9fd684e536c4c0b9

Output from my implementation B9C0C436E584D69F

This seems like an endianness problem, but I can't see any option for setting a specific endianness.

What would be the best strategy to do an equality check against the known hash string?

Create hash, convert to hex string and check
Somehow convert my hash string to a hash64_t object and compare with the hash object created from the file

My current implementation:

  std::string filename = "/my_input_file";
  std::ifstream filestream(filename, std::ifstream::in | std::istream::binary);
  xxh::hash_state64_t hash_stream;
  std::vector<uint8_t> contents((std::istreambuf_iterator<char>(filestream)), std::istreambuf_iterator<char>());
  hash_stream.update(contents);

  xxh::hash64_t final_hash = hash_stream.digest();
  std::cout << byte_print(final_hash) <<std::endl;
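B9C0C436E584D69F is exactly 9FD684E536C4C0B9 with its bytes reversed, so the hash value itself is most likely correct and only the printing differs (byte_print is not shown, so this is an inference: dumping a uint64_t's in-memory bytes on a little-endian CPU prints them in this reversed order). A hypothetical sketch of both options listed above, formatting the value as a number and parsing the expected string back into an integer:

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

namespace demo {

// Prints the value most significant digit first, the order xxhsum uses.
inline std::string to_hex(std::uint64_t h) {
    char buf[17];
    std::snprintf(buf, sizeof buf, "%016" PRIx64, h);
    return std::string(buf);
}

// Parses a hex string (upper- or lower-case) so the comparison is done on integers.
inline std::uint64_t from_hex(const std::string& s) {
    return static_cast<std::uint64_t>(std::stoull(s, nullptr, 16));
}

inline bool matches(std::uint64_t computed, const std::string& expected_hex) {
    return computed == from_hex(expected_hex);
}

}  // namespace demo
```

Comparing integers avoids case differences between tools such as QuickHash (upper-case) and xxhsum (lower-case).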

Make in the root dir does not work & folder structure

It seems the makefile in the project root is a copy of the real makefile in the subfolder. Issuing make in the root folder results in an error as the files are not found.

When fixing this, I'd suggest reorganizing the folder structure. A common way to use 3rd-party libraries is to include them as a git submodule and add their include folder to -I. With the current structure, this brings in Catch and a couple of source files, which may be a problem.

From what I understand it is enough to include xxhash.hpp into an application and it will work. None of the other files are required for consumers. Is this correct?

If so, I suggest creating a folder include in the root containing only xxhash.hpp, and a folder test (or tests) containing the rest. The top-level makefile would then just include/redirect/... (I'm not familiar with Makefiles TBH) the tests makefile.

Thank you :)

#include <cstring> missing

At least my GCC 7.3.0 environment needs an #include <cstring> to resolve memcpy.
See also https://en.cppreference.com/w/cpp/string/byte/memcpy

Update for xxHash 0.7.4

This version is slightly behind the head version of xxHash. Are you planning to update?