ogxd / gxhash
The fastest hashing algorithm 📈
Home Page: https://docs.rs/gxhash
License: MIT License
Dear gxhash team,
I was using it for some MinHash algorithms (https://github.com/jianshu93/hyperminhash-rs/blob/354358e6703774143d6a5a9fc19c61329aa6c457/src/lib.rs#L165) to estimate set similarity:
I have the following compiling error:
error: Gxhash requires aes and sse2 intrinsics. Make sure the processor supports it and build with RUSTFLAGS="-C target-cpu=native" or RUSTFLAGS="-C target-feature=+aes,+sse2".
--> /storage/home/hcoda1/4/jzhao399/.cargo/registry/src/index.crates.io-6f17d22bba15001f/gxhash-3.4.1/src/gxhash/platform/x86.rs:2:1
|
2 | compile_error!{"Gxhash requires aes and sse2 intrinsics. Make sure the processor supports it and build with RUSTFLAGS="-C target-cpu=native" or RUSTFLAGS="-C target-feature=+aes,+sse2"."}
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
I have to use RUSTFLAGS="-C target-feature=+aes,+sse2" when compiling, and then it works. However, my platform does have both SSE2 and AES. When building the library itself, I do not have this problem. Any idea how I can avoid this when using gxhash as a dependency, given that most modern CPUs have these two features? FYI, there is no problem when using it as a library on ARM (Mac M-series chips).
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb cat_l3 cdp_l3 invpcid_single intel_ppin intel_pt ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req pku ospke avx512_vnni md_clear spec_ctrl intel_stibp flush_l1d arch_capabilities
Thanks,
Jianshu
Based on https://news.ycombinator.com/item?id=40344581, it sounds like a fixed-point attack might be part of a DoS exploit chain. It would not be a backwards-compatible change (i.e. it would need a major version bump), but could the compression function incorporate the seed? Hopefully this would have no impact on performance.
let bin: [u8; 0] = [];
dbg!(gxhash::gxhash128(&bin, 0));
let mut hasher = gxhash::GxHasher::with_seed(0);
hasher.write(&bin);
dbg!(hasher.finish_u128());
[xhash/tests/main.rs:14:3] gxhash::gxhash128(&bin, 0) = 302767221070957831171542222971961600063
[xhash/tests/main.rs:17:3] hasher.finish_u128() = 78505093061913940866771591142410949548
It looks like GxBuildHasher accesses the RNG on every construction. Could the technique from std's RandomState be used instead, which calls the RNG only once per thread and then simply adds 1 to the state on each construction?
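For reference, the RandomState technique mentioned above can be sketched like this (a self-contained toy using std's DefaultHasher, not gxhash's actual types; the hardcoded thread-local seed stands in for a real one-time random initialization):

```rust
use std::cell::Cell;
use std::collections::hash_map::DefaultHasher;
use std::hash::{BuildHasher, Hasher};

thread_local! {
    // In std, this is filled from a real random source once per thread;
    // hardcoded here only to keep the sketch self-contained.
    static SEED: Cell<u64> = Cell::new(0x9E37_79B9_7F4A_7C15);
}

struct CheapRandomState {
    seed: u64,
}

impl CheapRandomState {
    fn new() -> Self {
        SEED.with(|s| {
            let seed = s.get();
            s.set(seed.wrapping_add(1)); // one increment, no RNG call
            CheapRandomState { seed }
        })
    }
}

impl BuildHasher for CheapRandomState {
    type Hasher = DefaultHasher;
    fn build_hasher(&self) -> DefaultHasher {
        let mut h = DefaultHasher::new();
        h.write_u64(self.seed); // mix the per-construction state in up front
        h
    }
}

fn main() {
    let a = CheapRandomState::new();
    let b = CheapRandomState::new();
    // Distinct per-construction states without touching the RNG again.
    assert_ne!(a.seed, b.seed);
    let mut h = a.build_hasher();
    h.write_u64(7);
    let _ = h.finish();
    println!("ok");
}
```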
The build script in this repository is wrong in many ways.
Lines 5 to 6 in c9c9f61
Checking for nightly to enable features should not be done. If the feature ever changes, builds using old versions of this library on new nightlies will break, which is highly undesirable. Not everyone using nightly wants unstable features, and they should never be implicitly enabled behind the user's back. Use a feature that can be enabled by the user, or some global cfg, instead.
Lines 7 to 9 in c9c9f61
This checks the feature of the build script, i.e. of the host. That makes no sense: the host bears no resemblance to the target.
Lines 13 to 15 in c9c9f61
While this attempts to handle cross-compilation (unlike the previous check), it does not handle it correctly. By default this works, but when using cargo's cross-compilation mode, even for the same target as the host (by passing --target explicitly, which is done by Miri and also required for -Zbuild-std), RUSTFLAGS applies only to the target binaries, not to programs running on the host. Using cfg in the build script to detect whether the target feature is enabled does not work. Instead, check the cfg in the code and compile_error! when it is not enabled.
I need a deterministic BuildHasher for my use case and I'd love to try gxhash. Currently GxHasher has a with_seed constructor, but GxBuildHasher does not. Would it be possible to add this?
Also, can you confirm that the hashes produced by gxhash are identical on all supported platforms when the seed is the same?
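The requested API shape could look like the following sketch, which uses std's DefaultHasher as a stand-in for GxHasher so the example is self-contained; the SeededState name and seed-mixing are assumptions, not existing gxhash API:

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{BuildHasher, Hash, Hasher};

// A BuildHasher carrying a fixed seed: every hasher it builds starts from
// the same state, so hashing is deterministic across runs.
#[derive(Clone)]
struct SeededState {
    seed: u64,
}

impl BuildHasher for SeededState {
    type Hasher = DefaultHasher;
    fn build_hasher(&self) -> DefaultHasher {
        let mut h = DefaultHasher::new();
        h.write_u64(self.seed); // a real GxBuildHasher would call GxHasher::with_seed
        h
    }
}

fn hash_one<T: Hash>(state: &SeededState, value: &T) -> u64 {
    let mut h = state.build_hasher();
    value.hash(&mut h);
    h.finish()
}

fn main() {
    let state = SeededState { seed: 42 };
    // Deterministic: same seed, same input => same hash, across runs.
    assert_eq!(hash_one(&state, &"key"), hash_one(&state, &"key"));

    // Usable directly as a HashMap hasher.
    let mut map: HashMap<&str, u32, SeededState> = HashMap::with_hasher(state);
    map.insert("a", 1);
    assert_eq!(map["a"], 1);
    println!("ok");
}
```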
For input sizes >= 80 bytes that are a multiple of 16 (the vector length), the construction reads one vector out of bounds, making such hashes invalid.
This is a major bug for two reasons:
I just looked into replacing ahash with gxhash elsewhere and ran into HashMap::new() missing.
I reckon there may be more. I'll have a look later and may add some PRs to this issue. I'd keep it open until I'm certain the gxhash containers are full drop-in replacements for the std versions.
There are probably other issues as well, but this line is particularly problematic:
gxhash/src/gxhash/platform/x86.rs
Line 86 in 8bee61e
This is trivial to invert and allows you to create arbitrary seed-independent multicollisions. I would suggest not advertising DoS resistance on this hash at all.
// Not an endorsement of aes_crypto, just the first crate
// I could find that allows cross-platform single-round encryption.
use aes_crypto::AesBlock;

fn main() {
    let zero_key = AesBlock::zero();
    let mut s0 = [0u8; 192];
    let mut s1 = [0u8; 192];
    s0[64] = 100;
    s1[64] = 42;

    let v0 = AesBlock::new(s0[64..64 + 16].try_into().unwrap());
    v0.enc(zero_key).store_to(&mut s0[64 + 32..]);
    let v1 = AesBlock::new(s1[64..64 + 16].try_into().unwrap());
    v1.enc(zero_key).store_to(&mut s1[64 + 32..]);

    // Different strings.
    assert!(s0 != s1);
    // Collide regardless of seed.
    assert!(gxhash::gxhash128(&s0, 0) == gxhash::gxhash128(&s1, 0));
    assert!(gxhash::gxhash128(&s0, 0xdeadbeef) == gxhash::gxhash128(&s1, 0xdeadbeef));
}
This is a breaking API change, so it would require a major bump, but all the other hashing libraries I've seen use u64 for the seed. The i64 comes from the x64 intrinsic only working with signed integers (ARM has vdupq_n_u64). Could the external API be changed to u64, with the x64 backend doing an `as i64` conversion?
A more motivating reason to break the major version would be to make the function more DoS resistant, so if that's being done, maybe this could be done at the same time?
Currently, GxHasher::default() is not DoS resistant. That is because it uses a default state/seed of 0, which means an attacker can generate in advance a set of colliding keys for this seed and dramatically degrade the efficiency of a given HashMap / HashSet on a targeted service.
It is very possible to use GxHash with a seed (a full state, or a short, fixed-size i64 value). Thus, by making GxHasher::default() create a Hasher with a random state (using a cryptographically secure RNG), we can make it DoS resistant, as attackers will no longer be able to know in advance a set of colliding keys. An i64 seed is more than enough for this purpose.
Change GxHasher::default() to use a random seed.
I saw your post on dotnet/runtime#85206 about a C# port. Are you able to share it? I'd like to benchmark it both in terms of speed and correctness.
Hello, I find this project quite interesting. Do you think there could be a WASM-compatible version (perhaps just not manually vectorized) for applications that compile to both WASM and native targets?
Congratulations on the awesome performance and good quality metrics! 🎉
Thanks in advance,
Dherse
Currently, gxhash with a 128-bit state is much faster for small input sizes, while gxhash with a 256-bit state is faster for large input sizes. The idea is to study the possibility of a hybrid-state version of gxhash, leveraging the advantages of the 128-bit state together with those of 256-bit state processing, for maximum throughput across all input sizes.
There are two ways to achieve this:
Some challenges are:
I'm having some problems when using cargo doc with gxhash as a dependency. Setting RUSTFLAGS=... does not work :/ It still emits the "Gxhash requires aes and sse2 intrinsics..." error.
This does not happen with cargo build.
In get_partial_safe, it's possible to use aligned loads by declaring the buffer as MaybeUninit<State>, casting the pointer for std::ptr::copy, and zeroing the rest of the buffer, instead of declaring the buffer as a u8 array.
gxhash/src/gxhash/platform/x86_128.rs
Lines 35 to 39 in b2b9d24
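A minimal sketch of the suggestion, with a hypothetical 16-byte State standing in for gxhash's SIMD state type: the scratch buffer is declared as MaybeUninit<State> so it has vector alignment, zeroed, filled via a byte pointer, then read back as an aligned value.

```rust
use std::mem::MaybeUninit;

// Hypothetical stand-in for gxhash's SIMD State: 16 bytes, 16-aligned.
#[repr(C, align(16))]
#[derive(Clone, Copy)]
struct State([u8; 16]);

// Aligned variant of get_partial_safe: zero the whole vector-aligned
// buffer, copy the partial input in through a byte pointer, then perform
// an aligned load of the result.
fn get_partial_safe(data: &[u8]) -> State {
    let mut buf = MaybeUninit::<State>::zeroed();
    let len = data.len().min(16);
    unsafe {
        std::ptr::copy(data.as_ptr(), buf.as_mut_ptr().cast::<u8>(), len);
        buf.assume_init()
    }
}

fn main() {
    let s = get_partial_safe(&[1, 2, 3, 4]);
    assert_eq!(s.0[..4], [1, 2, 3, 4]);
    assert!(s.0[4..].iter().all(|&b| b == 0));
    println!("ok");
}
```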
How can I use AVX2 only for HashMap/HashSet, and a stable hash for gxhash128?
RUSTFLAGS="-C target-feature=+aes" cargo build --release
fn main() {
let bytes: &[u8] = "hello world".as_bytes();
let seed = 1234;
println!(" 32-bit hash: {:x}", gxhash::gxhash32(&bytes, seed));
println!(" 64-bit hash: {:x}", gxhash::gxhash64(&bytes, seed));
println!("128-bit hash: {:x}", gxhash::gxhash128(&bytes, seed));
}
OS: Ubuntu 23
Cargo: cargo 1.80.0-nightly (4de0094ac 2024-05-09)
Hi! It would be interesting to see a comparison of the time to hash a short string (like in the aHash README). Throughput is great, but since hashmap keys are often short strings, the time to hash 40 KB of data is less relevant than the time to hash e.g. 16-byte strings. A plot of hashing time per key size would be really helpful IMO.
For small inputs (<16 bytes), the length of the input is added to the input.
Value recovery
During compression, the byte array { 1, 2, 3, 4 } becomes the 16-byte vector <5, 6, 7, 8, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4>
It is then run through Finalize, which runs 4 rounds of AES encrypt with static keys:
output = Encrypt(input, seed);
output = Encrypt(output, key1);
output = Encrypt(output, key2);
output = Encrypt(output, key3);
It is possible for an attacker who knows the seed to recover the original value by inverting the function.
output = Decrypt(input, key3);
output = Decrypt(output, key2);
output = Decrypt(output, key1);
output = Decrypt(output, seed);
We get a masked input, but can simply unmask it by taking the length from the end of the vector and subtracting it from each of the values in the vector. <5, 6, 7, 8, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4> then becomes the original input <1, 2, 3, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0>
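The unmasking step above can be sketched in a few lines (using the worked example's values; the lane layout is taken from the description, not from gxhash's source):

```rust
// Unmask a decrypted small-input vector: the input length fills the
// trailing padding lanes, so subtracting it from every lane recovers
// the original bytes.
fn unmask(masked: [u8; 16]) -> [u8; 16] {
    let len = masked[15]; // length replicated in the trailing lanes
    masked.map(|b| b.wrapping_sub(len))
}

fn main() {
    let masked = [5, 6, 7, 8, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4];
    let original = [1, 2, 3, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0];
    assert_eq!(unmask(masked), original);
    println!("ok");
}
```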
Invertibility in a non-cryptographic hash function is not an issue, but it makes certain attacks much easier.
Seed recovery
In the case where an attacker does not know the seed, it is possible to recover the seed through brute-force, given only a single full-length hash output.
Let's say we are given a 128-bit hash value of <213, 184, 163, 143, 142, 216, 110, 155, 68, 66, 183, 166, 216, 225, 120, 52>
We run it through our inverted hash function above and get <148, 13, 13, 13, 13, 13, 13, 207, 13, 13, 58, 13, 13, 144, 13, 13>
At this point we don't know that the original input was <5, 6, 7, 8, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4>
, but we can brute-force the seed by running a single decrypt round: output = Decrypt(input, seed);
We know we have the right seed if the result of decryption ends with a repeating integer (the length). We can filter false positives by checking that the repeating bytes start length-number-of-bytes into the vector. Once we have the seed, we can recover the original value.
The crate seems to be missing a non-arch-specific implementation, which forces anyone using gxhash to select another hash and use cfg directives to choose between them (obviously, releasing non-specialized crates that only work on x86-64 and arm64 is not acceptable).
If having the user select another hash is desired, then all GxHash types should take the other hash as a generic parameter, for user convenience.
The code seems to hardcode AES keys. This makes no sense; it should instead use a mechanism similar to what the standard library uses to provide keys for SipHash.
I have noticed a strange behavior, and I am not sure whether it is intended:
The hashes of two u8 values a and b are the same regardless of order:
(a, b).hash(&mut state)
(b, a).hash(&mut state)
This is an issue in my particular use case because it prevents me from distinguishing between two distinct cases by hash alone.
I don't know whether this is actually a bug, as I am a total noob about cryptography and hash functions.
Thanks for your help/advice,
Dherse
"Combiniting" is not a word.
"Ideally, half of the bit should change on average." should be "Ideally, half of the hash should change on average"
I see blake3 (https://docs.rs/blake3/latest/blake3/struct.Hasher.html) notes:
"Hasher implements std::io::Write, so it's possible to use std::io::copy to update a Hasher from any reader. Unfortunately, this standard approach can limit performance, because copy currently uses an internal 8 KiB buffer that isn't big enough to take advantage of all SIMD instruction sets. (In particular, AVX-512 needs a 16 KiB buffer.) update_reader avoids this performance problem and is slightly more convenient."
I want to ask: how many bytes per write() call is most efficient when using gxhash?
Currently, each Hasher::write implies a finalization. This is a problem because hashing a string, for instance, needs 2 writes (see the write_str implementation), and thus 2 finalization steps, and finalization is the most expensive part of the hashing process.
Only finalize in finish.
The Hasher trait exposes methods to hash primitives. Currently, we hash primitives by treating them all as slices of bytes. Hashing can be much more performant if the type is known in advance (e.g. loading the primitive directly into a SIMD vector).
Triggered by the following discussion: rust-lang/hashbrown#487
Specialize the primitive write methods (write_u32, write_u64, ...)
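The idea of specializing the primitive write methods can be illustrated with a toy hasher (not gxhash's actual internals): the generic write path walks bytes one at a time, while the overridden write_u64 consumes the primitive in a single mixing step.

```rust
use std::hash::Hasher;

// Toy hasher demonstrating primitive-method specialization.
struct ToyHasher {
    state: u64,
}

impl Hasher for ToyHasher {
    // Generic path: byte-at-a-time over a slice.
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.state = self.state.rotate_left(8) ^ u64::from(b);
        }
    }

    // Specialized path: mix the whole primitive in one step,
    // instead of routing it through the byte-slice path.
    fn write_u64(&mut self, i: u64) {
        self.state = self.state.rotate_left(1) ^ i;
    }

    fn finish(&self) -> u64 {
        self.state
    }
}

fn main() {
    let mut h = ToyHasher { state: 0 };
    h.write_u64(0xDEAD_BEEF);
    // One call, one mixing step, instead of eight byte iterations.
    assert_eq!(h.finish(), 0xDEAD_BEEF);
    println!("ok");
}
```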
While you are correct that at the machine code level, one can read past an array bounds without invoking UB -- because at the machine code level, there is no array bounds, only linear memory -- this is not the case in higher level languages such as Rust, or even LLVM IR.
It appears this does not trigger any harmful optimization yet, but it may at any moment (especially when upgrading), so it would be best to look into replacing the current implementation.
It's so common for SIMD algorithms to wish to read beyond array bounds that the Rust programming language has included explicit support for it in the form of the simd_masked_load (and its counterpart, simd_masked_store) intrinsic.
Using the intrinsic guarantees that a Rust compiler, and its various backends, will correctly understand the developer's intent.
Unfortunately, documentation is sparse, so further testing (and discussion with developers) may be necessary to assert the precise safety guarantees of the intrinsic -- for example, whether it automatically handles reads beyond the end of an OS page, or not -- and double-check how good the generated code is.
Also unfortunately, it is an unstable intrinsic (requires nightly), which may make it unsuitable for use at the moment, though it could be enabled with a feature flag for nightly users' peace of mind.
A last resort implementation would be to directly use inline assembly. The implementation is small enough, and with only 2 target instruction sets, that it may not prove unduly complex, nor too much of a maintenance burden.
As titled — not xxhash, but xxh3.
How can I use AVX2 only for HashMap/HashSet, and a stable hash for gxhash128/64/32?
I often use AHashMap as a drop-in for HashMap in my code. Switching to GxHashMap in one of my projects, I get this:
error[E0277]: the trait bound `GxBuildHasher: Clone` is not satisfied
--> src/app.rs:60:26
|
59 | #[derive(Clone, Debug, Default)]
| ----- in this derive macro expansion
60 | struct UnicodeCollection(HashSet<char>);
| ^^^^^^^^^^^^^ the trait `Clone` is not implemented for `GxBuildHasher`
|
= note: required for `HashSet<char, GxBuildHasher>` to implement `Clone`
Here is a PR that, among other things, fixes this.
It seems like this is nightly-only, but it is probably pretty easy to get it working on stable. Here are some workarounds for the nightly features it uses:
pointer_byte_offsets: just cast back and forth: ptr.cast::<u8>().add(...).cast()
stmt_expr_attributes: it looks like you only use this to make #[allow(unused_assignments)] work. You can probably eliminate the attribute and work around it with something like let _tmp_ptr; before the if block, then load_unaligned(_tmp_ptr, ...)
core_intrinsics for likely: you may be able to reverse this and get the same effect with stable #[cold] instead. There is also likely_stable, which may work. You also might not even need it - not sure if you've benchmarked, but IIRC likely doesn't gain as much in recent LLVM versions.
stdsimd for _mm_loadu_epi8: I'm unfortunately not really sure what to do here; maybe you could use inline assembly?
Hello,
this is mostly an API question - should these two invocations yield the same hash? I would expect that they would, as they do for the DefaultHasher (see here).
use gxhash::*;
use std::hash::Hasher;
fn main() {
let mut h = GxHasher::with_seed(0);
h.write_u8(1);
dbg!(h.finish()); // 7327909443358324775
let mut h = GxHasher::with_seed(0);
h.write(&[1u8]);
dbg!(h.finish()); // 8871263162142091921
}
Kind regards,
ambiso
Running benches/hashset.rs (target/release/deps/hashset-e0bfa844d705dd83)
Gnuplot not found, using plotters backend
HashSet/u32/Default Hasher
time: [16.380 ns 16.711 ns 17.067 ns]
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
Benchmarking HashSet/u32/GxHash: Warming up for 3.0000 s
error: bench failed, to rerun pass `--bench hashset`
Caused by:
process didn't exit successfully: `/root/git/gxhash/target/release/deps/hashset-e0bfa844d705dd83 --bench` (signal: 4, SIGILL: illegal instruction)
Running benches/ilp.rs (target/release/deps/ilp-18f7b97131886606)
Gnuplot not found, using plotters backend
baseline time: [134.34 µs 135.20 µs 136.28 µs]
Found 8 outliers among 100 measurements (8.00%)
5 (5.00%) high mild
3 (3.00%) high severe
unrolled time: [129.67 µs 132.18 µs 136.40 µs]
Found 12 outliers among 100 measurements (12.00%)
2 (2.00%) high mild
10 (10.00%) high severe
temp time: [54.313 µs 55.820 µs 57.345 µs]
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
laned time: [46.377 µs 47.916 µs 49.764 µs]
Found 4 outliers among 100 measurements (4.00%)
4 (4.00%) high mild
Running benches/throughput/main.rs (target/release/deps/throughput-ce001a906bb8460b)
gxhash-avx2
error: bench failed, to rerun pass `--bench throughput`
Caused by:
process didn't exit successfully: `/root/git/gxhash/target/release/deps/throughput-ce001a906bb8460b --bench` (signal: 4, SIGILL: illegal instruction)
Running benches/throughput_criterion.rs (target/release/deps/throughput_criterion-c233dcb596180928)
Gnuplot not found, using plotters backend
Benchmarking all/gxhash-avx2/4: Warming up for 3.0000 s
error: bench failed, to rerun pass `--bench throughput_criterion`
Caused by:
process didn't exit successfully: `/root/git/gxhash/target/release/deps/throughput_criterion-c233dcb596180928 --bench` (signal: 4, SIGILL: illegal instruction)
error: 3 targets failed:
`--bench hashset`
`--bench throughput`
`--bench throughput_criterion`
~/git/gxhash main 1m 30s root@u2 13:43:20
❯ cargo bench --no-fail-fast -F=avx2