Comments (7)
I'd like to keep this open, since we really need to fix the hasher.
from hashbrown.
It seems that FxHash is generating particularly poor hashes. If you make hashbrown use the libstd default hasher (SipHash), then it generates comparable results:
let mut m = HashMap::with_hasher(std::collections::hash_map::RandomState::new());
Ah okay, I just wasn't sure whether there was something strange going on, or whether this is expected/not surprising. I figured that since SipHash is DoS-resistant, and since I don't need that security for my use case, I could go with hashbrown and get a bit of a speedup. I'll just stick with the built-in HashMap unless I'm dealing with numbers. Thanks for your help!
Ah, great! I'm too new to Rust (and hashing algorithms) to help out myself at this point, but I'm looking forward to testing out the improvements! 👍
FxHash is known to generate poor hashes often. I'd consider switching to SeaHash. It's what I did in my crate ccl.
Fixed by #97
For anyone who's arrived here from a Google search trying to find a fast deterministic hash function for hashmaps, I've learned a few things:
- The stdlib `HashMap` can be made deterministic so long as the `Hasher` is deterministic. For example, `HashMap::<_, _, BuildHasherDefault<DefaultHasher>>::default()` is deterministic.
- `seahash` is actually not faster than the stdlib.
~/Desktop> rust-script -w hyperfine foo.rs # stdlib
Benchmark 1: /home/asaveau/.cache/rust-script/binaries/release/foo_d757ec98400a8a2dbe591e4e
Time (mean ± σ): 224.0 ms ± 7.7 ms [User: 224.2 ms, System: 0.3 ms]
Range (min … max): 216.4 ms … 244.8 ms 12 runs
~/Desktop> nano foo.rs
12sec ~/Desktop> rust-script -w hyperfine foo.rs # seahash
Benchmark 1: /home/asaveau/.cache/rust-script/binaries/release/foo_d757ec98400a8a2dbe591e4e
Time (mean ± σ): 276.5 ms ± 13.9 ms [User: 276.1 ms, System: 0.0 ms]
Range (min … max): 265.9 ms … 311.3 ms 10 runs
These numbers come from the following benchmarking code (adapted from the original issue so that it neither allocates memory nor depends on the system in any way; it is purely compute-bound):
#!/usr/bin/env rust-script
//! Dependencies can be specified in the script file itself as follows:
//!
//! ```cargo
//! [dependencies]
//! itoa = "1"
//! rand = "0.8"
//! rand_xoshiro = "0.6"
//! seahash = "4"
//! ```
use rand::Rng;
use rand::SeedableRng;
use rand_xoshiro::Xoshiro256PlusPlus;
use std::time::Instant;
use std::hash::*;
fn main() {
    let iters = 10000000;
    // let mut hasher = seahash::SeaHasher::new();
    let mut hasher = DefaultHasher::new();
    let mut rng = Xoshiro256PlusPlus::seed_from_u64(42);
    let mut buf = itoa::Buffer::new();
    let now = Instant::now();
    for _ in 0..iters {
        let s = buf.format(rng.gen_range(0..2u32.pow(31)));
        hasher.write(s.as_bytes());
    }
    println!("{:?} {}", now.elapsed(), hasher.finish());
}
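To illustrate the first point in the list above, here is a minimal sketch that uses only the standard library. The names `DeterministicMap`, `hash_with` and `iteration_order` are made up for this example. It relies on `DefaultHasher::default()` starting from fixed keys, whereas `RandomState::new()` picks random keys. Note that the algorithm behind `DefaultHasher` is unspecified and may change between Rust releases, so the hashes should only be expected to repeat for a given toolchain.

```rust
use std::collections::hash_map::{DefaultHasher, RandomState};
use std::collections::HashMap;
use std::hash::{BuildHasher, BuildHasherDefault, Hash, Hasher};

// Hypothetical alias: a std HashMap whose hasher always starts from the same state.
type DeterministicMap<K, V> = HashMap<K, V, BuildHasherDefault<DefaultHasher>>;

// Hash a string key with whatever hasher the given BuildHasher produces.
fn hash_with<S: BuildHasher>(state: &S, key: &str) -> u64 {
    let mut hasher = state.build_hasher();
    key.hash(&mut hasher);
    hasher.finish()
}

// Insert the keys into a DeterministicMap and report the iteration order.
fn iteration_order(keys: &[u32]) -> Vec<u32> {
    let mut map: DeterministicMap<u32, ()> = DeterministicMap::default();
    for &k in keys {
        map.insert(k, ());
    }
    map.keys().copied().collect()
}

fn main() {
    // Two independently built deterministic states agree on every hash.
    let a = BuildHasherDefault::<DefaultHasher>::default();
    let b = BuildHasherDefault::<DefaultHasher>::default();
    assert_eq!(hash_with(&a, "12345"), hash_with(&b, "12345"));

    // Two RandomState instances are keyed differently, so their hashes
    // will almost certainly differ (printed rather than asserted).
    let r1 = RandomState::new();
    let r2 = RandomState::new();
    println!(
        "RandomState hashes equal: {}",
        hash_with(&r1, "12345") == hash_with(&r2, "12345")
    );

    // Same inserts in the same order give the same iteration order.
    let keys = [7, 42, 1000, 3, 99];
    assert_eq!(iteration_order(&keys), iteration_order(&keys));
    println!("deterministic order: {:?}", iteration_order(&keys));
}
```

The trade-off is the one discussed earlier in the thread: fixing the hasher state gives up the DoS resistance that the randomly keyed default provides, so this only makes sense when the keys are not attacker-controlled.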