
Comments (7)

Amanieu commented on July 17, 2024

I'd like to keep this open, since we really need to fix the hasher.

from hashbrown.

Amanieu commented on July 17, 2024

It seems that FxHash is generating particularly poor hashes. If you make hashbrown use the libstd default hasher (SipHash) then it generates comparable results:

let mut m = HashMap::with_hasher(std::collections::hash_map::RandomState::new());


josephrocca commented on July 17, 2024

Ah okay, I just wasn't sure whether there was something strange going on, or whether this is expected/not surprising. I figured that since SipHash is DoS-resistant, and since I don't need that security for my use case, I could go with hashbrown and get a bit of a speedup. I'll just stick with the built-in HashMap unless I'm dealing with numbers. Thanks for your help!


josephrocca commented on July 17, 2024

Ah, great! I'm too new to Rust (and hashing algorithms) to help out myself at this point, but I'm looking forward to testing out the improvements! 👍


xacrimon commented on July 17, 2024

FxHash is known to generate poor hashes often. I'd consider switching to SeaHash. It's what I did in my crate ccl.


Amanieu commented on July 17, 2024

Fixed by #97


SUPERCILEX commented on July 17, 2024

For anyone who's arrived here from Google searches trying to find a fast deterministic hash function for hash maps, I've learned a few things:

  • The stdlib HashMap can be made deterministic so long as the Hasher is deterministic. For example, HashMap::<_, _, BuildHasherDefault<DefaultHasher>>::default() is deterministic.
  • seahash is actually not faster than the stdlib.
~/Desktop> rust-script -w hyperfine foo.rs     # stdlib
Benchmark 1: /home/asaveau/.cache/rust-script/binaries/release/foo_d757ec98400a8a2dbe591e4e
  Time (mean ± σ):     224.0 ms ±   7.7 ms    [User: 224.2 ms, System: 0.3 ms]
  Range (min … max):   216.4 ms … 244.8 ms    12 runs
 
~/Desktop> nano foo.rs
12sec ~/Desktop> rust-script -w hyperfine foo.rs     # seahash
Benchmark 1: /home/asaveau/.cache/rust-script/binaries/release/foo_d757ec98400a8a2dbe591e4e
  Time (mean ± σ):     276.5 ms ±  13.9 ms    [User: 276.1 ms, System: 0.0 ms]
  Range (min … max):   265.9 ms … 311.3 ms    10 runs

This was measured with the following benchmarking code. It is modified from the original issue so that it does not allocate memory or depend on the system in any way; it is purely compute bound:

#!/usr/bin/env rust-script
//! Dependencies can be specified in the script file itself as follows:
//!
//! ```cargo
//! [dependencies]
//! itoa = "1"
//! rand = "0.8"
//! rand_xoshiro = "0.6"
//! seahash = "4"
//! ```

use rand::Rng;
use rand::SeedableRng;
use rand_xoshiro::Xoshiro256PlusPlus;

use std::time::Instant;
use std::hash::*;

fn main() {
    let iters = 10000000;

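    // Swap which of the next two lines is commented out to benchmark
    // seahash instead of the stdlib default hasher.
    // let mut hasher = seahash::SeaHasher::new();
    let mut hasher = DefaultHasher::new();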

    let mut rng = Xoshiro256PlusPlus::seed_from_u64(42);
    let mut buf = itoa::Buffer::new();

    let now = Instant::now();
    for _ in 0..iters {
        let s = buf.format(rng.gen_range(0..2u32.pow(31)));
        hasher.write(s.as_bytes());
    }
    println!("{:?} {}", now.elapsed(), hasher.finish());
}
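
To make the first point above concrete, here is a minimal sketch (std only) of a stdlib HashMap built with BuildHasherDefault<DefaultHasher>. The alias DeterministicMap and the helpers hash_with, fixed_hash and iteration_order are names I made up for illustration; they are not part of std or hashbrown. The sketch assumes the standard library's current behaviour, where a default-constructed DefaultHasher uses fixed keys, so the same key hashes to the same value on every call and two maps filled with the same inserts iterate in the same order.

```rust
use std::collections::hash_map::{DefaultHasher, RandomState};
use std::collections::HashMap;
use std::hash::{BuildHasher, BuildHasherDefault, Hash, Hasher};

/// Hypothetical alias for illustration: a stdlib HashMap whose hasher
/// is default-constructed (fixed keys) instead of randomly seeded.
type DeterministicMap<K, V> = HashMap<K, V, BuildHasherDefault<DefaultHasher>>;

/// Hash `key` with a hasher produced by `build`.
fn hash_with<S: BuildHasher>(build: &S, key: &str) -> u64 {
    let mut hasher = build.build_hasher();
    key.hash(&mut hasher);
    hasher.finish()
}

/// Hash `key` with a freshly created fixed-key builder on every call.
fn fixed_hash(key: &str) -> u64 {
    hash_with(&BuildHasherDefault::<DefaultHasher>::default(), key)
}

/// Build a fresh map with the same inserts and return its iteration order.
fn iteration_order() -> Vec<u32> {
    let mut map: DeterministicMap<u32, u32> = DeterministicMap::default();
    for i in 0..1000 {
        map.insert(i, i * 2);
    }
    map.keys().copied().collect()
}

fn main() {
    // Independent builders agree on the hash of the same key.
    assert_eq!(fixed_hash("hashbrown"), fixed_hash("hashbrown"));
    // Independent maps with identical inserts iterate identically.
    assert_eq!(iteration_order(), iteration_order());

    // RandomState uses randomly generated keys, so these values are
    // expected to change from one run of the program to the next.
    let a = hash_with(&RandomState::new(), "hashbrown");
    let b = hash_with(&RandomState::new(), "hashbrown");
    println!("RandomState: {a:x} and {b:x}");
    println!("fixed keys:  {:x}", fixed_hash("hashbrown"));
}
```

Note that this gives up the DoS resistance that the randomly seeded default provides, so it only makes sense when the keys are not attacker-controlled.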
