Git Product home page Git Product logo

mih-rs's Introduction

mih-rs

Documentation Crates.io License: MIT

Rust implementation of multi-index hashing (MIH) for neighbor searches on binary codes in the Hamming space, described in the paper

Norouzi, Punjani, and Fleet, Fast exact search in Hamming space with multi-index hashing, IEEE TPAMI, 36(6):1107โ€“ 1119, 2014.

As the benchmark result shows, on 10 million 64-bit codes, mih-rs can perform top-k searches 19โˆ’94 times faster than linear search when k = 1..100.

Features

  • Two types of neighbor searches: mih-rs provides the two search operations:

    • Range search finds neighbor codes whose Hamming distances to a given code are within a radius.
    • Top-K search finds the top-K codes that are closest to a given code.
  • Fast and memory-efficient implementation: The data structure is built on sparse hash tables, following the original implementation.

  • Parameter free: mih-rs automatically sets an optimal parameter of MIH depending on a given database (although you can also set this manually).

  • Serialization: mih-rs supports to serialize/deserialize the index.

Example

use mih_rs::Index;

// Database of codes
let codes: Vec<u64> = vec![
    0b1111111111111111111111011111111111111111111111111011101111111111, // #zeros = 3
    0b1111111111111111111111111111111101111111111011111111111111111111, // #zeros = 2
    0b1111111011011101111111111111111101111111111111111111111111111111, // #zeros = 4
    0b1111111111111101111111111111111111111000111111111110001111111110, // #zeros = 8
    0b1101111111111111111111111111111111111111111111111111111111111111, // #zeros = 1
    0b1111111111111111101111111011111111111111111101001110111111111111, // #zeros = 6
    0b1111111111111111111111111111111111101111111111111111011111111111, // #zeros = 2
    0b1110110101011011011111111111111101111111111111111000011111111111, // #zeros = 11
];

// Query code
let qcode: u64 = 0b1111111111111111111111111111111111111111111111111111111111111111; // #zeros = 0

// Construct the index
let index = Index::new(codes).unwrap();

// Find the ids of neighbor codes whose Hamming distances are within 2
let mut searcher = index.range_searcher();
let answers = searcher.run(qcode, 2);
assert_eq!(answers, vec![1, 4, 6]);

// Find the ids of the top-4 nearest neighbor codes
let mut searcher = index.topk_searcher();
let answers = searcher.run(qcode, 4);
assert_eq!(answers, vec![4, 1, 6, 0]);

// Serialization/Deserialization
let mut data = vec![];
index.serialize_into(&mut data).unwrap();
let other = Index::<u64>::deserialize_from(&data[..]).unwrap();
assert_eq!(index, other);

Binary code types

mih_rs::Index can be built from a vector of type mih_rs::CodeInt that is a primitive integer trait supporting a popcount operation. Currently, this library defines mih_rs::CodeInt for u8, u16, u32, and u64.

Benchmark

timeperf_topk.rs offers the benchmark of top-K search for MIH and LinearSearch algorithms on binary code types u32 and u64.

The following table shows the result of average search times in milliseconds per query, in the settings:

  • Database: N random codes from a uniform distribution.
  • Query set: 100 random codes from a uniform distribution.
  • Machine: MacBook Pro (2019) of Quad-Core Intel Core i5 @2.4 GHz with 16 GB of RAM.
  • Library version: v0.2.0

Result for u32

Algorithm N=10,000 N=100,000 N=1,000,000 N=10,000,000
MIH (K=1) 0.01 0.02 0.07 0.38
MIH (K=10) 0.04 0.08 0.30 1.06
MIH (K=100) 0.13 0.22 1.22 4.35
LinearSearch 0.36 4.40 50.96 626.87

Result for u64

Algorithm N=10,000 N=100,000 N=1,000,000 N=10,000,000
MIH (K=1) 0.10 0.36 1.46 6.7
MIH (K=10) 0.20 0.76 3.72 14.8
MIH (K=100) 0.41 1.53 7.02 33.2
LinearSearch 0.36 4.36 52.28 629.1

Licensing

This library is free software provided under MIT.

mih-rs's People

Contributors

hirosassa avatar kampersanda avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.