Git Product home page Git Product logo

large-hashmap's Introduction

Concurrent Large Hash Map

A scalable, concurrent hash map implementation incorporating extendible hashing[1] and hopscotch hashing[3]. These two techniques are employed hierarchically—each of the buckets indexed by the extendible hashing directory is treated as an independent hash table, which is managed internally using hopscotch hashing techniques. Discussions of both hashing techniques use the term bucket for concepts that, from the perspective of this project, are dissimlar. To avoid confusion, the units indexed by extensible hashing are called segments here. The term bucket in this project is consistent with the definition used in discussions of hopscotch hashing: a set of map entries that share the same hash index within a segment, all located in close proximity to that index in the segment's array structure. In addition to establishing a hierarchical structure, these hashing techniques address different problems. Extendible hashing manages map growth; hopscotch hashing resolves collisions and distributes map entries in the underlying array for efficient retrieval.

Concurrency

All operations are thread-safe. Retrieval operations do not entail blocking; they execute concurrently with update operations. Update operations may block if overlapping updates are attempted on the same segment. Segmentation is configurable, so the probability of update blocking is (to a degree) under the programmer's control. Retrievals reflect the result of update operations that complete before the onset of the retrieval, and may also reflect the state of update operations that complete after the onset of the retrieval. Iterators do not throw `java.util.ConcurrentModificationException`. If an entry is contained in the map prior to the iterator's creation and it is not removed before the iterator is exhausted, it will be returned by the iterator. The entries returned by an iterator may or may not reflect updates that occur after the iterator's creation. Iterators themselves are not thread-safe; although iterators can be used concurrently with operations on the map and other iterators, an iterator should not be used by multiple threads.

Extendible Hashing

Extendible hashing allows the map to expand gracefully, distributing the cost of resizing into constant-sized increments as the map grows. The map is partitioned into fixed-size segments. Hash values are mapped to segments through a central directory, which is, essentially, a radix search tree flattened into an array. When a segment reaches the load factor threshold it splits into two segments. When a split would exceed directory capacity, the directory doubles in size. The current implementation does not merge segments to reduce capacity as entries are removed.

Concurrency during splits and directory expansion

Ellis[2] describes strategies for concurrent operations on extendible hash tables. The strategy used in this project is informed by this paper, but does not follow it precisely. When an update causes a segment to split, the updating thread acquires a lock on the directory to assign references to the new segments in the directory. If a split forces the directory to expand, the updating thread keeps the directory locked during expansion. A directory lock will not block a concurrent update unless that update forces a segment split.

Hopscotch Hashing

This implementation uses Hopscotch hashing within segments for collision resolution. Hopscotch hashing is similar to linear probing, except that colliding map entries all reside in relatively close proximity to each other, resulting in shorter searches and improved cache hit rates. Hopscotch hashing also yields good performance at load factors as high as 0.9.

At present, the hop range (longest distance from the hash index that a collision can be stored) is set at 32, and the maximum search range for an empty slot during an add operation is 512. Hopscotch hashing was designed to support a high degree of concurrency, including non-blocking retrieval. This implementation follows the concurrency strategies described in the originating papers, with minor variations.

Long Hash Codes and Key Adapters

ConcurrentLargeHashMap is designed to support hash maps that can expand to very large size (> 232 items). To that end, it uses 64-bit hash codes. Lacking a 64-bit cognate of `Object.hashCode()`, this project defines a *key adapter* interface that acts as an intermediary between keys and the hash map. The hash map uses a key adapter to obtain 64-bit hash codes for keys, and to perform matching operations between keys stored in the map and keys passed as method parameters.

SpookyHash

The quality of the hash function used with the hash map directly affects overall performance. An implementation of Bob Jenkins' SpookyHash V2[4] algorithm is included in this project, and is highly recommended. The performance of SpookyHash, with respect to both speed and the statistical properties of the resulting hash codes, compares favorably among hashing algorithms that produce 64- or 128-bit results.

[1] Fagin, et al, "Extendible Hashing - A Fast Access Method for Dynamic Files", ACM TODS Vol. 4 No. 3, Sept. 1979

[2] Ellis, "Extendible Hashing for Concurrent Operations and Distributed Data", PODS 1983

[3] Herlihy, et al, "Hopscotch Hashing", DISC 2008.

[4] http://burtleburtle.net/bob/hash/spooky.html

large-hashmap's People

Contributors

edgeofmagic avatar

Stargazers

 avatar Sridhar M avatar glowinthedark avatar Swaraj avatar Andrejs Jermakovics avatar

Watchers

James Cloos avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.