evmap's Introduction

A lock-free, eventually consistent, concurrent multi-value map.

This map implementation allows reads and writes to execute entirely in parallel, with no implicit synchronization overhead. Reads never take locks on their critical path, and neither do writes assuming there is a single writer (multi-writer is possible using a Mutex), which significantly improves performance under contention. It is backed by the concurrency primitive left-right.

The trade-off exposed by this module is one of eventual consistency: writes are not visible to readers except following explicit synchronization. Specifically, readers only see the operations that preceded the last call to WriteHandle::flush by a writer. This lets writers decide how stale they are willing to let reads get. They can refresh the map after every write to emulate a regular concurrent HashMap, or they can refresh only occasionally to reduce the synchronization overhead at the cost of stale reads.

For read-heavy workloads, the scheme used by this module is particularly useful. Writers can afford to refresh after every write, which provides up-to-date reads, and readers remain fast as they never need to take locks.

The map is multi-value, meaning that every key maps to a collection of values. This introduces some memory cost by adding a layer of indirection through a Vec for each value, but enables more advanced use. This choice was made as it would not be possible to emulate such functionality on top of the semantics of this map (think about it -- what would the operational log contain?).

To facilitate more advanced use cases, each of the two maps also carries some customizable meta-information. The writers may update this at will, and when a refresh happens, the current meta will also be made visible to readers. This could be useful, for example, to indicate what time the refresh happened.

Performance

These benchmarks are outdated at this point, but communicate the right point. Hopefully I'll have a chance to update them again some time soon.

I've run some benchmarks of evmap against a standard Rust HashMap protected by a reader-writer lock, as well as against chashmap — a crate which provides "concurrent hash maps, based on bucket-level multi-reader locks". The benchmarks were run using the binary in benchmark/ on a 40-core machine with Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz CPUs.

The benchmark runs a number of reader and writer threads in tight loops, each of which does a read or write to a random key in the map respectively. Results for both uniform and skewed distributions are provided below. The benchmark measures the average number of reads and writes per second as the number of readers and writers increases.

Preliminary results show that evmap performs well under contention, especially on the read side. This benchmark represents the worst-case usage of evmap in which every write also does a refresh. If the map is refreshed less often, performance increases (see bottom plot).

[Plots: read throughput; write throughput; write throughput with less frequent refreshes]

evmap's People

Contributors

elichai, fintelia, glittershark, jbg, jonhoo, kolemannix, ms705, novacrazy, ptravers, shaneeverittm, srishanbhattarai, steffahn


evmap's Issues

Adding serde support?

The crate's dependency hashbrown already provides serde support, so why doesn't evmap provide serde support as well?

Am I understanding correctly that wait-free writes are not supported?

I was about to start using this package because I have a single-writer, multi-reader setup where the write operation cannot wait for readers, so a traditional RwLock or Mutex can be annoying. The description says that evmap has "lock-free writes" but I don't see how that would work. Both flush and refresh will block the writer until the readers are done.

I would want a situation where writers can freely add operations without ever having to wait, and as soon as there are no more readers, the readers' side is updated -- on the readers' time, not on the writer's time.

It seems like by "lock-free writes" you intend to say "writes don't prevent new reads". The phrasing is ambiguous, as I took it to mean "writes don't need to wait for a lock".

Eventually consistent writes

I wrote an eventually consistent LRU earlier this week by combining crossbeam with evmap. The concept can easily be adapted for non-LRU use to facilitate concurrent eventual writes, with a bounded version built on crossbeam::queue::ArrayQueue and an unbounded version built on crossbeam::queue::SegQueue. If you want this as part of evmap, you can run with the idea, or I can push up a PR and we can collaborate; otherwise I'll release this as a separate crate.

ReadHandle (and others) are unsound due to missing Send and Sync bounds

Unfortunately, due to an oversight, Inner doesn't inherit any requirements on Send / Sync to be Send / Sync, so ReadHandle is unconditionally Send. This is because AtomicPtr has no Send bound on its T (understandably, since you can technically use it from safe code with a non-Send type), so the auto Send/Sync bounds didn't get applied for K, V, M, or S. In practice, though, most unsafe code will actually dereference the pointer at some point, so the unconditional impl is usually wrong in that case. (It might be nice to have some on-by-default lint for use of AtomicPtr without manually checking the appropriate bounds.)

Just not having bounds on Send wouldn't necessarily make ReadHandle unsound, but I believe it is. ReadHandle effectively owns a MapReadRef which is (AFAIK, correctly) only Send when K, V, M, and S are Sync (fortunately, I think they don't need to be Send since you can't get an owned instance of any of those out of a MapReadRef directly, so I believe that part is sound). Since ReadHandle owns (and provides access to) MapReadRef, it follows that ReadHandle: Send should at least require K, V, M, S : Sync.

There are a few other ways to access values of the generic types in question on ReadHandle besides via the MapReadRef, but from what I saw none of them provide owned access to any of K, V, M, or S (at least, not owned access that calls any safe trait methods or hands them to the user, though maybe internally it does something like that), so I think just the Sync bound on the type parameters is also sufficient for safety of ReadHandle: Send, not Send+Sync on the type parameters, which is actually somewhat unusual for stuff like this (Sync containing !Sync is much more common than Send containing !Send). The weak bounds are due (I think) to the fact that the reader types don't ever call Drop on values of the owned type, only the writer type does, and there's nothing like Arc::make_mut and friends, so evmap implements a "weaker" form of sharing than Arc does.

The unsoundness of at least one other type (ReadHandleFactory) follows from this same root cause of using an AtomicPtr (arguably, it's worse, since it's not only unconditionally Send but also unconditionally Sync) and needs to be fixed in the same way (Arc<T> requires T: Send+Sync for both Send and Sync, and the AtomicPtr is in an Arc, so you don't need to explicitly impl !Send here).

More subtly, WriteHandle is also unsound for the same reason (this should get fixed automatically if ReadHandle is, since it owns a ReadHandle). At first, I actually thought WriteHandle should be able to be made sound without changing its Send requirements from what they currently are (which would require an explicit Send impl if the ReadHandle fix were added), because you can only write/drop through a unique WriteHandle, and you can't write at the same time as the Deref implementation on WriteHandle is active. Unfortunately, (1) we always create a ReadHandle on startup for some reason (despite the fact that WriteHandle already owns one internally?), and (2) even if we didn't, the Deref implementation on WriteHandle lets you clone the ReadHandle (though (2) is technically solvable by putting the relevant Sync bounds on the Deref implementation, or by not having the Deref implementation and exposing access to the non-clone / factory methods via wrappers instead). So WriteHandle should really just inherit the extra bounds from its owned ReadHandle.

There are probably a few other types with the same issue.

Current implementation is unsound. Segfault and double free are possible with out-of-sync maps from a bad `PartialEq` implementation.

So basically, the current implementation seems to rely on the PartialEq implementation of a value for removing it. This is totally unsafe for the very same reasons why retain is unsafe: you cannot guarantee that the PartialEq implementation gives the same result on two separate calls with the same arguments.

use std::sync::atomic::{AtomicBool, Ordering::SeqCst};
static SNEAKY: AtomicBool = AtomicBool::new(false);

#[derive(Eq, Hash)]
struct Sneaky(Vec<u8>);
impl PartialEq for Sneaky {
    fn eq(&self, other: &Self) -> bool {
        if SNEAKY.swap(false, SeqCst) {
            false
        } else {
            self.0 == other.0
        }
    }
}

fn main() {
    let (_, mut w) = evmap::new::<u8, Box<Sneaky>>();
    w.insert(0, Box::new(Sneaky(vec![0])));
    w.refresh();
    w.remove(0, Box::new(Sneaky(vec![0])));
    SNEAKY.store(true, SeqCst);
    w.refresh();
    w.refresh();
    w.refresh();
}
$ cargo run
   Compiling evmaptest v0.1.0 (/home/frank/[...]/evmaptest)
    Finished dev [unoptimized + debuginfo] target(s) in 0.78s
     Running `target/debug/evmaptest`
[1]    30645 segmentation fault (core dumped)  cargo run

Change to struct Sneaky(u8); and Box::new(Sneaky(0)) and you get

free(): double free detected in tcache 2
[1]    31123 abort (core dumped)  cargo run

instead.


Since Rust doesn’t have (and won’t have anytime soon) an effects system, we cannot guarantee “pure” implementations for e.g. PartialEq on the value type. I guess the only way to fix this is to make sure that the equality check happens only once, and that e.g. some kind of raw index into the bag is determined and stored into the op-log. Only that index is then used for removing the same value from the other map.

I’m not familiar enough with the implementation details of this crate to assess whether this is possible or how hard a change it would be, but it seems like some nontrivial changes are needed. I’d suspect that PartialEq implementations (and perhaps also the Hash implementation?) of both the key and the value type might be able to cause problems (I haven’t tested the behavior of the map for bad/sneaky key types yet, though).

Also, @jonhoo I wanna say thanks for the great coding streams.


A bit of testing shows that this exact code can produce a segfault on every version starting from version 5. I only tested the latest minor versions, i.e.: 5.1.1, 6.0.1, 7.1.3, 8.0.1, 9.0.1, 10.0.2, 11.0.0-alpha.1; as well as 5.0.0 and the current master and left-right branches.


This code variation
use std::sync::atomic::{AtomicBool, Ordering::SeqCst};
static SNEAKY: AtomicBool = AtomicBool::new(false);

#[derive(Eq, Hash, Debug)]
struct Sneaky(Vec<u8>);
impl PartialEq for Sneaky {
    fn eq(&self, other: &Self) -> bool {
        if SNEAKY.swap(false, SeqCst) {
            false
        } else {
            self.0 == other.0
        }
    }
}

fn main() {
    let (r, mut w) = evmap::new::<u8, Box<Sneaky>>();
    let dbg = || r.get_and(&0, |vs| {
        println!("{:?}", vs);
    });
    w.insert(0, Box::new(Sneaky(vec![0])));
    w.refresh();
    w.remove(0, Box::new(Sneaky(vec![0])));
    SNEAKY.store(true, SeqCst);
    w.refresh();
    r.get_and(&0, |vs| {
        println!("{:?}", vs);
    });
    w.refresh();
    w.refresh();
    r.get_and(&0, |vs| {
        println!("{:?}", vs);
    });
}

leads to a segfault on 2.0.0, 2.0.2, 3.0.1, and 4.2.3 (and probably every other version in the 2-4 range as well).

Version 1.* didn’t have ShallowCopy yet, so it probably should be fine.

Feature request: Weak<T> support

I think it would be interesting to make an access-log based concurrent LRU that functions by having a circular buffer of strong references via crossbeam::queue::ArrayQueue that uses retain to clean up anything without strong references. For this to work though, Weak needs to be ShallowCopy and probably something clever needs to be done about the Eq trait bound. Thoughts?

Evmap to file (persistence)

Hi,

Not an issue, but what is the best way to save the contents of an evmap to a file and load it back at runtime?

An IO BufReader reading line by line with a split?
Thanks again :)
