
quick-cache's Introduction

Quick Cache


Lightweight and high-performance concurrent cache optimized for low cache overhead.

  • Small overhead compared to a concurrent hash table
  • Scan-resistant and high hit rate caching policy
  • User defined weight per item
  • Scales well with the number of threads
  • Atomic operations with get_or_insert and get_value_or_guard functions
  • Atomic async operations with get_or_insert_async and get_value_or_guard_async functions
  • Doesn't use background threads
  • Only trivially verifiable usages of unsafe
  • Small dependency tree

The implementation is optimized for use cases where cache access times and overhead can add up to a significant cost. Features like time to live, iterators, event listeners, and others are not currently implemented in Quick Cache. If you need these features, you may want to take a look at the Moka crate.

Examples

Basic usage

use quick_cache::unsync::Cache;

fn main() {
    let mut cache = Cache::new(5);
    cache.insert("square", "blue");
    cache.insert("circle", "black");
    assert_eq!(*cache.get(&"square").unwrap(), "blue");
    assert_eq!(*cache.get(&"circle").unwrap(), "black");
}
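
Note the mut binding above: unsync::Cache does no internal synchronization, so its mutating methods take &mut self. For shared access across threads, use sync::Cache as in the following examples.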

A cache with custom item weights, in this case based on the string length of the value.

use quick_cache::{Weighter, sync::Cache};

#[derive(Clone)]
struct StringWeighter;

impl Weighter<u64, String> for StringWeighter {
    fn weight(&self, _key: &u64, val: &String) -> u32 {
        // Be cautious about zero weights!
        val.len().clamp(1, u32::MAX as usize) as u32
    }
}

fn main() {
    let cache = Cache::with_weighter(100, 100_000, StringWeighter);
    cache.insert(1, "1".to_string());
    cache.insert(54, "54".to_string());
    cache.insert(1000, "1000".to_string());
    assert_eq!(cache.get(&1000).unwrap(), "1000");
}
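
Here the first argument (100) is the estimated items capacity and the second (100_000) is the total weight capacity, matching the Cache::with_weighter(estimated_items_capacity, weight_capacity, weighter) signature referenced in the issues below; eviction is driven by the sum of item weights rather than the item count.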

Using the Equivalent trait for complex keys

use quick_cache::{sync::Cache, Equivalent};

#[derive(Debug, Hash)]
pub struct Pair<A, B>(pub A, pub B);

impl<A, B, C, D> Equivalent<(C, D)> for Pair<A, B>
where
    A: PartialEq<C>,
    B: PartialEq<D>,
{
    fn equivalent(&self, rhs: &(C, D)) -> bool {
        self.0 == rhs.0 && self.1 == rhs.1
    }
}

fn main() {
    let cache: Cache<(String, i32), String> = Cache::new(5);
    cache.insert(("square".to_string(), 2022), "blue".to_string());
    cache.insert(("square".to_string(), 2023), "black".to_string());
    assert_eq!(cache.get(&Pair("square", 2022)).unwrap(), "blue");
}
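
Note that lookups through Equivalent require the probe type to hash identically to the stored key: here Pair("square", 2022) and ("square".to_string(), 2022) produce the same hash because String hashes like str, and a tuple struct hashes its fields in order just like a tuple.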

Benchmarks

Since this crate is performance-oriented, it needs some comparisons. That said, benchmarks can be misleading, so take everything with a pinch of salt.

Benchmarks performed with mokabench on an x64 Linux OS with an Intel i9-12900H CPU.

Trace 1 (S3 from the Arc paper)

Cache                     Max Capacity  Clients  Hit Ratio (%)  Elapsed Time (s)
HashLink (LRU w/ Mutex)   100000        1        2.327          2.479
HashLink (LRU w/ Mutex)   100000        3        2.327          6.171
HashLink (LRU w/ Mutex)   100000        6        2.328          12.467
QuickCache Sync Cache     100000        1        12.764         2.465
QuickCache Sync Cache     100000        3        12.765         1.387
QuickCache Sync Cache     100000        6        12.764         0.835
Mini Moka Sync Cache      100000        1        10.436         10.473
Mini Moka Sync Cache      100000        3        10.541         8.439
Mini Moka Sync Cache      100000        6        10.366         8.008
HashLink (LRU w/ Mutex)   400000        1        12.039         2.956
HashLink (LRU w/ Mutex)   400000        3        12.041         6.939
HashLink (LRU w/ Mutex)   400000        6        12.043         13.503
QuickCache Sync Cache     400000        1        42.044         3.250
QuickCache Sync Cache     400000        3        42.044         1.299
QuickCache Sync Cache     400000        6        42.044         0.777
Mini Moka Sync Cache      400000        1        42.544         9.671
Mini Moka Sync Cache      400000        3        41.525         7.052
Mini Moka Sync Cache      400000        6        41.007         6.618
HashLink (LRU w/ Mutex)   800000        1        56.600         3.221
HashLink (LRU w/ Mutex)   800000        3        56.603         5.921
HashLink (LRU w/ Mutex)   800000        6        56.605         12.768
QuickCache Sync Cache     800000        1        66.801         3.980
QuickCache Sync Cache     800000        3        66.802         1.418
QuickCache Sync Cache     800000        6        66.803         0.798
Mini Moka Sync Cache      800000        1        70.304         10.613
Mini Moka Sync Cache      800000        3        68.054         4.754
Mini Moka Sync Cache      800000        6        66.288         4.137

Trace 2 (DS1 from the Arc paper)

Cache                     Max Capacity  Clients  Hit Ratio (%)  Elapsed Time (s)
HashLink (LRU w/ Mutex)   1000000       1        3.086          10.097
HashLink (LRU w/ Mutex)   1000000       3        3.085          22.443
HashLink (LRU w/ Mutex)   1000000       6        3.082          41.351
QuickCache Sync Cache     1000000       1        15.052         9.426
QuickCache Sync Cache     1000000       3        15.059         4.139
QuickCache Sync Cache     1000000       6        15.078         2.467
Mini Moka Sync Cache      1000000       1        14.910         28.974
Mini Moka Sync Cache      1000000       3        13.257         24.681
Mini Moka Sync Cache      1000000       6        13.518         22.759
HashLink (LRU w/ Mutex)   4000000       1        20.245         9.863
HashLink (LRU w/ Mutex)   4000000       3        20.250         19.418
HashLink (LRU w/ Mutex)   4000000       6        20.259         36.932
QuickCache Sync Cache     4000000       1        44.565         9.083
QuickCache Sync Cache     4000000       3        44.574         4.272
QuickCache Sync Cache     4000000       6        44.569         2.449
Mini Moka Sync Cache      4000000       1        45.398         31.108
Mini Moka Sync Cache      4000000       3        44.236         21.381
Mini Moka Sync Cache      4000000       6        41.723         18.719
HashLink (LRU w/ Mutex)   8000000       1        43.035         9.751
HashLink (LRU w/ Mutex)   8000000       3        43.034         16.554
HashLink (LRU w/ Mutex)   8000000       6        43.031         34.978
QuickCache Sync Cache     8000000       1        71.124         7.675
QuickCache Sync Cache     8000000       3        71.139         3.198
QuickCache Sync Cache     8000000       6        71.143         2.348
Mini Moka Sync Cache      8000000       1        67.447         28.462
Mini Moka Sync Cache      8000000       3        67.995         17.054
Mini Moka Sync Cache      8000000       6        64.738         12.929

License

This project is licensed under the MIT license.

quick-cache's People

Contributors

alextmjugador, arthurprs, liyixin95, xuchaoqian


quick-cache's Issues

Documentation errors

I've noticed a few problems with the documentation:

In quick_cache::sync::Cache:

get():

"Callers should prefer get_mut whenever possible as it’s more efficient"

There's no get_mut method.

remove():

"Peeks an item from the cache whose key is key and qey is <= highest_version. Contrary to gets, peeks don’t alter the key “hotness”."

There's no qey parameter, and this describes peek, not remove.

insert():

"Inserts an item in the cache with key key and qey qey."

There's no qey parameter.

OptionsBuilder:

build():

"Panics if weight_capacity or estimated_items_capacity were not set."

From looking at the code, this doesn't seem true; it returns an error instead of panicking.
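
For reference, the builder flow under discussion looks roughly like this (a sketch using the method names implied by the quoted documentation; not verified against the current API):

let options = OptionsBuilder::new()
    .estimated_items_capacity(10_000)
    .weight_capacity(100_000)
    .build()?; // per the code, a missing capacity produces an Err rather than a panic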

Shouldn't Weighter::weight return u64?

It seems odd that the weight of a single item is limited to u32 even though the weight of the overall collection is expressed as a u64. Is that an intentional design choice to limit the weight of a given entry to 4 GiB?

Clear all items from `sync::Cache`

The unsync::Cache has a clear() method that removes all items from the cache. However, the sync::Cache has no equivalent function. It would be very nice to have fn clear(&self) on sync::Cache for clearing all entries, without having to drop the previous cache and replace it with a new one to ensure that all items have been cleared out.

Is estimated_items_capacity superfluous in Cache::with_weighter?

Hi,

I've been using quick-cache inside https://github.com/marvin-j97/lsm-tree with great success so far. I've recently been changing from Cache::new to Cache::with_weighter.

Are there any expected changes in performance when changing estimated_items_capacity? According to my benchmarks, this value changes neither insert nor retrieval performance. So what is the difference between

  • Cache::with_weighter(0, 1_000_000, w) and
  • Cache::with_weighter(1, 1_000_000, w) and
  • Cache::with_weighter(1_000_000, 1_000_000, w), if any?

All of these caches are supposed to store 1'000'000 bytes (or whatever unit the weighter returns).

get_or_* methods' trait bound is too strict

Problem Description

Consider we have a cache like this:

let cache = Arc::new(sync::Cache::<String, u64>::new(100));

If I call get_value_or_guard like this:

cache.get_value_or_guard("foo", None);

then I get a compile error:

error[E0308]: mismatched types
   --> src\lib.rs:389:34
    |
389 |         cache.get_value_or_guard("foo", None);
    |               ------------------ ^^^^^ expected `&String`, found `&str`
    |               |
    |               arguments to this method are incorrect
    |
    = note: expected reference `&std::string::String`
               found reference `&'static str`
note: method defined here
   --> src\sync.rs:304:12
    |
304 |     pub fn get_value_or_guard<'a>(
    |            ^^^^^^^^^^^^^^^^^^
305 |         &'a self,
306 |         key: &Key,
    |         ---------

So, I have to clone the key, which is unnecessary.
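
That is, today the call only compiles if the probe key is materialized as an owned String:

cache.get_value_or_guard(&"foo".to_string(), None);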

Possible Solution

Maybe we can use ToOwned instead of Clone, like this:

    pub async fn get_value_or_guard_async<Q>(
        &self,
        key: &Q,
    ) -> Result<Val, PlaceholderGuard<'_, Key, Val, We, B, L>>
    where
        Q: Hash + Equivalent<Key> + ToOwned<Owned = Key> + ?Sized,

ToOwned is implemented for all T that implement Clone:

impl<T> ToOwned for T
where
    T: Clone,

So this trait bound is compatible with the old one and allows more key types, such as &str for String.

Finally, thanks for creating this great crate. I would like to provide a PR for this issue if you agree with this solution.

`before_evict` + `weight` API is stateful/awkward (it can't effectively do pinning)

Hey there,

I'm currently building a system where some data is "strongly" held in the cache (and thus never evicted) until it is synced to a persistent store, after which it is "weakly" held and thus eligible for eviction.

The ability to set the weight of an entry to 0 before eviction, and thus prevent it from being evicted, seems ideal for this. However, I found the API a bit awkward, as it requires statefully remembering that an entry is about to be evicted and then adjusting the weight accordingly.

E.g., in my case the weight of each entry is value.len() normally, and if Arc::strong_count(value) == 1 { 0 } else { value.len() } on eviction.

After looking through the code, it looks like the pre-eviction weight is only ever used ephemerally, to determine whether it is == 0 or below the remaining weight threshold. So the semantics seem to be: a last chance to adjust your weight to show that you really, really want to stay in the cache, in which case one will probably always choose 0.

So it seems to me that this semantic might be more cleanly captured by always having weight return the true weight of the entry, while giving before_evict the ability to prevent eviction by returning a boolean.
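
A hypothetical sketch of that shape, using illustrative names rather than quick-cache's actual hook API:

use std::sync::Arc;

// Hypothetical hook: `weight` would always report the true weight,
// while `before_evict` may veto an eviction by returning false.
trait EvictionHook<Key, Val> {
    fn before_evict(&self, key: &Key, value: &mut Val) -> bool;
}

struct KeepUntilSynced;

impl EvictionHook<u64, Arc<Vec<u8>>> for KeepUntilSynced {
    fn before_evict(&self, _key: &u64, value: &mut Arc<Vec<u8>>) -> bool {
        // Allow eviction only once nothing else strongly holds the data,
        // mirroring the strong/weak scheme described above.
        Arc::strong_count(value) == 1
    }
}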

Thoughts? 😄

Add atomic `get_or_insert` method to sync `Cache`/`KQCache`

Hi! Thank you for your work on this lightweight, versatile Rust cache library 😄

I'd like to use it to implement a cache for deserialized file data that multiple threads can access in parallel. Roughly, I'd like each thread to interact with the cache like this:

  • The thread asks the cache whether the data for a file has been deserialized (i.e., calls get on the cache).
    • If deserialized data is cached (i.e., get returns Some), then use that to do work.
    • If the deserialized data is not cached (i.e., get returns None), then deserialize it, add it to the cache (i.e., call insert), and use the just deserialized data.

However, implementing that with the currently available API is unsatisfactory:

  • Using get and insert naively means there is the possibility that several threads observe get yielding None, so data may be deserialized several times, only to have most of that work discarded when the thread that finishes last calls insert. This does not affect correctness in my case, but it can waste a lot of work.
  • Alternatively, the unsynchronized versions of Cache/KQCache can be used in conjunction with external synchronization, so that each thread acquires a mutex before calling the get and insert methods of the cache. This guarantees the atomicity of both operations and that no other thread will see the missing cache entry while data is being deserialized, but it introduces a lot of contention.

As an ideal solution to this situation, I'd like the API of this crate to provide an atomic get_or_insert_with method, similar to that of the Entry enum in the Rust standard library, which would allow this atomic read-compute-insert operation to be performed as efficiently as possible.
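
A sketch of the requested call-site ergonomics (the feature list above mentions get_or_insert_* functions, so something like this should be possible; the exact signature here is an assumption):

use quick_cache::sync::Cache;

// Hypothetical usage: return the cached bytes, or run the closure once
// (even with concurrent callers), caching and returning its result.
fn load(cache: &Cache<String, Vec<u8>>, path: &str) -> std::io::Result<Vec<u8>> {
    cache.get_or_insert_with(&path.to_string(), || std::fs::read(path))
}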

`unsync::Cache::get_or_insert_with`

Currently in order to implement "if missing, insert & return the value from the cache" with unsync::Cache it is necessary to do something like this:

if let Some(val) = cache.get(key) { return val; }
cache.insert(key, with());
cache.get(key).unwrap()

However, this bakes in the assumption that Cache::insert can't fail, and it forces three total lookups of the key.

I have taken a look at implementing this myself, but it seems quite involved. In particular, Shard::insert has multiple strategies it works with, and it won't necessarily insert the value into the cache due to weight limits, at which point it might give ownership away to the lifecycle management hook. So it returns a Result<(), (Key, Val)>.

But maybe there's still a way that I'm not seeing to enable it returning &Val upon successful insertion?
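
One possible shape, sketched as a signature only (hypothetical; it would still need to signal the not-admitted case somehow):

// Hypothetical: return a reference to the value now in the cache, or
// None when the entry was not admitted (e.g. it exceeds the weight limits).
pub fn get_or_insert_with(&mut self, key: Key, with: impl FnOnce() -> Val) -> Option<&Val>;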

Consider making dependency on `parking_lot` optional

It is conventional wisdom among the Rust community that parking_lot used to provide an implementation of RwLock that was better than std::sync::RwLock:

  • std's RwLock always boxed a pthread lock, while parking_lot didn't.
  • parking_lot locks take 1 byte of storage space, instead of 1 word or more in std's case.
  • parking_lot may explicitly take advantage of hardware lock elision instructions.
  • parking_lot locks do not poison on panic, while offering more features (mapped lock guards, upgradeable locks, serde compatibility...) and predictable cross-platform behavior (due to parking_lot implementing thread queuing data types in userspace).

However, with the release of Rust 1.62, the std::sync::RwLock implementation was overhauled. The changes are described in detail at rust-lang/rust#93740, its release announcement on the Rust blog, and the resources those pages link to.

As it was discussed in the Rust language forum, I believe that the choice between parking_lot and std::sync::RwLock is less clear-cut now than it was before:

  • std's lock no longer boxes a pthread lock on any Rust Tier 1 platform, barring macOS.
  • For applications that are not constrained by memory bandwidth or consumption, the execution speed difference between manipulating 1 byte vs. 1 word is likely to be minimal, due to how variables are aligned in memory and the fact that most processors in use today have word sizes greater than 1 byte.
  • On x86-64 CPUs, the hardware lock elision instructions used by parking_lot are part of the Transactional Synchronization Extensions, which have been disabled via microcode updates in every Intel CPU that implements them, to mitigate security issues. In addition, Intel considers these extensions deprecated and stopped implementing them in newer processors. AMD never supported such extensions, AFAIK.
  • Lock poisoning may be useful to handle shared data being left in an inconsistent state after a thread panics. In addition, std's usage of OS-specific lock syscalls may improve thread scheduling decisions and, as a consequence, performance.
  • parking_lot claims in its README to offer faster locks than std, but that claim hasn't been updated since 2016, so it can't take into account the improvements made to std for Rust 1.62.

There are more arguments for and against both std's and parking_lot's locks, but I think the points above suffice to grasp why adding the ability to toggle quick-cache's dependency on parking_lot may be a good idea. Moreover, it is a stated goal of quick-cache to have a "small dependency tree", so making dependencies optional increases the appeal of this crate with respect to that goal.
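
One possible shape for the toggle (a sketch; the feature wiring is illustrative, not an actual patch):

# Cargo.toml
[dependencies]
parking_lot = { version = "0.12", optional = true }

[features]
default = ["parking_lot"]

and inside the crate:

// Select the lock implementation at compile time. Note that
// std::sync::RwLock's read/write return a Result because of poisoning,
// so call sites would need a small shim to keep one code path.
#[cfg(feature = "parking_lot")]
use parking_lot::RwLock;
#[cfg(not(feature = "parking_lot"))]
use std::sync::RwLock;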

As an experiment, I did a quick patch to 5dc07e1 to swap parking_lot's RwLock with std::sync::RwLock and ran cargo bench before and after the change. Criterion did not find any statistically significant performance difference on a Linux system running on an Intel Core i7-12700 CPU.

I would not mind submitting a PR with this change if it is deemed appropriate for the project ❤️
