A multithreaded and single threaded string interner that allows strings to be cached with a minimal memory footprint,
associating them with a unique key that can be used to retrieve them at any time. A Rodeo allows O(1)
internment and resolution and can be turned into a RodeoReader to allow for contention-free resolutions
with both key to str and str to key operations. It can also be turned into a RodeoResolver with only
key to str operations for the lowest possible memory usage.
Which interner do I use?
For single-threaded workloads Rodeo is encouraged, while multi-threaded applications should use ThreadedRodeo.
These are the only two types that can intern strings. Most applications eventually reach a stage where they are done
interning strings, and at that point the choice is between RodeoReader and RodeoResolver. If the user still needs to get
keys for strings, they must use the RodeoReader (which can itself still be converted into a RodeoResolver later).
For users who only need key to string resolution, the RodeoResolver gives contention-free access at the
minimum possible memory usage. Note that the multi-threaded feature is required to gain access to ThreadedRodeo.
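| Interner      | Thread-safe | Intern String | str to key | key to str | Contention Free | Memory Usage |
|---------------|:-----------:|:-------------:|:----------:|:----------:|:---------------:|:------------:|
| Rodeo         | ❌          | ✅            | ✅         | ✅         | N/A             | Medium       |
| ThreadedRodeo | ✅          | ✅            | ✅         | ✅         | ❌              | Most         |
| RodeoReader   | ✅          | ❌            | ✅         | ✅         | ✅              | Medium       |
| RodeoResolver | ✅          | ❌            | ❌         | ✅         | ✅              | Least        |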
Cargo Features
By default lasso has one dependency, hashbrown, and only Rodeo is exposed. Hashbrown is used because the raw_entry API is currently unstable in the standard library's HashMap.
The raw hashmap API is used for custom hashing within the hashmaps, which dramatically reduces memory usage.
To make use of ThreadedRodeo, you must enable the multi-threaded feature.
multi-threaded - Enables ThreadedRodeo, the interner for multi-threaded tasks
ahasher - Use ahash's RandomState as the default hasher
no-std - Enables no_std + alloc support for Rodeo and ThreadedRodeo
    Automatically enables the following required features:
        ahasher - no_std hashing function
serialize - Implements Serialize and Deserialize for all Spur types and all interners
inline-more - Annotate external APIs with #[inline]
Example: Using Rodeo
```rust
use lasso::Rodeo;

let mut rodeo = Rodeo::default();
let key = rodeo.get_or_intern("Hello, world!");

// Easily retrieve the value of a key and find the key for values
assert_eq!("Hello, world!", rodeo.resolve(&key));
assert_eq!(Some(key), rodeo.get("Hello, world!"));

// Interning the same string again will yield the same key
let key2 = rodeo.get_or_intern("Hello, world!");

assert_eq!(key, key2);
```
Example: Using ThreadedRodeo
```rust
use lasso::ThreadedRodeo;
use std::{thread, sync::Arc};

let rodeo = Arc::new(ThreadedRodeo::default());
let key = rodeo.get_or_intern("Hello, world!");

// Easily retrieve the value of a key and find the key for values
assert_eq!("Hello, world!", rodeo.resolve(&key));
assert_eq!(Some(key), rodeo.get("Hello, world!"));

// Interning the same string again will yield the same key
let key2 = rodeo.get_or_intern("Hello, world!");

assert_eq!(key, key2);

// ThreadedRodeo can be shared across threads
let moved = Arc::clone(&rodeo);
let hello = thread::spawn(move || {
    assert_eq!("Hello, world!", moved.resolve(&key));
    moved.get_or_intern("Hello from the thread!")
})
.join()
.unwrap();

assert_eq!("Hello, world!", rodeo.resolve(&key));
assert_eq!("Hello from the thread!", rodeo.resolve(&hello));
```
Example: Creating a RodeoReader
```rust
use lasso::Rodeo;

// Rodeo and ThreadedRodeo are interchangeable here
let mut rodeo = Rodeo::default();

let key = rodeo.get_or_intern("Hello, world!");
assert_eq!("Hello, world!", rodeo.resolve(&key));

let reader = rodeo.into_reader();

// Reader keeps all the strings from the parent
assert_eq!("Hello, world!", reader.resolve(&key));
assert_eq!(Some(key), reader.get("Hello, world!"));

// The Reader can now be shared across threads, no matter what kind of Rodeo created it
```
Example: Creating a RodeoResolver
```rust
use lasso::Rodeo;

// Rodeo and ThreadedRodeo are interchangeable here
let mut rodeo = Rodeo::default();

let key = rodeo.get_or_intern("Hello, world!");
assert_eq!("Hello, world!", rodeo.resolve(&key));

let resolver = rodeo.into_resolver();

// Resolver keeps all the strings from the parent
assert_eq!("Hello, world!", resolver.resolve(&key));

// The Resolver can now be shared across threads, no matter what kind of Rodeo created it
```
Example: Making a custom-ranged key
Sometimes you want your keys to only inhabit (or not inhabit) a certain range of values so that you can have custom niches.
This allows you to pack more data into what would otherwise be unused space, which can be critical for memory-sensitive applications.
```rust
use lasso::{Key, Rodeo};

// First make our key type, this will be what we use as handles into our interner
#[derive(Copy, Clone, PartialEq, Eq)]
struct NicheKey(u32);

// This reserves every value from 0xFF000000 upwards (the top 2^24 values of a `u32`)
// for us to use as niches
const NICHE: usize = 0xFF000000;

// Implementing `Key` is unsafe and requires that anything given to `try_from_usize` must produce the
// same `usize` when `into_usize` is later called
unsafe impl Key for NicheKey {
    fn into_usize(self) -> usize {
        self.0 as usize
    }

    fn try_from_usize(int: usize) -> Option<Self> {
        if int < NICHE {
            // The value isn't in our niche range, so we're good to go
            Some(Self(int as u32))
        } else {
            // The value interferes with our niche, so we return `None`
            None
        }
    }
}

// To make sure we're upholding `Key`'s safety contract, let's make two small tests
#[test]
fn value_in_range() {
    let key = NicheKey::try_from_usize(0).unwrap();
    assert_eq!(key.into_usize(), 0);

    let key = NicheKey::try_from_usize(NICHE - 1).unwrap();
    assert_eq!(key.into_usize(), NICHE - 1);
}

#[test]
fn value_out_of_range() {
    let key = NicheKey::try_from_usize(NICHE);
    assert!(key.is_none());

    let key = NicheKey::try_from_usize(u32::MAX as usize);
    assert!(key.is_none());
}

// And now we're done and can make `Rodeo`s or `ThreadedRodeo`s that use our custom key!
let mut rodeo: Rodeo<NicheKey> = Rodeo::new();
let key = rodeo.get_or_intern("It works!");
assert_eq!(rodeo.resolve(&key), "It works!");
```
Example: Creation using FromIterator
```rust
use lasso::Rodeo;
use core::iter::FromIterator;

// Works for both `Rodeo` and `ThreadedRodeo`
let rodeo = Rodeo::from_iter(vec![
    "one string",
    "two string",
    "red string",
    "blue string",
]);

assert!(rodeo.contains("one string"));
assert!(rodeo.contains("two string"));
assert!(rodeo.contains("red string"));
assert!(rodeo.contains("blue string"));
```

```rust
use lasso::Rodeo;
use core::iter::FromIterator;

// Works for both `Rodeo` and `ThreadedRodeo`
let rodeo: Rodeo = vec!["one string", "two string", "red string", "blue string"]
    .into_iter()
    .collect();

assert!(rodeo.contains("one string"));
assert!(rodeo.contains("two string"));
assert!(rodeo.contains("red string"));
assert!(rodeo.contains("blue string"));
```
Benchmarks
Benchmarks were gathered with Criterion.rs
OS: Windows 10
CPU: Ryzen 9 3900X at 3800 MHz
RAM: 3200 MHz
Rustc: Stable 1.44.1
Suppose two threads running at the same time intern the same new string. The if at line 331 (shard.read().get(...)) would return None for both threads, so both continue to line 341. One thread calls self.map.insert(), which writes the string into the map; the second thread then does the same, overwriting the value from the first thread.
This would create a situation where two Spurs exist that resolve to identical strings but are not equal to each other, because their keys are different.
One option to address this would be to hold the shard lock for the whole operation, while we insert into the arena and increment the key. If we would like to avoid locking for that long, checking the return value of self.map.insert(...) might help: it tells us that the string was already added, so we can return the existing key. We would then need to undo the other operations, though, and it doesn't seem possible to remove strings from the arena. With this approach we may end up using slightly more memory and keys than actually needed (which is already the case in this scenario), but at the very least the Spur equality property would be maintained.
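The first option (keeping the lock for the whole check-then-insert sequence) can be sketched with the standard library alone. This is only an illustration under stated assumptions: `ToyInterner`, `Inner` and `intern_concurrently` are hypothetical names, plain `u32` values stand in for `Spur`, and a single `Mutex` stands in for the per-shard lock. It is not how lasso itself is implemented.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical toy interner, not lasso's implementation.
#[derive(Default)]
pub struct ToyInterner {
    // One lock guards both the map and the string list, so the
    // "is it already present?" check and the insert cannot interleave.
    inner: Mutex<Inner>,
}

#[derive(Default)]
struct Inner {
    map: HashMap<String, u32>,
    strings: Vec<String>,
}

impl ToyInterner {
    pub fn get_or_intern(&self, s: &str) -> u32 {
        // The guard is held until the function returns.
        let mut inner = self.inner.lock().unwrap();
        if let Some(&key) = inner.map.get(s) {
            return key;
        }
        // No other thread can reach this point for the same string
        // while we hold the lock, so each string gets exactly one key.
        let key = inner.strings.len() as u32;
        inner.strings.push(s.to_owned());
        inner.map.insert(s.to_owned(), key);
        key
    }

    pub fn len(&self) -> usize {
        self.inner.lock().unwrap().strings.len()
    }
}

/// Interns the same string from several threads and collects the keys.
pub fn intern_concurrently(
    interner: &Arc<ToyInterner>,
    s: &'static str,
    threads: usize,
) -> Vec<u32> {
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let interner = Arc::clone(interner);
            thread::spawn(move || interner.get_or_intern(s))
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

Every thread that calls `intern_concurrently` with the same string should get the same key back, at the cost of serialising all interning through one lock.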
When I try to use both features I get the following compile error:
Compiling lasso v0.6.0
error[E0599]: no function or associated item named `deserialize` found for struct `Vec<_, _>` in the current scope
--> /home/xxxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/lasso-0.6.0/src/reader.rs:423:40
|
423 | let vector: Vec<String> = Vec::deserialize(deserializer)?;
| ^^^^^^^^^^^ function or associated item not found in `Vec<_, _>`
error[E0599]: no function or associated item named `deserialize` found for struct `Vec<_, _>` in the current scope
--> /home/xxxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/lasso-0.6.0/src/resolver.rs:305:40
|
305 | let vector: Vec<String> = Vec::deserialize(deserializer)?;
| ^^^^^^^^^^^ function or associated item not found in `Vec<_, _>`
error[E0599]: no function or associated item named `deserialize` found for struct `Vec<_, _>` in the current scope
--> /home/xxxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/lasso-0.6.0/src/single_threaded.rs:928:40
|
928 | let vector: Vec<String> = Vec::deserialize(deserializer)?;
| ^^^^^^^^^^^ function or associated item not found in `Vec<_, _>`
error[E0277]: the trait bound `String: Deserialize<'_>` is not satisfied
--> /home/xxxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/lasso-0.6.0/src/multi_threaded.rs:963:45
|
963 | let deser_map: HashMap<String, K> = HashMap::deserialize(deserializer)?;
| ^^^^^^^^^^^^^^^^^^^^ the trait `Deserialize<'_>` is not implemented for `String`
|
= help: the trait `Deserialize<'de>` is implemented for `&'a str`
= note: required for `hashbrown::HashMap<String, K>` to implement `Deserialize<'_>`
Some errors have detailed explanations: E0277, E0599.
For more information about an error, try `rustc --explain E0277`.
error: could not compile `lasso` due to 4 previous errors
Compilation exited abnormally with code 101 at Tue Nov 15 14:06:59
Thanks for this nice library :) Here is the issue I am facing:
I am producing a release build of a library that uses lasso 0.4 with the following dependencies:
serde = { version = "1.0.114", features = ["derive"] }
serde_json = "1.0.56"
typetag = "0.1"
lasso = { version = "0.4", features = ["multi-threaded", "serialize"] }
During
cargo build --release
the following compilation errors appear:
error[E0277]: the size for values of type `str` cannot be known at compilation time
--> /Users/pablomartinez/.cargo/registry/src/github.com-1ecc6299db9ec823/lasso-0.4.1/src/util.rs:457:31
|
457 | let elem: &_ = $slice.get_unchecked($idx);
| ^^^^^^^^^^^^^ doesn't have a size known at compile-time
|
::: /Users/pablomartinez/.cargo/registry/src/github.com-1ecc6299db9ec823/lasso-0.4.1/src/reader.rs:455:49
|
455 | let key_string: &str = unsafe { index_unchecked!(strings, key.into_usize()) };
| ------------------------------------------- in this macro invocation
|
= help: the trait `std::marker::Sized` is not implemented for `str`
= note: to learn more, visit <https://doc.rust-lang.org/book/ch19-04-advanced-types.html#dynamically-sized-types-and-the-sized-trait>
= note: this error originates in a macro (in Nightly builds, run with -Z macro-backtrace for more info)
error[E0277]: the size for values of type `str` cannot be known at compilation time
--> /Users/pablomartinez/.cargo/registry/src/github.com-1ecc6299db9ec823/lasso-0.4.1/src/reader.rs:433:27
|
433 | let mut strings = Vec::with_capacity(capacity.strings);
| ^^^^^^^^^^^^^^^^^^ doesn't have a size known at compile-time
|
= help: the trait `std::marker::Sized` is not implemented for `str`
= note: to learn more, visit <https://doc.rust-lang.org/book/ch19-04-advanced-types.html#dynamically-sized-types-and-the-sized-trait>
= note: required by `std::vec::Vec::<T>::with_capacity`
error[E0599]: no method named `push` found for struct `std::vec::Vec<str>` in the current scope
--> /Users/pablomartinez/.cargo/registry/src/github.com-1ecc6299db9ec823/lasso-0.4.1/src/reader.rs:471:29
|
471 | strings.push(allocated);
| ^^^^ method not found in `std::vec::Vec<str>`
|
= note: the method `push` exists but the following trait bounds were not satisfied:
`str: std::marker::Sized`
error[E0599]: no method named `get_unchecked` found for struct `std::vec::Vec<str>` in the current scope
--> /Users/pablomartinez/.cargo/registry/src/github.com-1ecc6299db9ec823/lasso-0.4.1/src/util.rs:457:31
|
457 | let elem: &_ = $slice.get_unchecked($idx);
| ^^^^^^^^^^^^^ method not found in `std::vec::Vec<str>`
|
::: /Users/pablomartinez/.cargo/registry/src/github.com-1ecc6299db9ec823/lasso-0.4.1/src/reader.rs:476:38
|
476 | ... unsafe { index_unchecked!(strings, key.into_usize()) };
| ------------------------------------------- in this macro invocation
|
= note: this error originates in a macro (in Nightly builds, run with -Z macro-backtrace for more info)
error[E0308]: mismatched types
--> /Users/pablomartinez/.cargo/registry/src/github.com-1ecc6299db9ec823/lasso-0.4.1/src/reader.rs:490:13
|
490 | strings,
| ^^^^^^^ expected `&str`, found `str`
|
= note: expected struct `std::vec::Vec<&'static str>`
found struct `std::vec::Vec<str>`
error[E0277]: the size for values of type `str` cannot be known at compilation time
--> /Users/pablomartinez/.cargo/registry/src/github.com-1ecc6299db9ec823/lasso-0.4.1/src/util.rs:457:31
|
457 | let elem: &_ = $slice.get_unchecked($idx);
| ^^^^^^^^^^^^^ doesn't have a size known at compile-time
|
::: /Users/pablomartinez/.cargo/registry/src/github.com-1ecc6299db9ec823/lasso-0.4.1/src/single_threaded.rs:960:49
|
960 | let key_string: &str = unsafe { index_unchecked!(strings, key.into_usize()) };
| ------------------------------------------- in this macro invocation
|
= help: the trait `std::marker::Sized` is not implemented for `str`
= note: to learn more, visit <https://doc.rust-lang.org/book/ch19-04-advanced-types.html#dynamically-sized-types-and-the-sized-trait>
= note: this error originates in a macro (in Nightly builds, run with -Z macro-backtrace for more info)
error[E0277]: the size for values of type `str` cannot be known at compilation time
--> /Users/pablomartinez/.cargo/registry/src/github.com-1ecc6299db9ec823/lasso-0.4.1/src/single_threaded.rs:938:27
|
938 | let mut strings = Vec::with_capacity(capacity.strings);
| ^^^^^^^^^^^^^^^^^^ doesn't have a size known at compile-time
|
= help: the trait `std::marker::Sized` is not implemented for `str`
= note: to learn more, visit <https://doc.rust-lang.org/book/ch19-04-advanced-types.html#dynamically-sized-types-and-the-sized-trait>
= note: required by `std::vec::Vec::<T>::with_capacity`
error[E0599]: no method named `push` found for struct `std::vec::Vec<str>` in the current scope
--> /Users/pablomartinez/.cargo/registry/src/github.com-1ecc6299db9ec823/lasso-0.4.1/src/single_threaded.rs:976:29
|
976 | strings.push(allocated);
| ^^^^ method not found in `std::vec::Vec<str>`
|
= note: the method `push` exists but the following trait bounds were not satisfied:
`str: std::marker::Sized`
error[E0599]: no method named `get_unchecked` found for struct `std::vec::Vec<str>` in the current scope
--> /Users/pablomartinez/.cargo/registry/src/github.com-1ecc6299db9ec823/lasso-0.4.1/src/util.rs:457:31
|
457 | let elem: &_ = $slice.get_unchecked($idx);
| ^^^^^^^^^^^^^ method not found in `std::vec::Vec<str>`
|
::: /Users/pablomartinez/.cargo/registry/src/github.com-1ecc6299db9ec823/lasso-0.4.1/src/single_threaded.rs:981:38
|
981 | ... unsafe { index_unchecked!(strings, key.into_usize()) };
| ------------------------------------------- in this macro invocation
|
= note: this error originates in a macro (in Nightly builds, run with -Z macro-backtrace for more info)
error[E0308]: mismatched types
--> /Users/pablomartinez/.cargo/registry/src/github.com-1ecc6299db9ec823/lasso-0.4.1/src/single_threaded.rs:995:13
|
995 | strings,
| ^^^^^^^ expected `&str`, found `str`
|
= note: expected struct `std::vec::Vec<&'static str>`
found struct `std::vec::Vec<str>`
error: aborting due to 10 previous errors
Some errors have detailed explanations: E0277, E0308, E0599.
For more information about an error, try `rustc --explain E0277`.
error: could not compile `lasso`.
The issue doesn't appear when producing a non-release build. I ended up moving to lasso 0.3.1, but the Key serialization isn't as nice as in 0.4.1 :)
impl Default for Rodeo<Spur, RandomState> redirects to Self::new(), and Self::new() is implemented for any Rodeo<K, RandomState> where K: Key.
Is there any reason not to implement impl<K: Key> Default for Rodeo<K, RandomState>?
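As a rough illustration of what is being asked for, here is a self-contained sketch with stand-in types. `ToyKey`, `SmallKey` and `ToyRodeo` are hypothetical and only mirror the shape of `Key` and `Rodeo<K, S>`; nothing here is taken from lasso's source.

```rust
use std::collections::hash_map::RandomState;
use std::marker::PhantomData;

/// Hypothetical stand-in for a key trait.
pub trait ToyKey: Copy {
    fn try_from_usize(int: usize) -> Option<Self>;
    fn into_usize(self) -> usize;
}

/// A non-default key type backed by a `u16`.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct SmallKey(u16);

impl ToyKey for SmallKey {
    fn try_from_usize(int: usize) -> Option<Self> {
        if int <= u16::MAX as usize {
            Some(SmallKey(int as u16))
        } else {
            None
        }
    }

    fn into_usize(self) -> usize {
        self.0 as usize
    }
}

/// Hypothetical stand-in for an interner generic over key and hasher.
pub struct ToyRodeo<K, S = RandomState> {
    strings: Vec<String>,
    _marker: PhantomData<(K, S)>,
}

impl<K: ToyKey> ToyRodeo<K, RandomState> {
    pub fn new() -> Self {
        Self {
            strings: Vec::new(),
            _marker: PhantomData,
        }
    }

    pub fn len(&self) -> usize {
        self.strings.len()
    }
}

// `Default` for every key type, not only the default one.
impl<K: ToyKey> Default for ToyRodeo<K, RandomState> {
    fn default() -> Self {
        Self::new()
    }
}
```

With the generic impl, `let rodeo: ToyRodeo<SmallKey> = Default::default();` compiles, which is the behaviour the question asks about for `Rodeo<K, RandomState>`.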
To my knowledge dashmap can't do that, but std's hash map on nightly and hashbrown on stable can. It allows you to perform a single hash map look-up, no matter whether the input is already interned or needs to be stored.
The big downside is more complicated code, especially with some feature cfgs for nightly, and, well, this only being an optimisation for the ST interner. So all this may be considered »not worth it«.
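The single-look-up shape can be shown on stable Rust with the standard library's `entry` API, used here as a stand-in for `raw_entry`. This is an assumption-laden sketch: `EntryInterner` is a hypothetical type, keys are plain `u32`s, and unlike `raw_entry` the `entry` call needs an owned `String` up front, so it allocates even when the string is already interned.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Hypothetical single-threaded interner used only for illustration.
#[derive(Default)]
pub struct EntryInterner {
    map: HashMap<String, u32>,
    strings: Vec<String>,
}

impl EntryInterner {
    pub fn get_or_intern(&mut self, s: &str) -> u32 {
        // One hash and one probe: the entry remembers the slot, so the
        // vacant case inserts without looking the string up a second time.
        match self.map.entry(s.to_owned()) {
            Entry::Occupied(slot) => *slot.get(),
            Entry::Vacant(slot) => {
                let key = self.strings.len() as u32;
                self.strings.push(s.to_owned());
                *slot.insert(key)
            }
        }
    }

    pub fn resolve(&self, key: u32) -> Option<&str> {
        self.strings.get(key as usize).map(String::as_str)
    }
}
```

A `raw_entry`-based version would look the string up by `&str` and only build the owned key in the vacant case, which is the optimisation being suggested here.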
error[E0275]: overflow evaluating the requirement `&_ well-formed`
--> src/util.rs:445:13
|
445 | $slice[$idx]
| ^^^^^^^^^^^^
|
::: src/reader.rs:453:49
|
453 | let key_string: &str = unsafe { index_unchecked!(strings, key.into_usize()) };
| ------------------------------------------- in this macro invocation
|
= note: this error originates in the macro `index_unchecked` (in Nightly builds, run with -Z macro-backtrace for more info)
error[E0275]: overflow evaluating the requirement `&_ well-formed`
--> src/util.rs:445:13
|
445 | $slice[$idx]
| ^^^^^^^^^^^^
|
::: src/rodeo.rs:1132:49
|
1132 | let key_string: &str = unsafe { index_unchecked!(strings, key.into_usize()) };
| ------------------------------------------- in this macro invocation
|
= note: this error originates in the macro `index_unchecked` (in Nightly builds, run with -Z macro-backtrace for more info)
For more information about this error, try `rustc --explain E0275`.
warning: `lasso` (lib) generated 7 warnings
error: could not compile `lasso` (lib) due to 2 previous errors; 7 warnings emitted
Possibly related to #45, but a fix will be required for debug.
There is a new semver version of hashbrown. It would be nice to get a new release of lasso that uses it so we can avoid duplicate dependencies. From the output of cargo deny check:
As can be seen, lasso on its own (with the multi-threaded feature) is enough to pull in two versions, thanks to dashmap depending on the newer version. Other libraries that I use (like rust-ini) also pull in the newer version.
Hi, just used lasso for the first time and I was getting deadlocks and panics inside of DashMap - had no idea what I was doing wrong. After like an hour of debugging I noticed that 0.7.0 just came out a week ago so on a hunch I tried 0.6.0 - deadlocks and panics went away!
If it helps here's the panic trace:
0: std::panicking::begin_panic_handler
at /rustc/3a8a131e9509c478ece1c58fe0ea2d49463d2300/library\std\src\panicking.rs:577
1: core::panicking::panic_fmt
at /rustc/3a8a131e9509c478ece1c58fe0ea2d49463d2300/library\core\src\panicking.rs:67
2: core::panicking::panic
at /rustc/3a8a131e9509c478ece1c58fe0ea2d49463d2300/library\core\src\panicking.rs:117
3: dashmap::mapref::entry::VacantEntry<ref$<str$>,lasso::keys::Spur,ahash::random_state::RandomState>::insert<ref$<str$>,lasso::keys::Spur,ahash::random_state::RandomState>
at C:\Users\Andy\.cargo\registry\src\index.crates.io-6f17d22bba15001f\dashmap-5.4.0\src\mapref\entry.rs:106
4: enum2$<dashmap::mapref::entry::Entry<ref$<str$>,lasso::keys::Spur,ahash::random_state::RandomState> >::or_try_insert_with
at C:\Users\Andy\.cargo\registry\src\index.crates.io-6f17d22bba15001f\dashmap-5.4.0\src\mapref\entry.rs:82
5: lasso::threaded_rodeo::ThreadedRodeo<lasso::keys::Spur,ahash::random_state::RandomState>::try_get_or_intern
at C:\Users\Andy\.cargo\registry\src\index.crates.io-6f17d22bba15001f\lasso-0.7.0\src\threaded_rodeo.rs:329
6: lasso::threaded_rodeo::ThreadedRodeo<lasso::keys::Spur,ahash::random_state::RandomState>::get_or_intern<lasso::keys::Spur,ahash::random_state::RandomState,ref$<str$> >
at C:\Users\Andy\.cargo\registry\src\index.crates.io-6f17d22bba15001f\lasso-0.7.0\src\threaded_rodeo.rs:289
error: Undefined Behavior: Data race detected between (1) Read on thread `<unnamed>` and (2) Write on thread `<unnamed>` at alloc120819+0x8. (2) just happened here
--> /home/ben/.cargo/registry/src/index.crates.io-6f17d22bba15001f/lasso-0.7.0/src/arenas/atomic_bucket.rs:178:13
|
178 | addr_of_mut!((*self.as_ptr()).len).write(new_length);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Data race detected between (1) Read on thread `<unnamed>` and (2) Write on thread `<unnamed>` at alloc120819+0x8. (2) just happened here
|
help: and (1) occurred earlier here
--> src/main.rs:13:17
|
13 | rodeo.get_or_intern(rng.gen::<u64>().to_string());
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
= help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
= help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
= note: BACKTRACE (of the first span):
= note: inside `lasso::arenas::atomic_bucket::UniqueBucketRef::set_len` at /home/ben/.cargo/registry/src/index.crates.io-6f17d22bba15001f/lasso-0.7.0/src/arenas/atomic_bucket.rs:178:13: 178:65
= note: inside `lasso::arenas::atomic_bucket::UniqueBucketRef::push_slice` at /home/ben/.cargo/registry/src/index.crates.io-6f17d22bba15001f/lasso-0.7.0/src/arenas/atomic_bucket.rs:212:18: 212:49
= note: inside `lasso::arenas::lockfree::LockfreeArena::store_str` at /home/ben/.cargo/registry/src/index.crates.io-6f17d22bba15001f/lasso-0.7.0/src/arenas/lockfree.rs:121:46: 121:70
= note: inside `lasso::ThreadedRodeo::try_get_or_intern::<std::string::String>` at /home/ben/.cargo/registry/src/index.crates.io-6f17d22bba15001f/lasso-0.7.0/src/threaded_rodeo.rs:322:49: 322:83
= note: inside `lasso::ThreadedRodeo::get_or_intern::<std::string::String>` at /home/ben/.cargo/registry/src/index.crates.io-6f17d22bba15001f/lasso-0.7.0/src/threaded_rodeo.rs:289:9: 289:36
note: inside closure
--> src/main.rs:13:17
|
13 | rodeo.get_or_intern(rng.gen::<u64>().to_string());
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: some details are omitted, run with `MIRIFLAGS=-Zmiri-backtrace=full` for a verbose backtrace
error: aborting due to previous error; 1 warning emitted
Outside of Miri, this readily manifests as an infinite loop or a panic.
Indexing an interner by key seems very convenient. Unfortunately none of the Rodeo family of types currently allow for that. I propose that for each of the Rodeo types:
1. rodeo[key] replace the behavior of rodeo.resolve(key),
2. rodeo.resolve(key) replace the behavior of rodeo.try_resolve(key),
3. rodeo.try_resolve() be deprecated.
Obviously (2) will result in an API breaking change. This shouldn't be an issue since lasso is still pre-v1.0.0, but if avoiding this is desired, there are other options. In addition to (1), we could:
leave .try_resolve() and deprecate .resolve() (unnecessarily wordy; inconsistent naming),
leave both .try_resolve() and .resolve() (there's now more than one way of doing something).
I'm not completely happy with either of these alternative solutions, but then again, I don't even use lasso (yet) so perhaps consideration should be placed on its users rather than my API preferences.
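To make the proposal concrete, here is a minimal sketch of the suggested API shape on a toy type. `ToySpur` and `ToyResolver` are hypothetical stand-ins and the key is taken by value for brevity; this does not describe any of lasso's existing types.

```rust
use std::ops::Index;

/// Hypothetical key type standing in for `Spur`.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct ToySpur(u32);

/// Hypothetical resolver holding the interned strings.
#[derive(Default)]
pub struct ToyResolver {
    strings: Vec<String>,
}

impl ToyResolver {
    /// Stores a string and hands back its key (no deduplication here).
    pub fn push(&mut self, s: &str) -> ToySpur {
        let key = ToySpur(self.strings.len() as u32);
        self.strings.push(s.to_owned());
        key
    }

    /// Fallible lookup, playing the role the proposal gives to `resolve`.
    pub fn resolve(&self, key: ToySpur) -> Option<&str> {
        self.strings.get(key.0 as usize).map(String::as_str)
    }
}

/// Panicking lookup via indexing, as in `resolver[key]`.
impl Index<ToySpur> for ToyResolver {
    type Output = str;

    fn index(&self, key: ToySpur) -> &str {
        self.resolve(key)
            .expect("key is not present in this resolver")
    }
}
```

Under this shape, `&resolver[key]` yields a `&str` and panics on an unknown key, while `resolver.resolve(key)` returns an `Option<&str>`.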
(I wrote a comparison of memory usage of different interners. This issue is suggesting some improvements from my own interner implementation that are the cause of the nearly 3x memory usage difference.) You've quoted memory footprint as a key feature of lasso, so I thought you'd be interested to see my analysis. I've not actually measured performance characteristics of any of these tweaks, just the memory usage patterns.
The current definition of the (single threaded) Rodeo is roughly (degenericized)
When using the raw_entry API, the size of the str -> Spur map can be cut at least to a third by instead storing (effectively) HashSet<Spur> (though for raw_entry it's a HashMap<Spur, ()>) and using raw_entry to compare the Spur entries as if they were the string they map to. The cost for this memory saving is an extra indirection involved in hashing/comparing entries in the map (for any str -> Spur conversions).
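The idea of storing only keys in the look-up table can be sketched without `raw_entry` by writing a tiny open-addressed table by hand. Everything below is a hypothetical illustration: `KeyOnlyInterner` is not lasso's design, `u32` stands in for `Spur`, and linear probing with `DefaultHasher` is chosen only to keep the example short.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

fn hash_str(s: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    s.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical interner whose look-up table stores only `u32` keys.
pub struct KeyOnlyInterner {
    /// key -> string
    strings: Vec<String>,
    /// Open-addressed table of keys; its length is always a power of two.
    slots: Vec<Option<u32>>,
}

impl KeyOnlyInterner {
    pub fn new() -> Self {
        Self { strings: Vec::new(), slots: vec![None; 8] }
    }

    /// Returns the slot holding `s`, or the empty slot where it belongs.
    fn probe(&self, s: &str) -> usize {
        let mask = self.slots.len() - 1;
        let mut i = hash_str(s) as usize & mask;
        while let Some(key) = self.slots[i] {
            // The extra indirection: entries are compared through the
            // key -> string vector instead of a string stored in the table.
            if self.strings[key as usize] == s {
                break;
            }
            i = (i + 1) & mask;
        }
        i
    }

    pub fn get(&self, s: &str) -> Option<u32> {
        self.slots[self.probe(s)]
    }

    pub fn get_or_intern(&mut self, s: &str) -> u32 {
        let i = self.probe(s);
        if let Some(key) = self.slots[i] {
            return key;
        }
        let key = self.strings.len() as u32;
        self.strings.push(s.to_owned());
        self.slots[i] = Some(key);
        // Keep the table at most half full so probing always terminates.
        if self.strings.len() * 2 > self.slots.len() {
            self.grow();
        }
        key
    }

    /// Doubles the table and re-inserts every key by rehashing its string.
    fn grow(&mut self) {
        self.slots = vec![None; self.slots.len() * 2];
        for key in 0..self.strings.len() {
            let i = self.probe(&self.strings[key]);
            self.slots[i] = Some(key as u32);
        }
    }

    pub fn resolve(&self, key: u32) -> &str {
        &self.strings[key as usize]
    }
}
```

Each table slot here holds only a key, so the per-entry cost no longer includes a string pointer and length; the price is the extra hop through `strings` on every comparison and on every rehash.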
There is also a further optimization potential for the Spur -> str map. There are two potential options here:
Reduce memory usage of the map by about 25%. Instead of storing the &[u8] directly in the vector, store indexes into the arena. IIUC, these would minimally be { bucket_idx: u32, start_idx: u32, end_idx: u32 } for the current arena. (In my own crate, it's just (u32, u32) as the arena is a single contiguous allocation; however, I believe the less-copies arena approach you have here to probably be significantly better for tail-latency.) The cost would be extra indirection both on Spur -> str lookups and on str -> Spur lookups if the above change to that map is used.
Allow interning of real &'static str in the interner without copying them into internal storage. With a direct Spur -> str map that doesn't go indirect through the arena, this is trivial to support; just put the &'static str into the map. And interning of statically known strings is a surprisingly common and useful thing to be able to do; most contexts that benefit from an interner have some set of known names they know they'll always use and already have baked into the binary's static data section in order to treat them specially. (Here's rustc's list!)
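The first of these two options, and its "about 25%" figure, can be checked with a small sketch. `Span` and `SpanArena` are hypothetical names; the field layout follows the `{ bucket_idx, start_idx, end_idx }` triple mentioned above, and the arena is simplified to a single bucket that never fills up. The size comparison assumes a 64-bit target, where a `&str` is 16 bytes.

```rust
use std::mem::size_of;

/// Hypothetical index triple replacing a fat pointer in the key -> str map.
#[derive(Copy, Clone)]
pub struct Span {
    bucket_idx: u32,
    start_idx: u32,
    end_idx: u32,
}

/// Simplified arena: one growable bucket, spans index into it.
pub struct SpanArena {
    buckets: Vec<String>,
    spans: Vec<Span>,
}

impl SpanArena {
    pub fn new() -> Self {
        Self { buckets: vec![String::new()], spans: Vec::new() }
    }

    /// Appends `s` to the last bucket and returns its key.
    pub fn store(&mut self, s: &str) -> u32 {
        let bucket_idx = self.buckets.len() - 1;
        let bucket = &mut self.buckets[bucket_idx];
        let start = bucket.len();
        bucket.push_str(s);
        let end = bucket.len();
        self.spans.push(Span {
            bucket_idx: bucket_idx as u32,
            start_idx: start as u32,
            end_idx: end as u32,
        });
        (self.spans.len() - 1) as u32
    }

    /// The extra indirection: key -> span -> bucket -> bytes.
    pub fn resolve(&self, key: u32) -> &str {
        let span = self.spans[key as usize];
        let bucket = &self.buckets[span.bucket_idx as usize];
        &bucket[span.start_idx as usize..span.end_idx as usize]
    }
}

/// Returns (size of a `&str`, size of a `Span`) in bytes.
pub fn entry_sizes() -> (usize, usize) {
    (size_of::<&str>(), size_of::<Span>())
}
```

On a 64-bit target `entry_sizes()` reports 16 bytes for the fat pointer and 12 bytes for the span, which is where the roughly 25% saving per entry comes from.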
If lasso can incorporate even just the first size optimization for the Spur -> str map, even if just with the hashbrown feature, I'd feel a lot more comfortable recommending lasso instead of maintaining my own interner.