Summary
Rework current mutable slice split logic into non-atomic, where bitslice.split_at_mut()
would return a tuple of (&mut self, AfterSplit)
. AfterSplit logic is explained lower.
Motivation
I was recently thinking about simple and fast iteration logic based on std slice iterator with some bit tweaks, but such iterator should use default access. I simply don't like carrying atomic overhead everywhere, even if I can deal with it. Also, getting rid of atomics would simplify overall logic of this crate.
Guide-level explanation
On the high level AfterSplit (or whatever name it would get) is a sibling of BitSlice: it implements many traits and functions BitSlice implements (for example, it could be mutable split itself, and after.split_at_mut()
produces (self, AfterSplit)
tuple). But it is a downgrade, and so it doesn't have any public constructor and some BitSlice functions aren't there.
Reference-level explanation
On a lower level AfterSplit is a three words wide structure: (owned_bits: Bitstore, slice: &mut Bitslice)
. The trick is simple: slice would always start with properly owned byte (if it's length is not zero, of course) because slices could be split only at machine word border, and the lone bitstore could only store $BITS - 1
bits at max. The latter is easy: split_mut() returns (self, ...), and "self" part there will be always greedy, so its last item will have at least one bit, and never zero (unless the whole slice has zero length, but in that case both slices have zero length). So, the BitStore part of AfterSplit always has one bit to show it's length, and that's enough (trailing_zeroes, leading_zeroes, wrapped up in an ordering trait function).
Drawbacks
A lot of code and docs to write, three words wide instead of two, not a proper BitSlice, thus it could be limited in options.
Rationale and alternatives
I see three ways to deal with mutable splits: no split_mut() at all, atomics, or some kind of AfterSplit. The first way is very simple, but strongly hits functionality; the second way hits performance and now you should uphold the possible double access on most operations with slice, even if they use immutable access; the third way is a compromise: mutable splitting is allowed and doesn't interfere with other parts of library, but is somewhat unhandy for the user.
Prior art
The current atomic access logic crawls everywhere due to edge cases you should think about: for example, iterator over an immutable bitslice should use atomic access, because it could be called onto the product of split_mut(), which could be mutated right now (which is a very rare case, but you are forced to uphold it). Right now atomics are everywhere. This is good to emulate std slice logic, but BitSlice is not a common slice; nobody would treat utf-8 string as ascii, unless it is checked to be ascii. And so BitSlice doesn't need to do everything in the way std slices do. For example, "string".chars() iterator produce value itself, not a reference, and I haven't heard a single complaint about it. And so could work BitSlice immutable iterator: just give 'em the bool and forget about it.
Unresolved questions
Actually, there's just one question: should it be implemented at all, and if yes, how should it be implemented? Once it's resolved, there's a strait way to write code.
Future possibilities
After this, there would be an easy way to write fast value-producing iterator over an immutable bitslice, without thinking much about mutable access problems. Maybe, implementing some other things would become easier. I'm not very sure about it. But it's a move from copypasting common slices' logic, which could be good for performance, as bit slice is special and the fast way to do things is almost never the same as fast way with common slices.
And some things that are not related to this particular question, but related to fast iterations:
- BitStore trait should have those functions:
rotate_left
, rotate_right
, reverse_bits
(names are the same with numeric primitives' functions);
- BitOrder trait should have those functions:
rotate_forward
, rotate_backward
, from_ne_order
— this function takes BitStore with some bits and turns it into the required order (essentially a no-op for Lsb0 and single bit reverse for Msb0). Those functions could be easily implemented for two default orderings with functions from 1.
I could have written PR with those things before that RFC, but there's no point in those functions before mutable split question is answered in some way.