Overview
- Add a
KnownLayout
trait which encodes aspects of a type's layout, and allow deriving it
- Make
KnownLayout
a super-trait of FromZeroes
and AsBytes
- Use
KnownLayout
to support synthesizing references to custom DSTs
Progress
Motivation
TODO
Design
TODO
TODO: Mention this use case from @kupiakos: Might want to support a slice DST which has a custom length field, something like:
struct Foo {
len_of_rest: U32<BE>,
rest: [u8],
}
How can we support a) validating this field or, b) "fixing up" this field after conversion? E.g., if the normal result of conversion would result in a Foo
with 4 elements in rest
, but len_of_rest
is 2, how do we make sure that conversion actually produces a Foo
with 2 elements in rest
?
Relationship to other traits
Ideally, we'd like KnownLayout
to be a super-trait of FromZeroes
(or TryFromBytes
(#5) once FromZeroes: TryFromBytes
). Unfortunately, FromZeroes
supports types which KnownLayout
cannot support: unsized types without a fixed representation (ie, repr(rust)
). These types do not provide sufficient guarantees about their layout in order to satisfy KnownLayout
's safety conditions. We have two options:
- Restrict
FromZeroes
to only support types which KnownLayout
can support, and then have FromZeroes: KnownLayout
- Use
KnownLayout
as a bound on individual functions and methods
We intend for zerocopy to be a general-purpose library which supports use cases that do not require known representations (any use case which doesn't require a type's layout to correspond to a fixed specification, but only to be consistent within the context of a program's execution). The first option would preclude us from supporting these use cases, so we opt for the latter: we will use KnownLayout
as a bound on individual functions and methods.
Deriving KnownLayout
Computing alignment
Use align_of_val_raw
Blocked on rust-lang/rust#69835
TODO
Use a #[repr(C)]
type and field offset
Blocked on either rust-lang/rust#106655 or rust-lang/rust#69835
This design is prototyped in #576:
macro_rules! align_of {
($t:ty) => {{
#[repr(C)]
struct OffsetOfTrailingIsAlignment {
_byte: u8,
_trailing: $t,
}
trailing_field_offset!(OffsetOfTrailingIsAlignment, _trailing)
}};
}
This design relies on the trailing_field_offset!
macro added in #540 and described below. This macro relies on stabilizing rust-lang/rust#69835. Alternatively, if we can stabilize offset_of!
(rust-lang/rust#106655) and add support for using offset_of!
with unsized types, then we can replace trailing_field_offset!
here with offset_of!
.
Computing trailing field offset
The DstLayout
type needs to know the offset of the trailing slice within a DST. For example, given Bar
:
#[repr(C)]
struct Foo {
u: u16,
tail: [u8],
}
#[repr(C)]
struct Bar {
u: u16,
foo: Foo,
}
...the trailing tail: [u8]
is at offset 4 (after 2 bytes for Bar.u
and 2 bytes for Foo.u
).
In order to compute trailing slice offset in the context of a custom derive, it's necessary to compute recursively. In particular, given the offset of the trailing field within the outermost type (in this case, Bar.foo
) and the offset of the trailing slice within the inner type (in this case, Foo.tail
), it's possible to compute the offset of the trailing slice within the outer type as the sum of these two values. In all cases, this is a recursive computation that bottoms out at an actual slice type, [T]
, whose trailing slice offset is 0.
Thus the challenge is, given the AST of an arbitrary, possibly dynamically-sized type, to produce code which computes the byte offset of the trailing field. We have a few options.
offset_of!
Blocked on rust-lang/rust#106655
We could simply wait for the standard library's offset_of!
macro to stabilize and add support for unsized types.
addr_of!
Blocked on rust-lang/rust#69835
Another option is to rely on the standard library's addr_of!
macro. Given a raw pointer to the beginning of a type, the expression addr_of!((*ptr).trailing_field)
will compute the address of trailing field, and the raw pointer method offset_from
can be used to compute the byte offset between these two pointers, effectively computing the trailing field offset.
Unfortunately, place projection the addr_of!
macro performs a place projection, and even if the pointers are never dereferenced, place projections may only happen inside the bounds of a valid allocation. Per The Reference, the following is undefined behavior:
Performing a place projection that violates the requirements of in-bounds pointer arithmetic. A place projection is a field expression, a tuple index expression, or an array/slice index expression.
Thus, in order to use addr_of!
, we need a valid allocation which is large enough that the field projection will not go out-of-bounds (the allocation does not need to be properly aligned). It is fairly trivial to produce a large allocation, even at const time:
const LARGE_ALLOCATION: NonNull<[u8]> = {
const REF: &[u8; 1 << 16] = &[0; 1 << 16];
let ptr: *const [u8; 1 << 16] = REF;
let ptr: *const [u8] = ptr::slice_from_raw_parts(ptr.cast(), 1 << 16);
unsafe { NonNull::new_unchecked(ptr.cast_mut()) }
};
This, however, isn't enough. We also need to bounds check our allocation to make sure that the field projection won't go out-of-bounds. While it's unlikely we'd ever encounter a type with a trailing field offset of 2^16 bytes, it's not impossible, and we can't exhibit undefined behavior if we encounter such a type.
In order to perform the bounds check, we need some way of obtaining an upper bound for the trailing field offset before we've actually computed it. The only way of doing this (to my knowledge) is to calculate the size of the smallest possible value of the type (ie, the value with 0 trailing slice elements). We can't construct such an instance in the general case, but we can construct a raw pointer to such an instance:
// A `*const [()]` with 0 elements.
let slc = core::ptr::slice_from_raw_parts(&() as *const _, 0);
let t = slc as *const T;
This relies on behavior which is currently not well-defined, but is in review as of this writing.
However, once we have this raw pointer, we need to know its size. The only way to do this is with the currently-unstable size_of_val_raw
. Once we've computed the size of the smallest possible value for our type (ie, size_of_val_raw(t)
), we have an upper bound for the offset of the trailing field; so long as this value is not larger than the size of our allocation, the field projection is guaranteed to be in-bounds, allowing us to soundly compute the trailing field offset.
Old text
Add a replacement for MaybeUninit
which supports unsized types. Add support for conversions on custom dynamically sized-types (DSTs).
Status
Phase 1 is in review in #312.
Motivation
This proposal aims to solve two problems using a single, unified design.
Unsized MaybeUninit
The standard library's MaybeUninit
type don't support wrapping unsized types. MaybeUninit
is a core building block of important designs like TryFromBytes
. So long as it doesn't support unsized types, designs like TryFromBytes
also can't support unsized types.
Custom DSTs
While zerocopy supports conversions on slice types, it doesn't support conversions on custom dynamically-sized types (DSTs) like:
#[derive(FromZeroes, FromBytes, AsBytes)]
#[repr(C)]
struct UdpHeader { ... }
#[derive(FromZeroes, FromBytes, AsBytes)]
#[repr(C)]
struct UdpPacket {
header: UdpHeader,
body: [u8], // Unsized field makes this type a "custom DST"
}
Currently, users such as packet-formats instead write code like:
#[derive(FromZeroes, FromBytes, AsBytes)]
#[repr(C)]
struct UdpHeader { ... }
struct UdpPacket<B> {
header: Ref<B, UdpHeader>,
body: B,
}
The latter code is more cumbersome, and requires storing an extra pointer in order to refer to a UDP packet in memory.
The ability to support custom DSTs is a frequent request from our users.
Design
This design comes in two phases; the first phase can be implemented without the second phase, and will provide value on its own.
In Phase 1, support is added for a MaybeUninit
-like type that supports wrapping both T: Sized
and [T] where T: Sized
. This will unlock the TryFromBytes
design as described above. In the Phase 2, support is added for a KnownLayout
trait which provides a superset of the functionality from Phase 1, and supports zero-copy conversion of arbitrary custom DSTs. A custom derive is provided KnownLayout
.
Phase 1: MaybeUninit
The standard library's MaybeUninit<T>
type has the same layout as T
, but it has no bit validity constraints - any byte value, including an uninitialized byte, may be written at any byte offset in a MaybeUninit<T>
. A replacement for this type just needs to have these semantics, and also needs to support unsized types.
Our design builds upon the fact that MaybeUninit
exists and works for sized types. We define the following trait:
pub unsafe trait AsMaybeUninit {
/// A type which has the same layout as `Self`, but which has no validity
/// constraints.
///
/// Roughly speaking, this type is equivalent to what the standard library's
/// [`MaybeUninit<Self>`] would be if `MaybeUninit` supported unsized types.
type MaybeUninit: ?Sized;
}
For Sized
types, the implementation is trivial:
unsafe impl<T: Sized> AsMaybeUninit for T {
type MaybeUninit = core::mem::MaybeUninit<T>;
}
For all other types, we use the standard library's MaybeUninit
type as a building block to build up a type whose layout is the same as what MaybeUninit
's would be if it supported unsized types:
unsafe impl<T: Sized> AsMaybeUninit for [T] {
type MaybeUninit = [core::mem::MaybeUninit<T>];
}
unsafe impl AsMaybeUninit for str {
type MaybeUninit = <[u8] as AsMaybeUninit>::MaybeUninit;
}
Finally, we add our own MaybeUninit
type, which is simply a convenience wrapper:
#[repr(transparent)]
pub struct MaybeUninit<T: AsMaybeUninit + ?Sized> {
inner: T::MaybeUninit,
}
// The equivalent impl for `MaybeUninit<T>` is already covered by the blanket impl for `T: Sized`.
unsafe impl<T: Sized> AsMaybeUninit for MaybeUninit<[T]> {
type MaybeUninit = [<T as AsMaybeUninit>::MaybeUninit];
}
In Phase 1, these are the only supported unsized types. In Phase 2, we allow deriving AsMaybeUninit
on arbitrary types, which adds support for custom DSTs.
Safety invariants
The safety invariants on AsMaybeUninit
are somewhat complex. This is mostly a result of needing to support unsized types. For a more in-depth explanation of why supporting unsized types introduces a lot of complexity, see here.
pub unsafe trait AsMaybeUninit {
/// A type which has the same layout as `Self`, but which has no validity
/// constraints.
///
/// Roughly speaking, this type is equivalent to what the standard library's
/// [`MaybeUninit<Self>`] would be if `MaybeUninit` supported unsized types.
///
/// # Safety
///
/// For `T: AsMaybeUninit`, the following must hold:
/// - Given `m: T::MaybeUninit`, it is sound to write any byte value,
/// including an uninitialized byte, at any byte offset in `m`
/// - `T` and `T::MaybeUninit` have the same alignment requirement
/// - It is valid to use an `as` cast to convert a `t: *const T` to a `m:
/// *const T::MaybeUninit` and vice-versa (and likewise for `*mut T`/`*mut
/// T::MaybeUninit`). Regardless of which direction the conversion was
/// performed, the sizes of the pointers' referents are always equal (in
/// terms of an API which is not yet stable, `size_of_val_raw(t) ==
/// size_of_val_raw(m)`).
/// - `T::MaybeUninit` contains [`UnsafeCell`]s at exactly the same byte
/// ranges that `T` does.
///
/// [`MaybeUninit<Self>`]: core::mem::MaybeUninit
/// [`UnsafeCell`]: core::cell::UnsafeCell
type MaybeUninit: ?Sized;
}
Let's walk through these:
- The reason that we describe bit validity in terms of writing to an existing value (rather than producing a new value) is that it allows us to sidestep a lot of the complexity of defining size equivalence between unsized types. This requirement is just a generalization of what you'd write for sized types: something like, "it must be valid to produce a
T::MaybeUninit
containing any bit pattern, including uninitialized bytes."
- We speak of
T
and T::MaybeUninit
's alignment in prose rather than by referring to core::mem::align_of
because that function requires that its type argument is sized.
- The
as
cast requirement ensures that T
and T::MaybeUninit
are either both sized or have compatible unsized types. In particular, Rust prohibits sized-to-unsized casts; requiring both directions to be valid ensures that casts are sized-to-sized or unsized-to-unsized. The size equality constraint ensures that:
- For sized types, sizes are equal
- For custom DSTs, the trailing slice element's sizes are equal; thus, a
T
and a T::MaybeUninit
whose trailing slices are of the same length have the same size
- It is intended to be sound to synthesize two references to the same memory region where only one reference treats that region as containing an
UnsafeCell
. In other words, interior mutability soundness is a runtime property that is only violated when an UnsafeCell
is used, not merely when a reference to it is constructed. Unfortunately, this is not actually currently guaranteed, and is unsound under the stacked borrows memory model. Thus, we need to ensure that UnsafeCell
s in T
and T::MaybeUninit
line up perfectly.
Pointer conversion
We want to be able to provide unsized equivalents of the assume_init_ref
and assume_init_mut
methods. However, the naive implementation doesn't work:
impl<T: AsMaybeUninit + ?Sized> MaybeUninit<T> {
pub unsafe fn assume_init_ref(&self) -> &T {
let ptr = (&self.inner) as *const T::MaybeUninit as *const T;
unsafe { &*ptr }
}
}
The *const T::MaybeUninit as *mut T
cast isn't valid in a generic context where T: ?Sized
because Rust doesn't know what type of pointers these are (thin, fat, what kind of fat pointer, etc). In order to get around this problem, we add the following methods to AsMaybeUninit
:
fn raw_from_maybe_uninit(maybe_uninit: *const Self::MaybeUninit) -> *const Self;
fn raw_mut_from_maybe_uninit(maybe_uninit: *mut Self::MaybeUninit) -> *mut Self;
This allows us to get assume_init_ref
and assume_init_mut
to work:
impl<T: AsMaybeUninit + ?Sized> MaybeUninit<T> {
pub unsafe fn assume_init_ref(&self) -> &T {
let ptr = T::raw_from_maybe_uninit(&self.inner);
unsafe { &*ptr }
}
}
Phase 2: KnownLayout
TODO
TODO: Mention that this will require removing the blanket impl of AsMaybeUninit
for T: Sized
, which will be a breaking change. We need to not publish in between Phases 1 and 2.
Alternatives
-
For MaybeUninit
, we could propose a change to Rust to add limited support for unsized unions, which would allow MaybeUninit
(which is a union type) to support unsized types. Even if such a proposal were accepted, it would likely take months or years to be stabilized and thus available for our use.
-
For custom DSTs, we could wait until the ptr_metadata
and layout_for_ptr
features stabilize, which would allow us to rewrite Ref
like this:
use core::ptr::Pointee;
pub struct Ref<B, T: ?Sized> {
data: *mut (),
meta: <T as Pointee>::Metadata,
_marker: PhantomData<(B, T)>,
}
For Sized
types, T::Metadata
would be ()
, and so Ref
would consume only a single word (rather than two, as it does today).
On its own, we could still support slice types (T = [U]
) as we do today using from_raw_parts
and from_raw_parts_mut
. However, using layout_for_ptr
, we could also determine a DST's prefix and element sizes and thus support arbitrary DSTs rather than just slices.
This design would work, but it's unlikely that either of these features will stabilize soon, and we need something in the meantime. The design presented also unblocks unsized MaybeUninit
, which these features wouldn't help with.