
zerocopy's Introduction

zerocopy

Need more out of zerocopy? Submit a customer request issue!

Fast, safe, compile error. Pick two.

Zerocopy makes zero-cost memory manipulation effortless. We write unsafe so you don't have to.

Overview

Conversion Traits

Zerocopy provides four derivable traits for zero-cost conversions:

  • TryFromBytes indicates that a type may safely be converted from certain byte sequences (conditional on runtime checks)
  • FromZeros indicates that a sequence of zero bytes represents a valid instance of a type
  • FromBytes indicates that a type may safely be converted from an arbitrary byte sequence
  • IntoBytes indicates that a type may safely be converted to a byte sequence

These traits support sized types, slices, and slice DSTs.

Marker Traits

Zerocopy provides three derivable marker traits that do not provide any functionality themselves, but are required to call certain methods provided by the conversion traits:

  • KnownLayout indicates that zerocopy can reason about certain layout qualities of a type
  • Immutable indicates that a type is free from interior mutability, except by ownership or an exclusive (&mut) borrow
  • Unaligned indicates that a type's alignment requirement is 1

You should generally derive these marker traits whenever possible.
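For intuition on `Unaligned`, note that a type composed entirely of `u8`s has an alignment requirement of 1, so a reference to it may point at any byte offset. A quick std-only check (no zerocopy required; the struct name is illustrative):

```rust
use core::mem::align_of;

#[repr(C)]
struct PacketHeader {
    kind: u8,
    flags: u8,
    len: [u8; 2], // a `u16` stored as bytes to keep alignment at 1
}

fn main() {
    // All fields have alignment 1, so the whole struct does too; such
    // a type is a candidate for deriving `Unaligned`.
    assert_eq!(align_of::<PacketHeader>(), 1);
    // By contrast, a native `u16` field would raise the alignment to 2.
    assert_eq!(align_of::<u16>(), 2);
    println!("ok");
}
```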

Conversion Macros

Zerocopy provides four macros for safe, zero-cost casting between types:

  • transmute (try_transmute) (conditionally) converts a value of one type to a value of another type of the same size
  • transmute_mut converts a mutable reference of one type to a mutable reference of another type of the same size
  • transmute_ref converts an immutable reference of one type to an immutable reference of another type of the same size

These macros perform compile-time alignment and size checks, but cannot be used in generic contexts. For generic conversions, use the methods defined by the conversion traits.
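Conceptually, these macros are checked wrappers around core::mem::transmute, which itself enforces that the source and destination types have the same size; zerocopy's macros layer bit-validity and alignment reasoning on top. A minimal std-only sketch of the kind of same-size conversion they make safe:

```rust
use core::mem::transmute;

fn main() {
    let n: u32 = 0x0102_0304;
    // SAFETY: `u32` and `[u8; 4]` have the same size, and every bit
    // pattern of a `u32` is a valid `[u8; 4]`. This is the kind of
    // reasoning that zerocopy's `transmute!` discharges for you.
    let bytes: [u8; 4] = unsafe { transmute(n) };
    assert_eq!(bytes, n.to_ne_bytes());
    println!("ok");
}
```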

Byteorder-Aware Numerics

Zerocopy provides byte-order aware integer types that support these conversions; see the byteorder module. These types are especially useful for network parsing.
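To see why byte-order-aware types are useful, consider extracting the big-endian length field of a UDP header using only the standard library; zerocopy's byteorder types let you instead declare the field as a big-endian integer and read it directly, with the conversion happening at access time. A std-only sketch of the manual alternative:

```rust
fn main() {
    // A UDP header: src port (0..2), dst port (2..4), length (4..6),
    // checksum (6..8), all in network (big-endian) byte order.
    let packet: [u8; 8] = [0x00, 0x35, 0xC0, 0x01, 0x00, 0x1C, 0xAB, 0xCD];
    // Manually reassemble the big-endian length field.
    let length = u16::from_be_bytes([packet[4], packet[5]]);
    assert_eq!(length, 28);
    println!("length = {length}");
}
```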

Cargo Features

  • alloc By default, zerocopy is no_std. When the alloc feature is enabled, the alloc crate is added as a dependency, and some allocation-related functionality is added.

  • std By default, zerocopy is no_std. When the std feature is enabled, the std crate is added as a dependency (i.e., no_std is disabled), and support for some std types is added. std implies alloc.

  • derive Provides derives for the core marker traits via the zerocopy-derive crate. These derives are re-exported from zerocopy, so it is not necessary to depend on zerocopy-derive directly.

    However, you may experience better compile times if you instead directly depend on both zerocopy and zerocopy-derive in your Cargo.toml, since doing so will allow Rust to compile these crates in parallel. To do so, do not enable the derive feature, and list both dependencies in your Cargo.toml with the same leading non-zero version number; e.g:

    [dependencies]
    zerocopy = "0.X"
    zerocopy-derive = "0.X"
  • simd When the simd feature is enabled, FromZeros, FromBytes, and IntoBytes impls are emitted for all stable SIMD types which exist on the target platform. Note that the layout of SIMD types is not yet stabilized, so these impls may be removed in the future if layout changes make them invalid. For more information, see the Unsafe Code Guidelines Reference page on the layout of packed SIMD vectors.

  • simd-nightly Enables the simd feature and adds support for SIMD types which are only available on nightly. Since these types are unstable, support for any type may be removed at any point in the future.

Security Ethos

Zerocopy is expressly designed for use in security-critical contexts. We strive to ensure that zerocopy code is sound under Rust's current memory model and any future memory model. We ensure this by:

  • ...not 'guessing' about Rust's semantics. We annotate unsafe code with a precise rationale for its soundness that cites a relevant section of Rust's official documentation. When Rust's documented semantics are unclear, we work with the Rust Operational Semantics Team to clarify Rust's documentation.
  • ...rigorously testing our implementation. We run tests using Miri, ensuring that zerocopy is sound across a wide array of supported target platforms of varying endianness and pointer width, and across both current and experimental memory models of Rust.
  • ...formally proving the correctness of our implementation. We apply formal verification tools like Kani to prove zerocopy's correctness.

For more information, see our full soundness policy.

Relationship to Project Safe Transmute

Project Safe Transmute is an official initiative of the Rust Project to develop language-level support for safer transmutation. The Project consults with crates like zerocopy to identify aspects of safer transmutation that would benefit from compiler support, and has developed an experimental, compiler-supported analysis which determines whether, for a given type, any value of that type may be soundly transmuted into another type. Once this functionality is sufficiently mature, zerocopy intends to replace its internal transmutability analysis (implemented by our custom derives) with the compiler-supported one. This change will likely be an implementation detail that is invisible to zerocopy's users.

Project Safe Transmute will not replace the need for most of zerocopy's higher-level abstractions. The experimental compiler analysis is a tool for checking the soundness of unsafe code, not a tool to avoid writing unsafe code altogether. For the foreseeable future, crates like zerocopy will still be required in order to provide higher-level abstractions on top of the building block provided by Project Safe Transmute.

MSRV

See our MSRV policy.

Changelog

Zerocopy uses GitHub Releases.

Disclaimer

Disclaimer: Zerocopy is not an officially supported Google product.


zerocopy's Issues

Run `cargo miri test` on wasm and riscv target once they're supported

Currently, Miri doesn't support any wasm target (see #22 (comment)). Once one is supported, we should run cargo miri test on a wasm target in CI.

EDIT: This appears to also be true of riscv64gc-unknown-linux-gnu, although I haven't found it documented anywhere.

Old issue text

This job failed for reasons that seem specific to running cargo miri test with the target set to wasm32-wasi. I'm going to disable the test for the time being.

Thanks to @frazar for discovering rust-lang/miri#1916, which suggests that Miri lacks support for some wasm32 targets. The task for this issue now is to a) determine whether any wasm32 targets are supported, b) switch to using a supported one if so, or c) skip running Miri when targeting wasm32 if none are supported.

Roadmap

Overview

This issue describes zerocopy's high-level roadmap both in terms of goals and in terms of concrete steps to achieve those goals.

A slogan often associated with Rust is "Fast, Reliable, Productive. Pick Three." Zerocopy's mission is to make that slogan true by making it so that 100% safe Rust code is just as fast and ergonomic as unsafe Rust code.

In order to live up to that mission, we need to do the following things:

  • Hold ourselves to a high standard for soundness, including in the face of future compiler changes
  • Frame zerocopy in a way that is legible to various user bases, including:
    • Users who don't conceive of themselves as users of unsafe
    • Users who are especially security-conscious
    • Users who care about the crates.io ecosystem
  • Identify gaps which prevent users from choosing zerocopy, and close those gaps
  • Identify features which can improve the ergonomics or performance of code which uses zerocopy, and implement those features

Motivation

A user story

Imagine you are a systems programmer. Any sort of systems software will do, but we need a specific example, so let's say you're writing a networking stack. You care about your software's performance, you care about your software's correctness, and you care about your team's productivity. In order to achieve maximum performance, you want your code to do as few things as possible, and that means avoiding any situation where your data must be converted between representations in the course of processing it. For example, if you are parsing a network packet, you want to operate on the packet in-place: so-called "zero-copy" parsing (hey, that's the name of the crate!).

Your first impulse might be to use unsafe code. Perhaps you write a parsing routine like:

use core::mem::size_of;

struct UdpHeader {
    src_port: u16,
    dst_port: u16,
    length: u16,
    checksum: u16,
}

struct UdpPacket<'a> {
    header: &'a UdpHeader,
    body: &'a [u8],
}

fn parse_udp_packet(bytes: &[u8]) -> Option<UdpPacket<'_>> {
    if bytes.len() < size_of::<UdpHeader>() {
        return None;
    }

    let (header, body) = bytes.split_at(size_of::<UdpHeader>());
    let header = unsafe { &*header.as_ptr().cast::<UdpHeader>() };
    Some(UdpPacket { header, body })
}

One of your goals is performance, and this code is fast! But you also care about your code's correctness, and you know that unsafe is notoriously difficult to get right (in fact, this implementation is unsound in two ways - can you spot them?). So you decide to be more careful. You spend the day poring over the Rustonomicon and the language reference. You find and fix some bugs in your code, and you even write a pseudo-proof of correctness in a "SAFETY" comment so that others can check your work.

use core::mem::size_of;

#[repr(C)]
struct UdpHeader {
    src_port: [u8; 2],
    dst_port: [u8; 2],
    length: [u8; 2],
    checksum: [u8; 2],
}

struct UdpPacket<'a> {
    header: &'a UdpHeader,
    body: &'a [u8],
}

fn parse_udp_packet(bytes: &[u8]) -> Option<UdpPacket<'_>> {
    if bytes.len() < size_of::<UdpHeader>() {
        return None;
    }

    let (header, body) = bytes.split_at(size_of::<UdpHeader>());

    // SAFETY: We've validated that `bytes` is at least as long as `UdpHeader`. We know
    // that `UdpHeader` has no alignment requirement because all of its fields are `u8`
    // arrays, which don't have any alignment requirement, and it's `#[repr(C)]` so its
    // alignment is equal to the maximum of the alignments of its fields. Thus, the reference
    // we create here satisfies the layout properties of a `&UdpHeader`.
    //
    // We also know that any sequence of bytes of length `size_of::<UdpHeader>()` is a
    // valid instance of `UdpHeader` because that is true of all of its fields. That means that,
    // regardless of the contents of `bytes`, those contents represent a valid `UdpHeader`,
    // and so this conversion is unconditionally sound.
    //
    // Finally, we know that the created reference has the correct lifetime because of Rust's
    // lifetime elision rules. In particular, the type signature of this function guarantees that
    // the argument and return types have the same lifetime. Thus, the returned `UdpPacket`
    // cannot outlive the bytes it was parsed from.
    let header = unsafe { &*header.as_ptr().cast::<UdpHeader>() };
    Some(UdpPacket { header, body })
}

One of your goals is correctness, and this code is much more likely to be correct than the previous version! But you also care about your productivity, and you just spent an entire day writing a few lines of code. And what happens when you need to change the code? How much work will it take to convince yourself that a change is still correct? What if other, less experienced developers want to work on this section of code? Will they feel comfortable following your logic and feel confident in their ability to make changes without introducing bugs? So you decide to commit to never using unsafe. You modify your code to get rid of it and make whatever changes you need to get it to compile:

use core::mem::size_of;

#[repr(C)]
struct UdpHeader {
    src_port: u16,
    dst_port: u16,
    length: u16,
    checksum: u16,
}

struct UdpPacket<'a> {
    header: UdpHeader,
    body: &'a [u8],
}

fn parse_udp_packet(bytes: &[u8]) -> Option<UdpPacket<'_>> {
    if bytes.len() < size_of::<UdpHeader>() {
        return None;
    }

    let (src_port_bytes, rest) = bytes.split_at(size_of::<u16>());
    let (dst_port_bytes, rest) = rest.split_at(size_of::<u16>());
    let (length_bytes, rest) = rest.split_at(size_of::<u16>());
    let (checksum_bytes, rest) = rest.split_at(size_of::<u16>());

    let mut src_port = [0; 2];
    let mut dst_port = [0; 2];
    let mut length = [0; 2];
    let mut checksum = [0; 2];

    src_port.copy_from_slice(src_port_bytes);
    dst_port.copy_from_slice(dst_port_bytes);
    length.copy_from_slice(length_bytes);
    checksum.copy_from_slice(checksum_bytes);

    let header = UdpHeader {
        src_port: u16::from_be_bytes(src_port),
        dst_port: u16::from_be_bytes(dst_port),
        length: u16::from_be_bytes(length),
        checksum: u16::from_be_bytes(checksum),
    };

    Some(UdpPacket { header, body: rest })
}

One of your goals is productivity, and this code is easy to verify, so it was fast to write and will be fast to change in the future! But you also care about performance, and you're doing a lot more bounds checking and copying than you were before. Maybe the optimizer will improve things for you, but there's no way to be sure without benchmarking it, and even if the optimizer is smart enough this time, you might get unlucky with a future change that makes the code just confusing enough to stump the optimizer, leading to unexpected performance cliffs.

You think back on all of these attempts. You wanted fast code, so you used unsafe, but that made you worried about correctness. You also wanted correct code, so you spent a long time reasoning about your code's correctness and you wrote down that reasoning so others could check your work, but that took an entire day and resulted in code that would be slow to change in the future. You wanted to be productive, so you got rid of all of the unsafe, but that made your code slow again. It seems like you just can't win!

Moral

The moral of this story is that, when it comes to operations that touch memory directly, the Rust language and standard library are not on their own sufficient to achieve "Fast, Reliable, Productive. Pick Three." While the basic ingredients are all there, putting them together unavoidably requires sacrifices along one of the dimensions of speed, reliability, and productivity. Zerocopy aims to fill this gap. In the Design section, we outline the current state of zerocopy, identify the gaps between zerocopy's current state and its aspirational future, and outline the steps required to reach that future.

Design

As mentioned above, zerocopy's mission is to make good on the slogan Fast, Reliable, Productive. Pick Three. by making it so that 100% safe Rust code is just as fast and ergonomic as unsafe Rust code. Using zerocopy, you could write the parsing code from the previous section like this:

use zerocopy::{FromBytes, Ref, Unaligned};

#[derive(FromBytes, Unaligned)]
#[repr(C)]
struct UdpHeader {
    src_port: [u8; 2],
    dst_port: [u8; 2],
    length: [u8; 2],
    checksum: [u8; 2],
}

struct UdpPacket<'a> {
    header: &'a UdpHeader,
    body: &'a [u8],
}

fn parse_udp_packet(bytes: &[u8]) -> Option<UdpPacket<'_>> {
    let (header, body) = Ref::new_unaligned_from_prefix(bytes)?;
    Some(UdpPacket { header: header.into_ref(), body })
}

This is already a huge step above what you can do with just the standard library, and illustrates what it's like to have an API that takes care of all of this for you.

Thanks to ergonomics and safety like this, the building blocks that zerocopy provides are already being used in a diverse array of domains. Networking is zerocopy's origin and its bread and butter, but it is also used in embedded security firmware, in software emulation, in hypervisors, in filesystems, in high-frequency trading, and much more. However, it still has a ways to go before it can replace most of the unsafe code in the Rust ecosystem.

Gaps

User model

In order to identify gaps, it's helpful to say a bit about who we hope to reach with zerocopy.

Not looking to use unsafe code

A lot of use of unsafe code is by programmers who conceive of themselves primarily as trying to solve some practical problem. If they think about it at all, they think about unsafe code as a tool, not as an object of contemplation. They may have a vague sense of what the phrase "memory safe" means, and they may even know that pointers need to be aligned. They likely don't know that, in order to be able to convert a type to a byte slice, the type must not contain any uninitialized bytes, and they almost certainly have never heard of pointer provenance.

Often, these users don't know a priori that unsafe code is a tool they should consider. Instead, in trying to solve a particular problem, they may come across a crate or a Google search result which points them towards unsafe, or at least points them towards a crate which makes use of unsafe.

In order to reach users in this camp, we must:

  • Frame our APIs in terms that make sense for their use cases instead of in terms of the language semantics concepts that underlie them. For example, the AsBytes trait should speak primarily about viewing a type as bytes; details about uninitialized bytes should be saved for the "Safety" section of the doc comment.
  • Advertise zerocopy in terms that these users will recognize as describing their needs. This is an area of active development, and threading the needle correctly is difficult.

Security-conscious

On the other end of the spectrum, many of our users come from domains which generally have a high bar for correctness - kernels, hypervisors, cryptography, security hardware, etc. These users are extremely wary of taking external dependencies, and only take dependencies when they absolutely need to or when they have a high degree of trust in an external software artifact.

In order to reach users in this camp, we must:

  • Hold ourselves to a high standard for correctness and soundness
  • Articulate this standard concisely but in sufficient technical detail that a user in this camp can come away from our docs comfortable with taking a dependency on zerocopy

Care about the open-source ecosystem

Many potential users are the authors of crates which are published on crates.io. These users have concerns which are specific to publishing software in an open-source ecosystem. For example:

  • They care about API stability, especially when their use of zerocopy would be visible in their own API
  • They care about compile times
  • They care about the optics of relying on pre-1.0 crates

In order to reach users in this camp, we must have good open-source hygiene. We must:

  • Provide the ability to disable features which are expensive to compile, especially including zerocopy-derive
  • Document and test compliance with a minimum supported Rust version (MSRV)
  • Decide what it would take for us to reach a 1.0 release; while versioning like this may not matter in some worlds (such as monorepos like Google's, where zerocopy was first developed), version numbers are taken as indicators of quality and stability in the open-source world. We need to think about what sorts of long-term API stability guarantees we're willing to make, and then be serious about it when we make them.

Memory model instability and zerocopy's future-soundness guarantee

Rust doesn't have a well-defined memory model. As a result, it's possible that code which is sound under today's compiler may become unsound at some point in the future. If zerocopy wants to be a trustworthy replacement for unsafe code, and ask its users not to worry about soundness, it needs to promise not only soundness, but soundness under any future compiler behavior and under any future memory model.

This work is tracked in #61.

Feature-completeness

Building-block API

Currently, we have a lot of support for combinations of operations. For example, if you want to convert a &mut [u8] to a &mut [T], and you want to check at runtime that your byte slice has the right size and alignment, you would do Ref::new_slice(bytes)?.into_mut_slice(). If you wanted to do the same, but first zero the bytes of the &mut [u8], you'd use the new_slice_zeroed constructor. Even though most of the logic is the same, there's an entirely different constructor.

This has a few downsides:

  • Operations are often fallible when they don't need to be. For example, casting from &[u8; size_of::<T>()] to &T where T: FromBytes + Unaligned can in principle be an infallible operation. However, since all of our APIs take the more general &[u8] type, we have no choice but to perform a bounds check, and thus to return an Option<&T> instead of just &T. This forces the user to .unwrap() or similar, and provides fewer guarantees about codegen.
  • Only explicitly-supported combinations are expressible. If we haven't gotten around to supporting a particular combination, there is no alternative.
  • Users must hunt for the one API with a very_long_name_that_describes_exactly_what_they_want, and there are a ton to choose from.
  • Our API doesn't encourage users to understand what operations their behavior can be decomposed into.
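The first downside is visible even with std types: converting from a fixed-size array reference needs no runtime check, while converting from a slice does. A sketch, using u32 in place of a generic T:

```rust
use core::convert::TryInto;

// Infallible: the length is known statically from the array type, so
// no bounds check and no `Option` are needed.
fn from_array(bytes: &[u8; 4]) -> u32 {
    u32::from_ne_bytes(*bytes)
}

// Fallible: the slice's length is only known at runtime, so the API is
// forced to return an `Option`, and the caller is forced to handle it.
fn from_slice(bytes: &[u8]) -> Option<u32> {
    let array: [u8; 4] = bytes.try_into().ok()?;
    Some(u32::from_ne_bytes(array))
}

fn main() {
    let buf = [1u8, 0, 0, 0];
    assert_eq!(from_array(&buf), u32::from_ne_bytes(buf));
    assert_eq!(from_slice(&buf[..]), Some(u32::from_ne_bytes(buf)));
    assert_eq!(from_slice(&buf[..3]), None);
    println!("ok");
}
```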

To address these issues, we want to move towards a world in which there are small "building blocks" which can be combined to perform larger operations. Convenience methods for common combinations will probably still be supported, but we may remove some of the less-frequently used bits of the API so long as users can still express the same behavior using the new building blocks. So far, we intend to build:

  • ByteArray<T> - a polyfill for [u8; size_of::<T>()] until the latter type is stable in a generic context
  • Align<T, A> - a T whose alignment is rounded up to that of A
  • Various conversions which use the ByteArray, Unalign, and Align types to elide length and alignment checks. A few examples:
    • fn unaligned_ref_from_bytes(bytes: &ByteArray<T>) -> &Unalign<T> where T: FromBytes + Sized
    • fn mut_from_bytes(bytes: &mut ByteArray<T>) -> Option<&mut T> where T: FromBytes + AsBytes + Sized
    • fn as_byte_array(&self) -> &ByteArray<Self> where Self: AsBytes + Sized

Another added benefit of these building blocks is that it will make it easier to reason about the soundness of our implementations. Since many of our functions/methods encode complex behavior (exactly what we're talking about in this section), safety arguments are similarly complex. If we were instead able to decompose these into smaller (still unsafe) operations, we could make it easier to reason about the safety of the resulting implementations.

For example, currently, the implementation of Ref::into_ref looks like this:

Current impl
impl<'a, B, T> Ref<B, T>
where
    B: 'a + ByteSlice,
    T: FromBytes,
{
    /// Converts this `Ref` into a reference.
    ///
    /// `into_ref` consumes the `Ref`, and returns a reference to
    /// `T`.
    pub fn into_ref(self) -> &'a T {
        // SAFETY: This is sound because `B` is guaranteed to live for the
        // lifetime `'a`, meaning that a) the returned reference cannot outlive
        // the `B` from which `self` was constructed and, b) no mutable methods
        // on that `B` can be called during the lifetime of the returned
        // reference. See the documentation on `deref_helper` for what
        // invariants we are required to uphold.
        self.deref_helper()
    }
}

impl<B, T> Ref<B, T>
where
    B: ByteSlice,
    T: FromBytes,
{
    /// Creates an immutable reference to `T` with a specific lifetime.
    ///
    /// # Safety
    ///
    /// The type bounds on this method guarantee that it is safe to create an
    /// immutable reference to `T` from `self`. However, since the lifetime `'a`
    /// is not required to be shorter than the lifetime of the reference to
    /// `self`, the caller must guarantee that the lifetime `'a` is valid for
    /// this reference. In particular, the referent must exist for all of `'a`,
    /// and no mutable references to the same memory may be constructed during
    /// `'a`.
    unsafe fn deref_helper<'a>(&self) -> &'a T {
        &*self.0.as_ptr().cast::<T>()
    }
}

I'm sure that this is sound, but I've always been a bit nervous about how complex the argument is. By contrast, we can simplify this using the building blocks we intend to introduce. In 2c67380 (this commit hasn't been merged, and may be deleted at some point), we change the above code to:

New impl
impl<'a, B, T> Ref<B, T>
where
    B: ByteSlice + Into<&'a [u8]>,
    T: FromBytes,
{
    /// Converts this `Ref` into a reference.
    ///
    /// `into_ref` consumes the `Ref`, and returns a reference to
    /// `T`.
    pub fn into_ref(self) -> &'a T {
        let bytes = self.0.into();
        // SAFETY: `Ref` upholds the invariant that `.0`'s length is
        // equal to `size_of::<T>()`. `size_of::<ByteArray<T>>() ==
        // size_of::<T>()`, so this call is sound.
        let byte_array = unsafe { ByteArray::from_slice_unchecked(bytes) };
        // SAFETY: `Ref` upholds the invariant that `.0` satisfies
        // `T`'s alignment requirement.
        unsafe { T::ref_from_bytes_unchecked(byte_array) }
    }
}

I find this implementation much easier to reason about. The safety invariants on ByteArray::from_slice_unchecked and FromBytes::ref_from_bytes_unchecked are straightforward, and it is much more obvious from reading those functions that the lifetimes are propagated correctly. (Note that this commit also adds a requirement to ByteSlice about what an Into<&'a [u8]> impl is required to return.)

Simplify ByteSlice's definition and make it un-sealed

Currently, ByteSlice has both a Deref<Target=[u8]> bound and an as_ptr(&self) -> *const u8 method. The latter is probably redundant given the former, and adds another method that we have to document safety invariants for. ByteSlice's safety invariants are somewhat subtle, so getting rid of as_ptr would be very nice.

It would also make it easier for others to implement ByteSlice for their own types. We've had users request this, but it's currently impossible because ByteSlice is sealed. While we are confident that our existing impls of ByteSlice and ByteSliceMut are sound for our use cases, we would need to formalize the safety requirements for any types to implement these traits before we make them un-sealed. This is probably a good idea anyway because it may surface ways that we can simplify the API.
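The redundancy is easy to demonstrate: any type with a Deref<Target = [u8]> bound already exposes <[u8]>::as_ptr through auto-deref, so a separate trait method adds nothing. A std-only sketch with a hypothetical wrapper type (the name Bytes is illustrative, not zerocopy's API):

```rust
use core::ops::Deref;

// A hypothetical `ByteSlice`-like wrapper over a borrowed byte slice.
struct Bytes<'a>(&'a [u8]);

impl<'a> Deref for Bytes<'a> {
    type Target = [u8];
    fn deref(&self) -> &[u8] {
        self.0
    }
}

fn main() {
    let storage = [1u8, 2, 3];
    let bytes = Bytes(&storage);
    // No dedicated `as_ptr` trait method is needed: auto-deref
    // forwards the call to `<[u8]>::as_ptr`.
    assert_eq!(bytes.as_ptr(), storage.as_ptr());
    println!("ok");
}
```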

Split ByteSlice so that split_at is in a different trait (#1)

Currently, ByteSlice has a split_at(self, mid: usize) -> (Self, Self) method analogous to the slice method of the same name. Our performance design requires this method to be very cheap, which precludes implementing ByteSlice for types like Vec, for which split_at would require allocation.

Instead, #1 tracks splitting ByteSlice into two traits so a type such as Vec can implement the base ByteSlice trait without needing to implement split_at. Most of the zerocopy API can operate on this simpler trait, while a few functions and methods would still require the ability to call split_at.
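One possible shape for the split is sketched below (hypothetical trait and method names, not zerocopy's actual API): a base trait that only requires a stable view of bytes, and an extension trait for types that can split without allocating.

```rust
use core::ops::Deref;

// Base trait: just a stable view of bytes. A type like `Vec<u8>` could
// implement this without any extra cost.
trait ByteSlice: Deref<Target = [u8]> {}

// Extension trait: only types that can split cheaply (such as `&[u8]`)
// implement it, and only a few zerocopy APIs would require it.
trait SplitByteSlice: ByteSlice + Sized {
    fn split_at(self, mid: usize) -> (Self, Self);
}

impl<'a> ByteSlice for &'a [u8] {}

impl<'a> SplitByteSlice for &'a [u8] {
    fn split_at(self, mid: usize) -> (Self, Self) {
        <[u8]>::split_at(self, mid)
    }
}

fn main() {
    let bytes: &[u8] = &[1, 2, 3, 4];
    let (head, tail) = SplitByteSlice::split_at(bytes, 1);
    assert_eq!(head, &[1]);
    assert_eq!(tail, &[2, 3, 4]);
    println!("ok");
}
```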

Elide length or alignment checks when they can be verified statically

Tracked in #280.

Support types which are not FromBytes, but which can be converted from a sequence of zeroes

Tracked in #30.

Support fallible conversions

Tracked in #5; in progress.

Support conversions in const fn

Tracked in #115.

Support converting &[[u8; size_of::<T>()]] to &[T]

What it says on the tin.

Rename LayoutVerified to Ref (#68)

What it says on the tin. LayoutVerified is descriptive if you understand type theory and the concept of a "witness" (although we probably should have put "witness" in the name...), but it's a meaningless term for most users. We should rename it to Ref or similar - after all, it's just a reference with a few niceties.

Miscellaneous features

API polish

Documentation is complete, thorough, and up-to-date (#32)

High confidence in correctness and soundness

Tested and stable on all platforms

Usable in Cargo and crates.io ecosystem

Compile-time performance

Known bugs are fixed

Code quality

Developer experience

Document style

When zerocopy lived in Fuchsia's tree, we relied on Fuchsia's style guidelines (both for code and commit messages). Now that we're on GitHub, we need to explicitly document these.

Publish 0.7.0-alpha

We recently migrated from Fuchsia's main repository, and at the time of migration, there were already features in zerocopy which were a) not published on crates.io and, b) used in Fuchsia's tree. Thus, Fuchsia can't switch from their in-tree copy to a vendored copy from crates.io until we publish on crates.io. We're a ways away from 0.7.0, so in order to unblock this for Fuchsia, we should publish a 0.7.0-alpha version.

List of TODOs blocked on MSRV

This issue is a repository of TODOs which are blocked on updating to a particular MSRV. In addition to those listed here, also see TODO(#67) comments in the source code.

  • In FromZeroes::new_box_slice_zeroed, remove defensive programming which currently works around a bug in Layout::from_size_align
  • Make MaybeValid::as_slice (pending in #279) const once our MSRV is >= 1.64.0, when slice_from_raw_parts was stabilized as const
  • Some instances of #[allow(clippy::as_conversions)] (pending in the current draft of #196) are spurious, and more recent versions of Clippy don't fire in those locations.
  • Some unsafe blocks in macros are marked with #[allow(clippy::undocumented_unsafe_blocks)]; more recent versions of Clippy don't fire in those locations.
  • Once our MSRV is 1.64, we can use this feature to mimic this setting in zerocopy-derive's Cargo.toml in order to cut down on duplication between it and zerocopy's Cargo.toml. This should also allow us to remove some of the CI logic that verifies that metadata in both files matches, since this will be true automatically.
  • Make any functions that use these const: rust-lang/rust#116218
  • Use ptr::from_ref and ptr::from_mut
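For the last item, ptr::from_ref and ptr::from_mut (stabilized in Rust 1.76) replace as-casts when going from a reference to a raw pointer:

```rust
use core::ptr;

fn main() {
    let x = 5u32;
    // Equivalent to `&x as *const u32`, but cannot silently change the
    // pointee type or be combined with other implicit casts.
    let p: *const u32 = ptr::from_ref(&x);
    // SAFETY: `p` was just derived from a live reference to `x`.
    assert_eq!(unsafe { *p }, 5);
    println!("ok");
}
```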

Add type which encodes statically that a sequence of bytes are all zero

Especially in combination with the FromZeroes trait, it would be useful to be able to represent statically that a sequence of bytes are currently zero. Some operations which produce guaranteed-zero bytes (such as allocating new virtual memory pages) could get a performance or safety benefit from this functionality.

One hypothetical API would look like this, but there could be other ways of designing this:

// All bytes are 0. Note that there may be subtle interactions with
// interior mutability when handing out immutable references.
#[repr(transparent)]
pub struct Zero<T: ?Sized>(T);

// `T: AsBytes` allows us to inspect `T`'s bytes to confirm that they're all 0
impl<T: AsBytes> Zero<T> {
    pub fn new(t: T) -> Option<Zero<T>> { ... }

    pub fn try_from_slice(ts: &[T]) -> Option<&[Zero<T>]> { ... }
}

impl<T> Zero<T> {
    pub unsafe fn new_unchecked(t: T) -> Zero<T> { ... }
}

impl<T: ?Sized + AsBytes> Zero<T> {
    pub fn try_from_ref(t: &T) -> Option<&Zero<T>> { ... }
}

impl<T: ?Sized> Zero<T> {
    pub unsafe fn from_ref_unchecked(t: &T) -> &Zero<T> { ... }
}

impl<T: ?Sized> Deref for Zero<T> { ... }

pub unsafe trait FromZeroes {
    fn from_bytes(bytes: Zero<ByteSlice<Self>>) -> Self { ... }

    fn from_slice(bytes: &[Zero<u8>]) -> Option<&[Self]> { ... }

    // Maybe modify existing zeroing methods to return a `Zero`?
    fn new_zeroed() -> Zero<Self> { ... }
}
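
To make the intended semantics concrete, here is a minimal, self-contained sketch (all names hypothetical, specialized to `u32` rather than generic over `AsBytes`) of the runtime check `Zero::new` would perform:

```rust
// Minimal sketch of the hypothetical `Zero::new` check for a sized
// type: wrap the value only if every byte of its representation is 0.
#[repr(transparent)]
pub struct Zero<T>(T);

impl Zero<u32> {
    pub fn new(t: u32) -> Option<Zero<u32>> {
        // A real implementation would use `AsBytes` to inspect the
        // bytes generically; here we use the native byte representation.
        if t.to_ne_bytes().iter().all(|&b| b == 0) {
            Some(Zero(t))
        } else {
            None
        }
    }
}
```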

Test `cargo package` or `cargo publish --dry-run` in CI?

Historically, I've run into issues while doing cargo package or cargo publish that I didn't run into during normal development. We should think through this more and figure out if there's anything we can do to exercise these potential pitfalls in CI.

Rust Cache warnings for tests with more than one feature enabled

Looking at this CI log

Warning:  ValidationError: Key Validation Error: v0-rust-1.56.1-i686-unknown-linux-gnu-alloc,simd--build_test-a732c1bed0bf174cd2e9b37132d29ed1815eb7cd-df69c652838aa8f1d3f759765b37bdf011baf599 cannot contain commas.
    at checkKey (/home/runner/work/_actions/Swatinem/rust-cache/v2.0.0/dist/restore/index.js:58:15)
    at Object.<anonymous> (/home/runner/work/_actions/Swatinem/rust-cache/v2.0.0/dist/restore/index.js:90:13)
    at Generator.next (<anonymous>)
    at /home/runner/work/_actions/Swatinem/rust-cache/v2.0.0/dist/restore/index.js:15:71
    at new Promise (<anonymous>)
    at __webpack_modules__.7799.__awaiter (/home/runner/work/_actions/Swatinem/rust-cache/v2.0.0/dist/restore/index.js:11:12)
    at Object.restoreCache (/home/runner/work/_actions/Swatinem/rust-cache/v2.0.0/dist/restore/index.js:80:12)
    at run (/home/runner/work/_actions/Swatinem/rust-cache/v2.0.0/dist/restore/index.js:62066:40)

it seems like Rust Cache does not like commas in keys.

This occurs because keys are defined as

key: "${{ matrix.channel }}-${{ matrix.target }}-${{ matrix.features }}-${{ hashFiles('**/Cargo.lock') }}"

and they end up containing commas when ${{ matrix.features }} expands to two or more features, like "alloc,simd,simd-nightly".

Maybe the key definition should be changed to a comma-less version. We should also consider a way to turn rust-cache warnings into hard errors.

Relax requirements for deriving `FromZeroes` on enums

This issue used to track FromZeroes, which has now been merged. However, support for #[derive(FromZeroes)] is incomplete - the rules implemented when deriving on an enum are equivalent to the rules for deriving FromBytes on an enum. Those rules are unnecessarily restrictive.

Now, this issue tracks:

  • Figuring out what rules enums need to follow in order to be FromZeroes
  • Modifying our derive to implement those rules

Old text

Add an unsafe marker trait called FromZeroes (or similar) which indicates that a type can be safely constructed from all 0 bytes. Add a custom derive to zerocopy-derive for this trait.

I would expect the API to look something like this:

unsafe trait FromZeroes {
    #[doc(hidden)]
    fn only_derive_is_allowed_to_implement_this_trait() where Self: Sized;

    fn zero(&mut self);

    // These would all be moved from `FromBytes`.

    fn new_zeroed() -> Self where Self: Sized { ... }
    #[cfg(feature = "alloc")]
    fn new_box_zeroed() -> Box<Self> where Self: Sized { ... }
    #[cfg(feature = "alloc")]
    fn new_box_slice_zeroed() -> Box<[Self]> where Self: Sized { ... }

    // If we also implement `Zeroed` (#31):
    fn read_from_zeroed<B: ByteSlice>(bytes: Zeroed<B>) -> Option<Self> where Self: Sized { ... }
    fn read_from_zeroed_prefix<B: ByteSlice>(bytes: Zeroed<B>) -> Option<Self> where Self: Sized { ... }
    fn read_from_zeroed_suffix<B: ByteSlice>(bytes: Zeroed<B>) -> Option<Self> where Self: Sized { ... }
}

// `FromBytes` would gain a `FromZeroes` bound.
unsafe trait FromBytes: FromZeroes { ... }

`test_new_error` fails on i686

From this test job:

error: test failed, to rerun pass '--lib'
thread 'tests::test_new_error' panicked at 'assertion failed: LayoutVerified::<_, u64>::new(&buf.buf[4..]).is_none()', src/lib.rs:2784:9

Fix the bug and re-enable the test in .github/workflows/ci.yml.

Test `cargo doc` in CI

In CI, test that RUSTDOCFLAGS="-D warnings" cargo doc succeeds (the environment variable causes warnings such as broken intra-doc links to be treated as errors).

Note that we currently have some intra-doc links which are broken with some features enabled or disabled (see this URLO thread), and we need to figure out what to do about these.

`test_as_bytes_methods` fails on powerpc

From this test job:

thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `[0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 0]`,
 right: `[1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0]`', src/lib.rs:2885:9

Fix the bug and re-enable the test in .github/workflows/ci.yml.
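
The mismatch above is consistent with a byte-order assumption: the expected values assume a little-endian layout, while powerpc is big-endian. A small sketch of the difference:

```rust
// `1u32` has different native byte representations depending on the
// target's endianness, which would explain the left/right mismatch
// in the failure above.
fn u32_bytes_le(n: u32) -> [u8; 4] {
    n.to_le_bytes() // layout on little-endian targets (x86, etc.)
}

fn u32_bytes_be(n: u32) -> [u8; 4] {
    n.to_be_bytes() // layout on big-endian targets (powerpc)
}
```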

Add defensive programming in `FromBytes::new_box_slice_zeroed`

In #63, we switched from manual bounds checking plus Layout::from_size_align_unchecked to calling Layout::from_size_align and relying on its bounds checking. There seemed to be a bug in that bounds checking prior to 1.65.0, and so we wrote a test which we disabled on Rust versions prior to 1.65.0. The reasoning was that the worst that could happen was a failed allocation, so it wasn't actually dangerous to expose this bug in the API.

dtolnay/semver#294 deals with this as well, and takes a more defensive stance. It observes that allocation with an invalid Layout is actually UB, and if the API is somehow reachable via attacker-controlled input, it results in an easy-to-exploit path to attacker-controlled UB. I think we may want to add defenses along the same lines.

The specific task is to:

  • Figure out what UB is possible when combining the code as currently written with the version of Layout::from_size_align on 1.64.0
  • Modify FromBytes::new_box_slice_zeroed to ensure that that UB cannot be triggered
  • Leave a // TODO(#67): ... comment to remove the workaround once our MSRV is at least 1.65.0
  • Add that TODO to the list for 1.65.0 kept in #67
  • Remove the conditional compilation on test_new_box_slice_zeroed_panics_isize_overflow and update the comment there
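
A defensive check along these lines might look like the following sketch (function name hypothetical), which re-validates by hand the invariant that `Layout::from_size_align` enforces correctly only on Rust >= 1.65.0:

```rust
// Sketch of defensive programming around the pre-1.65.0
// `Layout::from_size_align` bug: manually reject sizes that, when
// rounded up to the nearest multiple of `align`, would overflow
// `isize::MAX`, instead of relying on `from_size_align`'s own check.
fn checked_layout(size: usize, align: usize) -> Option<core::alloc::Layout> {
    if !align.is_power_of_two() {
        return None;
    }
    // This is the check that `Layout::from_size_align` performs
    // correctly on Rust >= 1.65.0.
    if size > (isize::MAX as usize) - (align - 1) {
        return None;
    }
    core::alloc::Layout::from_size_align(size, align).ok()
}
```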

Make it easier to run CI tests locally

We currently run a large suite of tests in CI, which includes testing with different tools across different toolchains and different targets. It also involves miscellaneous things like confirming that README.md is up to date, that code is properly formatted with rustfmt, etc. In other words, it's not as simple as just running cargo test.

As a result, it's easy to make changes that appear fine locally but break CI, requiring an edit-commit-push cycle just to see if all of the CI tests will pass. We should make it easier to run most or all of these tests locally. For especially fast tests (like README.md and code formatting), we could even provide them as commit hooks.

@djkoloski also reports finding this to be painful (#69 (comment)):

I struggled specifically with formatting and forgetting to make stderr match in tests. Adding a way to format test files and an easier way to update all the stderr files for tests might be helpful. 🙂

The challenge here is going to be making these tests available without either a) moving everything out of our ci.yml and into a script (which means we don't get nice job output and error messages) or b) duplicating everything (which is brittle).

Support `TryFromBytes` - conditional conversion analogous to `FromBytes`

Co-authored with @jswrenn.

Overview

Add a TryFromBytes trait, which supports byte-to-type conversions for non-FromBytes types by performing runtime validation. Add a custom derive which generates this validation code automatically.

Many thanks to @kupiakos and @djkoloski for providing invaluable feedback and input on this design.

Progress

  • Add TryFromBytes trait definition
  • Implement TryFromBytes for existing FromBytes types
  • Add try_from_ref method; impl for bool
  • Implement derive for structs
  • Implement for slices
  • Implement for arrays
  • Allow deriving on repr(packed) structs
  • Allow deriving on unions
  • Allow deriving on field-less enums with primitive reprs (u8, i16, etc)
  • Allow deriving on field-less enums with repr(C) by treating the discriminant type as [u8; size_of::<Self>()]
  • #873
  • Implement TryFromBytes for fn() and extern "C" fn() types
  • Implement TryFromBytes for UnsafeCell<T>
  • Make TryFromBytes a super-trait of FromZeros
  • Remove #[doc(hidden)] from all items which are intended to be public
  • Add to TryFromBytes docs to explain that you can't always round trip T -> [u8] -> T (notably for pointer types), which could be confusing given that, for TryFromBytes, the failure would show up at runtime
  • Rename methods consistent with #1095
  • TryFromBytes doc comment currently incorrectly says: "zerocopy does not permit implementing TryFromBytes for any union type"
  • Consider this comment
  • Non-breaking/blocking
    • Allow deriving on data-full enums
  • Non-breaking/non-blocking
    • Add try_from_mut and try_read_from methods
    • Implement for unsized UnsafeCell
      • Consider that we may not need to require T: Sized (described in #251) if we use the design in #905
    • Remove Self: NoCell bound from try_read_from
    • Support deriving on unions without Immutable bound
    • is_bit_valid should promise not to mutate its argument's referent
    • #1330

Motivation

Many use cases involve types whose layout is well-defined, but which cannot implement FromBytes because there exist bit patterns which are invalid (either they are unsound in terms of language semantics or they are unsafe in the sense of violating a library invariant).

Consider, for example, parsing an RPC message format. It would be desirable for performance reasons to be able to read a message into local memory, validate its structure, and if validation succeeds, treat that memory as containing a parsed message rather than needing to copy the message in order to transform it into a native Rust representation.

Here's a simple, hypothetical example of an RPC to request log messages from a process:

/// The arguments to the `RequestLogs` RPC (auto-generated by the RPC compiler).
#[repr(C)]
struct RequestLogsArgs {
    max_logs: u64,
    since: LogTime,
    level: LogLevel,
}

/// Log time, measured as time on the process's monotonic clock.
#[repr(C)]
struct LogTime {
    secs: u64,
    // Invariant: In the range [0, 10^9)
    nsecs: u32,
}

/// Level of log messages requested from `RequestLogs`.
#[repr(u8)]
enum LogLevel {
    Trace,
    Debug,
    Info,
    Warn,
    Error,
}

None of these types can be FromBytes. For LogLevel, only the u8 values 0 through 4 correspond to enum variants, and constructing a LogLevel from any other u8 would be unsound. For LogTime, any sequence of the appropriate number of bytes would constitute a valid instance of LogTime from Rust's perspective - it would not cause unsoundness - but some such sequences would violate the invariant that the nsecs field is in the range [0, 10^9).
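
The validation logic itself is simple; as a sketch (function names hypothetical), the runtime checks for these two types amount to:

```rust
// Sketch of the checks a derived `TryFromBytes` would perform.

// `LogLevel` has five variants with discriminants 0 through 4, so a
// byte is a valid `LogLevel` iff it is <= 4. Anything else is unsound.
fn log_level_is_bit_valid(byte: u8) -> bool {
    byte <= 4
}

// `LogTime`'s bytes are always sound, but the library invariant
// requires `nsecs` to be in the range [0, 10^9).
fn log_time_upholds_invariant(nsecs: u32) -> bool {
    nsecs < 1_000_000_000
}
```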

While these types can't be FromBytes, we'd still like to be able to conditionally reinterpret a sequence of bytes as a RequestLogsArgs - it's just that we need to perform runtime validation first. Ideally, we'd be able to write code like:

/// The arguments to the `RequestLogs` RPC (auto-generated by the RPC compiler).
#[derive(TryFromBytes)]
#[repr(C)]
struct RequestLogsArgs {
    max_stats: u64,
    since: LogTime,
    level: LogLevel,
}

/// Log time, measured as time on the process's monotonic clock.
#[derive(TryFromBytes)]
#[TryFromBytes(validator = "is_valid")]
#[repr(C)]
struct LogTime {
    secs: u64,
    // Invariant: In the range [0, 10^9)
    nsecs: u32,
}

impl LogTime {
    fn is_valid(&self) -> bool {
        self.nsecs < 1_000_000_000
    }
}

/// Level of log messages requested from `RequestLogs`.
#[derive(TryFromBytes)]
#[repr(u8)]
enum LogLevel {
    Trace,
    Debug,
    Info,
    Warn,
    Error,
}

The TryFromBytes trait - the subject of this design - provides the ability to fallibly convert a byte sequence to a type, performing validation at runtime. At a minimum, the validation code simply ensures soundness - for example, in the case of LogLevel, validating that byte values are in the range [0, 4]. The custom derive also supports user-defined validation like the LogTime::is_valid method (note the validator annotation on LogTime), which can be used to enforce safety invariants that go above and beyond soundness.

Given these derives of TryFromBytes, an implementation of this RPC could be as simple as:

fn serve_request_logs_rpc<F: FnMut(&RequestLogsArgs)>(server: &mut RpcServer, mut f: F) -> Result<()> {
    loop {
        let mut bytes = [0u8; mem::size_of::<RequestLogsArgs>()];
        server.read_request(&mut bytes[..])?;
        let args = RequestLogsArgs::try_from_bytes(&bytes[..]).ok_or(ParseError)?;
        f(args);
    }
}

The design proposed in this issue implements this API.

Design

TODO

This design builds on the following features:

/// A value which might or might not constitute a valid instance of `T`.
// Builds on the custom MaybeUninit type described in #29
pub struct MaybeValid<T: AsMaybeUninit + ?Sized>(MaybeUninit<T>);

// Allows us to use the `project!` macro for field projection (proposed in #196)
unsafe impl<T, F> Projectable<F, AlignedByteArray<F>> for AlignedByteArray<T> {
    type Inner = T;
}

impl<T> MaybeValid<T> {
    /// Converts this `MaybeValid<T>` to a `T`.
    ///
    /// # Safety
    ///
    /// `self` must contain a valid `T`.
    pub const unsafe fn assume_valid(self) -> T { ... }

    /// Converts this `&MaybeValid<T>` to a `&T`.
    ///
    /// # Safety
    ///
    /// `self` must contain a valid `T`.
    pub const unsafe fn assume_valid_ref(&self) -> &T { ... }

    /// Converts this `&mut MaybeValid<T>` to a `&mut T`.
    ///
    /// # Safety
    ///
    /// `self` must contain a valid `T`.
    pub unsafe fn assume_valid_mut(&mut self) -> &mut T { ... }
}

/// # Safety
///
/// `is_bit_valid` is correct. If not, can cause UB.
pub unsafe trait TryFromBytes {
    fn is_bit_valid(bytes: &MaybeValid<Self>) -> bool;

    fn try_from_ref(bytes: &[u8]) -> Option<&Self> {
        let maybe_valid = Ref::<_, MaybeValid<Self>>::new(bytes)?.into_ref();
        if Self::is_bit_valid(maybe_valid) {
            // SAFETY: `is_bit_valid` promises that it only returns true if
            // its argument contains a valid `T`. This is exactly the safety
            // precondition of `MaybeValid::assume_valid_ref`.
            Some(unsafe { maybe_valid.assume_valid_ref() })
        } else {
            None
        }
    }

    fn try_from_mut(bytes: &mut [u8]) -> Option<&mut Self>
    where
        Self: AsBytes + Sized,
    {
        let maybe_valid = Ref::<_, MaybeValid<Self>>::new(bytes)?.into_mut();
        if Self::is_bit_valid(maybe_valid) {
            // SAFETY: `is_bit_valid` promises that it only returns true if
            // its argument contains a valid `T`. This is exactly the safety
            // precondition of `MaybeValid::assume_valid_mut`.
            Some(unsafe { maybe_valid.assume_valid_mut() })
        } else {
            None
        }
    }

    fn try_read_from(bytes: &[u8]) -> Option<Self>
    where
        Self: Sized
    {
        let maybe_valid = <MaybeValid<Self> as FromBytes>::read_from(bytes)?;
        if Self::is_bit_valid(&maybe_valid) {
            // SAFETY: `is_bit_valid` promises that it only returns true if
            // its argument contains a valid `T`. This is exactly the safety
            // precondition of `MaybeValid::assume_valid`.
            Some(unsafe { maybe_valid.assume_valid() })
        } else {
            None
        }
    }
}

Here's an example usage:

/// A type without any safety invariants.
#[derive(TryFromBytes)]
#[repr(C)]
struct MySimpleType {
    b: bool,
}

// Code emitted by `derive(TryFromBytes)`
unsafe impl TryFromBytes for MySimpleType {
    fn is_bit_valid(bytes: &MaybeValid<Self>) -> bool {
        // `project!` is described in #196
        let b: &MaybeValid<bool> = project!(&bytes.b);
        TryFromBytes::is_bit_valid(b)
    }
}

/// A type with invariants encoded using `validate`.
#[derive(TryFromBytes)]
#[TryFromBytes(validator = "validate")]
#[repr(C)]
struct MyComplexType {
    b: bool,
}

// Code emitted by `derive(TryFromBytes)`
unsafe impl TryFromBytes for MyComplexType {
    fn is_bit_valid(bytes: &MaybeValid<Self>) -> bool {
        // `project!` is described in #196
        let b: &MaybeValid<bool> = project!(&bytes.b);
        if !TryFromBytes::is_bit_valid(b) { return false; }
        // If there's no interior mutability, then we know this is sound because of preceding
        // validation. TODO: What to do about interior mutability?
        let slf: &MyComplexType = ...;
        MyComplexType::validate(slf)
    }
}

impl MyComplexType {
    fn validate(slf: &MyComplexType) -> bool { ... }
}

Unions

See #696 for a discussion of how to support unions in TryFromBytes.

Relationship with other traits

There are obvious relationships between TryFromBytes and the existing FromZeroes and FromBytes traits:

  • If a type is FromZeroes, then it should probably be TryFromBytes (at a minimum, we must know something about the type's layout and bit validity to determine that it is genuinely FromZeroes)
    • This implies that we should change FromZeroes to be FromZeroes: TryFromBytes
  • If a type is FromBytes, then it is trivially TryFromBytes (where is_bit_valid unconditionally returns true)
    • This implies that we should provide a blanket impl impl<T: FromBytes> TryFromBytes for T

Unfortunately, neither of these are possible today.

FromZeroes: TryFromBytes

The reason this bound doesn't work has to do with unsized types. As described in the previous section, working with unsized types is difficult. Luckily for FromZeroes, it doesn't have to do anything with the types it's implemented for - it's just a marker trait. It can happily represent a claim about the bit validity of a type even if that type isn't constructible in practice (over time, FromZeroes will become more useful as more unsized types become constructible). By contrast, TryFromBytes is only useful if we can emit validation code (namely, is_bit_valid). For that reason, we require TryFromBytes: AsMaybeUninit, since that bound is needed to support the MaybeValid type used by is_bit_valid.

This means that we have two options if we want FromZeroes: TryFromBytes:

  • We can keep TryFromBytes: AsMaybeUninit. As a result, some types which are FromZeroes today can no longer be FromZeroes, and some blanket impls of FromZeroes would require more complex bounds (e.g., today we write impl<T: FromZeroes> FromZeroes for Wrapping<T>; under this system, we'd need to write impl<T: FromZeroes> FromZeroes for Wrapping<T> where <T as AsMaybeUninit>::MaybeUninit: Sized, or alternatively we'd need to write one impl for T and a different one for [T]).
  • We could move the AsMaybeUninit bound out of the definition of TryFromBytes and into is_bit_valid (and callers). As a result, we can keep existing impls of FromZeroes, but now T: TryFromBytes is essentially useless - to do anything useful, you need to specify T: TryFromBytes + AsMaybeUninit.

Neither option seems preferable to just omitting FromZeroes: TryFromBytes. Callers who require both can simply write T: FromZeroes + TryFromBytes.

(Note that the same points apply if we consider FromBytes: TryFromBytes)

impl<T: FromBytes> TryFromBytes for T

This conflicts with other blanket impls which we need for completeness:

  • impl<T: TryFromBytes> TryFromBytes for [T]
  • impl<const N: usize, T: TryFromBytes> TryFromBytes for [T; N]

As a result, we have to leave TryFromBytes and FromBytes as orthogonal. We may want to make it so that derive(FromBytes) automatically emits an impl of TryFromBytes, although in the general case that may require custom DST support.

Open questions

  • Is there any way to recover the blanket impl of TryFromBytes for T: FromBytes? Unlike FromZeroes: TryFromBytes, where you may need to perform runtime validation, if you know that T: FromBytes, then in principle you know that is_bit_valid can unconditionally return true without inspecting its argument, and so in principle it shouldn't matter whether you can construct a MaybeValid<Self>. Is there some way that we could allow FromBytes types to specify <Self as AsMaybeUninit>::MaybeUninit = () or similar in order to bypass the "only sized types or slices can implement AsMaybeUninit" problem?
    • One approach is to wait until a KnownLayout trait lands. There's a good chance that, under that design, we'd end up with FromZeroes: KnownLayout. If KnownLayout: AsMaybeUninit (or just absorbs the current definition of AsMaybeUninit into itself), it'd solve this problem since all zerocopy traits would imply support for MaybeValid.
  • In the first version of this feature, could we relax the Self: Sized bounds on try_from_ref and try_from_mut (without needing full custom-DST support)?
  • Should derive(FromBytes) emit an impl of TryFromBytes? What about custom DSTs?
  • What should the behavior for unions be? Should it validate that at least one variant is valid, or that all variants are valid? (This hinges somewhat on the outcome of rust-lang/unsafe-code-guidelines#438.)
  • What bounds should we place on T when implementing TryFromBytes for Unalign<T> (#320)?

Future directions

  • In this design, we ban interior mutability entirely. For references, this is unavoidable - e.g., if we were to allow types containing UnsafeCell in try_from_ref, then the user could obtain an &UnsafeCell and a &[u8] view of the same memory, which is unsound (under Stacked Borrows, such aliasing views are unsound even to exist, and exposing them to safe code is unsound in all cases). For values (i.e., try_read_from), we'd like to be able to support this - as long as we have some way of performing validation, it should be fine to return an UnsafeCell by value even if its bytes were copied from a &[u8]. Actually supporting this in practice is complicated for a number of reasons, but perhaps a future extension could support it. Reasons it's complicated:
    • is_bit_valid operates on a NonNull<Self>, so interior mutability isn't inherently a problem. However, it needs to be able to call a user's custom validator, which instead operates on a &Self, which is a problem.
    • Even if we could solve the previous problem somehow, we'd need to have is_bit_valid require that its argument neither be undergoing interior mutation nor, under Stacked Borrows, contain any UnsafeCells at all. When the NonNull<Self> is synthesized from a &[u8], this isn't a problem, but if in the future we want to support type-to-type conditional transmutation, it might be. If, in the future, merely containing UnsafeCells is fine, then we could potentially design a wrapper type which "disables" interior mutation and supports field projection. This might allow us to solve this problem.
  • #590

Prior art

The bytemuck crate defines a CheckedBitPattern trait which serves a similar role to the proposed TryFromBytes.

Unlike TryFromBytes, CheckedBitPattern introduces a separate associated Bits type which is a type with the same layout as Self except that all bit patterns are valid. This serves the same role as MaybeValid<Self> in our design. One advantage for the Bits type is that it may be more ergonomic to write validation code for it, which is important for manual implementations of CheckedBitPattern. However, our design expects that manual implementations of TryFromBytes will be very rare. Since CheckedBitPattern's derive doesn't support custom validation, any type with safety invariants would need a manual implementation. By contrast, the TryFromBytes derive's support for a custom validation function means that, from a completeness standpoint, it should never be necessary to implement TryFromBytes manually. The only case in which a manual implementation might be warranted would be for performance reasons.
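
For comparison, a simplified sketch of bytemuck's design (the real trait is bytemuck::CheckedBitPattern, which additionally bounds Bits: AnyBitPattern):

```rust
// Simplified sketch of bytemuck's `CheckedBitPattern` approach: a
// separate `Bits` type for which every bit pattern is valid, plus a
// validity predicate over it.
trait CheckedBitPattern {
    type Bits;
    fn is_valid_bit_pattern(bits: &Self::Bits) -> bool;
}

#[allow(dead_code)]
#[repr(u8)]
enum LogLevel { Trace, Debug, Info, Warn, Error }

impl CheckedBitPattern for LogLevel {
    // Same layout as `LogLevel`, but every bit pattern is valid.
    type Bits = u8;
    fn is_valid_bit_pattern(bits: &u8) -> bool {
        *bits <= 4
    }
}
```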

Determine our MSRV

Determine the minimum Rust version which Zerocopy compiles on, and add tests (see #12) to make sure that we support that MSRV.

Document soundness requirements around references

Migrated from https://fxbug.dev/108820

Currently, FromBytes and AsBytes are documented as simply requiring that a type may be converted from an arbitrary sequence of bytes or may be converted to a sequence of bytes (respectively). However, this isn't actually sufficient in practice given the way we use these traits. In particular, we allow converting from &[u8] to &T (where T: FromBytes) and from &T to &[u8] (where T: AsBytes). The UnsafeCell<T> type can be converted from an arbitrary sequence of bytes if T can. However, if we were to implement FromBytes for UnsafeCell<T>, it would make these reference conversions unsound, as code with a &T could perform interior mutation which code with the &[u8] wouldn't know about.

We need to expand the soundness requirements on FromBytes and AsBytes to explicitly mention this reference-safety requirement.

See also #251 for an alternate approach.

Add tests for compilation failure

We use the trybuild crate to test that zerocopy-derive correctly rejects code that it ought to. We have the same running for zerocopy itself, but no tests have been written to make use of it. zerocopy's API uses trait bounds to make certain illegal code impossible to express; we should write trybuild tests that make sure that each illegal code pattern actually fails to compile.

Contributing

If you'd like to contribute to this issue, just look through zerocopy's API, find unsound behavior that ought to be impossible to express, and write a test to confirm that it's inexpressible!

If you're not sure where to start, consider:

  • Does the API have any documented preconditions which ought to be enforced at compile time?
  • Does the API have any trait bounds?

Write a test which fails to uphold one of these requirements.

Here's a list of all of the APIs in zerocopy that have preconditions or trait bounds:

  • FromBytes (@IsaacCloos assigned)
    • read_from
    • read_from_prefix
    • read_from_suffix
  • AsBytes
    • as_bytes_mut
    • write_to
    • write_to_prefix
    • write_to_suffix
  • LayoutVerified
    • new
    • new_from_prefix
    • new_from_suffix
    • new_slice
    • new_slice_from_prefix
    • new_slice_from_suffix
    • new_zeroed
    • new_from_prefix_zeroed
    • new_from_suffix_zeroed
    • new_slice_zeroed
    • new_slice_from_prefix_zeroed
    • new_slice_from_suffix_zeroed
    • new_unaligned
    • new_unaligned_from_prefix
    • new_unaligned_from_suffix
    • new_slice_unaligned
    • new_slice_unaligned_from_prefix
    • new_slice_unaligned_from_suffix
    • new_unaligned_zeroed
    • new_unaligned_from_prefix_zeroed
    • new_unaligned_from_suffix_zeroed
    • new_slice_unaligned_zeroed
    • new_slice_unaligned_from_prefix_zeroed
    • new_slice_unaligned_from_suffix_zeroed
    • into_ref
    • into_mut
    • into_slice
    • into_mut_slice
    • bytes
    • bytes_mut
    • read
    • write
    • Debug for LayoutVerified<B, T>
    • Debug for LayoutVerified<B, [T]>
    • Deref for LayoutVerified<B, T>
    • Deref for LayoutVerified<B, [T]>
    • DerefMut for LayoutVerified<B, T>
    • DerefMut for LayoutVerified<B, [T]>
    • Display for LayoutVerified<B, T>
    • Display for LayoutVerified<B, [T]>
    • Ord for LayoutVerified<B, T>
    • Ord for LayoutVerified<B, [T]>
    • PartialOrd for LayoutVerified<B, T>
    • PartialOrd for LayoutVerified<B, [T]>
    • Eq for LayoutVerified<B, T>
    • Eq for LayoutVerified<B, [T]>
    • PartialEq for LayoutVerified<B, T>
    • PartialEq for LayoutVerified<B, [T]>
  • Unalign
    • get
    • AsBytes for Unalign<T>
    • FromBytes for Unalign<T>
    • Copy for Unalign<T>
    • Clone for Unalign<T>
  • extend_vec_zeroed
  • insert_vec_zeroed
  • transmute! macro

Optimize caching in CI

In CI, we use the Swatinem/rust-cache GitHub Action to cache certain build artifacts across runs. However, it seems from the README that we might be using it in a suboptimal way. A few observations:

  • Zerocopy has only one dependency, and rust-cache only caches dependencies, not crate artifacts (note that zerocopy-derive has more dependencies - and dependencies that are expensive to compile - so presumably benefits much more than zerocopy)
  • We generate a custom cache key which includes:
    • A hash of the Cargo features, but this presumably shouldn't have any effect on dependencies (at least in our case, since none of our Cargo features depend on dependencies' Cargo features)
    • A hash of **/Cargo.lock, but this is documented to be the default behavior anyway

We should:

  • Figure out if there's a way to squeeze more benefit out for zerocopy (can we get rust-cache to cache any more artifacts?)
  • Don't include Cargo features in the cache
  • Don't include a hash of **/Cargo.lock in the cache
  • Go back over our use of rust-cache and its README with a fine-toothed comb and see what else we're doing suboptimally

cc @Swatinem in case there's anything obvious that jumps out at you!

Improve documentation

This issue tracks improvements to our documentation. See our roadmap for the high-level vision that the tasks tracked here fit into.

Top-level docs

Framing

  • Documentation frames zerocopy as a general-purpose tool for replacing unsafe code, not just as a tool for parsing or another specific task. Done in #386.
  • Consider adding examples of domains in which zerocopy is used in order to convey the breadth of use cases and suggest that it's not primarily meant for any single use case

Use cases

  • Top-level documentation points users in the right direction for most common use cases
    • This documentation must not assume a familiarity with unsafe code, but should instead focus on use cases

Correctness and soundness

  • Documentation frames the "unsafe as the cryptography of the Rust world" idea
  • Documentation describes zerocopy's future-soundness guarantee
  • Documentation describes the steps we take to ensure soundness
    • Safety/soundness arguments, especially with regards to reasoning only in terms of documented guarantees by the language reference or the standard library documentation
    • All tests run under Miri, including under experimental memory models (currently: stacked borrows, tree borrows, and strict provenance)
    • Once we use them, mention formal modeling tools (#378)
  • Consider citing high-profile users who are comfortable with this

Module/item docs

  • Items (types, methods, etc) provide toy examples to demonstrate basic usage
  • Modules and items whose usage is not self-evident provide more involved examples that help to teach users the correct mental model
  • Modules and items call out common use cases that would be reasonable to solve using them
  • Modules and items call out common use cases that would not be reasonable to solve using them, but which it would be easy to mistakenly believe would be reasonable to solve using them (e.g., FromBytes for parsing without considering endiannness)

Miscellaneous

  • Fully-fledged implementation examples in a separate module or in the examples directory

@kupiakos has provided this writeup and offered that we can incorporate it into our docs; it includes a lot of good examples and explanations

Rename `LayoutVerified` to `Ref`

The LayoutVerified type wraps a byte slice reference. It is a witness type that guarantees that the length and alignment have been validated to be compatible with a type parameter, T. It's named based on this witness role (the layout has been "verified"), but the name is confusing for consumers. We should rename it to Ref or similar - it's a reference type that enables zero-copy operations, hence zerocopy::Ref.

Test in CI that we have the same MSRV in all source files

Our MSRV is used in a number of places:

  • Doc comment in src/lib.rs
  • .github/workflows/ci.yml
  • tests/trybuild.rs (added in #60, which hasn't merged as of this writing)
  • zerocopy-derive/tests/trybuild.rs

Since they all need to be updated manually, it's easy for them to get out of sync. We should add a CI test that does some basic grepping to verify that they're in sync.

Appendix: Other approaches

I also tried to keep a single source of truth - an MSRV.txt file in the repository root that was sourced by other files. I ran into a few problems:

  • I was able to do the following to use that file to generate the crate-level doc comment:
    //! # Minimum Supported Rust Version (MSRV)
    //! 
    #![doc = concat!("zerocopy's MSRV is ", include_str!("../MSRV.txt"), ".")]
    ...unfortunately, the cargo readme tool we use to generate our README.md is not able to parse this, and so simply stops parsing the doc comment as soon as it encounters the #![doc = ...] line.
  • In zerocopy-derive/tests/trybuild.rs, I wasn't able to replace #[rustversion::stable(1.61.0)] with an equivalent which used MSRV.txt

Tracking issue for proving soundness, preventing regressions, and documenting security ethos

We aim to make the following promise: zerocopy and any code produced by zerocopy-derive are sound on any supported toolchain/target, and will remain sound under any future compiler changes; in other words, this soundness is forwards-compatible.

This issue tracks all efforts related to proving that we uphold this guarantee, preventing regressions, documenting this guarantee, and ensuring that our documentation properly frames zerocopy's role in the broader Rust safety ecosystem.

As of this writing, there are a few known gaps:

  • We prove our soundness in terms of the Rust reference and stdlib docs, and these docs are wrong on some target architectures (see #383). We don't intend to do anything about this in the short-term other than document that it's a gap.
  • While Rust's memory model for pointer operations will likely never be stricter than "strict provenance", this isn't actually guaranteed. While we're currently compliant with strict provenance, we can't say that that's sufficient for complete forwards-compatible soundness until this is a guarantee provided by the reference. See #181 for more details.
    • In terms of how we message this, we have two options:
      • Document that we "aim to be" sound under any future compiler version, and cite our adherence to strict provenance as an example of this work.
      • Don't mention soundness under any future compiler version, and just cite our adherence to strict provenance as general evidence of how seriously we take soundness.
  • As written, our custom derives can be "tricked" into emitting unsound code (#388).
  • It may be possible to use the trivial_bounds feature to break our derives (#500).

Safety comments

#429 tracks ensuring that each unsafe block is annotated with a safety comment that proves its soundness, and that these proofs are anchored only on guarantees made by the reference or stdlib docs.

Strict provenance

#181 tracks abiding by "strict provenance", which is a model for pointer operations which is likely as strict as any future Rust memory model will be.

Formal modeling/verification

#378 tracks using formal modeling/verification tools to test or prove the correctness of some of our core algorithms.

Target architectures

Some target architectures have nonstandard semantics which may make some of our soundness arguments invalid. #383 tracks making it clear in our documentation that zerocopy may be unsound on architectures whose semantics fail to satisfy the guarantees provided by the Rust reference and stdlib docs.

Proc macro execution model

As described in #388, there are currently soundness holes in our custom derives. Some have been confirmed, and others have been hypothesized but not confirmed. This issue tracks identifying the remaining soundness holes and fixing them.

trivial_bounds

As described in #500, it may be possible to use the trivial_bounds feature to break our derives.

Document our security "ethos"

#482 tracks documenting our security "ethos" - the approach we take to ensuring that our code is correct and secure.

Document our relationship to the Safe Transmute project

#480 tracks documenting the relationship between zerocopy and the Rust project's Safe Transmute project.

Use Safe Transmute in our custom derives (behind a feature flag)

#481 tracks allowing users to opt-in to our derives only supporting types which pass both our analysis and the analysis implemented by the (unstable) BikeshedIntrinsicFrom feature (the placeholder name for the analysis implemented by the Safe Transmute project).

Inline many trait methods (in zerocopy and in derive-generated code)

Status

  • #341
  • Deny clippy::missing_inline_in_public_items in derive-generated code

zerocopy-derive

We don't care about inline attributes for zerocopy-derive itself, but we do care for code emitted by zerocopy-derive. We need to figure out a way to either modify zerocopy-derive's output or modify our zerocopy-derive tests so that, when running zerocopy-derive tests, missing #[inline] attributes generate warnings or errors. It might be possible to use the clippy::missing_inline_in_public_items lint for this, but I'm not sure.

zerocopy

Many zerocopy trait methods contain very little logic or no logic at all, but are currently not marked with any inline attribute, and so cannot be inlined across a crate boundary. We should change this.

#[inline] attributes are now enforced by Clippy as of #341.

Support `KnownLayout` trait and custom DSTs

Overview

  • Add a KnownLayout trait which encodes aspects of a type's layout, and allow deriving it
  • Make KnownLayout a super-trait of FromZeroes and AsBytes
  • Use KnownLayout to support synthesizing references to custom DSTs

Progress

  • Encode the layout of unsized types in a value that can be operated on in const code
  • Implement a runtime check to validate pointer type casts and compute pointer metadata
  • Support deriving KnownLayout for sized types
  • Support deriving KnownLayout for unsized types
    • Use built-in Rust APIs to query the size and alignment of an unsized type:
      • Be able to determine the alignment of an unsized type
      • Be able to calculate the offset of the trailing field of a potentially-unsized type
    • Alternatively, manually implement Rust's repr(C) layout algorithm
    • Implement the derive
  • Support KnownLayout in Ref, and use this to support KnownLayout in other places
    • Add #[doc(hidden)] Ref constructor which takes T: ?Sized + KnownLayout, and is named something like new_known_layout_name_to_be_bikeshedded
    • Add private Ref method named something like deref_known_layout_name_to_be_bikeshedded (and similar for deref_mut). This method should not have a T: FromBytes bound, and should instead be unsafe and require that the contained T is valid as a safety precondition (this allows it to be used in TryFromBytes).
    • Updating existing uses (such as TryFromBytes::try_from_ref) to internally construct a Ref and then use the appropriate private deref method
    • Figure out how to support methods that provide explicit lengths for the trailing slice element
  • #1162
  • Require KnownLayout to apply recursively to all fields to remain forwards-compatible with #494
    • We're still going to do this, but not for this reason. It's tracked elsewhere.
  • Figure out how to phrase "size validity" in APIs that support unsized types (currently, we just refer to size_of::<T>())
  • Support unsized repr(transparent) types (and maybe other reprs?)
    • Won't do this now; filed to track: #1477
  • Add FromBytes methods and/or Ref constructors which operate on the whole buffer (rather than just the prefix/suffix) and take an explicit trailing element count
  • Resolve all TODO(#29) comments
  • #1292
  • Once we're ready to make KnownLayout public (as of this writing, this will be in 0.8 per #671):
    • Finalize all names and remove #[doc(hidden)] from all items
    • Mark all slice functions/methods as deprecated
    • For APIs which only operate on references (not values), replace the implicit T: Sized bound with T: ?Sized + KnownLayout; merge these APIs with any private APIs which were added during development (such as Ref::new_known_layout_name_to_be_bikeshedded)
    • Audit all doc comments to make sure they're up to date. Especially:
      • FromBytes:: ref_from_prefix and ref_from_suffix have stale doc comments that refer to size_of::<Self>()
      • Ref::new, new_from_prefix, and new_from_suffix have stale doc comments that refer to size_of::<T>()

Motivation

TODO

Design

TODO

TODO: Mention this use case from @kupiakos: Might want to support a slice DST which has a custom length field, something like:

struct Foo {
    len_of_rest: U32<BE>,
    rest: [u8],
}

How can we support a) validating this field or, b) "fixing up" this field after conversion? E.g., if the normal result of conversion would result in a Foo with 4 elements in rest, but len_of_rest is 2, how do we make sure that conversion actually produces a Foo with 2 elements in rest?

Relationship to other traits

Ideally, we'd like KnownLayout to be a super-trait of FromZeroes (or TryFromBytes (#5) once FromZeroes: TryFromBytes). Unfortunately, FromZeroes supports types which KnownLayout cannot support: unsized types without a fixed representation (ie, repr(rust)). These types do not provide sufficient guarantees about their layout in order to satisfy KnownLayout's safety conditions. We have two options:

  • Restrict FromZeroes to only support types which KnownLayout can support, and then have FromZeroes: KnownLayout
  • Use KnownLayout as a bound on individual functions and methods

We intend for zerocopy to be a general-purpose library which supports use cases that do not require known representations (any use case which doesn't require a type's layout to correspond to a fixed specification, but only to be consistent within the context of a program's execution). The first option would preclude us from supporting these use cases, so we opt for the latter: we will use KnownLayout as a bound on individual functions and methods.

Deriving KnownLayout

Computing alignment

Use align_of_val_raw

Blocked on rust-lang/rust#69835

TODO

Use a #[repr(C)] type and field offset

Blocked on either rust-lang/rust#106655 or rust-lang/rust#69835

This design is prototyped in #576:

macro_rules! align_of {
    ($t:ty) => {{
        #[repr(C)]
        struct OffsetOfTrailingIsAlignment {
            _byte: u8,
            _trailing: $t,
        }

        trailing_field_offset!(OffsetOfTrailingIsAlignment, _trailing)
    }};
}

This design relies on the trailing_field_offset! macro added in #540 and described below. This macro relies on stabilizing rust-lang/rust#69835. Alternatively, if we can stabilize offset_of! (rust-lang/rust#106655) and add support for using offset_of! with unsized types, then we can replace trailing_field_offset! here with offset_of!.

Computing trailing field offset

The DstLayout type needs to know the offset of the trailing slice within a DST. For example, given Bar:

#[repr(C)]
struct Foo {
    u: u16,
    tail: [u8],
}

#[repr(C)]
struct Bar {
    u: u16,
    foo: Foo,
}

...the trailing tail: [u8] is at offset 4 (after 2 bytes for Bar.u and 2 bytes for Foo.u).

In order to compute the trailing slice offset in the context of a custom derive, it's necessary to compute it recursively. In particular, given the offset of the trailing field within the outermost type (in this case, Bar.foo) and the offset of the trailing slice within the inner type (in this case, Foo.tail), it's possible to compute the offset of the trailing slice within the outer type as the sum of these two values. In all cases, this is a recursive computation that bottoms out at an actual slice type, [T], whose trailing slice offset is 0.

Thus the challenge is, given the AST of an arbitrary, possibly dynamically-sized type, to produce code which computes the byte offset of the trailing field. We have a few options.

offset_of!

Blocked on rust-lang/rust#106655

We could simply wait for the standard library's offset_of! macro to stabilize and add support for unsized types.

addr_of!

Blocked on rust-lang/rust#69835

Another option is to rely on the standard library's addr_of! macro. Given a raw pointer to the beginning of a type, the expression addr_of!((*ptr).trailing_field) will compute the address of trailing field, and the raw pointer method offset_from can be used to compute the byte offset between these two pointers, effectively computing the trailing field offset.

Unfortunately, the addr_of! macro performs a place projection, and even if the pointers are never dereferenced, place projections may only happen inside the bounds of a valid allocation. Per The Reference, the following is undefined behavior:

Performing a place projection that violates the requirements of in-bounds pointer arithmetic. A place projection is a field expression, a tuple index expression, or an array/slice index expression.

Thus, in order to use addr_of!, we need a valid allocation which is large enough that the field projection will not go out-of-bounds (the allocation does not need to be properly aligned). It is fairly trivial to produce a large allocation, even at const time:

use core::ptr::{self, NonNull};

const LARGE_ALLOCATION: NonNull<[u8]> = {
    const REF: &[u8; 1 << 16] = &[0; 1 << 16];
    let ptr: *const [u8; 1 << 16] = REF;
    let ptr: *const [u8] = ptr::slice_from_raw_parts(ptr.cast(), 1 << 16);
    unsafe { NonNull::new_unchecked(ptr.cast_mut()) }
};

This, however, isn't enough. We also need to bounds check our allocation to make sure that the field projection won't go out-of-bounds. While it's unlikely we'd ever encounter a type with a trailing field offset of 2^16 bytes, it's not impossible, and we can't exhibit undefined behavior if we encounter such a type.

In order to perform the bounds check, we need some way of obtaining an upper bound for the trailing field offset before we've actually computed it. The only way of doing this (to my knowledge) is to calculate the size of the smallest possible value of the type (ie, the value with 0 trailing slice elements). We can't construct such an instance in the general case, but we can construct a raw pointer to such an instance:

// A `*const [()]` with 0 elements.
let slc = core::ptr::slice_from_raw_parts(&() as *const _, 0);
let t = slc as *const T;

This relies on behavior which is currently not well-defined, but is in review as of this writing.

However, once we have this raw pointer, we need to know its size. The only way to do this is with the currently-unstable size_of_val_raw. Once we've computed the size of the smallest possible value for our type (ie, size_of_val_raw(t)), we have an upper bound for the offset of the trailing field; so long as this value is not larger than the size of our allocation, the field projection is guaranteed to be in-bounds, allowing us to soundly compute the trailing field offset.

Old text

Add a replacement for MaybeUninit which supports unsized types. Add support for conversions on custom dynamically sized-types (DSTs).

Status

Phase 1 is in review in #312.

Motivation

This proposal aims to solve two problems using a single, unified design.

Unsized MaybeUninit

The standard library's MaybeUninit type doesn't support wrapping unsized types. MaybeUninit is a core building block of important designs like TryFromBytes. So long as it doesn't support unsized types, designs like TryFromBytes also can't support unsized types.

Custom DSTs

While zerocopy supports conversions on slice types, it doesn't support conversions on custom dynamically-sized types (DSTs) like:

#[derive(FromZeroes, FromBytes, AsBytes)]
#[repr(C)]
struct UdpHeader { ... }

#[derive(FromZeroes, FromBytes, AsBytes)]
#[repr(C)]
struct UdpPacket {
    header: UdpHeader,
    body: [u8], // Unsized field makes this type a "custom DST"
}

Currently, users such as packet-formats instead write code like:

#[derive(FromZeroes, FromBytes, AsBytes)]
#[repr(C)]
struct UdpHeader { ... }

struct UdpPacket<B> {
    header: Ref<B, UdpHeader>,
    body: B,
}

The latter code is more cumbersome, and requires storing an extra pointer in order to refer to a UDP packet in memory.

The ability to support custom DSTs is a frequent request from our users.

Design

This design comes in two phases; the first phase can be implemented without the second phase, and will provide value on its own.

In Phase 1, support is added for a MaybeUninit-like type that supports wrapping both T: Sized and [T] where T: Sized. This will unlock the TryFromBytes design as described above. In Phase 2, support is added for a KnownLayout trait which provides a superset of the functionality from Phase 1, and supports zero-copy conversion of arbitrary custom DSTs. A custom derive is provided for KnownLayout.

Phase 1: MaybeUninit

The standard library's MaybeUninit<T> type has the same layout as T, but it has no bit validity constraints - any byte value, including an uninitialized byte, may be written at any byte offset in a MaybeUninit<T>. A replacement for this type just needs to have these semantics, and also needs to support unsized types.

Our design builds upon the fact that MaybeUninit exists and works for sized types. We define the following trait:

pub unsafe trait AsMaybeUninit {
    /// A type which has the same layout as `Self`, but which has no validity
    /// constraints.
    ///
    /// Roughly speaking, this type is equivalent to what the standard library's
    /// [`MaybeUninit<Self>`] would be if `MaybeUninit` supported unsized types.
    type MaybeUninit: ?Sized;
}

For Sized types, the implementation is trivial:

unsafe impl<T: Sized> AsMaybeUninit for T {
    type MaybeUninit = core::mem::MaybeUninit<T>;
}

For all other types, we use the standard library's MaybeUninit type as a building block to build up a type whose layout is the same as what MaybeUninit's would be if it supported unsized types:

unsafe impl<T: Sized> AsMaybeUninit for [T] {
    type MaybeUninit = [core::mem::MaybeUninit<T>];
}

unsafe impl AsMaybeUninit for str {
    type MaybeUninit = <[u8] as AsMaybeUninit>::MaybeUninit;
}

Finally, we add our own MaybeUninit type, which is simply a convenience wrapper:

#[repr(transparent)]
pub struct MaybeUninit<T: AsMaybeUninit + ?Sized> {
    inner: T::MaybeUninit,
}

// The equivalent impl for `MaybeUninit<T>` is already covered by the blanket impl for `T: Sized`.
unsafe impl<T: Sized> AsMaybeUninit for MaybeUninit<[T]> {
    type MaybeUninit = [<T as AsMaybeUninit>::MaybeUninit];
}

In Phase 1, these are the only supported unsized types. In Phase 2, we allow deriving AsMaybeUninit on arbitrary types, which adds support for custom DSTs.

Safety invariants

The safety invariants on AsMaybeUninit are somewhat complex. This is mostly a result of needing to support unsized types. For a more in-depth explanation of why supporting unsized types introduces a lot of complexity, see here.

pub unsafe trait AsMaybeUninit {
    /// A type which has the same layout as `Self`, but which has no validity
    /// constraints.
    ///
    /// Roughly speaking, this type is equivalent to what the standard library's
    /// [`MaybeUninit<Self>`] would be if `MaybeUninit` supported unsized types.
    ///
    /// # Safety
    ///
    /// For `T: AsMaybeUninit`, the following must hold:
    /// - Given `m: T::MaybeUninit`, it is sound to write any byte value,
    ///   including an uninitialized byte, at any byte offset in `m`
    /// - `T` and `T::MaybeUninit` have the same alignment requirement
    /// - It is valid to use an `as` cast to convert a `t: *const T` to a `m:
    ///   *const T::MaybeUninit` and vice-versa (and likewise for `*mut T`/`*mut
    ///   T::MaybeUninit`). Regardless of which direction the conversion was
    ///   performed, the sizes of the pointers' referents are always equal (in
    ///   terms of an API which is not yet stable, `size_of_val_raw(t) ==
    ///   size_of_val_raw(m)`).
    /// - `T::MaybeUninit` contains [`UnsafeCell`]s at exactly the same byte
    ///   ranges that `T` does.
    ///
    /// [`MaybeUninit<Self>`]: core::mem::MaybeUninit
    /// [`UnsafeCell`]: core::cell::UnsafeCell
    type MaybeUninit: ?Sized;
}

Let's walk through these:

  • The reason that we describe bit validity in terms of writing to an existing value (rather than producing a new value) is that it allows us to sidestep a lot of the complexity of defining size equivalence between unsized types. This requirement is just a generalization of what you'd write for sized types: something like, "it must be valid to produce a T::MaybeUninit containing any bit pattern, including uninitialized bytes."
  • We speak of T and T::MaybeUninit's alignment in prose rather than by referring to core::mem::align_of because that function requires that its type argument is sized.
  • The as cast requirement ensures that T and T::MaybeUninit are either both sized or have compatible unsized types. In particular, Rust prohibits sized-to-unsized casts; requiring both directions to be valid ensures that casts are sized-to-sized or unsized-to-unsized. The size equality constraint ensures that:
    • For sized types, sizes are equal
    • For custom DSTs, the trailing slice element's sizes are equal; thus, a T and a T::MaybeUninit whose trailing slices are of the same length have the same size
  • It is intended to be sound to synthesize two references to the same memory region where only one reference treats that region as containing an UnsafeCell. In other words, interior mutability soundness is a runtime property that is only violated when an UnsafeCell is used, not merely when a reference to it is constructed. Unfortunately, this is not actually currently guaranteed, and is unsound under the stacked borrows memory model. Thus, we need to ensure that UnsafeCells in T and T::MaybeUninit line up perfectly.

Pointer conversion

We want to be able to provide unsized equivalents of the assume_init_ref and assume_init_mut methods. However, the naive implementation doesn't work:

impl<T: AsMaybeUninit + ?Sized> MaybeUninit<T> {
    pub unsafe fn assume_init_ref(&self) -> &T {
        let ptr = (&self.inner) as *const T::MaybeUninit as *const T;
        unsafe { &*ptr }
    }
}

The *const T::MaybeUninit as *const T cast isn't valid in a generic context where T: ?Sized because Rust doesn't know what type of pointers these are (thin, fat, what kind of fat pointer, etc). In order to get around this problem, we add the following methods to AsMaybeUninit:

fn raw_from_maybe_uninit(maybe_uninit: *const Self::MaybeUninit) -> *const Self;
fn raw_mut_from_maybe_uninit(maybe_uninit: *mut Self::MaybeUninit) -> *mut Self;

This allows us to get assume_init_ref and assume_init_mut to work:

impl<T: AsMaybeUninit + ?Sized> MaybeUninit<T> {
    pub unsafe fn assume_init_ref(&self) -> &T {
        let ptr = T::raw_from_maybe_uninit(&self.inner);
        unsafe { &*ptr }
    }
}

Phase 2: KnownLayout

TODO

TODO: Mention that this will require removing the blanket impl of AsMaybeUninit for T: Sized, which will be a breaking change. We need to not publish in between Phases 1 and 2.

Alternatives

  • For MaybeUninit, we could propose a change to Rust to add limited support for unsized unions, which would allow MaybeUninit (which is a union type) to support unsized types. Even if such a proposal were accepted, it would likely take months or years to be stabilized and thus available for our use.

  • For custom DSTs, we could wait until the ptr_metadata and layout_for_ptr features stabilize, which would allow us to rewrite Ref like this:

    use core::ptr::Pointee;
    
    pub struct Ref<B, T: ?Sized> {
        data: *mut (),
        meta: <T as Pointee>::Metadata,
        _marker: PhantomData<(B, T)>,
    }

    For Sized types, T::Metadata would be (), and so Ref would consume only a single word (rather than two, as it does today).

    On its own, we could still support slice types (T = [U]) as we do today using from_raw_parts and from_raw_parts_mut. However, using layout_for_ptr, we could also determine a DST's prefix and element sizes and thus support arbitrary DSTs rather than just slices.

    This design would work, but it's unlikely that either of these features will stabilize soon, and we need something in the meantime. The design presented also unblocks unsized MaybeUninit, which these features wouldn't help with.

Split ByteSlice::split_at into separate trait

Migrated from https://fxbug.dev/76635

Currently, we don't implement ByteSlice for Vec<u8> because it would be expensive to implement the split_at method. However, most uses of ByteSlice don't make use of this method. We should split ByteSlice into multiple traits and only use the trait with the split_at method where it's actually necessary. Vec can then implement the base trait but not the trait with the split_at method.

Possible names for these traits: ByteSlice and SplittableByteSlice.

Test that `cargo readme` output matches `README.md`

We use the following command to auto-generate the contents of README.md from the crate-level doc comment in src/lib.rs:

cargo readme | sed 's/\[\(`[^`]*`\)]/\1/g'

The sed command removes code links like:

/// Here is a link to [`Vec<u8>`].

We should add a test that the output of that command matches the current contents of README.md, which will ensure PRs which edit that doc comment but do not update README.md can't be spuriously merged.

Support container conversions (and maybe other container types?)

Currently, we support T -> U conversions using the transmute! macro when T: AsBytes, U: FromBytes, and T and U have the same size. In principle, we ought to be able to support a range of conversions for container types wrapping T and U so long as those containers have well-defined layouts. Specifically:

  • Slices
  • Box (the docs guarantee Box's layout for Sized types)
  • ManuallyDrop (the docs guarantee that ManuallyDrop<T>'s layout is equivalent to T's)
  • Wrapping (the docs guarantee that Wrapping<T>'s layout is equivalent to T's)

...and possibly others.

This will help unblock ICU4X using zerocopy.

Here's one approach we could take: #1183

Simplify zerocopy-derive-generated code

We ought to be able to simplify the code generated by zerocopy-derive's AsBytes impl by adding the following module to zerocopy:

/// Utilities used by `zerocopy-derive`.
///
/// These are defined in `zerocopy` rather than in code generated by
/// `zerocopy-derive` so that they can be compiled once rather than recompiled
/// for every pair of type and trait (in other words, if they were defined in
/// generated code, then deriving `AsBytes` and `FromBytes` on three different
/// types would result in the code in question being emitted and compiled six
/// different times).
#[doc(hidden)]
#[allow(missing_debug_implementations)]
pub mod derive_util {
    /// Implemented for `Bool<true>`.
    pub trait True {}

    /// A boolean constant value which can be used in type bounds.
    pub struct Bool<const TERM: bool>();

    impl True for Bool<true> {}

    /// Does the struct type `$t` have padding?
    ///
    /// `$ts` is the list of the type of every field in `$t`. `$t` must be a
    /// struct type, or else `has_padding!`'s result may be meaningless.
    ///
    /// Note that `has_padding!`'s results are independent of `repr` since they
    /// only consider the size of the type and the sizes of the fields. Whatever
    /// the repr, the size of the type already takes into account any padding
    /// that the compiler has decided to add. Note that while this is *probably*
    /// also true of `repr(rust)`, the author is not confident of that fact, and
    /// it should not be relied upon for soundness.
    #[doc(hidden)] // `#[macro_export]` bypasses this module's `#[doc(hidden)]`.
    #[macro_export]
    macro_rules! has_padding {
        ($t:ty, $($ts:ty),*) => {
            core::mem::size_of::<$t>() > 0 $(+ core::mem::size_of::<$ts>())*
        };
    }
}

Then, for a type like:

#[derive(AsBytes)]
#[repr(C)]
struct Foo(u8, u16);

...we'd emit an impl like:

unsafe impl AsBytes for Foo where Bool<{!zerocopy::has_padding!(Foo, u8, u16)}>: True {
    fn only_derive_is_allowed_to_implement_this_trait()
    where
        Self: Sized,
    {}
}

Notably, this ought to make it a very simple change to support type parameters by emitting code like:

unsafe impl AsBytes for Bar<T> where Bool<{!zerocopy::has_padding!(Bar<T>, T)}>: True {
    fn only_derive_is_allowed_to_implement_this_trait()
    where
        Self: Sized,
    {}
}

Of course, that code isn't a supported use of const generics today, but whenever it is in the future, the change will be simple.

We might be able to use a similar technique to simplify impls of other traits as well.

Sync zerocopy and zerocopy-derive version numbers, have zerocopy depend on an exact version of zerocopy-derive

This has a few advantages:

  • It helps make it clear that zerocopy-derive is just an implementation detail of zerocopy, and shouldn't be depended upon directly
  • It makes reasoning about compatibility simpler by effectively making both crates part of a single codebase. So long as the code in zerocopy is compatible with the code in zerocopy-derive in the same Git commit, then publishing them both is fine. No more need to consider compatibility between a given version of zerocopy and a range of versions of zerocopy-derive.

Note that serde does this too. Serde has a ton of experience with these sorts of issues, so all else being equal, it's a reasonable idea to do what serde does.

Write generic transmute

Migrated from https://fxbug.dev/82795

The zerocopy transmute! macro can only be called in a context in which both the input and output types are concrete. As suggested in this comment (reproduced below), it should be possible to make a transmute function which can operate on generic types. This would require a substantial increase to zerocopy's API surface, and more code in zerocopy-derive, so it should only be done if a use case arises.

Copy of the comment:

One way you could get rid of this is by adding a

unsafe trait AsBytesSized<const SIZE: usize>: AsBytes + Sized {}
unsafe trait FromBytesSized<const SIZE: usize>: FromBytes + Sized {}

that your derive macros generate, and then implement your transmute function as

fn transmute<T, U, const N: usize>(x: T) -> U
where
  T: AsBytesSized<N>,
  U: FromBytesSized<N>,
{
  let eyepatch = ManuallyDrop::new(x);
  unsafe { mem::transmute_copy(&*eyepatch) }
}

If no type parameters are provided, Rust will correctly infer this, although you cannot yet write something like transmute::<_, U, _>(). This behavior isn't any better than the macro, but it makes it a Real Function at least. It is unclear to what degree this can be made to work in a generic context... probably propagating those bounds and the extra const parameter is "enough".

In the future, you could have AsBytesSized<N> provide an into_bytes() function that produces an unaligned [u8; N].

See: https://godbolt.org/z/vnbx9Y843

I took a quick stab at implementing a prototype of this (using a separate Size<const N: usize> trait), and ran into a few problems:

  • While it technically supports generics, if one of the types is concrete and the other is generic, the generic type must have a bound of the form Size<N> where N is a compile-time constant. You'd end up with a bound like T: FromBytes + Size<60>. Not the end of the world, but kind of ugly.
  • I can't figure out how to implement Size for arrays. There's no way to perform multiplication in a const generic context, so the following is illegal:
    impl<T> Size<0> for [T; 0] {}                                  // OK
    impl<T: Size<N>, const N: usize> Size<N> for [T; 1] {}         // OK
    impl<T: Size<N>, const N: usize> Size<{ N * 2 }> for [T; 2] {} // Illegal

Support generic parameters when deriving `AsBytes` on a `#[repr(transparent)]` type

Migrated from https://fxbug.dev/84475

Originally reported by @Kestrer

Hi, I'm not quite sure how to open an issue with Zerocopy directly so I hope I put this in the right place.

I'm the maintainer of the bounded-integer crate (https://docs.rs/bounded-integer) and I would like to support implementing the Zerocopy traits on the bounded integers. My code looks something like this:

type Inner = u8;

#[derive(Unaligned, AsBytes)]
#[repr(transparent)]
pub struct Bounded<const MIN: Inner, const MAX: Inner>(Inner);

However zerocopy complains because it doesn't support generic parameters.

Comment by @cmyr

I've also just run into this, and I'd be happy to prepare a patch if that would be welcome (and if I can remember how gerrit works).

I've taken a quick skim through the current impl, and stared into space for a few minutes, and it isn't obvious to me why this constraint exists, currently. I trust there is a reason (given that the limitation does not exist for FromBytes) but it isn't obvious to me what it is, so any useful information here would be welcome. :)

Comment by @joshlf

Hi folks, sorry for the delay in responding to this.

Generic parameters are supported for Unaligned - try deriving that trait on its own without AsBytes.

The reason that AsBytes doesn't support generic parameters is that I haven't figured out a way to ensure that the resulting type has no padding, which is a requirement for AsBytes. In particular, consider the following type:

#[repr(C)]
struct Foo<T, U> {
    t: T,
    u: U,
}

In order for Foo<T, U> to be AsBytes, there can't be any padding. That, however, depends on both the sizes and alignments of T and U. I haven't figured out how to write an impl bound that expresses the right requirements.

Take a look at the implementation of the AsBytes derive for how we do this when there are no generic parameters - we use const code to compare the size of the type to the sum of the sizes of its fields. That approach doesn't translate since const generics aren't powerful enough yet to just use the same code in an impl bound. We'd need another approach.

Comment by @joshlf

Update: Some of the other developers realized that we can probably support your specific use case more easily since your type is #[repr(transparent)], and thus we don't need to worry about any layout computation - we just need to bound Inner: AsBytes. I'm changing this issue to track that more narrow goal.

Add `zerocopy::byteorder::NE` typealias for `zerocopy::byteorder::NativeEndian`

Title says it all.

U32<NativeEndian> is a real mouthful, and working around it by including type NE = NativeEndian; in each file it gets used in would be really unfortunate...

If I had to guess, the reason this doesn't exist yet was because it might lead to confusion with NetworkEndian? That's just a guess of course, but in case that's right: I don't think it's too confusing, given that the Rust standard library already has the convention of implementing methods in triples of be/le/ne (e.g: uXX::{to,from}_{be,le,ne}_bytes())

Make derive macros hygienic

Status

  • core items are referenced as ::core::xxx rather than core::xxx
  • zerocopy-derive supports a #[zerocopy(crate = "...")] annotation

Crate name annotation

Sometimes, our derives will be used in a context in which zerocopy isn't called zerocopy. For example, this might happen if zerocopy itself is re-exported from another crate, or if a crate wishes to support deriving zerocopy traits from multiple zerocopy versions (see e.g. #557).

We can take inspiration from Serde, which defines the crate attribute option:

#[serde(crate = "...")]

In other words, we can do:

#[zerocopy(crate = "...")]

However, this isn't enough. For users who wish to invoke derives from multiple versions of zerocopy-derive at once, we need a way of disambiguating which attributes are meant to be consumed by which version of zerocopy-derive. We could provide a disambiguation option like so:

#[zerocopy(crate = "...", derive-version = "...")]

This would let us write code like the following (from #557):

#[cfg_attr(feature = "zerocopy_0_7", derive(zerocopy_0_7::FromBytes))]
#[cfg_attr(feature = "zerocopy_0_8", derive(zerocopy_0_8::FromBytes))]
#[zerocopy(crate = "zerocopy_0_7", derive-version = "0.7")]
#[zerocopy(crate = "zerocopy_0_8", derive-version = "0.8")]
struct Foo {
    ...
}

In this example, each zerocopy-derive would use derive-version to filter out attributes not meant for that version.

Core re-export

We can't rely on core being in scope (or referring to the "real" core crate). However, we can rely on zerocopy being in scope (possibly renamed, as described above). If we re-export core from zerocopy, then we can emit code that doesn't refer to ::core, but instead refers to ::zerocopy::core_reexport.

Automatically implement `Unaligned` without custom derive

Once the associated_const_equality feature is stabilized, we ought to be able to automatically implement Unaligned without needing a custom derive:

#![feature(associated_const_equality)]

unsafe trait Align {
    const ALIGN: usize;
}

unsafe impl<T> Align for T {
    const ALIGN: usize = core::mem::align_of::<T>();
}

unsafe trait Unaligned {}

unsafe impl<T: Align<ALIGN = 1>> Unaligned for T {}

Note that we may not want to do this - it would mean that types would no longer need to opt in to implementing Unaligned as they do today. This probably isn't a huge deal, since you can't do anything with Unaligned on its own (FromBytes and AsBytes are the traits that really unlock the ability to muck with a type's internal state), but at a minimum it would make the API inconsistent. One option would be to make Align<ALIGN = 1> a supertrait bound, so that Unaligned could become a safe trait that simply represents the fact of opting in:

trait Unaligned: Align<ALIGN = 1> {}

Test more conditions in GitHub actions

Use GitHub actions to test the following axes:

  • Currently stable, current beta, current nightly, MSRV stable (1.51.0)
  • All of the different targets that affect us (currently, this is only relevant for the simd and simd-nightly features, which emit impls for various architecture-specific types)
  • All of the Cargo features we support
  • cargo check, cargo test, and cargo miri test

Note that, depending on how we implement this, some combinations of the above may not be possible to run. E.g., if we run on an x86_64 machine, it may be possible to run cargo miri test while targeting powerpc (I'm not sure about this), but it definitely won't be possible to run cargo test while targeting powerpc.

As a possible stretch goal, also include a test to make sure that README.md is kept up to date (#18).
