
unsafe-code-guidelines's Introduction

UCG - Rust's Unsafe Code Guidelines

The purpose of this repository is to collect and discuss all sorts of questions that come up when writing unsafe code. It is primarily used by the opsem team to track open questions around the operational semantics, but we also track some "non-opsem" questions that fall into the purview of T-lang or T-types, if they are highly relevant to unsafe code authors.

The Unsafe Code Guidelines Reference "book" is a past effort to systematize a consensus on some of these questions. It is not actively maintained any more, but can still be a good source of information and references. Note however that unless stated otherwise, the information in the guide is mostly a "recommendation" and still subject to change.

Current consensus is documented in t-opsem FCPs and the Rust Language Reference.

See also

The Rustonomicon is a draft document discussing unsafe code. It is intended to be brought into agreement with the content here. It represents an organized effort to explain how to write unsafe Rust code, rather than serving as a reference.

Code of Conduct and licensing

All interactions on this repository (whether on issues, PRs, or elsewhere) are governed by the Rust Code of Conduct.

Further, all content on this repository is subject to the standard Rust licensing.


unsafe-code-guidelines's Issues

architectures that don't support byte-level atomicity

In general, I think there is a reasonable concern that if you have architectures where atomicity below a certain level (say, word, or cache line) is not guaranteed, then Rust's type system will encounter problems. This has little to do with vectors: just imagine a struct Foo { a: u8, b: u8 }. If you borrow a and b independently, you can send them to distinct threads and mutate them in safe code. On x86 and other "common" architectures (afaik), that is not a problem. If we want to target one where writing to adjacent bytes might generate undefined results, then we would need some solution.

Some thoughts I have had:

  • we could (on such architectures) use atomic instructions when writing values of type u8 (at least if we suspect there may be multiple threads involved),
  • or else use modified unsafe rules on those architectures that make the borrows themselves unsafe (portability hazard, but plausible).
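
For concreteness, here is a minimal sketch (mine, in today's stable Rust) of the pattern described above: two disjoint &mut u8 borrows into the same struct handed to different threads, which the type system accepts today.

struct Foo { a: u8, b: u8 }

fn main() {
    let mut foo = Foo { a: 0, b: 0 };
    // Split the struct into two disjoint `&mut u8` borrows.
    let Foo { a, b } = &mut foo;
    std::thread::scope(|s| {
        s.spawn(move || *a += 1); // this thread writes only `foo.a`
        s.spawn(move || *b += 1); // this thread writes only `foo.b`
    });
    assert_eq!((foo.a, foo.b), (1, 1));
}

On x86 this is fine; on an architecture without byte-granular stores, the two writes could clobber each other.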

The C standard formalized in Coq

Link: http://robbertkrebbers.nl/thesis.html

This PhD thesis attempts to formalize the sequential (single-threaded) part of the C11 standard. The core contribution is a memory model that can explain the behavior of mixing "low-level" (byte-wise) and "high-level" C -- which kinds of pointer arithmetic and pointer casts are allowed, what exactly happens with pointers to fields of unions as the union is overwritten, and so on. Type-based alias analysis is formalized and proven correct. A few questions remain unanswered, mostly related to pointer-to-integer casts (also see #30).

Some examples discussed in the thesis include:

union int_or_short { int x; short y; } u = { .y = 3 };
int *p = &u.x;   // p points to the x variant of u
short *q = &u.y; // q points to the y variant of u

short z = *q; // u has variant y and is accessed through y -> OK
*p = 10;      // u has variant y and is accessed through x -> bad

union int_or_short { int x; short y; } u = { .x = 3 };
printf("%d\n", u.y); // this is legal, because of type punning

union int_or_short { int x; short y; } u = { .x = 3 };
short *p = &u.y;
printf("%d\n", *p); // this is bad, type punning does not apply

Representation of pointers

Crucially, pointers in this model are not numbers. They are paths in a tree that define how to traverse memory, from the root of a block, through the fields of structs and unions and the indices in an array to the destination of the pointer.
See, for example, the image on page 72: The pointer to s.u.x[1] is the path that starts at the root, picks the u field of the struct, then the x field of the union, then the index 1 in the array. This way, the actual layout of structs and unions can remain uncertain, and pointer arithmetic that goes beyond array bounds becomes UB.
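
To make this concrete, here is a rough sketch (mine, not taken from the thesis) of what such a tree-shaped pointer could look like:

// A pointer is a block identifier plus a path of selectors, not an integer.
enum Selector {
    Field(usize), // pick the n-th field of a struct or union
    Index(usize), // pick the n-th element of an array
}

struct Pointer {
    block: usize,        // which allocation ("root") we start from
    path: Vec<Selector>, // e.g. s.u.x[1] becomes [Field(u's index), Field(x's index), Index(1)]
}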

Relevance for Rust

Hopefully, we won't have type-based alias analysis or type punning in Rust. Still, some parts of the model may remain relevant: We have to decide how much pointer arithmetic we allow. The exact layout of structs and enums is left to the compiler (and we want to have optimizations like representing Option<Box> without a discriminant), so we probably want pointer arithmetic that makes assumptions about this layout to be UB. The tree-based addressing could be used to appropriately model this underspecification.

(Sorry my summary is not as good as the ones Niko wrote... )

Layout of single-field structs

PR #31 identified this as an area for continued discussion. Specifically, if you have a struct with a single field (struct Foo { x: T }), should we guarantee that the memory layout of Foo is identical to the memory layout of T? (Note that ABI details around function calls may still draw a distinction, which is why #[repr(transparent)] is needed.) What about zero-sized types like PhantomData?

@rkruppe wrote:

Long ago I proposed that we might want to guarantee (some subset of) newtype unpacking for repr(Rust) structs. @nikomatsakis carried this over into #11 as discussion point but it received no further discussion. I like to think that means it's uncontroversial 😄 I've also never heard of any reason why one might not want that to be true.

To make a specific proposal: let's guarantee that structs [1] containing a single field have the same memory layout as the type of that sole field. So struct Foo<T>(T); and struct Foo<T> { x: T } would be laid out like T in memory, though possibly still passed differently in function calls.

[1] The same guarantee for (T,) is already covered by the special case of homogeneous tuples being laid out like arrays, which is already part of this PR.
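
As a concrete illustration (mine) of the guarantee under discussion, these assertions hold with current rustc; the point of this issue is whether they should be guaranteed for repr(Rust):

use std::mem::{align_of, size_of};

struct Newtype<T>(T);

fn main() {
    // Holds in practice today; not (yet) a documented guarantee.
    assert_eq!(size_of::<Newtype<u64>>(), size_of::<u64>());
    assert_eq!(align_of::<Newtype<u64>>(), align_of::<u64>());
}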

Representation of bool, integers and floating points

This issue is to discuss the memory layout for integral and floating point types:

  • bool
  • u8..u128, i8..i128
  • usize, isize
  • f32, f64

For the most part, these are relatively uncontroversial. However, there are some interesting things worth discussing:

  • Unlike other types, there are no #[repr(C)] vs #[repr(Rust)] variants here. The size is always fixed and well-defined across FFI boundaries. The types map to their corresponding integral types in the surrounding C ABI.
  • Prior discussions (#46156, #46176) documented bool as a single byte that is either 0 or 1.
  • How is usize intended to be defined on various platforms?
  • Rust currently states that the maximum size of any single value must fit within isize
    - Can we say a bit more about why? (e.g., ensuring that "pointer diff" is representable)
  • Do we want to discuss signaling NaN at all? Specifically: why is it potentially of concern, and are there things that unsafe authors or other folks need to be aware of? (@gankro, for example, wrote that "NaN masking is unnecessary from LLVM's perspective", but I don't really know what that means. =)
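
To make some of the points above concrete, here is a small check (mine); these assertions reflect current rustc behavior, not all of which is formally guaranteed:

use std::mem::size_of;

fn main() {
    assert_eq!(size_of::<bool>(), 1);                       // single byte
    assert_eq!(size_of::<u32>(), 4);                        // fixed-size integers
    assert_eq!(size_of::<usize>(), size_of::<*const u8>()); // pointer-sized
    assert_eq!(size_of::<usize>(), size_of::<isize>());
}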

Validity of aggregate types (structs, enums, tuples, arrays, ...)

Discussing what the validity invariants of aggregate types are (and assembling a full list of aggregate types).

Safe compound types include enums, structs, tuples, arrays, slices, closures, generators, SIMD vectors.

The obvious invariants are:

  • If applicable: The discriminant matches a possible variant (for enums). This applies to repr(C) enums as well! See rust-lang/rust-memory-model#41 for some discussion of that specific case.
  • All fields (of the active variant, for enums) are valid at their respective type.
  • All bytes not covered by any field ("padding") may have arbitrary content (including uninitialized).

Is there any exception? Currently at least, generators are an exception: Their fields may be uninitialized, leading to special cases in both layout computation code and Miri.

(I put these all together because my expectation is that there's not much to say here. We can split this up into several topics if that seems necessary.)
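
To make the "padding" point concrete, here is a small illustration (mine); the layout comment assumes a typical target where u32 has alignment 4:

use std::mem::size_of;

#[repr(C)]
struct S {
    x: u8,
    // On such targets, 3 bytes of padding sit here; they are not covered by
    // any field, so their contents are not constrained by validity.
    y: u32,
}

fn main() {
    assert_eq!(size_of::<S>(), 8); // 1 data byte + 3 padding bytes + 4 data bytes
}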

Validity of char

Discussing the validity invariant of the char type.

The "obvious" choice is that it must be a valid unicode codepoint, and must not contain any uninitialized bits.

However, a possible issue with this choice is that this means we will have to extend the set of valid bit patterns whenever new codepoints get added to unicode. Is that a problem, e.g. when old and new code interact? On first glance it seems like this will only make fewer programs have UB. (@nikomatsakis I think this is related to your "future proofing" concern that you raised elsewhere. Here might be a good place to discuss it with a concrete example.)
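
For reference, the safe constructor already reflects this choice (illustration mine):

fn main() {
    assert!(char::from_u32(0x61).is_some());      // 'a'
    assert!(char::from_u32(0xD800).is_none());    // surrogate: never a char
    assert!(char::from_u32(0x11_0000).is_none()); // beyond char::MAX
}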

Is the address of a local variable "allocated on the stack"?

Code sometimes has a need to get an address from the stack — often for debugging, but sometimes for other reasons. Consider this snippet of code from rayon-rs/rayon#571:

let base = 0;
let addr = &base as *const _ as u64;

The goal here is just to get "some stack pointer" to use as a (quite weak) seed for a PRNG. This feels... sketchy to me, so I thought it was worth noting.

Collect discussion points: requirements on the C platform

Almost all layout-related discussions add requirements on the platform in one way or another (e.g. due to repr(C)), and I think it is worth it to have these all in one place.

For example, while discussing the definition of the layout of bool, one definition is that "bool is C compatible" and another one that "bool has size 1 where 0 represents false and 1 represents true". If we require the C platform to have sizeof(_Bool) == 1, both definitions are correct; and if we don't explicitly require that somewhere, we would be implying it later on anyway if we specify that bool is a "proper" C type.

Because not everyone is on Zulip, I'd like to start collecting discussion points for a minimal specification of what the C platform is and what requirements Rust imposes on it in the context of type layout (no calling conventions, etc.). I can send a PR afterwards with more points.

For background, @gankro wrote an excellent document about this: https://gankro.github.io/blah/rust-layouts-and-abis/

Goal of the discussion

The goal is to write down all the current agreed (in other UCG discussions) requirements that Rust has on the C platform and put them in one place to avoid repeating these in all other discussions. This list can evolve over time, and trade-offs on layouts can influence the platform requirements.

Some discussion points

  • Is a C platform required or optional?

  • If a C platform is optional, can the layout of #[repr(C)] types be platform-dependent? If so, what's their layout when there is no C platform?

  • Which optional parts of a C platform are relevant for Rust? e.g. floats, maybe atomics?

  • Does a C platform have to be compliant with the C standard? That would allow us to specify only the requirements that go beyond what, e.g., C17 guarantees.

  • @gankro's post mentions the following requirements that go beyond what C17 guarantees:

    • Have 8-bit, unaligned bytes (chars)
    • Have a boolean be a byte, where true = 1 and false = 0
    • Have integers be two's complement
    • Have IEEE 754(-2008?) binary floats, if they exist (e.g. we're comfortable with just disabling floats)
    • Be at least 16-bit (just in terms of pointer size, I think?)
    • Have NULL be 0 (although things may be mapped to 0, but that's messy since references can't be NULL)

    Which points of @gankro's list do we need to settle during the layout discussions? They all look reasonable to me. This could be an initial specification.

  • Maybe off-topic: we might have to define / explain what a C platform is. Does C++ require a C platform? Maybe we could look up how C++ does this.

Canvass unsafe code in the wild

I think we should organize a kind of "canvass" to find examples of how unsafe code is used in the wild. To start, it'd be great to enumerate a list of interesting places to look.

Here is my start at a list. Further nominations welcome. I'll try to keep the list up to date. Moreover, if you feel you've examined the source, open any related issues and we can check it off.

Other thoughts for packages? Is this a fruitful thing to examine?

What about: volatile accesses and memory-mapped IO

Folks who want to write drivers and embedded code using Rust need to have a way to guarantee exactly-once access to certain memory locations. Today, the embedded wg makes extensive use of @japaric's VolatileCell crate, along with RegisterBlock structures containing VolatileCell wrappers around each field of the register block, and a function to provide a single access to the register block at a fixed address. The API exposed by the stm32f103xx crate and similar crates only exposes *const RegisterBlock values (example) from the overall Peripherals object. This then requires unsafe code to access and mutate any particular field.

Asks:

  • Is this pattern sufficient to guarantee that the number of writes to IO-mapped memory will exactly match the number of calls to unsafe { (*x.volatile_cell_field).set(...) }, and that the number of reads will exactly match the number of calls to unsafe { (*x.volatile_cell_field).get(...) }? it seems like it should be.
  • Is it possible to provide the same guarantee while exposing the register block via a safe reference type such as &? It would be possible to provide a custom RegisterRef<'a, T> that consisted of a raw pointer internally as well as a custom derive for projecting this to fields of the register block, but this seems unfortunately complicated and unergonomic.

Complicating factors:

  • LLVM's precise definition of "volatile" is a bit shaky. It says that optimizers must not change the number of volatile operations or change their order of execution relative to other volatile operations. However, it doesn't seem to specify that non-volatile operations can't be inserted -- this is something we need to prevent, but which LLVM might insert in an attempt to pre-load a value (as allowed by the "dereferenceable" attribute that we apply to references). Can we make sure that LLVM doesn't do such a thing? If we fail in that, could we potentially make the compiler understand that VolatileCell is special, similar to UnsafeCell, and cannot have "dereferenceable" applied to references to it (and objects that contain it), in order to prevent this misoptimization? This seems potentially more complicated and intrusive, but IMO still worth considering.
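
For reference, here is a minimal sketch (mine, not the actual VolatileCell crate) of the pattern being discussed: every get/set is lowered to a volatile access of the wrapped location.

use core::cell::UnsafeCell;
use core::ptr;

#[repr(transparent)]
pub struct VolatileCell<T> {
    value: UnsafeCell<T>,
}

impl<T: Copy> VolatileCell<T> {
    pub fn get(&self) -> T {
        // Soundness relies on `self` pointing at valid (e.g. memory-mapped) storage.
        unsafe { ptr::read_volatile(self.value.get()) }
    }

    pub fn set(&self, value: T) {
        unsafe { ptr::write_volatile(self.value.get(), value) }
    }
}

The question above is whether references to a struct containing such a cell can carry "dereferenceable" without allowing extra (non-volatile) loads to be introduced.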

cc @RalfJung @kulakowski @teisenbe @rkruppe

Representation of enums

Discussion topic about how enums are represented in Rust. Some things to work out:

  • What are the #[repr] options available for enums?
    • RFC 2195 defined the layout of #[repr(C)] enums with payloads.
    • RFC 2363 offers a proposal to permit specifying discriminants
  • When do we currently perform Option<T>-like layout optimizations?
  • When do we guarantee Option<T>-like layout optimizations?
    • For any Option-like enum? What about things like Result<T, ()>?
    • What are the conditions on T? (Obviously, must have some undefined values)
    • the Rustonomicon has some text on this topic
  • Are there important enum variant optimizations we want freedom to do in the future that we might want to keep in mind?
  • Size of empty enums and !: defined to be 0
  • C-like enums: define, what does it say about representation?
    • document the effect of #[repr(C)] and friends here
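
As a concrete illustration (mine) of the Option-like layout optimization mentioned above, as observed with current rustc:

use std::mem::size_of;

fn main() {
    // `&u8` has no valid all-zeros bit pattern, so `None` can reuse it:
    assert_eq!(size_of::<Option<&u8>>(), size_of::<&u8>());
    // `u8` has no spare bit pattern, so a separate discriminant is needed:
    assert_eq!(size_of::<Option<u8>>(), 2);
}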

stable addresses for local variables, etc

When do we guarantee "stable" addresses? (Meaning that the integral value of a pointer remains the same). Note that addresses are visible to safe code via as conversions.

Some examples:

Local variables

let x = 22;
{ let y = &x; .. }
...
{ let z = &x; .. }

Are the integral values of y and z guaranteed to be equal? It might be useful if they were not, since a compiler could keep x in a register and spill it to memory in different spots on the stack.


Assuming the answer to this question is "yes", are locals still guaranteed to have a stable address when they are reallocated using StorageDead / StorageLive? For example:

let mut x = 22;
let y = &x as *const _ as usize;
x = 33;
let z = &x as *const _ as usize;
assert_eq!(y, z); // is this true?

Or:

let mut prev = None;
loop {
  let x = 22;
  let y = &x as *const _ as usize;
  if let Some(z) = prev {
    assert_eq!(y, z); // is this true?
  }
  prev = Some(y);
}

Edit by @digama0: moved question about const address stability to #406 , clarified question on killed locals

Representation of structs

Discussion topic about how structs are represented in Rust. Some things to work out:

  • Do we ever say anything about how a #[repr(Rust)] struct is laid out
    (and/or treated by the ABI)?
    - e.g., what about different structs with same definition
    - across executions of the same program?
  • For example, rkruppe writes that we might "want to guarantee (some subset of) newtype unpacking and relegate #[repr(transparent)] to being the way to guarantee to other crates that a type with private fields is and will remain a newtype?"
  • When is interop with #[repr(C)] guaranteed and what can we say there?

Function Pointers

And how they work on an ABI level, and what you can do with them!

(This is mostly a reminder to myself to actually start working on this later)

Use layout consistently in the UCG documents

See #58 (comment) .

In the last meeting we discussed that we should use the term "layout" consistently and that we should not use the term "representation" to refer to "layout". This reserves the term "representation" for other purposes, like relating how mathematical objects are mapped to bit strings (cc @ubsan ).

The name of the repr attribute is unfortunate, but we can just document it as an attribute that controls "layout" (omitting the term "representation").

A Formal C Memory Model Supporting Integer-Pointer Casts

Links: PDF, ACM

This is a memory model targeting C. The C specification itself has very loose rules that don't support a lot of common things people do in C programs, primarily around using pointers as integers (e.g., bitmasking, etc).

The key ideas of the model are as follows:

  • When you allocate memory, it is initially assigned a logical identifier and (conceptually) has no physical address (yet).
  • Once a pointer is cast to an integer, the memory block it points at is assigned a physical address.
  • Once a memory block has a physical address, then it is much less optimizable. For example, if you cast from an integer i to a pointer p, you have to assume that accesses through p could affect any block with a physical address, but you know they can't affect anything that doesn't have a physical address.

Some examples from the paper:

Keeping local variables private (supported)

Consider this question:

int f() {
  int a = 0;
  g(); // Can `g()` observe/affect value of `a`?
  return a;
}

In this model, the answer is no: the allocation of a is private, and at no point did the code do something like (int) &a to convert it into a concrete block.

Keeping local allocations private (supported)

Consider this question:

p = malloc();
*p = 123;
bar(); // Can this affect `*p`?
a = *p;
hash_put(h, p, a);

In this model, the answer is no: the allocation of p is private, and at no point did the code do something like (int) p to convert it into a concrete block.

The paper calls this "ownership transfer". I've avoided this term because it's so different from what Rust means by this term. It's more like "retaining" or "respecting" ownership (of p).

The paper says that clang -O2 does this kind of optimization, but not gcc -O2, and claims therefore that it may not be that important.

Dead Cast Elimination (unsupported in the model)

Consider this question (not from paper):

void foo(int *p) {
    int x = (int) p; // Can we drop this unused value?
    printf("%d\n", *p);
}

In this model, we cannot eliminate (int) p even if the result is not used because it has a side-effect of (potentially) assigning p a concrete value.

The paper addresses this by doing this optimization later in the pipeline. Basically at the point where we drop this model and convert to something more concrete.

Example limitation

Here is an example of an optimization prohibited by this approach.

p = malloc(1);
*p = 123;
b = (int) p;
bar();
a = *p;
hash_put(h, b, a); // Can we optimize this to `hash_put(h, b, 123)`?

The answer is no: p was made "concrete", and hence may be affected by bar(). Interestingly, a slight reordering would enable the optimization:

p = malloc(1);
*p = 123;
bar(); // `p` is not concrete **yet**...
b = (int) p;
a = *p;
hash_put(h, b, a); // ...hence no opportunity between cast and here to modify `*p`

Should Rust permit conversions between pointers to data and pointers to functions?

From rust-lang/rfcs#1861 (comment)

@briansmith

Maybe we're talking past each other, but I would expect an extern type T to be guaranteed to not be a function pointer, just like struct T; is guaranteed to not point to a function in C.

Ah, I was not aware of that! (But also slight terminology tweak: It'd be *const T, not T, that would be a function pointer.) For anyone else who may have been in the same position as I was, the C99 standard says:

6.3.2.3:8 A pointer to a function of one type may be converted to a pointer to a function of another type and back again; the result shall compare equal to the original pointer. If a converted pointer is used to call a function whose type is not compatible with the pointed-to type, the behavior is undefined.

(As this gives no conversions between data pointer types and function pointer types, such conversions are undefined, and thus a pointer to one can't possibly be valid as a pointer to the other.)

The rationale I saw given when I looked was that data and function pointers may not be the same size.

However, Rust is not C, and this should probably fall under the aegis of the Rust unsafe semantics - permitting it may be of value. I've filed an issue raising that question.

Note that this requires having function pointer types, which we currently do not have, and was something I thought RFC 1861 might make possible - the closest is a kind of "function reference of static lifetime" that is syntactically not a reference at all :(
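
For concreteness, here is a sketch (mine) of the kind of conversion in question; it compiles and works on mainstream platforms today, but whether the round trip is actually guaranteed is exactly what this issue asks:

fn the_fn() -> i32 { 42 }

fn main() {
    let f: fn() -> i32 = the_fn;
    // function pointer -> data pointer is allowed as a cast ...
    let data = f as *const ();
    // ... going back currently requires a transmute.
    let back: fn() -> i32 = unsafe { std::mem::transmute::<*const (), fn() -> i32>(data) };
    assert_eq!(back(), 42);
}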

semantics of black_box and clobber in the memory model

I've just closed rust-lang/rfcs#2360 due to @rkruppe's input that by trying to specify what black_box and clobber do there the RFC is basically specifying a subset of the memory model.

So I'd like to ask feedback for these here first.

The specification in the RFC is very loose. mem::black_box(x) is specified as follows:

  • writes the address of x to memory
  • reads all memory
  • writes all memory

while mem::clobber():

  • reads all memory
  • writes all memory

where I have no idea how to specify "memory", but @rkruppe came up with a minimal example that helps:

{
    let tmp = x;
    clobber();
}

Here, the compiler is allowed to optimize the store of x to tmp away, because the only code that has the address of tmp is in that scope, and that code does not read from or write to tmp. In the specification of clobber, when it states "read/write from/to all memory", "memory" does not refer to tmp. However, if I change the example to:

{
    let tmp = x;
    black_box(&tmp);
    clobber();
}

in this case clobber() requires the store of x to tmp to be live, because black_box has written the address of tmp to "memory", and thus the "read/write to/from memory" in clobber can do something with tmp.

So a big question I have is what is "memory" here, and how does tmp in the first example differ from the rest?

I don't know how these could make sense in the Rust memory model, what the wording for their specification should be, and I am barely qualified to follow the discussion at all. The RFC contains more information and lots of godbolt links, but the proposed implementation of black_box and clobber is this:

#[inline(always)]
fn clobber() {
    unsafe { asm!("" : : : "memory" : "volatile") };
}

#[inline(always)]
fn black_box<T>(x: T) -> T {
    unsafe {
        asm!("" // template
          : // output operands
          : "r"(&x) // input operands
          // r: any general purpose register
          : "memory"    // clobbers all memory
          : "volatile"  // has side-effects
        );
        x
    }
}

Also, @rkruppe asked:

I'd like to know the difference between this and compiler_fence(SeqCst)

To which @nagisa answered:

The difference between asm! with a memory clobber and compiler_fence exists in the fact, that memory clobber requires compiler to actually reload the memory if they want to use it again (as memory is… clobbered – considered changed), whereas compiler_fence only enforces that memory accesses are not reordered and the compiler still may use the usual rules to figure that it needn’t to reload stuff.


cc @rkruppe @nagisa

bool == _Bool ?

The T-compiler and T-lang teams signed off here that

bool has the same representation as _Bool

where on every platform that Rust currently supports this implies that:

  • bool has a size and an alignment of 1,
  • true as i32 == 1 and false as i32 == 0. EDIT: that is always true; the valid bit patterns of bool are what matter here, e.g., on a platform where bool has the same size as i8, whether transmute::<_, i8>(true) == 1 and transmute::<_, i8>(false) == 0

These two properties are not guaranteed by Rust, and unsafe code cannot rely on these. In the last UCG WG meeting it was unclear whether we want to guarantee these two properties or not. As @rkruppe pointed out, this would be guaranteeing something different and incompatible with what T-lang and T-compiler guaranteed.

Note: any change that the UCG WG proposes will have to go through the RFC process anyway, where it might be rejected. This issue is being raised with stakeholders to evaluate whether there is something that needs changing or not, and if so, whether the change is possible, has chances to achieve consensus, etc.

The following arguments have been raised (hope I did not miss any):

  • T-lang and T-compiler did not specify which version of the C standard _Bool conforms to. In C++20 and C2x, P0907r4 (C++) and N2218 (C) specify that:

    • bool and _Bool contain no padding bits (only value bits),
    • 1 == (int)true and 0 == (int)false.

    In some of the merged PRs of the UCG we have already specified that the platform's C implementation needs to comply with properties of some C standard, e.g. C17 or the "latest" one (e.g. for repr(C) struct layout). If we end up requiring C2x for the representation / validity of repr(C), we end up guaranteeing these properties. AFAICT the only property about bool that would then remain implementation-defined is its size and alignment.

  • In #9 / #49, we ended up requiring that C's CHAR_BIT == 8. This implies that if CHAR_BIT != 8 then bool cannot have the same representation as _Bool. Some stakeholders still wanted to be able to do C FFI with these platforms, e.g. @briansmith suggested that Rust should diagnose, e.g., using bool on FFI on these platforms (#9 (comment)), but that interfacing with those platforms via C FFI (e.g. against assembly) should still be possible (e.g. on a DSP where CHAR_BIT == 16, passing a u16 to C / assembly / ... expecting a char or a 16-bit integer should be doable).

  • What exactly T-lang and T-compiler actually ended up guaranteeing isn't 100% clear. In rust-lang/rust#46176 the decision seems to be that bool == _Bool, but the PR that was actually merged only mentions that bool has a size of 1: rust-lang/rust#46156. This might be an oversight in the docs, and some have mentioned that the reference is "non-normative". @briansmith pointed out (here and here) that bool's ABI (e.g. integer class or something else?), alignment, bit patterns denoting true and false, etc. don't appear to be properly documented. @gankro summarized the status quo in the Rust Layout and ABIs document and mentioned that projects like Firefox rely on these extra guarantees for correctness (e.g. for bindgen, etc. to work properly, see here).

There are a couple of comments by @withoutboats that I think show both T-lang and T-compiler's rationale and the spirit behind their decision, here:

  • I worry that if we don't specify bool as equivalent to C and C++ booleans, people will need to use c_bool in FFI to be cross platform compatible.

  • I worry that if we don't specify bool as byte sized, people will create a struct Bool(u8) to get that guarantee & keep their structs small.

and here:

People could come to the conclusion that they need a c_bool type for their FFI to be forward compatible with platforms we don't yet support. I think defining it as the same representation as _Bool / C++ bool makes it the least likely someone does something painful to avoid entirely hypothetical problems.

So even if the docs say that bool has a size of 1, and that's it, I believe that this last comment shows that the spirit of the T-lang and T-compiler decision was to spare people from creating a c_bool type to be forward compatible on C FFI with platforms that we might never properly support.
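
For illustration, this is roughly the c_bool workaround (sketch mine) that the quoted comments want to spare people from having to write:

#[allow(non_camel_case_types)]
#[repr(transparent)]
#[derive(Debug, Copy, Clone, PartialEq, Eq)]
pub struct c_bool(u8);

pub const C_FALSE: c_bool = c_bool(0);
pub const C_TRUE: c_bool = c_bool(1);

fn main() {
    assert_ne!(C_TRUE, C_FALSE);
}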


I think that the open questions that have to be clarified are:

  • Should we require C2x compatibility for _Bool, or do we want to stay backwards compatible with C99/11/17? (In this case, people can only rely on, e.g., true == 1 on targets where the platform implementation is C2x "conforming", at least w.r.t. _Bool.)
  • Do we want to require that bool has a size and an alignment of 1? (In the hypothetical case that we ever support a platform where this is not the case, we could raise an improper_ctypes warning / error on the platform, or some other form of diagnostic, as @briansmith suggested.) This would be a change incompatible with bool == _Bool, might lead people to create and use a c_bool type, etc.

cc @rkruppe @withoutboats @briansmith @gankro @joshtriplett @cuviper @whitequark @est31 @SimonSapin

Representation of unions

Discussing how unions are laid out.

  • Is #[repr(C)] meaningful when applied to a union?
  • When (if ever) do we guarantee that all fields start at offset 0?
  • When (if ever) do we guarantee that all fields have the same address?
  • Any key things to note re: FFI interop?
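
As a point of reference (illustration mine), for a repr(C) union current rustc gives the size of the largest field and the largest field alignment, with every field at offset 0:

use std::mem::{align_of, size_of};

#[repr(C)]
union U {
    a: u8,
    b: u32,
}

fn main() {
    assert_eq!(size_of::<U>(), size_of::<u32>());
    assert_eq!(align_of::<U>(), align_of::<u32>());
}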

A Promising Semantics for Relaxed-Memory Concurrency

Link: http://sf.snu.ac.kr/promise-concurrency/

This is a memory model targeting C++. It has several unique characteristics:

  • it is based on an operational semantics rather than an axiomatic presentation
    • this means that it is defined in terms of simulating a kind of abstract computer, instead of as a series of more abstract predicates
    • this means I can summarize using Rust pseudocode below; huzzah
  • it validates many common compiler optimizations
  • it does not resort to undefined behavior in the case of a data race (huzzah!), but instead defines a semantics for what values can result

I will endeavor to give a brief summary to the best of my understanding. However, it is likely I am making mistakes, so I would like feedback from the authors (cc @jeehoonkang). And of course to truly understand I recommend reading the paper, which is quite good.

Basic idea of the machine

To start, let's imagine a real simple computer. You might imagine modeling a computer's memory as a big dictionary mapping addresses to values:

memory: Map<Address, Value>

But a definition like this doesn't really work for parallel programs. One obvious reason is the hardware: remember that each of the CPUs on your computer has a distinct cache. Unless these CPUs synchronize with one another, it is possible for these caches to get out of sync. In that case, each CPU might have a different idea of what value is stored at a particular address. Naturally, we can synchronize the caches (that's what atomic operations do) but it's expensive, so we try to avoid it when possible (e.g., if some memory is owned (or believed to be owned) by a particular CPU). The problem is not limited to hardware; compiler optimizations can produce similar effects.

So instead we adopt a different, more flexible model for memory. Imagine instead memory as a big list of all the writes that have ever occurred:

memory: Vec<Write>

struct Write {
    address: Address,
    timestamp: Real, // a non-negative real number like 0, 1, or 1.2
    value: Value,
} // the paper writes this as <address:value@timestamp>, just fyi

struct Address(usize); // an address is just a pointer
struct Value(u8); // byte-addressable memory is the norm I hear

Now we equip each thread with an idea of the "current timestamp" for each address. This is basically the last write that we observed (i.e., by reading from that memory address and getting the value that was written). Once a thread has observed a particular write, it can never go back in time -- that is, we can't read again and get the value of some earlier write.

struct Thread {
    view: Map<Address, Real>, // timestamp from last observed write
    ... // the paper has some other stuff that I don't need for the purposes of my summary
}

Now the idea is that if a thread wants to read from memory, it can pick any of the writes for that particular address that it wants, so long as those writes don't come before the timestamp in its local view. After that, it must update its view:

fn read(memory: &Memory, thread: &mut Thread, address: Address) -> Value {
    // find all the writes that we could observe (from this thread or others)
    let timestamp = thread.view.get(address).unwrap_or(0);
    let available_writes: Vec<_> =
        memory.iter()
              .filter(|wr| wr.address == address && wr.timestamp >= timestamp)
              .collect();

    // pick one in whatever way we choose (not necessarily the most recent)
    let write = available_writes.pick_one(); // pick whichever one we want

    // update our local view now
    thread.view[address] = write.timestamp; // update our local view

    write.value
}

A write works by picking a fresh timestamp (something later than our local view, and otherwise unused, but basically arbitrary) and adding that into the list of writes.

fn write(memory: &mut Memory, thread: &mut Thread, address: Address, value: Value) { 
    // find all the writes that we could observe (from this thread or others)
    let cur_timestamp = thread.view.get(address).unwrap_or(0);
    let new_timestamp = pick_new_time_stamp(cur_timestamp, memory);
    thread.view[address] = new_timestamp;
    memory.push(Write { address: address, value: value, timestamp: new_timestamp });
}

So far this is all fairly standard. As is, however, this model can't describe many common optimizations and other behaviors.

Promises

The big idea of the paper is something called a promise. The idea is that a thread can basically speculate ahead and insert values into memory before it has actually written them. These writes are called promises.

In order to make a promise, a thread must be able to prove that it will be able to do this write. This proof cannot rely on the promise itself, which prevents circular reasoning ("I can read 5 from x because I promised to write 5 in the future").

The intuitive example is pretty good. Imagine that a, x and y are all memory locations that are initially zero, and we start two threads:

// Thread 1
a := x // writes 1
y := 1

// Thread 2
x := y

The surprising thing is that a := x writes 1 before 1 has been written to x! One reason this could occur In Real Life is that the compiler may reorder the a := x; y := 1; statements, since they appear independent from one another. But how about in the model?

The idea here is that thread 1 would promise to write 1 to y -- clearly, whatever value we read from x, this will happen. This adds the promise into memory almost as if it were a real write. Now thread 2 can actually read from that promise when it reads from y. It then does an actual write of 1 to x. Then when we come back to thread 1, we read from x, and we actually see 1.

Atomic operations

The basic operational semantics I described above must be extended slightly for atomic operations. Instead of each write (or promise) securing a specific timestamp, they can now describe a half-closed range like (5, 6]. This is needed because otherwise there is always space for another thread to insert writes where they don't belong. I.e., if two threads are doing an atomic increment of x, and the first one writes with timestamp 1, we don't want the other thread to go inserting a write with timestamp 0.5 or something. So to prevent that, the first one would secure the range (0, 1]. Now the second write has to go afterwards.

So, in our model, we can change the timestamp to a pair of reals ((Real, Real)).

This corresponds to an operation called update. You can think of it as the tail end of a "compare and swap" that is going to succeed. In other words, in some previous steps we read old_value from address (with a given timestamp) and inserted a promise to store a new value. We are now making good on that promise. (I didn't add the machinery into my Rust-model to track pending promises, so I'll just represent that as a set.)

fn update(memory: &Memory,
          pending_promises: &mut Set<Write>,
          thread: &mut Thread,
          address: Address,
          old_value: Value,
          new_value: Value) {
    let cur_timestamp = thread.view.get(address).unwrap_or(0);

    // for this kind of step to be legal, somewhere there should be a
    // previous write that stored `old_value` which is compatible with our view
    let old_write = memory.find_write(address, old_value, cur_timestamp).unwrap();
    let (_old_start, old_end) = old_write.timestamp;
    assert!(cur_timestamp <= old_end); // since we observed the old read

    let new_timestamp = pick_new_timestamp(old_end); // something bigger than old_end and not taken

    let new_write = Write { address: address, value: new_value, timestamp: (old_end, new_timestamp) };
    thread.view[address] = new_timestamp;

    // remove the write from the list of pending promises; we don't have to touch memory
    // because when we promised the value to be written, it would have been inserted
    // into memory
    pending_promises.remove(new_write);
}

More stuff

I didn't really digest the rest of the paper yet, which digs into more advanced cases.

Some questions

@jeehoonkang, can you clarify a few points for me?

  • when a thread makes a promise and does a proof, is that proof "thread-local"? That is, does it have to have knowledge of the full machine state, including all the instructions that other threads will execute?
    • I am thinking in particular of examples like ARM-weak, where the thread winds up reading from its own promise -- though in re-reading that example, I see that in fact the proof of the promise doesn't rely on the promise itself; it's a later read that reads from the promise
  • I know that you proved the soundness of various optimizations in this model, but haven't had time to dig into that yet. I'd be curious to get a feeling for how those proofs look. I don't quite have a feeling yet for how a compiler author reasons using this model yet. (OK, this is not a question.)

Write an introduction

Some topics to cover:

  • what this reference is
  • what this reference is not
  • what is an abstraction boundary

Needs license

This repo has no license. That's bad! The longer you wait to think about this, the harder it becomes to introduce a license. So better to think about it now.

Representation of Rust references (`&T`, `&mut T`) and raw pointers (`*const T`, `*mut T`)

Representation of Rust references:

  • Are &T and &mut T guaranteed to be a pointer?
  • Must always be aligned, non-null
  • Guaranteed to be ABI compatible with C pointer types ("in every way?")
    • presuming that the referent is compatible, of course

Representation of raw pointers:

  • Guaranteed to be ABI compatible with C pointer types
    • presuming that the referent is compatible, of course

Other factors:

  • Considerations for storing things in the low bits of pointers
    • safe with raw pointers, not safe with references (from this comment)
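
As a concrete illustration (mine) of the pointer-size and non-null points above, as observed with current rustc:

use std::mem::size_of;

fn main() {
    // Thin references and raw pointers are pointer-sized...
    assert_eq!(size_of::<&u64>(), size_of::<*const u64>());
    // ...and the non-null guarantee of `&T` is already exploited for layout:
    assert_eq!(size_of::<Option<&u64>>(), size_of::<&u64>());
}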

Validity of function pointers

Discussing the validity of function pointers.

Clearly, they must be non-NULL. Since we exclude some bit patterns, we likely also do not want to allow the entire value to be uninitialized. We could allow some bits to be uninitialized as long as there is at least one bit initialized to 1, but, uh, why?^^

Anything else? Do fn ptrs have to point to allocated executable memory? This discussion concludes that the answer ought to be "no", because there's little to no benefit and it would interact badly with unloading shared libraries.
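
Note that the non-NULL requirement is already exploited for layout today (illustration mine):

use std::mem::size_of;

fn main() {
    // `None::<fn()>` can use the null bit pattern, so no extra tag is needed.
    assert_eq!(size_of::<Option<fn()>>(), size_of::<fn()>());
}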

Layout of homogeneous structs

From #31: If you have homogeneous structs, where all the N fields are of a single type T, can we guarantee a mapping to the memory layout of [T; N]? How do we map between the field names and the indices? What about zero-sized types?

A specific proposal:

  • If all fields have the same type T (ignoring zero-sized types), then the layout will be the same as [T; N] where N is the number of fields
  • If the struct is defined with named fields, the mapping from fields to their indices is undefined (so foo.bar maps to an undefined index)
  • If the struct is defined as a tuple struct (or a tuple), then the indices are derived from the definition (so foo.0 maps to [0])

This is basically because it's convenient but also because it's about the only sensible thing to do (unless you imagine we might want to start inserting padding between random fields or something, which connects to #35).

Note that for tuple structs (and tuples), the ordering of (public) fields is also significant for semver and other purposes -- changing the ordering will affect your clients. The same is not true for named fields and so this proposal attempts to avoid making it true.
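
As a concrete example (mine) of the proposal: this holds with current rustc, but it is not a documented guarantee today.

use std::mem::size_of;

struct Rgb { r: u8, g: u8, b: u8 }

fn main() {
    // A homogeneous struct laid out like the corresponding array.
    assert_eq!(size_of::<Rgb>(), size_of::<[u8; 3]>());
}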

I need to do an oob vector load. How?

As an optimization during a buffer search, I need (well, very much want) to load that buffer into a SIMD vector, even when the buffer doesn't fit into the vector. E.g. I might have a 31-byte buffer that can be efficiently searched with a 32-byte wide AVX2 vector.

From a machine perspective, I don't see this as a problem, as long as the load doesn't extend beyond the current page; from LLVM's perspective this seems like UB.

I'd really like to be able to write this code in Rust and not have to use assembly.

Here's an example of this pattern:

    #[inline(always)]
    unsafe fn do_tail_clever(needle: u8, p: *const u8, len: isize,
                             i: isize, q: __m256i) -> Option<usize> {
        let rem = len - i;
        debug_assert!(rem < 32);

        // Check if the 32-byte load is within the current page
        let page_alignment = 4096;
        let page_mask = !(page_alignment - 1);
        let current_p = p.offset(i) as usize;
        let avx_read_end = current_p + 32;
        let next_page = (current_p & page_mask) + page_alignment;

        if likely(avx_read_end <= next_page) {
            let x = _mm256_loadu_si256(p.offset(i) as *const __m256i);
            let r = _mm256_cmpeq_epi8(x, q);
            let z = _mm256_movemask_epi8(r);
            let garbage_mask = {
                let ones = u32::max_value();
                let mask = ones << rem;
                let mask = !mask;
                mask as i32
            };
            let z = z & garbage_mask;
            if z != 0 {
                return off(i, z);
            }

            return None;
        }

        // Slow path
        do_tail_simple(needle, p, len, i, q)
    }

It loads beyond the array, does vector operations on it, then disregards the oob bytes with a mask.

I'm hopeful that there is some mechanism to tell LLVM to 'forget' what it knows about this pointer, 'fooling' the optimizer into not messing with it.

From the LLVM aliasing rules, there is some language that makes me hopeful:

An integer constant other than zero or a pointer value returned from a function not defined within LLVM may be associated with address ranges allocated through mechanisms other than those provided by LLVM. Such ranges shall not overlap with any ranges of addresses allocated by mechanisms provided by LLVM.

So there is a class of pointers that can operate on arbitrary memory (those that don't come from LLVM). That suggests to me that I could e.g. send my pointer through assembly or some other black-box function to 'clean it', maybe. On the other hand, calling into any function, or even into inline asm imposes extra instructions that more-or-less defeat the optimization (inline asm in LLVM seems to always spill registers). Though that sentence also says "such ranges shall not overlap with any ranges of addresses allocated by mechanisms provided by LLVM"

I'm not sure how much 'wiggle-room' there is. Is a malloc'd array "provided by LLVM"? What are the consequences of disobeying this "shall not"?

Even if there's no in-language solution and it is technically UB, I am hopeful that I can do this thing without LLVM messing with my codegen.

cc @nikomatsakis writing this here per your request.

Temporary Scopes/Arenas to formalize lifetimes of local variables

Not sure this is the right place to put this, but @ubsan asked for it:

Semantic divergences from current rustc are in bold. Unresolved questions are (XXX: question).

Arenas

In the code examples here, assume this struct has been declared:

struct NoisyDrop(&'static str);
impl Drop for NoisyDrop {
    fn drop(&mut self) {
        println!("dropping {}", self.0);
    }
}

Arenas are used to ensure orderly and deterministic destruction of local variables and temporaries (XXX: buffer temporaries created for order-of-evaluation).

During its execution, a Rust function manages a stack of arenas, pushing new arenas to it and popping them.

Newly-created locals and temporary locations are always allocated at the end of an arena, but the arena need not be the topmost arena on the stack, as in example (1).

When an arena is popped, the locations within the arena are destroyed in the reverse order of their allocation. However, parts of a location that are in a deinitialized state are not
destroyed, as in example (2).

Lifetime and region checking treats the destruction of an arena as though each location were destroyed separately, in order, but subject to implementation limitations.

The location allocated for a local/temporary (the "alloca") is valid from the moment it is allocated until just after the value inside it is destroyed when it is popped, as though it were an &move reference.

To simplify implementation, each arena in Rust can contain only a sequence of locations
whose type and size are known at compile-time. Of course, this does not imply that an arena
is stored in that order in memory.

NOTE: arenas are sometimes called "destruction scopes", but in rustc destruction scopes do not include binding arenas, so that term would be confusing. (XXX: Bikeshed!)

The arena tree

The arenas in a function are structured as a tree defined by the program's structure:

  • When a function begins, the function's root arena is created and pushed. That arena
    is popped when the function exits.
  • The following expression places push an expression arena when control enters them and pop it when it exits them:
    • The initializer of a let-statement, if any
    • The expression of an expression statement, edit 2017-12-25 with or without a semicolon
    • The body of a match arm
    • The guard of a match arm, if any
    • The RHS of a logical && or || expression
    • The condition of an if or while expression
  • The following block places push a block arena when control enters them and pop it when it leaves them:
    • The body of a function
    • The body of a loop or while loop
    • The arms of an if-expression.
  • Each block, match guard, or match arm pushes a binding arena when it is entered (XXX: do we want after? Does it matter?). The arena is popped along with the statement's containing block's.

Remember that if let, while let and for are defined by their desugaring in terms of loop
and match.

Observe that the tail expression of a block is under the block's binding arena but not under any of the other arenas.

Before a parent arena is popped, all of its children are popped in LIFO order.

Local variables

Local variables, or bindings (XXX: AFAICT the terms are used interchangeably in rustc - do we want to change this?) created by a let-statement are allocated into the statement's containing block's binding arena. The local variables are allocated before the statement's initializer executes.

Local variables created by argument buffers (see example (3)) and argument bindings are allocated in the function's root arena. The order in which they are allocated is that each argument's patterns are allocated followed by the argument's buffer, from left to right.

Local variables created by a match arm are allocated twice - once when the guard is executed, within the guard's binding arena (which will not call a destructor because bindings in guards must either be Copy or ref mut t, but will release the binding's storage) (XXX: pending final status of borrowck in guards) and once within the arm's binding arena (XXX: this differs a bit from the current behavior, but the current behavior is unsound and not something I would like to keep - cc @eddyb @nikomatsakis).

Temporaries

Temporaries that are created by a temporary lexpr borrow-extended by a let-statement are allocated within that let-statement's containing block's binding arena. Other temporaries are allocated into the topmost non-binding arena on the stack when they are created (see Example (1)).

If the pattern of a let-statement contains a by-ref binding, the root lexpr of the let-statement's expression is borrow-extended (see Example (4)).

If an address-of expression is a distinguished subexpression of a let-statement, the root lexpr of the address-of expression's subexpression is borrow-extended (see Example (5)).

The following expressions are distinguished subexpressions of a let-statement:

  • The expression of that let-statement
  • The subexpression of an address-of, cast, or parentheses expression that is a distinguished subexpression of that let-statement.
  • The fields (XXX: but not the FRU base) of a dictionary expression, tuple expression, or array expression (XXX: but not repeat expression) that is a distinguished subexpression of that let-statement.
  • The tail of a block expression that is a distinguished subexpression of that let-statement.
    (XXX: this is just documenting the current implementation. Should we do better - see the merged but unimplemented rust-lang/rfcs#66?).

In some other cases that have yet to be documented (XXX: document them).

No other temporaries are borrow-extended (e.g. type annotations do not matter - see rust-lang/rust#36082).

Example (1) - basic arenas

fn basic() {
    let x = NoisyDrop("local `x`");
    (NoisyDrop("temporary"), ()).1
}

Here there are 4 arenas:

root arena
    - function block arena
        - function block binding arena
            - `let` expression arena 

Here, the local x is allocated in the binding arena, but the (NoisyDrop("temporary"), ()) temporary skips the binding arena and is instead allocated into the function's block arena, and is destroyed after the NoisyDrop("local `x`").

Example (2) - conditional drop

fn condi_drop(init_flag: bool, fini_flag: bool) {
    let (init, fini);
    fini = NoisyDrop("fini");
    if init_flag {
        init = NoisyDrop("init");
    }
    if !fini_flag {
        std::mem::forget(fini);
    }
}

Here, the locations of the locals init and fini are allocated from the same arena
in that order. Therefore, when the function is exited, fini is destroyed followed
by init.

At least, that is the case if both are initialized - when both flags are true. If
init_flag is false, then init is never initialized and therefore not destroyed, and
if fini_flag is false, then fini is deinitialized without its destructor being run
by mem::forget, and therefore it is not destroyed.

Example (3) - argument buffers

fn argument_buf((x1,_,z1): (NoisyDrop, NoisyDrop, NoisyDrop),
                (x2,_,z2): (NoisyDrop, NoisyDrop, NoisyDrop)) {
}

fn main() {
    argument_buf((NoisyDrop("x1"), NoisyDrop("y1"), NoisyDrop("z1")),
                 (NoisyDrop("x2"), NoisyDrop("y2"), NoisyDrop("z2")),);
}

Here, the first argument's bindings x1 and z1 are allocated first, followed by the first argument's buffer (which contains the triple (NoisyDrop("x1"), NoisyDrop("y1"), NoisyDrop("z1"))). Then, the same happens for the second argument.

Afterwards, all but the middle 2 fields of the buffers are moved to the argument bindings, so only the middle NoisyDrop's destructor is called when the buffer is destroyed.

Everything is allocated onto the function's root arena and dropped in reverse order, leading to

dropping y2
dropping z2
dropping x2
dropping y1
dropping z1
dropping x1

being printed

Example (4) - borrow extension by ref pattern

fn main() {
    let ref foo = (NoisyDrop("extended 1"), NoisyDrop("extended 2")).0;
    let _ = NoisyDrop("external");
}

The root lexpr of (NoisyDrop("extended 1"), NoisyDrop("extended 2")).0 is the lexpr (NoisyDrop("extended 1"), NoisyDrop("extended 2")), which is extended to the binding arena and therefore dropped after the unextended let.

This prints

dropping external
dropping extended 1
dropping extended 2

Example (5) - borrow extension by designated expression

fn main() {
    let _x : (&_, NoisyDrop) = (
        {&[&[NoisyDrop("extended 1")]]},
        NoisyDrop((&NoisyDrop("unextended"), "extended 2").1)
    );
    drop(NoisyDrop("external"));
}

This is a rather complicated case. The arena tree is

root arena
    - function block arena
        - function block binding arena
            - let expression arena
            - second statement expression arena

First, storage for _x is allocated in the block binding arena. Next, the first 2 address-of expressions are distinguished subexpressions of the let-statement, so the 2 arrays are allocated in that binding arena too. The third address-of expression is inside a function call, so it is not a distinguished subexpression, and is allocated within the let expression arena.

After the let-statement completes, the let expression arena is dropped, printing "dropping unextended".
Then, the second statement prints "dropping external".

Afterwards, the binding arena is dropped in reverse order of pushing: first the extended temporary is dropped, printing "dropping extended 1", and then _x is dropped, printing "dropping extended 2".

The overall output is

dropping unextended
dropping external
dropping extended 1
dropping extended 2

What about: Virtual memory effects

During the pointer docs update, a side discussion came up about the effects of virtual memory: what do we still guarantee when different virtual addresses map to the same physical address? Parts of libstd, e.g. copy_nonoverlapping, will start misbehaving in that situation. FWIW, C seems to just not care: memmove has the same problem.

Validity of booleans

Discussing the validity invariant of booleans.

The obvious invariant is:

  • Must be either true or false.

Is there any reason to allow any other value? In particular, this invariant means that no bit may be uninitialized.

As usual with bool, the remaining thing that can be bikeshed indefinitely is the interaction with FFI. Can we really assume that any function calling us from C only passes one of two possible bit patterns, on any platform? On which platforms can we be more specific and specify the actual bit patterns? AFAIK false == 0x00 is given by the fact that C allows 0-initialization of _Bool. Can we say anything about the bit pattern of true?
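
As a concrete illustration (a sketch; under the invariant above, producing a bool with any bit pattern other than 0x00 or 0x01 is undefined behaviour):

use std::mem;

fn main() {
    // 0x01 is the bit pattern of `true`, so this transmute is fine.
    let t: bool = unsafe { mem::transmute(1u8) };
    assert!(t);

    // Under the invariant discussed above, the following would be undefined
    // behaviour the moment the `bool` is produced, even if it is never used:
    // let b: bool = unsafe { mem::transmute(3u8) };
}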

validity invariant for types

@RalfJung introduced the idea of validity invariants in their blog post "Two kinds of invariants". Presuming we agree with this framing (I do), we need to define these validity invariants.

These invariants need to justify the various sorts of optimizations that we currently do:

  • For example, Option<&T> layout optimization
  • Marking pointers as dereferenceable

We need to discuss also the role of uninitialized memory and how it fits in. For example, when are invariants required to hold? Only when "compiler thinks memory is initialized" -- can/should we make that more precise? Also, what about loads of uninitialized integral values (a sometimes useful hack) -- are those valid? What is the effect?
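
For reference, the mechanism Rust offers today for memory the compiler does not consider initialized is MaybeUninit, which places no validity requirement on its contents; a small sketch:

use std::mem::MaybeUninit;

fn main() {
    // No validity invariant applies to the contents of a `MaybeUninit`.
    let mut slot: MaybeUninit<u32> = MaybeUninit::uninit();

    // Only after actually initializing it may we claim it is a valid `u32`.
    slot.write(42);
    let value = unsafe { slot.assume_init() };
    assert_eq!(value, 42);
}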

Validity of unions

Discussing the validity invariant of unions.

One possible choice here is "none, any bit pattern is allowed no matter which types the fields have, and including uninitialized bits".

We could also decide that e.g. a

union Foo { a: bool, b: (bool, u8) }

must start with the first byte being either the bit-pattern of false or the bit-pattern of true, because all fields agree on that invariant.

Notice that we cannot require the union to be valid for some field: for a union like

union Mix {
  f1: (bool, u8),
  f2: (u8, bool),
}

we want to allow a bit pattern like 0x3 0x3, which can occur from code like

let mut m = Mix { f1: (false, 3) };
m.f2.0 = 3;

There is no demonstrated benefit from disallowing such code, and this kind of code seems perfectly reasonable around unions.

Given that, any validity invariant that wants to restrict the set of allowed bit patterns will be rather complicated. However, such an invariant would enable us to e.g. layout-optimize Option<Foo>, whereas the "anything goes"-invariant would prohibit any kind of layout optimization around unions.
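
One way to observe the consequence on the current implementation (a sketch; the exact sizes are not guaranteed by anything in this issue):

use std::mem::size_of;

union Foo {
    a: bool,
    b: (bool, u8),
}

fn main() {
    // With an "anything goes" validity invariant there is no niche to reuse,
    // so the Option needs extra space for its discriminant.
    println!("Foo:         {} bytes", size_of::<Foo>());
    println!("Option<Foo>: {} bytes", size_of::<Option<Foo>>());
}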

Document existing optimizations

We should try to document existing MIR-level optimizations that exploit properties of reference types. I am not sure if we have such optimizations, but I seem to recall that @eddyb pointed me towards something that we already do.

Representation of fn pointers

Discussing the representation of extern "abi" fn(..) types:

  • What hazards exist if you try to transmute these to e.g. usize?
    • the C standard, for example, is conservative about the size of a data vs fn pointer
    • is this a concern on any modern architecture?
  • Related, is Option<extern "C" fn()> guaranteed to be equivalent to a "C fn pointer" representation?
    • (I think yes, and projects rely on this)
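
A minimal check of the property referred to in the last bullet (the null-pointer optimization for Option around function pointers):

use std::mem::size_of;

fn main() {
    // `None` uses the null representation, so the Option adds no space.
    assert_eq!(
        size_of::<Option<extern "C" fn()>>(),
        size_of::<extern "C" fn()>(),
    );
}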

Thread cancellation, asynchronous unwinding exceptions and their interaction with drops

So during the all-hands we identified a use-case that would be good to be documented in the unsafe code guidelines.

There are at least three sources of "cancellation" which might end up "removing" (in Taylor’s words) a value without "dropping" it, which in turn results in unsoundness for e.g. rayon, crossbeam, &pin stuff. These sources are:

  1. longjmp/setjmp (used in practice for error handling by rust lua and perhaps many other embedded languages/interpreters);
  2. pthread_cancel: which can either deliver an asynchronous unwinding exception (which might occur at any program point) or raise an unwinding exception at well-specified points such as sleep; the exact behaviour is controlled by pthread_setcancelstate.
  3. pthread_exit, pthread_kill: which will "stop" the thread, potentially executing some arbitrary code and cleaning the thread up (freeing the thread's stack).

There’s no question that these functions may be useful in one scenario or the other, so it would be good if we figured out scenarios in which these functions are sound to use (e.g. if the thread stack contains only Copy types) and encoded this information into our unsafe code guidelines book.

cc @nikomatsakis @cramertj

Representation of tuples

Discussing how tuples are laid out.

  • Is tuple layout equivalent to some corresponding struct? If so, what struct exactly?
    • that struct definition might in turn have undefined layout, of course

Some things that might be useful if they were defined:

  • Is it possible to inter-convert a (T, T, ..., T) tuple with a [T; N] memory representation?
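
A small probe of the inter-conversion question (a sketch; matching size and alignment is necessary but by itself does not settle field order or padding):

use std::mem::{align_of, size_of};

fn main() {
    println!(
        "(u8, u8, u8, u8): size {}, align {}",
        size_of::<(u8, u8, u8, u8)>(),
        align_of::<(u8, u8, u8, u8)>()
    );
    println!(
        "[u8; 4]:          size {}, align {}",
        size_of::<[u8; 4]>(),
        align_of::<[u8; 4]>()
    );
}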

Validity of references: Bit-related properties

Discussing the "bit-pattern validity" of references: the part that can be defined without referring to memory.

Certainly, references are non-NULL. Following the current lowering to LLVM, they also must be aligned. This is in conflict with creating references to fields of packed structs; see RFC 2582 for a proposed solution.

Do we want to allow uninitialized bits? Theoretically we could allow something like 0xUU000001 (on 32-bit, where U represents 4 uninitialized bits) for &(), but there seems to be little point in doing so.
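
The non-NULL part is what underlies the familiar layout optimization; a minimal check:

use std::mem::size_of;

fn main() {
    // Because a `&u32` can never be null, `None` can be encoded as the
    // all-zero bit pattern and the Option costs no extra space.
    assert_eq!(size_of::<Option<&u32>>(), size_of::<&u32>());
}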

Deterministic (but undefined) layout

From #31: Can we say that layout is some deterministic function of a certain, fixed set of inputs? This would allow you to be sure that if you do not alter those inputs, your struct layout would not change, even if it meant that you can't predict precisely what it will be. For example, we might say that struct layout is a function of the struct's generic types and its substitutions, full stop -- this would imply that any two structs with the same definition are laid out the same. This might interfere with our ability to do profile-guided layout or to analyze how a struct is used and optimize based on that. (Some would call that a feature.)

Also, this presumably applies to enums as well as other types.
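
As a concrete illustration for structs (a sketch with two hypothetical, identically-defined types; offset_of! requires Rust 1.77 or later):

use std::mem::offset_of;

struct A { small: u8, big: u32 }
struct B { small: u8, big: u32 }

fn main() {
    // The open question: is layout a deterministic function of the definition,
    // so that A and B are guaranteed to agree? In practice they usually do
    // today, but repr(Rust) does not currently promise it.
    println!("A.big at offset {}", offset_of!(A, big));
    println!("B.big at offset {}", offset_of!(B, big));
}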

Validity of integers and floating point

Discussing the validity invariant of integer and floating point types.

Clearly, every possible bit pattern is allowed. For integers they all have a distinct and meaningful interpretation, and we have a safe NOP-conversion between f32 and u32, and f64 and u64, through to_bits and from_bits.
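
The no-op conversions mentioned here, for reference:

fn main() {
    let x: f32 = 1.5;
    let bits: u32 = x.to_bits();         // reinterpret the bytes as an integer
    assert_eq!(f32::from_bits(bits), x); // and back, losslessly
    assert_eq!(bits, 0x3FC0_0000);       // 1.5f32 in IEEE 754 binary32
}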

The remaining open question is: is it ever allowed to have an uninitialized bit in an integer or floating point value? We could reasonably decide either way. Also, when an integer is partially uninitialized, does that "infect" the entire integer, or do we exactly preserve which bytes are initialized?

2022-09-07: This has now pretty much been answered.

can we in some cases have more limited forms of "undefined behavior"?

In #5, @gereeter raised the point that defining all manner of errors as yielding "undefined behavior" is an awfully strong statement. It theoretically permits the compiler to "change the past" and may not mesh so well with the way that users think. On the other hand, weaker definitions may not permit the kinds of optimizations we want and may not be supported by LLVM.

In a sense, this is a cross-cutting concern: we need to figure out what's allowed and not allowed, but separately, we should consider if there are cases where we can contain the repercussions.

I'm not sure the best way to handle this, but I'm opening this up as its own potential discussion topic for the future.

Layout of vector types

The layout of Rust vector types (e.g. core::arch::x86_64::__m128 and friends) is currently unspecified, but we probably want to make it implementation defined. That requires the implementation to specify their layout in "unsafe-code-guidelines speak", so we might just as well do it here for the canonical implementation, at least for the stable types on the tier 1 platforms (maybe there is a way to do this generally). Some things to consider:

  • In C, __m256 is only available when AVX is available, so that its layout is always that of a 256-bit wide register.

  • In Rust, __m256 is always available, even when 256-bit or 128-bit wide registers are not available. That is, repr(C) and repr(Rust) vector types do not necessarily have the same layout.

  • The Rust function ABI for vector types transparently handles this by passing vector types through memory, but the x86-64 SysV ABI passes them in registers (this leads to subtleties when using them in C FFI: rust-lang/rust#53346).
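
A small probe of the second point on the canonical implementation (a sketch; x86_64 only, and nothing here is meant as a guarantee):

#[cfg(target_arch = "x86_64")]
fn main() {
    use core::arch::x86_64::__m256;
    use std::mem::{align_of, size_of};

    // Nameable and usable for size/align queries even if this binary was not
    // compiled with AVX enabled.
    println!("__m256: size {}, align {}", size_of::<__m256>(), align_of::<__m256>());
}

#[cfg(not(target_arch = "x86_64"))]
fn main() {}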

Effect of `packed` and `align` on representation

Discussion topic for the effect of #[repr(packed)] and #[repr(align)] on memory layout.

Both of these attributes have RFCs. We should document what current behavior is, what gotchas to watch out for (e.g., &x.foo where foo is a field of a packed struct may not be aligned), and what we can guarantee going forward.
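
A short sketch of both attributes and of the reference gotcha, using addr_of! so that no unaligned reference is ever created:

use std::mem::align_of;
use std::ptr;

#[repr(C, packed)]
struct Packed {
    a: u8,
    b: u32, // may end up at offset 1, i.e. misaligned for u32
}

#[repr(C, align(8))]
struct Overaligned(u8);

fn main() {
    let p = Packed { a: 1, b: 2 };

    // `&p.b` could create an unaligned `&u32` (the gotcha mentioned above);
    // take a raw pointer instead and read it unaligned.
    let b_ptr = ptr::addr_of!(p.b);
    println!("b = {}", unsafe { b_ptr.read_unaligned() });

    println!("align_of::<Overaligned>() = {}", align_of::<Overaligned>());
}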

Provenance

Do we need to discuss C's provenance model?
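
For illustration, the classic shape of the question (a sketch; nothing here is undefined behaviour on its own, the question is what the resulting pointers may be used to access):

fn main() {
    let a = 1u8;
    let b = 2u8;

    // Strip the pointers down to plain addresses.
    let addr_a = &a as *const u8 as usize;
    let addr_b = &b as *const u8 as usize;

    // Even if addr_a + 1 happens to equal addr_b, a provenance model (like
    // C's) says a pointer derived from `a` must not be used to read `b`,
    // while a "pointers are just integers" model would allow it.
    println!("addr_a = {addr_a:#x}, addr_b = {addr_b:#x}");
}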

defining zero-sized structs

From #31:

"If you have a struct which -- transitively -- contains no data of non-zero size, then the size of that struct will be zero as well. These zero-sized structs appear frequently as exceptions in other layout considerations (e.g., single-field structs). An example of such a struct is std::marker::PhantomData."

Is that a sufficient definition for zero-sized structs? This seems like an important guarantee that we frequently rely upon for performance and other purposes, so it is worth specifying.
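
A minimal check of that definition (a sketch with hypothetical types):

use std::marker::PhantomData;
use std::mem::size_of;

struct Nothing;

struct StillNothing {
    a: Nothing,
    b: PhantomData<u64>,
    c: [u8; 0],
}

fn main() {
    // Every field is zero-sized, so the containing struct is too.
    assert_eq!(size_of::<StillNothing>(), 0);
}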

Validity of raw pointers

Discussing the validity invariant of raw pointers.

For pointers to sized types, this should probably be the same as the invariant for usize -- see the integer topic for discussing whether uninitialized bits are allowed or not.

For pointers to unsized types, there is an additional question: to what extent does the metadata have to be initialized/valid? Do we require it to be "valid" enough to determine size and alignment, e.g. do we require that the vtable pointer actually point to allocated memory?
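
For the sized case, note that a raw pointer may be null or dangling without any problem; the metadata question only arises for wide pointers. A small sketch:

use std::ptr;

fn main() {
    // A thin raw pointer may dangle or be null; merely having it around is fine.
    let thin: *const u8 = ptr::null();

    // A wide raw pointer to a slice carries a length as metadata. Creating one
    // with made-up metadata is the situation discussed above.
    let wide: *const [u8] = ptr::slice_from_raw_parts(thin, 1_000_000);
    println!("{:p} with made-up length metadata", wide);
}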
