Comments (8)

RalfJung commented on May 18, 2024 4

I must say that I would enjoy having these restrictions placed on raw pointers, but only if tooling could reliably enforce it. (Presumably through miri.)

Doing this in Miri is not a big issue. (Raw pointers right now are more relaxed mostly because there is some libstd code that is broken, and cannot be fixed without rust-lang/rfcs#2582.)

But Miri cannot run on FFI code, or code interacting with the hardware (embedded/kernel stuff), or code that just does lots of stuff (Miri is very slow). I think we could have a valgrind tool that helps detect such issues, but by their very nature valgrind tools are unable to reliably find UB.

If tooling couldn't reliably enforce it, then I think my joy would turn to misery. :-)

Fair. ;)

Reference-to-raw transmutes are an interesting open question. This is related to the fact that nobody knows what the exact LLVM semantics are for pointer-to-int transmutes -- making them the same as a cast would kill some quite important optimizations. Also see the discussion of type punning in §8 of this paper.

If you want your model of pointer casts shattered, have a look at this LLVM bug. inttoptr(ptrtoint x) and x are not the same thing.

from memchr.

BurntSushi commented on May 18, 2024 2

(To be clear, as_ptr does no magic, its implementation just does the right thing: first cast to wide raw pointer for entire slice, and only then cast to thin raw ptr for first element.)

I think this is the most interesting part to me! My prior mental model is that this

    pub const fn as_ptr(&self) -> *const T {
        self as *const [T] as *const T
    }

could equivalently be implemented as

    pub const fn as_ptr(&self) -> *const T {
        core::mem::transmute(self)
    }

But it sounds like that may not be the case, since the actual series of steps one goes through via the raw pointer casts seems to matter.

I must say that I would enjoy having these restrictions placed on raw pointers, but only if tooling could reliably enforce it. (Presumably through miri.) If tooling couldn't reliably enforce it, then I think my joy would turn to misery. :-)

eddyb commented on May 18, 2024 2

@BurntSushi I don't actually understand your two examples (the transmute doesn't seem to be valid because &[T] is not the same size as *const T).
The first one is the current implementation of <[T]>::as_ptr, right?
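The size mismatch mentioned above can be checked directly. This is only a small sketch; `ref_and_thin_ptr_sizes` is a hypothetical helper name, and the exact layout of wide pointers is not something to rely on beyond what the size comparison shows on current compilers.

```rust
use std::mem::size_of;

/// Returns (size of `&[u8]`, size of `*const u8`) in bytes.
/// Hypothetical helper for illustration only.
pub fn ref_and_thin_ptr_sizes() -> (usize, usize) {
    // `&[u8]` is a wide pointer (data pointer plus length),
    // `*const u8` is a thin pointer (data pointer only).
    (size_of::<&[u8]>(), size_of::<*const u8>())
}
```

Because `transmute` requires source and target types of equal size, a wide reference cannot be transmuted to a thin raw pointer.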

While looking at this I noticed again (playground link) that we don't allow &[T] to be cast directly to *const T while we do allow it for &[T; N].
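A minimal sketch of the two casts, assuming nothing beyond the standard cast rules; `array_first_ptr` and `slice_first_ptr` are hypothetical names used only for this illustration.

```rust
/// A reference to an array can be cast straight to a thin element pointer.
pub fn array_first_ptr(arr: &[u8; 4]) -> *const u8 {
    arr as *const u8
}

/// A reference to a slice needs two steps: first to a wide raw pointer,
/// then to a thin one. Writing `s as *const u8` directly is rejected by
/// the compiler as an invalid cast.
pub fn slice_first_ptr(s: &[u8]) -> *const u8 {
    s as *const [u8] as *const u8
}
```

Both functions return the address of the first element; they differ only in which casts the compiler accepts.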

(When someone brought up this issue elsewhere I thought the &slice[0] as *const T comment referred to code that was written instead of .as_ptr(), which is why I looked at casts)

BurntSushi commented on May 18, 2024 1

Thanks for the issue! Unfortunately, I don't fully grok the underlying explanation for this. Perhaps you could ELINRJ (that is, Explain Like I'm Not Ralf Jung :-)).

While these points/questions don't represent the full extent of my non-understanding, they are perhaps a start:

  • The phrase "may conflict with stacked borrows" has absolutely no meaning to me. I recall reading Ralf's post on stacked borrows a while back, but I promise you that it didn't fully sink in.
  • Does this result in undefined behavior?
  • From what I can tell, creating the pointer manually from the starting raw pointer and creating it from the empty slice at the end seem like equivalent operations to me? What's going on in the language semantics that cause these two things to actually be different?
  • The ending pointer is calculated this way in all of the memchr implementations I think, not just the reverse ones. For example: https://github.com/BurntSushi/rust-memchr/blob/1ec5ecce03c220c762dd9a8b08f7a3d95522b765/src/x86/sse2.rs#L112 --- Or is there something about the reverse implementation that is special that I am not understanding?
  • How is this reconciled with things like "Both the starting and resulting pointer must be either in bounds or one byte past the end of the same allocated object." from the docs of pointer::offset?

HeroicKatora commented on May 18, 2024 1

Sure, I will try to answer the immediate questions first.

The phrase "may conflict with stacked borrows" has absolutely no meaning to me. I recall reading Ralf's post on stacked borrows a while back, but I promise you that it didn't fully sink in.

The phrase says: Under the memory model of Stacked Borrows, the implementation here is illegal. Since Stacked Borrows is not the official memory model of Rust (yet?), this doesn't make the code have undefined behaviour today, but the compiler may later be changed so that it does. Since adapting the code to be defined even under Stacked Borrows should be possible, this is a future-proofing issue.

From what I can tell, creating the pointer manually from the starting raw pointer and creating it from the empty slice at the end seem like equivalent operations to me? What's going on in the language semantics that cause these two things to actually be different?

A part of Stacked Borrows is 'pointer provenance', the concept that the source of a pointer may influence the allowed operations. In particular, two pointers that compare equal may not be allowed to be used interchangeably (for dereferencing). (Some more reading, another blog post by Ralf Jung.) And so, while the pointer of the empty slice at the end equals the pointer offset from the starting pointer, dereferencing the two pointers or other pointers created from them need not be allowed equally.

The end_ptr was created by casting a reference to the empty slice at the end. Under the strict pointer provenance model it must only be used to access that empty slice. In contrast, start_ptr is created by casting the reference to the whole slice (haystack.as_ptr() does this internally) and is thus valid for dereferencing it at any index within the slice. Meanwhile, pointer offsetting and comparison is itself allowed on both pointers equally.

Or is there something about the reverse implementation that is special that I am not understanding?

Yes, see the above. memchr only inspects end_ptr, compares it against ptr and subtracts the two—but it never dereferences end_ptr. That is the crucial part.
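As a rough sketch of the distinction (this is not the memchr code), a reverse scan can derive its end pointer from start_ptr rather than from the empty slice at the end, so that every pointer it later dereferences comes from the cast of the whole slice. `rfind_byte` is a hypothetical name chosen for this example.

```rust
/// Find the last occurrence of `needle` in `haystack` by scanning backwards
/// with raw pointers. Hypothetical helper, not the memchr implementation.
pub fn rfind_byte(needle: u8, haystack: &[u8]) -> Option<usize> {
    // Ref-to-raw cast of the whole slice: this pointer may be used to read
    // any element of `haystack`.
    let start_ptr = haystack.as_ptr();
    // Derive the end pointer by offsetting `start_ptr`, so it carries the
    // same provenance. The variant questioned in this thread would instead
    // be `haystack[haystack.len()..].as_ptr()`.
    // SAFETY: one past the end of the same allocation is a valid offset.
    let end_ptr = unsafe { start_ptr.add(haystack.len()) };

    let mut ptr = end_ptr;
    while ptr > start_ptr {
        // SAFETY: `ptr > start_ptr`, so stepping back stays inside the slice.
        ptr = unsafe { ptr.sub(1) };
        // SAFETY: `ptr` now points at an element of `haystack`.
        if unsafe { *ptr } == needle {
            return Some(ptr as usize - start_ptr as usize);
        }
    }
    None
}
```

The addresses are identical in both variants; only the origin of the pointer that ends up being dereferenced differs.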

How is this reconciled with things like "Both the starting and resulting pointer must be either in bounds or one byte past the end of the same allocated object." from the docs of pointer::offset?

The offsetting is allowed but dereferencing/reading from the pointee may be still illegal. Note that the two unsafe operations require separate reasoning.

RalfJung commented on May 18, 2024 1

From what I can tell, creating the pointer manually from the starting raw pointer and creating it from the empty slice at the end seem like equivalent operations to me? What's going on in the language semantics that cause these two things to actually be different?

To clarify, this is where the mismatch comes in. @HeroicKatora already mentioned pointer provenance, so let me just point you to rust-lang/unsafe-code-guidelines#134 where we track if maybe Stacked Borrows is too strict here (but also, we might lose basically all aliasing-based optimizations if we relax this). Also see this discussion where I explained the issue in some more details:

Raw pointer operations do not affect the permissions a pointer has, so the moment you are casting a reference to a raw pointer, you are deciding what is allowed to be done with all raw pointers ever created from this one.

Basically, &slice[0] as *const T and slice.as_ptr() are not equivalent (even ignoring the empty slice case): the former is a ref-to-raw cast of the first element, so the resulting raw pointer may only access that element; the latter does a ref-to-raw cast of the entire slice, so the resulting raw pointer may be used on the entire slice. (To be clear, as_ptr does no magic, its implementation just does the right thing: first cast to wide raw pointer for entire slice, and only then cast to thin raw ptr for first element.)
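A small sketch of the two casts side by side, under the reading of Stacked Borrows described in this thread; `sum_via_as_ptr` and `first_via_element_ptr` are hypothetical names, not code from memchr or libstd.

```rust
/// Sum a slice through a raw pointer obtained from the whole slice.
pub fn sum_via_as_ptr(slice: &[u32]) -> u32 {
    // `as_ptr` is a ref-to-raw cast of the entire slice, so `base` may be
    // used to read every element.
    let base: *const u32 = slice.as_ptr();
    let mut total = 0u32;
    for i in 0..slice.len() {
        // SAFETY: `i < slice.len()` and `base` was derived from the whole slice.
        total = total.wrapping_add(unsafe { *base.add(i) });
    }
    total
}

/// Read only the first element through a pointer to that element.
pub fn first_via_element_ptr(slice: &[u32]) -> Option<u32> {
    if slice.is_empty() {
        return None;
    }
    // Ref-to-raw cast of a single element: under the model discussed here,
    // this pointer should only be used to access `slice[0]`.
    let first: *const u32 = &slice[0] as *const u32;
    // SAFETY: `first` points at `slice[0]`, which exists.
    // Reading `*first.add(1)` instead is the access the model would reject.
    Some(unsafe { *first })
}
```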

BurntSushi commented on May 18, 2024 1

This will be fixed in #82. I know this technically isn't required yet, but I don't see any good reason not to do it.

BurntSushi commented on May 18, 2024

Interesting. Thanks for elaborating. I think I grok this at a surface level, but probably still lack a deeper understanding. I'll mark this as a bug for now. I'd almost slightly prefer to hold off on fixing it until Miri is able to track it, so that we get some confidence that this is the right play.
