Comments (8)
I must say that I would enjoy having these restrictions placed on raw pointers, but only if tooling could reliably enforce it. (Presumably through miri.)
Doing this in Miri is not a big issue. (Raw pointers right now are more relaxed mostly because there is some libstd code that is broken, and cannot be fixed without rust-lang/rfcs#2582.)
But Miri cannot run on FFI code, or code interacting with the hardware (embedded/kernel stuff), or code that just does lots of stuff (Miri is very slow). I think we could have a valgrind tool that helps detect such issues, but by their very nature valgrind tools are unable to reliably find UB.
If tooling couldn't reliably enforce it, then I think my joy would turn to misery. :-)
Fair. ;)
Reference-to-raw transmutes are an interesting open question. This is related to the fact that nobody knows what the exact LLVM semantics are for pointer-to-int transmutes -- making them the same as a cast would kill some quite important optimizations. Also see the discussion of type punning in §8 of this paper.
If you want your model of pointer casts shattered, have a look at this LLVM bug. inttoptr(ptrtoint x) and x are not the same thing.
from memchr.
(To be clear, as_ptr does no magic, its implementation just does the right thing: first cast to wide raw pointer for entire slice, and only then cast to thin raw ptr for first element.)
I think this is the most interesting part to me! My prior mental model is that this
pub const fn as_ptr(&self) -> *const T {
    self as *const [T] as *const T
}
could equivalently be implemented as
pub const fn as_ptr(&self) -> *const T {
    core::mem::transmute(self)
}
But it sounds like that may not be the case, since the actual series of steps one goes through via the raw pointer casts seems to matter.
I must say that I would enjoy having these restrictions placed on raw pointers, but only if tooling could reliably enforce it. (Presumably through miri.) If tooling couldn't reliably enforce it, then I think my joy would turn to misery. :-)
@BurntSushi I don't actually understand your two examples (the transmute doesn't seem to be valid because &[T] is not the same size as *const T).
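The size mismatch mentioned here can be checked directly. The following is a minimal sketch; the helper name `pointer_sizes` is made up for illustration, and the 2x ratio is what current compilers produce for slice references (a data pointer plus a length) rather than something this thread establishes.

```rust
use core::mem::size_of;

/// Returns (size of a slice reference, size of a thin raw pointer) in bytes.
/// `pointer_sizes` is a hypothetical helper, not part of any library.
fn pointer_sizes() -> (usize, usize) {
    (size_of::<&[u8]>(), size_of::<*const u8>())
}

fn main() {
    let (wide, thin) = pointer_sizes();
    // A slice reference is a wide pointer: data pointer plus element count.
    println!("&[u8]: {} bytes, *const u8: {} bytes", wide, thin);
}
```

Because `core::mem::transmute` requires source and target to have the same size, a transmute from `&[T]` to `*const T` is rejected at compile time.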
The first one is the current implementation of <[T]>::as_ptr, right?
While looking at this I noticed again (playground link) that we don't allow &[T] to be cast directly to *const T while we do allow it for &[T; N].
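A small sketch of the cast rules described above. The function names are hypothetical; the two-step cast in `slice_first_ptr` mirrors the `as_ptr` body quoted earlier in the thread.

```rust
/// Array reference to thin pointer: `as` accepts this directly.
fn array_first_ptr(arr: &[u8; 4]) -> *const u8 {
    arr as *const u8
}

/// Slice reference to thin pointer: go through the wide raw pointer first.
fn slice_first_ptr(s: &[u8]) -> *const u8 {
    // Writing `s as *const u8` here is rejected by the compiler.
    s as *const [u8] as *const u8
}

fn main() {
    let arr = [1u8, 2, 3, 4];
    // Both routes yield the same address as the library method.
    assert_eq!(array_first_ptr(&arr), arr.as_ptr());
    assert_eq!(slice_first_ptr(&arr), arr.as_ptr());
}
```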
(When someone brought up this issue elsewhere I thought the &slice[0] as *const T comment referred to code that was written instead of .as_ptr(), which is why I looked at casts)
Thanks for the issue! Unfortunately, I don't fully grok the underlying explanation for this. Perhaps you could ELINRJ (that is, Explain Like I'm Not Ralf Jung :-)).
While these points/questions don't represent the full extent of my non-understanding, they are perhaps a start:
- The phrase "may conflict with stacked borrows" has absolutely no meaning to me. I recall reading Ralf's post on stacked borrows a while back, but I promise you that it didn't fully sink in.
- Does this result in undefined behavior?
- From what I can tell, creating the pointer manually from the starting raw pointer and creating it from the empty slice at the end seem like equivalent operations to me? What's going on in the language semantics that cause these two things to actually be different?
- The ending pointer is calculated this way in all of the memchr implementations I think, not just the reverse ones. For example: https://github.com/BurntSushi/rust-memchr/blob/1ec5ecce03c220c762dd9a8b08f7a3d95522b765/src/x86/sse2.rs#L112 --- Or is there something about the reverse implementation that is special that I am not understanding?
- How is this reconciled with things like "Both the starting and resulting pointer must be either in bounds or one byte past the end of the same allocated object." from the docs of pointer::offset?
Sure, I will try to answer the immediate questions first.
The phrase "may conflict with stacked borrows" has absolutely no meaning to me. I recall reading Ralf's post on stacked borrows a while back, but I promise you that it didn't fully sink in.
The phrase says: Under the memory model of Stacked Borrows, the implementation here is illegal. Since Stacked Borrows is not the official memory model of Rust (yet?), this doesn't give the code undefined behaviour today, but the compiler may be changed so that it does. Since adapting the code to be defined even under Stacked Borrows should be possible, this is a future-proofing issue.
From what I can tell, creating the pointer manually from the starting raw pointer and creating it from the empty slice at the end seem like equivalent operations to me? What's going on in the language semantics that cause these two things to actually be different?
A part of Stacked Borrows is 'pointer provenance', the concept that the source of a pointer may influence the allowed operations. In particular, two pointers that compare equal may not be allowed to be used interchangeably (for dereferencing). (Some more reading, another blog post by Ralf Jung.) And so, while the pointer of the empty slice at the end equals the pointer offset from the starting pointer, dereferencing the two pointers or other pointers created from them need not be allowed equally.
The end_ptr was created by casting a reference to the empty slice at the end. Under the strict pointer provenance model it must only be used to access that empty slice. In contrast, start_ptr is created by casting the reference to the whole slice (haystack.as_ptr() does this internally) and is thus valid for dereferencing it at any index within the slice. Meanwhile, pointer offsetting and comparison is itself allowed on both pointers equally.
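To make the two derivations concrete, here is a sketch rather than the actual memchr code. All function names are hypothetical. `end_from_empty_tail` shows the derivation being questioned, and `rfind_byte` is a simplified byte-at-a-time reverse scan whose cursor is derived from the start pointer instead.

```rust
/// End pointer derived from the whole-slice pointer; it keeps the
/// provenance of `start_ptr`, so pointers derived from it may read the slice.
fn end_from_start(haystack: &[u8]) -> *const u8 {
    let start_ptr = haystack.as_ptr();
    // SAFETY: one past the end of the same allocation is a valid offset.
    unsafe { start_ptr.add(haystack.len()) }
}

/// End pointer derived from the empty slice at the end. It has the same
/// address, but under Stacked Borrows it is only valid for that empty slice.
fn end_from_empty_tail(haystack: &[u8]) -> *const u8 {
    haystack[haystack.len()..].as_ptr()
}

/// Reverse scan whose cursor starts at the start-derived end pointer.
fn rfind_byte(needle: u8, haystack: &[u8]) -> Option<usize> {
    let start_ptr = haystack.as_ptr();
    let mut ptr = end_from_start(haystack);
    while ptr > start_ptr {
        // SAFETY: `ptr` is strictly after `start_ptr`, so stepping back by
        // one stays inside the slice and the read is in bounds.
        ptr = unsafe { ptr.sub(1) };
        if unsafe { *ptr } == needle {
            return Some(ptr as usize - start_ptr as usize);
        }
    }
    None
}

fn main() {
    let haystack = b"banana";
    // Equal addresses, yet different provenance under the stricter model.
    assert_eq!(end_from_start(haystack), end_from_empty_tail(haystack));
    println!("{:?}", rfind_byte(b'n', haystack));
}
```

The two end pointers compare equal, which is why the difference is invisible at runtime; only the start-derived one is meant to be walked backwards and dereferenced.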
Or is there something about the reverse implementation that is special that I am not understanding?
Yes, see the above. memchr only inspects end_ptr, compares it against ptr and subtracts the two, but it never dereferences end_ptr. That is the crucial part.
How is this reconciled with things like "Both the starting and resulting pointer must be either in bounds or one byte past the end of the same allocated object." from the docs of pointer::offset?
The offsetting is allowed but dereferencing/reading from the pointee may be still illegal. Note that the two unsafe operations require separate reasoning.
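The separate reasoning for the two unsafe operations can be sketched with a simplified forward scan. This is an illustration under assumptions, not the memchr implementation: `find_byte` is a hypothetical name and the loop works one byte at a time.

```rust
/// Forward scan in which `end_ptr` is only ever compared, never read.
fn find_byte(needle: u8, haystack: &[u8]) -> Option<usize> {
    let start_ptr = haystack.as_ptr();
    // Obligation 1 (offset): the result is at most one past the end of
    // the same allocation, which the offset rules permit.
    let end_ptr = unsafe { start_ptr.add(haystack.len()) };
    let mut ptr = start_ptr;
    while ptr < end_ptr {
        // Obligation 2 (read): `ptr` is strictly before `end_ptr` and was
        // derived from the whole slice, so the read is in bounds.
        if unsafe { *ptr } == needle {
            return Some(ptr as usize - start_ptr as usize);
        }
        // Offset again: at most one past the end, checked by the loop.
        ptr = unsafe { ptr.add(1) };
    }
    None
}

fn main() {
    println!("{:?}", find_byte(b'n', b"banana"));
}
```

Computing `end_ptr` satisfies the offset rule by itself; whether a pointer may be read is argued separately at each dereference.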
From what I can tell, creating the pointer manually from the starting raw pointer and creating it from the empty slice at the end seem like equivalent operations to me? What's going on in the language semantics that cause these two things to actually be different?
To clarify, this is where the mismatch comes in. @HeroicKatora already mentioned pointer provenance, so let me just point you to rust-lang/unsafe-code-guidelines#134 where we track if maybe Stacked Borrows is too strict here (but also, we might lose basically all aliasing-based optimizations if we relax this). Also see this discussion where I explained the issue in some more details:
Raw pointer operations do not affect the permissions a pointer has, so the moment you are casting a reference to a raw pointer, you are deciding what is allowed to be done with all raw pointers ever created from this one.
Basically, &slice[0] as *const T and slice.as_ptr() are not equivalent (even ignoring the empty slice case): the former is a ref-to-raw cast of the first element, so the resulting raw pointer may only access that element; the latter does a ref-to-raw cast of the entire slice, so the resulting raw pointer may be used on the entire slice. (To be clear, as_ptr does no magic, its implementation just does the right thing: first cast to wide raw pointer for entire slice, and only then cast to thin raw ptr for first element.)
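A short sketch of the distinction, using hypothetical helper names. The pattern the comment warns about appears only as a code comment, since under Stacked Borrows reading past the first element through such a pointer would not be allowed.

```rust
/// Sums a slice through a raw pointer derived from the entire slice.
fn sum_via_as_ptr(slice: &[u32]) -> u32 {
    let base = slice.as_ptr();
    let mut total = 0u32;
    for i in 0..slice.len() {
        // SAFETY: `i < slice.len()`, and `base` came from the whole slice.
        total = total.wrapping_add(unsafe { *base.add(i) });
    }
    total
}

/// Creating a pointer to the first element is fine; it has the same address.
fn first_element_ptr(slice: &[u32]) -> Option<*const u32> {
    slice.first().map(|first| first as *const u32)
}

fn main() {
    let data = [1u32, 2, 3];
    let first = first_element_ptr(&data).unwrap();
    assert_eq!(first, data.as_ptr());
    // Reading `*first` is fine. Reading `*first.add(1)` is the pattern in
    // question: `first` was derived from a reference to one element only.
    println!("{}", sum_via_as_ptr(&data));
}
```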
This will be fixed in #82. I know this technically isn't required yet, but I don't see any good reason not to do it.
Interesting. Thanks for elaborating. I think I grok this at a surface level, but probably still lack a deeper understanding. I'll mark this as a bug for now. I'd almost slightly prefer to hold off on fixing it until Miri is able to track it, so that we get some confidence that this is the right play.