Git Product home page Git Product logo

Comments (15)

saethlin avatar saethlin commented on July 29, 2024 2

It's just strange to see this behavior change in a minor release of the language.

The change here was not to the documented behavior of a stable API (we never guaranteed any particular pointer value or range), so there's no semver reason to forbid it. Your system accidentally depended on an implementation detail, and we must be allowed to change implementation details.

from rust.

asquared31415 avatar asquared31415 commented on July 29, 2024

0x01 is a valid pointer for a zero length slice. This was an intentional change for pointers to constants of zero size (I can't find the PR that added it though)

@rustbot label -needs-triage -C-bug +C-discussion

from rust.

workingjubilee avatar workingjubilee commented on July 29, 2024

It also requires the type, str, require an alignment of only 1. Which it does.

These are both constants and the compiler can, by definition, even if we allocate them somewhere in the binary, put them anywhere. It is also permitted to e.g. take these two strings and make the relevant assert pass. Or not. You shouldn't rely on the address of constants, only a static could ever possibly be anything meaningful there (and it is also probable that we can make zero-sized statics occupy the same address).

let x = "lol, lmao";
let y = "lol";
assert_eq!(x.as_ptr(), y.as_ptr());

The PR in question was #123936

from rust.

workingjubilee avatar workingjubilee commented on July 29, 2024

Could you explain what code you are concerned about that this change breaks, and if it differs from either of these cases?

from rust.

flaub avatar flaub commented on July 29, 2024

Here's my problem with this, the ZST is somehow being extended to a type that isn't a ZST or could be changed to a DST.

For instance:

let mut buf: &[u8] = &[]; // starts off as a ZST
buf = &[1]; // clearly not a ZST

It feels like a ZST should only exist in the context of construction, but in later contexts, the type alone is not enough to express that something is a ZST. Like &[] is fine, but if I assign it to a &[u8] it feels like all bets are off.

The code where I am running into issues is similar to FFI. In our case it's a syscall that is used for interfacing a guest running in a virtual machine with a host. The actual test case is:

let mut input: &[u8] = &[];
let mut input_len: usize = 0;

for _ in 0..count {
    let host_data = env::send_recv_slice::<u8, u8>(SYS_MULTI_TEST, &input[..input_len]);

    input = bytemuck::cast_slice(host_data);
    input_len = input.len();
}

The env::send_recv_slice calls a syscall which operates on pointers and eventually operates on raw u32 values.

I think everything would be fine if we could just detect (ideally at compile-time) and disallow ZSTs from calling send_recv_slice, the signature is:

pub fn send_recv_slice<T: Pod, U: Pod>(syscall_name: SyscallName, to_host: &[T]) -> &'static [U]

which eventually calls:

/// Exchange data with the host.
pub fn syscall(syscall: SyscallName, to_host: &[u8], from_host: &mut [u32]) -> syscall::Return {
    unsafe {
        syscall_2(
            syscall,
            from_host.as_mut_ptr(),
            from_host.len(),
            to_host.as_ptr() as u32,
            to_host.len() as u32,
        )
    }
}

from rust.

flaub avatar flaub commented on July 29, 2024

I suppose we could check for a buf.len() == 0 and pass a null pointer instead, but now we need to add extra logic and in our context, every cycle is very precious.

from rust.

workingjubilee avatar workingjubilee commented on July 29, 2024

So the syscall itself performs undefined behavior on being told that it should read and write 0 bytes, and you do not have control over how it behaves?

from rust.

asquared31415 avatar asquared31415 commented on July 29, 2024

You should not be accessing more than 0 bytes behind a slice with length 0, that's UB no matter where the pointer is. I don't see the issue here without knowing exactly what you're doing with that pointer.

from rust.

workingjubilee avatar workingjubilee commented on July 29, 2024

Here's my problem with this, the ZST is somehow being extended to a type that isn't a ZST or could be changed to a DST.

What, exactly, is the actual problem? Does the program now segfault where it previously did not?

from rust.

flaub avatar flaub commented on July 29, 2024

In our hypervisor, we are checking that addresses that are sent from the guest do not land in the NULL page (0x0000 - 0x0400). This breaks that assumption and so an assertion is firing.

I think we were under the impression that a pointer would always look and feel like a pointer, and that it would come from a region of memory that was either in .data, .sdata, in the heap, or on the stack. To come across a pointer with the value of 0x01 is odd.

I suppose we can add more checks that if a slice has a length of 0 then we must treat the pointer as garbage. It's just strange to see this behavior change in a minor release of the language.

from rust.

workingjubilee avatar workingjubilee commented on July 29, 2024

There is no requirement I am aware of that a pointer to a 0-length slice must point inside the range of any of the locations you mentioned.

from rust.

workingjubilee avatar workingjubilee commented on July 29, 2024

It would be, for instance, also permissible for us to make these pointers all instead point at 0x8000_0000_0000_0000 (aka isize::MIN). That would not trigger your assert but would not make more sense if you assume that it has to have bytes behind it somewhere.

from rust.

saethlin avatar saethlin commented on July 29, 2024

Also I'm surprised you didn't run into this before, empty Vecs have used these same addresses forever.

fn main() {
    let v: Vec<u8> = Vec::new();
    println!("{:?}", v.as_ptr());
}

prints 0x1 since Rust 1.0.

from rust.

tgross35 avatar tgross35 commented on July 29, 2024

Iirc zero-sized pointers often get their values from the logic of ptr::dangling. Which has been true for heap allocations for a while, maybe only more recently for things that don't go through an allocator? This is usually okay because attempting to access a zero-sized pointer should never happen.

In our hypervisor, we are checking that addresses that are sent from the guest do not land in the NULL page (0x0000 - 0x0400). This breaks that assumption and so an assertion is firing.

What exactly does this check - that anything with a pointer type doesn't have a value < 0x0400, or that accesses don't hit that address? It would seem unusual to somehow check the value rather than the accesses, and I am curious how this is done if this is the case. Having a NULL pointer in existence certainly shouldn't be an issue, but accessing it is obviously a fault.

from rust.

workingjubilee avatar workingjubilee commented on July 29, 2024

About "every cycle is precious", @flaub: It may interest you to know that in C, all of the mem* family of functions must also assume that passing a pointer that does not point to any byte within bounds of an existing object, but with an argument of 0, cannot be made undefined. This is because the pointer may be the "1 past the end" pointer. Obviously a pointer to 0x01 is a problem, but that is why N3261 proposes to simply accept memcpy, etc. on null pointers, and LLVM's mem* intrinsics specifically allow doing so on invalid pointers (like our dangling 0x01 pointers) as well.

Note this implies it would be, e.g., valid for LLVM to replace any pointer that eventually becomes an argument to these intrinsics, but has a len of 0, with another pointer. Perhaps null, or perhaps one with an address of 0x01? LLVM assumes that the system's libc handles this sort of thing correctly. In practice, they do, as mentioned by that paper. So I must ask, is this syscall truly so hot that it is hotter than memcpy?

But that is all of academic interest. There does not seem to be a problem here? This is unfortunately no different than relying on us emitting 10 instructions instead of 9 or 11 for a given function: it's an optimizing compiler, not a macro assembler.

from rust.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.