Git Product home page Git Product logo

Comments (24)

mockersf avatar mockersf commented on July 21, 2024 4

Thank you for the help and the investigation! It now has been fixed in wgpu, waiting for a backport and a release

from rust.

janhohenheim avatar janhohenheim commented on July 21, 2024 3

I'm a bit sleepy, but I'll try cargo-bisect-rustc now and see how far I get
Update 1: Running the following now

$env:WGPU_BACKEND='dx12'; cargo-bisect-rustc --start=2024-04-17 --end=2024-04-18 --prompt --with-src -- run --example alien_cake_addict

Update 2: Can confirm the reported range (fails on nightly-2024-04-18, and works on nightly-2024-04-17) is correct
Update 3: Let's hope it's not a rollup PR, since that won't get us the exact failing commit on a Windows-only bug:

And even further, if the regression is in a rollup PR, then it will bisect the individual PRs within the rollup. This final bisection is only available for x86_64-unknown-linux-gnu since it is using the builds made for the rustc performance tracker.

from rust.

workingjubilee avatar workingjubilee commented on July 21, 2024 2

If that is the offending commit, then it seems very unlikely that this has not unveiled a bug inside wgpu or whatever else is busy talking to DX12.

from rust.

workingjubilee avatar workingjubilee commented on July 21, 2024 1

Can you get a backtrace in WinDbg or whatever's fashionable?

Or can a "minimalized"1 repro case be produced?

Footnotes

  1. where "minimalized" is "slightly less than the entire .exe there", I am aware it is not going to be small ↩

from rust.

lqd avatar lqd commented on July 21, 2024

A few users reported the same error, so it's not just on virtualised hardware. It's also not on all Windows machines

https://github.com/rust-lang/cargo-bisect-rustc would help locating the nightly (and validate it’s nightly-2024-04-18) and in particular the PR where this issue appeared.

from rust.

janhohenheim avatar janhohenheim commented on July 21, 2024

Per another user running the tools, the offending commit is 38104f3
I'm still running the bisect to confirm this.

from rust.

lqd avatar lqd commented on July 21, 2024

cc PR author @Mark-Simulacrum on that bisection

from rust.

janhohenheim avatar janhohenheim commented on July 21, 2024

My bisection is done and I can confirm the result. This is the output:

searched nightlies: from nightly-2024-04-17 to nightly-2024-04-18
regressed nightly: nightly-2024-04-18
searched commit range: 1cec373...becebb3
regressed commit: 38104f3

bisected with cargo-bisect-rustc v0.6.8

Host triple: x86_64-pc-windows-msvc
Reproduce with:

cargo bisect-rustc --end=2024-04-18 --prompt --with-src -- run --example alien_cake_addict

note by me: also requires $env:WGPU_BACKEND='dx12'; to actually get the crash

from rust.

alice-i-cecile avatar alice-i-cecile commented on July 21, 2024

The offending PR: #123936

from rust.

Brezak avatar Brezak commented on July 21, 2024

When looking at WinDbg the issue seems to arise here. wgpu-hal was converting a normal rust str into a c_char ptr and passing that across ffi to a function expecting a null terminated string. Unluckily for us this string was a empty string "". Before #123936 that meant we'd get a pointer to somewhere we can read from. After the PR we get a 0x1 pointer that will cause a segfault when read from.

from rust.

workingjubilee avatar workingjubilee commented on July 21, 2024

Agh, that is the classic error that I predicted in my remarks here.

from rust.

workingjubilee avatar workingjubilee commented on July 21, 2024

This passes, so changing to c"" should be fine.

#[test]
fn cstrs_dont_dangle() {
    let c_str = c"";
    assert_ne!(c_str.as_ptr(), std::ptr::NonNull::dangling().as_ptr());
}

However... @Brezak Are you sure that's the issue? It seems that this string was created from a CString::new()?

https://github.com/gfx-rs/wgpu/blob/d7a35ecda06742acb4cdd5a83c5b9737f375a70b/wgpu-hal/src/dx12/device.rs#L281-L288

Does that CString actually survive the lifetime of that function call?

from rust.

Brezak avatar Brezak commented on July 21, 2024

The debugger said that source_name had the address 0x1.

from rust.

workingjubilee avatar workingjubilee commented on July 21, 2024

Oh, I was squinting at the wrong line it seems.

from rust.

Brezak avatar Brezak commented on July 21, 2024

I'm not at my computer right now but in the code calling said borked function there was a unwrap_or_default creating a that would get passed into the function.

from rust.

workingjubilee avatar workingjubilee commented on July 21, 2024

yep.

from rust.

Brezak avatar Brezak commented on July 21, 2024

For reference. The source of the source_name: https://github.com/gfx-rs/wgpu/blob/d7a35ecda06742acb4cdd5a83c5b9737f375a70b/wgpu-hal/src/dx12/device.rs#L262-L267

from rust.

workingjubilee avatar workingjubilee commented on July 21, 2024

It seems that the reason this isn't hit on the other path is that D3DCompiler is called pretty much directly by wgpu, whereas the hassle_rs bindings of the DxcCompiler convert strings into null terminated wide strings first.

from rust.

RalfJung avatar RalfJung commented on July 21, 2024

So IIUC this line is unsound; if the callee expects a null-terminated string then you can't just pass a str::as_ptr(). compile_fxc should probably take a &ffi::CStr to represent the fact that the string must be null-terminated, or it needs to allocate a CString itself to guarantee null-termination.

When looking at WinDbg the issue seems to arise here. wgpu-hal was converting a normal rust str into a c_char ptr and passing that across ffi to a function expecting a null terminated string. Unluckily for us this string was a empty string "". Before #123936 that meant we'd get a pointer to somewhere we can read from. After the PR we get a 0x1 pointer that will cause a segfault when read from.

So seems like previously you were lucky that the pointer was pointing to a zero byte and so the c string was considered empty -- but that was still UB since you were outside the bounds of that zero-sized allocation.
Now this OOB is detected by hardware and leads to a segfault.

CString can never be backed by a zero-sized allocation, so looks like stage.module.raw_name is created incorrectly if it has pointer value 0x1. Where is that value set? Is it using a safe constructor?
EDIT: I missed the unwrap_or_default, which creates an &"" here. That then later gets passed as a pointer to a function that wants a null-terminated string, which indeed will go Very Wrong.

FWIW, using CString::to_str() and then accessing the trailing 0 is UB under the Stacked Borrows aliasing rules, since the 0 is outside the bounds of the str. To get a pointer to the entire string including the 0, you need to use CStr::as_ptr(). It seems unlikely that that would be the issue though, AFAIK we do not make use of this UB anywhere in the compiler.

from rust.

workingjubilee avatar workingjubilee commented on July 21, 2024

@RalfJung Note that hassle_rs::DxcCompiler::compile is somewhat to "blame" here: its API design requests a &str and then "helpfully" converts this to the requisite string, allocating to do so. Thus the compile_fxc API implemented by wgpu matches the signature that is used by compile_dxc, which partially matches the signature of DxcCompiler::compile.

from rust.

RalfJung avatar RalfJung commented on July 21, 2024

hassle_rs is providing a nice Rusty API and doing so correctly. I don't think they have to take any blame here.

wgpu tried and failed to provide the same nice Rusty API, and used an &str in a way that is not backed by its invariant.

(For anyone reading along, also see the discussion here.)

from rust.

workingjubilee avatar workingjubilee commented on July 21, 2024

It is a Rusty API, it's just slightly perplexing to me because the caller is holding on to a CString anyways, it seems.

from rust.

RalfJung avatar RalfJung commented on July 21, 2024

from rust.

workingjubilee avatar workingjubilee commented on July 21, 2024

Hopefully!

Actually what's more confusing, now that I look closer, is that the underlying compiler API for DXC takes a wide string (thus it does necessitate allocation, it seems...)... and this is the newer one? It seems like that should be the other way around, knowing Microsoft API version history, and the new one should accept a UTF-8 string, and the older one should ask for a wide string...

...oh wait, both APIs are deprecated, actually, and Microsoft is on to IDxcCompiler3::Compile now. Well!

from rust.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    πŸ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. πŸ“ŠπŸ“ˆπŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❀️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.