Comments (12)

bitsandfoxes commented on June 16, 2024

Yes, sentry-dotnet initializes sentry-native when targeting .NET 8 or newer and running as Native AOT. You can opt out of this behavior by setting options.InitNativeSdks = false;.

from sentry-native.

kahest commented on June 16, 2024

Hi @maxp1256, thanks for writing in, we'll need to investigate this. It's curious that it would work on one machine but not on the other - did you see any other differences in behavior between the two machines? Also, do you use sentry.io and would you be able to provide a link to one of the EXCEPTION_ACCESS_VIOLATION_READ events you got? Either here or you can send it to [email protected]

maxp1256 commented on June 16, 2024

Hello @kahest!
I'm using on-prem so I can't provide a link to the event. It seems that sentry isn't working for any kind of native exception on this machine. I've already compared registry (Windows Error Reporting, ...) but didn't find any differences. Both machines are in the same domain and have the same GPOs so I don't expect any relevant differences.

Is it possible to enable any more detailed logging for the crashpad registration in WER or is there anything (registry key, pipes, logs) I can check?

supervacuus commented on June 16, 2024

Hi @maxp1256.

Would you happen to have access to both machines so that you can look at the respective logs?

The first step should be determining whether the UnhandledExceptionFilter triggers on both machines. This can easily be checked when enabling debug logs in the Native SDK. When you trigger your crash, you should see the following log lines:

flushing session and queue before crashpad handler
handing control over to crashpad

If those are missing, we know the UnhandledExceptionFilter isn't firing. In that case, it would also be helpful if you could run the example application on both machines to see whether the issue is environmental (related to some config on the computer) or related to unexpected execution paths in your application.

> Is it possible to enable any more detailed logging for the crashpad registration in WER or is there anything (registry key, pipes, logs) I can check?

If you build the Native SDK as a Debug build, crashpad will add additional logs. Those can sometimes be helpful, but mostly when initialization fails (sentry_init() returns 1).

What makes you think that the WER module is involved in crash handling? Do you also trigger a fast-fail crash? I am asking because, from your description, an SEH exception has been raised, so the WER registration should be irrelevant.

maxp1256 commented on June 16, 2024

Hi @supervacuus
Yes, I have direct access to both (test) machines and can test/debug everything!

I've enabled the sentry debug log (sentry_options_set_debug) and can verify that sentry_init returns 0 on both machines.

I can see the UnhandledExceptionFilter log lines on the first machine (where it's working as expected):

Line 765: 11.03.2024 11:09:47.764/4624/7476 Info: [sentry] INFO flushing session and queue before crashpad handler
Line 768: 11.03.2024 11:09:47.765/4624/7476 Info: [sentry] INFO handing control over to crashpad

But those lines are missing on the 2nd machine! Do I need a debug build of sentry for further diagnosis, or isn't it helpful in this case?

I've seen that the WER module is not involved in this crash dump, but I've found out that the WER path is working on both machines. So it seems that the "only" problem is the unhandled-exception-filter path with crashpad.

supervacuus commented on June 16, 2024

> But those lines are missing on the 2nd machine! Do I need a debug build of sentry for further diagnosis, or isn't it helpful in this case?

This is very helpful feedback. No, a debug build won't help in this case, at least not as an immediate next step.

If the UnhandledExceptionFilter isn't running, that typically means that some component is overwriting the UnhandledExceptionFilter our backend has installed. At this point, I cannot tell why this differs between the two machines, but this is typically caused from inside the process (and rarely from the outside).

Please run the sentry_example on your failing machine using:

.\sentry_example.exe log crash

Before that, set your SENTRY_DSN in the environment and make sure that crashpad_handler.exe, sentry.dll, and crashpad_wer.dll (optional) are located next to the executable. Then we will know whether this affects executables in general on that machine or whether it's a behavior only your application shows on that machine.

If the sentry_example works fine, the next step is determining where the overwrite happens in your code. You can either set up a debug session (if you have a debugger available on these machines) that lets the debugger break on SetUnhandledExceptionFilter or move the sentry_init() closer to the crash location. The latter can also be accomplished by a call to sentry_reinstall_backend(), which only reruns the backend initialization so that you don't have to move your configuration with sentry_init().

As a first step, you can place sentry_reinstall_backend() right next to your crash cause, and then you would typically bisect back to sentry_init().
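
To make the bisecting step concrete, here is a minimal sketch in C. It assumes the DSN comes from the SENTRY_DSN environment variable (as in the example run above) and that run_suspect_code() is a hypothetical placeholder for whatever crashes in your application. The stub block only exists so the sketch compiles where sentry.h is not on the include path; the stub names and signatures mirror sentry.h, but the stubs do nothing.

```c
#include <assert.h>
#include <stdio.h>

#if defined(__has_include)
#  if __has_include(<sentry.h>)
#    include <sentry.h>
#    define HAVE_SENTRY 1
#  endif
#endif

#ifndef HAVE_SENTRY
/* Do-nothing stand-ins (assumption: only used when the SDK header is
 * unavailable) so that the sketch still compiles. */
typedef struct sentry_options_s { int debug; } sentry_options_t;
static sentry_options_t g_stub_options;
static sentry_options_t *sentry_options_new(void) { return &g_stub_options; }
static void sentry_options_set_debug(sentry_options_t *o, int debug) { o->debug = debug; }
static int sentry_init(sentry_options_t *o) { (void)o; return 0; }
static int sentry_reinstall_backend(void) { return 0; }
static int sentry_close(void) { return 0; }
#endif

/* Hypothetical placeholder for the code path that crashes. */
static void run_suspect_code(void)
{
    puts("suspect code path runs here");
}

/* One bisecting step: initialize early, reinstall the backend right
 * before the suspect code. Returns 0 if init and reinstall succeeded. */
static int bisect_step(void)
{
    sentry_options_t *options = sentry_options_new();
    assert(options != NULL);
    sentry_options_set_debug(options, 1); /* enables the SDK debug log */
    if (sentry_init(options) != 0) {
        return 1;
    }

    /* ... application startup that may overwrite the filter ... */

    int rc = sentry_reinstall_backend(); /* move this call earlier step by step */
    run_suspect_code();
    sentry_close();
    return rc;
}
```

If the crash is reported with the call in this position, move sentry_reinstall_backend() earlier in the program flow until reporting stops; the overwrite happens between those two positions.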

maxp1256 commented on June 16, 2024

Hello @supervacuus
I've done further diagnostics on this issue. The sentry_example works without a problem on this machine, but we still have issues with our app. Our app consists of several applications; some are based on C/C++, but there are also some newer parts in C#/.NET. We still have some older native components that load newer .NET modules, and we have newer .NET-based components that still need to load older C/C++ components at runtime. Nobody is perfect, so when we integrated sentry, we also needed to include the sentry .NET module to cover "new" exceptions in the newer .NET parts. Is it possible that the .NET part has side effects on the native part? Are there best practices for which one should be loaded first/last, or documents about side effects?

Regarding your tips: I've tried calling sentry_reinstall_backend before the simulated native crash and it works, so it seems that the handler is indeed overwritten, but I don't know why or when. When I call sentry_reinstall_backend 1 minute after program start (when C/C++ and .NET are fully initialized), it's still not guaranteed to work, but sometimes it does.

I remember that I had problems with exceptions in the .NET part (2 years ago with sentry 0.2.x), and I solved them by pre-registering an ExceptionHandler returning EXCEPTION_CONTINUE_SEARCH unless it's completely initialized. But it seems that the workaround was lost when we upgraded to 0.6.5. I also have to investigate this.

So I guess that there's a race-condition and the native part is registered on this machine before the .NET part and the .NET part maybe is not calling the previous delegate (function return of SetUnhandledExceptionFilter) correctly which ends up with (classic) windows watson.
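
For readers unfamiliar with the "previous delegate" idea: SetUnhandledExceptionFilter returns the filter that was installed before, and a well-behaved later filter can forward to it. The following is a minimal sketch of that pattern, not what the .NET runtime or sentry actually does. The names earlier_filter, chaining_filter and install_chaining_filter are hypothetical, and the non-Windows branch contains stand-ins that only mimic the single-slot contract so the sketch builds elsewhere.

```c
#include <assert.h>
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>
#else
/* Stand-ins (assumption) mimicking the Windows single-slot contract. */
typedef long LONG;
typedef struct EXCEPTION_POINTERS_ EXCEPTION_POINTERS;
#define WINAPI
#define EXCEPTION_CONTINUE_SEARCH 0
typedef LONG (*LPTOP_LEVEL_EXCEPTION_FILTER)(EXCEPTION_POINTERS *);
static LPTOP_LEVEL_EXCEPTION_FILTER g_slot;
static LPTOP_LEVEL_EXCEPTION_FILTER
SetUnhandledExceptionFilter(LPTOP_LEVEL_EXCEPTION_FILTER filter)
{
    LPTOP_LEVEL_EXCEPTION_FILTER previous = g_slot;
    g_slot = filter;
    return previous;
}
#endif

static LPTOP_LEVEL_EXCEPTION_FILTER g_previous_filter;
static int g_earlier_filter_calls;

/* Hypothetical earlier filter, e.g. one installed by a crash reporter. */
static LONG WINAPI earlier_filter(EXCEPTION_POINTERS *info)
{
    (void)info;
    g_earlier_filter_calls++;
    return EXCEPTION_CONTINUE_SEARCH;
}

/* A later filter that forwards to whatever was installed before it. */
static LONG WINAPI chaining_filter(EXCEPTION_POINTERS *info)
{
    /* Component-specific handling would go here. */
    if (g_previous_filter != NULL) {
        return g_previous_filter(info);
    }
    return EXCEPTION_CONTINUE_SEARCH;
}

static void install_chaining_filter(void)
{
    g_previous_filter = SetUnhandledExceptionFilter(chaining_filter);
    /* Installing twice would make the filter forward to itself forever. */
    assert(g_previous_filter != chaining_filter);
}
```

A later component that skips the forwarding step silently cuts every earlier filter out of the chain, which would match the symptom described here.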

When I read the docs for sentry_reinstall_backend it says that it's dangerous and should not be done without reason. I'm asking: Is it really dangerous and should I avoid it for a production workaround?

kahest commented on June 16, 2024

Hey @maxp1256 thanks for reporting back - we'll get back to you in a bit

supervacuus commented on June 16, 2024

> Is it possible that the .NET part has side effects on the native part?

That is a good question. AFAIK sentry-dotnet only hooks into the UEF (with sentry-native as the acting dependency) with its NativeAOT support. So, if you use sentry-dotnet in a NativeAOT configuration, another sentry-native may be packaged and initialized with your application. In this case, it would be sensible to turn off native crash reporting in sentry-dotnet and use sentry-native solely from your native libraries.

@bitsandfoxes, is this close to the truth?

If you are not using the NativeAOT feature in sentry-dotnet, I cannot imagine that a UEF would be overwritten from that SDK. Every Windows executable (including .NET executables) will start with a default UEF from the Windows CRT, and CoreCLR executables overwrite these during startup again:

https://github.com/dotnet/runtime/blob/589beb0b2af3edd1b272f8ac21ca5e3fb142bdd0/src/coreclr/vm/ceemain.cpp#L833-L841

So, depending on how your application is structured, loading the .NET execution engine after initializing sentry-native will also overwrite the UEF.

> Are there best practices for which one should be loaded first/last, or documents about side effects?

Is this question meant in the context of SDKs or UEF in general? If it is the former, there is no more to know than what I mentioned above (although I would be happy if @bitsandfoxes could verify). The dotnet SDK primarily focuses on uncaught exceptions from managed code, not native crashes. If it is the latter, then only one UEF can be installed at any given moment in a Windows Application—the last one wins. It would be best to reinstall the sentry backend at the latest point that still allows you to catch your crashes.
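
The "last one wins" rule can be checked from code. Below is a minimal sketch, assuming a hypothetical probe helper is_filter_installed() and two hypothetical filters; none of these are part of sentry-native. On Windows it uses the real SetUnhandledExceptionFilter, and elsewhere a stand-in that only mimics the single-slot behavior so the sketch still builds.

```c
#include <assert.h>
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>
#else
/* Stand-ins (assumption) mimicking the Windows single-slot contract. */
typedef long LONG;
typedef struct EXCEPTION_POINTERS_ EXCEPTION_POINTERS;
#define WINAPI
#define EXCEPTION_CONTINUE_SEARCH 0
typedef LONG (*LPTOP_LEVEL_EXCEPTION_FILTER)(EXCEPTION_POINTERS *);
static LPTOP_LEVEL_EXCEPTION_FILTER g_slot;
static LPTOP_LEVEL_EXCEPTION_FILTER
SetUnhandledExceptionFilter(LPTOP_LEVEL_EXCEPTION_FILTER filter)
{
    LPTOP_LEVEL_EXCEPTION_FILTER previous = g_slot;
    g_slot = filter;
    return previous;
}
#endif

static int g_reporter_calls;
static int g_late_component_calls;

/* Hypothetical filter standing in for the one a crash reporter installs. */
static LONG WINAPI crash_reporter_filter(EXCEPTION_POINTERS *info)
{
    (void)info;
    g_reporter_calls++;
    return EXCEPTION_CONTINUE_SEARCH;
}

/* Hypothetical filter standing in for a component that loads later. */
static LONG WINAPI late_component_filter(EXCEPTION_POINTERS *info)
{
    (void)info;
    g_late_component_calls++;
    return EXCEPTION_CONTINUE_SEARCH;
}

/* Returns 1 if `expected` is the currently installed filter.
 * The probe briefly swaps the filter, so it is not safe while other
 * threads may crash or install filters of their own. */
static int is_filter_installed(LPTOP_LEVEL_EXCEPTION_FILTER expected)
{
    assert(expected != NULL);
    LPTOP_LEVEL_EXCEPTION_FILTER current = SetUnhandledExceptionFilter(expected);
    SetUnhandledExceptionFilter(current);
    return current == expected;
}
```

Calling such a probe at a few points during startup is a cheap way to narrow down when the overwrite happens without attaching a debugger, although it cannot tell you who did it.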

> I remember that I had problems with exceptions in the .NET part (2 years ago with sentry 0.2.x), and I solved them by pre-registering an ExceptionHandler returning EXCEPTION_CONTINUE_SEARCH unless it's completely initialized. But it seems that the workaround was lost when we upgraded to 0.6.5. I also have to investigate this.

It would be interesting to understand what this means exactly. How do you pre-register? What is the sequence? What problem did it solve?

> So I guess that there's a race-condition and the native part is registered on this machine before the .NET part and the .NET

I think there are two parts to this issue:

  • the fact that the overwrite only seems to happen depending on some machine-state
  • finding the overwrite

The latter will typically lead to the former. I can only recommend that you use a native debugger (WinDbg or the C/C++ debugger in Visual Studio) and set a breakpoint on the calls to SetUnhandledExceptionFilter:

  • in WinDbg, you will typically have to set it on the symbol stub: bp KERNEL32!SetUnhandledExceptionFilterStub. If you cannot find that symbol, look it up using x KERNEL32!SetUnhandledExceptionFilter*
  • in Visual Studio, you can set the breakpoint via Debug > New Breakpoint > New Function Breakpoint... and specify the function as {,,KERNEL32.DLL}SetUnhandledExceptionFilter (no need to consider symbol stubs).

In both, you can look at the stack trace to find the path to the overwrite (as mentioned above, there will be multiple breaks). Of course, it is still possible that this overwrite happens somewhere in a closed-source dependency, but at least then, you can isolate the issue.

> When I read the docs for sentry_reinstall_backend it says that it's dangerous and should not be done without reason. I'm asking: Is it really dangerous and should I avoid it for a production workaround?

While I haven't written that line, I could imagine it meant not to just move around backend reinstallation until it works. You have to understand which components need the UnhandledExceptionFilter and why. In some scenarios, a short-lived UEF is necessary for a component to work correctly, and our global handler should be installed after these.

In that sense, sentry_reinstall_backend() is meant as a debugging tool to narrow down culprits. It may even be used in a production scenario when you need to overwrite a UEF whose source you have clearly identified and have no other option to disable or install at a different point in your program flow.

getsantry commented on June 16, 2024

This issue has gone three weeks without activity. In another week, I will close it.

But! If you comment or otherwise update it, I will reset the clock, and if you remove the label Waiting for: Community, I will leave it alone ... forever!


"A weed is but an unloved flower." ― Ella Wheeler Wilcox 🥀

maxp1256 commented on June 16, 2024

Hello,
sorry for my late response, but in the meantime there were some other critical problems in another part of our project, so I had to postpone this one.

Thanks to @supervacuus, who linked #644, which directed me to #706. This is very interesting, because both of my test machines use an Intel GPU, and indeed the problematic machine uses an older version of the Intel HD driver. This could be a very good explanation for this problem.
For reference:
Sentry is working fine with Intel HD driver 31.0.101.5186
Sentry is NOT working with Intel HD driver 31.0.101.2111

@bitsandfoxes Currently we're targeting .NET 4.6.x and 4.8.x. Correct me if I'm wrong, but then we still have to init both parts separately?

@supervacuus: Regarding the workaround: if I remember correctly, it was needed to ensure that the native handler doesn't capture the (.NET) exception before the .NET handler is installed, because otherwise .NET exceptions were captured as native exceptions with a minidump but without a .NET stack trace.

supervacuus commented on June 16, 2024

Thanks for the update, @maxp1256!

> This is very interesting, because both of my test machines use an Intel GPU, and indeed the problematic machine uses an older version of the Intel HD driver. This could be a very good explanation for this problem.
> For reference:
> Sentry is working fine with Intel HD driver 31.0.101.5186
> Sentry is NOT working with Intel HD driver 31.0.101.2111

To be clear, have you tested that updating the driver on the affected machine fixes the issue?

It would be interesting if it is the Intel driver that somehow installs a UEF (or disables registering one) inside the application process or if there is a dependency in your application that acts weirdly because of that Intel driver.

> @supervacuus: Regarding the workaround: if I remember correctly, it was needed to ensure that the native handler doesn't capture the (.NET) exception before the .NET handler is installed, because otherwise .NET exceptions were captured as native exceptions with a minidump but without a .NET stack trace.

Okay, that makes sense. I can imagine that the .NET runtime will raise an error if a managed-code exception isn't handled, and that native crash handling will then detect that error. But if both are handled, there should be no interference between the two.
