Comments (19)
If you run this after sentry_init
it works for me:
void lockUnhandledExceptionFilter()
{
HMODULE kernel = GetModuleHandle("kernel32.dll");
if (!kernel)
{
printf("[Warning] LockUnhandledExceptionFilter: Failed locating kernel32 module.\n");
return;
}
FARPROC setUnhandledExceptionFilter = GetProcAddress(kernel, "SetUnhandledExceptionFilter");
if (!setUnhandledExceptionFilter)
{
printf("[Warning] LockUnhandledExceptionFilter: Failed locating SetUnhandledExceptionFilter.\n");
return;
}
#ifdef Q_OS_WIN64
static const BYTE newBody[] = {
0x33, 0xC0, // xor eax,eax
0xC3 // ret
};
#else
static const BYTE newBody[] = {
0x33, 0xC0, // xor eax,eax
0xC2, 0x04, 0x00, // ret 4
};
#endif
SIZE_T bytes = 0;
if (WriteProcessMemory(GetCurrentProcess(), (LPVOID)setUnhandledExceptionFilter, (LPCVOID)newBody, sizeof(newBody), &bytes) == 0)
{
printf("[Warning] LockUnhandledExceptionFilter: Failed writing process memory.\n");
}
}
I tried to simplify the other examples I found as much as possible but I'm no Windows programmer so take it with a grain of salt.
The best way to get to the bottom of what it is that clears the filter would be to make a proper hook, and print the callstack in there, but its somewhat involved and I'm not sure what I would do with the info anyway since I guess it theoretically could be multiple things.
To get the WER module you have to build sentry-native with CRASHPAD_WER_ENABLED
from sentry-native.
Is there any updates on this issue?
We must understand that we cannot solve this problem within the Native SDK.
While a solution like the one proposed here: #833 (comment) can work to mitigate the issue, we have to be clear that this kind of solution is the reason why the problem exists in the first place.
We already had support topics that took weeks to resolve. In the end, we discovered that a customer code dependency overwrote SetUnhandledExceptionFilter()
in the same way described above. This means that it is not only wrong inside the Native SDK but also wrong as an "officially" suggested mitigation from Sentry's perspective.
Also, sentry_reinstall_backend()
is not a solution to the problem but rather a way to identify and debug the issue. If it is used as a permanent mitigation, then only as a last resort, for instance, when a closed source dependency with an unresponsive dev team misbehaves.
IMO, the correct approach for Sentry is to extend the "Advanced Usage" documentation and provide users of the Native SDK with steps to identify the issue, isolate potential culprits, and implement solid mitigations. I did this with multiple users in an ad hoc fashion over the last two years, and while all users had different setups and root causes, there were a lot of common steps in between. Another aspect is to collect known culprits and potential mitigations, if any.
Since the focus over the last six to nine months has been primarily on supporting downstream SDKs to improve quality on mobile platforms, these Desktop documentation topics have been put on the back burner. But we will tackle them.
cc: @kahest
from sentry-native.
IMO hooking the UnhandledExceptionFilter
completely is a really big hammer. We might be breaking apps in surprising ways that we can’t predict yet.
There are legitimate uses for temporarily using it and then restoring the original handler afterwards. We don’t want to break those usecases. Although as the examples show, restoring the original handler is often not done correctly by other tools either.
We might want to offer this either as a compile time flag or an explicit runtime API though, wdyt @supervacuus ? Then the responsibility of (not) breaking any unpredictable usecases is up to the app author.
from sentry-native.
Hi @mntt-SSE! I could not look into this on any of my Windows machines yet, but it seems you can access both types of machines to verify behavior and configuration.
Can you look in the registry to see whether the machines differ in the settings for HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\Windows Error Reporting\LocalDumps
? I could imagine some unexpected interaction with the crashpad WER module.
from sentry-native.
Thanks for your answer! On a machine where the problem exists, the LocalDumps entry looks like this:
And each entry has a DumpFolder
key with the data %PROGRAMDATA%\NVIDIA Corporation\CrashDumps
. All of them are the same, and none seem to point to the application I'm using, and the DumpFolder
is also different from the one where I get the crash dumps (although that folder also exists, but is empty).
On a machine where Sentry works as expected, no LocalDumps
entry exists at all under Windows Error Reporting
. I verified with 2 machines that don't work and 1 that does work, and they fit the pattern.
from sentry-native.
Thx for verifying, @mntt-SSE; that is what I expected. LocalDumps
bypasses application-local handling in favor of a centralized dump approach. It should be possible to deactivate it for your application by creating a new key (similar to the other apps below the LocalDumps
key), but I haven't tested it yet. Here are the docs for the key: https://learn.microsoft.com/en-us/windows/win32/wer/collecting-user-mode-dumps
from sentry-native.
And each entry has a
DumpFolder
key with the data%PROGRAMDATA%\NVIDIA Corporation\CrashDumps
. All of them are the same, and none seem to point to the application I'm using, and theDumpFolder
is also different from the one where I get the crash dumps (although that folder also exists, but is empty).
Maybe I should also say, that what is interesting are the values in LocalDumps
, not the sub-keys which probably just provide a counter config because those affect all applications not listed below.
from sentry-native.
For further root-causing, it would certainly help if you could also gather the debug output of the Native SDK on the systems where the dump happens in %LOCALAPPDATA%\CrashDumps
; maybe something obvious is already visible there.
from sentry-native.
The LocalDumps
key itself does not contain any values other than (Default), which has no data.
I have so far tried to:
- Create a key under
LocalDumps
with the name of the application, and set itsDumpType
value to 0. - Create a key under
Windows Error Reporting
calledExcludedApplications
, and add a DWORD value with the name of the application as the key, and 1 as the value. - Under the
ExcludedApplications
key, create another key with the name of the application. - Call the Windows API function
WerAddExcludedApplication
. This did the same as 2.
I tried restarting the computer after each step, but nothing worked :/ I also checked the corresponding keys under HKEY_CURRENT_USER
, but there is nothing there.
I tried attaching WinDbg to crashpad_handler.exe but I don't seem to get any logging from it in the Logs widget. Do I have to build it with a certain configuration or flag? I should also mention that I haven't used WinDbg before so I might be missing something.
from sentry-native.
I tried attaching WinDbg to crashpad_handler.exe but I don't seem to get any logging from it in the Logs widget. Do I have to build it with a certain configuration or flag? I should also mention that I haven't used WinDbg before so I might be missing something.
What I meant was that even though you have compared the two logs, which were identical on both types of machines, it would be great to see them.
I am asking because the only other reason I can imagine this happening is if our backend didn't start correctly (and for some reason, sentry_init()
was still successful). Can you check if sentry_init()
returns successfully (i.e., doesn't return 1
) on those machines? From what you described, I don't think the crashpad_handler
is involved in the crash process. It looks like your crash escapes our local exception filter (or rather: that ours wasn't installed and instead the OS handles the crash).
The LocalDumps key itself does not contain any values other than (Default), which has no data.
That means that it is configured to use the default values.
from sentry-native.
What I meant was that even though you have compared the two logs, which were identical on both types of machines, it would be great to see them.
Ah, sorry, here is what's being printed on startup from a computer that doesn't work (which is the same as when it works):
[2023-04-24 16:30:29] Info: Sentry: "received response:
HTTP/1.1 200 OK
Connection: keep-alive
Date: Mon, 24 Apr 2023 14:30:28 GMT
Content-Length: 2
Content-Type: application/json
Server: nginx
Vary: origin,access-control-request-method,access-control-request-headers
access-control-allow-origin: *
access-control-expose-headers: x-sentry-error,x-sentry-rate-limits,retry-after
x-envoy-upstream-service-time: 0
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
" (level=-1)
[2023-04-24 16:30:29] Info: Sentry: "request handled in 520m" (level=-1)
[2023-04-24 16:30:29] Info: Sentry: "executing task on worker threa" (level=-1)
[2023-04-24 16:30:29] Info: Sentry: "sending request using winhttp to <address>:
x-sentry-auth:<auth> sentry_key=<key>, sentry_version=7, sentry_client=sentry.native/0.5.3
content-type:application/x-sentry-envelope
content-length:17594 " (level=-1)
[2023-04-24 16:30:29] Info: Sentry: "received response:
HTTP/1.1 200 OK
Connection: keep-alive
Date: Mon, 24 Apr 2023 14:30:28 GMT
Content-Length: 41
Content-Type: application/json
Server: nginx
Vary: origin,access-control-request-method,access-control-request-headers
access-control-allow-origin: *
access-control-expose-headers: x-sentry-error,x-sentry-rate-limits,retry-after
x-envoy-upstream-service-time: 0
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
" (level=-1)
[2023-04-24 16:30:29] Info: Sentry: "request handled in 243m" (level=-1)
I am asking because the only other reason I can imagine this happening is if our backend didn't start correctly (and for some reason, sentry_init() was still successful). Can you check if sentry_init() returns successfully (i.e., doesn't return 1) on those machines? From what you described, I don't think the crashpad_handler is involved in the crash process. It looks like your crash escapes our local exception filter (or rather: that ours wasn't installed and instead the OS handles the crash).
sentry_init()
returns 0
on all machines I've tested it on. And I can also see that the crashpad_handler.exe process is starting when the application is started, and quits when the application crashes (or when it exits nicely).
from sentry-native.
Ah, sorry, here is what's being printed on startup from a computer that doesn't work (which is the same as when it works):
Okay, the logs are wrapped, interleaved, and incomplete. They also show some other transmission (maybe a manual event) but not the crash (because with crashpad
, you would not see a transmission from within our worker_thread
).
Maybe it is easier to debug this issue with the plain sentry_example.exe
. In any case, I would expect a program configured as you described it in your initial description to print all these things - at some point in its execution - until it crashes:
[sentry] INFO using database path "your_database_path"
[sentry] DEBUG starting transport
[sentry] DEBUG starting background worker thread
[sentry] DEBUG starting backend
[sentry] DEBUG background worker thread started
[sentry] DEBUG starting crashpad backend with handler "your_handler_path"
[sentry] DEBUG using minidump url "your_dsn_url"
[sentry] DEBUG registering crashpad WER handler "your_wer_module_dll"
[sentry] INFO started crashpad client handler
[sentry] DEBUG processing and pruning old runs
[sentry] INFO flushing session and queue before crashpad handler
[sentry] INFO handing control over to crashpad
Since you sentry_init()
successfully (at least according to the return value), I am especially interested if you see the last two lines. If those never appear, then the client-side crash-handler is never invoked, which means that the crashpad_handler
will not be triggered (even if it is running).
You can also verify this externally by deleting your database path, executing your program, provoking a crash, and looking into (the newly created) path; there must be a last_crash
marker file in there.
from sentry-native.
I have been plagued by the same issue, that some machines just refuse to send any reports. After reading this discussion and talking to @supervacuus on discord I spent a day trying a bunch of things:
To start by answering the last question, if the 2 last lines of the sentry output: Those lines does appear on working machines and does not on the machines that doesn't work. This indicates that the UnhandledExceptionFilter
is not run, so I started trying to figure out why.
The first thing that proved that it was not a Windows issue (as a lot of blogs and discussions around the web indicates it could be) was that it worked to do sentry_reinstall_backend()
just before triggering the crash.
So something in the startup process overwrites our exception handler, and I'm pretty sure it is not any of our dependencies since it should behave the same for all machines in that case, and I couldn't find any reference to SetUnhandledExceptionFilter
. Not sure what else it could be, some driver or something? I tried inspecting loaded dlls but didn't see anything significant, but I assume it could be something that gets loaded and then released during startup.
I tried the sentry_example.exe
which seems to work fine on all machines, so I would assume it has to do with it being a gui application.
I finally tried replacing the SetUnhandledExceptionFilter
with a simple return by overwriting the function address and it works perfectly, but it feels very sad. Searching a bit on github though this seems to be the way people do it. Maybe it is something sentry-native should do to increase its reliability? I assume if you use sentry-native you want it to have full control of these things. But at the same time it is a bit harsh...
from sentry-native.
Thank you @Cyriuz and @supervacuus for looking into this!
I have also done some more investigation. Firstly, the reason I didn't get the full Sentry logs was because I initialized Sentry before the other log handling. When I switched the order I got output identical to yours @supervacuus, except for the registering crashpad WER handler "your_wer_module_dll"
, where it says no WER handler is registered.
Secondly, I have found a workaround. I'm building a Qt application (Qt 5.15.8), and I noticed that if I initialize Sentry after the first window is shown (using QWidget::show()
), then everything works as expected, which strengthen your theory about it haviing to do with UI @Cyriuz . So now I'm flashing a tiny window before the main one, which is far from ideal, but at least I'm getting the events. Would still be good to have another solution though.
from sentry-native.
That is some very good feedback, thanks for the investigation!
We have indeed heard reports before that Qt / display drivers are constantly overwriting the UnhandledExceptionFilter
, similar to managed language runtimes.
While it would be possible to hook / lock overwriting it, we decided against that, as it is a very deep "incision" that we don’t want to do by default.
from sentry-native.
Fair enough. Although I wonder if it can still be considered somewhat standard as so many other projects seem to do the same? And considering that sentry-native is, in my understanding at least, supposed to take ownership of the entire crash reporting pipeline I feel like there might be an argument for including it, I guess it could even be optional? Or maybe it could just be a good topic for a page on the sentry docs as a workaround so people can find it easier? I assume it potentially affects most people that use sentry native on Windows.
from sentry-native.
Just going to throw in my 2 cents here. Had a very similar issue with regards to QT (5.15.8), the moment it was initialized the entire crash reporting seized to function. There we're no registry keys, or misplaced dumps, in fact there was absolutely nothing.
A quick test basically confirmed what I suspected, I would throw and exception right after sentry initialized, it worked. Did the same right after calling QApplication a(argc, argv);
nope back to being broken.
Weirdly enough it worked on some Windows machines just fine (e.g: server, ...) but on others basically nothing worked. No dumps, no logs (only the one indicating that sentry had initialized), no indication that something went wrong whatsoever.
from sentry-native.
This issue is also affecting a user of the Unreal SDK v0.16.0. It happens with both auto and manual initialisation:
FConfigureSettingsDelegate SettingsDelegate;
SettingsDelegate.BindDynamic(this, &USentryCrashReporterIntegration::HandleSettingsDelegate);
USentrySubsystem* SentrySubsystem = GEngine-<GetEngineSubsystem>USentrySubsystem<();
SentrySubsystem-<InitializeWithSettings(SettingsDelegate);
Is there any updates on this issue?
from sentry-native.
While I respect your decision and stance on the issue, mostly for discussions sake, I would argue that because of the nature of this library, its only purpose being catching crashes, the argument that the "solution is the reason why the problem exists in the first place" is not really helpful. For sure it is not good that many dependencies and drivers use this api wrong/hooks it, but that doesn't change the fact that this is the situation we're in.
We want to take ownership of the exception handling but can't possibly do it any other way. If it was just dependencies it would be one thing, it is easy enough to debug who is resetting the hook and we can complain upstream, but for drivers that does it on customer PCs, its just not feasible.
Is it really that big of a deal to say that we don't want anyone else to be able to handle exceptions, given the scope of this lib? My proposed fix doesn't involve using the apis wrong, just making sure no one else is allowed. I agree that it feels wrong in the context of doing it in a library, and maybe it is also wrong to do it here, but I see no other option and I think it would help a lot of users of this lib to save a lot of time debugging things.
In any case docs covering all of this that is easy to find would be great so you can easily find and make your own decisions :)
from sentry-native.
Related Issues (20)
- Stream envelopes to file directly HOT 2
- Request: Merge getsentry/crashpad PR #101 to fix llvm-mingw arm64 crashpad builds HOT 3
- Add WINE meta-data to OS context.
- Breadcrumbs should use more efficient ringbuffer
- [#inc-517] Add basic crashing test with asan (HW Asan, GWP-ASan)
- install issue HOT 4
- [linux] RISC-V 32/64 support HOT 1
- Android NDK Segfault crash related to `sentry_value_set_by_key_n` HOT 5
- Craspad under Epic's "Easy Anti-Cheat" HOT 6
- Stack overflow not sent to backend HOT 10
- Extend envelope API to allow event-specific attachments HOT 3
- sentry_transaction_set_data() - Discarded unknown attribute
- Only set up sigaltstack if there is none (Linux + Android)
- Add support to build for Windows 11 ARM64 HOT 2
- [Android] Support dynamic page size for Android 15+ devices
- Breadcrumb Duplication for Multi-Transaction Applications HOT 5
- crashpad_handler processes persist after app close HOT 10
- Add static+dynamic stack usage measures and alerts in CI
- Introduce CI checks for binaries in our NDK release AAR
- Crashpad submodule reference doesn't point to a valid commit HOT 3
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from sentry-native.