Comments (10)
I experimented a little more, and I can provoke stack overflows that hit your scenario. I seemingly have hit the __chkstk
in all my previous tests, and now, through minor reordering in the triggering function, I can bypass it and land on a global guard page.
At this point, the stack is exhausted enough that the UnhandledExceptionFilter
is no longer invoked. SetThreadStackGuarantee()
is a fix for that. I will create a stable integration test for an exhausted stack and add docs and the fix to the code.
cc @kahest, this is sigaltstack()
for Windows. I will add related docs when I add the Android docs. I also wonder whether extending the WER module for stack-overflows wouldn't be an interesting scenario since it should be informed about these exceptions, too, independent of whether all threads have called SetThreadStackGuarantee()
or not. It also would make us independent of guesstimating the stack's size (including required docs to let users tune it), given that the clients' crash hooks could considerably increase the required stack size.
@pontusn, you don't need to do any more checks on your side. Thanks again for the report!
from sentry-native.
Any information on when fix for this will be included in SDK?
Not before next week. Other tasks, including precursors for this topic, are still in progress. After these tasks it will be the next thing I will work on.
Just to be clear, we cannot provide a comprehensive fix for this problem because we don't own all threads. For these, it is mostly a case for the documentation.
We will provide a fix for the thread that executes sentry_init()
and, I think, adapt the crashpad WER
module so that we can report crashes like this one even if the in-process crash handler crashes due to stack exhaustion. But this only works for users of the crashpad
backend.
from sentry-native.
Hi @pontusn, thanks for the report. Can you provide more information?
- which crash-backend (crashpad, breakpad, inproc) do you use?
- do you specify any stack-relevant parameters (
/STACK
,/GS
,/Gs
, etc.) in your build? - do you see the same behavior on 64-bit systems?
- do other crashes report correctly?
- do you register either an
on_crash
orbefore_send
hook in which you allocate memory on the stack?
Generate stack overflow by allocating all available stack and then calling into any function.
I am just asking because I can't reproduce this. I regularly test against stack overflows, and Sentry certainly receives stack overflows on Windows.
Can you elaborate on your mitigations? Did any of them fix the failure to report (i.e., do you see EXCEPTION_STACK_OVERFLOW
events in Sentry after applying your mitigations)?
In particular, _resetstkoflw
should not be called by any filter functions, which are the primary handlers in all our backends; it is also unclear how it could help in that situation inside an UnhandledExceptionFilter
.
SetThreadStackGuarantee()
is a function that sets the guarantee per thread and will only provide the requested stack size for the thread in which sentry_init()
would be run. So, this is not a generic solution, but I could imagine adding this to the docs. However, that information would only make sense for us to add after we understand your setup better.
However, I would very much prefer that this situations was handled also with "naive" integration.
Can you explain what you mean by "naive" integrations? What do you expect from the Native SDK?
from sentry-native.
- Crashpad (default)
- /GS
- /STACK was previously missing (ie 1Mb), now explicitly set to 2Mb
- Our application is 32-bit only, hence no information about 64-bit.
- Other crashes are reported, but since this failed I really don't know if other crashes are missing
- No hooks
The crash I was investigating happened inside MSFTEDIT_CLASS when handling WM_COPY for 260000 characters. I suspect some internal handling use _alloca
during conversion between RTF, ANSI and UNICODE to place all formats on clipboard.
We have been using Sentry Native SDK with great results for +3 years. The "naive" integrations is simply calling sentry_init
and start collecting. Now our integration goes much deeper, but the core is the simplicity. I expect that the SDK will continue to help us improve software quality in ways no one in the team initially imagined.
I'll try to investigate the problem deeper, but for some reason these crashes are nor uploaded....
from sentry-native.
I've now confirmed that usage of _alloca
is the trigger.
from sentry-native.
- Crashpad (default)
- /GS
- /STACK was previously missing (ie 1Mb), now explicitly set to 2Mb
- Our application is 32-bit only, hence no information about 64-bit.
- Other crashes are reported, but since this failed I really don't know if other crashes are missing
- No hooks
Thanks! With my question regarding other kinds of crashes, I want to ensure that stack-overflows specifically do not report instead of you experiencing a regression with crash reporting in general. Can I ask you to check a few things?
- Did any of your mentioned mitigations fix the reporting issue (i.e., not only prevent the stack-overflow from happening but allow the Native SDK to report a stack-overflow when it happens)?
- Can you access the Native SDK debug log in the affected application? If so, do you see the log entries
"flushing session and queue before crashpad handler"
and"handing control over to crashpad"
when the stack-overflow happens? Do you see any warnings during the initialization? - Do you see new
*.dmp
files being produced in<database-path>/reports
directory? - Can you provoke another crash, like an access violation in which you write to a NULL pointer or another invalid address, and check whether it reports correctly?
- Can you check that the
crashpad_handler.exe
is running when you trigger the stack-overflow?
We have been using Sentry Native SDK with great results for +3 years. The "naive" integrations is simply calling
sentry_init
and start collecting. Now our integration goes much deeper, but the core is the simplicity. I expect that the SDK will continue to help us improve software quality in ways no one in the team initially imagined.
It's awesome that the Native SDK has given you value over the years. We'll try to keep it that way! None of the mentioned mitigations should be necessary for the Native SDK to detect the crash, so let's find out why it is not reporting the stack overflow.
On the other hand, if you expect that some part of your application requires stack allocations of any considerable size and there is no way to change that to a heap allocation, then I think specifying larger stack commit/reserve values is the right approach, independent of your use of the Native SDK.
from sentry-native.
Any information on when fix for this will be included in SDK?
from sentry-native.
For our product, where the integration with Sentry is in a separate DLL, we hooked thread-level injection via DllMain:
extern "C" BOOL WINAPI DllMain(HINSTANCE hInstance, DWORD dwReason, LPVOID lpReserved)
{
if (dwReason == DLL_PROCESS_ATTACH || dwReason == DLL_THREAD_ATTACH)
{
// Reserve memory for handling "stack overflow"
ULONG ulStackGuarantee = 64 * 1024;
SetThreadStackGuarantee(&ulStackGuarantee);
}
//...
}
I would assume you could use similar pattern for sentry.dll
itself. Possibly guarded by some initialization flag.
These articles provided me with interesting details:
https://peteronprogramming.wordpress.com/2016/08/10/crashes-you-cant-handle-easily-2-stack-overflows-on-windows/
https://devblogs.microsoft.com/oldnewthing/20200610-00/?p=103855
from sentry-native.
Thanks @pontusn. The problem with using DllMain
is twofold:
- it only affects threads started after
sentry.dll
was loaded - it doesn't cover the considerable use of the Native SDK as a static library
But I'll consider adding it, even if only as part of the documentation.
from sentry-native.
Fair limitations, to my understanding there is no alternative method to inject code for thread initialization in Windows.
For most applications I would assume Sentry is among first things to initialize on the start thread, hence all threads should be included for real world use cases.
The static scenario could be handled by embedding a minimal DLL for this purpose that is exported to file and loaded dynamically.
However, as you already mentioned, documentation is probably the key part of addressing this issue
from sentry-native.
Related Issues (20)
- Stream envelopes to file directly HOT 2
- Request: Merge getsentry/crashpad PR #101 to fix llvm-mingw arm64 crashpad builds HOT 3
- Add WINE meta-data to OS context.
- [Epic] #inc-517 Followups HOT 1
- [#inc-517] Add basic crashing test with asan (HW Asan, GWP-ASan)
- install issue HOT 4
- [linux] RISC-V 32/64 support HOT 1
- Android NDK Segfault crash related to `sentry_value_set_by_key_n` HOT 4
- Craspad under Epic's "Easy Anti-Cheat" HOT 6
- Extend envelope API to allow event-specific attachments HOT 3
- sentry_transaction_set_data() - Discarded unknown attribute
- Only set up sigaltstack if there is none (Linux + Android)
- Add support to build for Windows 11 ARM64 HOT 2
- [Android] Support dynamic page size for Android 15+ devices
- Breadcrumb Duplication for Multi-Transaction Applications HOT 5
- crashpad_handler processes persist after app close HOT 10
- Add static+dynamic stack usage measures and alerts in CI
- Introduce CI checks for binaries in our NDK release AAR
- Crashpad submodule reference doesn't point to a valid commit HOT 3
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from sentry-native.