Git Product home page Git Product logo

Comments (10)

supervacuus avatar supervacuus commented on June 20, 2024 2

I experimented a little more, and I can provoke stack overflows that hit your scenario. I seemingly have hit the __chkstk in all my previous tests, and now, through minor reordering in the triggering function, I can bypass it and land on a global guard page.

At this point, the stack is exhausted enough that the UnhandledExceptionFilter is no longer invoked. SetThreadStackGuarantee() is a fix for that. I will create a stable integration test for an exhausted stack and add docs and the fix to the code.

cc @kahest, this is sigaltstack() for Windows. I will add related docs when I add the Android docs. I also wonder whether extending the WER module for stack-overflows wouldn't be an interesting scenario since it should be informed about these exceptions, too, independent of whether all threads have called SetThreadStackGuarantee() or not. It also would make us independent of guesstimating the stack's size (including required docs to let users tune it), given that the clients' crash hooks could considerably increase the required stack size.

@pontusn, you don't need to do any more checks on your side. Thanks again for the report!

from sentry-native.

supervacuus avatar supervacuus commented on June 20, 2024 1

Any information on when fix for this will be included in SDK?

Not before next week. Other tasks, including precursors for this topic, are still in progress. After these tasks it will be the next thing I will work on.

Just to be clear, we cannot provide a comprehensive fix for this problem because we don't own all threads. For these, it is mostly a case for the documentation.

We will provide a fix for the thread that executes sentry_init() and, I think, adapt the crashpad WER module so that we can report crashes like this one even if the in-process crash handler crashes due to stack exhaustion. But this only works for users of the crashpad backend.

from sentry-native.

supervacuus avatar supervacuus commented on June 20, 2024

Hi @pontusn, thanks for the report. Can you provide more information?

  • which crash-backend (crashpad, breakpad, inproc) do you use?
  • do you specify any stack-relevant parameters (/STACK, /GS, /Gs, etc.) in your build?
  • do you see the same behavior on 64-bit systems?
  • do other crashes report correctly?
  • do you register either an on_crash or before_send hook in which you allocate memory on the stack?

Generate stack overflow by allocating all available stack and then calling into any function.

I am just asking because I can't reproduce this. I regularly test against stack overflows, and Sentry certainly receives stack overflows on Windows.

Can you elaborate on your mitigations? Did any of them fix the failure to report (i.e., do you see EXCEPTION_STACK_OVERFLOW events in Sentry after applying your mitigations)?

In particular, _resetstkoflw should not be called by any filter functions, which are the primary handlers in all our backends; it is also unclear how it could help in that situation inside an UnhandledExceptionFilter.

SetThreadStackGuarantee() is a function that sets the guarantee per thread and will only provide the requested stack size for the thread in which sentry_init() would be run. So, this is not a generic solution, but I could imagine adding this to the docs. However, that information would only make sense for us to add after we understand your setup better.

However, I would very much prefer that this situations was handled also with "naive" integration.

Can you explain what you mean by "naive" integrations? What do you expect from the Native SDK?

from sentry-native.

pontusn avatar pontusn commented on June 20, 2024
  • Crashpad (default)
  • /GS
  • /STACK was previously missing (ie 1Mb), now explicitly set to 2Mb
  • Our application is 32-bit only, hence no information about 64-bit.
  • Other crashes are reported, but since this failed I really don't know if other crashes are missing
  • No hooks

The crash I was investigating happened inside MSFTEDIT_CLASS when handling WM_COPY for 260000 characters. I suspect some internal handling use _alloca during conversion between RTF, ANSI and UNICODE to place all formats on clipboard.

We have been using Sentry Native SDK with great results for +3 years. The "naive" integrations is simply calling sentry_init and start collecting. Now our integration goes much deeper, but the core is the simplicity. I expect that the SDK will continue to help us improve software quality in ways no one in the team initially imagined.

I'll try to investigate the problem deeper, but for some reason these crashes are nor uploaded....

from sentry-native.

pontusn avatar pontusn commented on June 20, 2024

I've now confirmed that usage of _alloca is the trigger.

from sentry-native.

supervacuus avatar supervacuus commented on June 20, 2024
  • Crashpad (default)
  • /GS
  • /STACK was previously missing (ie 1Mb), now explicitly set to 2Mb
  • Our application is 32-bit only, hence no information about 64-bit.
  • Other crashes are reported, but since this failed I really don't know if other crashes are missing
  • No hooks

Thanks! With my question regarding other kinds of crashes, I want to ensure that stack-overflows specifically do not report instead of you experiencing a regression with crash reporting in general. Can I ask you to check a few things?

  • Did any of your mentioned mitigations fix the reporting issue (i.e., not only prevent the stack-overflow from happening but allow the Native SDK to report a stack-overflow when it happens)?
  • Can you access the Native SDK debug log in the affected application? If so, do you see the log entries "flushing session and queue before crashpad handler" and "handing control over to crashpad" when the stack-overflow happens? Do you see any warnings during the initialization?
  • Do you see new *.dmp files being produced in <database-path>/reports directory?
  • Can you provoke another crash, like an access violation in which you write to a NULL pointer or another invalid address, and check whether it reports correctly?
  • Can you check that the crashpad_handler.exe is running when you trigger the stack-overflow?

We have been using Sentry Native SDK with great results for +3 years. The "naive" integrations is simply calling sentry_init and start collecting. Now our integration goes much deeper, but the core is the simplicity. I expect that the SDK will continue to help us improve software quality in ways no one in the team initially imagined.

It's awesome that the Native SDK has given you value over the years. We'll try to keep it that way! None of the mentioned mitigations should be necessary for the Native SDK to detect the crash, so let's find out why it is not reporting the stack overflow.

On the other hand, if you expect that some part of your application requires stack allocations of any considerable size and there is no way to change that to a heap allocation, then I think specifying larger stack commit/reserve values is the right approach, independent of your use of the Native SDK.

from sentry-native.

pontusn avatar pontusn commented on June 20, 2024

Any information on when fix for this will be included in SDK?

from sentry-native.

pontusn avatar pontusn commented on June 20, 2024

For our product, where the integration with Sentry is in a separate DLL, we hooked thread-level injection via DllMain:

extern "C" BOOL WINAPI DllMain(HINSTANCE hInstance, DWORD dwReason, LPVOID lpReserved)
{
    if (dwReason == DLL_PROCESS_ATTACH || dwReason == DLL_THREAD_ATTACH)
    {
        // Reserve memory for handling "stack overflow"
        ULONG ulStackGuarantee = 64 * 1024;
        SetThreadStackGuarantee(&ulStackGuarantee);
    }

    //...
}

I would assume you could use similar pattern for sentry.dll itself. Possibly guarded by some initialization flag.

These articles provided me with interesting details:
https://peteronprogramming.wordpress.com/2016/08/10/crashes-you-cant-handle-easily-2-stack-overflows-on-windows/
https://devblogs.microsoft.com/oldnewthing/20200610-00/?p=103855

from sentry-native.

supervacuus avatar supervacuus commented on June 20, 2024

Thanks @pontusn. The problem with using DllMain is twofold:

  • it only affects threads started after sentry.dll was loaded
  • it doesn't cover the considerable use of the Native SDK as a static library

But I'll consider adding it, even if only as part of the documentation.

from sentry-native.

pontusn avatar pontusn commented on June 20, 2024

Fair limitations, to my understanding there is no alternative method to inject code for thread initialization in Windows.

For most applications I would assume Sentry is among first things to initialize on the start thread, hence all threads should be included for real world use cases.

The static scenario could be handled by embedding a minimal DLL for this purpose that is exported to file and loaded dynamically.

However, as you already mentioned, documentation is probably the key part of addressing this issue

from sentry-native.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.