supervacuus commented on August 15, 2024

Hi, I'm a Sentry Native SDK maintainer. I had the chance to investigate this topic further to understand what was happening.

In short, what is happening is the following:

  1. The dotnet JIT and AOT compilers generate machine code for the provided code snippet that causes a page fault in the CPU (specifically, an access to the NULL page).
  2. The kernel must respond in this case and invokes the process's signal handler with a SIGSEGV.
    Since the Native SDK installs its signal handler last, it is the first in the signal chain.
  3. Our handler is unaware that it is running inside the CLR, so the SIGSEGV produces a crash event that is reported as a native crash.
  4. Ultimately, our signal handler invokes the next handler in the chain, i.e., the one from the dotnet runtime.
  5. The runtime handler identifies the signal source as part of the machine code it generated for managed code and raises a managed code exception (NullReferenceException) that, if uncaught, will also produce a crash event.
  6. When a signal can be "converted" into a managed code exception, the dotnet runtime handlers do not continue the signal chain (since the program must continue with the exception). This is why debuggerd will not log any crashes in logcat (and no tombstones will be generated either).

This is why you will get two events and see no crashes in logcat. The SIGSEGV isn't an additional crash (and isn't provoked by the Native SDK either) but the result of optimized CLR-generated native code. The dotnet CLR expects this to happen and converts it to a NullReferenceException in its signal handlers.
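The signal-chain ordering in the steps above can be reproduced without .NET or Sentry. The following is a minimal C sketch, not SDK or runtime code. `runtime_handler` and `sdk_handler` are hypothetical stand-ins. The SIGSEGV is sent with `raise()` instead of coming from a real page fault, which is why it is safe for the handlers to return.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Records which handler ran in which position (1 = SDK stand-in, 2 = runtime stand-in). */
static volatile sig_atomic_t order[2];
static volatile sig_atomic_t calls;

/* Disposition that was active when the SDK stand-in installed itself. */
static struct sigaction prev_sa;

/* Hypothetical stand-in for the runtime's handler (installed first). */
static void runtime_handler(int sig, siginfo_t *info, void *uctx) {
    (void)sig; (void)info; (void)uctx;
    if (calls < 2) order[calls++] = 2;
}

/* Hypothetical stand-in for the SDK's handler (installed last, so it runs first). */
static void sdk_handler(int sig, siginfo_t *info, void *uctx) {
    if (calls < 2) order[calls++] = 1;  /* a crash event would be captured here */
    /* Chain to the previously installed handler at the end. */
    if (prev_sa.sa_flags & SA_SIGINFO)
        prev_sa.sa_sigaction(sig, info, uctx);
    else if (prev_sa.sa_handler != SIG_DFL && prev_sa.sa_handler != SIG_IGN)
        prev_sa.sa_handler(sig);
}

static void install(void (*fn)(int, siginfo_t *, void *), struct sigaction *old) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO;
    sa.sa_sigaction = fn;
    sigaction(SIGSEGV, &sa, old);
}

/* Returns the run order as two digits, e.g. 12 = SDK first, runtime second. */
int run_chain_demo(void) {
    calls = 0;
    order[0] = order[1] = 0;
    install(runtime_handler, NULL);  /* the runtime installs its handler first */
    install(sdk_handler, &prev_sa);  /* the SDK installs its handler last */
    raise(SIGSEGV);                  /* synthetic signal, not a real fault */
    return order[0] * 10 + order[1];
}
```

Calling `run_chain_demo()` returns `12`. The handler installed last sees the signal first. The runtime stand-in runs only because the SDK stand-in continues the chain at the end. That is why the real app produces both a native event and the managed exception.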

The Native SDK will also receive that SIGSEGV, but it acts like the signal was an unrecoverable native crash. That should change. An approach we already tested on Linux (where we can see the same behavior) was to invoke the dotnet runtime handler at the start of our handler (rather than the end).

With that ordering, when a native fault is turned into a managed code exception, our handler's crash-reporting code would never execute (and so would never produce a crash event). We would send a native crash only if the runtime handler continues the signal chain. That would be either an unintended CLR crash or, more likely, a crash in some other native code.
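The runtime-first ordering could look roughly like this. This is a minimal C sketch under stated assumptions, and all names in it are hypothetical. The runtime stand-in emulates "converting the fault into a managed exception" by calling `siglongjmp()` to a hypothetical catch point. The real runtimes instead set the PC so that execution continues with the managed exception.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf managed_catch;                  /* hypothetical "catch" site */
static volatile sig_atomic_t runtime_owns_fault;  /* does the runtime claim it? */
static volatile sig_atomic_t native_crash_reported;
static struct sigaction prev_sa;                  /* runtime's disposition */

/* Hypothetical runtime stand-in: claimed faults never return to the chain. */
static void runtime_handler(int sig, siginfo_t *info, void *uctx) {
    (void)sig; (void)info; (void)uctx;
    if (runtime_owns_fault)
        siglongjmp(managed_catch, 1);  /* emulates raising a managed exception */
    /* Not a fault in generated code: return so the caller continues. */
}

/* SDK stand-in: give the runtime the first chance, report only if it returns. */
static void sdk_handler(int sig, siginfo_t *info, void *uctx) {
    if (prev_sa.sa_flags & SA_SIGINFO)
        prev_sa.sa_sigaction(sig, info, uctx);
    else if (prev_sa.sa_handler != SIG_DFL && prev_sa.sa_handler != SIG_IGN)
        prev_sa.sa_handler(sig);
    native_crash_reported = 1;  /* reached only for faults the runtime declined */
}

/* Raises one synthetic SIGSEGV; returns 1 if a native crash would be reported. */
int simulate_fault(int runtime_claims_it) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO;
    sa.sa_sigaction = runtime_handler;
    sigaction(SIGSEGV, &sa, NULL);      /* runtime installs first */
    sa.sa_sigaction = sdk_handler;
    sigaction(SIGSEGV, &sa, &prev_sa);  /* SDK installs last */

    runtime_owns_fault = runtime_claims_it;
    native_crash_reported = 0;
    if (sigsetjmp(managed_catch, 1) == 0)  /* 1: restore the signal mask on jump */
        raise(SIGSEGV);
    return native_crash_reported;
}
```

`simulate_fault(1)` returns 0 because the runtime claimed the fault, so no native event is sent. `simulate_fault(0)` returns 1. A production handler would also restore the default disposition and re-raise after reporting a genuine crash. The sketch leaves that out.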


If you want to follow the process in the dotnet runtime:

  1. The dotnet runtime installs its signal handlers here:
    https://github.com/dotnet/runtime/blob/35f62c1c6938da074b4c350d6f81947d38bb316d/src/coreclr/src/pal/src/exception/signal.cpp#L165-L175

  2. It installs a different handler for each signal type (in our case, sigsegv_handler()):
    https://github.com/dotnet/runtime/blob/35f62c1c6938da074b4c350d6f81947d38bb316d/src/coreclr/src/pal/src/exception/signal.cpp#L509-L572
    However, they all boil down to one check. Either the signal is handled as a native crash (typically by continuing the signal chain), or it is handled as a managed code crash provoked by the runtime's generated native code.

  3. The latter case ends up in the common_signal_handler()
    https://github.com/dotnet/runtime/blob/35f62c1c6938da074b4c350d6f81947d38bb316d/src/coreclr/src/pal/src/exception/signal.cpp#L837-L906
    which produces an SEH exception record from the signal data and in the end calls SEHProcessException()
    https://github.com/dotnet/runtime/blob/main/src/coreclr/pal/src/exception/seh.cpp#L250-L288,
    That function does one of three things. It passes the exception on to additional hardware exception handlers (our case), throws the SEH exception right there, or, if it can't propagate the exception, falls back to signal chaining.

  4. In the end, there are two outcomes. Either we have a CPU exception that is reported by the OS default handler (and tombstoned/logged to logcat via debuggerd on Android), or (the case in this issue) we never return to the signal chain. In the latter case, the runtime sets the PC so that execution continues by raising a managed code exception, ending up here:
    https://github.com/dotnet/runtime/blob/main/src/coreclr/vm/exceptionhandling.cpp#L5537-L5573.
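The decision in steps 2 and 3 can be reduced to a small predicate. This is a deliberately simplified C sketch, not the runtime's actual logic, and the real handlers check more conditions. It assumes a hypothetical table of address ranges that hold generated code. It treats a fault as a managed NullReferenceException only when two conditions hold: the faulting instruction lies in that generated code, and the faulting address lies in the NULL page (the case described at the top of this comment).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical half-open range [start, end) of runtime-generated code. */
typedef struct {
    uintptr_t start;
    uintptr_t end;
} code_range;

typedef enum {
    FAULT_NATIVE_CRASH,        /* continue the signal chain */
    FAULT_MANAGED_EXCEPTION    /* convert into a managed exception */
} fault_kind;

/* Simplified classification of a SIGSEGV by faulting PC and faulting address. */
fault_kind classify_fault(uintptr_t pc, uintptr_t fault_addr,
                          const code_range *ranges, size_t n_ranges,
                          uintptr_t null_page_size) {
    int in_generated_code = 0;
    for (size_t i = 0; i < n_ranges; i++) {
        if (pc >= ranges[i].start && pc < ranges[i].end) {
            in_generated_code = 1;
            break;
        }
    }
    /* Implicit null check: generated code touched the NULL page. */
    if (in_generated_code && fault_addr < null_page_size)
        return FAULT_MANAGED_EXCEPTION;
    return FAULT_NATIVE_CRASH;
}
```

A fault outside the registered ranges, or one that hits a non-NULL address, falls through to the native-crash path. That is the path on which the chain continues and debuggerd eventually sees the signal.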

I couldn't see any significant differences between Linux and Android handling. The same seems to be true for other POSIX systems as well (except where that layer only emulates lower-level mechanisms, i.e., Mach and SEH).


I also created a trivial dotnet program (running on a Linux GHA runner) that installs a signal handler similar to ours (without all the handling code, but maintaining the signal chain):

https://github.com/supervacuus/signals_dotnet

This program doesn't use any Sentry code and shows the same behavior: the last-installed signal handler receives the signal provoked by the generated code, while the dotnet runtime handler (if invoked) produces a NullReferenceException.

from xamarin-android.

supervacuus commented on August 15, 2024

It sounds like we can close this issue in the dotnet/android repo and create one instead in getsentry/sentry-native then?

Yes, I think this is entirely a Sentry issue.

grendello commented on August 15, 2024

@tranb3r Sentry could intercept the signal if it installed its own handler and didn't chain it up to the previous one before exiting the application. This could prevent the signal from being logged in the logcat.

Sure. But when the app does not use Sentry, there is no segfault in the logcat either, right? So it means it's not Sentry that is preventing the segfault from being logged.

It may also mean that Sentry is causing and catching the segfault. But we don't have enough information to figure that out yet.

grendello commented on August 15, 2024

@supervacuus got it :) I presume the details will be more or less the same as far as the mechanics are concerned. After all, we're dealing with the standard POSIX way of chaining signals. I mentioned MonoVM just for the record and for completeness, so that future readers of this issue have a clear picture of what's involved here.

grendello commented on August 15, 2024

@tranb3r considering I have no idea what Sentry is and how to use it, I will need more info from you :)

Can you add to this issue the logcat output with the segfault, as well as the managed exception stack trace?

The fact that you don't use native code directly doesn't mean it isn't involved, in fact, it's always involved in one manner or another.

If you're able to reproduce this issue locally, please capture logcat output using the following commands:

> adb shell setprop debug.mono.log default,assembly,mono_log_level=debug,mono_log_mask=all
> adb logcat -G 64M
> adb logcat -c
rem Start and crash the app here, wait 2-3 seconds and then:
> adb logcat -d > logcat.txt

tranb3r commented on August 15, 2024

considering I have no idea what Sentry is and how to use it, I will need more info from you :)

@grendello
Sentry is cloud-based error tracking for applications.
If you want to reproduce this issue on your machine, you can create a free account in 2 minutes, and then simply copy your account id (it's called DSN) into MauiProgram.cs.

Can you add to this issue the logcat output with the segfault, as well as the managed exception stack trace?

The managed exception is caught. I've added an INFO log so you can see it in the logcat.

06-27 14:20:50.722 11869 11869 I MauiAppSegFault: System.NullReferenceException: Object reference not set to an instance of an object
06-27 14:20:50.722 11869 11869 I MauiAppSegFault:    at MauiAppSegfault.MainPage.Button_OnClicked(Object sender, EventArgs e)

The application is not crashing, and I haven't seen a trace for the Segfault.
But Sentry is capturing it, so it must be somewhere.

So here are two logcats:

  • first is a repro WITH sentry: run app, click on Run button, exception is caught, close app, run it again, Segfault is sent to Sentry.
    logcat_with_sentry.txt
  • second is a repro WITHOUT sentry: run app, click on Run button, exception is caught, close app, run it again.
    logcat_without_sentry.txt

grendello commented on August 15, 2024

@tranb3r thanks, I'd still like to see the error your instance of Sentry records, the one with SIGSEGV. It would be helpful if you pasted it (and whatever context surrounds it) here.

tranb3r commented on August 15, 2024

[screenshot of the Sentry event]

[screenshot of the Sentry event]

tranb3r commented on August 15, 2024

@grendello
I've posted screenshots of everything I can see in Sentry.

grendello commented on August 15, 2024

@tranb3r thanks! They clearly have the whole stack trace somewhere, since without it they wouldn't show the registers nor the frame where it happens. Alas, their UI made the trace useless - could you try digging in the UI to find the raw data they parse?

tranb3r commented on August 15, 2024

I really think that's everything I can get from the UI.

Adding @jamescrosswell, hope you don't mind.
James, maybe you can provide more context?

grendello commented on August 15, 2024

It's too bad the raw data is missing. With these crashes the context is everything: not just the header, but most importantly the frames themselves. With some kinds of signals (e.g. SIGABRT), lines preceding the native trace are often crucial. The contents of registers, as shown in the screenshot, are mostly of secondary interest. They don't tell us where the crash is in the source code, except, in this case, the RIP register, which points to the code position at the crash. Even if the trace doesn't contain file:line information, it contains addresses relative to the loaded shared libraries/executables. We can translate those to code locations post-mortem (not always, but in most cases). The information contained in the UI above is, alas, not helpful.

tranb3r commented on August 15, 2024

Here is the data that is saved by Sentry when the error occurs. It contains a bit more than what is visible in the UI. Maybe you can take a look?
68e4d535-186b-42dc-56f2-c213526eed94.envelope.json

grendello commented on August 15, 2024

The only frame in this data is this:

"frames": [{
  "instruction_addr": "0x7085005af665",
  "package": "/data/app/~~lhSuplZuwh7nocNaMhXmWw==/com.companyname.mauiappsegfault-henAPYLgWXjjYhjFwUCWbQ==/split_config.x86_64.apk!/lib/x86_64/libaot-MauiAppSegfault.dll.so",
  "image_addr": "0x7085005ae000"
}]

which is weird, because AOT libraries don't contain directly executable code. They contain code-as-data that is loaded by the runtime, patched up and made executable, so the frame shouldn't point to anything in that shared library. That said, could you try without AOT and see if you still get the segfault?
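For reference, the single frame above can be expressed as an offset relative to the loaded library, which is the form used for post-mortem symbolication. This is a minimal C sketch. `frame_offset` is a hypothetical helper, and its inputs are the `instruction_addr` and `image_addr` strings from the envelope. For these values the offset is 0x1665 bytes into libaot-MauiAppSegfault.dll.so. Given the point above about AOT code-as-data, that offset may not correspond to a function in the file.

```c
#include <stdint.h>
#include <stdlib.h>

/* Offset of a frame within its loaded module: instruction_addr - image_addr.
   Both inputs are hex strings such as "0x7085005af665"; strtoull accepts the
   optional "0x" prefix when the base is 16. */
uint64_t frame_offset(const char *instruction_addr, const char *image_addr) {
    uint64_t pc = (uint64_t)strtoull(instruction_addr, NULL, 16);
    uint64_t base = (uint64_t)strtoull(image_addr, NULL, 16);
    if (pc < base)
        return UINT64_MAX;  /* the frame is not inside this module */
    return pc - base;
}
```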

I don't know if this is the only frame that was logged by Android or the only frame that Sentry deemed worth saving, but without the rest it's very, VERY hard to see what's going on.

The fact that the segfault is reported by Sentry, but is not in your logcat is very weird. The only explanation that comes to me is that Sentry intercepts it and prevents it from ending up in the logcat, exiting the application "cleanly" instead. Is such scenario possible?

tranb3r commented on August 15, 2024

However, try without AOT and see if you still get the segfault?

Yes. Still getting the segfault without AOT (<RunAOTCompilation>false</RunAOTCompilation>)

The fact that the segfault is reported by Sentry, but is not in your logcat is very weird. The only explanation that comes to me is that Sentry intercepts it and prevents it from ending up in the logcat, exiting the application "cleanly" instead. Is such scenario possible?

This is why I also pasted the logcat without sentry. I don't see the segfault in this log.
So, I don't think that Sentry is preventing the error from ending up in the logcat.
Now the question is, how is Sentry capturing a segfault that we cannot see in the logcat?
cc @jamescrosswell @bitsandfoxes

grendello commented on August 15, 2024

@tranb3r Sentry could intercept the signal if it installed its own handler and didn't chain it up to the previous one before exiting the application. This could prevent the signal from being logged in the logcat.

tranb3r commented on August 15, 2024

@tranb3r Sentry could intercept the signal if it installed its own handler and didn't chain it up to the previous one before exiting the application. This could prevent the signal from being logged in the logcat.

Sure.
But when the app does not use Sentry, there is no segfault in the logcat either, right?
So it means it's not Sentry that is preventing the segfault from being logged.

tranb3r commented on August 15, 2024

I will consider opening an issue in Sentry SDK repository if I do not get an answer from James or Stefan here.

bruno-garcia commented on August 15, 2024

Sentry could intercept the signal if it installed its own handler and didn't chain it up to the previous one before exiting the application.

sentry-native should chain the handlers, so this shouldn't be the case. @Swatinem or @markushi could confirm this.

Now the question is, how is Sentry capturing a segfault that we cannot see in the logcat?
I'm not sure how this could happen, and I wonder if it's something related to .NET's usage of signals that our SDK is capturing as an error?
IIRC .NET used some signals to communicate between the native and .NET layers. I wonder if that could be related here?

Looking at the screenshot of this event in Sentry I noticed it's missing symbols. Any chance you can upload debug symbols when building the app so we can see the stack trace? Upload is done automatically via msbuild if you configure your .NET project.

Thanks for the repro though; at least we have something to dig into. @bitsandfoxes said he's taking a look.

bitsandfoxes commented on August 15, 2024

Running the repro without any problems and I can't seem to reproduce the segfault being sent. I'm on Android 14 on a Pixel 6, might this be a device-specific issue?

tranb3r commented on August 15, 2024

Running the repro without any problems and I can't seem to reproduce the segfault being sent. I'm on Android 14 on a Pixel 6, might this be a device-specific issue?

Did you follow all the steps? (Release mode; tap the Run button; restart the app)
I'm reproducing on the emulator and a Pixel 5. Could you please try to repro either on an emulator or a Pixel 5?
I can also provide an APK for you to test on a Pixel 6 if you give me your DSN.

bitsandfoxes commented on August 15, 2024

Did you follow all the steps? (Release mode; tap the Run button; restart the app)

Let me try that again.

tranb3r commented on August 15, 2024

I can confirm my repro on:

  • Android 10, 11, 12, 13, 14
  • Pixel 5, 6, 7; Moto G20; Xiaomi Redmi 7; OnePlus 8 Pro

As soon as the Run button is pressed and the exception is triggered, a last_crash file and an xxx.envelope file are created in the app's cache folder. But the exception is caught and the app does not crash. The crash report is sent to Sentry when the app is restarted.
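The "write on crash, send on next start" behavior described here is a common crash-reporter pattern, because a crashing process cannot safely do network I/O. This is a minimal C sketch of that pattern under that assumption, not sentry-native's actual implementation. `write_crash_marker` and `consume_crash_marker` are hypothetical names, and the marker contents are illustrative. The write side uses only `open`, `write` and `close`, which are async-signal-safe.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Crash path: leave a marker using only async-signal-safe calls. */
void write_crash_marker(const char *path) {
    static const char msg[] = "crashed\n";  /* illustrative contents */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return;
    (void)write(fd, msg, sizeof msg - 1);
    close(fd);
}

/* Next start: returns 1 if the previous run left a marker, and consumes it. */
int consume_crash_marker(const char *path) {
    if (access(path, F_OK) != 0)
        return 0;
    /* A real SDK would send the queued envelope at this point. */
    unlink(path);
    return 1;
}
```

With this split, a spurious SIGSEGV that the runtime later recovers from still leaves a marker behind. That is consistent with the report showing up only after a restart even though the app never crashed.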

Is Sentry capturing a crash of the app? Or is it Sentry SDK that is actually crashing?

bitsandfoxes commented on August 15, 2024

Oh that is really really interesting! I can reproduce this and I end up with logs from our native SDK that receives a signal triggered by the exception! Thanks for the repro! I'll have to figure out how to debug this and will get back to you!

sentry-native I  entering signal handler
sentry-native D  captured backtrace from ucontext with 1 frames
sentry-native D  captured backtrace with 1 frames
sentry-native D  merging scope into event
sentry-native D  trying to read modules from /proc/self/maps
sentry-native D  read 396 modules from /proc/self/maps
sentry-native D  adding attachments to envelope
sentry-native D  sending envelope
sentry-native D  serializing envelope into buffer
sentry-native I  crash has been captured

bitsandfoxes commented on August 15, 2024

So after a whole bunch of testing:

  1. This only happens in Release and not in Debug.
  2. This only happens if there is an actual exception,
    e.g. var s = default(string); var c = s.Length;
  3. throw new Exception() does not deliver a signal to the signal handler.

@grendello is this intended behavior? And if so, how does the runtime handle the signal? What is getting checked to ignore the signal safely?

For context, the SDK is hooking itself up to the signal handler and it receives a signal, thus creating an event from it. That signal then gets forwarded and seems to get swallowed.

grendello commented on August 15, 2024

@bitsandfoxes no, the behavior isn't correct... I wonder if your handler catches the signal on a thread that's not attached to MonoVM and thus there are no handlers to chain to.

bruno-garcia commented on August 15, 2024

To confirm, a C# null reference is supposed to trigger SIGSEGV? I mean seems that's what's happening, but the app doesn't crash. If Sentry wasn't there the signal probably wouldn't be 'noticed' by anything else in the app. Correct?

grendello commented on August 15, 2024

@bruno-garcia no, a managed null reference shouldn't do that. SIGSEGV is always an issue with some native code. If Sentry weren't there, then the Android launcher process would have caught it, logged it in logcat and created a tombstone. Since there's no segfault without Sentry, however, there are two possibilities. Either the problem exists in Sentry's native code, or something Sentry does to/in managed land triggers a bug in the MonoVM runtime.

bruno-garcia commented on August 15, 2024

If sentry weren't there, then the Android launcher process would have caught it, logged in logcat and created a tombstone.

Doesn't seem to be the case as OP mentioned without Sentry still nothing shows up on logcat. Unless I misunderstood things.

Since there's no segfault without Sentry, however, the problem either exists in Sentry's native code or something Sentry does to/in managed land triggers a bug in the MonoVM runtime.

That's possible (except the signal might still exist even without Sentry, since nothing shows up in logcat either with or without Sentry).

I'm saying that after reading:

This is why I also pasted the logcat without sentry. I don't see the segfault in this log.

grendello commented on August 15, 2024

If sentry weren't there, then the Android launcher process would have caught it, logged in logcat and created a tombstone.

Doesn't seem to be the case as OP mentioned without Sentry still nothing shows up on logcat. Unless I misunderstood things.

It's a very, very rare case that an Android application crashes with a segfault and nothing gets logged. I think I've seen only one such instance over the years. Sometimes very little is logged, but there's always a trace. If MonoVM fails to catch the signal, ART or Zygote will. Mono won't silently handle and ignore a segfault. So the fact that we see nothing in the logcat without Sentry means that, very likely, no signal is raised.

Since there's no segfault without Sentry, however, the problem either exists in Sentry's native code or something Sentry does to/in managed land triggers a bug in the MonoVM runtime.

That's possible (except the signal might still exist even without Sentry, since nothing shows up in logcat either with or without Sentry).

This is very unlikely, as explained above. The only way for that to happen is if a signal handler were installed somewhere that would handle the segfault and not log it. Neither ART nor MonoVM would do that. There's also no good reason to swallow such destructive and dangerous signals, and I can't imagine why a legitimate piece of software would do that. The only scenario I can imagine is one with a corrupted chain of signal handlers. For instance, consider a situation where both ART and MonoVM uninstall their own handlers. The chain is left with a handler in the middle that does some processing when it captures a signal and then passes it on if another handler is installed. Assume the other handlers are gone, and our hypothetical handler has no logging code and wasn't designed to abort the application. Then the app can keep running. I can imagine this scenario with software like Sentry running in the application, but I can't imagine it without it (barring the application itself doing that, of course).
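A well-behaved chaining handler avoids the failure mode in this hypothetical: a middle handler that finishes its processing and simply returns when nothing is left to chain to. This is a minimal C sketch of that decision. It assumes the handler saved the previous disposition in a `struct sigaction` when it was installed. `next_step`, `chaining_handler` and `install_chaining_handler` are hypothetical names. When there is no previous handler to call, the handler restores the default action and re-raises. A previous disposition of `SIG_IGN` counts as "no handler", since ignoring has no useful meaning for a fault. Re-raising lets the process die and be logged instead of silently continuing.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

typedef enum {
    CHAIN_CALL_PREVIOUS,       /* another handler exists: pass the signal on */
    CHAIN_DEFAULT_AND_RERAISE  /* nothing left: let the default action run */
} chain_action;

static struct sigaction prev_sa;  /* saved when this handler was installed */

/* Decide how to continue after this handler's own processing. */
chain_action next_step(const struct sigaction *prev) {
    if (prev->sa_flags & SA_SIGINFO)
        return CHAIN_CALL_PREVIOUS;
    if (prev->sa_handler == SIG_DFL || prev->sa_handler == SIG_IGN)
        return CHAIN_DEFAULT_AND_RERAISE;  /* never return silently */
    return CHAIN_CALL_PREVIOUS;
}

static void chaining_handler(int sig, siginfo_t *info, void *uctx) {
    /* ... this handler's own processing would go here ... */
    if (next_step(&prev_sa) == CHAIN_DEFAULT_AND_RERAISE) {
        signal(sig, SIG_DFL);  /* restore the default action */
        raise(sig);            /* stays pending until this handler returns */
        return;
    }
    if (prev_sa.sa_flags & SA_SIGINFO)
        prev_sa.sa_sigaction(sig, info, uctx);
    else
        prev_sa.sa_handler(sig);
}

/* Installs chaining_handler for SIGSEGV, remembering the previous disposition. */
int install_chaining_handler(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO;
    sa.sa_sigaction = chaining_handler;
    return sigaction(SIGSEGV, &sa, &prev_sa);
}
```

SIGSEGV is blocked while the handler runs, so the re-raised signal is delivered with the default action as soon as the handler returns. The result is the ordinary crash path that debuggerd logs.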

tranb3r commented on August 15, 2024

Is there anything I can do to help make some progress?

jamescrosswell commented on August 15, 2024

Awesome - thank you @supervacuus !

An approach we already tested on Linux (where we can see the same behavior) was to invoke the dotnet runtime handler at the start of our handler (rather than the end).

It sounds like we can close this issue in the dotnet/android repo and create one instead in getsentry/sentry-native then?

grendello commented on August 15, 2024

@supervacuus thanks for the analysis, however I have one thing that must be noted. In the case of .NET for Android, CLR isn't used, we use the MonoVM runtime, so while overall behavior might be the same, the details may differ.

supervacuus commented on August 15, 2024

@supervacuus thanks for the analysis, however I have one thing that must be noted. In the case of .NET for Android, CLR isn't used, we use the MonoVM runtime, so while overall behavior might be the same, the details may differ.

Yeah, sorry, you are absolutely right @grendello. I did not mention this because I did not want to extend the already long comment. While the concrete implementation differs, MonoVM has a similar signal-to-exception mapping to the one in the CLR, split across:

My point was also that we (Sentry) should probably have a more abstract view of the differences between the two implementations and care more about the similarity in behavior as seen from the chaining in our handler.

But I am not a runtime dev, and any input you can give us on the implementation details and how they could affect choices in our handler is more than welcome!
