Comments (17)
Ok, I guess it is an issue with signal handling and threading. I have a signal handler which is left via longjmp which doesn't correctly restore the signal mask. This results in a problem if a garbage collection is running at that time.
I am trying to catch stack overflow segfaults with a sigaltstack, but I can't get it to work correctly with bdwgc. This was also discussed before here: http://www.hpl.hp.com/hosted/linux/mail-archives/gc/2012-May/005173.html Maybe @hboehm could help?
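For readers unfamiliar with the setup being described, here is a minimal sketch of registering an alternate signal stack and a SIGSEGV handler that runs on it. The names `install_overflow_handler`, `alt_stack_is_active`, `on_segv` and the 64 KiB size are assumptions made for illustration; they do not come from the original program.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdbool.h>
#include <stdlib.h>
#include <unistd.h>

#define ALT_STACK_SIZE (64 * 1024)   /* assumed size, not from the thread */

static void *alt_stack_mem;          /* hypothetical name */

/* Placeholder handler: a real one would inspect si->si_addr and recover. */
static void on_segv(int sig, siginfo_t *si, void *ctx) {
    (void)sig; (void)si; (void)ctx;
    _exit(1);
}

/* Register an alternate stack and a SIGSEGV handler that runs on it. */
bool install_overflow_handler(void) {
    alt_stack_mem = malloc(ALT_STACK_SIZE);
    if (alt_stack_mem == NULL)
        return false;

    stack_t ss = { .ss_sp = alt_stack_mem, .ss_size = ALT_STACK_SIZE, .ss_flags = 0 };
    if (sigaltstack(&ss, NULL) == -1)
        return false;

    struct sigaction sa = {0};
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO | SA_ONSTACK;   /* SA_ONSTACK selects the alternate stack */
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL) == 0;
}

/* Query the kernel to confirm that our alternate stack is the one installed. */
bool alt_stack_is_active(void) {
    stack_t cur;
    if (sigaltstack(NULL, &cur) == -1)
        return false;
    return cur.ss_sp == alt_stack_mem && !(cur.ss_flags & SS_DISABLE);
}
```

Without SA_ONSTACK the handler would try to run on the already exhausted stack, which is why the alternate stack matters for this use case.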
from bdwgc.
I think I fixed the issue in my program and it isn't a problem with bdwgc. Sorry for bothering you.
Just for future reference:
This might also be interesting for @paurkedal (I've seen your culibs library).
I use the ucontext to retrieve the sigmask from before the segfault and restore it with pthread_sigmask. Maybe you know of a better way?
static void handleSigsegv(int sig, siginfo_t *si, void* ctx) {
    char* addr = (char*)si->si_addr;
    if (addr >= stackLow - pageSize && addr <= stackHigh) {
        oldmask = ((ucontext_t*)ctx)->uc_sigmask;
        longjmp(...);
    }
    abort();
}
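As a rough illustration of the approach described here, the sketch below captures the pre-signal mask from the handler's ucontext, leaves the handler with `_longjmp`, and restores the mask with pthread_sigmask at the landing point. It uses SIGUSR1 raised on purpose instead of a real stack overflow so that it is safe to run, and all names (`recover_point`, `mask_before_signal`, `recover_and_restore_mask`) are hypothetical.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <setjmp.h>
#include <signal.h>
#include <stdbool.h>
#include <ucontext.h>

static jmp_buf recover_point;        /* hypothetical name */
static sigset_t mask_before_signal;  /* mask in effect when the signal arrived */

static void on_signal(int sig, siginfo_t *si, void *ctx) {
    (void)sig; (void)si;
    /* Remember the interrupted code's mask, then leave without returning. */
    mask_before_signal = ((ucontext_t *)ctx)->uc_sigmask;
    _longjmp(recover_point, 1);
}

static bool is_blocked(int sig) {
    sigset_t cur;
    pthread_sigmask(SIG_SETMASK, NULL, &cur);
    return sigismember(&cur, sig) == 1;
}

/* Returns true if SIGUSR1 was left blocked by the jump and is unblocked
 * again after the manual restore. */
bool recover_and_restore_mask(void) {
    struct sigaction sa = {0};
    sa.sa_sigaction = on_signal;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) == -1)
        return false;

    if (_setjmp(recover_point) == 0) {
        raise(SIGUSR1);              /* stand-in for the faulting access */
        return false;                /* not reached: the handler jumps away */
    }

    /* _longjmp does not touch the mask, so SIGUSR1 is still blocked here. */
    bool blocked_after_jump = is_blocked(SIGUSR1);
    pthread_sigmask(SIG_SETMASK, &mask_before_signal, NULL);
    return blocked_after_jump && !is_blocked(SIGUSR1);
}
```

The window between the jump and the pthread_sigmask call is where the delivered signal stays blocked, which is the situation the original post describes.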
The second important thing is to use stack banging in the functions which could trigger a stack overflow.
void bang() {
    volatile char x;
    *(&x - 4096);
}
The stack banging is especially important if allocations are done. This could be resolved by using separate collector threads which is not possible right now, I guess?
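The probe above reads from an address outside the object `x`. A variant that stays inside a declared object could look like the sketch below; it assumes a 4096-byte page and a downward-growing stack, and the constant name `PROBE_DISTANCE` is made up for this example.

```c
#define PROBE_DISTANCE 4096   /* assumed page size */

/* Touch the stack roughly one page below the current frame so that an
 * overflow faults here, at a known point, rather than somewhere later. */
int bang(void) {
    volatile char probe[PROBE_DISTANCE];
    probe[0] = 1;             /* element 0 is the lowest address of the array */
    return probe[0];
}
```

Because the array is volatile, the compiler has to keep both the store and the load, so the page is really touched.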
For what it's worth, I wouldn't ever trust longjmp out of a signal handler, since you're skipping the context restore and any other unknown magic that the runtime system/kernel needs to do at that point. What you can do on a lot of systems is write to the ucontext so that when you return from the signal handler you end up where you want to be.
@minad I think the similar culibs code (cuflow/wind.h) is okay, as it uses siglongjmp or unw_resume from libunwind if available.
@rptb1 Thank you for your answer! It seems to work quite well now with longjmp and manual signal mask restoration (uc_sigmask from ucontext). But your idea to overwrite the ucontext is also a nice one - I would like to try that. But then I have to convert from the jmp_buf to the ucontext which would be ugly or switch everything to getcontext/setcontext.
According to the man page, setcontext does not return if the call is successful. I interpret that to mean that it jumps directly instead of waiting for the signal handler to return.
@paurkedal longjmp also jumps directly out of the signal handler. That is why I ran into problems with the signal mask in the first place.
@minad Sure, and so does siglongjmp, but at least siglongjmp and setcontext restore the signal mask. I was mostly wondering: @rptb1, did you have something else in mind than setcontext when you suggested writing to ucontext?
@paurkedal Well the point is - do you want to have the signal mask restored before or AFTER the stack switch assuming that the handler uses a sigaltstack? Therefore neither siglongjmp nor setcontext might work :)
It is pretty easy to play with the different variants. This is what I am using right now:
/* Variant 1: getcontext/setcontext */
extern __thread volatile bool jumped;
#define JMP_BUF ucontext_t
#define SETJMP(buf) (getcontext(&buf), jumped ? (jumped = false, true) : false)
#define LONGJMP(buf) (jumped = true, setcontext(&buf))

/* Variant 2: _setjmp/_longjmp (signal mask is not saved) */
#define JMP_BUF jmp_buf
#define SETJMP(buf) _setjmp(buf)
#define LONGJMP(buf) _longjmp(buf, 1)

/* Variant 3: sigsetjmp/siglongjmp (signal mask is saved) */
#define JMP_BUF sigjmp_buf
#define SETJMP(buf) sigsetjmp(buf, 1)
#define LONGJMP(buf) siglongjmp(buf, 1)
@minad Use getcontext rather than setjmp, and extract PC/SP into the signal context. You have to do some architecture-specific register poking, but I have had this work reliably.
(For what it's worth I've just spent a couple of months on commercial work where signals, threads, and signal masks are all mixed up in a hostile environment. Fortunately it was Linux/Intel specific, so I was able to make assumptions. But it ain't pretty.)
@rptb1 Thank you very much for your help. I am working on some kind of vm/runtime thing, also on Linux/Intel right now, so it is acceptable to make some assumptions, I guess. But it will probably get very ugly 😬
The specific problem I am dealing with here is that I am catching SIGSEGV from stack overflow (stack banging into a guard page), which makes both the signal mask and the signal handler stack a problem.
To summarize - I see the following combinations for the problem:
- Either save ucontexts or save jmp_bufs in normal program flow.
- Either extract pc/sp from previously saved context into signal ucontext and return from signal handler, or extract signal mask from signal ucontext and put it into the previously saved context. Jump then with longjmp/setcontext.
I would like to find out which combination is the best.
- Extract the signal mask from the signal ucontext into the previously saved ucontext and jump with setcontext. I got this working.
- Extract the signal mask from the signal ucontext, restore it with pthread_sigmask and return with a longjmp to the previously stored context. However, I guess there is a race condition between restoring the mask and longjumping. It seems there would also be a race condition if I overwrote the signal mask in the sigjmp_buf and returned with siglongjmp, because siglongjmp does essentially the same thing (restoring with pthread_sigmask and then jumping/stack switching). I got this working.
- Extract pc/sp from the previously saved ucontext into the signal ucontext and return from signal context. I don't know how to extract pc/sp from the ucontext. Do you have some example code?
- Extract pc/sp from the previously stored jmp_buf into the signal ucontext and return from signal context.
RACE CONDITIONS: I would especially like to know about the race conditions. I think the cases with longjmp/siglongjmp all have race conditions between signal mask restoration and jumping. Is this also the case with setcontext? What about the variants where one uses the signal handler context and just returns? Are those free of race conditions?
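The first combination listed above (copy the signal mask from the signal ucontext into the previously saved ucontext, then jump with setcontext) might be sketched as follows. SIGUSR1 is raised deliberately as a stand-in for the fault, SIGUSR2 is blocked beforehand only to show that the mask from the moment of the signal is the one that ends up installed, and every name is hypothetical.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <signal.h>
#include <stdbool.h>
#include <ucontext.h>

static ucontext_t saved;                 /* context saved in normal program flow */
static volatile sig_atomic_t jumped;

static void on_signal(int sig, siginfo_t *si, void *ctx) {
    (void)sig; (void)si;
    /* Put the interrupted code's mask into the saved context ... */
    saved.uc_sigmask = ((ucontext_t *)ctx)->uc_sigmask;
    jumped = 1;
    /* ... and let setcontext install it while switching back. */
    setcontext(&saved);
}

/* Returns true if, after the jump, SIGUSR2 is blocked (as it was when the
 * signal arrived) and SIGUSR1 is not left blocked by the handler. */
bool jump_with_setcontext(void) {
    struct sigaction sa = {0};
    sa.sa_sigaction = on_signal;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) == -1)
        return false;

    jumped = 0;
    if (getcontext(&saved) == -1)
        return false;

    if (!jumped) {
        /* Change the mask after getcontext so the copy in the handler matters. */
        sigset_t extra;
        sigemptyset(&extra);
        sigaddset(&extra, SIGUSR2);
        pthread_sigmask(SIG_BLOCK, &extra, NULL);
        raise(SIGUSR1);
        return false;                    /* not reached */
    }

    sigset_t cur;
    pthread_sigmask(SIG_SETMASK, NULL, &cur);
    return sigismember(&cur, SIGUSR2) == 1 && sigismember(&cur, SIGUSR1) == 0;
}
```

This sketch does not settle the race condition question; it only shows the mechanics of the combination.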
Both setcontext and longjmp have race conditions because they have to restore the signal mask first:
- http://osxr.org:8080/glibc/source/sysdeps/unix/sysv/linux/x86_64/setcontext.S
- http://osxr.org:8080/glibc/source/setjmp/longjmp.c
@rptb1 So the only hope is really modifying the signal handler ucontext as you suggested?
@minad Here's a quick demo I've knocked up for you. Just cc main.c && ./a.out
#define _GNU_SOURCE
#include <sys/ucontext.h>
#include <stdio.h>
#include <sys/types.h>
#include <err.h>
#include <string.h>
#include <unistd.h>
#include <signal.h>
#include <ucontext.h>
#include <stdbool.h>

static volatile bool jumped = false;
static volatile ucontext_t context;

static void handler(int sig, siginfo_t *info, void *p)
{
    printf("handler(%d, %p, %p)\n", sig, info, p);
    jumped = true;
    /* Overwrite PC and SP in the signal context; the kernel resumes
       there when the handler returns. */
    ucontext_t *uc = p;
    uc->uc_mcontext.gregs[REG_RIP] = context.uc_mcontext.gregs[REG_RIP];
    uc->uc_mcontext.gregs[REG_RSP] = context.uc_mcontext.gregs[REG_RSP];
}

int main(void)
{
    struct sigaction act, oldact;
    act.sa_sigaction = handler;
    act.sa_flags = SA_SIGINFO | SA_ONESHOT;
    sigemptyset(&act.sa_mask);
    if (sigaction(SIGUSR1, &act, &oldact) == -1)
        err(1, "sigaction");

    ucontext_t uc;
    getcontext(&uc);    /* execution resumes here after the handler returns */
    if (!jumped) {
        context = uc;
        printf("first time\n");
        kill(getpid(), SIGUSR1);
        printf("shouldn't reach this\n");
    } else {
        printf("second time\n");
    }
    return 0;
}
This is x86_64 specific, because of the register references.
Have a go at poking uc->uc_sigmask inside handler and let me know if that does what you want.
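One way to try the suggestion about uc->uc_sigmask is sketched below: the handler edits the mask stored in its ucontext and then returns normally, and on Linux the edited mask is the one in effect afterwards. SIGUSR2 is an arbitrary choice for the demonstration and the function names are hypothetical.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <signal.h>
#include <stdbool.h>
#include <ucontext.h>

static void on_signal(int sig, siginfo_t *si, void *ctx) {
    (void)sig; (void)si;
    ucontext_t *uc = ctx;
    /* Edit the mask that will be reinstated when the handler returns. */
    sigaddset(&uc->uc_sigmask, SIGUSR2);
}

static bool is_blocked(int sig) {
    sigset_t cur;
    pthread_sigmask(SIG_SETMASK, NULL, &cur);
    return sigismember(&cur, sig) == 1;
}

/* Returns true if SIGUSR2 is unblocked before the signal and blocked
 * after the handler has returned. */
bool mask_edit_survives_return(void) {
    struct sigaction sa = {0};
    sa.sa_sigaction = on_signal;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) == -1)
        return false;

    /* Start from a known state: SIGUSR2 unblocked. */
    sigset_t one;
    sigemptyset(&one);
    sigaddset(&one, SIGUSR2);
    pthread_sigmask(SIG_UNBLOCK, &one, NULL);
    bool blocked_before = is_blocked(SIGUSR2);

    raise(SIGUSR1);                  /* handler runs, edits uc_sigmask, returns */

    return !blocked_before && is_blocked(SIGUSR2);
}
```

Since the mask is installed as part of returning from the handler, no separate pthread_sigmask call is needed in the resumed code.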
@rptb1 Wow, great! thx! I wonder - is it really sufficient to only restore ip/sp?
I mean setcontext also does the following...
    movq oRSP(%rdi), %rsp
    movq oRBX(%rdi), %rbx
    movq oRBP(%rdi), %rbp
    movq oR12(%rdi), %r12
    movq oR13(%rdi), %r13
    movq oR14(%rdi), %r14
    movq oR15(%rdi), %r15
I think this is where my problem differs from your example. You jump to a known position with known register state. That is not what I am doing, because I am catching a segfault which could occur anywhere.
So I guess it might not work.
Maybe copying the whole opaque mcontext does what I want...
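A middle ground between restoring only ip/sp and copying the whole mcontext would be to copy the same registers that the setcontext excerpt above loads, plus the instruction pointer. The sketch below tries that. It is Linux/x86_64 specific, leaves the floating-point state pointer in the signal context untouched, and uses hypothetical names throughout.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <ucontext.h>

static ucontext_t saved;
static volatile sig_atomic_t jumped;

/* Instruction pointer, stack pointer and the callee-saved registers. */
static const int copied_regs[] = {
    REG_RIP, REG_RSP, REG_RBX, REG_RBP, REG_R12, REG_R13, REG_R14, REG_R15
};

static void on_signal(int sig, siginfo_t *si, void *ctx) {
    (void)sig; (void)si;
    ucontext_t *uc = ctx;
    for (size_t i = 0; i < sizeof copied_regs / sizeof copied_regs[0]; i++)
        uc->uc_mcontext.gregs[copied_regs[i]] = saved.uc_mcontext.gregs[copied_regs[i]];
    /* Make the resumed getcontext call appear to have returned 0. */
    uc->uc_mcontext.gregs[REG_RAX] = 0;
    jumped = 1;
    /* Returning lets the kernel resume at the saved context. */
}

/* Returns true if execution resumed after getcontext via the handler's return. */
bool resume_at_saved_context(void) {
    struct sigaction sa = {0};
    sa.sa_sigaction = on_signal;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) == -1)
        return false;

    jumped = 0;
    getcontext(&saved);
    if (!jumped) {
        raise(SIGUSR1);
        return false;                /* not reached */
    }
    return true;
}
```

Caller-saved registers are not restored, which is acceptable here only because the resume point is directly after a function call, where the compiler already assumes they were clobbered.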
@minad I just wanted to show you the essential logic. I think it's wise if you duplicate what setcontext does. Personally I would avoid relying on any local variables at the jump destination after such a manoeuvre, just in case.
Please let me know if and how well this works out for you.
@rptb1 I didn't really get it to work. I think I'll stick with plain setcontext/_longjmp from the signal handler and restore the signal mask afterwards. This works pretty well, I don't have to refer to all those registers, and it also avoids the race condition. About local variables: you are right about that. It is a documented issue of _setjmp/getcontext. If you use volatile local variables or a compiler memory barrier after _setjmp/getcontext, you are fine.
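The point about volatile locals can be seen in a small sketch: a local that is modified between `_setjmp` and `_longjmp` keeps a well-defined value only if it is declared volatile. The function name `count_attempts` is made up for this example.

```c
#define _GNU_SOURCE
#include <setjmp.h>

/* Jumps back to the _setjmp point until three attempts have been made. */
int count_attempts(void) {
    jmp_buf env;
    volatile int attempts = 0;   /* volatile: modified between _setjmp and _longjmp */

    _setjmp(env);                /* control returns here after each _longjmp */
    attempts++;
    if (attempts < 3)
        _longjmp(env, 1);
    return attempts;
}
```

Without the volatile qualifier, the value of `attempts` after the jump would be indeterminate according to the C standard, because the compiler may keep it in a register that the jump restores to its old contents.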
@minad An alternative would be to block just the signal used by the GC in uc_sigmask before setcontext, and then unblock it again on entering the saved continuation.
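A sketch of that alternative, under the assumption that the caller knows which signal the collector uses: the handler adds that signal to the mask of the saved context before calling setcontext, and the continuation unblocks it once it is running on its own stack again. Here the signal number is passed in as a parameter and SIGUSR1 stands in for the fault; all names are hypothetical and nothing below calls into bdwgc.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <signal.h>
#include <stdbool.h>
#include <ucontext.h>

static ucontext_t saved;
static volatile sig_atomic_t jumped;
static int gc_signal;                /* hypothetical stand-in for the collector's signal */

static void on_signal(int sig, siginfo_t *si, void *ctx) {
    (void)sig; (void)si;
    /* Start from the interrupted code's mask and additionally block the GC signal. */
    saved.uc_sigmask = ((ucontext_t *)ctx)->uc_sigmask;
    sigaddset(&saved.uc_sigmask, gc_signal);
    jumped = 1;
    setcontext(&saved);
}

static bool is_blocked(int sig) {
    sigset_t cur;
    pthread_sigmask(SIG_SETMASK, NULL, &cur);
    return sigismember(&cur, sig) == 1;
}

/* Returns true if the GC signal was blocked on arrival in the continuation
 * and is unblocked again afterwards. */
bool jump_with_gc_signal_deferred(int signal_used_by_gc) {
    gc_signal = signal_used_by_gc;

    struct sigaction sa = {0};
    sa.sa_sigaction = on_signal;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) == -1)
        return false;

    jumped = 0;
    if (getcontext(&saved) == -1)
        return false;
    if (!jumped) {
        raise(SIGUSR1);
        return false;                /* not reached */
    }

    /* Saved continuation: the stack switch is complete, so re-enable the signal. */
    bool blocked_on_arrival = is_blocked(gc_signal);
    sigset_t one;
    sigemptyset(&one);
    sigaddset(&one, gc_signal);
    pthread_sigmask(SIG_UNBLOCK, &one, NULL);
    return blocked_on_arrival && !is_blocked(gc_signal);
}
```

In this sketch the caller would pass whichever signal the collector is configured to use, for example `jump_with_gc_signal_deferred(SIGUSR2)` in a test.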