
Race conditions between thread termination and garbage collection about bdwgc HOT 17 CLOSED

ivmai commented on July 17, 2024

Race conditions between thread termination and garbage collection

from bdwgc.

Comments (17)

minad commented on July 17, 2024

Ok, I guess it is an issue with signal handling and threading. I have a signal handler that is left via longjmp, which doesn't restore the signal mask correctly. This causes a problem if a garbage collection is running at that time.

I am trying to catch stack overflow segfaults with a sigaltstack, but I can't get it to work correctly with bdwgc. This was also discussed before here: http://www.hpl.hp.com/hosted/linux/mail-archives/gc/2012-May/005173.html Maybe @hboehm could help?
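
For readers following along, here is a minimal sketch of the sigaltstack setup being described, assuming Linux/glibc. The names `install_alt_stack_handler`, `probe_handler`, and `ran_on_alt_stack` are made up for illustration, and the 64 KiB stack size is an assumption. The handler only records whether it ran on the alternate stack; it does not attempt any recovery.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names, for illustration only. */
static char *alt_base;
static size_t alt_size;
static volatile sig_atomic_t ran_on_alt_stack;

/* Records whether the handler's own frame lies inside the alternate stack. */
static void probe_handler(int sig)
{
    char marker;
    uintptr_t here = (uintptr_t)&marker;
    (void)sig;
    ran_on_alt_stack = here >= (uintptr_t)alt_base &&
                       here < (uintptr_t)alt_base + alt_size;
}

/* Registers an alternate stack for the calling thread and installs
 * probe_handler for signo with SA_ONSTACK.
 * Returns 0 on success, -1 on error. */
int install_alt_stack_handler(int signo)
{
    stack_t ss;
    struct sigaction sa;

    alt_size = 64 * 1024;   /* assumed size, meant to be well above MINSIGSTKSZ */
    alt_base = malloc(alt_size);
    if (alt_base == NULL)
        return -1;

    memset(&ss, 0, sizeof ss);
    ss.ss_sp = alt_base;
    ss.ss_size = alt_size;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) == -1)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = probe_handler;
    sa.sa_flags = SA_ONSTACK;   /* run the handler on the alternate stack */
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}
```

A real stack overflow handler would be installed for SIGSEGV in the same way; any signal works for checking that the handler really runs on the alternate stack.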


minad commented on July 17, 2024

I think I fixed the issue in my program and it isn't a problem with bdwgc. Sorry for bothering you.

Just for future reference:

This might also be interesting for @paurkedal (I've seen your culibs library).

I use the ucontext to retrieve the sigmask before the segfault and restore with pthread_sigmask. Maybe you know of a better way?

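static void handleSigsegv(int sig, siginfo_t *si, void* ctx) {
        char* addr = (char*)si->si_addr;
        /* Fault address inside the stack or its guard page: treat it as a stack overflow. */
        if (addr >= stackLow - pageSize && addr <= stackHigh) {
                /* Remember the mask from before the fault; it is restored with pthread_sigmask after the jump. */
                oldmask = ((ucontext_t*)ctx)->uc_sigmask;
                longjmp(...);
        }
        abort();
}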
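
A self-contained sketch of the same idea, assuming Linux/glibc. It uses `raise` on an ordinary signal instead of a real segfault, `_setjmp`/`_longjmp` for the jump, and `sigprocmask` in place of `pthread_sigmask` (the per-thread call to use in a multithreaded program). The function names are hypothetical. It shows that the signal stays blocked after jumping out of the handler until the saved mask is put back by hand.

```c
#define _GNU_SOURCE
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <ucontext.h>

static jmp_buf recovery_point;
static sigset_t oldmask;   /* mask that was in effect before the signal */

static void on_signal(int sig, siginfo_t *si, void *ctx)
{
    (void)sig;
    (void)si;
    oldmask = ((ucontext_t *)ctx)->uc_sigmask;
    _longjmp(recovery_point, 1);   /* leaves the handler; the mask is NOT restored */
}

static int is_blocked(int signo)
{
    sigset_t now;
    sigprocmask(SIG_SETMASK, NULL, &now);   /* query only */
    return sigismember(&now, signo);
}

/* Returns 1 if signo was still blocked right after the jump and is
 * unblocked again after the manual restore, 0 otherwise, -1 on error. */
int recover_and_restore_mask(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_signal;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(signo, &sa, NULL) == -1)
        return -1;

    if (_setjmp(recovery_point) == 0) {
        raise(signo);
        return -1;   /* only reached if the handler did not jump */
    }

    int blocked_after_jump = is_blocked(signo);
    sigprocmask(SIG_SETMASK, &oldmask, NULL);   /* manual restore */
    int blocked_after_restore = is_blocked(signo);
    return blocked_after_jump == 1 && blocked_after_restore == 0;
}
```

The window between the jump and the manual restore is where the signal remains blocked, which is the part that matters when a collector relies on signals.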

The second important thing is to use stack banging in the functions which could trigger a stack overflow.

void bang(void) {
        volatile char x;
        /* Touch the page below the current frame so that an overflow faults here. */
        *(&x - 4096);
}

The stack banging is especially important if allocations are done. This could be resolved by using separate collector threads which is not possible right now, I guess?


rptb1 commented on July 17, 2024

For what it's worth, I wouldn't ever trust longjmp out of a signal handler, since you're skipping the context restore and any other unknown magic that the runtime system/kernel needs to do at that point. What you can do on a lot of systems is write to the ucontext so that when you return from the signal handler you end up where you want to be.


paurkedal commented on July 17, 2024

@minad I think the similar culibs code (cuflow/wind.h) is okay, as it uses siglongjmp or unw_resume from libunwind if available.


minad commented on July 17, 2024

@rptb1 Thank you for your answer! It seems to work quite well now with longjmp and manual signal mask restoration (uc_sigmask from ucontext). But your idea to overwrite the ucontext is also a nice one - I would like to try that. But then I would have to either convert from the jmp_buf to the ucontext, which would be ugly, or switch everything to getcontext/setcontext.


paurkedal commented on July 17, 2024

According to the man page, setcontext does not return if the call is successful. I interpret that to mean that it jumps directly instead of waiting for the signal handler to return.


minad commented on July 17, 2024

@paurkedal longjmp also jumps directly out of the signal handler. That is why I had problems with the signal mask in the first place.


paurkedal commented on July 17, 2024

@minad Sure, and so does siglongjmp, but at least siglongjmp and setcontext restore the signal mask. I was mostly wondering: @rptb1, did you have something other than setcontext in mind when you suggested writing to ucontext?
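
A small sketch of the siglongjmp behaviour mentioned here, assuming a POSIX system. The function name is hypothetical, and `sigprocmask` is used only to query the mask. With a nonzero `savemask` argument, `sigsetjmp` records the current mask and `siglongjmp` puts it back, so the signal is no longer blocked once the handler has been left.

```c
#define _GNU_SOURCE
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf recovery_point;

static void on_signal(int sig)
{
    (void)sig;
    siglongjmp(recovery_point, 1);   /* restores the mask saved by sigsetjmp */
}

/* Returns 1 if signo is unblocked after siglongjmp has left the handler,
 * 0 if it is still blocked, -1 on error. */
int siglongjmp_restores_mask(int signo)
{
    struct sigaction sa;
    sigset_t now;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(signo, &sa, NULL) == -1)
        return -1;

    if (sigsetjmp(recovery_point, 1) == 0) {   /* 1: save the signal mask too */
        raise(signo);
        return -1;   /* only reached if the handler did not jump */
    }

    sigprocmask(SIG_SETMASK, NULL, &now);   /* query only */
    return sigismember(&now, signo) ? 0 : 1;
}
```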


minad commented on July 17, 2024

@paurkedal Well the point is - do you want to have the signal mask restored before or AFTER the stack switch assuming that the handler uses a sigaltstack? Therefore neither siglongjmp nor setcontext might work :)

It is pretty easy to play with the different variants. This is what I am using right now:

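/* Three alternative definitions; only one of them is enabled at a time. */

/* Variant 1: getcontext/setcontext */
extern __thread volatile bool jumped;
#define JMP_BUF      ucontext_t
#define SETJMP(buf)  (getcontext(&(buf)), jumped ? (jumped = false, true) : false)
#define LONGJMP(buf) (jumped = true, setcontext(&(buf)))

/* Variant 2: _setjmp/_longjmp (the signal mask is neither saved nor restored) */
#define JMP_BUF      jmp_buf
#define SETJMP(buf)  _setjmp(buf)
#define LONGJMP(buf) _longjmp(buf, 1)

/* Variant 3: sigsetjmp/siglongjmp (the signal mask is saved and restored) */
#define JMP_BUF      sigjmp_buf
#define SETJMP(buf)  sigsetjmp(buf, 1)
#define LONGJMP(buf) siglongjmp(buf, 1)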
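
To show how the getcontext/setcontext variant of these macros is meant to be used, here is a minimal sketch outside of any signal handler, assuming glibc and a compiler that supports `__thread`. The functions `fail_deep` and `run_protected` are hypothetical. The `jumped` flag is what lets the code after `getcontext` tell the first pass from the arrival via `LONGJMP`.

```c
#define _GNU_SOURCE
#include <stdbool.h>
#include <ucontext.h>

static __thread volatile bool jumped;

#define JMP_BUF      ucontext_t
#define SETJMP(buf)  (getcontext(&(buf)), jumped ? (jumped = false, true) : false)
#define LONGJMP(buf) (jumped = true, setcontext(&(buf)))

static JMP_BUF recovery_point;

/* Recurses a few frames down and then jumps back to the recovery point. */
static void fail_deep(int depth)
{
    if (depth == 0)
        LONGJMP(recovery_point);
    fail_deep(depth - 1);
}

/* Returns 1 if control came back through LONGJMP, 0 otherwise. */
int run_protected(void)
{
    if (SETJMP(recovery_point))
        return 1;   /* second pass: arrived via LONGJMP */
    fail_deep(3);
    return 0;       /* only reached if the jump did not happen */
}
```

The saved context is only valid while the frame that called `getcontext` is still live, which is the same restriction that applies to `setjmp`.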


rptb1 commented on July 17, 2024

@minad Use getcontext rather than setjmp, and extract PC/SP into the signal context. You have to do some architecture-specific register poking, but I have had this work reliably.
(For what it's worth I've just spent a couple of months on commercial work where signals, threads, and signal masks are all mixed up in a hostile environment. Fortunately it was Linux/Intel specific, so I was able to make assumptions. But it ain't pretty.)


minad commented on July 17, 2024

@rptb1 Thank you very much for your help. I am working on some kind of vm/runtime thing also on Linux/Intel right now, so it is acceptable to make some assumptions I guess. But it can get very ugly probably 😬

The particular problem I am dealing with here is that I am catching SIGSEGV from a stack overflow (stack banging into a guard page), which makes both the signal mask and the signal handler stack a problem.

To summarize - I see the following combinations for the problem:

1. Either save ucontexts or save jmp_bufs in normal program flow.
2. Either extract pc/sp from the previously saved context into the signal ucontext and return from the signal handler, or extract the signal mask from the signal ucontext and put it into the previously saved context, then jump with longjmp/setcontext.

I would like to find out which combination is the best.

1. Extract the signal mask from the signal ucontext into the previously saved ucontext and jump with setcontext. I got this working.
2. Extract the signal mask from the signal ucontext, restore it with pthread_sigmask, and return with a longjmp to the previously stored context. However, I guess there is a race condition between restoring the mask and longjumping. It seems there would also be a race condition if I overwrote the signal mask in the sigjmp_buf and returned with siglongjmp, because siglongjmp does essentially the same thing (restoring with pthread_sigmask and then jumping/stack switching). I got this working.
3. Extract pc/sp from the previously saved ucontext into the signal ucontext and return from signal context. I don't know how to extract pc/sp from the ucontext. Do you have some example code?
4. Extract pc/sp from the previously stored jmp_buf into the signal ucontext and return from signal context.
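
The first of these options (copy the signal mask from the signal ucontext into the saved ucontext, then jump with setcontext) could look roughly like the sketch below. It assumes Linux/glibc, where calling `setcontext` from a handler works in practice, and uses `raise` on an ordinary signal rather than a real fault. The function name is hypothetical, and `sigprocmask` is used only to query the mask.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>
#include <ucontext.h>

static ucontext_t saved_ctx;
static volatile sig_atomic_t arrived;

static void on_signal(int sig, siginfo_t *si, void *p)
{
    const ucontext_t *uc = p;
    (void)sig;
    (void)si;
    /* Give the saved context the mask that was in effect before the signal. */
    saved_ctx.uc_sigmask = uc->uc_sigmask;
    arrived = 1;
    setcontext(&saved_ctx);   /* does not return on success */
}

/* Returns 1 if signo is unblocked after the jump, 0 if it is still
 * blocked, -1 on error. */
int jump_with_setcontext(int signo)
{
    struct sigaction sa;
    sigset_t now;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_signal;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(signo, &sa, NULL) == -1)
        return -1;

    arrived = 0;
    getcontext(&saved_ctx);   /* control arrives here twice */
    if (!arrived) {
        raise(signo);
        return -1;   /* only reached if the handler did not jump */
    }

    sigprocmask(SIG_SETMASK, NULL, &now);   /* query only */
    return sigismember(&now, signo) ? 0 : 1;
}
```

Note that `getcontext` writes directly into the static `saved_ctx`. On glibc the saved context contains a pointer into itself for the floating-point state, so copying a `ucontext_t` by assignment and then passing the copy to `setcontext` is best avoided.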

RACE CONDITIONS: I would especially like to know about the race conditions. I think the cases with longjmp/siglongjmp all have race conditions between signal mask restoration and jumping. Is this also the case with setcontext? What about the variants where one uses the signal handler context and just returns. Are those race condition free?


minad commented on July 17, 2024

Both setcontext and longjmp have race conditions because they have to restore the signal mask first:

@rptb1 So the only hope is really modifying the signal handler ucontext as you suggested?


rptb1 commented on July 17, 2024

@minad Here's a quick demo I've knocked up for you. Just cc main.c && ./a.out

#define _GNU_SOURCE
#include <sys/ucontext.h>
#include <stdio.h>
#include <sys/types.h>
#include <err.h>
#include <string.h>
#include <unistd.h>
#include <signal.h>
#include <ucontext.h>
#include <stdbool.h>

static volatile bool jumped = false;
static volatile ucontext_t context;

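static void handler(int sig, siginfo_t *info, void *p)
{
    /* printf is not async-signal-safe; it is used here only for the demo. */
    printf("handler(%d, %p, %p)\n", sig, (void *)info, p);
    jumped = true;
    ucontext_t *uc = p;
    /* Redirect the interrupted context: when the handler returns, execution
       resumes at the PC/SP that getcontext saved in main. */
    uc->uc_mcontext.gregs[REG_RIP] = context.uc_mcontext.gregs[REG_RIP];
    uc->uc_mcontext.gregs[REG_RSP] = context.uc_mcontext.gregs[REG_RSP];
}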

int main(void)
{
    struct sigaction act, oldact;
    act.sa_sigaction = handler;
    act.sa_flags = SA_SIGINFO | SA_ONESHOT;
    sigemptyset(&act.sa_mask);
    if (sigaction(SIGUSR1, &act, &oldact) == -1)
        err(1, "sigaction");
    ucontext_t uc;
    getcontext(&uc);
    if (!jumped) {
        context = uc;
        printf("first time\n");
        kill(getpid(), SIGUSR1);
        printf("shouldn't reach this\n");
    } else {
        printf("second time\n");
    }
    return 0;
}

This is x86_64 specific, because of the register references.

Have a go at poking uc->uc_sigmask inside handler and let me know if that does what you want.


minad commented on July 17, 2024

@rptb1 Wow, great! thx! I wonder - is it really sufficient to only restore ip/sp?

I mean setcontext also does the following...

    movq    oRSP(%rdi), %rsp
    movq    oRBX(%rdi), %rbx
    movq    oRBP(%rdi), %rbp
    movq    oR12(%rdi), %r12
    movq    oR13(%rdi), %r13
    movq    oR14(%rdi), %r14
    movq    oR15(%rdi), %r15

I think this is where my problem differs from your example. You jump to a known position with known register state. This is not what I am doing, because I am catching a segfault that could occur anywhere.
So I guess it might not work.

Maybe copying the whole opaque mcontext does what I want...


rptb1 commented on July 17, 2024

@minad I just wanted to show you the essential logic. I think it's wise if you duplicate what setcontext does. Personally I would avoid relying on any local variables at the jump destination after such a manoeuvre, just in case.

Please let me know if and how well this works out for you.
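
One possible reading of "duplicate what setcontext does" is sketched below. It assumes Linux on x86_64 with glibc, and the function name and the exact register list are assumptions taken from the setcontext listing quoted earlier in the thread. The handler copies the callee-saved registers, the stack pointer, and the program counter from the saved context into the signal ucontext and then simply returns, so the kernel restores both the registers and the signal mask.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <ucontext.h>

#if !defined(__linux__) || !defined(__x86_64__)
#error "This sketch assumes Linux on x86_64"
#endif

static ucontext_t saved_ctx;
static volatile sig_atomic_t arrived;

/* Registers reloaded by the quoted setcontext listing, plus RIP. */
static const int copied_regs[] = {
    REG_RSP, REG_RBX, REG_RBP, REG_R12, REG_R13, REG_R14, REG_R15, REG_RIP
};

static void on_signal(int sig, siginfo_t *si, void *p)
{
    ucontext_t *uc = p;
    size_t i;
    (void)sig;
    (void)si;
    for (i = 0; i < sizeof copied_regs / sizeof copied_regs[0]; i++)
        uc->uc_mcontext.gregs[copied_regs[i]] =
            saved_ctx.uc_mcontext.gregs[copied_regs[i]];
    uc->uc_mcontext.gregs[REG_RAX] = 0;   /* make getcontext appear to return 0 */
    arrived = 1;
    /* Returning lets the kernel resume from *uc, including uc_sigmask. */
}

/* Returns 1 if execution resumed after getcontext with signo unblocked,
 * 0 if signo is still blocked, -1 on error. */
int resume_via_signal_context(int signo)
{
    struct sigaction sa;
    sigset_t now;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_signal;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(signo, &sa, NULL) == -1)
        return -1;

    arrived = 0;
    getcontext(&saved_ctx);   /* control arrives here twice */
    if (!arrived) {
        raise(signo);
        return -1;   /* only reached if the redirect did not happen */
    }

    sigprocmask(SIG_SETMASK, NULL, &now);   /* query only */
    return sigismember(&now, signo) ? 0 : 1;
}
```

Caller-saved registers and floating-point state are left as they were at the point of interruption, so, as suggested above, it seems safest not to rely on non-volatile locals at the jump destination.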


minad commented on July 17, 2024

@rptb1 I didn't really get it to work. I think I will stick with the normal setcontext/_longjmp from the signal handler and restore the signal mask afterwards. This works pretty well, I don't have to refer to all those registers, and I also avoid the race condition. About local variables - you are right about that. That is a documented issue of _setjmp/getcontext. If you use volatile local variables ~~or a compiler memory barrier~~ after _setjmp/getcontext you are fine.


paurkedal commented on July 17, 2024

@minad An alternative would be to block just the signal used by the GC in uc_sigmask before setcontext, and then unblock it again on entering the saved continuation.
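
A rough sketch of this alternative, assuming Linux/glibc. The signal passed as `held_signo` is only a stand-in for whatever signal the collector uses, and all names are hypothetical. The handler adds that signal to the mask of the saved context before `setcontext`, and the landing site restores the true pre-signal mask once it is running on the normal stack again.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>
#include <ucontext.h>

static ucontext_t saved_ctx;
static sigset_t pre_signal_mask;   /* mask in effect before the signal */
static int deferred_signo;         /* stand-in for the collector's signal */
static volatile sig_atomic_t arrived;

static void on_signal(int sig, siginfo_t *si, void *p)
{
    const ucontext_t *uc = p;
    (void)sig;
    (void)si;
    pre_signal_mask = uc->uc_sigmask;
    saved_ctx.uc_sigmask = uc->uc_sigmask;
    sigaddset(&saved_ctx.uc_sigmask, deferred_signo);   /* keep it blocked across the jump */
    arrived = 1;
    setcontext(&saved_ctx);   /* does not return on success */
}

static int is_blocked(int signo)
{
    sigset_t now;
    sigprocmask(SIG_SETMASK, NULL, &now);   /* query only */
    return sigismember(&now, signo);
}

/* Returns 1 if held_signo was blocked on arrival and unblocked after the
 * pre-signal mask was restored, 0 otherwise, -1 on error. */
int jump_with_signal_deferred(int trigger_signo, int held_signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_signal;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(trigger_signo, &sa, NULL) == -1)
        return -1;

    deferred_signo = held_signo;
    arrived = 0;
    getcontext(&saved_ctx);   /* control arrives here twice */
    if (!arrived) {
        raise(trigger_signo);
        return -1;   /* only reached if the handler did not jump */
    }

    int blocked_on_arrival = is_blocked(held_signo);
    sigprocmask(SIG_SETMASK, &pre_signal_mask, NULL);   /* re-enable it */
    return blocked_on_arrival == 1 && is_blocked(held_signo) == 0;
}
```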

