Skip to content
This repository has been archived by the owner on Jan 20, 2022. It is now read-only.

[LibOS] Race conditions among thread_sleep, thread_wakeup and DkThreadResume() in append_signal() #453

Closed
yamahata opened this issue Feb 2, 2019 · 7 comments

Comments

@yamahata
Copy link
Contributor

yamahata commented Feb 2, 2019

There are several race conditions among thread_sleep, thread_wakeup and DkThreadResume.
The test program at #452
shows the program stops execution.

There would be needed a way for LibOS to atomically sleep/wakeup in Pal API regarding to host signal.

@dimakuv
Copy link

dimakuv commented Feb 14, 2019

The situation is actually more complicated.

Upon testing program in #452, I realized that sigsuspend() incorrectly puts the thread to sleep even if there is a pending signal which becomes unblocked thanks to the new signal mask (SIGALRM in our program). If we change the timeout for nanosleep() in the testing program:

...
        struct timespec t = {
            .tv_sec = 5,    // sleep for 5 seconds so SIGALRM signal is blocked
            .tv_nsec = 0
        };
...

The program executes correctly on normal Linux (it doesn't hang). This is the behavior of sigsuspend(): all pending signals are delivered on sigsuspend() invocation. Since the program actually has a pending signal SIGALRM due to alarm(), sigsuspend() returns immediately.

The program deterministically hangs on Graphene-SGX. The problem is that shim_do_sigsuspend() doesn't have this piece of logic where it checks for any pending signals before going to sleep.

Thus, before fixing the race conditions as proposed by @yamahata, we should fix this issue.

The fix: add the check of signal_logs of now-unblocked signals in shim_do_sigsuspend() (similar to shim_do_sigpending()). If there is any pending signal in signal_logs, deliver this signal instead of thread_sleep() and return from shim_do_sigsuspend().

@donporter
Copy link
Contributor

Good catch. Do you have a patch? And is your proposal not to merge #452/#454?

@dimakuv
Copy link

dimakuv commented Feb 14, 2019

@donporter I will need more time to analyze the current pending-signal handling, what @yamahata has done already in this direction, and to propose a fix for this particular issue. In the meantime, I would like to block the merging of #452, #454, and #455.

@yamahata
Copy link
Contributor Author

@dimakuv, i didn’t say #454 fixes all the issues. it only address only one of them. i don’ t think only shim change can solve it, but pal changes are needed. #445 is totally different stuff.

@dimakuv
Copy link

dimakuv commented Feb 15, 2019

@yamahata I understand this. I just wanted to understand the issues with signal handling deeper and then review your PRs. I have no doubt these are legitimate fixes, just need more time to understand what's going on there.

@dimakuv
Copy link

dimakuv commented Jul 17, 2019

PR #811 solves the issue with shim_do_sigsuspend() (among other things).

@boryspoplawski
Copy link
Contributor

There should be no more races, thread_sleep checks for pending signals.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants