-
Notifications
You must be signed in to change notification settings - Fork 705
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
how to deal with syscall interruption due to (runtime) signals #853
Labels
documentation
Improvements or additions to documentation
Comments
lmb
added a commit
to lmb/ebpf
that referenced
this issue
Dec 12, 2022
Extend the profiler blocking we introduced for BPF_PROG_LOAD to BPF_PROG_RUN, with the difference that EINTR is not handled in the syscall wrapper itself. This allows us to have PROG_RUN specific retry logic. Deprecate RunOptions.Reset since its main motivation was to account for profiler skew. Interruptions from the runtime are only possible via runtime.AllThreadSyscall, which is very specialised anyways. This doesn't protect from interruptions due to signals that are being sent from "outside" the process. Document that RunOptions.Repeat is a lower bound for how often a program is run due to the retry behaviour. Also add a workaround that makes the common case of repeat == 1 not suffer from interruptions. Fixes cilium#853
lmb
added a commit
to lmb/ebpf
that referenced
this issue
Dec 12, 2022
Extend the profiler blocking we introduced for BPF_PROG_LOAD to BPF_PROG_RUN, with the difference that EINTR is not handled in the syscall wrapper itself. This allows us to have PROG_RUN specific retry logic. Deprecate RunOptions.Reset since its main motivation was to account for profiler skew. Interruptions from the runtime are only possible via runtime.AllThreadSyscall, which is very specialised anyways. This doesn't protect from interruptions due to signals that are being sent from "outside" the process, which will still skew results and cause retries. Document that RunOptions.Repeat is a lower bound for how often a program is run due to the retry behaviour. Also add a workaround that makes the common case of repeat == 1 not suffer from interruptions. Fixes cilium#853
lmb
added a commit
to lmb/ebpf
that referenced
this issue
Dec 12, 2022
Extend the profiler blocking we introduced for BPF_PROG_LOAD to BPF_PROG_RUN, with the difference that EINTR is not handled in the syscall wrapper itself. This allows us to have PROG_RUN specific retry logic. Deprecate RunOptions.Reset since its main motivation was to account for profiler skew. Interruptions from the runtime are only possible via runtime.AllThreadSyscall, which is very specialised anyways. This doesn't protect from interruptions due to signals that are being sent from "outside" the process, which will still skew results and cause retries. Document that RunOptions.Repeat is a lower bound for how often a program is run due to the retry behaviour. Also add a workaround that makes the common case of repeat == 1 not suffer from interruptions. Fixes cilium#853
lmb
added a commit
to lmb/ebpf
that referenced
this issue
Dec 13, 2022
Extend the profiler blocking we introduced for BPF_PROG_LOAD to BPF_PROG_RUN, with the difference that EINTR is not handled in the syscall wrapper itself. This allows us to have PROG_RUN specific retry logic. Deprecate RunOptions.Reset since its main motivation was to account for profiler skew. Interruptions from the runtime are only possible via runtime.AllThreadSyscall, which is very specialised anyways. This doesn't protect from interruptions due to signals that are being sent from "outside" the process, which will still skew results and cause retries. Document that RunOptions.Repeat is a lower bound for how often a program is run due to the retry behaviour. Also add a workaround that makes the common case of repeat == 1 not suffer from interruptions. Fixes cilium#853
ti-mo
pushed a commit
that referenced
this issue
Dec 13, 2022
Extend the profiler blocking we introduced for BPF_PROG_LOAD to BPF_PROG_RUN, with the difference that EINTR is not handled in the syscall wrapper itself. This allows us to have PROG_RUN specific retry logic. Deprecate RunOptions.Reset since its main motivation was to account for profiler skew. Interruptions from the runtime are only possible via runtime.AllThreadSyscall, which is very specialised anyways. This doesn't protect from interruptions due to signals that are being sent from "outside" the process, which will still skew results and cause retries. Document that RunOptions.Repeat is a lower bound for how often a program is run due to the retry behaviour. Also add a workaround that makes the common case of repeat == 1 not suffer from interruptions. Fixes #853
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
BPF_PROG_LOAD and BPF_PROG_RUN are interrupted when a signal is sent to the thread executing the syscall. This leads to an unfortunate interaction with the Go runtime, which uses signals to implement the following features:
In my opinion, the library must deal with all of these gracefully.
Async preemption
Async preemption is not used when a thread is in a syscall, since that is a "blocked safe point" already. No need to worry about SIGURG therefore.
CPU profiling
The runtime periodically sends SIGPROF to all threads if CPU profiling is enabled. This includes threads in a syscall.
We can block SIGPROF at the expense of less accurate profiles. This solves the most common cause of problems. We currently do this for BPF_PROG_LOAD but not for BPF_PROG_RUN.
All threads syscall
syscall.AllThreadSyscall sends SIGSETXID to invoke runPerThreadSyscall from the signal handler.
Blocking SIGSETXID would prevent invoking runPerThreadSyscall and cause the runtime to hang. We have to leave the signal unblocked and retry the syscall instead.
Since we can't know when a syscall interruption was due to AllThreadsSyscall we have to always retry.
Conclusion
I think we should extend the BPF_PROG_LOAD behaviour to BPF_PROG_RUN:
I expect that AllThreadsSyscall interrupting PROG_RUN is actually very rare. So I propose to also remove RunOptions.Reset and instead document that AllThreadsSyscall or signals may skew the results of a Run. Most importantly we have to document that
RunOptions.Repeat
is basically a lower bound of how often a program is run due to retries.The text was updated successfully, but these errors were encountered: