Inconsistent signal-handling behavior in java_stub_template / java_binary #6338
Labels: P3, stale, team-Rules-Java, type: bug
Based on manual code reading, I believe that the shell wrapper scripts generated from `java_stub_template.txt` have a signal-handling bug: whether signals are passed from the `bash` wrapper to `java` appears to depend on the string length of the child JVM's classpath. I fear that this may cause subtle, hard-to-diagnose bugs if an increase in classpath length can trigger such a significant behavior change.

To spot the problem, let's take a look at `java_stub_template.txt` as of the current master commit. At the very bottom of the file, there is a branch that either runs Java via `exec` (the common case) or, if the classpath is sufficiently long, calls `create_and_run_classpath_jar`.

In `create_and_run_classpath_jar`, Java is invoked as an ordinary child process rather than through `exec`. This file does not use bash's `trap` built-in, so in this case signals like `SIGTERM` will not be propagated to the child `java` process.

If a `SIGTERM` is sent to the bash wrapper script, it will reach `java` in the common case where `exec` is used (because `exec` replaces the wrapper process with `java`), but if the classpath becomes sufficiently long then the `create_and_run_classpath_jar` path will be taken and signals will not be propagated.

To fix this, I think that Bazel needs to make proper use of `trap` or signal handlers in `create_and_run_classpath_jar` to ensure that signals are consistently forwarded to the child process. We can't simply use `exec` because we need to perform the `rm -rf` calls to clean up the temporary manifest files. It turns out that this can be subtle and hard to get right: I found a good blog post at https://veithen.github.io/2014/11/16/sigterm-propagation.html which illustrates some of the pitfalls involved. Also see https://unix.stackexchange.com/questions/146756/forward-sigterm-to-child-in-bash/444676#444676, linked from a comment on that blog post.

It appears that the Linux/Darwin version of this bug was introduced in 102ce6d (which fixed #3069) because the Linux/Darwin "large classpath" case was using `exec` prior to that patch's changes.

I spotted this problem through manual code reading rather than through an actual failure / bug reproduction: I encountered a similar signal-handling issue in one of my own Bazel rules (which generates a similar, but much simpler, wrapper script) and decided to look at Bazel's scripts to see whether they used `exec` to address this signal-handling problem.
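
As a rough illustration of the `trap`-based forwarding suggested above, here is a minimal sketch of a wrapper that runs a command as a background child, relays `SIGTERM`/`SIGINT` to it, and still performs the `rm -rf` cleanup afterwards. It is not the code in `java_stub_template.txt`: the function name `run_with_signal_forwarding`, the `fwd_*` variables, and the convention of passing the temporary directory as the first argument are all assumptions made for this example. Edge cases remain (for instance, a signal that arrives in the instant before the trap is installed), which is part of why the linked posts call this hard to get right.

```shell
#!/bin/sh
# Hypothetical sketch, NOT the code in java_stub_template.txt.
# Usage: run_with_signal_forwarding TMP_DIR COMMAND [ARGS...]
# Runs COMMAND as a background child, forwards TERM/INT to it, removes
# TMP_DIR once the child has exited, and returns the child's exit status.
run_with_signal_forwarding() {
  fwd_tmp_dir=$1
  shift

  fwd_interrupted=0
  "$@" &
  fwd_child_pid=$!

  # Relay both signals as TERM: a background child of a non-interactive
  # shell starts with SIGINT ignored, so relaying INT itself may do nothing.
  trap 'fwd_interrupted=1; kill -TERM "$fwd_child_pid" 2>/dev/null || true' TERM INT

  fwd_status=0
  wait "$fwd_child_pid" || fwd_status=$?

  # A trapped signal makes `wait` return early with a status above 128 even
  # though the child may still be running, so wait again for its real status.
  while [ "$fwd_interrupted" -eq 1 ] && [ "$fwd_status" -gt 128 ]; do
    fwd_interrupted=0
    fwd_status=0
    wait "$fwd_child_pid" || fwd_status=$?
  done

  trap - TERM INT
  rm -rf "$fwd_tmp_dir"
  return "$fwd_status"
}
```

A caller would use it along the lines of `run_with_signal_forwarding "$manifest_dir" java -classpath "$jar" Main`, where `manifest_dir`, `jar`, and `Main` are placeholders. The design choice is to keep the wrapper alive so the cleanup can run, at the cost of having to relay signals by hand, which `exec` would otherwise make unnecessary.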