Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent signal-handling behavior in java_stub_template / java_binary #6338

Closed
JoshRosen opened this issue Oct 9, 2018 · 3 comments
Closed
Labels
P3 We're not considering working on this, but happy to review a PR. (No assignee) stale Issues or PRs that are stale (no activity for 30 days) team-Rules-Java Issues for Java rules type: bug

Comments

@JoshRosen
Copy link

JoshRosen commented Oct 9, 2018

Based on manual code reading, I believe that the shell wrapper scripts generated from java_stub_template.txt have a signal-handling bug: it looks like the behavior of passing of signals from the bash wrapper to java is dependent on the string length of the child JVM's classpath and I fear that this may cause subtle, hard-to-diagnose bugs if a classpath length increase can trigger such a significant behavior change.

To spot the problem, let's take a look at java_stub_template.txt as of the current master commit:

At the very bottom of the file, we have this branch:

if is_windows && (("${#CLASSPATH}" > ${CLASSPATH_LIMIT} )); then
  create_and_run_classpath_jar "local_jdk/bin/jar.exe"
elif (("${#CLASSPATH}" > ${CLASSPATH_LIMIT})); then
  create_and_run_classpath_jar "local_jdk/bin/jar"
else
  exec $JAVABIN -classpath $CLASSPATH "${ARGS[@]}"
fi

In create_and_run_classpath_jar, Java is invoked as

  # Execute JAVA command
  $JAVABIN -classpath "$MANIFEST_JAR_FILE" "${ARGS[@]}"
  exit_code=$?
  rm -f "$MANIFEST_FILE"
  rm -f "$MANIFEST_JAR_FILE"
  exit $exit_code

This file does not use bash's trap built-in, so in this case signals like SIGTERM will not be propagated to the child java process.

If a SIGTERM is sent to the bash wrapper script then it will be propagated to java in the common case where exec is used, but if the classpath becomes sufficiently long then the create_and_run_classpath_jar path will be taken and signals will not be propagated.

To fix this, I think that Bazel needs to make proper use of trap or signal handlers in create_and_run_classpath_jar to ensure that signals are consistently forwarded to the child process. We can't simply use exec because we need to perform the rm -rf calls to clean up the temporary manifest files. It turns out that this can be subtle and hard to get right: I found a good blog post at https://veithen.github.io/2014/11/16/sigterm-propagation.html which illustrates some of the pitfalls involved. Also see https://unix.stackexchange.com/questions/146756/forward-sigterm-to-child-in-bash/444676#444676, linked from a comment on that blog post.

It appears that the Linux/Darwin version of this bug was introduced in 102ce6d (which fixed #3069) because the Linux/Darwin "large classpath" case was using exec prior to that patch's changes.

I spotted this problem through manual code reading rather than through an actual failure / bug reproduction: I encountered a similar signal-handling issue in one of my my own Bazel rules (which generates a similar, but much simpler, wrapper script) and decided to look at Bazel's scripts to see whether they used exec to address this signal-handling problem.

@ulfjack
Copy link
Contributor

ulfjack commented Jan 25, 2019

This is really odd. I have just seen the same problem in test-setup.sh. However, I did not have to follow the referenced blog post, but just adding a trap handler for SIGTERM seems to make it work correctly. Anyone working on this might want to try that first (I'm cc'ing this bug in the patch, so it should show up when it's submitted).

bazel-io pushed a commit that referenced this issue Feb 1, 2019
If we don't set a trap here, then bash ignores the signal, and the test
process also does not receive the signal, so the test runner has no
chance of writing a test.xml output.

However, the behavior of trap forwarding the signal to the subprocess is
not at all documented in the bash documentation, and also inconsistent
with the behavior reported in #7119.

There is a similar problem in the Java stub template reported in #6338.

This may or may not be progress on #4608.

PiperOrigin-RevId: 232035930
@lberki lberki added P3 We're not considering working on this, but happy to review a PR. (No assignee) and removed untriaged labels Mar 14, 2019
@github-actions
Copy link

Thank you for contributing to the Bazel repository! This issue has been marked as stale since it has not had any activity in the last 2+ years. It will be closed in the next 14 days unless any other activity occurs or one of the following labels is added: "not stale", "awaiting-bazeler". Please reach out to the triage team (@bazelbuild/triage) if you think this issue is still relevant or you are interested in getting the issue resolved.

@github-actions github-actions bot added the stale Issues or PRs that are stale (no activity for 30 days) label Apr 12, 2023
@github-actions
Copy link

This issue has been automatically closed due to inactivity. If you're still interested in pursuing this, please reach out to the triage team (@bazelbuild/triage). Thanks!

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Apr 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P3 We're not considering working on this, but happy to review a PR. (No assignee) stale Issues or PRs that are stale (no activity for 30 days) team-Rules-Java Issues for Java rules type: bug
Projects
None yet
Development

No branches or pull requests

4 participants