-
Notifications
You must be signed in to change notification settings - Fork 24.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Do not catch throwable #19231
Do not catch throwable #19231
Conversation
This PR is blocked on a release of SecureSM that incorporates elastic/securesm#4, but I think that reviews can start, and the PR also helps clarify the intent of elastic/securesm#4. |
I am not sure we should block this aweseom cleanup by the securesm changes. I'm likely -1 on having multiple exit points in the JVM but most of this PR is awesome. Can we detach the two? |
I've detached the uncaught exception handler changes and will open a second PR on top of master (for discussion) when this PR is integrated. |
Today throughout the codebase, catch throwable is used with reckless abandon. This is dangerous because the throwable could be a fatal virtual machine error resulting from an internal error in the JVM, or an out of memory error or a stack overflow error that leaves the virtual machine in an unstable and unpredictable state. This commit removes catch throwable from the codebase and removes the temptation to use it by modifying listener APIs to receive instances of Exception instead of the top-level Throwable.
LGTM lets get this in! I really wonder how we can keep this under control maybe we can add some forbidden API magic? |
Can't this change potentially make it harder on operations to deal with ES? I am thinking of case where you get a domino effect of nodes just dying, and the fewer nodes are up the more likely the others will die. Factor in the fact that with some customers it can take many minutes for ES to restart. |
If the JVM throws an OOM then salvaging it is likely to put it in an unexpected state. That is scary enough that it is worth crashing I think. IO errors are still caught and salvaged. I agree that we need to prevent these OOMs from happening and we do actively work on that. |
I think that it makes it easier because there is no safe recovery from virtual machine errors like OOM. In the past, we would silently discard these fatal errors leaving the JVM in an unpredictable state. In that case, operators should be restarting their instances, but they didn't have a way of reliably knowing if the JVM experienced such an error or not. With this change and #19272, we've taken this burden away from operators and removed a potential source of corruption and other resiliency issues.
When the JVM throws an OOM it enters a questionable and unexpected state, the point is that it can not be salvaged.
We can not reliably do this after the JVM has entered such a state, there are no guarantees that we can safely do anything at this point. |
Today throughout the codebase, catch throwable is used with reckless
abandon. This is dangerous because the throwable could be a fatal
virtual machine error resulting from an internal error in the JVM, or an
out of memory error or a stack overflow error that leaves the virtual
machine in an unstable and unpredictable state. This commit removes
catch throwable from the codebase and removes the temptation to use it
by modifying listener APIs to receive instances of Exception instead of
the top-level Throwable.