Do not catch throwable #19231

jasontedor · 2016-07-03T01:37:06Z

Today throughout the codebase, catch throwable is used with reckless
abandon. This is dangerous because the throwable could be a fatal
virtual machine error resulting from an internal error in the JVM, or an
out of memory error or a stack overflow error that leaves the virtual
machine in an unstable and unpredictable state. This commit removes
catch throwable from the codebase and removes the temptation to use it
by modifying listener APIs to receive instances of Exception instead of
the top-level Throwable.
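
As a rough illustration of the listener change described above, here is a minimal sketch rather than the code from this PR. `ListenerSketch`, `Listener`, and `execute` are hypothetical names chosen for the example. The failure callback accepts `Exception`, and the caller catches only `Exception`, so a virtual machine error is never handed to the listener.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

public class ListenerSketch {

    /** Hypothetical listener: the failure callback accepts Exception, not Throwable. */
    public interface Listener<T> {
        void onResponse(T result);

        void onFailure(Exception e);
    }

    /**
     * Runs the task and reports the outcome to the listener. Only Exception is
     * caught, so an Error (OutOfMemoryError, StackOverflowError, ...) is not
     * swallowed here and propagates to the caller.
     */
    public static <T> void execute(Callable<T> task, Listener<T> listener) {
        final T result;
        try {
            result = task.call();
        } catch (Exception e) {
            listener.onFailure(e); // recoverable failure: hand it to the listener
            return;
        }
        listener.onResponse(result);
    }

    public static void main(String[] args) {
        Listener<String> printing = new Listener<String>() {
            @Override
            public void onResponse(String result) {
                System.out.println("response: " + result);
            }

            @Override
            public void onFailure(Exception e) {
                System.out.println("failure: " + e);
            }
        };

        execute(() -> "ok", printing);
        execute(() -> { throw new IOException("disk error"); }, printing);
        try {
            // The error is constructed by hand purely for demonstration.
            execute(() -> { throw new OutOfMemoryError("simulated"); }, printing);
        } catch (OutOfMemoryError e) {
            System.out.println("error escaped the listener path: " + e.getMessage());
        }
    }
}
```

With `onFailure(Throwable)` a caller would be tempted to write `catch (Throwable t)` just to satisfy the signature. Narrowing the parameter type lets the compiler push back on that.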

jasontedor · 2016-07-03T01:38:31Z

This PR is blocked on a release of SecureSM that incorporates elastic/securesm#4, but I think that reviews can start, and the PR also helps clarify the intent of elastic/securesm#4.

s1monw · 2016-07-03T11:10:10Z

I am not sure we should block this awesome cleanup on the securesm changes. I'm likely -1 on having multiple exit points in the JVM, but most of this PR is awesome. Can we detach the two?

jasontedor · 2016-07-03T12:31:11Z

> Can we detach the two?

I've detached the uncaught exception handler changes and will open a second PR on top of master (for discussion) when this PR is integrated.

s1monw · 2016-07-04T12:11:25Z

LGTM, let's get this in! I really wonder how we can keep this under control; maybe we can add some forbidden API magic?

agirbal · 2016-07-11T15:33:09Z

Can't this change potentially make it harder on operations to deal with ES? I am thinking of a case where you get a domino effect of nodes just dying, and the fewer nodes are up, the more likely the others are to die. Factor in the fact that with some customers it can take many minutes for ES to restart.
Back in the day I was working on a SaaS; following many incidents of domino effects, we tried really hard to always salvage the JVM, if only to troubleshoot it. Upon certain exceptions (IO error from disk, OOM) it would enter a degraded state and report it to the rest of the cluster, so that it would stop receiving requests or only pass them through (i.e. client mode). From there one could flush its caches, check the instance and potentially put it back in a normal state.

nik9000 · 2016-07-11T15:36:55Z

If the JVM throws an OOM, then salvaging it is likely to leave it in an unexpected state. That is scary enough that it is worth crashing, I think.

IO errors are still caught and salvaged.

I agree that we need to prevent these OOMs from happening and we do actively work on that.

jasontedor · 2016-07-11T15:51:28Z

> Can't this change potentially make it harder on operations to deal with ES?

I think that it makes it easier because there is no safe recovery from virtual machine errors like OOM. In the past, we would silently discard these fatal errors leaving the JVM in an unpredictable state. In that case, operators should be restarting their instances, but they didn't have a way of reliably knowing if the JVM experienced such an error or not. With this change and #19272, we've taken this burden away from operators and removed a potential source of corruption and other resiliency issues.

> Back in the day I was working on a SaaS; following many incidents of domino effects, we tried really hard to always salvage the JVM, if only to troubleshoot it.

When the JVM throws an OOM it enters a questionable and unexpected state; the point is that it cannot be salvaged.

> From there one could flush its caches, check the instance and potentially put it back in a normal state.

We cannot reliably do this after the JVM has entered such a state; there are no guarantees that we can safely do anything at this point.
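
To make the "exit instead of salvage" argument concrete, here is a minimal sketch of an uncaught exception handler that treats virtual machine errors as fatal. It is not the implementation from #19272. The class name `FatalErrorHandlerSketch`, the choice to treat every `Error` as fatal, and exit status 1 are all assumptions made for illustration. The exit action is injected so the example can run without terminating the JVM; a real deployment might pass `Runtime.getRuntime()::halt`.

```java
import java.util.function.IntConsumer;

public class FatalErrorHandlerSketch implements Thread.UncaughtExceptionHandler {

    // Injected exit action (assumption for testability), e.g. Runtime.getRuntime()::halt.
    private final IntConsumer exit;

    public FatalErrorHandlerSketch(IntConsumer exit) {
        this.exit = exit;
    }

    /** Assumption for this sketch: any Error counts as fatal. */
    public static boolean isFatal(Throwable t) {
        return t instanceof Error;
    }

    @Override
    public void uncaughtException(Thread thread, Throwable t) {
        if (isFatal(t)) {
            // No safe recovery is assumed: report the error and exit.
            System.err.println("fatal error in thread [" + thread.getName() + "], exiting: " + t);
            exit.accept(1);
        } else {
            // Ordinary exceptions are reported, and the JVM keeps running.
            System.err.println("uncaught exception in thread [" + thread.getName() + "]: " + t);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // The demo only prints what it would do instead of halting the JVM.
        Thread worker = new Thread(() -> {
            throw new StackOverflowError("simulated");
        }, "worker-1");
        worker.setUncaughtExceptionHandler(
            new FatalErrorHandlerSketch(status -> System.err.println("would halt with status " + status)));
        worker.start();
        worker.join();
    }
}
```

An operator then sees a dead process and a logged cause, rather than a node that keeps running in an unknown state.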

jasontedor added >enhancement review resiliency v5.0.0-alpha5 labels Jul 3, 2016

jasontedor changed the title ~~Die with dignity~~ Do not catch throwable Jul 3, 2016

jasontedor merged commit 3343cee into elastic:master Jul 4, 2016

jasontedor deleted the throwable-be-gone branch July 4, 2016 12:41

clintongormley added the :Exceptions label Jul 4, 2016

danielmitterdorfer mentioned this pull request Jul 5, 2016

Elasticsearch nodes run into OOM during sustained ThreadPoolRejections #18230

Closed

jasontedor mentioned this pull request Jul 5, 2016

Die with dignity #19272

Merged

jasontedor mentioned this pull request Jul 20, 2016

Acquire Java version simply netty/netty#5552

Closed

jasontedor mentioned this pull request Aug 1, 2016

Simplify write failure handling #19105

Merged

This was referenced Dec 2, 2016

Netty should stop swallowing netty/netty#6096

Open

Remove support for Visio and potm files #22079

Merged

jasontedor mentioned this pull request Sep 16, 2017

[o.e.b.ElasticsearchUncaughtExceptionHandler] [node01-es-dev] fatal error in thread [elasticsearch[node01-es-dev][search][T#6]], exiting java.lang.OutOfMemoryError: Java heap space #26525

Closed

lcawl added :Core/Infra/Core Core issues without another label and removed :Exceptions labels Feb 13, 2018

jasontedor mentioned this pull request Apr 19, 2018

jvmStats does not handle java.lang.InternalError #29624

Closed

DaveCTurner mentioned this pull request Feb 9, 2021

Don't allow descriptionless assertion statements in production code #68616

Closed
