Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve exception handling on TransportMasterNodeAction #29314

Merged
merged 1 commit into from
Apr 3, 2018

Conversation

ywelsch
Copy link
Contributor

@ywelsch ywelsch commented Mar 30, 2018

We have seen exceptions of the following kind bubbling up to the uncaught exception handler:

[WARN ][o.e.b.ElasticsearchUncaughtExceptionHandler] [XYZ] uncaught exception in thread [elasticsearch[XYZ][clusterService#updateTask][T#1]]
org.elasticsearch.index.IndexNotFoundException: no such index
	at org.elasticsearch.cluster.metadata.IndexNameExpressionResolver$WildcardExpressionResolver.infe(IndexNameExpressionResolver.java:676) ~[elasticsearch-5.6.5.jar:5.6.5]
	at org.elasticsearch.cluster.metadata.IndexNameExpressionResolver$WildcardExpressionResolver.innerResolve(IndexNameExpressionResolver.java:630) ~[elasticsearch-5.6.5.jar:5.6.5]
	at org.elasticsearch.cluster.metadata.IndexNameExpressionResolver$WildcardExpressionResolver.resolve(IndexNameExpressionResolver.java:578) ~[elasticsearch-5.6.5.jar:5.6.5]
	at org.elasticsearch.cluster.metadata.IndexNameExpressionResolver.concreteIndices(IndexNameExpressionResolver.java:168) ~[elasticsearch-5.6.5.jar:5.6.5]
	at org.elasticsearch.cluster.metadata.IndexNameExpressionResolver.concreteIndexNames(IndexNameExpressionResolver.java:144) ~[elasticsearch-5.6.5.jar:5.6.5]
	at org.elasticsearch.cluster.metadata.IndexNameExpressionResolver.concreteIndexNames(IndexNameExpressionResolver.java:77) ~[elasticsearch-5.6.5.jar:5.6.5]
	at org.elasticsearch.action.admin.indices.get.TransportGetIndexAction.checkBlock(TransportGetIndexAction.java:63) ~[elasticsearch-5.6.5.jar:5.6.5]
	at org.elasticsearch.action.admin.indices.get.TransportGetIndexAction.checkBlock(TransportGetIndexAction.java:47) ~[elasticsearch-5.6.5.jar:5.6.5]
	at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction.doStart(TransportMasterNodeAction.java:134) ~[elasticsearch-5.6.5.jar:5.6.5]
	at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$4.onNewClusterState(TransportMasterNodeAction.java:198) ~[elasticsearch-5.6.5.jar:5.6.5]
	at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onNewClusterState(ClusterStateObserver.java:297) ~[elasticsearch-5.6.5.jar:5.6.5]
	at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.postAdded(ClusterStateObserver.java:208) ~[elasticsearch-5.6.5.jar:5.6.5]
	at org.elasticsearch.cluster.service.ClusterService$1.run(ClusterService.java:401) ~[elasticsearch-5.6.5.jar:5.6.5]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569) ~[elasticsearch-5.6.5.jar:5.6.5]
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:247) ~[elasticsearch-5.6.5.jar:5.6.5]
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:210) ~[elasticsearch-5.6.5.jar:5.6.5]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_144]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_144]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_144]

Checking the blocks can lead for example to IndexNotFoundException when the indices are resolved. In order to make TransportMasterNodeAction more resilient against such expected exceptions, this code change wraps the execution of doStart() into a try catch and informs the listener in case of failures.

@ywelsch ywelsch added >enhancement :Core/Infra/Core Core issues without another label v7.0.0 v6.3.0 labels Mar 30, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra

}
} catch (Exception e) {
listener.onFailure(e);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While it might be tricky to see in this diff what's actually changed: The two lines above here have been added.

} catch (Exception e) {
// accept state as block will be rechecked by doStart() and listener.onFailure() then called
logger.trace("exception occurred during cluster block checking, accepting state", e);
return true;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While it might be tricky to see in this diff what's actually changed: The three lines above here have been added.

Copy link
Member

@jasontedor jasontedor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@ywelsch ywelsch merged commit d4538df into elastic:master Apr 3, 2018
ywelsch added a commit that referenced this pull request Apr 3, 2018
We have seen exceptions bubble up to the uncaught exception handler. Checking the blocks can
lead for example to IndexNotFoundException when the indices are resolved. In order to make
TransportMasterNodeAction more resilient against such expected exceptions, this code change
wraps the execution of doStart() into a try catch and informs the listener in case of failures.
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Apr 3, 2018
* master:
  Reindex: Fix error in delete-by-query rest spec (elastic#29318)
  Improve similarity integration. (elastic#29187)
  Fix some query extraction bugs. (elastic#29283)
  [Docs] Correct experimental note formatting
  Move Nullable into core (elastic#29341)
  [Docs] Update getting-started.asciidoc (elastic#29294)
  Elasticsearch 6.3.0 is now on Lucene 7.3.
  [DOCS] Refer back to index API for full-document updates in _update API section (elastic#28677)
  Fix missing comma in ingest-node.asciidoc (elastic#29343)
  Improve exception handling on TransportMasterNodeAction (elastic#29314)
  Don't break allocation if resize source index is missing (elastic#29311)
  Use fixture to test repository-s3 plugin (elastic#29296)
  Fix NDCG for empty search results (elastic#29267)
  Pass through script params in scripted metric agg (elastic#29154)
  Fix Eclipse build.
  Upgrade to lucene-7.3.0-snapshot-98a6b3d. (elastic#29298)
  Painless: Remove extraneous INLINE constant. (elastic#29340)
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Apr 3, 2018
* master:
  Build: Fix Java9 MR build (elastic#29312)
  Reindex: Fix error in delete-by-query rest spec (elastic#29318)
  Improve similarity integration. (elastic#29187)
  Fix some query extraction bugs. (elastic#29283)
  [Docs] Correct experimental note formatting
  Move Nullable into core (elastic#29341)
  [Docs] Update getting-started.asciidoc (elastic#29294)
  Elasticsearch 6.3.0 is now on Lucene 7.3.
  [DOCS] Refer back to index API for full-document updates in _update API section (elastic#28677)
  Fix missing comma in ingest-node.asciidoc (elastic#29343)
  Improve exception handling on TransportMasterNodeAction (elastic#29314)
  Don't break allocation if resize source index is missing (elastic#29311)
  Use fixture to test repository-s3 plugin (elastic#29296)
  Fix NDCG for empty search results (elastic#29267)
  Pass through script params in scripted metric agg (elastic#29154)
  Fix Eclipse build.
  Upgrade to lucene-7.3.0-snapshot-98a6b3d. (elastic#29298)
  Painless: Remove extraneous INLINE constant. (elastic#29340)
  Remove HTTP max content length leniency (elastic#29337)
  Begin moving XContent to a separate lib/artifact (elastic#29300)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants