Log summarised description of StartupExceptions #44536
When a Node fails to start, it will throw a StartupException on the main thread which will cause the process to exit.

Previously these were simply logged in the same way as any other uncaught exception, which would potentially result in long stack traces with the key details (the primary cause) being nested somewhere in the middle of the log lines. This was particularly true if the failure was due to an exception being thrown within a plugin - the primary cause may well have been wrapped in two or three other exceptions before it was logged.

This commit adds a new summarised description whenever there is an uncaught StartupException. This summary is logged before and after the standard stack trace logging to make it more prominent and increase the likelihood that it will be noticed and understood.

The summary focuses on printing messages from ElasticsearchExceptions as these are the most likely to hold clear, specific and actionable information and also prints the message for each cause of the ElasticsearchException which may contain the precise details (e.g. the pathname in a FileNotFoundException or AccessDeniedException).

Resolves: elastic#34895
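To make the idea of the summary concrete, here is a minimal sketch of how such a description could be built by walking the cause chain. It is not the code added by this PR: `SketchElasticsearchException` and `StartupSummarySketch` are hypothetical stand-ins so the example compiles on its own, and the "caused by:" prefix and depth limit are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for ElasticsearchException so the sketch compiles on its own.
class SketchElasticsearchException extends RuntimeException {
    SketchElasticsearchException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Hypothetical helper, not the class added by this PR.
final class StartupSummarySketch {
    // Assumed limit that guards against very long or cyclic cause chains.
    private static final int MAX_DEPTH = 32;

    private StartupSummarySketch() {}

    // Returns one line for the first "Elasticsearch" exception in the chain,
    // followed by one line for each of its causes.
    static List<String> summarise(Throwable failure) {
        List<String> lines = new ArrayList<>();
        if (failure == null) {
            return lines;
        }
        boolean foundPrimary = false;
        Throwable current = failure;
        for (int depth = 0; current != null && depth < MAX_DEPTH; depth++) {
            if (foundPrimary) {
                // Causes below the primary exception may carry the precise detail,
                // such as the pathname in a FileNotFoundException.
                lines.add("caused by: " + describe(current));
            } else if (current instanceof SketchElasticsearchException) {
                foundPrimary = true;
                lines.add(describe(current));
            }
            current = current.getCause();
        }
        if (lines.isEmpty()) {
            // No recognised exception in the chain: fall back to the outermost failure.
            lines.add(describe(failure));
        }
        return lines;
    }

    private static String describe(Throwable t) {
        return t.getClass().getSimpleName() + ": " + t.getMessage();
    }
}
```

Wrapper exceptions above the first recognised exception are skipped on purpose, since the description above says they tend to bury the primary cause.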
Pinging @elastic/es-core-infra
As an example (the test case has similar, but more contrived examples) here is the log message you get if you have the wrong password for a keystore for SSL:
and this is a missing CA file:
Some of those still need tidying up (we should include the path to the keystore when we fail to decrypt it), but those fixes are separate from this change.
Getting the root |
LGTM
I don’t like the approach here of changing the uncaught exception handler, as this change overloads its purpose (from handling uncaught exceptions and deciding whether or not to exit, to handling uncaught exceptions, and formatting some of them in a special way, and also still deciding whether or not to exit). Another problem that I have with this change is that it only applies to uncaught StartupException but that doesn’t cover if we catch them and handle them specially already. If we were to make this change, I would rather see it that we customize how StartupException is displayed (its printStackTrace method, etc.). I say if because I still need to think about whether or not this is even a change that I am in favor of, but let’s see how it looks done differently; I don’t think it’s too much to move away from changing the uncaught exception handler.
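As a rough illustration of the alternative suggested in this review comment, the sketch below puts the formatting in the exception itself by overriding `printStackTrace`, leaving any uncaught exception handler untouched. `SketchStartupException`, its `summary()` method and the "Startup failed:" wording are hypothetical names chosen for this example; this is not the real StartupException class.

```java
import java.io.PrintStream;
import java.io.PrintWriter;

// Hypothetical stand-in for StartupException; not the Elasticsearch class.
class SketchStartupException extends RuntimeException {
    // Assumed limit that guards against very long or cyclic cause chains.
    private static final int MAX_DEPTH = 32;

    SketchStartupException(Throwable cause) {
        super(cause);
    }

    // Print the summary before and after the standard trace so it is hard to miss.
    @Override
    public void printStackTrace(PrintStream out) {
        out.println(summary());
        super.printStackTrace(out);
        out.println(summary());
    }

    @Override
    public void printStackTrace(PrintWriter out) {
        out.println(summary());
        super.printStackTrace(out);
        out.println(summary());
    }

    // Describes the deepest cause in the chain, where the precise detail usually sits.
    String summary() {
        Throwable root = this;
        int depth = 0;
        while (root.getCause() != null && depth < MAX_DEPTH) {
            root = root.getCause();
            depth++;
        }
        return "Startup failed: " + root.getClass().getSimpleName() + ": " + root.getMessage();
    }
}
```

Because the no-argument `Throwable.printStackTrace()` delegates to the `PrintStream` overload, code that catches the exception and prints it gets the same summary as code that leaves it uncaught.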
I've finally found some cycles to come back to this, and I'm a bit stuck.
Key Summary (TL;DR)
Full details
The problem
The current implementation
Options to resolve the problem:
About Setting Validators:
@tvernum In regards to one underlying issue:
I think we need to solve this regardless. We shouldn't be consuming settings in plugin constructors (long term we should remove this ability) since we have not yet even registered settings at that point. Looking at the consumers of the "shared ssl service", it looks like monitoring and watcher both use it to construct http clients. I wonder if we should have this constructable in core (similar to NodeClient?), and xpack security registers a hook to make it an ssl client. In short, I think this is a solvable problem, and we should first focus on that.
@rjernst I assume (and If so, I think moving the creation of the SSL service to
Is the contract around the order in which plugin methods get called documented anywhere (even if the docs are in the form of a test)? I've run into similar questions in the past while trying to ensure that an object constructed in a particular method can depend on other objects having been constructed first.
From memory, core doesn't (and shouldn't) depend on HTTP client, and there are dependency conflict issues if we tried to pull it in (different plugins can use different versions of http client).
No, and in fact, the intention was for there to be no contract. Ordering across plugins was never meant to be guaranteed. The sorting you mentioned was added specifically to control the hierarchy of plugin classloaders, to allow SPI across plugins, but that order does not guarantee anything about the iteration order when calling plugin methods. Historically, however, we have relied on createComponents being called before certain other plugin methods, and so what you propose is ok for now at least. In fact, it used to be this way when x-pack was all one plugin.
XPackPlugin created an SSLService within the plugin constructor. This has 2 negative consequences:

1. The service may be constructed based on a partial view of settings. Other plugins are free to add setting values via the additionalSettings() method, but this (necessarily) happens after plugins have been constructed.
2. Any exceptions thrown during plugin construction are handled differently from exceptions thrown during "createComponents". Since SSL configuration exceptions are relatively common, it is far preferable for them to be thrown and handled as part of the createComponents flow.

This commit moves the creation of the SSLService to XPackPlugin.createComponents, and alters the sequence of some other steps to accommodate this change.

Relates: elastic#44536
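The shape of that change can be sketched as follows. Everything here is hypothetical and simplified for illustration: `SketchPlugin`, `SketchSslService`, the plain `Map` standing in for settings and the `sketch.ssl.keystore.path` key are assumptions, not the real XPackPlugin, SSLService or Settings APIs.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical service that validates its configuration when constructed.
class SketchSslService {
    final String keystorePath;

    SketchSslService(Map<String, String> settings) {
        // "sketch.ssl.keystore.path" is an invented key used only in this example.
        String path = settings.get("sketch.ssl.keystore.path");
        if (path == null) {
            throw new IllegalArgumentException("missing setting [sketch.ssl.keystore.path]");
        }
        this.keystorePath = path;
    }
}

// Hypothetical plugin showing the service being created in createComponents.
class SketchPlugin {
    private SketchSslService sslService; // deliberately not created in the constructor

    SketchPlugin(Map<String, String> initialSettings) {
        // No settings are consumed here: other plugins may still contribute
        // values after construction, so this view could be partial.
    }

    List<Object> createComponents(Map<String, String> finalSettings) {
        // Configuration errors now surface in the createComponents flow,
        // based on the complete settings.
        this.sslService = new SketchSslService(finalSettings);
        return Collections.singletonList((Object) sslService);
    }

    SketchSslService sslService() {
        if (sslService == null) {
            throw new IllegalStateException("createComponents has not been called yet");
        }
        return sslService;
    }
}
```

One trade-off of this arrangement is that anything needing the service must wait until createComponents has run, which is why the accessor fails loudly when called too early.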
XPackPlugin created an SSLService within the plugin constructor. This has 2 negative consequences:

1. The service may be constructed based on a partial view of settings. Other plugins are free to add setting values via the additionalSettings() method, but this (necessarily) happens after plugins have been constructed.
2. Any exceptions thrown during plugin construction are handled differently from exceptions thrown during "createComponents". Since SSL configuration exceptions are relatively common, it is far preferable for them to be thrown and handled as part of the createComponents flow.

This commit moves the creation of the SSLService to XPackPlugin.createComponents, and alters the sequence of some other steps to accommodate this change.

Relates: elastic#44536
Backport of: elastic#49667
Hi @tvernum, I've created a changelog YAML for you.