-
Notifications
You must be signed in to change notification settings - Fork 25.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simplify messages for startup errors in plugins #34895
Comments
Pinging @elastic/es-core-infra |
CC: @elastic/es-security because x-pack-security relies on a lot of configuration which can go wrong & we see a lot of this pain. |
This has been discussed in the core-infra-meeting today. No engineer would put his vote on removing stack traces. We have discussed on not catching and rethrowing so much, and of throwing special exceptions that are not catched but very high in the call stack. Neither alternative resisted the argument that we might be hiding obscure failure modes. But, user friendly messages are desirable and we have convened that we should include user friendly banner messages before and after the stack trace deluge. We will do our best to guess the most likely original cause (by walking the cause-by-chain) and show that to the user. The guessing has to be tailored for every particular failure instance. Discuss should be a useful trove for such samples. |
This part I am not convinced by; why is it not enough to use what we use on the REST layer? |
As per Jason's comment, |
@albertzaharovits Any progress on this? |
@albertzaharovits I've assigned this to me - let me know if you had something in progress. I need to implement it in order to make progress on better TLS error messages (#43079) |
@tvernum I have nothing in progress. |
When a Node fails to start, it will throw a StartupException on the main thread which will cause the process to exit. Previously these were simply logged in the same way as any other uncaught exception, which would potentially result in long stack traces with the key details (the primary cause) being nested somewhere in the middle of the log lines. This was particularly true if the failure was due to an exception being thrown within a plugin - the primary cause may well have been wrapped in two or three other exceptions before it was logged. This commit adds a new summarised description whenever there is an uncaught StartupException. This summary is logged before and after the standard stack trace logging to make it more prominent and increase the likelihood that it will be noticed and understood. The summary focuses on printing messages from ElasticsearchExceptions as these are the most likely to hold clear, specific and actionable information and also prints the message for each cause of the ElasticsearchException which may contain the precise details (e.g. the pathname in a FileNotFoundException or AccessDeniedException). Resolves: elastic#34895
@tvernum This issue came up in core/infra's issue triage for |
This has been open for quite a while, and we haven't made much progress on this due to focus in other areas. For now I'm going to close this as something we aren't planning on implementing. We can re-open it later if needed. |
If a plugin/module has a configuration problem that prevents startup, we currently throw/log a very verbose chain of exceptions that tends to overwhelm users, and makes it difficult to diagnose and resolve the problem.
For example, this is the error message you get if a TLS certificate is missing:
Some of those are caused by Security being overly enthusiastic about wrapping & rethrowing exceptions. The two ElasticsearchExceptions should definitely be merged into a single exception with a better message (
"Cannot configure SSL/TLS because ..."
).But the top of that exception chain is currently unavoidable for errors that exist within a plugin.
We should be aiming to have something actionable on the first line of the error output.
There seems to be simple improvement which would be to have something like this:
in
PluginsService
With that change, and some cleanup of the exception handling in Security, we get a more useful (though still long) error log:
That's still longer than I'd like (compare it to the simple messaging when a bootstrap check fails), but it would be a step forward.
It would be even better if we could to something really clear like:
The text was updated successfully, but these errors were encountered: