-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't close stderr under --quiet
#47208
Don't close stderr under --quiet
#47208
Conversation
Closes elastic#46900. When running ES with --quiet, if ES then exits abnormally, a user has to go hunting in the logs for the error. Instead, never close System.err, and print more information to it if ES encounters a fatal error e.g. config validation, or some fatal runtime exception.
Pinging @elastic/es-core-infra |
@elasticmachine run elasticsearch-ci/packaging-sample as it appears to have timed out. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left an observation that I think requires following up on.
@@ -91,6 +91,12 @@ public void checkPermission(Permission perm) { | |||
final Elasticsearch elasticsearch = new Elasticsearch(); | |||
int status = main(args, elasticsearch, Terminal.DEFAULT); | |||
if (status != ExitCodes.OK) { | |||
Terminal.DEFAULT.errorPrintln("ERROR: Elasticsearch did not exit normally - check the logs at " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's possible that we can fail before these system properties have been set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem here is that we can fail before we’ve configured logging, and so those system properties would not be set giving an empty error message to the user. The guarantee that we have here though is that if logging is not yet been configured, then no log messages will have been written (we have a check that ensures we don't touch logging before it's configured, and we fail startup if it is). That means that we should skip these message if the logging properties are not set, because we know from our guarantees that there won't be anything in the logs to look at.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jasontedor I've added guards around both places where I'm printing the log file location.
Packaging tests will block reading from streams until they close when starting ES. |
@elasticmachine update branch |
@elasticmachine update branch |
BATS tests are still hanging. I'll need to dig into what's going on here. |
I ssh'd into a VM while the BATS tests were running, and only some of them are hanging. By 'sudo kill'ing the rogue ones, the test run was able to proceed (though failed overall, naturally). I'll see what I can find out. |
I think I know what's going on. BATS has a output="$("$@" 2>&1)" So while a process' stdout or stderr is open, bash will be stuck waiting. Since stderr isn't closed with |
…-print-logs-location
The previous change never closed stderr, either when `--quiet` or when `-d` is passed. However, this broken the BATS tests, which expected to be able to quickly consume all output when passing `-d`. Thinking about it some more, it's reasonable (if perhaps still debatable) to leave stderr open under `--quiet`, which fixes the systemd case, but close stderr under `-d`.
@elasticmachine run elasticsearch-ci/packaging-sample-matrix as it was killed by some mysterious outside force. |
…-print-logs-location
There's no point telling the user to go and check the logs if we haven't configured logging yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a question, a nit. I also wonder if we can add a packaging test here, to ensure that future behavior maintains this? It would look something like forcing Elasticsearch to die at startup, and then checking standard error for the expected output (which might have been redirected to the journal, for example).
@@ -151,6 +151,10 @@ public final boolean promptYesNo(String prompt, boolean defaultYes) { | |||
} | |||
} | |||
|
|||
public void flush() { | |||
this.getErrorWriter().flush(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's puzzling to me that flush only flushes the error stream but not the output stream?
@@ -91,6 +91,17 @@ public void checkPermission(Permission perm) { | |||
final Elasticsearch elasticsearch = new Elasticsearch(); | |||
int status = main(args, elasticsearch, Terminal.DEFAULT); | |||
if (status != ExitCodes.OK) { | |||
final String basePath = System.getProperty("es.logs.base_path"); | |||
// It's possible to fail before logging has been configured, in which case there's no point | |||
// suggested that the user look in the log file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggested
-> suggesting
…-print-logs-location
…-print-logs-location
@jasontedor I addressed your feedback in 4091c55. |
@jasontedor is this PR good for an approval now? |
@pugnascotia Did you see my previous comment about testing?
|
Gyah, I did...and didn’t do it. Sorry, I’ll get that sorted out before I ping you again |
Refactor PackageTests so that the logic for preparing a custom config is extracted to a helper method that takes a closure. Add a test that configures two GC types, and check that the bootstrap checks fail. Also some other misc cleanups.
…-print-logs-location
@jasontedor I've added a test now, where I configure two GC strategies and check the systemd journal for the error message. |
@elasticmachine run elasticsearch-ci/packaging-sample-matrix |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a comment about the test. We're testing JVM behavior with what is proposed, not Elasticsearch behavior.
@@ -374,4 +340,58 @@ public void test83serviceFileSetsLimits() throws Exception { | |||
|
|||
stopElasticsearch(sh); | |||
} | |||
|
|||
public void test90DoNotCloseStderrWhenQuiet() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this tests what we want to test since the JVM never starts here (i.e., Elasticsearch code never runs). Instead, you could create an invalid elasticsearch.yml
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know how this happened 😅 I was trying to trigger the bootstrap check about using the serial collector, and got in a muddle. I'll rework it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Take 2 ready for a look @jasontedor.
The new test for checking stderr output was actually checking what Java emits when multiple GC strategies are configured. Now the test configures a broken YAML config, and check that ES reports this to stderr.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
@elasticmachine run elasticsearch-ci/2 due to docker / overlay issues. |
--quiet
It'd be super if this was back ported to 7.x - I've added labels to that effect so we don't lose track of it. |
Yep, I'll definitely do that 👍 |
Closes elastic#46900. When running ES with `--quiet`, if ES then exits abnormally, a user has to go hunting in the logs for the error. Instead, never close System.err, and print more information to it if ES encounters a fatal error e.g. config validation, or some fatal runtime exception. This is useful when running under e.g. systemd, since the error will go into the journal. Note that stderr is still closed in daemon (`-d`) mode.
Backport of #47208. Closes #46900. When running ES with `--quiet`, if ES then exits abnormally, a user has to go hunting in the logs for the error. Instead, never close System.err, and print more information to it if ES encounters a fatal error e.g. config validation, or some fatal runtime exception. This is useful when running under e.g. systemd, since the error will go into the journal. Note that stderr is still closed in daemon (`-d`) mode.
The backport PR seems to have been merged. I'm therefore removing the backport pending label here. Please shout if this is incorrect |
Closes #46900. When running ES with
--quiet
, if ES then exits abnormally, a user has to go hunting in the logs for the error. Instead, never close System.err, and print more information to it if ES encounters a fatal error e.g. config validation, or some fatal runtime exception. This is useful when running under e.g. systemd, since the error will go into the journal.Note that stderr is still closed in daemon (
-d
) mode.