Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't close stderr under --quiet #47208

Merged

Conversation

pugnascotia
Copy link
Contributor

@pugnascotia pugnascotia commented Sep 27, 2019

Closes #46900. When running ES with --quiet, if ES then exits abnormally, a user has to go hunting in the logs for the error. Instead, never close System.err, and print more information to it if ES encounters a fatal error e.g. config validation, or some fatal runtime exception. This is useful when running under e.g. systemd, since the error will go into the journal.

Note that stderr is still closed in daemon (-d) mode.

Closes elastic#46900. When running ES with --quiet, if ES then exits
abnormally, a user has to go hunting in the logs for the error. Instead,
never close System.err, and print more information to it if ES
encounters a fatal error e.g.  config validation, or some fatal runtime
exception.
@pugnascotia pugnascotia added :Core/Infra/Core Core issues without another label v8.0.0 v7.5.0 labels Sep 27, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra

@pugnascotia
Copy link
Contributor Author

@elasticmachine run elasticsearch-ci/packaging-sample as it appears to have timed out.

Copy link
Member

@jasontedor jasontedor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left an observation that I think requires following up on.

@@ -91,6 +91,12 @@ public void checkPermission(Permission perm) {
final Elasticsearch elasticsearch = new Elasticsearch();
int status = main(args, elasticsearch, Terminal.DEFAULT);
if (status != ExitCodes.OK) {
Terminal.DEFAULT.errorPrintln("ERROR: Elasticsearch did not exit normally - check the logs at " +
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's possible that we can fail before these system properties have been set.

Copy link
Member

@jasontedor jasontedor Oct 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem here is that we can fail before we’ve configured logging, and so those system properties would not be set giving an empty error message to the user. The guarantee that we have here though is that if logging is not yet been configured, then no log messages will have been written (we have a check that ensures we don't touch logging before it's configured, and we fail startup if it is). That means that we should skip these message if the logging properties are not set, because we know from our guarantees that there won't be anything in the logs to look at.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jasontedor I've added guards around both places where I'm printing the log file location.

@alpar-t
Copy link
Contributor

alpar-t commented Sep 30, 2019

Packaging tests will block reading from streams until they close when starting ES.
The changes in #46924 will help with this, switching to redirect outputs to files instead and implementing process based timeouts so it's easier to see if something hangs.
Not sure if it's something similar with the bats tests failing here

@pugnascotia
Copy link
Contributor Author

@elasticmachine update branch

@pugnascotia
Copy link
Contributor Author

@elasticmachine update branch

@pugnascotia
Copy link
Contributor Author

BATS tests are still hanging. I'll need to dig into what's going on here.

@pugnascotia
Copy link
Contributor Author

I ssh'd into a VM while the BATS tests were running, and only some of them are hanging. By 'sudo kill'ing the rogue ones, the test run was able to proceed (though failed overall, naturally). I'll see what I can find out.

@pugnascotia
Copy link
Contributor Author

I think I know what's going on. BATS has a run() helper, which executes a command like so:

output="$("$@" 2>&1)"

So while a process' stdout or stderr is open, bash will be stuck waiting. Since stderr isn't closed with -d in this PR...that's a problem.

The previous change never closed stderr, either when `--quiet` or when
`-d` is passed. However, this broken the BATS tests, which expected to
be able to quickly consume all output when passing `-d`. Thinking about
it some more, it's reasonable (if perhaps still debatable) to leave
stderr open under `--quiet`, which fixes the systemd case, but close
stderr under `-d`.
@pugnascotia
Copy link
Contributor Author

@elasticmachine run elasticsearch-ci/packaging-sample-matrix as it was killed by some mysterious outside force.

There's no point telling the user to go and check the logs if we haven't
configured logging yet.
@jasontedor jasontedor self-requested a review October 28, 2019 14:49
Copy link
Member

@jasontedor jasontedor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a question, a nit. I also wonder if we can add a packaging test here, to ensure that future behavior maintains this? It would look something like forcing Elasticsearch to die at startup, and then checking standard error for the expected output (which might have been redirected to the journal, for example).

@@ -151,6 +151,10 @@ public final boolean promptYesNo(String prompt, boolean defaultYes) {
}
}

public void flush() {
this.getErrorWriter().flush();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's puzzling to me that flush only flushes the error stream but not the output stream?

@@ -91,6 +91,17 @@ public void checkPermission(Permission perm) {
final Elasticsearch elasticsearch = new Elasticsearch();
int status = main(args, elasticsearch, Terminal.DEFAULT);
if (status != ExitCodes.OK) {
final String basePath = System.getProperty("es.logs.base_path");
// It's possible to fail before logging has been configured, in which case there's no point
// suggested that the user look in the log file.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggested -> suggesting

@pugnascotia
Copy link
Contributor Author

@jasontedor I addressed your feedback in 4091c55.

@pugnascotia
Copy link
Contributor Author

@jasontedor is this PR good for an approval now?

@jasontedor
Copy link
Member

@pugnascotia Did you see my previous comment about testing?

I also wonder if we can add a packaging test here, to ensure that future behavior maintains this? It would look something like forcing Elasticsearch to die at startup, and then checking standard error for the expected output (which might have been redirected to the journal, for example).

@pugnascotia
Copy link
Contributor Author

Gyah, I did...and didn’t do it. Sorry, I’ll get that sorted out before I ping you again

Refactor PackageTests so that the logic for preparing a custom config is
extracted to a helper method that takes a closure. Add a test that
configures two GC types, and check that the bootstrap checks fail.

Also some other misc cleanups.
@pugnascotia
Copy link
Contributor Author

@jasontedor I've added a test now, where I configure two GC strategies and check the systemd journal for the error message.

@pugnascotia
Copy link
Contributor Author

@elasticmachine run elasticsearch-ci/packaging-sample-matrix

Copy link
Member

@jasontedor jasontedor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a comment about the test. We're testing JVM behavior with what is proposed, not Elasticsearch behavior.

@@ -374,4 +340,58 @@ public void test83serviceFileSetsLimits() throws Exception {

stopElasticsearch(sh);
}

public void test90DoNotCloseStderrWhenQuiet() throws Exception {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this tests what we want to test since the JVM never starts here (i.e., Elasticsearch code never runs). Instead, you could create an invalid elasticsearch.yml.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know how this happened 😅 I was trying to trigger the bootstrap check about using the serial collector, and got in a muddle. I'll rework it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Take 2 ready for a look @jasontedor.

The new test for checking stderr output was actually checking what Java
emits when multiple GC strategies are configured. Now the test
configures a broken YAML config, and check that ES reports this to
stderr.
Copy link
Member

@jasontedor jasontedor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@pugnascotia
Copy link
Contributor Author

@elasticmachine run elasticsearch-ci/2 due to docker / overlay issues.

@pugnascotia pugnascotia changed the title Don't close stderr and print the logs directory on fatal error Don't close stderr under --quiet Nov 15, 2019
@pugnascotia pugnascotia merged commit 090859b into elastic:master Nov 15, 2019
@pugnascotia pugnascotia deleted the 46900-stderr-open-print-logs-location branch November 15, 2019 13:58
@DaveCTurner
Copy link
Contributor

It'd be super if this was back ported to 7.x - I've added labels to that effect so we don't lose track of it.

@pugnascotia
Copy link
Contributor Author

Yep, I'll definitely do that 👍

pugnascotia added a commit to pugnascotia/elasticsearch that referenced this pull request Nov 21, 2019
Closes elastic#46900. When running ES with `--quiet`, if ES then exits abnormally, a
user has to go hunting in the logs for the error. Instead, never close
System.err, and print more information to it if ES encounters a fatal error
e.g. config validation, or some fatal runtime exception. This is useful when
running under e.g. systemd, since the error will go into the journal.

Note that stderr is still closed in daemon (`-d`) mode.
pugnascotia added a commit that referenced this pull request Nov 22, 2019
Backport of #47208.

Closes #46900. When running ES with `--quiet`, if ES then exits abnormally, a
user has to go hunting in the logs for the error. Instead, never close
System.err, and print more information to it if ES encounters a fatal error
e.g. config validation, or some fatal runtime exception. This is useful when
running under e.g. systemd, since the error will go into the journal.

Note that stderr is still closed in daemon (`-d`) mode.
@ywelsch
Copy link
Contributor

ywelsch commented Feb 27, 2020

The backport PR seems to have been merged. I'm therefore removing the backport pending label here. Please shout if this is incorrect

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Expose logs to systemd users
8 participants