pageserver: avoid logging the "ERROR" part of DbErrors that are successes #4902
302d38b to be225bf
1264 tests run: 1212 passed, 0 failed, 52 skipped
Example:
```
walreceiver connection handling ended: db error: ERROR: ending streaming to Some("pageserver") at 0/4031CA8
```
The inner DbError has a severity of ERROR so DbError's Display implementation includes that ERROR, even though we are actually logging the error at INFO level.

Introduce an explicit WalReceiverError type, and in its From<> for postgres errors, apply the logic from ExpectedError, for expected errors, and a new condition for successes.

The new output looks like:
```
walreceiver connection handling ended with success: ending streaming to Some("pageserver") at 0/154E9C0, receiver is caughtup and there is no computes
```
be225bf to 645442c
(force push because it's an entirely different approach)

The more I picked at trying to make a more explicit check for the condition instead of playing with messages, the more it seemed to make sense to use some structured error enums here instead, and do traditional error handling.

This is a more invasive change, but the overall LOC change is only about +10 compared with the old way, and it makes us ready to use the SuccessfulCompletion variant for logic as/when we need to.
Overall this is looking great. We are on the path to recognizing the "ending streaming" successful completions, and can propagate that upwards once all or some subset of safekeepers have closed WAL streams for this reason. Unsure though if we know how many safekeepers there should be; probably only probabilistically.
Problem
The pageserver<->safekeeper protocol uses error messages to indicate end of stream. pageserver already logs these at INFO level, but the inner error message includes the word "ERROR", which interferes with log searching.
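Example: `walreceiver connection handling ended: db error: ERROR: ending streaming to Some("pageserver") at 0/4031CA8`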
The inner DbError has a severity of ERROR so DbError's Display
implementation includes that ERROR, even though we are actually
logging the error at INFO level.
Summary of changes
Introduce an explicit WalReceiverError type, and in its From<>
for postgres errors, apply the logic from ExpectedError, for
expected errors, and a new condition for successes.
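
A minimal sketch of the shape this change describes, not the code in this PR. `DbError` here is a hypothetical stand-in for the postgres client's error type (it models only severity and message, and its Display omits the `db error:` prefix seen in the real log). The `ExpectedSafekeeperError` and `Other` variant names, the `is_expected` helper, its match string, and `log_line` are all assumptions made for illustration; only `WalReceiverError` and `SuccessfulCompletion` are named in the discussion.

```rust
use std::fmt;

// Hypothetical stand-in for the postgres client's DbError.
#[derive(Debug, Clone)]
pub struct DbError {
    pub severity: String,
    pub message: String,
}

impl fmt::Display for DbError {
    // Mimics the problem described above: Display embeds the severity.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}: {}", self.severity, self.message)
    }
}

#[derive(Debug)]
pub enum WalReceiverError {
    // The safekeeper ended the stream on purpose; keep only the message.
    SuccessfulCompletion(String),
    // An error that is expected in normal operation (assumed variant name).
    ExpectedSafekeeperError(DbError),
    // Everything else (assumed variant name).
    Other(DbError),
}

// Assumed placeholder for the existing "expected error" logic.
fn is_expected(err: &DbError) -> bool {
    err.message.contains("terminating connection")
}

impl From<DbError> for WalReceiverError {
    fn from(err: DbError) -> Self {
        if err.message.starts_with("ending streaming") {
            // New condition for successes: drop the severity entirely.
            WalReceiverError::SuccessfulCompletion(err.message)
        } else if is_expected(&err) {
            WalReceiverError::ExpectedSafekeeperError(err)
        } else {
            WalReceiverError::Other(err)
        }
    }
}

// Builds the text that would be logged for each variant.
pub fn log_line(err: &WalReceiverError) -> String {
    match err {
        WalReceiverError::SuccessfulCompletion(msg) => {
            format!("walreceiver connection handling ended with success: {}", msg)
        }
        WalReceiverError::ExpectedSafekeeperError(e) => {
            format!("walreceiver connection handling ended: {}", e)
        }
        WalReceiverError::Other(e) => {
            format!("walreceiver connection handling failure: {}", e)
        }
    }
}
```

Because the success variant carries only the message string, the severity never reaches the log line, while the other variants still print it through Display.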
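The new output looks like: `walreceiver connection handling ended with success: ending streaming to Some("pageserver") at 0/154E9C0, receiver is caughtup and there is no computes`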