-
Notifications
You must be signed in to change notification settings - Fork 456
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pageserver: suppress error logs in shutdown/detach #4876
Conversation
This suppresses logs like this: ``` Compaction failed, retrying in 2s: timeline is Stopping ``` ...by introducing an explicit ShuttingDown error and using it instead of a generic error.
This fixes spurious log lines like: ``` Compaction failed, retrying in 2s: Cannot run compaction iteration on inactive tenant ``` compaction_loop was doing its first-iteration random_init_delay, but not checking the CancellationToken between that and calling into copmaction_iteration, which expects to only be called when the timeline is in an active state. We should re-check cancellation token after sleeping in case we were asked to shut down in the interim.
1264 tests run: 1213 passed, 0 failed, 51 skipped (full report) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
create_image_layers
/ compact_level0
might also take a long time to complete and causing some failures. In the future, it would be good to pass the cancellation to them and return CompactionError::Shutdown
correspondingly.
@@ -103,6 +103,11 @@ async fn compaction_loop(tenant: Arc<Tenant>, cancel: CancellationToken) { | |||
} | |||
} | |||
|
|||
if cancel.is_cancelled() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is now a bit racy. Adding the error is good, and we should filter against such values knstead when logging near lines L114R119.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nevermind I missed we turn this into an Ok at the outermost fn. This check is then just extra lines, correct?
Also, still too many info!
, zero are needed when shutting down but this PR at least gets rid of one extra stacktrace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tend to agree about the info!
being a bit too verbose, I put it in here for consistency with the other cancel checks in the function.
The functional impact of this check is just to avoid calling into compaction_iteration (which emits an error log when state isn't active), rather than calling in there, logging error, returning and then dropping out of this loop further down when we next enter a timeout
.
Problem
Error messages like this coming up during normal operations:
Summary of changes
Add explicit handling for the shutdown case in these locations, to suppress error logs.
Checklist before requesting a review
Checklist before merging