pageserver: move deletion failpoint inside backoff (#5814)
## Problem

When enabled, this failpoint would busy-spin in a loop that emits log
messages.

## Summary of changes

Move the failpoint inside the backoff::retry closure: it will still
spam the log, but at a much lower rate.

---------

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
jcsp and koivunej authored Nov 7, 2023
1 parent 4cd47b7 commit 1d68f52
Showing 1 changed file with 13 additions and 10 deletions.
23 changes: 13 additions & 10 deletions pageserver/src/deletion_queue/deleter.rs
@@ -55,21 +55,24 @@ impl Deleter {

     /// Wrap the remote `delete_objects` with a failpoint
     async fn remote_delete(&self) -> Result<(), anyhow::Error> {
-        fail::fail_point!("deletion-queue-before-execute", |_| {
-            info!("Skipping execution, failpoint set");
-            metrics::DELETION_QUEUE
-                .remote_errors
-                .with_label_values(&["failpoint"])
-                .inc();
-            Err(anyhow::anyhow!("failpoint hit"))
-        });
-
         // A backoff::retry is used here for two reasons:
         // - To provide a backoff rather than busy-polling the API on errors
         // - To absorb transient 429/503 conditions without hitting our error
         // logging path for issues deleting objects.
         backoff::retry(
-            || async { self.remote_storage.delete_objects(&self.accumulator).await },
+            || async {
+                fail::fail_point!("deletion-queue-before-execute", |_| {
+                    info!("Skipping execution, failpoint set");
+
+                    metrics::DELETION_QUEUE
+                        .remote_errors
+                        .with_label_values(&["failpoint"])
+                        .inc();
+                    Err(anyhow::anyhow!("failpoint: deletion-queue-before-execute"))
+                });
+
+                self.remote_storage.delete_objects(&self.accumulator).await
+            },
             |_| false,
             3,
             10,
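To illustrate why moving the failpoint inside the retried closure lowers the log rate, here is a minimal, std-only sketch. It is not the pageserver's `backoff::retry`: `retry_with_backoff`, `delete_with_failpoint`, `demo`, and the parameters (3 retries, 100 ms base delay, doubling) are all hypothetical names and values chosen for illustration, and the failpoint is simulated with a plain flag rather than the `fail` crate. The sleep function is injected so the example records delays instead of actually waiting.

```rust
use std::cell::Cell;
use std::time::Duration;

/// Hypothetical stand-in for a retry helper: calls `op` until it succeeds or
/// `max_retries` retries have been used, doubling the delay after each failure.
/// `sleep` is injected so callers can record delays instead of blocking.
fn retry_with_backoff<T, E>(
    mut op: impl FnMut() -> Result<T, E>,
    max_retries: u32,
    base_delay: Duration,
    mut sleep: impl FnMut(Duration),
) -> Result<T, E> {
    let mut attempt = 0u32;
    loop {
        match op() {
            Ok(v) => return Ok(v),
            // Out of retries: surface the last error to the caller.
            Err(e) if attempt >= max_retries => return Err(e),
            Err(_) => {
                // Delay grows as base_delay * 2^attempt.
                sleep(base_delay * 2u32.pow(attempt));
                attempt += 1;
            }
        }
    }
}

/// Simulated failpoint (assumption: a plain flag, not the `fail` crate).
/// Because the check lives inside the retried operation, every injected
/// failure is followed by a backoff delay before the next attempt.
fn delete_with_failpoint(enabled: &Cell<bool>, calls: &Cell<u32>) -> Result<(), String> {
    calls.set(calls.get() + 1);
    if enabled.get() {
        return Err("failpoint: deletion-queue-before-execute".to_string());
    }
    Ok(())
}

/// Runs the retry loop with the failpoint permanently enabled and returns
/// the final result, the number of calls made, and the delays requested.
fn demo() -> (Result<(), String>, u32, Vec<Duration>) {
    let enabled = Cell::new(true);
    let calls = Cell::new(0);
    let mut delays = Vec::new();
    let result = retry_with_backoff(
        || delete_with_failpoint(&enabled, &calls),
        3,
        Duration::from_millis(100),
        |d| delays.push(d),
    );
    (result, calls.get(), delays)
}
```

With the failpoint left enabled, `demo()` makes four calls separated by requested delays of 100 ms, 200 ms, and 400 ms before returning the error. If the check instead sat before the retry loop, as in the removed lines, each failure would return immediately and the caller's loop would retry with no delay at all.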

1 comment on commit 1d68f52

@github-actions

2426 tests run: 2305 passed, 0 failed, 121 skipped (full report)


Flaky tests (1)

Postgres 14

  • test_pageserver_chaos: debug

Code coverage (full report)

  • functions: 54.5% (8875 of 16271 functions)
  • lines: 81.7% (51149 of 62641 lines)

The comment gets automatically updated with the latest test results
1d68f52 at 2023-11-07T15:10:54.029Z
