pageserver: call maybe_freeze_ephemeral_layer from a dedicated task #7594

Merged: 3 commits from jcsp/issue-6939-background-loop into main on May 6, 2024

Conversation

jcsp (Collaborator) commented May 2, 2024

Problem

In testing of the earlier fix for OOMs under heavy write load (#7218), we saw that the limit on ephemeral layer size wasn't being reliably enforced. That was diagnosed as being due to overwhelmed compaction loops: most tenants were waiting on the semaphore for background tasks, so the function that proactively rolls layers wasn't being run often enough.

Related: #6939

Summary of changes

  • Create a new per-tenant background loop for "ingest housekeeping", which invokes maybe_freeze_ephemeral_layer() without taking the background task semaphore.
  • Downgrade a log line in maybe_freeze_ephemeral_layer from INFO to DEBUG: it turns out to be pretty common in the field.

There's some discussion on the issue (#6939 (comment)) about alternatives for calling maybe_freeze_ephemeral_layer periodically without it getting stuck behind compaction. A whole task just for this feels like kind of a big hammer, but we may in future find that there are other pieces of lightweight housekeeping that we want to do here too.
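For illustration, here is a minimal sketch of the shape such a dedicated loop could take. All names and signatures are made up for this example (and tokio/tokio-util are assumed); it is not the actual pageserver code, just the structure: a per-tenant task that sleeps, runs the freeze check, and never touches the background-task semaphore.

```rust
use std::time::Duration;

use tokio_util::sync::CancellationToken;

// Hypothetical stand-in for the real per-timeline check, which freezes the
// open ephemeral layer once it grows past its rolling threshold.
async fn maybe_freeze_ephemeral_layer() {
    // The real logic lives in the pageserver's timeline code.
}

// One lightweight "ingest housekeeping" task per tenant: wake up on a timer,
// run the freeze check, and exit cleanly on shutdown.
async fn ingest_housekeeping_loop(cancel: CancellationToken, period: Duration) {
    loop {
        tokio::select! {
            // Exit promptly when the tenant shuts down or is detached.
            _ = cancel.cancelled() => return,
            _ = tokio::time::sleep(period) => {}
        }

        // Deliberately not gated by the background-task semaphore: the check
        // is cheap, and it must not queue behind a long compaction pass.
        maybe_freeze_ephemeral_layer().await;
    }
}
```

The point of the dedicated task is simply that there is no shared semaphore anywhere on this path, so a backed-up compaction queue can no longer delay layer freezing.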

Why is it okay to call maybe_freeze_ephemeral_layer outside of the background tasks semaphore?

  • This is the same work we would do anyway if we receive writes from the safekeeper, just done a bit sooner.
  • The period of the new task is generously jittered (+/- 5%), so when the ephemeral layer size tips over the threshold we shouldn't see an excessively aggressive thundering herd of layer freezes, and only layers larger than the mean layer size will be frozen (see the sketch after this list).
  • All that said, this is an imperfect approach that relies on having a generous amount of RAM to dip into when we need to freeze somewhat urgently. It would be nice in future to also block compaction/GC when we recognize resource stress and need to do other work (like layer freezing) to reduce memory footprint.
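To make the jitter and the mean-size cutoff from the second bullet concrete, here is a small illustrative sketch; the function names and the exact formula are assumptions for the example, not the pageserver's real code.

```rust
use std::time::Duration;

use rand::Rng;

// Jitter a base period by +/- 5% so per-tenant housekeeping wakeups spread
// out instead of all firing (and freezing layers) at the same instant.
fn jittered(base: Duration) -> Duration {
    let factor = rand::thread_rng().gen_range(0.95..=1.05);
    base.mul_f64(factor)
}

// Freeze decision under global size pressure: only layers bigger than the
// mean open-layer size get frozen, so the largest offenders are trimmed
// first rather than every open layer freezing at once.
fn should_freeze(layer_size: u64, open_layer_sizes: &[u64]) -> bool {
    if open_layer_sizes.is_empty() {
        return false;
    }
    let mean = open_layer_sizes.iter().sum::<u64>() / open_layer_sizes.len() as u64;
    layer_size > mean
}
```

Jitter plus the mean-size cutoff is what keeps a global threshold breach from turning into a synchronized freeze across every tenant.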

Checklist before requesting a review

  • I have performed a self-review of my code.
  • If it is a core feature, I have added thorough tests.
  • Do we need to implement analytics? If so, did you add the relevant metrics to the dashboard?
  • If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.

Checklist before merging

  • Do not forget to reformat the commit message so it does not include the above checklist.

github-actions bot commented May 2, 2024

2886 tests run: 2765 passed, 0 failed, 121 skipped (full report)


Flaky tests (1)

Postgres 15

  • test_vm_bit_clear_on_heap_lock: debug

Code coverage* (full report)

  • functions: 28.0% (6598 of 23554 functions)
  • lines: 46.7% (46856 of 100434 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
94ee8cb at 2024-05-03T14:13:17.872Z

@jcsp changed the title from "pageserver: move test_sharding_split_failures to its own task" to "pageserver: call maybe_freeze_ephemeral_layer from a dedicated task" on May 3, 2024
@jcsp marked this pull request as ready for review May 3, 2024 16:59
@jcsp requested a review from a team as a code owner May 3, 2024 16:59
@jcsp requested a review from arssher May 3, 2024 16:59
@jcsp merged commit 3764dd2 into main May 6, 2024
55 checks passed
@jcsp deleted the jcsp/issue-6939-background-loop branch May 6, 2024 13:07
conradludgate pushed a commit that referenced this pull request May 8, 2024