-
Notifications
You must be signed in to change notification settings - Fork 485
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pageserver: fix ingest handling of CLog truncate #10080
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
skyzh
approved these changes
Dec 11, 2024
jcsp
added
t/bug
Issue Type: Bug
c/storage/pageserver
Component: storage: pageserver
labels
Dec 11, 2024
jcsp
changed the title
paseserver: fix CLog truncate walingest
paseserver: fix ingest handling of CLog truncate
Dec 11, 2024
skyzh
changed the title
paseserver: fix ingest handling of CLog truncate
pageserver: fix ingest handling of CLog truncate
Dec 11, 2024
7099 tests run: 6784 passed, 0 failed, 315 skipped (full report)Flaky tests (6)Postgres 17
Postgres 15
Postgres 14
Code coverage* (full report)
* collected from Rust tests only The comment gets automatically updated with the latest test results
ea0de03 at 2024-12-11T08:37:50.359Z :recycle: |
This was referenced Dec 11, 2024
github-merge-queue bot
pushed a commit
that referenced
this pull request
Dec 11, 2024
## Problem We get slru truncation commands on non-zero shards. Compaction will drop the slru dir keys and ingest will fail when receiving such records. #10080 fixed it for clog, but not for multixact. ## Summary of changes Only truncate multixact slrus on shard zero. I audited the rest of the ingest code and it looks fine from this pov.
github-merge-queue bot
pushed a commit
that referenced
this pull request
Dec 16, 2024
## Problem Changes in #9786 were functionally complete but missed some edges that made testing less robust than it should have been: - `is_key_disposable` didn't consider SLRU dir keys disposable - Timeline `init_empty` was always creating SLRU dir keys on all shards The result was that when we had a bug (#10080), it wasn't apparent in tests, because one would only encounter the issue if running on a long-lived timeline with enough compaction to drop the initially created empty SLRU dir keys, _and_ some CLog truncation going on. Closes: neondatabase/cloud#21516 ## Summary of changes - Update is_key_global and init_empty to handle SLRU dir keys properly -- the only functional impact is that we avoid writing some spurious keys in shards >0, but this makes testing much more robust. - Make `test_clog_truncate` explicitly use a sharded tenant The net result is that if one reverts #10080, then tests fail (i.e. this PR is a reproducer for the issue)
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Problem
In #9786 we stop storing SLRUs on non-zero shards.
However, there was one code path during ingest that still tries to enumerate SLRU relations on all shards. This fails if it sees a tenant who has never seen any write to an SLRU, or who has done such thorough compaction+GC that it has dropped its SLRU directory key.
Summary of changes