pageserver: fix ingest handling of CLog truncate #10080

Merged 1 commit on Dec 11, 2024

@jcsp (Collaborator) commented Dec 11, 2024

Problem

In #9786 we stopped storing SLRUs on non-zero shards.

However, there was one code path during ingest that still tries to enumerate SLRU relations on all shards. This fails for a tenant that has never seen any write to an SLRU, or that has done such thorough compaction+GC that it has dropped its SLRU directory key.

Summary of changes

  • Avoid trying to list SLRU relations on non-zero shards
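
As a rough illustration of the change, here is a minimal Rust sketch of the guard. It is not the pageserver's actual code: `Shard`, `ShardState`, `IngestError`, and `ingest_clog_truncate` are hypothetical names, and the truncation rule (drop every segment below a cutoff) is a simplification that ignores wraparound. A missing SLRU directory key is modelled as `None`.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for a shard's identity (not the real pageserver type).
#[derive(Clone, Copy, Debug)]
pub struct Shard {
    pub number: u8,
}

impl Shard {
    pub fn is_shard_zero(&self) -> bool {
        self.number == 0
    }
}

/// Hypothetical per-shard view of the stored CLog segments.
/// `None` models a missing SLRU directory key: either no SLRU write was
/// ever seen, or compaction+GC dropped the key.
pub struct ShardState {
    pub shard: Shard,
    pub clog_segments: Option<BTreeSet<u32>>,
}

#[derive(Debug, PartialEq)]
pub enum IngestError {
    MissingSlruDir,
}

/// Handle a CLog truncate record. Returns the number of segments dropped.
pub fn ingest_clog_truncate(
    state: &mut ShardState,
    oldest_kept: u32,
) -> Result<usize, IngestError> {
    // SLRUs are only stored on shard zero, so other shards have nothing to
    // list or drop. Returning early avoids touching the directory key.
    if !state.shard.is_shard_zero() {
        return Ok(0);
    }

    // On shard zero the directory is expected to exist.
    let segments = state
        .clog_segments
        .as_mut()
        .ok_or(IngestError::MissingSlruDir)?;

    let before = segments.len();
    // Simplified rule: keep segments at or above the cutoff.
    segments.retain(|seg| *seg >= oldest_kept);
    Ok(before - segments.len())
}
```

Without the early return, a non-zero shard whose directory key is absent would reach the lookup and fail with `MissingSlruDir`, which is the shape of failure described in the problem statement.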

@jcsp added the t/bug (Issue Type: Bug) and c/storage/pageserver (Component: storage: pageserver) labels Dec 11, 2024
@jcsp changed the title from "paseserver: fix CLog truncate walingest" to "paseserver: fix ingest handling of CLog truncate" Dec 11, 2024
@jcsp marked this pull request as ready for review December 11, 2024 02:56
@jcsp requested a review from a team as a code owner December 11, 2024 02:56
@jcsp requested a review from VladLazar December 11, 2024 02:56
@skyzh changed the title from "paseserver: fix ingest handling of CLog truncate" to "pageserver: fix ingest handling of CLog truncate" Dec 11, 2024
@jcsp enabled auto-merge December 11, 2024 03:53

github-actions bot commented Dec 11, 2024

7099 tests run: 6784 passed, 0 failed, 315 skipped (full report)


Flaky tests (6): Postgres 17, Postgres 15, Postgres 14

Code coverage* (full report)

  • functions: 31.4% (8334 of 26537 functions)
  • lines: 47.7% (65634 of 137583 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
ea0de03 at 2024-12-11T08:37:50.359Z :recycle:

@jcsp added this pull request to the merge queue Dec 11, 2024
Merged via the queue into main with commit 38415a9 Dec 11, 2024
92 checks passed
@jcsp deleted the jcsp/fix-ingest-truncate branch December 11, 2024 09:17
github-merge-queue bot pushed a commit that referenced this pull request Dec 11, 2024
## Problem

We get SLRU truncation commands on non-zero shards.
Compaction will drop the SLRU dir keys, and ingest will fail when
receiving such records.
#10080 fixed it for CLog, but not for multixact.

## Summary of changes

Only truncate multixact SLRUs on shard zero. I audited the rest of the
ingest code and it looks fine from this point of view.
github-merge-queue bot pushed a commit that referenced this pull request Dec 16, 2024
## Problem

Changes in #9786 were functionally complete but missed some edges that
made testing less robust than it should have been:
- `is_key_disposable` didn't consider SLRU dir keys disposable
- Timeline `init_empty` was always creating SLRU dir keys on all shards

The result was that when we had a bug (#10080), it wasn't apparent in
tests, because one would only encounter the issue if running on a
long-lived timeline with enough compaction to drop the initially created
empty SLRU dir keys, _and_ some CLog truncation going on.

Closes: neondatabase/cloud#21516

## Summary of changes

- Update `is_key_global` and `init_empty` to handle SLRU dir keys properly.
  The only functional impact is that we avoid writing some spurious
  keys in shards >0, but this makes testing much more robust.
- Make `test_clog_truncate` explicitly use a sharded tenant

The net result is that if one reverts #10080, then tests fail (i.e. this
PR is a reproducer for the issue)
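
To make the two edges concrete, here is a small Rust sketch under loud assumptions. `KeyKind` and both function signatures are hypothetical simplifications that only borrow the names `is_key_disposable` and `init_empty` from the description above; the real pageserver key space and APIs are richer and are not reproduced here.

```rust
/// Hypothetical, heavily simplified key kinds.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum KeyKind {
    /// Metadata that every shard keeps.
    GlobalMetadata,
    /// SLRU directory key: only meaningful on shard zero.
    SlruDir,
    /// A relation block owned by the shard with the given number.
    RelBlock { owner: u8 },
}

/// Sketch: may this shard drop the key during compaction/GC?
pub fn is_key_disposable(shard_number: u8, key: KeyKind) -> bool {
    match key {
        KeyKind::GlobalMetadata => false,
        // Assumed fix: SLRU dir keys are disposable everywhere except shard zero.
        KeyKind::SlruDir => shard_number != 0,
        KeyKind::RelBlock { owner } => owner != shard_number,
    }
}

/// Sketch: the keys written when an empty timeline is initialised on a shard.
pub fn init_empty(shard_number: u8) -> Vec<KeyKind> {
    let mut keys = vec![KeyKind::GlobalMetadata];
    // Assumed fix: only shard zero creates the SLRU dir key, so other shards
    // never hold a spurious key that could mask a missing-key bug in tests.
    if shard_number == 0 {
        keys.push(KeyKind::SlruDir);
    }
    keys
}
```

The property worth checking in this sketch is that `init_empty` never writes a key that `is_key_disposable` reports as disposable on the same shard, so a freshly created shard looks the same as one that has been fully compacted.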