pageserver: refuse to load tenants with suspiciously old indices in old generations #9719

Merged
merged 5 commits into from
Nov 13, 2024

Conversation

jcsp
Collaborator

@jcsp jcsp commented Nov 11, 2024

Problem

Historically, if a control component passed a pageserver a stale generation such as "generation: 1", the pageserver would attach in that generation and load a historic index, which was a quick way to corrupt a tenant.

Follows #9383
Closes #6951

Summary of changes

  • Introduce a Fatal variant to DownloadError, to enable index downloads to signal when they have encountered a scary enough situation that we shouldn't proceed to load the tenant.
  • Handle this variant by putting the tenant into a broken state (no matter which timeline within the tenant reported it)
  • Add a test for this case
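
For illustration, the flow described above can be sketched roughly as follows. This is a minimal, self-contained sketch and not the code in this PR: only the `DownloadError::Fatal` variant and the "tenant goes broken" outcome come from the description, while `IndexInfo`, `check_index_age`, `load_tenant`, `TenantState`, the other error variants, and both thresholds are hypothetical names and values chosen for the example.

```rust
use std::time::{Duration, SystemTime};

/// Simplified stand-in for the download error type (variants other than
/// `Fatal` are assumptions made for this sketch).
#[allow(dead_code)]
#[derive(Debug)]
enum DownloadError {
    /// The object does not exist.
    NotFound,
    /// Some other failure that callers may choose to retry.
    Other(String),
    /// The situation is dangerous enough that the tenant must not be loaded.
    Fatal(String),
}

/// Hypothetical tenant state; `Broken` carries the reason for operators.
#[derive(Debug, PartialEq)]
enum TenantState {
    Active,
    Broken { reason: String },
}

/// Hypothetical metadata about the index object found in remote storage.
struct IndexInfo {
    mtime: SystemTime,
}

// Assumed thresholds, for illustration only.
const MAX_INDEX_AGE: Duration = Duration::from_secs(14 * 24 * 3600);
const LOW_GENERATION_CUTOFF: u32 = 10;

/// Returns `Fatal` when we are attaching in a low generation but the index
/// we found has not been modified for a long time.
fn check_index_age(
    index: &IndexInfo,
    attach_generation: u32,
    now: SystemTime,
) -> Result<(), DownloadError> {
    // If mtime is in the future, treat the index as brand new.
    let age = now.duration_since(index.mtime).unwrap_or(Duration::ZERO);
    if attach_generation < LOW_GENERATION_CUTOFF && age > MAX_INDEX_AGE {
        return Err(DownloadError::Fatal(format!(
            "index is {}s old but attach generation is {}",
            age.as_secs(),
            attach_generation
        )));
    }
    Ok(())
}

/// A `Fatal` result from any timeline marks the whole tenant as broken.
/// Non-fatal errors are ignored here; real code would handle them separately.
fn load_tenant(timeline_results: Vec<Result<(), DownloadError>>) -> TenantState {
    for result in timeline_results {
        if let Err(DownloadError::Fatal(reason)) = result {
            return TenantState::Broken { reason };
        }
    }
    TenantState::Active
}
```

In this sketch, "touching" an index refreshes its `mtime`, so `check_index_age` passes again, and attaching in a generation at or above the assumed cutoff skips the check entirely.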

In the event that this behavior fires when we don't want it to, we have ways to intervene:

  • "Touch" an affected index to update its mtime (download+upload S3 object)
  • If this behavior is triggered, it indicates we're attaching in some old generation, so we should be able to fix that by manually bumping generation numbers in the storage controller database (this should never happen, but it's an option if it does)

@jcsp jcsp added c/storage/pageserver Component: storage: pageserver a/tech_debt Area: related to tech debt labels Nov 11, 2024

github-actions bot commented Nov 11, 2024

5400 tests run: 5173 passed, 0 failed, 227 skipped (full report)


Flaky tests (1)

Postgres 16

Code coverage* (full report)

  • functions: 31.8% (7889 of 24837 functions)
  • lines: 49.4% (62455 of 126301 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
e33af1b at 2024-11-13T17:21:51.883Z :recycle:

@jcsp jcsp marked this pull request as ready for review November 12, 2024 16:27
@jcsp jcsp requested a review from a team as a code owner November 12, 2024 16:27
Review thread on libs/remote_storage/src/error.rs (outdated)
Review thread on pageserver/src/tenant.rs
@jcsp jcsp enabled auto-merge (squash) November 13, 2024 17:52
@jcsp jcsp merged commit b4e00b8 into main Nov 13, 2024
85 checks passed
@jcsp jcsp deleted the jcsp/issue-6951-detect-old-indices-pt2 branch November 13, 2024 18:07
Labels
a/tech_debt Area: related to tech debt c/storage/pageserver Component: storage: pageserver
Development

Successfully merging this pull request may close these issues.

pageserver: sanity check when loading an old index, to detect bad generations in attach
2 participants