Add a retain_lsn test #9599

Merged
arpad-m merged 10 commits into main from arpad/offloaded_retain_lsn_test
Nov 11, 2024

Conversation

arpad-m
Member

@arpad-m arpad-m commented Nov 1, 2024

Add a test that ensures the retain_lsn functionality works. Right now, not a single test fails if offloaded or non-offloaded timelines stop being registered at their parents. That registration is what prevents gc from discarding the data at the children's ancestor_lsns. This PR fills that gap.
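
To make the mechanism concrete, here is a toy model of what retain_lsn protects. It is a sketch under stated assumptions, not the pageserver's implementation: the function name `gc_versions`, the flat list of page-version LSNs, and the single `cutoff` value are all hypothetical simplifications.

```python
from bisect import bisect_right


def gc_versions(version_lsns, cutoff, retain_lsns):
    """Toy model (hypothetical, not pageserver code).

    version_lsns: LSNs at which one page was written on the parent timeline.
    cutoff:       gc may discard history that is only needed below this LSN.
    retain_lsns:  branch points (ancestor_lsns) registered by child timelines.

    Returns the LSNs of the page versions that survive gc.
    """
    lsns = sorted(version_lsns)
    # Everything newer than the cutoff is always kept.
    keep = {lsn for lsn in lsns if lsn > cutoff}
    # For the cutoff and for every registered branch point, keep the newest
    # version at or below it, so reads at that LSN can still be served.
    for point in [cutoff, *retain_lsns]:
        idx = bisect_right(lsns, point)
        if idx > 0:
            keep.add(lsns[idx - 1])
    return sorted(keep)


# Child branched at LSN 15 and registered that branch point at the parent.
print(gc_versions([10, 20, 30, 40], cutoff=35, retain_lsns=[15]))  # [10, 30, 40]
# Same parent, but the child never registered: the version it needs is gone.
print(gc_versions([10, 20, 30, 40], cutoff=35, retain_lsns=[]))    # [30, 40]
```

In the second call the version at LSN 10 is discarded, so a child reading at its ancestor_lsn of 15 would find no data. That is the failure the new test is meant to catch.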

The test has four modes:

  • offloaded: offload the child timeline, run compaction on the parent timeline, unarchive the child timeline, then try reading from it. The data should still be there.
  • offloaded-corrupted: offload the child timeline, then corrupt the manifest so that the pageserver believes the timeline was flattened. This is the closest we can get to pretending the retain_lsn mechanism doesn't exist for offloaded timelines without adding test-only endpoints to the pageserver. The test then checks that the data is indeed corrupted and the endpoint can't be started. That way we know the test actually exercises the retain_lsn mechanism, rather than, say, the lsn lease mechanism or one of the many other mechanisms that impede gc.
  • archived: the child timeline gets archived but not offloaded. This currently matches the None case, but future refactors might make archived timelines sufficiently different from non-archived ones.
  • None: the child timeline doesn't even get archived. This tests that normal timelines participate in retain_lsn. I locally made them not participate in retain_lsn (by commenting out the respective ancestor_children.push statement in tenant.rs) and ran the test suite, and not a single test failed. So this test is the first of its kind.
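
The expected outcome per mode can be summarized with a small self-contained simulation. This is only an illustration: `MODES`, `run_mode`, and the LSN values are hypothetical and do not come from the PR's test code, and the model assumes that a timeline the pageserver believes to be flattened registers no branch point at the parent.

```python
# Hypothetical toy harness; none of these names exist in the real test suite.
MODES = ["offloaded", "offloaded-corrupted", "archived", None]


def run_mode(mode):
    """Return "readable" or "corrupted" for the child timeline in a toy model."""
    parent_versions = [10, 20, 30, 40]  # LSNs of page versions on the parent
    child_ancestor_lsn = 15             # where the child branched off
    gc_cutoff = 35                      # gc may drop history needed only below this

    # Assumption: with the corrupted manifest the child looks flattened, so
    # its branch point is not registered at the parent.
    registered = [] if mode == "offloaded-corrupted" else [child_ancestor_lsn]

    # Keep everything above the cutoff, plus the newest version at or below
    # the cutoff and at or below each registered branch point.
    surviving = [lsn for lsn in parent_versions if lsn > gc_cutoff]
    for point in [gc_cutoff, *registered]:
        older = [lsn for lsn in parent_versions if lsn <= point]
        if older:
            surviving.append(max(older))

    # The child can only be read if some version at or below its branch
    # point survived gc on the parent.
    readable = any(lsn <= child_ancestor_lsn for lsn in surviving)
    return "readable" if readable else "corrupted"


for mode in MODES:
    print(mode, "->", run_mode(mode))
```

Only the offloaded-corrupted mode ends up "corrupted" in this model, mirroring the negative check described above; the other three modes keep the child readable.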

Part of #8088.


github-actions bot commented Nov 1, 2024

5408 tests run: 5186 passed, 0 failed, 222 skipped (full report)


Flaky tests (1)

Postgres 17

Code coverage* (full report)

  • functions: 31.7% (7863 of 24802 functions)
  • lines: 49.4% (62219 of 125958 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
76137d9 at 2024-11-11T22:38:34.046Z :recycle:

@arpad-m arpad-m requested a review from erikgrinaker November 7, 2024 15:13
@arpad-m arpad-m enabled auto-merge (squash) November 8, 2024 15:08
@arpad-m arpad-m merged commit b018bc7 into main Nov 11, 2024
80 checks passed
@arpad-m arpad-m deleted the arpad/offloaded_retain_lsn_test branch November 11, 2024 22:29
2 participants