Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

safekeeper: add separate tombstones map for deleted timelines #8253

Merged
merged 4 commits into from
Jul 5, 2024

Conversation

jcsp
Copy link
Collaborator

@jcsp jcsp commented Jul 3, 2024

Problem

Safekeepers left running for a long time use a lot of memory (up to the point of OOMing, on small nodes) for deleted timelines, because the Timeline struct is kept alive as a guard against recreating deleted timelines.

Closes: #6810

Summary of changes

  • Create separate tombstones that just record a ttid and when the timeline was deleted.
  • Add a periodic housekeeping task that cleans up tombstones older than a hardcoded TTL (24h)

I think this also makes #6766 un-needed, as the tombstone is also checked during deletion.

I considered making the overall timeline map use an enum type containing active or deleted, but having a separate map of tombstones avoids bloating that map, so that calls like get() can still go straight to a timeline without having to walk a hashmap that also contains tombstones.

Checklist before requesting a review

  • I have performed a self-review of my code.
  • If it is a core feature, I have added thorough tests.
  • Do we need to implement analytics? if so did you add the relevant metrics to the dashboard?
  • If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.

Checklist before merging

  • Do not forget to reformat commit message to not include the above checklist

@jcsp jcsp added t/bug Issue Type: Bug c/storage/safekeeper Component: storage: safekeeper labels Jul 3, 2024
Copy link

github-actions bot commented Jul 3, 2024

3024 tests run: 2909 passed, 0 failed, 115 skipped (full report)


Flaky tests (2)

Postgres 14

  • test_ondemand_wal_download_in_replication_slot_funcs: debug
  • test_subscriber_restart: release

Code coverage* (full report)

  • functions: 32.6% (6932 of 21274 functions)
  • lines: 50.0% (54454 of 108936 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
93e85f7 at 2024-07-05T10:18:39.640Z :recycle:

@jcsp jcsp requested a review from arssher July 3, 2024 21:15
@jcsp jcsp marked this pull request as ready for review July 3, 2024 21:15
@jcsp jcsp requested a review from a team as a code owner July 3, 2024 21:15
Copy link
Contributor

@arssher arssher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two related FIXMEs in timeline_global_map.rs, they also should be removed.

@jcsp jcsp enabled auto-merge (squash) July 5, 2024 09:10
@jcsp jcsp merged commit 6849ae4 into main Jul 5, 2024
63 checks passed
@jcsp jcsp deleted the jcsp/safekeeper-issue-6810-tombstones branch July 5, 2024 10:17
VladLazar pushed a commit that referenced this pull request Jul 8, 2024
## Problem

Safekeepers left running for a long time use a lot of memory (up to the
point of OOMing, on small nodes) for deleted timelines, because the
`Timeline` struct is kept alive as a guard against recreating deleted
timelines.

Closes: #6810

## Summary of changes

- Create separate tombstones that just record a ttid and when the
timeline was deleted.
- Add a periodic housekeeping task that cleans up tombstones older than
a hardcoded TTL (24h)

I think this also makes #6766
un-needed, as the tombstone is also checked during deletion.

I considered making the overall timeline map use an enum type containing
active or deleted, but having a separate map of tombstones avoids
bloating that map, so that calls like `get()` can still go straight to a
timeline without having to walk a hashmap that also contains tombstones.
VladLazar pushed a commit that referenced this pull request Jul 8, 2024
## Problem

Safekeepers left running for a long time use a lot of memory (up to the
point of OOMing, on small nodes) for deleted timelines, because the
`Timeline` struct is kept alive as a guard against recreating deleted
timelines.

Closes: #6810

## Summary of changes

- Create separate tombstones that just record a ttid and when the
timeline was deleted.
- Add a periodic housekeeping task that cleans up tombstones older than
a hardcoded TTL (24h)

I think this also makes #6766
un-needed, as the tombstone is also checked during deletion.

I considered making the overall timeline map use an enum type containing
active or deleted, but having a separate map of tombstones avoids
bloating that map, so that calls like `get()` can still go straight to a
timeline without having to walk a hashmap that also contains tombstones.
VladLazar pushed a commit that referenced this pull request Jul 8, 2024
## Problem

Safekeepers left running for a long time use a lot of memory (up to the
point of OOMing, on small nodes) for deleted timelines, because the
`Timeline` struct is kept alive as a guard against recreating deleted
timelines.

Closes: #6810

## Summary of changes

- Create separate tombstones that just record a ttid and when the
timeline was deleted.
- Add a periodic housekeeping task that cleans up tombstones older than
a hardcoded TTL (24h)

I think this also makes #6766
un-needed, as the tombstone is also checked during deletion.

I considered making the overall timeline map use an enum type containing
active or deleted, but having a separate map of tombstones avoids
bloating that map, so that calls like `get()` can still go straight to a
timeline without having to walk a hashmap that also contains tombstones.
VladLazar pushed a commit that referenced this pull request Jul 8, 2024
## Problem

Safekeepers left running for a long time use a lot of memory (up to the
point of OOMing, on small nodes) for deleted timelines, because the
`Timeline` struct is kept alive as a guard against recreating deleted
timelines.

Closes: #6810

## Summary of changes

- Create separate tombstones that just record a ttid and when the
timeline was deleted.
- Add a periodic housekeeping task that cleans up tombstones older than
a hardcoded TTL (24h)

I think this also makes #6766
un-needed, as the tombstone is also checked during deletion.

I considered making the overall timeline map use an enum type containing
active or deleted, but having a separate map of tombstones avoids
bloating that map, so that calls like `get()` can still go straight to a
timeline without having to walk a hashmap that also contains tombstones.
VladLazar pushed a commit that referenced this pull request Jul 8, 2024
## Problem

Safekeepers left running for a long time use a lot of memory (up to the
point of OOMing, on small nodes) for deleted timelines, because the
`Timeline` struct is kept alive as a guard against recreating deleted
timelines.

Closes: #6810

## Summary of changes

- Create separate tombstones that just record a ttid and when the
timeline was deleted.
- Add a periodic housekeeping task that cleans up tombstones older than
a hardcoded TTL (24h)

I think this also makes #6766
un-needed, as the tombstone is also checked during deletion.

I considered making the overall timeline map use an enum type containing
active or deleted, but having a separate map of tombstones avoids
bloating that map, so that calls like `get()` can still go straight to a
timeline without having to walk a hashmap that also contains tombstones.
VladLazar pushed a commit that referenced this pull request Jul 8, 2024
## Problem

Safekeepers left running for a long time use a lot of memory (up to the
point of OOMing, on small nodes) for deleted timelines, because the
`Timeline` struct is kept alive as a guard against recreating deleted
timelines.

Closes: #6810

## Summary of changes

- Create separate tombstones that just record a ttid and when the
timeline was deleted.
- Add a periodic housekeeping task that cleans up tombstones older than
a hardcoded TTL (24h)

I think this also makes #6766
un-needed, as the tombstone is also checked during deletion.

I considered making the overall timeline map use an enum type containing
active or deleted, but having a separate map of tombstones avoids
bloating that map, so that calls like `get()` can still go straight to a
timeline without having to walk a hashmap that also contains tombstones.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c/storage/safekeeper Component: storage: safekeeper t/bug Issue Type: Bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve deleted timelines in safekeepers
2 participants