-
Notifications
You must be signed in to change notification settings - Fork 457
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve deleted timelines in safekeepers #6810
Labels
c/storage/safekeeper
Component: storage: safekeeper
Comments
5 tasks
jcsp
added a commit
that referenced
this issue
Jul 5, 2024
## Problem Safekeepers left running for a long time use a lot of memory (up to the point of OOMing, on small nodes) for deleted timelines, because the `Timeline` struct is kept alive as a guard against recreating deleted timelines. Closes: #6810 ## Summary of changes - Create separate tombstones that just record a ttid and when the timeline was deleted. - Add a periodic housekeeping task that cleans up tombstones older than a hardcoded TTL (24h) I think this also makes #6766 un-needed, as the tombstone is also checked during deletion. I considered making the overall timeline map use an enum type containing active or deleted, but having a separate map of tombstones avoids bloating that map, so that calls like `get()` can still go straight to a timeline without having to walk a hashmap that also contains tombstones.
VladLazar
pushed a commit
that referenced
this issue
Jul 8, 2024
## Problem Safekeepers left running for a long time use a lot of memory (up to the point of OOMing, on small nodes) for deleted timelines, because the `Timeline` struct is kept alive as a guard against recreating deleted timelines. Closes: #6810 ## Summary of changes - Create separate tombstones that just record a ttid and when the timeline was deleted. - Add a periodic housekeeping task that cleans up tombstones older than a hardcoded TTL (24h) I think this also makes #6766 un-needed, as the tombstone is also checked during deletion. I considered making the overall timeline map use an enum type containing active or deleted, but having a separate map of tombstones avoids bloating that map, so that calls like `get()` can still go straight to a timeline without having to walk a hashmap that also contains tombstones.
VladLazar
pushed a commit
that referenced
this issue
Jul 8, 2024
## Problem Safekeepers left running for a long time use a lot of memory (up to the point of OOMing, on small nodes) for deleted timelines, because the `Timeline` struct is kept alive as a guard against recreating deleted timelines. Closes: #6810 ## Summary of changes - Create separate tombstones that just record a ttid and when the timeline was deleted. - Add a periodic housekeeping task that cleans up tombstones older than a hardcoded TTL (24h) I think this also makes #6766 un-needed, as the tombstone is also checked during deletion. I considered making the overall timeline map use an enum type containing active or deleted, but having a separate map of tombstones avoids bloating that map, so that calls like `get()` can still go straight to a timeline without having to walk a hashmap that also contains tombstones.
VladLazar
pushed a commit
that referenced
this issue
Jul 8, 2024
## Problem Safekeepers left running for a long time use a lot of memory (up to the point of OOMing, on small nodes) for deleted timelines, because the `Timeline` struct is kept alive as a guard against recreating deleted timelines. Closes: #6810 ## Summary of changes - Create separate tombstones that just record a ttid and when the timeline was deleted. - Add a periodic housekeeping task that cleans up tombstones older than a hardcoded TTL (24h) I think this also makes #6766 un-needed, as the tombstone is also checked during deletion. I considered making the overall timeline map use an enum type containing active or deleted, but having a separate map of tombstones avoids bloating that map, so that calls like `get()` can still go straight to a timeline without having to walk a hashmap that also contains tombstones.
VladLazar
pushed a commit
that referenced
this issue
Jul 8, 2024
## Problem Safekeepers left running for a long time use a lot of memory (up to the point of OOMing, on small nodes) for deleted timelines, because the `Timeline` struct is kept alive as a guard against recreating deleted timelines. Closes: #6810 ## Summary of changes - Create separate tombstones that just record a ttid and when the timeline was deleted. - Add a periodic housekeeping task that cleans up tombstones older than a hardcoded TTL (24h) I think this also makes #6766 un-needed, as the tombstone is also checked during deletion. I considered making the overall timeline map use an enum type containing active or deleted, but having a separate map of tombstones avoids bloating that map, so that calls like `get()` can still go straight to a timeline without having to walk a hashmap that also contains tombstones.
VladLazar
pushed a commit
that referenced
this issue
Jul 8, 2024
## Problem Safekeepers left running for a long time use a lot of memory (up to the point of OOMing, on small nodes) for deleted timelines, because the `Timeline` struct is kept alive as a guard against recreating deleted timelines. Closes: #6810 ## Summary of changes - Create separate tombstones that just record a ttid and when the timeline was deleted. - Add a periodic housekeeping task that cleans up tombstones older than a hardcoded TTL (24h) I think this also makes #6766 un-needed, as the tombstone is also checked during deletion. I considered making the overall timeline map use an enum type containing active or deleted, but having a separate map of tombstones avoids bloating that map, so that calls like `get()` can still go straight to a timeline without having to walk a hashmap that also contains tombstones.
VladLazar
pushed a commit
that referenced
this issue
Jul 8, 2024
## Problem Safekeepers left running for a long time use a lot of memory (up to the point of OOMing, on small nodes) for deleted timelines, because the `Timeline` struct is kept alive as a guard against recreating deleted timelines. Closes: #6810 ## Summary of changes - Create separate tombstones that just record a ttid and when the timeline was deleted. - Add a periodic housekeeping task that cleans up tombstones older than a hardcoded TTL (24h) I think this also makes #6766 un-needed, as the tombstone is also checked during deletion. I considered making the overall timeline map use an enum type containing active or deleted, but having a separate map of tombstones avoids bloating that map, so that calls like `get()` can still go straight to a timeline without having to walk a hashmap that also contains tombstones.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Fix keeping deleted timelines in memory
Currently we store all timelines in memory, even deleted ones. This is needed to fix a bug with active computes for deleted timelines. Active computes can recreate timeline after deletion, so without this fix timelines can reappear after deletion.
The issue with this fix is that RAM usage grows indefinetely until restart. This can be addresssed with using less memory for deleted timelines and/or GC timelines deleted for a long time (~1 hour).
Fix timeline delete retries
When user deletes a project, safekeepers receive a
delete_tenant
API call. It iterates over all timelines for this tenant and deletes them individually.The issue here is that we don't track completed deletions in safekeepers and some timelines can be deleted more than once. This is not good when a tenant has thousands of timelines, because it takes too much time to retry deleting all of them.
#6766 fixes this by adding a flag inside a timeline. The slightly better fix can be to move this flag to
timelines_global_map.rs
instead, and don't invoke timeline deletion more than once from the tenant deletion code there.The text was updated successfully, but these errors were encountered: