Per the "Cleaning up parent-shard layers" section in #6358: currently, after a shard split, layers from the parent shards are not deleted until the whole tenant is eventually deleted.
We should implement an occasional online scrub routine that checks which of these layers are still referenced by the child shards and cleans up the rest.
It likely makes sense to combine this work with cleaning up old-generation `index_part.json` objects, since these older objects will likely reference parent-shard layers. We should first define the criteria for cleaning up old indices, and then use the still-alive indices as the source of references when cleaning up parent layers.
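The reference-checking step described above can be sketched as a set difference. This is a minimal illustration, not the scrubber's implementation. `IndexPart` here is a hypothetical, simplified stand-in that only records referenced layer keys, and `unreferenced_parent_layers` is a hypothetical helper. The sketch assumes the child indices have already been filtered by whatever old-index retention criteria are chosen. It also ignores races, such as a new child index being uploaded between listing and deletion.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified view of one child shard's `index_part.json`:
/// only the set of layer object keys it references (the real file holds more).
pub struct IndexPart {
    pub referenced_layers: HashSet<String>,
}

/// Returns the parent-shard layer keys that no still-alive child index references.
/// ASSUMPTION: `alive_child_indices` already excludes old-generation indices that
/// the (still to be defined) retention criteria would delete.
pub fn unreferenced_parent_layers(
    parent_layers: &[String],
    alive_child_indices: &[IndexPart],
) -> Vec<String> {
    // Union of every layer key referenced by any surviving child index.
    let referenced: HashSet<&str> = alive_child_indices
        .iter()
        .flat_map(|idx| idx.referenced_layers.iter().map(String::as_str))
        .collect();

    // Parent layers outside that union are garbage-collection candidates.
    parent_layers
        .iter()
        .filter(|key| !referenced.contains(key.as_str()))
        .cloned()
        .collect()
}
```

Building the reference set only from the still-alive indices is what makes the ordering in the issue matter. If an old-generation index were still counted, any parent layer it mentions would be kept alive.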
## Problem
Currently, we leave `index_part.json` objects from old generations
behind each time a pageserver restarts or a tenant is migrated. This
doesn't break anything, but it's annoying when a tenant has been around
for a long time and starts to accumulate tens to hundreds of these.
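One possible way to pick out the leftover objects is sketched below. It assumes generation-suffixed object names of the form `index_part.json-<8 hex digits>` and assumes all keys passed in belong to a single timeline. It also uses a deliberately naive, hypothetical retention rule: keep the newest generation and treat every older one as a deletion candidate. The real criteria still need to be defined, and `parse_generation` / `old_generation_indices` are illustrative names only.

```rust
/// Extracts the generation from an index object key.
/// ASSUMPTION: the file name looks like `index_part.json-<8 hex digits>`;
/// a bare `index_part.json` (no suffix) yields `None` and is never selected.
pub fn parse_generation(key: &str) -> Option<u32> {
    // Look only at the last path component, e.g. ".../index_part.json-00000002".
    let name = key.rsplit('/').next().unwrap_or(key);
    let suffix = name.strip_prefix("index_part.json-")?;
    if suffix.len() != 8 || !suffix.chars().all(|c| c.is_ascii_hexdigit()) {
        return None;
    }
    u32::from_str_radix(suffix, 16).ok()
}

/// Hypothetical retention rule: keep the highest-generation index and return
/// all older generation-suffixed keys (oldest first) as deletion candidates.
/// ASSUMPTION: every key in `keys` belongs to the same timeline.
pub fn old_generation_indices(keys: &[&str]) -> Vec<String> {
    let mut parsed: Vec<(u32, &str)> = keys
        .iter()
        .filter_map(|k| parse_generation(k).map(|g| (g, *k)))
        .collect();
    parsed.sort_by_key(|(g, _)| *g);
    // Remove the newest index from the candidate list; it must be kept.
    parsed.pop();
    parsed.into_iter().map(|(_, k)| k.to_string()).collect()
}
```

A production rule would likely need to be more conservative than "newest wins" (for example, around in-flight migrations), which is why the issue asks for the criteria to be defined first.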
Partially implements: #7043
## Summary of changes
- Add a new `pageserver-physical-gc` command to `s3_scrubber`

The name is a bit of a mouthful, but I think it makes sense:
- GC is the accurate term for what we are doing here: removing data that
  takes up storage but can never be accessed.
- "physical" is a necessary distinction from the "normal" GC that we do
  online in the pageserver, which operates at a higher level in terms of
  LSNs and layers, whereas this type of GC is purely about S3 objects.
- "pageserver" makes clear that this command deals exclusively with
  pageserver data, not safekeeper data.
Tasks
- pageserver-physical-gc (#7925)