Crashing during snapshot deletion might result in unreferenced data left in repository #13159
Comments
We talked about it today with @ywelsch and we still think that this is a real issue. We don't have a plan for it yet, but this is something we need to think about for the future of the snapshot/restore functionality.
I found another scenario here that can lead to the same stale data:
If in this scenario the data node continues to write segment files (as seen in this test failure #39852), we get unreferenced data, since step 5 has removed any metadata reference to the index and snapshotting it again will lead to a new index UUID.
I opened #40228 with a suggestion on how to write the tombstones to get a better handle on this situation.
* Use ability to list child "folders" in the blob store to implement recursive delete on all stale index folders when cleaning up, instead of using the diff between two `RepositoryData` instances to cover aborted deletes
* Runs after every delete operation
* Relates #13159 (fixing most of the issues caused by unreferenced indices, leaving only some meta files to be cleaned up)
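A minimal sketch of this listing-based cleanup, using a sorted map as a toy blob store. The names `listChildren`, `deleteRecursively`, and `cleanupStaleIndices` are illustrative stand-ins, not the real `BlobContainer` API:

```java
import java.util.*;

// Toy model of the cleanup strategy described above: list the child
// "folders" under indices/ and recursively delete any folder whose index
// id is not referenced by the current RepositoryData, rather than diffing
// two RepositoryData instances.
public class StaleFolderCleanup {
    // blob path -> contents; a stand-in for the blob store
    final NavigableMap<String, String> blobs = new TreeMap<>();

    // List the direct child "folders" of a path, e.g. indices/<indexId>/...
    Set<String> listChildren(String path) {
        Set<String> children = new TreeSet<>();
        for (String name : blobs.keySet()) {
            if (name.startsWith(path)) {
                String rest = name.substring(path.length());
                int slash = rest.indexOf('/');
                if (slash >= 0) children.add(rest.substring(0, slash));
            }
        }
        return children;
    }

    // Recursively delete a folder and everything under it.
    void deleteRecursively(String path) {
        blobs.keySet().removeIf(name -> name.startsWith(path));
    }

    // Delete every index folder not referenced by the current RepositoryData
    // (modeled here as just a set of live index ids). This also covers
    // folders left behind by aborted deletes.
    Set<String> cleanupStaleIndices(Set<String> referencedIndexIds) {
        Set<String> stale = listChildren("indices/");
        stale.removeAll(referencedIndexIds);
        for (String indexId : stale) deleteRecursively("indices/" + indexId + "/");
        return stale;
    }

    public static void main(String[] args) {
        StaleFolderCleanup repo = new StaleFolderCleanup();
        repo.blobs.put("indices/live-uuid/0/segment_0", "data");
        repo.blobs.put("indices/stale-uuid/0/segment_0", "data");
        Set<String> stale = repo.cleanupStaleIndices(new HashSet<>(Arrays.asList("live-uuid")));
        System.out.println(stale);               // [stale-uuid]
        System.out.println(repo.blobs.keySet()); // [indices/live-uuid/0/segment_0]
    }
}
```

The key property is that staleness is decided against the single current `RepositoryData`, so a folder orphaned by any earlier aborted delete is caught on the next cleanup run.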
Leaving this open, since there's still a little work left here in cleaning up unreferenced top-level blobs. I'll raise a PR for that shortly.
Closing this, as #42189 (and numerous follow-ups) resolved the bulk of the unreferenced file leaks on every delete operation, and #43900 will bring a solution to clean up the remainder (while acknowledging that some leaking can still occur on errors, which must be cleaned up via the cleanup endpoint that #43900 introduces).
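For reference, invoking the repository cleanup endpoint from #43900 would look something like this (request shape assumed from the Elasticsearch snapshot docs; `my_repository` is a placeholder repository name):

```
POST /_snapshot/my_repository/_cleanup
```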
It's currently possible to end up with unreferenced data in a snapshot repository, given the following steps:

1. Create an index `foobar`, with size `X` bytes
2. Snapshot the index
3. Delete the index
4. Delete the snapshot
5. During the deletion, crash after the `snapshot-{}` and `metadata-{}` files have been deleted, but before the files `foobar` references are deleted

Normally, step 5 would cause the files no longer referenced by any snapshots to be deleted, but if the underlying index is deleted as well, they won't get cleaned up. In the example above, there would be `X` bytes of disk space used without any snapshot referencing them. Given sufficiently large values of `X`, this could be a significant amount of storage wasted. Even with small amounts of data, this might accrue over time to become significant.

Suggestion: create a `deleting-{}` file as a sibling of the `snapshot-{}` file that gets written before the files referenced by the snapshot get deleted. When the deletion has completed, this file should be the last one deleted. These files indicate that a deletion is in progress or has been attempted, so it's possible to tell that the snapshot might be in a half-deleted state (and we can avoid using it). It should also enable later snapshot processes to continue the deletion where the previous one left off.
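The suggested tombstone protocol could be sketched roughly as follows, with a map standing in for the blob store. All names here (`deleteSnapshot`, `pendingDeletes`, the blob paths) are hypothetical illustrations, not the eventual implementation:

```java
import java.util.*;

// Toy model of the deleting-{} tombstone idea: write the tombstone first,
// delete data files and snapshot metadata, and remove the tombstone last.
// A leftover tombstone therefore marks a half-deleted snapshot.
public class TombstoneDelete {
    // blob name -> contents; a stand-in for the blob store
    final Map<String, String> blobs = new TreeMap<>();

    void deleteSnapshot(String id) {
        // 1. Write the tombstone before touching anything else, so a crash
        //    anywhere below still leaves evidence that a delete was started.
        blobs.put("deleting-" + id, "");
        // 2. Delete the data files referenced only by this snapshot.
        blobs.keySet().removeIf(name -> name.startsWith("indices/" + id + "/"));
        // 3. Delete the top-level snapshot metadata.
        blobs.remove("snapshot-" + id);
        blobs.remove("metadata-" + id);
        // 4. The tombstone goes last; its absence means the delete completed.
        blobs.remove("deleting-" + id);
    }

    // On startup, any leftover tombstone identifies a half-deleted snapshot
    // whose deletion can be resumed.
    List<String> pendingDeletes() {
        List<String> pending = new ArrayList<>();
        for (String name : blobs.keySet()) {
            if (name.startsWith("deleting-")) pending.add(name.substring("deleting-".length()));
        }
        return pending;
    }

    public static void main(String[] args) {
        TombstoneDelete repo = new TombstoneDelete();
        repo.blobs.put("snapshot-s1", "{}");
        repo.blobs.put("metadata-s1", "{}");
        repo.blobs.put("indices/s1/segment_0", "data");
        repo.deleteSnapshot("s1");
        System.out.println(repo.blobs.isEmpty()); // true: fully cleaned up
        // Simulate a crash mid-delete: tombstone written, cleanup unfinished.
        repo.blobs.put("deleting-s2", "");
        System.out.println(repo.pendingDeletes()); // [s2]
    }
}
```

The ordering is what matters: because the tombstone is written first and removed last, a snapshot can never be half-deleted without a tombstone present, so a later process can safely resume any delete it finds pending.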