Make snapshot deletes less memory intensive by reordering repository metadata updates #89163

Open
Tracked by #77466
original-brownbear opened this issue Aug 8, 2022 · 1 comment
Labels
:Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >enhancement Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination.

Comments

@original-brownbear
Member

Snapshot deletes currently consume memory that scales as O(N), where N is the number of shards in the repository that the delete operation touches. This makes deletes very memory intensive when the snapshots being deleted contain many shards.
The underlying cause of this O(N) memory consumption is that a snapshot delete currently updates all shard-level metadata in the repository first, then updates the root-level metadata, and only then runs the blob delete operations.
As a result, the master node must hold information for all affected shards in memory from the start of the delete operation until the physical blob deletes execute. It also means that a delete blocks other repository operations for longer as the number of shards involved grows, because no other operation may execute while the shard-level metadata is being updated.

We discussed this issue and settled on a fix that updates the root-level metadata first and then updates the shard-level metadata (which might then point to non-existent snapshots) step by step in a newly added delete/GC step. This allows updating only a limited number of shards at a time, which bounds the memory consumption of the operation and shortens the window during which deletes block other operations.
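
To make the difference in ordering concrete, here is a minimal, self-contained sketch. It is not Elasticsearch code: `SnapshotDeleteSketch`, `Trace`, `deleteShardMetadataFirst`, `deleteRootMetadataFirst`, and the `batchSize` parameter are all hypothetical names invented for illustration, and a plain list of shard ids stands in for whatever per-shard state the master actually keeps. It also assumes that each GC batch performs its own shard-level update and blob deletes before the next batch starts, which the description above does not spell out.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative model only; none of these names exist in Elasticsearch. */
final class SnapshotDeleteSketch {

    /** Records the order of repository writes and the peak number of shard results held at once. */
    static final class Trace {
        final List<String> steps = new ArrayList<>();
        int peakHeld = 0;
    }

    /** Ordering described as the current behavior: shard-level metadata, then root, then blob deletes. */
    static Trace deleteShardMetadataFirst(int shardCount) {
        Trace trace = new Trace();
        List<Integer> held = new ArrayList<>();
        for (int shard = 0; shard < shardCount; shard++) {
            held.add(shard); // each shard's result stays in memory until the blob deletes run
            trace.peakHeld = Math.max(trace.peakHeld, held.size());
        }
        trace.steps.add("shard-metadata[0.." + shardCount + ")");
        trace.steps.add("root-metadata");
        trace.steps.add("blob-deletes");
        return trace;
    }

    /** Proposed ordering: root metadata first, then shard-level cleanup in bounded batches. */
    static Trace deleteRootMetadataFirst(int shardCount, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        Trace trace = new Trace();
        trace.steps.add("root-metadata");
        for (int from = 0; from < shardCount; from += batchSize) {
            int to = Math.min(from + batchSize, shardCount);
            List<Integer> held = new ArrayList<>();
            for (int shard = from; shard < to; shard++) {
                held.add(shard); // only the current batch is held in memory
            }
            trace.peakHeld = Math.max(trace.peakHeld, held.size());
            // Assumption: the batch's shard metadata update and blob deletes finish here.
            trace.steps.add("gc-batch[" + from + ".." + to + ")");
        }
        return trace;
    }

    public static void main(String[] args) {
        Trace current = deleteShardMetadataFirst(1000);
        Trace proposed = deleteRootMetadataFirst(1000, 50);
        System.out.println("peak held, shard metadata first: " + current.peakHeld);
        System.out.println("peak held, root metadata first:  " + proposed.peakHeld);
    }
}
```

In this model the peak number of shard results held at once equals the shard count for the first ordering and is capped at `batchSize` for the second, which is the property the proposed fix is after. The sketch leaves out concurrency, failure handling, and how stale shard-level references are detected.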

@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@elasticsearchmachine elasticsearchmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Aug 8, 2022