Make snapshot deletes less memory intensive by reordering repository metadata updates #89163
Labels: :Distributed Coordination/Snapshot/Restore, >enhancement, Team:Distributed (Obsolete)
Snapshot deletes currently consume memory that scales as O(N) in the number of shards in the repository touched by the delete operation. This makes them very memory-intensive when deleting snapshots that themselves contain many shards.
The deeper issue behind this O(N) memory consumption is that snapshot deletes currently update all shard-level metadata in the repository before updating the root-level metadata and finally running the blob delete operations.
This means that the master must hold information for all affected shards in memory from the beginning of the delete operation until the physical blob deletes are executed. Moreover, because no other operation may execute while the shard-level metadata is being updated, the larger the number of shards involved in a delete, the longer it blocks other repository operations.
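For illustration only, here is a minimal Java sketch of the current ordering. The names (`CurrentDeleteFlow`, `ShardMetadata`, `updateShardMetadata`, `updateRootMetadata`, `deleteBlobs`) are hypothetical placeholders, not actual Elasticsearch classes; the point is just that the results of every shard-level update stay in memory until the final blob deletes run:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the current delete ordering: all shard-level metadata
// updates are computed and kept in memory before the root-level update and the
// physical blob deletes, so memory grows with the number of shards touched.
public class CurrentDeleteFlow {

    record ShardMetadata(String shardId, List<String> staleBlobs) {}

    public static void deleteSnapshot(List<String> shardIds) {
        // 1. Update shard-level metadata for every shard first; the results
        //    (including the blobs that become stale) must all stay in memory.
        List<ShardMetadata> updatedShards = new ArrayList<>();
        for (String shardId : shardIds) {
            updatedShards.add(updateShardMetadata(shardId)); // O(N) entries held
        }

        // 2. Only now update the root-level metadata.
        updateRootMetadata();

        // 3. Finally delete the stale blobs recorded in step 1.
        for (ShardMetadata shard : updatedShards) {
            deleteBlobs(shard.staleBlobs());
        }
    }

    // Placeholders standing in for the real repository operations.
    private static ShardMetadata updateShardMetadata(String shardId) {
        return new ShardMetadata(shardId, List.of(shardId + "/stale-blob"));
    }

    private static void updateRootMetadata() { /* write new root metadata blob */ }

    private static void deleteBlobs(List<String> blobs) { /* bulk blob delete */ }
}
```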
We discussed this issue and settled on a fix that will update the root-level metadata first and then update the shard-level metadata (which might now point to non-existent snapshots) step-by-step in a newly added delete/GC step. This allows updating only a limited number of shards at a time, limiting the memory consumption of the operation and shortening the window during which delete operations block other operations.
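A sketch of the agreed-upon reordering, again with hypothetical names and an illustrative batch size rather than any real setting: the root-level metadata is written first, after which shard-level cleanup runs as a follow-up delete/GC step that only ever holds a bounded batch of shards in memory:

```java
import java.util.List;

// Hypothetical sketch of the proposed ordering: root-level metadata is updated
// first, then shard-level metadata is cleaned up batch by batch in a follow-up
// delete/GC step, bounding memory to BATCH_SIZE shards at a time.
public class ReorderedDeleteFlow {

    private static final int BATCH_SIZE = 100; // illustrative limit only

    public static void deleteSnapshot(List<String> shardIds) {
        // 1. Update the root-level metadata up front; other repository
        //    operations are only blocked for this short step.
        updateRootMetadata();

        // 2. GC step: process shards in small batches. Shard-level metadata may
        //    briefly point at snapshots that no longer exist at the root level,
        //    which the new step tolerates and cleans up.
        for (int from = 0; from < shardIds.size(); from += BATCH_SIZE) {
            List<String> batch = shardIds.subList(from, Math.min(from + BATCH_SIZE, shardIds.size()));
            for (String shardId : batch) {
                List<String> staleBlobs = updateShardMetadata(shardId);
                deleteBlobs(staleBlobs);
            }
            // Memory for this batch can be released before the next one starts.
        }
    }

    // Placeholders standing in for the real repository operations.
    private static void updateRootMetadata() { /* write new root metadata blob */ }

    private static List<String> updateShardMetadata(String shardId) {
        return List.of(shardId + "/stale-blob");
    }

    private static void deleteBlobs(List<String> blobs) { /* bulk blob delete */ }
}
```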