Fail Snapshot Delete if Metadata can't be Read #57786

original-brownbear · 2020-06-07T14:15:07Z

In case we use shard generation UUIDs, there is no good reason
whatsoever to be resilient/lenient when deleting a snapshot whose shard metadata can't be read.
All we do is put writes on top of a corrupted repository state.
In line with the behavior elsewhere in snapshotting, we should
fail fast here to stop further writes to the repository in a situation
where something clearly went wrong.
Failing fast also keeps garbage from accumulating in the repository: if the delete failed because of some transient IO issue, the user can simply rerun it. (We couldn't, and still can't, do this with the old metadata format, because there the root-level RepositoryData is updated first.)
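To make the argued ordering concrete, here is a minimal sketch against a hypothetical in-memory repository model. `FailFastDeleteSketch`, `readShardMetadata` and `deleteSnapshot` are illustrative names, not Elasticsearch APIs, and the model assumes (as the description implies for the shard generation UUID format) that shard-level metadata is written before the root-level RepositoryData. An unreadable shard aborts the delete before the root metadata changes, so the snapshot stays listed and the delete can be rerun.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory repository model; none of these names are Elasticsearch APIs.
class FailFastDeleteSketch {
    // Root-level metadata (stands in for RepositoryData): snapshots the repository lists.
    final Set<String> rootSnapshots = new HashSet<>();
    // Shard-level metadata: shard -> snapshots referencing it.
    // A null value models a shard metadata blob that cannot be read.
    final Map<String, Set<String>> shardMetadata = new HashMap<>();

    Set<String> readShardMetadata(String shard) throws IOException {
        Set<String> snapshots = shardMetadata.get(shard);
        if (snapshots == null) {
            throw new IOException("cannot read metadata for shard [" + shard + "]");
        }
        return snapshots;
    }

    void deleteSnapshot(String snapshot, List<String> shards) throws IOException {
        // Phase 1: read every affected shard. There is deliberately no catch here:
        // an unreadable shard aborts the delete instead of being logged and skipped.
        Map<String, Set<String>> updated = new HashMap<>();
        for (String shard : shards) {
            Set<String> remaining = new HashSet<>(readShardMetadata(shard));
            remaining.remove(snapshot);
            updated.put(shard, remaining);
        }
        // Phase 2: write the new shard-level metadata.
        shardMetadata.putAll(updated);
        // Phase 3: update the root-level metadata last. If phase 1 failed we never
        // get here, so the snapshot is still listed and the delete can be retried.
        rootSnapshots.remove(snapshot);
    }
}
```

The lenient variant would wrap `readShardMetadata` in a try/catch and carry on, which is the behavior this PR argues against.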

Relates #57785 (the impact of that bug would have been much lower with this change.
It's not as easy to make a similar change to snapshot creation, but I'm looking into that as well).


elasticmachine · 2020-06-07T14:15:09Z

Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)

tlrx

LGTM

ywelsch

I'm undecided if we should do this. Let's discuss it in the coming days.


original-brownbear · 2021-02-25T14:59:39Z

@ywelsch it's been a while on this one. Given the recent discussions on hardening the repository, I'm a bigger fan of this change than ever. I really don't see how anything good could come from deleting from a repository when a shard folder can't be read properly. With this change you can still get rid of the affected snapshots by deleting all snapshots that reference the shard, but you can no longer quietly keep going in this broken state. I think it's a neat safeguard against concurrent writes, especially (but not only) on S3.
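The escape hatch described in this comment can be sketched in the same hedged way. Assuming, as a simplification, that root metadata records which shards each snapshot references, a shard that no surviving snapshot references can be dropped wholesale without reading its metadata, while a partial delete touching an unreadable shard still fails fast. `ShardCleanupSketch`, `deleteSnapshots` and its fields are hypothetical names, not Elasticsearch code.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model, not Elasticsearch code.
class ShardCleanupSketch {
    // Root metadata: snapshot -> shards it references.
    final Map<String, Set<String>> rootSnapshotToShards = new HashMap<>();
    // Shard metadata: shard -> snapshots it holds; null models an unreadable blob.
    final Map<String, Set<String>> shardMetadata = new HashMap<>();

    void deleteSnapshots(Set<String> toDelete) throws IOException {
        // Collect every shard touched by the delete.
        Set<String> affected = new HashSet<>();
        for (String snapshot : toDelete) {
            affected.addAll(rootSnapshotToShards.getOrDefault(snapshot, Set.of()));
        }
        Map<String, Set<String>> updated = new HashMap<>();
        Set<String> shardsToRemove = new HashSet<>();
        for (String shard : affected) {
            boolean stillReferenced = rootSnapshotToShards.entrySet().stream()
                .anyMatch(e -> !toDelete.contains(e.getKey()) && e.getValue().contains(shard));
            if (!stillReferenced) {
                // No surviving snapshot needs this shard: drop the whole shard
                // folder without reading its (possibly corrupted) metadata.
                shardsToRemove.add(shard);
                continue;
            }
            Set<String> current = shardMetadata.get(shard);
            if (current == null) {
                // Partial delete of a shard we cannot read: fail fast.
                throw new IOException("cannot read metadata for shard [" + shard + "]");
            }
            Set<String> remaining = new HashSet<>(current);
            remaining.removeAll(toDelete);
            updated.put(shard, remaining);
        }
        // Only mutate state once every affected shard has been handled.
        shardMetadata.putAll(updated);
        shardMetadata.keySet().removeAll(shardsToRemove);
        rootSnapshotToShards.keySet().removeAll(toDelete);
    }
}
```

Deleting only some of the snapshots that share an unreadable shard throws and leaves everything untouched; deleting all of them removes the shard without ever reading it.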

elasticsearchmachine · 2022-08-03T12:43:15Z

Pinging @elastic/es-distributed (Team:Distributed)

original-brownbear · 2022-08-09T13:07:33Z

Closing this now. It's irrelevant since we decided on doing #89163, which makes this impossible.

original-brownbear added >non-issue, :Distributed Coordination/Snapshot/Restore, v8.0.0, v7.9.0 labels Jun 7, 2020

elasticmachine added the Team:Distributed (Obsolete) label Jun 7, 2020

original-brownbear requested review from ywelsch and tlrx June 8, 2020 08:35

tlrx approved these changes Jun 8, 2020


ywelsch reviewed Jun 8, 2020


This was referenced Jul 3, 2020:

Enable Fully Concurrent Snapshot Operations #56911 (merged)

Add Check for Metadata Existence in BlobStoreRepository #59141 (merged)

original-brownbear added 2 commits February 25, 2021 14:57

Merge remote-tracking branch 'elastic/master' into mark-repo-corrupted-shard-gen-failure (3eab5df)

fix merge issues (1dfff06)

original-brownbear requested a review from ywelsch February 25, 2021 14:57

ywelsch removed their request for review August 26, 2021 13:34

arteam added v8.1.0 and removed v8.0.0 labels Jan 12, 2022

mark-vieira added v8.2.0 and removed v8.1.0 labels Feb 2, 2022

salvatore-campagna added v8.3.0 and removed v8.2.0 labels Mar 30, 2022

craigtaverner added v8.4.0 and removed v8.3.0 labels May 25, 2022

elasticsearchmachine changed the base branch from master to main July 22, 2022 23:13

mark-vieira added v8.5.0 and removed v8.4.0 labels Jul 27, 2022

original-brownbear added the team-discuss label Aug 3, 2022

original-brownbear closed this Aug 9, 2022
