Delete shard store files before restoring a snapshot #27476

Merged: 2 commits into elastic:master on Nov 24, 2017

Conversation

@tlrx (Member) commented Nov 21, 2017

Pull request #20220 added a change where store files that have the same name as, but differ from, the files in the snapshot are deleted before the snapshot is restored. This logic was based on the Store.RecoveryDiff.different set of files, which is computed by diffing an existing store against a snapshot.

This works well when the files on the filesystem form a valid
shard store, i.e. there is a segments file and the store files
are not corrupted. Otherwise, the existing store's snapshot
metadata cannot be read (using Store#snapshotStoreMetadata())
and an exception is thrown
(CorruptIndexException, IndexFormatTooOldException, etc.) which
is later caught at the beginning of the restore process
(see RestoreContext#restore()) and translated into
empty store metadata (Store.MetadataSnapshot.EMPTY).

This makes the deletion of different files introduced
in #20220 useless, as the set of files will always be empty
even when store files exist on the filesystem. And if some
files are present in the store directory, restoring
a snapshot containing files with the same names will fail with a
FileAlreadyExistsException.

This is part of the #26865 issue.
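For illustration, the fix amounts to deleting, before anything is copied, every existing store file whose name collides with a file about to be restored, by listing the directory directly rather than relying on being able to read the existing store's metadata. Below is a minimal, hedged sketch of that idea using plain java.nio.file instead of the actual Elasticsearch Store/RecoveryDiff internals; the class and method names are illustrative only.

import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;

final class RestorePreparationSketch {

    /**
     * Deletes any file in the shard store directory whose name collides with a file that
     * is about to be restored from the snapshot. Deliberately does not try to read the
     * existing store's metadata, since on a broken store that read fails with
     * CorruptIndexException / IndexFormatTooOldException and would leave the directory untouched.
     */
    static void deleteConflictingFiles(Path storeDir, Set<String> filesToRestore) throws IOException {
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(storeDir)) {
            for (Path existing : stream) {
                if (filesToRestore.contains(existing.getFileName().toString())) {
                    // same name as a file in the snapshot: remove it up front so the
                    // restore never fails with a FileAlreadyExistsException
                    Files.deleteIfExists(existing);
                }
            }
        }
    }
}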

There are various cases where some files could exist in the
store directory before a snapshot is restored. One that
Igor identified is a restore attempt that failed on a node
after only the first files were restored; the shard is then allocated
again to the same node and the restore starts again (but fails
because of the existing files). Another is when some files
of a closed index are corrupted or deleted and the index is
then restored.

This commit adds a test that uses the infrastructure provided
by IndexShardTestCase to verify that restoring a shard
succeeds even when files with the same names exist on the filesystem.

Related to #26865

@ywelsch (Contributor) left a comment:

I've left some minor questions, but overall change LGTM.

// list of all existing store files without the identical ones
final Set<String> deleteIfExistFiles = Sets.difference(
    new HashSet<>(Arrays.asList(store.directory().listAll())),
    diff.identical.stream().map(StoreFileMetaData::name).collect(Collectors.toSet()));

Contributor:

Is there a need to subtract the identical ones? The filesToRecover, which we iterate over, won't contain those anyhow.

Member Author:

That was a safety check I added, but I agree it does not make sense; I'll remove it.
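For reference, dropping the subtraction would leave something along these lines. This is a hedged fragment in the context of the excerpt above; store and filesToRecover come from the surrounding restore code, and this is not necessarily the exact merged version.

// list of all files currently present in the store directory
final Set<String> deleteIfExistFiles = new HashSet<>(Arrays.asList(store.directory().listAll()));

for (final StoreFileMetaData fileToRecover : filesToRecover) {
    if (deleteIfExistFiles.contains(fileToRecover.name())) {
        // a local file with the same name exists but is not identical to the snapshot
        // version, so delete it before the restored copy is written
        store.directory().deleteFile(fileToRecover.name());
    }
}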

Contributor:

I agree with the identical point. Could you also clean up the trace logging 10 lines above while you are at it? I keep forgetting to do it and it doesn't make any sense now.

Also, can anyone think of a scenario where a case-insensitive FS can screw us here somehow?

Member Author:

> Also, can anyone think of a scenario where a case-insensitive FS can screw us here somehow?

I don't see any scenario like this...

final Store.MetadataSnapshot sourceMetaData = new Store.MetadataSnapshot(unmodifiableMap(snapshotMetaData), emptyMap(), 0);

final StoreFileMetaData restoredSegmentsFile = sourceMetaData.getSegmentsFile();
if (restoredSegmentsFile == null) {

Contributor:

I see that you fixed the check here (before, it read recoveryTargetMetadata == null, which did not make any sense as that one was always non-null). I wonder what impact this bug fix has. What change in behavior do we expect from this? (It's a bit unclear to me what this check does.)

Member Author:

Sadly there's no test for this, and as you noticed the previous check did not make any sense. I saw it as a bug, and since I think we can't restore a snapshot without a segments file, I don't expect any impact from this. Maybe @imotov has more knowledge?

Contributor:

I cannot really think of a scenario where a snapshot would have no segments. So, removing the check for recoveryTargetMetadata and replacing it with a check for a missing segments file shouldn't cause any change in behavior except in some pathological cases that would fail anyway. Now they will at least fail with a reasonable error message.
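In other words, the guard now fails fast with an explicit error when the snapshot metadata contains no segments file, roughly like the following hedged sketch; the exact exception type and message in the merged code may differ.

final StoreFileMetaData restoredSegmentsFile = sourceMetaData.getSegmentsFile();
if (restoredSegmentsFile == null) {
    // a snapshot without a segments file cannot form a valid shard store, so fail
    // with a clear error here instead of stumbling later in the restore
    throw new IndexShardRestoreFailedException(shardId, "snapshot has no segments file");
}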

*
* It implements only the methods required by the tests and is not thread safe.
*/
class MemoryBlobStoreRepository extends BlobStoreRepository {

Contributor:

I think we can just use an FsRepository for this. All our other shard-level tests do the same, so no need to optimize this. If we want to change that in the future, I think it's easier to switch to jimfs and continue using FsRepository.

Contributor:

It's that, or we can replace FsRepository with this one, but we need to beef it up.

Member Author:

Let's just use an FsRepository.

}

/** Recover a shard from a snapshot using a given repository **/
private void recoverShardFromSnapshot(final IndexShard shard,

Contributor:

You can move this method (and the one below it) up to IndexShardTestCase (in test:framework). It could be useful for other people.
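A rough outline of how a shard-level test might use such helpers once they are hoisted into IndexShardTestCase. This is a hedged sketch: the helper names snapshotShard and recoverShardFromSnapshot follow the snippets in this review, the snapshot and repository setup is elided, and this is not the final test as merged.

public void testRestoreShardWhenFilesWithSameNamesExist() throws Exception {
    // shard whose content ends up in the snapshot
    IndexShard source = newStartedShard(true);
    indexDoc(source, "_doc", "0");
    flushShard(source);
    snapshotShard(source, snapshot, repository);   // snapshot and repository setup elided

    // fresh shard whose on-disk store is then seeded with files that share names with
    // the snapshot files but have different contents (seeding details elided)
    IndexShard target = newShard(true);
    recoverShardFromSnapshot(target, snapshot, repository);

    // the restore succeeds because conflicting local files are deleted before copying
    closeShards(source, target);
}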

@imotov (Contributor) left a comment:

Agree with @ywelsch on minor comments. Left one cleanup suggestion. Otherwise LGTM.

@ywelsch (Contributor) left a comment:

LGTM. Thanks @tlrx

tlrx added 2 commits November 23, 2017 13:37
@tlrx tlrx force-pushed the file-already-exist-in-snapshot-recovery branch from 18f07ca to 7e82230 on November 23, 2017 12:37
@tlrx tlrx merged commit 5dc5580 into elastic:master Nov 24, 2017
tlrx added a commit that referenced this pull request Nov 24, 2017
@tlrx tlrx added the v6.2.0 label Nov 24, 2017
tlrx added a commit that referenced this pull request Nov 24, 2017
tlrx added a commit that referenced this pull request Nov 24, 2017
tlrx added a commit that referenced this pull request Nov 24, 2017
@tlrx tlrx deleted the file-already-exist-in-snapshot-recovery branch November 24, 2017 13:55
martijnvg added a commit that referenced this pull request Nov 24, 2017
* es/master: (38 commits)
  Backport wait_for_initialiazing_shards to cluster health API
  Carry over version map size to prevent excessive resizing (#27516)
  Fix scroll query with a sort that is a prefix of the index sort (#27498)
  Delete shard store files before restoring a snapshot (#27476)
  Replace `delimited_payload_filter` by `delimited_payload` (#26625)
  CURRENT should not be a -SNAPSHOT version if build.snapshot is false (#27512)
  Fix merging of _meta field (#27352)
  Remove unused method (#27508)
  unmuted test, this has been fixed by #27397
  Consolidate version numbering semantics (#27397)
  Add wait_for_no_initializing_shards to cluster health API (#27489)
  [TEST] use routing partition size based on the max routing shards of the second split
  Adjust CombinedDeletionPolicy for multiple commits (#27456)
  Update composite-aggregation.asciidoc
  Deprecate `levenstein` in favor of `levenshtein` (#27409)
  Automatically prepare indices for splitting (#27451)
  Validate `op_type` for `_create` (#27483)
  Minor ShapeBuilder cleanup
  muted test
  Decouple nio constructs from the tcp transport (#27484)
  ...
martijnvg added a commit that referenced this pull request Nov 24, 2017
* es/6.x: (30 commits)
  Add wait_for_no_initializing_shards to cluster health API (#27489)
  Carry over version map size to prevent excessive resizing (#27516)
  Fix scroll query with a sort that is a prefix of the index sort (#27498)
  Delete shard store files before restoring a snapshot (#27476)
  CURRENT should not be a -SNAPSHOT version if build.snapshot is false (#27512)
  Fix merging of _meta field (#27352)
  test: do not run percolator query builder bwc test against 5.x versions
  Remove unused method (#27508)
  Consolidate version numbering semantics (#27397)
  Adjust CombinedDeletionPolicy for multiple commits (#27456)
  Minor ShapeBuilder cleanup
  [GEO] Deprecate ShapeBuilders and decouple geojson parse logic
  Improve docs for split API in 6.1/6.x (#27504)
  test: use correct pre 6.0.0-alpha1 format
  Update composite-aggregation.asciidoc
  Deprecate `levenstein` in favor of `levenshtein` (#27409)
  Decouple nio constructs from the tcp transport (#27484)
  Bump version from 6.1 to 6.2
  Fix whitespace in Security.java
  Tighten which classes can exit
  ...
@jimczi jimczi added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019
@fhalde commented Apr 12, 2020

Does this mean that if I had a backup cluster on which a snapshot S1 was restored, and later a snapshot S2 was created and restored, then the restore of S2 will be a full restoration of the indices and not just of the diff between S1 and S2?

Just want to know if this is expected, because we have been seeing slow primary shard initialization (when we restore on the backup) after we upgraded to 7.2. In 1.7, subsequent restores on a backup cluster (with data) were a very fast process.

@fhalde commented Apr 12, 2020

And if this is true, then what's the rationale? Why can't segment files which are untouched be reused?

@tlrx (Member Author) commented Apr 14, 2020

@fhalde the restore process tries to reuse the existing files on disk when possible. It is always based on a diff between the files on disk and the files contained in the snapshot to be restored. Note that shard files can differ substantially between two consecutive snapshots: it's possible that some segments were merged, or that the snapshot was taken from a primary that has since been demoted in favor of a new primary. This pull request concerned the case where old files are present on disk but cannot be linked to an existing segment, and therefore should not be reused, as they caused trouble later in the restore process.
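To make the reuse rule concrete, here is a simplified, hedged sketch of the decision described above. Only the three buckets (identical / different / missing) mirror Store.RecoveryDiff; the RestorePlan class itself is purely illustrative and not Elasticsearch code.

import java.util.ArrayList;
import java.util.List;

final class RestorePlan {
    final List<String> reuseLocalCopy = new ArrayList<>();     // nothing to download
    final List<String> fetchFromSnapshot = new ArrayList<>();  // must be (re)downloaded

    static RestorePlan of(List<String> identical, List<String> different, List<String> missing) {
        RestorePlan plan = new RestorePlan();
        plan.reuseLocalCopy.addAll(identical);      // same name, length and checksum as in the snapshot
        plan.fetchFromSnapshot.addAll(different);   // same name, different content: local copy is deleted and re-fetched
        plan.fetchFromSnapshot.addAll(missing);     // not present locally at all
        return plan;
    }
}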

@ywelsch (Contributor) commented Apr 14, 2020

Probably #55142

@tlrx (Member Author) commented Apr 14, 2020

Yep, I didn't see that @fhalde had already opened #55013 and also asked questions here. #55142 looks like the culprit.

@fhalde commented Apr 14, 2020

Thanks @tlrx @ywelsch, will track this.
