Delete shard store files before restoring a snapshot #27476

Merged: 2 commits into elastic:master on Nov 24, 2017

Conversation

@tlrx (Member) commented Nov 21, 2017

Pull request #20220 added a change where store files that have the same name as, but differ from, the files in the snapshot are deleted before the snapshot is restored. This logic was based on the Store.RecoveryDiff.different set of files, which is computed by diffing an existing store against a snapshot.

This works well when the files on the filesystem form a valid
shard store, i.e. there is a segments file and the store files
are not corrupted. Otherwise, the existing store's snapshot
metadata cannot be read (using Store#snapshotStoreMetadata())
and an exception is thrown
(CorruptIndexException, IndexFormatTooOldException, etc.) which
is later caught at the beginning of the restore process
(see RestoreContext#restore()) and translated into
empty store metadata (Store.MetadataSnapshot.EMPTY).

This makes the deletion of different files introduced
in #20220 useless, as the set of files will always be empty
even when store files exist on the filesystem. And if some
files are present in the store directory, restoring
a snapshot containing files with the same names will fail with a
FileAlreadyExistsException.

This is part of the #26865 issue.
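For illustration, the fix amounts to deleting, before anything is copied, every existing store file whose name collides with a file about to be restored, by listing the directory directly rather than relying on being able to read the existing store's metadata. Below is a minimal, hedged sketch of that idea using plain java.nio.file instead of the actual Elasticsearch Store/RecoveryDiff internals; the class and method names are illustrative only.

import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;

final class RestorePreparationSketch {

    /**
     * Deletes any file in the shard store directory whose name collides with a file that
     * is about to be restored from the snapshot. Deliberately does not try to read the
     * existing store's metadata, since on a broken store that read fails with
     * CorruptIndexException / IndexFormatTooOldException and would leave the directory untouched.
     */
    static void deleteConflictingFiles(Path storeDir, Set<String> filesToRestore) throws IOException {
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(storeDir)) {
            for (Path existing : stream) {
                if (filesToRestore.contains(existing.getFileName().toString())) {
                    // same name as a file in the snapshot: remove it up front so the
                    // restore never fails with a FileAlreadyExistsException
                    Files.deleteIfExists(existing);
                }
            }
        }
    }
}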

There are various cases where some files could exist in the
store directory before a snapshot is restored. One that
Igor identified is a restore attempt that failed on a node
after only the first files were restored; the shard is then allocated
again to the same node and the restore starts again (but fails
because of the existing files). Another is when some files
of a closed index are corrupted or deleted and the index is
then restored.

This commit adds a test that uses the infrastructure provided
by IndexShardTestCase to verify that restoring a shard
succeeds even when files with the same names exist on the filesystem.

Related to #26865

@ywelsch (Contributor) left a comment:

I've left some minor questions, but overall change LGTM.

// list of all existing store files without the identical ones
final Set<String> deleteIfExistFiles = Sets.difference(
    new HashSet<>(Arrays.asList(store.directory().listAll())),
    diff.identical.stream().map(StoreFileMetaData::name).collect(Collectors.toSet()));

Contributor:

Is there a need to subtract the identical ones? The filesToRecover, which we iterate over, won't contain those anyhow.

Member Author:

That was a safety check I added, but I agree it does not make sense; I'll remove it.
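For reference, dropping the subtraction would leave something along these lines. This is a hedged fragment in the context of the excerpt above; store and filesToRecover come from the surrounding restore code, and this is not necessarily the exact merged version.

// list of all files currently present in the store directory
final Set<String> deleteIfExistFiles = new HashSet<>(Arrays.asList(store.directory().listAll()));

for (final StoreFileMetaData fileToRecover : filesToRecover) {
    if (deleteIfExistFiles.contains(fileToRecover.name())) {
        // a local file with the same name exists but is not identical to the snapshot
        // version, so delete it before the restored copy is written
        store.directory().deleteFile(fileToRecover.name());
    }
}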

Contributor:

I agree with the identical point. Could you also clean up the trace logging 10 lines above while you are at it? I keep forgetting to do it and it doesn't make any sense now.

Also, can anyone think of a scenario where a case-insensitive FS can screw us here somehow?

Member Author:

> Also, can anyone think of a scenario where a case-insensitive FS can screw us here somehow?

I don't see any scenario like this...

final Store.MetadataSnapshot sourceMetaData = new Store.MetadataSnapshot(unmodifiableMap(snapshotMetaData), emptyMap(), 0);

final StoreFileMetaData restoredSegmentsFile = sourceMetaData.getSegmentsFile();
if (restoredSegmentsFile == null) {

Contributor:

I see that you fixed the check here (before, it read recoveryTargetMetadata == null, which did not make any sense as that one was always non-null). I wonder what impact this bug fix has. What change in behavior do we expect from this? (It's a bit unclear to me what this check does.)

Member Author:

Sadly there's no test for this, and as you noticed the previous check did not make any sense. I saw it as a bug, and since I think we can't restore a snapshot without a segments file, I don't expect any impact from this. Maybe @imotov has more knowledge?

Contributor:

I cannot really think of a scenario where a snapshot would have no segments. So, removing the check for recoveryTargetMetadata and replacing it with a check for a missing segments file shouldn't cause any change in behavior except in some pathological cases that would fail anyway. Now they will at least fail with a reasonable error message.
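In other words, the guard now fails fast with an explicit error when the snapshot metadata contains no segments file, roughly like the following hedged sketch; the exact exception type and message in the merged code may differ.

final StoreFileMetaData restoredSegmentsFile = sourceMetaData.getSegmentsFile();
if (restoredSegmentsFile == null) {
    // a snapshot without a segments file cannot form a valid shard store, so fail
    // with a clear error here instead of stumbling later in the restore
    throw new IndexShardRestoreFailedException(shardId, "snapshot has no segments file");
}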

*
* It implements only the methods required by the tests and is not thread safe.
*/
class MemoryBlobStoreRepository extends BlobStoreRepository {

Contributor:

I think we can just use an FsRepository for this. All our other shard-level tests do the same, so no need to optimize this. If we want to change that in the future, I think it's easier to switch to jimfs and continue using FsRepository.

Contributor:

It's that, or we can replace FsRepository with this one, but we need to beef it up.

Member Author:

Let's just use an FsRepository.

}

/** Recover a shard from a snapshot using a given repository **/
private void recoverShardFromSnapshot(final IndexShard shard,

Contributor:

You can move this method (and the one below it) up to IndexShardTestCase (in test:framework). It could be useful for other people.
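A rough outline of how a shard-level test might use such helpers once they are hoisted into IndexShardTestCase. This is a hedged sketch: the helper names snapshotShard and recoverShardFromSnapshot follow the snippets in this review, the snapshot and repository setup is elided, and this is not the final test as merged.

public void testRestoreShardWhenFilesWithSameNamesExist() throws Exception {
    // shard whose content ends up in the snapshot
    IndexShard source = newStartedShard(true);
    indexDoc(source, "_doc", "0");
    flushShard(source);
    snapshotShard(source, snapshot, repository);   // snapshot and repository setup elided

    // fresh shard whose on-disk store is then seeded with files that share names with
    // the snapshot files but have different contents (seeding details elided)
    IndexShard target = newShard(true);
    recoverShardFromSnapshot(target, snapshot, repository);

    // the restore succeeds because conflicting local files are deleted before copying
    closeShards(source, target);
}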

@imotov (Contributor) left a comment:

Agree with @ywelsch on minor comments. Left one cleanup suggestion. Otherwise LGTM.

@ywelsch (Contributor) left a comment:

LGTM. Thanks @tlrx

tlrx added 2 commits November 23, 2017 13:37
@tlrx tlrx force-pushed the file-already-exist-in-snapshot-recovery branch from 18f07ca to 7e82230 on November 23, 2017 12:37
@tlrx tlrx merged commit 5dc5580 into elastic:master Nov 24, 2017
tlrx added a commit that referenced this pull request Nov 24, 2017
@tlrx tlrx added the v6.2.0 label Nov 24, 2017
tlrx added a commit that referenced this pull request Nov 24, 2017
tlrx added a commit that referenced this pull request Nov 24, 2017
tlrx added a commit that referenced this pull request Nov 24, 2017
@tlrx tlrx deleted the file-already-exist-in-snapshot-recovery branch November 24, 2017 13:55
martijnvg added a commit that referenced this pull request Nov 24, 2017
* es/master: (38 commits)
  Backport wait_for_initialiazing_shards to cluster health API
  Carry over version map size to prevent excessive resizing (#27516)
  Fix scroll query with a sort that is a prefix of the index sort (#27498)
  Delete shard store files before restoring a snapshot (#27476)
  Replace `delimited_payload_filter` by `delimited_payload` (#26625)
  CURRENT should not be a -SNAPSHOT version if build.snapshot is false (#27512)
  Fix merging of _meta field (#27352)
  Remove unused method (#27508)
  unmuted test, this has been fixed by #27397
  Consolidate version numbering semantics (#27397)
  Add wait_for_no_initializing_shards to cluster health API (#27489)
  [TEST] use routing partition size based on the max routing shards of the second split
  Adjust CombinedDeletionPolicy for multiple commits (#27456)
  Update composite-aggregation.asciidoc
  Deprecate `levenstein` in favor of `levenshtein` (#27409)
  Automatically prepare indices for splitting (#27451)
  Validate `op_type` for `_create` (#27483)
  Minor ShapeBuilder cleanup
  muted test
  Decouple nio constructs from the tcp transport (#27484)
  ...
martijnvg added a commit that referenced this pull request Nov 24, 2017
* es/6.x: (30 commits)
  Add wait_for_no_initializing_shards to cluster health API (#27489)
  Carry over version map size to prevent excessive resizing (#27516)
  Fix scroll query with a sort that is a prefix of the index sort (#27498)
  Delete shard store files before restoring a snapshot (#27476)
  CURRENT should not be a -SNAPSHOT version if build.snapshot is false (#27512)
  Fix merging of _meta field (#27352)
  test: do not run percolator query builder bwc test against 5.x versions
  Remove unused method (#27508)
  Consolidate version numbering semantics (#27397)
  Adjust CombinedDeletionPolicy for multiple commits (#27456)
  Minor ShapeBuilder cleanup
  [GEO] Deprecate ShapeBuilders and decouple geojson parse logic
  Improve docs for split API in 6.1/6.x (#27504)
  test: use correct pre 6.0.0-alpha1 format
  Update composite-aggregation.asciidoc
  Deprecate `levenstein` in favor of `levenshtein` (#27409)
  Decouple nio constructs from the tcp transport (#27484)
  Bump version from 6.1 to 6.2
  Fix whitespace in Security.java
  Tighten which classes can exit
  ...
@jimczi jimczi added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019
@fhalde commented Apr 12, 2020

Does this mean that if I had a backup cluster on which a snapshot S1 was restored, and later a snapshot S2 was created and restored, then the restore of S2 will be a full restoration of the indices and not just of the diff between S1 and S2?

Just want to know if this is expected, because we have been seeing slow primary shard initialization (when we restore on the backup) after we upgraded to 7.2. In 1.7, subsequent restores on a backup cluster (with data) were a very fast process.

@fhalde commented Apr 12, 2020

And if this is true, then what's the rationale? Why can't segment files which are untouched be reused?

@tlrx (Member Author) commented Apr 14, 2020

@fhalde the restore process tries to reuse the existing files on disk when possible. It is always based on a diff between the files on disk and the files contained in the snapshot to be restored. Note that shard files can differ substantially between two consecutive snapshots: it's possible that some segments were merged, or that the snapshot was taken from a primary that has since been demoted in favor of a new primary. This pull request concerned the case where old files are present on disk but cannot be linked to an existing segment, and therefore should not be reused, as they caused trouble later in the restore process.
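To make the reuse rule concrete, here is a simplified, hedged sketch of the decision described above. Only the three buckets (identical / different / missing) mirror Store.RecoveryDiff; the RestorePlan class itself is purely illustrative and not Elasticsearch code.

import java.util.ArrayList;
import java.util.List;

final class RestorePlan {
    final List<String> reuseLocalCopy = new ArrayList<>();     // nothing to download
    final List<String> fetchFromSnapshot = new ArrayList<>();  // must be (re)downloaded

    static RestorePlan of(List<String> identical, List<String> different, List<String> missing) {
        RestorePlan plan = new RestorePlan();
        plan.reuseLocalCopy.addAll(identical);      // same name, length and checksum as in the snapshot
        plan.fetchFromSnapshot.addAll(different);   // same name, different content: local copy is deleted and re-fetched
        plan.fetchFromSnapshot.addAll(missing);     // not present locally at all
        return plan;
    }
}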

@ywelsch (Contributor) commented Apr 14, 2020

Probably #55142

@tlrx (Member Author) commented Apr 14, 2020

Yep, I didn't see that @fhalde had already opened #55013 and also asked questions here. #55142 looks like the culprit.

@fhalde commented Apr 14, 2020

Thanks @tlrx @ywelsch, will track this.
