Delete shard store files before restoring a snapshot #27476
Conversation
I've left some minor questions, but overall the change LGTM.
// list of all existing store files without the identical ones
final Set<String> deleteIfExistFiles = Sets.difference(
    new HashSet<>(Arrays.asList(store.directory().listAll())),
    diff.identical.stream().map(StoreFileMetaData::name).collect(Collectors.toSet())
Is there a need to subtract the identical ones? The `filesToRecover`, which we iterate over, won't contain those anyhow.
That was a safety check I added, but I agree it does not make sense; I'll remove it.
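To make the remaining behaviour concrete, here is a small self-contained sketch of the delete-before-restore idea using plain Lucene `Directory` calls. It is an illustration only, not the PR code: the class, method, and `namesToRestore` parameter are made up, and in the real code the names come from the recovery diff against the snapshot.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class DeleteBeforeRestoreSketch {
    /** Delete any existing store file whose name clashes with a file that is about to be restored. */
    static void deleteClashingFiles(final Path shardStorePath, final Set<String> namesToRestore) throws IOException {
        try (Directory directory = FSDirectory.open(shardStorePath)) {
            final Set<String> existing = new HashSet<>(Arrays.asList(directory.listAll()));
            for (final String name : namesToRestore) {
                if (existing.contains(name)) {
                    // drop the stale copy first so writing the restored file cannot
                    // fail with a FileAlreadyExistsException
                    directory.deleteFile(name);
                }
            }
        }
    }
}
```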
I agree on the identical point. Could you also clean up the trace logging 10 lines above while you are at it? I keep forgetting to do it and it doesn't make any sense now.
Also, can anyone think of a scenario where case-insensitive FS can screw us here somehow?
> Also, can anyone think of a scenario where case-insensitive FS can screw us here somehow?
I don't see any scenario like this...
final Store.MetadataSnapshot sourceMetaData = new Store.MetadataSnapshot(unmodifiableMap(snapshotMetaData), emptyMap(), 0);

final StoreFileMetaData restoredSegmentsFile = sourceMetaData.getSegmentsFile();
if (restoredSegmentsFile == null) {
I see that you fixed the check here (before it read `recoveryTargetMetadata == null`, which did not make any sense as that one was always non-null). I wonder what impact this bug fix has. What change in behavior do we expect from this (it's a bit unclear to me what this check does)?
Sadly there's no test for this, and as you noticed the previous check did not make any sense. I saw it as a bug, and I think that we can't restore a snapshot without a segments file, so I don't expect any impact from this. Maybe @imotov has more knowledge?
I cannot really think of a scenario where a snapshot would have no segments. So, removing the check for recoveryTargetMetadata and replacing it with a check for a snapshot with no segments file shouldn't result in any change in behavior, except in some pathological cases that would fail anyway. Now they will at least fail with a reasonable error message.
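For readers following the discussion, a hedged sketch of what the reworked guard could look like (fragment-style, reusing `sourceMetaData` and `restoredSegmentsFile` from the diff above; the exact exception type and message in the merged change may differ):

```java
// sourceMetaData describes the files contained in the snapshot being restored
final StoreFileMetaData restoredSegmentsFile = sourceMetaData.getSegmentsFile();
if (restoredSegmentsFile == null) {
    // fail fast with a readable message instead of checking the always non-null recoveryTargetMetadata
    throw new IndexShardRestoreFailedException(shardId, "snapshot has no segments file");
}
```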
 *
 * It implements only the methods required by the tests and is not thread safe.
 */
class MemoryBlobStoreRepository extends BlobStoreRepository {
I think we can just use an FsRepository for this. All our other shard-level tests do the same, so no need to optimize this. If we want to change that in the future, I think it's easier to switch to jimfs and continue using FsRepository.
It's that, or we can replace FsRepository with this one, but we need to beef it up.
Let's just use an FsRepository.
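A rough sketch of how the test could wire up an FsRepository instead of the in-memory implementation. The constructor arguments and the `createEnvironment()` helper are assumptions about the 6.x test setup, not the merged code:

```java
// Assumed 6.x-era constructor: FsRepository(RepositoryMetaData, Environment, NamedXContentRegistry)
final Settings repositorySettings = Settings.builder()
    .put("location", createTempDir().toAbsolutePath().toString())  // createTempDir() from ESTestCase
    .build();
final RepositoryMetaData repositoryMetaData = new RepositoryMetaData("test-repo", FsRepository.TYPE, repositorySettings);
final Repository repository = new FsRepository(repositoryMetaData, createEnvironment(), xContentRegistry());
repository.start();
```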
}

/** Recover a shard from a snapshot using a given repository **/
private void recoverShardFromSnapshot(final IndexShard shard,
You can move this method (and the one below it) up to IndexShardTestCase (in test:framework). It could be useful for other people.
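For illustration, one possible shape of the shared helper once hoisted into IndexShardTestCase. The extra parameters and the body are assumptions based on the javadoc above, not the final framework code:

```java
// Sketch only; the parameter list beyond the shard is assumed from the helper's javadoc.
/** Recover a shard from a snapshot using a given repository. */
protected void recoverShardFromSnapshot(final IndexShard shard,
                                        final Snapshot snapshot,
                                        final Repository repository) throws IOException {
    // same steps as the private helper in the test: mark the shard as recovering from
    // a snapshot recovery source, then ask the repository to restore the shard files
}
```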
Agree with @ywelsch on minor comments. Left one cleanup suggestion. Otherwise LGTM.
LGTM. Thanks @tlrx
Force-pushed from 18f07ca to 7e82230
* es/master: (38 commits)
  Backport wait_for_initialiazing_shards to cluster health API
  Carry over version map size to prevent excessive resizing (#27516)
  Fix scroll query with a sort that is a prefix of the index sort (#27498)
  Delete shard store files before restoring a snapshot (#27476)
  Replace `delimited_payload_filter` by `delimited_payload` (#26625)
  CURRENT should not be a -SNAPSHOT version if build.snapshot is false (#27512)
  Fix merging of _meta field (#27352)
  Remove unused method (#27508)
  unmuted test, this has been fixed by #27397
  Consolidate version numbering semantics (#27397)
  Add wait_for_no_initializing_shards to cluster health API (#27489)
  [TEST] use routing partition size based on the max routing shards of the second split
  Adjust CombinedDeletionPolicy for multiple commits (#27456)
  Update composite-aggregation.asciidoc
  Deprecate `levenstein` in favor of `levenshtein` (#27409)
  Automatically prepare indices for splitting (#27451)
  Validate `op_type` for `_create` (#27483)
  Minor ShapeBuilder cleanup
  muted test
  Decouple nio constructs from the tcp transport (#27484)
  ...
* es/6.x: (30 commits)
  Add wait_for_no_initializing_shards to cluster health API (#27489)
  Carry over version map size to prevent excessive resizing (#27516)
  Fix scroll query with a sort that is a prefix of the index sort (#27498)
  Delete shard store files before restoring a snapshot (#27476)
  CURRENT should not be a -SNAPSHOT version if build.snapshot is false (#27512)
  Fix merging of _meta field (#27352)
  test: do not run percolator query builder bwc test against 5.x versions
  Remove unused method (#27508)
  Consolidate version numbering semantics (#27397)
  Adjust CombinedDeletionPolicy for multiple commits (#27456)
  Minor ShapeBuilder cleanup
  [GEO] Deprecate ShapeBuilders and decouple geojson parse logic
  Improve docs for split API in 6.1/6.x (#27504)
  test: use correct pre 6.0.0-alpha1 format
  Update composite-aggregation.asciidoc
  Deprecate `levenstein` in favor of `levenshtein` (#27409)
  Decouple nio constructs from the tcp transport (#27484)
  Bump version from 6.1 to 6.2
  Fix whitespace in Security.java
  Tighten which classes can exit
  ...
Does this mean that if I had a backup cluster on which a snapshot S1 was restored, and later on a snapshot S2 was created and restored, then the restore of S2 will be a full restoration of indices and not just of the diff between S1 and S2? Just want to know if this is expected, because we have been seeing slow primary shard initialization (when we restore on the backup) after we upgraded to 7.2. In 1.7, subsequent restores on a backup cluster (with data) were a very fast process.
And if this is true, then what's the rationale? Why can't segment files which are untouched be reused?
@fhalde the restore process tries to reuse the existing files on disk, when possible. It is always based on a diff between the files on disk and the files contained in the snapshot to be restored. Note that shard files can differ substantially between two consecutive snapshots: it's possible that some segments were merged or that the snapshot was taken from a primary that has since been demoted in favor of a new primary. This pull request concerned the case where old files are present on disk but cannot be linked to an existing segment, and therefore should not be reused, causing trouble later in the restore process.
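A hedged fragment illustrating the three buckets that the diff produces (variable names are made up; this is not the literal restore code path):

```java
// snapshotMetaData: metadata of the files in the snapshot; localMetaData: metadata of the files on disk
final Store.RecoveryDiff diff = snapshotMetaData.recoveryDiff(localMetaData);
for (final StoreFileMetaData file : diff.identical) {
    // same name, same checksum and length: the on-disk copy is reused, nothing is fetched
}
for (final StoreFileMetaData file : diff.different) {
    // same name but different content (e.g. after a merge or a primary change): fetched from the snapshot
}
for (final StoreFileMetaData file : diff.missing) {
    // not present on disk at all: fetched from the snapshot
}
```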
Probably #55142 |
Pull request #20220 added a change where the store files that have the same name but are different from the ones in the snapshot are deleted first, before the snapshot is restored. This logic was based on the `Store.RecoveryDiff.different` set of files, which works by computing a diff between an existing store and a snapshot. This works well when the files on the filesystem form a valid shard store, i.e. there's a `segments` file and the store files are not corrupted. Otherwise, the existing store's snapshot metadata cannot be read (using Store#snapshotStoreMetadata()) and an exception is thrown (CorruptIndexException, IndexFormatTooOldException, etc.) which is caught at the beginning of the restore process (see RestoreContext#restore()) and translated into an empty store metadata (Store.MetadataSnapshot.EMPTY).

This makes the deletion of different files introduced in #20220 useless, as the set of files will always be empty even when store files exist on the filesystem. And if some files are present within the store directory, then restoring a snapshot containing files with the same names will fail with a FileAlreadyExistsException.

This is part of the #26865 issue.

There are various cases where some files could exist in the store directory before a snapshot is restored. One that Igor identified is a restore attempt that failed on a node where only the first files were restored; the shard is then allocated again to the same node and the restore starts again (but fails because of the existing files). Another one is when some files of a closed index are corrupted or deleted and the index is restored.

This commit adds a test that uses the infrastructure provided by IndexShardTestCase in order to test that restoring a shard succeeds even when files with the same names exist on the filesystem.

Related to #26865
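The failure mode described in the commit message boils down to the metadata read and its fallback; a hedged, fragment-style sketch (not the literal RestoreContext code):

```java
// store is the target shard's Store; the leftover files may not form a valid Lucene index
Store.MetadataSnapshot recoveryTargetMetadata;
try {
    recoveryTargetMetadata = store.snapshotStoreMetadata();
} catch (IOException e) {
    // CorruptIndexException, IndexFormatTooOldException, etc. are IOExceptions: the leftover
    // files cannot be read as a store, so the restore proceeds as if the store were empty,
    // which is why the name-based deletion from #20220 never kicked in and the restore could
    // later trip over a FileAlreadyExistsException
    recoveryTargetMetadata = Store.MetadataSnapshot.EMPTY;
}
```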