Enable Fully Concurrent Snapshot Operations #56911
Conversation
Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)
Implements fully concurrent snapshot operations. See documentation changes to snapshot package level JavaDoc for details.
Force-pushed from 923aebd to 346f306
 * in the thread pool (for example, tests that use the mock repository that
 * block on master).
 */
public class MinThreadsSnapshotRestoreIT extends AbstractSnapshotIntegTestCase {
All scenarios covered by this test become obsolete. The actual premise of this test (checking that we don't deadlock from blocked threads) is covered by the fact that SnapshotResiliencyTests work, for the most part, anyway.
server/src/main/java/org/elasticsearch/snapshots/SnapshotShardsService.java (outdated; resolved)
LGTM. Thanks for tackling this challenge, and good luck with the backport.
LGTM. This is great work, thanks Armin! I carefully reviewed most parts, though I must admit that I only lightly reviewed the "repository loop" part.
I apologize for the time it took me to review this PR. The reviewing experience was not great for me due to the amount of code changes; I think using a dedicated branch would have made sense here.
private static String startDataNodeWithLargeSnapshotPool() {
    return internalCluster().startDataOnlyNode(LARGE_SNAPSHOT_POOL_SETTINGS);
}

public void testSnapshotRunsAfterInProgressDelete() throws Exception {
nit: add an extra blank line
assertThat(secondSnapshotResponse.isDone(), is(false));

unblockNode(repoName, dataNode);
assertThat(firstSnapshotResponse.get().getSnapshotInfo().state(), is(SnapshotState.FAILED));
Can we check that the 1st snapshot failed because it was aborted?
ensureStableCluster(3);

awaitNoMoreRunningOperations();
expectThrows(RepositoryException.class, deleteFuture::actionGet);
Is there a meaningful error message we could check here?
Not really, it's just "failed to update repository". It's all in the cause here, but that's also just a JSON parse failure.
this(in.readList(Entry::new));
}

private static boolean assertConsistency(List<Entry> entries) {
nit: assertConsistency -> assertNoConcurrentDeletionsForSameRepository() ?
try {
    assert assertConsistentEntries(entries);
} catch (AssertionError e) {
    throw e;
I'm not sure I understand why we catch and rethrow here.
So I could put a debug breakpoint there :D Thanks for spotting!
private final OngoingRepositoryOperations repositoryOperations = new OngoingRepositoryOperations();

/**
 * Setting that specifies the maximum number of allow concurrent snapshot create and delete operations in the
allow -> allowed
  currentState.custom(SnapshotDeletionsInProgress.TYPE, SnapshotDeletionsInProgress.EMPTY);
- if (deletionsInProgress.hasDeletionsInProgress()) {
+ if (deletionsInProgress.hasDeletionsInProgress() && concurrentOperationsAllowed == false) {
      throw new ConcurrentSnapshotExecutionException(repositoryName, snapshotName,
          "cannot snapshot while a snapshot deletion is in-progress in [" + deletionsInProgress + "]");
When backporting, we could maybe indicate in the error message that concurrent snapshot/deletions are possible in version 7.9?
👍 Right, I put down a note for that when doing the back-port work.
throw new ConcurrentSnapshotExecutionException(repositoryName, snapshotName, " a snapshot is already running");
}
ensureBelowConcurrencyLimit(repositoryName, snapshotName, snapshots, deletionsInProgress);
What happens if multiple snapshot operations are started but the maxConcurrentOperations setting is updated to a value lower than the current number of concurrent ops? Would it still be possible to enqueue more ops?
> What happens if multiple snapshot operations are started but the maxConcurrentOperations setting is updated to a value lower than the current number of concurrent ops?

Existing ops won't be affected, but you can't start new ones.

> Would it still be possible to enqueue more ops?

No, the number of ops has to come down below the new limit first before we can enqueue more.
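The behavior described in this exchange can be sketched as follows. This is a hypothetical illustration, not the actual SnapshotsService code; the names `maxConcurrentOperations`, `tryStart`, and `updateLimit` are invented for the example. The key property is that lowering the limit never affects operations already running, it only blocks new ones until the running count drops below the new limit.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of a dynamic concurrency limit for snapshot operations.
public class ConcurrencyLimitSketch {
    private int maxConcurrentOperations;
    private final List<String> runningOperations = new ArrayList<>();

    public ConcurrencyLimitSketch(int limit) {
        this.maxConcurrentOperations = limit;
    }

    // Lowering the limit only affects future enqueue attempts; running
    // operations are never cancelled by a settings update.
    public void updateLimit(int newLimit) {
        this.maxConcurrentOperations = newLimit;
    }

    // Returns false (where Elasticsearch would throw a
    // ConcurrentSnapshotExecutionException) if the limit is reached.
    public boolean tryStart(String operation) {
        if (runningOperations.size() >= maxConcurrentOperations) {
            return false;
        }
        runningOperations.add(operation);
        return true;
    }

    public void finish(String operation) {
        runningOperations.remove(operation);
    }

    public int running() {
        return runningOperations.size();
    }
}
```

If the limit is lowered from 2 to 1 while 2 ops run, both keep running, but no new op can start until both have finished and the count is below 1 again.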
Enables fully concurrent snapshot operations:

* Snapshot create and delete operations can be started in any order.
* Delete operations wait for snapshot finalization to finish. They are batched as much as possible to improve efficiency and, once enqueued in the cluster state, prevent new snapshots from starting on data nodes until executed.
* We could be even more concurrent here in a follow-up by interleaving deletes and snapshots on a per-shard level. I decided not to do this for now since it did not seem worth the added complexity yet. Due to batching and deduplication of deletes, the pain of having a delete stuck behind a long-running snapshot seems manageable: dropped client connections and the resulting retries don't cause issues thanks to deduplication of delete jobs, and batching allows enqueuing more and more deletes even if a snapshot blocks for a long time; they will all be executed in essentially constant time (due to bulk snapshot deletion, deleting multiple snapshots is about as fast as deleting a single one).
* Snapshot creation is completely concurrent across shards, but per-shard snapshots are linearized for each repository, as are snapshot finalizations.

See the updated JavaDoc and the added test cases for more details and an illustration of the functionality.

Some notes:

The queuing of snapshot finalizations and deletes and the related locking/synchronization is a little awkward in this version but can be much simplified with some refactoring. The problem is that snapshot finalizations resolve their listeners on the `SNAPSHOT` pool while deletes resolve theirs on the master update thread. With some refactoring, both of these could be moved to the master update thread, effectively removing the need for any synchronization around the `SnapshotService` state. I didn't do this refactoring here because it's a fairly large change that isn't necessary for the functionality, but I plan to do it in a follow-up.

This change allows completely removing the trickery around synchronizing deletes and snapshots from SLM and fully does away with SLM errors from collisions between deletes and snapshots. Snapshotting a single index in parallel with a long-running full backup will execute without having to wait for the long-running backup, as required by the ILM/SLM use case of moving indices to a "snapshot tier". Finalizations are linearized, but ordered according to which snapshot saw all of its shards complete first.
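The finalization ordering described in the PR can be sketched as a small per-repository queue. This is a hypothetical illustration, not the actual SnapshotsService implementation; the class and method names are invented. Shard work may run concurrently, but finalizations run one at a time, in the order in which snapshots saw all of their shards complete, so a quick single-index snapshot can finalize before a long-running full backup even if the backup started first.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Illustrative sketch: finalizations are linearized per repository and
// ordered by which snapshot completed all of its shards first.
public class FinalizationQueueSketch {
    private final Deque<String> pending = new ArrayDeque<>();
    private final List<String> startedOrder = new ArrayList<>();
    private String currentlyFinalizing = null;

    // Called when a snapshot has seen all of its shards complete.
    public synchronized void onAllShardsCompleted(String snapshot) {
        pending.addLast(snapshot);
        maybeStartNext();
    }

    // Called when the currently running finalization has been written
    // to the repository, allowing the next one to start.
    public synchronized void finalizationFinished() {
        currentlyFinalizing = null;
        maybeStartNext();
    }

    private void maybeStartNext() {
        if (currentlyFinalizing == null && pending.isEmpty() == false) {
            currentlyFinalizing = pending.pollFirst();
            startedOrder.add(currentlyFinalizing);
        }
    }

    public synchronized String current() {
        return currentlyFinalizing;
    }

    public synchronized List<String> startedOrder() {
        return new ArrayList<>(startedOrder);
    }
}
```

A snapshot of a small index that completes its shards before a long-running full backup enters the queue first and therefore finalizes first, while the backup's finalization waits its turn.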
There were two subtle bugs here from backporting #56911 to 7.x:

1. We passed `null` for the `shards` map, which is no longer nullable when creating `SnapshotsInProgress.Entry`. Fixed by just passing an empty map, like the `null` handling did in the past.
2. The removal of a failed `INIT`-state snapshot from the cluster state tried removing it from the finalization loop (the set of repository names that are currently finalizing). This trips an assertion since the snapshot failed before its repository was put into the set. I made the logic ignore the set when removing a failed `INIT`-state snapshot, restoring the old logic to exactly what it was before the concurrent snapshots backport, to be on the safe side here.

Also added tests that explicitly exercise the old code paths because, as can be seen from initially missing this, the BwC tests only run in the new-version-master/old-version-nodes configuration every so often, and having a deterministic test for the old state machine seems the safest bet here.

Closes #59986
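The fix for bug 1 amounts to normalizing a `null` shards map to an empty map before constructing the entry, matching what the old `null` handling did. The sketch below uses a hypothetical helper name; it is not the actual `SnapshotsInProgress.Entry` code.

```java
import java.util.Map;

// Illustrative helper: callers that previously passed a null shards map
// get an empty immutable map instead, so the non-nullable constructor
// sees the same effective value the old null handling produced.
public class ShardsMapFix {
    public static <K, V> Map<K, V> normalizeShards(Map<K, V> shards) {
        return shards == null ? Map.of() : shards;
    }
}
```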