ReplicationTracker.markAllocationIdAsInSync may hang if allocation is cancelled #30316

bleskes · 2018-05-01T21:59:08Z

At the end of recovery, we mark the recovering shard as "in sync" on the primary. From this point on, the primary treats any replication failure on that shard as critical and reaches out to the master to fail it. Before marking the shard, we wait for its local checkpoint to be above the global checkpoint (in order to maintain the global checkpoint invariant).

If the master decides to cancel the allocation of the recovering shard while we wait, the method can currently hang and never return. It also ignores the interrupts that are triggered when the recovery is cancelled because the primary is closing.
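The wait described above can be sketched roughly as follows. This is a simplified illustration, not the actual `ReplicationTracker` code: the class `SketchTracker` and all of its method names are hypothetical, and it assumes that "caught up" means the local checkpoint has reached the global checkpoint. The point is that the waiting loop must re-check whether the allocation still exists after every wake-up, and must let `InterruptedException` propagate rather than swallow it.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified tracker; names do not mirror the real Elasticsearch API.
final class SketchTracker {
    private final Map<String, Long> localCheckpoints = new HashMap<>();
    private final Set<String> inSync = new HashSet<>();
    private final long globalCheckpoint;

    SketchTracker(long globalCheckpoint) {
        this.globalCheckpoint = globalCheckpoint;
    }

    synchronized void startTracking(String allocationId, long localCheckpoint) {
        localCheckpoints.put(allocationId, localCheckpoint);
    }

    // Blocks until the copy has caught up (returns true) or its allocation
    // has been cancelled (returns false). Interrupts are not swallowed.
    synchronized boolean markInSync(String allocationId) throws InterruptedException {
        while (localCheckpoints.containsKey(allocationId)
                && localCheckpoints.get(allocationId) < globalCheckpoint) {
            wait(); // woken by updateLocalCheckpoint or cancelAllocation
        }
        if (!localCheckpoints.containsKey(allocationId)) {
            return false; // allocation was cancelled while (or before) waiting
        }
        inSync.add(allocationId);
        return true;
    }

    synchronized void updateLocalCheckpoint(String allocationId, long checkpoint) {
        localCheckpoints.computeIfPresent(allocationId, (id, old) -> Math.max(old, checkpoint));
        notifyAll();
    }

    // Cancelling must wake up any waiter, otherwise markInSync hangs forever.
    synchronized void cancelAllocation(String allocationId) {
        localCheckpoints.remove(allocationId);
        inSync.remove(allocationId);
        notifyAll();
    }

    synchronized boolean isInSync(String allocationId) {
        return inSync.contains(allocationId);
    }
}
```

In this sketch, omitting the `notifyAll()` in `cancelAllocation` or the `containsKey` check in the loop would reproduce the kind of hang the description mentions.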

Note that this is crucial because this method is called while holding a primary permit. Since the method never returns, the permit is never released. The unreleased permit then blocks any primary relocation and, while the primary is trying to relocate, all indexing is blocked for 30m as the relocation waits to acquire the missing permit.
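To illustrate why a permit that is never released is so damaging, here is a minimal sketch using a plain `java.util.concurrent.Semaphore`. It is only an analogy under stated assumptions: the class `PermitSketch`, the permit count, and the method names are hypothetical and are not the primary permit implementation in Elasticsearch. Operations take one permit each, while a relocation-style operation needs all of them, so a single leaked permit makes it wait until its timeout.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for operation permits; not the real Elasticsearch classes.
final class PermitSketch {
    static final int TOTAL_PERMITS = 4; // arbitrary number chosen for the sketch

    final Semaphore permits = new Semaphore(TOTAL_PERMITS);

    interface BlockingStep {
        void run() throws InterruptedException;
    }

    // Runs a possibly blocking step under one permit. The finally block
    // guarantees the permit is returned even if the step throws or is interrupted.
    void runUnderPermit(BlockingStep step) throws InterruptedException {
        permits.acquire();
        try {
            step.run();
        } finally {
            permits.release();
        }
    }

    // A relocation-like operation that needs every permit. It fails after the
    // timeout if any single permit is still held (or was leaked).
    boolean tryBlockAllOperations(long timeout, TimeUnit unit) throws InterruptedException {
        return permits.tryAcquire(TOTAL_PERMITS, timeout, unit);
    }
}
```

The `finally` block only helps if the step eventually returns or throws; a step that blocks forever, as described above, still holds its permit, which is why the hang itself had to be fixed.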

jasontedor

I left an annoying nit but you made the typo. 😇

LGTM.

jasontedor · 2018-05-01T22:02:30Z

server/src/main/java/org/elasticsearch/index/seqno/ReplicationTracker.java

@@ -339,6 +339,11 @@ private boolean invariant() {
                "shard copy " + entry.getKey() + " is in-sync but not tracked";
        }

+        // all pending in sync shards are tracked
+        for (String aId: pendingInSync) {


Nit: `aId:` -> `aId :`

ywelsch

LGTM. This looks to be the cause for the test failures seen here: #29161 (comment)

ywelsch · 2018-05-02T07:19:22Z

As it's a blocker, can you label it 6.3.0?

bleskes · 2018-05-02T11:56:56Z

> As it's a blocker, can you label it 6.3.0?

I agree it's confusing. I was following the standard labelling of 6.3.1 + relabel on respin (once we know it ships). I'll change the label.

bleskes · 2018-05-02T12:56:05Z

run gradle build tests

…30318) The code in `SourceRecoveryHandler` runs under a `CancellableThreads` instance in order to allow long-running operations to be interrupted when the recovery is cancelled. Sadly, if this happens at just the wrong moment while acquiring a permit from the primary, that permit can be leaked and never be freed. Note that this is slightly better than it sounds: we only cancel recoveries on the source side if the primary shard itself is closed. Relates to #30316
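The leak described in this commit message can be sketched as follows. This is an assumption-laden illustration, not the code from that change: it models the asynchronously granted permit as a `CompletableFuture<Runnable>` whose value is a "release" callback, and the class and method names are hypothetical. If the waiting thread is interrupted before the permit arrives, nobody would be left to release it, so the sketch registers a callback that releases the permit whenever it is eventually delivered.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

// Hypothetical sketch; the real recovery code uses Elasticsearch-specific types.
final class CancellablePermitSketch {

    // Waits for an asynchronously granted permit (represented by its release
    // callback), runs the step, and releases the permit afterwards.
    static void runUnderAsyncPermit(CompletableFuture<Runnable> permitFuture, Runnable step)
            throws InterruptedException, ExecutionException {
        final Runnable release;
        try {
            release = permitFuture.get();
        } catch (InterruptedException e) {
            // Cancelled while waiting: the permit may still be granted later.
            // Make sure it is released on arrival instead of being leaked.
            permitFuture.thenAccept(Runnable::run);
            throw e;
        }
        try {
            step.run();
        } finally {
            release.run();
        }
    }
}
```

Without the `thenAccept` line, a permit granted after the interrupt would have no owner and would never be released, which matches the failure mode the commit message describes.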

bleskes · 2018-05-02T17:40:44Z

Thanks @jasontedor @ywelsch

fix inSync on cancelled allocation

503e932

bleskes added the >bug, blocker, :Distributed/Recovery, v7.0.0, v6.4.0 and v6.3.1 labels May 1, 2018

bleskes requested review from ywelsch and jasontedor May 1, 2018 21:59

jasontedor approved these changes May 1, 2018

bleskes mentioned this pull request May 1, 2018

Cancelling a peer recovery on the source can leak a primary permit #30318

Merged

ywelsch approved these changes May 2, 2018

spaces

cef12b1

bleskes mentioned this pull request May 2, 2018

[CI] RelocationIT testIndexAndRelocateConcurrently fails #29161

Closed

bleskes added v6.3.0 and removed v6.3.1 labels May 2, 2018

bleskes merged commit 1391716 into elastic:master May 2, 2018

bleskes deleted the recover_mark_in_sync branch May 2, 2018 17:40

jimczi added the v7.0.0-beta1 label Feb 7, 2019

jimczi removed the v7.0.0 label Feb 7, 2019
