Replicate write actions before fsyncing them #49746

ywelsch · 2019-12-02T09:07:28Z

This PR fixes a number of issues with data replication:

- Local and global checkpoints are not updated after the new operations have been fsynced, but might capture a state from before the fsync. This probably went undetected for so long because AsyncIOProcessor is synchronous if you index one item at a time, and hence works as intended unless there is a high enough level of concurrent indexing. As we rely in other places on the assumption that the local checkpoint is up-to-date under synchronous translog durability, there is a risk that the local and global checkpoints are not up-to-date after replication completes, and that this is not corrected by the periodic global checkpoint sync.
- AsyncIOProcessor also has another "bad" side effect here: if you index one bulk at a time, the bulk is always fsynced on the primary before being sent to the replica. Further, if one thread is tasked by AsyncIOProcessor to drain the processing queue and fsync, other threads can easily pile more bulk requests on top of that thread. This is not very fair: the thread might do many more fsyncs before returning (as the other threads pile more and more on top), which keeps it from making progress on its own replication request (e.g. if this thread is on the primary, it blocks the replication requests to the replicas from going out and delays checkpoint advancement).
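The piling-on effect described in the second point can be illustrated with a small model. This is only a sketch under stated assumptions: `DrainingProcessorSketch` and its members are hypothetical names and do not reproduce the real AsyncIOProcessor code. The `fsync` callback in `demo()` enqueues extra work while a drain is in progress, which stands in for other threads submitting requests concurrently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical, simplified model of a batching fsync processor; not the Elasticsearch class. */
class DrainingProcessorSketch {
    private final ConcurrentLinkedQueue<Runnable> queue = new ConcurrentLinkedQueue<>();
    private final Semaphore drainPermit = new Semaphore(1);
    private final Runnable fsync;
    final AtomicInteger fsyncCount = new AtomicInteger();

    DrainingProcessorSketch(Runnable fsync) {
        this.fsync = fsync;
    }

    /** Enqueues a completion callback; the caller may end up draining for everyone. */
    void put(Runnable onSynced) {
        queue.add(onSynced);
        // Only one caller drains at a time; callers that lose the race return immediately.
        while (!queue.isEmpty() && drainPermit.tryAcquire()) {
            try {
                List<Runnable> batch = new ArrayList<>();
                for (Runnable r; (r = queue.poll()) != null; ) {
                    batch.add(r);
                }
                fsync.run(); // one fsync covers the whole batch
                fsyncCount.incrementAndGet();
                batch.forEach(Runnable::run);
            } finally {
                drainPermit.release();
            }
        }
    }

    /** Returns how many fsyncs the first caller performed before put() returned. */
    static int demo() {
        DrainingProcessorSketch[] holder = new DrainingProcessorSketch[1];
        int[] extra = {3}; // three more requests arrive while fsyncs are in flight
        holder[0] = new DrainingProcessorSketch(() -> {
            if (extra[0]-- > 0) {
                holder[0].put(() -> {}); // simulates another thread piling on
            }
        });
        holder[0].put(() -> {});
        return holder[0].fsyncCount.get();
    }

    public static void main(String[] args) {
        System.out.println("fsyncs done by the first caller: " + demo());
    }
}
```

In this model the first caller submits a single request but stays inside `put` until the queue is finally empty, so its own follow-up work (such as sending the operation to replicas) waits behind everyone else's fsyncs.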

This PR fixes all these issues, and also simplifies the code that coordinates all the after write actions.

Currently this is rarely an issue, as the AsyncIOProcessor runs synchronously as long as there is only one request at a time. This means that with non-concurrent indexing, we always fsync on the primary before we do so on the replica.
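For contrast, the ordering this PR aims for (replicate first, then fsync on the primary, and read the checkpoint only after the fsync) could be sketched roughly as follows. All class, method, and field names here are hypothetical and chosen for illustration; the replication and fsync steps complete synchronously so the event order is deterministic, which the real code does not guarantee.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

/** Hypothetical model of the write path on a primary; all names are illustrative. */
class ReplicateBeforeFsyncSketch {
    final List<String> events = new ArrayList<>();
    long processedCheckpoint = -1; // highest sequence number applied in memory
    long persistedCheckpoint = -1; // highest sequence number covered by an fsync

    /** Applies the operation, replicates it, then fsyncs; responds once both are done. */
    CompletableFuture<Long> write(long seqNo) {
        processedCheckpoint = seqNo;
        events.add("applied on primary");
        CompletableFuture<Void> replicated = sendToReplicas(seqNo);
        CompletableFuture<Void> fsynced = fsyncPrimary();
        // The checkpoint is read only after the fsync has completed.
        return replicated.thenCombine(fsynced, (r, f) -> {
            events.add("respond");
            return persistedCheckpoint;
        });
    }

    private CompletableFuture<Void> sendToReplicas(long seqNo) {
        events.add("sent to replicas");
        return CompletableFuture.completedFuture(null);
    }

    private CompletableFuture<Void> fsyncPrimary() {
        persistedCheckpoint = processedCheckpoint;
        events.add("fsynced on primary");
        return CompletableFuture.completedFuture(null);
    }

    public static void main(String[] args) {
        ReplicateBeforeFsyncSketch primary = new ReplicateBeforeFsyncSketch();
        long checkpoint = primary.write(7).join();
        System.out.println(primary.events + " -> checkpoint " + checkpoint);
    }
}
```

The point of the sketch is only the ordering: the replication request leaves before the primary's fsync starts, and the checkpoint returned to the caller reflects the state after that fsync.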


elasticmachine · 2019-12-02T09:07:30Z

Pinging @elastic/es-distributed (:Distributed/CRUD)

original-brownbear · 2019-12-02T10:01:48Z

@ywelsch just a heads-up, this still needs a fix to make org.elasticsearch.xpack.ccr.action.ShardFollowTaskReplicationTests.CcrAction#performOnPrimary compile:

> Task :x-pack:plugin:ccr:compileTestJava FAILED
/home/brownbear/src/elasticsearch/x-pack/plugin/ccr/src/test/java/org/elasticsearch/xpack/ccr/action/ShardFollowTaskReplicationTests.java:682: error: cannot find symbol
                        ccrResult.respond(listener);
                                 ^
  symbol:   method respond(ActionListener<BulkShardOperationsResponse>)
  location: variable ccrResult of type WritePrimaryResult<BulkShardOperationsRequest,BulkShardOperationsResponse>
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
1 error

original-brownbear · 2019-12-02T13:21:19Z

Jenkins run elasticsearch-ci/1 (the test failure is an unrelated GCS mock repo issue that I'm investigating right now)

henningandersen

Thanks for looking into this @ywelsch. I did an initial read-through and have one main comment. AFAICS, we now reach out to replicas before, or concurrently with, fsync'ing the primary. I wonder if this is safe in case the primary dies after sending replication messages but before fsync'ing. In that case, I am not sure who will mark the primary as stale/not in-sync?

Also, in the case where the primary cannot talk to any replicas and at the same time the primary's own fsync fails, we might mark all copies stale (except for the todo noted in the code now).

server/src/main/java/org/elasticsearch/action/support/replication/ReplicationOperation.java

...r/src/main/java/org/elasticsearch/action/support/replication/TransportReplicationAction.java

henningandersen · 2019-12-02T15:02:13Z

We discussed my concerns on another channel. The first one is not valid, since the global checkpoint will only advance when all in-sync copies have advanced their fsync'ed LCP.
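A minimal sketch of that argument, assuming the global checkpoint can be modeled as the minimum of the fsync'ed local checkpoints of all in-sync copies. The class and method names are hypothetical, and the actual tracking logic in Elasticsearch is not reproduced here.

```java
import java.util.Map;

/** Hypothetical model: the global checkpoint as the minimum persisted local checkpoint. */
class GlobalCheckpointSketch {

    /** Takes the fsync'ed local checkpoint of each in-sync copy, keyed by an illustrative copy name. */
    static long globalCheckpoint(Map<String, Long> persistedLocalCheckpoints) {
        return persistedLocalCheckpoints.values().stream()
            .mapToLong(Long::longValue)
            .min()
            .orElseThrow(() -> new IllegalArgumentException("no in-sync copies"));
    }

    public static void main(String[] args) {
        // The primary has replicated up to 10 but only fsynced up to 7.
        Map<String, Long> copies = Map.of("primary", 7L, "replica-1", 10L, "replica-2", 10L);
        System.out.println("global checkpoint: " + globalCheckpoint(copies));
    }
}
```

Under this model, operations that the primary has sent out but not yet fsynced cannot move the global checkpoint, even if every replica has already persisted them.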

The second one is really an existing issue that should be addressed separately from this PR.

original-brownbear

Code looks good :) But I don't understand the implications of this well enough to LGTM I'm afraid :(

...in/ccr/src/test/java/org/elasticsearch/xpack/ccr/action/ShardFollowTaskReplicationTests.java

...rc/main/java/org/elasticsearch/xpack/ccr/action/bulk/TransportBulkShardOperationsAction.java

...in/ccr/src/test/java/org/elasticsearch/xpack/ccr/action/ShardFollowTaskReplicationTests.java

dnhatn

I like the fix. I left a small comment about the listener. LGTM.

server/src/main/java/org/elasticsearch/action/support/replication/ReplicationOperation.java

...r/src/main/java/org/elasticsearch/action/support/replication/TransportReplicationAction.java

ywelsch · 2019-12-03T08:24:25Z

Thanks for all your comments. I've pushed a5ed86d.


This commit fixes a number of issues with data replication:

- Local and global checkpoints are not updated after the new operations have been fsynced, but might capture a state from before the fsync. This probably went undetected for so long because AsyncIOProcessor is synchronous if you index one item at a time, and hence works as intended unless there is a high enough level of concurrent indexing. As we rely in other places on the assumption that the local checkpoint is up-to-date under synchronous translog durability, there is a risk that the local and global checkpoints are not up-to-date after replication completes, and that this is not corrected by the periodic global checkpoint sync.
- AsyncIOProcessor also has another "bad" side effect here: if you index one bulk at a time, the bulk is always fsynced on the primary before being sent to the replica. Further, if one thread is tasked by AsyncIOProcessor to drain the processing queue and fsync, other threads can easily pile more bulk requests on top of that thread. This is not very fair: the thread might do many more fsyncs before returning (as the other threads pile more and more on top), which keeps it from making progress on its own replication request (e.g. if this thread is on the primary, it blocks the replication requests to the replicas from going out and delays checkpoint advancement).

This commit fixes all these issues, and also simplifies the code that coordinates all the after write actions.

ywelsch added 10 commits November 29, 2019 17:40

Only fsync primary after replicating operation, and update local/global checkpoint after fsync

5e379c9

fsync primary after replication

a0a5ada

simplify listeners

0eac0b2

simplify replica even more

173515c

remove extra callback

a863113

Remove test

5061faa

add assertions

b53f25a

reuse code

1cb2680

Merge remote-tracking branch 'elastic/master' into return-lcp-after-fsync

9790dc9

ywelsch added >bug, :Distributed Indexing/CRUD, v8.0.0, v7.6.0 labels Dec 2, 2019

ywelsch requested review from dnhatn and henningandersen December 2, 2019 09:07

ywelsch added 2 commits December 2, 2019 12:44

fix CCR tests

0207ef5

cs

d995da5

henningandersen reviewed Dec 2, 2019

server/src/main/java/org/elasticsearch/action/support/replication/ReplicationOperation.java (resolved)

...r/src/main/java/org/elasticsearch/action/support/replication/TransportReplicationAction.java (outdated, resolved)

original-brownbear reviewed Dec 2, 2019


dnhatn approved these changes Dec 2, 2019

server/src/main/java/org/elasticsearch/action/support/replication/ReplicationOperation.java (outdated, resolved)

...r/src/main/java/org/elasticsearch/action/support/replication/TransportReplicationAction.java (outdated, resolved)

review comments

a5ed86d

ywelsch requested review from henningandersen and original-brownbear December 3, 2019 08:24

Merge remote-tracking branch 'elastic/master' into return-lcp-after-fsync

28165b2

ywelsch merged commit 8c165e0 into elastic:master Dec 3, 2019

This was referenced Feb 3, 2020

[meta] 7.6 release elastic/elasticsearch-net#4340

Closed

[meta] 7.6 release elastic/elasticsearch-net#4341

Closed

mfussenegger mentioned this pull request Mar 24, 2020

ES Backports crate/crate#9796

Closed

37 tasks

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
