Interpret `?timeout=-1` as infinite ack timeout #107675

DaveCTurner · 2024-04-22T09:05:25Z

APIs which perform cluster state updates typically accept the
?master_timeout= and ?timeout= parameters to respectively set the
pending task queue timeout and the acking timeout for the cluster state
update. Both of these parameters accept the value -1, but
?master_timeout=-1 means to wait indefinitely whereas ?timeout=-1
means the same thing as ?timeout=0, namely that acking times out
immediately on commit.

There are some situations where it makes sense to wait for as long as
possible for nodes to ack a cluster state update. In practice this wait
is bounded by other mechanisms (e.g. the lag detector will remove the
node from the cluster after a couple of minutes of failing to apply
cluster state updates) but these are not really the concern of clients.

Therefore with this commit we change the meaning of ?timeout=-1 to
mean that the acking timeout is infinite.
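
As a rough illustration of the before/after semantics described above, here is a minimal sketch. The class `AckTimeoutSemantics`, its enum, and its method names are hypothetical and exist only for this example; it assumes the timeout has already been parsed into milliseconds and that all negative values behave like `-1`.

```java
/** Hypothetical helper, not Elasticsearch code: classifies an ack timeout given in milliseconds. */
final class AckTimeoutSemantics {
    enum AckWait { INFINITE, IMMEDIATE, BOUNDED }

    /** Models the behaviour before this change: -1 was treated the same as 0. */
    static AckWait before(long timeoutMillis) {
        return timeoutMillis <= 0 ? AckWait.IMMEDIATE : AckWait.BOUNDED;
    }

    /** Models the behaviour after this change: a negative value means "never time out". */
    static AckWait after(long timeoutMillis) {
        if (timeoutMillis < 0) {
            return AckWait.INFINITE;
        }
        return timeoutMillis == 0 ? AckWait.IMMEDIATE : AckWait.BOUNDED;
    }
}
```

Only the handling of negative values differs between the two methods; `?timeout=0` still times out acking immediately on commit.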

github-actions · 2024-04-22T09:05:36Z

Documentation preview:

✨ Changed pages

elasticsearchmachine · 2024-04-22T09:05:48Z

Pinging @elastic/es-distributed (Team:Distributed)

elasticsearchmachine · 2024-04-22T09:05:49Z

Hi @DaveCTurner, I've created a changelog YAML for you. Note that since this PR is labelled >breaking, you need to update the changelog YAML to fill out the extended information sections.

DaveCTurner · 2024-04-26T09:45:16Z

We've reached consensus that this is an acceptable change to make so this is good to review now.

ywangd

LGTM

Sorry for the delayed review. It is always a great learning opportunity to see changes around master coordination. Thanks!

ywangd · 2024-04-30T11:59:56Z

server/src/main/java/org/elasticsearch/cluster/service/MasterService.java

+            if (ackTimeout.millis() < 0) {
+                if (countDown.countDown()) {
+                    finish();
+                }
+                return;


Not really related to this PR but for my learning. I am trying to understand the sequence of how cluster state happens. Is the following order correct?

1. Master sends cluster state publish requests to all nodes.
2. After receiving publish responses from a quorum of nodes, this onCommit is called.
3. Master sends apply commit requests to all nodes.
4. On each apply commit response, master calls onNodeAck.
5. When step 2 completes and all nodes responded in step 4, the overall request is considered as acknowledged.

Reading the code here, it seems to me that when onCommit is called, there is a chance that step 4 has already completed (since it checks the countDown and calls finish). But I am not sure how that can happen, since onCommit is called before any apply commit request can be sent. Or is it to take care of a single-node cluster? I must be missing something (or even many things). I'd appreciate it if you could help clarify. Thanks!
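
The acking sequence asked about above can be sketched roughly as follows. This is a simplified model, not the actual `MasterService` listener: the class `AckTrackerSketch`, its fields, and the `timerArmed` flag (standing in for a scheduled timeout callback) are all hypothetical. It assumes one count per node ack plus one for the commit, and it mirrors the negative-timeout branch from the diff above by counting down without arming any timer.

```java
/** Hypothetical, simplified ack tracker; not the actual MasterService listener. */
final class AckTrackerSketch {
    private final long ackTimeoutMillis;
    private int remaining;        // one count per node ack, plus one for the commit
    private boolean acknowledged;
    private boolean timedOut;
    private boolean timerArmed;   // stands in for a scheduled timeout callback

    AckTrackerSketch(int nodeCount, long ackTimeoutMillis) {
        this.remaining = nodeCount + 1;
        this.ackTimeoutMillis = ackTimeoutMillis;
    }

    /** Step 2: a quorum of nodes has accepted the publication. */
    synchronized void onCommit(long commitTimeMillis) {
        if (ackTimeoutMillis < 0) {
            countDown();          // infinite timeout: never arm a timer
            return;
        }
        long timeLeft = Math.max(0, ackTimeoutMillis - commitTimeMillis);
        if (timeLeft == 0) {
            onTimeout();          // e.g. a zero timeout: give up on acks at commit
        } else if (countDown() == false) {
            timerArmed = true;    // still waiting for node acks
        }
    }

    /** Step 4: one node has responded to the apply commit request. */
    synchronized void onNodeAck() {
        countDown();
    }

    synchronized void onTimeout() {
        if (acknowledged == false) {
            timedOut = true;
            timerArmed = false;
        }
    }

    /** Returns true only for the call that completes the acking. */
    private boolean countDown() {
        if (acknowledged || timedOut) {
            return false;
        }
        if (--remaining == 0) {
            acknowledged = true;
            timerArmed = false;
            return true;
        }
        return false;
    }

    synchronized boolean isAcknowledged() { return acknowledged; }
    synchronized boolean isTimedOut() { return timedOut; }
    synchronized boolean isTimerArmed() { return timerArmed; }
}
```

In this model the commit is just one more count, so the update is acknowledged whichever of `onCommit` and the last `onNodeAck` arrives last, without relying on the order in which they are called.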

The sequence is right.

You could be right about onCommit never actually finishing the acking today. FWIW this code was added in #31303 (6.4.0) and we've rewritten much of the surrounding code since then. That said, it's a fairly delicate argument to prove this, whereas the "obviously correct" code as written today is robust and not meaningfully less efficient.

In particular, it's hard to see that onCommit is called before any ApplyCommit is sent. The relevant code is org.elasticsearch.cluster.coordination.Publication.PublicationTarget#handlePublishResponse:

    void handlePublishResponse(PublishResponse publishResponse) {
        assert isWaitingForQuorum() : this;
        logger.trace("handlePublishResponse: handling [{}] from [{}])", publishResponse, discoveryNode);
        if (applyCommitRequest.isPresent()) {
            sendApplyCommit();
        } else {
            try {
                Publication.this.handlePublishResponse(discoveryNode, publishResponse).ifPresent(applyCommit -> {
                    assert applyCommitRequest.isPresent() == false;
                    applyCommitRequest = Optional.of(applyCommit);
                    ackListener.onCommit(TimeValue.timeValueMillis(currentTimeSupplier.getAsLong() - startTime));
                    publicationTargets.stream()
                        .filter(PublicationTarget::isWaitingForQuorum)
                        .forEach(PublicationTarget::sendApplyCommit);
                });
            } catch (Exception e) {
                setFailed(e);
                onPossibleCommitFailure();
            }
        }
    }

As written, you could have the committing thread setting applyCommitRequest, then pausing before calling ackListener.onCommit, while another thread concurrently processes a later PublishResponse, discovers that applyCommitRequest.isPresent() and sends the ApplyCommit. But then if you look at how this code is called eventually you discover that this all happens underneath Coordinator#mutex so these things cannot happen concurrently. But relying on a mutex in Coordinator to protect against concurrency in Publication as part of the correctness argument for MasterService is too deeply opaque for my liking.
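
To make the ordering argument concrete, here is a small sketch of the same shape with the lock made explicit. Everything in it is hypothetical (`MutexGuardedPublication`, the string events, the quorum counting); the `mutex` field merely stands in for `Coordinator#mutex`. The point it illustrates is that, while the lock is held, no other thread can observe the commit request as present before `onCommit` has been recorded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical model of a publication whose response handling runs under one mutex. */
final class MutexGuardedPublication {
    private final Object mutex = new Object();   // stands in for Coordinator#mutex
    private final int quorum;
    private final List<String> responded = new ArrayList<>();
    private final List<String> events = new ArrayList<>();
    private Optional<String> applyCommitRequest = Optional.empty();

    MutexGuardedPublication(int quorum) {
        this.quorum = quorum;
    }

    void handlePublishResponse(String node) {
        synchronized (mutex) {
            if (applyCommitRequest.isPresent()) {
                // Already committed: this node gets its ApplyCommit straight away.
                events.add("applyCommit:" + node);
                return;
            }
            responded.add(node);
            if (responded.size() >= quorum) {
                applyCommitRequest = Optional.of("commit");
                // Without the mutex, another thread could see the request here
                // and send an ApplyCommit before onCommit is recorded.
                events.add("onCommit");
                for (String n : responded) {
                    events.add("applyCommit:" + n);
                }
            }
        }
    }

    List<String> events() {
        synchronized (mutex) {
            return new ArrayList<>(events);
        }
    }
}
```

Whatever the thread interleaving, the first recorded event is `onCommit`. Removing the `synchronized` block would reintroduce the race described above, which is why the correctness argument depends on the caller's locking.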

Thanks a lot for the explanation. TIL 🙇

ywangd · 2024-04-30T12:15:45Z

docs/reference/rest-api/common-parms.asciidoc

 Can also be set to `-1` to indicate that the request should never timeout.
 end::master-timeout[]


It would be great to also explain how master_timeout is computed. By reading the code, it seems to start when the task is added to the queue and expires if the task does not get processed by the master. And I believe during this waiting, the task is visible via the PendingTasks API?
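
The reading of `master_timeout` given above could be sketched like this. It is a toy model with an explicit clock, not the real pending task queue: `PendingTaskQueueSketch` and all of its members are hypothetical, expiry is checked lazily rather than by a scheduled callback, and the sketch bounds the wait only for positive timeouts, which is an assumption (the thread only establishes that `-1` means never time out).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical model of a pending task queue with a per-task master timeout. */
final class PendingTaskQueueSketch {
    private static final class Pending {
        final String source;
        final long enqueuedAtMillis;
        final long masterTimeoutMillis;

        Pending(String source, long enqueuedAtMillis, long masterTimeoutMillis) {
            this.source = source;
            this.enqueuedAtMillis = enqueuedAtMillis;
            this.masterTimeoutMillis = masterTimeoutMillis;
        }

        // Assumption: only positive timeouts bound the wait; -1 never expires.
        boolean expiredAt(long nowMillis) {
            return masterTimeoutMillis > 0 && nowMillis - enqueuedAtMillis >= masterTimeoutMillis;
        }
    }

    private final Deque<Pending> queue = new ArrayDeque<>();
    private final List<String> timedOut = new ArrayList<>();

    /** The timeout clock starts when the task is added to the queue. */
    void submit(String source, long masterTimeoutMillis, long nowMillis) {
        queue.addLast(new Pending(source, nowMillis, masterTimeoutMillis));
    }

    /** Tasks still waiting at nowMillis, as a pending-tasks listing might show them. */
    List<String> pending(long nowMillis) {
        expire(nowMillis);
        List<String> sources = new ArrayList<>();
        for (Pending p : queue) {
            sources.add(p.source);
        }
        return sources;
    }

    /** The master takes the next task that has not expired, or null if none is left. */
    String pollNext(long nowMillis) {
        expire(nowMillis);
        Pending next = queue.pollFirst();
        return next == null ? null : next.source;
    }

    List<String> timedOut() {
        return new ArrayList<>(timedOut);
    }

    private void expire(long nowMillis) {
        queue.removeIf(p -> {
            if (p.expiredAt(nowMillis)) {
                timedOut.add(p.source);
                return true;
            }
            return false;
        });
    }
}
```

In this model a task stops being listed as pending either because the master picked it up or because its timeout elapsed while it was still queued.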

elasticsearchmachine · 2024-04-30T12:54:27Z

Hi @DaveCTurner, I've updated the changelog YAML for you. Note that since this PR is labelled >breaking, you need to update the changelog YAML to fill out the extended information sections.

This test doesn't fail anymore; I've run it 1000 times locally. The test was introduced in #107050, and I believe it was fixed in #107675. Unfortunately, the test got muted before #107675 was merged, so I can't confirm that #107675 fixed the test on CI.

DaveCTurner added >enhancement >breaking :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. v8.15.0 labels Apr 22, 2024

elasticsearchmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Apr 22, 2024

Update docs/changelog/107675.yaml

3377636

DaveCTurner added 2 commits April 22, 2024 10:20

Update changelog

7f4aab3

Merge branch 'main' into 2024/04/22/infinite-ack-timeout

33b87f6

DaveCTurner requested a review from ywangd April 26, 2024 09:44

ywangd approved these changes Apr 30, 2024

DaveCTurner added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Apr 30, 2024

Update docs/changelog/107675.yaml

fc0854e

DaveCTurner added 2 commits April 30, 2024 14:01

Merge branch 'main' into 2024/04/22/infinite-ack-timeout

cb88b5d

Fix changelog

3a557e2

DaveCTurner removed the >enhancement label Apr 30, 2024

elasticsearchmachine merged commit fc287bd into elastic:main Apr 30, 2024
15 checks passed

DaveCTurner deleted the 2024/04/22/infinite-ack-timeout branch April 30, 2024 13:54

arteam mentioned this pull request May 2, 2024

Unmute SnapshotStatusApisIT#testInfiniteTimeout #108178

Merged

arteam mentioned this pull request May 3, 2024

[CI] SnapshotStatusApisIT testInfiniteTimeout failing #107405

Closed

DaveCTurner mentioned this pull request May 3, 2024

Add support for infinite ack timeout #107044

Closed

DaveCTurner restored the 2024/04/22/infinite-ack-timeout branch June 17, 2024 06:17
