
Workaround for updates delivered in first WFT #1026

Merged (2 commits) on Feb 1, 2023
Conversation

@mmcshane (Contributor) commented on Feb 1, 2023

What was changed

Adds a conditional yield point during update processing to allow the scheduler to do other things like registering update handlers.

Why?

If the first WFT includes an update we can have a problem because the workflow function itself hasn't run yet to register update handlers and the update will be rejected. Adding this yield allows the scheduler to execute the workflow function up to the first blocking point, at which point the handler(s) will be registered by user code.

Checklist

  1. Closes

  2. How was this tested:

Unit test here, and I was able to remove all of the time.Sleep(1*time.Second) calls from the features tests.

  3. Any docs updates needed?

No

@mmcshane mmcshane requested a review from a team as a code owner February 1, 2023 15:20
@mmcshane (Contributor, Author) commented on Feb 1, 2023

I don't love this - it's a little brittle and dependent on the current scheduler design - but I haven't been able to figure out a better/cleaner way to achieve the same thing. Would love to hear some other suggestions.

internal/internal_update.go (review comment, outdated; resolved)
// delivered before the workflow function itself has run and had a
// chance to register update handlers) then we yield control back to the
// scheduler to allow handler registration to occur. The scheduler will
// resume this coroutine after others have run to a blocking point.
@cretz (Member) commented on Feb 1, 2023
Hrmm. I am racking my brain trying to think of a downside to this. We will have to discuss what update-with-start looks like I think (just whether the update event coming before the workflow start event is even an issue).

I can't think of any downsides to this off the top of my head. Even if there are multiple updates before workflow start I think a yield from each is safe. And I don't think yielding in every case of no handlers no matter how far into the workflow we are is harmful either (just pushing this to the end of the loop). Is there any chance this yield could not get completed before next WFT? I know there's some "run until all yielded" code/logic before commands are sent as a WFT completion, so I want to confirm that this "yield" is not considered waiting on Temporal like the other yields.

I would like to request an update-before-worker-start integration test in the features repo (though all other languages offer pre-start handler registration, unlike Go).

@mmcshane (Contributor, Author) replied on Feb 1, 2023

The downside for me is that this aspect of update delivery (admittedly an edge-case) becomes dependent on the precise mechanism of how workflow execution happens and it does so in a somewhat invisible way. In particular there is an expectation encoded here that during startup the root coro always gets scheduler slot zero and that it yields exactly once before running the user-supplied workflow code.

As for the update-before-start integration test ... that's actually all of them! Prior to this PR, I always had to time.Sleep(1*time.Second) between launching the workflow and submitting the first update. Without that sleep, the first update sees an error returned: "no registered handler for 'foo'". With this change, that sleep can be removed.

@mmcshane (Contributor, Author) replied on Feb 1, 2023

Missed one of your questions...

Is there any chance this yield could not get completed before next WFT? I know there's some "run until all yielded" code/logic before commands are sent as a WFT completion, so I want to confirm that this "yield" is not considered waiting on Temporal like the other yields.

What I found is that yield is lighter than the blocking that you might otherwise see (e.g. with workflow.Sleep(ctx, 1)) in that the yielded coroutine is not considered to have blocked and thus stays in a runnable state within the current call to ExecuteUntilAllBlocked on the coro dispatcher.

@cretz (Member) replied:

The downside for me is that this aspect of update delivery (admittedly an edge-case) becomes dependent on the precise mechanism of how workflow execution happens and it does so in a somewhat invisible way.

This is the accepted cost of Go SDK's inability to predefine anything about a workflow before starting it. No worries (much cleaner in other langs)

is not considered to have blocked [...] within the current call to ExecuteUntilAllBlocked

Perfect, thanks

Co-authored-by: Chad Retz <chad.retz@gmail.com>

@mmcshane mmcshane merged commit f037c9d into temporalio:master Feb 1, 2023
@mmcshane mmcshane deleted the mpm/update-in-first-wft branch February 1, 2023 16:18
// chance to register update handlers) then we yield control back to the
// scheduler to allow handler registration to occur. The scheduler will
// resume this coroutine after others have run to a blocking point.
if len(eo.updateHandlers) == 0 {
A reviewer (Member) commented:

I have a few concerns here:

  1. What are the delivery ordering semantics after adding this yield?
    What if we have a signal-update-signal sequence?
    With this solution, will the update necessarily be delivered after the signals?

  2. What about the future async update guarantees?
    Are we okay rejecting any updates if a handler is not registered by the end of a workflow task? This is different from signals, which are unidirectional and are okay to buffer.

  3. What if the user wants to run a local activity before registering a dynamic update handler?
    Does using this one-time yield mechanism prevent that case?

I'm leaning towards having a solution where we buffer updates and (at least with the fast path) we reject them at the end of workflow task processing.
