
Workaround for updates delivered in first WFT #1026

Merged (2 commits) on Feb 1, 2023
Conversation

@mmcshane (Contributor) commented on Feb 1, 2023

What was changed

Adds a conditional yield point during update processing to allow the scheduler to do other things like registering update handlers.

Why?

If the first WFT includes an update we can have a problem because the workflow function itself hasn't run yet to register update handlers and the update will be rejected. Adding this yield allows the scheduler to execute the workflow function up to the first blocking point, at which point the handler(s) will be registered by user code.

Checklist

  1. Closes

  2. How was this tested:

Unit test here, and I was able to remove all of the time.Sleep(1*time.Second) calls from the features tests.

  3. Any docs updates needed?

No

@mmcshane mmcshane requested a review from a team as a code owner February 1, 2023 15:20
@mmcshane (Contributor, Author) commented on Feb 1, 2023

I don't love this - it's a little brittle and dependent on the current scheduler design - but I haven't been able to figure out a better/cleaner way to achieve the same thing. Would love to hear some other suggestions.

internal/internal_update.go (review comment, outdated; resolved)
// delivered before the workflow function itself has run and had a
// chance to register update handlers) then we yield control back to the
// scheduler to allow handler registration to occur. The scheduler will
// resume this coroutine after others have run to a blocking point.
@cretz (Member) commented on Feb 1, 2023
Hrmm. I am racking my brain trying to think of a downside to this. We will have to discuss what update-with-start looks like I think (just whether the update event coming before the workflow start event is even an issue).

I can't think of any downsides to this off the top of my head. Even if there are multiple updates before workflow start I think a yield from each is safe. And I don't think yielding in every case of no handlers no matter how far into the workflow we are is harmful either (just pushing this to the end of the loop). Is there any chance this yield could not get completed before next WFT? I know there's some "run until all yielded" code/logic before commands are sent as a WFT completion, so I want to confirm that this "yield" is not considered waiting on Temporal like the other yields.

I would like to request an update-before-worker-start integration test in the features repo (though all other languages offer pre-start handler registration, unlike Go).

@mmcshane (Contributor, Author) replied on Feb 1, 2023

The downside for me is that this aspect of update delivery (admittedly an edge-case) becomes dependent on the precise mechanism of how workflow execution happens and it does so in a somewhat invisible way. In particular there is an expectation encoded here that during startup the root coro always gets scheduler slot zero and that it yields exactly once before running the user-supplied workflow code.

As for the update-before-start integration test ... that's actually all of them! Prior to this PR, I always had to time.Sleep(1*time.Second) between launching the workflow and submitting the first update. Without that sleep, the first update sees an error returned: "no registered handler for 'foo'". With this change, that sleep can be removed.

@mmcshane (Contributor, Author) replied on Feb 1, 2023

Missed one of your questions...

Is there any chance this yield could not get completed before next WFT? I know there's some "run until all yielded" code/logic before commands are sent as a WFT completion, so I want to confirm that this "yield" is not considered waiting on Temporal like the other yields.

What I found is that yield is lighter than the blocking that you might otherwise see (e.g. with workflow.Sleep(ctx, 1)) in that the yielded coroutine is not considered to have blocked and thus stays in a runnable state within the current call to ExecuteUntilAllBlocked on the coro dispatcher.

@cretz (Member) replied:

The downside for me is that this aspect of update delivery (admittedly an edge-case) becomes dependent on the precise mechanism of how workflow execution happens and it does so in a somewhat invisible way.

This is the accepted cost of Go SDK's inability to predefine anything about a workflow before starting it. No worries (much cleaner in other langs)

is not considered to have blocked [...] within the current call to ExecuteUntilAllBlocked

Perfect, thanks

Co-authored-by: Chad Retz <chad.retz@gmail.com>

@mmcshane mmcshane merged commit f037c9d into temporalio:master Feb 1, 2023
@mmcshane mmcshane deleted the mpm/update-in-first-wft branch February 1, 2023 16:18
// chance to register update handlers) then we yield control back to the
// scheduler to allow handler registration to occur. The scheduler will
// resume this coroutine after others have run to a blocking point.
if len(eo.updateHandlers) == 0 {
A reviewer (Member) commented:

I have a few concerns here:

  1. What are the delivery ordering semantics after adding this yield?
    What if we have a signal-update-signal sequence?
    With this solution, will the update necessarily be delivered after the signals?

  2. What about the future async update guarantees?
    Are we okay rejecting any updates if a handler is not registered by the end of a workflow task? This is different from signals, which are unidirectional and are okay to buffer.

  3. What if the user wants to run a local activity before registering a dynamic update handler?
    Does using this one-time yield mechanism prevent that case?

I'm leaning towards having a solution where we buffer updates and (at least with the fast path) we reject them at the end of workflow task processing.
