-
Notifications
You must be signed in to change notification settings - Fork 77
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make update caller receive update failed error on workflow cancellation #653
base: main
Are you sure you want to change the base?
Conversation
1af52e4
to
6b4d5ca
Compare
6b4d5ca
to
c721ec9
Compare
) | ||
self._in_progress_updates[ | ||
job.id | ||
].unfinished_policy = defn.unfinished_policy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are now forced to create the _in_progress_updates
entry before this point (when we create the task), but we update the policy from the defn
here, since defn
is not known until handler coroutine execution time.
task, | ||
temporalio.workflow.HandlerUnfinishedPolicy.WARN_AND_ABANDON, | ||
job.id, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are now storing Task
in the HandlerExecution
dataclass, in order to get hold of unfinished update handler tasks for cancellation.
task.add_done_callback(done_callback) | ||
self._in_progress_signals[id] = HandlerExecution( | ||
job.signal_name, task, defn.unfinished_policy | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't actually do anything with the signal tasks currently, but we add the Task
to the HandlerExecution
for consistency with update.
@@ -1845,6 +1856,14 @@ async def _run_top_level_workflow_function(self, coro: Awaitable[None]) -> None: | |||
err | |||
): | |||
self._add_command().cancel_workflow_execution.SetInParent() | |||
self._warn_if_unfinished_handlers() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order to maintain our rules regarding unfinished handler warnings, we are forced to issue the warnings here in the case of cancellation (for the other workflow termination types, we continue to do it at the end of activate()
)
c721ec9
to
87f540e
Compare
@@ -1845,6 +1856,14 @@ async def _run_top_level_workflow_function(self, coro: Awaitable[None]) -> None: | |||
err | |||
): | |||
self._add_command().cancel_workflow_execution.SetInParent() | |||
self._warn_if_unfinished_handlers() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should warn before sending the cancellation. A user should have the opportunity to receive and react to cancellation without warning.
# We do also warn in the case of workflow cancellation, but this is done | ||
# when handling the workflow cancellation, since we also cancel update | ||
# handlers at that time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO it should be at workflow completion time where we warn always, regardless of reason for completion. This includes cancellation. Users should ensure that their handlers complete even during cancellation, but that doesn't need to be checked until the actual workflow completion is being recorded (primary workflow method could be swallowing cancellation, throwing some other error, etc).
) | ||
|
||
|
||
async def test_update_cancellation(client: Client): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May be worth having a test for doing a cleanup step on cancellation (e.g. in a finally
). Can do this in the update or the workflow. This is common in Temporal code and I fear cleanup code in an update with issue a warning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, good point. So to restate what you're saying:
If a workflow cancellation request comes in, and a handler is unfinished, the workflow author should be able to allow the workflow to be cancelled, and to have it gracefully terminate the handler, and not to be criticized by any warning.
That is true today: we don't cancel signals or updates on workflow cancellation, but a user could write a workflow that does, and they would escape the warning, because the handler would not be alive at check time.
But my PR breaks it, by attempting to make the check at workflow cancellation time: it would classify a handler as unfinished when in fact it's about to be gracefully terminated.
The reason my PR does that is that it wanted to continue to warn the workflow author when workflow cancellation causes an update cancellation. And the reason it wants to do that is to maintain consistency with signals: when the workflow is canceled, if we're going to warn on unfinished signal handlers (which we do) then we should warn on unfinished update handlers also.
I think that these desires are mutually incompatible. We have to allow a user to clean up their update gracefully without the warning, as you point out. I see two solutions:
-
to change things so that we no longer issue unfinished handler warnings for either signal or update, in the case of workflow cancellation.
-
to automatically cancel signals also (with a workflow logic flag)
There is a strong incentive (caller UX) for automatically canceling update handlers on workflow cancellation. We have never automatically cancelled signal handlers in the Python SDK, and doing so would require a flag as it may be backwards incompatible with replay of existing histories. The inconsistency in cancelling update handlers but not signal handlers would only be a problem if there is a potential for observable side effects.
I'll carry on thinking about it but for now I've updated the PR to stop issuing the unfinished warning on workflow cancellation, for both signal and update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to change things so that we no longer issue unfinished handler warnings for either signal or update, in the case of workflow cancellation.
[...]
I'll carry on thinking about it but for now I've updated the PR to stop issuing the unfinished warning on workflow cancellation, for both signal and update.
I think we should warn on unfinished handlers upon workflow completion regardless of reason for completion and regardless of reason handlers are unfinished. I think the problem with the PR before was it was warning before cancellation is processed and I think the previous behavior of warning after cancellation/completion is processed is best. So I think the line at #653 (comment) should not be removed.
It does mean it's a pain for users doing "cleanup" in update handlers, but IMO that's an education issue. We have to educate them that they should not exit the primary run function until they have done cleanup in handlers. This is similar to process exiting before an async thing has been given a chance to clean up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're suggesting that we continue to do the check late, and warn on workflow cancellation.
We can't do that, because in the common case we will then warn on unfinished signals (since these are not automatically cancelled, and hence the handlers will still be alive) but not warn on unfinished updates (since these are automatically cancelled, and hence the handlers will be dead).
(I did try to explain that in the comment above #653 (comment), but it is turning out to be a slightly tricky subject! And my comment isn't very clear).
This is why I've switched the PR to no longer warn on cancellation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To confirm from our off-PR discussion, we are still warning on unfinished handler regardless of reason (i.e. cancellation)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, still thinking about the design here. I've put the PR into draft mode for now.
be8ec97
to
2d13fa6
Compare
2d13fa6
to
423d2d5
Compare
Fixes #575
Today, if a workflow terminates via cancellation with an in-progress update, the update caller sees a gRPC
NOT FOUND
error (because the server aborts the update after the workflow is closed).This PR causes the caller to see
WorkflowUpdateFailedError
, which is what the caller sees in this situation with all other Temporal SDKs.It does so by cancelling the asyncio
Task
representing the update handler execution; the resulting exception in the update handler coroutine causes the update to fail.How this was tested