Workflow hangs if spawning a child workflow with the same workflowId #1041

Closed
raymondsze opened this issue Feb 16, 2023 · 9 comments · Fixed by #1058

Comments

@raymondsze

Expected Behavior

I expect the parent workflow to run successfully and only one child workflow to be executed (because both children use the same workflow ID).

Actual Behavior

It hangs in the second child workflow's childWorkflowFuture.GetChildWorkflowExecution().Get(ctx, &childWE) call.

Steps to Reproduce the Problem

I followed the documentation in https://legacy-documentation-sdks.temporal.io/go/spawn-a-child-workflow-execution.
If I spawn two child workflows with the same workflowId, the second one hangs when calling childWorkflowFuture.GetChildWorkflowExecution().Get(ctx, &childWE). What I want is: if the child workflow is already registered and running, do nothing; if the child workflow has completed or is not yet registered, register and run it without waiting.

Specifications

  • Version:
  • Platform:
@cretz
Member

cretz commented Feb 16, 2023

Can you provide the SDK version you are using (to see if it includes #999) and if possible provide a small amount of code to demonstrate this? You should be able to respond to the duplicate error coming back from that Get.

@raymondsze
Author

raymondsze commented Feb 17, 2023

Sure. The SDK version is the latest, v1.21.1.

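import (
	"fmt"

	"go.temporal.io/api/enums/v1"
	"go.temporal.io/sdk/workflow"
)

// ConsumerWorkflow reproduces the hang: both loop iterations start a child
// with the same workflow ID. ChildWorkflow (taking an int) is defined elsewhere.
func ConsumerWorkflow(ctx workflow.Context) error {
	logger := workflow.GetLogger(ctx)
	for i := 0; i < 2; i++ {
		// The constant 0 suffix is deliberate: every iteration reuses one child ID.
		childWorkflowOptions := workflow.ChildWorkflowOptions{
			ParentClosePolicy:     enums.PARENT_CLOSE_POLICY_ABANDON,
			WorkflowID:            fmt.Sprintf("consumer-%v-%v", workflow.GetInfo(ctx).WorkflowExecution.ID, 0),
			WorkflowIDReusePolicy: enums.WORKFLOW_ID_REUSE_POLICY_REJECT_DUPLICATE,
		}
		ctx = workflow.WithChildOptions(ctx, childWorkflowOptions)
		logger.Info("BEFORE EXECUTE!")
		childWorkflow := workflow.ExecuteChildWorkflow(ctx, ChildWorkflow, i)
		logger.Info("AFTER EXECUTE!")
		logger.Info("BEFORE GET EXECUTION!")
		// The error is discarded; for the second child this Get never returns.
		_ = childWorkflow.GetChildWorkflowExecution().Get(ctx, nil)
		logger.Info("AFTER GET EXECUTION!")
	}
	return nil
}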

Here is the log:

Worker

2023/02/17 11:22:28 INFO  No logger configured for temporal client. Created default one.
2023/02/17 11:22:28 INFO  Started Worker Namespace default TaskQueue hello-world WorkerID 4822@Raymond@
2023/02/17 11:22:39 INFO  BEFORE EXECUTE! Namespace default TaskQueue hello-world WorkerID 4822@Raymond@ WorkflowType ConsumerWorkflow WorkflowID hello_world_1626421842992435200 RunID ea2be856-3b9d-452e-8f7d-b4b439f489b9 Attempt 1
2023/02/17 11:22:39 DEBUG ExecuteChildWorkflow Namespace default TaskQueue hello-world WorkerID 4822@Raymond@ WorkflowType ConsumerWorkflow WorkflowID hello_world_1626421842992435200 RunID ea2be856-3b9d-452e-8f7d-b4b439f489b9 Attempt 1 ChildWorkflowID consumer-hello_world_1626421842992435200-0 WorkflowType ChildWorkflow
2023/02/17 11:22:39 INFO  AFTER EXECUTE! Namespace default TaskQueue hello-world WorkerID 4822@Raymond@ WorkflowType ConsumerWorkflow WorkflowID hello_world_1626421842992435200 RunID ea2be856-3b9d-452e-8f7d-b4b439f489b9 Attempt 1
2023/02/17 11:22:39 INFO  BEFORE GET EXECUTION! Namespace default TaskQueue hello-world WorkerID 4822@Raymond@ WorkflowType ConsumerWorkflow WorkflowID hello_world_1626421842992435200 RunID ea2be856-3b9d-452e-8f7d-b4b439f489b9 Attempt 1
2023/02/17 11:22:39 INFO  AFTER GET EXECUTION! Namespace default TaskQueue hello-world WorkerID 4822@Raymond@ WorkflowType ConsumerWorkflow WorkflowID hello_world_1626421842992435200 RunID ea2be856-3b9d-452e-8f7d-b4b439f489b9 Attempt 1
2023/02/17 11:22:39 INFO  BEFORE EXECUTE! Namespace default TaskQueue hello-world WorkerID 4822@Raymond@ WorkflowType ConsumerWorkflow WorkflowID hello_world_1626421842992435200 RunID ea2be856-3b9d-452e-8f7d-b4b439f489b9 Attempt 1
2023/02/17 11:22:39 INFO  AFTER EXECUTE! Namespace default TaskQueue hello-world WorkerID 4822@Raymond@ WorkflowType ConsumerWorkflow WorkflowID hello_world_1626421842992435200 RunID ea2be856-3b9d-452e-8f7d-b4b439f489b9 Attempt 1
2023/02/17 11:22:39 INFO  BEFORE GET EXECUTION! Namespace default TaskQueue hello-world WorkerID 4822@Raymond@ WorkflowType ConsumerWorkflow WorkflowID hello_world_1626421842992435200 RunID ea2be856-3b9d-452e-8f7d-b4b439f489b9 Attempt 1

The "AFTER GET EXECUTION!" line is never printed for the second child workflow, which has the same workflow ID, and the parent workflow stays in "Running" status.

Client

2023/02/17 11:22:39 INFO  No logger configured for temporal client. Created default one.
2023/02/17 11:22:39 Started workflow WorkflowID hello_world_1626421842992435200 RunID ea2be856-3b9d-452e-8f7d-b4b439f489b9

@Quinn-With-Two-Ns
Contributor

I can reproduce this. It looks like WorkflowIDReusePolicy: enums.WORKFLOW_ID_REUSE_POLICY_REJECT_DUPLICATE is not handled correctly.

@raymondsze
Author

@Quinn-With-Two-Ns
Is there any workaround, or do I need to wait for a fix?

@Quinn-With-Two-Ns
Contributor

I'll take a look. One obvious workaround is to avoid reusing the same workflow ID across child workflows.
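A minimal sketch of that workaround, applied to the repro's ID format: deriving the suffix from the loop index instead of the constant 0 gives each child its own ID. childWorkflowID is a hypothetical helper, not part of the SDK.

```go
package main

import "fmt"

// childWorkflowID builds a per-iteration child ID from the parent's ID.
// Using the loop index i (instead of a constant) makes every child ID
// distinct, so no duplicate-start error can occur.
func childWorkflowID(parentID string, i int) string {
	return fmt.Sprintf("consumer-%v-%v", parentID, i)
}
```

In the repro this would replace the fmt.Sprintf call in WorkflowID. It does, however, give up the "only one child at a time" guarantee that the reporter is after.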

@raymondsze
Author

But this is what I want to achieve:

If the child workflow does not exist --> start the child workflow
If the child workflow already exists and has completed --> restart the child workflow
If the child workflow already exists and is running --> do nothing

Basically, it acts as a notifier that tells the child workflow there is a new message in something like a mailbox; the child workflow is responsible for taking the messages out of the mailbox. If the child is already picking up messages, then do nothing, so that only one child workflow is ever running.
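The three rules above can be written as a pure decision function, which makes them easy to test independently of Temporal. This is a hypothetical sketch: childStatus, action, and onNewMessage are illustrative names, and how the parent learns the child's status is left open.

```go
package main

// childStatus is a hypothetical summary of what the parent knows about
// the child with a given workflow ID.
type childStatus int

const (
	childNotFound  childStatus = iota // never started
	childRunning                      // started and still draining the mailbox
	childCompleted                    // started and already closed
)

// action is what the parent should do when a new message arrives.
type action string

const (
	actionStart   action = "start"
	actionRestart action = "restart"
	actionNone    action = "none"
)

// onNewMessage encodes the rules: start if absent, restart if completed,
// and do nothing while a child is already running.
func onNewMessage(s childStatus) action {
	switch s {
	case childNotFound:
		return actionStart
	case childCompleted:
		return actionRestart
	default:
		return actionNone
	}
}
```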

@cretz
Member

cretz commented Feb 17, 2023

Since it's your workflow, you know whether you've started a child or not. You can maintain a map of child workflow futures keyed by their ID to get-or-create. Granted, we need to fix this issue, but you still control which children you start and can easily check for your conditions.
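One way to read the get-or-create suggestion, sketched in plain Go: keep a map from child workflow ID to the latest handle, and only start a new child when there is no entry or the existing one has finished. childHandle and childRegistry are hypothetical; in an actual workflow the map value would presumably be the SDK's ChildWorkflowFuture, whose IsReady method reports whether the child has closed.

```go
package main

// childHandle is a hypothetical stand-in for a child workflow future.
type childHandle struct {
	id   string
	run  int  // which start of this ID the handle represents
	done bool // true once the child has completed
}

// childRegistry maps a child workflow ID to the most recent handle started
// for it. Workflow code runs single-threaded, so no mutex is used.
type childRegistry struct {
	children map[string]*childHandle
}

func newChildRegistry() *childRegistry {
	return &childRegistry{children: make(map[string]*childHandle)}
}

// getOrStart returns the running child for id if there is one. Otherwise it
// calls start to launch a new child, records it, and returns it. The boolean
// reports whether a new child was started.
func (r *childRegistry) getOrStart(id string, start func(id string) *childHandle) (*childHandle, bool) {
	if h, ok := r.children[id]; ok && !h.done {
		return h, false // already running: do nothing
	}
	h := start(id) // absent or completed: (re)start
	r.children[id] = h
	return h, true
}
```

Looking up a key in a Go map is deterministic, but ranging over a map is not, so workflow code using this pattern should avoid iterating it. The registry also only knows about children started by this workflow run; a child started elsewhere with the same ID would still surface as the duplicate-start error.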

@Quinn-With-Two-Ns
Contributor

The problem appears to come from PR #999: the ChildWorkflowExecutionAlreadyStartedError is only set on one of the two futures a child workflow uses, so waiting on the execution future never unblocks. @Sushisource, is there any reason we wouldn't propagate the error to both futures?

@Quinn-With-Two-Ns
Contributor

This would break determinism if someone waited on the execution future in a selector. The best thing to do here is probably to use the SDKFlags implemented in #1056 to version the fix.
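A rough model of why the fix needs versioning: if an old history were replayed with the fix, an execution future that originally stayed blocked would now resolve, a selector could take a different branch, and the replayed commands would no longer match. Gating the new behavior on a flag avoids this: it is used when the history already recorded the flag, or when the workflow is running fresh (not replaying), in which case the flag is recorded for later replays. flagSet, tryUse, and flagChildStartErrorOnBothFutures are hypothetical names illustrating the idea, not the SDK's implementation from #1056.

```go
package main

// flag is a hypothetical identifier for one behavior change.
type flag int

const flagChildStartErrorOnBothFutures flag = 1

// flagSet models flags already present in workflow history plus flags
// newly adopted while processing the current task.
type flagSet struct {
	recorded map[flag]bool // flags read from history
	newFlags map[flag]bool // flags to be written to history with this task
}

// tryUse reports whether the new behavior may be used. A flag already in
// history is always honored. While replaying, an unrecorded flag is refused
// so old histories keep their old behavior. Otherwise the flag is adopted
// and recorded so that future replays make the same choice.
func (s *flagSet) tryUse(f flag, replaying bool) bool {
	if s.recorded[f] {
		return true
	}
	if replaying {
		return false
	}
	s.newFlags[f] = true
	return true
}
```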
