Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Panic: Error adding duplicate command on duplicated child workflow #947

Closed
mfateev opened this issue Nov 6, 2022 · 1 comment · Fixed by #999
Closed

Panic: Error adding duplicate command on duplicated child workflow #947

mfateev opened this issue Nov 6, 2022 · 1 comment · Fixed by #999
Assignees

Comments

@mfateev
Copy link
Member

mfateev commented Nov 6, 2022

Expected Behavior

An attempt to start a child workflow with the same ID as the one already running should result in ChildWorkflowExecutionAlreadyStartedError.

Actual Behavior

2022/11/06 11:12:07 ERROR Workflow panic Namespace default TaskQueue child-workflow WorkerID 8968@Maxims-MacBook-Pro.local@ WorkflowType SampleParentWorkflow WorkflowID parent-workflow_ed5bbdc7-183d-43b7-936a-d2a9e4d9bd2f RunID b4a73c0b-a1d3-4ef9-ab29-8bfd34b4138d Attempt 1 Error adding duplicate command CommandType: ChildWorkflow, ID: ABC-SIMPLE-CHILD-WORKFLOW-ID, state=Created, isDone()=false, history=[Created] StackTrace coroutine root [panic]:
go.temporal.io/sdk/internal.panicIllegalState(...)
        /Users/maxim/go/pkg/mod/go.temporal.io/sdk@v1.17.0/internal/internal_command_state_machine.go:502
go.temporal.io/sdk/internal.(*commandsHelper).addCommand(0x14000554230, {0x1016656f0, 0x140000a3290})
        /Users/maxim/go/pkg/mod/go.temporal.io/sdk@v1.17.0/internal/internal_command_state_machine.go:998 +0x2f8
go.temporal.io/sdk/internal.(*commandsHelper).startChildWorkflowExecution(0x41b8?, 0x102180668?)
        /Users/maxim/go/pkg/mod/go.temporal.io/sdk@v1.17.0/internal/internal_command_state_machine.go:1228 +0x38
go.temporal.io/sdk/internal.(*workflowEnvironmentImpl).ExecuteChildWorkflow(_, {{{0x140001b21e0, 0xe}, 0x0, 0x0, 0x0, {0x1013796f7, 0x7}, {0x10138d56c, 0x1c}, ...}, ...}, ...)
        /Users/maxim/go/pkg/mod/go.temporal.io/sdk@v1.17.0/internal/internal_event_handlers.go:476 +0x3b0
go.temporal.io/sdk/internal.(*workflowEnvironmentInterceptor).ExecuteChildWorkflow(0x140005506c0?, {0x10165fff8?, 0x1400017e000}, {0x1017c363a, 0x13}, {0x140000a3200, 0x1, 0x1})
        /Users/maxim/go/pkg/mod/go.temporal.io/sdk@v1.17.0/internal/workflow.go:903 +0x6b8
go.temporal.io/sdk/internal.ExecuteChildWorkflow({0x10165fff8, 0x140005506c0}, {0x101548ca0, 0x101651d80}, {0x140000a3200, 0x1, 0x1})
        /Users/maxim/go/pkg/mod/go.temporal.io/sdk@v1.17.0/internal/workflow.go:851 +0x188
go.temporal.io/sdk/workflow.ExecuteChildWorkflow(...)
        /Users/maxim/go/pkg/mod/go.temporal.io/sdk@v1.17.0/workflow/workflow.go:181
github.com/temporalio/samples-go/child-workflow.SampleParentWorkflow({0x10165fff8, 0x14000550690})
        /Users/maxim/temporal/temporal-go-samples/child-workflow/parent_workflow.go:25 +0x164
reflect.Value.call({0x101539b00?, 0x101651d88?, 0x102000f18?}, {0x101376901, 0x4}, {0x1400000fb48, 0x1, 0x5?})
        /opt/homebrew/Cellar/go/1.19.2/libexec/src/reflect/value.go:584 +0x688
reflect.Value.Call({0x101539b00?, 0x101651d88?, 0x5000?}, {0x1400000fb48?, 0x1400044bcf8?, 0x10134ce50?})
        /opt/homebrew/Cellar/go/1.19.2/libexec/src/reflect/value.go:368 +0x90
go.temporal.io/sdk/internal.executeFunction({0x101539b00, 0x101651d88}, {0x1400044bde8, 0x1, 0x14000090000?})
        /Users/maxim/go/pkg/mod/go.temporal.io/sdk@v1.17.0/internal/internal_worker.go:1651 +0x140
go.temporal.io/sdk/internal.(*workflowEnvironmentInterceptor).ExecuteWorkflow(0x14000554280, {0x10165fea8?, 0x14000109d40}, 0x1400000fb30)
        /Users/maxim/go/pkg/mod/go.temporal.io/sdk@v1.17.0/internal/workflow.go:504 +0x14c
go.temporal.io/sdk/internal.(*workflowExecutor).Execute(0x14000113300, {0x10165fea8, 0x14000109d40}, 0x25?)
        /Users/maxim/go/pkg/mod/go.temporal.io/sdk@v1.17.0/internal/internal_worker.go:771 +0x238
go.temporal.io/sdk/internal.(*syncWorkflowDefinition).Execute.func1({0x10165fff8, 0x14000550570})
        /Users/maxim/go/pkg/mod/go.temporal.io/sdk@v1.17.0/internal/internal_workflow.go:553 +0xd0

Steps to Reproduce the Problem

Modified child_worklfow sample:

func SampleParentWorkflow(ctx workflow.Context) (string, error) {
	logger := workflow.GetLogger(ctx)

	cwo := workflow.ChildWorkflowOptions{
		WorkflowID: "ABC-SIMPLE-CHILD-WORKFLOW-ID",
	}
	ctx = workflow.WithChildOptions(ctx, cwo)

	var result string
	child1 := workflow.ExecuteChildWorkflow(ctx, SampleChildWorkflow, "World")

	workflow.Sleep(ctx, 10*time.Second)
	err := workflow.ExecuteChildWorkflow(ctx, SampleChildWorkflow, "World").Get(ctx, &result)
	if err != nil {
		logger.Error("Parent execution received second child execution failure.", "Error", err)
		return "", err
	}
	err = child1.Get(ctx, &result)
	if err != nil {
		logger.Error("Parent execution received child execution failure.", "Error", err)
		return "", err
	}

	logger.Info("Parent execution completed.", "Result", result)
	return result, nil
}

func SampleChildWorkflow(ctx workflow.Context, name string) (string, error) {
	logger := workflow.GetLogger(ctx)
	greeting := "Hello " + name + "!"
	logger.Info("Child workflow execution: " + greeting)
	workflow.Sleep(ctx, 15*time.Second)
	return greeting, nil
}

Specifications

  • Version: go.temporal.io/sdk v1.17.0
@bohlander
Copy link

bohlander commented Nov 7, 2022

We have seen this same panic even when the 1st child workflow has exited before the 2nd one is started.

We have some long running (~40 day) workflows that have a timer that runs daily. These have been stable for 6+ months, but last night we had a significant amount of them fail with: adding duplicate command CommandType: ChildWorkflow, ID: <snip>, state=Created, isDone()=false, history=[Created]","Namespace":"default","RunID":"10c62aba-5487-4a5f-b1c1-93badbdb5ee8","StackTrace":<snip> . This occurs when the workflow with the timer is attempting to start a child workflow.

Screen Shot 2022-11-06 at 10 41 06 AM

Screen Shot 2022-11-06 at 10 40 40 AM

You can see it finished the childworkflow with same id finished at 11/5/22 7:21:31. Then the duplicate command failure occurs ~24 hrs later at 11/6/22 7:20:52 (the temporal web UI screenshot is in PST whereas the logs are in UTC)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants