You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Child context cancellation propagates in a deterministic order
Actual Behavior
Child context cancellation does not propagates in a deterministic order.
When a parent context cancels its' children it loops through a map. Map order in Go is not deterministic so the order child contexts are canceled is not deterministic.
Steps to Reproduce the Problem
func TestContextCancellationOrderDeterminism(t *testing.T) {
/*
Previously, child-contexts were stored in a map, preventing deterministic order when propagating cancellation.
The order of branches being selected in this test was random, both for the first event and in following ones.
In principle this should be fine, but it's possible for the effects of cancellation to trigger a selector's
future-done callback, which currently records the *real-time*-first event as the branch to unblock, rather than
doing something more safe by design (e.g. choosing based on state when the selector's goroutine is unblocked).
Unfortunately, we cannot change the selector's behavior without introducing non-backwards-compatible changes to
currently-working workflows.
So the workaround for now is to maintain child-context order, so they are canceled in a consistent order.
As this order was not controlled before, and Go does a pretty good job at randomizing map iteration order,
converting non-determinism to determinism should be strictly no worse for backwards compatibility, and it
fixes the issue for future executions.
*/
check := func(t *testing.T, separateStart, separateSelect bool) {
env := newTestWorkflowEnv(t)
act := func(ctx context.Context) error {
return nil // will be mocked
}
wf := func(ctx Context) ([]int, error) {
ctx, cancel := WithCancel(ctx)
Go(ctx, func(ctx Context) {
_ = Sleep(ctx, time.Minute)
cancel()
})
// start some activities, which will not complete before the timeout cancels them
ctx = WithActivityOptions(ctx, ActivityOptions{
TaskQueue: "",
ScheduleToCloseTimeout: time.Hour,
ScheduleToStartTimeout: time.Hour,
StartToCloseTimeout: time.Hour,
})
s := NewSelector(ctx)
var result []int
for i := 0; i < 10; i++ {
i := i
// need a child context, a future alone is not enough as it does not become a child
cctx, ccancel := WithCancel(ctx)
s.AddFuture(ExecuteActivity(cctx, act), func(f Future) {
ccancel() // TODO: is this necessary to prevent leaks? if it is, how can we make it not?
err := f.Get(ctx, nil)
if err == nil || !IsCanceledError(err) {
// fail the test, this should not happen - activities must be canceled or it's not valid.
t.Errorf("activity completion or failure for some reason other than cancel: %v", err)
}
result = append(result, i)
})
if separateStart {
// yield so they are submitted one at a time, in case that matters
_ = Sleep(ctx, time.Second)
}
}
for i := 0; i < 10; i++ {
if separateSelect {
// yield so they are selected one at a time, in case that matters
_ = Sleep(ctx, time.Second)
}
s.Select(ctx)
}
return result, nil
}
env.RegisterWorkflow(wf)
env.RegisterActivity(act)
// activities must not complete in time
env.OnActivity(act, mock.Anything).After(5 * time.Minute).Return(nil)
env.ExecuteWorkflow(wf)
require.NoError(t, env.GetWorkflowError())
var result []int
require.NoError(t, env.GetWorkflowResult(&result))
require.NotEmpty(t, result)
assert.Equal(t, 0, result[0], "first activity to be created should be the first one canceled")
assert.Equal(t, []int{1, 2, 3, 4, 5, 6, 7, 8, 9}, result[1:], "other activities should finish in a consistent (but undefined) order")
}
type variant struct {
name string
separateStart bool
separateSelect bool
}
// all variants expose this behavior, but being a bit more exhaustive in the face
// of decision-scheduling differences seems good.
for _, test := range []variant{
{"many in one decision", false, false},
{"many started at once, selected slowly", false, true},
{"started slowly, selected quickly", true, false},
{"started and selected slowly", true, true},
} {
t.Run(test.name, func(t *testing.T) {
check(t, test.separateStart, test.separateSelect)
})
}
}
Note: this test came from cadence go client where I noticed they had fixed this bug.
The text was updated successfully, but these errors were encountered:
Expected Behavior
Child context cancellation propagates in a deterministic order
Actual Behavior
Child context cancellation does not propagates in a deterministic order.
When a parent context cancels its' children it loops through a map. Map order in Go is not deterministic so the order child contexts are canceled is not deterministic.
Steps to Reproduce the Problem
Note: this test came from cadence go client where I noticed they had fixed this bug.
The text was updated successfully, but these errors were encountered: