Fix lost signal from Selector when Default path blocks #1682

Merged
merged 11 commits into from
Nov 18, 2024

yuandrew
Contributor

@yuandrew yuandrew commented Oct 22, 2024

What was changed

Fixed a lost signal in the Selector, gated behind a new SDK flag. The flag is added but not yet set anywhere, so existing behavior is unchanged. A separate PR will enable the flag.

Why?

We shouldn't be losing signals in our selector.

Checklist

#1624

  1. How was this tested:

Added tests
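
Editorial sketch: the loss this PR addresses (a selector's default branch blocks, a signal arrives in the meantime, and the signal is never seen again) can be illustrated with a toy model. Everything below is an assumption made for illustration: `toyChannel`, `selectWithDefault`, `stashEarly`, and `runScenario` are hypothetical names, the model is single-threaded, and none of it is the SDK's actual implementation. It only mirrors the idea in the diff comment that `readyBranch` is not executed when a default case is taken, so the value has to be stored before that point.

```go
package main

import "fmt"

// toyChannel is a hypothetical, single-threaded stand-in for a workflow
// channel. It is NOT the Temporal SDK's channel implementation.
type toyChannel struct {
	pending  []func(v string) bool // receive callbacks registered by selectors
	buffered []string              // values nobody has claimed yet
	stashed  *string               // value claimed by a callback, awaiting a receive
}

// send offers v to pending callbacks first; a callback that returns true has
// taken ownership of the value. Unclaimed values are buffered.
func (c *toyChannel) send(v string) {
	for len(c.pending) > 0 {
		cb := c.pending[0]
		c.pending = c.pending[1:]
		if cb(v) {
			return
		}
	}
	c.buffered = append(c.buffered, v)
}

// receiveAsync returns a stashed or buffered value without blocking.
func (c *toyChannel) receiveAsync() (string, bool) {
	if c.stashed != nil {
		v := *c.stashed
		c.stashed = nil
		return v, true
	}
	if len(c.buffered) > 0 {
		v := c.buffered[0]
		c.buffered = c.buffered[1:]
		return v, true
	}
	return "", false
}

// selectWithDefault models one Select call with a receive case and a default
// case. It assumes no value is ready when it is called, so it always takes the
// default branch. stashEarly plays the role of the new flag: when true, the
// callback stores the value itself instead of leaving that to readyBranch.
func selectWithDefault(c *toyChannel, stashEarly bool, defaultBranch func()) {
	var readyBranch func()
	c.pending = append(c.pending, func(v string) bool {
		if readyBranch != nil {
			return false
		}
		if stashEarly {
			c.stashed = &v
		}
		readyBranch = func() {
			if !stashEarly {
				c.stashed = &v
			}
		}
		return true // the value now belongs to this selector
	})
	defaultBranch() // the default path was chosen; readyBranch is never run
}

// runScenario delivers a signal while the default branch is still running,
// then tries to receive it afterwards.
func runScenario(stashEarly bool) (string, bool) {
	c := &toyChannel{}
	selectWithDefault(c, stashEarly, func() {
		c.send("signal-1") // arrives while the default branch is "blocked"
	})
	return c.receiveAsync()
}

func main() {
	for _, stashEarly := range []bool{false, true} {
		v, ok := runScenario(stashEarly)
		fmt.Printf("stashEarly=%v received=%q ok=%v\n", stashEarly, v, ok)
	}
}
```

In this model the run with `stashEarly=false` prints `received="" ok=false` (the value was claimed by the callback and then dropped), while `stashEarly=true` prints `received="signal-1" ok=true`.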

@yuandrew yuandrew marked this pull request as ready for review October 28, 2024 16:50
@yuandrew yuandrew requested a review from a team as a code owner October 28, 2024 16:50
readyBranch = func() {
	// readyBranch is not executed when AddDefault is specified,
	// setting the value here prevents the signal from being dropped
	dropSignalFlag := getWorkflowEnvironment(ctx).GetFlag(SDKFlagBlockedSelectorSignalReceive)
Contributor

What test did you add that tests the true path here?

Contributor Author

TestSelectBlockingDefaultWithFlag tests the true scenario.

Contributor

Ah, I see: the test passes because the test environment always assumes the flag is true. Hm, I am not sure we should assume that, since the test environment would then diverge from the real workflow environment. What do you think?

Contributor Author

I agree that the test environment shouldn't assume flags are true by default. That seems like a change that belongs in its own PR; I can create an issue for it.

Contributor

To be clear, I don't think SDK flags should be off by default in the test environment. I meant that the SDK flags used in the test environment should match what we enable when running a new workflow.

Contributor Author

Oops, I misunderstood, I agree.

Is there anywhere flags are set by default for a workflow? From looking at the code, it seems like TryUse calls are scattered around for different scenarios.

Contributor

> Is there anywhere flags are set by default for a workflow? From looking at the code, it seems like TryUse calls are scattered around for different scenarios.

Yeah, the way to think about it is that for new workflows all current flags are set to true. However, we don't want to set the new flag you are adding here to true by default, because that would make it difficult for users to roll back their SDK version.
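
Editorial sketch: the difference between only reading a flag and recording it for new workflows can be shown with a small model. This is a sketch under stated assumptions, not the SDK's API: `flagSet`, `newFlagSet`, and the string flag name are hypothetical, and the `GetFlag` and `TryUse` methods below only borrow the names mentioned in this thread, with signatures and behavior invented for illustration.

```go
package main

import "fmt"

// flagSet is a hypothetical model of per-workflow SDK flags. The method names
// echo TryUse and GetFlag from the discussion, but the signatures and behavior
// here are assumptions, not the SDK's API.
type flagSet struct {
	replaying bool            // true when re-executing an existing history
	recorded  map[string]bool // flags already written for this workflow
}

// newFlagSet builds a flag set with the given flags already recorded.
func newFlagSet(replaying bool, recorded ...string) *flagSet {
	fs := &flagSet{replaying: replaying, recorded: map[string]bool{}}
	for _, name := range recorded {
		fs.recorded[name] = true
	}
	return fs
}

// GetFlag only reads: it reports whether the flag is already recorded.
func (fs *flagSet) GetFlag(name string) bool {
	return fs.recorded[name]
}

// TryUse records the flag on first execution; during replay it only reads.
func (fs *flagSet) TryUse(name string) bool {
	if !fs.replaying {
		fs.recorded[name] = true
	}
	return fs.recorded[name]
}

// blockedSelectorSignalReceive is a hypothetical flag name for this sketch.
const blockedSelectorSignalReceive = "BlockedSelectorSignalReceive"

func main() {
	// Read-only check: a new workflow does not pick up the flag.
	fresh := newFlagSet(false)
	fmt.Println(fresh.GetFlag(blockedSelectorSignalReceive)) // false

	// What a later change could do: record the flag for new workflows.
	fmt.Println(fresh.TryUse(blockedSelectorSignalReceive)) // true

	// Replaying a history without the flag never turns it on.
	old := newFlagSet(true)
	fmt.Println(old.TryUse(blockedSelectorSignalReceive)) // false
}
```

Under this model, shipping the read-only check first means a history only carries the flag once a later release starts recording it, which is one way to read the rollback concern raised above.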

	require.EqualValues(t, expected, history)
}

func TestSelectBlockingDefaultWithFlag(t *testing.T) {
Contributor

The reported bug was that blocking in the default case of a selector could cause signals to be lost. When I last looked at these tests, we didn't seem to have any coverage for blocking in one selector case while a signal is received. Can we add tests to verify there are no bugs if a signal is received while blocking in another case of a selector, not just default?

@Quinn-With-Two-Ns
Contributor

Do you think it would be feasible to add some debug API to set the flag to true, plus an integration test and a replay test that exercise a real workflow with the flag set?

	ch2 := workflow.NewChannel(ctx)

	if enableFlag {
		internal.SetUnblockSelectorSignal()
Contributor

I think we should do this as part of the test, not the workflow. More importantly, though, can you confirm setting this won't affect any test that runs after this test?

Contributor Author

> we should do this as part of the test, not the workflow

What is the difference between the two? Or is this more of a style preference?

> can you confirm setting this won't affect any test that runs after this test?

Looks like it does affect subsequent tests :( Is it enough to unset this value? Go tests run in parallel, so that seems insufficient?

Contributor

> What is the difference between the two? Or is this more of a style preference?

I think it is clearer, when we are testing a workflow, if all the setup is run beforehand.

> Go tests run in parallel, so that seems insufficient?

Only if t.Parallel() is called.

	defer cancel()
	options := ts.startWorkflowOptions("test-selector-block")

	internal.SetUnblockSelectorSignal()
Contributor

nit: could just have one function that takes a bool: SetUnblockSelectorSignal(bool)
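
Editorial sketch: the two review points in this part of the thread (a single setter that takes a bool, and making sure the value is unset after the test) can be combined in a small pattern. The names below are hypothetical stand-ins written for this sketch; in particular, returning the previous value from the setter is an assumption added here, not something the review asked for.

```go
package main

import "fmt"

// unblockSelectorSignal is a hypothetical package-level switch standing in for
// the debug flag discussed in the review.
var unblockSelectorSignal bool

// setUnblockSelectorSignal is a single setter taking a bool. Returning the
// previous value is an addition made for this sketch so callers can restore it.
func setUnblockSelectorSignal(enabled bool) (previous bool) {
	previous = unblockSelectorSignal
	unblockSelectorSignal = enabled
	return previous
}

// withUnblockSelectorSignal runs fn with the switch set to enabled and restores
// the old value afterwards, even if fn panics.
func withUnblockSelectorSignal(enabled bool, fn func()) {
	previous := setUnblockSelectorSignal(enabled)
	defer setUnblockSelectorSignal(previous)
	fn()
}

func main() {
	withUnblockSelectorSignal(true, func() {
		fmt.Println("inside:", unblockSelectorSignal)
	})
	fmt.Println("after:", unblockSelectorSignal)
}
```

In a real test the restore step would more naturally be registered with `t.Cleanup`. Because the switch is process-wide state, this pattern is only safe for tests that do not call `t.Parallel()`, which matches the point made above.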

@Quinn-With-Two-Ns
Contributor

LGTM! Thanks for putting up with all my requests for more tests

@yuandrew yuandrew merged commit c31c2f2 into temporalio:master Nov 18, 2024
13 checks passed
@yuandrew yuandrew deleted the selector-signal-loss branch November 18, 2024 19:10
ReyOrtiz pushed a commit to ReyOrtiz/temporal-sdk-go that referenced this pull request Dec 5, 2024
* initial changes, added replay test for legacy history, need to finish writing tests

* Clean up tests, fix error

* unit test for fixed behavior

* PR feedback

* improve tests, add tests for AddFuture, AddSend

* add integration tests, add debug API to enable SDK flag for tests

* set flag in test itself not workflow, unset flag after test

* unify set/unset function into one