Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(semaphore): ensure holderKeys carry all information needed. Fixes #8684 #13553

Merged
merged 16 commits into from
Sep 18, 2024

Conversation

isubasinghe
Copy link
Member

@isubasinghe isubasinghe commented Sep 3, 2024

Fixes #8684

Motivation

The Holder keys in the status field of a workflow do not carry enough information to work as expected.
This change ensures a single function (getHolderKey) is used for both acquiring and releasing keys.
This in turn deprecates(getResourceKey).

Modifications

Store the entire holderKey in the status. Simple but effective change to fix #8684.
Allows controllers to support V2 holder names even if the holder was first populated by a v1 holder name.

Verification

The unit tests were run with HOLDER_KEY_VERSION=v2.
NOTE: E2E tests HAVE NOT BEEN RUN WITH THIS ENV VAR.
Largely due to being unable to run the E2E tests in a reliable way on my machine.

@isubasinghe isubasinghe changed the title fix: ensure Holding keys carry all information needed fix: ensure Holding keys carry all information needed. Fixes #8684 Sep 4, 2024
@isubasinghe isubasinghe marked this pull request as ready for review September 4, 2024 05:02
Copy link
Member Author

@isubasinghe isubasinghe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks alright, a few minor issues

pkg/apis/workflow/v1alpha1/workflow_types.go Outdated Show resolved Hide resolved
pkg/apis/workflow/v1alpha1/workflow_types.go Outdated Show resolved Hide resolved
workflow/controller/operator_concurrency_test.go Outdated Show resolved Hide resolved
workflow/sync/sync_manager.go Outdated Show resolved Hide resolved
Copy link

@agilgur5 agilgur5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In #8684 (comment), you said this would be a breaking change? Then this should have a ! in the title and the appropriate milestone.

Or are you making this backward compatible with the env var?

NOTE: E2E tests HAVE NOT BEEN RUN WITH THIS ENV VAR.
Largely due to being unable to run the E2E tests in a reliable way on my machine.

If this is intended to be required in the future, we should probably enable it during testing similar to #12413 (comment)

@isubasinghe
Copy link
Member Author

isubasinghe commented Sep 4, 2024

In #8684 (comment), you said this would be a breaking change? Then this should have a ! in the title and the appropriate milestone.

Or are you making this backward compatible with the env var?

Yeah this should mean that you can upgrade workflows to use the new holder key name without any issue, no longer a breaking change.

NOTE: E2E tests HAVE NOT BEEN RUN WITH THIS ENV VAR.
Largely due to being unable to run the E2E tests in a reliable way on my machine.

If this is intended to be required in the future, we should probably enable it during testing similar to #12413 (comment)

Yeah that sounds good to me.

@agilgur5 agilgur5 changed the title fix: ensure Holding keys carry all information needed. Fixes #8684 fix(semaphore): ensure holderKeys carry all information needed. Fixes #8684 Sep 5, 2024
pkg/apis/workflow/v1alpha1/workflow_types.go Outdated Show resolved Hide resolved
workflow/sync/mutex_test.go Outdated Show resolved Hide resolved
workflow/controller/operator_concurrency_test.go Outdated Show resolved Hide resolved
pkg/apis/workflow/v1alpha1/workflow_types.go Outdated Show resolved Hide resolved
@agilgur5 agilgur5 changed the title fix(semaphore): ensure holderKeys carry all information needed. Fixes #8684 fix(semaphore): ensure holderKeys carry all information needed. Fixes #8684 Sep 5, 2024
@juliev0 juliev0 requested a review from sarabala1979 September 6, 2024 16:06
@isubasinghe isubasinghe requested a review from agilgur5 September 9, 2024 05:15
Copy link
Member

@Joibel Joibel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At minimum we need duplicates of the tests that are V1 only as V2 tests.

My preference is to drop V1 completely, agreeing with @agilgur5

docs/environment-variables.md Outdated Show resolved Hide resolved
workflow/sync/sync_manager.go Outdated Show resolved Hide resolved
pkg/apis/workflow/v1alpha1/workflow_types.go Outdated Show resolved Hide resolved
pkg/apis/workflow/v1alpha1/workflow_types.go Outdated Show resolved Hide resolved
@isubasinghe isubasinghe requested a review from Joibel September 16, 2024 09:16
@isubasinghe
Copy link
Member Author

legacy holding keys have been eliminated, now only v2 style keys exist. Meaning existing keys now test v2 directly.
We upgrade v1 keys to v2. There is a test for this upgrade process in place.

Copy link
Member Author

@isubasinghe isubasinghe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Joibel opinions? wait please, adding more tests

@isubasinghe isubasinghe marked this pull request as draft September 16, 2024 09:48
@isubasinghe isubasinghe marked this pull request as ready for review September 17, 2024 06:19
@isubasinghe
Copy link
Member Author

isubasinghe commented Sep 17, 2024

This PR is now ready, I believe I have addressed the edge cases. Unfortunately there is some ambiguity (as expected really) if a sync exist at both the template level and workflow level.

We just choose to go with workflow level in this case of ambiguity.

P.S. I hate this bug

workflow/sync/sync_manager.go Outdated Show resolved Hide resolved
workflow/sync/sync_manager.go Outdated Show resolved Hide resolved
workflow/sync/sync_manager.go Outdated Show resolved Hide resolved
workflow/sync/sync_manager.go Outdated Show resolved Hide resolved
workflow/sync/sync_manager.go Outdated Show resolved Hide resolved
workflow/sync/sync_manager.go Outdated Show resolved Hide resolved
pkg/apis/workflow/v1alpha1/workflow_types.go Outdated Show resolved Hide resolved
workflow/sync/sync_manager_test.go Outdated Show resolved Hide resolved
Signed-off-by: isubasinghe <isitha@pipekit.io>
Signed-off-by: isubasinghe <isitha@pipekit.io>
Signed-off-by: isubasinghe <isitha@pipekit.io>
Signed-off-by: isubasinghe <isitha@pipekit.io>
Signed-off-by: isubasinghe <isitha@pipekit.io>
Signed-off-by: isubasinghe <isitha@pipekit.io>
Signed-off-by: isubasinghe <isitha@pipekit.io>
Signed-off-by: isubasinghe <isitha@pipekit.io>
Signed-off-by: isubasinghe <isitha@pipekit.io>
Signed-off-by: isubasinghe <isitha@pipekit.io>
Signed-off-by: isubasinghe <isitha@pipekit.io>
Signed-off-by: isubasinghe <isitha@pipekit.io>
Signed-off-by: isubasinghe <isitha@pipekit.io>
Signed-off-by: isubasinghe <isitha@pipekit.io>
Signed-off-by: isubasinghe <isitha@pipekit.io>
Signed-off-by: isubasinghe <isitha@pipekit.io>
@isubasinghe
Copy link
Member Author

/retest

1 similar comment
@isubasinghe
Copy link
Member Author

/retest

@isubasinghe
Copy link
Member Author

Irrelevant tests seem to be broken since a rebase.

@Joibel Joibel merged commit 440f1cc into argoproj:main Sep 18, 2024
27 checks passed
Joibel pushed a commit to pipekit/argo-workflows that referenced this pull request Sep 19, 2024
Joibel pushed a commit that referenced this pull request Sep 20, 2024
@agilgur5 agilgur5 added this to the v3.5.x patches milestone Sep 20, 2024
@Joibel Joibel deleted the fix-deadlock-mutex branch November 7, 2024 10:41
Joibel pushed a commit to Joibel/argo-workflows that referenced this pull request Dec 6, 2024
Fixes: argoproj#8684
Backports: argoproj#13553
Signed-off-by: isubasinghe <isitha@pipekit.io>
Co-authored-by: Isitha Subasinghe <isitha@pipekit.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Workflow deadlocks on mutex in steps template if controller is restarted
4 participants