Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Fix]Solve the deadlock problem caused by queuing #13191

Merged
merged 4 commits into from
Dec 16, 2022
Merged

Conversation

dahai1996
Copy link
Contributor

Purpose of the pull request

when using task group for jobs,every workflow will rob taskGroup When it gets a "WAIT_TASK_GROUP" message from the queue.And the workflow will not go to check next message until the current message rob taskGroup successfully.
When every workflow is going to "rob taskGroup",but taskGroup has no resource.
it's get a deadlock problem.
I test my code on version 3.0.1 , and it works well.

Brief change log

move the failed message to the fail of queue.

Verify this pull request

This pull request is code cleanup without any test coverage.

(or)

This pull request is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(or)

If your pull request contain incompatible change, you should also add it to docs/docs/en/guide/upgrede/incompatible.md

@ruanwenjun
Copy link
Member

ruanwenjun commented Dec 14, 2022

Good catch, this PR will change the event order, but not all event can work well after change the order.

Can we throw a exception to tell the looper this event need to put in the end?

@SbloodyS SbloodyS added first time contributor First-time contributor 3.0.x labels Dec 14, 2022
@SbloodyS SbloodyS added this to the 3.0.4 milestone Dec 14, 2022
@dahai1996
Copy link
Contributor Author

dahai1996 commented Dec 14, 2022

Good catch, this PR will change the event order, but not all event can work well after change the order.

Can we throw a exception to tell the looper this event need to put in the end?

yes,it will change the order, I thought about making a temporary list to save the failed event.
or we can make a new Exception for "rob taskGroup failed"

@codecov-commenter
Copy link

codecov-commenter commented Dec 14, 2022

Codecov Report

Merging #13191 (732663c) into dev (ca5af01) will decrease coverage by 0.01%.
The diff coverage is 0.00%.

@@             Coverage Diff              @@
##                dev   #13191      +/-   ##
============================================
- Coverage     39.39%   39.38%   -0.02%     
+ Complexity     4282     4280       -2     
============================================
  Files          1066     1067       +1     
  Lines         40482    40493      +11     
  Branches       4657     4662       +5     
============================================
  Hits          15947    15947              
- Misses        22746    22757      +11     
  Partials       1789     1789              
Impacted Files Coverage Δ
...r/server/master/event/StateEventHandleFailure.java 0.00% <0.00%> (ø)
...er/master/event/TaskWaitTaskGroupStateHandler.java 0.00% <0.00%> (ø)
.../server/master/runner/WorkflowExecuteRunnable.java 10.38% <0.00%> (-0.05%) ⬇️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@dahai1996
Copy link
Contributor Author

Good catch, this PR will change the event order, but not all event can work well after change the order.

Can we throw a exception to tell the looper this event need to put in the end?

i do it by throwing a exception,pls check

@sonarcloud
Copy link

sonarcloud bot commented Dec 15, 2022

SonarCloud Quality Gate failed.    Quality Gate failed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 1 Code Smell

37.0% 37.0% Coverage
0.0% 0.0% Duplication

Copy link
Member

@ruanwenjun ruanwenjun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ruanwenjun ruanwenjun added the bug Something isn't working label Dec 16, 2022
Copy link
Contributor

@davidzollo davidzollo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1
Good job

@davidzollo
Copy link
Contributor

Thanks for your first contribution, Looking forward to your deep participation ^-^

@davidzollo davidzollo merged commit 7a0a2c2 into apache:dev Dec 16, 2022
1 check passed
@davidzollo davidzollo changed the title Solve the deadlock problem caused by queuing [Fix]Solve the deadlock problem caused by queuing Dec 16, 2022
zhongjiajie pushed a commit that referenced this pull request Dec 28, 2022
* Solve the deadlock problem caused by queuing

* Solve the deadlock problem caused by queuing

* Solve the deadlock problem caused by queuing

* Solve the deadlock problem caused by queuing,move the event to the tail by throwing a exception

Co-authored-by: wfs <wangfushun@cdqcp.cpm>

(cherry picked from commit 7a0a2c2)
@zhuangchong zhuangchong modified the milestones: 3.0.4, 3.1.9 Sep 11, 2023
@zhuangchong zhuangchong added the 3.1.x for 3.1.x version label Sep 11, 2023
@zhuangchong zhuangchong added the release cherry-pick Mark this issue/PR had cherry-pick for release version label Dec 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.1.x for 3.1.x version backend bug Something isn't working first time contributor First-time contributor release cherry-pick Mark this issue/PR had cherry-pick for release version
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants