-
Notifications
You must be signed in to change notification settings - Fork 14.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AIRFLOW-2921][AIRFLOW-2922] Fix two potential bugs in CeleryExecutor() #3773
Conversation
Codecov Report
@@ Coverage Diff @@
## master #3773 +/- ##
==========================================
- Coverage 77.65% 77.64% -0.01%
==========================================
Files 204 204
Lines 15841 15841
==========================================
- Hits 12301 12300 -1
- Misses 3540 3541 +1
Continue to review full report at Codecov.
|
Will need tests to cover this please. |
Hi @ashb , sure I will. May I confirm whether the logic/reasoning is making sense to you? |
Bug-1: if a task state becomes either SUCCESS or FAILURE or REVOKED, it will be removed from self.tasks() and self.last_state(). However, because line 108 is not indented properly, this task will be added back to self.last_state() again. Bug-2: When the state is updated, it's referring to the latest state `task.state` rather than variable `state`. This may result in dead-lock if the state changed from `STARTED` to `SUCCESS` after the if-elif-else block started. Test case is updated for fix to bug-1.
Hi @ashb,
|
Hi @yrqls21 , would you like to give your review on this for fun? ;-) Especially the |
@@ -54,6 +54,8 @@ def test_celery_integration(self): | |||
self.assertNotIn('success', executor.tasks) | |||
self.assertNotIn('fail', executor.tasks) | |||
|
|||
self.assertNotIn('success', executor.last_state) | |||
self.assertNotIn('fail', executor.last_state) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test case change is for potential bug-1
only.
self.log.info("Unexpected state: %s", task.state) | ||
self.last_state[key] = task.state | ||
self.log.info("Unexpected state: %s", state) | ||
self.last_state[key] = state |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
The indentation change is to fix
potential bug-1
. -
Changing from
task.state
tostate
in both lines 107 and 108 are to fixpotential bug-2
Great catch, we have actually a patch in Airbnb that covered the 2nd bug( involves more changes and we're baking it in our env). I agree on the fix for the 1st bug and its test. Also agree that it is very hard to have test to guard the 2nd bug. Tyvm for the fixes, LGTM. 🎉 |
Thanks @yrqls21 ! Luckily Big |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @XD-DENG
Thanks @Fokko |
Does this mean that CeleryExecutor is totally broken in 1.10.0, and we need to get a 1.10.1 out quickly? |
Hi @ashb , to my understanding:
Personally I don’t think it’s that urgent. |
@ashb I don't think they are so severe that we need to call celery executor broken. The 1st bug practically won't create problem and the 2nd bug creates a rare edge case that one task will get stuck in a deadlock, and can be corrected by doing something like clear the task instance |
K. Just getting a sense of urgency. We might have a 1.10.1 anyway, and if we do I'll pull this in to it. |
Thanks both @yrqls21 @ashb |
Bug-1: if a task state becomes either SUCCESS or FAILURE or REVOKED, it will be removed from self.tasks() and self.last_state(). However, because line 108 is not indented properly, this task will be added back to self.last_state() again. Bug-2: When the state is updated, it's referring to the latest state `task.state` rather than variable `state`. This may result in dead-lock if the state changed from `STARTED` to `SUCCESS` after the if-elif-else block started. Test case is updated for fix to bug-1.
Bug-1: if a task state becomes either SUCCESS or FAILURE or REVOKED, it will be removed from self.tasks() and self.last_state(). However, because line 108 is not indented properly, this task will be added back to self.last_state() again. Bug-2: When the state is updated, it's referring to the latest state `task.state` rather than variable `state`. This may result in dead-lock if the state changed from `STARTED` to `SUCCESS` after the if-elif-else block started. Test case is updated for fix to bug-1.
Bug-1: if a task state becomes either SUCCESS or FAILURE or REVOKED, it will be removed from self.tasks() and self.last_state(). However, because line 108 is not indented properly, this task will be added back to self.last_state() again. Bug-2: When the state is updated, it's referring to the latest state `task.state` rather than variable `state`. This may result in dead-lock if the state changed from `STARTED` to `SUCCESS` after the if-elif-else block started. Test case is updated for fix to bug-1.
Bug-1: if a task state becomes either SUCCESS or FAILURE or REVOKED, it will be removed from self.tasks() and self.last_state(). However, because line 108 is not indented properly, this task will be added back to self.last_state() again. Bug-2: When the state is updated, it's referring to the latest state `task.state` rather than variable `state`. This may result in dead-lock if the state changed from `STARTED` to `SUCCESS` after the if-elif-else block started. Test case is updated for fix to bug-1.
Make sure you have checked all steps below.
Jira
Description
Potential bug-1
If a task state becomes either
SUCCESS
orFAILURE
orREVOKED
, it will be removed fromself.tasks
andself.last_state
. However, because line 108 is not indented properly, this task will be added back toself.last_state
again.This will not lead to any error. But may still be worth changing.
Solution: indent line 108.
Potential bug-2
Celery Task states normally change in ways like “PENDING -> STARTED -> SUCCESS/FAILURE” (http://docs.celeryproject.org/en/latest/reference/celery.states.html).
In lines 107 and 108, it’s
task.state
rather thanstate
, i.e. it will reflect the latest real-time state of the Celery task.Let’s imagine: task state becomes STARTED first (initial state is PENDING), then the
if-elif-else
block will be triggered (line 93). It’s possible the Celery task state becomes SUCCESS when whichever line between 94-105 is running. At line 108, the latest state of the task will be changed to SUCCESS rather than STARTED inself.last_state
because it’s referring to the latest statetask.state
rather than variablestate
.Then this task will be dead-locked because the
if-elif-else block
will never be triggered for it again. It has no chance to be ended properly with methodsself.success()
orself.fail()
. Although the chance that this happens is quite low (the time window is very short), the chance is not zero.Solution: change
task.state
tostate
in lines 107 and 108.Tests
Commits
Documentation
Code Quality
git diff upstream/master -u -- "*.py" | flake8 --diff