Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix test_handle_stuck_queued_tasks_multiple_attempts #44093

Conversation

gopidesupavan
Copy link
Member

@gopidesupavan gopidesupavan commented Nov 16, 2024

Tests are failing due to side effects.
https://github.com/apache/airflow/actions/runs/11868349296/job/33077514082?pr=44087#step:7:2550

The query is pulling all the log events, Log table might have previous test results stored. so updated now to pull with runid


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@boring-cyborg boring-cyborg bot added the area:Scheduler including HA (high availability) scheduler label Nov 16, 2024
@gopidesupavan gopidesupavan merged commit 16bc382 into apache:main Nov 16, 2024
45 checks passed
@gopidesupavan gopidesupavan deleted the fix-test-queued-tasks-multiple-attempts branch November 16, 2024 09:36
@dstandish
Copy link
Contributor

ah, oops, thanks

jscheffl pushed a commit to jscheffl/airflow that referenced this pull request Nov 18, 2024
kandharvishnu pushed a commit to kandharvishnu/airflow that referenced this pull request Nov 19, 2024
@dstandish dstandish added this to the Airflow 2.10.4 milestone Nov 19, 2024
jscheffl added a commit that referenced this pull request Nov 19, 2024
…44158)

* [v2-10-test] Re-queue tassk when they are stuck in queued (#43520)

The old "stuck in queued" logic just failed the tasks.  Now we requeue them.  We accomplish this by revoking the task from executor and setting state to scheduled.  We'll re-queue it up to 2 times.  Number of times is configurable by hidden config.

We added a method to base executor revoke_task because, it's a discrete operation that is required for this feature, and it might be useful in other cases e.g. when detecting as zombies etc.  We set state to failed or scheduled directly from scheduler (rather than sending through the event buffer) because event buffer makes more sense for handling external events -- why round trip through the executor and back to scheduler when scheduler is initiating the action?  Anyway this avoids having to deal with "state mismatch" issues when processing events.

---------

(cherry picked from commit a41feeb)

Co-authored-by: Daniel Imberman <daniel.imberman@gmail.com>
Co-authored-by: Daniel Standish <15932138+dstandish@users.noreply.github.com>
Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com>

* fix test_handle_stuck_queued_tasks_multiple_attempts (#44093)

---------

Co-authored-by: Daniel Imberman <daniel.imberman@gmail.com>
Co-authored-by: Daniel Standish <15932138+dstandish@users.noreply.github.com>
Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com>
Co-authored-by: GPK <gopidesupavan@gmail.com>
utkarsharma2 pushed a commit that referenced this pull request Dec 4, 2024
…44158)

* [v2-10-test] Re-queue tassk when they are stuck in queued (#43520)

The old "stuck in queued" logic just failed the tasks.  Now we requeue them.  We accomplish this by revoking the task from executor and setting state to scheduled.  We'll re-queue it up to 2 times.  Number of times is configurable by hidden config.

We added a method to base executor revoke_task because, it's a discrete operation that is required for this feature, and it might be useful in other cases e.g. when detecting as zombies etc.  We set state to failed or scheduled directly from scheduler (rather than sending through the event buffer) because event buffer makes more sense for handling external events -- why round trip through the executor and back to scheduler when scheduler is initiating the action?  Anyway this avoids having to deal with "state mismatch" issues when processing events.

---------

(cherry picked from commit a41feeb)

Co-authored-by: Daniel Imberman <daniel.imberman@gmail.com>
Co-authored-by: Daniel Standish <15932138+dstandish@users.noreply.github.com>
Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com>

* fix test_handle_stuck_queued_tasks_multiple_attempts (#44093)

---------

Co-authored-by: Daniel Imberman <daniel.imberman@gmail.com>
Co-authored-by: Daniel Standish <15932138+dstandish@users.noreply.github.com>
Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com>
Co-authored-by: GPK <gopidesupavan@gmail.com>
utkarsharma2 pushed a commit that referenced this pull request Dec 9, 2024
…44158)

* [v2-10-test] Re-queue tassk when they are stuck in queued (#43520)

The old "stuck in queued" logic just failed the tasks.  Now we requeue them.  We accomplish this by revoking the task from executor and setting state to scheduled.  We'll re-queue it up to 2 times.  Number of times is configurable by hidden config.

We added a method to base executor revoke_task because, it's a discrete operation that is required for this feature, and it might be useful in other cases e.g. when detecting as zombies etc.  We set state to failed or scheduled directly from scheduler (rather than sending through the event buffer) because event buffer makes more sense for handling external events -- why round trip through the executor and back to scheduler when scheduler is initiating the action?  Anyway this avoids having to deal with "state mismatch" issues when processing events.

---------

(cherry picked from commit a41feeb)

Co-authored-by: Daniel Imberman <daniel.imberman@gmail.com>
Co-authored-by: Daniel Standish <15932138+dstandish@users.noreply.github.com>
Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com>

* fix test_handle_stuck_queued_tasks_multiple_attempts (#44093)

---------

Co-authored-by: Daniel Imberman <daniel.imberman@gmail.com>
Co-authored-by: Daniel Standish <15932138+dstandish@users.noreply.github.com>
Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com>
Co-authored-by: GPK <gopidesupavan@gmail.com>
LefterisXefteris pushed a commit to LefterisXefteris/airflow that referenced this pull request Jan 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:Scheduler including HA (high availability) scheduler
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants