Row lock TI query in SchedulerJob._process_executor_events #18975

Merged
ephraimbuddy merged 3 commits into apache:main from the fix-deadlock branch on Oct 14, 2021

Conversation

ephraimbuddy
Contributor

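Running multiple schedulers can cause a database deadlock in SchedulerJob._process_executor_events when the schedulers update the same task_instance rows concurrently.
This PR fixes it by row-locking the task instance (TI) query in that method.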
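
As a rough illustration of what "row-locking a query" means at the SQL level, here is a minimal SQLAlchemy sketch. It is not the diff from this PR: the cut-down task_instance table, the dag_id filter value, and the choice of skip_locked=True are assumptions made for the example, and Airflow's own helpers are not shown. It only compiles the statement for PostgreSQL so the locking clause is visible.

```python
from sqlalchemy import Column, MetaData, String, Table
from sqlalchemy.dialects import postgresql

metadata = MetaData()

# Hypothetical, cut-down stand-in for Airflow's task_instance table; only the
# columns that appear in the failing UPDATE statement are modelled.
task_instance = Table(
    "task_instance",
    metadata,
    Column("task_id", String, primary_key=True),
    Column("dag_id", String, primary_key=True),
    Column("run_id", String, primary_key=True),
    Column("external_executor_id", String),
)

# Select the rows a scheduler intends to update and lock them in the same
# statement. skip_locked=True is an assumption for this sketch: rows already
# locked by another transaction are skipped instead of waited on.
locked_select = (
    task_instance.select()
    .where(task_instance.c.dag_id == "airflow_bug")
    .with_for_update(skip_locked=True)
)

# Compile against the PostgreSQL dialect to inspect the generated SQL.
sql = str(locked_select.compile(dialect=postgresql.dialect()))
print(sql)
```

With the rows locked up front, a second scheduler running the same query either waits for the lock or, with SKIP LOCKED, moves on to other rows, so the later UPDATE statements no longer wait on each other in a cycle.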

The error:

[2021-10-14 08:37:18,377: INFO/ForkPoolWorker-1] Celery task ID: 6e8e34d/persistence.py", line 995, in _emit_update_statements
    statement, multiparams
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1011, in execute
    return meth(self, multiparams, params)
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1130, in _execute_clauseelement
    distilled_params,
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1317, in _execute_context
    e, statement, parameters, cursor, context
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1511, in _handle_dbapi_exception
    sqlalchemy_exception, with_traceback=exc_info[2], from_=e
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
    raise exception
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1257, in _execute_context
    cursor, statement, parameters, context
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/dialects/postgresql/psycopg2.py", line 912, in do_executemany
    cursor.executemany(statement, parameters)
sqlalchemy.exc.OperationalError: (psycopg2.errors.DeadlockDetected) deadlock detected
DETAIL:  Process 35678 waits for ShareLock on transaction 150292; blocked by process 58800.
Process 58800 waits for ShareLock on transaction 150291; blocked by process 58799.
Process 58799 waits for ShareLock on transaction 150289; blocked by process 35678.
HINT:  See server log for query details.
CONTEXT:  while updating tuple (11,54) in relation "task_instance"

[SQL: UPDATE task_instance SET external_executor_id=%(external_executor_id)s WHERE task_instance.task_id = %(task_instance_task_id)s AND task_instance.dag_id = %(task_instance_dag_id)s AND task_instance.run_id = %(task_instance_run_id)s]
[parameters: ({'external_executor_id': 'fe1c200a-ca4a-49b2-b5f9-ea53d5ed0846', 'task_instance_task_id': 'test_sensor_15', 'task_instance_dag_id': 'airflow_bug', 'task_instance_run_id': 'manual__2021-10-14T08:17:18.387618+00:00'}, {'external_executor_id': '1ee3e79a-ceab-4215-94cb-100d8de73bad', 'task_instance_task_id': 'test_sensor_22', 'task_instance_dag_id': 'airflow_bug', 'task_instance_run_id': 'manual__2021-10-14T08:17:18.387618+00:00'}, {'external_executor_id': '8873162e-519f-4ef2-890c-d38754760c81', 'task_instance_task_id': 'test_sensor_24', 'task_instance_dag_id': 'airflow_bug', 'task_instance_run_id': 'manual__2021-10-14T08:17:18.387618+00:00'}, {'external_executor_id': '52d05186-5297-4079-b94a-132bef845673', 'task_instance_task_id': 'test_sensor_27', 'task_instance_dag_id': 'airflow_bug', 'task_instance_run_id': 'manual__2021-10-14T08:17:18.387618+00:00'}, {'external_executor_id': '1e47d3a0-4398-4f07-9ebd-73e16ffa02ec', 'task_instance_task_id': 'test_sensor_28', 'task_instance_dag_id': 'airflow_bug', 'task_instance_run_id': 'manual__2021-10-14T08:17:18.387618+00:00'}, {'external_executor_id': '1c89c6e0-8cb6-437b-9ee5-56968bb993fc', 'task_instance_task_id': 'test_sensor_35', 'task_instance_dag_id': 'airflow_bug', 'task_instance_run_id': 'manual__2021-10-14T08:17:18.387618+00:00'})]
(Background on this error at: http://sqlalche.me/e/13/e3q8)
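
The DETAIL lines in the error describe a wait-for cycle between three backend processes. The following standard-library sketch parses those lines and walks the "blocked by" edges to make the cycle explicit; the helper names parse_wait_for and find_cycle are made up for this illustration.

```python
import re

# DETAIL lines copied from the PostgreSQL error above.
DETAIL = """\
Process 35678 waits for ShareLock on transaction 150292; blocked by process 58800.
Process 58800 waits for ShareLock on transaction 150291; blocked by process 58799.
Process 58799 waits for ShareLock on transaction 150289; blocked by process 35678.
"""

WAIT_RE = re.compile(
    r"Process (\d+) waits for \w+ on transaction \d+; blocked by process (\d+)\."
)


def parse_wait_for(detail):
    """Map each waiting process ID to the process ID that blocks it."""
    return {int(waiter): int(blocker) for waiter, blocker in WAIT_RE.findall(detail)}


def find_cycle(wait_for, start):
    """Follow blocked-by edges from start; return the cycle as a list, or None."""
    path = [start]
    current = start
    while current in wait_for:
        current = wait_for[current]
        if current == start:
            return path + [start]
        if current in path:
            return None  # a cycle exists, but it does not pass through start
        path.append(current)
    return None


wait_for = parse_wait_for(DETAIL)
cycle = find_cycle(wait_for, 35678)
print(" -> ".join(str(pid) for pid in cycle))
# prints: 35678 -> 58800 -> 58799 -> 35678
```

Each process holds a lock that the next one needs, so none of them can proceed, and PostgreSQL aborts one transaction to break the cycle.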

I tested the fix with 3 schedulers and it works.

How to reproduce:
Using 2 schedulers and the Celery executor, run this DAG:
dags.zip

After some time, one of the schedulers stops with errors like the one above.



@boring-cyborg boring-cyborg bot added the area:Scheduler including HA (high availability) scheduler label Oct 14, 2021
Member

@potiuk potiuk left a comment


Whoa! Good one!

@potiuk potiuk added this to the Airflow 2.2.1 milestone Oct 14, 2021
@github-actions github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label Oct 14, 2021
@github-actions

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

Using multiple schedulers causes Deadlock in _process_executor_events.
This PR fixes it.
ephraimbuddy and others added 2 commits October 14, 2021 13:01
@potiuk
Member

potiuk commented Oct 14, 2021

Looks green! BTW. We have LOT MORE GREEN nowadays after all those small flaky test fixes :D.

@ephraimbuddy
Contributor Author

> Looks green! BTW. We have LOT MORE GREEN nowadays after all those small flaky test fixes :D.

Yay!! 🎉🎉 Thanks for the good work 🙌🙌

@ephraimbuddy ephraimbuddy merged commit 52cc84c into apache:main Oct 14, 2021
@ephraimbuddy ephraimbuddy deleted the fix-deadlock branch October 14, 2021 15:09
jedcunningham pushed a commit that referenced this pull request Oct 14, 2021
Using multiple schedulers causes Deadlock in _process_executor_events.
This PR fixes it.

Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>
(cherry picked from commit 52cc84c)
jedcunningham pushed a commit to astronomer/airflow that referenced this pull request Oct 26, 2021
(cherry picked from commit 52cc84c)
jedcunningham pushed a commit to astronomer/airflow that referenced this pull request Oct 27, 2021
(cherry picked from commit 52cc84c)
jedcunningham pushed a commit to astronomer/airflow that referenced this pull request Jan 13, 2022
(cherry picked from commit 52cc84c)
jedcunningham pushed a commit to astronomer/airflow that referenced this pull request Jan 13, 2022
(cherry picked from commit 52cc84c)
(cherry picked from commit d161134)
jedcunningham pushed a commit to astronomer/airflow that referenced this pull request Jan 13, 2022
(cherry picked from commit 52cc84c)
(cherry picked from commit d161134)
jedcunningham pushed a commit to astronomer/airflow that referenced this pull request Jan 13, 2022
(cherry picked from commit 52cc84c)
(cherry picked from commit d161134)
(cherry picked from commit b109fe4)
@jedcunningham jedcunningham added the type:bug-fix Changelog: Bug Fixes label Apr 7, 2022