Faster db migration to Airflow 2.2 #19166

Merged
jedcunningham merged 9 commits into apache:main from the faster_2.2.0_migrations branch on Oct 24, 2021

jedcunningham
Member

@jedcunningham jedcunningham commented Oct 22, 2021

Migrating a large Airflow database can take a long time, particularly
if it contains a lot of task instances. Dropping indexes before setting
`run_id` from `dag_run` allows the update to complete faster.

With my test database containing 2.28 million task instances, this resulted in
a significantly faster migration (~145s to ~50s).
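
For readers unfamiliar with the technique, here is a minimal sketch of "drop the index, run the bulk update, recreate the index". It is not the migration code from this PR: the two-table schema, the index name `ti_dag_date`, and the helpers `make_demo_db` and `backfill_run_id` are all assumptions made for illustration, and it uses an in-memory SQLite database only so that it runs without any setup.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build a tiny, hypothetical stand-in for the dag_run / task_instance tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dag_run (dag_id TEXT, execution_date TEXT, run_id TEXT);
        CREATE TABLE task_instance (
            task_id TEXT, dag_id TEXT, execution_date TEXT, run_id TEXT
        );
        CREATE INDEX ti_dag_date ON task_instance (dag_id, execution_date);
        INSERT INTO dag_run VALUES ('d1', '2021-10-01', 'scheduled__2021-10-01');
        INSERT INTO dag_run VALUES ('d1', '2021-10-02', 'scheduled__2021-10-02');
        INSERT INTO task_instance VALUES ('t1', 'd1', '2021-10-01', NULL);
        INSERT INTO task_instance VALUES ('t2', 'd1', '2021-10-02', NULL);
        """
    )
    return conn


def backfill_run_id(conn: sqlite3.Connection) -> None:
    """Copy run_id from dag_run into task_instance with the index dropped."""
    cur = conn.cursor()
    # Drop the secondary index first so the bulk UPDATE has less index
    # maintenance to do (index name is an assumption for this sketch).
    cur.execute("DROP INDEX IF EXISTS ti_dag_date")
    # Set run_id on every task instance from its matching dag_run row.
    cur.execute(
        """
        UPDATE task_instance
        SET run_id = (
            SELECT dag_run.run_id
            FROM dag_run
            WHERE dag_run.dag_id = task_instance.dag_id
              AND dag_run.execution_date = task_instance.execution_date
        )
        """
    )
    # Rebuild the index once, after all rows have been updated.
    cur.execute(
        "CREATE INDEX ti_dag_date ON task_instance (dag_id, execution_date)"
    )
    conn.commit()


if __name__ == "__main__":
    demo = make_demo_db()
    backfill_run_id(demo)
    for row in demo.execute(
        "SELECT task_id, run_id FROM task_instance ORDER BY task_id"
    ):
        print(row)
```

How much time this saves depends on the database engine and on which indexes exist, so the timings quoted above should not be expected from this toy example.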

@jedcunningham jedcunningham added this to the Airflow 2.2.1 milestone Oct 22, 2021
@jedcunningham jedcunningham marked this pull request as draft October 22, 2021 18:18
@jedcunningham jedcunningham force-pushed the faster_2.2.0_migrations branch from 1daa77a to 4834fb8 Compare October 22, 2021 19:04
@jedcunningham jedcunningham reopened this Oct 22, 2021
@jedcunningham jedcunningham force-pushed the faster_2.2.0_migrations branch from 4834fb8 to 9b3f4cc Compare October 22, 2021 19:19
@jedcunningham jedcunningham marked this pull request as ready for review October 22, 2021 21:44
@github-actions github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label Oct 22, 2021
@github-actions

The PR most likely needs to run the full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly, please rebase it onto the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

Member

@potiuk potiuk left a comment

Nice!

@kaxil
Member

kaxil commented Oct 23, 2021

Well done 👏

Looks like all SQLite tests are failing. I haven't looked at the failures, but you might have to special-case SQLite in some instances.

@potiuk
Member

potiuk commented Oct 24, 2021

Needs rebase?

@jedcunningham jedcunningham force-pushed the faster_2.2.0_migrations branch from 9c316e1 to 65e9d39 Compare October 24, 2021 12:44
@jedcunningham
Member Author

> Needs rebase?

Doesn't hurt. Images weren't built quickly enough, but they were for this run thankfully.

Member

@potiuk potiuk left a comment

Cool! I also see a few flaky tests fixed by adding sorting :). 👍

@jedcunningham jedcunningham merged commit 559d607 into apache:main Oct 24, 2021
@jedcunningham jedcunningham deleted the faster_2.2.0_migrations branch October 24, 2021 14:45
jedcunningham added a commit that referenced this pull request Oct 24, 2021
Migrating a large Airflow database can take a long time, particularly
if it contains a lot of task instances. On PostgreSQL, creating a
new table is much faster than updating the existing table.

(cherry picked from commit 559d607)
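
The commit message above describes building a new table instead of updating the existing one in place. A minimal sketch of that general pattern follows. It is not the code from this commit: the schema and the names `make_demo_tables`, `rebuild_task_instance`, and `new_task_instance` are assumptions for illustration, and it runs on in-memory SQLite purely for convenience, even though the commit message is about PostgreSQL.

```python
import sqlite3


def make_demo_tables() -> sqlite3.Connection:
    """Create a tiny, hypothetical dag_run / task_instance pair."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dag_run (dag_id TEXT, execution_date TEXT, run_id TEXT);
        CREATE TABLE task_instance (task_id TEXT, dag_id TEXT, execution_date TEXT);
        INSERT INTO dag_run VALUES ('d1', '2021-10-01', 'scheduled__2021-10-01');
        INSERT INTO task_instance VALUES ('t1', 'd1', '2021-10-01');
        INSERT INTO task_instance VALUES ('t2', 'd1', '2021-10-01');
        """
    )
    return conn


def rebuild_task_instance(conn: sqlite3.Connection) -> None:
    """Replace task_instance with a copy that already carries run_id."""
    cur = conn.cursor()
    # Write every row once into a fresh table, joining in run_id as we go.
    # Note: an inner join drops task instances that have no matching dag_run.
    cur.execute(
        """
        CREATE TABLE new_task_instance AS
        SELECT ti.task_id AS task_id,
               ti.dag_id  AS dag_id,
               dr.run_id  AS run_id
        FROM task_instance AS ti
        JOIN dag_run AS dr
          ON dr.dag_id = ti.dag_id
         AND dr.execution_date = ti.execution_date
        """
    )
    # Swap the new table into place under the original name.
    cur.execute("DROP TABLE task_instance")
    cur.execute("ALTER TABLE new_task_instance RENAME TO task_instance")
    conn.commit()


if __name__ == "__main__":
    demo = make_demo_tables()
    rebuild_task_instance(demo)
    for row in demo.execute(
        "SELECT task_id, dag_id, run_id FROM task_instance ORDER BY task_id"
    ):
        print(row)
```

A table created this way does not inherit the original table's constraints or indexes, so a real migration would need to recreate them after the swap.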
jedcunningham added a commit to astronomer/airflow that referenced this pull request Oct 26, 2021
sharon2719 pushed a commit to sharon2719/airflow that referenced this pull request Oct 27, 2021
jedcunningham added a commit to astronomer/airflow that referenced this pull request Oct 27, 2021
@jedcunningham jedcunningham added the type:bug-fix Changelog: Bug Fixes label Apr 7, 2022
Labels: full tests needed, type:bug-fix