Faster db migration to Airflow 2.2 #19166
The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.
Nice!
Well done 👏 Looks like all SQLite tests are failing. I haven't looked at the failures, but you might have to special-case SQLite in some instances.
Needs rebase?
Bigger Airflow databases can take a long time to migrate, particularly if they have a lot of task instances. Dropping indexes before setting `run_id` from `dag_run` allows the update to complete faster.
Doesn't hurt. Images weren't built quickly enough, but they were for this run thankfully.
Cool! I also see a few flaky tests fixed by adding sorting :). 👍
Bigger Airflow databases can take a long time to migrate, particularly if they have a lot of task instances. On PostgreSQL, creating a new table is much faster than updating the existing table. (cherry picked from commit 559d607)
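For readers unfamiliar with the "build a new table instead of updating in place" approach described in that commit message, here is a minimal sketch of the idea. It is not the migration code from this PR: it uses the standard-library `sqlite3` module purely as a stand-in so it runs without a server (the commit message is about PostgreSQL), the table layouts are heavily simplified, and the names `task_instance_new`, `make_demo_db`, and `rebuild_with_run_id` are hypothetical.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build a tiny in-memory database with simplified, assumed table layouts."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dag_run (dag_id TEXT, execution_date TEXT, run_id TEXT);
        CREATE TABLE task_instance (
            task_id TEXT, dag_id TEXT, execution_date TEXT, state TEXT
        );
        INSERT INTO dag_run VALUES ('d1', '2021-10-01', 'scheduled__2021-10-01');
        INSERT INTO dag_run VALUES ('d1', '2021-10-02', 'scheduled__2021-10-02');
        INSERT INTO task_instance VALUES ('t1', 'd1', '2021-10-01', 'success');
        INSERT INTO task_instance VALUES ('t2', 'd1', '2021-10-02', 'failed');
        """
    )
    return conn


def rebuild_with_run_id(conn: sqlite3.Connection) -> None:
    """Write a new table that already contains run_id, then swap it in."""
    cur = conn.cursor()
    # One pass over the data: every row is written once with its final value,
    # instead of being rewritten by an UPDATE on the existing table.
    # Note: the inner join drops task instances that have no matching dag_run.
    cur.execute(
        """
        CREATE TABLE task_instance_new AS
        SELECT ti.task_id, ti.dag_id, ti.execution_date, ti.state, dr.run_id
        FROM task_instance AS ti
        JOIN dag_run AS dr
          ON dr.dag_id = ti.dag_id
         AND dr.execution_date = ti.execution_date
        """
    )
    # Replace the old table with the rebuilt one.
    cur.execute("DROP TABLE task_instance")
    cur.execute("ALTER TABLE task_instance_new RENAME TO task_instance")
    conn.commit()
```

Calling `rebuild_with_run_id(make_demo_db())` leaves a `task_instance` table whose rows carry the matching `run_id`. In a real migration, constraints and indexes would also have to be recreated on the new table, which this sketch leaves out.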
Bigger Airflow databases can take a long time to migrate, particularly if they have a lot of task instances. Dropping indexes before setting `run_id` from `dag_run` allows the update to complete faster.
With my test database containing 2.28 million task instances, this resulted in
a significantly faster migration (~145s to ~50s).
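
As a rough illustration of the drop-indexes-then-update pattern described above, here is a minimal sketch. It is not the actual migration from this PR: it uses the standard-library `sqlite3` module as a stand-in database so it can run anywhere, the schema is simplified, and the index name `idx_ti_dag_state` and the functions `build_demo_db` and `backfill_run_id` are hypothetical.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Build a tiny in-memory database with simplified, assumed table layouts."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dag_run (dag_id TEXT, execution_date TEXT, run_id TEXT);
        CREATE TABLE task_instance (
            task_id TEXT, dag_id TEXT, execution_date TEXT, state TEXT, run_id TEXT
        );
        CREATE INDEX idx_ti_dag_state ON task_instance (dag_id, state);
        INSERT INTO dag_run VALUES ('d1', '2021-10-01', 'scheduled__2021-10-01');
        INSERT INTO dag_run VALUES ('d1', '2021-10-02', 'scheduled__2021-10-02');
        INSERT INTO task_instance VALUES ('t1', 'd1', '2021-10-01', 'success', NULL);
        INSERT INTO task_instance VALUES ('t2', 'd1', '2021-10-02', 'failed', NULL);
        """
    )
    return conn


def backfill_run_id(conn: sqlite3.Connection) -> None:
    """Set task_instance.run_id from dag_run while the index is dropped."""
    cur = conn.cursor()
    # Drop the secondary index first so the bulk UPDATE does not have to
    # maintain it for every rewritten row.
    cur.execute("DROP INDEX IF EXISTS idx_ti_dag_state")
    # Copy run_id from the matching dag_run row.
    cur.execute(
        """
        UPDATE task_instance
        SET run_id = (
            SELECT dr.run_id
            FROM dag_run AS dr
            WHERE dr.dag_id = task_instance.dag_id
              AND dr.execution_date = task_instance.execution_date
        )
        """
    )
    # Rebuild the index once, after all rows have their final values.
    cur.execute("CREATE INDEX idx_ti_dag_state ON task_instance (dag_id, state)")
    conn.commit()
```

Running `backfill_run_id(build_demo_db())` fills in `run_id` for both demo rows and leaves the index in place afterwards. Any speedup depends on the database engine and data size; the timings quoted above come from the author's PostgreSQL test database, not from this sketch.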