-
Notifications
You must be signed in to change notification settings - Fork 14.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove MSSQL support from Airflow core #35868
Remove MSSQL support from Airflow core #35868
Conversation
Should work after rebase |
Thanks @potiuk for the analysis and the hist to history. My main "mis-understanding" was that I did not know that there are multiple installs for the driver. Was wondering if this is a leftover (=garbage). But understood it is a "feature". I understand (now) that mssql is needed in the docker for the provider package, which is fail. reverted the last two (functional) commits, leaving the driver and the install options for the docker build scripts like before. |
…support-from-airflow-core
278baa1
to
f321c84
Compare
Yeah. It is not clear indeed :). As @Taragolis mentioned in slack, likely we should split the mssql_sh installation in two. Just to add a bit of context here. The Dockerfiles of ours (especially the production one) are built with "users can build their own custom, optimized builds files using them". There is the whole set of documentation and set of arguments users can use if they want to build their own, even more optimized image using our Dockerfile if they want. Currently we pre-built two images:
But the users can simply get our Dockerfile and build their own image using build-args https://airflow.apache.org/docs/docker-stack/build.html#customizing-the-image For now both MSSQL clients we install (pymssql and msodbc) are controlled by a single arg when you want to build custom image with or without it: https://airflow.apache.org/docs/docker-stack/build-arg-ref.html#image-optimization-options
But likely we should split that into two variables. Another small comment/context. For development purpose you can very easily build your own, custom PROD image using these. Without even looking at the variable / build image docs. The |
@@ -139,11 +139,11 @@ def upgrade(): | |||
type_=mysql.TIMESTAMP(fsp=6), | |||
) | |||
else: | |||
# sqlite and mssql datetime are fine as is. Therefore, not converting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not really sure if we want to remove mssql in all the migrations from the past.
Maybe yes, Maybe no. I have arguments for both.
The arguments for "removing MSSQL From migrations":
I am leaning towards removing them (so basically what this PR does), but I am a bit torn on that one. On one hand I would love to constraint and limit the "mssql" users to only use Airflow 2.7.3 and never, ever use airflow >= 2.8.0 for anything MSSQL. On the other hand it feels a little weird. I'd love other's opinion - especially those who worked on the migrations in the past: @ephraimbuddy @dstandish @ashb @kaxil @eladkal
The argument for "leaving MSSQL in migrations":
It does not really looks like it is necessary to remove them and does not cost us much. It's a bit sunken cost already and we are not going to maintain those migrations anyway. <aybe at some point in time we might want to merge them somehow and get a new "starting point" migration in the future, but I am not sure what is the value in removing mssql from the old migrations.
It feels a bit weird to change the past migrations that are already marked as "I run this migration already" by alembic. I know it SHOULD NOT matter when you have Postgres/MySQL, but it feels a bit wrong to modify the migration scripts, especially that it does not hurt us to leave them there (we generally do not touch/maintain migrations too often).
I see two potential prolems with removing the migrations for mssql from those migrations:
-
there is a risk we remove something here accidentally by missing it in review - there are a lot of files to be reviewed and lots of conditions, there is a slight risk we will miss it. But with careful review we can mitigate it.
-
there is a side-effect on --from-ref/--to-ref, --from-version/--to-version. Since we are still supporting MSSQL for airlfow < 2.8.0, it would be nice if Airflow 2.8+ still supports migrations for older versions of airflow even for MSSQL.
I think (but I might be wrong) - just leaving the mssql in migration code will just allow it to be run and it SHOULD work in general. So users will still be able - for example - to run airflow db migrate --to-version 2.7.3
using Airflow 2.8, 2.9 etc. There will be no need to run those migration with Airflow 2.7.3
strictly.
My current thinking
But I am bit torn on that one. It also relates to the discussion we had on #35861 and what we want to tell our users who run MSSQL now.
So far my points there at least were:
- let's constraint the set of tool and environment as much as we can to reduce variability of environment
- let's provide a tool/script that they can use - but also modify if needed, so that it's not part of "airflow" itself
So if we follow and agree that line of thought - then yes, removing MSSQL from Airflow 2.8.0 and not allowing the MSSQL users to run --to-version 2.7.3
feels about right. On the other hand - if we allow for variability, and letting the users to do the migrations in various ways, it's ok to leave it in.
I wonder what you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
100% on your argument line - and I really had the same thoughts, just did not write it like this.
Mainly I started removing migrations when I cleaned (future) dead code in airflow/migrations/utils.py and then saw migration scripts failing because common functions missing. Migration code is not 100% separated.
As we are shortly before the release and we need more days in discussing/debating about migrations I'd also be OK to split the PR in two. (with a minimum risk that I need to leave some migration dependency dead code behind).
And still if somebody want to play with MSSQL there is the 2.7.3 release available, from there still you could up-and downgrade. Leaving traces behind if 2.8.0 does not support MSSQL anyway is just some leftover cleanup. And I would not like to keep all MSSQL dependencies in the future (2.9++?) just because in ancient history we supported MSSQL - rather clean it soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
waiting to what others say. I think we do not have to rush it. Spoke to @ephraimbuddy today and RC is planned 11th of December. I think we have quite some time to discuss, decide, iterate and merge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Main risk I see is this PR is quire large, if we merge last-minute, there is no buffer to see if we had a unforeseen side-effect. But I am okay if you see it relaxed :-D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. The problem with size here is that it's "all" or "nothing" in case of migrations (and those are the big ones) . All the rest is "2 minutes" review :).
So yeah, agree we should - I think - quickly - get to conclusions on how we should do it and start reviewing/merging what we agreed about.
CC: everyone else who migh have an opinion :) . Maybe also worth to send it to devlist to raise "importance" of it - might be in the response of the vote/consensus we had about MSSQL removal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the other hand it feels a little weird
Does it? even before we started experimental support we accepted changes to migration scripts to better support MsSQL. It doesn't cost us anything other than few min of review.
i think we should keep it and also accept PRs improving/fixing issues related to MsSQL.
The removal of MsSQL support from our side just means that we do not test Airflow deployment against MsSQL backend. It doesn't mean that it won't work it just mean that we don't guarantee it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. It was a "little" weird. But I could go either way.
The removal of MsSQL support from our side just means that we do not test Airflow deployment against MsSQL backend. It doesn't mean that it won't work it just mean that we don't guarantee it.
I think we need a a bit more than that.
There are two potential scenarios for our MSSQL current users:
- Scenario 1 - we do not hard/stop mssql users for migration and it will maybe work and maybe not
If we leave it in "maybe it will work" then our users will continue using airflow until it breaks and willl come to us when it will stop (inevitably) - the whole point is that the "mssql-specific" code in Airflow slows us down, so getting rid of it is a good idea and also it's likely in a few months time, it will catastrophically break MSSQL installation - say somoene already moved to 2.9.0 - still using MsSQL and then it will break in 2.10.0. Now. they have no way to go back (becuase migration scripts will also break for example) and we have a user who has broken installation that they can do nothing about.
And they will complain to us, will take our time, will cry for help and workarounds regardless what is the status of MSSQL - and most of all they will loose the trust in Airflow as a good solution for them because it won't work for them for possibly days when they will try to recover somehow. And this all will be happening 2 years from now when nobody here will even remember what kind of changes they were there implmented and likely many of people who are now in Airflow won't even be around.
And this is actually one of the reasons we want to remove MSSQL - we do not WANT to get issues about MSSQL for 2.8+
Is it their fault? Technically yes - they have not read the release notes (I have no stats but there is quite a number of people who don't).
- Scenario 2 - we forbid MSSQL and remove MSSQL code straight away
We can explicitly forbid migration to 2.8.0 for MSSQL users. that will be much cleaner solution. We will provide them the migration script on 2.7.3 and how to migrate to other database. The users will see it immediatley when they will attempt to migrate - and they will have two choices: a) stay with Airflow 2.7.3 longer b) migrate to other DB.
I think this is far more plausible scenario - both for our users and maintainers (including future ones) of Airflow. The cut is clean, and we know already that (some) people do not read release notes so basically - if we allow MSSQL users to migrate to 2.8.0, then it's all but guaranteed that some of our users will actually get into troubles sooner or later. We are basically accepting the fact that some percentage of our users, rather than stopped from migration - will migrate and eventually get in the troubles.
So I'd personally be for even saying:
if mssql => raise Error ("Migrate to Postgres or MySQL").
I don't know if this "breaking" is really a problem, users still could follow building the dependencies on their own. It is breaking apart a non-promised dependency (I think we never promised which dependencies we ship per default, was just a dependency because we had it in code). So in my eyes making a note is OK. Else also for other python depts we would never be able to remove any. And as one other MSSQL client driver is there, it would mainly (in 90% of cases) just need to switch the connection URL (if not custom python code)? |
…support-from-airflow-core
I'm not confortable having this in 2.8.0. Let's leave it for 2.9.0 because the change is big and close to the release time. |
I understand the change is large, this is a reason. Raised this a bit late, unfortunately review was stale for a number of days :-( Anyway, admins should have been warned already, there was a post in the docs already, see: https://airflow.apache.org/docs/apache-airflow/stable/howto/set-up-database.html#setting-up-a-mssql-database Anyway... so let put this PR on pause and merge it for 2.9.0... |
Well. I think we have all the building-blocks in place. Yes it might meen few people doing some out-of-hours stuff - but for a good rason. I think if we miss the window of opportunity with announcements and "potentially breaking" changes, then we won't be abe to do much (due to cherry-pickability of some fixes). We have the script to migrate, we can put it separately (few minutes tomorrow mid-day) - and we are done. IMHO - the risk is small. |
…support-from-airflow-core
…support-from-airflow-core
…support-from-airflow-core
…support-from-airflow-core
Hey @jscheffl -> do you need my help / actions for that one? I am happy to create the separate repo, if needed and we could already arrange the PR in the way to be "ready to marge" when we start thinking about 2.9.0 - we want to get it all ready and prepared quite some time before 2.9.0 so that we do not miss another window of opportunity for MSSQL removal :). |
@potiuk Yes, before re-basing another 10 times, I'd like to get this off the plate and merged. So your advise and prep help might help:
|
I created https://github.com/apache/airflow-mssql-migration . In about 5 minutes permissions should be set there and you should be able to initiate it an push the migration tooling there (even directly without PR/Approval). We can setup branch protection later when it settles down, for now it's ok to just push the changes there (but if you prefer to open PRs it's also fine - happy to review them).
Yeah. I would rather do it in reverse:
Yeah. See above, I think with just dropping MSSQL the chance of breaking cherry-picking is rather low (and in that case we would likely not cherry-pick it anyway - as I see it only happening for new features). But I'd love to hear what others say as well (especially @ephraimbuddy ). |
Closing this PR as the attempt is now super-seeded with PRs:
Leaving the cleanup of migration scripts "open for now" |
PR to remove MSSQL support from Airflow - as discussed and voted in https://lists.apache.org/thread/pgcgmhf6560k8jbsmz8nlyoxosvltph2
This PR removes MSSQL from:
It does NOT remove the provider package, so no mis-understanding: Apache Airflow further will "work" with MSSQL and the provider package and features are un-touched. All Operators keep in place. Just the metadata support is reduced to sqlite (dev only), Postgres, MySQL.
Note that for this also the build of MSSQL client code is kept inside the Docker as Provider packages rely on this availability.
Preconditions before merge: