-
Notifications
You must be signed in to change notification settings - Fork 14.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support serialization to Pydantic models in Internal API #30282
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
I think it would be great to add some unit tests though. |
Added support for BaseJob and unit tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
/cc @bolkedebruin who has been overhauling the serde code for 2.6 |
dc6478b
to
0de9630
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bolkedebruin -> any comments?
A small general comment as I know from offline discussions I know you have reservation to the way AIP-44 in general - even if it is approaved and way in-progress.
I think those changes we do nownicely plugs in SQLAlchemy models into our serialization with super-small cost and maintenance effort needed (as described in the original Pydantic PR #29776 - almost no overhead of code to maintain and nicely piggybacks on battle proven Pydantic - used by very serious projects - wih SQLALchemy and ORM model as the first-class citizen).
I think if we implement it now and follow through with Internal API implementation (for quite a while we will mark it as experimental, I am sure) and my bet is the work we are doing will allow to nicely decouple the DB from the actual code that is run in the worker. Even if we (much) later decide to implement another solution (workload based for example), those changes will not hold us back, but rather make it easier to understand which parts of the process we need to replace in order to make this happen.
I think our changes are specifically aimed in making minimal "surgical" changes to existing code to make also separation-of-concerns better, and then any future discussions might get easier once we complete it.
I think we are not ready to spend time on making more "Architectural" changes now. And I also think what we implement now will make it easier, not harder if we decide to them (for Airflow 3 for example).
I honestly think this is a good way to move it forward :).
78d349a
to
e9e7ed1
Compare
e9e7ed1
to
4f30904
Compare
Hey @bolkedebruin - any comments ? I think it nicely (and optionally) plugs in our serialization to add more capabilities we need in AIP-44 so if there will be no more feedback, I am planning to merge that one so that we can proceed. |
c62d53e
to
0d66567
Compare
I assume no news is a good news and merge this one once it passes (I rebased to apply fix to pytest collection timeout from today)., @uranusjr -> maybe alse we could start merging the decoupling: #30255 -> #30302 -> #30308 -> #30376 It starts to hold us back quite a lot and I would love to progress with AIP-44 implementation. |
Looks good. Merging. I decided also to move the Pydantic classes to serialization as suggested by @uranusjr as immediate follow-up. Any conflicts there will be super-easy to solve as those will be only imports - In my refactoring, I barely touch those classes now after removing |
Though - I see this PR has fallen victim of the "UI Rebase bug" from GitHub @mhenc can you please re-push your branch after manual rebase? |
If someone can tell me how to setup gmail filters so these notifications get somewhere visible I'd appreciate it :/. My comment would be why this is tied to the old serialization? I understand it is merged of course, because I think earlier (other PR related to this AIP) we agreed upon not extending the old serializer. |
Where should we add it ? - we can still move it (I have not followed the serialisation discussion earlier but I am eager to learn :). I would suggest to add "@bolkedebruin" in "contains word" and make sure it is flagged as important, not skip inbox etc. |
Thanks, I'll try that. In an earlier discussion @mhenc added some functionality to the old serializer on the basis of that rebasing on the current serializer, which supports versioning, is an order of magnitude faster, and is more flexible and maintainable, was too much impact. I agreed on the condition (i think explicitly) that we then wouldn't add more functionality to the old serializer. However, I don't consider this a small change anymore. So, I wonder what the reasoning was and maybe the suggestion is to at least provide the same functionality in the new serializer if we really need this now. |
What's the new vs. old serialization @bolkedebruin? Sorry I have not paid too much attention. I looked through the code now - do I understand correctly (correct me if I am wrong) that the change is to separate serializer "type"s into separate modules in the "serializer" package and we register them dynamically in That sounds very reasonable approach IMHO - and I like it a lot more than the if/elif approach in the BaseSerializer (if I correctly understand it). So @mhenc - is there a reason why we would not be able to do that? Because if that's the future direcrion of our serialization then I'd be in favour of doing so (unless there are some resons we would not like/or couldn't do it). I think we are anyhow heading now for 2.7 with AIP-44, so even if it means a bit more refactoring and testing, that would make more sense. |
I think we can remove that part for now since we are anyhow moving it behind a feature flag as 2.6.0 is going to happen before we manage to merge outstanding changes and complete testing #30509 #30510 |
Yes, I still plan to integrate with new serialization, this PR was just to unblock any work with AIP-44. |
Yeah. You can start with putting the change here behind _ENABLE_AIP_44 feature flag, so that we know this part is not used, until we replace it with the "new" serialization (and we can leave TODO: comment to remove it once we switch to the new one). |
Thx :-) |
Follow-up for #29776
With Pydantic types added to Airflow and changes to LocalTask Job in #30255 we can use serialize TaskInstance/DagRun/DagSet to Pydantic object to be used on client side.
closes: #30240