Support serialization to Pydantic models in Internal API #30282

Merged (2 commits) on Apr 5, 2023

mhenc
Contributor

mhenc commented Mar 24, 2023

Follow-up for #29776

With Pydantic types added to Airflow and the changes to LocalTaskJob in #30255, we can serialize TaskInstance/DagRun/Dataset objects to Pydantic objects for use on the client side.

closes: #30240
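To illustrate the idea of handing the client a detached, plain-data copy of an ORM row instead of the live SQLAlchemy object, here is a minimal stdlib-only sketch. It stands in for what Pydantic v1 provides via `Config.orm_mode = True` and `Model.from_orm(obj)`; the classes `TaskInstanceRow` and `TaskInstanceModel` and the functions `to_wire`/`from_wire` are hypothetical and are not Airflow's actual models.

```python
import json
from dataclasses import asdict, dataclass, fields
from typing import Optional


class TaskInstanceRow:
    """Hypothetical stand-in for a SQLAlchemy-mapped TaskInstance row."""

    def __init__(self, dag_id: str, task_id: str, run_id: str, state: Optional[str]):
        self.dag_id = dag_id
        self.task_id = task_id
        self.run_id = run_id
        self.state = state
        # Real ORM objects also carry session-bound internal state that must not cross the wire.
        self._sa_instance_state = object()


@dataclass(frozen=True)
class TaskInstanceModel:
    """Hypothetical detached, plain-data view of a task instance."""

    dag_id: str
    task_id: str
    run_id: str
    state: Optional[str]

    @classmethod
    def from_row(cls, row: object) -> "TaskInstanceModel":
        # Read only the declared fields by attribute access, similar in spirit to ORM mode.
        return cls(**{f.name: getattr(row, f.name) for f in fields(cls)})


def to_wire(row: object) -> str:
    """Server side: ORM object -> detached model -> JSON string."""
    return json.dumps(asdict(TaskInstanceModel.from_row(row)))


def from_wire(payload: str) -> TaskInstanceModel:
    """Client side: rebuild the detached model; no DB session is needed."""
    return TaskInstanceModel(**json.loads(payload))


row = TaskInstanceRow("example_dag", "extract", "manual__2023-04-05", "running")
payload = to_wire(row)
restored = from_wire(payload)
```

The design point is that only declared fields are copied, so the client never receives (or needs) the ORM session state.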

mhenc marked this pull request as ready for review March 24, 2023 14:11
mhenc requested review from kaxil and ashb as code owners March 24, 2023 14:11
@mhenc
Contributor Author

mhenc commented Mar 24, 2023

cc: @potiuk @vincbeck

Member

potiuk left a comment

Looks good.

@potiuk
Member

potiuk commented Mar 24, 2023

I think it would be great to add some unit tests though.

@mhenc
Contributor Author

mhenc commented Mar 24, 2023

Added support for BaseJob and unit tests

mhenc requested a review from potiuk March 24, 2023 15:01
Contributor

vincbeck left a comment

LGTM

@ashb
Member

ashb commented Mar 25, 2023

/cc @bolkedebruin who has been overhauling the serde code for 2.6

ashb requested a review from bolkedebruin March 25, 2023 09:26
mhenc force-pushed the pydantic_internal branch from dc6478b to 0de9630 March 27, 2023 08:59
Member

potiuk left a comment

@bolkedebruin -> any comments?

A small general comment: I know from offline discussions that you have reservations about AIP-44 in general, even though it is approved and well in progress.

I think the changes we are making now nicely plug SQLAlchemy models into our serialization at a very small cost and with little maintenance effort (as described in the original Pydantic PR #29776: almost no extra code to maintain, and it piggybacks on battle-proven Pydantic, which is used by very serious projects and treats SQLAlchemy ORM models as first-class citizens).

If we implement it now and follow through with the Internal API implementation (I am sure we will mark it as experimental for quite a while), my bet is that this work will let us nicely decouple the DB from the code that actually runs in the worker. Even if we (much) later decide to implement another solution (workload-based, for example), these changes will not hold us back; rather, they will make it easier to understand which parts of the process we need to replace to make that happen.

Our changes are specifically aimed at making minimal, "surgical" changes to the existing code that also improve separation of concerns, and any future discussions should be easier once we complete them.

I don't think we are ready to spend time on more "architectural" changes now. I also think what we implement now will make such changes easier, not harder, if we decide to make them (for Airflow 3, for example).

I honestly think this is a good way to move it forward :).

mhenc force-pushed the pydantic_internal branch 2 times, most recently from 78d349a to e9e7ed1 March 29, 2023 14:51
potiuk force-pushed the pydantic_internal branch from e9e7ed1 to 4f30904 March 30, 2023 18:44
@potiuk
Member

potiuk commented Mar 30, 2023

Hey @bolkedebruin, any comments? I think it nicely (and optionally) plugs into our serialization to add more capabilities we need for AIP-44, so if there is no more feedback, I am planning to merge it so that we can proceed.

mhenc force-pushed the pydantic_internal branch 2 times, most recently from c62d53e to 0d66567 April 3, 2023 09:46
@potiuk
Member

potiuk commented Apr 4, 2023

I assume no news is good news, and I will merge this one once it passes (I rebased to apply today's fix for the pytest collection timeout). @uranusjr, maybe we could also start merging the decoupling PRs:

#30255 -> #30302 -> #30308 -> #30376

They are starting to hold us back quite a lot, and I would love to progress with the AIP-44 implementation.

@potiuk
Member

potiuk commented Apr 5, 2023

Looks good. Merging. I also decided to move the Pydantic classes to serialization as an immediate follow-up, as suggested by @uranusjr. Any conflicts there will be very easy to resolve, as they will only involve imports; in my refactoring I barely touch those classes now, after recently removing the globals.

@potiuk
Member

potiuk commented Apr 5, 2023

However, I see this PR has fallen victim to GitHub's "UI rebase bug". @mhenc, can you please re-push your branch after a manual rebase?

mhenc force-pushed the pydantic_internal branch from 75c112e to f872dc5 April 5, 2023 07:47
potiuk merged commit 41c8e58 into apache:main Apr 5, 2023
mhenc deleted the pydantic_internal branch April 5, 2023 13:39
@bolkedebruin
Contributor

bolkedebruin commented Apr 6, 2023

If someone can tell me how to set up Gmail filters so these notifications end up somewhere visible, I'd appreciate it :/. My question would be: why is this tied to the old serialization? I understand it is merged, of course, but I think earlier (in another PR related to this AIP) we agreed not to extend the old serializer.

@potiuk
Member

potiuk commented Apr 6, 2023

Where should we add it? We can still move it (I have not followed the serialization discussion earlier, but I am eager to learn :).

I would suggest adding "@bolkedebruin" in "contains word" and making sure it is flagged as important and does not skip the inbox, etc.

@bolkedebruin
Contributor

Thanks, I'll try that.

In an earlier discussion, @mhenc added some functionality to the old serializer on the basis that rebasing onto the current serializer (which supports versioning, is an order of magnitude faster, and is more flexible and maintainable) would have too much impact. I agreed on the condition (I think explicitly) that we would then not add more functionality to the old serializer. However, I no longer consider this a small change. So I wonder what the reasoning was, and my suggestion would be to at least provide the same functionality in the new serializer if we really need this now.

@potiuk
Member

potiuk commented Apr 6, 2023

What's the new vs. old serialization @bolkedebruin?

Sorry, I have not paid much attention. I looked through the code now. Do I understand correctly (correct me if I am wrong) that the change is to split serializers by type into separate modules in the "serializers" package, which we register dynamically in serde._register() by discovering the different "serializers" there? In that case we would have a "pydantic.py" module in "serializers" and use "serde.serialize/deserialize" to serialize such objects instead of BaseSerialization.serialize/deserialize?

That sounds like a very reasonable approach IMHO, and I like it a lot more than the if/elif approach in BaseSerialization (if I understand it correctly).

So @mhenc, is there a reason why we would not be able to do that? If that's the future direction of our serialization, I'd be in favour of doing so (unless there are reasons we wouldn't want to or couldn't). We are heading for 2.7 with AIP-44 anyway, so even if it means a bit more refactoring and testing, that would make more sense.
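For readers comparing the two designs, here is a minimal sketch of registry-style dispatch as opposed to an if/elif chain: each serializer registers itself against a type's fully qualified name, and a single `serialize`/`deserialize` pair looks it up. All names here (`register`, `_SERIALIZERS`, the `Point` type, the `__type__`/`__data__` keys) are hypothetical and do not reproduce Airflow's actual `serde` module.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple

# Maps a fully qualified class name to its (encode, decode) pair.
_SERIALIZERS: Dict[str, Tuple[Callable[[Any], Any], Callable[[Any], Any]]] = {}


def qualname(cls: type) -> str:
    return f"{cls.__module__}.{cls.__qualname__}"


def register(cls: type, encode: Callable[[Any], Any], decode: Callable[[Any], Any]) -> None:
    """Register encode/decode for a type; supporting a new type needs no edit to serialize()."""
    _SERIALIZERS[qualname(cls)] = (encode, decode)


def serialize(obj: Any) -> Any:
    # Primitives pass through unchanged.
    if obj is None or isinstance(obj, (str, int, float, bool)):
        return obj
    key = qualname(type(obj))
    if key not in _SERIALIZERS:
        raise TypeError(f"no serializer registered for {key}")
    encode, _ = _SERIALIZERS[key]
    return {"__type__": key, "__data__": encode(obj)}


def deserialize(data: Any) -> Any:
    if not isinstance(data, dict):
        return data
    _, decode = _SERIALIZERS[data["__type__"]]
    return decode(data["__data__"])


# Example "plugin": in a package-based layout this would live in its own module
# and call register() at import time.
@dataclass
class Point:
    x: int
    y: int


register(Point, lambda p: [p.x, p.y], lambda d: Point(*d))
```

Under this layout, adding Pydantic support would mean adding one more registering module rather than another `elif` branch.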

@bolkedebruin
Contributor

@potiuk Yes, you are generally correct. In @mhenc's defense, there could be reasons not to integrate with the new serializer yet, namely if you rely on DAG serialization, which I haven't completed yet. Still, extending the old serialization significantly seems odd.

@potiuk
Member

potiuk commented Apr 6, 2023

I think we can remove that part for now, since we are moving it behind a feature flag anyway: 2.6.0 is going to be released before we manage to merge the outstanding changes and complete testing (#30509, #30510).

@potiuk
Member

potiuk commented Apr 7, 2023

Once we merge #30509 and #30510, we can put it behind the AIP-44 feature flag and swap it out when the new serialization is ready.

@mhenc
Contributor Author

mhenc commented Apr 7, 2023

Yes, I still plan to integrate with the new serialization; this PR was just to unblock work on AIP-44.
I remember that when we talked about it a while ago, the new serializer was still under development.
Let me check whether we rely on DAG serialization (I think we don't). If not, I can start working on replacing the serialization framework for AIP-44 now.

@potiuk
Member

potiuk commented Apr 7, 2023

Yeah. You can start by putting the change here behind the _ENABLE_AIP_44 feature flag, so that we know this part is not used until we replace it with the "new" serialization (and we can leave a TODO: comment to remove it once we switch to the new one).

ephraimbuddy added the changelog:skip and AIP-44 labels Apr 11, 2023
@bolkedebruin
Contributor

Thx :-)

@potiuk
Member

potiuk commented Apr 11, 2023

Yep. The feature flag is nicely implemented here to start with, and I will let @mhenc explore switching to the new serializer (#30560).

Labels
AIP-44 (Airflow Internal API), area:serialization, changelog:skip (changes that should be skipped from the changelog: CI, tests, etc.)
Development

Successfully merging this pull request may close these issues.

AIP-44 Implement conversion to Pydantic-ORM objects in Internal API
7 participants