Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify checks for package versions #37585

Merged
merged 1 commit into from
Feb 21, 2024

Conversation

potiuk
Copy link
Member

@potiuk potiuk commented Feb 21, 2024

Replaces more complex package version checks with one-liners.


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@potiuk
Copy link
Member Author

potiuk commented Feb 21, 2024

cc: @Dev-iL

Copy link
Contributor

@Dev-iL Dev-iL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for notifying me! I've left some questions/suggestions.

While you're at it, perhaps also modify the PENDULUM3 definitions in

  • airflow/utils/timezone.py
  • tests/serialization/serializers/test_serializers.py
    ?

airflow/utils/pydantic.py Outdated Show resolved Hide resolved
airflow/utils/pydantic.py Outdated Show resolved Hide resolved
airflow/utils/pydantic.py Outdated Show resolved Hide resolved
@Dev-iL
Copy link
Contributor

Dev-iL commented Feb 21, 2024

In #37575, @Taragolis used from packaging.version import Version instead of parse, presumably because parse is just an alias for Version,

def parse(version: str) -> "Version":
    """Parse the given version string.

    >>> parse('1.0.dev1')
    <Version('1.0.dev1')>

    :param version: The version string to parse.
    :raises InvalidVersion: When the version string is not a valid version.
    """
    return Version(version)

Perhaps we should agree on a uniform way to import this?

@Dev-iL
Copy link
Contributor

Dev-iL commented Feb 21, 2024

Build failed due to the patching path not being updated in the sqlalchemy tests - should be changed something like this:

def test_is_sqlalchemy_v1(major_version, expected_result):
-    with mock.patch("airflow.utils.sqlalchemy._get_lib_major_version") as mock_get_major_version:
+    with mock.patch("airflow.utils.sqlalchemy.parse_version") as mock_get_major_version:

And also entirely remove test_get_lib_major_version.

@potiuk
Copy link
Member Author

potiuk commented Feb 21, 2024

While you're at it, perhaps also modify the PENDULUM3 definitions in

airflow/utils/timezone.py
tests/serialization/serializers/test_serializers.py

Ah. Nice. I missed those, yes, good idea to fix them there - we had recently a long discussion about using __version__ with @Taragolis #37320 (comment) and we came to the conclusion is that we should indeed switch to using importlib/metadata instead as the only "standard" way, I even have a draft PR to change the approach we use in our providers #37335 as result of that (waited for the release of providers that just happened so I might revive it).

@potiuk
Copy link
Member Author

potiuk commented Feb 21, 2024

And also entirely remove test_get_lib_major_version.

Yep. This is why we have CI to tell us so.

Replaces more complex package version checks with one-liners.
@potiuk potiuk force-pushed the simplify-package-version-checks branch from c67a245 to 6687cf3 Compare February 21, 2024 11:20
@potiuk
Copy link
Member Author

potiuk commented Feb 21, 2024

Perhaps we should agree on a uniform way to import this?

perhaps. you seem to be good at chasing all of those. Maybe a separate PR proposing such unification. This is best way here to get things done in the way you think things shoudl be done - just doing it.

@potiuk
Copy link
Member Author

potiuk commented Feb 21, 2024

Right. Shoudl be nicer. simpler, more consistent and less code everywhere

@Taragolis
Copy link
Contributor

I also think about some caching mechanism, we parse all distributions which installed into the environment:

def _get_grouped_entry_points() -> dict[str, list[EPnD]]:
mapping: dict[str, list[EPnD]] = defaultdict(list)
for dist in metadata.distributions():
try:
for e in dist.entry_points:
mapping[e.group].append((e, dist))
except Exception as e:
log.warning("Error when retrieving package metadata (skipping it): %s, %s", dist, e)
return mapping

Distribution have the information about version also. So maybe it is a good point to have generalised util around importlib.metadata and packaging for retrieve and parse this information in reusable format, e.g.

from __future__ import annotations

import sys

from typing import NamedTuple
from packaging.version import Version
if sys.version_info >= (3, 10):
    from importlib import metadata
else:
    # The “selectable” entry points were introduced in importlib_metadata 3.6 and Python 3.10.
    # Prior to those changes, entry_points accepted no parameters and always returned a dictionary of entry points,
    # keyed by group. With importlib_metadata 5.0 and Python 3.12, entry_points always returns an EntryPoints object.
    # See backports.entry_points_selectable for compatibility options.
    # source: https://docs.python.org/3/library/importlib.metadata.html
    import importlib_metadata as metadata


class DistVersion(NamedTuple):
    distribution_name: str
    version: str
    version_info: Version
    base_version_tuple: tuple[int, int, int]

    @classmethod
    def from_distribution(cls, dist: metadata.Distribution) -> DistVersion:
        vi = Version(dist.version)
        return DistVersion(
            distribution_name=dist.name,
            version=dist.version,
            version_info=vi,
            base_version_tuple=(vi.major, vi.minor, vi.micro)
        )


airflow_dist = metadata.distribution("apache-airflow")
dist_version = DistVersion.from_distribution(airflow_dist)
print(dist_version)
print(dist_version.base_version_tuple >= (2, 8))
print(dist_version.base_version_tuple >= (2, 9))
if dist_version.base_version_tuple < (2, ):
    raise RuntimeError("¯\_(ツ)_/¯")

That is idea for potential future improvements, not for particular this PR

@potiuk
Copy link
Member Author

potiuk commented Feb 21, 2024

That is idea for potential future improvements, not for particular this PR

Yep. If it can give measurable improvements - THAT could be the reason to introduce a common uitl (just retrieving it is IMHO not enough to extract common code.

@Taragolis
Copy link
Contributor

I guess it is more common in general if compare to one time usage in the Provider Manager.

  • We check different versions into the core, maybe even into the providers
  • Providers collected information about different providers version.
  • Providers collected information about itself (for tracing purpose I guess)

However it is required more time to introduce common interface for both core and providers, because if it will allow to use into the providers, than we cant change it.

@potiuk
Copy link
Member Author

potiuk commented Feb 21, 2024

However it is required more time to introduce common interface for both core and providers, because if it will allow to use into the providers, than we cant change it.

Yep. Unless/untll we introduce common.utils provider :). Which we've been toying the idea with.

@Taragolis
Copy link
Contributor

The problem with airflow.utils it is not covered by Public Interface (except TaskGroups), however if Community Provider use it, it might prevent to changing it. So maybe we need to move some internal utilities to the internal sub-package all common stuff keep in the separate package, and deprecate all other stuff (something we might safely remove).

Presumably good point to future discussion.

@potiuk potiuk merged commit f33b686 into apache:main Feb 21, 2024
59 checks passed
@potiuk potiuk deleted the simplify-package-version-checks branch February 21, 2024 16:04
@ephraimbuddy ephraimbuddy added this to the Airflow 2.9.0 milestone Mar 6, 2024
@ephraimbuddy ephraimbuddy added the type:improvement Changelog: Improvements label Mar 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type:improvement Changelog: Improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants