Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate best way to fix and fix MyPy failure on pipeline_registry.py in default template #2526

Open
astrojuanlu opened this issue Apr 18, 2023 · 7 comments

Comments

@astrojuanlu
Copy link
Member

Description

The default template has this code in pipeline_registry.py:

pipelines = find_pipelines()
pipelines["__default__"] = sum(pipelines.values())

However, latest MyPy version (1.2.0 at the time of writing) doesn't have a high opinion on that snippet:

src/openrepair/pipeline_registry.py:13: error: Incompatible types in assignment (expression has type "Union[Pipeline, Literal[0]]", target has type "Pipeline")  [assignment]
Found 1 error in 1 file (checked 8 source files)

I tested this on Python 3.10 with the default MyPy settings. I think it's reasonable because sum has a default start=0, therefore if the pipelines list is empty, the result has a different type:

In [5]: pipelines = [Pipeline([]), Pipeline([])]

In [6]: sum(pipelines)
Out[6]: Pipeline([])

In [7]: sum([])
Out[7]: 0

Context

The reason why sum(pipelines) works at all if because of this:

def __radd__(self, other):
if isinstance(other, int) and other == 0:
return self

which naturally allows this nice syntactic sugar by making this work:

In [9]: 0 + Pipeline([])
Out[9]: Pipeline([])

Steps to Reproduce

  1. Create a new Kedro project with the default template
  2. Run mypy on it
  3. Observe error

Expected Result

The project templates should produce statically correct code, and give no MyPy errors.

Your Environment

Include as many relevant details about the environment in which you experienced the bug:

  • Kedro version used (pip show kedro or kedro -V): 0.18.7
  • Python version used (python -V): 3.10.9
  • Operating system and version: macOS Ventura
@astrojuanlu
Copy link
Member Author

This line makes MyPy happy but it's not as nice:

pipelines["__default__"] = sum(pipelines.values(), start=Pipeline([]))

@antonymilne
Copy link
Contributor

Hmm, this is an annoying one. I'm sure other people must have come across this also since it's pretty common to do sum like this on custom classes without specifying start.

Does it help if we add a type annotation to __radd__?

def __radd__(self, other: Union[Pipeline, Literal[0]])

Definitely I'd prefer to not explicitly specify start=Pipeline([])) - we could just put a mypy ignore in there if required instead?

@astrojuanlu
Copy link
Member Author

I think the problem is that find_pipelines always returns {"__default__": Pipeline([])} even if it finds nothing, but this dynamic behavior is difficult to communicate to MyPy. Even with the __radd__ annotation, that wouldn't help with the mismatch on the register_pipelines annotation.

I think there only ways are either ignoring the error (always feels like surrender 🏳️) or try to assure MyPy that somehow pipelines.values() will never be empty, but the latter necessarily requires code changes. An alternative to start= could be

    pipelines["__default__"] = sum(pipelines.values()) or Pipeline([])

still looks meh but at least one does not need to know about the obscure sum behavior.

@deepyaman
Copy link
Member

I feel like overriding the sum return type should work, like so:

from typing import Iterable, overload
 
 
 @overload
 def sum(__iterable: Iterable[Pipeline]) -> Iterable[Pipeline]: ...

The idea is to override https://github.com/python/typeshed/blob/dd2818a41db6cd31e4680abf5e1362a7e5bfb5a6/stdlib/builtins.pyi#L1731-L1748.

I need to test this properly, though; whatever I'm changing seems to be magically make mypy --no-incremental . on a new Kedro project pass, even if I undo changes, so that can't be right... (I am able to get the error from this issue when just installing from PyPI)

@astrojuanlu
Copy link
Member Author

This is a mostly harmless code change and would allow us to introduce MyPy as a linter. Is anybody strongly opposed to #2526 (comment) or #2526 (comment) ?

@antonymilne
Copy link
Contributor

I am not strongly opposed but I am opposed enough to leave a comment 😀

I think that sum(pipelines.values()) is significantly nicer than the suggestions you linked to. IMO the correct solution here would be to some fix that allows that to pass mypy, e.g. @deepyaman's suggestion #2526 (comment).

If it's too difficult to get something like that working then changing sum(pipelines.values()) is ok by me, but I do think it's worth investing a bit more time investigating alternatives because your proposed solutions make all users' pipeline_registry.py (which is arguably already too complicated) even harder to understand.

@astrojuanlu astrojuanlu changed the title MyPy complains about pipeline_registry.py in default template Investigate best way to fix and fix MyPy failure on pipeline_registry.py in default template Feb 12, 2024
@astrojuanlu
Copy link
Member Author

I'm leaving this to whoever takes the ticket, since all solutions are quite small and doesn't seem to be consensus about that, but it's also quite minor

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: No status
Development

No branches or pull requests

3 participants