Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change Parallelizable and Collect to structural subtypes #1115

Merged
merged 3 commits into from
Sep 1, 2024

Conversation

cswartzvi
Copy link
Contributor

@cswartzvi cswartzvi commented Aug 31, 2024

Attempts to fix #1114 by introducing structural subtyping for Parallelizable and Collect.

Changes

Both Parallelizable and Collect were converted to generic protocols (vice abstract classes). The function is_parallelizable_type was also updated because generic protocols are not runtime checkable via issubclass - the function now checks the origin type via typing.get_origin.

How I tested this

As mentioned in #1114, I was having issues running the full test suite locally on my Windows 11 machine, but I was able to run all the tests where Parallelizable is used, that includes the following folders:

  • ./tests/execution/
  • ./tests/function_modifiers/
  • ./tests/resources/

Notes

I didn't find any usage of is_parallelizable_type in the codebase. Is this intended for a future feature?

Checklist

  • PR has an informative and human-readable title (this will be pulled into the release notes)
  • Changes are limited to a single goal (no scope creep)
  • Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future TODOs are captured in comments
  • Project documentation has been updated if adding/changing functionality.

@elijahbenizzy
Copy link
Collaborator

Hey! Thanks for this -- makes sense at a high-level. Also the issue is quite helpful in highlighting the problem.

TL;DR -- originally I intended it to be a generator but that's over-engineered. Looks good!

So, two changes here:
(1) changing it from a Generator to an Iterable
(2) Changing it to be covariant types + using P

Covariant types make sense -- this is a pretty clear-cut case of where any subclass of V would work. Re: iterable -- generator was implemented by design (although not strictly necessary). Specifically, the future of this feature intends to allow more fine-grained pipelining for streaming results -- the send/yield from of a generator would be critical there.

The problem is that python has no builtin protocol for Generator (it's a concrete type) which forces us to use ABC, which then goes up against the same problem as earlier. The "right" way would be to define a protocol, but ughh, that's a bit ridiculous. This is hitting the limitations of python's (quite lacking) type system. All this as far as I understand, I'm not an expert in python's typing system.

All that said, we haven't actually made use of the fact that it's a generator, and we could always have another type (E.G. Stream). So talking myself into a circle, I think it's fine to loosen it to Iterable, then we can have a special case when we ant for a generator.

Mind adding a few comments on the type? Also while you're working on this, maybe we should change the name of U and V to ParallelizableElement and CollectElement? Or better names.

Appreciate this, and the chance to dig into python's type system!

@cswartzvi
Copy link
Contributor Author

Ah! So you were thinking about streaming via send - I thought that might be the case when I saw Generator as the base class. For what it's worth, I think that adding an explicit Stream type down the road, separate from Parallelizable, fits nicely with the overall library.

I feel your frustrations with python's type system - I have been using it pretty heavily over the last 3-4 years (I work with a lot of C# people and that is the only way to get them involved with python) but I still bump into rough paths all the time - like no builtin protocol for Generator, which I have seen before, but definitely did not remember.

Anyway, let me add those comments and update the type variable names - I decided to go with your suggestions for the names. Note that I also updated the commented out Sequential so that it will start in a consistent place if/when it is introduced.

@elijahbenizzy
Copy link
Collaborator

Ah! So you were thinking about streaming via send - I thought that might be the case when I saw Generator as the base class. For what it's worth, I think that adding an explicit Stream type down the road, separate from Parallelizable, fits nicely with the overall library.

I feel your frustrations with python's type system - I have been using it pretty heavily over the last 3-4 years (I work with a lot of C# people and that is the only way to get them involved with python) but I still bump into rough paths all the time - like no builtin protocol for Generator, which I have seen before, but definitely did not remember.

Anyway, let me add those comments and update the type variable names - I decided to go with your suggestions for the names. Note that I also updated the commented out Sequential so that it will start in a consistent place if/when it is introduced.

Yep, I agree on the way to proceed. I'm just that weird one hoping for true compiled python at some point 😆

Appreciate the comments + changes + in-depth thoughts! This looks good, merging!

@elijahbenizzy elijahbenizzy self-requested a review September 1, 2024 18:48
Copy link
Collaborator

@elijahbenizzy elijahbenizzy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@elijahbenizzy elijahbenizzy merged commit d90212f into DAGWorks-Inc:main Sep 1, 2024
24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Parallelizable type checking error with pyright / VS Code
2 participants