Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Intersection: The Ordered Layers Formulation #48

Open
mark-todd opened this issue Apr 9, 2024 · 10 comments
Open

Intersection: The Ordered Layers Formulation #48

mark-todd opened this issue Apr 9, 2024 · 10 comments

Comments

@mark-todd
Copy link
Collaborator

mark-todd commented Apr 9, 2024

Summary

I'm starting this thread as a fresh start on the Intersection PEP. We've come across various difficulties in formulating the Intersection PEP, and it's been a long road of discussion - I'm keen for all that hard work not to go to waste.

What's the idea?

This PEP will build a lot on the ideas of the "Ordered Intersection" that's been discussed a lot over the past few weeks, but conceptually is perhaps slightly different. I'm hoping that perhaps these differences will offer solutions to some of the issues we've been facing. The main concepts:

  1. Intersection can be thought of a series of structural layers - Intersection[A,B] is A layered on top of B. There will be more details on exactly what I mean by layering to follow.
  2. For all non-structural types, the intersection must be an instance of all intersection components. They must appear in the same order in the intersection as in the MRO.

What isn't part of this PEP

  • LSP Violations are not a consideration of this PEP. While they may be invalid at the point of inheritance, intersection will not consider if there is a violation or otherwise, and there may be examples that feature such violations.
  • Any will be treated like any other class. My hope is that the rules above are sufficient to mean no special casing will be required for the Any class if we consider it by it's structure.
  • Syntax using & will not be considered, to avoid confusion with unordered intersections. While it may be possible to include, as in the previous PEP there was a subcase dealing with this, for a first PEP I think it will be simpler to exclude this from discussion. It's always possible to add this syntax at a later date, but I leave this to a future PEP.

Other considerations

  • TypedDict, Callable and Protocol will be considered "structural types", and will not have the instance restriction. Callable will act exactly as a form of the __call__ method, and TypedDict will effectively define the type for the __getitem__ method.

What is layering?

The concept of layering is inspired by the way a lot of photo editing software works. Layers are built up - where features overlap the highest priority layer appears in the final image. In this analogy each class/type is effectively a layer, and the stack of layers is the intersection. A layer does not need to a concrete class and can be a protocol. It only represents the exposed method and attribute types of the class/protocol.

This might be quite controversial, but it seems to me so far that the two concepts at the core of the idea deal with a lot of the discussed edge cases. The PEP will put quite a lot of emphasis on the "How to teach" section as these concepts will be at the very heart of the idea.

@hoelzeli
Copy link

Moving here from #38.

Thank you @mark-todd for the quick response and the provided context, that was very helpful! I am still by far not up to date on all past discussions, but the more I read, the more I realize that I probably did not bring anything new to the discussion but just regurgitated other peoples' ideas without giving them credit, especially mikeshardmind, who basically opened #38 just to pursue the same idea; sorry for that :( I did not read thoroughly enough and gave the existing ideas not enough thought.

I agree with the following (non-exhaustive list of) points made in #38:

  • type[T1...TN] / object[T1...TN] would fit a lot of use cases from the original issue #213 in python typing
  • ... but a different/only one name would be better
  • Intersection is not the right name for this as it is not unordered

I also agree with your approach in this issue. I am not sure if introducing the concept of "layering" is beneficial, as it is basically just inheritance with some special cases. Maybe just "extending" the rules of inheritance for the special cases would make more sense/ would be more easily understood?

Currently the structural type will contain the "behaviour" of object (the python class) if a non-protocol type is in the parameter list, correct? Would it be beneficial if we could split the result into two structural parts with one being the ObjectProtocol and the other one being everything "interesting"/non-default? This would then effectively be a way to extract a protocol from a type (which we are already implicitely doing in this PEP, right?). Or should we always "remove" the default object behaviour from the resulting protocol as the ObjectProtocol is basically of no use for annotations?

Regarding the issue with Any: I just posted another idea to the issue that you mentioned in the draft. I am curious to see what the reactions will be, but even if nothing happens, we could maybe transfer the idea from there and provide two options for how to treat Any (either LSP-compatible or not)?

@carljm
Copy link

carljm commented Oct 12, 2024

For what it's worth, I believe we should just do a true (unordered) intersection, and I don't think there are any significant issues standing in the way of doing so, after the clarifications to the typing spec around the meaning of Any in gradual typing that were made last May (in some part motivated by all the intersection discussions). I think that the questions around the meaning of T & Any that caused difficulty in previous discussions now have clear and unambiguously correct answers, given those clarifications to the typing spec. (I'm happy to clarify what I think those answers are, in detail, if that's useful, but that's arguably off-topic for this particular thread.)

If we do add an "ordered inheritance type", I would prefer that it be little or nothing more than syntactic sugar for "inline definition of an anonymous empty class type with given bases", so that it doesn't require new special handling or complexity in type checkers or in the type system. This would imply that no type would be valid in such a type that isn't already valid in the bases of a class, nor would any type have a different interpretation than it would have in the bases of a class. It would also imply that type checkers would check LSP validity of such a type in the same way they would check it for creation of a real class.

@mikeshardmind
Copy link
Collaborator

I believe the prior efforts at trying to minimize impact showed fatal flaws to any approximation of it via ordering. I don't have the energy to re-detail all of that right now, but I agree with @carljm that we have sound definitions available for a true unordered intersection and it's just a matter of someone motivated enough to get it from those definitions to a full proposal with an implementation.

I think there's maybe two things which should be cleared up as a prerequisite to this though, because there's going to be issues with the sound definitions if this isn't.

The implied types imposed by the datamodel (specifically the parts of the datamodel that are enforced by the interpreter/c-api) should be always understood as the lower bound.

__hash__ and __eq__ need to be understood as being methods that return int and bool respectively, but that they can be set to None, making their type Optional callables. This part in particular is necccesary so that intersecting with Hashable has a usable meaning.

@carljm
Copy link

carljm commented Oct 15, 2024

Thanks for thinking through edge cases!

The implied types imposed by the datamodel (specifically the parts of the datamodel that are enforced by the interpreter/c-api) should be always understood as the lower bound.

Do you have an example of where this needs to be special-cased in a way that isn't already encoded in typeshed annotations?

__hash__ and __eq__ need to be understood as being methods that return int and bool respectively, but that they can be set to None, making their type Optional callables. This part in particular is necccesary so that intersecting with Hashable has a usable meaning.

Not sure I understand what specific behavior you're suggesting here, or why it's related to intersections specifically. I think __hash__ and Hashable are inherently problematic, because built-in types break LSP here. We have object having a non-optional __hash__(self) -> int but then subclasses of object with __hash__ = None. Because of this, IMO Hashable (and static typing of hashability generally) is currently not really possible or usable. It would be great to come to a better resolution here, but I'm not sure why it's specifically blocking for intersections; IMO Hashable is currently not usable either with or without intersections.

@mikeshardmind
Copy link
Collaborator

mikeshardmind commented Oct 15, 2024

Do you have an example of where this needs to be special-cased in a way that isn't already encoded in typeshed annotations?

I believe it's just __hash__ and __eq__ which aren't adequately covered at this point in time, and I view this as a major issue for intersections since one of the primary goals was being able to compose protocols.

IMO Hashable is currently not usable either with or without intersections.

Hashable works as I expect currently, it's a bit underspecified, but pyright and mypy at least are both working around it currently. I'd like to ensure we have a consistent way of handling T & Hashable for implementations to use before people start writing that, rather than after we invite writing it, there are use cases for exactly that which have come up on discourse recently.

Not sure I understand what specific behavior you're suggesting here, or why it's related to intersections specifically. I think __hash__ and Hashable are inherently problematic, because built-in types break LSP here. We have object having a non-optional __hash__(self) -> int but then subclasses of object with __hash__ = None

This is fixable by changing the way we type object such that it's optional, but exists and requiring type checkers to understand the language semantics for removal of __hash__ / or __eq__, type checkers are already doing this to some extent, it's just not specified. Specifying it would allow us to define that the presence of Hashable forbids types which have removed hashing capability.

Going that route, there's an implied negation type required to fully express the set of types here, because it's possible for a type to remove hashability and then a subtype of that type to re-add it.

The other route for fixing it would be that the first non-object base defines whether something is Hashable or not, and it is an LSP violation to change that in further subclasses. This would be my preferred method as it is simpler in nature, but it reraises issues with the fact that in 3.11, subclassing Any was specified as allowed without defining the semantics of that. The obvious route here is that subclassing Any is ignored for this calculation and the type is assumed compatible.

@carljm
Copy link

carljm commented Oct 15, 2024

Ok, I can certainly see that people will be tempted to write T & Hashable if we have intersections, and it would be ideal to sort out Hashable before people start writing that. I'm just wary of over-approximating blocking edges means nothing ever moves forward :) But I'm definitely open to looking at Hashable if someone has time to put together a proposal.

In a vacuum I think my preference would be to change object in typeshed to make it not-necessarily-hashable, and then simply require LSP for __hash__ overrides as normal. But I'm not sure what the backwards-compatibility fallout from that would look like.

@mikeshardmind
Copy link
Collaborator

mikeshardmind commented Oct 15, 2024

I'll try and get something together tonight or tomorrow for discussion on that for discourse, I think it's the only remaining major blocker (I understand the feeling of things not moving forward...). There are other things I'd prefer handling before then, but all of those seem like things which are less important and less likely to be encountered at a point that leaks into public interfaces of libraries. (And therefore things that we can just accept might have some minor sharp edges we can smooth over in the future)

@CarliJoy
Copy link
Owner

CarliJoy commented Dec 14, 2024

@mikeshardmind and @carljm can tell us the state of affairs?
Is there anything we can do to get thing moving?
Are there still any blockers?

I am sorry I really lost track.

Edit:

Okay in the answer below I didn't know that if you overwrite __eq__ in a class, python sets __hash__ to None.
That make things, well more complicated :(

object and eq

I would argue that __eq__ is bool|type[NotImplemented], as this is the runtime implementation:

>>> print(1 == "1")
False
>>> int(1).__eq__("1")
NotImplemented
>>> class Foo: ...
>>> print(Foo() == "1")
False
>>> Foo().__eq__("1")
NotImplemented

object and hashable.

Maybe we could remove __hash__ from object in "typeshed" (I didn't find the definition there btw.) but define that object.new() -> object&Hashable as well as define that every class or type() results also in object & hashable ?

I don't know. Probably we should look more detailed in the use cases in which __hash__ isn't possible i.e. list?
Maybe this is also a usecase for not.
So we can keep object with __hash__ but allow not, maybe also in inheritance: class List(not[Hashable]): ... ? We already allow Any, so why not ;-) ?

edit2:

Knowing about the handling of __eq__ and __hash__ my naive idea would be:

class wrapper_descriptor: ...

class internal_eq(wrapper_descriptor):
   def __call__(self, obj, other) -> bool|type[NotImplemented]: ...

class EqNotOverwritten(Protocol)
   __eq__: internal_eq

class AnyEq(Protocol):
   __eq__: Callable[[Any], bool|type[NotImplemented]] 

class object:
    __eq__: Callable[[Any], bool|type[NotImplemented]] = internal_eq()

   @overload
   def __hash__(self: EqNotOverwritten) -> int: ...

   @overload
   def __hash__(self: AnyEq) -> Never: ...

But that's not really a working definition. Just the idea, somehting like a low level "overload", if eq is not the internal function use a different hash.

Sorry I am too far away from how exactly type checkers work in combination with typeshed, hope that the ideas are not too stupid.

@carljm
Copy link

carljm commented Dec 14, 2024

The thread on __hash__ and __eq__ that @mikeshardmind started is at https://discuss.python.org/t/hash-eq-and-lsp/68138 -- I probably need to get back to that and see if we can reach a consensus proposal, I've just been short on time.

@mikeshardmind
Copy link
Collaborator

mikeshardmind commented Dec 15, 2024

I might be able to put it into specification language in the near future, but I'm a bit concerned about some of the questions it poses about LSP compatibility in general, and that there's likely more that needs doing. This isn't the only place where an addition removes capability. The obvious other cases change variance, but there are a few that are more involved than that. Fixing this one is likely enough to unblock intersections, as it is the most likely to be interacted with due to how fundamental hashability is in python, but I don't think the other known issues stop being issues.

(Some other places require larger changes to how variance is handled, a few would benefit from more narrowly scoping LSP exceptions in a similar way, but those would be more disruptive to existing code, even when highlighting real issues)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants