The type_of could be very slow #53
Comments
Hey @ice-tong! Thank you for the kind words. :) You're totally right that When a list with many elements is passed to a function, one must inspect all elements of the list to determine the right element type. As far as I can tell, there is no cheaper way to determine this. One way of approaching this issue is, for large lists, say Do you have any thoughts on this? |
Hi @wesselb, I agree that there is no cheaper way to accurately get the type of a Python Object. I also don't have any good ideas for this right now. 🤣 |
Let's leave this issue open, because it is certainly a performance issue that at some point will need addressing! |
Hi @wesselb, I want to share with you some possible tricks to optimize the In the following big list test, we can use the singleton Type since a lot of instance creation in

import time
from contextlib import contextmanager
from plum import type_of
from plum.type import TypeMeta
from plum.type import Type as plum_Type
from plum.parametric import Tuple as plum_Tuple
import cProfile

big_list = [(1, 2) for _ in range(10000*100)]
start_t = time.time()
print(type_of(big_list))
base_t = time.time() - start_t
print(f'Big list cost: {base_t} s')

@contextmanager
def singleton(meta_cls):
    origin = meta_cls.__call__
    meta_cls._instances = {}
    def __call__(cls, *args, **kwargs):
        assert not kwargs
        key = (cls, args)
        if key not in cls._instances:
            cls._instances[key] = origin(cls, *args, **kwargs)
        return cls._instances[key]
    meta_cls.__call__ = __call__
    yield
    meta_cls.__call__ = origin

with singleton(TypeMeta):
    start_t = time.time()
    print(type_of(big_list))
    t_1 = time.time() - start_t
    print(f'Big list with singleton type cost: {t_1} s')
    print(f'speed up: {(base_t - t_1) / base_t}')

@contextmanager
def hash_cache(hashable_ty):
    hash_core = hashable_ty.__hash__
    hashable_ty._hash = None
    def __hash__(self):
        if self._hash is None:
            self._hash = hash_core(self)
        return self._hash
    hashable_ty.__hash__ = __hash__
    yield
    hashable_ty.__hash__ = hash_core

with hash_cache(plum_Tuple), hash_cache(plum_Type), singleton(TypeMeta):
    start_t = time.time()
    print(type_of(big_list))
    t_2 = time.time() - start_t
    print(f'Big list with singleton type and hash cache cost: {t_2} s')
    print(f'speed up: {(base_t - t_2) / base_t}')
    cProfile.run('type_of(big_list)', filename='plum_type_of.prof')

We got a 40+% speed-up:
|
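The two tricks above (one shared instance per set of constructor arguments, plus a hash computed only once) can be tried without Plum installed. The following is a minimal sketch under stated assumptions: `CachedMeta` and `ListType` are hypothetical stand-ins invented for illustration, not Plum classes, and the real `TypeMeta` may behave differently.

```python
class CachedMeta(type):
    """Hypothetical metaclass: reuse one instance per (class, args) key."""

    _instances = {}

    def __call__(cls, *args):
        key = (cls, args)
        if key not in CachedMeta._instances:
            # Only construct a new object the first time this key is seen.
            CachedMeta._instances[key] = super().__call__(*args)
        return CachedMeta._instances[key]


class ListType(metaclass=CachedMeta):
    """Hypothetical stand-in for a parametric type wrapper."""

    def __init__(self, element_type):
        self.element_type = element_type
        self._hash = None  # filled in lazily by __hash__

    def __hash__(self):
        # Compute the hash once, then reuse it on every later lookup.
        if self._hash is None:
            self._hash = hash((ListType, self.element_type))
        return self._hash

    def __eq__(self, other):
        return isinstance(other, ListType) and self.element_type == other.element_type


a = ListType(int)
b = ListType(int)
print(a is b)  # both names refer to the same cached instance
```

Whether this pays off depends on how often identical types are constructed; in the big-list test above, the same tuple type is built once per element, which is the case the cache targets.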
The

def type_of(obj):
    if type(obj) is list:
        return List(_types_of_iterable(obj))
    elif type(obj) is tuple:
        return Tuple(*(type_of(x) for x in obj))
    elif type(obj) is dict:
        return Dict(_types_of_iterable(obj.keys()), _types_of_iterable(obj.values()))
    else:
        return ptype(type(obj)) |
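To make the cost concrete, here is a self-contained sketch of the same recursive idea built only on standard-library hints (Python 3.9+ assumed). It is not Plum's implementation: this `type_of` and the helper `_union_of` are hypothetical, and returning `Any` for empty containers is an arbitrary choice made for the example. Every element is visited, so the work grows linearly with the size of the container.

```python
from typing import Any, Union


def _union_of(items):
    """Collect the distinct element types of an iterable into one hint."""
    types = []
    for item in items:
        hint = type_of(item)
        if hint not in types:
            types.append(hint)
    if not types:
        return Any  # assumption: an empty container constrains nothing
    return Union[tuple(types)]  # a single member collapses to that member


def type_of(obj):
    """Return a PEP 585-style hint describing obj, inspecting every element."""
    if type(obj) is list:
        return list[_union_of(obj)]
    if type(obj) is tuple:
        if not obj:
            return tuple
        return tuple[tuple(type_of(x) for x in obj)]
    if type(obj) is dict:
        return dict[_union_of(obj.keys()), _union_of(obj.values())]
    return type(obj)


print(type_of([(1, 2), (3, 4)]))
```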
Hey @ice-tong, Those are some clever optimisations—halving the runtime of
Currently, multiple dispatch is used to make |
@wesselb I'm late to this game, but there's a pretty large body of work around describing this issue, which maybe plum can take advantage of? Two things: 1. runtime typechecks of deeply nested generics, and 2. using this for dispatch
It's effectively a robust version of doing this, just with @leycec's incredible obsession for speed verification (and puns 😅)
So, I realize this is a massive can of worms I'm opening in terms of the horrifying way python treats runtime types.
BUT I guess I'm just trying to start a broader conversation here about how one might go about unifying the community around a decent approach to dispatch on deeply nested containers that doesn't require reinventing the wheel every two seconds/libraries? 😄 |
Hi @tbsexton, I had the same idea as you: using
In [1]: from typing import List, Tuple
In [2]: from beartype.abby import is_bearable
In [3]: import torch
In [4]: import numpy as np
In [5]: typing_hints = List[Tuple[torch.Tensor, torch.Tensor]]
In [6]: data = [(np.array(1), np.array(2)) for _ in range(10000*100)]
In [7]: %time is_bearable(data, typing_hints)
CPU times: user 23.6 s, sys: 41.2 ms, total: 23.6 s
Wall time: 23.6 s
Out[7]: False
So I dropped this idea. 🤣 As long as we can optimize the time consumption of |
shots fired
First up, thanks a metric ton to @tbsexton! But what else would you expect from NIST? Only the best. That's what. Thankfully, ludicrously fast runtime type-checking is in our wheelhouse. Let's see if we can't shed a light into the darkness that is this issue.
Second up, let's re-profile @ice-tong's horrifying worst-case example that makes my filmy eyes bleed for posterity. Gotta set that record straight if we can. To do so, we'll begin with a simpler example that elucidates possibly interesting things:
$ ipython3.10
[ins] In [1]: import beartype
[ins] In [2]: beartype.__version__
Out[2]: '0.11.0' # <-- good. this is good.
[ins] In [3]: from beartype.abby import is_bearable
[ins] In [4]: from beartype.typing import List, Tuple # <-- avoid PEP 585 deprecation warnings, yo.
[ins] In [5]: typing_hints = List[Tuple[str, str]] # <-- so, strings instead of PyTorch tensors.
# So far, so good. First, a plus-sized list of 2-tuples satisfying this hint.
[ins] In [6]: data_good = [(str(i), str(i+1)) for i in range(10000*100)]
[ins] In [7]: %time is_bearable(data_good, typing_hints)
CPU times: user 2.17 ms, sys: 958 µs, total: 3.13 ms
Wall time: 3.12 ms
Out[7]: True
# Goodness continues. @beartype yeets. Second, let's try that again. Just 'cause.
[ins] In [8]: %time is_bearable(data_good, typing_hints)
CPU times: user 34 µs, sys: 3 µs, total: 37 µs
Wall time: 39.1 µs
Out[8]: True
# Goodness amplifies! @beartype is two orders of magnitude faster the second time
# you invoke it than the first. What gives? Simple. The first time you call is_bearable(),
# @beartype secretly creates and caches a synthetic runtime type-checker specific to
# the exact type hint you passed (i.e., "typing_hints" above). Every subsequent time
# you call is_bearable() with the same type hint, @beartype internally reuses the
# previously cached synthetic runtime type-checker for that type hint.
#
# In complexity terms, is_bearable() exhibits amortized worst-case O(1) time complexity
# with negligible constants. This is good and exactly as advertised.
#
# So far, still good. Next, a plus-sized list of 2-tuples *VIOLATING* this hint.
[ins] In [9]: data_bad = [(i, i+1) for i in range(10000*100)] # <-- so, integers instead of NumPy arrays.
[ins] In [9]: %time is_bearable(data_bad, typing_hints)
CPU times: user 1.03 s, sys: 12.1 ms, total: 1.05 s
Wall time: 1.04 s
Out[9]: False
# *THAT'S TERRIBAD.* Okay, sure. It's not quite the 23.6 seconds terribad that
# @ice-tong exhibited above. But... that's still not great. This should be tens of
# orders of magnitudes faster than it currently is. What gives, @beartype!?!?
#
# The real-world, mostly. In a desperate last-ditch effort to push beartype 0.10.0
# out the door, I implemented is_bearable() in the crudest way possible. I wasn't
# convinced anyone was actually interested in performing their own runtime
# type-checking at the statement level. So, I just shipped the "beartype.abby"
# subpackage out the door as fast as possible. We know how this story ends.
# It would eventually become clear that everyone, in fact, is interested in
# performing their own runtime type-checking at the statement level. Then I
# began facepalming myself repeatedly.
#
# Because laziness prevailed, is_bearable() currently uses the Easier to Ask for
# Forgiveness than Permission (EAFP) approach to detect type hint violations.
# Basically, is_bearable() catches exceptions. Exception handling is fast in
# Python if no exception is raised. As soon as an exception is raised, however,
# exception handling is slooooow. That's what we're seeing above.
#
# This is compounded by the fact that the "data_bad" object is a stupidly large
# list of 2-tuples. So, is_bearable() is not only catching an exception there;
# is_bearable() is catching an exception that has embedded inside of itself a
# string representation of a stupidly large list of 2-tuples. It takes a considerable
# amount of wall-clock time to construct that string representation. In fact,
# almost *ALL* of the 1.03 seconds consumed by that last call to is_bearable()
# was spent constructing a string representation that no one ever actually sees.
# More facepalming. I'll stop there. This embarrassment must end! It's now evident that there is tangible interest in statement-level runtime type-checking. So, I'll absolutely divert resources to optimizing That Said, Nothing Above MattersThat's right. @leycec always cackles madly at the end of every GitHub post – and this is no exception. That's fascinating and ultimately the reason we're all here. So, here are the two things I'm seeing:
The core issue is that
Preach, old man! There are many intersecting issues here. The most significant is that PEP-compliant type hints were never designed, intended, or implemented to be used at runtime. Type hints explicitly prohibit introspection by defining metaclasses whose
The runtime API of type hints varies across Python versions and PEP standards. Most type hints initially popularized by PEP 484 have since been deprecated by PEP 585. Critically, this includes core type hint factories like
from typing import List, Tuple
# This is a PEP 484-compliant generic.
class ListOfStrs484(List[Tuple[str]]): pass
# This is a PEP 585-compliant generic.
class ListOfStrs585(list[tuple[str]]): pass
So, now users can create instances of both
Postponement raises its ugly head, too. How do you resolve PEP 563-postponed type hints under the
Ambiguity raises its ugly head, too. How do you differentiate between a fixed-length tuple (e.g., an object satisfying
So What You're Saying is Type Hints Suck
Sorta. That's not far off, really. Type hints don't suck, per se. They're just sufficiently non-trivial to support at runtime that it's generally not worth anyone's time to actually do so – unless they're @beartype or
What I'm saying is... @beartype actually does all of the above internally already – full-blown PEP 563 resolution of postponed type hints, full-blown detection of arbitrary type hints, full-blown iteration of generic superclass trees. You name it and we probably have a private internal API handling it. That's the rub, though. Private. We sorta nebulously suspected that somebody else might want to introspect type hints, but never quite got around to finalizing a public API. The closest we've come is our upcoming Decidedly Object-Oriented Runtime-checking (DOOR) API, shipping with
# This is DOOR. It's a Pythonic API providing an object-oriented interface
# to low-level type hints that basically have no interface whatsoever.
>>> from beartype.door import TypeHint
>>> usable_type_hint = TypeHint(int | str | None)
>>> print(usable_type_hint)
TypeHint(int | str | None)
# DOOR hints can be iterated Pythonically.
>>> for child_hint in usable_type_hint: print(child_hint)
TypeHint(<class 'int'>)
TypeHint(<class 'str'>)
TypeHint(<class 'NoneType'>)
# DOOR hints support equality Pythonically.
>>> from typing import Union
>>> usable_type_hint == TypeHint(Union[int, str, None])
True # <-- this is madness.
# DOOR hints support rich comparisons Pythonically.
>>> usable_type_hint <= TypeHint(int | str | bool | None)
True # <-- madness continues.
# DOOR hints are self-caching.
>>> TypeHint(int | str | bool | None) is TypeHint(int | str | bool | None)
True # <-- blowing minds over here.
Basically,
Roll Call
Thanks again to @tbsexton for the gracious ping! Furious fist-bumps to @wesselb for maintaining all of this mysterious magic and @ice-tong for their justifiable interest in both macro- and micro-optimization. Literally can't wait to see where you take |
@leycec thanks for dropping by! I hesitated to so rudely summon you from "hibernation" (read: implementing whatever beautiful insanity v0.12 is turning into), but I won't lie, I'm glad I did! 😅 You mentioned:
I actually had started writing out a reply to @ice-tong somewhat along those lines, since I had noticed the
This was exactly what I was hoping to hear, though how to do this smoothly and without breaking up @wesselb's workflow as much as possible would be excellent. I know the Needless to say, DOOR is everything I was ever scared I needed, so I'd be happy to beta test for some ideas I've been floating around. Is there an issue/PR I could migrate the discussion to, to keep this on-topic?
I feel like we need our own Anyway, thanks @leycec for dropping by, and @wesselb this is a lot of text-walls to come back to but I guess I just want to say, we're here to help? Happy to continue this discussion here or in a new thread, etc., if beartype/typeguard reliance interests you. |
Sup @tbsexton and @leycec, I’m currently away on holidays, but I saw emails for your messages popping up and just wanted to briefly thank you for the fantastic posts! Once I’m back to a keyboard I’ll reply properly. In the meantime, I just wanted to say that beartype looks absolutely fantastic, and I’m puzzled that I wasn’t aware of the library before. I am more than happy (hell, I’d even much prefer) to have plum offload everything that isn’t multiple dispatch to other libraries. I realise that some appreciate that plum currently is dependency free, so perhaps a simplistic fallback (like the current type_of) in case beartype or another dependency isn’t installed would be a good solution that yields the best of both worlds. |
@wesselb: Wondrous idea. Graceful fallbacks and optional dependencies are the only way. @beartype itself does that all the friggin' time in our main codebase and test suite. Let us know how @beartype can help
@ice-tong: Behold! @beartype commit 8b15fa resolves the shameful performance regression in
In [1]: from typing import List, Tuple
In [2]: from beartype.abby import is_bearable
In [3]: import torch
In [4]: import numpy as np
In [5]: typing_hints = List[Tuple[torch.Tensor, torch.Tensor]]
In [6]: data = [(np.array(1), np.array(2)) for _ in range(10000*100)]
In [7]: %time is_bearable(data, typing_hints)
CPU times: user 31 µs, sys: 2 µs, total: 33 µs
Wall time: 36.7 µs
Out[7]: False
I threw all of Saturday and Sunday evening at making that magic happen, which meant refactoring the entire @beartype codebase behind the scenes to support dynamic generation of arbitrary type-checking tester functions. Please don't do this at home, kids. 😅 Let me know if you hit any other speed snafus, @ice-tong. Actually... don't let me know. I intend to spend next weekend doing nothing but playing every video game that exists. |
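The slowdown described earlier in this thread came from catching an exception whose message embeds a string representation of the whole object, while the fix switched to tester functions that return a boolean directly. The sketch below contrasts the two shapes on a flat list. All function names are hypothetical, and this is not @beartype's code; it only illustrates why building an unused error message can dominate the runtime.

```python
import time


def check_or_raise(obj):
    """Raise TypeError, embedding a repr of the whole object in the message."""
    for item in obj:
        if not isinstance(item, str):
            # Building this message walks the entire list to produce its repr.
            raise TypeError(f"{obj!r} is not a list of strings")


def is_valid_eafp(obj):
    """EAFP style: call the raising checker and translate the exception."""
    try:
        check_or_raise(obj)
    except TypeError:
        return False
    return True


def is_valid_direct(obj):
    """Tester style: return a bool without ever building an error message."""
    return all(isinstance(item, str) for item in obj)


data_bad = list(range(200_000))
for checker in (is_valid_eafp, is_valid_direct):
    start = time.perf_counter()
    result = checker(data_bad)
    elapsed = time.perf_counter() - start
    print(f"{checker.__name__}: {result} in {elapsed:.6f} s")
```

Both checkers fail on the first element here, so any timing gap comes from formatting the message rather than from the type check itself.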
Holy bananas, bearman
Well @leycec that's a tour-de-force, right there. Incredibly useful! Now... go get some sleep, I guess? 😅 |
Whoa! That's a massive effort. I hope you indeed spent last weekend playing your entire catalogue of video games. 😅 First of all, thanks @leycec, @tbsexton, and @ice-tong for the in-depth discussion. It’s much appreciated. On a high level, for Plum to function, we basically need two things: (Thing 1) A function
>>> type([1, 2])
list # Desired: list[int]
As @leycec rightly says, implementing such a function, unfortunately, is a massive can of worms, and probably the biggest challenge to tackle.
If we did not need Thing 1 and only required the ability to check whether objects are of a certain type at runtime, then we could use @beartype all the way. Sadly, though, I do think multiple dispatch requires a
(Thing 2) For all types returned by
In addition to Thing 1 and Thing 2, we would really like the ability to define custom types which extend the type system. One such example would be something that we could call a "deferred type". By deferred types I mean types which can already be used for type hints, but which may not yet exist. Forward references, I suppose, are one such example. Another example would be
On Issue 1: Variable Runtime API of Type Hints
Plum currently attempts to deal with this by converting the type to a string and parsing the string. Yes, really. The parsed string is then wrapped in versions of
If at all possible, I would really like to get rid of this wrapping of types. (@PhilipVinc)
On Issue 2: Postponement
Forward references have been a pain to get right and are responsible for a fair chunk of the complexity of Plum. @leycec, you say that @beartype resolves PEP 563-postponed type hints. Is that something Plum could borrow, too?
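For reference, the standard library can already resolve a string forward reference once the referenced class exists. The sketch below uses `typing.get_type_hints` for that. It is neither Plum's nor @beartype's mechanism, and passing the class through `localns` is a choice made only to keep the example independent of where it runs.

```python
from typing import Union, get_type_hints


class MyClass:
    # "MyClass" cannot be referenced by name here because the class body is
    # still executing, so the hint is written as a string.
    def f(self, x: Union["MyClass", int]):
        return x


# After the class statement finishes, the string can be resolved to the class.
hints = get_type_hints(MyClass.f, localns={"MyClass": MyClass})
print(hints)
```

The catch for a dispatch library is timing: resolution can only happen after the class is fully defined, which is later than the moment a decorator on the method runs.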
You're right that

from typing import Union
from plum import dispatch

class MyClass:
    @dispatch
    def f(self, x: Union["MyClass", int]):
        return x

Briefly on this topic, there was a time when I thought the following would be a good idea: whenever a user desires a forward reference, require
On Issue 3: Ambiguity
Fortunately, I believe this should not be an issue, as long as Thing 1 and Thing 2 are soundly implemented. That is, For the above example,
Concluding Remark
I wholeheartedly agree. My sense is that it might be time for a |
Excellent exegesis, @wesselb! As always, because you're right about everything, I'm in agreement with everything. I'm also on the eroded precipice of releasing
Yes! You also want to do this in Again, @beartype can help you here. You might want to allow your users themselves to select between

# Type-check all items of a list in O(n) time.
from beartype import beartype, BeartypeConf, BeartypeStrategy

@beartype(conf=BeartypeConf(strategy=BeartypeStrategy.On))
def get_list_size_slowly(muh_list: list[str]) -> int:
    return len(muh_list)

The same support extends to our statement-level runtime checkers as well:

# Type-check all items of a list in O(n) time.
from beartype import BeartypeConf, BeartypeStrategy
from beartype.abby import die_if_unbearable

big_list_of_strings = ['Pretend', 'this', 'is', 'alotta', 'strings.']
# The hint to check against is passed as the second argument.
die_if_unbearable(big_list_of_strings, list[str], conf=BeartypeConf(strategy=BeartypeStrategy.On))

So, in synopsis:
Onward and upward, valiant typing soldiers!
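The trade-off between the two strategies can be sketched without @beartype. The example assumes the constant-time strategy works by inspecting a single randomly chosen item per call, which this thread does not spell out; both function names are hypothetical.

```python
import random


def is_list_of_str_all(items):
    """O(n): inspect every item, so a single bad item is always found."""
    return isinstance(items, list) and all(isinstance(x, str) for x in items)


def is_list_of_str_sampled(items, rng=random):
    """O(1): inspect one randomly chosen item, so a bad item may be missed."""
    if not isinstance(items, list):
        return False
    if not items:
        return True  # nothing to sample from an empty list
    return isinstance(rng.choice(items), str)


mostly_good = ["a"] * 999 + [1]
print(is_list_of_str_all(mostly_good))      # always False
print(is_list_of_str_sampled(mostly_good))  # usually True: the bad item is rarely sampled
```

For dispatch, a missed element means a call could be routed to the wrong method, which is one reason to let users choose the strategy per call site.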
Partial order, actually. But, yes! Exactly! A partial order is what builtin Python sets and frozen sets provide as well, I believe. The idea here is that you can only reasonably compare semantically commensurable type hints (i.e., type hints that are meaningfully related to one another). Examples include Type hints that are semantically incommensurable, however, always compare as
Thus, partial rather than total order. Everything above is shipping with
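A toy model of this partial order, restricted to flat unions and built on the set comparison mentioned above, might look as follows. `members` and `is_subhint` are hypothetical helpers, not the DOOR API, and the sketch ignores subclass relationships and nested hints entirely.

```python
from typing import Union, get_args, get_origin


def members(hint):
    """Return the member types of a flat union, or the hint itself."""
    if get_origin(hint) is Union:
        return frozenset(get_args(hint))
    return frozenset({hint})


def is_subhint(narrow, wide):
    """True if every member of narrow is also a member of wide."""
    return members(narrow) <= members(wide)


print(is_subhint(Union[int, str], Union[int, str, bool]))  # True
print(is_subhint(int, str), is_subhint(str, int))          # False False
```

The last line shows the "partial" part: `int` and `str` are incomparable, so neither direction holds.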
Yes! Blazing fast is @beartype's two middle names. Prod us if performance regressions rear their ugly mugs again and we'll promptly whack them with the Mallet of Justice.
Gah!
Depressingly, I just ran out of time. Oh, right! PEP 563-postponed type hints. That's critical. So,
>>> from beartype.door import TypeHint
# Currently fails, but definitely will work someday.
# I swear. Would @leycec lie? *sweat drips down forehead*
>>> resolved_typehint = TypeHint('Union[MuhClass, MuhOtherClass, bool]')
>>> resolved_typehint.hint
Union[MuhClass, MuhOtherClass, bool] # <-- everything resolved, yo 😓 |
Yes. 🤦 How very silly of me... You're totally right, of course! Let's pretend I said partial order all along. 😅
Oh, yes—please! If @beartype could be used to implement
The goodness just keeps on going! Also this sounds fantastic. :)
Please do prioritise your own work and don't let this issue take up too much of your time! I am very appreciative of all your time invested in the discussion thus far, @leycec, @ice-tong, @tbsexton. |
This commit is the first in a commit chain publicizing our previously private algorithm for resolving stringified type hints postponed by PEP 563 via ``from __future__ import annotations`` statements at the heads of user-defined modules. Specifically, this commit: * Defines a new public `beartype.peps` subpackage. * Defines a new public `beartype.peps.resolve_pep563()` function extracted from our previously private `beartype._decor._pep.pep563` submodule. This function is intended to be "the final word" on runtime resolution of PEP 563-postponed type hints. May no other third-party package suffer as we have suffered. This commit is for you, everyone. And "by everyone," we of course mostly mean @wesselb of [Plum](github.com/wesselb/plum) fame. See also beartype/plum#53. (*Very connective invective!*)
This minor release unleashes a major firestorm of support for **class decoration,** **colourful exceptions,** **pyright + PyLance + VSCode,** [PEP 484][PEP 484], [PEP 544][PEP 544], [PEP 561][PEP 561], [PEP 563][PEP 563], [PEP 585][PEP 585], [PEP 604][PEP 604], [PEP 612][PEP 612], and [PEP 647][PEP 647]. This minor release resolves a mammoth **29 issues** and merges **12 pull requests.** Noteworthy changes include: ## Compatibility Improved * **Class decoration.** The `@beartype` decorator now decorates both higher-level classes *and* lower-level callables (i.e., functions, methods), resolving feature request #152 kindly submitted by @posita the positively sublime. All possible edge cases are supported, including: * Classes defining methods decorated by builtin decorators: i.e., * Class methods via `@classmethod`. * Static methods via `@staticmethod`. * Property getters, setters, and deleters via `@property`. * Arbitrarily deeply nested (i.e., inner) classes. * Arbitrarily deeply nested (i.e., inner) classes whose type hints are postponed under [PEP 563][PEP 563]. Since this was surprisingly trivial, @leycec probably should have done this a few years ago. He didn't. This is why he laments into his oatmeal in late 2022. * **[PEP 484][PEP 484]- and [PEP 585][PEP 585]-compliant nested generics.** @beartype now supports arbitrarily complex [PEP 484][PEP 484]- and [PEP 585][PEP 585]-compliant inheritance trees subclassing non-trivial combinations of the `typing.Generic` superclass and other `typing` pseudo-superclasses, resolving issue #140 kindly submitted by @langfield (William Blake – yes, *that* William Blake). 
Notably, this release extricated our transitive visitation of the tree of all pseudo-superclasses of any PEP 484- and 585-compliant generic type hint (*...don't ask*) from its prior hidden sacred cave deep within the private `beartype._decor._code._pep._pephint` submodule into a new reusable `iter_hint_pep484585_generic_bases_unerased_tree()` generator, which is now believed to be the most fully-compliant algorithm for traversing generic inheritance trees at runtime. This cleanly resolved all lingering issues surrounding generics, dramatically reduced the likelihood of more issues surrounding generics, and streamlined the resolution of any more issues surrounding generics should they arise... *which they won't.* Generics: we have resoundingly beaten you. Stay down, please. * **[PEP 544][PEP 544] compatibility.** @beartype now supports arbitrarily complex [PEP 544][PEP 544]-compliant inheritance trees subclassing non-trivial combinations of the `typing.Protocol` + `abc.ABC` superclasses, resolving #117 kindly submitted by too-entertaining pun master @twoertwein (Torsten Wörtwein). Notably, `@beartype` now: * Correctly detects non-protocols as non-protocols. Previously, @beartype erroneously detected a subset of non-protocols as PEP 544-compliant protocols. It's best not to ask why. * Ignores both the unsubscripted `beartype.typing.Protocol` superclass *and* parametrizations of that superclass by one or more type variables (e.g., `beartype.typing.Protocol[typing.TypeVar('T')]`) as semantically meaningless in accordance with similar treatment of the `typing.Protocol` superclass. * Permits caller-defined abstract protocols subclassing our caching `beartype.typing.Protocol` superclass to themselves be subclassed by one or more concrete subclasses. Previously, attempting to do so would raise non-human-readable exceptions from the `typing` module; now, doing so behaves as expected. 
* Relaxed our prior bad assumption that the second-to-last superclass of all generics – and thus protocols – is the `typing.Generic` superclass. That assumption *only* holds for standard generics and protocols; non-standard protocols subclassing non-`typing` superclasses (e.g., the `abc.ABC` superclass) *after* the list `typing` superclass in their method resolution order (MRO) flagrantly violate this assumption. Well, that's fine. We're fine with that. What's not fine about that? **Fine. This is fine.** * Avoids a circular import dependency. Previously, our caching `beartype.typing.Protocol` superclass leveraged the general-purpose `@beartype._util.cache.utilcachecall.callable_cached decorator` to memoize its subscription; however, since that decorator transitively imports from the `beartype.typing` subpackage, doing so induced a circular import dependency. To circumvent this, a new `@beartype.typing._typingcache.callable_cached_minimal` decorator implementing only the minimal subset of the full `@beartype._util.cache.utilcachecall.callable_cached` decorator has been defined; the `beartype.typing` subpackage now safely defers to this minimal variant for all its caching needs. * **[PEP 563][PEP 563] compatibility.** @beartype now resolves [PEP 563][PEP 563]-postponed **self-referential type hints** (i.e., type hints circularly referring to the class currently being decorated). **Caveat:** this support requires that external callers decorate the *class* being referred to (rather than the *method* doing the referring) by the `@beartype` decorator. For this and similar reasons, users are advised to begin refactoring their object-oriented codebases to decorate their *classes* rather than *methods* with `@beartype`. 
* **[PEP 612][PEP 612] partial shallow compatibility.** @beartype now shallowly detects [PEP 612][PEP 612]-compliant `typing.ParamSpec` objects by internally associating such objects with our `beartype._data.hint.pep.sign.datapepsigns.HintSignParamSpec` singleton, enabling @beartype to portably introspect `Callable[typing.ParamSpec(...), ...]` type hints. * **Static type-checking.** @beartype is now substantially more compliant with static type-checkers, including: * **Microsoft [pyright](https://github.com/microsoft/pyright) + [PyLance](https://marketplace.visualstudio.com/items?itemName=ms-python.vscode-pylance) + [VSCode](https://visualstudio.com).** @beartype now officially supports pyright, Microsoft's in-house static type-checker oddly implemented in pure-TypeScript, <sup>*gulp*</sup> resolving issues #126 and #127 kindly submitted by fellow Zelda aficionado @rbroderi. Specifically, this release resolves several hundred false warnings and errors issued by pyright against the @beartype codebase. It is, indeed, dangerous to go alone – but we did it anyway. * **mypy `beartype.typing.Protocol` compatibility.** The @beartype-specific `beartype.typing.Protocol` superclass implementing [PEP 544][PEP 544]-compliant fast caching protocols is now fully compatible with mypy, Python's official static type-checker. Specifically, `beartype.typing.Protocol` now circumvents: * python/mypy#11013 by explicitly annotating the type of its `__slots__` as `Any`. * python/mypy#9282 by explicitly setting the `typing.TypeVar()` `bounds` parameter to this superclass. * **[PEP 647][PEP 647] compatibility.** @beartype now supports arbitrarily complex **[type narrowing](https://mypy.readthedocs.io/en/latest/type_narrowing.html)** in [PEP 647][PEP 647]-compliant static type-checkers (e.g., mypy, pyright), resolving issues #164 and #165 kindly submitted in parallel by foxy machine learning gurus @justinchuby (Justin Chuby) and @rsokl (Ryan Soklaski). 
Thanks to their earnest dedication, @beartype is now believed to be the most fully complete type narrower. Specifically, the return of both the `beartype.door.is_bearable()` function and corresponding `beartype.door.TypeHint.is_bearable()` method are now annotated by the [PEP 647][PEP 647]-compliant `typing.TypeGuard[...]` type hint under both Python ≥ 3.10 *and* Python < 3.10 when the optional third-party `typing_extensions` dependency is installed. Doing so substantially reduces false positives from static type checkers on downstream codebases deferring to these callables. Thanks so much for improving @beartype so much, @justinchuby and @rsokl! * **`@{classmethod,staticmethod,property}` chaining.** The `@beartype` decorator now implicitly supports callables decorated by both `@beartype` *and* one of the builtin method decorators `@classmethod`, `@staticmethod`, or `@property` regardless of decoration order, resolving issue #80 kindly requested by @qiujiangkun (AKA, Type Genius-kun). Previously, `@beartype` explicitly raised an exception when ordered *after* one of those builtin method decorators. This release relaxes this constraint, enabling callers to list `@beartype` either before or after one of those builtin method decorators. * **`beartype.vale.Is[...]` integration.** Functional validators (i.e., `beartype.vale.Is[...]`) now integrate more cleanly with the remainder of the Python ecosystem, including: * **IPython.** Functional validators localized to a sufficiently intelligent REPL (e.g., IPython) that caches locally defined callables to the standard `linecache` module now raise human-readable errors on type-checking, resolving issue #123 kindly submitted by typing brain-child @braniii. Relatedly, @beartype now permissively accepts both physical on-disk files and dynamic in-memory fake files cached with `linecache` as the files defining an arbitrary callable. 
* **NumPy,** which publishes various **bool-like tester functions** (i.e., functions returning a non-`bool` object whose class defines at least one of the `__bool__()` or `__len__()` dunder methods and is thus implicitly convertible into a `bool`). Functional validators now support subscription by these functions, resolving issue #153 kindly submitted by molecular luminary @braniii (Daniel Nagel). Specifically, @beartype now unconditionally wraps *all* tester callables subscripting (indexing) `beartype.vale.Is` with a new private `_is_valid_bool()` closure that (in order): 1. Detects when those tester callables return bool-like objects. 2. Coerces those objects into corresponding `bool` values. 3. Returns those `bool` values instead. * **Moar fake builtin types.**@beartype now detects all known **fake builtin types** (i.e., C-based types falsely advertising themselves as being builtin and thus *not* require explicit importation), succinctly resolving issue #158 kindly submitted by the decorous typing gentleman @langfield. Specifically, @beartype now recognizes instances of all of the following as fake builtin types: * `beartype.cave.AsyncCoroutineCType`. * `beartype.cave.AsyncGeneratorCType`. * `beartype.cave.CallableCodeObjectType`. * `beartype.cave.CallableFrameType`. * `beartype.cave.ClassDictType`. * `beartype.cave.ClassType`. * `beartype.cave.ClosureVarCellType`. * `beartype.cave.EllipsisType`. * `beartype.cave.ExceptionTracebackType`. * `beartype.cave.FunctionType`. * `beartype.cave.FunctionOrMethodCType`. * `beartype.cave.GeneratorCType`. * `beartype.cave.MethodBoundInstanceDunderCType`. * `beartype.cave.MethodBoundInstanceOrClassType`. * `beartype.cave.MethodDecoratorBuiltinTypes`. * `beartype.cave.MethodUnboundClassCType`. * `beartype.cave.MethodUnboundInstanceDunderCType`. * `beartype.cave.MethodUnboundInstanceNondunderCType`. * `beartype.cave.MethodUnboundPropertyNontrivialCExtensionType`. * `beartype.cave.MethodUnboundPropertyTrivialCExtensionType`. 
## Compatibility Broken * **Python 3.6.x support dropped.** This release unilaterally drops support for the Python 3.6.x series, which somnambulantly collided with its End-of-Life (EOL) a year ago and now constitutes a compelling security risk. Doing so substantially streamlines the codebase, whose support for Python 3.6.x required an unmaintainable writhing nest of wicked corner cases. We all now breathe a sigh of contentment in the temporary stillness of morning. * **`beartype.cave` deprecation removals.** This release removes all deprecated third-party attributes from the `beartype.cave` submodule. The continued existence of these attributes substantially increased the cost of importing *anything* from our mostly undocumented `beartype.cave` submodule, rendering that submodule even less useful than it already is. Specifically, this release removes these previously deprecated attributes: * `beartype.cave.NumpyArrayType`. * `beartype.cave.NumpyScalarType`. * `beartype.cave.SequenceOrNumpyArrayTypes`. * `beartype.cave.SequenceMutableOrNumpyArrayTypes`. * `beartype.cave.SetuptoolsVersionTypes`. * `beartype.cave.VersionComparableTypes`. * `beartype.cave.VersionTypes`. ## Exceptions Improved * **Colour** – the sensation formerly known as "color." @beartype now emits colourized type-checking violations (i.e., `beartype.roar.BeartypeCallHintViolation` exceptions) raised by both `@beartype`-decorated callables *and* statement-level type-checkers (e.g., `beartype.door.die_if_unbearable()`, `beartype.door.TypeHint.die_if_unbearable()`), resolving issue #161 kindly submitted by foxy machine learning expert @justinchuby (Justin Chu). When standard output is attached to an interactive terminal (TTY), ANSI-flavoured colours now syntactically highlight various substrings of those violations for improved visibility, readability, and debuggability. 
Since *all* actively maintained versions of Windows (i.e., Windows ≥ 10) now widely support ANSII escape sequences across both Microsoft-managed terminals (e.g., Windows Terminal) and Microsoft-managed Integrated Development Environments (IDEs) (e.g., VSCode), this supports extends to Windows as well. The bad old days of non-standard behaviour are behind us all. Thanks *so* much to @justinchuby for his immense contribution to the righteous cause of eye-pleasing user experience (UX)! * **Types disambiguated.** @beartype now explicitly disambiguates the types of parameters and returns that violate type-checking in exception messages raised by the `@beartype` decorator, resolving issue #124 kindly submitted by typing brain-child @braniii. Thus was justice restored to the QAverse. * **Stack frame squelched.** @beartype now intentionally squelches (i.e., hides) the ignorable stack frame encapsulating the call to our private `beartype._decor._error.errormain.get_beartype_violation()` getter from the parent type-checking wrapper function generated by the :mod:`beartype.beartype` decorator, resolving issue #140 kindly submitted by @langfield (William Blake – yes, *that* William Blake). That stack frame only needlessly complicated visual inspection of type-checking violations in tracebacks – especially from testing frameworks like :mod:`pytest` that recapitulate the full definition of the `get_beartype_violation()` getter (including verbose docstring) in those tracebacks. Specifically, this release: * Renamed the poorly named `raise_pep_call_exception()` function to `get_beartype_violation()` for clarity. * Refactored `get_beartype_violation()` to return rather than raise `BeartypeCallHintViolation` exceptions (while still raising all other types of unexpected exceptions for robustness). * Refactored type-checking wrapper functions to directly raise the exception returned by calling `get_beartype_violation()`. 
* **``None`` type.** The type of the ``None`` singleton is no longer erroneously labelled as a PEP 544-compliant protocol in type-checking violations. Let's pretend that never happened.
* **`beartype.abby.die_if_unbearable()` violations.** The `beartype.abby.die_if_unbearable()` validator function no longer raises non-human-readable exception messages prefixed by the unexpected substring `"@beartyped beartype.abby._abbytest._get_type_checker._die_if_unbearable() return"`. "Surely that never happened, @beartype!"

## Features Added

* **`beartype.door`.** @beartype now provides a new public framework for introspecting, sorting, and type-checking type hints at runtime in constant time. N-n-now... hear me out here. @leycec came up with a ludicrous acronym and we're going to have to learn to live with it: the **D**ecidedly **O**bject-**O**rientedly **R**ecursive (DOOR) API. Or, `beartype.door` for short. Open the door to a whole new type-hinting world, everyone. `beartype.door` enables type hint arithmetic via an object-oriented type hint class hierarchy encapsulating the crude non-object-oriented type hint declarative API standardized by the :mod:`typing` module, resolving issues #133 and #138 kindly submitted by Harvard microscopist and general genius @tlambert03. The new `beartype.door` subpackage defines a public:
  * `TypeHint({type_hint})` superclass, enabling rich comparisons between pairs of arbitrary type hints. Altogether, this class implements a partial ordering over the countably infinite set of all type hints. Pedagogical excitement ensues. Instances of this class efficiently satisfy both the `collections.abc.Sequence` and `collections.abc.FrozenSet` abstract base classes (ABC) and thus behave just like tuples and frozen sets over child type hints. Public attributes defined by this class include:
    * A pair of `die_if_unbearable()` and `is_bearable()` runtime type-checking methods, analogous in behaviour to the existing `beartype.abby.die_if_unbearable()` and `beartype.abby.is_bearable()` runtime type-checking functions.
    * `TypeHint.is_bearable()`, currently implemented in terms of the procedural `beartype.abby.is_bearable()` tester.
    * An `is_ignorable` property evaluating to `True` only if the current type hint is semantically ignorable (e.g., `object`, `typing.Any`). There exist a countably infinite number of semantically ignorable type hints. The more you know, the less you want to read this changeset.
    * The equality comparison operator (e.g., `==`), enabling type hints to be compared according to semantic equivalence.
    * Rich comparison operators (e.g., `<=`, `>`), enabling type hints to be compared and sorted according to semantic narrowing.
    * A sane `__bool__()` dunder method, enabling type hint wrappers to be trivially evaluated as booleans according to the child type hints subscripting the wrapped type hints.
    * A sane `__len__()` dunder method, enabling type hint wrappers to be trivially sized according to the child type hints subscripting the wrapped type hints.
    * A sane `__contains__()` dunder method, enabling type hint wrappers to be tested for child type hint membership – just like builtin sets, frozen sets, and dictionaries.
    * A sane `__getitem__()` dunder method, enabling type hint wrappers to be subscripted by both positive and negative indices as well as slices of such indices – just like builtin tuples.
  * `beartype.door.AnnotatedTypeHint` subclass.
  * `beartype.door.CallableTypeHint` subclass.
  * `beartype.door.LiteralTypeHint` subclass.
  * `beartype.door.NewTypeTypeHint` subclass.
  * `beartype.door.TupleTypeHint` subclass.
  * `beartype.door.TypeVarTypeHint` subclass.
  * `beartype.door.UnionTypeHint` subclass.
  * `is_subtype({type_hint_a}, {type_hint_b})` function, enabling @beartype users to decide whether any type hint is a **subtype** (i.e., narrower type hint) of any other type hint.
  * `beartype.roar.BeartypeDoorNonpepException` type, raised when the `beartype.door.TypeHint` constructor is passed an object that is *not* a PEP-compliant type hint currently supported by the DOOR API.

  Thanks so much to @tlambert03 for his phenomenal work here. He ran GitHub's PR gauntlet so that you did not have to. Praise be to him. Some people are the living embodiment of quality. @tlambert03 is one such people.
* **`beartype.peps`.** @beartype now publicizes runtime support for `typing`-centric Python Enhancement Proposals (PEPs) that currently lack official runtime support via a new public subpackage: `beartype.peps`. Notably, @beartype now provides:
  * A new public `beartype.peps.resolve_pep563()` function resolving [PEP 563][PEP 563]-postponed type hints on behalf of third-party Python packages. This function is intended to be "the final word" on runtime resolution of [PEP 563][PEP 563]. May no other third-party package suffer as we have suffered. This commit is for you, everyone. And "by everyone," we of course mostly mean @wesselb of [Plum](github.com/wesselb/plum) fame. See also beartype/plum#53.
* **`beartype.vale.Is*[...] {&,|}` short-circuiting.** `&`- and `|`-chained beartype validators now explicitly short-circuit when raising human-readable exceptions from type-checking violations against those validators, resolving issue #125 kindly submitted by typing brain-child @braniii.
## Features Optimized

* **`beartype.abby.is_bearable()` when returning `False`.** Previously, the public `beartype.abby.is_bearable()` runtime type-checker behaved reasonably optimally when the passed object satisfied the passed type hint but *extremely* suboptimally when that object violated that hint; this was due to our current naive implementation of that tester using the standard Easier to Ask for Forgiveness than Permission (EAFP) approach. This release fundamentally refactored `beartype.abby.is_bearable()` in terms of our new private `beartype._check.checkmake.make_func_tester()` type-checking tester function factory function. Ad-hoc profiling shows a speedup on the order of eight orders of magnitude – the single most intense optimization @beartype has ever brought to bear (*heh*). Our core code generation API now transparently generates both:
  * **Runtime type-checking testers** (i.e., functions merely returning ``False`` on type-checking violations).
  * **Runtime type-checking validators** (i.e., functions raising exceptions on type-checking violations).
* **[PEP 604][PEP 604]-compliant new unions** (e.g., `int | str | None`). Since these unions are **non-self-caching type hints** (i.e., hints that do *not* implicitly cache themselves to reduce space and time consumption), @beartype now efficiently coerces these unions into singletons in the same manner as [PEP 585][PEP 585]-compliant type hints – which are similarly non-self-caching.

## Features Deprecated

* **`beartype.abby` → `beartype.door`.** This release officially deprecates the poorly named `beartype.abby` subpackage in favour of the sorta less poorly named `beartype.door` subpackage, whose name actually means something – even if that something is a punny acronym no one will ever find funny. Specifically:
  * `beartype.abby.die_if_unbearable()` has been moved to `beartype.door.die_if_unbearable()`.
  * `beartype.abby.is_bearable()` has been moved to `beartype.door.is_bearable()`.

  To preserve backward compatibility, the `beartype.abby` subpackage continues to dynamically exist (and thus be importable from) – albeit as a deprecated alias of the `beartype.door` subpackage.

## Deprecations Resolved

* **Setuptools licensing.** This release resolves a mostly negligible `setuptools` deprecation warning concerning the deprecated `license_file` setting in the top-level `setup.cfg` file. *Next!*

## Tests Improved

* **[PEP 544][PEP 544] compatibility.** All [PEP 544][PEP 544]-specific test type hints have been generalized to apply to both the non-caching `typing.Protocol` superclass *and* our caching `beartype.typing.Protocol` superclass.
* **[PEP 561][PEP 561] compatibility via pyright.** Our test suite now enforces static type-checking with `pyright`. Notably:
  * A new `test_pep561_pyright` functional test statically type-checks the @beartype codebase against the external `pyright` command in the current `${PATH}` (if available) specific to the version of the active Python interpreter currently being tested. For personal sanity, this test is currently ignored on remote continuous integration (CI) workflows. Let this shrieking demon finally die!
  * The private `beartype_test.util.cmd.pytcmdrun` submodule underlying our cross-platform portable forking of testing subprocesses now transparently supports vanilla Windows shells (e.g., `CMD.exe`, PowerShell).
* **Tarball compatibility.** `beartype` may now be fully tested from non-`git` repositories, including source tarballs containing the `beartype_test` package. Previously, three functional tests making inappropriate assumptions about the existence of a top-level `.git/` directory failed when exercised from a source tarball.
* **Sphinx documentation.** Our test suite now exercises that our documentation successfully builds with Sphinx via a new `test_sphinx_build()` functional test.
This was surprisingly non-trivial – thanks to the `pytest`-specific `sphinx.testing` subpackage being mostly undocumented, behaving non-orthogonally, and suffering a host of unresolved issues that required we monkey-patch the core `pathlib.Path` class. Insanity, thy name is Sphinx.
* **GitHub Actions dependencies bumped.** This release bumps our GitHub Actions-based continuous integration (CI) workflows to both the recently released `checkout@v3` and `setup-python@v3` actions, inspired by a pair of sadly closed PRs by @RotekHandelsGmbH CTO @bitranox (Robert Nowotny). Thanks so much for the great idea, @bitranox!
* **`beartype.door` conformance.** A new smoke test guarantees conformance between our DOOR API and abstract base classes (ABCs) published by the standard `typing` module.
* **python/mypy#13627 circumvention.** This release pins our GitHub Actions-based CI workflow to Python 3.10.6 rather than 3.10.7, resolving a mypy-specific complaint inducing spurious test failures.

## Documentation Improved

* **[`beartype.abby` documented](https://github.com/beartype/beartype#beartype-at-any-time-api).** The new "Beartype At Any Time API" subsection of our front-facing `README.rst` file now documents our public `beartype.abby` API, resolving issue #139 kindly submitted by @gelatinouscube42 (i.e., the user whose username is the answer to the question: "What is the meaning of collagen sustainably harvested from animal body parts?").
* **[GitHub Sponsors activated](https://github.com/sponsors/leycec).** @beartype is now proudly financially supported by **GitHub Sponsors.** Specifically, this release:
  * Defines a new GitHub-specific funding configuration (i.e., `.github/FUNDING.yml`).
  * Injects a hopefully non-intrusive advertising template <sup>*gulp*</sup> at the head of our `README.rst` documentation.
* **Sphinx configuration sanitized.** As the first tentative step towards chain refactoring our documentation from its current monolithic home in our top-level `README.rst` file to its eventual modular home at [ReadTheDocs (RTD)](https://beartype.readthedocs.io), en-route to resolving issue #8 (!) kindly submitted a literal lifetime ago by visionary computer vision expert and long-standing phenomenal Finn @felix-hilden (Felix Hildén):
  * Our core Sphinx configuration has been resurrected from its early grave – which now actually builds nothing without raising errors. Is this an accomplishment? In 2022, mere survival is an accomplishment! So... *yes.* Significant improvements include:
    * Activation and configuration of the effectively mandatory `autosectionlabels` builtin Sphinx extension.
  * Our `doc/source/404.rst` file has been temporarily moved aside, resolving a non-fatal warning pertaining to that file. Look, we're not here to actually solve deep issues; we're here to just get documentation building, which it's not. Sphinx, you have much to answer for.
  * Our top-level `sphinx` entry point now:
    * Temporarily disables Sphinx's nit-picky mode (i.e., the `-n` option previously passed to `sphinx-build`) due to Sphinx's `autodoc` extension locally failing to generate working references.
    * Unconditionally disables Sphinx caching by forcing *all* target documentation files to be rebuilt regardless of whether their underlying source files have since been modified or not, obviating spurious build issues.
[PEP 484]: https://www.python.org/dev/peps/pep-0484/
[PEP 544]: https://www.python.org/dev/peps/pep-0544/
[PEP 561]: https://www.python.org/dev/peps/pep-0561/
[PEP 563]: https://www.python.org/dev/peps/pep-0563/
[PEP 585]: https://www.python.org/dev/peps/pep-0585/
[PEP 604]: https://www.python.org/dev/peps/pep-0604/
[PEP 612]: https://www.python.org/dev/peps/pep-0612/
[PEP 647]: https://www.python.org/dev/peps/pep-0647/

(*Impossible journey on an implacable placard-studded gurney!*)
💥 Just did things. The new Plum-friendly
|
@leycec @wesselb So --- and this might be totally naive --- do we need a Hear me out:

### Dispatch means we know what we want

Ostensibly, any user of multiple dispatch here isn't actually trying to dispatch on every extant type possible, but rather they register allowable types and provide a mechanism for extensibility. This means that, at runtime, we're not actually needing to know "what is the You might be thinking (as I did, previously) "hey wait! I have to know what the type is in order to know if it's registered...right?" And that's kind of correct. Except that it happens we have a "magic oracle" a la So, yeah, for a given input, if the "nice" type information isn't known at runtime (e.g. complex, nested iterables, and anything that

### But we can do better, right?

So, this is a lot like the good old nearest-neighbor problem, right? (no? Not following? alright, hang on...) If I have a bunch of data inputs and outputs, and some new data comes along, I want to guess what the output should be. If I were But we aren't usually By imposing a hierarchy onto the space of data (e.g. different sized binning of the Cartesian coordinates) we can ask fewer, simpler questions... is my new point in the left or right half of my data? Ok, now in the left or right of that? And that? This is a (really awful) way of describing a binary search tree (more like a kd-Tree). But anyways, we do have hierarchy information, no? There's a partial order on types, and iirc containers can be treated as a tree of sorts... Let's make a prefix tree of some kind, that includes everything

### what this looks like

brief sketch
For your example of infinite lists, then either
### post script

Technically, we don't even need the first step... Admittedly, this is going to run in O(log n) for n types known a priori (whether all known types, or just the dispatched types, see above). My assumption here is that the number of possible input types is going to WAY dwarf the number of types a dispatcher is registered to, so I'd personally rather scale by the latter. ALSO, we can always memoize this, so that if we do find a decent type to "match" our input, it goes right to the O(1) lookup table? |
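The search-tree idea above can be sketched in plain Python. This is only an illustration under stated assumptions: `TypeNode` and `Resolver` are hypothetical names (not part of Plum or beartype), only plain classes are handled (so `issubclass()` and `isinstance()` stand in for the `beartype.door` partial order), and ambiguity between unrelated matching branches is resolved by registration order.

```python
from numbers import Number


class TypeNode:
    """One registered type plus the registered types that narrow it."""

    def __init__(self, tp):
        self.tp = tp
        self.children = []

    def insert(self, tp):
        # Descend into the first child that `tp` narrows.
        for child in self.children:
            if issubclass(tp, child.tp):
                child.insert(tp)
                return
        # Otherwise attach here, adopting any children that `tp` widens.
        node = TypeNode(tp)
        kept = []
        for child in self.children:
            (node.children if issubclass(child.tp, tp) else kept).append(child)
        kept.append(node)
        self.children = kept

    def find(self, obj):
        # Walk down as far as `obj` keeps matching a narrower type.
        for child in self.children:
            if isinstance(obj, child.tp):
                return child.find(obj)
        return self.tp


class Resolver:
    """Hypothetical helper: most specific registered type for an object."""

    def __init__(self, registered):
        self.root = TypeNode(object)
        for tp in registered:
            self.root.insert(tp)
        self.cache = {}  # memo table: type(obj) -> resolved type (or None)

    def resolve(self, obj):
        key = type(obj)
        if key not in self.cache:
            found = self.root.find(obj)
            self.cache[key] = None if found is object else found
        return self.cache[key]


resolver = Resolver([Number, int, str, float])
print(resolver.resolve(2), resolver.resolve(1j), resolver.resolve([]))
```

The tree is only as large as the set of registered types, so the walk scales with what the dispatcher knows about rather than with every possible input type, and the memo table turns repeat lookups into a dictionary access.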
Yup. Smarty-pants NIST scientist @tbsexton with the game-winning layup at the buzzer. A binary search tree-ish monstrosity administered by It's germane for me to admit at this critical juncture that @beartype currently does not deeply type-check nearly as much as everyone believes it does: e.g.,
>>> from beartype.door import is_bearable
>>> is_bearable({'This is...': 'kinda non-ideal.'}, dict[bool, float])
True  # <-- oh gods wth is this horror show
The reason being I haven't found a spare hot minute to throw at deep type-checking. This is embarrassing, because that's what we're supposed to do. I kinda got distracted by other deceptively enticing open feature requests like documenting everything on ReadTheDocs (RTD) 😬 and the implicit numeric tower and Maybe I'd just better do deep type-checking, instead. I'm confident I can spin up deep type-checking support for sets and mappings (i.e., dictionaries), which hopefully covers most of the remaining type hints |
Hah I mean, we'll see if this is viable, and monstrous just sounds fun right? :P As for deep-checking, this is strictly for (based on the nasty example we saw starting this whole thing) dict, set, and tuple? Since those don't randomly sample without some ctypes shenanigans? Hmm... I see your door api already does this? e.g.
>>> TypeHint(T.Dict[T.Any, int]) > TypeHint(T.Dict[bool,int])
True
>>> TypeHint(T.Dict[T.Any, int]) < TypeHint(T.Dict[bool,int])
False
>>> TypeHint(T.Dict[str, T.Dict]) > TypeHint(T.Dict[str,T.Dict[str,int]])
True
>>> TypeHint(T.Dict[str, T.Dict[str, T.Dict]]) > TypeHint(T.Dict[str,T.Dict[str,int]])
False
So yeah that's awesome. AND we now know, a priori, how deep a given dispatcher needs to go (since by construction we are assuming the universe of possible types is given by the dispatcher). For my case, if my dispatcher had the audacity to register
>>> my_json = {'look': {'forthe': {'bear': 'necessities'}}}
>>> is_bearable(my_json, T.Dict)  # <-- the root check
True
>>> is_bearable(my_json['look'], T.Dict)  # <-- the 1st-level child check
True
>>> is_bearable(my_json['look']['forthe'], T.Dict)  # <--- I have no memory of this place
True
>>> is_bearable(my_json['look']['forthe']['bear'], str)  # this way! the air is not so foul down here
True
I skipped the key checks, but the point is the same...each time it's just a check against the generic parameter, and whatever we found inside the corresponding container in what got passed? We don't need to randomly access anything, since we know a priori what exactly should be at every level of the registered type hint. Am I missing something really obvious? I'm running a tad low on sleep... 😅 EDIT: I thought of an easier way to describe that monologue (thank you, sleep)...
We only really need the search tree for the first one, because parameters are either also needing the first one, or also have their own parameter, and down the recursion we go, until we hit the max depth specified by the dispatch registry (this part is important!) Obviously it's easier (for us, anyway) if beartype handled both types in one call, with deep-checks. But frankly I'm too excited for real documentation and whatever |
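The "recurse only as deep as the registered hint goes" idea can be sketched with the standard `typing` introspection helpers. This is a minimal illustration, not beartype's implementation: `conforms` is a hypothetical name, only `dict` hints are descended into, and every item is checked exhaustively rather than sampled.

```python
from typing import Any, Dict, get_args, get_origin


def conforms(obj, hint):
    """Check `obj` against `hint`, descending only as deep as `hint` does."""
    if hint is Any:
        return True
    origin = get_origin(hint)
    if origin is None:
        # Unsubscripted class such as `str` or `int`.
        return isinstance(obj, hint)
    if not isinstance(obj, origin):
        return False
    args = get_args(hint)
    if origin is dict and args:
        key_hint, value_hint = args
        # The hint tells us exactly what to expect one level down.
        return all(
            conforms(key, key_hint) and conforms(value, value_hint)
            for key, value in obj.items()
        )
    # Bare `Dict` (no parameters) or another container: shallow check only.
    return True


my_json = {'look': {'forthe': {'bear': 'necessities'}}}
print(conforms(my_json, Dict[str, Dict]))
print(conforms(my_json, Dict[str, Dict[str, Dict[str, int]]]))
```

A hint like `Dict[str, Dict]` stops the recursion after one level, which is the point made above: the depth of the check is set by what the dispatcher registered, not by how deeply the passed object happens to be nested.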
Ah, ha. High-IQ NIST researcher is high-IQ, which explains why one of us works at NIST while the other lives in a cabin in the woods. So... you're absolutely right. I think. I mean, that's the safe bet here, isn't it? See also our respective workplaces. As you astutely observe, the subset of the Likewise, the subset of the |
Hello, @leycec and @tbsexton! I'm very sorry for the radio silence on my part. Between wrapping up the PhD and starting a full-time job I've not managed to find time for everything I've been wanting to do, but I'm finally slowly finding a balance.
You're onto something here! In fact, I think this is a really good idea. As you say, this completely avoids the need to implement a function Your proposal has made something really clear to me. In Python, in many practical scenarios, the type of an object is not sufficient to appropriately perform dispatch. That is, whereas we would hope that
isinstance(x, Type) == issubclass(type(x), Type)
this identity is often not true. This is made really clear by #66, which is a feature request for support for literal types. This insight makes me believe that your proposed approach would be a strictly better way of doing things, if not for one downside: caching. Since the type of an object cannot be relied upon to find the right method, caching would not be possible. I would therefore propose the following design, which attempts to combine the best of both approaches. Let us call a type `Type` *faithful* if
isinstance(x, Type) == issubclass(type(x), Type)
holds true. If all types in the dispatch tree (I guess we could use this terminology?) for a function are faithful, then dispatch only depends on the types of the arguments, which means that caching is possible again. Therefore, whenever all methods of a function only use faithful types, fast method lookup is possible. This gives the user the ability to (1) only use faithful types and get optimal performance or (2) use more complicated types at the cost of a small performance penalty. I think (1) is very important for functions which are called very often. I think performing dispatch only based on
What @beartype currently already does is amazing! I'm absolutely sure we can find a workaround for deep type checking that suffices in the vast majority of use cases. @tbsexton's proposal seems like an excellent way to go about this in a first iteration. |
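The faithful-type proposal above amounts to a caching policy, which can be sketched as follows. Everything here is hypothetical: `is_faithful`, `CachedDispatcher`, and `slow_resolve` are illustrative names rather than Plum's API, and treating every plain, unsubscripted class as faithful is an assumption (a class with a custom `__instancecheck__` could break it).

```python
from typing import Literal, get_origin


def is_faithful(hint):
    # Assumption: plain, unsubscripted classes satisfy
    # isinstance(x, T) == issubclass(type(x), T); anything else
    # (e.g. Literal[1], list[int]) is treated as unfaithful.
    return isinstance(hint, type) and get_origin(hint) is None


class CachedDispatcher:
    """Cache lookups on type(obj) only when every hint is faithful."""

    def __init__(self, hints, slow_resolve):
        self.slow_resolve = slow_resolve  # hypothetical: obj -> implementation
        self.cacheable = all(is_faithful(hint) for hint in hints)
        self.cache = {}

    def lookup(self, obj):
        if not self.cacheable:
            # The value matters, not just the type: resolve every time.
            return self.slow_resolve(obj)
        key = type(obj)
        if key not in self.cache:
            self.cache[key] = self.slow_resolve(obj)
        return self.cache[key]


fast = CachedDispatcher([int, str], slow_resolve=lambda obj: type(obj).__name__)
slow = CachedDispatcher([int, Literal[1]], slow_resolve=lambda obj: type(obj).__name__)
print(fast.cacheable, slow.cacheable)  # True False
```

Registering a single unfaithful hint such as `Literal[1]` switches the whole function to the slow path in this sketch, which matches option (2) above: more expressive signatures at the cost of a lookup per call.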
Still digesting, but lemme point out something real quick:
So there's two separate "trees" in my proposal. One is the dispatch tree:
This is where your faithful vs. not faithful thing is most relevant, because if I have a partial order over types, I can guarantee a cacheable dispatch function (dispatch only depends on types). So-called unfaithful types are ones that break the partial order somehow... In your case given, they aren't subclasses. BUT the beartype.door api solves that. Theoretically, every possible type hint is now fully "faithful", as long as we a) have the type hint, and b) use the TypeHint wrapper with the But now I've assumed I know the type, so...
This is where we really don't have a good answer in general. Instead we have a nearly perfect chance of answering "is it a subtype of this other type (yes/no)" using beartype So we construct a binary search tree of TypeHints we know/care about (which we can make really easily using the first tree via So, two separate trees. Technically, the second is a tree the dispatcher builds for us from the first one, since we already get the first one, for free, from beartype partial order. Because they're the same tree, but we have to find clever subsets of the (infinite?) first one to avoid making |
What about a method
Hmmm, as far as I can see,
Yes! This is super nice and makes a whole lot of sense!
Right, gotcha! I think I understand the distinction you're making. |
OH I gotcha. Ok so in this case, I assumed we would have an ambiguity resolution precedence. Either the order the dispatch rules were defined in the code, or (my preference) the most specific applicable rule always gets applied. E.g.
Which is only true if If they have defined both, I think the search tree should continue as far down as it can, to obtain the most specific/narrow applicable type. Similar to how classes work, just simpler with beartype resolution checking. Theoretically this would be cacheable (I hope!) 😄 |
I think this is what should happen! (And how the current design works.) Hmm, just to be sure we're on the same page, let me walk through a slightly more elaborate example. Consider the code

from numbers import Number
from typing import Literal

from plum import dispatch


@dispatch
def f(x: str):
    return "str!"


@dispatch
def f(x: Number):
    return "number!"


@dispatch
def f(x: int):
    return "int!"


@dispatch
def f(x: float):
    return "float!"


@dispatch
def f(x: Literal[1]):
    return "the literal one!"

This registers five methods for the function
Note that
>>> isinstance(1, Literal[1])
True
>>> issubclass(type(1), Literal[1])
False
We desire the following behaviour:
>>> f("1")
'str!'
>>> f(1)
'the literal one!'
>>> f(1j)
'number!'
>>> f(1.0)
'float!'
>>> f(2)
'int!'
>>> f(object())
MethodNotFoundError: cannot find an appropriate method
To determine the right method, there are two ways of doing this: The
|
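The desired behaviour listed above can be reproduced with a small value-based resolver that picks the most specific applicable hint. This is a sketch under stated assumptions, not how Plum resolves methods: `matches`, `narrower`, and `resolve` are hypothetical helpers, only plain classes and `Literal` hints are supported, and a plain `LookupError` stands in for the library's own error type.

```python
from numbers import Number
from typing import Literal, get_args, get_origin


def is_literal(hint):
    return get_origin(hint) is Literal


def matches(obj, hint):
    """Value-level check: does `obj` satisfy `hint`?"""
    if is_literal(hint):
        # Compare types too, so that 1.0 and True do not match Literal[1].
        return any(type(obj) is type(v) and obj == v for v in get_args(hint))
    return isinstance(obj, hint)


def narrower(a, b):
    """True if hint `a` is at least as specific as hint `b`."""
    if is_literal(a):
        if is_literal(b):
            return set(get_args(a)) <= set(get_args(b))
        return all(isinstance(v, b) for v in get_args(a))
    if is_literal(b):
        return False
    return issubclass(a, b)


def resolve(methods, obj):
    """Return the implementation whose hint is narrowest among all matches."""
    candidates = [(hint, impl) for hint, impl in methods if matches(obj, hint)]
    if not candidates:
        raise LookupError("cannot find an appropriate method")
    best = [
        impl
        for hint, impl in candidates
        if all(narrower(hint, other) for other, _ in candidates)
    ]
    if len(best) != 1:
        raise LookupError("ambiguous methods")
    return best[0]


METHODS = [
    (str, lambda x: "str!"),
    (Number, lambda x: "number!"),
    (int, lambda x: "int!"),
    (float, lambda x: "float!"),
    (Literal[1], lambda x: "the literal one!"),
]


def f(x):
    return resolve(METHODS, x)(x)


print(f("1"), f(1), f(1j), f(1.0), f(2))
```

Because `Literal[1]` is matched on the value rather than on `type(x)`, the result for `f(1)` and `f(2)` differs even though both arguments are `int`, which is exactly why a cache keyed on argument types alone cannot be used here.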
Ahhh I got you. Ok, I have some deadlines here but very briefly let me clarify what parts I would hope are (ostensibly) cacheable: First, I don't think we can ever really hope to use the native
since a Literal isn't a subclass of Int. However, I'm actually imagining a compilation and 2x passes here.
OK so now that I've said that, I think both 2 and 3 are still cacheable. They are (pure) functions, since every input will deterministically end at either a type in the tree or fail if no type is available. But as you note, you wouldn't be caching on types in 2, only in 3. But if you think it's needed (e.g. if a user passes the same x over and over such that memoizing on I generally think that relying on any kind of SO yeah I think the
^[1]: as a sidenote, if users want to do numeric type checks, I would direct them to the (beartype-fueled) numerary since they're another project running into type issues and trying to solve them 📦
^[2]: @leycec I will say, if you're looking for feature requests... damn I want a clean syntax for sum types/tagged union so. bad.. I don't want to have to add a janky |
Yes. This makes my eye twitch even more spasmodically than it usually does:
>>> issubclass(type(1), Literal[1])
False I hear the Jaws theme when I see horrors like that.
I see your " My bald-faced lies are hard to believe, but all of the runtime type-checking performed by @beartype (including Okay. It's actually
Friends don't let friends pydantic. I kid, of course! Pydantic is probably great – at least as a temporary stopgap until @beartype gracefully absorbs pydantic's feature set into itself like some gurgling tinfoil amoeba from Mystery Science Theatre 3000.
...this is not a thing people should be doing:

from dataclasses import dataclass
from enum import Enum

# Horrible sum type is horrible.
@dataclass
class SumType:
    kind: Enum
    data: object

# Rando enumeration is rando.
Token = Enum('Token', ['Number', 'Operator', 'Identifier', 'Space', 'Expression'])
muh_horrible_sum_type = SumType(Token.Number, 42)

That's awful, as both the

from typing import Literal, TypedDict, Union

class NewJobEvent(TypedDict):
    tag: Literal["new-job"]
    job_name: str
    config_file_path: str

class CancelJobEvent(TypedDict):
    tag: Literal["cancel-job"]
    job_id: int

Event = Union[NewJobEvent, CancelJobEvent]

def process_event(event: Event) -> None:
    # Since we made sure both TypedDicts have a key named 'tag', it's
    # safe to do 'event["tag"]'. This expression normally has the type
    # Literal["new-job", "cancel-job"], but the check below will narrow
    # the type to either Literal["new-job"] or Literal["cancel-job"].
    #
    # This in turns narrows the type of 'event' to either NewJobEvent
    # or CancelJobEvent.
    if event["tag"] == "new-job":
        print(event["job_name"])
    else:
        print(event["job_id"])

Seriously? Expensive
Gah! You have discovered all my undocumented secrets. I... I was pretty sure nobody even knew about that API. You're like some kind of GitHub bloodhound, sniffing out the bitrot and skeletons in my commit history. This is why you work at NIST.
You're in luck then! In addition to being stupidly faster than prior CPython versions, the just-released CPython 3.11.0 publishes an official
No. No! Oh, please GitHub Gods above (...so, Satya Nadella), no!!! I think I may be hyperventilating here.
Yes! Do this for me, please. Do everything for me and then I will make this happen. Actually, this leads to an interesting factotum I just recalled. Did you know that
Combining those two realizations, we can exhaustively type enumeration members:

from enum import Enum
from typing import Literal, assert_never

from beartype import beartype

# Rando enumeration is rando.
Token = Enum('Token', ['Number', 'Operator', 'Identifier', 'Space', 'Expression'])
TokenKind = Literal[
    Token.Number,
    Token.Operator,
    Token.Identifier,
    Token.Space,
    Token.Expression,
]
TokenTypes = int | str | OTHER_STUFF_I_GUESS

@beartype
def muh_tokenizer(token: TokenTypes, token_kind: TokenKind):
    match token_kind:
        case Token.Number:
            ...
        case Token.Identifier:
            ...
        # blah-blah-blah!
        case _:
            # assert_never() requires the value that should be unreachable.
            assert_never(token_kind)

@beartype supports that already. I think. No quotes, please. Of course, that's just a really bad form of multiple dispatch. Plum would solve the same quandary considerably more elegantly, assuming
```python
from enum import Enum
from typing import Literal

from plum import dispatch

# Rando enumeration is rando.
Token = Enum('Token', ['Number', 'Operator', 'Identifier', 'Space', 'Expression'])

@dispatch
def muh_tokenizer(token: int, token_kind: Literal[Token.Number]): ...

@dispatch
def muh_tokenizer(token: str, token_kind: Literal[Token.Space]): ...
```
Assuming But you probably want something more Rust-like, right? Like, a full-blown
@posita: People love |
@tbsexton, I should've clarified that in all instances of
>>> TypeHint(Literal[1]) <= TypeHint(int)
True
Isn't this the function Ok then! I think we've converged onto an approach that should generally work: the I was not aware of I was also not aware of tagged unions, and I think I'd rather stay unaware. That's ugly. |
@wesselb spitting hard truths as always. I applaud this timely advice, too:
|
Right, sorry, let me fix that sentence:
rather than:
The second is the bad one. But a function that goes As for why sum types (discriminated/tagged unions), it's actually a way we can a priori do dispatching for a bunch of complicated types, without needing to validate or check which subtype gets dispatched to, a priori. See this tidbit from the pydantic docs:
So what I imagined here was a more general "validate()" function that is effectively doing, you guessed it, multiple dispatch, where I don't have to do a bunch of So this is all a bit of a tangent right? well... it's connected, I promise! I came across a brilliant SO comment chain that illustrates what I'm talking about. With "Untagged", i.e. normal
But hang on, how do we seem to get along fine without tagged types in other languages?
SO we've come full circle! Hah! The entire issue here is that in reality python has what amounts to no real support for @leycec so this is (kinda) getting out of scope here, so I might move this, but...you mention
So yeah, and in reality pattern matching can kinda do this to an extent. And plum is definitely the way to do sum type destructuring, IMO. But my issue is with sum type construction, so what bothers me is the need for these annoying little (user-defined) tags, such that no two libraries implementing sum types will ever really have a common protocol. See: the pydantic thing above, or the janky mypy SO what I'm thinking is... beartype could (probably?) be... coaxed... into emulating type tags (and therefore, ADTs muahahaha). After all, we're pretty much talking about one more kind of type arithmetic that needs to be supported (the disjoint union), and the new TypeHint object seems like a perfect candidate for representing something's true type information as metadata. |
Is a function
What would an ideal solution to this problem look like to you? I.e., what would you really like the code to look like?
I think I'm becoming increasingly convinced that we might not be able to do an implementation of |
Perhaps technically correct in that more than one person qualifies as "people" (shout out to @tbsexton), but if I'm famous, someone forgot to tell me. Maybe I should contact the Department of Internet Money and see how many millions I'm entitled to. 😉
Thanks! And by "nice work" I will take it to mean the type that never should have been required in the first place and should be obsoleted by the standard library as quickly as possible. The bad news is that we're not there yet. The good news is that there's a discussion, if that's something that interests you. The bad news is that it has kind of stalled. The good news is ... hang on, I'm thinking ... I've been spending some casual time wrestling with what an algorithmic generic might look like. The bad news is that I haven't gotten very far, probably because it's hard and because I've been distracted by other things that life has a way of throwing at one wrestling with tough problems without the benefit of complete creative freedom afforded by elective solitude and inherited financial independence. Alas, I fear I will never qualify for such niceties, and must make do with what I do have (and for which I am grateful). Patience is humbly appreciated! 🙇 |
@wesselb For that last example, "stuff that implements
class Addable(T.Protocol):
def __add__(self, other):
...
TypeHint(T.Literal[1]) < TypeHint(int) < TypeHint(Addable)
Even though we're in structural-typing land now,
I suppose I should start working on this 😅 Gimme a little bit, I'll see if I can sketch some things out. |
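As a rough illustration of the structural relationship in the `Addable` example, the standard library alone can already confirm that `int` satisfies such a protocol without inheriting from it. This sketch uses `typing.runtime_checkable` rather than beartype's `TypeHint` ordering, so it only shows the subtype check, not the full partial order; the variable names are made up for the example.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Addable(Protocol):
    """Anything that defines __add__ counts as Addable (structural typing)."""

    def __add__(self, other): ...


# int never inherits from Addable, yet it satisfies the protocol structurally.
int_is_addable = issubclass(int, Addable)

# The same holds at the instance level.
one_is_addable = isinstance(1, Addable)

# A bare object() has no __add__, so it does not qualify.
object_is_addable = isinstance(object(), Addable)
```

Runtime protocol checks only look for the presence of the method, not its signature, so this is a coarser test than a full type-hint comparison.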
@tbsexton I fully agree that DOOR works beautifully here and gives us the order that we’re after! (@leycec, this really is fantastic!) My point was that implementing a function typeof in this case is like solving the dispatch problem in the first place. That is, the logic to determine that 2 should go to int is in this case equivalent to determining that f(2) should dispatch to the method implemented by int. But perhaps we’re in agreement! I should have a prototype working soon(tm). |
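The equivalence described here can be sketched as follows: picking "the type" of a value out of a set of candidates is the same search a dispatcher performs when it picks a method. This is a hedged illustration with a hypothetical helper `most_specific`, using plain `isinstance`/`issubclass`; it is not plum's resolution algorithm.

```python
import numbers


def most_specific(value, candidates):
    """Return the matching candidate that is a subtype of all other matches.

    This is the same search a dispatcher does when choosing a method for
    `value` among signatures registered for each candidate type.
    """
    matches = [c for c in candidates if isinstance(value, c)]
    for c in matches:
        if all(issubclass(c, other) for other in matches):
            return c
    raise LookupError("no unique most specific candidate")


# Candidate types, i.e. the types for which methods would be registered.
CANDIDATES = [object, numbers.Number, int]

chosen_for_int = most_specific(2, CANDIDATES)
chosen_for_float = most_specific(2.5, CANDIDATES)
```

Here `2` resolves to `int`, while `2.5` resolves to `numbers.Number`, because `float` is not an `int` but is registered as a `Number`.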
@posita That’s certainly something I’m interested in! It’s exciting to see that there is discussion around this. I can fully imagine that what you’re after is a tall PS: The GitHub mobile website doesn’t allow one to react to posts with emojis? That’s a bummer. |
Hi, thanks for the nice project!
I found that _types_of_iterable can be very slow if the iterable is very large. This could be a big problem for practical applications. Is there any solution for this?
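To make the cost concrete, here is a small sketch of why an exact element type needs a full pass over the iterable. The helpers `element_types` and `sampled_element_types` are hypothetical stand-ins written for this illustration, not plum's actual `_types_of_iterable` implementation.

```python
def element_types(items):
    """Exact answer: visit every element, so the cost grows linearly with len(items)."""
    return {type(x) for x in items}


def sampled_element_types(items, limit=100):
    """Cheaper guess: inspect only the first `limit` elements. May be wrong."""
    return {type(x) for x in items[:limit]}


# One million ints followed by a single string at the very end.
big_list = [1] * 1_000_000 + ["oops"]

exact = element_types(big_list)          # sees both int and str
guess = sampled_element_types(big_list)  # sees only int
```

Any shortcut that skips elements can miss an outlier like the trailing string, which is the trade-off between speed and accuracy at stake in this issue.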