feat!: rework ufunc type promotion handling #2767
Conversation
There are a lot of changes here. Expanding `_apply_ufunc(ufunc, *args)` to `apply_ufunc(ufunc, method, input_args, kwargs)` is a good idea.

I have scrolled through it carefully, and I don't see any red flags. Instances of `nplike.asarray` would look dangerous if I didn't know that this function is already carefully controlled.

So, it looks good, and feel free to merge it if you're sure you're done.
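For context, a minimal sketch (not the PR's actual implementation) of why the expanded signature is useful: it mirrors NumPy's `__array_ufunc__` protocol, so ufunc methods other than `__call__` (e.g. `reduce`, `outer`) and keyword arguments can be dispatched through the same path.

```python
import numpy as np

# Hypothetical sketch: dispatch on (ufunc, method, inputs, kwargs),
# the same shape as NumPy's __array_ufunc__ protocol. The real
# apply_ufunc in the PR does more work (dtype resolution, backends).
def apply_ufunc(ufunc, method, input_args, kwargs):
    return getattr(ufunc, method)(*input_args, **kwargs)

apply_ufunc(np.add, "__call__", (1, 2), {})         # 3
apply_ufunc(np.add, "reduce", (np.arange(5),), {})  # 10
```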
@jpivarski I updated the PR description to make it explicit that this will have a high-level effect upon the visible type semantics of ufunc operations. When you next have a moment, could you confirm that you're happy to move in this direction? I think it's where NumPy is moving, but moreover we need our operations to be value-agnostic, so I think we have to.
Previously, weren't we gluing our type promotion behavior to whatever NumPy does (by actually calling NumPy on empty arrays)? Then we'd inherit whatever NumPy decides to do in the future. Is that not what would automatically happen, without this PR?

As an aside, while it's bad for the output type of an operation to depend on the values in arrays, I'm less compelled to worry about it depending on the values of Python objects (scalar `int`s). Also, it would be fine to assume that any Python `int` is an `int64`.

I'm not against this PR, if it's necessary to fully conform to NEP-50 and there are good reasons to believe that NumPy will actually adopt NEP-50.
Somewhat; the main place where we need to define what happens w.r.t. types is in typetracer. We want the same rules to apply to typetracer and non-typetracer, so our promotion logic can't depend on runtime values.

The real problem with NumPy's default handling is that scalars are always typed by value; e.g.:
```python
>>> np.array([1, 2, 3, 4, 5], np.int8) + np.int64(10)
array([11, 12, 13, 14, 15], dtype=int8)
```

Yikes! Okay, that's terrible. Yes, you're right to introduce scaffolding to predict types manually. Although this does it right:

```python
>>> np.array([1, 2, 3, 4, 5], np.int8) + np.array([np.int64(10)])
array([11, 12, 13, 14, 15])
```

It seems that it just needs to not be a scalar:

```python
>>> np.array([1, 2, 3, 4, 5], np.int8) + np.int64(10)[np.newaxis]
array([11, 12, 13, 14, 15])
```
Yes, I'm genuinely surprised that this is the default behavior, but I'm sure that there's a long and complex reason behind its history. The behaviour you describe above is now what this PR delivers:

```python
>>> ak.from_numpy(np.arange(10, dtype=np.uint8)) + np.int64(2**32 - 1)
<Array [4294967295, 4294967296, ..., 4294967303, 4294967304] type='10 * int64'>
```

In the examples in the PR description, I demonstrate what happens to untyped scalars, which inherit the integer flavour of the array:

```python
>>> ak.from_numpy(np.arange(10, dtype=np.uint8)) + (2**32 - 1)
/home/angus/Git/awkward/src/awkward/_nplikes/array_module.py:44: DeprecationWarning: NumPy will stop allowing conversion of out-of-bound Python integers to integer arrays. The conversion of 4294967295 to uint8 will fail in the future.
For the old behavior, usually:
    np.array(value).astype(dtype)
will give the desired result (the cast overflows).
  return self._module.asarray(obj, dtype=dtype)
<Array [255, 0, 1, 2, 3, 4, 5, 6, 7, 8] type='10 * uint8'>
```
I'm in favor of this. Also, I read over this PR carefully yesterday; it can be merged.
Thanks! Changing something as fundamental as this feels a lot safer with a detailed review :)
Recent PRs have made the fragility of our ufunc handling clearer; we've spoken for a while about NEP-50, value-based promotion, and the complexities of NumPy's type conversion rules. I believe this PR makes a pragmatic step in this direction, using [numpy.ufunc.resolve_dtypes](https://numpy.org/doc/stable/reference/generated/numpy.ufunc.resolve_dtypes.html) to predict the input and output type combination that a ufunc would use. As such, it applies NEP-50 rules, which are value-insensitive.
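To illustrate the mechanism (a minimal sketch, assuming a NumPy recent enough to provide `ufunc.resolve_dtypes`, which was added alongside the NEP-50 work): the output dtype is predicted from input dtypes alone, with `None` standing in for the to-be-resolved output.

```python
import numpy as np

# Predict np.add's loop dtypes for uint8 + int64 operands without
# evaluating anything: the tuple is (in, in, out), and None means
# "resolve this one for me".
dtypes = np.add.resolve_dtypes((np.dtype(np.uint8), np.dtype(np.int64), None))
print(dtypes)  # (dtype('int64'), dtype('int64'), dtype('int64'))
```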
How does this PR change Awkward?
In `main`, the following behavior is exhibited: […]

After this PR: […]
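The original before/after snippets did not survive extraction; the following reconstruction of the post-PR behaviour is based on the examples quoted later in this conversation, so treat it as illustrative rather than verbatim.

```python
import numpy as np
import awkward as ak

arr = ak.from_numpy(np.arange(10, dtype=np.uint8))

# After this PR: NEP-50-style, value-insensitive promotion. A typed
# NumPy scalar contributes its dtype, so uint8 + int64 -> int64.
arr + np.int64(2**32 - 1)
# <Array [4294967295, 4294967296, ..., 4294967304] type='10 * int64'>

# An untyped Python int inherits the array's integer kind (uint8 here),
# emitting a DeprecationWarning when the value does not fit that dtype.
arr + (2**32 - 1)
# <Array [255, 0, 1, 2, 3, 4, 5, 6, 7, 8] type='10 * uint8'>
```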
Crucially, this means that old code will start throwing warnings if bare Python scalars exceed the range of the array `dtype`. This is probably not all that likely; most of the time we pull values from other array operations.

NumPy is going in this direction with NEP-50 (under consideration only, so far). The only other way we could do this in a "safer", type-agnostic fashion is to always take the default type for the Python value kind (e.g. `int64` for integrals, etc.), or the largest type for each value kind. I prefer to align with NumPy, and let warnings inform users.