feat!: rework ufunc type promotion handling #2767

agoose77 · 2023-10-23T17:32:42Z

In pursuing recent PRs, the fragility of our ufunc handling became clearer; we've spoken for a while about NEP-50 and value-based promotion, and the complexities of NumPy's type conversion rules. This PR I believe makes a pragmatic step in this direction, using https://numpy.org/doc/stable/reference/generated/numpy.ufunc.resolve_dtypes.html to predict the input and output type combination that a ufunc would use. As such, it applies NEP-50 rules which are value-insensitive.

How does this PR change Awkward?

In main, the following behavior is exhibited:

>>> import awkward as ak
>>> import numpy as np
>>> x = ak.from_numpy(np.arange(10, dtype=np.uint8)) + np.int64(1)
>>> x.type.show()
10 * uint8
>>> y = ak.from_numpy(np.arange(10, dtype=np.uint8)) + np.int64(2**32 - 1)
>>> y.type.show()
10 * uint32
>>> z = ak.from_numpy(np.arange(10, dtype=np.uint8)) + (2**32 - 1)
>>> z.type.show()
10 * uint32

After this PR:

>>> import awkward as ak
>>> import numpy as np
>>> x = ak.from_numpy(np.arange(10, dtype=np.uint8)) + np.int64(1)
>>> x.type.show()
10 * int64
>>> y = ak.from_numpy(np.arange(10, dtype=np.uint8)) + np.int64(2**32 - 1)
>>> y.type.show()
10 * int64
>>> z = ak.from_numpy(np.arange(10, dtype=np.uint8)) + (2**32 - 1)
DeprecationWarning: NumPy will stop allowing conversion of out-of-bound Python integers to integer arrays.  The conversion of 4294967296 to uint8 will fail in the future.
For the old behavior, usually:
    np.array(value).astype(dtype)
will give the desired result (the cast overflows).
  return self._module.asarray(obj, dtype=dtype)
>>> z.type.show()
10 * uint8

Crucially this means that old code will start throwing warnings if bare Python scalars exceed the size of the array dtype. This is probably not all that likely; most of the time we pull values from other array operations, e.g.

>>> array_of_large_values = ak.from_numpy(np.array([2**32-20], dtype=np.uint32))
>>> large_value = np.max(array_of_large_values)
>>> w = ak.from_numpy(np.arange(10, dtype=np.uint8)) + large_value
>>> w
<Array [4294967276, 4294967277, ..., 4294967284, 4294967285] type='10 * uint32'>

NumPy is going in this direction with NEP-50 (under consideration only so far). The only other way we could do this in a "safer" type agnostic fashion is to always take the default for the Python value kind (e.g. int64 for integrals, etc.), or the largest type for value kinds. I prefer to align with NumPy, and let warnings inform users.

codecov · 2023-10-23T17:45:59Z

Codecov Report

Merging #2767 (f848940) into main (589b351) will increase coverage by 0.00%.
The diff coverage is 86.79%.

Additional details and impacted files

Files	Coverage Δ
src/awkward/_backends/backend.py	`87.50% <100.00%> (+0.83%)`	⬆️
src/awkward/_backends/jax.py	`100.00% <100.00%> (ø)`
src/awkward/_connect/numpy.py	`91.73% <100.00%> (-0.04%)`	⬇️
src/awkward/_nplikes/array_module.py	`87.50% <100.00%> (+0.54%)`	⬆️
src/awkward/_nplikes/jax.py	`80.85% <100.00%> (+1.30%)`	⬆️
src/awkward/_slicing.py	`86.59% <100.00%> (ø)`
src/awkward/_nplikes/numpylike.py	`73.72% <72.72%> (-0.04%)`	⬇️
src/awkward/_nplikes/typetracer.py	`76.46% <81.81%> (-0.02%)`	⬇️

jpivarski

There are a lot of changes here. Expanding

_apply_ufunc(ufunc, *args)

to

apply_ufunc(ufunc, method, input_args, kwargs)

is a good idea.

I have scrolled through it carefully, and I don't see any red flags. Instances of nplike.asarray would look dangerous if I didn't know that this function is already carefully controlled.

So, it looks good, and feel free to merge it if you're sure you're done.

agoose77 · 2023-10-23T22:36:09Z

@jpivarski I updated the PR description to make it explicit that this will have a high-level effect upon the visible type semantics of ufunc operations. When you next have a moment, could you confirm that you're happy to move in this direction? I think it's where NumPy is moving, but moreover we need our operations to be value agnostic, so I think we have to

jpivarski · 2023-10-24T15:06:05Z

Previously, weren't we gluing our type promotion behavior to whatever NumPy does (by actually calling NumPy on empty arrays)? Then we'd inherit whatever NumPy decides to do in the future. Is that not what would automatically happen, without this PR?

As an aside, while it's bad for the output type of an operation to depend on the values in arrays, I'm less compelled to worry about it depending on the values of Python objects (scalar int or float), unless we can somehow have these Python objects be unknown in a Dask DAG. The type-tracer mechanism allows all of the values in our array buffers (and their sizes) to be unknown with known types, but Python scalar int and float are outside of this system. If it was a np.int32 or a np.float64, that would be inside the system, as those are the kinds of values that can come out of arrays.

Also, it would be fine to assume that any Python int is always np.int64 (we do that in several places, NumPy does it on all platforms but Windows, and Numba does it universally), and that any Python float is always np.float64 (which it literally is, unlike the arbitrary-precision int).

I'm not against this PR; if it's necessary to fully conform to NEP-50 and there are good reasons to believe that NumPy will actually adopt NEP-50.

agoose77 · 2023-10-24T15:41:56Z

Previously, weren't we gluing our type promotion behavior to whatever NumPy does (by actually calling NumPy on empty arrays)? Then we'd inherit whatever NumPy decides to do in the future. Is that not what would automatically happen, without this PR?

Somewhat; the main places that we need to define what happens w.r.t types is in typetracer. We want the same rules to apply to typetracer and non-typetracer, so our ufunc handling is changed in this PR to ensure that we can predict types. So we have to implement scaffolding for typetracer, and right now we're choosing to be value (which is always 0 for typetracer) dependent.

The real problem with NumPy's default handling is that scalars are always typed by value; e.g. np.int64(0) resolves to np.int8/np.int8 if added to such an array. This is worse than just being value-sensitive for Python scalars which may well be effectively static constants in the program.

jpivarski · 2023-10-24T16:10:21Z

>>> np.array([1, 2, 3, 4, 5], np.int8) + np.int64(10)
array([11, 12, 13, 14, 15], dtype=int8)

Yikes! Okay, that's terrible. Yes, you're right to introduce scaffolding to predict types manually.

Although this does it right:

>>> np.array([1, 2, 3, 4, 5], np.int8) + np.array([np.int64(10)])
array([11, 12, 13, 14, 15])

It seems that it just needs to not be a scalar (ndims == 0 array). This is perhaps a better way:

>>> np.array([1, 2, 3, 4, 5], np.int8) + np.int64(10)[np.newaxis]
array([11, 12, 13, 14, 15])

agoose77 · 2023-10-24T16:37:01Z

>>> np.array([1, 2, 3, 4, 5], np.int8) + np.int64(10)
array([11, 12, 13, 14, 15], dtype=int8)
Yikes! Okay, that's terrible. Yes, you're right to introduce scaffolding to predict types manually.

Although this does it right:
>>> np.array([1, 2, 3, 4, 5], np.int8) + np.array([np.int64(10)])
array([11, 12, 13, 14, 15])
It seems that it just needs to not be a scalar (ndims == 0 array). This is perhaps a better way:
>>> np.array([1, 2, 3, 4, 5], np.int8) + np.int64(10)[np.newaxis]
array([11, 12, 13, 14, 15])

Yes, I'm genuinely surprised that this is the default behavior, but I'm sure that there's a long and complex reason behind its history.

The behaviour you describe above is now what this PR delivers:

>>> ak.from_numpy(np.arange(10, dtype=np.uint8)) + np.int64(2**32 - 1)
<Array [4294967295, 4294967296, ..., 4294967303, 4294967304] type='10 * int64'>

In the examples in the PR description, I demonstrate what happens to untyped scalars, which inherit the integer flavour of the array;

>>> ak.from_numpy(np.arange(10, dtype=np.uint8)) + (2**32 - 1)
/home/angus/Git/awkward/src/awkward/_nplikes/array_module.py:44: DeprecationWarning: NumPy will stop allowing conversion of out-of-bound Python integers to integer arrays.  The conversion of 4294967295 to uint8 will fail in the future.
For the old behavior, usually:
    np.array(value).astype(dtype)
will give the desired result (the cast overflows).
  return self._module.asarray(obj, dtype=dtype)
<Array [255, 0, 1, 2, 3, 4, 5, 6, 7, 8] type='10 * uint8'>

jpivarski · 2023-10-24T16:41:59Z

I'm in favor of this.

Also, I read over this PR carefully yesterday; it can be merged.

agoose77 · 2023-10-24T16:47:35Z

Also, I read over this PR carefully yesterday; it can be merged.

Thanks — changing something as fundamental as this feels a lot safer with a detailed review :)

src/awkward/_nplikes/array_module.py

refactor: move ufunc preparation to nplike

f848940

agoose77 requested a review from jpivarski October 23, 2023 17:32

agoose77 deployed to docs October 23, 2023 17:50 — with GitHub Actions View deployment

jpivarski approved these changes Oct 23, 2023

View reviewed changes

agoose77 changed the title ~~refactor: rework ufunc type promotion handling~~ feat!: rework ufunc type promotion handling Oct 24, 2023

agoose77 commented Oct 24, 2023

View reviewed changes

src/awkward/_nplikes/array_module.py Show resolved Hide resolved

agoose77 merged commit 97cf35f into main Oct 24, 2023
39 checks passed

agoose77 deleted the agoose77/refactor-nplike-ufunc-type-promotion branch October 24, 2023 17:00

agoose77 mentioned this pull request Oct 30, 2023

ci: fix oldest-supported-dependencies test workflow #2783

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat!: rework ufunc type promotion handling #2767

feat!: rework ufunc type promotion handling #2767

agoose77 commented Oct 23, 2023 •

edited

Loading

codecov bot commented Oct 23, 2023

jpivarski left a comment

agoose77 commented Oct 23, 2023

jpivarski commented Oct 24, 2023

agoose77 commented Oct 24, 2023

jpivarski commented Oct 24, 2023

agoose77 commented Oct 24, 2023

jpivarski commented Oct 24, 2023

agoose77 commented Oct 24, 2023

feat!: rework ufunc type promotion handling #2767

feat!: rework ufunc type promotion handling #2767

Conversation

agoose77 commented Oct 23, 2023 • edited Loading

codecov bot commented Oct 23, 2023

Codecov Report

jpivarski left a comment

Choose a reason for hiding this comment

agoose77 commented Oct 23, 2023

jpivarski commented Oct 24, 2023

agoose77 commented Oct 24, 2023

jpivarski commented Oct 24, 2023

agoose77 commented Oct 24, 2023

jpivarski commented Oct 24, 2023

agoose77 commented Oct 24, 2023

agoose77 commented Oct 23, 2023 •

edited

Loading