feat: add `.attrs` to highlevel objects #2757

agoose77 · 2023-10-13T11:44:29Z

This will partially fix #1391 by adding .attrs, a dictionary of metadata that is associated with the top-level array, with the following semantics:

- Operations on multiple arrays take the set-union of properties; where keys clash, the first array wins.
- Keys beginning with a special prefix (`@`) are removed at serialisation time.
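
As a rough illustration of the two rules above, here is a minimal pure-Python sketch. The helper names `merge_attrs` and `without_transient` and the constant `TRANSIENT_PREFIX` are hypothetical and are not taken from this PR; only the first-wins union and the `@` prefix come from the description.

```python
from __future__ import annotations

from typing import Any, Mapping

TRANSIENT_PREFIX = "@"  # prefix named in the PR; the constant name is an assumption


def merge_attrs(*attrs_per_array: Mapping[str, Any] | None) -> dict[str, Any]:
    """Set-union of keys; on a clash the earliest array's value is kept."""
    merged: dict[str, Any] = {}
    for attrs in attrs_per_array:
        if attrs is None:  # an array without metadata contributes nothing
            continue
        for key, value in attrs.items():
            merged.setdefault(key, value)  # never overwrite an earlier value
    return merged


def without_transient(attrs: Mapping[str, Any]) -> dict[str, Any]:
    """Drop keys that start with the transient prefix, as before serialisation."""
    return {k: v for k, v in attrs.items() if not k.startswith(TRANSIENT_PREFIX)}


left = {"units": "GeV", "@cache": object()}
right = {"units": "MeV", "source": "run-7"}

merged = merge_attrs(left, right)      # "units" keeps the value from `left`
persisted = without_transient(merged)  # "@cache" is dropped
```

Using `setdefault` keeps the first value seen for each key, which is one simple way to express "first wins".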

Whilst touching some of this code, I did some work on type hints and refactoring.

agoose77 · 2023-10-13T13:33:49Z

Design question (1) — should array.attrs always return a dict, or can it return None?

Return `None`

(invert the below for "always return a dict")

Pro: a convenient token for "no metadata", allowing optimisations (including non-Awkward cases).
Con: harder to set metadata in a non-functional style (unless metadata is already set).

jpivarski · 2023-10-13T18:00:03Z

The _parameters are internally None or dict for performance, but the external parameters property always returns a dict for uniformity. In fact, the property creates the dict so that it can be mutably updated from that point onward.

I think this is a good compromise, and it would follow the principle of least surprise if attrs does the same thing. In fact, I think this is what behaviors does, too.

You were only asking about how it looks from the outside, so I'm voting "always return a dict." But most users won't be using attrs (that's the situation we're starting from before this first implementation), so using None internally minimizes the performance impact of adding a new feature across all ak.Arrays.

If we really need a has_attrs: bool (more likely a has_attr(name)), then we can add it.

Oh, unlike parameters, I think we should not conflate key-not-found with value-is-None. We should allow users to consider None a meaningful value for an attribute, distinct from not having that attribute. attrs is the users' space, much more so than parameters.
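
The pattern described in this comment (store `None` internally, always hand out a dict, and keep a `None` value distinct from a missing key) could look roughly like the sketch below. `ArrayLike` is a hypothetical stand-in class, not Awkward's implementation.

```python
from __future__ import annotations

from typing import Any


class ArrayLike:
    """Hypothetical stand-in for a high-level array; not Awkward's code."""

    def __init__(self, attrs: dict[str, Any] | None = None):
        self._attrs = attrs  # None means "no metadata", cheap for the common case

    @property
    def attrs(self) -> dict[str, Any]:
        # Create the dict on first access so callers can mutate it in place.
        if self._attrs is None:
            self._attrs = {}
        return self._attrs


array = ArrayLike()
stored_before_access = array._attrs  # still None: nothing has asked for attrs yet

array.attrs["calibration"] = None    # None is a meaningful user value here

has_calibration = "calibration" in array.attrs  # key present, value is None
has_units = "units" in array.attrs              # key absent
```

Membership tests (`in`) rather than `.get(...) is None` are what keep "key not found" separate from "value is None".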

agoose77 · 2023-10-15T17:57:11Z

> The _parameters are internally None or dict for performance, but the external parameters property always returns a dict for uniformity. In fact, the property creates the dict so that it can be mutably updated from that point onward.
> ...
> I think this is a good compromise, and it would follow the principle of least surprise if attrs does the same thing. In fact, I think this is what behaviors does, too.

Actually, Array.behavior returns None | dict, making it possible to predict whether the array will use the global behavior lookup.

attrs doesn't have that benefit; there's only a single namespace to look at, so this motivation for exposing None disappears. I think it's therefore OK to hide the detail from users.

> Oh, unlike parameters, I think we should not conflate key-not-found with value-is-None. We should allow users to consider None a meaningful value for an attribute, distinct from not having that attribute. attrs is the users' space, much more so than parameters.

Agreed.

agoose77 · 2023-10-22T22:22:04Z

This PR indicates that it would be nice to have something like

```python
ctx = HighLevelCtx()

layouts = ctx.finalise([
    ak.to_layout(ctx.maybe_highlevel(a)) for a in arrays
])

backend = ctx.backend
behavior = ctx.behavior
```

rather than the existing stateless functional API, which revisits its inputs multiple times. This function would ensure that all layouts have the proper "final" backend.

As such, it's on hold pending PR #2763 and another PR.

agoose77 · 2023-10-30T23:13:58Z

@jpivarski this is a big PR.

I've split it into commits that should be a bit more manageable.

I still need to do a pass to confirm that the various policy settings used by ak.to_layout are sensible, but I'd benefit from the main review to confirm that the (internal) API changes are a step in the right direction.

The purpose of the HighLevelContext addition to this PR was to make the general idiom of pulling behaviors off high-level object(s) and applying them to the results less error-prone. The use of ctx.wrap_layout leads to an Exception if the context hasn't been explicitly finalised, making it harder to pass the wrong behavior argument around.
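
To make the finalise-before-wrap idea concrete, here is a small self-contained sketch. `SketchContext`, `Tagged`, and their methods are hypothetical names chosen for illustration; they do not reproduce the PR's `HighLevelContext`, whose real signatures are not shown in this thread. `RuntimeError` is likewise an assumption, since the comment only says "an Exception".

```python
from __future__ import annotations

from typing import Any


class SketchContext:
    """Hypothetical illustration only; not the PR's HighLevelContext."""

    def __init__(self):
        self._behavior = None
        self._attrs = None
        self._finalised = False

    def update(self, obj: Any) -> Any:
        """Collect behavior/attrs from an input (first wins), then pass it through."""
        if self._finalised:
            raise RuntimeError("context is already finalised")
        for name in ("behavior", "attrs"):
            value = getattr(obj, name, None)
            if value:
                current = getattr(self, f"_{name}") or {}
                # Earlier inputs take precedence over later ones.
                setattr(self, f"_{name}", {**value, **current})
        return obj

    def finalise(self) -> SketchContext:
        self._finalised = True
        return self

    def wrap(self, result: Any) -> dict[str, Any]:
        # Refusing to wrap early is what catches a half-built context.
        if not self._finalised:
            raise RuntimeError("finalise the context before wrapping results")
        return {"result": result, "behavior": self._behavior, "attrs": self._attrs}


class Tagged:
    """Hypothetical high-level object carrying data plus metadata."""

    def __init__(self, data, attrs=None, behavior=None):
        self.data, self.attrs, self.behavior = data, attrs, behavior


ctx = SketchContext()
inputs = [
    ctx.update(x)
    for x in (Tagged([1], attrs={"k": "first"}), Tagged([2], attrs={"k": "second", "extra": 1}))
]

try:
    ctx.wrap(None)  # too early: the context has not been finalised
except RuntimeError as err:
    early_error = str(err)

out = ctx.finalise().wrap([x.data for x in inputs])
```

The point of the design is that the metadata is gathered in one pass over the inputs and cannot be applied to a result until that pass is declared complete.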

There's a test failure pertaining to the choice to permit records, which I'll revisit.

agoose77 · 2023-10-31T13:53:45Z

Do we have a formal policy regarding which functions should prohibit/support record objects? e.g. ak.is_none on a record.

jpivarski

It is a big PR with a lot of differences, although many of them come from adding the single extra attrs argument to lots of functions, which in many cases turns them from single-line to multi-line due to black.

The context object is a big difference. I see that this context object is private, for our own use, and I'm not certain that it's necessary for keeping track of two pieces of data (behavior and attrs), but it's fine.

(In your comment, I thought you were referring to a public context object, so that people can say,

with ak.Context(behavior, attrs):
    ak.something...

although that doesn't make a lot of sense: one would want the behavior and attrs to be permanently glued to specific objects and not others. But I guess you were talking about a private context.)

I'm in favor of the direction that this PR is heading.

src/awkward/_attrs.py

jpivarski

This is a huge piece of work, and tests/test_2757_attrs_metadata.py is very extensive, parameterizing over so many functions to make sure that attrs are correctly passed through all of them.

My only request is about Numba: see below. If you have questions about that, I can help during our meeting tomorrow.

src/awkward/_connect/numba/arrayview.py

agoose77 force-pushed the agoose77/feat-array-attrs branch 10 times, most recently from ffd0597 to 4a7361e Compare October 19, 2023 22:08

agoose77 marked this pull request as ready for review October 19, 2023 22:11

agoose77 temporarily deployed to docs October 19, 2023 22:18 — with GitHub Actions Inactive

agoose77 temporarily deployed to docs October 19, 2023 22:48 — with GitHub Actions Inactive

agoose77 added the pr-on-hold label Oct 20, 2023

agoose77 marked this pull request as draft October 23, 2023 17:34

agoose77 force-pushed the agoose77/feat-array-attrs branch from bf62d43 to c346295 Compare October 30, 2023 11:53

pre-commit-ci bot temporarily deployed to docs October 30, 2023 12:11 Inactive

agoose77 force-pushed the agoose77/feat-array-attrs branch 5 times, most recently from b8f6058 to 78231ff Compare October 30, 2023 23:13

agoose77 requested a review from jpivarski October 30, 2023 23:14

jpivarski reviewed Oct 31, 2023

src/awkward/_attrs.py (outdated)

agoose77 requested a review from jpivarski November 7, 2023 14:50

agoose77 temporarily deployed to docs November 7, 2023 15:08 — with GitHub Actions Inactive

agoose77 added 9 commits November 7, 2023 19:57

feat: add initial implementation of attrs

fc0f68b

refactor: move pickling to private module

8bd797d

fix: implement support for disabling custom pickle in tests

ab630c5

test: test attrs

5764dcb

test: fix old test usage (unrelated)

7effa28

test: importorskip arrow (unrelated)

081d554

test: raise TypeError for to_regular

d5c5da1

feat: change prefix to @

7aa4086

test: update test

eaec10a

agoose77 force-pushed the agoose77/feat-array-attrs branch from c8b34b2 to eaec10a Compare November 7, 2023 19:58

agoose77 temporarily deployed to docs November 7, 2023 20:06 — with GitHub Actions Inactive

fix: use of ctx.behavior

f22139e

agoose77 temporarily deployed to docs November 7, 2023 21:10 — with GitHub Actions Inactive

Merge branch 'main' into agoose77/feat-array-attrs

fda71a4

jpivarski removed the pr-on-hold label Nov 7, 2023

agoose77 temporarily deployed to docs November 7, 2023 21:55 — with GitHub Actions Inactive

jpivarski requested changes Nov 7, 2023

src/awkward/_connect/numba/arrayview.py (outdated)

refactor: store attrs on NumbaLookup

5f1df9c

agoose77 commented Nov 8, 2023

src/awkward/_connect/numba/arrayview.py

agoose77 temporarily deployed to docs November 8, 2023 01:04 — with GitHub Actions Inactive

fix: more removals

e7cccc9

agoose77 deployed to docs November 8, 2023 01:22 — with GitHub Actions View deployment

jpivarski approved these changes Nov 8, 2023

agoose77 merged commit 8a2fa20 into main Nov 8, 2023
36 checks passed

agoose77 deleted the agoose77/feat-array-attrs branch November 8, 2023 16:23

chrispap95 mentioned this pull request Feb 13, 2024

tests are broken scikit-hep/fastjet#268

Closed

agoose77 mentioned this pull request Feb 13, 2024

ak.zip does not accept record objects #3023

Closed

jpivarski mentioned this pull request Aug 22, 2024

"Can't pickle" error message changed? #3223

Closed
