Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

replace carry with indexedarray #261

Merged
merged 28 commits into from
Jun 30, 2020

Conversation

ianna
Copy link
Collaborator

@ianna ianna commented May 12, 2020

Address issue #204

Following a discussion with @jpivarski to finish what @Ellipse0934 has started in #214

@ianna ianna marked this pull request as draft May 12, 2020 12:00
@ianna
Copy link
Collaborator Author

ianna commented May 12, 2020

@jpivarski - it looks like the Identities get truncated...

        content1 = awkward1.layout.NumpyArray(numpy.array([1, 2, 3, 4, 5]))
        content2 = awkward1.layout.NumpyArray(numpy.array([1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 9.9]))
        offsets = awkward1.layout.Index64(numpy.array([0, 3, 3, 5, 6, 9]))
        listoffsetarray = awkward1.layout.ListOffsetArray64(offsets, content2)
    
        recordarray = awkward1.layout.RecordArray({"one": content1, "two": listoffsetarray})
        recordarray2 = awkward1.layout.RecordArray({"outer": awkward1.layout.RegularArray(recordarray, 1)})
        recordarray2.setidentities()
        assert recordarray2["outer"].identities.fieldloc == [(0, "outer")]
>       assert recordarray2["outer", 0, "one"].identities.fieldloc == [(0, "outer"), (1, "one")]
E       AssertionError: assert [(0, 'outer')] == [(0, 'outer'), (1, 'one')]
E         Right contains one more item: (1, 'one')
E         Full diff:
E         - [(0, 'outer')]
E         + [(0, 'outer'), (1, 'one')]

Here carry index is:

<Index64 i="[0]" offset="0" length="1" at="0x7fa648d80fa0"/>

that gives the following IndexedArray as an output of the RecordArray::carry

<IndexedArray64>
    <Identities32 ref="7" fieldloc="[(0, 'outer')]" width="2" offset="0" length="1" at="0x7f92edc07d70"/>
    <index><Index64 i="[0]" offset="0" length="1" at="0x7f92edc04990"/></index>
    <content><RecordArray>
        <Identities32 ref="7" fieldloc="[(0, 'outer')]" width="2" offset="0" length="5" at="0x7f92edc06af0"/>
        <field index="0" key="one">
            <NumpyArray format="l" shape="5" data="1 2 3 4 5" at="0x7f92eda05b70">
                <Identities32 ref="7" fieldloc="[(0, 'outer') (1, 'one')]" width="2" offset="0" length="5" at="0x7f92edc06af0"/>
            </NumpyArray>
        </field>
        <field index="1" key="two">
            <ListOffsetArray64>
                <Identities32 ref="7" fieldloc="[(0, 'outer') (1, 'two')]" width="2" offset="0" length="5" at="0x7f92edc06af0"/>
                <offsets><Index64 i="[0 3 3 5 6 9]" offset="0" length="6" at="0x7f92eda06bd0"/></offsets>
                <content><NumpyArray format="d" shape="9" data="1.1 2.2 3.3 4.4 5.5 6.6 7.7 8.8 9.9" at="0x7f92eda046e0">
                    <Identities64 ref="8" fieldloc="[(0, 'outer') (1, 'two')]" width="3" offset="0" length="9" at="0x7f92edc073a0"/>
                </NumpyArray></content>
            </ListOffsetArray64>
        </field>
    </RecordArray></content>
</IndexedArray64>

Here is what it should have been before:

<RecordArray>
    <Identities32 ref="7" fieldloc="[(0, 'outer')]" width="2" offset="0" length="1" at="0x7fa648d7fd30"/>
    <field index="0" key="one">
        <NumpyArray format="l" shape="1" data="3" at="0x7fa648d7fa40">
            <Identities32 ref="7" fieldloc="[(0, 'outer') (1, 'one')]" width="2" offset="0" length="1" at="0x7fa648d839f0"/>
        </NumpyArray>
    </field>
    <field index="1" key="two">
        <ListArray64>
            <Identities32 ref="7" fieldloc="[(0, 'outer') (1, 'two')]" width="2" offset="0" length="1" at="0x7fa648d83c20"/>
            <starts><Index64 i="[3]" offset="0" length="1" at="0x7fa648d7fa30"/></starts>
            <stops><Index64 i="[5]" offset="0" length="1" at="0x7fa648d83b40"/></stops>
            <content><NumpyArray format="d" shape="9" data="1.1 2.2 3.3 4.4 5.5 6.6 7.7 8.8 9.9" at="0x7fa648d4e5a0">
                <Identities64 ref="8" fieldloc="[(0, 'outer') (1, 'two')]" width="3" offset="0" length="9" at="0x7fa648d653c0"/>
            </NumpyArray></content>
        </ListArray64>
    </field>
</RecordArray>

fail in highlevel.py line 839
	modified:   tests/test_0025-record-array.py
	modified:   tests/test_0056-partitioned-array.py
	modified:   tests/test_0084-start-unionarray.py
	modified:   tests/test_0111-jagged-and-masked-getitem.py
	modified:   tests/test_0117-finish-the-sizes-operation.py
@ianna
Copy link
Collaborator Author

ianna commented Jun 16, 2020

@jpivarski - I think, I'd need to modify the high-level Python functions:

    def test_listarray():
        one = awkward1.Array([[{"x": 1}, {"x": 2}, {"x": 3}], [], [{"x": 4}, {"x": 5}]], check_valid=True)
        two = awkward1.Array([[1.1, 2.2, 3.3], [], [4.4, 5.5]], check_valid=True)
>       assert awkward1.to_list(awkward1.with_field(one, two, "y")) == [[{"x": 1, "y": 1.1}, {"x": 2, "y": 2.2}, {"x": 3, "y": 3.3}], [], [{"x": 4, "y": 4.4}, {"x": 5, "y": 5.5}]]

tests/test_0107-assign-fields-to-records.py:80: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
awkward1/operations/structure.py:453: in with_field
    out = awkward1._util.broadcast_and_apply(
awkward1/_util.py:714: in broadcast_and_apply
    out = apply(broadcast_pack(inputs, isscalar), 0)
awkward1/_util.py:570: in apply
    outcontent = apply(nextinputs, depth + 1)
awkward1/_util.py:614: in apply
    outcontent = apply(nextinputs, depth + 1)
awkward1/_util.py:449: in apply
    return apply([x if not isinstance(x, indexedtypes) else x.project()
awkward1/_util.py:449: in apply
>   if (any(isinstance(x, listtypes) for x in inputs) and
        not any(isinstance(x, (awkward1.layout.Content,
                               awkward1.layout.Record)) and
                x.has_virtual_form for x in inputs)):
E               RecursionError: maximum recursion depth exceeded while calling a Python object

awkward1/_util.py:401: RecursionError

@jpivarski
Copy link
Member

I don't know where the recursion error is, but yes, it should be investigated. Does that test example work in master or is it somehow related to this PR? (If the latter, it's an assumption that broke about what to expect from a given function.)

Oh! It could be IndexedArray::project, which should return something with one less level of indexing. On the Python side, broadcast_and_apply unwraps arrays to get down to the NumpyArray level (the leaves of the tree) and it unwraps IndexedArrays with project. If project on the C++ side is calling carry on its argument and now RecordArray::carry returns a new IndexedArray, then that unwrapping step is oscillating between unwrap and rewrap. IndexedArray::project, at least, should eagerly carry, so maybe carry needs an eagerness argument to let IndexedArray::project force an eager projection in RecordArray::carry, whereas it's lazy in most other cases.

If that's it, then the bug (bad assumption) manifested itself on the Python side, but can/should be fixed on the C++ side.

@ianna
Copy link
Collaborator Author

ianna commented Jun 17, 2020

I don't know where the recursion error is, but yes, it should be investigated. Does that test example work in master or is it somehow related to this PR? (If the latter, it's an assumption that broke about what to expect from a given function.)

Oh! It could be IndexedArray::project, which should return something with one less level of indexing. On the Python side, broadcast_and_apply unwraps arrays to get down to the NumpyArray level (the leaves of the tree) and it unwraps IndexedArrays with project. If project on the C++ side is calling carry on its argument and now RecordArray::carry returns a new IndexedArray, then that unwrapping step is oscillating between unwrap and rewrap. IndexedArray::project, at least, should eagerly carry, so maybe carry needs an eagerness argument to let IndexedArray::project force an eager projection in RecordArray::carry, whereas it's lazy in most other cases.

If that's it, then the bug (bad assumption) manifested itself on the Python side, but can/should be fixed on the C++ side.

@jpivarski - thanks! An eager projection has fixed this issue.

@ianna
Copy link
Collaborator Author

ianna commented Jun 17, 2020

@jpivarski - Here is where we stand so far. The 10 tests that cause a Segmentation fault are commented out, the 3 identity tests give different results - please, see in the commit. One test has been modified to allow a different type:

assert isinstance(a, (awkward1.layout.RecordArray, awkward1.layout.IndexedArray64))

@jpivarski
Copy link
Member

What specifically do you want me to check?

The Python change you made is very likely right: the output that used to always be RecordArray can now be IndexedArray (whenever the carry is lazy, which is almost always: IndexedArray::project being an exception). I guess it would always be IndexedArray64 because new (output) arrays are always 64-bit.

@ianna
Copy link
Collaborator Author

ianna commented Jun 17, 2020

What specifically do you want me to check?

The Python change you made is very likely right: the output that used to always be RecordArray can now be IndexedArray (whenever the carry is lazy, which is almost always: IndexedArray::project being an exception). I guess it would always be IndexedArray64 because new (output) arrays are always 64-bit.

@jpivarski - Please, review the PR when you have time. I think, I'm done with it for the moment: all but identity tests run fine. I'm not sure what to do with truncated identities and identity fields. Also, I might have overdone on eager carry in IndexedArray::getitem_*

@ianna ianna marked this pull request as ready for review June 17, 2020 13:01
@ianna ianna requested a review from jpivarski June 17, 2020 13:01
…to ianna/recordarray-carry-to-return-indexedarray
Copy link
Member

@jpivarski jpivarski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a few inline comments about copy, which could be better named must_be_eager (without changing how it's used) or allow_lazy (swapping all true values for false and vice-versa).

About the failing identities test: one of them is simply that the test needs to be updated. The tests operate at the level of layout nodes and you've inserted one new layout node (an IndexedArray), so the test needs to descend one level with .content to get to where it was before.

The other was an indication of a much deeper issue. In the old implementation, the quantity being tested is a Record, and in the new implementation, it's a one element RecordArray. I didn't follow it all the way back to find out what caused that, but it's somewhere in the getitem methods, not Identities-handling.

Edit: after lots of investigation (trying to make a suitable test-case), I found that I had been testing the wrong thing. Old and new give the same results from getitem; they only differ in the Identities. I'm going to leave some of those examples here, anyway.

This calls for a more stringent set of tests. You can start with structures like these:

import awkward1
import numpy
a = awkward1.from_iter([
    {"outer": [{"x": 0, "y": []}, {"x": 1, "y": [1]}, {"x": 2, "y": [1, 2]}]},
    {"outer": []},
    {"outer": [{"x": 3, "y": [1, 2, 3]}, {"x": 4, "y": [1, 2, 3, 4]}]}
], highlevel=False)
a.setidentities()

b = awkward1.from_iter([
    {"outer": {"x": 0, "y": []}},
    {"outer": {"x": 1, "y": [1]}},
    {"outer": {"x": 2, "y": [1, 2]}},
    {"outer": {"x": 3, "y": [1, 2, 3]}},
    {"outer": {"x": 4, "y": [1, 2, 3, 4]}}
], highlevel=False)
b.setidentities()

which both have two levels of RecordArrays, but a has lists in between.

Outside of this PR, make a lot of selections on this record structure to pick out elements at every level and write assertions for the awkward1.to_list(...) of each selection, as well as assertions for the Identities, and then ensure that they get the same results in the PR.

>>> a[2, "outer", 1, "y"]
<NumpyArray format="l" shape="4" data="1 2 3 4" at="0x55cb15b9f100">
    <Identities64 ref="2" fieldloc="[(0, 'outer') (1, 'y')]" width="3" offset="18" length="4" at="0x55cb15849140"/>
</NumpyArray>
>>> assert awkward1.to_list(a[2, "outer", 1, "y"]) == [1, 2, 3, 4]
>>> assert a[2, "outer", 1, "y"].identities.fieldloc == [(0, 'outer'), (1, 'y')]
>>> assert numpy.asarray(a[2, "outer", 1, "y"].identities).tolist() == [
...        [2, 1, 0],
...        [2, 1, 1],
...        [2, 1, 2],
...        [2, 1, 3]
... ]

Regardless of the fact that it's just an Identities error, doing these kinds of tests can make the transition a lot safer, and they're also a good example of how to diagnose Identities.

include/awkward/Content.h Outdated Show resolved Hide resolved
include/awkward/array/RawArray.h Outdated Show resolved Hide resolved
include/awkward/array/RawArray.h Outdated Show resolved Hide resolved
tests/test_0025-record-array.py Outdated Show resolved Hide resolved
tests/test_0025-record-array.py Outdated Show resolved Hide resolved
@jpivarski
Copy link
Member

FYI, to have a clean build I just picked up a master and tried to build it on my Mac. There is a failure in the test

This is an error in Matplotlib, something unrelated to the thing we're testing (colors?), and is probably due to a bad Matplotlib installation. It's an optional test—if you don't have Matplotlib installed, it will be skipped. (I do have Matplotlib installed, and the test runs fine; CI does not and skips it.)

If you don't need Matplotlib, you can uninstall your copy and the test will be skipped. Otherwise, you probably want to fix your installation, anyway. If you do have Matplotlib and it works in other contexts, then I don't know what's going on.

I usually install these things with Conda (exclusively—I switched over about a year ago with a new computer). It manages binary versions of all your packages so that you always have a compatible set (which is why it has to be exclusive—it was hard to switch over from apt-get without starting fresh). The downside of this is that installing one little package sometimes means upgrading 20 others and downgrading 5, but at least they all work together.

@ianna
Copy link
Collaborator Author

ianna commented Jun 19, 2020

FYI, to have a clean build I just picked up a master and tried to build it on my Mac. There is a failure in the test

This is an error in Matplotlib, something unrelated to the thing we're testing (colors?), and is probably due to a bad Matplotlib installation. It's an optional test—if you don't have Matplotlib installed, it will be skipped. (I do have Matplotlib installed, and the test runs fine; CI does not and skips it.)

If you don't need Matplotlib, you can uninstall your copy and the test will be skipped. Otherwise, you probably want to fix your installation, anyway. If you do have Matplotlib and it works in other contexts, then I don't know what's going on.

I usually install these things with Conda (exclusively—I switched over about a year ago with a new computer). It manages binary versions of all your packages so that you always have a compatible set (which is why it has to be exclusive—it was hard to switch over from apt-get without starting fresh). The downside of this is that installing one little package sometimes means upgrading 20 others and downgrading 5, but at least they all work together.

Thanks, Jim! Yes, I've done that. It's just I haven't seen this kind of failure before and thought I'd let you know.

@jpivarski
Copy link
Member

@jpivarski - please, see the benchmark results on MacOS 10.15.5 (19F101) 2.6 GHz 6-Core Intel Core i7

These look great!

The PR accomplished its goal of speeding up operations on big records:

  • Cartesian product with big records (pions and protons): 2.7×
  • Combinations with big records (pions): 8×
  • Combinations with big records (photons): 11× (with the much slower photon selection factored out)

The reason it was a fix to RecordArray::carry, rather than ak.cartesian and ak.combinations, was so that performance on non-records wouldn't be worsened. But since these weren't changed, they can't be any faster or slower than they were.

I looked into the one failing test, but I couldn't see what was going on there. I just triggered a test on master to see if it's some new bug that's crept in due to dependencies changing under us or something.

I'll review the actual code soon. Thanks!

@ianna
Copy link
Collaborator Author

ianna commented Jun 26, 2020

@jpivarski - please, see the benchmark results on MacOS 10.15.5 (19F101) 2.6 GHz 6-Core Intel Core i7

These look great!

The PR accomplished its goal of speeding up operations on big records:

  • Cartesian product with big records (pions and protons): 2.7×
  • Combinations with big records (pions): 8×
  • Combinations with big records (photons): 11× (with the much slower photon selection factored out)

The reason it was a fix to RecordArray::carry, rather than ak.cartesian and ak.combinations, was so that performance on non-records wouldn't be worsened. But since these weren't changed, they can't be any faster or slower than they were.

I looked into the one failing test, but I couldn't see what was going on there. I just triggered a test on master to see if it's some new bug that's crept in due to dependencies changing under us or something.

I'll review the actual code soon. Thanks!

Thanks, @jpivarski ! Here is the price we pay (length = 10000):

Before:

In [97]: arr = ak.Array([[_ for _ in range(length)]])                                                                                        

In [98]: %%timeit         #TEST 3 
    ...: x = ak.combinations(arr, 2) 
    ...:  
    ...:                                                                                                                                     
828 ms ± 11.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [99]: %%timeit         #TEST 4 
    ...: x = ak.combinations(arr, 2) 
    ...: ak.sum(x) 
    ...:  
    ...:                                                                                                                                     
907 ms ± 19.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

vs with this PR:

In [73]: arr = ak.Array([[_ for _ in range(length)]])                                                                                        

In [74]: %%timeit         #TEST 3 
    ...: x = ak.combinations(arr, 2) 
    ...:  
    ...:                                                                                                                                     
2.04 s ± 175 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [75]: %%timeit         #TEST 4 
    ...: x = ak.combinations(arr, 2) 
    ...: ak.sum(x) 
    ...:  
    ...:                                                                                                                                     
2.06 s ± 197 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

@jpivarski
Copy link
Member

It looks like a factor of 2 cost, but I don't know why there would be any cost. Maybe it's something else? Nevertheless, I'll be looking at the code soon.

@jpivarski
Copy link
Member

I looked into the one failing test, but I couldn't see what was going on there. I just triggered a test on master to see if it's some new bug that's crept in due to dependencies changing under us or something.

The master branch is passing, so I started another on this branch.

@ianna
Copy link
Collaborator Author

ianna commented Jun 28, 2020

@jpivarski - Moving back to an eager carry for all except for ak.combinations improves its performance, however, there is no improvement for ak.cartesian

In [19]: %%timeit 
    ...: pairs = ak.cartesian([events.pions, events.protons]) 
    ...:  
    ...:                                                                                                                                     
69.7 ms ± 2.83 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [20]: %%timeit -n100 -r1  
    ...: pairs = ak.combinations(events.pions, 2, with_name="pair") 
    ...:  
    ...:                                                                                                                                     
12.8 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 100 loops each)

In [21]: evts = events.prt[events.prt.pdg == Particle.from_string("gamma").pdgid]                                                            

In [22]: %%timeit 
    ...: pizero_candidates = ak.combinations(evts, 2, with_name="pair") 
    ...:  
    ...:                                                                                                                                     
12.5 ms ± 213 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

on the plus side, there is no performance degradation here:

In [25]: arr = ak.Array([[_ for _ in range(length)]])                                                                                        

In [26]: %%timeit 
    ...: x = ak.combinations(arr, 2) 
    ...:  
    ...:                                                                                                                                     
812 ms ± 15.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [27]: %%timeit 
    ...: x = ak.combinations(arr, 2) 
    ...: ak.sum(x) 
    ...:  
    ...:                                                                                                                                     
879 ms ± 27.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

@ianna
Copy link
Collaborator Author

ianna commented Jun 29, 2020

@jpivarski - finally, everything looks reasonable here:

In [15]: %%timeit  
    ...: pairs = ak.cartesian([events.pions, events.protons])  
    ...:  
    ...:                                                                                                                                     
14.6 ms ± 624 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [16]: %%timeit -n100 -r1  
    ...: pairs = ak.combinations(events.pions, 2, with_name="pair")  
    ...:  
    ...:                                                                                                                                     
9.44 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 100 loops each)

In [17]: evts = events.prt[events.prt.pdg == Particle.from_string("gamma").pdgid]                                                            

In [18]: %%timeit  
    ...: pizero_candidates = ak.combinations(evts, 2, with_name="pair") 
    ...:  
    ...:                                                                                                                                     
12.4 ms ± 140 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

and here:

In [5]: length=10000                                                                                                                         

In [6]: arr = ak.Array([[_ for _ in range(length)]])                                                                                         

In [7]: %%timeit 
   ...: x = ak.combinations(arr, 2) 
   ...: ak.sum(x) 
   ...:  
   ...:                                                                                                                                      
873 ms ± 8.04 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [8]: %%timeit 
   ...: x = ak.combinations(arr, 2) 
   ...:  
   ...:                                                                                                                                      
814 ms ± 11.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

@ianna ianna changed the title [WIP] replace carry with indexedarray replace carry with indexedarray Jun 29, 2020
@ianna
Copy link
Collaborator Author

ianna commented Jun 30, 2020

It looks like a factor of 2 cost, but I don't know why there would be any cost. Maybe it's something else? Nevertheless, I'll be looking at the code soon.

It was a cost of a dynamic cast put in place as a workaround. It’s not needed any more and was removed.

@ianna
Copy link
Collaborator Author

ianna commented Jun 30, 2020

@jpivarski - I think, the PR is ready for a review :-)

Copy link
Member

@jpivarski jpivarski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, apart from one correction: the header-file declarations name the new carry argument "must_be_eager" and the implementations name it "allow_lazy", which have opposite meanings:

% fgrep -r 'carry(' include/
include/awkward/Content.h:      carry(const Index64& carry, bool allow_lazy = false) const = 0;
include/awkward/array/RecordArray.h:      carry(const Index64& carry, bool must_be_eager) const override;
include/awkward/array/IndexedArray.h:      carry(const Index64& carry, bool must_be_eager) const override;
include/awkward/array/None.h:      carry(const Index64& carry, bool must_be_eager) const override;
include/awkward/array/BitMaskedArray.h:      carry(const Index64& carry, bool must_be_eager) const override;
include/awkward/array/NumpyArray.h:      carry(const Index64& carry, bool must_be_eager) const override;
include/awkward/array/UnionArray.h:      carry(const Index64& carry, bool must_be_eager) const override;
include/awkward/array/RawArray.h:      carry(const Index64& carry, bool must_be_eager) const override {
include/awkward/array/RawArray.h:        Index64 nextcarry(lenhead);
include/awkward/array/RawArray.h:        return carry(nextcarry, false);
include/awkward/array/RawArray.h:      return carry(flathead, false);
include/awkward/array/VirtualArray.h:      carry(const Index64& carry, bool must_be_eager) const override;
include/awkward/array/ByteMaskedArray.h:      carry(const Index64& carry, bool must_be_eager) const override;
include/awkward/array/ListOffsetArray.h:      carry(const Index64& carry, bool must_be_eager) const override;
include/awkward/array/ListArray.h:      carry(const Index64& carry, bool must_be_eager) const override;
include/awkward/array/UnmaskedArray.h:      carry(const Index64& carry, bool must_be_eager) const override;
include/awkward/array/Record.h:      carry(const Index64& carry, bool must_be_eager) const override;
include/awkward/array/RegularArray.h:      carry(const Index64& carry, bool must_be_eager) const override;
include/awkward/array/EmptyArray.h:      carry(const Index64& carry, bool must_be_eager) const override;
% fgrep -r '::carry(' src/
src/libawkward/array/ListOffsetArray.cpp:  ListOffsetArrayOf<T>::carry(const Index64& carry, bool allow_lazy) const {
src/libawkward/array/UnionArray.cpp:  UnionArrayOf<T, I>::carry(const Index64& carry, bool allow_lazy) const {
src/libawkward/array/ByteMaskedArray.cpp:  ByteMaskedArray::carry(const Index64& carry, bool allow_lazy) const {
src/libawkward/array/Record.cpp:  Record::carry(const Index64& carry, bool allow_lazy) const {
src/libawkward/array/None.cpp:  None::carry(const Index64& carry, bool allow_lazy) const {
src/libawkward/array/NumpyArray.cpp:  NumpyArray::carry(const Index64& carry, bool allow_lazy) const {
src/libawkward/array/BitMaskedArray.cpp:  BitMaskedArray::carry(const Index64& carry, bool allow_lazy) const {
src/libawkward/array/RecordArray.cpp:  RecordArray::carry(const Index64& carry, bool allow_lazy) const {
src/libawkward/array/IndexedArray.cpp:  IndexedArrayOf<T, ISOPTION>::carry(const Index64& carry, bool allow_lazy) const {
src/libawkward/array/ListArray.cpp:  ListArrayOf<T>::carry(const Index64& carry, bool allow_lazy) const {
src/libawkward/array/EmptyArray.cpp:  EmptyArray::carry(const Index64& carry, bool allow_lazy) const {
src/libawkward/array/UnmaskedArray.cpp:  UnmaskedArray::carry(const Index64& carry, bool allow_lazy) const {
src/libawkward/array/VirtualArray.cpp:  VirtualArray::carry(const Index64& carry, bool allow_lazy) const {
src/libawkward/array/RegularArray.cpp:  RegularArray::carry(const Index64& carry, bool allow_lazy) const {

Looking at the implementation of RecordArray::carry, it's clear that the true meaning of this parameter is "allow lazy."

https://github.com/scikit-hep/awkward-1.0/blob/c27a4e5c037c9d582a412ee9a6c73c93063b83a1/src/libawkward/array/RecordArray.cpp#L739-L762

which I prefer anyway (shorter to express; easier to think about). But I think I can make that change. I'll put in the next commit and see if the tests still pass.

@jpivarski
Copy link
Member

Now I just want to get a handle on the ways allow_lazy are used. The parameter is passed down in only 5 classes:

  • VirtualArray
  • UnmaskedArray
  • BitMaskedArray
  • ByteMaskedArray
  • RegularArray

Is there a reason to single out these 5? I just answered my own question: these are the only 5 classes for whom carry is not a shallow operation—if any other classes passed carry down to their contents, they, too, would pass allow_lazy rather than setting it to be a particular value. The one exception to this is that RecordArray::carry is not shallow when allow_lazy = false, but then forcing allow_lazy = false when recursing in RecordArray::carry is a tautology.

Fortunately, all the other times that carry is called, the true or false is on the same line (I checked). So all the trues are:

% fgrep -rn 'carry(' src | fgrep true
src/libawkward/array/ListOffsetArray.cpp:285:    ContentPtr nextcontent = content_.get()->carry(nextcarry, true);
src/libawkward/array/ListOffsetArray.cpp:1474:      ContentPtr nextcontent = content_.get()->carry(nextcarry, true);
src/libawkward/array/ListOffsetArray.cpp:1664:        contents.push_back(content_.get()->carry(Index64(ptr, 0, totallen), true));
src/libawkward/array/ListOffsetArray.cpp:2037:    ContentPtr nextcontent = content_.get()->carry(nextcarry, true);
src/libawkward/array/ListOffsetArray.cpp:2086:    ContentPtr nextcontent = content_.get()->carry(nextcarry, true);
src/libawkward/array/ListOffsetArray.cpp:2145:      ContentPtr nextcontent = content_.get()->carry(nextcarry, true);
src/libawkward/array/ListOffsetArray.cpp:2169:      ContentPtr nextcontent = content_.get()->carry(nextcarry, true);
src/libawkward/array/UnionArray.cpp:661:      return contents[0].get()->carry(index, true);
src/libawkward/array/ByteMaskedArray.cpp:1039:      ContentPtr next = content_.get()->carry(nextcarry, true);
src/libawkward/array/IndexedArray.cpp:363:      return content_.get()->carry(nextcarry, true);
src/libawkward/array/IndexedArray.cpp:1144:        ContentPtr next = content_.get()->carry(nextcarry, true);
src/libawkward/array/IndexedArray.cpp:1162:        ContentPtr next = content_.get()->carry(nextcarry, true);
src/libawkward/array/IndexedArray.cpp:1283:      ContentPtr next = content_.get()->carry(nextcarry, true);
src/libawkward/array/IndexedArray.cpp:1310:      ContentPtr next = content_.get()->carry(nextcarry, true);
src/libawkward/array/IndexedArray.cpp:1668:      ContentPtr next = content_.get()->carry(nextcarry, true);
src/libawkward/array/IndexedArray.cpp:1861:    ContentPtr next = content_.get()->carry(nextcarry, true);
src/libawkward/array/IndexedArray.cpp:1929:        ContentPtr next = content_.get()->carry(nextcarry, true);
src/libawkward/array/IndexedArray.cpp:1965:        ContentPtr next = content_.get()->carry(nextcarry, true);
src/libawkward/array/IndexedArray.cpp:2285:      ContentPtr next = content_.get()->carry(nextcarry, true);
src/libawkward/array/IndexedArray.cpp:2306:      ContentPtr next = content_.get()->carry(nextcarry, true);
src/libawkward/array/ListArray.cpp:1360:        contents.push_back(content_.get()->carry(Index64(ptr, 0, totallen), true));
src/libawkward/array/RegularArray.cpp:230:      ContentPtr nextcontent = content_.get()->carry(nextcarry, true);
src/libawkward/array/RegularArray.cpp:983:        contents.push_back(content_.get()->carry(Index64(ptr, 0, totallen), true));
src/libawkward/array/RegularArray.cpp:1203:      ContentPtr nextcontent = content_.get()->carry(nextcarry, true);
src/libawkward/array/RegularArray.cpp:1225:      ContentPtr nextcontent = content_.get()->carry(nextcarry, true);

and all the falses are

% fgrep -rn 'carry(' src | fgrep false
src/libawkward/array/ListOffsetArray.cpp:1790:      ContentPtr nextcontent = content_.get()->carry(nextcarry, false);
src/libawkward/array/ListOffsetArray.cpp:1804:      outcontent = outcontent.get()->carry(outcarry, false);
src/libawkward/array/ListOffsetArray.cpp:1945:      ContentPtr nextcontent = content_.get()->carry(nextcarry, false);
src/libawkward/array/ListOffsetArray.cpp:1959:      outcontent = outcontent.get()->carry(outcarry, false);
src/libawkward/array/UnionArray.cpp:431:    return contents_[(size_t)index].get()->carry(nextcarry, false);
src/libawkward/array/ByteMaskedArray.cpp:223:    return content_.get()->carry(nextcarry, false);
src/libawkward/array/ByteMaskedArray.cpp:610:      ContentPtr next = content_.get()->carry(nextcarry, false);
src/libawkward/array/ByteMaskedArray.cpp:659:                                             content_.get()->carry(carry, false),
src/libawkward/array/ByteMaskedArray.cpp:712:      ContentPtr next = content_.get()->carry(nextcarry, false);
src/libawkward/array/ByteMaskedArray.cpp:735:      ContentPtr next = content_.get()->carry(nextcarry, false);
src/libawkward/array/ByteMaskedArray.cpp:944:    ContentPtr next = content_.get()->carry(nextcarry, false);
src/libawkward/array/ByteMaskedArray.cpp:1009:      ContentPtr next = content_.get()->carry(nextcarry, false);
src/libawkward/array/ByteMaskedArray.cpp:1086:    ContentPtr next = content_.get()->carry(nextcarry, false);
src/libawkward/array/ByteMaskedArray.cpp:1171:    ContentPtr next = content_.get()->carry(nextcarry, false);
src/libawkward/array/ByteMaskedArray.cpp:1300:      ContentPtr next = content_.get()->carry(nextcarry, false);
src/libawkward/array/RecordArray.cpp:754:        contents.push_back(content.get()->carry(carry, false));
src/libawkward/array/IndexedArray.cpp:375:      return content_.get()->carry(nextcarry, false);
src/libawkward/array/IndexedArray.cpp:2020:    ContentPtr next = content_.get()->carry(nextcarry, false);
src/libawkward/array/IndexedArray.cpp:2123:    ContentPtr next = content_.get()->carry(nextcarry, false);
src/libawkward/array/ListArray.cpp:266:    ContentPtr nextcontent = content_.get()->carry(nextcarry, false);
src/libawkward/array/ListArray.cpp:1456:    ContentPtr nextcontent = content_.get()->carry(nextcarry, false);
src/libawkward/array/ListArray.cpp:1510:    ContentPtr nextcontent = content_.get()->carry(nextcarry, false);
src/libawkward/array/ListArray.cpp:1574:      ContentPtr nextcontent = content_.get()->carry(nextcarry, false);
src/libawkward/array/ListArray.cpp:1598:      ContentPtr nextcontent = content_.get()->carry(nextcarry, false);
src/libawkward/array/ListArray.cpp:1637:    ContentPtr carried = content_.get()->carry(nextcarry, false);
src/libawkward/array/ListArray.cpp:1700:    ContentPtr nextcontent = content_.get()->carry(nextcarry, false);
src/libawkward/array/ListArray.cpp:1758:      ContentPtr nextcontent = content_.get()->carry(nextcarry, false);
src/libawkward/array/RegularArray.cpp:1093:    ContentPtr nextcontent = content_.get()->carry(nextcarry, false);
src/libawkward/array/RegularArray.cpp:1145:    ContentPtr nextcontent = content_.get()->carry(nextcarry, false);

To be more specific, carry is only used with allow_lazy = true in the following situations:

  • A: ByteMaskedArray::combinations.
  • A: IndexedArray::combinations.
  • A: ListArray::combinations.
  • A: ListOffsetArray::combinations.
  • A: RegularArray::combinations.
  • A: In fact, every class whose combinations implementation calls carry does so with allow_lazy = true. Good.
  • B: IndexedArray::project, but only when it's option-type. This is the minimum required to break recursion: if broadcast_and_apply (on the Python side) is opening up an IndexedOptionArray containing a RecordArray, the first step would turn it into an IndexedArray of RecordArray and the next step would turn it into a RecordArray (because IndexedArray::project is eager when not option-type). But if the default policy is to be eager, only lazily carrying in combinations, then perhaps IndexedArray::project should be simplified to only be eager like everything else.
  • B: IndexedArray::num (both). Same argument.
  • B: IndexedArray::offsets_and_flattened (only relevant for option-type). Same argument.
  • B: IndexedArray::asslice (only relevant for option-type). Same argument, more strongly in this case: arrays used as slices do not allow indirection (for simplicity), so the laziness will be going away anyway. This case should always be eager, to save an unnecessary extra step.
  • B: IndexedArray::reduce_next (both). Same argument, and also fairly strongly for this method because reducers will completely unravel an array eventually, anyway. Laziness doesn't help.
  • B: IndexedArray::localindex (only relevant for option-type). Same argument.
  • B: IndexedArray::getitem_next_jagged_generic (both). Same argument.
  • B: ListOffsetArray::reduce_next. This one should be eager for the reasons described in IndexedArray::reduce_next.
  • C: ListOffsetArray::broadcast_tooffsets64. This could actually be very good for performance in functions defined in Python involving broadcasting. If broadcasting is lazy on RecordArrays, then broadcasting and only accessing one or a few fields would have better performance than eagerly broadcasting all fields. It keeps the broadcasting operation shallow.
  • C: RegularArray::broadcast_tooffsets64. Good: this is consistent with the above.
  • D: UnionArray::simplify_uniontype. This could be good for performance.
  • E: IndexedArray::getitem_next(all) (both option-type and non-option-type). It could be useful for getitem_next methods to be lazy, but that should be consistent for all classes.
  • E: ListOffsetArray::getitem_next(at, range, array). Same as above.
  • E: RegularArray::getitem_next(at, range) are eager and RegularArray::getitem_next(array) is lazy. The inconsistency is probably not a good thing.

In the next commit, I'm going to make all the IndexedArray implementations except IndexedArray::combinations and IndexedArray::getitem_next be eager, as well as ListOffsetArray::reduce_next (all the B cases above).

…th the definite exception of 'combinations' and the possible exception of 'getitem_next' (leaving it for now). Same for 'ListOffsetArray::reduce_next'.
@jpivarski
Copy link
Member

Of the getitem_next (cases E above), only the following are relevant (non-trivial implementation that calls carry):

  • ByteMaskedArray:getitem_next (centralized method). Currently, it uses eager carry.
  • RegularArray::getitem_next (decentralized: looking at SliceAt, SliceRange, SliceArray). Currently, it uses eager carry.
  • IndexedArray::getitem_next (centralized). Currently, it uses lazy carry.
  • ListOffsetArray::getitem_next (decentralized). Currently, it uses lazy carry.
  • ListArray::getitem_next (decentralized). Currently, it uses eager carry.

I'm going to try making all get-items of SliceAt, SliceRange, and SliceArray use a lazy carry (allow_lazy = true).

…ay, and jagged all use lazy carrys. Error in identities (test_0025-record-array) has been reopened: will be fixing that next.
…ong-standing bug in RecordArray::getitem_range_nowrap: it was not propagating Identities at all, though it should have been.
… (except IndexedArray, which runs the risk of an infinite loop), and ByteMaskedArray::carry should pass down allow_lazy to its content.
…rds are lazily carried in all operations that broadcast (i.e. all operations that take more than one array argument).
@jpivarski
Copy link
Member

@ianna I think it's done. Thanks for all your work on this!

I may have invalidated your latest timing measurements, but probably not by much because I left the laziness on in ak.combinations. With laziness turned on in all cases of broadcast_tooffsets64, the ak.cartesian function (which takes two arguments, but is otherwise similar to ak.combinations) is now fully lazy as well, which should give it a similar performance boost as ak.combinations. At least, they have the same form (the x/y RecordArray below is wrapped in IndexedArray as a result of the operation—no expensive carrying of wide records):

>>> import awkward1 as ak
>>> records = ak.Array([[{"x": 1.1, "y": 100}, {"x": 2.2, "y": 200}, {"x": 3.3, "y": 300}],
...                    [],
...                    [{"x": 4.4, "y": 400}, {"x": 5.5, "y": 500}]])
>>> records.layout.form
{
    "class": "ListOffsetArray64",
    "offsets": "i64",
    "content": {
        "class": "RecordArray",
        "contents": {
            "x": "float64",
            "y": "int64"
        }
    }
}
>>> ak.combinations(records, 2).layout.form
{
    "class": "ListOffsetArray64",
    "offsets": "i64",
    "content": {
        "class": "RecordArray",
        "contents": [
            {
                "class": "IndexedArray64",
                "index": "i64",
                "content": {
                    "class": "RecordArray",
                    "contents": {
                        "x": "float64",
                        "y": "int64"
                    }
                }
            },
            {
                "class": "IndexedArray64",
                "index": "i64",
                "content": {
                    "class": "RecordArray",
                    "contents": {
                        "x": "float64",
                        "y": "int64"
                    }
                }
            }
        ]
    }
}
>>> ak.cartesian([records, records]).layout.form
{
    "class": "ListOffsetArray64",
    "offsets": "i64",
    "content": {
        "class": "RecordArray",
        "contents": [
            {
                "class": "IndexedArray64",
                "index": "i64",
                "content": {
                    "class": "RecordArray",
                    "contents": {
                        "x": "float64",
                        "y": "int64"
                    }
                }
            },
            {
                "class": "IndexedArray64",
                "index": "i64",
                "content": {
                    "class": "RecordArray",
                    "contents": {
                        "x": "float64",
                        "y": "int64"
                    }
                }
            }
        ]
    }
}

Having this extra handle will give us the leverage to tune performance wherever users raise the issue. Thanks!

@jpivarski jpivarski merged commit 6e6de66 into master Jun 30, 2020
@jpivarski jpivarski deleted the ianna/recordarray-carry-to-return-indexedarray branch June 30, 2020 19:56
@ianna
Copy link
Collaborator Author

ianna commented Jun 30, 2020

Thank you @jpivarski !!!

@ianna
Copy link
Collaborator Author

ianna commented Jul 1, 2020

@jpivarski - the final numbers look great! Thanks!
Here is the master:

In [33]: %%timeit -n1000 -r1 
    ...: pairs = ak.cartesian([events.pions, events.protons], with_name="pair") 
    ...:  
    ...:                                                                                                                                     
7.93 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1000 loops each)

In [34]: %%timeit -n100 -r1 
    ...: pairs = ak.combinations(events.pions, 2, with_name="pair")  
    ...:  
    ...:                                                                                                                                     
16.4 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 100 loops each)

In [35]: %%timeit  
    ...: pizero_candidates = ak.combinations(events.prt[events.prt.pdg == Particle.from_string("gamma").pdgid], 2, with_name="pair")  
    ...:  
    ...:                                                                                                                                     
269 ms ± 3.66 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [36]: evts = events.prt[events.prt.pdg == Particle.from_string("gamma").pdgid]                                                            

In [37]: %%timeit  
    ...: pizero_candidates = ak.combinations(evts, 2, with_name="pair") 
    ...:  
    ...:                                                                                                                                     
25.1 ms ± 472 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [38]: %%timeit 
    ...: pizero_candidates = ak.combinations(events.prt[events.prt.pdg == Particle.from_string("gamma").pdgid], 2, with_name="pair") 
    ...: pizero = pizero_candidates[pizero_candidates.mass(0, 0) - 0.13498 < 0.000001]  
    ...: pizero["px"] = pizero.slot0.px + pizero.slot1.px  
    ...: pizero["py"] = pizero.slot0.py + pizero.slot1.py  
    ...: pizero["pz"] = pizero.slot0.pz + pizero.slot1.pz 
    ...: pizero["p"] = np.sqrt(pizero.px**2 + pizero.py**2 + pizero.pz**2)  
    ...:  
    ...:                                                                                                                                     
1.08 s ± 22.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

and before:

In [72]: %%timeit -n1000 -r1 
    ...: pairs = ak.cartesian([events.pions, events.protons], with_name="pair") 
    ...:  
    ...:                                                                                                                                     
67.4 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1000 loops each)

In [80]: %%timeit -n100 -r1 
    ...: pairs = ak.combinations(events.pions, 2, with_name="pair") 
    ...:  
    ...:                                                                                                                                     
356 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 100 loops each)

In [83]: %%timeit 
    ...: pizero_candidates = ak.combinations(events.prt[events.prt.pdg == Particle.from_string("gamma").pdgid], 2, with_name="pair") 
    ...:  
    ...:                                                                                                                                     
830 ms ± 14.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [85]: evts = events.prt[events.prt.pdg == Particle.from_string("gamma").pdgid]                                                            

In [86]: %%timeit 
    ...: pizero_candidates = ak.combinations(evts, 2, with_name="pair") 
    ...:  
    ...:                                                                                                                                     
551 ms ± 2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [94]: %%timeit 
    ...: pizero_candidates = ak.combinations(events.prt[events.prt.pdg == Particle.from_string("gamma").pdgid], 2, with_name="pair") 
    ...: pizero = pizero_candidates[pizero_candidates.mass(0, 0) - 0.13498 < 0.000001] 
    ...: pizero["px"] = pizero.slot0.px + pizero.slot1.px 
    ...: pizero["py"] = pizero.slot0.py + pizero.slot1.py 
    ...: pizero["pz"] = pizero.slot0.pz + pizero.slot1.pz 
    ...: pizero["p"] = np.sqrt(pizero.px**2 + pizero.py**2 + pizero.pz**2) 
    ...:  
    ...:                                                                                                                                     
2.24 s ± 36.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

There is no change in here:

In [19]: length=10000                                                                                                                        

In [20]: arr = ak.Array([[_ for _ in range(length)]])                                                                                        

In [21]: %%timeit  
    ...: x = ak.combinations(arr, 2)  
    ...:  
    ...:                                                                                                                                     
851 ms ± 7.37 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [22]: %%timeit 
    ...: x = ak.combinations(arr, 2)  
    ...: ak.sum(x)  
    ...:  
    ...:                                                                                                                                     
910 ms ± 8.88 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

before:

In [97]: arr = ak.Array([[_ for _ in range(length)]])                                                                                        

In [98]: %%timeit
    ...: x = ak.combinations(arr, 2) 
    ...:  
    ...:                                                                                                                                     
828 ms ± 11.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [99]: %%timeit 
    ...: x = ak.combinations(arr, 2) 
    ...: ak.sum(x) 
    ...:  
    ...:                                                                                                                                     
907 ms ± 19.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants