replace carry with indexedarray #261
Conversation
@jpivarski - it looks like the Identities get truncated...
content1 = awkward1.layout.NumpyArray(numpy.array([1, 2, 3, 4, 5]))
content2 = awkward1.layout.NumpyArray(numpy.array([1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 9.9]))
offsets = awkward1.layout.Index64(numpy.array([0, 3, 3, 5, 6, 9]))
listoffsetarray = awkward1.layout.ListOffsetArray64(offsets, content2)
recordarray = awkward1.layout.RecordArray({"one": content1, "two": listoffsetarray})
recordarray2 = awkward1.layout.RecordArray({"outer": awkward1.layout.RegularArray(recordarray, 1)})
recordarray2.setidentities()
assert recordarray2["outer"].identities.fieldloc == [(0, "outer")]
> assert recordarray2["outer", 0, "one"].identities.fieldloc == [(0, "outer"), (1, "one")]
E AssertionError: assert [(0, 'outer')] == [(0, 'outer'), (1, 'one')]
E Right contains one more item: (1, 'one')
E Full diff:
E - [(0, 'outer')]
E + [(0, 'outer'), (1, 'one')]
Here the carry index is: <Index64 i="[0]" offset="0" length="1" at="0x7fa648d80fa0"/>, which gives the following IndexedArray as an output:
<IndexedArray64>
<Identities32 ref="7" fieldloc="[(0, 'outer')]" width="2" offset="0" length="1" at="0x7f92edc07d70"/>
<index><Index64 i="[0]" offset="0" length="1" at="0x7f92edc04990"/></index>
<content><RecordArray>
<Identities32 ref="7" fieldloc="[(0, 'outer')]" width="2" offset="0" length="5" at="0x7f92edc06af0"/>
<field index="0" key="one">
<NumpyArray format="l" shape="5" data="1 2 3 4 5" at="0x7f92eda05b70">
<Identities32 ref="7" fieldloc="[(0, 'outer') (1, 'one')]" width="2" offset="0" length="5" at="0x7f92edc06af0"/>
</NumpyArray>
</field>
<field index="1" key="two">
<ListOffsetArray64>
<Identities32 ref="7" fieldloc="[(0, 'outer') (1, 'two')]" width="2" offset="0" length="5" at="0x7f92edc06af0"/>
<offsets><Index64 i="[0 3 3 5 6 9]" offset="0" length="6" at="0x7f92eda06bd0"/></offsets>
<content><NumpyArray format="d" shape="9" data="1.1 2.2 3.3 4.4 5.5 6.6 7.7 8.8 9.9" at="0x7f92eda046e0">
<Identities64 ref="8" fieldloc="[(0, 'outer') (1, 'two')]" width="3" offset="0" length="9" at="0x7f92edc073a0"/>
</NumpyArray></content>
</ListOffsetArray64>
</field>
</RecordArray></content>
</IndexedArray64>
Here is what it should have been before:
<RecordArray>
<Identities32 ref="7" fieldloc="[(0, 'outer')]" width="2" offset="0" length="1" at="0x7fa648d7fd30"/>
<field index="0" key="one">
<NumpyArray format="l" shape="1" data="3" at="0x7fa648d7fa40">
<Identities32 ref="7" fieldloc="[(0, 'outer') (1, 'one')]" width="2" offset="0" length="1" at="0x7fa648d839f0"/>
</NumpyArray>
</field>
<field index="1" key="two">
<ListArray64>
<Identities32 ref="7" fieldloc="[(0, 'outer') (1, 'two')]" width="2" offset="0" length="1" at="0x7fa648d83c20"/>
<starts><Index64 i="[3]" offset="0" length="1" at="0x7fa648d7fa30"/></starts>
<stops><Index64 i="[5]" offset="0" length="1" at="0x7fa648d83b40"/></stops>
<content><NumpyArray format="d" shape="9" data="1.1 2.2 3.3 4.4 5.5 6.6 7.7 8.8 9.9" at="0x7fa648d4e5a0">
<Identities64 ref="8" fieldloc="[(0, 'outer') (1, 'two')]" width="3" offset="0" length="9" at="0x7fa648d653c0"/>
</NumpyArray></content>
</ListArray64>
</field>
</RecordArray> |
fail in highlevel.py line 839
modified: tests/test_0025-record-array.py
modified: tests/test_0056-partitioned-array.py
modified: tests/test_0084-start-unionarray.py
modified: tests/test_0111-jagged-and-masked-getitem.py
modified: tests/test_0117-finish-the-sizes-operation.py
@jpivarski - I think I'd need to modify the high-level Python functions:
def test_listarray():
one = awkward1.Array([[{"x": 1}, {"x": 2}, {"x": 3}], [], [{"x": 4}, {"x": 5}]], check_valid=True)
two = awkward1.Array([[1.1, 2.2, 3.3], [], [4.4, 5.5]], check_valid=True)
> assert awkward1.to_list(awkward1.with_field(one, two, "y")) == [[{"x": 1, "y": 1.1}, {"x": 2, "y": 2.2}, {"x": 3, "y": 3.3}], [], [{"x": 4, "y": 4.4}, {"x": 5, "y": 5.5}]]
tests/test_0107-assign-fields-to-records.py:80:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
awkward1/operations/structure.py:453: in with_field
out = awkward1._util.broadcast_and_apply(
awkward1/_util.py:714: in broadcast_and_apply
out = apply(broadcast_pack(inputs, isscalar), 0)
awkward1/_util.py:570: in apply
outcontent = apply(nextinputs, depth + 1)
awkward1/_util.py:614: in apply
outcontent = apply(nextinputs, depth + 1)
awkward1/_util.py:449: in apply
return apply([x if not isinstance(x, indexedtypes) else x.project()
awkward1/_util.py:449: in apply
> if (any(isinstance(x, listtypes) for x in inputs) and
not any(isinstance(x, (awkward1.layout.Content,
awkward1.layout.Record)) and
x.has_virtual_form for x in inputs)):
E RecursionError: maximum recursion depth exceeded while calling a Python object
awkward1/_util.py:401: RecursionError |
I don't know where the recursion error is, but yes, it should be investigated. Does that test example work in master or is it somehow related to this PR? (If the latter, it's an assumption that broke about what to expect from a given function.) Oh! It could be If that's it, then the bug (bad assumption) manifested itself on the Python side, but can/should be fixed on the C++ side. |
@jpivarski - thanks! An eager projection has fixed this issue. |
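For context, "eager projection" here means applying the stored index and dropping the IndexedArray wrapper. A small self-contained illustration (my own example, assuming the layout API used elsewhere in this thread; not code from the PR):

import awkward1
import numpy

content = awkward1.layout.NumpyArray(numpy.array([1.1, 2.2, 3.3, 4.4, 5.5]))
index = awkward1.layout.Index64(numpy.array([3, 1, 4], dtype=numpy.int64))
lazy = awkward1.layout.IndexedArray64(index, content)   # lazy carry: O(1), the index is only stored

eager = lazy.project()   # eager projection: the index is applied, no wrapper remains
assert awkward1.to_list(eager) == [4.4, 2.2, 5.5]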
@jpivarski - Here is where we stand so far. The 10 tests that cause a Segmentation fault are commented out, and the 3 identity tests give different results - please see the commit. One test has been modified to allow a different type:
assert isinstance(a, (awkward1.layout.RecordArray, awkward1.layout.IndexedArray64)) |
What specifically do you want me to check? The Python change you made is very likely right: the output that used to always be RecordArray can now be IndexedArray (whenever the carry is lazy, which is almost always: |
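One way to keep such layout-level assertions working for both the old and the new output (a hypothetical test helper, not part of awkward1):

import awkward1

def as_record_array(node):
    # If the carry was lazy, the RecordArray arrives wrapped in an
    # IndexedArray64; step through it so the old assertions still apply.
    if isinstance(node, awkward1.layout.IndexedArray64):
        node = node.content
    assert isinstance(node, awkward1.layout.RecordArray)
    return node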
@jpivarski - Please review the PR when you have time. I think I'm done with it for the moment: all but the identity tests run fine. I'm not sure what to do with truncated identities and identity fields. Also, I might have overdone on eager carry in |
…to ianna/recordarray-carry-to-return-indexedarray
I have a few inline comments about copy, which could be better named must_be_eager (without changing how it's used) or allow_lazy (swapping all true values for false and vice-versa).
About the failing identities tests: one of them is simply that the test needs to be updated. The tests operate at the level of layout nodes and you've inserted one new layout node (an IndexedArray), so the test needs to descend one level with .content to get to where it was before.
The other was an indication of a much deeper issue. In the old implementation, the quantity being tested is a Record, and in the new implementation, it's a one-element RecordArray. I didn't follow it all the way back to find out what caused that, but it's somewhere in the getitem methods, not Identities-handling.
Edit: after lots of investigation (trying to make a suitable test-case), I found that I had been testing the wrong thing. Old and new give the same results from getitem; they only differ in the Identities. I'm going to leave some of those examples here, anyway.
This calls for a more stringent set of tests. You can start with structures like these:
import awkward1
import numpy
a = awkward1.from_iter([
{"outer": [{"x": 0, "y": []}, {"x": 1, "y": [1]}, {"x": 2, "y": [1, 2]}]},
{"outer": []},
{"outer": [{"x": 3, "y": [1, 2, 3]}, {"x": 4, "y": [1, 2, 3, 4]}]}
], highlevel=False)
a.setidentities()
b = awkward1.from_iter([
{"outer": {"x": 0, "y": []}},
{"outer": {"x": 1, "y": [1]}},
{"outer": {"x": 2, "y": [1, 2]}},
{"outer": {"x": 3, "y": [1, 2, 3]}},
{"outer": {"x": 4, "y": [1, 2, 3, 4]}}
], highlevel=False)
b.setidentities()
which both have two levels of RecordArrays, but a has lists in between.
Outside of this PR, make a lot of selections on this record structure to pick out elements at every level and write assertions for the awkward1.to_list(...) of each selection, as well as assertions for the Identities, and then ensure that they get the same results in the PR.
>>> a[2, "outer", 1, "y"]
<NumpyArray format="l" shape="4" data="1 2 3 4" at="0x55cb15b9f100">
<Identities64 ref="2" fieldloc="[(0, 'outer') (1, 'y')]" width="3" offset="18" length="4" at="0x55cb15849140"/>
</NumpyArray>
>>> assert awkward1.to_list(a[2, "outer", 1, "y"]) == [1, 2, 3, 4]
>>> assert a[2, "outer", 1, "y"].identities.fieldloc == [(0, 'outer'), (1, 'y')]
>>> assert numpy.asarray(a[2, "outer", 1, "y"].identities).tolist() == [
... [2, 1, 0],
... [2, 1, 1],
... [2, 1, 2],
... [2, 1, 3]
... ]
Regardless of the fact that it's just an Identities error, doing these kinds of tests can make the transition a lot safer, and they're also a good example of how to diagnose Identities.
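The same kind of value-level checks can be written against b (two record levels, no list in between); these expected values follow directly from the construction above, and the corresponding Identities assertions would mirror the pattern shown for a:

>>> assert awkward1.to_list(b[2, "outer", "y"]) == [1, 2]
>>> assert awkward1.to_list(b["outer", "x"]) == [0, 1, 2, 3, 4]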
This is an error in Matplotlib, something unrelated to the thing we're testing (colors?), and is probably due to a bad Matplotlib installation. It's an optional test—if you don't have Matplotlib installed, it will be skipped. (I do have Matplotlib installed, and the test runs fine; CI does not and skips it.) If you don't need Matplotlib, you can uninstall your copy and the test will be skipped. Otherwise, you probably want to fix your installation, anyway. If you do have Matplotlib and it works in other contexts, then I don't know what's going on. I usually install these things with Conda (exclusively—I switched over about a year ago with a new computer). It manages binary versions of all your packages so that you always have a compatible set (which is why it has to be exclusive—it was hard to switch over from apt-get without starting fresh). The downside of this is that installing one little package sometimes means upgrading 20 others and downgrading 5, but at least they all work together. |
Thanks, Jim! Yes, I've done that. It's just I haven't seen this kind of failure before and thought I'd let you know. |
These look great! The PR accomplished its goal of speeding up operations on big records:
The reason it was a fix to I looked into the one failing test, but I couldn't see what was going on there. I just triggered a test on master to see if it's some new bug that's crept in due to dependencies changing under us or something. I'll review the actual code soon. Thanks! |
Thanks, @jpivarski! Here is the price we pay. Before:
In [97]: arr = ak.Array([[_ for _ in range(length)]])
In [98]: %%timeit #TEST 3
...: x = ak.combinations(arr, 2)
...:
...:
828 ms ± 11.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [99]: %%timeit #TEST 4
...: x = ak.combinations(arr, 2)
...: ak.sum(x)
...:
...:
907 ms ± 19.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
vs. with this PR:
In [73]: arr = ak.Array([[_ for _ in range(length)]])
In [74]: %%timeit #TEST 3
...: x = ak.combinations(arr, 2)
...:
...:
2.04 s ± 175 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [75]: %%timeit #TEST 4
...: x = ak.combinations(arr, 2)
...: ak.sum(x)
...:
...:
2.06 s ± 197 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
|
It looks like a factor of 2 cost, but I don't know why there would be any cost. Maybe it's something else? Nevertheless, I'll be looking at the code soon. |
The master branch is passing, so I started another on this branch. |
@jpivarski - Moving back to an eager carry for all except for
In [19]: %%timeit
...: pairs = ak.cartesian([events.pions, events.protons])
...:
...:
69.7 ms ± 2.83 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [20]: %%timeit -n100 -r1
...: pairs = ak.combinations(events.pions, 2, with_name="pair")
...:
...:
12.8 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 100 loops each)
In [21]: evts = events.prt[events.prt.pdg == Particle.from_string("gamma").pdgid]
In [22]: %%timeit
...: pizero_candidates = ak.combinations(evts, 2, with_name="pair")
...:
...:
12.5 ms ± 213 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
On the plus side, there is no performance degradation here:
In [25]: arr = ak.Array([[_ for _ in range(length)]])
In [26]: %%timeit
...: x = ak.combinations(arr, 2)
...:
...:
812 ms ± 15.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [27]: %%timeit
...: x = ak.combinations(arr, 2)
...: ak.sum(x)
...:
...:
879 ms ± 27.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) |
@jpivarski - finally, everything looks reasonable here:
In [15]: %%timeit
...: pairs = ak.cartesian([events.pions, events.protons])
...:
...:
14.6 ms ± 624 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [16]: %%timeit -n100 -r1
...: pairs = ak.combinations(events.pions, 2, with_name="pair")
...:
...:
9.44 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 100 loops each)
In [17]: evts = events.prt[events.prt.pdg == Particle.from_string("gamma").pdgid]
In [18]: %%timeit
...: pizero_candidates = ak.combinations(evts, 2, with_name="pair")
...:
...:
12.4 ms ± 140 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
and here:
In [5]: length=10000
In [6]: arr = ak.Array([[_ for _ in range(length)]])
In [7]: %%timeit
...: x = ak.combinations(arr, 2)
...: ak.sum(x)
...:
...:
873 ms ± 8.04 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [8]: %%timeit
...: x = ak.combinations(arr, 2)
...:
...:
814 ms ± 11.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) |
It was a cost of a dynamic cast put in place as a workaround. It’s not needed any more and was removed. |
@jpivarski - I think the PR is ready for a review :-) |
This looks great, apart from one correction: the header-file declarations name the new carry argument "must_be_eager" and the implementations name it "allow_lazy", which have opposite meanings:
% fgrep -r 'carry(' include/
include/awkward/Content.h: carry(const Index64& carry, bool allow_lazy = false) const = 0;
include/awkward/array/RecordArray.h: carry(const Index64& carry, bool must_be_eager) const override;
include/awkward/array/IndexedArray.h: carry(const Index64& carry, bool must_be_eager) const override;
include/awkward/array/None.h: carry(const Index64& carry, bool must_be_eager) const override;
include/awkward/array/BitMaskedArray.h: carry(const Index64& carry, bool must_be_eager) const override;
include/awkward/array/NumpyArray.h: carry(const Index64& carry, bool must_be_eager) const override;
include/awkward/array/UnionArray.h: carry(const Index64& carry, bool must_be_eager) const override;
include/awkward/array/RawArray.h: carry(const Index64& carry, bool must_be_eager) const override {
include/awkward/array/RawArray.h: Index64 nextcarry(lenhead);
include/awkward/array/RawArray.h: return carry(nextcarry, false);
include/awkward/array/RawArray.h: return carry(flathead, false);
include/awkward/array/VirtualArray.h: carry(const Index64& carry, bool must_be_eager) const override;
include/awkward/array/ByteMaskedArray.h: carry(const Index64& carry, bool must_be_eager) const override;
include/awkward/array/ListOffsetArray.h: carry(const Index64& carry, bool must_be_eager) const override;
include/awkward/array/ListArray.h: carry(const Index64& carry, bool must_be_eager) const override;
include/awkward/array/UnmaskedArray.h: carry(const Index64& carry, bool must_be_eager) const override;
include/awkward/array/Record.h: carry(const Index64& carry, bool must_be_eager) const override;
include/awkward/array/RegularArray.h: carry(const Index64& carry, bool must_be_eager) const override;
include/awkward/array/EmptyArray.h: carry(const Index64& carry, bool must_be_eager) const override;
% fgrep -r '::carry(' src/
src/libawkward/array/ListOffsetArray.cpp: ListOffsetArrayOf<T>::carry(const Index64& carry, bool allow_lazy) const {
src/libawkward/array/UnionArray.cpp: UnionArrayOf<T, I>::carry(const Index64& carry, bool allow_lazy) const {
src/libawkward/array/ByteMaskedArray.cpp: ByteMaskedArray::carry(const Index64& carry, bool allow_lazy) const {
src/libawkward/array/Record.cpp: Record::carry(const Index64& carry, bool allow_lazy) const {
src/libawkward/array/None.cpp: None::carry(const Index64& carry, bool allow_lazy) const {
src/libawkward/array/NumpyArray.cpp: NumpyArray::carry(const Index64& carry, bool allow_lazy) const {
src/libawkward/array/BitMaskedArray.cpp: BitMaskedArray::carry(const Index64& carry, bool allow_lazy) const {
src/libawkward/array/RecordArray.cpp: RecordArray::carry(const Index64& carry, bool allow_lazy) const {
src/libawkward/array/IndexedArray.cpp: IndexedArrayOf<T, ISOPTION>::carry(const Index64& carry, bool allow_lazy) const {
src/libawkward/array/ListArray.cpp: ListArrayOf<T>::carry(const Index64& carry, bool allow_lazy) const {
src/libawkward/array/EmptyArray.cpp: EmptyArray::carry(const Index64& carry, bool allow_lazy) const {
src/libawkward/array/UnmaskedArray.cpp: UnmaskedArray::carry(const Index64& carry, bool allow_lazy) const {
src/libawkward/array/VirtualArray.cpp: VirtualArray::carry(const Index64& carry, bool allow_lazy) const {
src/libawkward/array/RegularArray.cpp: RegularArray::carry(const Index64& carry, bool allow_lazy) const {
Looking at the implementation of RecordArray::carry, it's clear that the true meaning of this parameter is "allow lazy."
allow_lazy, which I prefer anyway (shorter to express; easier to think about). But I think I can make that change. I'll put it in the next commit and see if the tests still pass.
Now I just want to get a handle on the ways
Is there a reason to single out these 5? I just answered my own question: these are the only 5 classes for whom Fortunately, all the other times that
and all the
To be more specific,
In the next commit, I'm going to make all the |
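To make the eager/lazy distinction above concrete, here is a language-neutral sketch in Python/NumPy of what an eager carry versus a lazy carry does to a record of columns (purely illustrative; not the actual C++ implementation):

import numpy as np

def carry_eager(fields, index):
    # eager: the carry index is applied to every field immediately,
    # copying each column (cost grows with the number of fields)
    return {name: column[index] for name, column in fields.items()}

def carry_lazy(fields, index):
    # lazy: just remember the index next to the untouched fields --
    # the analogue of wrapping the RecordArray in an IndexedArray64
    return {"index": index, "content": fields}

fields = {"x": np.arange(5), "y": np.arange(5) * 1.1}
index = np.array([3, 4])
print(carry_eager(fields, index))   # both columns are gathered now
print(carry_lazy(fields, index))    # nothing is gathered until projection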
…th the definite exception of 'combinations' and the possible exception of 'getitem_next' (leaving it for now). Same for 'ListOffsetArray::reduce_next'.
Of the
I'm going to try making all get-items of |
…ay, and jagged all use lazy carrys. Error in identities (test_0025-record-array) has been reopened: will be fixing that next.
…ong-standing bug in RecordArray::getitem_range_nowrap: it was not propagating Identities at all, though it should have been.
… (except IndexedArray, which runs the risk of an infinite loop), and ByteMaskedArray::carry should pass down allow_lazy to its content.
…rds are lazily carried in all operations that broadcast (i.e. all operations that take more than one array argument).
@ianna I think it's done. Thanks for all your work on this! I may have invalidated your latest timing measurements, but probably not by much because I left the laziness on in
>>> import awkward1 as ak
>>> records = ak.Array([[{"x": 1.1, "y": 100}, {"x": 2.2, "y": 200}, {"x": 3.3, "y": 300}],
... [],
... [{"x": 4.4, "y": 400}, {"x": 5.5, "y": 500}]])
>>> records.layout.form
{
"class": "ListOffsetArray64",
"offsets": "i64",
"content": {
"class": "RecordArray",
"contents": {
"x": "float64",
"y": "int64"
}
}
}
>>> ak.combinations(records, 2).layout.form
{
"class": "ListOffsetArray64",
"offsets": "i64",
"content": {
"class": "RecordArray",
"contents": [
{
"class": "IndexedArray64",
"index": "i64",
"content": {
"class": "RecordArray",
"contents": {
"x": "float64",
"y": "int64"
}
}
},
{
"class": "IndexedArray64",
"index": "i64",
"content": {
"class": "RecordArray",
"contents": {
"x": "float64",
"y": "int64"
}
}
}
]
}
}
>>> ak.cartesian([records, records]).layout.form
{
"class": "ListOffsetArray64",
"offsets": "i64",
"content": {
"class": "RecordArray",
"contents": [
{
"class": "IndexedArray64",
"index": "i64",
"content": {
"class": "RecordArray",
"contents": {
"x": "float64",
"y": "int64"
}
}
},
{
"class": "IndexedArray64",
"index": "i64",
"content": {
"class": "RecordArray",
"contents": {
"x": "float64",
"y": "int64"
}
}
}
]
}
}
Having this extra handle will give us the leverage to tune performance wherever users raise the issue. Thanks! |
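As a quick sanity check that the extra IndexedArray64 nodes are purely a layout (and therefore cost) detail and do not change any values (my own check, using the records array defined above):

>>> pairs = ak.combinations(records, 2)
>>> assert len(ak.to_list(pairs[0])) == 3   # C(3, 2) pairs from the first sublist
>>> assert len(ak.to_list(pairs[1])) == 0   # the empty sublist stays empty
>>> assert len(ak.to_list(pairs[2])) == 1   # C(2, 2) pairs from the last sublist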
Thank you @jpivarski !!! |
@jpivarski - the final numbers look great! Thanks!
In [33]: %%timeit -n1000 -r1
...: pairs = ak.cartesian([events.pions, events.protons], with_name="pair")
...:
...:
7.93 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1000 loops each)
In [34]: %%timeit -n100 -r1
...: pairs = ak.combinations(events.pions, 2, with_name="pair")
...:
...:
16.4 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 100 loops each)
In [35]: %%timeit
...: pizero_candidates = ak.combinations(events.prt[events.prt.pdg == Particle.from_string("gamma").pdgid], 2, with_name="pair")
...:
...:
269 ms ± 3.66 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [36]: evts = events.prt[events.prt.pdg == Particle.from_string("gamma").pdgid]
In [37]: %%timeit
...: pizero_candidates = ak.combinations(evts, 2, with_name="pair")
...:
...:
25.1 ms ± 472 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [38]: %%timeit
...: pizero_candidates = ak.combinations(events.prt[events.prt.pdg == Particle.from_string("gamma").pdgid], 2, with_name="pair")
...: pizero = pizero_candidates[pizero_candidates.mass(0, 0) - 0.13498 < 0.000001]
...: pizero["px"] = pizero.slot0.px + pizero.slot1.px
...: pizero["py"] = pizero.slot0.py + pizero.slot1.py
...: pizero["pz"] = pizero.slot0.pz + pizero.slot1.pz
...: pizero["p"] = np.sqrt(pizero.px**2 + pizero.py**2 + pizero.pz**2)
...:
...:
1.08 s ± 22.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
and before:
In [72]: %%timeit -n1000 -r1
...: pairs = ak.cartesian([events.pions, events.protons], with_name="pair")
...:
...:
67.4 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1000 loops each)
In [80]: %%timeit -n100 -r1
...: pairs = ak.combinations(events.pions, 2, with_name="pair")
...:
...:
356 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 100 loops each)
In [83]: %%timeit
...: pizero_candidates = ak.combinations(events.prt[events.prt.pdg == Particle.from_string("gamma").pdgid], 2, with_name="pair")
...:
...:
830 ms ± 14.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [85]: evts = events.prt[events.prt.pdg == Particle.from_string("gamma").pdgid]
In [86]: %%timeit
...: pizero_candidates = ak.combinations(evts, 2, with_name="pair")
...:
...:
551 ms ± 2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [94]: %%timeit
...: pizero_candidates = ak.combinations(events.prt[events.prt.pdg == Particle.from_string("gamma").pdgid], 2, with_name="pair")
...: pizero = pizero_candidates[pizero_candidates.mass(0, 0) - 0.13498 < 0.000001]
...: pizero["px"] = pizero.slot0.px + pizero.slot1.px
...: pizero["py"] = pizero.slot0.py + pizero.slot1.py
...: pizero["pz"] = pizero.slot0.pz + pizero.slot1.pz
...: pizero["p"] = np.sqrt(pizero.px**2 + pizero.py**2 + pizero.pz**2)
...:
...:
2.24 s ± 36.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
There is no change here:
In [19]: length=10000
In [20]: arr = ak.Array([[_ for _ in range(length)]])
In [21]: %%timeit
...: x = ak.combinations(arr, 2)
...:
...:
851 ms ± 7.37 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [22]: %%timeit
...: x = ak.combinations(arr, 2)
...: ak.sum(x)
...:
...:
910 ms ± 8.88 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
before:
In [97]: arr = ak.Array([[_ for _ in range(length)]])
In [98]: %%timeit
...: x = ak.combinations(arr, 2)
...:
...:
828 ms ± 11.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [99]: %%timeit
...: x = ak.combinations(arr, 2)
...: ak.sum(x)
...:
...:
907 ms ± 19.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) |
Addresses issue #204.
Following a discussion with @jpivarski, this finishes what @Ellipse0934 started in #214.