zip + combinations clobbers data on x86? #2695

Closed
lgray opened this issue Sep 6, 2023 · 7 comments · Fixed by #2697
Labels
bug (unverified) The problem described would be a bug, but needs to be triaged

Comments

@lgray
Contributor

lgray commented Sep 6, 2023

Version of Awkward Array

2.4.1

Description and code to reproduce

This started from an investigation in dask-contrib/dask-histogram#100.

import hist
import numpy as np
import awkward as ak

if __name__ == "__main__":

    ar = ak.Array(200*np.random.random((100_000, 3)).astype(np.float32) + 32)

    electrons = ak.zip({"pt": ar})

    combos = ak.combinations(electrons, 2, fields=["first", "second"])

    for _ in range(10):
        x = hist.Hist.new.Variable([0, 10, 20, 30, 40, 80, 120, 200], name="x").Double()
        print(x.fill(x=ak.flatten(combos.second.pt)).values(flow=True))

This prints the same result on all 10 iterations:

[2.00000e+00 6.00000e+00 0.00000e+00 0.00000e+00 1.20520e+04 5.98710e+04
 6.01770e+04 1.20042e+05 4.78500e+04]

Note that the dataset contains no values below 32, so the underflow bin and all bins below 30 should be empty.
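As a sanity check on the printed numbers, here is a small stdlib-only sketch. It uses only the counts shown above plus one assumption of mine: that `200 * U[0, 1) + 32` is uniform on [32, 232). With 3 electrons per event, `ak.combinations(..., 2)` yields C(3, 2) = 3 pairs per event, so 300,000 entries in total. The counts do sum to 300,000 and the in-range bins are close to the uniform expectation, so only 8 entries land where no value should be.

```python
from math import comb

# Bin edges from the reproducer; values(flow=True) adds underflow and overflow
edges = [0, 10, 20, 30, 40, 80, 120, 200]
# Counts copied from the printed output: underflow, 7 regular bins, overflow
counts = [2, 6, 0, 0, 12052, 59871, 60177, 120042, 47850]

n_events, n_electrons = 100_000, 3
expected_entries = n_events * comb(n_electrons, 2)  # 3 pairs per event

# Assumption: 200 * U[0, 1) + 32 is uniform on [32, 232)
lo, hi = 32.0, 232.0
flow_edges = [float("-inf")] + edges + [float("inf")]

def expected_count(a, b):
    """Expected entries in [a, b) for a uniform input on [lo, hi)."""
    overlap = max(0.0, min(b, hi) - max(a, lo))
    return expected_entries * overlap / (hi - lo)

expected = [expected_count(a, b) for a, b in zip(flow_edges[:-1], flow_edges[1:])]
# Entries that can only come from corrupted values: underflow and bins below 30
spurious = sum(counts[:4])

print(expected_entries, sum(counts), spurious)  # 300000 300000 8
print(expected)  # [0.0, 0.0, 0.0, 0.0, 12000.0, 60000.0, 60000.0, 120000.0, 48000.0]
```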

If I print ak.flatten(combos.second.pt) before filling the histogram, the problem goes away.
It might still be a histogramming issue.

lgray added the bug (unverified) label Sep 6, 2023
@lgray
Contributor Author

lgray commented Sep 6, 2023

@iasonkrom @NJManganelli

@lgray
Contributor Author

lgray commented Sep 6, 2023

Interestingly, here is the same loop using np.histogram instead of hist:

import numpy as np
import awkward as ak

if __name__ == "__main__":

    ar = ak.Array(200*np.random.random((100_000, 3)).astype(np.float32) + 32)

    electrons = ak.zip({"pt": ar})

    combos = ak.combinations(electrons, 2, fields=["first", "second"])

    for _ in range(10):
        print(np.histogram(ak.flatten(combos.second.pt), bins=[0, 10, 20, 30, 40, 80, 120, 200]))

This works correctly and produces the expected output, so maybe the problem is somewhere on the histogramming side? This is really odd.
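np.histogram has no underflow or overflow bins, so its output does not line up directly with `values(flow=True)`. Below is a minimal NumPy-only sketch of a helper that returns counts in the [underflow, bins..., overflow] layout. `histogram_with_flow` is a hypothetical name, not part of NumPy or boost-histogram. It assumes boost-histogram's half-open [a, b) bins, with a value exactly on the upper edge going to overflow. np.histogram closes its last bin, so such values are excluded before binning to avoid counting them twice.

```python
import numpy as np

def histogram_with_flow(values, edges):
    """Hypothetical helper: counts in [underflow, bins..., overflow] order."""
    values = np.asarray(values, dtype=np.float64)
    edges = np.asarray(edges, dtype=np.float64)
    # Keep only values inside [edges[0], edges[-1]) so np.histogram's closed
    # last bin does not also count values sitting exactly on the upper edge
    in_range = values[(values >= edges[0]) & (values < edges[-1])]
    inner, _ = np.histogram(in_range, bins=edges)
    under = np.count_nonzero(values < edges[0])
    over = np.count_nonzero(values >= edges[-1])
    return np.concatenate([[under], inner, [over]])

print(histogram_with_flow([-1, 0, 5, 10, 199.5, 200, 250], [0, 10, 200]))  # [1 2 2 2]
```

With the reproducer's data, passing `ak.flatten(combos.second.pt).to_numpy()` and the same bin edges should give counts that can be compared element-wise with `values(flow=True)`.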

@lgray
Contributor Author

lgray commented Sep 6, 2023

If I use the boost_histogram Python bindings directly instead of hist, the problem persists.

@henryiii @HDembinski FYI, this appears to be a very odd interaction between Awkward Array and boost-histogram.

Should we move the issue there?

@lgray
Contributor Author

lgray commented Sep 6, 2023

Interestingly, if I switch the dtype from np.float32 to np.float64 (i.e. just remove the .astype), this reproducer segfaults when filling the histogram. On arm64, this variant also runs correctly.
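Because the behavior depends on the dtype, it can help to confirm what each variant of the generation expression produces before it is wrapped in ak.Array. This is a NumPy-only check and does not exercise Awkward or boost-histogram. `.astype` binds before `*` and `+`, and arithmetic with Python int scalars keeps float32, so the original input is float32. Dropping `.astype` gives NumPy's default float64. The seed is my addition, included only for repeatability.

```python
import numpy as np

np.random.seed(0)  # added for repeatability; not in the original reproducer

# As in the reproducer: .astype applies to the random array before * and +,
# and multiplying/adding Python ints keeps the float32 dtype
pt32 = 200 * np.random.random((100_000, 3)).astype(np.float32) + 32

# Same expression without .astype: NumPy's default float64
pt64 = 200 * np.random.random((100_000, 3)) + 32

print(pt32.dtype, pt64.dtype)  # float32 float64
# No generated value falls below 32 in either variant
print(pt32.min() >= 32, pt64.min() >= 32)
```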

@ikrommyd

ikrommyd commented Sep 6, 2023

Casting with .astype(np.float64) fixes the problem, though.

@lgray
Contributor Author

lgray commented Sep 6, 2023

At @agoose77's request, I tried the following, which works correctly:

import boost_histogram as bh
import numpy as np
import awkward as ak

if __name__ == "__main__":

    ar = ak.Array(200*np.random.random((100_000, 3)).astype(np.float32) + 32)

    electrons = ak.zip({"pt": ar})

    combos = ak.combinations(electrons, 2, fields=["first", "second"])

    for _ in range(10):
        x = bh.Histogram(bh.axis.Variable([0, 10, 20, 30, 40, 80, 120, 200]))
        print(x.fill(ak.flatten(combos.second.pt).to_numpy()).values(flow=True))

This works with either np.float32 or the default float64 dtype.

@ikrommyd

ikrommyd commented Sep 6, 2023

Filling with x=ak.flatten(ak.values_astype(combos.second.pt, np.float64)) while generating the data with .astype(np.float32) also fixes it for me. Without .astype in the generation step, though, it always segfaults.
