Numpy dtype of Index64 #335

Closed
nsmith- opened this issue Jul 13, 2020 · 4 comments · Fixed by #337
Labels
bug The problem described is something that must be fixed

Comments

@nsmith-
Member

nsmith- commented Jul 13, 2020

It seems that NumPy arrays converted from an awkward1 Index64 have a dtype round-trip issue. Consider:

import awkward1
import numpy

a = awkward1.layout.Index64([0, 1, 2, 3])
b = numpy.asarray(a)

b.dtype.char is 'q', while the platform I am on expects 'l'. NumPy handles this gracefully:

>>> numpy.arange(4).dtype.char
'l'
>>> b + numpy.arange(4)
array([0, 2, 4, 6], dtype=int64)

However, awkward1 does not:

>>> b + awkward1.Array([0, 1, 2, 3])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/ncsmith/src/awkward-1.0/awkward1/highlevel.py", line 1169, in __repr__
    typestr = repr(str(awkward1._util.highlevel_type(layout, self._behavior, True)))
  File "/Users/ncsmith/src/awkward-1.0/awkward1/_util.py", line 1110, in highlevel_type
    return awkward1.types.ArrayType(layout.type(typestrs(behavior)), len(layout))
ValueError: Numpy format "q" cannot be expressed as a PrimitiveType

Recasting is a sufficient workaround, even though nothing about the dtype actually changes other than its char:

>>> b.astype('i8') + awkward1.Array([0, 1, 2, 3])
<Array [0, 2, 4, 6] type='4 * int64'>
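
The mismatch described above can be reproduced with NumPy alone. The following is a minimal sketch, not awkward1's code: it assumes only that `numpy` is installed, and the helper name `describe` is hypothetical. It shows that the `'q'` (C `long long`) dtype and `'i8'` compare equal as dtypes while their `char` values may differ depending on the platform.

```python
import numpy

# 'q' is C long long; 'i8' is "signed 8-byte integer" in native byte order.
longlong = numpy.dtype("q")
int64 = numpy.dtype("i8")

# NumPy compares dtypes by equivalence (kind, size, byte order),
# not by the format character they happen to carry.
same_dtype = longlong == int64

# The characters themselves are platform-dependent and may differ,
# which is what tripped up the lookup in the traceback above.
chars = (longlong.char, int64.char)

def describe(dtype):
    """Return a (kind, itemsize) pair, which does not depend on dtype.char."""
    return dtype.kind, dtype.itemsize

print(same_dtype, chars, describe(longlong), describe(int64))
```

Comparing on `kind` and `itemsize` (or on dtype equality) sidesteps the question of which character a given platform uses for an 8-byte signed integer.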
nsmith- added the "bug (unverified)" label (The problem described would be a bug, but needs to be triaged) Jul 13, 2020
@nsmith-
Member Author

nsmith- commented Jul 13, 2020

This is on OS X. I believe it also happens on Linux, but I am not sure.

jpivarski added the "bug" label (The problem described is something that must be fixed) and removed the "bug (unverified)" label (The problem described would be a bug, but needs to be triaged) Jul 13, 2020
@jpivarski
Member

It's reproducible on Linux.

Also, I noticed this:

>>> b.dtype.type
<class 'numpy.longlong'>
>>> b.astype('i8').dtype.type
<class 'numpy.int64'>

long-long...

@jpivarski
Member

I believe this is related to pybind/pybind11#1908.

@jpivarski
Member

This is fixed because ak.Array now knows about all the NumPy dtypes, including longlong. That came as part of a general clean-up, in which I used "format in {c, b, h, i, l, q, B, H, I, L, Q}" + itemsize to determine the portable type (e.g. int32_t vs int64_t), rather than relying on the (platform-dependent) value of the format to determine itemsize. It was a major refactoring (3.5k lines), but it introduces uniformity that was lacking and stubs in all the places we would need if we are ever going to support float16 or complex numbers.
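
The "format + itemsize" idea can be sketched in a few lines of pure Python. This is an illustration only, not the code from the refactoring: the function name `portable_int_type` and the two lookup sets are hypothetical, and the `c` code from the list above is left out because its signedness is a separate question. Signedness is taken from the format character, and the width is taken from the itemsize, never from the character alone.

```python
import struct

# Hypothetical lookup sets: the character only says signed vs unsigned.
SIGNED = set("bhilq")
UNSIGNED = set("BHILQ")

def portable_int_type(fmt, itemsize):
    """Map a format character plus itemsize to a fixed-width type name."""
    if fmt in SIGNED:
        prefix = "int"
    elif fmt in UNSIGNED:
        prefix = "uint"
    else:
        raise ValueError(f"unsupported format {fmt!r}")
    if itemsize not in (1, 2, 4, 8):
        raise ValueError(f"unsupported itemsize {itemsize}")
    # The width comes from itemsize, so 'l' and 'q' agree whenever both are 8 bytes.
    return f"{prefix}{itemsize * 8}_t"

# struct.calcsize reports the native size of each code on the current platform.
for fmt in "lqLQ":
    print(fmt, portable_int_type(fmt, struct.calcsize(fmt)))
```

With this shape, `'l'` resolves to `int64_t` where C `long` is 8 bytes and to `int32_t` where it is 4 bytes, while `'q'` always resolves to `int64_t`.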
