Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Array.to_list fails when length is a numpy integer in 2.5.2 #2947

Closed
ivirshup opened this issue Jan 15, 2024 · 3 comments
Closed

Array.to_list fails when length is a numpy integer in 2.5.2 #2947

ivirshup opened this issue Jan 15, 2024 · 3 comments
Labels
bug The problem described is something that must be fixed good first issue Good for newcomers

Comments

@ivirshup
Copy link

Version of Awkward Array

2.5.2

Description and code to reproduce

import awkward as ak, numpy as np

awk = ak.Array(np.ones((7, 0)))
form, length, container = ak.to_buffers(ak.to_packed(awk))
awk_from_buf = ak.from_buffers(form, np.int64(length), container)

awk_from_buf.to_list()
TypeError                                 Traceback (most recent call last)
Cell In[1], line 7
      4 form, length, container = ak.to_buffers(ak.to_packed(awk))
      5 awk_from_buf = ak.from_buffers(form, np.int64(length), container)
----> 7 awk_from_buf.to_list()

File ~/miniforge3/envs/scanpy-dev/lib/python3.11/site-packages/awkward/highlevel.py:496, in Array.to_list(self)
    492 def to_list(self):
    493     """
    494     Converts this Array into Python objects; same as #ak.to_list.
    495     """
--> 496     return self._layout.to_list(self._behavior)

File ~/miniforge3/envs/scanpy-dev/lib/python3.11/site-packages/awkward/contents/content.py:1086, in Content.to_list(self, behavior)
   1085 def to_list(self, behavior: dict | None = None) -> list:
-> 1086     return self.to_packed()._to_list(behavior, None)

File ~/miniforge3/envs/scanpy-dev/lib/python3.11/site-packages/awkward/contents/regulararray.py:1464, in RegularArray.to_packed(self, recursive)
   1462 index_nplike = self._backend.index_nplike
   1463 length = self._length * self._size
-> 1464 content = self._content[: index_nplike.shape_item_as_index(length)]
   1466 return RegularArray(
   1467     content.to_packed(True) if recursive else content,
   1468     self._size,
   1469     self._length,
   1470     parameters=self._parameters,
   1471 )

File ~/miniforge3/envs/scanpy-dev/lib/python3.11/site-packages/awkward/_nplikes/array_module.py:323, in ArrayModuleNumpyLike.shape_item_as_index(self, x1)
    321     return x1
    322 else:
--> 323     raise TypeError(f"expected None or int type, received {x1}")

TypeError: expected None or int type, received 0

Notably, this doesn't throw an error in 2.5.1.

The issue may be a bit more specific than the title, but we're running into this on the anndata test suite with some of our round-tripping IO tests.

In our use case we are serializing to hdf5. We're storing the length directly in hdf5, so h5py reads it out as a numpy integer. This the causes errors post reconstruction since awkward doesn't recognize the numpy integer length as the correct type.

@ivirshup ivirshup added the bug (unverified) The problem described would be a bug, but needs to be triaged label Jan 15, 2024
@agoose77
Copy link
Collaborator

Internally we try to be careful to meaningfully type and discriminate between lengths and index values. According to our type hints, we expect lengths to be int or unknown_length. My instinct is to uphold this distinction, and convert NumPy integers into Python integers at the L1/L2—L3 API boundary.

An example of where this is helpful is in Form.to_json — right now we don't do anything special to try and convert the NumPy numeric types to Python's int. This would be pertinent to the reproducer, because np.int64 in the form would not serialise.

So, I'll make a PR that looks to support np.int64 in the ak.from_buffers and ak.unflatten (as two examples of high level functions accepting lengths).

@agoose77
Copy link
Collaborator

@ivirshup this is a bug, but the fix will likely be int(length) in ak.from_buffers. As such, instead of pinning against awkward 2.5.2, you could introduce this harmless fix (that will eventually no-op).

@ivirshup
Copy link
Author

That also makes sense. Tbh I had gotten started with the pinning while narrowing down the bug.

@agoose77 agoose77 added good first issue Good for newcomers bug The problem described is something that must be fixed and removed bug (unverified) The problem described would be a bug, but needs to be triaged labels Jan 17, 2024
maxgalli added a commit to maxgalli/awkward that referenced this issue Jul 30, 2024
Test against np.integer as well in ArrayModuleNumpyLike.shape_item_as_index such that it does not break
if ak.from_buffers is built with a numpy integer instead of an integer
pfackeldey pushed a commit to pfackeldey/awkward that referenced this issue Jul 31, 2024
* fix: cast numpy integers to integers in ak.from_buffers (fixes scikit-hep#2947)

* Add test for 3502225e9246ce0197d6143c9d94c9e1aeec88b8

* style: pre-commit fixes

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug The problem described is something that must be fixed good first issue Good for newcomers
Projects
Status: Done
Development

No branches or pull requests

2 participants