Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Zarr structured arrays with strings #249

Closed
ivirshup opened this issue Nov 15, 2019 · 0 comments · Fixed by #252
Closed

Zarr structured arrays with strings #249

ivirshup opened this issue Nov 15, 2019 · 0 comments · Fixed by #252

Comments

@ivirshup
Copy link
Member

AFAIK the last blocking bug for 0.7. Originally reported in scverse/scanpy#832.

Minimal reproducer:

import scanpy as sc
pbmc = sc.datasets.pbmc68k_reduced()
pbmc.write("tmp.h5ad")
fromdisk = sc.read("tmp.h5ad")  # Do we read okay
fromdisk.write(pbmc)  # Can we round trip

The issue here is with structured numpy arrays and the variety of string types, and didn't get caught earlier because these are a bit of pain to actually instantiate... A brief summary of the conflict (copied from the earlier issue:

  • h5py doesn't do fixed length unicode strings
  • h5py does do variable length unicode strings, pretty much anywhere
  • zarr doesn't do variable length strings in structured arrays
  • We probably don't actually want to use fixed length unicode strings much. Bytestrings, more likely.
  • We can probably just add another element type to allow special handling for these. I think it'd be fine to not do np.str_ type arrays.

This is pretty easy to fix for hdf5 if we just say all unicode strings are variable length. Zarr has an open pull request to support this zarr-developers/zarr-python#422.
The question is whether we wait for a zarr release to keep consistency between the formats. This is the simplest solution, and probably what we should go with once it's available. The problem is we end up with some intermediary solution if it's not available yet, which adds complexity to backwards compatibility.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant