AFAIK the last blocking bug for 0.7. Originally reported in scverse/scanpy#832.
Minimal reproducer:
import scanpy as sc

pbmc = sc.datasets.pbmc68k_reduced()
pbmc.write("tmp.h5ad")
fromdisk = sc.read("tmp.h5ad")  # Do we read okay
fromdisk.write("tmp.h5ad")  # Can we round trip
The issue here is with structured numpy arrays and the variety of string types. It didn't get caught earlier because these are a bit of a pain to actually instantiate... A brief summary of the conflict (copied from the earlier issue):
h5py doesn't do fixed length unicode strings
h5py does do variable length unicode strings, pretty much anywhere
zarr doesn't do variable length strings in structured arrays
We probably don't actually want to use fixed length unicode strings much. Bytestrings, more likely.
We can probably just add another element type to allow special handling for these. I think it'd be fine to not do np.str_ type arrays.
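To make the bytestring option concrete, here is a minimal numpy-only sketch that replaces fixed-length unicode fields in a structured array with bytestring fields. The helper name `unicode_fields_to_bytes`, the field names, and the UTF-8 encoding are all assumptions for illustration; this is not how anndata handles it.

```python
import numpy as np

# Illustrative structured array: one fixed-length unicode field ("U10")
# and one float field. The field names are made up for this sketch.
records = np.array(
    [("CD3D", 1.5), ("NKG7", 2.25)],
    dtype=[("names", "U10"), ("scores", "f4")],
)


def unicode_fields_to_bytes(arr, encoding="utf-8"):
    """Return a copy of `arr` with every unicode field encoded to bytes.

    Hypothetical helper: non-unicode fields are copied unchanged.
    """
    encoded = {}
    new_fields = []
    for name in arr.dtype.names:
        field_dtype = arr.dtype[name]
        if field_dtype.kind == "U":
            # np.char.encode returns a fixed-length bytestring ("S") array
            encoded[name] = np.char.encode(arr[name], encoding)
            new_fields.append((name, encoded[name].dtype))
        else:
            new_fields.append((name, field_dtype))

    out = np.empty(arr.shape, dtype=new_fields)
    for name in arr.dtype.names:
        out[name] = encoded.get(name, arr[name])
    return out


as_bytes = unicode_fields_to_bytes(records)
```

The trade-off is that readers get bytes back and have to decode them, so the encoding would need to be recorded or fixed by convention.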
This is pretty easy to fix for hdf5 if we just say all unicode strings are variable length. Zarr has an open pull request to support this: zarr-developers/zarr-python#422.
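A rough sketch of what "all unicode strings are variable length" could look like on the hdf5 side, assuming h5py 2.10 or later for `h5py.string_dtype`. The helper name `to_vlen_unicode`, the field names, and the in-memory file are assumptions for illustration, not the eventual fix.

```python
import h5py
import numpy as np


def to_vlen_unicode(arr):
    """Return a copy of `arr` whose unicode fields use h5py's
    variable-length string dtype (hypothetical helper)."""
    vlen_str = h5py.string_dtype(encoding="utf-8")
    new_dtype = [
        (name, vlen_str if arr.dtype[name].kind == "U" else arr.dtype[name])
        for name in arr.dtype.names
    ]
    return arr.astype(new_dtype)


# Illustrative structured array with a fixed-length unicode field
records = np.array(
    [("CD3D", 1.5), ("NKG7", 2.25)],
    dtype=[("names", "U10"), ("scores", "f4")],
)
converted = to_vlen_unicode(records)

# In-memory HDF5 file, so nothing is written to disk
with h5py.File("sketch.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset("records", data=converted)
    stored_names = dset.dtype.names
    roundtripped = dset[()]
```

Depending on the h5py version, the strings may come back as `bytes` rather than `str`, so a reader would likely still need a decoding step.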
The question is whether we wait for a zarr release to keep consistency between the formats. Waiting is the simplest solution, and probably what we should go with once the release is available. The problem is that if it's not available yet, we end up with some intermediate solution, which adds complexity to backwards compatibility.