"invalid shape in fixed-type tuple" when saving adata after ranking genes #832

dylkot · 2019-09-11T16:50:20Z

I'm using Scanpy with the following software versions:

python==3.7
scanpy==1.4.4
numpy==1.17.2
anndata==0.6.22.post1

on Ubuntu 18.04. I am able to save my AnnData object just fine with

sc.write(results_file, adata)

and to load it again with

adata = sc.read(results_file)

however if I save it after I run the command

sc.tl.rank_genes_groups(adata, 'louvain12_lab', method='wilcoxon')

the AnnData object will save but when I try to reload it, I get an error message:

ValueError                                Traceback (most recent call last)
<ipython-input-141-159082f1696f> in <module>
      1 results_file = os.path.join(adir, '{project}.count_{count}.gene_{gene}.mito_{mito}.HVGs_{nhvgs}.TPT.{log}.scale.TEST.h5ad'.format(project=project_name, count=count_thresh, gene=gene_thresh, mito=mitothresh, nhvgs=nhvgs, log=logstatus))
      2 print(results_file)
----> 3 adata = sc.read(results_file)

/opt/miniconda3/envs/py37/lib/python3.7/site-packages/scanpy/readwrite.py in read(filename, backed, sheet, ext, delimiter, first_column_names, backup_url, cache, **kwargs)
     95             filename, backed=backed, sheet=sheet, ext=ext,
     96             delimiter=delimiter, first_column_names=first_column_names,
---> 97             backup_url=backup_url, cache=cache, **kwargs,
     98         )
     99     # generate filename and read to dict

/opt/miniconda3/envs/py37/lib/python3.7/site-packages/scanpy/readwrite.py in _read(filename, backed, sheet, ext, delimiter, first_column_names, backup_url, cache, suppress_cache_warning, **kwargs)
    497     if ext in {'h5', 'h5ad'}:
    498         if sheet is None:
--> 499             return read_h5ad(filename, backed=backed)
    500         else:
    501             logg.debug(f'reading sheet {sheet} from file {filename}')

/opt/miniconda3/envs/py37/lib/python3.7/site-packages/anndata/readwrite/read.py in read_h5ad(filename, backed, chunk_size)
    445     else:
    446         # load everything into memory
--> 447         constructor_args = _read_args_from_h5ad(filename=filename, chunk_size=chunk_size)
    448         X = constructor_args[0]
    449         dtype = None

/opt/miniconda3/envs/py37/lib/python3.7/site-packages/anndata/readwrite/read.py in _read_args_from_h5ad(adata, filename, mode, chunk_size)
    484             d[key] = None
    485         else:
--> 486             _read_key_value_from_h5(f, d, key, chunk_size=chunk_size)
    487     # backwards compat: save X with the correct name
    488     if 'X' not in d:

/opt/miniconda3/envs/py37/lib/python3.7/site-packages/anndata/readwrite/read.py in _read_key_value_from_h5(f, d, key, key_write, chunk_size)
    508         d[key_write] = OrderedDict() if key == 'uns' else {}
    509         for k in f[key].keys():
--> 510             _read_key_value_from_h5(f, d[key_write], key + '/' + k, k, chunk_size)
    511         return
    512 

/opt/miniconda3/envs/py37/lib/python3.7/site-packages/anndata/readwrite/read.py in _read_key_value_from_h5(f, d, key, key_write, chunk_size)
    508         d[key_write] = OrderedDict() if key == 'uns' else {}
    509         for k in f[key].keys():
--> 510             _read_key_value_from_h5(f, d[key_write], key + '/' + k, k, chunk_size)
    511         return
    512 

/opt/miniconda3/envs/py37/lib/python3.7/site-packages/anndata/readwrite/read.py in _read_key_value_from_h5(f, d, key, key_write, chunk_size)
    542         return key, value
    543 
--> 544     key, value = postprocess_reading(key, value)
    545     d[key_write] = value
    546     return

/opt/miniconda3/envs/py37/lib/python3.7/site-packages/anndata/readwrite/read.py in postprocess_reading(key, value)
    539             new_dtype = [((dt[0], 'U{}'.format(int(int(dt[1][2:])/4)))
    540                           if dt[1][1] == 'S' else dt) for dt in value.dtype.descr]
--> 541             value = value.astype(new_dtype)
    542         return key, value
    543 

ValueError: invalid shape in fixed-type tuple.

Any idea what is going on or what I can do to make it past this error? It only started happening after I updated my operating system to Ubuntu 18 and my Python to 3.7 and reinstalled scanpy from conda.

gokceneraslan · 2019-09-11T19:26:50Z

Hi Dylan,

This is an issue with the new h5py package, which @ivirshup already fixed on master (928d475). For now, you can downgrade your h5py package to 2.9.0 using pip install h5py==2.9.0 as a workaround.

bkmartinjr · 2019-09-11T19:28:25Z

@gokceneraslan - thanks for the fast response. This broke our (cellxgene) travis pipeline as well. Do you have any info on eta for a fix/workaround other than pinning the module version? TY!

ivirshup · 2019-09-12T04:28:06Z

Just a heads up, there is a remaining issue on anndata master where reading older files with h5py 2.10.0 results in bytestring indexes.

…

On Sep 12, 2019, at 05:28, Bruce Martin ***@***.***> wrote: @gokceneraslan - thanks for the fast response. This broke our (cellxgene) travis pipeline as well. Do you have any info on eta for a fix/workaround other than pinning the module version? TY! — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

LuckyMD · 2019-10-22T13:43:09Z

I'm having the same error with h5py==2.9.0. Cellxgene doesn't seem to be working with the object that I created the object with scanpy 1.4.3+116.g0075c62. I can however load it again with that version. But when I downgrade to 1.3.7 (recommendation from @mbuttner who had the same cellxgene issue) I can no longer load the object and get the above error.

Back in the 1.4.3 dev version scanpy it no longer writes the object after loading, and gives me the following error:

In [23]: adata.write("cellxgene.h5ad")                                                                                                                                                         
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-23-33b15d710f71> in <module>
----> 1 adata.write("cellxgene.h5ad")

~/new_anndata/anndata/anndata/core/anndata.py in write_h5ad(self, filename, compression, compression_opts, force_dense)
   2222             compression=compression,
   2223             compression_opts=compression_opts,
-> 2224             force_dense=force_dense,
   2225         )
   2226 

~/new_anndata/anndata/anndata/readwrite/h5ad.py in write_h5ad(filepath, adata, force_dense, dataset_kwargs, **kwargs)
     90         write_attribute(f, "varp", adata.varp, dataset_kwargs)
     91         write_attribute(f, "layers", adata.layers, dataset_kwargs)
---> 92         write_attribute(f, "uns", adata.uns, dataset_kwargs)
     93         write_attribute(f, "raw", adata.raw, dataset_kwargs)
     94     if adata.isbacked:

~/new_anndata/anndata/anndata/readwrite/h5ad.py in write_attribute(f, key, value, dataset_kwargs)
    103     if key in f:
    104         del f[key]
--> 105     _write_method(type(value))(f, key, value, dataset_kwargs)
    106 
    107 

~/new_anndata/anndata/anndata/readwrite/h5ad.py in write_mapping(f, key, value, dataset_kwargs)
    203 def write_mapping(f, key, value, dataset_kwargs=MappingProxyType({})):
    204     for sub_key, sub_value in value.items():
--> 205         write_attribute(f, f"{key}/{sub_key}", sub_value, dataset_kwargs)
    206 
    207 

~/new_anndata/anndata/anndata/readwrite/h5ad.py in write_attribute(f, key, value, dataset_kwargs)
    103     if key in f:
    104         del f[key]
--> 105     _write_method(type(value))(f, key, value, dataset_kwargs)
    106 
    107 

~/new_anndata/anndata/anndata/readwrite/h5ad.py in write_mapping(f, key, value, dataset_kwargs)
    203 def write_mapping(f, key, value, dataset_kwargs=MappingProxyType({})):
    204     for sub_key, sub_value in value.items():
--> 205         write_attribute(f, f"{key}/{sub_key}", sub_value, dataset_kwargs)
    206 
    207 

~/new_anndata/anndata/anndata/readwrite/h5ad.py in write_attribute(f, key, value, dataset_kwargs)
    103     if key in f:
    104         del f[key]
--> 105     _write_method(type(value))(f, key, value, dataset_kwargs)
    106 
    107 

~/new_anndata/anndata/anndata/readwrite/h5ad.py in write_array(f, key, value, dataset_kwargs)
    152     elif value.dtype.names is not None:
    153         value = _to_hdf5_vlen_strings(value)
--> 154     f.create_dataset(key, data=value, **dataset_kwargs)
    155 
    156 

~/new_anndata/anndata/anndata/h5py/h5sparse.py in create_dataset(self, name, data, chunk_size, **kwargs)
    151         if not isinstance(data, SparseDataset) and not ss.issparse(data):
    152             return self.h5py_group.create_dataset(
--> 153                 name=name, data=data, **kwargs
    154             )
    155         if self.force_dense:

~/anaconda3/envs/sc-tutorial/lib/python3.6/site-packages/h5py/_hl/group.py in create_dataset(self, name, shape, dtype, data, **kwds)
    134 
    135         with phil:
--> 136             dsid = dataset.make_new_dset(self, shape, dtype, data, **kwds)
    137             dset = dataset.Dataset(dsid)
    138             if name is not None:

~/anaconda3/envs/sc-tutorial/lib/python3.6/site-packages/h5py/_hl/dataset.py in make_new_dset(parent, shape, dtype, data, chunks, compression, shuffle, fletcher32, maxshape, compression_opts, fillvalue, scaleoffset, track_times, external, track_order, dcpl)
    116         else:
    117             dtype = numpy.dtype(dtype)
--> 118         tid = h5t.py_create(dtype, logical=1)
    119 
    120     # Legacy

h5py/h5t.pyx in h5py.h5t.py_create()

h5py/h5t.pyx in h5py.h5t.py_create()

h5py/h5t.pyx in h5py.h5t.py_create()

h5py/h5t.pyx in h5py.h5t._c_compound()

h5py/h5t.pyx in h5py.h5t.py_create()

h5py/h5t.pyx in h5py.h5t.py_create()

TypeError: Object dtype dtype('O') has no native HDF5 equivalent

Everything however seems to work fine when I throw out the rank_genes_groups results from adata.uns

Edit: actually cellxgene still isn't working, but I could at least save again.

ivirshup · 2019-10-23T09:03:52Z

@LuckyMD

I can replicate that with:

import scanpy as sc
pbmc = sc.datasets.pbmc68k_reduced()
pbmc.write("tmp.h5ad")
fromdisk = sc.read("tmp.h5ad")  # Do we read okay
fromdisk.write(pbmc)  # Can we round trip

Some context around this, and my current thinking on a solution:

h5py doesn't do fixed length unicode strings
h5py does do variable length unicode strings, pretty much anywhere
zarr doesn't do variable length strings in structured arrays
We probably don't actually want to use fixed length unicode strings much. Bytestrings, more likely.
We can probably just add another element type to allow special handling for these. I think it'd be fine to not do np.str_ type arrays.

LuckyMD · 2019-10-23T09:15:06Z

Sorry for the lack of minimal reproducible example... and thanks for creating one :).

flying-sheep · 2019-10-23T11:53:57Z

We probably don't actually want to use fixed length unicode strings much. Bytestrings, more likely.

Where might bytestrings be useful? If you say text, I have a very strong opposing opinion as I’m a survivor of Python 2 and don’t want to see an UnicodeDecodeError in my life again 😉

ivirshup · 2019-10-23T13:53:09Z

I think fixed length bytestrings would be useful when the data isn’t actually text. I think the assumption of a fixed length with text data, especially if it might have Unicode characters, is just asking for trouble.

flying-sheep · 2019-10-24T13:01:43Z

It is, too bad numpy has no good variable-length string array type.

When would bytes make sense? Bytes just mean “data, but I don’t know its structure or am about to write it to disk”

This was referenced Sep 11, 2019

pin h5py to 2.9.0 to temporarily work around regression chanzuckerberg/cellxgene#916

Merged

remove temp pin of h5py when regression fixed chanzuckerberg/cellxgene#917

Closed

ivirshup mentioned this issue Sep 17, 2019

Change in dtype returned in h5py v2.10.0 h5py/h5py#1307

Closed

cflerin mentioned this issue Oct 11, 2019

The Scanpy docker image contains h5py 2.10.0 vib-singlecell-nf/vsn-pipelines#12

Closed

ivirshup mentioned this issue Oct 23, 2019

Support Numpy structured arrays containing an object zarr-developers/zarr-python#422

Closed

3 tasks

ivirshup mentioned this issue Nov 7, 2019

v0.7 scverse/anndata#171

Closed

8 tasks

This was referenced Nov 15, 2019

Zarr structured arrays with strings scverse/anndata#249

Closed

Fix structured array IO scverse/anndata#252

Merged

flying-sheep closed this as completed in scverse/anndata#252 Nov 25, 2019

ivirshup mentioned this issue Dec 1, 2019

Reading an h5ad file not working anymore after I run the rank_genes_groups function (used to work fine before python module update) #937

Closed

mruffalo mentioned this issue Jan 8, 2020

Read Matt's data hubmapconsortium/portal-containers#1

Closed

bkmartinjr mentioned this issue Feb 5, 2020

AnnData 0.7 chanzuckerberg/cellxgene#1142

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

"invalid shape in fixed-type tuple" when saving adata after ranking genes #832

"invalid shape in fixed-type tuple" when saving adata after ranking genes #832

dylkot commented Sep 11, 2019 •

edited by flying-sheep

Loading

gokceneraslan commented Sep 11, 2019 •

edited

Loading

bkmartinjr commented Sep 11, 2019

ivirshup commented Sep 12, 2019 via email

LuckyMD commented Oct 22, 2019 •

edited

Loading

ivirshup commented Oct 23, 2019

LuckyMD commented Oct 23, 2019

flying-sheep commented Oct 23, 2019

ivirshup commented Oct 23, 2019 via email

flying-sheep commented Oct 24, 2019

"invalid shape in fixed-type tuple" when saving adata after ranking genes #832

"invalid shape in fixed-type tuple" when saving adata after ranking genes #832

Comments

dylkot commented Sep 11, 2019 • edited by flying-sheep Loading

gokceneraslan commented Sep 11, 2019 • edited Loading

bkmartinjr commented Sep 11, 2019

ivirshup commented Sep 12, 2019 via email

LuckyMD commented Oct 22, 2019 • edited Loading

ivirshup commented Oct 23, 2019

LuckyMD commented Oct 23, 2019

flying-sheep commented Oct 23, 2019

ivirshup commented Oct 23, 2019 via email

flying-sheep commented Oct 24, 2019

dylkot commented Sep 11, 2019 •

edited by flying-sheep

Loading

gokceneraslan commented Sep 11, 2019 •

edited

Loading

LuckyMD commented Oct 22, 2019 •

edited

Loading