SparseDataset().append() unexpected behavior #453
Not sure what exactly is going on, but it probably has to do with the indices and indptr being stored as 32-bit integers (since 2147483647 is `np.iinfo(np.int32).max`). A few things to investigate/do:
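To illustrate the suspected failure mode: a CSR matrix's `indptr` holds cumulative nonzero counts, so with int32 storage those counts silently wrap past `2**31 - 1`. The snippet below is a minimal sketch of that wraparound, not anndata's actual code path:

```python
import numpy as np

# CSR indptr stores cumulative nonzero counts. With int32 storage,
# counts past np.iinfo(np.int32).max wrap around to negative values.
i32_max = np.iinfo(np.int32).max
assert i32_max == 2147483647

counter = np.int32(i32_max)
with np.errstate(over="ignore"):
    wrapped = counter + np.int32(1)  # wraps to the negative end of int32

# Upcasting to int64 before growing the dataset avoids the wraparound:
safe = np.int64(counter) + 1
assert safe == 2147483648
```

This is consistent with the reported symptom: nothing errors, the counts just stop being meaningful once the total nnz crosses the int32 limit.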
I tried to cast the indices and indptr to int64, but hit an error.

Code:

Trace:
Hmm, wasn't expecting that, but I've opened an issue at h5py for it: h5py/h5py#1761. I think a quick solution here will be to make sure the growing dataset is written with int64 indices and indptr from the start:

```python
import h5py
from scipy import sparse
import numpy as np
from anndata._io.h5ad import write_attribute
from anndata._core.sparse_dataset import SparseDataset

f = h5py.File("text.h5", "w")
s = sparse.random(100, 200, format="csr")
s.indices = s.indices.astype(np.int64)
s.indptr = s.indptr.astype(np.int64)
write_attribute(f, "base_array", s)

s_dset = SparseDataset(f["base_array"])
# Now do your appending
s_dset.append(sparse.random(200, 200, format="csr"))
```
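For intuition, appending one CSR matrix to another can be sketched in plain scipy/numpy terms (this mirrors the concatenate-and-offset idea behind a backed append, not `SparseDataset`'s actual implementation): concatenate `data` and `indices`, and extend `indptr` by the current nnz offset, which is exactly the quantity that overflows int32:

```python
import numpy as np
from scipy import sparse

a = sparse.random(100, 200, format="csr", density=0.1)
b = sparse.random(50, 200, format="csr", density=0.1)

# Concatenate the raw CSR components; the appended indptr entries are
# shifted by the current nonzero count (a.nnz), done here in int64.
data = np.concatenate([a.data, b.data])
indices = np.concatenate([a.indices, b.indices]).astype(np.int64)
indptr = np.concatenate([a.indptr, b.indptr[1:].astype(np.int64) + a.nnz])

merged = sparse.csr_matrix((data, indices, indptr), shape=(150, 200))
assert merged.nnz == a.nnz + b.nnz
assert (merged != sparse.vstack([a, b])).nnz == 0
```

If `indptr` stayed int32, the `+ a.nnz` shift would wrap once the running total passed `2**31 - 1`, matching the silent zeros reported above.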
Thanks! Would you know whether there's a way to do this with an AnnData object? Due to switching this dataset to a backed object, the matrix becomes dense. Casting the matrix back gives the following:

Code:

```python
adata = anndata.AnnData(scipy.sparse.csr_matrix((0, n), dtype=np.float32))
print(f"Type 1: {type(adata.X)}")
# switch to backed object
adata.filename = filename
print(f"Type 2: {type(adata.X)}")
adata.X = scipy.sparse.csr_matrix(adata.X)
print(f"Type 3: {type(adata.X)}")
```

Output:
You can set the datatypes of the indices and indptr on the matrix in the anndata object before writing, i.e.:

```python
adata.X.indices = adata.X.indices.astype(np.int64)
adata.X.indptr = adata.X.indptr.astype(np.int64)
adata.write_h5ad("path.h5ad")
backed = ad.read_h5ad("path.h5ad", backed="r")
```

A few points of caution:

I'm pretty surprised by the results you get from that last cast.
Ah, I see what's happening now. This is part of the current backed interface, which I think is a bit confusing. When you run the following:

```python
import anndata as ad
from scipy import sparse

a = ad.AnnData(sparse.random(50, 20, format="csr", density=0.1))
print(type(a.X))
# <class 'scipy.sparse.csr.csr_matrix'>

a.filename = "test_backed.h5ad"
print(type(a.X))
# <class 'h5py._hl.dataset.Dataset'>

X = sparse.csr_matrix(a.X)
print(type(X))
# <class 'scipy.sparse.csr.csr_matrix'>

a.X = X
print(type(a.X))
# <class 'anndata._core.sparse_dataset.SparseDataset'>
```

Setting `a.X` to a sparse matrix writes it back to the backing file, so `a.X` becomes a `SparseDataset` again.

As for your use case, that sounds good. There just aren't many operations defined on backed sparse matrices at the moment.

I think we'll have a more polished API for this use case soon. Would you mind if I pinged you to take a look once it's ready for some more eyes on it?
Thanks a lot, it works now! And feel free to reach out to me once the new API is in place.

Great! Glad to help.
Referenced from a sfaira commit, "Explicitly cast indices and indptr of final backed file to int64", which cites this issue for background: scverse/anndata#453.
I am trying to append sparse datasets to one (large) dataset. It works as expected for a while (up to ~2147483647 nonzero elements), but then it stops appending nonzero elements without any errors or warnings and fills everything afterwards with 0s.
The problem is even stranger: while investigating `append()`, I found the following.

In the script:

In `append()`:

So apparently the object `X` in my script has the same `nnz` count before and after `append()`, but inside `append()` the `nnz` count is correct, i.e. the new data is appended successfully.
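Until the fix landed, one way to guard against this silent overflow was to upcast the index arrays before appending. The helper below is a hypothetical sketch (not part of anndata's API) of that guard on an in-memory CSR matrix:

```python
import numpy as np
from scipy import sparse

def ensure_int64_indices(m):
    """Hypothetical guard: upcast int32 CSR index arrays to int64 in place.

    Prevents the cumulative nonzero count in indptr from wrapping
    once it exceeds np.iinfo(np.int32).max during repeated appends.
    """
    if m.indptr.dtype == np.int32:
        m.indices = m.indices.astype(np.int64)
        m.indptr = m.indptr.astype(np.int64)
    return m

m = ensure_int64_indices(sparse.random(10, 10, format="csr"))
assert m.indptr.dtype == np.int64
assert m.indices.dtype == np.int64
```

Applying this to each matrix before writing and appending keeps the on-disk `indptr` 64-bit, so nnz counts past `2**31 - 1` stay correct.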