Constructing a scarf Zarr dataset from scratch #56
Hello there. I've been thinking a lot lately about Dask and scRNAseq analysis so I'm excited to give this a try! We have a ton of data to analyze and most tools aren't able to scale well enough, so having something that can distribute the work is great. The first step I'm figuring out is how to get the data into the right format. I already have a very large dataset (10X snRNAseq) in Zarr format, stored as a cell-by-gene matrix of UMI counts, so I would prefer not to go through the CR import process but just tweak the layout to be compatible with Scarf. Is that reasonable? I think I can figure this out from the docs but I thought I'd open an issue in case there's an easy answer. I am guessing it's just a matter of naming things correctly.
Replies: 1 comment
Hi @jamestwebber,
This is absolutely possible.
First, a question: is there any reason other than the time it takes that you would like to avoid the CR import functions?
If you haven't already, I would suggest you read this vignette about Zarr organization.
So, a minimal Zarr hierarchy in Scarf looks like this (this data has 892 cells and 36601 features):
/
├── RNA
│   ├── counts (892, 36601) uint32
│   └── featureData
│       ├── I (36601,) bool
│       ├── ids (36601,) <U15
│       └── names (36601,) <U17
└── cellData
    ├── I (892,) bool
    ├── ids (892,) <U18
    └── names (892,) <U18
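Since you already have a cell-by-gene matrix, the orientation matches the `counts` array above (cells on the first axis, features on the second). As a rough sanity check before loading, the listing can be written down as a small spec and compared against whatever your store contains. This is only a sketch: `expected_layout` and `check_layout` are hypothetical helper names, not part of Scarf, and the check covers only the shapes and NumPy dtype kinds shown in the listing, not everything Scarf may require.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]
Spec = Dict[str, Tuple[Shape, str]]


def expected_layout(n_cells: int, n_features: int, assay: str = "RNA") -> Spec:
    """Map each array path in the listing to (shape, NumPy dtype kind).

    Kinds: "u" = unsigned integer, "b" = boolean, "U" = unicode string.
    """
    return {
        f"{assay}/counts": ((n_cells, n_features), "u"),  # cells x features
        f"{assay}/featureData/I": ((n_features,), "b"),
        f"{assay}/featureData/ids": ((n_features,), "U"),
        f"{assay}/featureData/names": ((n_features,), "U"),
        "cellData/I": ((n_cells,), "b"),
        "cellData/ids": ((n_cells,), "U"),
        "cellData/names": ((n_cells,), "U"),
    }


def check_layout(observed: Spec, n_cells: int, n_features: int) -> List[str]:
    """Return human-readable mismatches between `observed` and the listing."""
    problems = []
    for path, (shape, kind) in expected_layout(n_cells, n_features).items():
        if path not in observed:
            problems.append(f"missing array: {path}")
            continue
        obs_shape, obs_kind = observed[path]
        if tuple(obs_shape) != shape:
            problems.append(f"{path}: shape {tuple(obs_shape)}, expected {shape}")
        if obs_kind != kind:
            problems.append(f"{path}: dtype kind {obs_kind!r}, expected {kind!r}")
    return problems


# Example: a store whose count matrix is gene-by-cell instead of cell-by-gene.
observed = dict(expected_layout(892, 36601))
observed["RNA/counts"] = ((36601, 892), "u")
for problem in check_layout(observed, 892, 36601):
    print(problem)
```

In practice the `observed` mapping would be collected from your own store, for example from each array's shape and dtype kind, assuming the arrays expose NumPy-style `shape` and `dtype` attributes.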
The top folders RNA and cellData are simply Zarr groups. You must add the following two attributes to the RNA group like this: