-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cellxgene loader #10
Cellxgene loader #10
Conversation
davidsebfischer
commented
Oct 27, 2020
- automatised data loader on streamlined data structure
- refactoring of code to ease 3rd party integration
adata = anndata.read(fn) | ||
adata.X = adata.raw.X | ||
|
||
self.adata.uns["lab"] = adata.uns["contributors"]["name"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this loader evaded the ADATA_IDS
refactor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @ambrosejcarr, indeed this was the case, I fixed that and introduced a separate constants container for cellxgene objects!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this implementation! I made my comment because I assumed (incorrectly) that you would replace our schema names with those in ADATA_IDS
, making our data conform to your internal schema.
Checking my understanding: What you've done instead is added cellxgene's schema as an alternative schema. Is that correct?
If going this route, I'd recommend you inherit both ADATA_IDS
and ADATA_IDS_CELLXGENE
from an abstract base class that defines the properties that sfaira
requires. I imagine this includes but may not be limited to the "lazy loading metadata". If you did it this way, a future PR could modularize the schema; a user could choose to swap between different schema based on their selection of which set of constants to use with the data loader.
Someone who wanted to use sfaira
to work in the HTAN or HuBMAP schema (examples) would simply need to write a new set of schema that meets the requirements of the sfaira ABC
, and they could work in their chosen ecosystem's schema...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checking my understanding: What you've done instead is added cellxgene's schema as an alternative schema. Is that correct?
yes! I also refactored the classes to be derived of base classes that reflect what we require for other data bases now! this closesly ties in with the meta data as well, correct! Thanks for the suggestion!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extended this - all relevant properties for lazy loading are now exactly those that are also read from meta data files. So in this case, we just need the matched meta data files which are then queried during lazy loading. These match a subset of all generally required properties of the object, essentially the most important dataset-wise (as oppose to cell-wise) properties. If it s the case that a data set has cells from different organs for example, the organ entry of this meta data file would just be a list of contained organs, but not cell-wise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On 2nd thoughts, I dont think it s really necessary that a 3rd party also maintains the meta files - if the adata objects are streamlined as this is outlined here now, these relevant attributes can be easily read from a loaded adata object and an up-to-date library of meta files can be built locally with write_meta() once before lazy loading usage. This is not very costly (this is how we maintain our meta data file library for example) and reduces overhead of interfacing 3rd parties a lot, let me know if somebody disagrees.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great to me. I don't understand what this means, however:
[A]ll relevant properties for lazy loading are now exactly those that are also read from meta data files. So in this case, we just need the matched meta data files which are then queried during lazy loading
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously, meta data files were not directly linked to lazy loading as we largely set properties for lazy loading in datasest constructors. Initially we only used meta data files to generate data base statistics actually. I streamlined this now so that we dont have different versions of meta data files.
lazy datasets now draw from either properties defined in constructor on available in a meta data file. meta data files are streamlined, both in loading and saving.
…ty database via meta objects
addresses #15 |