Cellxgene loader #10

davidsebfischer · 2020-10-27T12:21:09Z

Automated data loader built on a streamlined data structure.
Refactoring of code to ease third-party integration.


ambrosejcarr · 2020-10-27T12:52:19Z

sfaira/data/databases/cellxgene_loader.py

+        adata = anndata.read(fn)
+        adata.X = adata.raw.X
+
+        self.adata.uns["lab"] = adata.uns["contributors"]["name"]


Looks like this loader evaded the ADATA_IDS refactor.

Thanks @ambrosejcarr, indeed this was the case. I fixed that and introduced a separate constants container for cellxgene objects!

I like this implementation! I made my comment because I assumed (incorrectly) that you would replace our schema names with those in ADATA_IDS, making our data conform to your internal schema.

Checking my understanding: What you've done instead is added cellxgene's schema as an alternative schema. Is that correct?

If going this route, I'd recommend you inherit both ADATA_IDS and ADATA_IDS_CELLXGENE from an abstract base class that defines the properties that sfaira requires. I imagine this includes but may not be limited to the "lazy loading metadata". If you did it this way, a future PR could modularize the schema; a user could swap between different schemas by selecting which set of constants to use with the data loader.

Someone who wanted to use sfaira to work in the HTAN or HuBMAP schema (examples) would simply need to write a new set of schema that meets the requirements of the sfaira ABC, and they could work in their chosen ecosystem's schema...
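A minimal sketch of the abstract-base-class idea suggested above. Everything here is illustrative: the class names `AdataIdsBase`, `AdataIdsSfaira` and `AdataIdsCellxgene`, the two required properties (`organ`, `species`), the field values, and the `translate` helper are assumptions made for this example, not the classes sfaira actually ships.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class AdataIdsBase(ABC):
    """Hypothetical base class: lists the field names every schema must define."""

    @property
    @abstractmethod
    def organ(self) -> str:
        ...

    @property
    @abstractmethod
    def species(self) -> str:
        ...


class AdataIdsSfaira(AdataIdsBase):
    # Illustrative field names for an internal schema.
    organ = "organ"
    species = "species"


class AdataIdsCellxgene(AdataIdsBase):
    # Illustrative field names for an alternative schema.
    organ = "tissue"
    species = "organism"


def translate(meta: Dict[str, Any], source: AdataIdsBase, target: AdataIdsBase) -> Dict[str, Any]:
    """Re-key a metadata dict from one schema to another; unknown keys pass through."""
    mapping = {source.organ: target.organ, source.species: target.species}
    return {mapping.get(key, key): value for key, value in meta.items()}


cellxgene_meta = {"tissue": "lung", "organism": "human", "n_cells": 100}
sfaira_meta = translate(cellxgene_meta, AdataIdsCellxgene(), AdataIdsSfaira())
```

Under this design, a new ecosystem's schema would only need to subclass the base class and fill in every required property; a subclass that omits one cannot be instantiated.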

Checking my understanding: What you've done instead is added cellxgene's schema as an alternative schema. Is that correct?

Yes! I also refactored the classes to be derived from base classes that reflect what we now require for other databases. This closely ties in with the meta data as well, correct! Thanks for the suggestion!

Extended this - all relevant properties for lazy loading are now exactly those that are also read from meta data files. So in this case, we just need the matched meta data files which are then queried during lazy loading. These match a subset of all generally required properties of the object, essentially the most important dataset-wise (as opposed to cell-wise) properties. If a data set has cells from different organs, for example, the organ entry of this meta data file would just be a list of the contained organs, not a cell-wise annotation.
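To illustrate the dataset-wise versus cell-wise distinction, here is a small sketch that collapses a per-cell annotation into a single dataset-level entry. The helper name `dataset_wise` and the choice to return a bare string when all cells agree are assumptions for this example; the actual meta data files may represent this differently.

```python
from typing import Iterable, List, Union


def dataset_wise(values: Iterable[str]) -> Union[str, List[str]]:
    """Collapse a cell-wise annotation column to one dataset-wise entry.

    Returns the single value if all cells agree, otherwise the sorted list
    of distinct values (e.g. all organs contained in the data set).
    """
    unique = sorted(set(values))
    return unique[0] if len(unique) == 1 else unique


# Four cells: three from lung, one from liver, all from the same species.
cell_organs = ["lung", "lung", "liver", "lung"]
cell_species = ["human"] * 4

meta = {
    "organ": dataset_wise(cell_organs),
    "species": dataset_wise(cell_species),
}
```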

On second thought, I don't think it's really necessary that a third party also maintains the meta files. If the adata objects are streamlined as outlined here, these relevant attributes can easily be read from a loaded adata object, and an up-to-date library of meta files can be built locally with write_meta() once before lazy loading is used. This is not very costly (it is how we maintain our own meta data file library, for example) and greatly reduces the overhead of interfacing with third parties. Let me know if somebody disagrees.

This looks great to me. I don't understand what this means, however:

[A]ll relevant properties for lazy loading are now exactly those that are also read from meta data files. So in this case, we just need the matched meta data files which are then queried during lazy loading

Previously, meta data files were not directly linked to lazy loading, as we largely set properties for lazy loading in dataset constructors. Initially, we actually only used meta data files to generate database statistics. I have now streamlined this so that we don't have different versions of meta data files.
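A rough sketch of how a lazily loaded property could draw from either the constructor or a meta data file. The class `LazyDataset`, the JSON file format, and the rule that a constructor value takes precedence over the file are all assumptions for illustration; only the `meta_fn` attribute name is taken from this PR's commit messages.

```python
import json
import os
import tempfile
from typing import Any, Optional


class LazyDataset:
    """Hypothetical dataset stub whose metadata is available without loading data."""

    def __init__(self, meta_fn: Optional[str] = None, organ: Optional[Any] = None):
        self.meta_fn = meta_fn
        self._organ = organ
        self._meta = None  # parsed meta file, read at most once

    def _from_meta(self, key: str) -> Any:
        # Read the meta file on first access only; return None if there is no file.
        if self._meta is None:
            if self.meta_fn is None or not os.path.exists(self.meta_fn):
                return None
            with open(self.meta_fn) as f:
                self._meta = json.load(f)
        return self._meta.get(key)

    @property
    def organ(self) -> Any:
        # Assumed precedence: a constructor value wins, else fall back to the meta file.
        if self._organ is not None:
            return self._organ
        return self._from_meta("organ")


with tempfile.TemporaryDirectory() as tmp:
    fn = os.path.join(tmp, "meta.json")
    with open(fn, "w") as f:
        json.dump({"organ": ["liver", "lung"]}, f)
    organ_from_file = LazyDataset(meta_fn=fn).organ
    organ_from_ctor = LazyDataset(meta_fn=fn, organ="lung").organ
```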


davidsebfischer · 2020-11-03T10:32:55Z

addresses #15

davidsebfischer added 3 commits October 8, 2020 14:19

added first version of cellxgene data format loader

b4e12a0

refactored anndata field entries from data loaders to be named in separate consts object

27fa462

adapted adata field refectoring in data base classs

4b082e5

davidsebfischer self-assigned this Oct 27, 2020

davidsebfischer marked this pull request as draft October 27, 2020 12:21

ambrosejcarr reviewed Oct 27, 2020


davidsebfischer added 13 commits October 27, 2020 14:19

updated cellxgene data loader to use refactored constants for adata fields

8aa4ef6

updated missing refactored gene id fields in data loaders

2a145fd

refactored adata fields constant container classes to reflect core shared features

d70f524

updated old usages of ADATA_IDS to ADATA_IDS_SFAIRA

4a280d0

added constants based classses into api to improve interfacing to 3rd parties

16e9aba

refactored lazy dataset properties and meta data objects

60f5bce

lazy datasets now draw from either properties defined in constructor or available in a meta data file. meta data files are streamlined, both in loading and saving.

renamed remaining instances of "animal" into "species"

1e32103

allowed maps of meta data file nomenclature

77a6dff

moved meta data code in DatasetBase for readability

9f844c7

introduced meta_fn attribute of dataset class and depreceated 3rd party database via meta objects

d3e356a

added datsetgroup subsetting based on meta / lazy properties

4f988a9

Merge branch 'master' into cellxgene_loader

facab5b

Merge branch 'master' into cellxgene_loader

4b85cbb

davidsebfischer requested a review from le-ander October 27, 2020 18:58

le-ander approved these changes Nov 2, 2020


davidsebfischer changed the base branch from master to dev November 3, 2020 10:02

davidsebfischer added this to the cellxgene compatible anndata fields milestone Nov 3, 2020

davidsebfischer marked this pull request as ready for review November 3, 2020 10:40

davidsebfischer merged commit 365739c into dev Nov 3, 2020

Zethson deleted the cellxgene_loader branch April 27, 2021 14:07


davidsebfischer commented Oct 27, 2020

ambrosejcarr Oct 27, 2020 • edited

davidsebfischer Oct 27, 2020

ambrosejcarr Oct 27, 2020 • edited

davidsebfischer Oct 27, 2020 • edited

davidsebfischer Oct 27, 2020

davidsebfischer Oct 27, 2020

ambrosejcarr Oct 28, 2020 • edited

davidsebfischer Oct 28, 2020

davidsebfischer commented Nov 3, 2020
