Feature/complete tsv maps #449

davidsebfischer · 2022-01-03T17:42:32Z

No description provided.

v0.3.7 tag

* added blocked generator
* when streamlining uns: ensure list can be sorted by converting to all-strings
* remove any appearances of np.sort( following np.unique( as the latter already sorts the array
* fixed d10_1016_j_cell_2021_01_053
* updated sex map
* allow estimator_kwargs to be passed in trainer
* added auto calling of tech and bio sample
* fixed blocked generator
* relaxed unit test data to be extendable
* improved interactive data set and added unit test for it
* added test for in memory store
* enabled reactive genome container in single store
* fixed unit test cache directory for ui
* switched organism handling to NCBI taxon
* adapted model references to organism
* fixed cellxgene download interface
* softened tf dependency
* removed hard cellxgene-schema dependency
* fixed cellxgene organ import
* removed sort statements in defining .uns
* fixed nan label in data loader

Co-authored-by: davfischer <davfischer@Davids-MBP.fritz.box>
Co-authored-by: le-ander <20015434+le-ander@users.noreply.github.com>
Co-authored-by: giovp <giov.pll@gmail.com>
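The `uns` item in the v0.3.7 notes above concerns lists that mix value types, which Python 3 refuses to sort. A minimal sketch of the idea, assuming a hypothetical helper named `sortable_strings` (chosen for illustration, not a function from this repository):

```python
def sortable_strings(values):
    """Return the unique values as a sorted list of strings.

    Hypothetical helper for illustration; not part of the repository.
    """
    # Converting every element to str first makes mixed-type lists comparable.
    return sorted({str(v) for v in values})


mixed = ["lung", 3, None, "blood", 3]

try:
    sorted(mixed)  # Python 3 cannot order str against int or None
except TypeError as err:
    print(f"sorting failed: {err}")

print(sortable_strings(mixed))  # ['3', 'None', 'blood', 'lung']
```

The set comprehension also removes duplicates, so no second sort or de-duplication pass is needed afterwards.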

…ureSpace and split_idx function (#419)
* performance fix for loading target universe
* ran flake8
* added missing is-a-list check
* revision for speeding up init_estim function
* added performance increases and bug fixes for single_store
* added performance improvements for split_idx function

* d10_1101_2021_07_19_452954 (10.1101/2021.07.19.452954) dataloader (#333)
* New ontology map tsv interface with one tsv per ontology (#449)
* CLI commands for cache clearing and reloading (#449)
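The "one tsv per ontology" interface named above can be pictured roughly as follows. This is only a sketch under stated assumptions: the column names (`source`, `target_id`), the attribute keys, the `read_map` helper, and the example rows are all hypothetical and are not taken from the repository.

```python
import csv
import io

# Hypothetical file contents: one small TSV per ontology, mapping free-text
# labels found in a dataset to ontology term IDs. The column names and the
# rows are assumptions made for this example.
CELL_TYPE_TSV = "source\ttarget_id\nT cell\tCL:0000084\nB cell\tCL:0000236\n"
ORGAN_TSV = "source\ttarget_id\nlung\tUBERON:0002048\n"


def read_map(handle):
    """Read one two-column TSV into a {source label: ontology ID} dict."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["source"]: row["target_id"] for row in reader}


# One map per ontology, keyed by a hypothetical attribute name.
maps = {
    "cell_type": read_map(io.StringIO(CELL_TYPE_TSV)),
    "organ": read_map(io.StringIO(ORGAN_TSV)),
}

print(maps["cell_type"]["T cell"])  # CL:0000084
print(maps["organ"]["lung"])        # UBERON:0002048
```

Keeping one file per ontology means each map can be curated and reviewed separately, which may be the motivation for the split.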

* fix NA to empty template values (#434)
  Signed-off-by: zethson <lukas.heumos@posteo.net>
* added bug fix for subsetting + .X method of DistributedstoreAnndata (#431)
* Feature/trainer test paths (#437)
* made trainer paths more flexible
* do not check md5 of weights if None (#442)
* added fix to reduce size of pickle files (#440)
* Dataset/d10 1016 j cell 2020 08 001 (#402)
  Co-authored-by: davidsebfischer <david.seb.fischer@gmail.com>
* improved dask array concatenation for .X method (#444)
* Rewrite test_store.py script (#436)
* Feature/dno doi luecken (#439)
* added data loader dno_doi_luecken
* improved CLI and documentation
* 10.1101/2021.07.19.452954 and data loader tsv update
* d10_1101_2021_07_19_452954 (10.1101/2021.07.19.452954) dataloader (#333)
* New ontology map tsv interface with one tsv per ontology (#449)
* CLI commands for cache clearing and reloading (#449)
* data loader 10.1038/s41467-021-27619-4 (described in #450)
* Issue template for data loader request
* d10_1038_s41586_019_1631_3: updated annotation and moved to format 1.1 (#467)
* Rewrite batch schedule classes (#448)
* rewrote batch_schedule to work directly on the supplied indices
* Torch data adaptor (#433)
* added torch.utils.data.IterableDataset and torch Dataset class
* added adaptor unit test
* cached query of assembly name to ensembl ftp server: file query was already cached, but assembly name retrieval based on release version was not cached
* added random rounding in intercalated batch data loader
* updated test splitting
* enabled store loading from multiple paths
* deprecated optional dependency of Dataset attributes on meta data files
* fixed d10_1038_s41586_019_1631_3
* relaxed required arguments for DistributedStoreAnndata
* added a full data set batch schedule
* feature / added method to persist dask.array of DaoStroage into memory + small bug fixes (#447)
* dask.array of DaoStore can now be persisted into memory in CSR format
* improved docstring documentation for .adataptor
* extended store docs
* refactored distributedstore to store and minor obs indexing and cart default fixes
* fixed obs indexing bug with empty single cart within multi cart
  Co-authored-by: felix0097 <47145207+felix0097@users.noreply.github.com>
* bug fix for move_to_memory method of CartDask class (#475)
* D10 1038 s41467 020 15543 y (#462)
* 10.1038/s41467-020-15543-y
* Feature: Option to shuffle datasets before writing dao stores (#473)
* datasets can be shuffled now before writing dao store
* Feature: match zarr chunks for randomized_batch_schedule=True for CartDask (#476)
* random_batch_schedule for DaskCart now matches dask partitions
* rearranged imports + removed debugging assertion statement
* Create dataloader (#468)
* bug fix - added .compute for .x method in DaskCart (#479)
* Feature: Save .obs data as categorical dtype (#478)
* str obs data is now stored and processed as categorical
* fixed check for categorical dtype
* Bug-Fix: d10_1038_s41467_021_27619_4 DataLoader (#480)
* put column 3 back into var dataframe and named column feature_class
* fixed gtf interface for newer ensembl releases for genes without symbol (#484)
* add wheel to python build CI to avoid legacy setup.py install (#493)
* Feature: add shuffle buffer to DaskCart (#474)
* Refactor estimator code to support estimators for torch models (#485)
* refactored estimator code into tf.keras code and base estimator code. this prepares side-by-side keras and torch code.
* added torch losses and metrics
* Improvements to CLI usability (#452)
* improved guidance through CLI pipeline through action messages and error messages
* improved curation documentation
* relabeled export phase in summary
* CLI: add automatic detection of container for path handling
* CLI: adapt path checking to container usecase
* CLI: make env variables usable also without containers
* CLI: add automatic PR submission in container
* fix template creation CLI run
* switched data library, CLI and curation annotation to yaml version 1.2 (#528)
* improved curation docs
* improved guidance in sfaira create-dataloader
* improved reaction to existing loaders
* improved data loader linting in CLI
* support for count and processed data matrices in same loader, deprecates "normalization"
* support for feature type annotation, allows for ATAC and CITE data (feature_type)
* support for basic genetic modification meta data (gm)
* support for basic treatment meta data (treatment)
* support for cell tracking in meta studies (source_doi)
* support for spatial data
* support for VDJ data
* support for velocity data
* support for reference genome annotation
* support for arbitrary organisms in genome container
* added phase Pe
* Data/10.1126/science.abj4008 (#531)
* d10_1126_science_abj4008 #530
* added validate to template checking
* improvements to VDJ interface
* d10_1126_science_abe6474 (#532)
* fixed sfaira structure DOI input to annotate
* annotated GEO look up
* d10_1126_sciimmunol_abd1554 (#471)
* d10_1126_sciimmunol_abd1554
* dno_doi_luecken neurips loader update (#472)
* fixed rich printing in CLI
* added documentation of compressed and r file reading
* Fix d10_1016_j_cell_2021_01_053 to work with GEO (#469)
* added validation statement at the end of finalize
* d10_1038_s41591_020_1061_7
* updated required meta data section in docs
* disabled default cellxgene collection meta data caching which resulted in bug when collections were updated on remote
  Co-authored-by: davidsebfischer <david.seb.fischer@gmail.com>
  Co-authored-by: le-ander <20015434+le-ander@users.noreply.github.com>
  Co-authored-by: Laura Martens <laura.d.martens@icloud.com>
  Co-authored-by: xlancelottx <33050110+xlancelottx@users.noreply.github.com>
* added new figures (#538)
* added new figures
* activated cellxgene caching for unit tests (#539)

Co-authored-by: Leander <20015434+le-ander@users.noreply.github.com>
Co-authored-by: Lukas Heumos <lukas.heumos@posteo.net>
Co-authored-by: felix0097 <47145207+felix0097@users.noreply.github.com>
Co-authored-by: Karin Hrovatin <47607471+Hrovatin@users.noreply.github.com>
Co-authored-by: soerenab <36963673+soerenab@users.noreply.github.com>
Co-authored-by: Laura Martens <laura.d.martens@icloud.com>
Co-authored-by: xlancelottx <33050110+xlancelottx@users.noreply.github.com>

davidsebfischer and others added 28 commits October 22, 2021 11:08

Merge pull request #397 from theislab/release

1fef2d2

v0.3.7 tag

bug fix for return_dense kwarg for h5ad generator (#411)

aeaa60f

performance fix for loading target universe (#409)

cca997f

* performance fix for loading target universe
* ran flake8
* added missing is-a-list check
* revision for speeding up init_estim function

cache migration to settings container (#418)

5684f09

* update contributor emails
* migrate hard-coded cache paths to a user-definable settings container similar to scanpy's approach
* fix flake8
* skip check for use of ftp by bandit
* address comments
* flake8
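The settings-container change above replaces hard-coded cache paths with a value the user can set at runtime. A rough sketch of that pattern, assuming a hypothetical `Settings` class, `cachedir` attribute, and `cache_file` helper (names chosen for illustration; the repository's actual container may differ):

```python
from pathlib import Path


class Settings:
    """Hypothetical settings container holding a user-definable cache path."""

    def __init__(self, cachedir="~/.cache/example"):
        # The default location is an assumption made for this sketch.
        self._cachedir = Path(cachedir).expanduser()

    @property
    def cachedir(self) -> Path:
        return self._cachedir

    @cachedir.setter
    def cachedir(self, value):
        # Accept str or Path and normalise to an expanded Path.
        self._cachedir = Path(value).expanduser()


# A single module-level instance that all cache users read from.
settings = Settings()


def cache_file(name):
    """Resolve a cache file name against the current cache directory."""
    return settings.cachedir / name


# The user redirects every cache consumer by changing one attribute.
settings.cachedir = "/tmp/my_cache"
print(cache_file("ontology.pickle"))
```

Because `cache_file` reads `settings.cachedir` at call time rather than at import time, a later change to the setting takes effect without touching the code that uses the cache.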

Merge pull request #420 from theislab/dev

bafe80c

prepare v0.3.8 release

add manifest file (#424)

52ada8c

* add manifest file
* switch from global to recursive include to be more specific

fix data store subsetting regression introduced in 0.3.8 and cleanup …

c628034

…deprecated files (#425)
* remove deprecated genome folder and its content
* fix data store subsetting regression introduced in 0.3.8

fix synapse data download interface (#426)

963d043

Merge pull request #427 from theislab/dev

09f83d6

v0.3.9

Merge pull request #428 from theislab/release

dc146da

Merge pull request #427 from theislab/dev

fix NA to empty template values (#434)

bafc85e

Signed-off-by: zethson <lukas.heumos@posteo.net>

added bug fix for subsetting + .X method of DistributedstoreAnndata (#…

05a12ae

…431)

Feature/trainer test paths (#437)

6a1ba1f

* made trainer paths more flexible

do not check md5 of weights if None (#442)

e71ad11

added fix to reduce size of pickle files (#440)

ff26e60

Dataset/d10 1016 j cell 2020 08 001 (#402)

b6f66a5

Co-authored-by: davidsebfischer <david.seb.fischer@gmail.com>

improved dask array concatenation for .X method (#444)

551ce13

Rewrite test_store.py script (#436)

31913a2

Feature/dno doi luecken (#439)

814fa29

* added data loader dno_doi_luecken
* improved CLI and documentation

adapted new yaml format for d10_1101_2021_07_19_452954

651d083

Merge branch 'dev' into feature/complete_tsv_maps

179b9ba

adapted yaml template in cli to 1.1

35e953f

draft of new ontology map interface

bacecc0

annotated data loader and fixed bugs with new tsv handling

a4eadd3

* added CLI commands for cache clearing and reloading

fixed lint

98083ec

fixed lint

2b084af

davidsebfischer merged commit 2e50dcc into dataloader_d10_1101_2021_07_19_452954 Jan 3, 2022

davidsebfischer deleted the feature/complete_tsv_maps branch January 3, 2022 18:09

davidsebfischer linked an issue Jan 4, 2022 that may be closed by this pull request

build mapping tsvs for all cell-wise attributes #156

Closed

3 tasks

davidsebfischer mentioned this pull request Feb 9, 2022

release 3.10 (#533) #540

Closed

5 tasks
