release 3.10 (#533) #540

davidsebfischer · 2022-02-09T15:31:53Z

fix NA to empty template values (#434)

Signed-off-by: zethson <lukas.heumos@posteo.net>

added bug fix for subsetting + .X method of DistributedstoreAnndata (#431)
Feature/trainer test paths (#437)
made trainer paths more flexible
do not check md5 of weights if None (#442)
added fix to reduce size of uns.pickle files (#440)
Dataset/d10 1016 j cell 2020 08 001 (#402)

Co-authored-by: davidsebfischer <david.seb.fischer@gmail.com>

improved dask array concatenation for .X method on DaoStorage (#444)
Rewrite test_store.py script (#436)
Feature/dno doi luecken (#439)
added data loader dno_doi_luecken
improved CLI and documentation
10.1101/2021.07.19.452954 and data loader tsv update
d10_1101_2021_07_19_452954 (10.1101/2021.07.19.452954) dataloader (scNuc atlas, Gokcen et al. #333)
New ontology map tsv interface with one tsv per ontology (#449)
CLI commands for cache clearing and reloading (#449)
data loader 10.1038/s41467-021-27619-4 (described in #450)
Issue template for data loader request
d10_1038_s41586_019_1631_3: updated annotation and moved to format 1.1 (#467)
Rewrite batch schedule classes (#448)
rewrote batch_schedule to work directly on the supplied indices
Torch data adaptor (#433)
added torch.utils.data.IterableDataset and torch Dataset class
added adaptor unit test
cached query of assembly name to ensembl ftp server
file query was already cached, but assembly name retrieval based on release version was not cached
added random rounding in intercalated batch data loader
updated test splitting
enabled store loading from multiple paths
deprecated optional dependency of Dataset attributes on meta data files
fixed d10_1038_s41586_019_1631_3
relaxed required arguments for DistributedStoreAnndata
added a full data set batch schedule
feature / added method to persist dask.array of DaoStorage into memory + small bug fixes (#447)
dask.array of DaoStore can now be persisted into memory in CSR format
improved docstring documentation for .adaptor
extended store docs
refactored distributedstore to store and minor obs indexing and cart default fixes
fixed obs indexing bug with empty single cart within multi cart

Co-authored-by: felix0097 <47145207+felix0097@users.noreply.github.com>

bug fix for move_to_memory method of CartDask class (#475)
D10 1038 s41467 020 15543 y (#462)
10.1038/s41467-020-15543-y
Feature: Option to shuffle datasets before writing dao stores (#473)
datasets can be shuffled now before writing dao store
Feature: match zarr chunks for randomized_batch_schedule=True for CartDask (#476)
random_batch_schedule for DaskCart now matches dask partitions
rearranged imports + removed debugging assertion statement
Create dataloader (D10.1038/s41591-021-01245-5 #468)
bug fix - added .compute for .X method in DaskCart (#479)
Feature: Save .obs data as categorical dtype (#478)
str obs data is now stored and processed as categorical
fixed check for categorical dtype
Bug-Fix: d10_1038_s41467_021_27619_4 DataLoader (#480)
put column 3 back into var dataframe and named column feature_class
fixed gtf interface for newer ensembl releases for genes without symbol (#484)
add wheel to python build CI to avoid legacy setup.py install (#493)
Feature: add shuffle buffer to DaskCart (#474)
Refactor estimator code to support estimators for torch models (#485)
refactored estimator code into tf.keras code and base estimator code. this prepares side-by-side keras and torch code.
added torch losses and metrics
Improvements to CLI usability (#452)
improved guidance through CLI pipeline through action messages and error messages
improved curation documentation
relabeled export phase in summary
CLI: add automatic detection of container for path handling
CLI: adapt path checking to container use case
CLI: make env variables usable also without containers
CLI: add automatic PR submission in container
fix template creation CLI run
switched data library, CLI and curation annotation to yaml version 1.2 (#528)
improved curation docs
improved guidance in sfaira create-dataloader
improved reaction to existing loaders
improved data loader linting in CLI
support for count and processed data matrices in same loader, deprecates "normalization"
support for feature type annotation, allows for ATAC and CITE data (feature_type)
support for basic genetic modification meta data (gm)
support for basic treatment meta data (treatment)
support for cell tracking in meta studies (source_doi)
support for spatial data
support for VDJ data
support for velocity data
support for reference genome annotation
support for arbitrary organisms in genome container
added phase Pe
Data/10.1126/science.abj4008 (#531)
d10_1126_science_abj4008 10.1126/science.abj4008 #530
added validate to template checking
improvements to VDJ interface
d10_1126_science_abe6474 (#532)
fixed sfaira structure DOI input to annotate
annotated GEO look up
d10_1126_sciimmunol_abd1554 (#471)
d10_1126_sciimmunol_abd1554
dno_doi_luecken neurips loader update (#472)
fixed rich printing in CLI
added documentation of compressed and r file reading
Fix d10_1016_j_cell_2021_01_053 to work with GEO (#469)
added validation statement at the end of finalize
d10_1038_s41591_020_1061_7
updated required meta data section in docs
disabled default cellxgene collection meta data caching, which resulted in a bug when collections were updated on the remote

Co-authored-by: davidsebfischer <david.seb.fischer@gmail.com>
Co-authored-by: le-ander <20015434+le-ander@users.noreply.github.com>
Co-authored-by: Laura Martens <laura.d.martens@icloud.com>
Co-authored-by: xlancelottx <33050110+xlancelottx@users.noreply.github.com>

added new figures (#538)
added new figures
activated cellxgene caching for unit tests (#539)

Co-authored-by: Leander <20015434+le-ander@users.noreply.github.com>
Co-authored-by: Lukas Heumos <lukas.heumos@posteo.net>
Co-authored-by: felix0097 <47145207+felix0097@users.noreply.github.com>
Co-authored-by: Karin Hrovatin <47607471+Hrovatin@users.noreply.github.com>
Co-authored-by: soerenab <36963673+soerenab@users.noreply.github.com>
Co-authored-by: Laura Martens <laura.d.martens@icloud.com>
Co-authored-by: xlancelottx <33050110+xlancelottx@users.noreply.github.com>
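
For readers unfamiliar with the shuffle buffer added to DaskCart (#474), the general technique can be sketched as follows. This is a generic, standard-library illustration only and not the sfaira implementation: the function name `shuffle_buffer` and its parameters `buffer_size` and `seed` are hypothetical and are not taken from the sfaira code base.

```python
import random
from typing import Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


def shuffle_buffer(
    items: Iterable[T], buffer_size: int, seed: Optional[int] = None
) -> Iterator[T]:
    """Approximately shuffle a stream using a fixed-size buffer.

    Hypothetical helper for illustration; not part of sfaira.
    """
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    rng = random.Random(seed)
    buffer: List[T] = []
    for item in items:
        if len(buffer) < buffer_size:
            # Fill the buffer before emitting anything.
            buffer.append(item)
            continue
        # Emit a random buffered element and put the new item in its slot.
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = item
    # Drain what is left in random order once the stream is exhausted.
    rng.shuffle(buffer)
    yield from buffer


if __name__ == "__main__":
    # Every element is emitted exactly once; only the order changes.
    print(list(shuffle_buffer(range(10), buffer_size=4, seed=0)))
```

A larger buffer gives a shuffle closer to uniform at the cost of memory; a buffer of size 1 leaves the stream order unchanged.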

davidsebfischer closed this Feb 9, 2022
