Feature/complete tsv maps #449

davidsebfischer
Contributor

No description provided.

davidsebfischer and others added 28 commits October 22, 2021 11:08
* added blocked generator
* when streamlining uns: ensure list can be sorted by converting to all-strings
* remove any appearances of np.sort( following np.unique( as the latter already sorts the array
* fixed d10_1016_j_cell_2021_01_053
* updated sex map
* allow estimator_kwargs to be passed in trainer
* - added auto calling of tech and bio sample
- fixed blocked generator
- relaxed unit test data to be extendable
* improved interactive data set and added unit test for it
* added test for in memory store
* enabled reactive genome container in single store
* fixed unit test cache directory for ui
* switched organism handling to NCBI taxon
- adapted model references to organism
* fixed cellxgene download interface
* softened tf dependency
* removed hard cellxgene-schema dependency
* fixed cellxgene organ import
* removed sort statements in defining .uns
* fixed nan label in data loader

Co-authored-by: davfischer <davfischer@Davids-MBP.fritz.box>
Co-authored-by: le-ander <20015434+le-ander@users.noreply.github.com>
Co-authored-by: giovp <giov.pll@gmail.com>
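Two items in the commit message above depend on standard NumPy and Python behavior. `np.unique` already returns its unique values in sorted order, which is why a following `np.sort(` is redundant. Python 3 refuses to order lists that mix types such as `str` and `float`, which is why `.uns` lists are converted to strings before sorting. Below is a minimal sketch of both points. The helper names `unique_sorted` and `sortable_uns_list` are hypothetical illustrations, not sfaira functions.

```python
import numpy as np


def unique_sorted(values):
    """Return the unique elements of `values` (hypothetical helper).

    np.unique already returns its result sorted, so wrapping it in
    np.sort (the pattern the commit removes) does no extra work.
    """
    return np.unique(values)


def sortable_uns_list(items):
    """Sort a list that may mix types by casting every element to str first (hypothetical helper).

    In Python 3, comparing str with int/float raises TypeError, so sorting a
    mixed list directly fails; an all-string list always sorts.
    """
    return sorted(str(x) for x in items)
```

For example, `unique_sorted([3, 1, 2, 3, 1])` gives `array([1, 2, 3])`, and `sortable_uns_list(["b", 2, "a", 1.5])` gives `['1.5', '2', 'a', 'b']`. The resulting order is lexicographic, so `"10"` would sort before `"2"`.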
* performance fix for loading target universe

* ran flake8

* added missing is-a-list check

* revision for speeding up init_estim function
…ureSpace and split_idx function (#419)

* performance fix for loading target universe

* ran flake8

* added missing is-a-list check

* revision for speeding up init_estim function

* added performance increases and bug fixes for single_store

* added performance improvements for split_idx function
* update contributor emails

* migrate hard-coded cache paths to a user-definable settings container similar to scanpy's approach

* fix flake8

* skip check for use of ftp by bandit

* address comments

* flake8
* add manifest file

* switch from global to recursive include to be more specific
…deprecated files (#425)

* remove deprecated genome folder and its content

* fix data store subsetting regression introduced in 0.3.8
Merge pull request #427 from theislab/dev
Signed-off-by: zethson <lukas.heumos@posteo.net>
* made trainer paths more flexible
Co-authored-by: davidsebfischer <david.seb.fischer@gmail.com>
* added data loader dno_doi_luecken
* improved CLI and documentation
* added CLI commands for cache clearing and reloading
davidsebfischer merged commit 2e50dcc into dataloader_d10_1101_2021_07_19_452954 Jan 3, 2022
davidsebfischer deleted the feature/complete_tsv_maps branch January 3, 2022 18:09
davidsebfischer pushed a commit that referenced this pull request Jan 3, 2022
* d10_1101_2021_07_19_452954 (10.1101/2021.07.19.452954) dataloader (#333)
* New ontology map tsv interface with one tsv per ontology (#449)
* CLI commands for cache clearing and reloading (#449)
davidsebfischer linked an issue Jan 4, 2022 that may be closed by this pull request
davidsebfischer added a commit that referenced this pull request Feb 9, 2022
* fix NA to empty template values (#434)

Signed-off-by: zethson <lukas.heumos@posteo.net>

* added bug fix for subsetting + .X method of DistributedstoreAnndata (#431)

* Feature/trainer test paths (#437)

* made trainer paths more flexible

* do not check md5 of weights if None (#442)

* added fix to reduce size of pickle files (#440)

* Dataset/d10 1016 j cell 2020 08 001 (#402)

Co-authored-by: davidsebfischer <david.seb.fischer@gmail.com>

* improved dask array concatenation for .X method (#444)

* Rewrite test_store.py script (#436)

* Feature/dno doi luecken (#439)

* added data loader dno_doi_luecken
* improved CLI and documentation

* 10.1101/2021.07.19.452954 and data loader tsv update

* d10_1101_2021_07_19_452954 (10.1101/2021.07.19.452954) dataloader (#333)
* New ontology map tsv interface with one tsv per ontology (#449)
* CLI commands for cache clearing and reloading (#449)

* data loader 10.1038/s41467-021-27619-4 (described in #450)

* Issue template for data loader request

* d10_1038_s41586_019_1631_3: updated annotation and moved to format 1.1 (#467)

* Rewrite batch schedule classes (#448)

* rewrote batch_schedule to work directly on the supplied indices

* Torch data adaptor (#433)

* added torch.utils.data.IterableDataset and torch Dataset class

* added adaptor unit test

* cached query of assembly name to ensembl ftp server
file query was already cached, but assembly name retrieval based on release version was not cached

* added random rounding in intercalated batch data loader

* updated test splitting

* enabled store loading from multiple paths

* deprecated optional dependency of Dataset attributes on meta data files

* fixed d10_1038_s41586_019_1631_3

* relaxed required arguments for DistributedStoreAnndata

* added a full data set batch schedule

* feature / added method to persist dask.array of DaoStore into memory + small bug fixes (#447)

* dask.array of DaoStore can now be persisted into memory in CSR format
* improved docstring documentation for .adataptor

* extended store docs

* refactored distributedstore to store and minor obs indexing and cart default fixes

* fixed obs indexing bug with empty single cart within multi cart

Co-authored-by: felix0097 <47145207+felix0097@users.noreply.github.com>

* bug fix for move_to_memory method of CartDask class (#475)

* D10 1038 s41467 020 15543 y (#462)

* 10.1038/s41467-020-15543-y

* Feature: Option to shuffle datasets before writing dao stores (#473)

* datasets can be shuffled now before writing dao store

* Feature: match zarr chunks for randomized_batch_schedule=True for CartDask (#476)

* random_batch_schedule for DaskCart now matches dask partitions

* rearranged imports + removed debugging assertion statement

* Create dataloader (#468)

* bug fix - added .compute for .x method in DaskCart (#479)

* Feature: Save .obs data as categorical dtype (#478)

* str obs data is now stored and processed as categorical

* fixed check for categorical dtype

* Bug-Fix: d10_1038_s41467_021_27619_4 DataLoader (#480)

* put column 3 back into var dataframe and named column feature_class

* fixed gtf interface for newer ensembl releases for genes without symbol (#484)

* add wheel to python build CI to avoid legacy setup.py install (#493)

* Feature: add shuffle buffer to DaskCart (#474)

* Refactor estimator code to support estimators for torch models (#485)

* refactored estimator code into tf.keras code and base estimator code. This prepares side-by-side keras and torch code.
* added torch losses and metrics

* Improvements to CLI usability (#452)

* improved guidance through CLI pipeline through action messages and error messages
* improved curation documentation
* relabeled export phase in summary
* CLI: add automatic detection of container for path handling
* CLI: adapt path checking to container use case
* CLI: make env variables usable also without containers
* CLI: add automatic PR submission in container
* fix template creation CLI run
* switched data library, CLI and curation annotation to yaml version 1.2 (#528)
* improved curation docs
* improved guidance in sfaira create-dataloader
* improved reaction to existing loaders
* improved data loader linting in CLI
* support for count and processed data matrices in same loader, deprecates "normalization"
* support for feature type annotation, allows for ATAC and CITE data (feature_type)
* support for basic genetic modification meta data (gm)
* support for basic treatment meta data (treatment)
* support for cell tracking in meta studies (source_doi)
* support for spatial data
* support for VDJ data
* support for velocity data
* support for reference genome annotation
* support for arbitrary organisms in genome container
* added phase Pe
* Data/10.1126/science.abj4008 (#531)
* d10_1126_science_abj4008 #530
* added validate to template checking
* improvements to VDJ interface
* d10_1126_science_abe6474 (#532)
* fixed sfaira structure DOI input to annotate
* annotated GEO look up
* d10_1126_sciimmunol_abd1554 (#471)
* d10_1126_sciimmunol_abd1554
* dno_doi_luecken neurips loader update (#472)
* fixed rich printing in CLI
* added documentation of compressed and r file reading
* Fix d10_1016_j_cell_2021_01_053 to work with GEO (#469)
* added validation statement at the end of finalize
* d10_1038_s41591_020_1061_7 
* updated required meta data section in docs
* disabled default cellxgene collection meta data caching, which resulted in a bug when collections were updated on the remote

Co-authored-by: davidsebfischer <david.seb.fischer@gmail.com>
Co-authored-by: le-ander <20015434+le-ander@users.noreply.github.com>
Co-authored-by: Laura Martens <laura.d.martens@icloud.com>
Co-authored-by: xlancelottx <33050110+xlancelottx@users.noreply.github.com>

* added new figures (#538)

* added new figures

* activated cellxgene caching for unit tests (#539)

Co-authored-by: Leander <20015434+le-ander@users.noreply.github.com>
Co-authored-by: Lukas Heumos <lukas.heumos@posteo.net>
Co-authored-by: felix0097 <47145207+felix0097@users.noreply.github.com>
Co-authored-by: Karin Hrovatin <47607471+Hrovatin@users.noreply.github.com>
Co-authored-by: soerenab <36963673+soerenab@users.noreply.github.com>
Co-authored-by: Laura Martens <laura.d.martens@icloud.com>
Co-authored-by: xlancelottx <33050110+xlancelottx@users.noreply.github.com>
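The `#478` entry in the squashed commit message above stores string `.obs` data as a categorical dtype. The sketch below shows that conversion and a dtype check in plain pandas, assuming `obs` is an ordinary `pandas.DataFrame`. The helper names `strings_to_categorical` and `is_categorical` are hypothetical, and this is not sfaira's implementation.

```python
import pandas as pd
from pandas.api.types import is_object_dtype, is_string_dtype


def is_categorical(series: pd.Series) -> bool:
    """Check for categorical dtype via isinstance on pd.CategoricalDtype (hypothetical helper)."""
    return isinstance(series.dtype, pd.CategoricalDtype)


def strings_to_categorical(obs: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `obs` with string-like columns cast to categorical (hypothetical helper)."""
    out = obs.copy()
    for col in out.columns:
        s = out[col]
        if is_categorical(s):
            continue  # already categorical, leave untouched
        # Object columns (how older pandas stores strings) and dedicated
        # string dtypes (newer pandas) are converted; numeric columns are not.
        if is_object_dtype(s.dtype) or is_string_dtype(s.dtype):
            out[col] = s.astype("category")
    return out
```

A categorical column stores each distinct label once plus integer codes per row. This is typically much smaller than repeating the same label strings for every cell.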
davidsebfischer mentioned this pull request Feb 9, 2022
Development

Successfully merging this pull request may close these issues.

build mapping tsvs for all cell-wise attributes
6 participants