added new uberon edge types #341

Merged
merged 1 commit on Jul 26, 2021

davidsebfischer
Contributor

No description provided.

@davidsebfischer davidsebfischer mentioned this pull request Jul 26, 2021
@davidsebfischer davidsebfischer merged commit 1580cd3 into dev Jul 26, 2021
@davidsebfischer davidsebfischer deleted the fix/uberon_edges branch July 26, 2021 09:50
davidsebfischer added a commit that referenced this pull request Sep 7, 2021
* Cellxgene export (#315)

* updated count rounding warning in streamlining
* improved meta data streamlining
* updated DOIs to distinguish preprint and journal

* CLI improvements #321 #314 (#332)

* add new adding datasets figure
* add sample_source
* renamed assay to assay_sc
* fix assay_sc template
* add cell_types_original_obs_key
* add sfaira annotate-dataloader hints

Signed-off-by: zethson <lukas.heumos@posteo.net>

* added lazy ontology loading in OCS (#334, #335)

* reassigned gamma cell in pancreas to pancreatic PP cell (CL:0002275) (#338)

- affects d10_1016_j_cmet_2016_08_020, d10_1016_j_cels_2016_08_011

* added new edge types (#341)

* Improve CLI documentation (#320)

* improved error reporting in annotate
* improved file not found reporting in annotate
* update template creation workflow
* fix doi prompting
* update download urls
* fix data path handling in CLI
* fix disease default in cli
* fix test-dataloader [skip ci]
* fix CI (#339)

Co-authored-by: david.seb.fischer <david.seb.fischer@gmail.com>
Co-authored-by: le-ander <20015434+le-ander@users.noreply.github.com>
Co-authored-by: Lukas Heumos <lukas.heumos@posteo.net>

* Feature/dao improvements (#318)

* updated rounding in cellxgene format export warning
* updated DOIs to distinguish preprint and journal
* fixed issue with ethnicity handling in cellxgene export
* reordered obs in cellxgene streamlining
* added store benchmark script
* added multi-organism store
* update doi setting in datasetinteractive
* added mock data for unit test
* added msle metric
* enabled in memory handling of h5ad backed store
* added infrastructure for ontology re-caching
* fixed all unit tests and optimised run time a bit

Co-authored-by: Abdul Moeed <abdulmoeed444@gmail.com>
Co-authored-by: le-ander <20015434+le-ander@users.noreply.github.com>

* store improvements (#346)

* improvements to store API
* added retrieval index sort to dask store
* fixed bug in single store generator if index input was None
* added sliced X and adata object emission to single store
* moved memory footprint into store base class
* fixed h5ad store indexing
* restructured meta data streamlining code (#347)
- includes a bug fix for missing meta data import from cellxgene-structured data sets
- simplified meta data streamlining code and improved code readability
- deprecated the distinction between cell type and original cell type in the data set definition in favor of a single attribute
- allowed all ontology-constrained meta data items to be supplied in any format (original + map, symbol, or ID) via the `*_obs_col` attribute of the loader
- removed resetting of `_obs_col` attributes in streamlining in favor of obs column names controlled by adataids, which extend to IDs and original labels
- updated cell type entry in all data loaders
* added attribute check for dictionary formatted attributes from YAML
* added processing of obs columns in cellxgene import
* extended error reporting in data loader discovery
* fixed value protection in meta data streamlining
* fixed cellxgene obs adapter
* added additional mock data set with little meta data annotation
* refactored cellxgene streamlining and added HANCESTRO support via EBI
* fixed handling of missing ethnicity ontology for mouse
* fixed EBI EFO backend
* ontology unit tests now check that ontologies can be downloaded
* added new generator interface, restructured batch index design interface and fixed adata uns merge in DatasetGroup (#351)
- Iterators for tf datasets and similar consumers are now emitted as an instance of a class with a property that emits the iterator. This class keeps a pointer to the iterated data set in its attributes. Thus, as long as this instance stays in the namespace in which tensorflow uses the iterator, the iterator can be restarted without creating a new pointer. Previously, training was delayed because tensorflow restarted the validation data set in every epoch, which created a new dask data set each epoch at relatively high cost.
- There is now only one iterator end point for stores (before there was base and balanced). The different index shuffling / sampling schedules are now refactored into functions and can be chosen based on string names. This makes creation and addition of new index schedules ("batch designs") easier.
- Direct conversion of adata objects in memory to a store is now supported via a new multi store class.
- Estimators no longer contain adata processing code but still accept adata in addition to store instances; adata are converted directly to an adata store instance. All previous adata processing code in the estimators is deprecated.
- The interface between store and estimator is heavily simplified through the new generator interface of the store. The generator instances are kept in the train namespace for efficiency, but not in the testing and evaluation namespaces, which only require a single pass over the data set.
* Added new batch index design code
- Batch schedules are now classes rather than functions.
- Introduced epoch-wise reshuffling of indices in the batch schedule: the reshuffling is achieved by moving the schedule from a one-time function evaluation in the generator constructor to the evaluation of a schedule instance property that reshuffles at the beginning of each iteration.
* Fixed balanced batch schedule.
* Added merging of shared uns fields in DatasetGroup so that uns streamlining is maintained across merge of adatas.
* passed empty store index validation
* passed zero length index processing in batch schedule
* allowed re-indexing of generator and batch schedule
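The generator and batch-schedule changes from #351 can be illustrated with a minimal, framework-free sketch. All names here (`BatchScheduleBase`, `ShuffledBatchSchedule`, `GeneratorWrapper`, `BATCH_SCHEDULES`) are hypothetical and are not sfaira's actual API; a plain Python list stands in for the dask/h5ad stores. The sketch shows three ideas: an object whose `iterator` property returns a generator function over a data set it keeps a reference to, so a training loop can restart iteration without rebuilding the data set; batch schedules as classes selected by string name; and epoch-wise reshuffling, achieved by evaluating the schedule at the start of each pass rather than once in the constructor.

```python
import random
from typing import Callable, Dict, Iterator, List, Sequence


class BatchScheduleBase:
    """Hypothetical base class: turns observation indices into batches."""

    def __init__(self, idx: Sequence[int], batch_size: int, seed: int = 0):
        self.idx = list(idx)
        self.batch_size = batch_size
        self.rng = random.Random(seed)

    def _split(self, idx: List[int]) -> List[List[int]]:
        return [idx[i:i + self.batch_size] for i in range(0, len(idx), self.batch_size)]

    @property
    def design(self) -> List[List[int]]:
        raise NotImplementedError


class SequentialBatchSchedule(BatchScheduleBase):
    """Batches in the given order, without shuffling."""

    @property
    def design(self) -> List[List[int]]:
        return self._split(self.idx)


class ShuffledBatchSchedule(BatchScheduleBase):
    """Reshuffles a copy of the indices every time `design` is read."""

    @property
    def design(self) -> List[List[int]]:
        idx = self.idx.copy()
        self.rng.shuffle(idx)
        return self._split(idx)


# Hypothetical registry: schedules are chosen by string name.
BATCH_SCHEDULES: Dict[str, type] = {
    "base": SequentialBatchSchedule,
    "shuffled": ShuffledBatchSchedule,
}


class GeneratorWrapper:
    """Holds a reference to the data and schedule; `iterator` returns a
    generator function that can be called again for every epoch."""

    def __init__(self, data: Sequence, schedule: BatchScheduleBase):
        self.data = data  # built once, reused across epochs
        self.schedule = schedule

    @property
    def iterator(self) -> Callable[[], Iterator[list]]:
        def generator() -> Iterator[list]:
            # The schedule is evaluated here, at the start of each pass,
            # so a shuffled schedule gives a new batch order per epoch.
            for batch_idx in self.schedule.design:
                yield [self.data[i] for i in batch_idx]
        return generator


data = list(range(10))
schedule = BATCH_SCHEDULES["shuffled"](idx=range(len(data)), batch_size=4, seed=1)
wrapper = GeneratorWrapper(data, schedule)
epoch_1 = list(wrapper.iterator())  # three batches of sizes 4, 4, 2
epoch_2 = list(wrapper.iterator())  # same data object, freshly shuffled order
```

Because `iterator` returns a callable rather than a generator object, it can be passed to consumers that call it anew for each pass, such as `tf.data.Dataset.from_generator`; whether sfaira wires it up exactly this way is an assumption of this sketch.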

* added uberon versioning (#354)

* added data life cycle rst (#355)

Co-authored-by: Lukas Heumos <lukas.heumos@posteo.net>
Co-authored-by: le-ander <20015434+le-ander@users.noreply.github.com>
Co-authored-by: Abdul Moeed <abdulmoeed444@gmail.com>