Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

split template test into annotate and test #272

Merged
merged 6 commits into from
May 28, 2021

Conversation

Zethson
Copy link
Member

@Zethson Zethson commented May 19, 2021

Hey,

attempt to fix #270

A couple of notes:

  1. I reverted dsg_f.clean_ontology_class_map() to ds.clean_ontology_class_map(), because the method does not exist in dsg_f aka DatasetGroup
  2. Based on split annotation and testing of new data set #270 I split the stuff. I did not test it, because I do not know what the expected outcome for the annotation is. It might be possible that some paths no longer match as before and it doesn't know where to write stuff? Can be tested with sfaira annotate-dataloader --doi <doi> --test-data <testdata>. What do you think @davidsebfischer, can you test it?
  3. The test is now incomplete due to changes suggested in split annotation and testing of new data set #270 -> e.g. match_to_reference=TODO get organism here -> can you push what you want here, @davidsebfischer, please?
  4. Why are we using pytest for the test to begin with? We're not even using asserts. But let's ignore this for now.

Thanks!

Signed-off-by: zethson lukas.heumos@posteo.net

Signed-off-by: zethson <lukas.heumos@posteo.net>
@Zethson Zethson requested a review from davidsebfischer May 19, 2021 14:01
@davidsebfischer
Copy link
Contributor

@Zethson addressing 2-4.: These two functions will abort if there are bugs in the data loader, that s repacing the assert statements in other unit tests and this is what we have to buffer with try-except to make such errors more understandable for users. As such, they are not unit tests that would be run when we check the code base for bugs, but, we could wrap these in unit tests with mock data sets, such that they become testable in global package checks:

By the way, if you find time to build a mini mock data set by just generating data randomly in a mock data loader function and couple it to a YAML, this would be super useful to unit test this pipeline and we could also use these mock data sets in other unit tests. Right now, contribution is not unit tested but I think it should be.

Could you try wrapping them in a mock test that makes use of a mock data loader?

I do not know what the expected outcome for the annotation is

a .csv file that comes out of write_ontology_class_map and is modified by clean_ontology_class_map, also see here how they look like: https://github.com/theislab/sfaira/blob/dev/sfaira/data/dataloaders/loaders/d10_1016_j_cell_2018_08_067/human_laminapropriaofmucosaofcolon_2019_10xsequencing_kinchen_001.tsv

@davidsebfischer
Copy link
Contributor

Seeing that you moved the python functions under sfaira/commands, sfaira/unit_tests/data_contribution/test_data_template.py would only include the call to these functions via a mock data set I think, ie. this file would only be the pytest wrapper around these functionalities defined in commands.

@Zethson
Copy link
Member Author

Zethson commented May 28, 2021

test-dataloader still works
annotate-dataloader currently fails with

ValueError: 'pancreatic A cell:pancreatic acinar cell:type B pancreatic cell:pancreatic D cell:|||:pancreas exocrine glandular cell:epithelial cell of pancreas:pancreatic stellate cell:lung 
parenchyma resident eosinophil:|||:cartwheel heterochromatin:lymph node tingible body macrophage:DN1 thymic pro-T cell:thymocyte:|||:macrophage:basophil progenitor cell:common lymphoid 
progenitor:early lymphoid progenitor:|||:CD8-positive, CD28-negative, alpha-beta regulatory T cell:CD4-negative, CD8-negative, alpha-beta intraepithelial T cell:mature CD8_alpha-positive 
CD11b-negative dendritic cell:immature CD8_alpha-positive CD11b-negative dendritic cell' is not a valid entry for celltypes.

when running

sfaira annotate-dataloader --doi 10.1016.j.cmet.2019.01.021 --test-data my/path/to/template_data

@davidsebfischer got any idea?

@davidsebfischer
Copy link
Contributor

@Zethson this long string is the automatic proposal for onotology matched cell types that sfaira generates. This is thrown in testing if loading is checked before these proposals are mitigated by the user (manual intervention after annotate call). annotate should not throw this error, you can most likely disable a data set load after this annotation was generated.

@davidsebfischer
Copy link
Contributor

It might be the call to ds.clean_ontology_class_map() in annotate that triggers this, might be better to put this into the testing function? clean_ontology_class_map takes the cell type table after manual intervention to resolve the automated proposals and adds ontology IDs into it, if annotate-test are the two functions before and after a manual intervention, this should be in testing.

Zethson added 2 commits May 28, 2021 16:07
Signed-off-by: zethson <lukas.heumos@posteo.net>
Signed-off-by: zethson <lukas.heumos@posteo.net>
@davidsebfischer
Copy link
Contributor

  • I think sfaira/data/dataloaders/loaders/d10_1016_j_cmet_2019_01_021/mouse_pancreas_2019_10xsequencing_thompson_x.tsv was accidentally modified?
  • The study technology was changed to Dropseq, is that correct?

@Zethson
Copy link
Member Author

Zethson commented May 28, 2021

* I think sfaira/data/dataloaders/loaders/d10_1016_j_cmet_2019_01_021/mouse_pancreas_2019_10xsequencing_thompson_x.tsv was accidentally modified?

That's what the annotation code that you provided me with does. So not accidentally by me, but by the code.
Is something wrong with the code?

Zethson added 3 commits May 28, 2021 18:19
Signed-off-by: zethson <lukas.heumos@posteo.net>
Signed-off-by: zethson <lukas.heumos@posteo.net>
@Zethson Zethson merged commit e615869 into dev May 28, 2021
@Zethson Zethson deleted the feature/split_annotation_testing branch May 28, 2021 16:44
davidsebfischer added a commit that referenced this pull request May 31, 2021
* split CLI template test into annotate and test (#272)
* fix correct assay_sc (#272)
* mitigates merge conflict between dev and release

Signed-off-by: zethson <lukas.heumos@posteo.net>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants