Scripts and results to automate OLS ontology search for cancer groups #249
@sangeetashukla Thank you for your work on this. The code looks good. I verified the result files and they look accurate. I have a few questions about some edge cases, and for my own clarification:
- When we cannot find an `EFOOntoID`, how does the module decide to populate the field with `MONDOOntoID`? For example, choroid plexus carcinoma has a MONDO ID and an NCIT ID, so does MONDO take precedence over NCIT?
- I noticed that for a few cancer groups with no search results on OLS (e.g. CNS Melanoma, CNS neuroblastoma, or CIC-DUX4 Sarcoma), the `EFOOntoID` and `OntoDesc` fields get populated with general terms and IDs. For example, CNS Melanoma is assigned the `EFOOntoID` for Melanoma. Do we want to keep such fields empty?
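If the answer to the second question is to leave such fields empty, one possible approach is to accept a search hit only when its label matches the cancer group exactly. The sketch below is illustrative only and is not the module's actual logic: the function names, the hit structure, and the placeholder ID are all assumptions.

```python
def normalize(label: str) -> str:
    """Lowercase and collapse whitespace/hyphens so the comparison is lenient."""
    return " ".join(label.lower().replace("-", " ").split())


def pick_exact_hit(cancer_group, hits):
    """Return the first hit whose label equals the cancer group, else None."""
    wanted = normalize(cancer_group)
    for hit in hits:
        if normalize(hit["label"]) == wanted:
            return hit
    return None


# Hypothetical search hits; the ID is a placeholder, not a real ontology ID.
hits = [{"id": "EFO:PLACEHOLDER", "label": "melanoma"}]

# Only the general term was found, so the field would stay empty.
print(pick_exact_hit("CNS Melanoma", hits))
# The label matches exactly, so the hit would be used.
print(pick_exact_hit("Melanoma", hits))
```

With a rule like this, CNS Melanoma would keep a blank `EFOOntoID` for manual curation instead of inheriting the general Melanoma term.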
Hi @sangeetashukla. Thanks for working on this; it looks great. I have a few more suggestions before merging.
- Since `efo-mondo-map-prefill.tsv` is derived from past EFO/MONDO/NCIT manual curation, we want to make sure that we are not overriding the values in this file. For example, there might be (I am not sure) some cases in which we want to use a higher-level or different code for a specific cancer and the search result was manually overridden. That being said, can you update this module to look for any discrepancies between old and new values?
- Create a new file (perhaps `efo-mondo-map-prefill-auto.tsv`) with the first set of columns being exactly those of `efo-mondo-map-prefill.tsv` and the appended columns being `EFO_OntoDesc`, etc., which are currently in the individual files for EFO/MONDO/NCIT. That way, we can remove the individual files. Note: this new file would contain all values from `efo-mondo-map-prefill.tsv` plus newly found values. That is, I hope this can narrow the gap for blanks, which will otherwise have to be added manually.
Does that make sense?
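A minimal sketch of the merge described above, assuming rows are keyed by `cancer_group`. The column names `efo_code`, `mondo_code`, and `ncit_code`, the function name, and the placeholder IDs are hypothetical, and this is not the module's implementation. Manually curated values are never overwritten, blanks are filled from the automated search, and disagreements are collected for review.

```python
# Hypothetical column names; the real prefill file may use different ones.
ONTOLOGY_COLUMNS = ["efo_code", "mondo_code", "ncit_code"]


def merge_prefill(prefill_rows, auto_rows, key="cancer_group"):
    """Fill blanks in the curated rows from the auto rows and report conflicts."""
    auto_by_group = {row[key]: row for row in auto_rows}
    merged, discrepancies = [], []
    for row in prefill_rows:
        auto = auto_by_group.get(row[key], {})
        out = dict(row)  # start from the curated row so manual values survive
        for col in ONTOLOGY_COLUMNS:
            manual, found = row.get(col, ""), auto.get(col, "")
            if not manual and found:
                out[col] = found  # blank in the curated file: use the search result
            elif manual and found and manual != found:
                discrepancies.append((row[key], col, manual, found))
        merged.append(out)
    return merged, discrepancies


# Placeholder data only; these are not real ontology IDs.
prefill = [{"cancer_group": "Group A", "efo_code": "EFO:MANUAL",
            "mondo_code": "", "ncit_code": ""}]
auto = [{"cancer_group": "Group A", "efo_code": "EFO:AUTO",
         "mondo_code": "MONDO:AUTO", "ncit_code": ""}]

merged, discrepancies = merge_prefill(prefill, auto)
print(merged)
print(discrepancies)
```

Here the curated `efo_code` is kept, the blank `mondo_code` is filled from the search, and the EFO disagreement is reported rather than resolved automatically.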
@adilahiri For a However, for |
@sangeetashukla: Thank you for clarifying my questions.
@jharenza I made changes to the module as you suggested. Now the module generates a single |
Hi @sangeetashukla, thanks for merging the information into one file. What is the [update_map.Rmd](https://github.com/PediatricOpenTargets/OpenPedCan-analysis/pull/249/files#diff-930cacd8ae63d0c1143914c3fa20b9f7ab08a5ce7f747ef003b6a3c7646b9899) file doing? I don't see it in your shell script, and it seems to only serve to read in a file. Is this required, or can you read the file in the script in which it will be used?
Hi @sangeetashukla - one other update I forgot about here - can you add this module's shell script to the YAML file such as this one: |
Hi @jharenza , |
Merge sounds good!
@jharenza I have merged the bash scripts into a new file and removed the old ones from the branch. I also updated the yml file to run the new merged script.
@sangeetashukla can you fix the few conflicts here as well?
@jharenza Thank you for helping to resolve the conflict. Somehow, they didn't show up when I merged dev into the branch and pulled.
All set now.
Hi @sangeetashukla, it looks like some of your methylation branch changes snuck in here. Can you resolve please?
Also, can you rename `qc_efo_mondo_map.Rmd` to `02-qc_efo_mondo_map.Rmd`?
@sangeetashukla Thank you for your work and the updates; the result files look good. I will approve once you commit the last changes.
…vant file This reverts commit 718bcb6.
Thank you for confirming this, @adilahiri.
@jharenza |
Thank you for the updates @sangeetashukla
Thanks @sangeetashukla
Purpose/implementation Section
Automate OLS ontology search to retrieve EFO, MONDO, and NCIT codes for `cancer_group` in the current data release.
What scientific question is your analysis addressing?
OLS maintains a repository for biomedical ontologies that aims to provide a single point of access to the latest ontology versions, which are used as references for all `cancer_group` found in the OpenPedCan data. The new script added with this PR enables an automated search to retrieve relevant IDs for the `cancer_group` found in the current data release.
What was your approach?
`results/efo-mondo-mapping.tsv` file in the same module.
What GitHub issue does your pull request address?
Issue 396
Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.
Please review the new result files for content and format, and comment in case additional data is needed.
Which areas should receive a particularly close look?
Currently, the script only performs an automated search for the EFO, MONDO, and NCIT term types. Please comment if additional term types (e.g., HP, UBERON) need to be included.
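To illustrate how additional term types could be added, here is a rough sketch that builds one search URL per ontology from a configurable list. The endpoint URL and the `q`, `ontology`, and `exact` query parameters are assumptions about the OLS search API and should be checked against its documentation; the function name is hypothetical and this is not the script in this PR.

```python
from urllib.parse import urlencode

# Assumed OLS search endpoint; verify against the OLS API documentation.
OLS_SEARCH_URL = "https://www.ebi.ac.uk/ols4/api/search"

# Term types currently searched, per the PR description.
ONTOLOGIES = ["efo", "mondo", "ncit"]


def build_search_urls(cancer_group, ontologies=ONTOLOGIES):
    """Return a dict mapping each ontology to an exact-match search URL."""
    urls = {}
    for onto in ontologies:
        # Parameter names are assumptions, labelled as such above.
        query = urlencode({"q": cancer_group, "ontology": onto, "exact": "true"})
        urls[onto] = f"{OLS_SEARCH_URL}?{query}"
    return urls


# Adding HP and UBERON only requires extending the list.
urls = build_search_urls("Choroid plexus carcinoma", ONTOLOGIES + ["hp", "uberon"])
for onto, url in urls.items():
    print(onto, url)
```

Keeping the ontology list as data rather than hard-coding one search per term type means new term types need no new code paths.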
Is there anything that you want to discuss further?
This module can be further enhanced to compile all IDs for each `cancer_group` into a single file when an exact match is found in the OLS repository. This could reduce the need for manual curation. Let me know if this is something we want to pursue at this time.
Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?
Yes
Results
What types of results are included (e.g., table, figure)?
results/map-prefill-EFO-codes.tsv
results/map-prefill-MONDO-codes.tsv
results/map-prefill-NCIT-codes.tsv
Reproducibility Checklist
Documentation Checklist
`README` and it is up to date.
`analyses/README.md` and the entry is up to date.