Updated final codes per discussion on OPC Issue 526 #350

Merged
merged 10 commits into from
Apr 27, 2023

Conversation

sangeetashukla
Collaborator

@sangeetashukla sangeetashukla commented Apr 6, 2023

Purpose/implementation Section

Update the module to use v12 efo-mondo-map-prefill.tsv

What scientific question is your analysis addressing?

With v12, a new efo-mondo-map-prefill.tsv is generated by molecular-subtype-integrate module.
This module uses the file to re-run an automated search for all cancer_groups across EBI OLS. The search assists a downstream manual review and ensures every cancer_group has its associated EFO, MONDO, and NCIT codes captured in the final efo-mondo-map.tsv file. The file is then ready to be added to the data release.

What was your approach?

Copy the provided efo-mondo-map-prefill.tsv file into the results directory and run the module.
Also review the file for any missing codes or typos and debug the module as needed.
Update the efo-mondo-map.tsv file based on the automatic and manual search findings.

What GitHub issue does your pull request address?

OPC Issue 526

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

Only the lines for which Jo Lynne provided feedback have been updated.

Is there anything that you want to discuss further?

No. The final file looks good as of now, with some exceptions where missing codes are NA. They will need to be updated when OLS is updated.

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

Yes.

Results

What types of results are included (e.g., table, figure)?

results/efo-mondo-map.tsv

What is your summary of the results?

  1. Diffuse hemispheric glioma NA NA NA - No codes were found, and need to be added manually. - this is a new entity. use MONDO:0016680 for now —> Done.
  2. Epstein-Barr virus-related tumor MONDO_0017342 MONDO_0017342 NA - NCIT code is pending. - OK. —> Thanks for confirming.
  3. Extraventricular neurocytoma EFO_1000856 MONDO_0016727 NCIT_C92555 - EFO code needs review. - MONDO_0016727 and NCIT_C92555 correct. EFO not correct —> Replaced EFO with the MONDO code since it was incorrect and exact EFO code is not available.
  4. Glial-neuronal tumor NA NA NA - No codes were found, and need to be added manually. - NCIT_C4747 —> Updated NCIT code, EFO and MONDO are still NA.
  5. Mesenchymal tumor EFO_1000473 MONDO_0003512 NCIT_C7059 - needs review because description for each of the codes is not necessarily the same. - NCIT:C7059 seems best here
  6. Other EFO_0030051 GO_0051707 NCIT_C17649 - needs review since this is not specific at all. - this should be removed totally, will update PR prior to this —> Removed
  7. Perineuroma NA NA NA - was added but is missing all codes, therefore needs review. - MONDO:0019404 —> Updated EFO and MONDO with the same code, and added the corresponding NCIT.
  8. Pheochromocytoma and Paraganglioma EFO_0020005 MONDO_0035540 NA - NCIT code is pending. - OK to be blank NCIT —> Thanks for confirming.
  9. SEGA NA NA NA - was added but is missing all codes, therefore needs review. - NCIT:C3696 and will update PR to use long name —> Updated with long name, and found EFO, MONDO using this NCIT code.
  10. High-grade glioma and Low-grade glioma need additional review since High-grade glioma/astrocytoma and Low-grade glioma/astrocytoma are also included, although the codes are different.
  11. T Acute Lymphoblastic Leukemia/Lymphoma EFO_0000209 MONDO_0004963 NCIT_C3183 EFO_1001936 MONDO_0003537 NCIT_C8694 The codes need to be re-visited. Alternatives are suggested here based on the automated search results captured in results/efo-mondo-map-prefill-auto.tsv - use these: EFO_0000209 MONDO_0004963 NCIT_C3183 —> No update necessary.
  12. Uterine Corpus Endometrial Carcinoma EFO_0007532 MONDO_0000553 NCIT_C159413 EFO_1000238 MONDO_0000553 NCIT_C159413 - EFO code seems to have changed, needs review. - http://purl.obolibrary.org/obo/NCIT_C159413 this seems to be nothing. Use EFO_0007532 MONDO_0000553 —> No update necessary.
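
The review notes above cite codes in two spellings: CURIE style (MONDO:0016680, NCIT:C3696) and the underscore style used in the table (MONDO_0016727). Below is a minimal base-R sketch, assuming the map stores the underscore form, that normalizes a code and flags anything outside the EFO/MONDO/NCIT prefixes (such as the GO code in item 6). The helpers `normalize_code` and `is_expected_code` are hypothetical and not part of the module.

```r
# Hypothetical helpers for illustration; not part of the module.
# Convert CURIE-style codes ("MONDO:0016680") to the underscore form.
normalize_code <- function(x) {
  sub(":", "_", trimws(x), fixed = TRUE)
}

# TRUE when a code looks like EFO_<digits>, MONDO_<digits>, or NCIT_C<digits>.
# NA is treated as "pending", not as an error.
is_expected_code <- function(x) {
  is.na(x) | grepl("^(EFO_[0-9]+|MONDO_[0-9]+|NCIT_C[0-9]+)$", x)
}

codes <- c("MONDO:0016680", "NCIT:C3696", "EFO_1000856", "GO_0051707", NA)
normalized <- normalize_code(codes)

# Codes that would need a manual look
flagged <- normalized[!is_expected_code(normalized)]
print(flagged)
```

A check like this would also flag an Orphanet identifier placed in the EFO column, so it is meant as a prompt for review rather than a hard failure.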

Reproducibility Checklist

  • The dependencies required to run the code in this pull request have been added to the project Dockerfile.
  • This analysis has been added to continuous integration.

Documentation Checklist

  • This analysis module has a README and it is up to date.
  • This analysis is recorded in the table in analyses/README.md and the entry is up to date.
  • The analytical code is documented and contains comments.

@ewafula

ewafula commented Apr 10, 2023

@sangeetashukla, can you update the final efo-mondo-map.tsv so that these two cancer_groups both have codes? Please replace the EFO/MONDO NA values in Glial-neuronal tumor with the EFO/MONDO codes from Glial-neuronal tumor NOS.

Glial-neuronal tumor  NA   NA   NCIT_C4747
Glial-neuronal tumor NOS    MONDO_0016729  MONDO_0016729  NCIT_C4747
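
One way the requested fill could be done, as a rough base-R sketch on a toy copy of the two rows above. The column names follow efo-mondo-map.tsv; the helper `fill_from` is hypothetical and is not how the module itself applies the change.

```r
# Toy copy of the two rows from this comment.
map <- data.frame(
  cancer_group = c("Glial-neuronal tumor", "Glial-neuronal tumor NOS"),
  efo_code     = c(NA, "MONDO_0016729"),
  mondo_code   = c(NA, "MONDO_0016729"),
  ncit_code    = c("NCIT_C4747", "NCIT_C4747"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: copy codes from `source` into `target`
# only where the target value is NA.
fill_from <- function(df, target, source,
                      cols = c("efo_code", "mondo_code", "ncit_code")) {
  t_row <- which(df$cancer_group == target)
  s_row <- which(df$cancer_group == source)
  stopifnot(length(t_row) == 1, length(s_row) == 1)
  for (col in cols) {
    if (is.na(df[t_row, col])) df[t_row, col] <- df[s_row, col]
  }
  df
}

map <- fill_from(map, "Glial-neuronal tumor", "Glial-neuronal tumor NOS")
print(map)
```

Existing non-NA values are left untouched, so the NCIT code already present in the target row is not overwritten.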

@sangeetashukla
Collaborator Author

@ewafula Done.

@ewafula

ewafula commented Apr 25, 2023

@sangeetashukla, these 7 cancer_groups are coming in with the ongoing updates to the histologies file, which will be ready soon. Please update the efo-mondo mapping to include them.

> hist <- read_tsv("../data/histologies.tsv") |> filter(sample_type == "Tumor") |> pull(cancer_group) |> unique()
> efo_mondo <- read_tsv("../data/efo-mondo-map.tsv") |> pull(cancer_group) |> unique()
> setdiff(hist, efo_mondo)
[1] "Pilocytic astrocytoma"             NA
[3] "Glioblastoma"                      "CNS Burkitt's lymphoma"
[5] "Atypical choroid plexus papilloma" "Astrocytoma"
[7] "Diffuse fibrillary astrocytoma"    "Astroblastoma"
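
The NA in the output above is a missing cancer_group rather than a name that needs a code. A small base-R sketch of the same comparison with NA and duplicates dropped first, using toy vectors in place of the two files (the values are illustrative only):

```r
# Toy stand-ins for the cancer_group columns of histologies.tsv
# and efo-mondo-map.tsv.
hist_groups <- c("Glioblastoma", NA, "Astrocytoma",
                 "Choroid plexus papilloma", "Astrocytoma")
map_groups  <- c("Choroid plexus papilloma", "Glioblastoma Multiforme")

# Drop NA and duplicates before comparing, so only real names are reported.
missing_groups <- setdiff(unique(hist_groups[!is.na(hist_groups)]), map_groups)
print(sort(missing_groups))
```

An empty result would mean every named cancer_group in the histologies already has a row in the map.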

@sangeetashukla
Collaborator Author

sangeetashukla commented Apr 26, 2023

@ewafula

This PR has been updated as below for the new cancer_groups.

Astroblastoma MONDO_0016707 MONDO_0016707 NCIT_C4324
Astrocytoma EFO_0000272 MONDO_0019781 NCIT_C6958
Atypical choroid plexus papilloma MONDO_0002684 MONDO_0002684 NCIT_C53686
CNS Burkitt's lymphoma EFO_0000309 MONDO_0007243 NCIT_C2912
Diffuse fibrillary astrocytoma MONDO_0016688 MONDO_0016688 NCIT_C4322
Glioblastoma MONDO_0018177 MONDO_0018177 NCIT_C3058
Pilocytic astrocytoma MONDO_0016691 MONDO_0016691 NCIT_C4047

Note 1: Burkitt Leukemia/Lymphoma exists and is the same as CNS Burkitt's lymphoma. One of them should be removed. I was not able to find anything specific for ‘CNS Burkitt’s lymphoma’.

Note 2: The QC failures below were found. Once the histologies.tsv file is finalized, the module will be run again.

[1] FALSE

Pilocytic astrocytoma
CNS Burkitt's lymphoma
Diffuse fibrillary astrocytoma
Gliosarcoma
Neuroepithelial neoplasm

Note 3: I updated both results/efo-mondo-prefill.tsv and results/efo-mondo-map.tsv to match.

cc: @jharenza @chinwallaa in case you want to QC the newly added codes.

@ewafula

ewafula commented Apr 26, 2023

@sangeetashukla, you need to clean up the efo-mondo-map.tsv. It seems these new entries were just pasted in, not tab-separated.

> read_tsv("../../data/efo-mondo-map.tsv") |> filter(grepl("Pilocytic|Glioblastoma|Burkitt|papilloma|Astrocytoma|Astroblastoma", cancer_group))
# A tibble: 11 x 4
   cancer_group                                          efo_c~1 mondo~2 ncit_~3
   <chr>                                                 <chr>   <chr>   <chr>  
 1 Astroblastoma   MONDO_0016707   MONDO_0016707   NCIT~ NA      NA      NA     
 2 Astrocytoma     EFO_0000272     MONDO_0019781   NCIT~ NA      NA      NA     
 3 Atypical choroid plexus papilloma       MONDO_000268~ NA      NA      NA     
 4 Burkitt Leukemia/Lymphoma                             EFO_00~ MONDO_~ NCIT_C~
 5 Choroid plexus papilloma                              EFO_10~ MONDO_~ NCIT_C~
 6 CNS Burkitt's lymphoma  EFO_0000309     MONDO_000724~ NA      NA      NA     
 7 Glioblastoma    MONDO_0018177   MONDO_0018177   NCIT~ NA      NA      NA     
 8 Glioblastoma Multiforme                               EFO_00~ MONDO_~ NCIT_C~
 9 Pilocytic astrocytoma   MONDO_0016691   MONDO_001669~ NA      NA      NA     
10 Subependymal Giant Cell Astrocytoma                   MONDO_~ MONDO_~ NCIT_C~
11 Subependymal Giant Cell Astrocytoma                   MONDO_~ MONDO_~ NCIT_C~
# ... with abbreviated variable names 1: efo_code, 2: mondo_code, 3: ncit_code
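
Rows like 1-3 above, where the whole record lands in cancer_group and the code columns are NA, can be caught before parsing by counting tab-separated fields per line. A minimal base-R sketch on toy lines (in practice the lines would come from `readLines()` on the map file):

```r
# Toy file content: the first data row is tab-separated,
# the second was pasted with spaces instead of tabs.
lines <- c(
  "cancer_group\tefo_code\tmondo_code\tncit_code",
  "Astroblastoma\tMONDO_0016707\tMONDO_0016707\tNCIT_C4324",
  "Astrocytoma   EFO_0000272   MONDO_0019781   NCIT_C6958"
)

# Count tab-separated fields on every line and compare with the header.
n_fields <- lengths(strsplit(lines, "\t", fixed = TRUE))
bad_rows <- which(n_fields != n_fields[1])
print(lines[bad_rows])
```

Any line reported here has a different number of fields than the header and would be misread as in the tibble above.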

@sangeetashukla
Collaborator Author

Looking into it right away.

@jharenza
Member

Note 1: Burkitt Leukemia/Lymphoma exists and is the same as CNS Burkitt's lymphoma. One of them should be removed. I was not able to find anything specific for ‘CNS Burkitt’s lymphoma’.

These are different, as one is peripheral and one is CNS, so they should remain separate.

@sangeetashukla
Collaborator Author

@ewafula all fixed.

@sangeetashukla
Collaborator Author

Pilocytic astrocytoma: https://www.ebi.ac.uk/ols4/ontologies/ordo/classes/http%253A%252F%252Fwww.orpha.net%252FORDO%252FOrphanet_251612

Using this Orphanet_251612 as EFO code for Pilocytic astrocytoma.

CNS Burkitt's lymphoma - we will keep separate in our tables, but seems there is no distinguishing code, so you can use the other code for this one as well?
@jharenza I am currently using the other existing code for the newly added CNS Burkitt's lymphoma as well since, as you saw, I was also not able to find a distinguishing code.

Diffuse fibrillary astrocytoma: https://www.ebi.ac.uk/ols4/ontologies/mondo/classes/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FMONDO_0016688

Already using this code.

Gliosarcoma: https://www.ebi.ac.uk/ols4/ontologies/efo/classes/http%253A%252F%252Fwww.ebi.ac.uk%252Fefo%252FEFO_1001465

Added

Neuroepithelial neoplasm: https://www.ebi.ac.uk/ols4/ontologies/mondo/classes/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FMONDO_0021193
Added
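
The OLS4 links in this comment carry the term IRI percent-encoded twice (":" becomes %253A, "/" becomes %252F). A small base-R sketch of how such a link can be built and read back; the helper `ols4_class_url` is hypothetical, and the URL layout is simply copied from the links above.

```r
# Hypothetical helper: build an OLS4 class-page link from an
# ontology ID and a term IRI, following the links in this comment.
ols4_class_url <- function(ontology, iri) {
  once  <- URLencode(iri, reserved = TRUE)       # ":" -> %3A, "/" -> %2F
  twice <- gsub("%", "%25", once, fixed = TRUE)  # encode the "%" signs again
  paste0("https://www.ebi.ac.uk/ols4/ontologies/", ontology, "/classes/", twice)
}

url <- ols4_class_url("mondo", "http://purl.obolibrary.org/obo/MONDO_0016688")
print(url)

# Decoding twice recovers the IRI; its last path element is the code.
code <- basename(URLdecode(URLdecode(url)))
print(code)
```

Reading the code back out of a link this way makes it easier to confirm that the value entered in the map matches the page that was reviewed.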

@sangeetashukla
Collaborator Author

@ewafula This PR is ready for a final review.

@ewafula ewafula left a comment

Thank you, @sangeetashukla. All cancer_group values in the final histologies are now in the efo-mondo mapping. I have uploaded the file to S3 and can overwrite it in case additional updates are needed.

@ewafula ewafula merged commit 5020ea4 into v12-post-release Apr 27, 2023
@sangeetashukla sangeetashukla deleted the v12-efo-final branch April 27, 2023 15:41