Updated EFO code for Wilms tumor per EFO v45 in preparation for OPC v12… #272

Merged
merged 5 commits into from
Dec 2, 2022

Collaborator

@sangeetashukla sangeetashukla commented Oct 14, 2022

Purpose/implementation Section

The EFO code for cancer_group == 'Wilms tumor' needs to be updated to the latest value per EFO release version v45.

What scientific question is your analysis addressing?

The MTP platform uses a particular EFO release version, and the MTP tables that CHOP shares with FNL must be compatible with it. However, the EFO code for Wilms tumor in the v11 data that we shared is not currently compatible with that version. With the expectation that MTP will update to EFO release v45 by the time of the OPC v12 data release, this PR updates the 'Wilms tumor' EFO code to resolve the incompatibility.

What was your approach?

Manually update EFO ID in the efo-mondo-mapping/results/efo-mondo-map.tsv file to EFO_1000056 as per EFO release v45.
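
For reference, the manual edit described above could also be scripted. The following is only a minimal sketch using the Python standard library: the `update_efo_code` helper, the `efo_code` column name, and the toy rows are hypothetical, and the real efo-mondo-map.tsv may use different column names.

```python
import csv
import io


def update_efo_code(tsv_text, cancer_group, new_efo_code,
                    group_col="cancer_group", efo_col="efo_code"):
    """Return the TSV text with the EFO code replaced for one cancer group.

    The column names are assumptions; adjust them to the real file header.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Only rows for the requested cancer group are modified
        if row[group_col] == cancer_group:
            row[efo_col] = new_efo_code
        writer.writerow(row)
    return out.getvalue()


# Toy stand-in for efo-mondo-map.tsv; the rows and codes are placeholders
example = (
    "cancer_group\tefo_code\n"
    "Wilms tumor\tOLD_CODE\n"
    "Other tumor\tKEEP_CODE\n"
)
updated = update_efo_code(example, "Wilms tumor", "EFO_1000056")
print(updated)
```

Scripting the change keeps every other row byte-for-byte identical, which makes it easier to confirm that the EFO ID is the only difference in the diff.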

What GitHub issue does your pull request address?

Issue 420

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

Note: The module was re-run, and I can confirm that QC passed and that this PR contains no changes other than the EFO ID change.

Is there anything that you want to discuss further?

No

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

Yes

Results

What types of results are included (e.g., table, figure)?

results/efo-mondo-map.tsv

Reproducibility Checklist

  • The dependencies required to run the code in this pull request have been added to the project Dockerfile.
  • This analysis has been added to continuous integration.

Documentation Checklist

  • This analysis module has a README and it is up to date.
  • This analysis is recorded in the table in analyses/README.md and the entry is up to date.
  • The analytical code is documented and contains comments.

@zdorman

zdorman commented Oct 18, 2022

Hi @sangeetashukla and @ewafula, thanks for working to resolve this. I want to make some clarifications, since the current approach will not fix the primary issue that CHOP Wilms tumor evidence will not load into the MTP. I see two separate discussions here:

  1. Versioning
    The issue is not that v11 and v12 EFO IDs for Wilms tumor are mismatched. The issue is that the ID:MONDO_0006085 name:Wilms tumor used in v11 is not found within EFO v3.40.0 (which is used by the current MTP). Our immediate ask is for a quick "find-and-replace" fix and resubmission using an ID found within EFO v3.40.0 (such as the previously suggested ID:MONDO_0019004 name:kidney Wilms tumor used in v10). Looking forward to future v12 releases compatible with an updated MTP with EFO v3.45.0 is a separate matter, and not the immediate request. In fact, the v11 ID:MONDO_0006085 name:Wilms tumor is compatible with EFO v3.45.0, and perhaps should not change for future releases. The change request is for the current release.

  2. Obsolete terms
    The new ID suggested in your PR above is ID:EFO_1000056 name:obsolete_Wilms tumor(2). I apologize for not specifying that obsolete IDs within the efo_otar_slim.owl are filtered out of the available OT/MTP diseases. Any evidence submitted with IDs for disease names prefixed with "obsolete" will still not load or be visible within MTP. I'll again suggest using the previous ID:MONDO_0019004 name:kidney Wilms tumor (or perhaps the ID:Orphanet_654 name:Nephroblastoma identified within the metadata of your EFO_1000056 as the replacement term).
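
The two conditions above (the ID must exist in the EFO version in use, and its name must not be prefixed with "obsolete") can be expressed as a small check. This is a rough sketch under stated assumptions, not the MTP loading logic: the `check_disease_id` helper is hypothetical, and `toy_terms` only mirrors the IDs and names quoted in this thread rather than an actual EFO release.

```python
def check_disease_id(disease_id, known_terms):
    """Classify an ID as 'missing', 'obsolete', or 'ok'.

    known_terms maps disease ID -> disease name for one EFO version.
    """
    name = known_terms.get(disease_id)
    if name is None:
        # The ID is not present in this EFO version at all
        return "missing"
    if name.lower().startswith("obsolete"):
        # Obsolete terms are filtered out and would not load
        return "obsolete"
    return "ok"


# Illustrative entries only, taken from the names mentioned in this thread
toy_terms = {
    "MONDO_0019004": "kidney Wilms tumor",
    "EFO_1000056": "obsolete_Wilms tumor",
}

for candidate in ("MONDO_0006085", "EFO_1000056", "MONDO_0019004"):
    print(candidate, check_disease_id(candidate, toy_terms))
```

Running a check like this over every ID in a mapping file before submission would flag both failure modes discussed here in one pass.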

@jharenza
Member

Hi @zdorman, thanks for the explanation. Question: given the findings with the DGD (CHOP P30 cohort), I think v11 won't be added to the MTP in any state, is that correct? We plan to fix this in v12, which will also include extensive updates of all current cohorts to GENCODE v39, and the addition of about 1000 more PBTA samples.

Given this will be pushed to v12, we suggested your next release be around Feb/March which will allow us time to integrate all of the new data, and your team to update the EFO/MONDO terms. Does that sound like a good plan?

@chinwallaa

@zdorman also, to add to this thread: we are not planning to update the backend db/API, which delivers the plots for MTP, with the v11 data release. We are planning to update it with the v12 release.

@ewafula

ewafula commented Oct 18, 2022

@zdorman, this PR is not for the update you are suggesting. This is a PR (not reviewed yet) for the OPC efo-mondo module from v12 onwards. @sangeetashukla is still working on the PR for the changes you need for v11. I think the updates for the mutation tables are done. She is now finishing up the TPM plots table that you use internally for your QC. We will upload the tables to the S3 bucket for your access when she is done, say by tomorrow. Here is the ticket @sangeetashukla is working on: d3b-center/ticket-tracker-OPC#433

@sangeetashukla
Collaborator Author

Since the EFO change in this PR is an obsolete code for Wilms tumor, this PR does not need to be merged.

Refer to this ticket for other updates related to EFO code changes to MTP tables.

@zdorman

zdorman commented Oct 19, 2022

@ewafula & @sangeetashukla Thanks for the redirect to #433 - I hadn't seen that one before commenting here. That ticket looks like it will address the requested changes to the Somatic Alterations files well. Thank you for blocking the merge of the obsolete term for future releases.

One question - @chinwallaa mentioned that the backend db/API for the Gene Expression (TPM) plots will not be updated, but #433 mentions some changes to the long-tpm jsonl tables used for our QA. Will the long-tpm tables and the db/API remain consistent? Regardless of update status, we want the tables to be an accurate representation of the db.

@jharenza I see the disconnect now. We understand that the CHOP P30 DGD Panel-DNA is problematic, and will not release that on MTP before it is fixed in a future OPC release. We are hoping to get a "v11.1" (or any other naming scheme you prefer) containing our requested data fixes along with the other two panels (TARGET Panel-DNA & CHOP P30 DGD Panel-RNA-Fusion) for a near-term MTP release. We're currently evaluating level of effort for updating MTP to the new OT build (which uses EFO v3.45.0), but I expect that it will be up by early 2023 when OPC v12 is ready.

@chinwallaa

chinwallaa commented Oct 19, 2022

@zdorman can you provide the filtered EFO v3.40.0 and EFO v3.45.0? You had indicated that OT does another layer of filtering based on the EFO version that it uses. We should sync up to the final filtered version that is being used in OT. Re: the db/API for the minor v11 release, we are planning on providing the TPM files as requested. We can review the need/timelines for the API dev/prod for the v11 minor release (+ ~4-5 weeks) vs. the v12 release. The v11 minor release staged for MTP (tables/files) includes additional PBTA data (11 samples), DGD fusion-panel data (870 samples), TARGET DNA-panel data (998 samples), and PBTA DNA-panel data (2 samples), and will exclude the DGD DNA panels (929 samples).

@zdorman

zdorman commented Oct 19, 2022

The best way to ensure compatibility is to use the OT database directly as previously suggested. Their versioned files are publicly accessible via FTP here: http://ftp.ebi.ac.uk/pub/databases/opentargets/platform/

OT 22.04 (current MTP using EFO v3.40.0): http://ftp.ebi.ac.uk/pub/databases/opentargets/platform/22.04/output/etl/json/diseases/
OT 22.09 (future MTP using EFO v3.45.0): http://ftp.ebi.ac.uk/pub/databases/opentargets/platform/22.09/output/etl/json/diseases/

The folder contains multiple JSONL files (though they carry a json extension). Feel free to incorporate them however works best for you, but one way to load them into a DataFrame using Python pandas is:

import glob
import os

import pandas as pd

path = 'path/to/OT/folder/'

# List all files in the folder. Note that OT uses the 'json' extension for 'jsonl' files
OT_files = sorted(glob.glob(os.path.join(path, '*.json')))

# Build one df by reading each JSON Lines file and concatenating the results;
# ignore_index avoids repeated row labels across the per-file frames
df = pd.concat(
    (pd.read_json(f, orient='records', lines=True) for f in OT_files),
    ignore_index=True)
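
Once the disease files are downloaded, a mapping can be checked against them. The sketch below is one possible way to do that with only the standard library; the `load_disease_names` and `missing_ids` helpers are hypothetical, and it assumes each JSON Lines record carries `id` and `name` fields, which should be confirmed against the downloaded files.

```python
import glob
import json
import os


def load_disease_names(folder):
    """Map disease id -> name from every JSON Lines file in the folder.

    Assumes each record has an 'id' field and, usually, a 'name' field.
    """
    names = {}
    for file_path in sorted(glob.glob(os.path.join(folder, "*.json"))):
        with open(file_path, encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if not line:
                    continue  # skip blank lines
                record = json.loads(line)
                names[record["id"]] = record.get("name")
    return names


def missing_ids(candidate_ids, names):
    """Return the candidate IDs that are absent from the loaded disease list."""
    return [disease_id for disease_id in candidate_ids if disease_id not in names]
```

For example, `missing_ids(["MONDO_0006085"], load_disease_names("path/to/OT/folder/"))` would return the IDs from a mapping that the chosen OT release does not contain.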

@ewafula ewafula merged commit 7dbf6f4 into dev Dec 2, 2022
@sangeetashukla sangeetashukla deleted the update_EFO_module branch December 5, 2022 12:11