Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove pmtl 1 #290

Merged
merged 14 commits into from
Jan 9, 2023
Merged

Remove pmtl 1 #290

merged 14 commits into from
Jan 9, 2023

Conversation

sangeetashukla
Copy link
Collaborator

@sangeetashukla sangeetashukla commented Nov 9, 2022

Purpose/implementation Section

Remove all annotation references to PMTL for the corresponding Gene_Ensemble_ID or Ensemble_ID

What scientific question is your analysis addressing?

  • Update OpenPedCan modules to exclude PMTL annotation
  • This PR only updates the below modules
  1. cnv-frequencies
  2. fusion-frequencies
  3. rna-seq-expression-summary-stats
  4. snv-frequencies

What was your approach?

  • Scan through all script files and README in the modules to search for PMTL annotation references and exclude it
  • Ensure data integrity for any other annotations processed

What GitHub issue does your pull request address?

Issue 447

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Please review the script changes and note any potential errors that may arise from missing or additional commas, or open/unmatched braces/brackets since the PMTL annotation reference is removed.

Which areas should receive a particularly close look?

All changes were validated with a successful run on EC2 instance, and the PR includes the result files. However, some of those files are bigger and can not be handled within a Git PR.
@ewafula As discussed, I ran the modules on EC2 using the scripts/generate-mtp-tables.sh, and had commented out the line to upload to S3. This PR will need to be run again to perform that upload when required.

Is there anything that you want to discuss further?

Discuss which result files can be merged with this PR, and which are to be uploaded directly to S3, and therefore removed from this PR.

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

Yes

Results

Yes, however, the PR is to be merged after v12 release. As mentioned above, some of the result files (from cnv-frequencies and rna-seq-expression-summary-stats module showed warnings due to big files) must be directly uploaded to S3, and the other results are included in the PR.

What types of results are included (e.g., table, figure)?

What is your summary of the results?

Reproducibility Checklist

  • The dependencies required to run the code in this pull request have been added to the project Dockerfile.
  • This analysis has been added to continuous integration.

Documentation Checklist

  • This analysis module has a README and it is up to date.
  • This analysis is recorded in the table in analyses/README.md and the entry is up to date.
  • The analytical code is documented and contains comments.

Copy link

@ewafula ewafula left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sangeetashukla, thank you for updating!

I suggest you regenerate results for the three modules you have updated (sv, cnv, and tpm) to help validate that PMTL dignations are no longer added to the MTP tables and that the annotator module works following the update to exclude the PMTL function.

I would recommend including the annotator module update in this PR to help with single test run for MTP tables using the scripts/generate-mtp-tables.sh. Remember to comment off the s3 bucket upload commands when testing and also to include the fusion-frequencies module, which is currently not updated in this PR.

Other pending modules that require PMTL updates:

  • gene_match module
  • create-subset-files module - utilizes ensg-hugo-pmtl-mapping.tsv produced by the gene_match module. The file will now be called ensg-hugo-mapping.tsv after you have updated the gene_match module to exclude PMTL designations.

Copy link

@ewafula ewafula left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sangeetashukla, thank you for updating.

This PR should not be merged until after the v12 data release, inlcuding updated and renamed ensg-hugo-pmtl-mapping.tsv to ensg-hugo-mapping.tsv

GA checks are failing because of missing CI testing data missing ensg-hugo-mapping.tsv file.

@sangeetashukla
Copy link
Collaborator Author

@ewafula Per conversation here, I have un-done some changes in this PR. Please review when you can.

@ewafula ewafula merged commit 44067b0 into dev Jan 9, 2023
@jharenza jharenza deleted the remove-PMTL-1 branch February 19, 2023 02:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants