This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

Addition of cnv data to oncoprint #182

Closed
cbethell wants to merge 3 commits

Conversation

cbethell
Contributor

Purpose/implementation

This PR incorporates the cnv data into the data represented on the oncoprint produced in PR #176.

It builds on the 01-plot-oncoprint.R and run-oncoprint.sh scripts in the oncoprint-landscape module.

Issue

This PR addresses issue #6 on producing an oncoprint that displays the landscape of genetic lesions across PBTA.

Directions for reviewers

  • In this PR, I changed the removeNonMutated option from FALSE to TRUE so that the Tumor_Sample_Barcode labels are readable on the plot. Should I add an optparse option to toggle this on and off? Or should I leave it as is, or set it back to FALSE?
  • I based the thresholds for distinguishing amplifications, homozygous deletions, and hemizygous deletions on the CNVkit documentation here. However, with their cutoff of 0.5, amplifications in the selected genes did not display, so I made the thresholds optparse options and used a cutoff of 0.2 to produce the current plot. Should I change this threshold? This question may be better answered once we receive the consensus calls, as mentioned on the previous PR, Addition of script to produce oncoprint #176.
  • This script is getting a bit lengthy. Should I now break it up into multiple scripts? If so, what would you recommend as the most logical way to do so?
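
To make the threshold question concrete, here is a minimal base R sketch of how segment means might be binned into copy number classes. It is not the code in this PR: the function name `classify_segmean`, its three cutoff arguments, and the class labels are all hypothetical, and the deletion cutoffs are illustrative placeholders. Only the 0.2 and 0.5 amplification cutoffs come from the discussion above.

```r
# Hypothetical helper (not from this PR): label each segment mean (log2 ratio)
# with a copy number class, or NA when no cutoff is crossed.
classify_segmean <- function(seg_mean,
                             amp_cutoff = 0.2,       # value discussed in this PR
                             hem_del_cutoff = -0.25, # placeholder (assumption)
                             hom_del_cutoff = -1.1) { # placeholder (assumption)
  # The cutoffs must be ordered for the classes to be mutually exclusive
  stopifnot(hom_del_cutoff < hem_del_cutoff, hem_del_cutoff < amp_cutoff)

  status <- rep(NA_character_, length(seg_mean))
  status[which(seg_mean >= amp_cutoff)] <- "Amp"
  status[which(seg_mean <= hem_del_cutoff)] <- "Hem_Deletion"
  # Assigned last so the stricter deletion class overrides the hemizygous call
  status[which(seg_mean <= hom_del_cutoff)] <- "Hom_Deletion"
  status
}

# Toy segment means, made up for illustration
seg_means <- c(0.6, 0.3, 0.05, -0.4, -1.5)

calls_strict <- classify_segmean(seg_means, amp_cutoff = 0.5)
calls_lenient <- classify_segmean(seg_means, amp_cutoff = 0.2)
```

With these toy values, the second segment (0.3) is called an amplification only under the 0.2 cutoff, which is the kind of difference the threshold choice controls.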

Results

See the current plot below:
[Image: maf_oncoprint]

Docker and continuous integration

  • The dependencies required to run the code in this pull request have been added to the project Dockerfile.
  • This analysis has been added to continuous integration.

PR Checklist

  • Run a linter
  • Set the seed: not needed.
  • Comments and/or documentation up to date
  • Double check your paths
  • Spell check any Rmd file or md file
  • Restart R and run all notebooks fresh and save

cbethell and others added 2 commits October 29, 2019 16:04
- added optparse options to include `low_segmean_cutoff` and `high_segmean_cutoff` (both needed to distinguish amplifications, homozygous deletions, and hemizygous deletions)
- changed the color for `Nonsense_Mutation` from black to dark blue in `oncoplot-palette.R`
- changed the `removeNonMutated` option in the `oncoplot` function from FALSE to TRUE in order to make the `Tumor_Sample_Barcode` labels more readable on the plot
@jaclyn-taroni jaclyn-taroni marked this pull request as ready for review October 29, 2019 20:54
@jaclyn-taroni jaclyn-taroni changed the title from "In Progress: Addition of cnv data to oncoprint" to "Addition of cnv data to oncoprint" Oct 29, 2019
Also use CNVkit symlink
Member

@jaclyn-taroni jaclyn-taroni left a comment

I'm not sure what's going on with the out-of-memory issue in CI; I have not gotten to the bottom of it. Having taken a look at this, I think the "annotating SEG files with gene symbols" step should probably be broken out into its own script.

Comment on lines +144 to +150
select_maf_df <- maf_df %>%
  dplyr::select(Tumor_Sample_Barcode, Hugo_Symbol)

# Add gene information to the cnv dataframe
cnv_df <- cnv_file %>%
  dplyr::inner_join(select_maf_df, by = c("ID" = "Tumor_Sample_Barcode")) %>%
  dplyr::distinct()
Member

If I understand this section correctly, you are joining the gene symbols from the MAF file to the CNV file using the sample identifier. I think what you want to do instead is add gene symbols to the SEG file using the genome coordinates that are already in the SEG file (a quick search suggests bedtools might have functionality to accomplish this, and GenomicFeatures also comes to mind). This step should probably be upstream of the plotting, say in 01-prepare-cn-for-oncoplot.R, where this script (02-plot-oncoprint.R) then accepts the file that already contains gene symbols. You could potentially do the low_segmean_cutoff and high_segmean_cutoff steps in the preparation step as well. Note that hg38 was the build used: https://github.com/AlexsLemonade/OpenPBTA-manuscript/blob/master/content/03.methods.md#somatic-copy-number-variant-calling
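
As a rough illustration of the coordinate-based annotation described above, the base R sketch below pairs segments with genes on the same chromosome and keeps the pairs whose intervals overlap. Everything in it is an assumption made for illustration: the `annotate_segments` helper is hypothetical, the SEG column names follow the common SEG layout, and the gene names and coordinates are made up. A real preparation script would more likely rely on a dedicated tool such as `bedtools intersect` or `GenomicRanges::findOverlaps()` with an hg38 gene annotation.

```r
# Toy SEG-like table; column names follow the common SEG layout (assumed here)
seg_df <- data.frame(
  ID = c("sample_1", "sample_1", "sample_2"),
  chrom = c("chr1", "chr2", "chr1"),
  loc.start = c(100, 500, 900),
  loc.end = c(400, 800, 1200),
  seg.mean = c(0.6, -1.5, 0.1),
  stringsAsFactors = FALSE
)

# Made-up gene coordinates; a real table would come from an hg38 annotation
gene_df <- data.frame(
  gene_symbol = c("GENE_A", "GENE_B", "GENE_C"),
  chrom = c("chr1", "chr1", "chr2"),
  gene_start = c(150, 1000, 2000),
  gene_end = c(300, 1100, 2500),
  stringsAsFactors = FALSE
)

# Hypothetical helper: pair every segment with every gene on the same
# chromosome, then keep only pairs whose closed intervals overlap
annotate_segments <- function(seg_df, gene_df) {
  pairs <- merge(seg_df, gene_df, by = "chrom")
  overlaps <- pairs$loc.start <= pairs$gene_end &
    pairs$loc.end >= pairs$gene_start
  pairs[overlaps, c("ID", "gene_symbol", "seg.mean")]
}

annotated_df <- annotate_segments(seg_df, gene_df)
```

Unlike a join on the sample identifier, this assigns a gene to a segment only when the segment actually covers part of that gene. The all-pairs merge is fine for a toy table but would be slow on genome-scale data, which is one reason to prefer the dedicated tools mentioned above.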

Collaborator

@cbethell - I have some code here that creates the focal copy number calls for oncoprints from a seg file, if you want to take a look. Sorry it is a bit messy; it still contains old GISTIC code that I have not had time to clean up, but I posted it so you can see it. Note, as @jaclyn-taroni said, that these old data were hg19 and the PBTA data are hg38. The seg file used as input is here, and the output, the focal copy number lesions used for oncoprints, is here. Note that this was also done using array data, not NGS, so we may have to figure out cutoff thresholds for accurate assessment of focal CN based on LRR. I would advise starting with ATRTs and SMARCB1 deletions. We may not be able to distinguish homozygous from hemizygous deletions and may only be able to call deletions and amplifications; I am not sure, as I haven't dug into the data much. I ended up doing a lot of manual inspection for driver genes, hence the re-coding in the script. Maybe this should be its own issue?

Member

Thank you for sharing that @jharenza


# Remove NA
cnv_file <- cnv_df %>%
  dplyr::filter(!is.na(Variant_Classification))
}

# Read in fusion file
Member

If you're looking for another way to split things up, you could also break the fusion preparation into its own script (that's probably a separate, downstream PR). If the MAF file changes but the fusion prep and the CN prep do not, you could then rerun just the plotting part without having to manipulate the fusion and CN files again.

@jaclyn-taroni
Member

Although, this probably will be broader than just gene symbols.

@cbethell
Contributor Author

I am going to close this PR. The plan is to open a separate PR related specifically to the CN matrix described in @jharenza's comment here.
