Addition of cnv data to oncoprint #182

cbethell · 2019-10-29T20:14:54Z

Purpose/implementation

This PR adds the CNV data to the oncoprint produced in PR #176.

It builds on the 01-plot-oncoprint.R and run-oncoprint.sh scripts in the oncoprint-landscape module.

Issue

This PR addresses issue #6: producing an oncoprint that displays the landscape of genetic lesions across the PBTA.

Directions for reviewers

In this PR, I changed the removeNonMutated option from FALSE to TRUE so that the Tumor_Sample_Barcodes are readable on the plot. Should I add an optparse option to toggle this on and off, leave it as is, or set it back to FALSE?
I determined the thresholds for distinguishing amplifications, homozygous deletions, and hemizygous deletions using the CNVkit documentation here. However, the documented threshold was too stringent to display any amplifications in the selected genes, so I made the thresholds optparse options and used a cutoff of 0.2 instead of 0.5 to produce the current plot. Should I change this threshold? This question may be better answered once we receive the consensus calls, as mentioned on the previous PR, Addition of script to produce oncoprint #176.
This script is getting lengthy. Should I break it up into multiple scripts? If so, what would you recommend as the most logical way to divide it?
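To make the threshold question concrete, here is a minimal base R sketch of how segment-mean cutoffs could map log2 segment means to oncoprint CNV categories. It is an illustration under assumptions, not the logic in 01-plot-oncoprint.R: the PR does not spell out how `low_segmean_cutoff` and `high_segmean_cutoff` are applied, so the function name `classify_segmean`, its parameter names, the default cutoffs, and the labels `Amp`, `Hem_Del`, and `Hom_Del` are all hypothetical.

```r
# Hypothetical sketch: map log2 segment means (seg.mean) to oncoprint CNV
# categories. Cutoffs and labels are illustrative assumptions, not the
# logic used in 01-plot-oncoprint.R.
classify_segmean <- function(seg_mean, amp_cutoff = 0.2,
                             hem_del_cutoff = -0.2, hom_del_cutoff = -1) {
  stopifnot(hom_del_cutoff < hem_del_cutoff, hem_del_cutoff < amp_cutoff)
  # NA means "no call" (neutral or missing segment mean)
  calls <- rep(NA_character_, length(seg_mean))
  # which() drops NA segment means so they stay uncalled
  calls[which(seg_mean >= amp_cutoff)] <- "Amp"
  calls[which(seg_mean <= hem_del_cutoff)] <- "Hem_Del"
  # Deeper losses overwrite the hemizygous call
  calls[which(seg_mean <= hom_del_cutoff)] <- "Hom_Del"
  calls
}

seg_means <- c(0.8, 0.3, 0.1, -0.3, -1.2, NA)
classify_segmean(seg_means)
# Raising amp_cutoff to 0.5 drops the amplification call for the 0.3 segment
classify_segmean(seg_means, amp_cutoff = 0.5)
```

Exposing each cutoff as a function argument mirrors the choice to make the thresholds optparse options: the values can be revisited once the consensus calls are available without touching the classification code.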

Results

See the current plot below:

Docker and continuous integration

The dependencies required to run the code in this pull request have been added to the project Dockerfile.
This analysis has been added to continuous integration.

PR Checklist

Run a linter
Set the seed (not needed)
Comments and/or documentation up to date
Double check your paths
Spell check any Rmd file or md file
Restart R and run all notebooks fresh and save

- Added optparse options `low_segmean_cutoff` and `high_segmean_cutoff` (both needed to distinguish amplifications, homozygous deletions, and hemizygous deletions)
- Changed the color for `Nonsense_Mutation` from black to dark blue in `oncoplot-palette.R`
- Changed the `removeNonMutated` option in the `oncoplot` function from FALSE to TRUE to make the `Tumor_Sample_Barcode`s more readable on the plot


jaclyn-taroni

I'm not sure what's causing the out-of-memory issue in CI; I haven't gotten to the bottom of it yet. Having looked at this, I think the "annotating SEG files with gene symbols" step should probably be broken out into its own script.

jaclyn-taroni · 2019-10-29T22:55:26Z

analyses/oncoprint-landscape/01-plot-oncoprint.R

+  select_maf_df <- maf_df %>%
+    dplyr::select(Tumor_Sample_Barcode, Hugo_Symbol)
+
+  # Add gene information to the cnv dataframe
+  cnv_df <- cnv_file %>%
+    dplyr::inner_join(select_maf_df, by = c("ID" = "Tumor_Sample_Barcode")) %>%
+    dplyr::distinct()


If I understand this section correctly, you are joining the gene symbols from the MAF file to the CNV file by sample identifier, which pairs every mutated gene in a sample with every CN segment from that sample. I think what you want to do instead is add gene symbols to the SEG file using the genomic coordinates already in it (a quick search suggests bedtools may have functionality for this, and GenomicFeatures also comes to mind). This step should probably happen upstream of the plotting, in a script such as 01-prepare-cn-for-oncoplot.R; the plotting script (then 02-plot-oncoprint.R) would accept the file that already contains gene symbols. You could potentially apply the low_segmean_cutoff and high_segmean_cutoff in the preparation step as well. Note that hg38 was the genome build used: https://github.com/AlexsLemonade/OpenPBTA-manuscript/blob/master/content/03.methods.md#somatic-copy-number-variant-calling
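The coordinate-based annotation suggested here can be sketched in base R as a naive interval-overlap join. This is only an illustration under stated assumptions: the SEG column names (`ID`, `chrom`, `loc.start`, `loc.end`, `seg.mean`) follow the usual SEG layout, `annotate_segments` is a hypothetical helper, and the gene table and its coordinates are made-up toy values. For the real hg38 data, bedtools or Bioconductor's GenomicRanges/GenomicFeatures overlap functions would be the more scalable choice.

```r
# Hypothetical helper: attach gene symbols to SEG segments by genomic overlap.
# Assumes SEG columns ID, chrom, loc.start, loc.end, seg.mean and a gene
# table with gene_symbol, chrom, start, end (same genome build, e.g. hg38).
annotate_segments <- function(seg_df, gene_df) {
  hits <- lapply(seq_len(nrow(seg_df)), function(i) {
    seg <- seg_df[i, ]
    # A gene overlaps a segment if it is on the same chromosome and the
    # closed intervals intersect
    overlaps <- gene_df$chrom == seg$chrom &
      gene_df$start <= seg$loc.end &
      gene_df$end >= seg$loc.start
    if (!any(overlaps)) return(NULL)
    data.frame(ID = seg$ID,
               Hugo_Symbol = gene_df$gene_symbol[overlaps],
               seg.mean = seg$seg.mean,
               stringsAsFactors = FALSE)
  })
  # rbind() skips the NULL entries for segments without overlapping genes
  do.call(rbind, hits)
}

# Toy inputs (made-up samples, genes, and coordinates)
seg_df <- data.frame(ID = c("S1", "S1", "S2"),
                     chrom = c("chr1", "chr2", "chr1"),
                     loc.start = c(100, 500, 1000),
                     loc.end = c(400, 900, 2000),
                     seg.mean = c(-0.9, 0.6, 0.05),
                     stringsAsFactors = FALSE)
gene_df <- data.frame(gene_symbol = c("GENE_A", "GENE_B", "GENE_C"),
                      chrom = c("chr1", "chr1", "chr2"),
                      start = c(150, 1500, 950),
                      end = c(250, 1600, 1200),
                      stringsAsFactors = FALSE)

annotated <- annotate_segments(seg_df, gene_df)
print(annotated)
```

Unlike the sample-level join in the diff above, each segment here picks up only the genes it physically overlaps, so the gene-level calls can then be thresholded and written to a file for the plotting script.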

@cbethell - I have some code here in which I created focal copy number calls for oncoprints from a SEG file, if you want to take a look. Sorry it is a bit messy (it still has old GISTIC code; I did not have time to clean that up yet, but posted it so you can see it). Note that, as @jaclyn-taroni said, these old data were hg19 and the PBTA data are hg38. The input SEG file is here, and the output, the focal copy number lesions used for oncoprints, is here. Note also that this was done using array data, not NGS, so we may have to determine cutoffs for an accurate assessment of focal CN based on LRR. I would advise starting with ATRTs and SMARCB1 deletions. We may not be able to distinguish homozygous from hemizygous deletions and may only be able to call deletions and amplifications; I'm not sure, as I haven't dug into the data much. I ended up doing a lot of manual inspection for driver genes, hence the re-coding in the script. Maybe this should be its own issue?

Thank you for sharing that, @jharenza.

jaclyn-taroni · 2019-10-29T22:57:58Z

analyses/oncoprint-landscape/01-plot-oncoprint.R

+
+  # Remove NA
+  cnv_file <- cnv_df %>%
+    dplyr::filter(!is.na(Variant_Classification))
 }

 # Read in fusion file


If you're looking for another way to split things up, you could also break the fusion preparation into its own script (that's probably a separate, downstream PR). If the MAF file changes but the fusion and CN preparation do not, you could then rerun just the plotting step without reprocessing the fusion and CN files.
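One way to get the benefit described here is a Make-style freshness check: a preparation step reruns only when its output file is missing or older than its input. The following base R sketch is hypothetical; `needs_update` is not part of the oncoprint-landscape module, and temporary files stand in for the real CN input and prepared CN file.

```r
# Hypothetical helper: decide, Make-style, whether a preparation step must
# rerun by comparing file modification times.
needs_update <- function(input_file, output_file) {
  stopifnot(file.exists(input_file))
  # Rerun if the prepared output is missing or older than its input
  (!file.exists(output_file)) ||
    (file.mtime(input_file) > file.mtime(output_file))
}

# Temporary files standing in for the CN input and the prepared CN file
cn_input <- tempfile(fileext = ".seg")
cn_prepared <- tempfile(fileext = ".tsv")
writeLines("toy segment data", cn_input)

if (needs_update(cn_input, cn_prepared)) {
  # A real pipeline would run the CN preparation script here
  writeLines("toy prepared data", cn_prepared)
}

# The prepared file is now current, so a MAF-only change would only
# require rerunning the plotting step
needs_update(cn_input, cn_prepared)
```

A workflow tool would handle this bookkeeping more robustly; the point of the sketch is only that separate preparation outputs make selective reruns possible.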

jaclyn-taroni · 2019-10-30T12:34:49Z

That said, this will probably be broader than just gene symbols.

cbethell · 2019-10-30T14:12:54Z

I am going to close this PR. The plan is to open a separate PR specifically for the CN matrix described in @jharenza's comment here.

cbethell and others added 2 commits October 29, 2019 16:04

Merge branch 'master' into add-cnv-oncoprint

bfb40b7

jaclyn-taroni marked this pull request as ready for review October 29, 2019 20:54

jaclyn-taroni changed the title ~~In Progress: Addition of cnv data to oncoprint~~ Addition of cnv data to oncoprint Oct 29, 2019

Use lancet MAF file, it's smaller!

01ec988

Also use CNVkit symlink

jaclyn-taroni reviewed Oct 29, 2019


cbethell closed this Oct 30, 2019

jaclyn-taroni mentioned this pull request Oct 30, 2019

Proposed Analysis: map from SEG file to genes (and broader segments) #186

Closed

jharenza mentioned this pull request Nov 3, 2019

add cnv interpretation #216

Merged

jaclyn-taroni mentioned this pull request Nov 3, 2019

SMARCB1 deletions in ATRT with current SEG to gene mapping #217

Closed


cbethell deleted the add-cnv-oncoprint branch December 18, 2019 19:12

jaclyn-taroni mentioned this pull request Jan 23, 2020

Annotated CNVkit data no longer shows SMARCB1 deletions in ATRT #473

Closed

jaclyn-taroni mentioned this pull request Feb 3, 2020

Updated analysis: "more nuanced" copy number calls #502

Open
