- added optparse options `low_segmean_cutoff` and `high_segmean_cutoff` (both needed to distinguish amplifications, homozygous deletions, and hemizygous deletions)
- changed the color for `Nonsense_Mutation` from black to dark blue in `oncoplot-palette.R`
- changed the `removeNonMutated` option in the `oncoplot` function from `FALSE` to `TRUE` to make the `Tumor_Sample_Barcode`s more readable on the plot
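The description does not spell out how the two cutoffs map segment means onto categories, so what follows is only a minimal base-R sketch under stated assumptions. It assumes `seg.mean` is a log2 ratio, that segments below `low_segmean_cutoff` are homozygous deletions, and that segments above `high_segmean_cutoff` are amplifications. Telling hemizygous deletions apart from copy-neutral segments needs one more boundary, so the sketch adds a hypothetical `hemi_segmean_cutoff`. The function name, labels, and all default values are placeholders, not thresholds validated on the PBTA data.

```r
# Minimal sketch: bin SEG segment means into copy-number categories.
# ASSUMPTIONS (not from the PR): seg_mean is a log2 ratio, the category
# labels are illustrative, hemi_segmean_cutoff is a hypothetical extra
# boundary, and all default cutoff values are placeholders.
classify_segmean <- function(seg_mean,
                             low_segmean_cutoff = -1.5,
                             hemi_segmean_cutoff = -0.5,  # hypothetical
                             high_segmean_cutoff = 0.5) {
  stopifnot(low_segmean_cutoff < hemi_segmean_cutoff,
            hemi_segmean_cutoff < high_segmean_cutoff)
  status <- rep("Neutral", length(seg_mean))
  # Later assignments win, so the most extreme deletion bin is applied last
  status[which(seg_mean > high_segmean_cutoff)] <- "Amplification"
  status[which(seg_mean < hemi_segmean_cutoff)] <- "Hemizygous_Deletion"
  status[which(seg_mean < low_segmean_cutoff)] <- "Homozygous_Deletion"
  # Segments with a missing mean stay unclassified
  status[is.na(seg_mean)] <- NA_character_
  status
}

classify_segmean(c(-2, -1, 0, 1))
# returns "Homozygous_Deletion" "Hemizygous_Deletion" "Neutral" "Amplification"
```

A value that sits exactly on a boundary falls into the less extreme bin. For example, a segment exactly at `high_segmean_cutoff` stays `Neutral`.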
Also use CNVkit symlink
Not sure what's going on with the out-of-memory issue in CI; I have not gotten to the bottom of it yet. Having taken a look at this, I think the "annotating SEG files with gene symbols" step should probably be broken out into its own script.
select_maf_df <- maf_df %>%
  dplyr::select(Tumor_Sample_Barcode, Hugo_Symbol)

# Add gene information to the cnv dataframe
cnv_df <- cnv_file %>%
  dplyr::inner_join(select_maf_df, by = c("ID" = "Tumor_Sample_Barcode")) %>%
  dplyr::distinct()
If I understand this section correctly, you are joining the gene symbols from the MAF file to the CNV file using the sample identifier. I think what you want to do is add gene symbols to the SEG file using the genome coordinates that are already in the SEG file (a very brief Google search suggests `bedtools` might have functionality to accomplish this, and `GenomicFeatures` also comes to mind). This step should probably be upstream of the plotting, say in `01-prepare-cn-for-oncoplot.R`, so that this script (`02-plot-oncoprint.R`) then accepts a file that already contains gene symbols. You could potentially do the `low_segmean_cutoff` and `high_segmean_cutoff` steps in the preparation step as well. Note that hg38 was the build used: https://github.com/AlexsLemonade/OpenPBTA-manuscript/blob/master/content/03.methods.md#somatic-copy-number-variant-calling
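Annotating segments by coordinates, rather than by sample ID, amounts to an interval-overlap join: a gene is attached to a segment when both sit on the same chromosome and their ranges intersect. The following is a minimal base-R sketch of that idea, not the module's code. It assumes the standard SEG columns (`ID`, `chrom`, `loc.start`, `loc.end`, `seg.mean`) and a hypothetical hg38 gene table with columns `hugo_symbol`, `chrom`, `gene_start`, and `gene_end`. The function name and the toy coordinates and gene names are made up for illustration.

```r
# Minimal sketch: annotate SEG segments with the genes they overlap.
# ASSUMPTIONS: seg_df uses the standard SEG columns (ID, chrom, loc.start,
# loc.end, seg.mean); gene_df is a hypothetical hg38 gene table with
# columns hugo_symbol, chrom, gene_start, gene_end; both tables use the
# same chromosome naming, and intervals are treated as closed.
annotate_segments <- function(seg_df, gene_df) {
  # Pair each segment with every gene on the same chromosome
  pairs <- merge(seg_df, gene_df, by = "chrom")
  # Keep only pairs whose intervals overlap
  keep <- pairs$loc.start <= pairs$gene_end &
    pairs$gene_start <= pairs$loc.end
  pairs[keep, , drop = FALSE]
}

# Toy data with made-up coordinates and gene names
seg <- data.frame(ID = c("S1", "S1"), chrom = c("chr22", "chr1"),
                  loc.start = c(23700000, 1000),
                  loc.end = c(23900000, 2000),
                  seg.mean = c(-2, 0.1))
genes <- data.frame(hugo_symbol = c("GENE_A", "GENE_B"),
                    chrom = c("chr22", "chr22"),
                    gene_start = c(23750000, 30000000),
                    gene_end = c(23800000, 30100000))
annotate_segments(seg, genes)  # one row: S1 overlaps GENE_A only
```

Because `merge()` builds every segment-gene pair on each chromosome, memory grows with segments times genes. On genome-wide tables, an interval-based tool such as `bedtools intersect` or `GenomicRanges::findOverlaps()` is the better fit.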
@cbethell - I have some code in which I created the focal copy number lesions for oncoprints from a seg file here, if you want to take a look. Sorry it is a bit messy (it still has old GISTIC code; I did not have time to clean that up yet, but posted it so you can see it). Note, as @jaclyn-taroni said, that these old data were hg19 and the PBTA data are hg38. The seg file used as input is here, and the output focal copy number lesions used for oncoprints are here. Note also that this was done using array data, not NGS, so we may have to figure out cutoff thresholds for accurate assessment of focal CN based on LRR. I might advise starting with ATRTs and SMARCB1 deletions, and we may not be able to call homozygous vs. hemizygous deletions, but rather just deletions and amplifications (not sure; I haven't dug into the data much). I ended up doing a lot of manual inspection for driver genes, hence the re-coding in the script. Maybe this should be its own issue?
Thank you for sharing that, @jharenza!
# Remove NA
cnv_file <- cnv_df %>%
  dplyr::filter(!is.na(Variant_Classification))
}

# Read in fusion file
If you're looking for another way to split things up, you could also break the fusion preparation into its own script (that's probably a separate, downstream PR). Imagine a situation where the MAF file changes but the fusion and CN inputs do not: you could then rerun just the plotting step without having to reprocess the fusion and CN files.
Although this will probably be broader than just gene symbols.
Purpose/implementation
This PR adds the CNV data to the oncoprint produced in PR #176.
It builds on the `01-plot-oncoprint.R` and `run-oncoprint.sh` scripts in the `oncoprint-landscape` module.
Issue
This PR addresses issue #6 on producing an oncoprint that displays the landscape of genetic lesions across PBTA.
Directions for reviewers
I changed `removeNonMutated` from `FALSE` to `TRUE` so that the `Tumor_Sample_Barcode`s would be readable on the plot. Should I add an optparse option to toggle this on and off? Or should I leave it as is, or set it back to `FALSE`?
Results
See the current plot below:
Docker and continuous integration
PR Checklist