This analysis evaluates chromosomal instability by calculating chromosomal breakpoint densities and visualizing them with circular plots, using CNV and SV data.
This analysis can be run via the command line from the top directory of the repository as follows:
bash analyses/chromosome-instability/run_breakpoint_analysis.sh
00-setup-breakpoint-data.R
CNV and SV data are each transformed into single-breakpoint data, and their intersection is also computed. These three datasets (intersection, CNV, and SV) are used to calculate genome-wide breakpoint densities for each sample, which are saved to three _breaks_densities.tsv files. Note that this script is set up to handle WXS and WGS separately; however, our PBTA dataset currently only has CNV and SV data for WGS samples. Additionally, a breakpoint-data/breaks_lists.RDS file is saved, which contains three data.frames:
- intersection_of_breaks contains the intersection break counts for both SV and CNV break data.
- cnv_breaks contains the number of break counts for CNV.
- sv_breaks contains the number of break counts for SV.
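As a rough illustration of the per-sample density calculation, the Python sketch below counts breaks per sample and divides by a genome size in Mb. It is not the R code in 00-setup-breakpoint-data.R: the function name `breaks_density`, the tuple layout of the input, and the choice of denominator (total surveyed genome size) are assumptions made for this example only.

```python
from collections import Counter


def breaks_density(breaks, genome_size_bp):
    """Return {sample_id: breaks per Mb}.

    breaks: iterable of (sample_id, chrom, coord) tuples (hypothetical layout).
    genome_size_bp: size of the surveyed genome in base pairs (assumed denominator).
    """
    # Count how many breakpoints each sample has.
    counts = Counter(sample for sample, _chrom, _coord in breaks)
    size_mb = genome_size_bp / 1e6
    return {sample: n / size_mb for sample, n in counts.items()}


# Toy data: a 10 Mb "genome" with four breakpoints across two samples.
example_breaks = [
    ("sample_A", "chr1", 1_200_000),
    ("sample_A", "chr1", 5_600_000),
    ("sample_A", "chr2", 300_000),
    ("sample_B", "chr3", 9_000_000),
]
densities = breaks_density(example_breaks, genome_size_bp=10_000_000)
print(densities)
```

The same function could be applied separately to the intersection, CNV, and SV break sets to obtain three density tables.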
01-localization-of-breakpoints.Rmd uses the data in breaks_lists.RDS to co-localize and map breakpoints by bins across the genome. These binned breakpoint counts are calculated by sample as well as by histology group.
Bins are created with GenomicRanges::tileGenome using a 1 Mb window size. Genome bins in which more than a set percentage (default 75%) of the bin's total size is covered by uncallable regions are set to NA for all output statistics.
The output of this notebook is three _binned_breakpoint_counts.tsv files, one for each dataset, and an RDS file with the histology binned data: histology_breakpoint_densities.RDS.
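The binning and NA-masking logic can be sketched in Python as follows. This is a simplified illustration, not the notebook's implementation: it uses plain fixed-width bins per chromosome rather than reproducing the exact tiling of GenomicRanges::tileGenome, and the function name `bin_breakpoints`, its parameters, and the input layouts are all hypothetical.

```python
def bin_breakpoints(breaks, chrom_sizes, uncallable,
                    bin_size=1_000_000, max_uncallable_frac=0.75):
    """Count breakpoints per bin; mostly-uncallable bins become None (NA).

    breaks: list of (chrom, coord) with 0-based coordinates.
    chrom_sizes: {chrom: length_bp}.
    uncallable: {chrom: [(start, end), ...]}, half-open, non-overlapping.
    """
    result = {}
    for chrom, length in chrom_sizes.items():
        n_bins = -(-length // bin_size)  # ceiling division
        counts = [0] * n_bins
        for c, coord in breaks:
            if c == chrom and 0 <= coord < length:
                counts[coord // bin_size] += 1

        binned = []
        for i, count in enumerate(counts):
            start = i * bin_size
            end = min(start + bin_size, length)  # last bin may be shorter
            # Base pairs of this bin overlapped by uncallable intervals.
            covered = sum(
                max(0, min(end, u_end) - max(start, u_start))
                for u_start, u_end in uncallable.get(chrom, [])
            )
            frac = covered / (end - start)
            binned.append(None if frac > max_uncallable_frac else count)
        result[chrom] = binned
    return result


# Toy example: a 2.5 Mb chromosome whose last bin is 90% uncallable.
binned_counts = bin_breakpoints(
    breaks=[("chr1", 100_000), ("chr1", 900_000),
            ("chr1", 1_500_000), ("chr1", 2_400_000)],
    chrom_sizes={"chr1": 2_500_000},
    uncallable={"chr1": [(2_000_000, 2_450_000)]},
)
print(binned_counts)  # {'chr1': [2, 1, None]}
```

Masking with a missing value rather than zero keeps bins with poor coverage from being read as regions with no breakpoints.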
02a-plot-chr-instability-heatmaps.Rmd uses the _binned_breakpoint_counts.tsv datasets to create three heatmaps for the intersection, CNV, and SV data, respectively. NA regions are gray.
02b-plot-chr-instability-by-histology.Rmd uses the _breaks_densities.tsv files and histology_breakpoint_binned_counts.RDS to plot breakpoint densities by short_histology group.
For breakpoint analysis:
- make_granges: Given a data.frame with chr break coordinates, make a GenomicRanges object.
- break_density: Given data.frame(s) with chr break coordinates, calculate the density of the breaks.
- map_breaks_plot: Given a GenomicRanges object, map the chromosomal coordinates to a ggplot2 plot.
- multipanel_break_plot: Given a list of GenomicRanges objects, plot them in a combined cowplot.
- breaks_cdf_plot: Given a genome-wide breaks density file path, plot the CDF distribution for it by histology.
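To make the idea behind a per-histology CDF concrete, the Python sketch below computes empirical CDF points for breakpoint densities grouped by a label. It only illustrates the underlying calculation and is not the R function breaks_cdf_plot; the name `ecdf_by_group`, the input dictionaries, and the group labels are hypothetical.

```python
from collections import defaultdict


def ecdf_by_group(densities, groups):
    """Return {group: [(density, cumulative_fraction), ...]} sorted by density.

    densities: {sample_id: breaks per Mb}.
    groups: {sample_id: group label}, e.g. a histology label.
    """
    grouped = defaultdict(list)
    for sample, density in densities.items():
        grouped[groups[sample]].append(density)

    curves = {}
    for group, values in grouped.items():
        values.sort()
        n = len(values)
        # The i-th smallest value has cumulative fraction (i + 1) / n.
        curves[group] = [(v, (i + 1) / n) for i, v in enumerate(values)]
    return curves


# Toy densities for four samples in two hypothetical groups.
sample_densities = {"s1": 0.3, "s2": 0.1, "s3": 0.2, "s4": 0.5}
sample_groups = {"s1": "group_1", "s2": "group_1",
                 "s3": "group_1", "s4": "group_2"}
curves = ecdf_by_group(sample_densities, sample_groups)
for group, points in curves.items():
    print(group, points)
```

Each list of points can be drawn as a step curve, one per group, to compare how breakpoint densities are distributed across groups.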