-
Notifications
You must be signed in to change notification settings - Fork 24
/
README.Rmd
98 lines (74 loc) · 4.18 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
# MutationTimeR
MutationTimeR is an R package to time somatic mutations relative to clonal and subclonal copy number states and calculate
the relative timing of copy number gains. Time is measured as a fraction of point mutations; this is termed _mutation time_. Mutation time
is proportional to real time if the number of mutations acquired per bp and per year is constant.
MutationTimeR has been used by the PCAWG consortium to calculate mutation times of 2,778 whole genome sequencing samples.
Please see M. Gerstung, C. Jolly, I. Leshchiner, S. Dentro, S. Gonzalez _et al._, [The Evolutionary History of 2,658 Cancers](https://doi.org/10.1038/s41586-019-1907-7), _Nature_. *578*, pages 122-128(2020).
## Installation
MutationTimeR runs in most current `R` versions. You can install and load MutationTimeR using
```{R, eval=FALSE}
devtools::install_github("mg14/mg14")
devtools::install_github("gerstung-lab/MutationTimeR")
```
## Input data
MutationTimeR requires a `vcf` object containing point mutations (SNVs, MNVs, or indels) and a `GRanges` object with the copy number segments of the sample.
```{R}
library("MutationTimeR")
#vcf <- readVcf("myvcf.vcf") # Point mutations, needs `geno` entries `AD` and `DP` or `info` columns t_alt_count t_ref_count.
#bb <- GRanges(, major_cn= , minor_cn=, clonal_frequency=purity) # Copy number segments, needs columns major_cn, minor_cn and clonal_frequency of each segment
#clusters <- data.frame(cluster= , n_ssms=, proportion=) # Optional data.frame with subclonal cluster locations (VAF proportion) and size (number of variants n_ssms)
```
Here we cut this short by
```{R}
data(MutationTimeR)
```
The `vcf` object containing all point mutations is:
```{R}
vcf
```
The `GRanges` with allele-specific copy number data
```{R}
bb
```
Lastly, the prevalence and number of subclonal is
```{R}
clusters
```
## Running MutationTimeR
To run MutationTimeR simply use
```{R}
mt <- mutationTime(vcf, bb, clusters=clusters, n.boot=10)
```
## Annotation of point mutations
MutationTimer produces two main outputs. The first is the probability for each individual point mutation from the original `vcf` object to belong to different copy number levels:
```{R}
head(mt$V)
```
These probabilities are the basis for the following simple clonal states (`early clonal/late clonal/clonal/subclonal/NA`)
```{R}
table(mt$V$CLS)
```
To add this annotation to the vcf use
```{R}
vcf <- addMutTime(vcf, mt$V)
```
## Timings of copy number gains
The proportion of point mutations in different copy number states, in turn, defines the molecular timing of gains. These are stored in the following `DataFrame`:
```{R}
head(mt$T)
```
The relevant columns are `time` with 95% confidence intervals `time.lo` and `time.hi` as well as the counterparts `time2/time2.lo/time2.hi` for the second gain in cases where one allele has more than one gained copy.
The field `time.star` indicates a tiers: `***` indicates gains +1. `**` gains +2, which are found to be slightly less reliable and need certain assumptions about their temporal sequence. `*` are subclonal gains which are hit an miss.
The DataFrame can be added to the copy number `GRanges` object for convenience.
```{R}
mcols(bb) <- cbind(mcols(bb),mt$T)
```
## Plot output
Timing annotated VCF and copy number can be plotted using the following command:
```{R, fig.width=6, fig.height=6, fig.width=6}
plotSample(vcf,bb)
```
This shows the observed and expected variant allele frequencies of point mutations on the top. This is very useful to spot inconsistencies with purity and copy number configuration. As a rule of thumb the states (horizontal bars) should run
right through the middle of the clouds of point mutations. Colours indicate the timing category: Blue = clonal [other], purple = clonal [late], green = clonal [early], red = subclonal.
The middle plot shows the copy number as stacked barplots. Subclonal CN is indicated by fractional bars. Dark grey is major, light grey minor allele.
The bottom plot shows the estimated mutation time of primary and secondary gains (shaded). Boxes denote 95% CIs. The histogram at the right shows the distribution of timing events. Blue = mono-allelic gains (N:1), pink = CN-LOH/gain+loss (N:0) and green = bi-allelic gains (N:2).