forked from evenrus/mmsig
Home
Patrick Blaney edited this page Mar 4, 2022 · 5 revisions
Welcome to the mmsig wiki!
Below is a description of the process used to generate the updated COSMIC v3.1 and v3.2 mutational signature reference data files for use with mmsig.
The process involves basic R and bash commands.
The COSMIC mutational signature reference data files are available from the COSMIC Mutational Signatures website. This demo uses COSMIC SBS v3.2 (the most current release as of 2021).
R packages needed:
- devtools
- mmsig
- readr
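First, in R, export the signature_ref data bundled with mmsig to a tab-delimited text file:
# Install mmsig from GitHub (requires devtools), then load its bundled signature_ref data
devtools::install_github(repo = "pblaney/mmsig")
library(mmsig)
data(signature_ref)
# Note: this writes the file to your Downloads folder; change the path if needed
write.table(x = signature_ref, file = "~/Downloads/signature_ref.txt", sep = "\t", col.names = TRUE, row.names = FALSE, quote = FALSE)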
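The export can be sanity-checked from the command line. An SBS reference table has one row per trinucleotide substitution class, 96 in all, so signature_ref.txt should have 96 data rows below its header. This is a minimal sketch; `count_data_rows` is a hypothetical helper name, not part of mmsig.

```shell
# Hypothetical helper: count the data rows of a tab-delimited table,
# i.e. every line except the header on line 1.
# tr strips the leading spaces that some wc implementations print.
count_data_rows() {
  tail -n +2 "$1" | wc -l | tr -d ' '
}

# Usage (run in the directory holding the exported file):
# count_data_rows signature_ref.txt                 # should print 96
# count_data_rows COSMIC_v3.2_SBS_GRCh38.txt        # same check once downloaded
```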
Next, from the command line, in the same directory as the signature_ref.txt file from the previous step:
- Download the COSMIC mutational signature reference data file
wget https://cancer.sanger.ac.uk/signatures/documents/453/COSMIC_v3.2_SBS_GRCh38.txt
- Create a new file containing the header of the original COSMIC file
head -n 1 COSMIC_v3.2_SBS_GRCh38.txt > COSMIC_v3.2_SBS_GRCh38.sorted.txt
- Build the rest of the new sorted COSMIC file using the order of substitution classes set in signature_ref.txt. The package's functions rely on this order to work as intended, so this step is the most crucial.
# For each class (skipping the header), append the matching COSMIC row; grep -F matches the brackets literally, so no sed escaping is needed
for class in $(grep -v 'class' signature_ref.txt | cut -f 1); do grep -F "${class}" COSMIC_v3.2_SBS_GRCh38.txt; done >> COSMIC_v3.2_SBS_GRCh38.sorted.txt
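The loop above calls grep once per class. As an alternative technique, awk can do the same reordering in a single pass: it first reads the class order from signature_ref.txt, then prints the COSMIC rows in that order. This is a sketch under an assumption: column 1 of both files must use identical class notation (e.g. A[C>A]A), because it matches whole fields exactly rather than substrings as grep does. The helper name `reorder_by_reference` is hypothetical.

```shell
# Hypothetical helper: print the data rows of a COSMIC table (arg 2) in the
# row order of a reference table (arg 1). Assumes both files are tab-delimited
# and that column 1 holds identical class labels in both (exact match).
reorder_by_reference() {
  awk -F '\t' '
    # First file (reference): record the class order, skipping the header
    NR == FNR { if (FNR > 1) order[++n] = $1; next }
    # Second file (COSMIC): index data rows by class, skipping the header
    FNR > 1 { row[$1] = $0 }
    # After both files are read, print the rows in reference order
    END { for (i = 1; i <= n; i++) if (order[i] in row) print row[order[i]] }
  ' "$1" "$2"
}

# Usage, replacing the loop (the header is still written with head first):
# reorder_by_reference signature_ref.txt COSMIC_v3.2_SBS_GRCh38.txt >> COSMIC_v3.2_SBS_GRCh38.sorted.txt
```

Classes missing from the COSMIC file are skipped silently, as they are in the grep loop, so the row-count check is still worth running afterwards.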
- Stitch together a full, updated COSMIC mutational signature reference data file from the first 3 columns of signature_ref.txt, all the SBS columns of the new sorted COSMIC file, and the SBS-MM1 column of signature_ref.txt
paste <(cut -f 1-3 signature_ref.txt) <(cut -f 2-79 COSMIC_v3.2_SBS_GRCh38.sorted.txt) <(cut -f 13 signature_ref.txt) > COSMIC_v3.2_SBS_plus_MM1_GRCh38.sorted.txt
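Because paste joins lines purely by position, the rows of signature_ref.txt and the sorted COSMIC file must be in the same order, and this can be confirmed before running the paste command. A minimal sketch, assuming both files carry their class labels in column 1 with identical notation; `check_row_alignment` is a hypothetical helper name.

```shell
# Hypothetical helper: compare column 1 of two tab-delimited files row by row,
# skipping the header line. Prints any mismatches and exits non-zero if the
# files are not aligned (different labels or different line counts).
check_row_alignment() {
  awk -F '\t' '
    NR == FNR { ref[FNR] = $1; n = FNR; next }
    { m = FNR }
    FNR > 1 && ref[FNR] != $1 {
      printf "row %d differs: %s vs %s\n", FNR, ref[FNR], $1
      status = 1
    }
    END {
      if (m != n) {
        print "line counts differ: " n " vs " m
        status = 1
      }
      exit status
    }
  ' "$1" "$2"
}

# Usage, before the paste step:
# check_row_alignment signature_ref.txt COSMIC_v3.2_SBS_GRCh38.sorted.txt && echo "rows aligned"
```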
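- Last, since mmsig only expects SBS1, SBS2, SBS5, SBS8, SBS9, SBS13, SBS18, SBS35, SBS84 and SBS-MM1, plus the first 3 columns that identify the substitution class, create the final signature reference file that will be converted to an R data file. The field numbers below are positions in the 82-column file from the previous step (3 class columns, 78 COSMIC SBS columns, and SBS-MM1)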
cut -f 1-5,8,14,15,22,28,45,71,82 COSMIC_v3.2_SBS_plus_MM1_GRCh38.sorted.txt > signature_ref_cosmic_v3_2_hg38.txt
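The field list passed to cut is a set of hard-coded column positions, so printing the header names of the resulting file is a cheap way to confirm that the intended signatures were selected. A sketch; `list_columns` is a hypothetical helper name.

```shell
# Hypothetical helper: print the header name of every column in a
# tab-delimited file, one per line, prefixed with its field number
list_columns() {
  head -n 1 "$1" | tr '\t' '\n' | awk '{ print NR ": " $0 }'
}

# Usage: expect the 3 class columns followed by SBS1, SBS2, SBS5, SBS8, SBS9,
# SBS13, SBS18, SBS35, SBS84 and the SBS-MM1 column
# list_columns signature_ref_cosmic_v3_2_hg38.txt
```

Running the same helper on COSMIC_v3.2_SBS_plus_MM1_GRCh38.sorted.txt shows which signature sits at each field number, which helps when adapting the cut command to another COSMIC release.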
Finally, using R/RStudio, read in the new signature_ref_cosmic_v3_2_hg38.txt file from the previous step and save it as an .rda file for easy use in the mmsig package:
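# Avoid scientific notation when printing the signature probabilities
options(scipen = 999)
library(readr)
# Read the tab-delimited reference file written in the previous step
signature_ref_cosmic_v3_2_hg38 <- read_delim(file = "~/Downloads/signature_ref_cosmic_v3_2_hg38.txt", delim = "\t", col_names = TRUE)
# Convert the tibble returned by readr to a plain data.frame
signature_ref_cosmic_v3_2_hg38 <- as.data.frame(signature_ref_cosmic_v3_2_hg38)
save(signature_ref_cosmic_v3_2_hg38, file = "~/Downloads/signature_ref_cosmic_v3_2_hg38.rda")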