Patrick Blaney edited this page Mar 4, 2022 · 5 revisions

Welcome to the mmsig wiki!

Updated COSMIC Mutational Signature Reference Data

Below is a description of the process used to generate the updated COSMIC v3.1 and v3.2 mutational signature reference data files for use with mmsig.

The process involves basic R and bash commands.

The COSMIC mutational signature reference data files are available from the COSMIC website (https://cancer.sanger.ac.uk/signatures/). For this demo, COSMIC SBS v3.2 (the most current version as of 2021) will be used.

R Packages needed:

  • devtools
  • mmsig
  • readr

First, using R/RStudio, obtain the original signature reference data from the mmsig package:

devtools::install_github(repo = "pblaney/mmsig")

library(mmsig)

data(signature_ref)

# Note: this writes the file to your Downloads folder; change the path if needed
write.table(x = signature_ref, file = "~/Downloads/signature_ref.txt", sep = "\t", col.names = TRUE, row.names = FALSE, quote = FALSE)
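Before moving on, it can be worth confirming from the command line that the export looks as expected. The following is a minimal sketch only: `summarize_table` is a hypothetical helper, not part of mmsig, and the demo runs on a made-up two-row file whose column names are assumptions rather than the real export.

```shell
# Hypothetical helper (not part of mmsig): print the number of data rows and
# tab-separated columns in a file that has a single header line.
summarize_table() {
  rows=$(tail -n +2 "$1" | wc -l | tr -d ' ')
  cols=$(head -n 1 "$1" | awk -F '\t' '{ print NF }')
  printf '%s data rows, %s columns\n' "$rows" "$cols"
}

# Toy stand-in for signature_ref.txt (made-up values and assumed column names)
demo_dir=$(mktemp -d)
printf 'class\tsub\ttri\tSBS1\nA[C>A]A\tC>A\tACA\t0.1\nA[C>A]C\tC>A\tACC\t0.2\n' > "$demo_dir/toy_ref.txt"

summarize_table "$demo_dir/toy_ref.txt"
```

Run against the real `signature_ref.txt`, the row count should come out at 96, one per trinucleotide substitution class.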

Next, using the command line in the same directory as the signature_ref.txt file from the previous step:

  1. Download the COSMIC mutational signature reference data file.
wget https://cancer.sanger.ac.uk/signatures/documents/453/COSMIC_v3.2_SBS_GRCh38.txt
  2. Create a new file containing the header of the original COSMIC file.
head -n 1 COSMIC_v3.2_SBS_GRCh38.txt > COSMIC_v3.2_SBS_GRCh38.sorted.txt
  3. Build the rest of the new sorted COSMIC file, following the order of the signature classes in signature_ref.txt. The package's functions depend on this order to work as intended, which makes this the most crucial step. The loop skips the header line and matches each class as a fixed string, so the square brackets need no escaping.
tail -n +2 signature_ref.txt | cut -f 1 | while IFS= read -r class; do grep -F -- "${class}" COSMIC_v3.2_SBS_GRCh38.txt; done >> COSMIC_v3.2_SBS_GRCh38.sorted.txt
  4. Stitch together a full, updated COSMIC mutational signature reference data file from the first 3 columns of signature_ref.txt, all the SBS columns in the new sorted COSMIC file, and the SBS-MM1 column from signature_ref.txt.
paste <(cut -f 1-3 signature_ref.txt) <(cut -f 2-79 COSMIC_v3.2_SBS_GRCh38.sorted.txt) <(cut -f 13 signature_ref.txt) > COSMIC_v3.2_SBS_plus_MM1_GRCh38.sorted.txt
  5. Last, since mmsig only expects SBS1, SBS2, SBS5, SBS8, SBS9, SBS13, SBS18, SBS35, SBS84, SBS-MM1, and the first 3 columns that identify the substitution class, create the final signature reference file that will be converted to an R data file.
cut -f 1-5,8,14,15,22,28,45,71,82 COSMIC_v3.2_SBS_plus_MM1_GRCh38.sorted.txt > signature_ref_cosmic_v3_2_hg38.txt
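
Because the row order and the positional column indices are easy to get wrong, a couple of quick checks may help. The sketch below is illustrative only: `same_row_order` and `select_columns_by_name` are hypothetical helpers, not part of mmsig, and the demo uses small made-up files whose header names (`class`, `Type`, and so on) are assumptions about the real files.

```shell
# Hypothetical helpers for checking the sorting and column-selection steps.

# Succeed only if both files list the same keys, in the same order, in their
# first column (the header line of each file is skipped).
same_row_order() {
  ref_keys=$(tail -n +2 "$1" | cut -f 1)
  new_keys=$(tail -n +2 "$2" | cut -f 1)
  [ "$ref_keys" = "$new_keys" ]
}

# Print only the columns whose header name appears in the comma-separated
# list given as the second argument, keeping the file's own column order.
select_columns_by_name() {
  awk -F '\t' -v OFS='\t' -v wanted="$2" '
    NR == 1 {
      n = split(wanted, names, ",")
      for (i = 1; i <= n; i++) keep[names[i]] = 1
      for (c = 1; c <= NF; c++) if (($c) in keep) cols[++k] = c
    }
    {
      for (j = 1; j <= k; j++) printf "%s%s", $(cols[j]), (j < k ? OFS : ORS)
    }
  ' "$1"
}

# Toy data (made-up values) standing in for the real files
demo_dir=$(mktemp -d)
printf 'class\tsub\ttri\tSBS-MM1\nA[C>A]C\tC>A\tACC\t0.4\nA[C>A]A\tC>A\tACA\t0.3\n' > "$demo_dir/ref.txt"
printf 'Type\tSBS1\tSBS2\tSBS3\nA[C>A]C\t0.1\t0.2\t0.5\nA[C>A]A\t0.6\t0.7\t0.8\n' > "$demo_dir/sorted.txt"

if same_row_order "$demo_dir/ref.txt" "$demo_dir/sorted.txt"; then
  echo "row order matches"
else
  echo "row order differs"
fi

select_columns_by_name "$demo_dir/sorted.txt" "Type,SBS1,SBS3"
```

On the real files, `same_row_order signature_ref.txt COSMIC_v3.2_SBS_GRCh38.sorted.txt` should succeed once the sorted file has been built. Selecting the expected signature columns by name from COSMIC_v3.2_SBS_plus_MM1_GRCh38.sorted.txt and comparing the result with signature_ref_cosmic_v3_2_hg38.txt is one way to cross-check the indices passed to `cut`.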

Finally, using R/RStudio, read in the new signature_ref_cosmic_v3_2_hg38.txt file from the previous step and create an .rda file for easy use in the mmsig package:

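# Avoid scientific notation when numeric values are printed
options(scipen = 999)

library(readr)

# Read the tab-delimited reference file created on the command line
signature_ref_cosmic_v3_2_hg38 <- read_delim(file = "~/Downloads/signature_ref_cosmic_v3_2_hg38.txt", delim = "\t", col_names = TRUE)

# Convert the tibble returned by read_delim to a plain data frame
signature_ref_cosmic_v3_2_hg38 <- as.data.frame(signature_ref_cosmic_v3_2_hg38)

# Save the data frame as an .rda file
save(signature_ref_cosmic_v3_2_hg38, file = "~/Downloads/signature_ref_cosmic_v3_2_hg38.rda")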