This repository has been archived by the owner on Jun 21, 2023. It is now read-only.
There are a number of ways to define or identify recurrent mutations. The purpose of this issue is to discuss how to define a "recurrent mutation" throughout the project, with some acknowledgment that the answer might be "it depends."
My goal is to document some of the things that I've been thinking about or reading recently (which is almost certainly not a complete look at all available literature) to get the discussion started.
Here are a few examples of analyses that use or may use the concept of a recurrent mutation:
oncoprint-landscape - This code has the ability to accept a list of genes of interest and we'll probably want to generate lists of genes of interest that are comprised of recurrently altered genes to make the OncoPrint plots.
All this to say - is a recurrent mutation a specific alteration, e.g., H3F3A K28M, or is it any mutation in a gene given some constraints (e.g., drop synonymous mutations)?
I think the interaction-plots and recurrent-VUS are good examples of why the answer may depend on the specific analysis, but it would be good to get some discussion around this going.
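To make the two candidate definitions concrete, here is a minimal sketch that counts recurrence both ways: per gene (any non-synonymous mutation) and per specific alteration (gene plus amino acid change). The record layout, the variant class labels, and the `min_samples` cutoff are all assumptions for illustration and are not taken from the project's files or scripts.

```python
from collections import defaultdict

# Hypothetical records: (sample_id, gene, protein_change, variant_class).
# The fields and values are illustrative only.
mutations = [
    ("S1", "H3F3A", "K28M", "Missense_Mutation"),
    ("S2", "H3F3A", "K28M", "Missense_Mutation"),
    ("S3", "H3F3A", "G35R", "Missense_Mutation"),
    ("S4", "TP53", "R175H", "Missense_Mutation"),
    ("S4", "TP53", "R248Q", "Missense_Mutation"),
    ("S5", "TP53", "P72P", "Silent"),
]


def recurrence_counts(records, drop_classes=("Silent",)):
    """Count distinct samples per gene and per (gene, protein change)."""
    by_gene = defaultdict(set)
    by_alteration = defaultdict(set)
    for sample, gene, change, variant_class in records:
        if variant_class in drop_classes:
            continue  # e.g., drop synonymous mutations
        by_gene[gene].add(sample)
        by_alteration[(gene, change)].add(sample)
    gene_counts = {g: len(s) for g, s in by_gene.items()}
    alteration_counts = {a: len(s) for a, s in by_alteration.items()}
    return gene_counts, alteration_counts


def recurrent(counts, min_samples=2):
    """Keep entries seen in at least `min_samples` distinct samples."""
    return {k: n for k, n in counts.items() if n >= min_samples}


gene_counts, alteration_counts = recurrence_counts(mutations)
print(recurrent(gene_counts))        # gene-level definition
print(recurrent(alteration_counts))  # alteration-level definition
```

Counting distinct samples rather than rows keeps a sample with two mutations in the same gene from inflating the gene-level count. Which definition is appropriate would still depend on the analysis.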
Significantly mutated genes
Beyond recurrent mutations, there is also the question of whether or not a gene is "significantly mutated" and what method could be used to make that determination. Here, I'll link to relevant literature and software/code.
By analysing the enrichment [12, 13] of somatic alterations within each histotype or the pan-cancer cohort (see Methods), we identified 142 significantly mutated driver genes (Fig. 2a, Supplementary Table 2, Extended Data Fig. 3a).
Where the methods state
We discovered 142 candidate driver genes by this approach (Supplementary Table 2). Of these, 133 were significant by GRIN analysis (87 genes common to both GRIN and MutSigCV) and nine were significant only by MutSigCV.
MuSiC identified 77 significantly mutated genes (SMGs), which were ranked according to their pan-cancer mutation frequency [24] (Fig. 4, Supplementary Tables 9, 10). Most SMGs were mutually exclusively mutated across cancer types, demonstrating specificity of single putative driver genes in childhood cancers as compared to more frequent co-mutation in adult cancers in the TCGA study [7] (Extended Data Fig. 4c–e).
And from the methods:
Significantly mutated genes based on somatic SNVs and indels were identified with the SMG module of the MuSiC tools suite [24] separately from all cancer types and from the pan-cancer cohort, and then merged.
This kind of significance analysis often produces false positive hits (for example, very large genes), despite normalization procedures, and thus several filters were applied to the raw output [30].
This study is biased towards central nervous system tumours, and is complemented by an additional study of a non-overlapping paediatric cohort with mainly leukaemias and extracranial solid tumours [9].
A comparison to their results seems like a good thing to do as part of this project. Here's a link from that paper: http://www.pedpancan.com/ which mentions PedcBioPortal when you follow it!
oncodrive is based on the algorithm oncodriveCLUST, which was originally implemented in Python. The concept is based on the fact that most of the variants in cancer-causing genes are enriched at a few specific loci (aka hot-spots). This method takes advantage of such positions to identify cancer genes.
There's now a new version called OncodriveCLUSTL available via pip (publication, bitbucket)
The method does not assume that the baseline mutation probability is homogeneous across all gene positions but it creates a background model using silent mutations. Coding silent mutations are supposed to be under no positive selection and may reflect the baseline clustering of somatic mutations. Given recent evidence of non-random mutation processes along the genome, the assumption of homogeneous mutation probabilities is likely an oversimplification introducing bias in the detection of meaningful events.
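As a rough illustration of the idea of using silent mutations as a clustering baseline, the toy sketch below compares how concentrated non-silent mutations are at repeated positions against the same measure for silent mutations. This is not the OncodriveCLUST scoring function. The `clustered_fraction` helper, the `min_hits` threshold, and the positions are all hypothetical.

```python
from collections import Counter


def clustered_fraction(positions, min_hits=2):
    """Fraction of mutations that fall at positions hit at least `min_hits` times."""
    if not positions:
        return 0.0
    counts = Counter(positions)
    clustered = sum(n for n in counts.values() if n >= min_hits)
    return clustered / len(positions)


# Hypothetical amino acid positions of mutations observed in one gene.
nonsilent_positions = [28, 28, 28, 28, 35, 35, 101, 120]
silent_positions = [12, 47, 47, 88, 93, 110, 131, 140]

nonsilent_score = clustered_fraction(nonsilent_positions)
silent_score = clustered_fraction(silent_positions)

# A positive excess suggests more clustering than the silent baseline.
excess = nonsilent_score - silent_score
print(nonsilent_score, silent_score, excess)
```

A real analysis would also need a significance estimate for the excess, which this sketch does not attempt.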
Candidate-driver-mutation identification methods and combination of results
We obtained results (P values) from 13 methods of driver discovery, including ActiveDriverWGS [54], CompositeDriver, DriverPower [55], dndscv [46], ExInAtor [56], LARVA [57], MutSig tools [3], NBR [10], ncdDetect [58], ncDriver [59], OncodriveFML [60] and regDriver [61]. We integrated the results of all these methods using a custom framework based on a previously published method [62] for combining P values. Results from individual methods that showed large deviations from the expected uniform null distribution of P values were excluded. This approach was evaluated on real and simulated data.
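For a sense of what combining P values looks like in practice, here is a minimal sketch of Fisher's method, the simplest such approach. It is swapped in for illustration and is not the framework used in the quoted paper. Fisher's method assumes the tests are independent, which results from several methods run on the same data are unlikely to satisfy. The function name is hypothetical.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent P values with Fisher's method.

    The statistic X^2 = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null, where k = len(p_values).
    For even degrees of freedom the survival function has a closed form:
    exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    if not p_values:
        raise ValueError("need at least one P value")
    if any(p <= 0.0 or p > 1.0 for p in p_values):
        raise ValueError("P values must lie in (0, 1]")
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # X^2 / 2
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i  # (x/2)^i / i!
        total += term
    return math.exp(-half_stat) * total


# Hypothetical per-method P values for a single gene.
print(fisher_combined_p([0.04, 0.10, 0.20]))
```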
There are a number of ways to define or identify recurrent mutations. The purpose of this issue is to discuss how to define a "recurrent mutation" throughout the project, with some acknowledgment that the answer might be "it depends."
My goal is to document some of the things that I've been thinking about or reading recently (which is almost certainly not a complete look at all available literature) to get the discussion started.
Here are a few examples of analyses that use or may use the concept of a recurrent mutation:
interaction-plots - where mutations are processed in the following ways by default (see analyses/interaction-plots/scripts/02-process_mutations.R): remove synonymous mutations, remove non-transcribed mutations, remove non-coding mutations
recurrent-VUS - from the draft pull request Jashapiro/recurrent vus #362 - looks like this includes a specific amino acid change (https://github.com/AlexsLemonade/OpenPBTA-analysis/pull/362/files#diff-2058996fcf1edc695eff61268de983cfR101)
oncoprint-landscape - This code has the ability to accept a list of genes of interest and we'll probably want to generate lists of genes of interest that are comprised of recurrently altered genes to make the OncoPrint plots.
All this to say - is a recurrent mutation a specific alteration, e.g., H3F3A K28M, or is it any mutation in a gene given some constraints (e.g., drop synonymous mutations)?
I think the interaction-plots and recurrent-VUS are good examples of why the answer may depend on the specific analysis, but it would be good to get some discussion around this going.
Significantly mutated genes
Beyond recurrent mutations, there is also the question of whether or not a gene is "significantly mutated" and what method could be used to make that determination. Here, I'll link to relevant literature and software/code.
From Ma et al. Nature. 2018:
Where the methods state
The GRIN R package is available here: https://www.stjuderesearch.org/site/depts/biostats/grin
MutSigCV v1 is available as a GenePattern module: https://www.genepattern.org/modules/docs/MutSigCV
Note I happened upon some R code that implements the MutSig1.0 statistic: https://github.com/lixiangchun/lxctk/blob/ea74021f49393c65993b28f6a11a4c5cccbf66ae/R/mutsig.gene.R#L102
And Maftools seems like it has some functionality to use the output of MutSigCV based on my skimming of Mayakonda et al. Genome Research. 2018.
From Gröbner et al. Nature. 2018:
And from the methods:
MuSiC2 is available on GitHub: https://github.com/ding-lab/MuSiC2
Some of the tests proposed by the MuSiC paper (Dees et al. Genome Research. 2012.), namely the Fisher's combined p-value test and likelihood ratio test, are implemented in the same function I linked to above: https://github.com/lixiangchun/lxctk/blob/ea74021f49393c65993b28f6a11a4c5cccbf66ae/R/mutsig.gene.R, where the method labeled PCT is from Kan et al. Nature. 2010. per the documentation.
Comparison to other literature
The Gröbner et al. Nature. 2018 cohort is enriched for CNS tumors
A comparison to their results seems like a good thing to do as part of this project. Here's a link from that paper: http://www.pedpancan.com/ which mentions PedcBioPortal when you follow it!