Statistical test in snpsettest
For set-based association tests, the snpsettest package employs the statistical model described in VEGAS (versatile gene-based association study) [1], which takes as input variant-level p values and reference linkage disequilibrium (LD) data. Briefly, the test statistic is defined as the sum of squared variant-level Z-statistics. Letting $z = (z_1, \ldots, z_p)^T$ be the vector of $z$ scores of the $p$ individual SNPs within a set, the test statistic $Q$ is

$$Q = z^T z = \sum_{i=1}^{p} z_i^2$$
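Because the inputs are variant-level p values, computing $Q$ first requires turning each p value back into a $z$ score. The sketch below assumes each p value came from a two-sided $z$ test, so $|z_i| = \Phi^{-1}(1 - p_i/2)$; the function names are hypothetical and this is not the snpsettest API. Only $|z_i|$ is recovered, which is enough because $Q$ uses $z_i^2$.

```python
from statistics import NormalDist


def z_from_p(p: float) -> float:
    """|z| for a two-sided p value in (0, 1] (assumption: two-sided z test)."""
    return NormalDist().inv_cdf(1.0 - p / 2.0)


def set_statistic(p_values) -> float:
    """Q = sum of squared SNP-level z scores within one SNP set."""
    # The sign of z_i is lost here, but Q only depends on z_i^2.
    return sum(z_from_p(p) ** 2 for p in p_values)


if __name__ == "__main__":
    print(set_statistic([0.05, 0.01, 0.2]))
```

A single SNP with $p = 0.05$ gives $Q \approx 3.84$, the familiar 95% quantile of $\chi^2_1$.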
Here, $z$ follows a multivariate normal distribution with mean vector $\mu$ and covariance matrix $\Sigma$, where $\Sigma$ represents the LD among the SNPs. To test a set-level association, we need to evaluate the distribution of $Q$. VEGAS approximates the distribution of $Q$ with Monte Carlo simulations, drawing $z$ directly from the multivariate normal distribution, and thereby computes a set-level p value. However, its use is hampered in practice when set-based p values are very small: with $N$ simulations, a Monte Carlo estimate cannot resolve p values much below $1/N$, so the number of simulations required becomes very large. The snpsettest package uses a different approach to evaluate the distribution of $Q$ more efficiently.
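To make the limitation concrete, here is a minimal sketch of the Monte Carlo idea described above: draw $z \sim N(0, \Sigma)$ through a Cholesky factor, compute $Q$ for each draw, and count exceedances. The function name, the default `n_sim`, and the add-one correction are illustrative choices, not VEGAS's exact procedure; `ld` is assumed positive definite.

```python
import numpy as np


def vegas_mc_pvalue(q_obs: float, ld: np.ndarray, n_sim: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo null p value for Q = z^T z with z ~ MVN(0, ld) (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    p = ld.shape[0]
    # Cholesky factor L with ld = L @ L.T (requires ld positive definite)
    chol = np.linalg.cholesky(ld)
    # Each row is one null draw z = L e with e ~ N(0, I_p)
    z = rng.standard_normal((n_sim, p)) @ chol.T
    q_null = np.sum(z ** 2, axis=1)
    # Add-one correction: the smallest reportable p value is 1 / (n_sim + 1)
    return (np.sum(q_null >= q_obs) + 1) / (n_sim + 1)
```

The floor of $1/(N+1)$ is exactly why very small set-level p values demand an impractical number of draws.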
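Let $y = \Sigma^{-1/2} z$ (instead of $\Sigma^{-1/2}$, we could use any decomposition that satisfies $\Sigma = AA^T$ with a non-singular matrix $A$, setting $y = A^{-1} z$). Then,

$$E(y) = \Sigma^{-1/2}\mu, \qquad \mathrm{Var}(y) = \Sigma^{-1/2}\,\Sigma\,\Sigma^{-1/2} = I_p.$$

Now, we posit $u = y - \Sigma^{-1/2}\mu$ so that

$$E(u) = 0, \qquad \mathrm{Var}(u) = I_p,$$

and express the test statistic $Q$ as a quadratic form:

$$Q = z^T z = y^T \Sigma\, y = \left(u + \Sigma^{-1/2}\mu\right)^T \Sigma \left(u + \Sigma^{-1/2}\mu\right).$$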
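A quick numerical check of the two facts used above, $\mathrm{Var}(y) = I_p$ and $z^T z = y^T \Sigma y$, with $\Sigma^{-1/2}$ built from a symmetric eigendecomposition. The 3-SNP LD matrix and $z$ scores are made-up values for illustration only.

```python
import numpy as np


def inv_sqrt(sigma: np.ndarray) -> np.ndarray:
    """Symmetric inverse square root Sigma^{-1/2}; Sigma assumed positive definite."""
    w, v = np.linalg.eigh(sigma)        # sigma = v @ diag(w) @ v.T
    return (v / np.sqrt(w)) @ v.T       # v @ diag(w ** -0.5) @ v.T


# Hypothetical LD matrix and z scores, for illustration only
sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 1.0, 0.3],
                  [0.2, 0.3, 1.0]])
z = np.array([1.2, -0.4, 2.1])

s_inv_half = inv_sqrt(sigma)
y = s_inv_half @ z                          # y = Sigma^{-1/2} z
var_y = s_inv_half @ sigma @ s_inv_half     # Var(y) = Sigma^{-1/2} Sigma Sigma^{-1/2}

q_direct = z @ z                            # Q = z^T z
q_quadratic = y @ sigma @ y                 # Q = y^T Sigma y
print(np.allclose(var_y, np.eye(3)), np.isclose(q_direct, q_quadratic))
```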
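With the spectral theorem, $\Sigma$ can be decomposed as follows:

$$\Sigma = P^T \Lambda P, \qquad \Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_p),$$

where $P$ is an orthogonal matrix and $\lambda_1, \ldots, \lambda_p$ are the eigenvalues of $\Sigma$. If we set $x = Pu$, then $x$ is a vector of independent standard normal variables, because $x$ is an affine transformation of the multivariate normal $z$ and

$$E(x) = P\,E(u) = 0, \qquad \mathrm{Var}(x) = P\,\mathrm{Var}(u)\,P^T = PP^T = I_p.$$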
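The decomposition and the change of variables can be checked numerically. Note that `numpy.linalg.eigh` returns $\Sigma = V\,\mathrm{diag}(w)\,V^T$, so the $P$ of the text corresponds to $V^T$. The sketch below (toy LD matrix and $z$ scores, assumed for illustration) verifies that $P$ is orthogonal, that $u^T \Sigma u = \sum_i \lambda_i x_i^2$ when $\mu = 0$, and, by simulation, that $x$ has identity covariance.

```python
import numpy as np

# Hypothetical LD matrix and z scores, for illustration only
sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 1.0, 0.3],
                  [0.2, 0.3, 1.0]])
z = np.array([1.2, -0.4, 2.1])

w, v = np.linalg.eigh(sigma)        # numpy convention: sigma = v @ diag(w) @ v.T
P = v.T                             # so sigma = P.T @ diag(lam) @ P, P orthogonal
lam = w                             # eigenvalues lambda_1..lambda_p
s_inv_half = (v / np.sqrt(w)) @ v.T # Sigma^{-1/2}

# With mu = 0, u = y = Sigma^{-1/2} z
u = s_inv_half @ z
x = P @ u

q_direct = z @ z
q_spectral = np.sum(lam * x ** 2)   # Q = sum_i lambda_i x_i^2

# Simulate null z ~ MVN(0, Sigma); each row of x_null is x = P Sigma^{-1/2} z
rng = np.random.default_rng(0)
z_null = rng.multivariate_normal(np.zeros(3), sigma, size=200_000)
x_null = z_null @ s_inv_half.T @ P.T
cov_x = np.cov(x_null, rowvar=False)  # should be close to I_3
```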
Under the null hypothesis, $\mu$ is assumed to be $0$, so that $u = y$. Hence,

$$Q = u^T \Sigma\, u = u^T P^T \Lambda P\, u = x^T \Lambda\, x = \sum_{i=1}^{p} \lambda_i x_i^2,$$

where the $x_i^2$ are independent $\chi^2_1$ variables. Thus, the null distribution of $Q$ is a linear combination of independent chi-square variables (i.e., a central quadratic form in independent normal variables). For computing a probability with a scalar $q$,

$$\Pr(Q > q),$$

several methods have been proposed, such as numerical inversion of the characteristic function [2]. The snpsettest package uses the algorithm of Davies [3] or a saddlepoint approximation [4] to obtain set-based p values.
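For intuition about the saddlepoint route, here is a minimal sketch in the spirit of Kuonen [4], written from the standard formulas rather than from snpsettest's code. With cumulant generating function $K(t) = -\tfrac{1}{2}\sum_i \log(1 - 2t\lambda_i)$, solve $K'(\hat t) = q$, set $w = \mathrm{sign}(\hat t)\sqrt{2(\hat t q - K(\hat t))}$ and $v = \hat t \sqrt{K''(\hat t)}$, and approximate $\Pr(Q > q) \approx 1 - \Phi\{w + \log(v/w)/w\}$. The function name is hypothetical; it assumes all $\lambda_i > 0$ and $q > 0$, and it refuses $q$ very close to $E(Q) = \sum_i \lambda_i$, where this formula is numerically unstable.

```python
import math


def saddlepoint_upper_tail(q: float, lambdas) -> float:
    """Approximate Pr(Q > q) for Q = sum_i lambdas[i] * chi2_1 (hypothetical sketch)."""
    lam = [float(l) for l in lambdas if l > 0]
    lam_max = max(lam)

    def cgf(t):      # K(t) = -1/2 * sum log(1 - 2 t lambda_i)
        return -0.5 * sum(math.log1p(-2.0 * t * l) for l in lam)

    def cgf_d1(t):   # K'(t), increasing on (-inf, 1 / (2 lam_max))
        return sum(l / (1.0 - 2.0 * t * l) for l in lam)

    def cgf_d2(t):   # K''(t)
        return sum(2.0 * l * l / (1.0 - 2.0 * t * l) ** 2 for l in lam)

    # For t < 0 each term of K'(t) is below 1 / (-2t), so K'(lo) < q here
    lo = -len(lam) / (2.0 * q)
    hi = (1.0 - 1e-12) / (2.0 * lam_max)   # just below the singularity of K
    for _ in range(200):                   # bisection for K'(t_hat) = q
        mid = 0.5 * (lo + hi)
        if cgf_d1(mid) < q:
            lo = mid
        else:
            hi = mid
    t_hat = 0.5 * (lo + hi)

    if abs(t_hat) < 1e-8:
        raise ValueError("q is too close to E[Q]; the saddlepoint formula is unstable there")
    w = math.copysign(math.sqrt(max(2.0 * (t_hat * q - cgf(t_hat)), 0.0)), t_hat)
    v = t_hat * math.sqrt(cgf_d2(t_hat))
    zeta = w + math.log(v / w) / w
    return 0.5 * math.erfc(zeta / math.sqrt(2.0))   # 1 - Phi(zeta)
```

In a set-based test, `lambdas` would be the eigenvalues of the LD matrix (for example from `numpy.linalg.eigvalsh`) and `q` the observed $Q$. On $\chi^2_2$ with $q = 10$ the approximation gives about 0.0069 against the exact $e^{-5} \approx 0.0067$; unlike Monte Carlo, its cost does not grow as the p value shrinks.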
References
1. Liu JZ, McRae AF, Nyholt DR, Medland SE, Wray NR, Brown KM, et al. A Versatile Gene-Based Test for Genome-wide Association Studies. Am J Hum Genet. 2010 Jul 9;87(1):139–45.
2. Duchesne P, De Micheaux P. Computing the distribution of quadratic forms: Further comparisons between the Liu-Tang-Zhang approximation and exact methods. Comput Stat Data Anal. 2010;54:858–62.
3. Davies RB. Algorithm AS 155: The Distribution of a Linear Combination of Chi-square Random Variables. J R Stat Soc Ser C Appl Stat. 1980;29(3):323–33.
4. Kuonen D. Saddlepoint Approximations for Distributions of Quadratic Forms in Normal Variables. Biometrika. 1999;86(4):929–35.