Allele Association Analysis

Felix edited this page Jan 25, 2016 · 8 revisions

Methods for association analysis between HLA alleles and diseases.

1 Options

--file input0.txt       [Mandatory]
--assoc                 [Mandatory]
--digit 4               [Default]
--test fisher           [Default]
--model allelic         [Default]
--freq 0                [Default]
--adjust FDR            [Default]
--out output.txt        [Default]
--print                 [Optional]
--perm N                [Optional]
--seed S                [Optional]
--exclude EXCLUDE.txt   [Optional]
--covar COVAR.txt       [Optional, for logistic and linear regression only]
--covarname COVARNAME   [Optional, for logistic and linear regression only]

1.1 Digits resolution (--digit)

The association test can be run at two-, four-, or six-digit resolution. At two-digit resolution, alleles such as A*02:01 and A*02:06 are combined as A*02. Default value is 4.
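As an illustration of how digit resolution collapses allele names, here is a minimal Python sketch. The function name `truncate_allele` is hypothetical and is not part of the tool; it assumes allele names follow the `GENE*field:field:...` pattern used on this page.

```python
def truncate_allele(allele: str, digits: int = 4) -> str:
    """Reduce an HLA allele name to 2-, 4-, or 6-digit resolution.

    Hypothetical helper for illustration only.
    """
    if digits not in (2, 4, 6):
        raise ValueError("digits must be 2, 4, or 6")
    gene, _, fields = allele.partition("*")
    # Each colon-separated field holds two digits, so keep digits // 2 fields.
    kept = fields.split(":")[: digits // 2]
    return f"{gene}*{':'.join(kept)}"


# A*02:01 and A*02:06 fall into the same two-digit group.
print(truncate_allele("A*02:01", 2))
print(truncate_allele("A*02:06", 2))
print(truncate_allele("A*01:01:02", 4))
```

An allele that is already at or below the requested resolution is returned unchanged by this sketch.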

1.2 Methods for association test (--test)

chisq       Pearson chi-squared test (for disease traits, 2 x 2 contingency table)
fisher      Fisher's exact test (for disease traits, 2 x 2 contingency table)
logistic    Logistic regression (for disease traits)
linear      Linear regression (for quantitative traits)
raw         Pearson chi-squared test (for disease traits, 2 x m contingency table)
score       Score test proposed by Galta et al. (2005) (for disease traits)
delta       Population frequency difference between cases and controls
            (for disease traits, Fisher's exact test)
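To make the 2 x 2 case concrete, the following Python sketch computes a Pearson chi-squared statistic and its p-value (1 degree of freedom) from a 2 x 2 table. It is an illustration, not the tool's own code: the function name `chisq_2x2` is hypothetical, and it assumes no continuity correction is applied.

```python
import math


def chisq_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-squared test for the 2 x 2 table [[a, b], [c, d]].

    Assumption: no continuity correction. Returns (statistic, p_value).
    """
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("a row or column total is zero")
    chi2 = n * (a * d - b * c) ** 2 / margins
    # Survival function of the chi-squared distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Rows: cases, controls. Columns: test allele, all other alleles.
chi2, p = chisq_2x2(30, 70, 15, 85)
print(chi2, p)
```

For tables with small expected counts, Fisher's exact test (the default) is usually the safer choice.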

When linear or logistic regression is used, each genotype is coded as the number of copies of the test allele. For example, if A*01:01 is the test allele, then A*01:01 A*01:01 is coded as 2, A*01:01 A*01:02 is coded as 1, and A*01:02 A*01:03 is coded as 0.
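The coding rule for regression can be sketched in a few lines of Python. The helper name `allele_dosage` is hypothetical; it simply counts copies of the test allele in a genotype.

```python
def allele_dosage(genotype, test_allele):
    """Count copies of test_allele in a two-allele genotype (0, 1, or 2)."""
    return sum(1 for allele in genotype if allele == test_allele)


test_allele = "A*01:01"
for genotype in [
    ("A*01:01", "A*01:01"),
    ("A*01:01", "A*01:02"),
    ("A*01:02", "A*01:03"),
]:
    print(genotype, allele_dosage(genotype, test_allele))
```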

Default value is fisher.

1.3 Genetic model to test (--model)

When the Pearson chi-squared test or Fisher's exact test is used, one of three genetic models can be specified.

allelic    compares one allele against all other alleles grouped together
dom        compares individuals who carry the allele against individuals who do not
rec        compares individuals homozygous for the allele against all other individuals

Default value is allelic.

Note: --model only takes effect when --test chisq or --test fisher is specified.
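The three models differ only in what is counted in the 2 x 2 table. The Python sketch below builds that table for one test allele; the function name `contingency_table` and the `(allele1, allele2, is_case)` sample layout are assumptions made for this example, not the tool's internal representation.

```python
def contingency_table(samples, test_allele, model="allelic"):
    """Build a 2 x 2 table for one allele under a genetic model.

    samples: iterable of (allele1, allele2, is_case) tuples (assumed layout).
    Rows are [cases, controls]; columns are [with, without].
    """
    table = [[0, 0], [0, 0]]
    for allele1, allele2, is_case in samples:
        copies = (allele1 == test_allele) + (allele2 == test_allele)
        row = 0 if is_case else 1
        if model == "allelic":
            # Count chromosomes: each individual contributes two alleles.
            table[row][0] += copies
            table[row][1] += 2 - copies
        elif model == "dom":
            # Count individuals carrying at least one copy.
            table[row][0 if copies >= 1 else 1] += 1
        elif model == "rec":
            # Count individuals homozygous for the allele.
            table[row][0 if copies == 2 else 1] += 1
        else:
            raise ValueError(f"unknown model: {model}")
    return table


samples = [
    ("A*01:01", "A*01:01", True),
    ("A*01:01", "A*02:01", True),
    ("A*02:01", "A*03:01", False),
    ("A*01:01", "A*03:01", False),
]
for model in ("allelic", "dom", "rec"):
    print(model, contingency_table(samples, "A*01:01", model))
```

Note that the allelic table sums to twice the number of individuals, while the dom and rec tables sum to the number of individuals.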

1.4 Minimal allele/allele group frequency (--freq)

A value between 0 and 1. Only alleles/allele groups with a frequency higher than this threshold are included in the association analysis. Default value is 0. When --perm is specified, setting --freq to a value greater than 0 is recommended to reduce permutation time.
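A frequency filter of this kind can be sketched as follows. This is an illustration under stated assumptions: the name `filter_by_frequency` is hypothetical, and the frequency is taken to be the allele count divided by the total number of alleles across all individuals, which may differ from how the tool computes it.

```python
from collections import Counter


def filter_by_frequency(genotypes, min_freq=0.0):
    """Return {allele: frequency} for alleles with frequency > min_freq.

    Assumption: frequency = allele count / total alleles in the pooled sample.
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {
        allele: count / total
        for allele, count in counts.items()
        if count / total > min_freq  # strictly higher than the threshold
    }


genotypes = [
    ("A*01:01", "A*02:01"),
    ("A*01:01", "A*01:01"),
    ("A*01:01", "A*03:01"),
    ("A*02:01", "A*01:01"),
]
print(filter_by_frequency(genotypes, min_freq=0.2))
```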

1.5 Adjustment for multiple testing (--adjust)

Bonferroni         Bonferroni single-step adjusted p-values
Holm               Holm (1979) step-down adjusted p-values
FDR                Benjamini & Hochberg (1995) step-up FDR control
FDR_BY             Benjamini & Yekutieli (2001) step-up FDR control
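For readers who want to see what the four adjustments do to a list of p-values, here is a self-contained Python sketch of the standard procedures. The function name `adjust_pvalues` is hypothetical, and the code is written from the published definitions, not taken from the tool.

```python
def adjust_pvalues(pvalues, method="FDR"):
    """Adjust p-values with Bonferroni, Holm, FDR (BH), or FDR_BY (BY)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ascending p-values
    adjusted = [0.0] * m
    if method == "Bonferroni":
        return [min(1.0, p * m) for p in pvalues]
    if method == "Holm":
        running = 0.0
        for rank, i in enumerate(order):
            # Step-down: multiply by the number of remaining hypotheses
            # and enforce monotonicity with a running maximum.
            running = max(running, (m - rank) * pvalues[i])
            adjusted[i] = min(1.0, running)
        return adjusted
    if method in ("FDR", "FDR_BY"):
        # BY multiplies the BH adjustment by the harmonic sum 1 + 1/2 + ... + 1/m.
        scale = 1.0
        if method == "FDR_BY":
            scale = sum(1.0 / k for k in range(1, m + 1))
        running = 1.0
        for rank in range(m, 0, -1):
            # Step-up: walk from the largest p-value down with a running minimum.
            i = order[rank - 1]
            running = min(running, scale * m / rank * pvalues[i])
            adjusted[i] = running
        return adjusted
    raise ValueError(f"unknown method: {method}")


pvalues = [0.01, 0.04, 0.03, 0.005]
for method in ("Bonferroni", "Holm", "FDR", "FDR_BY"):
    print(method, adjust_pvalues(pvalues, method))
```

Adjusted values are returned in the same order as the input p-values.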

1.6 Output file name (--out)

Default value is output.txt.

1.7 Print output to screen (--print)

Specifying --print prints all results to the screen; the results are still written to the output file.

1.8 Permutation (--perm)

Number of permutations to perform.

For each permutation run, a simulated dataset is constructed from the original dataset by randomizing the assignment of phenotype status among individuals. The same individuals are used, maintaining the same LD structure and the original case/control ratio.

Only simulated datasets with the same common alleles between cases and controls as the original dataset are used, so setting --freq to a value greater than zero can speed up the permutation.
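The label-shuffling step can be sketched in Python as below. The generator name `permute_phenotypes` is hypothetical; the sketch covers only the randomization of phenotype status with an optional seed, and omits the check on common alleles and the association test itself.

```python
import random


def permute_phenotypes(phenotypes, n_perm, seed=None):
    """Yield n_perm shuffled copies of the phenotype list.

    Genotypes stay attached to individuals; only phenotype labels move,
    so the case/control ratio is preserved in every simulated dataset.
    """
    # With seed=None, Python seeds the generator from the OS or the clock.
    rng = random.Random(seed)
    for _ in range(n_perm):
        shuffled = list(phenotypes)
        rng.shuffle(shuffled)
        yield shuffled


phenotypes = [1, 1, 1, 0, 0, 0, 0, 0]  # 1 = case, 0 = control
for labels in permute_phenotypes(phenotypes, n_perm=3, seed=42):
    print(labels)
```

Passing the same seed reproduces the same sequence of simulated datasets, which is useful when results need to be repeatable.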

1.9 Random seed (--seed)

Random seed for permutation. A number used to initialize the basic random number generator. By default, the current system time is used.

1.10 Exclude Alleles (--exclude)

Alleles to be excluded. One allele per line.

A*01:01:02
C*01:03

1.11 Covariates file (--covar)

One or more covariates can be included in linear and logistic regression.

The covariates file is a white-space (space or tab) delimited file. The first row is a header. Each subsequent row describes one individual: the first column is the individual ID (IID), and each following column holds the measure of one trait.

For example, here are two individuals with three traits:

IID  age sex bmi
0001 28  1   20.70
0002 23  0   16.29

Note: Trait names must not contain white-space.

Note: --covar only takes effect when --test linear or --test logistic is specified.

Note: The order of individuals in the covariates file does not have to match the genotype input file, and the two files do not need to contain the same number of individuals. Only individuals present in both files are included in the analysis.
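The file layout and the "common individuals" rule can be illustrated with a short Python sketch. The function name `read_covariates` and the `genotype_ids` list are hypothetical, and the sketch assumes every trait value is numeric.

```python
import io


def read_covariates(handle):
    """Parse a white-space delimited covariates file.

    Returns (trait_names, {IID: {trait: value}}). Assumes numeric traits.
    """
    header = handle.readline().split()
    names = header[1:]  # the first column is the IID
    table = {}
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        table[fields[0]] = dict(zip(names, map(float, fields[1:])))
    return names, table


example = io.StringIO(
    "IID  age sex bmi\n"
    "0001 28  1   20.70\n"
    "0002 23  0   16.29\n"
)
names, covariates = read_covariates(example)

# Hypothetical IDs from a genotype file; order and size may differ.
genotype_ids = ["0002", "0003"]
common = [iid for iid in genotype_ids if iid in covariates]
print(names, common)
```

IDs are kept as strings so that leading zeros such as `0001` are preserved when matching against the genotype file.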

1.12 Covariates name (--covarname)

To select a particular subset of covariates, use the --covarname covarnames option.

covarnames is a string of trait names (from the header row of the covariates file) joined with commas (,).

For example,

--covar cov.txt                                    # use all covariates in cov.txt
--covar cov.txt --covarname bmi                    # only use 'bmi'
--covar cov.txt --covarname age,bmi                # use both 'age' and 'bmi'
--covar cov.txt --covarname age,sex,bmi            # use all three covariates

Note: If --covarname is not specified, all covariates in cov.txt are used.
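The selection rule amounts to splitting the comma-separated string and falling back to all traits when nothing is given. A minimal Python sketch follows; the function name `select_covariates` is hypothetical, and raising an error for an unknown trait name is an assumption of this example, not documented behavior of the tool.

```python
def select_covariates(available, covarname=None):
    """Return the covariate names to use.

    available: trait names from the covariates file header.
    covarname: comma-separated trait names, or None to use all of them.
    """
    if covarname is None:
        return list(available)
    requested = covarname.split(",")
    unknown = [name for name in requested if name not in available]
    if unknown:
        # Assumption: unknown names are treated as an error in this sketch.
        raise ValueError(f"unknown covariates: {', '.join(unknown)}")
    return requested


header = ["age", "sex", "bmi"]
print(select_covariates(header))
print(select_covariates(header, "bmi"))
print(select_covariates(header, "age,bmi"))
```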
