Input
The input file is a whitespace (space or tab) delimited file. The first two columns are mandatory: Individual ID and Phenotype. Individual IDs are alphanumeric and should uniquely identify a person. The phenotype can be either a quantitative trait or an affection status. Affection status should be coded as 1 for unaffected and 2 for affected.
HLA types (column 3 onwards) should also be whitespace delimited. Every gene must have two alleles specified. Alleles (see Nomenclature of HLA Alleles) do not all need to have the same number of digits. However, if you want to test association at 4 digits, all alleles should have at least 4-digit resolution. A missing genotype is denoted as NA.
No header row should be given. For example, here are two individuals typed for 6 genes (one row = one person):
0001 2 A*02:07:01 A*11:01:01 B*51:01:01 B*51:01:01 C*14:02:01 C*14:02:01 DQA1*01:04:01 DQA1*01:04:01 DQB1*03:03:02 DQB1*05:02:01 DRB1*07:01:01 DRB1*14:54:01
0002 1 A*24:02:01 A*33:03:01 B*15:25:01 B*58:01:01 C*03:02:02 C*04:03 NA NA DQB1*03:01:01 DQB1*03:01:01 DRB1*03:01:01 DRB1*12:02:01
There is one case and one control. The six genes are HLA-A, HLA-B, HLA-C, HLA-DQA1, HLA-DQB1, and HLA-DRB1. Each gene has two columns. Individual 0002 does not have HLA types for HLA-DQA1 (two NA). All alleles have six-digit resolution, except that one allele of HLA-C of individual 0002 has only four-digit resolution. This is fine if we only want to test association at two- or four-digit resolution.
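To illustrate what testing at a given resolution implies for the input, here is a minimal sketch of trimming an allele name to two, four, or six digits. The function name `truncate_allele` is hypothetical, and returning `None` for an allele typed at too low a resolution is an assumption made for this example, not a description of how the tool handles such alleles internally.

```python
def truncate_allele(allele, digits):
    """Trim an allele name to 2-, 4-, or 6-digit resolution.

    Returns None for a missing genotype ("NA") or for an allele typed at a
    lower resolution than requested (an assumption made for this sketch).
    """
    if allele == "NA":
        return None
    n_fields = digits // 2  # each colon-separated field holds two digits
    gene, _, fields = allele.partition("*")
    parts = fields.split(":")
    if len(parts) < n_fields:
        return None
    return gene + "*" + ":".join(parts[:n_fields])


print(truncate_allele("A*02:07:01", 2))  # A*02
print(truncate_allele("C*04:03", 4))     # C*04:03
print(truncate_allele("C*04:03", 6))     # None
```

In the example rows above, `C*04:03` survives trimming to two or four digits but cannot be expressed at six digits.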
Note: The allele names in the above example do not have the HLA prefix. Allele names with the HLA prefix can also be used as input, e.g. A*02:07:01 A*11:01:01 is the same as HLA-A*02:07:01 HLA-A*11:01:01. See the example files input0.txt and input1.txt for a case-control trait and a quantitative trait, respectively.
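The row layout described above can be sketched as a small parser. This is an illustrative reading of the format only; `parse_genotype_row` is a hypothetical helper, and stripping the optional HLA prefix and mapping NA to `None` are choices made for this example.

```python
def parse_genotype_row(line):
    """Split one input row into (individual ID, phenotype, allele pairs)."""
    fields = line.split()  # splits on any run of spaces or tabs
    # Two mandatory columns plus two allele columns per gene.
    if len(fields) < 4 or len(fields) % 2 != 0:
        raise ValueError("expected ID, phenotype, and two alleles per gene")
    iid, phenotype = fields[0], fields[1]
    alleles = []
    for name in fields[2:]:
        if name == "NA":
            alleles.append(None)  # missing genotype
        elif name.startswith("HLA-"):
            alleles.append(name[4:])  # drop the optional HLA prefix
        else:
            alleles.append(name)
    pairs = list(zip(alleles[0::2], alleles[1::2]))  # one pair per gene
    return iid, phenotype, pairs


row = ("0002 1 A*24:02:01 A*33:03:01 B*15:25:01 B*58:01:01 C*03:02:02 C*04:03 "
       "NA NA DQB1*03:01:01 DQB1*03:01:01 DRB1*03:01:01 DRB1*12:02:01")
iid, phenotype, pairs = parse_genotype_row(row)
print(iid, phenotype, len(pairs))
```

The phenotype is kept as a string here because the same column may hold either an affection status or a quantitative value.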
Alleles to be excluded from the analysis are listed one allele per line. For example:
A*01:01:02
C*01:03
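As a rough sketch of how such a list could be applied, the snippet below reads one allele per line into a set and drops matching alleles. The helper names are hypothetical, and exact string matching is an assumption; the tool may match excluded alleles differently (for example, by resolution).

```python
def load_exclusions(lines):
    """Collect one allele per line, ignoring blank lines."""
    return {line.strip() for line in lines if line.strip()}


def drop_excluded(alleles, excluded):
    """Keep only alleles that are not in the exclusion set (exact match)."""
    return [allele for allele in alleles if allele not in excluded]


excluded = load_exclusions(["A*01:01:02\n", "C*01:03\n", "\n"])
print(drop_excluded(["A*01:01:02", "A*02:07:01", "C*01:03"], excluded))
```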
The covariates file is a whitespace (space or tab) delimited file. The first row is a header. Each subsequent row describes one individual: the first column is the individual ID (IID), and the remaining columns contain measures of traits, one column per trait.
For example, here are two individuals with three traits:
IID age sex bmi
0001 28 1 20.70
0002 23 0 16.29
Note: Trait names should not include any whitespace. The order of individuals in the covariates file does not have to match the genotype input file, and neither does the number of individuals. Only individuals present in both files are included in the analysis. See covar.txt for an example.
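The matching of individuals between the two files can be sketched as follows. The function names are hypothetical, and converting every covariate to a float is an assumption made for this example.

```python
def parse_covariates(lines):
    """Map each IID to a dict of trait name -> value, using the header row."""
    rows = [line.split() for line in lines if line.strip()]
    traits = rows[0][1:]  # header: IID followed by trait names
    return {
        row[0]: dict(zip(traits, (float(value) for value in row[1:])))
        for row in rows[1:]
    }


def common_individuals(genotype_ids, covariates):
    """Return the genotype IDs that also appear in the covariates file."""
    return [iid for iid in genotype_ids if iid in covariates]


covariates = parse_covariates([
    "IID age sex bmi",
    "0001 28 1 20.70",
    "0002 23 0 16.29",
])
kept = common_individuals(["0002", "0003", "0001"], covariates)
print(kept)  # ['0002', '0001']
```

Individual 0003 has no covariates row, so it is left out; the order of the remaining individuals follows the genotype file rather than the covariates file.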