This repository contains all code used to conduct the analyses presented in the manuscript, "Leveraging phenotypic variability to identify genetic interactions in human phenotypes". This research describes a statistical framework that finds SNPs associated with the means and variances of a quantitative phenotype, and then uses these SNPs to discover gene-environment interactions. We applied these methods to study the genetic basis of body mass index levels and diabetes risk.
All scripts are written in the R or Bash programming languages. The analysis was performed on a Linux system. Plots were created on a macOS Catalina system.
If you cannot find the code you are looking for or have any questions, please contact:
Andrew Marderstein
anm2868@med.cornell.edu
Directory: ./scripts/vqtl_method_comparison
Description: Main script 1. Simulating population genetic data and comparing the false positive rates and power for different variance tests. Results are saved using various parameter settings.
Description: Main script 2. Simulating population genetic data and contrasting power for a muQTL test versus a vQTL test. Results are saved using various parameter settings.
Description: (1) Supplementary figure of interaction effect size versus variance effect size. (2) Power heatmap (supp fig). (3) muQTL vs vQTL output & plots. (4) vQTL method comparison figure.
1. beta_vs_variance_explained_boxplots.R
2. heatmap_vg_vs_N.R
3. muqtl_vqtl_plot.R
4. vQTL_method_compare_simulation_save_draw_plots.R
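To illustrate the kind of simulation described for Main script 1, here is a minimal sketch that estimates the false positive rate and power of one variance test. The choice of the Brown-Forsythe test, the function names (`brown_forsythe_p`, `simulate_p`), and all parameter values are assumptions made for this example; they are not taken from the scripts in this directory.

```r
# Hypothetical sketch: not taken from the repository's scripts.
set.seed(1)

# Brown-Forsythe (median-based Levene) test: one-way ANOVA on the absolute
# deviations of the phenotype from its genotype-group median.
brown_forsythe_p <- function(y, g) {
  g <- factor(g)
  z <- abs(y - ave(y, g, FUN = median))
  anova(lm(z ~ g))[["Pr(>F)"]][1]
}

# Simulate n individuals; `var_effect` increases the residual SD per minor allele.
simulate_p <- function(n, maf, var_effect) {
  g <- rbinom(n, size = 2, prob = maf)
  y <- rnorm(n, mean = 0, sd = 1 + var_effect * g)
  brown_forsythe_p(y, g)
}

n_sim <- 200
# False positive rate: no variance effect, fraction of tests with P < 0.05.
fpr <- mean(replicate(n_sim, simulate_p(2000, 0.3, 0)) < 0.05)
# Power: true variance effect, fraction of tests with P < 0.05.
power <- mean(replicate(n_sim, simulate_p(2000, 0.3, 0.1)) < 0.05)
```

Repeating this over a grid of sample sizes, allele frequencies, and effect sizes gives the kind of parameter sweep the description refers to.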
Directory: ./scripts/gwas
Description: Various scripts to partition UKB into discovery and validation cohorts (1), pull study covariates (2,3), and generate the full dataset for GWAS (4).
1. split_80_20.R
2. sample_qc.R
3. gen_cov1.R
4. gen_full_data.R
Description: Adjust phenotype for covariates prior to running a GWAS.
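As a rough illustration of covariate adjustment before a GWAS, the sketch below regresses a phenotype on covariates and keeps the residuals. The covariates (`age`, `sex`), the simulated data, and the column names are assumptions for this example only; the actual covariate set is defined in the repository's scripts.

```r
# Hypothetical sketch: covariates and column names are illustrative only.
set.seed(2)
n <- 500
dat <- data.frame(age = rnorm(n, 55, 8), sex = rbinom(n, 1, 0.5))
dat$bmi <- 27 + 0.05 * dat$age + 0.5 * dat$sex + rnorm(n, sd = 4)
dat$bmi[sample(n, 10)] <- NA  # a few missing phenotypes

# na.exclude keeps residuals aligned with the input rows (NA where bmi is NA).
fit <- lm(bmi ~ age + sex, data = dat, na.action = na.exclude)
dat$bmi_adj <- residuals(fit)
```

Using `na.exclude` rather than the default `na.omit` means the adjusted phenotype can be written back to the same data frame without re-matching sample IDs.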
Description: This is the master pipeline script used to perform a GWAS for muQTLs, raw vQTLs, and RINT vQTLs in UKB. It was run manually, piece by piece, and uses the following scripts:
1. run_GWAS.impute.sh
2. run_vGWAS_subset.sh
3. mergeResults_impute.R
4. merge_vGWAS_subset.sh
5. merge_vGWAS_subset_2.R
6. ./errors/run_vGWAS_subset_specific.sh
Briefly, it runs a GWAS for muQTLs using (1) and for vQTLs using (2). Next, it merges the result files, using (3) for muQTLs and (4, 5) for vQTLs. If an error occurs while executing the vQTL scripts, the affected jobs are re-run using (6). Finally, the results are clumped and an all-by-all SNP-by-SNP epistasis analysis is performed across the clumped loci.
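For readers unfamiliar with the RINT phenotype used above, here is a minimal sketch of a rank-based inverse normal transformation. The function name `rint` and the Blom offset of 3/8 are assumptions for illustration; the pipeline scripts may use a different offset or tie-handling rule.

```r
# Hypothetical sketch of a rank-based inverse normal transformation (RINT).
# The Blom offset (3/8) is an assumption, not taken from the pipeline.
rint <- function(x, offset = 3 / 8) {
  r <- rank(x, na.last = "keep")   # NAs stay NA
  n <- sum(!is.na(x))
  qnorm((r - offset) / (n - 2 * offset + 1))
}

set.seed(3)
x <- rexp(1000)   # a right-skewed phenotype
z <- rint(x)      # same ordering, approximately standard normal
```

The transformation preserves the ordering of individuals while removing skew, which is why raw and RINT vQTL results can differ.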
Description: This is the script to perform the Deviation Regression Model on PLINK BED formatted genotype files. It is used within (A2).
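The sketch below shows one way to express a deviation regression on simulated data, assuming the model regresses each individual's absolute deviation from their genotype-group median on minor allele count. The function name `drm`, the simulated genotypes, and the effect sizes are assumptions for this example; the repository script reads PLINK BED files and may differ in detail.

```r
# Hypothetical sketch: assumes the DRM is a linear regression of absolute
# deviations from the genotype-group median on minor allele count.
drm <- function(y, g) {
  d <- abs(y - ave(y, g, FUN = median))
  coef(summary(lm(d ~ g)))["g", c("Estimate", "Pr(>|t|)")]
}

set.seed(4)
n <- 5000
g <- rbinom(n, size = 2, prob = 0.3)
y <- rnorm(n, mean = 0.1 * g, sd = 1 + 0.2 * g)  # mean and variance effects
res <- drm(y, g)
```

A positive estimate indicates that phenotypic spread increases with each copy of the minor allele.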
Description: See README.md within the HLMM directory.
Description: This script merges the muQTL, raw vQTL, RINT vQTL, and dQTL results together, and extracts out the significant SNPs.
Description: Creates figures that visualize the results from the different GWAS analyses. For example, displaying muQTL effects versus raw vQTL effects.
Description: Pipeline to map SNPs to the most likely gene.
Directory: ./scripts/gxe
Description: Pull environmental information and create co-factors for GxE interaction analysis, with the exception of the diet score co-factor.
Description: (1,2) perform GxE interaction testing. (3,4) perform SNP-by-diet score interaction testing. (1,3) use SNP candidates, while (2,4) use matched genome-wide SNPs.
(1) GxE_updated.R
(2) GxE_updated.matched.R
(3) GxE_diet_score.R
(4) GxE_diet_score.matched.R
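A single GxE interaction test can be sketched as a linear model with a SNP-by-environment product term. The simulated variables (`g`, `e`, `y`) and effect sizes are assumptions for illustration; the scripts above apply this kind of test to UKB data and may include additional covariates.

```r
# Hypothetical sketch of one SNP-by-environment interaction test.
set.seed(5)
n <- 5000
g <- rbinom(n, size = 2, prob = 0.3)   # minor allele count
e <- rnorm(n)                          # standardized environmental factor
y <- 0.1 * g + 0.2 * e + 0.15 * g * e + rnorm(n)

# `g * e` expands to g + e + g:e; the g:e row is the interaction test.
fit <- lm(y ~ g * e)
gxe <- coef(summary(fit))["g:e", ]
```

Running the same model over candidate SNPs and over matched genome-wide SNPs gives the two sets of results that the discovery-rate comparison relies on.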
Description: This is a broad script used to parse the GxE interaction results and extract discovery rates, validation rates, and other insights.
Description: (1,2) create figures describing validation rates between discovery and validation cohorts.
(1) GxE_validation_plots.R
(2) GxE_validation_plots2.R
Description: Creates a heat map of GxE results.
Description: Compares discovery rates between SNP candidates and random, matched genome-wide SNPs using a permutation procedure.
Description: Estimates correlation between interaction effects and muQTL effects, raw vQTL effects, RINT vQTL effects, and dQTL effects, and visualizes the results.
Description: This script compares the GxE results from analysis on different transformations of BMI.
Directory: ./scripts/gxe
Description: Fits a joint model containing all of the individually significant GxE interactions found with the FTO intronic genotype.
Description: Marginal FTO effects on BMI, conditional on environmental factor levels.
Description: Forest plot visualization of the estimated effects measured in (B).
Description: Correlation between age and TMEM18 gene expression in visceral adipose tissue GTEx samples.
Directory: ./scripts/gxe/pleiotropy
Description: Create disease case phenotypes.
Description: Perform GxE testing in case-control disease phenotypes.
Description: Plot the (BMI) effects estimated in the discovery cohort against the (diabetes) effects estimated in the validation cohort.
Description: Estimate the marginal effect of the BARX1 regulatory SNP on diabetes risk, conditional on physical activity level.
Description: Forest plot of the estimated effects from (D).
Directory: ./scripts/GxG
Description: Analyzing the GxG results.
Description: Subsetting the tested GxG interactions and comparing the correlation of observed effects between discovery and validation sets.
Description: Plot of the results from (B).
Description: Figure panel for the GxG results.
Directory: ./scripts/phewas
Description: Maps a list of SNPs from UK Biobank SNP IDs to Open Targets variant IDs.
Description: The pheWAS() function takes a variant ID and finds all associated phenotypes in Open Targets (P < 0.05).
Description: Runs the pheWAS() function for a series of input SNPs and saves the results.
Description: This script runs the PheWAS enrichment analysis, where the proportion of raw vQTLs associated with a given phenotype is compared to the proportion of muQTLs associated with the same phenotype.
Description: Scatterplot figure displaying the PheWAS results.
Directory: ./scripts/ldsc
Description: Three scripts used to generate summary statistics for LDSC input (1), to run LDSC and obtain results (2), and to visualize the output (3).
1. ldsc_SS_generate.R
2. ldsc_run.txt
3. ldsc_plot.R
Directory: ./scripts/other
Description: Generates a SNP associated with the variance of a phenotype with no true phenotypic effect, and assesses the false positive rate and visualizes the results.
Description: Comparison of several population attributes between muQTLs and raw vQTLs.
Description: Analysis of the standard error of a mean estimate and a variance estimate.
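As a small illustration of why variance estimates are noisier than mean estimates, the sketch below compares empirical standard errors against the standard results for normally distributed data, SE(mean) = sigma / sqrt(n) and SE(variance) = sigma^2 * sqrt(2 / (n - 1)). The sample size, number of replicates, and variable names are assumptions for this example and are not taken from the script.

```r
# Hypothetical sketch: simulated normal data, not the repository's analysis.
set.seed(6)
n <- 1000
sigma <- 1

# Each column holds the sample mean and sample variance of one replicate.
est <- replicate(2000, {
  x <- rnorm(n, sd = sigma)
  c(mean = mean(x), var = var(x))
})

emp_se  <- apply(est, 1, sd)
theo_se <- c(mean = sigma / sqrt(n), var = sigma^2 * sqrt(2 / (n - 1)))
```

With sigma = 1, the standard error of the variance estimate is roughly sqrt(2) times that of the mean estimate.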
Description: Match random genome-wide SNPs to the observed QTLs using population attributes, and create a genotype file in PLINK BED format.