This use case describes how to run MiXeR analysis (http://github.com/precimed/mixer) on synthetic data generated with simu_linux (http://github.com/precimed/simu).
All commands below assume that the $COMORMENT environment variable is defined as described here, and that the $SINGULARITY_BIND and $SIF variables are defined as follows:
export SINGULARITY_BIND=$COMORMENT/mixer/reference:/REF
export SIF=$COMORMENT/mixer/singularity
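As a quick optional check that the container and variables point at a working setup, you can ask the image for the tools used below (the exact output depends on the image version):
singularity exec --home $PWD:/home $SIF/mixer.sif plink2 --version
singularity exec --home $PWD:/home $SIF/mixer.sif python /tools/mixer/precimed/mixer.py --help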
- Generate three pairs of traits for MiXeR analysis, covering three scenarios: complete polygenic overlap (shared), non-overlapping causal variants (unique), and partially overlapping causal variants (partial):
singularity shell --home $PWD:/home $SIF/mixer.sif
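# Three scenarios: "unique" has two disjoint sets of 100 causal variants (one set per trait),
# "shared" has a single set of 100 causal variants affecting both traits, and
# "partial" has 50 shared plus 50 trait-specific causal variants per trait.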
simu_linux --qt --bfile /REF/hapgen/chr21 --seed 123 --causal-n 100 100 --trait1-sigsq 1 0 --trait2-sigsq 0 1 --num-components 2 --out unique --num-traits 2
simu_linux --qt --bfile /REF/hapgen/chr21 --seed 123 --causal-n 100 --trait1-sigsq 1 --trait2-sigsq 1 --num-components 1 --out shared --num-traits 2
simu_linux --qt --bfile /REF/hapgen/chr21 --seed 123 --causal-n 50 50 50 --trait1-sigsq 1 1 0 --trait2-sigsq 1 0 1 --num-components 3 --out partial --num-traits 2
plink2 --bfile /REF/hapgen/chr21 --glm allow-no-covars --pheno shared.pheno --pheno-name trait1 --out shared.1
plink2 --bfile /REF/hapgen/chr21 --glm allow-no-covars --pheno shared.pheno --pheno-name trait2 --out shared.2
plink2 --bfile /REF/hapgen/chr21 --glm allow-no-covars --pheno unique.pheno --pheno-name trait1 --out unique.1
plink2 --bfile /REF/hapgen/chr21 --glm allow-no-covars --pheno unique.pheno --pheno-name trait2 --out unique.2
plink2 --bfile /REF/hapgen/chr21 --glm allow-no-covars --pheno partial.pheno --pheno-name trait1 --out partial.1
plink2 --bfile /REF/hapgen/chr21 --glm allow-no-covars --pheno partial.pheno --pheno-name trait2 --out partial.2
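Optionally, you can take a quick look at the simulated phenotype files before running the GWAS. This assumes the .pheno files are whitespace-delimited tables with trait1 and trait2 columns, which the --pheno-name calls above already rely on:
python -c "
import pandas as pd
for name in ['unique', 'shared', 'partial']:
    df = pd.read_csv(name + '.pheno', delim_whitespace=True)
    print(name, df.shape, list(df.columns))
"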
- Convert the output of plink2 to a format compatible with MiXeR:
singularity exec --home $PWD:/home $SIF/mixer.sif python
import pandas as pd
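# Convert each of the six plink2 GWAS outputs (3 scenarios x 2 traits) into a tab-separated
# .sumstats file with the SNP, A1, A2, N and Z columns expected by MiXeR, using the
# t-statistic (T_STAT) as the Z score.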
for fname, out in [('{x}.{y}.trait{y}.glm.linear'.format(x=x,y=y), '{}.{}.sumstats'.format(x,y)) for x in ['unique', 'shared', 'partial'] for y in ['1', '2']]:
pd.read_csv(fname, delim_whitespace=True)[['ID', 'REF', 'ALT', 'OBS_CT', 'T_STAT']].rename(columns={'ID':'SNP', 'REF':'A1', 'ALT':'A2', 'OBS_CT':'N', 'T_STAT':'Z'}).to_csv(out,index=False, sep='\t')
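Still inside the Python session, an optional quick look at one of the converted files should show the SNP, A1, A2, N and Z columns:
print(pd.read_csv('unique.1.sumstats', sep='\t').head())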
- Run the univariate MiXeR analysis (in this demo example, each analysis should take up to 10 minutes on a standard laptop):
singularity shell --home $PWD:/home $SIF/mixer.sif
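# Common reference arguments: restrict the analysis to chromosome 21, point MiXeR to the
# per-chromosome LD and .bim reference files, and fit the model on a pruned random subset
# of SNPs (rep1 out of the 20 subsets available in the reference).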
export MIXER_COMMON_ARGS="--chr2use 21 --ld-file /REF/ldsc/1000G_EUR_Phase3_plink/1000G.EUR.QC.@.run4.ld --bim-file /REF/ldsc/1000G_EUR_Phase3_plink/1000G.EUR.QC.@.bim --extract /REF/ldsc/1000G_EUR_Phase3_plink/1000G.EUR.QC.prune_maf0p05_rand2M_r2p8.rep1.snps"
python /tools/mixer/precimed/mixer.py fit1 $MIXER_COMMON_ARGS --trait1-file unique.1.sumstats --out unique.1
# now you have two options
# Option 1. Run the above command for the other 5 traits:
python /tools/mixer/precimed/mixer.py fit1 $MIXER_COMMON_ARGS --trait1-file unique.2.sumstats --out unique.2
python /tools/mixer/precimed/mixer.py fit1 $MIXER_COMMON_ARGS --trait1-file shared.1.sumstats --out shared.1
python /tools/mixer/precimed/mixer.py fit1 $MIXER_COMMON_ARGS --trait1-file shared.2.sumstats --out shared.2
python /tools/mixer/precimed/mixer.py fit1 $MIXER_COMMON_ARGS --trait1-file partial.1.sumstats --out partial.1
python /tools/mixer/precimed/mixer.py fit1 $MIXER_COMMON_ARGS --trait1-file partial.2.sumstats --out partial.2
# Option 2. Do a bit of cheating and copy the .json files, enforcing exactly the same genetic architecture
# ("pi", "sig2beta" and "sig2zero" parameters) in all six traits. This shouldn't make much difference as long as
# univariate estimates are fairly accurate.
cp unique.1.json unique.2.json && cp unique.1.json shared.1.json && cp unique.1.json shared.2.json && cp unique.1.json partial.1.json && cp unique.1.json partial.2.json
# Now proceed with bivariate analyses
python /tools/mixer/precimed/mixer.py fit2 $MIXER_COMMON_ARGS --trait1-file unique.1.sumstats --trait2-file unique.2.sumstats --trait1-params unique.1.json --trait2-params unique.2.json --out unique
python /tools/mixer/precimed/mixer.py fit2 $MIXER_COMMON_ARGS --trait1-file shared.1.sumstats --trait2-file shared.2.sumstats --trait1-params shared.1.json --trait2-params shared.2.json --out shared
python /tools/mixer/precimed/mixer.py fit2 $MIXER_COMMON_ARGS --trait1-file partial.1.sumstats --trait2-file partial.2.sumstats --trait1-params partial.1.json --trait2-params partial.2.json --out partial
python /tools/mixer/precimed/mixer_figures.py two --json shared.json --out shared
python /tools/mixer/precimed/mixer_figures.py two --json unique.json --out unique
python /tools/mixer/precimed/mixer_figures.py two --json partial.json --out partial
- The results are available in the shared.png, unique.png and partial.png figures.
- You could also use the MIXER_SIMU.job script to run this analysis on a cluster 20 times, each time with a different random subset of SNPs as defined by the 1000G.EUR.QC.prune_maf0p05_rand2M_r2p8.rep[1-20].snps files, and use the variation in parameter estimates among these runs to infer the uncertainty of the estimates. Adjust MIXER_SIMU.job to match the configuration of your cluster, then trigger the analysis by running the sbatch MIXER_SIMU.job
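The MIXER_SIMU.job script itself is not reproduced here; purely for orientation, a minimal sketch of what such a SLURM array job could look like is shown below. The #SBATCH settings and resource limits are placeholders to adapt to your cluster, and the actual MIXER_SIMU.job may differ (for example, it also has to produce the <trait>.test.rep<N>.json files used in the combine step below, typically via the mixer.py test2 step, which is omitted here):
#!/bin/bash
#SBATCH --job-name=mixer_simu
#SBATCH --array=1-20          # one task per random SNP subset (rep1..rep20)
#SBATCH --time=04:00:00       # placeholder; adjust to your cluster
#SBATCH --mem-per-cpu=8G      # placeholder; adjust to your cluster

# assumes $COMORMENT, $SINGULARITY_BIND and $SIF are exported as at the top of this use case
export REP="rep${SLURM_ARRAY_TASK_ID}"
export MIXER_COMMON_ARGS="--chr2use 21 --ld-file /REF/ldsc/1000G_EUR_Phase3_plink/1000G.EUR.QC.@.run4.ld --bim-file /REF/ldsc/1000G_EUR_Phase3_plink/1000G.EUR.QC.@.bim --extract /REF/ldsc/1000G_EUR_Phase3_plink/1000G.EUR.QC.prune_maf0p05_rand2M_r2p8.${REP}.snps"

for trait in unique shared partial; do
  # univariate fits for both traits of this scenario, on this replicate's SNP subset
  for i in 1 2; do
    singularity exec --home $PWD:/home $SIF/mixer.sif python /tools/mixer/precimed/mixer.py fit1 \
      $MIXER_COMMON_ARGS --trait1-file ${trait}.${i}.sumstats --out ${trait}.${i}.fit.${REP}
  done
  # bivariate fit, producing <trait>.fit.rep<N>.json for the combine step below
  singularity exec --home $PWD:/home $SIF/mixer.sif python /tools/mixer/precimed/mixer.py fit2 \
    $MIXER_COMMON_ARGS --trait1-file ${trait}.1.sumstats --trait2-file ${trait}.2.sumstats \
    --trait1-params ${trait}.1.fit.${REP}.json --trait2-params ${trait}.2.fit.${REP}.json \
    --out ${trait}.fit.${REP}
done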
Once you have results from all 20 runs, combine them across runs and produce the figures as follows:
singularity shell --home $PWD:/home $SIF/mixer.sif
python /tools/mixer/precimed/mixer_figures.py combine --json unique.fit.rep@.json --out unique.fit
python /tools/mixer/precimed/mixer_figures.py combine --json unique.test.rep@.json --out unique.test
python /tools/mixer/precimed/mixer_figures.py combine --json shared.fit.rep@.json --out shared.fit
python /tools/mixer/precimed/mixer_figures.py combine --json shared.test.rep@.json --out shared.test
python /tools/mixer/precimed/mixer_figures.py combine --json partial.fit.rep@.json --out partial.fit
python /tools/mixer/precimed/mixer_figures.py combine --json partial.test.rep@.json --out partial.test
python /tools/mixer/precimed/mixer_figures.py two --statistic mean std --json-fit unique.fit.json --json-test unique.test.json --out mixer_simu_unique
python /tools/mixer/precimed/mixer_figures.py two --statistic mean std --json-fit shared.fit.json --json-test shared.test.json --out mixer_simu_shared
python /tools/mixer/precimed/mixer_figures.py two --statistic mean std --json-fit partial.fit.json --json-test partial.test.json --out mixer_simu_partial
After processing, the resulting figures should look like this:
To include density plots in the resulting figure, add the --trait1-file and --trait2-file arguments like this:
python /tools/mixer/precimed/mixer_figures.py two --statistic mean std --json-fit unique.fit.json --json-test unique.test.json --out mixer_simu_unique --trait1-file unique.1.sumstats --trait2-file unique.2.sumstats
The syntax above can be adapted to Docker as follows:
#!/bin/bash
# define environment variables:
export IMAGE="ghcr.io/comorment/mixer:latest" # adapt as necessary
export MIXER="<path/to>/mixer" # adapt as necessary
export SLURM_ARRAY_TASK_ID=1
export MIXER_COMMON_ARGS="--chr2use 21 --ld-file /REF/ldsc/1000G_EUR_Phase3_plink/1000G.EUR.QC.@.run4.ld --bim-file /REF/ldsc/1000G_EUR_Phase3_plink/1000G.EUR.QC.@.bim"
export REP="rep${SLURM_ARRAY_TASK_ID}"
export EXTRACT="--extract /REF/ldsc/1000G_EUR_Phase3_plink/1000G.EUR.QC.prune_maf0p05_rand2M_r2p8.$REP.snps"
# shortcuts for different binaries and interactive shell:
export PYTHON="docker run --platform=linux/amd64 --rm -v ${MIXER}/usecases:/home -v ${MIXER}/reference:/REF -w/home --entrypoint=python ${IMAGE}"
export SIMU="docker run --platform=linux/amd64 --rm -v ${MIXER}/usecases:/home -v ${MIXER}/reference:/REF -w/home --entrypoint=simu_linux ${IMAGE}"
export PLINK1="docker run --platform=linux/amd64 --rm -v ${MIXER}/usecases:/home -v ${MIXER}/reference:/REF -w/home --entrypoint=plink1 ${IMAGE}"
export PLINK2="docker run --platform=linux/amd64 --rm -v ${MIXER}/usecases:/home -v ${MIXER}/reference:/REF -w/home --entrypoint=plink2 ${IMAGE}"
export ISHELL="docker run --platform=linux/amd64 --rm -it -v ${MIXER}/usecases:/home -v ${MIXER}/reference:/REF -w/home --entrypoint=bash ${IMAGE}"
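# e.g., the $ISHELL shortcut opens an interactive shell inside the container,
# while the other shortcuts run a single tool directly, for example:
$PLINK2 --version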
# invoke mixer.py help documentation:
$PYTHON /tools/mixer/precimed/mixer.py fit1 --help
# prepare simulated data:
$SIMU --qt --bfile /REF/hapgen/chr21 --seed 123 --causal-n 100 100 --trait1-sigsq 1 0 --trait2-sigsq 0 1 --num-components 2 --out unique --num-traits 2
$SIMU --qt --bfile /REF/hapgen/chr21 --seed 123 --causal-n 100 --trait1-sigsq 1 --trait2-sigsq 1 --num-components 1 --out shared --num-traits 2
$SIMU --qt --bfile /REF/hapgen/chr21 --seed 123 --causal-n 50 50 50 --trait1-sigsq 1 1 0 --trait2-sigsq 1 0 1 --num-components 3 --out partial --num-traits 2
$PLINK2 --bfile /REF/hapgen/chr21 --glm allow-no-covars --pheno shared.pheno --pheno-name trait1 --out shared.1
$PLINK2 --bfile /REF/hapgen/chr21 --glm allow-no-covars --pheno shared.pheno --pheno-name trait2 --out shared.2
$PLINK2 --bfile /REF/hapgen/chr21 --glm allow-no-covars --pheno unique.pheno --pheno-name trait1 --out unique.1
$PLINK2 --bfile /REF/hapgen/chr21 --glm allow-no-covars --pheno unique.pheno --pheno-name trait2 --out unique.2
$PLINK2 --bfile /REF/hapgen/chr21 --glm allow-no-covars --pheno partial.pheno --pheno-name trait1 --out partial.1
$PLINK2 --bfile /REF/hapgen/chr21 --glm allow-no-covars --pheno partial.pheno --pheno-name trait2 --out partial.2
$PYTHON -c "import pandas as pd
for fname, out in [('{x}.{y}.trait{y}.glm.linear'.format(x=x,y=y), '{}.{}.sumstats'.format(x,y)) for x in ['unique', 'shared', 'partial'] for y in ['1', '2']]:
pd.read_csv(fname, delim_whitespace=True)[['ID', 'REF', 'ALT', 'OBS_CT', 'T_STAT']].rename(columns={'ID':'SNP', 'REF':'A1', 'ALT':'A2', 'OBS_CT':'N', 'T_STAT':'Z'}).to_csv(out,index=False, sep='\t')"
# MiXeR univariate fit:
$PYTHON /tools/mixer/precimed/mixer.py fit1 $MIXER_COMMON_ARGS $EXTRACT --trait1-file unique.1.sumstats --out unique.1
# cheat:
cp unique.1.json unique.2.json && cp unique.1.json shared.1.json && cp unique.1.json shared.2.json && cp unique.1.json partial.1.json && cp unique.1.json partial.2.json
# bivariate fit:
$PYTHON /tools/mixer/precimed/mixer.py fit2 $MIXER_COMMON_ARGS $EXTRACT --trait1-file unique.1.sumstats --trait2-file unique.2.sumstats --trait1-params unique.1.json --trait2-params unique.2.json --out unique
$PYTHON /tools/mixer/precimed/mixer.py fit2 $MIXER_COMMON_ARGS $EXTRACT --trait1-file shared.1.sumstats --trait2-file shared.2.sumstats --trait1-params shared.1.json --trait2-params shared.2.json --out shared
$PYTHON /tools/mixer/precimed/mixer.py fit2 $MIXER_COMMON_ARGS $EXTRACT --trait1-file partial.1.sumstats --trait2-file partial.2.sumstats --trait1-params partial.1.json --trait2-params partial.2.json --out partial
# figures:
$PYTHON /tools/mixer/precimed/mixer_figures.py two --json shared.json --out shared
$PYTHON /tools/mixer/precimed/mixer_figures.py two --json unique.json --out unique
$PYTHON /tools/mixer/precimed/mixer_figures.py two --json partial.json --out partial