StratMRAlgo

StratMRAlgo is a stratified Mendelian randomization (MR) algorithm designed to identify effect modifiers of causal relationships, accomodating both binary and continuous outcomes. The algorithm emerged as a heuristic solution following the evaluation of five stratified MR methods across a total of 63 simulation scenarios, which considered forms of collider bias, confounding, mediation, and nonlinear relationships between variables. The derivation of the algorithm and its scope of use can be found in detail in our main manuscript.

This R package automates the algorithm and the five MR methods tested using four functions: preprocess(), mr(), algo(), and all_methods().

Installation

StratMRAlgo can be directly installed in R using devtools::install_github:

devtools::install_github("GMELab/StratMRAlgo")

If issues are encountered with devtools::install_github, the repository can be downloaded locally using the git clone command as follows:

# Enter a directory to download StratMRAlgo
cd directory

# Initialize a local git repository
git init

# Execute the git clone command
git clone https://github.com/GMELab/StratMRAlgo.git

And then loaded in R using devtools::load_all:

setwd("/directory/R")
devtools::load_all(".")

This package requires the dependencies data.table, dplyr, and metafor.

Workflow

Part 1: Data Pre-processing

The preprocess() function automates the pre-processing steps required for the algorithm. The function requires an input data file with columns representing all exposures, their respective polygenic risk scores (PRS), outcomes, stratifying variables, and covariates which you plan to include in your analyses. For the package to recognize the variable type properly, the variable names must be specified as follows:

Variable Type	Required Suffix	Sample Column Name(s)	Supported Data Type(s)

Exposure	`_EXP`	`BMI_EXP`	Continuous
Exposure PRS	`_PRS`	`BMI_PRS`	Continuous
Stratifying Variable	`_STRAT`	`LDL_STRAT`	Continuous
Outcome	`_OUTCOME` if continuous or `_CASE_STATUS` if binary	`HBA1C_OUTCOME` `T2DM_CASE_STATUS` `AGE_OUTCOME` `SEX_CASE_STATUS`	Binary and continuous
Covariate	`_COVAR`	`AGE_COVAR` `SEX_COVAR` `PC1_COVAR`	Binary and continuous

If you wish to consider a variable as more than one variable type, multiple copies of the column should be present with the appropriate suffices added to their names. For example, if you wish to consider BMI as both an exposure and stratifying variable, a duplicate set of columns labeled BMI_EXP and BMI_STRAT should be included in the input file. Additionally, age and sex columns (used either as covariates or negative control outcomes) should be specified in all capital letters AGE and SEX prior to their appropriate suffix.

All possible exposure, stratifying variable, and outcome combinations will be automatically generated using the main input data file before being filtered by the preprocess() function. Alternatively, specific combinations to process can also be added as optional input files using the continuous and binary parameters.

Another optional input file, a .txt file with diagnostically-related variables can also be specified. Each row in this file should represent a pair of variables which are diagnostically-related, such that combinations of exposures, stratifying variables, and outcomes containing these pairs can be filtered out by preprocess(). The first column should list the continuous variable(s) while the second column should list the binary outcome(s). For example, for all combinations containing both HbA1c and diabetes to be excluded, one row should be included in this .txt file, where HbA1c (continuous variable) is in the first column, and diabetes (binary outcome) is in the second column.

The following steps are performed by the preprocess() function. Sample input files can be obtained using generate_sample_data().

Sample code for generating sample input files and pre-processing steps is shown below:

# Generate sample data
StratMRAlgo::generate_sample_data(output_dir = "/Your_Directory/Package_Testing_Sample_Data/") # modify

# Pre-process sample data
combinations <- StratMRAlgo::preprocess(
  input_dir = "/Your_Directory/Package_Testing_Sample_Data/", # modify
  output_dir = "/Your_Directory/1_preprocess/", # modify
  data = "data.csv",
  diagnostically_related = "diagnostically_related.txt",
  binary = NULL, # could specify "binary_comb_interest.txt"
  continuous = NULL) # could specify "continuous_comb_interest.txt"

The function outputs a list object containing two elements: $binary and $continuous corresponding to binary_comb.txt and continuous_comb.txt respectively if output directories are specified. Both data frames contain the filtered exposure, stratifying variable, and outcome combinations in each row with their biases, overall causal estimates, and the instrument strength. The data in the two data frames are summarized below:

Column Name	Defintion	Data Frame(s)

EXP	Exposure name	`$binary` and `$continuous`
STRAT	Stratifying variable name	`$binary` and `$continuous`
CASE_STATUS	Binary outcome name	`$binary`
OUTCOME	Continuous outcome name	`$continuous`

X_C2_R2	Estimation of the variance of X explained by C² from Binary Algorithm Model 1	`$binary`
Y_C2_OR	Odds ratio estimating the effect of C² on Y from Binary Algorithm Model 2	`$binary`
Y_X2_OR	Odds ratio estimating the effect of X² on Y from Binary Algorithm Model 3	`$binary`
X_GC_R2	Estimation of the variance of X explained by the GC interaction term from Continuous Algorithm Model 2 (used for the `all_methods()` function but not `algo()` for binary outcomes)	`$binary`

C_Y2_R2	Estimation of the variance of C explained by Y² from Continuous Algorithm Model 1	`$continuous`
X_GC_R2	Estimation of the variance of X explained by the GC interaction term from Continuous Algorithm Model 2	`$continuous`
X_C2_R2	Estimation of the variance of X explained by C² from Continuous Algorithm Model 3	`$continuous`
Y_C2_R2	Estimation of the variance of Y explained by C² from Continuous Algorithm Model 4	`$continuous`
Y_X2_R2	Estimation of the variance of Y explained by X² from Continuous Algorithm Model 5	`$continuous`
C_X2_R2	Estimation of the variance of C explained by X² from Continuous Algorithm Model 6	`$continuous`

wald_ratio_OR	Causal estimate of X on Y (OR) calculated using the allele score Wald Ratio (odds Y per unit X)	`$binary`
wald_ratio_OR_95_LL	Lower limit of the 95% CI of the odds ratio	`$binary`
wald_ratio_OR_95_UL	Upper limit of the 95% CI of the odds ratio	`$binary`

wald_ratio	Causal estimate of X on Y calculated using the allele score Wald Ratio (unit Y per unit X)	`$continuous`
wald_ratio_95_LL	Lower limit of the 95% CI of the causal estimate	`$continuous`
wald_ratio_95_UL	Upper limit of the 95% CI of the causal estimate	`$continuous`

X_G_R2	Variance of X explained by G	`$binary` and `$continuous`

If an output directory is defined, additional metadata with the excluded combinations (removed_combinations.txt), as well as overall allele score MR causal effects and the instrument strength for all pairs of X and Y (binary_overall.txt and continuous_overall.txt) will also be outputted.

Part 2: Algorithm

The algo() function automates the algorithm as described in our main manuscript. First, all the filtered combinations outputted by preprocess() are evaluated using the main DR by X method. Next, all Bonferroni significant combinations are evaluated as per the steps in either diagram below.

Algorithm for Binary Outcomes

Algorithm for Continuous Outcomes

The algo() function automatically calls the preprocess() function. Therefore, it is not necessary to first pre-process the data before running this function. However, if you already completed the pre-processing steps and do not want to run preprocess() again, you can specify a non-null value for the parameter combinations such that it uses the final outputs from preprocess() directly from the R environment.

An optional input file for this function can be specified using the causal_relationships parameter. This is a file defining the directionality of causal relationships between the exposures and stratifying variables used. For example, age must be causal of BMI, as BMI cannot cause age. Each row in the causal relationships file should represent a pair wherein the causal direction is known. The first column should contain the causal parent, while the second column should contain the causal child. For example, AGE would be in the first column, while BMI would be in the second column. The column names should be C1 and C2 from left to right. There is no need to specify the suffices representing the variable type in this causal relationships file. Examples of all optional input files can also be generated using generate_sample_data().

Sample code for running the algorithm can be found below:

# Run algorithm
algo <- StratMRAlgo::algo(
  quantile_number = 50,
  input_dir = "/Your_Directory/Package_Testing_Sample_Data/", # modify
  output_dir = "/Your_Directory/2_algo/", # modify
  data = "data.csv",
  diagnostically_related = "diagnostically_related.txt",
  binary = NULL, # could specify "binary_comb_interest.txt"
  continuous = NULL, # could specify "continuous_comb_interest.txt"
  n = NULL,
  causal_relationships = "causal_relationships.txt",
  combinations = NULL, # could specify combinations = combinations if you do not wish to run the pre-processing steps again
  p_threshold = 0.05)

The function outputs a list object containing two elements: $binary_results and $continuous_results corresponding to binary_results.csv and continuous_results.csv respectively if output directories are specified. Both data frames contain the filtered exposure, stratifying variable, and outcome combinations in each row with their effect modification coefficients, p-values, and standard errors calculated using DR by X and any subsequent stratified MR methods, as well as whether they were determined to be overall Bonferroni significant by the algorithm. Data from preprocess() outputs pertaining to each combination remain in the two data frames. New variables which have not yet been defined in Part 1 are summarized below:

Column Name	Defintion	Data Frame(s)

SIGNIFICANT	TRUE if determined to be overall significant by the algorithm; FALSE if determined not to be overall significant by the algorithm; INCONCLUSIVE otherwise	`$binary_results` and `$continuous_results`

DR_BY_X_P	P-value for the effect modification coefficient calculated using DR by X	`$binary_results` and `$continuous_results`
DR_BY_X_BETA	Effect modification coefficient calculated using DR by X; units described in Part 4	`$binary_results` and `$continuous_results`
DR_BY_X_SE	Effect modification coefficient standard error calculated using DR by X	`$binary_results` and `$continuous_results`

DR_BY_G_P	P-value for the effect modification coefficient calculated using DR by G	`$binary_results` and `$continuous_results`
DR_BY_G_BETA	Effect modification coefficient calculated using DR by G; units described in Part 4	`$binary_results` and `$continuous_results`
DR_BY_G_SE	Effect modification coefficient standard error calculated using DR by G	`$binary_results` and `$continuous_results`

RESID_STRAT_P	P-value for the effect modification coefficient calculated using Resid-Strat	`$continuous_results`
RESID_STRAT_BETA	Effect modification coefficient calculated using Resid-Strat; units described in Part 4	`$continuous_results`
RESID_STRAT_SE	Effect modification coefficient standard error calculated Resid-Strat	`$continuous_results`

If an output directory is defined, individual .csv files with additional metadata for each combination/method applied will be outputted as well.

Part 3: All Stratified MR Methods

All five stratified MR methods can also be run for a set of pre-processed exposure, stratifying variable, and outcome combinations. The all_methods() function has the same input parameters as algo(). However, instead of evaluating all combinations using DR by X and running Bonferroni significant combinations through the algorithm, it simply outputs the effect modification estimates, p-values, and standard errors for each combination using all methods and states whether bias is likely per method based on the simulation-derived thresholds.

Sample code for running all stratified MR methods (DR by X, DR by G, Resid-Strat, Resid-Wald, and Raw Strat) can be found below:

# Run all methods for each combination
all_methods <- StratMRAlgo::all_methods(
  quantile_number = 50,
  input_dir = "/Your_Directory/Package_Testing_Sample_Data/", # modify
  output_dir = "/Your_Directory/3_all_methods/", # modify
  data = "data.csv",
  diagnostically_related = "diagnostically_related.txt",
  binary = NULL, # could specify "binary_comb_interest.txt"
  continuous = NULL, # could specify "continuous_comb_interest.txt"
  n = NULL, # defaults to n = 1, no Bonferroni correction
  causal_relationships = "causal_relationships.txt",
  combinations = NULL, # could specify combinations = combinations if you do not wish to run the pre-processing steps again
  p_threshold = 0.05)

The function outputs a list object containing two elements: $binary_results and $continuous_results corresponding to binary_results_all.csv and continuous_results_all.csv respectively if output directories are specified. Both data frames contain the filtered exposure, stratifying variable, and outcome combinations in each row with their effect modification coefficients, p-values, and standard errors calculated using each method, as well as whether they were determined to be likely biased based on the thresholds shown in Table 2 of the manuscript. Data from preprocess() outputs pertaining to each combination remain in the two data frames. New variables which have not yet been defined in Part 1 are summarized below:

Column Name	Defintion	Data Frame(s)

DR_BY_X_P	P-value for the effect modification coefficient calculated using DR by X	`$binary_results` and `$continuous_results`
DR_BY_X_BETA	Effect modification coefficient calculated using DR by X; units described in Part 4	`$binary_results` and `$continuous_results`
DR_BY_X_SE	Effect modification coefficient standard error calculated using DR by X	`$binary_results` and `$continuous_results`
DR_BY_X_SIGNIFICANT	TRUE if determined to be significant (< p_threshold/n, where n defaults to 1) by DR by X; FALSE if determined not to be significant by DR by X	`$binary_results` and `$continuous_results`

DR_BY_G_P	P-value for the effect modification coefficient calculated using DR by G	`$binary_results` and `$continuous_results`
DR_BY_G_BETA	Effect modification coefficient calculated using DR by G; units described in Part 4	`$binary_results` and `$continuous_results`
DR_BY_G_SE	Effect modification coefficient standard error calculated using DR by G	`$binary_results` and `$continuous_results`
DR_BY_G_SIGNIFICANT	TRUE if determined to be significant (< p_threshold/n, where n defaults to 1) by DR by G; FALSE if determined not to be significant by DR by G	`$binary_results` and `$continuous_results`

RESID_STRAT_P	P-value for the effect modification coefficient calculated using Resid-Strat	`$binary_results` and `$continuous_results`
RESID_STRAT_BETA	Effect modification coefficient calculated using Resid-Strat; units described in Part 4	`$binary_results` and `$continuous_results`
RESID_STRAT_SE	Effect modification coefficient standard error calculated Resid-Strat	`$binary_results` and `$continuous_results`
RESID_STRAT_SIGNIFICANT	TRUE if determined to be significant (< p_threshold/n, where n defaults to 1) by Resid-Strat; FALSE if determined not to be significant by Resid-Strat	`$binary_results` and `$continuous_results`

RAW_STRAT_P	P-value for the effect modification coefficient calculated using Raw Strat	`$binary_results` and `$continuous_results`
RAW_STRAT_BETA	Effect modification coefficient calculated using Raw Strat; units described in Part 4	`$binary_results` and `$continuous_results`
RAW_STRAT_SE	Effect modification coefficient standard error calculated Raw Strat	`$binary_results` and `$continuous_results`
RAW_STRAT_SIGNIFICANT	TRUE if determined to be significant (< p_threshold/n, where n defaults to 1) by Raw Strat; FALSE if determined not to be significant by Raw Strat	`$binary_results` and `$continuous_results`

RAW_STRAT_P	P-value for the effect modification coefficient calculated using Raw Strat	`$binary_results` and `$continuous_results`
RAW_STRAT_BETA	Effect modification coefficient calculated using Raw Strat; units described in Part 4	`$binary_results` and `$continuous_results`
RAW_STRAT_SE	Effect modification coefficient standard error calculated Raw Strat	`$binary_results` and `$continuous_results`
RAW_STRAT_SIGNIFICANT	TRUE if determined to be significant (< p_threshold/n, where n defaults to 1) by Raw Strat; FALSE if determined not to be significant by Raw Strat	`$binary_results` and `$continuous_results`

RESID_WALD_P	P-value for the effect modification coefficient calculated using Resid-Wald	`$continuous_results`
RESID_WALD_BETA	Effect modification coefficient calculated using Resid-Wald; units described in Part 4	`$continuous_results`
RESID_WALD_SE	Effect modification coefficient standard error calculated Resid-Wald	`$continuous_results`
RESID_WALD_SIGNIFICANT	TRUE if determined to be significant (< p_threshold/n, where n defaults to 1) by Resid-Wald; FALSE if determined not to be significant by Resid-Wald	`$continuous_results`

DR_BY_X_BIAS	POSSIBLE_BIAS indicates potential bias for DR by X; NO_BIAS_DETECTED indicates no bias detected for DR by X	`$binary_results` and `$continuous_results`
DR_BY_G_BIAS	POSSIBLE_BIAS indicates potential bias for DR by G; NO_BIAS_DETECTED indicates no bias detected for DR by G	`$binary_results` and `$continuous_results`
RESID_STRAT_BIAS	POSSIBLE_BIAS indicates potential bias for Resid-Strat; NO_BIAS_DETECTED indicates no bias detected for Resid-Strat	`$binary_results` and `$continuous_results`
RESID_WALD_BIAS	POSSIBLE_BIAS indicates potential bias for Resid-Wald; NO_BIAS_DETECTED indicates no bias detected for Resid-Wald	`$continuous_results`
RAW_STRAT_BIAS	POSSIBLE_BIAS indicates potential bias for Raw Strat; NO_BIAS_DETECTED indicates no bias detected for Raw Strat	`$binary_results` and `$continuous_results`

Similar to algo(), if an output directory is defined, individual .csv files with additional metadata for each combination/method applied will be outputted as well.

Part 4: Individual Stratified MR Methods

Any of the five methods can also be used to evaluate any exposure, stratifying variable, and outcome combination of interest using the function mr(). Doubly ranked methods were adapted from Tian et al. (2023) and residual stratification methods were adapted from Staley and Burgess (2017) and their respective Github repositories. Sample code can be found below:

# Test one specific combination using DR by X
dr_x_sample <- StratMRAlgo::mr(method = "by_x", # can specify 'by_x', 'by_g', 'resid_strat', 'raw_strat', or 'resid_wald'
   exposure_name = "X1_EXP", 
   outcome_name = "Y2_CASE_STATUS", 
   stratifying_name = "C2_STRAT", 
   quantile_number = 50, 
   data = "data.csv", 
   input_dir = "/Your_Directory/Package_Testing_Sample_Data/", # modify
   output_dir = "/Your_Directory/4_dr_x_sample/") # modify

The output is a list containing three elements: $p, $beta (effect modification coefficient), and $se obtained from the meta-regression of stratum estimates. For binary outcomes, $beta is reported in log-odds Y per unit X per unit C. For continuous outcomes $beta is reported in units Y per unit X per unit C.

Note that when age is the outcome, sex is the outcome, or age is the stratifying variable, the respective variable will be automatically removed as a covariate for the calculation of the stratum-specific MR estimates. This applies to algo(), all_methods(), and mr().

License

GNU General Public License v3.0

Contact information

Any queries pertaining to the StratMRAlgo R package or methodological framework can be addressed to either: Alice Man (mana3@mcmaster.ca) or Guillaume Paré (pareg@mcmaster.ca).

Citation

Citation will be made available in the future.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
R		R
man		man
.editorconfig		.editorconfig
.lintr		.lintr
DESCRIPTION		DESCRIPTION
LICENSE		LICENSE
NAMESPACE		NAMESPACE
README.md		README.md
binary.png		binary.png
continuous.png		continuous.png
preprocess.png		preprocess.png

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

StratMRAlgo

Table of Contents

Installation

Workflow

Part 1: Data Pre-processing

Part 2: Algorithm

Part 3: All Stratified MR Methods

Part 4: Individual Stratified MR Methods

License

Contact information

Citation

About

Languages

License

GMELab/StratMRAlgo

Folders and files

Latest commit

History

Repository files navigation

StratMRAlgo

Table of Contents

Installation

Workflow

Part 1: Data Pre-processing

Part 2: Algorithm

Part 3: All Stratified MR Methods

Part 4: Individual Stratified MR Methods

License

Contact information

Citation

About

Resources

License

Stars

Watchers

Forks

Languages