Example Code for a Monte Carlo Simulation Study

Code in R and Stata to code, run, and analyse a simulation study is included in this repository. The specific simulation study is described down below.

Structure of this repository

This repository is organised as follows:

R code is included in the folder named R, with three scripts: one for coding the simulation study, one for analysing the results, and one for creating plots and tables with the results;
Stata code is included in the folder named Stata, including three analogous scripts.
Simulated data with the results obtained by the above-mentioned code in R and Stata are included in the folder named data. The R dataset has extension *.RDS, while the Stata dataset has extension *.DTA.

Aims of the simulation study

The aim of this simulation study is pedagogical.

Say we have a study which can be summarised by the following DAG:

library(ggdag)
library(ggplot2)

confounder_triangle() %>%
  ggdag() +
  theme_dag_blank(base_size = 12) +
  coord_cartesian(xlim = c(-0.5, 2.5), ylim = c(-0.5, 1.5))

where X is the exposure, Y the outcome, and Z a confounder.

Note that the exposure and the outcome are independent, there is no edge between X and Y.

We know that there should be no observed association between X and Y once we adjust for Z in our analysis.

However:

Is it true that once we adjust for Z we observe no association between X and Y?
What happens if we fail to adjust for a confounder in our analysis?

This is what we are trying to learn from this simulation study.

Data-generating mechanisms

X is a binary treatment variable, Z is a continuous confounder, and Y is a continuous outcome.

Z is simulated from a standard normal distribution, while X depends on Z according to a logistic model:

$$ \log \frac{P(X = 1)}{P(X = 0)} = \gamma_0 + \gamma_1 Z $$

Y depends on Z according to a linear model:

$$ Y = \alpha_0 + \alpha_1 Z + \varepsilon $$

with $\varepsilon$ following a standard normal distribution.

We fix the following parameters:

$\gamma_0 = 1$,
$\gamma_1 = 3$,
$\alpha_0 = 10$.

For $\alpha_1$, we actually simulate two scenarios:

$\alpha_1 = 5$: in this scenario, Z is actually a confounder;
$\alpha_1 = 0$: in the second scenario, we remove the edge between Z and Y. Here Z is not a counfounder anymore.

Finally, we generate N = 200 independent subjects per each simulated dataset.

Estimand

We have a single estimand of interest here, the regression coefficient that quantifies the association between X and Y.

Remember that, according to our data-generating mechanisms, we expect no association between X and Y.

Methods

We estimate the regression coefficient of interest using linear regression. Specifically, we fit the following two models:

A model that fails to adjust for the observed confounder Z (model 1):

$$ Y = \beta_0 + \beta_1 X + \varepsilon $$

A model that properly adjusts for Z (model 2):

$$ Y = \beta_0 + \beta_1 X + \beta_2 Z + \varepsilon $$

The coefficient of interest is $\beta_1$ according to this notation.

Performance measures

The key performance measure of interest is bias, defined as

$$ E(\hat{\beta_1}) $$

as we know that — according to our DGMs — the true $\beta_1 = 0$.

We might also be interested in estimating the power of a significance test for $\hat{\beta_1}$ at a given significance level, e.g. 0.05. In our example, think of that as a test for the null hypothesis $\beta_1 = 0$.

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Name		Name	Last commit message	Last commit date
Latest commit History 24 Commits
R		R
README_files/figure-gfm		README_files/figure-gfm
Stata		Stata
data		data
.gitignore		.gitignore
LICENSE		LICENSE
Makefile		Makefile
README.Rmd		README.Rmd
README.md		README.md
simulation-study-example.Rproj		simulation-study-example.Rproj

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Example Code for a Monte Carlo Simulation Study

Structure of this repository

Aims of the simulation study

Data-generating mechanisms

Estimand

Methods

Performance measures

License

About

Languages

License

ellessenne/simulation-study-example

Folders and files

Latest commit

History

Repository files navigation

Example Code for a Monte Carlo Simulation Study

Structure of this repository

Aims of the simulation study

Data-generating mechanisms

Estimand

Methods

Performance measures

License

About

Resources

License

Stars

Watchers

Forks

Languages