Oosterwegel et al. (2023)

This repository contains the code used for the statistical analysis of 'Variability of the Human Serum Metabolome over 3 Months in the EXPOsOMICS Personal Exposure Monitoring Study' by Oosterwegel et al. (2023) (DOI: 10.1021/acs.est.3c03233)

Please see below for details on how to reproduce the statistical analysis using the data, the code and, optionally, the Docker image.

Data

The data was collected and generated by the multicenter EXPOsOMICS Personal Exposure Monitoring study. Details are described in our paper. Part of this data was made openly available to make reproduction of the statistical analysis of the paper as easy as possible. The data is stored in a Zenodo repository (DOI: 10.5281/zenodo.8156759). If you use this data in your own work, please cite DOI: 10.5281/zenodo.8156759.

You can download this data by running the code in the code/0_get_data.R file. This will create a data folder with the content from the Zenodo repository and allows you to run the code that requires the data, such as code/1_runs_paper.R.
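For example, a minimal R sketch of this step (run from the repository root):

# Download the open data from Zenodo into the data/ folder
source("code/0_get_data.R")

# Check that the expected files are in place
list.files("data")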

Once you have downloaded the data, you'll find the following variables in the datasets:

data/processed_covariate_data.csv:

Rows: 298
Columns: 7
$ subjectid: hashed subject identifier
$ sample_code: indicates whether it is the first (A) or second (B) blood sample
$ centre: indicates in which centre the data was collected
$ age_cat: indicates the age category at the time of a PEM session
$ sq_sex: indicates the sex of the participant (male, female) as filled in during the screening questionnaire
$ traf: indicates the exposure to traffic (PM2.5 and UFP) as measured during the PEM sessions
$ bmi_cat: indicates the BMI category at the time of a PEM session

data/processed_lcms_data.csv contains the processed LC-MS data:

Rows: 298
Columns: 4297
$ subjectid: hashed subject identifier
$ sample_code: indicates whether it is the first (A) or second (B) blood sample
$ centre: indicates in which centre the data was collected
$ compounds: measured features (compounds), prefixed by the letter X. Each name encodes the measured monoisotopic mass and retention time (monoisotopicmass_retentiontime).
Non-detects (below the limit of detection, LOD) are coded as 1 for the compounds.
....

In both datasets, each row represents a measurement for one person (subjectid) on one sampling day (sample_code). The datasets can be joined on these variables.
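As an illustration, a minimal dplyr sketch of joining the two files (file names as listed above; column names per the data dictionaries):

library(readr)
library(dplyr)

covariates <- read_csv("data/processed_covariate_data.csv")
lcms <- read_csv("data/processed_lcms_data.csv")

# One row per person (subjectid) and blood sample (sample_code);
# centre appears in both files, so include it in the join keys
combined <- inner_join(covariates, lcms,
                       by = c("subjectid", "sample_code", "centre"))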

The other data files (annotations.xlsx, ancestors_annotations.xlsx, annotations_plus_kegg_pathways.csv) contain, respectively, the annotations; the ancestors of the annotations (used to assign a class to a compound based on the ChEBI ontology, see our paper for details); and the annotations together with KEGG pathways.

Code

In combination with the open data (which contains categorized age and BMI instead of the raw values), the unadjusted statistical analysis of the paper can be reproduced using the code in this repository. code/1_runs_paper.R describes which commands were run for the analyses reported in the paper. Subsequently, the results can be summarized and visualized by running all the code in code/2_summarize_results.R, as sketched below.
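In outline, and assuming you run everything from the repository root, the workflow is:

source("code/0_get_data.R")           # download the open data (see above)
source("code/1_runs_paper.R")         # fit the models (slow; can also be run command by command)
source("code/2_summarize_results.R")  # summarize and visualize the results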

Running the unadjusted experiment can take quite some time (~14 hours using 24 cores). To speed things up, you can set pre-compiled to TRUE. This reuses the default flat brms prior, as calibrated for the first compound, for all compounds. It results in a different ICC for a few compounds, but the reported aggregate metrics from the paper will very likely be the same. The original analyses were run with pre-compiled set to FALSE (as can be seen in code/1_runs_paper.R).

The code/test_bench.R file was not used for the results reported in the paper, but it can be used to fit a single model at a time (which may make it easier to see what is going on).
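For orientation only (this is not the repository's own code), here is a minimal brms sketch of what fitting a single compound boils down to: a random intercept per subject, with the ICC computed from the posterior draws. The column name X123.4567_89.1, the log transform and the Gaussian likelihood are assumptions for illustration; see the scripts for the settings actually used.

library(brms)
library(posterior)

# Hypothetical compound column; pick any X... column from the joined data
d <- combined
d$y <- log(d$X123.4567_89.1)

# Random intercept per subject
fit <- brm(y ~ 1 + (1 | subjectid), data = d, chains = 4, cores = 4)

# ICC = between-subject variance / total variance, per posterior draw
draws <- as_draws_df(fit)
icc <- draws$sd_subjectid__Intercept^2 /
  (draws$sd_subjectid__Intercept^2 + draws$sigma^2)
summary(icc)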

Results

All results from the models are in the results folder. The files contain summary statistics computed from the posterior distribution estimated for each compound's model. Rerunning the models with the code from code/1_runs_paper.R will overwrite these files.
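For example, to inspect the results (assuming the summary files are CSVs; check the actual extensions in the folder):

result_files <- list.files("results", full.names = TRUE)

# Peek at one of the per-compound summary files
head(read.csv(result_files[1]))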

Run the Docker container

To make reproduction as easy as possible, we have published a Docker image of the computational environment that was used to run the statistical analysis. You can build this Docker image by running (from the repository's root directory):

docker build -t moosterwegel/variability-metabolites-paper .

This may take a while (~20 minutes), so you can save time by downloading the image directly by running

docker pull moosterwegel/variability-metabolites-paper

After you have obtained the image, you can launch a container with a correctly configured R environment by running

docker run -v $PWD:/home/rstudio -p 8787:8787 -e PASSWORD=YOUR_CONTAINER_PASSWORD moosterwegel/variability-metabolites-paper

If you then navigate to port 8787 on your machine (e.g. http://localhost:8787/ in your web browser) and log in with the username rstudio and the password you provided earlier (YOUR_CONTAINER_PASSWORD), you'll be able to work in an R environment with all the packages and software required to reproduce the statistical analysis.

Contact / issues / questions

If you have any questions, you can open an issue or start a new discussion in this repository, or email me at max.oosterwegel@gmail.com.
