Data preparation and study reproducibility

Downloading the data

Please proceed to the next section if you have your own longitudinal data.

All of the datasets used in our study are publicly accessible; however, accessing them requires an application process. Here we provide some useful links.

Dataset	Dataset website	Application link
ADNI	link	link
OASIS 3	link	link
AIBL	link	link

Upon gaining access, download all T1-weighted MRIs, retaining only the data for subjects with multiple visits. Both demographic data (age, sex) and diagnostic information (cognitive status) are accessible across all three datasets. The following decision trees were utilized to extract diagnostic information from the ADNI and OASIS 3 datasets. For AIBL, the ADNI tree can be used.

Preprocessing the MRIs

Each MRI undergoes preprocessing. Assuming that your MRIs are already in the nifti format, perform the following steps:

Step	Algorithm	Command	Software
bias-field correction	N4	`N4BiasFieldCorrection`	ANTs
skull stripping	SynthStrip	`mri_synthstrip`	FreeSurfer
Affine registration to MNI 152 space	SyN	`antsRegistrationSyNQuick.sh`	ANTs
Brain regions segmentation	SynthSeg 2.0	`mri_synthseg`	FreeSurfer
Intensity normalization	WhiteStripe	`ws-normalize`	intensity-normalization

The preprocessing outputs a normalized, skull-stripped, and corrected brain MRI along with its segmentation. All these steps take approximately 30 seconds to complete (tested on a GeForce RTX 4090 GPU).

Organizing the dataset

Organize your data into a CSV file (dataset.csv) with 9 fields in total:

subject_id: the unique identifier of the subject
image_uid: the unique identifier of the brain MRI
split: train, valid or test
sex: the sex of the subject (male=0, female=1)
age: age of the subject (divided by 100) when the MRI was acquired
diagnosis: cognitive status (CN=0, MCI=0.5, AD=1)
last_diagnosis: last diagnosis available among all follow-up visits
image_path: absolute path to the brain MRI
segm_path: absolute path to the brain MRI segmentation
latent_path: absolute path to the brain MRI latent (*)

(*) The latents will be extracted following the training of the autoencoder. We provide a script that automatically extracts latents from your entire dataset. This script stores the latent representations in the same folder as the images, replacing the '.nii.gz' extension with '_latent.npz' in the filename. As a result, you can anticipate the storage location of the latent representations even before their computation. For instance, if image_path=/foo/bar/brainXYZ.nii.gz, the corresponding latent will be stored at /foo/bar/brainXYZ_latent.npz.

Computing CSVs for training

The autoencoder can be trained directly using the dataset.csv file. The Diffusion UNet requires knowledge of the regional volumes associated with each brain MRI, which will be provided in a CSV file named A.csv. The ControlNet necessitates pairs of images, $x_A$ and $x_B$, acquired at ages $A < B$ from the same subject. These records will be organized in pairs within a dataset named B.csv. To generate A.csv and B.csv, execute the following command:

python scripts/prepare/prepare_csv.py \
    --dataset_csv dataset.csv \
    --output_path /path/to/output/dir

Reproducibility inquiries

For further inquiries regarding reproducibility, please open a GitHub issue.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

REPR-DATA.md

REPR-DATA.md

Data preparation and study reproducibility

Table of Contents

Downloading the data

Preprocessing the MRIs

Organizing the dataset

Computing CSVs for training

Reproducibility inquiries

Files

REPR-DATA.md

Latest commit

History

REPR-DATA.md

File metadata and controls

Data preparation and study reproducibility

Table of Contents

Downloading the data

Preprocessing the MRIs

Organizing the dataset

Computing CSVs for training

Reproducibility inquiries