Unsupervised Deep Representation Learning Enables Phenotype Discovery for Genetic Association Studies of Brain Imaging
This is the official repository accompanying the paper Unsupervised deep representation learning enables phenotype discovery for genetic association studies of brain imaging.
Khush Patel, Ziqian Xie, Hao Yuan, Sheikh Muhammad Saiful Islam, Yaochen Xie, Wei He, Wanheng Zhang, Assaf Gottlieb, Han Chen, Luca Giancardo, Alexander Knaack, Evan Fletcher, Myriam Fornage, Shuiwang Ji, Degui Zhi.
We use unsupervised learning based on a 3D convolutional autoencoder architecture to derive 128-dimensional imaging-derived endophenotypes that represent the complex genetic architecture of the human brain. The deep learning-derived endophenotypes, UDIPs (Unsupervised Deep learning derived Imaging Phenotypes), identified 9,457 significant SNPs organized into 97 independent genetic loci, of which 60 loci were replicated. Twenty-six loci were not reported in earlier T1 and T2 IDP-based UK Biobank GWAS.
Overall Pipeline
A separate model is trained for T1 and for T2. Each model consists of an initial convolution block, four encoder blocks, a 128-dimensional linear latent space, four decoder blocks, and a final convolution block. Mean squared error, computed with a mask that excludes the background, was used as the loss.
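The sketch below illustrates this layout in PyTorch. It is a minimal sketch, not the released implementation: the channel widths, normalization, activation, and input shape are assumptions; see the training directory for the actual code.

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out, stride=2):
    # One 3D convolution + normalization + nonlinearity (assumed layout).
    return nn.Sequential(
        nn.Conv3d(c_in, c_out, kernel_size=3, stride=stride, padding=1),
        nn.InstanceNorm3d(c_out),
        nn.ReLU(inplace=True),
    )

def deconv_block(c_in, c_out):
    # One transposed 3D convolution that doubles the spatial size (assumed layout).
    return nn.Sequential(
        nn.ConvTranspose3d(c_in, c_out, kernel_size=4, stride=2, padding=1),
        nn.InstanceNorm3d(c_out),
        nn.ReLU(inplace=True),
    )

class Autoencoder3D(nn.Module):
    """Initial conv block, four encoder blocks, 128-d linear latent,
    four decoder blocks, final conv block. Channel widths and input
    shape are illustrative assumptions."""

    def __init__(self, in_shape=(96, 112, 96), latent_dim=128):
        super().__init__()
        chs = (8, 16, 32, 64, 128)  # assumed channel progression
        self.initial = conv_block(1, chs[0], stride=1)
        self.encoder = nn.Sequential(*[conv_block(chs[i], chs[i + 1]) for i in range(4)])
        # Spatial size after four stride-2 encoder blocks.
        self.enc_shape = (chs[-1], *[s // 16 for s in in_shape])
        flat = int(torch.prod(torch.tensor(self.enc_shape)))
        self.to_latent = nn.Linear(flat, latent_dim)
        self.from_latent = nn.Linear(latent_dim, flat)
        self.decoder = nn.Sequential(*[deconv_block(chs[4 - i], chs[3 - i]) for i in range(4)])
        self.final = nn.Conv3d(chs[0], 1, kernel_size=3, padding=1)

    def encode(self, x):
        h = self.encoder(self.initial(x))
        return self.to_latent(h.flatten(1))

    def decode(self, z):
        h = self.from_latent(z).view(-1, *self.enc_shape)
        return self.final(self.decoder(h))

    def forward(self, x):
        return self.decode(self.encode(x))

def masked_mse(recon, target, mask):
    # Mean squared error restricted to non-background voxels.
    diff = (recon - target) ** 2 * mask
    return diff.sum() / mask.sum().clamp_min(1.0)
```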
We also share our model weights. Please refer to the notebook in the interpretation directory for an example of loading the model weights.
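As a rough illustration only (reusing the hypothetical `Autoencoder3D` sketch above, and assuming a plain state-dict checkpoint; file names are placeholders, and the interpretation notebook is the authoritative reference), loading weights and extracting a subject's UDIP could look like this:

```python
import torch
import nibabel as nib

# Hypothetical checkpoint path; the notebook in the interpretation directory
# shows the actual loading code for the released weights.
checkpoint = torch.load("udip_t1_weights.pt", map_location="cpu")
model = Autoencoder3D()            # class from the sketch above
model.load_state_dict(checkpoint)  # assumes the file stores a plain state dict
model.eval()

# Load a registered, brain-extracted T1 volume, assumed already resampled
# to the model's expected input shape, and add batch/channel dimensions.
vol = nib.load("subject_T1_registered.nii.gz").get_fdata()
x = torch.from_numpy(vol).float()[None, None]

with torch.no_grad():
    udip = model.encode(x)  # shape (1, 128): the subject's UDIP/ENDO vector
```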
To map genes identified through GWAS to specific brain regions, we used a perturbation-based approach: we add noise to the dimension of interest in the endophenotype and then identify the changes observed in the reconstruction.
Using the decoder neural network, we designed a perturbation-based decoder interpretation (PerDI), which can tell which brain regions a given ENDO dimension maps to. Without human labels, we rediscovered genes relevant to the thalamus, pallidum, and putamen.
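A minimal sketch of the perturbation step, again reusing the hypothetical model above; the perturbation magnitude and the voxelwise absolute difference are illustrative choices rather than the exact PerDI procedure in the interpretation directory:

```python
import torch

def perturb_dimension(model, z, dim, delta=3.0):
    """Decode a latent vector before and after perturbing one UDIP dimension,
    and return the voxelwise absolute change in the reconstruction."""
    model.eval()
    with torch.no_grad():
        baseline = model.decode(z)
        z_pert = z.clone()
        z_pert[:, dim] += delta        # add noise/offset to the dimension of interest
        perturbed = model.decode(z_pert)
    return (perturbed - baseline).abs()  # large values = regions driven by this dimension

# Example: map dimension 42 of a subject's UDIP (udip from model.encode above).
# change_map = perturb_dimension(model, udip, dim=42)
# The change map can then be overlaid on an atlas to name the affected structures.
```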
- The UDIPs directory contains the files for generating UDIPs (also referred to as ENDOs). Read UDIPs/readme for instructions on running it as a command-line tool.
- The training directory contains the files for running the training and instructions for data preparation.
- The interpretation directory contains the files for running the decoder-based perturbation interpretation approach.
Dependencies
- PyTorch 1.10.0
- Nibabel 3.2.1
- Monai 0.7.0
- PyTorch lightning 1.4.9
- pandas 1.3.4
- torchmetrics 0.8.2
- FSL 6.0.5 FLIRT
We applied linear registration (12 DOF) to the preprocessed, brain-extracted MRI provided by UK Biobank. https://git.fmrib.ox.ac.uk/falmagro/UK_biobank_pipeline_v_1/-/tree/master/
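A hedged example of invoking FLIRT from Python for this step; the file names and reference image are placeholders, and only the `-dof 12` setting reflects the registration described above:

```python
import subprocess

def flirt_linear_registration(in_image, ref_image, out_image, out_matrix):
    """Run FSL FLIRT with a 12-DOF (affine) linear registration.
    Paths are placeholders; adapt them to the UK Biobank brain-extracted images."""
    subprocess.run(
        [
            "flirt",
            "-in", in_image,      # moving image (e.g. brain-extracted T1/T2)
            "-ref", ref_image,    # reference image
            "-out", out_image,    # registered output volume
            "-omat", out_matrix,  # affine transform matrix
            "-dof", "12",         # 12-DOF linear (affine) registration
        ],
        check=True,
    )

# flirt_linear_registration("T1_brain.nii.gz", "reference_brain.nii.gz",
#                           "T1_registered.nii.gz", "T1_to_ref.mat")
```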
Check out the results at http://deependo.org/
This work was supported by grants from the National Institute on Aging (U01 AG070112-01A1). In addition, L.G. is supported in part by NIH grants UL1TR003167 and R01NS121154.
Patel, K., Xie, Z., Yuan, H., Islam, S. M. S., Zhang, W., Gottlieb, A., Chen, H., Giancardo, L., Knaack, A., Fletcher, E., Fornage, M., Ji, S., & Zhi, D. (2022). New phenotype discovery method by unsupervised deep representation learning empowers genetic association studies of brain imaging. medRxiv, 2022.12.10.22283302. https://doi.org/10.1101/2022.12.10.22283302
- This repo is for research purposes only. Use it at your own risk.
- GPL-v3 license.