LDSC (LD Score Regression) is a command-line tool for estimating heritability and genetic correlation from GWAS summary statistics. It also computes LD Scores. This tool is essential for researchers in genetics and genomics aiming to understand the genetic architecture of complex traits.
- Background
- Scientific Foundation
- Installation
- Running LDSC
- Testing
- Contributing
- Citing LDSC
- License
- Authors
Genome-wide association studies (GWAS) have identified thousands of genetic variants associated with complex traits. However, interpreting these associations requires robust statistical tools. LDSC provides a framework to estimate the heritability of traits and the genetic correlation between them using summary statistics from GWAS, leveraging linkage disequilibrium (LD) patterns.
LDSC implements LD Score regression, a method that distinguishes confounding biases from true polygenic signals in GWAS data. By regressing GWAS test statistics on LD Scores, LDSC estimates the proportion of variance in a trait explained by genetic factors (heritability) and assesses the shared genetic architecture between traits (genetic correlation).
Key publications:
- Bulik-Sullivan, B., Loh, PR., Finucane, H. et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat Genet 47, 291–295 (2015). https://doi.org/10.1038/ng.3211
- Bulik-Sullivan, B., Finucane, H., Anttila, V. et al. An atlas of genetic correlations across human diseases and traits. Nat Genet 47, 1236–1241 (2015). https://doi.org/10.1038/ng.3406
- Finucane, H., Bulik-Sullivan, B., Gusev, A. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat Genet 47, 1228–1235 (2015). https://doi.org/10.1038/ng.3404
- Python: Version >3.10
- Git: For cloning the repository
- Poetry: Python dependency management tool
-
Clone the Repository
git clone https://github.com/abdenlab/ldsc-python3.git cd ldsc-python3
-
Install Poetry
If you don't have Poetry installed, you can install it using the following command:
# From python-poetry.org curl -sSL https://install.python-poetry.org | python3 - # Or install with pip pip install poetry
Make sure to add Poetry to your PATH as instructed after installation.
-
Install Dependencies
Use Poetry to install all project dependencies:
poetry install
This will create a virtual environment and install all required packages as specified in
pyproject.toml
. -
Activate the Virtual Environment
poetry shell
-
Verify Installation
Run the help command to verify that LDSC is installed correctly:
python ldsc.py -h python munge_sumstats.py -h
If these commands display help messages with available options, the installation was successful.
LDSC provides several functionalities, including estimating LD Scores, heritability, partitioned heritability, genetic correlation, and the LD Score regression intercept.
ldsc --bfile 1000G_EUR_Phase3_plink/1000G.EUR.QC. --l2 --ld-wind-cm 1 --out eur_ldscores
ldsc --h2 sumstats.txt --ref-ld-chr eur_ldscores/ --w-ld-chr eur_weights/ --out h2_results
ldsc --rg trait1.sumstats.gz,trait2.sumstats.gz --ref-ld-chr eur_ldscores/ --w-ld-chr eur_weights/ --out rg_results
ldsc --h2 sumstats.txt --ref-ld-chr eur_ldscores/ --w-ld-chr eur_weights/ --overlap-annot --frqfile-chr eur_frq/ --out partitioned_h2
Replace the file paths with your actual data files. For more detailed tutorials and options, refer to the wiki.
We have included a comprehensive test suite to ensure the correctness of LDSC.
-
Activate the Virtual Environment
poetry shell
-
Run Tests with Nose2
nose2
This will execute all unit tests located in the
test
directory.
We use GitHub Actions for continuous integration. The workflow is defined in .github/workflows/python-project.yml
and runs tests across multiple Python versions.
We welcome contributions from the community. To contribute:
-
Fork the Repository
-
Create a Feature Branch
git checkout -b feature/new-feature
-
Make Changes and Commit
git commit -am "Add new feature"
-
Push to Your Fork
git push origin feature/new-feature
-
Create a Pull Request
Please ensure that your code passes all tests and adheres to the project's coding standards before submitting a pull request.
- Formatting: We use
black
for code formatting. - Linting: Code should pass
flake8
checks. - Type Checking: We use
mypy
for static type checking. - Imports: Organize imports using
isort
.
Install development dependencies:
poetry install --with dev
We provide a Dockerfile
to containerize the application.
docker build -t ldsc:dev .
docker run -it ldsc:dev ldsc -h
Adjust the command according to your needs, especially if you have specific scripts to run.
If you use LDSC in your research, please cite the following publications:
-
LD Score Regression Methodology:
Bulik-Sullivan, B.K. et al. (2015). LD Score Regression Distinguishes Confounding from Polygenicity in Genome-Wide Association Studies. Nature Genetics, 47(3), 291–295. doi:10.1038/ng.3211
-
Genetic Correlation:
Bulik-Sullivan, B. et al. (2015). An Atlas of Genetic Correlations across Human Diseases and Traits. Nature Genetics, 47(11), 1236–1241. doi:10.1038/ng.3406
-
Partitioned Heritability:
Finucane, H.K. et al. (2015). Partitioning Heritability by Functional Annotation Using Genome-Wide Association Summary Statistics. Nature Genetics, 47(11), 1228–1235. doi:10.1038/ng.3404
-
Continuous Annotation Stratified Heritability:
Gazal, S. et al. (2017). Linkage Disequilibrium–Dependent Architecture of Human Complex Traits Shows Action of Negative Selection. Nature Genetics, 49(10), 1421–1427. doi:10.1038/ng.3954
-
Relation to Haseman-Elston Regression:
Bulik-Sullivan, B.K. (2015). Relationship Between LD Score and Haseman-Elston Regression. bioRxiv. doi:10.1101/018283
This project is licensed under the GNU General Public License v3.0. You may obtain a copy of the License at https://www.gnu.org/licenses/gpl-3.0.en.html.
-
Brendan Bulik-Sullivan
Broad Institute of MIT and Harvard
-
Hilary Finucane
MIT Department of Mathematics
-
Thomas Reimonn
UMass Chan Medical School
If you encounter issues or have questions, please consider the following resources:
-
Wiki and Tutorials
Detailed tutorials are available in the wiki section.
-
Frequently Asked Questions
Common issues are addressed in the FAQ.
-
Contact Us
For further assistance, you can reach out via the GitHub Issues page.
To update to the latest version of LDSC:
-
Navigate to the Project Directory
cd ldsc-python3
-
Pull the Latest Changes
git pull
If your local repository is up to date, you will see:
Already up-to-date.
Otherwise,
git
will display the files that have been updated. -
Update Dependencies
If dependencies have changed, update them with:
poetry install
You can download precomputed LD Scores suitable for various analyses:
Partitioned LD Scores for heritability estimation are also available here.
We hope LDSC proves valuable in your research. Your contributions and feedback are greatly appreciated!