
Extracting disease-specific genomic coordinates from GWAS catalog

cresswellkg/gwas2bed

Disease- or trait-specific SNP sets, genomic coordinates

A collection of datasets from various publications containing genomic coordinates of disease- and/or trait-associated SNPs, along with scripts for processing them.

Autoimmune diseases

  • autoimmune folder. Description of autoimmune-related genomics datasets. R.GR.autoimmune - working folder with an R project for the analysis of 39 disease/trait-associated SNP sets.

  • gwasCatalog folder. Scripts to extract the coordinates of disease-specific SNP sets into separate files. Description of genomics datasets and databases related to complex diseases.

  • tumorportal folder. Description of genomics datasets and databases related to cancers.

  • population folder. Individual-specific genotypes of various populations. See README.md there.

Large data collections are in the data subfolders of the autoimmune, gwasCatalog, and tumorportal folders. Each subfolder has its own README file with the dataset-specific explanations.

Disease-disease similarities

Disease-disease similarity based on symptom similarity. Zhou X, Menche J, Barabási A-L, Sharma A: Human symptoms-disease network. Nat Commun 2014, 5(May):4212.

  • human-cooccur-disease-network.txt.gz - data from Supplementary Data 1. List of all 4,442 diseases within PubMed and their occurrence. To rank the diseases from most to least frequently studied, run: zcat < human-cooccur-disease-network.txt.gz | sort -k2 -n -r > human-cooccur-disease-names.txt

  • human-sig-disease-network.txt.gz - data from Supplementary Data 4. List of disease links in the disease network with both significant shared symptoms and shared genes/PPIs. In total there are 133,106 such connections between 1,596 distinct diseases. The table has 3 columns: "MeSH Disease Term", "MeSH Disease Term", "symptom similarity score".

  • human-disease-to-UMLS.xlsx - data from Supplementary Table 6. This data file contains 33,977 records mapping HPO phenotypes to UMLS semantic types (from UMLS 2012AA).

  • human-disease-to-SNOMED.xlsx - data from Supplementary Table 7. SNOMED-CT symptom-disease relationships. The data file has six components: disease-symptom relationships, disease list, disease terms, symptom list, symptom terms and SNOMED semantic types. There are 2,340 records of disease-symptom relationships, which include 1,623 diseases and 817 symptoms. The SNOMED semantic type component lists the semantic types of concepts and their numbers in SNOMED.

  • Rzhetsky_A_Appendix3.pdf - A list of diseases, their ICD9 codes, and brief descriptions. Rzhetsky, A., Wajngurt, D., Park, N., & Zheng, T. (2007). Probing genetic overlap among complex human phenotypes. Proceedings of the National Academy of Sciences of the United States of America, 104, 11694–11699. doi:10.1073/pnas.0704820104 (http://www.pnas.org/content/104/28/11694.full.pdf)
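
The three-column layout of human-sig-disease-network.txt.gz described above can be explored with a short script. The following is a minimal Python sketch, not part of this repository. It assumes the file is tab-delimited and may start with a header row; neither is confirmed here, so the separator is a parameter and rows without a numeric score are skipped. The function names (parse_edges, load_edges, distinct_diseases, top_pairs) are hypothetical.

```python
import gzip
from typing import Iterable, List, Set, Tuple

# One link: (disease term, disease term, symptom similarity score)
Edge = Tuple[str, str, float]


def parse_edges(lines: Iterable[str], sep: str = "\t") -> List[Edge]:
    """Parse three-column lines into edges.

    Assumption: columns are separated by `sep` (tab by default).
    Rows that do not have exactly three fields, or whose third field
    is not a number (for example a header row), are skipped.
    """
    edges: List[Edge] = []
    for line in lines:
        parts = line.rstrip("\r\n").split(sep)
        if len(parts) != 3:
            continue
        try:
            score = float(parts[2])
        except ValueError:
            continue
        edges.append((parts[0], parts[1], score))
    return edges


def load_edges(path: str, sep: str = "\t") -> List[Edge]:
    """Read a gzip-compressed text file and parse it with parse_edges."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return parse_edges(handle, sep)


def distinct_diseases(edges: Iterable[Edge]) -> Set[str]:
    """Collect every disease term that appears in either of the first two columns."""
    return {name for first, second, _ in edges for name in (first, second)}


def top_pairs(edges: Iterable[Edge], n: int = 10) -> List[Edge]:
    """Return the n links with the highest similarity score."""
    return sorted(edges, key=lambda edge: edge[2], reverse=True)[:n]
```

For example, edges = load_edges("human-sig-disease-network.txt.gz") followed by len(edges) and len(distinct_diseases(edges)) gives counts that can be compared against the 133,106 connections and 1,596 diseases stated above. A mismatch would suggest that the delimiter assumption does not hold for this file.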

Disease-disease relationships based on gene/protein interaction networks. Suthram S, Dudley JT, Chiang AP, Chen R, Hastie TJ, Butte AJ: Network-based elucidation of human disease similarities reveals common functional modules enriched for pluripotent drug targets. PLoS Comput Biol 2010, 6:1–10.

  • Suthram_TableS2_diseases-umls.xlsx - data from Supplementary Table S1. List of 54 diseases, their UMLS IDs and GEO IDs.

  • Suthram_TableS2_disease-relationships.xlsx - data from Supplementary Table S2. List of the 138 significant disease-disease correlations. Other correlations are not significant.
