Analysis of SNP variants, derived from chip array genotyping and HTS sequencing

sscansan/genomics_and_GWAS_tutorials

Genomic projects tutorials

⚠️ Repository under construction ⚠️

This repository collects the genomic projects I am working on. The bioinformatic projects revolve around genomics and use tools such as PLINK (through the plinkr R package), rTASSEL and TASSEL 5 (GUI), GEMMA for mixed-model analysis in R, and SAMtools to analyze BAM files, with others coming soon!

The repository was created for testing and for self-teaching biological concepts and bioinformatic tools. It makes use of other repositories, scripts and data sources, either taken as they are or modified.

The report of the studies in progress is at:

"Report/build/Genomics_proj.pdf"

Contents

Tools

Example case studies

  1. Vitis vinifera subsp. sylvestris collection

Data coming from the repository: Repository.

A dataset of 9,896 single nucleotide polymorphisms (SNPs) for 112 wild grapes, obtained with the GrapeReSeq 18K Vitis chip.

The data have been published in: Ramos-Madrigal, J., Runge, A.K.W., Bouby, L. et al. Palaeogenomic insights into the origins of French grapevine diversity. Nat. Plants 5, 595–603 (2019). https://doi.org/10.1038/s41477-019-0437-5

The dataset, comprising 9,896 SNPs for 112 wild grapes (Vitis vinifera subsp. sylvestris), is made available here in support of the paper: Ramos-Madrigal J, Wiborg Runge AK, Bouby L, Lacombe T, Samaniego-Castruita JA, Adam-Blondon AF, Figueiral I, Hallavant C, Martínez-Zapater JM, Schaal C, Töpfer R, Petersen B, Sicheritz-Pontén T, This P, Bacilieri R, Gilbert MTP, Wales, 2019. Palaeogenomic insights into the origins of French grapevine diversity. Submitted to Nature Plants, 2019. These 9,896 SNPs are a subset of the 10,207 SNPs for cultivated grapes previously published by Le Paslier et al., 2018 (https://doi.org/10.15454/1.4861359557068474E12). Plant material was harvested in two grapevine collections (FAO WIEWS instcode FRA139 and DEU098, respectively): A) France, “INRA Domaine de Vassal, Marseillan-Plage” (http://www6.montpellier.inra.fr/vassal); and B) Germany, “JKI Geilweilerhof, Siebeldingen” (http://www.deutsche-genbank-reben.julius-kuehn.de/) (2019-04-10).

Vitis Wild PCA

  2. SNP profiling of goat breeds.
    Data source: Colli et al. (2018) https://doi.org/10.1186/s12711-018-0422-x

Multidimensional scaling of the genotypes

Scree plot of all genotypes and multidimensional scaling of a subset of genotypes

Multidimensional scaling (MDS) plot of a population of 4,653 individuals from 169 goat breeds genotyped with 49,953 SNPs.

The MDS plot visualizes genetic relationships among 4,653 individuals from 169 goat breeds. Genetic distances were computed with PLINK to generate the distance matrix, and the MDS analysis was conducted with the R cmdscale function on genotyping data from 49,953 SNPs. Each point represents a goat, and the spatial arrangement reflects genetic dissimilarities. This exploratory analysis offers insights into genetic diversity, population structure, and relatedness.

  3. a. Manhattan plot of a GWAS on a dog population for deafness. Data source: Hayward et al. (2020) https://doi.org/10.1371/journal.pone.0232900

Manhattan plot

Manhattan plot of a single canine breed

Manhattan plots showing the genome-wide association (GWA) between deafness in dogs and their genotype. The plots display the genomic positions of single nucleotide polymorphisms (SNPs) on the x-axis and the corresponding -log10-transformed P-values, which indicate the strength of association with the trait, on the y-axis. The red dashed lines mark the 99.99th percentile of the LOD values.

  3. b. Plot of the top significant SNPs identified in the above GWAS.

    Points are jittered around their respective chromosome.

Top scoring SNPs

and a zoom on chromosome 3 above the 99.99th percentile (LOD score = 4.71).

Top-scoring SNPs of an ABC breed on the 3rd chromosome
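
The percentile threshold drawn on the Manhattan plots above can be sketched in a few lines of shell. This is only an illustration under stated assumptions, not the code used for the figures: it assumes that the "LOD values" are the -log10(P) values, that the association results are in a whitespace-delimited text file with a header row and a P-value column, and that a nearest-rank percentile is acceptable (R's quantile() interpolates and may give a slightly different number). The helper name neglog10_percentile and the toy input are hypothetical.

```shell
#!/bin/sh
# neglog10_percentile is a hypothetical helper, not a PLINK command.
# Usage: neglog10_percentile FILE COLUMN PERCENTILE
neglog10_percentile() {
    awk -v col="$2" '
        # Header row: find the index of the requested column.
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == col) c = i; next }
        # Skip missing or non-positive P-values (e.g. NA), convert the rest.
        c && $c + 0 > 0 { printf "%.6f\n", -log($c) / log(10) }
    ' "$1" | sort -n | awk -v pct="$3" '
        { v[NR] = $1 }
        END {
            if (NR == 0) exit 1
            k = int(pct / 100 * NR)
            if (k < pct / 100 * NR) k++   # round the rank up (nearest rank)
            if (k < 1) k = 1
            printf "%.2f\n", v[k]
        }'
}

# Toy input standing in for a real association results file.
results=$(mktemp)
printf 'SNP P\nrs1 0.1\nrs2 0.01\nrs3 0.001\nrs4 0.0001\nrs5 0.00001\n' > "$results"
neglog10_percentile "$results" P 50
rm -f "$results"
```

On real results the call would use 99.99 as the percentile and the actual name of the P-value column in the file.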

Resources & Data

Setup of the working environment

Install R: The Comprehensive R Archive Network (CRAN)

IDE: VSCode*/RStudio*

Install Python: Miniconda 3*

OS: Linux*/WSL

*Suggested

Get PLINK working in Linux

  1. Download PLINK 1.90 Linux 64-bit

  2. Install PLINK

    cd Downloads/
    sudo unzip plink_linux_x86_64_20200616.zip -d plink_install
    
  3. Copy PLINK to /usr/local/bin

    cd plink_install
    sudo cp plink /usr/local/bin
    sudo chmod 755 /usr/local/bin/plink
    
  4. Add PLINK to PATH

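    With bash (for zsh or another shell, edit the corresponding startup file instead), open the configuration file. No sudo is needed, since the file belongs to your user:

    nano ~/.bashrc
    

    and include the line:

    export PATH=/usr/local/bin:$PATH
    

    Save and exit. Reload the shell configuration and you should be able to call plink from any directory.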

    source ~/.bashrc
    plink --help
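
If plink --help fails, it helps to check whether the shell can actually locate the executable. The sketch below uses only the standard command -v builtin; check_tools is a hypothetical helper name, and the same check can be reused for the other command-line tools in this guide.

```shell
#!/bin/sh
# check_tools is a hypothetical helper, not part of PLINK.
# It reports, for each name given, whether it resolves on the current PATH.
check_tools() {
    missing=0
    for tool in "$@"; do
        if location=$(command -v "$tool" 2>/dev/null); then
            printf 'found    %s -> %s\n' "$tool" "$location"
        else
            printf 'missing  %s\n' "$tool"
            missing=1
        fi
    done
    # Non-zero exit status if at least one tool was not found.
    return "$missing"
}

check_tools plink || echo "plink not found: check that /usr/local/bin is on PATH"
```

The function returns a non-zero status when something is missing, so it can also be used at the top of a script to stop early.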
    

Get plinkr (R)

Run PLINK directly from R.

Refer to the installation guide at https://github.com/AJResearchGroup/plinkr/blob/master/doc/install.md

library(remotes)
install_github("richelbilderbeek/plinkr")
remotes::install_github("chrchang/plink-ng/2.0/pgenlibr")
library(plinkr)
install_plinks()

Get TASSEL (GUI) on Linux

  1. Go to the website https://www.maizegenetics.net/tassel and download the latest Unix version.
  2. Make the downloaded TASSEL_{xxx}_unix.sh installer executable
    chmod +x ~/Downloads/TASSEL_{xxx}_unix.sh
    
  3. Run the TASSEL installer
    ~/Downloads/TASSEL_{xxx}_unix.sh
    

Get rTASSEL (R)

  1. rJava installation

    sudo apt install default-jdk
    sudo R CMD javareconf
    Rscript -e 'install.packages("rJava", repos = "https://cloud.r-project.org")'
    
  2. Installation in R

    if (!require("devtools")) install.packages("devtools")
    devtools::install_github(
     repo = "maize-genetics/rTASSEL",
     ref = "master",
     build_vignettes = TRUE,
     dependencies = TRUE
    )
    
  3. Run rTASSEL

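    • Allocate the job's memory¹ and start the logger (here at the root of the project). Set the options before loading rTASSEL, since they are only read when the JVM starts:

    ¹ In "-Xmx50g" and "-Xms50g", "50g" stands for 50 gigabytes of memory; -Xmx sets the maximum Java heap size and -Xms the initial one.

    !! Choose a value that fits your machine !!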

    options(java.parameters = c("-Xmx50g", "-Xms50g"))
    rTASSEL::startLogger(fullPath = NULL, fileName = NULL)
    
    • Load the package and browse its help:
    library(rTASSEL)
    ??rTASSEL
    

    Useful resources for rTASSEL are the vignettes and tutorials at https://rtassel.maizegenetics.net/index.html
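
For the memory allocation step above, one way to pick a value that fits the machine is to derive it from the installed RAM. The sketch below is an assumption-laden illustration: suggest_heap is a hypothetical helper, and the default of 75% of total memory is an arbitrary choice rather than an rTASSEL recommendation. It reads MemTotal from /proc/meminfo, which exists on Linux and WSL.

```shell
#!/bin/sh
# suggest_heap is a hypothetical helper: given total RAM in kB and an
# optional fraction (assumed default 0.75), print a whole-gigabyte -Xmx flag.
suggest_heap() {
    awk -v kb="$1" -v frac="${2:-0.75}" 'BEGIN {
        gb = int(kb / 1024 / 1024 * frac)
        if (gb < 1) gb = 1          # never suggest less than 1 GB
        printf "-Xmx%dg\n", gb
    }'
}

# On Linux, /proc/meminfo reports MemTotal in kB.
if [ -r /proc/meminfo ]; then
    suggest_heap "$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)"
fi
```

The printed flag can then replace "-Xmx50g" in options(java.parameters = ...); lower the fraction if other memory-hungry programs run at the same time.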

Get GEMMA

GEMMA can be installed from source from its GitHub repo, but it is also available through Bioconda (http://www.ddocent.com/bioconda/). It is suggested to have Miniconda installed and working, and then to add the Bioconda channel; you should already have defaults and conda-forge. Each --add places the channel at the top of the list, so conda-forge is added last to give it the highest priority, as Bioconda recommends.

conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
conda install gemma

Then check that GEMMA runs with

gemma -h

Get GAPIT (R)

GAPIT is an R package; here it is installed from GitHub. For the manual, visit https://zzlab.net/GAPIT/gapit_help_document.pdf

R> install.packages("devtools")
R> devtools::install_github("jiabowang/GAPIT", force=TRUE)
R> library(GAPIT)
