License: GPL v3

RecView
- view and locate recombination positions using pedigree data -

This R package distributes the RecView ShinyApp, which provides a user-friendly GUI for viewing and locating recombination positions on chromosomes using pedigree data.



Installation

devtools::install_github("HKyleZhang/RecView")


List of functions

| Function | Description |
| --- | --- |
| make_012gt() | Formats the genotype file for RecView. |
| make_012gt_from_vcf() | Formats the genotype file from VCF file for RecView. |
| run_RecView_App() | Invokes RecView. RecView provides options to save the result figures and tables to your current working directory. |

More details about the RecView ShinyApp

Required input files

| File | Description |
| --- | --- |
| Genotype file | This file can be generated with make_012gt() (or make_012gt_from_vcf()). |
| Scaffold file | A .csv file giving the order and orientation of the reference genome scaffolds. It must have the following columns (names are case sensitive): scaffold, size, CHR, order, orientation. Note: with a chromosome-level assembly, scaffold and CHR can be made identical, but they must still be kept as separate columns. |
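
As an illustration of the scaffold file layout described above, the following sketch writes a small .csv with the five required columns and checks the header of an existing file. All values are placeholders, the `+`/`-` orientation coding is an assumption (check which coding RecView expects), and `missing_scaffold_columns()` is a hypothetical helper, not part of the package.

```r
# Required column names, as listed above (case sensitive).
required_cols <- c("scaffold", "size", "CHR", "order", "orientation")

# Placeholder scaffolds; the "+"/"-" orientation coding is an assumption.
scaffolds <- data.frame(
  scaffold    = c("scaffold_12", "scaffold_3", "scaffold_7"),
  size        = c(1500000, 820000, 2300000),
  CHR         = c("chr1", "chr1", "chr2"),
  order       = c(1, 2, 1),
  orientation = c("+", "-", "+"),
  stringsAsFactors = FALSE
)

# Written to a temporary file here; in practice, save it in the working directory.
scaffold_path <- tempfile(fileext = ".csv")
write.csv(scaffolds, scaffold_path, row.names = FALSE)

# Hypothetical helper: returns the required columns that are absent from a file.
missing_scaffold_columns <- function(path) {
  header <- names(read.csv(path, nrows = 1, check.names = FALSE))
  setdiff(required_cols, header)
}

missing_scaffold_columns(scaffold_path)
```

An empty result from `missing_scaffold_columns()` means every required column name is present with the exact spelling and case.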

Additional settings

  • Choose offspring(s): select the offspring to analyse. Multiple selection is supported.

  • Choose chromosome(s): select the chromosome(s) to analyse. Multiple selection is supported.

  • Locate recombination positions? Check 'Yes' to locate recombination positions with either of two algorithms (see below).

  • Algorithms (optional):

    • PD: the Proportional Difference algorithm takes a window size (the number of informative SNPs in each flanking window), a step value (k) giving the number of SNPs between calculated positions, and a threshold that triggers denser calculations (at every SNP) to detect local maxima.
    • CCS: the Cumulative Continuity Score algorithm (i) calculates a CCS for each position along the chromosome, and (ii) finds putative recombination positions by locating regions where a long, continuously increasing slope of CCSs from one grandparent-of-origin is replaced by a long, continuously increasing slope of CCSs from the other grandparent.
  • Radius value (PD optional): the number of informative SNPs around the examined position used to calculate the proportion of informative SNPs from a specific grandparent.

  • Step value (PD optional): the step size used to move along the chromosome. Larger values reduce the number of positions examined and so speed up the analysis.

  • Finer step value (PD optional): the step size used to move along the chromosome once the absolute difference in the proportion of grandparent-of-origin exceeds the threshold. Larger values reduce the number of positions examined and so speed up the analysis.

  • Threshold (PD optional): the value that triggers the finer step and is later used to filter the local maxima for likely true recombinations.

  • Threshold (CCS optional): the minimum CCS for a putative recombination to be considered true. Larger values are more stringent and capture crossovers, while smaller values capture both crossovers and non-crossovers. However, small values can also capture artefactual recombinations caused by wrongly called genotypes.

  • Saving options (optional):

    • GoO Inference: this option will save inferences of grandparent-of-origin for the selected offspring(s) as csv-file(s) separately for each selected chromosome in the current working directory.
    • Plots: this option will save the result figures for the selected offspring(s) as pdf-file(s) separately for each selected chromosome in the current working directory.
    • Locations: when "Locate recombination positions?" is set to 'Yes', this option will save the table of putative recombination locations in the selected offspring(s) as csv-file(s) separately for each selected chromosome in the current working directory.
  • Run analysis button: start the analysis!
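
To make the PD settings above concrete, here is a minimal sketch of a proportional-difference scan. It is not the package's implementation: the 0/1 coding of grandparent-of-origin calls, the function names `pd_at()` and `locate_pd()`, the default values, and the way nearby candidates are grouped into one peak are all assumptions made for illustration.

```r
# goo: grandparent-of-origin calls at informative SNPs, coded 0/1 (assumed coding).
# PD at SNP index i: share of "1" calls in the `radius` SNPs before i
# versus the share in the `radius` SNPs from i onwards.
pd_at <- function(goo, i, radius) {
  left  <- goo[(i - radius):(i - 1)]
  right <- goo[i:(i + radius - 1)]
  abs(mean(left) - mean(right))
}

locate_pd <- function(goo, radius = 10, step = 2, finer_step = 1, threshold = 0.8) {
  n <- length(goo)
  if (n < 2 * radius) return(integer(0))
  valid <- seq(radius + 1, n - radius + 1)  # indices with two full flanking windows

  # Coarse pass: evaluate PD every `step` SNPs.
  coarse <- valid[seq(1, length(valid), by = step)]
  coarse_pd <- vapply(coarse, function(i) pd_at(goo, i, radius), numeric(1))
  hits <- coarse[coarse_pd >= threshold]
  if (length(hits) == 0) return(integer(0))

  # Finer pass: re-evaluate around each coarse hit using `finer_step`.
  fine <- sort(unique(unlist(lapply(hits, function(h) {
    seq(max(valid[1], h - step), min(valid[length(valid)], h + step), by = finer_step)
  }))))
  fine_pd <- vapply(fine, function(i) pd_at(goo, i, radius), numeric(1))

  # Keep positions at or above the threshold and report one peak per cluster.
  keep <- fine_pd >= threshold
  if (!any(keep)) return(integer(0))
  pos <- fine[keep]
  pd <- fine_pd[keep]
  cluster <- cumsum(c(TRUE, diff(pos) > radius))
  peaks <- vapply(split(seq_along(pos), cluster),
                  function(idx) as.numeric(pos[idx][which.max(pd[idx])]),
                  numeric(1))
  as.integer(unname(peaks))
}

# Toy example: origin switches between the 30th and 31st informative SNP.
goo <- c(rep(0, 30), rep(1, 30))
locate_pd(goo)
```

A returned index i places the putative recombination between informative SNPs i - 1 and i. In this sketch a larger step makes the coarse pass cheaper but can miss a peak when the threshold is high, which is the trade-off the step and threshold settings control.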


Example workflow

For large VCF files, Workflow A is recommended.

Workflow A

  1. Use --extract-FORMAT-info GT option in VCFtools to extract genotypes into a single file.
  2. Use make_012gt() to format the genotype file.
  3. Prepare scaffold file.
  4. In RStudio, navigate to the working directory where the genotype file and scaffold file are stored.
  5. In RStudio, start the RecView ShinyApp with run_RecView_App(), then choose the settings and run the analysis.
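
Step 1 of Workflow A can also be driven from R. The sketch below only assembles the VCFtools arguments and runs the command if `vcftools` is on the PATH and the input exists. The file name `family.vcf.gz`, the output prefix `family`, and the helper `vcftools_gt_args()` are hypothetical.

```r
# Hypothetical helper: build the VCFtools arguments for genotype extraction.
vcftools_gt_args <- function(vcf, out_prefix) {
  # VCFtools uses --gzvcf for gzip-compressed input and --vcf otherwise.
  input_flag <- if (grepl("\\.gz$", vcf)) "--gzvcf" else "--vcf"
  c(input_flag, vcf, "--extract-FORMAT-info", "GT", "--out", out_prefix)
}

vcf_file <- "family.vcf.gz"  # hypothetical input file
args <- vcftools_gt_args(vcf_file, "family")

# Run only when VCFtools is installed and the input file is present.
if (nzchar(Sys.which("vcftools")) && file.exists(vcf_file)) {
  system2("vcftools", args)
}
```

With the prefix `family`, VCFtools writes the genotypes to `family.GT.FORMAT`, which is the single file to pass on to step 2.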

Workflow B

  1. Use make_012gt_from_vcf() to format the genotype file directly from VCF file.
  2. Prepare scaffold file.
  3. In RStudio, navigate to the working directory where the genotype file and scaffold file are stored.
  4. In RStudio, start the RecView ShinyApp with run_RecView_App(), then choose the settings and run the analysis.
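
The last two steps of either workflow might look like the sketch below: confirm that both input files are in the working directory, then start the app. The file names and the helper `missing_inputs()` are hypothetical; the only package call used is run_RecView_App(), as listed above.

```r
# Hypothetical file names; replace with your own genotype and scaffold files.
input_files <- c("genotypes_012.csv", "scaffolds.csv")

# Hypothetical helper: returns the files that are not found in `dir`.
missing_inputs <- function(dir, files) {
  files[!file.exists(file.path(dir, files))]
}

work_dir <- getwd()  # or the folder that holds the two input files
absent <- missing_inputs(work_dir, input_files)

if (length(absent) > 0) {
  message("Missing input file(s): ", paste(absent, collapse = ", "))
} else if (interactive() && requireNamespace("RecView", quietly = TRUE)) {
  setwd(work_dir)             # saved figures and tables end up here
  RecView::run_RecView_App()  # opens the ShinyApp
}
```

Because RecView saves its outputs to the current working directory, setting that directory before launching keeps inputs and results together.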

Please cite

Zhang, H., Hansson, B. RecView: an interactive R application for locating recombination positions using pedigree data. BMC Genomics 24, 712 (2023). https://doi.org/10.1186/s12864-023-09807-2.


Changelog

  • Enable inferring grandparent-of-origin when genotypes of some individuals are missing at all or some sites.
  • Enable preview when multiple offspring and chromosomes are selected for analysis.
  • Show number of informative sites in GoO figure.
  • Reduce RAM usage by changing the way of loading input files.
  • Reduce running time for the PD algorithm.

Installing previous version

  • version 1.0.0 devtools::install_github("HKyleZhang/RecView@v1.0.0")
