Variant_Analysis

Skylar Wyant edited this page Oct 31, 2017 · 1 revision

Basic Usage

The Variant_Analysis handler uses a variety of dependencies to produce statistics about the input VCF file. Information generated by the handler includes heterozygosity summaries, missingness summaries, a minor allele frequency histogram, the Ts/Tv ratio, and the raw count of SNPs. If the organism is barley, population genetics statistics are also generated for 18 loci to facilitate comparison with statistics from Sanger alignments in Morrell et al. 2006.

To run Variant_Analysis, all common variables and handler-specific variables must be defined within the configuration file. Once the variables have been defined, Variant_Analysis can be submitted to a job scheduler with the following command (assuming that you are in the directory containing sequence_handling):

./sequence_handling Variant_Analysis Config

Where Config is the full file path to the configuration file.
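
Because Config must be a full file path, a small pre-flight check can catch a relative or missing path before the job is submitted. The sketch below is only an illustration and is not part of sequence_handling; the function name check_config and the example path in the usage comment are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: succeeds only if the argument is an absolute path
# to an existing, readable regular file.
check_config() {
    local config="$1"
    case "$config" in
        /*) ;;  # absolute path, keep checking
        *)  echo "Config must be a full file path: $config" >&2
            return 1 ;;
    esac
    if [ ! -f "$config" ] || [ ! -r "$config" ]; then
        echo "Cannot read configuration file: $config" >&2
        return 1
    fi
    return 0
}

# Example usage (placeholder path, replace with your own):
# check_config /path/to/Config && ./sequence_handling Variant_Analysis /path/to/Config
```

Running the check first means a mistyped path fails immediately at the command line rather than after the job has been queued.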

Handler-Specific Variables

The following is a list of variables that need to be defined within Config. In addition to the handler-specific variables, all common variables must be defined.

Variable | Function
--- | ---
VA_QSUB | QSub settings for batch submission. Recommended settings are "mem=22gb,nodes=1:ppn=16,walltime=24:00:00".
VA_VCF | The full file path to the VCF file to be analyzed.
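
As a rough illustration, the two handler-specific variables might be set in Config as shown below. This assumes the configuration file uses shell-style variable assignments. The VA_QSUB value is the recommended setting given above, while the VCF path is a placeholder to replace with your own.

```shell
# QSub settings for batch submission (recommended values from this page)
VA_QSUB="mem=22gb,nodes=1:ppn=16,walltime=24:00:00"

# Full file path to the VCF file to be analyzed (placeholder, replace with your own)
VA_VCF="/path/to/your/variants.vcf"
```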

Output

Variant_Analysis writes text files and PDF files containing the summary statistics to

${OUT_DIR}/Variant_Analysis

Dependencies

Variant_Analysis depends on VCFtools, vcflib, molpopgen, Python3, GNU Parallel, BCFtools, R, TeX Live, and the Enthought Python Distribution. In addition, PBS is required for basic operation. Please check the dependencies page to ensure that you are using the required version of each dependency.