
Site Frequency Spectrum

Samuel Hamann edited this page May 5, 2021 · 4 revisions

This method calculates a site frequency spectrum using ANGSD. Please see ANGSD's tutorial page.

Basic Usage

To run this method, use the following command:

angsd-wrapper SFS Site_Frequency_Spectrum_Config

where Site_Frequency_Spectrum_Config is the full path to the configuration file for the site frequency spectrum.
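Because the command expects a full path, a small guard can catch a relative or unreadable path before ANGSD-wrapper starts. The sketch below is illustrative only: run_sfs is a hypothetical helper, not part of ANGSD-wrapper, and the path in the example call is a placeholder.

```shell
# Hypothetical helper (not part of ANGSD-wrapper): validate the config path,
# then hand it to the SFS method.
run_sfs() {
    local config="$1"
    # The wrapper expects the full path, so reject relative paths early.
    case "$config" in
        /*) ;;
        *) echo "Config path must be absolute: $config" >&2; return 1 ;;
    esac
    # Stop if the file does not exist or cannot be read.
    if [ ! -r "$config" ]; then
        echo "Cannot read config file: $config" >&2
        return 1
    fi
    angsd-wrapper SFS "$config"
}

# Example call (placeholder path):
# run_sfs /home/user/project/Site_Frequency_Spectrum_Config
```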

Input files

All inputs should be specified in Site_Frequency_Spectrum_Config.

Common Variables

This method uses variables from Common_Config. The ones it uses are listed below:

Variable Function
SAMPLE_LIST (GROUP_SAMPLES on dev) A list of samples to be used in calculations
SAMPLE_INBREEDING (GROUP_INBREEDING on dev) A list of inbreeding coefficients, where each line corresponds to a line in SAMPLE_LIST (GROUP_SAMPLES on dev)
ANC_SEQ Path to ancestral sequence
REF_SEQ Path to reference sequence
PROJECT Name given to all outputs in ANGSD-wrapper
SCRATCH Directory in which to store files; the full output path is SCRATCH/PROJECT/SFS
REGIONS Limit the scope of ANGSD-wrapper to certain regions
UNIQUE_ONLY Use uniquely mapped reads only
MIN_BASEQUAL Minimum base quality score
BAQ Adjust Q scores around indels
MIN_IND Minimum number of individuals needed to use this site
GT_LIKELIHOOD Estimates genotype likelihoods
MIN_MAPQ Minimum mapping quality
N_CORES Number of cores to use; do not set this above the limits of your system
DO_MAJORMINOR Estimate major/minor alleles
DO_GENO Perform genotype calling
DO_MAF Calculate per-site frequencies
DO_POST Calculate the posterior probability using per-site frequencies
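As a rough illustration, a few of these variables might be set as follows. This sketch assumes the config files are plain shell variable assignments. Every path and value shown is a placeholder chosen for the example, not a recommended or default setting. Check the template configs shipped with ANGSD-wrapper for the actual format and defaults.

```shell
# Illustrative values only; all paths and numbers below are placeholders.
SAMPLE_LIST=/home/user/project/samples.txt        # list of samples
SAMPLE_INBREEDING=/home/user/project/samples.indF # one coefficient per sample, same order
ANC_SEQ=/home/user/project/ancestral.fa           # ancestral sequence
REF_SEQ=/home/user/project/reference.fa           # reference sequence
PROJECT=demo                                      # name given to all outputs
SCRATCH=/scratch/user                             # outputs go to SCRATCH/PROJECT/SFS
MIN_BASEQUAL=20                                   # minimum base quality score
MIN_MAPQ=30                                       # minimum mapping quality
MIN_IND=1                                         # minimum individuals per site
N_CORES=4                                         # keep within your system's limits
```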

Method-Specific Variables

This method has no method-specific variables.

Method Parameters

The parameters for this method can be adjusted as necessary. The defaults have been chosen to work well in general use:

Parameter Function
DO_SAF Creates a site frequency spectrum
OVERRIDE If true, will recalculate files that already exist

Output files

Naming Scheme Contents
PROJECT_DerivedSFS.graph.me Final site frequency spectrum
PROJECT_SFSOut.arg Details of arguments
PROJECT_SFSOut.geno.gz Genotype calls
PROJECT_SFSOut.mafs.gz Minor allele frequencies
PROJECT_SFSOut.saf.gz Intermediate site frequency spectrum
PROJECT_SFSOut.saf.idx Index of intermediate site frequency spectrum
PROJECT_SFSOut.saf.pos.gz Position data of the saf file
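The naming scheme above can be turned into a quick check that a run produced every expected file. This is a sketch under stated assumptions: expected_sfs_outputs is a hypothetical helper, the files are assumed to sit directly in SCRATCH/PROJECT/SFS, and the index file is assumed to use the same SFSOut capitalization as the other outputs.

```shell
# Hypothetical helper: print the output paths expected for a given
# SCRATCH directory and PROJECT name, following the naming scheme above.
expected_sfs_outputs() {
    local scratch="$1" project="$2"
    # Assumption: outputs are written directly into SCRATCH/PROJECT/SFS.
    local outdir="${scratch}/${project}/SFS"
    local suffix
    for suffix in DerivedSFS.graph.me SFSOut.arg SFSOut.geno.gz SFSOut.mafs.gz \
                  SFSOut.saf.gz SFSOut.saf.idx SFSOut.saf.pos.gz; do
        echo "${outdir}/${project}_${suffix}"
    done
}

# Report which expected files are missing (placeholder arguments):
# expected_sfs_outputs /scratch/user demo | while read -r f; do
#     [ -e "$f" ] || echo "missing: $f"
# done
```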

Visualization

PROJECT_DerivedSFS.graph.me can be visualized with the Shiny graphing interface. A web browser with a graphical user interface is required.

SFS without ancestral state

Newer versions of ANGSD support estimating the SFS for less developed genomes that lack an ancestral sequence, using the reference sequence to approximate the folded SFS, following this methodology. To use this within the wrapper, leave the ANC_SEQ variable blank in the config file and assign the other variables as usual.
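In config terms, this might look like the fragment below. It assumes the config is a set of plain shell variable assignments, and the reference path is a placeholder.

```shell
# No ancestral sequence available: leave ANC_SEQ blank.
ANC_SEQ=
# The reference sequence is still required (placeholder path).
REF_SEQ=/home/user/project/reference.fa
```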