As of October 2020, ANGSD-wrapper is under active development by Samuel Hamann. Planned improvements include:
- Updating underlying dependencies to work with the newest stable ANGSD version (currently 0.933, according to the GitHub repository)
- Environment and dependency management via bioconda
- Improving performance via parallelization
ANGSD-wrapper is a utility developed to aid in the analysis of next generation sequencing data. Users can do the following with this suite:
- Calculate a site frequency spectrum
- Calculate a 2D site frequency spectrum with corresponding FST estimations
- Perform ABBA/BABA tests
- Extract a FASTA sequence from BAM files
- Calculate genotype likelihoods
- Estimate Thetas and various neutrality statistics
- Calculate per-individual inbreeding coefficient
- Find admixture proportions
ANGSD uses likelihood-based approaches to calculate summary statistics from next generation sequencing data. The wrapper scripts and documentation are designed to make ANGSD user-friendly.
To install ANGSD-wrapper, download it from GitHub:
git clone https://github.com/ANGSD-wrapper/angsd-wrapper.git
Go into the ANGSD-wrapper directory:
cd angsd-wrapper/
Run the setup command:
./angsd-wrapper setup dependencies
Download the example dataset (optional):
./angsd-wrapper setup data
Finish the installation:
source ~/.bash_profile
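After sourcing the profile, it can be useful to confirm that the shell can find the command. The following is a minimal sketch, not part of ANGSD-wrapper: the function name check_on_path is hypothetical, and it assumes the setup step makes angsd-wrapper available as a command in your shell.

```shell
# Hypothetical helper, not part of ANGSD-wrapper.
# Reports whether the named command can be found by the current shell.
check_on_path() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at $(command -v "$1")"
    else
        echo "$1 not found on PATH" >&2
        return 1
    fi
}
```

Calling check_on_path angsd-wrapper after the installation should print the location of the command; a "not found" message suggests the profile was not sourced in the current shell.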
ANGSD requires BAM files as input, and ANGSD-wrapper passes a list of BAM files to ANGSD. These BAM files have a few requirements:
- The BAM files must have an '@HD' header line
- The BAM files must be indexed (.bai)
To see whether or not the BAM files have an '@HD' header line, run the following on your list of samples:
while IFS= read -r sample
do
    echo "$sample"
    samtools view -H "$sample" | head -n 1
done < ~/path/to/sample_list.txt
If the header of any sample starts with '@SQ' instead of '@HD', ANGSD and ANGSD-wrapper will fail. This Gist will add '@HD' header lines to your BAM files.
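For long sample lists, it may be easier to print only the files that fail the header check. The following is a minimal sketch, not part of ANGSD-wrapper: the function names has_hd_header and report_missing_hd are hypothetical, and it assumes samtools is on your PATH and that the sample list holds one BAM path per line.

```shell
# Hypothetical helpers, not part of ANGSD-wrapper.

# Succeeds if the first line of the header text on stdin starts with '@HD'.
has_hd_header() {
    first_line=""
    IFS= read -r first_line || true
    case "$first_line" in
        @HD*) return 0 ;;
        *)    return 1 ;;
    esac
}

# Prints every BAM in the sample list whose header does not start with '@HD'.
# Assumes samtools is installed and the list has one path per line.
report_missing_hd() {
    while IFS= read -r sample; do
        [ -n "$sample" ] || continue   # skip blank lines
        if ! samtools view -H "$sample" | has_hd_header; then
            echo "missing @HD: $sample"
        fi
    done < "$1"
}
```

Running report_missing_hd ~/path/to/sample_list.txt prints nothing when every file passes the check.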
The index files must be generated after the BAM files. To index the BAM files using SAMtools, run the following on your sample list:
while IFS= read -r sample
do
    samtools index "$sample"
done < ~/path/to/sample_list.txt
If you have GNU Parallel installed on your system, this process can be sped up:
cat ~/path/to/sample_list.txt | parallel samtools index {}
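Because the index must be generated after the BAM file, comparing modification times is one way to spot indexes that need to be rebuilt. The following is a minimal sketch, not part of ANGSD-wrapper: the function names index_is_current and report_stale_indexes are hypothetical, and it assumes the index sits next to the BAM file and is named either file.bam.bai or file.bai.

```shell
# Hypothetical helpers, not part of ANGSD-wrapper.

# Succeeds only if an index exists for the BAM file and is not older than it.
# Assumes the index is named <file>.bam.bai or <file>.bai.
index_is_current() {
    bam="$1"
    for bai in "${bam}.bai" "${bam%.bam}.bai"; do
        if [ -f "$bai" ] && [ ! "$bai" -ot "$bam" ]; then
            return 0
        fi
    done
    return 1
}

# Prints every BAM in the sample list that lacks an up-to-date index.
report_stale_indexes() {
    while IFS= read -r sample; do
        [ -n "$sample" ] || continue   # skip blank lines
        index_is_current "$sample" || echo "reindex needed: $sample"
    done < "$1"
}
```

Any file reported by report_stale_indexes ~/path/to/sample_list.txt can then be re-indexed with samtools index.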
To run ANGSD-wrapper, run
angsd-wrapper <wrapper> <config>
Here, <wrapper> is one of the methods that ANGSD-wrapper can run and <config> is the relative path to the corresponding configuration file.
To see a list of available wrappers, run
angsd-wrapper
There is a configuration (config) file for each method available through angsd-wrapper.
The configuration files hold the variables used by the wrappers. Before running angsd-wrapper with a given method, edit and save these variables (e.g., the file paths of indexed BAM/CRAM files, FASTA files, and sample lists) to suit your samples.
The default config files can be found in the Configuration_Files directory. You will need to modify them to suit your samples. Please refer to the config files or the wiki to see what each variable is used for and how they should be specified. If you run angsd-wrapper without any arguments, it will return a usage message.
Example config files can be found in Example_Data/Configuration_Files upon running angsd-wrapper setup data.
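Since the config files point to sample lists and other input files, a quick check that every listed path is readable can catch typos before a run. The following is a minimal sketch, not part of ANGSD-wrapper: the function name check_sample_list is hypothetical, and it assumes the sample list holds one file path per line.

```shell
# Hypothetical helper, not part of ANGSD-wrapper.
# Reports every path in the sample list that cannot be read and
# fails if at least one such path is found.
check_sample_list() {
    missing=0
    while IFS= read -r sample; do
        [ -n "$sample" ] || continue   # skip blank lines
        if [ ! -r "$sample" ]; then
            echo "cannot read: $sample" >&2
            missing=$((missing + 1))
        fi
    done < "$1"
    [ "$missing" -eq 0 ]
}
```

For example, check_sample_list ~/path/to/sample_list.txt returns a non-zero status and lists the unreadable paths on standard error if any entry is wrong.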
For more information about ANGSD-wrapper, the methods available through ANGSD-wrapper, and a comprehensive tutorial, please see the wiki.
This package requires the following dependencies:
These are downloaded and installed automatically when angsd-wrapper is installed
There are a few other dependencies that are not automatically downloaded during the installation:
- Site frequency spectrum (SFS)
- Thetas estimations
- 2D SFS and FST
- ABBA/BABA
- Ancestral sequence extractions
- Genotype likelihood estimations
- Inbreeding coefficients calculations
- Principal component analysis
- Admixture analysis
ANGSD-wrapper was published in Molecular Ecology Resources; if you use it in your work, please cite the paper. For BibTeX users, the citation is as follows:
@article {MEN:MEN12578,
author = {Durvasula, Arun and Hoffman, Paul J. and Kent, Tyler V. and Liu, Chaochih and Kono, Thomas J. Y. and Morrell, Peter L. and Ross-Ibarra, Jeffrey},
title = {angsd-wrapper: utilities for analysing next-generation sequencing data},
journal = {Molecular Ecology Resources},
issn = {1755-0998},
url = {http://dx.doi.org/10.1111/1755-0998.12578},
doi = {10.1111/1755-0998.12578},
pages = {n/a--n/a},
keywords = {domestication, population genetics, software, visualization, Zea},
year = {2016},
}