3. Getting Started
It is easy to get started using PIrANHA. If you're in a hurry, see the Quick Guide for the Impatient on the Wiki Home page.
The sections below give detailed background information and starting instructions covering dependencies, installation, usage, functions, and input/output file formats.
Dependencies for each PIrANHA function are listed in that function's help text, accessed with `piranha -f <function> -h`. You can usually get away with installing only the software needed for the analysis or function you are currently running, and so avoid installing dependencies that are unrelated to your current workflow. However, it is recommended that each user (eventually) install all dependencies to take full advantage of PIrANHA's capabilities and be prepared for any analysis!
The main dependencies for PIrANHA are Perl and standard utility software (e.g. `grep` and the stream editor `sed`), which typically come pre-installed on most UNIX/Linux systems. Thus, for many functions, especially DNA sequence alignment conversions and simple operations, the user will not need to install any dependency software at all.
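A quick way to confirm that these core utilities are present is to look them up on the `PATH`. The following is a minimal sketch and not part of PIrANHA: the helper name `check_deps` is hypothetical, and it relies only on the standard `command -v` shell builtin.

```shell
# check_deps: hypothetical helper (not part of PIrANHA).
# Prints one status line per command name and returns non-zero
# if at least one of them cannot be found on the PATH.
check_deps() {
    missing=0
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            printf 'found:   %s\n' "$cmd"
        else
            printf 'MISSING: %s\n' "$cmd"
            missing=1
        fi
    done
    return "$missing"
}

# Core utilities named above; add others as your workflow requires.
check_deps perl grep sed || echo "Install the missing tools before running PIrANHA."
```

The same helper can be given any other executable names you expect a particular function to need, once you have read them from that function's help text.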
However, some PIrANHA functions, and especially the `MAGNET` pipeline (here or here) within PIrANHA, rely on several software dependencies. These dependencies are described in the help texts for the different functions (see above); however, I provide a full list of them below, with an asterisk preceding those already included in the "MAGNET-1.2.0" subdirectory of the current release.
- `PartitionFinder`
- `BEAST` v1.8.3++ and v2.4.2++ (or newer; available here and here)
  - Updated `Java`, appropriate Java virtual machine / jdk required
  - `BEAGLE` in beagle-lib (`libhmsbeagle*` files) required
  - default `BEAST` packages required
  - `SNAPP` package addon required
- `MrBayes` v3.2++ (available here)
- `ExaBayes` (available here)
- `RAxML` (available here)
- `Perl` v5.1+ (available here)
- *Nayoki Takebayashi's file conversion Perl scripts (here; possibly available here; note: some, but not all of these, come packaged within `MAGNET`)
- `Python` v2.7 and/or 3++ (available here)
- `fastSTRUCTURE` v1.0 (available here)
- `∂a∂i` v1.7.0++ (or v1.6.3; available here)
- `R` v3++ (available here)
Users must install all software not included in PIrANHA, and ensure that it is available from the command line on their supercomputer and/or local machine (best practice is simply to install all software in both places). For more details, see the `MAGNET` README.
💻 As its functions are primarily composed of UNIX shell scripts and customized `R` scripts, PIrANHA is well suited for running on a variety of machine types, especially the UNIX/Linux-like systems that are now commonplace in personal computing and dedicated supercomputer cluster facilities. The UNIX shell is common to all Linux systems and macOS and comes preinstalled on them, which makes using PIrANHA straightforward.
Another factor making PIrANHA easy to use is that I recently added a `homebrew` tap for PIrANHA. This allows quick and painless installation or updating from the command line on macOS and Linux, using only a few lines of code:
brew tap justincbagley/homebrew-tap ;
brew update ;
brew install --HEAD piranha ;
piranha -i ;
It is a good idea to run `source ~/.bash_profile` to reload your shell environment after install, then check that PIrANHA is available from the command line by typing `piranha` into Terminal and hitting enter.
The install code will install PIrANHA in your `homebrew` cellar, typically located at `/usr/local/Cellar/`, and `homebrew` will also place the PIrANHA executable, `piranha` (which controls the actual main script, `piranha.sh`), in your path and add file execution permissions to all of its function scripts. The last line of the install code (`piranha -i`) is optional; it is useful in case dynamic tab completion isn't immediately available in your terminal after install.
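The Cellar location differs between platforms (Homebrew on Apple Silicon Macs and on Linux uses a prefix other than `/usr/local`), so it can be easier to ask Homebrew directly. The following is a minimal sketch, assuming the formula is available under the name `piranha` as in the install commands above; the function name `piranha_cellar` is hypothetical.

```shell
# piranha_cellar: hypothetical helper that asks Homebrew where the
# piranha formula lives, instead of assuming /usr/local/Cellar.
piranha_cellar() {
    if command -v brew >/dev/null 2>&1; then
        # `brew --cellar <formula>` prints the directory holding that
        # formula's installed versions.
        brew --cellar piranha
    else
        echo "brew not found on PATH" >&2
        return 1
    fi
}

piranha_cellar || echo "Could not query Homebrew for the piranha Cellar path."
```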
brew upgrade --fetch-HEAD piranha
Use this upgrade code to check for and update to a new version of PIrANHA, including the latest commits, if available. It takes less than a minute to run this upgrade, which will also cause Homebrew itself to be updated. Here is an example from updating to "head" (latest cutting-edge development release) from v0.4a3:
$ brew upgrade --fetch-HEAD piranha
Updating Homebrew...
==> Auto-updated Homebrew!
Updated 2 taps (homebrew/core and justincbagley/tap).
==> Updated Formulae
Updated 17 formulae.
==> Upgrading 1 outdated package:
justincbagley/tap/piranha HEAD-cb5ac8d -> HEAD-ae7acef
==> Upgrading justincbagley/tap/piranha HEAD-cb5ac8d -> HEAD-ae7acef
==> Cloning https://github.com/justincbagley/piranha.git
Updating /Users/justinbagley/Library/Caches/Homebrew/piranha--git
==> Checking out branch master
Already on 'master'
Your branch is up to date with 'origin/master'.
HEAD is now at ae7acef Updating lib/ files
Warning: A newer Command Line Tools release is available.
Update them from Software Update in System Preferences or run:
softwareupdate --all --install --force
If that doesn't show you an update run:
sudo rm -rf /Library/Developer/CommandLineTools
sudo xcode-select --install
Alternatively, manually download them from:
https://developer.apple.com/download/more/.
==> rm /usr/local/etc/local_piranha
==> rm /usr/local/etc/brew_piranha
==> chmod +x /usr/local/Cellar/piranha/HEAD-ae7acef/bin/piranha
==> chmod +x /usr/local/Cellar/piranha/HEAD-ae7acef/bin/source_piranha_compl.sh
==> bash source_piranha_compl.sh
==> Caveats
One line was added to your ~/.bash_profile to make dynamic tab completion of function names
available on the command line while running piranha.
It will still be there after an uninstall, but is adaptive (nothing happens if piranha was uninstalled).
If you're a zsh person, then patches are welcome: https://github.com/justincbagley/piranha/blob/master/completions/source_piranha_compl.sh
==> Summary
🍺 /usr/local/Cellar/piranha/HEAD-ae7acef: 134 files, 4.7MB, built in 2 seconds
Removing: /usr/local/Cellar/piranha/HEAD-cb5ac8d... (134 files, 5.0MB)
In general form, PIrANHA is used by calling the main function `piranha` as follows:
piranha [OPTION]... [FILE]...
where [OPTION] will usually include the mandatory function (`-f` flag) and arguments for that function, which are simply passed after the function call.
Some specific usage examples are:
piranha -h Show piranha help text and exit
piranha -f list Get list of available functions
piranha -f <function> -h Show help text for <function> and exit
piranha -f calcAlignmentPIS -h Show help text for calcAlignmentPIS function and exit
piranha -f calcAlignmentPIS -t 150 Run calcAlignmentPIS with threshold at N=150 alignments
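When scripting several calls, it can help to assemble the command in one place. The sketch below is illustrative only: `run_piranha` and the `DRY_RUN` variable are hypothetical names, and the function name and `-t 150` argument are taken from the examples above. With `DRY_RUN=1` it only prints the command it would execute, so the calling pattern can be checked without PIrANHA installed.

```shell
# run_piranha FUNCTION [ARGS...]: hypothetical wrapper around `piranha -f`.
# The function's arguments are passed straight after the function name.
run_piranha() {
    func=$1
    shift
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Dry run: show the command instead of executing it.
        echo "piranha -f $func $*"
    else
        piranha -f "$func" "$@"
    fi
}

DRY_RUN=1
run_piranha calcAlignmentPIS -h
run_piranha calcAlignmentPIS -t 150
```

Setting `DRY_RUN=0` (or leaving it unset) makes the wrapper call `piranha` itself.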
The full help text for PIrANHA can be obtained using `piranha -h`, and is as follows:
piranha v1.1.8, December 2020 (main script for PIrANHA v0.4a4, update Dec 26 22:53:10 CST 2020)
Copyright (c) 2019-2020 Justin C. Bagley. All rights reserved.
----------------------------------------------------------------------------------------------------------
piranha.sh [OPTION]... [FILE]...
This is the main script for PIrANHA v0.4a4 (update Dec 26 22:53:10 CST 2020).
Options:
-s, --shortlist Short list of available functions
-f, --func Function, <function>
-a, --args Function arguments passed to <function>
-q, --quiet Quiet (no output)
-l, --log Print log to file
-v, --verbose Output more information (items echoed to 'verbose')
-d, --debug Runs script in Bash debug mode (set -x)
-h, --help Display this help and exit
-V, --version Output version information and exit
OVERVIEW
THIS SCRIPT is the 'master' script that runs the PIrANHA software package by specifying
the <function> to be run (-f flag) and passing user-specified arguments to that function.
If no function or arguments are given, then the program prints the help text and exits.
Functions are located in the bin/ folder of the PIrANHA distribution. For detailed
information on the capabilities of PIrANHA, please refer to documentation posted on the
PIrANHA Wiki (https://github.com/justincbagley/piranha/wiki) or the PIrANHA website
(https://justinbagley.org/piranha/). Developers can test prianha and its functions by
activating Bash debug mode (-d, --debug flags).
Usage examples:
piranha -h Show piranha help text and exit
piranha -f <TAB> Get short list of available functions by dynamic completion
piranha -f list Get detailed list of available functions by function
piranha -f <function> -h Show help text for <function> and exit
piranha -f <function> <args> Run <function> script with arguments (e.g. options flags)
piranha -f <function> <args> -d Run <function> script in Bash debug mode
CITATION
Bagley, J.C. 2020. PIrANHA v0.4a4. GitHub repository, Available at:
<https://github.com/justincbagley/piranha>.
Created by Justin Bagley on Fri, Mar 8 12:43:12 CST 2019.
Copyright (c) 2019-2020 Justin C. Bagley. All rights reserved.
PIrANHA currently implements the functions listed in Table 2 below.
Table 2: PIrANHA functions.
FUNCTION | Description |
---|---|
2logeB10.r | Rscript that extracts marginal likelihood estimates and calculates 2loge B10 Bayes factors (2loge(B10)) from BEAST marginal likelihood estimation (ps / ss) runs. |
alignAlleles | Aligns and cleans allele sequences (phased DNA sequences) output by the PIrANHA function phaseAlleles (or in similar format; see phaseAlleles and alignAlleles usage texts for additional details). |
AnouraNEXUSPrepper | In-house function for preparing NEXUS files for Anoura UCE project analyses (Calderon, Bagley, and Muchhala, in prep.). |
batchRunFolders | Automates splitting a set of input files into different batches (to be run in parallel on a remote supercomputing cluster, or a local machine), starting from file type or list of input files. |
BEASTPostProc | Conducts post-processing of gene trees and species trees output by BEAST (e.g. Drummond et al. 2012; Bouckaert et al. 2014; usually on a remote supercomputer). |
BEASTReset | Function that resets the random seeds for n shell queue scripts corresponding to n BEAST runs/subfolders (destined for supercomputer). |
BEASTRunner | Automates running BEAST XML input files on a remote supercomputing cluster. |
BEAST_PSPrepper | Function that automates prepping BEAST XML input files for path sampling (marginal likelihood estimation) analyses using BEAST v2+ PathSampler. |
calcAlignmentPIS | Generates and runs custom Rscript (phyloch wrapper) to calculate number of parsimony-informative sites (pis) for all FASTA files in working dir. |
completeConcatSeqs | Function converting series of PHYLIP (Felsenstein 2002) DNA sequence alignments (with or without varying nos. of taxa) into a single concatenated PHYLIP alignment with complete taxon sampling; also makes character subset/partition files in RAxML, PartitionFinder, and NEXUS formats for the resulting alignment. |
completeSeqs | Function converting series of PHYLIP DNA sequence alignments (with or without varying nos. of taxa) into a single concatenated PHYLIP alignment with complete taxon sampling, starting from a 'taxon names and spaces' file. |
concatenateSeqs | Function that converts series of PHYLIP DNA sequence alignments with equal taxon sampling into a single concatenated PHYLIP alignment. |
concatSeqsPartitions | Function similar to concatenateSeqs, but which, in addition to concatenating the set of PHYLIP alignments, also outputs character subset/partitions files in RAxML, PartitionFinder, and NEXUS formats. This function differs from completeConcatSeqs in only taking alignments with equal taxon sampling and in being slightly faster in this usage case. |
dadiPostProc | Function for post-processing output from one or multiple ∂a∂i runs (ideally run with PIrANHA's dadiRunner function), including collation of best-fit parameter estimates, composite likelihoods, and optimal theta values. |
dadiRunner | Automates running ∂a∂i on a remote supercomputing cluster. See help text (-h) and function (bin/ dir) for details. |
dadiUncertainty | Automates uncertainty analysis in ∂a∂i, including generation of bootstrapped SNP files for parameter std. dev. estimation using the GIM method, as well as std. dev. estimation using the FIM method (orig. data only). |
dropRandomHap | This function randomly drops one phased haplotype (allele) per individual in each of n PHYLIP gene alignments in current working directory, starting from a 'taxon names' file. |
dropTaxa | Shell script automating removal of taxa from sequential, multi-individual FASTA or PHYLIP DNA sequence alignments, starting from a list of taxa to remove. |
ExaBayesPostProc | Function automating reading and conducting post-processing analyses on phylogenetic results output from ExaBayes. |
FASTA2PHYLIP | Function that automates converting one or multiple sequential FASTA DNA sequence alignment files (with sequences either unwrapped or hard-wrapped across multiple lines) to PHYLIP format. |
FASTA2VCF | Shell script function automating conversion of single multiple sequence FASTA alignment to variant call format (VCF) v4.1, with or without subsampling SNPs per partition/locus. |
fastSTRUCTURE | Interactive function that automates running fastSTRUCTURE (Raj et al. 2014) on biallelic SNP datasets. |
geneCounter | Shell script function that counts and summarizes the number of gene copies per tip species in a set of gene trees in Newick format (concatenated into a single trees file), given a taxon-species assignment file. |
getBootTrees | Function that automates organizing bootstrap trees output by RAxML runs conducted in current working directory using the MAGNET program within PIrANHA. |
getDropTaxa | Function to create drop taxon list given lists of a) all taxa and b) a subset of taxa to keep. |
getTaxonNames | Utility function that extracts tip taxon names from sequences present in one or multiple PHYLIP DNA sequence alignments in current directory, using information on maximum taxon sampling level from user. |
iqtreePostProc | Function that automates post-processing of gene tree files and log files output during phylogenetic analyses in IQ-TREE v1 or v2 (Nguyen et al. 2015; Minh et al. 2020). |
indexBAM | [In prep.] |
list | Function that prints a tabulated list of PIrANHA functions and their descriptions. |
MAGNET | Shell pipeline for automating estimation of a maximum-likelihood (ML) gene tree in RAxML for each of many loci in a RAD-seq, UCE, or other multilocus dataset. Also contains other tools. |
makePartitions | Function using PHYLIP DNA sequence alignments in current directory to make partitions/charsets files in RAxML, PartitionFinder, and NEXUS formats, which are output to separate files. |
Mega2PHYLIP | Automates converting one or more multiple sequence alignment files in Mega format (Mega v7+ or X; Kumar et al. 2016, 2018) to PHYLIP format (Felsenstein 2002), while saving (-k 1) or writing over (-k 0) the original Mega files. |
mergeBAM | [In prep.] |
MLEResultsProc | Automates post-processing of marginal likelihood estimation (MLE) results from running path sampling (ps) or stepping-stone (ss) sampling analyses on different models in BEAST. |
MrBayesPostProc | Simple script for post-processing results of a MrBayes v3.2+ (Ronquist et al. 2012) run, whose output files are assumed to be in the current working directory. |
NEXUS2MultiPHYLIP | Function that splits a sequential NEXUS alignment with charset information into multiple PHYLIP-formatted alignments, one per gene/charset, and removes individuals with all missing data. |
NEXUS2PHYLIP | Function that reads in a single NEXUS datafile and converts it to PHYLIP ('.phy') format (Felsenstein 2002). |
nQuireRunner | Function that automates running nQuire software (Weiß et al. 2018) to determine sample ploidy level from next-generation sequencing (NGS) reads for one or multiple samples, starting from BAM file(s) for the sample(s). |
PFSubsetSum | Calculates summary statistics for DNA subsets within the optimum partitioning scheme identified for the data by PartitionFinder v1 or v2 (Lanfear et al. 2012, 2014). |
phaseAlleles | Automates phasing alleles of HTS data from targeted sequence capture experiments (or similar), including optionally transferring indel gaps from reference to the final phased FASTAs of consensus sequences, by masking |
PHYLIP2FASTA | Automates converting each of one or multiple PHYLIP DNA sequence alignments into FASTA format. |
phylip2fasta.pl | Nayoki Takebayashi utility Perl script for converting from PHYLIP to FASTA format. |
PHYLIP2Mega | Utility script for converting one or multiple PHYLIP DNA sequence alignments into Mega format. |
PHYLIP2NEXUS | Converts one or multiple PHYLIP-formatted multiple sequence alignments into NEXUS format, with or without pasting in a user-specified set of partitions (various formats). |
PHYLIP2PFSubsets | Automates construction of Y multiple sequence alignments corresponding to PartitionFinder-inferred subsets, starting from n PHYLIP, per-locus sequence alignments and a PartitionFinder results file (usually 'best_scheme.txt'). |
PHYLIPcleaner | Function that cleans one or more PHYLIP alignments in current dir by removing individuals with all (or mostly) undetermined sites. |
PHYLIPsubsampler | Automates subsampling each of one to multiple PHYLIP DNA sequence alignment files down to one (random) sequence per species, e.g. for species tree analyses. |
PHYLIPsummary | Summarizes characteristics (numbers of characters and tip taxa) in one or multiple PHYLIP DNA sequence alignment files in current working directory, and saves to file. |
PhyloMapperNullProc | Script for post-processing results of a PhyloMapper null model randomization analysis. |
phyNcharSumm | Utility function that summarizes the number of characters in each PHYLIP DNA sequence alignment in current working directory. |
pyRAD2PartitionFinder | Automates running PartitionFinder (Lanfear et al. 2012, 2014) 'out-of-the-box' starting from the PHYLIP DNA sequence alignment file ('.phy') and partitions ('.partitions') file output by pyRAD (Eaton 2014) or ipyrad (Eaton and Overcast 2016). |
pyRADLocusVarSites | Automates summarizing the numbers of variable sites and parsimony-informative sites (PIS) within RAD/GBS loci output by the programs pyRAD or ipyrad (Eaton 2014; Eaton and Overcast 2016). |
RAxMLRunChecker | Utility function that counts number of loci/partitions with completed RAxML runs, during or after a run of the MAGNET pipeline within PIrANHA, and summarizes run information. |
RAxMLRunner | Script that automates moving and running RAxML input files on a remote supercomputing cluster (with passwordless ssh access; code for extraction of results coming in 2019??...). |
renameForStarBeast2 | Function that renames tip taxa (i.e. sequence names) in all PHYLIP or FASTA DNA sequence alignments in the current working directory, so that the taxon names are suitable for assigning species in BEAUti before running *BEAST or StarBEAST2 in BEAST. |
renameTaxa | Automates renaming all tip taxa (samples) in genetic data files of type FASTA, PHYLIP, NEXUS, or VCF (variant call format) in current working directory. |
RogueNaRokRunner | Function that automates reading in a Newick-formatted tree file (-i flag) and analyzing it in RogueNaRok (Aberer et al. 2013). |
RYcoder | New (June 2019) function that converts a PHYLIP or NEXUS DNA sequence alignment into 'RY' coding, a binary format with purines (A, G) coded as 0's and pyrimidines (C, T) recoded as 1's. |
SNAPPRunner | Function that automates running SNAPP (Bryant et al. 2012) on a remote supercomputing cluster (with passwordless ssh access set up by user). |
SpeciesIdentifier | Runs the Taxon DNA software program SpeciesIdentifier, which implements methods in the well-known Meier et al. (2006) DNA barcoding paper. |
splitFASTA | Automates splitting a multi-individual FASTA DNA sequence alignment into one FASTA file per sequence (tip taxon). Works with sequential FASTAs with no text wrapping across lines. |
splitFile | Function that splits an input file into n parts (horizontally, by row) and optionally allows the user to specify the output basename for the resulting split files. |
splitPHYLIP | Splits a sequential PHYLIP DNA sequence alignment into separate PHYLIP sequence alignments, one per partition (read from a user-specified partition file). |
taxonCompFilter | Function that loops through the multiple sequence alignments and keeps only those alignments meeting the user-specified taxonomic completeness threshold `<taxCompThresh>`; alignments that pass this filter are saved to an output subfolder of the current directory. |
treeThinner | Function that conducts downsampling ('thinning') of trees in MrBayes .t files so that they contain every nth tree. |
trimSeqs | Function that automates trimming one or multiple PHYLIP DNA sequence alignments using the program trimAl (Capella-Gutiérrez et al. 2009), with custom trimming options, and output to FASTA, PHYLIP, or NEXUS formats. |
vcfSubsampler | Utility function that uses a list file to subsample a variant call format (VCF) file so that it only contains SNPs included in the list. |
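
To make one of the simpler table entries concrete, the RY coding described for `RYcoder` maps purines (A, G) to 0 and pyrimidines (C, T) to 1. The sketch below illustrates only that mapping on a bare sequence string using `tr`. It is not RYcoder's implementation, the function name `ry_code` is hypothetical, and leaving gaps and ambiguity codes unchanged is an assumption made here for simplicity.

```shell
# ry_code SEQUENCE: hypothetical illustration of RY coding.
# Purines (A, G) become 0 and pyrimidines (C, T) become 1;
# any other character (gaps, N, etc.) is passed through unchanged.
ry_code() {
    printf '%s\n' "$1" | tr 'AGCTagct' '00110011'
}

ry_code "ACGTTGCA-N"   # prints 01011010-N
```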
Repeatedly referring back to Table 2 would be inconvenient, so please note that this release of PIrANHA includes a `list` function that prints a tabulated summary of PIrANHA functions. Obtain the function list from `piranha` by issuing the following command from the command line:
piranha -f list
which prints:
FUNCTION DESCRIPTION
----------------------------------------------------------------------------------------------------------------
2logeB10.r Rscript extracting marginal likelihood estimates and calculate 2loge B10 Bayes factors
(2loge(B10)) from BEAST marginal likelihood estimation (ps / ss) runs.
alignAlleles Aligns and cleans allele sequences (phased DNA sequences) output by the PIrANHA function
phaseAlleles (or in similar format; see phaseAlleles and alignAlleles usage texts for
additional details)
AnouraNEXUSPrepper In-house function for preparing NEXUS files for Anoura UCE project analyses (Calderon,
Bagley, and Muchhala, in prep.).
batchRunFolders Automates splitting a set of input files into different batches (to be run in parallel on
a remote supercomputing cluster, or a local machine), starting from file type or list of
input files.
BEAST_logThinner Function that conducts downsampling ('thinning') of BEAST2 .log files to every nth line.
BEAST_PSPrepper Function that automates prepping BEAST XML input files for path sampling (marginal
likelihood estimation) analyses using BEAST v2+ PathSampler.
BEASTPostProc Conducts post-processing of gene trees and species trees output by BEAST (e.g. Drummond
et al. 2012; Bouckaert et al. 2014; usually on a remote supercomputer).
BEASTReset Function that resets the random seeds for n shell queue scripts corresponding to n BEAST
runs/subfolders (destined for supercomputer).
BEASTRunner Automates running BEAST XML input files on a remote supercomputing cluster.
calcAlignmentPIS Generates and runs custom Rscript (phyloch wrapper) to calculate number of parsimony-
informative sites (pis) for all FASTA files in working dir.
completeConcatSeqs Function converting series of PHYLIP (Felsenstein 2002) DNA sequence alignments (with or
without varying nos. of taxa) into a single concatenated PHYLIP alignment with complete
taxon sampling; also makes character subset/partition files in RAxML, PartitionFinder,
and NEXUS formats for the resulting alignment.
completeSeqs Function converting series of PHYLIP DNA sequence alignments (with or without varying nos.
of taxa) into a single concatenated PHYLIP alignment with complete taxon sampling, starting
from a 'taxon names and spaces' file.
concatenateSeqs Function that converts series of PHYLIP DNA sequence alignments with equal taxon sampling
into a single concatenated PHYLIP alignment.
concatSeqsPartitions Function similar to concatenateSeqs, but which, in addition to concatenating the set of
PHYLIP alignments, also outputs character subset/partitions files in RAxML, PartitionFinder,
and NEXUS formats. This function differs from completeConcatSeqs in only taking alignments
with equal taxon sampling and in being slightly faster in this usage case.
dadiPostProc Function for post-processing output from one or multiple ∂a∂i runs (ideally run with
PIrANHA's dadiRunner function), including collation of best-fit parameter estimates,
composite likelihoods, and optimal theta values.
dadiRunner Automates running ∂a∂i on a remote supercomputing cluster. See help text (-h) and function
(bin/ dir) for details.
dadiUncertainty Automates uncertainty analysis in ∂a∂i, including generation of bootstrapped SNP files for
parameter std. dev. estimation using the GIM method, as well as std. dev. estimation using
the FIM method (orig. data only).
dropRandomHap This function randomly drops one phased haplotype (allele) per individual in each of n
PHYLIP gene alignments in current working directory, starting from a 'taxon names' file.
dropTaxa Shell script automating removal of taxa from sequential, multi-individual FASTA or PHYLIP
DNA sequence alignments, starting from a list of taxa to remove.
ExaBayesPostProc Function automating reading and conducting post-processing analyses on phylogenetic results
output from ExaBayes.
FASTA2PHYLIP Function that automates converting one or multiple sequential FASTA DNA sequence alignment
files (with sequences either unwrapped or hard-wrapped across multiple lines) to PHYLIP
format (Felsenstein 2002).
FASTA2VCF Shell script function automating conversion of single multiple sequence FASTA alignment to
variant call format (VCF) v4.1, with or without subsampling SNPs per partition/locus.
FASTAsummary Summarizes characteristics (numbers of characters and tip taxa) in one or multiple FASTA
DNA sequence alignment files in current working directory, and saves to file.
fastSTRUCTURE Interactive function that automates running fastSTRUCTURE (Raj et al. 2014) on biallelic
SNP datasets.
geneCounter Shell script function that counts and summarizes the number of gene copies per tip species
in a set of gene trees in Newick format (concatenated into a single trees file), given a
taxon-species assignment file.
getBootTrees Function that automates organizing bootstrap trees output by RAxML runs conducted in
current working directory using the MAGNET program within PIrANHA.
getDropTaxa Function to create drop taxon list given lists of a) all taxa and b) a subset of taxa to
keep.
getTaxonNames Utility function that extracts tip taxon names from sequences present in one or multiple
PHYLIP DNA sequence alignments in current directory, using information on maximum taxon
sampling level from user.
indexBAM [In prep.]
iqtreePostProc Function that automates post-processing of gene tree files and log files output during
phylogenetic analyses in IQ-TREE v1 or v2 (Nguyen et al. 2015; Minh et al. 2020).
list Function that prints a tabulated list of PIrANHA functions and their descriptions.
MAGNET Shell pipeline for automating estimation of a maximum-likelihood (ML) gene tree in RAxML
for each of many loci in a RAD-seq, UCE, or other multilocus dataset. Also contains other
tools.
makePartitions Function using PHYLIP DNA sequence alignments in current directory to make partitions/
charsets files in RAxML, PartitionFinder, and NEXUS formats, which are output to separate
files.
Mega2PHYLIP Automates converting one or more multiple sequence alignment files in Mega format (Mega v7+
or X; Kumar et al. 2016, 2018) to PHYLIP format (Felsenstein 2002), while saving (-k 1) or
writing over (-k 0) the original Mega files.
mergeBAM [In prep.]
MLEResultsProc Automates post-processing of marginal likelihood estimation (MLE) results from running path
sampling (ps) or stepping-stone (ss) sampling analyses on different models in BEAST.
MrBayesPostProc Simple script for post-processing results of a MrBayes v3.2+ (Ronquist et al. 2012) run,
whose output files are assumed to be in the current working directory.
NEXUS2MultiPHYLIP Function that splits a sequential NEXUS alignment with charaset information into multiple
PHYLIP-formatted alignments, one per gene/charset, and removes individuals with all missing
data.
NEXUS2PHYLIP Function that reads in a single NEXUS datafile and converts it to PHYLIP ('.phy') format
(Felsenstein 2002).
nQuireRunner Function that automates running nQuire software (Weiß et al. 2018) to determine sample
ploidy level from next-generation sequencing (NGS) reads for one or multiple samples,
starting from BAM file(s) for the sample(s)
PFSubsetSum Calculates summary statistics for DNA subsets within the optimum partitioning scheme
identified for the data by PartitionFinder v1 or v2 (Lanfear et al. 2012, 2014).
phaseAlleles Automates phasing alleles of HTS data from targeted sequence capture experiments (or similar),
including optionally transferring indel gaps from reference to the final phased FASTAs of
consensus sequences, by masking
PHYLIP2FASTA Automates converting each of one or multiple PHYLIP DNA sequence alignments into FASTA
format.
phylip2fasta.pl Nayoki Takebayashi utility Perl script for converting from PHYLIP to FASTA format.
PHYLIP2Mega Utility script for converting one or multiple PHYLIP DNA sequence alignments into Mega
format.
PHYLIP2NEXUS Converts one or multiple PHYLIP-formatted multiple sequence alignments into NEXUS format,
with or without pasting in a user-specified set of partitions (various formats).
PHYLIP2PFSubsets Automates construction of Y multiple sequence alignments corresponding to PartitionFinder-
inferred subsets, starting from n PHYLIP, per-locus sequence alignments and a PartitionFinder
results file (usually 'best_scheme.txt').
PHYLIPcleaner Function that cleans one or more PHYLIP alignments in current dir by removing individuals
with all (or mostly) undetermined sites.
PHYLIPsubsampler Automates subsampling each of one to multiple PHYLIP DNA sequence alignment files down to
one (random) sequence per species, e.g. for species tree analyses.
PHYLIPsummary Summarizes characteristics (numbers of characters and tip taxa) in one or multiple PHYLIP
DNA sequence alignment files in current working directory, and saves to file.
PhyloMapperNullProc Script for post-processing results of a PhyloMapper null model randomization analysis.
phyNcharSumm Utility function that summarizes the number of characters in each PHYLIP DNA sequence
alignment in current working directory.
pyRAD2PartitionFinder Automates running PartitionFinder (Lanfear et al. 2012, 2014) 'out-of-the-box' starting
from the PHYLIP DNA sequence alignment file ('.phy') and partitions ('.partitions') file
output by pyRAD (Eaton 2014) or ipyrad (Eaton and Overcast 2016).
pyRADLocusVarSites Automates summarizing the numbers of variable sites and parsimony-informative sites (PIS)
within RAD/GBS loci output by the programs pyRAD or ipyrad (Eaton 2014; Eaton and Overcast
2016).
RAxMLRunChecker Utility function that counts number of loci/partitions with completed RAxML runs, during
or after a run of the MAGNET pipeline within PIrANHA, and summarizes run information.
RAxMLRunner Script that automates moving and running RAxML input files on a remote supercomputing
cluster (with passwordless ssh access; code for extraction of results coming in 2019??...).
renameForStarBeast2 Function that renames tip taxa (i.e. sequence names) in all PHYLIP or FASTA DNA sequence
alignments in the current working directory, so that the taxon names are suitable for
assigning species in BEAUti before running *BEAST or StarBEAST2 in BEAST.
renameTaxa Automates renaming all tip taxa (samples) in genetic data files of type FASTA, PHYLIP,
NEXUS, or VCF (variant call format) in current working directory.
RogueNaRokRunner Function that automates reading in a Newick-formatted tree file (-i flag) and analyzing it
in RogueNaRok (Aberer et al. 2013).
RYcoder New (June 2019) function that converts a PHYLIP or NEXUS DNA sequence alignment into 'RY'
coding, a binary format with purines (A, G) coded as 0's and pyrimidines (C, T) recoded
as 1's.
SNAPPRunner Function that automates running SNAPP (Bryant et al. 2012) on a remote supercomputing
cluster (with passwordless ssh access set up by user).
SpeciesIdentifier Runs the Taxon DNA software program SpeciesIdentifier, which implements methods in the well-
known Meier et al. (2006) DNA barcoding paper.
splitFASTA Automates splitting a multi-individual FASTA DNA sequence alignment into one FASTA file per
sequence (tip taxon). Works with sequential FASTAs with no text wrapping across lines.
splitFile Function that splits an input file into n parts (horizontally, by row) and optionally allows
the user to specify the output basename for the resulting split files.
splitPHYLIP Splits a sequential PHYLIP DNA sequence alignment into separate PHYLIP sequence alignments,
one per partition (read from a user-specified partition file).
taxonCompFilter Function that loops through the multiple sequence alignments and keeps only those alignments
meeting the user-specified taxonomic completeness threshold <taxCompThresh>; alignments that
pass this filter are saved to an output subfolder of the current directory.
treeThinner Function that conducts downsampling ('thinning') of trees in MrBayes .t files so that they
contain every nth tree.
trimSeqs Function that automates trimming one or multiple PHYLIP DNA sequence alignments using the
program trimAl (Capella-Gutiérrez et al. 2009), with custom trimming options, and output to
FASTA, PHYLIP, or NEXUS formats.
vcfSubsampler Utility function that uses a list file to subsample a variant call format (VCF) file so that
it only contains SNPs included in the list.
REFERENCES
Aberer, A., Krompass, D., Stamatakis, A. 2013. Pruning rogue taxa improves phylogenetic
accuracy: an efficient algorithm and webservice. Systematic Biology 62(1), 162–166.
Bouckaert, R., Heled, J., Kühnert, D., Vaughan, T.G., Wu, C.H., Xie, D., Suchard, M.A.,
Rambaut, A., Drummond, A.J. 2014. BEAST2: a software platform for Bayesian evolutionary
analysis. PLoS Computational Biology 10, e1003537.
Bryant, D., Bouckaert, R., Felsenstein, J., Rosenberg, N.A., RoyChoudhury, A. 2012. Inferring
species trees directly from biallelic genetic markers: bypassing gene trees in a full
coalescent analysis. Molecular Biology and Evolution 29, 1917–1932.
Capella-Gutiérrez, S., Silla-Martínez, J.M., Gabaldón, T. 2009. trimAl: a tool for automated
alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25(15), 1972–1973.
Drummond, A.J., Suchard, M.A., Xie, D., Rambaut, A. 2012. Bayesian phylogenetics with BEAUti
and the BEAST 1.7. Molecular Biology and Evolution 29, 1969–1973.
Eaton, D.A.R. 2014. PyRAD: assembly of de novo RADseq loci for phylogenetic analyses.
Bioinformatics 30, 1844–1849.
Eaton, D.A.R., Overcast, I. 2016. ipyrad: interactive assembly and analysis of RADseq data sets.
Available at: <http://ipyrad.readthedocs.io/>.
Felsenstein, J. 2002. PHYLIP (Phylogeny Inference Package) Version 3.6 a3.
Available at: <http://evolution.genetics.washington.edu/phylip.html>.
Lanfear, R., Calcott, B., Ho, S.Y.W., Guindon, S. 2012. PartitionFinder: combined selection of
partitioning schemes and substitution models for phylogenetic analyses. Molecular Biology
and Evolution 29, 1695–1701.
Lanfear, R., Calcott, B., Kainer, D., Mayer, C., Stamatakis, A. 2014. Selecting optimal
partitioning schemes for phylogenomic datasets. BMC Evolutionary Biology 14, 82.
Meier, R., Shiyang, K., Vaidya, G., Ng, P.K. 2006. DNA barcoding and taxonomy in Diptera:
a tale of high intraspecific variability and low identification success. Systematic
Biology 55(5), 715–728.
Minh, B.Q., Schmidt, H.A., Chernomor, O., Schrempf, D., Woodhams, M.D., Von Haeseler, A.,
Lanfear, R. 2020. IQ-TREE 2: New models and efficient methods for phylogenetic inference
in the genomic era. Molecular Biology and Evolution 37(5), 1530–1534.
Nguyen, L.T., Schmidt, H.A., Von Haeseler, A., Minh, B.Q. 2015. IQ-TREE: a fast and effective
stochastic algorithm for estimating maximum-likelihood phylogenies. Molecular Biology and
Evolution 32(1), 268–274.
Weiß, C.L., Pais, M., Cano, L.M., Kamoun, S., Burbano, H.A. 2018. nQuire: a statistical
framework for ploidy estimation using next generation sequencing. BMC Bioinformatics
19(1), 122.
----------------------------------------------------------------------------------------------------------------
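To make the 'RY' coding described for `RYcoder` in the function list above concrete, here is a minimal shell sketch of the recoding rule (purines A and G become 0, pyrimidines C and T become 1). It is not RYcoder's actual implementation. It assumes a sequential PHYLIP file in which each line after the header holds a taxon name and an unbroken sequence separated by whitespace; the function name `ry_code_phylip` and the file name `example.phy` are hypothetical.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; this is NOT the code used by PIrANHA's RYcoder.
# Assumes sequential PHYLIP: a header line, then "<name> <sequence>" per line.

ry_code_phylip() {
  awk '
    NR == 1 { print; next }          # keep the "ntax nchar" header unchanged
    NF >= 2 {
      seq = $2
      gsub(/[AGag]/, "0", seq)       # purines -> 0
      gsub(/[CTct]/, "1", seq)       # pyrimidines -> 1
      print $1, seq                  # other symbols (N, -, ?) pass through
    }
  ' "$1"
}

# Usage (hypothetical file name):
#   ry_code_phylip example.phy > example_RY.phy
```

Only the sequence column is recoded, so taxon names that happen to contain the letters A, C, G, or T are left intact.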
Part of what PIrANHA does focuses on allowing users with access to a remote supercomputing cluster to take advantage of that resource in an automated fashion. Thus, a handful of PIrANHA functions implicitly assume that the user has set up passwordless ssh access to a supercomputer account.
✋ If you have not done this, or are unsure about it, then you should set up passwordless access by creating and organizing appropriate and secure public and private ssh keys on your machine and the remote supercomputer prior to using PIrANHA. By "secure," I mean that, during this process, you should have closed write privileges to the authorized keys file by typing `chmod u-w authorized_keys` after setting things up using `ssh-keygen`.
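The setup described above can be sketched as a few small shell functions. This is a hedged outline using standard OpenSSH tools, not a procedure prescribed by PIrANHA: the function names, the `ed25519` key type, the default key path, and the `user@cluster.example.edu` address are all assumptions for illustration, and `ssh-copy-id` is only one common way to install the public key.

```shell
#!/usr/bin/env bash
# Sketch only: all function names and the remote address are hypothetical.

# Step 1 (local machine): create a key pair if one does not exist yet.
make_key() {
  local key_file="${1:-$HOME/.ssh/id_ed25519}"
  [ -f "$key_file" ] || ssh-keygen -t ed25519 -f "$key_file"
}

# Step 2 (local machine): append the public key to the remote authorized_keys.
install_key() {
  local remote="$1" key_file="${2:-$HOME/.ssh/id_ed25519}"
  ssh-copy-id -i "${key_file}.pub" "$remote"
}

# Step 3 (run on the remote machine): restrict the ssh directory and close
# write privileges on authorized_keys, as described in the text.
lock_authorized_keys() {
  local ssh_dir="${1:-$HOME/.ssh}"
  chmod 700 "$ssh_dir"
  chmod u-w "$ssh_dir/authorized_keys"
}

# Step 4 (local machine): succeeds only if login works without a password.
check_passwordless() {
  ssh -o BatchMode=yes -o ConnectTimeout=10 "$1" true
}

# Usage (hypothetical account):
#   make_key
#   install_key user@cluster.example.edu
#   check_passwordless user@cluster.example.edu && echo "passwordless ssh OK"
```

`BatchMode=yes` makes `ssh` fail instead of prompting, so the last check is a quick way to confirm the setup before launching a pipeline that depends on it.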
❗ Setting up passwordless SSH access is VERY IMPORTANT for running the `BEASTRunner`, `RAxMLRunner`, and `SNAPPRunner` functions of PIrANHA, which run pipelines that will not work if passwordless ssh is not set up. The following links provide a list of useful tutorials/discussions that can help users set up passwordless SSH access:
📄 PIrANHA functions accept a number of different input file types, which are listed in Table 1 below. These files can be generated by hand or are output by specific upstream software programs. On the output side, PIrANHA produces text, PDF, and other graphical output files from the software programs that are linked through PIrANHA pipelines.
Input file types | Software (from) |
---|---|
.partitions | pyRAD / ipyrad |
.phy | pyRAD / ipyrad / by hand |
.str | pyRAD / ipyrad |
.gphocs | pyRAD / ipyrad / MAGNET (`NEXUS2gphocs.sh`) |
.loci | pyRAD / ipyrad |
.nex | pyRAD / ipyrad / by hand |
.trees | BEAST |
.species.trees | BEAST |
.log | BEAST |
.mle.log | BEAST |
.xml | BEAUti |
.sfs | easySFS |
Exabayes_topologies.* | ExaBayes |
Exabayes_parameters.* | ExaBayes |
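
Before running a function, it can help to see which of the input file types in the table are actually present in the working directory. The following is a small sketch for doing that; it is not part of PIrANHA, and the function name `list_piranha_inputs` is hypothetical. The file name patterns are taken directly from the table.

```shell
#!/usr/bin/env bash
# Sketch only: tallies files matching the input types listed in the table.
# Note that '*.trees' also counts '.species.trees' files, and '*.log' also
# counts '.mle.log' files.

list_piranha_inputs() {
  local dir="${1:-.}" pat n
  for pat in '*.partitions' '*.phy' '*.str' '*.gphocs' '*.loci' '*.nex' \
             '*.trees' '*.log' '*.xml' '*.sfs' \
             'Exabayes_topologies.*' 'Exabayes_parameters.*'; do
    # Count regular files in this directory only (no recursion).
    n=$(find "$dir" -maxdepth 1 -type f -name "$pat" | wc -l | tr -d '[:space:]')
    if [ "$n" -gt 0 ]; then
      printf '%s\t%s\n' "$pat" "$n"
    fi
  done
}

# Usage: list_piranha_inputs            (current working directory)
#        list_piranha_inputs /some/dir  (hypothetical path)
```

Each output line is a pattern and a file count separated by a tab; patterns with no matching files are omitted.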
December 26, 2020 - Justin C. Bagley, Jacksonville, AL, USA