Miscellaneous bash scripts for analysis of sequence data.
annotates taxonomy file using FUNGuild (Nguyen et al. 2016). Input file must include the following semicolon-separated columns: group, kingdom, phylum, class, order, family, genus, species.
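Before running the FUNGuild annotation, it can help to confirm that the input file has the expected columns. The sketch below is not part of the repository: the function name check_funguild_header is hypothetical, and it assumes that the first line of the file is a header naming the columns.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): checks that the header
# line of a semicolon-separated taxonomy file names every required column.
# Assumption: the first line of the file is a header.
check_funguild_header() {
    local file="$1" header col missing=0
    # Read the first line; return 2 if the file is empty or unreadable.
    IFS= read -r header < "$file" || [ -n "$header" ] || return 2
    # Strip a trailing carriage return left by Windows line endings.
    header="${header%$'\r'}"
    for col in group kingdom phylum class order family genus species; do
        # Wrap the header in ';' so that only whole column names match.
        case ";$header;" in
            *";$col;"*) ;;
            *) echo "missing column: $col" >&2; missing=1 ;;
        esac
    done
    return "$missing"
}
```

Calling check_funguild_header taxonomy.csv returns 0 when all eight columns are present and 1 otherwise, listing the missing ones on standard error.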
annotates representative OTU sequences with OTU names.
clusters sequences into OTUs using CD-HIT (Fu et al. 2012) and formats the output files.
classifies fungal ITS sequences against records in the UNITE database of reference ITS sequences (Kõljalg et al. 2005) using the Naïve Bayesian Classifier tool (Wang et al. 2007) implemented in mothur (Schloss et al. 2009). Requires system-wide access to mothur.
extracts OTU representative sequences from OTU list and fasta sequences file.
generates a list of OTUs from BlastClust output.
transforms OTU list to mothur format for running command classify.otu in mothur.
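To illustrate the target format, the sketch below builds a single mothur-style list line: a label, the number of OTUs, then one tab-separated column per OTU with its sequence names joined by commas. It is not the repository's script. The function name clusters_to_mothur_list is hypothetical, and the input is assumed to hold one OTU per line with whitespace-separated sequence names (the layout BlastClust writes). Recent mothur versions may also expect a header line, which this sketch does not write.

```bash
#!/usr/bin/env bash
# Hypothetical converter (not part of this repository).
# Input:  one cluster per line, sequence names separated by whitespace.
# Output: one line "label <TAB> numOtus <TAB> otu1 <TAB> otu2 ...", where each
#         OTU is a comma-separated list of its sequence names.
clusters_to_mothur_list() {
    local label="$1" infile="$2"
    awk -v label="$label" '
        NF > 0 {
            # Join all sequence names of this cluster with commas.
            otu = $1
            for (i = 2; i <= NF; i++) otu = otu "," $i
            otus[++n] = otu
        }
        END {
            printf "%s\t%d", label, n
            for (j = 1; j <= n; j++) printf "\t%s", otus[j]
            printf "\n"
        }
    ' "$infile"
}
```

For example, clusters_to_mothur_list 0.03 clusters.txt prints the list line for a hypothetical file clusters.txt, using 0.03 as the label.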
removes taxon tags from UNITE taxonomy files and modifies the output.
taxonomically annotates rDNA sequences using a Naïve Bayesian Classifier (NBC) as implemented in mothur and a local BLAST search against user-specified databases.
Requires a system-wide installation of mothur (e.g. via sudo apt install mothur) and of ncbi-blast+ (sudo apt install ncbi-blast+).
Specify the reference databases to use in the ### parameters section of the script, including sequence + taxonomy files for the NBC (e.g. available to download for bacteria, fungi and arbuscular mycorrhizal fungi) and a BLAST database (available through NCBI).
In the latter case, it is possible to directly specify an FTP address to a database, and the script will automatically download it. If providing a folder (e.g. $HOME/db/BLAST), ensure there is system-wide access (add the line BLASTDB="$HOME/db/BLAST" to your ~/.profile or ~/.bashrc file) and that it also contains the taxdb database.
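The folder setup described above can be sketched as follows. The path is the example from the text. The function name has_taxdb is hypothetical, and the sketch assumes that taxdb consists of the files taxdb.btd and taxdb.bti, as distributed by NCBI. The variable is exported here so that programs started from the shell, such as the BLAST+ tools, can see it.

```bash
#!/usr/bin/env bash
# Example location of the local BLAST databases (path taken from the text).
# Exporting makes the variable visible to child processes.
export BLASTDB="$HOME/db/BLAST"

# Hypothetical check (not part of this repository): succeeds when the folder
# holds both taxdb files. Assumption: taxdb is made up of taxdb.btd and taxdb.bti.
has_taxdb() {
    local dir="$1"
    [ -f "$dir/taxdb.btd" ] && [ -f "$dir/taxdb.bti" ]
}
```

Running has_taxdb "$BLASTDB" || echo "taxdb not found in $BLASTDB" >&2 before the annotation gives an early warning if the folder is incomplete.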
The script can be run by setting a fasta file as the INPUT_SEQ variable within the file, or by passing the file as a positional argument. See examples with input file example_ITS.fasta:
# run with input file as parameter (modify 'INPUT_SEQ="example_ITS.fasta"' within file)
bash taxAnnotation.sh
# using positional argument
bash taxAnnotation.sh examples/example_ITS.fasta
# specifying FTP address ('BLAST_DB="ftp://ftp.ncbi.nlm.nih.gov/blast/db/ITS_eukaryote_sequences.tar.gz"' within file)
bash taxAnnotation.sh examples/example_ITS.fasta
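The three invocations above differ only in where the input file and the BLAST database come from. A minimal sketch of how such a choice could be made in bash is shown below. It is an illustration, not the code of taxAnnotation.sh, and the function names resolve_input and is_ftp_address are hypothetical.

```bash
#!/usr/bin/env bash
# Illustrative sketch only; not the actual logic of taxAnnotation.sh.
INPUT_SEQ="example_ITS.fasta"   # default, as set within the file

# Hypothetical helper: use the first positional argument when one is given,
# otherwise fall back to the INPUT_SEQ variable.
resolve_input() {
    printf '%s\n' "${1:-$INPUT_SEQ}"
}

# Hypothetical helper: succeeds when a BLAST_DB value looks like an FTP
# address (to be downloaded) rather than a local folder.
is_ftp_address() {
    case "$1" in
        ftp://*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Inside a script, resolve_input "$@" would return the positional argument if one was passed and the in-file default otherwise.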
generates a tree from a taxonomy file using the script provided by Tedersoo et al. (2018). The input file must be separated by tabs.