Brendan Daisley edited this page Mar 11, 2024 · 3 revisions

Description:

This function wraps the isoQC, isoTAX, and isoLIB steps into a single command for convenience. Input can be a single directory or several directories to process at once. If multiple directories are provided, the resulting libraries can be merged sequentially by setting 'merge=TRUE'. All other parameters from the wrapped functions can be passed through this command. The final isoLIB step creates a strain library by grouping closely related strains of interest based on sequence similarity. To add new sequences to an already-established strain library, specify the .CSV file path of the older strain library using the 'old_lib_csv' parameter.

Usage:

isoALL(input = NULL,
       export_html = TRUE,
       export_csv = TRUE,
       export_fasta = TRUE,
       export_fasta_revcomp = FALSE,
       quick_search = FALSE,
       db = "16S",
       iddef = 2,
       phylum_cutoff = 75,
       class_cutoff = 78.5,
       order_cutoff = 82,
       family_cutoff = 86.5,
       genus_cutoff = 96.5,
       species_cutoff = 98.7,
       include_warnings = FALSE,
       strain_group_cutoff = 0.995,
       merge = FALSE)
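For example, a run that processes two folders of .ab1 files against the 16S database and merges the resulting libraries might look like the sketch below. It assumes the package is installed under the name isolateR and exports isoALL; the directory paths are placeholders, and the call is skipped unless both the package and the folders are present.

```r
# Placeholder directory paths; replace with your own folders of .ab1 files
input_dirs <- c("/path/to/directory1", "/path/to/directory2")

# Only run if the (assumed) isolateR package is installed and the folders exist
if (requireNamespace("isolateR", quietly = TRUE) && all(dir.exists(input_dirs))) {
  isolateR::isoALL(input = input_dirs,
                   db = "16S",
                   quick_search = TRUE,  # faster, non-exhaustive database search
                   merge = TRUE)         # merge libraries in the order listed
}
```

Because merge=TRUE combines libraries in the order the directories are listed, the order of the paths in input_dirs matters.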

Arguments:

Parameter Description
input Directory path(s) containing .ab1 files. If more than one, provide them as a character vector (e.g. 'input=c("/path/to/directory1","/path/to/directory2")').
export_html (Default=TRUE) Output the results as an HTML file.
export_csv (Default=TRUE) Output the results as a CSV file.
export_fasta (Default=TRUE) Output the sequences in a FASTA file.
export_fasta_revcomp (Default=FALSE) Output the sequences in reverse complement form in a FASTA file. This is useful when sequencing was done with the reverse primer and the orientation of the input sequences therefore needs reversing.
verbose (Default=FALSE) Output progress while the script is running.
files_manual (Default=NULL) For testing purposes only; use at your own risk. Specify a list of files to run as filenames without extensions, rather than processing the whole directory.
exclude (Default=NULL) For testing purposes only. Excludes the specified files from the input directory.
min_phred_score (Default=20) Do not accept trimmed sequences with a mean Phred score below this cutoff.
min_length (Default=200) Do not accept trimmed sequences with a sequence length below this number.
sliding_window_cutoff (Default=NULL) Quality-trimming parameter (method M2) passed to the SangerRead function of the sangeranalyseR package. If NULL, an automatic Phred score cutoff is used (recommended); otherwise, set a value between 1 and 60.
sliding_window_size (Default=15) Quality-trimming parameter (method M2) passed to the SangerRead function of the sangeranalyseR package. Recommended range is 5-30.
date Date in "YYYY_MM_DD" format. If NULL, attempts to parse the date from the .ab1 file.
quick_search (Default=FALSE) Whether to perform a quick database search instead of a comprehensive one (i.e. optimal global alignment). If TRUE, performs a quick search, equivalent to setting the VSEARCH parameters "--maxaccepts 100 --maxrejects 100". If FALSE, performs a comprehensive search, equivalent to setting the VSEARCH parameters "--maxaccepts 0 --maxrejects 0".
db (Default="16S") Select the database option(s): "16S" (for searching against the NCBI RefSeq targeted loci 16S rRNA database) or "ITS" (for searching against the NCBI RefSeq targeted loci ITS database). For a combined database, in cases where input sequences are derived from both bacteria and fungi, select "16S|ITS".
iddef (Default=2, recommended for highest taxonomic accuracy) Pairwise identity definition, as per VSEARCH definitions: (0) CD-HIT definition: (matching columns) / (shortest sequence length); (1) edit distance: (matching columns) / (alignment length); (2) edit distance excluding terminal gaps (default definition); (3) Marine Biological Lab definition, counting each gap opening (internal or terminal) as a single mismatch, whether or not the gap was extended: 1.0 - ((mismatches + gap openings) / (longest sequence length)); (4) BLAST definition, equivalent to --iddef 1 for global pairwise alignments.
phylum_cutoff (Default=75) Percent cutoff for phylum rank demarcation.
class_cutoff (Default=78.5) Percent cutoff for class rank demarcation.
order_cutoff (Default=82) Percent cutoff for order rank demarcation.
family_cutoff (Default=86.5) Percent cutoff for family rank demarcation.
genus_cutoff (Default=96.5) Percent cutoff for genus rank demarcation.
species_cutoff (Default=98.7) Percent cutoff for species rank demarcation.
include_warnings (Default=FALSE) Whether to keep sequences with poor alignment warnings from the Step 2 'isoTAX' function. Set TRUE to keep warning sequences, and FALSE to remove them.
strain_group_cutoff (Default=0.995) Similarity cutoff (0-1) for delineating between strain groups (e.g. 1 = 100% identical, 0.995 = 0.5% difference, 0.95 = 5.0% difference).
merge (Default=FALSE) If TRUE, merges the isoLIB output files sequentially, in the order the input directories are listed. If FALSE, performs all the steps (isoQC/isoTAX/isoLIB) on each directory separately.
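To make the quick_search mapping concrete, here is a hypothetical helper that assembles a VSEARCH global-search argument vector in R. The helper name, its min_id and out_file arguments, and the choice of --blast6out output are assumptions for illustration only; isolateR's own VSEARCH call may differ. The flags themselves (--usearch_global, --db, --id, --iddef, --maxaccepts, --maxrejects, --blast6out) are standard VSEARCH options.

```r
# Hypothetical helper (not part of isolateR): build VSEARCH search arguments
vsearch_args <- function(query_fasta, db_fasta, out_file,
                         quick_search = FALSE, iddef = 2, min_id = 0.7) {
  # quick_search = TRUE stops after 100 accepts/rejects; "0" removes both limits
  limit <- if (quick_search) "100" else "0"
  c("--usearch_global", query_fasta,
    "--db", db_fasta,
    "--id", as.character(min_id),
    "--iddef", as.character(iddef),
    "--maxaccepts", limit,
    "--maxrejects", limit,
    "--blast6out", out_file)
}

args <- vsearch_args("isolates.fasta", "16S_db.fasta", "hits.tsv", quick_search = TRUE)
# With vsearch on the PATH, this could be run as: system2("vsearch", args)
```

Setting both limits to 0 makes VSEARCH consider every target, which is why the comprehensive search is slower but finds the optimal hit.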
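To see how the iddef choice changes the reported identity, the toy function below applies definitions (0) to (4), as written for the iddef parameter, to two sequences that are already aligned ('-' marks a gap). It is an illustrative re-implementation of those formulas, assuming equal-length aligned strings with no all-gap columns; it is not VSEARCH's code, and pairwise_identity is a hypothetical name.

```r
# Length of the leading run of gap characters in a character vector
leading_gaps <- function(chars) {
  r <- rle(chars == "-")
  if (r$values[1]) r$lengths[1] else 0L
}

# Number of separate gap runs (gap openings) in a logical gap mask
gap_openings <- function(is_gap) sum(rle(is_gap)$values)

# Identity of two pre-aligned sequences under a given iddef definition
pairwise_identity <- function(a, b, iddef = 2) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  stopifnot(length(x) == length(y))
  gap_x <- x == "-"
  gap_y <- y == "-"
  aligned <- !gap_x & !gap_y            # columns with a residue in both
  matches <- sum(aligned & x == y)
  mismatches <- sum(aligned & x != y)
  aln_len <- length(x)
  len_x <- sum(!gap_x)                  # ungapped sequence lengths
  len_y <- sum(!gap_y)

  if (iddef == 0) {
    matches / min(len_x, len_y)         # CD-HIT: shortest sequence
  } else if (iddef %in% c(1, 4)) {
    matches / aln_len                   # edit distance / BLAST (global)
  } else if (iddef == 2) {
    # Drop terminal gap columns at both ends of the alignment
    terminal <- max(leading_gaps(x), leading_gaps(y)) +
                max(leading_gaps(rev(x)), leading_gaps(rev(y)))
    matches / (aln_len - terminal)
  } else if (iddef == 3) {
    # MBL: each gap opening counts as one mismatch
    opens <- gap_openings(gap_x) + gap_openings(gap_y)
    1 - (mismatches + opens) / max(len_x, len_y)
  } else {
    stop("iddef must be 0, 1, 2, 3 or 4")
  }
}

a <- "--ACGTACGT"
b <- "TTACGAACGT"
sapply(0:3, function(d) pairwise_identity(a, b, d))
```

For this pair (7 matches, 1 mismatch, one 2-column terminal gap), definitions 0 and 2 both give 0.875, definition 1 gives 0.7, and definition 3 gives 0.8, which shows how strongly terminal gaps can affect identity under definition 1.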
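The six rank cutoffs can be read as a ladder of percent identity thresholds. A minimal sketch, assuming a query is assigned down to the deepest rank whose cutoff its best-hit percent identity meets or exceeds (deepest_rank is a hypothetical helper, not part of isolateR, and isoTAX's exact rule may differ):

```r
# Default cutoffs from the isoALL usage above, ordered from broadest to finest
rank_cutoffs <- c(phylum = 75, class = 78.5, order = 82,
                  family = 86.5, genus = 96.5, species = 98.7)

# Return the deepest rank whose cutoff is met, or NA if none is met
deepest_rank <- function(pct_identity, cutoffs = rank_cutoffs) {
  met <- names(cutoffs)[pct_identity >= cutoffs]
  if (length(met) == 0) NA_character_ else met[length(met)]
}

sapply(c(99.1, 97.2, 80, 70), deepest_rank)
```

Because the cutoffs increase monotonically, the ranks that pass always form a prefix of the ladder, so the last passing rank is the most specific one supported.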
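One generic way to apply a similarity cutoff such as strain_group_cutoff is single-linkage clustering: two sequences share a group if they are linked by a chain of pairs whose similarity is at or above the cutoff. The sketch below uses base R's hclust and cutree on a made-up similarity matrix; it illustrates that technique under this assumption and is not necessarily how isoLIB forms its strain groups (group_strains is a hypothetical name).

```r
# Made-up pairwise similarities (0-1) between three hypothetical isolates
ids <- c("s1", "s2", "s3")
sim <- matrix(c(1.000, 0.998, 0.970,
                0.998, 1.000, 0.972,
                0.970, 0.972, 1.000),
              nrow = 3, dimnames = list(ids, ids))

group_strains <- function(sim, cutoff = 0.995) {
  # Convert similarity to distance so hclust can use it
  d <- as.dist(1 - sim)
  # Single linkage: join a group if similarity to any member is >= cutoff
  tree <- hclust(d, method = "single")
  # Keep merges at distances up to 1 - cutoff (0.005 for 0.995)
  cutree(tree, h = 1 - cutoff)
}

groups <- group_strains(sim, cutoff = 0.995)
groups
```

Here s1 and s2 (99.8% similar) fall into one group, while s3 (at most 97.2% similar to either) forms its own; lowering the cutoff to 0.95 would place all three together.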