VADR is a suite of tools for classifying and analyzing sequences homologous to a set of reference models of viral genomes or gene families. It has been mainly tested for analysis of Norovirus, Dengue, and SARS-CoV-2 virus sequences in preparation for submission to the GenBank database.
The VADR v-annotate.pl script classifies a sequence by determining which of a set of reference models it is most similar to, and then annotates that sequence based on that most similar model. Example usage of v-annotate.pl can be found here. Another VADR script, v-build.pl, is used to create the models from NCBI RefSeq sequences or from input multiple sequence alignments, potentially with secondary structure annotation. v-build.pl stores the RefSeq feature annotation in the model, and v-annotate.pl maps that annotation (e.g. CDS coordinates) onto the sequences it annotates.
VADR includes 205 prebuilt models of Flaviviridae and Caliciviridae viral RefSeq genomes, created with a process similar to the one described here. Example usage of v-build.pl can be found here. An advanced tutorial on building VADR models using RSV as an example can be found here. To use v-annotate.pl with viruses other than the default set of 205, see 'Available VADR models'. For instructions on using VADR for SARS-CoV-2 annotation see this page.
v-annotate.pl identifies unexpected or divergent attributes of the sequences it annotates (e.g. invalid or early stop codons in CDS features) and reports them to the user in the form of alerts. A subset of alerts is fatal, and any fatal alert causes a sequence to fail. A sequence passes if zero fatal alerts are reported for it. VADR is used by GenBank staff to evaluate incoming sequence submissions of some viruses (currently Norovirus, Dengue virus, and SARS-CoV-2). Submitted Norovirus, Dengue virus and SARS-CoV-2 sequences that pass v-annotate.pl are accepted into GenBank.
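The pass/fail rule described above can be illustrated with a minimal Python sketch. This is not VADR code and does not read VADR output files: the `Alert` record, the alert codes, and the `classify` helper are hypothetical names invented for illustration. The only thing taken from the text is the rule itself, that a sequence fails if at least one fatal alert is reported for it and passes otherwise.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    """Hypothetical alert record; not a VADR data structure."""
    seq_name: str
    code: str    # placeholder alert identifier, invented for this example
    fatal: bool  # True if this alert type is treated as fatal


def classify(seq_names, alerts):
    """Map each sequence name to 'PASS' or 'FAIL'.

    A sequence fails if one or more fatal alerts were reported for it;
    non-fatal alerts alone do not cause a failure.
    """
    failed = {a.seq_name for a in alerts if a.fatal}
    return {name: ("FAIL" if name in failed else "PASS") for name in seq_names}


alerts = [
    Alert("seq1", "example_nonfatal", fatal=False),
    Alert("seq2", "example_fatal", fatal=True),
]
result = classify(["seq1", "seq2", "seq3"], alerts)
print(result)
```

Here `seq1` has only a non-fatal alert and `seq3` has no alerts, so both pass, while `seq2` fails because of its single fatal alert.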
The homology search and alignment components of VADR scripts, which are the most computationally expensive steps, are performed by the Infernal, HMMER, FASTA, Minimap2 and BLAST software packages. These packages are downloaded and installed as part of VADR installation.
The v-annotate.pl script includes some special options specifically developed for SARS-CoV-2 annotation that increase speed (-s and --glsearch options) and provide better annotation for sequences with stretches of Ns (-r option). See this page for more information on using VADR to annotate SARS-CoV-2 sequences.
VADR installation includes a default set of Caliciviridae models, including Norovirus. The installation also includes a set of Flaviviridae models, including Dengue virus. You can download additional pre-built models to validate and annotate other viruses and genes, including SARS-CoV-2, RSV, or cox1 genes. Importantly, to use a set of models other than the default Caliciviridae set, you will need to use either the --mdir and --mkey options, or the -m, -i, -x and possibly -n options as described here.
See this page for a list of all available models and additional information.
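As a rough illustration of how the --mdir and --mkey options fit into a v-annotate.pl invocation, the Python sketch below assembles a command line without running it. Several things here are assumptions rather than statements from this README: that the input FASTA file and the output directory are given as the last two positional arguments, that --mdir takes a model directory path, and that --mkey takes a model set name. The `build_annotate_command` helper and all file, directory, and key values are placeholders. Check the v-annotate.pl documentation for the authoritative usage.

```python
import shlex


def build_annotate_command(fasta, outdir, mdir=None, mkey=None, extra_options=()):
    """Assemble (but do not run) a v-annotate.pl argument list.

    Assumption: options come first, followed by the FASTA file and the
    output directory as positional arguments.
    """
    cmd = ["v-annotate.pl"]
    if mdir is not None:
        cmd += ["--mdir", mdir]   # assumed: directory holding the model files
    if mkey is not None:
        cmd += ["--mkey", mkey]   # assumed: name identifying the model set
    cmd += list(extra_options)    # e.g. the -s, --glsearch and -r options
    cmd += [fasta, outdir]
    return cmd


# Placeholder values only; substitute real paths and a real model key.
cmd = build_annotate_command(
    "my-seqs.fa",
    "my-output",
    mdir="/path/to/downloaded-models",
    mkey="MODEL_KEY",
    extra_options=["-s", "--glsearch", "-r"],
)
print(shlex.join(cmd))
```

Building the command as a list keeps each option and its value as separate arguments, so it can be passed to `subprocess.run` unchanged once the placeholders are replaced.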
- VADR installation instructions
- v-build.pl example usage and command-line options
- v-annotate.pl example usage, command-line options and alert information
- Advanced tutorial: building an RSV model library
- Explanations and examples of v-annotate.pl detailed alert and error messages
  - Output fields with detailed alert and error messages
  - Explanation of sequence and model coordinate fields in .alt files
  - toy50 toy model used in examples of alert messages
  - Examples of different alert types and corresponding .alt output
  - Posterior probability annotation in VADR output Stockholm alignments
- VADR output file formats
- Available VADR model files (github wiki)
- SARS-CoV-2 annotation (github wiki)
- Rfam-based structural annotation of a viral genome sequence for use with VADR (github wiki)
- Development notes and instructions (github wiki)
- VADR includes contributions and input from current and former colleagues at NCBI, including:
Rodney Brister
Vince Calhoun
Sergiy Gotvyanskyy
Eneida Hatcher
Sophia Hu
Ilene Karsch-Mizrachi
Rich McVeigh
Susan Schafer
Alejandro Schäffer
Lara Shonkwiler
Beverly Underwood
Yuri Wolf
Linda Yankie
- The recommended citation for using VADR for SARS-CoV-2 analysis is: Eric P Nawrocki; Faster SARS-CoV-2 sequence validation and annotation for GenBank using VADR. NAR Genom Bioinform. 2023 Jan 20;5(1):lqad002. https://doi.org/10.1093/nargab/lqad002
- The recommended citation for non-SARS-CoV-2 use of VADR is: Alejandro A Schäffer, Eneida L Hatcher, Linda Yankie, Lara Shonkwiler, J Rodney Brister, Ilene Karsch-Mizrachi, Eric P Nawrocki; VADR: validation and annotation of virus sequence submissions to GenBank. BMC Bioinformatics 21, 211 (2020). https://doi.org/10.1186/s12859-020-3537-3