3b. Running DRAM-v

Rory M Flynn edited this page Nov 2, 2022 · 8 revisions

DRAM-v annotate

Annotating viral contigs with DRAM-v takes a few extra steps. First, run your contigs through VirSorter to identify the viral contigs. Next, concatenate the predicted viral sequences you want to annotate into a single fasta file. Finally, run DRAM-v with this concatenated fasta file and the VIRSorter_affi-contigs.tab file. The following command generates the full viral annotation:

DRAM-v.py annotate -i my_viral_contigs.fa -v VIRSorter_affi-contigs.tab -o annotation
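
The concatenation step that produces my_viral_contigs.fa can be scripted. The following is a minimal Python sketch, not part of DRAM-v: the function name concatenate_fastas and the folder path in the usage comment are assumptions for illustration, so adjust them to your own VirSorter run.

```python
from pathlib import Path


def concatenate_fastas(fasta_paths, output_path):
    """Concatenate several fasta files into one.

    Returns the number of sequence records written.
    """
    n_records = 0
    with open(output_path, "w") as out:
        for path in fasta_paths:
            last_line = ""
            with open(path) as handle:
                for line in handle:
                    if line.startswith(">"):
                        n_records += 1
                    out.write(line)
                    last_line = line
            # Guard against a file without a trailing newline, which would
            # otherwise glue its last sequence to the next file's header.
            if last_line and not last_line.endswith("\n"):
                out.write("\n")
    return n_records


# Example usage (hypothetical folder name, adjust to your run):
# fastas = sorted(Path("virsorter_out/Predicted_viral_sequences").glob("*.fasta"))
# total = concatenate_fastas(fastas, "my_viral_contigs.fa")
# print(f"Wrote {total} sequences to my_viral_contigs.fa")
```

Only pass the fasta files for the VirSorter categories you actually want annotated.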

The output annotation folder contains the same collection of files that DRAM.py generates. The only difference is that the annotations.tsv output file gains columns for VirSorter gene category, auxiliary score, and metabolic flags.
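
To get a quick overview of the extra DRAM-v columns, you can tally the genes by one of them. This is a small sketch using only the Python standard library. The column name auxiliary_score is an assumption about the annotations.tsv header, so check the header of your own file and pass the right name.

```python
import csv
from collections import Counter


def count_by_column(annotations_path, column="auxiliary_score"):
    """Count genes per value of one annotations.tsv column.

    The default column name is an assumption; verify it against your file.
    Values are counted as the strings found in the file, and empty cells
    are reported as "NA".
    """
    counts = Counter()
    with open(annotations_path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            value = row.get(column) or "NA"
            counts[value] += 1
    return counts


# Example usage:
# for score, n in sorted(count_by_column("annotation/annotations.tsv").items()):
#     print(score, n)
```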

DRAM-v annotate key parameters

  • Input files
    • Input fasta: fasta file of viral contigs from VirSorter; this must be the concatenation of the predicted viral sequences you want to annotate
    • VirSorter affi contigs: VirSorter output file containing VirSorter categories assigned to all genes
  • Parameters
    • Minimum contig size: 2500
    • Minimum bit score for MMSeqs2 searches: 60
    • Minimum bit score for reverse best hit MMSeqs2 searches: 350
    • Number of threads to use: 10
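
Before annotating, it can help to see how many contigs fall below the minimum contig size. The sketch below is an illustration in Python, not DRAM-v's own filtering code. The function names are hypothetical, and treating the 2500 cutoff as inclusive (length >= 2500 is kept) is an assumption.

```python
def contig_lengths(fasta_path):
    """Return a dict mapping each contig ID to its sequence length."""
    lengths = {}
    current = None
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # The contig ID is the header text up to the first whitespace.
                parts = line[1:].split()
                current = parts[0] if parts else ""
                lengths[current] = 0
            elif current is not None:
                lengths[current] += len(line)
    return lengths


def split_by_min_size(lengths, min_contig_size=2500):
    """Partition contig IDs into (kept, dropped) by a minimum length.

    Assumption: the cutoff is inclusive, so a contig of exactly
    min_contig_size bases is kept.
    """
    kept = [name for name, n in lengths.items() if n >= min_contig_size]
    dropped = [name for name, n in lengths.items() if n < min_contig_size]
    return kept, dropped


# Example usage:
# kept, dropped = split_by_min_size(contig_lengths("my_viral_contigs.fa"))
# print(f"{len(kept)} contigs pass, {len(dropped)} are below the cutoff")
```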

DRAM-v annotate outputs

  • Tab separated file (.tsv) with all the annotations from Pfam, KEGG, UniProt, dbCAN, MEROPS, VOGDB, and a manually curated AMG database for all genes in all the input viral contigs
  • Single GenBank file of annotations across viral contigs
  • Single gene-finding format (.gff) file of all annotations across viral contigs
  • Single fasta format file (.faa) of each translated open reading frame amino acid sequence and best ranked annotation
  • Single fasta format file (.fna) of each open reading frame nucleotide sequence and best ranked annotation

DRAM-v annotate additional parameters

DRAM-v distill

Once annotation is finished, you can summarize the annotations with the following command:

DRAM-v.py distill -i annotation/annotations.tsv -o annotation/distilled

This command generates three files. The first, amg_summary.tsv, summarizes the potential AMGs detected by DRAM-v and gives information about every AMG, including the modules in which the gene is present. The second, viral_genome_summary.tsv, contains all measures required by the MIUViG standards for each input viral contig. The third, liquor.html, is an interactive heatmap that summarizes the AMGs and the metabolic pathways to which they may contribute, similar to the DRAM output for bacterial and archaeal genomes (Example here).

DRAM-v distill parameters

  • Input files
    • Annotations: annotations.tsv file generated during the annotate step
    • Output directory: Directory to create and write outputs
  • Default Parameters
    • Grouping: scaffold column from annotations.tsv file
    • Max auxiliary score: 3
    • Remove transposons: False
    • Remove genes near scaffold end (F flag): False
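
To make the default parameters above concrete, here is a simplified Python sketch of how such thresholds could be applied to a single gene. It is an illustration only and not DRAM-v's implementation, which may apply further flag-based criteria. The function name is hypothetical, and using 'T' as the transposon flag is an assumption.

```python
def passes_distill_filters(aux_score, amg_flags, max_aux=3,
                           remove_transposons=False, remove_fs=False):
    """Return True if a gene survives the thresholds listed above.

    Illustrative simplification, not DRAM-v's own code.
    """
    # Genes without a score, or scoring above the maximum, are dropped.
    if aux_score is None or aux_score > max_aux:
        return False
    flags = amg_flags or ""
    # Assumption: 'T' marks genes associated with a transposon.
    if remove_transposons and "T" in flags:
        return False
    # 'F' marks genes near the end of a scaffold.
    if remove_fs and "F" in flags:
        return False
    return True


# Example usage with the defaults (max score 3, nothing removed):
# passes_distill_filters(2, "MF")                  # kept
# passes_distill_filters(2, "MF", remove_fs=True)  # dropped
# passes_distill_filters(4, "M")                   # dropped
```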

DRAM-v distill output

  • Distillate
    • Tab separated file (.tsv) with viral contig statistics for all input viral contigs, including all statistics required by the recently defined MIUViG standards (Roux et al.)
    • Tab separated file (.tsv) summarizing auxiliary metabolic genes (AMGs) from all input viral contigs; it lists putative AMGs with their annotations, auxiliary scores, and metabolic flags (outlined in main text Fig. 6)
  • Liquor
    • HTML (.html) file containing an interactive heatmap showing all viruses with a putative AMG and the AMG metabolism category, with number of AMGs on each contig noted
    • Tab separated file (.tsv) with corresponding genes for the html heatmap.
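
The numbers shown in the heatmap can be approximated from the AMG summary table. The Python sketch below counts putative AMGs per viral contig and metabolism category. The column names scaffold and category are assumptions about the amg_summary.tsv header, and the function name is hypothetical, so check both against your own output.

```python
import csv
from collections import defaultdict


def amg_counts_per_scaffold(amg_summary_path,
                            scaffold_col="scaffold",
                            category_col="category"):
    """Count putative AMGs per scaffold and category.

    Returns a nested dict: {scaffold: {category: count}}.
    Column names are assumptions; verify them against your file.
    """
    counts = defaultdict(lambda: defaultdict(int))
    with open(amg_summary_path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            counts[row[scaffold_col]][row[category_col]] += 1
    # Convert to plain dicts so the result prints and compares cleanly.
    return {scaffold: dict(cats) for scaffold, cats in counts.items()}


# Example usage:
# table = amg_counts_per_scaffold("annotation/distilled/amg_summary.tsv")
# for scaffold, cats in table.items():
#     print(scaffold, cats)
```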

DRAM-v strainer

DRAM-v strainer filters fasta files down to only the genes or scaffolds that meet criteria set by the user. For example, to get the amino acid sequences of all genes annotated with the GH4 and GH5 families in fasta format, so that you can build a tree, use this command:

DRAM-v.py strainer --identifiers GH4 GH5 -i annotations.tsv -f genes.faa -o GH4_GH5_genes.faa

Or, to get only scaffolds 34 and 52 from the bin.4 fasta, use this command:

DRAM-v.py strainer --scaffolds bin.4_scaffold_34 bin.4_scaffold_52 -i annotations.tsv -f scaffolds.fna -o bin.4_scaffolds_34_52.fna

To get only the genes that are considered potential AMGs and are transporters, use this command:

DRAM-v.py strainer -i hmp_viruses/annotations.tsv -f hmp_viruses/genes.fna -o genes.amgs.fna -a --categories Transporters

To get only genes that have a 'V' flag and an auxiliary score of 4 or 5, use this command:

DRAM-v.py strainer -i hmp_viruses/annotations.tsv -f hmp_viruses/genes.fna -o genes.viral.fna --aux_scores 4 5 --amg_flags V
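
Conceptually, each of the strainer commands above selects a set of gene or scaffold IDs from annotations.tsv and keeps only the matching fasta records. The final step can be sketched in Python as follows. This is an illustration of the idea, not the strainer's actual code, and the function name filter_fasta is hypothetical.

```python
def filter_fasta(fasta_path, output_path, wanted_ids):
    """Write only the records whose ID is in wanted_ids.

    The record ID is the header text up to the first whitespace.
    Returns the number of records kept.
    """
    wanted = set(wanted_ids)
    kept = 0
    keep_current = False
    with open(fasta_path) as src, open(output_path, "w") as out:
        for line in src:
            if line.startswith(">"):
                parts = line[1:].split()
                record_id = parts[0] if parts else ""
                keep_current = record_id in wanted
                if keep_current:
                    kept += 1
            # Sequence lines follow the decision made at their header.
            if keep_current:
                out.write(line)
    return kept


# Example usage:
# n = filter_fasta("scaffolds.fna", "subset.fna",
#                  ["bin.4_scaffold_34", "bin.4_scaffold_52"])
# print(f"Kept {n} records")
```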

DRAM-v strainer parameters

  • Input files
    • Annotations: annotations.tsv file generated during the annotate step
    • Input fasta: genes fasta file (.faa or .fna) to be filtered
    • Output fasta: location to save filtered fasta file
  • Default Parameters
    • Fastas: None
    • Scaffolds: None
    • Genes: None
    • Identifiers: None