Home
Welcome to the wiki for the tool for Distilling and Refining Annotations of Metabolism (DRAM)! Here you will find all you need to know to set up, install and run DRAM and DRAM-v.
Like the process of making the eponymous glass of whiskey, DRAM distills genome annotations to metabolic functions in three levels that scale in information: (1) Raw, (2) Distillate, and (3) Liquor. Through this distillation process, DRAM is able to annotate high volumes of microbial genomes and organize the resulting information in a way that highlights functional guilds, allowing users to infer organismal metabolism across hundreds of genomes. To obtain the Raw output, DRAM calls genes on input genomes, searches each gene against seven databases, and considers all derived annotations together. This approach significantly increases database searches by at least 25% beyond other annotators such as DFAST, MetaERG, and Prokka. The DRAM Raw output contains all database hits per gene in every input genome, which is the final output for most annotators. DRAM advances genome annotation beyond this raw output by providing a first-of-its-kind organization and visualization of all annotations into ecosystem-relevant functions.
DRAM for MAGs works by annotating all given genomes against all databases used by DRAM. The user is given a tab-delimited annotations file with all annotations from all databases for all genes in all genomes. Additionally, the user is given a folder with GenBank files for each genome, a GFF file with all annotations across genomes, and annotated nucleotide and amino acid FASTA files of all genes. The results of annotation can then be distilled. This generates three files: 1. the genome statistics table, which includes all statistics required by MIMAG, 2. the metabolism summary, which gives gene counts of functional genes across a wide variety of metabolisms, and 3. the liquor, which is a heatmap showing the coverage of modules, the coverage of electron transport chain components and the presence of selected metabolic functions.
DRAM-v works to annotate and discover auxiliary metabolic genes (AMGs) in viral contigs as detected by VirSorter. The output of VirSorter is first annotated using the same databases as in DRAM for MAGs. After that, auxiliary scores, which measure the confidence that a gene is viral in origin, and metabolic flags, which indicate different characteristics of each gene, are generated. Potential AMGs are identified based on the auxiliary scores and metabolic flags. By default a gene is considered a potential AMG if it has an M flag, no V flag, no A flag and an auxiliary score of 3 or lower. Distillation of viral annotations generates: 1. a viral genome statistics table, which includes all statistics required by MIUViG, 2. a summary of the potential AMGs present in all viral contigs as well as their metabolic functions, and 3. the liquor, which is a heatmap showing the functions of all AMGs present across all viral contigs.
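The default potential-AMG rule described above can be written as a small predicate. This is only a sketch of the rule as stated on this page, not DRAM-v's own code: the function name, the representation of flags as a string of single letters, and the example gene records are assumptions made for illustration.

```python
def is_potential_amg(flags: str, auxiliary_score: int, max_auxiliary_score: int = 3) -> bool:
    """Apply the default rule: M flag present, no V flag, no A flag, score <= 3.

    `flags` is assumed to be a string of single-letter flags such as "MK";
    how DRAM-v actually stores flags may differ.
    """
    flag_set = set(flags)
    has_metabolic_flag = "M" in flag_set
    has_excluding_flag = bool(flag_set & {"V", "A"})
    return (
        has_metabolic_flag
        and not has_excluding_flag
        and auxiliary_score <= max_auxiliary_score
    )


# Hypothetical gene records: (gene id, flags, auxiliary score).
example_genes = [
    ("gene_1", "M", 2),   # passes: M flag, low score
    ("gene_2", "MV", 1),  # fails: V flag present
    ("gene_3", "MA", 3),  # fails: A flag present
    ("gene_4", "M", 4),   # fails: auxiliary score above 3
]

potential_amgs = [
    gene_id
    for gene_id, flags, score in example_genes
    if is_potential_amg(flags, score)
]
print(potential_amgs)
```

Keeping the score threshold as a parameter makes it easy to see how a stricter or looser cutoff would change which genes are retained.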
These are the commands needed to quickly install, set up and get started running DRAM and DRAM-v.
It is recommended to install DRAM within a conda environment. If you would like to install DRAM manually see the How to Install and Set Up DRAM section of the Wiki.
wget https://raw.githubusercontent.com/shafferm/DRAM/master/environment.yaml
conda env create -f environment.yaml -n DRAM
If this installation method is used, then all further steps should be run inside the created DRAM environment.
Then set up DRAM using the following command:
DRAM.py prepare_databases --output_dir DRAM_data --kegg_loc kegg.pep
Once DRAM is set up you are ready to annotate some MAGs. The following commands will generate the full annotation and distillation of MAGs:
DRAM.py annotate -i 'my_bins/*.fa' -o annotation
DRAM.py distill -i annotation/annotations.tsv -o genome_summaries --trna_path annotation/trnas.tsv --rrna_path annotation/rrnas.tsv
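When the two steps above need to be driven from a script, the same commands can be assembled programmatically. The sketch below uses only the subcommands and flags shown on this page; the helper names `build_mag_commands` and `run_all` are hypothetical, and actually running the commands assumes `DRAM.py` is on the `PATH` of the active environment.

```python
import shlex
import subprocess


def build_mag_commands(bins_glob: str, annotation_dir: str, summary_dir: str) -> list:
    """Build the annotate and distill argument lists for a set of MAGs."""
    annotate = ["DRAM.py", "annotate", "-i", bins_glob, "-o", annotation_dir]
    distill = [
        "DRAM.py", "distill",
        "-i", f"{annotation_dir}/annotations.tsv",
        "-o", summary_dir,
        "--trna_path", f"{annotation_dir}/trnas.tsv",
        "--rrna_path", f"{annotation_dir}/rrnas.tsv",
    ]
    return [annotate, distill]


def run_all(commands: list) -> None:
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)


commands = build_mag_commands("my_bins/*.fa", "annotation", "genome_summaries")

# Print the shell-equivalent form of each command for inspection.
for argv in commands:
    print(shlex.join(argv))

# To execute them, call run_all(commands) inside the DRAM environment.
```

Because the arguments are passed as a list rather than through a shell, the glob pattern reaches `DRAM.py` unexpanded, which matches the effect of quoting it on the command line.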
Annotating and distilling viral contigs requires some preprocessing and an additional input. The contigs must be processed with VirSorter, and the processed viral contigs and VIRSorter_affi-contigs.tab are used as input to DRAM-v. The following commands will generate the full annotation and distillation of viral contigs:
DRAM-v.py annotate -i my_viral_contigs.fa -v VIRSorter_affi-contigs.tab -o annotation
DRAM-v.py distill -i annotation/annotations.tsv -o annotation/distilled
DRAM has a large memory burden and is designed to be run on servers. DRAM annotates against a large variety of databases which must be processed and stored. With a standard setup the processed DRAM databases take up about 20 GB of storage. DRAM memory usage depends on the databases used. When annotating with UniRef90, around 220 GB of RAM is required. If the KEGG gene database has been provided and the --skip_uniref flag is used, then memory usage is around 100 GB of RAM. If KOfam is used to annotate KEGG along with the --skip_uniref flag, then less than 50 GB of RAM is required. DRAM can be run with any number of processors on a single node.
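As a rough planning aid, the memory figures above can be encoded in a small lookup. This is a sketch only: the function names and the `"kegg"` / `"kofam"` labels are invented for illustration, and the returned numbers are just the approximate values quoted on this page, with the KOfam case using its stated upper bound.

```python
def approx_ram_gb(skip_uniref: bool, kegg_source: str) -> int:
    """Return the approximate RAM (GB) quoted for a database configuration.

    kegg_source is a hypothetical label: "kegg" when the KEGG gene database
    has been provided, "kofam" when KOfam is used to annotate KEGG.
    """
    if not skip_uniref:
        return 220  # annotating with UniRef90
    if kegg_source == "kegg":
        return 100  # KEGG gene database with --skip_uniref
    if kegg_source == "kofam":
        return 50   # KOfam with --skip_uniref (stated as less than 50 GB)
    raise ValueError(f"unknown kegg_source: {kegg_source!r}")


def fits_on_node(node_ram_gb: int, skip_uniref: bool, kegg_source: str) -> bool:
    """Check whether a node has at least the quoted amount of RAM."""
    return node_ram_gb >= approx_ram_gb(skip_uniref, kegg_source)


# Example: a 128 GB node with the KEGG gene database provided.
print(fits_on_node(128, skip_uniref=False, kegg_source="kegg"))
print(fits_on_node(128, skip_uniref=True, kegg_source="kegg"))
```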