Skip to content

Loading SQM results into anvi'o

Natalia García García edited this page Feb 4, 2021 · 9 revisions

You have run SqueezeMeta and now you want to explore easily your data. In this tutorial we explain how to integrate your SQM project into the anvi'o platform.

Workflow to load a SQM project into anvi'o platform.

Anvi'o is an analysis and visualization platform for ‘omics data (http://merenlab.org/software/anvio/). Citation: Eren, A. M., Esen, Ö. C., Quince, C., Vineis, J. H., Morrison, H. G., Sogin, M. L., & Delmont, T. O. (2015). Anvi'o: an advanced analysis and visualization platform for 'omics data. PeerJ, 3, e1319. doi:10.7717/peerj.1319

Although we offer a procedure to visualize your results with anvi'o, we encourage you to check their webpage to learn more about its utilities as well as cite anvi'o properly.

We have used 16 samples from Hadza hunter-gatherer and italians gut microbiomes (Rampelli, S., Schnorr, S. L., Consolandi, C., Turroni, S., Severgnini, M., Peano, C., ... & Candela, M. (2015). Metagenome sequencing of the Hadza hunter-gatherer gut microbiota. Current Biology, 25(13), 1682-1693.)

Note: Don't forget to change the path of the scripts and folders, depending on your own directory structure!


1: Activate the anvi'o environment

Look anvi'o webpage (http://merenlab.org/2016/06/26/installation-v2/) and find out how to activate it. This step depends on how and where you have installed it.

conda activate anvio-7 # we also support anvio 6.2

2: Load SQM results into anvi'o: the anvi-load-sqm.py script

Now, all you need is to parse your SQM project into anvi'o databases with the anvi-load-sqm.py script. The different options will be shown by running:

python3 /path/to/SqueezeMeta/utils/anvio_utils/anvi-load-sqm.py -h

So, run this script with your options and enjoy a coffee while you wait.

python3 /path/to/SqueezeMeta/utils/anvio_utils/anvi-load-sqm.py -p Hadza16 -o Hadza16/results/anvio --num-threads 12 --min-contig-length 2500 --run-scg-taxonomy --run-HMMS

where:
-p: SQM project directory.
-o: Output directory.
--run-HMMS: run anvi-run-hmms command of anvi'o for identifying single-copy core genes.
--run-scg-taxonomy: run anvi-run-scg-taxonomy command from anvi'o for affiliating single-copy core genes with taxonomic names.
--min-contig-length: Minimun length of the contigs that will be profiled by anvi'o.
--num-threads: number of threads.

When it finishes, you'll see, in your output directory, the CONTIGS.db and the PROFILE.db databases (see anvi'o article to understand their structure).


3: Filter SQM results to visualize them: the anvi-filter-sqm.py script

If your metagenome is huge and it has lots of contigs, it would not be very useful to visualize all at the same time. For this reason, we provide a script to filter your contigs depending on the taxonomy, the functions or the coverage: the anvi-filter-sqm.py script (The rules to phrase these queries are explained at the end of this tutorial as well as within the script).

Example 1:

In this example, we have selected those contigs that contain genes annotated with the word 'antibiotic'.

/path/to/SqueezeMeta/utils/anvio_utils/anvi-filter-sqm.py -c Hadza16/results/anvio/CONTIGS.db -p Hadza16/results/anvio/PROFILE.db -t Hadza16/results/anvio/Hadza16_anvio_contig_taxonomy.txt -o Hadza16/results/anvio/antibiotic -q '(FUN CONTAINS antibiotic)'

Now explore the anvi'o interactive interface.

Figure1_antibioticView.png

Fig. 1: View of the anvi-interactive interface. The graph contains items with information of items (eg. splits) and layers (eg. samples). Splits are fragments of contigs whose length is higher than a certain size. The right panel allows the user to control some features of the view such as splits' order, colors and sizes. The left panel indicates the values of abundances and taxonomy for the chosen contig. Look http://merenlab.org/2016/02/27/the-anvio-interactive-interface/ for more information.

Fig. 2: Zoom of the anvi-interactive interface. Detail of splits and their abundances in the different samples.

If you wish it, you can export the figure into svg format and customize it.

Figure2_antibioticExport.svg

Fig. 3: Export a SVG file from your data to customize it.

Example 2:

In this case, we are interested in visualizing those genes related with 'aromatic' pathways:

/path/to/SqueezeMeta/utils/anvio_utils/anvi-filter-sqm.py -c Hadza16/results/anvio/CONTIGS.db -p Hadza16/results/anvio/PROFILE.db -t Hadza16/results/anvio/Hadza16_anvio_contig_taxonomy.txt -q FUNH CONTAINS 'aromatic' -o Hadza16/results/anvio/aromatic  

Figure3_aromaticView.svg

Fig. 4: View of the splits that host a gene whose annotation includes the word 'aromatic'.

Using the interactive interface, it is possible to search for splits that have a specific taxonomy, coverage or contain a particular function, for example, the 'salicylate hydroxylase'.

Figure4_aromaticSalicylateHydroxylase.svg

Fig. 5: Red coloured splits contain the 'salicylate hydroxylase' function.

Anvi'o even offers the option to independently inspect each contig to see its genes and run a BLAST with their sequences.

Figure5_splitsView.png

*Fig. 6: The splits view allows the user to analyze the genes in their contigs and see their annotations. *

Example 3:

Filtering contigs using taxonomical annotations is also feasible. Eg: now, we show contigs from the genus 'Ruminococcus'. Even you can take advantage of anvio's splits clustering to order contigs using other criteria instead their taxonomy:

/path/to/SqueezeMeta/utils/anvio_utils/anvi-filter-sqm.py -c Hadza16/results/anvio/CONTIGS.db -p Hadza16/results/anvio/PROFILE.db -t Hadza16/results/anvio/Hadza16_anvio_contig_taxonomy.txt -q '(GENUS == Ruminococcus)' -o Hadza16/results/anvio/Ruminococcus --enforce-clustering

Figure6_Ruminococcus.svg

Fig. 7: View of splits annotated as 'Ruminococcus'.

Don't be afraid of making more complex queries with the anvi-filter-sqm.py script, be creative! For instance, now we want to pick up contigs from a particular taxon (Ruminococcus) which are related with 'aromatic' compounds:

/path/to/SqueezeMeta/utils/anvio_utils/anvi-filter-sqm.py -c Hadza16/results/anvio/CONTIGS.db -p Hadza16/results/anvio/PROFILE.db -t Hadza16/results/anvio/Hadza16_anvio_contig_taxonomy.txt -q '(GENUS == Ruminococcus AND FUNH CONTAINS aromatic)' -o Hadza16/results/anvio/aromatic_Ruminococcus2

Figure7_aromaticRumino.svg

Fig. 8: View of splits that are annotated as 'Ruminococcus' and 'aromatic'.


4: Open a previous filtered results:

Now, imagine that you have been doing some queries and you just want to visualize one, but without making it again, the best way is to use directly the anvi'o command:

anvi-interactive -c Hadza16/results/anvio/antibiotic/SqueezeMeta/CONTIGS.db -p Hadza16/results/anvio/antibiotic/SqueezeMeta/PROFILE.db -t Hadza16/results/anvio/antibiotic/Taxonomy.nwk

5: Explore bins using anvi'o:

Anvi'o is very helpful tool to explore bins. Next, we provide you with some of their most handy commands, but we recommend visiting their tutorials about binning. Eg:
-http://merenlab.org/tutorials/infant-gut/#manual-identification-of-genomes-in-the-infant-gut-dataset
-http://merenlab.org/tutorials/infant-gut/#manually-curating-automatic-binning-outputs
-http://merenlab.org/2017/05/11/anvi-refine-by-veronika/

First, look whether there are some bins collections loaded:

anvi-show-collections-and-bins -p  Hadza16/results/anvio/PROFILE.db

Given that we are working using SQM results, we just have one default collection which is named 'DAS'.

anvi-interactive -c Hadza16/results/anvio/CONTIGS.db -p Hadza16/results/anvio/PROFILE.db -C DAS

This view shows you the number of bins, their completitude and redundancy (http://merenlab.org/2016/06/09/assessing-completion-and-contamination-of-MAGs/) and the abundance of each bin in each sample.

Figure8_DAS.svg

Fig. 9: View of all the bins of the project.

If you want to focus on an specific bin. Eg: 'maxbin_005_fasta_contigs. You can use the anvi'o command:

anvi-refine -c Hadza16/results/anvio/CONTIGS.db -p Hadza16/results/anvio/PROFILE.db -C DAS -b maxbin_005_fasta_contigs

Figure9_maxbin005.svg

Fig. 10: Refininig a specific bin.

Interactive views of anvi'o are really powerful and you can customize your plots with the most relevant info for you. So the best thing you can do is trying it!

Find more information about

The SqueezeMeta paper at: https://www.frontiersin.org/articles/10.3389/fmicb.2018.03349/full
The SQMtools paper at: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-020-03703-2


Bonus for anvi-filter.py: Rules queries:

QUERY SYNTAX:

  • Please enclose query strings within double brackets.

  • Queries are combinations of relational operations in the form of (e.g. "PHYLUM == Bacteroidetes") joined by logical operators (AND, OR).

  • Parentheses can be used to group operations together.

  • The "AND" and "OR" logical operators can't appear together in the same expression. Parentheses must be used to separate them into different expressions. e.g: "GENUS == Escherichia OR GENUS == Prevotella AND FUN CONTAINS iron" would not be valid. Parentheses must be used to write either: "(GENUS == Escherichia OR GENUS == Prevotella)" AND FUN CONTAINS iron" -> splits from either Escherichia or Prevotella which contain ORFs related to iron. "GENUS == Escherichia OR (GENUS == Prevotella AND FUN CONTAINS iron)" -> all splits from Escherichia, and splits of Prevotella which contains ORFs related to iron.

  • Another example query would be: "(PHYLUM == Bacteroidetes OR CLASS IN [Alphaproteobacteria, Gammaproteobacteria]) AND FUN CONTAINS iron AND Sample1 > 1"

    • This would select all the anvi'o splits assigned to either the Bacteroidetes phylum or the Alphaproteobacteria or Gammaproteobacteria classes, that also contain the substring "iron" in the functional annotations of any of their ORFs, and whose anvi'o abundance (mean coverage of a split divided by overall sample mean coverage) in Sample1 is higher than 1.
  • Possible subjects are:

    • FUN: search within all the combined databases used for functional annotation.
    • FUNH: search within the KEGG BRITE and COG functional hierarchies (e.g. "FUNH CONTAINS Carbohydrate metabolism" will select all the splits containing a gene associated with the broad "Carbohydrate metabolism" category)
    • SUPERKINGDOM, PHYLUM, CLASS, ORDER, FAMILY, GENUS, SPECIES: search within the taxonomic annotation at the requested taxonomic rank.
    • <SAMPLE_NAME>: search within the abundances in the sample named <SAMPLE_NAME> (e.g. if you have two samples named "Sample1" and "Sample2", the query string "Sample1 > 0.5 AND Sample2 > 0.5" would return the anvi'o splits with an anvi'o abundance higher than 0.5 in both samples)
  • Posible relational operators are "==", "!=", ">=", "<=", ">", "<", "IN", "NOT IN", "CONTAINS", "DOES NOT CONTAIN"