From 716020442e5416d5f8507ae9efd0ad35d0ad3bae Mon Sep 17 00:00:00 2001 From: "C. Titus Brown" Date: Mon, 29 Apr 2024 03:04:48 -0700 Subject: [PATCH] MRG: add more text (#8) * add more text * add extra exercise * add citation * add more text --- docs/amr.md | 25 +++++++++++-- docs/comparing-metagenomes.md | 54 +++++++++++++++++++++++------ docs/index.md | 9 ++++- docs/single-metagenomes-taxonomy.md | 15 ++++++-- 4 files changed, 88 insertions(+), 15 deletions(-) diff --git a/docs/amr.md b/docs/amr.md index a805019..a9e563d 100644 --- a/docs/amr.md +++ b/docs/amr.md @@ -63,8 +63,9 @@ And, finally, run AMRfinder on the proteins: ``` amrfinder -p CD136.assembly.faa -t 16 -o CD136.amrfinder.tsv --plus ``` +(This will take under a minute.) -This will produce a spreadsheet named `CD136.amrfinder.tsv` that +AMRfinder will produce a spreadsheet named `CD136.amrfinder.tsv` that contains a number of columns - you can see the list like so, using `csvtk headers`: @@ -79,5 +80,25 @@ Run: csvtk -t cut -f "% Coverage of reference sequence","HMM description" CD136.amrfinder.tsv ``` - +and you will see: +``` +% Coverage of reference sequence HMM description +89.41 CfxA family broad-spectrum class A beta-lactamase +87.59 23S ribosomal RNA methyltransferase Erm +52.84 NA +100.00 macrolide efflux MFS transporter Mef(En2) +100.00 lincosamide nucleotidyltransferase Lnu(AN2) +100.00 CepA family extended-spectrum class A beta-lactamase +``` + +The first column here is the percentage of the known (reference) sequence +covered by the matching protein from the metagenome assembly, and the +second is a description of the match. + +Note: If you wanted to get the abundance of these genes in the metagenome, +you would have to find the DNA contig that each gene is on, using the +"Protein identifier" column, and then map the metagenome reads to that +contig to estimate its abundance. This is necessary because assembly +collapses many reads into a single contig sequence, so the contig does not +directly tell you how abundant it is; you have to recover that through other means. 
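The "Protein identifier" lookup described in the note above can be scripted. The sketch below is a minimal, hypothetical helper: the names `read_amr_hits`, `contig_from_protein` and `contigs_to_map` are ours, not part of AMRFinderPlus. It assumes the proteins were predicted with Prodigal, which names each protein `<contig>_<gene number>`; if your `.faa` file came from another tool, `contig_from_protein` will need changing. It also assumes the column names shown in the output above, which may differ between AMRFinderPlus versions.

```python
import csv
from typing import Iterable, List, NamedTuple, Optional


class AmrHit(NamedTuple):
    protein_id: str
    contig_id: str
    coverage: Optional[float]  # "% Coverage of reference sequence"; None if "NA"
    description: str  # "HMM description"; may be the literal string "NA"


def _to_float(value: str) -> Optional[float]:
    """Convert a numeric TSV field, treating empty or "NA" as missing."""
    return None if value in ("", "NA") else float(value)


def contig_from_protein(protein_id: str) -> str:
    """Recover the contig name, ASSUMING Prodigal-style '<contig>_<gene number>' IDs."""
    contig, sep, gene_num = protein_id.rpartition("_")
    if not sep or not gene_num.isdigit():
        # Not Prodigal-style: return the ID unchanged rather than guess.
        return protein_id
    return contig


def read_amr_hits(lines: Iterable[str]) -> List[AmrHit]:
    """Parse AMRFinderPlus TSV lines (e.g. an open file) into AmrHit records."""
    reader = csv.DictReader(lines, delimiter="\t")
    hits = []
    for row in reader:
        protein_id = row["Protein identifier"]
        hits.append(
            AmrHit(
                protein_id=protein_id,
                contig_id=contig_from_protein(protein_id),
                coverage=_to_float(row["% Coverage of reference sequence"]),
                description=row["HMM description"],
            )
        )
    return hits


def contigs_to_map(hits: List[AmrHit]) -> List[str]:
    """Unique contig names, in first-seen order, to extract from the assembly."""
    return list(dict.fromkeys(hit.contig_id for hit in hits))
```

For example, pass an open handle on `CD136.amrfinder.tsv` to `read_amr_hits`, then use `contigs_to_map` to list the contigs to pull out of the assembly before mapping the metagenome reads to them with a read mapper of your choice.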
diff --git a/docs/comparing-metagenomes.md b/docs/comparing-metagenomes.md index 42f3665..c9c010a 100644 --- a/docs/comparing-metagenomes.md +++ b/docs/comparing-metagenomes.md @@ -1,8 +1,17 @@ # Comparing metagenomes +This tutorial uses [sourmash](https://sourmash.readthedocs.io/) to do +comparisons of multiple metagenomes based on weighted and unweighted +k-mer content. + +In this tutorial, you will learn how to create distance matrices and +ordination plots from metagenome content. Importantly, this tutorial +is *reference* and *annotation* free - it will work equally well on +any metagenome. + ## First, create a conda software environment and a working directory. -To install software, run: +To install the necessary software, run: ``` mamba create -n smash -y sourmash scikit-learn conda activate smash @@ -14,14 +23,12 @@ mkdir ~/compare-metag cd ~/compare-metag ``` - ## Comparing based on content - - Here we are going to use the +[`sourmash compare`](https://sourmash.readthedocs.io/en/latest/command-line.html#sourmash-compare-compare-many-signatures) and [`sourmash plot`](https://sourmash.readthedocs.io/en/latest/command-line.html#sourmash-plot-cluster-and-visualize-comparisons-of-many-signatures) -command to compare and cluster many metagenomes based on their content - not their annotation or assemblies. +commands to compare and cluster many metagenomes based on their content. As with the [single metagenome analysis](single-metagenomes-taxonomy.md), we have two options here: with, or without abundance information. @@ -114,19 +121,46 @@ If you plot this via MDS, you'll see a clear separation: Points to discuss: * what does this all mean, in ~microbial terms? Hint: ask Mani to -    revist how the test data sets were generated! +    revisit how the test data sets were generated! Alternatively, +    go on to the next section! + +## Extra: examining taxonomy - +Note that in this case that's not an accident: the dataset was created +specifically to contain only five species ;).
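The difference between weighted and unweighted comparisons, and the step from a distance matrix to an ordination, can be illustrated on toy sequences. This is a sketch, not how sourmash computes its values: sourmash works on FracMinHash sketches of hashed k-mers, whereas this code counts every forward-strand k-mer directly. It uses Jaccard similarity for the unweighted comparison and, as an assumed abundance-weighted measure, the angular similarity of the k-mer count vectors. Distances are taken as 1 - similarity, and the ordination uses classical (Torgerson) MDS, also known as PCoA, which is a different algorithm from scikit-learn's iterative `MDS`, so coordinates will not match exactly.

```python
import math
from collections import Counter
from typing import List

import numpy as np


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every k-mer in seq (forward strand only; no hashing or subsampling)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def jaccard(a: Counter, b: Counter) -> float:
    """Unweighted: compares which k-mers are present, ignoring their counts."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def angular_similarity(a: Counter, b: Counter) -> float:
    """Weighted: cosine of the count vectors, converted to an angular similarity."""
    dot = sum(a[kmer] * b[kmer] for kmer in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    if norm == 0:
        return 0.0
    cos = min(1.0, dot / norm)  # clamp floating-point overshoot before acos
    return 1.0 - 2.0 * math.acos(cos) / math.pi


def distance_matrix(profiles: List[Counter], similarity) -> np.ndarray:
    """Symmetric matrix of 1 - similarity for every pair of profiles."""
    n = len(profiles)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = 1.0 - similarity(profiles[i], profiles[j])
            dist[i, j] = dist[j, i] = d
    return dist


def classical_mds(dist: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Classical (Torgerson) MDS / PCoA on a precomputed distance matrix."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (dist ** 2) @ centering  # double-centred squared distances
    eigvals, eigvecs = np.linalg.eigh(b)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    top_vals = np.clip(eigvals[order], 0.0, None)  # ignore negative eigenvalues
    return eigvecs[:, order] * np.sqrt(top_vals)
```

Two samples that contain the same k-mers in very different proportions get a Jaccard similarity of 1 but a lower angular similarity, which is why choosing a weighted or unweighted comparison can change how metagenomes cluster.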
--- diff --git a/docs/index.md b/docs/index.md index 446721c..9981467 100644 --- a/docs/index.md +++ b/docs/index.md @@ -1,6 +1,7 @@ # Introduction - +These are tutorials for the PIG-PARADIGM workshop on metagenomics, +Apr 29th, 2024, given at Wageningen. Tutorials: @@ -12,3 +13,9 @@ Tutorials: Data originally from [the MIntO tutorial data](https://zenodo.org/records/6369313). + +## More information + +Authors: Anneliek ter Horst and C. Titus Brown + +See the GitHub repo at [ngs-docs/2024-pig-paradigm-workshop](https://github.com/ngs-docs/2024-pig-paradigm-workshop). diff --git a/docs/single-metagenomes-taxonomy.md b/docs/single-metagenomes-taxonomy.md index 32e6e9c..61170e3 100644 --- a/docs/single-metagenomes-taxonomy.md +++ b/docs/single-metagenomes-taxonomy.md @@ -1,5 +1,18 @@ # Analyzing a single metagenome for taxonomy +This tutorial uses [sourmash](https://sourmash.readthedocs.io/) to do +various k-mer-based analyses of Illumina shotgun metagenome content. + +In this tutorial, you will learn: + +* how to look at what genomes share content with a metagenome; +* how to examine the abundance of metagenome content without a reference; +* how to summarize the taxonomic content of a metagenome. + +We will be using the taxonomic classification system as benchmarked in +[Evaluation of taxonomic classification and profiling methods for long-read shotgun metagenomic sequencing datasets](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-022-05103-0), +which is both very *sensitive* and quite *specific*. + ## Creating a working directory Run: @@ -90,8 +103,6 @@ Points to discuss: content is present in the reference database. Some of this is probably erroneous data or host contamination. - - ### K-mer abundance histogram Let's examine this data set further. First, let's take a look at the