06. Assembly
The tools within this workflow perform metagenome assemblies with the de novo assemblers metaSPAdes in SPAdes version 3.14.0, as well as MEGAHIT version 1.1.2, on trimmed Illumina paired-end reads. The SPAdes container can also be used to perform de novo assemblies of isolate DNA with SPAdes and plasmidSPAdes, or RNA transcripts with rnaSPAdes. QUAST version 5.0.2 is used to evaluate the assemblies, and MultiQC version 1.4 provides aggregated visualizations for the QUAST reports. This workflow has been tested to run offline in an air-gapped system following the execution of the Read Filtering Workflow.
If you have not already, you will need to activate your metscale environment and perform the Offline Setup for the assembly workflow:
[user@localhost ~]$ conda activate metscale
(metscale)[user@localhost ~]$ cd metscale/workflows
(metscale)[user@localhost workflows]$ python download_offline_files.py --workflow assembly
In the `metscale/container_images/` directory, you should see the following Singularity images, which were created when running the Offline Setup with the assembly or all flag:
File Name | File Size |
---|---|
spades_3.14.0--h2d02072_0.sif | 104 MB |
megahit_1.1.2--py35_0.sif | 48 MB |
quast_5.0.2--py27pl526ha92aebf_0.sif | 810 MB |
multiqc_1.4--py35_0.sif | 453 MB |
If you are missing any of these files, you should re-run the appropriate setup command, as per instructions in the Offline Setup.
The assembly workflow uses the Illumina paired-end filtered reads (outputs from the Read Filtering Workflow) as its inputs. These files should be located in the `metscale/workflows/data` directory:
File Name | File Size |
---|---|
SRR606249_subset10_1_reads_trim2_1.fq.gz | 365 MB |
SRR606249_subset10_1_reads_trim2_2.fq.gz | 359 MB |
SRR606249_subset10_1_reads_trim30_1.fq.gz | 313 MB |
SRR606249_subset10_1_reads_trim30_2.fq.gz | 300 MB |
These example reference assembly files should also be located in the `metscale/workflows/data/` directory for reference-based assembly evaluation with MetaQUAST:
File Name | File Size | MD5 Checksum |
---|---|---|
GCF_000008565.1_ASM856v1_genomic.fna.gz | 924 KB | a556db886f11a8af3783d63319140e74 |
Shakya_Refs/ | 60 MB | none, 64 reference files in directory |
If these files look good to go, then you may proceed to run the example dataset through the assembly workflow rules.
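The input checks described above can be scripted before launching the workflow. The following is a minimal sketch, assuming a Linux system with GNU coreutils `md5sum` on the path; the helper names `check_assembly_inputs` and `verify_md5` are hypothetical and are not part of MetScale.

```shell
# Hypothetical helper (not part of MetScale): report which of the expected
# assembly inputs are missing from a data directory (default: ./data).
check_assembly_inputs() {
    local data_dir="${1:-data}"
    local missing=0
    local f
    for f in \
        SRR606249_subset10_1_reads_trim2_1.fq.gz \
        SRR606249_subset10_1_reads_trim2_2.fq.gz \
        SRR606249_subset10_1_reads_trim30_1.fq.gz \
        SRR606249_subset10_1_reads_trim30_2.fq.gz \
        GCF_000008565.1_ASM856v1_genomic.fna.gz
    do
        if [ ! -f "${data_dir}/${f}" ]; then
            echo "MISSING: ${data_dir}/${f}"
            missing=$((missing + 1))
        fi
    done
    if [ ! -d "${data_dir}/Shakya_Refs" ]; then
        echo "MISSING: ${data_dir}/Shakya_Refs/"
        missing=$((missing + 1))
    fi
    # Succeed only when nothing is missing.
    [ "${missing}" -eq 0 ]
}

# Hypothetical helper: compare a file's MD5 digest with an expected value.
verify_md5() {
    local file="$1" expected="$2" actual
    actual="$(md5sum "${file}" | awk '{print $1}')"
    [ "${actual}" = "${expected}" ]
}
```

From the `metscale/workflows` directory, one possible use is `check_assembly_inputs data && verify_md5 data/GCF_000008565.1_ASM856v1_genomic.fna.gz a556db886f11a8af3783d63319140e74`, which succeeds only if all inputs are present and the reference checksum matches the value in the table.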
Workflows are executed according to the sample names and workflow parameters, as specified in the config file. For more information about config files, see the Getting Started wiki page.
After the config file is ready, be sure to specify the Singularity bind path from the `metscale/workflows` directory before running the assembly workflow.
cd metscale/workflows
export SINGULARITY_BINDPATH="data:/tmp"
You can then execute the workflows through snakemake using the following command:
snakemake --use-singularity {rules} {other options}
The following rules are available for execution in the assembly workflow (yellow stars indicate terminal rules):
The assembly rules and their parameters are listed under "workflows" in the `metscale/workflows/config/default_workflowconfig.settings` config file.
Sample Type | Rule | Description |
---|---|---|
Metagenome | assembly_metaspades_workflow | metaSPAdes assembles filtered reads |
Metagenome | assembly_megahit_workflow | MEGAHIT assembles filtered reads |
Metagenome | assembly_all_workflow | metaSPAdes and MEGAHIT both independently assemble filtered reads |
Metagenome | assembly_quast_workflow | MetaQUAST evaluates the metagenomic assemblies |
Metagenome | assembly_multiqc_workflow | MultiQC aggregates all QUAST reports from MEGAHIT or metaSPAdes assemblies into a single report |
Metagenome | assembly_metaquast_workflow | MetaQUAST evaluates the metagenomic assemblies against a single reference or multiple references |
RNA Transcripts | assembly_rnaspades_workflow | rnaSPAdes assembles filtered reads from RNA transcripts* |
RNA Transcripts | assembly_rnaspades_metaquast_workflow | MetaQUAST evaluates the RNA transcript assemblies* |
RNA Transcripts | assembly_rnaspades_multiqc_workflow | MultiQC aggregates all MetaQUAST reports from rnaSPAdes into a single report* |
Isolate | assembly_spades_workflow | SPAdes assembles filtered reads from isolates** |
Isolate | assembly_quast_reference_with_spades_workflow | QUAST evaluates the SPAdes assembly against a reference** |
Bacterial Isolate with Plasmids | assembly_plasmidspades_workflow | plasmidSPAdes assembles filtered reads from plasmids of isolates** |
Bacterial Isolate with Plasmids | assembly_quast_reference_with_plasmidspades_workflow | QUAST evaluates the plasmidSPAdes assembly against a reference** |
*The `assembly_rnaspades_workflow`, `assembly_rnaspades_metaquast_workflow`, and `assembly_rnaspades_multiqc_workflow` rules are intended to be run on RNA transcript sequences.
**The `assembly_spades_workflow`, `assembly_plasmidspades_workflow`, `assembly_quast_reference_with_spades_workflow`, and `assembly_quast_reference_with_plasmidspades_workflow` rules are intended to be run with isolate sequences.
This wiki describes how to run those rules with the Shakya subset 10 test dataset for explanatory purposes, but it is more appropriate to run isolate genome sequences through the SPAdes and plasmidSPAdes assemblers, and metatranscriptomes or transcriptomes would be assembled with rnaSPAdes. Complex metagenomes like the Shakya subset 10 test dataset should be assembled with MEGAHIT or metaSPAdes.
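Because a mistyped rule name only surfaces once snakemake starts, it can help to check names against the table above first. The sketch below is an illustration only: `ASSEMBLY_RULES` and `assembly_cmd` are hypothetical names, and the function prints the command rather than running it.

```shell
# Rule names copied from the rules table above.
ASSEMBLY_RULES="assembly_metaspades_workflow assembly_megahit_workflow \
assembly_all_workflow assembly_quast_workflow assembly_multiqc_workflow \
assembly_metaquast_workflow assembly_rnaspades_workflow \
assembly_rnaspades_metaquast_workflow assembly_rnaspades_multiqc_workflow \
assembly_spades_workflow assembly_quast_reference_with_spades_workflow \
assembly_plasmidspades_workflow \
assembly_quast_reference_with_plasmidspades_workflow"

# Hypothetical helper: print (do not run) the snakemake command for the
# given rules, rejecting any name that is not in the list above.
assembly_cmd() {
    local rule
    if [ "$#" -eq 0 ]; then
        echo "usage: assembly_cmd RULE [RULE...]" >&2
        return 1
    fi
    for rule in "$@"; do
        case " ${ASSEMBLY_RULES} " in
            *" ${rule} "*) ;;
            *) echo "unknown rule: ${rule}" >&2; return 1 ;;
        esac
    done
    echo "snakemake --use-singularity $*"
}
```

For example, `assembly_cmd assembly_all_workflow assembly_multiqc_workflow` prints the corresponding snakemake command, while a misspelled rule name returns a non-zero status.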
The metagenome assembly rules for MEGAHIT and metaSPAdes can be run independently, or several rules can be run together by listing them back to back in one command:
snakemake --use-singularity assembly_all_workflow assembly_metaquast_workflow assembly_multiqc_workflow
The following command will run only the metaSPAdes assembler:
snakemake --use-singularity assembly_metaspades_workflow
The following command will run only the MEGAHIT assembler:
snakemake --use-singularity assembly_megahit_workflow
Both metagenome assemblers can be run together with the `assembly_all_workflow` rule:
snakemake --use-singularity assembly_all_workflow
To assemble RNA transcript sequences with rnaSPAdes, run the following rule:
snakemake --use-singularity assembly_rnaspades_workflow
To evaluate the metagenome assemblies from MEGAHIT or metaSPAdes, QUAST can be run with the `assembly_quast_workflow` rule:
snakemake --use-singularity assembly_quast_workflow
The `assembly_multiqc_workflow` rule aggregates all of the metagenome QUAST reports into a single report with MultiQC. This rule can also be used on its own to execute the entire metagenomic assembly workflow and the reference-independent assembly evaluations:
snakemake --use-singularity assembly_multiqc_workflow
To evaluate the metagenomic assemblies using MetaQUAST with a single reference, the reference file should be specified in `default_workflowparams.settings` under the assembly section. The default file is downloaded during the offline setup for the assembly workflow as an example, but this filename should be updated based on the expected reference for a sample:
"metaquast_ref" : "GCF_000008565.1_ASM856v1_genomic.fna.gz",
If multiple references are known for a metagenomic sample, they can all be used in a MetaQUAST evaluation by creating a sub-directory within `/workflows/data` that includes all of the reference genome assembly files. During the offline download, a directory called `Shakya_Refs` is downloaded that includes all of the expected reference genome assemblies for that sample. The name of this directory of reference genomes can be specified in the assembly section of `default_workflowparams.settings`, instead of a single reference file:
"metaquast_ref" : "Shakya_Refs",
Once you have updated the parameter section (if applicable), the following command will execute MetaQUAST with your indicated reference file(s) on the MEGAHIT and/or metaSPAdes assemblies:
snakemake --use-singularity assembly_metaquast_workflow
The following command will execute MetaQUAST with your indicated reference file(s) with rnaSPAdes assemblies:
snakemake --use-singularity assembly_rnaspades_metaquast_workflow
The `assembly_rnaspades_multiqc_workflow` rule combines and visualizes the MetaQUAST reports for rnaSPAdes:
snakemake --use-singularity assembly_rnaspades_multiqc_workflow
The following command will run the plasmidSPAdes and SPAdes assemblers on isolates:
snakemake --use-singularity assembly_spades_workflow assembly_plasmidspades_workflow
To evaluate the isolate and/or plasmid assemblies using QUAST, the reference assembly file needs to be specified in `default_workflowparams.settings` under the assembly section. The default file is downloaded during the offline setup for the assembly workflow, but it should be updated to match the expected reference for a sample.
"quast_spades_ref" : "GCF_000008565.1_ASM856v1_genomic.fna.gz",
and
"quast_plasmidspades_ref" : "GCF_000008565.1_ASM856v1_genomic.fna.gz",
Once you have updated the parameter section (if applicable), the following command will execute QUAST with your specified reference assembly file:
snakemake --use-singularity assembly_quast_reference_with_spades_workflow assembly_quast_reference_with_plasmidspades_workflow
Additional options for snakemake can be found in the snakemake documentation.
To specify your own parameters for this or any of the workflows prior to execution, see Workflow Architecture for more information.
After successful execution of the assembly workflow, outputs will be found in the `metscale/workflows/data/` directory. You should expect to see the following files for each pair of trimmed reads:
Tool Output | File Name | Description |
---|---|---|
metaSPAdes | {sample}_1_reads_trim{quality_threshold}.metaspades.contigs.fa | The final metaSPAdes assembled contigs from metagenomes, which is the output file used by downstream analysis tools |
metaSPAdes | {sample}_1_reads_trim{quality_threshold}.metaspades/ | Directory with additional outputs from the metaSPAdes assembler |
MEGAHIT | {sample}_1_reads_trim{quality_threshold}.megahit.contigs.fa | The final MEGAHIT assembled contigs from metagenomes, which is the output file used by downstream analysis tools |
MEGAHIT | {sample}_1_reads_trim{quality_threshold}.megahit/ | Directory with additional outputs from the MEGAHIT assembler |
QUAST (without references) | {sample}_1_reads_trim{quality_threshold}.{assembler}_quast/ | Directory with QUAST outputs for MEGAHIT and/or metaSPAdes |
QUAST (without references) | {sample}_1_reads_trim{quality_threshold}.{assembler}_quast/report.html | QUAST HTML report for MEGAHIT and/or metaSPAdes |
MultiQC | {sample}_1_reads.{assembler}_multiqc_report.html | MultiQC HTML report, including multiple QUAST reports |
MultiQC | {sample}_1_reads.{assembler}_multiqc_report_data/ | MultiQC directory with additional QUAST data and statistics |
MetaQUAST with metaSPAdes and/or MEGAHIT | {sample}_1_reads_trim{quality_threshold}.{assembler}_metaquast/ | MetaQUAST directory with MetaQUAST HTML report, additional data, and statistics for metaSPAdes and/or MEGAHIT |
MetaQUAST with rnaSPAdes | {sample}_1_reads_trim{quality_threshold}.rnaspades_metaquast_report_data/ | MetaQUAST directory with MetaQUAST HTML report, additional data, and statistics for the rnaSPAdes assembly |
SPAdes | {sample}_1_reads_trim{quality_threshold}_k{k_values}.spades.contigs.fa | The final SPAdes assembled contigs from isolates, which is the output file used by downstream analysis tools |
SPAdes | {sample}_1_reads_trim{quality_threshold}_k{k_values}.spades/ | Directory with additional outputs from the SPAdes assembler |
plasmidSPAdes | {sample}_1_reads_trim{quality_threshold}.plasmidspades.contigs.fa | The final plasmidSPAdes assembled contigs from isolates, which is the output file used by downstream analysis tools |
plasmidSPAdes | {sample}_1_reads_trim{quality_threshold}.plasmidspades/ | Directory with additional outputs from the plasmidSPAdes assembler |
QUAST (with reference) | {sample}_1_reads_trim{quality_threshold}.{assembler}-quast/ | Directory with QUAST outputs for SPAdes and/or plasmidSPAdes |
QUAST (with reference) | {sample}_1_reads_trim{quality_threshold}.{assembler}-quast/report.html | QUAST HTML report for SPAdes and/or plasmidSPAdes |
rnaSPAdes | {sample}_1_reads_trim{quality_threshold}.rnaspades.transcripts.fasta | The final rnaSPAdes assembled contigs from the transcriptome, which is the output file used by downstream analysis tools |
rnaSPAdes | {sample}_1_reads_trim{quality_threshold}.rnaspades/ | rnaSPAdes directory with additional data and statistics for rnaSPAdes |
The above files are the major outputs of the assembly workflow, and the `*contigs.fa` files are used as inputs to the Comparison and/or Functional Inference workflows.
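The naming pattern in the table above can be expanded for a given sample and set of quality thresholds. This is a small sketch for the metagenome contig files only; the function name `expected_contigs` is hypothetical, and the pattern is taken from the table rather than from the workflow code.

```shell
# Hypothetical helper: list the metagenome contig files expected for a sample,
# following the {sample}_1_reads_trim{quality_threshold}.{assembler}.contigs.fa
# pattern from the table above.
# Usage: expected_contigs SAMPLE THRESHOLD [THRESHOLD...]
expected_contigs() {
    local sample="$1"
    shift
    local assembler qual
    for assembler in metaspades megahit; do
        for qual in "$@"; do
            echo "${sample}_1_reads_trim${qual}.${assembler}.contigs.fa"
        done
    done
}

# Example for the test dataset with quality thresholds 2 and 30:
expected_contigs SRR606249_subset10 2 30
# SRR606249_subset10_1_reads_trim2.metaspades.contigs.fa
# SRR606249_subset10_1_reads_trim30.metaspades.contigs.fa
# SRR606249_subset10_1_reads_trim2.megahit.contigs.fa
# SRR606249_subset10_1_reads_trim30.megahit.contigs.fa
```

Piping this list through a loop with `[ -f "data/$f" ]` is one way to confirm that a run produced every expected file.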
To better understand how the workflows are operating, it may be helpful to see commands that could be used to generate equivalent outputs with the individual tools. Note that the file names in the below examples may not be exact replicates of the file naming conventions in the current workflows, but the commands are equivalent.
The metaSPAdes assembly of reads filtered with a quality score threshold of 2 is equivalent to running this command:
metaspades.py -m 240 -1 {sample}_1_reads_trim2_1.fq.gz -2 {sample}_1_reads_trim2_2.fq.gz -o {sample}_1_reads_trim2.metaspades
metaspades.py -m 240 -1 SRR606249_subset10_1_reads_trim2_1.fq.gz -2 SRR606249_subset10_1_reads_trim2_2.fq.gz -o SRR606249_subset10_1_reads_trim2.metaspades
The metaSPAdes assembly of reads filtered with a quality score threshold of 30 is equivalent to running this command:
metaspades.py -m 240 -1 {sample}_1_reads_trim30_1.fq.gz -2 {sample}_1_reads_trim30_2.fq.gz -o {sample}_1_reads_trim30.metaspades
metaspades.py -m 240 -1 SRR606249_subset10_1_reads_trim30_1.fq.gz -2 SRR606249_subset10_1_reads_trim30_2.fq.gz -o SRR606249_subset10_1_reads_trim30.metaspades
The QUAST evaluations of the metaSPAdes assemblies are equivalent to running these commands:
quast.py {sample}_1_reads_trim2.metaspades.contigs.fa -o {sample}_1_reads_trim2.metaspades_quast
quast.py {sample}_1_reads_trim30.metaspades.contigs.fa -o {sample}_1_reads_trim30.metaspades_quast
quast.py SRR606249_subset10_1_reads_trim2.metaspades.contigs.fa -o SRR606249_subset10_1_reads_trim2.metaspades_quast
quast.py SRR606249_subset10_1_reads_trim30.metaspades.contigs.fa -o SRR606249_subset10_1_reads_trim30.metaspades_quast
The MultiQC aggregation of the metaSPAdes QUAST reports is equivalent to running this command:
multiqc {sample}_1_reads_trim2.metaspades_quast/report.tsv {sample}_1_reads_trim30.metaspades_quast/report.tsv -n {sample}_1_reads_metaspades_multiqc_report -o {sample}_1_reads_metaspades_multiqc_report
multiqc SRR606249_subset10_1_reads_trim2.metaspades_quast/report.tsv SRR606249_subset10_1_reads_trim30.metaspades_quast/report.tsv -n SRR606249_subset10_1_reads_metaspades_multiqc_report -o SRR606249_subset10_1_reads_metaspades_multiqc_report
The MetaQUAST evaluations of the metaSPAdes assemblies are equivalent to running these commands:
metaquast.py {sample}_1_reads_trim2.metaspades.contigs.fa -R {reference} --fragmented --gene-finding -o {sample}_1_reads_trim2.metaspades_quast
metaquast.py {sample}_1_reads_trim30.metaspades.contigs.fa -R {reference} --fragmented --gene-finding -o {sample}_1_reads_trim30.metaspades_quast
metaquast.py SRR606249_subset10_1_reads_trim2.metaspades.contigs.fa -R Shakya_Refs/ --fragmented --gene-finding -o SRR606249_subset10_1_reads_trim2.metaspades_quast
metaquast.py SRR606249_subset10_1_reads_trim30.metaspades.contigs.fa -R Shakya_Refs/ --fragmented --gene-finding -o SRR606249_subset10_1_reads_trim30.metaspades_quast
The MEGAHIT assembly of reads filtered with a quality score threshold of 2 is equivalent to running this command:
megahit -1 {sample}_1_reads_trim2_1.fq.gz -2 {sample}_1_reads_trim2_2.fq.gz --out-prefix={sample}_1_reads_trim2.megahit -o {sample}_1_reads_trim2.megahit
megahit -1 SRR606249_subset10_1_reads_trim2_1.fq.gz -2 SRR606249_subset10_1_reads_trim2_2.fq.gz --out-prefix=SRR606249_subset10_1_reads_trim2.megahit -o SRR606249_subset10_1_reads_trim2.megahit
The MEGAHIT assembly of reads filtered with a quality score threshold of 30 is equivalent to running this command:
megahit -1 {sample}_1_reads_trim30_1.fq.gz -2 {sample}_1_reads_trim30_2.fq.gz --out-prefix={sample}_1_reads_trim30.megahit -o {sample}_1_reads_trim30.megahit
megahit -1 SRR606249_subset10_1_reads_trim30_1.fq.gz -2 SRR606249_subset10_1_reads_trim30_2.fq.gz --out-prefix=SRR606249_subset10_1_reads_trim30.megahit -o SRR606249_subset10_1_reads_trim30.megahit
The QUAST evaluations of the MEGAHIT assemblies are equivalent to running these commands:
quast.py {sample}_1_reads_trim2.megahit.contigs.fa -o {sample}_1_reads_trim2.megahit_quast
quast.py {sample}_1_reads_trim30.megahit.contigs.fa -o {sample}_1_reads_trim30.megahit_quast
quast.py SRR606249_subset10_1_reads_trim2.megahit.contigs.fa -o SRR606249_subset10_1_reads_trim2.megahit_quast
quast.py SRR606249_subset10_1_reads_trim30.megahit.contigs.fa -o SRR606249_subset10_1_reads_trim30.megahit_quast
The MultiQC aggregation of the MEGAHIT QUAST reports is equivalent to running this command:
multiqc {sample}_1_reads_trim2.megahit_quast/report.tsv {sample}_1_reads_trim30.megahit_quast/report.tsv -n {sample}_1_reads_megahit_multiqc_report -o {sample}_1_reads_megahit_multiqc_report
multiqc SRR606249_subset10_1_reads_trim2.megahit_quast/report.tsv SRR606249_subset10_1_reads_trim30.megahit_quast/report.tsv -n SRR606249_subset10_1_reads_megahit_multiqc_report -o SRR606249_subset10_1_reads_megahit_multiqc_report
The MetaQUAST evaluations of the MEGAHIT assemblies are equivalent to running these commands:
metaquast.py {sample}_1_reads_trim2.megahit.contigs.fa -R {reference} --fragmented --gene-finding -o {sample}_1_reads_trim2.megahit_quast
metaquast.py {sample}_1_reads_trim30.megahit.contigs.fa -R {reference} --fragmented --gene-finding -o {sample}_1_reads_trim30.megahit_quast
metaquast.py SRR606249_subset10_1_reads_trim2.megahit.contigs.fa -R Shakya_Refs/ --fragmented --gene-finding -o SRR606249_subset10_1_reads_trim2.megahit_quast
metaquast.py SRR606249_subset10_1_reads_trim30.megahit.contigs.fa -R Shakya_Refs/ --fragmented --gene-finding -o SRR606249_subset10_1_reads_trim30.megahit_quast
The SPAdes assembly of reads filtered with a quality score threshold of 2 is equivalent to running this command:
spades.py -k {k_values} -1 {sample}_1_reads_trim2_1.fq.gz -2 {sample}_1_reads_trim2_2.fq.gz -o {sample}_1_reads_trim2_k{k_values}.spades
spades.py -k 21,33,55 -1 SRR606249_subset10_1_reads_trim2_1.fq.gz -2 SRR606249_subset10_1_reads_trim2_2.fq.gz -o SRR606249_subset10_1_reads_trim2_k21_33_55.spades
The SPAdes assembly of reads filtered with a quality score threshold of 30 is equivalent to running this command:
spades.py -k {k_values} -1 {sample}_1_reads_trim30_1.fq.gz -2 {sample}_1_reads_trim30_2.fq.gz -o {sample}_1_reads_trim30_k{k_values}.spades
spades.py -k 21,33,55 -1 SRR606249_subset10_1_reads_trim30_1.fq.gz -2 SRR606249_subset10_1_reads_trim30_2.fq.gz -o SRR606249_subset10_1_reads_trim30_k21_33_55.spades
The QUAST evaluations of the SPAdes assemblies with a reference assembly are equivalent to running these commands:
quast.py {sample}_1_reads_trim2.spades.contigs.fa -R {reference_assembly} -o {sample}_1_reads_trim2.spades_quast
quast.py {sample}_1_reads_trim30.spades.contigs.fa -R {reference_assembly} -o {sample}_1_reads_trim30.spades_quast
quast.py SRR606249_subset10_1_reads_trim2.spades.contigs.fa -R GCF_000008565.1_ASM856v1_genomic.fna.gz -o SRR606249_subset10_1_reads_trim2.spades_quast
quast.py SRR606249_subset10_1_reads_trim30.spades.contigs.fa -R GCF_000008565.1_ASM856v1_genomic.fna.gz -o SRR606249_subset10_1_reads_trim30.spades_quast
The plasmidSPAdes assembly of reads filtered with a quality score threshold of 2 is equivalent to running this command:
plasmidspades.py -1 {sample}_1_reads_trim2_1.fq.gz -2 {sample}_1_reads_trim2_2.fq.gz -o {sample}_1_reads_trim2.plasmidspades
plasmidspades.py -1 SRR606249_subset10_1_reads_trim2_1.fq.gz -2 SRR606249_subset10_1_reads_trim2_2.fq.gz -o SRR606249_subset10_1_reads_trim2.plasmidspades
The plasmidSPAdes assembly of reads filtered with a quality score threshold of 30 is equivalent to running this command:
plasmidspades.py -1 {sample}_1_reads_trim30_1.fq.gz -2 {sample}_1_reads_trim30_2.fq.gz -o {sample}_1_reads_trim30.plasmidspades
plasmidspades.py -1 SRR606249_subset10_1_reads_trim30_1.fq.gz -2 SRR606249_subset10_1_reads_trim30_2.fq.gz -o SRR606249_subset10_1_reads_trim30.plasmidspades
The QUAST evaluations of the plasmidSPAdes assemblies with a reference assembly are equivalent to running these commands:
quast.py {sample}_1_reads_trim2.plasmidspades.contigs.fa -R {reference_assembly} -o {sample}_1_reads_trim2.plasmidspades-quast
quast.py {sample}_1_reads_trim30.plasmidspades.contigs.fa -R {reference_assembly} -o {sample}_1_reads_trim30.plasmidspades-quast
quast.py SRR606249_subset10_1_reads_trim2.plasmidspades.contigs.fa -R GCF_000008565.1_ASM856v1_genomic.fna.gz -o SRR606249_subset10_1_reads_trim2.plasmidspades-quast
quast.py SRR606249_subset10_1_reads_trim30.plasmidspades.contigs.fa -R GCF_000008565.1_ASM856v1_genomic.fna.gz -o SRR606249_subset10_1_reads_trim30.plasmidspades-quast
The rnaSPAdes assembly of reads filtered with a quality score threshold of 2 is equivalent to running this command:
rnaspades.py -m 240 -1 {sample}_1_reads_trim2_1.fq.gz -2 {sample}_1_reads_trim2_2.fq.gz -o {sample}_1_reads_trim2.rnaspades
rnaspades.py -m 240 -1 SRR606249_subset10_1_reads_trim2_1.fq.gz -2 SRR606249_subset10_1_reads_trim2_2.fq.gz -o SRR606249_subset10_1_reads_trim2.rnaspades
The rnaSPAdes assembly of reads filtered with a quality score threshold of 30 is equivalent to running this command:
rnaspades.py -m 240 -1 {sample}_1_reads_trim30_1.fq.gz -2 {sample}_1_reads_trim30_2.fq.gz -o {sample}_1_reads_trim30.rnaspades
rnaspades.py -m 240 -1 SRR606249_subset10_1_reads_trim30_1.fq.gz -2 SRR606249_subset10_1_reads_trim30_2.fq.gz -o SRR606249_subset10_1_reads_trim30.rnaspades
The MetaQUAST evaluations of the rnaSPAdes assemblies are equivalent to running these commands:
metaquast.py {sample}_1_reads_trim2.rnaspades.transcripts.fasta -R {reference} --fragmented --gene-finding -o {sample}_1_reads_trim2.rnaspades_quast
metaquast.py {sample}_1_reads_trim30.rnaspades.transcripts.fasta -R {reference} --fragmented --gene-finding -o {sample}_1_reads_trim30.rnaspades_quast
metaquast.py SRR606249_subset10_1_reads_trim2.rnaspades.transcripts.fasta -R Shakya_Refs/ --fragmented --gene-finding -o SRR606249_subset10_1_reads_trim2.rnaspades_quast
metaquast.py SRR606249_subset10_1_reads_trim30.rnaspades.transcripts.fasta -R Shakya_Refs/ --fragmented --gene-finding -o SRR606249_subset10_1_reads_trim30.rnaspades_quast
The MultiQC aggregation of the rnaSPAdes MetaQUAST reports is equivalent to running this command:
multiqc {sample}_1_reads_trim2.rnaspades_quast/report.tsv {sample}_1_reads_trim30.rnaspades_quast/report.tsv -n {sample}_1_reads_rnaspades_multiqc_report -o {sample}_1_reads_rnaspades_multiqc_report
multiqc SRR606249_subset10_1_reads_trim2.rnaspades_quast/report.tsv SRR606249_subset10_1_reads_trim30.rnaspades_quast/report.tsv -n SRR606249_subset10_1_reads_rnaspades_multiqc_report -o SRR606249_subset10_1_reads_rnaspades_multiqc_report
Below is a more detailed description of the output files expected in the `metscale/workflows/data/` directory after the assembly workflow has been successfully run.
Using the filtered reads generated by the Read Filtering Workflow:
File Name | File Size |
---|---|
SRR606249_subset10_1_reads_trim2_1.fq.gz | 365 MB |
SRR606249_subset10_1_reads_trim2_2.fq.gz | 359 MB |
SRR606249_subset10_1_reads_trim30_1.fq.gz | 313 MB |
SRR606249_subset10_1_reads_trim30_2.fq.gz | 300 MB |
The following files are produced by SPAdes after assembling the filtered reads from isolates with the `assembly_spades_workflow` rule:
File Name | File Size |
---|---|
SRR606249_subset10_1_reads_trim2_k21_33_55.spades.contigs.fa | 150 MB |
SRR606249_subset10_1_reads_trim2_k21_33_55.spades/ | 779 MB |
SRR606249_subset10_1_reads_trim30_k21_33_55.spades.contigs.fa | 139 MB |
SRR606249_subset10_1_reads_trim30_k21_33_55.spades/ | 726 MB |
The following files are produced by plasmidSPAdes after assembling plasmids from the filtered reads of isolates with the `assembly_plasmidspades_workflow` rule:
File Name | File Size |
---|---|
SRR606249_subset10_1_reads_trim2.plasmidspades.contigs.fa | 22 MB |
SRR606249_subset10_1_reads_trim2.plasmidspades/ | 109 MB |
SRR606249_subset10_1_reads_trim30.plasmidspades.contigs.fa | 20 MB |
SRR606249_subset10_1_reads_trim30.plasmidspades/ | 101 MB |
The following files are produced by metaSPAdes after assembling the filtered reads with the `assembly_metaspades_workflow` or `assembly_all_workflow` rule*:
File Name | File Size |
---|---|
SRR606249_subset10_1_reads_trim2.metaspades.contigs.fa | 153 MB |
SRR606249_subset10_1_reads_trim2.metaspades/ | 978 MB |
SRR606249_subset10_1_reads_trim30.metaspades.contigs.fa | 142 MB |
SRR606249_subset10_1_reads_trim30.metaspades/ | 909 MB |
The following files are produced by MEGAHIT after assembling the filtered reads with the `assembly_megahit_workflow` or `assembly_all_workflow` rule*:
File Name | File Size |
---|---|
SRR606249_subset10_1_reads_trim2.megahit.contigs.fa | 127 MB |
SRR606249_subset10_1_reads_trim2.megahit/ | 108 KB |
SRR606249_subset10_1_reads_trim30.megahit.contigs.fa | 115 MB |
SRR606249_subset10_1_reads_trim30.megahit/ | 104 KB |
*Additional files generated by the metaSPAdes and MEGAHIT assemblers are saved in the sub-directories listed above.
The following files are produced by QUAST after evaluating the assemblies with the `assembly_quast_workflow` rule:
File Name | File Size |
---|---|
SRR606249_subset10_1_reads_trim2.metaspades_quast/ | 744 KB |
SRR606249_subset10_1_reads_trim2.metaspades_quast/report.html | 568 KB |
SRR606249_subset10_1_reads_trim30.metaspades_quast/ | 736 KB |
SRR606249_subset10_1_reads_trim30.metaspades_quast/report.html | 556 KB |
SRR606249_subset10_1_reads_trim2.megahit_quast/ | 748 KB |
SRR606249_subset10_1_reads_trim2.megahit_quast/report.html | 574 KB |
SRR606249_subset10_1_reads_trim30.megahit_quast/ | 732 KB |
SRR606249_subset10_1_reads_trim30.megahit_quast/report.html | 557 KB |
The following files are produced by MultiQC after aggregating the QUAST reports with the `assembly_multiqc_workflow` rule:
File Name | File Size |
---|---|
SRR606249_subset10_1_reads.megahit_multiqc_report_data/ | 60 KB |
SRR606249_subset10_1_reads.megahit_multiqc_report.html | 1.1 MB |
SRR606249_subset10_1_reads.metaspades_multiqc_report_data/ | 60 KB |
SRR606249_subset10_1_reads.metaspades_multiqc_report.html | 1.1 MB |
The table below summarizes statistics from the QUAST evaluations of SRR606249_subset10_1_reads assemblies in the final MultiQC report.
Sample Name | N50 (Kbp) | N75 (Kbp) | L50 (K) | L75 (K) | Largest contig (Kbp) | Length (Mbp) |
---|---|---|---|---|---|---|
SRR606249_subset10_1_reads_trim2.metaspades.contigs | 2.5 | 1.0 | 7.2 | 26,791.0 | 264.3 | 115.8 |
SRR606249_subset10_1_reads_trim30.metaspades.contigs | 2.5 | 1.0 | 8.1 | 27,066.0 | 172.8 | 104.1 |
SRR606249_subset10_1_reads_trim2.megahit.contigs | 2.9 | 1.1 | 6.1 | 23,014.0 | 264.4 | 109.7 |
SRR606249_subset10_1_reads_trim30.megahit.contigs | 2.4 | 1.0 | 7.0 | 23,574.0 | 212.1 | 97.2 |
The statistics from the MEGAHIT and metaSPAdes assemblies for this sample are similar, although this does not assess potential differences in the taxonomic or functional content of the assembled contigs.
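For readers unfamiliar with the N50 and L50 columns: N50 is the contig length at which contigs of that length or longer cover at least half of the total assembly length, and L50 is the number of contigs needed to reach that point. The sketch below illustrates the arithmetic on a list of contig lengths; it is not QUAST's implementation, and the function name `n50_l50` is hypothetical. QUAST also applies a minimum contig length threshold before computing these statistics, so values computed from unfiltered lengths may differ from its reports.

```shell
# Hypothetical helper: read contig lengths (one per line) on stdin and
# print N50 and L50. Lengths are sorted longest first, then accumulated
# until the running sum reaches half of the total length.
n50_l50() {
    sort -rn | awk '
        { len[NR] = $1; total += $1 }
        END {
            if (NR == 0) exit 1
            half = total / 2
            for (i = 1; i <= NR; i++) {
                cum += len[i]
                if (cum >= half) {
                    print "N50=" len[i], "L50=" i
                    exit
                }
            }
        }'
}

# Toy example: total length is 290, so half is 145; the two longest
# contigs (80 + 70 = 150) are enough to reach it.
printf '%s\n' 20 80 40 70 30 50 | n50_l50
# N50=70 L50=2
```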
The following files are produced by QUAST after evaluating the assemblies with a reference for the `assembly_quast_reference_with_spades_workflow` and `assembly_quast_reference_with_plasmidspades_workflow` rules:
File Name | File Size |
---|---|
SRR606249_subset10_1_reads_trim2.spades_quast/ | 1.1 MB |
SRR606249_subset10_1_reads_trim2.spades_quast/report.html | 856 KB |
SRR606249_subset10_1_reads_trim30.spades_quast/ | 1.1 MB |
SRR606249_subset10_1_reads_trim30.spades_quast/report.html | 813 KB |
SRR606249_subset10_1_reads_trim2.plasmidspades-quast/ | 664 KB |
SRR606249_subset10_1_reads_trim2.plasmidspades-quast/report.html | 443 KB |
SRR606249_subset10_1_reads_trim30.plasmidspades-quast/ | 676 KB |
SRR606249_subset10_1_reads_trim30.plasmidspades-quast/report.html | 455 KB |
The following files are produced by MetaQUAST after evaluating the assemblies with reference(s) for metaSPAdes and/or MEGAHIT with the `assembly_metaquast_workflow` rule:
File Name | File Size |
---|---|
SRR606249_subset10_1_reads_trim2.metaspades_metaquast/ | 1.1 MB |
SRR606249_subset10_1_reads_trim2.metaspades_metaquast/report.html | 927 KB |
SRR606249_subset10_1_reads_trim30.metaspades_metaquast/ | 1.1 MB |
SRR606249_subset10_1_reads_trim30.metaspades_metaquast/report.html | 925 KB |
SRR606249_subset10_1_reads_trim2.megahit_metaquast/ | 1.1 MB |
SRR606249_subset10_1_reads_trim2.megahit_metaquast/report.html | 926 KB |
SRR606249_subset10_1_reads_trim30.megahit_metaquast/ | 1.1 MB |
SRR606249_subset10_1_reads_trim30.megahit_metaquast/report.html | 924 KB |
The following files are produced by rnaSPAdes after assembling the transcript sequence reads with the `assembly_rnaspades_workflow` rule:
File Name | File Size |
---|---|
SRR606249_subset10_1_reads_trim2.rnaspades.transcripts.fasta | 171 MB |
SRR606249_subset10_1_reads_trim2.rnaspades/ | 673 MB |
SRR606249_subset10_1_reads_trim30.rnaspades.transcripts.fasta | 156 MB |
SRR606249_subset10_1_reads_trim30.rnaspades/ | 623 MB |
The following files are produced by MetaQUAST after evaluating the rnaSPAdes assemblies with reference(s) for the `assembly_rnaspades_metaquast_workflow` rule:
File Name | File Size |
---|---|
SRR606249_subset10_1_reads_trim2.rnaspades_metaquast_report_data/ | 1.1 MB |
SRR606249_subset10_1_reads_trim30.rnaspades_metaquast_report_data/ | 1.1 MB |
The following files are produced by MultiQC after aggregating the rnaSPAdes MetaQUAST reports with the `assembly_rnaspades_multiqc_workflow` rule:
File Name | File Size |
---|---|
SRR606249_subset10_1_reads_trim2.rnaspades_multiqc_report_data/ | 64 KB |
SRR606249_subset10_1_reads_trim2.rnaspades_multiqc_report.html | 1.1 MB |
SRR606249_subset10_1_reads_trim30.rnaspades_multiqc_report_data/ | 64 KB |
SRR606249_subset10_1_reads_trim30.rnaspades_multiqc_report.html | 1.1 MB |