Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Amrplusplus #156

Merged
merged 32 commits into from
Sep 28, 2020
Merged
Show file tree
Hide file tree
Changes from 7 commits
Commits
Show all changes
32 commits
Select commit Hold shift + click to select a range
a37f835
amrplusplus
AroArz Aug 17, 2020
0d594c7
amrplusplus scripts
AroArz Aug 17, 2020
efd328b
added scripts
AroArz Aug 17, 2020
9855d7a
changed input in config.yaml
AroArz Aug 17, 2020
4877fa5
default db empty
AroArz Aug 17, 2020
a3fe561
db -> megares
AroArz Aug 17, 2020
36e2084
annotation --> megares_annotation
AroArz Aug 17, 2020
cd56d05
added amrplusplus to circleci
AroArz Aug 17, 2020
64eb927
updated CHANGELOG.md
AroArz Aug 17, 2020
d7e75d3
updated amrplusplus in config.yaml
AroArz Aug 17, 2020
4c433b7
added amrplusplus to modules.rst
AroArz Aug 17, 2020
ee715a6
updated amrplusplus.smk
AroArz Aug 17, 2020
1bc1f90
added license and removed bin dir
AroArz Aug 17, 2020
32dbfe8
updated modules.rst
AroArz Aug 17, 2020
225bfc7
added align_to_amr
AroArz Aug 19, 2020
9ae5550
fixed default database
AroArz Aug 19, 2020
0b3e141
amrplusplus db empty in conf
AroArz Aug 19, 2020
038d5c2
indentations
AroArz Aug 19, 2020
18bc69c
clarification regarding database
AroArz Aug 19, 2020
8d9ec50
Update docs/source/modules.rst
AroArz Aug 20, 2020
977465a
Update rules/antibiotic_resistance/amrplusplus.smk
AroArz Aug 20, 2020
759ace0
Update docs/source/modules.rst grammar
AroArz Aug 20, 2020
c224c41
Update CHANGELOG.md
AroArz Aug 20, 2020
42c5259
added bwa to stag-mwc.yaml
Aug 20, 2020
c3887bc
added conda environment
AroArz Aug 20, 2020
23a7d3b
amrplusplus exectuable with conda
AroArz Aug 20, 2020
feb9ac8
amrplusplus executable with conda
AroArz Aug 20, 2020
136a59f
samtools version 1.10
AroArz Aug 20, 2020
b01cce6
amrplusplus conda executable
AroArz Aug 20, 2020
6201286
added amrplusplus.yaml to not break stag-mwc.yaml dependencies
AroArz Aug 26, 2020
c98e63c
removed amrplusplus dependencies because they broke it
AroArz Aug 26, 2020
34c44c9
now using amrplusplus.yaml for conda instead of stag-mwc.yaml
AroArz Aug 26, 2020
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions Snakefile
Original file line number Diff line number Diff line change
Expand Up @@ -75,6 +75,7 @@ include: "rules/functional_profiling/humann2.smk"
# Antibiotic resistance
#############################
include: "rules/antibiotic_resistance/groot.smk"
include: "rules/antibiotic_resistance/amrplusplus.smk"

#############################
# Mappers
Expand Down
20 changes: 17 additions & 3 deletions config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -35,7 +35,9 @@ taxonomic_profile:
metaphlan2: False
functional_profile:
humann2: False
antibiotic_resistance: False
antibiotic_resistance:
groot: False
amrplusplus: True
mappers:
bbmap: False
bowtie2: False
Expand Down Expand Up @@ -123,7 +125,20 @@ groot:
maxlength: 125 # Maxlength for groot index
covcutoff: 0.97 # Coverage cutoff for groot report
lowcov: False # Report ARGs with no 5' or 3' coverage. Overrides covcutoff.

amrplusplus:
databases:
megares_fasta: "scripts/amrplusplus/db/megares_modified_database_v2.00.fasta"
megares_annotation: "scripts/amrplusplus/db/megares_modified_annotations_v2.00.csv"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about

Suggested change
amrplusplus:
databases:
megares_fasta: "scripts/amrplusplus/db/megares_modified_database_v2.00.fasta"
megares_annotation: "scripts/amrplusplus/db/megares_modified_annotations_v2.00.csv"
amrplusplus:
megares:
fasta: "scripts/amrplusplus/db/megares_modified_database_v2.00.fasta"
annotation: "scripts/amrplusplus/db/megares_modified_annotations_v2.00.csv"

rarefaction:
script: "scripts/amrplusplus/bin/rarefaction" # path to rarefaction script
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have not utilized this way of referencing script paths in any previous StaG module. I'm not entirely sure that this belongs in the config file, as usage of the script is tightly coupled to the call to the script (in the rule body), so we might as well just hardcode the script path in there... Thoughts?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure its easy to change. My main idea with this type of referencing was if we would ever change the structure of scripts folder we would only need to change the path in config.yaml and not touch the rulefile itself.

Copy link
Member

@boulund boulund Aug 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to only have stuff that end-users should be able to modify in the config file. You can still use the params section to define the path to the script for each rule just like you currently have, just that you hardcode it in place there instead of keeping it here in the config.

min: "5" # starting rarefaction level
max: "100" # ending rarefaction level
skip: "5" # number of levels to skip
samples: "1" # number of iterations to sample at
resistome:
script: "scripts/amrplusplus/bin/resistome" # path to resistome script
threshold: "80" # resistome threshold
amr_long_to_wide: "scripts/amrplusplus/bin/amr_long_to_wide.py" # path to amr_long_to_wide.py script

#########################
# Mappers
Expand Down Expand Up @@ -162,4 +177,3 @@ metawrap:
universal: "--universal" # Use universal marker genes
minimum_completion: 70
maximum_contamination: 10

55 changes: 55 additions & 0 deletions envs/Singularity.amrplusplus
Original file line number Diff line number Diff line change
@@ -0,0 +1,55 @@
Bootstrap: docker
From: debian:jessie-slim

#Includes trimmomatic, samtools, bwa, bedtools, vcftools, htslib, kraken2, SNPfinder, freebayes, bbmap

%environment
export LC_ALL=C

%post
apt update \
&& apt install -y --no-install-recommends \
build-essential ca-certificates sudo tcsh\
git make automake autoconf openjdk-7-jre wget gzip unzip sed\
zlib1g-dev curl libbz2-dev locales libncurses5-dev liblzma-dev libcurl4-openssl-dev software-properties-common apt-transport-https\
python3-pip python3-docopt python3-pytest python-dev python3-dev\
libcurl4-openssl-dev libssl-dev zlib1g-dev fonts-texgyre \
gcc g++ gfortran libblas-dev liblapack-dev dos2unix libstdc++6\
r-base-core r-recommended hmmer\
&& rm -rf /var/lib/apt/lists/*


wget -c https://repo.continuum.io/archive/Anaconda3-2020.02-Linux-x86_64.sh
sh Anaconda3-2020.02-Linux-x86_64.sh -bfp /usr/local

# add bioconda channels
conda config --add channels defaults
conda config --add channels conda-forge
conda config --add channels bioconda

# install bulk of bioinformatic tools using conda
conda create -n AmrPlusPlus_env python=3 trimmomatic bwa samtools bedtools freebayes bbmap vcftools htslib kraken2

. /usr/local/bin/activate AmrPlusPlus_env

#ln -s /usr/local/envs/AmrPlusPlus_env/bin/* /usr/local/bin/

#Still experimenting with how to change $PATH location.
echo 'export PATH=$PATH:/usr/local/envs/AmrPlusPlus_env/bin/' >> $SINGULARITY_ENVIRONMENT

# SNPfinder
cd /usr/local
git clone https://github.com/cdeanj/snpfinder.git
cd snpfinder
make
cp snpfinder /usr/local/bin
cd /

# Make sure all the tools have the right permissions to use the tools
chmod -R 777 /usr/local/

%test

%labels
Author meglab-metagenomics
version amrplusplus_v2
179 changes: 179 additions & 0 deletions rules/antibiotic_resistance/amrplusplus.smk
Original file line number Diff line number Diff line change
@@ -0,0 +1,179 @@
# vim: syntax=python expandtab tabstop=4
# AMRPlusPlus v2.0 snakemake integration
# Aron Arzoomand 12/8 2020
AroArz marked this conversation as resolved.
Show resolved Hide resolved

from pathlib import Path, PurePath
from snakemake.exceptions import WorkflowError



localrules:
build_amr_index,


if config["antibiotic_resistance"]["amrplusplus"]:
amrplusplus_outputs = f"{OUTDIR}/amrplusplus/ResistomeResults/AMR_analytics_matrix.csv"
all_outputs.append(amrplusplus_outputs)

citations.add(publications["AMRPlusPlus2"])


amrplusplus_config=config["amrplusplus"]


rule build_amr_index:
"""Get the Megares_2.0 database and annotations"""
output:
megares_db=amrplusplus_config["databases"]["db"],
megares_annot=amrplusplus_config["databases"]["annotation"],
log:
stdout=f"{LOGDIR}/amrplusplus/build_amr.index.stdout.log",
stderr=f"{LOGDIR}/amrplusplus/build_amr.index.stderr.log",
shadow:
"shallow"
singularity:
"shub://meglab-metagenomics/amrplusplus_v2"
shell:
"""
wget -O {output.megares_db} https://raw.githubusercontent.com/meglab-metagenomics/amrplusplus_v2/master/data/amr/megares_modified_database_v2.00.fasta
wget -O {output.megares_annot} https://raw.githubusercontent.com/meglab-metagenomics/amrplusplus_v2/master/data/amr/megares_modified_annotations_v2.00.csv
bwa index {output.megares_db} \
2> {log.stderr} \
> {log.stdout}
"""

rule align_to_amr:
"""BWA to align host filtered reads to Megares_2.0 database"""
input:
R1=f"{OUTDIR}/host_removal/{{sample}}_1.fq.gz",
R2=f"{OUTDIR}/host_removal/{{sample}}_2.fq.gz",
megares_db=amrplusplus_config["databases"]["db"],
output:
alignment=f"{OUTDIR}/amrplusplus/AlignToAMR/{{sample}}.amr.alignment.sam"
log:
stdout=f"{LOGDIR}/amrplusplus/{{sample}}.align_to_amr.stdout.log",
stderr=f"{LOGDIR}/amrplusplus/{{sample}}.align_to_amr.stderr.log",
shadow:
"shallow"
singularity:
"shub://meglab-metagenomics/amrplusplus_v2"
params:
megares_db=amrplusplus_config["databases"]["db"],
shell:
"""
bwa mem \
{params.megares_db} \
{input.R1} \
{input.R2} \
-t 10 \
-R '@RG\\tID:{wildcards.sample}\\tSM:{wildcards.sample}' > {output.alignment}
"""

rule run_resistome:
"""Annotate alignments"""
input:
alignment=f"{OUTDIR}/amrplusplus/AlignToAMR/{{sample}}.amr.alignment.sam",
output:
gene=f"{OUTDIR}/amrplusplus/RunResistome/{{sample}}.gene.tsv",
group=f"{OUTDIR}/amrplusplus/RunResistome/{{sample}}.group.tsv",
mech=f"{OUTDIR}/amrplusplus/RunResistome/{{sample}}.mechanism.tsv",
klass=f"{OUTDIR}/amrplusplus/RunResistome/{{sample}}.class.tsv",
typ=f"{OUTDIR}/amrplusplus/RunResistome/{{sample}}.type.tsv",
log:
stdout=f"{LOGDIR}/amrplusplus/{{sample}}.run_resistome.stdout.log",
stderr=f"{LOGDIR}/amrplusplus/{{sample}}.run_resistome.stderr.log",
shadow:
"shallow"
singularity:
"shub://meglab-metagenomics/amrplusplus_v2"
params:
script=amrplusplus_config["resistome"]["script"],
threshold=amrplusplus_config["resistome"]["threshold"],
megares_db=amrplusplus_config["databases"]["db"],
megares_annot=amrplusplus_config["databases"]["annotation"],
shell:
"""
{params.script} \
-ref_fp {params.megares_db} \
-annot_fp {params.megares_annot} \
-sam_fp {input.alignment} \
-gene_fp {output.gene} \
-group_fp {output.group} \
-mech_fp {output.mech} \
-class_fp {output.klass} \
-type_fp {output.typ} \
-t {params.threshold} \
2> {log.stderr} \
> {log.stdout}
"""

rule run_rarefaction:
"""Running rarefaction"""
input:
alignment=f"{OUTDIR}/amrplusplus/AlignToAMR/{{sample}}.amr.alignment.sam",
output:
gene=f"{OUTDIR}/amrplusplus/RunRarefaction/{{sample}}.gene.tsv",
group=f"{OUTDIR}/amrplusplus/RunRarefaction/{{sample}}.group.tsv",
mech=f"{OUTDIR}/amrplusplus/RunRarefaction/{{sample}}.mechanism.tsv",
klass=f"{OUTDIR}/amrplusplus/RunRarefaction/{{sample}}.class.tsv",
typ=f"{OUTDIR}/amrplusplus/RunRarefaction/{{sample}}.type.tsv",
log:
stdout=f"{LOGDIR}/amrplusplus/{{sample}}.run_rarefaction.stdout.log",
stderr=f"{LOGDIR}/amrplusplus/{{sample}}.run_rarefaction.stderr.log",
shadow:
"shallow"
singularity:
"shub://meglab-metagenomics/amrplusplus_v2"
params:
megares_db=amrplusplus_config["databases"]["db"],
megares_annot=amrplusplus_config["databases"]["annotation"],
script=amrplusplus_config["rarefaction"]["script"],
min=amrplusplus_config["rarefaction"]["min"],
max=amrplusplus_config["rarefaction"]["max"],
skip=amrplusplus_config["rarefaction"]["skip"],
samples=amrplusplus_config["rarefaction"]["samples"],
threshold=amrplusplus_config["resistome"]["threshold"],
shell:
"""
{params.script} \
-ref_fp {params.megares_db} \
-sam_fp {input.alignment} \
-annot_fp {params.megares_annot} \
-gene_fp {output.gene} \
-group_fp {output.group} \
-mech_fp {output.mech} \
-class_fp {output.klass} \
-type_fp {output.typ} \
-min {params.min} \
-max {params.max} \
-skip {params.skip} \
-samples {params.samples} \
-t {params.threshold} \
2> {log.stderr} \
> {log.stdout}
"""

rule resistome_results:
"""Creating matrix of all gene hits"""
input:
gene_resistome=expand(f"{OUTDIR}/amrplusplus/RunResistome/{{sample}}.gene.tsv", sample=SAMPLES),
gene_rarefaction=expand(f"{OUTDIR}/amrplusplus/RunRarefaction/{{sample}}.gene.tsv", sample=SAMPLES)
output:
AMR_matrix=f"{OUTDIR}/amrplusplus/ResistomeResults/AMR_analytics_matrix.csv"
log:
stdout=f"{LOGDIR}/amrplusplus/resistome_results.stdout.log",
stderr=f"{LOGDIR}/amrplusplus/resistome_results.stderr.log",
shadow:
"shallow"
singularity:
"shub://meglab-metagenomics/amrplusplus_v2"
params:
script=amrplusplus_config["amr_long_to_wide"]
shell:
"""
python3 {params.script} \
-i {input.gene_resistome} \
-o {output.AMR_matrix} \
2> {log.stderr} \
> {log.stdout}
"""
2 changes: 1 addition & 1 deletion rules/antibiotic_resistance/groot.smk
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@ localrules:


groot_db_path = Path(config["groot"]["index"])
if config["antibiotic_resistance"]:
if config["antibiotic_resistance"]["groot"]:
if not Path(groot_db_path).exists():
err_message = "No groot database found at: '{}'!\n".format(groot_db_path)
err_message += "Specify the DB path in the groot section of config.yaml.\n"
Expand Down
1 change: 1 addition & 0 deletions rules/publications.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@
"Bracken": "Lu J, Breitwieser FP, Thielen P, Salzberg SL (2017). Bracken: estimating species abundance in metagenomics data. PeerJ Computer Science 3:e104. https://doi.org/10.7717/peerj-cs.104",
"CONCOCT": "Alneberg, J., Bjarnason, B.S., de Bruijn, I., Schirmer, M., Quick, J., Ijaz, U.Z., Lahti, L., Loman, N.J, Andersson, A.F, Quince, C. (2014). Binning metagenomic contigs by coverage and composition. Nature Methods, https://doi.org/10.1038/nmeth.3103",
"GROOT": "Rowe WPM, Winn MD (2018). Indexed variation graphs for efficient and accurate resistome profiling. Bioinformatics, Volume 34, Issue 21. https://doi.org/10.1093/bioinformatics/bty387",
"AMRPlusPlus2": "Doster E, Lakin S, Dean C, Wlfe C, Young J, Boucher C et al. (2019). MEGAREs 2.0: a database for classification of antimicrobial drug, biocide and metal resistance determinants in metagenomic sequence data. Nucleic Acids Research, Volume 48, Issue D1, 08 January 2020, Pages D561–D569, https://doi.org/10.1093/nar/gkz1010",
"HUMAnN2": "Franzosa EA*, McIver LJ*, et al. (2018). Species-level functional profiling of metagenomes and metatranscriptomes. Nat Methods 15: 962-968. https://doi.org/10.1038/s41592-018-0176-y",
"Kaiju": "Menzel, P., Ng, K. L., & Krogh, A. (2016). Fast and sensitive taxonomic classification for metagenomics with Kaiju. Nature communications, 7, 11257. Available online at: https://github.com/bioinformatics-centre/kaiju",
"Kraken2": "Wood, D.E., Lu, J., & Langmead, B. (2019). Improved metagenomic analysis with Kraken 2. Genome biology, 20, 257. https://doi.org/10.1186/s13059-019-1891-0",
Expand Down
67 changes: 67 additions & 0 deletions scripts/amrplusplus/bin/amr_long_to_wide.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,67 @@
#!/usr/bin/env python3

__author__ = "Steven Lakin"
__copyright__ = ""
__credits__ = ["Steven Lakin"]
__version__ = ""
__maintainer__ = "lakinsm"
__email__ = "lakinsm@colostate.edu"
__status__ = "Cows go moo."

import argparse
import sys

amr_level_names = {0: 'Class', 1: 'Mechanism', 2: 'Group'}

def parse_cmdline_params(cmdline_params):
info = ""
parser = argparse.ArgumentParser(description=info)
parser.add_argument('-i', '--input_files', nargs='+', required=True,
help='Use globstar to pass a list of files, (Ex: *.tsv)')
parser.add_argument('-o', '--output_file', required=True,
help='Output file name for writing the AMR_analytic_matrix.csv file')
return parser.parse_args(cmdline_params)

def amr_load_data(file_name_list):
samples = {}
labels = set()
for file in file_name_list:
with open(file, 'r') as f:
data = f.read().split('\n')[1:]
for entry in data:
if not entry:
continue
entry = entry.split('\t')
sample = entry[0].split('.')[0]
count = float(entry[2])
gene_name = entry[1]
try:
samples[sample][gene_name] = count
except KeyError:
try:
samples[sample].setdefault(gene_name, count)
except KeyError:
samples.setdefault(sample, {gene_name: count})
labels.add(gene_name)
return samples, labels

def output_amr_analytic_data(outfile, S, L):
with open(outfile, 'w') as amr:
local_sample_names = []
for sample, dat in S.items():
local_sample_names.append(sample)
amr.write(','.join(local_sample_names) + '\n')
for label in L:
local_counts = []
amr.write(label + ',')
for local_sample in local_sample_names:
if label in S[local_sample]:
local_counts.append(str(S[local_sample][label]))
else:
local_counts.append(str(0))
amr.write(','.join(local_counts) + '\n')

if __name__ == '__main__':
opts = parse_cmdline_params(sys.argv[1:])
S, L = amr_load_data(opts.input_files)
output_amr_analytic_data(opts.output_file, S, L)
Binary file added scripts/amrplusplus/bin/rarefaction
Binary file not shown.
Binary file added scripts/amrplusplus/bin/resistome
Binary file not shown.