-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Amrplusplus #156
Amrplusplus #156
Changes from 7 commits
a37f835
0d594c7
efd328b
9855d7a
4877fa5
a3fe561
36e2084
cd56d05
64eb927
d7e75d3
4c433b7
ee715a6
1bc1f90
32dbfe8
225bfc7
9ae5550
0b3e141
038d5c2
18bc69c
8d9ec50
977465a
759ace0
c224c41
42c5259
c3887bc
23a7d3b
feb9ac8
136a59f
b01cce6
6201286
c98e63c
34c44c9
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -35,7 +35,9 @@ taxonomic_profile: | |
metaphlan2: False | ||
functional_profile: | ||
humann2: False | ||
antibiotic_resistance: False | ||
antibiotic_resistance: | ||
groot: False | ||
amrplusplus: True | ||
mappers: | ||
bbmap: False | ||
bowtie2: False | ||
|
@@ -123,7 +125,20 @@ groot: | |
maxlength: 125 # Maxlength for groot index | ||
covcutoff: 0.97 # Coverage cutoff for groot report | ||
lowcov: False # Report ARGs with no 5' or 3' coverage. Overrides covcutoff. | ||
|
||
amrplusplus: | ||
databases: | ||
megares_fasta: "scripts/amrplusplus/db/megares_modified_database_v2.00.fasta" | ||
megares_annotation: "scripts/amrplusplus/db/megares_modified_annotations_v2.00.csv" | ||
rarefaction: | ||
script: "scripts/amrplusplus/bin/rarefaction" # path to rarefaction script | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We have not utilized this way of referencing script paths in any previous StaG module. I'm not entirely sure that this belongs in the config file, as usage of the script is tightly coupled to the call to the script (in the rule body), so we might as well just hardcode the script path in there... Thoughts? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sure its easy to change. My main idea with this type of referencing was if we would ever change the structure of There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would like to only have stuff that end-users should be able to modify in the config file. You can still use the |
||
min: "5" # starting rarefaction level | ||
max: "100" # ending rarefaction level | ||
skip: "5" # number of levels to skip | ||
samples: "1" # number of iterations to sample at | ||
resistome: | ||
script: "scripts/amrplusplus/bin/resistome" # path to resistome script | ||
threshold: "80" # resistome threshold | ||
amr_long_to_wide: "scripts/amrplusplus/bin/amr_long_to_wide.py" # path to amr_long_to_wide.py script | ||
|
||
######################### | ||
# Mappers | ||
|
@@ -162,4 +177,3 @@ metawrap: | |
universal: "--universal" # Use universal marker genes | ||
minimum_completion: 70 | ||
maximum_contamination: 10 | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,55 @@ | ||
Bootstrap: docker | ||
From: debian:jessie-slim | ||
|
||
#Includes trimmomatic, samtools, bwa, bedtools, vcftools, htslib, kraken2, SNPfinder, freebayes, bbmap | ||
|
||
%environment | ||
export LC_ALL=C | ||
|
||
%post | ||
apt update \ | ||
&& apt install -y --no-install-recommends \ | ||
build-essential ca-certificates sudo tcsh\ | ||
git make automake autoconf openjdk-7-jre wget gzip unzip sed\ | ||
zlib1g-dev curl libbz2-dev locales libncurses5-dev liblzma-dev libcurl4-openssl-dev software-properties-common apt-transport-https\ | ||
python3-pip python3-docopt python3-pytest python-dev python3-dev\ | ||
libcurl4-openssl-dev libssl-dev zlib1g-dev fonts-texgyre \ | ||
gcc g++ gfortran libblas-dev liblapack-dev dos2unix libstdc++6\ | ||
r-base-core r-recommended hmmer\ | ||
&& rm -rf /var/lib/apt/lists/* | ||
|
||
|
||
wget -c https://repo.continuum.io/archive/Anaconda3-2020.02-Linux-x86_64.sh | ||
sh Anaconda3-2020.02-Linux-x86_64.sh -bfp /usr/local | ||
|
||
# add bioconda channels | ||
conda config --add channels defaults | ||
conda config --add channels conda-forge | ||
conda config --add channels bioconda | ||
|
||
# install bulk of bioinformatic tools using conda | ||
conda create -n AmrPlusPlus_env python=3 trimmomatic bwa samtools bedtools freebayes bbmap vcftools htslib kraken2 | ||
|
||
. /usr/local/bin/activate AmrPlusPlus_env | ||
|
||
#ln -s /usr/local/envs/AmrPlusPlus_env/bin/* /usr/local/bin/ | ||
|
||
#Still experimenting with how to change $PATH location. | ||
echo 'export PATH=$PATH:/usr/local/envs/AmrPlusPlus_env/bin/' >> $SINGULARITY_ENVIRONMENT | ||
|
||
# SNPfinder | ||
cd /usr/local | ||
git clone https://github.com/cdeanj/snpfinder.git | ||
cd snpfinder | ||
make | ||
cp snpfinder /usr/local/bin | ||
cd / | ||
|
||
# Make sure all the tools have the right permissions to use the tools | ||
chmod -R 777 /usr/local/ | ||
|
||
%test | ||
|
||
%labels | ||
Author meglab-metagenomics | ||
version amrplusplus_v2 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,179 @@ | ||
# vim: syntax=python expandtab tabstop=4 | ||
# AMRPlusPlus v2.0 snakemake integration | ||
# Aron Arzoomand 12/8 2020 | ||
AroArz marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
from pathlib import Path, PurePath | ||
from snakemake.exceptions import WorkflowError | ||
|
||
|
||
|
||
localrules: | ||
build_amr_index, | ||
|
||
|
||
if config["antibiotic_resistance"]["amrplusplus"]: | ||
amrplusplus_outputs = f"{OUTDIR}/amrplusplus/ResistomeResults/AMR_analytics_matrix.csv" | ||
all_outputs.append(amrplusplus_outputs) | ||
|
||
citations.add(publications["AMRPlusPlus2"]) | ||
|
||
|
||
amrplusplus_config=config["amrplusplus"] | ||
|
||
|
||
rule build_amr_index: | ||
"""Get the Megares_2.0 database and annotations""" | ||
output: | ||
megares_db=amrplusplus_config["databases"]["db"], | ||
megares_annot=amrplusplus_config["databases"]["annotation"], | ||
log: | ||
stdout=f"{LOGDIR}/amrplusplus/build_amr.index.stdout.log", | ||
stderr=f"{LOGDIR}/amrplusplus/build_amr.index.stderr.log", | ||
shadow: | ||
"shallow" | ||
singularity: | ||
"shub://meglab-metagenomics/amrplusplus_v2" | ||
shell: | ||
""" | ||
wget -O {output.megares_db} https://raw.githubusercontent.com/meglab-metagenomics/amrplusplus_v2/master/data/amr/megares_modified_database_v2.00.fasta | ||
wget -O {output.megares_annot} https://raw.githubusercontent.com/meglab-metagenomics/amrplusplus_v2/master/data/amr/megares_modified_annotations_v2.00.csv | ||
bwa index {output.megares_db} \ | ||
2> {log.stderr} \ | ||
> {log.stdout} | ||
""" | ||
|
||
rule align_to_amr: | ||
"""BWA to align host filtered reads to Megares_2.0 database""" | ||
input: | ||
R1=f"{OUTDIR}/host_removal/{{sample}}_1.fq.gz", | ||
R2=f"{OUTDIR}/host_removal/{{sample}}_2.fq.gz", | ||
megares_db=amrplusplus_config["databases"]["db"], | ||
output: | ||
alignment=f"{OUTDIR}/amrplusplus/AlignToAMR/{{sample}}.amr.alignment.sam" | ||
log: | ||
stdout=f"{LOGDIR}/amrplusplus/{{sample}}.align_to_amr.stdout.log", | ||
stderr=f"{LOGDIR}/amrplusplus/{{sample}}.align_to_amr.stderr.log", | ||
shadow: | ||
"shallow" | ||
singularity: | ||
"shub://meglab-metagenomics/amrplusplus_v2" | ||
params: | ||
megares_db=amrplusplus_config["databases"]["db"], | ||
shell: | ||
""" | ||
bwa mem \ | ||
{params.megares_db} \ | ||
{input.R1} \ | ||
{input.R2} \ | ||
-t 10 \ | ||
-R '@RG\\tID:{wildcards.sample}\\tSM:{wildcards.sample}' > {output.alignment} | ||
""" | ||
|
||
rule run_resistome: | ||
"""Annotate alignments""" | ||
input: | ||
alignment=f"{OUTDIR}/amrplusplus/AlignToAMR/{{sample}}.amr.alignment.sam", | ||
output: | ||
gene=f"{OUTDIR}/amrplusplus/RunResistome/{{sample}}.gene.tsv", | ||
group=f"{OUTDIR}/amrplusplus/RunResistome/{{sample}}.group.tsv", | ||
mech=f"{OUTDIR}/amrplusplus/RunResistome/{{sample}}.mechanism.tsv", | ||
klass=f"{OUTDIR}/amrplusplus/RunResistome/{{sample}}.class.tsv", | ||
typ=f"{OUTDIR}/amrplusplus/RunResistome/{{sample}}.type.tsv", | ||
log: | ||
stdout=f"{LOGDIR}/amrplusplus/{{sample}}.run_resistome.stdout.log", | ||
stderr=f"{LOGDIR}/amrplusplus/{{sample}}.run_resistome.stderr.log", | ||
shadow: | ||
"shallow" | ||
singularity: | ||
"shub://meglab-metagenomics/amrplusplus_v2" | ||
params: | ||
script=amrplusplus_config["resistome"]["script"], | ||
threshold=amrplusplus_config["resistome"]["threshold"], | ||
megares_db=amrplusplus_config["databases"]["db"], | ||
megares_annot=amrplusplus_config["databases"]["annotation"], | ||
shell: | ||
""" | ||
{params.script} \ | ||
-ref_fp {params.megares_db} \ | ||
-annot_fp {params.megares_annot} \ | ||
-sam_fp {input.alignment} \ | ||
-gene_fp {output.gene} \ | ||
-group_fp {output.group} \ | ||
-mech_fp {output.mech} \ | ||
-class_fp {output.klass} \ | ||
-type_fp {output.typ} \ | ||
-t {params.threshold} \ | ||
2> {log.stderr} \ | ||
> {log.stdout} | ||
""" | ||
|
||
rule run_rarefaction: | ||
"""Running rarefaction""" | ||
input: | ||
alignment=f"{OUTDIR}/amrplusplus/AlignToAMR/{{sample}}.amr.alignment.sam", | ||
output: | ||
gene=f"{OUTDIR}/amrplusplus/RunRarefaction/{{sample}}.gene.tsv", | ||
group=f"{OUTDIR}/amrplusplus/RunRarefaction/{{sample}}.group.tsv", | ||
mech=f"{OUTDIR}/amrplusplus/RunRarefaction/{{sample}}.mechanism.tsv", | ||
klass=f"{OUTDIR}/amrplusplus/RunRarefaction/{{sample}}.class.tsv", | ||
typ=f"{OUTDIR}/amrplusplus/RunRarefaction/{{sample}}.type.tsv", | ||
log: | ||
stdout=f"{LOGDIR}/amrplusplus/{{sample}}.run_rarefaction.stdout.log", | ||
stderr=f"{LOGDIR}/amrplusplus/{{sample}}.run_rarefaction.stderr.log", | ||
shadow: | ||
"shallow" | ||
singularity: | ||
"shub://meglab-metagenomics/amrplusplus_v2" | ||
params: | ||
megares_db=amrplusplus_config["databases"]["db"], | ||
megares_annot=amrplusplus_config["databases"]["annotation"], | ||
script=amrplusplus_config["rarefaction"]["script"], | ||
min=amrplusplus_config["rarefaction"]["min"], | ||
max=amrplusplus_config["rarefaction"]["max"], | ||
skip=amrplusplus_config["rarefaction"]["skip"], | ||
samples=amrplusplus_config["rarefaction"]["samples"], | ||
threshold=amrplusplus_config["resistome"]["threshold"], | ||
shell: | ||
""" | ||
{params.script} \ | ||
-ref_fp {params.megares_db} \ | ||
-sam_fp {input.alignment} \ | ||
-annot_fp {params.megares_annot} \ | ||
-gene_fp {output.gene} \ | ||
-group_fp {output.group} \ | ||
-mech_fp {output.mech} \ | ||
-class_fp {output.klass} \ | ||
-type_fp {output.typ} \ | ||
-min {params.min} \ | ||
-max {params.max} \ | ||
-skip {params.skip} \ | ||
-samples {params.samples} \ | ||
-t {params.threshold} \ | ||
2> {log.stderr} \ | ||
> {log.stdout} | ||
""" | ||
|
||
rule resistome_results: | ||
"""Creating matrix of all gene hits""" | ||
input: | ||
gene_resistome=expand(f"{OUTDIR}/amrplusplus/RunResistome/{{sample}}.gene.tsv", sample=SAMPLES), | ||
gene_rarefaction=expand(f"{OUTDIR}/amrplusplus/RunRarefaction/{{sample}}.gene.tsv", sample=SAMPLES) | ||
output: | ||
AMR_matrix=f"{OUTDIR}/amrplusplus/ResistomeResults/AMR_analytics_matrix.csv" | ||
log: | ||
stdout=f"{LOGDIR}/amrplusplus/resistome_results.stdout.log", | ||
stderr=f"{LOGDIR}/amrplusplus/resistome_results.stderr.log", | ||
shadow: | ||
"shallow" | ||
singularity: | ||
"shub://meglab-metagenomics/amrplusplus_v2" | ||
params: | ||
script=amrplusplus_config["amr_long_to_wide"] | ||
shell: | ||
""" | ||
python3 {params.script} \ | ||
-i {input.gene_resistome} \ | ||
-o {output.AMR_matrix} \ | ||
2> {log.stderr} \ | ||
> {log.stdout} | ||
""" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,67 @@ | ||
#!/usr/bin/env python3 | ||
|
||
__author__ = "Steven Lakin" | ||
__copyright__ = "" | ||
__credits__ = ["Steven Lakin"] | ||
__version__ = "" | ||
__maintainer__ = "lakinsm" | ||
__email__ = "lakinsm@colostate.edu" | ||
__status__ = "Cows go moo." | ||
|
||
import argparse | ||
import sys | ||
|
||
amr_level_names = {0: 'Class', 1: 'Mechanism', 2: 'Group'} | ||
|
||
def parse_cmdline_params(cmdline_params): | ||
info = "" | ||
parser = argparse.ArgumentParser(description=info) | ||
parser.add_argument('-i', '--input_files', nargs='+', required=True, | ||
help='Use globstar to pass a list of files, (Ex: *.tsv)') | ||
parser.add_argument('-o', '--output_file', required=True, | ||
help='Output file name for writing the AMR_analytic_matrix.csv file') | ||
return parser.parse_args(cmdline_params) | ||
|
||
def amr_load_data(file_name_list): | ||
samples = {} | ||
labels = set() | ||
for file in file_name_list: | ||
with open(file, 'r') as f: | ||
data = f.read().split('\n')[1:] | ||
for entry in data: | ||
if not entry: | ||
continue | ||
entry = entry.split('\t') | ||
sample = entry[0].split('.')[0] | ||
count = float(entry[2]) | ||
gene_name = entry[1] | ||
try: | ||
samples[sample][gene_name] = count | ||
except KeyError: | ||
try: | ||
samples[sample].setdefault(gene_name, count) | ||
except KeyError: | ||
samples.setdefault(sample, {gene_name: count}) | ||
labels.add(gene_name) | ||
return samples, labels | ||
|
||
def output_amr_analytic_data(outfile, S, L): | ||
with open(outfile, 'w') as amr: | ||
local_sample_names = [] | ||
for sample, dat in S.items(): | ||
local_sample_names.append(sample) | ||
amr.write(','.join(local_sample_names) + '\n') | ||
for label in L: | ||
local_counts = [] | ||
amr.write(label + ',') | ||
for local_sample in local_sample_names: | ||
if label in S[local_sample]: | ||
local_counts.append(str(S[local_sample][label])) | ||
else: | ||
local_counts.append(str(0)) | ||
amr.write(','.join(local_counts) + '\n') | ||
|
||
if __name__ == '__main__': | ||
opts = parse_cmdline_params(sys.argv[1:]) | ||
S, L = amr_load_data(opts.input_files) | ||
output_amr_analytic_data(opts.output_file, S, L) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about