AMR Prediction
This page describes the use of mykrobe predict to make AMR predictions on the species supported by mykrobe. At the time of writing, these are Mycobacterium tuberculosis, Staphylococcus aureus, Shigella sonnei, Salmonella enterica serotype Paratyphi B, and Salmonella Typhi. You can see the available panels by running:
mykrobe panels describe
If no panels are installed, then run:
mykrobe panels update_metadata
mykrobe panels update_species all
When running on Shigella sonnei samples, the output should be post-processed as described at https://github.com/katholt/sonneityping. Details of the S. sonnei genotyping scheme are available in the paper Hawkey et al, 2021, Nature Communications.
Please note that the first time you run mykrobe predict on each species, new files are created in mykrobe/data/skeletons/. This means that if you are running more than one sample in parallel, you must first run a single sample on its own to generate those files. Otherwise, mykrobe will crash on some of the samples.
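The "one sample first, then the rest in parallel" pattern can be sketched in shell as follows. This is a minimal sketch, not part of mykrobe: the names MYKROBE, run_one and run_batch are hypothetical, the input files are assumed to end in .fq, and the command line follows the basic single-FASTQ form used in the examples on this page.

```shell
# MYKROBE can be overridden to point at a specific executable (hypothetical variable).
MYKROBE="${MYKROBE:-mykrobe}"

run_one() {
    # $1 = species, $2 = FASTQ file; the sample name is the file name without ".fq"
    local species="$1" reads="$2" sample
    sample="$(basename "$reads" .fq)"
    "$MYKROBE" predict --species "$species" --sample "$sample" \
        --seq "$reads" --output "${sample}.csv"
}

run_batch() {
    # $1 = species, remaining arguments = FASTQ files
    local species="$1" reads
    shift
    [ "$#" -gt 0 ] || return 0
    # The first sample runs alone, so the skeleton files are created once.
    run_one "$species" "$1" || return 1
    shift
    # The remaining samples run as background jobs.
    for reads in "$@"; do
        run_one "$species" "$reads" &
    done
    # Wait for every background job; individual failures are not reported here.
    wait
}
```

For example, `run_batch tb sample1.fq sample2.fq sample3.fq` would process sample1.fq on its own and then the other two files concurrently. For large batches, a job limit would be a sensible addition, since this sketch starts all remaining samples at once.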
Run on a Mycobacterium tuberculosis sample with one FASTQ file as input, writing the results to a comma-delimited file:
mykrobe predict --species tb --sample sample_name --seq reads.fq --output out.csv
Replace sample_name with the name of your sample; whatever is used here will appear in the output.
Replace tb with staph or sonnei for Staphylococcus aureus or Shigella sonnei samples, respectively.
As above, but the input is two gzipped FASTQ files (this and the following examples use the short option names -S, -s, -i and -o):
mykrobe predict -S tb -s sample_name -i reads_1.fq.gz reads_2.fq.gz -o out.csv
As above, but the input is a BAM file:
mykrobe predict -S tb -s sample_name -i reads.bam -o out.csv
The default output format is CSV, which contains the essential information: the lineage of the sample (M. tuberculosis and S. sonnei only) and the AMR calls. For more detailed output, write a JSON file instead:
mykrobe predict -S tb -s sample_name -i reads.fq --format json -o out.json
Make both a JSON and a CSV file, called out.json and out.csv:
mykrobe predict -S tb -s sample_name -i reads.fq --format json_and_csv -o out
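As the example above shows, with json_and_csv the value given to -o acts as a prefix, whereas with csv or json it is the output file itself. In a pipeline it can help to work out which files to expect and check that they were written. The sketch below assumes exactly that behaviour; expected_outputs and check_outputs are hypothetical helper names, not mykrobe commands.

```shell
expected_outputs() {
    # $1 = value given to --format, $2 = value given to -o
    case "$1" in
        json_and_csv) printf '%s.json\n%s.csv\n' "$2" "$2" ;;
        json|csv)     printf '%s\n' "$2" ;;
        *)            echo "unknown format: $1" >&2; return 1 ;;
    esac
}

check_outputs() {
    # Fails if any expected output file is missing or empty.
    local files f
    files="$(expected_outputs "$1" "$2")" || return 1
    while IFS= read -r f; do
        [ -s "$f" ] || { echo "missing or empty: $f" >&2; return 1; }
    done <<EOF
$files
EOF
}
```

After the json_and_csv command above, `check_outputs json_and_csv out` would succeed only if both out.json and out.csv exist and are non-empty.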
By default, the input reads are assumed to be Illumina. If you have nanopore data instead, use the option --ont. Example:
mykrobe predict -S tb -s sample_name --ont -i nanopore_reads.fq -o out.csv
By default, if a significant minority of the reads at a variant site have the variant (enough to trigger a heterozygous call), this still triggers a resistance call, reported as a lowercase "r" in the output. An uppercase "R" is used for a normal resistance call, where the majority of reads have the variant (homozygous call). Use the option --ignore_minor_calls to ignore these minor calls when predicting resistance. Example:
mykrobe predict -S tb -s sample_name --ignore_minor_calls -i reads.fq -o out.csv
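Before deciding whether to use --ignore_minor_calls, it can be useful to list which drugs in an existing CSV output were called as a lowercase "r". The sketch below is illustrative only. It assumes the CSV has a header row containing columns named drug and susceptibility, that fields are double-quoted, and that no field before those columns contains a comma; check these assumptions against the AMR prediction output page. list_minor_calls is a hypothetical helper name.

```shell
list_minor_calls() {
    # $1 = CSV file written by mykrobe predict
    awk -F',' '
        { gsub(/"/, "") }    # strip double quotes (naive CSV handling)
        NR == 1 {
            # locate the assumed columns by their header names
            for (i = 1; i <= NF; i++) {
                if ($i == "drug") d = i
                if ($i == "susceptibility") s = i
            }
            next
        }
        # print the drug name for each minor resistance call
        d && s && $s == "r" { print $d }
    ' "$1"
}
```

Running `list_minor_calls out.csv` prints one drug name per line. If the assumed column names are not found in the header, nothing is printed.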
The default behaviour is to report detailed call information only for non-reference calls (those that cause a resistance call). For debugging, or other in-depth analysis, it can be useful to see all calls with the --report_all_calls option. If the output is in JSON format, this adds information for all calls in the panel to the output. Example:
mykrobe predict -S tb -s sample_name --report_all_calls -i reads.fq --format json -o out.json
The other options are more advanced, and we do not recommend using them unless you know what you are doing.
Please see the AMR prediction output page for a description of the output.
$ mykrobe predict --help
usage: mykrobe predict [-h] -s SAMPLE [-k kmer] [--tmp TMP] [--keep_tmp] [--skeleton_dir SKELETON_DIR] [-t THREADS] [-m MEMORY] [--expected_depth EXPECTED_DEPTH] [-1 seq [seq ...]]
[-c ctx] [-f] [--ont] [--guess_sequence_method] [--ignore_minor_calls] [--ignore_filtered IGNORE_FILTERED] [--model model] [--ploidy ploidy]
[--filters FILTERS [FILTERS ...]] [-A] [-e EXPECTED_ERROR_RATE] [--min_variant_conf MIN_VARIANT_CONF] [--min_gene_conf MIN_GENE_CONF]
[-D MIN_PROPORTION_EXPECTED_DEPTH] [--min_gene_percent_covg_threshold MIN_GENE_PERCENT_COVG_THRESHOLD] [-o OUTPUT] [--panels_dir DIRNAME] [-q] [-d] -S species
[--panel panel] [-P FILENAME] [-R FILENAME] [-L FILENAME] [--min_depth min_depth] [--conf_percent_cutoff conf_percent_cutoff] [-O {json,csv,json_and_csv}]
optional arguments:
-h, --help show this help message and exit
-s SAMPLE, --sample SAMPLE
Sample identifier [REQUIRED]
-k kmer, --kmer kmer K-mer length (default: 21)
--tmp TMP Directory to write temporary files to
--keep_tmp Don't remove temporary files
--skeleton_dir SKELETON_DIR
Directory for skeleton binaries
-t THREADS, --threads THREADS
Number of threads to use
-m MEMORY, --memory MEMORY
Memory to allocate for graph constuction (default: 1GB)
--expected_depth EXPECTED_DEPTH
Expected depth
-1 seq [seq ...], -i seq [seq ...], --seq seq [seq ...]
Sequence files (fasta,fastq,bam)
-c ctx, --ctx ctx Cortex graph binary
-f, --force Force override any skeleton files
--ont Set defaults for ONT data. Sets `-e 0.08 --ploidy haploid`
--guess_sequence_method
Guess if ONT or Illumia based on error rate. If error rate is > 10%, ploidy is set to haploid and a confidence threshold is used
--ignore_minor_calls Ignore minor calls when running resistance prediction
--ignore_filtered IGNORE_FILTERED
Don't include filtered genotypes
--model model Genotype model used. Options kmer_count, median_depth (default: kmer_count)
--ploidy ploidy Use a diploid (includes 0/1 calls) or haploid genotyping model (default: diploid)
--filters FILTERS [FILTERS ...]
Don't include specific filtered genotypes (default: ['MISSING_WT', 'LOW_PERCENT_COVERAGE', 'LOW_GT_CONF', 'LOW_TOTAL_DEPTH'])
-A, --report_all_calls
Report all calls
-e EXPECTED_ERROR_RATE, --expected_error_rate EXPECTED_ERROR_RATE
Expected sequencing error rate (default: 0.050)
--min_variant_conf MIN_VARIANT_CONF
Minimum genotype confidence for variant genotyping (default: 150)
--min_gene_conf MIN_GENE_CONF
Minimum genotype confidence for gene genotyping (default: 1)
-D MIN_PROPORTION_EXPECTED_DEPTH, --min_proportion_expected_depth MIN_PROPORTION_EXPECTED_DEPTH
Minimum depth required on the sum of both alleles (default: 0.30)
--min_gene_percent_covg_threshold MIN_GENE_PERCENT_COVG_THRESHOLD
All genes alleles found above this percent coverage will be reported (default: 100 (only best alleles reported))
-o OUTPUT, --output OUTPUT
File path to save output file as. Default is to stdout
--panels_dir DIRNAME Name of directory that contains panel data (default: /Users/michaelhall/Projects/mykrobe/src/mykrobe/data)
-q, --quiet Only output warnings/errors to stderr
-d, --debug Output debugging information to stderr
-S species, --species species
Species name, or 'custom' to use custom data, in which case --custom_probe_set_path is required. Run `mykrobe panels describe` to see list of options [REQUIRED]
--panel panel Name of panel to use. Ignored if species is 'custom'. Run `mykrobe panels describe` to see list of options
-P FILENAME, --custom_probe_set_path FILENAME
Required if species is 'custom'. Ignored otherwise. File path to fasta file from `mykrobe make-probes`.
-R FILENAME, --custom_variant_to_resistance_json FILENAME
For use with `--panel custom`. Ignored otherwise. File path to JSON with key,value pairs of variant names and induced drug resistance.
-L FILENAME, --custom_lineage_json FILENAME
For use with `--panel custom`. Ignored otherwise. File path to JSON made by --lineage option of make-probes
--min_depth min_depth
Minimum depth (default: 1)
--conf_percent_cutoff conf_percent_cutoff
Number between 0 and 100. Determines --min_variant_conf, by simulating variants and choosing the cutoff that would keep x% of the variants (default: 100)
-O {json,csv,json_and_csv}, --format {json,csv,json_and_csv}
Choose output format (default: csv)