This simple script determines whether a Salmonella Typhi genome belongs to lineage Bdq, first described in: Salmonella enterica Serovar Typhi in Bangladesh: Exploration of Genomic Diversity and Antimicrobial Resistance, Tanmoy et al., 2018, mBio. This lineage is part of lineage Bd (later renamed genotype 4.3.1.3), which was also first described in that article. Lineage Bdq was later designated genotype 4.3.1.3q1 in: CRISPR-Cas Diversity in Clinical Salmonella enterica Serovar Typhi Isolates from South Asian Countries, Tanmoy et al., 2020, Genes.
The script parses a VCF file generated by mapping reads against the Salmonella Typhi CT18 reference (accession NC_003198.1 or AL513382.1). The VCF must be based on a single genome; the script cannot process a merged VCF or a VCF containing multiple genomes.
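Because the input must contain exactly one genome, a simple pre-check is to count the sample columns on the VCF's #CHROM header line (the first nine columns are fixed, so any further columns are samples) and to confirm that records refer to the CT18 accessions listed above. The sketch below is an illustration only, not the script's own code; the function names `check_single_sample` and `non_ct18_contigs` are hypothetical.

```python
from typing import Iterable, List, Set

# Reference accessions for Salmonella Typhi CT18, as listed in this README.
CT18_ACCESSIONS = {"NC_003198.1", "AL513382.1"}


def check_single_sample(lines: Iterable[str]) -> str:
    """Return the single sample name in a VCF, or raise ValueError.

    Hypothetical helper: the fixed VCF columns are CHROM, POS, ID, REF, ALT,
    QUAL, FILTER, INFO and FORMAT; every column after those is one sample.
    """
    for line in lines:
        if line.startswith("#CHROM"):
            samples: List[str] = line.rstrip("\n").split("\t")[9:]
            if len(samples) != 1:
                raise ValueError(f"Expected exactly one sample, found {len(samples)}")
            return samples[0]
    raise ValueError("No #CHROM header line found; is this a VCF file?")


def non_ct18_contigs(lines: Iterable[str]) -> Set[str]:
    """Return any CHROM values in data lines that are not a CT18 accession."""
    contigs = set()
    for line in lines:
        if line.strip() and not line.startswith("#"):
            contigs.add(line.split("\t", 1)[0])
    return contigs - CT18_ACCESSIONS
```

A merged VCF fails the first check because its header lists several sample columns.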
python DetectBdq.py --vcf <VCF_file> --phred_cutoff <Minimum_Phred_Score> --output <Output_File>
--vcf VCF file of a single isolate, mapped against Salmonella Typhi CT18.
--phred_cutoff Minimum Phred score (default: 20).
--output Output file (default: Bdq_detect.txt).
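The command-line options above map naturally onto Python's standard `argparse` module. The following is a minimal sketch of such a parser, assuming that `--vcf` is required and that the Phred cutoff is compared against the VCF QUAL value as a float; the real DetectBdq.py may define its options differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the options documented in this README."""
    parser = argparse.ArgumentParser(
        description="Detect Salmonella Typhi lineage Bdq (genotype 4.3.1.3q1) "
                    "from a single-sample VCF mapped against CT18."
    )
    # Assumption: the VCF path has no default, so it is treated as required.
    parser.add_argument("--vcf", required=True,
                        help="VCF file of a single isolate, mapped against Salmonella Typhi CT18")
    # VCF QUAL values can be fractional, so the cutoff is parsed as a float.
    parser.add_argument("--phred_cutoff", type=float, default=20.0,
                        help="Minimum Phred score (default: 20)")
    parser.add_argument("--output", default="Bdq_detect.txt",
                        help="Output file (default: Bdq_detect.txt)")
    return parser
```

For example, `build_parser().parse_args(["--vcf", "sample.vcf"])` yields a cutoff of 20.0 and the output file `Bdq_detect.txt`.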
For each SNP position that defines the lineage, the script calculates the proportion of reads supporting the SNP. Because genotype 4.3.1.3q1 is defined by four different SNPs (as described in the articles cited above), the script also reports the median of these read proportions.
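To make the read-proportion and median calculation concrete, here is a sketch under several assumptions that this README does not confirm: read counts come from the FORMAT/AD field (reference depth first, then one depth per ALT allele), a record passes only if its QUAL is at least the Phred cutoff, and positions with no passing record are skipped rather than counted as zero. The four defining SNP coordinates are not listed here, so the function takes them as a parameter; `alt_read_proportion` and `median_read_proportion` are hypothetical names.

```python
import statistics
from typing import Dict, Iterable, List, Optional


def alt_read_proportion(fields: List[str]) -> Optional[float]:
    """Proportion of reads supporting the ALT allele(s), from FORMAT/AD.

    Assumes a single-sample record: column 9 is FORMAT, column 10 the sample.
    """
    format_keys = fields[8].split(":")
    sample_values = fields[9].split(":")
    if "AD" not in format_keys:
        return None
    idx = format_keys.index("AD")
    if idx >= len(sample_values):  # trailing FORMAT fields may be omitted
        return None
    ad = sample_values[idx].split(",")
    if "." in ad:
        return None
    depths = [int(x) for x in ad]
    total = sum(depths)
    return sum(depths[1:]) / total if total else None


def median_read_proportion(vcf_lines: Iterable[str], positions: Iterable[int],
                           phred_cutoff: float) -> Optional[float]:
    """Median ALT read proportion across the given SNP positions."""
    wanted = set(positions)
    proportions: Dict[int, float] = {}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        pos = int(fields[1])
        qual = fields[5]
        # Skip non-target positions and calls below the Phred cutoff.
        if pos not in wanted or qual == "." or float(qual) < phred_cutoff:
            continue
        prop = alt_read_proportion(fields)
        if prop is not None:
            proportions[pos] = prop
    if not proportions:
        return None
    return statistics.median(list(proportions.values()))
```

Using the median rather than the mean keeps a single low-coverage or miscalled position from dominating the summary across the four SNPs.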
The output file is tab-separated, with the following columns:
SampleID <tab> 4.3.1.3q1_or_not <tab> Median_read_proportion
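A row in this three-column layout can be produced with a plain tab join. In the sketch below, the call label is passed in by the caller because this README does not specify the exact values of the 4.3.1.3q1_or_not column; writing a missing median as `NA` and rounding to three decimals are likewise illustrative choices, and both function names are hypothetical.

```python
from typing import Optional


def format_output_row(sample_id: str, call: str, median: Optional[float]) -> str:
    """Build one tab-separated row: SampleID, call label, median read proportion."""
    median_text = "NA" if median is None else f"{median:.3f}"
    return "\t".join([sample_id, call, median_text])


def write_output(path: str, sample_id: str, call: str, median: Optional[float]) -> None:
    """Write a single result row (the input VCF holds one genome) to the output file."""
    with open(path, "w") as handle:
        handle.write(format_output_row(sample_id, call, median) + "\n")
```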