Gene detection reporting - top gene per gene symbol, rather than the top gene per cluster as intended #7

katholt · 2014-02-06T06:18:05Z

I found a bug in the code whereby we were reporting the top gene per gene symbol, rather than the top gene per cluster.

So, where very distinct groups of genes share a gene symbol, you would only ever get the top-scoring allele among them, and the other genes could be missed.

For example, I found some cases today where I was expecting blaOXA-23 to be present but was getting blaOXA-66 reported as the allele for ‘blaOXA’. In fact, blaOXA is a common gene symbol used for genes that share as little as 70% identity, so each of these blaOXA subtypes needs to be treated as a different gene that could have its own alleles present. We are prepared for this: we have several distinct blaOXA clusters annotated in our resistance gene database, and we recommend pre-clustering all user databases before using them with SRST2. However, the code was not using the cluster IDs properly. So, instead of having blaOXA-23 (cluster 297) and blaOXA-66 (cluster 299) reported as present, I was just seeing blaOXA-66 (blaOXA) in all the outputs.
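To make the difference concrete, here is a minimal sketch of the two grouping behaviours. It is not the actual SRST2 code: the `Hit` record, the `top_hit_per_group` helper, and the score values are all hypothetical, and it assumes that a lower score means a better match.

```python
from collections import namedtuple

# Hypothetical record for one candidate allele; not SRST2's real data structure.
Hit = namedtuple("Hit", ["cluster_id", "gene_symbol", "allele", "score"])


def top_hit_per_group(hits, key):
    """Keep the best-scoring hit per group (assumption: lower score is better)."""
    best = {}
    for hit in hits:
        group = key(hit)
        if group not in best or hit.score < best[group].score:
            best[group] = hit
    return best


# Made-up scores, for illustration only.
hits = [
    Hit("297", "blaOXA", "blaOXA-23", 0.02),
    Hit("299", "blaOXA", "blaOXA-66", 0.01),
]

# Buggy behaviour: grouping by gene symbol collapses both clusters into one entry.
by_symbol = top_hit_per_group(hits, key=lambda h: h.gene_symbol)

# Intended behaviour: grouping by cluster ID keeps one top allele per cluster.
by_cluster = top_hit_per_group(hits, key=lambda h: h.cluster_id)

print(sorted(h.allele for h in by_symbol.values()))
print(sorted(h.allele for h in by_cluster.values()))
```

Grouping by gene symbol leaves only blaOXA-66, while grouping by cluster ID keeps both blaOXA-23 and blaOXA-66, which matches the behaviour described above.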

katholt · 2014-02-06T06:18:11Z

This will appear in the next release (0.1.3).

If you want to reanalyse your data with the new version, you don’t have to rerun any mapping; just use the --use_existing_scores flag to re-call alleles from your stored scores files (or --use_existing_pileup if you didn’t store the scores).

ghost assigned katholt Feb 6, 2014

katholt closed this as completed Feb 6, 2014
