Question about ambiguous k-mers with ambiguous nucleotides #46

Open
swamidass opened this issue Feb 24, 2017 · 3 comments

Comments

@swamidass

Are k-mers with ambiguous nucleotides (e.g. N) included in the sketch or are they thrown out?

I would imagine the best strategy is to have Mash filter these k-mers out. I suppose it could be handled by input processing: breaking FASTA sequences into multiple sequences at every ambiguous nucleotide. This does not seem ideal.
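For illustration, the input-processing workaround might look roughly like the Python sketch below. This is not part of Mash; `split_on_ambiguous` is a hypothetical helper, and treating lowercase bases the same as uppercase is an assumption made here for simplicity.

```python
import re


def split_on_ambiguous(seq, k):
    """Split a sequence at every run of non-ACGT characters.

    Hypothetical helper, not part of Mash. Pieces shorter than k are
    discarded because they cannot contain a complete k-mer.
    """
    # Assumption: case is ignored, so lowercase bases are upper-cased first.
    pieces = re.split(r"[^ACGT]+", seq.upper())
    return [p for p in pieces if len(p) >= k]


if __name__ == "__main__":
    # The run of N characters separates the sequence into two pieces.
    print(split_on_ambiguous("ACGTNNACG", 3))
```

No k-mer can span a break, so sketching the pieces separately would use the same k-mers as filtering out every k-mer that contains an ambiguous character.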

Thanks.

@ondovb
Member

ondovb commented Feb 24, 2017

They are indeed thrown out; by default only k-mers with ACGT are used.
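As a rough illustration of this kind of filtering, here is a minimal Python sketch. It is not Mash's actual implementation; `valid_kmers` and its parameters are hypothetical names, and the dropped-window count is included only to show what such a tally could look like.

```python
def valid_kmers(seq, k, alphabet="ACGT"):
    """Return (kept, dropped) for all length-k windows of seq.

    Hypothetical sketch, not Mash's code. A window is kept only if every
    character is in `alphabet`; otherwise it is counted as dropped.
    """
    allowed = set(alphabet)
    kept = []
    dropped = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= allowed:
            kept.append(kmer)
        else:
            dropped += 1
    return kept, dropped


if __name__ == "__main__":
    kept, dropped = valid_kmers("ACGTNACG", 3)
    print(kept, dropped)
```

A single ambiguous character invalidates up to k overlapping windows, which is why one N in the example removes three of the six 3-mers.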

@swamidass
Author

Thanks for the quick reply. Sounds like this is handled correctly. My only complaint is that it is not documented clearly here or in the paper. Perhaps this could be noted in the help or documentation. Even more obvious to the user would be to report the number of dropped k-mers along with the info.

@MKLau

MKLau commented Dec 21, 2017

A quick note on this. I also had this question upon reading the paper. I found this, http://mash.readthedocs.io/en/latest/sketches.html#strand-and-alphabet, though it still left me with the question of how gaps/ambiguous characters would be handled. My recommendation would be to add a http://mash.readthedocs.io/en/latest/sketches.html#ambiguous-characters section directly after #strand-and-alphabet.

Thanks for all your work on this by the way! This is a great tool.
