The coverage problem and (maybe) wrong cluster problem #1
Comments
Thank you very much for your interest in our software. For your problem, the possible causes and solutions are as follows:
Therefore, the precondition is that the query sequence is shorter than the representative sequence (what you call the target sequence). You can check the identity of the two sequences with our software and confirm the clustering results.
I hope this helps. If you still have questions, please feel free to email lirl@sccas.cn or niubf@cnic.cn; we are happy to discuss them with you. Thank you very much!
Hi sclirl, thanks for your fast reply! For Problem 1, I mostly understand what you mean. So in theory, if I set "-memiden" to 99, then every genome in a cluster should have eMEMi >= 99% to the representative genome (the longest one), right? As for Problem 2, I have uploaded the FASTA file I used for the experiment. It contains 648 complete viral genomes; ZKV_2 and ZKV_184 are the pair shown in the Problem 2 picture. They share high similarity but were assigned to different clusters by Gclust. You can download the data and see what is going on in this case.
Hi sclirl, sorry, I found the likely reason for Problem 2: I forgot to sort the genomes before running Gclust. After the sorting step, ZKV_2 and ZKV_184 end up in the right cluster. However, for ZKV_26 and ZKV_184 the problem persists even after sorting: they are very similar (>99% query coverage and >99% identity with online megablast), but they are still assigned to different clusters, which confuses me.
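The sorting step mentioned above can be sketched in a few lines of Python. This is only an illustration, assuming that "sorting" means ordering the FASTA records from longest to shortest so that the longest genome is seen first and can act as the representative. The function names `read_fasta` and `sort_by_length_desc` are hypothetical and are not part of Gclust.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


def sort_by_length_desc(records: Iterable[Record]) -> List[Record]:
    """Return records ordered from longest to shortest sequence.

    sorted() is stable, so records of equal length keep their input order.
    """
    return sorted(records, key=lambda rec: len(rec[1]), reverse=True)


def write_fasta(records: Iterable[Record], width: int = 70) -> str:
    """Format records as FASTA text, wrapping sequences at `width` columns."""
    out: List[str] = []
    for header, seq in records:
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"
```

To use it on a file, open the input with `open(path)`, pass the file object to `read_fasta`, and write the result of `write_fasta(sort_by_length_desc(...))` to a new file before running the clustering.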
Hi liaoherui, for Problem 2: you can set the parameters -minlen and -sparse to smaller values. These two parameters have a large impact on the clustering result; the recommended values are -minlen 21 and -sparse 1 (or 2). If you have other questions, we can discuss them at any time. Thank you!
Thanks for your wonderful tool!
Problem 1: My question is whether there are any parameters related to alignment coverage.
For example, as the picture shows, the query genome and the target genome share 99.46% identity but only 84% coverage. When I set "-memiden" to 99, they are still assigned to the same cluster.
So, is there any parameter for filtering by a coverage threshold?
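Identity and coverage are separate measurements, so a pair can pass an identity cutoff while a sizeable part of the query remains unaligned. The sketch below shows one way such a pair could be post-filtered outside Gclust, assuming the aligned query intervals come from an external aligner such as megablast. The function names and the 90% default coverage cutoff are assumptions made for illustration; this is not a Gclust option.

```python
from typing import Iterable, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive query coordinates


def covered_length(intervals: Iterable[Interval]) -> int:
    """Count query positions covered by at least one aligned interval."""
    # Normalise reversed intervals (e.g. minus-strand hits) to (low, high).
    normalised = sorted((min(a, b), max(a, b)) for a, b in intervals)
    total = 0
    cur_start = cur_end = None
    for start, end in normalised:
        if cur_end is None or start > cur_end + 1:
            # Gap before this interval: close the previous merged block.
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def query_coverage(intervals: Iterable[Interval], query_length: int) -> float:
    """Percentage of the query covered by the aligned intervals."""
    if query_length <= 0:
        raise ValueError("query_length must be positive")
    return 100.0 * covered_length(intervals) / query_length


def passes_thresholds(identity: float, coverage: float,
                      min_identity: float = 99.0,
                      min_coverage: float = 90.0) -> bool:
    """True only if both cutoffs are met (default cutoffs are assumed values)."""
    return identity >= min_identity and coverage >= min_coverage
```

With numbers like those in the example above, `passes_thresholds(99.46, 84.0)` is false under the assumed 90% coverage cutoff, even though the identity cutoff alone is satisfied.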
Problem 2: In my experiment, there are two highly similar genomes; their identity and coverage are shown in the picture below:
However, when I set "-memiden" to 99, they are assigned to different clusters, which really confuses me. I am not sure what is going on.
(All alignments in the pictures were done with the online megablast alignment tool.)