"Multi-query input not implemented" #10
Comments
@DaRinker Thank you for your message. Currently, multi-query input is not implemented, but this is a feature that we will make available very soon.
In the meantime, can you help me with the question I asked? I'm finding the single query process to run very slowly (much more slowly than the MPI web portal) and I think it's because the database is having to load into memory each time. What is the (current) best practice for running 1000s of sequences locally? Thank you
Can you contact me at stanislaw.dunin-horkawicz@tuebingen.mpg.de and we will figure out what the problem is?
Thanks for your help. Using the examples.sh script helped cut the processing time by about 60%.
Hi, I have 2 million query sequences and I want to run a homology search of these against a database of 250 sequences. Currently there is no multi-query option, and running each of the 2 million queries individually would take a very long time. Is there any way to speed up pLM-BLAST?
@Citugulia40 Hi! The multi-query option is already implemented, but needs some testing before merging with the main branch. We expect to release it very soon along with other updates. cc @DaRinker
Thank you so much.
Definitely this month, maybe even next week.
Hi, I'm curious to know if you have an estimate of how long pLM-BLAST would take to execute when running 2 million query sequences against a database of 250 sequences (after implementing the multi-query option)?
We will provide a speed benchmark along with the multi-query support. As a rough estimate, the ECOD benchmark (all-versus-all comparison of 1500 sequences) took about 30 minutes on a 20-core CPU. Running times will depend heavily on the cosine similarity cutoff and the length of the sequences. You may also want to consider clustering your 2M sequences to 40-50% identity at a high coverage cutoff (e.g. with MMseqs2). Given the sensitivity of pLM-based methods, searching with 1-2 examples per cluster should be sufficient.
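A minimal sketch of the "search with 1-2 examples per cluster" suggestion, assuming the clustering was done with MMseqs2 and produced its usual two-column cluster TSV (representative ID, tab, member ID), as written by `mmseqs easy-cluster` to `<prefix>_cluster.tsv`. The function name `pick_representatives` and the sample IDs are made up for illustration and are not part of pLM-BLAST or MMseqs2.

```python
import csv
import io
from collections import defaultdict


def pick_representatives(cluster_tsv, per_cluster=2):
    """Return up to `per_cluster` sequence IDs for each cluster.

    `cluster_tsv` is an open text file (or file-like object) with two
    tab-separated columns per line: representative ID, member ID.
    This layout is an assumption based on MMseqs2 cluster TSV output.
    """
    clusters = defaultdict(list)
    for row in csv.reader(cluster_tsv, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        rep, member = row[0], row[1]
        clusters[rep].append(member)

    picked = {}
    for rep, members in clusters.items():
        # Keep the representative first, then fill up with other members.
        ordered = [rep] + [m for m in members if m != rep]
        picked[rep] = ordered[:per_cluster]
    return picked


# Hypothetical toy input: one cluster of three sequences, one singleton.
example = io.StringIO("seqA\tseqA\nseqA\tseqB\nseqA\tseqC\nseqD\tseqD\n")
selected = pick_representatives(example, per_cluster=2)
print(selected)  # {'seqA': ['seqA', 'seqB'], 'seqD': ['seqD']}
```

The selected IDs can then be used to write a reduced query FASTA, so that only a few sequences per cluster are searched instead of all 2M.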
Thank you so much for your kind support. Eagerly waiting for the multi-query option.
Hi, I just want to ask whether there is any update regarding the multi-query option. Thanks in advance.
Hi, all changes are in: https://github.com/labstructbioinf/pLM-BLAST/tree/multi_query_feature. I will merge them on Thursday. There is still some work to do.
OK, thank you very much.
Changes are now live, looking forward to your feedback :)
What is the fastest way to handle multiple sequences (e.g. without loading the database into memory each time)?
I know there are some sample scripts, but I'm having a hard time seeing how they relate to the simple, step-by-step example (that is only good for a single fasta file).