[Bug]: Output is not a QueryResult #244
Comments
PR #246 should fix the bug. I will merge it into the main branch and then release a new version so that you can install the fix. |
hi @anderdnavarro, I have released a new version, could you please try |
Thanks @cauliyang for your quick fix! Now there is no weird output, but it doesn't find most of the sequences in the genome. This is the list of 10 sequences I am using to test this:
And this is the output:
I am using options
One extra thing, when I checked that I was using the right version, I saw that it says 1.1.10, although I have the 1.1.18 now.
Thanks! |
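One way to check which version is actually installed, independently of what the package reports about itself, is to read the distribution metadata. This is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper, and the distribution name `pxblat` is assumed to match the name used at install time.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumption: the package was installed under the name "pxblat".
    print(installed_version("pxblat"))
```

If this prints the new version while the package itself still reports the old one, the mismatch is probably in the version string shipped with the package rather than in the installation.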
Thanks for the update, I know the potential reason and will fix that soon. |
hi @anderdnavarro, thanks for the information you provided, I have tested the sequences you shared. I got the result:
Here is the code I used:

from pxblat import Client, Server


def main():
    test_seqs = [
        "CCAAATTGAACTCATATTAGAAATGCAAAGTTGGTTTAACATTAGAAAAATCTATTCTTTTGACTTACCAGAAACAAAGGAGGAAAAAAAGACCCCATGATCATCTCAATCGATGCAGAAAAAGCATTCTAGCAAATTCAGTATCTGTGAATGATATAACCAGGAATAAAAGAACTACCTTCATCCAAAAGAAATTGTTTCTA",
        "ATATATGCTGCACTTTCACATAGATTTCAGCTAAAAAAACAGTTCTACTGATTAAAAAATTTTGAAGACCACTGAATTAATCCTTCTTATTTGGGTTTGGAACCATGAAAATAAGAAACAACCCAAATGTCCATGAAGAAAAGAAAGGATATGCAATGTATAGTATAATCATATAATAGAATACTACTGAGCATTGAAAAGGA",
        "GAAAACAAAAATGGACAAAGGGTATGAACAGTAATTTATAGAGAAAAATCCCAGAAAAAGTTCACAAGCATATTAAAAGATGCTTAAAATAATTAGTAATAGAAGAAAATCAAATAAAAATAACAGGATGTCACTTTTATGGCTAAGAAAGTAGCAAAAATAGGCCGGGTGTCGTGGCTCACACCTGTAATCCCAGCACTTTG",
        "TGAGCTGCTGGAGATGAAGTTAGAAGAAAACAATAACAAGAATGAAGAAACGACCCCCCCTACCCCAAACAAAAGTTACTCCGAAATCACCATAGTAACAAATCACCAGTTTTCAACTGTTTCATTTCCTTTGCATTTTTGTTTCTCTGCACATACCCTTTTATTTATATTCAGCCCCAGCTCTAGCCTTTATTCAACAAGCA",
        "GACAGAATAACTGTGCTGGGATGTGCTAATGCAGCAGGCATGCATAAGTGTAAACTCTGCTTAGACAAAAGCTTGCATCCTCTCTGTTTTCAAGGAGTGAATTTCTTACCGGTCTATTATTATGCTAATAAGGAGTCATGGATCACCAATGACATCTTTTCTGATTAGCTTCACAAACATTTTGTTACAGCACCTCCTGCTGA",
        "CTTCCTCTCCTTTCATATGTTTTGGCTATAGTGGCATGTTGTTTATAAGTGGAGGACTGACTTTACTTTCAGGCATAAGCATTAAGCTTTCAAAAGTGAGTTTAATCCCATAGGTCTGTAAGCAGTCTTTAAACAGACTTCTCAAGTTAAAATAAAGTATGACAGATGGCAACATACTGTGAGCAGAATATAATAATGATGTA",
        "GCCCGCCTCGTCCCCATGGCCGCAATGGCCAGAGGCATGGCTTCATCTCTTCTTGCCGGGTTAGTGCAGTTCTCTGCTGGTGGGGATGTGGGCCTCCAGGATTCCTCCCCAATCTGTCTGCACCGCTGTTCAGATGTGTTTTAAAAATTCAGGTCTGGTCAAGCCCTTCCCTACTCTACACCGTCAATAATTTCCTGTACTTT",
        "TGAACATATTCTGAACCAGGAGGCCTGATTAGTAGCCACTGCTCTGTTCTCATCATATTCTTGCAAGAGGAAGTGGTGGTGGTTTTCCCCGCAATCAGGAAGGAATTCCATCATTCATCTGCTAAGGCAGCACCTCCCTGAAATAATCTCTTCTCCCAGTCAGGCCTAGCTGGCTCCTTTCCCTCTGCAGCAGGATTGGGAAG",
        "GATACAAACTAAGCAGCGCCTGCTGCATTAGCTTCCAACTACTGAGTTGAATTTCCCTCTTTCTTTGCTTGACCTCGCCAGATGTTGTTATGATCTCATGTCCTAGCAGCTCAGTCCTTGGTCCTCCTCCCTTCTCTATCTGCCCCCTCTCCCAAGGATCTCTTGCAGTCCCATGGTTTTACATACCATCAGCAATTATAAAT",
        "TGGCAGCTGGCTGTGTTGGTTGCAGCAGGCCAGGCAAAGCCAGGCTCGGGGAGTGGTGGCTGAGCAGAGGGCTTTCATGTGGCGGGGCCGCATCTATTTGGGGAAAGCAGGACTTACGTAGATTGTTAACAGGGCTGGAAGCGGAGGGTCGGTGCCGCCAGTAGGGTGGACGGAGTCATATCACTAAACGCACACTACACCAG",
    ]
    host = "localhost"
    port = 65008
    seq_dir = "."
    two_bit = "./hg38.2bit"
    client = Client(
        host=host,
        port=port,
        seq_dir=seq_dir,
        min_score=20,
        min_identity=90,
    )
    # Start the server, wait until it is ready, then query all sequences.
    with Server(host, port, two_bit, can_stop=True, step_size=5) as server:
        print("waiting for server to be ready")
        server.wait_ready()
        result = client.query(test_seqs)
        print(result)
        print(result[0])  # print result


if __name__ == "__main__":
    main()

The issue should be fixed when you install the latest version |
Hi @cauliyang, It raises the following error (I ran the same code you pasted here because I got the same error with my code):
I don't know why this is happening now if it works for you. I also ran the command |
@anderdnavarro, I guess the reason is that we use different hg38.2bit files. But we should fix the bug, and I will try to resolve the issue. Thanks for the testing! |
Do you want me to reindex it? Or I can share it with you if you want. |
It would be better if you could share it with me so that I can reproduce and test the bug. |
@cauliyang here is the link to download them (fasta and 2bit): https://drive.google.com/drive/folders/1hR0ozxbtLEaIiTwUhlIluexVoNnCRb7m?usp=sharing |
Hi @anderdnavarro, Thanks for sharing the files. I found that I could not reproduce the issue on macOS arm64, but the issue happened on Linux x86. I have fixed the bug in the latest release |
Hi @cauliyang, yes sorry, I'm using Linux x86. It's working now! Thank you very much! The results for the test sequences are not the same as the ones you posted above (I think that is because of the reference genome), but the results are good. I have a question about this latest version. I read the changelog and saw that the ID is now the first five nucleotides plus the length. In my list of 2000 sequences (all of the same length), several share the same first five nucleotides. I tested pxblat and it didn't raise any error, so I think it is working properly, but I would like to confirm whether I should double-check those sequences. |
Hi @anderdnavarro, glad to hear that it works now. Yep, I used a different 2bit file, so the results are a little different. Also, I introduced a feature: the ID of a query result is now a string that concatenates the first 5 letters and the length of the sequence, for example TACCG_100. Please let me know if you have a better idea! |
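The ID scheme described above can collide when several sequences of the same length start with the same five letters. The sketch below illustrates that with a re-implementation of the scheme as described in this thread; `make_id` and `find_id_collisions` are hypothetical helpers, not part of pxblat, and the library's real implementation may differ.

```python
from collections import defaultdict


def make_id(seq: str, prefix_len: int = 5) -> str:
    """Hypothetical ID: first `prefix_len` letters, an underscore, then the length."""
    return f"{seq[:prefix_len]}_{len(seq)}"


def find_id_collisions(seqs):
    """Return {id: [sequences]} for every ID produced by more than one sequence."""
    by_id = defaultdict(list)
    for seq in seqs:
        by_id[make_id(seq)].append(seq)
    return {key: group for key, group in by_id.items() if len(group) > 1}


if __name__ == "__main__":
    # Toy sequences, not taken from the thread.
    example = ["TACCGAAAAA", "TACCGTTTTT", "GGGGGAAAAA"]
    for seq_id, group in find_id_collisions(example).items():
        print(seq_id, len(group))
```

Running a check like this over the input list before querying shows which sequences would share an ID, so those results can be matched back by position rather than by ID.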
Hmm what do you think about |
@anderdnavarro, good points! I will consider more cases and then refactor the feature in the future. |
What happened?
Hi!
I am running pxblat on some sequences (a list of 10 test sequences), but when I try to parse the results I get an error because the output is a string instead of a QueryResult object.
As you can see in the above example, the first result is good but the second is not processed correctly. It is also random: if I run the code again, the first sequence is weird but the second one is a proper QueryResult object.
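A defensive check can make this kind of failure easier to spot before parsing. The following is a minimal sketch, assuming the results come back as a list in which malformed entries are plain strings; `split_results` is a hypothetical helper and not part of pxblat.

```python
def split_results(results):
    """Separate string entries (unparsed output) from all other result objects.

    Returns two lists of (index, item) pairs: parsed and unparsed.
    """
    parsed, unparsed = [], []
    for index, item in enumerate(results):
        if isinstance(item, str):
            unparsed.append((index, item))
        else:
            parsed.append((index, item))
    return parsed, unparsed
```

The indices of the unparsed entries identify which input sequences were affected in a given run, which helps when the behaviour changes between runs.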
I am running the Server this way, but I also tried running without the tile_size option or in the General Mode.
Thank you very much!
Ander
Version
python-3.10.12
pxblat-1.1.10
biopython-1.83
What platform are you working on?
No response
Relevant log output
No response