Seqrepo not giving back consistent data #129
Comments
Hey @wlymanambry -- would you be willing to share some code or a few more details on how you're seeing this so that I can reproduce it?
Without knowing anything else, this would be my guess as to the issue, fwiw. Has anyone else (@theferrit32 ?) seen something similar?
Sure, I'm using aiohttp: import aiohttp
And then the calling code was: sequence_ids: list = []
Thanks! I'll put this on my list of stuff to tinker with during data loads.
Thank you for taking a look!
@wlymanambry can you provide the code used in the method
Describe the bug
I have found that SeqRepo intermittently returns incorrect sequences. I've loaded millions of small protein sequences, and I started seeing sequences returned that couldn't be accounted for. I eventually wrote a loop that pulls sequence data from SeqRepo and compares each returned sequence to the known sequence. Each "All species match" line is one iteration of checking the same 100 species, i.e. 100 loaded protein sequences:
SeqRepo makes it through 31 iterations, or 3,100 sequence comparisons, before randomly returning incorrect data:
It then churns through about the same number of comparisons before again returning an incorrect sequence:
To Reproduce
Steps to reproduce the behavior:
Load a few million protein sequences, then query several thousand at a time while checking each returned sequence against the known sequence.
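The check described above can be sketched roughly as follows. This is a minimal illustration, not the reporter's actual code: `find_mismatches`, the `fetch` callable, and the `concurrency` parameter are all hypothetical names. `fetch` stands in for whatever coroutine retrieves a sequence from the service (for example, a thin wrapper around an aiohttp `session.get(...)` call that returns `await response.text()`); the URL layout depends on the deployment and is left out here.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Tuple

# Hypothetical signature: takes a sequence ID, returns the sequence text.
Fetch = Callable[[str], Awaitable[str]]


async def find_mismatches(
    known: Dict[str, str],
    fetch: Fetch,
    concurrency: int = 100,
) -> List[Tuple[str, str, str]]:
    """Fetch every ID in `known`; return (id, expected, actual) per mismatch."""
    # Cap the number of requests in flight at once.
    limit = asyncio.Semaphore(concurrency)

    async def check(seq_id: str, expected: str):
        async with limit:
            actual = await fetch(seq_id)
        # None means the returned sequence matched the known one.
        return None if actual == expected else (seq_id, expected, actual)

    results = await asyncio.gather(*(check(i, s) for i, s in known.items()))
    return [r for r in results if r is not None]
```

Running the same check with `concurrency=1` and then with `concurrency=100` gives a direct comparison between serial and concurrent access, which should help narrow down whether the mismatches only appear under concurrent load.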
Expected behavior
SeqRepo should always return the same sequence for a given ID.
Additional context
One bizarre aspect of this is that I can't identify where the incorrect sequences are coming from. If I take one of the incorrect returned sequences and search all of the sequence data that has been loaded, I don't find it. I'm also getting data back that isn't even sequence:
It also looks like this is isolated to making many concurrent calls (100 in my case). It doesn't reproduce with serial calls.
I am using this version: seqrepo-rest-service:0.2.2