[Bug]: Output is not a QueryResult #244

Closed
1 task done
anderdnavarro opened this issue Feb 1, 2024 · 15 comments
Labels
bug Something isn't working

@anderdnavarro

What happened?

Hi!

I am running pxblat on some sequences (a list of 10 test sequences), but when I try to parse the results I get an error, because one of the outputs is a string instead of a QueryResult object.

[QueryResult(id='/tmp/tmpc_hpcwrs', 21 hits), "psLayout version 3\n\nmatch\tmis- \trep. \tN's\tQ gap\tQ gap\tT gap\tT gap\tstrand\tQ        \tQ   \tQ    \tQ  \tT        \tT   \tT    \tT  \tblock\tblockSizes \tqStarts\t tStarts\n     \tmatch\tmatch\t   \tcount\tbases\tcount\tbases\t      \tname     \tsize\tstart\tend\tname     \tsize\tstart\tend\tcount\n---------------------------------------------------------------------------------------------------------------------------------------------------------------\n203\t0\t0\t0\t0\t0\t0\t0\t+\t/tmp/tmp6inztjij\t203\t0\t203\t1\t248956422\t232599292\t232599495\t1\t203,\t0,\t232599292,\n20\t0\t0\t0\t0\t0\t0\t0\t+\t/tmp/tmp6inztjij\t203\t143\t163\t16\t90338345\t68871914\t68871934\t1\t20,\t143,\t68871914,\n28\t0\t0\t0\t0\t0\t1\t128\t-\t/tmp/tmp6inztjij\t203\t93\t121\t11\t135086622\t44447222\t44447378\t2\t18,10,\t82,100,\t44447222,44447368,\n68\t6\t0\t0\t0\t0\t1\t4\t+\t/tmp/tmpc_hpcwrs\t203\t0\t74\t1\t248956422\t146819126\t146819204\t2\t41,33,\t0,41,\t146819126,146819171,\n37\t3\t0\t0\t0\t0\t0\t0\t+\t/tmp/tmpc_hpcwrs\t203\t0\t40\t1\t248956422\t168153835\t168153875\t1\t40,\t0,\t168153835,\n34\t1\t0\t0\t0\t0\t0\t0\t+\t/tmp/tmpc_hpcwrs\t203\t8\t43\t1\t248956422\t182295414\t182295449\t1\t35,\t8,\t182295414,\n75\t5\t0\t0\t1\t11\t1\t4\t+\t/tmp/tmpc_hpcwrs\t203\t1\t92\t1\t248956422\t203892097\t203892181\t3\t42,31,7,\t1,43,85,\t203892097,203892143,203892174,\n40\t3\t0\t0\t0\t0\t0\t0\t+\t/tmp/tmpc_hpcwrs\t203\t0\t43\t1\t248956422\t203851895\t203851938\t1\t43,\t0,\t203851895,\n42\t1\t0\t0\t0\t0\t0\t0\t+\t/tmp/tmpc_hpcwrs\t203\t0\t43\t1\t248956422\t222638623\t222638666\t1\t43,\t0,\t222638623,\n58\t3\t0\t0\t1\t7\t2\t328\t+\t/tmp/tmpc_hpcwrs\t203\t6\t74\t1\t248956422\t226142865\t226143254\t3\t19,16,26,\t6,25,48,\t226142865,226143201,226143228,\n69\t5\t0\t0\t1\t2\t2\t7\t+\t/tmp/tmpc_hpcwrs\t203\t6\t82\t1\t248956422\t235053408\t235053489\t4\t36,18,14,6,\t6,42,62,76,\t235053408,235053448,235053466,235053483,\n41\t3\t0\t0\t0\t0\t0\t0\t+\t/tmp/
tmpc_hpcwrs\t203\t1\t45\t1\t248956422\t241585156\t241585200\t1\t44,\t1,\t241585156,\n36\t1\t0\t0\t0\t0\t0\t0\t+\t/tmp/tmpc_hpcwrs\t203\t6\t43\t1\t248956422\t244609118\t244609155\t1\t37,\t6,\t244609118,\n69\t5\t0\t0\t0\t0\t1\t4\t+\t/tmp/tmpc_hpcwrs\t203\t0\t74\t1\t248956422\t246612811\t246612889\t2\t41,33,\t0,41,\t246612811,246612856,\n40\t2\t0\t0\t0\t0\t1\t4\t+\t/tmp/tmpc_hpcwrs\t203\t34\t76\t10\t133797422\t4234175\t4234221\t2\t7,35,\t34,41,\t4234175,4234186,\n43\t0\t0\t0\t2\t19\t1\t12\t+\t/tmp/tmpc_hpcwrs\t203\t30\t92\t10\t133797422\t9536100\t9536155\t3\t13,23,7,\t30,51,85,\t9536100,9536125,9536148,\n59\t3\t0\t0\t1\t7\t2\t13\t+\t/tmp/tmpc_hpcwrs\t203\t6\t75\t10\t133797422\t59781145\t59781220\t3\t35,16,11,\t6,41,64,\t59781145,59781184,59781209,\n75\t6\t0\t0\t1\t3\t1\t4\t+\t/tmp/tmpc_hpcwrs\t203\t0\t84\t11\t135086622\t9978925\t9979010\t3\t41,33,7,\t0,41,77,\t9978925,9978970,9979003,\n62\t3\t0\t0\t1\t3\t1\t7\t+\t/tmp/tmpc_hpcwrs\t203\t6\t74\t12\t133275309\t126872181\t126872253\t2\t39,26,\t6,48,\t126872181,126872227,\n41\t2\t0\t0\t0\t0\t0\t0\t+\t/tmp/tmpc_hpcwrs\t203\t0\t43\t14\t107043718\t19957850\t19957893\t1\t43,\t0,\t19957850,\n40\t3\t0\t0\t0\t0\t0\t0\t+\t/tmp/tmpc_hpcwrs\t203\t0\t43\t14\t107043718\t90356526\t90356569\t1\t43,\t0,\t90356526,\n67\t6\t0\t0\t0\t0\t1\t4\t+\t

As you can see in the example above, the first result is fine but the second one is not parsed. The behavior is also random: if I run the code again, the first result is the malformed one and the second is a proper QueryResult object.
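The malformed entry is raw PSL text (psLayout version 3) with one tab-separated alignment per row. To inspect such a string, a minimal stdlib-only parser can be sketched as below; it assumes the standard 21-column PSL layout, and parse_psl and group_by_query are hypothetical helpers, not part of pxblat. Grouping the rows by qName shows that this single string mixes rows from two temporary query files (/tmp/tmp6inztjij and /tmp/tmpc_hpcwrs).

```python
from collections import defaultdict

# The 21 standard PSL columns, in order.
PSL_FIELDS = (
    "matches", "misMatches", "repMatches", "nCount",
    "qNumInsert", "qBaseInsert", "tNumInsert", "tBaseInsert",
    "strand", "qName", "qSize", "qStart", "qEnd",
    "tName", "tSize", "tStart", "tEnd",
    "blockCount", "blockSizes", "qStarts", "tStarts",
)
TEXT_FIELDS = {"strand", "qName", "tName"}
LIST_FIELDS = {"blockSizes", "qStarts", "tStarts"}


def parse_psl(text):
    """Parse PSL text (psLayout header optional) into one dict per alignment."""
    rows = []
    for line in text.splitlines():
        fields = line.split("\t")
        # Header lines, blank lines and the dashed separator have too few
        # columns or a non-numeric first column, so they are skipped.
        if len(fields) < 21 or not fields[0].isdigit():
            continue
        row = {}
        for name, value in zip(PSL_FIELDS, fields):
            if name in TEXT_FIELDS:
                row[name] = value
            elif name in LIST_FIELDS:
                # Comma-separated with a trailing comma, e.g. "41,33,"
                row[name] = [int(v) for v in value.split(",") if v]
            else:
                row[name] = int(value)
        rows.append(row)
    return rows


def group_by_query(rows):
    """Group parsed rows by query name, preserving first-seen order."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["qName"]].append(row)
    return dict(groups)


# Abbreviated header plus two data rows copied from the output above.
SAMPLE_PSL = (
    "psLayout version 3\n\n"
    "match\tmis-\trep.\tN's\tQ gap\tQ gap\tT gap\tT gap\tstrand\tQ\tQ\tQ\tQ"
    "\tT\tT\tT\tT\tblock\tblockSizes\tqStarts\ttStarts\n"
    "--------------------\n"
    "203\t0\t0\t0\t0\t0\t0\t0\t+\t/tmp/tmp6inztjij\t203\t0\t203\t1\t248956422"
    "\t232599292\t232599495\t1\t203,\t0,\t232599292,\n"
    "68\t6\t0\t0\t0\t0\t1\t4\t+\t/tmp/tmpc_hpcwrs\t203\t0\t74\t1\t248956422"
    "\t146819126\t146819204\t2\t41,33,\t0,41,\t146819126,146819171,\n"
)

for qname, hits in group_by_query(parse_psl(SAMPLE_PSL)).items():
    # prints "/tmp/tmp6inztjij 1", then "/tmp/tmpc_hpcwrs 1"
    print(qname, len(hits))
```

With Biopython installed, Bio.SearchIO.parse(handle, "blat-psl") yields one QueryResult per query name, whereas Bio.SearchIO.read expects exactly one query in the handle.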

I am running the server as shown below, but I also tried running it without the tile_size option and in General Mode:

from pxblat import Client, Server

client = Client(
    host="localhost",
    port=5000,
    seq_dir="/databases/hg38",
    min_score=20,
    min_identity=90,
)

with Server("localhost", 5000, "/databases/hg38/hg38.2bit", can_stop=True, step_size=5, tile_size=10) as server:
    # prepare_blat_sequences and file come from my own script (not shown)
    sequences: list = prepare_blat_sequences(file)
    server.wait_ready()
    results = client.query(sequences[0:10])
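When a single bad response shows up somewhere in a batch, it helps to know exactly which sequence produced it. A hypothetical helper (not a pxblat function) can query the sequences one at a time and record each outcome. It only assumes that client.query accepts a one-element list and returns a list with one result per input, which is what its source in the traceback later in this thread shows (it loops over in_seqs and appends one result each).

```python
def query_one_by_one(query_fn, sequences):
    """Call query_fn on each sequence separately and record every outcome.

    Returns a list of (index, result, error) tuples so that one failing
    sequence does not hide the results for the others.
    """
    outcomes = []
    for i, seq in enumerate(sequences):
        try:
            result = query_fn(seq)
        except ValueError as err:
            # ValueError is what query_server re-raises in the traceback
            # posted later in this thread.
            outcomes.append((i, None, err))
            continue
        if isinstance(result, (str, bytes)):
            # Raw PSL text instead of a parsed object: the symptom reported here.
            outcomes.append((i, None, TypeError("result is unparsed text")))
        else:
            # A parsed result, or None when the query returned no hits.
            outcomes.append((i, result, None))
    return outcomes
```

Inside the with Server(...) block above, it could be called as query_one_by_one(lambda s: client.query([s])[0], sequences[0:10]); the lambda is an assumption based on client.query returning one result per input sequence.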

Thank you very much!
Ander

Version

python-3.10.12
pxblat-1.1.10
biopython-1.83

What platform are you working on?

No response

Relevant log output

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
@anderdnavarro anderdnavarro added the bug Something isn't working label Feb 1, 2024
@cauliyang cauliyang self-assigned this Feb 2, 2024
@cauliyang
Collaborator

cauliyang commented Feb 2, 2024

PR #246 is intended to fix the bug. I will merge it into the main branch and then release a new version so that you can install it.

@cauliyang cauliyang reopened this Feb 2, 2024
@cauliyang
Collaborator

Hi @anderdnavarro, I have released a new version. Could you please run pip install pxblat==1.1.18 and test the issue again with the latest version?

@anderdnavarro
Author

Thanks @cauliyang for your quick fix! Now there is no weird output, but it doesn't find most of the sequences in the genome.

This is the list of 10 sequences I am using to test this:

['CCAAATTGAACTCATATTAGAAATGCAAAGTTGGTTTAACATTAGAAAAATCTATTCTTTTGACTTACCAGAAACAAAGGAGGAAAAAAAGACCCCATGATCATCTCAATCGATGCAGAAAAAGCATTCTAGCAAATTCAGTATCTGTGAATGATATAACCAGGAATAAAAGAACTACCTTCATCCAAAAGAAATTGTTTCTA',
'ATATATGCTGCACTTTCACATAGATTTCAGCTAAAAAAACAGTTCTACTGATTAAAAAATTTTGAAGACCACTGAATTAATCCTTCTTATTTGGGTTTGGAACCATGAAAATAAGAAACAACCCAAATGTCCATGAAGAAAAGAAAGGATATGCAATGTATAGTATAATCATATAATAGAATACTACTGAGCATTGAAAAGGA',
'GAAAACAAAAATGGACAAAGGGTATGAACAGTAATTTATAGAGAAAAATCCCAGAAAAAGTTCACAAGCATATTAAAAGATGCTTAAAATAATTAGTAATAGAAGAAAATCAAATAAAAATAACAGGATGTCACTTTTATGGCTAAGAAAGTAGCAAAAATAGGCCGGGTGTCGTGGCTCACACCTGTAATCCCAGCACTTTG',
'TGAGCTGCTGGAGATGAAGTTAGAAGAAAACAATAACAAGAATGAAGAAACGACCCCCCCTACCCCAAACAAAAGTTACTCCGAAATCACCATAGTAACAAATCACCAGTTTTCAACTGTTTCATTTCCTTTGCATTTTTGTTTCTCTGCACATACCCTTTTATTTATATTCAGCCCCAGCTCTAGCCTTTATTCAACAAGCA',
'GACAGAATAACTGTGCTGGGATGTGCTAATGCAGCAGGCATGCATAAGTGTAAACTCTGCTTAGACAAAAGCTTGCATCCTCTCTGTTTTCAAGGAGTGAATTTCTTACCGGTCTATTATTATGCTAATAAGGAGTCATGGATCACCAATGACATCTTTTCTGATTAGCTTCACAAACATTTTGTTACAGCACCTCCTGCTGA',
'CTTCCTCTCCTTTCATATGTTTTGGCTATAGTGGCATGTTGTTTATAAGTGGAGGACTGACTTTACTTTCAGGCATAAGCATTAAGCTTTCAAAAGTGAGTTTAATCCCATAGGTCTGTAAGCAGTCTTTAAACAGACTTCTCAAGTTAAAATAAAGTATGACAGATGGCAACATACTGTGAGCAGAATATAATAATGATGTA',
'GCCCGCCTCGTCCCCATGGCCGCAATGGCCAGAGGCATGGCTTCATCTCTTCTTGCCGGGTTAGTGCAGTTCTCTGCTGGTGGGGATGTGGGCCTCCAGGATTCCTCCCCAATCTGTCTGCACCGCTGTTCAGATGTGTTTTAAAAATTCAGGTCTGGTCAAGCCCTTCCCTACTCTACACCGTCAATAATTTCCTGTACTTT',
'TGAACATATTCTGAACCAGGAGGCCTGATTAGTAGCCACTGCTCTGTTCTCATCATATTCTTGCAAGAGGAAGTGGTGGTGGTTTTCCCCGCAATCAGGAAGGAATTCCATCATTCATCTGCTAAGGCAGCACCTCCCTGAAATAATCTCTTCTCCCAGTCAGGCCTAGCTGGCTCCTTTCCCTCTGCAGCAGGATTGGGAAG',
'GATACAAACTAAGCAGCGCCTGCTGCATTAGCTTCCAACTACTGAGTTGAATTTCCCTCTTTCTTTGCTTGACCTCGCCAGATGTTGTTATGATCTCATGTCCTAGCAGCTCAGTCCTTGGTCCTCCTCCCTTCTCTATCTGCCCCCTCTCCCAAGGATCTCTTGCAGTCCCATGGTTTTACATACCATCAGCAATTATAAAT',
'TGGCAGCTGGCTGTGTTGGTTGCAGCAGGCCAGGCAAAGCCAGGCTCGGGGAGTGGTGGCTGAGCAGAGGGCTTTCATGTGGCGGGGCCGCATCTATTTGGGGAAAGCAGGACTTACGTAGATTGTTAACAGGGCTGGAAGCGGAGGGTCGGTGCCGCCAGTAGGGTGGACGGAGTCATATCACTAAACGCACACTACACCAG']

And this is the output:

[QueryResult(id='/tmp/tmplg5fwrhv', 26 hits),
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None]

I am using the options step_size=5 and tile_size=10 to reproduce the results of the BLAT web server, and on the web server I find many matches for all of these sequences. Also, when I run these same sequences with the previous version of pxblat that I have installed (0.3.6), the result is this:

[QueryResult(id='/tmp/97127397.1.all.q/tmpcvho834m', 26 hits), 
QueryResult(id='/tmp/97127397.1.all.q/tmpskvgjlho', 13 hits), 
QueryResult(id='/tmp/97127397.1.all.q/tmpozvutw8l', 14 hits), 
QueryResult(id='/tmp/97127397.1.all.q/tmpo1hol4eh', 2 hits), 
QueryResult(id='/tmp/97127397.1.all.q/tmp6s8o5t4t', 11 hits), 
QueryResult(id='/tmp/97127397.1.all.q/tmpaaatc007', 2 hits), 
QueryResult(id='/tmp/97127397.1.all.q/tmp2dw4kgsz', 2 hits), 
QueryResult(id='/tmp/97127397.1.all.q/tmpvyx3k401', 2 hits), 
QueryResult(id='/tmp/97127397.1.all.q/tmp29_zkjol', 4 hits), 
QueryResult(id='/tmp/97127397.1.all.q/tmpkiepaf36', 4 hits)]
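Comparing the two runs position by position makes the regression explicit: 0.3.6 found hits for all ten sequences, while 1.1.18 returned None for all but the first. A small hypothetical helper (not part of pxblat) that tolerates None and raw-text entries could look like this; it relies on len() of a Bio.SearchIO QueryResult returning its number of hits.

```python
def hit_count(result):
    """Number of hits in one query result; 0 for None or raw text."""
    if result is None or isinstance(result, (str, bytes)):
        return 0
    return len(result)  # len() of a Bio.SearchIO QueryResult is its hit count


def describe(result):
    """Short human-readable label for one entry of a results list."""
    if result is None:
        return "no result"
    if isinstance(result, (str, bytes)):
        return "unparsed text"
    return f"{hit_count(result)} hits"


def compare_runs(old, new):
    """Indices where the old run found hits but the new run found none."""
    return [
        i
        for i, (a, b) in enumerate(zip(old, new))
        if hit_count(a) > 0 and hit_count(b) == 0
    ]
```

Applied to the 0.3.6 list above as old and the 1.1.18 list as new, compare_runs would flag indices 1 through 9.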

One more thing: when I checked that I was using the right version, pxblat.__version__ reported 1.1.10, although pip shows that 1.1.18 is installed:

import pxblat
print(pxblat.__version__)
1.1.10

pip freeze | grep pxblat
pxblat==1.1.18
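To see both numbers from the same interpreter, a hypothetical helper (not part of pxblat) can compare the module's __version__ attribute with the version recorded in the installed package metadata, read through the standard library's importlib.metadata.version (Python 3.8+). A mismatch only shows that the two sources disagree: the __version__ string may simply not have been bumped in that release, or the interpreter may be importing a different copy than the one pip reports, which printing pxblat.__file__ helps rule out.

```python
from importlib.metadata import PackageNotFoundError, version


def version_report(module, dist_name):
    """Return (module.__version__, installed distribution version).

    Either element is None if it cannot be determined.
    """
    module_version = getattr(module, "__version__", None)
    try:
        # Version recorded in the installed package metadata (what pip shows)
        dist_version = version(dist_name)
    except PackageNotFoundError:
        dist_version = None
    return module_version, dist_version


# Usage (requires pxblat to be installed):
# import pxblat
# print(version_report(pxblat, "pxblat"), pxblat.__file__)
```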

Thanks!
Ander

@cauliyang
Collaborator

Thanks for the update. I think I know the cause and will fix it soon.

@cauliyang cauliyang reopened this Feb 5, 2024
@cauliyang
Collaborator

Hi @anderdnavarro, thanks for the information you provided. I tested the sequences you shared and got this result:

[QueryResult(id='/var/folders/s3/vs6nrrg52sdfjk3z90p7ndt94gg4tq/T/tmpwq3ugnzl', 27 hits),
QueryResult(id='/var/folders/s3/vs6nrrg52sdfjk3z90p7ndt94gg4tq/T/tmp22_rm7xl', 22 hits),
QueryResult(id='/var/folders/s3/vs6nrrg52sdfjk3z90p7ndt94gg4tq/T/tmpvk8r054o', 10 hits),
QueryResult(id='/var/folders/s3/vs6nrrg52sdfjk3z90p7ndt94gg4tq/T/tmpdyw1as3i', 7 hits),
QueryResult(id='/var/folders/s3/vs6nrrg52sdfjk3z90p7ndt94gg4tq/T/tmpuo2apz3f', 12 hits),
QueryResult(id='/var/folders/s3/vs6nrrg52sdfjk3z90p7ndt94gg4tq/T/tmpb7x8k08m', 3 hits),
QueryResult(id='/var/folders/s3/vs6nrrg52sdfjk3z90p7ndt94gg4tq/T/tmpsepv7tzh', 5 hits),
QueryResult(id='/var/folders/s3/vs6nrrg52sdfjk3z90p7ndt94gg4tq/T/tmpopfl1ujb', 4 hits),
QueryResult(id='/var/folders/s3/vs6nrrg52sdfjk3z90p7ndt94gg4tq/T/tmppphsxzhs', 5 hits),
QueryResult(id='/var/folders/s3/vs6nrrg52sdfjk3z90p7ndt94gg4tq/T/tmpwi2l87u1', 4 hits)]

Here is the code I used:

from pxblat import Client, Server


def main():
    test_seqs = [
        "CCAAATTGAACTCATATTAGAAATGCAAAGTTGGTTTAACATTAGAAAAATCTATTCTTTTGACTTACCAGAAACAAAGGAGGAAAAAAAGACCCCATGATCATCTCAATCGATGCAGAAAAAGCATTCTAGCAAATTCAGTATCTGTGAATGATATAACCAGGAATAAAAGAACTACCTTCATCCAAAAGAAATTGTTTCTA",
        "ATATATGCTGCACTTTCACATAGATTTCAGCTAAAAAAACAGTTCTACTGATTAAAAAATTTTGAAGACCACTGAATTAATCCTTCTTATTTGGGTTTGGAACCATGAAAATAAGAAACAACCCAAATGTCCATGAAGAAAAGAAAGGATATGCAATGTATAGTATAATCATATAATAGAATACTACTGAGCATTGAAAAGGA",
        "GAAAACAAAAATGGACAAAGGGTATGAACAGTAATTTATAGAGAAAAATCCCAGAAAAAGTTCACAAGCATATTAAAAGATGCTTAAAATAATTAGTAATAGAAGAAAATCAAATAAAAATAACAGGATGTCACTTTTATGGCTAAGAAAGTAGCAAAAATAGGCCGGGTGTCGTGGCTCACACCTGTAATCCCAGCACTTTG",
        "TGAGCTGCTGGAGATGAAGTTAGAAGAAAACAATAACAAGAATGAAGAAACGACCCCCCCTACCCCAAACAAAAGTTACTCCGAAATCACCATAGTAACAAATCACCAGTTTTCAACTGTTTCATTTCCTTTGCATTTTTGTTTCTCTGCACATACCCTTTTATTTATATTCAGCCCCAGCTCTAGCCTTTATTCAACAAGCA",
        "GACAGAATAACTGTGCTGGGATGTGCTAATGCAGCAGGCATGCATAAGTGTAAACTCTGCTTAGACAAAAGCTTGCATCCTCTCTGTTTTCAAGGAGTGAATTTCTTACCGGTCTATTATTATGCTAATAAGGAGTCATGGATCACCAATGACATCTTTTCTGATTAGCTTCACAAACATTTTGTTACAGCACCTCCTGCTGA",
        "CTTCCTCTCCTTTCATATGTTTTGGCTATAGTGGCATGTTGTTTATAAGTGGAGGACTGACTTTACTTTCAGGCATAAGCATTAAGCTTTCAAAAGTGAGTTTAATCCCATAGGTCTGTAAGCAGTCTTTAAACAGACTTCTCAAGTTAAAATAAAGTATGACAGATGGCAACATACTGTGAGCAGAATATAATAATGATGTA",
        "GCCCGCCTCGTCCCCATGGCCGCAATGGCCAGAGGCATGGCTTCATCTCTTCTTGCCGGGTTAGTGCAGTTCTCTGCTGGTGGGGATGTGGGCCTCCAGGATTCCTCCCCAATCTGTCTGCACCGCTGTTCAGATGTGTTTTAAAAATTCAGGTCTGGTCAAGCCCTTCCCTACTCTACACCGTCAATAATTTCCTGTACTTT",
        "TGAACATATTCTGAACCAGGAGGCCTGATTAGTAGCCACTGCTCTGTTCTCATCATATTCTTGCAAGAGGAAGTGGTGGTGGTTTTCCCCGCAATCAGGAAGGAATTCCATCATTCATCTGCTAAGGCAGCACCTCCCTGAAATAATCTCTTCTCCCAGTCAGGCCTAGCTGGCTCCTTTCCCTCTGCAGCAGGATTGGGAAG",
        "GATACAAACTAAGCAGCGCCTGCTGCATTAGCTTCCAACTACTGAGTTGAATTTCCCTCTTTCTTTGCTTGACCTCGCCAGATGTTGTTATGATCTCATGTCCTAGCAGCTCAGTCCTTGGTCCTCCTCCCTTCTCTATCTGCCCCCTCTCCCAAGGATCTCTTGCAGTCCCATGGTTTTACATACCATCAGCAATTATAAAT",
        "TGGCAGCTGGCTGTGTTGGTTGCAGCAGGCCAGGCAAAGCCAGGCTCGGGGAGTGGTGGCTGAGCAGAGGGCTTTCATGTGGCGGGGCCGCATCTATTTGGGGAAAGCAGGACTTACGTAGATTGTTAACAGGGCTGGAAGCGGAGGGTCGGTGCCGCCAGTAGGGTGGACGGAGTCATATCACTAAACGCACACTACACCAG",
    ]

    host = "localhost"
    port = 65008
    seq_dir = "."
    two_bit = "./hg38.2bit"

    client = Client(
        host=host,
        port=port,
        seq_dir=seq_dir,
        min_score=20,
        min_identity=90,
    )

    with Server(host, port, two_bit, can_stop=True, step_size=5) as server:
        print("waiting for server to be ready")
        server.wait_ready()

        result = client.query(test_seqs)
        print(result)
        print(result[0])  # inspect the first QueryResult


if __name__ == "__main__":
    main()

The issue should be fixed in the latest version, 1.1.19. Please let me know if you still run into problems.

@anderdnavarro
Author

Hi @cauliyang,

It raises the following error (I ran exactly the code you pasted here, because my own code gave the same error):

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ in <module>:31                                                                                   │
│                                                                                                  │
│   28 │   print("waiting for server to be ready")                                                 │
│   29 │   server.wait_ready()                                                                     │
│   30 │                                                                                           │
│ ❱ 31 │   result = client.query(test_seqs)                                                        │
│   32 │   print(result)                                                                           │
│   33 │   print(result[0])  # print result                                                        │
│   34                                                                                             │
│                                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │               approach = 'attention'                                                         │ │
│ │                 Client = <class 'pxblat.server.client.Client'>                               │ │
│ │                 client = <pxblat.server.client.Client object at 0x7fc2460a33a0>              │ │
│ │                   exit = <IPython.core.autocall.ZMQExitAutocall object at 0x7fc29e2a19f0>    │ │
│ │            get_ipython = <bound method InteractiveShell.get_ipython of                       │ │
│ │                          <ipykernel.zmqshell.ZMQInteractiveShell object at 0x7fc29e2a0bb0>>  │ │
│ │                   gzip = <module 'gzip' from '/usr/lib/python3.10/gzip.py'>                  │ │
│ │                   host = 'localhost'                                                         │ │
│ │                     In = [                                                                   │ │
│ │                          │   '',                                                             │ │
│ │                          │   'import os\nimport gzip\nimport pickle\nfrom datasets import    │ │
│ │                          load_dataset\nfrom pxbla'+145,                                      │ │
│ │                          │   "approach = 'attention'",                                       │ │
│ │                          │   'import pxblat\npxblat.__version__',                            │ │
│ │                          │   "def prepare_blat_sequences(input:pickle) -> list:\n    with    │ │
│ │                          open(input, 'rb') as "+161,                                         │ │
│ │                          │   '# version 1.1.18\n# Set up the client\nclient = Client(\n      │ │
│ │                          host="localhost",\n    '+856,                                       │ │
│ │                          │   "sequences:list =                                               │ │
│ │                          prepare_blat_sequences(f'/test/"+8… │ │
│ │                          │   "# sequences =                                                  │ │
│ │                          ['CCAAATTGAACTCATATTAGAAATGCAAAGTTGGTTTAACATTAGAAAAATCTATTCTTTTGAC… │ │
│ │                          │   'test_seqs = [\n                                                │ │
│ │                          "CCAAATTGAACTCATATTAGAAATGCAAAGTTGGTTTAACATTAGAAAAATCTATTCTTTT'+25… │ │
│ │                          │   'test_seqs = [\n                                                │ │
│ │                          "CCAAATTGAACTCATATTAGAAATGCAAAGTTGGTTTAACATTAGAAAAATCTATTCTTTT'+25… │ │
│ │                          ]                                                                   │ │
│ │           load_dataset = <function load_dataset at 0x7fc2564b53f0>                           │ │
│ │                     np = <module 'numpy' from                                                │ │
│ │                          '/test/lib/py… │ │
│ │                   open = <function open at 0x7fc29f0e3880>                                   │ │
│ │                     os = <module 'os' from '/usr/lib/python3.10/os.py'>                      │ │
│ │                    Out = {3: '1.1.19'}                                                       │ │
│ │                     pd = <module 'pandas' from                                               │ │
│ │                          '/test/lib/py… │ │
│ │                 pickle = <module 'pickle' from '/usr/lib/python3.10/pickle.py'>              │ │
│ │                    plt = <module 'matplotlib.pyplot' from                                    │ │
│ │                          '/test/lib/py… │ │
│ │                   port = 8000                                                                │ │
│ │ prepare_blat_sequences = <function prepare_blat_sequences at 0x7fc246be9a20>                 │ │
│ │                 pxblat = <module 'pxblat' from                                               │ │
│ │                          '/test/lib/py… │ │
│ │                   quit = <IPython.core.autocall.ZMQExitAutocall object at 0x7fc29e2a19f0>    │ │
│ │                seq_dir = '/databases/hg38/'                      │ │
│ │              sequences = [                                                                   │ │
│ │                          │                                                                   │ │
│ │                          'GACCACTTCTAAAGGCTCATACACCTACCTGGTACATTTATACAGAAATACTATTTCTCAAAATC… │ │
│ │                          │                                                                   │ │
│ │                          'AATTTCCGAGCAGTGCAAACCGGAGAGCTCCCCATTCCCAGCGCCAACGGCAGGTTTGCGCGCCC… │ │
│ │                          │                                                                   │ │
│ │                          'TTTTGTTTCCATGAGACTTCAAGGGGAGGTTCTGTGGCAGTATTTATTTGGCTTATGGTAACCTC… │ │
│ │                          │                                                                   │ │
│ │                          'TAAGAAATAAAAAAATAAAAAGCATATGTATTTCATACTAAAAATCAACACTAGTTGGTGAAGAA… │ │
│ │                          │                                                                   │ │
│ │                          'TTGCACCGTCATCCAACTTGCCACTCTGTGCCTTATCATTGGGACATTTAGTCTGTTTATATTCA… │ │
│ │                          │                                                                   │ │
│ │                          'GCTGAGATAAGTAAGGCTACACACTTGGAACATGGTCGATTTTTGGATTCTCCAAGAGGAGCCAA… │ │
│ │                          │                                                                   │ │
│ │                          'AAACCCTAGAAGAAAACCTAGGCAATACCATTCAGGACACAGGCATGGGCAAAGACTTCATGACT… │ │
│ │                          │                                                                   │ │
│ │                          'AACATGGTGAAACCCCGTCTCTACTAAAAATACAAAAATTACCTAGGTGTGGTGGCATGTGCCTG… │ │
│ │                          │                                                                   │ │
│ │                          'CAGTGAAGACACAGTGGGTCTTGTTATCTATCATTTGGCCATAGTACAATCTCTCTCCTGATCTG… │ │
│ │                          │                                                                   │ │
│ │                          'TGAGCCCCGAGTTGTTTCTGTTTCTGTTACTGCCTGTGATGTGTGGGGCTTCTGGGTAGGTCAAG… │ │
│ │                          │   ... +1978                                                       │ │
│ │                          ]                                                                   │ │
│ │                 Server = <class 'pxblat.server.server.Server'>                               │ │
│ │                 server = Server(localhost, 8000, ready: False open: False                    │ │
│ │                          ServerOption(canStop: true, log: , logFacility: , mask: false,      │ │
│ │                          maxAaSize: 8000, maxDnaHits: 100, maxGap: 2, maxNtSize: 40000,      │ │
│ │                          maxTransHits: 200, minMatch: 2, repMatch: 2252, seqLog: false,      │ │
│ │                          ipLog: false, debugLog: false, tileSize: 11, stepSize: 5, trans:    │ │
│ │                          false, syslog: false, perSeqMax: , noSimpRepMask: false, indexFile: │ │
│ │                          , timeout: 90, genome: , genomeDataDir: , threads: 1,               │ │
│ │                          allowOneMismatch: false))                                           │ │
│ │                    sns = <module 'seaborn' from                                              │ │
│ │                          '/test/lib/py… │ │
│ │                  tabix = <module 'tabix' from                                                │ │
│ │                          '/test/lib/py… │ │
│ │              test_seqs = [                                                                   │ │
│ │                          │                                                                   │ │
│ │                          'CCAAATTGAACTCATATTAGAAATGCAAAGTTGGTTTAACATTAGAAAAATCTATTCTTTTGACT… │ │
│ │                          │                                                                   │ │
│ │                          'ATATATGCTGCACTTTCACATAGATTTCAGCTAAAAAAACAGTTCTACTGATTAAAAAATTTTGA… │ │
│ │                          │                                                                   │ │
│ │                          'GAAAACAAAAATGGACAAAGGGTATGAACAGTAATTTATAGAGAAAAATCCCAGAAAAAGTTCAC… │ │
│ │                          │                                                                   │ │
│ │                          'TGAGCTGCTGGAGATGAAGTTAGAAGAAAACAATAACAAGAATGAAGAAACGACCCCCCCTACCC… │ │
│ │                          │                                                                   │ │
│ │                          'GACAGAATAACTGTGCTGGGATGTGCTAATGCAGCAGGCATGCATAAGTGTAAACTCTGCTTAGA… │ │
│ │                          │                                                                   │ │
│ │                          'CTTCCTCTCCTTTCATATGTTTTGGCTATAGTGGCATGTTGTTTATAAGTGGAGGACTGACTTTA… │ │
│ │                          │                                                                   │ │
│ │                          'GCCCGCCTCGTCCCCATGGCCGCAATGGCCAGAGGCATGGCTTCATCTCTTCTTGCCGGGTTAGT… │ │
│ │                          │                                                                   │ │
│ │                          'TGAACATATTCTGAACCAGGAGGCCTGATTAGTAGCCACTGCTCTGTTCTCATCATATTCTTGCA… │ │
│ │                          │                                                                   │ │
│ │                          'GATACAAACTAAGCAGCGCCTGCTGCATTAGCTTCCAACTACTGAGTTGAATTTCCCTCTTTCTT… │ │
│ │                          │                                                                   │ │
│ │                          'TGGCAGCTGGCTGTGTTGGTTGCAGCAGGCCAGGCAAAGCCAGGCTCGGGGAGTGGTGGCTGAGC… │ │
│ │                          ]                                                                   │ │
│ │                two_bit = '/databases/hg38/hg38.2bit'             │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
│                                                                                                  │
│ /test/lib/python3.10/site-packages/pxblat/s │
│ erver/client.py:590 in query                                                                     │
│                                                                                                  │
│   587 │   │                                                                                      │
│   588 │   │   results = []                                                                       │
│   589 │   │   for in_seq in in_seqs:                                                             │
│ ❱ 590 │   │   │   results.append(self._query(in_seq))                                            │
│   591 │   │                                                                                      │
│   592 │   │   return results                                                                     │
│   593                                                                                            │
│                                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │  in_seq = 'ATATATGCTGCACTTTCACATAGATTTCAGCTAAAAAAACAGTTCTACTGATTAAAAAATTTTGAAGACCACTGAATTAA… │ │
│ │ in_seqs = [                                                                                  │ │
│ │           │                                                                                  │ │
│ │           'CCAAATTGAACTCATATTAGAAATGCAAAGTTGGTTTAACATTAGAAAAATCTATTCTTTTGACTTACCAGAAACAAAGG… │ │
│ │           │                                                                                  │ │
│ │           'ATATATGCTGCACTTTCACATAGATTTCAGCTAAAAAAACAGTTCTACTGATTAAAAAATTTTGAAGACCACTGAATTAA… │ │
│ │           │                                                                                  │ │
│ │           'GAAAACAAAAATGGACAAAGGGTATGAACAGTAATTTATAGAGAAAAATCCCAGAAAAAGTTCACAAGCATATTAAAAGA… │ │
│ │           │                                                                                  │ │
│ │           'TGAGCTGCTGGAGATGAAGTTAGAAGAAAACAATAACAAGAATGAAGAAACGACCCCCCCTACCCCAAACAAAAGTTACT… │ │
│ │           │                                                                                  │ │
│ │           'GACAGAATAACTGTGCTGGGATGTGCTAATGCAGCAGGCATGCATAAGTGTAAACTCTGCTTAGACAAAAGCTTGCATCC… │ │
│ │           │                                                                                  │ │
│ │           'CTTCCTCTCCTTTCATATGTTTTGGCTATAGTGGCATGTTGTTTATAAGTGGAGGACTGACTTTACTTTCAGGCATAAGC… │ │
│ │           │                                                                                  │ │
│ │           'GCCCGCCTCGTCCCCATGGCCGCAATGGCCAGAGGCATGGCTTCATCTCTTCTTGCCGGGTTAGTGCAGTTCTCTGCTGG… │ │
│ │           │                                                                                  │ │
│ │           'TGAACATATTCTGAACCAGGAGGCCTGATTAGTAGCCACTGCTCTGTTCTCATCATATTCTTGCAAGAGGAAGTGGTGGT… │ │
│ │           │                                                                                  │ │
│ │           'GATACAAACTAAGCAGCGCCTGCTGCATTAGCTTCCAACTACTGAGTTGAATTTCCCTCTTTCTTTGCTTGACCTCGCCA… │ │
│ │           │                                                                                  │ │
│ │           'TGGCAGCTGGCTGTGTTGGTTGCAGCAGGCCAGGCAAAGCCAGGCTCGGGGAGTGGTGGCTGAGCAGAGGGCTTTCATGT… │ │
│ │           ]                                                                                  │ │
│ │ results = [QueryResult(id='/tmp/tmpiuztoqy7', 27 hits)]                                      │ │
│ │    self = <pxblat.server.client.Client object at 0x7fc2460a33a0>                             │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
│                                                                                                  │
│ /test/lib/python3.10/site-packages/pxblat/s │
│ erver/client.py:540 in _query                                                                    │
│                                                                                                  │
│   537 │   │   │   basic_option.withInName(str(in_seq)).withInSeq("").build()                     │
│   538 │   │   else:                                                                              │
│   539 │   │   │   basic_option.withInSeq(str(in_seq)).withInName("").build()                     │
│ ❱ 540 │   │   return query_server(basic_option, parse=self._parse)                               │
│   541 │                                                                                          │
│   542 │   def query(self, in_seqs: INSEQS | list[str] | list[Path] | INSEQ):                     │
│   543 │   │   """Query the server with the specified sequences.                                  │
│                                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │ basic_option = ClientOption(hostName=localhost, portName=8000, tType=dna, qType=dna, dots=0, │ │
│ │                nohead=false, minScore=20, minIdentity=90, outputFormat=psl,                  │ │
│ │                maxIntron=750000, genome=, genomeDataDir=, isDynamic=false,                   │ │
│ │                tSeqDir=/databases/hg38/,                         │ │
│ │                inName=/tmp/tmpuhn11q26, outName=)                                            │ │
│ │       in_seq = 'ATATATGCTGCACTTTCACATAGATTTCAGCTAAAAAAACAGTTCTACTGATTAAAAAATTTTGAAGACCACTGA… │ │
│ │         self = <pxblat.server.client.Client object at 0x7fc2460a33a0>                        │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
│                                                                                                  │
│ /test/lib/python3.10/site-packages/pxblat/s │
│ erver/client.py:191 in query_server                                                              │
│                                                                                                  │
│   188 │   except ValueError as e:                                                                │
│   189 │   │   if "No query results" in str(e):                                                   │
│   190 │   │   │   return None                                                                    │
│ ❱ 191 │   │   raise e                                                                            │
│   192 │   else:                                                                                  │
│   193 │   │   return _assign_info_to_query_result(res)                                           │
│   194                                                                                            │
│                                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │     fafile = <tempfile._TemporaryFileWrapper object at 0x7fc27520bdf0>                       │ │
│ │       host = None                                                                            │ │
│ │     option = ClientOption(hostName=localhost, portName=8000, tType=dna, qType=dna, dots=0,   │ │
│ │              nohead=false, minScore=20, minIdentity=90, outputFormat=psl, maxIntron=750000,  │ │
│ │              genome=, genomeDataDir=, isDynamic=false,                                       │ │
│ │              tSeqDir=/databases/hg38/, inName=/tmp/tmpuhn11q26,  │ │
│ │              outName=)                                                                       │ │
│ │      parse = True                                                                            │ │
│ │       port = None                                                                            │ │
│ │        ret = b"psLayout version 3\n\nmatch\tmis- \trep. \tN's\tQ gap\tQ gap\tT gap\tT        │ │
│ │              gap\tstrand\tQ      "+15049                                                     │ │
│ │ ret_decode = "psLayout version 3\n\nmatch\tmis- \trep. \tN's\tQ gap\tQ gap\tT gap\tT         │ │
│ │              gap\tstrand\tQ      "+14767                                                     │ │
│ │    seqname = '/tmp/tmpuhn11q26'                                                              │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
│                                                                                                  │
│ /test/lib/python3.10/site-packages/pxblat/server/client.py:187 in query_server                   │
│                                                                                                  │
│   184 │   │   return ret_decode                                                                  │
│   185 │                                                                                          │
│   186 │   try:                                                                                   │
│ ❱ 187 │   │   res = read(ret_decode, "psl")                                                      │
│   188 │   except ValueError as e:                                                                │
│   189 │   │   if "No query results" in str(e):                                                   │
│   190 │   │   │   return None                                                                    │
│                                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │     fafile = <tempfile._TemporaryFileWrapper object at 0x7fc27520bdf0>                       │ │
│ │       host = None                                                                            │ │
│ │     option = ClientOption(hostName=localhost, portName=8000, tType=dna, qType=dna, dots=0,   │ │
│ │              nohead=false, minScore=20, minIdentity=90, outputFormat=psl, maxIntron=750000,  │ │
│ │              genome=, genomeDataDir=, isDynamic=false,                                       │ │
│ │              tSeqDir=/databases/hg38/, inName=/tmp/tmpuhn11q26,  │ │
│ │              outName=)                                                                       │ │
│ │      parse = True                                                                            │ │
│ │       port = None                                                                            │ │
│ │        ret = b"psLayout version 3\n\nmatch\tmis- \trep. \tN's\tQ gap\tQ gap\tT gap\tT        │ │
│ │              gap\tstrand\tQ      "+15049                                                     │ │
│ │ ret_decode = "psLayout version 3\n\nmatch\tmis- \trep. \tN's\tQ gap\tQ gap\tT gap\tT         │ │
│ │              gap\tstrand\tQ      "+14767                                                     │ │
│ │    seqname = '/tmp/tmpuhn11q26'                                                              │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
│                                                                                                  │
│ /test/lib/python3.10/site-packages/pxblat/parser.py:164 in read                                  │
│                                                                                                  │
│   161 │   try:                                                                                   │
│   162 │   │   next(query_results)                                                                │
│   163 │   │   msg = "More than one query result found in handle"                                 │
│ ❱ 164 │   │   raise ValueError(msg)                                                              │
│   165 │   except StopIteration:                                                                  │
│   166 │   │   pass                                                                               │
│   167                                                                                            │
│                                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │       content = "psLayout version 3\n\nmatch\tmis- \trep. \tN's\tQ gap\tQ gap\tT gap\tT      │ │
│ │                 gap\tstrand\tQ      "+14767                                                  │ │
│ │        format = 'psl'                                                                        │ │
│ │        kwargs = {}                                                                           │ │
│ │           msg = 'More than one query result found in handle'                                 │ │
│ │  query_result = QueryResult(id='/tmp/tmpuhn11q26', 23 hits)                                  │ │
│ │ query_results = <generator object parse at 0x7fc275298900>                                   │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ValueError: More than one query result found in handle

I don't know why this is happening if it works for you. Before testing the new version, I also ran the pxblat server stop localhost port command you suggested in the other issue.
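
For context: the last frame of the traceback shows that read() takes one query result from a generator and raises ValueError if a second one exists. The first output above contains PSL rows for two different query names (/tmp/tmp6inztjij and /tmp/tmpc_hpcwrs), so it splits into two query results. The stdlib-only sketch below mimics that grouping and check. group_psl_by_query and read_single are hypothetical helpers written for illustration, not pxblat's API, and the sketch assumes standard 21-column PSL rows with the query name in the 10th column. The sample rows are copied from the output above.

```python
from collections import OrderedDict


def group_psl_by_query(psl_text):
    """Group PSL alignment rows by query name (10th column, index 9).

    Hypothetical helper: the psLayout banner, column titles and dashed
    separator are skipped by requiring 21 tab-separated fields whose
    first field is an integer (the match count).
    """
    groups = OrderedDict()
    for line in psl_text.splitlines():
        fields = line.split("\t")
        if len(fields) != 21 or not fields[0].strip().isdigit():
            continue  # not an alignment row
        groups.setdefault(fields[9], []).append(fields)
    return groups


def read_single(psl_text):
    """Return (qname, rows) for exactly one query, like the check in the traceback."""
    groups = group_psl_by_query(psl_text)
    if not groups:
        raise ValueError("No query results found in handle")
    if len(groups) > 1:
        raise ValueError("More than one query result found in handle")
    qname, rows = next(iter(groups.items()))
    return qname, rows


# Two rows taken from the output above, with different query names.
sample = (
    "203\t0\t0\t0\t0\t0\t0\t0\t+\t/tmp/tmp6inztjij\t203\t0\t203\t1\t248956422"
    "\t232599292\t232599495\t1\t203,\t0,\t232599292,\n"
    "37\t3\t0\t0\t0\t0\t0\t0\t+\t/tmp/tmpc_hpcwrs\t203\t0\t40\t1\t248956422"
    "\t168153835\t168153875\t1\t40,\t0,\t168153835,\n"
)
print(list(group_psl_by_query(sample)))  # ['/tmp/tmp6inztjij', '/tmp/tmpc_hpcwrs']
try:
    read_single(sample)
except ValueError as err:
    print(err)  # More than one query result found in handle
```

If the server response for one query can contain rows from another temp file, a single-result read() will fail in exactly this way even though each individual alignment row is valid.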

@cauliyang
Collaborator

@anderdnavarro, I suspect the difference is that we are using different hg38.2bit files. Either way, this is a bug, and I will try to fix it. Thanks for testing!

@anderdnavarro
Author

Do you want me to reindex it, or would you like me to share the file with you?

@cauliyang
Collaborator

cauliyang commented Feb 5, 2024

It would be better if you could share it with me so that I can reproduce and test the bug.

@anderdnavarro
Author

@cauliyang, here is the link to download the files (FASTA and 2bit):

https://drive.google.com/drive/folders/1hR0ozxbtLEaIiTwUhlIluexVoNnCRb7m?usp=sharing

@cauliyang
Collaborator

Hi @anderdnavarro, thanks for sharing the files. I could not reproduce the issue on macOS arm64, but it does occur on Linux x86. I have fixed the bug in the latest release, 1.1.20. Feel free to try it.

@anderdnavarro
Author

Hi @cauliyang, yes, sorry, I'm using Linux x86.

It's working now! Thank you very much! The results for the test sequences are not the same as the ones you posted above (I think because of the reference genome), but they look good.

I have a question about this latest version. I read the changelog and saw that the ID is now the first five nucleotides plus the sequence length. In my list of 2,000 sequences (all of the same length), several share the same first five nucleotides. pxblat didn't raise any error, so I think it is working properly; I just want to confirm whether I should double-check those sequences.

@cauliyang
Collaborator

Hi @anderdnavarro, glad to hear that it works now. Yes, I used a different 2bit file, so the results are slightly different. I also introduced a feature: the ID of each query result is now a string that concatenates the first 5 letters of the sequence and its length, for example TACCG_100. Please let me know if you have a better idea!
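
The ID scheme described here can be sketched as follows, together with a check for the case raised in the previous comment: sequences of equal length that share their first five bases. prefix_length_id and find_id_collisions are hypothetical names for illustration; this is not pxblat's actual implementation.

```python
from collections import defaultdict


def prefix_length_id(seq, k=5):
    """Build an ID from the first k bases plus the sequence length, e.g. TACCG_100."""
    return f"{seq[:k]}_{len(seq)}"


def find_id_collisions(seqs, k=5):
    """Map every ID that is shared by two or more input sequences to their indices."""
    by_id = defaultdict(list)
    for i, seq in enumerate(seqs):
        by_id[prefix_length_id(seq, k)].append(i)
    return {qid: idx for qid, idx in by_id.items() if len(idx) > 1}


seqs = ["TACCGATTA", "TACCGGGCC", "ATATATGCT"]
print(prefix_length_id(seqs[0]))  # TACCG_9
print(find_id_collisions(seqs))   # {'TACCG_9': [0, 1]}
```

Under this scheme, two distinct sequences that map to the same ID cannot be told apart by the ID alone, so running a check like this over the input list shows which sequences are affected.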

@anderdnavarro
Author

Hmm, what do you think about seq0, seq1, seq2, ..., numbered according to the sequences the user provides? That way the ID would always be unique, and users could track the result for a specific sequence because they know its ID. It's just an idea; your ID works well too!
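
The index-based alternative proposed here is unique by construction, because each ID comes from the sequence's position in the input list rather than from its content. A minimal sketch, with sequential_ids as a hypothetical helper name:

```python
def sequential_ids(seqs, prefix="seq"):
    """Assign seq0, seq1, ... in input order; IDs are unique by construction."""
    return {f"{prefix}{i}": seq for i, seq in enumerate(seqs)}


seqs = ["TACCGATTA", "TACCGGGCC", "ATATATGCT"]
ids = sequential_ids(seqs)
print(list(ids))    # ['seq0', 'seq1', 'seq2']
print(ids["seq1"])  # TACCGGGCC
```

The trade-off is that these IDs depend on input order: the same sequence submitted at a different position gets a different ID, whereas a content-derived ID like TACCG_100 stays stable but can collide.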

@cauliyang
Collaborator

@anderdnavarro, good points! I will consider more cases and then refactor the feature in the future.
