Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question about gap open penalty for different vectorizations. #64

Closed
jvkersch opened this issue Dec 4, 2018 · 8 comments
Closed

Question about gap open penalty for different vectorizations. #64

jvkersch opened this issue Dec 4, 2018 · 8 comments

Comments

@jvkersch
Copy link
Contributor

jvkersch commented Dec 4, 2018

The following came up while trying to debug why Parasail would return different results on different machines (more specifically, a fairly recent laptop (i7) and a Travis VM). I narrowed things down to one version of the code dispatching to the AVX2 implementation, and another one to the SSE4 implementation. Surprisingly, while the remainder of the code is exactly the same, the two different vectorizations give different results. There's probably something that I've overlooked, but what could cause this?

The code snippet below reproduces the problem for me (with Parasail master). On my machine, it prints the following:

B                1 AAAAAAAAAA---AAAAAAAAAA      20
                   ||||||||||   ||||||||||
A                1 AAAAAAAAAATTTAAAAAAAAAA      23

Length: 23
Identity:        20/23 (87.0%)
Similarity:      20/23 (87.0%)
Gaps:             3/23 (13.0%)
Score: 97

B                1 AAAAAAAAAA-A-AAAAAAAAA-      20
                   |||||||||| - |||||||||
A                1 AAAAAAAAAATTTAAAAAAAAAA      23

Length: 23
Identity:        19/23 (82.6%)
Similarity:      19/23 (82.6%)
Gaps:             3/23 (13.0%)
Score: 89

Note how the alignments, the score, and the number of matches are different.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include "parasail.h"
#include "parasail/matrices/nuc44.h"

int main()
{
    parasail_result_t* result;
    
    const char *s = "AAAAAAAAAATTTAAAAAAAAAA";
    const char *t = "AAAAAAAAAAAAAAAAAAAA";

    /* SSE4 alignment */
    result = parasail_sg_trace_striped_sse41_128_16(
        s, strlen(s), t, strlen(t), 1, 4, &parasail_nuc44);

    parasail_traceback_generic(
        s, strlen(s), t, strlen(t),
        "A", "B", &parasail_nuc44,
        result, '|', '+', '-', 79, 10, 1);

    /* AVX2 alignment */
    result = parasail_sg_trace_striped_avx2_256_16(
        s, strlen(s), t, strlen(t), 1, 4, &parasail_nuc44);

    parasail_traceback_generic(
        s, strlen(s), t, strlen(t),
        "A", "B", &parasail_nuc44,
        result, '|', '+', '-', 79, 10, 1);
    
    return 0;
}
@jeffdaily
Copy link
Owner

I have confirmed the behavior but haven't looked at it in any detail yet. I don't believe I regularly verified where the gap open penalty was smaller than the gap extend penalty as in your case here. These are the gap {open,extend} penalties I regularly tested:

  • {10,1},
  • {10,2},
  • {14,2},
  • {40,2},

If you're building with the configure script and Makefile, you can run make check or make test to build all the tests. Running ./tests/test_verify -f ../data/test_64.fasta -m nuc44 -o 1 -e 4 will evaluate all vector implementations and print an error whenever there is a mismatch against the non-vectorized routine.

test_64.fasta
> one
AAAAAAAAAATTTAAAAAAAAAA
> two
AAAAAAAAAAAAAAAAAAAA

Is it common for the gap open penalty to be smaller than the gap extend penalty?

@jeffdaily
Copy link
Owner

One more thing. Can you provide the correct score and traceback for these particular settings using a different tool so that I might verify it against parasail?

@jvkersch
Copy link
Contributor Author

jvkersch commented Dec 5, 2018

@jeffdaily Many thanks for the quick reply! This isn't a serious issue for us, it arose as I was putting together some tests for Cython wrappers that I had written, and the case where the gap open penalty was larger than the gap extend came up as a corner case. Still, it would be good to understand this a little deeper.

I ran the tests that you proposed: make test passes and test_verify gives the following

$ ./tests/test_verify -f ./data/test_64.fasta -m nuc44 -o 1 -e 4
2 choose 2 is 1
checking parasail_nw_sse2 functions
parasail_nw_scan(0,1,1,4,nuc44) wrong score (97!=91)
parasail_nw_scan_sse2_128_64(0,1,1,4,nuc44) wrong score (97!=94)
parasail_nw_scan_sse2_128_32(0,1,1,4,nuc44) wrong score (97!=94)
parasail_nw_scan_sse2_128_16(0,1,1,4,nuc44) wrong score (97!=94)
parasail_nw_scan_sse2_128_8(0,1,1,4,nuc44) wrong score (97!=94)
checking parasail_sg_sse2 functions
parasail_sg_scan(0,1,1,4,nuc44) wrong score (97!=91)
parasail_sg_scan_sse2_128_64(0,1,1,4,nuc44) wrong score (97!=94)
parasail_sg_scan_sse2_128_32(0,1,1,4,nuc44) wrong score (97!=94)
parasail_sg_scan_sse2_128_16(0,1,1,4,nuc44) wrong score (97!=94)
parasail_sg_scan_sse2_128_8(0,1,1,4,nuc44) wrong score (97!=94)
parasail_sg_striped_sse2_128_8(0,1,1,4,nuc44) wrong score (97!=89)
parasail_sg_striped_sse2_128_8(0,1,1,4,nuc44) wrong end_query (22!=21)
checking parasail_sw_sse2 functions
parasail_sw_scan(0,1,1,4,nuc44) wrong score (97!=91)
parasail_sw_scan_sse2_128_64(0,1,1,4,nuc44) wrong score (97!=94)
parasail_sw_scan_sse2_128_32(0,1,1,4,nuc44) wrong score (97!=94)
parasail_sw_scan_sse2_128_16(0,1,1,4,nuc44) wrong score (97!=94)
parasail_sw_scan_sse2_128_8(0,1,1,4,nuc44) wrong score (97!=94)
parasail_sw_striped_sse2_128_8(0,1,1,4,nuc44) wrong score (97!=89)
parasail_sw_striped_sse2_128_8(0,1,1,4,nuc44) wrong end_query (22!=21)
checking parasail_nw_sse41 functions
parasail_nw_scan(0,1,1,4,nuc44) wrong score (97!=91)
parasail_nw_scan_sse41_128_64(0,1,1,4,nuc44) wrong score (97!=94)
parasail_nw_scan_sse41_128_32(0,1,1,4,nuc44) wrong score (97!=94)
parasail_nw_scan_sse41_128_16(0,1,1,4,nuc44) wrong score (97!=94)
parasail_nw_scan_sse41_128_8(0,1,1,4,nuc44) wrong score (97!=94)
checking parasail_sg_sse41 functions
parasail_sg_scan(0,1,1,4,nuc44) wrong score (97!=91)
parasail_sg_scan_sse41_128_64(0,1,1,4,nuc44) wrong score (97!=94)
parasail_sg_scan_sse41_128_32(0,1,1,4,nuc44) wrong score (97!=94)
parasail_sg_scan_sse41_128_16(0,1,1,4,nuc44) wrong score (97!=94)
parasail_sg_scan_sse41_128_8(0,1,1,4,nuc44) wrong score (97!=94)
parasail_sg_striped_sse41_128_8(0,1,1,4,nuc44) wrong score (97!=89)
parasail_sg_striped_sse41_128_8(0,1,1,4,nuc44) wrong end_query (22!=21)
checking parasail_sw_sse41 functions
parasail_sw_scan(0,1,1,4,nuc44) wrong score (97!=91)
parasail_sw_scan_sse41_128_64(0,1,1,4,nuc44) wrong score (97!=94)
parasail_sw_scan_sse41_128_32(0,1,1,4,nuc44) wrong score (97!=94)
parasail_sw_scan_sse41_128_16(0,1,1,4,nuc44) wrong score (97!=94)
parasail_sw_scan_sse41_128_8(0,1,1,4,nuc44) wrong score (97!=94)
parasail_sw_striped_sse41_128_8(0,1,1,4,nuc44) wrong score (97!=89)
parasail_sw_striped_sse41_128_8(0,1,1,4,nuc44) wrong end_query (22!=21)
checking parasail_nw_avx2 functions
parasail_nw_scan(0,1,1,4,nuc44) wrong score (97!=91)
parasail_nw_scan_avx2_256_64(0,1,1,4,nuc44) wrong score (97!=94)
parasail_nw_scan_avx2_256_32(0,1,1,4,nuc44) wrong score (97!=94)
parasail_nw_scan_avx2_256_16(0,1,1,4,nuc44) wrong score (97!=94)
parasail_nw_scan_avx2_256_8(0,1,1,4,nuc44) wrong score (97!=94)
parasail_nw_striped_avx2_256_16(0,1,1,4,nuc44) wrong score (97!=88)
checking parasail_sg_avx2 functions
parasail_sg_scan(0,1,1,4,nuc44) wrong score (97!=91)
parasail_sg_scan_avx2_256_64(0,1,1,4,nuc44) wrong score (97!=94)
parasail_sg_scan_avx2_256_32(0,1,1,4,nuc44) wrong score (97!=94)
parasail_sg_scan_avx2_256_16(0,1,1,4,nuc44) wrong score (97!=94)
parasail_sg_scan_avx2_256_8(0,1,1,4,nuc44) wrong score (97!=94)
parasail_sg_striped_avx2_256_16(0,1,1,4,nuc44) wrong score (97!=89)
parasail_sg_striped_avx2_256_16(0,1,1,4,nuc44) wrong end_query (22!=21)
parasail_sg_striped_avx2_256_8(0,1,1,4,nuc44) wrong score (97!=89)
parasail_sg_striped_avx2_256_8(0,1,1,4,nuc44) wrong end_query (22!=21)
checking parasail_sw_avx2 functions
parasail_sw_scan(0,1,1,4,nuc44) wrong score (97!=91)
parasail_sw_scan_avx2_256_64(0,1,1,4,nuc44) wrong score (97!=94)
parasail_sw_scan_avx2_256_32(0,1,1,4,nuc44) wrong score (97!=94)
parasail_sw_scan_avx2_256_16(0,1,1,4,nuc44) wrong score (97!=94)
parasail_sw_scan_avx2_256_8(0,1,1,4,nuc44) wrong score (97!=94)
parasail_sw_striped_avx2_256_16(0,1,1,4,nuc44) wrong score (97!=89)
parasail_sw_striped_avx2_256_16(0,1,1,4,nuc44) wrong end_query (22!=21)
parasail_sw_striped_avx2_256_8(0,1,1,4,nuc44) wrong score (97!=89)
parasail_sw_striped_avx2_256_8(0,1,1,4,nuc44) wrong end_query (22!=21)
checking parasail_nw_disp functions
parasail_nw_scan(0,1,1,4,nuc44) wrong score (97!=91)
parasail_nw_scan_64(0,1,1,4,nuc44) wrong score (97!=94)
parasail_nw_scan_32(0,1,1,4,nuc44) wrong score (97!=94)
parasail_nw_scan_16(0,1,1,4,nuc44) wrong score (97!=94)
parasail_nw_scan_8(0,1,1,4,nuc44) wrong score (97!=94)
parasail_nw_striped_16(0,1,1,4,nuc44) wrong score (97!=88)
parasail_nw_scan_sat(0,1,1,4,nuc44) wrong score (97!=94)
parasail_nw_striped_sat(0,1,1,4,nuc44) wrong score (97!=88)
checking parasail_sg_disp functions
parasail_sg_scan(0,1,1,4,nuc44) wrong score (97!=91)
parasail_sg_scan_64(0,1,1,4,nuc44) wrong score (97!=94)
parasail_sg_scan_32(0,1,1,4,nuc44) wrong score (97!=94)
parasail_sg_scan_16(0,1,1,4,nuc44) wrong score (97!=94)
parasail_sg_scan_8(0,1,1,4,nuc44) wrong score (97!=94)
parasail_sg_striped_16(0,1,1,4,nuc44) wrong score (97!=89)
parasail_sg_striped_16(0,1,1,4,nuc44) wrong end_query (22!=21)
parasail_sg_striped_8(0,1,1,4,nuc44) wrong score (97!=89)
parasail_sg_striped_8(0,1,1,4,nuc44) wrong end_query (22!=21)
parasail_sg_scan_sat(0,1,1,4,nuc44) wrong score (97!=94)
parasail_sg_striped_sat(0,1,1,4,nuc44) wrong score (97!=89)
parasail_sg_striped_sat(0,1,1,4,nuc44) wrong end_query (22!=21)
checking parasail_sw_disp functions
parasail_sw_scan(0,1,1,4,nuc44) wrong score (97!=91)
parasail_sw_scan_64(0,1,1,4,nuc44) wrong score (97!=94)
parasail_sw_scan_32(0,1,1,4,nuc44) wrong score (97!=94)
parasail_sw_scan_16(0,1,1,4,nuc44) wrong score (97!=94)
parasail_sw_scan_8(0,1,1,4,nuc44) wrong score (97!=94)
parasail_sw_striped_16(0,1,1,4,nuc44) wrong score (97!=89)
parasail_sw_striped_16(0,1,1,4,nuc44) wrong end_query (22!=21)
parasail_sw_striped_8(0,1,1,4,nuc44) wrong score (97!=89)
parasail_sw_striped_8(0,1,1,4,nuc44) wrong end_query (22!=21)
parasail_sw_scan_sat(0,1,1,4,nuc44) wrong score (97!=94)
parasail_sw_striped_sat(0,1,1,4,nuc44) wrong score (97!=89)
parasail_sw_striped_sat(0,1,1,4,nuc44) wrong end_query (22!=21)

@jvkersch
Copy link
Contributor Author

jvkersch commented Dec 5, 2018

I will look for an external tool that can produce the "correct" alignment.

@jeffdaily
Copy link
Owner

Side question(s): Did you use cython to wrap the entire parasail C API or perhaps just a handful of the ones you need? When I started the https://github.com/jeffdaily/parasail-python repo, I initially used cython. But I had little experience with cython on windows, so I switched to ctypes. Does parasail-python not meet your needs such that you needed your own cython bindings?

@jvkersch
Copy link
Contributor Author

jvkersch commented Dec 6, 2018

To answer your side question: we already had some Cython code (independent of Parasail) so it was easier to continue along that road than to add a new dependency. Starting from scratch I'd probably use parasail-python. In fact, I started making a Conda package for parasail-python, but I got bogged down with build issues (with the Conda build system, not with parasail-python).

@jeffdaily
Copy link
Owner

A gap open penalty that is smaller than the gap extend penalty is not supported. I will note this in the readme. I appreciate the question/discussion! Thanks for being a user.

@jvkersch
Copy link
Contributor Author

Thanks @jeffdaily ! Congrats on the new release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants