Fix x86 SSE4.1 ptestnzc #3216

Merged
merged 1 commit into from
Dec 9, 2023

Commits on Dec 8, 2023

  1. Fix x86 SSE4.1 ptestnzc

    `(op & mask) == 0` and `(op & mask) == mask` each need to be calculated for the whole vector.
    
    For example, given
    * `op = [0b100, 0b010]`
    * `mask = [0b100, 0b110]`
    
    The correct result would be:
    * `op & mask = [0b100, 0b010]`
    Comparisons are done on the vector as a whole:
    * `all_zero = (op & mask) == [0, 0] = false`
    * `masked_set = (op & mask) == mask = false`
    * `!all_zero && !masked_set = true`
    
    The previous method:
    * `op & mask = [0b100, 0b010]`
    Comparisons are done element-wise:
    * `all_zero = (op & mask) == [0, 0] = [false, false]`
    * `masked_set = (op & mask) == mask = [true, false]`
    * `!all_zero && !masked_set = [false, true]`
    After folding with AND, the final result would be `false`, which is incorrect.
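
The difference between the two approaches can be sketched in plain Rust. This is only an illustrative model, not the code from the commit: the function names `testnzc_whole` and `testnzc_lanewise` are hypothetical, and the vectors are modelled as two-lane `[u64; 2]` arrays rather than real SIMD values.

```rust
/// Whole-vector semantics: each condition is evaluated over all lanes
/// first, and only then are the two resulting booleans combined.
fn testnzc_whole(op: [u64; 2], mask: [u64; 2]) -> bool {
    // True only if every lane of `op & mask` is zero.
    let all_zero = op.iter().zip(mask.iter()).all(|(o, m)| (o & m) == 0);
    // True only if every lane of `op & mask` equals the mask lane.
    let masked_set = op.iter().zip(mask.iter()).all(|(o, m)| (o & m) == *m);
    !all_zero && !masked_set
}

/// Lane-wise semantics (the approach described as incorrect above):
/// the conditions are combined per lane, then the lanes are AND-folded.
fn testnzc_lanewise(op: [u64; 2], mask: [u64; 2]) -> bool {
    op.iter().zip(mask.iter()).all(|(o, m)| {
        let zero = (o & m) == 0;
        let set = (o & m) == *m;
        !zero && !set
    })
}
```

With `op = [0b100, 0b010]` and `mask = [0b100, 0b110]`, `testnzc_whole` returns `true` while `testnzc_lanewise` returns `false`, because the first lane on its own satisfies `(op & mask) == mask`.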
    eduardosm committed Dec 8, 2023
    092eb11