
#78303 Add transformation ~v1 & v2 to VectorXxx.AndNot(v1, v2) #81993

Merged Sep 4, 2023 · 20 commits (diff below shows changes from 7 of the 20 commits)
91 changes: 91 additions & 0 deletions src/coreclr/jit/morph.cpp
@@ -10870,6 +10870,97 @@ GenTree* Compiler::fgOptimizeHWIntrinsic(GenTreeHWIntrinsic* node)
return node;
}

#if defined(TARGET_XARCH) || defined(TARGET_ARM64)

#if defined(TARGET_XARCH)
case NI_SSE_And:
case NI_SSE2_And:
case NI_AVX_And:
case NI_AVX2_And:
Member: What about Vector128/256_And and AdvSimd?

Member: Vector64/128/256_And don't exist outside of import at the moment, so they don't need to be handled.

AdvSimd should be handled, though, since we want parity between xarch and arm.

#elif defined(TARGET_ARM64)
case NI_AdvSimd_And:
#endif
{
assert(node->GetOperandCount() == 2);

GenTree* op1 = node->Op(1);
GenTree* op2 = node->Op(2);
GenTree* lhs = nullptr;
GenTree* rhs = nullptr;
GenTreeHWIntrinsic* inner_hw = nullptr;

// Transforms ~v1 & v2 to VectorXxx.AndNot(v1, v2)
if (op1->OperIs(GT_HWINTRINSIC))
{
GenTreeHWIntrinsic* xor_hw = op1->AsHWIntrinsic();
switch (xor_hw->GetHWIntrinsicId())
{
#if defined(TARGET_XARCH) || defined(TARGET_ARM64)
Member: This is unnecessary; you're already in a larger identical ifdef from L10873.

That being said, the larger identical ifdef on L10873 should also be unnecessary, given we're already in a greater #ifdef FEATURE_HW_INTRINSICS.


#if defined(TARGET_XARCH)
case NI_SSE_Xor:
case NI_SSE2_Xor:
case NI_AVX_Xor:
case NI_AVX2_Xor:
#elif defined(TARGET_ARM64)
case NI_AdvSimd_Xor:
#endif
inner_hw = xor_hw;
rhs = op2;
#endif
default:
{
break;
}
}
}

// Transforms v2 & (~v1) to VectorXxx.AndNot(v2, v1)
if (op2->OperIs(GT_HWINTRINSIC))
Member: This check is going to miss the opt if we have something like ((x ^ AllBitsSet) & (y ^ z)). Such a tree could have been transformed into AndNot((y ^ z), x).

In general, you're going to need to match (op1 ^ AllBitsSet) up front before determining if it's a match, and then, if that fails, do the same check for (op2 ^ AllBitsSet).

For Arm64, you'll also need to directly check for ~op1 or ~op2 (since NI_AdvSimd_Not exists).

There are some things we could do to make this overall simpler, but they are slightly more involved changes.
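The match-either-operand point can be sketched standalone. The following is a toy model only (hypothetical `Node`/`Op` types, not actual JIT `GenTree` code), assuming `AndNot(lhs, rhs)` means `lhs & ~rhs` as with `Vector128.AndNot`:

```cpp
#include <cassert>

// Toy expression nodes standing in for GenTree; not real JIT types.
enum class Op { Var, AllBitsSet, And, Xor, AndNot };

struct Node {
    Op    op;
    Node* a = nullptr;
    Node* b = nullptr;
};

// Returns x when n is (x ^ AllBitsSet) or (AllBitsSet ^ x), else nullptr.
Node* matchNot(Node* n) {
    if ((n == nullptr) || (n->op != Op::Xor)) return nullptr;
    if (n->b->op == Op::AllBitsSet) return n->a;
    if (n->a->op == Op::AllBitsSet) return n->b;
    return nullptr;
}

// Folds (~x & y) or (y & ~x) into AndNot(y, x) == y & ~x. Because BOTH
// operands are tried, ((x ^ AllBitsSet) & (y ^ z)) is still caught via op1.
Node* foldAnd(Node* n) {
    if (n->op != Op::And) return n;
    if (Node* x = matchNot(n->a)) return new Node{Op::AndNot, n->b, x};
    if (Node* x = matchNot(n->b)) return new Node{Op::AndNot, n->a, x};
    return n;
}
```

Trying the NOT match on op1 first, then op2, mirrors the ordering the review comment asks for.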

@tannergooding (Member), Mar 6, 2023:

I'd, in general, recommend extracting some of this to a helper.

For example, you could define something like:

genTreeOps GenTreeHWIntrinsic::HWOperGet()
{
    switch (GetHWIntrinsicId())
    {
#if defined(TARGET_XARCH)
        case NI_SSE_And:
        case NI_SSE2_And:
        case NI_AVX_And:
        case NI_AVX2_And:
#elif defined(TARGET_ARM64)
        case NI_AdvSimd_And:
#endif
        {
            return GT_AND;
        }

#if defined(TARGET_ARM64)
        case NI_AdvSimd_Not:
        {
            return GT_NOT;
        }
#endif

#if defined(TARGET_XARCH)
        case NI_SSE_Xor:
        case NI_SSE2_Xor:
        case NI_AVX_Xor:
        case NI_AVX2_Xor:
#elif defined(TARGET_ARM64)
        case NI_AdvSimd_Xor:
#endif
        {
            return GT_XOR;
        }

        // TODO: Handle other cases

        default:
        {
            return GT_NONE;
        }
    }
}

Such a helper allows you to instead switch over the genTreeOps equivalent. So you could have something like:

switch (node->HWOperGet())
{
    case GT_AND:
    {
        GenTree* op1 = node->Op(1);
        GenTree* op2 = node->Op(2);
        GenTree* lhs = nullptr;
        GenTree* rhs = nullptr;

        if (op1->OperIsHWIntrinsic())
        {
            // Try handle: ~op1 & op2
            GenTreeHWIntrinsic* hw     = op1->AsHWIntrinsic();
            genTreeOps          hwOper = hw->HWOperGet();

            if (hwOper == GT_NOT)
            {
                lhs = op2;
                rhs = hw->Op(1); // the operand under the Not, so the result is op2 & ~rhs
            }
            else if (hwOper == GT_XOR)
            {
                GenTree* hwOp1 = hw->Op(1);
                GenTree* hwOp2 = hw->Op(2);

                if (hwOp1->IsVectorAllBitsSet())
                {
                    lhs = op2;
                    rhs = hwOp2;
                }
                else if (hwOp2->IsVectorAllBitsSet())
                {
                    lhs = op2;
                    rhs = hwOp1;
                }
            }
        }

        if ((lhs == nullptr) && op2->OperIsHWIntrinsic())
        {
            // Try handle: op1 & ~op2
            GenTreeHWIntrinsic* hw     = op2->AsHWIntrinsic();
            genTreeOps          hwOper = hw->HWOperGet();

            if (hwOper == GT_NOT)
            {
                lhs = op1;
                rhs = hw->Op(1); // the operand under the Not, so the result is op1 & ~rhs
            }
            else if (hwOper == GT_XOR)
            {
                GenTree* hwOp1 = hw->Op(1);
                GenTree* hwOp2 = hw->Op(2);

                if (hwOp1->IsVectorAllBitsSet())
                {
                    lhs = op1;
                    rhs = hwOp2;
                }
                else if (hwOp2->IsVectorAllBitsSet())
                {
                    lhs = op1;
                    rhs = hwOp1;
                }
            }
        }

        if (lhs == nullptr)
        {
            break;
        }

        GenTree* andnNode = gtNewSimdBinOpNode(GT_AND_NOT, simdType, lhs, rhs, simdBaseJitType, simdSize, true);

        DEBUG_DESTROY_NODE(node);
        INDEBUG(andnNode->gtDebugFlags |= GTF_DEBUG_NODE_MORPHED);

        return andnNode;
    }

    default:
    {
        break;
    }
}

You could of course also extract the NOT op vs op XOR AllBitsSet matching logic to reduce duplication as well.

Member:

Longer term, I think we may want to introduce a "fake" Isa_Not hwintrinsic id for xarch. That would allow morph to transform x ^ AllBitsSet into Isa_Not and then would in turn allow this case to be simplified in its pattern checks.

We may also want to normalize cases like Sse_Xor, Sse2_Xor, and AdvSimd_Xor into Vector128_Xor, so we don't need to consider xplat differences. But that will also involve significant refactorings, far more so than introducing a HWOperGet helper for the time being.
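That normalization idea can likewise be sketched with toy nodes (hypothetical `normalizeNot` and `Op::Not`, not an actual JIT pass): canonicalize `x ^ AllBitsSet` into a single Not shape up front, so later folds only have to match one pattern:

```cpp
#include <cassert>

// Toy nodes standing in for GenTree; Not / normalizeNot are hypothetical.
enum class Op { Var, AllBitsSet, Xor, Not };

struct Node {
    Op    op;
    Node* a = nullptr;
    Node* b = nullptr;
};

// Canonicalizes (x ^ AllBitsSet) and (AllBitsSet ^ x) into Not(x); every
// later fold then matches the single Not shape instead of each Xor form.
Node* normalizeNot(Node* n) {
    if (n->op == Op::Xor) {
        if (n->b->op == Op::AllBitsSet) return new Node{Op::Not, n->a};
        if (n->a->op == Op::AllBitsSet) return new Node{Op::Not, n->b};
    }
    return n;
}
```

With this in place, the AndNot fold would only need to recognize `Not(x) & y` / `y & Not(x)`, on both xarch and Arm64.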

{
GenTreeHWIntrinsic* xor_hw = op2->AsHWIntrinsic();
switch (xor_hw->GetHWIntrinsicId())
{
#if defined(TARGET_XARCH) || defined(TARGET_ARM64)

#if defined(TARGET_XARCH)
case NI_SSE_Xor:
case NI_SSE2_Xor:
case NI_AVX_Xor:
case NI_AVX2_Xor:
#elif defined(TARGET_ARM64)
case NI_AdvSimd_Xor:
#endif
inner_hw = xor_hw;
rhs = op1;
#endif
default:
{
break;
}
}
}

if ((inner_hw == nullptr) || (!inner_hw->Op(2)->IsVectorAllBitsSet()))
{
return node;
}

var_types simdType = node->TypeGet();
CorInfoType simdBaseJitType = node->GetSimdBaseJitType();
unsigned int simdSize = node->GetSimdSize();

lhs = inner_hw->Op(1);

GenTree* andnNode = gtNewSimdBinOpNode(GT_AND_NOT, simdType, lhs, rhs, simdBaseJitType, simdSize, true);

DEBUG_DESTROY_NODE(node);

INDEBUG(andnNode->gtDebugFlags |= GTF_DEBUG_NODE_MORPHED);

return andnNode;
}
#endif
default:
{
break;