This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Don't optimize MultiplyNoFlags away #21928

Merged
merged 1 commit into from
Jan 11, 2019

@fiigii
Author

fiigii commented Jan 10, 2019

@CarolEidt @AndyAyersMS PTAL

The code below is compiled incorrectly: the MultiplyNoFlags call is optimized away.

        static unsafe ulong test(ulong a, ulong b)
        {
            ulong r;
            Bmi2.X64.MultiplyNoFlags(a, b, &r);
            return r;
        }
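
For readers unfamiliar with the intrinsic, here is a minimal C++ sketch of its semantics: it returns the high 64 bits of the product and stores the low 64 bits through the pointer argument. The names `mulx_model` and `test_model` are hypothetical, and the sketch assumes a compiler with the GCC/Clang `unsigned __int128` extension. It is a model of the behavior, not the JIT's implementation.

```cpp
#include <cstdint>

// Hypothetical model of the intrinsic's semantics (not JIT code).
// Assumes the GCC/Clang `unsigned __int128` extension is available.
inline std::uint64_t mulx_model(std::uint64_t a, std::uint64_t b, std::uint64_t* low)
{
    unsigned __int128 product = static_cast<unsigned __int128>(a) * b;
    *low = static_cast<std::uint64_t>(product);        // side effect: memory store
    return static_cast<std::uint64_t>(product >> 64);  // return value: high half
}

// Mirrors the C# `test` method above: the return value is discarded,
// but the store through `&r` must still happen.
inline std::uint64_t test_model(std::uint64_t a, std::uint64_t b)
{
    std::uint64_t r = 0;
    mulx_model(a, b, &r);
    return r;
}
```

Because the call both returns a value and writes memory, dropping it as an "unused value" changes what `test_model` returns.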

My current solution is to assign the result of the intrinsic call to a local variable. Although this adds a redundant ASG node to the IR for "normal usages" (i.e., res = Bmi2.X64.MultiplyNoFlags(a, b, &r)), the final codegen does not contain redundant movs.

@mikedn

mikedn commented Jan 10, 2019

Hmm, this intrinsic should be treated as a memory store. Is it? I guess this is another peculiar thing about it - the other memory stores don't return a value AFAIR.

@fiigii
Author

fiigii commented Jan 10, 2019

Right, the importer usually forces the expression to be appended to the IR tree if it returns VOID. But this intrinsic has both a real return value and memory-store semantics.
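
The behavior described here can be illustrated with a toy model. This is a heavily simplified sketch, not the actual importer: `ToyImporter`, `ToyNode`, `TOY_ASG`, and `discardResult` are hypothetical names, and the real logic involves more flags and cases.

```cpp
#include <vector>

// Hypothetical, simplified stand-ins for JIT concepts.
enum class ToyType { Void, Long };

enum ToyFlags : unsigned
{
    TOY_NONE = 0,
    TOY_ASG  = 1u << 0, // node writes to memory
};

struct ToyNode
{
    const char* name;
    ToyType     type;
    unsigned    flags;
};

struct ToyImporter
{
    std::vector<ToyNode> statements; // trees that survive import

    // Called when nothing consumes an expression's result.
    void discardResult(const ToyNode& node)
    {
        bool hasSideEffect = (node.flags & TOY_ASG) != 0;
        if (node.type == ToyType::Void || hasSideEffect)
        {
            statements.push_back(node); // keep it as its own statement
        }
        // Otherwise the node looks like a pure, unused value and is dropped.
    }
};
```

In this model, a value-returning node with no side-effect flag is silently dropped, which is the failure mode reported for the unflagged intrinsic.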

@tannergooding
Member

Do we know why this isn't covered by it both being a memory store (https://github.com/dotnet/coreclr/blob/master/src/jit/gentree.cpp#L17920) and the intrinsics having been updated to be specially handled in fgComputeLifeLIR and fgCheckRemoveStmt (see https://github.com/dotnet/coreclr/issues/16088 for more details)?

@fiigii
Author

fiigii commented Jan 10, 2019

@tannergooding this happens in the importer and doesn’t reach that point.

@mikedn

mikedn commented Jan 10, 2019

Check how GT_CMPXCHG is handled in the importer, maybe it helps. It has the same characteristics - writes to memory and returns a value.

@CarolEidt

I believe that it needs to be marked as having a side-effect. In the GT_CMPXCHG case, this is the GTF_ASG flag. The method GenTree::OperRequiresAsgFlag() is where the conditions for this are checked. I suspect that any memory store intrinsic should also be handled there.

@fiigii
Author

fiigii commented Jan 10, 2019

> I believe that it needs to be marked as having a side-effect. In the GT_CMPXCHG case, this is the GTF_ASG flag.

Ah, thank you so much, will try.

@fiigii changed the title from "[WIP] Don't optimize MultiplyNoFlags away" to "Don't optimize MultiplyNoFlags away" on Jan 10, 2019
@fiigii
Author

fiigii commented Jan 10, 2019

Added the GTF_GLOB_REF and GTF_ASG flags to solve the issue. Checked the JitDump: there are no redundant IR nodes or codegen.

@CarolEidt @AndyAyersMS @mikedn PTAL
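
A rough sketch of the idea behind this fix: set the flags when the node is constructed, if the intrinsic has store semantics. All names (`SketchIntrinsicNode`, `sketchIsMemoryStore`, `SK_ASG`, `SK_GLOB_REF`) are hypothetical, and the rule that only the three-operand form of MultiplyNoFlags counts as a store is an assumption made for this illustration, not a quote from the change.

```cpp
// Hypothetical flag bits and intrinsic IDs; values are assumptions,
// not the JIT's definitions.
enum SketchFlags : unsigned
{
    SK_ASG      = 1u << 0, // node performs an assignment (memory write)
    SK_GLOB_REF = 1u << 1, // node touches globally visible memory
};

enum class SketchIntrinsic { Add, Store, MultiplyNoFlags };

// Assumption: the three-operand MultiplyNoFlags writes through its pointer
// argument, while a two-operand form only returns a value.
inline bool sketchIsMemoryStore(SketchIntrinsic id, int operandCount)
{
    if (id == SketchIntrinsic::Store)
    {
        return true;
    }
    return id == SketchIntrinsic::MultiplyNoFlags && operandCount == 3;
}

struct SketchIntrinsicNode
{
    SketchIntrinsic id;
    int             operandCount;
    unsigned        flags = 0;

    SketchIntrinsicNode(SketchIntrinsic id_, int operandCount_)
        : id(id_), operandCount(operandCount_)
    {
        if (sketchIsMemoryStore(id, operandCount))
        {
            // Mark the node at construction so later phases see the store.
            flags |= (SK_ASG | SK_GLOB_REF);
        }
    }
};
```

With the flags set up front, a phase that decides whether an unused value can be dropped only needs to inspect the node's flags.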

@CarolEidt

I think that you should also change GenTree::OperRequiresAsgFlag() to return true for these intrinsics. @erozenfeld would know better than I, but I believe that, even though these intrinsics are setting it in the constructor, we may need this to also return true for checking of flags.

@erozenfeld
Member

Yes, GenTree::OperRequiresAsgFlag() should also be changed. It's used both for flags validation and for recomputing the flags.

@fiigii
Author

fiigii commented Jan 11, 2019

@CarolEidt @erozenfeld OperRequiresAsgFlag already has the code to check these intrinsics.

coreclr/src/jit/gentree.cpp

Lines 5063 to 5081 in 459b58a

        bool GenTree::OperRequiresAsgFlag()
        {
            if (OperIs(GT_ASG) || OperIs(GT_XADD, GT_XCHG, GT_LOCKADD, GT_CMPXCHG, GT_MEMORYBARRIER))
            {
                return true;
            }
        #ifdef FEATURE_HW_INTRINSICS
            if (gtOper == GT_HWIntrinsic)
            {
                GenTreeHWIntrinsic* hwIntrinsicNode = this->AsHWIntrinsic();
                if (hwIntrinsicNode->OperIsMemoryStore())
                {
                    // A MemoryStore operation is an assignment
                    return true;
                }
            }
        #endif // FEATURE_HW_INTRINSICS
            return false;
        }
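
To illustrate the two uses mentioned earlier in the thread (flag validation and flag recomputation), here is a small self-contained sketch of how a predicate of this kind could drive both. `CheckNode`, `requiresAsgFlag`, `recomputeFlags`, and `flagsAreValid` are hypothetical names, and the tree shape is an assumption. The real JIT code is more involved.

```cpp
#include <vector>

enum CheckFlags : unsigned { CK_ASG = 1u << 0 };

struct CheckNode
{
    bool                    isStore;  // stand-in for "the predicate returns true"
    unsigned                flags;
    std::vector<CheckNode*> operands;
};

inline bool requiresAsgFlag(const CheckNode& n)
{
    return n.isStore;
}

// Recompute: a node carries CK_ASG if it requires it or any operand carries it.
inline unsigned recomputeFlags(CheckNode& n)
{
    unsigned flags = requiresAsgFlag(n) ? CK_ASG : 0u;
    for (CheckNode* op : n.operands)
    {
        flags |= recomputeFlags(*op) & CK_ASG;
    }
    n.flags = flags;
    return flags;
}

// Validate: a node that requires the flag must have it, and a flag set on an
// operand must also be visible on its parent.
inline bool flagsAreValid(const CheckNode& n)
{
    if (requiresAsgFlag(n) && (n.flags & CK_ASG) == 0)
    {
        return false;
    }
    for (const CheckNode* op : n.operands)
    {
        if ((op->flags & CK_ASG) != 0 && (n.flags & CK_ASG) == 0)
        {
            return false;
        }
        if (!flagsAreValid(*op))
        {
            return false;
        }
    }
    return true;
}
```

If the predicate did not cover the store-like intrinsics, a recomputation pass in this model would clear the flag that the constructor had set, which is why the two need to agree.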

@CarolEidt

> OperRequiresAsgFlag already has the code to check these intrinsics.

Wow, right there as I was looking at it, and I missed it! Thanks.

@CarolEidt

test Ubuntu arm Cross Checked Innerloop Build and Test

@CarolEidt CarolEidt left a comment

LGTM - thanks!

@CarolEidt CarolEidt merged commit b3881b4 into dotnet:master Jan 11, 2019
@fiigii fiigii deleted the fixMulx branch January 12, 2019 01:01
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
Don't optimize MultiplyNoFlags away

Commit migrated from dotnet/coreclr@b3881b4