Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding EVEX encoding logic for RR/AM pathways #77419

Merged
merged 4 commits into from
Nov 5, 2022

Conversation

DeepakRajendrakumaran
Copy link
Contributor

@DeepakRajendrakumaran DeepakRajendrakumaran commented Oct 24, 2022

This PR enables EVEX encoding for the following paths

  1. emitOutputRR()
  2. paths using emitOutputAM()

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Oct 24, 2022
@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Oct 24, 2022
@ghost
Copy link

ghost commented Oct 24, 2022

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

This is a draft PR created for verifying some configs.

Author: DeepakRajendrakumaran
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@DeepakRajendrakumaran DeepakRajendrakumaran force-pushed the avx512-RR branch 7 times, most recently from bf2025a to 7468f73 Compare October 28, 2022 21:36
@DeepakRajendrakumaran DeepakRajendrakumaran marked this pull request as ready for review October 28, 2022 21:42
@DeepakRajendrakumaran
Copy link
Contributor Author

@tannergooding @BruceForstall @anthonycanino

These are the next set of changes for enabling EVEX encoding paths.

Copy link
Member

@BruceForstall BruceForstall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some minor questions and suggestions

@@ -29,117 +29,117 @@
#endif
/*****************************************************************************/
#ifndef INST0
#define INST0(id, nm, um, mr, flags)
#define INST0(id, nm, um, mr, tt, flags)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment for tt in the comment section at the top of the file

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -330,6 +339,31 @@ enum insBarrier : unsigned
};
#endif

#if defined(TARGET_XARCH)
// Represents tupletype attribute of instruction.
// This is used in determining factor N while calculated compressed displacement in EVEX encoding
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// This is used in determining factor N while calculated compressed displacement in EVEX encoding
// This is used in determining factor N while calculating compressed displacement in EVEX encoding

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

// Represents tupletype attribute of instruction.
// This is used in determining factor N while calculated compressed displacement in EVEX encoding
// Reference: Section 2.6.5 in Intel 64 and ia-32 architectures software developer's manual volume 2.
enum insTupleType : uint32_t
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why the 0xF00 bits are not used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops. Yeah that's a mistake. Fixed

@@ -1789,7 +1789,7 @@ instruction CodeGen::ins_MathOp(genTreeOps oper, var_types type)
}

// Conversions to or from floating point values
instruction CodeGen::ins_FloatConv(var_types to, var_types from)
instruction CodeGen::ins_FloatConv(var_types to, var_types from, emitAttr attr)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you document attr? (Best would be to add a new-style function header comment that documents all the arguments and the function behavior)

Is this newly required for AVX-512? Does it change any existing behavior?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. It's required now because we divided INS_cvtsi2ss to INS_cvtsi2ss32 and INS_cvtsi2ss64. Similar for INS_cvtsi2sd. This is required because these instruction have some specific characteristic depending on if input is m32 or m64(W bit set or not)

@@ -1867,7 +1887,7 @@ instruction CodeGen::ins_MathOp(genTreeOps oper, var_types type)
}
}

instruction CodeGen::ins_FloatConv(var_types to, var_types from)
instruction CodeGen::ins_FloatConv(var_types to, var_types from, emitAttr attr)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be best to ifdef this function declaration so you don't touch the ARM implementation which doesn't use this argument and for which it might be confusing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -7015,7 +7015,7 @@ void CodeGen::genIntToFloatCast(GenTree* treeNode)

// Note that here we need to specify srcType that will determine
// the size of source reg/mem operand and rex.w prefix.
instruction ins = ins_FloatConv(dstType, TYP_INT);
instruction ins = ins_FloatConv(dstType, TYP_INT, emitTypeSize(srcType));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other cases use emitTypeSize(dstType) but this uses srcType?

Copy link
Contributor Author

@DeepakRajendrakumaran DeepakRajendrakumaran Nov 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah.. should be srcType . We are trying to differentiate between VCVTSI2SS xmm1, xmm2, m32 and VCVTSI2SS xmm1, xmm2, m64

inline bool hasTupleTypeInfo(instruction ins)
{
assert((unsigned)ins < ArrLen(insTupleTypeInfos));
return ((insTupleTypeInfos[ins] != INS_TT_NONE));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit

Suggested change
return ((insTupleTypeInfos[ins] != INS_TT_NONE));
return (insTupleTypeInfos[ins] != INS_TT_NONE);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

inline insTupleType insTupleTypeInfo(instruction ins)
{
assert((unsigned)ins < ArrLen(insTupleTypeInfos));
assert((insTupleTypeInfos[ins] != INS_TT_NONE));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit

Suggested change
assert((insTupleTypeInfos[ins] != INS_TT_NONE));
assert(insTupleTypeInfos[ins] != INS_TT_NONE);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 11486 to 11487
// Explore moving IsWEvexOpcodeExtension() logic inside TakesRexWPrefix(). Not doind so currently
// since we cannot differentiate EVEX vs VEX without 'code' untill all paths have EVEX support.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nits

Suggested change
// Explore moving IsWEvexOpcodeExtension() logic inside TakesRexWPrefix(). Not doind so currently
// since we cannot differentiate EVEX vs VEX without 'code' untill all paths have EVEX support.
// Explore moving IsWEvexOpcodeExtension() logic inside TakesRexWPrefix(). Not doing so currently
// since we cannot differentiate EVEX vs VEX without 'code' until all paths have EVEX support.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

// id - Instruction descriptor.
//
// Returns:
// `true` if the instruction does embeddded broadcast.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit

Suggested change
// `true` if the instruction does embeddded broadcast.
// `true` if the instruction does embedded broadcast.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@ghost ghost added needs-author-action An issue or pull request that requires more info or actions from the author. and removed needs-author-action An issue or pull request that requires more info or actions from the author. labels Oct 29, 2022
@BruceForstall
Copy link
Member

BruceForstall commented Nov 5, 2022

No diffs, as expected. Throughput is slightly improved on x64 (about 0.09%), slightly regressed on x86 (minimal on full opts, 0.18% on MinOpts)

Comment on lines +348 to +363
INS_TT_NONE = 0x00000,
INS_TT_FULL = 0x00001,
INS_TT_HALF = 0x00002,
INS_TT_IS_BROADCAST = 0x00003,
INS_TT_FULL_MEM = 0x00010,
INS_TT_TUPLE1_SCALAR = 0x00020,
INS_TT_TUPLE1_FIXED = 0x00040,
INS_TT_TUPLE2 = 0x00080,
INS_TT_TUPLE4 = 0x00100,
INS_TT_TUPLE8 = 0x00200,
INS_TT_HALF_MEM = 0x00400,
INS_TT_QUARTER_MEM = 0x00800,
INS_TT_EIGHTH_MEM = 0x01000,
INS_TT_MEM128 = 0x02000,
INS_TT_MOVDDUP = 0x04000,
INS_TT_IS_NON_BROADCAST = 0x7FFFC
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few minor nits to consider for later:

  • The high zero isn't needed, especially since there aren't any cases which set it. Remove it?
  • Thus, the "7" in INS_TT_IS_NON_BROADCAST is not needed
  • I was confused at first by INS_TT_IS_BROADCAST being 3 and not 4. Maybe rename INS_TT_IS_BROADCAST to INS_TT_BROADCAST_MASK and INS_TT_IS_NON_BROADCAST to INS_TT_NON_BROADCAST_MASK to make it clear they are bit masks? (I notice they currently aren't used)

@BruceForstall BruceForstall merged commit 15f015f into dotnet:main Nov 5, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Dec 5, 2022
@BruceForstall BruceForstall added the avx512 Related to the AVX-512 architecture label Mar 27, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI avx512 Related to the AVX-512 architecture community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants