Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revise overload resolution for splats/truncations #7114

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

pow2clk
Copy link
Member

@pow2clk pow2clk commented Feb 3, 2025

Allow truncations when matching arguments for intrinsic overloads. This eliminates the need for explicit scalar extractions from vectors for arguments that are scalar by nature. This encompasses any vectors passed for scalars, allowing the truncation, but emitting a warning the same as is done for other assignments of vectors to scalars.

This maintains splats as the preferred transformations and promotes perfect matches to be preferred over that. This has the effect of removing the need to carefully order intrinsics to ensure that the right variant gets matched first before another one incorrectly takes its place with a faulty cast.

Allowing truncations causes a problems with a small subset of intrinsics that have explicit overloads for various matrix,vector, scalar combinations. Namely the mul overloads. These could be simplified to accept a new range of template types except the dimensions need to be matched in unconventional ways.

For these, the notion of uncastable or "ONLY" variants of the template/layout types are introduced. These are indicated with a trailing "!" after the parameter typename in gen_intrin_main, which directs them to an array that contains a NOCAST enum that, when encountered, will skip the attempts to splat or truncate.

Fixes #7079

@pow2clk pow2clk requested a review from a team as a code owner February 3, 2025 18:50
@pow2clk
Copy link
Member Author

pow2clk commented Feb 3, 2025

Alternate, but inferior approach can be viewed here #7115. I include it here because it might seem a simpler approach initially, but proved to be functional, but brittle.

Copy link
Contributor

github-actions bot commented Feb 3, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

Allow truncations when matching arguments for intrinsic overloads. This eliminates the need for explicit scalar extractions from vectors for arguments that are scalar by nature. This encompasses any vectors passed for scalars, allowing the truncation, but emitting a warning the same as is done for other assignments of vectors to scalars.

This maintains splats as the preferred transformations and promotes perfect matches to be preferred over that. This has the effect of removing the need to carefully order intrinsics to ensure that the right variant gets matched first before another one incorrectly takes its place with a faulty cast.

Allowing truncations causes a problems with a small subset of intrinsics that have explicit overloads for various matrix,vector, scalar combinations. Namely the mul overloads. These could be simplified to accept a new range of template types except the dimensions need to be matched in unconventional ways.

For these, the notion of uncastable or "ONLY" variants of the template/layout types are introduced. These are indicated with a trailing "!" after the parameter typename in gen_intrin_main, which directs them to an array that contains a NOCAST enum that, when encountered, will skip the attempts to splat or truncate.

Fixes microsoft#7079
Copy link
Member Author

@pow2clk pow2clk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review guidance comments.

LITEMPLATE_ARRAY = 6, // Scalar array.
LITEMPLATE_SCALAR_ONLY = 7, // Uncastable scalar types.
LITEMPLATE_VECTOR_ONLY = 8, // Uncastable vector types (eg. float3).
LITEMPLATE_MATRIX_ONLY = 9, // Uncastable matrix types (eg. float3x3).
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Castability here refers to the ability to truncate or splat something to change it's "shape" (or template or layout, we use a few terms) and not the castability of the contents.

}
}
// offset 0
if (opcode == OP::OpCode::TextureLoad) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was some simplification that came naturally with the removal of the extract operator here. The opcode is determined completely by the resource kind so a test for one is the same as a test for the other.

} else {
storeArgs.emplace_back(offset); // offset
}
storeArgs.emplace_back(offset); // offset
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of several explicit "casts" in the form of a vector to scalar truncation that are now done in the default code paths instead of relying on special cases.

g_NullTT, g_ScalarTT, g_VectorTT, g_MatrixTT,
g_AnyTT, g_ObjectTT, g_ArrayTT,
g_NullTT, g_ScalarTT, g_VectorTT, g_MatrixTT, g_AnyTT,
g_ObjectTT, g_ArrayTT, g_ScalarOnlyTT, g_VectorOnlyTT, g_MatrixOnlyTT,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automatically generated code for intrinsics will assign indexes into this array that will use the Only variants that have a NOCAST entry which will stop iterating through possibilities in a way that makes it clear that it failed due to requiring a cast where none is allowed.

@@ -6113,7 +6123,7 @@ bool HLSLExternalSource::MatchArguments(
ArBasicKind
ComponentType[MaxIntrinsicArgs]; // Component type for each argument,
// AR_BASIC_UNKNOWN if unspecified.
UINT uSpecialSize[IA_SPECIAL_SLOTS]; // row/col matching types, UNUSED_INDEX32
UINT uSpecialSize[IA_SPECIAL_SLOTS]; // row/col matching types, UnusedSize
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incidental, just an incorrect and misleading comment.

r += status;
uav1.Load(a, status); // expected-error {{no matching member function for call to 'Load'}} expected-note {{requires single argument 'byteOffset', but 2 arguments were provided}} fxc-error {{X3013: RWByteAddressBuffer<uint>.Load(uint)}} fxc-error {{X3013: RWByteAddressBuffer<uint>.Load(uint, out uint status)}} fxc-error {{X3013: 'Load': no matching 2 parameter intrinsic method}} fxc-error {{X3013: Possible intrinsic methods are:}}
uav1.Load(a, status); // expected-warning {{implicit truncation of vector type}} fxc-error {{X3013: RWByteAddressBuffer<uint>.Load(uint)}} fxc-error {{X3013: RWByteAddressBuffer<uint>.Load(uint, out uint status)}} fxc-error {{X3013: 'Load': no matching 2 parameter intrinsic method}} fxc-error {{X3013: Possible intrinsic methods are:}}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, these are no longer failures and they give the same truncation warnings you'd see elsewhere when assingning a vector to a scalar.

numeric<c2> [[rn,unsigned_op=umul]] mul(in $match<1, 0> numeric<c>! a, in col_major $match<2, 0> numeric<c, c2>! b) : mul_vm;
numeric<r, c> [[rn,unsigned_op=umul]] mul(in $match<1, 0> numeric<r, c>! a, in $match<2, 0> numeric! b) : mul_ms;
numeric<r> [[rn,unsigned_op=umul]] mul(in row_major $match<1, 0> numeric<r, c>! a, in $match<2, 0> numeric<c>! b) : mul_mv;
numeric<r, c2> [[rn,unsigned_op=umul]] mul(in row_major $match<1, 0> numeric<r, c>! a, in col_major $match<2, 0> numeric<c, c2>! b) : mul_mm;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are the cases where uncastable "only" types were required since they previously relied on the order of the intrinsics to get correct behavior and allowing truncation in addition to splatting made no order possible that would preserve keeping the correct calls for the correct inputs. I chose ! to represent uncastability, but a number of options are possible.

$classT [[ro]] Load(in int<1> x) : buffer_load;
$classT [[]] Load(in int<1> x, out uint_only status) : buffer_load_s;
$classT [[ro]] Load(in int x) : buffer_load;
$classT [[]] Load(in int x, out uint_only status) : buffer_load_s;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sole case where the load index was represented as a vec1, which contradicts our docs and is incosistent with its RW counterpart.

type_any_re = re.compile(r"(\S+)<>$")
type_array_re = re.compile(r"(\S+)\[\]$")
type_object_re = re.compile(
r"""(
sampler\w* | string |
sampler\w* | any_sampler\w* | string |
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really related. The only change required to leave a large conditional block below as completely dead. The nice thing about changes here is that comparing the output source can verify that the results are identical as I did here.

):
template_list = "LITEMPLATE_OBJECT"
else:
template_list = "LITEMPLATE_SCALAR"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the small change above to add any_sampler to the initial regex, all this explicit matching is redundant.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: New
Development

Successfully merging this pull request may close these issues.

Load/Store intrinsics for Raw and Typed buffers are inconsistent in allowing vec1 indexing
1 participant