[Stream] Add layouts to encodings for all stream tensor AffinityOp. #19726
Conversation
Hmm, transfer ops are ones where we can't really have encoding changes, or possibly even encodings at all - we could, at the cost of never hitting the DMA path with them, and that's probably bad. We may have to brainstorm on that -- today I'd say we need to add a trait (SameOperandsAndResultEncodings) to these ops and verify that they can be implemented with memcpy, but I'm not sure even that is enough. Basically, if any transfer op besides clone (which is "entire tensor") has an encoding, we have to generate a dispatch (codegen it, bundle it with the binary, and ship it) for that specific set of operand and result encodings, and those will run on execution units instead of DMA ones (we can't cuMemcpy a slice of an encoded tensor if it has funny padding, and cuMemcpy can't swizzle/change encodings for us). If you were to run any code doing this today with this change, it would be incorrect (unless it's an entire tensor with the same operand and result encoding).
Such conversions to dispatches happen in MaterializeBuiltinsPass - e.g., if you try to fill with an i64 value we have to use a dispatch as GPUs don't support i64 fills. That pass would need to look at any of the ops, decide if they can be implemented with memcpy/memset, and if not emit new executables with the custom load/store with the encodings.
We may have to look at some IR and see if we can handle that better earlier on - after MaterializeBuiltinsPass we have to have everything be either supportable by devices or turned into dispatches, but that's pretty late in the flow and we want to try to guarantee that we rarely (if ever) end up down those paths (we're properly fusing with producers/consumers).
IOW I'm worried this may not be implementable - we need to iterate a bit.
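For concreteness, here's a minimal sketch of what a SameOperandsAndResultEncodings trait verifier could look like (the trait name is only the one suggested above; nothing like this exists in MLIR/IREE today, and stream tensor ops actually keep encodings in type attributes such as result_encoding rather than on SSA value types, so a real implementation would inspect those instead):

```cpp
#include "mlir/IR/BuiltinTypes.h"
#include "mlir/IR/OpDefinition.h"

namespace mlir::OpTrait {

// Hypothetical trait: verifies that every ranked tensor operand and result of
// an op carries the same encoding attribute, so the op could still be lowered
// to a plain memcpy-style transfer.
template <typename ConcreteType>
class SameOperandsAndResultEncodings
    : public TraitBase<ConcreteType, SameOperandsAndResultEncodings> {
public:
  static LogicalResult verifyTrait(Operation *op) {
    Attribute expectedEncoding;
    bool seenTensor = false;
    auto checkType = [&](Type type) -> LogicalResult {
      auto tensorType = dyn_cast<RankedTensorType>(type);
      if (!tensorType)
        return success(); // Ignore non-tensor operands/results.
      if (!seenTensor) {
        expectedEncoding = tensorType.getEncoding();
        seenTensor = true;
        return success();
      }
      if (tensorType.getEncoding() != expectedEncoding)
        return op->emitOpError(
            "requires all tensor operands and results to share one encoding");
      return success();
    };
    for (Type type : op->getOperandTypes())
      if (failed(checkType(type)))
        return failure();
    for (Type type : op->getResultTypes())
      if (failed(checkType(type)))
        return failure();
    return success();
  }
};

} // namespace mlir::OpTrait
```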
return success();
}

/// Updates the update_encoding for `op`. The op have to define a
Suggested change:
- /// Updates the update_encoding for `op`. The op have to define a
+ /// Updates the update_encoding for `op`. The op has to define a
@@ -141,6 +159,52 @@ updateResultEncoding(RewriterBase &rewriter, OpTy op,
return success();
}

/// Updates the target_encoding for `op`. The op have to define a
Suggested change:
- /// Updates the target_encoding for `op`. The op have to define a
+ /// Updates the target_encoding for `op`. The op has to define a
/// tensor type, the method resolves the layouts, strips outdated information,
/// and adds the resolved layouts to the encodings. The updated encodings should
/// have enough information for other lowering transformations.
/// TODO(hanchung): Add support for stream.tensor.load ops and
There's some TBD work to make load/store better - it'll likely require the same behavior as with the current implementation, and it's something we'll want to do earlier on in flow; otherwise we'll need a stream builtin that can adjust to different data types and perform the conversion (for multiple elements), or something like a switch that translates the load/store indices (for a single element) on the host.
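To illustrate the single-element case, a minimal sketch of translating a logical load/store index on the host, assuming a simple row-major layout with per-dimension padding (the struct and helper names are hypothetical, not an IREE API):

```cpp
#include <array>
#include <cstdint>

// Hypothetical padded row-major layout: each logical dimension is padded out
// to paddedDims[i] elements.
struct PaddedLayout2D {
  std::array<int64_t, 2> paddedDims; // physical (padded) extents
};

// Translates a logical 2-D index into the linear element offset that a
// host-side single-element load/store would have to touch in the padded
// buffer.
inline int64_t translateIndex(const PaddedLayout2D &layout, int64_t i,
                              int64_t j) {
  // Row-major over the padded extents, not the logical ones.
  return i * layout.paddedDims[1] + j;
}

// Example: a logical 5x7 tensor padded to 8x8 -> logical element (2, 3) lives
// at offset 2 * 8 + 3 = 19 in the padded buffer.
```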
Is there an issue ID that I can link with? Otherwise, I'll just put my handle in the TODO. :)
Thanks for all the details, very helpful! I honestly don't need to have them supported at this moment. This PR is mainly for completeness, because I'd like to make sure that we can enable the pass by default in the future. I added a hard check on tensor transfer ops (i.e., clone op, slice op, and update op) for now; I don't think we have such a case today. Only the fill op supports the specialization. We can look at the IR together when there is a need. What do you think?
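A minimal sketch of that kind of hard check (the helper name is made up, and inspecting the encoding through the op's type attributes is an assumption about how the pass is written, not the exact PR code):

```cpp
#include "iree/compiler/Dialect/Stream/IR/StreamOps.h"
#include "mlir/IR/BuiltinOps.h"

namespace mlir::iree_compiler {

// Hypothetical guard: fail when a stream tensor transfer op
// (clone/slice/update) still carries a ranked tensor type with an encoding,
// since such transfers cannot be lowered to plain DMA copies yet; only fill
// ops get the encoding specialization.
static LogicalResult verifyNoEncodingOnTransferOps(ModuleOp moduleOp) {
  WalkResult result = moduleOp.walk([](Operation *op) {
    if (!isa<IREE::Stream::TensorCloneOp, IREE::Stream::TensorSliceOp,
             IREE::Stream::TensorUpdateOp>(op)) {
      return WalkResult::advance();
    }
    // Stream tensor ops describe their contents via type attributes
    // (result_encoding and friends); look for any that still carry an
    // encoding.
    for (NamedAttribute attr : op->getAttrs()) {
      auto typeAttr = dyn_cast<TypeAttr>(attr.getValue());
      if (!typeAttr)
        continue;
      auto tensorType = dyn_cast<RankedTensorType>(typeAttr.getValue());
      if (tensorType && tensorType.getEncoding()) {
        op->emitOpError("tensor encodings on transfer ops are not supported yet");
        return WalkResult::interrupt();
      }
    }
    return WalkResult::advance();
  });
  return failure(result.wasInterrupted());
}

} // namespace mlir::iree_compiler
```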
Force-pushed from 91648eb to cade205
The revision adds support for the rest of the AffinityOps that have the TensorPhase trait, i.e., the TensorCloneOp, TensorSliceOp, TensorFillOp, and TensorUpdateOp ops. There are two stream tensor ops that do not implement the AffinityOpInterface, so they are not supported within the revision: the stream.tensor.load op and the stream.tensor.store op. We should be able to track the resource affinity for these two ops, but it requires additional analysis; thus, they are not scoped within the revision. The revision also adds the missing documentation to the `addLayoutsToTensorPhaseOps` method. Signed-off-by: hanhanW <hanhan0912@gmail.com>
Force-pushed from cade205 to 7778585
#map0 = affine_map<(m, n, k) -> (m, k)>
#map1 = affine_map<(m, n, k) -> (k, n)>
#map2 = affine_map<(m, n, k) -> (m, n)>
#executable_target_vmvx_bytecode_fb = #hal.executable.target<"vmvx", "vmvx-bytecode-fb", {encoding = #iree_cpu.vmvx_encoding_layout<>}>
you can use your nice new test attrs here!
very good point!
Signed-off-by: hanhanW <hanhan0912@gmail.com>
…ityOp. (iree-org#19726)" This reverts commit 5f7b471.
The revision adds support for the rest of the AffinityOps that have the TensorPhase trait, i.e., the TensorCloneOp, TensorSliceOp, TensorFillOp, and TensorUpdateOp ops. It is tricky to handle encodings for transfer ops, so only the encoding in the fill op is updated. If other operations have tensor encodings, the pass returns a failure for now.
There are two stream tensor ops that do not implement the AffinityOpInterface, so they are not supported within the revision: the stream.tensor.load op and the stream.tensor.store op. We should be able to track the resource affinity for these two ops, but it requires additional analysis; thus, they are not scoped within the revision.
The revision also adds the missing documentation to the `addLayoutsToTensorPhaseOps` method.