[Stream] Add layouts to encodings for all stream tensor AffinityOp. #19726
Conversation
Hmm, transfer ops are ones where we can't really have encoding changes, or possibly even encodings at all - we could, at the cost of never hitting the DMA path with them, and that's probably bad. We may have to brainstorm on that -- today I'd say we need to add a trait (SameOperandsAndResultEncodings) to these ops and verify that they can be implemented with memcpy, but I'm not sure even that is enough. Basically, if any transfer op besides clone (which is "entire tensor") has an encoding, we have to generate a dispatch (codegen it, bundle it with the binary, and ship it) for that specific set of operand and result encodings, and those will run on execution units instead of DMA ones (we can't cuMemcpy a slice of an encoded tensor if it has funny padding, and cuMemcpy can't swizzle/change encodings for us). If you were to run any code doing this today with this change, it would be incorrect (unless it's an entire tensor with the same operand and result encoding).
Such conversions to dispatches happen in MaterializeBuiltinsPass - e.g., if you try to fill with an i64 value we have to use a dispatch as GPUs don't support i64 fills. That pass would need to look at any of the ops, decide if they can be implemented with memcpy/memset, and if not emit new executables with the custom load/store with the encodings.
We may have to look at some IR and see if we can handle that better earlier on - after MaterializeBuiltinsPass we have to have everything be either supportable by devices or turned into dispatches, but that's pretty late in the flow and we want to try to guarantee that we rarely (if ever) end up down those paths (we're properly fusing with producers/consumers).
IOW I'm worried this may not be implementable - we need to iterate a bit.
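For concreteness, here's a minimal sketch of what a SameOperandsAndResultEncodings trait verifier could look like (the trait name is only the one suggested above; nothing like this exists in MLIR/IREE today, and stream tensor ops actually keep encodings in type attributes such as result_encoding rather than on SSA value types, so a real implementation would inspect those instead):

```cpp
#include "mlir/IR/BuiltinTypes.h"
#include "mlir/IR/OpDefinition.h"

namespace mlir::OpTrait {

// Hypothetical trait: verifies that every ranked tensor operand and result of
// an op carries the same encoding attribute, so the op could still be lowered
// to a plain memcpy-style transfer.
template <typename ConcreteType>
class SameOperandsAndResultEncodings
    : public TraitBase<ConcreteType, SameOperandsAndResultEncodings> {
public:
  static LogicalResult verifyTrait(Operation *op) {
    Attribute expectedEncoding;
    bool seenTensor = false;
    auto checkType = [&](Type type) -> LogicalResult {
      auto tensorType = dyn_cast<RankedTensorType>(type);
      if (!tensorType)
        return success(); // Ignore non-tensor operands/results.
      if (!seenTensor) {
        expectedEncoding = tensorType.getEncoding();
        seenTensor = true;
        return success();
      }
      if (tensorType.getEncoding() != expectedEncoding)
        return op->emitOpError(
            "requires all tensor operands and results to share one encoding");
      return success();
    };
    for (Type type : op->getOperandTypes())
      if (failed(checkType(type)))
        return failure();
    for (Type type : op->getResultTypes())
      if (failed(checkType(type)))
        return failure();
    return success();
  }
};

} // namespace mlir::OpTrait
```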
return success();
}

/// Updates the update_encoding for `op`. The op have to define a
Suggested change:
- /// Updates the update_encoding for `op`. The op have to define a
+ /// Updates the update_encoding for `op`. The op has to define a
@@ -141,6 +159,52 @@ updateResultEncoding(RewriterBase &rewriter, OpTy op,
return success();
}

/// Updates the target_encoding for `op`. The op have to define a
Suggested change:
- /// Updates the target_encoding for `op`. The op have to define a
+ /// Updates the target_encoding for `op`. The op has to define a
/// tensor type, the method resolves the layouts, strips outdated information,
/// and adds the resolved layouts to the encodings. The updated encodings should
/// have enough information for other lowering transformations.
/// TODO(hanchung): Add support for stream.tensor.load ops and
There's some TBD work to make load/store better - it'll likely require the same behavior as with the current implementation, and it's something we'll want to do earlier on in flow; otherwise we'll need a stream builtin that can adjust to different data types and perform the conversion (for multiple elements), or something like a switch that translates the load/store indices (for a single element) on the host.
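To illustrate the single-element case, a minimal sketch of translating a logical load/store index on the host, assuming a simple row-major layout with per-dimension padding (the struct and helper names are hypothetical, not an IREE API):

```cpp
#include <array>
#include <cstdint>

// Hypothetical padded row-major layout: each logical dimension is padded out
// to paddedDims[i] elements.
struct PaddedLayout2D {
  std::array<int64_t, 2> paddedDims; // physical (padded) extents
};

// Translates a logical 2-D index into the linear element offset that a
// host-side single-element load/store would have to touch in the padded
// buffer.
inline int64_t translateIndex(const PaddedLayout2D &layout, int64_t i,
                              int64_t j) {
  // Row-major over the padded extents, not the logical ones.
  return i * layout.paddedDims[1] + j;
}

// Example: a logical 5x7 tensor padded to 8x8 -> logical element (2, 3) lives
// at offset 2 * 8 + 3 = 19 in the padded buffer.
```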
Is there an issue ID that I can link with? Otherwise, I'll just put my handle in the TODO. :)
Thanks for all the details, very helpful! I honestly don't need to have them supported at this moment. This PR is mainly for completeness, because I'd like to make sure that we can enable the pass by default in the future. I added a hard check on tensor transfer ops (i.e., clone op, slice op, and update op) for now; I don't think we have such a case today. Only the fill op supports the specialization. We can look at the IR together when there is a need. What do you think?
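A minimal sketch of that kind of hard check (the helper name is made up, and inspecting the encoding through the op's type attributes is an assumption about how the pass is written, not the exact PR code):

```cpp
#include "iree/compiler/Dialect/Stream/IR/StreamOps.h"
#include "mlir/IR/BuiltinOps.h"

namespace mlir::iree_compiler {

// Hypothetical guard: fail when a stream tensor transfer op
// (clone/slice/update) still carries a ranked tensor type with an encoding,
// since such transfers cannot be lowered to plain DMA copies yet; only fill
// ops get the encoding specialization.
static LogicalResult verifyNoEncodingOnTransferOps(ModuleOp moduleOp) {
  WalkResult result = moduleOp.walk([](Operation *op) {
    if (!isa<IREE::Stream::TensorCloneOp, IREE::Stream::TensorSliceOp,
             IREE::Stream::TensorUpdateOp>(op)) {
      return WalkResult::advance();
    }
    // Stream tensor ops describe their contents via type attributes
    // (result_encoding and friends); look for any that still carry an
    // encoding.
    for (NamedAttribute attr : op->getAttrs()) {
      auto typeAttr = dyn_cast<TypeAttr>(attr.getValue());
      if (!typeAttr)
        continue;
      auto tensorType = dyn_cast<RankedTensorType>(typeAttr.getValue());
      if (tensorType && tensorType.getEncoding()) {
        op->emitOpError("tensor encodings on transfer ops are not supported yet");
        return WalkResult::interrupt();
      }
    }
    return WalkResult::advance();
  });
  return failure(result.wasInterrupted());
}

} // namespace mlir::iree_compiler
```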
Force-pushed from 91648eb to cade205
The revision adds support for the rest of the AffinityOps that have the TensorPhase trait, i.e., the TensorCloneOp, TensorSliceOp, TensorFillOp, and TensorUpdateOp ops. There are two stream tensor ops that do not implement the AffinityOpInterface, so they are not supported within the revision: the stream.tensor.load op and the stream.tensor.store op. We should be able to track the resource affinity for these two ops, but it requires additional analysis; thus, they are not scoped within the revision. The revision also adds the missing documentation to the `addLayoutsToTensorPhaseOps` method. Signed-off-by: hanhanW <hanhan0912@gmail.com>
Force-pushed from cade205 to 7778585
#map0 = affine_map<(m, n, k) -> (m, k)>
#map1 = affine_map<(m, n, k) -> (k, n)>
#map2 = affine_map<(m, n, k) -> (m, n)>
#executable_target_vmvx_bytecode_fb = #hal.executable.target<"vmvx", "vmvx-bytecode-fb", {encoding = #iree_cpu.vmvx_encoding_layout<>}>
you can use your nice new test attrs here!
very good point!
Signed-off-by: hanhanW <hanhan0912@gmail.com>
…ityOp. (iree-org#19726)" This reverts commit 5f7b471.
The revision adds support for the rest of the AffinityOps that have the TensorPhase trait, i.e., the TensorCloneOp, TensorSliceOp, TensorFillOp, and TensorUpdateOp ops. It is tricky to handle encodings for transfer ops, so only the encoding in the fill op is updated. If other operations have tensor encodings, the pass returns a failure for now.
There are two stream tensor ops that do not implement the AffinityOpInterface, so they are not supported within the revision: the stream.tensor.load op and the stream.tensor.store op. We should be able to track the resource affinity for these two ops, but it requires additional analysis; thus, they are not scoped within the revision.
The revision also adds the missing documentation to the `addLayoutsToTensorPhaseOps` method.