-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Codegen][Metal] Disable cross-function call in Metal codegen #16033
[Codegen][Metal] Disable cross-function call in Metal codegen #16033
Conversation
CC @Lunderberg would you mind taking a look? BTW, is there a way to unit-test it as you enabled a while ago? |
Thanks @MasterJH5574 , can you also document the rationale of that we specialize this part. aka we do not have multiple cross function calls, so we can keep the delcaration once to keep things simple |
Reading the failure reason elaborated here #15835 (comment), I feel the way to reproduce or test the issue is to
|
If possible, I'd like to fix the functionality and not just revert it. #14985 and #14862 have been on the back burner for quite a while, mainly due to breakage they cause in @MasterJH5574 We should be able to test it in CI, since I added compile-only testing of the metal backend in #15756. Functionality that isn't tested in CI is inherently unstable, and so CI testing is preferable in addition to local testing. |
The two steps to add a Metal specific unit test are to (1) decorate the test with |
@Lunderberg I agree that it would be better if the issue can be fixed! For this PR, the overall codegen still bases on the updated codegen flow (we did not revert the previous PR). There is no inter function calls between kernels in Metal, so it is okay to keep the former flow in Metal codegen for now as a swift fix to the codegen issue. And thanks for the pointer on testing. I will add a test soon. |
There aren't currently any inter-function calls in Metal, but it is part of my planned implementation for #14985. Breaking a change apart into the underlying infrastructure changes (e.g. codegen support) and the usage of that infrastructure allows the changes to be tested and integrated separately. |
That said, I have an idea for how to work around the lack of inter-function calls in codegen. Since we can represent inter-function calls in TIR, we could add a |
I like the idea of keeping things simple on specific shader backend like metal, inlining sounds like a good approach |
5b4b175
to
d374a05
Compare
Finished adding a test and included it in “minimal Metal compile-only.” Also added a note in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the special packing used for the scalar buffers, and that there's a possibility of a workaround for the limitation in Metal, I think these sections makes sense to revert.
In the void CodeGenMetal::VisitExpr_(const CallNode* op) override
, can we add an explicit check for the inter-function call? Otherwise, this falls back to the CodegenC
default that support it.
CHECK(!op->op.as<GlobalVarNode>())
<< "CodegenMetal does not support inter-function calls, "
<< "but expression " << GetRef<Expr>(op)
<< " calls PrimFunc " << op->op;
This PR restores the Metal codegen to the one before apache#15835. Due to there will likely be no internal function call in Metal, we think it is safe to do so. Verified that with this PR, the metal codegen and iPhone codegen will not fail and will work properly. The reason of the iPhone codegen failure is because the multiple declarations of a same function will lead to multiple emissions of a same structs, which is not recognizable by the metal compiler.
d374a05
to
b95db83
Compare
Thanks @Lunderberg for the suggestions! Added the check in CallNode visit function. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, and looks good to me! The unit test looks really good; I especially like the explanation of the known portions of the failure mode.
The functionality to express a call from one `PrimFunc` to another was introduced in apache#14889. While this was initially planned to be supported at codegen for all targets (see apache#15835), this resulted in breakage on some backends (see apache#16033). After discussion, the plan was changed to support TIR inlining, which would enable the same high-level functionality in TIR without requiring immediate low-level support across all codegens. This commit implements and tests a new IRModule transform `InlinePrivateFunctions`, which can be used as part of lowering in a follow-up commit. Because this is initially implemented for use quite late in the lowering flow, many constructs are not currently supported. The current implementation has the following restrictions. * `tir::Block` nodes may not occur in the inlined function. Because a subroutine may be called multiple times, inlining of a subroutine that contains `tir::Block` would result in non-unique names. Support of subroutines with `tir::Block` instances will require de-duplication of block names. * The subroutine's callsite must occur within a `tir::Evaluate` block. Because inlining a subroutine inserts the `tir::Stmt` body at the point of use, replacement must occur in a context where a `tir::Stmt` can be returned. Support of subroutines that are called within an expression (e.g. Replacing `func` in `Buf[0] = func(1) + func(2)`) would require hoisting preprocessing done in the subroutine to the parent `tir::Stmt`. * The subroutine may only accept primitive arguments, and must have an empty `buffer_map`. Support of subroutines that are called with `tir::Buffer` or `tir::BufferRegion` arguments would require a way to represent these arguments at the callsite, and substitution of the buffer into the callee. If these unsupported constructs are used, then the inlining does is skipped. This commit includes unit tests for these unsupported constructs, to validate that `InlinePrivateFunctions` produces well-formed output even when they are present.
The functionality to express a call from one `PrimFunc` to another was introduced in apache#14889. While this was initially planned to be supported at codegen for all targets (see apache#15835), this resulted in breakage on some backends (see apache#16033). After discussion, the plan was changed to support TIR inlining, which would enable the same high-level functionality in TIR without requiring immediate low-level support across all codegens. This commit implements and tests a new IRModule transform `InlinePrivateFunctions`, which can be used as part of lowering in a follow-up commit. Because this is initially implemented for use quite late in the lowering flow, many constructs are not currently supported. The current implementation has the following restrictions. * `tir::Block` nodes may not occur in the inlined function. Because a subroutine may be called multiple times, inlining of a subroutine that contains `tir::Block` would result in non-unique names. Support of subroutines with `tir::Block` instances will require de-duplication of block names. * The subroutine's callsite must occur within a `tir::Evaluate` block. Because inlining a subroutine inserts the `tir::Stmt` body at the point of use, replacement must occur in a context where a `tir::Stmt` can be returned. Support of subroutines that are called within an expression (e.g. Replacing `func` in `Buf[0] = func(1) + func(2)`) would require hoisting preprocessing done in the subroutine to the parent `tir::Stmt`. * The subroutine may only accept primitive arguments, and must have an empty `buffer_map`. Support of subroutines that are called with `tir::Buffer` or `tir::BufferRegion` arguments would require a way to represent these arguments at the callsite, and substitution of the buffer into the callee. If these unsupported constructs are used, then the inlining of those functions is skipped. This commit includes unit tests for these unsupported constructs, to validate that `InlinePrivateFunctions` produces well-formed output even when they are present.
Circling back to this, I have PR #16184 submitted, which adds a TIR transform that can inline private function calls. That way, the user-facing functionality of TIR subroutines can be used, without requiring every codegen to natively support subroutines. |
The functionality to express a call from one `PrimFunc` to another was introduced in apache#14889. While this was initially planned to be supported at codegen for all targets (see apache#15835), this resulted in breakage on some backends (see apache#16033). After discussion, the plan was changed to support TIR inlining, which would enable the same high-level functionality in TIR without requiring immediate low-level support across all codegens. This commit implements and tests a new IRModule transform `InlinePrivateFunctions`, which can be used as part of lowering in a follow-up commit. Because this is initially implemented for use quite late in the lowering flow, many constructs are not currently supported. The current implementation has the following restrictions. * `tir::Block` nodes may not occur in the inlined function. Because a subroutine may be called multiple times, inlining of a subroutine that contains `tir::Block` would result in non-unique names. Support of subroutines with `tir::Block` instances will require de-duplication of block names. * The subroutine's callsite must occur within a `tir::Evaluate` block. Because inlining a subroutine inserts the `tir::Stmt` body at the point of use, replacement must occur in a context where a `tir::Stmt` can be returned. Support of subroutines that are called within an expression (e.g. Replacing `func` in `Buf[0] = func(1) + func(2)`) would require hoisting preprocessing done in the subroutine to the parent `tir::Stmt`. * The subroutine may only accept primitive arguments, and must have an empty `buffer_map`. Support of subroutines that are called with `tir::Buffer` or `tir::BufferRegion` arguments would require a way to represent these arguments at the callsite, and substitution of the buffer into the callee. If these unsupported constructs are used, then the inlining of those functions is skipped. This commit includes unit tests for these unsupported constructs, to validate that `InlinePrivateFunctions` produces well-formed output even when they are present.
* [TIR] Update DeclBuffer nodes when specializing PrimFunc Prior to this commit, a buffer whose parameters (e.g. shape/stride) contained a specialized parameter would not be updated when appearing in a `DeclBuffer` node. This commit updates the `Specialize` function to update buffers that occur in `DeclBuffer` nodes. * [TIR] Handle specialization that remaps a buffer var * [TIR] Handle specialization of buffer variable to PrimExpr * [TIR][Transform] Implement InlinePrivateFunctions The functionality to express a call from one `PrimFunc` to another was introduced in #14889. While this was initially planned to be supported at codegen for all targets (see #15835), this resulted in breakage on some backends (see #16033). After discussion, the plan was changed to support TIR inlining, which would enable the same high-level functionality in TIR without requiring immediate low-level support across all codegens. This commit implements and tests a new IRModule transform `InlinePrivateFunctions`, which can be used as part of lowering in a follow-up commit. Because this is initially implemented for use quite late in the lowering flow, many constructs are not currently supported. The current implementation has the following restrictions. * `tir::Block` nodes may not occur in the inlined function. Because a subroutine may be called multiple times, inlining of a subroutine that contains `tir::Block` would result in non-unique names. Support of subroutines with `tir::Block` instances will require de-duplication of block names. * The subroutine's callsite must occur within a `tir::Evaluate` block. Because inlining a subroutine inserts the `tir::Stmt` body at the point of use, replacement must occur in a context where a `tir::Stmt` can be returned. Support of subroutines that are called within an expression (e.g. Replacing `func` in `Buf[0] = func(1) + func(2)`) would require hoisting preprocessing done in the subroutine to the parent `tir::Stmt`. * The subroutine may only accept primitive arguments, and must have an empty `buffer_map`. Support of subroutines that are called with `tir::Buffer` or `tir::BufferRegion` arguments would require a way to represent these arguments at the callsite, and substitution of the buffer into the callee. If these unsupported constructs are used, then the inlining of those functions is skipped. This commit includes unit tests for these unsupported constructs, to validate that `InlinePrivateFunctions` produces well-formed output even when they are present. * Updates based on review comments * ci bump * CI bump
This PR restores the Metal codegen to the one before #15835. Due to there will likely be no internal function call in Metal, we think it is safe to do so.
Verified that with this PR, the metal codegen and iPhone codegen will not fail and will work properly.
The reason of the iPhone codegen failure is because the multiple declarations of a same function will lead to multiple emissions of a same structs, which is not recognizable by the metal compiler.