-
Notifications
You must be signed in to change notification settings - Fork 81
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RFC] Introducing DeclBuffer #70
Conversation
Co-authored-by: Eric Lunderberg <Lunderberg@users.noreply.github.com>
Thanks a lot! I think then we can handle buffer related issues in customized passes with more explicit and robust way. I have one question on tir script, for certain algorithms in DL workloads, users may want to write non-stir formed script like x = T.allocate((), "int32", "")
x[()] = 0
while x[()] < 128:
x[()] = x[()] + 1
# ... Could the parser support still write things like that (though underlying IR structure changed) instead of x_data = T.allocate((), "int32", "")
x = T.decl_buffer(data=x_data,)
x[()] = 0
# ... |
@wrongtest Thanks for bringing up this. There are a few options for the behavior in TVM script, I'm open to discussion.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @vinx13 , couple questions on this one as well
# Unresolved questions | ||
[unresolved-questions]: #unresolved-questions | ||
|
||
Should low-level code generators handle buffer aliases? One option would be to remove them in a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this basically asking whether a codegen should have to track the data variables, or whether we should introduce an explicit TIR node that more readily translates to an index-into-opaque-pointer with type info?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is asking whether buffer aliasing should be left to codegen (codegen should track the data variable), or buffer aliasing should be unified in an earlier pass.
rfcs/0070-introducing-decl-buffer.md
Outdated
However, `T.buffer_decl` doesn’t translate to a node in AST. The AST will be | ||
``` | ||
PrimFunc { | ||
buffer_map: {A: Buffer[(16,), "float"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the A.data member should theoretically appear here, whether or not in repr, right? i think the central challenge here is determining that the data
for a buffer is A
's data
member. is that right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. it should be A_data: Buffer(data=A_data, ...)
rfcs/0070-introducing-decl-buffer.md
Outdated
data: A_data(Var(name=...)) | ||
extent: ... | ||
body: DeclBuffer { | ||
buffer: A(data=A_data, dtype=..., shape=...), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's always required to supply data
in decl_buffer
world, right? so this changes the process of identifying aliases by making it explicit that all backing buffers would come from tir.allocate nodes or from function arguments, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Supplying data
in decl_buffer
is preferred. I mentioned an alternative in line 88, e.g
DeclBuffer {
buffer: A(data=A_data(Var(...)), dtype=..., shape=...),
body: Allocate{
data=A_data,
body: ...
}
}
The question here is whether this should be allowed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think it would be great to just be prescriptive here so that people don't try to write 10 different forms of buffer declaration. we can always expand the set of supported use cases if people can supply rationale, but reducing them after people check in a bunch of code without design docs is pretty tricky. to that end, could we clarify where we expect to get a data
value from?
I also am not quite following this notation: A_data(Var(...))
. Is A_data
somehow a function here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The question here is whether this should be allowed
I think I'd lean toward removing the alternative form and requiring the definition of the data
variable to be defined prior to use in a DeclBuffer
node in the PrimFunc's TIR. If a user gets an error message that states A is undefined
, they may point at the DeclBuffer
node and say that of course it is defined, conflating the definition of the buffer and the definition of the buffer's backing allocation. The confusion is especially likely, because the default behavior of tvm.tir.decl_buffer
is to name the BufferNode
and the VarNode
with the same name. By making it impossible to have a buffer without a defined data
variable, we avoid potential confusion about why a buffer is partially undefined.
that all backing buffers would come from tir.allocate nodes or from function arguments
They could also come from return values of functions. After MakePackedAPI
, the backing buffers are the return value of @tir.tvm_struct_get
. It could also be an entirely separate function call, such as data: T.Ptr[T.int32] = T.call_extern("device_specific_malloc", 1024, dtype="handle")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is A_data somehow a function here?
@areusch It should be A_data{Var(data = ..., )}
. A_data
is a variable.
I think I'd lean toward removing the alternative form and requiring the definition of the data variable to be defined prior to use in a DeclBuffer node in the PrimFunc's TIR.
@Lunderberg That's also what I thought. Since we all agree on this, I'll update the RFC to be prescriptive here.
They could also come from return values of functions.
Thanks for pointing out. I'll update it accordingly.
store the physical shapes of the buffers. The change of the information stored in `buffer_map` can | ||
be confusing. These two maps can be unified into a single `buffer_map` that defines the logical | ||
shapes of the input buffers. The buffer access in physical shape, which is an internal behavior of | ||
`PrimFunc` after flattening, can be achieved by using `DeclBuffer` to create buffer aliases in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will there be an easy way to do this implemented as part of this RFC? @Lunderberg mentioned something about a utility function to at least compute the flattened buffer_map.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Lunderberg has a prototype for this apache/tvm#10940, it is based on implicit buffer aliasing. This paragraph is suggesting moving to explicit buffer aliasing using DeclBuffer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's correct, the buffer flattening would be done through buffer aliasing. After apache/tvm#10940, the use of N-d/flattened buffers would be as below, depending on where you are in the lowering flow.
- Before
BufferFlatten
/FlattenStorage
: Buffers are declared in thebuffer_map
, and are not flattened. Buffer access is done using N-d unflattened indices. - After
BufferFlatten
/FlattenStorage
, but beforeMakePackedAPI
: Buffers are declared in thebuffer_map
, and are not flattened. Buffer access is done through a buffer alias, where the alias shares the same data pointer, but has a flattened shape and is accessed with flattened indices. - After
MakePackedAPI
: Thebuffer_map
is empty. Declarations of flattened buffers are done using the handles extracted usingtvm_struct_get
. These flattened buffers are accessed with flattened indices.
As I understand it, this RFC would impact the TIR during parts 2 and 3. Step 2 would have an explicit DeclBuffer
to mark the aliasing, rather than an implicit aliasing based on re-used Var Buffer::data
. Step 3 would have an explicit DeclBuffer
to mark the use of the T.handle
in a buffer, rather than being implicit in the first use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for elaborating that! could we add these to the RFC?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @vinx13. That's a super cool RFC, enabling all buffers to be defined before use. It looks good to me. Looking forward to following PRs.
@wrongtest I've thought about the option A3 vs A4. From the parsing / translation from TVM script to TIR, it is acceptable to have |
A gentle ping. Would be great to have a status checkin and let us move to make this happen |
Seems we all agree that introducing
When printing them from TIR to TVMScript, we still print in unsugared form in B1. Note that in B2, it is not feasible to make
would love to know what you think @wrongtest @Hzfengsy |
We always need the original syntax to enable bidirectional property. It does not hurt though to enable sugars that allows us to have a combination on top. How about we start with the original syntax and then discuss sugar as a separate topic. On the sugar part, perhaps we can reuse the alloc-buffer keyword |
reuse T.alloc_buffer seems good,as long as there is no ambiguity for parser impl :) |
@@ -136,11 +138,26 @@ def elemwise(A: T.Buffer[(16, 16), "float32"], C: T.Buffer[(16, 16), "float32"]) | |||
C_flattened[i * 16 + j] = A[i * 16 + j] | |||
``` | |||
|
|||
Specifically, the updated flow of buffer flattening using `DeclBuffer` will be: | |||
1. Before `FlattenBuffer/StorageFlatten`: Buffers are declared in the `buffer_map`, and are not flattened. Buffer access is done using N-d unflattened indices. | |||
2. After `FlattenBuffer/StorageFlatten`, but before `MakePackedAPI`: Buffers are declared in the `buffer_map`, and are not flattened. Buffer access is done through a buffer alias explicitly created via `DeclBuffer`, where the alias shares the same data pointer, but has a flattened shape and is accessed with flattened indices. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for some reason i thought our previous discussion was that Buffer would always contain unflattened shape and there would be a separate function that computes the flattened shape on the fly. did that change? if so why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not changed. The buffer in buffer_map
is always unflattened. After FlattenBuffer/StorageFlatten
, we will create a flattened view of the buffer using buffer alias, and in IR all accesses to the buffer will be done via the flattened view. This is because from the perspective of the calling convention of PrimFunc
, we always expect the input to have unflattened shape (e.g. the shape of DLTensor
). But internally in the IR, we need to flatten to physical shape and indices which is how it is done in codegen and runtime
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh sorry, i see now. the text is very clear i just misread it :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for bearing with me @vinx13 !
`PrimFuncNode::preflattened_buffer_map` was introduced in #9727, in order to maintain a record of the pre-flattened buffer shape until it can be used in `MakePackedAPI`. This commit instead maintains the pre-flattened shapes in `PrimFuncNode::buffer_map`, while the body of the function uses a flattened buffer alias, as described in [RFC#70](apache/tvm-rfcs#70)
`PrimFuncNode::preflattened_buffer_map` was introduced in apache#9727, in order to maintain a record of the pre-flattened buffer shape until it can be used in `MakePackedAPI`. This commit instead maintains the pre-flattened shapes in `PrimFuncNode::buffer_map`, while the body of the function uses a flattened buffer alias, as described in [RFC#70](apache/tvm-rfcs#70)
This is a follow-up of apache/tvm#9727 and RFC#63. Currently buffer can be implicitly declared and then used. The implicit behavior can be error prone and makes analysis more difficult. This RFC introduces
DeclBuffer
, a new IR construct as an explicit statement for buffer declaration.Rendered version: https://github.com/vinx13/tvm-rfcs/blob/decl-buffer/rfcs/0070-introducing-decl-buffer.md
cc @tqchen @Lunderberg @junrushao1994 @csullivan @mbs-octoml @jroesch @areusch @wrongtest @Hzfengsy