Support Boundary Checking for Loop Dependent Iterators #6596
Comments
Just a kind reminder that #5130 has been there for half a year :(
I see, looking into the case. Both this example and #5130 seem to appear when the iteration bound depends on an outer loop, so we will need a detector to insert the guard condition properly when generating the loops. The current compute-bound insertion assumes the loop bounds are not iterator-dependent. It might also be useful to think about what can and cannot be done in iterator-dependent loops, since the original assumption of `compute` is that the axes are independent of each other for certain scheduling correctness to hold. In this case, a dependent reduction bound would certainly bring different implications (for example, it no longer makes sense to reorder a spatial axis with the reduction axis), and possibly different analysis for the scheduling to hold correctly. A contribution making this enhancement would be more than welcome. However, we should also think about the long-term implications and how to correctly support this kind of workload.
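For concreteness, an iterator-dependent loop bound looks like the following. This is a plain-Python sketch of a hypothetical triangular reduction, not TVM code; the function name is illustrative only:

```python
# Plain-Python sketch (hypothetical, not TVM API): the inner reduction
# bound depends on the outer spatial iterator i, so the two axes are
# coupled and the bound cannot be hoisted out of the loop nest.
def triangular_row_sums(a):
    n = len(a)
    out = [0] * n
    for i in range(n):            # spatial axis
        for k in range(i + 1):    # reduction axis: bound depends on i
            out[i] += a[k]
    return out
```

Because the inner bound reads `i`, reordering the two loops is no longer a simple axis permutation, which is what makes the usual independence assumption of `compute` break down here.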
Also cc @spectrometerHBH @Hzfengsy since this might bring some food for thought for the TIR schedule design
The block isolation in TVM's Tensor IR would solve the problem. |
Currently TVM's boundary check avoids some invalid global memory accesses, but it ignores the case where the bounds of a `reduce_axis` themselves require global memory access (reading an index tensor, which is common when dealing with sparse/ragged tensors). Below is a simple example (segment sum) that reproduces the problem. What it does is basically:

- take an input tensor `x` and an offset tensor `offsets` describing the segment boundaries (starting with `0` and ending with the length of `x`);
- for each segment `i`, compute the sum of the elements of that segment in `x`, i.e. `sum(x[offsets[i]:offsets[i+1]])`, and store the result in `out[i]`.

Below is the generated code.
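The intended semantics can be written as a plain-Python reference (this sketches only the math, not the TVM kernel; the function name is my own):

```python
# Plain-Python reference for the segment-sum semantics described above:
# out[i] = sum(x[offsets[i]:offsets[i+1]]).
def segment_sum(x, offsets):
    num_segments = len(offsets) - 1
    out = [0] * num_segments
    for i in range(num_segments):
        # The inner loop bound is itself read from the offsets tensor,
        # which is exactly the access the bound checker fails to guard.
        for j in range(offsets[i], offsets[i + 1]):
            out[i] += x[j]
    return out
```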
Note that in

`for (rv_1: int32, 0, ((int32*)offsets_2[(((blockIdx.x*4) + threadIdx.x) + 1)] - (int32*)offsets_2[((blockIdx.x*4) + threadIdx.x)])) {`

the memory access to `offsets_2` is not protected, which incurs an invalid memory access whenever `((blockIdx.x*4) + threadIdx.x)` is greater than `num_elements`. If we change the order of the if-statement and the for-loop, the program should work correctly:
The bug was also mentioned in the TVM forum.
I think this error is related to https://github.com/apache/incubator-tvm/blob/f13fed55cfe872ba7f40970f6a35f965d186a30a/src/tir/transforms/bound_checker.cc; I wonder how I could change it to be aware of global memory accesses in `reduce_axis`?

cc @junrushao1994