Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DISCUSS] Revisit Buffer Boundary Check & Region Cover Check #138

Closed
spectrometerHBH opened this issue Oct 2, 2020 · 6 comments
Closed

Comments

@spectrometerHBH
Copy link
Collaborator

spectrometerHBH commented Oct 2, 2020

I've been revisiting compute_at and working on reverse_compute_at these days, and I've been thinking over the issue of buffer boundary check apache/tvm#6596.

I put some of my thoughts here for discussion.

A buffer boundary check problem in current compute_at

The basic logic of current compute_at(input_block, input_loop) is

  • do checks
  • calculate the consume requirement of consumers under input_loop (may be relaxed)
  • generate new loops outside input_block to cover the requirement (may produce more)

Hence, after compute_at, input_block may produce more than before, which consumes more input and can lead to access out of input buffers' boundary.

A clear solution to this problem is to add a predicate in block.where to do buffer boundary check.

Region Cover Check

I introduce region cover check in compute_at to restrict the data flow. It is still important in reverse_compute_at.
For example (see also #102 )

# Before
for i in [0,4):
    for k:
       opaque_produce(C[i*4, i*4 + 4])
for j in [3, 30):
      D[j] = C[j]+1

# After(wrong)
for i in [0, 4):
    for k:
       opaque_produce(C[i*4, i*4 + 4])
    for j in 4:
        D[j] = C[j]+1

# After(correct)
for i in [0, 4):
    for k:
       opaque_produce(C[i*4, i*4 + 4])
    for j in 4:
      if i*4+j in [3, 30):
        D[j] = C[j]+1

Before the transformation, D actually doesn't consume all the output of opaque_produce, and may consume elements out of the output of opaque_produce, we have to add a predicate to fix the data flow.

However, it is not trivial to judge whether the region cover is satisfied.

Region Cover Check is non-trivial

See also #45
To check if A's output covers B's input under some loop i, we can not relax A's output region, otherwise, the data flow will be problematic and the check is meaningless.

for ii in [0, 1)
    # A access {ii, ii+2, ii+4, ii+6}, which will be relaxed to [ii, ii+7)
    for io in [0, 4)
        A[io*2+ii] = io
# A's output actually doesn't cover B's input
# But with relaxation we will miss this fact
for j in [0, 7):
    B[j] = A[j]

compute_at(A, j) # which should have been banned by region cover check, but it will slip through
@spectrometerHBH
Copy link
Collaborator Author

cc @tqchen @Hzfengsy @junrushao1994

@spectrometerHBH
Copy link
Collaborator Author

spectrometerHBH commented Oct 2, 2020

An insight from above examples is that block itself is not enough, the detailed data flow and dependency between blocks matter. But to calculate the exact IO regions is intractable.

I'm trying to figure out ways to get around it.

@Hzfengsy
Copy link
Member

Hzfengsy commented Oct 2, 2020

As far as I know, if a block is continuous, (which means it can produce any part of the buffer A as we want). It always can be compute_at, but may bring repeated calculation. However, if a block has limited elements that can be produced (e.g., a block can only write A[1], A[3], but cannot write A[2], A[4]), the compute_at for this block must be baned.

@tqchen
Copy link
Contributor

tqchen commented Oct 2, 2020

In the particular case of reverse_compute_at. Let us focus on a more restricted condition than compute at that does not allow relaxation, which is most of our use cases. (e.g. conv2d follows bias add and bn

@tqchen
Copy link
Contributor

tqchen commented Oct 2, 2020

To also specific discuss the problem below, there are possibly two options to check A's covering, both are related:

  • K0: detect and analyze split/fuse patterns(aka quasi affine mappings), which is mainly something of interest
    • fuse pattern, two iterators => one iterator: x = xo * stride + xi, xi in [0, stride]
    • split pattern, one iterators => two iterators: xo = floordiv(x, stride), xi = floormod(x, stride)
  • K1: Introduce another interval analysis that is conservative(only rounds down)

NOTE that the iterator coupling matters if we want to do conservative analysis. For example xo * 3 + xi is only can only be rewritten to another iterator, if xo and xi are "independent iterators". So in order to do such analysis, we need to maintain a set of currently active iterators, and "consume" an iterator(e.g. it can no longer be used to followup transformations unless for a explicit subexpr look up) once it is used by a split or possibly fuse pattern.

The analysis goes as follows, for each subexpr, we want to map it to an iterator plus the conservative bound of that iterator.

  • The first thing we do is lookup e.g. if (xo * stride + xi, is already mapped to x) we return x.
  • The a bit more tricky part is how to deal with iterator splitting aka x % b0 and x / b0 case. Notably such splitting can happen in multiple ways(multiple splits). It seems to related to the canonical form(SplitExpr) in canonical simplify, where multiple splits(that are compatible) can be canonicalized to ((x[i] / b0[i]) % b1[i])

Related, canonical simplify: https://github.com/apache/incubator-tvm/blob/master/src/arith/canonical_simplify.cc#L88

@tqchen
Copy link
Contributor

tqchen commented Oct 11, 2020

apache/tvm#6667 Should provide the necessary utility. We might still make some enhancement based on the usecases we find(e.g. add predication => constraint, and better symbolic support) but the overal infra should solve the problem to some extent

@tqchen tqchen closed this as completed Dec 31, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants