Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

castAwayContractionLeadingOneDim introduces unnecessary transposes on outer unit dims #85691

Closed
KoolJBlack opened this issue Mar 18, 2024 · 0 comments · Fixed by #85694
Closed

Comments

@KoolJBlack
Copy link
Contributor

During IREE's mmt4d lowering, we have a vector to matrix product represented by vector.contract in transit of the following form:

  %result = vector.contract {indexing_maps = [affine_map<(d0, d1, d2, d3) -> (d0, d1, d3)>, affine_map<(d0, d1, d2, d3) -> (d0, d2, d3)>, affine_map<(d0, d1, d2, d3) -> (d1, d2)>], iterator_types = ["parallel", "parallel", "parallel", "reduction"], kind = #vector.kind<add>} %lhs, %rhs, %acc : vector<1x1x8xi32>, vector<1x8x8xi32> into vector<1x8xi32>

Passing this through castAwayContractionLeadingOneDim pattern produces the following:

    %0 = vector.transpose %arg0, [1, 0, 2] : vector<1x1x8xi32> to vector<1x1x8xi32>
    %1 = vector.extract %0[0] : vector<1x8xi32> from vector<1x1x8xi32>
    %2 = vector.extract %arg2[0] : vector<8xi32> from vector<1x8xi32>
    %3 = vector.contract {indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d2)>, affine_map<(d0, d1, d2) -> (d0, d1, d2)>, affine_map<(d0, d1, d2) -> (d1)>], iterator_types = ["parallel", "parallel", "reduction"], kind = #vector.kind<add>} %1, %arg1, %2 : vector<1x8xi32>, vector<1x8x8xi32> into vector<8xi32>
    %4 = vector.broadcast %3 : vector<8xi32> to vector<1x8xi32>

The vector.transpose introduced adds additional leading dimensionality to the overall flow and cannot be trivially reduced further down. This is adding a challenge for subsequent patterns to process vec_x_matrix contracts properly.

The cause is the pattern requiring the cast away dimensions to be outermost for all operands while being driven by the accumulator of the contract.

Thoughts

In practice, transposing outer unit dimensions in this instance does not affect the underlying data layout. This transpose could be omitted before the patterns final output. Alternatively, a transpose canonicalizer that can fold transposes for outer leading unit dimensions could also apply here.

KoolJBlack added a commit that referenced this issue Apr 3, 2024
…sposes solely on leading unit dims. (#85694)

Updates `castAwayContractionLeadingOneDim` to check for leading unit
dimensions before inserting `vector.transpose` ops.

Currently `castAwayContractionLeadingOneDim` removes all leading unit
dims based on the accumulator and transpose any subsequent operands to
match the accumulator indexing. This does not take into account if the
transpose is strictly necessary, for instance when given this
vector-matrix contract:
```mlir
  %result = vector.contract {indexing_maps = [affine_map<(d0, d1, d2, d3) -> (d0, d1, d3)>, affine_map<(d0, d1, d2, d3) -> (d0, d2, d3)>, affine_map<(d0, d1, d2, d3) -> (d1, d2)>], iterator_types = ["parallel", "parallel", "parallel", "reduction"], kind = #vector.kind<add>} %lhs, %rhs, %acc : vector<1x1x8xi32>, vector<1x8x8xi32> into vector<1x8xi32>
```
Passing this through `castAwayContractionLeadingOneDim` pattern produces
the following:
```mlir
    %0 = vector.transpose %arg0, [1, 0, 2] : vector<1x1x8xi32> to vector<1x1x8xi32>
    %1 = vector.extract %0[0] : vector<1x8xi32> from vector<1x1x8xi32>
    %2 = vector.extract %arg2[0] : vector<8xi32> from vector<1x8xi32>
    %3 = vector.contract {indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d2)>, affine_map<(d0, d1, d2) -> (d0, d1, d2)>, affine_map<(d0, d1, d2) -> (d1)>], iterator_types = ["parallel", "parallel", "reduction"], kind = #vector.kind<add>} %1, %arg1, %2 : vector<1x8xi32>, vector<1x8x8xi32> into vector<8xi32>
    %4 = vector.broadcast %3 : vector<8xi32> to vector<1x8xi32>
```
The `vector.transpose` introduced does not affect the underlying data
layout (effectively a no op), but it cannot be folded automatically.
This change avoids inserting transposes when only leading unit
dimensions are involved.

Fixes #85691
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants