[Midend] Enhancements and Optimizations and [Examples] Added MLIRLinalg Examples for Various Optimization Options #384

Open
wants to merge 33 commits into base: main

Conversation

6somehow

[Midend] Enhancements and Optimizations

  • BatchMatMul:

    • Updated unroll, tile, and vectorization strategies for BatchMatMul operations.
    • Enhanced the SCF (Structured Control Flow) version of BatchMatMul to support further optimization work.
  • Conv2D NHWC-FHWC:

    • Improved vectorization and tiling techniques for Conv2D operations in NHWC-FHWC layout.
    • These updates provide a vectorization option and a tile+vectorization option for buddy-opt.
  • Depthwise Conv2D NHWC-HWC:

    • Introduced vectorization and tiling strategies for Depthwise Conv2D in the NHWC-HWC format, enabling more efficient execution.

[Examples] Added MLIRLinalg Examples for Various Optimization Options

New MLIRLinalg examples were added to demonstrate the usage of the new buddy-opt optimization options, including:

  1. conv-nhwc-fhwc-optimize: Conv2D NHWC-FHWC layout vectorization optimization.
  2. conv-nhwc-fhwc-tile-optimize: Conv2D NHWC-FHWC vectorization with tiling optimizations.
  3. depthwise-conv-nhwc-hwc-optimize: Depthwise Conv2D NHWC-HWC layout vectorization optimization.
  4. batchmatmul-tile-optimize: tiling and unrolling optimization for BatchMatMul.
  5. batchmatmul-scf-optimize: SCF-based optimization for BatchMatMul.
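
As a quick reference, the two Conv2D options can be invoked as follows (a sketch mirroring the Makefile targets quoted later in this conversation; the input file and output path are placeholders, and buddy-opt stands for the built ${BUDDY_OPT} binary):

buddy-opt linalg-conv2d_nhwc_fhwc.mlir \
    -conv-nhwc-fhwc-optimize="vec-size=16" \
    -o ./log.mlir

buddy-opt linalg-conv2d_nhwc_fhwc.mlir \
    -conv-nhwc-fhwc-tile-optimize="vec-size=16 tiling-height=2 tiling-width=3" \
    -o ./log.mlir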

Example MLIR Files:

  • batchmatmul
  • conv2d_nhwc_fhwc
  • depthwise_conv_2d_nhwc_hwc
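
For a sense of what these example files exercise, a minimal batch_matmul kernel might look like the following (a sketch only, not necessarily the exact contents of the added files):

func.func @batch_matmul(%A: memref<?x?x?xf32>, %B: memref<?x?x?xf32>,
                        %C: memref<?x?x?xf32>) {
  // The new passes rewrite this linalg op into tiled/vectorized loops.
  linalg.batch_matmul
    ins(%A, %B : memref<?x?x?xf32>, memref<?x?x?xf32>)
    outs(%C : memref<?x?x?xf32>)
  return
}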

These updates collectively improve the performance of matrix and convolution operations in MLIR, providing optimized patterns for common workloads like BatchMatMul and convolution layers.

Somehow6 and others added 30 commits June 18, 2024 20:12
… scf version for further work. Update conv2dnhwcfhwc vectorization and tile. Update depthwise_conv2dnhwchwc vectorization and tile
…ize 2.conv-nhwc-fhwc-tile-optimize 3.depthwise-conv-nhwc-hwc-optimize 4.batchmatmul-tile-optimize 5.batchmatmul-scf-optimize . Example mlir: batchmatmul conv2d_nhwc_fhwc depthwise_conv_2d_nhwc_hwc
…ize 2.conv-nhwc-fhwc-tile-optimize 3.depthwise-conv-nhwc-hwc-optimize 4.batchmatmul-tile-optimize 5.batchmatmul-scf-optimize . Example mlir: batchmatmul conv2d_nhwc_fhwc depthwise_conv_2d_nhwc_hwc
…ion [Examples] Added MLIRLinalg Examples for Various Optimization Options
…ion [Examples] Added MLIRLinalg Examples for Various Optimization Options
…ion [Examples] Added MLIRLinalg Examples for Various Optimization Options. fixed thirdparty.
loc, AffineMap::get(1, 0, d0.ceilDiv(tilingOC)), OC);

// clang-format off
// Step 1: Create outermost loops.

Do we have a Step 2 in this file? If not, the word "Step" should be removed.

Value FW = rewriter.create<memref::DimOp>(loc, filter, 2); // FW

// clang-format off
// Step 1: Create outermost loops.

Do we have a Step 2 in this file? If not, the word "Step" should be removed.

Comment on lines +63 to +87
linalg-conv2d_nhwc_fhwc-optimize-lower:
@${BUDDY_OPT} linalg-conv2d_nhwc_fhwc.mlir \
-conv-nhwc-fhwc-optimize="vec-size=16" \
-o ./log.mlir

linalg-conv2d_nhwc_fhwc-tile-optimize-lower:
@${BUDDY_OPT} linalg-conv2d_nhwc_fhwc.mlir \
-conv-nhwc-fhwc-tile-optimize="vec-size=16 tiling-height=2 tiling-width=3" \
-o ./log.mlir

linalg-conv2d_nhwc_fhwc-optimize-run:
@${BUDDY_OPT} linalg-conv2d_nhwc_fhwc.mlir ${MLIR_OPT_OPTIONS} \
-conv-nhwc-fhwc-optimize="vec-size=16" \
-lower-affine -convert-scf-to-cf \
-convert-vector-to-llvm -finalize-memref-to-llvm -convert-arith-to-llvm \
-convert-func-to-llvm -reconcile-unrealized-casts | \
${MLIR_CPU_RUNNER} ${OPT_FLAG} -e main -entry-point-result=void -shared-libs=${MLIR_RUNNER_UTILS} -shared-libs=${MLIR_C_RUNNER_UTILS}

linalg-conv2d_nhwc_fhwc-tile-optimize-run:
@${BUDDY_OPT} linalg-conv2d_nhwc_fhwc.mlir ${MLIR_OPT_OPTIONS} \
-conv-nhwc-fhwc-tile-optimize="vec-size=16 tiling-height=2 tiling-width=3" \
-lower-affine -convert-scf-to-cf \
-convert-vector-to-llvm -finalize-memref-to-llvm -convert-arith-to-llvm \
-convert-func-to-llvm -reconcile-unrealized-casts | \
${MLIR_CPU_RUNNER} ${OPT_FLAG} -e main -entry-point-result=void -shared-libs=${MLIR_RUNNER_UTILS} -shared-libs=${MLIR_C_RUNNER_UTILS}

It would be better to follow the ordering of the existing test targets. For example, the tests in this code should be ordered:

  1. linalg-conv2d_nhwc_fhwc-optimize-lower
  2. linalg-conv2d_nhwc_fhwc-optimize-run
  3. linalg-conv2d_nhwc_fhwc-tile-optimize-lower
  4. linalg-conv2d_nhwc_fhwc-tile-optimize-run

Comment on lines +85 to 262
                    AffineApplyOp>(
                    loc, AffineMap::get(1, 0, d0 + j * vecSize), ivG);

                Value i = builder.create<TransferReadOp>(
                    loc, vecTy, input,
                    ValueRange{ivA, ivE, rowInput, columnInput});

                auto protectedF = builder.create<affine::AffineIfOp>(
                    loc, vecTy,
                    IntegerSet::get(1, 1, {s0 - 1 - d0}, {false}),
                    ValueRange{rowFilter, FH}, true);

                // if row in range, read normally.
                auto thenBuilder = protectedF.getThenBodyBuilder();
                Value normalReadVec = thenBuilder.create<TransferReadOp>(
                    loc, vecTy, filter,
                    ValueRange{ivB, ivE, rowFilter, columnFilter});
                thenBuilder.create<affine::AffineYieldOp>(loc, normalReadVec);

                // if row out of range, give back N empty vector.
                auto elseBuilder = protectedF.getElseBodyBuilder();
                Value emptyVec = elseBuilder.create<SplatOp>(loc, vecTy, cf0);
                elseBuilder.create<affine::AffineYieldOp>(loc, emptyVec);

                iList.push_back(i);
                fList.push_back(protectedF->getOpResult(0));
              }
            }
            Value lastResult =
                builder.create<memref::LoadOp>(loc, buffer, c0);
            for (int i = 0; i < kernelM; ++i) {
              for (int j = 0; j < kernelN; ++j) {
                lastResult = builder.create<vector::FMAOp>(
                    loc, vecTy, iList[i * kernelN + j],
                    fList[i * kernelN + j], lastResult);
              }
            }

            builder.create<memref::StoreOp>(loc, lastResult, buffer, c0);
          });
        });
      });

      Value reduceVec = builder.create<memref::LoadOp>(loc, buffer, c0);
      Value reducedRes = builder.create<vector::ReductionOp>(
          loc, vector::CombiningKind::ADD, reduceVec);
      Value bias = builder.create<memref::LoadOp>(
          loc, output, ValueRange{ivA, ivB, ivC, ivD});
      Value addRes = builder.create<arith::AddFOp>(loc, bias, reducedRes);
      builder.create<memref::StoreOp>(
          loc, addRes, output, ValueRange{ivA, ivB, ivC, ivD});
    });
  });
});

Is the purpose of modifying this code just beautification/formatting? Would the original code look better?

Value FW = rewriter.create<memref::DimOp>(loc, filter, 1); // FW

// clang-format off
// Step 1: Create outermost loops.

Do we have a Step 2 in this file? If not, the word "Step" should be removed.

Comment on lines +84 to +96

const Value zeroElementTypeVec =
isa<IntegerType>(elementType)
? rewriter
.create<vector::BroadcastOp>(
loc, VectorType::get({affineVectorSize}, elementType),
zeroElementType)
.getResult()
: rewriter
.create<vector::SplatOp>(
loc, VectorType::get({affineVectorSize}, elementType),
zeroElementType)
.getResult();

What is the reason for this change? (If it is an actual functional change, there is no need to add a comment in the code; I am just wondering.)

Comment on lines +16 to +22
add_mlir_library(BatchMatMulTileOptimization
BatchMatMulTileOptimize.cpp
)

add_mlir_library(BatchMatMulSCFOptimization
BatchMatMulSCFOptimize.cpp
)

Do BatchMatMulOptimize, BatchMatMulTileOptimize, and BatchMatMulSCFOptimize each need their own add_mlir_library call? Should all three of them belong to BatchMatMulOptimization?
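
For illustration only, the consolidation suggested above might look roughly like this (assuming a BatchMatMulOptimize.cpp source file and the existing BatchMatMulOptimization target name; this is not code from the PR):

add_mlir_library(BatchMatMulOptimization
  BatchMatMulOptimize.cpp
  BatchMatMulTileOptimize.cpp
  BatchMatMulSCFOptimize.cpp
)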

@xlinsist

It seems that the FileCheck tests for the MLIR files in the examples/MLIRLinalg directory do not pass the lit check. After modifying this part, you need to run the ninja check-buddy command in the build folder to verify that the modified part is correct.

[screenshot]

@xlinsist

Running make linalg-depthwise_conv_2d_nhwc_hwc-optimize-lower leads to a lowering error:

[screenshot]

@xlinsist

Running either linalg-batch-matmul-tile-optimize-lower or linalg-batch-matmul-scf-optimize-lower leads to the following error:

[screenshot]

@xlinsist left a comment

When I try to understand the optimization passes in this PR, I have some questions that I would like to verify:

  1. We don't need to modify the strategies of the original ConvOptimize.cpp and BatchMatMulOptimize pass implementations, right? But I see that there are code changes to them in this PR.

  2. linalg-conv2d_nhwc_fhwc-optimize vectorizes the Channel dimension, and the number of vector elements in one iteration is fixed to 16, with no tail processing implemented. Does this mean the pass is only applicable to scenarios where the number of channels is divisible by 16? (See the tail-loop sketch after this list.)

  3. The block size of linalg-conv2d_nhwc_fhwc-tile-optimize is 2x3, which I assume is the tile size for Height and Width? It would be better to add comments in the code elaborating the optimization strategies.

  4. The lowerings of linalg-batch-matmul-scf-optimize, linalg-batch-matmul-tile-optimize, and linalg-depthwise_conv_2d_nhwc_hwc-optimize in the Makefile report errors. It is recommended to ensure that the examples in examples/MLIRLinalg can be generated correctly before integrating these passes into buddy-benchmark for Ops testing.
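
A standalone illustration of the tail-processing concern in point 2 (plain C++ with hypothetical values; not code from this PR): the usual remedy is to split the channel loop into full vec-size iterations plus a scalar remainder loop.

#include <cstdio>

int main() {
  const int channels = 37; // hypothetical channel count, not divisible by 16
  const int vecSize = 16;  // matches the fixed vec-size=16 discussed above
  const int fullIters = channels / vecSize; // complete vector iterations
  const int tail = channels % vecSize;      // leftover channels
  for (int i = 0; i < fullIters; ++i)
    std::printf("vector iteration over channels [%d, %d)\n", i * vecSize,
                (i + 1) * vecSize);
  if (tail != 0)
    std::printf("scalar tail loop over the last %d channels\n", tail);
  return 0;
}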
