[Midend] Enhancements and Optimizations and [Examples] Added MLIRLinalg Examples for Various Optimization Options #384
base: main
Conversation
… scf version for further work. Update conv2dnhwcfhwc vectorization and tile. Update depthwise_conv2dnhwchwc vectorization and tile
…ize 2.conv-nhwc-fhwc-tile-optimize 3.depthwise-conv-nhwc-hwc-optimize 4.batchmatmul-tile-optimize 5.batchmatmul-scf-optimize . Example mlir: batchmatmul conv2d_nhwc_fhwc depthwise_conv_2d_nhwc_hwc
midend/lib/Conversion/ConvOptimization/ConvNhwcFhwcOptimize.cpp
…ion [Examples] Added MLIRLinalg Examples for Various Optimization Options
…ion [Examples] Added MLIRLinalg Examples for Various Optimization Options. fixed thirdparty.
    loc, AffineMap::get(1, 0, d0.ceilDiv(tilingOC)), OC);

// clang format off
// Step 1: Create outer most loops.
Do we have a `Step 2` in this file? If not, the word `Step` should be removed.
Value FW = rewriter.create<memref::DimOp>(loc, filter, 2); // FW

// clang format off
// Step 1: Create outer most loops.
Do we have a `Step 2` in this file? If not, the word `Step` should be removed.
linalg-conv2d_nhwc_fhwc-optimize-lower:
	@${BUDDY_OPT} linalg-conv2d_nhwc_fhwc.mlir \
		-conv-nhwc-fhwc-optimize="vec-size=16" \
		-o ./log.mlir

linalg-conv2d_nhwc_fhwc-tile-optimize-lower:
	@${BUDDY_OPT} linalg-conv2d_nhwc_fhwc.mlir \
		-conv-nhwc-fhwc-tile-optimize="vec-size=16 tiling-height=2 tiling-width=3" \
		-o ./log.mlir

linalg-conv2d_nhwc_fhwc-optimize-run:
	@${BUDDY_OPT} linalg-conv2d_nhwc_fhwc.mlir ${MLIR_OPT_OPTIONS} \
		-conv-nhwc-fhwc-optimize="vec-size=16" \
		-lower-affine -convert-scf-to-cf \
		-convert-vector-to-llvm -finalize-memref-to-llvm -convert-arith-to-llvm \
		-convert-func-to-llvm -reconcile-unrealized-casts | \
	${MLIR_CPU_RUNNER} ${OPT_FLAG} -e main -entry-point-result=void -shared-libs=${MLIR_RUNNER_UTILS} -shared-libs=${MLIR_C_RUNNER_UTILS}

linalg-conv2d_nhwc_fhwc-tile-optimize-run:
	@${BUDDY_OPT} linalg-conv2d_nhwc_fhwc.mlir ${MLIR_OPT_OPTIONS} \
		-conv-nhwc-fhwc-tile-optimize="vec-size=16 tiling-height=2 tiling-width=3" \
		-lower-affine -convert-scf-to-cf \
		-convert-vector-to-llvm -finalize-memref-to-llvm -convert-arith-to-llvm \
		-convert-func-to-llvm -reconcile-unrealized-casts | \
	${MLIR_CPU_RUNNER} ${OPT_FLAG} -e main -entry-point-result=void -shared-libs=${MLIR_RUNNER_UTILS} -shared-libs=${MLIR_C_RUNNER_UTILS}
Better to follow the order of the existing test targets. For example, the order of tests in this code should be:
- linalg-conv2d_nhwc_fhwc-optimize-lower
- linalg-conv2d_nhwc_fhwc-optimize-run
- linalg-conv2d_nhwc_fhwc-tile-optimize-lower
- linalg-conv2d_nhwc_fhwc-tile-optimize-run
AffineApplyOp>(
    loc, AffineMap::get(1, 0, d0 + j * vecSize), ivG);

Value i = builder.create<TransferReadOp>(
    loc, vecTy, input,
    ValueRange{ivA, ivE, rowInput, columnInput});

auto protectedF = builder.create<affine::AffineIfOp>(
    loc, vecTy,
    IntegerSet::get(1, 1, {s0 - 1 - d0}, {false}),
    ValueRange{rowFilter, FH}, true);

// If the row is in range, read normally.
auto thenBuilder = protectedF.getThenBodyBuilder();
Value normalReadVec = thenBuilder.create<TransferReadOp>(
    loc, vecTy, filter,
    ValueRange{ivB, ivE, rowFilter, columnFilter});
thenBuilder.create<affine::AffineYieldOp>(loc, normalReadVec);

// If the row is out of range, give back an empty vector.
auto elseBuilder = protectedF.getElseBodyBuilder();
Value emptyVec = elseBuilder.create<SplatOp>(loc, vecTy, cf0);
elseBuilder.create<affine::AffineYieldOp>(loc, emptyVec);

          iList.push_back(i);
          fList.push_back(protectedF->getOpResult(0));
        }
      }
      Value lastResult =
          builder.create<memref::LoadOp>(loc, buffer, c0);
      for (int i = 0; i < kernelM; ++i) {
        for (int j = 0; j < kernelN; ++j) {
          lastResult = builder.create<vector::FMAOp>(
              loc, vecTy, iList[i * kernelN + j],
              fList[i * kernelN + j], lastResult);
        }
      }

      builder.create<memref::StoreOp>(loc, lastResult, buffer, c0);
    });
  });
});

Value reduceVec = builder.create<memref::LoadOp>(loc, buffer, c0);
Value reducedRes = builder.create<vector::ReductionOp>(
    loc, vector::CombiningKind::ADD, reduceVec);
Value bias = builder.create<memref::LoadOp>(
    loc, output, ValueRange{ivA, ivB, ivC, ivD});
Value addRes = builder.create<arith::AddFOp>(loc, bias, reducedRes);
builder.create<memref::StoreOp>(
    loc, addRes, output, ValueRange{ivA, ivB, ivC, ivD});
    });
  });
});
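For readers following the thread, the builder calls in this excerpt emit IR whose scalar semantics are roughly the following. This is a minimal sketch; the function name `convPoint` and its parameters are illustrative and not taken from the PR:

```cpp
#include <cassert>
#include <vector>

// Sketch: scalar semantics of the guarded filter read, FMA accumulation,
// and final bias add above. Rows beyond the real filter height FH
// contribute zero, mirroring the affine.if that yields a zero vector for
// out-of-range rows.
float convPoint(const std::vector<float> &in, const std::vector<float> &flt,
                int rows, int FH, int FW, float bias) {
  float acc = 0.0f;                // plays the role of the memref buffer
  for (int r = 0; r < rows; ++r)   // rows may exceed FH after rounding up
    for (int c = 0; c < FW; ++c) {
      // Protected read: out-of-range rows yield zero (the else branch).
      float f = (r < FH) ? flt[r * FW + c] : 0.0f;
      acc += in[r * FW + c] * f;   // vector::FMAOp in the real IR
    }
  return bias + acc;               // vector reduction + arith::AddFOp
}
```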
Is the purpose of this change purely beautification/formatting? Would the original code look better?
Value FW = rewriter.create<memref::DimOp>(loc, filter, 1); // FW

// clang format off
// Step 1: Create outer most loops.
Do we have a `Step 2` in this file? If not, the word `Step` should be removed.
const Value zeroElementTypeVec =
    isa<IntegerType>(elementType)
        ? rewriter
              .create<vector::BroadcastOp>(
                  loc, VectorType::get({affineVectorSize}, elementType),
                  zeroElementType)
              .getResult()
        : rewriter
              .create<vector::SplatOp>(
                  loc, VectorType::get({affineVectorSize}, elementType),
                  zeroElementType)
              .getResult();
What is the reason for this change? (If these are actual functional changes, there is no need to add comments to this code; I am just wondering.)
add_mlir_library(BatchMatMulTileOptimization
  BatchMatMulTileOptimize.cpp
)

add_mlir_library(BatchMatMulSCFOptimization
  BatchMatMulSCFOptimize.cpp
)
Do BatchMatMulOptimize, BatchMatMulTileOptimize, and BatchMatMulSCFOptimize need to call "add_mlir_library" twice? Should all three of them belong to BatchMatMulOptimization?
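One possible consolidation this question may be pointing at is a single library target, sketched below. This is hypothetical; whether these sources can share one `add_mlir_library` target depends on the project's build setup and link dependencies:

```cmake
add_mlir_library(BatchMatMulOptimization
  BatchMatMulOptimize.cpp
  BatchMatMulTileOptimize.cpp
  BatchMatMulSCFOptimize.cpp
)
```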
When I try to understand the optimization passes of this PR, I have some questions that I would like to verify:
- We don't need to modify the strategies of the original ConvOptimize.cpp and BatchMatMulOptimize pass implementations, right? But I saw that there are code changes to them in the PR.
- `linalg-conv2d_nhwc_fhwc-optimize` vectorizes the Channel dimension, and the number of vector elements in one iteration is fixed to 16, with no tail processing implemented. Does this mean the pass is only applicable when the number of channels is divisible by 16?
- The block size of `linalg-conv2d_nhwc_fhwc-tile-optimize` is 2x3, which I think should be the block size for Height and Width? It would be better to have comments in the code elaborating the optimization strategies.
- The lowerings of `linalg-batch-matmul-scf-optimize`, `linalg-batch-matmul-tile-optimize`, and `linalg-depthwise_conv_2d_nhwc_hwc-optimize` in the Makefile report errors. It is recommended to ensure that the examples in examples/MLIRLinalg can be generated correctly before integrating these passes into buddy-benchmark for Ops testing.
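To make the tail-processing concern concrete, here is a hedged scalar sketch, not the PR's code, of how a channel loop with a fixed vector width could fall back to a scalar epilogue when the channel count is not divisible by the vector size. The name `dotChannels` is illustrative:

```cpp
#include <cassert>
#include <vector>

// Main vectorized body plus a scalar epilogue for the leftover channels
// (channels % vecSize). Without the epilogue, a fixed vec-size=16 pass
// only handles channel counts divisible by 16.
float dotChannels(const std::vector<float> &a, const std::vector<float> &b,
                  int channels, int vecSize) {
  float acc = 0.0f;
  int c = 0;
  // Full vector groups (stands in for the vectorized loop body).
  for (; c + vecSize <= channels; c += vecSize)
    for (int i = 0; i < vecSize; ++i)
      acc += a[c + i] * b[c + i];
  // Scalar tail for the remaining channels.
  for (; c < channels; ++c)
    acc += a[c] * b[c];
  return acc;
}
```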
[Midend] Enhancements and Optimizations
BatchMatMul:
Conv2D NHWC-FHWC:
Depthwise Conv2D NHWC-HWC:
[Examples] Added MLIRLinalg Examples for Various Optimization Options
New MLIRLinalg examples were added to demonstrate the usage of the new buddy-opt optimization options, including vectorization and layout optimization.
Example MLIR Files:
batchmatmul
conv2d_nhwc_fhwc
depthwise_conv_2d_nhwc_hwc
These updates collectively improve the performance of matrix and convolution operations in MLIR, providing optimized patterns for common workloads like BatchMatMul and convolution layers.