-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[2/10] CMSIS-NN code generator for softmax #8833
Conversation
Adding @manupa-arm @grant-arm @areusch for review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking great @ashutosh-arm, really enjoying the code re-use here 😸 Just needs neatening up a bit from my POV.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One small suggestion and I think you need to run the linters against the code again @ashutoshparkhi 😸
b674bec
to
59d4217
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ashutosh-arm thanks for the PR. a few questions on the approach
tvmc_package = tvmc.compiler.compile_model( | ||
tvmc_model, | ||
target=f"cmsis-nn, c -runtime=c --system-lib --link-params -mcpu=cortex-m55 --executor=aot", | ||
output_format="mlf", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
int32_t input_bits = 5; | ||
double beta_multiplier = (beta * quant_scale * (1 << (31 - input_bits))); | ||
beta_multiplier = std::min<double>(beta_multiplier, (1ll << 31) - 1.0); | ||
auto mult_shift_pair = tvm::relay::qnn::GetFixedPointMultiplierShift(beta_multiplier); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jroesch @electriclilies @tqchen what's our stance on adding this much complexity to the codegen? can you comment whether it's okay to invoke tvm::relay::qnn::GetFixedPointMultiplierShift
from here?
in general, we should favor putting as much validation logic in the compiler as possible. it kind of seems like this should move upwards though, so that the codegen is just emitting tir.call_extern with the same arguments as are in the CallNode? Doing that would make it possible to emit code using llvm codegen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the second point, we were planning on moving TIR generation from python implementation to C++ when we go to next operator's support. I can move this call to TIR then if others are okay with that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ashutosh-arm @Mousius and I discussed about this a bit. This is a smaller instance of a more general problem: other CMSIS-NN operators require user-defined data structures. This is the only thing stopping them from being encoded as tir.call_extern
and therefore allowing reuse of the c
and llvm
codegen.
@ZihengJiang was working on this but I believe it's been de-prioritized. my preference is not to place call-specific transformation logic inside of codegen and instead ensure the tir.CallNode
contains the final parameters where possible. might it be possible to work around this either:
a) by creating a custom TIR Node class and using this as a parameter, knowing that this parameter can only be consumed by CMSIS-NN codegen
b) by creating a placeholder parameter and attaching the struct as an attribute to the CallNode
cc @mbs-octoml
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the data structure bits. Note that most of the data structures are passed as pointers(handle), this means we likely only need intrinsics to construct those data structures(return handles), then pass as argument to the call.
We would prefer this way of reusing existing construct(via composing intrinsics) than adding semantics to the TIR when possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tqchen the issue with this approach is that we want to stack-allocate these structs because we don't want to call out to the memory manager. so while they're passed by pointer, we still need to account for the memory usage of those structs.
we probably need to discuss this a bit more before merging calls with structs, but I think we understand enough to unblock this PR. I discussed with @ashutosh-arm and he'll update this PR to move this call out of the codegen. The next CMSIS-NN call will require us to fill a struct. This way, when we move to consider that call, we'll start from a clean slate where the checked-in codegen doesn't further require a cleanup to meet the intended codegen pattern and therefore remove that from the discussion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this PR has already been reviewed. I've created another one here: #8951 that contains changes to move TIR code generation to C++ side. This PR also has the change to move out the CMSIS-NN Softmax's parameter calculations out of C codegen.
59d4217
to
addd795
Compare
thanks @ashutosh-arm for working through this with us! we will continue review in the next PR. |
This PR is to support softmax (int8) via CMSIS-NN. It also includes changes for CMSIS-NN graph partitioner since that PR has not merged yet: #8653
This includes second and third steps of plan mentioned in the RFC: apache/tvm-rfcs#15
Co-authored-by: Chris Sidebottom chris.sidebottom@arm.com