[OpenCL][Kernel] Add concat multi inputs kernel except channel is not aligned #6075
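[Problem]
When there are more than 2 inputs and the concat is not performed on the channel dimension, the concat runs as: image2d->buffer, concat buffers, buffer->image2d. With N (N>2) input tensors, this adds N image2d->buffer conversions and 1 buffer->image2d conversion, and on Adreno GPUs these extra operations introduce noticeable, unnecessary overhead.

[This PR]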
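Add a general-purpose concat kernel that supports concatenating 2 to 6 inputs. Restriction: when concatenating along the channel dimension, the channel count of every input tensor must be a multiple of 4; concatenating along any other dimension is unrestricted.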
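The eligibility rule above can be written as a small host-side check. This is a minimal sketch, not code from the PR: it assumes 4-D NCHW tensors where the channel dimension is axis 1, and the function name `can_use_generic_concat` is hypothetical. The multiple-of-4 rule is presumably tied to image2d layouts that pack four channels into one RGBA texel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of the PR: decides whether the generic
   multi-input concat kernel may be used. Assumes 4-D NCHW tensors, so the
   channel dimension is axis 1. channels[i] is the channel count of input i;
   it is only read when axis == 1, so callers may pass NULL otherwise. */
bool can_use_generic_concat(int num_inputs, int axis, const int *channels) {
    assert(axis >= 0 && axis < 4);
    /* The kernel handles 2 to 6 inputs. */
    if (num_inputs < 2 || num_inputs > 6) {
        return false;
    }
    /* Concat along any axis other than channel has no extra restriction. */
    if (axis != 1) {
        return true;
    }
    /* Concat along channel: every input's channel count must be a multiple of 4. */
    assert(channels != NULL);
    for (int i = 0; i < num_inputs; ++i) {
        if (channels[i] % 4 != 0) {
            return false;
        }
    }
    return true;
}
```

Inputs that fail this check would need some other concat path; which one is not described in this section.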
The repeated code in the kernel is placed in macros, which keeps the kernel relatively easy to extend and maintain.
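To illustrate the macro approach, here is a hypothetical CPU reference in plain C, not the PR's OpenCL kernel. One macro, `COPY_INPUT`, holds the per-input copy logic and is expanded once per input slot, so supporting another input means adding one more expansion. It assumes contiguous 4-D NCHW float buffers whose dimensions match except along `axis`; the names `Tensor4D`, `COPY_INPUT`, and `concat_nchw` are illustrative, and the channel-alignment restriction is not modeled here.

```c
#include <assert.h>
#include <stddef.h>

/* Must equal the number of COPY_INPUT expansions in concat_nchw. */
#define MAX_INPUTS 6

/* Hypothetical tensor view: contiguous NCHW data plus its shape. */
typedef struct {
    const float *data;
    int dims[4]; /* N, C, H, W */
} Tensor4D;

/* Per-input body shared by every input slot. For outer index `o`, input `i`
   contributes one contiguous run of dims[axis] * inner elements. Relies on
   `in`, `num`, `axis`, `inner`, `o`, `out`, and `pos` in the caller's scope. */
#define COPY_INPUT(i)                                         \
    if ((i) < num) {                                          \
        size_t len = (size_t)in[(i)].dims[axis] * inner;      \
        const float *src = in[(i)].data + o * len;            \
        for (size_t k = 0; k < len; ++k) {                    \
            out[pos++] = src[k];                              \
        }                                                     \
    }

/* Concatenates 2..6 inputs along `axis` into `out`.
   Returns 0 on success, -1 if the input count, axis, or shapes are invalid. */
int concat_nchw(const Tensor4D *in, int num, int axis, float *out) {
    assert(out != NULL);
    if (num < 2 || num > MAX_INPUTS || axis < 0 || axis > 3) {
        return -1;
    }
    /* All dimensions except the concat axis must match. */
    for (int i = 1; i < num; ++i) {
        for (int d = 0; d < 4; ++d) {
            if (d != axis && in[i].dims[d] != in[0].dims[d]) {
                return -1;
            }
        }
    }
    /* outer: product of dims before axis; inner: product of dims after it. */
    size_t outer = 1, inner = 1;
    for (int d = 0; d < axis; ++d) outer *= (size_t)in[0].dims[d];
    for (int d = axis + 1; d < 4; ++d) inner *= (size_t)in[0].dims[d];

    size_t pos = 0;
    for (size_t o = 0; o < outer; ++o) {
        /* One expansion per input slot; unused slots are skipped at runtime. */
        COPY_INPUT(0) COPY_INPUT(1) COPY_INPUT(2)
        COPY_INPUT(3) COPY_INPUT(4) COPY_INPUT(5)
    }
    return 0;
}
```

The design choice is a single function body that covers every supported input count, at the cost of one runtime branch per unused slot; whether the real kernel specializes per input count at compile time is not stated in this section.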
[Results]
The ssd_mobilenetv3-large model contains only 2 concat ops, both with 8 inputs and axis 1: