[OpenCL][Kernel] Add concat multi inputs kernel except channel is not aligned #6075

Merged: zhaoyang-star merged 7 commits into PaddlePaddle:develop from the refactor_concat branch on May 13, 2021

@zhaoyang-star zhaoyang-star commented May 12, 2021

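[Problem]
When concat has more than 2 inputs and the concat axis is not the channel dimension, the execution flow is: image2d->buffer, concat buffers, buffer->image2d. With N input tensors (N > 2), this runs N extra image2d->buffer conversions and 1 extra buffer->image2d conversion, which adds noticeable and unnecessary overhead on Adreno GPUs.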
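
The overhead described above can be tallied with a small counting sketch. This is not Paddle-Lite code: the function names are hypothetical, and it assumes one kernel launch per layout conversion and one launch for the concat itself.

```python
def launches_via_buffer(num_inputs: int) -> int:
    """Launches for the image2d -> buffer -> concat -> image2d path (assumed model)."""
    to_buffer = num_inputs  # one image2d->buffer conversion per input
    concat = 1              # concat on the buffers
    to_image = 1            # one buffer->image2d conversion for the output
    return to_buffer + concat + to_image


def launches_direct(num_inputs: int) -> int:
    """Launches when a single kernel concatenates the image2d inputs directly."""
    return 1


for n in range(3, 7):
    extra = launches_via_buffer(n) - launches_direct(n)
    print(f"N={n}: {extra} extra launches")
```

Under these assumptions the buffer path always costs N + 1 launches more than a direct image2d concat, which matches the N image2d->buffer plus 1 buffer->image2d conversions named in the problem statement.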

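[This PR]
Adds a general-purpose concat kernel that supports 2-6 inputs. Restriction: when concatenating along the channel dimension, the channel count of every input tensor must be a multiple of 4; concatenating along any other dimension has no restriction.
Repeated code in the kernel is factored into macros, which keeps the kernel simple to extend and maintain.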
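
The usage restriction can be written as a small eligibility check. This is a minimal sketch, not the kernel-selection code in this PR: `can_use_multi_input_concat` and its parameters are hypothetical, and the caller must say which axis is the channel axis because the PR text does not state how lower-rank tensors map onto channels.

```python
from typing import Sequence

MIN_INPUTS = 2         # from the PR description: 2-6 inputs supported
MAX_INPUTS = 6
CHANNEL_ALIGNMENT = 4  # channel counts must be a multiple of 4


def can_use_multi_input_concat(
    shapes: Sequence[Sequence[int]], axis: int, channel_axis: int
) -> bool:
    """Return True if the stated restrictions allow the multi-input kernel."""
    if not MIN_INPUTS <= len(shapes) <= MAX_INPUTS:
        return False
    if axis != channel_axis:
        # Concat along a non-channel dimension has no alignment restriction.
        return True
    # Concat along channels: every input's channel count must be aligned.
    return all(shape[channel_axis] % CHANNEL_ALIGNMENT == 0 for shape in shapes)


# Illustrative NCHW shapes (assumption: channel_axis=1 for 4-D NCHW tensors).
aligned = [(1, 8, 16, 16), (1, 4, 16, 16), (1, 12, 16, 16)]
unaligned = [(1, 3, 16, 16), (1, 4, 16, 16)]

print(can_use_multi_input_concat(aligned, axis=1, channel_axis=1))
print(can_use_multi_input_concat(unaligned, axis=1, channel_axis=1))
print(can_use_multi_input_concat(unaligned, axis=2, channel_axis=1))
```

Cases that fail the check would presumably fall back to another concat path; which one is not specified here.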

[Results]
The ssd_mobilenetv3-large model contains only 2 concat ops, each with 6 inputs and axis = 1:

  • First concat: Inputs dims: 1x1600x4, 1x600x4, 1x150x4, 1x54x4, 1x24x4, 1x6x4; Output_dims: 1x2434x4
  • Second concat: Inputs dims: 1x1600x81, 1x600x81, 1x150x81, 1x54x81, 1x24x81, 1x6x81; Output_dims: 1x2434x81
    image
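
The output dims listed above follow from the usual concat shape rule: the sizes along the concat axis are summed and every other dimension must match. A minimal sketch that checks the arithmetic (the helper `concat_output_shape` is hypothetical, not a Paddle-Lite function):

```python
from typing import Sequence, Tuple


def concat_output_shape(shapes: Sequence[Sequence[int]], axis: int) -> Tuple[int, ...]:
    """Compute the concat output shape, validating the non-concat dimensions."""
    first = tuple(shapes[0])
    for shape in shapes:
        if len(shape) != len(first):
            raise ValueError("all inputs must have the same rank")
        for dim, (a, b) in enumerate(zip(shape, first)):
            if dim != axis and a != b:
                raise ValueError(f"dimension {dim} differs: {a} vs {b}")
    out = list(first)
    out[axis] = sum(shape[axis] for shape in shapes)
    return tuple(out)


# Input dims taken from the two concat ops listed above.
rows = [1600, 600, 150, 54, 24, 6]
first_concat = [(1, r, 4) for r in rows]
second_concat = [(1, r, 81) for r in rows]

print(concat_output_shape(first_concat, axis=1))   # (1, 2434, 4)
print(concat_output_shape(second_concat, axis=1))  # (1, 2434, 81)
```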

@paddle-bot-old

Thanks for your contribution!

@daming5432 daming5432 left a comment

LGTM

@zhaoyang-star zhaoyang-star merged commit 6478459 into PaddlePaddle:develop May 13, 2021
@zhaoyang-star zhaoyang-star deleted the refactor_concat branch May 13, 2021 07:17
zhaoyang-star added a commit to zhaoyang-star/Paddle-Lite that referenced this pull request Jun 13, 2021
zhaoyang-star added a commit to zhaoyang-star/Paddle-Lite that referenced this pull request Jun 15, 2021
daming5432 pushed a commit that referenced this pull request Jun 16, 2021
* [OpenCL] Fix select fp32 compile crash (#6006)

* [Pass] Add opencl_kernel_place_correct_pass (#6037)

* [OpenCL] Fix invalid arg size in instance_norm (#6064)

* [OpenCL][Kernel] Add concat multi inputs kernel except channel is not aligned (#6075)

* [OpenCL][Bugfix] Fix target choose in opencl_kernel_place_correct_pass  (#6079)

* [OpenCL] fix kernel select of concat (#6158)

* [OpenCL] BindTargets KOpenCL for conv_conv_fuse_pass (#6125)

* test=develop

* [UTest] Loose abs_error for group_norm and instance_norm (#6188)

* loose group_norm abs_err. test=develop