FC & Softmax #6560

zhaoyang-star · 2021-07-26T10:05:05Z

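【Problem】

The existing FC kernel is implemented with cl::Buffer and performs poorly.
The existing softmax performs poorly on 2-D tensors because its parallelism is very low: for a 1x1000 tensor with axis=1, for example, only one thread is allocated to do the whole computation.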

【What this PR does】

Optimize FC: input/output/bias are stored as cl::Image2d, and weight is stored as cl::Buffer and read as half16 (for details, see [OpenCL][Kernel] Use FC replace conv1x1 #6365). The corresponding unit tests validate both fp32 and fp16 precision.
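
To make the half16 weight read concrete, here is a minimal Python sketch of an FC layer y = x·W + b. The weights are consumed 16 values at a time, the way an OpenCL kernel would fetch them with a vector load such as vload_half16. Several parts are assumptions for illustration and are not taken from the PR: the weight layout (W transposed to [M][K] and flattened, so the K weights feeding one output are contiguous) and the names `fc_reference`, `fc_vec16` and `VEC`. The image2d storage of input/output/bias is not modeled.

```python
VEC = 16  # values per vector load (half16 in the OpenCL kernel)


def fc_reference(x, w, b):
    """Plain FC: x is N rows of K values, w is K rows of M values, b has M values."""
    k_dim, m_dim = len(w), len(b)
    return [[b[j] + sum(row[k] * w[k][j] for k in range(k_dim)) for j in range(m_dim)]
            for row in x]


def fc_vec16(x, w_t_flat, b, k_dim):
    """Same FC, but the weights are read in contiguous chunks of VEC values.

    Assumed (hypothetical) layout: w_t_flat is W transposed to [M][K] and
    flattened, so the K weights feeding output j start at offset j * k_dim.
    """
    out = []
    for row in x:
        out_row = []
        for j in range(len(b)):
            acc = b[j]
            base = j * k_dim
            for k0 in range(0, k_dim, VEC):
                k1 = min(k0 + VEC, k_dim)  # the last chunk may be shorter than VEC
                w_chunk = w_t_flat[base + k0:base + k1]  # one simulated vector load
                acc += sum(xv * wv for xv, wv in zip(row[k0:k1], w_chunk))
            out_row.append(acc)
        out.append(out_row)
    return out
```

With K = 20, each output needs one full 16-wide load plus a 4-wide tail. Storing W transposed is what turns each output's reduction over K into contiguous, vector-friendly reads.
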
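Optimize softmax: to fix the poor performance on 2-D tensors, the thread assignment is changed so that the data along the axis dimension is split into blocks of 32 and processed in parallel. This uses local memory, and the core idea is a parallel reduce. In addition, to handle channel counts that are not a multiple of 4 efficiently, a mask is used instead of if/else branches.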
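The blocked reduction can be sketched in Python by simulating one work-group for a single row. This rests on several assumptions that are not taken from the PR:

- 32 work-items per row.
- The row is stored as RGBA texels of 4 channels, with the last texel padded.
- Each work-item strides over texels lid, lid+32, and so on.
- A halving tree reduction runs in local memory.

The mask is applied arithmetically (value·mask + NEG_BIG·(1 − mask)), so padded lanes drop out of both the max and the sum without a per-lane branch. In OpenCL C this would typically be written with vector comparisons or `select()`. All names here are hypothetical.

```python
import math

LOCAL_SIZE = 32   # work-items per work-group along the softmax axis (assumed)
NEG_BIG = -1e30   # finite stand-in for -inf, so it can be multiplied by a 0/1 mask


def tree_reduce(local_mem, op):
    """Halving tree reduction over a simulated local-memory array whose length
    is a power of two. On the GPU each halving step is one parallel round
    followed by a barrier, so 32 slots need log2(32) = 5 steps."""
    stride = len(local_mem) // 2
    while stride > 0:
        for lid in range(stride):  # these work-items would run concurrently
            local_mem[lid] = op(local_mem[lid], local_mem[lid + stride])
        stride //= 2
    return local_mem[0]


def lane_mask(texel_idx, channels):
    """0/1 mask for the 4 lanes of a texel; lanes at or past `channels` are padding."""
    return [float(texel_idx * 4 + lane < channels) for lane in range(4)]


def masked_lanes(texel, mask):
    """Keep valid lanes and push padded lanes to NEG_BIG, using arithmetic instead of if/else."""
    return [v * m + NEG_BIG * (1.0 - m) for v, m in zip(texel, mask)]


def softmax_row(texels, channels):
    """Softmax over one row stored as 4-channel texels (the last one may be padded)."""
    n_tex = len(texels)
    # Phase 1: work-item `lid` scans texels lid, lid + 32, ... and keeps a partial max.
    part_max = [NEG_BIG] * LOCAL_SIZE
    for lid in range(LOCAL_SIZE):
        for t in range(lid, n_tex, LOCAL_SIZE):
            part_max[lid] = max(part_max[lid], *masked_lanes(texels[t], lane_mask(t, channels)))
    row_max = tree_reduce(part_max, max)
    # Phase 2: same striding for partial sums of exp(x - max); padded lanes underflow to 0.0.
    part_sum = [0.0] * LOCAL_SIZE
    for lid in range(LOCAL_SIZE):
        for t in range(lid, n_tex, LOCAL_SIZE):
            part_sum[lid] += sum(math.exp(v - row_max)
                                 for v in masked_lanes(texels[t], lane_mask(t, channels)))
    row_sum = tree_reduce(part_sum, lambda a, b: a + b)
    # Phase 3: normalize, keeping only the valid channels.
    flat = [v for texel in texels for v in texel][:channels]
    return [math.exp(v - row_max) / row_sum for v in flat]
```

For a 1x1000 row this gives 250 texels. Each of the 32 work-items scans at most 8 texels before the 5-step reduction, instead of one work-item scanning all 250.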

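【Results】
MobileNetV1 contains one fc and one softmax. Kernel execution time was measured on 6 devices with Mali and Adreno GPUs, as shown in the table below (times in ms). FC is 1 to 3 times faster, and softmax is 44% to 302% faster.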
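A figure like "302% faster" only makes sense when measured against the new time. A saving measured against the old time can never exceed 100%. The small sketch below contrasts the two conventions; the function names and the example timings are hypothetical.

```python
def speedup_ratio(t_old_ms, t_new_ms):
    """How many times as fast the new kernel is: old time / new time."""
    return t_old_ms / t_new_ms


def percent_faster(t_old_ms, t_new_ms):
    """'X% faster' measured against the new time: (old / new - 1) * 100."""
    return (speedup_ratio(t_old_ms, t_new_ms) - 1.0) * 100.0


def percent_time_saved(t_old_ms, t_new_ms):
    """Share of the old time that was saved, in percent; never exceeds 100."""
    return (t_old_ms - t_new_ms) / t_old_ms * 100.0
```

Take hypothetical timings of 4.02 ms before and 1.00 ms after. `percent_faster` gives about 302, while `percent_time_saved` gives about 75.1. So under the first reading, "302% faster" means the new kernel takes roughly a quarter of the old time. If the FC figure uses the same convention, "1 to 3 times faster" would correspond to ratios of 2x to 4x.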

FC performance for different values of N, measured on the 845 alone:

【TODO】
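Because the outputs of both kernels are 2-D, expanding their output tensors to 4-D pads the low dimensions with 1s. The OpenCL converter instead defines the padding on the high dimensions. This affects the precision profile and still needs to be fixed. The plan is to change the OpenCL converter to pad 1s on the low dimensions uniformly.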
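The mismatch can be illustrated with shapes. Assume "high dimensions" means the leading ones. Then padding a 2-D output [N, C] on the high side gives [1, 1, N, C], while padding on the low side gives [N, C, 1, 1]. The helper `image_shape` assumes a common default NCHW-to-image2d mapping with 4 channels per texel (width = W·⌈C/4⌉, height = N·H). It is a hypothetical helper, not the converter's actual code.

```python
def pad_high(shape):
    """Pad on the high (leading) side, as the converter does: [N, C] -> [1, 1, N, C]."""
    return [1] * (4 - len(shape)) + list(shape)


def pad_low(shape):
    """Pad on the low (trailing) side, as the FC/softmax outputs do: [N, C] -> [N, C, 1, 1]."""
    return list(shape) + [1] * (4 - len(shape))


def image_shape(nchw):
    """Hypothetical NCHW -> image2d size with 4 channels per texel:
    width = W * ceil(C / 4), height = N * H."""
    n, c, h, w = nchw
    return (w * ((c + 3) // 4), n * h)
```

Under these assumptions, a [2, 1000] output becomes a 1000x2 image with high-side padding but a 250x2 image with low-side padding. A precision profile that assumes one convention therefore reads the other's texels in the wrong places.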

daming5432

LGTM

zhaoyang-star added 12 commits July 8, 2021 15:16

add fc image impl · 014b41e
add softmax_1x1 · 3421cde
fix bugs · 5b7c9dc
Merge branch 'develop' into tune_fc · 3a4e221
Merge branch 'develop' into tune_fc · 8209558
fc utest passed · cbce175
support multi-batch · e88a921
fc, softmax utest all passed · 01236dc
reduce include in fc.cc · 880840a
update precision profile and layout_cast · d2ed7f1
update precision profile and layout_cast · ca1fb13
grace code. test=develop · c12547a

zhaoyang-star marked this pull request as ready for review July 28, 2021 13:35

zhaoyang-star requested review from daming5432 and zhenlin-work July 28, 2021 13:36

daming5432 approved these changes Jul 29, 2021

zhaoyang-star merged commit 3dbaebd into PaddlePaddle:develop Jul 29, 2021

zhaoyang-star deleted the tune_fc branch July 29, 2021 07:53
