
optimize softmax with cross entropy hard label #32290

Merged · 4 commits · May 21, 2021

Conversation

@xingfeng01 (Contributor) commented Apr 15, 2021

PR types

Performance optimization

PR changes

Ops

Describe

Optimization of op softmax_with_cross_entropy for the hard-label case. This PR includes:

  • hard label forward kernel: uses the same idea as the softmax implementation: SoftmaxWithCrossEntropyHardLabel
  • hard label backward kernel: merges multiple kernels into one: SoftmaxWithCrossEntropyGradHardLabel
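As a rough illustration of what the fused hard-label forward computes, here is a hypothetical host-side reference (the actual SoftmaxWithCrossEntropyHardLabel is a warp-level CUDA kernel; names below are illustrative, not Paddle's API):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Reference semantics of the fused hard-label forward:
// loss[i] = -log(softmax(logits[i])[label[i]]), or 0 if label[i] == ignore_index.
std::vector<double> HardLabelForwardRef(
    const std::vector<std::vector<double>>& logits,
    const std::vector<int64_t>& labels, int64_t ignore_index,
    std::vector<std::vector<double>>* softmax) {
  std::vector<double> loss(logits.size(), 0.0);
  softmax->assign(logits.size(), {});
  for (size_t i = 0; i < logits.size(); ++i) {
    // Numerically stable softmax: subtract the row max before exponentiating.
    double max_v = logits[i][0];
    for (double v : logits[i]) max_v = std::max(max_v, v);
    double sum = 0.0;
    for (double v : logits[i]) sum += std::exp(v - max_v);
    for (double v : logits[i]) (*softmax)[i].push_back(std::exp(v - max_v) / sum);
    if (labels[i] == ignore_index) {
      loss[i] = 0.0;  // ignored sample contributes no loss
    } else {
      // log_softmax computed directly, avoiding exp-then-log precision loss.
      loss[i] = -(logits[i][labels[i]] - max_v - std::log(sum));
    }
  }
  return loss;
}
```

The fused kernel writes both the softmax output and the per-sample loss in one pass, which is what makes merging profitable.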

@paddle-bot-old:

Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@xingfeng01 (Contributor, Author):

Performance optimized, computation time reduced:

  • softmax_with_cross_entropy_1 (forward) : -78.9%
  • softmax_with_cross_entropy_1 (backward): -89.2%

if (ignore_index != tmp) {
int64_t idx = idx_n * d + tmp * remain + idx_remain;
logit_grad[idx] -= static_cast<T>(1.);
__device__ __forceinline__ T logT(T x) {
Contributor:

logT -> Log. Also, shouldn't this function return AccT?

In addition, the original implementation had a TolerableValue wrapper to handle infinite values; please keep compatibility with that.

@xingfeng01 (Author) replied Apr 19, 2021:

Fixed. Returning T is sufficient.

return static_cast<T>(std::log(static_cast<AccT>(x)));
}
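The TolerableValue handling mentioned above clamps non-finite results so that log(0) does not propagate -inf through the loss. A minimal sketch of that idea (hypothetical helper and clamp threshold, not Paddle's exact code):

```cpp
#include <cmath>

// Sketch of a TolerableValue-style guard: clamp +/-inf (and NaN) to a large
// finite value so that log(0) does not poison downstream reductions.
double TolerableLog(double x) {
  double y = std::log(x);
  constexpr double kApproxInf = 1e20;  // hypothetical clamp threshold
  if (std::isinf(y)) return y > 0 ? kApproxInf : -kApproxInf;
  if (std::isnan(y)) return kApproxInf;
  return y;
}
```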

static inline int log2_ceil(int value) {
Contributor:

log2_ceil -> Log2Ceil. Also, this function comes from softmax_cudnn_op.cu; it is recommended to move it into a common header file.

Author:

Fixed. Suggest consolidating it together with softmax later.
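For reference, the Log2Ceil helper discussed here computes the smallest n with 2^n >= value; a minimal standalone version:

```cpp
// Smallest integer n such that (1 << n) >= value, for value >= 1.
// Used to round a dimension up to the next power of two for warp kernels.
static inline int Log2Ceil(int value) {
  int log2_value = 0;
  while ((1 << log2_value) < value) ++log2_value;
  return log2_value;
}
```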

}
}
}

/*
Core function of softmax with cross entropy forward hard label.
Idea is similar
Contributor:

With this comment, the reader cannot tell what it is similar to.

Author:

Fixed.


int first_batch = (blockDim.y * blockIdx.x + threadIdx.y) * kBatchSize;
int local_batches = batch_size - first_batch;
if (local_batches > kBatchSize) {
Contributor:

Can this case actually occur? Note that softmax_cudnn_op.cu does not have this if.

Author:

softmax and softmax_with_cross_entropy have been unified; WarpSoftmaxForward (L112) is controlled by a template parameter.
The previous softmax_cudnn implementation handled this differently, and this case could occur there; after unification the issue no longer exists.

}

// read data from global memory
VecT srcdata[kBatchSize][kIterationsV];
Contributor:

Note that softmax_cudnn_op.cu stores this array as AccT, so there may be a slight difference in precision.

Author:

softmax and softmax_with_cross_entropy have been unified; WarpSoftmaxForward (L112) is controlled by a template parameter. After unification this issue no longer exists.

auto mode = axis == rank - 1 ? CUDNN_SOFTMAX_MODE_INSTANCE
: CUDNN_SOFTMAX_MODE_CHANNEL;
PADDLE_ENFORCE_CUDA_SUCCESS(platform::dynload::cudnnSoftmaxForward(
handle, CUDNN_SOFTMAX_ACCURATE, mode,
Contributor:

If cudnn is called, it is better to use the log mode first: fusing softmax and cross_entropy is not only about speed, it also avoids the numerical precision loss of computing exp and then log.

Author:

Fixed. The log is computed first.
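The precision concern raised here can be seen numerically: computing log(softmax(x)) via exp-then-log underflows for small probabilities, while the fused log_softmax form (x - max) - log(sum(exp(x - max))) stays accurate. A small float demonstration (illustrative values):

```cpp
#include <cmath>

// Naive path: softmax first, then log. When exp(x - max) underflows to 0
// in float, the log becomes -inf.
float NaiveLogProb(float x, float max_v, float sum) {
  return std::log(std::exp(x - max_v) / sum);
}

// Fused path: log_softmax computed directly, no underflow.
float FusedLogProb(float x, float max_v, float sum) {
  return (x - max_v) - std::log(sum);
}
```

With two logits 110 apart, exp(-110.f) flushes to 0 in float, so the naive path yields -inf while the fused path returns the correct value near -110.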

SwitchWarpSoftmaxForwardHardLabel<T>(
blocks, threads, ctx.cuda_device_context().stream(), loss_data,
softmax_data, logits_data, labels_data, N, dim, dim, kDimLog2,
ignore_index);
Contributor:

softmax, log_softmax, and softmax_with_cross_entropy should ideally be launched through the same CUDA kernel, so the launch code is not repeated multiple times.

@xingfeng01 (Author) replied Apr 19, 2021:

Unified; WarpSoftmaxForward (L112) is controlled by a template parameter. Later it can fully replace the softmax_cudnn implementation and be moved to a header file.

#ifdef PADDLE_WITH_HIP
miopenTensorDescriptor_t desc_ = desc.descriptor<T>(layout, tensor_dims);
#else
cudnnTensorDescriptor_t desc_ = desc.descriptor<T>(layout, tensor_dims);
Contributor:

Since it is not a class member, the variable name does not need a trailing underscore.

Author:

Fixed.

}
}
}

namespace {
Contributor:

Why add a separate anonymous namespace here?

Author:

Fixed. This existed before; Git did not show it as a change.

@@ -886,9 +903,9 @@ class SoftmaxWithCrossEntropyCUDAKernel : public framework::OpKernel<T> {
} else {
auto* logits_data = logits->data<T>();
auto* labels_data = labels->data<int64_t>();
Contributor:

What does the numeric_stable_mode attribute represent? The previous branch calls cudnn's softmax.

Author:

It is an input parameter of the API; this PR does not modify the code in this branch.

#ifdef PADDLE_WITH_HIP
auto mode = axis == rank - 1 ? MIOPEN_SOFTMAX_MODE_INSTANCE
: MIOPEN_SOFTMAX_MODE_CHANNEL;
PADDLE_ENFORCE_CUDA_SUCCESS(platform::dynload::miopenSoftmaxForward(
Contributor:

miopenSoftmaxForward -> miopenSoftmaxForward_V2:

PADDLE_ENFORCE_CUDA_SUCCESS(platform::dynload::miopenSoftmaxForward_V2(
    handle, platform::CudnnDataType<T>::kOne(), desc_, logits_data,
    platform::CudnnDataType<T>::kZero(), desc_, softmax_data,
    MIOPEN_SOFTMAX_ACCURATE, mode));

https://rocmsoftwareplatform.github.io/MIOpen/doc/html/softmax.html?highlight=miopensoftmaxforward#miopensoftmaxforward-v2

Author:

Fixed.

@xingfeng01 xingfeng01 requested a review from Xreki April 26, 2021 02:38
@luotao1 (Contributor) left a comment:

The file mode does not need to change from 644 to 755.

@xingfeng01 (Author) commented Apr 26, 2021:

> The file mode does not need to change from 644 to 755.

Fixed. @luotao1

@xingfeng01 xingfeng01 requested a review from luotao1 April 26, 2021 03:30
@Xreki (Contributor) left a comment:

Some code-level optimization suggestions that can be addressed in follow-up PRs. This PR mainly confirms the computation logic; please also have the op owner review it.

@@ -15,44 +15,446 @@ limitations under the License. */
#include <hipcub/hipcub.hpp>
namespace cub = hipcub;
#endif
Contributor:

Do not change the file permission to 755.

Author:

Fixed.

if (labels[ids] == ignore_idx) {
loss[ids] = static_cast<T>(0.0);
} else {
loss[ids] = -Log(softmax[idx]);
Contributor:

// It also would ignore labels not in range(class_num).
if (idx_axis != labels_[idx_lbl]) {
} else {

Does this check in the original implementation still have a counterpart?

Author:

This code has the same logic as the original and handles ignore_index; it just merges the original two kernels into one.

Contributor:

This kernel merges the original HardLabelCrossEntropyFunctor (L400) and HardLabelCrossEntropyFunctorWithIgnoreIdx (L434), right? The difference between those two kernels is whether label == ignore_index is checked.

Their conditions were `if (idx_axis != labels_[idx_lbl])` (L419) and `if (idx_axis == labels_[idx_lbl] && idx_axis != ignore_idx_)` (L455); this kernel seems to be missing one of the conditions.

Author:

The original code launches n * dim * remain threads in total (L642), while the new code launches n * remain (L73), so one of the conditions is not needed.

@xingfeng01 (Author) replied May 6, 2021:

Changed; the updated code now matches the original logic.

Contributor:

> The original code launches n * dim * remain threads in total (L642), while the new code launches n * remain (L73), so one of the conditions is not needed.

So that is why the formula has no such condition. Adding some explanation in the code, or a more detailed PR description, would make it easier to understand and review.

shuffle api to compute max (sum) in one warp.
*/
template <typename T, typename VecT, typename AccT, int Log2Elements,
int CPMODE>
Contributor:

What is CP short for? Suggest defining an enum class instead:

enum class SoftmaxMode {
  kSoftmax,
  kLogSoftmax,
  kCrossEntropy,
};

Author:

Fixed.
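The suggested enum makes the compile-time mode dispatch self-documenting; a minimal sketch of how a kernel templated on the mode might branch (hypothetical, simplified to host code):

```cpp
#include <cmath>

enum class SoftmaxMode { kSoftmax, kLogSoftmax, kCrossEntropy };

// Templated on the mode: the branch is resolved at compile time, so one
// function body can serve softmax, log_softmax, and the fused loss without
// runtime cost.
template <SoftmaxMode mode>
double Finalize(double shifted_logit, double log_sum_exp) {
  if (mode == SoftmaxMode::kSoftmax) {
    return std::exp(shifted_logit - log_sum_exp);
  } else {
    // kLogSoftmax and kCrossEntropy both need the log form.
    return shifted_logit - log_sum_exp;
  }
}
```

Compared with an integer CPMODE, the enum names document the intent at every call site (`mode != SoftmaxMode::kSoftmax` instead of `CPMODE == 1 || CPMODE == 2`).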

constexpr int kIterations = kDimCeil / kWarpSize;
constexpr int kIterationsV =
(kIterations >= kVSize) ? (kIterations / kVSize) : 1;
constexpr int kBatchSize = (kDimCeil <= 128) ? 2 : 1;
Contributor:

Noting that this batch_size computation differs from the one in softmax.

Author:

This will be handled together when unifying with softmax.

#pragma unroll
for (int i = 0; i < kBatchSize; ++i) {
// it = 0
if (CPMODE == 1 || CPMODE == 2) {
Contributor:

mode != kSoftmax

Author:

Fixed.
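The warp max/sum reduction mentioned in the kernel's comment uses the CUDA shuffle API in a butterfly pattern; a host-side emulation of that pattern over a 32-element "warp" (illustrative only, not device code):

```cpp
#include <algorithm>
#include <vector>

// Emulates the butterfly reduction a CUDA warp performs with
// __shfl_xor_sync: after log2(32) = 5 steps, every lane holds the max.
std::vector<float> WarpReduceMaxEmulated(std::vector<float> lanes) {
  const int kWarpSize = 32;
  for (int offset = kWarpSize / 2; offset > 0; offset /= 2) {
    std::vector<float> next = lanes;
    for (int lane = 0; lane < kWarpSize; ++lane) {
      // Each lane combines with its XOR partner, as the shuffle would.
      next[lane] = std::max(lanes[lane], lanes[lane ^ offset]);
    }
    lanes = next;
  }
  return lanes;
}
```

Replacing `std::max` with `+` gives the corresponding warp sum; on the device the exchange happens in registers, with no shared memory.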

auto stream = ctx.cuda_device_context().stream();

constexpr int max_dim = 320;
constexpr int warps_per_block = 4;
Contributor:

L382 can be deleted.

Author:

Deleted.

}
}
}

/*
Hard label cross entropy with exp.
Contributor:

Add a comment documenting the function's inputs and outputs. Here softmax serves as both input and output, and the input data is actually log_softmax; without a comment, this is only clear after reading the call site.

Author:

Comment added.

// thread ids compute loss[ids] using softmax[idx]
if (idx < n * dim * d) {
if (idx_dim == labels[ids]) {
if (labels[ids] == ignore_idx) {
Contributor:

I see the original kernels all check the label value range; should this kernel add that check as well?

Author:

This code has the same logic as the original and handles ignore_index; it just merges the original two kernels into one.

Contributor:

PADDLE_ENFORCE(labels_[idx_lbl] >= 0 && labels_[idx_lbl] < d_ ||
                   labels_[idx_lbl] == ignore_idx_,
               "The value of label[%ld] expected >= 0 and < %ld, or == %d,"
               "but got %ld. Please check input value.",
               idx_lbl, d_, ignore_idx_, labels_[idx_lbl]);

As in the code above, the check should be performed here as well.

@xingfeng01 (Author) replied Apr 26, 2021:

I suggest not adding this error check: it consumes compute resources, and if the input is wrong, returning some output is still reasonable.
The original code uses labels with similar logic in roughly 10 places, and only 1 of them has PADDLE_ENFORCE.

Contributor:

> I suggest not adding this error check: it consumes compute resources, and if the input is wrong, returning some output is still reasonable. The original code uses labels with similar logic in roughly 10 places, and only 1 of them has PADDLE_ENFORCE.

The main consideration should be whether this function needs the check, not whether the original code has it.
For ops like this, if user data is bad and never checked, the cost is that the user trains for a long time, finds the model does not converge, and then spends a long time tracing the problem back to the data. Adding the check has little overhead.

Author:

OK, I will handle this in a follow-up PR; this PR keeps the behavior consistent with the original code.
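The range check discussed above reduces to a simple predicate; a host-side sketch of the validity condition (the kernel itself would enforce it via PADDLE_ENFORCE, as quoted earlier):

```cpp
#include <cstdint>

// Returns true when the label is valid: either in [0, class_num)
// or exactly equal to ignore_index.
bool LabelInRange(int64_t label, int64_t class_num, int64_t ignore_index) {
  return (label >= 0 && label < class_num) || label == ignore_index;
}
```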

CrossEntropyHardLabel<
T><<<blocks, threads, 0, context.cuda_device_context().stream()>>>(
loss_data, logits_data, labels_data, n, axis_dim, d / axis_dim,
ignore_index);
Contributor:

This branch is not covered by the op benchmark; it would be best to add a configuration to test its performance.

Author:

OK, will add that later.

*/
template <typename T>
static void SoftmaxWithCrossEntropyHardLabel(
const framework::ExecutionContext& ctx, int rank, int axis,
Contributor:

Since the inputs and outputs are not fetched from the context, it is better to pass platform::CUDADeviceContext directly.

Author:

Fixed.

@xingfeng01 (Author):

> Some code-level optimization suggestions that can be addressed in follow-up PRs. This PR mainly confirms the computation logic; please also have the op owner review it.

The op owner has already reviewed and replied OK offline.

@xingfeng01 xingfeng01 requested a review from Xreki May 6, 2021 01:28
@paddle-bot-old commented May 8, 2021:

Sorry to inform you that 0029c65's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

@Xreki (Contributor) left a comment:

This kind of op is precision-sensitive; it is recommended to run a model to verify convergence.


@xingfeng01 (Author):

Verified locally: the deeplabv3 and amp models pass.

@xingfeng01 xingfeng01 requested a review from Xreki May 21, 2021 05:39
@Xreki (Contributor) left a comment:

LGTM

@Xreki Xreki merged commit 7be6191 into PaddlePaddle:develop May 21, 2021
Xreki added a commit that referenced this pull request Jun 4, 2021
Xreki added a commit that referenced this pull request Jun 5, 2021