fix softmax, math unary/binary kernel int overflow #8472
dx[i] = __hmul(dy[row_id], __hsub(prob[i], __float2half(1.0)));
} else {
dx[i] = __hmul(dy[row_id], prob[i]);
if (elem_cnt >= 2147483647) {
I think this check would be better placed outside the kernel: add a template parameter IDX and do the DispatchIndexType outside.
if (elem_cnt < GetMaxVal<int32_t>() / 2) {
}
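"I think this check would be better placed outside the kernel: add a template parameter IDX and do the DispatchIndexType outside."
Like this on the outside?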
if (elem_cnt < GetMaxVal<int32_t>() / 2) {
ComputeEntropyGpuHalf<K, int32_t><<<BlocksNum4ThreadsNum(num_instances), kCudaThreadsNumPerBlock, 0,
stream->As<ep::CudaStream>()->cuda_stream()>>>(
num_instances, num_classes, depth, lower_bound, reinterpret_cast<const half*>(x), labels,
reinterpret_cast<half*>(y));
} else {
ComputeEntropyGpuHalf<K, int64_t><<<BlocksNum4ThreadsNum(num_instances), kCudaThreadsNumPerBlock, 0,
stream->As<ep::CudaStream>()->cuda_stream()>>>(
num_instances, num_classes, depth, lower_bound, reinterpret_cast<const half*>(x), labels,
reinterpret_cast<half*>(y));
}
Yes, that works.
if (elem_cnt < GetMaxVal<int32_t>() / 2) {
Why divide by 2 here? @guo-ran
This has been changed.
View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/8472/
CI failed when running job: cuda-misc. PR label automerge has been removed
CI failed when running job: cpu-module. PR label automerge has been removed
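Fix possible int32 overflow in the softmax and math unary / binary CUDA kernels.
For kernels that perform integer division on i inside CUDA_1D_KERNEL_LOOP, an if check selects between an int32 and an int64 branch, so that performance is not affected in the common case where int32 is sufficient.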
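The index-type dispatch described here can be illustrated with a minimal host-side C++ sketch. It is not the code from this PR: `ScaleByRow`, `FitsInInt32`, and `DispatchScaleByRow` are hypothetical names, the loop is a plain CPU loop standing in for a kernel body, and the threshold used is an assumption (a real grid-stride loop may need a more conservative bound, because the index is advanced by the grid stride before it is compared against `elem_cnt`).

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical stand-in for a kernel body. IDX is the loop index type.
// Each element index i is mapped to its row with an integer division,
// which is the operation that is cheaper with a 32-bit index on GPUs.
template<typename IDX>
void ScaleByRow(int64_t elem_cnt, int64_t num_classes, const float* dy,
                const float* prob, float* dx) {
  assert(num_classes > 0);
  const IDX n = static_cast<IDX>(elem_cnt);
  const IDX cols = static_cast<IDX>(num_classes);
  for (IDX i = 0; i < n; ++i) {
    const IDX row_id = i / cols;  // integer division on the index
    dx[i] = dy[row_id] * prob[i];
  }
}

// Hypothetical helper: true when elem_cnt (and so every index below it)
// is representable as int32_t. The exact threshold is an assumption.
inline bool FitsInInt32(int64_t elem_cnt) {
  return elem_cnt <= std::numeric_limits<int32_t>::max();
}

// Choose the index type once, outside the loop, instead of branching
// per element. Returns the index width that was selected (32 or 64).
inline int DispatchScaleByRow(int64_t elem_cnt, int64_t num_classes,
                              const float* dy, const float* prob, float* dx) {
  if (FitsInInt32(elem_cnt)) {
    ScaleByRow<int32_t>(elem_cnt, num_classes, dy, prob, dx);
    return 32;
  }
  ScaleByRow<int64_t>(elem_cnt, num_classes, dy, prob, dx);
  return 64;
}
```

Selecting the index type at the call site keeps the fast int32 path free of per-element branching, while inputs with more elements than int32 can index still get a correct int64 path.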