CudnnNormConvolution is no longer supported on NVIDIA Hopper GPUs #48203
Your PR was submitted successfully. Thank you for contributing to the open-source project!
LGTM
@@ -45,6 +45,14 @@ struct NormConvolutionArgs {
int stride,
int dilation,
int group) {
PADDLE_ENFORCE_LT(
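A small question: to further optimize ResNet50 performance, we originally used the cudnnFusedOpsPlan_t-related interfaces to implement several fused operators, with the code in cudnn_bn_stats_finalize.cu.h, cudnn_norm_conv.cu.h, and cudnn_scale_bias_add_relu.cu.h, and the shared classes in cudnn_fusion_helper.h. Is cudnn_norm_conv.cu.h the only one that is no longer supported?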
So far on H100, only the tests related to cudnn_norm_conv.cu.h fail; the others work for now.
…ddlePaddle#48203)
* Skip tests that use fused_ops on H100
* Add error message to FusedOps on H100
* Reduce squeeze2_matmul_fuse_pass, flatten tests time (#47098)
  * Add missing fp32 config and reduce the testing combination
  * Reduce trt matmul pass test max examples
* Loosen TRT fp16 tests tolerance (#47100)
* Loosen TRT half test tolerance to 1e-3 (#47101)
* Loosen TRT half test tolerance to 1e-3 (#47106)
* Update distributed_strategy.proto (#46531)
* Close popen pipe after use (#47053)
* Add launch_bounds (#47285)
* Fix TRT UT failures (#47488)
* Format cherry-picked commits
* CudnnNormConvolution is no longer supported on NVIDIA Hopper GPUs (#48203)
  * Skip tests that use fused_ops on H100
  * Add error message to FusedOps on H100

Co-authored-by: Shijie <505749828@qq.com>
Co-authored-by: Leo Chen <39020268+leo0519@users.noreply.github.com>
Co-authored-by: Tian Zheng <tizheng@nvidia.com>
PR types
Bug fixes
PR changes
OPs
Describe
CudnnNormConvolution relies on the cuDNN fused ops API, which is deprecated and no longer supported on Hopper GPUs (H100 and later). We add an explicit error message for when the user tries to use this op on a Hopper device.
We also fix the related unit tests on the new hardware by skipping the tests that use fused ops on H100.
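The device check and the test skip described above can be sketched in C++ as follows. This is a minimal, self-contained sketch under stated assumptions, not the code from this PR: the helper names (EncodeComputeCapability, FusedOpsUnsupported, CheckNormConvSupported, ShouldSkipFusedOpsTest) are hypothetical, compute capability is assumed to be encoded as major * 10 + minor (so H100, compute capability 9.0, maps to 90), and a plain std::runtime_error stands in for PADDLE_ENFORCE_LT. It shows one predicate serving both the op-side guard and the test-side skip.

```cpp
#include <stdexcept>
#include <string>

// Assumption: compute capability is encoded as major * 10 + minor,
// so Hopper (sm_90) maps to 90 and A100 (sm_80) maps to 80.
inline int EncodeComputeCapability(int major, int minor) {
  return major * 10 + minor;
}

// Hypothetical predicate: true on Hopper (compute capability 9.x) and newer,
// where the cuDNN fused ops API used by CudnnNormConvolution is unavailable.
inline bool FusedOpsUnsupported(int major, int minor) {
  return EncodeComputeCapability(major, minor) >= 90;
}

// Hypothetical guard with the same intent as the PADDLE_ENFORCE_LT check in
// NormConvolutionArgs: fail early with a readable message when the device
// cannot run the fused ops.
inline void CheckNormConvSupported(int major, int minor) {
  if (FusedOpsUnsupported(major, minor)) {
    throw std::runtime_error(
        "CudnnNormConvolution relies on cuDNN fused ops, which are not "
        "supported on GPUs with compute capability " +
        std::to_string(major) + "." + std::to_string(minor) +
        " (Hopper or later).");
  }
}

// Hypothetical test-side helper: a fused-ops unit test would query this
// and skip itself instead of failing on H100.
inline bool ShouldSkipFusedOpsTest(int major, int minor) {
  return FusedOpsUnsupported(major, minor);
}
```

For example, CheckNormConvSupported(8, 0) returns normally on an A100, while CheckNormConvSupported(9, 0) throws on an H100. Folding major and minor into one integer keeps the guard to a single "less than 90" comparison, which is the shape a PADDLE_ENFORCE_LT-style check expects.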
Note: This is a duplicate of #47089, which was accidentally closed and could not be reopened.