Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HIP kernels may launch with non-uniform block size for backward compatibility #2307

Merged
merged 6 commits into from
Aug 28, 2023

Conversation

junliume
Copy link
Collaborator

@junliume junliume commented Aug 8, 2023

https://ontrack-internal.amd.com/browse/SWDEV-413293
https://reviews.llvm.org/D155213

From ROCm 5.7 the compiler patch above will assume uniform block sizes; however, for backward compatibility we need to add this flag in MIOpen.

Or else we will observe:
Error log :

MIOpen Error: /long_pathname_so_that_rpms_can_package_the_debug_info/data/driver/MLOpen/src/hipoc/hipoc_kernel.cpp:104: Failed to launch kernel: invalid argument

@junliume
Copy link
Collaborator Author

junliume commented Aug 8, 2023

@JehandadKhan and @atamazov :

  • do we need to make changes to src/hip/hip_build_utils.cpp ?
  • Will HIP_COMPILER_FLAGS be applied to hipRTC ?
  • Will we need to make suitable changes to OCL kernels ?

Thanks!

JehandadKhan
JehandadKhan previously approved these changes Aug 8, 2023
@junliume
Copy link
Collaborator Author

junliume commented Aug 9, 2023

@atamazov @JehandadKhan need your opinion here:
So in MIOpen building process, this only adds to the compilation command:

-DHIP_COMPILER_FLAGS=" -mllvm -amdgpu-early-inline-all=true -mllvm -amdgpu-function-calls=false -x hip -isystem \
/opt/rocm/include  -D__HIP_PLATFORM_HCC__=1 -D__HIP_PLATFORM_AMD__=1  -isystem /opt/rocm/include --hip-link  \
-fno-offload-uniform-block " 

Hence this is not directly effective since the actual compilation command looks like:

/opt/rocm/llvm/bin/clang++ -DHIP_COMPILER_FLAGS=" ${lots of options} " $(lots of other options}

and if I move -fno-offload-uniform-block out of the quotes (i.e. from HIP_COMPILER_FLAGS to $(lots of other options}), then I will meet

clang++: error: unknown argument: '-fno-offload-uniform-block'

I think we need a very precise version of HIP from which -fno-offload-uniform-block is recognized. At the same time, I wonder if HIP_COMPILER_FLAGS is effective, and if yes, when and where is it applied to?

Update: effective HIP_FLAT_VERSION 500723302

@atamazov
Copy link
Contributor

atamazov commented Aug 9, 2023

@junliume @JehandadKhan

From ROCm 5.7 the compiler patch above will assum uniform block sizes; however, for backward compatibility we need to add this flag in MIOpen.

This would switch OFF some optimization in compiler that was ON before. As a side effect, we can get performance drops.

Therefore this new flag should be applied only to the kernels that actually use non-uniform grids.

Note that old .kdb files may contain kernels built with assumption that block sizes are uniform. AFAICS these kernels may work incorrectly when the grid is non-uniform (and that is why compiler team introduced the option). Therefore I think we need to re-generate .kdb after all the other fixes are done.

Copy link
Contributor

@atamazov atamazov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very well done technically, but there is a danger of perf drop, see #2307 (comment). We can give it a try but some careful testing of performance is needed.

And this is a workaround indeed, because I do not believe we want to switch off some compiler optimizations permanently ;)

CMakeLists.txt Outdated Show resolved Hide resolved
Co-authored-by: Artem Tamazov <artem.tamazov@gmail.com>
@atamazov
Copy link
Contributor

atamazov commented Aug 12, 2023

🌀 Performance testing results

Preconditions:

  • Navi21 (gfx1030/36)
  • ROCm 5.7 RC docker compiler__debug, Ubuntu 20.04.5 LTS
  • BUILD_DEV=On, Release build, additional CMake vars:
    • -DMIOPEN_USE_COMPOSABLEKERNEL=Off
    • -DMIOPEN_ENABLE_AI_KERNEL_TUNING=On
    • -DMIOPEN_ENABLE_AI_IMMED_MODE_FALLBACK=On
    • -DMIOPEN_USE_MLIR=Off
  • branch SWDEV-406592 b3d5164 vs. develop 331fc10 with patches necessary to fix docker problems:
  • Env settings:
    • export MIOPEN_DEBUG_OPENCL_CONVOLUTIONS=0
    • export MIOPEN_DEBUG_GCN_ASM_KERNELS=0
    • export MIOPEN_DEBUG_HIP_KERNELS=1
    • export MIOPEN_DEBUG_CONV_GEMM=0
  • 93 known FP32 convolution configs
  • 93 known FP16 convolution configs

Tested modes:

  • (0) Immediate mode with default (system) find-db
  • (1) Find Mode (DYNAMIC_HYBRID)
  • (2) Find Mode (NORMAL)
  • (3) Immediate mode with user find-db filled by step (2).

Bottom line: No performance or correctness regressions. Detailed logs & csv files are available upon request.

Verdict: 🟢 GO!

@junliume junliume modified the milestones: ROCm 5.7, ROCm 6.0 Aug 15, 2023
@junliume junliume changed the title [ROCm 5.7] HIP kernels may launch with non-uniform block size for backward compatibility HIP kernels may launch with non-uniform block size for backward compatibility Aug 24, 2023
@junliume
Copy link
Collaborator Author

Ping @atamazov and @JehandadKhan for review. This feature is deferred but we may need it anyway.
P.S. If we have some time, maybe we should not use this backward compatibility option, but instead fix all the places where non-uniform grid is an assumption.

@atamazov
Copy link
Contributor

@junliume Reviewed and tested a while ago. Please go ahead.

@JehandadKhan
Copy link
Collaborator

P.S. If we have some time, maybe we should not use this backward compatibility option, but instead fix all the places where non-uniform grid is an assumption.

I like this approach better, fix the actual issue instead of introducing more workarounds. Looking at the invoker code we can easily determine which kernels rely on it and fix them. Furthermore, add a check in the kernel launch code to ensure that the grid size is symmetric.

@JehandadKhan
Copy link
Collaborator

Perhaps we can merge this PR and review the solvers post-merge and ensure that all are symmetric, once done we can remove the workaround.

@junliume

@junliume
Copy link
Collaborator Author

@JehandadKhan :
@shurale-nkn suggested the following way to reveal which kernels are not following the pattern:
"we can insert check in AddKernel(..) function in MIOpen\src\hip\handlehip.cpp . If (grid % work_group_size) !=0 throw error"

@junliume junliume merged commit 0ca134d into develop Aug 28, 2023
@JehandadKhan
Copy link
Collaborator

@junliume and @shurale-nkn I like that idea!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants