perf:update preprocess #29
Signed-off-by: Manato HIRABAYASHI <manato.hirabayashi@tier4.jp>
@manato Thank you for your great PR! While building, I got the following error. Does this depend on my environment?
ktro2828@ktro2828-desktop ~/myWorkspace/TensorRT-MTR [perf/update_preprocess] $ cmake --build build -j${nproc}
Consolidate compiler generated dependencies of target test_intention_point
Consolidate compiler generated dependencies of target test_polyline
Consolidate compiler generated dependencies of target test_agent
[ 4%] Building NVCC (Device) object CMakeFiles/custom_kernel.dir/lib/src/preprocess/custom_kernel_generated_polyline_preprocess_kernel.cu.o
Consolidate compiler generated dependencies of target custom_plugin
[ 12%] Built target test_intention_point
[ 20%] Built target test_agent
[ 29%] Built target test_polyline
[ 66%] Built target custom_plugin
/home/ktro2828/myWorkspace/TensorRT-MTR/lib/src/preprocess/polyline_preprocess_kernel.cu(48): error: function "<unnamed>::decomposer_t::operator()" returns incomplete type "cuda::std::__4::tuple<float &>"
__attribute__((device)) ::cuda::std::tuple<float&> operator()(index_and_value_t& key) const
^
/home/ktro2828/myWorkspace/TensorRT-MTR/lib/src/preprocess/polyline_preprocess_kernel.cu(52): error: list-initialization of an object type "cuda::std::__4::tuple<float &>" is not allowed because the type is incomplete
return {key.value};
^
2 errors detected in the compilation of "/home/ktro2828/myWorkspace/TensorRT-MTR/lib/src/preprocess/polyline_preprocess_kernel.cu".
CMake Error at custom_kernel_generated_polyline_preprocess_kernel.cu.o.cmake:280 (message):
Error generating file
/home/ktro2828/myWorkspace/TensorRT-MTR/build/CMakeFiles/custom_kernel.dir/lib/src/preprocess/./custom_kernel_generated_polyline_preprocess_kernel.cu.o
gmake[2]: *** [CMakeFiles/custom_kernel.dir/build.make:949: CMakeFiles/custom_kernel.dir/lib/src/preprocess/custom_kernel_generated_polyline_preprocess_kernel.cu.o] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:120: CMakeFiles/custom_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
My CUDA version is 12.1:
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0
…r CUDA ver. Signed-off-by: Manato HIRABAYASHI <manato.hirabayashi@tier4.jp>
Thank you very much for checking this PR and pointing this out. As far as I can tell, CUB does not support sorting user-defined data types as of CUDA 12.1. Fortunately, the overload that performs a key-value sort works for this case, so I switched the code to that overload (at 073a877); it is supported by both CUDA 12.1 and CUDA 12.3. I'd appreciate it if you could try again. Thanks!
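To illustrate the difference between sorting a user-defined `{index, value}` type and a key-value sort, here is a minimal host-side C++ sketch. It is a CPU stand-in for the idea only, not the CUB call used in this PR; the function name `sortPairsByKey` and the choice of `float` keys with `int` values are assumptions made for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (not part of the PR): sorts `keys` in ascending order
// and applies the same permutation to `values`, so no struct type and no
// decomposer are needed. Assumes keys.size() == values.size().
void sortPairsByKey(std::vector<float>& keys, std::vector<int>& values) {
  // Build the permutation that sorts the keys.
  std::vector<std::size_t> order(keys.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::stable_sort(order.begin(), order.end(),
                   [&keys](std::size_t a, std::size_t b) { return keys[a] < keys[b]; });

  // Apply the permutation to both arrays.
  std::vector<float> sorted_keys(keys.size());
  std::vector<int> sorted_values(values.size());
  for (std::size_t i = 0; i < order.size(); ++i) {
    sorted_keys[i] = keys[order[i]];
    sorted_values[i] = values[order[i]];
  }
  keys.swap(sorted_keys);
  values.swap(sorted_values);
}
```

With the keys holding the values to rank and the values holding the original indices, the first k entries of `values` after the call are the indices of the k smallest keys.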
Thanks for the update! I confirmed that the build passes and the code performs well on CUDA 12.1 through 12.3.
I left some comments and questions, but the code looks good to me!
Signed-off-by: Manato HIRABAYASHI <manato.hirabayashi@tier4.jp>
@manato Thank you for your contribution! I will merge this PR.
Description
This PR accelerates top-k selection to resolve issue #18.
Tests performed
Effects on system behavior
Most modern GPUs limit the shared memory that can be assigned to one thread block; the upper limit is 48KB. The current implementation fixes the number of items each CUDA thread can handle, and that number is derived from the shared memory limit. For this reason, if the value of L exceeds 256*24=6144, the function polylinePreprocessWithTopkLauncher returns an invalid value error.
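The arithmetic behind that limit can be sketched on the host side as follows. The 48KB limit and the figure 256*24=6144 come from the description above; the 8 bytes per item (a `float` key plus a 32-bit index) is an assumption that happens to reproduce those numbers, and all constant and function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// From the description: shared memory available to one thread block.
constexpr std::size_t kSharedMemLimitBytes = 48 * 1024;
// From the description: 256 threads per block.
constexpr std::size_t kThreadsPerBlock = 256;
// ASSUMPTION: each item is a float key plus a 32-bit index (8 bytes).
constexpr std::size_t kBytesPerItem = sizeof(float) + sizeof(std::int32_t);

// Items each thread can hold when the whole block's items must fit in shared memory.
constexpr std::size_t kItemsPerThread =
    kSharedMemLimitBytes / (kThreadsPerBlock * kBytesPerItem);
// Largest L that one block can process.
constexpr std::size_t kMaxItems = kThreadsPerBlock * kItemsPerThread;

static_assert(kItemsPerThread == 24, "matches the 24 items per thread in the description");
static_assert(kMaxItems == 6144, "matches 256*24=6144 in the description");

// Hypothetical pre-check a caller could run before invoking the launcher.
constexpr bool fitsInOneBlock(std::size_t num_items) {
  return num_items <= kMaxItems;
}
```

A caller could use a check like `fitsInOneBlock(L)` to reject oversized inputs early with a clearer message, instead of relying on the invalid value error from the launcher.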