Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to limit number of threads per group in algorithms? (2024.11.18.) #1936

Open
krasznaa opened this issue Nov 18, 2024 · 8 comments
Open

Comments

@krasznaa
Copy link

We are finally starting to use oneDPL in earnest in the traccc project, now that #1060 is not an issue anymore.

Now I ran into a different, pretty interesting issue. (In the sense that I didn't see such an issue before...) During a unit test, I get this sort of a failure:

[ RUN      ] SYCLCkfToyDetectorValidation/CkfToyDetectorTests.Run/1
WARNING: No entries in volume finder

Detector check: OK
unknown file: Failure
C++ exception with description "Exceeded the number of registers available on the hardware.
        The number registers per work-group cannot exceed 65536 for this kernel on this device.
        The kernel uses 245 registers per work-item for a total of 512 work-items per work-group.
 -54 (PI_ERROR_INVALID_WORK_GROUP_SIZE)" thrown in the test body.

[  FAILED  ] SYCLCkfToyDetectorValidation/CkfToyDetectorTests.Run/1, where GetParam() = ("toy_n_particles_10000", { 0, 0, 0 }, { 0, 0, 0 }, { 1, 100 }, { -4, 4 }, { -3.14159274, 3.14159274 }, 12-byte object <0D-00 00-00 6C-63 D8-3D 00-00 80-BF>, 10000, 1, false) (4509 ms)

We indeed use some types in our code that are very register hungry. This is an issue that we're actively working on. But to my surprise, this failure didn't come from one of our own kernels, but from this oneDPL operation:

                // Sort the keys and values.
                vecmem::device_vector<device::sort_key> keys_device(
                    keys_buffer);
                vecmem::device_vector<unsigned int> param_ids_device(
                    param_ids_buffer);
                oneapi::dpl::sort_by_key(policy, keys_device.begin(),
                                         keys_device.end(),
                                         param_ids_device.begin());

This is pretty surprising, since the data types being worked on by this algorithm launch are pretty simple... 😕 So I'm quite surprised that with any launch parameters we would run into this sort of an issue.

But it's definitely a possibility that we may need to put limits on the launch parameters that oneDPL could use. The execution policy received by the functions would seem like the perfect place to store such limits in. 🤔 But I don't see an option at least in oneapi::dpl::execution::device_policy to specify such launch limits.

Is there a way to tell the algorithms to not launch more than N threads per block/group?

Cheers,
Attila

P.S. In case it may be useful, this is how this test of mine gets to the problem:

Thread 1 "traccc_test_syc" hit Catchpoint 1 (exception thrown), 0x00007ffff523d35a in __cxa_throw () from /lib/x86_64-linux-gnu/libstdc++.so.6
(gdb) bt
#0  0x00007ffff523d35a in __cxa_throw () from /lib/x86_64-linux-gnu/libstdc++.so.6
#1  0x00007ffff4f692e8 in sycl::_V1::detail::enqueue_kernel_launch::handleInvalidWorkGroupSize(sycl::_V1::detail::device_impl const&, _pi_kernel*, sycl::_V1::detail::NDRDescT const&) ()
   from /home/krasznaa/software/intel/oneapi-2024.2.1/compiler/2024.2/lib/libsycl.so.7
#2  0x00007ffff4f6c088 in sycl::_V1::detail::enqueue_kernel_launch::handleErrorOrWarning(_pi_result, sycl::_V1::detail::device_impl const&, _pi_kernel*, sycl::_V1::detail::NDRDescT const&) ()
   from /home/krasznaa/software/intel/oneapi-2024.2.1/compiler/2024.2/lib/libsycl.so.7
#3  0x00007ffff5055b1d in sycl::_V1::detail::enqueueImpKernel(std::shared_ptr<sycl::_V1::detail::queue_impl> const&, sycl::_V1::detail::NDRDescT&, std::vector<sycl::_V1::detail::ArgDesc, std::allocator<sycl::_V1::detail::ArgDesc> >&, std::shared_ptr<sycl::_V1::detail::kernel_bundle_impl> const&, std::shared_ptr<sycl::_V1::detail::kernel_impl> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<_pi_event*, std::allocator<_pi_event*> >&, std::shared_ptr<sycl::_V1::detail::event_impl> const&, std::function<void* (sycl::_V1::detail::AccessorImplHost*)> const&, _pi_kernel_cache_config, bool) () from /home/krasznaa/software/intel/oneapi-2024.2.1/compiler/2024.2/lib/libsycl.so.7
#4  0x00007ffff505b2ab in sycl::_V1::detail::ExecCGCommand::enqueueImpQueue() () from /home/krasznaa/software/intel/oneapi-2024.2.1/compiler/2024.2/lib/libsycl.so.7
#5  0x00007ffff50425ff in sycl::_V1::detail::Command::enqueue(sycl::_V1::detail::EnqueueResultT&, sycl::_V1::detail::BlockingT, std::vector<sycl::_V1::detail::Command*, std::allocator<sycl::_V1::detail::Command*> >&) ()
   from /home/krasznaa/software/intel/oneapi-2024.2.1/compiler/2024.2/lib/libsycl.so.7
#6  0x00007ffff5070373 in sycl::_V1::detail::Scheduler::GraphProcessor::enqueueCommand(sycl::_V1::detail::Command*, std::shared_lock<std::shared_timed_mutex>&, sycl::_V1::detail::EnqueueResultT&, std::vector<sycl::_V1::detail::Command*, std::allocator<sycl::_V1::detail::Command*> >&, sycl::_V1::detail::Command*, sycl::_V1::detail::BlockingT) () from /home/krasznaa/software/intel/oneapi-2024.2.1/compiler/2024.2/lib/libsycl.so.7
#7  0x00007ffff506acf4 in sycl::_V1::detail::Scheduler::enqueueCommandForCG(std::shared_ptr<sycl::_V1::detail::event_impl>, std::vector<sycl::_V1::detail::Command*, std::allocator<sycl::_V1::detail::Command*> >&, sycl::_V1::detail::BlockingT) ()
   from /home/krasznaa/software/intel/oneapi-2024.2.1/compiler/2024.2/lib/libsycl.so.7
#8  0x00007ffff506a4f1 in sycl::_V1::detail::Scheduler::addCG(std::unique_ptr<sycl::_V1::detail::CG, std::default_delete<sycl::_V1::detail::CG> >, std::shared_ptr<sycl::_V1::detail::queue_impl> const&, _pi_ext_command_buffer*, std::vector<unsigned int, std::allocator<unsigned int> > const&) () from /home/krasznaa/software/intel/oneapi-2024.2.1/compiler/2024.2/lib/libsycl.so.7
#9  0x00007ffff50a43ea in sycl::_V1::handler::finalize() () from /home/krasznaa/software/intel/oneapi-2024.2.1/compiler/2024.2/lib/libsycl.so.7
#10 0x00007ffff502689d in void sycl::_V1::detail::queue_impl::finalizeHandler<sycl::_V1::handler>(sycl::_V1::handler&, sycl::_V1::event&) () from /home/krasznaa/software/intel/oneapi-2024.2.1/compiler/2024.2/lib/libsycl.so.7
#11 0x00007ffff50263b9 in sycl::_V1::detail::queue_impl::submit_impl(std::function<void (sycl::_V1::handler&)> const&, std::shared_ptr<sycl::_V1::detail::queue_impl> const&, std::shared_ptr<sycl::_V1::detail::queue_impl> const&, std::shared_ptr<sycl::_V1::detail::queue_impl> const&, sycl::_V1::detail::code_location const&, std::function<void (bool, bool, sycl::_V1::event&)> const*) () from /home/krasznaa/software/intel/oneapi-2024.2.1/compiler/2024.2/lib/libsycl.so.7
#12 0x00007ffff5025da0 in sycl::_V1::detail::queue_impl::submit(std::function<void (sycl::_V1::handler&)> const&, std::shared_ptr<sycl::_V1::detail::queue_impl> const&, sycl::_V1::detail::code_location const&, std::function<void (bool, bool, sycl::_V1::event&)> const*) () from /home/krasznaa/software/intel/oneapi-2024.2.1/compiler/2024.2/lib/libsycl.so.7
#13 0x00007ffff50da1a5 in sycl::_V1::queue::submit_impl(std::function<void (sycl::_V1::handler&)>, sycl::_V1::detail::code_location const&) () from /home/krasznaa/software/intel/oneapi-2024.2.1/compiler/2024.2/lib/libsycl.so.7
#14 0x00007ffff60b8391 in sycl::_V1::queue::submit<oneapi::dpl::__par_backend_hetero::__subgroup_radix_sort<oneapi::dpl::execution::__dpl::DefaultKernelName, (unsigned short)512, (unsigned short)32, 4u, true, (unsigned short)16>::__one_group_submitter<oneapi::dpl::__par_backend_hetero::__internal::__optional_kernel_name<> >::operator()<oneapi::dpl::__ranges::zip_view<oneapi::dpl::__ranges::guard_view<float*>, oneapi::dpl::__ranges::guard_view<unsigned int*> >, oneapi::dpl::__internal::__pattern_sort_by_key<oneapi::dpl::__internal::__device_backend_tag, oneapi::dpl::execution::__dpl::device_policy<oneapi::dpl::execution::__dpl::DefaultKernelName>&, float*, unsigned int*, oneapi::dpl::__internal::__pstl_less>(oneapi::dpl::__internal::__hetero_tag<oneapi::dpl::__internal::__device_backend_tag>, oneapi::dpl::execution::__dpl::device_policy<oneapi::dpl::execution::__dpl::DefaultKernelName>&, float*, float*, unsigned int*, oneapi::dpl::__internal::__pstl_less)::{lambda(auto:1 const&)#1}, std::integral_constant<bool, false>, std::integral_constant<bool, true> >(sycl::_V1::queue, oneapi::dpl::__ranges::zip_view<oneapi::dpl::__ranges::guard_view<float*>, oneapi::dpl::__ranges::guard_view<unsigned int*> >&&, oneapi::dpl::__internal::__pattern_sort_by_key<oneapi::dpl::__internal::__device_backend_tag, oneapi::dpl::execution::__dpl::device_policy<oneapi::dpl::execution::__dpl::DefaultKernelName>&, float*, unsigned int*, oneapi::dpl::__internal::__pstl_less>(oneapi::dpl::__internal::__hetero_tag<oneapi::dpl::__internal::__device_backend_tag>, oneapi::dpl::execution::__dpl::device_policy<oneapi::dpl::execution::__dpl::DefaultKernelName>&, float*, float*, unsigned int*, oneapi::dpl::__internal::__pstl_less)::{lambda(auto:1 const&)#1}, std::integral_constant<bool, false>, std::integral_constant<bool, true>)::{lambda(sycl::_V1::handler&)#1}>(oneapi::dpl::__par_backend_hetero::__subgroup_radix_sort<oneapi::dpl::execution::__dpl::DefaultKernelName, (unsigned short)512, (unsigned short)32, 4u, true, (unsigned short)16>::__one_group_submitter<oneapi::dpl::__par_backend_hetero::__internal::__optional_kernel_name<> >::operator()<oneapi::dpl::__ranges::zip_view<oneapi::dpl::__ranges::guard_view<float*>, oneapi::dpl::__ranges::guard_view<unsigned int*> >, oneapi::dpl::__internal::__pattern_sort_by_key<oneapi::dpl::__internal::__device_backend_tag, oneapi::dpl::execution::__dpl::device_policy<oneapi::dpl::execution::__dpl::DefaultKernelName>&, float*, unsigned int*, oneapi::dpl::__internal::__pstl_less>(oneapi::dpl::__internal::__hetero_tag<oneapi::dpl::__internal::__device_backend_tag>, oneapi::dpl::execution::__dpl::device_policy<oneapi::dpl::execution::__dpl::DefaultKernelName>&, float*, float*, unsigned int*, oneapi::dpl::__internal::__pstl_less)::{lambda(auto:1 const&)#1}, std::integral_constant<bool, false>, std::integral_constant<bool, true> >(sycl::_V1::queue, oneapi::dpl::__ranges::zip_view<oneapi::dpl::__ranges::guard_view<float*>, oneapi::dpl::__ranges::guard_view<unsigned int*> >&&, oneapi::dpl::__internal::__pattern_sort_by_key<oneapi::dpl::__internal::__device_backend_tag, oneapi::dpl::execution::__dpl::device_policy<oneapi::dpl::execution::__dpl::DefaultKernelName>&, float*, unsigned int*, oneapi::dpl::__internal::__pstl_less>(oneapi::dpl::__internal::__hetero_tag<oneapi::dpl::__internal::__device_backend_tag>, oneapi::dpl::execution::__dpl::device_policy<oneapi::dpl::execution::__dpl::DefaultKernelName>&, float*, float*, unsigned int*, oneapi::dpl::__internal::__pstl_less)::{lambda(auto:1 const&)#1}, std::integral_constant<bool, false>, std::integral_constant<bool, true>)::{lambda(sycl::_V1::handler&)#1}, sycl::_V1::detail::code_location const&) (this=0x7ffffffef3e8, CGF=..., CodeLoc=...) at /home/krasznaa/software/intel/oneapi-2024.2.1/compiler/2024.2/bin/compiler/../../include/sycl/queue.hpp:358
#15 oneapi::dpl::__par_backend_hetero::__subgroup_radix_sort<oneapi::dpl::execution::__dpl::DefaultKernelName, (unsigned short)512, (unsigned short)32, 4u, true, (unsigned short)16>::__one_group_submitter<oneapi::dpl::__par_backend_hetero::__internal::__optional_kernel_name<> >::operator()<oneapi::dpl::__ranges::zip_view<oneapi::dpl::__ranges::guard_view<float*>, oneapi::dpl::__ranges::guard_view<unsigned int*> >, oneapi::dpl::__internal::__pattern_sort_by_key<oneapi::dpl::__internal::__device_backend_tag, oneapi::dpl::execution::__dpl::device_policy<oneapi::dpl::execution::__dpl::DefaultKernelName>&, float*, unsigned int*, oneapi::dpl::__internal::__pstl_less>(oneapi::dpl::__internal::__hetero_tag<oneapi::dpl::__internal::__device_backend_tag>, oneapi::dpl::execution::__dpl::device_policy<oneapi::dpl::execution::__dpl::DefaultKernelName>&, float*, float*, unsigned int*, oneapi::dpl::__internal::__pstl_less)::{lambda(auto:1 const&)#1}, std::integral_constant<bool, false>, std::integral_constant<bool, true> >(sycl::_V1::queue, oneapi::dpl::__ranges::zip_view<oneapi::dpl::__ranges::guard_view<float*>, oneapi::dpl::__ranges::guard_view<unsigned int*> >&&, oneapi::dpl::__internal::__pattern_sort_by_key<oneapi::dpl::__internal::__device_backend_tag, oneapi::dpl::execution::__dpl::device_policy<oneapi::dpl::execution::__dpl::DefaultKernelName>&, float*, unsigned int*, oneapi::dpl::__internal::__pstl_less>(oneapi::dpl::__internal::__hetero_tag<oneapi::dpl::__internal::__device_backend_tag>, oneapi::dpl::execution::__dpl::device_policy<oneapi::dpl::execution::__dpl::DefaultKernelName>&, float*, float*, unsigned int*, oneapi::dpl::__internal::__pstl_less)::{lambda(auto:1 const&)#1}, std::integral_constant<bool, false>, std::integral_constant<bool, true>) (__q=..., __src=..., __proj=..., this=<optimized out>)
    at /home/krasznaa/ATLAS/projects/traccc/traccc/out/build/sycl-fp32/_deps/dpl-src/include/oneapi/dpl/pstl/hetero/../hetero/dpcpp/parallel_backend_sycl_radix_sort_one_wg.h:160
#16 oneapi::dpl::__par_backend_hetero::__subgroup_radix_sort<oneapi::dpl::execution::__dpl::DefaultKernelName, (unsigned short)512, (unsigned short)32, 4u, true, (unsigned short)16>::operator()<oneapi::dpl::__ranges::zip_view<oneapi::dpl::__ranges::guard_view<float*>, oneapi::dpl::__ranges::guard_view<unsigned int*> >, oneapi::dpl::__internal::__pattern_sort_by_key<oneapi::dpl::__internal::__device_backend_tag, oneapi::dpl::execution::__dpl::device_policy<oneapi::dpl::execution::__dpl::DefaultKernelName>&, float*, unsigned int*, oneapi::dpl::__internal::__pstl_less>(oneapi::dpl::__internal::__hetero_tag<oneapi::dpl::__internal::__device_backend_tag>, oneapi::dpl::execution::__dpl::device_policy<oneapi::dpl::execution::__dpl::DefaultKernelName>&, float*, float*, unsigned int*, oneapi::dpl::__internal::__pstl_less)::{lambda(auto:1 const&)#1}>(sycl::_V1::queue, oneapi::dpl::__ranges::zip_view<oneapi::dpl::__ranges::guard_view<float*>, oneapi::dpl::__ranges::guard_view<unsigned int*> >&&, oneapi::dpl::__internal::__pattern_sort_by_key<oneapi::dpl::__internal::__device_backend_tag, oneapi::dpl::execution::__dpl::device_policy<oneapi::dpl::execution::__dpl::DefaultKernelName>&, float*, unsigned int*, oneapi::dpl::__internal::__pstl_less>(oneapi::dpl::__internal::__hetero_tag<oneapi::dpl::__internal::__device_backend_tag>, oneapi::dpl::execution::__dpl::device_policy<oneapi::dpl::execution::__dpl::DefaultKernelName>&, float*, float*, unsigned int*, oneapi::dpl::__internal::__pstl_less)::{lambda(auto:1 const&)#1}) (this=this@entry=0x7ffffffef5d0, __q=..., __src=..., __proj=...)
    at /home/krasznaa/ATLAS/projects/traccc/traccc/out/build/sycl-fp32/_deps/dpl-src/include/oneapi/dpl/pstl/hetero/../hetero/dpcpp/parallel_backend_sycl_radix_sort_one_wg.h:61
#17 0x00007ffff60b01f5 in oneapi::dpl::__par_backend_hetero::__parallel_radix_sort<true, oneapi::dpl::__ranges::zip_view<oneapi::dpl::__ranges::guard_view<float*>, oneapi::dpl::__ranges::guard_view<unsigned int*> >, oneapi::dpl::execution::__dpl::device_policy<oneapi::dpl::execution::__dpl::DefaultKernelName>&, oneapi::dpl::__internal::__pattern_sort_by_key<oneapi::dpl::__internal::__device_backend_tag, oneapi::dpl::execution::__dpl::device_policy<oneapi::dpl::execution::__dpl::DefaultKernelName>&, float*, unsigned int*, oneapi::dpl::__internal::__pstl_less>(oneapi::dpl::__internal::__hetero_tag<oneapi::dpl::__internal::__device_backend_tag>, oneapi::dpl::execution::__dpl::device_policy<oneapi::dpl::execution::__dpl::DefaultKernelName>&, float*, float*, unsigned int*, oneapi::dpl::__internal::__pstl_less)::{lambda(auto:1 const&)#1}>(oneapi::dpl::__internal::__device_backend_tag, oneapi::dpl::execution::__dpl::device_policy<oneapi::dpl::execution::__dpl::DefaultKernelName>&, oneapi::dpl::execution::__dpl::device_policy<oneapi::dpl::execution::__dpl::DefaultKernelName>&, oneapi::dpl::__internal::__pattern_sort_by_key<oneapi::dpl::__internal::__device_backend_tag, oneapi::dpl::execution::__dpl::device_policy<oneapi::dpl::execution::__dpl::DefaultKernelName>&, float*, unsigned int*, oneapi::dpl::__internal::__pstl_less>(oneapi::dpl::__internal::__hetero_tag<oneapi::dpl::__internal::__device_backend_tag>, oneapi::dpl::execution::__dpl::device_policy<oneapi::dpl::execution::__dpl::DefaultKernelName>&, float*, float*, unsigned int*, oneapi::dpl::__internal::__pstl_less)::{lambda(auto:1 const&)#1}) (__exec=..., __in_rng=..., __proj=...)
    at /home/krasznaa/ATLAS/projects/traccc/traccc/out/build/sycl-fp32/_deps/dpl-src/include/oneapi/dpl/pstl/hetero/../hetero/dpcpp/parallel_backend_sycl_radix_sort.h:810
#18 0x00007ffff60954ef in oneapi::dpl::__par_backend_hetero::__parallel_stable_sort<oneapi::dpl::execution::__dpl::device_policy<oneapi::dpl::execution::__dpl::DefaultKernelName>&, oneapi::dpl::__ranges::zip_view<oneapi::dpl::__ranges::guard_view<float*>, oneapi::dpl::__ranges::guard_view<unsigned int*> >, oneapi::dpl::__internal::__pstl_less, oneapi::dpl::__internal::__pattern_sort_by_key<oneapi::dpl::__internal::__device_backend_tag, oneapi::dpl::execution::__dpl::device_policy<oneapi::dpl::execution::__dpl::DefaultKernelName>&, float*, unsigned int*, oneapi::dpl::__internal::__pstl_less>(oneapi::dpl::__internal::__hetero_tag<oneapi::dpl::__internal::__device_backend_tag>, oneapi::dpl::execution::__dpl::device_policy<oneapi::dpl::execution::__dpl::DefaultKernelName>&, float*, float*, unsigned int*, oneapi::dpl::__internal::__pstl_less)::{lambda(auto:1 const&)#1}, 0>(oneapi::dpl::__internal::__device_backend_tag, oneapi::dpl::execution::__dpl::device_policy<oneapi::dpl::execution::__dpl::DefaultKernelName>&, oneapi::dpl::execution::__dpl::device_policy<oneapi::dpl::execution::__dpl::DefaultKernelName>&, oneapi::dpl::__internal::__pstl_less, oneapi::dpl::__internal::__pattern_sort_by_key<oneapi::dpl::__internal::__device_backend_tag, oneapi::dpl::execution::__dpl::device_policy<oneapi::dpl::execution::__dpl::DefaultKernelName>&, float*, unsigned int*, oneapi::dpl::__internal::__pstl_less>(oneapi::dpl::__internal::__hetero_tag<oneapi::dpl::__internal::__device_backend_tag>, oneapi::dpl::execution::__dpl::device_policy<oneapi::dpl::execution::__dpl::DefaultKernelName>&, float*, float*, unsigned int*, oneapi::dpl::__internal::__pstl_less)::{lambda(auto:1 const&)#1}) (__exec=..., __rng=..., 
    __backend_tag=..., __proj=...) at /home/krasznaa/ATLAS/projects/traccc/traccc/out/build/sycl-fp32/_deps/dpl-src/include/oneapi/dpl/pstl/hetero/../hetero/dpcpp/parallel_backend_sycl.h:1811
#19 oneapi::dpl::__internal::__stable_sort_with_projection<oneapi::dpl::__internal::__device_backend_tag, oneapi::dpl::execution::__dpl::device_policy<oneapi::dpl::execution::__dpl::DefaultKernelName>&, oneapi::dpl::zip_iterator<float*, unsigned int*>, oneapi::dpl::__internal::__pstl_less, oneapi::dpl::__internal::__pattern_sort_by_key<oneapi::dpl::__internal::__device_backend_tag, oneapi::dpl::execution::__dpl::device_policy<oneapi::dpl::execution::__dpl::DefaultKernelName>&, float*, unsigned int*, oneapi::dpl::__internal::__pstl_less>(oneapi::dpl::__internal::__hetero_tag<oneapi::dpl::__internal::__device_backend_tag>, oneapi::dpl::execution::__dpl::device_policy<oneapi::dpl::execution::__dpl::DefaultKernelName>&, float*, float*, unsigned int*, oneapi::dpl::__internal::__pstl_less)::{lambda(auto:1 const&)#1}>(oneapi::dpl::__internal::__hetero_tag<oneapi::dpl::__internal::__device_backend_tag>, oneapi::dpl::execution::__dpl::device_policy<oneapi::dpl::execution::__dpl::DefaultKernelName>&, oneapi::dpl::zip_iterator<float*, unsigned int*>, oneapi::dpl::zip_iterator<float*, unsigned int*>, oneapi::dpl::__internal::__pstl_less, oneapi::dpl::__internal::__pattern_sort_by_key<oneapi::dpl::__internal::__device_backend_tag, oneapi::dpl::execution::__dpl::device_policy<oneapi::dpl::execution::__dpl::DefaultKernelName>&, float*, unsigned int*, oneapi::dpl::__internal::__pstl_less>(oneapi::dpl::__internal::__hetero_tag<oneapi::dpl::__internal::__device_backend_tag>, oneapi::dpl::execution::__dpl::device_policy<oneapi::dpl::execution::__dpl::DefaultKernelName>&, float*, float*, unsigned int*, oneapi::dpl::__internal::__pstl_less)::{lambda(auto:1 const&)#1}) (__exec=..., __first=..., __last=..., 
    __comp=..., __proj=...) at /home/krasznaa/ATLAS/projects/traccc/traccc/out/build/sycl-fp32/_deps/dpl-src/include/oneapi/dpl/pstl/hetero/algorithm_impl_hetero.h:1262
#20 oneapi::dpl::__internal::__pattern_sort_by_key<oneapi::dpl::__internal::__device_backend_tag, oneapi::dpl::execution::__dpl::device_policy<oneapi::dpl::execution::__dpl::DefaultKernelName>&, float*, unsigned int*, oneapi::dpl::__internal::__pstl_less> (__exec=..., __keys_first=0x7043f2e00, __keys_last=0x7043fd3e4, __values_first=<optimized out>, __tag=..., __comp=...)
    at /home/krasznaa/ATLAS/projects/traccc/traccc/out/build/sycl-fp32/_deps/dpl-src/include/oneapi/dpl/pstl/hetero/algorithm_impl_hetero.h:1300
#21 oneapi::dpl::sort_by_key<oneapi::dpl::execution::__dpl::device_policy<oneapi::dpl::execution::__dpl::DefaultKernelName>&, float*, unsigned int*, oneapi::dpl::__internal::__pstl_less> (__exec=..., __keys_first=0x7043f2e00, 
    __keys_last=0x7043fd3e4, __values_first=<optimized out>, __comp=...) at /home/krasznaa/ATLAS/projects/traccc/traccc/out/build/sycl-fp32/_deps/dpl-src/include/oneapi/dpl/pstl/glue_algorithm_impl.h:711
#22 oneapi::dpl::sort_by_key<oneapi::dpl::execution::__dpl::device_policy<oneapi::dpl::execution::__dpl::DefaultKernelName>&, float*, unsigned int*> (__exec=..., __keys_first=0x7043f2e00, __keys_last=0x7043fd3e4, __values_first=<optimized out>)
    at /home/krasznaa/ATLAS/projects/traccc/traccc/out/build/sycl-fp32/_deps/dpl-src/include/oneapi/dpl/pstl/glue_algorithm_impl.h:720
#23 traccc::sycl::details::find_tracks<detray::rk_stepper<covfie::field_view<covfie::backend::constant<covfie::vector::vector_d<float, 3ul>, covfie::vector::vector_d<float, 3ul> > >, detray::cmath<float>, detray::constrained_step<detray::darray>, detray::stepper_rk_policy, detray::stepping::void_inspector>, detray::navigator<detray::detector<detray::default_metadata, detray::container_types<vecmem::device_vector, detray::tuple, detray::darray, vecmem::jagged_device_vector, detray::dmap> > const, 10ul, detray::navigation::void_inspector, detray::intersection2D<detray::surface_descriptor<detray::detail::typed_index<detray::default_metadata::mask_ids, unsigned int, unsigned int, 4026531840u, 268435455u>, detray::detail::typed_index<detray::default_metadata::material_ids, unsigned int, unsigned int, 4026531840u, 268435455u>, unsigned int, unsigned short>, detray::cmath<float>, false> > > (det=..., field=..., measurements=..., seeds=..., config=..., mr=..., copy=..., 
    queue=...) at /home/krasznaa/ATLAS/projects/traccc/traccc/device/sycl/src/finding/find_tracks.hpp:330
#24 0x00007ffff60923a5 in _ZNK6traccc4sycl37combinatorial_kalman_filter_algorithmclERKN6detray11dmulti_viewIJN6vecmem4data11vector_viewINS2_17volume_descriptorINS2_16default_metadata11geo_objectsENS2_6detail11typed_indexINS8_9accel_idsEjjLj4026531840ELj268435455EEENSB_INS8_12material_idsEjjLj4026531840ELj268435455EEEEEEENS6_INS2_11source_linkINS2_18surface_descriptorINSB_INS8_8mask_idsEjjLj4026531840ELj268435455EEESF_jtEEEEEENS6_IN7algebra5cmath10transform3INSQ_6matrix5actorImNSP_6matrix10array_typeENSU_11matrix_typeEfNSS_11determinant5actorImSW_fJNSX_17partial_pivot_ludImSW_fNSQ_14element_getterImSV_fEEJEEENSX_10hard_codedImSW_fS11_JLm2ELm4EEEEEEENSS_7inverse5actorImSW_fJNS16_17partial_pivot_ludImSW_fS11_JEEENS16_10hard_codedImSW_fS11_JLm2ELm4EEEEEEES11_NSQ_12block_getterImSV_fEEEEEEEENS3_IJNS6_INS2_4maskINS2_11rectangle2DEtNS2_5cmathIfEEEEEENS6_INS1I_INS2_11trapezoid2DEtS1L_EEEENS6_INS1I_INS2_9annulus2DEtS1L_EEEENS6_INS1I_INS2_10cylinder2DEtS1L_EEEENS6_INS1I_INS2_21concentric_cylinder2DEtS1L_EEEENS6_INS1I_INS2_6ring2DEtS1L_EEEENS6_INS1I_INS2_4lineILb0EEEtS1L_EEEENS6_INS1I_INS23_ILb1EEEtS1L_EEEEEEENS3_IJNS3_IJNS6_IjEENS6_INS2_4bins6singleINS2_13material_slabIfEEEEEENS6_ISt5arrayIjLm2EEEENS6_IfEEEEES2M_S2M_S2M_NS6_IS2F_EENS6_INS2_12material_rodIfEEEENS6_INS2_8materialIfSt5ratioILl1ELl1EEEEEES2M_S2M_EEENS3_IJNS3_IJS2B_NS6_ISM_EEEEENS3_IJS2B_NS3_IJNS6_INS2C_13dynamic_arrayISM_E4dataEEES2X_EEES2K_S2L_EEES34_S34_S34_EEENS3_IJNS6_INS2D_IjEEEENS3_IJS2K_S2L_EEEEEEEEERKN6covfie10field_viewINS3D_7backend8constantINS3D_6vector8vector_dIfLm3EEES3J_EEEERKNS6_IKNS_11measurementEEERKNS6_IKNS2_22bound_track_parametersIS1L_EEEE (this=<optimized out>, det=..., field=..., measurements=..., seeds=...)
    at /home/krasznaa/ATLAS/projects/traccc/traccc/device/sycl/src/finding/combinatorial_kalman_filter_algorithm_constant_field_default_detector.sycl:29
#25 0x0000000000423176 in traccc::CkfToyDetectorTests_Run_Test::TestBody (this=<optimized out>) at /home/krasznaa/ATLAS/projects/traccc/traccc/tests/sycl/test_ckf_toy_detector.cpp:191
#26 0x00007ffff5d9a5d9 in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void> (method=&virtual testing::Test::TestBody(), location=0x7ffff5da586b "the test body", object=<optimized out>)
    at /home/krasznaa/ATLAS/projects/traccc/traccc/out/build/sycl-fp32/_deps/googletest-src/googletest/src/gtest.cc:2638
#27 testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void> (object=<optimized out>, method=&virtual testing::Test::TestBody(), location=0x7ffff5da586b "the test body")
    at /home/krasznaa/ATLAS/projects/traccc/traccc/out/build/sycl-fp32/_deps/googletest-src/googletest/src/gtest.cc:2674
#28 0x00007ffff5d80958 in testing::Test::Run (this=0x195cec0) at /home/krasznaa/ATLAS/projects/traccc/traccc/out/build/sycl-fp32/_deps/googletest-src/googletest/src/gtest.cc:2713
#29 0x00007ffff5d81976 in testing::TestInfo::Run (this=0x18edbe0) at /home/krasznaa/ATLAS/projects/traccc/traccc/out/build/sycl-fp32/_deps/googletest-src/googletest/src/gtest.cc:2859
#30 0x00007ffff5d82775 in testing::TestSuite::Run (this=0x18ed7c0) at /home/krasznaa/ATLAS/projects/traccc/traccc/out/build/sycl-fp32/_deps/googletest-src/googletest/src/gtest.cc:3037
#31 0x00007ffff5d9156e in testing::internal::UnitTestImpl::RunAllTests (this=0x18e83c0) at /home/krasznaa/ATLAS/projects/traccc/traccc/out/build/sycl-fp32/_deps/googletest-src/googletest/src/gtest.cc:5967
#32 0x00007ffff5d9afb9 in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (
    method=(bool (testing::internal::UnitTestImpl::*)(class testing::internal::UnitTestImpl * const)) 0x7ffff5d90c40 <testing::internal::UnitTestImpl::RunAllTests()>, 
    location=0x7ffff5da60dd "auxiliary test code (environments or event listeners)", object=<optimized out>) at /home/krasznaa/ATLAS/projects/traccc/traccc/out/build/sycl-fp32/_deps/googletest-src/googletest/src/gtest.cc:2638
#33 testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (object=<optimized out>, 
    method=(bool (testing::internal::UnitTestImpl::*)(class testing::internal::UnitTestImpl * const)) 0x7ffff5d90c40 <testing::internal::UnitTestImpl::RunAllTests()>, 
    location=0x7ffff5da60dd "auxiliary test code (environments or event listeners)") at /home/krasznaa/ATLAS/projects/traccc/traccc/out/build/sycl-fp32/_deps/googletest-src/googletest/src/gtest.cc:2674
#34 0x00007ffff5d90bf3 in testing::UnitTest::Run (this=0x7ffff5db8d70 <testing::UnitTest::GetInstance()::instance>) at /home/krasznaa/ATLAS/projects/traccc/traccc/out/build/sycl-fp32/_deps/googletest-src/googletest/src/gtest.cc:5546
#35 0x00007ffff7fb917b in RUN_ALL_TESTS () at /home/krasznaa/ATLAS/projects/traccc/traccc/out/build/sycl-fp32/_deps/googletest-src/googletest/include/gtest/gtest.h:2334
#36 main (argc=1, argv=0x7fffffffc618) at /home/krasznaa/ATLAS/projects/traccc/traccc/out/build/sycl-fp32/_deps/googletest-src/googletest/src/gtest_main.cc:64
(gdb)
@krasznaa
Copy link
Author

Pinging @fwyzard, @AuroraPerego and @ivorobts for info.

@krasznaa
Copy link
Author

Note though that I'm 99% sure at this point that we're running into a oneAPI bug here. 🤔 Since the kernel(s) run by sort_by_key would definitely not need that many registers. The two arrays that we use as input to oneapi::dpl::sort_by_key have float and unsigned int elements respectably. 🤔

However we do absolutely have a couple of kernels in our code that we run with only 64 threads per block/group at the moment. I think the SYCL runtime is picking up the register needs of one of those kernels, when it decides that it cannot run the sorting kernel. 😕

At first I thought that this would happen because we didn't specify a custom type for some of our sycl::handler::parallel_for<...>(...) kernel launches. But even after adding very explicit, unique class names for these calls (even deep inside templated functions), I still see this runtime error from oneapi::dpl::sort_by_key. 😦

So, in this case limiting the number of threads for the sorting would mainly be needed to work around a oneAPI bug in my understanding. But the ability to set limits on the number of threads could still be a good thing to have. 🤔

@dmitriy-sobolev dmitriy-sobolev self-assigned this Nov 20, 2024
@dmitriy-sobolev
Copy link
Contributor

Hi, @krasznaa. I see that you use oneAPI 2024.2, which comes with oneDPL 2022.6. Have you tried oneDPL 2022.7 supplied as a part of oneAPI 2025.0? The issue you faced should already be fixed in the latest release: #1626.

If the suggestion above does not work for you can try oneapi::dpl::sort() as a workaround:

auto zipped_first = oneapi::dpl::make_zip_iterator(keys_device.begin(), param_ids_device.begin());
oneapi::dpl::sort(zipped_first, zipped_first + keys_device.size(), [](auto lhs, auto rhs) { return std::get<0>(lhs) < std::get<0>(rhs); });

It will call merge-sort algorithm instead of radix-sort. Basing on my empirical findings, it should perform better than radix-sort for number of elements ~100'000 or less (depends on GPU, though). Perhaps, it will suit you better.

As for configuring the size of work-groups, we have started implementing a more low-level API named kernel templates. This feature is experimental and still evolving. Currently, it has a sorting algorithm running on Intel GPUs only, but we are considering adding a more generic one.

I think a more conservative and faster approach for us to fix the issue is introspect kernels: #1938. It does not require passing external parameters.

Attila, which device do you use? It would be helpful for the issue reproduction. I am going to delve into it.

@krasznaa
Copy link
Author

This is excellent news! I'll try it out soon.

Upgrading our code to be compatible with oneAPI 2025.0.0 will take a bit more effort, but luckily our build is already not using oneDPL from oneAPI directly. 😉

https://github.com/acts-project/traccc/blob/main/extern/dpl/CMakeLists.txt#L21

So I'll try what happens when I upgrade our build to the latest oneDPL version. 👍

@krasznaa
Copy link
Author

I had to go with "option B" for now. 🤔 Which did make our code run as it's supposed to. 🥳 (At least as much as the unit tests claim.)

For some reason oneAPI 2024.2.1 really doesn't want to collaborate with oneDPL 2022.7.0. At least not in our build. 😦 When trying to use it, I get:

...
[ 82%] Building SYCL object device/sycl/CMakeFiles/traccc_sycl.dir/src/seeding/silicon_pixel_spacepoint_formation_algorithm_telescope_detector.sycl.o
[ 88%] Building CXX object device/sycl/CMakeFiles/traccc_sycl.dir/src/seeding/silicon_pixel_spacepoint_formation_algorithm.cpp.o
[ 88%] Building SYCL object device/sycl/CMakeFiles/traccc_sycl.dir/src/seeding/silicon_pixel_spacepoint_formation_algorithm_default_detector.sycl.o
[ 88%] Building CXX object device/sycl/CMakeFiles/traccc_sycl.dir/src/finding/combinatorial_kalman_filter_algorithm.cpp.o
[ 88%] Building SYCL object device/sycl/CMakeFiles/traccc_sycl.dir/src/finding/combinatorial_kalman_filter_algorithm_constant_field_telescope_detector.sycl.o
[ 88%] Building CXX object device/sycl/CMakeFiles/traccc_sycl.dir/src/fitting/kalman_fitting_algorithm.cpp.o
[ 88%] Building SYCL object device/sycl/CMakeFiles/traccc_sycl.dir/src/finding/combinatorial_kalman_filter_algorithm_constant_field_default_detector.sycl.o
[ 88%] Building SYCL object device/sycl/CMakeFiles/traccc_sycl.dir/src/fitting/kalman_fitting_algorithm_constant_field_telescope_detector.sycl.o
[ 88%] Building SYCL object device/sycl/CMakeFiles/traccc_sycl.dir/src/clusterization/clusterization_algorithm.sycl.o
[ 94%] Building SYCL object device/sycl/CMakeFiles/traccc_sycl.dir/src/seeding/seed_finding.sycl.o
[ 94%] Building SYCL object device/sycl/CMakeFiles/traccc_sycl.dir/src/seeding/spacepoint_binning.sycl.o
[ 94%] Building SYCL object device/sycl/CMakeFiles/traccc_sycl.dir/src/fitting/kalman_fitting_algorithm_constant_field_default_detector.sycl.o
[100%] Building CXX object device/sycl/CMakeFiles/traccc_sycl.dir/src/seeding/seeding_algorithm.cpp.o
[100%] Building SYCL object device/sycl/CMakeFiles/traccc_sycl.dir/src/utils/get_queue.sycl.o
[100%] Building CXX object device/sycl/CMakeFiles/traccc_sycl.dir/src/utils/queue_wrapper.cpp.o
[100%] Building SYCL object device/sycl/CMakeFiles/traccc_sycl.dir/src/seeding/track_params_estimation.sycl.o
[100%] Building SYCL object device/sycl/CMakeFiles/traccc_sycl.dir/src/utils/make_prefix_sum_buff.sycl.o
[100%] Building SYCL object device/sycl/CMakeFiles/traccc_sycl.dir/src/utils/calculate1DimNdRange.sycl.o
[100%] Linking SYCL shared library ../../lib/libtraccc_sycl.so
/usr/bin/ld: /tmp/clang-576599b5fc/combinatorial_kalman_filter_algorithm_constant_field_telescope_detector-3e69c6.o: in function `sycl::_V1::detail::SYCLMemObjAllocator::~SYCLMemObjAllocator()':
/data/ssd-1tb/projects/traccc/build-sycl/_deps/dpl-src/include/oneapi/dpl/pstl/hetero/../hetero/dpcpp/parallel_backend_sycl.h:1048: multiple definition of `oneapi::dpl::__ranges::__select_backend(oneapi::dpl::execution::v1::sequenced_policy)'; /tmp/clang-576599b5fc/combinatorial_kalman_filter_algorithm_constant_field_default_detector-bc72c3.o:/home/krasznaa/software/intel/oneapi-2024.2.1/compiler/2024.2/bin/compiler/../../include/sycl/detail/cg_types.hpp:376: first defined here
/usr/bin/ld: /tmp/clang-576599b5fc/combinatorial_kalman_filter_algorithm_constant_field_telescope_detector-3e69c6.o: in function `oneapi::dpl::__ranges::__select_backend(oneapi::dpl::execution::v1::unsequenced_policy)':
/data/ssd-1tb/projects/traccc/build-sycl/_deps/dpl-src/include/oneapi/dpl/pstl/hetero/../hetero/dpcpp/parallel_backend_sycl.h:1048: multiple definition of `oneapi::dpl::__ranges::__select_backend(oneapi::dpl::execution::v1::unsequenced_policy)'; /tmp/clang-576599b5fc/combinatorial_kalman_filter_algorithm_constant_field_default_detector-bc72c3.o:/home/krasznaa/software/intel/oneapi-2024.2.1/compiler/2024.2/bin/compiler/../../include/sycl/detail/cg_types.hpp:376: first defined here
/usr/bin/ld: /tmp/clang-576599b5fc/combinatorial_kalman_filter_algorithm_constant_field_telescope_detector-3e69c6.o: in function `oneapi::dpl::__ranges::__select_backend(oneapi::dpl::execution::v1::parallel_policy)':
/data/ssd-1tb/projects/traccc/build-sycl/_deps/dpl-src/include/oneapi/dpl/pstl/hetero/../hetero/dpcpp/parallel_backend_sycl.h:1048: multiple definition of `oneapi::dpl::__ranges::__select_backend(oneapi::dpl::execution::v1::parallel_policy)'; /tmp/clang-576599b5fc/combinatorial_kalman_filter_algorithm_constant_field_default_detector-bc72c3.o:/home/krasznaa/software/intel/oneapi-2024.2.1/compiler/2024.2/bin/compiler/../../include/sycl/detail/cg_types.hpp:(.text+0x20): first defined here
/usr/bin/ld: /tmp/clang-576599b5fc/combinatorial_kalman_filter_algorithm_constant_field_telescope_detector-3e69c6.o: in function `oneapi::dpl::__ranges::__select_backend(oneapi::dpl::execution::v1::parallel_unsequenced_policy)':
/usr/lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/shared_ptr_base.h:1522: multiple definition of `oneapi::dpl::__ranges::__select_backend(oneapi::dpl::execution::v1::parallel_unsequenced_policy)'; /tmp/clang-576599b5fc/combinatorial_kalman_filter_algorithm_constant_field_default_detector-bc72c3.o:/home/krasznaa/software/intel/oneapi-2024.2.1/compiler/2024.2/bin/compiler/../../include/sycl/detail/cg_types.hpp:380: first defined here
/usr/bin/ld: /tmp/clang-576599b5fc/kalman_fitting_algorithm_constant_field_default_detector-279957.o: in function `sycl::_V1::detail::SYCLMemObjAllocator::~SYCLMemObjAllocator()':
/data/ssd-1tb/projects/traccc/build-sycl/_deps/detray-src/core/include/detray/navigation/navigator.hpp:608: multiple definition of `oneapi::dpl::__ranges::__select_backend(oneapi::dpl::execution::v1::sequenced_policy)'; /tmp/clang-576599b5fc/combinatorial_kalman_filter_algorithm_constant_field_default_detector-bc72c3.o:/home/krasznaa/software/intel/oneapi-2024.2.1/compiler/2024.2/bin/compiler/../../include/sycl/detail/cg_types.hpp:376: first defined here
/usr/bin/ld: /tmp/clang-576599b5fc/kalman_fitting_algorithm_constant_field_default_detector-279957.o: in function `oneapi::dpl::__ranges::__select_backend(oneapi::dpl::execution::v1::unsequenced_policy)':
/data/ssd-1tb/projects/traccc/build-sycl/_deps/detray-src/core/include/detray/navigation/navigator.hpp:608: multiple definition of `oneapi::dpl::__ranges::__select_backend(oneapi::dpl::execution::v1::unsequenced_policy)'; /tmp/clang-576599b5fc/combinatorial_kalman_filter_algorithm_constant_field_default_detector-bc72c3.o:/home/krasznaa/software/intel/oneapi-2024.2.1/compiler/2024.2/bin/compiler/../../include/sycl/detail/cg_types.hpp:376: first defined here
/usr/bin/ld: /tmp/clang-576599b5fc/kalman_fitting_algorithm_constant_field_default_detector-279957.o: in function `oneapi::dpl::__ranges::__select_backend(oneapi::dpl::execution::v1::parallel_policy)':
/data/ssd-1tb/projects/traccc/build-sycl/_deps/detray-src/core/include/detray/geometry/tracking_surface.hpp:61: multiple definition of `oneapi::dpl::__ranges::__select_backend(oneapi::dpl::execution::v1::parallel_policy)'; /tmp/clang-576599b5fc/combinatorial_kalman_filter_algorithm_constant_field_default_detector-bc72c3.o:/home/krasznaa/software/intel/oneapi-2024.2.1/compiler/2024.2/bin/compiler/../../include/sycl/detail/cg_types.hpp:(.text+0x20): first defined here
/usr/bin/ld: /tmp/clang-576599b5fc/kalman_fitting_algorithm_constant_field_default_detector-279957.o: in function `oneapi::dpl::__ranges::__select_backend(oneapi::dpl::execution::v1::parallel_unsequenced_policy)':
/data/ssd-1tb/projects/traccc/build-sycl/_deps/algebraplugins-src/math/cmath/include/algebra/math/impl/cmath_getter.hpp:121: multiple definition of `oneapi::dpl::__ranges::__select_backend(oneapi::dpl::execution::v1::parallel_unsequenced_policy)'; /tmp/clang-576599b5fc/combinatorial_kalman_filter_algorithm_constant_field_default_detector-bc72c3.o:/home/krasznaa/software/intel/oneapi-2024.2.1/compiler/2024.2/bin/compiler/../../include/sycl/detail/cg_types.hpp:380: first defined here
/usr/bin/ld: /tmp/clang-576599b5fc/kalman_fitting_algorithm_constant_field_telescope_detector-d68ce6.o: in function `sycl::_V1::detail::SYCLMemObjAllocator::~SYCLMemObjAllocator()':
/data/ssd-1tb/projects/traccc/build-sycl/_deps/dpl-src/include/oneapi/dpl/pstl/hetero/../hetero/dpcpp/parallel_backend_sycl_radix_sort.h:772: multiple definition of `oneapi::dpl::__ranges::__select_backend(oneapi::dpl::execution::v1::sequenced_policy)'; /tmp/clang-576599b5fc/combinatorial_kalman_filter_algorithm_constant_field_default_detector-bc72c3.o:/home/krasznaa/software/intel/oneapi-2024.2.1/compiler/2024.2/bin/compiler/../../include/sycl/detail/cg_types.hpp:376: first defined here
/usr/bin/ld: /tmp/clang-576599b5fc/kalman_fitting_algorithm_constant_field_telescope_detector-d68ce6.o: in function `oneapi::dpl::__ranges::__select_backend(oneapi::dpl::execution::v1::unsequenced_policy)':
/data/ssd-1tb/projects/traccc/build-sycl/_deps/dpl-src/include/oneapi/dpl/pstl/hetero/../hetero/dpcpp/parallel_backend_sycl_radix_sort.h:772: multiple definition of `oneapi::dpl::__ranges::__select_backend(oneapi::dpl::execution::v1::unsequenced_policy)'; /tmp/clang-576599b5fc/combinatorial_kalman_filter_algorithm_constant_field_default_detector-bc72c3.o:/home/krasznaa/software/intel/oneapi-2024.2.1/compiler/2024.2/bin/compiler/../../include/sycl/detail/cg_types.hpp:376: first defined here
/usr/bin/ld: /tmp/clang-576599b5fc/kalman_fitting_algorithm_constant_field_telescope_detector-d68ce6.o: in function `oneapi::dpl::__ranges::__select_backend(oneapi::dpl::execution::v1::parallel_policy)':
/data/ssd-1tb/projects/traccc/build-sycl/_deps/dpl-src/include/oneapi/dpl/pstl/hetero/../hetero/dpcpp/parallel_backend_sycl_radix_sort.h:774: multiple definition of `oneapi::dpl::__ranges::__select_backend(oneapi::dpl::execution::v1::parallel_policy)'; /tmp/clang-576599b5fc/combinatorial_kalman_filter_algorithm_constant_field_default_detector-bc72c3.o:/home/krasznaa/software/intel/oneapi-2024.2.1/compiler/2024.2/bin/compiler/../../include/sycl/detail/cg_types.hpp:(.text+0x20): first defined here
/usr/bin/ld: /tmp/clang-576599b5fc/kalman_fitting_algorithm_constant_field_telescope_detector-d68ce6.o: in function `oneapi::dpl::__ranges::__select_backend(oneapi::dpl::execution::v1::parallel_unsequenced_policy)':
/data/ssd-1tb/projects/traccc/build-sycl/_deps/dpl-src/include/oneapi/dpl/pstl/hetero/../hetero/dpcpp/parallel_backend_sycl_radix_sort.h:(.text+0x30): multiple definition of `oneapi::dpl::__ranges::__select_backend(oneapi::dpl::execution::v1::parallel_unsequenced_policy)'; /tmp/clang-576599b5fc/combinatorial_kalman_filter_algorithm_constant_field_default_detector-bc72c3.o:/home/krasznaa/software/intel/oneapi-2024.2.1/compiler/2024.2/bin/compiler/../../include/sycl/detail/cg_types.hpp:380: first defined here
clang++: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[3]: *** [device/sycl/CMakeFiles/traccc_sycl.dir/build.make:270: lib/libtraccc_sycl.so.0.17.0] Error 1
gmake[2]: *** [CMakeFiles/Makefile2:3633: device/sycl/CMakeFiles/traccc_sycl.dir/all] Error 2
gmake[1]: *** [CMakeFiles/Makefile2:3640: device/sycl/CMakeFiles/traccc_sycl.dir/rule] Error 2
gmake: *** [Makefile:914: traccc_sycl] Error 2

At first I thought that passing oneDPL 2022.7.0 headers to my compilation with -isystem (instead of -I) would confuse the compiler about where to pick up certain headers from. But that's not it. 😕 I don't actually know what the linked is talking about here, as none of those line numbers make any sense to me. 😕

@krasznaa
Copy link
Author

Attila, which device do you use? It would be helpful for the issue reproduction. I am going to delve into it.

The tests are currently done on NVIDIA GPUs, because we'll need to upgrade to oneAPI 2025.0.0 to get this latest part of our code working on Intel ones. (It's a longer story. You can find a taste of it in: acts-project/algebra-plugins#136)

@dmitriy-sobolev
Copy link
Contributor

The linkage error from #1936 (comment) was fixed in #1849.

@timmiesmith, could you make sure that #1849 is a part of oneDPL 2022.7 patch release? It seems it did not make it into the initial oneDPL 2022.7.

@timmiesmith
Copy link
Contributor

The linkage error from #1936 (comment) was fixed in #1849.

@timmiesmith, could you make sure that #1849 is a part of oneDPL 2022.7 patch release? It seems it did not make it into the initial oneDPL 2022.7.

I've submitted #1947 to pull #1849 into the patch release branch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants