Add oneDPL for SYCL #442

Closed
beomki-yeo wants to merge 5 commits


@beomki-yeo beomki-yeo commented Aug 15, 2023

Now depends on #461


Thrust functions don't seem to work with SYCL (#437), so this adds oneDPL.

oneDPL is compiled only when SYCL is built.
BTW, there are tons of warnings from my compiler. Maybe my compiler is too old, or I am using the wrong version of oneDPL.

@beomki-yeo beomki-yeo marked this pull request as draft August 15, 2023 16:19
@beomki-yeo beomki-yeo force-pushed the add-oneDPL branch 2 times, most recently from 1cb31ee to 27497b3 Compare August 15, 2023 17:21
@beomki-yeo beomki-yeo marked this pull request as ready for review August 15, 2023 17:21
@krasznaa
Member

I'm absolutely on board with this effort! Tomorrow I'll look at the exact setup with which oneDPL should be built by hand. But generally I think this is indeed the way to run/start algorithms on device data from the host, when using SYCL.

@beomki-yeo
Contributor Author

Yeah, I appreciate you looking into that.

@krasznaa
Member

This was really my Vietnam... I'll tell you some of the details tomorrow at lunch I guess...

Let's hope that the CI build will work as well.

@krasznaa
Member

🤔 I believe we found a bug in oneDPL... (I mean, yet another one. The whole issue with include orders is a serious issue by itself in my mind.)

You see, the failure comes from:

Thread 1 "traccc_test_syc" hit Catchpoint 1 (exception thrown), 0x00007ffff1cd1662 in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
(gdb) bt
#0  0x00007ffff1cd1662 in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#1  0x00007ffff1a74613 in sycl::_V1::detail::select_device(std::function<int (sycl::_V1::device const&)>, std::vector<sycl::_V1::device, std::allocator<sycl::_V1::device> >&) () from /software/intel/oneapi-2023.1.0/compiler/2023.1.0/linux/lib/libsycl.so.6
#2  0x00007ffff1a750b9 in sycl::_V1::detail::select_device(std::function<int (sycl::_V1::device const&)> const&) () from /software/intel/oneapi-2023.1.0/compiler/2023.1.0/linux/lib/libsycl.so.6
#3  0x00007ffff1a757fc in sycl::_V1::device_selector::select_device() const () from /software/intel/oneapi-2023.1.0/compiler/2023.1.0/linux/lib/libsycl.so.6
#4  0x0000000000472605 in sycl::_V1::queue::queue(sycl::_V1::device_selector const&, std::function<void (sycl::_V1::exception_list)> const&, sycl::_V1::property_list const&) (this=0x1074a78 <oneapi::dpl::execution::__dpl::dpcpp_default>, DeviceSelector=..., AsyncHandler=..., PropList=...) at /software/intel/oneapi-2023.1.0/compiler/2023.1.0/linux/bin-llvm/../include/sycl/queue.hpp:186
#5  0x0000000000472305 in sycl::_V1::queue::queue (this=0x1074a78 <oneapi::dpl::execution::__dpl::dpcpp_default>, PropList=...) at /software/intel/oneapi-2023.1.0/compiler/2023.1.0/linux/bin-llvm/../include/sycl/queue.hpp:95
#6  0x000000000047113e in oneapi::dpl::execution::__dpl::device_policy<oneapi::dpl::execution::__dpl::DefaultKernelName>::device_policy (this=0x1074a78 <oneapi::dpl::execution::__dpl::dpcpp_default>) at /software/intel/oneapi-2023.1.0/dpl/2022.1.0/linux/include/oneapi/dpl/pstl/hetero/dpcpp/execution_sycl_defs.h:48
#7  0x000000000046d092 in __cxx_global_var_init () at /software/intel/oneapi-2023.1.0/dpl/2022.1.0/linux/include/oneapi/dpl/pstl/hetero/dpcpp/execution_sycl_defs.h:121
#8  0x00000000005354ad in __libc_csu_init ()
#9  0x00007ffff15ff010 in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
#10 0x000000000046e27e in _start ()
(gdb)

(I had to make a Debug build against oneAPI itself, but let's not go into the technical reasons for that...)

Generally, we don't want tests to instantiate sycl::queue objects in a global scope, because that means that unless you have a "SYCL device" available, you can't even build the test. This is because our CMake based build tries to ask the test executables about the names of all of the tests that they hold. So if the test executable can't do that, we have a problem.
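
As a minimal sketch of why a global queue is a problem, the following plain C++ uses a hypothetical `fake_queue` stand-in (not a SYCL type) whose constructor throws when no device is present. The names `fake_queue`, `g_device_available`, `get_queue` and `run_cli` are made up for illustration. The only point is that construction deferred to first use lets an option like `--help` be handled without a device.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a device queue: constructing it fails when no
// device is present, similar in spirit to the failure in the backtrace.
struct fake_queue {
    explicit fake_queue(bool device_available) {
        if (!device_available) {
            throw std::runtime_error("No device of requested type available.");
        }
    }
};

// Toggle used only by this sketch to simulate a machine without a device.
bool g_device_available = false;

// A namespace-scope object such as
//     fake_queue g_queue{g_device_available};
// would be constructed before main() runs. The exception could not be caught
// there, so even "--help" would terminate the program.

// Deferred alternative: the queue is built on first use only. If the
// constructor throws, the initialization is retried on the next call.
fake_queue& get_queue() {
    static fake_queue queue{g_device_available};
    return queue;
}

// Hypothetical entry point: "--help" returns before the queue is requested.
int run_cli(const std::string& arg) {
    if (arg == "--help") {
        return 0;
    }
    try {
        get_queue();
    } catch (const std::runtime_error&) {
        return 1;  // no device: report failure instead of aborting
    }
    return 0;
}
```

With the deferred form, a call such as `run_cli("--help")` never touches the queue, which is the behaviour the test discovery step needs.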

To test this locally, one can do something like:

[bash][atspot01]:build > ONEAPI_DEVICE_SELECTOR= ./bin/traccc_test_sycl --help
terminate called after throwing an instance of 'sycl::_V1::runtime_error'
  what():  No device of requested type available. Please check https://software.intel.com/content/www/us/en/develop/articles/intel-oneapi-dpcpp-system-requirements.html -1 (PI_ERROR_DEVICE_NOT_FOUND)
Aborted (core dumped)
[bash][atspot01]:build >

This is exactly what I attached a debugger to as well, which led to the backtrace shown above. It shows that oneDPL always creates at least one (but likely multiple...) sycl::queue object globally with:

https://github.com/oneapi-src/oneDPL/blob/main/include/oneapi/dpl/pstl/hetero/dpcpp/execution_sycl_defs.h#L121

That's not too cool. 😦 It means that an application that uses oneDPL somewhere deep inside itself has no way to gracefully decide during execution that it does not want to use DPL (in case it detects that there's no device that it could use).
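
One way an application could make that decision, if queue creation were left to the caller, is sketched below in plain C++. `stub_queue`, `try_make_queue` and `choose_backend` are hypothetical names, and `stub_queue` merely imitates a queue whose constructor throws when no device is found. None of this is oneDPL's API.

```cpp
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a device queue whose constructor throws when no
// device is present.
struct stub_queue {
    explicit stub_queue(bool device_available) {
        if (!device_available) {
            throw std::runtime_error("no device");
        }
    }
};

// Create the queue inside a function, where the failure can be caught,
// instead of during static initialization.
std::optional<stub_queue> try_make_queue(bool device_available) {
    try {
        return stub_queue{device_available};
    } catch (const std::runtime_error&) {
        return std::nullopt;
    }
}

// Pick an execution path based on whether a queue could be created.
std::string choose_backend(bool device_available) {
    return try_make_queue(device_available).has_value() ? "device" : "host";
}
```

The design choice here is simply that the failure surfaces as a value the caller can inspect, rather than as an exception thrown before `main()` starts.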

Let me create bug reports in oneDPL about the two bugs that we found so far...

@beomki-yeo beomki-yeo force-pushed the add-oneDPL branch 2 times, most recently from 8a664df to 202b51a Compare September 28, 2023 14:33
@beomki-yeo
Contributor Author

Now included in #689

@beomki-yeo beomki-yeo closed this Sep 25, 2024