Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using oneDPL forces client to always run with "SYCL device" present #1060

Closed
krasznaa opened this issue Aug 17, 2023 · 7 comments
Closed

Using oneDPL forces client to always run with "SYCL device" present #1060

krasznaa opened this issue Aug 17, 2023 · 7 comments

Comments

@krasznaa
Copy link

I seem to have run into a pretty fundamental design issue in the code. 😦 As discussed in acts-project/traccc#442, apparently a oneDPL using application can never be executed on a host that has no "SYCL capable devices" at all.

In our applications we've seen a couple of times already that we must not create sycl::queue objects in a global scope. We use gtest_discover_tests(...) to set up tests in our projects to CTest. But for this to work in a CI, which generally doesn't provide us with "SYCL capable devices", the test code must only try to create sycl::queue-s once a test is actually running. (In practice this just meant not ever declaring "global" sycl::queue objects.)

Unfortunately as soon as we include anything from oneDPL, our tests fail to even build in our CI system. 😦 Because oneDPL itself apparently creates at least one (but as far as I can see actually multiple) sycl::queue object globally just by being included.

Thread 1 "traccc_test_syc" hit Catchpoint 1 (exception thrown), 0x00007ffff1cd1662 in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
(gdb) bt
#0  0x00007ffff1cd1662 in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#1  0x00007ffff1a74613 in sycl::_V1::detail::select_device(std::function<int (sycl::_V1::device const&)>, std::vector<sycl::_V1::device, std::allocator<sycl::_V1::device> >&) () from /software/intel/oneapi-2023.1.0/compiler/2023.1.0/linux/lib/libsycl.so.6
#2  0x00007ffff1a750b9 in sycl::_V1::detail::select_device(std::function<int (sycl::_V1::device const&)> const&) () from /software/intel/oneapi-2023.1.0/compiler/2023.1.0/linux/lib/libsycl.so.6
#3  0x00007ffff1a757fc in sycl::_V1::device_selector::select_device() const () from /software/intel/oneapi-2023.1.0/compiler/2023.1.0/linux/lib/libsycl.so.6
#4  0x0000000000472605 in sycl::_V1::queue::queue(sycl::_V1::device_selector const&, std::function<void (sycl::_V1::exception_list)> const&, sycl::_V1::property_list const&) (this=0x1074a78 <oneapi::dpl::execution::__dpl::dpcpp_default>, DeviceSelector=..., AsyncHandler=..., PropList=...) at /software/intel/oneapi-2023.1.0/compiler/2023.1.0/linux/bin-llvm/../include/sycl/queue.hpp:186
#5  0x0000000000472305 in sycl::_V1::queue::queue (this=0x1074a78 <oneapi::dpl::execution::__dpl::dpcpp_default>, PropList=...) at /software/intel/oneapi-2023.1.0/compiler/2023.1.0/linux/bin-llvm/../include/sycl/queue.hpp:95
#6  0x000000000047113e in oneapi::dpl::execution::__dpl::device_policy<oneapi::dpl::execution::__dpl::DefaultKernelName>::device_policy (this=0x1074a78 <oneapi::dpl::execution::__dpl::dpcpp_default>) at /software/intel/oneapi-2023.1.0/dpl/2022.1.0/linux/include/oneapi/dpl/pstl/hetero/dpcpp/execution_sycl_defs.h:48
#7  0x000000000046d092 in __cxx_global_var_init () at /software/intel/oneapi-2023.1.0/dpl/2022.1.0/linux/include/oneapi/dpl/pstl/hetero/dpcpp/execution_sycl_defs.h:121
#8  0x00000000005354ad in __libc_csu_init ()
#9  0x00007ffff15ff010 in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
#10 0x000000000046e27e in _start ()
(gdb)

I consider this a serious bug in the design. 🤔 Since it means that an application that uses oneDPL somewhere deep inside itself, possibly behind some checks that make sure that a "SYCL device" would actually be available before oneDPL functionality would be used, would fail to even start on a host with no "SYCL device" present. 😦

So those default policies would need to dynamically create their SYCL objects on first use as the simplest fix. But possibly some deeper re-thinking could be done on how that part of the code works...

Pinging @ivorobts.

@rarutyun
Copy link
Contributor

Hi Attila,

Please find my response and questions below:

In our applications we've seen a couple of times already that we must not create sycl::queue objects in a global scope. We use gtest_discover_tests(...) to set up tests in our projects to CTest. But for this to work in a CI, which generally doesn't provide us with "SYCL capable devices", the test code must only try to create sycl::queue-s once a test is actually running. (In practice this just meant not ever declaring "global" sycl::queue objects.)

You are writing about build error. But what I can see is a run-time exception. I am not expert in GTest, so could you please elaborate what's going on? If you really have compile-time error, then I would like to see the compilation command line and the error itself.

Unfortunately as soon as we include anything from oneDPL, our tests fail to even build in our CI system.

Do you mean "just including oneDPL headers"?

By the way, this is interesting. We (in oneDPL) should not create a global queue if user doesn't pass -fsycl option to the compiler (or if it's not enabled by default). Since you pass -fsycl (I guess) you expect to run that program eventually where you do have SYCL device. That's where my confusion comes from.

... at least one (but as far as I can see actually multiple) sycl::queue object globally just by being included.

It shouldn't be the case. It should be one.

I consider this a serious bug in the design.

Not necessarily. It's always a trade off between just creating the queue always and having lazy initialization but always check some flag for each interaction with a policy (which, in our case, contains the queue)

... would fail to even start on a host with no "SYCL device" present.

And that's my point that I tried to make previously. Why does one want to run the application that was built with -fsycl flag on the machine where there is no "SYCL device" present. As far as I remember we are doing exactly what SYCL does and the call stack you provides proves that. I mean it's SYCL runtime, which throughs the exception if no SYCL device is found.

So those default policies would need to dynamically create their SYCL objects on first use as the simplest fix.

It's not the simplest one, because this fix means ABI-breaking change. We would need to change the layout of the class.

To summarize:
Let's better understand your problematic case and see what's wrong to realize what we could potentially do.

@krasznaa
Copy link
Author

Hi Ruslan,

I guess I should start with the more fundamental issues before going to the more technical ones.

I'm very surprised that you say that anything compiled by a SYCL compiler must fundamentally always run on a host that has a SYCL capable device in it. 😕 That may be true for tiny applications that do one single thing. But any larger application, which may have certain of its internal functionalities accelerated using SYCL (and/or CUDA, HIP, etc.), may very well want to run on a host that has no SYCL devices. In which case it would want to fall back on running a non-SYCL version of its code.

As an aside it's worth noting here that in the early days the pitch for SYCL was that one would only need to write a given algorithm once, and that single implementation could then run on anything. (Including the host thread, in case no SYCL device is present.) This messaging is gone by now, since of course a single algorithm can never be efficient for very different types of devices, and most algorithms can even fundamentally not run in a single thread. (As soon as they rely on multiple threads doing work in parallel, and synchronizing with each other.)

The project that I'm considering integrating SYCL (and possibly oneDPL) into is made of millions of lines of C++ and Python. And needs to run on a lot of different types of hardware. So no, just because some source files are compiled by a SYCL compiler, the application must not be prevented from even starting on a host with no accelerators.

About the more technical things: If you're not familiar with CMake + GoogleTest then let's not even complicate things with that. I am indeed talking about a runtime issue here. (Notice that I never wrote "compile-time" in my text, but rather "build-time". Which includes some post-linking steps with CMake + GoogleTest, which require running the freshly built test executable.)

Very long story short, it must be possible to use oneDPL in an application and still be able to decide at runtime whether the oneDPL using part of the code would be executed. Otherwise it can not be considered for being included in large applications. 🤔

Best,
Attila

@krasznaa
Copy link
Author

Ping...?

@igorvorobtsov
Copy link

Hi @krasznaa ,
We are discussing different ways how to solve this problem and possible solutions. We will update this thread soon.

@rarutyun
Copy link
Contributor

Hi Attila,

Sorry for a delay.

I'm very surprised that you say that anything compiled by a SYCL compiler must fundamentally always run on a host that has a SYCL capable device in it. 😕 That may be true for tiny applications that do one single thing

We expected that SYCL should on any platform but unfortunately SYCL specification doesn't include what was called "host-device" anymore that was basically something that allows SYCL code to work even the platform doesn't "real" accelerators.

I am indeed talking about a runtime issue here. (Notice that I never wrote "compile-time" in my text, but rather "build-time".

Ok, I see. It's a difference in terminology per organizations/projects. I suspected that we are talking about some run-time errors and I also suspected that after building the tests you checking systems runs this test immediately, while SYCL might be unavailable in that platform. Thanks for clarification.

Very long story short, it must be possible to use oneDPL in an application and still be able to decide at runtime whether the oneDPL using part of the code would be executed.

Yeah, we expected the SYCL implementation might do something on their side, because the problem is not only about oneDPL. Basically, any abstraction on top of SYCL might have some globally initialized SYCL objects. Seems like fixing that issue in SYCL implementation is not something that is in plans, so we probably have the solution on our side.

As I previously said, making lazy initialization for the policy is not as straightforward as it might look like at first glance. We want to properly design that and then, implement in our code base. But at least we have the intent to fix that. You can watch over the PR #1154, linked in this issue.

Thanks for you patience.

Best Regards,
Ruslan

@akukanov
Copy link
Contributor

akukanov commented Jun 10, 2024

Hello @krasznaa, I would like to discuss the problem(s) with the oneDPL global device_policy objects in more details.

To set the background, let me state that the oneDPL implementation for algorithm offloading is based on SYCL 2020, which is also the SYCL specification supported by the oneAPI DPC++ compiler. In that specification, there are several rules that I think are important in the context of this discussion:

In principle, these rules make the current implementation of the oneDPL dpcpp_default global policy object technically correct, as it uses the default selector which is not supposed to ever throw any exception when obtaining a device e.g. for a queue construction. In other words, I rather disagree with the statement of oneDPL having a serious design flaw.

That said, we understand there might be specific cases where relying on the specification guarantees is not enough. Therefore we would like to better understand the requirements that your project/organization has in order to consider the use of oneDPL.

For the sake of discussion, let's assume that the execution environment is somehow configured to have no default SYCL device. The requirement of letting the application start without errors/exceptions in that case is clear. Besides that, what other actions with oneDPL device execution policies you expect to be possible? Or is it safe to assume that an application will not try to do anything at all with execution policies in such a setup?

This is in the context of determining the required scope for a workaround for this issue. The recent proposal we have (#1618) essentially allows initializing the predefined policy to an unusable state if there are no devices, while considering any attempt to use it afterwards an undefined behavior. In your opinion, is it a suitable approach to satisfy your stated requirement of

Very long story short, it must be possible to use oneDPL in an application and still be able to decide at runtime whether the oneDPL using part of the code would be executed. Otherwise it can not be considered for being included in large applications.

and if not, what are the points of your concern?

Thank you in advance for the feedback.

@akukanov
Copy link
Contributor

The issue should be fixed by #1652

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants