Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lazy initialization of queue inside dpcpp_default policy #1154

Closed
wants to merge 8 commits into from
37 changes: 32 additions & 5 deletions include/oneapi/dpl/pstl/hetero/dpcpp/execution_sycl_defs.h
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we also add a test that fails before this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might be able to add a test that just includes oneDPL but doesn't call anything, and then invoke it with a device filter that disables all devices (e.g., ONEAPI_DEVICE_SELECTOR='!*:*'). In this case, the old code should throw an exception whereas this PR should not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have manually confirmed that this test works in principle. I created the following test:

#include <oneapi/dpl/execution>

int main()
{
    return 0;
}

Running it as ONEAPI_DEVICE_SELECTOR='!*:*' ./build/test/header_inclusion_only.pass crashes on main but does not crash with this branch.

I feel that there is benefit in adding this test, but it will require some special changes to the CMake to modify the environment specifically for this test. It is further complicated by the fact that the CMake files currently already do something similar with the variable $ONEAPI_DEVICE_SELECTOR, which is either set to ONEAPI_DEVICE_SELECTOR=gpu/acc/fpga or SYCL_DEVICE_FILTER=gpu/acc/fpga depending on the compiler version and device type.

The dynamic selection branch also makes changes to this variable. In all, I believe that there is value for adding this test but the risks are high to unintentionally break something with this being so close to the code freeze.

Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,8 @@

#include "sycl_defs.h"

#include <mutex>
#include <optional>
#include <type_traits>

namespace oneapi
Expand Down Expand Up @@ -50,14 +52,34 @@ class device_policy
device_policy(const device_policy<OtherName>& other) : q(other.queue())
{
}

device_policy(const device_policy& other) : q(other.queue()) {}
device_policy(device_policy&& other) : q(other.queue()) {}
device_policy&
operator=(const device_policy& other)
SergeyKopienko marked this conversation as resolved.
Show resolved Hide resolved
{
q = other.queue();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we really need to create sycl::queue instance here in other.queue() call ?
May be good enough simple copy q state?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be sufficient to just copy q directly as you suggest.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

return *this;
}
device_policy&
operator=(device_policy&& other)
{
q.swap(other.q);
SergeyKopienko marked this conversation as resolved.
Show resolved Hide resolved
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same:

q = ::std::move(other.q);

return *this;
}

explicit device_policy(sycl::queue q_) : q(q_) {}
explicit device_policy(sycl::device d_) { q.emplace(d_); }
operator sycl::queue() const { return queue(); }
sycl::queue
queue() const
{
if (!q)
SergeyKopienko marked this conversation as resolved.
Show resolved Hide resolved
q.emplace();
{
::std::scoped_lock lock{mtx};
if (!q)
q.emplace();
}
return *q;
}

Expand All @@ -81,6 +103,7 @@ class device_policy

protected:
mutable ::std::optional<sycl::queue> q;
mmichel11 marked this conversation as resolved.
Show resolved Hide resolved
mutable ::std::mutex mtx;
SergeyKopienko marked this conversation as resolved.
Show resolved Hide resolved
};

#if _ONEDPL_FPGA_DEVICE
Expand All @@ -104,13 +127,17 @@ class fpga_policy : public device_policy<KernelName>
queue() const
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I correct that this function sycl::queue queue() should be declared as virtual now in this and base class?

  1. Let's assume we have created the policy instance of fpga_policy
    After this PR this instance doesn't have created sycl::queue instance inside as it's lazy initialized.
  2. In some code we save result of static_cast<device_policy&>(policy) into auto& d_policy:
    fpga_policy policy;
    auto& d_policy = static_cast<device_policy&>(policy);
  1. and after that we call queue() function from reference to the base class:
    fpga_policy policy;
    auto& d_policy = static_cast<device_policy&>(policy);
    auto q = d_policy.queue();                   // call device_policy::queue() !?

Looks like we will call queue() method of the base class and creation of sycl::queue instance will be executed incorrectly.

Copy link
Contributor

@SergeyKopienko SergeyKopienko Sep 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or we need some another approach to resolve this issue: may be we are able to pack sycl::queue creation into someone predicates: one predicate for the base class and one for derived class and so on.

May be able to apply something like #1158
but this PR doesn't resolve the problem of broken lazy initialization in constructor of

    template <unsigned int other_factor, typename OtherName>
    fpga_policy(const fpga_policy<other_factor, OtherName>& other) : base(other.queue()){};

See my comment https://github.com/oneapi-src/oneDPL/pull/1154/files#r1318324154
Need to come up with something more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like that:

  • If we do that, this is not the design intent we want
  • If we don't do that - this is the API breaking change

{
if (!this->q)
this->q.emplace(
{
::std::scoped_lock lock{this->mtx};
if (!this->q)
this->q.emplace(
# if _ONEDPL_FPGA_EMU
__dpl_sycl::__fpga_emulator_selector()
__dpl_sycl::__fpga_emulator_selector()
# else
__dpl_sycl::__fpga_selector()
__dpl_sycl::__fpga_selector()
# endif // _ONEDPL_FPGA_EMU
);
);
}
return *this->q;
}
};
Expand Down