Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lazy initialization of queue inside dpcpp_default policy #1154

Closed
wants to merge 8 commits into from

Conversation

adamfidel
Copy link
Contributor

There are a few potential issues with oneDPL's global sycl::queue, including:

To address the issue, this PR lazily initializes the sycl::queue inside the dpcpp_default execution policy on first use.

@adamfidel adamfidel added this to the 2022.3.0 milestone Sep 6, 2023
@adamfidel adamfidel marked this pull request as ready for review September 6, 2023 16:01
@adamfidel adamfidel changed the title Draft: Lazy initialization of queue inside dpcpp_default policy Lazy initialization of queue inside dpcpp_default policy Sep 6, 2023
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we also add a test that fails before this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might be able to add a test that just includes oneDPL but doesn't call anything, and then invoke it with a device filter that disables all devices (e.g., ONEAPI_DEVICE_SELECTOR='!*:*'). In this case, the old code should throw an exception whereas this PR should not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have manually confirmed that this test works in principle. I created the following test:

#include <oneapi/dpl/execution>

int main()
{
    return 0;
}

Running it as ONEAPI_DEVICE_SELECTOR='!*:*' ./build/test/header_inclusion_only.pass crashes on main but does not crash with this branch.

I feel that there is benefit in adding this test, but it will require some special changes to the CMake to modify the environment specifically for this test. It is further complicated by the fact that the CMake files currently already do something similar with the variable $ONEAPI_DEVICE_SELECTOR, which is either set to ONEAPI_DEVICE_SELECTOR=gpu/acc/fpga or SYCL_DEVICE_FILTER=gpu/acc/fpga depending on the compiler version and device type.

The dynamic selection branch also makes changes to this variable. In all, I believe that there is value for adding this test but the risks are high to unintentionally break something with this being so close to the code freeze.

device_policy&
operator=(const device_policy& other)
{
q = other.queue();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we really need to create sycl::queue instance here in other.queue() call ?
May be good enough simple copy q state?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be sufficient to just copy q directly as you suggest.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor

@akukanov akukanov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not see any serious problems in the current version of the patch. However, the implementations of the policy classes could use some more uglification.

Comment on lines 90 to 92

explicit device_policy(sycl::queue q_) : q(q_) {}
explicit device_policy(sycl::device d_) : q(d_) {}
operator sycl::queue() const { return q; }
explicit device_policy(sycl::device d_) { q.emplace(d_); }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you reorder the code so that these existing constructors come before the new copy & move ones, perhaps right after the line 54?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

include/oneapi/dpl/pstl/hetero/dpcpp/execution_sycl_defs.h Outdated Show resolved Hide resolved
{
::std::scoped_lock lock{other.mtx};
q.swap(other.q);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe better to make simple move assignment to not extend life of our instance sycl::queue if it's already created earlier:

q = ::std::move(other.q);

Move assignment supported in std::optional : https://en.cppreference.com/w/cpp/utility/optional/operator%3D

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also header file for std::swap not included.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also don't understand why we do use swap member function for move and don't use it for copy. I feel like reusing copy/move-assignment from optional is the right way to go. But if you have arguments why you did swap please clarify.

My other question is can we make member-init-list there instead of default constructing and then assigning? For example, astatic function that that locks and returns optional with the correct value category?

Also header file for std::swap not included.

That's something I don't understand at all. we don't use std::swap (at least in this line). For member function including <optional> is enough

Copy link
Contributor

@SergeyKopienko SergeyKopienko Sep 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean if we take a loot into https://en.cppreference.com/w/cpp/algorithm/swap,
there are 3 headers in which std::swap declared:
Defined in header <algorithm> (until C++11)
Defined in header <utility> (since C++11)
Defined in header <string_view> (since C++17)
And when I see introduced std::swap call I'm waiting for the inclusion of one from these headers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I switched the swap with the move constructor and rewrote the copy/move constructor as suggested by @rarutyun.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/oneapi-src/oneDPL/pull/1154/files/9d6b47a9ae2a0a22c750082719a53afee7031b3d#r1318514858

  • it was my mistake: sure, you have used swap member function, not ::std::swap and no additional headers required in this case.
    I am sorry.

operator=(device_policy&& other)
{
::std::scoped_lock lock{mtx, other.mtx};
q.swap(other.q);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same:

q = ::std::move(other.q);


operator sycl::queue() const { return queue(); }
sycl::queue
queue() const
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I correct that this function sycl::queue queue() should be declared as virtual now in this and base class?

  1. Let's assume we have created the policy instance of fpga_policy
    After this PR this instance doesn't have created sycl::queue instance inside as it's lazy initialized.
  2. In some code we save result of static_cast<device_policy&>(policy) into auto& d_policy:
    fpga_policy policy;
    auto& d_policy = static_cast<device_policy&>(policy);
  1. and after that we call queue() function from reference to the base class:
    fpga_policy policy;
    auto& d_policy = static_cast<device_policy&>(policy);
    auto q = d_policy.queue();                   // call device_policy::queue() !?

Looks like we will call queue() method of the base class and creation of sycl::queue instance will be executed incorrectly.

Copy link
Contributor

@SergeyKopienko SergeyKopienko Sep 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or we need some another approach to resolve this issue: may be we are able to pack sycl::queue creation into someone predicates: one predicate for the base class and one for derived class and so on.

May be able to apply something like #1158
but this PR doesn't resolve the problem of broken lazy initialization in constructor of

    template <unsigned int other_factor, typename OtherName>
    fpga_policy(const fpga_policy<other_factor, OtherName>& other) : base(other.queue()){};

See my comment https://github.com/oneapi-src/oneDPL/pull/1154/files#r1318324154
Need to come up with something more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like that:

  • If we do that, this is not the design intent we want
  • If we don't do that - this is the API breaking change

{
}

fpga_policy() = default;
template <unsigned int other_factor, typename OtherName>
fpga_policy(const fpga_policy<other_factor, OtherName>& other) : base(other.queue()){};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lazy initialization concept would be broken at this place: queue() call will create sycl::queue instance

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be we should pass static_cast<const base&>(other) into constructor of base class to avoid this broken case?

Copy link
Contributor

@SergeyKopienko SergeyKopienko Sep 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also looks like move constructor of this format and copy/move operaor= required in this class too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lazy initialization concept would be broken at this place: queue() call will create sycl::queue instance

That's exactly the question what behavior we want for non-default constructors and assignments. I think I want them being initialized (I can change my mind, though), but I also can see the issues with that since we have inheritance

include/oneapi/dpl/pstl/hetero/dpcpp/execution_sycl_defs.h Outdated Show resolved Hide resolved
{
::std::scoped_lock lock{other.mtx};
q.swap(other.q);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also don't understand why we do use swap member function for move and don't use it for copy. I feel like reusing copy/move-assignment from optional is the right way to go. But if you have arguments why you did swap please clarify.

My other question is can we make member-init-list there instead of default constructing and then assigning? For example, astatic function that that locks and returns optional with the correct value category?

Also header file for std::swap not included.

That's something I don't understand at all. we don't use std::swap (at least in this line). For member function including <optional> is enough

device_policy&
operator=(const device_policy& other)
{
::std::scoped_lock lock{mtx, other.mtx};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it's not done. What if it's self assignment? How does scoped_lock behave for the same mutex passed twice? My bet it's a deadlock (better to say, lifelock)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, @rarutyun, you are right. Agreed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I modified the code to now check for self assignment.

device_policy(const device_policy<OtherName>& other)
{
::std::scoped_lock lock{other.mtx};
q = other.q;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering what is the correct design? should we call other.queue() or other.q? This is relevant for all places. I understand the difference in the behavior and I am trying to understand what is the right one.

@rarutyun
Copy link
Contributor

rarutyun commented Sep 7, 2023

Just to state it explicitly. Those are not all comments that I have. I am thinking about Sergey's comment about virtuality and also about additional things for copy/move semantics

@akukanov akukanov removed this from the 2022.3.0 milestone Sep 7, 2023
@akukanov akukanov added the follow through PRs/issues that should be completed/resolved label Nov 29, 2023
@akukanov akukanov removed the follow through PRs/issues that should be completed/resolved label Jul 10, 2024
@akukanov
Copy link
Contributor

After merging #1652, this patch is outdated.

@SergeyKopienko SergeyKopienko deleted the dev/adamfidel/lazy_init_dpcpp_default branch July 10, 2024 12:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants