[SYCL][PI][CUDA] Implements get_native interoperability #1332

steffenlarsen · 2020-03-16T17:24:59Z

Implements get_native for CUDA allowing queries for native handles on
SYCL objects; queue, event, context, and device.

This is based on the SYCL generalization proposal.

Signed-off-by: Steffen Larsen steffen.larsen@codeplay.com

smaslov-intel · 2020-03-17T17:36:37Z

This conflicts with the PI extension proposed in #1244.
Tagging @garimagu

smaslov-intel

Please see how to merge with #1244

steffenlarsen · 2020-03-18T18:30:00Z

I've added a commit for generalizing the get_mem and get_queue of interop_handler.

This conflicts with the PI extension proposed in #1244.

Thank you for notifying me. @garimagu please have a look and let me know how you think we should proceed.

garimagu · 2020-03-18T20:04:47Z

I've added a commit for generalizing the get_mem and get_queue of interop_handler.

This conflicts with the PI extension proposed in #1244.

Thank you for notifying me. @garimagu please have a look and let me know how you think we should proceed.

The API here only returns native handle from the Pi handle. To support interoperability, we should support APIs for conversion of native_handle->Pi handle.
Further, if having one method for all handles seems to be a better approach, then I will have to change the method in review #1244 .
I will let @smaslov-intel comment on which approach he thinks is better.

smaslov-intel · 2020-03-18T20:12:02Z

Me preference would be for separate API for creating PI objects from native handles, since there is not really anything shared between them. And if we stick with combining conversions to/from native handles in single API (as done in #1244) then we are up to adding remaining pi*Convert interfaces.

steffenlarsen · 2020-03-18T20:23:10Z

Me preference would be for separate API for creating PI objects from native handles, since there is not really anything shared between them. And if we stick with combining conversions to/from native handles in single API (as done in #1244) then we are up to adding remaining pi*Convert interfaces.

For OpenCL, isn't it just a cast in either case, i.e. if we want the native handle the PI object is simply cast to it and the other way around. For CUDA I'm inclined to agree with you, but the two main issues I see with it is:

It's bloating the API with a lot of extension functions.
Some BEs may not have native handles for all PI object types, so having it in a single function makes it easy for such a backend to reach a point where all the native handles are implemented, without having to create all the functions for the conversions that they don't support.

smaslov-intel · 2020-03-19T02:23:53Z

For OpenCL, isn't it just a cast in either case, i.e. if we want the native handle the PI object is simply cast to it and the other way around. For CUDA I'm inclined to agree with you, but the two main issues I see with it is:

That is correct, as we've decided to base PI API on OpenCL API as much as possible, meaning the OpenCL PI plugin is really thin.

It's bloating the API with a lot of extension functions.

Not so much a bloat. Just 1 new API for every SYCL class. With program and device are already covered in #1244, there are really just several more.

Some BEs may not have native handles for all PI object types

While this is true, I don't see why this would affect our decision on the questions at hands. BEs are free to implement only subset of PI API. If CUDA BE, for example, is not going to support some of the general interoperability, then it is equally doable with a single or multiple "convert" interfaces.

steffenlarsen · 2020-03-19T10:03:16Z

Not so much a bloat. Just 1 new API for every SYCL class. With program and device are already covered in #1244, there are really just several more.

On that note, is there a point in not having the convert be two separate functions? If we want to split somewhat unrelated operations, why have the convert function be two-way?

Some BEs may not have native handles for all PI object types

While this is true, I don't see why this would affect our decision on the questions at hands. BEs are free to implement only subset of PI API. If CUDA BE, for example, is not going to support some of the general interoperability, then it is equally doable with a single or multiple "convert" interfaces.

My point here was mostly that each backend will have to implement all the convert functions, even when new are added, whereas having it as one will allow BEs to implement all that they currently support and default to some error if not. Having separate functions and not having them implemented in each backend will likely cause segfaults.

romanovvlad · 2020-03-19T16:47:00Z

My 5 cents: I agree with @steffenlarsen , having two APIs proposed by the patch look cleaner.

smaslov-intel · 2020-03-19T17:12:19Z

How about a single API piGetNativeHandle to return the native handle as introduced in this patch, but separate APIs for constructing PI objects from native handles, i.e. piProgramCreateFromNative/piDeviceCreateFromNative/etc ?

romanovvlad · 2020-03-20T11:05:23Z

How about a single API piGetNativeHandle to return the native handle as introduced in this patch, but separate APIs for constructing PI objects from native handles, i.e. piProgramCreateFromNative/piDeviceCreateFromNative/etc ?

Why is it better than proposed solutions? To me it has worst from both.

Ruyk · 2020-03-20T12:37:36Z

I think I could argue either way. Having separate functions allows using specific PI api types directly, and its more clear what is happening. However, it adds much more surface to the API for no particular reason.
A single function to create from a handle is less code to maintain, but its more obscure to use.
If PI were to be a user-friendly API, I would side with the first, but given this is purely for implementing SYCL RT, I am leaning more towards a single function that can do all the conversions, and default to an error when the conversion doesn't exist. This reduces maintenance effort of the PI backends.

smaslov-intel · 2020-03-20T16:46:45Z

Separate APIs would follow the spirit of what PI API is currently doing: separate API for PI objects creation -- pi*Create. There are also talks to make PI API out of SYCL internals and serve the purpose of integrating other run-times with the low-level drivers. I'd rather not "pollute" the API for no good reasons (maintenance cost difference is negligible between the options we are discussing).

sycl/include/CL/sycl/backend/opencl.hpp

sycl/include/CL/sycl/backend.hpp

sycl/include/CL/sycl/backend/cuda.hpp

sycl/include/CL/sycl/backend.hpp

sycl/include/CL/sycl/backend/cuda_definitions.hpp

sycl/include/CL/sycl/backend/opencl.hpp

sycl/include/CL/sycl/detail/cg.hpp

sycl/test/backend/cuda/interop_get_native.cpp

sycl/include/CL/sycl/context.hpp

romanovvlad

LGTM. Last comment from me.
Please, align with @smaslov-intel on the number of PI APIs.

sycl/source/detail/device_impl.hpp

sycl/include/CL/sycl/detail/pi.def

sycl/include/CL/sycl/detail/pi.h

sycl/source/detail/program_impl.cpp

smaslov-intel · 2020-04-02T19:30:54Z

I've only added the piext*GetNative and piext*FromNative currently needed, but I have made sure that all PI object types that has one also has the other.

Great thanks for doing this!

I changed the wording for the piext*FromNative to specify that the created PI object must increment the ref count by one, just to ensure that we avoid confusion when implementing this for the CUDA backend as it's the PI object that is reference counted not the CUDA object itself. That is, the PI object created from a CUDA object should have a reference count of 1 whereas for OpenCL it should have the previous value +1.

I totally agree that the created PI objects should not be "hanging", so a reference-counter should be increased (or started where it didn't exist before). But please clarify that in the API description, that it is not just +1 always.

On that note, this means that the PI object takes ownership of the passed CUDA object and will release it upon destruction.

This is a worrying one. We can equally choose not taking the ownership of native objects, and just release PI object after it's reference-count gets to 0, while leaving it up to the user who created/provided the native handle to release its handles. Not sure how they would know that PI no longer uses his handles, though. Anyways, we can think/re-define this later, but in this PR please clarify this important aspect in the PI API descriptions.

steffenlarsen · 2020-04-03T12:42:14Z

Thank you for the feedback. I've adjusted it accordingly.

This is a worrying one. We can equally choose not taking the ownership of native objects, and just release PI object after it's reference-count gets to 0, while leaving it up to the user who created/provided the native handle to release its handles. Not sure how they would know that PI no longer uses his handles, though. Anyways, we can think/re-define this later, but in this PR please clarify this important aspect in the PI API descriptions.

You're absolutely right. Either way will require user-knowledge, but since this is interoperability I think we can rely on the user not using it completely blindly. The decision should however probably be documented when taken, but as you also suggest this is best left for when the functions are actually implemented in the CUDA backend.

sycl/include/CL/sycl/detail/pi.h

sycl/include/CL/sycl/backend_types.hpp

Implements get_native for CUDA allowing queries for native handles on SYCL objects; queue, event, context, and device. Signed-off-by: Steffen Larsen <steffen.larsen@codeplay.com>

This commit makes get_mem and get_queue of interop_handler return types based on a specified backend. The backend defaults to OpenCL to avoid breakages. Signed-off-by: Steffen Larsen <steffen.larsen@codeplay.com>

Signed-off-by: Steffen Larsen <steffen.larsen@codeplay.com>

PI objects created with the associated piextCreate*WithNativeHandle function takes ownership of the native handle passed to it. For OpenCL this means that the native OpenCL object is not retained, so the PI object effectively owns the reference of the caller. The OpenCL interoperability have been changed to retain the native OpenCL handle as the implementation no longer does it. This is required by the SYCL 1.2.1 specification (Rev6, section 4.3.1.) Signed-off-by: Steffen Larsen <steffen.larsen@codeplay.com>

romanovvlad

LGTM but I would like @smaslov-intel to approve as well before merge

…duler_docs * origin/sycl: [SYCL][PI][CUDA] Implements get_native interoperability (intel#1332) [SYCL] Fix check-sycl test suite on systems w/o OpenCL (intel#1503) [SYCL][Doc] Update ExtendedAtomics documentation (intel#1487) [SYCL][CUDA] Expose context extended deleters on PI API (intel#1483) [SYCL][NFC] Remove a dropped environment variable from a test (intel#1506) [SYCL] Add opencl-aot to sycl-toolchain target (intel#1504) [SYCL] Allow to run deploy LIT tests from particular directory [SYCL][CUDA] Fix LIT testing with CUDA devices (intel#1300) [SYCL] Remove operator name keywords (intel#1501) [Driver][SYCL] Consider .lo files as static archives (intel#1500) [SYCL-PTX] Update the compiler design to describe the CUDA target (intel#1408) [SYCL] Fix library build on Windows (intel#1499) [SYCL][NFC] Refactor lit.cfg.py (intel#1452) [SYCL] Fixed sub-buffer memory allocation update (intel#1486) [SYCL] Ensure proper definition of spirv builtins for SYCL (intel#1393) [SYCL][CUDA] LIT XFAIL/UNSUPPORTED (intel#1303) [SYCL][Doc] Function-type kernel attribute extension (intel#1494)

…c_abi_checks * origin/sycl: (32 commits) [SYCL] Do not force LLVM_INCLUDE_TESTS variable (intel#1505) [SYCL][NFC] Align nd_item members with constructor initialization list (intel#1521) [SYCL] Move get_info_host implementation to header (intel#1514) [SYCL] Always use dynamic CRT for Unit tests (intel#1515) [SYCL][NFC] Temporarily disable sporadically failing test (intel#1526) [SYCL] Fix inline namespaces (intel#1525) [SYCL] Release notes for March'20 DPCPP implementation update (intel#1511) [SYCL][PI][CUDA] Implements get_native interoperability (intel#1332) [SYCL] Fix check-sycl test suite on systems w/o OpenCL (intel#1503) [SYCL][Doc] Update ExtendedAtomics documentation (intel#1487) [SYCL][CUDA] Expose context extended deleters on PI API (intel#1483) [SYCL][NFC] Remove a dropped environment variable from a test (intel#1506) [SYCL] Add opencl-aot to sycl-toolchain target (intel#1504) [SYCL] Allow to run deploy LIT tests from particular directory [SYCL][CUDA] Fix LIT testing with CUDA devices (intel#1300) [SYCL] Remove operator name keywords (intel#1501) [Driver][SYCL] Consider .lo files as static archives (intel#1500) [SYCL-PTX] Update the compiler design to describe the CUDA target (intel#1408) [SYCL] Fix library build on Windows (intel#1499) [SYCL][NFC] Refactor lit.cfg.py (intel#1452) ...

steffenlarsen force-pushed the steffen/generalized-get-native branch 2 times, most recently from 0ecfea1 to e4e4441 Compare March 16, 2020 17:33

bader added the cuda CUDA back-end label Mar 17, 2020

bader requested a review from smaslov-intel March 17, 2020 09:48

bader assigned romanovvlad Mar 17, 2020

steffenlarsen force-pushed the steffen/generalized-get-native branch from e4e4441 to 5fb6161 Compare March 17, 2020 11:23

smaslov-intel suggested changes Mar 17, 2020

View reviewed changes

steffenlarsen force-pushed the steffen/generalized-get-native branch from 5fb6161 to 3406b3e Compare March 18, 2020 18:26

steffenlarsen force-pushed the steffen/generalized-get-native branch from 3406b3e to 87e82b4 Compare March 18, 2020 18:34

garimagu mentioned this pull request Mar 18, 2020

[SYCL][PI] Add interoperability with generic handles to device and program classes #1244

Merged

fwyzard reviewed Mar 21, 2020

View reviewed changes

sycl/include/CL/sycl/backend/opencl.hpp Outdated Show resolved Hide resolved

romanovvlad requested changes Mar 22, 2020

View reviewed changes

steffenlarsen force-pushed the steffen/generalized-get-native branch 2 times, most recently from 7933926 to 8c05f76 Compare March 23, 2020 18:55

romanovvlad reviewed Mar 24, 2020

View reviewed changes

sycl/source/detail/device_impl.hpp Outdated Show resolved Hide resolved

steffenlarsen force-pushed the steffen/generalized-get-native branch 2 times, most recently from e05eaeb to 11c2ce6 Compare March 24, 2020 16:49

smaslov-intel reviewed Apr 2, 2020

View reviewed changes

sycl/include/CL/sycl/detail/pi.def Outdated Show resolved Hide resolved

smaslov-intel reviewed Apr 2, 2020

View reviewed changes

sycl/include/CL/sycl/detail/pi.h Outdated Show resolved Hide resolved

smaslov-intel reviewed Apr 2, 2020

View reviewed changes

sycl/source/detail/program_impl.cpp Outdated Show resolved Hide resolved

steffenlarsen force-pushed the steffen/generalized-get-native branch from 8fcb3c4 to 2cb153c Compare April 3, 2020 12:29

steffenlarsen requested a review from bader as a code owner April 3, 2020 12:29

smaslov-intel reviewed Apr 5, 2020

View reviewed changes

sycl/include/CL/sycl/detail/pi.h Outdated Show resolved Hide resolved

hiaselhans mentioned this pull request Apr 7, 2020

[SYCL][PI] get backend type from Plugin, not env #1477

Closed

smaslov-intel mentioned this pull request Apr 8, 2020

[SYCL] Support connection with multiple plugins #1490

Merged

smaslov-intel reviewed Apr 8, 2020

View reviewed changes

sycl/include/CL/sycl/backend_types.hpp Show resolved Hide resolved

steffenlarsen force-pushed the steffen/generalized-get-native branch 2 times, most recently from f2a0879 to 8103c1e Compare April 13, 2020 16:31

[SYCL][PI][CUDA] Implements get_native interoperability

25a76c5

Implements get_native for CUDA allowing queries for native handles on SYCL objects; queue, event, context, and device. Signed-off-by: Steffen Larsen <steffen.larsen@codeplay.com>

steffenlarsen force-pushed the steffen/generalized-get-native branch from 8103c1e to 2a1d463 Compare April 14, 2020 10:54

Steffen Larsen added 3 commits April 14, 2020 12:00

[SYCL][PI][OpenCL] Generalizing interop handler getters

a884b13

This commit makes get_mem and get_queue of interop_handler return types based on a specified backend. The backend defaults to OpenCL to avoid breakages. Signed-off-by: Steffen Larsen <steffen.larsen@codeplay.com>

[SYCL][PI] Splitting the converter function

f67949a

Signed-off-by: Steffen Larsen <steffen.larsen@codeplay.com>

steffenlarsen force-pushed the steffen/generalized-get-native branch from 2a1d463 to 9a67949 Compare April 14, 2020 11:00

steffenlarsen mentioned this pull request Apr 14, 2020

PI object created from native handles take ownership #1516

Closed

bader requested review from smaslov-intel and romanovvlad April 14, 2020 15:07

romanovvlad approved these changes Apr 14, 2020

View reviewed changes

smaslov-intel approved these changes Apr 14, 2020

View reviewed changes

mehdi-goli mentioned this pull request Apr 14, 2020

Adding cuBLAS backend to oneMKL. uxlfoundation/oneMath#2

Merged

pvchupin merged commit 2d71c8e into intel:sycl Apr 14, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[SYCL][PI][CUDA] Implements get_native interoperability #1332

[SYCL][PI][CUDA] Implements get_native interoperability #1332

steffenlarsen commented Mar 16, 2020

smaslov-intel commented Mar 17, 2020

smaslov-intel left a comment

steffenlarsen commented Mar 18, 2020

garimagu commented Mar 18, 2020

smaslov-intel commented Mar 18, 2020

steffenlarsen commented Mar 18, 2020

smaslov-intel commented Mar 19, 2020

steffenlarsen commented Mar 19, 2020

romanovvlad commented Mar 19, 2020

smaslov-intel commented Mar 19, 2020

romanovvlad commented Mar 20, 2020

Ruyk commented Mar 20, 2020

smaslov-intel commented Mar 20, 2020

romanovvlad left a comment

smaslov-intel commented Apr 2, 2020

steffenlarsen commented Apr 3, 2020

romanovvlad left a comment

[SYCL][PI][CUDA] Implements get_native interoperability #1332

[SYCL][PI][CUDA] Implements get_native interoperability #1332

Conversation

steffenlarsen commented Mar 16, 2020

smaslov-intel commented Mar 17, 2020

smaslov-intel left a comment

Choose a reason for hiding this comment

steffenlarsen commented Mar 18, 2020

garimagu commented Mar 18, 2020

smaslov-intel commented Mar 18, 2020

steffenlarsen commented Mar 18, 2020

smaslov-intel commented Mar 19, 2020

steffenlarsen commented Mar 19, 2020

romanovvlad commented Mar 19, 2020

smaslov-intel commented Mar 19, 2020

romanovvlad commented Mar 20, 2020

Ruyk commented Mar 20, 2020

smaslov-intel commented Mar 20, 2020

romanovvlad left a comment

Choose a reason for hiding this comment

smaslov-intel commented Apr 2, 2020

steffenlarsen commented Apr 3, 2020

romanovvlad left a comment

Choose a reason for hiding this comment