Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL][PI][CUDA] Implements get_native interoperability #1332

Merged
merged 4 commits into from
Apr 14, 2020

Conversation

steffenlarsen
Copy link
Contributor

Implements get_native for CUDA allowing queries for native handles on
SYCL objects; queue, event, context, and device.

This is based on the SYCL generalization proposal.

Signed-off-by: Steffen Larsen steffen.larsen@codeplay.com

@steffenlarsen steffenlarsen force-pushed the steffen/generalized-get-native branch 2 times, most recently from 0ecfea1 to e4e4441 Compare March 16, 2020 17:33
@bader bader added the cuda CUDA back-end label Mar 17, 2020
@bader bader requested a review from smaslov-intel March 17, 2020 09:48
@steffenlarsen steffenlarsen force-pushed the steffen/generalized-get-native branch from e4e4441 to 5fb6161 Compare March 17, 2020 11:23
@smaslov-intel
Copy link
Contributor

This conflicts with the PI extension proposed in #1244.
Tagging @garimagu

Copy link
Contributor

@smaslov-intel smaslov-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see how to merge with #1244

@steffenlarsen steffenlarsen force-pushed the steffen/generalized-get-native branch from 5fb6161 to 3406b3e Compare March 18, 2020 18:26
@steffenlarsen
Copy link
Contributor Author

I've added a commit for generalizing the get_mem and get_queue of interop_handler.

This conflicts with the PI extension proposed in #1244.

Thank you for notifying me. @garimagu please have a look and let me know how you think we should proceed.

@garimagu
Copy link
Contributor

I've added a commit for generalizing the get_mem and get_queue of interop_handler.

This conflicts with the PI extension proposed in #1244.

Thank you for notifying me. @garimagu please have a look and let me know how you think we should proceed.

The API here only returns native handle from the Pi handle. To support interoperability, we should support APIs for conversion of native_handle->Pi handle.
Further, if having one method for all handles seems to be a better approach, then I will have to change the method in review #1244 .
I will let @smaslov-intel comment on which approach he thinks is better.

@smaslov-intel
Copy link
Contributor

Me preference would be for separate API for creating PI objects from native handles, since there is not really anything shared between them. And if we stick with combining conversions to/from native handles in single API (as done in #1244) then we are up to adding remaining pi*Convert interfaces.

@steffenlarsen
Copy link
Contributor Author

Me preference would be for separate API for creating PI objects from native handles, since there is not really anything shared between them. And if we stick with combining conversions to/from native handles in single API (as done in #1244) then we are up to adding remaining pi*Convert interfaces.

For OpenCL, isn't it just a cast in either case, i.e. if we want the native handle the PI object is simply cast to it and the other way around. For CUDA I'm inclined to agree with you, but the two main issues I see with it is:

  • It's bloating the API with a lot of extension functions.
  • Some BEs may not have native handles for all PI object types, so having it in a single function makes it easy for such a backend to reach a point where all the native handles are implemented, without having to create all the functions for the conversions that they don't support.

@smaslov-intel
Copy link
Contributor

For OpenCL, isn't it just a cast in either case, i.e. if we want the native handle the PI object is simply cast to it and the other way around. For CUDA I'm inclined to agree with you, but the two main issues I see with it is:

That is correct, as we've decided to base PI API on OpenCL API as much as possible, meaning the OpenCL PI plugin is really thin.

It's bloating the API with a lot of extension functions.

Not so much a bloat. Just 1 new API for every SYCL class. With program and device are already covered in #1244, there are really just several more.

Some BEs may not have native handles for all PI object types

While this is true, I don't see why this would affect our decision on the questions at hands. BEs are free to implement only subset of PI API. If CUDA BE, for example, is not going to support some of the general interoperability, then it is equally doable with a single or multiple "convert" interfaces.

@steffenlarsen
Copy link
Contributor Author

Not so much a bloat. Just 1 new API for every SYCL class. With program and device are already covered in #1244, there are really just several more.

On that note, is there a point in not having the convert be two separate functions? If we want to split somewhat unrelated operations, why have the convert function be two-way?

Some BEs may not have native handles for all PI object types

While this is true, I don't see why this would affect our decision on the questions at hands. BEs are free to implement only subset of PI API. If CUDA BE, for example, is not going to support some of the general interoperability, then it is equally doable with a single or multiple "convert" interfaces.

My point here was mostly that each backend will have to implement all the convert functions, even when new are added, whereas having it as one will allow BEs to implement all that they currently support and default to some error if not. Having separate functions and not having them implemented in each backend will likely cause segfaults.

@romanovvlad
Copy link
Contributor

My 5 cents: I agree with @steffenlarsen , having two APIs proposed by the patch look cleaner.

@smaslov-intel
Copy link
Contributor

How about a single API piGetNativeHandle to return the native handle as introduced in this patch, but separate APIs for constructing PI objects from native handles, i.e. piProgramCreateFromNative/piDeviceCreateFromNative/etc ?

@romanovvlad
Copy link
Contributor

How about a single API piGetNativeHandle to return the native handle as introduced in this patch, but separate APIs for constructing PI objects from native handles, i.e. piProgramCreateFromNative/piDeviceCreateFromNative/etc ?

Why is it better than proposed solutions? To me it has worst from both.

@Ruyk
Copy link
Contributor

Ruyk commented Mar 20, 2020

I think I could argue either way. Having separate functions allows using specific PI api types directly, and its more clear what is happening. However, it adds much more surface to the API for no particular reason.
A single function to create from a handle is less code to maintain, but its more obscure to use.
If PI were to be a user-friendly API, I would side with the first, but given this is purely for implementing SYCL RT, I am leaning more towards a single function that can do all the conversions, and default to an error when the conversion doesn't exist. This reduces maintenance effort of the PI backends.

@smaslov-intel
Copy link
Contributor

Separate APIs would follow the spirit of what PI API is currently doing: separate API for PI objects creation -- pi*Create. There are also talks to make PI API out of SYCL internals and serve the purpose of integrating other run-times with the low-level drivers. I'd rather not "pollute" the API for no good reasons (maintenance cost difference is negligible between the options we are discussing).

sycl/include/CL/sycl/backend.hpp Outdated Show resolved Hide resolved
sycl/include/CL/sycl/backend/cuda.hpp Show resolved Hide resolved
sycl/include/CL/sycl/backend.hpp Show resolved Hide resolved
sycl/include/CL/sycl/backend/cuda_definitions.hpp Outdated Show resolved Hide resolved
sycl/include/CL/sycl/backend/opencl.hpp Show resolved Hide resolved
sycl/include/CL/sycl/detail/cg.hpp Outdated Show resolved Hide resolved
sycl/test/backend/cuda/interop_get_native.cpp Outdated Show resolved Hide resolved
sycl/test/backend/cuda/interop_get_native.cpp Outdated Show resolved Hide resolved
sycl/test/backend/cuda/interop_get_native.cpp Outdated Show resolved Hide resolved
sycl/include/CL/sycl/context.hpp Show resolved Hide resolved
@steffenlarsen steffenlarsen force-pushed the steffen/generalized-get-native branch 2 times, most recently from 7933926 to 8c05f76 Compare March 23, 2020 18:55
Copy link
Contributor

@romanovvlad romanovvlad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Last comment from me.
Please, align with @smaslov-intel on the number of PI APIs.

sycl/source/detail/device_impl.hpp Outdated Show resolved Hide resolved
@steffenlarsen steffenlarsen force-pushed the steffen/generalized-get-native branch 2 times, most recently from e05eaeb to 11c2ce6 Compare March 24, 2020 16:49
@smaslov-intel
Copy link
Contributor

  • I've only added the piext*GetNative and piext*FromNative currently needed, but I have made sure that all PI object types that has one also has the other.

Great thanks for doing this!

  • I changed the wording for the piext*FromNative to specify that the created PI object must increment the ref count by one, just to ensure that we avoid confusion when implementing this for the CUDA backend as it's the PI object that is reference counted not the CUDA object itself. That is, the PI object created from a CUDA object should have a reference count of 1 whereas for OpenCL it should have the previous value +1.

I totally agree that the created PI objects should not be "hanging", so a reference-counter should be increased (or started where it didn't exist before). But please clarify that in the API description, that it is not just +1 always.

  • On that note, this means that the PI object takes ownership of the passed CUDA object and will release it upon destruction.

This is a worrying one. We can equally choose not taking the ownership of native objects, and just release PI object after it's reference-count gets to 0, while leaving it up to the user who created/provided the native handle to release its handles. Not sure how they would know that PI no longer uses his handles, though. Anyways, we can think/re-define this later, but in this PR please clarify this important aspect in the PI API descriptions.

@steffenlarsen steffenlarsen force-pushed the steffen/generalized-get-native branch from 8fcb3c4 to 2cb153c Compare April 3, 2020 12:29
@steffenlarsen steffenlarsen requested a review from bader as a code owner April 3, 2020 12:29
@steffenlarsen
Copy link
Contributor Author

Thank you for the feedback. I've adjusted it accordingly.

This is a worrying one. We can equally choose not taking the ownership of native objects, and just release PI object after it's reference-count gets to 0, while leaving it up to the user who created/provided the native handle to release its handles. Not sure how they would know that PI no longer uses his handles, though. Anyways, we can think/re-define this later, but in this PR please clarify this important aspect in the PI API descriptions.

You're absolutely right. Either way will require user-knowledge, but since this is interoperability I think we can rely on the user not using it completely blindly. The decision should however probably be documented when taken, but as you also suggest this is best left for when the functions are actually implemented in the CUDA backend.

@steffenlarsen steffenlarsen force-pushed the steffen/generalized-get-native branch 2 times, most recently from f2a0879 to 8103c1e Compare April 13, 2020 16:31
Implements get_native for CUDA allowing queries for native handles on
SYCL objects; queue, event, context, and device.

Signed-off-by: Steffen Larsen <steffen.larsen@codeplay.com>
@steffenlarsen steffenlarsen force-pushed the steffen/generalized-get-native branch from 8103c1e to 2a1d463 Compare April 14, 2020 10:54
Steffen Larsen added 3 commits April 14, 2020 12:00
This commit makes get_mem and get_queue of interop_handler return
types based on a specified backend. The backend defaults to OpenCL to
avoid breakages.

Signed-off-by: Steffen Larsen <steffen.larsen@codeplay.com>
Signed-off-by: Steffen Larsen <steffen.larsen@codeplay.com>
PI objects created with the associated piextCreate*WithNativeHandle
function takes ownership of the native handle passed to it. For OpenCL
this means that the native OpenCL object is not retained, so the PI
object effectively owns the reference of the caller.
The OpenCL interoperability have been changed to retain the native
OpenCL handle as the implementation no longer does it. This is required
by the SYCL 1.2.1 specification (Rev6, section 4.3.1.)

Signed-off-by: Steffen Larsen <steffen.larsen@codeplay.com>
Copy link
Contributor

@romanovvlad romanovvlad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but I would like @smaslov-intel to approve as well before merge

@pvchupin pvchupin merged commit 2d71c8e into intel:sycl Apr 14, 2020
alexbatashev pushed a commit to alexbatashev/llvm that referenced this pull request Apr 15, 2020
…duler_docs

* origin/sycl:
  [SYCL][PI][CUDA] Implements get_native interoperability (intel#1332)
  [SYCL] Fix check-sycl test suite on systems w/o OpenCL (intel#1503)
  [SYCL][Doc] Update ExtendedAtomics documentation (intel#1487)
  [SYCL][CUDA] Expose context extended deleters on PI API (intel#1483)
  [SYCL][NFC] Remove a dropped environment variable from a test (intel#1506)
  [SYCL] Add opencl-aot to sycl-toolchain target (intel#1504)
  [SYCL] Allow to run deploy LIT tests from particular directory
  [SYCL][CUDA] Fix LIT testing with CUDA devices (intel#1300)
  [SYCL] Remove operator name keywords (intel#1501)
  [Driver][SYCL] Consider .lo files as static archives (intel#1500)
  [SYCL-PTX] Update the compiler design to describe the CUDA target (intel#1408)
  [SYCL] Fix library build on Windows (intel#1499)
  [SYCL][NFC] Refactor lit.cfg.py (intel#1452)
  [SYCL] Fixed sub-buffer memory allocation update (intel#1486)
  [SYCL] Ensure proper definition of spirv builtins for SYCL (intel#1393)
  [SYCL][CUDA] LIT XFAIL/UNSUPPORTED (intel#1303)
  [SYCL][Doc] Function-type kernel attribute extension (intel#1494)
alexbatashev pushed a commit to alexbatashev/llvm that referenced this pull request Apr 15, 2020
…c_abi_checks

* origin/sycl: (32 commits)
  [SYCL] Do not force LLVM_INCLUDE_TESTS variable (intel#1505)
  [SYCL][NFC] Align nd_item members with constructor initialization list (intel#1521)
  [SYCL] Move get_info_host implementation to header (intel#1514)
  [SYCL] Always use dynamic CRT for Unit tests (intel#1515)
  [SYCL][NFC] Temporarily disable sporadically failing test (intel#1526)
  [SYCL] Fix inline namespaces (intel#1525)
  [SYCL] Release notes for March'20 DPCPP implementation update (intel#1511)
  [SYCL][PI][CUDA] Implements get_native interoperability (intel#1332)
  [SYCL] Fix check-sycl test suite on systems w/o OpenCL (intel#1503)
  [SYCL][Doc] Update ExtendedAtomics documentation (intel#1487)
  [SYCL][CUDA] Expose context extended deleters on PI API (intel#1483)
  [SYCL][NFC] Remove a dropped environment variable from a test (intel#1506)
  [SYCL] Add opencl-aot to sycl-toolchain target (intel#1504)
  [SYCL] Allow to run deploy LIT tests from particular directory
  [SYCL][CUDA] Fix LIT testing with CUDA devices (intel#1300)
  [SYCL] Remove operator name keywords (intel#1501)
  [Driver][SYCL] Consider .lo files as static archives (intel#1500)
  [SYCL-PTX] Update the compiler design to describe the CUDA target (intel#1408)
  [SYCL] Fix library build on Windows (intel#1499)
  [SYCL][NFC] Refactor lit.cfg.py (intel#1452)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cuda CUDA back-end
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants