
[cuda][hip] Removed bad event wait constraint / use scopedContext correctly. #1403

Merged: 1 commit, Mar 18, 2024

Conversation

@JackAKirk (Contributor) commented Mar 1, 2024

There is no reason to disallow waits on native HIP/CUDA events from different SYCL "contexts". Even the SYCL spec does not mandate such a restriction, and this constraint does not appear to be applied in the L0 backend.
This also fixes ScopedContext to set the per-device (CU/HIP) context.

This is a fix for intel/llvm#6749 (comment)

The inline review comment below is on these lines in urEventWait:

    UR_ASSERT(Event, UR_RESULT_ERROR_INVALID_EVENT);
    UR_ASSERT(Event->getContext() == Context,
              UR_RESULT_ERROR_INVALID_CONTEXT);
    ScopedContext Active(Event->getContext());

Contributor (review comment):

I don't think we need to set the context every time this lambda is run. It's valid to wait on an event that was recorded in another context, so setting the context just once at the start of the try catch should be enough
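
For orientation: the lambda referred to here is the per-event wait inside urEventWait in the CUDA adapter. A simplified sketch of its shape, with names approximated rather than copied from the adapter source (it relies on the adapter's internal ScopedContext and event types, plus <algorithm>):

    // Approximate shape of urEventWait in the CUDA adapter (illustrative only;
    // the real adapter code differs in detail).
    UR_APIEXPORT ur_result_t UR_APICALL urEventWait(
        uint32_t numEvents, const ur_event_handle_t *phEventWaitList) {
      try {
        auto waitOnEvent = [](ur_event_handle_t Event) {
          if (!Event)
            throw UR_RESULT_ERROR_INVALID_EVENT;
          // Activate the event's own native (CU/HIP) context before waiting
          // on it; this is the line the comment above refers to.
          ScopedContext Active(Event->getContext());
          UR_CHECK_ERROR(Event->wait());
        };
        std::for_each(phEventWaitList, phEventWaitList + numEvents, waitOnEvent);
        return UR_RESULT_SUCCESS;
      } catch (ur_result_t Err) {
        return Err;
      }
    }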

@JackAKirk (Contributor, Author) replied Mar 1, 2024:

Is it valid to wait for an event that was set in another CUcontext (on a different device)?
I assumed this was not the case: cuEventSynchronize can return "CUDA_ERROR_INVALID_CONTEXT".

@hdelan (Contributor) left a review comment:

Think we should only have a context set once per urEventWait, not once per event

@JackAKirk (Contributor, Author) commented Mar 1, 2024

> Think we should only have a context set once per urEventWait, not once per event

urEventWait waits on a list of events which can be associated with different devices' CUcontexts. This is what the code that led to the error motivating this patch was doing (using different devices). If you only set the context once, it could be set to a CUcontext that doesn't correspond to the CUcontext used for the stream that the event is associated with.
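
To make the concern concrete, an illustrative fragment (the event names are made up for the example; this is not the adapter's actual code):

    // Illustrative only: a wait list whose events were recorded on streams
    // belonging to two different devices, and therefore two different CUcontexts.
    std::vector<ur_event_handle_t> Events = {EventFromDevice0, EventFromDevice1};

    // Setting the context once, up front, can match at most one of the events:
    ScopedContext Active(Events[0]->getContext()); // device 0's native context
    for (ur_event_handle_t E : Events) {
      // For EventFromDevice1 this waits (ultimately cuEventSynchronize) while
      // device 0's context is current, which is the case being questioned here.
      UR_CHECK_ERROR(E->wait());
    }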

@JackAKirk requested a review from @hdelan on Mar 5, 2024, 17:41
@hdelan (Contributor) commented Mar 6, 2024

> urEventWait waits on a list of events which can be associated with different devices' CUcontexts. This is what the code that led to the error motivating this patch was doing (using different devices). If you only set the context once, it could be set to a CUcontext that doesn't correspond to the CUcontext used for the stream that the event is associated with.

These are the docs for the CUDA runtime, which say we don't need to set a context to use cudaEventSynchronize:
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#stream-and-event-behavior

I think it must be the same for the driver API as well: if we have a CUevent associated with context A and a stream associated with context B, the CUDA runtime documentation says we can successfully call cudaStreamWaitEvent, so this must also be possible in the CUDA driver API, since there is no way to emulate having two active contexts to make this call.

However, this is somewhat on the fringes of the documentation, so I think we should run some tests to make sure that this is OK in the CUDA driver API.
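
For reference, a minimal CUDA driver API check along these lines (a sketch of such a test, not code from the PR; it assumes at least two CUDA devices are visible):

    // Record an event on a stream in one context, then wait on it while a
    // different context is current.
    #include <cuda.h>
    #include <cstdio>
    #include <cstdlib>

    static void check(CUresult Res, const char *Msg) {
      if (Res != CUDA_SUCCESS) {
        std::fprintf(stderr, "%s failed: %d\n", Msg, static_cast<int>(Res));
        std::exit(1);
      }
    }

    int main() {
      check(cuInit(0), "cuInit");
      CUdevice Dev0, Dev1;
      check(cuDeviceGet(&Dev0, 0), "cuDeviceGet(0)");
      check(cuDeviceGet(&Dev1, 1), "cuDeviceGet(1)");

      CUcontext CtxA, CtxB;
      check(cuCtxCreate(&CtxA, 0, Dev0), "cuCtxCreate(A)");
      check(cuCtxCreate(&CtxB, 0, Dev1), "cuCtxCreate(B)");

      // Record an event on a stream belonging to context A.
      check(cuCtxSetCurrent(CtxA), "cuCtxSetCurrent(A)");
      CUstream StreamA;
      CUevent Event;
      check(cuStreamCreate(&StreamA, CU_STREAM_DEFAULT), "cuStreamCreate(A)");
      check(cuEventCreate(&Event, CU_EVENT_DEFAULT), "cuEventCreate");
      check(cuEventRecord(Event, StreamA), "cuEventRecord");

      // Make context B current and wait on the event recorded in context A.
      check(cuCtxSetCurrent(CtxB), "cuCtxSetCurrent(B)");
      CUstream StreamB;
      check(cuStreamCreate(&StreamB, CU_STREAM_DEFAULT), "cuStreamCreate(B)");
      check(cuStreamWaitEvent(StreamB, Event, 0), "cuStreamWaitEvent across contexts");
      check(cuEventSynchronize(Event), "cuEventSynchronize with context B current");

      std::puts("cross-context event wait succeeded");
      return 0;
    }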

@JackAKirk (Contributor, Author) replied:

Yeah makes sense, nice find. I'll run some tests.

@JackAKirk (Contributor, Author) commented:

Made some multidevice/multiqueue tests and they pass fine on nvidia/hip systems so I think you must be right. I've made the change you suggested.
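
A minimal sketch of the kind of multi-device/multi-queue test described (the actual tests may differ; this assumes at least two GPU devices and creates a separate SYCL context, and therefore a separate native CU/HIP context, per device):

    #include <sycl/sycl.hpp>

    int main() {
      auto Devices = sycl::device::get_devices(sycl::info::device_type::gpu);
      if (Devices.size() < 2)
        return 0; // test needs two devices

      // Separate SYCL contexts map to separate native (CU/HIP) contexts.
      sycl::queue Q0{sycl::context{Devices[0]}, Devices[0]};
      sycl::queue Q1{sycl::context{Devices[1]}, Devices[1]};

      int *Data0 = sycl::malloc_device<int>(1, Q0);
      sycl::event E0 = Q0.single_task([=] { *Data0 = 42; });

      // Make the second queue depend on an event recorded on the other
      // device's queue/context; this exercises the cross-context wait
      // discussed in this PR.
      sycl::event E1 = Q1.submit([&](sycl::handler &CGH) {
        CGH.depends_on(E0);
        CGH.single_task([] {});
      });
      E1.wait();

      sycl::free(Data0, Q0);
      return 0;
    }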

@hdelan (Contributor) commented Mar 6, 2024

> Made some multidevice/multiqueue tests and they pass fine on nvidia/hip systems so I think you must be right. I've made the change you suggested.

Great, thanks.

@JackAKirk added the "ready to merge" label Mar 8, 2024
@JackAKirk (Contributor, Author) commented:

This is P2P functionality requiring 2 GPUs and doesn't have a test-e2e test.

@JackAKirk added the "v0.9.x" (include in the v0.9.x release) label Mar 12, 2024
Commit message:
There is no reason to not allow waits from native hip/cuda events in different sycl "contexts".

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
@kbenzie merged commit db4b0c1 into oneapi-src:main on Mar 18, 2024
50 checks passed