
[cuda][hip] Removed bad event wait constraint / use scopedContext correctly. #1403

Merged: 1 commit, Mar 18, 2024

Conversation

@JackAKirk (Contributor) commented Mar 1, 2024

There is no reason to disallow waits on native HIP/CUDA events from different SYCL "contexts". Even the SYCL spec does not mandate such a restriction, and this constraint does not appear to be applied in the L0 backend.
This also fixes ScopedContext to set the per-device (CU/HIP) context.

This is a fix for intel/llvm#6749 (comment)

The inline review comment below is on these lines in urEventWait:

    UR_ASSERT(Event, UR_RESULT_ERROR_INVALID_EVENT);
    UR_ASSERT(Event->getContext() == Context,
              UR_RESULT_ERROR_INVALID_CONTEXT);
    ScopedContext Active(Event->getContext());

Contributor (review comment):

I don't think we need to set the context every time this lambda is run. It's valid to wait on an event that was recorded in another context, so setting the context just once at the start of the try catch should be enough
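
For orientation: the lambda referred to here is the per-event wait inside urEventWait in the CUDA adapter. A simplified sketch of its shape, with names approximated rather than copied from the adapter source (it relies on the adapter's internal ScopedContext and event types, plus <algorithm>):

    // Approximate shape of urEventWait in the CUDA adapter (illustrative only;
    // the real adapter code differs in detail).
    UR_APIEXPORT ur_result_t UR_APICALL urEventWait(
        uint32_t numEvents, const ur_event_handle_t *phEventWaitList) {
      try {
        auto waitOnEvent = [](ur_event_handle_t Event) {
          if (!Event)
            throw UR_RESULT_ERROR_INVALID_EVENT;
          // Activate the event's own native (CU/HIP) context before waiting
          // on it; this is the line the comment above refers to.
          ScopedContext Active(Event->getContext());
          UR_CHECK_ERROR(Event->wait());
        };
        std::for_each(phEventWaitList, phEventWaitList + numEvents, waitOnEvent);
        return UR_RESULT_SUCCESS;
      } catch (ur_result_t Err) {
        return Err;
      }
    }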

@JackAKirk (Contributor, Author) replied Mar 1, 2024:

Is it valid to wait for an event that was set in another CUcontext (on a different device)?
I assumed this was not the case: cuEventSynchronize can return "CUDA_ERROR_INVALID_CONTEXT".

@hdelan (Contributor) left a review comment:

Think we should only have a context set once per urEventWait, not once per event

@JackAKirk (Contributor, Author) commented Mar 1, 2024

> Think we should only have a context set once per urEventWait, not once per event

urEventWait waits on a list of events which can be associated with different devices' CUcontexts. This is what the code that led to the error motivating this patch was doing (using different devices). If you only set the context once, it could be set to a CUcontext that doesn't correspond to the CUcontext used for the stream that the event is associated with.
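
To make the concern concrete, an illustrative fragment (the event names are made up for the example; this is not the adapter's actual code):

    // Illustrative only: a wait list whose events were recorded on streams
    // belonging to two different devices, and therefore two different CUcontexts.
    std::vector<ur_event_handle_t> Events = {EventFromDevice0, EventFromDevice1};

    // Setting the context once, up front, can match at most one of the events:
    ScopedContext Active(Events[0]->getContext()); // device 0's native context
    for (ur_event_handle_t E : Events) {
      // For EventFromDevice1 this waits (ultimately cuEventSynchronize) while
      // device 0's context is current, which is the case being questioned here.
      UR_CHECK_ERROR(E->wait());
    }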

@JackAKirk requested a review from @hdelan on Mar 5, 2024, 17:41
@hdelan (Contributor) commented Mar 6, 2024

> urEventWait waits on a list of events which can be associated with different devices' CUcontexts. This is what the code that led to the error motivating this patch was doing (using different devices). If you only set the context once, it could be set to a CUcontext that doesn't correspond to the CUcontext used for the stream that the event is associated with.

These are the docs for the CUDA runtime, which say we don't need to set a context to use cudaEventSynchronize:
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#stream-and-event-behavior

I think it must be the same for the driver API as well: if we have a CUevent associated with context A and a stream associated with context B, the CUDA runtime documentation says we can successfully call cudaStreamWaitEvent, so this must also be possible in the CUDA driver API, since there is no way to emulate having two active contexts to make this call.

However, this is somewhat on the fringes of the documentation, so I think we should run some tests to make sure that this is OK in the CUDA driver API.
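
For reference, a minimal CUDA driver API check along these lines (a sketch of such a test, not code from the PR; it assumes at least two CUDA devices are visible):

    // Record an event on a stream in one context, then wait on it while a
    // different context is current.
    #include <cuda.h>
    #include <cstdio>
    #include <cstdlib>

    static void check(CUresult Res, const char *Msg) {
      if (Res != CUDA_SUCCESS) {
        std::fprintf(stderr, "%s failed: %d\n", Msg, static_cast<int>(Res));
        std::exit(1);
      }
    }

    int main() {
      check(cuInit(0), "cuInit");
      CUdevice Dev0, Dev1;
      check(cuDeviceGet(&Dev0, 0), "cuDeviceGet(0)");
      check(cuDeviceGet(&Dev1, 1), "cuDeviceGet(1)");

      CUcontext CtxA, CtxB;
      check(cuCtxCreate(&CtxA, 0, Dev0), "cuCtxCreate(A)");
      check(cuCtxCreate(&CtxB, 0, Dev1), "cuCtxCreate(B)");

      // Record an event on a stream belonging to context A.
      check(cuCtxSetCurrent(CtxA), "cuCtxSetCurrent(A)");
      CUstream StreamA;
      CUevent Event;
      check(cuStreamCreate(&StreamA, CU_STREAM_DEFAULT), "cuStreamCreate(A)");
      check(cuEventCreate(&Event, CU_EVENT_DEFAULT), "cuEventCreate");
      check(cuEventRecord(Event, StreamA), "cuEventRecord");

      // Make context B current and wait on the event recorded in context A.
      check(cuCtxSetCurrent(CtxB), "cuCtxSetCurrent(B)");
      CUstream StreamB;
      check(cuStreamCreate(&StreamB, CU_STREAM_DEFAULT), "cuStreamCreate(B)");
      check(cuStreamWaitEvent(StreamB, Event, 0), "cuStreamWaitEvent across contexts");
      check(cuEventSynchronize(Event), "cuEventSynchronize with context B current");

      std::puts("cross-context event wait succeeded");
      return 0;
    }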

@JackAKirk (Contributor, Author) replied:

Yeah makes sense, nice find. I'll run some tests.

@JackAKirk (Contributor, Author) commented:

Made some multidevice/multiqueue tests and they pass fine on nvidia/hip systems so I think you must be right. I've made the change you suggested.
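
A minimal sketch of the kind of multi-device/multi-queue test described (the actual tests may differ; this assumes at least two GPU devices and creates a separate SYCL context, and therefore a separate native CU/HIP context, per device):

    #include <sycl/sycl.hpp>

    int main() {
      auto Devices = sycl::device::get_devices(sycl::info::device_type::gpu);
      if (Devices.size() < 2)
        return 0; // test needs two devices

      // Separate SYCL contexts map to separate native (CU/HIP) contexts.
      sycl::queue Q0{sycl::context{Devices[0]}, Devices[0]};
      sycl::queue Q1{sycl::context{Devices[1]}, Devices[1]};

      int *Data0 = sycl::malloc_device<int>(1, Q0);
      sycl::event E0 = Q0.single_task([=] { *Data0 = 42; });

      // Make the second queue depend on an event recorded on the other
      // device's queue/context; this exercises the cross-context wait
      // discussed in this PR.
      sycl::event E1 = Q1.submit([&](sycl::handler &CGH) {
        CGH.depends_on(E0);
        CGH.single_task([] {});
      });
      E1.wait();

      sycl::free(Data0, Q0);
      return 0;
    }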

@hdelan (Contributor) commented Mar 6, 2024

> Made some multidevice/multiqueue tests and they pass fine on nvidia/hip systems so I think you must be right. I've made the change you suggested.

Great, thanks.

@JackAKirk added the "ready to merge" label Mar 8, 2024
@JackAKirk (Contributor, Author) commented:

This is P2P functionality requiring 2 GPUs and doesn't have a test-e2e test.

@JackAKirk added the "v0.9.x" (include in the v0.9.x release) label Mar 12, 2024
Commit message:
There is no reason to not allow waits from native hip/cuda events in different sycl "contexts".

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
@kbenzie merged commit db4b0c1 into oneapi-src:main on Mar 18, 2024
50 checks passed