cuIpcGetMemHandle failure causing CUDA-aware MPI to fail #1398
Dup: #1053
Thanks for your rapid reply and the fix! Would there be a way to integrate some of the low-level ImplicitGlobalGrid tests into the CUDA.jl tests? This may help catch low-level issues before they turn into bugs.
We could do reverse CI, but that doesn't scale well. Wouldn't it be possible to do CI on the ImplicitGlobalGrid repo using CUDA.jl#master, and do that automatically every day/week or so?
Yeah, we could try running some weekly CI on IGG using CUDA#master. We plan to set up (multi-)GPU CI at CSCS for IGG and PS in order to have access to various GPU and MPI archs/builds. I guess your suggestion could then fit within that frame.
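A minimal sketch of what such a scheduled CI step could run, assuming the job executes in a checkout of the ImplicitGlobalGrid repository (the Pkg commands and project layout here are assumptions, not an existing workflow):

```sh
# Hypothetical weekly CI step: test ImplicitGlobalGrid against CUDA.jl#master.
julia --project -e 'using Pkg;
                    Pkg.add(PackageSpec(name="CUDA", rev="master"));
                    Pkg.test()'
```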
Starting with CUDA v3.8.2, CUDA-aware MPI fails with the following error, resulting in a segfault:

This failure was known in the past and required setting `export JULIA_CUDA_MEMORY_POOL=none`. With CUDA v3.8.1, everything works fine once `JULIA_CUDA_MEMORY_POOL=none` is exported. This suggests that a commit between CUDA v3.8.1 and v3.8.2 introduced the bug; the (only) suspect may be #1383.

To reproduce, you can run the ImplicitGlobalGrid `test_update_halo.jl` using MPI (a sketch is given below). In the test one does not get a segfault, only warnings; in the application, it segfaults.
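A minimal reproduction sketch; the test file path, the MPI launcher name, and the number of ranks are assumptions rather than details from the report:

```sh
# Hypothetical reproduction: run the ImplicitGlobalGrid halo-update test under MPI
# with CUDA.jl v3.8.2 installed.
export JULIA_CUDA_MEMORY_POOL=none      # workaround that was sufficient with CUDA.jl v3.8.1
mpiexec -n 2 julia --project test/test_update_halo.jl
```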
This occurs on Julia 1.7.1 and