Wrap and test peer to peer memory copies #1284
Conversation
Codecov Report

```diff
@@            Coverage Diff             @@
##           master    #1284      +/-   ##
==========================================
+ Coverage   78.29%   78.63%   +0.34%
==========================================
  Files         119      119
  Lines        8648     8642       -6
==========================================
+ Hits         6771     6796      +25
+ Misses       1877     1846      -31
```

Continue to review full report at Codecov.
OK, so some offline discussion showed that if both devices have UVA enabled, we can just use …

Nice! I'll try this out later this week. We should add a …

I also wonder if we should just require UVA -- are there systems without it? -- and then just …
It looks like UVA was added in CUDA 6 (2013), so I'd be very surprised if anyone were still using GPUs that don't support it, but I don't know.
Even a Jetson Nano supports it:
I also want to verify on Windows whether it's only supported on TCC or not.
On Windows too:
So I propose we switch to mandatory UVA and simplify our copy methods as part of this PR. I can have a look at this.
Let's hold off on requiring UVA. I've had a look at the PR, but am running into a couple of issues. For one, copies between devices that have UVA but not P2P don't work:

```julia
julia> device(a)
CuDevice(0): NVIDIA A100-PCIE-40GB

julia> device(b)
CuDevice(1): Tesla V100-PCIE-32GB

julia> unified_addressing(device(a))
true

julia> unified_addressing(device(b))
true

julia> copyto!(a, b)
ERROR: CUDA error: invalid argument (code 1, ERROR_INVALID_VALUE)
Stacktrace:
 [1] throw_api_error(res::CUDA.cudaError_enum)
   @ CUDA ~/Julia/pkg/CUDA/lib/cudadrv/error.jl:91
 [2] macro expansion
   @ ~/Julia/pkg/CUDA/lib/cudadrv/error.jl:101 [inlined]
 [3] cuMemcpyAsync(dst::CuPtr{Float32}, src::CuPtr{Float32}, ByteCount::Int64, hStream::CuStream)
   @ CUDA ~/Julia/pkg/CUDA/lib/utils/call.jl:26
```

@kshyatt did you test this? To use the Peer version of the call, this PR currently does …
I'm thoroughly confused now; it's pretty badly documented whether cu(da)Memcpy can work across multiple devices, and some testing here on UVA- as well as P2P-capable GPUs suggests it can't. I've just made it use the Peer version now, which is also what the sample does: https://github.com/zchee/cuda-sample/blob/05555eef0d49ebdde999f5430f185a225ef00dcd/1_Utilities/p2pBandwidthLatencyTest/p2pBandwidthLatencyTest.cu#L120
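For reference, the dispatch between the two driver calls could be sketched roughly as follows. This is only a sketch, not the PR's actual implementation: `copy_between_devices!` is a hypothetical helper name, and the wrapper signatures are assumed to follow the `lib/cudadrv` conventions visible in the stacktrace above.

```julia
# Sketch (assumed API): pick a plain async copy within one context,
# and the Peer variant across contexts, as the p2pBandwidthLatencyTest
# sample does. Wrapper names/signatures follow CUDA.jl's lib/cudadrv style.
function copy_between_devices!(dst::CuPtr{T}, dst_ctx::CuContext,
                               src::CuPtr{T}, src_ctx::CuContext,
                               nbytes::Integer, stream::CuStream) where {T}
    if dst_ctx == src_ctx
        # same device/context: a regular async copy suffices
        CUDA.cuMemcpyAsync(dst, src, nbytes, stream)
    else
        # different devices: cuMemcpyPeerAsync takes both contexts explicitly
        CUDA.cuMemcpyPeerAsync(dst, dst_ctx, src, src_ctx, nbytes, stream)
    end
    return dst
end
```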
Co-authored-by: Katharine Hyatt <khyatt@flatironinstitute.org>
Fixes #1136. We might need a check that the GPUs in question support peer copies?
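Such a check could query `cuDeviceCanAccessPeer` before attempting the copy. A minimal sketch, assuming a hypothetical `can_access_peer` helper over the raw driver wrapper:

```julia
# Sketch (assumed wrapper naming): cuDeviceCanAccessPeer is the driver-API
# query for whether `dev` can directly access memory on `peer`.
function can_access_peer(dev::CuDevice, peer::CuDevice)
    accessible = Ref{Cint}(0)
    CUDA.cuDeviceCanAccessPeer(accessible, dev, peer)
    return accessible[] == 1
end

# usage sketch: bail out (or stage through the host) when P2P is unavailable
# can_access_peer(device(a), device(b)) || error("no P2P between devices")
```

Note the capability is directional, so both directions may need checking for a general `copyto!`.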