Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[NDTensorscuTENSORExt] Temporary fix for block sparse contractions #1485

Merged
merged 6 commits into from
Jun 6, 2024

Conversation

kmp5VT
Copy link
Collaborator

@kmp5VT kmp5VT commented Jun 5, 2024

Description

Quick fix by copying all the viewed blocks and rewriting the block back to fix cuTENSOR problem.

Checklist:

@mtfishman
Copy link
Member

mtfishman commented Jun 5, 2024

I think using a single function with if-statements would be better here. This is a temporary workaround so there isn't a reason to get too fancy with traits, and the offset is runtime information anyway so it is more natural to deal with it at runtime. So for example:

arrayR = array(R)
if !iszero(array(R).offset)
  arrayR = copy(arrayR)
end
cuR = CuTensor(arrayR, collect(labelsR))

# ...

if !iszero(array(R).offset)
  array(R) .= cuR.data
end

Then we can remove the if-statements once JuliaGPU/CUDA.jl#2407 is fixed.

@codecov-commenter
Copy link

codecov-commenter commented Jun 6, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 77.42%. Comparing base (e08e131) to head (e4d590d).
Report is 16 commits behind head on main.

Current head e4d590d differs from pull request most recent head 2f25ce9

Please upload reports for the commit 2f25ce9 to get more accurate results.

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1485       +/-   ##
===========================================
+ Coverage   43.65%   77.42%   +33.77%     
===========================================
  Files         136      140        +4     
  Lines        8806     9126      +320     
===========================================
+ Hits         3844     7066     +3222     
+ Misses       4962     2060     -2902     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@mtfishman
Copy link
Member

Looks good, thanks. Is this ready to merge once tests pass?

@kmp5VT
Copy link
Collaborator Author

kmp5VT commented Jun 6, 2024

@mtfishman I ran a timing of the 2d-Heisenberg system in ITensorMPS examples and see comparable timings for both CUDA and cuTENSOR
cuBLAS:

After sweep 1 energy=-43.12498291296036  maxlinkdim=20 maxerr=2.90E-03 time=9.457
After sweep 2 energy=-44.150301070050936  maxlinkdim=60 maxerr=9.70E-05 time=12.150
After sweep 3 energy=-44.40197077642984  maxlinkdim=100 maxerr=1.05E-04 time=15.370
After sweep 4 energy=-44.41428889437265  maxlinkdim=100 maxerr=1.59E-04 time=15.341
After sweep 5 energy=-44.52195043449121  maxlinkdim=200 maxerr=3.38E-05 time=18.212
After sweep 6 energy=-44.560341029613355  maxlinkdim=400 maxerr=7.15E-06 time=22.673
After sweep 7 energy=-44.56555460244652  maxlinkdim=800 maxerr=9.73E-07 time=33.934
After sweep 8 energy=-44.56561293473233  maxlinkdim=800 maxerr=1.34E-06 time=36.717

cuTENSOR

After sweep 1 energy=-43.12498291296036  maxlinkdim=20 maxerr=2.90E-03 time=21.088
After sweep 2 energy=-44.150301070050936  maxlinkdim=60 maxerr=9.70E-05 time=12.117
After sweep 3 energy=-44.40197077642984  maxlinkdim=100 maxerr=1.05E-04 time=15.447
After sweep 4 energy=-44.41428889437265  maxlinkdim=100 maxerr=1.59E-04 time=15.287
After sweep 5 energy=-44.52195043449121  maxlinkdim=200 maxerr=3.38E-05 time=17.710
After sweep 6 energy=-44.560341029613355  maxlinkdim=400 maxerr=7.15E-06 time=22.046
After sweep 7 energy=-44.56555460244652  maxlinkdim=800 maxerr=9.73E-07 time=35.958
After sweep 8 energy=-44.56561293473233  maxlinkdim=800 maxerr=1.34E-06 time=41.459

@kmp5VT
Copy link
Collaborator Author

kmp5VT commented Jun 6, 2024

Looks good, thanks. Is this ready to merge once tests pass?

Yes ready to merge when tests pass!

@mtfishman
Copy link
Member

Glad to see the timings are comparable, I'll be curious to see what the timings are once we can remove the calls to copy. Pretty soon we'll have the entire table in https://itensor.github.io/ITensors.jl/dev/RunningOnGPUs.html#GPU-backends supported!

Have you tried profiling or timing (https://cuda.juliagpu.org/stable/development/profiling) to get an estimate of how much time the calls to copy are taking up?

@mtfishman mtfishman changed the title [NDTensors] Temporary fix to cuTENSOR to have block sparse contractions working [NDTensorscuTENSORExt] Temporary fix for block sparse contractions Jun 6, 2024
@kmp5VT
Copy link
Collaborator Author

kmp5VT commented Jun 6, 2024

Have you tried profiling or timing (https://cuda.juliagpu.org/stable/development/profiling) to get an estimate of how much time the calls to copy are taking up?

No I haven't yet don't any indepth timings of the function just a quick dmrg calculation to see if it was significantly effecting performance in a known case where GPU finds improvement over CPU.
I will run more and get back on timings/performance

@mtfishman mtfishman merged commit 1cad3e2 into ITensor:main Jun 6, 2024
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants