[NDTensorscuTENSORExt] Temporary fix for block sparse contractions #1485

kmp5VT · 2024-06-05T16:13:12Z

Description

Quick fix by copying all the viewed blocks and rewriting the block back to fix cuTENSOR problem.

Checklist:

BlockSparse cutensor contractions work.
Check that the performance isn't too slow compared to cuBLAS implementation.
Avoid copying when it isn't needed, for example when the offset field of the CuArray is 0 (https://github.com/JuliaGPU/CUDA.jl/blob/v5.4.2/src/array.jl#L61).

NDTensors/ext/NDTensorscuTENSORExt/contract.jl

mtfishman · 2024-06-05T21:04:57Z

I think using a single function with if-statements would be better here. This is a temporary workaround so there isn't a reason to get too fancy with traits, and the offset is runtime information anyway so it is more natural to deal with it at runtime. So for example:

arrayR = array(R)
if !iszero(array(R).offset)
  arrayR = copy(arrayR)
end
cuR = CuTensor(arrayR, collect(labelsR))

# ...

if !iszero(array(R).offset)
  array(R) .= cuR.data
end

Then we can remove the if-statements once JuliaGPU/CUDA.jl#2407 is fixed.

NDTensors/ext/NDTensorscuTENSORExt/contract.jl

codecov-commenter · 2024-06-06T16:15:18Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 77.42%. Comparing base (e08e131) to head (e4d590d).
Report is 16 commits behind head on main.

❗ Current head e4d590d differs from pull request most recent head 2f25ce9

Please upload reports for the commit 2f25ce9 to get more accurate results.

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files

@@             Coverage Diff             @@
##             main    #1485       +/-   ##
===========================================
+ Coverage   43.65%   77.42%   +33.77%     
===========================================
  Files         136      140        +4     
  Lines        8806     9126      +320     
===========================================
+ Hits         3844     7066     +3222     
+ Misses       4962     2060     -2902

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

mtfishman · 2024-06-06T17:57:56Z

Looks good, thanks. Is this ready to merge once tests pass?

kmp5VT · 2024-06-06T17:59:27Z

@mtfishman I ran a timing of the 2d-Heisenberg system in ITensorMPS examples and see comparable timings for both CUDA and cuTENSOR
cuBLAS:

After sweep 1 energy=-43.12498291296036  maxlinkdim=20 maxerr=2.90E-03 time=9.457
After sweep 2 energy=-44.150301070050936  maxlinkdim=60 maxerr=9.70E-05 time=12.150
After sweep 3 energy=-44.40197077642984  maxlinkdim=100 maxerr=1.05E-04 time=15.370
After sweep 4 energy=-44.41428889437265  maxlinkdim=100 maxerr=1.59E-04 time=15.341
After sweep 5 energy=-44.52195043449121  maxlinkdim=200 maxerr=3.38E-05 time=18.212
After sweep 6 energy=-44.560341029613355  maxlinkdim=400 maxerr=7.15E-06 time=22.673
After sweep 7 energy=-44.56555460244652  maxlinkdim=800 maxerr=9.73E-07 time=33.934
After sweep 8 energy=-44.56561293473233  maxlinkdim=800 maxerr=1.34E-06 time=36.717

cuTENSOR

After sweep 1 energy=-43.12498291296036  maxlinkdim=20 maxerr=2.90E-03 time=21.088
After sweep 2 energy=-44.150301070050936  maxlinkdim=60 maxerr=9.70E-05 time=12.117
After sweep 3 energy=-44.40197077642984  maxlinkdim=100 maxerr=1.05E-04 time=15.447
After sweep 4 energy=-44.41428889437265  maxlinkdim=100 maxerr=1.59E-04 time=15.287
After sweep 5 energy=-44.52195043449121  maxlinkdim=200 maxerr=3.38E-05 time=17.710
After sweep 6 energy=-44.560341029613355  maxlinkdim=400 maxerr=7.15E-06 time=22.046
After sweep 7 energy=-44.56555460244652  maxlinkdim=800 maxerr=9.73E-07 time=35.958
After sweep 8 energy=-44.56561293473233  maxlinkdim=800 maxerr=1.34E-06 time=41.459

kmp5VT · 2024-06-06T17:59:58Z

Looks good, thanks. Is this ready to merge once tests pass?

Yes ready to merge when tests pass!

mtfishman · 2024-06-06T18:04:35Z

Glad to see the timings are comparable, I'll be curious to see what the timings are once we can remove the calls to copy. Pretty soon we'll have the entire table in https://itensor.github.io/ITensors.jl/dev/RunningOnGPUs.html#GPU-backends supported!

Have you tried profiling or timing (https://cuda.juliagpu.org/stable/development/profiling) to get an estimate of how much time the calls to copy are taking up?

kmp5VT · 2024-06-06T18:30:11Z

Have you tried profiling or timing (https://cuda.juliagpu.org/stable/development/profiling) to get an estimate of how much time the calls to copy are taking up?

No I haven't yet don't any indepth timings of the function just a quick dmrg calculation to see if it was significantly effecting performance in a known case where GPU finds improvement over CPU.
I will run more and get back on timings/performance

hack to cutensor contract to get working

1050c18

mtfishman reviewed Jun 5, 2024

View reviewed changes

NDTensors/ext/NDTensorscuTENSORExt/contract.jl Outdated Show resolved Hide resolved

Create contract function to only copy data when necessary.

297f01c

kmp5VT commented Jun 5, 2024

View reviewed changes

NDTensors/ext/NDTensorscuTENSORExt/contract.jl Outdated Show resolved Hide resolved

mtfishman reviewed Jun 5, 2024

View reviewed changes

NDTensors/ext/NDTensorscuTENSORExt/contract.jl Outdated Show resolved Hide resolved

Simplifying updates to cutensor contract!

330cbee

mtfishman reviewed Jun 6, 2024

View reviewed changes

NDTensors/ext/NDTensorscuTENSORExt/contract.jl Outdated Show resolved Hide resolved

Simplify code

2f25ce9

remove data

df775a4

Merge branch 'main' into kmp5/debug/cutensor_hack

3161473

mtfishman changed the title ~~[NDTensors] Temporary fix to cuTENSOR to have block sparse contractions working~~ [NDTensorscuTENSORExt] Temporary fix for block sparse contractions Jun 6, 2024

mtfishman merged commit 1cad3e2 into ITensor:main Jun 6, 2024
15 checks passed

mtfishman mentioned this pull request Jun 6, 2024

[NDTensorscuTENSORExt] [BUG] cuTENSOR extension and block sparse contraction #1451

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[NDTensorscuTENSORExt] Temporary fix for block sparse contractions #1485

[NDTensorscuTENSORExt] Temporary fix for block sparse contractions #1485

kmp5VT commented Jun 5, 2024 •

edited

Loading

mtfishman commented Jun 5, 2024 •

edited

Loading

codecov-commenter commented Jun 6, 2024 •

edited

Loading

mtfishman commented Jun 6, 2024

kmp5VT commented Jun 6, 2024

kmp5VT commented Jun 6, 2024

mtfishman commented Jun 6, 2024

kmp5VT commented Jun 6, 2024 •

edited

Loading

[NDTensorscuTENSORExt] Temporary fix for block sparse contractions #1485

[NDTensorscuTENSORExt] Temporary fix for block sparse contractions #1485

Conversation

kmp5VT commented Jun 5, 2024 • edited Loading

Description

Checklist:

mtfishman commented Jun 5, 2024 • edited Loading

codecov-commenter commented Jun 6, 2024 • edited Loading

Codecov Report

mtfishman commented Jun 6, 2024

kmp5VT commented Jun 6, 2024

kmp5VT commented Jun 6, 2024

mtfishman commented Jun 6, 2024

kmp5VT commented Jun 6, 2024 • edited Loading

kmp5VT commented Jun 5, 2024 •

edited

Loading

mtfishman commented Jun 5, 2024 •

edited

Loading

codecov-commenter commented Jun 6, 2024 •

edited

Loading

kmp5VT commented Jun 6, 2024 •

edited

Loading