Comparing Fortran and CTF performance on symmetries in tensor contractions #136
Comments
Isn't numpy row-major? Do you need to flip the indices in the Python version to get the same data-access pattern as in Fortran? In any case, I'm afraid a contracted dimension of 32 is about an order of magnitude too small to saturate the compute capability of a modern Intel CPU (K=384 saturates, according to former colleagues on the MKL team), so you are limited by memory bandwidth and other overheads.

It would be very interesting to run the CTF comparisons on larger contractions, particularly the more expensive N^6 terms in CCSD. The so-called "four-particle ladder" term (https://github.com/nwchemgit/nwchem/blob/master/src/tce/ccsd/ccsd_t2_8.F) is the most expensive one in TCE's CCSD for a reasonable number of virtual orbitals. All the terms in CC2 are N^5 (like MP2) and aren't very interesting from a compute perspective, because the efficient implementations of CC2 and MP2 don't store the two-electron integrals in the fully transformed representation.

Unrelated to CTF, I had some fun porting your Fortran test to GPUs using NVIDIA StdPar support (maps …)
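For reference, a minimal numpy sketch of the layout point above; N, the array names, and the contraction spelling are illustrative assumptions, not code from this issue:

```python
import numpy as np

N = 32

# numpy defaults to row-major (C) order: the LAST index is contiguous.
# Fortran is column-major: the FIRST index is contiguous.
A_c = np.zeros((N, N))                # A[a, b], b contiguous
A_f = np.zeros((N, N), order='F')     # A[a, b], a contiguous (Fortran-like)
print(A_c.strides, A_f.strides)       # (256, 8) vs (8, 256) for float64, N = 32

# One way to reproduce the Fortran access pattern of
#   C(a,c,d,e) = sum_b A(a,b) * B(b,c,d,e)
# in default (C-ordered) numpy is to store each array with its axes
# reversed and flip the index labels in the contraction accordingly:
A = np.random.rand(N, N)              # holds A(a,b) as A[b, a]
B = np.random.rand(N, N, N, N)        # holds B(b,c,d,e) as B[e, d, c, b]
C = np.einsum('ba,edcb->edca', A, B)  # holds C(a,c,d,e) as C[e, d, c, a]
```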
Hi, thank you for your comments and suggestions. I tried increasing the contracted dimension to 384, keeping the others at 32; the output is:

I can transpose the array.

The revised version:
Hi,

I made a comparison for the contraction A[a,b] B[b,c,d,e] = C[a,c,d,e], where B[:,:,d,e] = B[:,:,e,d]. In Fortran I loop over the external indices d,e to exploit the symmetry and call dgemm for each pair. In CTF, I tried setting ctf.tensor([N,N,N,N], sym=[ctf.SYM.NS, ctf.SYM.NS, ctf.SYM.SY, ctf.SYM.NS]). Both Fortran (gfortran) and CTF use the openblas library. The loop-over-indices strategy appears (not necessarily for the first time) in the tensor contraction engine, J. Phys. Chem. A 2003, 107, 9887-9897, and in nwchem: http://gitlab.hpcrl.cse.ohio-state.edu/jinsung/nwchem-cogent-master/-/blob/eac3c06962a89c597fab04aa19bcf9c3989b3ae4/nwchem/src/tce/ccsd/cc2_t2.F
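A rough numpy sketch of the loop-over-external-indices strategy just described (this is an illustration only; N, the array names, and the data are assumptions, not the attached Fortran listing):

```python
import numpy as np

N = 32
A = np.random.rand(N, N)                  # A[a, b]
B = np.random.rand(N, N, N, N)
B = 0.5 * (B + B.transpose(0, 1, 3, 2))   # enforce B[:, :, d, e] == B[:, :, e, d]
C = np.empty((N, N, N, N))                # C[a, c, d, e]

# Loop over the external indices d, e; because B is symmetric in (d, e),
# only d <= e is computed and the result is mirrored, roughly halving
# the number of gemm calls.
for e in range(N):
    for d in range(e + 1):
        block = A @ B[:, :, d, e]         # one gemm per (d, e) pair
        C[:, :, d, e] = block
        C[:, :, e, d] = block             # symmetry of B carries over to C

# Check against a full contraction that ignores the symmetry.
C_ref = np.einsum('ab,bcde->acde', A, B)
assert np.allclose(C, C_ref)
```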
Here is the Fortran code:

Here is the CTF called from Python:

The results are as follows. Fortran:

Python:
It seems I can get a ~90% speed-up from the symmetry in Fortran, but ~0% in CTF. If I use Python to do similar loops over the external indices with einsum, as the Fortran code does, the result is much slower (by more than a factor of 10). Is my setting of CTF correct?
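For concreteness, a minimal sketch of the kind of CTF setup described above (N, the fill_random initialization, and the einsum contraction string are assumptions; only the sym specification is taken from the text):

```python
import ctf

N = 32

A = ctf.tensor([N, N])
# SY at position 2 declares the 3rd index symmetric with the 4th,
# i.e. B[:, :, d, e] == B[:, :, e, d].
B = ctf.tensor([N, N, N, N],
               sym=[ctf.SYM.NS, ctf.SYM.NS, ctf.SYM.SY, ctf.SYM.NS])

A.fill_random(0., 1.)   # assumed initialization; any data works for timing
B.fill_random(0., 1.)

# C[a,c,d,e] = sum_b A[a,b] * B[b,c,d,e]
C = ctf.einsum('ab,bcde->acde', A, B)
```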
From the Intel people, the efficiency of calling MKL from Fortran and from C++ is similar: https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/performance-of-MKL-by-c-or-fortran/td-p/936921 I suppose the same holds for openblas. So I think that if I compared C++ with dgemm against CTF loaded from Python, the results would be similar.