Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize grouping indexes #1044

Merged
merged 2 commits into from
Mar 14, 2024
Merged

Optimize grouping indexes #1044

merged 2 commits into from
Mar 14, 2024

Conversation

GiovanniBussi
Copy link
Member

I am again working with this input file:

WHOLEMOLECULES ENTITY0=1-100000
c: CENTER ATOMS=1-100000
pos: POSITION ATOM=c
RESTRAINT ARG=pos.x AT=0.0 KAPPA=1

In this PR I tried to optimize loops like this:

 for(const auto & a : atom_value_ind) {
   plumed_dbg_massert( ind<forcesToApply.size(), "problem setting forces in " + getLabel() );
   std::size_t nn = a.first, kk = a.second;
   xpos[nn]->inputForce[kk] += forcesToApply[ind]; ind++;
   ypos[nn]->inputForce[kk] += forcesToApply[ind]; ind++;
   zpos[nn]->inputForce[kk] += forcesToApply[ind]; ind++;
 }

Into loops like this:

  for(const auto & a : atom_value_ind_grouped) {
    const auto nn=a.first;
    plumed_dbg_assert(ind<forcesToApply.size()) << "problem setting forces in " << getLabel();
    auto & xp=xpos[nn]->inputForce;
    auto & yp=ypos[nn]->inputForce;
    auto & zp=zpos[nn]->inputForce;
    for(const auto & kk : a.second) {
      xp[kk] += forcesToApply[ind]; ind++;
      yp[kk] += forcesToApply[ind]; ind++;
      zp[kk] += forcesToApply[ind]; ind++;
    }
  }

Here atom_value_ind_grouped is constructed when atom_value_ind is updated, and basically stores the same information in a different way, exployting the fact that for most of the iterations in the first implementation nn is constant.

@gtribello can you have a look and check if this makes sense? Is this a reasonable assumption on the memory access pattern?

The improvement is quite measurable. Below the comparison is:

The overhead decreases from 18% (using blas) to 12% (this commit).

BENCH:  Kernel:      /Users/bussi/plumed2/tmp/v2.9-mpi-ref/lib/libplumedKernel.dylib
BENCH:  Input:       plumed.dat
BENCH:  Comparative: 1.000 +- 0.000
BENCH:                                                Cycles        Total      Average      Minimum      Maximum
BENCH:  A Initialization                                   1     0.162712     0.162712     0.162712     0.162712
BENCH:  B0 First step                                      1     0.008648     0.008648     0.008648     0.008648
BENCH:  B1 Warm-up                                        99     0.450701     0.004553     0.004138     0.004917
BENCH:  B2 Calculation part 1                            200     0.914300     0.004572     0.004184     0.005419
BENCH:  B3 Calculation part 2                            200     0.907306     0.004537     0.004116     0.005461
PLUMED:                                               Cycles        Total      Average      Minimum      Maximum
PLUMED:                                                    1     2.440795     2.440795     2.440795     2.440795
PLUMED: 1 Prepare dependencies                           500     0.001112     0.000002     0.000001     0.000016
PLUMED: 2 Sharing data                                   500     0.143430     0.000287     0.000232     0.001424
PLUMED: 3 Waiting for data                               500     0.000257     0.000001     0.000000     0.000007
PLUMED: 4 Calculating (forward loop)                     500     1.663443     0.003327     0.002997     0.006315
PLUMED: 5 Applying (backward loop)                       500     0.461751     0.000924     0.000834     0.001461
PLUMED: 6 Update                                         500     0.000515     0.000001     0.000001     0.000003
BENCH:  
BENCH:  Kernel:      /Users/bussi/plumed2/tmp/reference/lib/libplumedKernel.dylib
BENCH:  Input:       plumed.dat
BENCH:  Comparative: 1.176 +- 0.004
BENCH:                                                Cycles        Total      Average      Minimum      Maximum
BENCH:  A Initialization                                   1     0.170923     0.170923     0.170923     0.170923
BENCH:  B0 First step                                      1     0.011203     0.011203     0.011203     0.011203
BENCH:  B1 Warm-up                                        99     0.530119     0.005355     0.004808     0.005859
BENCH:  B2 Calculation part 1                            200     1.073778     0.005369     0.004811     0.007811
BENCH:  B3 Calculation part 2                            200     1.068996     0.005345     0.004818     0.006343
PLUMED:                                               Cycles        Total      Average      Minimum      Maximum
PLUMED:                                                    1     2.852410     2.852410     2.852410     2.852410
PLUMED: 1 Prepare dependencies                           500     0.001712     0.000003     0.000001     0.000021
PLUMED: 2 Sharing data                                   500     0.173275     0.000347     0.000287     0.000940
PLUMED: 3 Waiting for data                               500     0.001615     0.000003     0.000001     0.000023
PLUMED: 4 Calculating (forward loop)                     500     1.899739     0.003799     0.003378     0.007691
PLUMED: 5 Applying (backward loop)                       500     0.595992     0.001192     0.001059     0.002648
PLUMED: 6 Update                                         500     0.002300     0.000005     0.000003     0.000042
BENCH:  
BENCH:  Kernel:      /Users/bussi/plumed2/src/lib/libplumedKernel.dylib
BENCH:  Input:       plumed.dat
BENCH:  Comparative: 1.125 +- 0.003
BENCH:                                                Cycles        Total      Average      Minimum      Maximum
BENCH:  A Initialization                                   1     0.176497     0.176497     0.176497     0.176497
BENCH:  B0 First step                                      1     0.009901     0.009901     0.009901     0.009901
BENCH:  B1 Warm-up                                        99     0.513381     0.005186     0.004732     0.005876
BENCH:  B2 Calculation part 1                            200     1.028328     0.005142     0.004676     0.005917
BENCH:  B3 Calculation part 2                            200     1.021688     0.005108     0.004611     0.005853
PLUMED:                                               Cycles        Total      Average      Minimum      Maximum
PLUMED:                                                    1     2.747601     2.747601     2.747601     2.747601
PLUMED: 1 Prepare dependencies                           500     0.001586     0.000003     0.000001     0.000005
PLUMED: 2 Sharing data                                   500     0.168293     0.000337     0.000287     0.000723
PLUMED: 3 Waiting for data                               500     0.001344     0.000003     0.000001     0.000016
PLUMED: 4 Calculating (forward loop)                     500     1.850951     0.003702     0.003321     0.007262
PLUMED: 5 Applying (backward loop)                       500     0.540332     0.001081     0.000951     0.001962
PLUMED: 6 Update                                         500     0.002144     0.000004     0.000003     0.000026

Comment on lines +317 to +319
// for(unsigned j=0; j<forcesForApply.size(); ++j) {
// forcesForApply[j]+=omp_f[j];
// }

Check notice

Code scanning / CodeQL

Commented-out code Note

This comment appears to contain commented-out code.
Comment on lines +305 to +306
// if( nt>1 ) for(unsigned j=0; j<nder; ++j) omp_f[j] += ff*thisderiv[1+j];
//else for(unsigned j=0; j<nder; ++j) forcesForApply[j] += ff*thisderiv[1+j];

Check notice

Code scanning / CodeQL

Commented-out code Note

This comment appears to contain commented-out code.
Comment on lines +330 to +336
// for(const auto & a : atom_value_ind) {
// plumed_dbg_massert( ind<forcesToApply.size(), "problem setting forces in " + getLabel() );
// std::size_t nn = a.first, kk = a.second;
// xpos[nn]->inputForce[kk] += forcesToApply[ind]; ind++;
// ypos[nn]->inputForce[kk] += forcesToApply[ind]; ind++;
// zpos[nn]->inputForce[kk] += forcesToApply[ind]; ind++;
// }

Check notice

Code scanning / CodeQL

Commented-out code Note

This comment appears to contain commented-out code.
Comment on lines +293 to +301
// for(const auto & a : atom_value_ind) {
// std::size_t nn = a.first, kk = a.second;
// positions[j][0] = xpos[nn]->data[kk];
// positions[j][1] = ypos[nn]->data[kk];
// positions[j][2] = zpos[nn]->data[kk];
// charges[j] = chargev[nn]->data[kk];
// masses[j] = masv[nn]->data[kk];
// j++;
// }

Check notice

Code scanning / CodeQL

Commented-out code Note

This comment appears to contain commented-out code.
Comment on lines +75 to +80
// for(const auto & a : atom_value_ind) {
// std::size_t nn = a.first, kk = a.second;
// xpos[nn]->inputForce[kk] += xf*xval->data[1+k] + yf*yval->data[1+k] + zf*zval->data[1+k]; k++;
// ypos[nn]->inputForce[kk] += xf*xval->data[1+k] + yf*yval->data[1+k] + zf*zval->data[1+k]; k++;
// zpos[nn]->inputForce[kk] += xf*xval->data[1+k] + yf*yval->data[1+k] + zf*zval->data[1+k]; k++;
// }

Check notice

Code scanning / CodeQL

Commented-out code Note

This comment appears to contain commented-out code.
atom_value_ind_grouped.back().second.push_back(kk);
auto prev_nn=nn;
for(unsigned i=1; i<atom_value_ind.size(); i++) {
auto nn = atom_value_ind[i].first;

Check notice

Code scanning / CodeQL

Declaration hides variable Note

Variable nn hides another variable of the same name (on
line 122
).
auto prev_nn=nn;
for(unsigned i=1; i<atom_value_ind.size(); i++) {
auto nn = atom_value_ind[i].first;
auto kk = atom_value_ind[i].second;

Check notice

Code scanning / CodeQL

Declaration hides variable Note

Variable kk hides another variable of the same name (on
line 123
).
@GiovanniBussi
Copy link
Member Author

The improvement with intel compiler on my workstation is less (29% to 27% overhead), but still measurable.

BENCH:  Kernel:      /scratch/bussi/plumed2/tmp/intel-v2.9/lib/libplumedKernel.so
BENCH:  Input:       plumed.dat
BENCH:  Comparative: 1.000 +- 0.000
BENCH:                                                Cycles        Total      Average      Minimum      Maximum
BENCH:  A Initialization                                   1     0.424078     0.424078     0.424078     0.424078
BENCH:  B0 First step                                      1     0.017435     0.017435     0.017435     0.017435
BENCH:  B1 Warm-up                                        99     0.899508     0.009086     0.008296     0.012992
BENCH:  B2 Calculation part 1                            200     1.743061     0.008715     0.008302     0.011589
BENCH:  B3 Calculation part 2                            200     1.745426     0.008727     0.008314     0.011583
PLUMED:                                               Cycles        Total      Average      Minimum      Maximum
PLUMED:                                                    1     4.821678     4.821678     4.821678     4.821678
PLUMED: 1 Prepare dependencies                           500     0.001949     0.000004     0.000002     0.000015
PLUMED: 2 Sharing data                                   500     0.404638     0.000809     0.000504     0.002490
PLUMED: 3 Waiting for data                               500     0.000810     0.000002     0.000001     0.000013
PLUMED: 4 Calculating (forward loop)                     500     3.145875     0.006292     0.006110     0.012230
PLUMED: 5 Applying (backward loop)                       500     0.824214     0.001648     0.001609     0.002594
PLUMED: 6 Update                                         500     0.001113     0.000002     0.000001     0.000006
BENCH:  
BENCH:  Kernel:      /scratch/bussi/plumed2/tmp/intel-reference/lib/libplumedKernel.so
BENCH:  Input:       plumed.dat
BENCH:  Comparative: 1.287 +- 0.003
BENCH:                                                Cycles        Total      Average      Minimum      Maximum
BENCH:  A Initialization                                   1     0.440200     0.440200     0.440200     0.440200
BENCH:  B0 First step                                      1     0.022010     0.022010     0.022010     0.022010
BENCH:  B1 Warm-up                                        99     1.163765     0.011755     0.010770     0.017656
BENCH:  B2 Calculation part 1                            200     2.243936     0.011220     0.010745     0.017048
BENCH:  B3 Calculation part 2                            200     2.243593     0.011218     0.010752     0.017610
PLUMED:                                               Cycles        Total      Average      Minimum      Maximum
PLUMED:                                                    1     6.106817     6.106817     6.106817     6.106817
PLUMED: 1 Prepare dependencies                           500     0.002200     0.000004     0.000002     0.000014
PLUMED: 2 Sharing data                                   500     0.343587     0.000687     0.000527     0.001934
PLUMED: 3 Waiting for data                               500     0.002487     0.000005     0.000004     0.000024
PLUMED: 4 Calculating (forward loop)                     500     4.122769     0.008246     0.007840     0.015409
PLUMED: 5 Applying (backward loop)                       500     1.176878     0.002354     0.002314     0.004537
PLUMED: 6 Update                                         500     0.004813     0.000010     0.000009     0.000021
BENCH:  
BENCH:  Kernel:      /scratch/bussi/plumed2/src/lib/libplumedKernel.so
BENCH:  Input:       plumed.dat
BENCH:  Comparative: 1.267 +- 0.002
BENCH:                                                Cycles        Total      Average      Minimum      Maximum
BENCH:  A Initialization                                   1     0.445317     0.445317     0.445317     0.445317
BENCH:  B0 First step                                      1     0.027140     0.027140     0.027140     0.027140
BENCH:  B1 Warm-up                                        99     1.140178     0.011517     0.010588     0.016933
BENCH:  B2 Calculation part 1                            200     2.206530     0.011033     0.010579     0.015644
BENCH:  B3 Calculation part 2                            200     2.213058     0.011065     0.010590     0.015673
PLUMED:                                               Cycles        Total      Average      Minimum      Maximum
PLUMED:                                                    1     6.026409     6.026409     6.026409     6.026409
PLUMED: 1 Prepare dependencies                           500     0.002142     0.000004     0.000003     0.000014
PLUMED: 2 Sharing data                                   500     0.344054     0.000688     0.000530     0.003223
PLUMED: 3 Waiting for data                               500     0.002326     0.000005     0.000004     0.000026
PLUMED: 4 Calculating (forward loop)                     500     4.091521     0.008183     0.007775     0.019535
PLUMED: 5 Applying (backward loop)                       500     1.123146     0.002246     0.002201     0.004265
PLUMED: 6 Update                                         500     0.004250     0.000009     0.000008     0.000018

@gtribello
Copy link
Member

Hi @GiovanniBussi

This looks fine to me. As far as I am concerned you can go ahead and merge.

Thanks for taking the time to optimise the code.

@GiovanniBussi GiovanniBussi merged commit 7b7bedf into master Mar 14, 2024
34 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants