Optimize grouping indexes #1044
Conversation
// for(unsigned j=0; j<forcesForApply.size(); ++j) {
//   forcesForApply[j]+=omp_f[j];
// }
Check notice (Code scanning / CodeQL): Commented-out code
// if( nt>1 ) for(unsigned j=0; j<nder; ++j) omp_f[j] += ff*thisderiv[1+j];
// else for(unsigned j=0; j<nder; ++j) forcesForApply[j] += ff*thisderiv[1+j];
Check notice (Code scanning / CodeQL): Commented-out code
// for(const auto & a : atom_value_ind) {
//   plumed_dbg_massert( ind<forcesToApply.size(), "problem setting forces in " + getLabel() );
//   std::size_t nn = a.first, kk = a.second;
//   xpos[nn]->inputForce[kk] += forcesToApply[ind]; ind++;
//   ypos[nn]->inputForce[kk] += forcesToApply[ind]; ind++;
//   zpos[nn]->inputForce[kk] += forcesToApply[ind]; ind++;
// }
Check notice (Code scanning / CodeQL): Commented-out code
// for(const auto & a : atom_value_ind) {
//   std::size_t nn = a.first, kk = a.second;
//   positions[j][0] = xpos[nn]->data[kk];
//   positions[j][1] = ypos[nn]->data[kk];
//   positions[j][2] = zpos[nn]->data[kk];
//   charges[j] = chargev[nn]->data[kk];
//   masses[j] = masv[nn]->data[kk];
//   j++;
// }
Check notice (Code scanning / CodeQL): Commented-out code
// for(const auto & a : atom_value_ind) {
//   std::size_t nn = a.first, kk = a.second;
//   xpos[nn]->inputForce[kk] += xf*xval->data[1+k] + yf*yval->data[1+k] + zf*zval->data[1+k]; k++;
//   ypos[nn]->inputForce[kk] += xf*xval->data[1+k] + yf*yval->data[1+k] + zf*zval->data[1+k]; k++;
//   zpos[nn]->inputForce[kk] += xf*xval->data[1+k] + yf*yval->data[1+k] + zf*zval->data[1+k]; k++;
// }
Check notice (Code scanning / CodeQL): Commented-out code
atom_value_ind_grouped.back().second.push_back(kk);
auto prev_nn=nn;
for(unsigned i=1; i<atom_value_ind.size(); i++) {
  auto nn = atom_value_ind[i].first;
Check notice (Code scanning / CodeQL): Declaration hides variable (line 122)
auto prev_nn=nn;
for(unsigned i=1; i<atom_value_ind.size(); i++) {
  auto nn = atom_value_ind[i].first;
  auto kk = atom_value_ind[i].second;
Check notice (Code scanning / CodeQL): Declaration hides variable (line 123)
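The two notices point at the loop-local `nn` and `kk`, which hide the declarations made just before the loop (the flagged lines 122 and 123). Renaming the inner copies would resolve them without changing behaviour; a minimal sketch, where the `cur_` names and the group-opening test are illustrative assumptions rather than the PR's actual code (only fragments of the loop appear in the hunks above):

```c++
for(unsigned i=1; i<atom_value_ind.size(); i++) {
  const auto cur_nn = atom_value_ind[i].first;   // was nn: hid the declaration at line 122
  const auto cur_kk = atom_value_ind[i].second;  // was kk: hid the declaration at line 123
  // open a new group whenever nn changes (assumed logic)
  if(cur_nn != prev_nn) atom_value_ind_grouped.push_back({cur_nn, {}});
  atom_value_ind_grouped.back().second.push_back(cur_kk);
  prev_nn = cur_nn;
}
```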
The improvement with the Intel compiler on my workstation is smaller (29% to 27% overhead), but still measurable.
This looks fine to me. As far as I am concerned you can go ahead and merge. Thanks for taking the time to optimise the code.
I am again working with this input file:
In this PR I tried to optimize loops like this:
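For example, the per-pair force-application loop from the commented-out hunks flagged above: every iteration re-derives the `nn`-dependent pointers, even though `nn` is constant over long runs of consecutive pairs.

```c++
// One (nn, kk) pair per atom: the xpos/ypos/zpos[nn] lookups are repeated
// on every iteration (taken from the commented-out hunk flagged above).
for(const auto & a : atom_value_ind) {
  std::size_t nn = a.first, kk = a.second;
  xpos[nn]->inputForce[kk] += forcesToApply[ind]; ind++;
  ypos[nn]->inputForce[kk] += forcesToApply[ind]; ind++;
  zpos[nn]->inputForce[kk] += forcesToApply[ind]; ind++;
}
```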
Into loops like this:
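A sketch of the grouped version, assuming `atom_value_ind_grouped` pairs each `nn` with the list of its `kk` indices, as the construction hunk above suggests; the `nn`-dependent lookups move out of the inner loop (the PR's actual loop body may differ in details):

```c++
// nn-dependent dereferences happen once per group instead of once per atom.
for(const auto & g : atom_value_ind_grouped) {
  const auto nn = g.first;
  auto & xforce = xpos[nn]->inputForce;
  auto & yforce = ypos[nn]->inputForce;
  auto & zforce = zpos[nn]->inputForce;
  for(const auto kk : g.second) {
    xforce[kk] += forcesToApply[ind]; ind++;
    yforce[kk] += forcesToApply[ind]; ind++;
    zforce[kk] += forcesToApply[ind]; ind++;
  }
}
```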
Here `atom_value_ind_grouped` is constructed when `atom_value_ind` is updated; it stores essentially the same information in a different layout, exploiting the fact that for most of the iterations in the first implementation `nn` is constant. @gtribello can you have a look and check if this makes sense? Is this a reasonable assumption on the memory access pattern?
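For concreteness, a self-contained sketch of the layout change; the type aliases and the `group` helper are illustrative, not the PR's actual declarations:

```c++
#include <cstddef>
#include <utility>
#include <vector>

// Flat layout: one (nn, kk) pair per atom.
using AtomValueInd = std::vector<std::pair<std::size_t, std::size_t>>;
// Grouped layout: each nn carries the list of its kk indices.
using AtomValueIndGrouped =
  std::vector<std::pair<std::size_t, std::vector<std::size_t>>>;

// Group consecutive pairs sharing the same nn,
// e.g. {{0,0},{0,1},{0,2},{1,0}} -> {{0,{0,1,2}}, {1,{0}}}.
AtomValueIndGrouped group(const AtomValueInd & flat) {
  AtomValueIndGrouped grouped;
  for(const auto & a : flat) {
    if(grouped.empty() || grouped.back().first != a.first)
      grouped.push_back({a.first, {}});
    grouped.back().second.push_back(a.second);
  }
  return grouped;
}
```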
The improvement is quite measurable: comparing the two versions, the overhead decreases from 18% (using BLAS) to 12% (this commit).