-
Notifications
You must be signed in to change notification settings - Fork 122
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add cudilu #5002
Add cudilu #5002
Conversation
b3c2288
to
99e0146
Compare
5e8c8d8
to
4e9f09e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally well done, there are some small beautification fixes.
In general I'd recommend splitting up the cusparse_matrix_operations-file a bit to one only containing DILU-specific stuff.
@@ -490,6 +502,27 @@ struct StandardPreconditioners<Operator,Dune::Amg::SequentialInformation> | |||
using CUJac = typename Opm::cuistl::CuJac<M, Opm::cuistl::CuVector<field_type>, Opm::cuistl::CuVector<field_type>>; | |||
return std::make_shared<Opm::cuistl::PreconditionerAdapter<V, V, CUJac>>(std::make_shared<CUJac>(op.getmat(), w)); | |||
}); | |||
|
|||
F::addCreator("CUDILU", [](const O& op, const P& prm, const std::function<V()>&, std::size_t) { | |||
DUNE_UNUSED_PARAMETER(prm); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can use [maybe_unused]
? DUNE_UNUSED_PARAMETER
is not really needed anymore.
}); | ||
|
||
F::addCreator("CUDILUFloat", [](const O& op, const P& prm, const std::function<V()>&, std::size_t) { | ||
DUNE_UNUSED_PARAMETER(prm); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can use [maybe_unused]
? DUNE_UNUSED_PARAMETER
is not really needed anymore.
int globCnt = 0; | ||
for (int i = 0; i < levelSets.size(); i++) { | ||
for (size_t j = 0; j < levelSets[i].size(); j++) { | ||
res[globCnt++] = (int)levelSets[i][j]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that this function is probably not runtime critical? Then I'd add a sanity check here for globCnt < res.size()
. So something like
OPM_ERROR_IF(globCnt >= res.size(), fmt::format("Internal error. globCnt = {}, res.size() = {}", globCnt, res.size());
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, should probably use new style cast (static_cast<int>(...)
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would make globCnt be larger than the vector when we have already allocated space for every item in the sparse table?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was more to guard against wrong use from outside of this function, ie enforce the invariant you have established. It is not strictly needed, but if someone is to change this class further down the line, they might not see the connection to this function.
int globCnt = 0; | ||
for (int i = 0; i < levelSets.size(); i++) { | ||
for (size_t j = 0; j < levelSets[i].size(); j++) { | ||
res[levelSets[i][j]] = globCnt++; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again here I'd add a sanity check if the function is not runtime critical (I assume not since a vector is being allocated here?)
std::vector<int> | ||
createReorderedToNatural(Opm::SparseTable<size_t> levelSets) | ||
{ | ||
auto res = std::vector<int>(levelSets.dataSize()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dataSize()
returns int
, but std::vector
expects size_t
, this should do an explicit conversion to avoid warnings (use the to_size_t
functions in detail
)
// PLS: c += A*b | ||
// MINUS: c -= A*b | ||
template <class T, int blocksize, MVType OP> | ||
__device__ __forceinline__ void mv(T* A, T* b, T* c) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit dubious to this, but in principle it is ok. However, could you rename it to something slightly more meaningful (say matrixVectorProductWithAction
, and then add wrapper functions that just calls this function with the correct template arguments, eg:
template<class T>
__device__ __forceinline__ void umv(T* A, T* b, T* c) {
mv<T, MVType::PLUS>(A, b, c);
}
Here, I would use either the naming standard from Dune ie
mv ==> c = Ab
umv ==> c += Ab
mmv ==> c -= Ab
or something more verbose (eg. matrixVectorMultiply
, addMatrixVector
, subtractMatrixVector
)
const auto reorderedRowIdx = startIdx + blockDim.x * blockIdx.x + threadIdx.x; | ||
if (reorderedRowIdx < rowsInLevelSet + startIdx) { | ||
int naturalRowIdx = reorderedToNatural[reorderedRowIdx]; | ||
size_t nnzIdx = rowIndices[reorderedRowIdx]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both of these can be const
?
size_t nnzIdx = rowIndices[reorderedRowIdx]; | ||
|
||
int diagIdx = nnzIdx; | ||
while (colIndices[diagIdx] != naturalRowIdx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
braces
} | ||
} | ||
|
||
int symOpposite = mid; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const
size_t nnzIdx = rowIndices[row]; | ||
size_t nnzIdxLim = rowIndices[row + 1]; | ||
size_t nnzIdx = rowIndices[reorderedRowIdx]; | ||
int naturalRowIdx = indexConversion[reorderedRowIdx]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const
?
I have now gone through the feedback and made the draft PR a regular PR. If this is given the green light I will squash everything into one commit and hopefully it will be merged. As of the only thing missing is to clear up the use of C-style arrays vs std::array for local variables in some of the kernels. |
Jenkins build this please |
preconditioner. Uses graph coloring to exploit parallelism in upper and triangular solves when computing a diagonal approximate inverse of a sparse matrix. Supports blocksizes up to 3.
c157d9e
to
4b0dd54
Compare
Rebased to simplify history. |
jenkins build this please |
All green, merging. Thanks for the effort! |
Draft version of GPU DILU preconditioner
This DILU implementation uses graph coloring to parallelize the apply and update step of the preconditioner. Supports blocksizes up to 3, and also a float version for benchmarking.