Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add cudilu #5002

Merged
merged 1 commit into from
Jan 25, 2024
Merged

Add cudilu #5002

merged 1 commit into from
Jan 25, 2024

Conversation

multitalentloes
Copy link
Contributor

@multitalentloes multitalentloes commented Nov 17, 2023

Draft version of GPU DILU preconditioner
This DILU implementation uses graph coloring to parallelize the apply and update step of the preconditioner. Supports blocksizes up to 3, and also a float version for benchmarking.

Copy link
Contributor

@kjetilly kjetilly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally well done, there are some small beautification fixes.

In general I'd recommend splitting up the cusparse_matrix_operations-file a bit to one only containing DILU-specific stuff.

@@ -490,6 +502,27 @@ struct StandardPreconditioners<Operator,Dune::Amg::SequentialInformation>
using CUJac = typename Opm::cuistl::CuJac<M, Opm::cuistl::CuVector<field_type>, Opm::cuistl::CuVector<field_type>>;
return std::make_shared<Opm::cuistl::PreconditionerAdapter<V, V, CUJac>>(std::make_shared<CUJac>(op.getmat(), w));
});

F::addCreator("CUDILU", [](const O& op, const P& prm, const std::function<V()>&, std::size_t) {
DUNE_UNUSED_PARAMETER(prm);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can use [maybe_unused]? DUNE_UNUSED_PARAMETER is not really needed anymore.

});

F::addCreator("CUDILUFloat", [](const O& op, const P& prm, const std::function<V()>&, std::size_t) {
DUNE_UNUSED_PARAMETER(prm);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can use [maybe_unused]? DUNE_UNUSED_PARAMETER is not really needed anymore.

int globCnt = 0;
for (int i = 0; i < levelSets.size(); i++) {
for (size_t j = 0; j < levelSets[i].size(); j++) {
res[globCnt++] = (int)levelSets[i][j];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that this function is probably not runtime critical? Then I'd add a sanity check here for globCnt < res.size(). So something like

OPM_ERROR_IF(globCnt >= res.size(), fmt::format("Internal error. globCnt = {}, res.size() = {}", globCnt, res.size());

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, should probably use new style cast (static_cast<int>(...))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would make globCnt be larger than the vector when we have already allocated space for every item in the sparse table?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was more to guard against wrong use from outside of this function, ie enforce the invariant you have established. It is not strictly needed, but if someone is to change this class further down the line, they might not see the connection to this function.

int globCnt = 0;
for (int i = 0; i < levelSets.size(); i++) {
for (size_t j = 0; j < levelSets[i].size(); j++) {
res[levelSets[i][j]] = globCnt++;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again here I'd add a sanity check if the function is not runtime critical (I assume not since a vector is being allocated here?)

std::vector<int>
createReorderedToNatural(Opm::SparseTable<size_t> levelSets)
{
auto res = std::vector<int>(levelSets.dataSize());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dataSize() returns int, but std::vector expects size_t, this should do an explicit conversion to avoid warnings (use the to_size_t functions in detail)

// PLS: c += A*b
// MINUS: c -= A*b
template <class T, int blocksize, MVType OP>
__device__ __forceinline__ void mv(T* A, T* b, T* c)
Copy link
Contributor

@kjetilly kjetilly Jan 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit dubious to this, but in principle it is ok. However, could you rename it to something slightly more meaningful (say matrixVectorProductWithAction, and then add wrapper functions that just calls this function with the correct template arguments, eg:

template<class T>
__device__ __forceinline__ void umv(T* A, T* b, T* c) {
    mv<T, MVType::PLUS>(A, b, c);
}

Here, I would use either the naming standard from Dune ie

  • mv ==> c = Ab
  • umv ==> c += Ab
  • mmv ==> c -= Ab

or something more verbose (eg. matrixVectorMultiply, addMatrixVector, subtractMatrixVector)

const auto reorderedRowIdx = startIdx + blockDim.x * blockIdx.x + threadIdx.x;
if (reorderedRowIdx < rowsInLevelSet + startIdx) {
int naturalRowIdx = reorderedToNatural[reorderedRowIdx];
size_t nnzIdx = rowIndices[reorderedRowIdx];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both of these can be const?

size_t nnzIdx = rowIndices[reorderedRowIdx];

int diagIdx = nnzIdx;
while (colIndices[diagIdx] != naturalRowIdx)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

braces

}
}

int symOpposite = mid;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const

size_t nnzIdx = rowIndices[row];
size_t nnzIdxLim = rowIndices[row + 1];
size_t nnzIdx = rowIndices[reorderedRowIdx];
int naturalRowIdx = indexConversion[reorderedRowIdx];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const?

@multitalentloes multitalentloes marked this pull request as ready for review January 19, 2024 13:28
@multitalentloes
Copy link
Contributor Author

I have now gone through the feedback and made the draft PR a regular PR. If this is given the green light I will squash everything into one commit and hopefully it will be merged. As of the only thing missing is to clear up the use of C-style arrays vs std::array for local variables in some of the kernels.

@kjetilly
Copy link
Contributor

Jenkins build this please

preconditioner. Uses graph coloring to exploit
parallelism in upper and triangular solves when
computing a diagonal approximate inverse of a
sparse matrix. Supports blocksizes up to 3.
@multitalentloes
Copy link
Contributor Author

Rebased to simplify history.

@atgeirr
Copy link
Member

atgeirr commented Jan 25, 2024

jenkins build this please

@atgeirr
Copy link
Member

atgeirr commented Jan 25, 2024

All green, merging. Thanks for the effort!

@atgeirr atgeirr merged commit 2626fbb into OPM:master Jan 25, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants