Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding cuBLAS backend to oneMKL. #2

Merged
merged 1 commit into from
Apr 21, 2020
Merged

Adding cuBLAS backend to oneMKL. #2

merged 1 commit into from
Apr 21, 2020

Conversation

mehdi-goli
Copy link
Contributor

This PR adds cuBLAS backend to OneMKL.

Requirements
To compile the cuBLAS backend PR 1332 is required.

Known issue
The test suit should run via ctest. When the tests run stand-alone(e.g ./bin/test_main_ct or ./bin/test_main_rt) , it can lead to segmentation fault due to issue 1520.

Copy link
Contributor

@vmalia vmalia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, all new files lack a newline at the end of file. Can we ensure that an empty newline exists so that we comply with POSIX standard for file and line endings?
I am unable to find an option that enables this check in clang-format.

cmake/FindcuBLAS.cmake Outdated Show resolved Hide resolved
README.md Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
include/onemkl/blas/detail/cublas/blas_ct.hpp Show resolved Hide resolved
src/blas/backends/cublas/CMakeLists.txt Outdated Show resolved Hide resolved
// By default the pointer mode is the CUBLAS_POINTER_MODE_HOST
// when the data is on buffer, it must be set to
// CUBLAS_POINTER_MODE_DEVICE mode otherwise it causes the segmentation
// fault. When it is said to device it is users responsibility to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

said? is this a typo?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

// mimic iamax
// we are converting the result to be the int and then we convert it back to
// the actual data on the host
// FIXME:: this change may cause failiour as the result of integer overflow
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FIXME? Is this supposed to be fixed already? Also, there are typos in the comments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the comment was outdated, I have updated the comments.

// cuda does not support int64_t as return type for the data. So we need to
// mimic iamax we are converting the result to be the int and then we convert
// it back to the actual data on the host
// FIXME:: this change may cause failure as the result of integer overflow
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FIXME? Is this supposed to be fixed already?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the comment was outdated, I have updated the comments.

// converting float* to double * is very constly as sycl reinterpret does not
// support conversion from two types which is not the same size. So in order,
// to avoid loosing performance we are converting the result to be the float
// FIXME:: this change may cause failiour as the result precision reduces.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FIXME? Is this supposed to be fixed already?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the comment was outdated, I have updated the comments.

CUBLAS_ERROR_FUNC(cublasSdot, err, handle, n, x_, incx, y_, incy, float_res_);
});
});
/// FIXME::This is a temporary solution, this can result it precision issue.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FIXME? Is this supposed to be fixed already? result in: typo

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the comment was outdated, I have updated the comments.


GEMM_LAUNCHER(float, cublasSgemm)
GEMM_LAUNCHER(double, cublasDgemm)
// GEMM_LAUNCHER(std::complex<float>, cublasCgemm3m) from sm5 onward can improve
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does "from sm5 onward" mean?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is removed since the function mentioned in the comment was part of BLAS extension and not related to this PR.

// converting float* to double * is very costly operation as sycl reinterpret
// does not support conversion from two types which is not the same size.
// So in order, to avoid loosing performance we are converting the result to be
// the float this change may cause failure as the result precision reduces.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the float. This chance...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

// does not support conversion from two types which is not the same size.
// So in order, to avoid loosing performance we are converting the result to be
// the float this change may cause failure as the result precision reduces.
// Alternatively we need to a sycl kernel to elementwise copy the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to write? a sycl kernel

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

});
});
/// Since cuBLAS does not have sdot support, we had to do the operation in float and
// convert it back into double this can result in precision issue.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

double. This can...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

using cuDataType = typename CudaEquivalentType<T>::Type;
overflow_check(n, incx);
// cuBLAS does not support int64_t as return type for the data. So we need to
// mimic iamax we are converting the result to be the int and then we convert
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mimic iamax. We are converting...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

// mimic iamax we are converting the result to be the int and then we convert
// it back to the actual data on the host.
// This change may cause failure as the result of integer overflow
// based on the size. Alternatively either we need to write two a sycl kernel
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

write two a ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

const int64_t incx, cl::sycl::buffer<int64_t, 1> &result) {
using cuDataType = typename CudaEquivalentType<T>::Type;
overflow_check(n, incx);
// cuda does not support int64_t as return type for the data. So we need to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cuda -> cuBLAS

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

int64_t incx, cl::sycl::buffer<float, 1> &y, int64_t incy,
cl::sycl::buffer<float, 1> &result) {
overflow_check(n, incx, incy);
// cuda does not support sdot so we need to mimic sdot
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cuda -> cuBLAS?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

// mimic iamin we are converting the result to be the int and then we convert
// it back to the actual data on the host.
// This change may cause failure as the result of integer overflow
// based on the size. Alternatively, either we need to write two a sycl kernel
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

write two a ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

cublasSetPointerMode(handle, CUBLAS_POINTER_MODE_DEVICE);
auto x_ = sc.get_mem<cuDataType *>(ih, x_acc);
auto int_res_ = sc.get_mem<int *>(ih, int_res_acc);
cublasStatus_t err;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

support for negative incx?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iamin is similar to iamax

auto x_ = sc.get_mem<cuDataType *>(ih, x_acc);
auto int_res_ = sc.get_mem<int *>(ih, int_res_acc);
cublasStatus_t err;
// IAMAX does not support negative incx
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if that is the case, why incx is passed without std::abs()?

Copy link
Contributor Author

@mehdi-goli mehdi-goli Apr 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the wording did not convey the meaning. For iamax when the incx is negative cuBLAS returns 0 and does not execute the Kernel. This behavior is similar to that of reference NetlibBLAS. Because of that, we don't see the difference between the reference and cuBLAS. I think Intel's implementation is the same. So the incx should not have abs around it because by adding abs, it converts the incx to positive and the result would be different. So I have changed the wording.

@mehdi-goli mehdi-goli requested a review from vmalia April 20, 2020 08:43
@vmalia
Copy link
Contributor

vmalia commented Apr 20, 2020

Can we have an empty new line in every file? Many files are missing those. Once those changes are in, I can approve my part of the review.

@mehdi-goli
Copy link
Contributor Author

Can we have an empty new line in every file? Many files are missing those. Once those changes are in, I can approve my part of the review.

We tried to match the style of other files in oneMKL as there is no new empty line on any other files. However, I have manually added a new empty line at the end of the files that we have modified.

Copy link
Contributor

@vmalia vmalia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thank you!

README.md Show resolved Hide resolved
@vmalia
Copy link
Contributor

vmalia commented Apr 21, 2020

@mehdi-goli I apologize for my mistake and the inconvenience it may have caused you. Looks like our coding guidelines do not specify any newline requirements at the end of file, nor does clang-format support having them. You can either revert these changes before merge or we can do it on our side after merge. Which of these do you prefer?

@mehdi-goli
Copy link
Contributor Author

@mehdi-goli I apologize for my mistake and the inconvenience it may have caused you. Looks like our coding guidelines do not specify any newline requirements at the end of file, nor does clang-format support having them. You can either revert these changes before merge or we can do it on our side after merge. Which of these do you prefer?

No problem. I have reverted them.

@jasukhar jasukhar merged commit 7514726 into uxlfoundation:master Apr 21, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants