-
Notifications
You must be signed in to change notification settings - Fork 164
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding cuBLAS backend to oneMKL. #2
Adding cuBLAS backend to oneMKL. #2
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, all new files lack a newline at the end of file. Can we ensure that an empty newline exists so that we comply with POSIX standard for file and line endings?
I am unable to find an option that enables this check in clang-format.
// By default the pointer mode is the CUBLAS_POINTER_MODE_HOST | ||
// when the data is on buffer, it must be set to | ||
// CUBLAS_POINTER_MODE_DEVICE mode otherwise it causes the segmentation | ||
// fault. When it is said to device it is users responsibility to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
said? is this a typo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
// mimic iamax | ||
// we are converting the result to be the int and then we convert it back to | ||
// the actual data on the host | ||
// FIXME:: this change may cause failiour as the result of integer overflow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FIXME? Is this supposed to be fixed already? Also, there are typos in the comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the comment was outdated, I have updated the comments.
// cuda does not support int64_t as return type for the data. So we need to | ||
// mimic iamax we are converting the result to be the int and then we convert | ||
// it back to the actual data on the host | ||
// FIXME:: this change may cause failure as the result of integer overflow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FIXME? Is this supposed to be fixed already?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the comment was outdated, I have updated the comments.
// converting float* to double * is very constly as sycl reinterpret does not | ||
// support conversion from two types which is not the same size. So in order, | ||
// to avoid loosing performance we are converting the result to be the float | ||
// FIXME:: this change may cause failiour as the result precision reduces. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FIXME? Is this supposed to be fixed already?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the comment was outdated, I have updated the comments.
CUBLAS_ERROR_FUNC(cublasSdot, err, handle, n, x_, incx, y_, incy, float_res_); | ||
}); | ||
}); | ||
/// FIXME::This is a temporary solution, this can result it precision issue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FIXME? Is this supposed to be fixed already? result in: typo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the comment was outdated, I have updated the comments.
|
||
GEMM_LAUNCHER(float, cublasSgemm) | ||
GEMM_LAUNCHER(double, cublasDgemm) | ||
// GEMM_LAUNCHER(std::complex<float>, cublasCgemm3m) from sm5 onward can improve |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does "from sm5 onward" mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is removed since the function mentioned in the comment was part of BLAS extension and not related to this PR.
// converting float* to double * is very costly operation as sycl reinterpret | ||
// does not support conversion from two types which is not the same size. | ||
// So in order, to avoid loosing performance we are converting the result to be | ||
// the float this change may cause failure as the result precision reduces. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the float. This chance...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
// does not support conversion from two types which is not the same size. | ||
// So in order, to avoid loosing performance we are converting the result to be | ||
// the float this change may cause failure as the result precision reduces. | ||
// Alternatively we need to a sycl kernel to elementwise copy the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to write? a sycl kernel
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
}); | ||
}); | ||
/// Since cuBLAS does not have sdot support, we had to do the operation in float and | ||
// convert it back into double this can result in precision issue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
double. This can...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
using cuDataType = typename CudaEquivalentType<T>::Type; | ||
overflow_check(n, incx); | ||
// cuBLAS does not support int64_t as return type for the data. So we need to | ||
// mimic iamax we are converting the result to be the int and then we convert |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mimic iamax. We are converting...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
// mimic iamax we are converting the result to be the int and then we convert | ||
// it back to the actual data on the host. | ||
// This change may cause failure as the result of integer overflow | ||
// based on the size. Alternatively either we need to write two a sycl kernel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
write two a ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
const int64_t incx, cl::sycl::buffer<int64_t, 1> &result) { | ||
using cuDataType = typename CudaEquivalentType<T>::Type; | ||
overflow_check(n, incx); | ||
// cuda does not support int64_t as return type for the data. So we need to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cuda -> cuBLAS
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
int64_t incx, cl::sycl::buffer<float, 1> &y, int64_t incy, | ||
cl::sycl::buffer<float, 1> &result) { | ||
overflow_check(n, incx, incy); | ||
// cuda does not support sdot so we need to mimic sdot |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cuda -> cuBLAS?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
// mimic iamin we are converting the result to be the int and then we convert | ||
// it back to the actual data on the host. | ||
// This change may cause failure as the result of integer overflow | ||
// based on the size. Alternatively, either we need to write two a sycl kernel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
write two a ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
cublasSetPointerMode(handle, CUBLAS_POINTER_MODE_DEVICE); | ||
auto x_ = sc.get_mem<cuDataType *>(ih, x_acc); | ||
auto int_res_ = sc.get_mem<int *>(ih, int_res_acc); | ||
cublasStatus_t err; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
support for negative incx?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
iamin
is similar to iamax
auto x_ = sc.get_mem<cuDataType *>(ih, x_acc); | ||
auto int_res_ = sc.get_mem<int *>(ih, int_res_acc); | ||
cublasStatus_t err; | ||
// IAMAX does not support negative incx |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if that is the case, why incx is passed without std::abs()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the wording did not convey the meaning. For iamax
when the incx
is negative cuBLAS returns 0 and does not execute the Kernel. This behavior is similar to that of reference NetlibBLAS. Because of that, we don't see the difference between the reference and cuBLAS. I think Intel's implementation is the same. So the incx
should not have abs around it because by adding abs, it converts the incx
to positive and the result would be different. So I have changed the wording.
Can we have an empty new line in every file? Many files are missing those. Once those changes are in, I can approve my part of the review. |
We tried to match the style of other files in oneMKL as there is no new empty line on any other files. However, I have manually added a new empty line at the end of the files that we have modified. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thank you!
@mehdi-goli I apologize for my mistake and the inconvenience it may have caused you. Looks like our coding guidelines do not specify any newline requirements at the end of file, nor does clang-format support having them. You can either revert these changes before merge or we can do it on our side after merge. Which of these do you prefer? |
No problem. I have reverted them. |
This PR adds cuBLAS backend to OneMKL.
Requirements
To compile the cuBLAS backend PR 1332 is required.
Known issue
The test suit should run via ctest. When the tests run stand-alone(e.g ./bin/test_main_ct or ./bin/test_main_rt) , it can lead to segmentation fault due to issue 1520.